As conversational technologies become more widely used, from smart speakers to smartphones, it’s essential that they are accessible and offer an enjoyable experience for all users, including those who may not be fluent in the language they are using. This is not only a worthy global challenge but also matters in domestic markets: in the UK, for example, 8.9% of residents aged 3+ do not speak English or Welsh as their main language, a figure that has grown over the past ten years.
Research into the experiences of first and second-language English speakers has identified several commonalities and differences in how the two groups approach making requests to voice assistants on smartphones and smart speakers. For example, we know people rarely speak every sentence perfectly from start to finish: listen carefully in any conversation and you’ll hear people repeat themselves or revise what they’re saying after they’ve started talking. This sort of ‘second saying’ is performed by both first and second-language speakers when using a voice interface. As in conversations with other people, if the recipient does not understand, people might repeat an instruction to a device in successive requests, hyper-articulating (over-emphasising sounds or pauses) and speaking more loudly.
In addition to these common challenges, second-language speakers encounter specific issues of their own, for example as they try to ‘word-find’ and get the words out before a voice interface assumes the instruction is complete. Producing a complete instruction in one go is hard for people who are translating their request in real time, so they may pause mid-sentence.
In response, technologies could adopt a greater tolerance for silence, or even detect when users are searching for a word and help them complete their original instruction. What would that look like? Any improvement for this particular group of users is likely to benefit others: research shows that people who stammer also struggle to finish speaking their instructions before a speech interface times out.
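To make the idea concrete, here is a minimal sketch in Python of what a more forgiving end-of-speech policy might look like. It is not the approach of any particular assistant; the timeout values, the filler-word heuristic, and the `should_end_turn` interface are assumptions introduced purely for illustration.

```python
# Hypothetical sketch: an adaptive end-of-speech (endpointing) policy that
# tolerates longer pauses when the partial transcript looks unfinished.
# The thresholds, filler-word list, and function interface are illustrative
# assumptions, not any vendor's actual API.

BASE_SILENCE_LIMIT_S = 0.8      # a typical fixed timeout (assumed value)
EXTENDED_SILENCE_LIMIT_S = 2.5  # extra patience when the user seems mid-thought

# Cues that the speaker is probably still word-finding or translating.
INCOMPLETE_ENDINGS = {"the", "a", "an", "to", "for", "and", "or", "of", "um", "uh", "erm"}


def looks_unfinished(partial_transcript: str) -> bool:
    """Heuristic: a trailing filler or function word suggests an unfinished request."""
    words = partial_transcript.lower().strip().split()
    return bool(words) and words[-1] in INCOMPLETE_ENDINGS


def should_end_turn(partial_transcript: str, trailing_silence_s: float) -> bool:
    """Decide whether to stop listening, given the text so far and the pause length."""
    limit = EXTENDED_SILENCE_LIMIT_S if looks_unfinished(partial_transcript) else BASE_SILENCE_LIMIT_S
    return trailing_silence_s >= limit


if __name__ == "__main__":
    # A 1.2-second pause after "set a timer for..." would cut many users off today;
    # the adaptive policy keeps listening because the request looks incomplete.
    print(should_end_turn("set a timer for", 1.2))              # False: keep listening
    print(should_end_turn("set a timer for ten minutes", 1.2))  # True: safe to respond
```

The point is simply that the decision to stop listening need not rest on a single fixed timeout; it could also take into account whether the request so far looks finished.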
Second-language speakers also ‘code mix’ (switching language mid-request) to get their request out before a device times out. Unfortunately, conversational technologies often fail to understand these code-mixed requests, and more technical work on automatic speech recognition is needed to handle multiple concurrent languages. How would interfaces handle this? Could we find better solutions now to make these contexts easier for users?
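One hedged possibility, sketched below, is to run monolingual recognisers in parallel and keep whichever hypothesis is more confident for each part of the request. Real code-mixed speech recognition is much harder than this (it needs shared acoustic alignment and multilingual language modelling); the canned hypotheses, confidence scores, and neat word-level alignment here are illustrative assumptions only.

```python
# Hypothetical sketch: merging word-level hypotheses from two monolingual
# recognisers to handle a code-mixed request. The data below is canned for
# illustration and does not come from any real recogniser.

from __future__ import annotations
from dataclasses import dataclass


@dataclass
class WordHypothesis:
    word: str
    confidence: float  # 0.0 to 1.0, assumed comparable across recognisers


def merge_code_mixed(english: list[WordHypothesis], other: list[WordHypothesis]) -> list[str]:
    """For each aligned word slot, keep whichever language's hypothesis is more confident."""
    merged = []
    for en_word, other_word in zip(english, other):
        best = en_word if en_word.confidence >= other_word.confidence else other_word
        merged.append(best.word)
    return merged


if __name__ == "__main__":
    # "play música relajante" (play relaxing music), spoken with an English carrier phrase.
    english_hyp = [WordHypothesis("play", 0.95), WordHypothesis("musical", 0.40), WordHypothesis("relax", 0.35)]
    spanish_hyp = [WordHypothesis("ple", 0.30), WordHypothesis("música", 0.92), WordHypothesis("relajante", 0.90)]
    print(" ".join(merge_code_mixed(english_hyp, spanish_hyp)))  # play música relajante
```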
These findings highlight the challenges conversational interfaces face in supporting second-language speakers, and how future technology could respond. As research in conversational technologies matures, enhancing their accessibility will improve the experience for all users: everyone benefits from a ‘smarter’ interface, and improvements designed with some users in mind will positively impact everyone.
Members of the Bold Insight team continue to be at the forefront of emerging research in this area. We are co-organising the upcoming Association for Computing Machinery (ACM) Conversational User Interfaces 2023 conference, with the theme of Designing for inclusive conversation, as well as a workshop at CHI 2023 on Inclusive Design of CUIs Across Modalities and Mobilities. By combining qualitative insights around accessibility with underlying technological improvements, we can help ensure that everyone benefits equitably from voice-enabled technologies.