A voice for everyone: Embracing diversity in speech technology

March 27, 2023

Voice assistants remain a growing market. Estimates suggest that 42% of the US population currently uses voice assistants, and 38% of UK adults have smart speakers such as Amazon Echo and Google Nest devices. The mass adoption of voice assistants in smart speakers, phones, wearables, and connected-home devices has heralded a new era of pervasive speech technology. While this widespread ownership presents an exciting opportunity for new modalities of interaction, there is still room to improve the accessibility of these technologies, including for people with diverse speech patterns.

Despite improvements in automatic speech recognition (ASR), the technology that turns spoken sounds into text, voice assistants still make errors when interpreting speech, especially dysfluent speech (e.g., when someone has a stammer/stutter). Improving voice recognition through better training is essential to making voice assistants more inclusive, and several efforts are under way in this regard, including those by Google, Mozilla, and the University of Illinois.

But we cannot rely solely on progress in the technology itself; we must also continue to learn from how people interact with these products. For example, recent research on people who stammer shows that they often adapt how they talk to voice assistants by planning their commands, slowing down their speech, or repeating commands multiple times. Other research has shown that non-native English speakers rehearse the commands they give to voice assistants in English, use more pauses, and rely more on visual feedback to foster successful interactions.

For a more natural (and useful) conversation, people must be able to speak to these technologies without having to modify their speech excessively. Broadening access to speech technologies for some users can improve the experience for all users (or, as Microsoft puts it: solve for one, extend to many). While progress in ASR research will accommodate a broader range of speakers, we must also understand how to improve the design of these interfaces. As UX researchers, we can use methods such as design fiction to imagine inclusive interfaces, and we can work directly with the communities with whom we want to create a more inclusive, voice-capable future.

Members of the Bold Insight team continue to be at the forefront of emerging research in this area. We are co-organising the upcoming Association for Computing Machinery (ACM) Conversational User Interfaces 2023 conference, with the theme of Designing for inclusive conversation, as well as an upcoming workshop at CHI 2023 on Inclusive Design of CUIs Across Modalities and Mobilities. Combined with underlying technological improvements, this work can help ensure that everyone benefits equitably from voice-enabled technologies.