The role and challenges of speech technology in global UX research

August 31, 2023

Maybe you want to bring your new product to international markets, or understand how to make your existing product or service accessible to users across multiple languages. Gathering information from users in your target population is a proven strategy for aligning your product with relevant user needs. Advanced digital tools – like automatic speech recognition (ASR) – are expanding the horizons of global user experience (UX) research to help companies and manufacturers understand a diverse range of users’ needs.

Despite developments in ASR technology, however, high-quality global studies still rely heavily on a human touch. Researchers, interpreters, and transcriptionists bring invaluable nuance and depth of understanding to participant feedback, often capturing subtleties that even the most advanced technologies may miss.

In this piece – the first in a two-part series on speech technology and global UX research – we’ll discuss the state of ASR tools today, the hurdles they still need to overcome, and how researchers can harness these and other forms of speech technology to support better outcomes.

Current state of ASR: Self-supervised learning

At some point in the global research process, professionals may consider using ASR tools to aid in cross-language communication. These tools listen to spoken words in real time and use machine learning to quickly convert speech into text in a language of choice.

Traditionally, these machine learning algorithms are trained on structured, labeled data, such as recordings paired with human-written transcripts, to identify parts of speech, sentence structures, etc. Those labels can only draw on Internet-available resources – a problem for “low-resource” languages like Hindi or Yoruba, which lack the digital footprint to sufficiently enable machine learning (we dive more into the concept of language resources in the second part of this series).

Because of these limitations, widely used ASR tools leave thousands of languages and dialects unsupported. For global UX research, that’s far from ideal. Researchers interviewing participants in, say, Delhi or Lagos, most likely won’t be able to leverage ASR tools in their current state to facilitate the research process and communicate across languages.

However, new technological advances are poised to transform the way ASR tools – and global UX researchers – translate and transcribe speech. A handful of newer tools use self-supervised machine learning to understand unstructured, unlabeled data. The underlying algorithms can recognize linguistic patterns and similarities on their own, allowing them to learn more from inputs in historically low-resource languages.

This new wave of speech technology is still in its early days. But in time, self-supervised ASR tools will be able to recognize far more languages than their older counterparts. The potential advantage for global UX research: better transcriptions and translations, which will enable high-quality (and more cost-effective) studies.

Speech technology still faces two big hurdles: quality and trust

ASR tools hold a lot of promise for UX research, but despite recent advances, the technology still falls short on quality and trust.

ASR tools struggle to accurately parse heavily accented speech (like Igbo-accented English) or “nonstandard” dialects (such as Barese in Italy). The specific context of speech – such as literal, figurative, or idiomatic meaning – is also often lost.

Quality research relies on accurate transcriptions and translations. So UX research teams continue to invest in human interpreters, transcribers, or translators – which can prove costly and time-intensive at scale.

Our global AI study found that current quality limitations ultimately undermine UX researchers’ trust in ASR tools for several languages, preventing them from adopting the technology consistently. What’s more, researchers often face serious usability challenges with this technology, which we’ll explore further in the next part of this series.

The bottom line, for now: ASR tools have come a long way. But they have clear barriers to overcome before they’re a fully reliable resource for interpreting participant feedback. Human interpreters continue to provide more accurate translations, transcriptions, and the cross-language context and nuance needed to fully understand participants’ experiences.
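
One way to put a number on the accuracy gap described above is word error rate (WER), a common metric that counts how many word-level edits separate an ASR transcript from a human one. The snippet below is a minimal sketch, not part of our study: the function name and the sample sentences are made up for illustration, and it assumes you have a human-made reference transcript for at least a sample of your recordings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words (standard Levenshtein recurrence).
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: the human transcript vs. what an ASR tool produced.
human = "the app keeps logging me out"
machine = "the app keeps locking me out"
print(f"WER: {word_error_rate(human, machine):.2f}")
```

A lower score means the ASR output is closer to the human transcript. Running a check like this on a small sample per language or accent can help a team decide where ASR output is usable and where human transcription is still needed.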

The path forward for UX research: handle speech technology with care

Even with their limitations, ASR tools can still be more efficient than manual transcription and translation for testing in certain international markets. And with the right knowledge and planning, researchers can get the most out of this technology. Here’s what we recommend:

  • Understand the fidelity of transcription and translation needed. Some clients may prefer verbatim participant feedback, while others may only need approximations. Approximations may suffice for high-level overviews or initial insights, but they can lack the nuanced understanding required for deeper analyses. If a study hinges on understanding participants’ subtle feelings and perceptions, or on capturing direct quotes, it’s essential to work from verbatim or near-verbatim transcriptions. Consider a hybrid approach, where ASR tools provide an initial transcription, which is then refined by human transcribers for accuracy. Local partners, with their nuanced understanding of the language and culture, can analyze data in depth and can then employ and supervise ASR tools to swiftly translate and provide high-level feedback in the client’s language, ensuring both depth of analysis and efficient communication.
  • Clarify your clients’ and your team’s data privacy needs. This is key for AI-powered technology, which often uses inputs to continuously train machine learning algorithms. When utilizing AI-powered tools like ASR, it’s crucial to be aware of what happens to the data you put in. These inputs, typically audio recordings or transcriptions, can sometimes be retained by the system for continuous learning and refinement of its algorithms. Without stringent data management policies, there’s a risk that participant data, possibly containing sensitive or personal information, might be stored longer than anticipated, shared with third parties, or inadequately anonymized. It’s vital to ensure that any tool or service you engage with has robust data privacy measures in place.
  • Take some time to explore the range of ASR tools available. Newer entrants to the field may bring fresh perspectives and innovations and be better optimized for cost and quality. Additionally, some newer tools may prioritize data privacy more rigorously, aligning with modern concerns and regulations.
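
For teams that want to try the hybrid approach from the first recommendation, the routing step can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any particular product: it assumes your ASR tool reports a confidence score between 0 and 1 for each transcript segment, and the `Segment` structure, the `split_for_review` helper, and the 0.85 cut-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float
    text: str
    confidence: float  # assumed: a 0.0-1.0 score reported by the ASR tool


def split_for_review(segments, threshold=0.85):
    """Return (accepted, needs_review), keeping segments in their original order."""
    accepted, needs_review = [], []
    for segment in segments:
        if segment.confidence >= threshold:
            accepted.append(segment)
        else:
            needs_review.append(segment)
    return accepted, needs_review


# Made-up sample data standing in for real ASR output.
segments = [
    Segment(0.0, "I could not find the settings menu", 0.96),
    Segment(4.2, "it felt like the app was hiding it", 0.71),
]
accepted, needs_review = split_for_review(segments)
for segment in needs_review:
    print(f"Send to human transcriber: {segment.start_seconds}s - {segment.text}")
```

The threshold is a judgment call. A study that depends on direct quotes might send every segment to a human transcriber, while a high-level overview could accept more of the ASR output as-is.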

With a strategic approach to ASR use, UX researchers can provide insights to improve the reach and usability of new products in global markets. This enables businesses to build user-centric products that resonate with diverse audiences worldwide – a win for everyone.

Global research needs human experts

Speech technology is a fundamental part of global UX research. But it’s only one piece of the puzzle.

Human researchers do the work of interviewing participants in multiple languages, interpreting their insights, and turning them into actionable recommendations. What’s more, they’re invested in staying abreast of new speech technology and its impact on global research.

At Bold Insight, we’re actively engaged in learning from speech technologies as they emerge. We love shiny new tools, and we’re currently discovering how intelligent tools can streamline our research process while upholding our commitment to quality and accuracy. We’ll be presenting our research on global perceptions of AI tools in UX research on September 28th at the UX Masterclass in Zaragoza, Spain.

If you’d like to learn more about our global research expertise, let’s start a conversation! We’d love to hear from you.