Your team is eager to launch AI-powered products, but is your research strategy ready? The inherent unpredictability of large language models (LLMs) means that conventional UX methods, built on the premise of predictable product experiences, break down when applied to systems that can generate an effectively infinite range of variable outputs.
At EPIC2025, Bold Insight Director Larry Becker and Katie Johnson, Global Head of Research & Consumer Insights at Panasonic Well, presented their case study, "From chaos to innovation: Understanding products and people in a non-deterministic world." Co-authored with other Bold Insight team members, the paper illustrates this critical need to reinvent UX research methods to safely and effectively develop LLM-based products.
When variability is a feature, not a bug, it introduces risk and demands new approaches
We are accustomed to judging product success by consistent, predictable behavior. Traditional small-sample usability studies yield high-value insights because the product remains the same for every user.
LLMs shatter this paradigm. An LLM’s response to the same prompt can differ significantly between two users, and even when a single user asks the same question again. With LLM-based products, each interaction varies not just between users but within individual user experiences.
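To make that stochasticity concrete, here is a minimal sketch using the OpenAI Python SDK purely as an example; the study is vendor-agnostic, and the model name here is an assumption. With a nonzero temperature, the model samples from its output distribution, so the identical prompt typically yields three different answers:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Suggest a weeknight dinner for a family of four."

# Ask the identical question three times; with temperature > 0 the model
# samples from its output distribution, so each answer can differ.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not from the study
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
    )
    print(response.choices[0].message.content[:120], "...")
```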
This stochastic nature collides with the accelerated pace of AI development. Product teams feel the pressure to “Just Ship It,” and researchers are often seen as speed bumps. Compounding the challenge are new tools, like synthetic users and AI analysis of raw data, which tempt product teams with speed but introduce risks like hallucinations, flattened identity models, and abstraction that diminishes lived experience.
Recommendation: To succeed, researchers must understand when and how to invest the time to produce robust results and when to resist the anxious call for speed.
A single bot misstep can do irreparable damage
The Panasonic Well and Bold Insight study focused on building an AI-powered product for family care and coordination. We knew that while all LLM products are affected by unpredictability, the risk compounds in high-stakes areas like family wellness. Even without AI, families are complex, unpredictable systems shaped by social roles, care responsibilities, and emotional asymmetries. Inserting an LLM into this high-stakes, multi-user setting requires time and care.
Family dynamics typically reveal themselves gradually rather than all at once. LLMs require time to understand a family’s language and system, and they depend on humans to guide them through this process.
Recommendation: Resist the urge to generalize learnings from professional/transactional LLM use cases to personal/relational ones.
3 reasons we chose to simulate instead of prototype
With just eight weeks for a critical study, the team had a choice: build a basic functional prototype or use a simulation to capture the intended use more closely. In this case, the simulation was a researcher filtering real-time LLM responses to personalize and curate them for the family. The team chose to simulate rather than prototype for three key reasons that mapped directly to the planned product's core features.
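For readers who want the mechanics, the filtering loop looks roughly like the sketch below. All function names are hypothetical and the LLM call is stubbed out; the paper does not publish an implementation. The essential move is that a human moderator sits between the model and the family's shared hub:

```python
# A minimal sketch of the researcher-in-the-loop filtering described above.

def draft_llm_reply(message: str) -> str:
    """Stand-in for a real LLM call; returns a draft response."""
    return f"[LLM draft] Here's a thought on: {message}"

def researcher_review(draft: str) -> str | None:
    """A human moderator edits, approves, or withholds the draft."""
    print(f"\nDraft for review:\n  {draft}")
    edited = input("Edit (Enter to approve, 'skip' to withhold): ")
    if edited.strip().lower() == "skip":
        return None          # too risky to send; the bot stays silent
    return edited or draft   # send the edit, or the draft as-is

def handle_family_message(message: str) -> None:
    reply = researcher_review(draft_llm_reply(message))
    if reply is not None:
        print(f"Posted to hub: {reply}")

handle_family_message("Can you plan Grandma's birthday weekend?")
```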
The digital hub needs to support groups, not just individuals
The planned product would use a single digital hub where family members would collaboratively interact with the AI assistant. At the time of the study, no existing collaboration platform (like Slack or Discord) was built for personal/relational use and had an integrated, omnipresent LLM assistant. Simulating this environment was necessary to observe and respond to the friction between established communication norms and the new AI presence.
Relationships are delicate and take time to earn
We hypothesized that relationships would begin to form between the families and the simulated LLM assistant and that these relationships would take time (multiple satisfying interactions) to move from trial to committed engagement. We also recognized that even a single misstep on the part of the (inherently unpredictable) bot could irreparably damage these relationships.
Inviting humans in requires precision
The product was designed to offer human support when the LLM reached its boundaries. We needed to learn:
- How should the human be introduced?
- What kind of human should be introduced?
- How much time would the human need to create an impact?
All this pointed to a decision to simulate the bot's presence in our digital hub over a series of interactions, giving relationships time to form and useful patterns time to emerge. The simulation let us deliberately shape the AI's persona, build relationships over time, and test high-stakes interactions, including the critical boundary between AI and human support, before committing design and engineering resources.
Recommendation: When you’re building products that introduce new paradigms, simulation often makes sense: If your product relies on building trust, learning complex user group dynamics, or defining the critical handoff from machine to human, a well-designed simulation may be superior to a fragile prototype.
Simulation delivered lasting impact
Our chosen method paid off, delivering three key insights that became the foundation for the product’s successful launch:
Digital hubs struggle to support groups:
We learned that existing digital communication systems are rigid and ill-equipped to scale across time and topics for a group. This insight led directly to Panasonic Well securing multiple engineering patents and redesigning the product's digital backbone.
Relationships are earned:
Building relationships in a multiplayer environment requires clear mental models. This finding continues to guide Panasonic Well and Bold Insight's exploration of 1:1 vs. multiplayer interactions.
Inviting humans in is essential:
In domains like wellness, AI cannot and should not do everything. Identifying the need for a human and escalating to the right one became a core differentiator for the product.
UX research needs a new playbook that we will evolve together
The world is just beginning to use LLMs regularly in personal and relational contexts. Short-changing research and simply “finding out by doing” has already led to costly, public failures in trust and safety.
Recommendations: As a community of researchers, we must:
Develop new longitudinal methods: Re-examine our recipes. Leverage diary studies, intercepts, interviews, simulations, chat logs, and new methods to understand how people interact with LLMs over time. We must not shy away from well-designed simulations when studying how to build ethical and safe stochastic products.
Include improvisation in our research craft
- Craft flexible practices for studying and defining products that are different for each user and use case.
- Adopt a willingness to experiment across all phases of research.
- As Panasonic Well and Bold Insight's research collaboration moves forward, we continue to refine our core simulation method to adapt to evolving product and user needs.
Invent the future with product teams
- Use research not just to study what is, but to define what will be.
- Think past basic LLM chatbots; study multiplayer interaction, proactivity, and relationship-building now so the insights are ready when the technology arrives.
The collaboration between Panasonic Well and Bold Insight was fast-paced but deliberately thoughtful. At every step, we considered the unpredictability of LLMs and the high stakes of family wellness. We made research design decisions that would satisfy the product’s needs while always putting users’ well-being first. We are excited to be inventing the future of research, and we invite you to join us!