What our research partnership with Google revealed about evaluating AI-powered tools and the questions every product team should be asking. The following is based on peer-reviewed research, co-authored with Google researchers, and published in Frontiers in Artificial Intelligence.
You’ve built something impressive. An AI-powered platform that adapts, responds, and personalizes. The demo lands well. The stakeholders are excited. But somewhere in the back of your mind, a question lingers: How do you know if your platform delivers real value for its end-users?
It’s a question every product team should be sitting with and one that Google tackled head-on through our collaborative research partnership.
The challenge
Google Research recently developed Learn Your Way, an experimental platform that leverages generative AI to transform traditional textbook content into dynamic, multimodal learning experiences, including slides, audio lessons, mind maps, narrated videos, and immersive text with embedded quizzes. Designed with students in mind, Learn Your Way also lets users personalize educational materials by grade level and interest, providing a truly adaptive learning experience.
Although the platform appeared promising, Google still needed to answer one deceptively challenging question: Does Learn Your Way actually improve student learning and performance compared to traditional learning methods?
What rigorous really looks like
To ensure a confident product launch, Google partnered with us to execute a rigorous mixed-methods study. Together, we ran an experiment with 60 US-based high school students, splitting them into two groups to learn an assigned textbook chapter using either the Learn Your Way platform or a web-based PDF reader.
Together with Google Research, we developed the study design. Our research team managed recruitment, data collection, and analysis, and supported the write-up of the study results. We conducted 90-minute in-person sessions incorporating a mix of surveys, learning assessments, and moderated interviews. Our team also followed up with students several days later to re-test their knowledge of the educational material.
With this mixed-methods approach, we not only compared immediate student performance using Learn Your Way against a standard digital textbook but also assessed how well students retained the material days later. We further measured experience quality: how did students feel about the learning platform?
That distinction between performance and experience matters more than it might seem. An AI tool can produce technically correct outputs and still fail its users. It might deliver an immediate boost that doesn’t last. Or students might complete a task and still feel lost, frustrated, or disengaged. Together, these outcomes and feelings shape whether a tool meaningfully changes behavior over time.
What the numbers tell us
The findings were clear. Students who used Learn Your Way scored significantly higher on both an immediate recall assessment and a follow-up test administered three to seven days later. These outcomes demonstrate a performance advantage that holds steady over time, adding confidence that the effects of Google’s AI-driven learning tool are real. On the experience side, the gaps were even more striking: Participants were significantly more likely to say that the Learn Your Way platform made them feel comfortable taking an assessment, that it was more effective than the tools they already use, and that they would recommend it to other students.
A deeper story beyond the data
Our quantitative data validates the performance benefits of Learn Your Way, showing us what is possible. But the story isn’t complete without the why.
In post-assessment interviews, students using Learn Your Way described something beyond just performing better. They said they felt in control of their own learning. One student framed it this way: being able to move between different learning modalities meant that if one wasn’t clicking for them, they could find a format that did. Another said the chunked structure of the content made the material feel manageable rather than overwhelming. Insights like these would be invisible in a test score or survey metric; they revealed themselves only through direct conversation.
Meanwhile, students in the digital textbook condition expressed wanting the exact features Learn Your Way already offers, without even knowing the platform existed. What resonated with students wasn’t the AI itself. It was the design decisions behind it: content that adapted dynamically rather than static text, multiple learning formats to choose from, quizzes that let students test themselves as they learned, and feedback that told them where they stood. Most importantly, it was genuine autonomy and choice in how to engage with the material.
That last one is the insight worth carrying into your own work. The students who thrived weren’t merely the passive recipients of a smarter tool. They were learners who felt like they had agency. The platform gave them meaningful ways to direct their own experience, and that made all the difference.
Key considerations for every AI product team
If you’re building or evaluating an AI-powered platform, here’s what our research suggests you need to ask:
Are you measuring the right things? Engagement metrics and satisfaction scores are easy to collect, but do they reflect what actually matters? Knowing whether your tool changes behavior, improves long-term outcomes, or genuinely serves users’ needs requires different measures. Define success before you start, not after.
Do you know the “why” behind the outcomes? The most actionable findings from our mixed-methods study came from what students said, not just from their scores. Qualitative research isn’t merely a “nice-to-have.” It’s how you get from “it worked” to “here’s what to focus on next.”
Does your platform give users real agency? Personalization and adaptability are table stakes now. The deeper question is whether your tool makes users feel in control of their experience or just along for the ride. That distinction drives adoption, retention, and real-world impact.
Are you testing with the right people, in the right conditions? AI platforms behave differently across contexts, use cases, and even user groups. Research that doesn’t reflect your actual or intended user base won’t give you answers you can confidently act on.
Are you bringing in expertise early enough? Research that informs design decisions is far more valuable than research that audits them after the fact. The earlier a human factors lens enters the process, the more your findings can meaningfully shape the product, not just validate it.
What this means for your next project
The pace of AI development isn’t slowing down. But speed without validation is a risk, not just to product quality, but to user trust. The companies that will get this right are the ones asking hard questions early and building the evidence base to answer them.
That’s exactly where we come in. If you’re developing or assessing an AI-powered platform and want to move forward with confidence, talk to us about the right research approach for your project. Reach out!
Access the full paper
Read the open-access resource, An experimental evaluation of an AI-powered interactive learning platform, published in Frontiers in Artificial Intelligence. Co-authored by Bold Insight Partner Scott Siebert, Senior UX Researcher Lucy Tootill, and UX Researcher Nicole Miller in collaboration with Google researchers, the study tested the effectiveness of Google’s AI-powered platform, Learn Your Way.

