A stranded motorist uses an IVR system to request roadside assistance

Do Your IVR And Chatbot Experiences Empower Your Customers?

In the recent webinar, Superior Self-Service: Deliver Better IVR & Chatbot Experiences, three members of the Applause team who interact with customers in different capacities shared recommendations on assessing IVR and chatbot experiences. Senior Solutions Engineer Chris Doucet, Senior UX Researcher Miriam Ross-Hirsch, and Associate Director of Solution Delivery Simeon Olsberg fielded questions from Global Content Marketing Manager David Carty. Read on to learn how to develop a plan to assess your IVR and chatbot experiences.

Customer journeys in IVR and chatbot interactions

One of the first steps in assessing self-service systems like IVR and chatbots is understanding the journeys that lead users to reach out through these channels. Organizations invest in IVR and chat systems to help users quickly and effectively resolve issues and to keep customers engaged with the business. To do that, it’s vital to map customer journeys and identify the key points where testing and improvement are most critical.

Chris Doucet pointed out that when customers turn to self-support tools, they’ve typically encountered a problem that they can’t solve on their own. “They will likely be feeling a mix of frustration, urgency, and agitation when they reach out. They will be seeking a swift path to a clear resolution,” he said. “Upfront empathy will go a long way for companies who are evaluating and designing IVR and chatbot systems.” 

Doucet outlined some characteristics of well-designed systems. Good systems do the following: 

  • offer intuitive menus
  • respond to real customer needs
  • use clear language
  • resolve common issues quickly
  • escalate to a human representative when it’s appropriate 

“In other words, they make you feel valuable, that you’re understood, and that you’re in control,” Doucet said.

Ross-Hirsch added that “It’s essential that these flows are smooth and foolproof so that customers can really complete the tasks that are designed for self-service without having to go to an agent or another channel.” She emphasized the importance of observing users and testers in context to see how they engage with the system and where it breaks down.

Next, the team outlined some common goals and metrics for IVR and chatbot systems: routing accuracy, disconnects, CSAT scores, deflection rate, containment rate, and cost of service.

In her role as a UX researcher, Ross-Hirsch said she often encounters routing accuracy as an issue. “Users often aren’t saying the right trigger words. They don’t know what words to say to get them to where they need to go,” she said, or the information provided in menus doesn’t help them identify the option that they’re looking for. “So they’ll go to an agent, and they won’t be able to self-service.” To avoid that, she said, it’s essential for organizations to make sure their testing covers all the utterances and considerations of their specific products.
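One lightweight way to cover those utterances is to maintain a table of phrasings per intent and verify that each one routes where it should. The sketch below is purely illustrative: route_utterance stands in for whatever intent-routing entry point the IVR or chatbot platform exposes, and the intents and phrasings are hypothetical.

```python
# Illustrative routing-accuracy check. route_utterance() is a placeholder for the
# system under test's intent-routing entry point; intents and phrasings are hypothetical.
ROUTING_CASES = {
    "report_lost_card": ["I lost my card", "my card is missing", "someone stole my debit card"],
    "check_balance": ["what's my balance", "how much money do I have", "account balance please"],
    "roadside_assistance": ["my car broke down", "I have a flat tire", "I need a tow"],
}

def find_misroutes(route_utterance):
    """Return (utterance, expected, actual) for every phrasing that routed incorrectly."""
    misroutes = []
    for expected_intent, phrasings in ROUTING_CASES.items():
        for utterance in phrasings:
            actual_intent = route_utterance(utterance)
            if actual_intent != expected_intent:
                misroutes.append((utterance, expected_intent, actual_intent))
    return misroutes
```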

In addition to routing accuracy, Doucet called out the importance of containment and disconnects. “Beyond minimizing escalations, [containment] is also about providing a seamless and satisfying self-service experience to resolve the issue.” He pointed out that while disconnects can be technical issues, they’re also lost opportunities. “Each disconnect represents a customer who might be giving up and going elsewhere. And we need to dig into why that’s happening. Is it a confusing menu? Is it a lengthy wait time, a system error? Understanding the root cause is key there.”

Ross-Hirsch explained that often, these friction points highlight where UX research is needed. “We’re often coming in and looking at points of low containment and disconnects without task completion to dig in for further research to understand, why is that happening? Where are the potential problems? Are they user-related issues? Is it a UX piece? Or my colleagues will identify if it’s functional or another root cause. And we’ll dig in on further research to really understand what are those problems and where are those solutions.” 
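The containment, deflection, and disconnect figures discussed above are typically derived from interaction logs. As a rough sketch of how those rates might be computed (the record fields and exact definitions here are assumptions, since they vary by organization and were not specified in the webinar):

```python
# Hypothetical interaction records; field names and metric definitions are illustrative.
sessions = [
    {"resolved_in_channel": True,  "transferred_to_agent": False, "disconnected": False},
    {"resolved_in_channel": False, "transferred_to_agent": True,  "disconnected": False},
    {"resolved_in_channel": False, "transferred_to_agent": False, "disconnected": True},
    {"resolved_in_channel": True,  "transferred_to_agent": False, "disconnected": False},
]

total = len(sessions)

# Containment: share of sessions resolved entirely within the self-service channel.
containment_rate = sum(s["resolved_in_channel"] for s in sessions) / total

# Deflection: share of sessions that never reached a human agent.
deflection_rate = sum(not s["transferred_to_agent"] for s in sessions) / total

# Disconnects: share of sessions abandoned before resolution or transfer.
disconnect_rate = sum(s["disconnected"] for s in sessions) / total

print(f"Containment {containment_rate:.0%}, deflection {deflection_rate:.0%}, disconnects {disconnect_rate:.0%}")
```

Low containment or a spike in disconnects computed this way is the starting point for the kind of follow-up research Ross-Hirsch describes, not a verdict in itself.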

Defining the scope of the project

Carty asked how organizations should approach developing a plan to assess functionality, usability, and trustworthiness across their customer interactions. He invited the presenters to elaborate on how Applause helps clients define their testing scope within these parameters, even if they’re just coming to us with a general problem or idea of what they want to do.

“That’s a big part of what I do in my role, is supporting our clients in identifying the right research and how to best get them to their goals,” Ross-Hirsch said. While some teams already have this defined and simply want Applause to execute a UX study on their behalf, others don’t have a defined plan. “They say, we have a problem or we have an area that we need to dig in on.” She emphasized that each project or engagement may support different goals. Some examples:

  • determine what’s causing a low containment rate in a particular area
  • assess how a new system compares with a previous one
  • learn how competitors are handling a tricky interaction
  • uncover functional issues as well as UX friction in a specific flow
  • reduce risk in an AI-based system with red teaming

Building a plan to assess IVR and chatbot systems

Once the goals have been defined, it’s time to put together a comprehensive testing plan for chatbots and IVR systems that incorporates different testing and feedback strategies and accounts for real-world use cases.

Doucet said, “When it comes to testing, we employ a variety of strategies as part of a multipronged approach. This includes creating thoughtfully designed test cases, regularly collecting UX feedback, and continuous monitoring. And as AI becomes the norm, we now recommend additional strategies, such as prompt response evaluation for accuracy and relevance, as well as red teaming with individuals who can really evaluate these systems… so that models can be updated, if needed, prior to public release.”
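As a concrete illustration of prompt response evaluation, the sketch below runs a small prompt suite against a chatbot and checks whether each response covers a set of expected facts. Everything here is an assumption for illustration: ask_bot stands in for whatever interface the chatbot under test exposes, and the prompts and expected facts are invented.

```python
# Minimal prompt-response evaluation sketch. ask_bot() is a placeholder for the
# chatbot interface under test; prompts and expected facts are invented examples.
PROMPT_SUITE = [
    {"prompt": "What are your roadside assistance hours?", "must_mention": ["24/7"]},
    {"prompt": "How do I reset my password?", "must_mention": ["reset link", "email"]},
]

def evaluate_prompts(ask_bot):
    results = []
    for case in PROMPT_SUITE:
        response = ask_bot(case["prompt"]).lower()
        # Crude accuracy/relevance proxy: did the response mention the expected facts?
        missing = [fact for fact in case["must_mention"] if fact.lower() not in response]
        results.append({
            "prompt": case["prompt"],
            "coverage": 1 - len(missing) / len(case["must_mention"]),
            "missing": missing,
        })
    return results
```

Keyword matching is only a stand-in here; in practice, human graders or a rubric-based evaluation (as in the car rental example later in this article) score accuracy and relevance more reliably. The value is in keeping a repeatable suite that can be rerun whenever the model is updated.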

Ross-Hirsch covered different types of feedback mechanisms, including techniques such as in-conversation feedback, post-interaction surveys, user analytics, sentiment analysis, and audio or text capture. “We have so many ways to collect feedback. We bring users through IVR and chatbot experiences to provide in-context insights on points of frustration and opportunities for improvement. That’s the crux of the UX work that we’re doing on my team.”

Next, Carty pointed out that while feedback loops and mechanisms are not new, they take on a new degree of importance when AI represents a brand in a more direct way. He encouraged teams to consider which variables they could test or evaluate in-house under real-world conditions and which scenarios they can’t replicate. “Lab environments and synthetic data really only go so far compared to the authenticity and validation you get from real-world testing,” he said. He also emphasized the importance of making sure that self-service options are broadly available to customers who may have varying degrees of access to or comfort with technology. For example, in North America, millions of people in rural areas lack access to fixed broadband service. There are also temporary internet outages due to storms, accidents, and other unexpected events, so organizations need to offer multiple service channels.

Traditional versus AI-powered experiences

Next, the presenters described some of the key differences to consider when testing traditional versus AI-powered chat and IVR tools. Doucet explained that traditional chatbots and IVRs built on a foundation of predefined rules and scripted responses work for simple queries. However, he said, “It’s impossible to really script that whole range of what real users might say, especially the wide swath of users that any individual product might support. So these linear, menu-driven scripts often fall short.”

“But AI changed all that. You have ML and natural language understanding taking place and becoming commonplace. And those AI-powered chatbots and IVR systems have greatly increased their grasp of both the context and the intent when a customer is asking for something… They are more flexible and conversational, and they provide a higher level of service. So as these tools have evolved, so, too, have the scopes and the test plans for validating the readiness of these tools. So testing a traditional rule-based system is very different from testing an AI that’s constantly learning and adapting,” Doucet said.

While evaluating traditional IVR and chatbot systems mainly involves assessing the different flows and rules of the system, testing AI must account for the nuances of natural language understanding (NLU), the potential for hallucinations, and the ethical considerations of AI-driven interactions.

Ross-Hirsch noted that the more natural experiences AI-driven tools deliver are changing user expectations for how systems should behave. “In our research, for example, I’m finding increased frustration in our users with traditional and even hybrid IVRs and chatbots because there’s this growing expectation of more natural language processing. And so there’s these really interesting evolutions that are happening from a customer expectation perspective that everybody needs to work towards, no matter what type of chatbot or IVR you have.”

The experts agreed that regardless of the type of system, it’s important to ensure that it accommodates the different demographic and geographic profiles the organization serves, in terms of accents, slang, and contextual activities. The system must also recall historical conversations and integrate with other back-end systems seamlessly to help streamline service.

Executing the plan to uncover friction 

Olsberg described some of the ways Applause teams work through the process of evaluating customer IVRs and chatbots. “Friction can happen at any point along the customer journey. We work really hard to identify what the key areas of focus are for the customer and then help our teams or respondents to work through the IVR or the chatbot, whatever that focus is, to identify not only the UX friction points, but also any technical failures.”  

Other possible issues beyond UX and functional errors include accessibility, security, payment, and localization. Olsberg shared a common example: “We’ve worked with some of our customers where they have an English and a Spanish IVR. And very often, when somebody is navigating through their Spanish IVR, they’re hitting English responses. And that’s obviously not ideal for someone who really is passionate about providing the best customer experience.”

“A critical piece here with a lot of this is that we have the ability to run really targeted recruits, especially from the UX perspective,” said Ross-Hirsch. “So for example, if we’re looking for a particular bank’s chatbot or IVR, we can recruit actual customers of that bank or of competitor banks to really make sure that we’re testing it in context with people who are going to actually be using that system. Or maybe individuals who have taken certain payment actions in the past. We can really get specific here to really make sure that we get targeted, useful feedback.”

Capturing actionable insight

In order for organizations to improve their IVR and chatbot experiences, it’s crucial to pinpoint the exact issues users encounter. For functional issues, detailed reports that include clear, reproducible steps, as well as pictures or recordings of problems, can ease identification and triage. 

Olsberg walked through an example of a report for a structured test case, illustrating where a respondent encountered an issue. “There’s an expected result, and they haven’t hit that expected result. We have a sophisticated platform that the customer does have access to that will provide those steps to reproduce, whether one or more respondents have found that same issue on the same platform or on different platforms (maybe it’s reproducible on both iOS and Android), and video or screenshots, if it’s something that is visual, like a chatbot experience versus an IVR journey. But we are able to provide a lot of that detail that helps development teams remediate the issue very quickly.”

Though UX reporting often takes a more variable or less structured format, Ross-Hirsch explained that all reports call out clear data and prioritized issues. “It all really varies, but focuses around, what did we find? What are our insights? And how should our client improve?”

Real-world scenarios 

Olsberg and Ross-Hirsch shared examples of some of Applause’s client engagements, starting with a two-part study of a traditional IVR system for a bank. “This was a bilingual IVR, English and Spanish. There was the functional aspect for the flow and friction points, but there was also a competitive aspect to this. This bank wants to be best in class with their IVR and really make it as easy as possible for their customers to engage with them when something does go wrong,” Olsberg said. 

“We worked with the bank very closely in terms of developing the ideal customer profile that they were looking for, what kind of accounts or bank products these respondents would need to have, and then sourcing those respondents so that we could meet the bank’s needs. We created the research plan, reviewed it with the customer, had that approved, and, once that was ready, went into the questionnaire, dug very, very deep with not only the bank’s own IVR, but with the competitive IVRs to get a really broad picture and allow the customer to see what best in class can look like.”

The next example highlighted AI-powered IVR testing, including functional and adversarial prompt testing. Applause assessed a pharmacy system’s ability to handle requests and recognize various speech patterns, and simulated real-world threats to identify potential security risks, including impersonating a physician calling in a prescription and attempting to access confidential information and PII. The red teaming exercises helped validate the system’s security.

Olsberg highlighted that in many industries, privacy of the individual is paramount, and self-service channels must protect that privacy. “To ensure that there is trust with that system and with that provider, you really do want to ensure that you’ve tested it thoroughly, not only the happy path, with test cases that everybody can do, but really that more adversarial approach… to try and see if you can get the IVR to respond in a way that’s inappropriate to that particular environment.” 

The next two examples demonstrated how Applause has helped companies with their chatbot testing. One financial services company wanted to understand customer expectations for chatbots in their industry. A study with customers of different brands gave the company insights into how their chatbot compared and whether they should invest in AI. 

“A number of years ago, for this client, we were doing a similar type of research, but around SMS text messaging, looking at the competitive landscape around text messages and what people expect in the financial services industry,” Ross-Hirsch said. “In this most recent study, we focused a lot on how people are thinking about chatbots and about the larger idea of self-service in the financial space. Where should our client be heading? And what tools should they be developing as they’re starting to think about what AI should come in?”

“At Applause, we don’t have to just be testing an already existing tool. We can do discovery work to really dig in on what you should build. Where is the opportunity? Where should you be digging in?” Ross-Hirsch said. 

For a car rental service, Applause evaluated whether an AI chatbot could handle common customer questions accurately and clearly. Testers provided various prompts and graded the responses. Ross-Hirsch explained that in this case, participants graded the responses against a rubric assessing characteristics like accuracy, relevance, clarity, and completeness. The company was able to see where the model needed additional training. “This testing really started with much more basic chatbot testing, more basic interactions, and then helped them iterate, helped give them information as they grew their tool and followed their evolution to more sophisticated chatbots,” she said.
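Aggregating rubric grades is straightforward once testers have scored each response. In the sketch below, the 1-to-5 scale and the passing threshold are assumptions for illustration, not values from the engagement:

```python
from statistics import mean

# Rubric dimensions from the example; the 1-5 scale and threshold are illustrative assumptions.
RUBRIC = ("accuracy", "relevance", "clarity", "completeness")
PASS_THRESHOLD = 4.0

def aggregate_scores(graded_responses):
    """graded_responses: list of dicts mapping each rubric dimension to a 1-5 grade."""
    report = {dim: mean(r[dim] for r in graded_responses) for dim in RUBRIC}
    # Dimensions scoring below threshold flag where the model may need more training.
    report["needs_more_training"] = [d for d in RUBRIC if report[d] < PASS_THRESHOLD]
    return report

# Example: two testers grading the same chatbot response.
print(aggregate_scores([
    {"accuracy": 5, "relevance": 4, "clarity": 3, "completeness": 4},
    {"accuracy": 4, "relevance": 5, "clarity": 3, "completeness": 5},
]))
```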

Key takeaways

The webinar concluded with a recap of crucial points. First, be very clear about what aspects of your IVR or chatbot you want to evaluate. Second, always test in real-world conditions using testers that match your actual customer profiles. This approach really helps companies ensure they’re getting feedback that truly reflects their users’ experiences and perspectives. Finally, make sure you can take action on that feedback. The goal is not just to find problems, but to get clear next steps on how to fix them.

“I think it’s really important to understand you don’t have to do it all at once,” Olsberg said. “You can take it step by step. You can start off small and grow bigger or more comprehensive. We’ve worked with a number of customers who really don’t have the budget or the mental and physical capacity to deal with what we’re finding all at once. So we break it up into smaller portions that make it easy to find the issues, fix the issues, roll them out, retest the issues, make sure that they’re covered, and then move on to the next section.”

Watch the webinar on demand to learn more about how Applause can help you validate and improve your IVR and chatbot systems.

Webinar

Superior Self-Service: Deliver Better IVR & Chatbot Experiences

Learn how to create seamless IVR and chatbot experiences for your customers — whether you’re using traditional systems or AI-powered options.

Published: May 12, 2025
Reading Time: 18 min
