Multimodal Will Change How We View Voice

Demand for voice is surging as consumers increasingly find the medium a more natural way of experiencing products and services in their daily lives. Consumers are now completing common tasks like getting news updates, checking upcoming appointments, and placing takeout orders via voice. As they rise in popularity, voice experiences have the potential to drive increased engagement, loyalty, and even purchase behavior.

Even with voice becoming a part of consumers’ typical routines, many interactions can be improved by the addition of visuals. That is why multimodal experiences are becoming so important in connecting companies and consumers.

Multimodal experiences combine multiple interfaces (e.g., visual and voice) to provide a more holistic and natural interaction. While devices like the Amazon Echo Show and Google Home Hub immediately spring to mind, multimodal experiences have been around for far longer, and a device doesn’t necessarily need a screen to be considered multimodal. A visual feature like Alexa’s LED ring, for example, can be an effective way to communicate whether a device is listening, processing, or offline. In other cases, adding visual elements such as images, videos, or LED signals can convey information more succinctly, or advance the interaction more efficiently, than communicating solely via voice.

There are plenty of benefits that come with delivering multimodal experiences. Think about something as simple as checking the weather: it is far more efficient to speak the current day’s forecast and display graphics showing the entire week than to read through all the details for each day individually. In fact, most voice apps would benefit from a visual interface.

A recent survey from Walker Sands revealed that nearly half of consumers (49%) are not willing to buy luxury items or food and groceries using a voice assistant. Forty-seven percent of respondents similarly ruled out buying furniture through voice-only experiences. Additional research from comScore helps to explain why consumers may be unwilling to buy these types of products through voice assistants. While security remains a major deterrent to voice commerce, other areas of concern, such as the inability to view product details or compare products, can be addressed by adding a visual element. Even the potential for misheard or incorrect orders can be minimized with a graphical interface, because shoppers could quickly review orders before completing their purchase.

Though multimodal experiences offer their fair share of benefits to brands, their design presents another layer of complexity – particularly as it relates to testing. Multimodal experiences need to be consistent across devices even when moving between totally disparate environments, such as going from a smart speaker to a car’s infotainment system. In-lab testing simply isn’t robust enough to cover all the bases. Moreover, while testing websites or mobile apps is fairly standardized, there is much more variability with voice experiences. The only way to put these experiences through their paces is by testing in a wide range of real-world scenarios, including different combinations of users, devices, voices, languages, and dialects – all on a global scale.
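To get a feel for how quickly those combinations multiply, here is a minimal sketch in Python that enumerates test scenarios from a few dimensions. The device types, locales, and voice profiles are hypothetical placeholders chosen for illustration, not a recommended coverage list.

```python
from itertools import product

# Hypothetical dimensions for illustration; swap in the values that
# matter for your own product and markets.
DEVICES = ["smart_speaker", "smart_display", "car_infotainment"]
LOCALES = ["en-US", "en-GB", "es-MX"]
VOICE_PROFILES = ["adult_female", "adult_male", "child"]


def build_test_matrix(devices, locales, voice_profiles):
    """Return one test scenario per combination of the given dimensions."""
    return [
        {"device": device, "locale": locale, "voice_profile": profile}
        for device, locale, profile in product(devices, locales, voice_profiles)
    ]


matrix = build_test_matrix(DEVICES, LOCALES, VOICE_PROFILES)
print(len(matrix))  # 3 x 3 x 3 = 27 scenarios
```

Even three small dimensions produce 27 scenarios, and each added dimension (such as background noise or network conditions) multiplies that count again, which is part of why lab-only coverage tends to fall short.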

As more brands see the value of multimodal experiences and start designing for them, quality will become the differentiator between the brands that win and those that lose. Learn more about the key design concerns for multimodal experiences in our new eBook.

Emerson Sklar
Tech Evangelist and Solution Architect
Published On: April 4, 2019
Reading Time: 3 min
