Multimodal Will Change How We View Voice

Emerson Sklar

The combination of visual and auditory cues will enhance brand and consumer interactions.

Demand for voice is surging as consumers increasingly find the medium a more natural way of experiencing products and services in their daily lives. Consumers are now completing common tasks like getting news updates, checking upcoming appointments, and placing takeout orders via voice. As they rise in popularity, voice experiences have the potential to drive increased engagement, loyalty, and even purchase behavior.

Even with voice becoming a part of consumers’ typical routines, many interactions can be improved by the addition of visuals. That is why multimodal experiences are becoming so important in connecting companies and consumers.

Multimodal experiences combine multiple interfaces (e.g., visual and voice) to provide a more holistic and natural interaction. While devices like the Amazon Echo Show and Google Home Hub immediately spring to mind, multimodal experiences have actually been around for far longer. A device doesn’t necessarily need a screen to be considered multimodal. A visual feature like Alexa’s LED ring, for example, can be an effective way to communicate whether a device is listening, processing, or offline. In other cases, visual elements such as images, videos, or LED signals can convey information more succinctly, or advance the interaction more efficiently, than voice alone.

Delivering multimodal experiences brings plenty of benefits. Consider something as simple as checking the weather: it is far more efficient to state the current day’s forecast aloud and display graphics showing the rest of the week than to read through all the details for each day individually. In fact, most voice apps would benefit from a visual interface.

A recent survey from Walker Sands revealed that nearly half of consumers (49%) are not willing to buy luxury items or food and groceries using a voice assistant. Forty-seven percent of respondents similarly ruled out buying furniture through voice-only experiences. Additional research from comScore helps explain why consumers may be unwilling to buy these types of products through voice assistants. While security remains a major deterrent to voice commerce, other concerns, such as the inability to view product details or compare products, are easily addressed by adding a visual element. Even the potential for misheard or incorrect orders can be minimized with a graphical interface, because shoppers can quickly review orders before completing their purchase.

Though multimodal experiences offer their fair share of benefits to brands, their design presents another layer of complexity – particularly as it relates to testing. Multimodal experiences need to be consistent across devices even when moving between totally disparate environments, such as going from a smart speaker to a car’s infotainment system. In-lab testing simply isn’t robust enough to cover all the bases. Moreover, while testing websites or mobile apps is fairly standardized, there is much more variability with voice experiences. The only way to put these experiences through their paces is by testing in a wide range of real-world scenarios, including different combinations of users, devices, voices, languages, and dialects – all on a global scale.

As more brands see the value of multimodal experiences and start designing for them, quality will become the differentiator between the brands that win and those that lose. Learn more about the key design concerns for multimodal experiences in our new eBook.

How to Seize the Multimodal Opportunity

Ebook

Learn the key design concerns when building multimodal experiences that connect with customers via visuals and voice.
