How Multimodal Can Jumpstart Voice in Retail

Emerson Sklar

Seeing is believing in the world of vTail.

Voice-driven shopping is powerful, but it understandably comes with some skepticism on the consumer side. Yes, we have grown accustomed to a variety of digital retail methods – such as social commerce and mobile app- and web-driven commerce – but in every scenario, consumers are able to see the entire transaction in front of their eyes. Voice removes that visual element, and for many, that causes a measurable level of uncertainty and discomfort.

That said, 22% of Americans who own a smart speaker have actually made a purchase with it, according to Edison Research. Sure, this is a small fraction of the entire market, but there is clearly traction to the concept. While there are several barriers impacting the adoption of voice shopping, the one that can make an immediate difference is the integration of a screen into the experience.

Visuals Build Consumer Confidence

Every time consumers make an online purchase, they have one final chance to review their purchase details before confirming the transaction. If nothing else, this provides peace of mind that the right items are in the cart. However, when purchasing through voice, that same visual confirmation of the cart is typically unavailable. We place a lot of trust in technology, but confidence in voice comprehension is something we’re still getting used to.

Adding a visual component to your voice experience can significantly reduce this unease. Not only does it validate your voice commands, but it also retains the frictionless element of voice interactions that we desire in the first place. With this validation, customers gain the confidence to carry out a transaction and, ideally, come back for more.

Visuals Improve Shopping Experience

Consumers have shown they’re willing to buy many items, such as consumables, sight unseen, but when it comes to apparel, it can be a tough sell. Unless consumers know well in advance the specific items they want, they often struggle to know exactly what they’ll be receiving. By bringing a screen into the equation, customers can use a brand’s voice component to filter by style, color, size, and more, but use the screen to browse the catalog and verify their selections.

Consumers feel strongly about this ability to blend technology to form a more complete experience. In a recent survey of the global Applause Community, 69% of those with voice-enabled devices reported they would be more inclined to make a voice purchase through a multimodal experience. Consumers are committing to multimodal displays faster than ever before – there was 558% growth in ownership by U.S. adults from January 2018 to January 2019 – so, as a retailer, you don’t have time to waste in delivering the right experience.

Visuals Spur Purchase Frequency

The time to provide a quality multimodal experience is now. Per Voicebot.ai, smart display owners are 133% more likely to make monthly voice purchases, which suggests that multimodal experiences are more than a novelty. As the growth in smart display ownership continues, the potential to capture voice-driven revenue will follow suit.

While repeat purchases can be confidently made through any voice device, a multimodal experience offers a greater opportunity for incremental sales. By bringing the personalized recommendation feature back into play (as with traditional ecommerce), you enable consumers to see additional products and make snap decisions as they are wont to do.


Voice represents the next generation of retail, much like mobile commerce did many years ago. As a result, voice adoption by retailers is expected to increase by 127% in the coming year, per Salesforce. While it’s exciting to see this level of adoption, success is no guarantee. Heed the advice and behavior of your customers, though, and you will be one step ahead of your competition.

As we explore the impact of voice on the retail landscape, we will next dive into the world of voice search and how voice engine optimization is the most important piece of the puzzle that you didn’t even know existed.
