“It’s not about the technology, it’s about user experience and design.”
The march of the virtual assistant towards global domination is gathering pace. A report by eMarketer said that mass adoption of virtual assistants is slowly becoming a reality. Voice-enabled device usage is forecast to grow by 128.9% year on year in 2017, and 60.5 million Americans are expected to ask a virtual assistant (Alexa, Cortana et al.) to do something just through the power of their voice.
Over the next few years, there will be a significant increase in not only the number of devices available but also the voice-related capabilities of the leading platforms. Companies that have already nailed their colors to the voice-enabled mast will (unsurprisingly) dominate the market, the report said.
Google, Amazon, Microsoft and Samsung either have products available or are working with partners to build devices. Apple’s Siri is widely rumored to be getting a sleek new physical body in the not-too-distant future, which could propel the voice-activated assistant market into the stratosphere.
“Consumers are becoming increasingly comfortable with the technology, which is driving engagement,” said Martín Utreras, eMarketer’s vice president of forecasting, in a blog post. “As prices decrease and functionality increases, consumers are finding more reasons to adopt these devices.”
Voice Interfaces Will Become The Norm
Speech recognition and natural language processing have come on in leaps and bounds, so much so that devices without some form of voice control or interaction will seem, well, dated.
For the moment, Amazon is the clear leader. Around 35.6 million Americans will use a voice-activated assistant device at least once a month in 2017, eMarketer said. And 70.6% of those people will be speaking to a member of the Echo family.
Google Home, which has the advantage of Google Assistant’s superior search capabilities, will account for 23.8% of the market by the end of the year. The remaining 5.6% is likely to be shared among a variety of manufacturers that are starting their own voice-activated journey.
What is beyond question is that in the not-too-distant future, voice user interfaces will become the norm. The big names in 2017 (Alexa, Google Assistant, Microsoft’s Cortana, Siri and Samsung’s butler-sounding Bixby) may be leading the field now, but there is a consensus that the time is right for voice to take center stage.
The challenge will be to provide people with voice-enabled experiences that do more than just relay news, play music or provide weather and traffic information.
According to Jon Myers, CEO of Boston-based Earplay, the mass adoption of voice-activated devices will require simulated conversations and narrative entertainment platforms. And, in Myers’ view, the increased focus on voice-activated skills (arguably kickstarted by Alexa) will validate a market that seems to have come out of nowhere.
“It’s not about the technology, it’s about user experience and design,” said Myers. “A lot of the technology behind speech recognition and natural language processing has been around for quite a while. It’s not as if the tech has taken a sudden leap that has made it so much better and everybody wants it … it’s the fact that a company like Amazon spent a lot of time making sure that the user experience was solid.”
Contextual Conversations Will Increase Adoption
The hype currently surrounding voice-based tech bears out the importance of those twin pillars, user experience and design.
At Google’s I/O 2017 developer conference in Mountain View, voice-activated interfaces were front and center. Likewise, the Amazon Echo now comes in a variety of shapes and sizes, all of which bring voice control to the forefront of device engagement. And let’s not forget that other leading brands are looking to jump on the voice bandwagon.
Myers cites a change in consumer attitudes toward speaking out loud as one reason voice has become such a hot topic. On a psychological level, a person can get the information they want in an intuitive and fluid way, which enhances the overall user experience. For that reason alone, companies should be looking to use contextual conversations as a basis for engagement or fun.
On the flip side, many of the skills or actions available on existing devices do require a certain degree of structure to get the right answer or provide a worthwhile experience. For example, the Earplay skill provides interactive storytelling content, a sort of choose-your-own-adventure that echoes the halcyon days of radio drama. The difference is that the skill lets people play an active role in a dynamic storyline … just by using their voice.
“Some people see us as a game … and that’s cool,” Myers said. “Early on we were trying to work out where we fitted into the app store … are we an audiobook or a game? Eventually we gave up trying to pigeonhole ourselves and said that we were voice-driven interactive audio entertainment. We invented our own medium!”
In some ways, the need for contextual capabilities in voice-activated devices mirrors the trend for chatbots. Real-time conversations with digital assistants are at the top of the list for many brands, so it makes perfect sense that contextual engagement would play a role in expanding the benefits of voice interaction.
“It depends on what the company’s goals are,” Myers said. “If they’d like to enter into the space anew with their own product, then they should be aware that voice experiences are deceptively easy to construct and prototype. What follows is a difficult path and a lot of work from prototype to a high quality experience that will do well upon release. In our experience, the design often requires more time than the engineering.”
People Can Do Cool Things With Their Voice
With that in mind, there are numerous signs that voice-based interactions are about to hit the next level.
Towards the end of 2016, Google opened its “Actions on Google” platform to developers, with the aim of letting people hold a two-way dialog with its virtual assistant. Amazon Web Services has made the AI technology behind Alexa available, through its Amazon Lex service, to anybody who wants to build a conversational interface. And you don’t need a crystal ball to realize that voice can provide value to any brand that wants to “talk” to its customers.
It should come as no surprise that voice-based computing is also on the radar of venture capitalists.
Take Voicecamp. The New York-based accelerator is running an 11-week funded program for eight startups that are building conversational interfaces as part of a voice-controlled ecosystem. The program, which includes Earplay, is intended to bring developers and platforms together, with the overriding aim of making voice-based computing a “natural and frictionless end user experience.”
All of this is good news for developers and for people who see voice as the perfect conduit to merge the worlds of digital and physical experience. Smartphones have ruled the roost when it comes to brand engagement on a device, but they still require people to hold things, press buttons and scroll to get the results they want. Just speaking into thin air is so much easier and (in theory) less time-consuming.
“Technology in the voice space has been regularly improving for several years, but suddenly voice is emerging everywhere,” said Myers. “The primary catalyst is not any particular breakthrough in technology or a new way that we’re processing speech. Instead, it’s the monthly breakthroughs in how we enable people to use their voice to do cool new things.”