During a recent virtual conference, Nithya Thadani opened her talk with an anecdote about her daughter Sonia’s relationship with voice assistants. “We bought a Sonos speaker and we took it out of the box. [Sonia] came downstairs and walked right up to it and said, ‘Oh hey, Alexa!’ No answer. Then she tried Google. Still no answer. And then she started banging it and said, ‘Hey you, who are you?’”
Relaying a story about her young daughter’s interaction with voice technology serves two purposes. First, it illustrates how intuitive voice technology is (you don’t need to know how to scroll, swipe or text to use it) and, second, it bolsters Thadani’s relatability and credibility.
If the goal was to drum up support and enlighten audiences on how voice-activated technology lets users control their environment, the CEO of Rain, a New York-based agency that specializes in voice strategy, design and development, hit the mark. Held June 23-25, the large-scale virtual event Collision from Home drew more than 32,000 online registrants from 140 countries and featured technology investors, celebrity speakers and journalists.
“We tend to think about voice as these smart speakers, but those speakers are going to be a rounding error when you talk about the emerging behavior of voice—people speaking to things,” Thadani said, pointing out that twice as many U.S. adults have used voice in the car as on a smart speaker, despite the media hype afforded to smart speakers and other devices.
Thadani wasn’t alone in her quest to inform attendees about the growing potential of ambient intelligence. Joining the virtual summit from his home in San Francisco, Intuit’s chief innovation officer, Bharath Kadaba, said that building a worthwhile ambient experience rests on three factors.
“The first is empathy, which is a genuine understanding of who the customer is and what situation they’re in,” said Kadaba. “The second is expertise, which is to be able to understand the particular challenge that the customer has and having the expertise to solve it. It’s not a one-size-fits-all approach. The third is ethics, which underpins that trust—of privacy, security and explainability, and all of those things that actually build that long-term relationship.”
Embrace AI-Driven Expert Platforms
Building conversational user interface technology rooted in these factors is a foundational component of a viable system, according to Kadaba, who leads Intuit’s Technology Futures group. Broadly explained, the process involves recognizing speech and converting the speaker’s intent into a machine-readable format, which then feeds a reasoning engine that combines domain knowledge with the customer’s data to arrive at an answer. Once the answer is determined, the response must be converted back into speech a human can understand.
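In code, the loop Kadaba describes might look something like the minimal sketch below. Every function is a placeholder standing in for real speech-recognition, natural-language-understanding, reasoning and text-to-speech components; the names and stub logic are assumptions for illustration, not Intuit’s implementation.

```python
# A minimal sketch of the loop Kadaba describes: speech in, intent out,
# reasoning over domain knowledge plus customer data, speech back out.
# All names and stubs are illustrative, not Intuit's implementation.

from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str                                   # e.g. "check_refund_status"
    slots: dict = field(default_factory=dict)   # extracted parameters

def transcribe(audio: bytes) -> str:
    # ASR placeholder: a real system would run a speech-recognition model.
    return "what's my refund status for 2019"

def parse_intent(text: str) -> Intent:
    # NLU placeholder: map free text to a machine-readable intent.
    return Intent("check_refund_status", {"tax_year": 2019})

def reason(intent: Intent, domain_kb: dict, customer_data: dict) -> str:
    # Reasoning engine: combine domain knowledge with this customer's data.
    # (domain_kb would supply tax-domain rules in a real engine.)
    status = customer_data.get(intent.slots["tax_year"], "unknown")
    return f"Your {intent.slots['tax_year']} refund is {status}."

def synthesize(answer: str) -> bytes:
    # TTS placeholder: render the answer back into audio.
    return answer.encode()

def handle_utterance(audio: bytes, domain_kb: dict, customer_data: dict) -> bytes:
    return synthesize(reason(parse_intent(transcribe(audio)), domain_kb, customer_data))

print(handle_utterance(b"", {}, {2019: "approved and on its way"}).decode())
```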
Kadaba identified five technical problems researchers hope to solve in building a truly interactive interface. The first is understanding the impact of context, which requires tracking the customer’s intent and preferences over a longer period of time. The second is recognizing the long tail of possible questions.
The third is detecting, understanding and infusing emotion into the conversation. “Injecting emotion is a key component of building trust,” said Kadaba. The fourth is building a reasoning engine, which he noted is probably the hardest, as it combines words, domain knowledge and deep data to build that expertise. And the final problem is leveraging the nuances of back-and-forth conversation.
Kadaba’s position is that blending human input and AI is the best way to deliver utilitarian ambient solutions. Building on PARC scientist Mark Weiser’s theory of ubiquitous computing, that “the most profound technologies are those that disappear,” he explained: “Artificial intelligence augmented by a human touch will weave itself into the fabric of everyday life until it is indistinguishable from it.”
Design for Behavior
Thadani supports this notion. “Voice is going to be the gateway drug to ambient computing,” she said. As a medium of communication, speaking is natural and frictionless, and “that’s why it’s the medium of therapy, and it’s also the medium of prayer.”
Behind her reference to mindfulness is the success story of how Rain integrated its voice user interface with Headspace, a meditation app that runs across the Google, Alexa and Cortana platforms. The startup is the brainchild of Andy Puddicombe, a former Buddhist monk. As Rain’s website boasts, the voice experience reached in two months the same number of active users that took 40 months to amass on mobile.
Similarly, Rain’s technology undergirds the “on command” feature that allows Starbucks guests to order their regular grind using the Amazon Alexa platform or the My Starbucks Barista chatbot. Rain looked at existing behaviors of Starbucks customers and found that 73% of users order the same thing every visit. The data provided a golden ticket to rearchitect the coffee giant’s product API (an application programming interface that allows two applications to talk to each other).
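To see why that 73% figure is a gift to designers, consider a minimal, hypothetical sketch of a “my usual” reorder flow. The data shapes and function names below are assumptions for illustration, not Starbucks’ or Rain’s actual API.

```python
# A hypothetical "my usual" reorder flow. If 73% of customers order the
# same thing every visit, one intent backed by order history covers most
# voice traffic. Data shapes and names are illustrative assumptions.

from collections import Counter

# Stand-in for order history that a product API would return.
ORDER_HISTORY = {
    "customer-42": ["grande latte", "grande latte", "flat white", "grande latte"],
}

def handle_usual_order(customer_id: str) -> str:
    """Fulfil an 'order my usual' voice request from past order history."""
    history = ORDER_HISTORY.get(customer_id, [])
    if not history:
        return "I don't know your usual yet. What would you like?"
    # Treat the most frequently ordered item as "the usual".
    usual, _ = Counter(history).most_common(1)[0]
    # A real skill would POST the order to the product API here.
    return f"Your usual {usual} is on its way."

print(handle_usual_order("customer-42"))  # Your usual grande latte is on its way.
```

The design point is that a single history-backed intent can absorb the large majority of voice orders before any open-ended dialogue is needed.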
Zeroing in on the behavioral and emotional aspects of the user is key to voice AI success, explained Thadani. The market is expected to “grow to $93 billion over the next couple of years,” and she predicts voice assistants will deliver content and information wherever we are, at any time.
“Whether you’re designing for a car, or for a headphone, or for a voice device with a screen, it’s important to remember that voice is not an interface,” said Thadani. “It’s a behavior, of people speaking to their environment. They’re ambient assistants. And you’ve got to design for that behavior across all of these different touchpoints.”
Build Emotional Intelligence
Thadani maintains that Rain’s voice algorithms excel at building emotional intelligence (“You need both EQ and IQ”). For instance, Rain worked with Marriott’s Aloft Hotels to learn how it could improve the guest experience. The research started by “listening” to what guests most often requested from the front desk. “We call this conversation mining,” Thadani said. The use case showed that guest requests for things such as towels or Wi-Fi passwords could be diverted to voice assistants, freeing up staff to focus on the customer experience.
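At its simplest, conversation mining reduces to frequency analysis over transcribed requests. The toy sketch below assumes front-desk requests have already been mapped to coarse intents; the labels and the “automatable” set are illustrative assumptions, not Rain’s actual methodology.

```python
# A toy version of "conversation mining": tally what guests actually ask
# the front desk, then flag high-volume, self-service-friendly requests
# as candidates to divert to a voice assistant. Intent labels and the
# automatable set are illustrative, not Rain's actual methodology.

from collections import Counter

# Transcribed front-desk requests, already mapped to coarse intents.
requests_log = [
    "extra_towels", "wifi_password", "late_checkout", "extra_towels",
    "wifi_password", "room_service", "extra_towels", "wifi_password",
]

# Requests simple enough for a voice assistant to handle unaided.
AUTOMATABLE = {"extra_towels", "wifi_password", "late_checkout"}

for intent, count in Counter(requests_log).most_common():
    handler = "voice assistant" if intent in AUTOMATABLE else "front desk staff"
    print(f"{intent}: {count} requests -> {handler}")
```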
Big brands are leaning into the idea of customized voice assistants, built from a targeted understanding of customers’ emotions and behavior, as a way to realize mutually beneficial voice-computing productivity gains. Bank of America’s Erica and Amazon’s Alexa each have their own personality mined from user conversations, sentiment and behavior.
Consider as an example Bank of America’s Erica, which has 10 million users and has handled more than 100 million requests. “She does balance transfers, she makes deposits, she takes requests,” said Thadani. But while the ambient technology class of 2020, including Erica, Alexa, Cortana and Julie, has been a huge success, these assistants are still considered to be “in the Age of Ask, as in, ‘I ask a question and I get an answer,’” recapped Thadani.
Yet being purely transactional, simply taking requests, was never really the goal, noted Thadani. “Bank of America is betting on the future, and in the future, Erica is going to take all of this conversational data and sentiment, and get herself a promotion from assistant to true financial advisor,” she said.
Assume Voice Assistants Will Act with Agency
Ambient intelligence extends well beyond voice SEO, and Bank of America’s Erica demonstrates that it is well on the way to establishing customer confidence that the solution offered is optimal. “The intelligent system has to be smart enough to figure it out,” said Kadaba.
Thadani wagered it won’t be long before conversational data, combined with biometric data from wearables, ubiquitous connectivity and the speed of 5G, pushes the industry toward a model of proactive, ambient assistance across industries, from financial services to healthcare and enterprise.
As the COVID-19 pandemic unfolded, voice became essential tech for workers in healthcare and transportation, reflected Thadani. “When you think about adding a proactive assistance layer, you might have an assistant to help calculate patient risk or shuffle warehouse capacity,” she said. “That’s where all of this is going and that’s transformational. This is the future of voice—proactive assistance with agency.”