When I think of voice interaction with a computer, my mind wanders to my first meeting with HAL in 2001: A Space Odyssey. That voice, interestingly enough very human, has excited and frightened generations of movie fans.
While the idea of artificial intelligence gone bad may frighten us, the concept of a companion computer completely accessible and able to communicate via natural language continues to interest programmers and developers alike.
Arthur C. Clarke's predictions may not have come true by 2001, but Tim Tuttle, the founder of Expect Labs, predicts that voice interface technology will grow by leaps and bounds over the next two years.
Expect Labs' voice interface platform, MindMeld, allows developers to embed voice interaction in devices, software and apps. Tuttle states that the platform will be much more like a human-to-human conversation than any interface before it. MindMeld is entering a market already dominated by interfaces such as Siri and Google Now, but this stiff competition simply underscores that, in the not-too-distant future, voice interaction will make up a large part of how we interact with the technology around us.
Many companies are already researching and implementing voice interaction as part of their overall product user experience, and it can be a useful feature across many products and devices. Talking to our phones through Siri or Google Now is already a popular interaction. Ford Sync gives the driver a multitude of hands-free options, such as accessing GPS directions and music as well as making phone calls. Kinect allows voice commands for the Xbox One, and in 2011, a Bloomberg Business writer was already calling voice control the "end of the TV remote."
In terms of accessibility, interaction via voice input and output has been built into both Android and iOS devices as well as Mac and Windows OS for quite some time.
The World Wide Web Consortium (W3C) offers specific guidelines on how to make websites more accessible, including alternative text for images that can be read by standard screen readers. Also, software like Nuance’s Dragon is specifically marketed towards users who have difficulty using a keyboard and mouse. The software allows users to perform almost any word processing task and search the web completely hands-free.
For all of these reasons, the actual experience of utilizing voice interaction for devices, cars, appliances, and software is becoming an important aspect of UX Design. Along these lines, the important question to ask is: What does a pleasurable voice experience for users sound like?
One reason why many product developers steer away from more human voices and towards more robotic sounding voices is a concept known as the uncanny valley. In 1970, Dr. Masahiro Mori proposed the uncanny valley, and this idea has now become a well-known aspect of human-computer interaction.
Mori stated that as robots move towards a more human-like design, our affinity for them goes up, at least to a certain point. Once robots become almost (but not quite) human, then our affinity for them quickly turns to dislike.
On a scale of affinity and human likeness, this creates a valley between “almost” human robots and healthy humans. This valley explains our natural dislike or feelings of creepiness toward many CGI characters and very “human-like” robots. At the same time, it also explains our overall feelings of affinity toward “cute robots” like Wall-E. In a UX Booth article, Nicholas Bowman argues that the uncanny valley can also affect how we interact via voice with our devices.
A voice interface like Siri creates a greater sense of connection with our device because the majority of our attention is focused directly through one sense, our hearing. It’s very normal to attribute human traits to any technology that we use. This is how we relate to the world around us. Siri is obviously not human, but she does have human traits, mostly in the humor she portrays.
In this way, Apple has been able to walk a fine line: we can relate to Siri, but she stays just robotic enough that we don’t really think there’s a human woman trapped in our device. We are still talking to an interface, a comforting, yet flat and synthetic robotic interface. If an uncanny valley of audio does exist, Siri falls just far enough on the robotic edge to allow us to be friendly with her.
Any text-to-speech software can create a feeling of dislike if it is either too robotic or too human. When dealing with interfaces that have to walk the edge of the uncanny valley, we are often looking for that “just right” or “Goldilocks” spot in UX Design. If a voice is too far on the robotic side, it may seem too alien to be relatable. However, if a voice approaches the valley without actually crossing it, then it sounds almost human but not quite human enough, and that near miss is what unsettles us.
If there is a lesson to be learned from the user experience of platforms like Siri, it is that most people can enjoy the company of a robotic companion if it has a sense of humor and the distinction between human and robot is clear.
If companies like Expect Labs have their way, we will be able to completely and naturally converse with a voice interface in the not-too-distant future. Let’s just hope when we do argue with these future interfaces (and we will), they are not quite as difficult to reason with as HAL was.