
Using Voice Command Systems: Siri, Google Now, and Cortana

April 28, 2015

When was the last time you talked with your computer? I mean, really talked?

While many of our conversations with technology involve more frustration than communication, the vision of personal assistants that understand our intentions and anticipate our needs (à la Iron Man’s J.A.R.V.I.S. or Her’s Samantha) is now moving off the silver screen and onto our phones. Or so Microsoft would have us believe with their latest ad for Cortana on Windows Phones.

But how practical are these systems? Can they help us as partners in daily activities, or are they all talk?

Our research

To answer these questions, we ran a simple user experience study on popular smartphone personal assistants Siri, Google Now, and Cortana.

We asked nine participants (three using each system) to set up dinner and a movie with a friend using their voice as much as possible.

The tasks started off simple: looking up the evening’s weather and (if it was going to be cold) instructions on how to tie a scarf.

Things got more complex when we asked them to find showtimes for a movie and directions to a restaurant nearby, and then send those details to their friend.

Step 1: “What’s the weather going to be like tonight?”

Asking for the evening’s weather, the stereotypical use case for voice commands, felt natural and simple. Even though participants chose different commands, ranging from the abrupt “Weather” to the more personal “Is it going to be cold?”, all of their devices presented a well-formatted display of the day’s weather forecast, in some cases even reading it aloud.

Asking for instructions on how to tie a scarf did not provide the same immediate satisfaction, showing only the results from a web search. While the results were relevant to the question, the answers were still a tap away.

All participants found they were unable to ask multiple questions in a single command. While the solution was simply to split their questions up, the extra step disrupted the fluidity of the interaction.

Step 2: Dinner and a movie

Moving on to the bulk of the dinner-and-a-movie plan, participants ran into a variety of obstacles as they went from one part of the plan to the next.

The process began by asking their device for “movie times” or to “take me to a movie.” Much like the weather, participants received a well-formatted display of local movies and showtimes. However, getting more specific required manual selection or another search, which proved to be difficult to do through voice alone.

Having selected a theater, participants moved on to find a restaurant near the theater. While it was easy for users to find a restaurant near their current location, it was more difficult to find a restaurant near a different location or landmark.

Commands such as “Restaurants near Carmike 16 movie theater” or “Are there any restaurants near Independence cinema?” were often misunderstood as searches for the movie theater itself—useless for participants who had already decided on the theater. Siri performed the best on this task, understanding two out of three of these commands and delighting participants with a relevant list of results.

For those who were able to choose a restaurant, finding directions to it was easy. Despite some hiccups (such as Siri providing directions to a restaurant in a different city), phrases like “directions for The Pink Cafe” brought up a navigation app with the route already planned. This was crucial for many participants, who said they often used voice commands for directions while driving. However, they still had to remember the address or name of the restaurant when searching, an extra burden, especially while driving.

Once the participants had set up the plan for their night, the final step was to send the details to a friend. Most participants found this easy, asking their device to “text [name]” or “send a message to [name]”, which brought up a message dialog prompting them to speak their message aloud. Many were used to doing this, but still encountered frustration when the systems balked at uncommon names or cut off long sentences mid-dictation. Repairing these misunderstood messages involved too much repetition and back-and-forth for some participants, who commented that they would rather just send a text message manually.

Users’ opinions of the voice command systems

Interestingly, the study found that the three voice command systems performed quite similarly to each other (despite the sometimes nasty commercials that would have you believe otherwise).

Accuracy

While each system had a unique personality and different features, they all had comparable levels of accuracy in recognizing participants’ commands and a similar set of well-supported tasks. All also required a tap somewhere on the device to activate voice commands, except for one Android phone with an “always listening” feature.

While recognition of words was generally quite good, most participants still found the systems occasionally lacking in accuracy. Having to repeat themselves or repair a misunderstood message was frustrating and would not be practical while on the go or in the car. Many also commented that they wished the systems learned more about their unique accents, phrasing, and habits to better understand their commands and intentions.

Complex task capabilities

Overall, participants commented favorably on the convenience of voice commands for simple “black and white” tasks, but noted plenty of room for improvement on more complex ones.

Dedicated features such as weather, movies, and messages were in some cases faster and easier through voice commands than through manual interaction. Participants liked the interactive dialogs for these functions, which anticipated and presented relevant information in an easily understood way.

Outside of these basic functions, most systems were only able to provide a list of search results for questions like “how to tie a scarf,” presenting options to participants but not reading them aloud or giving immediate answers.

Combined with the systems’ poor awareness of participants’ context and progress, these limitations made setting up a complex plan like dinner and a movie quite a challenging task. Siri provided some semblance of context by allowing participants to scroll up to view previous interactions, but it was not much better than Google Now or Cortana at using this information to maintain a coherent story throughout the process. This was clear when one participant, compensating for this limitation, used pen and paper to keep track of her plan’s details.

Clarity of options

Participants were also unsure about what they could or couldn’t use their voice for, as there was no obvious indication of options or available commands. This was apparent when a few participants experimented and discovered functions they weren’t aware of.

Further limitations included the inability to parse multiple commands at once, forcing participants to break up their questions.

Finally, some participants did not care for conversational turn-taking with their device and wanted to be able to stop or override the system at any point.

Conclusion

Smartphone personal assistants may still have a ways to go in following our thought processes, but they are proving to be a powerful and natural way to interact with our devices. Despite frustrations with voice recognition accuracy and a lack of contextual awareness, these systems streamline many functions to provide quick and easy information on the go.

With the intense competition between Siri, Google Now, and Cortana, it may not be long before we start talking to our devices more than we tap them.