Voice User Interface Design Guidelines

Through these guidelines you will learn how to use principles of Voice User Interface Design to build more effective skills. These tools will help define and validate the features of the skill before diving deep into development.

This guide will cover some methods to use that can help plan, prototype and test your skill during the early design stages.

The full original guide can be found over at the mycroft documentation

Interactions

Intents

Let's start with an example. A user in Melbourne, Australia might want to know about the weather. To ask for this information, they might say:

"Hey Mycroft, what's today's weather like?"

"Hey Mycroft, what's the weather like in Melbourne?"

"Hey Mycroft, weather"

Even though these are three different expressions, for most of us they probably have roughly the same meaning. In each case we would assume the user expects OVOS to respond with today's weather for their current location.

It is up us as Skill creators to teach OVOS the variety of ways that a user might express the same intent. This is a key part of the design process. It is the key difference between a Skill that kind of works if you know what to say, and a Skill that feels intuitive and natural to talk to.

This is handled by an intent parser whose job it is to learn from your Skill what intents it can handle, and extract from the user's speech and key information that might be useful for your Skill. In this case it might include the specified date and location.

Statements and Prompts

You can think of Prompts as questions and Statements as providing information to the user that does not need a follow-up response. For example a weather forecast like this would be considered a statement:

Today’s forecast is sunny with a high of 60 and a low of 45.

Statements

For a lot of skills the conversation might end with a simple statement from OVOS, and no further action is necessary. Try to imagine what the user is trying to accomplish, if a simple statement gets the job done there is no reason to keep the conversation rolling, and in fact a follow-up might annoy the user with unnecessary interaction.

Prompts

It may be tempting to always give users specific instructions like traditional automated phones systems (Interactive Voice Response). Many phone systems are notorious for being too verbose and difficult to follow.

With OVOS we’re trying to break that mold and make the interaction natural. If you follow the phone system method you may be giving the user the exact phrasing to say, but you’re also taking up valuable time and training them to think the system is very narrow in capability. In the event that the user does give a response that your skill can not handle, create follow-up prompts that steer the user back on track.

Remember, there are no errors in a cooperative conversation.

Avoid

Speaker
Mycroft How many people are playing? For example, you can say 2 players.

Better

Speaker
Mycroft How many players?
User My Brother and Myself
Mycroft I’m sorry, what was the number of players?
User Two

In the first example the user is told explicitly what they can say, but the prompt is unnatural, we don’t typically suggest responses to a conversation partner in real life. These long-winded prompts can become tiresome if they are used repeatedly throughout the skill. Remember the phone system example, typically poorly designed automated phone systems inundate the user with many options and additional instructions at every step of the interaction.

In the second example we see a better prompt, although the user gives a response that is easy for a human to understand it is more difficult to design a skill to understand. Instead, the skill designer can apply a re-prompt strategy that steers the user back on track and doesn't require them to start the interaction over. In this case the re-prompt changes the wording slightly to make it clear that the user needs to say a number. The next time the user interacts with this Skill, they will likely say a number in their first interaction.

Determining whether to respond to the user with a statement or a prompt can be a bit tricky. When the user is somewhat vague it is tempting to assume the user’s intent in order to speed along the interaction. Sometimes this is the right approach and sometimes it is not. If you are very confident in the user’s input then it is alright to assume, and respond with a statement. For example in the Wikipedia Skill the Wikipedia API is used to confidently select the best response.

Wikipedia Skill

Speaker
User Tell me about Abraham Lincoln
Mycroft Abraham Lincoln was an American statesman and lawyer who served as the 16th president of the UnitedStates from 1861 to 1865. Lincoln led the nation through its greatest moral, constitutional, and political crisis in the American Civil War.

In contrast, let’s look at an example of where a follow-up prompt is a better approach. You’re working on a skill that gives important information about birds, such as wingspan, laden and unladen airspeed, etc..

Avoid

Speaker ****
User what is the airspeed velocity of an unladen swallow?
Mycroft The unladen airspeed of the european swallow is 20.1 miles per hour.

Better

Speaker
User What is the airspeed velocity of an unladen Swallow?
Mycroft What do you mean, African or European Swallow?
User European Swallow.
Mycroft The unladen airspeed of the European Swallow is 20.1 miles per hour.

In the first example Mycroft assumes the user is referring to the European Swallow, however there is nothing to give confidence to that assumption. In this case it’s dangerous to assume the User meant the European Swallow since it is just as likely they meant the African Swallow. When there is no confidence in one answer versus another, OVOS should follow up with a prompt as demonstrated in the second example.

Another approach to solving this problem is to offer the most likely answer and inform the user that there are other answers available with a statement.

Confirmations

Confirmation approaches can also be defined by Statements or Prompts , but when we talk about them in the context of confirmations we call them Implicit and Explicit.

Implicit Confirmation

This type of confirmation is also a statement. The idea is to parrot the information back to the user to confirm that it was correct, but not require additional input from the user. The implicit confirmation can be used in a majority of situations. For example Mycroft’s Timer Skill uses implicit confirmation when creating new timers.

Timer Skill

Speaker
User Hey Mycroft set a timer for 5 minutes
Mycroft Alright, I’m starting a 5 minute timer

Avoid

Speaker
User Hey Mycroft, add bananas to my shopping list
Mycroft done

Better

Speaker
User Hey Mycroft, add bananas to my shopping list
Mycroft Adding bananas

Explicit Confirmation

This type of confirmation requires an input from the user to verify everything is correct. Only use explicit confirmation when needed as it can be tedious for the user to always confirm their input. This strategy is best when there are multiple steps, and it would be difficult to fix a misstep without starting over, or when deleting or modifying an existing multistep input.

Avoid

Speaker
User Hey Mycroft, set an alarm for 9:00 am tomorrow
Mycroft Ok, I’ve set an alarm for 9:00 am tomorrow, is that correct?
User Yes
Mycroft 9:00 am alarm set

Better

Speaker
User Hey Mycroft, set an alarm for 9:00 am tomorrow
Mycroft I've set an alarm for tomorrow at 9:00 am, 16 hours from now

The first example requires the user to confirm the alarm. Since the interaction is quite short it isn’t necessary to use explicit confirmation.

Avoid

Speaker
User Hey Mycroft, set a calendar event for a meeting with Chris Stevens
Mycroft When is the event?
User Tomorrow at 11:00 am
Mycroft should I send an invite to Chris Stevens?
User Yes
Mycroft Alright, I've created a meeting for Tomorrow
Mycroft Alright, I’ve created a meeting for Tomorrow at 11:00 am with Chris Stevens
User Cancel my 11:00 am meeting tomorrow
Mycroft Done

Better

Speaker
User Hey Mycroft, set a calendar event for a meeting with Chris Stevens
Mycroft When is the event?
User Tomorrow at 11:00 am
Mycroft should I send an invite to Chris Stevens?
User Yes
Mycroft Alright, I’ve created a meeting for Tomorrow at 11:00 am with Chris Stevens
User Cancel my 11:00 am meeting tomorrow
Mycroft Are you sure you want to delete your meeting tomorrow at 11:00 am?
User Yes

Both examples use explicit and implicit confirmation correctly until the end of the interaction when the user wants to cancel the event. Unlike setting a simple timer or an alarm creating this calendar event requires multiple steps and would require reasonable effort to recreate. There may also be significant consequences if the request was misinterpreted and the meeting was cancelled unintentionally.

Conversations

Any time the user needs to input a lot of information or the user needs to sort through a variety of options a conversation will be needed. Users may be used to systems that require them to separate input into different chunks. For example, many personal navigation devices (GPS units) require users to input the street number, street name, city, and state separately, like this:

Speaker
Navigation Device What is the street number
User 123
Navigation Device What is the street name
User Main Street
Navigation Device What is the city
User Springfield
Navigation Device Oregon

This was a limitation of the technology, both the STT and intent parser were not capable of handling an entire address. With OVOS you do not need to separate the address into multiple inputs. It’s also easy to think like a traditional GUI and translate what are typically separate fields into prompts for the user, but that will only slow down the interaction.

The best solution is to allow the user to input information both ways, all at once, or in pieces. For example the Alarm skill allows the user to set a complicated alarm all in one go, or in steps.

In Steps:

Speaker
User Hey Mycroft, set a recurring alarm
Mycroft For what day?
User weekdays
Mycroft For what time?
User 7 am
Mycroft I've set a recurring Alarm for 7 am on weekdays

All at once:

Speaker
User Hey Mycroft, set a recurring alarm for weekdays at 7 am
Mycroft I've set a recurring Alarm for 7 am on weekdays

Context

Conversational context is a given when speaking to another person, but not always when interacting with a computer based system. Many skills are designed with only a quick back and forth interaction in mind, and can operate well without conversational context, but conversational context can be a powerful addition to a skill and can make the interaction much more life like. Here are some examples of how to use context in conversation.

Speaker
User Who is Shaquille O'Neal
Mycroft Shaquille O'Neal is an American former professional basketball player who is a sports analyst on the television program Inside the NBA on TNT. He is considered one of the greatest players in National Basketball Association history
User How tall is he?
Mycroft seven foot 1 inches tall
User when did he retire
Mycroft 2011

Error Handling

Inevitably, the user will say something that your skill can’t handle. It’s best not to think of these as errors on the part of the user, remember there aren’t really errors in conversations. Conversations are naturally cooperative with both parties seeking to continue to a successful outcome. Keeping that in mind you should do your best to avoid Error Handling that is a dead-end.

Avoid

Speaker
User Timer
Mycroft I'm sorry I can't help you with that

Better

Speaker
User Timer
Mycroft A timer for how long?
User 5 minutes

In the first example OVOS does not give the user any options to finish the job they set out to do. Try to avoid situations where the user has to start the conversation over.