Designing Prompts

This section discusses issues involved in designing effective prompts.

Directed and Open Prompts

There are two contrasting types of prompts.

  • Directed prompts
  • Open prompts

A directed prompt lists specific choices for users: "Please select cheese, pepperoni, or sausage." An open prompt allows users to speak their own answers: "Which movie would you like?" If users are familiar with the choices, perhaps through frequent use of the application, open prompts are fine. However, if a wide variety of users use the application, or if they use it infrequently, directed prompts are a better choice. For example, directed prompts are the best fit for a call center application.

When choosing between directed and open prompts, also consider the number of options presented to the user and how likely it is that the options will change. If there will never be more than three options, a directed prompt is the best choice, because it minimizes user confusion.

For directed prompts, choose a prompt that follows the form "Please select X, Y, or Z" rather than one that follows the form "Would you like X, or do you want Y or Z?" The second form invites a Yes or No response from the user after each option. Encourage the user to select one of the options by beginning the prompt with the phrase "Please select."
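The recommended prompt form can be generated from a list of options. The following is a minimal sketch in Python, not part of the Speech Controls API; the function name build_directed_prompt is hypothetical and used only for illustration.

```python
def build_directed_prompt(options):
    """Return a prompt of the form 'Please select X, Y, or Z.'"""
    if not options:
        raise ValueError("a directed prompt needs at least one option")
    if len(options) == 1:
        choices = options[0]
    elif len(options) == 2:
        choices = f"{options[0]} or {options[1]}"
    else:
        # Join all but the last option with commas, then add "or" before the last.
        choices = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"Please select {choices}."


print(build_directed_prompt(["cheese", "pepperoni", "sausage"]))
# prints: Please select cheese, pepperoni, or sausage.
```

Generating the prompt from the option list keeps the wording in the "Please select" form even when the options change.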

If the list of options is either long (a list of stock investments, for example) or variable (movie titles, for example), adding a list of choices to the prompt may be impractical. In this case, use an open prompt. Consider providing an example as help.

Application: "Please select a stock name."
User: "Help."
Application: "Please select a stock name. For example, Microsoft please."

For help messages that provide example input, use a different voice to speak the portion of the example that the system expects the user to say (in this example, "Microsoft please"). This technique reinforces the expected form of the answer to the user.

Confirmations

A confirmation is an acknowledgement that the system has heard a user's response.

Application: "Where do you want to fly to?"
User: "Paris"
Application: "On which day would you like to leave for Paris?"

Think about where in the dialog flow users need confirmations. Recognizing speech from a telephone is not perfect, particularly under noisy conditions. In addition, voice-only applications have only one channel of communication with the user. An effective confirmation and correction strategy alleviates these issues.

A good voice-only application uses a variety of techniques for confirmation and correction. The techniques depend on the style of the application, the importance of the action being performed, the cost of misunderstanding, and the need for a natural dialog.

For example, a dialog that follows each question with a confirmation of the form "Did you say X?" is slow and potentially very frustrating. Conversely, a dialog that employs no confirmation and, based on a misrecognized command, deletes data without first checking with the user, is equally frustrating. A developer must strike a balance between efficient interaction with the application and protection from wasted time or lost data. In many cases, the cost of misrecognition is so low as to warrant no confirmation and correction at all. In other cases explicit confirmation is always required, regardless of the application's confidence in the user's utterance.
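The balance described above can be expressed as a small decision rule. The following Python sketch is illustrative only: the three cost levels, the function name needs_confirmation, and the default confidence threshold of 0.8 are assumptions, not values taken from the Speech Controls.

```python
from enum import Enum


class Cost(Enum):
    """Assumed cost levels for a misunderstanding (hypothetical)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def needs_confirmation(cost, confidence, threshold=0.8):
    """Decide whether to confirm a recognized value.

    cost:       assumed cost of acting on a misrecognized value
    confidence: recognizer confidence in the utterance, from 0.0 to 1.0
    threshold:  assumed cut-off below which a medium-cost value is confirmed
    """
    if cost is Cost.LOW:
        # Misrecognition is cheap: skip confirmation and keep the dialog fast.
        return False
    if cost is Cost.HIGH:
        # Always confirm, regardless of the recognizer's confidence.
        return True
    # In between, confirm only when the recognizer is unsure.
    return confidence < threshold
```

For example, a command that deletes data would be treated as high cost and always confirmed, while a menu selection that is easy to undo would be treated as low cost.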

ASP.NET Speech Controls facilitate confirmation using a separate Confirm element that is distinct from the Answer element initially used to obtain the item. The RunSpeech algorithm handles Confirm elements differently from Answer elements.

  • The Answer element changes the status of an item of user information from empty to needs confirmation.

  • For Confirm elements, the RunSpeech algorithm handles the further change of status to confirmed or denied, depending on the user's response to the confirmation question.

    • If the user says yes or repeats the item (or both), then the item's status changes to confirmed.
    • If the user changes the item, then the status remains needs confirmation.
    • If the user says no then the item reverts to empty.

    This status change logic does not need to be coded; the RunSpeech algorithm handles it automatically.
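To make the status transitions concrete, the following Python sketch models them as a small state machine. It is only an illustration of the rules listed above, not the RunSpeech implementation; the names Status, Item, apply_answer, and apply_confirm are hypothetical.

```python
from enum import Enum


class Status(Enum):
    EMPTY = "empty"
    NEEDS_CONFIRMATION = "needs confirmation"
    CONFIRMED = "confirmed"


class Item:
    """One item of user information, such as a departure city."""

    def __init__(self):
        self.value = None
        self.status = Status.EMPTY


def apply_answer(item, value):
    """An Answer fills the item: empty -> needs confirmation."""
    item.value = value
    item.status = Status.NEEDS_CONFIRMATION


def apply_confirm(item, accepted=None, new_value=None):
    """Apply the user's response to a confirmation question.

    accepted:  True for "yes", False for "no", None if neither was said
    new_value: the item value spoken in the response, if any
    """
    if new_value is not None and new_value != item.value:
        # The user changed the item: it still needs confirmation.
        item.value = new_value
        item.status = Status.NEEDS_CONFIRMATION
    elif accepted or new_value is not None:
        # The user said yes, repeated the item, or both.
        item.status = Status.CONFIRMED
    elif accepted is False:
        # The user said no: the item reverts to empty.
        item.value = None
        item.status = Status.EMPTY
```

A response with neither a yes, a no, nor a value (for example, silence) leaves the item unchanged in this sketch.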

Speech Controls provide additional support for implementing three distinct styles of confirmation.

  • Explicit Confirmation (EC)
  • Implicit Confirmation (IC)
  • Short Time-out Confirmation (STC)

Explicit Confirmation

Explicit confirmation (EC) is the most basic form of confirmation. Of the three styles of confirmation, EC takes the most user time, because it introduces an extra prompt to explicitly confirm information that the user has previously provided. Use EC for situations in which the cost of a misunderstanding is high. For example, in a flight booking application, the application must understand the cities between which the user wishes to fly. Explicit confirmation results in a dialog interaction of the following form.

Application: "Where are you flying from?"
User: "Seattle"
Application: "Did you say Seattle?"
User: "Yes"
Application: "On what date are you flying?"

Implicit Confirmation

Implicit confirmation (IC) combines the confirmation question with the next information retrieval question into a single prompt. This method uses fewer prompts than explicit confirmation. Consider a flight booking scenario where the application obtains the city that the user is flying from, followed by the date. IC results in a dialog interaction of the following form.

Application: "Where are you flying from?"
User: "Seattle"
Application: "Flying from Seattle. On what date?"

If the user answers this question with a date, then the answer implies that Seattle is correct, thereby confirming selection of Seattle as the departure city. The grammar for IC interaction is subtly different from the grammar for EC. The grammar for IC combines acceptance or denial of the previous prompt (in this case, the city) with supply of information for the next prompt.

Application: "Flying from Seattle. On what date?"
User: "No"
Application: "Where are you flying from?"
User: "Vancouver"
Application: "Flying from Boston. On what date?"
User: "No, Vancouver"
Application: "Flying from Vancouver. On what date?"

Answering with a simple "Yes" does not answer this kind of question completely. The application activates an xpathAcceptConfirms trigger when the user supplies a response of the form "Yes, Vancouver, on February 15th."
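The way an IC response combines acceptance or denial of the city with the answer to the date question can be sketched as follows. This Python example is a simplified model under stated assumptions: the recognized response is represented as a dictionary with optional keys 'yes_no', 'city', and 'date', and the function name handle_ic_response is hypothetical. It does not model the xpathAcceptConfirms trigger itself.

```python
def handle_ic_response(pending_city, response):
    """Interpret a response to "Flying from <city>. On what date?".

    pending_city: the city awaiting confirmation
    response:     dict with optional keys 'yes_no', 'city', 'date'
    Returns a tuple (city, city_confirmed, date).
    """
    yes_no = response.get("yes_no")
    city = response.get("city")
    date = response.get("date")

    if city is not None and city != pending_city:
        # A correction such as "No, Vancouver": the new city still
        # needs to be confirmed by the next prompt.
        return city, False, date
    if yes_no == "no":
        # A plain denial: the application must ask for the city again.
        return None, False, None
    # A yes, a repeat of the city, or simply answering with a date
    # all imply that the city is correct.
    confirmed = yes_no == "yes" or city is not None or date is not None
    return pending_city, confirmed, date


print(handle_ic_response("Boston", {"yes_no": "no", "city": "Vancouver"}))
# prints: ('Vancouver', False, None)
```

An empty response leaves the city unconfirmed, so the application would repeat the prompt.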

In the EC scenario, one QA control obtains the information, using an Answer element, followed by another QA control to confirm the information, using a Confirm element. In the IC scenario, a single QA control contains both the Answer and Confirm elements. The prompt select function uses the associative array of previous answers and values to set the value of the city used in the prompt.

Short Time-out Confirmation

For the Short Time-out Confirmation (STC) method, the confirmation question is an echo of the informational item, either as a statement or a question. Aside from the length of the prompt, this scenario also differs from the EC scenario in two ways.

  • The STC method interprets silence as acceptance.
  • The silence time-out, which is the period of time that the application waits for the user to speak, is shorter than the typical time-out in the EC method.

For example, if the normal silence time-out is three seconds, then the time-out for STC should be one second. The application does not expect a response. Instead, the application makes a statement of its understanding to the user and invites a correction. Assuming that the system is correct most of the time, the dialog flow moves quickly and smoothly in the STC method.

Application: "Which city do you want to fly to?"
User: "Seattle."
Application: "Seattle."
User: ""
Application: "At what time do you want to fly?"

Because the user did not correct the system when it repeated the value, the application accepts the value "Seattle."
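The following Python sketch illustrates how silence is treated as acceptance in the STC method. It is an illustration only: the response is modeled as None for silence or as a dictionary with optional keys 'yes_no' and 'value', the function names are hypothetical, and the time-out values are the example figures of three seconds and one second used above.

```python
NORMAL_TIMEOUT_S = 3.0  # example normal silence time-out
STC_TIMEOUT_S = 1.0     # example shortened time-out for STC


def silence_timeout(use_stc):
    """Return the silence time-out, in seconds, for the confirmation."""
    return STC_TIMEOUT_S if use_stc else NORMAL_TIMEOUT_S


def resolve_stc(echoed_value, response):
    """Resolve an STC confirmation.

    echoed_value: the value the application echoed, such as "Seattle"
    response:     None if the user stayed silent until the time-out,
                  otherwise a dict with optional keys 'yes_no', 'value'
    Returns a tuple (value, confirmed).
    """
    if response is None:
        # Silence is interpreted as acceptance.
        return echoed_value, True
    value = response.get("value")
    if value is not None and value != echoed_value:
        # A correction such as "No, Boston" replaces the value,
        # which then needs to be confirmed in turn.
        return value, False
    if response.get("yes_no") == "no":
        # A plain denial clears the value.
        return None, False
    if response.get("yes_no") == "yes" or value is not None:
        # An explicit yes or a repeat of the value confirms it.
        return echoed_value, True
    # Nothing usable was recognized: leave the value unconfirmed.
    return echoed_value, False


print(resolve_stc("Seattle", None))
# prints: ('Seattle', True)
```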

The value of the firstInitialTimeout property of the QA control determines whether a silence greater than a certain length of time is interpreted as confirmation. If firstInitialTimeout is 0, silence does not imply confirmation.

The grammar for the STC interaction is identical to the grammar for the EC interaction. At a minimum, the grammar should support "Yes" or "No" on its own, or "Yes" or "No" followed by a corrected value, such as "No, Seattle."

An STC interaction can revert to being an EC interaction under certain circumstances. If the user causes the current QA to repeat for any reason (mumbling, asking for Help, or saying Repeat), then the QA reverts to an EC state, as in this example.

Application: "Which city do you want to fly to?"
User: "Seattle."
Application: "Seattle."
User: "Mumble"
Application: "Am I right with Seattle?"

In this example, the application does not accept silence as a Yes, and the silence time-out returns to its original value. From this point forward, the QA behaves in an EC state. To create this interaction, code the appropriate prompts for the STC and EC versions of the QA, using the count parameter that the prompt engine passes to the prompt function. The count parameter allows the prompt function logic to select between the prompts "Seattle" and "Am I right with Seattle?".
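The selection logic can be sketched as follows. This Python example only models the decision; in an actual application the prompt function is written for the prompt engine of the Speech Controls. The sketch assumes that count is 1 the first time the QA is activated and increases each time the QA repeats, and the function names are hypothetical. The time-out values are the example figures of one second and three seconds.

```python
def confirm_prompt(city, count):
    """Select the confirmation prompt for the current activation.

    count: number of consecutive activations of this QA
           (assumed to start at 1)
    """
    if count <= 1:
        # First activation: STC style, a short echo of the value.
        return f"{city}."
    # The QA is repeating: fall back to an explicit question (EC style).
    return f"Am I right with {city}?"


def silence_accepts(count):
    """Silence counts as acceptance only on the first (STC) activation."""
    return count <= 1


def silence_timeout(count):
    """Short time-out for STC, normal time-out once the QA reverts to EC."""
    return 1.0 if count <= 1 else 3.0


print(confirm_prompt("Seattle", 1))
# prints: Seattle.
print(confirm_prompt("Seattle", 2))
# prints: Am I right with Seattle?
```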

See Also

Prompting the User | Creating Prompts | Prompts