Grammar Development Workflows
Creating grammars is perhaps the most challenging task when building a speech-enabled application. How can you determine whether the grammar that you created provides the user experience that you intend? How can you improve the response of your application to voice input? The Microsoft Grammar Development Tools are designed to provide you with actionable answers to these questions and others throughout the grammar development process.
The processes involved in grammar development may vary depending on your application and its environment. Often, grammar development includes the following phases:
Design
Authoring
Analysis
Tuning with Utterances
Deployment
The following diagram provides an overview of using the Grammar Development Tools in the grammar development process.
Before you begin creating grammar files, you need to envision and define how users will interact with your application by speaking. In this design phase, you will want to answer questions such as the following:
Which application contexts are good candidates for voice interaction?
What actions or application behavior can a user initiate using her voice?
What words can a user say to initiate each action or behavior?
How will users know when they can speak to the application and what they can say?
Answering these and related questions keeps grammar development focused on interactions that are engaging and rewarding for the users of your application.
You will typically use the Grammar Development Tools after these design decisions have been made: the voice-driven interactions between your application and its users have been identified and designed, and the vocabulary that your application can understand has been defined.
In the authoring phase, you can use an XML editor, such as Visual Studio, to author grammar files. You can also use the Microsoft.Speech.Recognition.SrgsGrammar namespace to create grammars programmatically that comply with the Speech Recognition Grammar Specification (SRGS) Version 1.0, and then export them to XML-format files that the Grammar Development Tools can consume. See Create Grammars Using SrgsGrammar (Microsoft.Speech).
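To make the target file format concrete, the following sketch writes a minimal SRGS 1.0 XML grammar. It uses Python's standard xml.etree.ElementTree module, not the .NET Microsoft.Speech.Recognition.SrgsGrammar namespace, and the rule name setColor, the function names, and the color vocabulary are hypothetical. It covers only a few core SRGS elements: grammar, rule, item, and one-of.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SRGS 1.0 specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def srgs(tag):
    """Return the namespace-qualified name of an SRGS element."""
    return f"{{{SRGS_NS}}}{tag}"


def build_color_grammar(colors):
    """Build a hypothetical SRGS 1.0 grammar for phrases like "set color to red".

    The public root rule is an optional carrier phrase followed by exactly
    one of the given color words.
    """
    # Serialize SRGS elements in the default namespace (xmlns="...").
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(srgs("grammar"), {
        "version": "1.0",
        f"{{{XML_NS}}}lang": "en-US",  # serialized as xml:lang
        "root": "setColor",
    })
    rule = ET.SubElement(grammar, srgs("rule"), {"id": "setColor", "scope": "public"})

    # repeat="0-1" makes the carrier phrase optional.
    carrier = ET.SubElement(rule, srgs("item"), {"repeat": "0-1"})
    carrier.text = "set color to"

    # <one-of> holds alternatives; the speaker says exactly one of them.
    choices = ET.SubElement(rule, srgs("one-of"))
    for color in colors:
        ET.SubElement(choices, srgs("item")).text = color

    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_color_grammar(["red", "green", "blue"]))
```

Saving the returned string to a file (a .grxml extension is conventional for SRGS XML grammars) would give you a grammar file you could then examine with the tools described in the next phase.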
Having authored one or more grammar files, you can use any of several Grammar Development Tools to determine whether a grammar achieves the goals for which it was designed. In this phase, you may have the following questions:
Does my grammar have access to all the resources it needs?
Does my grammar over-generate or under-generate phrases?
Is a specific phrase in the grammar?
Does my grammar contain acoustically ambiguous content?
In this phase, you will typically revise a grammar based on output from one or more of the Grammar Development Tools, including the following:
Phrase Generator. Produce a comprehensive or filtered list of phrases that a grammar will recognize. The result can help you determine whether your grammars over-generate or under-generate phrases. You can use lists of phrases output by Phrase Generator as input to the Confusability tool. See Phrase Generator Reference Manual.
Check Phrase. Determine with certainty whether or not any given phrase is in a grammar. See Check Phrase Reference Manual.
Confusability. Analyze grammars for words that may be mistaken for each other when spoken. A speech application may misrecognize a word in a grammar because it sounds similar to another word in the same grammar. You can address confusable phrases in your grammars by substituting synonyms, adding custom pronunciations, or making a phrase longer. You can even use Confusability as an authoring tool: before adding new phrases to a grammar, check whether they could be confused with phrases already in the grammar. See Confusability Reference Manual.
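The following toy sketch mirrors the kinds of questions these tools answer, using a tiny in-memory grammar rather than an SRGS file. It enumerates every phrase the grammar generates (the idea behind Phrase Generator), tests whether a given phrase is covered (the idea behind Check Phrase), and flags pairs of similar phrases. The nested-tuple grammar model and all function names are hypothetical, and the similarity check compares spelling with Python's difflib, which is only a rough textual stand-in for judging whether phrases sound alike, which is what Confusability analyzes.

```python
from difflib import SequenceMatcher
from itertools import combinations, product

# Hypothetical grammar model (not an SRGS parser):
#   "words"         literal text
#   ("seq", [...])  sub-expressions spoken in order
#   ("alt", [...])  exactly one alternative, like SRGS <one-of>
#   ("opt", expr)   optional expression, like SRGS repeat="0-1"


def normalize(phrase):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(phrase.lower().split())


def generate(expr):
    """Return the set of all phrases the expression can produce."""
    if isinstance(expr, str):
        return {normalize(expr)}
    kind, body = expr
    if kind == "alt":
        return set().union(*(generate(e) for e in body))
    if kind == "opt":
        return {""} | generate(body)
    if kind == "seq":
        # Cartesian product of the parts; skipped optionals ("") drop out.
        return {
            " ".join(part for part in parts if part)
            for parts in product(*(generate(e) for e in body))
        }
    raise ValueError(f"unknown node type: {kind!r}")


def in_grammar(grammar, phrase):
    """Membership test in the spirit of Check Phrase."""
    return normalize(phrase) in generate(grammar)


def similar_pairs(phrases, threshold=0.8):
    """Return (a, b, ratio) for phrase pairs whose spelling similarity is at
    least `threshold`. A crude textual proxy, not an acoustic comparison."""
    pairs = []
    for a, b in combinations(sorted(phrases), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, ratio))
    return pairs


if __name__ == "__main__":
    grammar = ("seq", [("opt", "please"), ("alt", ["call", "dial"]), ("alt", ["Ann", "Dan"])])
    phrases = generate(grammar)
    print(len(phrases), "phrases:", sorted(phrases))
    print(in_grammar(grammar, "Please call Dan"))  # True
    print(in_grammar(grammar, "call Stan"))        # False
    for a, b, r in similar_pairs(phrases):
        print(f"{a!r} vs {b!r}: {r:.2f}")
```

Even this small grammar generates eight phrases from three short slots, which shows how quickly optional items and alternatives multiply and why a full phrase list helps reveal over-generation.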
Tuning with Utterances
Speech applications may incorrectly accept unintended words or reject words that are part of the application's vocabulary. You are almost ready to deploy, but may still have the following questions:
How accurately does a grammar recognize intended input and reject unintended input?
Which confidencelevel setting on the speech recognition engine will provide the most accurate performance?
What out-of-grammar phrases should I consider adding to the grammar as synonyms?
If your production schedule allows it, and you have access to recorded utterances that represent the vocabulary of your application, you can use the following tools to test how effectively your application's grammars recognize that vocabulary:
Simulator. Connects to a speech recognition engine and produces raw recognition results for batches of utterances (such as ".wav" files). See Simulator Reference Manual.
Simulator Results Analyzer. Computes percentages and totals that illustrate the effectiveness of recognition for the utterances used as input to Simulator. The analysis results not only point out errors in recognition, but also provide guidance for determining the optimum confidencelevel setting for your speech recognition engine. See Simulator Results Analyzer Reference Manual.
Speech Recognition Engine Configuration File. Set the confidencelevel of the speech recognition engine to achieve the best balance between accepting correct recognitions and rejecting incorrect ones. See Speech Recognition Engine Configuration File Settings.
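The trade-off that a confidence threshold controls can be illustrated with a short sweep. This sketch is not the Simulator Results Analyzer and does not read its output. The Result record, the assumption that confidence scores fall between 0 and 1, the equal weighting of false accepts and false rejects, and the sample values are all hypothetical. At each candidate threshold, it classifies every recognition as a correct accept, false accept, false reject, or correct reject, and then picks the threshold with the fewest errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One recognition attempt (hypothetical record layout)."""
    in_grammar: bool   # was the spoken utterance covered by the grammar?
    correct: bool      # did the recognizer return the intended phrase?
    confidence: float  # recognizer confidence, assumed to be in [0, 1]


def score(results, threshold):
    """Count outcomes when results below `threshold` are rejected."""
    counts = {"correct_accept": 0, "false_accept": 0,
              "false_reject": 0, "correct_reject": 0}
    for r in results:
        accepted = r.confidence >= threshold
        wanted = r.in_grammar and r.correct  # a result we should keep
        if accepted and wanted:
            counts["correct_accept"] += 1
        elif accepted:
            counts["false_accept"] += 1
        elif wanted:
            counts["false_reject"] += 1
        else:
            counts["correct_reject"] += 1
    return counts


def best_threshold(results, candidates):
    """Return the candidate with the fewest false accepts plus false rejects.

    Both error types are weighted equally here; an application may
    reasonably weigh one more heavily than the other.
    """
    def errors(t):
        c = score(results, t)
        return c["false_accept"] + c["false_reject"]
    return min(candidates, key=errors)


if __name__ == "__main__":
    # Made-up values for illustration only, not real recognition results.
    sample = [
        Result(True, True, 0.92), Result(True, True, 0.81),
        Result(True, True, 0.55), Result(True, False, 0.60),
        Result(False, False, 0.40), Result(False, False, 0.70),
    ]
    print(best_threshold(sample, [0.3, 0.5, 0.65, 0.75, 0.9]))  # 0.75
```

Raising the threshold turns false accepts into correct rejects but can also turn correct recognitions into false rejects. How you weigh those two errors, and how a chosen value maps onto the confidencelevel setting, depend on your application and engine.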
If you do not have access to utterances, you can supply Simulator with a list of phrases in an Extensible MultiModal Annotation (EMMA) document and receive emulated recognition results that indicate whether each phrase is in a grammar and, for each in-grammar phrase, its semantic result.
When you are ready to deploy, you can use the Compile Grammar tool to compile and optimize your grammars for the Microsoft Speech Platform Runtime 11. See Compile Grammar Reference Manual. Optionally, you can use the Prepare Grammar tool to further optimize your compiled grammars for the specific speech recognition engine that your application will use in its production environment. See Prepare Grammar Reference Manual. These tools help you address the following concerns:
How can I minimize the risk of application latency when loading grammars (especially for large grammars)?
How can I convert source grammars in Advanced Research Projects Agency (ARPA) format for use with the Speech Platform Runtime 11?