Handling Bing Speech Recognition data
This document describes how to work with the text and other information returned by the Bing Speech Recognition Control.
Before creating speech-enabled applications, you must install the speech control from the Visual Studio Gallery or from the Visual Studio Extension Manager, as described in How to: Register and install the Bing Speech Recognition Control. Then, for each project that will use the control, complete the preparatory steps described in How to: Enable a project for the Bing Speech Recognition Control.
This document assumes you have created a SpeechRecognizer object and UI elements to support it, as described in How to: Add the Bing Speech Recognition Control to an application with the SpeechRecognizerUx class and How to: Add the Bing Speech Recognition Control to an application with a custom UI.
When the SpeechRecognizer.RecognizeSpeechToTextAsync() method completes, it returns a SpeechRecognitionResult object. This object contains the result text, a TextConfidence property that gives the estimated accuracy of that text, and a list of alternate results available through the GetAlternates(int) method. The SpeechRecognizer also raises three events that provide additional information: AudioCaptureStateChanged, which identifies the stages of the speech recognition process; AudioLevelChanged, which tracks the current audio input volume; and RecognizerResultReceived, which reports possible results identified by the speech recognition web service. For more information about using these events, see How to: Add the Bing Speech Recognition Control to an application with a custom UI.
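To make the shape of a result concrete, here is a minimal TypeScript sketch. The SpeechRecognitionResult interface below is a hypothetical stand-in that mirrors only the members described above, camelCased as in JavaScript; it is not the control's actual type definition. The string confidence values and the assumption that the integer passed to getAlternates caps the number of alternates returned are both illustrative guesses.

```typescript
// Hypothetical stand-ins for the members described above (camelCased as in
// JavaScript). These are NOT the control's actual type definitions.
type Confidence = "High" | "Medium" | "Low" | "Rejected"; // assumed values

interface SpeechRecognitionResult {
  text: string;               // final result text
  textConfidence: Confidence; // estimated accuracy of `text`
  // Assumption: the integer argument caps how many alternates are returned.
  getAlternates(count: number): SpeechRecognitionResult[];
}

// Collects the three pieces of information a result carries.
function summarize(result: SpeechRecognitionResult, maxAlternates: number) {
  return {
    text: result.text,
    confidence: result.textConfidence,
    alternates: result.getAlternates(maxAlternates).map((r) => r.text),
  };
}

// Usage with hand-built results in place of RecognizeSpeechToTextAsync().
// Every result draws its alternates from the same list, best result first.
const texts: [string, Confidence][] = [
  ["recognize speech", "High"],
  ["wreck a nice beach", "Low"],
];
const results: SpeechRecognitionResult[] = texts.map(([text, textConfidence]) => ({
  text,
  textConfidence,
  getAlternates: (count: number) => results.slice(0, count),
}));
console.log(summarize(results[0], 5));
```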
The final result text from a speech recognition session is stored in the SpeechRecognitionResult.Text property; it is the result that the SpeechRecognizer judges most likely to be accurate. In addition, the RecognizerResultReceived event provides intermediate results through the SpeechRecognitionResultReceivedEventArgs.Text property.
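One way to keep intermediate and final text from competing for the same display is a small state holder, sketched below. ResultDisplay is a hypothetical class, not part of the control; in a real app its methods would be called from a RecognizerResultReceived handler and from the completion of RecognizeSpeechToTextAsync(), and it would update a TextBox instead of a field. Ignoring intermediate results that arrive after the session ends is a design choice of this sketch.

```typescript
// Minimal display model: intermediate text is shown until a final result
// (or an error message) replaces it. `ResultDisplay` is hypothetical.
class ResultDisplay {
  text = "";
  private finished = false;

  // Call from a RecognizerResultReceived handler with the intermediate text.
  onIntermediate(partial: string): void {
    // Design choice: ignore late intermediate results once the session ends.
    if (!this.finished) this.text = partial;
  }

  // Call when RecognizeSpeechToTextAsync() completes with the final text.
  onFinal(finalText: string): void {
    this.text = finalText;
    this.finished = true;
  }

  // Call when recognition fails; the message replaces any partial text.
  onError(message: string): void {
    this.text = message;
    this.finished = true;
  }
}

const display = new ResultDisplay();
display.onIntermediate("hello");
display.onIntermediate("hello wor");
display.onFinal("hello world");
display.onIntermediate("hello w"); // arrives late, so it is ignored
console.log(display.text); // prints: hello world
```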
Text confidence, available through the SpeechRecognitionResult.TextConfidence property, indicates the estimated accuracy of the result text. Confidence is returned as a SpeechRecognitionConfidence enumeration value.
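A confidence value is most useful when the UI acts on it. The sketch below assumes the enumeration has the members High, Medium, Low, and Rejected; check the SpeechRecognitionConfidence reference for the actual members before relying on them. The rule that anything below High should prompt the user to confirm is a hypothetical policy, not behavior of the control.

```typescript
// Assumed members of the SpeechRecognitionConfidence enumeration; verify them
// against the enumeration reference before use.
enum SpeechRecognitionConfidence {
  High,
  Medium,
  Low,
  Rejected,
}

// Short UI labels for each confidence level.
const LABELS: Record<SpeechRecognitionConfidence, string> = {
  [SpeechRecognitionConfidence.High]: "high confidence",
  [SpeechRecognitionConfidence.Medium]: "medium confidence",
  [SpeechRecognitionConfidence.Low]: "low confidence",
  [SpeechRecognitionConfidence.Rejected]: "not recognized",
};

// Returns a label plus a flag telling the app whether to ask the user to
// confirm the text (hypothetical policy: confirm anything below High).
function describeConfidence(c: SpeechRecognitionConfidence) {
  return { label: LABELS[c], confirm: c !== SpeechRecognitionConfidence.High };
}

console.log(describeConfidence(SpeechRecognitionConfidence.Medium));
```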
The list of alternates is an array of SpeechRecognitionResult objects, ordered by confidence, with the final result as item[0]. Calling GetAlternates(int) on any result in the array returns the same array.
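Because the array is ordered by confidence and starts with the final result, building list entries from it takes little more than a map. In this sketch AlternateResult is a hypothetical minimal shape with only the two members it reads, and the string confidence field stands in for a SpeechRecognitionConfidence value. The optional skipFinal flag, also hypothetical, omits item[0] when the final result is already shown elsewhere.

```typescript
// Hypothetical minimal result shape; only the members this sketch reads.
interface AlternateResult {
  text: string;
  textConfidence: string; // stand-in for a SpeechRecognitionConfidence value
}

// Builds list entries from the alternates array. Entry 0 always restates the
// final result because the array is ordered by confidence; pass skipFinal to
// list only the other candidates. Indexes refer to positions in the array.
function buildChooserEntries(alternates: AlternateResult[], skipFinal = false): string[] {
  const start = skipFinal ? 1 : 0;
  return alternates
    .slice(start)
    .map((alt, i) => `${start + i}: ${alt.text} (${alt.textConfidence})`);
}

const alternates: AlternateResult[] = [
  { text: "recognize speech", textConfidence: "High" },
  { text: "wreck a nice beach", textConfidence: "Low" },
];
console.log(buildChooserEntries(alternates, true));
```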
The following example starts a speech recognition session and displays the result text and confidence in a TextBox named ResultText. It then lists alternate results and their confidence in a ListBox named ResultChooser. When a user selects an item from the list, the selected item is then displayed in ResultText. Intermediate results are shown in ResultText until they are overwritten by the final result or by an error message.
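The display-and-select flow described in the previous paragraph can be modeled without any UI framework, as a rough sketch. ResultPresenter and its fields are hypothetical plain-object stand-ins for the ResultText TextBox and the ResultChooser ListBox. The guard in onSelectionChanged reflects the fact that a XAML ListBox reports a SelectedIndex of -1 when nothing is selected.

```typescript
// Hypothetical plain-object model of the ResultText and ResultChooser controls.
interface Alternate {
  text: string;
  textConfidence: string; // stand-in for a SpeechRecognitionConfidence value
}

class ResultPresenter {
  resultText = "";
  chooserItems: Alternate[] = [];

  // Shows the final result with its confidence and fills the chooser.
  showResult(alternates: Alternate[]): void {
    this.chooserItems = alternates;
    const best = alternates[0]; // index 0 holds the final result
    this.resultText = best ? `${best.text} (${best.textConfidence})` : "";
  }

  // Mirrors a SelectionChanged handler; -1 (no selection) leaves the text alone.
  onSelectionChanged(selectedIndex: number): void {
    const chosen = this.chooserItems[selectedIndex];
    if (chosen) this.resultText = chosen.text;
  }
}

const presenter = new ResultPresenter();
presenter.showResult([
  { text: "recognize speech", textConfidence: "High" },
  { text: "wreck a nice beach", textConfidence: "Low" },
]);
presenter.onSelectionChanged(1);
console.log(presenter.resultText);
```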
Caution
When collecting speech results or intermediate results in a JavaScript application, quiet or unclear speech may cause the recognizeSpeechToTextAsync() method to return an error object in place of result text. To maintain smooth program flow, verify that the result text is a string before attempting to read it. For more information, see How to: Add the Bing Speech Recognition Control to an application with a custom UI.
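The check recommended in the caution can be written as a small type guard. In this sketch, raw stands for whatever the result's text property holds; the shape of the error object is deliberately not assumed, and the fallback message is a hypothetical placeholder.

```typescript
// Returns the result text if it is a string; otherwise returns a fallback.
// `raw` is typed `unknown` because, per the caution, it may hold an error
// object instead of text. The error object's shape is not assumed here.
function readResultText(raw: unknown, fallback: string): string {
  return typeof raw === "string" ? raw : fallback;
}

// Usage: a string passes through; anything else yields the fallback message.
console.log(readResultText("turn on the lights", "Please try again."));
console.log(readResultText({ message: "speech too quiet" }, "Please try again."));
```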
Because speaking styles vary, it is usually a good idea to give users options when the result text is incorrect, such as choosing from a list of alternates or accepting or rejecting a given result. You can also provide guidance in the SpeechRecognizerUx.Tips property or elsewhere in the UI to help users phrase their speech in ways that are more likely to be recognized correctly. This is particularly important if your application responds to specific keywords, phrases, or syntax in user speech.
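For an application that responds to keywords, the alternates can serve as a second chance when the final result is misheard. The sketch below assumes a hypothetical command vocabulary and takes the candidate texts in the order returned by GetAlternates(int), final result first. If no candidate matches, the app would fall back to asking the user, for example by showing the alternates list.

```typescript
// Hypothetical command vocabulary for an app that responds to keywords.
const COMMANDS = ["play", "pause", "next"];

// Checks the final result first, then the lower-ranked alternates, and
// returns the first known command found as a whole word (case-insensitive).
function findCommand(candidates: string[], commands: string[] = COMMANDS): string | undefined {
  for (const text of candidates) {
    const words = text.toLowerCase().split(/\s+/);
    for (const command of commands) {
      if (words.indexOf(command) !== -1) return command;
    }
  }
  return undefined; // no match: prompt the user to choose or repeat
}

// candidates[0] is the final result; the rest are alternates by confidence.
console.log(findCommand(["neck strack", "next track"]));
```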
The Bing Speech Recognition Control is optimized for sessions of two sentences or fewer. If your application will be used to dictate longer messages or documents, consider designing your UI to encourage users to speak in segments of this length, with short pauses between segments so that each one can be interpreted.
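A longer message can be assembled from a series of short sessions. DictationBuffer below is a hypothetical helper, not part of the control: the app would append each session's final result text as it arrives. The undoLast method illustrates one way to let users reject a misrecognized segment without repeating the whole message.

```typescript
// Hypothetical helper that assembles longer text from short sessions.
class DictationBuffer {
  private segments: string[] = [];

  // Adds one session's final text; whitespace-only results are skipped.
  append(sessionText: string): void {
    const trimmed = sessionText.trim();
    if (trimmed.length > 0) this.segments.push(trimmed);
  }

  // Discards the most recent segment, e.g. when the user rejects it.
  undoLast(): void {
    this.segments.pop();
  }

  // Joins the accepted segments with single spaces.
  toString(): string {
    return this.segments.join(" ");
  }
}

const buffer = new DictationBuffer();
buffer.append("Dear Sam,");
buffer.append("The meeting moved to Friday.");
console.log(buffer.toString());
```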
Depending on the intended context of your application, you may want to use speech synthesis, also known as text-to-speech (TTS), to communicate with your users instead of onscreen text. For information about speech synthesis in Windows 8.1, see the Windows.Media.SpeechSynthesis documentation.
