How to: Add the Bing Speech Recognition Control to an application with a custom UI
This document describes how to implement speech recognition with a custom UI. To use the SpeechRecognizerUx control in your application, which implements some of the UI functionality automatically, see How to: Add the Bing Speech Recognition Control to an application with the SpeechRecognizerUx class.
Before creating speech-enabled applications, you must install the speech control from Visual Studio Gallery or from the Visual Studio Extension Manager, as described in How to: Register and install the Bing Speech Recognition Control. Then, for each project that will use the Speech control, you must complete the preparatory steps described in How to: Enable a project for the Bing Speech Recognition Control.
Speech recognition functionality depends on the SpeechRecognizer class and its methods and events. The methods start and stop speech recognition, and the events report volume levels and result data and mark the stages of the speech recognition process so that you can adjust the UI accordingly.
For a complete code example using speech recognition with a custom UI, see the SpeechRecognizer documentation. For the list of currently supported languages, see The Bing Speech Recognition Control.
Before calling the SpeechRecognizer.SpeechRecognizer(string, SpeechAuthorizationParameters) constructor, you must create a SpeechAuthorizationParameters object and populate it with your Azure Data Marketplace credentials. These credentials enable the SpeechRecognizer to contact the web service that analyzes the audio data and converts it to text.
The following code example creates a SpeechRecognizer and adds its event handlers.
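The C# sketch below assumes the Bing.Speech namespace, the ClientId and ClientSecret property names on SpeechAuthorizationParameters, placeholder credential values, and handler names and signatures that match the later examples in this topic.

```csharp
using Bing.Speech;   // namespace of the Bing Speech Recognition Control

// Recognizer field on the page so the button handlers can reach it.
private SpeechRecognizer SR;

private void InitializeSpeechRecognizer()
{
    // Assumed property names; substitute the credentials you registered
    // on the Azure Data Marketplace.
    var credentials = new SpeechAuthorizationParameters();
    credentials.ClientId = "YOUR CLIENT ID";
    credentials.ClientSecret = "YOUR CLIENT SECRET";

    // Create a recognizer for U.S. English and attach the events used
    // later in this topic.
    SR = new SpeechRecognizer("en-US", credentials);
    SR.AudioCaptureStateChanged += SR_AudioCaptureStateChanged;
    SR.AudioLevelChanged += SR_AudioLevelChanged;
    SR.RecognizerResultReceived += SR_RecognizerResultReceived;
}
```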
The most important speech recognition function is the SpeechRecognizer.RecognizeSpeechToTextAsync() method, which starts the speech recognition session, raises the events, and returns the results. You can expose this method directly through a UI element, such as a Button, or call it in response to another application event, such as loading a page or detecting a microphone.
Once a speech recognition session has been started, you can end it at any time with the RequestCancelOperation() method. This stops the session and discards any data that may have accumulated. A Cancel button is often useful for users who want to start their speech over, and you can also call this method as part of your error handling.
When the user stops speaking, the SpeechRecognizer.RecognizeSpeechToTextAsync() method detects the drop in audio input levels, stops recording, and finishes interpreting the audio data. Sometimes, especially if there is background noise, there can be a delay between the end of speech and the end of recording. Including a button to call the StopListeningAndProcessAudio() method gives your users a way to end the recording without waiting for the speech recognizer to decide they are done.
The following markup creates buttons for each of the SpeechRecognizer methods.
```xml
<!-- If your app targets Windows 8.1 or higher, use this markup. -->
<AppBarButton x:Name="SpeakButton" Icon="Microphone" Click="SpeakButton_Click"></AppBarButton>
<AppBarButton x:Name="StopButton" Icon="Stop" Click="StopButton_Click"></AppBarButton>
<AppBarButton x:Name="CancelButton" Icon="Cancel" Click="CancelButton_Click"></AppBarButton>

<!-- If your app targets Windows 8, use this markup. -->
<Button x:Name="SpeakButton" Click="SpeakButton_Click"
        Style="{StaticResource MicrophoneAppBarButtonStyle}"
        AutomationProperties.Name="Speak" />
<Button x:Name="StopButton" Click="StopButton_Click"
        Style="{StaticResource StopAppBarButtonStyle}"
        AutomationProperties.Name="Done" />
<Button x:Name="CancelButton" Click="CancelButton_Click"
        Style="{StaticResource ClosePaneAppBarButtonStyle}"
        AutomationProperties.Name="Cancel" Content="" />

<TextBlock x:Name="ResultText" />
```
| Note |
|---|
| The XAML Style attributes in Windows 8 are set to standard UI element styles available for Windows Store applications. These styles are found in Common\StandardStyles.xaml in Solution Explorer but must be uncommented before use. Setting the AutomationProperties.Name attribute adds a caption below the element, and the Content attribute specifies a character code for the character that appears in the foreground of the button. If you do not specify a caption or character code, the default values from the style definition are applied. |
The following code behind handles the click events for the buttons.
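The C# sketch below shows one way to implement those handlers; it assumes the SR recognizer field from the earlier example and the ResultText TextBlock from the markup above.

```csharp
private async void SpeakButton_Click(object sender, RoutedEventArgs e)
{
    try
    {
        // Start a recognition session and wait for the final result.
        SpeechRecognitionResult result = await SR.RecognizeSpeechToTextAsync();
        ResultText.Text = result.Text;
    }
    catch (Exception ex)
    {
        // Cancellation and recognition failures can surface as exceptions here;
        // see Handling Bing Speech Recognition data for fuller handling.
        ResultText.Text = ex.Message;
    }
}

private void StopButton_Click(object sender, RoutedEventArgs e)
{
    // Stop recording now and let the recognizer finish processing the
    // audio it has already captured.
    SR.StopListeningAndProcessAudio();
}

private void CancelButton_Click(object sender, RoutedEventArgs e)
{
    // End the session and discard any audio collected so far.
    SR.RequestCancelOperation();
}
```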
If you are working in JavaScript, check the value of result.text to make sure it is a string before assigning it to another variable. This is because quiet or unclear speech may cause the recognizeSpeechToTextAsync() method to return an error object with error code -2147467261 in place of result.text. The same error object is also passed to the intermediate results through the SpeechRecognitionResult.Text property in the SpeechRecognizer.RecognizerResultReceived event. Validating the type of the result in this way maintains program flow and gives you the opportunity to bypass or respond to a known error.
The most important SpeechRecognizer event is the AudioCaptureStateChanged event, because it tells you where you are in the speech recognition process. Use the SpeechRecognitionAudioCaptureStateChangedEventArgs.State property to access capture state information.
The following AudioCaptureStateChanged event handler and associated helper function show and hide a series of StackPanel elements (XAML) or Div elements (HTML) that correspond to different UI states.
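The C# sketch below follows that pattern; the panel names (ListeningPanel, ThinkingPanel, DefaultPanel) and the capture-state names it compares against are assumptions, so check the values of the State property in your installed version of the control.

```csharp
private async void SR_AudioCaptureStateChanged(SpeechRecognizer sender,
    SpeechRecognitionAudioCaptureStateChangedEventArgs e)
{
    // The event may arrive on a non-UI thread, so marshal to the dispatcher
    // before touching UI elements.
    await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
    {
        // The state names here are placeholders; inspect e.State in your
        // project for the actual values.
        switch (e.State.ToString())
        {
            case "Listening":
                ShowPanel(ListeningPanel);   // the user is speaking
                break;
            case "Thinking":
                ShowPanel(ThinkingPanel);    // the audio is being processed
                break;
            default:
                ShowPanel(DefaultPanel);     // idle or complete
                break;
        }
    });
}

// Helper: make one StackPanel visible and collapse the others.
// ListeningPanel, ThinkingPanel, and DefaultPanel are assumed to be
// StackPanel elements defined in the page markup.
private void ShowPanel(UIElement panelToShow)
{
    foreach (UIElement panel in new UIElement[] { ListeningPanel, ThinkingPanel, DefaultPanel })
    {
        panel.Visibility = panel == panelToShow ? Visibility.Visible : Visibility.Collapsed;
    }
}
```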
You can use the SpeechRecognizer.AudioLevelChanged event to show real-time audio levels in a variety of ways, or to advise users when they should adjust their speaking volume. The following example represents current audio levels by changing the opacity of a UI element named VolumeMeter.
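In the C# sketch below, the event argument type, its AudioLevel property, and the scaling factor are assumptions; adjust the mapping to suit your UI.

```csharp
private void SR_AudioLevelChanged(SpeechRecognizer sender,
    SpeechRecognitionAudioLevelChangedEventArgs e)
{
    // Assumed: e.AudioLevel reports the current input level. Scale it into
    // the 0.0-1.0 range so louder speech makes VolumeMeter more opaque.
    double level = Math.Abs(e.AudioLevel);
    VolumeMeter.Opacity = Math.Min(1.0, level / 50.0);
}
```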
While the speech recognizer is receiving an audio stream, it makes multiple attempts to interpret the audio data collected so far and convert it to text. At the end of each attempt, it raises the RecognizerResultReceived event. You can use this event to capture intermediate results while the speech recognizer is still processing, and you can identify the final result text by checking the SpeechRecognitionResultRecievedEventArgs.IsHypothesis property. However, the SpeechRecognizer.RecognizeSpeechToTextAsync() method returns a more complete result in the form of a SpeechRecognitionResult object.
The following example writes intermediate results to a TextBlock named IntermediateResults.
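The C# sketch below assumes the event arguments expose the recognized text directly, alongside the IsHypothesis property described above.

```csharp
private void SR_RecognizerResultReceived(SpeechRecognizer sender,
    SpeechRecognitionResultRecievedEventArgs e)
{
    // Quiet or unclear speech can leave the text empty or replace it with an
    // error value, so guard before writing the hypothesis to the UI.
    if (e.Text == null)
    {
        return;
    }

    // Intermediate guesses have IsHypothesis set to true; the final text is
    // also returned by RecognizeSpeechToTextAsync().
    IntermediateResults.Text = e.Text;
}
```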
You now have all of the pieces to put together a custom speech recognition UI. For more information on processing the results, including assessing TextConfidence and making the Alternates list available, see Handling Bing Speech Recognition data.
