Speech Recognition Sample
The SpeechRecognizer service provides different ways to use speech recognition, depending on the complexity of the project or knowledge of the user.
This sample is provided in the C# language. You can find the project files for this sample at the following location under the Microsoft Robotics Developer Studio installation folder:
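This sample covers:
- Step 1: Set the Initial State
- Step 2: Start and Run the Sample
- Step 3: Start the GUI to Configure Speech Recognition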
This sample requires a microphone to record speech commands.
The System.Speech libraries require the .NET 3.0 runtime or later. A trained speech recognition profile is optional, but it improves recognition accuracy. Note that speech recognition is not available on all platforms. In particular, it is not available on some 64-bit Intel Architecture operating systems, including Windows XP 64-bit Edition and the Enterprise and DataCenter Editions of Windows Server 2003. However, it is included with the 64-bit version of Windows Vista.
The latest version of the Speech API (SAPI) is version 5.3, which ships with Windows Vista. Only version 5.1 is available for Windows XP. As a result, Speech Recognition Grammar Specification (SRGS) grammar files are not supported on Windows XP, and an exception is generated if you try to load an SRGS file there. You can still use the simple dictionary format for grammars on Windows XP, so speech recognition remains possible.
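The platform rule above can be sketched as a small check. This is only an illustration and not part of the sample: the function names are hypothetical, and the mapping of Windows XP to NT version 5.x and Windows Vista to NT version 6.0 is general Windows knowledge rather than something stated in this document.

```python
import sys


def srgs_supported(major, minor=0):
    """Return True for Windows Vista (NT 6.0) or later.

    Assumption: Windows XP reports NT 5.1 or 5.2 and ships SAPI 5.1,
    while Vista reports NT 6.0 and ships SAPI 5.3.
    """
    return (major, minor) >= (6, 0)


def current_windows_version():
    """Return (major, minor) on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    version = sys.getwindowsversion()
    return (version.major, version.minor)


if __name__ == "__main__":
    version = current_windows_version()
    if version is None:
        print("Not running on Windows, so no grammar type can be checked.")
    elif srgs_supported(*version):
        print("Both SRGS files and dictionary-style grammars should work.")
    else:
        print("Use a dictionary-style grammar; SRGS files will not load.")
```

A check like this could run before choosing the grammar type, so that a Windows XP machine falls back to the dictionary format instead of raising an exception.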
You will also need Microsoft Internet Explorer or another conventional web browser.
The SpeechRecognizer service is the core speech recognition service; the SpeechRecognizerGui service provides the user interface for it. The core service supports simple dictionary-style grammars as well as more complex SRGS grammars, which are specified in XML.
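For readers who have not seen an SRGS grammar, the following sketch builds a minimal one from a list of phrases. It follows the W3C SRGS 1.0 XML form (a grammar element with a root rule containing one-of and item elements). The helper names, the rule id "command", and the "en-US" language tag are illustrative assumptions and do not come from the sample.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the W3C SRGS 1.0 specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs_grammar(phrases, rule_id="command", lang="en-US"):
    """Return SRGS XML whose root rule accepts any one of the phrases."""
    items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        f'<grammar version="1.0" xml:lang="{lang}" root="{rule_id}"\n'
        f'         xmlns="{SRGS_NS}">\n'
        f'  <rule id="{rule_id}" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


def phrases_in_grammar(xml_text):
    """Parse the grammar and list the text of every item element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [item.text for item in root.iter(f"{{{SRGS_NS}}}item")]


if __name__ == "__main__":
    grammar = build_srgs_grammar(["hello world", "stop"])
    print(grammar)
    print(phrases_in_grammar(grammar))
```

A grammar this small does nothing a dictionary-style grammar cannot do; SRGS becomes worthwhile when rules need to reference other rules or mark parts of a phrase as optional.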
Step 1: Set the Initial State
The SpeechRecognizer service supports the Initial State partner. The initial state is used to configure:
- What type of grammar is being used
- What the grammar looks like or where it can be loaded from
The default config file must be named "SpeechRecognizer.config.xml". It specifies the commands that will be used by the recognizer.
The config file for a dictionary-style grammar could look as follows:
<?xml version="1.0" encoding="utf-8"?>
<SpeechRecognizerState xmlns:s="http://www.w3.org/2003/05/soap-envelope"
                       xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
                       xmlns:d="http://schemas.microsoft.com/xw/2004/10/dssp.html"
                       xmlns="http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html">
  <DictionaryGrammar>
    <Elem>
      <string xmlns="">Hello world</string>
      <string xmlns="">HelloWorld</string>
    </Elem>
  </DictionaryGrammar>
  <IgnoreAudioInput>false</IgnoreAudioInput>
  <GrammarType>DictionaryStyle</GrammarType>
</SpeechRecognizerState>
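When the list of commands grows, writing the config file by hand becomes tedious. The sketch below generates the same structure from a Python dictionary. It assumes that the first string in each Elem is the spoken phrase and the second is the value associated with it; the document shows the pair but does not say so explicitly. The function name is hypothetical.

```python
from xml.sax.saxutils import escape

# Namespace declarations copied from the sample config file.
ROOT_NAMESPACES = (
    'xmlns:s="http://www.w3.org/2003/05/soap-envelope" '
    'xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" '
    'xmlns:d="http://schemas.microsoft.com/xw/2004/10/dssp.html" '
    'xmlns="http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html"'
)


def build_dictionary_config(entries):
    """Return initial-state XML for a dictionary-style grammar.

    entries maps each phrase to the string paired with it in <Elem>
    (assumed to be phrase first, associated value second).
    """
    elems = "\n".join(
        "    <Elem>\n"
        f'      <string xmlns="">{escape(phrase)}</string>\n'
        f'      <string xmlns="">{escape(value)}</string>\n'
        "    </Elem>"
        for phrase, value in entries.items()
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        f"<SpeechRecognizerState {ROOT_NAMESPACES}>\n"
        "  <DictionaryGrammar>\n"
        f"{elems}\n"
        "  </DictionaryGrammar>\n"
        "  <IgnoreAudioInput>false</IgnoreAudioInput>\n"
        "  <GrammarType>DictionaryStyle</GrammarType>\n"
        "</SpeechRecognizerState>\n"
    )


if __name__ == "__main__":
    config = build_dictionary_config({"Hello world": "HelloWorld"})
    # The file name is the default one the service expects.
    with open("SpeechRecognizer.config.xml", "w", encoding="utf-8") as f:
        f.write(config)
```

Phrases are escaped before being written, so a command containing a character such as "&" still yields well-formed XML.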
Step 2: Start and Run the Sample
Start the DSS Command Prompt from the Start > All Programs menu.
Start a DssHost node and create an instance of the service by typing the following command:
dsshost /p:50000 /m:"samples\config\SpeechRecognizer.manifest.xml"
This starts the service and you see a response like the following:
* Starting manifest load: file:///C:/.../samples/config/SpeechRecognizer.manifest.xml [03/28/2008 14:57:30][http://localhost:50000/manifestloaderclient]
* Manifest load complete [03/28/2008 14:57:31][http://localhost:50000/manifestloaderclient]
* Service started [03/28/2008 14:57:36][http://localhost:50000/speechrecognizer]
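If you start the node from a script, you may want to confirm that the service came up before continuing. The following sketch extracts the service addresses from "Service started" lines. It assumes the output always has the shape shown above (a message followed by a bracketed timestamp and a bracketed URL); the function name is hypothetical.

```python
import re

# Matches lines such as:
# * Service started [03/28/2008 14:57:36][http://localhost:50000/speechrecognizer]
SERVICE_STARTED = re.compile(
    r"\*\s+Service started \[(?P<time>[^\]]+)\]\[(?P<url>[^\]]+)\]"
)


def started_services(output):
    """Return the URL of every service reported as started in the output."""
    return [match.group("url") for match in SERVICE_STARTED.finditer(output)]


if __name__ == "__main__":
    sample = (
        "* Manifest load complete [03/28/2008 14:57:31]"
        "[http://localhost:50000/manifestloaderclient]\n"
        "* Service started [03/28/2008 14:57:36]"
        "[http://localhost:50000/speechrecognizer]\n"
    )
    for url in started_services(sample):
        print(url)
```

The address found this way is the one reported for the SpeechRecognizer service, which is useful when the node runs on a port other than 50000.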
Step 3: Start the GUI to Configure Speech Recognition
The SpeechRecognizer service itself does not expose a user interface, which makes it hard to test without writing your own service or VPL diagram. The sibling service, SpeechRecognizerGui, provides a web interface for configuring simple dictionary-style grammars or uploading more complex SRGS grammar files written in XML. Once a DSS node is running, you can start an instance of SpeechRecognizerGui by opening the Control Panel page in a web browser.
Once the SpeechRecognizerGui is running, browse to the web page for the service. This is shown in the figure below. The web interface shows events such as speech detected or speech recognized in a scrolling area that can be cleared.
At the bottom of the SpeechRecognizerGui web page you can define a grammar. Note that the SpeechRecognizer only recognizes words and phrases that are in its grammar. If the grammar is empty, then nothing will be recognized.
The screenshot above shows a simple dictionary-style grammar. This type can be used on either Windows XP or Vista. If you change the grammar type to SRGS file, speech recognition will not work on Windows XP, because Windows XP does not support this file format.
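In this sample, we covered:
- Step 1: Set the Initial State
- Step 2: Start and Run the Sample
- Step 3: Start the GUI to Configure Speech Recognition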
User Interface: Microsoft Speech API
Technology Samples: Text-To-Speech Service
VPL User Interface Services: Speech Recognition
© 2012 Microsoft Corporation. All Rights Reserved.