Speech Recognition

Microsoft Robotics Developer Studio

Speech recognition (SR) converts spoken words to written text, so it can be used to build user interfaces that accept spoken input. The Speech Recognizer service enables you to add speech recognition support to your application. Speech recognition requires a special type of software called an SR engine. The SR engine may be installed with the operating system or later with other software. Speech-enabled packages, such as word processors and web browsers, may install their own engines or use existing ones. Additional engines are available from third-party manufacturers. Engines are typically designed to support only a specific language and may also target a certain vocabulary, for example one specializing in medical or legal terminology.

Note that speech recognition is not available on all versions of Windows. Before you attempt to use the Speech Recognizer service, use the Windows Control Panel on your PC to confirm that a compatible (SAPI) speech recognition engine is installed. Then use the Help or user documentation provided with the engine to make sure that it is properly configured and working.

You will also need a microphone or some other sound input device to receive the sound. In general, the microphone should be a high quality device with noise filters built in. The speech recognition accuracy is directly related to the quality of the input. The recognition rate will be significantly lower or perhaps even unacceptable with a poor microphone.

This service also requires the .NET 3.0 (or later) runtime, which may be available from the Microsoft website.

 

Operations

The Speech Recognizer service supports the following requests and notifications.

Operation: Description
Get: Returns the entire state of the Speech Recognizer service.
InsertGrammarEntry: Inserts the specified entry (or entries) of the supplied grammar into the current grammar dictionary. If any of the entries already exist, a Fault is returned and the whole operation fails without modifying the current dictionary.
UpdateGrammarEntry: Updates entries that already exist in the current grammar dictionary with the supplied grammar entries. If some entries in the supplied grammar do not exist in the current dictionary, no Fault is returned. Instead, only the existing entries are updated.
UpsertGrammarEntry: Inserts entries from the supplied grammar into the current dictionary if they do not exist yet, and updates entries that already exist with the entries from the supplied grammar.
DeleteGrammarEntry: Deletes those entries from the current grammar dictionary whose keys are equal to one of the supplied grammar entries. If a key from the supplied grammar entries does not exist in the current dictionary, no Fault is returned, but any matching entries are deleted.
SetSrgsGrammarFile: Sets the grammar type to SRGS file and tries to load the specified file, which has to reside in your application's \Store folder. If loading the file fails, a Fault is returned and the speech recognizer returns to the state it was in before it processed this request. SRGS grammars require Windows 7 and will not work with Windows Server 2003.
EmulateRecognize: Makes the SR engine emulate speech input by using text (a string) instead of audio. This is mostly used for testing and debugging.
Replace: Configures the Speech Recognizer service, or indicates that the service's configuration has been changed.
SpeechDetected: Indicates that speech (audio) has been detected and is being processed.
SpeechRecognized: Indicates that speech has been recognized.
SpeechRecognitionRejected: Indicates that speech was detected, but not recognized as one of the words or phrases in the current grammar dictionary. The duration of the speech is available as DurationInTicks.
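
The differences between the four grammar entry operations are easy to confuse. The following Python sketch models the behavior described above with an ordinary dictionary that maps phrases to semantic tags. It is an illustration only, not the service's implementation: the function names and the GrammarFault exception are hypothetical stand-ins for the real DSS operations and their Fault responses.

```python
class GrammarFault(Exception):
    """Hypothetical stand-in for the Fault returned by the service."""


def insert_entries(current, supplied):
    """InsertGrammarEntry: all-or-nothing insert; fails if any key exists."""
    duplicates = [phrase for phrase in supplied if phrase in current]
    if duplicates:
        # The whole operation fails and the dictionary is left untouched.
        raise GrammarFault("entries already exist: %s" % ", ".join(duplicates))
    current.update(supplied)


def update_entries(current, supplied):
    """UpdateGrammarEntry: update existing keys, silently skip unknown ones."""
    for phrase, tag in supplied.items():
        if phrase in current:
            current[phrase] = tag


def upsert_entries(current, supplied):
    """UpsertGrammarEntry: insert new keys and overwrite existing ones."""
    current.update(supplied)


def delete_entries(current, supplied_keys):
    """DeleteGrammarEntry: remove matching keys, ignore keys that are absent."""
    for phrase in supplied_keys:
        current.pop(phrase, None)
```

For example, calling insert_entries on a dictionary that already contains "Forward" with a supplied grammar containing both "Forward" and "Left" raises the fault and adds neither entry, whereas upsert_entries with the same input would add "Left" and overwrite "Forward".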

To support SR, you define a grammar (the words and phrases to be recognized) and then use the notifications provided by the service to determine what the SR engine recognized as the spoken input. The Speech Recognizer service supports simple dictionary-style grammars as well as W3C SRGS grammars.

Note that you cannot send the service's grammar operations as requests from VPL, because they require a special data structure that VPL cannot easily support. You can, however, receive them as notifications.

To define which type of grammar the Speech Recognizer service uses, set the state of the service either by setting its initial configuration in the Properties window (set Configuration to Set initial configuration) or by sending a Replace request or a SetSrgsGrammarFile request.

The service's initial state includes the following properties:

Name (Type): Description
IgnoreAudioInput (Boolean): Specifies whether the service ignores audio (spoken) input. The service listens for audio input when this is set to false. Setting it to true may be useful for turning off the SR engine temporarily or when using emulated recognition.
GrammarType (GrammarType): Specifies the type of grammar the SR engine will use, either a simple Dictionary grammar or an SRGS grammar.
SrgsFileLocation (string): Specifies the SRGS grammar file to be loaded (only used if you set GrammarType to SRGS).

Setting GrammarType to Dictionary (written as DictionaryStyle in the configuration XML) configures the service to use a simple dictionary-style grammar. A dictionary-style grammar is a list of entries, each consisting of a set of words for the speech engine to listen for and an optional semantic tag that represents that recognition. For example, you might define an entry with the words "Tell me the time" and the semantic tag "TimeQuery".

To create a simple dictionary-style grammar, define it in the service's configuration XML file (SpeechRecognizer.config.xml). This file is created automatically for your project if you choose the Set initial configuration option. After you save your project, you can open this file in any XML editor (including Windows Notepad), add entries for your grammar, and save it. The file must be saved back to the same location. It is loaded when your project runs.

For each entry, add an opening and closing Elem tag containing two string elements: the first for the words the SR engine should listen for and the second for the optional semantic tag. You can also use the SpeechRecognizerGui service to generate a Web page that enables you to enter and save a simple dictionary grammar file. For further details, see the information on the SpeechRecognizerGui service.

The following is an example of a simple dictionary-style grammar file:

<?xml version="1.0" encoding="utf-8"?>
<SpeechRecognizerState xmlns="http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html">
  <DictionaryGrammar>
    <Elem>
      <string xmlns="">Backward</string>
      <string xmlns="">Backward</string>
    </Elem>
    <Elem>
      <string xmlns="">Follow me</string>
      <string xmlns="">FollowMe</string>
    </Elem>
    <Elem>
      <string xmlns="">Forward</string>
      <string xmlns="">Forward</string>
    </Elem>
    <Elem>
      <string xmlns="">Left</string>
      <string xmlns="">Left</string>
    </Elem>
    <Elem>
      <string xmlns="">Right</string>
      <string xmlns="">Right</string>
    </Elem>
    <Elem>
      <string xmlns="">Stop moving</string>
      <string xmlns="">Stop</string>
    </Elem>
  </DictionaryGrammar>
  <IgnoreAudioInput>false</IgnoreAudioInput>
  <GrammarType>DictionaryStyle</GrammarType>
</SpeechRecognizerState>
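
When editing this file by hand, it can help to check that it still parses and that each phrase maps to the intended semantic tag. The following Python sketch reads the dictionary grammar from configuration text in the format shown above. It is an assumption-laden helper, not part of the product: the function name load_dictionary_grammar and the shortened SAMPLE text are invented for illustration, and only the element names and namespace are taken from the example file.

```python
import xml.etree.ElementTree as ET

# Namespace of the SpeechRecognizerState root element, as in the example file.
NS = {"sr": "http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html"}

# Shortened sample in the same format as the example file (illustrative only).
SAMPLE = """
<SpeechRecognizerState xmlns="http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html">
  <DictionaryGrammar>
    <Elem>
      <string xmlns="">Follow me</string>
      <string xmlns="">FollowMe</string>
    </Elem>
    <Elem>
      <string xmlns="">Stop moving</string>
      <string xmlns="">Stop</string>
    </Elem>
  </DictionaryGrammar>
  <IgnoreAudioInput>false</IgnoreAudioInput>
  <GrammarType>DictionaryStyle</GrammarType>
</SpeechRecognizerState>
"""


def load_dictionary_grammar(xml_text):
    """Return a dict mapping each phrase to its semantic tag ('' if absent)."""
    root = ET.fromstring(xml_text)
    grammar = {}
    # Elem inherits the root namespace; the string children reset it (xmlns="").
    for elem in root.findall("sr:DictionaryGrammar/sr:Elem", NS):
        strings = [s.text or "" for s in elem.findall("string")]
        if not strings:
            continue  # skip an empty Elem rather than failing
        phrase = strings[0]
        tag = strings[1] if len(strings) > 1 else ""
        grammar[phrase] = tag
    return grammar


if __name__ == "__main__":
    for phrase, tag in load_dictionary_grammar(SAMPLE).items():
        print(phrase, "->", tag)
```

Note that the string elements carry xmlns="" and therefore sit outside the root namespace, which is why the sketch looks them up without a prefix.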

To use an SRGS grammar, set GrammarType to Srgs and supply the file name in SrgsFileLocation, either by setting the initial configuration properties or by using a Replace request. Using the SetSrgsGrammarFile request automatically sets GrammarType and tries to load the specified SRGS grammar file (which must be located in your application's \Store folder).

SRGS grammars are also XML files. You can create them with a simple editor or with speech tools that generate this format. Details about the format can be found at http://www.w3.org/TR/speech-grammar/.

 

Service State

You can use a Get request to return the general state of the Speech Recognizer service. However, the recognition state is provided by the SpeechDetected, SpeechRecognized, and SpeechRecognitionRejected notifications.

SpeechDetected returns StartTime (DateTime), which is the time when the SR engine detected audio input.

When the SR engine recognizes the input, a SpeechRecognized notification returns the following state:

Name (Type): Description
Confidence (float): Returns a value between 0 and 1 indicating the SR engine's rating of the certainty of correct recognition for the phrase information returned (higher is better). However, it is a relative measure of certainty and therefore may vary between recognition engines. If -1 is returned, the speech engine does not provide confidence information.
Text (string): Returns the words recognized.
Semantics (RecognizedSemanticValue): Returns the semantic value object(s), if any, of the recognized words.
DurationInTicks (long integer): Returns the duration of the utterance recognized. There are 10,000,000 ticks per second.
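
The two numeric fields need a little interpretation before they are useful. The following Python sketch converts DurationInTicks to seconds and treats a Confidence of -1 as "no information". The function names, the 0.5 threshold, and the choice to accept results that carry no confidence information are all assumptions made for illustration. Because confidence is a relative measure, a suitable threshold has to be tuned for the engine in use.

```python
TICKS_PER_SECOND = 10_000_000  # stated above: 10,000,000 ticks per second


def ticks_to_seconds(duration_in_ticks):
    """Convert a DurationInTicks value to seconds."""
    return duration_in_ticks / TICKS_PER_SECOND


def usable_confidence(confidence):
    """Return the confidence, or None when the engine reports -1."""
    return None if confidence == -1 else confidence


def should_accept(confidence, threshold=0.5):
    """Decide whether to act on a recognition (threshold is an assumption)."""
    score = usable_confidence(confidence)
    if score is None:
        # Assumed policy: without confidence information, trust the engine.
        return True
    return score >= threshold
```

For instance, a notification with DurationInTicks of 15,000,000 describes an utterance that lasted 1.5 seconds.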

If you load an SRGS grammar you can use Semantics to access the semantic information which applies to the recognized utterance. It may also include a collection of the child semantic values of the utterance recognized.

Name (Type): Description
Children (DssDictionary): Returns the collection of (child) semantic value objects.
Confidence (float): Returns a value between 0 and 1 indicating the SR engine's rating of the certainty of correct recognition for the phrase information returned (higher is better). However, it is a relative measure of certainty and therefore may vary between recognition engines. If -1 is returned, the speech engine does not provide confidence information.
KeyName (string): Returns the key string by which this semantic value can be referenced.
TypeOfValue (RecognizedValueType): Returns the type of this semantic value.
ValueBool (Boolean): Returns the Boolean value of the semantic value.
ValueFloat (float): Returns the float value of the semantic value.
ValueInt (int): Returns the int value of the semantic value.
ValueString (string): Returns the string value of the semantic value.

To access child semantic values, use dot notation. For example, you can get the number of child semantic values by using Semantics.Children.Count. To access the values of the children, use their grammar rule name. For example, if you have a number of cities listed under a rule called "Destination", and the recognition matched a destination, you could access the confidence rating of the destination match using Semantics.Children["Destination"].Confidence and its value using the property that matches its type, for example Semantics.Children["Destination"].ValueString.
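
The shape of this data can be pictured as a small tree. The following Python sketch mirrors a few of the properties listed above in a stand-in class and shows a lookup that tolerates a rule that did not match. It is a model for illustration only: the SemanticValue class, its snake_case field names, the child_string helper, and the sample values are hypothetical and are not the service's types.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticValue:
    """Hypothetical stand-in mirroring part of RecognizedSemanticValue."""
    key_name: str
    confidence: float = -1.0          # -1 means no confidence information
    value_string: str = ""
    children: dict = field(default_factory=dict)


def child_string(semantics, rule_name):
    """Return the string value of a child rule, or None if it did not match."""
    child = semantics.children.get(rule_name)
    return None if child is None else child.value_string


# Sample tree: an utterance that matched a "Destination" rule (invented data).
destination = SemanticValue("Destination", 0.87, "Seattle")
semantics = SemanticValue("root", 0.9, children={"Destination": destination})

if __name__ == "__main__":
    print(len(semantics.children))                        # number of children
    print(semantics.children["Destination"].confidence)   # child confidence
    print(child_string(semantics, "Destination"))         # child string value
```

Checking for a missing child before reading its value avoids errors when an utterance matches the grammar without matching that particular rule.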

If the SR engine fails to recognize the input as matching anything in its grammar, then it instead generates a SpeechRecognitionRejected notification, returning StartTime (DateTime) and DurationInTicks (long integer).

 

Speech Recognizer Gui

The Speech Recognizer Gui service is a companion service that you can use with the Speech Recognizer service. Including the Speech Recognizer Gui service in your project enables you to enter a simple dictionary-style grammar or to upload SRGS (Speech Recognition Grammar Specification) grammar files through an HTML page.

To access the Speech Recognizer Gui service in VPL, drag a copy of the service block into your diagram. It does not require any connections, and it will start up when you run the diagram. You can also optionally start an instance of the Speech Recognizer Gui once you have a DSS node running by using a web browser and going to the Control Panel page. Starting the service will automatically attempt to load the default SR engine.

Once the Speech Recognizer Gui service is running, browse to the page for the service. To do this, start your browser and enter http://localhost:50000 (50000 is the default port; if you run services on a different port, use that instead). Then click Service Directory in the left column to display the list of running services. You should find the entry /speechrecognizergui in the list. Click it to see a page like the figure below.

Speech Recognizer Gui - Service page

At the bottom of the Speech Recognizer Gui page, you can either enter a simple dictionary-style grammar or load an SRGS grammar. Click Save to use the grammar. This creates a copy of the grammar file in the \Store folder. If you chose to create a Dictionary grammar, the Save command creates a SpeechRecognizer.config.xml file in the \Store folder. If you chose to load an SRGS grammar file, the file you browse to is copied to \Store.

Note that if you create a grammar using the Speech Recognizer Gui service and do not explicitly configure the Speech Recognizer service to load a grammar file, the Speech Recognizer service will automatically attempt to load and use the grammar file you created.

The Speech Recognizer Gui service page also displays the notifications generated by the SR engine, such as speech detected or speech recognized, in a scrolling area that can be cleared. Note that the SR engine only recognizes words and phrases that are in its grammar. If the grammar is empty, nothing will be recognized.

The Speech Recognizer Gui service is not designed to be sent requests or to issue notifications. It is only intended to be used via a Web browser to build dictionary-style grammars and test speech recognition.

© 2012 Microsoft Corporation. All Rights Reserved.
