Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Speech Recognition Engine Configuration File Settings

With the exception of Simulator Results Analyzer, each of the Microsoft Grammar Development Tools requires a speech recognition engine configuration file at launch. You use a recognizer configuration file, hereafter referred to as "RecoConfig file", to specify which speech recognition engine the Grammar Development Tools will use, and to specify behavioral parameters for the speech recognition engine. The following describes the parameters that you can set in the RecoConfig file.

Configuration Element

The Configuration element is the root element of the RecoConfig file, and contains the following immediate child elements:

Child Element

Description

Provider

Required child element of the Configuration element. Use to specify the environment that hosts one or more speech recognition engines. Takes the type attribute.

Properties

Optional child element of the Configuration element. Takes the attributes that configure the response of the speech recognition engine to grammars that it receives.

CookieJar

Optional child element of the Configuration element. Contains one or more Cookie elements.

The following are examples of scenarios that would require you to use CookieJar:

  • A remote URI fetch is protected by a cookie-based authentication mechanism.

  • A framework requires parameters to be delivered via HTTP cookies instead of URI query parameters.

Cookie request headers with a domain attribute are used to initialize an empty cookie store associated with the current WebReco request. Any remote URI fetch made to load a grammar or audio content must include cookies from that cookie store.
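As a rough illustration of the structure described above, the following Python sketch assembles a minimal RecoConfig document with a Provider element and an optional Properties element. The element and attribute names come from this topic. The helper name build_reco_config and the attribute values ("local", "Language=409", "0.3") are illustrative assumptions, and the XML declaration or encoding that your tools expect may differ.

```python
import xml.etree.ElementTree as ET


def build_reco_config(provider_type, properties=None):
    """Return a RecoConfig document as an XML string (hypothetical helper)."""
    root = ET.Element("Configuration")
    # Provider is the only required child; type is its required attribute.
    ET.SubElement(root, "Provider", {"type": provider_type})
    # Properties is optional; leaving it out keeps the engine defaults.
    if properties:
        ET.SubElement(root, "Properties", dict(properties))
    return ET.tostring(root, encoding="unicode")


config_xml = build_reco_config(
    "local",  # assumed value; "local" is the only type named in this topic
    {"enginerequiredattributes": "Language=409", "confidencelevel": "0.3"},
)
print(config_xml)
```

Building the file through an XML library rather than string concatenation means attribute values are escaped for you.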

Provider Element

Use the Provider element to designate the environment that hosts a speech recognition engine that is specified in the Properties element. You can also specify a proxy for fetching remote grammars. The Provider element takes the following attributes and child elements:

Attribute or Child Element

Description

type

Required attribute of the Provider element. Specifies the environment that hosts the speech recognition engine specified in the Properties element.

GrammarProxy

Optional child element of the Provider element. Use it to make sure the speech recognition engine can fetch grammars through a proxy. A Provider element can contain only one GrammarProxy element. This element may not be available if the value for type is other than local.

Remarks

If the Provider element's type attribute targets an environment that modifies the syntax and schema of the Speech Recognition Grammar Specification (SRGS) Version 1.0 in its implementation of XML-format grammars, this may influence the output of one or more of the Grammar Development Tools. The Grammar Validator tool verifies syntax and schema of submitted grammars according to the derivation of SRGS used by the environment specified by the type attribute.

Properties Element

You can specify parameters for selecting a speech recognition engine and configure the behavior of a selected speech recognition engine by specifying values for the attributes in the Properties element. This is optional. The designated speech recognition engine will use its default values for the attributes of the Properties element unless you specify different values in the RecoConfig file.

Attribute

Description

enginetokenname

Optional attribute of the Properties element. Use this attribute to specify the speech recognition engine to use. The value must be the exact string that is the token name of the installed speech recognition engine, as found in the registry. This attribute may not be available if the value for the type attribute of the Provider element is other than local.

enginerequiredattributes

Optional attribute of the Properties element. A set of name/value pairs that a recognition engine must match in order to be selected, for example, to choose between different recognition vendors, or between recognizers with different abilities. The values are URI-encoded query strings, which may use either ampersands or semicolons as name/value pair separators.

Default: None; use the API endpoint’s default recognition engine.

Examples: "Language=409;CommandAndControl", "Vendor=Microsoft".

engineoptionalattributes

Optional attribute of the Properties element. A set of name/value pairs that are preferred, but not required, when selecting a recognition engine, for example, to choose between different recognition vendors, or between recognizers with different abilities. The values are URI-encoded query strings, which may use either ampersands or semicolons as name/value pair separators.

Default: None; use the API endpoint’s default recognition engine.

Examples: "Language=409;CommandAndControl", "Vendor=Microsoft".

confidencelevel

Optional attribute of the Properties element. The confidence level for speech recognition. Utterances with a confidence score below the specified value are rejected (a nomatch event is thrown). A value of "0.0" means minimum confidence is needed for recognition, and a value of "1.0" requires maximum confidence.

Range: "0.0"-"1.0".

timeout

Optional attribute of the Properties element. The length of initial silence in an audio stream before a noinput event should be returned, given in seconds or milliseconds.

Examples: "3s", "850ms", "0.7s", ".5s" and "+1.5s".

completetimeout

Optional attribute of the Properties element. The length of silence required following user speech before the speech recognizer finalizes a result (either accepting it or throwing a nomatch event), when the speech is a complete match of all active grammars.

Examples: "3s", "850ms", "0.7s", ".5s" and "+1.5s".

incompletetimeout

Optional attribute of the Properties element. The length of silence required following user speech after which a recognizer finalizes a result, when the speech is an incomplete match of all active grammars. In this case, once the timeout is triggered, the partial result is rejected (with a nomatch event). The incomplete timeout also applies when the speech prior to the silence is a complete match of an active grammar, but where it is possible to speak further and still match the grammar.

Examples: "3s", "850ms", "0.7s", ".5s" and "+1.5s".

sensitivity

Optional attribute of the Properties element. The sensitivity level of the speech recognition engine. A value of "1.0" means that it is highly sensitive to quiet input. A value of "0.0" means it is least sensitive to noise.

Range: "0.0"-"1.0".

speedvsaccuracy

Optional attribute of the Properties element. The desired balance between speed and accuracy for recognizer input. A value of "0.0" means fastest recognition. A value of "1.0" means best accuracy.

Range: "0.0"-"1.0".

maxnbest

Optional attribute of the Properties element. Maximum number of recognition results returned (the maximum size of the n-best list).

Example: "4".

requesttimeout

Optional attribute of the Properties element. The maximum amount of time that the speech recognition engine will spend to service a request.

Examples: "3s", "850ms", "0.7s", ".5s" and "+1.5s".

Note that the apparent requesttimeout intervals may be extended by network latency.

engineproperty

Optional attribute of the Properties element that specifies configuration parameters or other data for configuring the speech recognition engine. There is no standard format for this parameter, although a recommended encoding for the common case of name/value pairs is a URI encoded query string.

Default: None; equivalent to providing an empty string as a value.

Example: None; differs per recognizer implementation.

logstring

Optional attribute of the Properties element containing an arbitrary text string that describes the state of the caller’s application at the time it requested recognition. This is used purely for logging, for example, to tag a request as a "locality" recognition rather than a "listing" recognition. API callers may also wish to store other speech-specific parameters such as recognition turn in this string.

Default: None; no application-specific string is logged.

Example: "locality".
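The value formats listed for the Properties attributes above can be checked before a tool is launched. The following Python sketch is one way to do that, not part of the Grammar Development Tools. The time syntax (an optional plus sign, a number, then "s" or "ms") is inferred from the examples in this topic, and the function names time_to_ms and check_properties are hypothetical.

```python
import re

# Inferred from the examples "3s", "850ms", "0.7s", ".5s" and "+1.5s".
_TIME = re.compile(r"^\+?(\d+|\d*\.\d+)(ms|s)$")

RANGE_ATTRS = {"confidencelevel", "sensitivity", "speedvsaccuracy"}
TIME_ATTRS = {"timeout", "completetimeout", "incompletetimeout", "requesttimeout"}


def time_to_ms(text):
    """Convert a time value such as '0.7s' or '850ms' to milliseconds."""
    match = _TIME.match(text.strip())
    if match is None:
        raise ValueError(f"not a time value: {text!r}")
    number, unit = match.groups()
    return float(number) * (1000.0 if unit == "s" else 1.0)


def check_properties(attrs):
    """Return a list of problems found in a dict of Properties attributes."""
    problems = []
    for name, value in attrs.items():
        if name in RANGE_ATTRS:
            try:
                in_range = 0.0 <= float(value) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"{name}: expected 0.0-1.0, got {value!r}")
        elif name in TIME_ATTRS:
            try:
                time_to_ms(value)
            except ValueError:
                problems.append(f"{name}: expected a time value, got {value!r}")
        # Other attributes (engineproperty, logstring, ...) are free-form here.
    return problems


print(check_properties({"confidencelevel": "0.5", "timeout": "3 seconds"}))
```

Attributes that are not recognized by the checker are passed over rather than rejected, because engine-specific values have no standard format.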

Remarks

If multiple engines are installed that meet the criteria for engineoptionalattributes and enginerequiredattributes, then the engine chosen will be implementation-specific and is not guaranteed to be deterministic even when the same engines are installed.

  • A best attempt will be made to satisfy all specified engineoptionalattributes, but no error will be returned if no engine is found that satisfies them.

  • If no engine is installed which has ALL the enginerequiredattributes, then an error will be returned.
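The selection rules above can be sketched in Python as follows. This is a simplified model, not the actual engine-selection code: installed engines are represented as hypothetical dictionaries of attributes, a bare name such as CommandAndControl only has to be present, and ties are broken by list order, which is just one possible policy since the real choice is implementation-specific.

```python
from urllib.parse import parse_qsl


def parse_engine_attributes(text):
    """Split 'Language=409;Vendor=Microsoft' into a dict; bare names map to ''."""
    # Accept either separator by normalizing semicolons to ampersands first.
    return dict(parse_qsl(text.replace(";", "&"), keep_blank_values=True))


def count_matches(engine, wanted):
    """Count how many wanted attributes the engine satisfies."""
    return sum(
        1 for name, value in wanted.items()
        if name in engine and value in ("", engine[name])
    )


def select_engine(engines, required="", optional=""):
    req = parse_engine_attributes(required)
    opt = parse_engine_attributes(optional)
    # Every required attribute must match, otherwise the engine is excluded.
    candidates = [e for e in engines if count_matches(e, req) == len(req)]
    if not candidates:
        raise LookupError("no installed engine has all required attributes")
    # Optional attributes are best effort: prefer the most matches, never fail.
    return max(candidates, key=lambda e: count_matches(e, opt))


# Hypothetical installed engines, for illustration only.
installed = [
    {"Name": "EngineA", "Language": "409", "Vendor": "Microsoft"},
    {"Name": "EngineB", "Language": "40C", "Vendor": "Microsoft"},
]
chosen = select_engine(installed, required="Vendor=Microsoft", optional="Language=40C")
print(chosen["Name"])
```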

If the speech recognition engine specified in the Properties element cannot be found, the requesting tool will generate an error that includes the names of available speech recognition engines in the environment.

The values for engineproperty are passed through to the local recognizer. This attribute is particularly useful if you need to specify a setting that is not explicitly supported by the Grammar Development Tools. Some engine property names are reserved, such as “simulationmode”, which may accept a value of “emulate”, “simulateonce”, or “grammarconfusion”.

Any recognition result with a confidence score less than the confidencelevel value will not be classified as CA/In or FA/Out by the Simulator Results Analyzer. See Simulator Results Analyzer Interpreting the Results for more information.

To properly assess the confidence threshold setting for your application via the output of Simulator Results Analyzer, set confidencelevel="0". This will ensure that all recognitions are returned, regardless of confidence score. Specify the configuration file in the /RecoConfig option on the command line when you launch the Simulator tool.
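The reason for setting confidencelevel="0" can be illustrated with a small Python sketch: once every result is returned with its confidence score, any candidate threshold can be applied afterwards, whereas results rejected by the engine are no longer available for comparison. The scores below are made up for illustration, and apply_threshold is a hypothetical helper, not part of the Simulator or the Simulator Results Analyzer.

```python
def apply_threshold(scores, threshold):
    """Split confidence scores into (accepted, rejected) at a threshold."""
    # Scores below the threshold are rejected, matching confidencelevel.
    accepted = [s for s in scores if s >= threshold]
    rejected = [s for s in scores if s < threshold]
    return accepted, rejected


# Hypothetical scores, as if collected with confidencelevel="0".
all_scores = [0.12, 0.35, 0.48, 0.77, 0.91]

for candidate in (0.3, 0.5, 0.8):
    accepted, rejected = apply_threshold(all_scores, candidate)
    print(f"threshold {candidate}: {len(accepted)} accepted, {len(rejected)} rejected")
```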

For ARPA-format source grammars only, the Compile Grammar tool looks for a language declaration in the enginerequiredattributes attribute to set the language of the compiled grammar file. If no language is declared, the language of the compiled file is US English. A language declaration in the enginerequiredattributes attribute must be in the following form:

enginerequiredattributes="Language=[LanguageCode]"

The language code must be specified in three-digit hexadecimal format, without the leading “0x”. Here is an example that specifies the French language as spoken in France.

enginerequiredattributes="Language=40C"

The Microsoft Speech Platform SDK 11 accepts all valid language-country codes as values in the enginerequiredattributes attribute. For a given language code specified in the enginerequiredattributes attribute, a speech recognition engine that supports that language code must be installed for the grammar to be loaded successfully. See Locale ID (LCID) Chart for a list of language codes.
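If you have a locale ID as a number, the declaration can be produced by formatting it as uppercase hexadecimal without the "0x" prefix. The following Python sketch shows this; the function name language_attribute is hypothetical, and padding to at least three digits is an assumption based on the format described above.

```python
def language_attribute(lcid):
    """Format a locale ID as a Language=... declaration."""
    if lcid <= 0:
        raise ValueError("locale ID must be a positive integer")
    # Uppercase hex, no "0x" prefix, padded to at least three digits.
    return f"Language={lcid:03X}"


print(language_attribute(0x040C))  # French (France): Language=40C
print(language_attribute(1033))    # US English, decimal 1033: Language=409
```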

CookieJar Element

Child Element

Description

Cookie

Optional child element of the CookieJar element. The Cookie element takes the following attributes:

  • name. Required. The name of the cookie, which you can specify and which may vary from cookie to cookie.

  • value. Required. Information associated with the cookie.

  • domain. Optional. The domain in which the cookie is valid, beginning with a dot (.).

  • path. Optional. The subset of URLs to which the cookie applies.

  • secure. Optional. Indicates that a cookie should only be transmitted over a secure channel.
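The attribute rules above can be sketched in Python as a small builder that creates a CookieJar element and rejects cookies that lack the required attributes. The helper name build_cookie_jar and the cookie values are illustrative assumptions. The secure attribute is accepted but not used in the example, because this topic does not state which values it takes.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"name", "value", "domain", "path", "secure"}


def build_cookie_jar(cookies):
    """Build a CookieJar element from a list of attribute dicts."""
    jar = ET.Element("CookieJar")
    for cookie in cookies:
        missing = {"name", "value"} - set(cookie)
        if missing:
            raise ValueError(f"cookie is missing required attributes: {sorted(missing)}")
        unknown = set(cookie) - ALLOWED
        if unknown:
            raise ValueError(f"unknown cookie attributes: {sorted(unknown)}")
        # The domain, when given, begins with a dot.
        if "domain" in cookie and not cookie["domain"].startswith("."):
            raise ValueError("domain must begin with a dot (.)")
        ET.SubElement(jar, "Cookie", dict(cookie))
    return jar


# Hypothetical cookie values, for illustration only.
jar = build_cookie_jar([
    {"name": "session", "value": "abc123", "domain": ".example.com", "path": "/grammars"},
])
print(ET.tostring(jar, encoding="unicode"))
```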

See Also

Concepts

Speech Recognition Engine Configuration File Examples

Compile Grammar Input and Output File Format

Simulator Results Analyzer Interpreting the Results