Microsoft Speech Platform

SRGS XML Grammar Format Overview

The Microsoft Speech Platform supports XML-format grammars authored in accordance with the Speech Recognition Grammar Specification (SRGS) Version 1.0. The following is a summary of the most commonly used elements in an SRGS XML grammar. Links in the table lead to more information about each element.

Element	Description
XML Declaration	Specifies the XML version number and, optionally, the character encoding. This header must appear on the first line of all XML documents.
grammar element	The highest level container for an XML grammar definition. Specifies properties of the grammar, such as language and semantic format.
rule element	Contains text or XML elements that define what speakers can say, and the order in which they can say it. Every grammar must have at least one rule element.
item element	Specifies a word or other entity that can be spoken, such as content in token elements, a ruleref element, a tag element, or any logical combination of these.
one-of element	Specifies a set of alternative words or phrases, any one of which can match what the user says. Each alternative word or phrase must be enclosed within an item element.
ruleref element	Specifies a reference by the containing rule to another rule, either in the same grammar or in an external grammar.
token element	Contains a string that a speech recognizer can use for recognition and optionally specifies the display form of the string and the precise pronunciation that will trigger recognition.
tag element	Contains semantic information, either as a string or as ECMAScript (JavaScript, JScript), which returns additional information when an element or series of elements is recognized.

Note: The Speech Platform does not support SRGS grammars in Augmented Backus-Naur Form (ABNF).

Example

The following example grammar uses the elements described above and illustrates the structure of an SRGS grammar. This grammar recognizes phrases such as "The warrior's name is Klhtr" and "The warrior's name is Eanor".

<?xml version="1.0" encoding="UTF-8"?>
<grammar
    version="1.0" mode="voice" root="warriors"
    xml:lang="en-US" tag-format="semantics/1.0"
    xml:base="https://www.contoso.com/"
    xmlns="http://www.w3.org/2001/06/grammar"
    xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
    sapi:alphabet="x-microsoft-ups">

  <!-- Root rule: a fixed carrier phrase followed by one of the names below. -->
  <rule id="warriors" scope="public">
    <item> The warrior's name is </item>
    <ruleref uri="#warriorNames" />
    <!-- Return the semantic value of the most recently matched rule reference. -->
    <tag> out=rules.latest(); </tag>
  </rule>

  <!-- Alternative names, each with an explicit pronunciation. -->
  <rule id="warriorNames">
    <one-of>
      <item><token sapi:pron="K L EH . S1 T AA R"> Klhtr </token></item>
      <item><token sapi:pron="S1 I . AX . N O R"> Eanor </token></item>
      <item><token sapi:pron="P UH N . S1 T AA . R IH K"> Puntahrik </token></item>
    </one-of>
  </rule>

</grammar>

For more information about the elements and attributes of SRGS grammars and their support by the Microsoft Speech Platform, see SRGS Grammar XML Reference (Microsoft.Speech). Also see Introduction to XML Grammar Elements for examples of how to define recognizable phrases in a grammar.
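The example above references a rule in the same grammar. As a rough sketch of the other case mentioned for the ruleref element, a reference to a rule in an external grammar might look like the following. The file name names.grxml is hypothetical, and the sketch assumes that the relative URI is resolved against xml:base and that the external file declares warriorNames with scope="public", since SRGS only allows public rules to be referenced from another grammar.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="warriors"
         xml:lang="en-US" tag-format="semantics/1.0"
         xml:base="https://www.contoso.com/"
         xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="warriors" scope="public">
    <item> The warrior's name is </item>
    <!-- Hypothetical external grammar file; "warriorNames" is assumed
         to be declared there with scope="public". -->
    <ruleref uri="names.grxml#warriorNames" />
    <tag> out=rules.latest(); </tag>
  </rule>

</grammar>
```

Keeping the list of names in its own file lets several grammars share it without duplicating the one-of block.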

The purpose of grammars

Grammars created using SRGS XML provide the following benefits to a speech application:

Improve recognition accuracy by restricting the words and phrases that the recognition engine should expect.
Improve the maintainability of textual grammars by providing constructs for reusable text components (internal and external rule references), phrase lists, and string and numeric identifiers.
Simplify the translation of recognized speech into application actions by attaching semantic tags (property name and value associations) to words and phrases declared in the grammar.
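To make the semantic tag benefit more concrete, here is a minimal sketch of a grammar that maps several spoken phrases onto a small set of property values. The rule names (command, action, target), the property names (action, target), and the string values are all invented for illustration; only the elements and the semantics/1.0 tag format come from the overview above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="command"
         xml:lang="en-US" tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Matches phrases such as "open the door" or "shut the window". -->
  <rule id="command" scope="public">
    <ruleref uri="#action" />
    <!-- Copy the value returned by the "action" rule into a property. -->
    <tag> out.action = rules.latest(); </tag>
    <item> the </item>
    <ruleref uri="#target" />
    <tag> out.target = rules.latest(); </tag>
  </rule>

  <!-- "close" and "shut" are different words with the same meaning,
       so both return the same value. -->
  <rule id="action">
    <one-of>
      <item> open <tag> out = "OPEN"; </tag> </item>
      <item> close <tag> out = "CLOSE"; </tag> </item>
      <item> shut <tag> out = "CLOSE"; </tag> </item>
    </one-of>
  </rule>

  <rule id="target">
    <one-of>
      <item> door <tag> out = "DOOR"; </tag> </item>
      <item> window <tag> out = "WINDOW"; </tag> </item>
    </one-of>
  </rule>

</grammar>
```

With a grammar along these lines, an application can act on the returned action and target properties instead of parsing the recognized text, so adding a synonym only requires a new item in the grammar.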

The XML source of an SRGS grammar is compiled into a binary grammar format, which is the format that the Speech Platform uses at application run time.
