Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

SRGS XML Grammar Format Overview

The Microsoft Speech Platform supports XML-format grammars authored in accordance with the Speech Recognition Grammar Specification (SRGS) Version 1.0. The following table summarizes the most commonly used elements in an SRGS XML grammar.

| Element | Description |
| --- | --- |
| XML Declaration | Specifies the XML version number and, optionally, the character encoding. This header must appear on the first line of all XML documents. |
| grammar element | The highest-level container for an XML grammar definition. Specifies properties of the grammar, such as language and semantic format. |
| rule element | Contains text or XML elements that define what speakers can say, and the order in which they can say it. Every grammar must have at least one rule element. |
| item element | Specifies a word or other entity that can be spoken, such as content in token elements, a ruleref element, a tag element, or any logical combination of these. |
| one-of element | Specifies a set of alternative words or phrases, any one of which a speaker can say to produce a match. Each alternative must be enclosed within an item element. |
| ruleref element | Specifies a reference by the containing rule to another rule, either in the same grammar or in an external grammar. |
| token element | Contains a string that a speech recognizer can use for recognition, and optionally specifies the display form of the string and the precise pronunciation that will trigger recognition. |
| tag element | Contains semantic information, either as a string or as ECMAScript (JavaScript, JScript), that is returned as additional information when an element or series of elements is recognized. |

Note: The Speech Platform does not support SRGS grammars in Augmented Backus-Naur Form (ABNF).

Example

The following example grammar uses the elements described above and illustrates the structure of an SRGS grammar. This grammar recognizes phrases such as "The warrior's name is Klhtr" and "The warrior's name is Eanor".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="warriors" xml:lang="en-US"
         tag-format="semantics/1.0" xml:base="https://www.contoso.com/"
         xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
         sapi:alphabet="x-microsoft-ups">

  <!-- Root rule: a fixed carrier phrase followed by one of the names below. -->
  <rule id="warriors" scope="public">
    <item> The warrior's name is </item>
    <ruleref uri="#warriorNames" />
    <tag> out=rules.latest(); </tag>
  </rule>

  <!-- Alternatives, each with an explicit pronunciation. -->
  <rule id="warriorNames">
    <one-of>
      <item><token sapi:pron="K L EH . S1 T AA R"> Klhtr </token></item>
      <item><token sapi:pron="S1 I . AX . N O R"> Eanor </token></item>
      <item><token sapi:pron="P UH N . S1 T AA . R IH K"> Puntahrik </token></item>
    </one-of>
  </rule>

</grammar>
```

For more information about the elements and attributes of SRGS grammars and their support by the Microsoft Speech Platform, see SRGS Grammar XML Reference (Microsoft.Speech). Also see Introduction to XML Grammar Elements for examples of how to define recognizable phrases in a grammar.

The purpose of grammars

Grammars created using SRGS XML provide the following benefits to a speech application:

  • Improve recognition accuracy by telling the speech recognition engine which words and phrases to expect.
  • Improve maintainability of textual grammars by providing constructs for reusable text components (internal and external rule references), phrase lists, and string and numeric identifiers.
  • Improve translation of recognized speech into application actions by attaching "semantic tags" (property name and value associations) to words and phrases declared inside the grammar.
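
As a rough illustration of the second and third points, the sketch below keeps a phrase list in its own rule and attaches a semantic value to each alternative. The rule names, phrases, and property values are hypothetical and chosen only for this example. The tag syntax assumes the `semantics/1.0` tag format used in the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="playCommand" xml:lang="en-US"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Hypothetical root rule: matches "play jazz", "play some rock", and so on. -->
  <rule id="playCommand" scope="public">
    <item> play </item>
    <item repeat="0-1"> some </item>
    <ruleref uri="#genre" />
    <!-- Store the value returned by the referenced rule in a named property. -->
    <tag> out.genre = rules.latest(); </tag>
  </rule>

  <!-- Reusable phrase list: each alternative returns a stable string identifier. -->
  <rule id="genre">
    <one-of>
      <item> jazz <tag> out = "JAZZ"; </tag> </item>
      <item> rock <tag> out = "ROCK"; </tag> </item>
      <item> classical <tag> out = "CLASSICAL"; </tag> </item>
    </one-of>
  </rule>

</grammar>
```

With a grammar along these lines, an application could branch on the `genre` property of the semantic result rather than parsing the recognized text. The `genre` rule could also be moved to a separate grammar file and referenced by URI from several grammars.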

The XML source of an SRGS grammar is compiled into a binary grammar format, which is the format the Speech Platform uses at application run time.
