Grammar Format Tags (SAPI 5.3)
Microsoft Speech API 5.3
The SAPI text grammar format is composed of XML tags, which can be structured to define the phrases that the speech recognition engine recognizes. The following document explains each tag in more detail, including sample source code, sample XML grammar snippets, and relevant application scenarios.
The XML tag descriptions are organized by XML element, where each element description contains information for relevant attributes.
XML Tags: Elements
<DEFINE>
Summary: The DEFINE tag is used for declaring a set of string identifiers for numeric values.
XML Attributes:
None
XML Parent Elements:
GRAMMAR: The container for the entire XML grammar.
XML Child Elements:
ID (1 or more required): The DEFINE tag can contain one or more ID tags, each
of which defines one string identifier.
Detailed Description:
None
XML Grammar Sample(s):
<GRAMMAR>
<DEFINE>
<ID NAME="TheNumberFive" VAL="5"/>
</DEFINE>
<!-- Note that the RULE ID attribute takes a number; "TheNumberFive" resolves to 5 -->
<RULE ID="TheNumberFive" TOPLEVEL="ACTIVE">
<P>five</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
See the ID tag.
<DICTATION>
Summary: The DICTATION tag is used in rules or phrases that need basic dictation support.
XML Attributes:
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of dictation
words that can be recognized.
The application must specify a MAX value that is greater than or equal to
the MIN value. The application can specify a pseudo-infinite maximum
by specifying INF as the MAX. The pseudo-infinite is actually 255
dictation words.
An application that needs free-form dictation, such as the subject line of
an email, should use a large MAX. Alternatively, an application that
needs to recognize a person's name may want a much smaller value,
such as 5 words.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of dictation
words that must be recognized.
If the grammar author specifies the MIN value, and the recognizer does not
meet the minimum, the rule will fail to be recognized.
A scenario where it may make sense to set a value greater than one is
an application that asks for a first and last name.
PROPID (optional, type=VT_I4): Specifies the semantic property's numeric identifier.
PROPNAME (optional): Specifies the semantic property's string identifier.
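The MIN and MAX rules described above (MIN defaults to 1, MAX defaults to MIN, INF stands for 255 words, and MAX must not be smaller than MIN) can be sketched as a small helper. This is only an illustration of those rules, not code from SAPI: ResolveDictationRange, DictationRange and kDictationInf are hypothetical names, and a missing attribute is modeled as a null pointer.

```cpp
#include <cstring>
#include <stdexcept>
#include <string>

// Pseudo-infinite maximum from the description above: INF means 255 words.
constexpr int kDictationInf = 255;

struct DictationRange {
    int min;
    int max;
};

// Hypothetical helper (not a SAPI function): resolves the MIN and MAX
// attribute strings of a DICTATION tag. Pass nullptr for a missing attribute.
DictationRange ResolveDictationRange(const char* minAttr, const char* maxAttr) {
    // MIN defaults to 1 when it is not specified.
    int min = minAttr ? std::stoi(minAttr) : 1;

    // MAX defaults to MIN; the literal "INF" maps to the pseudo-infinite value.
    int max = min;
    if (maxAttr) {
        max = (std::strcmp(maxAttr, "INF") == 0) ? kDictationInf
                                                 : std::stoi(maxAttr);
    }

    // The application must specify a MAX that is at least MIN.
    if (max < min) {
        throw std::invalid_argument("MAX must be greater than or equal to MIN");
    }
    return DictationRange{min, max};
}
```

With no attributes the helper yields the range 1 to 1, and MAX="INF" alone yields 1 to 255.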
XML Parent Elements:
LIST, L: List of phrases which can be recognized.
PHRASE, P: Phrase that must be recognized for the containing rule to be recognized.
OPT, O: Optional phrase that may be recognized.
RULE: Rule that contains phrases or text to be recognized.
XML Child Elements:
None.
Detailed Description:
The DICTATION tag is designed for applications that need to integrate command &
control and dictation support into a CFG. For example, an application may
allow the user to speak free-form dictation into a command (e.g. "save document
as our family's budget", where "our family's budget" is free-form dictation).
The application may also create a CFG which supports a set of specific phrases or words,
and also includes a single DICTATION tag in case of an unexpected user-phrase.
For example, a CFG may include a set of address book names which are known, and
if the user speaks another name, then the application prompts the user for
validation of the dictated result. Note that the SR engine's accuracy may
suffer by mixing dictation and CFG phrases together, since many words sound
similar, and a CFG is generally preferred for application development with known
words.
The grammar author can also use a special character, asterisk (*) instead of the entire
XML tag. See XML Grammar Format: Special Dictation Tag.
By using semantic properties, the application can easily retrieve the exact text that
was dictated by the speaker. To specify a semantic property for the DICTATION tag
the grammar author should specify the PROPID and/or PROPNAME attributes. The
SAPI run time will automatically set the semantic tag's starting phrase element,
allowing the application to search for the specific semantic property in the
properties hierarchy (see SPPHRASEPROPERTY.ulFirstElement). If multiple dictation
words are recognized by the SR engine (e.g. DICTATION MAX > 1), then the SAPI
run time will generate multiple semantic properties, one for each word, where
all of the properties will have the same numeric ID and/or string NAME.
If the speech recognition engine supports multiple dictation topics (e.g. spelling,
general, legal, medical, etc.), the DICTATION tag in the grammar will refer to
the topic that was selected when ISpRecoGrammar::LoadDictation was called. If the
topic was not explicitly selected, then the default SR engine dictation topic
will be loaded. Currently, it is not possible to load multiple dictation topics
inside of a single command & control grammar. Applications should create multiple
grammar objects to implement the latter scenario.
If there is ambiguity between a dictation phrase and a CFG phrase, the speech
recognition engine will typically choose the CFG phrase. Preferring CFGs over
dictation prevents dictation from automatically consuming all CFG phrases.
The speech recognition engine must support dictation inside of a CFG for the grammar
to load and activate successfully. The application can determine if an engine
supports the DICTATION tag by retrieving the SR engine's object token (see
ISpRecognizer::GetRecognizer), and then checking for the existence of the
engine attribute "DictationInCFG" (see ISpObjectToken::MatchesAttributes).
The engine can specify support for the DICTATION tag to be anywhere in the
CFG phrase (attribute value="Anywhere"), or only at the end (attribute
value="Trailing").
XML Grammar Sample(s):
<GRAMMAR>
<!-- basic command to create a self-note for the user with free-form text -->
<RULE ID="SelfNote" TOPLEVEL="ACTIVE">
<P>note to self</P>
<DICTATION MAX="INF"/>
</RULE>
<!-- command to query a name from an address book -->
<RULE ID="QueryName" TOPLEVEL="ACTIVE">
<P>list first names of all persons with last name</P>
<!-- Store only one word for the last name, more will fail command -->
<DICTATION MAX="1"/>
</RULE>
<!-- command to handle first and last names with semantic properties -->
<!-- By using semantic properties, the application can ignore all of
the text returned, except for the text associated with the dictation
tags' semantic properties "PID_FirstName" and "PID_LastName" -->
<RULE ID="SubmitName" TOPLEVEL="ACTIVE">
<P>
my first name is
<!-- Note the implicit maximum is only one word -->
<DICTATION PROPID="PID_FirstName"/>
and my last name is
<!-- Note the explicit maximum is two words -->
<DICTATION PROPID="PID_LastName" MAX="2"/>
</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To programmatically create a dictation transition (i.e. DICTATION tag) in a CFG, the application developer
can use the ISpGrammarBuilder::AddRuleTransition with a special rule handle,
called SPRULETRANS_DICTATION. For example, the following code creates a simple
command called "SendMail" which recognizes the command "send mail to DICTATION".
SPSTATEHANDLE hsSendMail;
// Create new top-level rule called "SendMail"
hr = cpRecoGrammar->GetRule(L"SendMail", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsSendMail);
// Check hr
// Create an interim state before the dictation transition
SPSTATEHANDLE hsBeforeDictation;
hr = cpRecoGrammar->CreateNewState(hsSendMail, &hsBeforeDictation);
// Check hr
// Add the command words "send mail to"
hr = cpRecoGrammar->AddWordTransition(hsSendMail, hsBeforeDictation,
L"send mail to", L" ", SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add trailing dictation transition
hr = cpRecoGrammar->AddRuleTransition(hsBeforeDictation, NULL,
SPRULETRANS_DICTATION, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
Note that the previous sample code only supports one dictation word. To support
more than one word, the code would need to build more dictation transition
states, each of which begins at the previous dictation state - effectively,
a series of consecutive single-word dictation transitions.
<GRAMMAR>
Summary: The GRAMMAR tag is the outermost container for the XML grammar definition.
XML Attributes:
LANGID (optional, type=numeric): The language identifier of the grammar.
The identifier will be compared against the
supported languages of the Speech Recognition engine. If the language is
not supported, the grammar load call will fail (e.g. ISpRecoGrammar::LoadCmdFromFile).
It is recommended that all XML grammars include the LANGID attribute to avoid the scenario
where the SR engine tries to load a grammar with an unspecified language ID, and fails due
to confusing words.
SAPI supports fuzzy language ID matching, in that the SR engine can
report that it supports the major portion of the language ID (e.g. 0x009 in 0x409),
which means the SR engine will try to load and recognize any grammar that matches the
major portion of the language ID.
LEXDELIMITER (optional): The LEXDELIMITER attribute specifies the delimiter for explicit
lexicon entries specified in the grammar.
Grammar authors are able to specify the lexicon information by using a special
sequence of characters. The sequence of characters is:
LEXDELIMITERDisplayFormLEXDELIMITERLexicalFormLEXDELIMITERPronunciation;
The default delimiter is the forward slash character "/".
See also PHRASE.
WORDTYPE (optional): The WORDTYPE attribute specifies the type of the word(s) when they are added to the
grammar.
The default value is "LEXICAL".
The value must be "LEXICAL".
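The fuzzy LANGID matching described above for the LANGID attribute can be modeled with a few bit operations. The sketch below assumes the standard Windows LANGID layout, in which the primary language occupies the low 10 bits (the same mask the PRIMARYLANGID macro applies). PrimaryLang and EngineAcceptsLang are hypothetical names, and the matching rule is an interpretation of the text above rather than SAPI's actual implementation.

```cpp
#include <cstdint>

// The primary ("major") language is stored in the low 10 bits of a LANGID.
constexpr std::uint16_t PrimaryLang(std::uint16_t langId) {
    return static_cast<std::uint16_t>(langId & 0x3FF);
}

// Hypothetical model of the matching rule: the engine accepts the grammar if
// the IDs are identical, or if the engine reports only a primary language
// (sublanguage bits all zero) that equals the grammar's primary language.
constexpr bool EngineAcceptsLang(std::uint16_t grammarLang,
                                 std::uint16_t engineLang) {
    return engineLang == grammarLang ||
           (engineLang == PrimaryLang(engineLang) &&
            PrimaryLang(engineLang) == PrimaryLang(grammarLang));
}
```

Under this model, an engine reporting 0x009 accepts grammars marked 0x409 or 0x809, while an engine reporting the full ID 0x809 does not accept a 0x409 grammar.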
XML Parent Elements:
None
XML Child Elements:
DEFINE (optional): Specifies the constant definitions for the grammar.
RULE (1 or more required): Specifies the rules, including top-level and non-top-level.
Detailed Description:
Every XML grammar must have the container tag, GRAMMAR.
XML Grammar Sample(s):
<!-- Language ID = US English -->
<GRAMMAR LANGID="409" LEXDELIMITER="|" WORDTYPE="LEXICAL">
<RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
<!-- when the user says the following pronunciation, "Hiya" will be displayed -->
<P>|Hiya|Hello|h eh l ow;</P>
</RULE>
</GRAMMAR>
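To make the delimiter sequence used in the sample above concrete (delimiter, display form, delimiter, lexical form, delimiter, pronunciation, terminated by a semicolon), the following sketch splits such an entry into its three parts. ParseLexEntry and LexEntry are hypothetical names introduced for illustration only. The grammar compiler's real parsing rules are not shown in this document, so this helper does not validate the individual fields.

```cpp
#include <cstddef>
#include <string>

struct LexEntry {
    std::string display;        // text shown to the user
    std::string lexical;        // lexical form of the word
    std::string pronunciation;  // phoneme string
};

// Hypothetical parser for "<d>Display<d>Lexical<d>Pronunciation;" where <d>
// is the LEXDELIMITER character. Returns false if the text lacks that shape.
bool ParseLexEntry(const std::string& text, char delim, LexEntry& out) {
    if (text.size() < 4 || text.front() != delim || text.back() != ';') {
        return false;
    }
    const std::size_t second = text.find(delim, 1);
    if (second == std::string::npos) return false;
    const std::size_t third = text.find(delim, second + 1);
    if (third == std::string::npos) return false;

    out.display = text.substr(1, second - 1);
    out.lexical = text.substr(second + 1, third - second - 1);
    // Everything after the third delimiter, minus the trailing semicolon.
    out.pronunciation = text.substr(third + 1, text.size() - third - 2);
    return true;
}
```

Applied to the entry in the sample with '|' as the delimiter, this yields the display form "Hiya", the lexical form "Hello", and the pronunciation "h eh l ow".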
Programmatic Equivalent:
To programmatically set the language ID of a new grammar, the application developer should
call ISpGrammarBuilder::ResetGrammar.
The application developer does not need to change the LEXDELIMITER or the WORDTYPE, since the ISpLexicon
interface can be used to modify the lexicon.
<ID>
Summary: The ID tag is used for declaring a string identifier for numeric
values.
XML Attributes:
NAME (required): The NAME attribute defines the string identifier that will be associated
with the constant value.
VAL (required, type=VT_UI4,VT_I4,VT_R4,VT_R8): The VAL attribute defines the constant
value that will be associated with the string identifier.
XML Parent Elements:
DEFINE: The container for the constant definitions.
XML Child Elements:
None
Detailed Description:
The ID tag should be used by the grammar author to make the grammar easier to read and
maintain. The grammar author can use string identifiers which succinctly explain
the use of the identifier (e.g. RID_FileNew, PVAL_MAIN_WINDOW, etc.). The grammar
compiler stores the identifiers in the binary format, and string identifiers are
typically much larger than numeric identifiers. Also, the application developer
can use a simple numeric comparison to handle rule and semantic property logic,
rather than performing a more complex string comparison.
XML Grammar Sample(s):
<GRAMMAR>
<DEFINE>
<ID NAME="RuleId_A" VAL="1"/>
<ID NAME="PropId_B" VAL="2"/>
<ID NAME="PropVal_AB" VAL="3"/>
</DEFINE>
<!-- Note that Rule ID, Phrase PROPID and VAL take numeric values. -->
<RULE ID="RuleId_A" TOPLEVEL="ACTIVE">
<P PROPID="PropId_B" VAL="PropVal_AB">five</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
The Grammar Compiler that ships in the Microsoft Speech SDK includes a command line
argument to generate a C-style header (see "-h"), which includes the programmatic
constant definitions for all of the IDs defined in the XML grammar. The application
developer can include the header file and easily use the same identifiers inside
the application logic, without needing to redefine and maintain the numeric values.
The XML Grammar Sample above would create the following C-style header file:
#define RuleId_A 1
#define PropId_B 2
#define PropVal_AB 3
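As an illustration of the numeric comparison that the ID tag enables, the sketch below dispatches on the constants from the header above. The three definitions are repeated so the example stands alone. DescribeRecognition is a hypothetical application function, and how the rule ID, property ID, and property value are obtained from a recognition result is outside the scope of this sketch.

```cpp
#include <string>

// Constants as they would appear in the generated header shown above.
#define RuleId_A   1
#define PropId_B   2
#define PropVal_AB 3

// Hypothetical application logic: handles a recognition using plain numeric
// comparisons instead of string comparisons on rule and property names.
std::string DescribeRecognition(unsigned long ruleId,
                                unsigned long propId,
                                long propValue) {
    switch (ruleId) {
    case RuleId_A:
        if (propId == PropId_B && propValue == PropVal_AB) {
            return "RuleId_A with PropVal_AB";
        }
        return "RuleId_A with another property";
    default:
        return "unknown rule";
    }
}
```

Because the grammar and the application share one set of definitions, changing a numeric value in the DEFINE block and regenerating the header keeps both sides in step.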
<LIST>, <L>
Summary: The LIST tag is used for specifying a list of phrases or transitions.
XML Attributes:
PROPID (optional, type=VT_I4): The numeric identifier that will be inherited by all
semantic properties in the child elements (e.g. phrases).
PROPNAME (optional): The string identifier that will be inherited by all semantic
properties in the child elements (e.g. phrases).
XML Parent Elements:
LIST, L: List of phrases or rules which can be recognized.
PHRASE, P: Phrase that must be recognized for the containing rule to be recognized.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
RULE: Rule that contains phrases or text to be recognized.
XML Child Elements:
RULEREF: Import, or reference, another rule's contents
PHRASE, P: Specifies text or leaf nodes.
LIST, L: Specifies a list of phrases or transitions for recognition.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description:
The LIST tag is a quick and efficient way to support lists of phrases or text. Instead
of creating separate rules for each piece of text, the LIST tag can be used
where its children are the phrase, rule reference, or other tags.
The grammar author can use the shorthand version of the LIST tag, the L tag.
The LIST tag is more of a virtual tag, since it does not affect the semantic property
hierarchy (LIST children are not child properties). While it allows the grammar
author to specify a string or numeric identifier, the identifier is only used
to pass on to the child element as a default property identifier.
XML Grammar Sample(s):
<GRAMMAR>
<!-- Note that rule is not top-level and is only used as a reusable component rule -->
<RULE NAME="Numbers">
<!-- The list tag includes a semantic property Id, "PID_Value" which
is inherited by all child phrase elements -->
<LIST PROPID="PID_Value">
<!-- If the user says "one" then the semantic property returned will
be the name/value pair "PID_Value"/"1" -->
<P VAL="1">one</P>
<P VAL="2">two</P>
<P VAL="3">three</P>
<P VAL="4">four</P>
<P VAL="5">five</P>
</LIST>
</RULE>
<!-- The rule contains a list of various types of transitions -->
<RULE NAME="Sampler" TOPLEVEL="ACTIVE">
<!-- the list property specifies a default property name of "TYPE_NUMBER",
which will be overridden by specific list children -->
<LIST PROPNAME="TYPE_NUMBER">
<P VAL="1">one</P>
<P VAL="2">two</P>
<P VAL="3">three</P>
<P PROPNAME="TYPE_STRING" VALSTR="FOUR">four</P>
<P PROPNAME="TYPE_NONE">five</P>
<RULEREF NAME="Numbers" PROPNAME="TYPE_RULEREF"/>
<TEXTBUFFER PROPNAME="TYPE_TEXTBUFFER"/>
<DICTATION PROPNAME="TYPE_DICTATION"/>
</LIST>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To programmatically create a list, or a set of sibling/parallel transitions, the application
needs to create a start state, then create multiple transitions out of the state. For
example, the following sample code shows how to make a list of phrases (e.g. "one",
"two", "three").
SPSTATEHANDLE hsList;
// Create new top-level rule called "List"
hr = cpRecoGrammar->GetRule(L"List", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsList);
// Check hr
// Add the word "one" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL,
L"one", L" ",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the word "two" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL,
L"two", L" ",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the word "three" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL,
L"three", L" ",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
The application developer can use similar code to create a list of rule references,
dictation, or text buffer transitions. To change the type of list item, change
the ::AddWordTransition call to ::AddRuleTransition.
<OPT>, <O>
Summary: The OPT tag is used for specifying optional text in a command phrase.
XML Attributes:
DISP (optional): Specifies the display form of the phrase text.
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of times the user
can repeat the phrase and still be successfully recognized.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of times the user
must repeat the phrase to be successfully recognized.
PRON (optional): Specifies the pronunciation to be used by the recognizer when listening
for the text.
PROPID (optional, type=VT_I4): Specifies the numeric identifier to associate with the phrase
tag's semantic property.
PROPNAME (optional): Specifies the string identifier to associate with the phrase tag's
semantic property.
VAL (optional, type=VT_I4): Specifies the semantic property's numeric value.
VALSTR (optional): Specifies the semantic property's string value.
WEIGHT (type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The probability
that the user will speak the contents of the PHRASE tag, versus another
sibling transition or phrase.
XML Parent Elements:
RULEREF: Import, or reference, another rule's contents
PHRASE, P: Specifies text or leaf nodes.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
LIST, L: Specifies a list of phrases or transitions for recognition.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
XML Child Elements:
RULEREF: Import, or reference, another rule's contents
PHRASE, P: Specifies text or leaf nodes.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
LIST, L: Specifies a list of phrases or transitions for recognition.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description:
The OPT tag along with the PHRASE tag are the only tags that can directly
contain recognizable text.
The grammar author can use the shorthand version of the OPT tag, the O tag.
The grammar author can also specify custom word pronunciations and display
text by using the PRON and DISP attributes. For example, a grammar
might contain application or domain specific text, which has a custom
pronunciation. The author can specify the pronunciation on a specific
OPT tag to avoid the need for updating the user or application
lexicon (especially if the pronunciation is command specific).
The grammar author can also use special shorthand characters inside of the
content section of the OPT tag (e.g. dictation, wildcard, etc.). See
the XML Special Characters.
XML Grammar Sample(s):
<GRAMMAR>
<!-- Create a simple "hello world" rule -->
<!-- the second word is optional -->
<RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
<P>hello</P>
<OPT>world</OPT>
</RULE>
<!-- Create a rule that changes the pronunciation and the display
form of the phrase. When the user says "eh" the display
text will be "I don't understand". Note the user didn't
say "what". The pronunciation for "what" is specific to this
phrase tag and is not changed for the user or application
lexicon, or even other instances of "what" in the grammar -->
<RULE NAME="Question_Pron" TOPLEVEL="ACTIVE">
<P DISP="I don't understand" PRON="eh">what</P>
</RULE>
<!-- Create a phrase with an attached semantic property -->
<!-- Speaking "one two three" will return three different unique
semantic properties, with different names, and different
values -->
<!-- Speaking "one three" will return two different unique
semantic properties, with different names, and different
values -->
<!-- Speaking "one two" will return two different unique
semantic properties, with different names, and different
values -->
<!-- Speaking "one" will return one semantic property,
with a name but no value -->
<!-- Note that the number of semantic properties returned is
variable, and that the application should be designed to
handle all of the variations -->
<RULE NAME="UseProps" TOPLEVEL="ACTIVE">
<!-- named property, without value -->
<P PROPNAME="NOVALUE">one</P>
<!-- named property, with numeric value -->
<O PROPNAME="NUMBER" VAL="2">two</O>
<!-- named property, with string value -->
<O PROPNAME="STRING" VALSTR="three">three</O>
</RULE>
<!-- Create a rule for optional command prefix -->
<!-- Note that entire rule reference is optional. In cases where
there are properties associated with the rule reference, the
semantic property tree may change -->
<!-- the rule supports the phrases "play cards", "please play cards", and
"pretty please play cards" -->
<RULE NAME="PlayCard" TOPLEVEL="ACTIVE">
<O><RULEREF NAME="PLEASE"/></O>
<P>play cards</P>
</RULE>
<!-- The first word "pretty" is optional, while the second is required -->
<RULE NAME="PLEASE">
<O>pretty</O>
<P>please</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To add an optional phrase to a rule, SAPI provides an API called
ISpGrammarBuilder::AddWordTransition. The application developer can add
the optional structure as follows:
SPSTATEHANDLE hsHelloWorld;
// Create new top-level rule called "HelloWorld"
hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsHelloWorld);
// Check hr
// create an interim state
SPSTATEHANDLE hInterim;
hr = cpRecoGrammar->CreateNewState(hsHelloWorld, &hInterim);
// Check hr
// Add the command word "hello" which terminates at the interim
// state
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, hInterim,
L"hello", NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the optional command word "world"
hr = cpRecoGrammar->AddWordTransition(hInterim, NULL,
L"world", NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the epsilon transition, which means no word need be spoken
hr = cpRecoGrammar->AddWordTransition(hInterim, NULL,
NULL, NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
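The state structure built above (a word transition to an interim state, followed by either a word transition or an epsilon transition to the end of the rule) can be explored without a speech engine. The following is a toy model only: Transition, Accepts, BuildHelloWorld and kEnd are hypothetical names, not SAPI types, and an empty word stands in for the epsilon transition. The recursive matcher assumes the graph has no cycles made purely of epsilon transitions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr int kEnd = -1;  // plays the role of the NULL "to" state (end of rule)

struct Transition {
    int from;
    int to;
    std::string word;  // an empty string models an epsilon transition
};

// Returns true if words[pos..] can be consumed from `state` to the rule's end.
bool Accepts(const std::vector<Transition>& graph, int state,
             const std::vector<std::string>& words, std::size_t pos) {
    if (state == kEnd) return pos == words.size();
    for (const Transition& t : graph) {
        if (t.from != state) continue;
        if (t.word.empty()) {
            // Epsilon: move on without consuming a word.
            if (Accepts(graph, t.to, words, pos)) return true;
        } else if (pos < words.size() && words[pos] == t.word) {
            if (Accepts(graph, t.to, words, pos + 1)) return true;
        }
    }
    return false;
}

// Builds "hello [world]" with the same shape as the sample above:
// state 0 is the rule's initial state and state 1 is the interim state.
std::vector<Transition> BuildHelloWorld() {
    const int start = 0;
    const int interim = 1;
    return {
        {start, interim, "hello"},
        {interim, kEnd, "world"},
        {interim, kEnd, ""},
    };
}
```

In this model both "hello" and "hello world" are accepted from state 0, while "world" alone is rejected because the required word never matches.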
<PHRASE>, <P>
Summary: The PHRASE and OPT tags are the sole methods of explicitly specifying text to be
recognized by the speech recognition engine.
XML Attributes:
DISP (optional): Specifies the display form of the phrase text.
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of times the user
can repeat the phrase and still be successfully recognized.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of times the user
must repeat the phrase to be successfully recognized.
PRON (optional): Specifies the pronunciation to be used by the recognizer when listening
for the text.
PROPID (optional, type=VT_I4): Specifies the numeric identifier to associate with the phrase
tag's semantic property.
PROPNAME (optional): Specifies the string identifier to associate with the phrase tag's
semantic property.
VAL (optional, type=VT_I4): Specifies the semantic property's numeric value.
VALSTR (optional): Specifies the semantic property's string value.
WEIGHT (type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The probability
that the user will speak the contents of the PHRASE tag, versus another
sibling transition or phrase.
XML Parent Elements:
RULEREF: Import, or reference, another rule's contents
PHRASE, P: Specifies text or leaf nodes.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
LIST, L: Specifies a list of phrases or transitions for recognition.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
XML Child Elements:
RULEREF: Import, or reference, another rule's contents
PHRASE, P: Specifies text or leaf nodes.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
LIST, L: Specifies a list of phrases or transitions for recognition.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description:
The PHRASE tag along with the OPT tag are the only tags that can directly
contain recognizable text. Except for grammars that contain rule
references, every grammar must have at least one PHRASE tag.
The grammar author can use the shorthand version of the PHRASE tag, the P tag.
The grammar author can also specify custom word pronunciations and display
text by using the PRON and DISP attributes. For example, a grammar
might contain application or domain specific text, which has a custom
pronunciation. The author can specify the pronunciation on a specific
PHRASE tag to avoid the need for updating the user or application
lexicon (especially if the pronunciation is command specific).
The grammar author can also use special shorthand characters inside of the
content section of the PHRASE tag (e.g. dictation, wildcard, etc.). See
the XML Special Characters.
XML Grammar Sample(s):
<GRAMMAR>
<!-- Create a simple "hello world" rule -->
<RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
<P>hello world</P>
</RULE>
<!-- Create a more advanced "hello world" rule that changes the
display form. When the user says "hello world" the display
text will be "Hiya there!" -->
<RULE NAME="HelloWorld_Disp" TOPLEVEL="ACTIVE">
<P DISP="Hiya there!">hello world</P>
</RULE>
<!-- Create a rule that changes the pronunciation and the display
form of the phrase. When the user says "eh" the display
text will be "I don't understand". Note the user didn't
say "what". The pronunciation for "what" is specific to this
phrase tag and is not changed for the user or application
lexicon, or even other instances of "what" in the grammar -->
<RULE NAME="Question_Pron" TOPLEVEL="ACTIVE">
<P DISP="I don't understand" PRON="eh">what</P>
</RULE>
<!-- Create a rule demonstrating repetition -->
<!-- the rule will only be recognized if the user says "hey diddle
diddle" -->
<RULE NAME="NurseryRhyme" TOPLEVEL="ACTIVE">
<P>hey</P>
<P MIN="2" MAX="2">diddle</P>
</RULE>
<!-- Create a list with variable phrase weights -->
<!-- If the user says similar phrases, the recognizer will use
the weights to pick a match -->
<RULE NAME="UseWeights" TOPLEVEL="ACTIVE">
<LIST>
<!-- Note the higher likelihood that the user is
expected to say "recognize speech" -->
<P WEIGHT=".95">recognize speech</P>
<P WEIGHT=".05">wreck a nice beach</P>
</LIST>
</RULE>
<!-- Create a phrase with an attached semantic property -->
<!-- Speaking "one two three" will return three different unique
semantic properties, with different names, and different
values -->
<RULE NAME="UseProps" TOPLEVEL="ACTIVE">
<!-- named property, without value -->
<P PROPNAME="NOVALUE">one</P>
<!-- named property, with numeric value -->
<P PROPNAME="NUMBER" VAL="2">two</P>
<!-- named property, with string value -->
<P PROPNAME="STRING" VALSTR="three">three</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To add a phrase to a rule, SAPI provides an API called
ISpGrammarBuilder::AddWordTransition. The application developer can add
the sentences as follows:
SPSTATEHANDLE hsHelloWorld;
// Create new top-level rule called "HelloWorld"
hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsHelloWorld);
// Check hr
// Add the command words "hello world"
// Note that the lexical delimiter is " ", a space character.
// By using a space delimiter, the entire phrase can be added
// in one method call
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, NULL,
L"hello world", L" ",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the command words "hiya there"
// Note that the lexical delimiter is "|", a pipe character.
// By using a pipe delimiter, the entire phrase can be added
// in one method call
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, NULL,
L"hiya|there", L"|",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
<RESOURCE>
Summary: The RESOURCE tag is used by grammar authors who want to store arbitrary string
data on rules (e.g. for use by a CFG Interpreter, or an SR engine aware of the
resources).
XML Attributes:
NAME: specifies the name of the resource to attach to the rule.
XML Parent Elements:
RULE: The rule that contains the resource reference.
XML Child Elements:
[CDATA] (required): The resource value is specified by a CDATA section.
For example,
<![CDATA[This is a test string]]>
The RESOURCE tag contains the CDATA element, which itself contains the string.
Detailed Description:
The RESOURCE tag is a facility allowing the grammar author to communicate
information [attached to rules] to a CFG Interpreter (see
ISpCFGInterpreter and ISpCFGInterpreterSite::GetResourceValue) or a
speech recognition engine that is aware of the resource information
(see ISpSREngineSite::GetResource).
XML Grammar Sample(s):
<GRAMMAR>
<!-- Note resource value can be any string -->
<RULE ID="RID_TestResource" TOPLEVEL="ACTIVE">
<RESOURCE NAME="AResource">
<![CDATA[AResource's Value: String]]>
</RESOURCE>
<P>test an embedded resource</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To add a resource to a rule, SAPI provides an API called
ISpGrammarBuilder::AddResource. The application developer can add
the aforementioned resource (see XML Grammar Sample) with the following
code:
SPSTATEHANDLE hsTestResource;
// Create new top-level rule identified by RID_TestResource
hr = cpRecoGrammar->GetRule(NULL, RID_TestResource,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsTestResource);
// Check hr
// Add the command words "test an embedded resource"
hr = cpRecoGrammar->AddWordTransition(hsTestResource, NULL,
L"test an embedded resource", L" ",
SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add the resource named "AResource"
hr = cpRecoGrammar->AddResource(hsTestResource,
L"AResource",
L"AResource's Value: String");
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
Then, the SR-Engine can retrieve the resource value when it is processing
the rule updates or CFG-recognition by making the following call:
// set hRule to the handle of the rule that carries the resource
WCHAR *pwszResValue = NULL;
hr = cpSREngineSite->GetResource(hRule,
L"AResource",
&pwszResValue);
if (S_OK == hr)
{
// pwszResValue contains the value
// perform value-sensitive processing
// release value memory
::CoTaskMemFree(pwszResValue);
}
<RULE>
Summary: The RULE tag is the core tag for defining which commands are available for
recognition. Every grammar must have at least one top-level rule, and
every rule must have at least one rule reference or recognizable text.
XML Attributes:
DYNAMIC (optional, default is FALSE): Specifies whether the rule supports dynamic
modifications at run time. By default, an application cannot modify rules
in an XML grammar. To modify a rule, the rule must be marked DYNAMIC, and
the grammar must be loaded with the dynamic flag (see ISpRecoGrammar and
SPLOADOPTIONS). Dynamic rules cannot be marked EXPORT.
EXPORT (optional, default is FALSE): Specifies whether the rule allows external
grammars to reference it. For example, a grammar author who wants to allow
other grammar authors to reuse her rules must mark each of the reusable
rules with EXPORT="TRUE". Exported rules cannot be marked DYNAMIC.
ID (required, type=VT_I4): Specifies the numeric identifier of the rule. The ID
or the NAME must be specified, or both. The identifier must be unique in
the rule namespace, which is the entire grammar (see GRAMMAR).
INTERPRETER (optional, default is FALSE): Specifies if the rule should use the
CFG interpreter (see ISpCFGInterpreter) when it is recognized. For example,
a rule might contain semantic properties or text that should be modified
at run time (e.g. replace value of the semantic property named "TODAY" with
the system's current date and time).
NAME (required): Specifies the string identifier of the rule. The NAME
or the ID must be specified, or both. The identifier must be unique in
the rule namespace, which is the entire grammar (see GRAMMAR).
TOPLEVEL (optional): Specifies that the rule is directly recognizable by a user.
If the TOPLEVEL attribute is not specified, then the rule is not recognizable
unless it is referenced by a top-level rule. For example,
component rules (see RULEREF) do not need to specify the TOPLEVEL attribute.
When a grammar author specifies a rule as TOPLEVEL, she must also specify
whether the rule is enabled by default. A rule marked TOPLEVEL="ACTIVE" is
activated when the application activates the default set of rules
(e.g. ISpRecoGrammar::SetRuleState(NULL, NULL, SPRS_ACTIVE)). A rule marked
TOPLEVEL="INACTIVE" is activated only when it is explicitly set to
active (see ISpRecoGrammar::SetRuleState and
ISpRecoGrammar::SetRuleIdState).
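The activation behavior described for TOPLEVEL can be modeled with a short, SAPI-independent sketch. The Rule type and both functions below are hypothetical names introduced only for illustration. Refusing to activate a rule that is not top-level is an assumption, not documented behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model for illustration; these types are not part of SAPI.
enum class TopLevel { NotTopLevel, Active, Inactive };

struct Rule {
    std::string name;
    TopLevel topLevel;
    bool active;
};

// Models activating the default rule set: only TOPLEVEL="ACTIVE" rules turn on.
void ActivateDefaultRules(std::vector<Rule>& rules) {
    for (Rule& r : rules)
        if (r.topLevel == TopLevel::Active) r.active = true;
}

// Models explicit activation by name. Assumption: a rule that is not
// top-level cannot be activated directly, so the call reports failure.
bool ActivateRule(std::vector<Rule>& rules, const std::string& name) {
    for (Rule& r : rules) {
        if (r.name != name) continue;
        if (r.topLevel == TopLevel::NotTopLevel) return false;
        r.active = true;
        return true;
    }
    return false;  // no rule with that name
}

// Usage with rule names that mirror the XML sample in this section.
void DemoRuleActivation() {
    std::vector<Rule> rules = {{"Hello", TopLevel::Active, false},
                               {"Hiya", TopLevel::Inactive, false},
                               {"Numbers", TopLevel::NotTopLevel, false}};
    ActivateDefaultRules(rules);
    assert(rules[0].active && !rules[1].active && !rules[2].active);
    assert(ActivateRule(rules, "Hiya") && rules[1].active);
}
```

In a real application the same distinction is made through ISpRecoGrammar::SetRuleState and ISpRecoGrammar::SetRuleIdState, as noted above.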
XML Parent Elements:
GRAMMAR: The container for the entire XML grammar.
XML Child Elements:
RULEREF: Imports, or references, another rule's contents.
PHRASE, P: Specifies text or leaf nodes.
LIST, L: Specifies a list of phrases for recognition.
OPT, O: Specifies an optional piece of text that can be spoken.
TEXTBUFFER: Specifies a reference to the run-time application maintained
text-buffer.
WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words.
DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
RESOURCE: Specifies a labeled piece of arbitrary string data which can be
accessed by a special SR engine, or a CFG interpreter.
Detailed Description:
The RULE tag is the core of the XML grammar text format. The purpose of creating
a CFG is to define a specific set of words and phrases that can be
spoken by the user and recognized by the speech recognition engine. The
rules can be written by the grammar author in a way that makes them
reusable, textually maintainable, and conducive to application logic
that is based on semantic properties or actions (not on phrase text).
Each rule must contain at least one piece of text, or a rule reference (which
has the same requirements). Effectively, every rule will eventually end
with a piece of text (i.e. leaf or terminal node).
The rule can be identified by either a numeric identifier (ID) or a string identifier
(NAME). The grammar author can use the DEFINE tag to define constant string
identifiers for numeric values. By using the constant string identifiers,
the grammar author can avoid magic numbers (i.e. hard-coded numbers that can
cause maintenance problems when updating code/grammar). See the ID tag for
more information on constant identifiers.
By using rule importing (references) and rule exporting, grammar authors can
leverage reusable grammar components (e.g. numbers or date grammars).
Similarly, grammar authors can abstract certain portions of the grammar
text away from the semantic content by using semantic properties, or
tags. Semantic properties are name/value pairs which are associated
with rule nodes in the rule hierarchy, and can even contain relevant
information from the recognized text (see SPPHRASEPROPERTY.ulStartingElement
and SPPHRASEPROPERTY.ulCountOfElements).
The grammar author can also use a CFG interpreter, which is a COM object that can
re-process the semantic property tree and phrase text to modify the content
at run time. For example, an application may load a grammar which includes
a "days of the week" rule. By integrating a CFG interpreter with the grammar,
the interpreter could replace the "days of the week" properties (e.g. Sunday,
Monday, Tuesday, etc.) with the actual calendar dates relative to the
application's host system (e.g. GetSystemTime). See ISpCFGInterpreter.
SAPI supports a feature called "semantic property pushing" which enables
applications to detect the semantic property structure more accurately at
recognition time. "Property pushing" is done by SAPI at compile time,
whereby the compiler moves semantic properties to the last terminal
node within a rule which remains unambiguous. For example, the phrases "a b
c d" and "a b e f g" both have the prefix "a b". The compiler will
automatically split the phrases into three separate phrases, "a b", "c d",
and "e f g", where the first phrase is the common prefix to both recognizable
phrases. The purpose of this feature is to enable applications that place
properties on the phrases to detect which branch is being
hypothesized as soon as the first unambiguous (non-common) portion of the
phrase is spoken. When the user speaks "a b" it is not clear whether the user
will say "a b c d" or "a b e f g". If the user then says "e", the application
can eliminate the "a b c d" option. If the grammar author attached
properties to the end of both phrases, the semantic property would be
returned as soon as the user spoke the first unambiguous portion of the text
(e.g. "c" or "e"). See Semantic Properties, Hypotheses, and "Property Pushing."
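The prefix split in the example above can be reproduced with a small sketch. It only illustrates factoring two phrases into a common prefix and two diverging tails. It is not the SAPI compiler's actual algorithm, and PrefixSplit, PhraseWords, and SplitCommonPrefix are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for illustration only; not part of SAPI.
struct PrefixSplit {
    std::vector<std::string> common;  // words shared by both phrases
    std::vector<std::string> restA;   // unambiguous remainder of phrase A
    std::vector<std::string> restB;   // unambiguous remainder of phrase B
};

// Split a phrase into whitespace-separated words.
std::vector<std::string> PhraseWords(const std::string& phrase) {
    std::istringstream in(phrase);
    std::vector<std::string> out;
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

// Factor two phrases into their common prefix and the two diverging tails.
PrefixSplit SplitCommonPrefix(const std::string& a, const std::string& b) {
    const std::vector<std::string> wa = PhraseWords(a);
    const std::vector<std::string> wb = PhraseWords(b);
    std::size_t i = 0;
    while (i < wa.size() && i < wb.size() && wa[i] == wb[i]) ++i;
    assert(i <= wa.size() && i <= wb.size());  // prefix never exceeds either phrase
    PrefixSplit s;
    s.common.assign(wa.begin(), wa.begin() + i);
    s.restA.assign(wa.begin() + i, wa.end());
    s.restB.assign(wb.begin() + i, wb.end());
    return s;
}
```

Calling SplitCommonPrefix("a b c d", "a b e f g") yields the common prefix "a b" and the tails "c d" and "e f g". A property attached to either tail becomes detectable as soon as its first word is hypothesized.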
XML Grammar Sample(s):
<GRAMMAR>
<DEFINE>
<ID NAME="RID_Hello" VAL="1"/>
<ID NAME="RID_World" VAL="2"/>
<ID NAME="RID_AddNumbers" VAL="3"/>
<ID NAME="RID_Numbers" VAL="4"/>
<ID NAME="RID_Numbers_Exportable" VAL="5"/>
<ID NAME="RID_Names" VAL="6"/>
<ID NAME="PID_Value" VAL="7"/>
</DEFINE>
<!-- create a simple top-level rule that uses a constant defined identifier -->
<RULE ID="RID_Hello" TOPLEVEL="ACTIVE">
<P>hello</P>
</RULE>
<!-- Create a simple top-level rule that is inactive by default -->
<RULE NAME="Hiya" TOPLEVEL="INACTIVE">
<P>hiya</P>
</RULE>
<!-- Create a rule, which a CFG-interpreter can re-process to modify the semantic
properties -->
<RULE NAME="InterpretedRule" TOPLEVEL="ACTIVE" INTERPRETER="TRUE">
<P PROPNAME="TODAY">what is today's date</P>
</RULE>
<!-- Create a simple top-level rule that references another non top-level rule -->
<RULE ID="RID_AddNumbers" TOPLEVEL="ACTIVE">
<P>add</P>
<RULEREF REFID="RID_Numbers"/>
<P>to</P>
<RULEREF REFID="RID_Numbers"/>
</RULE>
<!-- Note that rule is not top-level and is only used as a reusable component rule -->
<RULE ID="RID_Numbers">
<LIST PROPID="PID_Value">
<P VAL="1">one</P>
<P VAL="2">two</P>
<P VAL="3">three</P>
<P VAL="4">four</P>
<P VAL="5">five</P>
</LIST>
</RULE>
<!-- mark the rule as dynamic so the application can update the list of names
at runtime -->
<RULE ID="RID_Names" DYNAMIC="TRUE">
<LIST>
<P>bob</P>
<P>jane</P>
<P>kate</P>
<P>tom</P>
</LIST>
</RULE>
<!-- Mark the rule as exportable, so other external grammars can access it -->
<RULE ID="RID_Numbers_Exportable" EXPORT="TRUE">
<LIST PROPID="PID_Value">
<P VAL="6">six</P>
<P VAL="7">seven</P>
<P VAL="8">eight</P>
<P VAL="9">nine</P>
<P VAL="10">ten</P>
</LIST>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
Application developers can programmatically add rules to a grammar by using the
ISpGrammarBuilder interface inherited by ISpRecoGrammar. The following sample code
shows how to add a rule to a grammar. To choose the rule attributes, see the
ISpGrammarBuilder::GetRule method and SPCFGRULEATTRIBUTES.
SPSTATEHANDLE hHelloWorld;
// Create new rule called "HelloWorld"
// Note that the second parameter is the ID, which can also be specified
// Note also that the rule is marked as top-level and active
hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL, SPRAF_TopLevel | SPRAF_Active,
TRUE, &hHelloWorld);
// Check hr
// add the text "hello world"
hr = cpRecoGrammar->AddWordTransition(hHelloWorld, NULL, L"hello world",
L" ", SPWT_LEXICAL, 1, NULL);
// Check hr
// save the grammar changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
The following sample code shows how to modify a rule in an existing grammar. Specifically,
the code will update the list of names rule shown in the XML Sample Grammar
section. By updating the names rule, all rules that reference the names will
automatically be able to recognize the updated names (after calling ::Commit).
SPSTATEHANDLE hNames;
// Get a handle to the existing rule
// Note the use of the constant identifier RID_Names, which was defined in the
// XML sample. See the ID tag for information on generating a C-style header
hr = cpRecoGrammar->GetRule(NULL, RID_Names, NULL, TRUE, &hNames);
// Check hr
// clear the rule to update the entire list
hr = cpRecoGrammar->ClearRule(hNames);
// Check hr
// add name "sally"
hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"sally", NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// add name "jim"
hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"jim", NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// add name "diane"
hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"diane", NULL,
SPWT_LEXICAL, NULL, NULL);
// Check hr
// save grammar changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
<RULEREF>
Summary: The RULEREF tag is used for importing rules from the same grammar, or another
grammar. The RULEREF tag is especially useful for reusing component or
off-the-shelf rules and grammars.
XML Attributes:
NAME (required): Specifies the string identifier of the rule to reference. The NAME
or the REFID must be specified. If both are specified, they must refer to the
same rule.
OBJECT (optional): Specifies the programmatic identifier (ProgId) of the COM
object which contains the compiled grammar (see ISpCFGInterpreter and
ISpCFGInterpreter::InitGrammar).
PROPID (optional, type=VT_I4): Specifies the numeric identifier of the semantic property
attached to the rule reference.
PROPNAME (optional): Specifies the string identifier of the semantic property attached
to the rule reference.
REFID (required, type=VT_I4): Specifies the numeric identifier of the rule to reference.
The NAME or the REFID must be specified. If both are specified, they must refer
to the same rule.
URL (optional): Specifies the uniform resource locator (URL) of the rule to reference.
The URL can be prefixed by "http://", "file://", or no prefix for a relative
address. The URL can reference either a compiled grammar (e.g. *.cfg) or an
uncompiled XML grammar (e.g. *.xml) which will be compiled by SAPI on demand.
VAL (optional): Specifies the numeric value that will be associated with the semantic
property attached to the rule reference.
VALSTR (optional): Specifies the string value that will be associated with the semantic
property attached to the rule reference.
WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The
probability of the contents of the rule (which is referenced) being spoken by
the user.
XML Parent Elements:
LIST, L: List of phrases or rules which can be recognized.
PHRASE, P: Phrase that must be recognized for the containing rule to be recognized.
OPT, O: Optional phrase causing the rule reference to be implicitly optional.
RULE: Rule that contains phrases or text to be recognized.
XML Child Elements:
None
Detailed Description:
The RULEREF tag is provided to grammar authors to allow for grammar reusability, and for
structuring semantic properties into a hierarchy.
Grammar reusability is provided by allowing rules to reference other rules. For example,
an independent software vendor (ISV) could develop a series of grammars that
support mathematical operations and easy-to-speak numbers. They could redistribute
their grammars via either a web site (URL, http), a COM object (ProgId), or a
compiled grammar. Grammar authors who want to use the ISV's grammars would only
need to add a RULEREF tag into their grammar which referenced the appropriate
file or resource location. Similarly, grammar authors can build basic rule
components into their grammars (e.g. spelling, numbers, or proper names), then
build complex commands by reusing the basic rule components (local rule reference).
Structured, hierarchical semantic properties are built on top of RULEs and RULEREFs. All of
the semantic properties specified inside of a rule are siblings (ordered by
order of declaration in the recognized transition path). The semantic properties
that are in rules referenced by another rule are child properties of the
rule that made the reference. For example, examine the following grammar:
<RULE NAME="A" TOPLEVEL="ACTIVE">
<P PROPNAME="ROOT">
<RULEREF NAME="B" PROPNAME="ROOT_SIBLING"/>
</P>
</RULE>
<RULE NAME="B">
<P PROPNAME="CHILD">hello</P>
<P PROPNAME="LEAF">world</P>
</RULE>
The grammar contains two rules: one top-level rule, which references another rule.
The top-level rule contains two semantic properties, one attached to a phrase tag
("ROOT"), and the other attached to the rule reference tag
("ROOT_SIBLING"). The second rule also contains two semantic properties, one
attached to the first phrase tag ("CHILD"), and the other attached to the second
phrase tag ("LEAF"). If the recognized phrase is "hello world", the semantic property
structure is as follows:
SPPHRASE->pProperties->pszName == "ROOT"
SPPHRASE->pProperties->pNextSibling->pszName == "ROOT_SIBLING"
SPPHRASE->pProperties->pFirstChild->pszName == "CHILD"
SPPHRASE->pProperties->pFirstChild->pNextSibling->pszName == "LEAF"
Note that no matter how many phrases or semantic properties are contained in a
single RULE, all of the properties are siblings. Child semantic properties are only
created by using rule references. See also the Whitepaper, Designing Grammar Rules:
Retrieving Semantic Properties.
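An application typically walks this structure recursively, following pFirstChild for child properties and pNextSibling for the properties of the same rule. The sketch below shows the traversal on a stand-in structure. PropertyNode is a hypothetical type that copies only three member names from SPPHRASEPROPERTY and is not the real SAPI definition. The tree it builds is the one listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the three SPPHRASEPROPERTY members used in the text above.
// Assumption: this is NOT the real SAPI structure, which has more fields.
struct PropertyNode {
    const wchar_t* pszName;
    const PropertyNode* pFirstChild;
    const PropertyNode* pNextSibling;
};

// Append one name per line, indented two spaces per level. Children of a
// node are visited before that node's later siblings.
void DumpProperties(const PropertyNode* node, int depth, std::wstring& out) {
    for (; node != nullptr; node = node->pNextSibling) {
        assert(node->pszName != nullptr);
        out.append(static_cast<std::size_t>(depth) * 2, L' ');
        out += node->pszName;
        out += L'\n';
        DumpProperties(node->pFirstChild, depth + 1, out);
    }
}

// Build the tree listed above for the recognized phrase "hello world".
std::wstring DumpHelloWorldExample() {
    const PropertyNode leaf = {L"LEAF", nullptr, nullptr};
    const PropertyNode child = {L"CHILD", nullptr, &leaf};
    const PropertyNode rootSibling = {L"ROOT_SIBLING", nullptr, nullptr};
    const PropertyNode root = {L"ROOT", &child, &rootSibling};
    std::wstring out;
    DumpProperties(&root, 0, out);
    return out;
}
```

With a real recognition result, the same loop would start from the pProperties member of the SPPHRASE returned for the phrase.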
XML Grammar Sample(s):
<GRAMMAR>
<DEFINE>
<ID NAME="RID_Numbers" VAL="1"/>
<ID NAME="RID_AddNumbers" VAL="2"/>
<ID NAME="PID_Value" VAL="1"/>
</DEFINE>
<!-- create a simple rule that reuses the local numbers rule component -->
<RULE ID="RID_AddNumbers" TOPLEVEL="ACTIVE">
<P>add</P>
<!-- the first operand will be a number from the numbers rule-->
<!-- the application can retrieve the child property of this property "operand_1"
which has a value of 1-5 -->
<RULEREF REFID="RID_Numbers" PROPNAME="operand_1"/>
<P>to</P>
<!-- the second operand will be a number from the numbers rule-->
<!-- the application can retrieve the child property of this property "operand_2"
which has a value of 1-5 -->
<RULEREF REFID="RID_Numbers" PROPNAME="operand_2"/>
</RULE>
<!-- Note that rule is not top-level and is only used as a reusable component rule -->
<RULE ID="RID_Numbers">
<LIST PROPID="PID_Value">
<P VAL="1">one</P>
<P VAL="2">two</P>
<P VAL="3">three</P>
<P VAL="4">four</P>
<P VAL="5">five</P>
</LIST>
</RULE>
<RULE NAME="SearchWeb" TOPLEVEL="ACTIVE">
<P>search web for site named</P>
<!-- Reference a fictitious rule located on the web which contains a daily updated
list of SR-friendly web site names -->
<RULEREF NAME="SiteNames" URL="http://www.msn.com/WebServices/SpeechObjects.cfg"/>
</RULE>
<RULE NAME="SearchAddressBook" TOPLEVEL="ACTIVE">
<P>find address of</P>
<!-- Reference a fictitious rule located in a registered COM object, which contains
a dynamic list of Exchange server address book names -->
<RULEREF NAME="FullNames" OBJECT="Exchange.SpeechGrammars"/>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
Application developers can programmatically import rules from URLs by using the following format:
Rule Name = "URL:" + FILENAME + "\\" + RULENAME
For example, to import a rule called "Numbers" from the file "A.cfg", use the following sample code:
SPSTATEHANDLE hSpeakNumber;
SPSTATEHANDLE hsBeforeImport;
SPSTATEHANDLE hsRuleImport;
// Create new rule called "SpeakNumber"
hr = cpRecoGrammar->GetRule(L"SpeakNumber", NULL, NULL, TRUE, &hSpeakNumber);
// Check hr
// Create new state for the beginning text
hr = cpRecoGrammar->CreateNewState(hSpeakNumber, &hsBeforeImport);
// Check hr
// add the beginning text "speak the number"
hr = cpRecoGrammar->AddWordTransition(hSpeakNumber, hsBeforeImport, L"speak the number",
L" ", SPWT_LEXICAL, 1, NULL);
// Check hr
// Import the rule "Numbers" from A.cfg
hr = cpRecoGrammar->GetRule(L"URL:file://A.cfg\\Numbers", 0, SPRAF_Import, TRUE, &hsRuleImport);
// Check hr
// reference the "Numbers" rule after the beginning text
hr = cpRecoGrammar->AddRuleTransition(hsBeforeImport, NULL, hsRuleImport, 1, NULL);
// Check hr
hr = cpRecoGrammar->Commit(NULL);
// Check hr
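The import name passed to GetRule in the sample above follows the "URL:" + FILENAME + "\\" + RULENAME format. A minimal sketch of composing that string is shown below. MakeImportRuleName is a hypothetical helper, not a SAPI function, and it performs no validation of the file location beyond rejecting empty parts.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a SAPI function): compose the import rule name
// "URL:" + FILENAME + "\" + RULENAME. In C++ source, the single backslash
// separator is written as the escape "\\".
std::wstring MakeImportRuleName(const std::wstring& file, const std::wstring& rule) {
    assert(!file.empty() && !rule.empty());
    return L"URL:" + file + L"\\" + rule;
}

// Usage: reproduces the literal used in the sample above.
void DemoImportRuleName() {
    assert(MakeImportRuleName(L"file://A.cfg", L"Numbers") ==
           L"URL:file://A.cfg\\Numbers");
}
```

The resulting string can then be passed as the rule name to GetRule together with SPRAF_Import, as in the sample.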
<TEXTBUFFER>
Summary: The TEXTBUFFER tag is used for applications needing to integrate a dynamic
text box or text selection with a voice command.
XML Attributes:
PROPID (optional, type=VT_I4): Specifies the semantic property's numeric identifier.
PROPNAME (optional): Specifies the semantic property's string identifier.
WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): Specifies
the probability of the TEXTBUFFER-based phrase being spoken by the user.
XML Parent Elements:
LIST, L: List of phrases which can be recognized.
PHRASE, P: Phrase that must be recognized for the containing rule to be recognized.
OPT, O: Optional phrase that may be recognized.
RULE: Rule that contains phrases or text to be recognized.
Detailed Description:
The TEXTBUFFER tag is useful for applications that have a dynamic buffer of text,
and want to allow the user to speak portions of the text. The most obvious
example is likely the text selection user interface. The application offers
a buffer of text, and allows the user to select any contiguous subset of
the buffer. For example, when the text is "a b c d e", the user can select
"a b c" and "c d e", but not "b e" since it is not a contiguous subset of
the text buffer.
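The contiguity requirement can be illustrated with a short, SAPI-independent sketch. IsContiguousSelection and BufferWords are hypothetical helpers, and comparing on whitespace-separated words is a simplifying assumption. The recognition engine performs the real matching.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text into whitespace-separated words.
std::vector<std::string> BufferWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

// Hypothetical check (not a SAPI call): true when `phrase` is a non-empty,
// contiguous run of words inside `buffer`.
bool IsContiguousSelection(const std::string& buffer, const std::string& phrase) {
    const std::vector<std::string> b = BufferWords(buffer);
    const std::vector<std::string> p = BufferWords(phrase);
    if (p.empty()) return false;
    return std::search(b.begin(), b.end(), p.begin(), p.end()) != b.end();
}

// Usage: the three cases from the paragraph above.
void DemoContiguousSelection() {
    assert(IsContiguousSelection("a b c d e", "a b c"));
    assert(IsContiguousSelection("a b c d e", "c d e"));
    assert(!IsContiguousSelection("a b c d e", "b e"));
}
```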
The TEXTBUFFER tag allows the grammar author to define a command, and reference the
dynamic text buffer which will be set and maintained at application run time.
For example, the grammar might contain the command "select TEXTBUFFER_PORTION",
which, when using the previous text sample, would allow the phrases "select a
b c" and "select c d e", but not "select b e". The grammar author should focus
her efforts on building commands to operate on the text buffer, while the
application developer need only focus on maintaining the text buffer (see
ISpRecoGrammar::SetWordSequenceData and ISpRecoGrammar::SetTextSelection) and
responding to the TEXTBUFFER-based commands.
The TEXTBUFFER has three main components: the complete text buffer, the allowed
text subsets in the buffer, and the active selection. The complete text buffer
is a string of text characters, which is double-NULL terminated. The reason
for using a double-NULL terminator is to allow multiple exclusive subsets of the
buffer to be active (e.g. each subset is a paragraph). The recognition engine will
not recognize phrases which span the exclusive subsets (delimited by a single
NULL character). The third component is the active selection, or current
portion of the buffer that should be recognizable (e.g. the application can
update the selection to include only the text visible on the screen, or only
the text selected by the user). Note that any portion of the buffer that is
not included in the TEXTBUFFER's active selection is not recognizable.
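The buffer layout described above can be sketched as follows. BuildWordSequenceBuffer is a hypothetical helper, and the exact layout (each subset followed by one NULL, with an extra NULL closing the buffer) is inferred from the description rather than taken from a specification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a SAPI function): join exclusive subsets with
// single NULs and end the buffer with a double NUL.
// Assumption: layout inferred from the text above ("subset\0subset\0\0").
std::wstring BuildWordSequenceBuffer(const std::vector<std::wstring>& subsets) {
    std::wstring buffer;
    for (const std::wstring& s : subsets) {
        // A subset must not contain NUL, or it would be split in two.
        assert(s.find(L'\0') == std::wstring::npos);
        buffer += s;
        buffer.push_back(L'\0');  // single NUL closes this subset
    }
    if (subsets.empty()) buffer.push_back(L'\0');  // keep the double NUL when empty
    buffer.push_back(L'\0');      // second NUL terminates the whole buffer
    return buffer;
}
```

For a single subset the result is the text followed by two NULL characters, so buffer.size() equals the text length plus two. The buffer data and that character count are the kind of values an application would pass to ISpRecoGrammar::SetWordSequenceData.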
The TEXTBUFFER tag is shared across all of the commands associated with a single
grammar object. For applications that need to support multiple text buffers,
the application has three options. If the text buffers use the same commands,
but do not need to be active simultaneously, the application can use the active
selection feature (of the TEXTBUFFER) to switch between buffers. If the text
buffers are unique, but the buffers need to be active simultaneously, the
application can use the single-NULL terminated subsets of the TEXTBUFFER
(noting that each set is exclusive and non-contiguous). Finally, if the
application has multiple text buffers, requires the buffers to be active
simultaneously, and uses different commands for each buffer, the application
can use a single grammar object for each buffer.
The application should use semantic properties (see attributes PROPNAME and PROPID)
to quickly and easily parse the TEXTBUFFER-related text out of the command.
SAPI will automatically set the semantic property's phrase
element range to match the elements taken from the TEXTBUFFER.
The speech recognition engine must support text-buffers inside of a CFG for the
grammar to load and activate successfully. The application can determine if
an engine supports the TEXTBUFFER tag by retrieving the SR engine's object
token (see ISpRecognizer::GetRecognizer), and then checking for the existence
of the engine attribute "WordSequences" (see ISpObjectToken::MatchesAttributes).
XML Grammar Sample(s):
<GRAMMAR>
<DEFINE>
<ID NAME="PID_SelectedText" VAL="1"/>
</DEFINE>
<!-- basic command to perform text selection -->
<RULE NAME="SelectText" TOPLEVEL="ACTIVE">
<P>select the words</P>
<TEXTBUFFER PROPID="PID_SelectedText"/>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To programmatically create a text-buffer transition in a CFG, the application developer
can use the ISpGrammarBuilder::AddRuleTransition with a special rule handle,
called SPRULETRANS_TEXTBUFFER. For example, the following code creates a simple
command called "SelectText" which recognizes the command "select TEXTBUFFER".
SPSTATEHANDLE hsSelectText;
// Create new top-level rule called "SelectText"
hr = cpRecoGrammar->GetRule(L"SelectText", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsSelectText);
// Check hr
// Create an interim state before the text-buffer transition
SPSTATEHANDLE hsBeforeTextBuffer;
hr = cpRecoGrammar->CreateNewState(hsSelectText, &hsBeforeTextBuffer);
// Check hr
// Add the command word "select"
hr = cpRecoGrammar->AddWordTransition(hsSelectText, hsBeforeTextBuffer,
L"select", L" ", SPWT_LEXICAL, NULL, NULL);
// Check hr
// Add text-buffer transition
hr = cpRecoGrammar->AddRuleTransition(hsBeforeTextBuffer, NULL,
SPRULETRANS_TEXTBUFFER, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
// ... perform other processing/setup
// Setup text-buffer
// Place the contents of text buffer into pwszCoMem and
// the length of the text in cch
SPTEXTSELECTIONINFO tsi;
tsi.ulStartActiveOffset = 0;
tsi.cchActiveChars = cch;
tsi.ulStartSelection = 0;
tsi.cchSelection = cch;
WCHAR *pwszCoMem2 = (WCHAR *)CoTaskMemAlloc(sizeof(WCHAR) * (cch + 2));
if (pwszCoMem2)
{
// SetWordSequenceData requires double NULL terminator.
memcpy(pwszCoMem2, pwszCoMem, sizeof(WCHAR) * cch);
pwszCoMem2[cch] = L'\0';
pwszCoMem2[cch+1] = L'\0';
// set the text buffer data
hr = cpRecoGrammar->SetWordSequenceData(pwszCoMem2, cch + 2, NULL);
// Check hr
// set the text selection information independently
hr = cpRecoGrammar->SetTextSelection(&tsi);
// Check hr
CoTaskMemFree(pwszCoMem2);
}
CoTaskMemFree(pwszCoMem);
// the SR engine is now capable of recognizing the contents of the text buffer
<WILDCARD>
Summary: The WILDCARD tag is used in rules or phrases that need added robustness and
flexibility for the speaker's phrasing.
XML Attributes:
None
XML Parent Elements:
LIST, L: List of phrases which can be recognized.
PHRASE, P: Phrase that must be recognized for the containing rule to be recognized.
OPT, O: Optional phrase that may be recognized.
RULE: Rule that contains phrases or text to be recognized.
XML Child Elements:
None
Detailed Description:
The WILDCARD tag is designed for applications that would like to recognize
some phrases without failing due to irrelevant, or ignorable words. For
example, an application may have a command with the phrase "save document".
Many users may trivially modify the phrase by saying "save my document",
"save the document", "save this document", etc. With a pure CFG, the latter
phrases would all fail to be recognized due to the extra words. The grammar
author can add a wildcard, or garbage field, which will consume the extra
words, and allow the application to successfully handle all of the phrases.
In the aforementioned case, the grammar would need a wildcard before the word
"document".
The WILDCARD is different from DICTATION in that the application will never see the
recognized garbage words, even though they were recognized. Consequently, the
application and grammar author should not place wildcards in places which may
affect the intended user action (e.g. "cancel save" is not the same as "please
save").
The grammar author can also use a special character, ellipsis (...) instead of the entire
XML tag. See XML Grammar Format: Special Wildcard Tag.
The speech recognition engine must support wildcards inside of a CFG for the grammar
to load and activate successfully. The application can determine if an engine
supports the WILDCARD tag by retrieving the SR engine's object token (see
ISpRecognizer::GetRecognizer), and then checking for the existence of the
engine attribute "WildcardInCFG" (see ISpObjectToken::MatchesAttributes).
The engine can specify support for the WILDCARD tag to be anywhere in the
CFG phrase (attribute value="Anywhere"), or only at the end (attribute
value="Trailing").
XML Grammar Sample(s):
<GRAMMAR>
<!-- basic command to play the queen of hearts -->
<RULE NAME="PlayCard" TOPLEVEL="ACTIVE">
<P>play <WILDCARD/> queen of hearts</P>
</RULE>
<!-- basic command to play the queen of hearts, using special ellipsis -->
<RULE NAME="PlayCard_Ellipsis" TOPLEVEL="ACTIVE">
<P>play ... queen of hearts</P>
</RULE>
</GRAMMAR>
Programmatic Equivalent:
To programmatically create a wildcard transition in a CFG, the application developer
can use the ISpGrammarBuilder::AddRuleTransition with a special rule handle,
called SPRULETRANS_WILDCARD. For example, the following code creates a simple
command called "PlayCard" which recognizes the command "play WILDCARD queen of hearts".
SPSTATEHANDLE hsPlayCard;
// Create new top-level rule called "PlayCard"
hr = cpRecoGrammar->GetRule(L"PlayCard", NULL,
SPRAF_TopLevel | SPRAF_Active, TRUE,
&hsPlayCard);
// Check hr
// Create an interim state before the wildcard transition
SPSTATEHANDLE hsBeforeWildcard;
hr = cpRecoGrammar->CreateNewState(hsPlayCard, &hsBeforeWildcard);
// Check hr
// Add the command word "play"
hr = cpRecoGrammar->AddWordTransition(hsPlayCard, hsBeforeWildcard,
L"play", L" ", SPWT_LEXICAL, NULL, NULL);
// Check hr
// Create an interim state after the wildcard transition
SPSTATEHANDLE hsAfterWildcard;
hr = cpRecoGrammar->CreateNewState(hsPlayCard, &hsAfterWildcard);
// Check hr
// Add interim wildcard transition
hr = cpRecoGrammar->AddRuleTransition(hsBeforeWildcard, hsAfterWildcard,
SPRULETRANS_WILDCARD, NULL, NULL);
// Check hr
// Add the command words "queen of hearts"
hr = cpRecoGrammar->AddWordTransition(hsAfterWildcard, NULL,
L"queen of hearts", L" ", SPWT_LEXICAL, NULL, NULL);
// Check hr
// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
The previous sample code will support any of the following phrases:
"play the queen of hearts"
"play a queen of hearts"
"play the left queen of hearts"
etc.
Note that the words consumed by the wildcard ("the", "a", and "the left") will be
recognized by the speech recognition engine, but will not be returned to the
application. The application should not put any text that is sensitive to
application logic inside of a wildcard, since the text is not returned.
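Which phrases a wildcard command accepts can be illustrated with a small matcher that is independent of SAPI. This is a sketch only. MatchesWildcardPhrase and its helpers are hypothetical names, the ellipsis token stands for the wildcard, and the rule that a wildcard consumes one or more words is an assumption taken from the WILDCARD description under the RULE child elements. An actual engine may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text into whitespace-separated words.
std::vector<std::string> SpokenWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

// Match pattern[pi..] against words[wi..]. The token "..." is the wildcard.
bool MatchFrom(const std::vector<std::string>& pattern, std::size_t pi,
               const std::vector<std::string>& words, std::size_t wi) {
    if (pi == pattern.size()) return wi == words.size();
    if (pattern[pi] == "...") {
        // Assumption: the wildcard consumes at least one word. Try each length.
        for (std::size_t next = wi + 1; next <= words.size(); ++next)
            if (MatchFrom(pattern, pi + 1, words, next)) return true;
        return false;
    }
    return wi < words.size() && pattern[pi] == words[wi] &&
           MatchFrom(pattern, pi + 1, words, wi + 1);
}

// Hypothetical entry point: does the spoken phrase fit the wildcard pattern?
bool MatchesWildcardPhrase(const std::string& pattern, const std::string& spoken) {
    return MatchFrom(SpokenWords(pattern), 0, SpokenWords(spoken), 0);
}

// Usage: the phrases listed above all fit the "PlayCard" pattern.
void DemoWildcardPhrases() {
    const std::string pattern = "play ... queen of hearts";
    assert(MatchesWildcardPhrase(pattern, "play the queen of hearts"));
    assert(MatchesWildcardPhrase(pattern, "play a queen of hearts"));
    assert(MatchesWildcardPhrase(pattern, "play the left queen of hearts"));
    assert(!MatchesWildcardPhrase(pattern, "play the king of hearts"));
}
```

The matcher reports only whether a phrase fits. Like the engine, it does not hand the consumed words back to the caller.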