
token Element

Contains a string that a speech recognizer can use for recognition and optionally specifies the display form of the string and the precise pronunciation that will trigger recognition.


<token
  sapi:display = "string"
  sapi:pron = "string">
</token>





sapi:display

Optional. Specifies the form of the word or phrase contained by the token element that should be displayed in the user interface. The token element contains the lexical form of a word, which is used for recognition unless a custom pronunciation is specified by the sapi:pron attribute. The display form of a word is often the same as its lexical form.

When using sapi:display in a token element, the grammar element must include the following declaration: xmlns:sapi=""


sapi:pron

Optional. Specifies an inline, custom pronunciation that the speech recognition engine can use to recognize the contents of the token element. The value of sapi:pron must use phones from the phonetic alphabet specified in the sapi:alphabet attribute of the grammar element.

When using sapi:pron in a token element, the grammar element must include the sapi:alphabet attribute, and must also contain the following declaration: xmlns:sapi=""


A token element typically contains a word or short phrase in the language being recognized. For example, although the city name San Francisco consists of two character strings separated by a space, English speakers recognize the name as a single entity, so it can be contained in a single token element. A token element must not be empty.
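As a rough illustration of a multi-word token, the fragment below wraps the city name in one token element so that it is handled as a single unit. This is only a sketch: the surrounding phrase "fly to" is a made-up example, and the fragment is assumed to sit inside a rule of a grammar element, which is not shown.

```xml
<!-- Fragment only; assumed to appear inside a rule of a grammar element. -->
<item>
  fly to
  <!-- Two space-separated strings grouped as a single token. -->
  <token> San Francisco </token>
</item>
```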

The token element allows you to specify three forms of a word: the display form, the lexical form, and a custom pronunciation for the word. Possible uses for the display form include cardinal numbers and acronyms. In the following example, the phrases "United States of America" and "fifty" would be used for recognition, but the user interface would display "USA" and "50".

<item> The <token sapi:display="USA"> United States of America </token> has <token sapi:display="50"> fifty </token> states. </item>

Phones are letters or symbols that describe the sounds of speech. System.Speech supports three phonetic alphabets for specifying custom pronunciations: the Universal Phone Set (UPS), the Speech API (SAPI) Phone set, and the International Phonetic Alphabet (IPA). The phones specified in sapi:pron must match the phonetic alphabet specified in the sapi:alphabet attribute of the grammar element. If the phones are not space-delimited or the specified string contains an unrecognized phone, the recognition engine does not recognize the specified pronunciation as a valid pronunciation of the word contained by the token element. If sapi:pron is specified, the speech recognition engine does not use the string contained by the token element for recognition, but returns the string as the recognition result if the speech input matches the pronunciation specified in the sapi:pron attribute.
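The spacing requirement can be sketched with the pronunciation string used in the example at the end of this topic. The fragment assumes that the enclosing grammar element (not shown) sets sapi:alphabet="x-microsoft-ups" and declares the sapi namespace. The second item is a deliberately malformed variant added here only for contrast.

```xml
<one-of>
  <!-- Phones are space-delimited, so the string can serve as the pronunciation.
       On a match, the contained text "whatchamacallit" is returned as the result. -->
  <item> <token sapi:pron="W AE T CH AE M AE K AA L IH T"> whatchamacallit </token> </item>

  <!-- Same phones with the spaces removed: per the text above, the engine
       does not treat this as a valid pronunciation of the word. -->
  <item> <token sapi:pron="WAETCHAEMAEKAALIHT"> whatchamacallit </token> </item>
</one-of>
```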

Pronunciations specified in token elements in speech recognition grammar documents take precedence over pronunciations specified in lexicons associated with a grammar or a recognition engine. Also, the pronunciation in a token element applies only to the single occurrence of the word or phrase contained by the token element.
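To illustrate the single-occurrence behavior, the hypothetical fragment below uses the same word in two items. It assumes the same grammar-level declarations as the sapi:pron description above (sapi:alphabet and the sapi namespace), and the phrases "the" and "that" are chosen only for illustration.

```xml
<one-of>
  <!-- The inline pronunciation applies to this occurrence only, and takes
       precedence over any lexicon entry for the word. -->
  <item> the <token sapi:pron="W AE T CH AE M AE K AA L IH T"> whatchamacallit </token> </item>

  <!-- No sapi:pron here, so this occurrence is not affected by the token above. -->
  <item> that whatchamacallit </item>
</one-of>
```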

Unlike the Speech Recognition Grammar Specification (SRGS) Version 1.0, System.Speech does not support the use of the xml:lang attribute on the token element. A grammar in System.Speech can contain only one language, which must be declared in the grammar element. To support multiple languages in your application, you can use multiple grammars in parallel, each declaring a single language.
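A minimal sketch of the one-language-per-grammar approach might look like the two separate grammar documents below. The file names, rule names, and phrases are hypothetical. The xmlns value is the namespace URI defined by the W3C SRGS 1.0 specification; it does not appear in the text above and is supplied here as an assumption. How an application loads the two grammars is outside the scope of this sketch.

```xml
<!-- File 1 (hypothetical name: greeting.en-US.grxml) -->
<grammar version="1.0" xml:lang="en-US" root="greeting"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="greeting">
    <item> <token> hello </token> </item>
  </rule>
</grammar>

<!-- File 2 (hypothetical name: greeting.fr-FR.grxml) -->
<grammar version="1.0" xml:lang="fr-FR" root="greeting"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="greeting">
    <item> <token> bonjour </token> </item>
  </rule>
</grammar>
```

Each document declares its language once, on the grammar element, and no token element carries its own xml:lang attribute.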


The grammar in the following example contains slang words and also has an uncommon word: "whatchamacallit". Adding a custom, inline pronunciation using the sapi:pron attribute can improve the accuracy of recognition for the word "whatchamacallit" as well as for the entire phrase that contains it. The example uses phones from the Microsoft Universal Phone Set (UPS) to define the custom pronunciations.

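<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="en-US" root="slang"
tag-format="semantics/1.0" sapi:alphabet="x-microsoft-ups"
version="1.0" xmlns=""
xmlns:sapi="">

  <rule id="slang">
    <!-- Alternative ways to make the request -->
    <one-of>
      <item> give me </item>
      <item> gimme </item>
      <item> hand me </item>
      <item> ha'me </item>
    </one-of>
    <!-- Alternative forms of the article -->
    <one-of>
      <item> the </item>
      <item> duh </item>
    </one-of>
    <!-- Alternative names for the object -->
    <one-of>
      <item> thingamajig </item>
      <!-- Inline UPS pronunciation for an uncommon word -->
      <item> <token sapi:pron="W AE T CH AE M AE K AA L IH T"> whatchamacallit </token> </item>
    </one-of>
  </rule>

</grammar>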


© 2017 Microsoft