lexicon Element

01/20/2015

Specifies an external pronunciation lexicon file.

Syntax

<lexicon
    uri = lexiconURI
    type = mediaType />

Attributes

uri

Required. Specifies the location of the pronunciation lexicon, which is a relative URI or an absolute URI.
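The fragments below are a minimal sketch of the two URI forms. The host name and the lexicons/ folder are hypothetical. For the relative form, we assume the general SRGS rule that a relative URI is resolved against the grammar's base URI, such as the value of xml:base on the grammar element.

```xml
<!-- Absolute URI: the full location of the lexicon.
     The host and path are hypothetical. -->
<lexicon uri="https://www.contoso.com/lexicons/Blue.pls" />

<!-- Relative URI: assumed to resolve against the grammar's base URI,
     for example xml:base="https://www.contoso.com/".
     A System.Speech grammar may contain only one of these elements. -->
<lexicon uri="lexicons/Blue.pls" />
```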

type

Optional. Specifies the media type of the pronunciation lexicon document. Two values are currently supported.

The value application/pls+xml indicates that the lexicon conforms to the Pronunciation Lexicon Specification (PLS) Version 1.0. This is the preferred format.

The value application/vdn.ms-sapi-lex indicates that the lexicon uses the Uncompressed Lexicon format, a format created by Microsoft. This is a legacy format; we recommend that you use the PLS format instead.
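As a sketch, the following element declares the media type of a PLS lexicon explicitly. The file name Blue.pls is borrowed from the example in this topic; its relative location is an assumption.

```xml
<!-- Identify the lexicon as a PLS 1.0 document, the preferred format. -->
<lexicon uri="Blue.pls" type="application/pls+xml" />
```

Because type is optional, omitting it is also valid; stating it simply makes the expected format explicit to anyone reading the grammar.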

Remarks

A pronunciation lexicon is a collection of words or phrases together with their pronunciations, which consist of letters and characters from a supported phonetic alphabet. You can use lexicons to create custom pronunciations for specialized vocabulary in your application. Pronunciations specified in an external lexicon file take precedence over the pronunciations in the speech recognition engine's internal lexicon or dictionary. However, pronunciations specified inline in the grammar, using the sapi:pron attribute of the token Element, take precedence over pronunciations specified in any lexicon. An inline pronunciation applies only to the single occurrence of the word where it is specified. For more information, see Lexicons and Phonetic Alphabets.
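To illustrate the precedence rule, the following rule fragment is a sketch that gives one occurrence of "blue" an inline pronunciation with the sapi:pron attribute of the token element. It assumes that the enclosing grammar declares the sapi namespace and sets sapi:alphabet="x-microsoft-ups", as the grammar in the example below does. The phones B L I are reused from the Blue.pls lexicon in that example.

```xml
<rule id="colors" scope="public">
  <one-of>
    <!-- Inline pronunciation: applies to this occurrence only and
         takes precedence over any lexicon entry for "blue". -->
    <item><token sapi:pron="B L I">blue</token></item>
    <!-- These items use the lexicon or the engine's own pronunciations. -->
    <item> yellow </item>
    <item> red </item>
  </one-of>
</rule>
```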

The lexicon element is an immediate child of the grammar Element. In System.Speech, each grammar document can reference only one lexicon. This is a departure from the Speech Recognition Grammar Specification (SRGS) Version 1.0, which allows a grammar to contain more than one lexicon element.

The lexicon element must be placed after the opening tag of the grammar element and before the first rule element in the grammar element.
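The following skeleton is a sketch of where the single lexicon reference belongs: after the opening grammar tag and before the first rule. The lexicon file name and the rule contents are illustrative.

```xml
<grammar version="1.0" mode="voice" root="colors" xml:lang="en-US"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- The one lexicon reference that System.Speech allows,
       placed before the first rule element. -->
  <lexicon uri="Blue.pls" />

  <rule id="colors" scope="public">
    <one-of>
      <item> blue </item>
      <item> red </item>
    </one-of>
  </rule>

</grammar>
```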

The pronunciation information provided by a lexicon is specific to the single language declared in the xml:lang attribute of the grammar element, and is used only for content defined within the enclosing SRGS document.

Example

The following grammar specifies a simple PLS lexicon that defines the pronunciation for a single word, "blue". Typically, you would use the sapi:pron attribute of the token Element to specify a pronunciation for a single occurrence of a word. However, this example demonstrates how to attach a lexicon to an SRGS grammar and how the pronunciation defined in a lexicon may affect the results of speech recognition. A custom lexicon overrides the pronunciations in the speech recognition engine's internal lexicon and applies to all occurrences of the words it defines.

<?xml version="1.0" encoding="UTF-8"?>

<grammar
  version="1.0" mode="voice" root="colors"
  xml:lang="en-US" tag-format="semantics/1.0"
  sapi:alphabet="x-microsoft-ups"
  xml:base="https://www.contoso.com/"
  xmlns="http://www.w3.org/2001/06/grammar"
  xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">

  <lexicon uri="C:\Test\Blue.pls" />

  <rule id="colors" scope="public">
    <one-of>
      <item> blue </item>
      <item> yellow </item>
      <item> red </item>
    </one-of>
  </rule>

</grammar>

The following are the contents of the PLS lexicon file Blue.pls. The pronunciation is specified in the phoneme element using characters from the Universal Phone Set (UPS), and corresponds to the spoken form "blee". When this lexicon is linked to an SRGS grammar, as shown above, the speech recognition engine will recognize the speech input "blee" but will return "blue" as the recognized text. The grammar will probably not recognize the speech input "blue"; if it does, the recognition will have much lower confidence than for the speech input "blee".

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"
  xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
  alphabet="x-microsoft-ups" xml:lang="en-US">

  <lexeme>
    <grapheme> blue </grapheme>
    <phoneme> B L I </phoneme>
  </lexeme>

</lexicon>
