speak Element (Microsoft.Speech)

The required root element of a Speech Synthesis Markup Language (SSML) document.


<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="string"> </speak>





version: Required. Indicates the version of the World Wide Web Consortium Speech Synthesis Markup Language (SSML) specification used to interpret the document markup. The current version is 1.0.


xml:lang: Required. Specifies the language of the root document. The value may contain only a lowercase, two-letter language code (such as en for English or it for Italian), or may additionally include an uppercase country/region code or other variation. Examples with a country/region code include es-US for Spanish as spoken in the US and fr-CA for French as spoken in Canada. See the Remarks section for additional information.


xmlns: Required. Specifies the URI of the document that defines the markup vocabulary (the element types and attribute names) of the SSML document.

The current URI is http://www.w3.org/2001/10/synthesis.
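
The three required attributes can be checked mechanically before a document is passed to a synthesizer. The following is a minimal sketch in Python using only the standard library. The helper name check_speak_root is hypothetical and is not part of the Speech Platform SDK; it only illustrates the requirements listed above.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_speak_root(ssml_text):
    """Return a list of problems found on the root element (empty if none).

    Hypothetical helper for illustration; not an SDK function.
    """
    root = ET.fromstring(ssml_text)
    problems = []
    # ElementTree reports a namespaced tag as "{namespace-uri}localname".
    if root.tag != "{%s}speak" % SSML_NS:
        problems.append("root element must be speak in the SSML namespace")
    if root.get("version") != "1.0":
        problems.append("version must be 1.0")
    if not root.get(XML_LANG):
        problems.append("xml:lang is required")
    return problems
```

An empty list means the root element carries all three required attributes with the values described above.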


The Microsoft Speech Platform SDK 11 accepts all valid language-country codes as values for the xml:lang attribute. For a given language code specified in the xml:lang attribute, a Runtime Language that supports that language code must be installed to correctly pronounce words in the specified language.

If the xml:lang attribute specifies only a language code (such as "en" for English or "es" for Spanish) and not a country/region code, then any installed voice that expresses support for that generic, region-independent language may produce acceptable pronunciations for words in the specified language. See Language Identifier Constants and Strings for a comprehensive list of language codes.
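
The relationship between a generic language code and region-specific voices can be illustrated with a short Python sketch. This is a simplification under stated assumptions: the function names split_lang and candidate_voices are hypothetical, the list of installed cultures is made up for the example, and matching on the primary language subtag is only an approximation of how a speech engine might decide that a voice supports a language.

```python
def split_lang(tag):
    """Split an xml:lang value such as "fr-CA" into ("fr", "CA").

    A tag with no region, such as "en", yields ("en", None).
    """
    language, _, region = tag.partition("-")
    return language.lower(), (region.upper() or None)


def candidate_voices(requested, installed):
    """Return the installed culture names that could serve the requested tag.

    Hypothetical helper: 'installed' is a plain list of culture names,
    not a value returned by any SDK call.
    """
    language, region = split_lang(requested)
    matches = []
    for culture in installed:
        voice_language, voice_region = split_lang(culture)
        if voice_language != language:
            continue
        # With no region requested, any voice for the language is a candidate.
        if region is None or region == voice_region:
            matches.append(culture)
    return matches
```

With an assumed set of installed cultures en-US, es-MX, es-ES, and fr-CA, a request for "es" yields both Spanish voices as candidates, while a request for "fr-CA" yields only the Canadian French voice.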


A voice is an installed Runtime Language for speech synthesis (TTS, or text-to-speech). The Microsoft Speech Platform Runtime 11 and Microsoft Speech Platform SDK 11 do not include any Runtime Languages for speech synthesis. You must download and install a Runtime Language for each language in which you want to generate synthesized speech. A Runtime Language includes the language model, acoustic model, and other data necessary to provision a speech engine to perform speech synthesis in a particular language. See InstalledVoice for more information.

The other elements in the SSML document that also take the xml:lang attribute (voice, p, and s elements) may declare different languages than the language declared in the speak element. The Speech Platform SDK 11 supports multiple languages in SSML documents.
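
Because child elements can override the language of the speak element, a single document may require several Runtime Languages. The following Python sketch, using only the standard library, lists the distinct xml:lang values declared anywhere in a document so that the required Runtime Languages can be identified in advance. The function name languages_in_use is hypothetical and is not part of the Speech Platform SDK.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_in_use(ssml_text):
    """Return the distinct xml:lang values declared in the document, in order.

    Hypothetical helper for illustration; not an SDK function.
    """
    root = ET.fromstring(ssml_text)
    seen = []
    # Element.iter() walks the root and all descendants in document order.
    for element in root.iter():
        lang = element.get(XML_LANG)
        if lang and lang not in seen:
            seen.append(lang)
    return seen
```

Each value returned corresponds to a language for which a Runtime Language would need to be installed for correct pronunciation.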


The following example creates a prompt that counts to ten, beginning in English, continuing in French, and finishing in German. All the numbers are pronounced correctly only if Runtime Languages for English, French, and German have been installed. As it processes the prompt, the SpeechSynthesizer object automatically selects and uses the default voice (if installed) for each language specified by an xml:lang attribute.

<?xml version="1.0" encoding="ISO-8859-1"?>
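<speak version="1.0"
 xmlns="http://www.w3.org/2001/10/synthesis"
 xml:lang="en-US">

  <!-- Count in English (inherits xml:lang from the speak element) -->
  <p> one, two, three </p>

  <!-- Count in French -->
  <p xml:lang="fr-FR"> quatre, cinq, six </p>

  <!-- Count in German -->
  <p xml:lang="de-DE"> sieben, acht, neun, zehn </p>

</speak>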