Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Speech Synthesis (Microsoft.Speech)

The Microsoft.Speech.Synthesis namespace contains classes that allow you to initialize and configure a speech synthesis engine, create prompts, generate speech, respond to events, and modify voice characteristics. Speech synthesis is often referred to as text-to-speech or TTS.

Create TTS Content (Prompts)

The content that a TTS engine speaks is called a prompt. Creating a prompt can be as simple as typing a string. See Speak the Contents of a String (Microsoft.Speech).

For greater control over speech output, you can build prompts programmatically with the methods of the PromptBuilder class. These methods assemble prompt content from text, Speech Synthesis Markup Language (SSML), files containing text or SSML markup, and prerecorded audio files. PromptBuilder also lets you select a speaking voice and control voice attributes such as rate and volume. See Construct and Speak a Simple Prompt (Microsoft.Speech) and Construct a Complex Prompt (Microsoft.Speech) for more information and examples.
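Microsoft.Speech is a .NET API, so the following Python sketch does not call it. It defines a hypothetical class, `PromptSketch`, that mimics the PromptBuilder idea of appending text, pauses, audio, voice changes, and rate/volume styles. It then renders the result as an SSML 1.0 document. All class and method names are ours. The elements and attribute values (`break`, `audio`, `voice`, `prosody`, `rate="slow"`, `volume="loud"`) come from the W3C SSML 1.0 specification. Which of them a given Runtime Language honors is an assumption to check against the SSML reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


class PromptSketch:
    """Hypothetical, Python-only analogue of PromptBuilder.

    Collects text, pauses, audio, voice, and style changes and renders them
    as an SSML 1.0 document. It does not call Microsoft.Speech.
    """

    def __init__(self, lang="en-US"):
        self.lang = lang
        self._parts = []  # serialized SSML fragments, in order
        self._open = []   # names of elements started but not yet ended

    def append_text(self, text):
        # Escape &, <, > so arbitrary text cannot break the markup.
        self._parts.append(escape(text))
        return self

    def append_break(self, milliseconds):
        # <break time="...ms"/> inserts a pause of the given length.
        self._parts.append(f'<break time="{int(milliseconds)}ms"/>')
        return self

    def append_audio(self, uri, fallback_text):
        # Play a prerecorded file; the element content is the fallback if it can't be played.
        self._parts.append(f"<audio src={quoteattr(uri)}>{escape(fallback_text)}</audio>")
        return self

    def _start(self, name, attrs):
        # Emit only the attributes that were actually given.
        rendered = "".join(f" {key}={quoteattr(value)}"
                           for key, value in attrs.items() if value is not None)
        self._parts.append(f"<{name}{rendered}>")
        self._open.append(name)
        return self

    def _end(self, name):
        if not self._open or self._open[-1] != name:
            raise ValueError(f"no open <{name}> to close")
        self._open.pop()
        self._parts.append(f"</{name}>")
        return self

    def start_voice(self, gender=None, lang=None):
        # SSML 1.0 <voice> selects a voice by attributes such as gender and xml:lang.
        return self._start("voice", {"gender": gender, "xml:lang": lang})

    def end_voice(self):
        return self._end("voice")

    def start_style(self, rate=None, volume=None):
        # <prosody> changes speaking rate (e.g. "slow") and volume (e.g. "loud").
        return self._start("prosody", {"rate": rate, "volume": volume})

    def end_style(self):
        return self._end("prosody")

    def to_ssml(self):
        if self._open:
            raise ValueError(f"unclosed elements: {self._open}")
        body = "".join(self._parts)
        doc = (f'<speak version="1.0" xmlns="{SSML_NS}" '
               f'xml:lang={quoteattr(self.lang)}>{body}</speak>')
        ET.fromstring(doc)  # raises ParseError if the markup is not well-formed
        return doc


prompt = (PromptSketch("en-US")
          .append_text("Your order has shipped.")
          .append_break(500)
          .start_style(rate="slow", volume="loud")
          .append_text("Tracking number: 1 2 3 4.")
          .end_style())
ssml = prompt.to_ssml()
print(ssml)
```

Emitting SSML rather than audio keeps the sketch testable without a speech engine. The resulting string could be saved as an SSML file, which the section notes can be appended to a PromptBuilder.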

You can also author content using SSML-compliant XML, which provides a full range of content authoring features and also allows you to select and control speaking voices. See Speech Synthesis Markup Language Reference (Microsoft.Speech) for a guide to SSML markup.
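When SSML is written by hand, a quick pre-flight check catches the most common mistakes before the markup reaches a synthesizer. The Python sketch below uses the standard library's `xml.etree.ElementTree` to verify three things. The markup must be well-formed. The root must be `<speak>` in the SSML namespace. The root must carry the `version` and `xml:lang` attributes that SSML 1.0 requires. The `check_ssml` function and the sample prompt are hypothetical illustrations. This is not the validation Microsoft.Speech itself performs.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A hand-authored SSML 1.0 prompt (illustrative content only).
AUTHORED = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <p>
      <s>Welcome to the support line.</s>
      <s><prosody rate="slow">Please have your account number ready.</prosody></s>
    </p>
    <break time="750ms"/>
  </voice>
</speak>"""


def check_ssml(text):
    """Return the root xml:lang after basic SSML 1.0 checks; raise ValueError otherwise."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError(f"not well-formed XML: {err}") from err
    if root.tag != f"{{{SSML_NS}}}speak":
        raise ValueError("root element must be <speak> in the SSML namespace")
    if root.get("version") != "1.0":
        raise ValueError('<speak> must declare version="1.0"')
    if root.get(XML_LANG) is None:
        raise ValueError("<speak> must declare xml:lang")
    return root.get(XML_LANG)


print(check_ssml(AUTHORED))
```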

You can append SSML files to a PromptBuilder object for playback by the SpeechSynthesizer. See Use SSML to Control Synthesized Speech (Microsoft.Speech).

Initialize and Manage the Speech Synthesizer

The SpeechSynthesizer class provides access to the functionality of a TTS engine in the Microsoft Speech Platform Runtime 11. Using the SpeechSynthesizer class, you can select a speaking voice, specify the output for generated speech, create handlers for events that the speech synthesizer generates, and start, pause, and resume speech generation. See Initialize and Manage the Speech Synthesizer (Microsoft.Speech).

A voice is an installed Runtime Language for speech synthesis. You can download and install any of 26 Runtime Languages to generate speech in a particular language. See Microsoft Speech Platform SDK 11 Requirements and Installation.

Generate Speech

Using methods on the SpeechSynthesizer class, you can generate speech as either a synchronous or an asynchronous operation from text, SSML markup, files containing text or SSML markup, and prerecorded audio files. See Initialize and Manage the Speech Synthesizer (Microsoft.Speech).

Respond to Events

When generating synthesized speech, the SpeechSynthesizer raises events that tell a speech application when the speaking of a prompt begins and ends, how a speak operation is progressing, and which specific features are encountered in a prompt. The associated EventArgs classes carry information about each raised event, so you can write handlers that respond to events as they occur. See Use Speech Synthesis Events (Microsoft.Speech) for more information and examples.
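The handler pattern described above can be tried out without a speech engine. The Python sketch below is a purely hypothetical simulation, `SimulatedSynthesizer`, and it produces no audio. It calls registered handlers in the order a speak operation reports its lifecycle: start, then one progress notification per word, then completion. In the .NET API these notifications correspond to events such as SpeakStarted, SpeakProgress, and SpeakCompleted. Treat that mapping as a pointer, not a specification. The `ProgressInfo` fields and the fixed per-word timing are our own assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProgressInfo:
    """Hypothetical progress payload, loosely modeled on an EventArgs object."""
    text: str                # the word about to be spoken
    character_position: int  # offset of that word in the prompt
    audio_position_ms: int   # simulated offset into the audio stream


class SimulatedSynthesizer:
    """Produces no audio; calls start, progress, and completion handlers in
    lifecycle order so that handler wiring can be exercised."""

    MS_PER_WORD = 300  # assumed, illustrative timing only

    def __init__(self):
        self.speak_started: List[Callable[[str], None]] = []
        self.speak_progress: List[Callable[[ProgressInfo], None]] = []
        self.speak_completed: List[Callable[[str], None]] = []

    def speak(self, prompt: str) -> None:
        for handler in self.speak_started:
            handler(prompt)
        search_from = 0
        for index, word in enumerate(prompt.split()):
            # Locate the word after the previous one to get its character offset.
            position = prompt.index(word, search_from)
            info = ProgressInfo(word, position, index * self.MS_PER_WORD)
            for handler in self.speak_progress:
                handler(info)
            search_from = position + len(word)
        for handler in self.speak_completed:
            handler(prompt)


log = []
synth = SimulatedSynthesizer()
synth.speak_started.append(lambda p: log.append("started"))
synth.speak_progress.append(
    lambda e: log.append(f"{e.text}@{e.character_position}/{e.audio_position_ms}ms"))
synth.speak_completed.append(lambda p: log.append("completed"))
synth.speak("Hello speech world")
print(log)
```

Keeping handlers as plain lists of callables mirrors the multicast nature of .NET events, where several handlers can subscribe to the same event.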

Control Voice Characteristics

To control the characteristics of speech output, you can select a voice with specific attributes such as language or gender. You can also modify properties of the SpeechSynthesizer such as rate and volume, or add instructions that guide the pronunciation of specified words or phrases, either in prompt content or in separate lexicon files. See Control Voice Attributes (Microsoft.Speech) for more information and examples.
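One way to guide pronunciation inside prompt content is the SSML 1.0 `<sub alias="...">` element. It tells the engine to speak the alias in place of the written text. The hypothetical Python helper `with_aliases` below wraps each listed word in `<sub>` and XML-escapes everything else, producing an SSML fragment. It covers only the in-prompt approach; separate lexicon files and `<phoneme>` phonetic spellings are not shown. The helper assumes each key begins and ends with a letter or digit, because it relies on regex word boundaries.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def with_aliases(text: str, aliases: dict) -> str:
    """Return an SSML fragment in which each listed word is wrapped in
    <sub alias="...">; all other text is XML-escaped.

    Assumes every key starts and ends with a word character (letter, digit,
    or underscore), as required by the \\b anchors below.
    """
    if not aliases:
        return escape(text)
    # Longest keys first, so a longer key wins over a shorter key it contains.
    alternatives = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    pieces, last = [], 0
    for match in pattern.finditer(text):
        word = match.group(1)
        pieces.append(escape(text[last:match.start()]))
        pieces.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


fragment = with_aliases("Read the W3C spec & the SSML notes.",
                        {"W3C": "World Wide Web Consortium",
                         "SSML": "speech synthesis markup language"})
print(fragment)
```

The fragment can be placed inside a `<speak>` or `<voice>` element of a larger SSML prompt.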
