Responding to speech interactions (XAML)

Applies to Windows and Windows Phone

Incorporate voice commands, speech recognition, and text-to-speech (TTS) into the user interaction experience of your Windows Runtime app.

Note  Voice commands and speech recognition are supported only by Windows Phone Store apps. Windows Store apps support TTS only.

Speech can be a compelling way for users to interact with your app. Use it as a primary or complementary input option to increase usability and broaden the appeal of your app in the Store. Speech integration is particularly useful where physical manipulation or eye contact is difficult or undesirable.

Tip  The info in this topic is specific to developing Windows Store apps using C++, C#, or Visual Basic. See Responding to speech interactions (HTML) for Windows Store apps using JavaScript.

Prerequisites:  Have a look through these topics to get familiar with the technologies discussed here.

Create your first Windows Store app using C# or Visual Basic

Create your first Windows Store app using C++

Roadmap for Windows Store apps using C# or Visual Basic

Roadmap for Windows Store apps using C++

Learn about events with Events and routed events overview

User experience guidelines:  

See Speech design guidelines for helpful tips on designing a useful and engaging speech-enabled app.

In this section


Quickstart: Voice commands with Cortana

Use voice commands with Cortana to launch an app and specify an action or command to execute.

How to dynamically modify VCD phrase lists

Learn how to update the list of supported phrases (PhraseList elements) in a Voice Command Definition (VCD) file at run time.

Quickstart: Speech recognition

Use speech recognition to provide input, specify an action or command, and accomplish tasks within your Windows Phone Store app.

How to define custom recognition constraints

Learn how to define and use custom constraints for speech recognition.

How to manage issues with audio input

Learn how to manage issues with speech-recognition accuracy caused by audio-input quality and condition.



Speech functionality is composed of three modes: system-supported voice commands (Windows Phone only), app-enabled speech recognition (Windows Phone only), and TTS. This illustration shows how these modes work together.

The three speech components

Voice commands

Applies to Windows Phone

Voice commands are supported by the system, extended in your app, and accessed by the user from outside your app.

Once an app is installed, it can be launched through voice commands such as "open" or "start", followed by the app name. By extending voice command functionality in your app, you can link to a specific page in the app, perform a task, or initiate an action using a phrase such as "Start Contoso Search" or "Contoso Show Me My Favorites."

When you extend and customize voice commands, users can discover the phrases your app listens for through system help and the "What can I say?" screen.

For more info, see Quickstart: Voice commands with Cortana.

Speech recognition

Applies to Windows Phone

Speech recognition is implemented in your app and accessed by the user from your app.

Users can provide input or accomplish tasks with speech recognition. The feature includes support for pre-defined grammars for free-text dictation and web search, and support for custom grammars authored using Speech Recognition Grammar Specification (SRGS) Version 1.0.
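To make the custom-grammar option more concrete, here is a minimal sketch in standard C++ that writes a small SRGS 1.0 grammar with a single rule listing alternative phrases. The function names (BuildSrgsGrammar, EscapeXmlText), the rule id "command", and the sample phrases are hypothetical and chosen only for illustration. Saving the result to a file and loading it through a grammar-file constraint such as SpeechRecognitionGrammarFileConstraint is one plausible way to use it, but that step is not shown here.

```cpp
#include <string>
#include <vector>

// Escape the characters that are not allowed in XML element text.
std::string EscapeXmlText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Build an SRGS 1.0 XML grammar whose root rule accepts any one of `phrases`.
// Assumption: `lang` is a plain language tag such as "en-US" and needs no escaping.
std::string BuildSrgsGrammar(const std::vector<std::string>& phrases,
                             const std::string& lang = "en-US") {
    std::string g;
    g += "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    g += "<grammar version=\"1.0\" xml:lang=\"" + lang + "\" root=\"command\"\n";
    g += "         xmlns=\"http://www.w3.org/2001/06/grammar\">\n";
    g += "  <rule id=\"command\" scope=\"public\">\n";
    g += "    <one-of>\n";
    for (const std::string& phrase : phrases) {
        g += "      <item>" + EscapeXmlText(phrase) + "</item>\n";
    }
    g += "    </one-of>\n";
    g += "  </rule>\n";
    g += "</grammar>\n";
    return g;
}
```

Calling BuildSrgsGrammar with the phrases "show favorites" and "start search" yields a grammar that matches exactly one of those two utterances. Real grammars usually add nested rules and semantic tags, which this sketch leaves out.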

You can use the default system UI for speech recognition that supports disambiguation and provides visual feedback to users, or you can create your own UI.

See Quickstart: Speech recognition and Windows.Media.SpeechRecognition.

Text-to-speech (TTS)

Applies to Windows and Windows Phone

Text-to-speech (TTS), also known as speech synthesis, is implemented in your app and accessed by the user from your app.

TTS enables your app to read aloud a basic text string, or a more complex one declared in Speech Synthesis Markup Language (SSML).

SSML provides a standard way to control characteristics of speech output such as pronunciation, volume, pitch, rate or speed, and emphasis.
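As an illustration of the markup described above, the following sketch in standard C++ assembles a small SSML 1.0 document that wraps a text string in a prosody element. The helper names (BuildSsml, EscapeXml) are hypothetical and not part of any Windows API. The resulting string is the kind of input an SSML-based synthesis call such as SpeechSynthesizer.SynthesizeSsmlToStreamAsync would take, though that call is not shown here.

```cpp
#include <string>

// Escape the characters that are special in XML text and attribute values.
std::string EscapeXml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Build an SSML 1.0 document that speaks `text` with the given prosody.
// `rate`, `pitch`, and `volume` take SSML prosody values such as
// "slow", "high", and "loud".
std::string BuildSsml(const std::string& text,
                      const std::string& rate,
                      const std::string& pitch,
                      const std::string& volume,
                      const std::string& lang = "en-US") {
    std::string ssml;
    ssml += "<speak version=\"1.0\"";
    ssml += " xmlns=\"http://www.w3.org/2001/10/synthesis\"";
    ssml += " xml:lang=\"" + EscapeXml(lang) + "\">";
    ssml += "<prosody rate=\"" + EscapeXml(rate) + "\"";
    ssml += " pitch=\"" + EscapeXml(pitch) + "\"";
    ssml += " volume=\"" + EscapeXml(volume) + "\">";
    ssml += EscapeXml(text);
    ssml += "</prosody></speak>";
    return ssml;
}
```

For example, BuildSsml("Welcome back", "slow", "high", "loud") produces a speak element whose content is read slowly, at a higher pitch, and loudly. Other SSML elements, such as emphasis or phoneme, could be added in the same way.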

See Windows.Media.SpeechSynthesis.

Related topics

Responding to user interaction



© 2014 Microsoft