Text-to-Speech: Accessibility Feature for Silverlight 3.0 Applications

Apurva Lawale, Stefan Wick

(Special thanks to Mark Rideout, Jennifer Linn, Annuska Perkins, and Sean Hayes)

Microsoft Corporation

March 2009

Silverlight 3.0 enables you to build sophisticated, accessible applications that can be used by as many people as possible, including people with disabilities. This article shows you how to use text-to-speech functionality to help make your Silverlight 3.0 applications accessible.

Text-to-speech services are commonly used by people who are blind or have other vision impairments, as well as by people who experience learning difficulties, such as dyslexia. From an application developer’s perspective, text-to-speech services enable you to let your users hear audio feedback and instructions about the UI.

This article includes sample code that shows how text-to-speech functionality in a Silverlight 3.0 application can use a Web service to turn short strings into audio. A fully working project with the complete source code is provided to highlight some powerful features of Silverlight 3, such as processing and decoding audio for text-to-speech purposes. To download the source code, see Text-to-Speech: Accessibility Feature for Silverlight 3.0 Applications on MSDN Code Gallery.

Text-to-Speech vs. Microsoft SAPI

Before we go any further with the approach described in this article, let's compare it with another approach, one that uses the Microsoft Speech API (SAPI) ActiveX control to implement text-to-speech functionality, as implemented in ButtercupReader, an online digital talking book reader. The following table highlights the differences between the two.

Implement accessibility using Text-to-Speech functionality and Silverlight 3.0

Advantages:
  • Works with any browser that is officially supported by Silverlight 3 (cross-browser)
  • Works on operating systems supported by Silverlight 3 (cross-platform)
  • Text-to-speech decoding happens on the server side

Disadvantages:
  • A Web service needs to be implemented
  • Works well for short sentences; longer text must be broken up into smaller pieces and processed in chunks

Implement accessibility using the Microsoft Speech API (SAPI) (as used in Project ButtercupReader)

Advantages:
  • Text-to-speech decoding happens on the client side
  • No Web service implementation is needed, which avoids a Web server dependency
  • Long sentences can be processed more easily and quickly on the client side

Disadvantages:
  • Works only on the Windows platform
  • Might require changing security settings in Internet Explorer to allow initializing and scripting of ActiveX controls


As shown in the table, each approach has its advantages and disadvantages. The method you choose depends on the type of functionality you want to provide in the application, keeping the end user in mind.
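The table notes that the server-side approach works best for short strings, and that longer text must be broken into smaller pieces before it is sent to the service. As a rough illustration of that chunking, a helper along these lines could split text at sentence boundaries; note that SplitIntoChunks is a hypothetical helper of our own and is not part of the sample project.

```csharp
using System;
using System.Collections.Generic;

static class TextChunker
{
    // Hypothetical helper (not part of the sample project): split text at
    // sentence boundaries so that each chunk sent to the TTS Web service
    // stays under maxLength characters.
    public static List<string> SplitIntoChunks(string text, int maxLength)
    {
        var chunks = new List<string>();
        var current = "";
        foreach (string part in text.Split(new[] { ". " }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Re-attach the period stripped by the split.
            string sentence = part.TrimEnd('.') + ".";

            // Start a new chunk when adding this sentence would exceed the limit.
            if (current.Length > 0 && current.Length + sentence.Length + 1 > maxLength)
            {
                chunks.Add(current.Trim());
                current = "";
            }
            current += sentence + " ";
        }
        if (current.Trim().Length > 0)
            chunks.Add(current.Trim());
        return chunks;
    }
}
```

Each resulting chunk can then be sent to the Web service in turn and the returned audio played back sequentially.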

Text-to-Speech Services in Silverlight

By default, Silverlight does not provide text-to-speech functionality; however, Silverlight content is accessible by default through screen readers and other assistive technologies. The following solution gives applications and users more control than a standard screen reader over which content is spoken, and when. It is also useful for dynamic text that needs to be spoken aloud.

The solution consists of five steps:

1.    Entering text into a textbox.

2.    Calling a Web Service to process the text.

3.    Using the .NET System.Speech synthesizer to convert the text to WAV.

4.    Decoding the byte array with a WAV decoding class.

5.    Playing back the WAV stream with Silverlight’s MediaElement.

Let’s look at the five steps in more detail.

Step 1:

We have created a “Silverlight Application” project. The user interface is simple—it consists of a button and a textbox where the user enters the text to be read aloud.

The XAML (Page.xaml) for our user interface looks like:

<UserControl x:Class="TextToSpeech.Page"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    Width="400" Height="300">
    <Grid x:Name="LayoutRoot" Background="White">
        <StackPanel>
            <TextBox x:Name="textToSpeak" />
            <Button Content="Speak" Click="Button_Click" />
            <MediaElement x:Name="media" AutoPlay="True" />
        </StackPanel>
    </Grid>
</UserControl>

The code-behind file (Page.xaml.cs) looks like:

using System;
using System.IO;
using System.Windows;
using System.Windows.Controls;
using System.ServiceModel;
using TextToSpeech.SpeechService;

namespace TextToSpeech
{
    public partial class Page : UserControl
    {
        public Page()
        {
            InitializeComponent();
        }

        private void Button_Click(object sender, RoutedEventArgs e)
        {
            // Create a new client instance for the TTS Web service.
            BasicHttpBinding binding = new BasicHttpBinding() { MaxReceivedMessageSize = int.MaxValue };
            EndpointAddress address = new EndpointAddress(@"http://localhost:60382/SpeechService.svc");
            SpeechServiceClient client = new SpeechServiceClient(binding, address);

            // Subscribe to the Completed event so we know when the service
            // has processed the text and returned.
            client.CreateWavStreamCompleted +=
                new EventHandler<CreateWavStreamCompletedEventArgs>(client_CreateWavStreamCompleted);

            // Send the text to the Web service for processing.
            client.CreateWavStreamAsync(textToSpeak.Text);
        }

        // The Completed event handler shown in Step 3 goes here.
    }
}

Step 2:

Silverlight alone does not have the functionality to perform text-to-speech. To provide this functionality we will use a Web service; in the sample code provided with this article, a "Silverlight-enabled WCF Service" is used in the project.

using System.IO;
using System.Speech.Synthesis;
using System.ServiceModel;
using System.ServiceModel.Activation;

namespace TextToSpeech.Web
{
    [ServiceContract(Namespace = "")]
    [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
    public class SpeechService
    {
        [OperationContract]
        public byte[] CreateWavStream(string textToSpeech)
        {
            // Create an instance of the speech synthesizer.
            SpeechSynthesizer ss = new SpeechSynthesizer();

            // Buffer to hold the data to send back to Silverlight.
            MemoryStream ms = new MemoryStream();

            // Render the text as WAV data into the memory stream.
            ss.SetOutputToWaveStream(ms);
            ss.Speak(textToSpeech);

            // Pass the audio in memory to Silverlight as a byte array.
            return ms.ToArray();
        }
    }
}

The .NET Framework 3.0 and later include the assemblies needed to synthesize audio, and we use them in our Web service. Remember to reference the System.Speech assembly by adding it to References in the Solution Explorer of your project.
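For the Silverlight client's BasicHttpBinding from Step 1 to reach the service, the Web project's web.config needs a matching endpoint. A sketch along these lines should work; the service and contract names are taken from the sample's namespace, and the template Visual Studio generates for a Silverlight-enabled WCF service may differ (for example, by using a binary-encoded customBinding instead).

```xml
<system.serviceModel>
  <!-- Required because the service is marked with AspNetCompatibilityRequirements. -->
  <serviceHostingEnvironment aspNetCompatibilityEnabled="true" />
  <services>
    <service name="TextToSpeech.Web.SpeechService">
      <!-- basicHttpBinding matches the BasicHttpBinding used by the client. -->
      <endpoint address="" binding="basicHttpBinding"
                contract="TextToSpeech.Web.SpeechService" />
    </service>
  </services>
</system.serviceModel>
```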

Step 3:

The Web service performs only the string-to-WAV conversion; the WAV decoding takes place in managed code on the client side. It is now time to prepare the returned WAV data for decoding.

void client_CreateWavStreamCompleted(object sender, CreateWavStreamCompletedEventArgs e)
{
    // Convert the byte array returned by the service back into a MemoryStream.
    MemoryStream ms = new MemoryStream(e.Result);

    // Pass the MemoryStream to WAVMediaStreamSource for decoding, then hand
    // the result to Silverlight's MediaElement for playback.

    // Audio decoding shown in Step 4 goes here.

    // Playback shown in Step 5 goes here.
}

Step 4:

// The audio data held in memory in the MemoryStream from Step 3
// is passed to WAVMediaStreamSource for decoding.
WAVMediaStreamSource ws = new WAVMediaStreamSource(ms);

WAVMediaStreamSource is a custom WAV decoding class that is used to decode the byte array as WAV. This class inherits from System.Windows.Media.MediaStreamSource. Be sure to examine this class in the source code provided with this article to see some of the newer capabilities of MediaStreamSource in Silverlight; the class is documented with comments to make it easy to understand.
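For readers who do not download the project, the first thing a WAV decoder must do is read the RIFF header to learn the audio format. The following minimal sketch is our own illustration of that step, not the actual code of WAVMediaStreamSource, which walks the RIFF chunks more robustly; it assumes the canonical 44-byte header layout and a little-endian platform (real code should check BitConverter.IsLittleEndian).

```csharp
using System;
using System.Text;

static class WavHeaderReader
{
    // Illustration only: read the channel count, sample rate, and bit depth
    // from a canonical 44-byte RIFF/WAVE header.
    public static void Parse(byte[] wav, out short channels, out int sampleRate, out short bitsPerSample)
    {
        // A WAV stream starts with "RIFF", followed at offset 8 by "WAVE".
        if (Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
            throw new FormatException("Not a RIFF/WAVE stream.");

        // Fields of the "fmt " chunk at their canonical offsets (little-endian).
        channels = BitConverter.ToInt16(wav, 22);      // mono = 1, stereo = 2
        sampleRate = BitConverter.ToInt32(wav, 24);    // e.g. 22050 Hz
        bitsPerSample = BitConverter.ToInt16(wav, 34); // e.g. 16
    }
}
```

MediaStreamSource needs exactly this kind of format information to describe the stream to MediaElement before it can deliver audio samples.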

Step 5:

In the previous step we decoded the WAV data, which is now ready to be played back. We pass the WAVMediaStreamSource to Silverlight's MediaElement for playback.

// Pass the WAVMediaStreamSource instance created in Step 4
// to Silverlight's MediaElement for playback.
media.SetSource(ws);
media.Play();



For more information about designing accessible applications using Silverlight, please see the resources listed below.