ISpVoice (SAPI 5.3)

Microsoft Speech API 5.3

Text-to-Speech Overview

ISpVoice Introduction

The central SAPI interface for text-to-speech (TTS) is ISpVoice. Using this interface, applications can add TTS support such as speaking text, modifying speech characteristics, changing voices, and responding to real-time events while speaking. Most applications should need only this single interface for basic TTS support.

Applications obtain access to the ISpVoice interface methods by creating a COM object. As the name implies, each ISpVoice object is a single, individual instance of a specific TTS voice. Even if two different ISpVoice objects select the same base voice (for example "Mike"), each of the two voices can be changed and modified independently of the other.



When an application first creates an ISpVoice object, the object initializes to the default voice (set in Speech properties of Control Panel). This means that the new object is immediately ready to speak text; no special initialization is needed. At this point, applications can use Speak or SpeakStream to speak any Unicode text data.


Synchronous vs. Asynchronous Speaking

The two speaking functions can generate speech either synchronously (the function does not return until the text has been completely spoken) or asynchronously (the function returns immediately and speech continues in the background). Asynchronous operation is the right choice if the application needs to do something else (highlight text, paint animation, monitor controls, etc.) while speaking. Otherwise, the simplest approach is to speak synchronously.


Getting Status Information

During asynchronous speech, applications can get current status information (text position, speech done state, bookmarks, etc.) in one of two ways. The simplest way is to periodically poll the ISpVoice object using the GetStatus method. The other way is to initialize the ISpVoice object so that it sends real-time events to the application as they happen.


Flow Control

As a convenience, most TTS applications allow users to temporarily suspend speech output. The Pause and Resume methods are typically called in response to a user-initiated action, such as clicking a pause button.


Modifying Voice Attributes

TTS voice output often needs to be modified from its default settings. There are two ways to do this: by calling certain ISpVoice API methods, or by embedding special Extensible Markup Language (XML) tags within the spoken text. Typically, the API methods act as global settings that affect speech regardless of the currently selected voice or the document being spoken, while the XML tags are used in a much narrower scope, affecting only the spoken style within a single document.
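
To illustrate the narrower, XML-based route, the sketch below builds marked-up text using the SAPI rate and volume tags. The helper names WithRate and WithVolume are hypothetical, and the sketch assumes the input text contains no markup characters such as '<' or '&'. Values are clamped to the ranges SAPI uses for rate (-10 to 10) and volume (0 to 100).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of SAPI): wraps text in a <rate> element.
// absSpeed is clamped to the -10..10 range.
std::wstring WithRate(const std::wstring& text, int absSpeed)
{
    const int speed = std::max(-10, std::min(10, absSpeed));
    return L"<rate absspeed=\"" + std::to_wstring(speed) + L"\">" + text + L"</rate>";
}

// Hypothetical helper (not part of SAPI): wraps text in a <volume> element.
// level is clamped to the 0..100 range.
std::wstring WithVolume(const std::wstring& text, int level)
{
    const int clamped = std::max(0, std::min(100, level));
    return L"<volume level=\"" + std::to_wstring(clamped) + L"\">" + text + L"</volume>";
}
```

The resulting string would then be passed to Speak together with the SPF_IS_XML flag, so the change applies only to the wrapped text. The global counterpart is a call such as SetRate or SetVolume on the voice object, which affects everything the voice speaks afterwards.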


Audio Output

Although a sound card is usually the default destination for desktop applications, TTS audio output is not restricted to it. SAPI TTS supports, either directly or indirectly, just about any audio configuration an application may require. Whether the destination is a PC sound card, a buffer in memory, or special telephony hardware, ISpVoice has several audio control methods to change the audio path from its default configuration.



ISpVoice Methods

Speaking Text
  Speak                       Speaks a text string or file.
  SpeakStream                 Speaks a text stream or plays an audio (WAV) stream.

Real-time Status
  GetStatus                   Returns current speech and event status information.
  WaitUntilDone               Delays until either the voice has completed speaking or the specified time interval has elapsed.
  SpeakCompleteEvent          Returns an event handle that will be signaled when speech is done.

Flow Control
  Pause                       Pauses the output speech at the nearest alert boundary.
  Resume                      Resumes speaking.
  Skip                        Skips ahead or backward to a new input text position while speaking.

Changing Voice Attributes
  SetRate                     Sets the speaking rate in real time.
  GetRate                     Returns the current speaking rate.
  SetVolume                   Sets the speech volume level in real time.
  GetVolume                   Returns the current speech volume level.
  SetVoice                    Sets the identity of the voice used for synthesis.
  GetVoice                    Retrieves the object token that identifies the current voice.

Real-time Event Management (inherited from ISpEventSource)
  SetInterest                 Sets the type of events to queue.
  GetEvents                   Returns the queued events.
  GetInfo                     Returns information about the event queue.
  SetNotifySink               Sets up the instance to make free-threaded calls through ISpNotifySink::Notify.
  SetNotifyWindowMessage      Sets a window handle to receive notifications as window messages.
  SetNotifyCallbackFunction   Sets a callback function to receive notifications.
  SetNotifyCallbackInterface  Sets an object derived from ISpNotifyCallback to receive notifications.
  SetNotifyWin32Event         Sets up a Win32 event object to be used by this instance for notifications.
  WaitForNotifyEvent          A blocking call that waits for a notification.
  GetNotifyEventHandle        Retrieves the Win32 event handle associated with this notify source.

Audio Output Control
  SetOutput                   Sets the current output object. A value of NULL may be used to select the default audio device.
  GetOutputStream             Retrieves a pointer to the current output stream.
  GetOutputObjectToken        Retrieves the object token for the current output object.
  SetPriority                 Sets the priority for the voice.
  GetPriority                 Retrieves the current voice priority level.
  SetAlertBoundary            Specifies which event should be used as the insertion point for alerts.
  GetAlertBoundary            Retrieves the event that is currently being used as the insertion point for alerts.
  IsUISupported               Determines if the specified type of UI is supported.
  DisplayUI                   Displays the requested UI.
  SetSyncSpeakTimeout         Sets the timeout interval, in milliseconds, after which synchronous Speak and SpeakStream calls to this voice instance time out.
  GetSyncSpeakTimeout         Retrieves the timeout interval for synchronous speech operations for this ISpVoice instance.