Microsoft Speech Platform

Persist TTS Output to a WAV File

This topic explains how to use the text-to-speech (TTS) functionality in the Microsoft Speech Platform to capture TTS output in a WAV file. The example shows how to create a voice, bind an output stream to a WAV file, direct the voice's output to that stream, and generate speech using the Speak method.

The following C++/ATL COM example speaks a text string ("Hello World") and records the synthesized speech output to the file "c:\ttstemp.wav". The example uses the helper class CSpStreamFormat to set the format of the WAV file. It then uses the helper function SPBindToFile to bind the audio stream to the WAV file. The example specifies SPSF_22kHz16BitMono for the output audio format. Microsoft TTS engines downsample audio that is of greater than 8-bit resolution.
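To make the format identifier more concrete, the sketch below works out the PCM parameters that SPSF_22kHz16BitMono (the value passed to AssignFormat in this topic's example) stands for: 22,050 samples per second, 16 bits per sample, one channel. It is plain portable C++ and does not call the Speech Platform. PcmFormat and makePcmFormat are hypothetical names used only for illustration; in the real example, CSpStreamFormat fills in the equivalent WAVEFORMATEX fields for you.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PCM-related fields of WAVEFORMATEX.
struct PcmFormat {
    std::uint32_t samplesPerSec;   // sample rate in Hz
    std::uint16_t bitsPerSample;   // resolution of one sample
    std::uint16_t channels;        // 1 = mono, 2 = stereo
    std::uint16_t blockAlign;      // bytes per sample frame (all channels)
    std::uint32_t avgBytesPerSec;  // data rate of the audio stream
};

// Derives the dependent fields from the three independent PCM parameters.
PcmFormat makePcmFormat(std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample,
                        std::uint16_t channels)
{
    PcmFormat f{};
    f.samplesPerSec  = samplesPerSec;
    f.bitsPerSample  = bitsPerSample;
    f.channels       = channels;
    f.blockAlign     = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    f.avgBytesPerSec = samplesPerSec * f.blockAlign;
    return f;
}
```

Calling makePcmFormat(22050, 16, 1) gives a block alignment of 2 bytes and a data rate of 44,100 bytes per second, which is a useful rule of thumb for estimating how large the resulting WAV file will be.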

You must call the ISpVoice::SetOutput() method to direct the audio output to the file-bound stream; otherwise, the voice sends its output to the default audio device. For simplicity, the ISpVoice::Speak() call is synchronous. To speak asynchronously, change the speak flag from SPF_DEFAULT to SPF_ASYNC and call ISpVoice::WaitUntilDone() after ISpVoice::Speak() to wait for synthesis to complete before closing the stream.

	
HRESULT             hr = S_OK;
CComPtr<ISpVoice>   cpVoice;
CComPtr<ISpStream>  cpStream;
CSpStreamFormat     cAudioFmt;

// Create a voice.
hr = cpVoice.CoCreateInstance( CLSID_SpVoice );

// Set the audio format.
if(SUCCEEDED(hr))
{
  hr = cAudioFmt.AssignFormat(SPSF_22kHz16BitMono);
}
	
// Bind the audio stream to the file.
if(SUCCEEDED(hr))
{
  hr = SPBindToFile( L"c:\\ttstemp.wav", SPFM_CREATE_ALWAYS,
                     &cpStream, &cAudioFmt.FormatId(), cAudioFmt.WaveFormatExPtr() );
}
	
// Store the output audio data in cpStream.
if(SUCCEEDED(hr))
{
  hr = cpVoice->SetOutput( cpStream, TRUE );
}

// Speak the text "Hello World" synchronously.
if(SUCCEEDED(hr))
{
  hr = cpVoice->Speak( L"Hello World",  SPF_DEFAULT, NULL );
}
	
// Close the stream.
if(SUCCEEDED(hr))
{
  hr = cpStream->Close();
}

// Release the stream and the voice object.
cpStream.Release();
cpVoice.Release();
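The asynchronous variant mentioned earlier can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: SpeakToWavAsync is a hypothetical helper name, the header names assume the SAPI-style headers (sapi.h and sphelper.h) that ship with the SDK, and the caller is assumed to have initialized COM on the current thread. The only substantive changes from the example above are the SPF_ASYNC flag and the WaitUntilDone() call.

```cpp
#include <windows.h>
#include <atlbase.h>
#include <sapi.h>      // assumed header for ISpVoice, ISpStream
#include <sphelper.h>  // assumed header for CSpStreamFormat, SPBindToFile

// Hypothetical helper: queues pszText for synthesis into pszWavPath,
// then blocks until the voice has finished. Assumes COM is initialized.
HRESULT SpeakToWavAsync(const WCHAR* pszWavPath, const WCHAR* pszText)
{
    CComPtr<ISpVoice>   cpVoice;
    CComPtr<ISpStream>  cpStream;
    CSpStreamFormat     cAudioFmt;

    HRESULT hr = cpVoice.CoCreateInstance(CLSID_SpVoice);

    if (SUCCEEDED(hr))
        hr = cAudioFmt.AssignFormat(SPSF_22kHz16BitMono);

    if (SUCCEEDED(hr))
        hr = SPBindToFile(pszWavPath, SPFM_CREATE_ALWAYS, &cpStream,
                          &cAudioFmt.FormatId(), cAudioFmt.WaveFormatExPtr());

    if (SUCCEEDED(hr))
        hr = cpVoice->SetOutput(cpStream, TRUE);

    // SPF_ASYNC makes Speak return as soon as the request is queued.
    if (SUCCEEDED(hr))
        hr = cpVoice->Speak(pszText, SPF_ASYNC, NULL);

    // Wait for synthesis to finish before the stream is closed.
    if (SUCCEEDED(hr))
        hr = cpVoice->WaitUntilDone(INFINITE);

    // Close the stream if it was created, even when an earlier step failed.
    if (cpStream)
        cpStream->Close();

    return hr;
}
```

Between the Speak() and WaitUntilDone() calls the calling thread is free to do other work, which is the main reason to prefer the asynchronous form. Passing a finite timeout instead of INFINITE is also possible, in which case the return value should be checked before the stream is closed.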
	
	