
MkVoice (SAPI 5.3)

Microsoft Speech API 5.3



MkVoice creates text-to-speech (TTS) voice fonts for sample use. It combines a series of individually spoken words into a single file. The resulting file is automatically loaded and you may use it in any TTS application recognizing SAPI 5 voices.

MkVoice is intended to demonstrate how sample voice fonts are made. Its limited scope is not suited to creating larger, more robust voices. In addition, it does not include extensive error checking or error prevention routines, so the resulting file may contain conditions that degrade performance.

Running MkVoice

MkVoice is a command line application that accepts three parameters:

mkvoice WordListFile VoiceFile VoiceName

WordListFile is a plain-text list of the words to concatenate into the output file. Save the file as text only, or create it with a simple editor such as Notepad; no character formatting is allowed. List one word per line, and terminate each line with a carriage return.

The list can be of any size, but each word must have a corresponding wav file with the same name. For example, if you use the word "enter," you need a file named enter.wav. The first entry in the list is the default word: when a word is encountered that is not in the list, the default word is spoken instead.

For example, if you use the SDK example TTSApp with the "Sample TTS Voice," the initially displayed text is spoken as written: "enter text to be spoken here." If you change the text to include a word that is not in the word list, such as "enter text now," it is spoken as "enter text blah." "Now" is not in wordlist.txt, and because "blah" is the first entry, it is used for all unknown words.
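
The default-word behavior described above can be illustrated with a minimal Python sketch. It is not MkVoice's or SAPI's actual code; the function names, the hypothetical list contents, and the lowercasing of input words are all assumptions made for illustration.

```python
def load_word_list(text):
    """Return the entries of a word list, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_words(sentence, entries):
    """Map each word of the sentence to a list entry, or to the default."""
    default = entries[0]  # the first entry in the list is the default word
    known = set(entries)
    # Assumption: matching is done on lowercased, whitespace-separated words.
    return [w if w in known else default for w in sentence.lower().split()]


# Hypothetical list contents; "blah" is first, so it is the default word.
entries = load_word_list("blah\nenter\ntext\nto\nbe\nspoken\nhere\n")
print(resolve_words("enter text now", entries))  # ['enter', 'text', 'blah']
```

Because "now" has no entry, it resolves to the first entry, which matches the "enter text blah" result described above.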


VoiceFile is the resulting output file. By SAPI 5 convention, vce is the recommended file name suffix, although it is not required. If the file is generated successfully, MkVoice automatically loads the sample. The new voice is displayed with the name "Sample TTS Voice." This replaces any previous voice fonts. The voice is defined as an English-speaking male.

VoiceName is the name associated with the voice by the object token.

Finally, to run MkVoice, the application must be in the same folder as the wav files and the word list. If it runs successfully, the application creates an output file; otherwise, it displays error messages.
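
Since MkVoice itself performs little error checking, it may help to confirm that every word-list entry has a matching wav file in the working folder before running the tool. The following Python sketch is a hypothetical pre-flight helper, not part of the SDK; the function name, the default file name wordlist.txt, and the UTF-8 encoding are assumptions.

```python
import os
import tempfile


def missing_wav_files(folder, word_list_name="wordlist.txt"):
    """Return the list entries that have no matching .wav file in folder."""
    # Assumption: the word list is plain text readable as UTF-8.
    with open(os.path.join(folder, word_list_name), encoding="utf-8") as f:
        entries = [line.strip() for line in f if line.strip()]
    return [w for w in entries
            if not os.path.isfile(os.path.join(folder, w + ".wav"))]


# Demonstration in a throwaway folder with one wav file deliberately absent.
with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, "wordlist.txt"), "w", encoding="utf-8") as f:
        f.write("blah\nenter\ntext\n")
    for word in ("blah", "enter"):
        open(os.path.join(demo, word + ".wav"), "wb").close()
    demo_missing = missing_wav_files(demo)
    print(demo_missing)  # ['text']
```

An empty result means every entry has a wav file; it says nothing about whether those files are in the required audio format.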

Creating Voice Fonts

Voice fonts are a collection of words spoken by a person and assembled into a phonetic dictionary. When SAPI encounters a word, it looks the word up in this dictionary, and a successful match plays a portion of the sound file for that word. By contrast, a synthesized voice uses mathematical algorithms to produce the word. Voice fonts produce what is often considered better and more natural prosody, that is, the way the word sounds.

Voice fonts require two components. First, a word list must be generated. This is the same list as the first parameter described above. In the SDK, the MkVoice example uses wordlist.txt.

Second, an individual wav file is needed for each word. The first part of the file name must correspond to an entry in the word list, and the name suffix must be wav. Additionally, each wav file must have the following characteristics:

Audio format: PCM
Sample rate: 11.025 kHz
Audio sample size: 16 bit
Channels: 1 (mono)
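
These characteristics can be checked with Python's standard wave module before building a voice. The sketch below is an illustration only; the function name and the EXPECTED table are hypothetical, and the sample rate is assumed to mean the standard 11,025 Hz rate. The wave module raises wave.Error for formats it cannot read, which include compressed (non-PCM) data.

```python
import os
import tempfile
import wave

# Expected values; 11025 Hz is an assumed reading of the sample-rate row.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 11025}


def wav_format_problems(path):
    """Return a list of mismatches between a wav file and EXPECTED."""
    with wave.open(path, "rb") as w:
        actual = {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),  # 2 bytes = 16 bit
            "sample_rate_hz": w.getframerate(),
        }
    return [f"{key}: expected {EXPECTED[key]}, found {actual[key]}"
            for key in EXPECTED if actual[key] != EXPECTED[key]]


def write_silence(path, channels, rate):
    """Write a short 16-bit silent wav file for the demonstration."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * 100)


with tempfile.TemporaryDirectory() as demo:
    good = os.path.join(demo, "enter.wav")
    bad = os.path.join(demo, "text.wav")
    write_silence(good, channels=1, rate=11025)
    write_silence(bad, channels=2, rate=22050)
    good_problems = wav_format_problems(good)
    bad_problems = wav_format_problems(bad)
    print(good_problems)
    print(bad_problems)
```

A file that matches returns an empty list; the stereo 22,050 Hz file reports two mismatches, one for the channel count and one for the sample rate.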

Each file should contain only one word. Keep leading and trailing silence to a minimum for the best playback. Punctuation marks are not allowed; replace each mark with an underscore ("_") in both the file name and the word list. For instance, MkVoice provides a file named computer_s.wav, which matches the corresponding entry in wordlist.txt. A simple way to generate the wav files is to use Sound Recorder, provided with the Windows operating system. You may have to convert the recorded files to meet the requirements above.
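
The underscore rule can be sketched in Python as follows. This is a hypothetical helper, not something the SDK provides; it assumes that every ASCII punctuation character other than the underscore is replaced, and it does not handle typographic (curly) apostrophes.

```python
import string

# Assumption: every ASCII punctuation character other than "_" is replaced.
_PUNCTUATION = string.punctuation.replace("_", "")
_TABLE = str.maketrans({ch: "_" for ch in _PUNCTUATION})


def to_entry(word):
    """Return the word-list entry for a word, with punctuation as "_"."""
    return word.translate(_TABLE)


entry = to_entry("computer's")
print(entry, entry + ".wav")  # computer_s computer_s.wav
```

The same entry is used in the word list and as the first part of the wav file name, so the two stay in step.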

© 2015 Microsoft