Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

Audio Interfaces

Use the audio interfaces in the Speech Platform to manage audio input for speech recognition, audio output for speech synthesis (text-to-speech, or TTS), and playback of audio files.

With the exception of ISpTranscript, the audio interfaces in the Speech Platform inherit from the standard COM IStream interface. However, because audio devices represent hardware, the IStream::Clone method cannot be used on them and returns E_NOTIMPL.
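The E_NOTIMPL behavior described above suggests that callers should not assume a stream can be duplicated. The following is a minimal, portable sketch of that defensive pattern, not Speech Platform code: StreamLike, AudioDeviceStream, MemoryStream, and CloneOrShare are hypothetical names, the interface is heavily reduced compared with the real IStream, and the HRESULT constants are redeclared locally (with their standard Windows values) so the example compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace sketch {

// Local stand-ins for the Windows definitions (normally from <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOTIMPL = static_cast<HRESULT>(0x80004001u);

// Hypothetical, heavily reduced stream interface. The real IStream uses
// COM reference counting and raw out-pointers rather than unique_ptr.
struct StreamLike {
    virtual ~StreamLike() = default;
    virtual HRESULT Clone(std::unique_ptr<StreamLike>& out) = 0;
};

// Models a hardware-backed audio object: cloning is not supported.
struct AudioDeviceStream : StreamLike {
    HRESULT Clone(std::unique_ptr<StreamLike>& out) override {
        out.reset();
        return E_NOTIMPL;
    }
};

// Models a memory-backed stream, where cloning can succeed.
struct MemoryStream : StreamLike {
    HRESULT Clone(std::unique_ptr<StreamLike>& out) override {
        out = std::make_unique<MemoryStream>(*this);
        return S_OK;
    }
};

enum class CloneResult { Cloned, SharedOriginal, Failed };

// Try to clone. If the stream reports E_NOTIMPL, tell the caller to keep
// using the original object instead of treating the result as fatal.
CloneResult CloneOrShare(StreamLike& source, std::unique_ptr<StreamLike>& out) {
    const HRESULT hr = source.Clone(out);
    if (hr == S_OK) {
        assert(out != nullptr);
        return CloneResult::Cloned;
    }
    out.reset();
    return hr == E_NOTIMPL ? CloneResult::SharedOriginal : CloneResult::Failed;
}

}  // namespace sketch
```

In this sketch, passing an AudioDeviceStream to CloneOrShare yields CloneResult::SharedOriginal and leaves the output pointer empty, while a MemoryStream yields CloneResult::Cloned.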

Development Helpers

The following table lists enumerations, functions, and classes that are useful when working with audio in the Speech Platform:

Enumeration, Function, or Class    Description
SPSTREAMFORMAT                     Enumeration of the stream formats supported by the Speech Platform.
CSpEvent                           Class for decoding event structures.
CSpDynamicString                   Class for managing dynamically sized WCHAR strings.
SpBindToFile                       Function that binds an audio stream to the specified file.
CSpStreamFormat                    Class for managing stream formats and WAVEFORMATEX structures supported by the Speech Platform.
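To make the relationship between a stream format and a WAVEFORMATEX structure more concrete, the sketch below fills in the fields for an uncompressed PCM format. It is an illustration only: WaveFormat is a hypothetical local mirror of the WAVEFORMATEX layout (so the example compiles without the Windows headers), MakePcmFormat is a hypothetical helper rather than a Speech Platform function, and the derived-field formulas are the usual PCM rules.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of WAVEFORMATEX, using the same field names.
// On Windows the real structure is declared in <mmreg.h>.
struct WaveFormat {
    std::uint16_t wFormatTag;       // format type
    std::uint16_t nChannels;        // 1 = mono, 2 = stereo
    std::uint32_t nSamplesPerSec;   // sample rate in Hz
    std::uint32_t nAvgBytesPerSec;  // data rate in bytes per second
    std::uint16_t nBlockAlign;      // bytes per sample frame
    std::uint16_t wBitsPerSample;   // bits per sample, per channel
    std::uint16_t cbSize;           // size of extra format data (0 for PCM)
};

constexpr std::uint16_t kWaveFormatPcm = 1;  // numeric value of WAVE_FORMAT_PCM

// Build a PCM description from the three values a format name usually
// encodes: sample rate, bit depth, and channel count.
WaveFormat MakePcmFormat(std::uint32_t samplesPerSec,
                         std::uint16_t bitsPerSample,
                         std::uint16_t channels) {
    assert(channels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);
    WaveFormat wf{};
    wf.wFormatTag = kWaveFormatPcm;
    wf.nChannels = channels;
    wf.nSamplesPerSec = samplesPerSec;
    wf.wBitsPerSample = bitsPerSample;
    // One frame holds one sample for every channel.
    wf.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // For PCM the data rate is simply frames per second times frame size.
    wf.nAvgBytesPerSec = samplesPerSec * wf.nBlockAlign;
    wf.cbSize = 0;
    return wf;
}
```

For example, MakePcmFormat(16000, 16, 1) describes 16 kHz, 16-bit, mono audio with an nBlockAlign of 2 and an nAvgBytesPerSec of 32000.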

In This Section