Using VoiceXML in a UCMA 3.0 Application (Part 1 of 4)

Summary:   The Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK can be used to write interactive voice response (IVR) applications that work with VoiceXML documents. This set of articles shows how you can use the speech recognition and speech synthesis features of VoiceXML in a UCMA 3.0 application.

Applies to:   Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK

Published:   June 2011 | Provided by:   Mark Parker, Microsoft


This is the first in a series of four articles about how to use VoiceXML in a UCMA 3.0 application.

UCMA 3.0 provides two approaches for developing interactive voice response (IVR) applications.

One approach uses the SpeechRecognitionConnector class to combine the call-control capabilities of UCMA 3.0 with the speech recognition capabilities of the SpeechRecognitionEngine class. The grammars that are used in this approach can be either Speech Recognition Grammar Specification (SRGS) XML grammars or grammars that are created by using the GrammarBuilder API. This approach is discussed in Using Speech Recognition in UCMA 3.0 and Lync 2010: Scenario Overview (Part 1 of 5).

The other approach uses the Browser class to load and interpret a VoiceXML document that can incorporate an SRGS grammar and embedded JavaScript. This approach is the subject of this series of articles.

The UCMA 3.0 application described in this series of articles loads and interprets a VoiceXML document that consists of one form element that contains a single field element. The purpose of the VoiceXML application is to ask the user for the name of a city, and to return the local time in that city. A prompt element in the field asks the user to say the name of a city. When the user speaks the name of one of the cities in a grammar that is used by the VoiceXML document, the form constructs a string that contains the current time in that city, and uses speech synthesis to speak the string to the user.
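The structure just described can be sketched as a minimal VoiceXML 2.0 document. The form name, field name, prompt wording, and grammar file name below are illustrative assumptions, not necessarily those used in the later parts of this series.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- One form containing a single field, as described in this article. -->
  <form id="GetCityTime">
    <field name="City">
      <!-- Ask the caller for a city name. -->
      <prompt>Say the name of a city to find the present time there.</prompt>
      <!-- External SRGS grammar that lists the recognizable city names
           (hypothetical file name). -->
      <grammar src="Cities.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- City$.utterance is the standard shadow variable holding the
             recognized text. The actual application would build the time
             string here (for example, with embedded script) before
             speaking it back to the caller. -->
        <prompt>The time in <value expr="City$.utterance"/> is ...</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

When the field is filled by a successful recognition, the content of the filled element runs and the synthesized response is played to the caller.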

The following is a typical dialog between the application and a user in Los Angeles, where the local time is 2:23 PM.

  • Application: “Say the name of a city to find the present time there.”

  • User: “Atlanta.”

  • Application: “The time in Atlanta is 5:23 PM.”

Implementing the scenario requires the following components.



UCMA 3.0 application

In addition to creating the infrastructure, which includes the CollaborationPlatform instance and an endpoint, the UCMA 3.0 application is responsible for creating and initializing a Browser instance. The Browser class exposes methods that can be used to load and execute a VoiceXML document.

VoiceXML document

The VoiceXML document guides the dialog between the computer and a user by using structures such as forms, menus, and fields.

Speech Recognition Grammar Specification (SRGS) XML grammar

A VoiceXML document can use an SRGS grammar to help with speech recognition. The SRGS grammar can be inline in the VoiceXML document or can be external to the VoiceXML document.
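An external SRGS XML grammar for this scenario might look like the following sketch. The root rule name and the particular city list are assumptions made for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="City" mode="voice">
  <!-- Public root rule: the caller says exactly one city name. -->
  <rule id="City" scope="public">
    <one-of>
      <item>Atlanta</item>
      <item>Chicago</item>
      <item>Los Angeles</item>
      <item>New York</item>
    </one-of>
  </rule>
</grammar>
```

A VoiceXML document references an external grammar like this one through the src attribute of its grammar element; alternatively, the same rule could appear inline inside the grammar element itself.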

Important

UCMA 3.0 provides support for VoiceXML 2.0. VoiceXML 2.1 is not supported.

Mark Parker is a programming writer at Microsoft whose current responsibility is the UCMA SDK documentation. Mark previously worked on the Microsoft Speech Server 2007 documentation.