Using Speech Recognition in UCMA 3.0 and Lync 2010: Scenario Overview (Part 1 of 5)

Summary:   This is the first in a series of five articles that describe how a Microsoft Unified Communications Managed API (UCMA) 3.0 application and a Microsoft Lync 2010 application can be combined to perform speech recognition. Part 1 describes the scenario that motivates the two applications.

Applies to:   Microsoft Lync Server 2010 | Microsoft Unified Communications Managed API 3.0 Core SDK | Microsoft Lync 2010

Published:   April 2011 | Provided by:   Mark Parker, Microsoft



This is the first in a five-part series of articles that describe how to incorporate speech recognition in Lync 2010 applications that interoperate with UCMA 3.0.

The SpeechRecognitionConnector class in the Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK is used to connect a SpeechRecognitionEngine object to an audio stream in a UCMA 3.0 application. When a suitable grammar is loaded into the SpeechRecognitionEngine object, the UCMA 3.0 application is ready to recognize specific phrases that come from the Microsoft Lync 2010 application. This article describes the UCMA 3.0 application, the Speech Recognition Grammar Specification (SRGS) XML grammar, and the Lync 2010 Silverlight application that runs in the Lync 2010 Conversation Window Extension (CWE).

The UCMA 3.0 and Lync 2010 applications that are described in this series of articles implement the following scenario. The UCMA 3.0 application places an audio call to the Lync 2010 user, and then creates a context channel between itself and the Lync 2010 user. When the UCMA 3.0 application establishes the context channel, the Lync 2010 CWE opens in the Lync 2010 client. A simulated website for Blue Yonder Airlines appears in the CWE, and the Lync 2010 user is prompted to specify information for an airline flight.

Note

To save space, the following illustrations show only the Lync 2010 CWE. The Lync 2010 conversation window that the CWE is attached to is minimized.

Figure 1. Lync 2010 application before speech recognition

[Image: Extensibility form before speech recognition]

When the user speaks a sentence such as “I want to fly from Denver to Miami,” the audio for this utterance is fed into a SpeechRecognitionEngine instance that is connected to the UCMA 3.0 application. Using an SRGS XML grammar that was previously loaded, the speech recognition engine extracts semantic information about the origination city (Denver) and the destination city (Miami). This semantic information and the cost of the ticket are packed into an array and then sent to the Lync 2010 client through the context channel. The Microsoft Silverlight application on the Lync 2010 client separates the following three items and displays each in a text box on the form that appears in the CWE.

  • Origination city

  • Destination city

  • Ticket price
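The data flow described above can be sketched in a language-neutral way. The following Python sketch is illustrative only and does not use the UCMA 3.0 or Lync 2010 APIs. It assumes that a regular expression can stand in for the SRGS grammar, that a hypothetical fare table supplies the ticket price (the article does not say how the price is determined), and that the three-item array is serialized as JSON before it is sent. The names recognize, pack_for_channel, and unpack_on_client are invented for this example.

```python
import json
import re

# Hypothetical fare table (an assumption; not taken from the article).
FARES = {("Denver", "Miami"): 329.00}

# A regular expression stands in for the SRGS XML grammar (an assumption).
# It handles only single-word city names such as "Denver" or "Miami".
PATTERN = re.compile(
    r"fly from (?P<origin>[A-Z][a-z]+) to (?P<destination>[A-Z][a-z]+)"
)


def recognize(utterance):
    """Return (origin, destination) if the utterance matches, else None."""
    match = PATTERN.search(utterance)
    if match is None:
        return None
    return match.group("origin"), match.group("destination")


def pack_for_channel(origin, destination, price):
    """Server side: pack the three items into an array and serialize it."""
    return json.dumps([origin, destination, f"{price:.2f}"])


def unpack_on_client(payload):
    """Client side: separate the three items so each can fill a text box."""
    origin, destination, price = json.loads(payload)
    return {"origin": origin, "destination": destination, "price": price}


if __name__ == "__main__":
    cities = recognize("I want to fly from Denver to Miami")
    if cities is not None:
        payload = pack_for_channel(*cities, FARES[cities])
        print(unpack_on_client(payload))
```

In the actual applications, the recognition step is performed by the speech recognition engine with the loaded grammar, and the payload travels over the context channel rather than through a local function call.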

The Lync 2010 user can stop the application by clicking the Exit button on the form in the CWE.

Figure 2. Lync 2010 application after semantic results are returned

[Image: Extensibility form after speech recognition]

Mark Parker is a programming writer at Microsoft whose current responsibility is the UCMA SDK documentation. Mark previously worked on the Microsoft Speech Server 2007 documentation.