Using Speech Synthesis in UCMA 3.0: UCMA Application (Part 2 of 4)

Summary:   Combine the capabilities of the Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK with the Microsoft Speech Platform SDK to make synthesized speech in your application more natural sounding. Use Speech Synthesis Markup Language (SSML) to insert pauses, increase or decrease the speech volume, expand abbreviations correctly, and pronounce words or phrases phonetically. Part 2 discusses the steps that are required in a UCMA 3.0 application to support synthesized speech.

Applies to:   Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK | Microsoft Speech Platform SDK

Published:   August 2011 | Provided by:   Mark Parker, Microsoft | About the Author

Contents

  • Setting Up the Speech Synthesis Infrastructure

  • Delegates and Event Handlers

  • Callback Methods

  • Shutdown Process

  • Part 3

  • Additional Resources

Download code

This article is the second in a four-part series of articles on how to use speech synthesis in a UCMA 3.0 application.

Setting Up the Speech Synthesis Infrastructure

The following example shows declarations for the global variables that are used in the code examples later in this article.

private UCMASampleHelper _helper;
private UserEndpoint _userEndpoint;
private AudioVideoCall _audioVideoCall;
private AudioVideoFlow _audioVideoFlow;
private SpeechSynthesizer _speechSynthesizer;

private AutoResetEvent _waitForCallToBeAccepted = new AutoResetEvent(false);
private AutoResetEvent _waitForConversationToBeTerminated = new AutoResetEvent(false);
private AutoResetEvent _waitForSynthesisToBeFinished = new AutoResetEvent(false);

The following procedure sets up the speech synthesis infrastructure.

  1. Create and establish a UserEndpoint instance.

    _userEndpoint = _helper.CreateEstablishedUserEndpoint("SpeechSynthesis Sample User");
    

    A helper method, CreateEstablishedUserEndpoint, creates and starts a CollaborationPlatform instance, and then creates and establishes a UserEndpoint instance. A sketch of what such a helper might look like appears after this list.

  2. Register a delegate to be invoked when an audio/video call arrives.

    _userEndpoint.RegisterForIncomingCall<AudioVideoCall>(AudioVideoCall_Received);
    
  3. After an audio/video call arrives, create a SpeechSynthesisConnector instance.

    SpeechSynthesisConnector speechSynthesisConnector = new SpeechSynthesisConnector();
    
  4. Attach the AudioVideoFlow instance to the speech synthesis connector that is created in step 3.

    speechSynthesisConnector.AttachFlow(_audioVideoFlow);
    

    The AudioVideoFlow instance is obtained in a handler for the AudioVideoFlowConfigurationRequested event on the AudioVideoCall instance. Event handlers are discussed later in this article.

  5. Create a speech synthesizer.

    _speechSynthesizer = new SpeechSynthesizer();
    
  6. Set the output of the speech synthesizer to the speech synthesis connector.

    SpeechAudioFormatInfo audioFormat = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, Microsoft.Speech.AudioFormat.AudioChannel.Mono);
    _speechSynthesizer.SetOutputToAudioStream(speechSynthesisConnector, audioFormat);
    
  7. Register for notification of the SpeakCompleted and SpeakStarted events on the speech synthesizer.

    _speechSynthesizer.SpeakStarted += new EventHandler<SpeakStartedEventArgs>(SpeechSynthesizer_SpeakStarted);
    _speechSynthesizer.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(SpeechSynthesizer_SpeakCompleted);
    
  8. Start the speech synthesis connector.

    speechSynthesisConnector.Start();
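
For reference, the following example is a minimal sketch of what a helper such as CreateEstablishedUserEndpoint might do. The user agent name, SIP URI, server name, and port that appear here are placeholder assumptions, not values from the sample, and the helper in the downloadable code can differ; for example, it typically also supplies credentials on the endpoint settings.

// Hypothetical sketch of a helper such as CreateEstablishedUserEndpoint.
// The user agent name, SIP URI, server name, and port are placeholders.
private UserEndpoint CreateEstablishedUserEndpoint(string endpointFriendlyName)
{
  // Create and start a CollaborationPlatform instance.
  ClientPlatformSettings platformSettings = new ClientPlatformSettings("UCMASampleApp", SipTransportType.Tls);
  CollaborationPlatform platform = new CollaborationPlatform(platformSettings);
  platform.EndStartup(platform.BeginStartup(null, null));

  // Create and establish a UserEndpoint instance.
  UserEndpointSettings endpointSettings = new UserEndpointSettings("sip:user@contoso.com", "sipserver.contoso.com", 5061);
  UserEndpoint userEndpoint = new UserEndpoint(platform, endpointSettings);
  userEndpoint.EndEstablish(userEndpoint.BeginEstablish(null, null));

  Console.WriteLine(endpointFriendlyName + " endpoint established.");
  return userEndpoint;
}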
    

At this point, the speech synthesis infrastructure is enabled. The Speak method on the SpeechSynthesizer class can now be used to speak the text in a string, as shown in the following example. Other Speak overloads can be used to speak the contents of Prompt or PromptBuilder instances.

_speechSynthesizer.Speak("Good morning");
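
For example, the following sketch, which is not part of the sample, uses a PromptBuilder instance to insert a half-second pause between two phrases before speaking them.

PromptBuilder promptBuilder = new PromptBuilder();
promptBuilder.AppendText("Good morning.");
promptBuilder.AppendBreak(TimeSpan.FromMilliseconds(500));  // Insert a 500-millisecond pause.
promptBuilder.AppendText("Welcome to the speech synthesis sample.");
_speechSynthesizer.Speak(promptBuilder);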

For examples of specialized prompts, see Using Speech Synthesis in UCMA 3.0: Working with SSML (Part 3 of 4).

Delegates and Event Handlers

This section describes the delegates and event handlers that are used in the sample application that is presented in this series of articles.

AudioVideoCall_Received Delegate

The AudioVideoCall_Received delegate is invoked when an incoming audio/video call arrives. This delegate is registered in step 2 of the previous procedure, by using a call to the RegisterForIncomingCall<TCall> method on the endpoint.

The following example shows the definition of the AudioVideoCall_Received delegate.

void AudioVideoCall_Received(object sender, CallReceivedEventArgs<AudioVideoCall> e)
{
  _audioVideoCall = e.Call;
  _audioVideoCall.AudioVideoFlowConfigurationRequested += this.AudioVideoCall_FlowConfigurationRequested;
    
  // For logging purposes, register for notification of the StateChanged event on the call.
  _audioVideoCall.StateChanged +=
            new EventHandler<CallStateChangedEventArgs>(AudioVideoCall_StateChanged);
    
  // Remote Participant URI represents the far end (caller) in this conversation. 
  Console.WriteLine("Call received from: " + e.RemoteParticipant.Uri);
    
  // Now, accept the call. The CallAcceptCB callback is invoked when the accept operation completes.
  _audioVideoCall.BeginAccept(CallAcceptCB, _audioVideoCall);
}

The AudioVideoCall_Received delegate performs the following tasks.

  1. Store the value of the Call property on the CallReceivedEventArgs<TCall> parameter in the _audioVideoCall global variable.

  2. Register for notification of the AudioVideoFlowConfigurationRequested event on the call.

  3. Register for notification of the StateChanged event on the call.

  4. Accept the call by invoking the BeginAccept method on the call. The definition of the callback method whose name appears in the first parameter is shown later in this article.

AudioVideoFlowConfigurationRequested Event Handler

The AudioVideoCall_FlowConfigurationRequested delegate is called when the AudioVideoFlowConfigurationRequested event on the call is raised. The most important task that this method performs is to retrieve the value of the Flow property from the AudioVideoFlowConfigurationRequestedEventArgs parameter, and then store it in the _audioVideoFlow global variable. The _audioVideoFlow variable is the flow that is attached to the SpeechSynthesisConnector instance in step 4 of the previous procedure.

The following example shows the definition of the AudioVideoCall_FlowConfigurationRequested delegate.

public void AudioVideoCall_FlowConfigurationRequested(object sender, AudioVideoFlowConfigurationRequestedEventArgs e)
{
  Console.WriteLine("Flow Created.");
  _audioVideoFlow = e.Flow;

  // Now that the flow is non-null, bind the event handler for the StateChanged event.
  // When the flow goes active (as indicated by the StateChanged event), the application can take media-related actions on the flow.
  _audioVideoFlow.StateChanged += new EventHandler<MediaFlowStateChangedEventArgs>(AudioVideoFlow_StateChanged);
}
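
The AudioVideoFlow_StateChanged handler itself does not appear in this article. The following example is a minimal sketch, assuming that the handler only logs the state transition; the implementation in the downloadable sample can differ.

// Hypothetical handler; the downloadable sample's implementation can differ.
void AudioVideoFlow_StateChanged(object sender, MediaFlowStateChangedEventArgs e)
{
  Console.WriteLine("Flow state changed from " + e.PreviousState + " to " + e.State);

  // When the flow is active, media operations such as speech synthesis can begin.
  if (e.State == MediaFlowState.Active)
  {
    Console.WriteLine("Flow is active.");
  }
}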

SpeakStarted Event Handler

The SpeechSynthesizer_SpeakStarted delegate is invoked every time that the SpeakStarted event on the speech synthesizer is raised. This method is registered in step 7 of the previous procedure.

The following example shows the definition of the SpeechSynthesizer_SpeakStarted delegate.

void SpeechSynthesizer_SpeakStarted(object sender, SpeakStartedEventArgs e)
{
  Console.WriteLine("SpeakStarted event raised.");
}

SpeakCompleted Event Handler

The SpeechSynthesizer_SpeakCompleted delegate is invoked every time that the SpeakCompleted event on the speech synthesizer is raised. This method is registered in step 7 of the previous procedure.

The following example shows the definition of the SpeechSynthesizer_SpeakCompleted delegate.

void SpeechSynthesizer_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
{
  Console.WriteLine("SpeakCompleted event raised.");
}
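
The global variable declarations at the beginning of this article include a _waitForSynthesisToBeFinished event that does not appear in the preceding code. A variant of this handler, shown in the following sketch as an assumption rather than as the sample's code, could use that event to let the main thread block until the last synthesis operation finishes.

// Hypothetical variant that signals the main thread when synthesis completes.
void SpeechSynthesizer_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
{
  Console.WriteLine("SpeakCompleted event raised.");
  _waitForSynthesisToBeFinished.Set();
}

The main thread would then call the WaitOne method on _waitForSynthesisToBeFinished after the last Speak call, before it begins the shutdown process.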

Callback Methods

This section describes the callback methods that are used in the sample application.

CallAcceptCB Callback Method

The CallAcceptCB callback method calls EndAccept on the AudioVideoCall instance. Before exiting, this method calls the Set method on the _waitForCallToBeAccepted object, which allows the main thread to resume execution. The main thread was paused after step 2 in the previous procedure.

The following example shows the definition of the CallAcceptCB callback method.

private void CallAcceptCB(IAsyncResult ar)
{
  AudioVideoCall audioVideoCall = ar.AsyncState as AudioVideoCall;
  try
  {
    // Determine whether the call was accepted successfully.
    audioVideoCall.EndAccept(ar);
  }
  catch (RealTimeException exception)
  {
    // RealTimeException may be thrown on media or link-layer failures.
    // A production application should catch additional exceptions, such as
    // OperationTimeoutException and CallOperationTimeoutException.
    Console.WriteLine(exception.ToString());
  }
  finally
  {
    // Synchronize with main thread.
    _waitForCallToBeAccepted.Set();
  }
}
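
The corresponding wait on the main thread is not shown in the setup procedure. The following sketch shows one way the main thread might pause after step 2 until the call is accepted.

// On the main thread, after registering for incoming calls (step 2 of the setup procedure).
Console.WriteLine("Waiting for an incoming call...");
_waitForCallToBeAccepted.WaitOne();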

CallTerminateCB Callback Method

The CallTerminateCB callback method completes the call termination process by calling the EndTerminate method on the AudioVideoCall instance. This method then detaches the handler for the StateChanged event on the call. Before exiting, the CallTerminateCB method calls the BeginTerminate method on the Conversation property of the call. This is an example of callback chaining. For more information, see Working with Presence and Groups in UCMA 3.0: Callback Chaining (Part 2 of 5).

The following example shows the definition of the CallTerminateCB callback method.

private void CallTerminateCB(IAsyncResult ar)
{
  AudioVideoCall audioVideoCall = ar.AsyncState as AudioVideoCall;

  // Finish terminating the incoming call.
  audioVideoCall.EndTerminate(ar);

  // Remove this event handler now that the call has been terminated.
  audioVideoCall.StateChanged -= AudioVideoCall_StateChanged;

  // Terminate the conversation.
  audioVideoCall.Conversation.BeginTerminate(ConversationTerminateCB, audioVideoCall.Conversation);
}

ConversationTerminateCB Callback Method

The ConversationTerminateCB callback method completes the conversation termination process by calling the EndTerminate method on the Conversation instance. Because the application no longer needs to respond to incoming calls, this method then calls the UnregisterForIncomingCall<TCall> method to unregister the AudioVideoCall_Received delegate. Before exiting, this method calls the Set method on the _waitForConversationToBeTerminated object, which allows the main thread to resume execution.

The following example shows the definition of the ConversationTerminateCB callback method.

private void ConversationTerminateCB(IAsyncResult ar)
{
  Conversation conversation = ar.AsyncState as Conversation;

  // Finish terminating the conversation.
  conversation.EndTerminate(ar);

  // The application no longer needs to respond to incoming calls.
  _userEndpoint.UnregisterForIncomingCall<AudioVideoCall>(AudioVideoCall_Received);

  // Synchronize with main thread.
  _waitForConversationToBeTerminated.Set();
}

Shutdown Process

After the last speech synthesis is performed, the application performs a graceful shutdown, as shown in the following steps.

  1. Stop the speech synthesis connector.

    speechSynthesisConnector.Stop();
    
  2. Detach the flow from the speech synthesis connector.

    speechSynthesisConnector.DetachFlow();
    
  3. Terminate the call.

    _audioVideoCall.BeginTerminate(CallTerminateCB, _audioVideoCall);
    

    The definition of the CallTerminateCB callback method is shown in the previous section.

  4. Terminate the conversation.

    _audioVideoCall.Conversation.BeginTerminate(ConversationTerminateCB, _audioVideoCall.Conversation);
    

    The definition of the ConversationTerminateCB callback method is shown in the previous section.

  5. Unregister the delegate that is invoked for incoming calls.

    _userEndpoint.UnregisterForIncomingCall<AudioVideoCall>(AudioVideoCall_Received);
    
  6. Shut down the platform.

    _helper.ShutdownPlatform();
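
Because CallTerminateCB chains to conversation termination, and ConversationTerminateCB unregisters the incoming-call delegate, steps 4 and 5 occur inside the callbacks. Taken together, and assuming that the speechSynthesisConnector variable is still in scope, the shutdown sequence on the main thread might look like the following sketch.

speechSynthesisConnector.Stop();
speechSynthesisConnector.DetachFlow();

// CallTerminateCB chains to Conversation.BeginTerminate.
_audioVideoCall.BeginTerminate(CallTerminateCB, _audioVideoCall);

// Block until ConversationTerminateCB signals that termination is complete.
_waitForConversationToBeTerminated.WaitOne();

_helper.ShutdownPlatform();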
    

Part 3

Using Speech Synthesis in UCMA 3.0: Working with SSML (Part 3 of 4)

Additional Resources

For more information, see the Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK documentation and the Microsoft Speech Platform SDK documentation.

About the Author

Mark Parker is a programming writer at Microsoft whose current responsibility is the UCMA SDK documentation. Mark previously worked on the Microsoft Speech Server 2007 documentation.