Using Speech Synthesis in UCMA 3.0: Code Listing and Conclusion (Part 4 of 4)

Summary:   Combine the capabilities of Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK with Microsoft Speech Platform SDK to make synthesized speech in your application more natural sounding. Use Speech Synthesis Markup Language (SSML) to insert pauses, increase or decrease the speech volume, expand abbreviations correctly, and pronounce words or phrases phonetically. Part 4 contains a listing of the code described in this set of articles, and a conclusion.

Applies to:   Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK | Microsoft Speech Platform SDK

Published:   August 2011 | Provided by:   Mark Parker, Microsoft | About the Author

Contents

  • Application Configuration

  • Application Code

  • Helper Class

  • SSML Code

  • Conclusion

  • Additional Resources

Download code

This article is the last in a four-part series of articles on how to use speech synthesis in a Microsoft Unified Communications Managed API (UCMA) 3.0 application.

Application Configuration

App.Config, the application configuration file, is used to configure settings for the computer that hosts the application. When the appropriate parameter values are entered in the add elements (and the XML comment delimiters are removed), the application can read them at startup, so they do not have to be entered from the keyboard while the application is running.

The following example shows the App.Config file.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <!-- Provide parameters necessary for the sample to run without 
    prompting for input. -->

    <!-- Provide the FQDN of the Microsoft Lync Server. -->
    <!-- <add key="ServerFQDN1" value=""/> -->

    <!-- The user ID of the user on whose behalf the application runs. -->
    <!-- Leave this value blank to use the credentials of the currently logged-on user. -->
    <!-- <add key="UserName1" value=""/> -->

    <!-- The domain of the user on whose behalf the application runs. -->
    <!-- Leave this value blank to use the credentials of the currently logged-on user. -->
    <!-- <add key="UserDomain1" value=""/> -->

    <!-- The URI of the user on whose behalf the application runs, in the form user@host. -->
    <!-- <add key="UserURI1" value=""/> -->
  </appSettings>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="mscorlib" publicKeyToken="b77a5c561934e089" culture="neutral" />
        <bindingRedirect oldVersion="2.0.0.0" newVersion="4.0.0.0"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
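
The listing in this article does not show how these settings are consumed. The following sketch is illustrative only: it assumes that the application reads the key names shown above through ConfigurationManager.AppSettings and prompts at the console when a value is missing. The class and method names are hypothetical, not part of the sample.

// Illustrative sketch: read an App.Config value, or prompt for it when the key is absent.
// Requires a reference to System.Configuration.dll.
using System;
using System.Configuration;

class ConfigReader
{
  public static string GetSetting(string key, string prompt)
  {
    // Read the value from the <appSettings> section, if the add element is uncommented.
    string value = ConfigurationManager.AppSettings[key];
    if (string.IsNullOrEmpty(value))
    {
      // Otherwise, ask for the value at the console.
      Console.Write(prompt);
      value = Console.ReadLine();
    }
    return value;
  }
}

// For example:
// string serverFqdn = ConfigReader.GetSetting("ServerFQDN1", "Enter the FQDN of the Lync Server: ");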

Application Code

The following example is the code for the application that is described in this set of articles.

// .NET namespaces
using System;
using System.Xml;
using System.Threading;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Synthesis;

// UCMA namespaces
using Microsoft.Rtc.Collaboration;
using Microsoft.Rtc.Collaboration.AudioVideo;
using Microsoft.Rtc.Signaling;

// UCMA samples namespaces
using Microsoft.Rtc.Collaboration.Sample.Common;

namespace Microsoft.Rtc.Collaboration.Sample.SpeechSynthesis
{
  public class SpeechSynthesisSample
  {
    #region Globals
    private UCMASampleHelper _helper;

    private UserEndpoint _userEndpoint;

    private AudioVideoCall _audioVideoCall;
    private AudioVideoFlow _audioVideoFlow;
    private SpeechSynthesizer _speechSynthesizer;

    // Wait handles are used to keep the main thread and worker threads synchronized.
    private AutoResetEvent _waitForCallToBeAccepted = new AutoResetEvent(false);
    private AutoResetEvent _waitForConversationToBeTerminated = new AutoResetEvent(false);
    private AutoResetEvent _waitForSynthesisToBeFinished = new AutoResetEvent(false);

    #endregion

    #region Methods
    public static void Main(string[] args)
    {
      SpeechSynthesisSample sample = new SpeechSynthesisSample();
      sample.Run();
    }
        
    private void Run()
    {
      // A helper class to take care of platform and endpoint setup and cleanup. 
      _helper = new UCMASampleHelper();

      // Create a user endpoint using the network credential object. 
      _userEndpoint = _helper.CreateEstablishedUserEndpoint("SpeechSynthesis Sample User");

      // Register a delegate to be called when an incoming audio-video call arrives.
      _userEndpoint.RegisterForIncomingCall<AudioVideoCall>(AudioVideoCall_Received);

      // Wait for the incoming call to arrive and be accepted.
      Console.WriteLine("Waiting for incoming call...");
      _waitForCallToBeAccepted.WaitOne();

      // Create a speech synthesis connector and attach it to the AudioVideoFlow instance.
      SpeechSynthesisConnector speechSynthesisConnector = new SpeechSynthesisConnector();

      speechSynthesisConnector.AttachFlow(_audioVideoFlow);

      // Create a speech synthesizer and set its output to the speech synthesis connector.
      _speechSynthesizer = new SpeechSynthesizer();
      SpeechAudioFormatInfo audioformat = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, Microsoft.Speech.AudioFormat.AudioChannel.Mono);
      _speechSynthesizer.SetOutputToAudioStream(speechSynthesisConnector, audioformat);

      // Register for notification of the SpeakCompleted and SpeakStarted events on the speech synthesizer.
      _speechSynthesizer.SpeakStarted += new EventHandler<SpeakStartedEventArgs>(SpeechSynthesizer_SpeakStarted);
      _speechSynthesizer.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(SpeechSynthesizer_SpeakCompleted);

      // Start the speech synthesis connector.
      speechSynthesisConnector.Start();

      TestSpeechSynthesis();

      // Stop the speech synthesis connector.
      speechSynthesisConnector.Stop();
      Console.WriteLine("Stopping the speech synthesis connector.");

      speechSynthesisConnector.DetachFlow();

      _waitForSynthesisToBeFinished.WaitOne();
      UCMASampleHelper.PauseBeforeContinuing("Press ENTER to shut down and exit.");

      // Terminate the call, the conversation, and then unregister the 
      // endpoint from receiving an incoming call. 
      _audioVideoCall.BeginTerminate(CallTerminateCB, _audioVideoCall);
      _waitForConversationToBeTerminated.WaitOne();
 
      // Clean up by shutting down the platform.
      _helper.ShutdownPlatform();
    }

    private void TestSpeechSynthesis()
    {
      PromptBuilder prompt = new PromptBuilder();
      String str;

      // Speak a prompt that is loaded from a file.
      String currDirPath = Environment.CurrentDirectory;
      prompt.AppendSsml(XmlReader.Create(currDirPath + "\\ssml.xml"));
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak a prompt that includes an audio file.
      prompt.AppendText("Listen for the sound of the chimes.");
      prompt.AppendAudio(currDirPath + "\\CHIMES.WAV");
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();
      

      // Speak some phonemes - "measure" - https://msdn.microsoft.com/en-us/library/bb813894(office.12).aspx
      // Phoneme Table for English (United States)
      str = "In good <phoneme alphabet=\"x-microsoft-ups\" ph=\"M EH . ZH AX RA\">alternate text</phoneme>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak some phonemes - "Rodin's Thinker".
      str = "<phoneme alphabet=\"x-microsoft-ups\" ph=\"RA O + UH . S1 D AE N Z\">alternate text</phoneme>";
      // prompt.AppendSsmlMarkup(str);
      str += "<phoneme alphabet=\"x-microsoft-ups\" ph=\"TH IH NG . K RA\">alternate text</phoneme>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak some phonemes - "there".
      str = "Here and <phoneme alphabet=\"x-microsoft-ups\" ph=\"DH EH RA\">alternate text</phoneme>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak a fraction.
      str = "The recipe calls for <say-as interpret-as=\"number\">3/4</say-as> of a cup of milk";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak the time.
      str = "The plane arrives at <say-as interpret-as=\"time\">3:52</say-as>P.M";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();
      
      // Speak a telephone number.
      str = "For more information, call <say-as interpret-as=\"telephone\">425-555-1212</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();


      // Speak an address.
      str = "The Red Sox play in Cincinnati <say-as interpret-as=\"address\">OH</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      str = "I live on <say-as interpret-as=\"address\">St. Paul St.</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak an ordinal number.
      str = "We are going on the <say-as interpret-as=\"ordinal\">2</say-as> of June";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak as a date - month and day.
      str = "His birthday is <say-as interpret-as=\"date_md\">8.21</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak a number as a year.
      str = "In the year <say-as interpret-as=\"date:y\">2011</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak an acronym.
      str = "Universal resource locator is abbreviated as <say-as interpret-as=\"letters\">url</say-as>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent();

      // Speak with emphasis.
      str = "We need to get this done <emphasis>today</emphasis>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent(); 

      // Speak with a short pause before the emphasized phrase.
      str = "We need to get this done <break strength=\"medium\"/> <emphasis>right away</emphasis>";
      prompt.AppendSsmlMarkup(str);
      _speechSynthesizer.Speak(prompt);
      prompt.ClearContent(); 

      _waitForSynthesisToBeFinished.Set();
    }
    #endregion

    #region EVENT HANDLERS
    // Record the state transitions in the console.
    void AudioVideoCall_StateChanged(object sender, CallStateChangedEventArgs e)
    {
      Console.WriteLine("Previous call state: " + e.PreviousState + "\nCurrent state: " + e.State);
    }

    // Handler for the AudioVideoFlowConfigurationRequested event on the call.
    // This event is raised when a flow is present to begin media operations with; that is, when the flow is no longer null.
    public void AudioVideoCall_FlowConfigurationRequested(object sender, AudioVideoFlowConfigurationRequestedEventArgs e)
    {
      Console.WriteLine("Flow Created.");
      _audioVideoFlow = e.Flow;

      // Now that the flow is non-null, bind the event handler for State Changed.
      // When the flow goes active, (as indicated by the state changed event) the application can take media-related actions on the flow.
      _audioVideoFlow.StateChanged += new EventHandler<MediaFlowStateChangedEventArgs>(AudioVideoFlow_StateChanged);
    }

    // Handler for the StateChanged event on an AudioVideoFlow instance.
    private void AudioVideoFlow_StateChanged(object sender, MediaFlowStateChangedEventArgs e)
    {
      // When the flow is active, media operations can begin.
      Console.WriteLine("Previous flow state: " + e.PreviousState.ToString() + "\nNew flow state: " + e.State.ToString());
    }

    // Delegate that is called when an incoming AudioVideoCall arrives.
    void AudioVideoCall_Received(object sender, CallReceivedEventArgs<AudioVideoCall> e)
    {
      _audioVideoCall = e.Call;
      _audioVideoCall = e.Call;
      _audioVideoCall.AudioVideoFlowConfigurationRequested += this.AudioVideoCall_FlowConfigurationRequested;
        
      // For logging purposes, register for notification of the StateChanged event on the call.
      _audioVideoCall.StateChanged +=
                new EventHandler<CallStateChangedEventArgs>(AudioVideoCall_StateChanged);
            
      // Remote Participant URI represents the far end (caller) in this conversation. 
      Console.WriteLine("Call received from: " + e.RemoteParticipant.Uri);
            
      // Now, accept the call. CallAcceptCB runs asynchronously, on a thread-pool thread, when the accept operation completes.
      _audioVideoCall.BeginAccept(CallAcceptCB, _audioVideoCall);
    }

    // Handler for the SpeakStarted event on the SpeechSynthesizer.
    void SpeechSynthesizer_SpeakStarted(object sender, SpeakStartedEventArgs e)
    {
      Console.WriteLine("SpeakStarted event raised.");
    }

    // Handler for the SpeakCompleted event on the SpeechSynthesizer.
    void SpeechSynthesizer_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
    {
      Console.WriteLine("SpeakCompleted event raised.");
    }
    #endregion

    #region CALLBACKS
    private void CallAcceptCB(IAsyncResult ar)
    {
      AudioVideoCall audioVideoCall = ar.AsyncState as AudioVideoCall;
      try
      {
        // Determine whether the call was accepted successfully.
        audioVideoCall.EndAccept(ar);
      }
      catch (RealTimeException exception)
      {
        // RealTimeException may be thrown on media or link-layer failures. 
        // A production application should catch additional exceptions, such as 
        // OperationTimeoutException and CallOperationTimeoutException.

        Console.WriteLine(exception.ToString());
      }
      finally
      {
        // Synchronize with main thread.
        _waitForCallToBeAccepted.Set();
      }
    }

    private void CallTerminateCB(IAsyncResult ar)
    {
      AudioVideoCall audioVideoCall = ar.AsyncState as AudioVideoCall;

      // Finish terminating the incoming call.
      audioVideoCall.EndTerminate(ar);

      // Remove this event handler now that the call has been terminated.
      _audioVideoCall.StateChanged -= AudioVideoCall_StateChanged;

      // Terminate the conversation.
      _audioVideoCall.Conversation.BeginTerminate(ConversationTerminateCB, _audioVideoCall.Conversation);
    }

    private void ConversationTerminateCB(IAsyncResult ar)
    {
      Conversation conversation = ar.AsyncState as Conversation;

      // Finish terminating the conversation.
      conversation.EndTerminate(ar);

      // Unregister the endpoint from receiving incoming calls, and then synchronize with the main thread.
      _userEndpoint.UnregisterForIncomingCall<AudioVideoCall>(AudioVideoCall_Received);
      _waitForConversationToBeTerminated.Set();
    }
    #endregion
  }
}

Helper Class

The helper class contains member methods that create and start a CollaborationPlatform instance, and then create and establish the UserEndpoint instance that is used in the sample. For more information, see “Helper Class” in Using UCMA 3.0 BackToBackCall: Code Listing (Part 3 of 4).
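The helper class itself is not reproduced here. The following sketch shows, in simplified form, what a method such as CreateEstablishedUserEndpoint typically does; the user agent string, transport, port, and credential choices are illustrative assumptions, not the helper's actual code.

// Simplified, illustrative sketch of platform and endpoint setup.
// The user agent string, transport, port, and credential choices are assumptions.
using System.Net;
using Microsoft.Rtc.Collaboration;
using Microsoft.Rtc.Signaling;

public class MinimalEndpointHelper
{
  public UserEndpoint CreateEstablishedUserEndpoint(string userUri, string serverFqdn)
  {
    // Create and start the collaboration platform.
    ClientPlatformSettings platformSettings = new ClientPlatformSettings("SpeechSynthesisSample", SipTransportType.Tls);
    CollaborationPlatform platform = new CollaborationPlatform(platformSettings);
    platform.EndStartup(platform.BeginStartup(null, null));

    // Configure the endpoint; by default, use the credentials of the logged-on user.
    UserEndpointSettings endpointSettings = new UserEndpointSettings(userUri, serverFqdn, 5061);
    endpointSettings.Credential = CredentialCache.DefaultNetworkCredentials;

    // Create the endpoint and establish (register) it with the server.
    UserEndpoint userEndpoint = new UserEndpoint(platform, endpointSettings);
    userEndpoint.EndEstablish(userEndpoint.BeginEstablish(null, null));
    return userEndpoint;
  }
}

Cleanup reverses these steps: terminate the endpoint, and then shut down the platform, again by using the Begin/End asynchronous pattern.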

SSML Code

The following example shows the Speech Synthesis Markup Language (SSML) markup that is used in one of the examples in Using Speech Synthesis in UCMA 3.0: Working with SSML (Part 3 of 4).

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns:ssml="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <sentence>
    <prosody volume="70">Your order for <break time="500ms" />
    </prosody>
    <prosody rate="-20%" volume="100">
      <emphasis>3 <break time="250ms" /> books</emphasis>
    </prosody>
    <prosody volume="70"> <break time="500ms" /> will be shipped <say-as interpret-as="date_md">7.21</say-as>
    </prosody>
  </sentence>
</speak>

Conclusion

With only a small amount of additional work, you can add another method of communication to your application by adding speech synthesis. The SpeechSynthesisConnector class in UCMA 3.0 provides access to features of the SpeechSynthesizer class that is included in the Microsoft Speech Platform SDK. By using these features, you can provide wider accessibility for your application, and can give it a more professional appearance.

Additional Resources

For more information, see the following resources:

About the Author

Mark Parker is a programming writer at Microsoft whose current responsibility is the UCMA SDK documentation. Mark previously worked on the Microsoft Speech Server 2007 documentation.