February 2012

Volume 27 Number 02

The Working Programmer - Talk to Me: Voice and SMS in the Cloud

By Ted Neward | February 2012

This past October found me doing some charity work with GiveCamp in Seattle. (Don’t know what GiveCamp is? Take a second and have a look: givecamp.org.) While there, I ran across a group that was interested in doing some SMS messaging as part of an application the members wanted to build. We got to talking, and the subject of interactive-voice applications came up. They were interested in doing something like that (specifically, setting up some automated thank-you calls to donors), but figured you had to run your own call center and install your own PBX software and hardware to make it work.

Au contraire, mes amis.

As luck would have it, at the Philadelphia Emerging Technology Event (also known as Philly ETE) early last year, I met some guys from Voxeo, a vendor of enterprise-class PBX systems, and they introduced me to Tropo, their cloud-hosted voice-and-SMS (among other things) platform. For anything telephony-related, folks, it’s worth a look.

Tropo: A Testimonial

Fundamentally, Tropo isn’t all that different from some of the other voice/SMS services available, but it has one distinct advantage over the others I’ve looked at: during your development cycle, no money changes hands. Getting started is trivial: just wander on over to the Tropo Web site (tropo.com), sign up, and the full suite of services is at your fingertips for as long as you want.

Naturally, there are other voice/SMS services, and it’s always worth examining the alternatives. But for this article (and its successor), I’m going to use Tropo’s services. Caveat emptor.

Getting Started

As with most cloud-based services, getting started with Tropo requires creating an account on its servers and responding to the e-mail verification. Once that’s done, logging back in to Tropo will show the account dashboard, and it’s here that the fun begins.

Just to show you how much fun doing this can be, try it now. Pick up the phone and dial 530-206-0504.

Hello, World!

As most readers of this magazine well know, tradition and the Gods of Computer Science both demand that the first application in any new language or platform be the “Hello, world!” application. Far be it from me to buck tradition (at least, when it serves my purpose to stay with it, anyway), so the first step here is to create a new Tropo application and create a simple voice greeting with it. But before we get too far with that, let’s make sure we’re clear on the architecture here.

Like most cloud-hosted services, Tropo owns and maintains the servers on which the telephony hardware runs, and as with most cloud-hosted applications, that means the application developer doesn’t own any hardware, which is both a blessing and a curse in most scenarios. In this case, however, most of the “curse” end of the cloud falls away, because we’re not going to ask Tropo to host any data for us. In fact, Tropo doesn’t even have to host the scripts that drive the application. It can, and quite happily will, but if that’s a concern, the script can instead be pulled from any arbitrary URL and executed on Tropo’s servers. For this piece, we’re going to use Tropo-hosted files, just because that seems the easier way to start.
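If you’d rather host the script yourself, the server side can be very small. What follows is a minimal Node.js sketch under stated assumptions: the port, path and Content-Type header are illustrative choices, not anything Tropo requires, and it assumes (as with hosted files) that the .js extension is what tells Tropo the script is JavaScript. You would still point the Tropo application at the resulting URL, presumably in the same settings where you would otherwise pick a hosted file.

```javascript
// Minimal sketch: serve a Tropo script from your own Web server.
// The path, port and Content-Type below are hypothetical choices.
var http = require("http");

var SCRIPT_PATH = "/HelloMSDNScript.js";
var SCRIPT_BODY = 'say("Greetings, MSDN fans!");\n';

// Answer GET requests for the script; everything else is a 404.
function handleRequest(req, res) {
  if (req.method === "GET" && req.url === SCRIPT_PATH) {
    res.writeHead(200, { "Content-Type": "application/javascript" });
    res.end(SCRIPT_BODY);
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
  }
}

// Build (but do not start) the server; call .listen(8080) to serve it.
function createScriptServer() {
  return http.createServer(handleRequest);
}

// Example: createScriptServer().listen(8080);
```

Keeping the handler separate from the server makes it easy to check the response without opening a port.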

From the account dashboard, click “Create an application,” which will take you to the page shown in Figure 1.

Figure 1 Creating an Application in Tropo

Choose “Tropo Scripting” and give it a name; I used “HelloMSDN” for this example. Finally, click on the “Hosted File” link and choose “Create a new hosted file for this application”; a simple text editor will pop up at that point. As you might guess, you’re building a small script file (in your choice of scripting languages—JavaScript, PHP, Ruby, Groovy or Python) that will be fired when Tropo is told to “execute” this script, which in this case will be when somebody dials a phone number that Tropo will give you. (More on this later.)

Call the file “HelloMSDNScript.js” (the .js extension being important, to tell Tropo that this is a JavaScript script), and in the body of the file, put the following:

say("Greetings, MSDN fans!");

When that’s done, it should look like Figure 2.

Figure 2 Creating a Script in Tropo

If all is well, click “Create File,” and then “Create Application.” Once that’s done, Tropo will take you back to the application dashboard, which will look a little different now, as you can see in Figure 3.

Figure 3 The Tropo Application Dashboard

This dashboard is particularly important, because this is where you’re going to have some control over the channels to which Tropo is listening. In many respects, the one that generates the most visceral reaction from non-technical people is the phone demo, so let’s get Tropo to assign a phone number to the application. This is done by clicking “Add a phone number” and selecting an area code from which the number will be generated. Naturally, U.S. toll-free (1-800) numbers are supported as well, but because of the costs involved, that requires setting up a billing plan. Once you’ve selected an area code, Tropo needs a few minutes to provision the number, and then you can dial the number and be greeted in fine synthesized voice fashion. (Yes, do it now.)

Hi, Honey! Do You Still Love Me?

But that’s not nearly enough. Being as how the Valentine’s season is approaching, and the holiday season is just past, and you probably played too much Xbox 360 over Christmas, and that got your significant other all annoyed with you for not paying attention to him/her (yes, my wife still brings this up over dinner), you might want to make sure your loved one (girlfriend/boyfriend/spouse/whatever) is still in love with you. So let’s flip the code around for a bit and make sure. Editing the running application is pretty easy: go back to the application dashboard (which should still be up, assuming you’ve closed the text editor window; if you haven’t, don’t worry, just stay there), click on the “Hosted File” link again and choose the “Edit this hosted file” option to bring the text editor back up. This time, replace the say code with the following (with, of course, your loved one’s name in place of my wife’s):

say("I love you, Charlotte!");

// ask() plays the prompt, then blocks until the caller answers or the timeout expires
var results = ask("Do you love me too? Yes or no?", {
  choices: "yes, no"
});
log("results.value: " + results.value);

if (results.value == "yes") {
  say("Yay! That makes me happy.");
} else {
  say("Oh. Now I'm a sad panda.");
}

The ask routine is a blocking call: it plays the prompt, then waits (up to a configurable number of seconds) for a voice response. Because this is a JavaScript API, optional parameters go in the object literal passed as the last argument to ask; in this case that object carries the “choices” string, a comma-delimited list of acceptable voice responses. Tropo’s speech recognition will attempt to parse the caller’s spoken response as best it can. (When Tropo does its parsing, it does so with a “confidence” factor indicating how strongly it believes it parsed correctly, and the level of confidence you demand in spoken responses is configurable. By default it’s .3, which is pretty loosey-goosey, but usually sufficient when the acceptable responses are spelled out in the prompt, as in the previous “Yes or no” prompt.)
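To make the confidence idea concrete, here is a minimal sketch in plain JavaScript of how a threshold might filter recognizer output. It is not Tropo’s implementation: the candidates array, its { text, confidence } shape and the pickChoice name are all hypothetical; only the 0.3 default comes from the discussion above.

```javascript
// Hypothetical illustration of a minimum-confidence check; not Tropo's code.
// candidates: recognizer guesses, each { text, confidence } with confidence in [0, 1].
// choices: a comma-delimited string like the "choices" option, e.g. "yes, no".
function pickChoice(candidates, choices, minConfidence = 0.3) {
  // Normalize the choices string into a list of lowercase words.
  const allowed = choices.split(",").map((c) => c.trim().toLowerCase());

  // Keep only guesses at or above the threshold, best guess first.
  const accepted = candidates
    .filter((c) => c.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);

  for (const c of accepted) {
    const word = c.text.trim().toLowerCase();
    if (allowed.includes(word)) {
      return { value: word, confidence: c.confidence };
    }
  }
  return null; // nothing matched well enough: treat it as a bad choice
}

// A mumbled answer: "yes" clears the default 0.3 bar, "no" does not.
const heard = [
  { text: "yes", confidence: 0.42 },
  { text: "no", confidence: 0.18 },
];
console.log(pickChoice(heard, "yes, no"));      // { value: 'yes', confidence: 0.42 }
console.log(pickChoice(heard, "yes, no", 0.5)); // null
```

Raising the threshold trades false accepts for more rejected answers; with a tight yes/no list, a low bar is usually enough, which matches the point about the .3 default.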

But wait! The default voice is a female voice, and that could sound a little weird when sending it to my wife. So let’s change the voice to something more like my own. This is done by passing the optional “voice” field in the options object for both say and ask:

say("I love you, Charlotte!", { voice: "victor" });

var results = ask("Do you love me too? Yes or no?", {
  voice: "victor",
  choices: "yes, no"
});
log("results.value: " + results.value);

if (results.value == "yes") {
  say("Yay! That makes me happy.", { voice: "victor" });
} else {
  say("Oh. Now I'm a sad panda.", { voice: "victor" });
}

Once done editing, save the file; Tropo will update the file in place, and the next phone call made will play with the new voice. Note that “victor” is just one of a number of possible voices, including a variety of different accents. Make sure you don’t pick one that sounds sexier than your own natural voice, though, or your chosen loved one may prefer the phone over you, and that would probably be bad.
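Repeating { voice: "victor" } on every call is easy to get wrong, and the yes/no branching is hard to exercise without dialing in. One way to address both, sketched here under assumptions, is to wrap the script in a function that receives say, ask and log as parameters and applies a default voice. The names loveCall and withVoice are hypothetical, as is the assumption that Tropo’s global functions can be passed around as ordinary function references; the fake api at the bottom exists only so the logic can run under Node.

```javascript
// Hypothetical helper: copy `options` and add a default voice unless
// the caller already picked one. ES5 style, on the assumption that
// Tropo's JavaScript engine may predate ES2015 features.
function withVoice(options, voice) {
  var result = { voice: voice };
  for (var key in options) {
    if (Object.prototype.hasOwnProperty.call(options, key)) {
      result[key] = options[key];
    }
  }
  return result;
}

// The love-note script, with say/ask/log injected through `api`.
function loveCall(api, name, voice) {
  api.say("I love you, " + name + "!", withVoice({}, voice));
  var results = api.ask("Do you love me too? Yes or no?",
    withVoice({ choices: "yes, no" }, voice));
  api.log("results.value: " + results.value);
  if (results.value == "yes") {
    api.say("Yay! That makes me happy.", withVoice({}, voice));
  } else {
    api.say("Oh. Now I'm a sad panda.", withVoice({}, voice));
  }
}

// In a Tropo script this would presumably be invoked as:
//   loveCall({ say: say, ask: ask, log: log }, "Charlotte", "victor");

// Offline stand-in for Tropo: records each prompt and always answers "no".
var spoken = [];
var fakeApi = {
  say: function (text, opts) { spoken.push(opts.voice + ": " + text); },
  ask: function (text, opts) {
    spoken.push(opts.voice + ": " + text);
    return { value: "no" };
  },
  log: function () {}
};
loveCall(fakeApi, "Charlotte", "victor");
console.log(spoken.join("\n"));
```

Because an explicit voice in the options wins over the default, a single prompt can still use a different voice when you want it to.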

Of course, it would be best if it were your own voice, and with a little preparation, you can make it so. Both say and ask support playing an MP3 or WAV file instead of using text-to-speech, so grab your trusty computer microphone, record the prompts and the responses, and upload them to your favorite HTTP-accessible Web server. Then, instead of offering up text to synthesize, provide the URL of the pre-recorded file to play; the code will read as follows (where the URL points to your recorded file):

say('https://www.tedneward.com/howdy.wav');
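If you record some prompts but not others, a small lookup can pick the recording when one exists and fall back to synthesized text otherwise. In this sketch the prompts table, its keys and the example.com URL are hypothetical placeholders; only the idea that say accepts either text or an audio URL comes from the article.

```javascript
// Hypothetical prompt table: TTS text plus, optionally, the URL of a
// pre-recorded WAV or MP3 file. The URL is a placeholder.
var prompts = {
  loveYou: {
    text: "I love you, Charlotte!",
    audio: "https://example.com/prompts/i-love-you.wav"
  },
  happy: { text: "Yay! That makes me happy." } // not recorded yet
};

// Return what to hand to say(): the recording's URL when there is one,
// otherwise the text for Tropo to synthesize.
function promptFor(key) {
  var p = prompts[key];
  if (!p) {
    throw new Error("No prompt named " + key);
  }
  return p.audio ? p.audio : p.text;
}

console.log(promptFor("loveYou")); // https://example.com/prompts/i-love-you.wav
console.log(promptFor("happy"));   // Yay! That makes me happy.

// In a Tropo script: say(promptFor("loveYou"));
```

This lets you swap in recordings one prompt at a time without touching the call flow.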

Next: Artificial Intelligence

I’ve only scratched the surface of what you can do with Tropo (in fact, I have much more to explore with Tropo before I’m done with the subject), but to fully understand where I want to go with this particular example, we’ll have to take a side trip into the wonderful world of artificial intelligence, then return to Tropo.


Ted Neward is an architectural consultant with Neudesic LLC. He’s written more than 100 articles, is a C# MVP and INETA speaker and has authored and coauthored a dozen books, including the recently released “Professional F# 2.0” (Wrox). He consults and mentors regularly. Reach him at ted@tedneward.com if you’re interested in having him come work with your team, or read his blog at blogs.tedneward.com.

Thanks to the following technical expert for reviewing this article: Adam Kalsey