Making Your Applications Talk
This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
Peter Vogel
Not everyone has 20/20 vision. If you're displaying large amounts of text, it
might be easier for your users to listen to your data. When something goes
wrong, you might want to provide an audible cue that's more informative than
"beep." For all of these tasks, the Microsoft TextToSpeech object is
just what you need. Peter Vogel leads the conversation.
In all probability, you already have Microsoft's text-to-speech software
installed on your computer. This technology, when passed a string of text, reads
(or speaks) the text back to you. I remember the first time that I heard
text-to-speech and speech-recognition technology demonstrated on a personal
computer. It was impressive in the way that a dog walking on its hind legs is: not
because it was done well, but because you were surprised that it worked at all.
Both text reading and speech recognition have come a very long way since then,
to the point where Microsoft's text-to-speech (TTS) technology is distributed
for free and is very impressive.
As part of my interest in effective user-interface design, I've been spending
more time looking at accessibility issues: making user interfaces work for users
with less than optimal hearing, vision, and motor control. As a person who owns
his second set of bifocals (and is too cheap to buy a 21" monitor), I've
mostly been focusing on issues around poor visibility. Blind users typically use
screen readers to read a page back to them, for instance, and I'm developing
some understanding about how to support screen-reading tools. Blind users are an
extreme case, however. Many users with less than optimal sight require only
occasional support for text reading.
Nor is text-to-speech technology only worthwhile for visually impaired users.
Even for users with adequate sight, the ability to have large amounts of text
fed back to them may be a benefit. And how many developers, when writing an
error handling routine, have really just wanted the application to yell "Don't
do that!" at the user? I also know one book author who uses
text-to-speech technology to have his words read back to him as part of his
editing process.
Getting started with TTS
Getting information about Microsoft's TTS isn't easy. Some relevant
information can be found with the documentation for Microsoft Agent (the Office
Assistant and other annoying little pop-up creatures). The online entry point
for Microsoft Agent documentation is at www.microsoft.com/msagent. If the
TTS component isn't installed on your computer (and it probably is—remember
that you've already got the Office Assistant), you can download what
documentation there is from here. To check to see if you have TTS installed, in
Access go to your Tools | References list and look for the Microsoft Voice Text
item.
There are actually two sets of downloads that you'll need:
- The Agent Core Components
- Text-to-speech engines, which handle the translation from text in a
particular language to a particular speaking voice
The core components that I downloaded came with a single text-to-speech
engine, named Sam (I'm not making this up). As a Canadian citizen with two
official languages, I also downloaded two additional text-to-speech engines:
- Lernout & Hauspie TTS3000 TTS Engine—French
- Lernout & Hauspie TTS3000 TTS Engine—British English
While Sam is a male speaker, the British and French engines include both a
male and a female speaking voice. More languages are available on the Agent
site, and other engines are available from third-party sources. The engines
available on the Agent site have the advantage of being free.
Adding TTS to an Access form is simple. With the form open in Design view:
- From the toolbox, click on the More Controls button.
- Scroll down through the list to the TextToSpeech Class entry.
- Click on the entry.
- Draw the control on your form.
You'll end up with a blank white square on your form at design time. At
runtime, you'll get a pair of lips that faintly resembles the Rolling Stones'
logo (see Figure 1 and Figure 2).
The resulting ActiveX control will be given the name TextToSpeech1 by Access,
and I've used that name through the rest of this article.
Once the control is on your form, you'll need to declare a variable to refer
to it. I used this code at the top of my form's code module to create a variable
that would be available to every routine in that module:
Dim tts As HTTSLib.TextToSpeech
In my Form Load event, I used this code to set the variable to the
TextToSpeech object inside the ActiveX control on my form:
Set tts = Me.TextToSpeech1.Object
To create the warning message I mentioned at the beginning of this article,
all I needed to do was add this code that passes some text to the object's Speak
method:
tts.Speak "Don't do that."
It's just a short step from this to putting a text box on your form and
having the voice say whatever you type into it (the sample application included
in the Download file will let you do this). However,
passing a Null value to the Speak method raises an error, so you'll need to
check for that:
If Not IsNull(Me.txtSpeak) Then
    tts.Speak Me.txtSpeak
Else
    tts.Speak "Nothing to say."
End If
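If you call Speak from several places, the Null check can be moved into a small helper. What follows is only a sketch: SpeakableText is a hypothetical function name of my own, not part of the TextToSpeech object, and it assumes that you'd want empty or all-blank text treated the same way as Null.

```vba
' Hypothetical helper (not part of the TextToSpeech object).
' Returns text that is safe to pass to Speak: the fallback is used
' when the value is Null, empty, or contains only spaces.
Public Function SpeakableText(ByVal varText As Variant, _
        Optional ByVal strFallback As String = "Nothing to say.") As String
    If IsNull(varText) Then
        SpeakableText = strFallback
    ElseIf Len(Trim$(CStr(varText))) = 0 Then
        SpeakableText = strFallback
    Else
        SpeakableText = CStr(varText)
    End If
End Function
```

With that in a standard module, the text box code shrinks to one line: tts.Speak SpeakableText(Me.txtSpeak.Value)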
To stop a voice from speaking, you can use the StopSpeaking method:
tts.StopSpeaking
How good is the pronunciation? Good enough that family members who I tested
this on were able to make out what the computer voice was saying. The
translation isn't character for character, either. Faced with "Elizabeth
II," "NASA," and "FBI," the TextToSpeech class produced
"Elizabeth Second," "NASA" (as a word), and "F, B,
I" (as three separately sounded letters). Punctuation marks also counted,
triggering the voice to insert pauses and inflections for commas, periods, and—under
at least some circumstances—question marks. Exclamation marks never seemed to
affect the voice, unfortunately. That would have added some emphasis to my
warning message.
Configuring your voice
You have some ability to configure the voice used by TTS, and the lips that
go with it. As the Speak method runs, the lips on your screen move. This can look
odd, but the shape of the mouth does vary with the sounds being spoken, giving
the user some visual feedback. For instance, when asked to say
"pop pop pop," the upper teeth in the mouth appear between the
animated lips. Asked to say "me me me," the teeth don't appear.
"Joy joy joy" causes the lower teeth to appear. You can't control the
size of the lips, but if the animated lips don't provide useful feedback for
your user interface, you can make them invisible:
Me.TextToSpeech1.Visible = False
You can also control the speed of the voice. The unit of measure is
"words per minute," with 150 being a typical speaking pace. This code
provides a very slow speaking voice:
tts.Speed = 50
If you set the number too high, the object either ignores it or raises a
"catastrophic failure" error (what's considered "too
high" seems to vary from engine to engine). I used this code to reset the
speed if I triggered an error:
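On Error Resume Next
tts.Speed = Me.lblSpeed.Caption
' Test for any non-zero number: COM errors such as
' "catastrophic failure" are reported as negative values
If Err.Number <> 0 Then
    Err.Clear
    tts.Speed = 150
End If
On Error GoTo 0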
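Since the speed here comes from a label caption, it may also be worth validating the text before it ever reaches the object. This is a sketch under my own assumptions: ParseSpeed is a hypothetical helper, and the 150 default is just the typical pace mentioned above. It can't know what a particular engine considers too high, so the error trap is still needed.

```vba
' Hypothetical helper: converts a caption (or any Variant) to a speed
' in words per minute, falling back to a default when the value is
' Null, non-numeric, zero or negative, or too large for a Long.
Public Function ParseSpeed(ByVal varValue As Variant, _
        Optional ByVal lngDefault As Long = 150) As Long
    Dim lngSpeed As Long

    ParseSpeed = lngDefault
    If Not IsNumeric(varValue) Then Exit Function   ' Null and text end up here

    On Error Resume Next        ' CLng overflows on very large input
    Err.Clear
    lngSpeed = CLng(varValue)
    If Err.Number = 0 And lngSpeed > 0 Then ParseSpeed = lngSpeed
    On Error GoTo 0
End Function
```

The assignment then becomes tts.Speed = ParseSpeed(Me.lblSpeed.Caption), which at least guarantees that the object is handed a positive whole number.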
The TextToSpeech object offers a variety of properties that describe the
voice, including Age, Gender, and Speaker. However, most of these options are
read-only and are controlled by which speech engine you choose to use. When the
TextToSpeech class is loaded, it will build a list of speech engines installed
on your computer. You control your voice by selecting a specific engine.
A count of the number of engines available can be found in the CountEngines
property. For any property that's dependent on the speech engine, you pass an
integer value to the property to retrieve the setting for that engine. For
instance, this code goes through the list of speech engines displaying the name
of each engine (note that the first speech engine is at position 1, not position
0):
Dim intEngine As Integer
For intEngine = 1 To tts.CountEngines
    Debug.Print tts.ModeName(intEngine)
Next intEngine
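The same loop can report more than the name. The sketch below prints the gender and age of each voice alongside it. GenderName and ListEngines are hypothetical names of my own, and I'm assuming that the Gender property uses the same 1 = female, 2 = male coding that the FindEngine method accepts. The tts parameter is declared As Object only so that the routine compiles without a reference to the library.

```vba
' Hypothetical helper: turns a gender code into a label.
' Assumes 1 = female and 2 = male; anything else is "unspecified".
Public Function GenderName(ByVal intGender As Integer) As String
    Select Case intGender
        Case 1
            GenderName = "female"
        Case 2
            GenderName = "male"
        Case Else
            GenderName = "unspecified"
    End Select
End Function

' Prints one line per installed engine to the Immediate window.
' Engine positions start at 1, not 0.
Public Sub ListEngines(ByVal tts As Object)
    Dim intEngine As Integer
    For intEngine = 1 To tts.CountEngines
        Debug.Print intEngine & ": " & tts.ModeName(intEngine) & _
            " (" & GenderName(tts.Gender(intEngine)) & _
            ", age " & tts.Age(intEngine) & ")"
    Next intEngine
End Sub
```

From the form, you'd call it as ListEngines tts once the variable has been set in the Form Load event.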
Picking the voice that you want to hear consists of selecting the engine that
you want to use. The engine selection is done by passing the position of the
engine in the engine list to the object's Select method. This code selects the
second engine:
tts.Select 2
As you might expect, choosing between the British and French voices results
in very different pronunciations of the word "Bonjour" (the British
engine pronounces it as two words: "Bawn Jer"). However, the choice of
engines is more critical than that because different letter combinations mean
different things to different speech engines.
For instance, the abbreviation "S.A." is short for "Société
Anonyme" in much of the world (a Société Anonyme is the equivalent of a
limited company in North America). Asked to speak the abbreviation
"S.A." the Sam voice and the British voices sound out the letters; the
French voice says "Société Anonyme." The reverse is true for common
North American abbreviations like "Ltd."—the North American and
English voices pronounce it "limited," while the French voice sounds
it out. Another common business abbreviation has more variations. When presented
with "corp," all the engines pronounce it as a word (with only the
French engine giving what I'd regard as the correct pronunciation:
"core"). With a period at the end to make "corp." Sam said
"corporation" and the other voices stuck with their original choice.
With an "s" at the end to make "corps," all the voices but
Sam pronounced the word as "core" (Sam pronounced it as the plural of
"corp," a word that I don't think actually exists). For each speech
engine, you'll find a Word document describing the engine's behavior on the
Microsoft Agent site.
Selecting engines
Given these variations, you may want to check your input
text and load an engine that will read it correctly. For instance, if you were
pronouncing the names of the contact people in Northwind's Customers table,
you might want to check the Country field and select the engine on that basis.
To select the right engine, you could keep track of what engine is loaded
into what position, but that would require you to control which engines are
installed on your user's computer. A better method is to use the FindEngine
method. This method accepts 28 parameters, all of them required. The first 14
allow you to specify the kind of speech engine that you're looking for. Setting
the tenth parameter, for instance, allows you to specify the gender of the
speaker that you're looking for (1 = female, 2 = male). The second set of 14
parameters allows you to specify a ranking for each of the parameters to
indicate which parameters are more important to you. The FindEngine method
returns the position of the engine that you're looking for and can be used with
the Select method.
In this code, I'm looking for a female voice of age 30. I've indicated that
the gender is my No. 1 priority and age is my No. 2 priority:
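' First 14 arguments: search criteria (10th = gender, 11th = age)
' Last 14 arguments: rank of each criterion (gender = 1, age = 2)
tts.Select tts.FindEngine( _
    "", "", "", "", "", 0, "", "", "", 1, 30, 0, 0, 0, _
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0)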
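Nobody wants to type 28 arguments twice, so a call like this is a natural candidate for a wrapper. The sketch below rests on several assumptions of mine. SelectVoice and IsValidEnginePosition are hypothetical names. The ranks are placed in the tenth and eleventh slots of the second group, on the understanding that the ranking parameters line up one for one with the first 14. I don't know what FindEngine returns when nothing matches, so the result is checked against the engine count before it's passed to Select.

```vba
' Hypothetical helper: True when intPosition falls inside the
' 1-based list of installed engines.
Public Function IsValidEnginePosition(ByVal intPosition As Integer, _
        ByVal intEngineCount As Integer) As Boolean
    IsValidEnginePosition = _
        (intPosition >= 1 And intPosition <= intEngineCount)
End Function

' Hypothetical wrapper around FindEngine and Select.
' Gender is ranked first and age second; every other criterion is
' left blank or zero. Returns True if an engine was selected.
Public Function SelectVoice(ByVal tts As Object, _
        ByVal intGender As Integer, ByVal intAge As Integer) As Boolean
    Dim intPosition As Integer

    intPosition = tts.FindEngine( _
        "", "", "", "", "", 0, "", "", "", intGender, intAge, 0, 0, 0, _
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0)

    If IsValidEnginePosition(intPosition, tts.CountEngines) Then
        tts.Select intPosition
        SelectVoice = True
    End If
End Function
```

Asking for a 30-year-old female voice then reads as SelectVoice(tts, 1, 30), and the Boolean result tells you whether to fall back to whatever engine is already loaded.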
To check what engine you've selected, you can retrieve information about the
currently selected engine from those read-only properties that I mentioned
earlier. The first step in retrieving the information is to use the CurrentMode
property to get the position of the engine currently being used. That
information can be used with the properties controlled by the engine to retrieve
information about the engine. This code determines the age of the current voice,
for instance:
Dim intCurrentMode As Integer
intCurrentMode = tts.CurrentMode
Debug.Print tts.Age(intCurrentMode)
The engine isn't perfect. I found, for instance, that TTS can get
tongue-tied. Feeding the Speak method a name from the Northwind database with
two accented "e" characters ("Frédérique Citeaux") caused
the engine to go into an endless loop, eventually consuming 98 percent of my
CPU. This may be a result of the character set that's used for non-North
American characters on my computer—international keyboards and localized
operating systems may bypass this problem.
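One defensive measure is to replace accented letters with their plain equivalents before the text reaches Speak. This is a sketch only: StripAccents is a hypothetical helper of my own, it covers just the common Western European letters, and I haven't confirmed that it prevents the loop on every engine. It also costs you the French engine's correct pronunciation of those letters, so you may want to apply it only when a non-French engine is selected.

```vba
' Hypothetical helper: replaces common accented letters with their
' unaccented equivalents and leaves every other character alone.
Public Function StripAccents(ByVal strText As String) As String
    ' The two constants are matched position by position (27 letters each)
    Const ACCENTED As String = "àáâãäåçèéêëìíîïñòóôõöùúûüýÿ"
    Const PLAIN As String = "aaaaaaceeeeiiiinooooouuuuyy"

    Dim lngPos As Long
    Dim lngHit As Long
    Dim strChar As String
    Dim strOut As String

    For lngPos = 1 To Len(strText)
        strChar = Mid$(strText, lngPos, 1)
        ' Binary comparisons are requested explicitly because Access
        ' modules default to Option Compare Database
        lngHit = InStr(1, ACCENTED, LCase$(strChar), vbBinaryCompare)
        If lngHit = 0 Then
            strOut = strOut & strChar                       ' not accented
        ElseIf StrComp(strChar, LCase$(strChar), vbBinaryCompare) = 0 Then
            strOut = strOut & Mid$(PLAIN, lngHit, 1)        ' lowercase
        Else
            strOut = strOut & UCase$(Mid$(PLAIN, lngHit, 1)) ' uppercase
        End If
    Next lngPos

    StripAccents = strOut
End Function
```

The Northwind name would then be spoken with tts.Speak StripAccents("Frédérique Citeaux"), which passes "Frederique Citeaux" to the engine.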
In the meantime, my computer is talking to me. Got to go.
Download TTS.exe
To find out more about Smart Access and Pinnacle Publishing, visit their website at
http://www.pinpub.com/html/main.isx?sub=57
Note: This is not a Microsoft Corporation website. Microsoft is not responsible for its content.
This article is reproduced from the June 2003 issue of Smart Access. Copyright 2003, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Smart Access is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.