audio

audio element

Plays an audio file or converts text to speech within a prompt.

Syntax

<audio 
expr = "ECMAScript_Expression"
fetchhint = "string"
fetchtimeout = "w3ctime"
maxage = "seconds"
maxstale = "seconds"
src = "URI"
/>

Attributes

expr

An ECMAScript expression used in place of the src attribute. It evaluates either to the URI of the audio to play or to a variable that holds audio recorded by a record element, that is, the variable named by the record element's name attribute.

fetchhint

Defines when the interpreter context may retrieve audio files from the server:

prefetch: Audio files may be prefetched.
safe: Audio files may be fetched only when needed, never before.
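
As a sketch of the two values, the following document lets a short earcon be prefetched but defers a longer recording until the interpreter actually reaches it. The file names chime.wav and terms.wav are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <block>
      <!-- Small, frequently played file: the interpreter may fetch it ahead of time -->
      <audio src="chime.wav" fetchhint="prefetch"/>
      <!-- Larger file: fetched only when this element is executed -->
      <audio src="terms.wav" fetchhint="safe">
      Please visit our Web site for the full terms and conditions.
      </audio>
   </block>
</form>
</vxml>
```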

fetchtimeout

The time, in seconds (s) or milliseconds (ms), that the VoiceXML interpreter waits for the HTTP server to return the audio file. If the timeout expires, the interpreter instead synthesizes and plays the alternate text, if any. The default value is 30s.
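
A minimal sketch, assuming a hypothetical server at audio.example.com that is sometimes slow to respond: the interpreter waits at most five seconds for today.wav and otherwise speaks the fallback text. The same limit could also be written as fetchtimeout="5000ms".

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <block>
      <!-- Give up on the recording after 5 seconds and synthesize the text instead -->
      <audio src="http://audio.example.com/news/today.wav" fetchtimeout="5s">
      Today's news is temporarily unavailable.
      </audio>
   </block>
</form>
</vxml>
```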

maxage

Instructs the VoiceXML interpreter to send a max-age Cache-Control header along with the HTTP request for the specified resource. The header indicates that the document is willing to use content whose age is no greater than the specified number of seconds, unless maxstale is also provided.

Voice application developers should use extreme caution when setting this attribute; used improperly, it can degrade the performance of your application. Consider using it only when requesting frequently changing content (for example, dynamically generated content) hosted on a misconfigured HTTP server that you do not control. To reduce load, some HTTP servers are configured to tell clients that content expires at some arbitrary time in the future. In that case, set the maxage attribute to 0.

If you do control the HTTP server, configure it instead to omit the Expires header and, possibly, to send the Cache-Control: no-cache header. The former requires the VoiceXML interpreter to check with the server before using any cached content. The latter requires the VoiceXML interpreter not to cache the fetched resource.
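
For the misconfigured-server case described above, a sketch might look like the following. The URL is hypothetical; it stands for a dynamically generated recording that the server wrongly marks as fresh far into the future.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <block>
      <!-- maxage="0": do not accept a cached copy of any age; get current content -->
      <audio src="http://reports.example.com/balance.wav" maxage="0">
      Your balance is not available right now.
      </audio>
   </block>
</form>
</vxml>
```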

maxstale

Instructs the VoiceXML interpreter to send a max-stale Cache-Control header along with the HTTP request for the specified resource. The header indicates that the document is willing to use content that has exceeded its expiration time by no more than the specified number of seconds. Voice application developers should use extreme caution when setting this attribute; used improperly, it can cause your application to present stale content to users. If you do control the HTTP server, configure it instead to send an Expires header with a time in the distant future.
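
A sketch of maxstale, assuming a hypothetical holiday_hours.wav that changes rarely enough that playing a copy up to one hour (3600 seconds) past its expiration time is acceptable:

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <block>
      <!-- Accept a cached copy that expired no more than 3600 seconds ago -->
      <audio src="holiday_hours.wav" maxstale="3600">
      Our offices are closed on public holidays.
      </audio>
   </block>
</form>
</vxml>
```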

src

The URI of the recorded audio file.

Parents

audio, block, catch, emphasis, enumerate, error, field, filled, foreach, help, if, link, menu, noinput, nomatch, object, p element, prompt, prosody, record, s element, subdialog, transfer, voice

Children

audio, break, emphasis, enumerate, mark, p element, phoneme, prosody, s element, say-as, sub, value, voice

Remarks

The audio element plays back a pre-recorded audio file or text that's synthesized using a Text-To-Speech (TTS) engine. If the src or expr attribute points to a valid audio file, any text specified within the audio element is ignored. If the audio file cannot be retrieved, the specified text is synthesized and played to the user.

If you specify the URI of a nonexistent file and provide no alternate text, nothing is played for that audio element. Tellme recommends that you always provide TTS fallback text within an audio element.

In addition to specifying SSML and plain text within an audio element, you can also nest audio elements to play alternate recordings.

The src and expr attributes are mutually exclusive.

Audio elements are played back to the user in the order in which they are executed.

Because the standard telephone network (PSTN) can play back only 8 kHz mono audio, Tellme recommends that you downsample and encode your audio files as mu-law, 8 kHz, 8-bit, single-channel (mono) audio with a standard RIFF header before deploying them to your production Web servers. Doing so reduces the network bandwidth required to transfer the audio files and minimizes the processing the Tellme platform performs before playing audio to the user. If you do not downsample your audio, the Tellme platform does so automatically before playback. The following table lists the characteristics of the audio data that the Tellme platform can handle:

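| Codec      | Sample rate (kHz)    | Bits per sample | Channels       |
|------------|----------------------|-----------------|----------------|
| Linear PCM | 8, 11, 16, 22, or 44 | 8 or 16         | Mono or stereo |
| mu-law     | 8                    | 8               | Mono or stereo |
| a-law      | 8                    | 8               | Mono or stereo |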
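
The bandwidth savings follow from simple arithmetic: the bit rate R of uncompressed audio is the product of the sample rate f_s, the bits per sample b, and the number of channels c. The comparison below ignores file headers and HTTP overhead, and the 16 kHz linear PCM cases are picked from the table purely for illustration.

```latex
\begin{aligned}
R &= f_s \times b \times c \\
R_{\text{mu-law, 8 kHz, mono}} &= 8\,000 \times 8 \times 1 = 64\,000\ \text{bit/s} = 64\ \text{kbit/s} \\
R_{\text{PCM, 16 kHz, 16-bit, mono}} &= 16\,000 \times 16 \times 1 = 256\ \text{kbit/s} \quad (4\times) \\
R_{\text{PCM, 16 kHz, 16-bit, stereo}} &= 16\,000 \times 16 \times 2 = 512\ \text{kbit/s} \quad (8\times)
\end{aligned}
```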

The following table lists the MIME type your Web server should send in the HTTP Content-Type response header. The MIME type your server sends indicates the characteristics of the audio data in the HTTP response. This is especially important for audio files that have no header of their own, also known as raw or headerless files.

| Content-Type       | Meaning |
|--------------------|---------|
| audio/x-wav        | Indicates that the audio stream following the HTTP response headers begins with a RIFF header describing the characteristics of the audio data. See the table above for the audio formats supported by the Tellme platform. |
| audio/basic        | Indicates that the audio stream is headerless (raw) and consists of mu-law, 8 kHz, 8-bit, mono audio data. |
| audio/x-alaw-basic | Indicates that the audio stream is headerless (raw) and consists of a-law, 8 kHz, 8-bit, mono audio data. |

Use the expr attribute to play back audio recorded with the record element. See the record element for an example.
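
As a rough sketch of that pattern (the record element's reference page remains the authoritative example), the following form records a short message into a hypothetical variable named greeting and then plays it back by passing that variable to expr. The beep, maxtime, and dtmfterm values are illustrative.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <record name="greeting" beep="true" maxtime="10s" dtmfterm="true">
      <prompt>Record your greeting after the tone.</prompt>
      <filled>
         <!-- greeting now holds the recording; expr refers to it by variable name -->
         <prompt>
            You recorded: <audio expr="greeting"/>
         </prompt>
      </filled>
   </record>
</form>
</vxml>
```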

If any of the fetchtimeout, fetchhint, maxage, or maxstale attributes is not specified for an audio element, then the value of the fetchtimeout, audiofetchhint, audiomaxage, or audiomaxstale property, respectively, is used.
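
The following sketch sets document-level defaults with the property element and overrides one of them on a single audio element. The file names and timeout values are illustrative assumptions.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<!-- Defaults used by audio elements in this document that omit the attributes -->
<property name="fetchtimeout" value="10s"/>
<property name="audiofetchhint" value="safe"/>
<form>
   <block>
      <!-- Inherits fetchtimeout 10s and fetchhint safe from the properties -->
      <audio src="menu_intro.wav">Main menu.</audio>
      <!-- Overrides the fetchtimeout property for this element only -->
      <audio src="promo.wav" fetchtimeout="2s">Ask about our new calling plans.</audio>
   </block>
</form>
</vxml>
```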

Prior to Revision 3, if an audio element's expr attribute evaluated to ECMAScript undefined, the expr attribute was ignored and any alternate content was synthesized and played to the user. In Revision 3 and later, if the expr attribute evaluates to ECMAScript undefined, the audio element, including any content specified within it, is ignored.
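
Because Revision 3 and later skip the whole element, fallback text inside an audio element whose expr might be undefined is never spoken. One possible workaround, sketched below with a hypothetical variable named clip that is declared without a value (and is therefore undefined), is to test the variable before playing it, so the listener hears the same text under either behavior.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <!-- Declared without expr, so clip is ECMAScript undefined -->
   <var name="clip"/>
   <block>
      <if cond="typeof clip == 'undefined'">
         <!-- Speak the fallback explicitly instead of relying on audio content -->
         <prompt>The recording is not available.</prompt>
      <else/>
         <!-- clip is defined; the text still covers a failed fetch -->
         <audio expr="clip">The recording is not available.</audio>
      </if>
   </block>
</form>
</vxml>
```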

Examples

The following example includes both recorded audio and TTS. The location of the audio is relative to the location of the VoiceXML document that contains the audio element. If the recorded audio cannot be fetched, the VoiceXML interpreter plays back the TTS string instead.

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <block>
      <audio src="welcome.wav">
      Welcome to Tellme University
      </audio>
   </block>
</form>
</vxml>

The following example uses a variable and a constant string to reference an audio file. When referencing a variable, use the expr attribute instead of the src attribute.

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <var name="path_earcons" expr="'http://audio.en-US.tellme.com/common-audio/'"/>
   <block>
      <audio expr="path_earcons + 'intellipause.wav'"/>
   </block>
</form>
</vxml>

The following example plays back TTS stored in a variable if the audio file sorry_dave.wav cannot be retrieved. To reference a variable containing TTS, use the value element.

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
   <var name="motd" expr="'I am sorry, Dave, but I cannot do that.'"/>
   <block>
      <audio src="sorry_dave.wav">
        <value expr="motd"/>
      </audio>
   </block>
</form>
</vxml>

The following example attempts to retrieve a recorded audio file from audio01.acme.net. If the fetch fails, the interpreter attempts to retrieve an alternate recording from audio02.acme.net. If that fetch fails, the interpreter renders the TTS "123".

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <audio src="http://audio01.acme.net/numbers/123.wav">
        <audio src="http://audio02.acme.net/numbers/123.wav">
        123
        </audio>
      </audio>
    </block>
  </form>
</vxml>

See Also

TTS Engine Behavior