This documentation is archived and is not being maintained.

W3C VoiceXML 2.0 Introduction

Speech Server 2007

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.


This Version:

Latest Version:

Previous Version:


Scott McGlashan, Hewlett-Packard (Editor-in-Chief)

Daniel C. Burnett, Nuance Communications

Jerry Carter, Invited Expert

Peter Danielsen, Lucent (until October 2002)

Jim Ferrans, Motorola

Andrew Hunt, ScanSoft

Bruce Lucas, IBM

Brad Porter, Tellme Networks

Ken Rehor, Vocalocity

Steph Tryphonas, Tellme Networks

Please refer to the errata for this document, which may include some normative corrections.

See also translations.

Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

This document specifies VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.
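The following is a minimal sketch of a VoiceXML 2.0 document that shows synthesized speech and spoken or DTMF recognition together. The "Hello World!" block follows the classic introductory example. The field named `answer`, its prompt wording, and the two responses are illustrative additions and do not come from this text. The field relies on the builtin `boolean` grammar type (Appendix P), which accepts a spoken yes or no, or DTMF 1 (yes) and 2 (no). Builtin grammar coverage depends on the platform and the language, so treat this as an assumption-laden sketch rather than a conformance example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="hello">
    <!-- A block is visited once; its text is played to the caller via speech synthesis. -->
    <block>Hello World!</block>

    <!-- A field collects user input. type="boolean" activates the builtin
         boolean grammars (spoken yes/no, DTMF 1 = yes, 2 = no).
         The field name and prompt text are hypothetical. -->
    <field name="answer" type="boolean">
      <prompt>Would you like to continue? Say yes or no, or press 1 or 2.</prompt>
      <filled>
        <!-- After recognition, answer holds the ECMAScript value true or false. -->
        <if cond="answer">
          <prompt>Continuing.</prompt>
        <else/>
          <prompt>Goodbye.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Both form items are filled and no transition is specified. In that case the form ends, and so does execution of this single-document application, because the dialog names no successor.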

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at

This document has been reviewed by W3C Members and other interested parties, and it has been endorsed by the Director as a W3C Recommendation. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This specification is part of the W3C Speech Interface Framework and has been developed within the W3C Voice Browser Activity by participants in the Voice Browser Working Group (W3C Members only).

The design of VoiceXML 2.0 has been widely reviewed (see the disposition of comments) and satisfies the Working Group's technical requirements. A list of implementations is included in the VoiceXML 2.0 implementation report, along with the associated test suite.

Comments are welcome on (archive). See W3C mailing list and archive usage guidelines.

The W3C maintains a list of any patent disclosures related to this work.

In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119] and indicate requirement levels for compliant VoiceXML implementations.

Abbreviated Contents

1. Overview

2. Dialog Constructs

3. User Input

4. System Output

5. Control flow and scripting

6. Environment and Resources


Full Contents

1. Overview

1.1 Introduction

1.2 Background

1.2.1 Architectural Model

1.2.2 Goals of VoiceXML

1.2.3 Scope of VoiceXML

1.2.4 Principles of Design

1.2.5 Implementation Platform Requirements

1.3 Concepts

1.3.1 Dialogs and Subdialogs

1.3.2 Sessions

1.3.3 Applications

1.3.4 Grammars

1.3.5 Events

1.3.6 Links

1.4 VoiceXML Elements

1.5 Document Structure and Execution

1.5.1 Execution within one Document

1.5.2 Executing a Multi-Document Application

1.5.3 Subdialogs

1.5.4 Final Processing

2. Dialog Constructs

2.1 Forms

2.1.1 Form Interpretation

2.1.2 Form Items

2.1.3 Form Item Variables and Conditions

2.1.4 Directed Forms

2.1.5 Mixed Initiative Forms

2.1.6 Form Interpretation Algorithm

2.2 Menus

2.2.1 menu element

2.2.2 choice element

2.2.3 DTMF in Menus

2.2.4 enumerate element

2.2.5 Grammar Generation

2.2.6 Interpretation Model

2.3 Form Items

2.3.1 field element

2.3.2 block element

2.3.3 initial element

2.3.4 subdialog element

2.3.5 object element

2.3.6 record element

2.3.7 transfer element

2.4 Filled

2.5 Links

3. User Input

3.1 Grammars

3.1.1 Speech Grammars

3.1.2 DTMF Grammars

3.1.3 Scope of Grammars

3.1.4 Activation of Grammars

3.1.5 Semantic Interpretation of Input

3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms

4. System Output

4.1 Prompt

4.1.1 Speech Markup

4.1.2 Basic Prompts

4.1.3 Audio Prompting

4.1.4 <value> Element

4.1.5 Bargein

4.1.6 Prompt Selection

4.1.7 Timeout

4.1.8 Prompt Queueing and Input Collection

5. Control flow and scripting

5.1 Variables and Expressions

5.1.1 Declaring Variables

5.1.2 Variable Scopes

5.1.3 Referencing Variables

5.1.4 Standard Session Variables

5.1.5 Standard Application Variables

5.2 Event Handling

5.2.1 throw element

5.2.2 catch element

5.2.3 Shorthand Notation

5.2.4 catch Element Selection

5.2.5 Default catch elements

5.2.6 Event Types

5.3 Executable Content

5.3.1 var element

5.3.2 assign element

5.3.3 clear element

5.3.4 if, elseif, else elements

5.3.5 prompts

5.3.6 reprompt element

5.3.7 goto element

5.3.8 submit element

5.3.9 exit element

5.3.10 return element

5.3.11 disconnect element

5.3.12 script element

5.3.13 log element

6. Environment and Resources

6.1 Resource Fetching

6.1.1 Fetching

6.1.2 Caching

6.1.3 Prefetching

6.1.4 Protocols

6.2 Metadata Information

6.2.1 meta element

6.2.2 metadata element

6.3 property element

6.3.1 Platform-Specific Properties

6.3.2 Generic Speech Recognizer Properties

6.3.3 Generic DTMF Recognizer Properties

6.3.4 Prompt and Collect Properties

6.3.5 Fetching Properties

6.3.6 Miscellaneous Properties

6.4 param element

6.5 Value Designations


Appendix A. Glossary of Terms

Appendix B. VoiceXML Document Type Definition

Appendix C. Form Interpretation Algorithm

Appendix D. Timing Properties

Appendix E. Audio File Formats

Appendix F. Conformance

Appendix G. Internationalization

Appendix H. Accessibility

Appendix I. Privacy

Appendix J. Changes from VoiceXML 1.0

Appendix K. Reusability

Appendix L. Acknowledgements

Appendix M. References

Appendix N. Media Type and File Suffix

Appendix O. VoiceXML XML Schema Definition

Appendix P. Builtin Grammar Types