Office Open XML, ECMA-376, and ISO/IEC 29500
Summary: This article contains a brief overview of Office Open XML (Open XML), explains its uses, and helps the reader become familiar with the ECMA-376 Office Open XML File Formats specification that defines Open XML.
Applies to: The 2007 Microsoft Office system | Microsoft Office 2010
Published: February 2011
Provided by: Microsoft Corporation
What is Office Open XML?
Office Open XML (Open XML) is an internationally accepted file format standard that office software suites implement to save and exchange information. For example, the 2007 Microsoft Office system and Microsoft Office 2010 both save their documents in Open XML format. Other office software suites that adhere to the Open XML standard should be able to read, edit, and write these files.
In the early days of desktop software, each document format was tightly coupled with a specific application, with popular formats including WordStar, WordPerfect, VisiCalc, Lotus 1-2-3, and so on. These formats supported the specific features of their applications, but were typically controlled by a single vendor. Therefore, they did not provide the long-term benefits associated with international standardization. Most importantly, the applications of that time typically could not use files that were created by another vendor’s software; that is, they could not interoperate. In addition, early file formats lacked features such as thorough support documentation, archival protection, and community-led maintenance that gives the public an opportunity to participate in the evolution of the standard. With the advent of the Open XML standard, these benefits can be obtained.
One of the primary goals for the Open XML standard is to be fully compatible with the corpus of Microsoft Office documents that existed at the time that the standard was written. This goal was achieved in the final standard, as users can easily convert existing documents from the previous proprietary binary formats to the standardized Open XML format without loss of document content, semantics, or structure. Open XML is the only format that addresses this goal, which is an important consideration for users and organizations with large investments in existing documents based on the .doc, .xls and .ppt formats.
Other formats have goals that address other needs. For example, the ISO/IEC 26300 Open Document Format states its goal as being “suitable for office documents,” but does not include many of the longstanding features of the previous binary formats. These include robust change tracking, spreadsheet formulas, and flexible mail-merge capabilities.
Open XML enjoys wide support. For instance, various Apple-compatible products can edit Open XML files. These products include the iWork desktop productivity suite, TextEdit for the Mac OS X operating system and iPhone platform, and the Microsoft Office for Mac product line. On the Linux operating system, products from OpenOffice.org, IBM, Novell, and Gnumeric provide Open XML compatibility. On the Windows operating system, Microsoft Office, Corel WordPerfect Office X4, Nuance’s OmniPage, Datawatch’s Monarch, and Altova’s XMLSpy are all able to use Open XML files. Also, Google Docs, Zoho Writer, and Office 365 all let you view, update, and share Open XML files.
The ECMA-376 Standard
The Open XML format is defined by a standards organization called Ecma International. Formerly known as the European Computer Manufacturers Association (ECMA), Ecma International specifies global standards for information exchange. The particular document that defines Open XML is the ECMA-376 standard.
Another global standards organization, the International Organization for Standardization (ISO), also provides a standard for Open XML, known as ISO/IEC 29500. Involving both ECMA and ISO in the creation and maintenance of the Open XML standard ensures maximum community participation.
The ECMA-376 standard is available for free on the ECMA website. The ISO version can be purchased from the ISO website.
Editions of the Standard
As of this writing, there are two editions of the ECMA-376 standard. There is also an edition of the Open XML standard published by the ISO.
The ISO/IEC 29500 version of the Open XML standard specifies two varieties of Open XML files: Strict and Transitional. Transitional ISO/IEC 29500 is almost identical to the first edition of ECMA-376. Edition 2 of the ECMA-376 standard is identical to the Strict version of ISO/IEC 29500.
The 2007 Microsoft Office system reads and writes files that comply with the ECMA-376 Edition 1 standard. Office 2010 reads files conformant to ECMA-376 Edition 1, reads and writes files conformant to ISO/IEC 29500 Transitional, and reads files conformant to ISO/IEC 29500 Strict.
Language, Wording, and Style of the Standard
The Open XML standard uses particular language, wording, and writing styles to avoid multiple interpretations that might result in incompatible software implementations. The first and most important of these is the differentiation between normative and informative text.
Anything that the Open XML specification states is normative is required. For instance, as you read the specification, you will see that it contains a section called Normative References in Part 1, Section 3. The normative references provide a list of documents that you must read to understand the specification.
Unless otherwise indicated, all text in the specification is normative. Software must implement the requirements specified in normative text in order to be considered conformant to the Open XML standard.
Anything in the specification that is listed as informative is optional. The specification contains blocks of text that are specifically marked as informative by such statements as “This subclause is informative” and “End informative subclause.” The specification includes this text for informational purposes only.
A second approach that ISO/IEC 29500 and ECMA-376 Edition 2 include to increase clarity is the use of the words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL. The specification capitalizes all of these words and uses them in accordance with a specification called RFC 2119, which is published by the Internet Engineering Task Force (IETF). RFC 2119 gives clear definitions for these words to indicate when something in a specification is required, optional, prohibited, recommended, or not recommended.
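As a rough illustration of how these key words map to requirement levels, the following Python sketch scans a sentence for the capitalized words and reports the level each one signals. The function name and the level labels are illustrative assumptions, not part of the standard or of RFC 2119; the phrase NOT RECOMMENDED is taken from RFC 2119 and is included only so that it is not misread as RECOMMENDED.

```python
import re

# Requirement levels for the capitalized key words, following RFC 2119.
# The label strings are this sketch's own wording (an assumption).
KEYWORD_LEVELS = {
    "NOT RECOMMENDED": "not recommended",
    "MUST NOT": "prohibited",
    "SHALL NOT": "prohibited",
    "SHOULD NOT": "not recommended",
    "MUST": "required",
    "REQUIRED": "required",
    "SHALL": "required",
    "SHOULD": "recommended",
    "RECOMMENDED": "recommended",
    "MAY": "optional",
    "OPTIONAL": "optional",
}

# Longer phrases are tried first so that "MUST NOT" is not read as "MUST".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(KEYWORD_LEVELS, key=len, reverse=True)) + r")\b"
)


def requirement_levels(sentence):
    """Return (key word, level) pairs for each capitalized key word found."""
    return [(kw, KEYWORD_LEVELS[kw]) for kw in _PATTERN.findall(sentence)]


if __name__ == "__main__":
    # Made-up sample sentence, not a quotation from the specification.
    print(requirement_levels("A consumer MUST NOT fail and SHOULD warn."))
```

The match is case-sensitive on purpose, so an ordinary lowercase "may" in running prose is not treated as a key word.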
When reading a specification, implementers may misinterpret a requirement because it is unclear where notes and examples end. To ensure that such misinterpretations do not occur, the Open XML specification marks the beginning and ending of notes and examples as another technique for increasing clarity. It specifies where notes start and end with the text [Note: and end note]. Similarly, it marks the beginning and ending of examples with [Example: and end example]. The specification also includes markers such as [Rationale: and end rationale], [Guidance: and end guidance], and so on.
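Because the markers follow a regular pattern, a tool can separate the marked blocks from the surrounding text. The following Python sketch shows one possible way to do this; the function name and the sample sentence are made up for illustration, the marker list covers only the four kinds named above, and nested blocks are not handled.

```python
import re

# Matches "[Note: ... end note]", "[Example: ... end example]", and so on.
# The back-reference \1 requires the closing label to match the opening one.
_BLOCK = re.compile(
    r"\[(Note|Example|Rationale|Guidance):\s*(.*?)\s*end \1\]",
    re.IGNORECASE | re.DOTALL,
)


def split_marked_blocks(text):
    """Return (remaining_text, blocks); blocks is a list of (kind, body)."""
    blocks = [(m.group(1).lower(), m.group(2)) for m in _BLOCK.finditer(text)]
    remaining = _BLOCK.sub("", text)
    # Collapse the whitespace left behind where blocks were removed.
    remaining = re.sub(r"\s+", " ", remaining).strip()
    return remaining, blocks


if __name__ == "__main__":
    # Made-up sample text, not a quotation from the specification.
    sample = (
        "A requirement is stated here. "
        "[Note: This sentence is explanatory only. end note] "
        "[Example: /word/document.xml end example]"
    )
    print(split_marked_blocks(sample))
```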
For additional clarity in the Open XML specification, the text numbers all features that software must implement to be conformant. It marks these with an M, for mandatory, followed by the section and requirement number. As an example, the specification contains the following statement.
Package implementers shall not map logical item name(s) mapped to the Content Types stream in a ZIP archive to a part name. [M3.11]
The specification puts the marker [M3.11] on the statement to indicate that the associated statement is mandatory for conformance and gives it a numeric identifier that is unique within that topic.
The specification also contains conformance features that software should implement but that are not required for conformance. It marks these features with an S instead of an M. In addition, it explains features that are optional and marks them with an O.
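The marker scheme described above (a letter, then the section number, then the requirement number) is simple enough to parse mechanically. The following Python sketch is one way to do so; the function name, the returned dictionary layout, and the level labels are assumptions made for this example.

```python
import re

# Labels for the three marker letters; the wording is this sketch's own.
_LABELS = {"M": "mandatory", "S": "should implement", "O": "optional"}

# A marker such as [M3.11]: letter, section number, requirement number.
_MARKER = re.compile(r"\[([MSO])(\d+)\.(\d+)\]")


def parse_marker(statement):
    """Return the first conformance marker in a statement, or None."""
    match = _MARKER.search(statement)
    if match is None:
        return None
    kind, section, number = match.groups()
    return {
        "level": _LABELS[kind],
        "section": int(section),
        "number": int(number),
    }


if __name__ == "__main__":
    statement = (
        "Package implementers shall not map logical item name(s) mapped to "
        "the Content Types stream in a ZIP archive to a part name. [M3.11]"
    )
    print(parse_marker(statement))
```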
Organization of the ECMA-376 Edition 1 Standard
The ECMA-376 standard defines Open XML as a set of primary XML-based markup languages. Each markup language is dedicated to a particular purpose. Specifically, the standard defines its primary markup languages as WordprocessingML (WML), PresentationML (PML), and SpreadsheetML (SML). These markup languages use a set of secondary markup languages to provide shared methods of performing tasks that are common to all of the primary markup languages. An example of this is DrawingML (DML), which WML, PML, and SML all use as their mechanism for processing graphics. The secondary markup languages form the vocabularies that the primary markup languages are based on. The standard’s vocabularies also include such information as metadata, equations, and so on.
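To give a feel for what one of the primary markup languages looks like, the following Python sketch parses a very small WordprocessingML fragment and extracts the text of each paragraph. The sample markup is hand-written for illustration and is far simpler than a real document part; the function name is an assumption, and the namespace shown is the one used by Transitional WordprocessingML documents.

```python
import xml.etree.ElementTree as ET

# Namespace used by Transitional WordprocessingML markup.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written sample: paragraphs (w:p) contain runs (w:r), which contain text (w:t).
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Open XML</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraph_texts(xml_text):
    """Return the concatenated run text of each paragraph in the markup."""
    root = ET.fromstring(xml_text)
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))
```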
In addition, the standard specifies other necessary technologies, such as the organization and layout of Open XML files and the underlying technologies that are used to implement them. It presents Open XML in five distinct parts.
Part 1 – Fundamentals
Part 1 describes the fundamental concepts used in the standard and specifies the details of the primary and secondary markup languages. Because Part 1 covers fundamental concepts, it is the best starting point for those unfamiliar with Open XML. It defines the basic terms that the standard uses and explains the acronyms and abbreviations the reader will encounter.
In addition, Part 1 provides an overview of the layout and implementation of Open XML files. It introduces the Markup Compatibility and Extensibility features that software vendors can use to add enhancements and extensions to Open XML files.
Part 2 – Open Packaging Conventions
An Office Open XML document is represented as a series of related parts that are stored in a container called a package. Part 2 of the ECMA-376 Edition 1 specification describes packages, parts, and the relationships between the parts. It tells how to store packages, parts, and relationships as physical files on a disk or other storage medium.
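In practice, a package is commonly stored as a ZIP archive, so general-purpose ZIP tools can inspect it. The following Python sketch builds a tiny package-like archive in memory and lists the items it contains. The function names are assumptions, and the item contents are empty placeholders: the result shows the layout of a package only and is not a valid Open XML document.

```python
import io
import zipfile


def build_sample_package():
    """Build a minimal, placeholder package-like ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The Content Types stream describes the type of each part.
        archive.writestr(
            "[Content_Types].xml",
            "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'/>",
        )
        # Package-level relationships point to the main parts.
        archive.writestr(
            "_rels/.rels",
            "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'/>",
        )
        # Placeholder for the main document part.
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_items(package_bytes):
    """Return the names of the items stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for name in list_items(build_sample_package()):
        print(name)
```

The same list_items function can be pointed at the bytes of a real .docx, .xlsx, or .pptx file to see which parts it contains.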
Other topics that are presented in Part 2 include the core properties of a package, digital signatures, and a set of requirements for conformance to the Open XML standard.
Part 3 – Primer
Part 3 of Edition 1 of the ECMA-376 standard presents an extensive primer on Open XML. It explains the primary and secondary markup languages Open XML uses. It also provides an introduction to object embedding in Open XML, and a discussion of how to add extensions to the Open XML standard.
Part 4 – Markup Language Reference
In Edition 1 of the ECMA-376 specification, Part 4 provides an extensive set of reference pages detailing the markup languages used in Open XML. It also provides XML Schema and RELAX NG schemas that define Open XML.
Part 5 – Markup Compatibility and Extensibility
In Part 5, the ECMA-376 Edition 1 standard defines ways of extending the Open XML markup languages. This enables individual software implementers to innovate on the standard and provide features that are unique to their own applications. However, if an implementer adds features in compliance with the Markup Compatibility and Extensibility (MCE) rules, the files produced by their software are still compliant with the ECMA-376 standard. As a result, they should be readable by all software that offers Open XML support. In other words, vendors can innovate and still interoperate.
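One MCE mechanism is the Ignorable attribute, which lists namespace prefixes whose content a consumer may skip if it does not understand that namespace. The following Python sketch illustrates the idea in a heavily simplified form: it reads Ignorable from the root element only, drops whole elements, and assumes each prefix is declared once. The function name, the sample markup, and the urn:example namespaces are invented for illustration, and the other MCE features are not modeled.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Markup Compatibility attributes and elements.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical markup: "v2" marks an extension that older consumers may ignore.
SAMPLE = """<doc xmlns="urn:example:base"
     xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:v2="urn:example:extension"
     mc:Ignorable="v2">
  <item>kept</item>
  <v2:fancy>only newer consumers understand this</v2:fancy>
</doc>"""


def strip_ignorable(xml_text, understood):
    """Drop elements in ignorable namespaces that are not in `understood`."""
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events expose the prefix declarations that ElementTree
    # otherwise discards; they are needed to resolve the Ignorable prefixes.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload

    listed = root.get(f"{{{MC_NS}}}Ignorable", "").split()
    ignorable = {prefixes[p] for p in listed if p in prefixes}
    drop = ignorable - set(understood)

    def prune(parent):
        for child in list(parent):
            ns = child.tag[1:].split("}")[0] if child.tag.startswith("{") else ""
            if ns in drop:
                parent.remove(child)
            else:
                prune(child)

    prune(root)
    return root


if __name__ == "__main__":
    older = strip_ignorable(SAMPLE, understood={"urn:example:base"})
    print([child.tag for child in older])
```

A consumer that does understand the extension namespace passes it in understood, and the extension element is then kept.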
Organization of the ECMA-376 Edition 2 Standard
Edition 2 of the ECMA-376 standard updates and extends the design of Open XML. It also reorganizes and expands on much of the information in Edition 1.
Part 1 – Fundamentals and Markup Language Reference
Part 1 of Edition 2 of the ECMA-376 specification covers the same information that is presented in Part 1 of Edition 1. In addition, it provides extensive reference material for the standard’s markup languages. This includes an explanation of each XML element, its parent elements, its child elements, and its attributes. Part 1 also contains the XML schema that specifies Open XML. This schema is defined by using the World Wide Web Consortium (W3C) XML Schema 1.0 syntax.
Other topics that are presented in Part 1 include conformance classes, interoperability, and a description of the differences between ECMA-376 Edition 2 (ISO/IEC 29500) and ECMA-376 Edition 1.
Part 2 – Open Packaging Conventions
Part 2 of Edition 2 of the ECMA-376 specification contains an updated version of the information in Part 2 of Edition 1. One of the main differences between the two editions is that Annex H of Part 2 of Edition 1 is a set of normative requirements for conformance. In Edition 2, Annex H is instead a set of informative guidelines on how to create conformant documents. It contains information about best practices for implementing Open XML files and is provided as a convenience, instead of as a set of requirements.
Part 3 – Markup Compatibility and Extensibility
In Part 3, the ECMA-376 Edition 2 standard defines ways of extending the Open XML markup languages. Part 3 of Edition 2 covers the same information as Part 5 of Edition 1.
Part 4 – Transitional Migration Features
Part 4 of Edition 2 of the ECMA-376 standard focuses on backward compatibility with older document types. Specifically, it provides features that make it possible to load binary documents created by versions of Microsoft Office released before the 2007 Microsoft Office system and to then convert those documents to Open XML.
Part 4 also discusses compatibility with features, such as VML, that are being phased out in favor of more recent innovations.
Conclusion
Open XML provides a standard that offers a wide range of features to implementers of office software. These features include interoperability, archival protection, extensibility, compatibility with existing Microsoft Office documents, and more.
The ECMA-376 standard, which specifies Open XML, describes the requirements that must be met to create, edit, and save Open XML files. It provides information about the markup languages that make up Open XML. It explains the packaging conventions used for Open XML files, and describes the mechanisms software vendors can use to build custom extensions to the standard. ECMA-376 also presents the standard’s features for maintaining backward compatibility.