Erika Ehrli, Microsoft Corporation
Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007
Summary: This article provides a high-level overview of new features for developers in Microsoft Office Word 2007 Beta 2. The article reviews content controls, XML mapping, document building blocks, the Word XML Format, and other new features. (9 printed pages)
Introduction to Word 2007
Microsoft Office Word 2007 introduces a variety of new and exciting features that make it the most advanced and compelling version of Word to be released. This article offers an overview of new features in Office Word 2007 from a developer's perspective.
Some of the new features are common to all of the core applications in the 2007 Microsoft Office system, and some are specific to Word. The enhancements that are common to Word 2007, Microsoft Office PowerPoint 2007, and Microsoft Office Excel 2007 include the new user interface (UI), called the Office Fluent UI, the new Microsoft Office Open XML Formats (Office XML Formats), and the ability to easily attach custom XML data to a file. The Ribbon greatly improves the ability to navigate through and locate commands. The new Office XML Formats and the ability to attach custom XML data to a document further expand XML support in Word and Office. Microsoft Office Word 2003 was the first Office application to introduce a full-fidelity XML file format with WordprocessingML. This gave users the ability to work with custom XML data. Expanding on the XML support of Word 2003, Office Word 2007 dramatically advances XML capabilities. The new default file format is written almost entirely in XML.
The most notable new Word 2007 features center on XML. These features combine to help template authors create templates that are more reliable, more stable, and richer.
Content controls: New in Word 2007, content controls are predefined blocks of content that you can position anywhere in the document. Example content control types include text boxes, drop-down menus, calendars, and pictures. You can map most content controls to elements in XML data attached to a document. You can easily map this XML data using the Word 2007 XML Format. This ability to map content eliminates certain vulnerabilities present when working with XML in Word 2003 and results in more robust documents.
XML mapping: This is a feature of Office Word 2007 that enables you to create a link between a document and an XML file. This creates true data/view separation between the document formatting and custom XML data.
Document building blocks: These are predefined pieces of content, such as a cover page, a header, a footer, or a custom-built clause in a contract. Custom building blocks facilitate the quick creation of professional-looking Word documents.
Word XML Format: The Microsoft Office Word XML Format (Word XML Format) is based on the Open Packaging Conventions. Its main purpose is to divide the file into document parts, each of which defines a part of the overall contents of the file. This feature enables you to edit something in the file, such as the header or footer, without accidentally modifying any other XML document parts. Similarly, all custom XML data is in its own part, so working with custom XML is now easier.
Other new features in Word 2007 include bibliography, citation, and equation features. The new equations feature allows for professional-looking formatting of complicated mathematical equations.
This article reviews these features to help you understand the development opportunities offered by Word 2007.
Content Controls
Word 2003 introduced the ability to attach an XML schema to a document. You could add elements from an XML file, provided they conformed to the schema. This helped create a robust document structure that allowed easier access to data. However, there were limitations. The most significant was that the presentation and custom XML data were linked through the document editing surface. Consequently, an end user could accidentally delete part of the XML structure used to define the document, thereby invalidating the document's XML structure according to its schema. Word 2007 addresses this issue with the addition of content controls.
Word 2007 introduces new features designed to make Word a highly reliable platform for document-based solutions, including structured document assembly, data capture/extraction, and document construction. One major area of investment is the introduction of content controls, which provide template creators the ability to more easily structure arbitrary pieces of a Word 2007 document by using semantics, content restrictions, and behaviors.
Content controls are predefined pieces of content. There are numerous types of content controls, including text blocks, drop-down menus, combo boxes, calendar controls, and pictures. You can map these content controls to an element in an XML file. Using XML Path Language (XPath), you can programmatically map content in an XML file to a content control. This enables you to write a simple and short application to manipulate and modify data in a document.
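To make the mapping idea concrete, the following minimal Python sketch reads a fragment of document markup and lists each content control together with the XPath it is bound to. It assumes the markup of the released Word 2007 file format, in which a content control is a w:sdt element and its mapping is a w:dataBinding element. The sample fragment, the alias, the XPath, and the store item ID are all hypothetical values made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace of the released format (assumed for this sketch).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Hypothetical document fragment with one mapped plain-text content control.
SAMPLE_DOCUMENT_XML = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:sdt>
        <w:sdtPr>
          <w:alias w:val="Project name"/>
          <w:dataBinding w:xpath="/project/name"
                         w:storeItemID="{{00000000-0000-0000-0000-000000000001}}"/>
          <w:text/>
        </w:sdtPr>
        <w:sdtContent>
          <w:r><w:t>Contoso rollout</w:t></w:r>
        </w:sdtContent>
      </w:sdt>
    </w:p>
  </w:body>
</w:document>
"""


def list_bindings(document_xml: str) -> list[dict]:
    """Return one entry per content control that carries a data binding."""
    root = ET.fromstring(document_xml.strip())
    found = []
    for sdt in root.iter(W + "sdt"):
        props = sdt.find(W + "sdtPr")
        if props is None:
            continue
        binding = props.find(W + "dataBinding")
        if binding is None:
            continue  # an unmapped control
        alias = props.find(W + "alias")
        found.append({
            "alias": alias.get(W + "val") if alias is not None else None,
            "xpath": binding.get(W + "xpath"),
            "store_item_id": binding.get(W + "storeItemID"),
            # Text currently displayed inside the control.
            "text": "".join(t.text or "" for t in sdt.iter(W + "t")),
        })
    return found


if __name__ == "__main__":
    for entry in list_bindings(SAMPLE_DOCUMENT_XML):
        print(entry)
```

The sketch only inspects markup. Inside Word itself, you would create the same mapping through the object model rather than by editing XML.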
The following figure shows a plain-text content control.
Figure 1. Content controls in Word 2007
You can lock content controls to prevent users from editing or deleting them. This is a great improvement in template creation.
In previous versions of Word, it was difficult to lock pieces of content in a document. In Word 2007, content controls simplify this process, allowing you to lock content through the UI or programmatically.
XML Mapping
You can populate portions of a document template with data from an XML file using XML mapping. Using the object model, you can add structured custom data (stored in any number of XML files) to the document and map the data to specific content controls. With the advent of the Word 2007 XML Format, programmatic access to this data has never been easier.
XML mapping allows for many possible scenarios in which data behind the document is automatically updated using events on ContentControl objects. One such scenario involves a document with stock data attached to it. In this scenario, you can programmatically update stock quotes in XML format to reflect new daily price changes so that the user does not have to do anything. You can use events, such as the Open event of the Document object, to trigger the document to perform an action. In this scenario, when the user opens a document, you can use an add-in to retrieve updated stock prices and push them into the document's XML data store. You can use an XPath to map the elements in which these stock prices are stored to content controls in the document.
Imagine that you, as the template author, create a table that is meant to contain stock data. You then insert text controls in the cells that display stock quotes, with one quote per cell. Programmatically, each control is mapped to the appropriate element in the appropriate CustomXMLPart object. You can think of a CustomXMLPart object as a data store. By default, the new Word XML Format stores CustomXMLPart objects in a directory called datastore.
The following figure shows a content control object that is meant to contain stock data.
Figure 2. Content control that is meant to contain stock data
As the template author, you probably do not want template users to modify or delete this stock table. As explained earlier, you can easily lock each control in the table to prevent editing or deletion of the control itself. The lock does not affect whether the user can update the text in a control. Users can still update the text, as defined by the content control.
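The difference between locking a control and locking its contents can be sketched in a few lines of Python. The example assumes the markup of the released file format, where the lock is recorded as a w:lock element inside the control's properties with one of four values. The sample control and the helper name describe_lock are hypothetical, and treating a lock element without a value as unlocked is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace of the released format (assumed for this sketch).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Lock value -> (control can be deleted, content can be edited).
LOCK_MEANING = {
    "unlocked": (True, True),
    "sdtLocked": (False, True),
    "contentLocked": (True, False),
    "sdtContentLocked": (False, False),
}

# Hypothetical stock-quote control: the control is locked, its text is not.
SAMPLE_CONTROL = f"""
<w:sdt xmlns:w="{W_NS}">
  <w:sdtPr>
    <w:lock w:val="sdtLocked"/>
    <w:text/>
  </w:sdtPr>
  <w:sdtContent><w:r><w:t>41.10</w:t></w:r></w:sdtContent>
</w:sdt>
"""


def describe_lock(sdt_xml: str) -> dict:
    """Report what a user may still do with a single content control."""
    sdt = ET.fromstring(sdt_xml.strip())
    lock = sdt.find(f"{W}sdtPr/{W}lock")
    # Assumption: no lock element (or no value) is treated as unlocked.
    value = (lock.get(W + "val") if lock is not None else None) or "unlocked"
    can_delete, can_edit = LOCK_MEANING[value]
    return {
        "lock": value,
        "control_can_be_deleted": can_delete,
        "content_can_be_edited": can_edit,
    }


if __name__ == "__main__":
    print(describe_lock(SAMPLE_CONTROL))
```

For the stock table, a value such as sdtLocked matches the behavior described above: users cannot remove the control, but the text it displays can still change.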
CustomXMLPart objects are part of the 2007 release of the Office object model and, therefore, are also available in Excel 2007 and PowerPoint 2007. All of these applications in the 2007 Office release benefit from the separation of XML data from document formatting and layout. The Office XML Formats store custom XML data in document parts. Document parts help define sections of the overall contents of the file. The new file format consists of many different types of document parts, including a custom XML data part. All document content associated with an XML element maps to data in a custom XML data part. This separation of the XML from the document formatting and layout makes it easier to access data programmatically and provides a more robust document.
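Because the custom XML lives in its own part, refreshing the stock quotes described earlier amounts to an ordinary XML update. The following Python sketch shows that step in isolation. The element names (stocks, stock, price), the ticker symbols, and the function name apply_quotes are hypothetical; a real solution would use whatever schema the template author attached.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of a custom XML part that holds stock data.
SAMPLE_STOCK_PART = """<stocks>
  <stock symbol="CONT"><price>41.10</price></stock>
  <stock symbol="FABR"><price>12.75</price></stock>
</stocks>"""


def apply_quotes(part_xml: str, quotes: dict[str, str]) -> str:
    """Return the part XML with prices replaced for the given symbols.

    Symbols that do not occur in the part are ignored.
    """
    root = ET.fromstring(part_xml)
    for symbol, price in quotes.items():
        # The same kind of path a content control mapping would point at.
        node = root.find(f"./stock[@symbol='{symbol}']/price")
        if node is not None:
            node.text = price
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(apply_quotes(SAMPLE_STOCK_PART, {"CONT": "42.50"}))
```

Any content control mapped to one of the updated elements would then display the new value the next time the document is opened, without changes to the document's formatting markup.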
Document Building Blocks
Document building blocks are new in Office Word 2007. A document building block is a predesigned piece of content, such as a cover page or a header or footer. Users choose from a defined list of building blocks to insert into a document. Word 2007 includes a library of document building blocks that helps make creating professional-looking documents a quicker and easier process. What might take thirty minutes in Word 2003 is reduced to a quick click of a button using the new Insert Ribbon in Word 2007.
While document building blocks simplify the quick assembly of professional-looking documents, their greatest potential lies in the creation of custom building blocks. You are not limited to only the building blocks included in Word 2007. Consider the scenario of a company that wants a specific cover page for all of its documents. This cover page might contain the company's logo, a corporate standard font, and a text content control designed to contain the name of the project to which the document is related.
The following figure shows a cover page building block.
Figure 3. Document building blocks
To create the custom building block above, you construct the cover page, select all, and then choose to create a custom building block from the selected text. However, because you are a forward-thinking template designer, you can also prepare for possible future changes. For example, if you know that the company logo may change soon, and project names change frequently in your company, you can prepare for that. The Word 2007 solution is to use content controls for each content item and map each control to XML that contains data you can easily update. Therefore, you place a picture content control on the cover page that displays the image retrieved from an element in an attached CustomXMLPart object. Similarly, you create the project name using a text content control that you map to an element in a CustomXMLPart containing the project name. Should you need to update one of these items, you can write a few lines of code to update every document stored on the server that uses this cover page building block. You can replace the old logo with the new one. If the project name changes, you can update the text in the XML element that you mapped to the text content control containing the project name, thereby automatically updating all the documents stored on the server.
Word 2007 XML Format
The new Word 2007 XML Format takes advantage of the Open Packaging Conventions, which describe the method for packaging information in a file format and describe metadata, parts, and relationships. The Word 2007 XML Format, with a few minor exceptions, is written entirely in XML and is contained in a .zip file. This creates significant advantages over the old binary file format:
The file size is much smaller because of ZIP compression.
The file is much more robust because it is broken up into different document parts. Should one part become damaged (for example, a part describing headers), the rest of the document remains intact and still opens successfully.
The file is easier to work with programmatically because of the new structure. For example, it is easier to access embedded content, such as images, because they are stored in their native format inside the file.
Custom XML is also easier to work with because it is stored in its own part, separate from the XML that describes the bulk of a document.
The old binary file format was created when priorities in software differed from the priorities of today. Back then, the ability to transfer a Word document from computer to computer using a floppy disk ranked very high, and the tight structure of a binary format worked well. As software advanced, other priorities became clear, such as the ability to write code against a file format and make it as robust as possible. XML is a clear solution.
Microsoft began to address this issue in previous versions of Microsoft Office by introducing SpreadsheetML and WordprocessingML. However, only now, with the 2007 release of Microsoft Office, have the goals that were conceived as far back as 1999 been accomplished fully. By including the XML File Format inside a ZIP container, the benefit of a small compressed file format is also realized. Excel 2007 and PowerPoint 2007 share this new file format technology, described by the Open Packaging Conventions. Together, the shared formats are called the Microsoft Office Open XML Formats. The new Word 2007 XML Format is the default file format, although the old binary file format is still available in the 2007 Microsoft Office system.
An easy way to look inside the new file format is to save a Word 2007 document in the new default format and then rename the file with a .zip extension. By double-clicking the renamed file, you can open and look at its contents. Inside the file, you can see the document parts that make up the file, along with the relationships that describe how the parts interact with one another. However, it is important to note that, with a few exceptions defined within the Open Packaging Conventions, the actual file directory structure is arbitrary. The relationships of the files within the package, not the file structure, are what determine file validity. You can rearrange and rename the parts of a Word 2007 file inside its .zip container if you update the relationships properly so that the document parts continue to relate to one another as designed. If the relationships are accurate, the file opens without error. The initial file structure in a Word 2007 file is simply the default structure created by Word. This default structure enables developers to determine the composition of Word 2007 files easily.
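The point that relationships, not folder names, locate the parts can be sketched in Python. The example below builds a tiny in-memory package whose main document part sits at a deliberately nonstandard path and then finds it by following the package-level relationship. The relationship namespace and type URIs are those of the released format. The sample package is hypothetical and incomplete (it omits, for example, the content types part), so it illustrates the lookup only and is not a file that Word would open.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# URIs from the released Open Packaging Conventions and Office formats.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)


def build_sample_package() -> bytes:
    """Create a hypothetical package with the main part at an unusual path."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(
            "_rels/.rels",
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" '
            f'Target="content/main.xml"/>'
            f"</Relationships>",
        )
        package.writestr("content/main.xml", "<document/>")
    return buffer.getvalue()


def find_main_part(package_bytes: bytes) -> str:
    """Locate the main document part through the package relationships."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
        for rel in rels.iter(f"{{{REL_NS}}}Relationship"):
            if rel.get("Type") == OFFICE_DOCUMENT:
                # Targets may be written relative to the root or with "/".
                return posixpath.normpath(rel.get("Target").lstrip("/"))
    raise LookupError("no main document relationship found")


if __name__ == "__main__":
    print(find_main_part(build_sample_package()))
```

Code written this way keeps working if the parts of a package are renamed or moved, as long as the relationships are updated to match.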
The following figure displays the contents of a sample project report document in a ZIP file.
Figure 4. Contents of a sample document in a ZIP file
The easiest way to modify a Word 2007 XML file programmatically is to use the classes in the System.IO.Packaging namespace. Using this technology, you can easily update header and footer files programmatically across numerous Word 2007 documents stored on a server.
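System.IO.Packaging is a .NET API. As a language-neutral illustration of the same idea, the following Python sketch uses the standard zipfile module as a stand-in and swaps a single part inside a package while copying every other part unchanged. The part name word/header1.xml and the function name replace_part are hypothetical. A real solution should resolve the header part through the package relationships rather than assume its path.

```python
import io
import zipfile


def replace_part(package_bytes: bytes, part_name: str, new_content: bytes) -> bytes:
    """Return a copy of the package with one part replaced.

    A ZIP archive cannot be edited in place with zipfile, so every entry
    is copied into a new archive and only the requested part is swapped.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source:
        if part_name not in source.namelist():
            raise KeyError(f"part not found: {part_name}")
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
            for name in source.namelist():
                data = new_content if name == part_name else source.read(name)
                target.writestr(name, data)
    return output.getvalue()


if __name__ == "__main__":
    # Hypothetical two-part package used only to exercise the function.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as package:
        package.writestr("word/header1.xml", "<hdr>old</hdr>")
        package.writestr("word/document.xml", "<document/>")
    updated = replace_part(sample.getvalue(), "word/header1.xml", b"<hdr>new</hdr>")
    with zipfile.ZipFile(io.BytesIO(updated)) as package:
        print(package.read("word/header1.xml"))
```

Running the same function over each file in a folder would update the header across many documents, which is the server scenario described above.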
Other New Features
From a developer's perspective, the most exciting features in Word 2007 are the new file format, content controls, and XML mapping, all of which are reviewed in this article. The following section briefly describes other new items in Word 2007.
Word 2007 Ribbons
Word 2007 uses the new Fluent UI for all of its commands. Each previous version of Word added commands to the menus and toolbars. Often, Word nested a valuable tool so far under menus that few users discovered it. The Fluent UI makes commands easier to find, thereby making the Word user experience more productive. Default Ribbons in Word 2007 include Home, Insert, Page Layout, References, Mailings, Review, View, and Developer. By default, the Developer Ribbon is not displayed.
To show the Developer tab
1. Click the Microsoft Office Button, and then click Word Options.
2. In the Top options for working with Word section of the Word Options dialog box, click Personalize.
3. Select Show Developer tab in the Ribbon, as shown in Figure 5.
Figure 5. The Word Options dialog box
When you are in developer mode, you see the Developer tab in the Ribbon.
Each Ribbon organizes commands into logical groupings. Similar to the way that you programmatically created additional toolbar items in Word 2003, in Word 2007 you can create custom Ribbons for commands by defining them in an XML file contained in your solution.
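As an illustration of what such an XML definition can look like, the following Python sketch generates a small customization file with one custom tab, one group, and one button. It assumes the customUI markup and namespace used by the released 2007 Office system. The tab, group, and button identifiers, the labels, and the callback name OnRefreshQuotes are hypothetical, and the callback would still have to be implemented in your solution's code.

```python
import xml.etree.ElementTree as ET

# Ribbon customization namespace of the released 2007 Office system.
CUSTOM_UI_NS = "http://schemas.microsoft.com/office/2006/01/customui"


def build_custom_tab(tab_id: str, tab_label: str, group_label: str,
                     buttons: list[tuple[str, str, str]]) -> str:
    """Return customUI markup for one tab that holds one group of buttons.

    Each button is given as (id, label, callback name).
    """
    ET.register_namespace("", CUSTOM_UI_NS)  # emit a default xmlns

    def q(name: str) -> str:
        return f"{{{CUSTOM_UI_NS}}}{name}"

    root = ET.Element(q("customUI"))
    ribbon = ET.SubElement(root, q("ribbon"))
    tabs = ET.SubElement(ribbon, q("tabs"))
    tab = ET.SubElement(tabs, q("tab"), {"id": tab_id, "label": tab_label})
    group = ET.SubElement(
        tab, q("group"), {"id": tab_id + "Group", "label": group_label}
    )
    for button_id, label, callback in buttons:
        ET.SubElement(
            group, q("button"),
            {"id": button_id, "label": label, "onAction": callback},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_custom_tab(
        "tabReports", "Reports", "Stock Data",
        [("btnRefresh", "Refresh Quotes", "OnRefreshQuotes")],
    ))
```

Generating the markup from code is a design choice for this sketch only. A hand-written XML file with the same elements serves equally well.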
The following figure shows the Developer Ribbon.
Figure 6. Developer tab in the Ribbon
Citations and Bibliography
On the References Ribbon, the Citations & Bibliography grouping allows you to easily construct a bibliography and insert citations into your document. New members in the Word 2007 object model, such as the Bibliography object in both the Application object and the Document object, allow you to work with bibliographies and citations programmatically.
The following figure shows a partial object model diagram of the classes in Word 2007.
Figure 7. Partial Word 2007 object model
Equations
New in Word 2007 is functionality that enables you to create professional-looking mathematical equations. On the Insert tab, in the Symbols group, there is an Equations command that allows users to insert complex and advanced mathematical symbols, functions, and equations. The Word 2007 object model also contains new items to support equations programmatically.
Conclusion
The exciting additions to Office Word 2007, such as XML mapping, content controls, and the new Word XML Format, all combine to help developers and template designers. With the new default file format in XML, developers have unprecedented access to Word files. Template designers can create robust and rich templates in a manner that is much easier and more intuitive than in previous versions of Word. For example, in Word 2003, protecting only portions of a template required detailed work. The process was not intuitive. In Word 2007, content controls simplify this process. In Word 2003, including custom XML required the use of XML elements on the document's surface. This made the document fragile because an end user unfamiliar with XML could accidentally break the document by deleting an element. Custom XML parts and XML mapping prevent this scenario. Also, in Word 2007, documents can use events to update their content intelligently from an XML source with no interaction on the part of the user.
And, while the development and template design features are attractive, Word 2007 also includes other, equally exciting features, such as the new Ribbon and improvements in other areas.
Acknowledgments
I want to thank Mark Iverson and Tristan Davis for their contributions to this article.
Other Resources
Introducing the Microsoft Office (2007) Open XML File Formats
Specifications and License Downloads
Blog: Brian Jones: Office Extensibility Using Open XML Formats
Channel 9 Video: Office 12 – Word-to-PDF File Translation
Channel 9 Video: Open XML File Formats
Channel 9 Video: Brian Jones - New Office File Formats Announced