Developing Word XML Using the Microsoft Office Word 2003 XML Toolbox
John R. Durant
Microsoft® Office Word 2003
Summary: Microsoft Office Word 2003 can support documents with XML schemas, allowing you to save, manipulate, and even create documents as well-formed XML documents. To make it easier to exploit these capabilities, you can download a free toolbox that assists you in working with the new XML features of Word. This article introduces the Microsoft Office Word 2003 XML Toolbox and explains its benefits for developers. The Word XML Toolbox is available as a download from the Microsoft Download Center. (16 printed pages)
Word XML Scenario
Inserting XML into the Document
Monitoring XML Events
Familiarizing Yourself with Dialog and Task Pane Shortcuts
Using Document Protection
Applying Cool Colors
With XML in Microsoft® Office Word 2003, you can create custom tags that enable users to organize and work with documents and data in innovative ways. For example, a special document used in the hiring process, while filled in by a user in the normal way, can offer up its contents as discrete, individually accessible data elements. You can send the information off to a database or integrate it with some other external system. At the heart of the XML features of Word is the ability to liberate user content from the confines of the document so it can be repurposed in ways that are meaningful to a specific organization. You can associate XML schemas, a key part of XML support in Word, with a document so that its contents are clearly organized and even validated according to the data definition contained in the schema.
Note XML features, except for saving documents as XML with the Word XML schema, are available only in Microsoft Office Professional Edition 2003 and stand-alone Microsoft Office Word 2003.
All of these features are great, but the Word user interface (menus, shortcuts, and so on) is designed for end users, not for developers building Word-based solutions. As a result, you may find working through the dialog boxes a little slow going. Fortunately, you can download and install the XML Toolbox for Microsoft Office Word 2003. It's free, and its sole purpose is to make life easier for developers. Go to the Office Developer Center Tools and Utilities to find the link for this and other key tools that can help you develop with Office. After the toolbox is installed, the Word XML Toolbox toolbar appears (Figure 1). To hide or show the toolbar, on the View menu, point to Toolbars, and then click Word XML Toolbox.
Figure 1. The Word XML Toolbox toolbar with button labels
Remember that the toolbox works only with Microsoft Office Word 2003. In addition, the toolbox makes heavy use of the Microsoft .NET Framework and requires that .NET Programmability Support be installed on your development computer before you install or run the toolbox.
Note For more information about installing .NET Programmability Support, see Installing the Office Primary Interop Assemblies.
To help present the benefits of the XML toolbox, I list and describe a series of developer scenarios and then show how the Word XML Toolbox makes each job easier. Some of the scenarios are admittedly a bit contrived, because they are necessarily simplified versions of the larger problems we developers really face. Nevertheless, I believe the benefits of the toolbox come through. A few of the scenarios are not elaborate at all; they show how the toolbox solves a basic problem, such as too many clicks to reach a relevant dialog box.
The boss did some reading, finally understands how great the XML support in Word is, and wants to start reaping some benefits. You work for a small newspaper, and your task is to create a template that allows writers to continue writing their content in Word while ensuring that the content conforms to a schema. The schema increases the accuracy and completeness of the content, and it allows you to extract the content once it is finalized. Furthermore, the schema can even allow an automated system to pull content blocks from a database and automatically create the sketch of an article in Word, which the writer can then refine and amend.
Dutifully, you open Word, create a blank document, and then stare at the blinking cursor. What is the next step? Your task boils down to creating an XML schema that reflects the needs of your organization, and then creating a Word document with XML tags in it that correspond to the XML schema. You could fire up an XML editor (everything from Notepad to Microsoft Visual Studio® .NET 2003 or a third-party tool) and start creating the .xsd file from scratch. It's not a bad approach, and for elaborate schemas, a good XML editor is the right tool for the job. However, using the XML Toolbox, you can begin laying down some XML tags and deriving a schema right inside of Word.
Adding XML Tags
To get started, type some XML tags in a document, arranging them in the order you want. Figure 2 shows the document and one way to order the XML tags.
Figure 2. Typing XML tags directly in a document
At this point, the XML tags may look like XML to you, but Word treats them as regular text. Until they are converted, Word does not recognize them as XML elements in a structured document; they are just ad hoc textual input. To convert the text to true XML, on the XML Toolbox menu, click Convert <Tags/> to XML Nodes, as shown in Figure 3.
Figure 3. Convert ad hoc text into true XML nodes
After the conversion, what was previously identified as text appears as XML recognized by Word. The result is shown in Figure 4.
Figure 4. The result of converting to XML nodes
At this point, the project is shaping up nicely. You now have a document with XML tags in it, named and arranged to reflect the business requirements for article authoring. However, some changes are needed. In particular, the last three tags really ought to be part of the <body> node. That way, a conclusion cannot be written outside the body, and all of the text for the article can be accessed as a complete unit rather than continually reassembled from three smaller units. To make this change, select the last three nodes and drag them into the <body> node.
Creating the Schema
While arranging XML nodes is a large part of the overall task, the document is still not associated with a schema. This scenario requires an explicit schema that other documents can use, and that even a completely different system can use to generate documents automatically from external content sources. The Word XML Toolbox includes a feature that generates an inferred schema for you.
First, save the document. Because you are trying to create a template for later use, save it as a document template. Then, on the Word XML Toolbox toolbar, click Generate Inferred Schema. The toolbox displays a dialog box (Figure 5) in which you specify the root element of the schema, its namespace, and the path of the file to create.
Figure 5. The toolbox lets you infer a schema and save it as a separate file
Once you create the file, the toolbox provides a dialog box informing you that the process is complete, and it generates a new document with the schema already associated. In this example, the schema is as follows:
<?xml version="1.0" encoding="utf-16"?>
<!--Schema generated by the Word XML Toolbox for Microsoft Office Word 2003-->
<xs:schema xmlns:tns="schemas-customArticleSchema" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="schemas-customArticleSchema" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="article">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="title" type="xs:string" />
        <xs:element name="author" type="xs:string" />
        <xs:element name="date" type="xs:string" />
        <xs:element name="body">
          <xs:complexType mixed="true">
            <xs:sequence>
              <xs:element name="introduction" type="xs:string" />
              <xs:element name="maintext" type="xs:string" />
              <xs:element name="conclusion" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
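The toolbox does not document how it infers a schema, so the following Python sketch is only an illustration under simple, assumed rules: every element that has children becomes a mixed complex type whose children are declared in an xs:sequence in document order, and every leaf becomes an xs:string element. Those rules happen to reproduce the shape of the schema above from the article tags, but repeated elements, attributes, and optional content are not handled. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize the schema namespace with the xs prefix


def infer_element(node: ET.Element) -> ET.Element:
    """Infer an xs:element declaration from one sample element (assumed rules)."""
    decl = ET.Element(f"{{{XS}}}element", {"name": node.tag})
    children = list(node)
    if not children:
        # Leaf nodes become simple string elements.
        decl.set("type", "xs:string")
        return decl
    # Nodes with children become mixed complex types whose children
    # are declared as a sequence, in document order.
    complex_type = ET.SubElement(decl, f"{{{XS}}}complexType", {"mixed": "true"})
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for child in children:
        sequence.append(infer_element(child))
    return decl


def infer_schema(sample_xml: str, target_namespace: str) -> str:
    """Return an XSD string inferred from a sample document that uses no namespaces."""
    root = ET.fromstring(sample_xml)
    schema = ET.Element(
        f"{{{XS}}}schema",
        {
            "attributeFormDefault": "unqualified",
            "elementFormDefault": "qualified",
            "targetNamespace": target_namespace,
        },
    )
    schema.append(infer_element(root))
    return ET.tostring(schema, encoding="unicode")


# The article tags after the last three nodes were dragged into <body>.
ARTICLE_TAGS = (
    "<article><title/><author/><date/>"
    "<body><introduction/><maintext/><conclusion/></body></article>"
)

if __name__ == "__main__":
    print(infer_schema(ARTICLE_TAGS, "schemas-customArticleSchema"))
```

Because each declaration is derived from a single sample, the inferred schema is only as good as the sample document; for anything elaborate, a real XML editor remains the better tool.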
The Word XML Toolbox also adds the schema to the Word schema library after inferring the schema and saving it in a separate file.
Note The schema library allows you to specify the namespaces that are available for your document or solution. You can also configure options, such as setting friendly names (aliases) for schemas and specifying XSLT files associated with a schema. Schemas in the schema library are available to attach to a document and are listed on the XML Schema tab of the Templates and Add-ins dialog box.
Once the schema appears in the schema library, you can add it to the template solution. Make sure the template document is active, and then, on the Tools menu, click Templates and Add-ins to see the list of available schemas on the XML Schema tab. Select the check box next to the schema you just created, and in the Schema validation options section, select the Validate document against attached schemas check box. This attaches the schema to the template and ensures that documents based on the template are validated against the schema. Save the changes to the template.
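Word performs the real validation once the schema is attached. To make the idea concrete, here is a deliberately simplified Python sketch, not Word's validator, that checks only one rule from the article schema: each element must contain exactly the child elements its xs:sequence declares, in that order. It ignores text, attributes, types, and occurrence constraints, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The article schema shown earlier, without the XML declaration and comment.
ARTICLE_SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="schemas-customArticleSchema" elementFormDefault="qualified">
  <xs:element name="article">
    <xs:complexType mixed="true"><xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
      <xs:element name="date" type="xs:string"/>
      <xs:element name="body">
        <xs:complexType mixed="true"><xs:sequence>
          <xs:element name="introduction" type="xs:string"/>
          <xs:element name="maintext" type="xs:string"/>
          <xs:element name="conclusion" type="xs:string"/>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>
"""


def declared_children(decl):
    """Child xs:element declarations of an element's xs:sequence, in order."""
    sequence = decl.find(f"{XS}complexType/{XS}sequence")
    return [] if sequence is None else sequence.findall(f"{XS}element")


def check(node, decl, namespace, errors, path=""):
    """Compare one instance element against its declaration, recursively."""
    path = f"{path}/{decl.get('name')}"
    expected_tag = f"{{{namespace}}}{decl.get('name')}"
    if node.tag != expected_tag:
        errors.append(f"{path}: expected {expected_tag}, found {node.tag}")
        return
    children, decls = list(node), declared_children(decl)
    if len(children) != len(decls):
        errors.append(f"{path}: expected {len(decls)} child elements, found {len(children)}")
    for child, child_decl in zip(children, decls):
        check(child, child_decl, namespace, errors, path)


def validate(schema_xml, instance_xml):
    """Return a list of structural errors; an empty list means the check passed."""
    schema = ET.fromstring(schema_xml)
    errors = []
    check(ET.fromstring(instance_xml), schema.find(f"{XS}element"),
          schema.get("targetNamespace"), errors)
    return errors
```

With this check, an article whose <conclusion> sits outside <body> is reported at both levels: <article> has one child too many and <body> has one too few, which is exactly the situation the earlier drag-and-drop step prevents.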
Note The solution created in this article saves the template and the schema on the local computer. To make them available to others, you can place them on a publicly available share or SharePoint site.
Adding Placeholder Text
The solution now has some of the fundamental pieces in place. However, it is not very nice to look at and not very user-friendly. Keep in mind that only in rare cases should a user actually see the XML tags in the document or know much about the underlying schema. Typically, you designate placeholder text for each of the XML tags in the document. Thus, the user sees the placeholder text and knows the information to add at appropriate locations in the document. The Word XML Toolbox includes a tool to change XML attributes for the tags in the document and add placeholder text quickly and easily. To access this tool, on the toolbox toolbar click the XmlNode Property Viewer (Figure 6).
Figure 6. Accessing the XmlNode Property Viewer
The viewer lets you move through the nodes in the document and view or change their attributes. Figure 7 shows a highlighted XML node and its properties in the viewer. Type the placeholder text as shown, and then click Next Node >> to move to the next node in the document. When you finish making changes, close the viewer.
Figure 7. Accessing XmlNode Property Viewer
It's now time to see what a user would see when viewing the document without the XML tags showing. Once again, the Word XML Toolbox includes a feature that makes it easy to switch between viewing the document with and without the tags displayed. Click Toggle XML Tag View (Figure 8).
Figure 8. Accessing the XML Tag View
The document with placeholder text showing should appear as follows:
Figure 9. A document with the XML tags hidden.
You can now format sections of the document with styles and generally set up the document template so that it simplifies the user experience.
Word added XML support to the Range object, one of the most loyal objects in the Word development realm. This is an example of where the new Word XML support blends rather cleverly with the classic Word object model: you can go after portions (or all) of a document by using the traditional Range object and then work with it as XML. The Range object now supports a method called InsertXML that lets you insert arbitrary XML directly into the document. However, there is no user interface for adding XML this way; end users type text, which is what Word is for, after all.
Developers, however, are a different sort. When developing an XML-based Word solution, you may want to add XML to the document directly, not just text. When would this be useful? Consider the solution created so far. Imagine a schema whose validation rules flag invalid text. Imagine, too, that the solution includes a smart tag that processes textual input and manipulates the content as XML. One way to test these pieces is to insert some XML in the document and see how the solution responds. The Word XML Toolbox comes with a feature that lets you insert XML into a document conveniently. Click Insert XML Dialog (Figure 10) to open the Insert XML Editor (Figure 11).
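Range.InsertXML takes the XML to insert as a string. As an illustration, the following Python sketch builds a minimal WordML document containing one paragraph with optional bold runs, the kind of string you might pass to InsertXML from VBA or .NET code (the call itself is not shown). The helper name is hypothetical, and the sketch assumes that the w:wordDocument, w:body, w:p, w:r, w:rPr, w:b, and w:t elements are enough for a simple insert; check the WordML reference before relying on that.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)  # emit the WordML namespace with the usual w prefix


def wordml_paragraph(runs):
    """Build a one-paragraph WordML document.

    runs: sequence of (text, bold) pairs; each pair becomes one w:r run.
    """
    doc = ET.Element(f"{{{W}}}wordDocument")
    body = ET.SubElement(doc, f"{{{W}}}body")
    paragraph = ET.SubElement(body, f"{{{W}}}p")
    for text, bold in runs:
        run = ET.SubElement(paragraph, f"{{{W}}}r")
        if bold:
            # Run properties come before the text inside a run.
            props = ET.SubElement(run, f"{{{W}}}rPr")
            ET.SubElement(props, f"{{{W}}}b")
        ET.SubElement(run, f"{{{W}}}t").text = text
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(wordml_paragraph([("Breaking news:", True), ("Word speaks XML.", False)]))
```

Generating the string with an XML library rather than by concatenation guarantees that it is at least well-formed and that the w prefix is declared.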
Figure 10. Accessing the Insert XML Dialog
Figure 11. Adding arbitrary XML with resulting insert
There are only a few tags you need to worry about when adding XML this way, and adding bold text is just a hint of how far you can take things. The Insert XML Editor is a .NET Framework-based form that lets you type free-form character strings. It is not sophisticated as editors go (again, that is what applications such as Word are for), but it does have a feature that lets you undo your insert if it is not what you wanted. One caveat is that Word provides limited information about why the InsertXML method failed. Thus, if your XML is not well-formed (the most common problem) or some other error occurs, you need to do your own investigation to figure out what went wrong. Another common error occurs when the inserted XML does not conform to the WordprocessingML (or WordML for short) schemas.
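Because Word says little about why InsertXML failed, it can help to rule out the most common cause before calling it. The following Python sketch, an illustrative helper rather than part of the toolbox, parses the string with the standard library and reports where the parser first found a well-formedness error, such as a mismatched tag or a w: prefix used without its namespace declaration. It does not check conformance to the WordML schemas.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else a short error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the underlying parser
        return f"line {line}, column {column}: {err}"
    return None


if __name__ == "__main__":
    # The w: prefix is used here without an xmlns:w declaration.
    print(well_formedness_error("<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"))
```

A string that passes this check can still be rejected by Word if it does not match the WordML schemas, but at least the error is narrowed down.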
One of the single greatest things about the new XML support in Word is that there are events that fire when XML-related things happen. For example, if an XML node is added that is in the wrong place, an event fires. When a user moves from node to node, events fire. When textual input occurs, events fire. This is overwhelmingly interesting to us as developers because our curious nature always causes us to want to know what is going on when applications are running. However, it also leads ultimately to a better user experience. Trapping these events can be useful when validating input and letting users know what changes need to be made so that content is more accurate and useful.
The following two events pertain to the Document object:
- The XMLAfterInsert event fires just after a new element has been inserted
- The XMLBeforeDelete event fires just before an element is deleted
These events pertain to the Application object:
- XMLValidationError: Fires when a validation error occurs in the document and receives a single parameter: a reference to the node with the error
- XMLSelectionChange: Fires when the selection moves to a different XML node. This event receives four parameters: the Selection object for the newly selected content, references to both the node that lost focus and the node that gained focus, and a reason code
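To keep the four events and their parameters straight, the following Python sketch models them as methods on a hypothetical log object, roughly the way an event monitor might record them as rows. It does not connect to Word: in a real solution these would be handlers wired to the Document and Application events through the Word object model, and the nodes would be XMLNode objects rather than the element names used here. The Reason values are stand-ins for the selection-change reason codes, not Word's actual values, and the Selection parameter is omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Reason(Enum):
    """Stand-ins for the reason code XMLSelectionChange receives (values are not Word's)."""
    MOVE = "move"
    INSERT = "insert"
    DELETE = "delete"


@dataclass
class EventRow:
    event: str
    detail: str
    time: datetime = field(default_factory=datetime.now)


class XmlEventLog:
    """Hypothetical monitor: each handler appends one row describing the event."""

    def __init__(self):
        self.rows = []

    def _record(self, event, detail):
        self.rows.append(EventRow(event, detail))

    # Document events
    def xml_after_insert(self, new_node):
        self._record("XMLAfterInsert", f"inserted <{new_node}>")

    def xml_before_delete(self, old_node):
        self._record("XMLBeforeDelete", f"about to delete <{old_node}>")

    # Application events
    def xml_validation_error(self, node):
        self._record("XMLValidationError", f"<{node}> is invalid")

    def xml_selection_change(self, old_node, new_node, reason):
        self._record("XMLSelectionChange", f"<{old_node}> -> <{new_node}> ({reason.value})")


if __name__ == "__main__":
    log = XmlEventLog()
    log.xml_after_insert("date")
    log.xml_selection_change("title", "author", Reason.MOVE)
    for row in log.rows:
        print(row.time.strftime("%H:%M:%S"), row.event, row.detail)
```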
When testing, it would be nice to have a type of "watch" window that lets you know what events are firing, what is causing them to fire, and what messages are pushed up the queue. That is why the toolbox includes the XML Event Monitor, accessible from the toolbar.
Figure 12. Accessing the XML Event Monitor
This viewer displays a window with a current snapshot of the XML-related events firing in Word. No longer do you need to stub out each event and put in a message box to peek at what is going on under the hood. The XML Event Viewer simplifies things and displays the messages in a tidy grid. Figure 13 shows the viewer with the text of one entry magnified to be more easily seen.
Figure 13. XML events with a single event drawn out
The XML Event Viewer can also be helpful when debugging a solution. A quick look at the output may provide insight about where your solution is failing.
The ability to add XML is important to developers, but so is the ability to view it. It would be inconvenient to save your document and open an XML editor every time you want to see the XML output of your solution. The Word XML Toolbox lets you see the XML representation of your document at any time. Moreover, you have three different ways of seeing the output:
- The entire document in WordML (Entire Document (WordML) on the toolbar)
- A selected portion of the document in WordML (Current Selection (WordML) on the toolbar)
- A selected portion of the document such that only the data are displayed in unadorned XML (Current Selection (Data Only) on the toolbar)
Because this article was written in Word 2003, let us look at this very sentence, viewing it in the last two ways described. All three views are accessible from the XML Toolbox menu: click XML Toolbox, point to View XML, and then click the desired view. Choosing to view the selection as WordML opens the XML Viewer, a simple form that displays the XML as text, with buttons to copy the text to the Clipboard or save it as a file. A WordML view of a selection includes a lot of style-related information along with the actual data, so the output is verbose. Here is the last portion of the WordML output that contains our sentence:
<w:body>
  <wx:sect>
    <w:p>
      <w:r>
        <w:t>Because this article was written in Word 2003, let us look at this very sentence, viewing it in the last two ways described.</w:t>
      </w:r>
    </w:p>
    <w:sectPr>
      <w:pgSz w:w="12240" w:h="15840" />
      <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
      <w:cols w:space="720" />
    </w:sectPr>
  </wx:sect>
</w:body>
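In this output, the sentence itself lives in a w:t element inside a w:r run inside a w:p paragraph; everything in w:sectPr is page layout. The following Python sketch recovers the plain text of each paragraph from such a fragment. Because the excerpt starts at w:body, the sample adds the namespace declarations that the full document would carry on its root element, and it omits w:pgMar for brevity; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
WX = "http://schemas.microsoft.com/office/word/2003/auxHint"


def paragraph_texts(wordml):
    """Return the concatenated w:t text of every w:p paragraph, in document order."""
    root = ET.fromstring(wordml)
    return ["".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            for p in root.iter(f"{{{W}}}p")]


SAMPLE = f"""<w:body xmlns:w="{W}" xmlns:wx="{WX}">
  <wx:sect>
    <w:p><w:r><w:t>Because this article was written in Word 2003, let us look at this very sentence, viewing it in the last two ways described.</w:t></w:r></w:p>
    <w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:cols w:space="720"/></w:sectPr>
  </wx:sect>
</w:body>"""

if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))
```

Joining all w:t elements of a paragraph matters because Word often splits a single sentence across several runs when formatting changes mid-sentence.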
Viewing just the data of the same selection is shorter because it does not contain the style-related information. However, there is one key difference in using these two views: viewing just the data requires that the sentence reside in an XML node (from your custom-defined schema) in the Word document. Copying the sentence into a node (in this case a node called "data") and then viewing the data yields the following output:
<?xml version="1.0" encoding="utf-16" standalone="no"?>
<data>Because this article was written in Word 2003, let us look at this very sentence, viewing it in the last two ways described.</data>
In all cases, you avoid unnecessary steps to view the document in XML.
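One way to picture the data-only view is as a filter that keeps the elements from your own schema and drops the WordML markup around and inside them, keeping only their text. The following Python sketch does that under simplifying assumptions: custom elements are recognized simply as elements outside the two WordML namespaces listed, leaf text is taken from their w:t descendants, attributes are discarded, and the output uses local names without a namespace, like the "data" output above. It is an illustration, not how the toolbox implements the view, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
WX = "http://schemas.microsoft.com/office/word/2003/auxHint"
WORDML_NAMESPACES = {W, WX}


def split_tag(tag):
    """Split '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def custom_children(element):
    """Yield the nearest custom-schema descendants, looking through WordML wrappers."""
    for child in element:
        if split_tag(child.tag)[0] in WORDML_NAMESPACES:
            yield from custom_children(child)
        else:
            yield child


def data_only(element):
    """Rebuild a custom element with only custom children and plain text."""
    out = ET.Element(split_tag(element.tag)[1])
    kids = list(custom_children(element))
    if kids:
        for kid in kids:
            out.append(data_only(kid))
    else:
        out.text = "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))
    return out


def data_view(wordml):
    """Return the data-only XML for each top-level custom element in a WordML string."""
    root = ET.fromstring(wordml)
    return [ET.tostring(data_only(e), encoding="unicode") for e in custom_children(root)]


SAMPLE = f"""<w:wordDocument xmlns:w="{W}" xmlns:ns0="schemas-customArticleSchema">
  <w:body>
    <ns0:data>
      <w:p><w:r><w:t>Because this article was written in Word 2003, let us look at this very sentence, viewing it in the last two ways described.</w:t></w:r></w:p>
    </ns0:data>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    print(data_view(SAMPLE))
```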
The XML Toolbox menu includes a handful of shortcuts that save you the trouble of clicking through deeply nested menus in Word. The fact that they are deeply nested is not a big obstacle for end-users because they rarely use the dialog boxes. There are four key shortcuts on the XML Toolbox menu:
- Templates Expansion Pack Dialog
- Templates XML Schema Dialog
- Schema Library Dialog
- XML Options Dialog
For more information about each of these dialog boxes, as well as their purpose and configuration options, see the Word help documentation.
The Word XML Toolbox toolbar includes two buttons that also make it easier to get to key features. The first is the XML Structure Task Pane button (Figure 14).
Figure 14. Accessing the XML Structure task pane
Clicking it displays the XML Structure task pane. The second is the XML Schema in Templates button (Figure 15), which has the same function as the Templates XML Schema Dialog menu item listed previously.
Figure 15. Accessing the XML Schema dialog box
In Word 2003, you can also apply protective restrictions to sections of a document. For a specified section, you can grant specific users permission to edit it and specify the type of editing to allow. For more information, see the Word help documentation.
Note Even though you can see and alter the document protection settings for a document at the WordML-level, Word actually sets permissions at the range level.
Once again, the XML support in Word does not disappoint: WordML provides a mechanism for expressing these restrictions directly. However, Word itself offers no way to work with them at the XML level. The Word XML Toolbox provides some of this capability through two menu items: one grants permission to the Everyone security group for all nodes in a document, and the other replicates the custom permissions of one node to all other nodes in the document.
Figure 16 shows the document with permissions configured.
Figure 16. The outlined document section has custom permissions
The entire document is restricted so that users can only add comments. However, in the section outlined in red, a specified user (in this case, me) can freely edit the text. The Protect Document task pane shows the corresponding settings.
Using the Word XML Toolbox, you can replicate the protection settings for one XML node of a document to other sections using XML. The Duplicate Current Node to all Other Nodes menu item duplicates the protection settings of the currently selected XML node to every other leaf node (a node with no children) in the document. When using this feature, the toolbox prompts you to confirm that you want to change the permissions in this way (Figure 17).
Figure 17. The toolbox asks you to confirm when replicating permissions
Click OK to allow the toolbox to change the permissions throughout the document.
The toolbox has another item, Set All Nodes to Everyone Permission, which grants the Everyone security group the permission to edit every leaf node (a node with no children).
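Both menu items operate on leaf nodes. The following Python sketch models that logic on a simple in-memory tree; it does not read or write WordML, and the Node class, the function names, and the editor name are hypothetical. Each node carries the set of users or groups allowed to edit it. One function copies a chosen node's set to every other leaf, and the other adds the Everyone group to every leaf (the sketch assumes "grant" means adding Everyone rather than replacing existing permissions).

```python
from dataclasses import dataclass, field

EVERYONE = "Everyone"


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    editors: set = field(default_factory=set)  # users or groups allowed to edit


def leaf_nodes(node):
    """Yield every node with no children, in document order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaf_nodes(child)


def duplicate_to_other_leaves(root, source):
    """Copy source's editors to every other leaf node (each leaf gets its own copy)."""
    for leaf in leaf_nodes(root):
        if leaf is not source:
            leaf.editors = set(source.editors)


def grant_everyone(root):
    """Add the Everyone group to the editors of every leaf node."""
    for leaf in leaf_nodes(root):
        leaf.editors.add(EVERYONE)


if __name__ == "__main__":
    title = Node("title", editors={"editor@example.com"})
    body = Node("body", [Node("introduction"), Node("maintext"), Node("conclusion")])
    article = Node("article", [title, Node("author"), Node("date"), body])
    duplicate_to_other_leaves(article, title)
    print({leaf.name: sorted(leaf.editors) for leaf in leaf_nodes(article)})
```

Restricting both operations to leaves mirrors the description above: container nodes such as <body> keep their own settings, while every node that actually holds text receives the new permissions.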
The final, fun feature of the toolbox is the menu item, Choose XML Tag Color. By default, Word uses a pinkish color for the XML tags in the document. The color for the XML tags is the same as that for the smart tags in the document. You can change the color of the XML tags, and the toolbox makes it easy to do so. You can choose among black, blue, green, red, yellow, and the default color. You must restart Word for the change to take effect. While it is not a dramatically helpful feature, it does add a little life to the developer experience.
Working with XML in Word 2003 means many new opportunities for developing with Office. Most users will never know exactly what is going on under the hood of the custom solutions you build. The truth is, if you develop in the right way, they should have no reason to know. To make it easier to develop with Word and XML, you can download and start using the Word XML Toolbox, which is designed just for developers. Get started developing your XML-based solution, and use the toolbox to get the job done with greater speed and simplicity.
The following are key downloads and articles that you ought to know about when working with XML in Word: