Use XPath Explorer to Unlock Data in Word 2007 XML Files
Summary: XPath Explorer is a new tool that you can use to view the XML element hierarchy in any XML file, including Word 2003 and Word 2007 XML files. (6 printed pages)
Bill Coan, Microsoft Word MVP
Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007
With the release of Microsoft Office 2003 in October 2003, Microsoft placed a big bet on the future of XML. Three years later, XML is more popular than ever, and Microsoft has now released the 2007 Microsoft Office System, the most XML-friendly version of Office yet.
As Office XML files proliferate, the volume of data inside those files grows larger by the day. Fortunately, data tagged with XML markup does not need to remain locked inside the files where the data is stored. Instead, with the help of the XML Path Language (more commonly known as XPath), you can extract the data in XML files with the speed and precision of a robotic arm.
If this strikes you as something that only a crazed data-holic or an IT specialist would find exciting, wait until you see how easy it is to navigate an XML hierarchy using XPath Explorer.
Keep in mind that for as long as you have been using Microsoft Windows, you have been navigating hierarchical data structures. For example, Figure 1 shows a familiar one: a set of nested folders on a hard drive.
If you started at the root folder of the hard drive (C:\) and had to find your way to the FIRSTNAME folder near the bottom of Figure 1, how would you get there?
I suppose you could check every folder on the hard drive until you found one called FIRSTNAME, but what if there were multiple folders called FIRSTNAME? How could you be certain that you found the folder shown at the bottom of Figure 1?
The answer is both simple and obvious. To get to the desired folder, you would start at the root folder (C:\), and inside that folder you would open the BOOKS folder, and inside that folder you would open the BOOK folder, and inside that folder you would open the AUTHOR folder, and inside that folder you would open the FIRSTNAME folder.
We sometimes refer to this as "walking the hierarchy" or "walking the path" to the desired folder. Indeed, in Windows (as in other operating systems), the full name of a folder is called its path name. The path name of the folder at the bottom of Figure 1 is C:\BOOKS\BOOK\AUTHOR\FIRSTNAME.
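If you like to think in code, the same idea can be sketched in Python. The language is used here purely as an illustration and is not part of XPath Explorer. The standard pathlib module splits the path name from Figure 1 into the individual folders you would open, in order, starting at the root of the drive.

```python
from pathlib import PureWindowsPath

# The folder at the bottom of Figure 1, written as a Windows path name.
# PureWindowsPath only parses the string; it never touches the disk.
path = PureWindowsPath(r"C:\BOOKS\BOOK\AUTHOR\FIRSTNAME")

# .parts lists the steps of the walk: the drive root, then each folder.
steps = path.parts

# Print each step indented one level deeper than the one before it.
for depth, folder in enumerate(steps):
    print("    " * depth + folder)
```

Each entry in steps is one folder to open. An XPath expression is built from the same kind of step, one element at a time.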
An XML data hierarchy closely resembles a set of nested folders on a hard drive. If you think it is useful to walk a path to a particular folder on a hard drive, just imagine being able to walk a path inside an XML file until you find the exact element or collection of elements that your boss wants on her desk this instant! That is exactly what the XPath language enables you to do.
Now consider the XML markup inside the Microsoft Word document shown in Figure 2. As you can see, the markup describes a hierarchical data structure very similar to the nested folders in Figure 1.
In this case, there is a BOOKS element (instead of a BOOKS folder) with a BOOK element nested inside it. Inside the BOOK element is a TITLE element. Inside the TITLE element is an AUTHOR element. Finally, inside the AUTHOR element are a FIRSTNAME element and a LASTNAME element.
Although you cannot tell just by looking at the document, the elements in it belong to a particular namespace that distinguishes them from similarly named elements in other namespaces. The namespace for this particular XML is "http://www.wordsite.com/books".
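One way to see a namespace at work is to let a parser show you the names it actually uses. The Python sketch below parses a hypothetical, stripped-down version of the book markup (TITLE omitted for brevity, placeholder author name) with the standard xml.etree.ElementTree module, which reports each element name in {namespace}localname form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down sample (TITLE omitted; the name is a placeholder).
# The default namespace declaration puts every element in the books namespace.
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK>
</BOOKS>"""

root = ET.fromstring(SAMPLE)

# ElementTree reports the full name: the namespace URI plus the local name.
print(root.tag)

# A BOOKS element in no namespace has the same local name but a different
# full name, so a parser treats the two as different elements.
other = ET.fromstring("<BOOKS/>")
print(other.tag)
print(root.tag == other.tag)
```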
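To walk the path to the data you are interested in, you need to specify the namespace that the data belongs to. You do this with a namespace declaration such as xmlns:x="http://www.wordsite.com/books", which maps the short prefix x to the full namespace, so that you can write x:BOOK instead of spelling out the namespace for every element.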
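Python's xml.etree.ElementTree module handles prefixes in a similar way: you pass a dictionary that maps each prefix to its namespace, which plays the role of the xmlns:x declaration. The sketch below, again using a hypothetical stripped-down sample, shows that an unprefixed name finds nothing, while the x: prefix finds the BOOK element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down sample (TITLE omitted; the name is a placeholder).
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK>
</BOOKS>"""

root = ET.fromstring(SAMPLE)

# Without a namespace, "BOOK" means "BOOK in no namespace", so nothing matches.
unqualified = root.find("BOOK")

# The counterpart of xmlns:x="http://www.wordsite.com/books":
# a dictionary that maps the prefix x to the namespace URI.
ns = {"x": "http://www.wordsite.com/books"}
qualified = root.find("x:BOOK", ns)

print(unqualified)
print(qualified.tag)
```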
Then you need to specify the path that you want to walk. To specify the path that starts at the BOOKS element and then winds its way down to the BOOK element, you would use the XPath expression /x:BOOKS/x:BOOK.
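ElementTree's find methods evaluate paths relative to an element and reject absolute paths, so they cannot take /x:BOOKS/x:BOOK directly. The hypothetical walk helper below shows the idea behind such an expression by taking one step at a time, just like opening one folder after another. It handles only simple prefix:name child steps; for full XPath 1.0, a library such as lxml (whose elements provide an xpath method that accepts a namespaces argument) is a better fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book sample (TITLE omitted; names are placeholders).
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK>
  <BOOK><AUTHOR><FIRSTNAME>John</FIRSTNAME><LASTNAME>Roe</LASTNAME></AUTHOR></BOOK>
</BOOKS>"""

NS = {"x": "http://www.wordsite.com/books"}


def walk(root, path, namespaces):
    """Evaluate a simple absolute path such as /x:BOOKS/x:BOOK.

    Hypothetical helper: it understands only child steps written as
    prefix:name, not the full XPath language.
    """
    steps = path.strip("/").split("/")
    # The first step must name the document (root) element itself.
    prefix, local = steps[0].split(":")
    if root.tag != f"{{{namespaces[prefix]}}}{local}":
        return []
    matches = [root]
    # Each further step goes one level down, like opening a subfolder.
    for step in steps[1:]:
        matches = [child for node in matches for child in node.findall(step, namespaces)]
    return matches


root = ET.fromstring(SAMPLE)
books = walk(root, "/x:BOOKS/x:BOOK", NS)
first_names = [e.text for e in walk(root, "/x:BOOKS/x:BOOK/x:AUTHOR/x:FIRSTNAME", NS)]
print(len(books))    # 2
print(first_names)   # ['Jane', 'John']
```

Because the sample contains two BOOK elements, the path selects a collection of elements rather than a single one, something a folder path on a hard drive never does.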
The point here is not to fully explain the XPath language, because there are many articles that already do that (see the links section at the end of this article). Rather, the point here is to help you recognize that walking the path to a particular element of data inside an XML file has a lot in common with walking the path to a particular folder on a hard drive.
Starting with Word 2003, you can use XPath expressions in INCLUDETEXT fields to pull data from an XML document into a Word document. The XML document can be an ordinary XML text file or a Microsoft Office document (such as an Excel spreadsheet or a Word document) that you have saved in XML format. For more information about the use of XPath expressions in INCLUDETEXT fields, look up INCLUDETEXT in the Word 2003 or Word 2007 Help system, or read Field codes: IncludeText field.
Starting with Word 2007, you can use XPath expressions to link content controls to XML data in the document's datastore. Because external programs can access the XML datastore, content controls in the Word document that are linked to the datastore can automatically display XML data from external programs. For more information about the use of XPath expressions with content controls, look up content controls in the Word 2007 Help system, or read Application Development using the Open XML File Formats.
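In a Word 2007 .docx file (a ZIP package), a content control that is linked to the datastore records its XPath in a w:dataBinding element, with w:xpath, w:prefixMappings, and w:storeItemID attributes, inside the control's properties. The Python sketch below lists those XPath expressions. It assumes the main document part is named word/document.xml, which is what Word writes by default, and it builds a tiny stand-in package in memory instead of reading a real file; the XPath and the all-zero GUID are placeholders.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in Word 2007 .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_bindings(docx_file):
    """Return (xpath, storeItemID) for each content control bound to XML data.

    Assumes the main document part is word/document.xml (Word's default).
    """
    with zipfile.ZipFile(docx_file) as package:
        document = ET.fromstring(package.read("word/document.xml"))
    return [
        (binding.get(f"{{{W}}}xpath"), binding.get(f"{{{W}}}storeItemID"))
        for binding in document.iter(f"{{{W}}}dataBinding")
    ]


# Stand-in package holding only word/document.xml, with one bound control.
# The XPath and the all-zero GUID are placeholders, not real Word output.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body><w:sdt><w:sdtPr>
<w:dataBinding w:prefixMappings="xmlns:x='http://www.wordsite.com/books'"
  w:xpath="/x:BOOKS[1]/x:BOOK[1]/x:AUTHOR[1]/x:FIRSTNAME[1]"
  w:storeItemID="{{00000000-0000-0000-0000-000000000000}}"/>
</w:sdtPr></w:sdt></w:body></w:document>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", DOCUMENT_XML)
buffer.seek(0)

bindings = list_bindings(buffer)
print(bindings)
```

Word evaluates each w:xpath against the custom XML part whose item ID matches w:storeItemID; in the packages Word writes, those parts typically live under the customXml folder.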
Although the fundamentals of the XPath language are simple, the language itself is very powerful. The best way to learn how to harness that power is to test XPath expressions to see which data a particular XPath expression returns. Using XPath Explorer, a new freeware tool I developed, you can test any XPath expression that you want. Figure 3 shows the main screen of the XPath Explorer tool.
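XPath Explorer runs this kind of experiment inside Word. If you want to try something similar in a script, Python's built-in xml.etree.ElementTree module evaluates a limited subset of XPath (child and descendant steps plus a few simple predicates, not the full language), which is enough for a quick check. The sketch below uses a hypothetical two-book sample and is not how XPath Explorer itself works.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book sample (TITLE omitted; names are placeholders).
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK>
  <BOOK><AUTHOR><FIRSTNAME>John</FIRSTNAME><LASTNAME>Roe</LASTNAME></AUTHOR></BOOK>
</BOOKS>"""

NS = {"x": "http://www.wordsite.com/books"}
root = ET.fromstring(SAMPLE)


def try_expression(expr):
    """Return (local name, text) for every element that expr selects.

    expr is evaluated relative to the BOOKS element, using ElementTree's
    limited XPath subset rather than full XPath 1.0.
    """
    results = []
    for node in root.findall(expr, NS):
        local_name = node.tag.rsplit("}", 1)[-1]
        results.append((local_name, (node.text or "").strip()))
    return results


# A few expressions to compare side by side.
for expr in [
    "x:BOOK",
    "x:BOOK/x:AUTHOR/x:LASTNAME",
    ".//x:FIRSTNAME",
    "x:BOOK[2]/x:AUTHOR/x:FIRSTNAME",
]:
    print(f"{expr:32} -> {try_expression(expr)}")
```

Changing one step at a time and rerunning is a quick way to see exactly which elements each part of an expression contributes.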
XPath Explorer is compatible with Word 2003 and Word 2007, and it works with any document that contains XML markup, including arbitrary XML files opened in Word. If you want to try out XPath Explorer but you don't have an XML document handy, XPath Explorer can generate one for you. After XPath Explorer has generated a sample XML document, you can experiment with 30 built-in XPath expressions for that sample. In addition, you can enter any arbitrary XPath expression that you want to test.
XPath Explorer lets you view the results of an XPath expression either complete with XML markup (including WordProcessingML markup, if desired) or as plain text with no markup.
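The difference between the two views can be sketched with Python's standard xml.etree.ElementTree module: serializing the selected element keeps its markup, while joining its text nodes discards the markup. The sample is hypothetical and contains no WordProcessingML, so this sketch covers only the plain-XML case; the default_namespace argument to tostring requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-book sample (TITLE omitted; the name is a placeholder).
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK>
</BOOKS>"""

NS = {"x": "http://www.wordsite.com/books"}
root = ET.fromstring(SAMPLE)
author = root.find("x:BOOK/x:AUTHOR", NS)

# View 1: the selected element complete with its XML markup. Writing the
# namespace as the default keeps ElementTree from inventing an ns0: prefix.
with_markup = ET.tostring(author, encoding="unicode", default_namespace=NS["x"])

# View 2: plain text only; every tag is dropped and only text nodes remain.
plain_text = " ".join(t.strip() for t in author.itertext() if t.strip())

print(with_markup)
print(plain_text)  # Jane Doe
```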
Download a free copy of XPath Explorer.