Unlocking Information in the Microsoft Office System Using XML

 

Chris Kunicki and Charles Maxson
OfficeZealot.com

Frank C. Rice
Microsoft Corporation

May 2003

Applies to:
    Microsoft® Office Excel 2003
    Microsoft Office Word 2003

Summary: Learn about a sample Web application built using ASP.NET that demonstrates how Microsoft Office documents can be processed and used external to the application that created them. (17 printed pages)

Contents

Overview
About the Application
Technical Approach
Preparing the Solution
Creating the Documents at Runtime
Extension Masking
Viewing the Documents
Solution Notes
Demo Data
Appendix A – Solution Setup
Appendix B – Solution Walkthrough

Note   The files discussed in this article are available as part of the Word XML Software Development Kit (SDK) to help developers quickly get up to speed on how to build XML solutions using Word 2003 as a development platform.

Overview

This document provides a technical review of the Unlocking Information sample solution which comes with the Microsoft Office System Content Development Kit (CDK). The Unlocking Information sample is a Web-based application built using ASP.NET to show how Microsoft® Office System documents (both Microsoft Office Word 2003 documents and Microsoft Office Excel 2003 spreadsheets) can be processed externally of their respective applications. Additionally, the sample demonstrates how Office documents can be accessed or "repurposed" for uses that range beyond Microsoft Office.

About the Application

The Unlocking Information sample application builds on the concept that organizations often need to create documentation in quantity. This documentation is often part of a repeatable process, prepared by varying team members, that incorporates line-of-business data with boilerplate templates, and needs to be managed as a grouping or "project" of documents.

An example might be a company that underwrites insurance policies for clients. When they prepare a new policy or renew an existing one for a customer, they produce a series of documents that constitutes a policy offer. Perhaps they need to create a cover letter document introducing the policy, timeframes and next steps, a declarations page that explains the policy details such as benefits, pricing, deductions and exceptions, and then they might need to include some form of compliance or disclaimer document that is required by law. Additionally, they might also include spreadsheets that breakdown the costs of the policy by different options or list actuarial data in cells and formulas explaining how the rates are determined. In the end, the result is a group of documents that make up a policy (which we will refer to as a "project" for the rest of this document).

Since many companies share Word and Excel documents with their client-base, Office is the common tool of choice to produce the documents. Additionally, many people need to generate and collaborate using these documents, so centralizing the application on a Web server makes the most sense. In the past, that meant manipulating Word and Excel on the server which was possible but less than optimal for many reasons.

Now, with the XML capabilities and native file formats offered by the Microsoft Office System, developers can unlock Office data through a variety of methods. In the case of the Unlocking Information sample application, documents that have been created with the Microsoft Office System are saved in an XML file format and loaded onto a Web-server where they await processing. When a user makes a request for the documents, the application loads the files as XML, prepares them on the server by inserting line-of-business and user driven data and saves them out as XML files adhering to the XSD Schema requirements that Word and Excel have defined. Upon completion, the user can request to view and work with those files in Word or Excel directly from the server, just as you would expect, even though they are still XML based.

What’s more, since the Office documents simply reside in open standard XML files on the server, the user doesn’t even require Office to view or edit the content. The Unlocking Information sample application allows the user the ability to look at a document in views that differ from Office from outside of Office, and they can even look at multiple documents in a view that consolidates documents across a project. Furthermore, users can edit the documents outside of Office such as adding or changing document comments.

Note that all of this is accomplished without requiring that the Word or Excel executables reside on the server. Simply, XML in The Microsoft Office System changes the way solutions can be built.

Technical Approach

The Unlocking Information sample application is built entirely using Web technologies; server logic using Visual Basic.NET provides the processing punch while and HTML and Java Script are used for client-side rendering. The only true Office functionality used from a development prospective is the new ability for Office to read and write XML. So please note that your mastery of Web topics has been assumed here and the goal of this document is to demonstrate how the XML capabilities in Office can be used to make an application like this possible.

Preparing the Solution

The Unlocking Information sample application begins with a fair amount of Web development. But with that behind you, adding the Office content is very straight forward and almost embarrassingly easy. The application can use any document that has been saved from Word or Excel as XML and it simply looks into a subdirectory created off of the application root folder called OfficeML for the documents it will process.

There is some metadata stored in the sample application in a Microsoft Access database used to manage the documents the application consumes. The metadata include a friendly name and a file description to make the user interface a little richer, but you could conceivably bypass it and have an application that loads any and all documents directly to remove even this light layer of administration – it just makes a better demo to have it.

The documents themselves can be plain or complex and even user created. There really are no rules in consuming and producing Office XML, as long as you stick to the schema(s). Within the documents, there are several methodologies in which data is being inserted at run time ranging from bookmarks, document properties, field codes, and XML nodes in Word to document properties and defined range names in Excel. These elements used to mark up the documents as data placeholders are set up at design time in the document by the author and only require use of native Office functionality. Later, the application will simply walk the XML to recognize these document objects in order to manipulate them.

Creating the Documents at Runtime

When a user requests a new document project in the application, they are prompted to input several data items and to select which of the particular documents they want the new project to be comprised of.

Behind the scenes, the application begins a procedure where it loads each of the Office-created XML files as an in-memory XML Document object. In turn, each of the XML Documents are individually processed to insert the user provided data by using XPath to query to the appropriate nodes within the XML Document.

The application contains several different routines designed with specific XPath queries that target individual elements within Office XML documents. These routines include logic to populate items such as the Document Properties, Defined Ranges, Bookmarks and Mapped XML Ranges found inside the XML Documents. Below is a listing of the routine used to insert document properties in a Word XML Document:

Routine used to insert document properties into WordML using XPath

Sub WordBuiltInDocProp(ByVal sElement As String, _
    ByVal sValue As String, ByVal xdd As XmlDocument, _
    ByVal nsm As XmlNamespaceManager)
    Dim xddNode As XmlNode = xdd.DocumentElement.SelectSingleNode("/w:wordDocument/ _
o:DocumentProperties/o:" & _
    sElement, nsm)

    If xddNode Is Nothing Then
        Dim xddElement As XmlElement = xdd.CreateElement("o", _ 
    sElement, _
            nsm.LookupNamespace(nsm.NameTable.Get("o")))

        Dim xddNodeParent As XmlNode = xdd.DocumentElement.SelectSingleNode("/w:wordDocument/  _
o:DocumentProperties", nsm)

        xddElement.InnerText = sValue
        xddNodeParent.AppendChild(xddElement)
    Else
        xddNode.InnerText = sValue
    End If

End Sub

Hopefully, you will find that all of these routines are actually fairly straightforward for any developer comfortable with XML and XPath, and once you have familiarized yourself with the XML produced by saving Word 2003 and Excel 2003 documents, you could easily extend and add your own. All of the document creation code for the application can be reviewed in the module behind Create.aspx Web form called Create.aspx.vb in the DocLibrary project for Visual Studio.NET. (Note: You may also simply open the file in a text editor such as notepad to review the code.)

Extension Masking

When you have Word and Excel installed on the local machine, Office is able to detect if an XML file is associated with either of its application allowing a document to launch with the respective application. However, when an Office XML file with its “.xml” file extension is served from a Web server, the association is not inherently recognizable as anything unique so the browser draws the page as an XML document. That’s not necessary the behavior that is beneficial for the typical user – in fact the user should never even know we are using XML under the covers.

One little trick of note that the application implements to work around this is that it masks the file extension’s of the Office XML documents it produces. Simply, a “.doc” or “.xls” is appended to the end of the XML document’s file name (for example, “Contract.xml” becomes “Contract.xml.doc”) to make the file open up in the correct Office application. The file is still pure 100% XML but with the coerced extension, it becomes downloadable for direct use in Office.

Viewing the Documents

It was stated earlier in this document that all you needed were Web skills in building an application like this. That is arguably not true unless you consider parsing XML and writing XSLT as Web skills – as some developers do. The Unlocking Information application makes extensive use of parsing and transformation technologies such as XPath and XSLT. This becomes very evident when you examine how the application repurposes Office XML document for use in the browser.

The design goal for the application was to move the all of the formatting logic for displaying Word and Excel documents out of code for both manageability and extensibility. To do this, whenever you see an Office document rendered by the application, it is being driven by an XSLT (Extensible Stylesheet Language Transformations) to alter and display the content.

A good example of this is found in the code behind the DetailDocumentTransform.ascx user control in the DocLibrary project, DetailDocumentTransform.ascx.vb. This user control is responsible for rendering each unique view (for example, preview, summary, styles, comments, and so forth.) of an Office document when you click on a document name from the Projects page (http://<machinename>/DocLibrary/Projects.aspx). The code is kept to an unbelievable minimum by using an XML server control in ASP.NET, essentially boiling it down to critical two lines. One line sets the control’s XML property to that of the Office XML Document and the other to set its XSLT property with the desired XSLT which is loaded at runtime from a group of XSLT files located in the XSLTforOffice directory off of the root path. The code behind DetailDocumentTransform.ascx user control is as follows:

VB.NET code that processes XML/XSLT to view Office documents

Private Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

    'Prepare the control's client objects
    hypClose.Attributes.Add("onclick", "div_Click(this)")
    hypClose.ID = "hyp_" & nDivID

    Try
        'Set the doc property to the Office XML Document object 
        xmlTransform.Document = xdd

        Manage exceptions for Word Comments and Excel Links
        Select Case nType
            Case Detail.TransformType.Regular
                'Common Transform
                imgAdd.Visible = False
                xmlTransform.TransformSource = sXsl
            Case Detail.TransformType.wdComments
                imgAdd.Visible = True
                imgAdd.Attributes.Add("onclick", "NewNote('"  _
    & sLink & "','" & sType & "')")

                ParameterizedTransform()
                PageTitle = " " & PageTitle

            Case Detail.TransformType.xlLinks
                imgAdd.Visible = False
                ParameterizedTransform()

            End Select

    lblError.Visible = False
Catch
    Err.Clear()
    lblError.Text = "Unable to process the XML " & _
        "transformation for this view."

Finally
    If lblError.Text <> "" Then
        lblError.Visible = True
    End If

End Try
End Sub

'Routine to manage XSLTs that require parameters
Private Sub ParameterizedTransform()
    Dim xslT As New XslTransform()
    xslT.Load(MapPath(sXsl))
    Dim xslArg As XsltArgumentList = New XsltArgumentList()

    xslArg.AddParam("sLink", "", sLink)
    xslArg.AddParam("sType", "", sType)

    xmlTransform.TransformArgumentList = xslArg
    xmlTransform.Transform = xslT

End Sub

If you want to examine the entire transformation process of Office documents, also review the Web form Detail.aspx and its code behind module Detail.aspx.vb.

XSLTs for Office Documents

To make really powerful use of the XML that Office produces, you have to be able to manipulate it for your own purposes. Nothing offers more power and flexibility than for writing XSLT to do so. Although perhaps not the simplest of skills to pick up, writing XSLT for Word XML and Excel XML allows you change the way users see Office documents that can serve numerous purposes. What follows is an example of an XSLT document that the Unlocking Information sample application uses to display a view of only the built-in document properties of a Word document:

XSLT used to display built-in document properties of a Word XML document

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"  
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:w="http://schemas.microsoft.com/office/word/2002/8/wordml" 
xmlns:v="urn:schemas-microsoft-com:vml"  _
xmlns:w10="urn:schemas-microsoft-com:office:word" 
xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2002/8/core" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:aml="http://schemas.microsoft.com/aml/2001/core" 
xmlns:wx="http://schemas.microsoft.com/office/word/2002/8/auxHint" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882">
<xsl:template match="/">
    <html>
    <head>
    <STYLE type="text/css">
        tblTitle { background-color: #CCCCCC; _ 
            font-weight: bold; border=0 }
        tblData { border-style:solid}
    </STYLE>
    </head>
    <body>
        <xsl:apply-templates select="w:wordDocument/o:DocumentProperties"/>
    </body>
    </html>
</xsl:template>
<xsl:template match="w:wordDocument/o:DocumentProperties">
<table border="1" width="100%">
    <xsl:if test="position()=1">
       <xsl:text disable-output-escaping="yes">&lt;tbody&gt;</xsl:text>
    </xsl:if>
    <tr>
        <td class="tblTitle" width="10%">Title</td>
        <td class="tblData" width="23%">
            xsl:value-of select="o:Title"/>
        </td>
        <td class="tblTitle">Author</td>
        <td class="tblData">
            <xsl:value-of select="o:Author"/>
        </td>
        <td class="tblTitle" width="10%">Category</td>
        <td class="tblData" width="23%">
            <xsl:value-of select="o:Category"/>
        </td>
    </tr>
    <tr>
        <td class="tblTitle">Subject</td>
        <td class="tblData">
            <xsl:value-of select="o:Subject"/>
        </td>
        <td class="tblTitle">Manager</td>
        <td class="tblData">
            <xsl:value-of select="o:Manager"/>
        </td>
        <td class="tblTitle">Keywords</td>
        <td class="tblData">
            <xsl:value-of select="o:Keywords"/>
        </td>
    </tr>
    <tr>
        <td/>
        <td/>
        <td class="tblTitle" width="10%">Company</td>
        <td class="tblData" width="23%">
            <xsl:value-of select="o:Company"/>
        </td>
        <td class="tblTitle">Comments</td>
        <td class="tblData">
            <xsl:value-of select="o:Description"/>
        </td>
    </tr>
    <xsl:if test="position()=last()">
        <xsl:text disable-output-escaping="yes">&lt;/tbody&gt;</xsl:text>
    </xsl:if>
</table>
</xsl:template>
</xsl:stylesheet>

Solution Notes

The Office XML documents, namespaces, XPath and XSLT used in developing this application were designed to work with Microsoft Office System Beta 2. Attempts to run this solution with earlier or later versions of the Microsoft Office System may yield unexpected results.

Demo Data

Additionally, the Unlocking Information sample application offers configurable data which changes the context and look of the application for demo purposes. Using an Microsoft Office Access 2003 database file OfficeML.mdb, you may change setting such as logo, scenario and other data elements by editing the database.

Appendix A – Solution Setup

The Unlocking Information solution files included enable you to run the complete solution as well as view any associated source code for the solution.

System Requirements

The following system requirements are needed to run the solution.

  • Microsoft Windows® XP Professional or Microsoft Windows 2000 Server with Service Pack 2 or Microsoft Windows 2000 Professional with Service Pack 2 or Microsoft .NET Framework 1.0
  • Internet Information Services 5.0
  • Optional: Microsoft Visual Studio .NET 2002 to edit or view the Unlocking Information XML web services.
  • The Microsoft Office System (to view the documents within the Microsoft Office System on the client.

Configuring the System

To install the Unlocking Information Sample Solution primarily involves configuring Internet Information Services to run the ASP.NET solution.

  1. Check to make sure that the default Web site in IIS is running. To do so, in Control Panel, open Administrative Tools, and then open Internet Services Manager (for Windows 2000) or Internet Information Services (for Windows XP Professional), which displays the Internet Information Services console. In the left-hand pane, expand the node containing the name of your Web server (expand the Web Sites node if you're using Windows XP Professional), right-click the Default Web Site node, and click Start if it is enabled (if Start is disabled, IIS is running).
  2. Using the Internet Services Manager, create a new virtual directory named DocLibrary. The directory path for this virtual directory should be set to the DocLibrary folder in the samples (which by default are installed in "C:\Program Files\Microsoft Office Beta Documentation\Microsoft Word XML Content Development Kit Beta 2\DocLibrary"). Additionally, the directory should be configured as an IIS Application. This setting is configured inside the properties dialog for the virtual directory by clicking the Create button.
  3. If the .NET Framework is not installed, install the .NET Framework Redistributable or install Microsoft Visual Studio .NET. Visual Studio .NET which includes the .NET Framework, which is required to host the Unlocking Information ASP.NET solution.

On some systems, ASP.NET needs to be registered with IIS. See the command-line utility installed with the .NET framework: aspnet_regiis.exe found in the .NET directory. (Example: C:\WINDOWS\Microsoft.NET\Framework\v1.0.3705\aspnet_regiis.exe -i)

Appendix B – Solution Walkthrough

Starting the Solution

This sample solution is a Web solution that dynamically creates a set of Office documents, using XML that can then be viewed and modified within the Office applications.

  1. Start Internet Explorer.

  2. Navigate to http://machinename/doclibrary/. The screen shown in Figure 1 is displayed.

    Figure 1. Screen 1 of the Unlocking Information solution

  3. Type any string into the Password edit control and click Enter. The Projects Summary screen is displayed.

    Figure 2. Screen 2 of the Unlocking Information solution

In this solution, a Project is a collection of documents that are focused on a specific customer. When a new project is created, one or more documents are automatically created for a specific customer.

  1. Click New in left-hand Project menu. The screen is displayed.

    Figure 3. Screen 3 of the Unlocking Information solution

  2. Click the magnifying glass to list current customers by account number or company name

  3. Select a customer from the Customer drop down.

  4. Click the Filter icon and enter text in the Filter text edit control.

  5. Click the Apply Filter icon. Click Next.

    Figure 4. Screen 4 of the Unlocking Information solution

  6. Select one or more documents to be created.

  7. Click Next.

    Figure 5. Screen 5 of the Unlocking Information solution

  8. Type or select the desired data elements.

  9. Click Next.

    Figure 6. Screen 6 of the Unlocking Information solution

  10. Type a name for the project.

  11. Click Submit. The following screen is displayed indicating what documents were created.

    Figure 7. Screen 7 of the Unlocking Information solution

Clicking on the icon opens the document in the respective application (either Word or Excel). Clicking the text opens the document in Microsoft Internet Explorer as shown in the figure:

Figure 8. The Cover Letter opens in Internet Explorer

Editors Note   This image is from the Beta 2 version of the Microsoft Office System and shows an invalid schema for the final, commercial version.

See Office XML Output

The Office documents created in the project are dynamically created by generating an XML file that conforms to the Office XML schema of the document. To view this XML document in an Office application, you can simply click on the Word/Excel icon. This loads the XML document into the appropriate Office application embedded inside Internet Explorer. For demo purposes, you can also view the raw XML file to show that the document is indeed a true XML file.

  1. Click the Word/Excel icon.
  2. Click on the document name.
  3. Click Close when done.

Browse Project

The Project level of the solution enables the user to view all of the documents collectively at once, rather than requiring them to be opened one at a time using the associated Office application.

For example, you can easily review the summaries of all of the documents within the project. This is accomplished by extracting the defined range ‘Summary’ from all of the documents using XML.

  • Click the project name in the navigation menu in the left frame.

Add/Delete Project Notes and Tasks

In addition to just reviewing information from one or more documents, you can also dynamically add new notes or tasks associated with them. The notes and tasks are not stored directly in the documents, but in a related XML file associated with the document.

  1. Click the green Go arrow to add notes or tasks.
  2. Click on red X button next to the note or task to delete it.

Browsing a Document Within the Project

Because the documents are stored natively as XML, specific sections of the documents can be pulled out and displayed separately. For example, when you view a document you can select to view different sections of the document, such as comments within the document.

  1. Click on a document by name using the navigation menu.
  2. Select different check boxes in the View area.
  3. Repeat for different document types (Word or Excel).

Add/Change Document Comments

Not only can certain sections elements within the documents be viewed, but they can also be added or modified. If you click on the Go arrow, you can add a comment to the document. This comment is added by modifying the native XML file using standard XML manipulation techniques. When the document is opened within the Office application, you’ll see that the comment has been directly added to the document.

  1. Click the Project Name in the navigation menu in the left frame.
  2. Click on the green Go arrow to add a new comment.
  3. Type the comment in the Add Comment window and click Go to accept.
  4. Click the Word icon.

Natively Edit a Document

Since Word uses XML as an alternative native file format, you can open the document directly in Word, make changes to it, and save it back as XML.

Then, when you browse the document using the Web solution, you will immediately see the changes made to the document.

  1. Open document from the Web solution.
  2. Modify the document and save it back to the site.
  3. Browse to the document in the Web solution.

© Microsoft Corporation. All rights reserved.