Using Office Open XML Formats to Support Electronic Health Records Portability and Health Industry Standards

Summary: Use Office Open XML Formats and HL7 CDA to exchange health data securely using industry standard formats. (24 printed pages)

Ted Pattison, Ted Pattison Group

Chris Predeek, Ted Pattison Group

Wouter van Vugt, Info Support

October 2007

Applies to: 2007 Microsoft Office System

Download the sample files: 2007 Office Sample: Using the Office Open XML Formats to Support Electronic Health Records Portability and Health Industry Standards

Contents

  • Overview

  • Customer Scenario

  • Code Overview

  • Standards Overview

  • Understanding the Office Open XML Formats Architecture

  • Making Use of the Open Packaging Conventions Specification

  • Implementing XSLT

  • Using the Open XML SDK

  • Understanding Document Structure

  • Creating XML Data Stores

  • Digital Signatures in the Open XML File Formats

  • Conclusion

  • About the Authors

Overview

Empowering patients and consumers to exchange Electronic Health Records securely is a subject of wide debate in the health industry around the world. This article shows how to use the Office Open XML Formats and custom XML to exchange such data securely. The scenario uses the Health Level Seven (HL7) Clinical Document Architecture (CDA) to represent the Electronic Health Record in an industry-standard format, and it shows how to embed that data in a secured document, based on the Office Open XML Formats, that is portable across multiple care providers.

Customer Scenario

Mr. Yasuda just moved into town and decided to sign up with a new primary care physician. He selected Contoso, LLC as his physician practice. Contoso is a leader not only in the medical field but also in technology adoption and customer empowerment: prospective patients can sign up online securely by completing a simple form, and current patients can download secure copies of their medical records. Mr. Yasuda completes the sign-up form. The information he provides is automatically extracted, stored in Contoso’s systems, and verified at his first visit. After many years of ordinary care at Contoso, Mr. Yasuda has accumulated a good deal of medical history, safely stored in Contoso’s Electronic Health Record (EHR) system. Unlike many physician practices that still maintain medical records on paper, Contoso invested in an EHR system that lets its patients export their medical records in industry-standard formats: HL7 CDA for the medical records, and a digitally signed document based on the Office Open XML Formats for the visual representation and the secure envelope. The patient can view and print the document in a rich format, while the embedded HL7 CDA preserves the full richness and semantic meaning of the medical record.

From a technical perspective, the data for these medical records is stored in Contoso’s EHR system. When a patient such as Mr. Yasuda clicks the link to download a copy of his medical records from the secure Web site, the HL7 CDA document is assembled dynamically, potentially from multiple data sources and systems, embedded in the document, and downloaded to the patient’s desktop.

Using Microsoft Office Word 2007, Mr. Yasuda can view his medical records and, if necessary, print them for his files. He can also back up the document electronically and keep it with other important records. Because he is in complete control of his medical records, Mr. Yasuda can also share them with relatives or with another doctor or organization. The medical record is a copy of what is stored in the physician’s office, so he cannot alter the document without detection: any change breaks the digital signature that was applied before the records were downloaded from Contoso’s Web site.

When Mr. Yasuda shares his medical records with other organizations, their systems can verify that the document has not been tampered with and that the embedded HL7 CDA document still accurately reflects the medical records at the time they were assembled and downloaded.

Mr. Yasuda then signs up for an online Personal Health Record (PHR) service, which he selected because it accepts medical records in HL7 CDA format embedded in a document based on the Office Open XML Formats. When he uses this feature, his medical records are transferred from Contoso with full fidelity, so he avoids retyping the information and introducing errors. The PHR service lets Mr. Yasuda view his medical records privately and manage his health more effectively and proactively. During upload, the service inspects the document and validates its digital signature. If the document is signed by an organization that the PHR service knows, such as Contoso, and the signature verifies correctly, the HL7 CDA is extracted and stored in the PHR service’s systems. Successful verification gives the PHR service a higher degree of confidence that the medical records came from Contoso and were not altered.

Code Overview

Accompanying this article is a solution based on the Microsoft .NET Framework that contains a sample implementation of this scenario. For more information, see 2007 Office Sample: Using the Office Open XML Formats to Support Electronic Health Records Portability and Health Industry Standards.

There are two Web sites, one for the physician and one for the PHR service. Assuming Mr. Yasuda’s identity, you can register at the Contoso practice and download your medical records. The practitioner signs these records to assure document integrity, and you can verify the signature in Word 2007. After verification, you can submit the records to the online PHR service.

The strategies for storing the HL7 CDA can differ. For example, you can choose to disassemble the document and store the elements of the medical record in a database. Alternatively, you could also store the entire XML document and extract elements when needed. In this example, we decided to store the HL7 CDA “as-is” and display it by using the standard XSLT provided by HL7 with the CDA standard. The sample data used in this scenario and the sample code are from the sample CDA document included in the HL7 standard.
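Storing the CDA as-is still lets you pull out individual elements when you need them. The following C# sketch reads the patient’s name from a stored CDA document with LINQ to XML. It assumes a CDA Release 2 document in the standard urn:hl7-org:v3 namespace with a single recordTarget; the CdaReader class and its GetPatientName method are hypothetical names, not part of the sample solution.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of the sample solution): reads the patient's
// name from a stored HL7 CDA Release 2 document.
public static class CdaReader
{
    // CDA elements live in the HL7 Version 3 namespace.
    private static readonly XNamespace Hl7 = "urn:hl7-org:v3";

    public static string GetPatientName(string cdaXml)
    {
        XElement root = XDocument.Parse(cdaXml).Root;

        // Path: ClinicalDocument/recordTarget/patientRole/patient/name
        XElement name = root
            .Element(Hl7 + "recordTarget")
            ?.Element(Hl7 + "patientRole")
            ?.Element(Hl7 + "patient")
            ?.Element(Hl7 + "name");
        if (name == null)
        {
            return null;
        }

        // Given names first, then the family name, separated by spaces.
        var parts = name.Elements(Hl7 + "given").Select(g => g.Value.Trim())
            .Concat(name.Elements(Hl7 + "family").Select(f => f.Value.Trim()));
        return string.Join(" ", parts);
    }
}
```

Reading on demand like this keeps the stored CDA unchanged; disassembling it into database columns would instead trade that fidelity for faster queries.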

Standards Overview

This section reviews the standards used to implement this solution.

Office Open XML File Formats

The 2007 Microsoft Office system adheres to the new Office Open XML Formats markup language standard for documents, spreadsheets, and presentations. For more information about the standard, see TC45 - Office Open XML Formats. The file format is based on the Open Packaging Conventions (OPC), which are also used by the XML Paper Specification (XPS). The Office Open XML Formats standard provides document markup designed to support the many different tasks people perform with documents, and it also provides a model for representing business-related data inside the document. The standard includes features such as accessibility support, which makes documents usable by people with disabilities. The Office Open XML Formats are implemented in multiple commercial and open source productivity suites. Combined with the ability to embed custom XML data in a document, they offer a compelling opportunity for documents in the health industry. Standardization helps keep documents transparent and compatible throughout their lifetime.

Organizations in the healthcare industry must generate and manage many clinical documents, such as patient charts, discharge summaries, referrals, and prescriptions, over an extended period. Any technology used to manage medical records in electronic form must therefore remain usable for the long term; technologies that are introduced and then quickly abandoned make that information hard to access later. Because the Office Open XML Formats are an international standard, they ease document retention and reduce the risk of disruptive changes to the format used to describe these documents. The standard and its accompanying specifications, coupled with industry standards such as HL7 CDA for structured information, allow anyone to build a lasting representation of medical records. These standards enable interoperability across organizations such as hospitals and PHR services, while empowering consumers to take an active role in managing their own medical records.

Another important concern when handling and storing sensitive data such as patient records or research results is ensuring that the data has not been tampered with. To help detect unauthorized changes, the Office Open XML Formats provide an elaborate structure for digitally signing document content. You can sign with industry-standard X.509 certificates or with custom implementations.

The Office Open XML Formats use industry standards as their core building blocks and build upon them to create a rich environment for authoring documents. These building blocks, XML and ZIP, are available on any platform. A document consists of XML files assembled in a ZIP container. Most of this XML is expressed in one of three primary markup languages, whose names follow a simple pattern: WordprocessingML defines word-processing document content, SpreadsheetML defines spreadsheet content, and PresentationML defines presentation content. There are also supporting languages, such as DrawingML, which provides a common language for defining figures, charts, tables, and diagrams.

Figure 1. Main components of the Open XML standard


Besides providing a structure for defining document content, the Office Open XML Formats provide several technologies for representing business data within the document. This combination of formatting and business data enables powerful customer solutions. Before the Office Open XML Formats, business-related data was usually stored as ordinary document content. To retrieve data such as a patient number or prescription, the standard approach was to open the document programmatically, move three paragraphs down, and then locate the first italic text, which was assumed to hold that piece of data. This approach is fragile: as soon as you reformat the document or apply a new template to the same data, the code breaks and you must revise it. With the Office Open XML Formats, you can combine the reference schemas, which define bold text, italic text, and other formatting, with structured business data. Instead of locating a patient number by its formatting, you access the patient number directly. An added benefit is that you can maintain the data outside the main document body, which makes it easy to extract data from, or embed data in, the document.

Figure 2. Identifying data by formatting versus by content control


Health Level Seven (HL7) Clinical Document Architecture (CDA)

Health Level Seven (HL7) is one of many Standards Development Organizations (SDOs) operating around the world in the area of healthcare information and communication technology. HL7 started working on standards for clinical integration in the 1980s, and today it is one of the most successful organizations creating standards in healthcare. HL7 maintains some of the most widely used standards for clinical information exchange: HL7 Messaging Version 2, HL7 Version 3, and the Clinical Document Architecture (CDA).

HL7 CDA is an XML-based document markup standard. It specifies both the structure and the meaning of the individual data elements in a clinical document for the purpose of exchange, and its semantics are derived from the HL7 Version 3 Reference Information Model (RIM). HL7 CDA is used primarily for clinical documents such as discharge summaries, referrals, and dictation reports.

Since the approval of CDA as an ANSI standard in 2005, the HL7 committee working on its development has focused its attention on creating reusable templates and constraints for commonly used clinical documents. The first product of this work is the Continuity of Care Document (CCD) that we used in the sample code that accompanies this article to represent the personal health record of the patient.

For more information about HL7, CDA, and CCD, see the HL7 Web site.

Understanding the Office Open XML Formats Architecture

The container for the new file formats is based on the simple, parts-based, compressed ZIP file format specification. At the core of the Office Open XML Formats are XML reference schemas and a ZIP container. Each file contains a collection of any number of parts, and that collection defines the document.

Document parts are stored in the container file or package using the industry-standard ZIP format. Most parts are XML files that describe application data, metadata, and even customer data, stored inside the container file. Other non-XML parts may also be included within the container package, including such parts as binary files representing images or OLE objects embedded in the document. In addition, there are relationship parts that specify the relationships between parts; this design provides the structure for an Office file. While the parts make up the content of the file, the relationships describe how the pieces of content work together.


An easy way to look inside the new file format is to save a Word 2007 document in the new default format and rename the file with a .zip extension. Double-click the file to open and view its contents.

Note

To understand the composition of a file based on the Office Open XML Formats, you may want to extract its parts. The following procedure assumes that you have a ZIP application, such as WinZip, installed on your computer.

To view the parts of a Word 2007 XML format file

  1. Create a temporary folder in which to store the file and its parts.

  2. Save a Word 2007 document (that contains text, pictures, and so on) as a .docx file.

  3. Add a .zip extension to the end of the file name.

  4. Double-click the file. It will open in the ZIP application. You can see the document parts that are included in the file.

  5. Extract the parts to the folder that you created earlier.
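You can also inspect a package from code without renaming it. The following C# sketch lists the entries of a .docx file by using System.IO.Compression.ZipArchive, a class added in .NET Framework 4.5 (later than this article); PackageInspector is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: lists the ZIP entries of an Office Open XML package.
public static class PackageInspector
{
    public static List<string> ListParts(Stream packageStream)
    {
        // leaveOpen: true lets the caller keep using (and dispose) the stream.
        using (var zip = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Every entry is a part, except [Content_Types].xml, which stores
            // the content types listing.
            return zip.Entries
                .Select(entry => entry.FullName)
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToList();
        }
    }
}
```

For example, opening Sample.docx with File.OpenRead and printing each name returned by ListParts shows one line per entry, including [Content_Types].xml and _rels/.rels.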

Figure 3. Hierarchical file structure of a Word 2007 document containing images and embedded objects


The Open Packaging Conventions specification defines the package structure of Word 2007 documents saved in the new file format. For more information, see Open Packaging Conventions, which are also used by the XML Paper Specification. In addition, a Java-based implementation was recently released: OpenXML4J is a Java library dedicated to creating and manipulating Office Open XML (ECMA-376) and OPC-based documents (for example, Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 documents). OpenXML4J lets you create and manipulate Open XML documents in many scenarios without using any Microsoft Office suite. For more information, see the OpenXML4J Project: Office Open XML File Format library for Java.

This article uses a “Hello World” document as its sample. The walkthrough uses the Packaging API to handle the low-level XML and ZIP details. The Hello World document contains three parts, only two of which you create manually. First, you create a document part that stores the “Hello World” text; WordprocessingML keeps the main document content in a part of its own. Next, you create the relationship between the package and that document part, which produces a relationship part. Finally, the Packaging API creates the content types listing, which identifies the content type of every part in the package. Because the Packaging API abstracts only the ZIP and packaging details, you must create the WordprocessingML markup for the “Hello World” text yourself, by using System.Xml.

Figure 4. Example of the XML markup for the “Hello World” text


The following sections review the major components of the Office Open XML Formats architecture.

Content Types

The first important concept is the content types listing. In the root of the document container, there is a document part called [Content_Types].xml. This document part is unique and stores a dictionary with content type strings for all the other parts inside the package. The content type indicates to the consumer what type of content to expect in a part. There is a distinction between binary and XML data, but XML data is further split into additional types because most of the parts inside a package are defined using XML. You can define the content type by using two models. You can tie a default content type to a specific file name extension, or you can override this default content type based on the location of a package part inside the container. The following code example depicts a simple content types list.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default
        Extension="rels"
        ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    <Default
        Extension="xml"
        ContentType="application/xml"/>
    <Override
        PartName="/word/document.xml"
        ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
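To show how a consumer applies the two models, here is a C# sketch that resolves a part’s content type from the XML in [Content_Types].xml: an Override that matches the part name wins; otherwise the Default for the part’s file name extension applies. The ContentTypeResolver class is hypothetical; when you use the Packaging API, PackagePart.ContentType performs this lookup for you.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: resolves a part's content type from [Content_Types].xml.
// An Override matching the part name takes precedence; otherwise the Default
// for the part's file name extension applies.
public static class ContentTypeResolver
{
    private static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    public static string Resolve(string contentTypesXml, string partName)
    {
        XElement types = XDocument.Parse(contentTypesXml).Root;

        // Part names and extensions are compared case-insensitively.
        XElement match = types.Elements(Ct + "Override").FirstOrDefault(o =>
            string.Equals((string)o.Attribute("PartName"), partName, StringComparison.OrdinalIgnoreCase));
        if (match != null)
        {
            return (string)match.Attribute("ContentType");
        }

        // Take the extension from the last path segment, e.g. ".rels" -> "rels".
        string fileName = partName.Substring(partName.LastIndexOf('/') + 1);
        int dot = fileName.LastIndexOf('.');
        if (dot < 0)
        {
            return null;
        }
        string extension = fileName.Substring(dot + 1);

        match = types.Elements(Ct + "Default").FirstOrDefault(d =>
            string.Equals((string)d.Attribute("Extension"), extension, StringComparison.OrdinalIgnoreCase));
        return match == null ? null : (string)match.Attribute("ContentType");
    }
}
```

With a listing like the one above, /word/document.xml resolves through its Override, while a part such as /docProps/app.xml falls back to the Default for the xml extension.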

Relationships

Relationship parts are XML files that use the .rels extension and are always stored in folders named _rels. These relationship parts tie the various parts of the document together. Instead of storing the relationships between package parts inline in each part, the package uses this separate relationship part model. Any part can relate to any other part, including through circular relationships. The relationship model greatly eases the workload of applications that need to browse through a package to find specific elements: you do not need to parse large amounts of document content to find a header, because the relationship part provides this information. This is a very important aspect of working with packages based on the Office Open XML Formats. Because parts are located through relationships, the location of any part inside the package can vary. Paths inside the package are not static; they may change between versions of a document, or when the document is modified with different productivity applications. Always locate parts through their relationships rather than through fixed paths.

A relationship stores four pieces of data. The first is the relationship ID, which must be unique among the relationships of the same source. The second is the relationship type, which provides a second way to find parts inside a package: in WordprocessingML documents, for example, headers and footers can be located by relationship type, while embedded images are typically found by relationship ID. The third is the target URI of the location the relationship points to. This URI can be either inside or outside the package, which is indicated by the fourth piece of data, the TargetMode attribute (Internal, the default, or External). The following code example shows two package-level relationships.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships
    xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"
        Target="docProps/app.xml"/>
    <Relationship Id="rId2"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
        Target="word/document.xml"/>
</Relationships>
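The four pieces of data map directly onto the attributes of each Relationship element. The following C# sketch reads a relationship part with LINQ to XML and looks up a target by relationship type, treating a missing TargetMode attribute as Internal. RelationshipInfo and RelationshipReader are hypothetical names; with the Packaging API you would call Package.GetRelationshipsByType instead.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// One relationship: ID, type, target, and target mode.
public sealed class RelationshipInfo
{
    public string Id;
    public string Type;
    public string Target;
    public string TargetMode; // "Internal" (the default) or "External"
}

// Hypothetical helper: reads a relationship part and finds targets by type.
public static class RelationshipReader
{
    private static readonly XNamespace Rels =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    public const string OfficeDocumentType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    public static List<RelationshipInfo> Parse(string relsXml)
    {
        return XDocument.Parse(relsXml).Root
            .Elements(Rels + "Relationship")
            .Select(r => new RelationshipInfo
            {
                Id = (string)r.Attribute("Id"),
                Type = (string)r.Attribute("Type"),
                Target = (string)r.Attribute("Target"),
                // When TargetMode is omitted, the target is inside the package.
                TargetMode = (string)r.Attribute("TargetMode") ?? "Internal"
            })
            .ToList();
    }

    // Returns the target of the first relationship of the given type, or null.
    public static string FindTarget(string relsXml, string relationshipType)
    {
        RelationshipInfo match = Parse(relsXml).FirstOrDefault(r => r.Type == relationshipType);
        return match == null ? null : match.Target;
    }
}
```

Applied to the content of a _rels/.rels part, FindTarget(relsXml, RelationshipReader.OfficeDocumentType) returns the target of the start part, such as word/document.xml.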

The Open XML specification defines two kinds of relationships, both stored in files with the .rels extension: relationships that have a source part and relationships that do not.

Relationships without a source part are package-level relationships; you can consider the entire package to be their source. Package-level relationships are the starting point for browsing a package, and they are always stored in the _rels/.rels file, a location that cannot change. Most documents based on the Office Open XML Formats have at least three package-level relationships: one for the core properties, one for the application properties, and one that points to a special part called the start part, where parsing of the document begins. The relationship type used to find the start part is:

http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument

The start part usually defines relationships that point to parts containing other document content, such as styles or a theme. Because these relationships are defined with a source part, they are considered item-level relationships. Item-level relationships are stored in a _rels subfolder relative to the location of the source part, and each relationship file is named by appending a .rels extension to the full file name of its source part. For example, the relationship part for /word/document.xml is /word/_rels/document.xml.rels.
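The naming rule for relationship parts is mechanical enough to express in a few lines. A C# sketch, assuming part names are absolute package paths such as /word/document.xml; RelationshipPartNames is a hypothetical helper, and treating the package root "/" as the source of the package-level relationships is this sketch’s own convention.

```csharp
// Hypothetical helper: names the relationship part for a source part by
// placing "<file name>.rels" in a _rels folder next to the source part.
// Passing "/" (the package root, used here for package-level relationships)
// yields "/_rels/.rels".
public static class RelationshipPartNames
{
    public static string For(string sourcePartName)
    {
        int slash = sourcePartName.LastIndexOf('/');
        string folder = sourcePartName.Substring(0, slash + 1); // e.g. "/word/"
        string fileName = sourcePartName.Substring(slash + 1);  // e.g. "document.xml"
        return folder + "_rels/" + fileName + ".rels";
    }
}
```

For example, RelationshipPartNames.For("/word/document.xml") returns "/word/_rels/document.xml.rels", and For("/") returns "/_rels/.rels".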

Figure 5. Package relationships and document parts


Making Use of the Open Packaging Conventions Specification

The Packaging API in the .NET Framework (the System.IO.Packaging namespace) models the same concepts. The Package class represents an entire Open XML package, the PackageRelationship class represents a single relationship, and the PackagePart class encapsulates a part inside the package.

To create the sample document, you need three lines of Packaging API code and some code to generate the WordprocessingML markup for displaying the “Hello World” text.

Creating the Package

The first step is to create a package. To do this, use the Package class and the static Open method. This method returns a new Package instance.

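using System;
using System.IO;
using System.IO.Packaging; // Package, PackagePart, PackageRelationship (WindowsBase.dll)

class Program
{
    static void Main()
    {
        // FileMode.Create overwrites Sample.docx if it already exists.
        // Disposing the package flushes its contents and closes the file.
        using (Package package = Package.Open("Sample.docx", FileMode.Create))
        {
            // The rest of the code goes here.
        }
    }
}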

Creating the Main Document Part

After you create the package, the next step is to create the main document part. You can create a document part by calling the CreatePart method on the open package. To create a part, first specify a unique location inside the package where you want to store the new part. To identify the location, use a URI. Note that an exception is thrown if the location is already in use. The second parameter to the CreatePart method is the content type definition for the new part. This automatically updates the [Content_Types].xml list inside the package. You do not need to do this manually.

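// Creates the main document part; the content type is added to
// [Content_Types].xml automatically.
PackagePart mainPart = package.CreatePart(
    new Uri("/document.xml", UriKind.Relative),
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");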

Creating the Relationship

The last step is to create a relationship that points to the main document part. You create a relationship by calling the CreateRelationship method on its source: if the source is the package, call package.CreateRelationship; if the source is another part, call CreateRelationship on that part (for example, call mainPart.CreateRelationship to relate mainPart to a child part). You specify three pieces of data, and the API generates the relationship ID for you. First, specify the target by using a URI value. Second, specify whether the target is inside the package (TargetMode.Internal) or outside it (TargetMode.External); the external mode is used for hyperlinks and other external links. Finally, specify the relationship type as the last parameter. Later, you can use this relationship type to browse through the package.

// Package-level relationship from the package to its start part.
PackageRelationship mainRelationship = package.CreateRelationship(
    mainPart.Uri,
    TargetMode.Internal,
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument");

Using System.Xml

With the packaging details addressed, the next step is to fill the main document part with content. The Hello World example has a simple hierarchy. The <document> element is the root, and it contains <body>, which holds the document body. The body consists of block-level content such as paragraphs (<p>) or tables. Paragraphs contain runs (<r>), and runs contain text (<t>). The reason for this model is that a run is the lowest-level element to which you can apply formatting. A run can also contain special characters such as a carriage return, represented by <cr />.

The Packaging API does not handle the XML details of the Office Open XML Formats; it exposes each part’s content as a Stream object. By using an XmlWriter instance, you can write XML to this stream. One important point: all the elements that make up the document text must be in the WordprocessingML namespace:

http://schemas.openxmlformats.org/wordprocessingml/2006/main

You can use the following code to create a basic document that contains one paragraph of text.

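// Requires using directives for System.IO and System.Xml.
string xmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Dispose both the part stream and the writer so that all XML is flushed.
using (Stream partStream = mainPart.GetStream(FileMode.Create, FileAccess.ReadWrite))
using (XmlWriter writer = XmlWriter.Create(partStream))
{
    writer.WriteStartDocument();
    // Every element uses the "w" prefix bound to the WordprocessingML namespace.
    writer.WriteStartElement("w", "document", xmlNamespace);
    writer.WriteStartElement("w", "body", xmlNamespace);
    writer.WriteStartElement("w", "p", xmlNamespace);
    writer.WriteStartElement("w", "r", xmlNamespace);
    writer.WriteStartElement("w", "t", xmlNamespace);
    writer.WriteString("Hello World");
    // WriteEndDocument closes all elements that are still open.
    writer.WriteEndDocument();
}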
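The Packaging API hides the ZIP details. To see what it writes on your behalf, the following C# sketch produces the same three items directly with System.IO.Compression.ZipArchive (available from .NET Framework 4.5, later than this article): the content types listing, the package relationship part, and the main document part at /document.xml, as in the walkthrough. MinimalDocx is a hypothetical name, and the sketch is for illustration only; the walkthrough above uses the Packaging API.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical illustration: writes a minimal "Hello World" package by hand.
public static class MinimalDocx
{
    private const string Declaration =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    private const string ContentTypes = Declaration +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Override PartName=\"/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    // Package-level relationship that identifies the start part.
    private const string PackageRels = Declaration +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"/document.xml\"/>" +
        "</Relationships>";

    private const string Document = Declaration +
        "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:body><w:p><w:r><w:t>Hello World</w:t></w:r></w:p></w:body></w:document>";

    public static void Write(Stream output)
    {
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddEntry(zip, "[Content_Types].xml", ContentTypes);
            AddEntry(zip, "_rels/.rels", PackageRels);
            AddEntry(zip, "document.xml", Document);
        }
    }

    private static void AddEntry(ZipArchive zip, string name, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        // UTF-8 without a byte order mark, matching the XML declarations.
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
        {
            writer.Write(xml);
        }
    }
}
```

The comparison shows why the Packaging API is convenient: it maintains [Content_Types].xml and the relationship parts for you, so you write only the WordprocessingML yourself.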

Alternatives to System.Xml

This section looks at alternatives that reduce the amount of procedural code needed to generate documents.

Implementing XSLT

Generating an entire document with XML-writing code is tedious: you need to maintain a lot of code to produce even the simplest document. Two main techniques ease the burden of creating large documents entirely in code: document templates and XSLT files, which you can use separately or together. A document template keeps the formatting outside the main code base, where it is easier to edit and restyle directly. One difficulty, however, is identifying the areas of the template that contain data; the technologies available for that are discussed later in this article. First, consider XSLT.

An example of a simple XSLT file looks like the following.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <w:document 
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
            <w:body>
                <w:p>
                    <w:r>
                        <w:t>Hello World!</w:t>
                    </w:r>
                </w:p>
            </w:body>
        </w:document>
    </xsl:template>
</xsl:stylesheet>

Because the WordprocessingML in an XSLT file remains in its XML form, the XSLT file is easy to modify later. And because an XSLT file is a text file, you can change it without recompiling your application code. This eases development and maintenance.
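In .NET Framework code, the XslCompiledTransform class in System.Xml.Xsl can apply such a stylesheet. The following C# sketch produces content for a main document part from a stylesheet and an input document; XsltDocumentBuilder is a hypothetical name, and the resulting string could then be written to the part’s stream as in the System.Xml example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT stylesheet to input data and returns
// the generated markup (for example, the content of a main document part).
public static class XsltDocumentBuilder
{
    public static string Transform(string stylesheetXml, string dataXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(stylesheet);
        }

        var output = new StringWriter();
        using (XmlReader data = XmlReader.Create(new StringReader(dataXml)))
        // OutputSettings carries the stylesheet's xsl:output choices to the writer.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(data, writer);
        }
        return output.ToString();
    }
}
```

A data-driven stylesheet would replace the literal Hello World! text with an xsl:value-of expression that selects, for example, /patient/name from the input data.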

There are two important considerations when working with XSLT files. The first is limited expressive power compared with a code-based approach; for example, XSLT is not very good at formatting dates. Many XSLT engines let you extend a stylesheet with custom objects that provide functionality not defined in the XSLT specification. The XSLT engine in the .NET Framework (XslCompiledTransform) supports extension objects and custom parameters through the XsltArgumentList class.
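A minimal C# sketch of both mechanisms: an extension object that converts a date-only, HL7-style value such as 20071001 into 2007-10-01, and a custom parameter named title. The DateHelpers and XsltWithExtensions classes, the urn:example:date-helpers namespace URI, and the title parameter are all hypothetical; values that include a time component would need a different format string.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object: public methods become callable from XSLT.
public class DateHelpers
{
    // Assumes a date-only value in yyyyMMdd form.
    public string FormatDate(string hl7Date)
    {
        DateTime date = DateTime.ParseExact(hl7Date, "yyyyMMdd", CultureInfo.InvariantCulture);
        return date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}

public static class XsltWithExtensions
{
    // Arbitrary namespace URI that the stylesheet uses to reach DateHelpers.
    public const string HelperNamespace = "urn:example:date-helpers";

    public static string Transform(string stylesheetXml, string dataXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(stylesheet);
        }

        var arguments = new XsltArgumentList();
        arguments.AddExtensionObject(HelperNamespace, new DateHelpers());
        arguments.AddParam("title", "", "Registration summary"); // custom parameter

        var output = new StringWriter();
        using (XmlReader data = XmlReader.Create(new StringReader(dataXml)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(data, arguments, writer);
        }
        return output.ToString();
    }
}
```

The stylesheet declares the same namespace (for example, xmlns:dh="urn:example:date-helpers"), calls dh:FormatDate(string(@value)), and reads the parameter through an xsl:param element named title.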

The second consideration concerns the output capability of the XSLT engine. An Open XML package consists of many XML files, many of which are connected through relationships. The XSLT 1.0 specification allows only a single output document per transformation. The XSLT 2.0 specification does allow a single stylesheet to create multiple output files (by using xsl:result-document), but engines that implement XSLT 2.0 are not yet common.

Using the Open XML SDK

Another approach that reduces the amount of code you must write and maintain for Open XML solutions is to use the Microsoft SDK for Open XML Formats, which was first released in June of 2007. This API provides high-level abstractions for concepts such as the document start part and other parts, eliminating the need to manage relationships and content types explicitly in your code.

To download a free copy of the SDK, see Open XML Format SDK 1.0. To view the online documentation, see Welcome to the Open XML Format SDK 1.0.

As of this writing, the release date and planned functionality of the final version of this API have not been announced. Microsoft is soliciting developer feedback on the API to help define its roadmap through the Open XML Format SDK Discussion Forum on MSDN.

Understanding Document Structure

A simple patient registration form usually defines two levels of content. The first level is the layout of the form and its basic formatting. The second is the set of placeholders for individual pieces of information, such as patient details or insurance information. One difficulty with these documents in digital form is letting programs access the data in a way that guarantees the right content is retrieved. You would not want the extracted patient name to read “Please enter your name” merely because a slight change in formatting altered the layout of the registration form.

The Office Open XML Formats provide several technologies that make the data inside a document more accessible without relying on code-based solutions such as Microsoft ActiveX controls. For WordprocessingML, the main technology is the content control, also known as the structured document tag. You can use structured document tags to create placeholders for simple elements, such as dates, and for complex structures, such as paragraphs or pages. Consider the following patient registration form. It is a generic form that a physician practice (such as Contoso in the earlier scenario) can provide and that many of its practitioners can use.

Figure 6. Sample patient registration form


The registration form defines many pieces of data of various types, such as simple text, dates, and choice fields. On a printed form, it is difficult to validate whether the form was completed correctly. Content controls support all these field types and add features such as protection against deletion and validation of content.
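Because each content control can carry its own tag, a program can find a field by name rather than by its position or formatting. The following C# sketch reads the text of a content control from the markup of the main document part. It assumes that the control’s w:sdtPr element contains a w:tag element (PatientName in the test data is a hypothetical tag), and ContentControlReader is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: finds a content control (w:sdt) by its tag and
// returns its text, regardless of how the surrounding document is formatted.
public static class ContentControlReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string GetText(string documentXml, string tag)
    {
        XElement control = XDocument.Parse(documentXml)
            .Descendants(W + "sdt")
            .FirstOrDefault(sdt =>
                (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val") == tag);
        if (control == null)
        {
            return null;
        }

        // Concatenate every text element inside the control's content.
        return string.Concat(control.Elements(W + "sdtContent")
            .Descendants(W + "t")
            .Select(t => t.Value));
    }
}
```

Reformatting the form or moving the control elsewhere does not affect this lookup, which is the robustness the formatting-based approach lacked.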

Because a content control is just another WordprocessingML markup element, its data is still stored inside WordprocessingML markup. Custom solutions that build on the data in the registration form have a harder time accessing it there than they would if the data remained in its native, business-related form. To separate data from formatting, you can create XML data islands inside a document and bind them to the content controls in the document.
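In WordprocessingML, a bound content control carries a w:dataBinding element whose w:xpath attribute selects a node in the custom XML part identified by its w:storeItemID attribute. The following C# sketch resolves those bindings against a data island. It assumes a single custom XML part, XPath expressions that select elements, and no w:prefixMappings (that is, data without namespaces); DataBindingResolver is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: evaluates the w:xpath of every w:dataBinding in the
// document against one custom XML part and returns xpath -> bound value.
public static class DataBindingResolver
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static Dictionary<string, string> Resolve(string documentXml, string customXml)
    {
        XDocument data = XDocument.Parse(customXml);
        var values = new Dictionary<string, string>();

        foreach (XElement binding in XDocument.Parse(documentXml).Descendants(W + "dataBinding"))
        {
            string xpath = (string)binding.Attribute(W + "xpath");
            if (xpath == null || values.ContainsKey(xpath))
            {
                continue;
            }

            // XPathSelectElement returns the first matching element, or null.
            XElement node = data.XPathSelectElement(xpath);
            values[xpath] = node?.Value;
        }
        return values;
    }
}
```

Because the value lives in the data island, a server can read it with an XPath query like this without parsing the formatted document body at all.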

The patient registration form uses this data-binding technology so that a custom solution can extract the data. A patient can upload the completed registration form to a practitioner's Web site to register. On the Web site, the data inside the registration form is easily accessible because of the data binding in the document. The XML data from the document is processed and stored in the back-end data store. Typically a database or an EHR system would store this content, but for simplicity's sake in our scenario, the data is kept in XML form. To validate the new patient, an XSLT stylesheet transforms the registration details into HTML, which is then displayed on the practitioner's Web site.
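The display step can be sketched with the .NET XslCompiledTransform class. This is a minimal sketch under stated assumptions: the helper ApplyXslt and the stylesheet PatientHtmlXslt are hypothetical (the article's own stylesheets are not shown), and the stylesheet assumes the registration-form namespace used by the data island in the next section. It renders three patient fields as an HTML table.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical stylesheet: renders selected patient fields as an HTML table.
const string PatientHtmlXslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:r='http://www.example.org/ooxml/healthcare/registrationform'
    exclude-result-prefixes='r'>
  <xsl:output method='html'/>
  <xsl:template match='/'>
    <xsl:variable name='p' select='r:Data/r:Patient'/>
    <table>
      <tr><th>Sex</th><td><xsl:value-of select='$p/r:Sex'/></td></tr>
      <tr><th>Birth date</th><td><xsl:value-of select='substring($p/r:BirthDate, 1, 10)'/></td></tr>
      <tr><th>Marital status</th><td><xsl:value-of select='$p/r:MaritalStatus'/></td></tr>
    </table>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical helper: applies an XSLT stylesheet (as a string) to an XML
// document (as a string) and returns the transformed output.
static string ApplyXslt(string xslt, string xml)
{
    var transform = new XslCompiledTransform();
    using (XmlReader xsltReader = XmlReader.Create(new StringReader(xslt)))
    {
        transform.Load(xsltReader);
    }

    using XmlReader input = XmlReader.Create(new StringReader(xml));
    using var output = new StringWriter();
    transform.Transform(input, null, output);
    return output.ToString();
}
```

In a Web application, the returned string could be written into the page that shows the pending registration; the stylesheet's output settings (here method='html') are honored because the TextWriter overload of Transform is used.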

Figure 7. Transforming patient data for display using an XSLT

In our scenario, Contoso makes the medical records available to the patient in HL7 CDA format. The data for the HL7 CDA is partly constructed from the data extracted from the patient registration form. All that is required to generate the HL7 CDA document is another XSLT transformation based on the extracted patient data.
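The HL7 CDA transformation can also be sketched with XslCompiledTransform. The stylesheet below is hypothetical and deliberately tiny: it maps only the patient's sex and birth date into a CDA recordTarget fragment in the HL7 v3 namespace (urn:hl7-org:v3), using the HL7 AdministrativeGender code system (OID 2.16.840.1.113883.5.1), and it assumes the form stores sex as the strings Male or Female. It is not a complete CDA document; a conformant ClinicalDocument also requires header elements such as typeId, id, code, effectiveTime, confidentialityCode, author, and custodian. The helper ToCdaFragment is likewise hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical stylesheet: registration-form fields -> CDA recordTarget fragment.
const string CdaXslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:r='http://www.example.org/ooxml/healthcare/registrationform'
    xmlns='urn:hl7-org:v3'
    exclude-result-prefixes='r'>
  <xsl:output method='xml' indent='yes' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <xsl:variable name='p' select='r:Data/r:Patient'/>
    <recordTarget>
      <patientRole>
        <patient>
          <xsl:choose>
            <xsl:when test='$p/r:Sex = ""Male""'>
              <administrativeGenderCode code='M' codeSystem='2.16.840.1.113883.5.1'/>
            </xsl:when>
            <xsl:when test='$p/r:Sex = ""Female""'>
              <administrativeGenderCode code='F' codeSystem='2.16.840.1.113883.5.1'/>
            </xsl:when>
            <xsl:otherwise>
              <administrativeGenderCode nullFlavor='UNK'/>
            </xsl:otherwise>
          </xsl:choose>
          <!-- HL7 timestamps drop the separators: 2007-05-01 becomes 20070501 -->
          <birthTime value='{translate(substring($p/r:BirthDate, 1, 10), ""-"", """")}'/>
        </patient>
      </patientRole>
    </recordTarget>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical helper: runs the stylesheet and returns the result as an XDocument.
static XDocument ToCdaFragment(string registrationXml, string xslt)
{
    var transform = new XslCompiledTransform();
    using (XmlReader xsltReader = XmlReader.Create(new StringReader(xslt)))
    {
        transform.Load(xsltReader);
    }

    using XmlReader input = XmlReader.Create(new StringReader(registrationXml));
    using var output = new StringWriter();
    transform.Transform(input, null, output);
    return XDocument.Parse(output.ToString());
}
```

Keeping the mapping in a stylesheet rather than in code means the CDA output can be extended (names, addresses, clinical sections) without recompiling the Web application.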

Figure 8. Transforming patient data into an HL7 format using a second XSLT

It is also important to note that the most appropriate tool for completing forms depends on the scenario. In our case we chose Word 2007 to illustrate the concept of content controls; in other cases, another application, such as Microsoft Office InfoPath 2007, could be more appropriate.

Creating XML Data Stores

As the name implies, an XML data island stores business-related data in XML form inside a document based on Office Open XML Formats. It is one example of a custom XML part. This XML contains no references to Office Open XML Formats; it is purely business-related data, and any well-formed XML document can be stored in it. The XML data island is stored as a separate part inside the package, and the main document part references that part through a relationship of the following type, which allows the XML data to be found:

http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml
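To show how this relationship type is used to locate the data, the following sketch opens a document as a ZIP archive with System.IO.Compression and lists the targets of every customXml relationship from the main document part. It is an illustration rather than the Packaging API approach used later in this article: the helper FindCustomXmlParts is hypothetical, and it assumes the main document part is word/document.xml (where Word 2007 usually places it) instead of resolving it from the package-level _rels/.rels part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the ZIP entry names of the custom XML parts that the
// main document part points to through customXml relationships.
// Assumption: the main document part is word/document.xml.
static List<string> FindCustomXmlParts(Stream docx)
{
    const string customXmlRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml";
    XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";

    using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
    ZipArchiveEntry relsEntry = zip.GetEntry("word/_rels/document.xml.rels");
    if (relsEntry == null)
    {
        return new List<string>();
    }

    XDocument rels;
    using (Stream relsStream = relsEntry.Open())
    {
        rels = XDocument.Load(relsStream);
    }

    // Targets are relative to the source part, so resolve them against word/document.xml.
    var sourceUri = new Uri("http://package/word/document.xml");
    return rels.Root.Elements(relNs + "Relationship")
        .Where(r => (string)r.Attribute("Type") == customXmlRelType)
        .Select(r => new Uri(sourceUri, (string)r.Attribute("Target")).AbsolutePath.TrimStart('/'))
        .ToList();
}
```

For a typical Word 2007 file, a target such as ../customXml/item1.xml resolves to the entry name customXml/item1.xml.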

The new-patient form contains an XML data island that stores the values of all the form fields. The following XML is a small excerpt of the message defined in the form.

<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="http://www.example.org/ooxml/healthcare/registrationform">
    <Patient>
        <MaritalStatus>Married</MaritalStatus>
        <BirthDate>2007-05-01T00:00:00</BirthDate>
        <Age>12</Age>
        <Sex>Male</Sex>
        <SSN>123-23-1234</SSN>
        <Name />
        <Address />
        <Contact />
        <Employer />
    </Patient>
    <Insurance covered="false">
        <Subscriber />
    </Insurance>
    <Emergency>
        <Name />
        <Relationship />
        <HomePhone />
        <WorkPhone />
    </Emergency>
</Data>
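Because the data island is ordinary XML in its own namespace, any XML API can read it without knowing WordprocessingML. A minimal sketch using LINQ to XML, assuming the island's content has already been read from its part as a string; the helper ReadPatient and its tuple shape are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: reads a few fields from the registration-form data island.
static (DateTime? BirthDate, string Sex, bool? InsuranceCovered) ReadPatient(string dataIslandXml)
{
    XNamespace ns = "http://www.example.org/ooxml/healthcare/registrationform";
    XElement root = XDocument.Parse(dataIslandXml).Root;
    XElement patient = root?.Element(ns + "Patient");

    // The explicit conversions return null when the element or attribute is absent.
    DateTime? birthDate = (DateTime?)patient?.Element(ns + "BirthDate");
    string sex = (string)patient?.Element(ns + "Sex");
    bool? covered = (bool?)root?.Element(ns + "Insurance")?.Attribute("covered");
    return (birthDate, sex, covered);
}
```

Missing elements simply yield null, but an element that is present and empty, such as <BirthDate />, would make the DateTime? conversion throw a FormatException, so a production version would check for empty text first.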

Although the patient registration form contains only a single registration, you might have a form that displays multiple registrations, for example an overview document of all the registrations in the past month. You do not need to modify the XML data islands when you combine several of these registration forms, because each island is uniquely identified by a set of properties. You can also use these properties to reference the XML schemas used to validate the data in the registration form. The XML data island itself is stored in a custom XML part, and its properties are stored in a separate part called the custom XML properties part.
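In files that Word 2007 produces, the custom XML properties part (for example customXml/itemProps1.xml) contains a ds:datastoreItem element whose ds:itemID attribute holds the GUID that content controls reference through w:storeItemID, plus optional ds:schemaRefs. The following sketch reads both values; the helper name ReadCustomXmlProperties is hypothetical, and the part's content is assumed to be available as a string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the store item ID (a GUID string) and the
// schema URIs declared in a custom XML properties part.
static (string ItemId, List<string> SchemaUris) ReadCustomXmlProperties(string propsXml)
{
    XNamespace ds = "http://schemas.openxmlformats.org/officeDocument/2006/customXml";
    XElement item = XDocument.Parse(propsXml).Root;

    string itemId = (string)item.Attribute(ds + "itemID");
    List<string> schemas = item.Descendants(ds + "schemaRef")
        .Select(s => (string)s.Attribute(ds + "uri"))
        .Where(uri => uri != null)
        .ToList();
    return (itemId, schemas);
}
```

Comparing the returned ID with a control's w:storeItemID lets a solution tell several data islands apart after registration forms have been combined.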

Figure 9. Creating a custom XML part

To view the data inside the custom XML part outside of the productivity application (such as Microsoft Office), the easiest approach is to use a package viewer application such as Open XML Package Explorer. This tool allows you to open any document based on Office Open XML Formats and view or edit the individual parts of the document.

Figure 10. Using Package Explorer to view and edit document parts

Using Content Controls

The patient registration form contains many placeholders to identify regions of content. It is important that customers enter the details in the registration form correctly. To facilitate this, the example form creates the placeholders in the top part of the form using structured document tags. The bottom half of the form still uses the older model of creating formatting-based placeholders using continuous underlines _____ or a different background color. The structured document tags provide a richer and non-intrusive model for defining the regions of content in the form.

Figure 11. Defining content form regions

A structured document tag can provide a custom user interface to enhance the interaction with the user and provide a safer model for defining the allowed types of data. This user interface is only visible when the user is actively working with the placeholder. Viewing the document normally displays only the content. A structured document tag can define a title and placeholder text for visual identification.

To identify the structured document tag from code, apply a tag name. The tag name enables you to identify the placeholder independent of its location inside the document, which makes various solutions possible. For example, each practitioner could have a custom registration form with the practitioner's own logo and style. As long as the forms use the same tag names, applications that build on the registration forms can support all of them without revising the code for each practitioner.

Another advantage of the structured document tag is that it allows separation of the creation of the document template from the code that automates the document. Someone skilled at creating beautiful documents can work on presentation while you create the code. You only need to know the tag name and the type of the structured document tag. You do not need to know where these structured document tags are located inside the document.

Figure 12. Document tags available in WordprocessingML

Most structured document tags hold simple document content. The special tags are the group tag and the building block tag. You can use the group tag to organize content, including other document tags, into groups. Typically, you use it to lock down a specific piece of the document so that only the content of its child document tags can be edited. This enables you to create a form that contains editable placeholders while the form layout and text remain non-editable. The building block tag allows you to replace its content with a predefined building block defined in the document template. You can create your own building blocks, such as disclaimers or common clauses.

To create a structured document tag, you use the sdt element. The structured document tag has two distinct sections: properties (sdtPr) and content (sdtContent). Inside the properties, you define details such as the type of document tag, a title, and a tag name. The type is identified by a specific child element such as <date/>, and you can further customize the settings for each type of document tag inside the element that defines the type. The sdtContent element serves two purposes: it holds either the actual content of the structured document tag or its placeholder text. The showingPlcHdr property indicates that placeholder text is currently displayed, prompting the user to provide the appropriate data.

Figure 13. Example of placeholder text in a form and content within the structured document tag

The following example shows how to create a text control and set the text content.

<w:sdt>
    <w:sdtPr>
        <w:alias w:val="Last Name" />
        <w:tag w:val="Last Name" />
        <w:text />
    </w:sdtPr>
    <w:sdtContent>
        <w:r>
            <w:t>Predeek</w:t>
        </w:r>
    </w:sdtContent>
</w:sdt>
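Because controls are identified by tag name rather than position, code can find them wherever the form designer placed them. The sketch below searches the main document part (word/document.xml, assumed to be already loaded into an XDocument) for a control by its w:tag value and honors the showingPlcHdr flag described earlier. ReadContentControl is a hypothetical helper, not part of any Office API.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the text of the first content control whose w:tag
// value equals tagName, or null if there is no such control or if the control is
// still showing its placeholder text.
static string ReadContentControl(XDocument mainDocument, string tagName)
{
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    XElement sdt = mainDocument.Descendants(w + "sdt").FirstOrDefault(
        s => (string)s.Element(w + "sdtPr")?.Element(w + "tag")?.Attribute(w + "val") == tagName);
    if (sdt == null)
    {
        return null;
    }

    // The presence of w:showingPlcHdr in sdtPr means sdtContent holds placeholder text.
    if (sdt.Element(w + "sdtPr")?.Element(w + "showingPlcHdr") != null)
    {
        return null;
    }

    // Concatenate the text of every run inside the control's content.
    XElement content = sdt.Element(w + "sdtContent");
    return content == null
        ? string.Empty
        : string.Concat(content.Descendants(w + "t").Select(t => t.Value));
}
```

Returning null for placeholder text is what prevents the “Please enter your name” problem described at the start of this section.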

Binding Data to Custom XML Files

The structured document tag is a good first step toward a machine-readable patient registration document, and an additional feature brings further benefit to the custom solution. In our scenario, the data is extracted from the document and stored as an XML message in the back-end data store on the practitioner’s Web site. To ease the workload of applications that access document data, you can bind the structured document tag to an XML data island. This bridges the gap between document formatting and document data.

Figure 14. Binding the document tag to an XML data island

You can bind a structured document tag to an XML data island by using an XPath expression (the w:xpath attribute). This expression can include XML namespaces by using the common model of namespace prefixes. A separate mapping of namespace prefixes to XML namespaces is maintained in the properties of the structured document tag (the w:prefixMappings attribute). The properties can also identify a specific XML data island by its GUID (the w:storeItemID attribute). Without a GUID, the XPath expression is evaluated against the first matching data island in the document.

<w:sdt>
    <w:sdtPr>
        <w:alias w:val="Last Name" />
        <w:tag w:val="LastName" />
        <w:dataBinding
            w:prefixMappings="xmlns:ns0='http://www.example.org/ooxml/…
                …healthcare/registrationform'"
            w:xpath="/ns0:Data/ns0:Patient/ns0:Name/ns0:Last"
            w:storeItemID="{A5566E5B-6220-4F8C-BB50-CB2E486681CB}" />
        <w:text />
    </w:sdtPr>
    <w:sdtContent>
        <w:r>
            <w:t>Predeek</w:t>
        </w:r>
    </w:sdtContent>
</w:sdt>
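The following sketch shows what evaluating such a binding involves: the w:prefixMappings string is turned into an XmlNamespaceManager, and the w:xpath expression is evaluated against the data island with the LINQ to XML XPath extensions. It illustrates the idea under stated assumptions and is not how Word itself implements data binding; the helper EvaluateBinding is hypothetical, the regular-expression parsing of prefixMappings is a simplification, and only bindings that select an element are handled.

```csharp
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: evaluates a binding's XPath against a data island, using the
// prefixes declared in a w:prefixMappings value such as "xmlns:ns0='urn:example'".
static string EvaluateBinding(XDocument dataIsland, string prefixMappings, string xpath)
{
    var namespaces = new XmlNamespaceManager(new NameTable());
    foreach (Match m in Regex.Matches(prefixMappings, @"xmlns:([\w.-]+)\s*=\s*['""]([^'""]*)['""]"))
    {
        namespaces.AddNamespace(m.Groups[1].Value, m.Groups[2].Value);
    }

    // Simplification: element targets only; returns null when nothing matches.
    XElement target = dataIsland.XPathSelectElement(xpath, namespaces);
    return target?.Value;
}
```

Because the binding is resolved against the data island rather than the document body, the value stays the same no matter how the form's layout changes.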

You can add the <dataBinding> tag in Office Word by using the Microsoft Visual Basic for Applications (VBA) environment, or manually by using a tool such as Package Explorer. To enable a better experience for mapping structured document tags to the XML data islands in the document, you can also use the Content Control Toolkit. To download a free copy of the toolkit, see Word 2007 Content Control Toolkit.

Figure 15. Word 2007 Content Control Toolkit

To learn more about content controls and their data-binding capability, see Ted Pattison’s OfficeSpace column, Building Office Open XML Files.

Digital Signatures in the Open XML File Formats

To ensure that the medical records that the patient uploads to the PHR service can be traced back to a physician in the Contoso practice, each document is signed with a digital certificate issued to the physician or to the Contoso practice. Open XML provides an extensive environment for signing documents with digital signatures. You can create the signature by using industry-standard X.509 certificates or a custom implementation. You can sign an entire document, only specific parts, or the relationships between parts; you can even sign just a subset of the relationships, leaving out the styles and theme, for example.

Overview of Digital Signatures

Digital signature is an umbrella term for technologies that assert the authenticity and integrity of messages or, in this case, documents. A hash of the document content is signed with the signer's private key, and the resulting signature is stored inside the ZIP container. If you later need to confirm that the document is unchanged, or to identify the signer, the signature is checked with the signer's public key to recover the original hash, which is then compared to a new hash computed directly from the document. If the hashes are equal, the document has not changed since it was signed; and because only the holder of the private key could have produced a signature that the public key verifies, the identity of the signer is confirmed as well. Office Open XML Formats provide a standardized approach to signing a document.
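The hash-and-verify principle can be illustrated with the .NET cryptography classes. This sketch signs raw bytes with RSA and SHA-256 (both choices are assumptions made for illustration); it demonstrates the idea only and does not produce an Open XML signature, which the Packaging API stores as an XML digital signature part in the package. SignDocument and VerifyDocument are hypothetical helpers.

```csharp
using System.Security.Cryptography;

// Hypothetical helper: hashes the document bytes with SHA-256 and signs the hash
// with the RSA private key.
static byte[] SignDocument(byte[] document, RSA privateKey) =>
    privateKey.SignData(document, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

// Hypothetical helper: recomputes the hash of the document and checks it against
// the signature, using only the public key.
static bool VerifyDocument(byte[] document, byte[] signature, RSA publicKey) =>
    publicKey.VerifyData(document, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
```

Changing even one byte of the document makes VerifyDocument return false. The verifier needs only the public key, which in a signed Open XML document typically comes from the certificate stored with the signature.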

Signing a Document with the Open Packaging Conventions

The first step in signing a document is to collect the URIs of all the parts to sign by using the Package class of the Open Packaging Conventions API.

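using System;
using System.Collections.Generic;
using System.IO.Packaging;

// 'package' is an open System.IO.Packaging.Package, for example from Package.Open.
// Collect the URI of every part so that the whole document is signed.
PackagePartCollection parts = package.GetParts();
List<Uri> toSign = new List<Uri>();
foreach (PackagePart packagePart in parts)
{
    toSign.Add(packagePart.Uri);
}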

Next, you need an X.509 certificate to sign the document with. You can use the X509Store class to open the current user's certificate store and find a suitable certificate. This class, along with the other certificate-related classes, is part of the .NET Framework.

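using System.Security.Cryptography.X509Certificates;

// Open the current user's personal certificate store (read-only).
X509Store store = new X509Store(StoreLocation.CurrentUser);
store.Open(OpenFlags.ReadOnly | OpenFlags.OpenExistingOnly);
// keyName holds the subject name of the signer's certificate.
X509Certificate2Collection certificates = store.Certificates
    .Find(X509FindType.FindByKeyUsage, X509KeyUsageFlags.DigitalSignature, false)
    .Find(X509FindType.FindBySubjectName, keyName, false);
store.Close();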

The Packaging API uses the PackageDigitalSignatureManager class as the façade for working with signatures in Office Open XML Formats. You can use this class both to sign documents and to validate their signatures. Its constructor takes the Package object that represents the document.

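// Embed the certificate in the signature part, then sign the collected parts.
PackageDigitalSignatureManager dsm = new PackageDigitalSignatureManager(package);
dsm.CertificateOption = CertificateEmbeddingOption.InSignaturePart;
if (certificates.Count == 0)
    throw new InvalidOperationException("No certificate suitable for signing was found.");
dsm.Sign(toSign, certificates[0]);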

Validating Signatures in Office Word 2007

After the medical records document is signed by using the Packaging API, you can validate its authenticity in Office Word 2007, which automatically validates any signatures when you open the document. Word can also sign documents itself, which is the typical approach on the client side.

The first indication that a signature is present is the disabled user interface. Because editing the document invalidates all signatures, all controls dealing with editing are disabled.

Figure 16. The user interface is disabled for digitally signed documents

When a signature is invalid, the immediate feedback that you receive is an extra warning pane displayed across the screen.

Figure 17. Warning when the digital signature is invalid

You can display all the signatures in the document by using the Signatures task pane. To open this task pane, click the Microsoft Office Button, and then point to Prepare. You can remove any signature listed in the Signatures task pane by using the menu on that item.

Figure 18. Viewing the signature pane

Verifying the Signature Using the Open Packaging Conventions

Besides signing documents, you can also use the Packaging API to validate signatures, again through the PackageDigitalSignatureManager class. Its VerifySignatures method returns VerifyResult.Success when all signatures are valid. To validate an individual signature, retrieve it from the signature manager (for example, from its Signatures collection) and call its Verify method.

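PackageDigitalSignatureManager dsm = new PackageDigitalSignatureManager(package);
// false: check every signature instead of stopping at the first failure.
VerifyResult result = dsm.VerifySignatures(false);
bool allSignaturesValid = result == VerifyResult.Success;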

Conclusion

This article examined how Open XML file formats can benefit the health industry worldwide by providing an open, standardized approach for storing document content and a model for making business data available inside the document. The increase in data portability opens new possibilities for creating custom solutions, and because Open XML is a well-documented standard with many resources available, these solutions are easier to build and maintain. The use of industry standards such as Health Level Seven (HL7) Clinical Document Architecture (CDA) is just one example of where Open XML can benefit the industry. Many other sectors, such as insurance and government agencies, can equally benefit from working with Open XML, creating an environment where document exchange is easier and more transparent.

About the Authors

  • Ted Pattison is an author, trainer, and SharePoint MVP who lives in Tampa, Florida. Ted is a long-time columnist for MSDN Magazine. His column, called Office Space, focuses on Microsoft Office SharePoint Products and Technologies and Microsoft Office development. Ted has also just finished writing a new book for Microsoft Press entitled Inside Windows SharePoint Services 3.0. Ted Pattison and several other SharePoint MVPs deliver advanced SharePoint 2007 Training to professional developers through his company, Ted Pattison Group.

  • Chris Predeek is an independent consultant and trainer for Developmentor who has been working with Office Open XML since 2005. His software development career started in the late 1990s as a Microsoft C++ developer for the Windows operating system. He has worked with Microsoft technologies from COM and COM+ through to the Microsoft .NET Framework. Recently he has specialized in the 2007 Office system, Office SharePoint Server 2007, and Microsoft BizTalk 2006.

  • Wouter van Vugt lives in the Netherlands. He works for a leading IT consultancy company, Info Support. He started as a consultant and is now a trainer in technologies based on the .NET Framework, including Microsoft Visual C#, Microsoft ASP.NET, and SharePoint Products and Technologies. Blog: Read more from Wouter.