Creating Documents by Using the Open XML Format SDK Version 2.0 CTP (Part 1 of 3)

Summary: Explore the architecture and advantages of using the Open XML Format SDK 2.0 (CTP). Additionally, see in-depth scenarios of how to assemble and manipulate data in Microsoft Office Excel 2007 workbooks, Microsoft Office PowerPoint 2007 presentations, and Microsoft Office Word 2007 documents by using the Open XML Format SDK APIs. (8 printed pages)

Zeyad Rajabi, Microsoft Corporation

Frank Rice, Microsoft Corporation

February 2009

Applies to: Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, Microsoft Office Word 2007

Contents

  • Introduction to This Series of Articles

  • What is the Open XML Format SDK?

  • Why Use the Open XML Format SDK?

  • Open XML Format SDK Limitations

  • Open XML Format SDK Roadmap

  • Scenarios Targeted by the Open XML Format SDK

  • Introducing the Open XML Format SDK Architecture

  • Summary

  • Additional Resources

Introduction to This Series of Articles

This series of articles explores the overall design of the Open XML Format SDK 2.0 Community Technical Preview (CTP) with respect to its goals and scenarios. Creating Documents by Using the Open XML Format SDK Version 2.0 CTP (Part 1 of 3) examines the architecture of the Open XML Format SDK in depth. Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 2 of 3) and Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 3 of 3) present common scenarios and provide extensive sample code.

You can access the Open XML Format SDK 2.0 (CTP) from the following locations:

What is the Open XML Format SDK?

The Open XML Format SDK provides a set of Microsoft .NET Framework application programming interfaces (APIs) that allow you to create and manipulate documents in both client and server environments by using the Open XML Formats, without requiring Microsoft Office client applications. The SDK makes it easier to build solutions that use the Open XML Formats by letting you perform complex operations, such as creating Open XML Format packages or adding and deleting tables, with just a few lines of code. For example, the following code sample creates a basic WordprocessingML document and inserts the text Hello World! into it:

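// Namespaces as used by the released SDK 2.0; names in the CTP may differ.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public void HelloWorld(string docName) 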
{
  // Create a Wordprocessing document. 
  using (WordprocessingDocument package = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
  {
    // Add a new main document part. 
    package.AddMainDocumentPart(); 

    // Create the Document DOM. 
    package.MainDocumentPart.Document = 
      new Document( 
        new Body( 
          new Paragraph( 
            new Run( 
              new Text("Hello World!"))))); 

    // Save changes to the main document part. 
    package.MainDocumentPart.Document.Save(); 
  } 
}

The SDK exposes both the structure of an Open XML Format package and the XML contained in each of the package's document parts. With this SDK, you can add or remove document parts within a package and manipulate XML constructs, such as paragraphs and tables.

The SDK also supports programming in the style of LINQ to XML, which makes coding against XML content much easier than the traditional W3C XML Document Object Model (DOM) programming model.
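As a rough illustration of this query style, the following sketch reads the text of every non-empty paragraph in a document. It is written against the later, released Open XML SDK 2.0 API (the DocumentFormat.OpenXml namespaces); member names in the CTP may differ, and the ParagraphQueries class and GetParagraphTexts method are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphQueries
{
    // Returns the text of every non-empty paragraph in the main document part.
    public static List<string> GetParagraphTexts(string docName)
    {
        // Open read-only; no changes are saved back to the file.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docName, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // A LINQ query over strongly-typed Paragraph objects instead of
            // XML nodes looked up by namespace and element name.
            return body.Descendants<Paragraph>()
                       .Select(p => p.InnerText)
                       .Where(text => text.Length > 0)
                       .ToList();
        }
    }
}
```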

Why Use the Open XML Format SDK?

Using the Open XML Format SDK to create solutions that manipulate documents directly has many advantages over automating Microsoft Office applications by using macros or Microsoft Visual Basic for Applications (VBA). For more information about the considerations of automating Microsoft Office applications on the server, see the following Microsoft Knowledge Base article: Considerations for server-side Automation of Office.

The major advantage of using the Open XML Format SDK is that, unlike automating Microsoft Office applications, it is fully supported on the server. That means you can create managed code solutions that are scalable and stable on the server, including multithreaded solutions built on top of the SDK.

In addition, developing solutions with the Open XML Format SDK offers a significant performance advantage, which is most evident when you process large numbers of documents. You can programmatically generate thousands of documents from database data in a matter of seconds rather than hours.

Finally, the Open XML Format SDK is a dedicated file format API that specializes in creating and manipulating Open XML Format packages. The SDK is fully compliant with the structure and schemas of the Open XML Formats.

Open XML Format SDK Limitations

Before getting into the design of the Open XML Format SDK, note the following key limitations. The Open XML Format SDK:

  • Is not a replacement for the Microsoft Office Object Model and provides no abstraction on top of the file formats. You still need to understand the structure of the file formats to use the Open XML Format SDK.

  • Does not provide functionality to convert Open XML Formats to and from other formats, like HTML or XPS.

  • Does not guarantee that the resulting Open XML Format documents are valid, whether you use the SDK or choose to manipulate the underlying XML directly.

  • Does not provide application behaviors such as layout (for example, pagination of WordprocessingML documents) or recalculation functionality.

Open XML Format SDK Roadmap

There are currently two releases of the Open XML Format SDK:

The Open XML Format SDK 1.0 addresses the package structure of Open XML Formats, while the Open XML Format SDK 2.0 addresses the XML contained within each of the document parts. Open XML Format SDK 1.0 was published in June 2008 with a license that includes the right to use and distribute, so you can build and deploy production solutions by using it. Open XML Format SDK 2.0 (CTP) was published in September 2008. This version is pre-release software, and its APIs are still under development. Its license includes the right to use but not to distribute. For this reason, you should not use this SDK in deployed or production solutions.

Scenarios Targeted by the Open XML Format SDK

The Open XML Format SDK, in all versions, is intended for developers who understand the Open XML Format standard and are comfortable manipulating and creating Open XML Format files. The Open XML Format SDK targets the following core scenarios:

Strongly-Typed Classes and Objects

Instead of relying on generic XML functionality, where you must get the spelling of elements, attributes, values, and namespaces exactly right, you can use the Open XML Format SDK to accomplish the same tasks by manipulating objects that represent those elements and attributes. All schema types are represented as strongly-typed common language runtime (CLR) classes, and attribute values drawn from a fixed set of choices are represented as enumerations. You do not need to keep referring to the Open XML Format standard and schemas for hierarchy, spelling, and namespaces; instead, you can use Microsoft Visual Studio IntelliSense to develop solutions faster and more reliably.
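For example, the following sketch builds a centered, bold paragraph without spelling out any element names, namespaces, or attribute strings. The justification value comes from the JustificationValues enumeration, so a misspelled value fails at compile time instead of producing invalid markup. It assumes the released SDK 2.0 API; the StronglyTypedExample class and its method are hypothetical names.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class StronglyTypedExample
{
    // Builds <w:p><w:pPr><w:jc w:val="center"/></w:pPr><w:r><w:rPr><w:b/></w:rPr><w:t>text</w:t></w:r></w:p>
    // by composing strongly-typed objects; the SDK supplies the names and namespace.
    public static Paragraph CreateCenteredBoldParagraph(string text)
    {
        return new Paragraph(
            new ParagraphProperties(
                // Enumeration value instead of the literal string "center".
                new Justification { Val = JustificationValues.Center }),
            new Run(
                new RunProperties(new Bold()),
                new Text(text)));
    }
}
```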

Content Construction, Search, and Manipulation

LINQ to XML technology is built directly into the Open XML Format SDK, so you can use functional construction and lambda-expression queries directly on objects that represent Open XML Format elements. In addition, the Open XML Format SDK lets you traverse and manipulate content easily by providing support for collections of objects, such as tables and paragraphs.
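The following sketch combines element collections with lambda-expression queries to pull the first-column text from each row of every top-level table in a document body. It assumes the released SDK 2.0 API; TableTraversal and FirstColumnValues are hypothetical names.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TableTraversal
{
    // Returns the text of the first cell in every row of each top-level table.
    public static string[] FirstColumnValues(Body body)
    {
        return body.Elements<Table>()                           // direct child tables only
                   .SelectMany(table => table.Elements<TableRow>())
                   .Select(row => row.Elements<TableCell>().First().InnerText)
                   .ToArray();
    }
}
```

Because the query works on an in-memory Body, you can run it against a tree you just built with functional construction or against the Body of an opened document.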

Validation

You can specify the version of the Open XML Formats that you want to target, and the API takes this version into account during validation.

Markup Language-Specific Scenarios

With the Open XML Format SDK, you can perform a variety of tasks on all types of Open XML Format packages and Open XML Format markup languages. For example, you can construct tables with dynamic data in a WordprocessingML document, extract and analyze data in a SpreadsheetML workbook, search and report noncompliant content in a PresentationML presentation, or change shape colors in DrawingML.

Introducing the Open XML Format SDK Architecture

The Open XML Format SDK is implemented in layers, starting from a base layer and moving toward higher-level functionality, such as validation. The following diagram provides an overview of the Open XML Format SDK components.

Figure 1. The Open XML Format SDK component architecture


The System Support layer contains the fundamental components upon which the Open XML Format SDK is built. The Open XML File Format Base Level layer is the core foundation of the Open XML Format SDK. This layer gives you the functionality to create Open XML Format packages, add or remove document parts, and read, add, remove, or manipulate XML elements and attributes. The Open XML File Format Higher-Level layer builds on the base level and provides functionality that makes it easier for you to code against Open XML Formats. For example, this layer could contain schema-level and semantic-level validation to help you generate proper and valid Open XML Format files. These layers and components are described in the following sections. Note that the Open XML Format SDK 1.0 provides only the Open XML Packaging API component, whereas the Open XML Format SDK 2.0 provides all components built on top of the System Support layer.

System Support Layer

Figure 2. The Open XML Format SDK System Support layer


The System Support layer consists of the following components:

  • .NET Framework 3.5. The Open XML Format SDK relies on technology provided by .NET Framework 3.5, especially LINQ to XML, which makes manipulating XML much easier and more intuitive.

  • System.IO.Packaging. The Open XML Format SDK must add and remove parts contained within Open XML Format packages. .NET Framework 3.0 includes a set of generic packaging APIs that can add and remove parts of packages that conform to the Open Packaging Conventions (OPC). Because Open XML Formats are based on OPC, the Open XML Format SDK uses the System.IO.Packaging APIs to open, edit, create, and save Open XML Format packages.

  • Open XML Schemas. The Open XML Format SDK is based on Open XML Formats, which are represented and described as schemas. These schemas form the foundation of the Open XML Format SDK. Currently the Open XML Format SDK is based on Standard Ecma-376.
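To show what the generic System.IO.Packaging layer offers on its own, the following sketch lists every part in a package, such as a .docx file, by using only System.IO.Packaging. Each part is just a URI and a content type; this API has no notion of a main document part or a styles part, which is what the strongly-typed layer above it adds. The PackageInspector class is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

public static class PackageInspector
{
    // Lists every part in an OPC package as "uri (content type)".
    public static List<string> ListParts(string path)
    {
        var result = new List<string>();
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // Parts are identified only by URI and MIME-style content type.
                result.Add(part.Uri + " (" + part.ContentType + ")");
            }
        }
        return result;
    }
}
```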

Open XML File Format Base Level Layer

Figure 3. The Open XML File Format Base Level layer


The Open XML File Format Base Level layer provides a platform for you to create Open XML Format solutions and consists of the following components:

  • Open XML Packaging API. This component is built on the .NET Framework 3.0 System.IO.Packaging component. Instead of providing generic access to the document parts contained in the Open XML Format package, this component allows you to manipulate Open XML Format parts with strongly-typed classes and objects. This component was included as part of the Open XML Format SDK 1.0. The following sample code illustrates using this component to open and manipulate a WordprocessingML document.

    //Open and manipulate temp.docx. 
    
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open("temp.docx", true)) 
    { 
       //Access main part of document. 
       MainDocumentPart mainPart = myDoc.MainDocumentPart; 
       //Add new comments part to document. 
       mainPart.AddNewPart<WordprocessingCommentsPart>(); 
       //Delete Styles part within document, if it has one. 
       if (mainPart.StyleDefinitionsPart != null) 
       { 
          mainPart.DeletePart(mainPart.StyleDefinitionsPart); 
       } 
       //Iterate through all custom xml parts within document. 
       foreach (CustomXmlPart customXmlPart in mainPart.CustomXmlParts) 
       { 
          //Do something. 
       } 
    }
    
  • Open XML Low-Level DOM. This component represents the XML wrapper of the Open XML Format schemas. You can use this component to manipulate the Open XML Format tree directly by working with strongly-typed objects and classes instead of traditional XML nodes that require you to know namespaces and element and attribute names. The major advantage of strongly-typed classes and objects is that you can easily see, through IntelliSense, which properties are defined on a given class. For example, you can quickly identify which properties and child elements a Paragraph object can have. This component follows many LINQ design patterns. The following sample code illustrates using this component to create a WordprocessingML document with the text Hello World!

    // Create a Wordprocessing document. 
    using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
    { 
       // Add a new main document part. 
       MainDocumentPart mainPart = myDoc.AddMainDocumentPart(); 
       //Create DOM tree for simple document. 
       mainPart.Document = new Document(); 
       Body body = new Body(); 
       Paragraph p = new Paragraph(); 
       Run r = new Run(); 
       Text t = new Text("Hello World!"); 
       //Append elements appropriately. 
       r.Append(t); 
       p.Append(r); 
       body.Append(p); 
       mainPart.Document.Append(body); 
       // Save changes to the main document part. 
       mainPart.Document.Save(); 
    }
    
  • Stream Reading/Writing. This component includes stream reader and writer interfaces specifically targeting Open XML Format elements and attributes. The readers and writers function similarly to the XmlReader class and XmlWriter class, but are easier to use because the interfaces are Open XML Format aware.
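As an illustration of the streaming approach, the following sketch counts the paragraphs in a document without loading the whole main document part into a DOM tree. It uses the OpenXmlReader class as it exists in the released SDK 2.0; the reader interfaces in the September CTP may have different names or members, and StreamingReader is a hypothetical name.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class StreamingReader
{
    // Counts <w:p> elements by streaming forward through the main document part.
    public static int CountParagraphs(string docName)
    {
        int count = 0;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docName, false))
        using (OpenXmlReader reader = OpenXmlReader.Create(doc.MainDocumentPart))
        {
            while (reader.Read())
            {
                // Like XmlReader, the reader reports start and end tags; count starts only.
                if (reader.ElementType == typeof(Paragraph) && reader.IsStartElement)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```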

All the components mentioned previously are available in the September CTP of the Open XML Format SDK 2.0. The following two layers are still under development.

Open XML File Format Higher-Level Layer

Figure 4. The Open XML File Format Higher-Level layer


Note

This layer is still under development.

The Open XML File Format Higher-Level layer provides functionality to help you debug and validate Open XML Format files. To accomplish this, the layer provides the following components:

  • Schema-level Validation. The Open XML File Format Base Level layer makes it easier to work with Open XML Format files, but it does not prevent you from producing markup that violates the schemas. This component helps you debug and validate Open XML Format documents against the Open XML Format schemas.

  • Additional Semantic Validation. This component is similar to the Schema-level Validation component, except that it provides additional information based on semantic and syntax constraints defined by the Open XML Format standard. For example, for a comment to work as expected within a WordprocessingML document, you must define the comment in the Comments part and mark it appropriately in the main document story; otherwise, the comment is ignored.

Because constraints of this type cannot be expressed in XSD schema files, the standard describes them in prose. With this layer, you can use code in the Open XML Format SDK 2.0 to check them instead of verifying them manually.
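The comment constraint described above can be sketched in code. The following example creates a document whose single comment is wired up on both sides: the Comment element lives in the comments part, and matching CommentRangeStart, CommentRangeEnd, and CommentReference elements with the same Id appear in the main document story. It assumes the released SDK 2.0 API; the CommentExample class, the author name, and the comment text are illustrative only.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CommentExample
{
    public static void CreateDocumentWithComment(string docName)
    {
        using (WordprocessingDocument doc =
                   WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            const string commentId = "0";

            // 1. Define the comment itself in the Comments part.
            WordprocessingCommentsPart commentsPart =
                mainPart.AddNewPart<WordprocessingCommentsPart>();
            commentsPart.Comments = new Comments(
                new Comment(
                    new Paragraph(new Run(new Text("Please review this sentence."))))
                { Id = commentId, Author = "Reviewer", Initials = "R" });
            commentsPart.Comments.Save();

            // 2. Mark the commented range and add a reference with the same Id
            //    in the main document story; without this, the comment is ignored.
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new CommentRangeStart { Id = commentId },
                        new Run(new Text("Hello World!")),
                        new CommentRangeEnd { Id = commentId },
                        new Run(new CommentReference { Id = commentId }))));
            mainPart.Document.Save();
        }
    }
}
```

Neither half is invalid against the schema on its own, which is why this kind of rule needs semantic rather than schema-level checking.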

Helper Function Layer

Figure 5. The Helper Function layer


Note

This layer is still under development.

As with the Open XML File Format Higher-Level layer, the Helper Function layer is still under development. When implemented, this layer will provide helper functions or code examples that simplify creating valid Open XML Format files. Certain operations within Open XML Formats can be complex; for example, deleting a paragraph in a WordprocessingML document is not simply a matter of deleting the paragraph node. Several extra steps are required to delete a paragraph and maintain the integrity of a valid Open XML Format document.

With this in mind, the Open XML Format SDK 2.0 (CTP) is intended to provide higher-level helper functions or code examples that address common, complex file format operations. These helper functions modify the appropriate XML, document parts, and relationships when performing complex tasks. They do not abstract away the actual XML, but instead operate on the XML elements in a validation-aware way. For example, a potential helper function for deleting a WordprocessingML paragraph would perform the delete operation along with the steps required to clean up the resulting XML and ensure validity. Similar delete helper functions could apply to other elements that are hard to delete, such as tables and comments. These higher-level functions operate directly on the XML elements, and the file format standard itself constrains their behavior.
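The following sketch suggests what a paragraph-delete helper might look like. It is hypothetical (DeleteParagraph is not an SDK API) and handles only one cleanup step: Word expects every table cell to end with a paragraph, so the helper appends an empty paragraph when the deletion would leave a cell without one. A real helper would also need to handle cases such as comment or bookmark markers that span the deleted paragraph. It assumes the released SDK 2.0 API.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphHelpers
{
    // Hypothetical helper, not an SDK API: removes a paragraph and repairs one
    // validity problem that a plain node deletion can leave behind.
    public static void DeleteParagraph(Paragraph paragraph)
    {
        OpenXmlElement parent = paragraph.Parent;
        if (parent == null)
        {
            return; // Not attached to a tree; nothing to delete.
        }

        paragraph.Remove();

        // Assumed rule: a table cell must still end with a paragraph, so append
        // an empty one if the cell's last child is no longer a paragraph.
        if (parent is TableCell && !(parent.LastChild is Paragraph))
        {
            parent.Append(new Paragraph());
        }
    }
}
```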

Summary

Now that you are familiar with the goals and architecture of the Open XML Format SDK 2.0 (CTP), Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 2 of 3) and Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 3 of 3) explore common solutions where you can use the Open XML Format SDK 2.0 APIs with Excel 2007, PowerPoint 2007, and Word 2007 to build documents, work with data sources, and take advantage of content controls.

Additional Resources

For more information about the Open XML Format SDK, see the following resources: