What's New in the Open XML SDK 2.0 for Microsoft Office

This topic describes the new and improved features included in the Open XML SDK 2.0 for Microsoft Office.

The Open XML SDK 2.0 for Microsoft Office is a collection of classes that let you create and manipulate Open XML documents, that is, documents that adhere to the Office Open XML File Formats Standard. Because the SDK provides an application programming interface that manipulates Open XML documents directly, you do not need the Office client products themselves, in either client or server operating environments. The SDK is designed to let you build high-performance client-side or server-side solutions that perform complex operations using only a small amount of program code.

This release of the SDK greatly extends support for the file formats and does so at a granular level, while also adding new features such as support for stream reading and writing, markup compatibility features, autosave, document validation checking and a unified developer productivity tool.

Strongly Typed Class Support for Office Markup Languages

The first release of the SDK supported working with documents at the part level only. To work with the XML inside a part, you had to use programmatic means outside of the SDK. With this release, the SDK supports working at both the part level and the document markup language level with strongly typed classes. This means you do not have to manipulate the XML markup directly if you do not want to, and you do not have to be aware of namespaces and element or attribute names. Every element in the Office Open XML File Format standard has a corresponding class in the SDK, and the classes are designed to be convenient to use. For example, a Paragraph class has a first-class property called ParagraphProperties that provides direct access to the properties of the paragraph. You work with the SDK classes, and they read and write the appropriate XML markup for you.

For example, suppose you want to create a Word document that contains the text “Hello World!” The following code example shows the XML markup for the document (with some markup omitted).

<w:document ... >
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
...
  </w:body>
</w:document>

The following code example shows a HelloWorld method that creates this WordprocessingML document.

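// Requires the following using directives at the top of the file:
//   using DocumentFormat.OpenXml;
//   using DocumentFormat.OpenXml.Packaging;
//   using DocumentFormat.OpenXml.Wordprocessing;
public void HelloWorld(string docName)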
{
    // Create a Wordprocessing document. 
    using (WordprocessingDocument package =
        WordprocessingDocument.Create(docName,
            WordprocessingDocumentType.Document)) 
    {
        // Add a new main document part. 
        package.AddMainDocumentPart(); 

        // Create the Document DOM. 
        package.MainDocumentPart.Document = 
          new Document( 
            new Body( 
              new Paragraph( 
                new Run( 
                  new Text("Hello World!"))))); 
    } 
}

Notice that the WordprocessingML elements used in the example are represented by classes, as listed in the following table.

WordprocessingML Element    Open XML SDK 2.0 Class
------------------------    ----------------------
document                    Document
body                        Body
p                           Paragraph
r                           Run
t                           Text

The same granular level of support exists for each of the three major markup languages: WordprocessingML, SpreadsheetML, and PresentationML.
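To illustrate the same approach for SpreadsheetML, the following is a minimal sketch, not a definitive implementation, that creates an empty workbook with one worksheet by using the strongly typed classes. The class name SpreadsheetSketch, the method name CreateWorkbook, and the sheet name "Sheet1" are illustrative assumptions.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SpreadsheetSketch
{
    // Hypothetical helper: creates a workbook that contains one empty sheet.
    public static void CreateWorkbook(string fileName)
    {
        using (SpreadsheetDocument package =
            SpreadsheetDocument.Create(fileName,
                SpreadsheetDocumentType.Workbook))
        {
            // Add the workbook part and its root element.
            WorkbookPart workbookPart = package.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();

            // Add a worksheet part with an empty SheetData element.
            WorksheetPart worksheetPart =
                workbookPart.AddNewPart<WorksheetPart>();
            worksheetPart.Worksheet = new Worksheet(new SheetData());

            // Register the worksheet in the workbook's sheet list.
            Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
            sheets.Append(new Sheet
            {
                Id = workbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "Sheet1"
            });
        }
    }
}
```

As in the HelloWorld method, each SpreadsheetML element (workbook, sheets, sheet, worksheet, sheetData) is represented by a class of the corresponding name.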

Thus, this release of the Open XML SDK 2.0 handles the structure of the Office Open XML File Formats in terms of parts and their relationships, as well as the XML contained in each of the parts in terms of classes and properties that correspond to markup language elements and attributes.
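The part and relationship structure mentioned above can be inspected with a few lines of code. The following sketch, under the assumption that the document has a main document part, lists each part that the main document part refers to. The class and method names are illustrative.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

public static class PartListingSketch
{
    // Hypothetical helper: prints the relationship ID, URI, and content type
    // of every part that the main document part has a relationship to.
    public static void ListMainDocumentRelationships(string docName)
    {
        // Open read-only; nothing is modified.
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(docName, false))
        {
            foreach (IdPartPair pair in doc.MainDocumentPart.Parts)
            {
                Console.WriteLine("{0}: {1} ({2})",
                    pair.RelationshipId,
                    pair.OpenXmlPart.Uri,
                    pair.OpenXmlPart.ContentType);
            }
        }
    }
}
```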

Layered Architecture

The Open XML SDK 2.0 is designed and implemented using a layered approach that, from a logical design point of view, starts from a base layer and builds up towards higher level functionality. The base or system support layer includes the fundamental components that the SDK is built upon, such as .NET Framework 3.5, System.IO.Packaging, and the Open XML File Format Schemas that are specified in the ECMA-376 and ISO/IEC 29500 standards.

The next higher level, the Office Open XML File Format Layer, supports new stream reading and writing features, an Open XML packaging API that provides strongly typed classes built on top of the System.IO.Packaging component, and the new Open XML low-level document object model that lets you work with an Open XML markup language using strongly typed classes. This layer now includes support for both the Office 2007 and Office 2010 document file formats. For example, new part-level objects have been added to accommodate new parts added to the file format, in addition to classes for new XML elements.

Finally, the topmost level, the Higher Level Services layer, supports the new document validation features. The following table summarizes the layered architecture.

Level                                Description
---------------------------------    -----------
Higher Level Services                This layer supports document validation against the Office 2007 and Office 2010 file formats.
Office Open XML File Format Layer    This layer includes the stream reading and writing support, the packaging API, and the Open XML low-level document object model strongly typed classes.
System Support                       This layer represents the fundamental building blocks that include .NET Framework 3.5, System.IO.Packaging, and the Office Open XML File Format Schema.

The layered approach to the SDK means it is extensible. Customers can extend the library to fit their needs.

Support for LINQ to XML Technology

New to this release is support for .NET Language-Integrated Query (LINQ) technology in the form of LINQ to XML. This makes it easy to query a document, for example to find all elements or objects of a certain type.
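As a rough illustration of this kind of query, the following sketch runs a LINQ query expression over the strongly typed classes to print the text of every non-empty paragraph. It is one possible approach rather than the only supported pattern, and the class and method names are illustrative assumptions.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class LinqQuerySketch
{
    // Hypothetical helper: prints the text of each paragraph that has text.
    public static void PrintNonEmptyParagraphs(string docName)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(docName, false))
        {
            // Descendants<Paragraph>() yields every paragraph element in
            // the main document part; the where clause filters empty ones.
            var texts =
                from p in doc.MainDocumentPart.Document
                    .Descendants<Paragraph>()
                where !string.IsNullOrEmpty(p.InnerText)
                select p.InnerText;

            foreach (string text in texts)
            {
                Console.WriteLine(text);
            }
        }
    }
}
```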

Open XML SDK 2.0 Productivity Tool

The new Open XML SDK 2.0 Productivity Tool for Microsoft Office provides a number of features designed to improve your productivity while working with the SDK and Open XML files.

Using the Productivity Tool, you can generate Open XML SDK 2.0 source code based on document content. The source code can be used to regenerate all or part of the document. You can also use the tool to compare source and target Open XML documents to highlight the differences. You can reveal differences in part structure as well as content differences. Based on those differences, you can generate source code that employs the Open XML SDK 2.0 to generate the target document from the source.

Also supported are validation features. You can validate an entire document, a specific document part, or a segment of content against the Office 2007 or Office 2010 file formats. In addition, you can display API documentation for the classes, methods and properties in the Open XML SDK, as well as the relevant corresponding information from the ISO/IEC 29500 Open XML File Formats standard and the Microsoft Office implementer notes. All of this information is available at-a-glance in a tabbed interface.

Markup Compatibility

The ISO/IEC 29500 Office Open XML File Formats specification discusses markup compatibility and extensibility in Part 3. Markup compatibility is the ability of a document expressed in one of the Office Open XML markup languages to facilitate interoperability between applications, or versions of an application, with different feature sets. The specification defines XML attributes to express compatibility rules, and XML elements to specify alternate content. For example, the Ignorable attribute specifies namespaces that can be ignored when they are not understood by the consuming application. AlternateContent elements specify markup alternatives that can be chosen by an application at run time. For example, Office Word 2007 can choose only the markup alternative that it recognizes. The complete list of compatibility-rule attributes and alternate-content elements and their details can be found in the specification.

The Open XML SDK 2.0 now supports markup compatibility in accordance with the preprocessing model described in ISO/IEC 29500 Part 3.13. This means you can specify that the SDK consider a document to be Office 2007 or Office 2010 compatible, and it will do the work of removing elements and attributes from non-understood namespaces, and processing markup compatibility elements appropriately.

For example, the following code example opens a word processing document in Office 2007 compatibility mode and prints the XML of the first run, which contains only Office 2007 content.

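// Requires: using System; using System.Linq;
//           using DocumentFormat.OpenXml;
//           using DocumentFormat.OpenXml.Packaging;
//           using DocumentFormat.OpenXml.Wordprocessing;

// Ask the SDK to process loaded parts as Office 2007 content.
OpenSettings openSettings = new OpenSettings();

openSettings.MarkupCompatibilityProcessSettings =
    new MarkupCompatibilityProcessSettings(
        MarkupCompatibilityProcessMode.ProcessLoadedPartsOnly,
        FileFormatVersions.Office2007);

using (WordprocessingDocument wordDoc =
    WordprocessingDocument.Open(
        @"c:\temp\test.docx",
        true,
        openSettings))
{
    // FirstOrDefault returns null when the document contains no runs;
    // First would throw instead, so the null check would never apply.
    Run r =
        wordDoc.MainDocumentPart.Document.Descendants<Run>().FirstOrDefault();

    if (r != null)
        Console.WriteLine(r.OuterXml);
}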

Stream Reading and Writing

New to this release is a stream model for accessing XML content in a document part. For applications that have strict memory requirements and that load large Open XML documents, the stream model lets you avoid having to load an entire part into memory. For example, using the OpenXmlReader class, you can write code that reads the XML content in a part in a forward-only fashion without first loading the whole part into memory.
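The following sketch shows one way the forward-only reading model might be used: it counts the paragraphs in the main document part without building the full document object model. The class and method names are illustrative assumptions.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class StreamReadingSketch
{
    // Hypothetical helper: counts paragraph elements by streaming
    // through the main document part.
    public static int CountParagraphs(string docName)
    {
        int count = 0;

        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(docName, false))
        using (OpenXmlReader reader =
            OpenXmlReader.Create(doc.MainDocumentPart))
        {
            // Read moves forward one node at a time; only the current
            // node is held by the reader.
            while (reader.Read())
            {
                // Count start tags only, so each paragraph is counted once.
                if (reader.ElementType == typeof(Paragraph)
                    && reader.IsStartElement)
                {
                    count++;
                }
            }
        }

        return count;
    }
}
```

The trade-off is that a forward-only reader cannot navigate back to earlier content, so this style suits single-pass tasks such as counting, searching, or extracting values.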

Document Validation

When you use code to manipulate Open XML files, it is possible that you may introduce file format errors. Typically, these have to do either with the structure of the XML itself, or with content that does not conform to the documentation of the Open XML standard. This SDK provides new functionality that enables you to identify these types of errors.

You can use the new classes in the DocumentFormat.OpenXml.Validation namespace to validate and find errors in your documents. For example, you can use the Open XML system support layer to inject XML directly into parts in your document, but that will not guarantee the production of valid, compliant Open XML files. You can now check whether the full or partial content in a document is valid using the OpenXmlValidator class. This class enables you to enumerate all of the errors within a file, where each error is represented by a ValidationErrorInfo object. From there you can get a description of the error, an XPath to the exact location of the error, and the part where the error exists.

You can validate a full document, a package, a part in the package or a section of content represented by an element.

The following code example shows how to use the new validation features.

// Requires: using System;
//           using DocumentFormat.OpenXml.Packaging;
//           using DocumentFormat.OpenXml.Validation;
try
{
    OpenXmlValidator validator = new OpenXmlValidator();
    int count = 0;

    // Dispose the document when validation finishes so that the
    // underlying file is released.
    using (WordprocessingDocument doc =
        WordprocessingDocument.Open("InvalidFile.docx", true))
    {
        foreach (ValidationErrorInfo error in validator.Validate(doc))
        {
            count++;
            Console.WriteLine("Error " + count);
            Console.WriteLine("Description: " + error.Description);
            Console.WriteLine("Path: " + error.Path.XPath);
            Console.WriteLine("Part: " + error.Part.Uri);
            Console.WriteLine("-------------------------------------------");
        }
    }
    Console.ReadKey();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}

You can also specify that the validator should validate the content based on known Office 2007 rules or Office 2010 rules. The setting is configurable in the validator constructor.
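As a sketch of that constructor setting, the following example validates a single part, the main document part, against the Office 2007 rules and returns the number of errors. The class and method names are illustrative assumptions.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class VersionedValidationSketch
{
    // Hypothetical helper: validates only the main document part against
    // Office 2007 rules and returns the number of errors found.
    public static int CountOffice2007Errors(string docName)
    {
        // The file format version is chosen in the constructor.
        OpenXmlValidator validator =
            new OpenXmlValidator(FileFormatVersions.Office2007);
        int count = 0;

        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(docName, false))
        {
            // Passing a part, rather than the whole package, limits
            // validation to that part.
            foreach (ValidationErrorInfo error in
                validator.Validate(doc.MainDocumentPart))
            {
                count++;
                Console.WriteLine(error.Description);
            }
        }

        return count;
    }
}
```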

AutoSave

The Open XML SDK 2.0 now supports an AutoSave feature that automatically saves changes in the document object model back to the file stream. This makes most calls to a Save method unnecessary, provided that you open the document in a using statement within your code. The AutoSave setting is enabled by default and can optionally be turned off.
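The following sketch shows how AutoSave might be turned off through the OpenSettings class, in which case changes are written only when Save is called explicitly. The class and method names are illustrative, and the document is assumed to already contain a body.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AutoSaveSketch
{
    // Hypothetical helper: inserts a paragraph at the start of the body
    // with AutoSave disabled, then saves explicitly.
    public static void PrependParagraph(string docName, string text)
    {
        OpenSettings settings = new OpenSettings();
        settings.AutoSave = false;

        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(docName, true, settings))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            body.PrependChild(new Paragraph(new Run(new Text(text))));

            // With AutoSave off, this call is what writes the change
            // back to the part.
            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

With the default setting, the explicit Save call can be omitted, because disposing the document at the end of the using block saves the changes.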

For more information about the Open XML SDK, see the Microsoft Connect site.

© 2014 Microsoft