Extracting Styles from Word 2010 Documents by Using the Open XML SDK 2.0

Office Visual How To

Summary:  Use the strongly typed classes in the Open XML SDK 2.0 to retrieve an XDocument instance that contains the styles or stylesWithEffects part from a Microsoft Word document, without loading the document into Word.

Applies to: Excel 2010 | Office 2010 | Office client | Open XML | PowerPoint 2010 | VBA | Word 2010

Published:  February 2011

Provided by:  Ken Getz, MCW Technologies, LLC

Overview

The Open XML file formats make it possible to retrieve blocks of content from Microsoft Word documents, but doing so requires some effort. The Open XML SDK 2.0 adds strongly typed classes that simplify access to the Open XML file formats. For example, the SDK simplifies the task of retrieving an XDocument instance that contains the document's styles or stylesWithEffects part. Given the XML content, you could archive the information, modify and reapply it, or apply it to a new document. The code sample that is included with this Visual How To shows how to use the SDK to retrieve the styles or stylesWithEffects part; pushing the styles into a new document is not covered in this code sample.

Code It

The sample provided with this Visual How To includes the code necessary to retrieve an XDocument instance that contains the styles or stylesWithEffects part for a Word 2007 or Word 2010 document. The following sections explain the code in detail. Note that a document created in Word 2007 contains only a single styles part; Word 2010 adds a second part, stylesWithEffects. To support round-tripping a document from Word 2010 to Word 2007 and back, Word 2010 maintains both the original styles part and the new stylesWithEffects part. The Open XML specification requires that Microsoft Word disregard parts that it does not recognize, so Word 2007 does not notice the stylesWithEffects part that Word 2010 adds to the document. Your application must interpret the results of retrieving the styles or stylesWithEffects part.

Setting Up References

To use the code from the Open XML SDK 2.0, you must add a few references to your project. The sample project already includes these references, but in your own code, you would need to explicitly reference the following assemblies:

  • WindowsBase: This reference may be set for you, depending on the type of project you create.

  • DocumentFormat.OpenXml: Installed by the Open XML SDK 2.0.

Also, you should add the following using/Imports statements to the top of your code file.

Imports System.IO
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

Examining the Procedure

The WDExtractStyles procedure accepts two parameters. The first is a string that contains the path to the file to examine. The second indicates whether you want to retrieve the styles part or the newer stylesWithEffects part; to retrieve both parts from a Word 2010 document, call the procedure twice, once for each part. The procedure returns an XDocument instance that contains the complete styles or stylesWithEffects part that you requested, with all the style information for the document, or a null reference if the requested part does not exist.

Public Function WDExtractStyles(
  ByVal fileName As String,
  Optional ByVal getStylesWithEffectsPart As Boolean = True) As XDocument

public static XDocument WDExtractStyles(
  string fileName, 
  bool getStylesWithEffectsPart = true)

The procedure examines the document you specify, looking for the requested styles or stylesWithEffects part. If the part exists, the procedure returns it as an XDocument instance. To call the procedure, pass the parameter values, as shown in the example code. Before running the sample code, make sure that a document named C:\temp\StylesFrom.docx exists; for demonstration purposes, it should contain some modified styles.

Dim styles = WDExtractStyles("C:\Temp\StylesFrom.docx", True)
If styles IsNot Nothing Then
  Console.WriteLine(styles.ToString())
End If

var styles = WDExtractStyles(@"C:\temp\StylesFrom.docx", true);
if (styles != null)
  Console.WriteLine(styles.ToString());
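
Once you have the XDocument instance, you can query it with LINQ to XML rather than printing it whole. The following is a minimal sketch, not part of the original sample: the GetStyleIds helper and the class name are hypothetical, and the inline XML is a hand-written stand-in for the XDocument that WDExtractStyles would return. It assumes the standard WordprocessingML namespace, in which each style is a w:style element with a w:styleId attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StyleListingSketch
{
    // The main WordprocessingML namespace (the "w" prefix in styles.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the w:styleId of every w:style element,
    // in document order, skipping elements that have no w:styleId.
    public static List<string> GetStyleIds(XDocument styles)
    {
        return styles.Descendants(W + "style")
            .Select(s => (string)s.Attribute(W + "styleId"))
            .Where(id => id != null)
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for the result of WDExtractStyles (assumption: real
        // styles parts contain far more content than this).
        var sample = XDocument.Parse(
            "<w:styles xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:style w:type=\"paragraph\" w:styleId=\"Normal\"/>" +
            "<w:style w:type=\"paragraph\" w:styleId=\"Heading1\"/>" +
            "</w:styles>");

        foreach (var id in GetStyleIds(sample))
            Console.WriteLine(id);
    }
}
```

In your own code, you would pass the non-null result of WDExtractStyles to a helper like this in place of the sample XDocument.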

Accessing the Document

The code starts by creating a variable named styles that the procedure returns before it exits.

Dim styles As XDocument = Nothing
' Code removed here…
Return styles

XDocument styles = null;
// Code removed here…
return styles;

The code continues by opening the document, using the WordprocessingDocument.Open method and indicating that the document should be opened for read-only access (the final false parameter). Given the open document, the code uses the MainDocumentPart property to navigate to the main document part, and then prepares a variable named stylesPart to hold a reference to the styles part.

Using document = WordprocessingDocument.Open(fileName, False)
  Dim docPart = document.MainDocumentPart
  Dim stylesPart As StylesPart = Nothing
  ' Code removed here…
End Using

using (var document = WordprocessingDocument.Open(fileName, false))
{
  var docPart = document.MainDocumentPart;

  // Declare the type explicitly: var cannot be initialized with null.
  StylesPart stylesPart = null;
  // Code removed here…
}

Finding the Correct Styles Part

The code next retrieves a reference to the requested styles part, using the getStylesWithEffectsPart Boolean parameter. Based on this value, the code retrieves a specific property of the docPart variable, and stores it in the stylesPart variable.

If getStylesWithEffectsPart Then
  stylesPart = docPart.StylesWithEffectsPart
Else
  stylesPart = docPart.StyleDefinitionsPart
End If

if (getStylesWithEffectsPart)
  stylesPart = docPart.StylesWithEffectsPart;
else
  stylesPart = docPart.StyleDefinitionsPart;

Retrieving the Part Contents

If the requested styles part exists, the code must return the entire contents of the part in an XDocument instance. Each part provides a GetStream method, which returns a Stream. The code passes the Stream instance to the static Create method, which XmlNodeReader inherits from XmlReader and which returns an XmlReader, and then calls the XDocument.Load method, passing that reader as a parameter. This sequence of steps is common when working with XDocument instances and Open XML parts, and you will see it used often.

If stylesPart IsNot Nothing Then
  Using reader = XmlNodeReader.Create(
    stylesPart.GetStream(FileMode.Open, FileAccess.Read))
    ' Create the XDocument:
    styles = XDocument.Load(reader)
  End Using
End If

if (stylesPart != null)
{
  using (var reader = XmlNodeReader.Create(
    stylesPart.GetStream(FileMode.Open, FileAccess.Read)))
  {
    // Create the XDocument:
    styles = XDocument.Load(reader);
  }
}
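
The same stream-to-reader-to-XDocument sequence can be tried without a Word document. The following is a minimal sketch under stated assumptions: a MemoryStream stands in for the Stream that GetStream returns, the LoadFromStream helper and the class name are hypothetical, and the code calls XmlReader.Create directly (the static method that the sample reaches through XmlNodeReader).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class StreamToXDocumentSketch
{
    // Hypothetical helper: wraps the stream in an XmlReader and
    // loads the reader's content into an XDocument.
    public static XDocument LoadFromStream(Stream stream)
    {
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        // Hand-written stand-in for the contents of a styles part.
        const string xml =
            "<w:styles xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:style w:type=\"paragraph\" w:styleId=\"Heading1\"/>" +
            "</w:styles>";

        // In the sample, this stream would come from stylesPart.GetStream.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            XDocument doc = LoadFromStream(stream);
            Console.WriteLine(doc.Root.Name.LocalName);
        }
    }
}
```

Because XDocument.Load reads the whole stream into memory, the returned XDocument remains usable after the reader and the stream are disposed, which is why the sample procedure can return it after closing the document.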

Sample Procedure

The following code example contains the complete sample procedure.

Public Function WDExtractStyles(
  ByVal fileName As String,
  Optional ByVal getStylesWithEffectsPart As Boolean = True) As XDocument

  Dim styles As XDocument = Nothing

  Using document = WordprocessingDocument.Open(fileName, False)
    Dim docPart = document.MainDocumentPart
    Dim stylesPart As StylesPart = Nothing

    If getStylesWithEffectsPart Then
      stylesPart = docPart.StylesWithEffectsPart
    Else
      stylesPart = docPart.StyleDefinitionsPart
    End If
    If stylesPart IsNot Nothing Then
      Using reader = XmlNodeReader.Create(
        stylesPart.GetStream(FileMode.Open, FileAccess.Read))
        ' Create the XDocument:
        styles = XDocument.Load(reader)
      End Using
    End If
  End Using
  Return styles
End Function

public static XDocument WDExtractStyles(
  string fileName, 
  bool getStylesWithEffectsPart = true)
{
  XDocument styles = null;
 
  using (var document = WordprocessingDocument.Open(fileName, false))
  {
    var docPart = document.MainDocumentPart;
 
    StylesPart stylesPart = null;
    if (getStylesWithEffectsPart)
      stylesPart = docPart.StylesWithEffectsPart;
    else
      stylesPart = docPart.StyleDefinitionsPart;
 
    if (stylesPart != null)
    {
      using (var reader = XmlNodeReader.Create(
        stylesPart.GetStream(FileMode.Open, FileAccess.Read)))
      {
        // Create the XDocument:
        styles = XDocument.Load(reader);
      }
    }
  }
  return styles;
}
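
One use the overview mentions for the extracted XML is archiving it. The following is a minimal sketch of that idea, not part of the original sample: the SaveAndReload helper, the class name, and the styles-archive.xml file name are hypothetical, and a small hand-written XDocument stands in for the result of WDExtractStyles.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StyleArchiveSketch
{
    // Hypothetical helper: writes the styles XML to disk, then reads it
    // back to confirm that the archive can be loaded again.
    public static XDocument SaveAndReload(XDocument styles, string path)
    {
        styles.Save(path);
        return XDocument.Load(path);
    }

    public static void Main()
    {
        // Stand-in for the XDocument returned by WDExtractStyles.
        var styles = XDocument.Parse(
            "<w:styles xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:style w:type=\"paragraph\" w:styleId=\"Normal\"/>" +
            "</w:styles>");

        // Assumption: the temp folder is writable; choose your own path.
        string path = Path.Combine(Path.GetTempPath(), "styles-archive.xml");

        XDocument reloaded = SaveAndReload(styles, path);
        Console.WriteLine(reloaded.Root.Elements().Count());

        File.Delete(path);
    }
}
```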

Read It

The sample included with this Visual How To shows code that retrieves the styles or stylesWithEffects part from a Word document. To use the sample, install the Open XML SDK 2.0, available from the link listed in the Explore It section. The sample also uses a modified version of code included as part of a set of code snippets for the Open XML SDK 2.0. The Explore It section also includes a link to the full set of snippets, although you can use the sample without downloading and installing the snippets.

The sample application demonstrates only a handful of the available properties and methods provided by the Open XML SDK 2.0 that you can interact with when modifying document structure. For more information, see the documentation that comes with the Open XML SDK 2.0 Productivity Tool: Click the Open XML SDK Documentation tab in the lower-left corner of the application window, and search for the class you need to study. Although the documentation does not always include code examples, given the sample shown here and the documentation, you should be able to successfully modify the sample application.

See It

Watch the video

> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/436ec7c9-ba0d-4b79-90e9-c2817ace3d4d]

Length: 00:07:52

Grab the Code

Explore It

About the Author

Ken Getz is a senior consultant with MCW Technologies. He is coauthor of ASP.NET Developer's JumpStart (Addison-Wesley, 2002), Access Developer's Handbook (Sybex, 2001), and VBA Developer's Handbook, 2nd Edition (Sybex, 2001).