Retrieving Core Properties from Word 2010 Documents by Using the Open XML SDK 2.0

Office Visual How To

Summary:  Use strongly typed classes in the Open XML SDK 2.0 for Microsoft Office to retrieve core document properties in a Microsoft Office Word 2007 or Microsoft Word 2010 document, without loading the document into Microsoft Word.

Applies to: Excel 2010 | Office 2007 | Office 2010 | Open XML | PowerPoint 2010 | VBA | Visual Studio | Word 2007 | Word 2010

Published:  February 2012

Provided by:   Ken Getz

Overview

The Open XML file formats enable you to retrieve core document properties from a Microsoft Office Word 2007 or Microsoft Word 2010 document. The Open XML SDK 2.0 adds strongly typed classes that simplify access to the Open XML file formats, including the retrieval of core document properties. The code sample that is included with this article shows how to use the SDK to retrieve core document properties in an Office Word 2007 or Word 2010 document.

To use the code sample, install the Open XML SDK 2.0 by using the link that is listed in the Explore It section. The code sample is modified from code that is included as part of a set of code examples for the Open XML SDK 2.0. The Explore It section also includes a link to the full set of code examples, although you can use the code sample without downloading and installing the code examples. The sample application retrieves core document properties (that is, properties provided for all Office documents) in a document that you supply.

Code It

The code sample that accompanies this article includes the code that is required to retrieve core document properties in an Office Word 2007 or Word 2010 document.

Setting up references

To use the code from the Open XML SDK 2.0, you must add several references to your project. The sample project includes these references, but in your own code, you must explicitly reference the following assemblies:

  • WindowsBase. This reference might be set for you, depending on the kind of project that you create.

  • DocumentFormat.OpenXml. Installed by the Open XML SDK 2.0.

You should also add the following using or Imports statements to the top of your code file.

Imports DocumentFormat.OpenXml.Packaging

using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

Retrieving core properties

Because the Open XML SDK 2.0 exposes the core properties directly, retrieving them is simple enough that you do not have to call a special helper procedure. You retrieve the PackageProperties property of a WordprocessingDocument object, and then retrieve the specific core property that you need. First, set up a reference to the document.

Private Const FILENAME As String = "DocumentProperties.docx"

' Open the document read-only (False); the code only reads properties.
Using document As WordprocessingDocument =
  WordprocessingDocument.Open(FILENAME, False)
  ' Code removed here...
End Using

const string FILENAME = "DocumentProperties.docx";

// Open the document read-only (false); the code only reads properties.
using (WordprocessingDocument document = 
 WordprocessingDocument.Open(FILENAME, false))
{
  // Code removed here...
}

Given the reference to the WordprocessingDocument object, the code can retrieve a reference to the PackageProperties property of the document. This object provides its own properties, each of which exposes one of the core document properties.

Dim props = document.PackageProperties

var props = document.PackageProperties;

Given the reference to the PackageProperties, the code can then retrieve any of the core properties by using the code in the following example.

Console.WriteLine("Creator = " & props.Creator)
Console.WriteLine("Created = " & props.Created)
Console.WriteLine("Title = " & props.Title)

Console.WriteLine("Creator = " + props.Creator);
Console.WriteLine("Created = " + props.Created);
Console.WriteLine("Title = " + props.Title);
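
The same pattern extends to the other core properties. The following C# sketch is not part of the original sample. It prints a few more of them, assuming that the path to the document is passed as the first command-line argument. The Describe helper is hypothetical and exists only to show a placeholder when a value has not been set: string properties such as Title can be null, and date properties such as Created are nullable.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

public static class CorePropertiesSketch
{
    // Hypothetical helper (not part of the SDK): formats one property,
    // substituting a placeholder when the value is null or empty.
    public static string Describe(string name, object value)
    {
        string text = (value == null) ? null : value.ToString();
        return name + " = " + (string.IsNullOrEmpty(text) ? "(not set)" : text);
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: CorePropertiesSketch <path to .docx>");
            return;
        }

        // Open the document read-only (second argument is false).
        using (WordprocessingDocument document =
            WordprocessingDocument.Open(args[0], false))
        {
            var props = document.PackageProperties;

            Console.WriteLine(Describe("Title", props.Title));
            Console.WriteLine(Describe("Subject", props.Subject));
            Console.WriteLine(Describe("Creator", props.Creator));
            Console.WriteLine(Describe("Keywords", props.Keywords));
            Console.WriteLine(Describe("LastModifiedBy", props.LastModifiedBy));
            Console.WriteLine(Describe("Created", props.Created));
            Console.WriteLine(Describe("Modified", props.Modified));
        }
    }
}
```

Passing each value as object lets one helper handle both the string properties and the nullable date properties, at the cost of formatting dates with the default ToString behavior.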

Sample procedure

The sample includes the following code.

Private Const FILENAME As String = "I:\Samples\DocumentProperties.docx"

Sub Main()
  ' Open the document read-only (False); the code only reads properties.
  Using document As WordprocessingDocument =
    WordprocessingDocument.Open(FILENAME, False)
    Dim props = document.PackageProperties

    Console.WriteLine("Creator = " & props.Creator)
    Console.WriteLine("Created = " & props.Created)
    Console.WriteLine("Title = " & props.Title)
  End Using
End Sub

const string FILENAME = @"I:\Samples\DocumentProperties.docx";

static void Main(string[] args)
{
  // Open the document read-only (false); the code only reads properties.
  using (WordprocessingDocument document = 
    WordprocessingDocument.Open(FILENAME, false))
  {
    var props = document.PackageProperties;
    Console.WriteLine("Creator = " + props.Creator);
    Console.WriteLine("Created = " + props.Created);
    Console.WriteLine("Title = " + props.Title);
  }
}

Read It

The PackageProperties class exposes each core property as a member of the class, so the properties themselves always exist. In other words, you do not have to confirm that a property reference isn't null before you read it, although the value that you get back can be null or empty if the document never set that property. This is not the case with the application properties that are provided by the ExtendedFilePropertiesPart class. The application properties are defined in XML, and may or may not exist. To retrieve those properties, you must verify that they exist by comparing the reference to the property to null before you attempt to retrieve the value of the property.
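
To make the contrast concrete, the following C# sketch (not part of the original sample) reads one core property directly and then reads two application properties with the null checks described above. It assumes that the document path is passed as the first command-line argument. The Line helper is hypothetical, and the choice of the Company and Application properties is only for illustration.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

public static class ApplicationPropertiesSketch
{
    // Hypothetical helper (not part of the SDK): an application property
    // is an XML element that may be missing, so check the reference
    // before reading its Text.
    public static string Line(string name, OpenXmlLeafTextElement element)
    {
        return name + " = " + (element == null ? "(missing)" : element.Text);
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: ApplicationPropertiesSketch <path to .docx>");
            return;
        }

        using (WordprocessingDocument document =
            WordprocessingDocument.Open(args[0], false))
        {
            // Core property: the member always exists on PackageProperties,
            // so no reference check is needed (the value itself may be null).
            Console.WriteLine("Title = " + document.PackageProperties.Title);

            // The application properties part may be absent from the package.
            ExtendedFilePropertiesPart appPart = document.ExtendedFilePropertiesPart;
            if (appPart == null || appPart.Properties == null)
            {
                Console.WriteLine("No application properties found.");
                return;
            }

            // Each application property is an element that may or may not exist.
            var appProps = appPart.Properties;
            Console.WriteLine(Line("Company", appProps.Company));
            Console.WriteLine(Line("Application", appProps.Application));
        }
    }
}
```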

The code examples in this article illustrate several of the issues that you encounter when you work with the Open XML SDK 2.0. Each example is slightly different, but the basic concepts are the same. Unless you understand the structure of the part that you are trying to work with, even the Open XML SDK 2.0 cannot help you interact with that part. Take the time to investigate the objects that you are working with before you start to write code; it will save you time.

See It

 

Watch the video

> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/e02e10c8-73e0-466e-943b-f5825c4aaa7e]

Length: 00:06:17

Explore It

 

About the Author

Ken Getz is a developer, writer, and trainer, working as a senior consultant with MCW Technologies, LLC, a Microsoft Solution Provider. He has co-authored several technical books for developers, including the best-selling ASP.NET Developer's Jumpstart, the Access Developer's Handbook series, and the VBA Developer's Handbook series. Ken is a lead courseware author for AppDev, and has authored many of their most popular titles. Ken has spoken for many years at technical conferences, including Microsoft TechEd.