Retrieving Custom Properties from Word 2010 Documents by Using the Open XML SDK 2.0

Office Visual How To

Summary:  Use strongly typed classes in the Open XML SDK 2.0 for Microsoft Office to retrieve custom document properties in a Microsoft Office Word 2007 or Microsoft Word 2010 document, without loading the document into Microsoft Word.

Applies to: Excel 2010 | Office 2007 | Office 2010 | Open XML | PowerPoint 2010 | VBA | Visual Studio | Word 2007 | Word 2010

Published:  February 2012

Provided by:   Ken Getz

Overview

The Open XML file formats make it possible to retrieve custom document properties from a Microsoft Office Word 2007 or Microsoft Word 2010 document without starting Word. The Open XML SDK 2.0 adds strongly typed classes that simplify access to the Open XML file formats, including the retrieval of custom document properties. The code sample that is included with this article demonstrates how to use the SDK to retrieve custom document properties from a Word 2007 or Word 2010 document.

To use the code sample, install the Open XML SDK 2.0 by using the link that is listed in the Explore It section. The code sample is modified from code that is included as part of a set of code examples for the Open XML SDK 2.0. The Explore It section also includes a link to the full set of code examples, although you can use the code sample without downloading and installing the code examples.

The sample application retrieves custom properties from a document that you supply by calling the WDGetCustomProperty method, which returns the value of a custom property if the property exists. This example assumes that you have created a document named DocumentProperties.docx and added custom document properties named IsItSafe (YesNo), FavoriteDate (DateTime), and Disposition (String). The calls to the method resemble the following code example.

Console.WriteLine(WDGetCustomProperty(FILENAME, "IsItSafe"))
Console.WriteLine(WDGetCustomProperty(FILENAME, "FavoriteDate"))
Console.WriteLine(WDGetCustomProperty(FILENAME, "Disposition"))
' This property doesn't exist in the sample document. 
' The code should fail gracefully.
Console.WriteLine(WDGetCustomProperty(FILENAME, "FakeProperty"))

Console.WriteLine(WDGetCustomProperty(FILENAME, "IsItSafe"));
Console.WriteLine(WDGetCustomProperty(FILENAME, "FavoriteDate"));
Console.WriteLine(WDGetCustomProperty(FILENAME, "Disposition"));
// This property doesn't exist in the sample document. 
// The code should fail gracefully.
Console.WriteLine(WDGetCustomProperty(FILENAME, "FakeProperty"));

It is important to understand how custom properties are stored in a Word document. The Open XML SDK 2.0 includes, in its tool directory, a useful application named OpenXmlSdkTool.exe, shown in Figure 1. This tool enables you to open a document and view its parts and the hierarchy of parts. Figure 1 shows the test document that has the custom properties that you added, and in the right-hand panes, the tool displays both the XML for the part and reflected C# code that you can use to generate the contents of the part. Figure 2 shows the same view of the tool, but this time shows the right pane expanded so that you can view the property names and values.

Figure 1. Open XML SDK 2.0 Productivity Tool UI

Figure 2. Viewing Custom Property Names and Values

If you examine the XML content in Figure 2, you can make the following observations:

  • Each property in the XML content consists of an XML element that includes the name and the value of the property.

  • For each property, the XML content includes an fmtid attribute, always set to the same string value: {D5CDD505-2E9C-101B-9397-08002B2CF9AE}.

  • Each property in the XML content includes a pid attribute, which must include an integer starting at 2 for the first property and then incrementing for each successive property.

  • Each property tracks its type (in the figure, the vt:lpwstr and vt:filetime element names define the types for each property).

  • The name attribute contains the name for the property, and the inner text of the XML element contains the value of the property.
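
To make these observations concrete, the following is a minimal sketch, not part of the original sample, that parses a hand-written approximation of the custom properties XML by using LINQ to XML. The property values, the class name CustomPropertiesXmlSketch, and the method name Describe are assumptions made for illustration; the XML in your own document may differ.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CustomPropertiesXmlSketch
{
    // Hand-written approximation of the custom properties part.
    // The values are illustrative only, not copied from Figure 2.
    public const string SampleXml =
        @"<Properties xmlns=""http://schemas.openxmlformats.org/officeDocument/2006/custom-properties""
                      xmlns:vt=""http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"">
            <property fmtid=""{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"" pid=""2"" name=""IsItSafe"">
              <vt:bool>true</vt:bool>
            </property>
            <property fmtid=""{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"" pid=""3"" name=""FavoriteDate"">
              <vt:filetime>2012-02-01T08:00:00Z</vt:filetime>
            </property>
            <property fmtid=""{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"" pid=""4"" name=""Disposition"">
              <vt:lpwstr>Archive</vt:lpwstr>
            </property>
          </Properties>";

    // Returns a one-line description of the named property,
    // or null if no property with that name exists.
    public static string Describe(string xml, string propertyName)
    {
        XNamespace cp =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

        // Find the property element whose name attribute matches.
        var prop = XDocument.Parse(xml).Root
            .Elements(cp + "property")
            .FirstOrDefault(p => (string)p.Attribute("name") == propertyName);
        if (prop == null)
        {
            return null;
        }

        // The single child element (vt:bool, vt:filetime, vt:lpwstr, ...)
        // carries both the type (element name) and the value (inner text).
        var typed = prop.Elements().FirstOrDefault();
        if (typed == null)
        {
            return null;
        }

        return string.Format("pid={0}, type={1}, value={2}",
            (string)prop.Attribute("pid"), typed.Name.LocalName, typed.Value);
    }
}
```

With this sample XML, Describe(SampleXml, "Disposition") returns "pid=4, type=lpwstr, value=Archive", and a name that is not present, such as "FakeProperty", returns null.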

Code It

The code sample that accompanies this article includes the code that is required to retrieve a custom document property from an Office Word 2007 or Word 2010 document.

Setting up references

To use the code from the Open XML SDK 2.0, you must add several references to your project. The sample project includes these references, but in your own code, you must explicitly reference the following assemblies:

  • WindowsBase. This reference might be set for you, depending on the kind of project that you create.

  • DocumentFormat.OpenXml. Installed by the Open XML SDK 2.0.

You should also add the following Imports statements (Visual Basic) or using directives (C#) to the top of your code file.

Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.CustomProperties

using System;
using System.Linq;
using DocumentFormat.OpenXml.CustomProperties;
using DocumentFormat.OpenXml.Packaging;

Examining the procedure

The WDGetCustomProperty procedure accepts the following two parameters:

  • The name of the document to query (string).

  • The name of the property to retrieve (string).

Public Function WDGetCustomProperty(
  ByVal fileName As String, ByVal propertyName As String) As String

public static string WDGetCustomProperty(
  string fileName, string propertyName)

The procedure returns the value of the requested property, if it exists. To call the procedure, pass in the parameter values, as shown in the following code example.

Private Const FILENAME As String = "DocumentProperties.docx"

Console.WriteLine(WDGetCustomProperty(FILENAME, "IsItSafe"))

private const string FILENAME = "DocumentProperties.docx";

Console.WriteLine(WDGetCustomProperty(FILENAME, "IsItSafe"));

The sample project includes a helper procedure named DisplayProperty, which accepts a property name. The procedure displays the name of the property and its value in the Console window.

Private Sub DisplayProperty(propName As String)
  Console.WriteLine("{0} = {1}",
    propName, WDGetCustomProperty(FILENAME, propName))
End Sub

static void DisplayProperty(string propName)
{
  Console.WriteLine("{0} = {1}", 
    propName, WDGetCustomProperty(FILENAME, propName));
}

Working with the document

The code first creates a variable to hold the return value of the procedure. It then opens the Word document that you named in the fileName parameter in read-only mode by calling the Open method of the WordprocessingDocument class and passing False (false in C#) as the second argument. Next, the code attempts to retrieve a reference to the custom file properties part by using the CustomFilePropertiesPart property of the document.

Dim returnValue As String = Nothing

Using document = WordprocessingDocument.Open(fileName, False)
  Dim customProps = document.CustomFilePropertiesPart
  ' Code removed here...
End Using
Return returnValue

string returnValue = null;

using (var document = WordprocessingDocument.Open(fileName, false))
{
  var customProps = document.CustomFilePropertiesPart;
  // Code removed here...
}
return returnValue;

If the reference to the custom properties part is null (Nothing in Visual Basic), the document contains no custom properties and the procedure has nothing to return. Otherwise, the code retrieves a reference to the Properties property of the custom properties part (that is, a reference to the properties themselves). If this reference is also null, the procedure again has nothing to return.

If customProps IsNot Nothing Then
  ' No custom properties? Nothing to return, in that case.
  Dim props = customProps.Properties
  If props IsNot Nothing Then
    ' Code removed here...
  End If
End If

if (customProps != null)
{
  // No custom properties? Nothing to return, in that case.
  var props = customProps.Properties;
  if (props != null)
  {
    // Code removed here...
  }
}

Finally, the code uses LINQ to attempt to retrieve a reference to the custom property for which the value of the name attribute matches the property name that you supplied. If the property exists, the code returns the InnerText property, which contains the value of the property.

Dim prop = props. _
  Where(Function(p) CType(p, CustomDocumentProperty).
          Name.Value = propertyName).FirstOrDefault()
' Does the property exist? If so, get the return value.
If prop IsNot Nothing Then
  returnValue = prop.InnerText
End If

var prop = props.
  Where(p => ((CustomDocumentProperty)p).
    Name.Value == propertyName).FirstOrDefault();
// Does the property exist? If so, get the return value.
if (prop != null)
{
  returnValue = prop.InnerText;
}
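
Because InnerText always returns a string, a caller that needs a typed value has to convert it. The following minimal sketch, which is not part of the original sample, maps the local name of the property's child element (for example, lpwstr, bool, or filetime) to a .NET type. The class and method names are hypothetical, and the sketch assumes that you obtain the type name yourself, for example from prop.FirstChild.LocalName when you use the SDK. Only a few variant types are handled; everything else is returned as the original string.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class CustomPropertyValueSketch
{
    // variantType: local name of the typed child element, such as "bool".
    // innerText:   the string that InnerText returned for the property.
    public static object ToTypedValue(string variantType, string innerText)
    {
        if (innerText == null)
        {
            return null;
        }

        switch (variantType)
        {
            case "bool":
                // XML Boolean text: "true", "false", "1", or "0".
                return XmlConvert.ToBoolean(innerText);
            case "filetime":
                // ISO 8601 text such as 2012-02-01T08:00:00Z, returned as UTC.
                return DateTime.Parse(innerText, CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
            case "i4":
                return int.Parse(innerText, CultureInfo.InvariantCulture);
            case "r8":
                return double.Parse(innerText, CultureInfo.InvariantCulture);
            default:
                // lpwstr and any type that this sketch does not handle.
                return innerText;
        }
    }
}
```

Keeping the conversion separate from WDGetCustomProperty leaves the original procedure unchanged and lets the caller decide whether a typed value is needed at all.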

Provide a test document that contains the sample custom properties (IsItSafe, Disposition, and FavoriteDate), and then run the code sample by pressing Ctrl+F5. You should see the values of the properties in the Console window.

Sample procedure

The sample procedure includes the following code.

Public Function WDGetCustomProperty(
  ByVal fileName As String, ByVal propertyName As String) As String
  Dim returnValue As String = Nothing

  Using document = WordprocessingDocument.Open(fileName, False)
    Dim customProps = document.CustomFilePropertiesPart
    If customProps IsNot Nothing Then
      ' No custom properties? Nothing to return, in that case.
      Dim props = customProps.Properties
      If props IsNot Nothing Then
        Dim prop = props. _
          Where(Function(p) CType(p, CustomDocumentProperty).
                  Name.Value = propertyName).FirstOrDefault()
        ' Does the property exist? If so, get the return value.
        If prop IsNot Nothing Then
          returnValue = prop.InnerText
        End If
      End If
    End If
  End Using
  Return returnValue
End Function

public static string WDGetCustomProperty(
  string fileName, string propertyName)
{
  string returnValue = null;

  using (var document = WordprocessingDocument.Open(fileName, false))
  {
    var customProps = document.CustomFilePropertiesPart;
    if (customProps != null)
    {
      // No custom properties? Nothing to return, in that case.
      var props = customProps.Properties;
      if (props != null)
      {
        var prop = props.
          Where(p => ((CustomDocumentProperty)p).
            Name.Value == propertyName).FirstOrDefault();
        // Does the property exist? If so, get the return value.
        if (prop != null)
        {
          returnValue = prop.InnerText;
        }
      }
    }
  }
  return returnValue;
}

Read It

The code examples in this article illustrate several of the issues that you encounter when you work with the Open XML SDK 2.0. Each example is slightly different, but the basic concepts are the same. Unless you understand the structure of the part that you are trying to work with, even the Open XML SDK 2.0 will not help you interact with it. Take the time to investigate the objects that you are working with before you start to write code; doing so saves time.
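
One way to investigate a part when the Productivity Tool is not at hand is to read its raw XML directly, because a .docx file is a ZIP package. The following is a minimal sketch under two assumptions: that you target .NET Framework 4.5 or later (for the ZipArchive class in System.IO.Compression), and that the custom properties part is stored as docProps/custom.xml, which is where Word typically writes it (the package relationships remain the authoritative source). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PackagePartDump
{
    // Returns the text of the named entry in the package,
    // or null if the package has no such entry.
    public static string ReadEntry(Stream packageStream, string entryName)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, true))
        {
            var entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Convenience overload that opens the file read-only.
    public static string ReadEntry(string fileName, string entryName)
    {
        using (var file = File.OpenRead(fileName))
        {
            return ReadEntry(file, entryName);
        }
    }
}
```

For example, PackagePartDump.ReadEntry("DocumentProperties.docx", "docProps/custom.xml") would return the XML that the WDGetCustomProperty procedure reads through the SDK, or null if the document has no custom properties part at that location.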

See It

Watch the video

> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/4c127709-c9f4-428f-8a48-e84304057a17]

Length: 00:08:17

Explore It

 

About the Author

Ken Getz is a developer, writer, and trainer, working as a senior consultant with MCW Technologies, LLC, a Microsoft Solution Provider. He has co-authored several technical books for developers, including the best-selling ASP.NET Developer's Jumpstart, the Access Developer's Handbook series, and the VBA Developer's Handbook series. Ken is a lead courseware author for AppDev, and has authored many of their most popular titles. Ken has spoken for many years at technical conferences, including Microsoft TechEd.