Manipulating Word 2007 Files with the Open XML Format API (Part 3 of 3)

Office 2007

Summary: This is the third in a series of three articles that describes the Open XML Application Programming Interface (API) code that you can use to access and manipulate Microsoft Office Word 2007 files. (16 printed pages)

Frank Rice, Microsoft Corporation

September 2007 (Revised August 2008)

Applies to: Microsoft Office Word 2007

The 2007 Microsoft Office system introduces new XML-based file formats called Open XML Formats. Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 all use these formats as the default file format. Open XML Formats are useful because they are an open standard and are based on well-known technologies: ZIP and XML. Microsoft provides a library for accessing these files in the DocumentFormat.OpenXml namespace of the Open XML Format SDK 1.0, which builds on the .NET Framework 3.0 packaging technologies. The Open XML Format members are contained in the DocumentFormat.OpenXml API and provide strongly typed part classes to manipulate Open XML documents. The API encapsulates many common tasks that developers perform on Open XML Format packages, so you can perform complex operations with just a few lines of code.

In the following code, you set the value of a custom property in a document. A document may or may not include custom properties that reside in the custom.xml part. Therefore, the procedure does the following:

  • If the custom.xml part does not exist in the document, it adds it.

  • If the custom.xml part is there, but the property is not, it adds the property.

  • If the property exists and is of the same type, it replaces the value.

  • If the property exists and is of a different type, it deletes the existing property and adds a new one of the requested type.

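When completed, the procedure returns a Boolean value indicating whether the operation succeeded. If the supplied value cannot be converted to the requested type, the procedure throws an InvalidDataException instead.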

Public Enum PropertyTypes
   YesNo
   Text
   DateTime
   NumberInteger
   NumberDouble
End Enum

Public Function WDSetCustomProperty(ByVal docName As String, ByVal propertyName As String, _
   ByVal propertyValue As Object, ByVal propertyType As PropertyTypes) As Boolean
   ' Given a document name, a property name/value, and the property
   ' type, add a custom property to a document. Return true if the
   ' property is added/updated, or False if the property cannot be updated.
   Const customPropertiesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
   Const customVTypesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
   Dim retVal As Boolean = False
   Dim propertyTypeName As String = "vt:lpwstr"
   Dim propertyValueString As String = Nothing
   ' Calculate the correct type.
   Select Case (propertyType)
      Case PropertyTypes.DateTime
         propertyTypeName = "vt:filetime"
         ' The value must be a real date; it should represent a UTC date/time.
         If (TypeOf propertyValue Is DateTime) Then
            propertyValueString = String.Format("{0:s}Z", Convert.ToDateTime(propertyValue))
         End If
      Case PropertyTypes.NumberInteger
         propertyTypeName = "vt:i4"
         If (TypeOf propertyValue Is Int32) Then
            propertyValueString = Convert.ToInt32(propertyValue).ToString
         End If
      Case PropertyTypes.NumberDouble
         propertyTypeName = "vt:r8"
         If (TypeOf propertyValue Is Double) Then
            propertyValueString = Convert.ToDouble(propertyValue).ToString
         End If
      Case PropertyTypes.Text
         propertyTypeName = "vt:lpwstr"
         propertyValueString = Convert.ToString(propertyValue)
      Case PropertyTypes.YesNo
         propertyTypeName = "vt:bool"
         If (TypeOf propertyValue Is Boolean) Then
            ' Must be lowercase!
            propertyValueString = Convert.ToBoolean(propertyValue).ToString.ToLower
         End If
   End Select
   If (propertyValueString = Nothing) Then
      ' If the code cannot convert the 
      ' property to a valid value, throw an exception.
      Throw New InvalidDataException("Invalid parameter value.")
   End If
   Dim wdPackage As WordprocessingDocument = WordprocessingDocument.Open(docName, True)
   ' Work with the custom properties part.
   Dim customPropsPart As CustomFilePropertiesPart = wdPackage.CustomFilePropertiesPart
   ' Manage namespaces to perform XML XPath queries.
   Dim nt As NameTable = New NameTable
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
   nsManager.AddNamespace("d", customPropertiesSchema)
   nsManager.AddNamespace("vt", customVTypesSchema)
   Dim customPropsUri As Uri = New Uri("/docProps/custom.xml", UriKind.Relative)
   Dim customPropsDoc As XmlDocument = Nothing
   Dim rootNode As XmlNode = Nothing
   ' There may not be a custom properties part.
   If (customPropsPart Is Nothing) Then
      customPropsDoc = New XmlDocument(nt)
      ' The part does not exist. Create it now.
      customPropsPart = wdPackage.AddCustomFilePropertiesPart
      ' Set up the rudimentary custom part.
      rootNode = customPropsDoc.CreateElement("Properties", customPropertiesSchema)
      rootNode.Attributes.Append(customPropsDoc.CreateAttribute("xmlns:vt"))
      rootNode.Attributes("xmlns:vt").Value = customVTypesSchema
      customPropsDoc.AppendChild(rootNode)
   Else
      ' Load the contents of the custom properties part into an XML document.
      customPropsDoc = New XmlDocument(nt)
      customPropsDoc.Load(customPropsPart.GetStream)
      rootNode = customPropsDoc.DocumentElement
   End If
   ' Now that you have a reference to an XmlDocument object that 
   ' corresponds to the custom properties part, 
   ' check to see if the required property is already there.
   Dim searchString As String = String.Format("d:Properties/d:property[@name='{0}']", propertyName)
   Dim node As XmlNode = customPropsDoc.SelectSingleNode(searchString, nsManager)

   Dim valueNode As XmlNode = Nothing
   If (Not (node) Is Nothing) Then
      ' You found the node. Now check its type.
      If node.HasChildNodes Then
         valueNode = node.ChildNodes(0)
         If (Not (valueNode) Is Nothing) Then
            Dim typeName As String = valueNode.Name
            If (propertyTypeName = typeName) Then
               ' The types are the same. 
               ' Replace the value of the node.
               valueNode.InnerText = propertyValueString
               ' If the property existed, and its type
               ' has not changed, you are finished.
               retVal = True
            Else
               ' Types are different. Delete the node
               ' and clear the node variable.
               node.ParentNode.RemoveChild(node)
               node = Nothing
            End If
         End If
      End If
   End If
   ' The previous block of code may have cleared the value in the 
   ' variable named node.
   If (node Is Nothing) Then
      ' Either you did not find the node, or you 
      ' found it, its type was incorrect, and you deleted it.
      ' Either way, you need to create the new property node now.
      ' Find the highest existing "pid" value.
      ' The default value for the "pid" attribute is "2".
      Dim pidValue As String = "2"
      Dim propertiesNode As XmlNode = customPropsDoc.DocumentElement
      If propertiesNode.HasChildNodes Then
         Dim lastNode As XmlNode = propertiesNode.LastChild
         If (Not (lastNode) Is Nothing) Then
            Dim pidAttr As XmlAttribute = lastNode.Attributes("pid")
            If Not (pidAttr Is Nothing) Then
               pidValue = pidAttr.Value
               ' Increment pidValue, so that the new property
               ' gets a pid value one higher. This value should be 
               ' numeric, but you should confirm that.
               Dim value As Integer = 0
               If Integer.TryParse(pidValue, value) Then
                  pidValue = Convert.ToString((value + 1))
               End If
            End If
         End If
      End If
      node = customPropsDoc.CreateElement("property", customPropertiesSchema)
      node.Attributes.Append(customPropsDoc.CreateAttribute("name"))
      node.Attributes("name").Value = propertyName
      node.Attributes.Append(customPropsDoc.CreateAttribute("fmtid"))
      node.Attributes("fmtid").Value = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
      node.Attributes.Append(customPropsDoc.CreateAttribute("pid"))
      node.Attributes("pid").Value = pidValue
      valueNode = customPropsDoc.CreateElement(propertyTypeName, customVTypesSchema)
      valueNode.InnerText = propertyValueString
      node.AppendChild(valueNode)
      rootNode.AppendChild(node)
      retVal = True
   End If
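   ' Save the properties XML back to its part. FileMode.Create truncates any
   ' existing content, so a shorter document leaves no stale bytes behind.
   Using partStream As Stream = customPropsPart.GetStream(FileMode.Create, FileAccess.Write)
      customPropsDoc.Save(partStream)
   End Using
   ' Close the package so that the changes are written to the file.
   wdPackage.Dispose()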

   Return retVal
End Function

The code first defines an enumeration of possible custom property types.

Public Enum PropertyTypes
   YesNo
   Text
   DateTime
   NumberInteger
   NumberDouble
End Enum

Next, you call the WDSetCustomProperty function, passing in the path to the Word 2007 document, the name of the custom property, the value to assign to the property, and the property type from the enumeration.
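As a rough illustration, a call might look like the following sketch. It assumes that the PropertyTypes enumeration and the WDSetCustomProperty function from the listing above are in the same project, that the project references the Open XML Format SDK, and that the file with the function imports System.IO, System.Xml, and DocumentFormat.OpenXml.Packaging. The file path and property names are hypothetical.

```vb
Imports System
Imports System.IO

Module CustomPropertyDemo
   Sub Main()
      ' Hypothetical path; replace it with the path to a real .docx file.
      Dim docPath As String = "C:\Samples\SetCustomProperty.docx"
      Try
         ' Add (or update) a Text property named "Reviewer".
         Dim ok As Boolean = WDSetCustomProperty(docPath, "Reviewer", "Contoso", PropertyTypes.Text)
         Console.WriteLine("Reviewer set: {0}", ok)

         ' The run-time type of the value must match the requested
         ' property type: an Integer for NumberInteger, and so on.
         ok = WDSetCustomProperty(docPath, "Revision", 3, PropertyTypes.NumberInteger)
         Console.WriteLine("Revision set: {0}", ok)
      Catch ex As InvalidDataException
         ' Thrown when the value does not match the requested type.
         Console.WriteLine("Could not set the property: {0}", ex.Message)
      End Try
   End Sub
End Module
```

Passing a value whose type does not match the requested property type (for example, a String for NumberInteger) leaves propertyValueString unset, so the function throws rather than returning False.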

Dim propertyTypeName As String = "vt:lpwstr"

The propertyTypeName variable starts with a default element name, vt:lpwstr, which represents a Text value in the markup of the custom file properties part (CustomFilePropertiesPart), where the document's custom properties reside. This variable eventually holds the name of the element that contains the property value you want to set.

Next, a Select Case statement (a switch statement in Microsoft Visual C#) tests the type of the property you want to update. Depending on the property type, the code sets propertyTypeName to the name of the element that holds a value of that type, and then formats the value as the string that is written to the property.

Select Case (propertyType)
   Case PropertyTypes.DateTime
      propertyTypeName = "vt:filetime"
      If (TypeOf propertyValue Is DateTime) Then
         propertyValueString = String.Format("{0:s}Z", Convert.ToDateTime(propertyValue))
      End If
   Case PropertyTypes.NumberInteger
      propertyTypeName = "vt:i4"
      If (TypeOf propertyValue Is Int32) Then
         propertyValueString = Convert.ToInt32(propertyValue).ToString
      End If
......
......
End Select
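To make the formatting step more concrete, the following minimal sketch prints the strings that the conversions above produce for a few sample values. The module name and the sample values are made up for illustration. Note that the literal Z in the format string only labels the value as UTC; it does not convert a local time.

```vb
Imports System

Module FormatDemo
   Sub Main()
      ' vt:filetime: the "s" (sortable) format is culture-invariant.
      Dim stamp As New DateTime(2008, 8, 15, 10, 30, 0)
      Console.WriteLine(String.Format("{0:s}Z", stamp))   ' 2008-08-15T10:30:00Z

      ' vt:bool: the value must be lowercase.
      Dim flag As Boolean = True
      Console.WriteLine(flag.ToString().ToLower())        ' true

      ' vt:i4: a 32-bit integer written as plain digits.
      Dim count As Integer = 42
      Console.WriteLine(count.ToString())                 ' 42
   End Sub
End Module
```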

Then you create a WordprocessingDocument object from the input document, representing the Office Open XML Format package. Next, the code retrieves the CustomFilePropertiesPart. Then the code creates a namespace manager to set up the XPath query.

The next section of code determines if the custom property part exists.

If (customPropsPart Is Nothing) Then
   customPropsDoc = New XmlDocument(nt)
   ' The part does not exist. Create it now.
   customPropsPart = wdPackage.AddCustomFilePropertiesPart
   ' Set up the rudimentary custom part.
   rootNode = customPropsDoc.CreateElement("Properties", customPropertiesSchema)
   rootNode.Attributes.Append(customPropsDoc.CreateAttribute("xmlns:vt"))
   rootNode.Attributes("xmlns:vt").Value = customVTypesSchema
   customPropsDoc.AppendChild(rootNode)
Else
   ' Load the contents of the custom properties part into an XML document.
   customPropsDoc = New XmlDocument(nt)
   customPropsDoc.Load(customPropsPart.GetStream)
   rootNode = customPropsDoc.DocumentElement
End If

If the part does not exist, the code adds an empty custom properties part and creates a root Properties element that declares the vt namespace. If the part does exist, the code loads its contents into an in-memory XML document. In either case, the code then builds an XPath query, d:Properties/d:property[@name='{0}'] with the property name substituted for {0}, to search for a property with the requested name.
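The lookup can be tried in isolation, without the SDK. The following sketch runs the same kind of XPath query against a hand-written string shaped like a custom.xml part. The sample markup, the module name, and the "Reviewer" property are assumptions made for illustration.

```vb
Imports System
Imports System.Xml

Module XPathDemo
   Sub Main()
      Const customPropertiesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
      Const customVTypesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

      ' Hypothetical sample markup, shaped like a custom.xml part.
      Dim sampleXml As String = _
         "<Properties xmlns=""" & customPropertiesSchema & """ xmlns:vt=""" & customVTypesSchema & """>" & _
         "<property fmtid=""{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"" pid=""2"" name=""Reviewer"">" & _
         "<vt:lpwstr>Contoso</vt:lpwstr></property></Properties>"

      Dim nt As New NameTable()
      Dim doc As New XmlDocument(nt)
      doc.LoadXml(sampleXml)

      ' The part uses a default namespace, so the query needs a prefix ("d").
      Dim nsManager As New XmlNamespaceManager(nt)
      nsManager.AddNamespace("d", customPropertiesSchema)
      nsManager.AddNamespace("vt", customVTypesSchema)

      Dim node As XmlNode = doc.SelectSingleNode("d:Properties/d:property[@name='Reviewer']", nsManager)
      If node IsNot Nothing AndAlso node.HasChildNodes Then
         Dim valueNode As XmlNode = node.ChildNodes(0)
         Console.WriteLine("{0} = {1}", valueNode.Name, valueNode.InnerText)   ' vt:lpwstr = Contoso
      End If
   End Sub
End Module
```

The name of the first child element (here vt:lpwstr) is what the procedure compares with propertyTypeName to decide whether the existing property has the same type.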

If (Not (node) Is Nothing) Then
   ' You found the node. Now check its type.
   If node.HasChildNodes Then
      valueNode = node.ChildNodes(0)
      If (Not (valueNode) Is Nothing) Then
         Dim typeName As String = valueNode.Name
         If (propertyTypeName = typeName) Then
            ' The types are the same. 
            ' Replace the value of the node.
            valueNode.InnerText = propertyValueString
            ' If the property existed, and its type
            ' did not change, you are finished.
            retVal = True
         Else
            ' The types are different. Delete the node
            ' and clear the node variable.
            node.ParentNode.RemoveChild(node)
            node = Nothing
         End If
      End If
   End If
End If

In this code, the following actions may occur:

  • If you did not find the node, this block does nothing.

  • If you found the node and its type is the same as the type of the new property, it replaces the property value, and the work is finished.

  • If you found the node and its type is different than the type of the new property, it deletes the node and clears the node variable.

If the node variable is empty after this block, either because the node was not found or because it was deleted, the code creates the new property node, starting with its pid value.

Dim pidValue As String = "2"
Dim propertiesNode As XmlNode = customPropsDoc.DocumentElement
If propertiesNode.HasChildNodes Then
   Dim lastNode As XmlNode = propertiesNode.LastChild
   If (Not (lastNode) Is Nothing) Then
      Dim pidAttr As XmlAttribute = lastNode.Attributes("pid")
      If Not (pidAttr Is Nothing) Then
         pidValue = pidAttr.Value
         ' Increment pidValue, so that the new property
         ' gets a pid value one higher. This value should be 
         ' numeric, but you should confirm that.
         Dim value As Integer = 0
         If Integer.TryParse(pidValue, value) Then
            pidValue = Convert.ToString((value + 1))
         End If
      End If
   End If
End If

Each property has an ID (its pid attribute), which has a default value of 2. The ID of a new property must be one higher than the value of any existing property ID. This code segment reads the pid of the last existing property node (if any exist) and sets the ID of the new property node to one higher. Note that the code assumes the last property in the part has the highest pid.
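If the properties in a part might not be stored in ascending pid order, one alternative is to scan every property for the highest value. The following is only a sketch of that idea, not the article's method; the NextPid helper and the simplified, namespace-free sample markup are hypothetical.

```vb
Imports System
Imports System.Xml

Module PidDemo
   ' Hypothetical helper: returns one more than the highest existing pid,
   ' or 2 (the default) when no property carries a numeric pid.
   Function NextPid(ByVal propertiesNode As XmlNode) As Integer
      Dim highest As Integer = 1
      For Each child As XmlNode In propertiesNode.ChildNodes
         ' Only element nodes have an attribute collection.
         If child.NodeType = XmlNodeType.Element Then
            Dim pidAttr As XmlAttribute = child.Attributes("pid")
            Dim value As Integer
            If pidAttr IsNot Nothing AndAlso Integer.TryParse(pidAttr.Value, value) AndAlso value > highest Then
               highest = value
            End If
         End If
      Next
      Return highest + 1
   End Function

   Sub Main()
      ' Simplified sample: the last property does not have the highest pid.
      Dim doc As New XmlDocument()
      doc.LoadXml("<Properties><property pid=""5""/><property pid=""3""/></Properties>")
      Console.WriteLine(NextPid(doc.DocumentElement))   ' 6
   End Sub
End Module
```

For this sample, reading only the last child would yield 4, which is lower than the existing pid of 5.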

node = customPropsDoc.CreateElement("property", customPropertiesSchema)
node.Attributes.Append(customPropsDoc.CreateAttribute("name"))
node.Attributes("name").Value = propertyName
node.Attributes.Append(customPropsDoc.CreateAttribute("fmtid"))
node.Attributes("fmtid").Value = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
node.Attributes.Append(customPropsDoc.CreateAttribute("pid"))
node.Attributes("pid").Value = pidValue
valueNode = customPropsDoc.CreateElement(propertyTypeName, customVTypesSchema)
valueNode.InnerText = propertyValueString
node.AppendChild(valueNode)
rootNode.AppendChild(node)
retVal = True

The remaining code creates the property element, adds its attributes, and then appends the node to the root node. Finally, the procedure saves the XML back to the custom properties part and returns the Boolean value indicating whether the operation succeeded.

The following code changes the print orientation of a document.

Public Enum PrintOrientation
   Landscape
   Portrait
End Enum

Public Sub WDSetPrintOrientation(ByVal docName As String, ByVal newOrientation As PrintOrientation)
   ' Given a document name, set the print orientation for all the sections of the document.
   ' Example:
   ' WDSetPrintOrientation("C:\Samples\SetOrientation.docx", PrintOrientation.Landscape)
   Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
   Dim wdPackage As WordprocessingDocument = WordprocessingDocument.Open(docName, True)
   ' Get the officeDocument part.
   Dim documentPart As MainDocumentPart = wdPackage.MainDocumentPart
   ' Load the officeDocument part into an XML document.
   Dim doc As XmlDocument = New XmlDocument
   doc.Load(documentPart.GetStream)
   ' Manage namespaces to perform XPath queries.
   Dim nt As NameTable = New NameTable
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
   nsManager.AddNamespace("w", wordmlNamespace)
   Dim nodes As XmlNodeList = doc.SelectNodes("//w:sectPr/w:pgSz", nsManager)
   For Each node As System.Xml.XmlNode In nodes
      ' Retrieve the current orientation for the section.
      ' Assume the orientation is portrait.
      Dim orientation As PrintOrientation = PrintOrientation.Portrait
      Dim attr As XmlAttribute = node.Attributes("w:orient")
      If (Not (attr) Is Nothing) Then
         Select Case (attr.Value)
            Case "portrait"
               orientation = PrintOrientation.Portrait
            Case "landscape"
               orientation = PrintOrientation.Landscape
         End Select
      End If
      ' Compare the current orientation to the requested orientation.
      ' If it is the same, get out. Otherwise, make the changes.
      If (newOrientation <> orientation) Then
         If (attr Is Nothing) Then
            ' Create the attribute. It is absent when the section
            ' uses the default (portrait) orientation.
            attr = node.Attributes.Append(doc.CreateAttribute("w:orient", wordmlNamespace))
         End If
         Select Case (newOrientation)
            Case PrintOrientation.Landscape
               attr.Value = "landscape"
            Case PrintOrientation.Portrait
               attr.Value = "portrait"
         End Select
         ' Swap the page dimensions (w:w and w:h on the w:pgSz node).
         Dim widthAttr As XmlAttribute = node.Attributes("w:w")
         Dim heightAttr As XmlAttribute = node.Attributes("w:h")
         If (widthAttr IsNot Nothing) AndAlso (heightAttr IsNot Nothing) Then
            Dim width As String = widthAttr.Value
            widthAttr.Value = heightAttr.Value
            heightAttr.Value = width
         End If
         ' Rotate margins. Printer settings determine how far you 
         ' rotate when switching to landscape mode. Not having those
         ' settings, this code rotates 90 degrees. You can 
         ' modify this behavior, or make it a parameter for the 
         ' procedure.
         Dim pageMarginNode As XmlNode = node.ParentNode.SelectSingleNode("w:pgMar", nsManager)
         If (pageMarginNode IsNot Nothing) Then
            Dim topAttr As XmlAttribute = pageMarginNode.Attributes("w:top")
            Dim leftAttr As XmlAttribute = pageMarginNode.Attributes("w:left")
            Dim bottomAttr As XmlAttribute = pageMarginNode.Attributes("w:bottom")
            Dim rightAttr As XmlAttribute = pageMarginNode.Attributes("w:right")
            ' Rotate only when all four margins are present, so that
            ' no margin is overwritten with a missing value.
            If (topAttr IsNot Nothing) AndAlso (leftAttr IsNot Nothing) AndAlso _
               (bottomAttr IsNot Nothing) AndAlso (rightAttr IsNot Nothing) Then
               Dim top As String = topAttr.Value
               topAttr.Value = leftAttr.Value
               leftAttr.Value = bottomAttr.Value
               bottomAttr.Value = rightAttr.Value
               rightAttr.Value = top
            End If
         End If
      End If
   Next
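   ' Save the document XML back to its part. FileMode.Create truncates the
   ' existing content, so a shorter document leaves no stale bytes behind.
   ' (The Stream, FileMode, and FileAccess types require Imports System.IO.)
   Using partStream As Stream = documentPart.GetStream(FileMode.Create, FileAccess.Write)
      doc.Save(partStream)
   End Using
   ' Close the package so that the changes are written to the file.
   wdPackage.Dispose()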
End Sub

The code first defines an enumeration of the two print options.

Public Enum PrintOrientation
   Landscape
   Portrait
End Enum

Next, you call the WDSetPrintOrientation procedure, passing in the path to the Word 2007 document and the desired print orientation, either landscape or portrait. The procedure opens the document as a WordprocessingDocument object, which represents the Office Open XML Format package, and gets a reference to the MainDocumentPart part. It then creates an in-memory XML document and loads the contents of the main document part into it.

Next, you set up a namespace manager by using the XmlNamespaceManager object and by mapping the w prefix to the WordprocessingML namespace. Then you select the page size (w:pgSz) node of each section by using the following XPath expression.

Dim nodes As XmlNodeList = doc.SelectNodes("//w:sectPr/w:pgSz", nsManager)

Next, you test the w:orient attribute of each node to determine the current setting. If the attribute is absent, the procedure assumes portrait orientation.

Dim attr As XmlAttribute = node.Attributes("w:orient")
If (Not (attr) Is Nothing) Then
   Select Case (attr.Value)
      Case "portrait"
          orientation = PrintOrientation.Portrait
      Case "landscape"
          orientation = PrintOrientation.Landscape
   End Select
End If
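The selection and the orientation test can be tried without a real document. The following sketch runs the same XPath expression against a hand-written, heavily trimmed WordprocessingML string with two sections. The sample markup and module name are assumptions made for illustration; the page sizes are in twentieths of a point (12240 by 15840 corresponds to 8.5 by 11 inches).

```vb
Imports System
Imports System.Xml

Module OrientationDemo
   Sub Main()
      Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

      ' Hypothetical sample: the first section has no w:orient attribute,
      ' and the second section is explicitly landscape.
      Dim sampleXml As String = _
         "<w:document xmlns:w=""" & wordmlNamespace & """><w:body>" & _
         "<w:p><w:pPr><w:sectPr><w:pgSz w:w=""12240"" w:h=""15840""/></w:sectPr></w:pPr></w:p>" & _
         "<w:sectPr><w:pgSz w:w=""15840"" w:h=""12240"" w:orient=""landscape""/></w:sectPr>" & _
         "</w:body></w:document>"

      Dim doc As New XmlDocument()
      doc.LoadXml(sampleXml)
      Dim nsManager As New XmlNamespaceManager(doc.NameTable)
      nsManager.AddNamespace("w", wordmlNamespace)

      For Each node As XmlNode In doc.SelectNodes("//w:sectPr/w:pgSz", nsManager)
         ' Assume portrait when the attribute is missing.
         Dim orientation As String = "portrait"
         Dim attr As XmlAttribute = node.Attributes("w:orient")
         If attr IsNot Nothing Then
            orientation = attr.Value
         End If
         Console.WriteLine(orientation)
      Next
      ' Prints "portrait" and then "landscape".
   End Sub
End Module
```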

The procedure then tests whether the requested orientation is the same as the current orientation. If so, it leaves that section unchanged and moves on to the next one.

If (newOrientation <> orientation) Then

Otherwise, the remainder of the code changes the attributes necessary to change the section's orientation: it sets the w:orient attribute, swaps the page width and height, and rotates the page margins.
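The core of that change can be shown on a single page size element. The following sketch switches a hand-written w:pgSz element from portrait to landscape. It uses the XmlElement GetAttribute and SetAttribute methods rather than the attribute objects used in the procedure, and the sample markup and module name are hypothetical.

```vb
Imports System
Imports System.Xml

Module SwapDemo
   Sub Main()
      Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

      ' Hypothetical portrait page size with no w:orient attribute.
      Dim doc As New XmlDocument()
      doc.LoadXml("<w:pgSz xmlns:w=""" & wordmlNamespace & """ w:w=""12240"" w:h=""15840""/>")
      Dim pgSz As XmlElement = doc.DocumentElement

      ' Read both dimensions before overwriting either one.
      Dim width As String = pgSz.GetAttribute("w", wordmlNamespace)
      Dim height As String = pgSz.GetAttribute("h", wordmlNamespace)
      pgSz.SetAttribute("w", wordmlNamespace, height)
      pgSz.SetAttribute("h", wordmlNamespace, width)

      ' SetAttribute creates the attribute when it does not exist yet.
      pgSz.SetAttribute("orient", wordmlNamespace, "landscape")

      Console.WriteLine("{0} x {1}, {2}", _
         pgSz.GetAttribute("w", wordmlNamespace), _
         pgSz.GetAttribute("h", wordmlNamespace), _
         pgSz.GetAttribute("orient", wordmlNamespace))   ' 15840 x 12240, landscape
   End Sub
End Module
```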

As this article demonstrates, working with Word 2007 files is much easier with the Open XML Format SDK 1.0. I encourage you to experiment with the code in this series of articles to solve your own programming problems by using the Open XML Format API.
