Retrieving Comments from Word 2010 Documents by Using the Open XML SDK 2.0

Office Visual How To

Summary:  Use the strongly-typed classes in the Open XML SDK 2.0 for Microsoft Office to retrieve an XML block that contains all the comments from a Microsoft Word 2010 document, without loading the document into Microsoft Word.

Applies to: Excel 2010 | Office 2007 | Office 2010 | Open XML | PowerPoint 2010 | VBA | Word 2010

Published:  August 2011

Provided by:  Ken Getz, MCW Technologies, LLC

Overview

The Office Open XML file formats make it possible to retrieve blocks of content from Microsoft Word 2010 documents. The Open XML SDK 2.0 for Microsoft Office adds strongly-typed classes that simplify access to the Office Open XML file formats. The SDK simplifies tasks such as retrieving the block of XML that contains all the comments. The code sample that is included with this Visual How To shows how to use the SDK to achieve this goal.

Code It

The code sample provided with this Visual How To includes the code to retrieve the XML block that contains all the comments from a Word 2007 or Word 2010 document. The following sections discuss the code in detail. When you use the sample code to retrieve the comments, the procedure returns an XML element named w:comments, which contains the XML block of information from the original document. The procedure does not interpret that XML: you (and your application) must interpret the results.

Setting Up References

To use the code from the Open XML SDK 2.0, you must add references to your project. The sample project already includes these references, but in your code, you must explicitly reference the following assemblies:

  • WindowsBase (this reference may already be set for you, depending on the kind of project that you create)

  • DocumentFormat.OpenXml (installed by the Open XML SDK 2.0)

Also, ensure that your code includes at least the following using/Imports statements at the top of the code file.

Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports System.Xml
Imports System.Xml.Linq
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

Examining the Procedure

The WDRetrieveComments procedure accepts a single string parameter: the name of the document from which you want to retrieve the comments. The procedure returns an XDocument instance whose root element, w:comments, contains the comments (or a null reference, if the document has no comments part).

Public Function WDRetrieveComments(ByVal fileName As String) As XDocument
public static XDocument WDRetrieveComments(string fileName)

The procedure examines the document that you specify, looking for the special comments element. If it exists, the procedure returns it. To call the procedure, pass the parameter value, as shown in the code example. Verify that you provide the path of a document that (for demonstration) contains at least a few comments, before you run the sample code.

Dim comments = WDRetrieveComments(DEMOPATH)
If comments IsNot Nothing Then
    Console.WriteLine(comments.ToString())
End If
var comments = WDRetrieveComments(DEMOPATH);
if (comments != null)
    Console.WriteLine(comments.ToString());
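
Because the procedure returns raw XML, the caller still has to pull out the pieces it needs. The following is a minimal sketch of one way to do that with LINQ to XML, not part of the original sample. The CommentSummary class and its Summarize method are hypothetical names introduced here for illustration. The sketch assumes each comment is a w:comment element with a w:author attribute, and that its text is stored in w:t descendants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CommentSummary
{
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns one "author: text" line per w:comment element.
    public static List<string> Summarize(XDocument comments)
    {
        var lines = new List<string>();
        if (comments == null || comments.Root == null)
        {
            // No comments part was found, so there is nothing to report.
            return lines;
        }

        foreach (XElement comment in comments.Root.Elements(W + "comment"))
        {
            // w:author is an attribute; the visible text lives in w:t descendants.
            string author = (string)comment.Attribute(W + "author") ?? "(unknown)";
            string text = string.Concat(
                comment.Descendants(W + "t").Select(t => t.Value));
            lines.Add(author + ": " + text);
        }
        return lines;
    }
}
```

You could then pass the result of WDRetrieveComments(DEMOPATH) to CommentSummary.Summarize and write each returned line to the console.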

Accessing the Document

The code starts by creating a variable named comments that the procedure will return before it exits.

Dim comments As XDocument = Nothing
    ' Code removed here…
Return comments
XDocument comments = null;
    // Code removed here…
return comments;

The code continues by opening the document by using the Open(String, Boolean) method, indicating that the document should be opened for read-only access (the final false parameter). Given the open document, the code uses the MainDocumentPart property to navigate to the main document part. The code stores this reference in a variable named docPart.

Using document = WordprocessingDocument.Open(fileName, False)
    Dim docPart = document.MainDocumentPart
    ' Code removed here…
End Using
using (var document = WordprocessingDocument.Open(fileName, false))
{
    var docPart = document.MainDocumentPart;
    // Code removed here…
}

Finding the Comments

If the main document part reference is not null (and it should not be, for a valid document), the code retrieves a reference to the comments part by using the WordprocessingCommentsPart property of the docPart variable.

If docPart IsNot Nothing Then
    ' Retrieve the comments part.
    Dim commentsPart = docPart.WordprocessingCommentsPart
    ' Code removed here…
End If
if (docPart != null)
{
    // Retrieve the comments part.
    var commentsPart = docPart.WordprocessingCommentsPart;
    // Code removed here…
}

Retrieving the Comments as XML

Given a reference to the comments part, the code first verifies that the reference is not null. If the part exists, the code retrieves a Stream object by calling the GetStream(FileMode, FileAccess) method of the part. Finally, the code wraps the stream in an XmlReader by using the XmlReader.Create(Stream) method and fills the XDocument by passing the reader to the XDocument.Load method.

If commentsPart IsNot Nothing Then
    ' Load an XDocument with the comments content.
    Using stm As Stream = commentsPart.GetStream(FileMode.Open, FileAccess.Read)
        comments = XDocument.Load(XmlReader.Create(stm))
    End Using
End If
if (commentsPart != null)
{
    // Load an XDocument with the comments content.
    using (Stream stm = commentsPart.GetStream(FileMode.Open, FileAccess.Read))
    {
        comments = XDocument.Load(XmlReader.Create(stm));
    }
}

Sample Procedure

The following code shows the full sample procedure.

Public Function WDRetrieveComments(ByVal fileName As String) As XDocument
    Dim comments As XDocument = Nothing

    Using document = WordprocessingDocument.Open(fileName, False)
        ' Retrieve the document part.
        Dim docPart = document.MainDocumentPart
        If docPart IsNot Nothing Then
            ' Retrieve the comments part.
            Dim commentsPart = docPart.WordprocessingCommentsPart
            If commentsPart IsNot Nothing Then
                ' Load an XDocument with the comments content.
                Using stm As Stream = commentsPart.GetStream(FileMode.Open, FileAccess.Read)
                    comments = XDocument.Load(XmlReader.Create(stm))
                End Using
            End If
        End If
    End Using
    Return comments
End Function
public static XDocument WDRetrieveComments(string fileName)
{
    XDocument comments = null;
 
    using (var document = WordprocessingDocument.Open(fileName, false))
    {
        // Retrieve the document part.
        var docPart = document.MainDocumentPart;
        if (docPart != null)
        {
            // Retrieve the comments part.
            var commentsPart = docPart.WordprocessingCommentsPart;
            if (commentsPart != null)
            {
                // Load an XDocument with the comments content.
                using (Stream stm = commentsPart.GetStream(FileMode.Open, FileAccess.Read))
                {
                    comments = XDocument.Load(XmlReader.Create(stm));
                }
            }
        }
    }
    return comments;
}
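
The sample procedure returns the comments as raw XML. As an alternative sketch (not the approach used by the sample), the same comments part can also be read through the SDK's strongly-typed Comments and Comment classes in the DocumentFormat.OpenXml.Wordprocessing namespace. The TypedComments class and ListComments method are hypothetical names introduced here for illustration.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedComments
{
    // Hypothetical helper: returns "author: text" for each comment in the document.
    public static List<string> ListComments(string fileName)
    {
        var results = new List<string>();

        using (var document = WordprocessingDocument.Open(fileName, false))
        {
            var docPart = document.MainDocumentPart;
            if (docPart == null || docPart.WordprocessingCommentsPart == null)
            {
                // No main part or no comments part: nothing to list.
                return results;
            }

            // Comments is the typed root element (w:comments) of the part.
            Comments root = docPart.WordprocessingCommentsPart.Comments;
            if (root == null)
            {
                return results;
            }

            foreach (Comment comment in root.Elements<Comment>())
            {
                string author = comment.Author != null
                    ? comment.Author.Value
                    : "(unknown)";
                // InnerText concatenates the text of all descendant elements.
                results.Add(author + ": " + comment.InnerText);
            }
        }
        return results;
    }
}
```

This variant trades the flexibility of a raw XDocument for direct access to properties such as Author, Date, and Id on each Comment instance.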

Read It

The code sample included with this Visual How To contains code that retrieves the comments from a Word document. To use the sample, install the Open XML SDK 2.0, available from the link listed in the Explore It section. The sample also uses a modified version of code that is included as part of a set of code examples for the Open XML SDK 2.0. The Explore It section also includes a link to the full set of code examples, although you can use the sample without downloading and installing the code examples.

To understand what the sample code is doing, it is useful to examine the contents of the document by using the Open XML SDK 2.0 Productivity Tool for Microsoft Office, which is included as part of the Open XML SDK 2.0. Figure 1 shows a sample document that contains comments, opened in the tool. The sample code retrieves a reference to the Document part, and within that part, locates the comments part.

Figure 1. Open the sample document and locate the element

Comments in the Open XML SDK 2.0 Productivity Tool
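
To make the package structure that the tool displays more concrete, the following sketch reads the comments part without the SDK by treating the document as a ZIP archive. It is an illustration only, under two assumptions that the sample itself does not make: that the ZipArchive class is available (it was added in .NET Framework 4.5, which is later than this article), and that the comments part is stored under the conventional name word/comments.xml. The SDK locates the part through package relationships instead, which is more reliable. PackagePeek and LoadCommentsEntry are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class PackagePeek
{
    // Hypothetical helper. Assumes the comments part uses the conventional
    // name word/comments.xml; a package may legally store it elsewhere.
    public static XDocument LoadCommentsEntry(Stream packageStream)
    {
        // leaveOpen: true, so the caller keeps ownership of packageStream.
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/comments.xml");
            if (entry == null)
            {
                // The document has no entry with the assumed name.
                return null;
            }

            using (Stream entryStream = entry.Open())
            {
                return XDocument.Load(entryStream);
            }
        }
    }
}
```

For example, you could open a document with File.OpenRead and pass the resulting stream to PackagePeek.LoadCommentsEntry.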

 

The sample application demonstrates only a few of the properties and methods that the Open XML SDK 2.0 provides for retrieving and modifying document structure. For more information, see the documentation that is included with the Open XML SDK 2.0 Productivity Tool: click the Open XML SDK Documentation tab in the lower-left corner of the application window, and search for the class that you need to study. Although the documentation does not currently include code examples, the sample shown here and the documentation together should be enough for you to modify the sample application.

See It

Watch the video

> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/c56b6492-8e35-42e7-8ae3-09c4e2f821f3]

Length: 00:06:20

Grab the Code

Explore It

About the Author

MVP Contributor Ken Getz is a senior consultant with MCW Technologies. He is coauthor of ASP.NET Developers Jumpstart (Addison-Wesley, 2002), Access Developer's Handbook (Sybex, 2001), and VBA Developer's Handbook, 2nd Edition (Sybex, 2001).