How to: Retrieve Property Values from a Word Processing Document

This topic describes how to use the classes in the Open XML SDK 2.0 for Microsoft Office to programmatically retrieve a property value from a document part in a word processing document.

The following assembly directives are required to compile the code in this topic.

using System.Windows.Forms;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

To open an existing document, instantiate the WordprocessingDocument class as shown in the following using statement. In the same statement, open the word processing file at the specified path by using the Open(String, Boolean) method. The Boolean parameter controls whether the file is editable: set it to true to open the file for editing, or to false to open it for read-only access. In this example you only need to read the file, so the parameter is set to false.

using (WordprocessingDocument wordDoc = 
       WordprocessingDocument.Open(document, false)) 
{ 
    // Insert other code here. 
}

The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (internal method used by the Open XML SDK to clean up resources) is automatically called when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case wordDoc.
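To make the cleanup guarantee concrete, the following sketch spells out roughly what the using statement does, written as an explicit try/finally block. The class and method names (UsingExpansionSketch, HasMainPart) are hypothetical and exist only for illustration; in practice the using form shown above is preferable.

```csharp
using DocumentFormat.OpenXml.Packaging;

public static class UsingExpansionSketch
{
    // Hypothetical helper: reports whether the package has a main document part.
    public static bool HasMainPart(string document)
    {
        // Open for read-only access, as in the using statement above.
        WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false);
        try
        {
            return wordDoc.MainDocumentPart != null;
        }
        finally
        {
            // Runs on normal exit and when an exception is thrown,
            // which is what reaching the closing brace of a using block does.
            if (wordDoc != null)
            {
                wordDoc.Dispose();
            }
        }
    }
}
```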

The basic document structure of a WordprocessingML document consists of the document and body elements, followed by one or more block-level elements such as p, which represents a paragraph. A paragraph contains one or more r elements. The r stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more t elements. The t element contains a range of text. For example, the WordprocessingML markup for a document that contains only the text “Example text.” is shown in the following code example.

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Example text.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Using the Open XML SDK 2.0, you can create document structure and content using strongly-typed classes that correspond to WordprocessingML elements. You will find these classes in the DocumentFormat.OpenXml.Wordprocessing namespace. The following table lists the class names of the classes that correspond to the document, body, p, r, and t elements.

WordprocessingML Element    Open XML SDK 2.0 Class    Description
document                    Document                  The root element for the main document part.
body                        Body                      The container for the block level structures such as paragraphs, tables, annotations, and others specified in the ISO/IEC 29500 specification.
p                           Paragraph                 A paragraph.
r                           Run                       A run.
t                           Text                      A range of text.
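As an illustration of how these classes mirror the markup, the following sketch builds the “Example text.” structure in memory and then walks it back down from Document to Text. The class and method names (StructureSketch, BuildExample, ReadText) are hypothetical; no file is created or opened here.

```csharp
using System.Text;
using DocumentFormat.OpenXml.Wordprocessing;

public static class StructureSketch
{
    // Builds document > body > p > r > t, matching the markup shown earlier.
    public static Document BuildExample()
    {
        return new Document(
            new Body(
                new Paragraph(
                    new Run(
                        new Text("Example text.")))));
    }

    // Walks the same hierarchy and concatenates the text of every t element.
    public static string ReadText(Document doc)
    {
        StringBuilder result = new StringBuilder();
        foreach (Paragraph p in doc.Body.Elements<Paragraph>())
        {
            foreach (Run r in p.Elements<Run>())
            {
                foreach (Text t in r.Elements<Text>())
                {
                    result.Append(t.Text);
                }
            }
        }
        return result.ToString();
    }
}
```

Each nested constructor call corresponds to one level of nesting in the XML, so the shape of the code follows the shape of the markup.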

The following text from the ISO/IEC 29500 specification introduces the Extended File Properties part, which is the document part that this topic reads.

An instance of this part contains properties specific to an Office Open XML document. [Example: A PresentationML document specifies the number of slides in this presentation when last saved by a producer. end example] A package shall contain at most one Extended File Properties part, and that part shall be the target of a relationship in the package-relationship item for the document.

[Example:

<Relationships xmlns="…">
   <Relationship Id="rId4"
      Type="http://…/extended-properties" Target="docProps/app.xml"/>
</Relationships>

end example]

The root element for a part of this content type shall be Properties.

[Example: Here's some content markup from a WordprocessingML document:

<Properties …>
   <Template>Normal.dotm</Template>
   <TotalTime>0</TotalTime>
   <Pages>1</Pages>
   <Words>3</Words>
   <Characters>22</Characters>
   <Application>Sample Producer</Application>
   <DocSecurity>0</DocSecurity>
   <Lines>1</Lines>
   <Paragraphs>1</Paragraphs>
   …
   <AppVersion>12.0000</AppVersion>
</Properties>

end example]

© ISO/IEC 29500:2008.
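Because the part is plain XML with one child element per property, a property can be looked up by its element name. The sketch below does this against a hand-written sample string rather than a real file. The sample content, the class name AppXmlSketch, and the ReadProperty helper are assumptions made for illustration; the namespace URI is the one normally used for extended properties and is not taken from the elided specification excerpt above.

```csharp
using System.Xml;

public static class AppXmlSketch
{
    // Hand-written sample for illustration only, not the content of a real file.
    public const string SampleXml =
        "<Properties xmlns=\"http://schemas.openxmlformats.org/officeDocument/2006/extended-properties\">" +
        "<Pages>1</Pages><Words>3</Words><Characters>22</Characters>" +
        "</Properties>";

    // Returns the text of the first element with the given name,
    // or null when no such element exists.
    public static string ReadProperty(string xml, string name)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Matches on the qualified element name; the elements carry no prefix,
        // so the plain name is enough here.
        XmlNodeList nodes = doc.GetElementsByTagName(name);
        return nodes.Count > 0 ? nodes[0].InnerText : null;
    }
}
```

For example, AppXmlSketch.ReadProperty(AppXmlSketch.SampleXml, "Words") reads the Words element from the sample.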

After you have opened the word processing file for read-only access, you can get the ExtendedFilePropertiesPart instance from the ExtendedFilePropertiesPart property of the document and examine that part. By using the GetStream method, you can get the content of the part as a data stream, load it into an XmlDocument, and read the Characters element to get the number of characters.

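XmlDocument xmlProperties = new XmlDocument();

using (WordprocessingDocument wordDoc =
    WordprocessingDocument.Open(document, false))
{
    ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;

    // Load the XML content of the part while the package is still open.
    xmlProperties.Load(appPart.GetStream());
}
XmlNodeList chars = xmlProperties.GetElementsByTagName("Characters");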
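As an alternative to parsing the stream, the SDK also exposes the part through strongly-typed classes in the DocumentFormat.OpenXml.ExtendedProperties namespace. The following is a minimal sketch that assumes the Properties root element offers a Characters child property; the class and method names (TypedPropertySketch, GetCharacterCount) are hypothetical, and the null checks cover documents that lack the part or the element.

```csharp
using DocumentFormat.OpenXml.ExtendedProperties;
using DocumentFormat.OpenXml.Packaging;

public static class TypedPropertySketch
{
    // Returns the Characters value as text, or null when it is not available.
    public static string GetCharacterCount(string document)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(document, false))
        {
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
            if (appPart == null || appPart.Properties == null)
            {
                return null; // The package has no extended file properties.
            }

            // Assumption: Properties.Characters maps to the <Characters> element.
            Characters characters = appPart.Properties.Characters;
            return characters != null ? characters.Text : null;
        }
    }
}
```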

The sample code retrieves the number of characters in a word processing document. To call the GetPropertyFromDocument method, you can use the code in the following example, which retrieves the number of characters in a file named “Word17.docx” and displays the result in a message box.

string document = @"C:\Users\Public\Documents\Word17.docx";
GetPropertyFromDocument(document);

Following is the complete sample code in C#.

// Retrieves the Characters property from the extended file properties
// part of a document and displays it in a message box.
public static void GetPropertyFromDocument(string document)
{
    XmlDocument xmlProperties = new XmlDocument();

    // Open the document for read-only access.
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;

        // Load the XML content of the part while the package is still open.
        xmlProperties.Load(appPart.GetStream());
    }
    XmlNodeList chars = xmlProperties.GetElementsByTagName("Characters");

    MessageBox.Show("Number of characters in the file = " +
        chars.Item(0).InnerText, "Character Count");
}
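The sample above assumes that the document contains the part and a Characters element, and it is tied to a message box. A more defensive variation, sketched below under those observations, returns the value to the caller and works for any property element name. The names ExtendedPropertyReader and GetExtendedProperty are hypothetical and are not part of the SDK.

```csharp
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

public static class ExtendedPropertyReader
{
    // Returns the text of the named property element (for example "Words"),
    // or null when the part or the element is missing.
    public static string GetExtendedProperty(string document, string propertyName)
    {
        XmlDocument xmlProperties = new XmlDocument();

        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(document, false))
        {
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
            if (appPart == null)
            {
                return null; // The package has no extended file properties part.
            }

            // Dispose the stream as soon as the XML has been loaded.
            using (Stream stream = appPart.GetStream())
            {
                xmlProperties.Load(stream);
            }
        }

        XmlNodeList nodes = xmlProperties.GetElementsByTagName(propertyName);
        return nodes.Count > 0 ? nodes[0].InnerText : null;
    }
}
```

A call such as ExtendedPropertyReader.GetExtendedProperty(document, "Pages") would then read a different property without changing the helper.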

© 2014 Microsoft. All rights reserved.