
Retrieving Content from Different Parts: Explicit or Implicit Relationships in the Open XML SDK 2.0 for Microsoft Office

Office 2007

Summary: Learn how to use the Open XML Software Development Kit 2.0 for Microsoft Office to retrieve content from a Microsoft Office Word 2007 document programmatically by using explicit and implicit relationships. (16 printed pages)

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007, Open XML SDK 2.0 for Microsoft Office, Microsoft Visual Studio 2008

Joel Krist, iSoftStone

August 2009

Overview

The Open XML Software Development Kit 2.0 for Microsoft Office makes it possible to create and manipulate Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 documents programmatically via the Open XML formats. The typesafe classes included with the SDK provide a layer of abstraction between the developer and the Open XML formats, simplifying the process of working with Office 2007 documents and enabling the creation of solutions that are not dependent on the presence of the Office client applications to handle document creation.

The sample code in this visual how-to article shows how to use the classes provided with the Open XML SDK 2.0 for Microsoft Office to retrieve content from a Word 2007 document programmatically by using explicit and implicit relationships. The code uses the HeaderReference, CommentRangeStart, and Hyperlink classes from the SDK to retrieve information about the headers, comments, and hyperlinks in a document.

See It


Watch the Video

Length: 11:31 | Size: 14.60 MB | Type: WMV file


Code It

Download the sample code

This visual how-to article presents a solution that creates a Windows console application that displays information about the headers, comments, and hyperlinks in a Word 2007 document. The following figure shows the output generated when the code is run against a Word document that contains different first, even, and odd page headers, three comments, and three hyperlinks.

Figure 1. Output from the sample code


This section walks through the following steps:

  1. Creating a Windows console application solution in Visual Studio 2008.

  2. Adding references to the DocumentFormat.OpenXml and WindowsBase assemblies.

  3. Adding the sample code to the solution.

Creating a Windows Console Application in Visual Studio 2008

This visual how-to article uses a Windows console application to provide the framework for the sample code. However, you could use the same approach that is illustrated here with other application types as well.

To create a Windows Console Application in Visual Studio 2008

  1. Start Microsoft Visual Studio 2008.

  2. On the File menu, point to New, and then click Project.

  3. In the New Project dialog box, select the Visual C# Windows type in the Project types pane.

  4. Select Console Application in the Templates pane, and then name the project RelationshipTest.

    Figure 2. Create new solution in the New Project dialog box


  5. Click OK to create the solution.

Adding References to the DocumentFormat.OpenXml and WindowsBase Assemblies

The sample code uses the classes and enumerations that are in the DocumentFormat.OpenXml.dll assembly that is installed with the Open XML SDK 2.0 for Microsoft Office. To add the reference to the assembly in the following steps or to build the sample code that accompanies this visual how-to, you must first download and install the Open XML SDK 2.0 for Microsoft Office so that the assembly is available.

To add References to the DocumentFormat.OpenXml and WindowsBase Assemblies

  1. Add a reference to the DocumentFormat.OpenXml assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to DocumentFormat.OpenXml, select it, and then click OK.

      Figure 3. Add Reference to DocumentFormat.OpenXml


  2. The classes in the DocumentFormat.OpenXml assembly use the System.IO.Packaging.Package class that is defined in the WindowsBase assembly. Add a reference to the WindowsBase assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to WindowsBase, select it, and then click OK.

      Figure 4. Add Reference to WindowsBase


Adding the Sample Code to the Solution

Replace the entire contents of the Program.cs source file with the following code.

using System;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        string testDocumentPath = @"C:\Temp\RelationshipTest.docx";

        using (WordprocessingDocument doc =
          WordprocessingDocument.Open(testDocumentPath, false))
        {
            ProcessHeaders(doc);
            ProcessComments(doc);
            ProcessHyperlinks(doc);
        }
    }

    static void ProcessHeaders(WordprocessingDocument doc)
    {
        string headerRelationshipId;
        string headerType = "";
        StringBuilder headerText = null;
        Document mainDocument = doc.MainDocumentPart.Document;

        // If there are headers then display a heading for the
        // headers output.
        if (mainDocument.Descendants<HeaderReference>().Count() > 0)
        {
            Console.WriteLine("{0, -12} {1, -30} {2, -30}",
                "Header Id", "Header Type", "Header Text");
            Console.WriteLine("{0, 12} {1, 30} {2, 30}",
                "------------",
                "------------------------------",
                "------------------------------");
        }
        else
        {
            Console.WriteLine(
                "The document does not contain any headers.");
            Console.WriteLine();
            return;
        }

        // Iterate through the headerReference elements in the main
        // document part.
        foreach (HeaderReference headerReference in
            mainDocument.Descendants<HeaderReference>())
        {
            // The headerReference element has an explicit relationship
            // with a Header part. Get the relationship id that points
            // to the header part.
            headerRelationshipId = headerReference.Id.Value;

            // Get the header's type from the headerReference
            // Type attribute.
            headerType = headerReference.Type.Value.ToString();

            // Get the header element from the Header part via the
            // explicit relationship id.
            Header header =
                ((HeaderPart)(doc.MainDocumentPart.GetPartById(
                headerRelationshipId))).Header;

            // Get the header text. The text could be spread across
            // multiple text runs so process all the text elements
            // that are descendants of the header element.
            headerText = new StringBuilder();
            
            foreach (Text text in header.Descendants<Text>())
                headerText.Append(text.InnerText);

            // Shorten the header text if needed.
            if (headerText.Length > 30)
            {
                headerText.Length = 27;
                headerText.Append("...");
            }

            // Display the header relationship id, header type,
            // and header text.
            Console.WriteLine("{0, 12} {1, -30} {2, -30}",
                headerRelationshipId, headerType, headerText);
        }

        Console.WriteLine();
    }

    static void ProcessComments(WordprocessingDocument doc)
    {
        StringBuilder runText = null;
        string commentId;
        StringBuilder commentText = null;
        Document mainDocument = doc.MainDocumentPart.Document;

        // If there are comments then display a heading for the
        // comments output.
        if (mainDocument.Descendants<CommentRangeStart>().Count() > 0)
        {
            Console.WriteLine("{0, -12} {1, -30} {2, -30}",
                "Comment Id", "Comment Run Text", "Comment Text");
            Console.WriteLine("{0, 12} {1, 30} {2, 30}",
                "------------",
                "------------------------------",
                "------------------------------");
        }
        else
        {
            Console.WriteLine(
                "The document does not contain any comments.");
            Console.WriteLine();
            return;
        }

        // Iterate through the commentRangeStart elements in the
        // main document part.
        foreach (CommentRangeStart commentRangeStart in
            mainDocument.Descendants<CommentRangeStart>())
        {
            runText = new StringBuilder();
            commentText = new StringBuilder();

            // Get the text in the document that is associated
            // with the comment. The text could be spread across
            // multiple text elements so process all the text
            // elements that are descendants of the commentRangeStart
            // element's parent paragraph.
            foreach (Text text in
                commentRangeStart.Parent.Descendants<Text>())
                runText.Append(text.InnerText);

            // The commentRangeStart element has an implicit
            // relationship with the Comments part.
            // Get the id of the associated comment via the
            // commentRangeStart's Id attribute.
            commentId = commentRangeStart.Id.Value;

            // Get the comment element from the comments part
            // via its id.
            Comment comment = doc
                .MainDocumentPart
                .WordprocessingCommentsPart
                .Comments
                .Elements<Comment>().Single(c => c.Id == commentId);

            // Get the comment text. The text could be spread across
            // multiple text runs so process all the text elements
            // that are descendants of the comment element.
            foreach (Text text in comment.Descendants<Text>())
                commentText.Append(text.InnerText);

            // Shorten the run and comment text if needed.
            if (runText.Length > 30)
            {
                runText.Length = 27;
                runText.Append("...");
            }

            if (commentText.Length > 30)
            {
                commentText.Length = 27;
                commentText.Append("...");
            }

            // Display the comment id, document text, and comment text.
            Console.WriteLine("{0, 12} {1, -30} {2, -30}",
                commentId, runText, commentText);
        }

        Console.WriteLine();
    }

    static void ProcessHyperlinks(WordprocessingDocument doc)
    {
        StringBuilder hyperlinkText = null;
        string hyperlinkRelationshipId;
        StringBuilder hyperlinkUri = null;
        Document mainDocument = doc.MainDocumentPart.Document;

        // If there are hyperlinks then display a heading for the
        // hyperlinks output.
        if (mainDocument.Descendants<Hyperlink>().Count() > 0)
        {
            Console.WriteLine("{0, 12} {1, -30} {2, -30}",
                "Hyperlink Id", "Hyperlink Text", "Hyperlink Uri");
            Console.WriteLine("{0, 12} {1, 30} {2, 30}",
                "------------",
                "------------------------------",
                "------------------------------");
        }
        else
        {
            Console.WriteLine(
                "The document does not contain any hyperlinks.");
            Console.WriteLine();
            return;
        }

        // Iterate through the hyperlink elements in the
        // main document part.
        foreach (Hyperlink hyperlink in
            mainDocument.Descendants<Hyperlink>())
        {
            hyperlinkText = new StringBuilder();

            // Get the text in the document that is associated
            // with the hyperlink. The text could be spread across
            // multiple text elements so process all the text
            // elements that are descendants of the hyperlink element.
            foreach (Text text in hyperlink.Descendants<Text>())
                hyperlinkText.Append(text.InnerText);

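            // The hyperlink element has an explicit relationship
            // with the actual hyperlink. Get the relationship id
            // via the hyperlink element's Id attribute. A hyperlink
            // that points to a bookmark inside the document has no
            // relationship id, so skip it.
            if (hyperlink.Id == null)
                continue;

            hyperlinkRelationshipId = hyperlink.Id.Value;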

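            // Get the hyperlink uri via the explicit relationship Id.
            // The released SDK exposes hyperlink relationships through
            // HyperlinkRelationships; ExternalRelationships does not
            // include them.
            HyperlinkRelationship hyperlinkRelationship = doc
                .MainDocumentPart
                .HyperlinkRelationships
                .Single(c => c.Id == hyperlinkRelationshipId);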
            
            hyperlinkUri = new StringBuilder(
                hyperlinkRelationship.Uri.AbsoluteUri);

            // Shorten the hyperlink text and uri if needed.
            if (hyperlinkText.Length > 30)
            {
                hyperlinkText.Length = 27;
                hyperlinkText.Append("...");
            }

            if (hyperlinkUri.Length > 30)
            {
                hyperlinkUri.Length = 27;
                hyperlinkUri.Append("...");
            }

            // Display the hyperlink relationship id, hyperlink text,
            // and hyperlink uri.
            Console.WriteLine("{0, 12} {1, -30} {2, -30}",
                hyperlinkRelationshipId, hyperlinkText, hyperlinkUri);
        }

        Console.WriteLine();
    }
}

 

Build and run the solution in Visual Studio by pressing CTRL+F5. The sample code displays header, comment, and hyperlink data for a source document named RelationshipTest.docx in the C:\Temp folder. The source document must exist for the solution to work. To change the name or the location of the source document, modify the sample code and change the value of the testDocumentPath variable that is defined in the Main method.
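
As an alternative to editing the hard-coded path, the following is a minimal sketch of how the path could be taken from the command line and checked before the document is opened. The DocumentPathSketch class and its ChoosePath and TryGetExistingDocument methods are hypothetical names introduced for illustration; they are not part of the sample code or the SDK.

```csharp
using System;
using System.IO;

static class DocumentPathSketch
{
    // Same default location that the sample code uses.
    public const string DefaultPath = @"C:\Temp\RelationshipTest.docx";

    // Hypothetical helper: returns the first command-line argument when
    // one is supplied, otherwise the default path.
    public static string ChoosePath(string[] args)
    {
        if (args != null && args.Length > 0 && !string.IsNullOrEmpty(args[0]))
            return args[0];

        return DefaultPath;
    }

    // Hypothetical helper: returns true when the chosen file exists.
    // Otherwise writes a message to the error stream and returns false.
    public static bool TryGetExistingDocument(string[] args, out string path)
    {
        path = ChoosePath(args);

        if (File.Exists(path))
            return true;

        Console.Error.WriteLine("Source document not found: {0}", path);
        return false;
    }
}
```

In the Main method of the sample, the assignment to testDocumentPath could then be replaced with a call to TryGetExistingDocument(args, out testDocumentPath), returning early when the call reports that the file is missing.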

Read It

Document parts can refer to other resources by using the concept of relationships in the Office Open XML file formats. The referenced resource can be internal to the package, such as another part in the document, or external, such as an external hyperlink target. Section 9.2 of the Standard ECMA-376 specification for the Office Open XML File Formats describes relationships in the Open Packaging Conventions (OPC) as follows:

"In OPC, relationships describe references from parts to other internal resources in the package or to external resources. They represent the type of connection between a source part and a target resource, and make the connection directly discoverable without looking at the part contents, so they are quick to resolve."

There are two types of relationships in Office Open XML: explicit and implicit. For an explicit relationship, a resource is referenced from a source part's XML by using the Id attribute of a Relationship tag. Page headers in WordprocessingML documents are referenced by using explicit relationships. For example, a main document part that defines separate headers for first, even, and odd pages will have headerReference tags similar to the following.

<w:headerReference w:type="even" r:id="rId1"/>
<w:headerReference w:type="default" r:id="rId2"/>
<w:headerReference w:type="first" r:id="rId3"/>

 

The r:id attribute of the headerReference tag defines an explicit relationship to the header part that contains the markup for the header. The Relationships part for the main document part will contain a corresponding Relationship tag with an Id attribute that matches the r:id attribute of the headerReference tag and a Target attribute that specifies the Header part that contains the header markup. The following XML is an example.

<Relationships xmlns="...">
    <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/../header"
      Target="header1.xml"/>
    <Relationship Id="rId2"
      Type="http://schemas.openxmlformats.org/../header"
      Target="header2.xml"/>
    <Relationship Id="rId3"
      Type="http://schemas.openxmlformats.org/../header"
      Target="header3.xml"/>
</Relationships>
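
To make the lookup concrete, the following sketch resolves an explicit relationship id to its target by reading Relationships markup directly with LINQ to XML, without the SDK. It is an illustration only: the ExplicitRelationshipSketch class, the ResolveTarget method, and the sample markup are assumptions made for this example, and the SDK's GetPartById method performs the equivalent lookup for you.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ExplicitRelationshipSketch
{
    // Namespace of Relationships markup in the Open Packaging Conventions.
    const string RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Relationship type used for header parts.
    const string HeaderType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header";

    // Sample markup for illustration only (single-quoted attributes are valid XML).
    public const string SampleRels =
        "<Relationships xmlns='" + RelNs + "'>" +
        "<Relationship Id='rId1' Type='" + HeaderType + "' Target='header1.xml'/>" +
        "<Relationship Id='rId2' Type='" + HeaderType + "' Target='header2.xml'/>" +
        "<Relationship Id='rId3' Type='" + HeaderType + "' Target='header3.xml'/>" +
        "</Relationships>";

    // Hypothetical helper: returns the Target of the Relationship whose Id
    // matches relationshipId, or null when no such Relationship exists.
    public static string ResolveTarget(string relsXml, string relationshipId)
    {
        XNamespace rel = RelNs;

        return XDocument.Parse(relsXml).Root
            .Elements(rel + "Relationship")
            .Where(r => (string)r.Attribute("Id") == relationshipId)
            .Select(r => (string)r.Attribute("Target"))
            .FirstOrDefault();
    }
}
```

With the sample markup, ResolveTarget(SampleRels, "rId2") returns header2.xml, which is the part that a headerReference tag with r:id="rId2" refers to.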

 

Implicit relationships also use Ids to identify the relationship target. However, the Ids used by implicit relationships are not used to specify the target part. The target part is implied by the type of the element that has the implicit relationship, and the Id identifies the matching element within that target part. Comments in WordprocessingML documents are referenced by using implicit relationships. For example, a main document part that contains comments has commentRangeStart tags similar to the following.

<w:commentRangeStart w:id="0"/>

 

The w:id attribute of the commentRangeStart tag defines an implicit relationship to a comment in the single Comments part that exists in the package. That is, if a document contains comments, it is implied that there will be a single Comments part that contains comment-related markup. The w:id attribute of the commentRangeStart element matches the w:id attribute of the w:comment element in the Comments part that contains the markup for the related comment. The following markup is an example.

<w:comment w:id="0" w:author="...l" w:date="..." w:initials="..."></w:comment>
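
The following sketch illustrates the implicit lookup by reading Comments markup directly with LINQ to XML, without the SDK. It assumes that the target part is already known to be the Comments part and selects the w:comment element whose w:id matches. The ImplicitRelationshipSketch class, the FindCommentText method, and the sample markup are hypothetical and exist only for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ImplicitRelationshipSketch
{
    // Main WordprocessingML namespace (the w: prefix).
    const string WNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sample markup for illustration only. The ids are deliberately not
    // consecutive, so the lookup has to match on w:id rather than position.
    public const string SampleComments =
        "<w:comments xmlns:w='" + WNs + "'>" +
        "<w:comment w:id='0'><w:p><w:r><w:t>First </w:t></w:r>" +
        "<w:r><w:t>note</w:t></w:r></w:p></w:comment>" +
        "<w:comment w:id='5'><w:p><w:r><w:t>Second note</w:t></w:r></w:p></w:comment>" +
        "</w:comments>";

    // Hypothetical helper: returns the concatenated text of the comment
    // whose w:id matches commentId, or null when there is no such comment.
    public static string FindCommentText(string commentsXml, string commentId)
    {
        XNamespace w = WNs;

        XElement comment = XDocument.Parse(commentsXml).Root
            .Elements(w + "comment")
            .FirstOrDefault(c => (string)c.Attribute(w + "id") == commentId);

        if (comment == null)
            return null;

        // The text can be spread across several runs, so join every w:t.
        return string.Concat(comment.Descendants(w + "t").Select(t => t.Value));
    }
}
```

With the sample markup, FindCommentText(SampleComments, "5") returns "Second note" even though that comment is the second element in the part, and FindCommentText(SampleComments, "1") returns null.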

 

Code that works with relationships to retrieve content takes a different approach for each type of relationship. When working with explicit relationships, use the relationship id to access the required target part. For example, the following code fragment from the sample code works with headers that use explicit relationships. The code first gets the relationship id of the target header part and then uses the GetPartById method to get the target part.

// Iterate through the headerReference elements in the main
// document part.
foreach (HeaderReference headerReference in
    mainDocument.Descendants<HeaderReference>())
{
    // The headerReference element has an explicit relationship
    // with a Header part. Get the relationship id that points
    // to the header part.
    headerRelationshipId = headerReference.Id.Value;

    // Get the header element from the Header part via the
    // explicit relationship id.
    Header header =
        ((HeaderPart)(doc.MainDocumentPart.GetPartById(
        headerRelationshipId))).Header;

 

For implicit relationships, the approach is different. In that situation, the code determines the target part based on the type of the element. Then, it uses the relationship id to get the correct element in the target part. For example, the following code fragment from the sample code works with comments that use implicit relationships. The code first gets the implicit relationship id for a comment in the main document. The code determines that comment markup is stored in the single Comments part, so it accesses the Comments part via WordprocessingDocument.MainDocumentPart.WordprocessingCommentsPart and selects the desired comment element by using the relationship id.

// Iterate through the commentRangeStart elements in the
// main document part.
foreach (CommentRangeStart commentRangeStart in
    mainDocument.Descendants<CommentRangeStart>())
{
    // The commentRangeStart element has an implicit
    // relationship with the Comments part.
    // Get the id of the associated comment via the
    // commentRangeStart's Id attribute.
    commentId = commentRangeStart.Id.Value;

    // Get the comment element from the comments part
    // via its id.
    Comment comment = doc
        .MainDocumentPart
        .WordprocessingCommentsPart
        .Comments
        .Elements<Comment>().Single(c => c.Id == commentId);

Refer to Section 9.2 of the Standard ECMA-376 specification for the Office Open XML File Formats that is available on the Microsoft Office 2007 SP2 Office Open XML Implementer Notes site for more examples that illustrate the differences between explicit and implicit relationships.

Note:

When you use the Open XML SDK 2.0 for Microsoft Office to create a document-generation solution, it is best practice to create a template document first, and then use DocumentReflector, a tool that comes with the SDK. DocumentReflector can generate C# code that uses the SDK typesafe classes to reproduce your template document and the functionality that it contains. You can then use that code to help you add functionality or to help you understand the Open XML document parts and relationships that are required to implement a specific document feature. For more information about best practices and the Open XML SDK 2.0 for Microsoft Office, see Erika Ehrli's blog entry Getting Started Best Practices.

