Inserting Repeating Data Items into a Word 2007 Table by Using the Open XML API

Summary: Walk through a solution for displaying an arbitrary number of data items in a Microsoft Office Word 2007 document by using Open XML Formats and the Open XML Format SDK 2.0. (6 printed pages)

Stephen Oliver, Microsoft Corporation

February 2009

Applies to: Microsoft Office Word 2007

Contents

  • Overview

  • Inserting Repeating Data

  • Conclusion

  • Additional Resources

Overview

As part of a business solution, developers sometimes need to display an unknown number of repeating data items in a Microsoft Office Word 2007 document. Word 2007 can map data to the document surface dynamically through data-bound content controls that are mapped to custom XML parts. However, you can map a content control only to a single XML element in a custom XML part. In addition, Word 2007 does not provide a table content control that can be bound to custom XML that represents tabular data.

Further, to display repeating data items in a Word 2007 document through content controls, you must know at design time how many data items you want to map, because you must create a content control for each XML element that you want to display. This approach does not work when the number of data items is unknown or arbitrary. Additionally, in many instances the Word 2007 document must be manipulated without running Word 2007, whether for performance and scalability, because of constraints inherent in a specific business case, or for some other reason.

Despite these limitations, you can easily develop a solution that displays repeating data items in a Word 2007 document by using the Open XML Format SDK 2.0. This article demonstrates one possible way to display repeating data items by using the Open XML Format SDK 2.0 (CTP). In the Open XML Format SDK 2.0, nearly all elements in the Open XML file format schemas are represented as managed objects. By using the API, you can work with object properties and methods while the API makes the necessary changes in the underlying XML. This level of abstraction allows you to more easily work with the Open XML file formats. Also, the Open XML Format SDK 2.0 does not require that you automate the 2007 Microsoft Office applications to work with their associated Open XML file formats.

Inserting Repeating Data

At its most basic, inserting repeating data is simple: locate the table in a Word 2007 document that you want to modify, add a row to the table for each data item present, and then populate the cells in the row with data taken from a data source.

For example, consider an application that creates an order history document for a customer. The application pulls the order history information for a given customer from a database and then displays that history in a table in a Word 2007 document. In this case, the order history document is based on a generic Word 2007 document that contains placeholders for such things as the customer name and address. The document also contains the table used for the customer's order history. The generic Word document acts as a template for individual instances of the order history document. In this scenario, every new order history document is a copy of the Word 2007 template document. The document is populated at run time with data (for example, the order history) specific to a particular customer and then saved as a separate Word 2007 document with a unique file name.

Note

Template is used here in the general sense and does not mean a Word 2007 template file (.dotx).
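The copy step in this scenario can be sketched with the file APIs in the System.IO namespace. The following is a minimal illustration, not part of the original solution: the OrderHistoryFiles class, the CreateCopy method, and the file-naming pattern are hypothetical, and the sketch assumes that the customer ID contains only characters that are valid in a file name.

```csharp
using System;
using System.IO;

public static class OrderHistoryFiles
{
    // Hypothetical helper: copies the generic document into outputFolder
    // under a customer-specific name and returns the path of the copy.
    public static string CreateCopy(string templatePath, string outputFolder, string customerId)
    {
        // Assumed naming pattern: customer ID plus a timestamp.
        string fileName = string.Format("OrderHistory_{0}_{1:yyyyMMddHHmmss}.docx",
            customerId, DateTime.Now);
        string outputPath = Path.Combine(outputFolder, fileName);

        // Passing false for overwrite makes the copy fail with an IOException
        // instead of silently replacing an existing document of the same name.
        File.Copy(templatePath, outputPath, false);
        return outputPath;
    }
}
```

The returned path could then be passed to WordprocessingDocument.Open so that all later changes are made to the copy and the generic document stays untouched.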

Locating a Table

After the application has opened a copy of the generic Word 2007 document, the table in the document must be located so that you can add data to it. This can pose a problem if there is more than one table in the document. The underlying file format for Word 2007 documents follows the Open XML schema for word-processing documents (WordprocessingML). It can be difficult to uniquely identify a given table from another table in a WordprocessingML document, especially if the tables share characteristics such as style and dimensions. Consider the following WordprocessingML for a typical table.

<w:tbl>
    <w:tblPr>
        <w:tblStyle w:val="TableGrid"/>
        <w:tblW w:w="0" w:type="auto"/>
        <w:tblLook w:val="04A0"/>
    </w:tblPr>
    <w:tblGrid>
        <w:gridCol w:w="3192"/>
        <w:gridCol w:w="3192"/>
        <w:gridCol w:w="3192"/>
    </w:tblGrid>
    <!-- Table rows (w:tr elements) omitted. -->
</w:tbl>

In general, WordprocessingML does not attach an element or attribute to the table element that you could easily use to tell one table apart from another.

You could locate a table in a document based on the style applied to the table, especially where semantic meaning is attached to a given style. For example, you could create a custom table style that is associated with specific business logic. However, this method may not be as useful if more than one table in your document needs to use the same custom style.
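As a rough sketch of the style-based approach, the following code filters the tables in the document body by the value of the table style element (<tblStyle>). The TableFinder class and the FindTablesByStyle method are hypothetical names introduced for illustration. Note that the value stored in the markup is the style ID (for example, "TableGrid"), which can differ from the style name shown in the Word user interface.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TableFinder
{
    // Hypothetical helper: returns every table in the body whose table style
    // ID matches styleId. Tables without a style are skipped.
    public static IEnumerable<Table> FindTablesByStyle(Body body, string styleId)
    {
        return body.Descendants<Table>().Where(t =>
        {
            TableProperties props = t.GetFirstChild<TableProperties>();
            return props != null
                && props.TableStyle != null
                && props.TableStyle.Val != null
                && props.TableStyle.Val.Value == styleId;
        });
    }
}
```

Because the method returns a sequence rather than a single table, the caller can decide how to handle the case where several tables share the style, which is the limitation described above.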

Another way to locate a table would be to use its location in the document body. For example, you might use code similar to the following to retrieve the first table in a document.

Table myTable = mainPart.Document.Body.Descendants<Table>().First();

In the preceding line of code, a Table object represents the table element (<tbl>) returned by the First() extension method call. That is, the code searches through all the descendant elements of the body element (<body>), locating only the table descendant elements, and then returns the first one. This might be an acceptable solution in cases where the order of the tables is fixed and will not change arbitrarily. However, if you cannot be certain of the order of the tables, you can locate a table in code more dependably by using content controls.

Locating a Table by Using Content Controls

Content controls have properties that can be used to uniquely identify them. The title property is the text caption for the content control and is visible when the end user views the content control in the Word 2007 document. However, because this property is used to convey information to the end user, it may not be appropriate in some instances to use it to uniquely identify the control to your application.

The tag property is similar to the title property but is not visible to the end user. You can use the tag property to easily identify the control to your application. The following figure shows the Content Control Properties dialog box, with which you can set the title and tag properties.

Figure 1. Content Control Properties dialog box


Because the content control can be uniquely identified, anything that is placed inside the control can also be uniquely identified. When you place a table inside a content control, you then have the ability to locate the table by first locating the content control.

Important

Content controls have no built-in mechanism to ensure that the values for the title or tag property are unique in a single Word 2007 document. It is up to the document creator to ensure that the title or tag property value of a content control is unique.

The following figure shows a content control with the title and tag values set.

Figure 2. Content control with title and tag values


In WordprocessingML, content controls are represented by the structured document tag (<sdt>) element. The <sdt> element, in turn, is represented by Open XML Format SDK classes whose names begin with "Sdt". For a content control that contains a table, the relevant class is SdtBlock, which represents <sdt> elements that enclose one or more block-level structures (for example, paragraphs and tables).

In the following code, the content control in the example Word 2007 document (depicted in Figure 2) is located in the underlying WordprocessingML based on the value of its tag property. The table in the control is then easily discovered.

 
using (WordprocessingDocument theDoc = WordprocessingDocument.Open(location, true))
{
    MainDocumentPart mainPart = theDoc.MainDocumentPart;

    // This should return only one content control element: the one with
    // the specified tag value. If not, Single() throws an exception.
    // Content controls that have no tag are skipped instead of causing
    // a NullReferenceException.
    SdtBlock ccWithTable = mainPart.Document.Body.Descendants<SdtBlock>().Where
        (r => r.SdtProperties != null
            && r.SdtProperties.GetFirstChild<Tag>() != null
            && r.SdtProperties.GetFirstChild<Tag>().Val == tblTag).Single();

    // This should return only one table.
    Table theTable = ccWithTable.Descendants<Table>().Single();

    // The code examples that follow continue inside this using block.

Because only one content control contains the table that will be modified, the code above uses the Single extension method, which returns the only element of a sequence and throws an exception if the sequence does not contain exactly one element.

Figure 3. Content control with table inside


Once the table is located in the document, the next step is to insert data items from the data source.

Insert Repeating Data Items

The data source for this example is the Northwind sample database, although any number of different data sources could be used.

Tip

You can find the Northwind sample database on a computer on which Microsoft Visual Studio is installed. By default, Visual Studio 2008 samples and a readme file are installed in drive:\Program Files\Microsoft Visual Studio 9.0\Samples\LCID. The Northwind.mdf database file is contained in the CSharpSamples.zip file in \LinqSamples\Data and in the VBSamples.zip file in \VB Samples\Language Samples\LINQ Samples\LinqToNorthwind\LinqToNorthwind\Data. You can also download SQL Server sample databases at Microsoft SQL Server Community Projects and Samples. For Microsoft Visual Studio Express Editions, all samples are located online. For more information, see How to: Install Sample Databases.

The LINQ query in the following code example uses classes generated from Visual Studio 2008 LINQ to SQL to get customer order information from the Northwind database. For more information about LINQ to SQL classes, see LINQ to SQL.

NorthWindDBDataContext db = new NorthWindDBDataContext();

var customerOrderHistory =
    from order in db.Orders
    where order.CustomerID == custID
    join detail in db.Order_Details on order.OrderID equals detail.OrderID
    select new
    {
        Contact = order.Customer.ContactName,
        NameOfProduct = detail.Product.ProductName,
        Amount = detail.Quantity
    };

The select statement projects a new data type that contains the properties for the data that you want to display in the Word 2007 table: the contact name for the customer, the name of the product that the customer purchased, and the amount of product the customer purchased.

After establishing the data source and with the table located in the document, the code gets a reference to the last row in the table. In the following example, the last row is the empty row just under the header row.

// Get the last row in the table.
TableRow theRow = theTable.Elements<TableRow>().Last();

The table in the document contains only the header row and an empty row under the header row, as shown in Figure 3. Rather than building each row piece by piece for every data item, the code copies the empty row (the last row), which ensures that each new row always contains the correct number of columns. For every data item, the foreach loop in the following code adds another copy of the empty placeholder row.

To put text in a table cell by using the Open XML Format SDK, you need a Text object, which is a child element of a Run object, which in turn is a child element of a Paragraph object. The code therefore nests the constructors for the Paragraph, Run, and Text objects. The Text class has a constructor overload that takes a string as its only argument, so the code calls that constructor and supplies a value from the data source. Each new paragraph is added to its cell by using the overloaded Append method, which takes a parameter array of objects of type OpenXmlElement. Once the code has written a value in each cell of the new row, the row is appended to the table by using the AppendChild method, as shown in the following code.

foreach (var order in customerOrderHistory)
{
    // Copy the empty placeholder row, including its cells.
    TableRow rowCopy = (TableRow)theRow.CloneNode(true);

    // Write one value from the data item into each cell of the copy.
    rowCopy.Descendants<TableCell>().ElementAt(0).Append(new Paragraph
        (new Run(new Text(order.Contact.ToString()))));
    rowCopy.Descendants<TableCell>().ElementAt(1).Append(new Paragraph
        (new Run(new Text(order.NameOfProduct.ToString()))));
    rowCopy.Descendants<TableCell>().ElementAt(2).Append(new Paragraph
        (new Run(new Text(order.Amount.ToString()))));

    // Add the populated row to the end of the table.
    theTable.AppendChild(rowCopy);
}
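The loop above appends a new paragraph to each cloned cell. If the cells of the placeholder row already contain an empty paragraph, as cells created in Word typically do, each populated cell ends up with an empty paragraph followed by the paragraph that holds the text. One possible variation, sketched here under that assumption, writes the text into the existing paragraph instead. The RowHelper class and the CloneRowWithValues method are hypothetical names and are not part of the Open XML Format SDK.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RowHelper
{
    // Hypothetical helper: returns a deep copy of templateRow in which the
    // first values.Length cells each receive one value, in order.
    public static TableRow CloneRowWithValues(TableRow templateRow, params string[] values)
    {
        TableRow rowCopy = (TableRow)templateRow.CloneNode(true);
        TableCell[] cells = rowCopy.Elements<TableCell>().ToArray();

        for (int i = 0; i < cells.Length && i < values.Length; i++)
        {
            // Reuse the paragraph that the cloned cell already contains, if
            // there is one, so that no extra empty paragraph is left behind.
            Paragraph para = cells[i].Elements<Paragraph>().FirstOrDefault();
            if (para == null)
            {
                para = cells[i].AppendChild(new Paragraph());
            }

            // A null value is written as empty text.
            para.AppendChild(new Run(new Text(values[i] ?? string.Empty)));
        }

        return rowCopy;
    }
}
```

Inside the foreach loop, the body could then be reduced to a single call such as theTable.AppendChild(RowHelper.CloneRowWithValues(theRow, order.Contact, order.NameOfProduct, order.Amount.ToString())). Reusing the existing paragraph also keeps any paragraph properties that were applied to the placeholder cells in the generic document.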

When each of the data items has been written to the table, the code exits the foreach loop. You could simply save the changes to the customer order history document, but that would leave one small anomaly in the document. Recall that the code in the preceding example copied the existing placeholder row and added each copy to the end of the table but made no changes to the placeholder row itself. As a result, the row after the header row (the placeholder row) remains empty. To work around this, remove the placeholder row before finally saving all the changes to the table back into the document, as shown in the following code.

 
// Remove the empty placeholder row from the table.
theTable.RemoveChild(theRow);

// Save the changes to the table back into the document.
mainPart.Document.Save();

Calling the Save method of the Document object commits changes made to the in-memory representation of the MainDocumentPart document object model back into the XML file for the MainDocumentPart part (the Document.xml file in the Wordprocessing package).

Conclusion

Using the Open XML Format SDK 2.0 greatly simplifies writing code that manipulates the Open XML file formats. Because the Open XML Format SDK 2.0 represents Open XML file format schema elements as first-class managed objects, developers can manipulate the underlying XML of an Open XML file format document by using familiar object-oriented features, such as properties, methods, and type safety, instead of having to directly manipulate the XML Document Object Model (DOM). As you have seen in this article, this makes tasks like inserting repeating data into a Word 2007 table considerably easier to accomplish.

Additional Resources

For more information, see the following additional resources.