Using Tagged Content Controls to Solve Structured Document Scenarios in Word 2010

Summary: Learn how to use tagged content controls in Microsoft Word 2010 software solutions.

Applies to: Office 2010 | Open XML | Visual Studio Tools for Microsoft Office | Word 2007 | Word 2010

In this article
Understanding Document Solution Scenarios
Understanding Tagged Content Controls
Locating a Tagged Content Control Programmatically
Conclusion
Additional Resources

Published: February 2011

Provided by: Stephen D. Oliver, Microsoft Corporation | Tristan Davis, Microsoft Corporation

Understanding Document Solution Scenarios

Software solutions that involve manipulating Microsoft Word documents can be categorized into scenarios that describe, in general terms, how the solution interacts with the document. A document generation scenario involves programmatically generating documents. A data extraction scenario involves programmatically extracting specific content from a document.

Content controls are a feature of Word 2007 and Word 2010. A content control is a bounded and potentially labeled region in a document that serves as a container for specific types of content. Individual content controls can contain content such as pictures, dates, lists, or paragraphs of formatted text. This technical article describes how to use tagged content controls as a technique in either of these scenarios. The same technique was previously described in the blog post The Easy Way to Assemble Multiple Word Documents, where content controls are used to locate the parts of the document in which to insert the individual documents.

One advantage of using this technique as opposed to using the XML Mapping feature is that once the content control is located by tag, you can retrieve the WordprocessingML markup contained within the content control or push a variety of Word content into the control. Consider the example of a scenario that requires extracting content from within a Rich Text content control that includes rich text and multiple paragraphs. Using XML Mapping, you can only transfer data between a single XML node in the document data store and a content control in the document, and rich text and multiple paragraphs are not permitted. So in this case, the appropriate alternative is to use tagged content controls.

Note

For more information about the Word XML Mapping feature, see How to: Bind a Content Control to a Node in the Data Store. For more information about Word content control development scenarios, see the Word Content Controls Developer Resource Center.

Understanding Tagged Content Controls

A content control has several properties including the tag property. The tag property allows you to uniquely identify, or tag, the control in order to locate it within the document. The tag value is normally hidden from the end user, but is always available to code, making it ideal to use for software solutions.

Tagged content controls can be effectively used as part of a document generation scenario. In this scenario, tagged content controls specify areas where data is pushed into the document that is generated by the solution.

Note

For another example of using tagged content controls as part of a development solution, see Inserting Repeating Data Items into a Word 2007 Table by Using the Open XML API.

Tagged content controls can also be used as part of a data extraction scenario, where they mark the content in a document that you want to extract.

In both scenarios, the value of the tag property is used to locate a specific content control in the document.

Locating a Tagged Content Control Programmatically

To use a tagged content control in either scenario, first set the tag property for the content control. You can set the tag property using the user interface that is provided in the Content Control Properties dialog box in Word 2010 as shown in Figure 2.

Figure 1. Rich Text content control on the document surface

Screenshot of a rich text content control

 

Figure 2. Content Control Properties dialog box

Content control properties dialog box

Setting the Content Control Tag Property

The following procedure shows how to set the value of the tag property for a Rich Text content control.

To set the tag property of a content control

  1. In Word 2010, click the Developer tab and then click Design Mode.

  2. On the Developer tab, click the Rich Text Content Control button in the Controls group.

    Note

    You may have to enable the display of the Developer tab if it is not already displayed. To show the Developer tab on the ribbon, click File, click Options, click Customize Ribbon, and then select Developer under Customize the Ribbon. Click OK to apply the changes.

  3. Right-click the Rich Text Content Control and then select Properties.

  4. In the Title text box, type a title for the content control (for example, Policy Number). The title is always visible to the end user and is useful for indicating the information that should go inside the content control.

  5. In the Tag text box, type a unique name (for example, idPolicyNumber). The tag is only visible to the user if the document is in Design Mode. Otherwise, it is not displayed.

    Note

    Word does not enforce unique naming. If a specific scenario requires different content controls in a document to have unique tag values, the document creator must ensure that tags are unique.

  6. Click OK to set the Title and Tag properties. The content control displays the Title and Tag text.

  7. On the Developer tab, click Design Mode to toggle off Design Mode and then click the content control that you created. The Title text appears and the Tag text no longer appears.

When the tag property of the content control is set with a unique identifier, you can locate it programmatically using the tag value.

Figure 3. Content Control Properties dialog box with Title and Tag values entered

Content control properties dialog with values set

 

Figure 4. Rich Text content control with Title and Tag values entered

Content control in Design Mode view

Using Word Automation with Visual Basic for Applications to Locate a Content Control by Tag

You can locate a tagged content control either by using Word automation or, without Word automation, by writing code that manipulates the file format directly. When using Word automation, there are two Word object model members that you can use to locate a tagged content control.

The Document.SelectContentControlsByTag method is one way to locate a content control by its tag value. The method returns a ContentControls collection object that contains all content controls with the specified tag value, as shown in the following code example.

Sub GetCCByTag()
Dim cc As ContentControl
Dim docCCs As ContentControls

' Get the collection of all content controls with this tag.
Set docCCs = ActiveDocument.SelectContentControlsByTag("idPolicyNumber")

' If any content controls are found, iterate through them and display the placeholder text of each.
If docCCs.Count <> 0 Then
    For Each cc In docCCs
        MsgBox cc.PlaceholderText
    Next
Else
    MsgBox "No content controls found with that tag value."
End If
End Sub

The Word object model also provides the Document.ContentControls property, which is a collection of all the content controls in the document. You can iterate through this collection and locate a specific content control using its tag property, as demonstrated in the following example.

Sub GetCCByTagFromCollection()
Dim cc As ContentControl
Dim found As Boolean

' Iterate through all the content controls in the document
' and report those having the specified tag value.
For Each cc In ActiveDocument.ContentControls
    If cc.Tag = "idPolicyNumber" Then
        found = True
        MsgBox cc.PlaceholderText
    End If
Next

' Report once, after the loop, if no content control matched.
If Not found Then
    MsgBox "No content control found using that tag value."
End If
End Sub

Using the Open XML SDK and LINQ to Locate a Content Control by Tag

Using Word automation works well for scenarios where running the Word application is not a problem. To process many documents quickly, it is usually faster to work directly against the file format of the document instead of using Word automation. To do this, you must know how the underlying XML in the file format represents content controls.

Using Content Controls in WordprocessingML

The Open XML Formats WordprocessingML schema represents content controls in a Word document as structured document tag (<sdt>) elements. The content control tag property is represented by the <tag> element, which is a child element of the content control properties (<sdtPr>) element.

<w:sdt>
  <w:sdtPr>
    <w:alias w:val="Policy Number"/>
    <w:tag w:val="idPolicyNumber"/>
    <w:id w:val="651797572"/>
    <w:placeholder>
      <w:docPart w:val="45F4EF9575044CEDAC3AA9542E3BE792"/>
    </w:placeholder>
    <w:text/>
  </w:sdtPr>
  <w:sdtEndPr/>
  <w:sdtContent>
    <w:r w:rsidR="00E333BD">
      <w:t>123456</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>

You can use the classes provided in the Microsoft .NET Framework to work with XML in general. However, the Open XML SDK 2.0 API is designed to work directly against the Open XML Formats. You can use the Open XML SDK 2.0 API to locate <sdt> elements in the document whose <tag> element has a specified value.
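As a point of comparison, the general-purpose route can be sketched with LINQ to XML from the .NET Framework. The following is a minimal sketch, not a definitive implementation. The class name SdtTagQuery and its two methods are hypothetical names introduced for illustration, and the sketch assumes that the main document part (word/document.xml) has already been loaded into an XDocument.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, introduced for illustration only.
public static class SdtTagQuery
{
    // WordprocessingML main namespace (the "w" prefix in the markup above).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every <w:sdt> element whose <w:sdtPr>/<w:tag> element
    // has a w:val attribute equal to the specified tag.
    public static List<XElement> FindByTag(XDocument mainPart, string tag)
    {
        return mainPart
            .Descendants(W + "sdt")
            .Where(sdt => sdt.Elements(W + "sdtPr")
                             .Elements(W + "tag")
                             .Attributes(W + "val")
                             .Any(a => a.Value == tag))
            .ToList();
    }

    // Concatenates the text (<w:t>) elements inside the <w:sdtContent>
    // element of the content control.
    public static string GetText(XElement sdt)
    {
        XElement content = sdt.Element(W + "sdtContent");
        return content == null
            ? string.Empty
            : string.Concat(content.Descendants(W + "t").Select(t => t.Value));
    }
}
```

Because FindByTag returns a list, a count greater than one indicates that the tag value is not unique in the document, which is one way to check for duplicate tags.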

Using the Open XML SDK and LINQ

The Open XML SDK 2.0 API provides strongly typed classes that represent elements in the Open XML Formats schemas. In the Open XML SDK 2.0, the WordprocessingML <sdt> element is represented by more than one class. The Open XML SDK 2.0 API class that you use depends on the position of the content control within the document and the kind of content it contains.
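When the position of a content control is not known in advance, one option is to query for SdtElement, the base class that the position-specific classes (such as SdtBlock, SdtRun, SdtRow, and SdtCell) derive from. The following is a minimal sketch under that assumption. The class name ContentControlLocator and its methods are hypothetical, and the sketch searches the document body only, not headers or footers.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, introduced for illustration only.
public static class ContentControlLocator
{
    // Returns all content controls in the document body that have the
    // specified tag, whatever their position in the document.
    public static List<SdtElement> FindByTag(MainDocumentPart mainPart, string tag)
    {
        return mainPart.Document.Body
            .Descendants<SdtElement>()
            .Where(sdt => HasTag(sdt, tag))
            .ToList();
    }

    // Returns the WordprocessingML markup inside the <w:sdtContent> element
    // of the content control, or an empty string if there is none.
    public static string GetContentMarkup(SdtElement sdt)
    {
        OpenXmlElement content = sdt.Elements()
            .FirstOrDefault(e => e.LocalName == "sdtContent");
        return content == null ? string.Empty : content.InnerXml;
    }

    // Content controls without a <w:tag> element never match.
    private static bool HasTag(SdtElement sdt, string tag)
    {
        if (sdt.SdtProperties == null) return false;
        Tag t = sdt.SdtProperties.GetFirstChild<Tag>();
        return t != null && t.Val != null && t.Val.Value == tag;
    }
}
```

Returning the markup rather than plain text preserves rich text and multiple paragraphs, which is the case that XML Mapping does not cover.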

For example, consider the Word document shown earlier in this article. The Word document could be used as a form where the form contains a content control that holds a policy number. The content control has a tag value of idPolicyNumber.

The underlying XML in the document shows that the content control is inside a paragraph.

<w:p w:rsidR="00951B81"
     w:rsidRDefault="00D5228A">
  <w:sdt>
    <w:sdtPr>
      <w:alias w:val="Policy Number"/>
      <w:tag w:val="idPolicyNumber"/>
      <w:id w:val="651797572"/>
      <w:placeholder>
        <w:docPart w:val="45F4EF9575044CEDAC3AA9542E3BE792"/>
      </w:placeholder>
      <w:text/>
    </w:sdtPr>
    <w:sdtEndPr/>
    <w:sdtContent>
      <w:r w:rsidR="00E333BD">
        <w:t>123456</w:t>
      </w:r>
    </w:sdtContent>
  </w:sdt>
</w:p>

Because the content control is inside a paragraph (<p>), you use the SdtRun class from the Open XML SDK 2.0 API. The SdtRun class represents content controls that surround inline-level structures, such as runs (<r>), in a paragraph. The SdtContentRun class represents an <sdtContent> element inside an <sdt> element that contains inline-level content. Within the <sdtContent> element, run (<r>) elements indicate runs of text. In this example, there is only a single run with one text (<t>) element. You can use the Open XML SDK 2.0 API classes to work directly against the XML of the Word document to locate a specific content control by its tag and then traverse elements within the markup for the content control.

The following code sample shows how to do this using the Word document in the previous example.

using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static string GetContentControlValueByTag(string location, string tag)
{
    // Open the Word document specified by "location" as read-only,
    // because this method only reads from it.
    using (WordprocessingDocument theDoc = WordprocessingDocument.Open(location, false))
    {
        // Get the package part that holds the main document content.
        MainDocumentPart mainPart = theDoc.MainDocumentPart;

        // Find the content control whose Tag value == tag.
        // In this case, the tag value is "idPolicyNumber".
        // Content controls that have no <tag> element are skipped.
        SdtRun policyNumberCC = mainPart.Document.Body.Descendants<SdtRun>()
            .Where(r => r.SdtProperties != null
                && r.SdtProperties.GetFirstChild<Tag>() != null
                && r.SdtProperties.GetFirstChild<Tag>().Val.Value == tag)
            .SingleOrDefault();

        // Find the <sdtContent> element of the content control.
        SdtContentRun contentRun = policyNumberCC.GetFirstChild<SdtContentRun>();

        // Get the string inside the content control.
        string policyNumber = contentRun.GetFirstChild<Run>().GetFirstChild<Text>().InnerText;

        return policyNumber;
    }
}

Note

For clarity, the previous code sample does not include error-handling code typically used in a production environment.

The GetContentControlValueByTag method opens the Word document that is specified by location, locates a content control using the Tag value of the control (specified by tag), and then retrieves the text within the content control.
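The same lookup can serve the document generation scenario, where data is pushed into the control instead of read from it. The following is a minimal sketch, assuming an inline (SdtRun) content control as in the previous example. The class name ContentControlWriter and the method SetContentControlValueByTag are hypothetical names introduced for illustration. The sketch replaces the existing runs with a single unformatted run, so any run formatting in the control is lost.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, introduced for illustration only.
public static class ContentControlWriter
{
    // Replaces the text of the first inline content control that has the
    // specified tag. Returns false if no matching control is found.
    public static bool SetContentControlValueByTag(string location, string tag, string value)
    {
        // Open the document for editing, because it is modified.
        using (WordprocessingDocument theDoc = WordprocessingDocument.Open(location, true))
        {
            SdtRun control = theDoc.MainDocumentPart.Document.Body
                .Descendants<SdtRun>()
                .FirstOrDefault(r => r.SdtProperties != null
                    && r.SdtProperties.GetFirstChild<Tag>() != null
                    && r.SdtProperties.GetFirstChild<Tag>().Val != null
                    && r.SdtProperties.GetFirstChild<Tag>().Val.Value == tag);
            if (control == null)
            {
                return false;
            }

            // Find the <sdtContent> element, or create it if it is missing.
            SdtContentRun content = control.GetFirstChild<SdtContentRun>();
            if (content == null)
            {
                content = control.AppendChild(new SdtContentRun());
            }

            // Replace the existing runs with a single run that holds the new text.
            content.RemoveAllChildren();
            content.AppendChild(new Run(new Text(value)));

            // Remove the placeholder flag so that the new text is not
            // treated as placeholder text.
            control.SdtProperties.RemoveAllChildren<ShowingPlaceholder>();

            theDoc.MainDocumentPart.Document.Save();
            return true;
        }
    }
}
```

For example, calling SetContentControlValueByTag with the tag idPolicyNumber and the value 123456 would produce content markup similar to the XML shown earlier in this article.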

Conclusion

This article has discussed how to use the Tag property of a content control to locate it inside a given Word document. For more information about programmatically interacting with content controls, the Open XML Formats, and using the Open XML SDK 2.0, see the following resources.

Additional Resources