OfficeTalk: Creating Form Letters in Word by Using Bookmarks and Office Open XML Files

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Summary: Microsoft Office Open XML enables you to manipulate files without the need to have Microsoft Office installed. Learn how to automatically populate bookmarks in Word to create documents such as form letters. (16 Printed Pages)

Applies to:    Microsoft Word 2007 | Microsoft Word 2010
Provided by:   Frank Rice, Microsoft Corporation

Contents

  • Background

  • Examining the Project

  • Creating the Workbook and Template Document

  • Creating the Project

  • Testing the Project

  • Conclusion

  • Additional Resources

Background

A huge advantage of the Office Open XML file formats, introduced in the 2007 Microsoft Office system, is the ability to create and manipulate documents without the need to install Microsoft Office, either on the computer doing the work or on the recipient’s computer. Combining this with the objects and methods in the DocumentFormat.OpenXml namespace, provided by the Open XML SDK 2.0, enables you to easily manipulate the structure and contents of Office Open XML documents.

In this column, I demonstrate combining these items into a project created in Microsoft Visual Studio 2008 that automatically creates one or more documents from a template and then populates a set of bookmarks with data from an external data source. The data source that is used in this project is a Microsoft Excel workbook but it could just as easily be a Microsoft Access database or a text file. The project also assumes that you have created a Microsoft Word template beforehand. The steps in this column walk you through creating the workbook and Word template.

Examining the Project

The project is a console application that you create in Visual Studio 2008 although you can create it in Microsoft Visual Studio 2010. I created one project in C# and another in Visual Basic .NET so I show the code for both in the examples and you can decide which language that you want to work in.

The data source is a contiguous range of data in an Excel worksheet. The data is a listing of addresses as shown in Figure 1.

Figure 1. The address data in an Excel worksheet

Address data in an Excel worksheet

 

I retrieved the data from the workbook by using automation. However, you may decide to retrieve the data by using Open XML or by using a text or XML reader. For more information about this and other topics, see the Additional Resources section at the end of this column.

The project retrieves the worksheet data as a comma-delineated string and parses it into an array. A new document is then created from a template (see Figure 2) and the main document part is opened and read into an XML document.

Figure 2. The template with bookmarks used to create the form letters

Template with bookmarks

 

Next, the first bookmark and its associated text nodes are retrieved based on the bookmark ID (zero-based) by using an XPath query. The code then iterates through one or more text nodes.

Note

A bookmark may contain data split across multiple text nodes depending on how the bookmark was created and modified. For example, suppose that you insert a bookmark into a document by highlighting placeholder text, and adding the bookmark (from the insert bookmark command on the Ribbon user interface). Then, you go back and edit the placeholder text. In all likelihood, the placeholder text is now contained across multiple text nodes. You must take multiple text nodes into consideration when updating bookmarks programmatically.

When the code reaches the first text node, the associated piece of address data (such as FirstName or Street) is copied into it. The code then continues iterating through the remaining text nodes, deleting each one so that the result is one text node per bookmark that contains the address text. This process continues for each bookmark.

And finally the XML is written back to the main document part and the process continues, assuming that there is another address in the queue. The result for one address is shown in Figure 3. All of this is explained in more detail as you read through the commented code later in this column.

Figure 3. A document with updated bookmarks

Document with updated bookmarks

 

Creating the Workbook and Template Document

In the following steps, you create the Excel 2007 workbook that contains the address data and the template document with bookmarks.

  1. Start Excel 2007 or Excel 2010.

  2. On Sheet1, add the addresses shown in Figure 1. Save the workbook as Test.xlsx to C:\Visual Studio Projects\CreateFormLetters (or to a folder of your choosing) and then close it.

  3. Start Word 2007 or Word 2010.

  4. Type the form letter shown in Figure 2.

  5. Drag the pointer to select the Date item.

  6. On the Insert menu, click Bookmark. In the Bookmark name box, type Date, and click the Add.

  7. Next, highlight the FirstName item in the letter, open the Bookmark dialog, type FirstName in the Bookmark name box and then click Add.

  8. Repeat this procedure for the remaining address items in the order in which they appear in the letter: LastName, Street, City, State, Zip, Greeting.

  9. Save the document as HiringTemplate.docx to the same folder you used previously and then close it.

Creating the Project

Create the console application by using the following steps.

  1. Start Microsoft Visual Studio.

  2. On the File menu, click New, and then click Project.

  3. In the New Project dialog, under the language that you are using (C# or Visual Basic), select Windows, and then click Console Application.

  4. In the Name box, type CreateFormLetters. In the Location box, type in the folder location that you specified in the previous procedure (C:\Visual Studio Projects). If you choose a location different from the one in this example, you must change the path in the Main and RemoveBookMarks procedures as seen shortly. Click OK.

  5. Next, you add references to the type libraries and assemblies that you will use in the project. In Solution Explorer, right-click the References folder, and then click Add Reference. If you do not see the References folder, on the Project menu at the top of the screen, click Show All Files.

  6. In the Add Reference dialog, on the .NET tab, scroll down, and select DocumentFormat.OpenXml. If you do not see this entry, you must to install the Open XML SDK 2.0 from the location described previously in this column.

  7. Next, create a reference to the Microsoft Excel Object Library. Right-click the References folder again, click Add Reference, and in the Add Reference dialog, click the COM tab.

  8. Scroll down until you see either the Microsoft Excel 12.0 Object Library or Microsoft Excel 14.0 Object Library entry, depending on which version of Microsoft Office you are using, select it, and then click OK.

  9. Next, add code to the project. In Solution Explorer, double-click either Module1.vb (Visual Basic only) or Program.cs (C# only).

  10. At the top of the code screen, add the following statements that specify the references that you added earlier.

    Imports Excel = Microsoft.Office.Interop.Excel
    Imports DocumentFormat.OpenXml.Packaging
    Imports DocumentFormat.OpenXml.Wordprocessing
    Imports System.IO
    Imports System.Xml
    
    using System;
    using System.Reflection;
    using System.IO;
    using System.Xml;
    using Excel = Microsoft.Office.Interop.Excel;
    using Word = Microsoft.Office.Interop.Word;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    
  11. Just above the Sub Main (Visual Basic only) or private void main (C# only) statement, add the following lines which create a reference to Excel and an Excel workbook, respectively.

    Private objExcelApp As Excel.Application
    Private objBook As Excel._Workbook
    
    Excel.Application objExcelApp;
    Excel._Workbook objBook;
    
  12. Inside the Main (main) procedure, insert the following code:

    Dim objBooks As Excel.Workbooks
    Dim objSheets As Excel.Sheets
    Dim objSheet As Excel._Worksheet
    Dim rngLast As Excel.Range
    Dim valueString As String = ""
    
    objExcelApp = New Microsoft.Office.Interop.Excel.Application()
    objBooks = objExcelApp.Workbooks
    'Open the workbook containing the address data.
    objBook = objBooks.Open("C:\\Visual Studio Projects\\CreateFormLetters\\Test.xlsm")
    'Get a reference to the first worksheet.
    objSheets = objBook.Worksheets
    objSheet = objSheets.Item(1)
    
    'Select the range of data containing the addresses and get the outer boundaries.
    rngLast = objSheet.Range("A1").CurrentRegion.SpecialCells(Excel.XlCellType.xlCellTypeLastCell)
    Dim lLastRow As Long = rngLast.Row
    Dim lLastCol As Long = rngLast.Column
    Dim rowCounter As Long
    Dim colCounter As Long
    
    'Iterate through the data and concatenate the values into a comma-delimited string.
    For rowCounter = 1 To lLastRow
       For colCounter = 1 To lLastCol
          'Write the next value into the string.
          valueString = String.Concat(valueString, objSheet.Cells(rowCounter, colCounter).Value.ToString() + ",")
       Next colCounter
    Next rowCounter
    
    Dim stringList() As String
    'Split the string into a character array.
    stringList = valueString.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    
    'Create and populate the form documents.
     ReplaceBookMarks(stringList)
    
    'Close the workbook.
    objBook.Close()
    
    Excel.Workbooks objBooks;
    Excel.Sheets objSheets;
    Excel._Worksheet objSheet;
    Excel.Range rngLast;
    
    objExcelApp = new Microsoft.Office.Interop.Excel.Application();
    objBooks = (Excel.Workbooks)objExcelApp.Workbooks;
    //Open the workbook containing the address data.
    objBook = objBooks.Open("C:\\Visual Studio Projects\\CreateFormLetters\\Test.xlsm", Missing.Value, Missing.Value,
       Missing.Value, Missing.Value,
       Missing.Value, Missing.Value,
       Missing.Value, Missing.Value,
       Missing.Value, Missing.Value,
       Missing.Value, Missing.Value,
       Missing.Value, Missing.Value);
    //Get a reference to the first sheet of the workbook.
    objSheets = objBook.Worksheets;
    objSheet = (Excel._Worksheet)objSheets.get_Item(1);
    
    //Select the range of data containing the addresses and get the outer boundaries.
    rngLast = objSheet.get_Range("A1").SpecialCells(Microsoft.Office.Interop.Excel.XlCellType.xlCellTypeLastCell);
    long lLastRow = rngLast.Row;
    long lLastCol = rngLast.Column;               
    
    // Iterate through the data and concatenate the values into a comma-delimited string.
    for (long rowCounter = 1; rowCounter <= lLastRow; rowCounter++)
    {
       for (long colCounter = 1; colCounter <= lLastCol; colCounter++)
       {
          //Write the next value into the string.
          valueString = String.Concat(valueString, objSheet.Cells[rowCounter, colCounter].Value.ToString() + ",");
       }      
    }
    
    //Split the string into a character array.
    string[] stringList = valueString.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    
    //Create and populate the form documents.
    ReplaceBookMarks(stringList);
    
    //Close the workbook.
    objBook.Close();
    

    These procedures open the existing workbook containing the addresses, iterates through the rows and columns that contain the data, and then concatenates them into a comma-delimited string. The string is then split into a character array. Next, the ReplaceBookMarks procedure is called, passing in the array of address data, and then the workbook is closed.

  13. Add the following procedure after the Main (main) procedure.

    Private Sub ReplaceBookMarks(ByVal stringList() As String)
    Dim wordmlNamespace As String = "https://schemas.openxmlformats.org/wordprocessingml/2006/main"
    Dim nextFormLetterVersion As Integer = 1
    Dim addressItemNumber As Integer = 0
    Dim numOfBookmarks As Integer = 7
    
    While (addressItemNumber < stringList.Length)
       'Make a copy of the template file.
       File.Copy("C:\Visual Studio Projects\CreateFormLetters\HiringTemplate.docx", ("C:\\Visual Studio Projects\\CreateFormLetters\\FormLetter" _
         & (nextFormLetterVersion & ".docx")), True)
    
       'Open the document as an Open XML package and extract the main document part.
       Dim wordPackage As WordprocessingDocument = WordprocessingDocument.Open(("C:\Visual Studio Projects\CreateFormLetters\FormLetter" _
         & (nextFormLetterVersion & ".docx")), True)
       Dim part As MainDocumentPart = wordPackage.MainDocumentPart
    
       'Set up the namespace manager so you can perform XPath queries 
       'to search for bookmarks in the part.
       Dim nt As NameTable = New NameTable
       Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
       nsManager.AddNamespace("w", wordmlNamespace)
    
       'Load the part's XML into an XmlDocument instance.
       Dim xmlDoc As XmlDocument = New XmlDocument(nt)
       xmlDoc.Load(part.GetStream)
       Dim i As Integer = 0
    
       'Iterate through the bookmarks.
       For i = 0 To numOfBookmarks
          Dim firstTextNodeFound As Boolean = False
          '"i" is the index number of the bookmarks and the loop counter. You use
          'it to retrieve the bookmark by ID.
          'Get the beginning and end bookmark nodes as well as the text node for that ID.
          Dim bookmarkStartNode As XmlElement = xmlDoc.DocumentElement.SelectSingleNode(("//w:bookmarkStart[@w:id='" & i & "']"), nsManager)
          Dim followingNodesList As XmlNodeList = bookmarkStartNode.SelectNodes((".//following::w:t | .//following::w:bookmarkEnd[@w:id='" & i & "']"), nsManager)
          For Each el As XmlElement In followingNodesList
    
             'Update the value of first text node and remove all other text nodes
             'falling between the bookmark start and bookmark end nodes.
             If (el.Name = "w:t") Then
                If firstTextNodeFound Then
                   el.ParentNode.RemoveChild(el)
                Else
                   Select Case (i)
                      Case 0
                         'Insert the formatted date into the first bookmark.
                         Dim dt As DateTime = DateTime.Now
                         Dim dateString As String = String.Format("{0:MMMM dd yyyy}", dt)
                         el.InnerText = dateString
                         firstTextNodeFound = True
                      Case 7
                         'Insert the first name in the greeting bookmark.
                         el.InnerText = stringList(addressItemNumber - 6).ToString
                         firstTextNodeFound = True
                      Case Else
                         'Insert the address data into the other bookmarks.
                         el.InnerText = stringList(addressItemNumber).ToString
                         firstTextNodeFound = True
                         addressItemNumber =addressItemNumber + 1
                      End Select
                   End If
    
                 Else
                    Exit For
                 End If
              Next
           Next
    
           'Write the changes back to the document part.
           xmlDoc.Save(wordPackage.MainDocumentPart.GetStream(FileMode.Create))
    
           'Increment the form letter version.
           nextFormLetterVersion = (nextFormLetterVersion + 1)
       End While
    End Sub
    
    private void ReplaceBookMarks(string[] stringList) 
    {
       string wordmlNamespace = "https://schemas.openxmlformats.org/wordprocessingml/2006/main";
       int nextFormLetterVersion = 1;
       int addressItemNumber = 0;
       int numOfBookmarks = 7;
    
       while (addressItemNumber < stringList.Length)
       {
          // Make a copy of the template file.
          File.Copy(@"C:\Visual Studio Projects\CreateFormLetters\HiringTemplate.docx", "C:\\Visual Studio Projects\\CreateFormLetters\\FormLetter" + nextFormLetterVersion + ".docx", true);
    
          //Open the document as an Open XML package and extract the main document part.
          WordprocessingDocument wordPackage = WordprocessingDocument.Open(@"C:\Visual Studio Projects\CreateFormLetters\FormLetter" + nextFormLetterVersion + ".docx", true);
          MainDocumentPart part = wordPackage.MainDocumentPart;
    
          //Setup the namespace manager so you can perform XPath queries 
          //to search for bookmarks in the part.
          NameTable nt = new NameTable();
          XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
          nsManager.AddNamespace("w", wordmlNamespace);
    
          //Load the part's XML into an XmlDocument instance.
          XmlDocument xmlDoc = new XmlDocument(nt);
          xmlDoc.Load(part.GetStream());
    
          //Iterate through the bookmarks.
          for (int i = 0; i <= numOfBookmarks; i++)
          {
              bool firstTextNodeFound = false;
    
              //"i" is the index number of the bookmarks and the loop counter. You use it
              //to retrieve the bookmark by ID.
              XmlElement bookmarkStartNode = (XmlElement)xmlDoc.
                 DocumentElement.SelectSingleNode("//w:bookmarkStart[@w:id='" + i + "']", nsManager);
              //Get the beginning and end bookmark nodes as well as the text node for that ID.
              XmlNodeList followingNodesList = bookmarkStartNode.
                 SelectNodes(".//following::w:t | .//following::w:bookmarkEnd[@w:id='" + i + "']", nsManager);
    
              foreach (XmlElement el in followingNodesList)
              {
                 //Update the value of first text node and remove all 
                 //other text nodes falling between bookmark start and bookmark end nodes.
                 if (el.Name == "w:t")
                 {
                    if (firstTextNodeFound)
                    {
                       el.ParentNode.RemoveChild(el);
                    }
                    else
                    {
                       switch (i)
                       {
                          case 0:
                             //Insert the formatted date into the first bookmark.
                             DateTime dt = DateTime.Now;
                             string dateString = String.Format("{0:MMMM dd yyyy}", dt);
                             el.InnerText = dateString;
                             firstTextNodeFound = true;
                             break;
                          case 7:
                             //Insert the first name in the greeting bookmark.
                             el.InnerText = stringList[addressItemNumber - 6].ToString();
                             firstTextNodeFound = true;
                             break;
                          default:
                             //Insert the address data into the other bookmarks.
                             el.InnerText = stringList[addressItemNumber].ToString();
                             firstTextNodeFound = true;
                             addressItemNumber++;
                             break;
                        }
                     }
                  }
                  else
                  {
                     //It is a different bookmark node so move to the next one.
                     break;
                  }
               }
             }
    
             //Write the changes back to the document part.
             xmlDoc.Save(wordPackage.MainDocumentPart.GetStream(FileMode.Create));
    
             //Increment the form letter version.
             nextFormLetterVersion++;
          }
       }
     }
    
  14. In this procedure, first a copy of the template document is created with a different version number for each form letter. Then, the document is opened as an Office Open XML package and the main document part is extracted. Next, a namespace manager is created and initialized to help with performing XPath queries on the part to find each bookmark. The part's XML is then loaded into an Xml document. Then, the bookmark start, text, and end nodes (encapsulated by w:bookmarkStart, w:t, and w:bookmarkEnd tags, respectively) are retrieved based on the bookmark ID number. The first text node is located and the appropriate piece of address data is copied into it. Next, the remaining text nodes are removed. If the bookmark is the date item, the current date is inserted into it. If the bookmark is the greeting at the top of the body of the letter, the addressee's first name is inserted into it. Otherwise, the address data is inserted in its associated bookmarks. And finally, the Xml is written back to the main document part.

Testing the Project

Perform the following steps to test the project.

  1. On the Build menu at the top of the screen, click Build Solution.

  2. When the build is complete and has no errors, press the F5 key to run the project.

  3. Because there are five addresses in the test workbook, you should see five form letters created that have increasing version numbers from 1 to 5.

  4. Open the first form letter. It should resemble that seen in Figure 3.

Conclusion

The Office Open XML format enables you to create and manipulate documents without having Microsoft Office installed on your computer. In this column, you were able to create copies of a Word template and update the bookmarks in the document from the data in an Excel workbook. An improvement you might want to make in this project is retrieving the bookmark by name instead of by ID.

Additional Resources

Find more information about the topics described in this column at the following locations.