
Automating Word Tables for Data Insertion and Extraction

Office 2003

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Summary: Learn how to automate the creation and formatting of tables in Word. Get information about optimizing performance, populating a table with data, formatting table structure, linking table data, and extracting data from a table. (25 printed pages)

You can look at the world as split into applications that store data (databases) and applications that present information, such as Microsoft Office Word 2003 and Microsoft Office PowerPoint 2003. Increasingly, end users want database content displayed in documents and presentations. Word does provide some tools for displaying database tables in its documents, but they are rudimentary, they require a basic understanding of how the database is built, and using them involves a number of steps. There may also be security and access issues that require additional layers of protection. As a result, developers are increasingly asked to transfer data into Word, either as tables or as part of the document text. This article considers some of the major aspects of using the Word object model to work with tables.

One of the bigger mysteries for developers unfamiliar with the Word object model is determining the most appropriate method to automate the creation and formatting of tables in Word. The task has a number of aspects. In this article, I discuss:

  • Optimizing performance when working with tables.

  • Populating a table with data.

  • Formatting the table structure and data.

  • Linking table data in from another application.

  • Extracting data from a Word table.

I will show you the code, with samples in classic Microsoft Visual Basic and C#. (From these two languages, the syntax for Microsoft Visual Basic .NET can also be derived.) I will also discuss some of the idiosyncrasies of the Word object model, and why particular techniques may be better to use, rather than other, more obvious, approaches.

Note

You can download the sample projects as *.bas and *.cs files for Word Visual Basic for Applications, Visual Basic 6.0, and C#. The code generates a document in Word with tables created and formatted using the methods discussed in this article.

Table functionality has changed to a greater or lesser extent in every version of Word since Microsoft Word 97. Tables in Microsoft Word 2000 acquired HTML (Web) table characteristics; among other things, columns can resize automatically to fit cell content and you can size them as a percentage of the whole. Microsoft Word 2002 tables gained the ability to support text wrapping around the table, as well as within table cells. In Word 2003, tables formatted with text wrapping can be set to break to the next page (this was not possible in Word 2002). So, if you are programming for multiple versions of Word, you need to be aware that Word 97, Word 2000, Word 2002, and Word 2003 can differ significantly, depending on what you want to do. Do not assume that what you develop in one version works in other versions; be sure to install all versions and test your code with each of them.
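Because behavior differs by version, code that targets several versions often branches on the string returned by Application.Version (Word 97 reports "8.0", Word 2000 "9.0", Word 2002 "10.0", Word 2003 "11.0"). The following is a minimal sketch, not part of the Word object model: WordTableFeatures and its methods are hypothetical names, and the thresholds simply encode the version differences described in this article.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a Word API): maps Application.Version
// to the table features discussed in this article.
public static class WordTableFeatures
{
    // "11.0" -> 11. Word 97 = 8, Word 2000 = 9, Word 2002 = 10, Word 2003 = 11.
    public static int MajorVersion(string version)
    {
        string major = version.Split('.')[0];
        return int.Parse(major, CultureInfo.InvariantCulture);
    }

    // Tables.Add accepts DefaultTableBehavior and AutoFitBehavior from Word 2000 on.
    public static bool SupportsTableBehaviorArguments(string version)
    {
        return MajorVersion(version) >= 9;
    }

    // Table styles (custom table AutoFormats) arrived with Word 2002.
    public static bool SupportsTableStyles(string version)
    {
        return MajorVersion(version) >= 10;
    }

    // Wrapped tables that can break to the next page need Word 2003.
    public static bool SupportsBreakingWrappedTables(string version)
    {
        return MajorVersion(version) >= 11;
    }
}
```

Testing each installed version is still necessary; a check like this only keeps version-specific calls out of code paths where they cannot work.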

Improving Performance When Automating Tables

Performance is often an issue when automating Word, especially when creating tables. Working with tables in Word 2000 or later versions can be slow, mostly due to the new Web capabilities mentioned above. Word has to do a lot of work in the background to calculate the page layout.

One way to speed things up: Do not automate Word at all when creating a document. Write an RTF, HTML, or, for Word 2003, an XML file. To learn more about RTF, see the Rich Text Format (RTF) Specification, version 1.6. To learn more about XML files, see the article What You Can Do with Word XML. Find information about all three file formats in the Microsoft Knowledge Base article Information About How to Extract Office File Formats and Schemas.

If you do automate Word, move things along more swiftly by building the table in the view state requiring the fewest resources: the Normal view. The Normal view is optimized for performance, and is not "what you see is what you get" (WYSIWYG). For things that require exact page layout, such as positioning graphics with text flow formatting, you need to be in the Print Layout view. But Normal view is optimal for dumping in and formatting large quantities of text and tables.

In addition, you can restrict screen updating and suppress automatic pagination in Normal view, speeding execution even more.

private void OptimizePerformance(wd.Document wdDoc, wd.ApplicationClass
    wdApp)
{
    wdDoc.ActiveWindow.View.Type = wd.WdViewType.wdNormalView;
    wdApp.Options.Pagination = false;
    wdApp.ScreenUpdating = false;
}

When creating a table, make it behave as tables did in Word 97, before the new Web features were added. You can enable or set any of the newer features later, but while creating the table it is advisable to turn them off by passing wdWord8TableBehavior as the DefaultTableBehavior argument, as shown in the following code sample.

Note

Remember that a Word table may contain a maximum of 64 columns.

using wd = Microsoft.Office.Interop.Word;
object objDefaultBehaviorWord8 = 
    wd.WdDefaultTableBehavior.wdWord8TableBehavior;
object objAutoFitFixed = wd.WdAutoFitBehavior.wdAutoFitFixed;
private void CreateBasicTable(ref wd.Range wdRng, System.Data.SqlClient.SqlDataReader dr)
{
    int nrRows = 1;
    int nrCols = dr.FieldCount;
    wd.Table tbl = wdRng.Tables.Add(wdRng, nrRows, nrCols, 
        ref objDefaultBehaviorWord8, ref objAutoFitFixed);
    PopulateAndExtendTable(tbl, dr);
}
Note

The Tables.Add method shown here is the same in Word 2000, 2002, and 2003; the last two arguments are not used or allowed in Word 97; code containing them does not compile for Word 97.
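Because a Word table may contain at most 64 columns, it can help to check the field count before calling Tables.Add rather than relying on whatever error Word raises. This is a small sketch; TableShapeGuard and CheckColumnCount are hypothetical names, and the limit is the one stated in the note above.

```csharp
using System;

// Hypothetical guard to run before Tables.Add: Word tables allow
// at most 64 columns, so fail early with a clear message.
public static class TableShapeGuard
{
    public const int MaxWordColumns = 64;

    // Returns the column count unchanged if it is usable; throws otherwise.
    public static int CheckColumnCount(int fieldCount)
    {
        if (fieldCount < 1)
            throw new ArgumentOutOfRangeException(nameof(fieldCount),
                "A table needs at least one column.");
        if (fieldCount > MaxWordColumns)
            throw new ArgumentOutOfRangeException(nameof(fieldCount),
                $"Word tables support at most {MaxWordColumns} columns; got {fieldCount}.");
        return fieldCount;
    }
}
```

In CreateBasicTable, the column count would then be obtained as `int nrCols = TableShapeGuard.CheckColumnCount(dr.FieldCount);`.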

Caution

The VBA documentation is the most complete source of information on the Word object model, but test its assumptions, because it is not entirely error-free. For example, the information about the AutoFitBehavior argument says "Sets the AutoFit rules for how Word sizes tables. Can be one of the following WdAutoFitBehavior constants: wdAutoFitContent, wdAutoFitFixed, or wdAutoFitWindow. If DefaultTableBehavior is set to wdWord8TableBehavior, this argument is ignored." However, testing Tables.Add as in the sample code and then again with wdAutoFitWindow shows that the argument is not ignored completely: with wdAutoFitWindow, the table resizes to fit exactly between the document's left and right margins when the margins change. wdAutoFitContent, however, has no effect if wdWord8TableBehavior is the DefaultTableBehavior. Word can lay out the page faster when you use wdAutoFitFixed.

Even if you use all of these performance optimization techniques, you may find that very long tables still slow Word down. In this case, consider breaking the table into multiple, smaller tables.
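One way to break up a long table is to decide the row ranges first and then build one Word table per range (for example, one delimited string per range, each converted separately). The sketch below covers only the partitioning step; TableChunker is a hypothetical name, and the maximum rows per table is a tuning value you would choose by testing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits totalRows data rows into consecutive blocks
// of at most maxRowsPerTable rows, one block per Word table.
public static class TableChunker
{
    // Returns (FirstRow, RowCount) pairs using zero-based row indexes.
    public static List<(int FirstRow, int RowCount)> Chunk(int totalRows, int maxRowsPerTable)
    {
        if (totalRows < 0) throw new ArgumentOutOfRangeException(nameof(totalRows));
        if (maxRowsPerTable < 1) throw new ArgumentOutOfRangeException(nameof(maxRowsPerTable));

        var chunks = new List<(int FirstRow, int RowCount)>();
        for (int first = 0; first < totalRows; first += maxRowsPerTable)
        {
            // The last block may be shorter than maxRowsPerTable.
            int count = Math.Min(maxRowsPerTable, totalRows - first);
            chunks.Add((first, count));
        }
        return chunks;
    }
}
```

If each smaller table needs column headings, write the heading record at the start of every block.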

Usually, when I insert a table into Word through automation I intend to fill that table with data. Intuitively, most developers insert a table with one or two rows and the requisite number of columns, and then proceed to add rows, one after the other, as needed. Only when they test with a large number of records do they notice that this process is slow. Next, they attempt to determine the number of rows required, create the table with all the rows, then loop through cell by cell to fill in the data. Unfortunately, this is not significantly faster. The following code demonstrates these two techniques.

Note

Be careful when inserting objects into a cell range. The Word object model allows you to assign things to Table.Cell(index, index).Range, just as to any other range. But if you do not collapse the range to the cell's starting point (Direction:=wdCollapseStart), you may insert objects (such as pictures) into the table structure rather than into the intended cell. Text is the default property of the Range object, so assigning a string directly to a cell's Range is allowed in the classic Visual Basic languages (and in Visual Basic .NET if Option Strict is off). This does not damage the table structure, because behind the scenes the string is assigned to the Text property.

Note

Also be aware, when working with arrays and data tables, that Word table indexes are one-based: the first cell in a Word table is Cell(1, 1). Writing to cell index zero (tbl.Cell(0,0).Range.Text) damages the table structure and crashes Word.

//Add rows to table as required
object objMissing = System.Reflection.Missing.Value;
private void PopulateAndExtendTable(wd.Table tbl, 
System.Data.SqlClient.SqlDataReader dr)
{
    int nrCols = dr.FieldCount;
    int nrRow = 1;
    for (int nrCol = 1; nrCol <= (nrCols); nrCol++)
    {
        // Column headings are added first.
        // Note that the column names are in a zero-based
        // collection, so subtract one from nrCol when retrieving
        // a name.
        tbl.Cell(nrRow, nrCol).Range.Text = dr.GetName(nrCol-1);
    }
    while (dr.HasRows && dr.Read())
    {
        tbl.Rows.Add(ref objMissing);
        nrRow++;
        for (int nrCol = 1; nrCol <= (nrCols); nrCol++)
        {
            // Now add the records.
            tbl.Cell(nrRow, nrCol).Range.Text = 
                dr.GetValue(nrCol-1).ToString();
        }
    }
    dr.Close();
    sqlConnection1.Close();
}
//Populate existing rows
private void PopulateTable(wd.Table tbl, System.Data.DataTable data)
{
    // Iterate through the columns to get the headings.
    for(int nrCol = 1; nrCol<=data.Columns.Count; nrCol++)
    {
        tbl.Cell(1, nrCol).Range.Text = data.Columns[nrCol-1].ColumnName;
    }

    // Iterate through the rows. The first row contains 
    // the column headings, so start with the second row.
    for(int nrRow = 2; nrRow-1<=data.Rows.Count; nrRow++)
    {
        // data.Rows is zero-based, so subtract two
        // in order to start with the first record.
        System.Data.DataRow rw = data.Rows[nrRow - 2];
        // Iterate through the columns to get the data.
        for(int nrCol = 1; nrCol<=data.Columns.Count; nrCol++)
        {
            tbl.Cell(nrRow, nrCol).Range.Text = rw[nrCol-1].ToString();
        }
    }

}

The most efficient way to create a table of data in a Word document is to concatenate the data into a delimited string, assign it to a document range, and then convert the range to a table. The samples below use a tab character as the field delimiter, but you may use any character. The record delimiter, however, must be ANSI 13 (a carriage return).

Note

Place the content of each field within a pair of single or double quotes if the field or record delimiter could be part of the field data.
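The quoting advice in the note above can be applied field by field while the string is built, for example inside BuildDataString before each value is appended. This is a minimal sketch: FieldQuoter is a hypothetical name, it uses double quotes, and it does not handle values that themselves contain a double quote, so test how your Word version treats those before relying on it.

```csharp
using System;

// Hypothetical helper: wraps a field value in double quotes when it
// contains the field delimiter or a record delimiter (carriage return).
public static class FieldQuoter
{
    public static string QuoteIfNeeded(string value, char fieldDelimiter)
    {
        if (value == null) return string.Empty;
        bool needsQuotes = value.IndexOf(fieldDelimiter) >= 0
                           || value.IndexOf('\r') >= 0;
        // Embedded double quotes are left as-is (not escaped).
        return needsQuotes ? "\"" + value + "\"" : value;
    }
}
```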

object objTabChar = wd.WdTableFieldSeparator.wdSeparateByTabs;
private wd.Table CreateTableFromString(ref wd.Range wdRng, System.Data.DataTable data)
{
    wdRng.Text = BuildDataString(data);
    wd.Table tbl = wdRng.ConvertToTable(ref objTabChar, ref objMissing, 
        ref objMissing, ref objMissing, ref objMissing, ref objMissing, 
        ref objMissing, ref objMissing, ref objMissing, ref objMissing, 
        ref objMissing, ref objMissing, ref objMissing, ref objMissing, 
        ref objAutoFitFixed, ref objDefaultBehaviorWord8);
    return tbl;
}
private string BuildDataString(System.Data.DataTable data)
{
    string dataString = "";
    for(int nrCol=1; nrCol<=data.Columns.Count; nrCol++)
    {
        // Fill the column headings.
        dataString += data.Columns[nrCol-1].ColumnName;
        if(nrCol < data.Columns.Count)
        {
            // Append a field delimiter.
            dataString += "\t";
        }
        else
        {
            // We're on the last column, so append a 
            // record delimiter (ANSI 13, a carriage return).
            dataString += "\r";
        }
    } // end for column headings
    for(int nrRow=1; nrRow<=data.Rows.Count; nrRow++)
    {
        System.Data.DataRow rw = data.Rows[nrRow-1];
        for(int nrCol=1; nrCol<=data.Columns.Count; nrCol++)
        {
            dataString += rw[nrCol-1].ToString();
            if(nrCol < data.Columns.Count)
            {
                // Append a field delimiter.
                dataString += "\t";
            }
            else
            {
                // We're on the last column, so append a
                // record delimiter (ANSI 13, a carriage return).
                dataString += "\r";
            }
        }
    }
    return dataString;
} 
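Repeated `+=` on a string copies the whole string each time, which adds up for large tables. The following sketch builds the same kind of delimited text with a StringBuilder; DelimitedTableText is a hypothetical name, and it uses a carriage return (ANSI 13) as the record delimiter, as the text above requires. The result would be assigned to the range's Text property before calling ConvertToTable, as in CreateTableFromString.

```csharp
using System;
using System.Data;
using System.Text;

// Sketch: header record plus one record per DataRow, fields separated
// by fieldDelimiter and records terminated by a carriage return.
public static class DelimitedTableText
{
    public static string Build(DataTable data, char fieldDelimiter = '\t')
    {
        var sb = new StringBuilder();
        int nrCols = data.Columns.Count;

        // Header record: column names.
        for (int col = 0; col < nrCols; col++)
        {
            if (col > 0) sb.Append(fieldDelimiter);
            sb.Append(data.Columns[col].ColumnName);
        }
        sb.Append('\r'); // record delimiter: ANSI 13

        // Data records.
        foreach (DataRow row in data.Rows)
        {
            for (int col = 0; col < nrCols; col++)
            {
                if (col > 0) sb.Append(fieldDelimiter);
                sb.Append(row[col].ToString());
            }
            sb.Append('\r');
        }
        return sb.ToString();
    }
}
```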
Tip

You can find a code example for Visual Basic .NET that uses data from a Web service in the article Working with ADO.NET Datasets in Microsoft Office.

Formatting Tables Programmatically in Word

To apply formatting, you must first specify what to apply it to. As with filling a table with data, it is faster to apply formatting to ranges or groups of cells than to format cell by cell. You can assign an entire table to a range, as well as entire rows; multiple, contiguous rows; or multiple, contiguous cells within a row.

To work with an entire column, multiple, contiguous columns, or a contiguous set of cells within the table that do not extend across entire rows, you must first select the columns or cells, then apply formatting to the selection. You cannot assign columns and blocks of cells to a range because the information is not contiguous within the document's text flow.

Figure 1. You cannot assign a block of cells to a range


The following example selects a group of cells within a table:

object objWdCell = wd.WdUnits.wdCell;
object objWdCharacter = wd.WdUnits.wdCharacter;
object objWdLine = wd.WdUnits.wdLine;
object objExtend = wd.WdMovementType.wdExtend;
private void btnCurrentTable_Click(object sender, System.EventArgs e)
{
    if ((bool) wdApp.Selection.get_Information(
        wd.WdInformation.wdWithInTable))
    {
        wd.Table tbl = wdApp.Selection.Tables[1];
        // Select from the second cell in the second row
        // to the third cell in the fifth row (4 rows, 2 columns).
        wd.Selection sel = SelectCells(tbl, 2, 2, 4, 2);
        if (sel != null)
        {
            sel.Font.Bold = 1;
        }
    }
}
private wd.Selection SelectCells(wd.Table tbl, int rowStart, 
    int colStart, int rowCount, int colCount)
{
    wd.Selection sel = null;
    int nrRows = tbl.Rows.Count;
    int nrCols = tbl.Columns.Count;
    // Make sure the start cell exists in the table.
    // If it doesn't, return without setting the selection.
    if (rowStart > nrRows || colStart > nrCols)
        return sel;
    // Make sure the block does not extend past the table.
    // If it would, stop at the last row/column.
    if (rowStart + rowCount - 1 > nrRows)
        rowCount = nrRows - rowStart + 1;
    if (colStart + colCount - 1 > nrCols)
        colCount = nrCols - colStart + 1;
    // Select the start cell.
    tbl.Cell(rowStart, colStart).Select();
    sel = wdApp.Selection;
    // Make sure the selection will extend.
    sel.ExtendMode = true;
    sel.Expand(ref objWdCell);
    // Extend across the columns, then down the rows.
    // Subtract one because the start row and column
    // are already selected.
    object objColCount = (object) (colCount - 1);
    object objRowCount = (object) (rowCount - 1);
    sel.MoveRight(ref objWdCharacter, ref objColCount, ref objExtend);
    sel.MoveDown(ref objWdLine, ref objRowCount, ref objExtend);
    // Leave extend mode so later actions don't keep extending.
    sel.ExtendMode = false;
    return sel;
}
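The bounds check at the top of SelectCells is easy to get wrong, and it can be tested without Word if it is pulled out into a pure function. This is a sketch only; CellBlock and Clamp are hypothetical names, and the semantics (1-based start cell plus row and column counts) mirror the SelectCells parameters.

```csharp
using System;

// Hypothetical pure version of the SelectCells bounds check.
// Returns the row/column counts trimmed to the table size,
// or null if the start cell lies outside the table.
public static class CellBlock
{
    public static (int RowCount, int ColCount)? Clamp(
        int tableRows, int tableCols,
        int rowStart, int colStart, int rowCount, int colCount)
    {
        if (rowStart < 1 || colStart < 1 ||
            rowStart > tableRows || colStart > tableCols)
            return null;

        // Never extend past the last row or column; select at least one cell.
        int maxRows = tableRows - rowStart + 1;
        int maxCols = tableCols - colStart + 1;
        return (Math.Max(1, Math.Min(rowCount, maxRows)),
                Math.Max(1, Math.Min(colCount, maxCols)));
    }
}
```

Tables with merged cells do not have a uniform grid, so counts computed this way may not match what the selection actually covers in such tables.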

The following example selects a set of columns:

private void btnColumns_Click(object sender, System.EventArgs e)
{
    object objExtend = wd.WdMovementType.wdExtend;
    // Use the first table in the current document.
    wd.Table tbl = wdApp.ActiveDocument.Tables[1];
    // Start with the second column.
    tbl.Columns[2].Select();
    // Extend the selection one column to the right,
    // so that columns 2 and 3 are selected.
    object objCount = (object) 1;
    wdApp.Selection.MoveRight(ref objWdCharacter, ref objCount, 
        ref objExtend);
    wdApp.Selection.Font.Bold = 1;
}

You can apply several types of formatting to tables: direct formatting, character and paragraph styles, table AutoFormats, and—in Word 2002 or Word 2003—table styles (custom table AutoFormats). Styles are usually preferable to direct formatting for a number of reasons:

Efficiency. When you apply formatting with a style, you apply multiple attributes in a single step. You type less code, it executes more quickly, and you are less likely to encounter the infamous "Out of memory" message that appears in the Word window—and stops code execution until the user dismisses it.

Note

Word tracks editing actions taken in a document to support the Edit/Undo functionality. Word keeps this information in "temp" and "scratch" files, which have a limited size. When the amount of information exceeds the space Word allots, the "Out of memory" message appears. Besides reducing the amount of information Word tracks by using styles, three more actions may help you avoid this error message:

1. Make sure AllowFastSaves is turned off. (This outdated feature is retained for compatibility reasons and is no longer necessary with modern computers.)

2. Execute Document.UndoClear to clear the Undo list.

3. Allow Word to perform clean-up by periodically saving the file to disk.

Consistency and uniformity. Consistent formatting of all items of the same type is very important in reports and other large documents. It is much easier to keep track of formatting when you use styles rather than many separate formatting commands.

Maintenance. If the document you produce is saved, edited, and re-used, it is more efficient to work with styles.

For example, if you want to change the color of all table headings from black to blue, you can adjust the style definition, and all the table headings change. If you do not use styles, you must find, select, and apply the required formatting to each table heading.

The person designing the Word document should make the decision to use paragraph and character styles, or AutoFormats and table styles, or a combination. Normally, you can achieve finer control of text formatting with paragraph and character styles. AutoFormat and table styles provide a quick way to apply table-specific formatting (borders, shading, formatting of first and last columns and rows).

Tip

If you decide to use a table style, and plan to control font formatting through the table style, be sure to create the table from a paragraph formatted with the Normal style. Any other paragraph or character style overrides the font formatting defined in the table style.

The following sample code demonstrates how to create a table style and apply it to all the tables in a document. Figure 2 shows the result.

Figure 2. Table formatted using a table style

//For Word 2002 and Word 2003 only
wd.WdBorderType verticalBorder = wd.WdBorderType.wdBorderVertical;
wd.WdBorderType leftBorder = wd.WdBorderType.wdBorderLeft;
wd.WdBorderType rightBorder = wd.WdBorderType.wdBorderRight;
wd.WdBorderType topBorder = wd.WdBorderType.wdBorderTop;
wd.WdBorderType bottomBorder = wd.WdBorderType.wdBorderBottom;

wd.WdLineStyle doubleBorder = wd.WdLineStyle.wdLineStyleDouble;
wd.WdLineStyle noBorder = wd.WdLineStyle.wdLineStyleNone;
wd.WdLineStyle singleBorder = wd.WdLineStyle.wdLineStyleSingle;

wd.WdTextureIndex noTexture = wd.WdTextureIndex.wdTextureNone;
wd.WdColor gray10 = wd.WdColor.wdColorGray10;
wd.WdColor gray70 = wd.WdColor.wdColorGray70;
wd.WdColorIndex white = wd.WdColorIndex.wdWhite;

private wd.Style CreateTableStyle(ref wd.Document wdDoc)
{
    object styleTypeTable = wd.WdStyleType.wdStyleTypeTable;
    wd.Style styl = wdDoc.Styles.Add
         ("New Table Style", ref styleTypeTable);
    styl.Font.Name = "Arial";
    styl.Font.Size = 11;
    wd.TableStyle stylTbl = styl.Table;
    stylTbl.Borders.Enable = 1;

    wd.ConditionalStyle evenRowBanding =
        stylTbl.Condition(wd.WdConditionCode.wdEvenRowBanding);
    evenRowBanding.Shading.Texture = noTexture;
    evenRowBanding.Shading.BackgroundPatternColor = gray10;
    // Borders have to be set specifically for every condition.
    evenRowBanding.Borders[leftBorder].LineStyle = doubleBorder;
    evenRowBanding.Borders[rightBorder].LineStyle = doubleBorder;
    evenRowBanding.Borders[verticalBorder].LineStyle = singleBorder;
    
    wd.ConditionalStyle firstRow = 
        stylTbl.Condition(wd.WdConditionCode.wdFirstRow);
    firstRow.Shading.BackgroundPatternColor = gray70;
    firstRow.Borders[leftBorder].LineStyle = doubleBorder;
    firstRow.Borders[topBorder].LineStyle = doubleBorder;
    firstRow.Borders[rightBorder].LineStyle = doubleBorder;
    firstRow.Font.Size = 14;
    firstRow.Font.ColorIndex = white;
    firstRow.Font.Bold = 1;

    // Set the number of rows to include in a "band".
    stylTbl.RowStripe = 1;
    return styl;
}

private void FormatAllTables(wd.Document wdDoc, wd.Style styl)
{
    foreach (wd.Table tbl in wdDoc.Tables)
    {
        object objStyle = styl;
        tbl.Range.set_Style(ref objStyle);
        // If the table ends in an "even band" the border will
        // be missing, so in this case add the border.

        if (tbl.Rows.Count % 2 != 0)
        {
            tbl.Borders[bottomBorder].LineStyle = doubleBorder;
        }
    }
}

Word can repeat a specified number of heading rows at the top of every page when a table breaks automatically to a new page. It is not possible, however, to display different text, such as "Continued. . .", in repeated heading rows. Set the HeadingFormat property in the finishing stages of your code, because applying a table style or AutoFormat overrides this setting.

'Set the first through third rows to repeat
Dim i As Long
For i = 1 To 3
    ActiveDocument.Tables(1).Rows(i).HeadingFormat = True
Next i

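Controlling how a table breaks to a new page is another aspect of table formatting. To prevent rows from splitting across pages, set the AllowBreakAcrossPages property of the Row object (or of the Rows collection) to False; to prevent the table as a whole from breaking across pages, set the AllowPageBreaks property of the Table object to False.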

Note

This property has no effect on vertically merged cells.

Sometimes, you are required to link data into a document from another application. The user would usually copy the data, or use one of the Word tools to import it. (For example, on the Edit menu, click Paste Special with Link, or, on the Data toolbar, click Insert Database or, on the Insert menu, point to Object, and then click From File with Link.)

Important

In Word 2003 and in recent service packs for Word 2000 and Word 2002, new security measures prevent fields that link in outside data from updating automatically. For more information, see the Microsoft Knowledge Base article How the Behavior of the Word Fields Changes After You Install the Word Update.

There are, of course, equivalent commands in the Word object model, but they are not usually the best choice for the developer. The InsertObject and InsertDatabase methods are difficult to get working correctly. Due to differences in how individual machines may be configured, if you get them working on one computer, they may not work on the next. Paste Special does not have this kind of problem, but it is not considered "polite" to interfere with the user's Clipboard.

Instead, often the most reliable and efficient approach is to re-create the final result of using the commands in the Word interface: inserting Link and Database fields directly. Link fields use OLE to bring live data into Word. Database fields can, like mail merge, bring in data through Dynamic Data Exchange (for Microsoft Office Excel and Microsoft Office Access), Open Database Connectivity (ODBC), or (in Word 2002 and 2003) OLE DB.

Tip

Where possible, I recommend using ODBC for the Database field connections. Newer installations and versions of Word do not support DDE well and the Database field does not always store all the necessary information to update an OLE DB connection. ODBC is more broadly supported and more stable.

To determine the syntax you need to link to a particular application, go through the steps in the user interface, and then press Alt+F9 to display the field codes. For an Excel spreadsheet, for example, you see something like: { LINK Excel.Sheet.8 "C:\\test\\test.xls" "Sheet1!R12C1:R18C3" \a \h }. (Although pasting always uses an R1C1 cell reference, you can substitute a range name if you prefer to work with a named Excel range; the xlRange parameter in the following code sample accepts either form.)

Note

To run the sample, create an Excel file or use one you already have, and change the path in the code to point to it.

To insert this field using the Word object model, use the Fields.Add method:

object objCollapseEnd = wd.WdCollapseDirection.wdCollapseEnd;
object objPreserveFormatting = true;
private void InsertExcelLink()
{
    wd.Range wdRng = wdApp.Selection.Range;
    // Note that backslashes in field paths must be doubled.
    string xlRange = "Sheet1!R12C1:R18C3";
    LinkToExcelWorksheet(ref wdRng, 
        "C:\\\\test\\\\yourExcelfile.xls", xlRange, "\\r");
}
private void LinkToExcelWorksheet(ref wd.Range wdRng, 
    string path, string xlRange, string importFormat)
{
    // Build the field code, for example:
    // LINK Excel.Sheet.8 "C:\\test\\yourExcelfile.xls" "Sheet1!R12C1:R18C3" \r
    object objFieldText = "LINK Excel.Sheet.8 \"" + path + 
        "\" \"" + xlRange + "\" " + importFormat;

    // Insert the field (and with it the linked table). Because the
    // Type argument is omitted (wdFieldEmpty), Word takes the field
    // type from the Text argument.
    wd.Field wdFld = wdRng.Fields.Add(wdRng, ref objMissing, 
        ref objFieldText, ref objPreserveFormatting);

    // Make sure the range is AFTER the field.
    wdRng = wdFld.Result;
    wdRng.Collapse(ref objCollapseEnd);
}
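Writing the path with quadruple backslashes in a C# literal is error-prone. A small helper can take an ordinary Windows path and double the backslashes itself before the text is passed to Fields.Add. This is a sketch; LinkFieldCode and ForExcel are hypothetical names, and the "Excel.Sheet.8" class name is taken from the field code shown above.

```csharp
using System;

// Hypothetical helper: builds the text of a LINK field (the part between
// the field braces) from an ordinary path. Field codes need each
// backslash doubled and each argument in double quotes.
public static class LinkFieldCode
{
    public static string ForExcel(string path, string itemReference, string switches)
    {
        string fieldPath = path.Replace(@"\", @"\\");
        return "LINK Excel.Sheet.8 \"" + fieldPath + "\" \""
               + itemReference + "\" " + switches;
    }
}
```

For example, `LinkFieldCode.ForExcel(@"C:\test\book.xls", "Sheet1!R12C1:R18C3", @"\r")` returns `LINK Excel.Sheet.8 "C:\\test\\book.xls" "Sheet1!R12C1:R18C3" \r`; a named range can be passed as itemReference instead of the R1C1 reference.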
Caution

When programming for international use, remember that English field names are used in all language versions of Word 2000 and later. For Word 97, you need to use the local language field names, switches, and functions. The Type argument of the Fields.Add method enables you to specify the field type, and Word automatically inserts the appropriate field name for the current local language. But you may need to change the rest of the field content (in the Text argument).

Consult the Word Help files for detailed information about individual Word fields and their switches. The \* Mergeformat switch is not field-specific but it plays an important role with linked tables. (The \* Mergeformat switch corresponds to the PreserveFormatting argument of the Fields.Add method.) This switch tells Word to retain table and font formatting applied to the table when the field updates. If it is not present in the field, formatting reverts when the field updates.

Important

If the number of rows (records) in the field result increases, new rows may not pick up direct formatting applied in Word. The \* Mergeformat switch also saves where formatting is applied in a field result. Therefore, if the number of records a linked table displays can vary (increase) when the field updates, the formatting may appear inconsistent. In this case, do not use the \* Mergeformat switch; consider using a table AutoFormat or table style instead.

Note

The \* Mergeformat switch does not work correctly in the original release of Word 2000. Service packs for that version corrected the bug.

Besides the \* Mergeformat switch, you can use a table AutoFormat to control the appearance of a Database field. Unlike the switch, a table AutoFormat applies correctly even if the size of the table changes, and it is saved as part of the field code, where you can edit it (see the \b and \l switches in the Word Help).
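A Database field can be inserted the same way as the LINK field, by passing its code as the Text argument of Fields.Add. The sketch below only assembles that text. DatabaseFieldCode is a hypothetical name; it assumes the switch letters documented in Word Help (\d database path, \c connection string, \s SQL statement, \h field names as column headings, \l AutoFormat number, \b AutoFormat attributes) and doubles backslashes as field codes require. Compare its output with a field Word creates through Insert Database (press Alt+F9) before relying on it.

```csharp
using System;
using System.Text;

// Hypothetical builder for the text of a DATABASE field.
public static class DatabaseFieldCode
{
    public static string Build(string databasePath, string connection, string sql,
                               bool headings, int? autoFormat = null, int? formatAttributes = null)
    {
        var sb = new StringBuilder("DATABASE");
        AppendQuoted(sb, "d", databasePath);
        AppendQuoted(sb, "c", connection);
        AppendQuoted(sb, "s", sql);
        if (headings) sb.Append(@" \h");
        if (autoFormat.HasValue) AppendQuoted(sb, "l", autoFormat.Value.ToString());
        if (formatAttributes.HasValue) AppendQuoted(sb, "b", formatAttributes.Value.ToString());
        return sb.ToString();
    }

    private static void AppendQuoted(StringBuilder sb, string switchLetter, string value)
    {
        // An embedded quote would end the argument early; reject it
        // rather than guess at an escape rule.
        if (value.Contains("\""))
            throw new ArgumentException("Value contains a double quote: " + value);
        // Backslashes inside field arguments must be doubled.
        string escaped = value.Replace(@"\", @"\\");
        sb.Append(" \\").Append(switchLetter).Append(" \"").Append(escaped).Append('"');
    }
}
```

The numbers passed for \l and \b are placeholders here; look up the actual AutoFormat numbers and attribute values in the Word Help topic for the Database field.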

Tip

You can link table data in through a field as an alternative to reading data from another source and then creating a table from it in Word. After you link to the outside table, apply the Unlink method to break the link, turning the result into a plain Word table. This can save a lot of time, because you write less code, and performance is sometimes better.

You cannot extract data from a closed Word document in its native binary file format (unless you are a C++ programmer and have obtained the binary file format specification from Microsoft). If the user saves the document in HTML, RTF, or (for Word 2003) XML file format, you can use the usual tools to open these files and parse out the information without needing Word.

Note

For more information about Word-specific information for these file formats see Microsoft Office Developer Center: Word. For more information about Office binary file formats see the Microsoft Knowledge Base article Information about how to extract Office file formats and schemas.

Otherwise, you must open the document in Word to get at its content. The most intuitive way to pull out table information is to process the table cell by cell. As when writing data into a table, this method is slow.

It is more efficient to convert the table to delimited text, pass it to a string variable, and then work with it.

Note

If the table contains merged cells, Word automatically splits them so that the delimited text string contains the same number of fields (cells) for every record. The value that was in the merged cell appears in the first field where Word encountered the merged cell; the other fields derived from the merged cell are empty. There is therefore no way to distinguish fields that came from merged cells from fields that came from empty cells. If you need to know which cells were merged, save the table or document as a filtered HTML file and parse that file instead: merged table cells carry a colspan or rowspan attribute after they are converted to HTML.
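As a rough sketch of the filtered-HTML approach in the note above, the following C# scans the saved HTML for td tags that carry colspan or rowspan attributes. MergedCellFinder is a hypothetical name, and a regular expression is only an approximation of real HTML parsing (for example, attribute-like text inside quoted values could match), so an HTML parser is the safer choice for production code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Rough sketch: lists colspan/rowspan attributes on <td> tags in HTML text.
public static class MergedCellFinder
{
    private static readonly Regex TdTag =
        new Regex(@"<td\b[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex SpanAttr =
        new Regex(@"\b(colspan|rowspan)\s*=\s*[""']?(\d+)", RegexOptions.IgnoreCase);

    // One entry per span attribute, in document order, e.g. ("colspan", 2).
    public static List<(string Attribute, int Span)> Find(string html)
    {
        var result = new List<(string Attribute, int Span)>();
        foreach (Match td in TdTag.Matches(html))
            foreach (Match attr in SpanAttr.Matches(td.Value))
                result.Add((attr.Groups[1].Value.ToLowerInvariant(),
                            int.Parse(attr.Groups[2].Value)));
        return result;
    }
}
```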

Sub ExtractTableData()
    Dim doc As Word.Document
    Dim tbl As Word.Table
    Dim rng As Word.Range
    Dim sData As String
    Dim aData1() As String
    Dim aData2() As String
    Dim aDataAll() As String
    Dim nrRecs As Long
    Dim nrFields As Long
    Dim lRecs As Long
    Dim lFields As Long
    
    Set doc = ActiveDocument
    Set tbl = doc.Tables(1)
    Set rng = tbl.ConvertToText(Separator:=vbTab, _
        NestedTables:=False)
    ' Put the delimited text into a string variable.
    sData = rng.Text
    ' Restore the original table.
    doc.Undo
    ' Strip off last paragraph mark.
    sData = Mid(sData, 1, Len(sData) - 1)
    ' Break up each table row into an array element.
    aData1() = Split(sData, vbCr)
    nrRecs = UBound(aData1())
    ' The messagebox below is for debugging purposes and tells you
    ' how many rows are in the table.  It is commented out but can
    ' be used simply by uncommenting it.
    'MsgBox "The table contained " & nrRecs + 1 & " rows"
    'Process each row to break down the field information
    'into another array.
    For lRecs = LBound(aData1()) To nrRecs
        aData2() = Split(aData1(lRecs), vbTab)
        ' We need to do this only once!
        If lRecs = LBound(aData1()) Then
            nrFields = UBound(aData2())
            ReDim Preserve aDataAll(nrRecs, nrFields)
        End If
        ' Now bring the row and field information together
        ' in a single, two-dimensional array.
        For lFields = LBound(aData2()) To nrFields
            aDataAll(lRecs, lFields) = aData2(lFields)
        Next
    Next
End Sub
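For C# callers, the parsing half of ExtractTableData can be sketched as follows. It assumes the text has already been obtained from the converted range (records separated by carriage returns, fields by tabs); DelimitedTableParser is a hypothetical name.

```csharp
using System;

// Sketch of the ExtractTableData parsing step in C#: turns the text of a
// table converted with Table.ConvertToText into a two-dimensional array.
public static class DelimitedTableParser
{
    public static string[,] Parse(string text, char fieldDelimiter = '\t')
    {
        // Strip the final paragraph mark, as the VBA version does.
        if (text.Length > 0 && text[text.Length - 1] == '\r')
            text = text.Substring(0, text.Length - 1);

        string[] records = text.Split('\r');
        // The first record determines the number of fields.
        int nrFields = records[0].Split(fieldDelimiter).Length;
        var result = new string[records.Length, nrFields];

        for (int r = 0; r < records.Length; r++)
        {
            string[] fields = records[r].Split(fieldDelimiter);
            // Extra fields are ignored; missing fields stay null.
            for (int f = 0; f < nrFields && f < fields.Length; f++)
                result[r, f] = fields[f];
        }
        return result;
    }
}
```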

Do not use a field delimiter that is contained in the table data. Also, if the table data contains any paragraph marks, you must replace them with another character before converting the table to text, and then restore them in the text passed to the string variable.
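The restore step can be sketched as a small function over the parsed two-dimensional array. This assumes the paragraph marks inside cells were replaced beforehand (for example, with Word's Find and Replace on the table range) by a placeholder string that never occurs in the data; ParagraphMarkPlaceholder and the "~~" placeholder in the usage below are hypothetical.

```csharp
using System;

// Hypothetical helper: puts a carriage return (Word's paragraph mark)
// back wherever the placeholder appears in the extracted cell text.
public static class ParagraphMarkPlaceholder
{
    public static void Restore(string[,] data, string placeholder)
    {
        if (string.IsNullOrEmpty(placeholder))
            throw new ArgumentException("Placeholder must not be empty.", nameof(placeholder));
        for (int r = 0; r < data.GetLength(0); r++)
            for (int c = 0; c < data.GetLength(1); c++)
                if (data[r, c] != null)
                    data[r, c] = data[r, c].Replace(placeholder, "\r");
    }
}
```

For example, after `ParagraphMarkPlaceholder.Restore(cells, "~~")`, a cell that read `line1~~line2` reads `line1`, a carriage return, and `line2`.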

You can convert the table to text directly in the document (then use the Undo method to reverse the process or, alternatively, close the document without saving). If this approach presents problems (for example, the user is or will be working in the document) it is also possible to reproduce the table in a new document and convert it to text.

When working with Word 2003, you can pick up the table information as XML directly from the table range. You must set DataOnly to False and pick up the WordprocessingML, which is quite verbose.

Dim sXml As String
sXml = ActiveDocument.Tables(1).Range.XML(DataOnly:=False)
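Once the WordprocessingML string has been passed to .NET, the cell text can be read with LINQ to XML. This is a sketch under stated assumptions: it expects the Word 2003 namespace (http://schemas.microsoft.com/office/word/2003/wordml), reads only the first w:tbl element, joins the paragraphs in a cell with carriage returns, and skips any nested tables. WordMLTableReader is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: reads the cell text of the first table in a WordprocessingML string.
public static class WordMLTableReader
{
    private static readonly XNamespace W =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    public static List<List<string>> ReadFirstTable(string wordMl)
    {
        XDocument doc = XDocument.Parse(wordMl);
        XElement tbl = doc.Descendants(W + "tbl").FirstOrDefault();
        var rows = new List<List<string>>();
        if (tbl == null) return rows;

        foreach (XElement tr in tbl.Elements(W + "tr"))
        {
            var cells = new List<string>();
            foreach (XElement tc in tr.Elements(W + "tc"))
            {
                // Concatenate the text runs of each direct paragraph;
                // nested tables (w:tbl inside w:tc) are not visited.
                var paragraphs = tc.Elements(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                cells.Add(string.Join("\r", paragraphs));
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```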

This article does not address all of the aspects about automating Word tables, but it can help you get a good start. Check the Additional Resources section for more information about other aspects of working with Word, and for more information about tables in particular.

Cindy Meister specializes in automation of Microsoft Office applications. Her main area of interest is Word for Windows, especially the challenge of bringing information from outside Word into documents and extracting content from Word documents for use in other applications. She has worked with all releases of Microsoft Office applications since the early 1990's. She has been a Word Most Valuable Professional (MVP) since 1996. Cindy resides in Switzerland, and can be reached at cindymeister@mvps.org. She maintains, at irregular intervals, a Web site with tips on using Word at http://homepage.swissonline.ch/cindymeister.

© 2014 Microsoft