Export (0) Print
Expand All
2 out of 2 rated this helpful - Rate this topic

Merging Simple Content from Excel 2007 Workbooks and Worksheets by Using the Open XML SDK 2.0 for Microsoft Office

Office 2007

Summary:  Learn how to use the Open XML SDK 2.0 for Microsoft Office to merge simple content from multiple Microsoft Office Excel 2007 workbooks and worksheets programmatically. (18 printed pages)

Office Visual How To

Applies to:  2007 Microsoft Office System, Microsoft Office Excel 2007, Open XML SDK 2.0 for Microsoft Office, Microsoft Visual Studio 2008

Joel Krist, iSoftStone

August 2009

Overview

The Open XMLSoftware Development Kit 2.0 for Microsoft Office makes it possible to create and manipulate Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 documents programmatically via the Open XML formats. The typesafe classes included with the SDK provide a layer of abstraction between the developer and the Open XML formats, simplifying the process of working with Office 2007 documents and enabling the creation of complex document solutions.

The sample code in this visual how-to article shows how to merge text and numeric data from multiple Excel 2007 workbooks and worksheets programmatically. The code uses the SpreadsheetDocument, SharedStringTable, SharedStringItem, SheetData, Row, and Cell classes in the Open XML SDK 2.0 for Microsoft Office to create the required document parts and to merge the content.

See It

Video startup screen

Watch the Video

Length: 10:08 | Size: 16.70 MB | Type: WMV file

Code It | Read It | Explore It

Code It

Download the sample code

This visual how-to article presents a solution that creates an Excel 2007 workbook and merges worksheets from other Excel workbooks into that new workbook. The solution illustrates how to copy worksheet cells that contain text and numeric data that is not formatted, and how to work with the shared string table. (There is no intent here to provide a complete solution that anticipates every scenario or that can manipulate every possible content type in a worksheet.)

As it proceeds, this section walks through the following steps:

  1. Creating a Windows console application solution in Visual Studio 2008.

  2. Adding references to the DocumentFormat.OpenXml and WindowsBase assemblies.

  3. Adding the sample code to the solution.

Creating a Windows Console Application in Visual Studio 2008

This visual how-to article uses a Windows console application to provide the framework for the sample code. However, you could use the same approach that is illustrated here with other application types as well.

To create a Windows Console Application in Visual Studio 2008

  1. Start Microsoft Visual Studio 2008.

  2. On the File menu, point to New, and then click Project.

  3. In the New Project dialog box select the Visual C# Windows type in the Project types pane.

  4. Select Console Application in the Templates pane, and then name the project MergeWorkbooks.

    Figure 1. Create new solution in the New Project dialog box

    Create new solution in the New Project dialog box

     

  5. Click OK to create the solution.

Adding References to the DocumentFormat.OpenXml and WindowsBase Assemblies

The sample code uses the classes and enumerations that are in the DocumentFormat.OpenXml.dll assembly that is installed with the Open XML SDK 2.0 for Microsoft Office. To add the reference to the assembly in the following steps or to build the sample code that accompanies this visual how-to, you must first download and install the Open XML SDK 2.0 for Microsoft Office so that the assembly is available.

To add References to the DocumentFormat.OpenXml and WindowsBase Assemblies

  1. Add a reference to the DocumentFormat.OpenXml assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to DocumenFormat.OpenXml, select it, and then click OK.

      Figure 2. Add Reference to DocumentFormat.OpenXML

      Add Reference to DocumentFormat.OpenXML

       

  2. The classes in the DocumentFormat.OpenXml assembly use the System.IO.Packaging.Package class that is defined in the WindowsBase assembly. Add a reference to the WindowsBase assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to WindowsBase, select it, and then click OK.

      Figure 3. Add Reference to WindowsBase

      Add Reference to WindowsBase

       

Adding the Sample Code to the Solution

Replace the contents of the Program.cs source file with the following code.

using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml;

class Program
{
  static void Main(string[] args)
  {
    string mergedWorkbookName = "MergedWorkbook.xlsx";
    string[] sourceWorkbookNames = new string[]
    { "Workbook1.xlsx", "Workbook2.xlsx", "Workbook3.xlsx" };
    string workbookFolder = @"C:\Temp\";

    WorkbookPart mergedWorkbookPart = null;

    WorksheetPart mergedWorksheetPart = null;
    WorksheetPart childWorksheetPart = null;

    Sheets mergedWorkbookSheets = null;
    Sheets childWorkbookSheets = null;

    Sheet newMergedSheet = null;
    SheetData mergedSheetData = null;

    SharedStringTablePart mergedSharedStringTablePart = null;
    SharedStringTablePart childSharedStringTablePart = null;

    // Create the merged workbook package.
    using (SpreadsheetDocument mergedWorkbook =
      SpreadsheetDocument.Create(
        workbookFolder + mergedWorkbookName,
        SpreadsheetDocumentType.Workbook))
    {
      // Add the merged workbook part to the new package.
      mergedWorkbookPart = mergedWorkbook.AddWorkbookPart();
      GenerateMergedWorkbook().Save(mergedWorkbookPart);

      // Get the Sheets element in the merged workbook
      // for use later. 
      mergedWorkbookSheets =
        mergedWorkbookPart.Workbook.GetFirstChild<Sheets>();

      // Create the Shared String Table part in the
      // merged workbook.
      mergedSharedStringTablePart =
        mergedWorkbookPart.AddNewPart<SharedStringTablePart>();
      GenerateSharedStringTablePart().Save(
        mergedSharedStringTablePart);

      // For each source workbook to merge...
      foreach (string workbookName in sourceWorkbookNames)
      {
        // Open the source workbook. The following will throw an
        // exception if the source workbook does not exist.
        using (SpreadsheetDocument childWorkbook =
          SpreadsheetDocument.Open(
            workbookFolder + workbookName, false))
        {
          // Get the Sheets element in the source workbook.
          childWorkbookSheets =
            childWorkbook.WorkbookPart.Workbook.GetFirstChild
              <Sheets>();
          // Get the Shared String Table part of the child workbook.
          childSharedStringTablePart =
              childWorkbook.WorkbookPart.SharedStringTablePart;

          // For each worksheet in the child workbook...
          foreach (Sheet childSheet in childWorkbookSheets)
          {
            // Get a worksheet part for the child worksheet using
            // it's relationship Id. 
            childWorksheetPart =
              (WorksheetPart)childWorkbook.WorkbookPart.GetPartById(
                childSheet.Id);

            // Add a worksheet part to the merged workbook based on
            // the child worksheet.
            mergedWorksheetPart =
              mergedWorkbookPart.AddPart<WorksheetPart>(
                childWorksheetPart);

            // There should be only one worksheet that is set
            // as the main view.
            CleanView(mergedWorksheetPart);

            // Create a Sheet element for the new sheet in the
            // merged workbook.
            newMergedSheet = new Sheet();

            // Set the Name, Id, and SheetId attributes of the
            // new Sheet element.
            newMergedSheet.Name =
              GenerateWorksheetName(mergedWorkbookSheets,
                childSheet.Name.Value);

            newMergedSheet.Id = mergedWorkbookPart.GetIdOfPart(
              mergedWorksheetPart);

            newMergedSheet.SheetId =
              (uint)mergedWorkbookSheets.ChildElements.Count + 1;

            // Add the new Sheet element to the Sheets element in the
            // merged workbook.
            mergedWorkbookSheets.Append(newMergedSheet);

            // Get the SheetData element of the new worksheet part
            // in the merged workbook.
            mergedSheetData =
              mergedWorksheetPart.Worksheet.GetFirstChild
                <SheetData>();

            // For each row of data...
            foreach (Row row in mergedSheetData.Elements<Row>())
            {
              // For each cell in the row...
              foreach (Cell cell in row.Elements<Cell>())
              {
                // If the cell is using a shared string, merge 
                // the string from the child workbook into the merged
                // workbook. 
                if (cell.DataType != null &&
                    cell.DataType.Value == CellValues.SharedString)
                {
                  ProcessCellSharedString(mergedWorksheetPart, cell,
                    mergedSharedStringTablePart,
                    childSharedStringTablePart);
                }
              }
            }
          }
        }
      }

      //Save the changes to the merged workbook.
      mergedWorkbookPart.Workbook.Save();
    }
  }

  #region Part Generation Methods

  private static Workbook GenerateMergedWorkbook()
  {
    return new Workbook(new Sheets());
  }

  private static SharedStringTable GenerateSharedStringTablePart()
  {
    var element =
      new SharedStringTable()
      { Count = (UInt32Value)0U, UniqueCount = (UInt32Value)0U };

    return element;
  }

  #endregion

  #region Utility Methods

  private static void CleanView(WorksheetPart worksheetPart)
  {
    //There can only be one sheet that has the focus.
    SheetViews views =
      worksheetPart.Worksheet.GetFirstChild<SheetViews>();

    if (views != null)
      views.Remove();
  }

  private static string GenerateWorksheetName(
    Sheets mergedWorkbookSheets, string baseSheetName)
  {
    // Generate a name for the worksheet.
    // Must be unique and < 32 characters long.
    int count = 0;
    string newBaseSheetName = baseSheetName;

    // While the new sheet name is already used...
    while (IsSheetNameInUse(mergedWorkbookSheets, newBaseSheetName))
    {
      count++;

      // Decrease the length of the sheet's base name to keep the
      // length under 32 characters.
      if (baseSheetName.Length >= (28 - Digits(count)))
        newBaseSheetName =
          baseSheetName.Substring(0, 28-Digits(count));
      else
        newBaseSheetName = baseSheetName;

      newBaseSheetName = newBaseSheetName + ("(" + count + ")");
    }

    return newBaseSheetName.ToString();
  }

  // Determine if the specified sheet name is already in use.
  private static bool IsSheetNameInUse(Sheets mergedWorkbookSheets,
    string sheetName)
  {
    foreach (Sheet mergedSheet in mergedWorkbookSheets)
    {
      if (mergedSheet.Name.Value.Equals(sheetName))
        return true;
    }

    return false;
  }

  // Return the number of digits in the specified number
  private static int Digits(int number)
  {
    if (number != 0)
    {
      int n = 0;

      while (number > 0)
      {
        ++n;
        number /= 10;
      }

      return n;
    }
    else
      return 1;
  }

  private static void ProcessCellSharedString(
    WorksheetPart mergedWorksheetPart, Cell cell,
    SharedStringTablePart mergedSharedStringTablePart,
    SharedStringTablePart childSharedStringTablePart)
  {
    // The provided cell is using a shared string, so the cell value
    // is an index into the workbook's shared string table, pointing to
    // the actual string value. Check the shared strings in the merged
    // workbook. If there is not already a matching shared string 
    // item, copy the shared string item from the shared string 
    // table in the child workbook to the shared string table in the 
    // merged workbook. Fix up the shared string index in the cell 
    // value in either case.

    SharedStringItem newMergedSharedStringItem = null;
    SharedStringItem childSharedStringItem = null;

    int matchedIndex = -1;
    int currentIndex = 0;
    int sharedStringIndex = Convert.ToInt32(cell.CellValue.Text);

    childSharedStringItem =
      childSharedStringTablePart.SharedStringTable.Elements
      <SharedStringItem>().ElementAt(sharedStringIndex);

    // See if there is a shared string item in the merged workbook
    // that matches the item in the child workbook.
    foreach (SharedStringItem mergedSharedStringItem in
      mergedSharedStringTablePart.SharedStringTable)
    {
      if (childSharedStringItem.OuterXml ==
        mergedSharedStringItem.OuterXml)
      {
        // There is a match, so use the existing shared string item.
        matchedIndex = currentIndex;
        break;
      }

      currentIndex++;
    }

    // If there was a match...
    if (matchedIndex != -1)
      // then use the existing shared string item.
      cell.CellValue.Text = matchedIndex.ToString();
    else
    {
      // else add the shared string item from the child workbook to the
      // merged workbook and set the cell in the merged worksheet to
      // point to it. The index into the shared string table is 0-based 
      // so we use the current shared string item count for the index 
      // used by the cell so that it's pointing at the correct item 
      // once we add the item and increment the item count below.

      cell.CellValue.Text =
        mergedSharedStringTablePart.SharedStringTable.Count.ToString();

      // Add the new string item to the table.
      newMergedSharedStringItem =
        (SharedStringItem)(childSharedStringItem.Clone());
      mergedSharedStringTablePart.SharedStringTable.AppendChild
        <SharedStringItem>(newMergedSharedStringItem);

      // Increment the count of shared string items.
      mergedSharedStringTablePart.SharedStringTable.Count++;
      mergedSharedStringTablePart.SharedStringTable.UniqueCount++;

      // Save the changes to the shared string table in the merged
      // workbook.
      mergedSharedStringTablePart.SharedStringTable.Save();
    }

    // Save the changes to the new worksheet in the merged workbook.
    mergedWorksheetPart.Worksheet.Save();
  }

  #endregion
}

 

Build and run the solution in Visual Studio by pressing CTRL+F5. When you build and run the sample code, it creates a workbook in the C:\Temp folder named MergedWorkbook.xlsx. The merged workbook contains the worksheets from three workbooks - Workbook1.xlsx, Workbook2.xlsx, and Workbook3.xlsx - also located in the C:\Temp folder.

Figure 4. Source workbooks and worksheets

Source workbooks and worksheets

 

Figure 5. Merged workbook that is generated by the code

Merged workbook that is generated by the code

 

To change the names or the location of the workbooks, modify the sample code and change the value of the mergedWorkbookName, sourceWorkbookNames, or workbookFolder variables defined in the Main method. The workbooks that you want to merge must exist for the solution to run successfully. To create the source workbooks, open Microsoft Excel 2007, create and name the workbooks, and then save them to the folder that you specified in the workbookFolder variable.

Read It

The sample code in this visual how-to article illustrates how to create an Excel 2007 workbook that contains worksheets merged from other workbooks. This section uses code snippets from the Code It section to describe the approach.

The code first creates a Spreadsheet package by using the SpreadsheetDocument.Create method. It then adds the workbook and the shared string table parts to the package, and gets the Sheets element in the workbook. (The code uses that element later on when it adds the merged worksheets.)

// Create the merged workbook package.
using (SpreadsheetDocument mergedWorkbook =
  SpreadsheetDocument.Create(
    workbookFolder + mergedWorkbookName,
    SpreadsheetDocumentType.Workbook))
{
  // Add the merged workbook part to the new package.
  mergedWorkbookPart = mergedWorkbook.AddWorkbookPart();
  GenerateMergedWorkbook().Save(mergedWorkbookPart);

  // Get the Sheets element in the merged workbook
  // for use later. 
  mergedWorkbookSheets =
    mergedWorkbookPart.Workbook.GetFirstChild<Sheets>();

  // Create the Shared String Table part in the
  // merged workbook.
  mergedSharedStringTablePart =
    mergedWorkbookPart.AddNewPart<SharedStringTablePart>();
  GenerateSharedStringTablePart().Save(
    mergedSharedStringTablePart);

 

The GenerateMergedWorkbook method creates an empty Sheets element in the workbook and the GenerateSharedStringTablePart method creates an empty SharedStringTable element.

private static Workbook GenerateMergedWorkbook()
{
  return new Workbook(new Sheets());
}

private static SharedStringTable GenerateSharedStringTablePart()
{
  var element =
    new SharedStringTable()
    { Count = (UInt32Value)0U, UniqueCount = (UInt32Value)0U };

  return element;
}

 

After creating the main document and the shared string table parts in the new merged workbook, the Main method uses the SpreadsheetDocument.Open method to open each of the specified source workbooks. It gets the shared string table of each source workbook so that it (the code) can copy those shared strings into the merged workbook. In addition, it retrieves the Sheets element of each source workbook so that it (the code) can iterate over each source worksheet.

// For each source workbook to merge...
foreach (string workbookName in sourceWorkbookNames)
{
  // Open the source workbook. The following will throw an
  // exception if the source workbook does not exist.
  using (SpreadsheetDocument childWorkbook =
    SpreadsheetDocument.Open(
      workbookFolder + workbookName, false))
  {
    // Get the Sheets element in the source workbook.
    childWorkbookSheets =
      childWorkbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();

    // Get the Shared String Table part of the child workbook.
    childSharedStringTablePart =
      childWorkbook.WorkbookPart.SharedStringTablePart;

    // For each worksheet in the child workbook...
    foreach (Sheet childSheet in childWorkbookSheets)
    {

 

The code uses the WorkbookPart.AddPart method to create a new merged worksheet part based on the current source worksheet. It then creates a new Sheet element for the merged workbook and sets the new sheet's Name, Id, and SheetId attributes. The code uses a helper method named GenerateWorksheetName to generate a unique name for the merged worksheet in case worksheets from different source workbooks have the same name. The code then appends the new Sheet element to the merged workbook's Sheets collection.

// Get a worksheet part for the child worksheet using
// it's relationship Id. 
childWorksheetPart =
  (WorksheetPart)childWorkbook.WorkbookPart.GetPartById(
    childSheet.Id);

// Add a worksheet part to the merged workbook based on
// the child worksheet.
mergedWorksheetPart =
  mergedWorkbookPart.AddPart<WorksheetPart>(
    childWorksheetPart);

// There should be only one worksheet that is set
// as the main view.
CleanView(mergedWorksheetPart);

// Create a Sheet element for the new sheet in the
// merged workbook.
newMergedSheet = new Sheet();

// Set the Name, Id, and SheetId attributes of the
// new Sheet element.
newMergedSheet.Name =
  GenerateWorksheetName(mergedWorkbookSheets,
    childSheet.Name.Value);

newMergedSheet.Id = mergedWorkbookPart.GetIdOfPart(
  mergedWorksheetPart);

newMergedSheet.SheetId =
  (uint)mergedWorkbookSheets.ChildElements.Count + 1;

// Add the new Sheet element to the Sheets element in the
// merged workbook.
mergedWorkbookSheets.Append(newMergedSheet);

 

The code gets the SheetData element for the newly created merged worksheet to get information about the data stored in each cell of each row. It loops through each cell of each row and looks at the cell's DataType attribute to determine whether the cell is using a shared string. If it is, the code calls a helper method named ProcessCellSharedString.

// Get the SheetData element of the new worksheet part
// in the merged workbook.
mergedSheetData =
  mergedWorksheetPart.Worksheet.GetFirstChild<SheetData>();

// For each row of data...
foreach (Row row in mergedSheetData.Elements<Row>())
{
  // For each cell in the row...
  foreach (Cell cell in row.Elements<Cell>())
  {
    // If the cell is using a shared string then merge the
    // string from the child workbook into the merged
    // workbook. 
    if (cell.DataType != null &&
        cell.DataType.Value == CellValues.SharedString)
    {
      ProcessCellSharedString(mergedWorksheetPart, cell,
        mergedSharedStringTablePart,
        childSharedStringTablePart);
    }
  }
}

 

The ProcessCellSharedString method checks the shared string table in the merged workbook to see if the shared string item already exists. If it does not exist, the code copies the shared string item from the source workbook into the shared string table of the merged workbook. It then updates the cell's value so that it points to the correct string item in the merged workbook's shared string table.

private static void ProcessCellSharedString(
  WorksheetPart mergedWorksheetPart, Cell cell,
  SharedStringTablePart mergedSharedStringTablePart,
  SharedStringTablePart childSharedStringTablePart)
{
  // The provided cell is using a shared string so the cell value
  // is an index into the workbook's shared string table, pointing to
  // the actual string value. Check the shared strings in the merged
  // workbook and if there is not already a matching shared string item
  // then copy the shared string item from the shared string table in
  // the child workbook to the shared string table in the merged
  // workbook. Fix up the shared string index in the cell value in
  // either case.

  SharedStringItem newMergedSharedStringItem = null;
  SharedStringItem childSharedStringItem = null;

  int matchedIndex = -1;
  int currentIndex = 0;
  int sharedStringIndex = Convert.ToInt32(cell.CellValue.Text);

  childSharedStringItem =
    childSharedStringTablePart.SharedStringTable.Elements
    <SharedStringItem>().ElementAt(sharedStringIndex);

  // See if there is a shared string item in the merged workbook
  // that matches the item in the child workbook.
  foreach (SharedStringItem mergedSharedStringItem in
    mergedSharedStringTablePart.SharedStringTable)
  {
    if (childSharedStringItem.OuterXml ==
      mergedSharedStringItem.OuterXml)
    {
      // There is a match so use the existing shared string item.
      matchedIndex = currentIndex;
      break;
    }

    currentIndex++;
  }

  // If there was a match...
  if (matchedIndex != -1)
    // then use the existing shared string item.
    cell.CellValue.Text = matchedIndex.ToString();
  else
  {
    // else add the shared string item from the child workbook to the
    // merged workbook and set the cell in the merged worksheet to
    // point to it. The index into the shared string table is 0-based
    // so we use the current shared string item count for the index 
    // used by the cell so that it's pointing at the correct item once 
    // we add the item and increment the item count below.

    cell.CellValue.Text =
      (mergedSharedStringTablePart.SharedStringTable.Count).ToString();

    // Add the new string item to the table.
    newMergedSharedStringItem =
      (SharedStringItem)(childSharedStringItem.Clone());
    mergedSharedStringTablePart.SharedStringTable.AppendChild
      <SharedStringItem>(newMergedSharedStringItem);

    // Increment the count of shared string items.
    mergedSharedStringTablePart.SharedStringTable.Count++;
    mergedSharedStringTablePart.SharedStringTable.UniqueCount++;

    // Save the changes to the shared string table in the merged
    // workbook.
    mergedSharedStringTablePart.SharedStringTable.Save();
  }

  // Save the changes to the new worksheet in the merged workbook.
  mergedWorksheetPart.Worksheet.Save();
}

 

As mentioned previously, there is no intent here to provide a complete solution that anticipates every scenario or that can manipulate every possible content type in a worksheet. Instead, the idea is to show you how to copy simple text and numeric data, and how to work with shared strings. If you use the sample code to merge a worksheet that contains cells formatted via styles or themes, Excel displays the following warning after the code creates the merged workbook.

Figure 6. Warning message in Excel about unreadable content

Warning message in Excel about unreadable content

 

Excel displays the warning because the sample code does not attempt to create or merge any stylesheet or theme data from the source workbooks. If you click Yes in the dialog box, Excel tries to repair the workbook by removing the records that refer to the unreadable content, which removes the formatting from the cells in the process.

Figure 7. Repaired Records dialog box in Excel

Repaired Records dialog box in Excel

 

Ee396252.note(en-us,office.12).gifNote:

When you use the Open XML SDK 2.0 for Microsoft Office to create a document-generation solution, it is best practice to create a template document first, and then use DocumentReflector, a tool that comes with the SDK. DocumentReflector can generate C# code that uses the SDK typesafe classes to reproduce your template document and the functionality that it contains. You can then use that code to help you add functionality or to help you understand the Open XML document parts and relationships that are required to implement a specific document feature. For more information about best practices and the Open XML SDK 2.0 for Microsoft Office, see Erika Ehrli's blog entry Getting Started Best Practices.

Explore It

Did you find this helpful?
(1500 characters remaining)
Thank you for your feedback

Community Additions

ADD
Show:
© 2014 Microsoft. All rights reserved.