Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 2 of 3)

Summary: See common scenarios simplified by using the Open XML Format APIs. Assemble and manipulate data in Microsoft Office Excel 2007 workbooks by using the Open XML Format Software Development Kit (SDK version 2.0 (CTP) APIs. (14 printed pages)

Zeyad Rajabi, Microsoft Corporation

Frank Rice, Microsoft Corporation

February 2009

Applies to: Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, Microsoft Office Word 2007

Contents

  • Document Assembly Solution for SpreadsheetML

  • Sending Data from SpreadsheetML to a Data Source

  • Summary

  • Additional Resources

Introduction to This Series of Articles

This series of articles explores the overall design of the Open XML Format SDK 2.0 Community Technical Preview (CTP) with respect to goals and scenarios. Creating Documents by Using the Open XML Format SDK Version 2.0 CTP (Part 1 of 3) dives deeply into the architecture of the Open XML Format SDK. Creating Documents by Using the Open XML Format SDK 2.0 (Part 2 of 3) and Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 3 of 3) present common scenario and show you lots of sample code.

You can access the Open XML Format SDK 2.0 (CTP) from the following locations:

Download sample: Sample Code downloads for the Open XML Format SDK 2.0 (CTP)

Document Assembly Solution for SpreadsheetML

The Open XML Format SDK 2.0 (CTP) provides a set of Microsoft .NET Framework application programming interfaces (APIs) that helps you create and manipulate documents in the Open XML Formats -- in both client and server environments -- without requiring Microsoft Office client applications. Creating Documents by Using the Open XML Format SDK 2.0 (Part 2 of 3) presents two scenarios: a workbook populated from a database, and a database populated from a workbook, each without using the Microsoft Office Excel 2007 client application.

In the following section, you populate an Excel worksheet with data from a Microsoft SQL Server database and create a chart based on that data.

Scenario 1: Document Assembly

The theme of this scenario is document assembly and involves creating a file based on external data sources, such as other files or databases. In this scenario, you are a developer for a fictional company named Adventure Works. This company stores all sales information in a database. Currently, the sales team uses Excel to connect to the database manually to retrieve the latest sales information. Your job is to create a report generation tool that automatically retrieves data from the database, imports it into Excel, and creates a report with a chart. The Sales team will run this solution on the report server in overnight batches, so there is a strict requirement that they cannot use the Excel application client.

Solution

Before describing the details of the solution, note the following prerequisites:

  • The solution was created with Microsoft Visual Studio 2008.

  • The solution uses the sample Adventure Works database, which is based on Microsoft SQL Server 2005. You can download the sample database at Adventure Works

Note that you are not required to use either of these resources to build Open XML Format solutions. You can use any comparable editor or compiler, and any data source.

Download sample: Sample Code downloads for the Open XML Format SDK 2.0 (CTP)

After opening the sample project, follow along with the steps in the next sections to see how the solution is created. If you haven’t already done so, install the

Open XML Format 2.0 SDK and then create a Microsoft .NET Framework project with a reference to the Open XML Format SDK 2.0 (CTP). In this scenario, create a Microsoft Visual C# project named v2WorkbookFromDB and add a reference to DocumentFormat.OpenXML, which is the name of the dynamic linked library (DLL) for the Open XML Format SDK 2.0. This gives you access to the classes and objects in the SDK.

Next, add a connection to the database. In this scenario, you use Language Integrated Query (LINQ) technology because of its ease in querying and connecting to SQL databases. For more information about this technology, see Additional Resources. Within Microsoft Visual Studio, you can add autogenerated LINQ to SQL classes to the Adventure Works database in a matter of seconds. These classes allow you to connect and query databases without writing any SQL code.

Connect to a Database with LINQ to SQL

  1. In the Visual Studio C# project, in the Solution Explorer, right-click the project name, click Add, and then click New Item.

    Figure 1. Add a LINQ to SQL item to the project

    Add a LINQ to SQL item to the project

  2. Select the LINQ to SQL Classes template. Type a name and then click Add.

    Figure 2. Select the LINQ to SQL classes template

    Select the LINQ to SQL classes template

  3. Next, create a connection to the Adventure Works database. To display the Server Explorer pane, on the View menu click Server Explorer.

  4. In the Server Explorer, click the Connect to Database icon.

  5. In the Data Source box, select Microsoft SQL Server Database File, and then click Continue.

  6. In the Add Connection dialog box, browse to the Adventure Works database file, click Open, and then click OK to close the dialog box.

  7. In the Server Explorer, select the database connection, and drag the SalesTerritory table to the designer window as shown in Figure 3.

    Figure 3. Adding the SalesTerritory table to the project

    Adding the SalesTerritory table to the project

    That's it. No extra coding is required. You have access to the database, and the .NET Framework classes are automatically generated. You can query the database through the objects in these classes. Continuing the scenario, assume that you are interested in last year's and this year's sales for each territory.

Create a Template File

In this part of the scenario, you create and populate an Excel workbook from data within the database. You can either create a workbook or start from an existing template. It is almost always easier to create an Open XML Format solution by starting from an existing template.

Create a workbook template in Excel that has two worksheets. The first worksheet contains the header rows for the information in the Territory Name, Sales Last Year, and Sales This Year. The second worksheet contains the structure of the chart. The template should like the following figures.

Figure 4. Excel Template worksheet

The Excel Template worksheet

Figure 5. Sales by Region chart

d967502c-d6c9-40fd-82b1-248f9fc4c25d

After the template is created, do the following:

  1. Insert data into the first sheet of the workbook.

  2. Modify the ranges in the chart so it displays the data.

Insert Data into the Workbook

Before you can use the Open XML Format SDK 2.0, you need to add the correct namespaces to the C# file. Add references to the Packaging and Spreadsheet API components with the following code.

using DocumentFormat.OpenXml.Packaging; 
using DocumentFormat.OpenXml.Spreadsheet; 
using DocumentFormat.OpenXml;

In the template, the first worksheet needs to contain all the necessary data. The following code opens the existing workbook file and accesses the first worksheet. After you have access to the worksheet, access the SheetData object which represents all of the cell data in the worksheet.

//Make a copy of the template file.
File.Copy("template.xlsx", "generated.xlsx", true);
//Open the copied template workbook. 
using (SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open("generated.xlsx", true)) 
{ 
   //Access the main Workbook part, which contains all references.
   WorkbookPart workbookPart = myWorkbook.WorkbookPart;
   //Get the first worksheet. 
   WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
   // The SheetData object will contain all the data.
SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>(); 
}

To add data to the worksheet, append rows to the SheetData object from the database. The following code queries the Adventure Works database for all sales information in the SalesTerritory table. For each row in the query, create a row and add it to the worksheet.

//Connect to database.
AdventureWorksDataContext db = new AdventureWorksDataContext();
//The data starts at row 2. 
int index = 2;
//Select all rows from SalesTerritory table.
var territoryQuery = from t in db.SalesTerritories select t; 
//For each row in the database, add a row to they spreadsheet.
foreach (var item in territoryQuery)
{ 
   string territoryName = item.Name;
   decimal salesLastYear = Math.Round(item.SalesLastYear, 2);
   decimal salesThisYear = Math.Round(item.SalesYTD, 2); 
   //Add a new row.
   Row contentRow = CreateContentRow(index, territoryName, salesLastYear, salesThisYear); 
   index++; 
   //Append new row to sheet data.
   sheetData.AppendChild(contentRow);
}

In this case, you create a row that has three cells: the name of the territory, sales from last year, and sales for this year. The first cell is a text-based cell, while the other two cells are value-based cells. The following code creates a row with the three cells.

string[] headerColumns = new string[] { "A", "B", "C" };
Row CreateContentRow(int index, string territory, decimal salesLastYear, decimal salesThisYear)
{ 
   //Create the new row.
   Row r = new Row();
   r.RowIndex = (UInt32)index; 
   //First cell is a text cell, so create it and append it.
   Cell firstCell = CreateTextCell(headerColumns[0], territory, index);
   r.AppendChild(firstCell);
   //Create the cells that contain the data.
   for (int i=1; i<headerColumns.Length; i++)
   {
      Cell c = new Cell();
      c.CellReference = headerColumns[i] + index;
      CellValue v = new CellValue();
      if (i == 1)
         v.Text = salesLastYear.ToString();
      else
         v.Text = salesThisYear.ToString();
      c.AppendChild(v);
      r.AppendChild(c);
   }
   return r;
}

Text-based cells and value-based cells are stored differently in SpreadsheetML. The following code creates text-based cells.

Cell CreateTextCell(string header, string text, int index)
{
   //Create a new inline string cell.
   Cell c = new Cell();
   c.DataType = CellValues.InlineString;
   c.CellReference = header + index;
   //Add text to the text cell.
   InlineString inlineString = new InlineString();
   Text t = new Text();
   t.Text = text;
   inlineString.AppendChild(t);
   c.AppendChild(inlineString);
   return c;
}

Now you have a fully functional workbook with data from the database.

Adjusting the Chart Data

Look at the chart data within the template. Notice that the ranges for the data are fixed and the ranges refer to data in the second row in the spreadsheet.

<c:f>'My Data'!$B$2</c:f>

To fix the chart data, extend this range to include all of the new rows, as shown in the following code.

void FixChartData(WorkbookPart workbookPart, int totalCount)
{
   //Get the appropriate chart part from template file.
   ChartPart chartPart = workbookPart.ChartsheetParts.First().DrawingsPart.ChartParts.First(); 
   //Change the ranges to accomodate the newly inserted data.
   foreach (Charts.Formula formula in chartPart.ChartSpace.Descendants<Charts.Formula>()) 
   { 
      if (formula.Text.Contains("$2")) 
      { 
         string s = formula.Text.Split('$')[1]; 
         formula.Text += ":$" + s + "$" + totalCount; 
      } 
   } 
   chartPart.ChartSpace.Save(); 
}

The Result

The result is the workbook that contains all the data from the database as well as a chart that represents the data, as shown in the following figures.

Figure 6. Completed sales data worksheet

The completed sales data worksheet

Figure 7. Sales by Region chart

The Sales by Region chart

Sending Data from SpreadsheetML to a Data Source

In the previous section, you created a spreadsheet based on data from a database by using the Open XML Format SDK 2.0. In this section, you do the reverse: read data from a spreadsheet and insert it into a data source, in this case, a database.

Scenario 2: Populating a Database from a Workbook

The theme in this scenario is document data retrieval, where data is retrieved from a file. You are a developer working for a fictional company named Contoso. The sales team uses Excel to create sales orders for customers. These sales orders are all based on an Excel template with designated regions to keep track of data such as customer Id, invoice number, items being purchased, total dues, and so forth. The sales team typically creates hundreds of sales orders a day. So you will create a solution that is able to export bulk data from these sales orders into a database. This solution will be run every night on the server, so automating the Excel client application is not an option.

Solution

Sample Code downloads for the Open XML Format SDK 2.0 (CTP).

In this scenario, you read sales orders that are contained in a workbook template. In this sales order template, you have designated regions where users fill in the information. To simplify the coding and to ensure that the code is more robust, you take advantage of defined name regions as specified in SpreadsheetML. Defined name regions allow you to mark one or more cells in a worksheet as having semantic meaning. For example, you specify a defined name region for a cell having a customer Id. The advantage of using defined names is that you do not need to look up hardcoded columns and rows. For example, if a cell containing the customer Id changes from B10 to B11 after a user adds a row to the spreadsheet, you can still find the specific cell based on the defined name tag. For this scenario, create a template that looks similar to the following figure.

Figure 8. Contoso template

The Contoso template

Building a Defined Names Table

First, add the correct namespaces to the C# file. In this case, add the Packaging and Spreadsheet API components with the following code.

using DocumentFormat.OpenXml.Packaging; 
using DocumentFormat.OpenXml.Spreadsheet; 
using DocumentFormat.OpenXml;

The sales order file contains defined name regions, so the next step is to find all the defined names that are specified in the file. In SpreadsheetML, workbook-level defined names are stored in the main workbook part as follows.

<definedNames> 
<definedName name="CustomerCompanyName">'Sales Order'!$B$13</definedName> 
<definedName name="CustomerID">'Sales Order'!$E$9</definedName> 
<definedName name="CustomerName">'Sales Order'!$B$12</definedName> 
<definedName name="ItemsRange">'Sales Order'!$A$22:$G$36</definedName> 
</definedNames>

The defined name is stored as a string that needs to be parsed. In this solution, each of these defined names is read and then parsed into the following components:

  • Defined name string (this string is a reference to the data)

  • Sheet name

  • Start column

  • Start row

  • End column (only present if the defined name is a range)

  • End row (only present if the defined name is a range)

You use this information to find the value, or sets of values, that comprise the defined name region. Note that the code assumes that the defined names are not changed and that no additional defined names are added to the string ItemsRange.

using (SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open(filename, true)) 
{ 
   //Access the main Workbook part, which contains all references. 
   WorkbookPart workbookPart = myWorkbook.WorkbookPart; 
   //Get all the reference names. 
   List<DefinedNameVal> namesTable = BuildDefinedNamesTable(workbookPart); 
}

This is where you define the BuildDefineNamesTable method as shown in the following code. (Note that this method was contained in a class named DefineNamesVal).

static List<DefinedNameVal> BuildDefinedNamesTable(WorkbookPart workbookPart) 
{ 
   //Build a list. 
   List<DefinedNameVal> definedNames = new List<DefinedNameVal>(); 
   foreach (DefinedName name in workbookPart.Workbook.GetFirstChild<DefinedNames>()) 
   { 
      //Parse defined name string. 
      string key = name.Name; 
      string reference = name.InnerText; 
      string sheetName = reference.Split('!')[0]; 
      sheetName = sheetName.Trim('\''); 
      //Assumption: None of the defined names are relative defined names (i.e. A1). 
      string range = reference.Split('!')[1]; 
      string[] rangeArray = range.Split('$'); 
      string startCol = rangeArray[1]; 
      string startRow = rangeArray[2].TrimEnd(':'); 
      string endCol = null; 
      string endRow = null; 
      if (rangeArray.Length > 3) 
      { 
         endCol = rangeArray[3]; 
         endRow = rangeArray[4]; 
      } 
      definedNames.Add(new DefinedNameVal() { Key = key, SheetName = sheetName, StartColumn = startCol, StartRow = startRow, EndColumn = endCol, EndRow = endRow }); 
   } 
   return definedNames; 
}

This code assumes that none of the names are relative-defined names. For example, the defined name ranges contain $ between the column and row.

This is all of the information needed to look up the values for the defined name regions.

Access the Worksheet Part Specified by Defined Name Regions

In the Excel template, there are two types of data:

  • Defined names that specify values contained within one cell, for example, customer Id.

  • Defined names that specify a region of values, which has the inventory information for the sales order; for example, what is being ordered, how many.

In either case, the value for the defined name is contained in the worksheet that is specified by the defined name, which was already found. The following code returns the worksheet part that is based on the defined name data.

private static WorksheetPart GetWorkSheetPart(WorkbookPart workbookPart, DefinedNameVal definedName) 
{ 
   //Get worksheet based on defined name. 
   string relId = workbookPart.Workbook.Descendants<Sheet>() .Where(s => definedName.SheetName.Equals(s.Name)).First().Id; 
   return (WorksheetPart)workbookPart.GetPartById(relId); 
}

Get Cell Values Specified by Defined Name Regions

Now that you have the worksheet part and the cell reference (or region of cells) as specified in the defined name region, you can retrieve the cell values. There are two types of values stored in the sales order: numbers and strings. In SpreadsheetML, you can represent cell values that are strings as inline strings within the cell or as values that are stored within the shared string table that is found in the sharedstrings part of the worksheet.

The first step in retrieving a specific cell value based on column and row index is to find that cell. The second step is to look up the value for that cell. The following methods retrieve the cell value based on the cell reference.

static string GetCellValue(WorksheetPart worksheetPart, SharedStringTablePart stringTablePart, string startCol, string startRow) 
{ 
   string reference = startCol + startRow; 
   //Get the exact cell based on the reference. 
   Cell cell = worksheetPart.Worksheet.Descendants<Cell>().Where(c => Reference.Equals(c.CellReference)).First(); 
   return GetValue(cell, stringTablePart); 
}

If the cell value is based on a shared string, look up the appropriate value from the shared string table, otherwise, just look at the cell value, as shown in the following code.

static string GetValue(Cell cell, SharedStringTablePart stringTablePart) 
{ 
   if (cell.ChildElements.Count == 0) return null; 
   //Get the cell value. 
   string value = cell.CellValue.InnerText; 
   //Look up the real value from shared string table. 
   if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString)) 
   value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText; 
   return value; 
}

Putting It All Together

Now that the basics are in place, go through all the defined names and find the appropriate values from the spreadsheet. For simplicity, instead of writing all found values into a database, print the data to the console.

As mentioned previously, there are two types of data based on the defined names: single cell values and range values that are based on the sales order. The following code handles the first type of data, the single cell values.

//Create the inventory list. 
DefinedNameVal inventory = null; 
//Go through all the defined names you found. 
foreach (DefinedNameVal definedName in namesTable) 
{ 
   string cellValue = null; 
   //Get the worksheet as specified in the defined name. 
   WorksheetPart worksheetPart = GetWorkSheetPart(workbookPart, definedName); 
   //Not the inventory list? 
   if (!definedName.Key.Contains("Items")) 
   { 
      //Get the cell value. 
      cellValue = GetCellValue(worksheetPart, stringTablePart, definedName.StartColumn, definedName.StartRow); 
      //If the cell value is a date then convert the value to a human readable date. 
      if (definedName.Key.Contains("Date")) 
      { 
         DateTime d = new DateTime(1900, 1, 1); 
         d = d.AddDays(Int32.Parse(cellValue) - 1); 
         cellValue = d.ToShortDateString(); 
      } 
      Console.WriteLine(definedName.Key + ":\t " + cellValue); 
   } 
   else 
   { 
      //Store the value for later use. 
      inventory = definedName; 
   } 
}

Notice that the previous code is converting a SpreadsheetML-stored date to a human-readable date. Because you are going to print out the found values, just track the inventory values as a separate list, as seen in the following code.

//Get the worksheet as specified in the inventory.
WorksheetPart worksheetPart = GetWorkSheetPart(workbookPart, inventory); 
//Get all data from the inventory. 
int startRow = Int32.Parse(inventory.StartRow); 
int endRow = Int32.Parse(inventory.EndRow); 
for (int i = startRow; i <= endRow; i++) 
{ 
   string[] rowValues = new string[7]; //for printing... 
   bool empty = true; 
   //Get a row from the sales order. 
   Row row = worksheetPart.Worksheet.Descendants<Row>() 
   .Where(r => i == r.RowIndex) 
   .First(); 
   int j = 0; 
   //Get all the values from the row. 
   foreach (Cell c in row.ChildElements) 
   { 
      //Get the cell value.
      string value = GetValue(c, stringTablePart); 
      if ((value != null) && (value != "")) 
         empty = false; 
      rowValues[j] = value; 
      j++; 
   } 
   if (empty) break; 
   Console.WriteLine(inventory.Key + i + ": " + string.Join(", ", rowValues)); 
}

This code reads the entire inventory sales order until it reaches an empty row, which means the list is complete.

The Result

Assume that you have a sales order with the information that is shown in Figure 9.

Figure 9. Sample sales order

A sample sales order

Executing the code with this information produces the following output, which is written to the console.

Figure 10. Results of retrieving data values from the worksheet

Results of retrieving data values from worksheet

Summary

The Open XML Formats SDK APIs make it easy to work with data from different sources, such as a database or an Excel workbook. In Creating Documents by Using the Open XML Format SDK 2.0 CTP (Part 3 of 3), you work with Microsoft Office PowerPoint 2007 presentation and content controls in Word.

Additional Resources

For more information about the Open XML Format SDK see the following resources: