Export (0) Print
Expand All
Expand Minimize

Retrieving the Values of Cells in Excel 2010 Workbooks by Using the Open XML SDK 2.0

Office 2010

Office Visual How To

Summary:  Use the strongly-typed classes in the Open XML SDK 2.0 to retrieve the value of a cell in an Excel 2007 or Excel 2010 document, without loading the document into Microsoft Excel.

Last modified: September 12, 2012

Applies to: Excel 2007 | Excel 2010 | Office 2007 | Office 2010 | Open XML | PowerPoint 2010 | VBA | Word 2010

Published:   August 2010

Provided by:   Ken Getz, MCW Technologies, LLC

Overview

The Open XML file formats enable you to retrieve information about a particular cell in an Excel workbook. The Open XML SDK 2.0 adds strongly-typed classes to simplify access to the Open XML file formats: The SDK simplifies the tasks of retrieving information about the workbook, and finding the appropriate XML content. The code sample provided with this Visual How To shows how to the use the SDK to perform this task.

Code It

The code sample provided with this Visual How To includes the code that is required to retrieve the value of a specified cell in a specified sheet in an Excel 2007 or Excel 2010 workbook.

Setting Up References

To use the code from the Open XML SDK 2.0, you must add several references to your project. The sample project includes these references, but in your code, you must explicitly reference the following assemblies:

  • WindowsBase─This reference may be set for you, depending on the type of project that you create.

  • DocumentFormat.OpenXml─Installed by the Open XML SDK 2.0.

You should also add the following using/Imports statements to the top of your code file.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

Examining the Procedure

The XLGetCellValue procedure accepts three parameters that indicate the following:

  • The name of the document to examine (string).

  • The name of the sheet to examine (string).

  • The cell address (such as A1, B12) from which to retrieve a value (string).

public static string XLGetCellValue(string fileName, 
  string sheetName, string addressName)

The procedure returns the value of the specified cell, if it can be found. To call the procedure, pass all the parameter values, as shown in the following code example.

const string fileName = @"C:\temp\GetCellValue.xlsx";
string value = XLGetCellValue(fileName, "Sheet1", "A1");

Accessing the Cell

The following code example shows how the code starts by creating a variable that will contain the return value, assuming a null result.

string value = null;

Next, the code opens the document by using the SpreadsheetDocument.Open method and indicates that the document should be open for read-only access (the final false parameter). Next, the code retrieves a reference to the workbook part by using the WorkbookPart property of the document.

using (SpreadsheetDocument document = 
  SpreadsheetDocument.Open(fileName, false))
{
    WorkbookPart wbPart = document.WorkbookPart;
    // Code removed here…
}

To find the requested cell, the code must first retrieve a reference to the sheet, given its name. The code must search all the sheet-type descendants of the workbook part Workbook property and examine the Name property of each sheet that it finds. Be aware that this search only looks through the relations of the workbook, and does not actually find a worksheet part—it only finds a reference to a Sheet, which contains information such as the name and ID of the sheet. The simplest way to do this is to use a LINQ query.

Sheet theSheet = wbPart.Workbook.Descendants<Sheet>().
  Where(s => s.Name == sheetName).FirstOrDefault();
if (theSheet == null)
{
  throw new ArgumentException("sheetName");
}

Be aware that the FirstOrDefault method returns either the first matching reference (a sheet, in this case) or a null reference if no match was found. The code checks for the null reference, and throws an exception if you passed an invalid sheet name.

Now that you have information about the sheet, the code must retrieve a reference to the corresponding worksheet part. The sheet information that you retrieved provides an Id property, and with that Id property, the code can retrieve a reference to the corresponding WorksheetPart by calling the WorkbookPartGetPartById property.

WorksheetPart wsPart = 
  (WorksheetPart)(wbPart.GetPartById(theSheet.Id));

Just as when locating the named sheet, when locating the named cell, the code uses the Descendants method, looking for the first match in which the Reference property equals the specified addressName parameter. After this method call, the variable named theCell will either contain a reference to the cell, or will contain a null reference.

Cell theCell = wsPart.Worksheet.Descendants<Cell>().
  Where(c => c.CellReference == addressName).FirstOrDefault();

Retrieving the Value

At this point, the variable named theCell contains either a null reference, or a reference to the cell that you requested. If you examine the Open XML content (that is, theCell.OuterXml) for the cell, you will find XML similar to the following code example.

<x:c r="A1">
  <x:v>12.345000000000001</x:v>
</x:c>

The InnerText property contains the content for the cell, and so the next block of code retrieves this value.

if (theCell != null)
{
  value = theCell.InnerText;
  // Code removed here…
}

Now, the sample procedure must interpret the value. As it is, the sample handles numeric, date, string, and Boolean values—you can extend the sample as necessary. The Cell type provides a DataType property that indicates the type of the data within the cell; the value of the DataType property is null for numeric and date types. It contains the value CellValues.SharedString for strings, and CellValues.Boolean for Boolean values. If the DataType property is null, the code only returns the value of the cell (a numeric value). Otherwise, the code continues by branching based on the data type.

if (theCell.DataType != null)
{
  switch (theCell.DataType.Value)
    {
    // Code removed here…
    }
}

If the DataType property contains CellValues.SharedString, the code must retrieve a reference to the single SharedStringTablePart.

var stringTable = 
  wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

Next, if the string table exists the code returns the InnerText property of the element it finds at the specified index (first converting the value property to an integer).

NoteNote

If the string table does not exist, the workbook is corrupted and the sample code will return the index into the string table instead of the string itself.

if (stringTable != null)
{
  value = stringTable.SharedStringTable.
    ElementAt(int.Parse(value)).InnerText;
}

If the DataType property contains CellValues.Boolean, the code converts the 0 or 1 it finds in the cell value into the appropriate text string.

case CellValues.Boolean:
  switch (value)
  {
    case "0":
      value = "FALSE";
      break;
    default:
      value = "TRUE";
      break;
  }

Finally, the procedure returns the variable value, which contains the requested information.

Sample Procedure

The following code example shows the complete procedure.

public static string XLGetCellValue(
  string fileName, string sheetName, string addressName)
{
  string value = null;

  using (SpreadsheetDocument document = 
    SpreadsheetDocument.Open(fileName, false))
  {
    WorkbookPart wbPart = document.WorkbookPart;

    // Find the sheet with the supplied name, and then use that Sheet
    // object to retrieve a reference to the appropriate worksheet.
    Sheet theSheet = wbPart.Workbook.Descendants<Sheet>().
      Where(s => s.Name == sheetName).FirstOrDefault();

    if (theSheet == null)
    {
      throw new ArgumentException("sheetName");
    }

    // Retrieve a reference to the worksheet part, and then use its 
    // Worksheet property to get a reference to the cell whose 
    // address matches the address you supplied:
    WorksheetPart wsPart = 
      (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
    Cell theCell = wsPart.Worksheet.Descendants<Cell>().
      Where(c => c.CellReference == addressName).FirstOrDefault();

    // If the cell does not exist, return an empty string:
    if (theCell != null)
    {
      value = theCell.InnerText;

      // If the cell represents a numeric value, you are done. 
      // For dates, this code returns the serialized value that 
      // represents the date. The code handles strings and Booleans
      // individually. For shared strings, the code looks up the 
      // corresponding value in the shared string table. For Booleans, 
      // the code converts the value into the words TRUE or FALSE.
      if (theCell.DataType != null)
      {
        switch (theCell.DataType.Value)
        {
          case CellValues.SharedString:
            // For shared strings, look up the value in the shared 
            // strings table.
            var stringTable = wbPart.
              GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            // If the shared string table is missing, something is 
            // wrong. Return the index that you found in the cell.
            // Otherwise, look up the correct text in the table.
            if (stringTable != null)
            {
              value = stringTable.SharedStringTable.
                ElementAt(int.Parse(value)).InnerText;
            }
            break;

          case CellValues.Boolean:
            switch (value)
            {
              case "0":
                value = "FALSE";
                break;
              default:
                value = "TRUE";
                break;
            }
            break;
        }
      }
    }
  }
  return value;
}
Read It

The sample that is included with this Visual How To contains code that retrieves the value from a particular cell in a specified sheet in an Excel 2007 or Excel 2010 workbook. To use the code sample, install the Open XML SDK 2.0, available from the link listed in the Explore It section. The sample also uses code included as part of a set of code examples for the Open XML SDK 2.0. The Explore It section also includes a link to the full set of code examples, although you can use the sample without downloading and installing the sample code.

The sample application retrieves the value from several cells in a document that you supply, calling the XLGetCellValue method in the sample to do the work. This method returns the value of the specified cell as a string—the calling code must interpret the string value. The calls to the method resemble the following code example.

const string fileName = @"C:\temp\GetCellValue.xlsx";
string value = XLGetCellValue(fileName, "Sheet1", "A1");
Console.WriteLine(value);
value = XLGetCellValue(fileName, "Sheet1", "A2");
Console.WriteLine(value);
value = XLGetCellValue(fileName, "Sheet1", "A3");
Console.WriteLine(value);
value = XLGetCellValue(fileName, "Sheet1", "A4");
Console.WriteLine(value);

It is important to understand how Excel stores cells and their values. Figure 1 shows a sample workbook, with four types of values in four cells—the sample application assumes that you have created this workbook, and stored it as C:\temp\GetCellValue.xlsx. The Open XML SDK 2.0 includes, in its tool directory, a useful application named OpenXmlSdkTool.exe, shown in Figure 2. This tool enables you to open a document and view its various parts and the hierarchy of parts. Figure 2 shows the test document—in the left pane, the document was expanded to the worksheet node, and in the right panes, the tool displays both the XML for the part and the reflected C# code that you could use to generate the contents of the part.

Figure 1. Sample data in an Excel workbook

Sample data in an Excel workbook

Figure 2 shows the OpenXmlSdkTool.exe application that is included with the Open XML SDK 2.0.

Figure 2. Open XML SDK 2.0 Productivity Tool

Open XML SDK 2.0 Productivity Tool

If you examine the left pane (the hierarchy of parts) and the XML content in Figure 2, you will find facts that you need to know to understand the code in this Visual How To:

  • In the highlighted XML content on the right, you can see that the markup contains a reference to the cell address—therefore, given the cell address, you can locate the content that you need.

  • To locate the cell that you need, first locate the sheet that contains the cell.

What you cannot see by using the Productivity Tool is that to retrieve a reference to a named sheet, you must first examine all the sheets contained with the workbook to find a match against the name, and, given the sheet and its ID, retrieve the worksheet part. The sample code handles these issues for you.

Excel maintains a part only for containing each string that you use in a spreadsheet. Each unique string appears once in this table, and Excel stores an index into this table in each cell that contains a string. Given a DataType property of CellValues.SharedString, code must retrieve the corresponding shared string, by index, from the shared string table part. Figure 3 shows the Open XML SDK 2.0 Productivity Tool representation of the shared string table for a sample workbook. This example contains two strings, indexed 0 and 1.

Figure 3. Open XML SDK 2.0 Productivity Tool showing the shared string table for a sample worksheet

Productivity Tool showing shared string table
See It

Watch the video

Watch video

Length: 00:10:46

Click to grab code

Grab the Code

Explore It

About the Author
Ken Getz is a senior consultant with MCW Technologies. He is coauthor of ASP.NET Developers Jumpstart (Addison-Wesley, 2002), Access Developer's Handbook (Sybex, 2001), and VBA Developer's Handbook, 2nd Edition (Sybex, 2001).

Show:
© 2014 Microsoft