Office Open XML Formats: Retrieving Excel 2007 Cell Values

Summary: Retrieve cell values from worksheets in Excel 2007 programmatically.

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office Excel 2007, Microsoft Visual Studio 2005

Ken Getz, MCW Technologies, LLC

March 2007

Overview

Imagine that you need to retrieve information about a specific cell within an Excel workbook or set of workbooks. The ability to perform this operation without requiring you to load Excel 2007 and then load the workbooks, one after another, can be an incredible time saver, and the Office Open XML File Formats make this task possible. Working with the Office Open XML File Formats requires knowledge of the way Excel stores its content, the System.IO.Packaging API, and XML programming.

See It Splash screen of video xlcellvalues

Watch the Video

Length: 08:28 | Size: 5.67 MB | Type: WMV file

Code It | Read It | Explore It

Code It

To help you get started, you can download a set of forty code snippets for Visual Studio 2005, each of which demonstrate various techniques working with the Office 2007 Open XML File Formats. After you install the code snippets, create a sample Excel workbook with which to test. Place some varied values (numbers, strings, dates, Booleans) into cells, so you have some values to work with. For details, see Read It, later in this article. Create a Windows Application project in Microsoft Visual Studio 2005, open the code editor, right-click, select Insert Snippet, and select the Excel: Get cell value snippet from the list of available Office 2007 snippets. If you use Microsoft Visual Basic, inserting the snippet inserts a reference to WindowsBase.dll and adds the following Imports statements.

Imports System.IO.Packaging
Imports System.Xml
Imports System.IO

If you use Microsoft Visual C#, you need to add the reference to the WindowsBase.dll assembly and the corresponding using statements, so that you can compile the code. (Code snippets in C# cannot set references and insert using statements for you.) If the Windowsbase.dll reference does not appear on the .NET tab of the Add Reference dialog box, click the Browse tab, locate the C:\Program Files\Reference assemblies\Microsoft\Framework\v3.0 folder, and then click WindowsBase.dll.

The XLGetCellValue snippet delves programmatically into the various document parts and relationships between the parts to retrieve a single cell value. To test it, store your sample workbook so that it is easy to find (for example, C:\Test.xlsx). In a Windows application, insert the XLGetCellValue snippet, and then call it by modifying the names to meet your needs. You see the value of the cell you requested in an alert.

MessageBox.Show(XLGetCellValue("C:\Test.Xlsx", "Sheet1", "A1"))
MessageBox.Show(XLGetCellValue("C:\\Test.Xlsx", "Sheet1", "A1"));

The snippet code starts with the following block.

Public Function XLGetCellValue(ByVal fileName As String, _
 ByVal sheetName As String, ByVal addressName As String) As String
  ' Return the value of the specified cell.
  Const documentRelationshipType As String = _
   "http://schemas.openxmlformats.org/officeDocument/" & _
   "2006/relationships/officeDocument"
  Const worksheetSchema As String = _
   "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
  Const sharedStringsRelationshipType As String = _
   "http://schemas.openxmlformats.org/officeDocument/" & _
   "2006/relationships/sharedStrings"
  Const sharedStringSchema As String = _
   "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

  Dim cellValue As String = Nothing

  ' Retrieve the stream containing the requested
  ' worksheet's info:
  Dim documentPart As PackagePart = Nothing
  Dim documentUri As Uri = Nothing

  Using xlPackage As Package = Package.Open( _
   fileName, FileMode.Open, FileAccess.Read)
    ' Get the main document part (workbook.xml).
    For Each relationship As PackageRelationship In _
     xlPackage.GetRelationshipsByType(documentRelationshipType)
      ' There should only be one document part in the package. 
      documentUri = PackUriHelper.ResolvePartUri( _
       New Uri("/", UriKind.Relative), relationship.TargetUri)
      documentPart = xlPackage.GetPart(documentUri)
      ' There should only be one instance, 
      ' but get out no matter what.
      Exit For
    Next

    If documentPart IsNot Nothing Then
      ' Load the contents of the workbook.
      Dim doc As XmlDocument = New XmlDocument()
      doc.Load(documentPart.GetStream())

      ' Create a NamespaceManager to handle the default namespace, 
      ' and create a prefix for the default namespace:
      Dim nt As NameTable = New NameTable()
      Dim nsManager As New XmlNamespaceManager(nt)
      nsManager.AddNamespace("d", worksheetSchema)
      nsManager.AddNamespace("s", sharedStringSchema)

      ' Next code block goes here.

    End If
  End Using
  Return cellValue
End Function
public string XLGetCellValue(string fileName, 
  string sheetName, string addressName)
{
  //  Return the value of the specified cell.
  const string documentRelationshipType = 
    "http://schemas.openxmlformats.org/officeDocument/" +
    "2006/relationships/officeDocument";
  const string worksheetSchema = 
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
  const string sharedStringsRelationshipType = 
    "http://schemas.openxmlformats.org/officeDocument/" + 
    "2006/relationships/sharedStrings";
  const string sharedStringSchema = 
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

  string cellValue = null;

  //  Retrieve the stream containing the requested
  //  worksheet's info:
  using (Package xlPackage = Package.Open(
    fileName, FileMode.Open, FileAccess.Read))
  {
    PackagePart documentPart = null;
    Uri documentUri = null;

    //  Get the main document part (workbook.xml).
    foreach (System.IO.Packaging.PackageRelationship relationship in 
      xlPackage.GetRelationshipsByType(documentRelationshipType))
    {
      //  There should only be one document part in the package. 
      documentUri = PackUriHelper.ResolvePartUri(
        new Uri("/", UriKind.Relative), relationship.TargetUri);
      documentPart = xlPackage.GetPart(documentUri);
      //  There should only be one instance, 
      //  but get out no matter what.
      break;
    }

    if (documentPart != null)
    {
      // Load the contents of the workbook.
      XmlDocument doc = new XmlDocument();
      doc.Load(documentPart.GetStream());

      //  Create a namespace manager, so you can search.
      //  Add a prefix (d) for the default namespace.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("d", worksheetSchema);
      nsManager.AddNamespace("s", sharedStringSchema);

      // Next code block goes here.
  return cellValue;
}

This code starts by creating constants that are used to refer to the various schemas and namespaces required by the procedure. The code then finds the document part, by calling the Package.GetRelationshipsByType method, passing the constant that contains the document relationship name (see Figure 2). The code then loops through all the returned relationships, and retrieves the document URI, relative to the root of the package. You must loop through the PackageRelationship objects to retrieve the one you want. In every case, this loop only executes once. When it finds the document part, the code creates an XmlDocument instance to contain the contents of the workbook. Next, it loads the XML content, and creates an XmlNamespaceManager instance loaded with the two namespaces that is needed to perform searches. After working through the code that follows, the procedure returns the value of the cell that it found.

The next block attempts to find and load the sheet you specified.

Dim searchString As String = _
 String.Format("//d:sheet[@name='{0}']", sheetName)
Dim sheetNode As XmlNode = _
 doc.SelectSingleNode(searchString, nsManager)
If sheetNode IsNot Nothing Then
  ' Get the relId attribute:  
  Dim relationAttribute As XmlAttribute = _
   sheetNode.Attributes("r:id")
  If relationAttribute IsNot Nothing Then
    Dim relId As String = relationAttribute.Value

    Dim sheetRelation As PackageRelationship = _
     documentPart.GetRelationship(relId)
    Dim sheetUri As Uri = _
     PackUriHelper.ResolvePartUri( _
     documentUri, sheetRelation.TargetUri)
    Dim sheetPart As PackagePart = xlPackage.GetPart(sheetUri)

    ' Load the contents of the sheet into an XML document.
    Dim sheetDoc As New XmlDocument()
    sheetDoc.Load(sheetPart.GetStream())

    ' Next code block goes here.

  End If
End If
string searchString = 
  string.Format("//d:sheet[@name='{0}']", sheetName);
XmlNode sheetNode = 
  doc.SelectSingleNode(searchString, nsManager);
if (sheetNode != null)
{
  //  Get the relId attribute:
  XmlAttribute relationAttribute = 
    sheetNode.Attributes["r:id"];
  if (relationAttribute != null)
  {
    string relId = relationAttribute.Value;

    //  First, get the relation between the 
    // document and the sheet.
    PackageRelationship sheetRelation = 
      documentPart.GetRelationship(relId);
    Uri sheetUri = PackUriHelper.ResolvePartUri(
      documentUri, sheetRelation.TargetUri);
    PackagePart sheetPart = xlPackage.GetPart(sheetUri);

    //  Load the contents of the workbook.
    XmlDocument sheetDoc = new XmlDocument(nt);
    sheetDoc.Load(sheetPart.GetStream());

    // Next code block goes here.

  }
}

This code block searches for the specific sheet within the workbook's XML content. If it finds the sheet name, it stores the r:id value, and uses that to locate the relationship to the sheet part. Given that information, the code resolves the relationship URI, gets the sheet part, and loads its content into a new XmlDocument instance.

The next block begins the attempt to retrieve the cell value.

Dim cellNode As XmlNode = _
 sheetDoc.SelectSingleNode( _
 String.Format("//d:sheetData/d:row/d:c[@r='{0}']", _
 addressName), nsManager)
If cellNode IsNot Nothing Then
  ' Retrieve the value. The value may be stored within 
  ' this element. If the "t" attribute contains "s", then
  ' the cell contains a shared string, and you must look 
  ' up the value individually.
  Dim typeAttr As XmlAttribute = cellNode.Attributes("t")
  Dim cellType As String = String.Empty
  If typeAttr IsNot Nothing Then
    cellType = typeAttr.Value
  End If

  Dim valueNode As XmlNode = _
   cellNode.SelectSingleNode("d:v", nsManager)
  If valueNode IsNot Nothing Then
    cellValue = valueNode.InnerText
  End If

  ' Next code block goes here.

End If
XmlNode cellNode = sheetDoc.SelectSingleNode(
  string.Format("//d:sheetData/d:row/d:c[@r='{0}']", addressName), 
  nsManager);
if (cellNode != null)
{

  //  Retrieve the value. The value may be stored within 
  //  this element. If the "t" attribute contains "s", then
  //  the cell contains a shared string, and you must look 
  //  up the value individually.
  XmlAttribute typeAttr = cellNode.Attributes["t"];
  string cellType = string.Empty;
  if (typeAttr != null)
  {
    cellType = typeAttr.Value;
  }

  XmlNode valueNode = cellNode.SelectSingleNode("d:v", nsManager);
  if (valueNode != null)
  {
    cellValue = valueNode.InnerText;
  }

  // Next code block goes here.
}

Using a rather complex XPath expression, this block attempts to find information about the requested cell. If it finds the element that represents the cell, it determines if the cell contains text. If so, it must search the shared strings file to find the value of the text. Otherwise, it can simply retrieve the value locally.

The next code block checks the cell type: if it's "b" or "s", the code must handle it specially (note that the snippet doesn't handle date values specifically, but you might want to add code to convert from the serial date format Excel uses into a standard DateTime type).

' Check the cell type. At this point, this code only checks
' for booleans and strings individually.
If cellType = "b" Then
  If cellValue = "1" Then
    cellValue = "TRUE"
  Else
    cellValue = "FALSE"
  End If
ElseIf cellType = "s" Then
  ' Go retrieve the actual string from the associated string file.
  For Each stringRelationship As PackageRelationship In _
    documentPart.GetRelationshipsByType(sharedStringsRelationshipType)
    ' There should only be one shared string reference, 
    ' so you exit this loop immediately.
    Dim sharedStringsUri As Uri = _
     PackUriHelper.ResolvePartUri(documentUri, _
     stringRelationship.TargetUri)
    Dim stringPart As PackagePart = _
     xlPackage.GetPart(sharedStringsUri)
    If stringPart IsNot Nothing Then

      ' Next code block goes here.

    End If
  Next
End If
//  Check the cell type. At this point, this code only checks
//  for booleans and strings individually.
if (cellType == "b")
{
  if (cellValue == "1")
  {
    cellValue = "TRUE";
  }
  else
  {
    cellValue = "FALSE";
  }
}
else if (cellType == "s")
{
  //  Go retrieve the actual string from the associated string file.
  foreach (System.IO.Packaging.PackageRelationship 
    stringRelationship in 
    documentPart.GetRelationshipsByType(sharedStringsRelationshipType))
  {
    //  There should only be one shared string reference, 
    // so you exit this loop immediately.
    Uri sharedStringsUri = PackUriHelper.ResolvePartUri(
      documentUri, stringRelationship.TargetUri);
    PackagePart stringPart = xlPackage.GetPart(sharedStringsUri);
    if (stringPart != null)
    {

      // Next code block goes here.

    }
  }
}

If the code determines that the cell contains a Boolean value, it converts the contents to True or False. If it determines that the cell contains a string, it finds the shared string part, and loads its package.

The final code block retrieves a string value from the shared string part.

' Load the contents of the shared strings.
Dim stringDoc As New XmlDocument(nt)
stringDoc.Load(stringPart.GetStream())

' Add the string schema to the namespace manager:
nsManager.AddNamespace("s", sharedStringSchema)

Dim requestedString As Integer = Convert.ToInt32(cellValue)
Dim strSearch As String = _
 String.Format("//s:sst/s:si[{0}]", requestedString + 1)
Dim stringNode As XmlNode = _
 stringDoc.SelectSingleNode(strSearch, nsManager)
If stringNode IsNot Nothing Then
  cellValue = stringNode.InnerText
End If
//  Load the contents of the shared strings.
XmlDocument stringDoc = new XmlDocument(nt);
stringDoc.Load(stringPart.GetStream());

//  Add the string schema to the namespace manager:
nsManager.AddNamespace("s", sharedStringSchema);

int requestedString = Convert.ToInt32(cellValue);
string strSearch = string.Format(
  "//s:sst/s:si[{0}]", requestedString + 1);
XmlNode stringNode = 
  stringDoc.SelectSingleNode(strSearch, nsManager);
if (stringNode != null)
{
  cellValue = stringNode.InnerText;
}

This block loads the shared string part into a new XmlDocument instance, adds a new namespace to the namespace manager, and searches for the requested value within the shared string part. If it finds a match, it sets the cell value to be the string it found.

Read It

It’s important to understand the file structure of a simple Excel document, so that you can find the data you need—in this case, look for the value of a cell, and perhaps, a string in the shared string part. To do that, create an Excel document with several values in it, perhaps a number, a string, and a Boolean value. I named my document Test.xlsx, and it contains three values, as shown in Figure 1.

Figure 1. The sample document contains three different types of values

Three different values in a spreadsheet

To investigate the contents of the document, you can follow these steps:

  1. In Windows Explorer, rename the document Test.xlsx.zip.

  2. Open the ZIP file, using either Window Explorer, or some ZIP-management application.

  3. View the _rels\.rels file, shown in Figure 2. This document contains information about the relationships between the parts in the document. Note the value for the workbook.xml part, as highlighted in the figure—this information allows you to find specific parts.

    Figure 2. Use relationships between document parts to find specific parts

    XML code snippet

  4. Open xl\workbook.xml, shown in Figure 3. The highlighted element contains a reference to the sheet named Sheet1, and you use one of these references to retrieve information about a particular sheet.

    Figure 3. Use the r:id attribute to find the worksheet you need

    XML Code snippet

  5. Open xl\_rels\workbook.xml.rels, as shown in Figure 4. This document contains information about the relationship between the document part and all the subsidiary document parts, and the code snippet uses this information to the specified worksheet, so that it can retrieve the value of a cell. Note, for example, that the sheet whose relationship ID is rId1 refers to worksheets/sheet1.xml.

    Figure 4. Each sheet relationship appears in the workbook.xml.rels file

    XML code snippet

  6. Open xl\worksheets\sheet1.xml, as shown in Figure 5—this document part contains the sheet named Sheet1. The code snippet uses XML-searching techniques to find cell values within the XML content.

    Figure 5. In the sheet part, you can find cell values

    XML code snippet

  7. If the "t" attribute within a cell contains "s", as in the highlighted element in Figure 5, the snippet must look in the shared string part. You can see the relationship to the shared string part in Figure 4, with the relationship id rId6. As indicated in the relationships part, open sharedStrings.xml, at the same level as workbook.xml (that is, the xl folder).

    Figure 6. The shared strings part contains all the strings within the workbook

    XML code snippet

  8. The shared strings part contains one entry for each unique string in the workbook. By storing a list of unique strings, the Excel can reduce the storage size for the workbook, assuming that the workbook might contain multiple instances of any of the strings. The code snippet retrieves the values of strings, using the string's index within the part.

  9. Close the tool you used to investigate the workbook, and rename the file with a .XLSX extension.

Explore It