Office Open XML Formats: Retrieving Lists of Excel 2007 Worksheets

Summary: Learn how to retrieve lists of worksheets from Excel programmatically.

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office Excel 2007, Microsoft Visual Studio 2005

Ken Getz, MCW Consulting, Inc.

March 2007

Overview

Imagine that you need to retrieve a list of all the worksheets within an Excel workbook or set of workbooks. The ability to perform this operation without requiring you to load Excel 2007 and then load the workbooks, one after another, can be an incredible time saver, and the Office Open XML File Formats make this task possible. Working with the Office Open XML File Formats requires knowledge of the way Excel stores its content, the System.IO.Packaging API, and XML programming.

See It XLGetListOfSheets video thumbnail

Watch the Video

Length: 05:36 | Size: 3.7 MB | Type: WMV file

Code It | Read It | Explore It

Code It

To help you get started, you can download a set of forty code snippets for Visual Studio 2005, each of which demonstrate various techniques working with the Office 2007 Open XML File Formats. After you install the code snippets, create a sample Excel workbook to test with. Add extra worksheets, and change the names of the sheets, if you like. (See the Read It section for reference). Create a new Windows Application project in Visual Studio 2005, open the code editor, right-click, select Insert Snippet, and select the Excel: Get sheet info from the list of available snippets for the 2007 Office system. If you use Microsoft Visual Basic, inserting the snippet inserts a reference to WindowsBase.dll and adds the following Imports statements.

Imports System.IO
Imports System.IO.Packaging
Imports System.Xml

If you use Microsoft Visual C#, you need to add the reference to the WindowsBase.dll assembly and the corresponding using statements, so that you can compile the code. (Code snippets in C# cannot set references and insert using statements for you.) If the Windowsbase.dll reference does not appear on the .NET tab of the Add Reference dialog box, click the Browse tab, locate the C:\Program Files\Reference assemblies\Microsoft\Framework\v3.0 folder, and then click WindowsBase.dll.

The XLGetSheetInfo snippet delves programmatically into the various parts and relationships between the parts to retrieve a list of SheetInfo types (note that SheetInfo is a type inserted by the snippet—it includes information about the sheet name and type). To test the snippet, store your sample workbook somewhere easy to find (for example, C:\Test.xlsx). In a Microsoft Windows application, insert the XLGetSheetInfo snippet, and then call it, modifying the name of the workbook to meet your needs. You see the list of sheets in the Output window.

For Each sheet As SheetInfo In XLGetSheetInfo("C:\test.xlsx")
  Debug.WriteLine(sheet.SheetName)
Next
foreach (SheetInfo sheet in XLGetSheetInfo("C:\\Test.xlsx"))
{
  System.Diagnostics.Debug.WriteLine(sheet.SheetName);
}

The snippet code starts with the following block, which defines the SheetInfo type.

Public Structure SheetInfo
  Public SheetName As String
  Public SheetType As String

  Public Sub New(ByVal SheetName As String, _
   ByVal SheetType As String)
    Me.SheetName = SheetName
    Me.SheetType = SheetType
  End Sub
End Structure
public struct SheetInfo
{
  public string SheetName;
  public string SheetType;

  public SheetInfo(string SheetName, string SheetType)
  {
    this.SheetName = SheetName;
    this.SheetType = SheetType;
  }
}

This code starts by creating a constant that is used to refer to the relationship required by the procedure. The code then finds the document part, by calling the Package.GetRelationshipsByType method, passing in the constant that contains the document relationship name (see Figure 2). The code then loops through all the returned relationships, and retrieves the document URI, relative to the root of the package. You must loop through the PackageRelationship objects to retrieve the one you want. In every case, this loop only executes once. Assuming that it found the document part, the code loads the part. After working through the remainder of the code, the procedure returns the list of SheetInfo instances.

Public Function XLGetSheetInfo( _
 ByVal fileName As String) As List(Of SheetInfo)

  ' Return a generic list containing info about all the sheets.        
  Const documentRelationshipType As String = _
   "http://schemas.openxmlformats.org/officeDocument/" & _
   "2006/relationships/officeDocument"

  ' Fill this collection with a list of all the sheets
  Dim sheets As New List(Of SheetInfo)

  Using xlPackage As Package = Package.Open(fileName, _
   FileMode.Open, FileAccess.Read)
    ' Get the main document part (workbook.xml).
    For Each relationship As PackageRelationship In _
     xlPackage.GetRelationshipsByType(documentRelationshipType)
      ' There should only be one document part in the package. 
      Dim documentUri As Uri = PackUriHelper.ResolvePartUri( _
       New Uri("/", UriKind.Relative), relationship.TargetUri)
      Dim documentPart As PackagePart = _
       xlPackage.GetPart(documentUri)

       ' Next code block goes here.

      ' There's only one document part.
      Exit For
    Next
  End Using
  Return sheets
End Function
public List<SheetInfo> XLGetSheetInfo(string fileName)
{
  //  Return a generic list containing info about all the sheets.        
  const string documentRelationshipType = 
    "http://schemas.openxmlformats.org/officeDocument/" + 
    "2006/relationships/officeDocument";

  //  Fill this collection with a list of all the sheets
  List<SheetInfo> sheets = new List<SheetInfo>();

  using (Package xlPackage = 
    Package.Open(fileName, FileMode.Open, FileAccess.Read))
  {
    //  Get the main document part (workbook.xml).
    foreach (System.IO.Packaging.PackageRelationship 
      relationship in 
      xlPackage.GetRelationshipsByType(documentRelationshipType))
    {
      //  There should only be one document part in the package. 
      Uri documentUri = PackUriHelper.ResolvePartUri(
        new Uri("/", UriKind.Relative), relationship.TargetUri);
      PackagePart documentPart = xlPackage.GetPart(documentUri);

      // Next code block goes here.

      //  There's only one document part.
      break;
    }
  }
  return sheets;
}

Next, the code creates an XmlDocument instance to contain the contents of the workbook. It loads the XML content and creates an XmlNamespaceManager instance loaded with the namespace. This data is used later to perform searches. Note that because the namespace the code searches in is the default namespace for the XML content, it must fabricate an abbreviation—this code uses the name default.

' Load the contents of the workbook, which is all you 
' need to retrieve the names and types of the sheets:
Dim doc As XmlDocument = New XmlDocument()
doc.Load(documentPart.GetStream())

' Create a NamespaceManager to handle the default namespace, 
' and create a prefix for the default namespace:
Dim nsManager As New XmlNamespaceManager(doc.NameTable)
nsManager.AddNamespace( _
 "default", doc.DocumentElement.NamespaceURI)

' Next code block goes here.
//  Load the contents of the workbook, which is all you 
//  need to retrieve the names and types of the sheets:
XmlDocument doc = new XmlDocument();
doc.Load(documentPart.GetStream());

//  Create a NamespaceManager to handle the default namespace, 
//  and create a prefix for the default namespace:
XmlNamespaceManager nsManager = 
  new XmlNamespaceManager(doc.NameTable);
nsManager.AddNamespace(
  "default", doc.DocumentElement.NamespaceURI);

// Next code block goes here.

The final block loops through all the worksheet nodes it finds within the workbook part. For each one, it retrieves both the name and the type attributes (the default type is "worksheet") and stores the information in a new SheetInfo instance.

NoteNote

This snippet was created during Beta before the file format was final. Now, instead of retrieving information about the sheet type from an attribute, the snippet should retrieve the information from the relationship file, based on the relationship type. Therefore, this snippet reports that every sheet is a worksheet.)

Finally, the code adds the new SheetInfo to the list that the procedure returns.

' Loop through all the nodes, retrieving the information
' about each sheet:
For Each node As XmlNode In doc.SelectNodes( _
 "//default:sheets/default:sheet", nsManager)
  Dim sheetName As String = String.Empty
  Dim sheetType As String = "worksheet"

  sheetName = node.Attributes("name").Value
  Dim typeAttr As XmlAttribute = node.Attributes("type")
  If typeAttr IsNot Nothing Then
    sheetType = typeAttr.Value
  End If
  sheets.Add(New SheetInfo(sheetName, sheetType))
Next
//  Loop through all the nodes, retrieving the information
//  about each sheet:
foreach (System.Xml.XmlNode node in doc.SelectNodes(
  "//default:sheets/default:sheet", nsManager))
{
  string sheetName = string.Empty;
  string sheetType = "worksheet";

  sheetName = node.Attributes["name"].Value;
  XmlAttribute typeAttr = node.Attributes["type"];
  if (typeAttr != null)
  {
    sheetType = typeAttr.Value;
  }
  sheets.Add(new SheetInfo(sheetName, sheetType));
}

Read It

It’s important to understand the file structure of a simple Excel document, so that you can find the data you need—in this case, you want to list of all the sheets in the workbook. To do that, create an Excel workbook with several sheets. Change the names of a few sheets, as well. I named my document Test.xlsx, and it contains four sheets, as shown in Figure 1.

Figure 1. The sample document contains four sheets.

Sample document with four sheets

To investigate the contents of the document, you can follow these steps:

  1. In Windows Explorer, rename the document Test.xlsx.zip.

  2. Open the ZIP file, using either Window Explorer, or some ZIP-management application.

  3. View the _rels\.rels file, shown in Figure 2. This document contains information about the relationships between the parts in the document. Note the value for the workbook.xml part, as highlighted in the figure—this information allows you to find specific parts.

    Figure 2. In _rels\.rels, You use relations between parts to find specific parts.

    Use relations between parts to find specific parts

  4. Open xl\workbook.xml, shown in Figure 3. The highlighted element contains a list of all the sheets—it's from here that you gather all the information you need.

    Figure 3. In xl\workbook.xml, the document part contains a list of worksheets.

    Document part contains list of worksheets

  5. Close the tool you used to investigate the workbook, and rename the file with a .XLSX extension.

Explore It