How to: Get a Column Heading in a Spreadsheet Document

Last modified: October 14, 2010

Applies to: Excel 2010 | Office 2010 | PowerPoint 2010 | Word 2010

In this article
Create a SpreadsheetDocument Object
Basic Structure of a SpreadsheetML Document
How the Sample Code Works
Sample Code

This topic shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to retrieve a column heading in a spreadsheet document programmatically.

The following assembly directives are required to compile the code in this topic.

Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet
Imports System.Text.RegularExpressions

In the Open XML SDK, the SpreadsheetDocument class represents an Excel document package. To create an Excel document, you create an instance of the SpreadsheetDocument class and populate it with parts. At a minimum, the document must have a workbook part that serves as a container for the document, and at least one worksheet part. The text is represented in the package as XML using SpreadsheetML markup.

To create the class instance from the document you call one of the Open() overload methods. In this example, you need to open the file for read access only. Therefore, you can use the Open(String, Boolean) method, and set the Boolean parameter to false.

The following code example calls the Open method to Open the file specified by the filepath for read-only access.

' Open the document as read-only.
Dim document As SpreadsheetDocument = SpreadsheetDocument.Open(docName, False)

The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (internal method used by the Open XML SDK to clean up resources) is automatically called when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case mySpreadsheet.

The basic document structure of a SpreadsheetML document consists of the Sheets and Sheet elements, which reference the worksheets in the Workbook. A separate XML file is created for each Worksheet. For example, the SpreadsheetML for a workbook that has two worksheets name MySheet1 and MySheet2 is located in the Workbook.xml file and is shown in the following code example.

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
<workbook xmlns=http://schemas.openxmlformats.org/spreadsheetml/2006/main xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheets>
        <sheet name="MySheet1" sheetId="1" r:id="rId1" /> 
        <sheet name="MySheet2" sheetId="2" r:id="rId2" /> 
    </sheets>
</workbook>

The worksheet XML files contain one or more block level elements such as SheetData. sheetData represents the cell table and contains one or more Row elements. A row contains one or more Cell elements. Each cell contains a CellValue element that represents the value of the cell. For example, the SpreadsheetML for the first worksheet in a workbook, that only has the value 100 in cell A1, is located in the Sheet1.xml file and is shown in the following code example.

<?xml version="1.0" encoding="UTF-8" ?> 
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <sheetData>
        <row r="1">
<c r="A1">
    <v>100</v> 
</c>
        </row>
    </sheetData>
</worksheet>

Using the Open XML SDK 2.0, you can create document structure and content that uses strongly-typed classes that correspond to SpreadsheetML elements. You can find these classes in the DocumentFormat.OpenXML.Spreadsheet namespace. The following table lists the class names of the classes that correspond to the workbook, sheets, sheet, worksheet, and sheetData elements.

SpreadsheetML Element

Open XML SDK 2.0 Class

Description

workbook

Workbook

The root element for the main document part.

sheets

DocumentFormat.OpenXml.Spreadsheet.Sheets

The container for the block level structures such as sheet, fileVersion, and others specified in the ISO/IEC 29500 specification.

sheet

DocumentFormat.OpenXml.Spreadsheet.Sheet

A sheet that points to a sheet definition file.

worksheet

DocumentFormat.OpenXml.Spreadsheet.Worksheet

A sheet definition file that contains the sheet data.

sheetData

DocumentFormat.OpenXml.Spreadsheet.SheetData

The cell table, grouped together by rows.

row

DocumentFormat.OpenXml.Spreadsheet.Row

A row in the cell table.

c

DocumentFormat.OpenXml.Spreadsheet.Cell

A cell in a row.

v

DocumentFormat.OpenXml.Spreadsheet.CellValue

The value of a cell.

The code in this how-to consists of three methods (functions in Visual Basic): GetColumnHeading, GetColumnName, and GetRowIndex. The last two methods are called from within the GetColumnHeading method.

The GetColumnName method takes the cell name as a parameter. It parses the cell name to get the column name by creating a regular expression to match the column name portion of the cell name. For more information about regular expressions, see Regular Expression Language Elements.

' Create a regular expression to match the column name portion of the cell name.
Dim regex As Regex = New Regex("[A-Za-z]+")
Dim match As Match = regex.Match(cellName)
Return match.Value

The GetRowIndex method takes the cell name as a parameter. It parses the cell name to get the row index by creating a regular expression to match the row index portion of the cell name.

' Create a regular expression to match the row index portion the cell name.
Dim regex As Regex = New Regex("\d+")
Dim match As Match = regex.Match(cellName)
Return UInteger.Parse(match.Value)

The GetColumnHeading method uses three parameters, the full path to the source spreadsheet file, the name of the worksheet that contains the specified column, and the name of a cell in the column for which to get the heading.

The code gets the name of the column of the specified cell by calling the GetColumnName method. The code also gets the cells in the column and orders them by row using the GetRowIndex method.

' Get the column name for the specified cell.
Dim columnName As String = GetColumnName(cellName)

' Get the cells in the specified column and order them by row.
Dim cells As IEnumerable(Of Cell) = worksheetPart.Worksheet.Descendants(Of Cell)().Where(Function(c) _
    String.Compare(GetColumnName(c.CellReference.Value), columnName, True) = 0).OrderBy(Function(r) GetRowIndex(r.CellReference))

If the specified column exists, it gets the first cell in the column using the IEnumerable(T).First method. The first cell contains the heading.

' Get the first cell in the column.
Dim headCell As Cell = cells.First()

If the content of the cell is stored in the SharedStringTablePart object, it gets the shared string items and returns the content of the column heading using the M:System.Int32.Parse(System.String) method. If the content of the cell is not in the SharedStringTable object, it returns the content of the cell.

' If the content of the first cell is stored as a shared string, get the text of the first cell
' from the SharedStringTablePart and return it. Otherwise, return the string value of the cell.
If ((Not (headCell.DataType) Is Nothing) AndAlso (headCell.DataType.Value = CellValues.SharedString)) Then
    Dim shareStringPart As SharedStringTablePart = document.WorkbookPart.GetPartsOfType(Of SharedStringTablePart)().First()
    Dim items() As SharedStringItem = shareStringPart.SharedStringTable.Elements(Of SharedStringItem)().ToArray()
    Return items(Integer.Parse(headCell.CellValue.Text)).InnerText
Else
    Return headCell.CellValue.Text
End If

The following code example shows how to retrieve the column heading using the name of the column. You can call the GetColumnHeading method by using a call like the following example that uses the file "Sheet4.xlsx."

Dim docName As String = "C:\Users\Public\Documents\Sheet4.xlsx"
Dim worksheetName As String = "Sheet1"
Dim cellName As String = "B2"
Dim s1 As String = GetColumnHeading(docName, worksheetName, cellName)

Following is the complete sample code in both C# and Visual Basic.

' Given a document name, a worksheet name, and a cell name, gets the column of the cell and returns
' the content of the first cell in that column.
Public Function GetColumnHeading(ByVal docName As String, ByVal worksheetName As String, ByVal cellName As String) As String
    ' Open the document as read-only.
    Dim document As SpreadsheetDocument = SpreadsheetDocument.Open(docName, False)

    Using (document)
Dim sheets As IEnumerable(Of Sheet) = document.WorkbookPart.Workbook.Descendants(Of Sheet)().Where(Function(s) s.Name = worksheetName)
If (sheets.Count() = 0) Then
    ' The specified worksheet does not exist.
    Return Nothing
End If

Dim worksheetPart As WorksheetPart = CType(document.WorkbookPart.GetPartById(sheets.First.Id), WorksheetPart)

' Get the column name for the specified cell.
Dim columnName As String = GetColumnName(cellName)

' Get the cells in the specified column and order them by row.
Dim cells As IEnumerable(Of Cell) = worksheetPart.Worksheet.Descendants(Of Cell)().Where(Function(c) _
    String.Compare(GetColumnName(c.CellReference.Value), columnName, True) = 0).OrderBy(Function(r) GetRowIndex(r.CellReference))

If (cells.Count() = 0) Then
    ' The specified column does not exist.
    Return Nothing
End If

' Get the first cell in the column.
Dim headCell As Cell = cells.First()

' If the content of the first cell is stored as a shared string, get the text of the first cell
' from the SharedStringTablePart and return it. Otherwise, return the string value of the cell.
If ((Not (headCell.DataType) Is Nothing) AndAlso (headCell.DataType.Value = CellValues.SharedString)) Then
    Dim shareStringPart As SharedStringTablePart = document.WorkbookPart.GetPartsOfType(Of SharedStringTablePart)().First()
    Dim items() As SharedStringItem = shareStringPart.SharedStringTable.Elements(Of SharedStringItem)().ToArray()
    Return items(Integer.Parse(headCell.CellValue.Text)).InnerText
Else
    Return headCell.CellValue.Text
End If

    End Using
End Function

' Given a cell name, parses the specified cell to get the column name.
Private Function GetColumnName(ByVal cellName As String) As String
    ' Create a regular expression to match the column name portion of the cell name.
    Dim regex As Regex = New Regex("[A-Za-z]+")
    Dim match As Match = regex.Match(cellName)
    Return match.Value
End Function

' Given a cell name, parses the specified cell to get the row index.
Private Function GetRowIndex(ByVal cellName As String) As UInteger
    ' Create a regular expression to match the row index portion the cell name.
    Dim regex As Regex = New Regex("\d+")
    Dim match As Match = regex.Match(cellName)
    Return UInteger.Parse(match.Value)
End Function
Show: