Export (0) Print
Expand All
Expand Minimize

Manipulating Excel 2007 and PowerPoint 2007 Files with the Open XML Format API (Part 1 of 2)

Office 2007

Summary: The Welcome to the Open XML Format SDK 1.0 is a library for accessing Open XML Format files. This article, the first in a series of two, describes the Open XML object model code that you can use to access and manipulate Microsoft Office Excel 2007 and Microsoft Office PowerPoint 2007 files. (12 printed pages)

The 2007 Microsoft Office system introduces new file formats that are based on XML called Open XML Formats. Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 all use these formats as the default file format. Open XML formats are useful because they are an open standard and are based on well-known technologies: ZIP and XML. Microsoft provides a library for accessing these files as part of the .NET Framework 3.0 technologies in the DocumentFormat.OpenXml namespace in the Welcome to the Open XML Format SDK 1.0. The Open XML Format members are contained in theDocumentFormat.OpenXml API and provide strongly-typed part classes to manipulate Open XML documents. The SDK simplifies the task of manipulating Open XML packages. The Open XML Format API encapsulates many common tasks that developers perform on Open XML Format packages, so you can perform complex operations with just a few lines of code.

NoteNote

You can find additional samples of manipulating Open XML Format files as well as references for each member contained in the Open XML object model in the Open XML Format SDK 1.0.

The Open XML Package Convention specification defines a set of XML files that contain the content and define the relationships for all of the parts stored in a single package. These packages combine the parts that make up the document files for the 2007 Microsoft Office programs that support the Open XML Format. The Open XML Format API discussed in this article allows you to create packages and manipulate the files that make up the packages.

In the following code, you remove document comments for a specific user name.

Public Sub XLDeleteCommentsByUser(ByVal fileName As String, ByVal userName As String)
   Const commentsSchema As String = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
   Dim doc As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, True)
   '  Manage namespaces to perform XML XPath queries.
   Dim nt As NameTable = New NameTable
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
   nsManager.AddNamespace("sh", commentsSchema)
   For Each part As WorksheetPart In doc.WorkbookPart.WorksheetParts
      If (Not (part.WorksheetCommentsPart) Is Nothing) Then
         Dim commentDoc As XmlDocument = New XmlDocument
         commentDoc.Load(part.WorksheetCommentsPart.GetStream)
         Dim searchString As String = String.Format("//sh:authors/sh:author[text()='{0}']", userName)
         Dim node As XmlNode = commentDoc.SelectSingleNode(searchString, nsManager)
         If (Not (node) Is Nothing) Then
            Dim nodes As XmlNodeList = node.SelectNodes("preceding-sibling::sh:author", nsManager)
            Dim authorID As Integer = nodes.Count

            Dim commentListNode As XmlNode = commentDoc.SelectSingleNode("sh:comments/sh:commentList", nsManager)
            If (Not (commentListNode) Is Nothing) Then
               searchString = String.Format("./sh:comment[@authorId='{0}']", authorID)
               Dim comments As XmlNodeList = commentListNode.SelectNodes(searchString, nsManager)
               For Each commentNode As System.Xml.XmlNode In comments
                  commentListNode.RemoveChild(commentNode)
               Next
            End If
         End If
         '  Save the comment XML back to its part.
         commentDoc.Save(part.WorksheetCommentsPart.GetStream(FileMode.Create))
      End If
   Next
End Sub

In this procedure, you pass in two parameters: the full path to the file and the user name of the person whose comments you want to delete. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace with the sh qualifier.

The procedure then checks for the existence of the comments part in the package. The contents of the comments part are loaded into a memory-resident XML document. Then you search the workbook for nodes by a specific user name by using the XPath expression sh:authors/sh:author. If the user name is found, you then find the ordinal value of the user name. Then you retrieve the index of the selected node. To do that, you count the number of preceding nodes by retrieving a reference to those nodes.

Finally, the updated comments markup stream is saved back to the comments part.

In the following code, you delete a worksheet from a spreadsheet document.

Public Function XLDeleteSheet(ByVal fileName As String, ByVal sheetToDelete As String) As Boolean
   Dim returnValue As Boolean = False
   '  Open the package with read/write access.
   Dim xlDoc As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, True)
   Dim doc As XmlDocument = New XmlDocument
   doc.Load(xlDoc.WorkbookPart.GetStream)
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(doc.NameTable)
   nsManager.AddNamespace("d", doc.DocumentElement.NamespaceURI)
   Dim searchString As String = String.Format("//d:sheet[@name='{0}']", sheetToDelete)
   Dim node As XmlNode = doc.SelectSingleNode(searchString, nsManager)
   If (Not (node) Is Nothing) Then
      Dim relationAttribute As XmlAttribute = node.Attributes("r:id")
      If (Not (relationAttribute) Is Nothing) Then
         Dim relId As String = relationAttribute.Value
         xlDoc.WorkbookPart.DeletePart(relId)
         node.ParentNode.RemoveChild(node)
         doc.Save(xlDoc.WorkbookPart.GetStream(FileMode.Create))
         returnValue = True
      End If
   End If
   Return returnValue
End Function

In this procedure, you pass in two parameters: the full path to the workbook and the name of the worksheet to delete. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you load the contents of the workbook into an XML DOM document. You then set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace with the d qualifier. Then you search the document for the specified worksheet by using the name attribute of the //d:sheet node.

For all of the matching nodes (if any exist), the relationship Id is retrieved and the worksheet corresponding to that Id is deleted.

And finally, the updated SpreadsheetML markup is saved back to the main workbook part.

Given a workbook and a worksheet name, the following code returns either a list of hidden row numbers, or a list of hidden column numbers.

Public Function XLDetectHiddenRowsOrCols(ByVal fileName As String, ByVal sheetName As String, ByVal detectRows As Boolean) As List(Of Integer)
   Const worksheetSchema As String = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
   '  Set up the return value list.
   Dim itemList As List(Of Integer) = New System.Collections.Generic.List(Of Integer)
   Dim xlDoc As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)
   '  Load the contents of the workbook.
   Dim doc As XmlDocument = New XmlDocument
   doc.Load(xlDoc.WorkbookPart.GetStream)
   '  Create a namespace manager, so you can search.
   '  Add a prefix for the default namespace.
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(doc.NameTable)
   nsManager.AddNamespace("d", worksheetSchema)
   Dim searchString As String = String.Format("//d:sheet[@name='{0}']", sheetName)
   Dim sheetNode As XmlNode = doc.SelectSingleNode(searchString, nsManager)
   If (Not (sheetNode) Is Nothing) Then
      '  Get the relId attribute.
      Dim relationAttribute As XmlAttribute = sheetNode.Attributes("r:id")
      If (Not (relationAttribute) Is Nothing) Then
         Dim relId As String = relationAttribute.Value
         Dim sheetPart As WorksheetPart = CType(xlDoc.WorkbookPart.GetPartById(relId), WorksheetPart)
         '  First, get the relationship between the document and the sheet.
         '  Load the contents of the workbook.
         Dim sheetDoc As XmlDocument = New XmlDocument
         sheetDoc.Load(sheetPart.GetStream(FileMode.Open))
         If detectRows Then
             '  Retrieve the list of hidden rows.
             For Each node As System.Xml.XmlNode In sheetDoc.SelectNodes("//d:row[@hidden='1']", nsManager)
                  '  For each hidden row, add information to the output list.
                  Dim rowAttr As XmlAttribute = node.Attributes("r")
                  If (Not (rowAttr) Is Nothing) Then
                      itemList.Add(Convert.ToInt32(rowAttr.Value))
                  End If
          Next
      Else
          '  Retrieve the list of hidden columns
          For Each node As System.Xml.XmlNode In sheetDoc.SelectNodes("//d:cols/d:col", nsManager)
            Dim hiddenAttr As XmlAttribute = node.Attributes("hidden")
            If (Not (hiddenAttr) Is Nothing) Then
                If (hiddenAttr.Value = "1") Then
                   '  Get the range of columns that are hidden.
                   Dim minAttr As XmlAttribute = node.Attributes("min")
                   Dim maxAttr As XmlAttribute = node.Attributes("max")
                   If ((Not (minAttr) Is Nothing) AndAlso (Not (maxAttr) Is Nothing)) Then
                      '  Finally, add the range of values to the list.
                      Dim minValue As Integer = Convert.ToInt32(minAttr.Value)
                      Dim maxValue As Integer = Convert.ToInt32(maxAttr.Value)
                      Dim i As Integer = minValue
                      Do While (i <= maxValue)
                         itemList.Add(i)
                         i = (i + 1)
                      Loop
                   End If
                 End If
              End If
            Next
         End If
      End If
   End If
   Return itemList
End Function

In this procedure, you pass in three parameters: the full path to the Excel 2007 workbook, the name of the worksheet, and, a Boolean value specifying whether to search for hidden rows. If detectRows is true, the procedure returns a list of hidden rows. If detectRows is false, then the list of hidden columns is returned. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you load the contents of the workbook into an XML DOM document. You then set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace, with the d qualifier. Then you search the document for the node representing the specified worksheet by using the name attribute of the //d:sheet node.

If the matching node is found, you get the relationship between the workbook and the worksheet. First, load the contents of the worksheet. If detectRows is true, then retrieve the list of hidden rows by using the hidden attribute of the //d:row node.

NoteNote

Rows are numbered starting with 1. Columns are numbered starting with zero (0).

If detectRows is false then it is assumed that you want to search for hidden columns, which you do by using the hidden attribute of the //d:cols/d:col node.

And finally, the list of items is returned to the calling procedure.

Given an Excel 2007 workbook and the title of a chart, create a Word 2007 file and export the chart.

Public Sub XLExportChart(ByVal inputFileName As String, ByVal outputFileName As String, ByVal chartTitle As String)
   Const drawingMLSchema As String = "http://schemas.openxmlformats.org/drawingml/2006/main"
   Const chartMLSchema As String = "http://schemas.openxmlformats.org/drawingml/2006/chart"
   Dim chartFound As Boolean = False
   Dim xlDoc As SpreadsheetDocument = SpreadsheetDocument.Open(inputFileName, False)
   For Each sheetPart As WorksheetPart In xlDoc.WorkbookPart.WorksheetParts
      If (sheetPart.DrawingsPart Is Nothing) Then
         Continue For
      End If
      '  Loop through all the parts related to the worksheet.
      For Each chartPart As ChartPart In sheetPart.DrawingsPart.ChartParts
         '  A chart is found. Does it have the correct title?
         '  Create a namespace manager, so you can search.
         Dim nt As NameTable = New NameTable
         Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
         nsManager.AddNamespace("c", chartMLSchema)
         nsManager.AddNamespace("a", drawingMLSchema)
         '  Load the chart document from the part's stream, and 
         '  search for the requested title.
         Dim chartDoc As XmlDocument = New XmlDocument(nt)
         chartDoc.Load(chartPart.GetStream)
         Dim titleNode As XmlNode = chartDoc.SelectSingleNode("//c:chart//c:title//a:t", nsManager)
         If (Not (titleNode) Is Nothing) Then
            If (String.Compare(titleNode.InnerText, chartTitle, True) = 0) Then
               Dim newDoc As WordprocessingDocument = WordprocessingDocument.Create(outputFileName, WordprocessingDocumentType.Document)
               newDoc.AddMainDocumentPart()
            newDoc.MainDocumentPart.AddPart(Of ChartPart)(chartPart)
               '  Tell the outer loops that you are finished.
               chartFound = True
             End If
         End If
      Next
      '  Because you need to export only a single chart, exit.
      If chartFound Then
         Exit Sub
      End If
   Next
End Sub

In this procedure, you pass in three parameters: the full path to the workbook, the full path of the output file, and the name of the chart.

NoteNote

In this procedure, the Word 2007 document is used as temporary storage for the chart part and, as such, none of the relationships are defined. Thus, the document cannot be opened in Word. You could substitute the code in this procedure with code that correctly adds a part to the document package. For a sample of this type of code, see the Open XML Format SDK 1.0.

First you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default chartML namespace with the c qualifier and to the drawingML namespace with the a qualifier.

Then the procedure loops through each chartPart part searching for the chart title by using the //c:chart//c:title//a:t XPath expression. If it is found, a Word 2007 document is created and the chartPart part is exported to the package.

In the following code example, you create a list of all of the hidden worksheets in a workbook.

Public Function XLGetHiddenSheets(ByVal fileName As String) As List(Of String)
   Dim sheets As List(Of String) = New List(Of String)()
   Dim xlDoc As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)
   '  Get the main document part (workbook.xml).
   Dim doc As XmlDocument = New XmlDocument
   doc.Load(xlDoc.WorkbookPart.GetStream)
   '  Create a NamespaceManager to handle the default namespace, 
   '  and create a prefix for the default namespace.
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(doc.NameTable)
   nsManager.AddNamespace("default", doc.DocumentElement.NamespaceURI)
   For Each node As System.Xml.XmlNode In doc.SelectNodes("//default:sheets//default:sheet[@state='hidden']", nsManager)
      Dim sheetName As String = node.Attributes("name").Value
      sheets.Add(sheetName)
   Next
   Return sheets
End Function

In this procedure, you pass in one parameter: the full path to the workbook. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object and load the data into an XML document. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace to search for the hidden worksheets.

Then you search the document for hidden worksheets by using the state attribute of the //default:sheets//default:sheet node. The name of each hidden worksheet is added to the sheets list. After all of the worksheets are searched, the sheets list is returned to the calling procedure.

In the following code, you retrieve the value of a cell in a worksheet.

Public Function XLGetCellValue(ByVal fileName As String, ByVal sheetName As String, ByVal addressName As String) As String
   Const worksheetSchema As String = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
   Const sharedStringSchema As String = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
   Dim cellValue As String = Nothing
   '  Retrieve the stream containing the requested
   '  worksheet's information.
   Dim xlDoc As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)
   '  Get the main document part (workbook.xml).
   Dim doc As XmlDocument = New XmlDocument
   doc.Load(xlDoc.WorkbookPart.GetStream)
   '  Create a namespace manager, so you can search.
   '  Add a prefix (d) for the default namespace.
   Dim nt As NameTable = New NameTable
   Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
   nsManager.AddNamespace("d", worksheetSchema)
   nsManager.AddNamespace("s", sharedStringSchema)
   Dim searchString As String = String.Format("//d:sheet[@name='{0}']", sheetName)
   Dim sheetNode As XmlNode = doc.SelectSingleNode(searchString, nsManager)
   If (Not (sheetNode) Is Nothing) Then
      '  Get the relId attribute.
      Dim relationAttribute As XmlAttribute = sheetNode.Attributes("r:id")
      If (Not (relationAttribute) Is Nothing) Then
         Dim relId As String = relationAttribute.Value
         '  Load the contents of the workbook.
         Dim sheetDoc As XmlDocument = New XmlDocument(nt)
         sheetDoc.Load(xlDoc.WorkbookPart.GetPartById(relId).GetStream)
         Dim cellNode As XmlNode = sheetDoc.SelectSingleNode(String.Format("//d:sheetData/d:row/d:c[@r='{0}']", addressName), nsManager)
         If (Not (cellNode) Is Nothing) Then
            Dim typeAttr As XmlAttribute = cellNode.Attributes("t")
            Dim cellType As String = String.Empty
            If (Not (typeAttr) Is Nothing) Then
               cellType = typeAttr.Value
            End If
            Dim valueNode As XmlNode = cellNode.SelectSingleNode("d:v", nsManager)
            If (Not (valueNode) Is Nothing) Then
               cellValue = valueNode.InnerText
            End If
            If (cellType = "b") Then
               If (cellValue = "1") Then
                  cellValue = "TRUE"
               Else
                  cellValue = "FALSE"
               End If
            ElseIf (cellType = "s") Then
               If (Not (xlDoc.WorkbookPart.SharedStringTablePart) Is Nothing) Then
                   Dim stringDoc As XmlDocument = New XmlDocument(nt)
                   stringDoc.Load(xlDoc.WorkbookPart.SharedStringTablePart.GetStream)
                   '  Add the string schema to the namespace manager.
                   nsManager.AddNamespace("s", sharedStringSchema)
                   Dim requestedString As Integer = Convert.ToInt32(cellValue)
                   Dim strSearch As String = String.Format("//s:sst/s:si[{0}]", (requestedString + 1))
                   Dim stringNode As XmlNode = stringDoc.SelectSingleNode(strSearch, nsManager)
                   If (Not (stringNode) Is Nothing) Then
                      cellValue = stringNode.InnerText
                   End If
                End If
            End If
         End If
      End If
   End If
   Return cellValue
End Function

In this procedure, you pass in three parameters: the full path to the workbook, the name of the worksheet, and the address of the cell containing the value you want to retrieve. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object and load the data into an XML document. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default worksheetSchema namespace with d the qualifier and the sharedStringSchema with the s qualifier. The sharedStringSchema namespace refers to the SharedStringTablePart part, which contains strings that are shared across cells.

Next, you retrieve the node representing the specified worksheet in the main workbook part by selecting the name attribute of the //d:sheet node. If the node is found, the relationship Id of the worksheet is retrieved and used to load the worksheet into an XML document. Then you retrieve the value. In the node, if the t attribute contains an s, this indicates that this is a shared string and must be looked up in the SharedStringTablePart part. Otherwise, the value can be retrieved directly from the node.

NoteNote

This code checks only for Boolean and String values individually.

Finally, the procedure returns either the cell value or a Boolean value specifying whether the search succeeded.

As this article demonstrates, working with Excel 2007 and PowerPoint 2007 files is much easier with the Welcome to the Open XML Format SDK 1.0. In part two of this series of articles (coming soon), I describe other common tasks that you can perform with Excel 2007 and PowerPoint 2007 files.

Show:
© 2014 Microsoft