Editing Data in an Excel 2007 Open XML File with VBA
Explore different ways to find particular XML data in an Excel 2007 file, retrieve the data, and then write it back to the file all by using Office Open XML. (14 printed pages)
Jan Karel Pieterse, Excel MVP
Frank Rice, Microsoft Corporation
Applies to: Microsoft Office Excel 2007
Introducing Office Open XML Files
The Microsoft Office Open XML file structure makes it possible to edit and update Microsoft Office 2007 files without the need to open the file in the application that created it. The components parts of the Open XML file are contained in an industry standard Zip container. Inside of the container are XML files (document parts) that: define the relationship between each part in the package, contain the data displayed in the document or workbook, define the styles and formatting used in the document, and contain information on the properties and settings of package and its document parts. Because it is a Zip file, you can see the document parts and structure of the file by adding the .zip extension to the document file name and opening the package. Figure 1 is an example of the structure and components of a Microsoft Excel 2007 workbook.
Figure 1. The structure and contents of an Excel 2007 workbook
In this article, we demonstrate how to manually explore the structure and parts of an Excel 2007 workbook and add text and numbers to a cell. After that, we provide Microsoft Visual Basic for Applications (VBA) code that programmatically opens the Zip file, extracts the XML contents of a document part, writes the XML back to the document part, and then closes the Zip file.
In the following sections, we demonstrate how to find the document part that contains the data in a worksheet, and add text and a number to a cell in the worksheet. We use the sample Excel 2007 file available for download from the JKP Application Development Service Web site.
Finding Data in Worksheets
To get started, open the workbook in the sample. The workbook contains a number of worksheets as shown Figure 2.
Figure 2. The contents of the sample Excel 2007 file
For this exercise, suppose that you want to find data that resides in the Comments worksheet without opening the workbook in Excel. Figure 3 shows the contents of the Comments worksheet. This information will be used later.
Figure 3. The contents of the Comments worksheet
The first thing you want to do is to find which XML file in the package contains the data in the Comments worksheet. To do this, open the workbook.xml in an XML editor or text editor such as NotePad. Figure 4 shows an excerpt of the document part’s XML.
Figure 4. An excerpt of the workbook part
Looking the <sheet> element for the Comment worksheet, you see that the r:id attribute, the relationship identifier, is equal to rId7. To find which worksheet part this relationship Id points to, open the workbook.xml.rels part in the xl\_rels folder. Figure 5 shows an excerpt from the file.
Figure 5. An excerpt of the workbook.xml.rels part
The highlighted statement in Figure 5 shows the relationship rId7 points to sheet7.xml in the workbook. Thus the Comments worksheet is represented by the sheet7.xml part in the workbook package. See Figure 3 for the content of the Comments worksheet. Next, open worksheet sheet7.xml in the editor. Figure 6 displays the XML for this part which shows cells A1 through A3.
Figure 6. The XML for the sheet7.xml part
Now suppose that you are interested in the content of cell A1 (<c r="A1" t="s">). How do you find that content? The t="s" attribute says that the cell contains a string. The package contains a xl\sharedStrings.xml part where the strings in a workbook are stored in a table. Because several cells may share the same string such as a postal code or country name, instead of repeating the same string in each cell, it is more efficient to store the string once in a separate part. Then each cell that uses a particular string contains a number representing the indexed location of that string in the sharedStrings part.
So looking at the next line <v>28</v> in the sheet7.xml part tells you that the value for that cell is contained in the shared strings table at index 28. Opening the sharedString.xml part shows the following information at the top of the file.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="69" uniqueCount="46">
The uniqueCount attribute says that there are 46 unique strings in the workbook. Counting down to item 29, you see that the item contains the XML shown in Figure 7.
Figure 7. The contents of the sharedString.xml part
Adding Text to Cells
Now suppose that you want to add a new text entry, which is not in the shared strings table, into a row in the Comments worksheet.
Update the string counts at the top of the sharedString.xml part. The updated XML should look like the following:
Insert the new text into the sharedString.xml part at position 29 (the index is zero-based) as shown in Figure 8.
Figure 8. Insert this text to position 29 in the sharedString.xml part
Add an entry to the sheet7.xml part as shown in Figure 9.
Figure 9. Insert this data into the sheet7.xml part
Remove the .zip file extension from the filename and open the file in Excel 2007. The new text is in the Comments worksheet.
Adding Numbers to Cells
Adding numbers to a cell is similar to adding text to a cell with two differences:
The numbers are stored in the worksheet part with no reference to the shared string table.
You can add number formatting.
For example, Figure 10 shows the XML in sheet7.xml that refers to the number 12 in cell A2 (see Figure 3) with the General number format and no other special formatting.
Figure 10. The number 12 in cell A2
Cells with formatting (styles) are referred to by an index number into the Styles.xml part. For example, Figure 11 shows the entry in sheet7.xml for the data shown in cell A3 (see Figure 3).
Figure 11. The entry for cell A3 in sheet7.xml
Figure 12 shows the entry for the index 34 in the styles.xml part. This entry points to numFmtId = 14 which is an implicit value for the date format mm-dd-yy as shown in the table in Figure 13. For more information about implicit style formats, see the Standard ECMA-376 Office Open XML Formats: 2nd Edition Part 2.
Figure 12. The indexed entry for index 34 in the styles.xml part
Figure 13. Table of implicit style formats
As this section demonstrates, adding formatted cells to a worksheet is not straightforward. The best way to do this is to use an existing format style as a model.
Editing Elements in Open XML Files with VBA
Now that you have had a chance to go through the process of manual reading and updating an Office Open XML Spreadsheet file, it is time to see how to edit the contents of a file programmatically with VBA. You can find a downloadable version of the file used in this section at the JKP Application Development Service Web site
The VBA project is located in the file EditOpenXML.xlsm and consists of two modules and one class module as seen in Figure 14. The modDemo module contains the code that calls the method in the clsEditOpenXML class. The modUNC module contains code to work with UNC paths.
Figure 14. The EditOpenXML project structure
The project also uses the Microsoft XML, version 3.0 library to assist in working with the XML parts. The project also takes advantage of the File Scripting Object and the Windows Shell application object. The class module performs the following actions:
Uncompress an .xlsx or .xlsm file (in this sample formcontrols.xlsm)
Extract XML from the Office Open XML part in the file formcontrols.xlsm
Write the resulting XML to Sheet2 in the EditOpenXML.xlsm file
Write the XML back to the part
Compress the Zip file
These actions are discussed in the sections that follow.
Using Class Modules
To call the procedures in the following section, I have created a module named modDemo.
Public Sub Demo() Dim cEditOpenXML As clsEditOpenXML Dim sXML As String Set cEditOpenXML = New clsEditOpenXML With cEditOpenXML 'Indicate which OpenXML file to process. .SourceFile = ThisWorkbook.Path & "\formcontrols.xlsm" 'Before you can access info in the file, it must be unzipped. .UnzipFile 'Indicate which sheet you want to change. .Sheet2Change = "MySheet" 'Get XML from the sheet's xml file. sXML = .GetXMLFromFile(.SheetFileName) 'Change the xml of the sheet here. 'To prove we really imported the sheet, write the content of 'the cells to Sheet2 of this file. WriteXML2Sheet sXML 'Write the xml back to the sheet. '.WriteXML2File sXML, .SheetFileName 'Rezip the unzipped package. .ZipAllFilesInFolder End With 'After you let the class go out of scope, then remove the zip file's .zip extension 'in the terminate event of the class. 'Then the OpenXML file has its original filename back. Set cEditOpenXML = Nothing End Sub
Uncompressing.xlsx or .xlsm Files
To work with the files in the Office Open XML package, the file (the file named formcontrols.xlsm) must be uncompressed into its components parts. The code that follows does that. As a precaution, a backup copy of the package file is created. Next, a folder is created to contain the uncompressed files. Then the files are copied to the target folder.
Public Sub UnzipFile() ' This code is courtesy Ron de Bruin www.rondebruin.nl. Dim FSO As Object Dim oShellApp As Object Set FSO = CreateObject("scripting.filesystemobject") ' Derive the folder to unzip to from the location of the source file. UnzipFolder = FolderName ' In the same folder, create a dedicated unzip folder as the source file, ' named ..\ Unzipped Filename\ If Right(UnzipFolder, 1) <> "\" Then UnzipFolder = UnzipFolder & "\UnZipped " & FileName & "\" Else UnzipFolder = UnzipFolder & "UnZipped " & FileName & "\" End If On Error Resume Next ' Remove all previous existing folders. FSO.deletefolder UnzipFolder & "*", True Kill UnzipFolder & "*.*" On Error GoTo 0 ' Create normal folder. If FolderExists(UnzipFolder) = False Then MkDir UnzipFolder End If Set oShellApp = CreateObject("Shell.Application") ' Copy the files in the newly created folder. oShellApp.Namespace(UnzipFolder).CopyHere oShellApp.Namespace(SourceFile).items On Error Resume Next ' Clean up temp folder. FSO.deletefolder Environ("Temp") & "\Temporary Directory*", True ' The files are now located in unzipped folder. XLFolder = UnzipFolder & "xl\" Set oShellApp = Nothing Set FSO = Nothing Exit Sub End Sub
Extracting XML from Files
The following code extracts the XML from the package part and then writes the XML to Sheet1 in the file EditOpenXML.xlsm to let you see what XML was extracted.
Public Function GetXMLFromFile(sFileName As String) As String Dim oXMLDoc As MSXML2.DOMDocument If Len(XLFolder) = 0 Then GetXMLFromFile = "" Else Set oXMLDoc = New MSXML2.DOMDocument oXMLDoc.Load XLFolder & sFileName Dim nodeList As IXMLDOMNodeList Dim node As IXMLDOMNode Dim nodeValue As String Dim concatValues As String Dim currRow As Integer 'Retrieve the node list by using an XPath query. Set nodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c") currRow = 1 ' Iterate through the node list and write the values to Sheet1 in this workbook. ' Then, concatenate the node value to a string. For Each node In nodeList nodeValue = node.nodeTypedValue Range("A1").Offset(currRow - 1, 0).Value = nodeValue concatValues = concatValues + vbCrLf + nodeValue currRow = currRow + 1 Next ' Return the concatenated string. GetXMLFromFile = concatValues ' GetXMLFromFile = oXMLDoc.XML Set oXMLDoc = Nothing End If End Function
Writing XML to Worksheets
After reading the XML from the part, write the results to a worksheet.
Public Sub WriteXML2Sheet(sXML As String) Dim oNode As MSXML2.IXMLDOMNode Dim oNodeList As MSXML2.IXMLDOMNodeList Dim oXMLDoc As MSXML2.DOMDocument Dim lRow As Long Set oXMLDoc = New MSXML2.DOMDocument oXMLDoc.loadXML sXML Set oNodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c") For Each oNode In oNodeList If oNode.FirstChild.nodeTypedValue = "" Then ' Value: Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.nodeTypedValue Else ' Formula: Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.FirstChild.nodeTypedValue End If Next End Sub
Writing XML Back to Files
After whatever processing that you want to do, write the updated XML back to the original file.
Compressing Zip Files
And finally, compress the file back to the Zip package.
Public Sub ZipAllFilesInFolder() ' This code sample is courtesy Ron le Bruin www.rondebruin.nl Dim oShellApp As Object Dim sDate As String Dim sDefPath As String Dim vFileNameZip As Variant Dim FSO As Object Dim lFileCt As Long Set FSO = CreateObject("scripting.filesystemobject") ' To ensure a unique filename, ' append date and time to the name of the current file. sDate = Format(Now, " dd-mmm-yy h-mm-ss") vFileNameZip = SourceFile & sDate & ".zip" 'Create the empty Zip file. NewZip vFileNameZip Set oShellApp = CreateObject("Shell.Application") ' Count how many items are in the original folder. lFileCt = oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items.Count ' Copy the files to the compressed folder. oShellApp.Namespace(vFileNameZip).CopyHere oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items ' Keep script waiting until there are the same number of files in the new folder. On Error Resume Next Do Until oShellApp.Namespace(vFileNameZip).items.Count = lFileCt Application.Wait (Now + TimeValue("0:00:01")) Loop DoEvents ' Remove the original file. Kill SourceFile ' Rename new Zipped file to same name as original file (with .zip appended). Name vFileNameZip As SourceFile On Error Resume Next ' Now remove the old folder. FSO.deletefolder FolderName & "Unzipped " & FileName, True On Error GoTo 0 Set oShellApp = Nothing End Sub
Getting XML for a Specific Worksheets
When it comes to working with Excel 2007 and its Office Open XML files, the most common operation is likely changing the XML for a particular worksheet. One procedure shown in this section extracts the rId (relationship identifier) attribute value from a list of nodes representing the worksheet in the package. Another procedure retrieves the sheet for that relationship identifier from the package. The last procedure retrieves the friendly name of the worksheet with that relationship identifier.
Retrieving Relationship Ids
The following code retrieves the rId attribute value for a particular sheet from the workbook.xml file.
Private Function GetSheetIdFromSheetName(sSheetName) As String Dim oXMLDoc As MSXML2.DOMDocument Dim oXMLNode As MSXML2.IXMLDOMNode Dim oXMLNodeList As MSXML2.IXMLDOMNodeList If mvXLFolder <> "" And Sheet2Change <> "" Then Set oXMLDoc = New MSXML2.DOMDocument oXMLDoc.Load XLFolder & "workbook.xml" Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet") For Each oXMLNode In oXMLNodeList If oXMLNode.Attributes.getNamedItem("name").nodeValue = sSheetName Then GetSheetIdFromSheetName = oXMLNode.Attributes.getNamedItem("r:id").nodeValue Exit Function End If Next End If End Function
Retrieving Part Names
The following procedure returns the package part name that matches the relationship from the previous procedure.
Public Function GetSheetFileNameFromId(sSheetId As String) As String Dim oXMLDoc As MSXML2.DOMDocument Dim oXMLNode As MSXML2.IXMLDOMNode Dim oXMLNodeList As MSXML2.IXMLDOMNodeList If mvXLFolder <> "" And Sheet2Change <> "" Then Set oXMLDoc = New MSXML2.DOMDocument oXMLDoc.Load XLFolder & "_rels\workbook.xml.rels" Set oXMLNodeList = oXMLDoc.SelectNodes("/Relationships/Relationship") For Each oXMLNode In oXMLNodeList If oXMLNode.Attributes.getNamedItem("Id").nodeValue = sSheetId Then GetSheetFileNameFromId = oXMLNode.Attributes.getNamedItem("Target").nodeValue Exit Function End If Next End If End Function
Retrieving Friendly Worksheet Names
The following procedure retrieves the friendly name for the worksheet identified by the rId attribute.
Public Function GetSheetNameFromId(sId As String) As String Dim oXMLDoc As MSXML2.DOMDocument Dim oXMLNode As MSXML2.IXMLDOMNode Dim oXMLNodeList As MSXML2.IXMLDOMNodeList If mvXLFolder <> "" Then Set oXMLDoc = New MSXML2.DOMDocument oXMLDoc.Load XLFolder & "workbook.xml" Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet") For Each oXMLNode In oXMLNodeList If oXMLNode.Attributes.getNamedItem("r:id").nodeValue = "rId" & Val(sId) + 1 Then GetSheetNameFromId = oXMLNode.Attributes.getNamedItem("name").nodeValue ' After you get the name, exit the function. Exit Function End If Next End If End Function
The techniques demonstrated in this article show you different ways to find, extract, and write back XML data to a file. Working with Office Open XML files requires knowledge of the structure and content of the package and of the document parts. We encourage you to use the techniques explained here to further explore working with these types of files.