Part 1: Automating the XML Data Mapping Process in Excel 2003
Sergei Gundorov
Clark Hamilton
Microsoft Corporation
March 2005
Applies to:
Microsoft Office Editions 2003
Summary: In the first of a three-part series, learn how to use XML maps to customize Excel as a data input and display system. (20 printed pages)
Contents
Introduction
Creating XML Element Names from an Excel Template
Creating XML Maps
Exploring the XML Map
Manually Importing and Exporting XML Data
Collecting Metadata for Data Mapping
Conclusion
Additional Resources
Introduction
Many automated spreadsheet solutions support the fact that XML maps in Microsoft Office Excel 2003 provide a fast and flexible way to move data between scattered spreadsheet cells and a back-end data source. Custom automation of the XML mapping process does three things:
- Eliminates errors when spreadsheet authors want to change the layout (for example, add or remove or move data cells)
- Significantly decreases change-processing time
- Provides useful metadata to facilitate database maintenance, and streamlines reporting from data gathered in the data source–linked spreadsheet.
This article describes several useful methods that enable advanced Excel users to automate the XML mapping process. The following table illustrates some typical problems that you can solve by using XML maps.
Problem | Solution |
---|---|
Single-cell mapping in Excel requires you to manually drag and drop each individual XML element. | Set up the XML key support grid in the spreadsheet that provides database-like dimensions in Excel. |
The standard Excel spreadsheet layout provides only three dimensions: sheet, row, and column. | Add one more dimension to the standard sheet/row/column layout to gain the ability to add multiple tables with repeating row and column dimension elements to a single spreadsheet while still using a single XML map per spreadsheet. |
Manually dragging and dropping each XML element is slow, hinders the change process, and is prone to error. | Automate single-cell XML mapping by "deriving" the XML map from a grid. |
XML element names must adhere to XML rules (that is, they cannot include special characters, spaces, and so on), which diminishes their reporting value. | Implement a simple table-element naming convention and add metadata tags to gather Excel-dimension simple descriptors for easy reporting. (This solution is not covered in this article.) |
XML schemas in Excel are called XML maps. You can use XML maps to create mapped ranges, and manage the relationship between mapped ranges and elements in the XML schema.
To start using Excel for data input and display, we run a macro to derive a four-part identifier for every non-formula data cell from the sheet, section, row, and column labels. The identifiers are used to generate XML maps to relate the contents of mapped cells to elements in a schema for importing or exporting XML data. The four-part identifier consists of SheetName, SectionName, RowName, and ColumnName, and is ideal for describing a composite key in the database. The Excel workbook becomes a template for dynamically displaying data from an XML data source.
Creating XML Element Names from an Excel Template
Our users create conventional Excel spreadsheets containing a blueprint layout for displaying corporate metrics. We use their template to create a user interface that other spreadsheet users can use for standardized data input and display. An Excel-based UI provides at least two benefits: Users can design the UI in a familiar environment, and they can take full advantage of the power of Excel formulas.
We use the worksheet shown in Figure 1, which contains two unrelated tables of data.
Figure 1. Sample Excel worksheet with two data sections (Click to see larger image)
Automating the Process
Our goal is to create a flexible XML schema that is self-maintaining. We take a nontraditional approach and allow the spreadsheet layout to determine the structure of the XML schema. We can use the worksheet name and the cell row and column, but this reduces our flexibility if we later decide to insert or delete areas of the worksheet. Instead, we use the text descriptions themselves and create a "Section;" description in addition to sheet, row, and column labels; this results in a four-dimensional data cell identifier. The identifiers become element names in our XML map. We use Microsoft Visual Basic for Applications (VBA) to automate the grid-creation process. The following sample code creates XML element names for our XML map and places them in XML element grid-control rows and columns, which we will later hide.
'****************************************
'** Sub: BuildGrid
'** Purpose: Derive XML Element Names
'****************************************
Sub BuildGrid()
Dim i As Integer
Dim tableRange As Range
Dim gridName As String
Dim gridIndex As Integer
Dim rowName As String
Dim colName As String
Dim rowOffset As Integer
Dim colOffset As Integer
Dim totalRows As Integer
Dim totalCols As Integer
Dim wks As Worksheet
'setting up range to process
Set tableRange = Selection
gridName = InputBox("Input grid name for selected table:", _
"ROB2.0 Automatic Grid Builder Tool", tableRange.Cells(1, 1))
If UCase(gridName) = "FALSE" Or gridName = "" Then Exit Sub
Set wks = ActiveSheet
'determining total number of rows to process
totalRows = tableRange.Rows.Count
'determining total number of columns to process
totalCols = tableRange.Columns.Count
'identifying next available grid intercept
Do
i = i + 1
If wks.Cells(i, i) = "" Then
gridIndex = i
Exit Do
End If
Loop
'clearing out all previous formats and values
wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
wks.UsedRange.Columns.Count)).Clear
wks.Range(Cells(gridIndex, gridIndex), _
Cells(wks.UsedRange.Rows.Count, gridIndex)).Clear
wks.Cells(i, i) = CleanString(gridName)
rowOffset = tableRange.Cells(1, 1).Row - 1
colOffset = tableRange.Cells(1, 1).Column - 1
'processing rows
For i = 2 To totalRows
rowName = CleanString(tableRange.Cells(i, 1))
wks.Cells(rowOffset + i, gridIndex) = rowName
Next
'processing columns
For i = 2 To totalCols
colName = CleanString(tableRange.Cells(1, i))
wks.Cells(gridIndex, colOffset + i) = colName
Next
'apply formatting to grid columns and rows
If (gridIndex Mod 2) = 1 Then
wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
wks.UsedRange.Columns.Count)).Interior.ColorIndex = 35
wks.Range(Cells(gridIndex, gridIndex), _
Cells(wks.UsedRange.Rows.Count, _
gridIndex)).Interior.ColorIndex = 35
Else
wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
wks.UsedRange.Columns.Count)).Interior.ColorIndex = 34
wks.Range(Cells(gridIndex, gridIndex), _
Cells(wks.UsedRange.Rows.Count, _
gridIndex)).Interior.ColorIndex = 34
End If
wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
wks.UsedRange.Columns.Count)).Font.ColorIndex = 5
wks.Range(Cells(gridIndex, gridIndex), _
Cells(wks.UsedRange.Rows.Count, gridIndex)).Font.ColorIndex = 5
End Sub
'****************************************************
'** Function: CleanString
'** Purpose: Remove Non-XML-Compliant Characters
'****************************************************
Function CleanString(str As String) As String
Dim charList As String
Dim charArray As Variant
Dim i As Integer
'characters to remove from grid element name
'add to this list as necessary
charList = "('000);(000's);(%);($); ;$;&;(;);,;.;"";';-;+;:;/;\;@;>;<;=;" & Chr(13) & ";" & Chr(10)
charArray = Split(charList, ";")
For i = 0 To UBound(charArray)
str = Replace(str, charArray(i), "")
Next
'replacing special symbols/common abbreviations
str = Replace(str, "%", "Pct")
str = Replace(str, "#", "Num")
str = Replace(str, "Revenue", "Rev", , , vbTextCompare)
str = Replace(str, "Target", "TGT", , , vbTextCompare)
str = Replace(str, "Actual", "ACT", , , vbTextCompare)
str = Replace(str, "Budget", "BUD", , , vbTextCompare)
str = Replace(str, "Prior Year", "PY", , , vbTextCompare)
str = Replace(str, "Business", "Bus", , , vbTextCompare)
str = Replace(str, "Customer", "Cust", , , vbTextCompare)
str = Replace(str, "Solution", "Sol", , , vbTextCompare)
str = Replace(str, "Total", "TOT", , , vbTextCompare)
CleanString = str
End Function
We use the BuildGrid macro by highlighting a logical section (table) on the active sheet that includes one header row and one header column (in our simplified example there is only one of each) and run the macro once for each table. The users that generated the original template were asked to start their work in cell AA26, to leave room for our element name control rows and columns. The result is shown in Figure 2. (Note that we have hidden some rows and columns for clarity.)
Figure 2. Sample Excel worksheet after running "BuildGrid" macro to create element names (Click to see larger image)
Creating XML Maps
The structure we created in the previous step provides the foundation for a worksheet XML map. We discovered that one XML map per worksheet resulted in the best performance when consuming data. The following code constructs an XML map from the element names. Start the process by running the ProcessSheet macro.
'**********************************************************
'** Sub: ProcessSheet
'** Purpose: XML mapping automation process
'** Notes: (1) Requires setting a reference to
'** Microsoft XML, v5.0
'** (2) Before running, create a new worksheet
'** named "Output" in the workbook
'** containing this code.
'**********************************************************
Sub ProcessSheet()
Dim slideName As String
Dim currentMap As XmlMap
Dim processComments As Boolean
Application.DisplayAlerts = False
slideName = CleanString(ActiveSheet.Name)
'building and applying data map
BuildXMLMap ActiveSheet
For Each currentMap In ActiveWorkbook.XmlMaps
If currentMap.Name = slideName & "_Map" Then
ActiveWorkbook.XmlMaps(slideName & "_Map").Delete
End If
Next
Set currentMap = ActiveWorkbook.XmlMaps.Add(ActiveWorkbook.Path _
& "\" & slideName & ".xml", "ROOT")
currentMap.Name = slideName & "_Map"
With currentMap
.PreserveColumnFilter = False
.PreserveNumberFormatting = True
.AdjustColumnWidth = False
End With
ApplyXMLMapping
Application.DisplayAlerts = True
End Sub
'******************************************************************
'** Sub: BuildXMLMap
'** Purpose: Create XML Map from element names
'** Notes: Requires setting a reference to Microsoft XML, v5.0
'*******************************************************************
Sub BuildXMLMap(Optional wks As Worksheet)
Dim xDOMOut As New MSXML2.DOMDocument
Dim nodeOut As IXMLDOMNode
Dim rootNode As IXMLDOMNode
Dim slideNode As IXMLDOMNode
Dim sectionNode As IXMLDOMNode
Dim rowNode As IXMLDOMNode
Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim iCounter As Integer
Dim sSlideName As String
Dim strSection As String
Dim wksOutP As Worksheet
If wks Is Nothing Then
Set wks = ActiveSheet
End If
Application.ScreenUpdating = False
'here add steps to create output XML doc
xDOMOut.loadXML "<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" & _
"<ROOT xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">" & _
"</ROOT>"
'setting currentnode to ROOT
Set rootNode = xDOMOut.selectSingleNode("ROOT")
Set wksOutP = ThisWorkbook.Worksheets("Output")
'here cycling through all control row+column
'combinations until find a blank
'which means controls stop. Fully extensible and dynamic model.
'Can grow indefinitely
Do
i = i + 1
If wks.Cells(i, i) = "" Then
iCntrlCols = i - 1
Exit Do
End If
Loop
'determining last cell in the sheet
'we will always use row 1 so the total count will
'always be=last row
iRows = wks.UsedRange.Rows.Count
iCols = wks.UsedRange.Columns.Count
sSlideName = CleanString(wks.Name)
iCounter = 1
'adding SlideName node to the ROOT
Set nodeOut = xDOMOut.createNode(NODE_ELEMENT, sSlideName, "")
Set slideNode = rootNode.appendChild(nodeOut)
'resetting the counter to start outside the control column range
For iC = 1 To iCntrlCols
'processing section node
strSection = Cells(iC, iC)
Set sectionNode = slideNode.selectSingleNode(Cells(iC, iC))
If sectionNode Is Nothing Then
Set nodeOut = xDOMOut.createNode(NODE_ELEMENT, _
Cells(iC, iC), "")
Set sectionNode = slideNode.appendChild(nodeOut)
End If
For iR = (iCntrlCols + 1) To iRows
'looping through all columns in current control row
For i = (iCntrlCols + 1) To iCols
'if both coordinates are not left blank
'AND not description tags then need to process for Xpath
If Cells(iR, iC) <> "" And _
Cells(iC, i) <> "" And _
Left(Cells(iR, iC), 1) <> "[" And _
Left(Cells(iC, i), 1) <> "[" Then
'here need to test for gray(15)
'or black(1) cell background color = don't map
If Not Cells(iR, i).Interior.ColorIndex = 1 And _
Not Cells(iR, i).Interior.ColorIndex = 15 And _
Not Cells(iR, i).Formula Like "=*" Then
iCounter = iCounter + 1
'checking to see if row node exists
'in current section node
Set rowNode = _
sectionNode.selectSingleNode(Cells(iR, iC))
If rowNode Is Nothing Then
Set nodeOut = _
xDOMOut.createNode(NODE_ELEMENT, _
Cells(iR, iC), "")
Set rowNode = _
sectionNode.appendChild(nodeOut)
End If
'creating column node
Set nodeOut = _
xDOMOut.createNode(NODE_ELEMENT, _
Cells(iC, i), "")
'appending column node to current row node
Set rowNode = rowNode.appendChild(nodeOut)
'setting default value to 0 to avoid
'data type mismatch error
rowNode.nodeTypedValue = 0
End If
End If
Next
Next
'testing for an empty section
If Not sectionNode.hasChildNodes Then
xDOMOut.removeChild (sectionNode)
End If
Next
'processed file is saved
xDOMOut.Save ActiveWorkbook.Path & "\" & sSlideName & ".xml"
End Sub
'*******************************************************************
'** Sub: ApplyXMLMapping
'** Purpose: Map elements
'** Notes: Requires setting a reference to Microsoft XML, v5.0
'*******************************************************************
Sub ApplyXMLMapping()
Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim mapP As XmlMap
Dim sSlideName As String
Dim strSection As String
Application.ScreenUpdating = False
RemoveAllXMLMappings ActiveSheet
sSlideName = CleanString(ActiveSheet.Name)
Set mapP = ActiveWorkbook.XmlMaps(sSlideName & "_Map")
'cycling through all control row+column combinations
'until we find a blank, which means controls stop.
'Fully extensible and dynamic model. Can grow indefinitely
Do
i = i + 1
If ActiveSheet.Cells(i, i) = "" Then
iCntrlCols = i - 1
Exit Do
End If
Loop
'determining last cell in the sheet
'we will always use row 1 so total count will always be=last row
iRows = ActiveSheet.UsedRange.Rows.Count
iCols = ActiveSheet.UsedRange.Columns.Count
'resetting the counter to start outside the control column range
For iC = 1 To iCntrlCols
'reading in section name
strSection = Cells(iC, iC)
'looping through all rows in current control column
For iR = (iCntrlCols + 1) To iRows
'looping through all columns in current control row
For i = (iCntrlCols + 1) To iCols
'if both coordinates are not left blank AND
'not description tags, we need to process for Xpath
If Cells(iR, iC) <> "" And _
Cells(iC, i) <> "" And _
Left(Cells(iR, iC), 1) <> "[" And _
Left(Cells(iC, i), 1) <> "[" Then
If Not Cells(iR, i).Interior.ColorIndex = 1 And _
Not Cells(iR, i).Interior.ColorIndex = 15 And _
Not Cells(iR, i).Formula Like "=*" Then
'assembling key name,
'first adding root element
strXpath = "/ROOT/"
'adding slide (sheet) name
strXpath = strXpath & sSlideName & "/"
'adding subsection name
strXpath = strXpath & strSection & "/"
'assembling XMLKey linecode+"/"+column code
strXpath = strXpath & Cells(iR, iC) _
& "/" & Cells(iC, i)
'applying XPath
Cells(iR, i).XPath.SetValue mapP, strXpath
End If
End If
Next
Next
Next
End Sub
'************************************
'** Sub: RemoveAllXMLMappings
'** Purpose: Delete Old Mappings
'************************************
Sub RemoveAllXMLMappings(wks As Worksheet)
Dim rCell As Range
For Each rCell In wks.UsedRange.Cells
If rCell.XPath <> "" Then
rCell.XPath.Clear
End If
Next
End Sub
Figure 3 shows the results of running our "ProcessSheet" macro. Mapped cells have a bold blue border, which you can hide if desired.
Figure 3. Sample Excel worksheet after running macro to generate XML maps (Click to see larger image)
The spreadsheet is now fully capable of consuming or exporting XML data that conforms to our schema. There are several options for importing XML data, including many programmatic solutions. We plan to cover our solution in a future article.
Exploring the XML Map
On the Data menu in Excel, select XML and then XML Source to open the XML Source window, as shown in Figure 4.
Figure 4. XML map in sample worksheet
Manually Importing and Exporting XML Data
On the Data menu in Excel, select XML and then Export to save mapped cell data as an XML document file. To refresh your spreadsheet from an XML data file, open the Data menu again, select XML and then Import, and browse for an XML document conforming to your XML map schema.
Collecting Metadata for Data Mapping
The following code provides a way to collect all data cell dimension elements in a spreadsheet. You could use this tool in various scenarios; for example, creating a database table to store your values and using the four-part dimension as a composite primary key or for data mapping. Providing the cell address helps with data validation.
'*********************************************************************
'** Sub: ExtractKeys
'** Purpose: Export all dimensions to a worksheet.
'** Notes: Before running, create a new worksheet named "Output"
'** in the workbook containing this code.
'*********************************************************************
Sub ExtractKeys()
Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim iCounter As Integer
Dim sSlideName As String
Dim strSection As String
Dim wksOutP As Worksheet
Application.ScreenUpdating = False
Set wksOutP = ThisWorkbook.Worksheets("Output")
wksOutP.Cells.Clear
'cycle through all control row+column combinations
'until we find a blank, which means controls stop.
'Fully extensible and dynamic model.
'Can grow indefinitely
Do
i = i + 1
If ActiveSheet.Cells(i, i) = "" Then
iCntrlCols = i - 1
Exit Do
End If
Loop
'determining last cell in the sheet
'we always use row 1, so the total count will always be=last row
iRows = ActiveSheet.UsedRange.Rows.Count
iCols = ActiveSheet.UsedRange.Columns.Count
sSlideName = CleanString(ActiveSheet.Name)
wksOutP.Cells(1, 1) = "SlideName"
wksOutP.Cells(1, 2) = "SectionName"
wksOutP.Cells(1, 3) = "ColName"
wksOutP.Cells(1, 4) = "RowName"
wksOutP.Cells(1, 5) = "Address"
wksOutP.Range(wksOutP.Cells(1, 1), _
wksOutP.Cells(1, 9)).Font.Bold = True
iCounter = 1
'resetting the counter to start outside the control column range
For iC = 1 To iCntrlCols
'reading in section name
strSection = Cells(iC, iC)
For iR = (iCntrlCols + 1) To iRows
'looping through all columns in current control row
For i = (iCntrlCols + 1) To iCols
'if both coordinates are not left blank AND
'not description tags then need to process for Xpath
If Cells(iR, iC) <> "" And _
Cells(iC, i) <> "" And _
Left(Cells(iR, iC), 1) <> "[" And _
Left(Cells(iC, i), 1) <> "[" Then
If Not Cells(iR, i).Interior.ColorIndex = 1 And _
Not Cells(iR, i).Interior.ColorIndex = 15 And _
Not Cells(iR, i).Formula Like "=*" Then
iCounter = iCounter + 1
wksOutP.Cells(iCounter, 1) = sSlideName
wksOutP.Cells(iCounter, 2) = strSection
wksOutP.Cells(iCounter, 3) = Cells(iC, i)
wksOutP.Cells(iCounter, 4) = Cells(iR, iC)
wksOutP.Cells(iCounter, 5) = _
ActiveSheet.Range(Cells(iR, i), _
Cells(iR, i)).Address
End If
End If
Next
Next
Next
End Sub
Conclusion
The sample code in this article is generic and can be used for most scenarios requiring support of flexible XML mapping with little or no modification. Currently we use this code in a project called "The Rhythm of the Business 2.0," which supports a flexible and constantly changing Excel template used by more than 1,000 Microsoft employees to assess the entire company's performance on a monthly basis. In our project we also needed to harvest the unaltered data-point descriptions for use in a reporting application. We did this by adding meta tags, using square brackets for marking up the description positions. (This article does not include the code that retrieves the description values.) We encountered another reporting issue related to the Excel formulas. In a future article, we will describe how to take advantage of the grid structure to programmatically extract and store Excel formulas in a Microsoft SQL Server database and replicate the UI-side calculations using dynamic SQL.
Additional Resources
The following articles can help you develop more custom XML solutions using Excel: