Part 1: Automating the XML Data Mapping Process in Excel 2003

 

Sergei Gundorov
Clark Hamilton
Microsoft Corporation

March 2005

Applies to:
    Microsoft Office Editions 2003

Summary: In the first of a three-part series, learn how to use XML maps to customize Excel as a data input and display system. (20 printed pages)

Contents

Introduction
Creating XML Element Names from an Excel Template
Creating XML Maps
Exploring the XML Map
Manually Importing and Exporting XML Data
Collecting Metadata for Data Mapping
Conclusion
Additional Resources

Introduction

Many automated spreadsheet solutions support the fact that XML maps in Microsoft Office Excel 2003 provide a fast and flexible way to move data between scattered spreadsheet cells and a back-end data source. Custom automation of the XML mapping process does three things:

  • Eliminates errors when spreadsheet authors want to change the layout (for example, add or remove or move data cells)
  • Significantly decreases change-processing time
  • Provides useful metadata to facilitate database maintenance, and streamlines reporting from data gathered in the data source–linked spreadsheet.

This article describes several useful methods that enable advanced Excel users to automate the XML mapping process. The following table illustrates some typical problems that you can solve by using XML maps.

Problem Solution
Single-cell mapping in Excel requires you to manually drag and drop each individual XML element. Set up the XML key support grid in the spreadsheet that provides database-like dimensions in Excel.
The standard Excel spreadsheet layout provides only three dimensions: sheet, row, and column. Add one more dimension to the standard sheet/row/column layout to gain the ability to add multiple tables with repeating row and column dimension elements to a single spreadsheet while still using a single XML map per spreadsheet.
Manually dragging and dropping each XML element is slow, hinders the change process, and is prone to error. Automate single-cell XML mapping by "deriving" the XML map from a grid.
XML element names must adhere to XML rules (that is, they cannot include special characters, spaces, and so on), which diminishes their reporting value. Implement a simple table-element naming convention and add metadata tags to gather Excel-dimension simple descriptors for easy reporting. (This solution is not covered in this article.)

XML schemas in Excel are called XML maps. You can use XML maps to create mapped ranges, and manage the relationship between mapped ranges and elements in the XML schema.

To start using Excel for data input and display, we run a macro to derive a four-part identifier for every non-formula data cell from the sheet, section, row, and column labels. The identifiers are used to generate XML maps to relate the contents of mapped cells to elements in a schema for importing or exporting XML data. The four-part identifier consists of SheetName, SectionName, RowName, and ColumnName, and is ideal for describing a composite key in the database. The Excel workbook becomes a template for dynamically displaying data from an XML data source.

Creating XML Element Names from an Excel Template

Our users create conventional Excel spreadsheets containing a blueprint layout for displaying corporate metrics. We use their template to create a user interface that other spreadsheet users can use for standardized data input and display. An Excel-based UI provides at least two benefits: Users can design the UI in a familiar environment, and they can take full advantage of the power of Excel formulas.

We use the worksheet shown in Figure 1, which contains two unrelated tables of data.

Sample Excel worksheet with two data sections (Click to see larger image)

Figure 1. Sample Excel worksheet with two data sections (Click to see larger image)

Automating the Process

Our goal is to create a flexible XML schema that is self-maintaining. We take a nontraditional approach and allow the spreadsheet layout to determine the structure of the XML schema. We can use the worksheet name and the cell row and column, but this reduces our flexibility if we later decide to insert or delete areas of the worksheet. Instead, we use the text descriptions themselves and create a &quotSection;" description in addition to sheet, row, and column labels; this results in a four-dimensional data cell identifier. The identifiers become element names in our XML map. We use Microsoft Visual Basic for Applications (VBA) to automate the grid-creation process. The following sample code creates XML element names for our XML map and places them in XML element grid-control rows and columns, which we will later hide.

'****************************************
'**    Sub:      BuildGrid
'**    Purpose:  Derive XML Element Names
'****************************************
Sub BuildGrid()

Dim i As Integer
Dim tableRange As Range
Dim gridName As String
Dim gridIndex As Integer
Dim rowName As String
Dim colName As String
Dim rowOffset As Integer
Dim colOffset As Integer
Dim totalRows As Integer
Dim totalCols As Integer
Dim wks As Worksheet

    'setting up range to process
    Set tableRange = Selection
    gridName = InputBox("Input grid name for selected table:", _
        "ROB2.0 Automatic Grid Builder Tool", tableRange.Cells(1, 1))
    If UCase(gridName) = "FALSE" Or gridName = "" Then Exit Sub
    Set wks = ActiveSheet

    'determining total number of rows to process
    totalRows = tableRange.Rows.Count

    'determining total number of columns to process
    totalCols = tableRange.Columns.Count

    'identifying next available grid intercept
    Do
        i = i + 1
            If wks.Cells(i, i) = "" Then
                gridIndex = i
                Exit Do
            End If
    Loop

    'clearing out all previous formats and values
    wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
        wks.UsedRange.Columns.Count)).Clear
    wks.Range(Cells(gridIndex, gridIndex), _
        Cells(wks.UsedRange.Rows.Count, gridIndex)).Clear
    wks.Cells(i, i) = CleanString(gridName)
    rowOffset = tableRange.Cells(1, 1).Row - 1
    colOffset = tableRange.Cells(1, 1).Column - 1

    'processing rows
    For i = 2 To totalRows
        rowName = CleanString(tableRange.Cells(i, 1))
        wks.Cells(rowOffset + i, gridIndex) = rowName
    Next

    'processing columns
    For i = 2 To totalCols
        colName = CleanString(tableRange.Cells(1, i))
        wks.Cells(gridIndex, colOffset + i) = colName
    Next

    'apply formatting to grid columns and rows
    If (gridIndex Mod 2) = 1 Then
        wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
            wks.UsedRange.Columns.Count)).Interior.ColorIndex = 35
        wks.Range(Cells(gridIndex, gridIndex), _
            Cells(wks.UsedRange.Rows.Count, _
            gridIndex)).Interior.ColorIndex = 35
    Else
        wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
            wks.UsedRange.Columns.Count)).Interior.ColorIndex = 34
        wks.Range(Cells(gridIndex, gridIndex), _
            Cells(wks.UsedRange.Rows.Count, _
                gridIndex)).Interior.ColorIndex = 34
    End If

    wks.Range(Cells(gridIndex, gridIndex), Cells(gridIndex, _
        wks.UsedRange.Columns.Count)).Font.ColorIndex = 5
    wks.Range(Cells(gridIndex, gridIndex), _
        Cells(wks.UsedRange.Rows.Count, gridIndex)).Font.ColorIndex = 5

End Sub


'****************************************************
'**    Function:  CleanString
'**    Purpose:   Remove Non-XML-Compliant Characters
'****************************************************
Function CleanString(str As String) As String

Dim charList As String
Dim charArray As Variant
Dim i As Integer

    'characters to remove from grid element name
    'add to this list as necessary
     charList = "('000);(000's);(%);($); ;$;&;(;);,;.;"";';-;+;:;/;\;@;>;<;=;" & Chr(13) & ";" & Chr(10)
    charArray = Split(charList, ";")

    For i = 0 To UBound(charArray)
        str = Replace(str, charArray(i), "")
    Next

    'replacing special symbols/common abbreviations
    str = Replace(str, "%", "Pct")
    str = Replace(str, "#", "Num")
    str = Replace(str, "Revenue", "Rev", , , vbTextCompare)
    str = Replace(str, "Target", "TGT", , , vbTextCompare)
    str = Replace(str, "Actual", "ACT", , , vbTextCompare)
    str = Replace(str, "Budget", "BUD", , , vbTextCompare)
    str = Replace(str, "Prior Year", "PY", , , vbTextCompare)
    str = Replace(str, "Business", "Bus", , , vbTextCompare)
    str = Replace(str, "Customer", "Cust", , , vbTextCompare)
    str = Replace(str, "Solution", "Sol", , , vbTextCompare)
    str = Replace(str, "Total", "TOT", , , vbTextCompare)

    CleanString = str

End Function

We use the BuildGrid macro by highlighting a logical section (table) on the active sheet that includes one header row and one header column (in our simplified example there is only one of each) and run the macro once for each table. The users that generated the original template were asked to start their work in cell AA26, to leave room for our element name control rows and columns. The result is shown in Figure 2. (Note that we have hidden some rows and columns for clarity.)

Sample Excel worksheet after running BuildGrid macro to create element names (Click to see larger image)

Figure 2. Sample Excel worksheet after running "BuildGrid" macro to create element names (Click to see larger image)

Creating XML Maps

The structure we created in the previous step provides the foundation for a worksheet XML map. We discovered that one XML map per worksheet resulted in the best performance when consuming data. The following code constructs an XML map from the element names. Start the process by running the ProcessSheet macro.

'**********************************************************
'**    Sub:      ProcessSheet
'**    Purpose:  XML mapping automation process
'**    Notes:    (1) Requires setting a reference to
'**                  Microsoft XML, v5.0
'**              (2) Before running, create a new worksheet
'**                  named "Output" in the workbook
'**                  containing this code.
'**********************************************************
Sub ProcessSheet()

Dim slideName As String
Dim currentMap As XmlMap
Dim processComments As Boolean

    Application.DisplayAlerts = False

    slideName = CleanString(ActiveSheet.Name)

    'building and applying data map
    BuildXMLMap ActiveSheet

    For Each currentMap In ActiveWorkbook.XmlMaps
        If currentMap.Name = slideName & "_Map" Then
            ActiveWorkbook.XmlMaps(slideName & "_Map").Delete
        End If
    Next

    Set currentMap = ActiveWorkbook.XmlMaps.Add(ActiveWorkbook.Path _
        & "\" & slideName & ".xml", "ROOT")
    currentMap.Name = slideName & "_Map"

    With currentMap
        .PreserveColumnFilter = False
        .PreserveNumberFormatting = True
        .AdjustColumnWidth = False
    End With

    ApplyXMLMapping
    Application.DisplayAlerts = True

End Sub


'******************************************************************
'**    Sub:      BuildXMLMap
'**    Purpose:  Create XML Map from element names
'**    Notes:    Requires setting a reference to Microsoft XML, v5.0
'*******************************************************************
Sub BuildXMLMap(Optional wks As Worksheet)

Dim xDOMOut As New MSXML2.DOMDocument
Dim nodeOut As IXMLDOMNode
Dim rootNode As IXMLDOMNode
Dim slideNode As IXMLDOMNode
Dim sectionNode As IXMLDOMNode
Dim rowNode As IXMLDOMNode
Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim iCounter As Integer
Dim sSlideName As String
Dim strSection As String
Dim wksOutP As Worksheet

    If wks Is Nothing Then
        Set wks = ActiveSheet
    End If

    Application.ScreenUpdating = False

    'here add steps to create output XML doc
    xDOMOut.loadXML "<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" & _
        "<ROOT xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">" & _
        "</ROOT>"
    'setting currentnode to ROOT
    Set rootNode = xDOMOut.selectSingleNode("ROOT")
    Set wksOutP = ThisWorkbook.Worksheets("Output")

    'here cycling through all control row+column
    'combinations until find a blank
    'which means controls stop. Fully extensible and dynamic model.
    'Can grow indefinitely
    Do
        i = i + 1
        If wks.Cells(i, i) = "" Then
            iCntrlCols = i - 1
            Exit Do
        End If
    Loop

    'determining last cell in the sheet
    'we will always use row 1 so the total count will
    'always be=last row
    iRows = wks.UsedRange.Rows.Count
    iCols = wks.UsedRange.Columns.Count

    sSlideName = CleanString(wks.Name)

    iCounter = 1

    'adding SlideName node to the ROOT
    Set nodeOut = xDOMOut.createNode(NODE_ELEMENT, sSlideName, "")
    Set slideNode = rootNode.appendChild(nodeOut)
    'resetting the counter to start outside the control column range
    For iC = 1 To iCntrlCols

    'processing section node
    strSection = Cells(iC, iC)
    Set sectionNode = slideNode.selectSingleNode(Cells(iC, iC))
    If sectionNode Is Nothing Then
        Set nodeOut = xDOMOut.createNode(NODE_ELEMENT, _
            Cells(iC, iC), "")
        Set sectionNode = slideNode.appendChild(nodeOut)
    End If

    For iR = (iCntrlCols + 1) To iRows
        'looping through all columns in current control row
        For i = (iCntrlCols + 1) To iCols
            'if both coordinates are not left blank
            'AND not description tags then need to process for Xpath
            If Cells(iR, iC) <> "" And _
                  Cells(iC, i) <> "" And _
                  Left(Cells(iR, iC), 1) <> "[" And _
                  Left(Cells(iC, i), 1) <> "[" Then

                  'here need to test for gray(15)
                  'or black(1) cell background color = don't map
                  If Not Cells(iR, i).Interior.ColorIndex = 1 And _
                      Not Cells(iR, i).Interior.ColorIndex = 15 And _
                      Not Cells(iR, i).Formula Like "=*" Then

                      iCounter = iCounter + 1
                      'checking to see if row node exists
                      'in current section node
                      Set rowNode = _
                          sectionNode.selectSingleNode(Cells(iR, iC))
                          If rowNode Is Nothing Then
                              Set nodeOut = _
                                  xDOMOut.createNode(NODE_ELEMENT, _
                                  Cells(iR, iC), "")
                              Set rowNode = _
                                  sectionNode.appendChild(nodeOut)
                          End If
                          'creating column node
                          Set nodeOut = _
                              xDOMOut.createNode(NODE_ELEMENT, _
                              Cells(iC, i), "")
                          'appending column node to current row node
                          Set rowNode = rowNode.appendChild(nodeOut)
                          'setting default value to 0 to avoid 
                          'data type mismatch error
                          rowNode.nodeTypedValue = 0
                  End If
            End If
        Next
    Next
    'testing for an empty section
    If Not sectionNode.hasChildNodes Then
        xDOMOut.removeChild (sectionNode)
    End If
    Next

    'processed file is saved
    xDOMOut.Save ActiveWorkbook.Path & "\" & sSlideName & ".xml"

End Sub

'*******************************************************************
'**    Sub:      ApplyXMLMapping
'**    Purpose:  Map elements
'**    Notes:    Requires setting a reference to Microsoft XML, v5.0
'*******************************************************************
Sub ApplyXMLMapping()

Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim mapP As XmlMap
Dim sSlideName As String
Dim strSection As String

    Application.ScreenUpdating = False

    RemoveAllXMLMappings ActiveSheet

    sSlideName = CleanString(ActiveSheet.Name)

    Set mapP = ActiveWorkbook.XmlMaps(sSlideName & "_Map")

    'cycling through all control row+column combinations
    'until we find a blank, which means controls stop.
    'Fully extensible and dynamic model. Can grow indefinitely
    Do
        i = i + 1
        If ActiveSheet.Cells(i, i) = "" Then
            iCntrlCols = i - 1
            Exit Do
        End If
    Loop

    'determining last cell in the sheet
    'we will always use row 1 so total count will always be=last row
    iRows = ActiveSheet.UsedRange.Rows.Count
    iCols = ActiveSheet.UsedRange.Columns.Count

    'resetting the counter to start outside the control column range
    For iC = 1 To iCntrlCols
        'reading in section name
        strSection = Cells(iC, iC)
        'looping through all rows in current control column
        For iR = (iCntrlCols + 1) To iRows
            'looping through all columns in current control row
            For i = (iCntrlCols + 1) To iCols
                'if both coordinates are not left blank AND
                'not description tags, we need to process for Xpath
                If Cells(iR, iC) <> "" And _
                    Cells(iC, i) <> "" And _
                    Left(Cells(iR, iC), 1) <> "[" And _
                    Left(Cells(iC, i), 1) <> "[" Then

                    If Not Cells(iR, i).Interior.ColorIndex = 1 And _
                        Not Cells(iR, i).Interior.ColorIndex = 15 And _
                        Not Cells(iR, i).Formula Like "=*" Then
                        'assembling key name,
                        'first adding root element
                        strXpath = "/ROOT/"
                        'adding slide (sheet) name
                        strXpath = strXpath & sSlideName & "/"
                        'adding subsection name
                        strXpath = strXpath & strSection & "/"
                        'assembling XMLKey linecode+"/"+column code
                        strXpath = strXpath & Cells(iR, iC) _
                            & "/" & Cells(iC, i)
                        'applying XPath
                        Cells(iR, i).XPath.SetValue mapP, strXpath
                    End If
                End If
            Next
        Next
    Next
End Sub

'************************************
'**    Sub:      RemoveAllXMLMappings
'**    Purpose:  Delete Old Mappings
'************************************
Sub RemoveAllXMLMappings(wks As Worksheet)

Dim rCell As Range

    For Each rCell In wks.UsedRange.Cells
        If rCell.XPath <> "" Then
            rCell.XPath.Clear
        End If
    Next
End Sub

Figure 3 shows the results of running our "ProcessSheet" macro. Mapped cells have a bold blue border, which you can hide if desired.

Sample Excel worksheet after running macro to generate XML maps (Click to see larger image)

Figure 3. Sample Excel worksheet after running macro to generate XML maps (Click to see larger image)

The spreadsheet is now fully capable of consuming or exporting XML data that conforms to our schema. There are several options for importing XML data, including many programmatic solutions. We plan to cover our solution in a future article.

Exploring the XML Map

On the Data menu in Excel, select XML and then XML Source to open the XML Source window, as shown in Figure 4.

XML map in sample worksheet

Figure 4. XML map in sample worksheet

Manually Importing and Exporting XML Data

On the Data menu in Excel, select XML and then Export to save mapped cell data as an XML document file. To refresh your spreadsheet from an XML data file, open the Data menu again, select XML and then Import, and browse for an XML document conforming to your XML map schema.

Collecting Metadata for Data Mapping

The following code provides a way to collect all data cell dimension elements in a spreadsheet. You could use this tool in various scenarios; for example, creating a database table to store your values and using the four-part dimension as a composite primary key or for data mapping. Providing the cell address helps with data validation.

'*********************************************************************
'**    Sub:      ExtractKeys
'**    Purpose:  Export all dimensions to a worksheet.
'**    Notes:    Before running, create a new worksheet named "Output"
'**              in the workbook containing this code.
'*********************************************************************
Sub ExtractKeys()

Dim iGridIndex As Integer
Dim strXpath As String
Dim iCntrlCols As Integer
Dim iRows As Integer
Dim iCols As Integer
Dim iR As Integer
Dim iC As Integer
Dim i As Integer
Dim iCounter As Integer
Dim sSlideName As String
Dim strSection As String
Dim wksOutP As Worksheet

    Application.ScreenUpdating = False

    Set wksOutP = ThisWorkbook.Worksheets("Output")
    wksOutP.Cells.Clear

    'cycle through all control row+column combinations
    'until we find a blank, which means controls stop.
    'Fully extensible and dynamic model.
    'Can grow indefinitely
    Do
        i = i + 1

        If ActiveSheet.Cells(i, i) = "" Then
            iCntrlCols = i - 1
            Exit Do
        End If
    Loop

    'determining last cell in the sheet
    'we always use row 1, so the total count will always be=last row
    iRows = ActiveSheet.UsedRange.Rows.Count
    iCols = ActiveSheet.UsedRange.Columns.Count

    sSlideName = CleanString(ActiveSheet.Name)

    wksOutP.Cells(1, 1) = "SlideName"
    wksOutP.Cells(1, 2) = "SectionName"
    wksOutP.Cells(1, 3) = "ColName"
    wksOutP.Cells(1, 4) = "RowName"
    wksOutP.Cells(1, 5) = "Address"

    wksOutP.Range(wksOutP.Cells(1, 1), _
        wksOutP.Cells(1, 9)).Font.Bold = True

    iCounter = 1

    'resetting the counter to start outside the control column range
    For iC = 1 To iCntrlCols
        'reading in section name
        strSection = Cells(iC, iC)

        For iR = (iCntrlCols + 1) To iRows
            'looping through all columns in current control row
            For i = (iCntrlCols + 1) To iCols
                'if both coordinates are not left blank AND 
                'not description tags then need to process for Xpath
                If Cells(iR, iC) <> "" And _
                    Cells(iC, i) <> "" And _
                    Left(Cells(iR, iC), 1) <> "[" And _
                    Left(Cells(iC, i), 1) <> "[" Then

                    If Not Cells(iR, i).Interior.ColorIndex = 1 And _
                        Not Cells(iR, i).Interior.ColorIndex = 15 And _
                        Not Cells(iR, i).Formula Like "=*" Then

                        iCounter = iCounter + 1
                        wksOutP.Cells(iCounter, 1) = sSlideName
                        wksOutP.Cells(iCounter, 2) = strSection
                        wksOutP.Cells(iCounter, 3) = Cells(iC, i)
                        wksOutP.Cells(iCounter, 4) = Cells(iR, iC)
                        wksOutP.Cells(iCounter, 5) = _
                            ActiveSheet.Range(Cells(iR, i), _
                            Cells(iR, i)).Address
                    End If
                End If
            Next
        Next
    Next
End Sub

Conclusion

The sample code in this article is generic and can be used for most scenarios requiring support of flexible XML mapping with little or no modification. Currently we use this code in a project called "The Rhythm of the Business 2.0," which supports a flexible and constantly changing Excel template used by more than 1,000 Microsoft employees to assess the entire company's performance on a monthly basis. In our project we also needed to harvest the unaltered data-point descriptions for use in a reporting application. We did this by adding meta tags, using square brackets for marking up the description positions. (This article does not include the code that retrieves the description values.) We encountered another reporting issue related to the Excel formulas. In a future article, we will describe how to take advantage of the grid structure to programmatically extract and store Excel formulas in a Microsoft SQL Server database and replicate the UI-side calculations using dynamic SQL.

Additional Resources

The following articles can help you develop more custom XML solutions using Excel: