Part 2: Mapping XML from SQL Server to a Single Cell in Excel 2003
Sergei Gundorov
Clark Hamilton
Microsoft Corporation
March 2005
Applies to:
Microsoft Office Editions 2003
Summary: In the second of this three-part series, learn how to map well-formed XML from a standard SQL Server query to a single cell in a worksheet. The resulting output is an XML tree with non-repeating elements. (20 printed pages)
Contents
Introduction
Underlying Problems That the Sample Code Addresses
SQL XML Output Requirements
Automating the Transformation Process
Exploring the XML Map
Manually Importing and Exporting XML Data
Conclusion
Additional Resources
Introduction
XML maps in Microsoft Office Excel 2003 provide one of the fastest ways to import data into a spreadsheet. The methodology works seamlessly for data files with repeating elements that take advantage of a list object. However, many scenarios call for a single-mapped cell or one-to-one XML element to cell mapping. Most XML sources don't provide the data in the non-repeating element, XML tree format that is necessary for single-cell mapping. Creating XSLT templates to transform a typical FOR XML RAW input into an XML tree can be complex and requires in-depth, specialized knowledge.
This article presents a straightforward and generic programmatic solution for consuming well-formed XML from a standard Microsoft SQL Server query output of SELECT ...FOR XML RAW (for example, <ROOT><row attr1="Value1-1" attr2="Value1-2".../><row attr1="Value2-1" attr2="Value2-2".../>...</ROOT>). The output is an XML tree with non-repeating elements, which is required for Excel single-mapped cells.
Underlying Problems That the Sample Code Addresses
Single-cell mapping in Excel has specific requirements:
- Elements must be non-repeating, or Excel mapping will immediately create a list.
- Data must be presented in an XML tree format when dealing with more than one dimension.
The data source side presents its own challenges:
- Data rarely comes in an XML tree format compatible with Excel single-cell mapping.
- Depending on the nature of the data, it can have a varying number of dimensions; also, dimension names change from data source to data source.
The latter is one of the main reasons why we decided to create an XML conversion component instead of using XSLT. We needed to transform varying data inputs, with different numbers of dimensions and dimension element names. We ruled out an XSLT plus FOR XML EXPLICIT approach because it did not provide the desired level of flexibility. The following code gives us the ability to use any single data value query result with the FOR XML RAW clause when we have non-repeating dimension value combinations.
Here are a couple of sample files with source input from the SQL XML SELECT ... FOR XML RAW query and its transformed output:
1. Four Dimensions
Source:
<?xml version="1.0" encoding="utf-8" ?><root>
<row SlideName="Client" SectionName="Revenue" RowName="Academic" ColumnName="ACT_CUR" DataValue="1000" />
<row SlideName="Client" SectionName="Revenue" RowName="Academic" ColumnName="ACT_YTD" DataValue="1000" />
<row SlideName="Client" SectionName="Revenue" RowName="Commercial" ColumnName="ACT_CUR" DataValue="1000" />
<row SlideName="Client" SectionName="Revenue" RowName="Commercial" ColumnName="ACT_YTD" DataValue="1000" />
</root>
Transformed Output Compatible with Excel Single-Cell Mapping:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<ROOT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Client>
<Revenue>
<Academic>
<ACT_CUR>1000</ACT_CUR>
<ACT_YTD>1000</ACT_YTD>
</Academic>
<Commercial>
<ACT_CUR>1000</ACT_CUR>
<ACT_YTD>1000</ACT_YTD>
</Commercial>
</Revenue>
</Client>
</ROOT>
2. Two Dimensions
Source:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<row SlideName="Client" CommentName="Comment1" DataValue="value" />
<row SlideName="Client" CommentName="Comment2" DataValue="value2" />
</root>
Transformed Output Compatible with Excel Single-Cell Mapping:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<ROOT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Client>
<Comment1>value</Comment1>
<Comment2>value2</Comment2>
</Client>
</ROOT>
SQL XML Output Requirements
The SQL XML FOR XML RAW query output presents an ideal XML format for programmatically converting source data from database tables into Excel single-mapped cells because it consistently supplies data in the standardized form (for example, the <row ../> element is guaranteed to be there). We can rely on the structure of the source XML document or XML file for our transformation steps. RAW XML is the first requirement of our sample code. The other requirements are:
- All dimension elements must be non-repeating. You can achieve this by adding the GROUP BY [dimensions list] clause to the query.
- Dimension attributes must be in hierarchical order from the highest dimension to the lowest. For example: SheetName (top-level dimension), TableName (next level down; the sheet may have one or more unique tables that can also appear on other sheets), RowName (next level down; the same RowName can appear once in other tables on the same or other sheets), ColumnName (next level down; the same ColumnName can appear once under the same RowName, but it can be present in other RowNames in the same table).
- You must include a data value attribute immediately following the attributes that contain the dimension element names. The data value attribute holds the value that maps to an Excel cell.
- Dimension attribute values cannot contain spaces and special characters because they will become XML elements. For example, <row SlideName="ServerTools" .../> is a valid attribute value because it can become an element name. However, <row SlideName="Server & Tools" .../> is not valid because it contains two spaces and an ampersand and therefore cannot become an element name. You should consult XML documentation for more details about element name restrictions.
The following sample query follows these requirements and produces the four-dimension sample XML shown earlier:
SELECT SlideName, SectionName,RowName,ColumnName,SUM(DataValue) as DataValue FROM [source table name]
WHERE [filter]
GROUP BY SlideName, SectionName,RowName,ColumnName
FOR XML RAW
The GROUP BY clause that includes all dimensions guarantees that the final output file will contain non-repeating elements.
Note The SQL Server query generates an XML fragment. Our code can consume this fragment without a header, or you can choose to add an XML header programmatically when retrieving the data to facilitate other manipulations.
Automating the Transformation Process
Our goal was to create a universal, reusable component to work with an XML Document object that holds an SQL XML FOR XML RAW query result that meets the requirements listed in the previous section. We reviewed many scenarios for importing data into an Excel workbook, and the most efficient scenario used one database call for each XML source with the same dimensionality (to minimize network traffic), and had one map in each sheet for each type of map (that is, a map with the same dimensions). It is important to note that, in our tests, one big XML map that included data for all sheets in the workbook performed slower than individual sheet-by-sheet maps (approximately 6,000 records per workbook with 15 worksheets). This is why we implemented the splitAttr variable in our code. We split the source file into multiple files based on this attribute. Typically you would want the splitAttr variable to be a sheet name or a reference to a sheet name. Excel XML maps will infer a schema from the XML tree non-repeating element source, so the only thing left to do is generate properly formatted XML input. The following sample code does just that.
Microsoft Visual Basic .NET Example
Option Explicit On
Imports System.Xml
Imports System.Windows.Forms
Namespace CustomXMLTransform
'****************************************************************
'Purpose: bypass complex XSLT transformations and have
'generic object to read in any SQL XML .. FOR XML RAW input with
'variable number of dimensions and non-repeating elements
'compatible with Excel single-cell mapping format.
'Output as an XMLDocument object and XML file(s)
'ready for Excel XMLMap use
'****************************************************************
'NOTE: setting up special class seems to be as fast as a hashtable
'and allows for easier code port to VB or VBA if .NET solution is
'not an option
'
'use Type for VB or VBA
'NOTE: reference MS XML V5 in VBA or VB project.
'Type xDOMCollection
' Name As String
' xDOM As New MSXML2.DOMDocument
'End Type
Friend Class xDOMCollection
Public Name As String
Public xDOM As New XmlDocument
End Class
'Error handling deliberately left out, so that using classes can
'write own handling
Public Class RawXMLToXMLTree
'pre-allocating 50 slots for xDOMCollection entries
Public Const FILE_COUNT As Integer = 50
Friend Shared xDOMOut(FILE_COUNT) As xDOMCollection
'NOTE: can add return type as XMLDocuments collection
'(for example, ArrayList)
Public Shared Function TransformXMLDom( _
ByVal xDOMIn As XmlDocument, _
ByVal outputPath As String, _
ByVal splitAttr As String, _
ByVal dataValueAttr As String)
Dim rowNodeList As System.Xml.XmlNodeList
Dim xmlKeyNode As System.Xml.XmlNode
Dim curNode As String
Dim prevNode As String
Dim curElement As String
Dim i As Integer
Dim ii As Integer
Dim xDOMIndex As Integer
'Construction of the container
'the upper boundary can be passed as a parameter
'or xDOMCollection can be rewritten as Hashtable
For i = 0 To FILE_COUNT
RawXMLToXMLTree.xDOMOut(i) = New xDOMCollection
Next
'selecting all rows
'since using FOR XML RAW <row../> element will be there by
'definition
rowNodeList = xDOMIn.SelectNodes("//row")
'processing every row in the source file
'and adding new node to the output file if the node is not
'found stop at DataValue attribute and add as value to the
'lowest node in the output file
For i = 0 To rowNodeList.Count - 1
'resetting curnode value
'Every output file will have DocumentElement ROOT
prevNode = "ROOT"
For ii = 0 To rowNodeList(i).Attributes.Count - 1
'determining if xDOM for this split attribute value
'already exists and setting collection index to point
'to correct DOM in collection allows for any sort
'order in the source raw XML and is faster than
'resetting rowNodeList to
'xDOMIn.SelectNodes("//row[@SlideName=[slidename]")
If rowNodeList(i).Attributes.Item(ii).Name = _
splitAttr Then
xDOMIndex = RawXMLToXMLTree.GetxDomIndex _
(rowNodeList(i).Attributes.Item(ii).Value)
End If
'constructing current node query string
If Not rowNodeList(i).Attributes(ii).Name = _
dataValueAttr Then
curElement = rowNodeList(i).Attributes(ii).Value
curNode = prevNode & "/" & rowNodeList(i). _
Attributes(ii).Value
'faster to use set here as opposed to
'selectSingleNode in the Else clause
xmlKeyNode = RawXMLToXMLTree.GetNode(xDOMIn, _
xDOMOut(xDOMIndex).xDOM, prevNode, curNode, _
curElement)
'if successfully added a node then setting
'prevNode to current node for next iteration
prevNode = curNode
Else
'adding data value to XMLKey node in the XML
'Tree output
xmlKeyNode.InnerText = rowNodeList(i). _
Attributes(ii).Value
'stop processing record after data value attribute
'Covers case when SQL source XML file has
'additional attributes for metadata
Exit For
End If
Next
Next
'processed file is saved off to multiple files
For i = 0 To FILE_COUNT
'exiting collection if encounter first blank entry
If xDOMOut(i).Name = "" Then Exit For
'saving off XML file compatible with Excel single
'cell mapping to the disk
'can be put in any container object for direct Excel XML
'Map import method as well
xDOMOut(i).xDOM.Save(outputPath & xDOMOut(i).Name & _
"_XLSMAP.xml")
'Here can add extension for loading XMLDocument
'to another object or
'Excel(XMLMap imorting)
'Sample line: [Workbook_Object].XmlMaps _
'("[Your_Map_Name]").ImportXml [XMLDocument Object].XML
Next
End Function
Shared Function GetNode( _
ByRef xDOMIn As XmlDocument, _
ByVal xDOMOut As XmlDocument, _
ByVal prevNode As String, _
ByVal curNode As String, _
ByVal curElement As String) _
As System.Xml.XmlNode
Dim activeNode As XmlNode
Dim outCurrentNode As XmlNode
Dim nodeOut As XmlNode
activeNode = xDOMOut.SelectSingleNode(curNode)
'this test provides maximum efficiency
'we need to add dimension elements and data value(s) on
'single pass through xDOMIn
If activeNode Is Nothing Then
GetNode = xDOMOut.SelectSingleNode(prevNode)
'in the output document always begin to add
'file split attribute (top level)
'at the ROOT level
nodeOut = xDOMOut.CreateNode(XmlNodeType.Element, _
curElement, "")
GetNode = GetNode.AppendChild(nodeOut)
End If
End Function
Shared Function GetxDomIndex(ByVal splitAttr As String) _
As Integer
Dim i As Integer
For i = 0 To FILE_COUNT
If RawXMLToXMLTree.xDOMOut(i).Name = splitAttr Then
Exit For
ElseIf RawXMLToXMLTree.xDOMOut(i).Name = "" Then
RawXMLToXMLTree.xDOMOut(i).Name = splitAttr
RawXMLToXMLTree.xDOMOut(i).xDOM.LoadXml( _
"<?xml version=""1.0"" encoding=""UTF-8""" & _
" standalone=""yes""?>" & _
"<ROOT xmlns:xsi=""" & _
"http://www.w3.org/2001/XMLSchema-instance"">" & _
"<" & splitAttr & "/>" & _
"</ROOT>")
Exit For
End If
Next
GetxDomIndex = i
End Function
End Class
'class used for testing
'exclude from project if plan to build a DLL
Public Class TestXMLTransformation
Public Shared Sub Main()
Dim startTime As DateTime
Dim endTime As DateTime
Dim xDOMIn As New XmlDocument
Dim sourceFileNameDlg As _
New System.Windows.Forms.OpenFileDialog
Dim result As DialogResult
Dim outputPath As String
Try
'setting up open file dialog
sourceFileNameDlg.Title = _
"Select source XML file to convert to XML Tree:"
sourceFileNameDlg.Filter = "XML files (*.xml)|*.xml"
result = sourceFileNameDlg.ShowDialog()
If result = DialogResult.OK Then
Dim fileName As String
fileName = sourceFileNameDlg.FileName
'output files will be written in the same
'folder as the source file.
'NOTE: make sure that the output file names
'do not overwrite source file!
outputPath = fileName.Substring(0, _
fileName.LastIndexOf("\") + 1)
xDOMIn.Load(fileName)
'reading in possible defaults for XML file processing
'processing first data row
'uniform file structure is essential for successful
'file processing
'Underlying assumptions:
'1)We are processing SQL XML ..FOR XML RAW output
'that by definition will have <row ../> element
'2)Dimensions are hierarchical and go from the highest
'to lowest
Dim firstDataRowNode As XmlNode
firstDataRowNode = xDOMIn.SelectSingleNode("//row")
'assuming first attribute to be used for the file
'split
Dim splitAttr As String
splitAttr = firstDataRowNode.Attributes(0).Name
splitAttr = InputBox("Input attribute name to " & _
"use for source file multi-file split." & _
vbCr & vbLf & vbCr & vbLf & _
"(NOTE: " & _
"1. Attribute names are case sensitive. " & _
"2. Input top level dimension attribute " & _
"name even if not splitting " & _
"into multiple files.)", _
"Split Attribute Name Input", splitAttr)
'assuming last attribute to be used for Data Value
Dim dataValueAttr As String
dataValueAttr = firstDataRowNode.Attributes( _
firstDataRowNode.Attributes.Count - 1).Name
dataValueAttr = InputBox("Input attribute name " & _
"for data value element." & _
vbCr & vbLf & vbCr & vbLf & _
"(NOTE: individual record " & _
"parsing stops after this " & _
"attribute and the value of " & _
"this attribute will be " & _
"associated with already " & _
"processed dimension elements " & _
"of the current row.)", _
"Data Value Attribute Name Input", _
dataValueAttr)
'measuring total transformation time
startTime = startTime.Now()
'NOTE: XML is case sensitive, so attribute
'names passed to the function
'must match source XML file attributes case
RawXMLToXMLTree.TransformXMLDom(xDOMIn, _
outputPath, splitAttr, dataValueAttr)
MsgBox("Time: " & endTime.Now(). _
Subtract(startTime). _
ToString(), _
MsgBoxStyle.Information, _
"Elapsed Time:")
End If
Catch e As Exception
MsgBox(e.Message, MsgBoxStyle.Exclamation, _
"Error in processing XML file:")
End Try
End Sub
End Class
End Namespace
C# Example
using System;
using System.Windows.Forms;
using System.Xml;
namespace CustomXMLTransform
{
internal class xDOMCollection
{
public string Name;
public XmlDocument xDOM = new XmlDocument();
}
public class RawXMLToXMLTree
{
public const int FILE_COUNT = 50;
internal static xDOMCollection[] xDOMOut =
new xDOMCollection[FILE_COUNT];
public static XmlNode xmlKeyNode;
public static void TransformXMLDom(
XmlDocument xDOMIn,
string outputPath,
string splitAttr,
string dataValueAttr)
{
XmlNodeList rowNodeList;
string curNode;
string prevNode;
string curElement;
int xDOMIndex=0;
for (int i = 0; i < FILE_COUNT; i++)
{
RawXMLToXMLTree.xDOMOut[i] = new xDOMCollection();
}
rowNodeList = xDOMIn.SelectNodes("//row");
for (int i = 0; i < rowNodeList.Count; i++)
{
//each output file will contain DocumentElement ROOT
prevNode = "ROOT";
for (int ii = 0;
ii < rowNodeList[i].Attributes.Count;
ii++)
{
if (rowNodeList[i].Attributes.Item(ii)
.Name==splitAttr)
{
xDOMIndex = RawXMLToXMLTree
.GetxDomIndex(rowNodeList[i]
.Attributes.Item(ii).Value);
}
if (rowNodeList[i].Attributes[ii]
.Name != dataValueAttr)
{
curElement = rowNodeList[i].Attributes[ii].Value;
curNode = prevNode + "/" + curElement;
RawXMLToXMLTree.xmlKeyNode =
RawXMLToXMLTree
.GetNode(ref xDOMIn,
xDOMOut[xDOMIndex].xDOM,
prevNode, curNode, curElement);
prevNode = curNode;
}
else
{
RawXMLToXMLTree.xmlKeyNode.InnerText =
rowNodeList[i].Attributes[ii].Value;
break;
}
}
}
for (int i = 0; i < FILE_COUNT; i++)
{
if (xDOMOut[i].Name == null)
{
break;
}
xDOMOut[i].xDOM.Save(outputPath + xDOMOut[i]
.Name + "_XLSMAP.xml");
}
}
static XmlNode GetNode(ref XmlDocument xDOMIn,
XmlDocument xDOMOut,
string prevNode,
string curNode,
string curElement)
{
XmlNode activeNode;
XmlNode nodeOut;
activeNode = xDOMOut.SelectSingleNode(curNode);
if (activeNode == null)
{
activeNode = xDOMOut.SelectSingleNode(prevNode);
nodeOut = xDOMOut.CreateNode(XmlNodeType.Element,
curElement, "");
activeNode = activeNode.AppendChild(nodeOut);
}
return activeNode;
}
static int GetxDomIndex(string splitAttr)
{
int i=0;
do
{
if (RawXMLToXMLTree.xDOMOut[i].Name == splitAttr)
{
return i;
}
else if (RawXMLToXMLTree.xDOMOut[i].Name == null)
{
RawXMLToXMLTree.xDOMOut[i].Name = splitAttr;
RawXMLToXMLTree.xDOMOut[i].xDOM.LoadXml(
"<?xml version=\"1.0\" encoding=\"UTF-8" +
"\" standalone=\"yes\"?>"+
"<ROOT xmlns:xsi="+
"\"http://www.w3.org/2001/XMLSchema-instance\">" +
"<" + splitAttr + "/>" + "</ROOT>");
return i;
}
i++;
} while(i < FILE_COUNT);
return i;
}
}
public class TestXMLTransformation
{
public static void Main()
{
DateTime startTime;
XmlDocument xDOMIn = new XmlDocument();
System.Windows.Forms.OpenFileDialog sourceFileNameDlg =
new System.Windows.Forms.OpenFileDialog();
DialogResult result;
string outputPath;
sourceFileNameDlg.Title =
"Select source XML file to convert to XML Tree:";
sourceFileNameDlg.Filter = "XML files (*.xml)|*.xml";
result = sourceFileNameDlg.ShowDialog();
if (result == DialogResult.OK)
{
string fileName;
fileName = sourceFileNameDlg.FileName;
outputPath = fileName.Substring(0,
fileName.LastIndexOf("\\") + 1);
xDOMIn.Load(fileName);
XmlNode firstDataRowNode;
firstDataRowNode = xDOMIn.SelectSingleNode("//row");
string splitAttr;
splitAttr = firstDataRowNode.Attributes[0].Name;
string dataValueAttr;
dataValueAttr = firstDataRowNode
.Attributes[firstDataRowNode.Attributes.Count - 1]
.Name;
result=MessageBox.Show("File Split Attribute: " +
splitAttr + "\n" +
"Data Value Attribute: " + dataValueAttr + "\n\n" +
"Continue?","XML File Attributes",
System.Windows.Forms.MessageBoxButtons.OKCancel,
System.Windows.Forms.MessageBoxIcon.Question,
System.Windows.Forms.MessageBoxDefaultButton.Button1,
System.Windows.Forms.MessageBoxOptions
.ServiceNotification);
if(result==DialogResult.Cancel) return;
startTime = DateTime.Now;
RawXMLToXMLTree.TransformXMLDom(
xDOMIn,
outputPath,
splitAttr,
dataValueAttr);
MessageBox.Show("Time: " +
DateTime.Now.Subtract(startTime).ToString(),
"Elapsed Time:",
System.Windows.Forms.MessageBoxButtons.OK,
System.Windows.Forms.MessageBoxIcon.Information,
System.Windows.Forms.MessageBoxDefaultButton.Button1,
System.Windows.Forms.MessageBoxOptions
.ServiceNotification);
}
}
}
}
C# Example (Additional Notes)
The C# sample code does not have code comments. The code comments for Visual Basic .NET classes and methods with the same names are applicable for the C# code. One exception: C# does not support InputBox, so the Main method in TestXMLTransformation does not implement it. However, methods in the class RawXMLToXMLTree will behave identically.
Exploring the XML Map
You can use the previous four-dimension source XML sample file to test the code. After you run the sample file, you will find the [split dimension attribute value]_XLSMAP.XML file in the same folder as the source file. (If you are using the four-dimension sample, the output file name will be Client_XLSMAP.XML.) To test, open the Data menu, select XML and then Import, and follow the prompts. If you select XML Source instead, you will see the XML Source window as shown in Figure 1. You can drag individual XML values anywhere on the spreadsheet. The list object for XML files with repeating elements will not give you the same flexibility.
Figure 1. Mapping the spreadsheet cells by using the XML Source window (Click to see larger image)
Now that you have mapped the spreadsheet cells, you can move data cells around, map and unmap elements, and import and export data out of the map in the same format as the [split dimension attribute value]_XLSMAP.XML file.
Manually Importing and Exporting XML Data
In Excel, open the Data menu, select XML and then Export to save mapped cell data as an XML document file. To refresh your spreadsheet from an XML data file, open the Data menu, select XML and then Import, and browse for an XML document conforming to your XML map schema.
Conclusion
The sample code in this article is generic and can be used for most scenarios requiring support of flexible data-transfer processes between SQL Server and an Excel spreadsheet. Currently we use this code in a project called "The Rhythm of the Business 2.0," which supports a flexible and constantly changing Excel template used by more than 1,000 Microsoft employees to assess the entire company's performance on a monthly basis. We process between 6,000 and 10,000 records on each data refresh on the client computer. The transformation process takes approximately 0.3–0.6 seconds. This time includes writing 15 XML output files to disk for data caching and depends on CPU speed (3GHz in our case). Our code is efficient and still very flexible.
Here are the key benefits of our approach: It supports any number of dimensions, and the sort order of the XML input is not important. In our previous article, Part 1: Automating the XML Data Mapping Process in Excel 2003, we described how to automate the process of single-cell XML mapping in the spreadsheet. Our next article provides a solution for harvesting calculated cell values that do not have XML mapping.
Additional Resources
The following content can help you develop more custom XML solutions using Excel:
- Part 1: Automating the XML Data Mapping Process in Excel 2003
- Microsoft Knowledge Base Article – 278872: How to Specify Parameters for an XML Template Query from Visual Basic
- COM Callable Wrapper in the Microsoft .NET Framework Developer's Guide
- Assembly Registration Tool (Regasm.exe) in the Microsoft .NET Framework SDK