Using the Excel 2003 Object Model to Add XML Data Integration
PH&V Information Services
Microsoft® Office Excel 2003
Microsoft Office System
Summary: Connecting workbooks to data sources so that spreadsheets can manipulate and update data is a common goal in Office-based solutions. Microsoft Office Excel 2003 makes this much easier and more sweeping in its power by adding more XML data integration capabilities. This article demonstrates using the Excel object model to take advantage of the XML support in Excel including namespaces, XML Lists, XML Maps, data binding and more. (11 printed pages)
The ability to retrieve data into the spreadsheet and use the results of the work completed with the spreadsheet to update a data source or an application is critical to making spreadsheets useful. Without the ability to integrate data from a variety of sources, spreadsheet users are required to copy data from data sources into the spreadsheet and add data from the spreadsheet back into whatever system needs the data.
Microsoft® Office Excel 2003 adds the ability to work with any XML document through enhanced XML data interoperability. With this improvement, there is one requirement that must be met for Excel to integrate into any application: the application must either produce or accept XML.
As an example of an XML document used in an application, the following XML document holds sales order information, including elements for customer information and repeated elements for product information:
<so id="A1024" type="Rush" > <Customer> <Name>Pat Smith</Name> </Customer> <Products> <Line> <ProductId>AX142</ProductId> <Quantity>12</Quantity> </Line> <Line> <ProductId>BZ739</ProductId> <Quantity>1</Quantity> </Line> </Products> </so>
Excel does not require that the XML document have a corresponding schema that describes the document's format. Excel generates a schema for any document that it imports. However, for the example shown in this article, a schema for this document is assumed to be in a file with the following path: c:\schemas\salesorder.xsd.
In Excel, the XmlMap object represents one or more schemas and their mapping to the spreadsheet. To load a new XML map, you must add a new schema to the XmlMaps collection. This code adds a new schema file to the Workbook object and returns a reference to the map:
Dim xmp As XmlMap Set xmp = _ Application.Workbooks(1).XmlMaps.Add("c:\schemas\salesorder.xsd")
The Add method accepts a second parameter allowing you to specify the root element to use in the spreadsheet. The root element parameter is essential when Excel cannot determine which element is the root element from among multiple top-level elements.
Once a map loads, you can refer to it by its position or by its name in the XmlMaps collection. Excel forms the initial name for the map by appending _Map to the name of the root element. You can retrieve the name of a map from the Name property of the XmlMap.
The root element in the sales order document loaded by the code in the previous example is called "so"; you can also find a reference to the schema from the XMLMaps collection with the following code. Note that the map name is case sensitive:
Set xmp = Application.Workbooks(1).XmlMaps("so_Map")
Tip It is a good practice to assign the map a name that you control immediately after loading the schema:
Dim xmp As XmlMap Set xmp = Application.Workbooks(1).XmlMaps.Add("c:\schemas\salesorder.xsd") xmp.Name = "SalesOrder"
To remove a schema and its related mappings from the Workbook, just use the Delete method of the XmlMap. The data in the spreadsheet's cells is not affected.
A schema may reference other schemas. If it does, the Add method of XmlMaps also adds all of those schemas into the map. You can access the schemas in a map through the Schemas collection for XmlMap. This code, for example, retrieves the XML for the first schema in the map (the so schema in the example):
Dim strSchemaXML As String strSchemaXML = Set xmp = Application.Workbooks(1).XmlMaps("SalesOrder").Schemas(1).XML
You map elements and attributes from the schema to cells in the spreadsheet with XPath statements that specify an element or attribute in the schema. Excel passes the XPath statement to the SetValue method of a Range object's XPath object, along with a reference to the map for the schema. This example, for example, maps cell A1 to the Name child of the Customer element under the so root element:
Dim cll As Range Dim xp As XPath Set xp = ActiveSheet.Range("A1").XPath xp.SetValue ActiveWorkbook.XmlMaps(1), "/so/Customer/Name"
Excel maps attributes in the same way, using the "at" symbol (@) in front of the attribute's name to indicate that the path refers to an attribute:
ActiveSheet.Range("A2").XPath.SetValue _ ActiveWorkbook.XmlMaps(1), "/so/@id"
You can retrieve a cell's mapping through the XPath object's Value property. This code displays the mapping established for cell A1:
Excel returns the error "The XPath is not valid because either the XPath syntax is incorrect or not supported by Excel" if you use a badly formed XPath statement. However, you also get this message if you violate any of the rules for Excel XPath statements described in the next section.
You can only map a cell to one element or attribute at a time. You can change mapping to a range at any time by calling the SetValue method and passing a new path. You can break the mapping between a range and an element by calling the XPath object's Clear method like this:
You can use predicates within your XPath statement as part of specifying which elements to map to the spreadsheet. Excel supports XPath limited to the following:
- The XPath is an absolute path to a simple-content element or attribute
Example: "/ns1:root/ns1:row/ns1:column1" is supported if 'column1' is a child-most node, but not "/ns1:root/ns1:row" for the same document since 'row' is not a child.
- The XPath does not express axes, but uses the default child axes
Example: "/ns1:root/ns1:row" is supported but not "/ns1:root/child::ns1:row
- An optional filter can be expressed at the end of the xpath
Example: "/ns1:root/ns1:row/ns1:column1[@foo='abc']" is supported but not "/ns1:root/ns1:row[@foo='abc']/ns1:column1"
- The filter can only contain a single expression comparing a named attribute to a specific value
- Filters are only supported on XPaths that resolve to a simple-content element (not attributes)
- The named attribute must be defined as an attribute of the simple-content element
- The attribute name must be preceded by the short-hand (@) symbol representing the axes 'attribute'
Example: "/ns1:root/ns1:row/ns1:column1[@foo='abc']" is supported not "/ns1:root/ns1:row/ns1:column1[attribute::foo='abc']")
- An arbitrary amount of white-space can be embedded between filter tokens
Example: "/ns1:root/ns1:row/ns1:column1[ @ foo='abc']" is valid
You must associate the element or attributes in your XPath statement with the namespace used in the schema. The third parameter to the SetValue method accepts a namespace definition, using the same syntax as in an XML document. In the following example, the namespace http://phvis.com/exSalesOrder is associated with the elements in the document, using the prefix so. This code passes the map, an XPath statement with namespace prefixes, and a definition of the namespace to the SetValue method:
xp.SetValue xmp, "/so:so/so:Products/so:Line/so:Quantity", _ "xmlns:so='http://phvis.com/exSalesOrder'"
You can retrieve the namespace associated with a schema from the Schema object in Schemas collection for XmlMap. Rewriting the previous code to take advantage of this property gives this code:
xp.SetValue xmp, "/so:so/so:Products/so:Line/so:Quantity", _ "xmlns:so='" & xmp.Schemas(1).Namespace & "'"
Excel automatically creates a namespace and prefix for the schema when the schema is loaded. The prefix uses the format ns1, ns2, etc. The Workbook object's Namespaces collection lets you retrieve the prefix that Excel created for a namespace when loading the schema. This code builds the namespace definition using the Namespaces collection:
Dim strNSDefinition As String strNSDefinition = "xmlns: " & _ ActiveWorkbook.XmlNamespaces(1).Prefix & _ "=" & ActiveWorkbook.XmlNamespace(1).URI xp.SetValue xmp, "/so:so/so:Products/so:Line/so:Quantity", _ strNSDefinition
If you use the prefix for Excel, you don't need to provide the namespace/prefix definition when calling the SetValue method. This example uses the ns1 prefix used by Excel for the first schema loaded:
xp.SetValue xmp, "/ns1:so/ns1:Products/ns1:Line/ns1:Quantity"
In the sales order example, the ProductId and Quantity elements can repeat many times. Where an element repeats inside a schema, you can assign the element to a column of cells. All that's necessary is to pass True as the SetValue method's fourth parameter:
Dim xmp As XmlMap Dim cll As Range Dim xp As XPath Set xp = ActiveSheet.Range("B1 ").XPath xp.SetValue xmp, _ "/ns1:so/ns1:Products/ns1:Line/ns1:ProductId", , True Set xp = ActiveSheet.Range("C1").XPath xp.SetValue xmp, "/ns1:so/ns1:Products/ns1:Line/ns1:Quantity", , True
The result of mapping a repeating element to a range of cells is to create a list in Excel (see Figure 1). Automatically created XML-related lists appear as a single column with a heading at the top (based on the element or attribute name) and a cell for adding data at the bottom of the list. As you add data to the final cell, the list is extended automatically to the next row.
Figure 1. A list created in Excel through mapping an element to a column of cells
You should always map the lowest level of data in the document's hierarchy. Although you can map container elements (such as Products, in the example) without an error, the resulting map does not generate an XML document. As an example, in the Sales Order example, this means that the components that you can map within the document are:
- The ID and type attributes
- The Name, ProductId, and Quantity elements
Mapping the container elements (Customer, Products, and Line) generates the "XPath syntax is incorrect" message.
You can retrieve a reference to a list from the Worksheet object's ListObjects collection in order to configure the list.
This code, for example, retrieves a reference to the first list in the spreadsheet and assigns a name to the list:
Dim lst As ListObject Set lst = ActiveSheet.ListObjects(1) lst.Name = "ProductId"
You can use the ListObject object's name to retrieve it from the ListObjects collection (the names assigned by Excel are in the format list1, list2, etc.). This code retrieves the list using the name just assigned:
Set lst = ActiveSheet.ListObjects("ProductId")
From the ListObject object's Range property, you can retrieve the Range object for the list. From the Range object, you can access the list's XPath object, letting you clear or change the list's mapping. A list also has three other Range objects:
- HeaderRowRange: The cell at the top of the list that contains the header
- DataBodyRange: The range of the cells in the list that contain data
- InsertRowRange The cell at the bottom of the list in which you can add data. This range is only available when one of the cells in the list is the currently active cell.
Many elements and attributes don't have user-friendly names, so a typical application for a ListObject is to set the name at the top of the list:
lst.HeaderRowRange.Value = "Product Number"
You can turn a list back into a set of ordinary cells by calling its Unlink method.
The Active property of a ListObject is True when one of the cells in the list is the currently active cell. This code adds a new value to the bottom of the list:
If lst.Active Then lst.InsertRowRange.Value = "BZ734" End If
In addition to loading the schema that describes an XML document, you can also load the data from an XML document into an Excel spreadsheet. When you load a document that uses a currently loaded schema, the data in the document is inserted into the cells mapped to the document's elements.
Importing to an XMLMap
Loading an XML document into a mapped workbook is completed using the Import method of the XmlMap object. This code loads a sales order document into a workbook that already includes the "so" schema mapped to cells in the workbook:
ActiveWorkbook.XmlMaps("SalesOrder").Import _ "C:\Documents\SalesOrder.XML", False
Setting the Import method's second parameter to False prevents data from the imported document from overwriting data that is already in lists in the workbook. Instead, Excel appends the list data to the already existing data (Excel does not import the file if the only elements in the grid are non-repeating when the second parameter is set to False). You can also control whether Excel appends data when it loads by setting the XmlMap's AppendOnImport property to True.
The Import method returns an integer value indicating whether the import was a success. The three values for the returned value are:
- xlXmlImportSuccess: Import succeeded
- xlXmlImportElementsTruncated: Import succeeded but not all data fits into the worksheet
- xlXmlImportValidationFailed: Import failed because the document was not valid when compared to the map's schema
You can make the import process more interactive by retrieving the name of the map associated with the current cell. The Map property of a Range object's XPath object contains a reference to the map used with the cell. This code uses the Map object's Name property to determine which map to use, loads a document, and then checks the result returned:
Dim res As XlXmlImportResult res = ActiveWorkbook.XmlMaps(ActiveCell.XPath.Map.Name).Import( _ "C:\Documents\SalesOrder.XML", False) Select Case res Case xlXmlImportElementsTruncated MsgBox "Not all data fit into the spreadsheet. " Case xlXmlImportValidationFailed MsgBox "Document not valid. " End Select
Map objects also include an ImportXML method that imports a string of XML. This example imports a sales order document that consists of only the root element:
ActiveWorkbook.XmlMaps("SalesOrder").ImportXml _ "<so:so xmlns:so='http://phvis.com/xSalesOrder'/>", True
Databinding a Map
The XMLMap object's DataBinding property provides another way to load XML documents into the spreadsheet. The DataBinding object's LoadSettings method accepts a wide variety of inputs, including URLs or the path name, to a UDC file and a WSDL file describing a Web service. The only requirement is that the data source specified by the parameter must return an XML document. In this example, the data binding is set to a URL for an XML document:
ActiveWorkbook.XmlMaps("SalesOrder").DataBinding.LoadSettings _ "http://Customers/so.xml"
With the settings in place, you can load the XML data into the spreadsheet by calling the DataBinding object's Refresh method:
You can break the DataBinding object's connection to the data source by calling the object's ClearSettings method. You can also retrieve the URL for the data source by reading the DataBinding object's SourceURL property:
MsgBox "Breaking Connection to " & _ ActiveWorkbook.XmlMaps("SalesOrder").DataBinding.SourceURL ActiveWorkbook.XmlMaps("SalesOrder").DataBinding.ClearSettings
Importing from a Worksheet
Another way to import data is to use the Workbook object's XMLImport method, which imports an XML document and generates an XMLMap for the document. The XMLImport method accepts the name of an XML document as its first parameter. The second parameter must be a variable of the type XMLMap, which, after Excel imports the document, contains a reference to the XMLMap created by the import. Excel generates a schema for the document if the document does not contain a reference to a schema. The third parameter to the XMLImport method allows you to indicate whether to overwrite data already in the range (True) or to append the new data to the existing data (False). Finally, the fourth parameter is the range into which Excel should import the data. Excel creates lists for any repeating elements automatically.
The following code imports a sales order document into the range beginning at cell A1. Excel overwrites any existing data:
Dim xmp As XmlMap ActiveWorkbook.Import "c:\documents\MyBook.XML", xmp, True, _ ActiveWorkBook.Range("A1")
You can also load data using the Workbook object's XMLImportXML, which imports XML strings in the same way as the XMLMap object's ImportXML method imports XML strings.
You can retrieve data in cells that you assign to an element using the XMLDataQuery method. The XMLDataQuery method accepts an XPath statement that described a mapping to a cell and returns an Excel Range object. For example, this code retrieves the Excel Range object mapped to the Name element:
Dim rng As Range ActiveWorkbook.Worksheets(1).XmlDataQuery( _ "/ns1:so/ns1:Customer/ns1:Name").value
The XPath expression must match an XPath expression used to map a cell to an element or attribute. If you use a predicate as part of the mapping, you can omit the predicate if it was the last part of the XPath statement.
You can map an element only once. Therefore, it is important to determine whether you already mapped an element. The XMLMapQuery provides a way to test whether you already mapped an element to another element. If no Range object is returned by the XMLMapQuery method, there is no mapping for the element. This example demonstrates the technique by checking for mapping to the ProductQuantity element:
Dim rng As Range Set rng = ActiveWorkBook.WorkSheets(1). _ XmlMapQuery("/ns1:so/ns1:Products/ns1:Line/ns1: Quantity") If rng Is Nothing Then xp.SetValue xmp, _ "/ns1:so/ns1:Products/ns1:Line/ns1:Quantity", , True End If
An important part of using Excel to update and create XML documents is saving the document. You can create a document for any of the loaded schemas using the Workbook object's SaveAsXMLData method. You must pass the method a path name to the saved file and add a reference to the map whose elements you want to write out. This example creates a file called ASalesOrder.XML from the first map in the collection:
ActiveWorkbook.SaveAsXMLData "C:\Documents\ASalesOrder.XML", _ ActiveWordbook.XmlMaps(1)
You can also create documents from the XMLMap object. The XMLMap object includes both ExportXML and Export methods. The ExportXML saves the XML document into a string variable, and the Export exports the mapped data to file. Both methods return one of two values indicating whether the export was a success:
xlXmlExportSuccess: The export succeeded. xlXmlExportValidationFailed: The export did not succeed because the data in the spreadsheet is not valid to the schema.
Check the map's IsExportable property to ensure a successful export. For example, a map is not exportable if you do not supply required fields supplied or if you provide an invalid value for an element. These two examples demonstrate the two methods:
Dim strXML As String If ActiveWorkbook.XmlMaps(1).IsExportable Then ActiveWorkbook.XmlMaps(1).ExportXML strXML End If Dim xmp As XmlMap Set xmp = ActiveWorkbook.XmlMaps(1) If xmp.IsExportable Then ActiveWorkbook.XmlMaps(1).Export "c:\Documents\ASalesOrder.XML" End If
The Application and Workbook objects in Excel fire events before and after importing and exporting XML documents. To use these events, create global variables and assign them to the appropriate objects. This code does that as part of a class module's Initialize event:
Dim WithEvents exc As Application Dim WithEvents sheet As WorkSheet Private Sub Class_Initialize() Set sheet = ActiveSheet Set exc = Application End Sub
In some other event, such as the AutoOpen routine in Excel, you need to create the class module. With that completed, the code in the class module can catch the XML events.
Excel passes the "before" events (WorkbookBeforeXMLImport and WorkbookBeforeXMLExport) references to the workbook, the map to use, and a Boolean variable that, if set to True in the event's code, cancels the import or export. One parameter is different between the import and export events: Excel passes the export event the path name to the file; Excel passes the import event a Boolean indicating whether or not this import is occurring during a refresh of the workbook's data.
Excel passes the "after" events (WorkbookAfterXMLExport and WorkbookAfterXMLImport) references to the workbook and the map used in the export. Excel passes both events a result code indicating whether the export or import succeeded. Similar to the before events, the remaining parameter for these events is different for the import and export events: Excel passes The export event the path name of the file to create; Excel passes the import event a Boolean indicating whether or not the import should occur during a refresh of the workbook's data.
The Workbook object also fires events both before and after XML imports and exports. The parameters passed to these events are the same as the equivalent Application events, except that Excel passes no reference to the workbook. For example, Excel passes the AfterXMLExport event a reference to the Map object to use, the file to to create, and the result code for the export.
As an example, this code uses the Workbook object's before event to check that you are attempting to create a file in the directory,
c:\documents\ and, if not, cancels the export:
Private Sub sheet_BeforeXmlExport(ByVal Map As XmlMap, ByVal Url _ As String, Cancel As Boolean) If Instr(Url, "c:\documents\") = 0 Then Cancel = True MsgBox "Document must be saved to the C:\Documents directory. " End If End Sub
The latest version of Excel acquires a rich set of functionality through integration with XML. The object model for Excel provides tools to create applications that integrate Excel into XML-based workflows. By extending the base functionality of Excel with customized code, you can allow users to create and update XML documents using any XML vocabulary.
About the Author
Peter Vogel is a consultant on Office and .NET development. He is the author of Visual Basic Object and Component Handbook (Prentice Hall PTR, 2000, ISBN: 0-13023-0731) and the editor of the Smart Access newsletter.