Schemas: Tapping the Potential Within
August 16, 1999
In this article, I will explain how to put to work the benefits of schemas that I described in last month's article. I will show how to take advantage of the fact that schemas are extensible and open, and I will demonstrate how you can annotate schemas to further describe your data and then interpret those annotations when processing the data.
To extend the functionality of your schemas, annotate them with elements and attributes. This is useful if you want a more specific description of a node than a specific schema processor currently allows. For instance, stock prices are often quoted down to one one-thousandth of a penny. Therefore, you may have a price element that looks like this:
<price>41.18750</price>
This price element can be typed within the schema. However, the "fixed.14.4" data type usually used for currency will not be adequate, because it allows no more than 4 digits to the right of the decimal point. The best way to describe this value using the simple data types available in the MSXML parser is "float":
<ElementType name="price" dt:type="float" xmlns:dt="urn:schemas-microsoft-com:datatypes"/>
The "float" data type, however, isn't specific enough: the application using this data requires that a price have no more than 5 digits to the right of the decimal point, and "float" places no such limit on the value. To better describe the "price" element, we can extend the schema to include a "type" attribute from another namespace:
<ElementType name="price" ext-dt:type="currency" xmlns:ext-dt="urn:extensions:datatypes"/>
Within the application that processes the "price" element, we can now write code that understands what "currency" means, in this context, and validates the data accordingly.
Once you have annotated your schema with additional type information, it is useful to add comments explaining the meaning of the annotations. Because a schema is an XML document, you could add the standard XML comments. However, there is another mechanism for adding further meta-information: the "description" element. This can be placed in the schema anywhere that an element can be placed:
<ElementType name="price" ext-dt:type="currency" xmlns:ext-dt="urn:extensions:datatypes">
  <description>The currency type can have no more than 5 digits to the right of the decimal point.</description>
</ElementType>
There are two entry points into the schema: Loading the schema directly through the URL of the schema, and accessing the type declarations through the definition property.
Loading the schema directly gives you a tree, just as loading any XML file does. This frees you from the read-only restrictions placed on the schema when it is used to validate an XML node.
The "definition" property on a node returns the XMLDOMNode representation of the type declaration for the instance node. The XMLDOMNode is still part of the tree created when the schema was parsed; therefore, it is possible to navigate the schema from there using the object model. In addition, you can use the ownerDocument property on the XMLDOMNode returned by the definition property to get an XMLDOMDocument node representing the entire schema used to validate the instance node:
'PriceElem is set to the first price element
Set PriceElem = XMLDoc.selectSingleNode("//price")
'PriceType is set to the ElementType declaration describing the price element
Set PriceType = PriceElem.definition
'PriceSchema is set to the in-memory representation of the Schema
Set PriceSchema = PriceType.ownerDocument
As I mentioned before, when retrieving an in-memory representation of a schema in this fashion, the XMLDOMDocument node representing the schema will be read-only. To get a read-write representation, either add the documentElement of the node to an empty XMLDOMDocument:
or load the document using the load method:
The load method on the XMLDOMDocument object can take an XMLDOMDocument node as its XML source.
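The first approach, copying the document element into a fresh document, can be sketched outside MSXML. The following is a rough analogue using Python's standard xml.dom.minidom module rather than the MSXML object model. Note the assumptions: minidom has no notion of a read-only tree, the schema text is a shortened stand-in, and the names editable_copy, original, and working are hypothetical. The sketch only shows that the copy can be changed independently of the tree it was taken from.

```python
from xml.dom import minidom

# Shortened stand-in for the schema; not the full schema from the article.
SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:ext-dt="urn:extensions:datatypes">
  <ElementType name="price" ext-dt:type="currency"/>
</Schema>"""


def editable_copy(source_doc):
    """Return a new document holding a deep copy of source_doc's root element."""
    copy = minidom.Document()
    # importNode clones the element (and, with deep=True, its subtree)
    # so that the clone belongs to the new document.
    copy.appendChild(copy.importNode(source_doc.documentElement, True))
    return copy


original = minidom.parseString(SCHEMA)
working = editable_copy(original)

# Changing the copy leaves the original tree untouched.
working.getElementsByTagName("ElementType")[0].setAttribute("name", "cost")
```

After this runs, the ElementType in working is named "cost" while the one in original is still named "price".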
In the following example, I use the "definition" property to analyze the type declarations for each of the elements within an element describing a stock transaction. I then navigate the type declaration, searching for annotations that I can understand, and process the data accordingly. The goal of the application is to validate values, within the instance document, that are of the type "currency" within the "urn:extensions:datatypes" namespace.
I first load the following XML document into memory:
<ind:transaction xmlns:ind="x-schema:http://appServer/transSchema.xml">
  <ind:date>1998-10-13</ind:date>
  <ind:action>SELL</ind:action>
  <ind:quantity>30.0000</ind:quantity>
  <ind:symbol>HCRP</ind:symbol>
  <ind:description>HelloWorld Corp.</ind:description>
  <ind:price>41.18750</ind:price>
  <ind:amount>1235.62500</ind:amount>
  <ind:comm>19.95</ind:comm>
</ind:transaction>
At "http://appServer/transSchema.xml" is the following XML schema:
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:ext-dt="urn:extensions:datatypes">
  <ElementType name="date" dt:type="date"/>
  <ElementType name="action" dt:type="string"/>
  <ElementType name="quantity" dt:type="float"/>
  <ElementType name="symbol" dt:type="string"/>
  <ElementType name="description" dt:type="string"/>
  <ElementType name="price" dt:type="float" ext-dt:type="currency"/>
  <ElementType name="amount" dt:type="float" ext-dt:type="currency"/>
  <ElementType name="comm" dt:type="fixed.14.4"/>
  <ElementType name="transaction" content="eltOnly" model="open">
    <element type="date"/>
    <element type="action"/>
    <element type="quantity"/>
    <element type="symbol"/>
    <element type="description"/>
    <element type="price"/>
    <element type="amount"/>
    <element type="comm"/>
  </ElementType>
</Schema>
When I load the instance document, the MSXML parser validates the document against the part of the schema that it understands. As a result, the price and amount elements are both validated and typed as "float". The "ext-dt:type" attributes on both type declarations are ignored by the parser.
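Although the parser ignores them, the annotations are still ordinary namespaced attributes in the schema document, so any namespace-aware XML reader can recover them. As a rough illustration outside MSXML, the sketch below uses Python's standard xml.etree.ElementTree to collect every "ext-dt:type" value from a shortened copy of the schema. The function name extension_types and the shortened schema text are assumptions made for this example; this is not the MSXML approach used in the rest of the article.

```python
import xml.etree.ElementTree as ET

XML_DATA_NS = "urn:schemas-microsoft-com:xml-data"
EXT_NS = "urn:extensions:datatypes"

# Shortened stand-in for the schema at http://appServer/transSchema.xml.
SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:ext-dt="urn:extensions:datatypes">
  <ElementType name="price" dt:type="float" ext-dt:type="currency"/>
  <ElementType name="amount" dt:type="float" ext-dt:type="currency"/>
  <ElementType name="comm" dt:type="fixed.14.4"/>
</Schema>"""


def extension_types(schema_text):
    """Map each ElementType name to its ext-dt:type value, if it has one."""
    root = ET.fromstring(schema_text)
    found = {}
    # ElementTree spells qualified names as "{namespace-uri}localname".
    for decl in root.iter("{%s}ElementType" % XML_DATA_NS):
        ext_type = decl.get("{%s}type" % EXT_NS)
        if ext_type is not None:
            found[decl.get("name")] = ext_type
    return found


print(extension_types(SCHEMA))
```

The lookup is done by namespace URI, not by the "ext-dt" prefix, which mirrors the namespaceURI comparison an application would make against the schema tree.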
To make use of the "ext-dt:type" attributes, my application must access the information by navigating the schema. The first step is to locate the elements that are singled out by the annotations as being of the type "currency". This is done in the ValidateTree subroutine:
Private Sub ValidateTree(xmlElem As IXMLDOMElement)
    Dim xmlAtt As IXMLDOMAttribute
    Dim nextElem As IXMLDOMElement
    Dim I As Long
    Dim J As Long
    'Search attributes on the type declaration of the elem for attributes
    'in the "urn:extensions:datatypes" namespace
    For J = 0 To xmlElem.definition.Attributes.length - 1
        Set xmlAtt = xmlElem.definition.Attributes.Item(J)
        'If you find an attribute of the "urn:extensions:datatypes" namespace
        '(an ext-dt:type attribute), then pass the value of that attribute
        '(in this case, "currency") and the value of the element to the
        'ValidateElem Sub
        If xmlAtt.namespaceURI = "urn:extensions:datatypes" Then
            ValidateElem xmlElem.Text, xmlAtt.nodeValue
        End If
    Next
    'Now do the same with the element's children if there are any.
    'The tests are nested because And does not short-circuit in Visual Basic;
    'a nodeType of 1 means the first child is an element
    If xmlElem.hasChildNodes Then
        If xmlElem.childNodes.Item(0).nodeType = 1 Then
            For I = 0 To xmlElem.childNodes.length - 1
                Set nextElem = xmlElem.childNodes.Item(I)
                ValidateTree nextElem
            Next
        End If
    End If
End Sub
Once an element of an extension type has been found, the value of that element must be validated against the extension type. To do this, the extension type must first be determined. This is done in the ValidateElem subroutine:
Private Sub ValidateElem(str As String, valType As String)
    'Select the validation function that matches the type declared by the
    'ext-dt:type attribute
    Select Case valType
        Case "currency"
            ValCurrency str
        Case Else
            fail
    End Select
End Sub
Now that it has been determined that the element value is to be of type "currency", we can check to see whether the value of the instance node is of the correct type:
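Private Sub ValCurrency(valStr As String)
    Dim dotPos As Long
    Dim decStr As String
    'Take the digits to the right of the decimal point, if there is one
    dotPos = InStr(valStr, ".")
    If dotPos > 0 Then decStr = Mid(valStr, dotPos + 1)
    'Fail if the value is not a number or has more than 5 digits to the
    'right of the decimal point
    If Not IsNumeric(valStr) Or Len(decStr) > 5 Then
        fail
    End If
End Sub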
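The three routines above amount to a walk over the instance tree, a lookup of the extension type, and a dispatch to a type-specific check. The following is a minimal sketch of that same flow in Python's standard library, not a translation of the MSXML code. It makes several assumptions that are not in the original: the annotations are hard-coded in a dictionary instead of being read through the "definition" property, a currency value is taken to be plain digits with at most five digits after the decimal point, failures are collected in a list rather than passed to a fail routine, and all function and variable names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rule: digits, optionally followed by "." and one to five more digits.
CURRENCY = re.compile(r"\d+(\.\d{1,5})?")


def is_currency(text):
    """Return True if text satisfies the assumed currency rule."""
    return CURRENCY.fullmatch(text.strip()) is not None


# One check per extension type name (the role played by Select Case).
VALIDATORS = {"currency": is_currency}

# Hard-coded for this sketch; a real application would read these
# annotations from the schema.
ANNOTATIONS = {"price": "currency", "amount": "currency"}


def local_name(tag):
    """Strip the "{namespace}" part ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def invalid_elements(root, annotations=ANNOTATIONS):
    """Return (name, text) pairs for values that fail their extension type."""
    failures = []
    for elem in root.iter():  # visits root and every descendant element
        ext_type = annotations.get(local_name(elem.tag))
        if ext_type is None:
            continue  # no annotation, nothing extra to validate
        check = VALIDATORS.get(ext_type)
        # An unknown extension type counts as a failure, like Case Else.
        if check is None or not check(elem.text or ""):
            failures.append((local_name(elem.tag), elem.text))
    return failures


# Instance data with one deliberately invalid amount (six decimal digits).
DOC = """<ind:transaction xmlns:ind="x-schema:http://appServer/transSchema.xml">
  <ind:price>41.18750</ind:price>
  <ind:amount>1235.625000</ind:amount>
</ind:transaction>"""

print(invalid_elements(ET.fromstring(DOC)))
```

With this input only the amount element is reported, because its value has six digits to the right of the decimal point. Keeping the checks in a dictionary means a new extension type can be supported by adding one entry, much as a new Case would be added to ValidateElem.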
The great thing about schemas is that they are just XML documents. This allows you to extend them as you would any XML grammar and access them as you would any other XML document. Through annotations, I can further describe my data so my processor can better understand it. Through the object model, I can get at the annotated schema to process the data accordingly.
Charlie Heinemann is a program manager for Microsoft's XML team. Coming from Texas, he knows how to think big.