From the January 2002 issue of MSDN Magazine
Object Graphs, XPath, String Comparisons, and More
Aaron Skonnard
Download the code for this article: XML0201.exe (110KB)
Browse the code for this article at Code Center: XPath and XSLT ISAPI Extensions


Q Can you explain to me the standard approach for serializing object graphs as XML without duplicating data that is referenced from multiple places?
Mike Beard

A Since the XML data model is based on a tree structure, it's not possible to model certain object graphs or primary/foreign key relationships without the assistance of out-of-band references. Both XML 1.0 and XML Schema provide ID and IDREF[S] types to assist with modeling these non-tree-based structures. The sample document in Figure 1 illustrates two person elements that refer to each other through the href attributes of their spouse child elements.
      There is plenty of support for this approach scattered throughout the layered APIs. For example, the DOM API provides a getElementById method, which returns the element whose ID-typed attribute has the given value, so you can use it to dereference ID/IDREF[S] values. This method requires the document to have a Document Type Definition (DTD) or schema that declares which attributes are of type ID. The following JScript® function uses getElementById to return the given person's corresponding spouse element node (assuming the XML document in Figure 1):
function getSpouse(personNode) {
  var s = personNode.selectSingleNode("spouse/@href");
  return personNode.ownerDocument.getElementById(s.text);
}
      XPath also provides a built-in id function for the same purpose. Like getElementById, it relies on the ID/IDREF[S] information found in the document's DTD or schema. It can be used directly within XPath expressions, as shown in this simple XSLT template that outputs person element information:
<xsl:template match="person">
  Name: <xsl:value-of select="name"/>
  Spouse: <xsl:value-of select="id(spouse/@href)/name"/>
</xsl:template>
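The same dereferencing step can be sketched outside MSXML. This Python example uses xml.dom.minidom and builds its own id-to-element index, on the stated assumption that the ID attribute is literally named id (a DTD- or schema-aware processor would learn that from the declarations instead). The sample document is hypothetical, since Figure 1 isn't reproduced here.

```python
import xml.dom.minidom

# Hypothetical sample in the spirit of Figure 1 (the figure itself isn't reproduced here).
PEOPLE_XML = """<people>
  <person id="p1"><name>Alice</name><spouse href="p2"/></person>
  <person id="p2"><name>Bob</name><spouse href="p1"/></person>
</people>"""


def build_id_index(doc):
    """Map each id attribute value to its element.

    Assumption: the ID-typed attribute is named "id"; a DTD- or schema-aware
    processor would take this from the declarations instead.
    """
    index = {}
    for element in doc.getElementsByTagName("*"):
        if element.hasAttribute("id"):
            index[element.getAttribute("id")] = element
    return index


def get_spouse(person, index):
    """Follow person/spouse/@href through the index, much like getElementById."""
    spouses = person.getElementsByTagName("spouse")
    if not spouses:
        return None
    return index.get(spouses[0].getAttribute("href"))


def name_of(person):
    return person.getElementsByTagName("name")[0].firstChild.data


doc = xml.dom.minidom.parseString(PEOPLE_XML)
index = build_id_index(doc)
print(name_of(get_spouse(index["p1"], index)))  # Bob
```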
      The SOAP 1.1 specification codifies this approach in its standard encoding scheme, which can serialize object graphs that are only expressible through references like these. The SOAP encoding uses id attributes to label shared values and href attributes (holding a fragment identifier such as #ref-1) to refer to them, much like the spouse example shown earlier. For example, consider the following method signature that compares two Person objects:
int Compare(Person p1, Person p2);
      If p1 and p2 reference the same object for a given invocation, the SOAP encoding for the request message would look like the code that's shown in Figure 2. See Section 5 of the SOAP 1.1 specification (http://www.w3.org/TR/SOAP) for more on the encoding details and a few additional examples.
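To make the multi-reference idea concrete, here is a deliberately simplified Python sketch, loosely modeled on SOAP 1.1 Section 5 rather than a conforming encoder. It counts how often each argument object appears, serializes shared objects once as independent elements carrying an id, and points each accessor at them with href="#id". The element names (Body, p1, p2, Person) and the ref-N id scheme are assumptions; the real encoding also involves the SOAP envelope, namespaces, and type information.

```python
import xml.etree.ElementTree as ET


class Person:
    def __init__(self, name):
        self.name = name


def encode_call(method, args):
    """Serialize a call so that an object passed more than once is written only once.

    Simplified sketch: a shared object becomes an independent element with an id
    attribute, and every accessor that refers to it gets href="#id".
    """
    uses = {}
    for arg in args:
        uses[id(arg)] = uses.get(id(arg), 0) + 1

    body = ET.Element("Body")
    call = ET.SubElement(body, method)
    ref_ids = {}   # id(obj) -> generated id attribute value
    shared = []    # multi-reference objects, in first-use order
    for position, arg in enumerate(args, start=1):
        accessor = ET.SubElement(call, f"p{position}")
        if uses[id(arg)] > 1:
            if id(arg) not in ref_ids:
                ref_ids[id(arg)] = f"ref-{len(ref_ids) + 1}"
                shared.append(arg)
            accessor.set("href", "#" + ref_ids[id(arg)])
        else:
            # Single-reference values are serialized inline.
            ET.SubElement(accessor, "name").text = arg.name

    # Each shared object is serialized exactly once, after the call element.
    for obj in shared:
        element = ET.SubElement(body, "Person", id=ref_ids[id(obj)])
        ET.SubElement(element, "name").text = obj.name
    return body


bob = Person("Bob")
print(ET.tostring(encode_call("Compare", [bob, bob]), encoding="unicode"))
```

When p1 and p2 are the same object, both accessors carry href="#ref-1" and the Person data appears only once; two distinct objects are simply serialized inline.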

Q Considering the XML Schema document in Figure 3, is there any way to write an XPath expression that identifies all the type attributes that contain values from the urn:www-develop-com:invoices namespace?

A The Namespaces in XML and Infoset specifications treat only element and attribute names as namespace-qualified. As a result, layered technologies such as DOM, SAX, XPath, and XSLT only provide namespace support for element and attribute names. XML Schema, however, defines a QName data type, which makes it possible to namespace-qualify simple text values.
      As an example, the type attributes shown in Figure 3 contain namespace-qualified text values. If the text value contains a prefix, the prefix is resolved against the set of in-scope namespace declarations. If the text value doesn't contain a prefix, it's associated with the in-scope default namespace, if any. If there isn't a default namespace in scope and the text value doesn't contain a prefix, it's not associated with any namespace, just like a text value that's not of type QName.
      In this particular example, there are four type attributes whose values come from the urn:www-develop-com:invoices namespace. The local element declarations for LineItem and LineItems are of type tns:LineItem and tns:LineItems, respectively, and tns is bound to urn:www-develop-com:invoices by a namespace declaration higher in the document. The local element declaration for Sku and the global element declaration for Invoice are of type Sku and Invoice, respectively (no prefix), and the in-scope default namespace is also urn:www-develop-com:invoices.
      Writing an XPath expression to identify the type attribute values that come from the urn:www-develop-com:invoices namespace is a little tricky. You might first be tempted to write an XPath expression that hardcodes the tns prefix as follows:
//@type[substring-before(.,':')='tns']
While this would identify the two type attributes whose values are prefixed with tns, it obviously wouldn't identify the other two type attributes whose values are scoped by the default namespace. It makes more sense to write an expression that doesn't care what prefixes (if any) are used, but rather what the prefixes actually represent—namely, the namespace URI.
      Accomplishing this requires the XPath namespace axis. The namespace axis contains a node for each namespace in scope on a given element node. (The namespace axis is empty unless the context node is an element.) Namespace nodes have the XPath properties listed in Figure 4.
      With the namespace axis, it's now possible to write an XPath expression that identifies the type attributes whose values are scoped by the namespace urn:www-develop-com:invoices, regardless of the prefix used (if any):
//@type[../namespace::*[local-name()=
   substring-before(../@type,':') and 
   .='urn:www-develop-com:invoices']]
This expression identifies all four type attributes, even those whose values are scoped by the default namespace urn:www-develop-com:invoices (see Figure 5).

Figure 5 Default Namespace

      Since using QNames in simple text values is becoming more common, MSXML 4.0 introduced a new extension function called namespace-uri, bound to the Microsoft® extension namespace urn:schemas-microsoft-com:xslt. This is the function signature:
string ms:namespace-uri(string)
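It accepts a string of type QName and returns the namespace URI that its prefix (or, if it has no prefix, the default namespace) maps to. This makes it possible to greatly simplify the previous expression, assuming the ms prefix is bound to urn:schemas-microsoft-com:xslt: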
//@type[ms:namespace-uri(.)='urn:www-develop-com:invoices']
This expression will only work with MSXML 4.0 today, but chances are good that something like it will find its way into XPath 2.0.
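The prefix-resolution rules described above can also be sketched outside XPath. The following Python code is a hypothetical helper, not MSXML's ms:namespace-uri: it walks up from the element carrying a QName-valued attribute, collects the in-scope xmlns declarations (inner ones win), and resolves the value's prefix, or the default namespace when there is no prefix. The sample schema is an assumption modeled on the description of Figure 3, since the figure isn't reproduced here.

```python
import xml.dom.minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical schema fragment modeled on the description of Figure 3.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="urn:www-develop-com:invoices"
    xmlns:tns="urn:www-develop-com:invoices"
    targetNamespace="urn:www-develop-com:invoices">
  <xsd:element name="Invoice" type="Invoice"/>
  <xsd:complexType name="Invoice">
    <xsd:sequence>
      <xsd:element name="LineItems" type="tns:LineItems"/>
      <xsd:element name="Total" type="xsd:double"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def in_scope_namespaces(element):
    """Return {prefix: uri} for the namespaces in scope on element ('' = default)."""
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    scopes = {"xml": XML_NS}
    for el in reversed(chain):  # outermost first, so inner declarations override
        for name, value in el.attributes.items():
            if name == "xmlns":
                scopes[""] = value  # xmlns="" undeclares the default namespace
            elif name.startswith("xmlns:"):
                scopes[name[len("xmlns:"):]] = value
    return scopes


def qname_namespace_uri(element, qname):
    """Resolve a QName-valued text string against element's in-scope namespaces."""
    scopes = in_scope_namespaces(element)
    if ":" in qname:
        prefix = qname.split(":", 1)[0]
        if prefix not in scopes:
            raise ValueError(f"undeclared prefix {prefix!r} in {qname!r}")
        return scopes[prefix]
    return scopes.get("", "")  # no prefix: default namespace, or no namespace at all


def type_attributes_in(doc, namespace_uri):
    return [el for el in doc.getElementsByTagName("*")
            if el.hasAttribute("type")
            and qname_namespace_uri(el, el.getAttribute("type")) == namespace_uri]


doc = xml.dom.minidom.parseString(SCHEMA)
matches = type_attributes_in(doc, "urn:www-develop-com:invoices")
print([el.getAttribute("type") for el in matches])  # ['Invoice', 'tns:LineItems']
```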

Q Is there any way for me to do a string comparison that will be case-insensitive using XPath?

A Since XML 1.0 is completely case-sensitive, so is XPath 1.0, and there are no built-in functions for doing case-insensitive string comparisons. Whenever XPath compares against a node-set, it tests the string-value of each node in the node-set in a case-sensitive manner.
      XPath does provide a bit of a back door, however, through the translate function. Translate allows you to replace certain characters in the supplied string with other characters. If you used translate to replace all lowercase characters with the corresponding uppercase characters (or vice versa), you could use it to perform case-insensitive comparisons, as shown here:
//LineItem[translate(Description,'abcdefghijklmnopqrstuvwxyz',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ') = translate($desc,
'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')]
The example just shown performs a case-insensitive comparison on each LineItem's child Description element and the supplied $desc variable.
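Here is a small Python sketch of what the translate-based comparison does. The xpath_translate helper is a hypothetical re-implementation of XPath 1.0's translate semantics (characters past the end of the replacement string are deleted, and only the first occurrence of a repeated character counts), used here to fold ASCII letters to uppercase. The last two lines show why the trick isn't sufficient for all languages: letters outside the a-z table are left alone, whereas Python's str.casefold handles them.

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def xpath_translate(s, from_chars, to_chars):
    """Mimic XPath 1.0 translate(): map each character of from_chars to the character
    at the same position in to_chars, deleting it if to_chars is shorter."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) not in table:  # only the first occurrence of a character counts
            table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


def equal_ignoring_ascii_case(a, b):
    """The translate-both-sides trick from the XPath expression above."""
    return xpath_translate(a, LOWER, UPPER) == xpath_translate(b, LOWER, UPPER)


print(equal_ignoring_ascii_case("Widget", "WIDGET"))  # True
print(equal_ignoring_ascii_case("café", "CAFÉ"))      # False: é isn't in the table
print("café".casefold() == "CAFÉ".casefold())         # True with Unicode case folding
```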
      As the XPath specification notes, this solution is not sufficient for case conversion in all languages, and it can also hurt performance. To provide an alternative, MSXML 4.0 introduced another XPath extension function called string-compare (also defined in the urn:schemas-microsoft-com:xslt namespace), which addresses these problems. The signature is shown here:
number ms:string-compare(string x, string y
                         [, string language [, string options]])
      The ms:string-compare function compares the strings x and y based on the language and options supplied. It returns 0 if the strings are equal, -1 if x < y, and 1 if x > y. The language string accepts a language identifier, such as en-US or fr-CA, that determines the sort order. By default, the comparison is case-sensitive, with lowercase characters coming first. The u option makes uppercase characters come first, while the i option makes the comparison case-insensitive. This makes it possible to rewrite the previous expression as follows:
//LineItem[ms:string-compare(Description,$desc,
    'en-US','i')=0]

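The return contract described above (-1, 0, or 1, with the i and u options) can be modeled in a few lines of Python. This is a hypothetical simplification, not MSXML's collation: it ignores the language argument entirely, compares case-insensitively first, and uses case only as a tie-breaker (lowercase first by default, uppercase first with u).

```python
def string_compare(x, y, options=""):
    """Three-way comparison returning -1, 0, or 1 (language-specific rules omitted)."""
    def key(s):
        primary = s.casefold()
        if "i" in options:
            return (primary,)  # case-insensitive: case never breaks ties
        upper_first = "u" in options
        # Tie-breaker for strings that differ only in case: with upper_first False,
        # uppercase characters get True and therefore sort after lowercase ones.
        tertiary = tuple(c.isupper() != upper_first for c in s)
        return (primary, tertiary)

    kx, ky = key(x), key(y)
    return (kx > ky) - (kx < ky)


print(string_compare("apple", "Apple"))       # -1: lowercase first by default
print(string_compare("apple", "Apple", "u"))  # 1: uppercase first
print(string_compare("apple", "APPLE", "i"))  # 0: case-insensitive
```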
Q How do you go about comparing date/time values in XPath?

A Unfortunately, XPath 1.0 has no date/time data type, and its relational operators (<, <=, >, >=) convert string operands to numbers, so XPath by itself can't reliably compare date/time values. Because of this, when serializing date/time values into an XML document, most developers choose a fixed-width string format whose lexicographic order matches chronological order, such as an ISO 8601 timestamp normalized to Coordinated Universal Time (UTC). Values in that format sort correctly as text, for example with xsl:sort.
      In addition to the XPath extension functions already mentioned, MSXML 4.0 also introduced three new functions for dealing with date/time string values. Each of these functions, described in Figure 6, takes an XML Schema date/time lexical representation and converts it into another representation that suits your needs.
      The UTC format is the most effective one for date/time sorting. The following XSLT snippet illustrates how to sort a set of Invoice elements based on a date/time value (the ms prefix must be bound to urn:schemas-microsoft-com:xslt):
<xsl:for-each select="//Invoice">
  <xsl:sort select="ms:utc(child::Date)"/>
  ... <!-- process sorted set here -->
</xsl:for-each>

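The idea behind UTC-based sorting can be sketched in Python. The to_utc_sortable helper is hypothetical and only similar in spirit to ms:utc (it doesn't necessarily produce the same string): it shifts each timestamp's offset to UTC and emits a fixed-width string, so plain text ordering becomes chronological ordering. It assumes every input carries an explicit offset or a trailing Z and ignores fractional seconds.

```python
from datetime import datetime, timezone


def to_utc_sortable(value):
    """Normalize an XML Schema dateTime with an offset to YYYY-MM-DDThh:mm:ssZ."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # older fromisoformat() versions reject 'Z'
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"no time zone offset in {value!r}")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


invoice_dates = [
    "2002-01-15T10:00:00+02:00",
    "2002-01-15T09:30:00Z",
    "2002-01-14T23:00:00-05:00",
]
print(sorted(invoice_dates))                       # raw strings: not chronological
print(sorted(invoice_dates, key=to_utc_sortable))  # chronological order
```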
Q Does Microsoft Internet Information Services (IIS) have any built-in support for XML, XPath, or XSLT?

A There currently aren't any built-in ISAPI extensions for XML, XPath, or XSLT, but some would be quite useful. I've included two sample ISAPI extensions, xptrext and xsltext, in the code download that provide simple XPath and XSLT functionality, respectively (see the link at the top of this article).
      If you associate xptrext with all .xml files in a given virtual root (via the IIS Application Configuration dialog), it then expects you to pass an XPath expression in the query string, as shown here:
http://localhost/somevroot/people.xml?//person[@id='p123']
This request then returns the subset identified by the XPath expression instead of the entire document.
      If you've associated xsltext with all .xsl/.xslt files in a given vroot, the extension then expects you to POST an XML document as part of the HTTP request. The extension loads the MSXML XSLT processor and transforms the posted document using the XSLT file specified in the URL.

Q Does MSXML support XInclude?
Joose Reede

A No, MSXML 4.0 doesn't currently support XInclude, a W3C specification still under development that defines a general inclusion mechanism. In the November 2000 XML Files, I provided an XInclude implementation in the form of a SAX filter, which can be used in combination with MSXML 4.0, as shown in Figure 7.
      It's also trivial to build a simple XInclude implementation (that just does XML inclusion, ignoring XML Base and the parse attribute) using XSLT (see Figure 8).
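For comparison, Python's standard library ships a basic XInclude processor in xml.etree.ElementInclude, which, like the simple XSLT version, performs href-based inclusion without XML Base support. The sketch below plugs in an in-memory loader so it runs without files; the document names and contents are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" so the sketch runs without touching the disk.
SOURCES = {
    "book.xml": """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Essentials</title>
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>""",
    "chapter1.xml": "<chapter>One</chapter>",
    "chapter2.xml": "<chapter>Two</chapter>",
}


def memory_loader(href, parse, encoding=None):
    """Loader callback: return an Element for parse="xml", a string for parse="text"."""
    text = SOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text


book = ET.fromstring(SOURCES["book.xml"])
ElementInclude.include(book, loader=memory_loader)  # replaces each xi:include in place
print(ET.tostring(book, encoding="unicode"))
```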

Q I expect the expression //Price[1] to return the first Price element in the document, but it doesn't. Why not?

A The reason that //Price[1] doesn't return the first Price element in the document has to do with the // abbreviation. This abbreviation actually expands to /descendant-or-self::node()/, which means the following expression is identical to //Price[1]:
/descendant-or-self::node()/child::Price[1]
      Once you expand // to the verbose form, it becomes more apparent that //Price[1] identifies each Price element that happens to be the first child of its parent node, not the absolute first Price element in the document, which can be identified by the following:
/descendant::Price[1]
      In general, this issue shows its ugly face whenever you use // in combination with position-based predicates.
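Python's xml.etree.ElementTree implements the same positional semantics in its limited XPath subset, which makes the difference easy to see. In this hypothetical invoice, .//Price[1] returns the first Price under each parent, while taking the first hit from iter("Price") mimics /descendant::Price[1].

```python
import xml.etree.ElementTree as ET

INVOICE = """<Invoice>
  <LineItem><Price>10</Price></LineItem>
  <LineItem><Price>20</Price><Price>25</Price></LineItem>
</Invoice>"""

root = ET.fromstring(INVOICE)

# Like //Price[1]: every Price that is the first Price child of its own parent.
first_of_each_parent = [p.text for p in root.findall(".//Price[1]")]

# Like /descendant::Price[1]: the single first Price in document order.
first_in_document = next(root.iter("Price")).text

print(first_of_each_parent)  # ['10', '20']
print(first_in_document)     # 10
```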

Q Why doesn't the following identify attribute nodes?
node()
A The answer to this question can again be found by understanding abbreviations. The relative expression
node()
is shorthand for
child::node()
Attribute nodes are not part of the child axis (or any of the other hierarchy-based axes like descendant, ancestor, following, preceding, and so forth). Attribute nodes are only identified through the attribute axis, which can be abbreviated to the @ symbol.

Q How do I find all nodes at the same level in the tree using XPath?

A Assuming that you want to identify all nodes at a given level and not just sibling nodes, you need to use the ancestor axis. The following XPath expression identifies all nodes that have three nodes above them, including the abstract root node:
/descendant::node()[count(ancestor::node()) = 3]
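The same level-based selection can be sketched as a tree walk in Python with xml.etree.ElementTree. This hypothetical helper only sees elements (ElementTree has no separate text, comment, or root nodes), so it counts the document element as having one ancestor to match XPath's abstract root node.

```python
import xml.etree.ElementTree as ET


def elements_with_ancestor_count(document_element, n):
    """Elements with exactly n ancestors, counting XPath's root node (so the document
    element has 1). Element-only analog of /descendant::*[count(ancestor::node()) = n]."""
    found = []
    if n < 1:
        return found  # only the root node itself has zero ancestors

    def walk(element, ancestors):
        if ancestors == n:
            found.append(element)
            return  # everything below this element has more than n ancestors
        for child in element:
            walk(child, ancestors + 1)

    walk(document_element, 1)
    return found


doc = ET.fromstring("<a><b><c/><c/></b><b><c/></b></a>")
print([e.tag for e in elements_with_ancestor_count(doc, 3)])  # ['c', 'c', 'c']
```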
Q I've noticed that there are two axes for finding siblings, preceding-sibling and following-sibling, but is there also a way to return all of a node's siblings?

A There isn't a single axis that returns all of a node's siblings, but you can use preceding-sibling and following-sibling together through the union operator, as shown here:
preceding-sibling::node() | following-sibling::node()
      Another way to identify all siblings is to ask for all children of the context node's parent:
../node()
Unlike the previous example, however, this also identifies the context node itself, not just its siblings.
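Outside XPath, the union approach can be sketched in Python with xml.etree.ElementTree. Because ElementTree elements don't know their parents, the hypothetical siblings helper first builds a child-to-parent map, then returns every element child of the parent except the node itself (elements only, unlike node()).

```python
import xml.etree.ElementTree as ET


def siblings(document_element, node):
    """Element analog of preceding-sibling::* | following-sibling::*."""
    parent_of = {child: parent for parent in document_element.iter() for child in parent}
    parent = parent_of.get(node)
    if parent is None:
        return []  # the document element has no element siblings here
    # Everything the parent contains except the node itself, in document order
    # (like ../* with the context node removed).
    return [child for child in parent if child is not node]


items = ET.fromstring("<list><item>a</item><item>b</item><item>c</item></list>")
print([e.text for e in siblings(items, items[1])])  # ['a', 'c']
```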

Q I want to build a DHTML-based menu with XSLT. Is that possible for me to do?
Rahul
India

A Absolutely. The main benefit of designing your navigation bars with an XML/XSLT solution is that you can reuse the technology in all future Web applications. Figure 9 shows a sample XML document that describes a menu's content, while Figure 10 shows the XSLT that generates the simple DHTML-based expand/collapse tree menu shown in Figure 11. Notice that the XSLT transformation shown in Figure 10 uses the older XSL Working Draft (WD-xsl) syntax, so it can be used with Microsoft Internet Explorer 4.0 or 5.0-based applications that don't have MSXML 3.0 available.

Figure 11 Tree Menu

Q In XPath, how do I express the position of the parent of a node? I guess it must be something like ../position(), but this doesn't work.
Gerd Mueller

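A The position function returns the one-based position of the context node within the current context, and in XPath 1.0 it cannot be used as a complete location step, as in your ../position() example. Asking for the parent's position along the parent axis wouldn't tell you much anyway, since a node has only one parent and the result would always be 1. If what you want is the parent's position among its own siblings, count the siblings that precede it: count(../preceding-sibling::*) + 1. You typically use the position function inside XPath predicates (such as parent::node()[position()=1]) or inside XSLT test expressions (like those used in if/choose statements).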

Q How do I control the encoding that is used when an XSLT program outputs a file?

A XSLT provides the xsl:output element for specifying how you would like the processor to serialize the output document. The xsl:output element is a top-level element, meaning it appears as a direct child of the xsl:transform (or xsl:stylesheet) element. The following transformation indicates that the processor should use UTF-16, omit the XML declaration, and pretty-print the output tree with indenting:
<xsl:transform version='1.0'
  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output encoding='utf-16' omit-xml-declaration='yes'
              indent='yes'/>
   ... <!-- rest of transformation goes here -->
</xsl:transform>

Q XML Schema makes it possible to specify that an element can occur between, say, two and five times in a content model through the minOccurs/maxOccurs attributes. Is this possible with DTDs?

A DTDs only support the following occurrence modifiers: *, +, and ?, which represent 0 or more, 1 or more, and 0 or 1, respectively. It is possible, however, to simulate XML Schema's minOccurs/maxOccurs attributes as shown here:
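<!ELEMENT foo (bar, bar, (bar, (bar, bar?)?)?)>
This element declaration specifies that the foo element contains between two and five child bar elements. The optional occurrences are nested rather than written as a flat list such as (bar, bar, bar?, bar?, bar?), because in the flat form a third bar could match any of the three optional particles. XML 1.0 requires content models to be deterministic (for compatibility with SGML), so such an ambiguous model is an error.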
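Writing these declarations by hand gets tedious for larger ranges, so here is a small, hypothetical Python generator for a minOccurs/maxOccurs pair. It nests the optional occurrences (for 2 to 5 it produces (bar, bar, (bar, (bar, bar?)?)?)), so each additional bar can match only one particle of the model.

```python
def dtd_content_model(name, min_occurs, max_occurs):
    """Build a deterministic DTD content model allowing min_occurs..max_occurs
    consecutive `name` children."""
    if min_occurs < 0 or max_occurs < max(min_occurs, 1):
        raise ValueError("need 0 <= min_occurs <= max_occurs and max_occurs >= 1")
    optional_tail = ""
    for _ in range(max_occurs - min_occurs):
        # Wrap from the inside out: bar?  ->  (bar, bar?)?  ->  (bar, (bar, bar?)?)?
        optional_tail = f"({name}, {optional_tail})?" if optional_tail else f"{name}?"
    particles = [name] * min_occurs + ([optional_tail] if optional_tail else [])
    return "(" + ", ".join(particles) + ")"


print(f"<!ELEMENT foo {dtd_content_model('bar', 2, 5)}>")
# <!ELEMENT foo (bar, bar, (bar, (bar, bar?)?)?)>
```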

Q Why is it that selectNodes doesn't work with sum(//Price)?

A The selectNodes and selectSingleNode methods available in MSXML only accept XPath expressions that return node-sets. The expression sum(//Price) returns a number, not a node-set, so it cannot be used with either of these methods. The Microsoft .NET XPath implementation, however, provides two XPathNavigator methods: Select, which returns a node-set, and Evaluate, which can also return numbers, strings, and Booleans.

Q How can I write an XPath expression that behaves like the SQL SELECT DISTINCT statement?

A The following expression identifies all Sku elements in the document, keeping only the first Sku with each distinct string-value:
/descendant::Sku[not(preceding::Sku = .)]
Notice the use of the not function instead of the != operator. The expression preceding::Sku != . does not return the same result, because a != comparison against a node-set is true if at least one preceding Sku has a different string-value. Using the not function in combination with = returns only those Skus for which no preceding Sku has the same string-value.
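Outside XPath, the same first-occurrence-wins rule is usually written with a set of values already seen. This Python sketch (hypothetical helper name, xml.etree.ElementTree input) keeps the first element for each string-value in document order; the set keeps it linear, whereas a straightforward evaluation of the preceding-axis expression re-examines earlier Skus for every Sku.

```python
import xml.etree.ElementTree as ET


def distinct_by_string_value(elements):
    """Keep the first element for each distinct string-value, in document order."""
    seen = set()
    result = []
    for element in elements:
        value = "".join(element.itertext())  # XPath string-value: all descendant text
        if value not in seen:
            seen.add(value)
            result.append(element)
    return result


invoice = ET.fromstring(
    "<Invoice><Sku>A1</Sku><Sku>B2</Sku><Sku>A1</Sku><Sku>C3</Sku></Invoice>")
print([e.text for e in distinct_by_string_value(invoice.iter("Sku"))])  # ['A1', 'B2', 'C3']
```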

Q Does MSXML 3.0 provide schema support? Are there any other tools in place for working with XML Schemas today?

A MSXML 3.0 does not support the XML Schema Recommendation (XSD), but it does support an XML Schema predecessor called XML-Data Reduced (XDR), which most Microsoft documentation refers to loosely as "schemas." The syntax is similar in many ways, but XDR is not nearly as powerful as XSD. The recently released MSXML 4.0 introduced support for XSD and provides several extensions to its DOM, SAX, and XPath implementations for reflecting against XSD information at runtime. See my December 2001 column for more details.
      The Microsoft SOAP Toolkit 2.0 also supports XSD in combination with the Web Services Description Language (WSDL). The SOAP Toolkit uses WSDL to expose the types in COM type libraries as well as to dynamically build client-side proxies that can transparently call the exposed COM components via SOAP.
      Likewise, .NET supports XSD (in System.Xml) and WSDL (in System.Web.Services) for building transparent Web services, and it also offers many additional XSD-related tools and utilities. One such tool worth investigating is xsd.exe, which can generate C# or Visual Basic® .NET class definitions from XSD; it can also reflect against an assembly and generate the appropriate schema for the contained types. With this predefined type-system mapping in place, you can automatically map between .NET objects and XML documents without touching the raw XML APIs (see System.Xml.Serialization.XmlSerializer in the .NET Framework documentation for more details). Visual Studio® .NET also comes with an XML Schema-driven XML editor that provides context-sensitive IntelliSense® (see Figure 12).

Figure 12 Visual Studio .NET XML Editor with IntelliSense

Q Is it possible to write an XSLT program that transforms multiple input documents into a single output document? Is it also possible to write an XSLT program that can transform a single input document into multiple output documents?

A Yes. XSLT provides the document function for processing multiple input documents during a transformation. The document function takes a URI that identifies the desired document resource. The document function may be used in combination with for-each or apply-templates, as shown here:
<xsl:transform 
 xmlns:xsl='http://www.w3.org/1999/XSL/Transform'   
 version='1.0'>
  <xsl:template match='/'>
    <output>
      <xsl:apply-templates select='/person/name'/>
      <xsl:apply-templates 
         select='document("person2.xml")/person/name'/>
    </output>
  </xsl:template>
  ... <!-- other templates omitted -->
</xsl:transform>
      Transforming the input document into multiple output documents isn't currently supported by the XSLT 1.0 specification, although it's on the list for version 2.0. However, several XSLT 1.0 processors provide extension mechanisms for multiple output documents today. For example, Saxon provides the saxon:output element, while Apache's Xalan provides the xalan:write element.
      MSXML 3.0 doesn't currently provide this built-in functionality, but since it does allow for user-defined script/extensions (via msxml:script), you could easily write your own mechanism (see Figure 13). This transformation creates a separate output document for each chapter element in the book document.
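The same one-document-per-chapter idea can be sketched in Python with xml.etree.ElementTree. The helper below is hypothetical and is not Figure 13's script: it assumes chapters are direct children of book and returns the outputs as a dictionary keyed by made-up file names, leaving the actual file writing to the caller.

```python
import xml.etree.ElementTree as ET


def split_chapters(book_xml):
    """Return {file_name: xml_text} with one output document per chapter element."""
    book = ET.fromstring(book_xml)
    outputs = {}
    for number, chapter in enumerate(book.findall("chapter"), start=1):
        chapter.tail = None  # tostring() would otherwise include trailing text
        outputs[f"chapter{number}.xml"] = ET.tostring(chapter, encoding="unicode")
    return outputs


BOOK = ("<book><chapter><title>One</title></chapter>"
        "<chapter><title>Two</title></chapter></book>")
for name, text in split_chapters(BOOK).items():
    print(name, text)  # e.g. open(name, "w").write(text) to create the files
```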

Q How do you output the HTML code for a non-breaking space (&nbsp;) from XSLT?
Evgeny Grischenko

A When the output method for the stylesheet is HTML, you can simply use the character reference for non-breaking space (&#160;). You can make this more readable by declaring a general internal entity that maps to "&#160;" (see Figure 14).
      Other processors serialize that character as &nbsp; when the output method is HTML, but MSXML 3.0 and 4.0 don't currently translate "&#160;" to "&nbsp;" in their implementations of the HTML output method. However, the following mechanism, which turns off output escaping, works with any processor that supports disable-output-escaping (even if the output method is not HTML):
<xsl:text 
  disable-output-escaping="yes">&amp;nbsp;</xsl:text>
Q What's the easiest way to write a transform that does nothing more than reorder a group of elements?

A The easiest way to do this is to use the identity transformation in conjunction with xsl:sort. The identity transformation copies all nodes to the output tree without changes:
<!-- identity transformation -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>
      I used the identity transformation in the XSLT-based XInclude implementation shown earlier. With that template in place, all you need to do is add an xsl:sort child to the original xsl:apply-templates call:
<xsl:template match="/">
  <people>
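    <xsl:apply-templates select="people/person">
      <xsl:sort select="age" data-type="number"/>
    </xsl:apply-templates>
  </people>
</xsl:template>
This example assumes the input's document element is people. It sorts all of the person elements numerically by the value of their child age element and then copies them to the output tree. Without data-type="number", xsl:sort compares the ages as text, so 10 would sort before 9.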

Q Is there a way to retrieve the nodes in my DOMDocument in a sorted order using the selectNodes method?

A XPath doesn't provide a mechanism for sorting node-sets—this is the reason that XSLT provides the xsl:sort element (discussed in the previous question). However, you could always write your own sorting algorithm using the DOM API:
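Set nl = doc.selectNodes("//person")
' sortNodeList is your own routine (MSXML doesn't provide one); for example,
' it could copy the nodes into an array and sort it by each node's name child
sortNodeList nl, "name"
' use sorted nl here
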
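For illustration, here is what such a helper might look like in Python with xml.etree.ElementTree. sort_by_child is a hypothetical stand-in for sortNodeList; its numeric flag plays the role of xsl:sort's data-type="number", because text sorting puts "10" before "9". In this sketch, missing or non-numeric values sort last.

```python
import xml.etree.ElementTree as ET


def sort_by_child(elements, child_name, numeric=False):
    """Return the elements ordered by the text of their child_name child."""
    def key(element):
        text = (element.findtext(child_name) or "").strip()
        if not numeric:
            return (0, text)
        try:
            return (0, float(text))
        except ValueError:
            return (1, 0.0)  # non-numeric or missing values go to the end
    return sorted(elements, key=key)


people = ET.fromstring(
    "<people>"
    "<person><name>Zoe</name><age>9</age></person>"
    "<person><name>Al</name><age>10</age></person>"
    "</people>")

print([p.findtext("name") for p in sort_by_child(people, "name")])              # ['Al', 'Zoe']
print([p.findtext("age") for p in sort_by_child(people, "age")])                # ['10', '9']
print([p.findtext("age") for p in sort_by_child(people, "age", numeric=True)])  # ['9', '10']

people[:] = sort_by_child(people, "age", numeric=True)  # reorder the children in place
```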
Q Can SAX be used to generate XML documents?

A SAX can be used to both generate and consume XML documents. The SAX ContentHandler interface models the core Infoset. Applications that need to consume documents will implement ContentHandler, while applications that need to generate documents will consume (call into) ContentHandler. In many situations the documents may never even be serialized as XML 1.0.
      However, if you do want to use SAX to generate XML 1.0 document streams, you need a ContentHandler implementation that serializes the events it receives. I provided such an implementation in my November 2000 column in a class called CSerializer. MSXML 3.0 introduced a similar helper class called MXXMLWriter (also available in MSXML 4.0) that can be used as follows:
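Dim mx As New MXXMLWriter
Dim ch As IVBSAXContentHandler
Dim atts As New SAXAttributes
Set ch = mx

ch.startDocument
ch.startElement "urn:person", "person", "p:person", atts
' ... additional events go here
ch.endElement "urn:person", "person", "p:person"
ch.endDocument
Debug.Print mx.output   ' the serialized XML text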
      There is also a related implementation called MXHTMLWriter that can be used to output HTML streams. This implementation makes the appropriate adjustments for HTML document serialization, much like the XSLT HTML output method does.
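Python's standard library has a comparable serializing ContentHandler, xml.sax.saxutils.XMLGenerator. The following sketch is a Python analog, not MXXMLWriter itself: it drives the generator with namespace-aware SAX events, and the person element, urn:person namespace, and id attribute are sample values.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl

out = io.StringIO()
writer = XMLGenerator(out, encoding="utf-8")  # a ContentHandler that serializes
person = ("urn:person", "person")             # (namespace URI, local name)

writer.startDocument()
writer.startPrefixMapping("p", "urn:person")  # emitted as xmlns:p on the next start tag
writer.startElementNS(person, "p:person",
                      AttributesNSImpl({(None, "id"): "p123"}, {(None, "id"): "id"}))
writer.characters("Aaron & friends")          # markup characters are escaped for you
writer.endElementNS(person, "p:person")
writer.endPrefixMapping("p")
writer.endDocument()

print(out.getvalue())
```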

Q I've been playing with XML in .NET and noticed that it doesn't provide a SAX implementation. Was this intentional?

A Yes. XML in .NET provides a streaming API, XmlReader, that offers the same performance benefits as SAX but is easier to use. SAX ContentHandler offers a push model interface, while .NET's XmlReader offers a pull model interface (much like a forward-only database cursor). For 9 out of 10 common XML programming tasks, the pull model design turns out to be simpler and more intuitive, especially for developers used to working with common database APIs.
      For more details on the differences between SAX and XmlReader, see XML in .NET: .NET Framework XML Classes and C# Offer Simple, Scalable Data Manipulation and the September 2001 installment of the XML Files.
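The push/pull difference is easy to see side by side. This Python sketch is an analog, not .NET code: xml.sax plays the push role, and xml.etree.ElementTree.iterparse stands in for a pull-style reader. Both total the Price elements of a hypothetical invoice; the push version has to keep its state in handler fields, while the pull version keeps it in ordinary local variables.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

INVOICE = b"<Invoice><Price>10</Price><Price>20.5</Price></Invoice>"


class PriceTotaler(xml.sax.ContentHandler):
    """Push model: the parser calls these methods as it encounters each event."""

    def __init__(self):
        super().__init__()
        self.total = 0.0
        self._in_price = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "Price":
            self._in_price, self._text = True, []

    def characters(self, content):
        if self._in_price:
            self._text.append(content)

    def endElement(self, name):
        if name == "Price":
            self.total += float("".join(self._text))
            self._in_price = False


def total_push(data):
    handler = PriceTotaler()
    xml.sax.parseString(data, handler)
    return handler.total


def total_pull(data):
    """Pull model: the loop asks for the next event, like reading from a cursor."""
    total = 0.0
    for _event, element in ET.iterparse(io.BytesIO(data), events=("end",)):
        if element.tag == "Price":
            total += float(element.text)
    return total


print(total_push(INVOICE), total_pull(INVOICE))
```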

Q Why is it that sometimes after successfully calling load, calls to selectNodes return nothing when I'm sure that the XPath expression is correct?

A This sounds like a side effect of asynchronous loading. By default, MSXML loads documents asynchronously, which helps when working with large or remote documents (such as those retrieved over HTTP), so load can return before the document has finished loading. You can override this behavior by setting the async property of the IXMLDOMDocument interface to false before calling load:
Dim doc as New DOMDocument30
doc.async = false
If (doc.load("http://www.develop.com/courses.xml")) Then
  ' when load returns, the document has finished loading
End If
      This is one of the most common problems that people encounter when they begin working with MSXML and it can manifest itself in a variety of ways. So unless you really want to load the document asynchronously, always set async to false before loading the document.

Send questions and comments for Aaron to xmlfiles@microsoft.com.
Aaron Skonnard is an instructor/researcher at DevelopMentor, where he develops the XML and Web service-related curriculum. Aaron coauthored Essential XML Quick Reference (Addison-Wesley, 2001) and Essential XML (Addison-Wesley, 2000). Get in touch with Aaron at http://staff.develop.com/aarons.
