This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

MSDN Magazine

Addressing Infosets with XPath

Aaron Skonnard

In my May 2000 column, I reviewed the set of W3C recommendations that most developers consider the current core of XML. The XML Information Set (Infoset) specification was probably new to most readers, including those already actively using XML. This lack of awareness mostly stems from the fact that XML was originally defined in terms of a concrete syntax, as opposed to an abstract data model. This makes it hard to approach XML without first thinking about angle brackets. The Infoset specification (https://www.w3.org/TR/xml-infoset) takes a few steps back and standardizes the abstract data model implied by the XML 1.0 and Namespaces recommendations.
      As most developers have experienced firsthand, thinking in terms of higher-level abstractions can greatly simplify working with data and building more complex systems. The XML space is no exception to this universal truth. Consider the following XML document that models car information:

  <?xml version='1.0'?>
  <?play jingle horn.wav?>
  <cars xmlns='urn:schemas-develop-com:cars'>
  <car make='BMW'>
  <model>323i</model>
  <year>2000</year>
  <price>26,990</price>
  </car>
  </cars>
  <!-- end of cars -->

 

      Working with this document structure without higher-level abstractions requires you (and the solution) to have intimate knowledge of the grammar productions in XML 1.0 and of Namespaces, something most developers prefer to avoid. This also makes it impossible to address the document structure using a simple and uniform syntax that is not tied to XML 1.0 serialization.
      The Infoset data model defines the tree-oriented nature of XML in terms of information items that have one or more named properties. The Infoset's syntax-free abstraction makes it possible to model well-formed XML documents in a way that's completely independent of the concrete syntax. With the Infoset's data model securely in place, it's now possible to create and use a uniform syntax for addressing document structures. The XPath 1.0 recommendation (https://www.w3.org/TR/xpath) defines such a syntax.

XPath 1.0

      The XPath recommendation defines the official syntax for addressing Infoset subsets. The Infoset data model implies hierarchical relationships (child, parent, descendant, ancestor, and so on) between the information items that make up a document. XPath uses these implied relationships along with other filtering constructs to assist in identifying portions of the document (for example, find all child car elements whose make='BMW'). Before XPath, the only similar mechanisms available to developers were a few Document Object Model (DOM) APIs (getElementsByTagName, childNodes, and so on) and ID/IDREF relationships, which are of limited utility in terms of generic document addressing.
      MSXML has supported an addressing syntax since the selectNodes and selectSingleNode methods were first introduced to the IXMLDOMNode interface with MSXML 2.0. The original syntax was referred to as XSL Patterns and was really just an initial implementation of the specification that led to XPath 1.0. MSXML 2.6 and above provide support for XSL Patterns as well as some support for XPath 1.0. MSXML 2.6 implements a significant portion of the XPath 1.0 specification, and the next version of MSXML is planned to be fully compliant.
      Today, the selectNodes and selectSingleNode methods are used for executing either an XSL Pattern or an XPath expression against a given node in the DOM tree. For backward compatibility, the default selection language is XSL Patterns. To specify XPath as the selection language, use the new setProperty method:

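  Dim doc As MSXML2.DOMDocument30
  Dim sel As MSXML2.IXMLDOMNodeList
  Dim i As Long

  Set doc = New MSXML2.DOMDocument30
  ' Select XPath instead of the default XSL Patterns
  doc.setProperty "SelectionLanguage", "XPath"
  ' (load the cars document into doc before selecting)
  Set sel = doc.selectNodes("/cars/car/model")
  For i = 0 To sel.length - 1
      UseThisNode sel.item(i)
  Next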

 

      After looking at this DOM example, it's important to point out that XPath has absolutely no ties to any API, or to any other language for that matter. It simply defines how to address Infoset subsets. XPath can be just as useful in Simple API for XML (SAX) environments or other high-level XML languages like XSL Transformations (XSLT).
      XPath's name reflects its use of a path-based syntax. If you've ever used a command prompt or typed a URL into your browser, you're already familiar with the basics of XPath expressions. The following is an example of a simple XPath expression that identifies all of the model elements that are children of car elements that are children of the root cars element.

  /cars/car/model
  

 

      This XPath expression defines the navigation process through the document's Infoset and identifies a specific set of nodes (commonly referred to as a node-set). However, because the Infoset still isn't finalized (it's currently a W3C working draft), XPath defined its own abstraction, known as the XPath tree model, for use throughout the language.
      The XPath tree model defines several node types, including root, element, attribute, text, comment, processing-instruction, and namespace nodes. Although it's pretty intuitive to see how these node types map back to the Infoset definitions, Appendix B of the XPath specification describes these mappings explicitly. One thing to note from this mapping is that XPath models a sequence of character information items as a single text node to simplify the addressing of element content.

Figure 1 XPath Tree Model

       Figure 1 shows the XPath tree model for the XML document shown earlier. If you apply the sample XPath expression to this tree model, the result is pretty clear: it identifies the model element.
      It's interesting to note that XPath, unlike most other XML technology layers (such as XSLT, XML Schemas, and so on), does not use XML as part of the syntax. Although the same type of functionality could have been achieved through an XML-based addressing vocabulary, a standard path-based syntax is already very intuitive for most developers and much more compact. This facilitates using XPath expressions in URI fragment identifiers and XML attribute values.

General Expressions

      The most basic construct in XPath is the expression. (See the Expr production in the XPath specification for details not covered here.) An XPath expression can contain location paths (like the one shown earlier), function calls, variable references, unions of sets, comparisons, mathematical operations, and so on. The supported expression constructs are quite complete for this level of addressing syntax.
      An XPath expression is evaluated to yield one of the following four object types: node-set (an unordered collection of nodes without duplicates), Boolean (true/false), number (floating point), and string (a sequence of UCS characters). For example, the following XPath expressions yield node-sets:

  /cars/car/model
  car/model | car/price | car/year

 

These expressions identify sets of nodes in the tree model. Notice that the second expression uses the | operator to form the union of three location path results, producing a new node-set.
      Here are some XPath expressions that return Boolean results:

  starts-with(model, '32')
  car/price < 27000

 

The first expression here consists of one simple function call that returns a Boolean, while the second expression contains a simple Boolean comparison.
      As mentioned, XPath expressions can contain mathematical operations and can yield numeric results. The following expressions yield numbers by calculating the total price of a car (including the price of all options), and the average price of all cars, respectively:

  car/price + car/options-total
  sum(car/price) div count(car/price)

 

      XPath expressions can also yield strings. For example, the following expression returns the concatenation of the make and model:

  concat(car/@make, car/model)
  

 

      In certain situations, it's necessary to coerce an object from one type to another. XPath defines how any of the four types can be coerced into a Boolean, number, or string. During expression evaluation, it's possible that objects will implicitly coerce from one type to another (this actually happened in several of the previous examples). For example, the following equality expressions rely on implicit coercions of the node-set to the appropriate type:

  car/model = '323i'
  car/price = 26990

 

The node-set identified by car/model must first be coerced to a string before comparison against the literal value ('323i') takes place. The node-set identified by car/price must likewise be converted to a number prior to the comparison. In both situations, the coercion happens implicitly as part of the evaluation process.
      XPath also provides a set of functions for defining explicit coercions within expressions. These functions are named boolean, number, and string. For example, you can rewrite the previous examples to use explicit coercions as follows:

  string(car/model) = '323i'
  number(car/price) = 26990

 

When the node-set contains a single node, as it does here, these behave exactly the same way as the implicit versions.
      It's fairly intuitive to see how coercion takes place between Boolean, number, and string types. What's not so intuitive, however, is how a node-set is coerced into one of the other three types. When coercing a node-set to a Boolean, the result is false if the node-set is empty, and true otherwise. When coercing a node-set to a number, the node-set is first coerced to a string, then the string is coerced to a number. When coercing a node-set to a string, the string value of the node that comes first in document order is returned; an empty node-set yields an empty string.
      The XPath specification defines how the string value is evaluated for each of the different node types. Evaluating the string value for attribute, text, comment, and processing-instruction nodes yields the value of each node type, as you would expect. The string value for the root/element nodes, on the other hand, yields the concatenation of all descendant text nodes.
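
      The node-set coercion rules just described can be sketched in a few lines of Python. This is only an illustration, not part of XPath or MSXML: the helper names (string_value, to_string, to_number, to_boolean) are made up for this example, a node-set is modeled as a plain list of ElementTree elements in document order, and the sample document is written without whitespace or a namespace declaration (and with a comma-free price) to keep the results easy to read.

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath number syntax: optional minus sign, digits with an optional fraction,
# surrounded by optional whitespace. Anything else converts to NaN.
_XPATH_NUMBER = re.compile(r"^[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*$")

def string_value(element):
    """String value of an element: the concatenation of its descendant text nodes."""
    return "".join(element.itertext())

def to_string(node_set):
    """Node-set -> string: string value of the first node, or '' if the set is empty."""
    return string_value(node_set[0]) if node_set else ""

def to_number(node_set):
    """Node-set -> number: convert to a string first, then parse that string."""
    text = to_string(node_set)
    return float(text) if _XPATH_NUMBER.match(text) else math.nan

def to_boolean(node_set):
    """Node-set -> Boolean: true exactly when the set is non-empty."""
    return len(node_set) > 0

# Hypothetical, simplified version of the cars document (assumption: no namespace).
cars = ET.fromstring(
    "<cars><car make='BMW'><model>323i</model>"
    "<year>2000</year><price>26990</price></car></cars>"
)

print(to_string(cars.findall("car/model")))    # 323i
print(to_number(cars.findall("car/price")))    # 26990.0
print(to_boolean(cars.findall("car/option")))  # False
print(string_value(cars))                      # 323i200026990
```

The last line shows the element rule at work: the string value of the cars element is simply every descendant text node run together.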
      Note that XPath expressions may contain variable references. A variable reference takes the form of $name and can exist anywhere within the simple expressions shown. For example, the following expression compares the model's value to a string contained within the variable m:

  car/model = $m
  

 

      The XPath specification does not address how variables are declared or defined. Different runtime environments that use XPath may make it possible to declare variable bindings. For example, XPath is used quite extensively in XSLT 1.0, which supports the notion of named parameters and variables. The following simple transform defines the parameter m and references it within an XPath expression that is part of an XSLT if statement:

  <xsl:transform
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version='1.0'>
    <xsl:param name='m'/>
    <xsl:template match='cars'>
      <xsl:if test='car/model = $m'>
        Yes, this is the model!
      </xsl:if>
    </xsl:template>
    ...
  </xsl:transform>

 

If XPath is used in other environments where variable bindings are also available, they can likewise be taken advantage of.
      You've seen examples of most XPath expression types (equality, mathematical, location path, and so on). The most important of all these expression types is the location path.

Location Paths

      A location path expression always yields a node-set. This type of expression consists of one or more location steps delimited with a forward slash (/). If a location path begins with a forward slash it's considered an absolute location path, otherwise it's a relative location path. For example, the following expression is an absolute location path consisting of three location steps:

  /cars/car/model
  

 

      In the case of absolute location paths, evaluation begins at the root of the tree model. A relative location path, however, is evaluated relative to the node where the expression was applied. As an example, applying the following relative location path to the cars node produces the same result as the previous absolute example:

  car/model   (relative to /cars)
  

 

      The following snippet illustrates how to execute the previous relative expression against a specific DOM element node (assuming cars is a reference to the cars element node) using MSXML:

  var sel = cars.selectNodes("car/model");
  for (var i = 0; i < sel.length; i++)
    UseThisNode(sel.item(i));
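
      For readers without MSXML at hand, the same relative selection can be tried with Python's standard xml.etree.ElementTree module, which implements only a small subset of XPath. This is a rough equivalent rather than a translation of the MSXML call, and it assumes a copy of the sample document with the namespace declaration left out so that the unprefixed names in the path match.

```python
import xml.etree.ElementTree as ET

# Sample document from this column, minus the namespace declaration (assumption).
cars = ET.fromstring(
    "<cars><car make='BMW'><model>323i</model>"
    "<year>2000</year><price>26,990</price></car></cars>"
)

# findall evaluates the path relative to the element it is called on,
# so "car/model" here is applied to the cars element.
models = cars.findall("car/model")
for model in models:
    print(model.tag, model.text)   # model 323i
```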

 

      As another example, when XPath is used within XSLT, relative expressions are always evaluated against the current context of the containing XSLT construct (template, for-each, and so on).

Location Steps

      Each location step in a location path consists of three parts: the axis identifier, the node test, and a set of predicates, as shown here:

  axis::node-test[predicate1][predicate2]...
  

 

The axis identifier and the node test are always separated by a double colon, and each predicate is contained within square brackets. The location path examples shown thus far have not contained an axis identifier or predicate expressions because they were using an abbreviated expression form (which I'll cover toward the end of this column).
      A location step defines how to identify a set of nodes relative to a context node. The axis identifies a group of nodes that have a specific hierarchical relationship to the context node. The defined axes include self, child, parent, descendant[-or-self], ancestor[-or-self], following[-sibling], preceding[-sibling], attribute, and namespace. (At this time MSXML doesn't support the namespace, following, and preceding axes.)
      The self, child, parent, descendant[-or-self], and ancestor[-or-self] axes simply identify the nodes with the specified relationship to the context node. The following axis identifies all nodes that come after the context node (in document order), excluding descendants. The preceding axis identifies all nodes that come before the context node (in document order), excluding ancestors. The -sibling variants of these two axes simply restrict the node-set to sibling nodes. The attribute and namespace axes identify the context node's attributes and namespace nodes, respectively. The self, descendant, ancestor, preceding, and following axes completely partition the document without any overlap (attribute and namespace nodes aside).
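
      The partitioning claim is easy to check with a small sketch. The Python below is an illustration only: element_axes is a made-up helper, it considers element nodes alone (ElementTree does not expose text, comment, or processing-instruction nodes as separate objects), and the three-car document is hypothetical.

```python
import xml.etree.ElementTree as ET

def element_axes(root, context):
    """Return the element nodes on five axes of `context` (elements only)."""
    order = list(root.iter())                          # all elements in document order
    parent = {child: p for p in order for child in p}  # child -> parent lookup

    ancestors = []
    node = context
    while node in parent:                              # climb until the document element
        node = parent[node]
        ancestors.append(node)

    descendants = list(context.iter())[1:]             # iter() yields context itself first
    index = order.index(context)
    preceding = [n for n in order[:index] if n not in ancestors]
    following = [n for n in order[index + 1:] if n not in descendants]
    return {
        "self": [context],
        "ancestor": ancestors,
        "descendant": descendants,
        "preceding": preceding,
        "following": following,
    }

# Hypothetical document with three cars; the second one is the context node.
cars = ET.fromstring(
    "<cars>"
    "<car id='a'><model/><year/></car>"
    "<car id='b'><model/><price/></car>"
    "<car id='c'><model/></car>"
    "</cars>"
)
context = cars.findall("car")[1]
axes = element_axes(cars, context)
for name, nodes in axes.items():
    print(name, [n.tag for n in nodes])

# Every element lands on exactly one of the five axes.
total = sum(len(nodes) for nodes in axes.values())
print(total == len(list(cars.iter())))                 # True
```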
      The axis identifier identifies the pool of nodes that the rest of the location step filters through. The next part of the location step, the node test, defines the first of these filters. The node test can filter nodes either by name or by type. When performing node tests by name, the node in question must not only have the specified name, but must also match the principal node type of the specified axis. The principal node type for all axes, except for attribute and namespace, is element node. The principal node types for the attribute and namespace axes are attribute and namespace, respectively.
      For example, the following location step identifies all child element nodes (of the context node) that have a name of cars:

  child::cars
  

 

Node tests by name can also take qualified names (QNames), as shown here:

  descendant::c:cars
  

 

This location step identifies all descendants (of the context node) that have a qualified name matching c:cars (where c is the namespace prefix).
      The wildcard character can also be used to match nodes of the axis's principal node type regardless of name. The following example selects all attribute nodes of the context node.

  attribute::*
  

 

      Node tests can also be performed by type. XPath defines four type tests: text(), comment(), processing-instruction(), and node(). For example, the following location step selects all descendant text nodes (of the context node):

  descendant::text()
  

 

The node() test can be used to identify all nodes within the specified axis regardless of type:

  child::node()
  

 

This location step identifies all of the context node's child nodes regardless of the type (although this would not return attribute nodes since they're not technically considered child nodes).
      As you can see, the axis identifies the initial pool of nodes. The node test filters the initial node-set by either name or type. For further filtering, predicates are required. A predicate expression always evaluates to a Boolean value. If the result of the predicate expression evaluates to true for the context node, it remains in the node-set; if it evaluates to false, the context node is discarded from the node-set.
      Predicates may contain any of the simple expression types shown in the previous section, but the final result is always coerced to a Boolean value to determine whether the context node should remain in the node-set. Here are some example predicate expressions:

  child::car[attribute::make = 'BMW']
  descendant::car[child::price < 27000]
  child::car[child::option]

 

      The first expression identifies all child car nodes (of the context node) that have a make attribute value equal to 'BMW'. The second expression identifies all descendant car nodes (of the context node) that have a child price node whose value is less than 27000. In this case the string value is evaluated and coerced to a number. The final expression identifies all child car nodes (of the context node) that have child option nodes. Here the node-set identified by child::option is coerced to a Boolean that simply tests for emptiness.
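
      As a rough illustration, the three predicates can be approximated with Python's xml.etree.ElementTree. The module understands the abbreviated forms of the first and third expressions but has no numeric comparisons, so the price test is applied by hand. The two-car document, including the second make, the prices, and the option value, is hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (assumption: no namespace, comma-free prices).
cars = ET.fromstring(
    "<cars>"
    "<car make='BMW'><model>323i</model><price>26990</price>"
    "<option>sunroof</option></car>"
    "<car make='Audi'><model>A4</model><price>28500</price></car>"
    "</cars>"
)

# child::car[attribute::make = 'BMW'], written in abbreviated form
bmw = cars.findall("car[@make='BMW']")

# child::car[child::option]: kept when the child::option node-set is non-empty
with_options = cars.findall("car[option]")

# descendant::car[child::price < 27000]: the string value of price is
# converted to a number, then compared, mirroring the coercion described above.
under_27000 = [car for car in cars.iter("car")
               if float(car.findtext("price")) < 27000]

print([c.findtext("model") for c in bmw])           # ['323i']
print([c.findtext("model") for c in with_options])  # ['323i']
print([c.findtext("model") for c in under_27000])   # ['323i']
```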

Functions

      As you saw in the previous section, several types of functions can be used inside of expressions. These functions give you more control over the different object types defined by the language. XPath defines four groups of functions, one for each of the object types. In other words, there are groups of functions specifically for dealing with node-sets, strings, numbers, and Booleans.
      The node-set functions (see Figure 2) make it possible to deal directly with the current context node-set. The position function returns the position of the context node within the current context node-set. For example, if you only wanted to identify the first child model node, the following predicate does the trick (notice that XPath positions are one-based):

  child::model[position() = 1]
  

 

To find the first preceding model element, the following expression can be applied:

  preceding::model[position() = 1]
  

 

Notice that the position is relative to the direction of the axis.
      The last function returns the position of the last node in the context node-set (in other words, the size of the node-set). The following expression identifies the last child model node:

  child::model[position() = last()]
  

 

      The count function takes a node-set parameter and returns the size of the node-set. For example, to identify only cars that have exactly three options, the following predicate can be used:

  child::car[count(child::option) = 3]
  

 

      The id function makes it easy to dereference ID/IDREF relationships through a simple expression, as in these examples:

  id('bmw-323 bmw-328')
  id(attribute::car-ids)

 

The first example supplies a space-delimited list of IDs that refer to attribute values of type ID (this requires a Schema or Document Type Definition). The second example supplies a node-set, which in this case evaluates to an attribute node that also may contain an IDREF string.
      The rest of the node-set functions deal with namespace information. The name function returns the context node's QName (prefix:local-name) while the local-name function returns only the local name part. The namespace-uri function can be used to test for specific namespace URIs, as shown here:

  child::car[namespace-uri() = 'urn:schemas-develop-com:cars']
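
      To make the namespace functions more concrete, here is a small Python sketch. ElementTree stores each element name as {uri}local-name, so rough analogs of namespace-uri and local-name fall out of splitting that string. The helper names are made up for this illustration, and the predicate is applied by hand rather than by an XPath engine.

```python
import xml.etree.ElementTree as ET

def namespace_uri(element):
    """Rough analog of namespace-uri(): the URI part of the expanded name, or ''."""
    return element.tag[1:].split("}", 1)[0] if element.tag.startswith("{") else ""

def local_name(element):
    """Rough analog of local-name(): the name without any namespace part."""
    return element.tag.split("}", 1)[1] if element.tag.startswith("{") else element.tag

cars = ET.fromstring(
    "<cars xmlns='urn:schemas-develop-com:cars'><car make='BMW'/></cars>"
)

# child::car[namespace-uri() = 'urn:schemas-develop-com:cars'], applied by hand
matches = [child for child in cars
           if local_name(child) == "car"
           and namespace_uri(child) == "urn:schemas-develop-com:cars"]

print(local_name(cars), namespace_uri(cars))  # cars urn:schemas-develop-com:cars
print(len(matches))                           # 1
```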
  

 

      The string functions (see Figure 3) make it possible to deal more intimately with string types than you could with XSL Patterns. These functions allow you to extract substrings, concatenate strings, and evaluate string lengths. Two of the most commonly used functions are starts-with and contains. starts-with allows you to test whether a string starts with a specified literal, while contains allows you to test whether a string contains a specified literal:

  child::car[starts-with(child::model, '32')]
  child::car[contains(child::model, '2')]
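
      The effect of these two predicates can be mimicked in Python with ordinary string operations. This is a sketch only: ElementTree does not implement the XPath string functions, so the filtering is done by hand, and the model_of helper and the three-car document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (assumption: no namespace).
cars = ET.fromstring(
    "<cars>"
    "<car><model>323i</model></car>"
    "<car><model>528i</model></car>"
    "<car><model>740i</model></car>"
    "</cars>"
)

def model_of(car):
    # Stand-in for string(child::model): text of the first model child, or ''
    return car.findtext("model", default="")

# child::car[starts-with(child::model, '32')]
starts_with_32 = [c for c in cars.findall("car") if model_of(c).startswith("32")]

# child::car[contains(child::model, '2')]
contains_2 = [c for c in cars.findall("car") if "2" in model_of(c)]

print([model_of(c) for c in starts_with_32])   # ['323i']
print([model_of(c) for c in contains_2])       # ['323i', '528i']
```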

 

      The Boolean and numeric functions (see Figure 4 and Figure 5) make it easier to work with the Boolean and number data types, respectively. Since they're intuitive, I'll leave them for your perusal.

Location Path Evaluation

      The location path evaluation process is one of the more esoteric parts of the specification. In the previous section you saw that a location path consists of one or more location steps. You also saw that each location step consists of three parts: an axis identifier, a node test, and a set of predicates. The location steps that make up the expression are always evaluated in order. They are evaluated from left to right, one at a time.
      Before the first location step is evaluated, the context node-set must be initialized. If the expression is absolute, the initial context node-set consists only of the root node. If it's relative, the initial context node-set consists only of the node where the expression was applied.
      The first location step in the expression is then applied to each node in the context node-set one at a time. (For the first location step there is only one node in the context node-set.) Each time the location step is applied to a node in the context node-set, a new node-set is identified. The union of all identified node-sets produces a final result node-set for that particular location step. This result node-set then becomes the context node-set for the next location step in the expression, and the process repeats itself. After the last location step has been applied, the final result node-set is the result of the expression.
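
      The evaluation loop just described can be sketched in Python. This is a simplified model, not how MSXML or any other processor is implemented: it handles only the child axis with a name test and an optional predicate, the function names are made up, and because ElementTree has no root node object, a stand-in element plays that role.

```python
import xml.etree.ElementTree as ET

def apply_step(context_nodes, name, predicate=None):
    """Apply one child::name[predicate] step to every node in the context node-set."""
    result = []
    for context in context_nodes:
        for child in context:                       # axis: child
            if child.tag != name:                   # node test by name
                continue
            if predicate is not None and not predicate(child):
                continue                            # predicate filtered the node out
            if all(child is not seen for seen in result):
                result.append(child)                # union: no duplicates
    return result

def evaluate(document_element, steps):
    """Evaluate an absolute location path given as (name, predicate) pairs."""
    root = ET.Element("root-stand-in")              # stand-in for the XPath root node
    root.append(document_element)
    node_set = [root]                               # initial context node-set
    for name, predicate in steps:                   # left to right, one step at a time
        node_set = apply_step(node_set, name, predicate)
    return node_set

# Simplified cars document (assumption: no namespace declaration).
cars = ET.fromstring(
    "<cars><car make='BMW'><model>323i</model><year>2000</year></car></cars>"
)

# /child::cars/child::car/child::model[child::text() = '323i']
result = evaluate(cars, [
    ("cars", None),
    ("car", None),
    ("model", lambda node: node.text == "323i"),
])
print([node.tag for node in result])   # ['model']
```

Each pass through the loop in evaluate corresponds to one location step: the result node-set of one step becomes the context node-set of the next.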

Figure 6 XPath Expression Tree Model

       Figure 6 illustrates this process on the tree model by walking through the evaluation of the following XPath expression:

  /child::cars/child::car/child::model[child::text() = '323i']
  

 

Abbreviations

      Because one of the main goals of XPath was to keep the syntax as compact as possible, the specification defines several abbreviations that can be used within location path expressions. The most significant abbreviation is the omission of the child axis. Since the child axis is used predominantly, it's implied whenever the axis is omitted. For example, the following two expressions are equivalent:

  /child::cars/child::car/child::model
  /cars/car/model

 

Since the attribute axis is also quite commonly used, it can be abbreviated to a single @ character:

  /child::cars/child::car/attribute::make  =  /cars/car/@make
  

 

      Another common task is to search the entire tree for a specific set of nodes either by name or by type. The /descendant-or-self::node()/ expression provides this type of functionality. This expression can be condensed to //. As a result, the following expressions are equivalent:

  /descendant-or-self::node()/child::model
  //model

 

      XPath defines two other very intuitive abbreviations for developers used to command prompts and URLs. The self::node() and parent::node() location steps can be abbreviated to . and .. respectively. Therefore:

  self::node()/descendant-or-self::node()/child::model = .//model
  

 

and

  parent::node()/child::model = ../model
  

 

      One last abbreviation worth mentioning has to do with the position function. Since position is used commonly within predicate expressions and most developers are familiar with array offset syntax, an expression in the form of [position() = number] can be abbreviated to simply [number]:

  /child::model[position() = 1]  =  /model[1]
  

 

      It's much easier to become accustomed to the verbose syntax before latching onto the shortcuts. Once you've learned the standard form, it's easy to become familiar with the abbreviations.
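
      Several of these abbreviations can be tried out with Python's xml.etree.ElementTree, which accepts only the abbreviated syntax and only a subset of it. The comments give the verbose form of each path; the two-car document is hypothetical sample data with no namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (assumption: no namespace).
cars = ET.fromstring(
    "<cars>"
    "<car make='BMW'><model>323i</model></car>"
    "<car make='BMW'><model>528i</model></car>"
    "</cars>"
)

# self::node()/descendant-or-self::node()/child::model
all_models = cars.findall(".//model")

# child::car[position() = 1]
first_car = cars.findall("car[1]")

# child::car[position() = last()]
last_car = cars.findall("car[last()]")

# child::car[attribute::make]
cars_with_make = cars.findall("car[@make]")

print([m.text for m in all_models])        # ['323i', '528i']
print(first_car[0].findtext("model"))      # 323i
print(last_car[0].findtext("model"))       # 528i
print(len(cars_with_make))                 # 2
```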

Conclusion

      XPath provides a uniform and compact syntax for addressing XML documents. Instead of writing code to explicitly walk through a document's structure looking for nodes that match some criteria, an XPath expression can be used to encapsulate that entire process. In general, the standardization of the XPath syntax makes it possible to layer on other technologies that require addressing functionality (such as XSLT, XPointer, and XLink). In the future, many XML technologies will use XPath to do the heavy lifting when it comes to document addressing. MSXML 2.6 is not completely compliant with XPath 1.0, although by the time MSXML 3.0 is released later this year it is expected to fully implement the specification.

Aaron Skonnard is an instructor and researcher at DevelopMentor, where he co-manages the XML curriculum. Aaron wrote Essential WinInet (Addison-Wesley Longman, 1998) and coauthored Essential XML, due out from Addison-Wesley Longman in June 2000. Get in touch with Aaron at https://www.skonnard.com.

From the July 2000 issue of MSDN Magazine.