Web Q&A: XPath, XML Notepad, Data Islands, Case Sensitivity, XSL, and More

MSDN Magazine

Edited by Nancy Michell

Q I'm new to XML so there are a few things I don't understand. For example, what is XPath? Is it some strange new language we have to learn to use XML?

A
Actually, XPath turns out to be very simple, yet very powerful. Because XML describes the hierarchical structure of a document or set of data, an XPath statement simply points to where in the tree particular information can be found. In the XML Files column in the May 2000 issue of MSDN® Magazine, Aaron Skonnard describes XPath this way:

Consider the following XML fragment:

  <contact category="enemy of the state">
    <fullname>Smith</fullname>
    <numbers>
      <home>801-555-2323</home>
      <cell>801-555-3232</cell>
    </numbers>
  </contact>

 

Suppose it was necessary to locate Smith's phone numbers contained within the document. While this could be accomplished using SAX or DOM manual processing, this type of query can be described with a simple XPath expression:

  /descendant::contact[fullname="Smith"]/child::numbers/child::*

 

      As you can see, the XPath statement is simply a path to a node in the hierarchy. That makes it easy to read, and it is powerful because it gives you access to a very specific place in the tree, that is, in the document.
      The particular XML Files column just mentioned has lots of good information for someone who is starting out with XML. Another good source for beginners is the W3C. See their intro to XML at https://www.w3.org/XML/1999/XML-in-10-points.html.
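
      To experiment with the path expression quoted above, the following Python sketch evaluates an equivalent query with the standard library's xml.etree.ElementTree module. It is only an illustration and not part of the original column. ElementTree implements a limited subset of XPath with no explicit axis names such as descendant:: or child::, so the sketch uses the abbreviated form. The <contacts> wrapper element and the variable names are assumptions, added so that contact can be found as a descendant of the element being searched.

```python
import xml.etree.ElementTree as ET

# The fragment from the column, wrapped in a hypothetical <contacts> root
# so that <contact> is a descendant of the element being searched.
DOCUMENT = """
<contacts>
  <contact category="enemy of the state">
    <fullname>Smith</fullname>
    <numbers>
      <home>801-555-2323</home>
      <cell>801-555-3232</cell>
    </numbers>
  </contact>
</contacts>
"""

root = ET.fromstring(DOCUMENT)

# Abbreviated counterpart of
#   /descendant::contact[fullname="Smith"]/child::numbers/child::*
# (ElementTree's XPath subset has no explicit axis syntax).
numbers = root.findall(".//contact[fullname='Smith']/numbers/*")

for node in numbers:
    print(node.tag, node.text)
```

      Each element returned is one of Smith's phone numbers, which is exactly the node set the full XPath expression describes.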

Q
Does Microsoft have a tool for viewing the structure of an XML document?

A
Yes, there's XML Notepad, a handy tool that shows the user the structure of an XML document, including all nodes and child nodes. Figure 1 shows a screen shot of XML Notepad after it has opened the simple XML document shown in Figure 2.

Figure 1 XML Notepad

The structure of the data is represented in the left pane while the values of the nodes are displayed in the right pane. The graphical representation of the tree makes development much easier. XML Notepad can be downloaded from the MSDN Web site at https://msdn.microsoft.com/library/en-us/dnxml/html/xmlpadintro.asp where you'll also find information on using it.

Q
I'm considering using XML on my Web site, but there's much I don't know. For example, what is an XML data island used for?

A
Obviously you're on the right track if you're asking about data islands for your Web page because that's exactly what data islands are used for. Data islands are simply groups of XML data placed inside an HTML document.
      You can place the XML and the data inside the page like so:

  <XML ID="XMLID">
    <XMLDATA>
      <DATA>TEXT</DATA>
    </XMLDATA>
  </XML>

 

Using this option, you place the data itself between the <DATA> tags.
      The other option for using XML in your page is to point to an external XML data file with the SRC attribute, such as the following:

  <XML SRC="https://localhost/xmlFile.xml"></XML>
  

 

      The Microsoft® XML 3.0 Developer's Guide explains the process (see https://msdn.microsoft.com/library/en-us/xmlsdk30/htm/xmconxmldevelopersguide.asp).

Q
Can I call IXMLDOMElement's getAttribute and find an attribute without it being case-sensitive? Please help!

A
Yes, you can work around the case-sensitivity. Call

  selectNodes("@*[translate(name(), 'abcdefghijklmnopqrstuvwxyz',
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ')='???']")

 

to get the attribute nodes you want, then work with the returned node set. (Note that you should replace "???" with an uppercase attribute name like "ATT".)
      The technique is to use the XPath translate function to convert the attribute names to uppercase and then do the comparison. For example, to find out whether the context node has an attribute named "name", "Name", or any other combination of cases, call selectNodes or selectSingleNode with a query like this:

  "@*[translate(name(), 'abcdefghijklmnopqrstuvwxyz',
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ')='NAME']"

 

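Note that this workaround converts only the characters listed in the two translate arguments, so as written it covers the unaccented letters a through z. For other languages whose lowercase characters have uppercase counterparts, you would need to add those character pairs to both strings.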
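
      The same idea can be sketched outside of MSXML. The Python example below is an assumption-laden illustration, not the MSXML API: the standard library's ElementTree has no translate() function in its XPath subset, so the sketch applies the equivalent character mapping in plain Python with str.translate. The function name attributes_named and the sample element are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same two character lists that the translate() call in the query uses.
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO_UPPER = str.maketrans(LOWER, UPPER)


def attributes_named(element, upper_name):
    """Return the attributes whose uppercased name equals upper_name.

    Hypothetical helper: mirrors @*[translate(name(), ...)='NAME'].
    """
    return {
        name: value
        for name, value in element.attrib.items()
        if name.translate(TO_UPPER) == upper_name
    }


# XML names are case-sensitive, so Name and NAME are distinct attributes.
item = ET.fromstring('<item Name="first" NAME="second" id="3"/>')
matches = attributes_named(item, "NAME")
```

      As with the XPath version, the comparison value must be supplied in uppercase, and only the characters present in the two lists are converted.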

Q
What is the best way to use XSL to take elements and group them by a common child element? For example, if I have some XML like the code shown in Figure 2, how can I use an XSL stylesheet to generate output that groups everyone by their age? I'd like something that looks like the following:

  25 year old authors:
  Mike
  Randy
  37 year old authors:
  Stuart
  40 year old authors:
  Melissa
  Amy

 

A The idea is to write a headline ("xx year old authors:") only when the age changes (see Figure 3). The first half of the test

  not(following-sibling::*[age = current()/age])
  

 

is true when no following sibling has the same age as the current node, that is, when the age changes after the current node. The second half

  position()!=last()
  

 

is necessary only because nothing follows the last element, so no headline is needed after it.
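
      For comparison, the "headline only when the age changes" idea can be sketched in Python. This is not the XSL from Figure 3: it swaps in itertools.groupby, which starts a new group whenever the key changes between consecutive items, so like the stylesheet it assumes the authors are already ordered by age. The figures are not reproduced here, so the element names authors, author, and name are assumptions (only the age child appears in the test expression), and the sample data is reconstructed from the desired output in the question.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical input, reconstructed from the desired output in the question.
AUTHORS_XML = """
<authors>
  <author><name>Mike</name><age>25</age></author>
  <author><name>Randy</name><age>25</age></author>
  <author><name>Stuart</name><age>37</age></author>
  <author><name>Melissa</name><age>40</age></author>
  <author><name>Amy</name><age>40</age></author>
</authors>
"""


def group_by_age(xml_text):
    """Return output lines: one headline per run of equal ages, then names."""
    root = ET.fromstring(xml_text)
    lines = []
    # groupby starts a new group each time the age differs from the previous
    # author's age, so the input must already be sorted by age.
    for age, group in groupby(root.findall("author"),
                              key=lambda author: author.findtext("age")):
        lines.append(f"{age} year old authors:")
        lines.extend(author.findtext("name") for author in group)
    return lines


output_lines = group_by_age(AUTHORS_XML)
print("\n".join(output_lines))
```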

Q
How can I get only a node's siblings with an XPath query?

  <food>
    <fruit name="apple"/>
    <fruit name="orange"/>
    <fruit name="grape"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
  </food>

 

Suppose the current context is the bolded item, meaning that it's the node the XML engine is currently processing. What query can I use in a test condition so that only the first and last elements with @name="orange" are returned? I am using MSXML 2.5.
      What if I have more than just those two siblings? Also, does this solution work with the MSXML 2.5 engine?

  <food>
    <fruit name="apple"/>
    <fruit name="orange"/>
    <fruit name="grape"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
    <fruit name="orange"/>
  </food>

 

A To pick out just the first and last nodes from a set, $v, use

  $v[1] | $v[last()]
  

 

On further reflection, a faster XPath that should work in MSXML 2.5 positions you relative to the current node:

  ../fruit[@name="orange"]
  

 

This XPath includes the original context node, but you can remove it by assigning its position among the fruit siblings to $pos. Then the XPath

  ../fruit[position()!=$pos][@name="orange"]
  

 

should remove it from the list and you can pick out the first and last of the remaining ones.
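
      The steps above (select the matching siblings, drop the context node, keep the first and last of what remains) can be sketched in Python with the standard library's ElementTree. This is an illustration rather than MSXML code. The helper name first_and_last_other is hypothetical, and because the bolding that marked the context node is not visible here, the sketch simply assumes the context is the second orange.

```python
import xml.etree.ElementTree as ET

FOOD_XML = """
<food>
  <fruit name="apple"/>
  <fruit name="orange"/>
  <fruit name="grape"/>
  <fruit name="orange"/>
  <fruit name="orange"/>
</food>
"""


def first_and_last_other(parent, context, name):
    """Return the first and last fruit siblings called name, excluding context.

    Hypothetical helper: mirrors ../fruit[position()!=$pos][@name="orange"]
    followed by picking the first and last nodes of the result.
    """
    matches = [fruit for fruit in parent.findall(f"fruit[@name='{name}']")
               if fruit is not context]
    if len(matches) <= 1:
        return matches
    return [matches[0], matches[-1]]


food = ET.fromstring(FOOD_XML)
children = list(food)
context = children[3]  # assumption: the context node is the second orange

result = first_and_last_other(food, context, "orange")
# 1-based positions of the selected nodes among all <fruit> children
positions = [children.index(node) + 1 for node in result]
```

      The same function works unchanged on the longer list in the question, because it keeps only the first and last matches no matter how many siblings lie between them.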

Q
Is there any way to escape the characters " and ' in an XPath expression like

  //mytag[@id="my text"]
  

 

I know you can use either " or ' to specify a value, but I want to be able to search for text having both " and '. I tried the entity reference approach (&quot;), but it looks like XPath doesn't know about it.

A
There are two issues: one is that there is a surrounding context, which (as you point out) has its own escaping rules and requirements. The other is that, after you've dealt with the surrounding context, XPath itself provides no string escaping mechanisms.
      For example, if you're executing an XPath query on the URL in SQL XML, then even after you've dealt with URL-encoding issues, there is no way to ask for all values equal to a string containing both a single quote and a double quote without resorting to concat() or variables. You can write something like /Comments[@Description=concat(" ' ", ' " ')], or assign " to $quot and ' to $apos. This phenomenon (being able to construct a value that can't be serialized conveniently as a literal) is very annoying, but it isn't limited to XPath. It happens in XML, too. A CDATA section containing ]]> has to be written out as two successive CDATA sections:

  <![CDATA[ all the text up to and including the first bracket ]]]><![CDATA[]>]]>
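
      The CDATA splitting just described can be automated. The Python sketch below is an illustration only, and the function name to_cdata is hypothetical. It breaks every ]]> after the first bracket, as in the example above, and then parses the result with the standard library's ElementTree to check that the original text comes back.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting each ]]> across two sections.

    The first section ends after the first bracket, and the next section
    begins with the remaining "]>".
    """
    safe = text.replace("]]>", "]]]><![CDATA[]>")
    return "<![CDATA[" + safe + "]]>"


wrapped = to_cdata("a ]]> b")

# Round trip: the parser joins the adjacent sections back into one string.
parsed = ET.fromstring("<note>" + wrapped + "</note>").text
```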

 

      If the XPath expression you are using is part of an XSLT file, you should be able to use &quot; and &apos;. If the expression is part of a C# program, for example, you can use \ as an escape character in the string literal. You may also use \" and \' in XSL Patterns, which is the default dialect (SelectionLanguage).
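
      The concat() workaround mentioned above can be generated mechanically. The following Python sketch shows one way to turn an arbitrary string into an XPath 1.0 string expression. The function name xpath_literal is hypothetical, and the sketch only builds the expression text, so any escaping required by the surrounding context (URL, XSLT file, or program source) still has to be applied afterward.

```python
def xpath_literal(value):
    """Return an XPath 1.0 expression that evaluates to value.

    XPath has no escape character inside string literals, so a value
    containing both quote characters is assembled with concat().
    """
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    pieces = []
    # Split on single quotes; each split point becomes a "'" literal.
    for index, part in enumerate(value.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


# Build a query in the style of the one in the question.
query = "//mytag[@id=" + xpath_literal("say \"hi\" to 'me'") + "]"
```

      Values that contain only one kind of quote are simply wrapped in the other kind, so concat() appears only when it is actually needed.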

Send questions and comments to webqa@microsoft.com

Thanks to these Microsoft developers for their technical expertise: Michael Brundage, Gabriel Ghizila, Sebastian Helm, Erik Jensen, Pranav Kandula, Anton Lapounov, Hock Seng Lee, Peter Rilling, Eric Rudolph, Kuen Siew, Chris Sires, Mike Treit, Guang-an Wu.

From the September 2001 issue of MSDN Magazine.