XML Files: XSLT Processing, Processing Instructions in XML, Parameterizing Statements in XML, and More

MSDN Magazine

Download the code for this article: XML0205.exe (43KB)

Q Is it possible to write an XSLT that processes data coming from SQL Server™?
A Although XSLT only accepts XML input, there are several ways to process SQL Server 2000 results using XSLT. The simplest way is to use the XML support built into SQL Server 2000. SQL Server 2000 introduced the FOR XML clause, which allows query results to be returned as XML fragments. For example, the following SELECT statement returns author records in a generic XML format:

  SELECT au_fname, au_lname FROM AUTHORS FOR XML RAW
  

 

The RAW keyword tells SQL Server 2000 to model each record in the result as a row element with column names and values modeled as attributes.

  <row au_fname="Abraham" au_lname="Bennet" />
  <row au_fname="Reginald" au_lname="Blotchet-Halls" />
  <row au_fname="Cheryl" au_lname="Carson" />
  <row au_fname="Michel" au_lname="DeFrance" />
  <row au_fname="Innes" au_lname="del Castillo" />
  •••

 

      FOR XML AUTO instructs SQL Server 2000 to use the table name as the element name, as shown here:

  <authors au_fname="Abraham" au_lname="Bennet" />
  <authors au_fname="Reginald" au_lname="Blotchet-Halls" />
  <authors au_fname="Cheryl" au_lname="Carson" />
  <authors au_fname="Michel" au_lname="DeFrance" />
  <authors au_fname="Innes" au_lname="del Castillo" />
  •••

 

      With FOR XML AUTO, it's also possible to have the column names and values returned as child elements instead of attributes. This is made possible by the ELEMENTS keyword, as shown here:

  SELECT au_fname, au_lname FROM AUTHORS FOR XML AUTO, ELEMENTS
  

 

This select statement returns the following XML fragment:

  <authors>
    <au_fname>Abraham</au_fname>
    <au_lname>Bennet</au_lname>
  </authors>
  <authors>
    <au_fname>Reginald</au_fname>
    <au_lname>Blotchet-Halls</au_lname>
  </authors>
  <authors>
    <au_fname>Cheryl</au_fname>
    <au_lname>Carson</au_lname>
  </authors>
  •••

 

      This functionality was first introduced with SQL Server 7.0 through a downloadable (technology preview) ISAPI extension before it became part of the core SQL Server 2000 engine. With SQL Server 2000, you can experiment with these techniques by using the SQL Query Analyzer, as shown in Figure 1.

Figure 1 SQL Query Analyzer

      It's also still possible to expose such functionality through a Web endpoint (otherwise known as a Microsoft® Internet Information Services (IIS) virtual root, or vroot) by using the IIS Virtual Directory Management tool. With this tool, you can create a new vroot, map it to a desired database, choose the types of queries you want to allow, and configure other related properties. Figure 2 shows the tool running on my machine with a vroot named pubs mapped to the SQL Server 2000 pubs database.

Figure 2 IIS Virtual Directory Management Tool

      Now it's possible to attach FOR XML queries to a URL query string, as shown here:

  https://localhost/pubs?sql=SELECT au_fname, au_lname FROM authors FOR XML AUTO, elements

 

If you try loading this URL in Microsoft Internet Explorer, the following error is displayed because it isn't able to display the resulting XML fragment:

  The XML page cannot be displayed

  Cannot view XML input using XSL style sheet.
  Please correct the error and then click the Refresh button...

  Only one top level element is allowed in an XML document.
  Error processing resource 'https://localhost/pubs?...

 

      If you select View | Source, you'll notice that the XML fragment was returned properly. Internet Explorer is trying to load the results as an XML document in order to run an XSLT transform before displaying it. An error is produced because the results are not well-formed XML. Although the .NET APIs are specialized to deal with such XML result fragments, most mainstream XML APIs wouldn't be able to process them as is.
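The parser's complaint is easy to reproduce outside the browser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module (a choice made for this illustration, not something the column's tooling relies on). It shows that a raw FOR XML fragment is rejected, while the same content inside a single container element parses. The container name results and the helper is_well_formed are arbitrary.

```python
import xml.etree.ElementTree as ET

# Two sibling elements with no common parent, as FOR XML AUTO returns them
fragment = (
    '<authors au_fname="Abraham" au_lname="Bennet"/>'
    '<authors au_fname="Reginald" au_lname="Blotchet-Halls"/>'
)


def is_well_formed(text):
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


fragment_ok = is_well_formed(fragment)
wrapped_ok = is_well_formed("<results>" + fragment + "</results>")
print(fragment_ok, wrapped_ok)
```

Wrapping on the client works, but it means every consumer has to remember to do it.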
      To help get around this issue, SQL Server 2000 also allows you to specify the name of a root container element that can be used to produce a well-formed XML document. Just add "root=element name" to the query string:

  https://localhost/pubs?sql=SELECT au_fname, au_lname FROM authors FOR XML AUTO, elements&root=results

 

This produces the following well-formed XML document:

  <results>
    <authors>
      <au_fname>Abraham</au_fname>
      <au_lname>Bennet</au_lname>
    </authors>
    <authors>
      <au_fname>Reginald</au_fname>
      <au_lname>Blotchet-Halls</au_lname>
    </authors>
    •••
  </results>
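When a URL like this is assembled in code rather than typed into the address bar, it is safer to percent-encode the SQL text so that its spaces and commas survive and the & before root remains the only parameter separator. Here is a minimal sketch using Python's standard urllib.parse, assuming the pubs vroot shown above; whether a particular server also tolerates the unencoded form is not something this example tests.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE = "https://localhost/pubs"  # vroot from the column; adjust for your server
SQL = "SELECT au_fname, au_lname FROM authors FOR XML AUTO, ELEMENTS"

# quote_via=quote encodes spaces as %20 rather than '+'
url = BASE + "?" + urlencode({"sql": SQL, "root": "results"}, quote_via=quote)
print(url)

# Decoding the query string recovers both parameters unchanged
params = parse_qs(urlsplit(url).query)
print(params["root"][0])
```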

 

Now if you load the URL into Internet Explorer, you'll see that it's able to run the XSLT transform, successfully producing the standard collapsible XML tree view. Likewise, you can write your own XSLT to process similar SQL results as long as they are well-formed XML documents. You can even process several different streams from SQL Server 2000 using the built-in XSLT document function, as illustrated by the example in Figure 3, which produces the HTML document displayed in Figure 4.

Figure 4 XSLT Using Pubs Database

      .NET also enables you to execute XSLT transformations against an instance of XmlDataDocument, which is a hybrid between the DataSet and XmlDocument classes. The .NET XslTransform class operates over XPathNavigator objects, which any class that implements IXPathNavigable (such as XmlDataDocument) can supply. This approach is similar, but it doesn't rely on the SQL Server 2000 XML integration features, as you can see here:

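  // requires: using System; using System.Data; using System.Data.SqlClient;
  //           using System.Xml; using System.Xml.Xsl;
  // fill a DataSet with the authors table
  DataSet ds = new DataSet();
  SqlDataAdapter da = new SqlDataAdapter(
      "select * from authors",
      "server=localhost;uid=sa;database=pubs");
  da.Fill(ds, "authors");
  // expose the DataSet as XML and transform it
  XmlDataDocument ddoc = new XmlDataDocument(ds);
  XslTransform tx = new XslTransform();
  tx.Load("view.xsl");
  tx.Transform(ddoc.CreateNavigator(), null, Console.Out);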

 

Here, the XSLT program isn't communicating with SQL Server 2000 directly. The XSLT is just written against the XML representation of the underlying DataSet.
      In addition to these approaches, it's also possible to write XSLT extensions in JavaScript, C#, or Visual Basic® .NET that work directly with ADO recordsets or .NET DataSets during the transformation process. See my column in the March 2002 issue for more details on this approach.
Q I tried running an XSLT, but it fails on the following call to xsl:apply-templates:

  <xsl:apply-templates select="document(concat('https://localhost/
    pubs?sql=select title from titleauthor ta, titles t where
    ta.title_id = t.title_id and ta.au_id = &apos;',
    @au_id, '&apos; for xml auto
    &amp;root=results'))/results/t"/>

 

A Although it looks like this should work, you have to remember that using an entity reference (like &apos;) is equivalent to using the actual character in the logical tree (the Infoset). Hence, when the XSLT processor loads the file, the XML parser first expands &apos; into the actual character before the XPath expression is evaluated. This produces adjacent single quote characters in the select expression, which end the string literals early and cause a syntax error in the concat call.
      You need to change the attribute value delimiter to single quotes and use double quotes as the delimiter for the concat arguments, as shown here:

  <xsl:apply-templates select='document(concat("https://localhost/
    pubs?sql=select title from titleauthor ta, titles t where
    ta.title_id = t.title_id and ta.au_id = &apos;",
    @au_id, "&apos; for xml auto
    &amp;root=results"))/results/t'/>
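The expansion step is easy to observe with any XML parser, because the XPath engine only ever sees the attribute value after parsing. The sketch below uses Python's standard xml.etree.ElementTree and a shortened, made-up concat call (the element name t and the literal text are placeholders) to print what each quoting style hands to XPath.

```python
import xml.etree.ElementTree as ET

# Double-quoted attribute, single-quoted XPath literals, &apos; inside them
failing = ET.fromstring(
    """<t select="concat('id = &apos;', @au_id, '&apos;')"/>"""
)

# Single-quoted attribute, double-quoted XPath literals, &apos; inside them
working = ET.fromstring(
    """<t select='concat("id = &apos;", @au_id, "&apos;")'/>"""
)

# These are the strings the XPath parser would receive
print(failing.get("select"))
print(working.get("select"))
```

In the first value the expanded apostrophes collide with the literal delimiters; in the second they sit harmlessly inside double-quoted literals.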

 

Q How do you encode ]]> inside of a CDATA section?
A The only thing you can't place inside of a CDATA section is the ]]> character sequence, which serves as the CDATA section end marker. For example, consider what happens with the following ill-formed XML document:

  <p>
  Here is an example of using a CDATA section:
  <![CDATA[
  <![CDATA[ character data goes here... ]]>
  ]]>
  </p>

 

This document is ill-formed because the ]]> token is not allowed inside the outer CDATA section. After experiencing this, you might be tempted to trick the processor by escaping part of the CDATA section end token, as shown here:

  <p>
  Here is an example of using a CDATA section:
  <![CDATA[
  <![CDATA[ character data goes here... ]]&gt;
  ]]>
  </p>

 

This document is now well-formed, but it doesn't produce the desired result because &gt; is not expanded by the XML processor. Remember that the only characters recognized as markup in a CDATA section are ]]>. So one solution is to split the inner CDATA end token, as shown here:

  <p>
  Here is an example of using a CDATA section:
  <![CDATA[<![CDATA[ character data goes here... ]]]]>>
  </p>

 

      Although this can produce two distinct text nodes in some APIs, it's trivial to normalize them into one. This is automatic in XSLT through a simple xsl:value-of expression

  <xsl:value-of select="p"/>
  

 

which produces output like that shown in the following:

  Here is an example of using a CDATA section:
  <![CDATA[ character data goes here... ]]>
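The same check can be made outside XSLT. The sketch below uses Python's standard xml.etree.ElementTree, which merges adjacent character data into a single string. The helper wrap_in_cdata is hypothetical and uses a common variant of the split: it reopens a second CDATA section for the trailing >, whereas the example above leaves that > as ordinary text. Both forms should parse back to the same string.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


inner = "<![CDATA[ character data goes here... ]]>"

# Variant 1: split the end token across two CDATA sections
doc = "<p>" + wrap_in_cdata(inner) + "</p>"
text = ET.fromstring(doc).text

# Variant 2 (the column's form): close after ']]' and leave '>' as plain text
column_doc = "<p><![CDATA[<![CDATA[ character data goes here... ]]]]>></p>"
column_text = ET.fromstring(column_doc).text

print(text == inner, column_text == inner)
```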

 

      Of course, this whole discussion assumes you need to nest one CDATA section within another. This would be the case if you needed to escape large chunks of unsafe XML characters along with the inner CDATA section. It's much simpler if you don't nest CDATA sections and just escape the individual unsafe characters, as is done in the following code:

  <p>
  Here is an example of using a CDATA
  section: &lt;![CDATA[ character data
  goes here... ]]&gt;
  </p>

 

Q Are there any standard PIs in XML?
A For those who aren't familiar with the term, PI is shorthand for processing instruction, which takes the following syntactic form in XML 1.0:

  <?target data?>
  

 

      PIs are used to instruct applications to perform some type of extra processing on the given document. Most mainstream Web browsers, including Internet Explorer, recognize the following PI:

  <?xml-stylesheet
    href="mystyle.xsl"
    type="text/xsl"?>

 

      When the browser loads the XML document and recognizes the PI, it performs a transformation using the specified XSLT file and displays the result of the transformation instead of the raw XML file itself. This PI has been accepted as a W3C recommendation (see https://www.w3.org/TR/xml-stylesheet/ for more details).
      I don't know of any other industry standard PIs, although there may be some used within different industry segments. PIs have lost momentum and are falling out of favor with XML developers since they don't support namespace-qualified target names and since you can essentially duplicate their behavior through a namespace-qualified element.
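To an XML API, a PI is just a target name plus an opaque data string. The href and type "attributes" of xml-stylesheet are a convention that the consuming application has to parse for itself. Here is a rough sketch using Python's standard xml.dom.minidom; the helper pseudo_attributes and its regular expression are assumptions made for this illustration and do not handle every case the xml-stylesheet recommendation allows.

```python
import re
from xml.dom import minidom

SOURCE = '<?xml-stylesheet href="mystyle.xsl" type="text/xsl"?><report/>'


def pseudo_attributes(data):
    """Parse name=value pseudo-attributes from a PI data string (hypothetical helper)."""
    pairs = re.findall(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""", data)
    return {name: dq or sq for name, dq, sq in pairs}


doc = minidom.parseString(SOURCE)

# PIs in the prolog are children of the document node
pis = [n for n in doc.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
target = pis[0].target
attrs = pseudo_attributes(pis[0].data)
print(target, attrs)
```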
Q I'm trying to parameterize a select statement in XSLT as follows, but it doesn't work. What's wrong?

  <xsl:template name="processAuthors">
    <xsl:param name="authors"/>
    <xsl:for-each select="$authors">
      <!-- process authors here -->
    </xsl:for-each>
  </xsl:template>

 

A There is nothing wrong with this template, but it may or may not work depending on the object you supply in the authors parameter. If you pass in a node-set of authors, as shown here, it should work:

  <xsl:call-template name="processAuthors">
    <xsl:with-param name="authors"
      select="//author"/>
  </xsl:call-template>

 

But if you pass in a string containing the XPath expression, it definitely won't work. Each of these xsl:call-template calls produces an error when executed:

  <xsl:call-template name="processAuthors">
    <xsl:with-param name="authors"
      select="'//author'"/>
  </xsl:call-template>

  <xsl:call-template name="processAuthors">
    <xsl:with-param name="authors">//author</xsl:with-param>
  </xsl:call-template>

 

      The errors occur because, in both cases, $authors represents a string object and XPath doesn't support coercing strings into node-sets. If you need to parameterize aspects of an XPath expression, you can typically get by with parameterized predicate expressions, as shown here:

  <xsl:for-each select="//author[age &lt; $ageLimit]">
    •••
    <!-- process authors with age under supplied limit -->
  </xsl:for-each>

 

Q How can I easily parameterize a sort statement in XSLT (for a sortable HTML table, say)?
A The xsl:sort element uses a select attribute (an XPath expression) to identify the sort key:

  <xsl:sort select="key" .../>
  

 

This means you can't parameterize the select expression and pass in a string value. In fact, there really isn't a good way to accomplish this without either running a two-pass transformation or modifying the XSLT document programmatically before executing the transformation.
      Running a two-pass transformation is not very efficient because it requires you to write a meta-transformation (one XSLT that produces another XSLT). First, you run a transformation to generate the XSLT document with the "right" sort key embedded. Then you use the generated XSLT to produce the actual HTML report.
      If you're working with DHTML, you probably already have the XSLT document loaded into a DOMDocument object, which you can modify programmatically. For example, when the user clicks on a table column for sorting purposes, just use XPath to programmatically identify the select attribute in the XSLT document and modify its value before executing the transformation again. Then take the generated HTML and squirt it into the DHTML page using the innerHTML property. The sortable page is shown in Figure 5.

Figure 5 Fun with HTML Sorting

      The following DHTML code fragment illustrates how this can be accomplished:

  function onSort()
  {
    var node = xsldoc.selectSingleNode("//xsl:sort/@select");
    var columnName = event.srcElement.innerText;
    // update the select attribute with the name of the
    // selected column
    node.text = columnName;
    // execute transformation with modified sort key
    tableBody.innerHTML = doc.transformNode(xsldoc);
  }

 

This turns out to be a handy technique for generalizing XSLT transformations used in DOM-based applications. The complete source code for this example is available for download at the link at the top of this article.
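The stylesheet-editing step is not tied to the browser. The sketch below repeats it with Python's standard xml.etree.ElementTree, assuming a small made-up stylesheet with a single xsl:sort. Because the standard library has no XSLT processor, the transformation itself is left out. The whitelist ALLOWED_COLUMNS is an addition of this sketch: whatever is written into select is evaluated as XPath, so it seems prudent not to copy user-supplied text in unchecked.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # keep the xsl: prefix when serializing

# Made-up stylesheet for illustration only
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/authors">
    <xsl:for-each select="author">
      <xsl:sort select="au_lname"/>
      <tr><td><xsl:value-of select="au_lname"/></td></tr>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

ALLOWED_COLUMNS = {"au_fname", "au_lname"}  # hypothetical sortable columns


def set_sort_key(stylesheet_root, column_name):
    """Point the first xsl:sort at column_name, rejecting unknown columns."""
    if column_name not in ALLOWED_COLUMNS:
        raise ValueError("not a sortable column: " + column_name)
    sort = stylesheet_root.find(".//{%s}sort" % XSL_NS)
    sort.set("select", column_name)


root = ET.fromstring(STYLESHEET)
set_sort_key(root, "au_fname")
updated = ET.tostring(root, encoding="unicode")  # hand this to an XSLT processor
print(updated)
```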
Q I have an XSLT document that generates an HTML table [see Figure 6]. The problem is that I want to generate a table header that includes the name of each element but remains generic. I can't figure out how to do this while processing the first title_info row and not on any of the rest. Any ideas?
A The easiest way to accomplish this is through a conditional statement and a simple for-each loop. Inside the title_info template, you need to use an xsl:if to see if you're on the first title_info element (test="position() = 1"). If you are, use an xsl:for-each statement to walk through each of its child elements to output its name, as shown here:

  <xsl:template match="title_info">
    <xsl:if test="position()=1">
      <xsl:for-each select="*">
        <th><xsl:value-of select="name()"/></th>
      </xsl:for-each>
    </xsl:if>
    <tr><xsl:apply-templates/></tr>
  </xsl:template>

 

      You can also accomplish this using xsl:apply-templates (instead of xsl:for-each), but it requires the use of XSLT modes. For example, suppose that after identifying the first title_info you call xsl:apply-templates to do the header processing, as demonstrated in the following:

  <xsl:template match="title_info">
    <xsl:if test="position()=1">
      <xsl:apply-templates/>
    </xsl:if>
    <tr><xsl:apply-templates/></tr>
  </xsl:template>

 

Because the processor doesn't know that the first title_info element's children should be processed differently on this call, the first row is added to the resulting table twice. What you really need here is the ability to process the first title_info element's children in two separate passes—the first pass to produce the header and the second to add the first row to the table.
      The XSLT language offers template modes to help deal with situations in which a given set of nodes needs to be processed in multiple passes. To use modes, simply give the template a mode name (mode="somename"), and then when you call xsl:apply-templates, specify which mode to use. For example, your original program could be rewritten as shown in Figure 7.
      Modes play an important role in XSLT's declarative programming model and are often required when you have to deal with complex transformations.
Q I'd like to implement a Reverse Polish notation (RPN) calculator in XSLT just for fun, but I'm getting stuck. Here's the sample equation that I'm using:

  <!-- 32 + (5 - (22 * 5)) -->
  <equation>
    <add>
      <number>32</number>
      <subtract>
        <number>5</number>
        <multiply>
          <number>22</number>
          <number>5</number>
        </multiply>
      </subtract>
    </add>
  </equation>

 

A The reason this turns out to be quite difficult is that XSLT variables don't behave like variables in most imperative programming languages. Nevertheless, you'll need to use them to solve this problem along with the help of recursion. Marvin Smit, an attendee at a recent conference at which I appeared, offered an elegant solution that uses a combination of xsl:apply-templates and xsl:value-of instructions to recursively process the equation (see Figure 8).
      I've provided the equation document, this solution, and a procedural solution (which uses named templates and xsl:call-template) in the sample code associated with this column. Enjoy!
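The recursive shape of the problem may be easier to see in a language with ordinary return values. The following is a sketch in Python using the standard xml.etree.ElementTree. It is not Marvin Smit's stylesheet or the downloadable sample; it is just the same idea, in which each operator element evaluates its operands first and then combines them. The function name evaluate and the OPERATIONS table are assumptions of this sketch.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

EQUATION = """
<equation>
  <add>
    <number>32</number>
    <subtract>
      <number>5</number>
      <multiply>
        <number>22</number>
        <number>5</number>
      </multiply>
    </subtract>
  </add>
</equation>
"""

# Element name -> binary operation, covering the elements in the sample equation
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
}


def evaluate(node):
    """Recursively evaluate a number, operator, or equation element."""
    if node.tag == "number":
        return float(node.text)
    if node.tag == "equation":
        (child,) = list(node)  # an equation wraps exactly one expression
        return evaluate(child)
    # Evaluate operands left to right, then fold them with the operator
    return reduce(OPERATIONS[node.tag], (evaluate(child) for child in node))


result = evaluate(ET.fromstring(EQUATION))
print(result)
```

In XSLT the returned value has to travel back through xsl:value-of or a variable binding instead of a return statement, which is where most of the difficulty comes from.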
Q In your January 2001 article, you showed how to use XslTransform by loading an XSLT file from disk. Does XslTransform also support loading an XSLT from a string or an application variable?
A XslTransform's Load method doesn't have an overload that takes a string of XML 1.0; the version of Load that takes a string expects a URL. There is an overload that takes an IXPathNavigable reference. Since XmlDocument implements IXPathNavigable and supports loading XML 1.0 documents from strings (through LoadXml), you can get the job done this way:

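  // requires: using System.Xml; using System.Xml.Xsl;
  XmlDocument doc = new XmlDocument();
  doc.LoadXml(xsltString);   // parse the stylesheet text held in a string
  XslTransform tx = new XslTransform();
  tx.Load(doc);              // Load(IXPathNavigable) overload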

 

Q I have an XSLT program that returns text instead of XML or HTML. When I call Transform on XslTransform and have it return an XmlReader object, I can't get to the data because it is not well-formed XML. How do I get the result of a transform that returns text without writing it to a file? I need it as a string.
A If the output of the XSLT program is not well-formed XML, you cannot use the Transform overload that returns an XmlReader object, as you saw. Instead, you need to supply it with either a Stream or TextWriter object to which it can write the results. Since you want the results as a String, you should supply Transform with a StringWriter object, as shown here:

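  // requires: using System; using System.IO; using System.Xml; using System.Xml.Xsl;
  XmlDocument doc = new XmlDocument();
  doc.Load("input.xml");
  XslTransform tx = new XslTransform();
  tx.Load("transform.xsl");
  // collect the text output in memory instead of a file
  StringWriter sw = new StringWriter();
  tx.Transform(doc.CreateNavigator(), null, sw);
  Console.WriteLine(sw.ToString());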

 

Q I'm hearing a lot more about the Infoset these days—what exactly is it again?
A The Infoset is only the center of the XML universe! As stated in the W3C recommendation (see https://www.w3.org/TR/xml-infoset), the XML Information Set (Infoset) "provides a set of definitions for use in other specifications that need to refer to the information in an XML document." In other words, the Infoset defines the abstract data model that all other XML technologies share (see Figure 9). Essentially, the Infoset is the UML description of an XML document.

Figure 9 Infoset

      The XML 1.0 recommendation defines the syntax used to create XML documents. After XML 1.0 was released, other XML technologies such as DOM, XPath, and XSLT began to evolve. Each of these layered specifications had to infer the logical XML data model from the original specification since it hadn't been codified. As this happened, it became apparent that the various specifications didn't agree on what the implied data model should be.
      As an example, consider the simple question of whether an attribute node should have a parent property. Everyone would agree that attributes are not considered children of an element, but it's not clear how one would navigate from an attribute node back to its owner element (using any given API). In XPath this is possible through the parent axis. In the DOM API, an attribute node's parent property always returns null (although you can navigate back to the element through an ownerElement property). The main purpose of the Infoset is to put an end to all of these disputes once and for all by providing a clear definition of the fundamental data model that these technologies are built upon.
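The DOM half of that comparison can be seen directly. Here is a small sketch using Python's standard xml.dom.minidom, chosen only because it ships with a DOM implementation; the element name and au_id value are sample data.

```python
from xml.dom import minidom

doc = minidom.parseString('<author au_id="172-32-1176"/>')
element = doc.documentElement
attr = element.getAttributeNode("au_id")

# DOM: an attribute node reports no parent...
print(attr.parentNode)

# ...but it still records the element that owns it
print(attr.ownerElement is element)
```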
      The Infoset data model serves as the central rendezvous point for all XML technologies (see Figure 9). When framed this way, XML 1.0 simply becomes a serialization format for the data model. This allows other serialization formats to emerge over time (consider WAP) as long as they also map to the same data model. And if the other specifications (DOM, XPath, and so on) are defined in terms of the data model (and not in terms of XML 1.0), they are shielded from such matters. All other XML technologies and APIs are simply projections of the Infoset onto other languages and programmatic types.
      Besides making things clearer for specification authors, which ultimately improves consistency, the Infoset also facilitates interoperability at higher levels. Consider what's happening when you execute a FOR XML query against SQL Server 2000. In this case, SQL Server 2000 transforms the results into an XML 1.0 string, which then has to be parsed on the client by an XML processor and converted into the Infoset representation that's easy to consume programmatically (see Figure 10). Essentially there are two transformations taking place; one is happening on the server and the other is happening on the client.

Figure 10 Processing with XML Parsing

      If someone wrote an Infoset provider for SQL Server 2000, this would no longer be the case. The XML-based processing code written by the client remains the same, but now SQL Server 2000 is free to return more efficient TDS streams, which are rendered as an XML Infoset through the provider API (see Figure 11). The provider is simply translating API calls to make the data look like XML, although it never really contains angle brackets; it's really just a TDS stream in disguise.

Figure 11 Processing with TDS Streams

      It's actually quite easy to write this type of provider in .NET. You simply extend either XmlReader or XPathNavigator and override the abstract members. If you wrote a custom navigator for the file system, you would then be able to query the file system using XPath expressions and generate file system reports using XSLT. In my September 2001 column, I focused on this topic and provided several custom readers/navigators that help illustrate this concept. Here is an example using my custom FileSystemNavigator:

  FileSystemNavigator fsn = new FileSystemNavigator();
  // find all files in the temp directory
  XPathNodeIterator files = fsn.Select("/mycomputer/c/temp/*");
  while (files.MoveNext()) {
    // process file here ...
  }
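The idea of projecting non-XML data onto the XML data model can be approximated in a few lines. The sketch below is in Python and is much cruder than a real navigator: it copies a directory listing into an in-memory element tree up front instead of translating calls lazily. Note too that the standard xml.etree.ElementTree supports only a subset of XPath. The dir and file element names, the name attribute, and the function directory_to_element are all assumptions of this sketch and are unrelated to the FileSystemNavigator sample.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def directory_to_element(path):
    """Project a directory onto <dir name=...> and <file name=...> elements."""
    node = ET.Element("dir", name=os.path.basename(path))
    with os.scandir(path) as entries:
        ordered = sorted(entries, key=lambda entry: entry.name)
    for entry in ordered:
        if entry.is_dir(follow_symlinks=False):
            node.append(directory_to_element(entry.path))
        else:
            ET.SubElement(node, "file", name=entry.name)
    return node


# Build a throwaway directory tree so the example is self-contained
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "temp"))
    for name in ("a.txt", "b.log"):
        open(os.path.join(base, "temp", name), "w").close()
    root = directory_to_element(base)

# Find all files in the temp directory with a path expression
found = [f.get("name") for f in root.findall("./dir[@name='temp']/file")]
print(found)
```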

 

      Although the Infoset is completely abstract, you'll understand how XML and all of the related technologies work if you internalize it. As my good friend Tim Ewald likes to say, "When learning XML, you must apply the principles that Yoda instilled in his young apprentice, Luke Skywalker—you have to let the Infoset flow through you..." and only then will you realize your true potential as an XML developer.

Send questions and comments for Aaron to xml@microsoft.com.

Aaron Skonnard is an instructor/researcher at DevelopMentor, where he develops the XML and Web Services curriculum. Aaron coauthored Essential XML Quick Reference (Addison-Wesley, 2001) and Essential XML (Addison-Wesley, 2000). Get in touch with Aaron at https://staff.develop.com/aarons.

From the May 2002 issue of MSDN Magazine