
Producing Multiple Outputs from an XSL Transformation

 

Oleg Tkachenko
Multiconn Technologies

June 6, 2003

Summary: Guest author Oleg Tkachenko explains how you can postprocess XSL transformation results into multiple documents using the XslTransform and XmlTextWriter classes in the .NET Framework. (17 printed pages)


Download the XML06162003_sample.exe sample file.

Note   This download requires that the Microsoft .NET Framework 1.0 be installed.
Editor's Note   This month's installment of Extreme XML is written by guest columnist Oleg Tkachenko. Oleg lives in Holon, Israel, and is a Software Engineer for Multiconn Technologies. He has been working closely with XML, and especially XSL, for the last three years, and he is an active contributor to the online community surrounding Microsoft XML products. He can be reached at olegt@multiconn.com.

As far back as January 1999, people were asking for the ability to produce multiple output documents in one XSL transformation. It's natural to want not only one-to-one and many-to-one transformations, but also one-to-many and many-to-many ones: for instance, to produce separate HTML pages for each chapter of a book, or to generate a Web photo gallery from an image catalog and a page template. Unfortunately, the W3C XSLT 1.0 Recommendation, published in November 1999, does not define support for multiple output documents; a single result document is assumed instead.

The W3C XSL Working Group made support for multiple output documents a requirement for the next version of the XSLT language, and accordingly the W3C XSLT 1.1 Working Draft (officially frozen) introduced the notion of a subsidiary result document to meet this requirement. The W3C XSLT 2.0 Working Draft (a work in progress) has gone even further: the distinction between principal and subsidiary result documents has been removed, and all result documents have been given equal status.

So although the story sounds promising for future versions of XSLT, current users of XSLT 1.0, such as users of the System.Xml.Xsl.XslTransform class or MSXML, do not have access to this functionality. To work around this limitation, a number of alternative solutions are usually recommended in the XSLT-related MSDN newsgroups, including:

  • Preprocessing of the transformation input by breaking it into chunks and performing the transformation on each chunk in turn.
  • Creating result-tree fragments and writing them aside using extension functions or extension objects.
  • Postprocessing of the transformation result by splitting it into chunks and writing each chunk as a separate document.

In this article, I'll show how the last of these solutions can be implemented easily and effectively in the .NET Framework using the XslTransform class and a customized XmlTextWriter class.

Postprocessing the Result of XSL Transformation

The idea behind implementing multiple outputs by postprocessing is to introduce an additional custom layer between the XSLT processor and the consumer of the transformation results, where further processing occurs. This additional layer, which I'll call the redirecting layer, is where the multiple output logic is implemented. The XSLT processor still produces a single result document, which contains both the main result document and optional subsidiary result documents marked as such by redirecting instructions. The redirecting instructions are elements that delimit the subsidiary result documents and define where and how to redirect them. The redirecting layer passes the main result document through untouched, but redirects subsidiary result documents to the destinations specified by the redirecting instructions. Figure 1 shows the conceptual processing model.

Figure 1. Conceptual model of multiple output by postprocessing

The Design Process

Now I have to decide how to implement the redirecting instruction and the redirecting layer.

Redirecting Instruction

The redirecting instruction might be implemented as a pair of XML processing instructions marking the start and end of the redirected content. Although this approach looks reasonable, it has some serious flaws. First of all, XML processing instructions cannot have attributes, so it's not easy to embed structured information in a processing instruction and retrieve it again. Secondly, XML processing instructions cannot have children, so they are not well suited to marking a subtree. It's too easy to create a processing instruction pair with ill-formed content in between, or even to forget the closing processing instruction altogether. Worst of all, the result will still be a perfectly well-formed XML document, so the problem won't be detected until the final output redirecting stage.

An XML element is much better suited to serve as the redirecting instruction. To avoid reinventing the wheel, I decided to use the exsl:document XSLT extension element, developed by the EXSLT community initiative, as the redirecting instruction in my implementation. The exsl:document element must belong to the http://exslt.org/common namespace and has the following syntax (only the supported subset of attributes is shown):

Attributes

<exsl:document
    href = { uri-reference }
    method = { "xml" | "text" }
    encoding = { string }
    standalone = { "yes" | "no" }
    doctype-public = { string }
    doctype-system = { string }
    indent = { "yes" | "no" }>
    <!-- Content: template -->
</exsl:document>

exsl:document Element Semantics

Now it's time to formulate some boring semantics, but please stick around. The exsl:document element is used to create a subsidiary result document. The content of the exsl:document element is a template, which is instantiated to create a sequence of nodes. A root node is created with this sequence of nodes as its children, and the tree with this root node represents the subsidiary result document.

The attributes on xsl:output elements do not affect the output of subsidiary result documents. The output of a subsidiary result document is controlled entirely by the attributes on the exsl:document element that was used to create it.

The href attribute, which is the only required attribute of the exsl:document element, specifies where the new result document should be stored. It must be an absolute or relative URI, and it must not have a fragment identifier. The semantics of the remaining attributes are exactly as defined for the xsl:output element in the W3C XSLT 1.0 Recommendation.

Note   The XSLT 1.0 element extension mechanism allows namespaces to be designated as extension namespaces. If a namespace prefix is listed in the extension-element-prefixes attribute of the xsl:stylesheet element, any element in that namespace is treated as an extension element. However, the exsl:document element in my processing model is not a real XSLT extension element; for the XslTransform class it is just a regular literal result element, so the "exsl" namespace prefix should not be listed in the extension-element-prefixes attribute. Listing the prefix would lead to an exception, because the XslTransform class does not support custom extension elements. It is, however, a good idea to put the "exsl" namespace prefix into the exclude-result-prefixes attribute of the xsl:stylesheet element to keep the "exsl" namespace declaration from propagating into the transformation result documents.

Traffic-Controller: the Redirecting Layer

Having defined the syntax and semantics of the redirecting instruction, it's time to describe the design of the redirecting layer.

The XslTransform class, found in the System.Xml.Xsl namespace of the .NET Framework class library, is designed as a flexible component with a rich API: it can output the transformation result to a Stream, TextWriter, or XmlWriter object, or return it as an XmlReader object. Stream and TextWriter are generic, XML-unaware I/O classes, so when outputting to them, XslTransform serializes the result tree down to character-level XML syntax and writes it to the specified Stream or TextWriter object. When outputting to an XmlWriter, however, XslTransform does not serialize the result tree itself, but writes it to the specified XmlWriter object by calling XmlWriter methods. Finally, in the XmlReader case, a new XmlReader object is created and the result of the transformation can be pulled from this object on demand using XmlReader methods.

Of these XslTransform outputs, customizing the XmlTextWriter class (which derives from XmlWriter) seems to be the most effective and clearest choice. This way one gets almost direct access to the transformation result tree and avoids superfluous interim serialization and parsing steps. For this reason, I decided to implement the redirecting layer as a customized XmlTextWriter class named MultiXmlTextWriter. As with any tradeoff, this decision has some drawbacks; see the section entitled Strings Attached for a description.

Having said that, I can refine the conceptual model as shown in Figure 2.

Figure 2. Refined Multiple Output Conceptual Model

Customizing the XmlTextWriter

Anybody experienced with SAX will find that customizing the XmlWriter has much in common with creating a SAX filter. Basically, the writer receives a sequence of method calls that write out XML, and if we want to filter or modify some of those calls, we override the appropriate WriteXXX methods. Also, as usual in push processing, if a task needs more information than a single method call carries, we have to track some of the writer's state between method calls.

The output redirecting logic I want to implement in the MultiXmlTextWriter class is quite simple. Once the exsl:document element start tag is detected in the writing stream, its attributes are picked out and a new writer object (an XmlTextWriter or a StreamWriter, depending on the value of the "method" attribute) is created with the parameters specified in those attributes. Output is then switched to this newly created writer until the end tag of the exsl:document element is encountered.
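A minimal sketch of the "create a new writer" step, under stated assumptions: SubsidiaryTargets, OpenText, and OpenXml are hypothetical names, not part of the sample; missing directories are created up front (the article states that MultiXmlTextWriter creates the target directory if it does not exist); and UTF-8 without a byte order mark is assumed when no encoding attribute is given. It only illustrates the choice between a plain StreamWriter for method="text" and an XmlTextWriter for method="xml".

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helpers for opening the target of an exsl:document element.
public static class SubsidiaryTargets {
    // method="text": plain character output, for example a CSS file.
    public static TextWriter OpenText(string href, Encoding encoding) {
        string fullPath = Path.GetFullPath(href);
        string dir = Path.GetDirectoryName(fullPath);
        if (!string.IsNullOrEmpty(dir))
            Directory.CreateDirectory(dir);   // no-op if it already exists
        // Assumption: UTF-8 without a byte order mark when no encoding is given.
        return new StreamWriter(fullPath, false, encoding ?? new UTF8Encoding(false));
    }

    // method="xml" (the default): an XmlTextWriter over the same kind of file.
    public static XmlTextWriter OpenXml(string href, Encoding encoding, bool indent) {
        XmlTextWriter writer = new XmlTextWriter(OpenText(href, encoding));
        if (indent)
            writer.Formatting = Formatting.Indented;   // indent="yes"
        return writer;
    }
}
```

Returning the XML case as an XmlTextWriter layered over the same TextWriter keeps directory creation and encoding handling in one place for both output methods.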

To implement this logic I need to be able to:

  1. Detect the exsl:document element start tag.
  2. Pick up exsl:document element attributes.
  3. Detect the start of exsl:document element content.
  4. Redirect output.
  5. Detect the exsl:document element end tag.

To accomplish these tasks, I also need a simple state machine with four states:

  • Relaying: the output is being relayed further untouched (default state).
  • Redirecting: the output is being redirected (that is, exsl:document element content is written).
  • WritingRedirectElementAttrs: exsl:document element attributes are being written.
  • WritingRedirectElementAttrValue: the exsl:document element attribute value is being written.

Detecting the exsl:document element start tag is straightforward. I just need to override the WriteStartElement method to check whether the element's local name is "document" and its namespace URI is "http://exslt.org/common":

public override void WriteStartElement(string prefix, string localName, string ns) {
    if (ns == "http://exslt.org/common" && localName == "document") {
        // Detected exsl:document element start tag
        ...
    }
    ...
}

Once the exsl:document element start tag is detected, I can expect exsl:document element attributes to be written next, so the state machine should be switched to the WritingRedirectElementAttrs state.

Detecting the exsl:document element end tag is a bit trickier, but still fairly straightforward. I have to track the depth level while writing the exsl:document element content: the depth level is incremented on each WriteStartElement call and decremented on each WriteEndElement call, so returning to the initial depth level in WriteEndElement means the end tag I'm looking for has been encountered.
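The depth bookkeeping can be sketched as a tiny helper. RedirectDepth is a hypothetical name, not a class from the sample; the sketch assumes the counter is reset when the exsl:document start tag is seen, and that WriteFullEndElement calls are reported the same way as WriteEndElement calls.

```csharp
// Hypothetical helper: tracks element depth inside an exsl:document element.
public sealed class RedirectDepth {
    private int depth;

    // Call when the exsl:document start tag itself is detected.
    public void Reset() { depth = 0; }

    // Call on every WriteStartElement inside the redirected content.
    public void OnStartElement() { depth++; }

    // Call on every WriteEndElement (or WriteFullEndElement) while redirecting.
    // Returns true when the call closes the exsl:document element itself,
    // that is, when the depth is back at its initial value.
    public bool OnEndElement() {
        if (depth == 0)
            return true;
        depth--;
        return false;
    }
}
```

Because only the count matters, nested elements with any names (including other elements named "document") inside the redirected content do not confuse the end-tag detection.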

Picking up the exsl:document element attributes requires more state fiddling. Once the WriteStartAttribute method is called in the WritingRedirectElementAttrs state, I save the attribute name and switch the state machine to the WritingRedirectElementAttrValue state; the next WriteString call then delivers the attribute value. The WriteEndAttribute method switches the state machine back to the WritingRedirectElementAttrs state. In outline:

public override void WriteStartAttribute(string prefix, string localName, string ns) {
    if (redirectState == RedirectState.WritingRedirectElementAttrs) {
        // Remember which exsl:document attribute is being written
        redirectState = RedirectState.WritingRedirectElementAttrValue;
        currentAttributeName = localName;
        ...
    }
    ...
}
public override void WriteString(string text) {
    if (redirectState == RedirectState.WritingRedirectElementAttrValue) {
        // The text is the value of the current exsl:document attribute
        switch (currentAttributeName) {
            case "href":
                state.Href = text;
                break;
            ...
        }
        ...
    }
    ...
}
public override void WriteEndAttribute() {
    if (redirectState == RedirectState.WritingRedirectElementAttrValue) {
        // Attribute finished: expect another attribute or the content
        redirectState = RedirectState.WritingRedirectElementAttrs;
        ...
    }
    ...
}

Then, detecting the start of the exsl:document element content can be based on the idea that in the WritingRedirectElementAttrs state, every WriteXXX method call other than WriteStartAttribute means that writing of attributes is over and writing of the content has started. Once this is detected, the state machine should be switched to the Redirecting state.

To get a better understanding of how it works, look at Figure 3 below, which shows the sequence of states during typical processing of the exsl:document element:

Figure 3. Sequence of states during the processing of the exsl:document element
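The transitions shown in Figure 3 can be expressed as a small pure function, which makes them easy to check in isolation. The RedirectState names come from the list of states above; the WriterCall enum (which groups XmlWriter calls into the events that drive the machine) and the RedirectStateMachine class are hypothetical. The sketch assumes the caller has already used the depth counter to decide whether a WriteEndElement call closes the exsl:document element.

```csharp
// The four states of the redirecting layer.
public enum RedirectState {
    Relaying,
    Redirecting,
    WritingRedirectElementAttrs,
    WritingRedirectElementAttrValue
}

// Hypothetical grouping of XmlWriter calls into state machine events.
public enum WriterCall {
    StartRedirectElement, // WriteStartElement for exsl:document
    StartAttribute,       // WriteStartAttribute
    Text,                 // WriteString
    EndAttribute,         // WriteEndAttribute
    Content,              // any other WriteXXX call
    EndRedirectElement    // WriteEndElement that closes exsl:document
}

public static class RedirectStateMachine {
    public static RedirectState Next(RedirectState state, WriterCall call) {
        switch (state) {
            case RedirectState.Relaying:
                return call == WriterCall.StartRedirectElement
                    ? RedirectState.WritingRedirectElementAttrs
                    : RedirectState.Relaying;
            case RedirectState.WritingRedirectElementAttrs:
                if (call == WriterCall.StartAttribute)
                    return RedirectState.WritingRedirectElementAttrValue;
                // An empty exsl:document element ends right away.
                if (call == WriterCall.EndRedirectElement)
                    return RedirectState.Relaying;
                // Any other call: the attributes are over, content has started.
                return RedirectState.Redirecting;
            case RedirectState.WritingRedirectElementAttrValue:
                return call == WriterCall.EndAttribute
                    ? RedirectState.WritingRedirectElementAttrs
                    : RedirectState.WritingRedirectElementAttrValue;
            default: // RedirectState.Redirecting
                return call == WriterCall.EndRedirectElement
                    ? RedirectState.Relaying
                    : RedirectState.Redirecting;
        }
    }
}
```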

And finally, output redirecting is a piece of cake once the state machine is in place. I need to override the base XmlTextWriter methods that write content (WriteComment, for example), and in each of them check whether we are currently in the Redirecting state and accordingly invoke either the current writer's namesake method or the base class (XmlTextWriter) namesake method:

public override void WriteComment(string text) {
    if (redirectState == RedirectState.Redirecting)
        state.Writer.WriteComment(text);
    else
        base.WriteComment(text);
}
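Putting the pieces together, a deliberately reduced redirecting writer might look like the following. This is a hypothetical sketch, not the sample's MultiXmlTextWriter: it handles only elements, attributes, and text, ignores every exsl:document attribute except href, does not support nested exsl:document elements, and takes an assumed openTarget callback that maps an href to a TextWriter (so it can be exercised with StringWriter objects instead of files).

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical, reduced redirecting writer (elements, attributes, text only).
public class MiniRedirectingWriter : XmlTextWriter {
    private const string ExslNs = "http://exslt.org/common";
    private enum Mode { Relaying, Redirecting, Attrs, AttrValue }

    private readonly Func<string, TextWriter> openTarget; // assumed hook
    private Mode mode = Mode.Relaying;
    private string currentAttr;
    private string href;
    private int depth;            // element depth inside exsl:document
    private XmlTextWriter sub;    // writer for the current subsidiary document

    public MiniRedirectingWriter(TextWriter main, Func<string, TextWriter> openTarget)
        : base(main) {
        this.openTarget = openTarget;
    }

    // Called on the first non-attribute call after the exsl:document start tag.
    private void BeginRedirect() {
        if (href == null)
            throw new InvalidOperationException("exsl:document requires an href attribute");
        sub = new XmlTextWriter(openTarget(href));
        depth = 0;
        mode = Mode.Redirecting;
    }

    private void EndRedirect() {
        sub.Close();              // also closes the target TextWriter
        sub = null;
        href = null;
        mode = Mode.Relaying;
    }

    public override void WriteStartElement(string prefix, string localName, string ns) {
        if (mode == Mode.Attrs) BeginRedirect();
        if (mode == Mode.Redirecting) {
            depth++;
            sub.WriteStartElement(prefix, localName, ns);
        } else if (ns == ExslNs && localName == "document") {
            mode = Mode.Attrs;    // swallow the tag, start collecting attributes
        } else {
            base.WriteStartElement(prefix, localName, ns);
        }
    }

    public override void WriteEndElement() { EndElement(false); }
    public override void WriteFullEndElement() { EndElement(true); }

    private void EndElement(bool full) {
        if (mode == Mode.Attrs) BeginRedirect();       // empty exsl:document
        if (mode == Mode.Redirecting) {
            if (depth == 0) { EndRedirect(); return; } // closes exsl:document
            depth--;
            if (full) sub.WriteFullEndElement(); else sub.WriteEndElement();
        } else if (full) {
            base.WriteFullEndElement();
        } else {
            base.WriteEndElement();
        }
    }

    public override void WriteStartAttribute(string prefix, string localName, string ns) {
        if (mode == Mode.Attrs) {
            currentAttr = localName;
            mode = Mode.AttrValue;
        } else if (mode == Mode.Redirecting) {
            sub.WriteStartAttribute(prefix, localName, ns);
        } else {
            base.WriteStartAttribute(prefix, localName, ns);
        }
    }

    public override void WriteEndAttribute() {
        if (mode == Mode.AttrValue) mode = Mode.Attrs;
        else if (mode == Mode.Redirecting) sub.WriteEndAttribute();
        else base.WriteEndAttribute();
    }

    public override void WriteString(string text) {
        if (mode == Mode.AttrValue) {
            if (currentAttr == "href") href += text;   // value may arrive in pieces
            return;
        }
        if (mode == Mode.Attrs) BeginRedirect();
        if (mode == Mode.Redirecting) sub.WriteString(text);
        else base.WriteString(text);
    }
}
```

For example, writing a root element, then an exsl:document element with href="a.xml" that contains <page>hello</page>, then another element through this writer leaves the main output free of any exsl markup, while the TextWriter returned for "a.xml" receives <page>hello</page>.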

Source Code

You can find the thoroughly commented MultiXmlTextWriter sources in the src directory of the article sample code. They consist of two classes in the MultiOutput namespace: the MultiXmlTextWriter class itself, which implements the design described above, and the OutputState class, which holds the output state properties (the current writer and current depth level, along with all of the subsidiary result document properties defined by the exsl:document element attributes).
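For illustration, the per-document settings could be held in a simple class like the following. OutputStateSketch is hypothetical and its field names are assumptions; the real OutputState class may differ, and the current writer is omitted here. Defaults follow the xsl:output defaults for method="xml" (UTF-8 encoding, no indentation), and attributes outside the supported subset are simply ignored.

```csharp
using System.Text;

// Hypothetical holder for the properties of one subsidiary result document.
public sealed class OutputStateSketch {
    public string Href;
    public string Method = "xml";
    public Encoding Encoding = new UTF8Encoding(false);
    public bool Indent;              // indent="no" by default
    public string Standalone;        // null means "not specified"
    public string DoctypePublic;
    public string DoctypeSystem;
    public int Depth;                // element depth inside exsl:document

    // Stores one exsl:document attribute, as WriteString would do in the
    // WritingRedirectElementAttrValue state.
    public void SetAttribute(string name, string value) {
        switch (name) {
            case "href": Href = value; break;
            case "method": Method = value; break;
            case "encoding": Encoding = System.Text.Encoding.GetEncoding(value); break;
            case "indent": Indent = value == "yes"; break;
            case "standalone": Standalone = value; break;
            case "doctype-public": DoctypePublic = value; break;
            case "doctype-system": DoctypeSystem = value; break;
            default: break;          // unsupported attributes are ignored
        }
    }
}
```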

Also take a look at the MultiXmlTextWriter documentation in the "samples\xmldoc\doc" directory. This documentation was generated from the C# XML documentation file using MultiXmlTextWriter itself. See Real World Example: Generating MSDN-style Documentation from C# XML Documentation Files below for more.

In addition, the latest version of the MultiXmlTextWriter (I believe there is still much room for its enhancement) can be downloaded from the GotDotNet site.

MultiXmlTextWriter Usage Pattern

Now I'm going to show you how the MultiXmlTextWriter class can be used. The MultiXmlTextWriter class extends XmlTextWriter, which in turn derives from XmlWriter, so MultiXmlTextWriter instances can be passed directly to the XslTransform.Transform() overload that accepts an XmlWriter as the object to which the transformation result is written.

Note   Generally speaking, the MultiXmlTextWriter class is not limited to use with XslTransform, and in a similar manner can be used to split any XML stream.

Here is a typical usage scenario in C# code:

namespace MultiOutput.Test {
    using System;
    using System.Xml.XPath;
    using System.Xml.Xsl;
    using System.Xml;
    using System.Text;
    using System.IO;
    using MultiOutput;

    /// <summary>
    /// <c>MultiXmlTextWriter</c> test class.
    /// </summary>
    /// <remarks>Usage:<br/>
    /// <code>MultiOutTransform.exe source stylesheet result</code>
    /// </remarks>
    public class MultiOutTransform {
        static void Main(string[] args) {
            if (args.Length != 3) {
                Console.Error.WriteLine("Usage: MultiOutTransform.exe source stylesheet result");
                return;
            }
            MultiXmlTextWriter multiWriter = null;
            try {
                XPathDocument doc = new XPathDocument(args[0]);
                XslTransform xslt = new XslTransform();
                xslt.Load(args[1]);
                multiWriter = new MultiXmlTextWriter(args[2], Encoding.UTF8);
                multiWriter.Formatting = Formatting.Indented;
                xslt.Transform(doc, null, multiWriter);
            } catch (Exception e) {
                Console.Error.WriteLine("Transformation failed, an error has occurred:");
                Console.Error.WriteLine(e);
            } finally {
                // Flush and close the main result document
                if (multiWriter != null)
                    multiWriter.Close();
            }
        }
    }
}

You can use the above MultiOutTransform class as a very primitive XSLT command line utility that nevertheless supports multiple output. It requires three command line arguments: the source XML filename, the XSLT stylesheet filename, and the main result document filename. The samples below are run using MultiOutTransform.exe.

Example: Invoice Processing

As a simple example of using multiple output documents in XSLT, consider a hypothetical invoice processing application that takes an invoice document as input, produces an HTML confirmation page for the client, and simultaneously sends SOAP messages to an order processing Web service. All files of this example can be found in the "samples\order" directory.

Below is a dummy input invoice document:

invoice.xml

<invoice>
    <item>Wallabee</item>
    <item>Wombat</item>
    <item>Wren</item>
</invoice>

The following XSLT stylesheet does the job. It produces the HTML confirmation page as a main result document and builds SOAP messages as additional result documents in a different directory:

invoice-processor.xsl

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">
    <xsl:template match="/">
        <!-- Main result document - confirmation -->
        <html>
            <head>
                <title>Thank you for purchasing!</title>
            </head>
            <body>
                <h2>Thank you for purchasing at fabrikam.com!</h2>
            </body>                        
            <xsl:apply-templates mode="order"/>
        </html>
    </xsl:template>
    <xsl:template match="invoice" mode="order">
        <!-- Additional result document - SOAP message for fabrikam.com
        order processing web service -->
        <exsl:document href="soap/order.xml" indent="yes">
            <soap:Envelope 
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
                <soap:Body>                    
                    <ns:Order xmlns:ns="urn:fabrikam-com:orders">
                        <xsl:apply-templates mode="order"/>
                    </ns:Order>
                </soap:Body>
            </soap:Envelope>
        </exsl:document>
    </xsl:template>
    <xsl:template match="item" mode="order">
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>

When run against invoice.xml using the aforementioned MultiOutTransform.exe utility, which wraps the standard XslTransform class and MultiXmlTextWriter, this stylesheet produces two result documents: confirmation.html, and order.xml in the "soap" directory (note that MultiXmlTextWriter creates this directory if it does not exist):

MultiOutTransform.exe invoice.xml invoice-processor.xsl confirmation.html

Main result document: confirmation.html

<html>
  <head>
    <title>Thank you for purchasing!</title>
  </head>
  <body>
    <h2>Thank you for purchasing at fabrikam.com!</h2>
  </body>
</html>

Subsidiary result document: order.xml

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns:Order xmlns:ns="urn:fabrikam-com:orders">
      <item>Wallabee</item>
      <item>Wombat</item>
      <item>Wren</item>
    </ns:Order>
  </soap:Body>
</soap:Envelope>

Real World Example: Generating MSDN-style Documentation from C# XML Documentation Files

Now it's time to demonstrate how the MultiXmlTextWriter class can be used in real world applications. The C# language has an extremely useful feature: you can document the code you write using XML documentation comments, and the C# compiler (with the /doc option) extracts those comments into an XML file. I'll show you how the MultiXmlTextWriter class can be used to generate multipage HTML documentation in MSDN style from this XML file.

The XML documentation file contains a flat list of all the code elements tagged for documentation, in the following format:

<doc>
    <assembly>
        <name>AssemblyName</name>
    </assembly>
    <members>
        <member name="MemberID">
            ... documentation comments ...
        </member>    
        ...
    </members>
</doc>

Here AssemblyName is the assembly name and MemberID is a unique ID string, generated by the C# compiler, that identifies the member. This ID string contains information about the member type (class, property, method, and so on), the fully qualified name of the member starting at the root of the namespace, and the method parameters if the member is a method.
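To make the ID string format concrete: the compiler prefixes each ID with a kind letter and a colon (for example T: for types, M: for methods and constructors, P: for properties), constructors are named #ctor, and parameter types appear in parentheses. A simplified parser using a hypothetical MemberId class might look like this. It does not handle generic types or other IDs whose parameter lists contain nested commas, so treat it as a sketch rather than a complete implementation.

```csharp
using System;

// Hypothetical, simplified parser for C# documentation ID strings such as
// "M:MultiOutput.MultiXmlTextWriter.#ctor(System.String,System.Text.Encoding)".
public sealed class MemberId {
    public readonly char Kind;          // N, T, F, P, M or E
    public readonly string Container;   // parent namespace (types) or declaring type
    public readonly string Name;        // simple name; "#ctor" for constructors
    public readonly string[] Parameters;

    private MemberId(char kind, string container, string name, string[] parameters) {
        Kind = kind; Container = container; Name = name; Parameters = parameters;
    }

    public static MemberId Parse(string id) {
        if (id == null || id.Length < 3 || id[1] != ':')
            throw new FormatException("Not a documentation ID: " + id);
        string rest = id.Substring(2);
        string[] parameters = new string[0];
        int open = rest.IndexOf('(');
        if (open >= 0) {
            int close = rest.LastIndexOf(')');
            if (close < open)
                throw new FormatException("Unbalanced parameter list: " + id);
            string list = rest.Substring(open + 1, close - open - 1);
            if (list.Length > 0)
                parameters = list.Split(',');   // no nested generic arguments
            rest = rest.Substring(0, open);
        }
        // The last dot separates the simple name from its container.
        int dot = rest.LastIndexOf('.');
        string container = dot >= 0 ? rest.Substring(0, dot) : "";
        string name = dot >= 0 ? rest.Substring(dot + 1) : rest;
        return new MemberId(id[0], container, name, parameters);
    }
}
```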

MSDN class library documentation usually includes:

  • Namespace summary page for each namespace
  • Class overview page for each class
  • Class members page for each class
  • Constructors page for each class
  • Constructor page for each constructor
  • Method page for each method
  • Property page for each property

I'm going to generate all these pages and an additional CSS file, frameset index page, namespace list page, "all classes" page, and class list page for each namespace. All files of this example can be found in the samples\xmldoc directory.

The XSLT stylesheet xmldoc.xsl (located in the samples\xmldoc\xslt directory), which generates all of the above pages from a single XML documentation file, is straightforward. It creates a frameset as the main result document and generates a CSS file containing style information (note that CSS is text rather than XML, so the CSS output document must be generated with the method="text" attribute on the exsl:document element). Then the source XML documentation tree is transformed into a temporary hierarchical tree with all ID strings parsed, and this hierarchical tree is processed several times to produce all the specialized pages listed above.

The snippet below illustrates how a constructor page is created:

<xsl:template match="constructor" mode="class">
    <!-- The namespace path (e.g. Acme/Foo/Bar) -->
    <xsl:variable name="ns-path" select="translate(@namespace,'.','/')" />
    <!-- Constructor page -->
    <exsl:document href="{$ns-path}/{@class}-ctor{position()}.html" indent="yes">
        <!-- Page title -->
        <xsl:variable name="title">
            <xsl:value-of select="@class" /> Constructor (<xsl:for-each select="params/param">
                <xsl:value-of select="@name" />
                <xsl:if test="position() != last()">, </xsl:if>
            </xsl:for-each>)</xsl:variable>
        <html>
            <head>
                <!-- Script to set frameset title onload -->
                <script type="text/javascript">
                        function asd(){
                            parent.document.title="<xsl:value-of select="$title" />";
                        }
                    </script>
                <title>
                    <xsl:value-of select="$title" />
                </title>
                <link rel="stylesheet" type="text/css">
                    <!-- Relative path to the CSS file -->
                    <xsl:attribute name="href">
                        <xsl:call-template name="root-path-gen">
                            <xsl:with-param name="path" select="$ns-path" />
                        </xsl:call-template>xmldoc.css</xsl:attribute>
                </link>
            </head>
            <body topmargin="0" onload="asd()">
                <!-- Page header -->
                <xsl:call-template name="header-gen">
                    <xsl:with-param name="text" select="$title" />
                </xsl:call-template>
                <div id="nstext" valign="bottom">
                    <p>
                        <xsl:apply-templates select="summary" />
                    </p>
                    <!-- Constructor parameters -->
                    <xsl:if test="param">
                        <h4 class="dtH4">Parameters</h4>
                        <dl>
                            <xsl:for-each select="param">
                                <dt>
                                    <i>
                                        <xsl:value-of select="@name" />
                                    </i>
                                </dt>
                                <dd>
                                    <xsl:apply-templates />
                                </dd>
                            </xsl:for-each>
                        </dl>
                    </xsl:if>
                    <!-- Exceptions -->
                    <xsl:if test="exception">
                        <h4 class="dtH4">Exceptions</h4>
                        <div class="tablediv">
                            <table cellspacing="0" class="dtTABLE">
                                <tr valign="top">
                                    <th width="50%">Exception Type</th>
                                    <th width="50%">Condition</th>
                                </tr>
                                <xsl:for-each select="exception">
                                    <tr valign="top">
                                        <td width="50%">
                                            <xsl:value-of select="@name" />
                                        </td>
                                        <td width="50%">
                                            <xsl:apply-templates />
                                        </td>
                                    </tr>
                                </xsl:for-each>
                            </table>
                        </div>
                    </xsl:if>
                    <xsl:apply-templates select="remarks" />
                    <xsl:apply-templates select="example" />
                    <h4 class="dtH4">See Also</h4>
                    <p>
                        <a href="{@class}.html">
                            <xsl:value-of select="@class" /> Class</a> |
                        <a href="{@class}-members.html">
                            <xsl:value-of select="@class" /> Members</a> |
                        <a href="namespace-summary.html">
                            <xsl:value-of select="@namespace" /> Namespace</a>
                    </p>
                    <xsl:call-template name="footer-gen" />
                </div>
            </body>
        </html>
    </exsl:document>
</xsl:template>

As you can see, the results can be a bit overwhelming. For instance, from the single MultiXmlTextWriter.xml documentation file, generated by the C# compiler from the MultiXmlTextWriter sources, the xmldoc.xsl stylesheet generates complete MSDN-style documentation consisting of 52 HTML pages and a CSS file. And all this is done in just one XSL transformation run:

MultiOutTransform.exe MultiXmlTextWriter.xml xslt\xmldoc.xsl doc\index.html 

Below is the resulting HTML-frameset page with the MultiXmlTextWriter Constructor page loaded into the right frame:

Figure 4. Generated MSDN-style documentation with the MultiXmlTextWriter constructor page

Strings Attached

Of course, the demonstrated approach has some drawbacks that I have to mention. Speed, a low memory footprint, and simplicity were all important to me, so I decided to customize the XmlTextWriter class to do the redirecting job. This, however, unconditionally assumes that the transformation output is XML, so there is no way to produce real HTML (not XHTML) result documents. More specifically, the main result document is always XML, while subsidiary result documents may be written either as XML or as text, depending on the method attribute value of the corresponding exsl:document element.

Moreover, the xsl:output element is ignored. This affects only the main result document, though, because, as I said earlier, the xsl:output element does not affect subsidiary result documents; their output is controlled entirely by the exsl:document element. Instead, you can get some control over the main result document through the encoding argument of the MultiXmlTextWriter constructor and the properties inherited from the XmlTextWriter class, such as Formatting and Indentation.

The disable-output-escaping feature is ignored, as it always is when an XSL transformation writes to an XmlWriter.

These restrictions are not extremely stringent, and even where they don't fit someone's requirements, implementing the redirecting layer in a different way can easily make them customizable.

At the same time, the demonstrated approach also has some additional virtues. First of all, it shares multiple output semantics with the XSLT 1.1 and XSLT 2.0 Working Drafts, as well as with other XSLT processors that support multiple output through an extension element. This effectively means that a solution based on this implementation can be easily ported to another XSLT processor and that it will be compatible with the future XSLT 2.0 model. Moreover, even now such a solution benefits from using the exsl:document element as defined by the EXSLT initiative, being compatible with all XSLT processors that support the exsl:document extension element.

Acknowledgements

First of all, thanks to Dare Obasanjo for inspiring and reviewing this article, especially for the great idea of MSDN-style documentation generation. The invoice processing example was inspired by Kurt Cagle's Web log.

Thanks to Michael Kay for his help in my small historical investigation and all his ongoing work on XSLT technology development. Also, thanks to Jeni Tennison, Uche Ogbuji, and other EXSLT initiative supporters for their efforts to standardize and document extensions to XSLT language.

Feel free to post any questions or comments about this article on the Extreme XML message board on GotDotNet.

© 2014 Microsoft