Solving the SAX Puzzle: Off-the-Shelf XML Processing
Eldar A. Musayev, Ph.D.
Microsoft Corporation
December 2000
Summary: This article shows how to capitalize on component-based design by considering a simple SAX program that utilizes this advantage. (4 printed pages)

Figure 1. SAX program and its various components
SAX (Simple API for XML) is usually regarded as a low-level, high-performance interface for parsing XML. However, it offers another important advantage: component-based design. This article shows how to capitalize on that design by walking through a simple SAX program built from components.
Using an XML Filter
When you use SAX to read an XML document, the reader fires a series of events as it encounters content: the start of an element, the end of an element, character content, a processing instruction, and so on. To receive these events, the application provides handlers, and the reader calls the matching handler each time it encounters the corresponding XML content.

Figure 2. The SAX reader calls handlers when it encounters appropriate XML content
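As a minimal sketch of the reader/handler relationship, the following uses Python's standard xml.sax module as a stand-in for the MSXML reader discussed in this article. The EventLogger class, the log_events helper, and the sample order document are hypothetical names introduced only for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Hypothetical handler: records each SAX event the reader reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A reader may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content.strip()))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


def log_events(xml_bytes):
    """Parse the document and return the list of events the handler received."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


# Made-up sample document, used only to exercise the handler.
SAMPLE_ORDER = b'<order id="17"><item sku="A1">Widget</item></order>'
```

Calling log_events(SAMPLE_ORDER) returns the events in document order, which is all an application ever sees of the document when it reads through SAX.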
Now, imagine a component that implements all the SAX handlers, just like the application does. At the same time, this component also acts as a SAX reader by passing the events it receives on to the handlers implemented by other applications or components. Such a component behaves like an escrow agent in the real world: it looks like a buyer to the actual seller (paying money and receiving merchandise), while looking like a seller to the actual buyer (receiving money and supplying merchandise).

Figure 3. Escrow agent example
Unlike a real-world escrow agent, this component, or SAX XML filter, doesn't handle the money exchanged between buyers and sellers. Instead, a SAX XML filter handles the events exchanged between other SAX readers and handlers. In essence, the escrow agent role played by a SAX XML filter is to look like a handler to a reader, while also looking like a reader to another handler.

Figure 4. The SAX XML filter handles events exchanged between other SAX readers and handlers
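The dual role of a filter can be sketched with Python's xml.sax.saxutils.XMLFilterBase, which is used here only as an analogue of an MSXML filter component. The ElementCounter class and the run_through_filter helper are hypothetical; the filter does a small piece of work (counting elements) and then forwards each event unchanged.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class ElementCounter(XMLFilterBase):
    """Hypothetical filter: counts elements, then forwards every event."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.count = 0

    def startElement(self, name, attrs):
        # Handler side: the upstream reader calls this method.
        self.count += 1
        # Reader side: pass the event on to whatever handler is downstream.
        super().startElement(name, attrs)


def run_through_filter(xml_bytes):
    """Wire reader -> filter -> writer and return (element count, output XML)."""
    reader = xml.sax.make_parser()
    counter = ElementCounter(reader)       # the filter's parent is the reader
    out = io.StringIO()
    counter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    counter.parse(io.BytesIO(xml_bytes))   # the filter is driven like a reader
    return counter.count, out.getvalue()
```

Events that the filter does not override (characters, endElement, and so on) pass through untouched, because XMLFilterBase forwards them by default.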
In the real world, you wouldn't want several escrow agents between you and a buyer or seller. Yet in SAX, several such agents in a row can be quite useful, because each one does not just pass SAX events along but also does some work on the way. This lets you separate the steps of a business process into different components. Thus, using SAX and XML filters gives you the advantages of both an object-oriented, component-based design and a low-level, high-performance parser.
Creating a Simple Program
Suppose that you write a server application that processes orders. It receives an order from a customer, expands the items on the order, checks the inventory, calculates the tax, calculates the shipping and handling (S&H), charges the credit card, sends the order for fulfillment (if successful), and generates an order confirmation in the form of a "Thank You" page. Meanwhile, it also logs the order.
With the appropriate SAX components, this is quite easy. You piece several of these components together like a puzzle. With each piece doing its own work, the whole task gets solved.

Figure 5. SAX program that utilizes component-based design
In Microsoft® Visual Basic®, such a program might look like:
Sub ProcessOrder(inStream As IStream, outStream As IStream)
    ' Create components
    Dim reader As New SAXXMLReader
    Dim filter0 As New XMLWriter
    Dim filter1 As New ExpandItems, filter2 As New CalculateTaxSH
    Dim filter3 As New ChargeCreditCard
    Dim filter4 As New XMLWriter, filter5 As New XMLWriter
    Dim filter6 As New XMLWriter
    ' Connect them: each filter takes its events from the component before it
    Set filter0.parent = reader
    Set filter1.parent = filter0
    ...
    Set filter6.parent = filter5
    ' Set parameters (Set is needed only when assigning an object reference)
    filter0.output = "raw_order.log"
    filter4.output = "order.log"
    filter5.output = "http://fulfillment/order.asp"
    Set filter6.output = outStream
    ' Ready... go! Parsing through the last filter drives the whole chain
    filter6.parse inStream
End Sub
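The same wiring pattern can be sketched in runnable form with Python's xml.sax module. This is an analogue under stated assumptions, not the MSXML program itself: TeeWriter stands in for the XMLWriter stages that log a copy of the stream, MarkProcessed is a placeholder for the business filters, and all class, function, and attribute names are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class TeeWriter(XMLFilterBase):
    """Hypothetical logging stage: writes each event to `log`, then forwards it."""

    def __init__(self, parent, log):
        super().__init__(parent)
        self._writer = XMLGenerator(log, encoding="utf-8")

    def startDocument(self):
        self._writer.startDocument()
        super().startDocument()

    def endDocument(self):
        self._writer.endDocument()
        super().endDocument()

    def startElement(self, name, attrs):
        self._writer.startElement(name, attrs)
        super().startElement(name, attrs)

    def endElement(self, name):
        self._writer.endElement(name)
        super().endElement(name)

    def characters(self, content):
        self._writer.characters(content)
        super().characters(content)


class MarkProcessed(XMLFilterBase):
    """Placeholder business step: adds status="processed" to <order>."""

    def startElement(self, name, attrs):
        if name == "order":
            attrs = AttributesImpl({**dict(attrs.items()), "status": "processed"})
        super().startElement(name, attrs)


def process_order(in_stream, out_stream, raw_log, order_log):
    """Chain reader -> raw log -> business step -> order log -> output."""
    reader = xml.sax.make_parser()
    stage0 = TeeWriter(reader, raw_log)      # sees the order as received
    stage1 = MarkProcessed(stage0)
    stage2 = TeeWriter(stage1, order_log)    # sees the order after processing
    stage2.setContentHandler(XMLGenerator(out_stream, encoding="utf-8"))
    # Calling parse on the last stage pulls events through every earlier stage.
    stage2.parse(in_stream)
```

Each stage knows only its parent, so stages can be reordered, removed, or replaced without touching the others.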
Creating such a program is both simple and challenging. For the program to work, every component it uses must actually exist. Luckily, you don't have to implement them all yourself. You have two options before taking component development in-house:
- You can use off-the-shelf components created by Microsoft, your business partner, your software provider, or another third party. The components provided with the Microsoft XML Parser (MSXML) are a good example of off-the-shelf components.
- You can outsource the development of custom components to an external consulting company or freelancer.
In the sample program shown earlier, one of the off-the-shelf components that you can use is the SAX reader, SAXXMLReader, provided with MSXML. Another component would be an XML writer that generates the necessary XML and outputs it to a file, HTTP POST, or IStream. With a few more lines of code, the XML writer provided with MSXML can do all of this.
However, this is not just about Microsoft-supplied components. It seems likely that a number of XML writers will become available, providing output to different destinations and in different formats, along with a number of readers able to take input from different sources and in different formats. For example, the credit card company that services your merchant account may provide its own ChargeCreditCard component that utilizes its own reader and writer interfaces.
Given these off-the-shelf solutions from Microsoft and others, you would typically only have to implement a few components on your own. For example, the sample Visual Basic program shown earlier only requires you to implement the item expansion (ExpandItems) and tax and S&H (CalculateTaxSH) components. However, you don't necessarily have to implement those components if you don't want to. Because SAX is a standard API and XML is a standard format, it's easy to subcontract these components to any freelance developer in the world. Of course, if you do that, you might be missing out on the fun of writing them yourself.
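To give a feel for how small such a custom component can be, here is a hedged sketch of a tax-calculating filter written against Python's xml.sax module rather than MSXML. The order vocabulary (item elements with price and qty attributes, a tax element), the 8% default rate, and the names CalculateTax and add_tax are all assumptions made for illustration; the article does not define the actual order format.

```python
import io
import xml.sax
from decimal import Decimal
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class CalculateTax(XMLFilterBase):
    """Hypothetical filter: sums item prices and appends <tax> to <order>."""

    def __init__(self, parent=None, rate=Decimal("0.08")):
        super().__init__(parent)
        self.rate = rate                  # assumed flat tax rate
        self.subtotal = Decimal("0")

    def startDocument(self):
        self.subtotal = Decimal("0")      # reset for each new document
        super().startDocument()

    def startElement(self, name, attrs):
        if name == "item":
            # Assumed attributes: price (required) and qty (defaults to 1).
            self.subtotal += Decimal(attrs["price"]) * int(attrs.get("qty", "1"))
        super().startElement(name, attrs)

    def endElement(self, name):
        if name == "order":
            # Inject extra events just before the order element closes.
            tax = (self.subtotal * self.rate).quantize(Decimal("0.01"))
            super().startElement("tax", AttributesImpl({}))
            super().characters(str(tax))
            super().endElement("tax")
        super().endElement(name)


def add_tax(xml_bytes, rate=Decimal("0.08")):
    """Run one document through the filter and return the resulting XML text."""
    out = io.StringIO()
    tax_filter = CalculateTax(xml.sax.make_parser(), rate)
    tax_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    tax_filter.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Because the filter only consumes and produces SAX events, whoever writes it needs to know the order vocabulary but nothing about the rest of the pipeline.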
Furthermore, you don't have to limit your use of SAX components to XML data. You can apply these components to any data that can be represented as XML. For example, there could be a reader that receives an Active Server Pages (ASP) Request object and fires the SAX events describing the submitted form (which is normally urlencoded or plain text), and a writer that generates XHTML to the ASP Response object. With such components, you can easily create SAX/XML processing within an ASP page.
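The idea of a reader for non-XML input can be sketched as follows. Instead of an ASP Request object, this Python example takes a urlencoded form string and fires SAX events describing it. The form and field element names and the functions fire_form_events and form_to_xml are hypothetical choices made for illustration.

```python
import io
from urllib.parse import parse_qsl
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def fire_form_events(query_string, handler):
    """Report a urlencoded form to `handler` as a stream of SAX events."""
    handler.startDocument()
    handler.startElement("form", AttributesImpl({}))
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        handler.startElement("field", AttributesImpl({"name": name}))
        handler.characters(value)
        handler.endElement("field")
    handler.endElement("form")
    handler.endDocument()


def form_to_xml(query_string):
    """Serialize the events with a standard XML writer and return the text."""
    out = io.StringIO()
    fire_form_events(query_string, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
```

Any handler or filter can take the place of the writer here, so form data can flow through the same components as a real XML document.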
Conclusion
Though many components are not yet available off the shelf, existing components already make implementation simpler, and more are on the way. In the interim, using SAX allows you to obtain an object-oriented, component-based design for your eBusiness XML application.