SQL Server 2000 and XML: Developing XML-Enabled Data Solutions for the Web
|Scott Howlett and Darryl Jennings|
|This article assumes you're familiar with XML, ASP, and ADO|
|Level of Difficulty 1 2 3 |
|Download the code for this article: SQLXML.exe (516KB) |
|Browse the code for this article at Code Center:
XML Data Solutions |
|SUMMARY Using XML for data access allows you to separate the data from the presentation, and promotes reuse, extensibility, and division of labor. XML also has a simplified data model, which promotes easier testing.|
This article presents and compares five data access approaches, using a variety of technologies including ASP and ADO, XSLT, and DirectXML. Once built, the solutions are compared on the basis of their speed and efficiency.
|uppose for a moment that you're designing a data-driven Internet application. You want to get data out of your database for presentation on the Web. In addition, you want to make sure that the solution is well-architected, but performance is not your primary concern. You would prefer that the solution used for the downstream direction (database to browser) extended well to the upstream direction (browser to database).|
This article is a follow-up to "
Beyond ASP: XML and XSL-based Solutions Simplify Your Data Presentation Layer" in the November 2000 issue of MSDN® Magazine. Before we discuss the various solutions we implemented for the scenario just mentioned, we're going to start with a quick recap of that article followed by a clarification of some topics based on the tons of feedback we've received over the last 10 months.
Following that, we'll outline SQL Server™ 2000 support for XML, focusing on templates and updategrams that use XSD mapping schemas. Then we'll introduce the sample applications that show the ideas in practice. Basically, we built the same sample application different five ways, each time using different combinations of ASP, ADO, XML, XSLT, SQL Server 2000, and .NET technologies. Finally, we'll include benchmarking numbers to address developers' concerns about performance and scalability when using XML-based architectures.
Review In the past, we used XML and XSL in all types of solutions from large-scale enterprise apps right down to five or ten-day mini projects. We were finding that most of the XML resources out there really focused on XML as an enterprise technology, and as a result most sites were still being built in the traditional ASP approach that had remained relatively unchanged since it was first introduced in the early days of Microsoft® Internet Information Server (IIS) 3.0.
Much has changed in the year since the "Beyond ASP" article was published. Most notably, MSXML 3.0 was released in October 2000 with support for the W3C standard XSLT, .NET was announced, and SQL Server 2000 was released, soon followed by two Web releases which enhanced its XML capabilities. In short, these new technologies solidify XML-centric approaches for almost any Internet-based application. Indeed, to optimize development for the Microsoft .NET-based servers, you must put XML at the center of your architecture.
The XML-centric approach to Internet development has gone mainstream. At the same time, the interface to the database is changing from SQL to bidirectional XML, unleashing a host of powerful new possibilities. We will focus here on how emerging technology is making this design concept a reality.
One other note—we spent lots of time on the sample code (there's even a readme file to help get things set up) so give it a try. All the code for this article can be found at the link at the top of this article.
Nine Reasons For Using XML XML promotes separation of presentation and data, reusability, extensibility, and division of labor. In addition, it has a simplified data model, allows one transaction in one place, and helps simplify testing. XML is also becoming increasingly popular for allowing integration with legacy systems, a trend which promises to continue in the future.
We're going to focus on three of these points here:
Each of the five sample applications will be measured against these three criteria.
- Division of Labor. Different team members have different skills, and a well-architected solution maximizes each team member's skill set. To do this effectively, you must abstract the relational data model from the front-end developers.
- One transaction, one place. XML provides entire transactions and complex datasets in a single, portable entity. This portable data container is a leap forward over recordsets or other proprietary data formats.
- Look to the future. Microsoft has adopted XML as one of the key technologies for the .NET platform, which we will discuss as we examine .NET Enterprise Servers.
XML and XSLT It's important to realize that when using XSLT there are no cross-browser issues—transformations can always be performed on the server if the client doesn't support XSLT. In addition, you'll find that the HTML produced by XSLT complies better with the HTML standard than you might get with other approaches.
You'll also find that XSLT is just as powerful as procedural code in terms of its ability to produce complex HTML pages. In addition, for those one in a million exceptions, you can always resort to calling out to script. As an example of the XSLT's power, the BizTalk™ Mapper that can translate EDI documents to XML uses XSLT under the hood. Once you get used to it, XSLT is more modular (and thus more maintainable) than procedural code with embedded HTML.
With XML, there are no disk I/O issues. Because all XML documents are loaded into memory, using the XML/XSLT approach won't result in more disk I/O than other approaches. Developers have expressed concerns about performance when using script code instead of compiled COM objects. To be honest, we built the sample apps using script because they're easier for people to download, install, and run on their own systems. In any event, it turns out that there is a negligible performance difference between script code and Visual Basic®-based COM code on Windows® 2000. Don't believe us? See Architecture Decisions for Dynamic Web Applications: Performance, Scalability, and Reliability.
Developers have also been concerned about performance when using the RecordsetToXMLDoc function, which converts an ADO recordset to an XML document by looping through the records to build the XML. Why shouldn't you use ADODB.Recordset.Save adPersistXML, which produces a 2D XML structure? It turns out that this is only marginally better than using the recordset directly, plus you have no control over the resulting XML structure. The power of XML is its ability to model hierarchical data that is arbitrarily deep—limiting yourself to a 2D hierarchy (which is what you get with adPersistXML) greatly reduces the power of XML.
Another performance concern centered on the number of execution layers that an XML solution imposes. Granted, with a manual conversion of a recordset to XML, another layer is added. However, software is all about abstraction, and this usually means introducing another layer. As you'll see shortly, with the features introduced in SQL Server 2000, you can get all the abstraction benefits without introducing another layer and get great performance improvements at the same time.
As with any new technology, it can take some time to ramp up on XML and XSLT. You should keep in mind that sooner or later you will have to learn XML—just take a quick look at BizTalk and .NET and you'll see one place where Microsoft has already adopted XML.
Figure 1 is a handy reference guide to some of the terminology you'll need to know.
XML with SQL Server 2000 Many developers point out (and rightly so) that the conversion from recordset to XML introduces another execution layer. Even so, we still found many great benefits in the XML/XSLT design pattern. SQL Server 2000 comes with native XML support, and SQL Server 2000 Web Release 2 Beta 1 (WR2b1) adds even more XML-based features, putting the layer issue to rest. You can now get all the benefits of XML without introducing an extra execution layer and the resulting complexity.
The developers at Microsoft went to great lengths to make the XML capabilities of SQL Server 2000 accessible via HTTP. This access method requires that a virtual directory be configured on the server, the details of which are explained in the setup instructions for the sample applications. As of Web Release 2 Beta 1, SQL Server 2000 XML capabilities are also programmatically accessible through the use of ADO command, stream, and connection objects. To gain a better understanding of how both the HTTP and programmatic access methods are used and how they perform, we developed the third sample application using both of them. To our surprise, HTTP-based access was considerably faster than programmatic execution via ADO (see the section on performance at the end of this article).
There are two different approaches for building an XML view based on the contents of a SQL database. As you'll see, the resultant XML can be based on one or more tables and can be arbitrarily deep. The first approach uses the SQL Server 2000 extended SELECT statement syntax and the second takes advantage of support for XPath queries and mapping schemas. SQL extensions can be used with or without a template and don't require a mapping schema. XPath queries, on the other hand, must be used within a template and the template must specify a mapping schema.
SQL Query Extensions The syntax of the SELECT statement has been extended to support the retrieval of XML documents from SQL Server 2000 directly through the use of the new FOR XML clause. The FOR XML clause has three modes of use, each of which allows for a varying degree of control over the appearance of the XML document that it generates. These modes are Raw, Auto, and Explicit. A full explanation of how to use these methods is beyond the scope of this article, but they're covered very well in the SQL Server 2000 documentation. A huge limitation of the SQL query extensions is that they don't facilitate the use of updategrams (which require mapping schemas).
Raw mode (for XML RAW) produces an XML document that is analogous to that produced by the ADODB.Recordset.Save adPersistXML approach (for more info on this, see the ADO documentation). This means that the resultant XML document will have a shallow hierarchical structure and function like a traditional recordset rather than take advantage of the arbitrarily complex tree structure that's available when using XML.
Auto mode (for XML AUTO) is capable of automatically generating an XML document based on the number and order of the tables and the join strategy employed in the FROM clause of the SELECT statement. The creative use of various join flavors can be effective in building many simple XML document structures, but its limitations are quickly exposed when attempting to publish documents with a complicated structure.
Explicit mode allows for complete control of the properties of the resultant XML document and is therefore more complicated to use. Basically, you get full control of the XML document while at the same time losing the benefits of a mapping schema.
Note that the FOR XML clause has an option called XMLData which, if specified, includes an auto-generated XDR schema inline with the XML view, but it's not a mapping schema.
SQL Templates As mentioned previously, templates can be used by specifying the SELECT statement using the SQL query extension FOR XML or by specifying a mapping schema and an XPath query. We feel that the use of templates with XSD mapping schemas and XPath queries is the most powerful mechanism for getting XML-based data directly from SQL Server 2000.
You may recall that an XSD schema is a mapping schema if it includes all of the necessary annotations required by SQL Server 2000 in order to handle the mapping from a relational data store to XML view. In particular, foreign key relationships must be explicitly defined using a sql:relationship tag. These relationships must be enclosed in xsd:annotation tags to keep the mapping schema XSD-compliant.
Figure 2 is an example of these annotations.
Figure 3 shows the XSD mapping schema used for the sample application, except that the annotation from
Figure 2 is not shown in order to save space. (If you are not already familiar with XSD schemas,
http://www.w3.org/TR/xmlschema-0 is a great place to start.) Notice how the tables and columns in the relational data store are mapped to XML elements using the attributes: sql:relation maps the table, sql:key-fields maps the primary keys, sql:field maps the columns, and sql:relationship maps the key relationships.
The use of mapping schemas and templates is preferred to the SQL extensions for a number of reasons. For example, you need mapping schemas to use updategrams anyway and complex mapping schemas are much easier to author than the FOR XML Explicit equivalents. In addition, using the SQL query extensions requires you to write SELECT statements, whereas the addition of a mapping schema does not. You'll also find that mapping schemas are very useful in both .NET and BizTalk scenarios. And finally, XML best practices dictate the use of schemas.
In addition, XSD is preferred over XDR because it is the W3C standard. This template generates an XML view of the data using an XSD mapping schema.
The Visual Studio .NET designer will show you the XSD schema (sqlxml\persons.xsd). The previous template also generates the XML data shown in
SQL Updategrams If the lure of extracting data from a SQL database in a customizable form that's compliant with an XSD mapping schema is not enough, remember that you can also use updategrams. Updategrams allow you to provide a schema-compliant XML document to be mapped back to the database instead of writing your own shredding and inserting code.
Support for updategrams is included in the second SQL Server 2000 Web Release. They allow users to insert, update, and delete relational database information based on the contents of an XML document and its corresponding mapping schema. This makes SQL Server 2000 a bidirectional XML repository! Users can now request information in the form of an XML document, make changes, add and delete information, and then send it back in XML format, having it mapped and stored correctly in the relational database.
Updategrams are easy to use (though at this point the error messages are often very cryptic). Within an updategram, all the work is done between <sync> blocks. There can be any number of <sync> blocks included, and all of the changes made within them are transactional. Each transaction can contain any number of before and after blocks.
The functionality of these blocks is completely intuitive. The <before> block holds a piece of the existing XML view and the <after> block is what you want this piece of the view to look like when the changes are made. All of the insert, update, and delete work is done for you based on the differences in the <before> and <after> blocks and the mapping schema that you provided. (Consult the "XML for SQL" documentation provided with the second Web release for the use of updategrams.) Note that updategrams can affect multiple rows and multiple tables.
Figure 5 is an example of an updategram that modifies data in the pubs database.
How do updategrams deal with concurrent access to the database in terms of preventing User B's updates from overwriting User A's update? The answer lies in the updategram's clever use of an optimistic locking strategy. Basically, SQL Server 2000 will compare the contents of the <before> block in the updategram to what exists in the database. If the data is different, the operation will not proceed because an optimistic locking conflict has been detected.
The more information you include in the <before> block, the more rigorous the conflict detection will be. By including only the key fields, your update will always be applied, unless the key field no longer exists. If you download the sample code for this article, you'll find a sample updategram and the ADO execution code, which only includes the primary key in the <before> block.
So, now we have a mechanism to get XML directly from SQL Server 2000 using template and XSD mapping schemas. We also have a mechanism to get XML directly into SQL Server 2000 using updategrams and the same XSD mapping schema. All of this without writing a single line of SQL. Very cool.
The Sample Applications Next, let's jump right into the sample applications so you get a sense of what we're building. We decided to demonstrate the display of authors and their associated books from the pubs database. Figure 6 shows the resulting view on the data. The view of the data is somewhat complex, and includes some server-generated DHTML that allows the books for each author to be expanded or collapsed as necessary.
Figure 6 DHTML Read-only View
We built the sample page in the five different ways shown in
Figure 7. We will describe each of these solutions/sample applications in the next sections.
Solution 1 The code required to build this page using ASP and ADO (shown conceptually in Figure 8) should look pretty familiar.
Figure 8 #1
The complete code listing can be found in straightasp.asp, and
Figure 9 shows a portion of the required code. It's very clear that the data access code is hopelessly intertwined with the presentation code.
Figure 10 shows how well this solution ranks on the various criteria.
Solution 2 In the approach we call Solution 2 (defined conceptually in Figure 11), we make the same calls to the database as we did in Solution 1, but instead of generating the HTML as we went, we got all of the data in one function call (GetXMLFromDB) and then applied an XSLT stylesheet. In this case, the data has been completely removed from any presentation formatting.
Figure 12 shows the code required to build the XML from the database.
Figure 11 #2
The RecordsetToXMLDoc function was covered in the November 2000 article. An updated version of this function can be found in the xml\xmlutil2.asp file in the code download. Once we have the data, it's a pretty simple exercise to do the transform.
Figure 13 shows the code required to perform the transform.
You'll notice that there are three different transforms performed, based on values in the query string. The first does the transforms using defaultss.xsl, which shows the raw XML. The difference between TransformXML and TransformXML2 is that TransformXML2 uses a cached stylesheet (via the XSLTemplate object) that is stored in an Application variable. If you want to use XSLTemplate, a great starting point is the article "
Inside MSXML3 Performance", which also discusses how this approach improves performance. The code for both functions, TransformXML and TransformXML2, can be found in inc\xmlutil2.asp (in the download). Solution 2 is summarized in
Solution 3 Solution 3, shown conceptually in Figure 15, looks much like the last solution, but instead of building the XML by hand we use the XML capabilities of SQL Server 2000. The section "XML with SQL Server 2000" explains this process in detail, but what we've done is create a view of the data using an XSD schema. Using this view, we specify an XPath expression to extract the data we want.
Figure 16 shows how this is accomplished programmatically using the SQLXMLOLEDB provider that is included with SQL Server 2000 WR2b1.
Figure 15 #3
SQL templates can be executed programmatically using ADO (via the SQXMLOLEDB provider) or using the SQL Server 2000 http-based access mechanism. This solution demonstrates both execution options using the same template, so that they can be compared. The HTTP-based access was not only much easier to program, it was also faster (see the section on performance at the end of this article).
The code for ShowAuthorsView is pretty much the same as in Solution 2, but we added support for HTTP-based data access. In addition, we added support for different XPath expressions to be used in the template. Figure 17 shows the sample application with an XPath filter applied (the filter will only show authors with a contract value of zero).
Figure 17 Filtering Using XPath
XPath queries didn't always work as we expected (beta software, remember?). Here's what we found when playing around with XPath queries. First, XPath expressions that attempt to filter based on a child node that is not an immediate child fails (person[books/book/title_id='PS3333']) with the error "XPath: an unexpected internal error occurred". Second, XPath expressions that attempt to filter based on a child that is contained by a node marked with sql:is-constant (for example, person[address/state='CA'] fails with the error "The column prefix '_Q1G' does not match with a table name or alias name used in the query."
Hopefully, when WR2b1 for SQL Server 2000 is released, we'll be able to use rich XPath expressions such as: person[books/book/price > 10], which means all persons who have authored books that sell for more than $10, and person[books/book/publisher/location/country='USA'], which means all persons who have authored books for publishers located in the U.S. For a summary of the ratings for Solution 3 see
Solution 4 Solution 4 (modeled in Figure 19) works in much the same way as Solution 3, except we use the .NET XML WebControl to do the transform for us.
Figure 19 #4
We only had to write four lines of code in the Page_Load method (from dotnettransform.aspx.cs) to load the source XML document (using SQL Server 2000 HTTP-based XML access) and to load the stylesheet.
The code in inc\xmlutil.asp is not required for this solution, and, as such, no knowledge of the DOM is required to perform the transform using .NET. Solution 4 is summarized in
private void Page_Load(object sender, System.EventArgs e)
// Put user code to initialize the page here
XmlDocument doc = new XmlDocument();
ctlXML.Document = doc;
ctlXML.TransformSource = "xsl\\authors3.xsl";
Solution 5 For Solution 5, outlined in Figure 21, we thought we'd produce a view of the data using a DataGrid in .NET to see what type of functionality we could get without writing any code (C# or XSLT).
Figure 21 #5
It turns out that producing the view of the data in exactly the same format as Solutions 1 through 4 is beyond the scope of this article. However, producing a simple view of the authors (without the client-side DHTML) was a really straightforward exercise. Nearly everything could be accomplished without writing any code at all! Here are the steps we took:
At the end of the day, we produced a view of the data that is a starting point for producing a view similar to that in Solutions 1 through 4 (see Figure 23). .NET holds lots of promise for working with XML-based datasets.
Figure 24 summarizes Solution 5.
- To start, we created a new dataset class using the XSD schema that we saw earlier by right-clicking Project | Add Component | Data Set. This creates a new XSD file and an underlying CS file that is autogenerated by Visual Studio. Then we pasted our schema (which we created by hand) into the XSD file. Alternatively, you could use the XML designer to create the XSD file.
- Next, we dragged a DataSet component from the Toolbox onto the page. This opened a wizard where we created a typed dataset based on the dataset class created in Step 1. We called the dataset dsPersons and updated the Page_Load event handler to retrieve the data from SQL Server 2000 using HTTP, as you can see in the code in
- From there, we dragged a DataGrid component from the Toolbox onto the ASPX page by right-clicking Datagrid | Property Builder. We selected dsPersons as the DataSource and person as the data member.
- Next we used the Property Builder to select the columns we wanted to display.
- From there, we configured the Paging, Format, and Borders areas in the Property Builder. We added an event handler for the OnPageIndexChange event (by adding the attribute OnPageIndexChanged="dgperson_PageIndexChanged" to the <asp:DataGrid> tag in the .aspx file. The following code from dotnetgrid.aspx.cs shows the server-side implementation to change the pages.
public void dgperson_PageIndexChanged(object source,
dgperson.CurrentPageIndex = e.NewPageIndex;
Figure 23 .NET DataGrid
Performance For completeness, we benchmarked the five solutions to compare the performance of each approach. Note that you should not compare Solution 5 to the others because the views of the data are not identical.
The tests were run using Homer (the Microsoft load testing application) against a server running Windows 2000 Server, SQL Server 2000 Web Release 2 Beta 1 and .NET Framework Beta 2. Homer stores its configuration information in a Microsoft Access .mdb file. We've included the Homer .mdb file in the sample code zip file, so you can open it up and benchmark each of your own various configurations.
The server was a Dell Dimension XPST500, 550MHz with 256MB of RAM. Homer was running on a separate workstation. Pay special attention to the relative performance (compared to the ASP/ADO approach) as shown in the last column of
Look at the whopping 11.1× performance improvement when using HTTP-based data access and XSLT (Solution 3) and performance nearly nine times as good when using HTTP-based data access and .NET Transform (Solution 4). Although these stats aren't exactly scientific, it does show the awesome performance of SQL Server 2000 and .NET. Also note that the simplistic caching XSLT mechanism used did not always increase the performance.
Conclusion There is little doubt that XML will continue to play a key role in this Internet-centric era. Microsoft has a complete XML-based offering from existing released products like BizTalk and SQL Server 2000 to beta products like .NET. The challenge for software architects is to take full advantage of these server products by using XML throughout their apps.
The use of XSD mapping schemas to enable XML-based data to flow both to and from SQL Server 2000 is extremely powerful. By using XSD, you'll get immediate portability to .NET, while working with a W3C industry standard. At the same time, you'll also realize maximum performance when using XSLT and HTTP-based data access.
| For related articles see:|
Beyond ASP: XML and XSL-based Solutions Simplify Your Data Presentation Layer
A Survey of Microsoft SQL Server 2000 XML Features
For background information see:
A Guide to XML and its Technologies
Inside MSXML3 Performance
| Scott Howlett and Darryl Jennings are consultants at imason Inc., an Internet consulting company specializing in legacy integration for companies building business-to-business and line-of-business applications. Reach them via e-mail at firstname.lastname@example.org and email@example.com.|