The XML Files

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
MIND

XML-based Persistence Behaviors Fix Web Farm Headaches

Aaron Skonnard


Code for this article: XMLFILES0300.exe (38KB)


Developers usually want their Web applications to work successfully for the widest possible audience, so they frequently develop for the lowest common denominator. When it comes to managing state on the client machine, most developers use cookies, hidden form fields, and embedded query string values. However, if you can be sure your users are running Microsoft® Internet Explorer 5.0, you can take advantage of the newly introduced client-side persistence behaviors that offer a better solution for managing state.
      The four new behaviors (saveFavorite, saveHistory, saveSnapshot, and userData) offer a unique mechanism for saving arbitrary state somewhere on the client machine, whether it be in memory or on the hard disk. Some of these client-side persistence behaviors even allow you to encode the state as XML on the client machine. For more information on these behaviors, see my Inside Knowledge column in the December 1999 issue of Microsoft Internet Developer (http://www.microsoft.com/mind/1299/inside/inside1299.htm).
      Storing state as XML has several advantages. First, using XML allows you to maintain the structural identity and meaning of your data, whereas using cookies forces you to flatten your data structures to a string of simple name/value pairs. Second, the XML data stores allow you to save more data on disk. Cookies only allow up to 4KB of data per string, whereas XML data stores can be as large as 64KB. Third, using XML allows developers to take advantage of the integrated MSXML parser in Internet Explorer 5.0 to access data and perform sophisticated queries through XSL patterns.
      In my December 1999 Inside Knowledge column, I presented a sample application that illustrates how to use the XML userData behavior to implement a complete shopping cart application that lives on the client machine. With this application, the user can choose books from an inventory page and add them to the shopping cart without forcing round-trips to the server, which is a real performance boost. The shopping cart data can be viewed, modified, and then sent to the server.

Back to the Server


      Everyone agrees that XML is a prime candidate for encoding state on the client machine. The problem with this approach, however, has to do with browser compatibility. While some developers can target an environment where every user is running Internet Explorer 5.0, most can't. Therefore, they are forced to use cookies or hidden fields.
      But it's difficult to manage large amounts of data on the client machine this way. In most situations, it makes more sense to store the state on the Web server and make the additional round-trips when necessary. The client machine will still need a cookie or hidden field to serve as the user's logical ID, but everything else will be maintained on the server.
      There are several mechanisms for storing state in a Web farm. The most common approaches are using the built-in ASP Session object and storing the data in an isolated relational database (like Microsoft SQL Server™). Storing state in memory on the Web server can be problematic on a Web farm because of load balancing, so most developers tend to isolate state into relational databases even though it takes longer to access it.

Application In-memory State Management


      There are certain situations, however, where it does make sense to store application state in memory on the Web server, even within a Web farm. If your application employs read-only state that is infrequently updated, it makes sense to store that state in memory on the Web server to offload the database machine.
      To make this work, all of the Web servers within the Web farm must look at the same physical data store to load the application state into memory. Once loaded, all Web servers within the farm will have identical information in their in-memory caches. Propagating updates between the different Web servers in the farm is not a concern since the data is read-only. If the application state has to be updated externally by an administrator, all of the Web servers will reload their cache when told to do so.
      Microsoft employs this technique on some of its very large Web sites, including MSN™. Since much of the information on home.msn.com doesn't change very often, it makes sense to store as much of that information as possible in memory. The MSN developers wanted to use the VBScript Dictionary object to hold some of the global data that is used throughout the site. This would have forced them to store the Dictionary object reference within the ASP Application object. The significant problem with this approach is that the VBScript Dictionary object is marked as ThreadingModel=Apartment, and assigning apartment-threaded objects to the Application or Session objects can severely impair Web server performance.

The LookupTable Object


      The developers at Microsoft set out to create a version of the VBScript Dictionary object, called LookupTable, that didn't suffer threading problems. The LookupTable object is marked as ThreadingModel=Both and doesn't encounter performance problems when accessed from multiple ASP threads simultaneously. The LookupTable allows you to load name/value pairs into memory from a comma-delimited file that contains the application state. It can also be configured to reload the file periodically to account for any changes to the underlying file. Figure 1 shows a complete list of the LookupTable methods.
      The following code illustrates how to create and initialize the LookupTable object within an application's global.asa file:


<object id="LookupTbl" progid="IISSample.LookupTable"
runat="Server"
scope="Application"></object>

<script language="VBScript" runat="Server">
Sub Application_OnStart()
  LookupTbl.LoadValues Server.MapPath("appstate.txt"), 0, 3600
End Sub
</script>
This code loads the name/value pairs from the appstate.txt file and will reload the file every hour (3600 seconds). Then, from any ASP page in the application, quick lookups can be performed via the LookupValue method:


<%= LookupTbl.LookupValue(someValue) %>
      Even though the actual data may exist in a relational database, it makes sense to pull this information out of memory because it's so fast and inexpensive to do so. Why overload the database server with multiple read-only queries? There are many Web applications that would benefit from the use of the LookupTable object for dealing with their application data.
      For more information on the LookupTable object, point your browser to http://msdn.microsoft.com/workshop/server/downloads/lkuptbl.asp, where you can download the DLL, some samples, documentation, and even the source code. Microsoft does not officially support the LookupTable object; they simply provide it as a sample component that you are free to use and extend however you want.
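The load-and-reload behavior described for the LookupTable can be sketched outside of ASP. The following Python class is only an illustration of the idea, not the LookupTable implementation: the class name, the one-pair-per-line `name,value` file layout, and the injectable clock are all assumptions made for this sketch.

```python
import threading
import time


class ReloadingLookupTable:
    """Illustrative read-mostly cache; not the actual LookupTable component."""

    def __init__(self, path, reload_seconds, clock=time.monotonic):
        self._path = path
        self._reload_seconds = reload_seconds
        self._clock = clock  # injectable so the reload interval can be tested
        self._lock = threading.Lock()
        self._values = {}
        self._loaded_at = None
        self.load_values()

    def load_values(self):
        """Read 'name,value' lines (assumed layout) and swap in a fresh dict."""
        fresh = {}
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                name, _, value = line.partition(",")
                fresh[name.strip()] = value.strip()
        with self._lock:
            self._values = fresh
            self._loaded_at = self._clock()

    def lookup_value(self, name, default=None):
        """Return the cached value, reloading first if the cache is stale."""
        with self._lock:
            stale = self._clock() - self._loaded_at >= self._reload_seconds
        if stale:
            self.load_values()
        with self._lock:
            return self._values.get(name, default)
```

The new dictionary is built completely before it replaces the old one, so readers never see a half-loaded table. Lookups between reloads touch only memory, which is the point of the technique.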

Problems with Dictionary Objects


      The main problem with Dictionary objects is that they're only capable of storing simple name/value pairs, just like client-side cookies. So it's very difficult to maintain the structure of your application state because it must be flattened to name/value pairs to fit into the Dictionary object.
      It's also difficult to perform complex queries on the Dictionary object. For example, suppose you want to retrieve all headlines that belong to the sports category. To do this using a Dictionary object, you have to go through each entry and somehow figure out if it belongs to the given category because the data is stored as a continuous string.
      Nevertheless, Dictionary objects are commonly used today in many large Web applications because of the benefits described earlier. But the reality is that Dictionary objects can be used efficiently for only certain sets of very simple tabular data, and not for more complex data structures that will require sophisticated retrieval capabilities.
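To make the flattening problem concrete, here is a small Python sketch. The keys, the `category|title` packing, and the sample headlines are hypothetical; they only stand in for whatever convention an application would have to invent to squeeze structured records into name/value pairs.

```python
# Hypothetical flattened headline data: each record is packed into one string.
flat = {
    "headline1": "sports|Local team wins title",
    "headline2": "business|Markets close higher",
    "headline3": "sports|Marathon route announced",
}


def headlines_in_category(table, category):
    """Scan every entry and unpack it, since the table has no notion of category."""
    matches = []
    for key in sorted(table):
        entry_category, _, title = table[key].partition("|")
        if entry_category == category:
            matches.append(title)
    return matches


print(headlines_in_category(flat, "sports"))
# prints ['Local team wins title', 'Marathon route announced']
```

Every query has to visit every entry and re-parse the packed string, and each new attribute (affiliation, importance) means another private packing rule that all callers must know.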

Why not XML?


      XML makes state easier to manage on the server as well as on the client machine. Using XML, developers can maintain the structural meaning of state information while making it easy to manipulate through standard XML handler APIs.
      If the application state is maintained in-memory as XML, performing a simple retrieval like the one described previously (get all headlines that belong to the sports category) is a piece of cake. With XML, you can take advantage of XSL patterns to query the application data and retrieve the pieces of information you want. You can display portions of the application state to the user by performing XSL transformations on the XML data.
      This approach, like the LookupTable, works well for read-only data that doesn't change very often. It is similar to the LookupTable in that you'll have to reload the in-memory XML cache periodically to account for any changes in the underlying XML file. By using XML in-memory on the Web server, you get all of the same benefits as using a LookupTable object, plus data that is much more accessible and easier to deal with.
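The sports-category query mentioned above might look like the following sketch. It uses Python's ElementTree and its limited XPath subset as a stand-in for MSXML and XSL patterns, and the headline document layout (a `category` attribute and a `title` child) is an assumption made for illustration, not the format used by the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical headline document; the real sample's format may differ.
HEADLINES_XML = """
<headlines>
  <headline category="sports" hot="yes">
    <title>Local team wins title</title>
  </headline>
  <headline category="business" hot="no">
    <title>Markets close higher</title>
  </headline>
  <headline category="sports" hot="no">
    <title>Marathon route announced</title>
  </headline>
</headlines>
"""

# Parse once and keep the tree in memory, as the cached document would be.
root = ET.fromstring(HEADLINES_XML.strip())


def titles_by_category(root, category):
    """Select matching headlines with a path expression instead of a scan."""
    # Assumes category contains no quote characters.
    path = f"headline[@category='{category}']/title"
    return [title.text for title in root.findall(path)]
```

Calling `titles_by_category(root, "sports")` returns the two sports titles in document order. The structure of each record stays intact, so querying by another attribute is a matter of changing the path expression.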

DOMDocument versus DOMFreeThreadedDocument


      As long as you have MSXML 2.0 installed on the Web server (either by installing MSXML 2.0 itself or Internet Explorer 5.0), you can take advantage of the Microsoft XML API either directly from an ASP page or from a custom ASP component. MSXML provides an XML Document Object Model (DOM) implementation as outlined by the W3C DOM Level 1 Specification (see http://www.w3.org). The W3C DOM Document object is implemented in MSXML by the DOMDocument coclass.
      The following ASP script illustrates how to create an instance of the MSXML DOMDocument and load an XML file:


<%
    set doc = Server.CreateObject("MSXML.DOMDocument")
    b = doc.load("somefile.xml")
%>
      If you dig around the MSXML library a bit more, you'll also find a coclass named DOMFreeThreadedDocument. Even though the DOMDocument and DOMFreeThreadedDocument coclasses are both marked as ThreadingModel=Both, only DOMFreeThreadedDocument is officially safe to use in a free-threaded situation without external synchronization efforts. Furthermore, the DOMFreeThreadedDocument coclass aggregates the free-threaded marshaler (FTM), making it possible to avoid thread context switches within a process.
      To make the previous example use the DOMFreeThreadedDocument coclass, you only need to change the progID representing the coclass:


<%
    set doc = _
        Server.CreateObject("MSXML.FreeThreadedDOMDocument")
    b = doc.load("somefile.xml")
%>
You may have also seen sample code that makes use of the Microsoft.FreeThreadedXMLDOM progID to create the same MSXML Document coclass. Both will work since both progIDs map to the same coclass. Figure 2 will give you a better idea of how the progIDs map to the coclasses.
      Both coclasses implement the same interfaces and provide the same level of functionality. The only difference between the two is how they deal with threading and marshaling.

DOMFreeThreadedDocument to the Rescue


      You can save object references to session or application scope by executing the following lines of code in any ASP page:


Set obj = Server.CreateObject("SomeLib.SomeObj")
Set Session("someObjRef") = obj
Set Application("someObjRef") = obj
In the case of the Session object, the object reference is available to any thread executing on behalf of a particular user session. In the case of the Application object, the object reference is available to any thread executing on behalf of any user session within the application.
      As you probably already know, ASP is very sensitive to the ThreadingModel of an object when it's stored at application or session scope, as shown earlier. The only sure-fire way to get around these issues is to make sure the coclass is marked as ThreadingModel=Both and aggregates the FTM. When a coclass aggregates the FTM, it can be called from any thread within the same process without the overhead of a thread context switch, even if the calling threads are in different apartments. This is often referred to as being apartment-neutral.
      If you store an apartment-neutral object at session or application scope, ASP does not do any of its thread weirdness and plays by the standard COM rules (no thread pinning and so on). For more details on these threading issues, see Don Box's House of COM column in the September 1998 issue of Microsoft Systems Journal (http://www.microsoft.com/msj/0998/com0998.htm).
      If you attempt to store an object reference that doesn't aggregate the FTM at application scope, ASP will give you the following error message:


Error Type:
Application object, ASP 0197 (0x80004005)
Disallowed object use
Cannot add object with apartment model
behavior to the application intrinsic object.
/App/global.asa, line 24
      If your object doesn't aggregate the FTM, every method invocation on the application-scoped object will result in a thread context switch. ASP would rather not have this happen, so every time you attempt to save an object reference at application scope, ASP will query the object to see if it aggregates the FTM. If it doesn't, the assignment will fail and produce the previous error message. You can turn this behavior off in Microsoft Internet Information Services 5.0 by setting the AspTrackThreadingModel metabase property to True (although this is not recommended because of the obvious side effects).
      Hence, when trying to decide which XML Document coclass to use for managing application state, it's an easy decision to go with DOMFreeThreadedDocument. Since the DOMFreeThreadedDocument coclass aggregates the FTM, storing its object references at application scope won't produce any unwanted thread context switches, as shown here:


<%
    set doc = _
        Server.CreateObject("MSXML.FreeThreadedDOMDocument")
    b = doc.load("somefile.xml")
    set Application("xmldoc") = doc
%>
      While this approach will definitely work just like the LookupTable object for managing application state, it probably makes sense to create an application-specific component that encapsulates the details of working with DOMFreeThreadedDocument and your application-specific data.

AppHeadlines Sample


      To help illustrate how this can be done, I wrote a sample component called AppHeadlines.HeadlineMgr, which is included in the downloadable source code for this column. This component encapsulates the details of working with application-wide information headlines that need to be shown on different pages throughout the application. These headlines don't change more often than a couple of times a day. But at the same time, you would like to be able to query the headlines by category, affiliation, importance, and so on. As a result, this type of application data makes a perfect candidate for an in-memory XML cache.
      The AppHeadlines.HeadlineMgr component will be used by the application's various ASP files and needs to live at application scope. In fact, the application that uses this component will have a global.asa file that looks something like this:


Sub Application_OnStart
    ' create the HeadlineMgr component
    set obj = _
        Server.CreateObject("AppHeadlines.HeadlineMgr")

    ' initialize object here

    ' save objref at application scope
    ' this works because the HeadlineMgr is marked as 'both'
    ' & it aggregates the FTM
    set Application("objHeadlines") = obj
End Sub
      Because you'll be saving the AppHeadlines.HeadlineMgr component at application scope, it must also aggregate the FTM and be marked as ThreadingModel=Both. Since Visual Basic® is unable to create components that aggregate the FTM, I'll use Visual C++® and ATL.
      To accomplish this, I created a standard ATL DLL project called AppHeadlines and used the ATL Object Wizard to insert a new Simple Object called HeadlineMgr. When using the ATL Object Wizard, be sure to choose ThreadingModel=Both and don't forget to check the Free Threaded Marshaler checkbox. The wizard will generate a class that aggregates the FTM for you. If you look in HeadlineMgr.h, you'll notice that there is a call to CoCreateFreeThreadedMarshaler in the FinalConstruct method.
      One thing you'll want to remember when aggregating the FTM is that all object references held by the object must also be apartment-neutral. If your object, which aggregates the FTM, creates another COM object and holds an apartment-relative reference, any call into that object will likely fail. If an FTM object creates and holds onto another FTM object, everything will work as expected since they're both apartment-neutral.
      Whether an object aggregates the FTM is an implementation detail that you shouldn't get in the habit of making assumptions about. A better way to ensure that all object references are apartment-neutral is to use the Global Interface Table (GIT). In this sample, however, AppHeadlines.HeadlineMgr only creates and holds onto instances of the DOMFreeThreadedDocument coclass, which also aggregates the FTM. For this example, I can safely avoid using the GIT.
      In the FinalConstruct method of the HeadlineMgr class, I created three instances of DOMFreeThreadedDocument and set them to always load synchronously. One of the instances is used to hold the XML headlines, and the other two hold stylesheets used for output. All three of the instances are created the same way:


hr = m_spDomDocHeadlines.CoCreateInstance(
     __uuidof(DOMFreeThreadedDocument));
m_spDomDocHeadlines->put_async(VARIANT_FALSE);
The default interface for the HeadlineMgr component is IHeadlineMgr, which must support IDispatch if you plan to use it from VBScript. Figure 3 describes the IHeadlineMgr methods.
      Most of these methods are straightforward as you can see in HeadlineMgr.cpp, which is part of the downloadable source code. The LoadHeadlines method simply calls the IXMLDOMDocument::load method to load the specified XML and XSL files into memory.
      The PrintHeadlines method calls IXMLDOMNode::transformNode to perform an XSL transformation to HTML. The PrintHeadlinesByCategory, PrintHeadlinesByAffiliation, and PrintHotStories methods first create an appropriate XSL pattern string and then call PrintNodeList. PrintNodeList uses the IXMLDOMNode::selectNodes method with the specified XSL pattern string to retrieve only the desired headlines before performing a node-by-node transformation to HTML.
      Finally, the AddHeadlineBatch method expects to receive an XML document containing the new headlines. The method loads the XML into a new DOMFreeThreadedDocument instance and then adds each headline child node to the existing in-memory headline document.
      As you can see, the implementation of the HeadlineMgr component is quite simple, yet it does encapsulate many of the underlying details.
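The batch-merge step performed by AddHeadlineBatch can be sketched as follows. This is a Python and ElementTree analogy rather than the component's C++ code; the function name, the `headline` element name, and the return value are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET


def add_headline_batch(cache_root, batch_xml):
    """Parse a batch document and append its headline children to the cache."""
    batch_root = ET.fromstring(batch_xml)
    added = 0
    for headline in batch_root.findall("headline"):
        batch_root.remove(headline)  # detach from the batch document first
        cache_root.append(headline)  # then attach to the in-memory cache
        added += 1
    return added
```

The batch is parsed into its own temporary document, so a malformed batch raises a parse error before the cached document is touched.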

Using HeadlineMgr


      To illustrate how this component would be used in a Web application, I've included a Visual InterDev-based project called AppHeadlinesTest in the source code download files. This ASP application contains a global.asa file that creates an instance of the AppHeadlines.HeadlineMgr component, calls the IHeadlineMgr::LoadHeadlines method, and saves the object reference into the Application object for later use as shown here:


Sub Application_OnStart
    ' create the HeadlineMgr component
    set obj = _
        Server.CreateObject("AppHeadlines.HeadlineMgr")
    ' load the headline xml doc, main
    ' stylesheet, and node stylesheet
    obj.LoadHeadlines _
        Server.MapPath("headlines.xml"), _
        Server.MapPath("headlines-all.xsl"), _
        Server.MapPath("headlines-node.xsl")
    ' save objref at application scope
    ' this works because the HeadlineMgr is
    ' marked as 'both' & it
    ' aggregates the FTM
    set Application("objHeadlines") = obj
End Sub
      The Application_OnStart method is called when the application first starts. So before any other ASP file is called, the headlines.xml file will be loaded into memory and available for use. To reload the in-memory XML Document, you can simply call LoadHeadlines from any other ASP page, as shown in the previous code. To get an idea of what the headlines look like, see Figure 4, Figure 5, and Figure 6.
Figure 7 Printing All Headlines as XML

      The AppHeadlinesTest project also contains several other ASP files for formatting and viewing the headlines. Once the HeadlineMgr component has been created and initialized, using it from any other ASP page in your project is a simple task. For example, the following four lines of ASP script are all you will need to print all the headlines as HTML (see Figure 7):



<%
' retrieve objref from application scope
set obj = Application("objHeadlines")
' print the headlines
Response.Write obj.PrintHeadlines
%>
      To do something a little more complicated, like printing headlines by category (see Figure 8), the ASP script needs to pull the category value from the form and call PrintHeadlinesByCategory as shown here:


<%
cat = Request.Form("category")
if cat <> "" then
   set obj = Application("objHeadlines")
   Response.Write obj.PrintHeadlinesByCategory(cat)
end if
%>
      The AppHeadlinesTest sample project illustrates how to use all of the HeadlineMgr functionality. A component like HeadlineMgr that completely encapsulates the process of working with XML data makes it very easy to use from an ASP application. Plus, if the headline XML file format changes down the road, you can update the component and your ASP application remains unaffected.
Figure 8 Printing Headlines by Category

Conclusion


      Using XML to manage application state makes it possible to maintain the structure of application data and perform more sophisticated queries (like printing headlines by category, affiliation, and so on) than you could with the LookupTable object. As long as the component behaves properly in terms of ASP's threading requirements, the XML cache approach can greatly improve application performance and offload the database server.
      The sample application and the associated test project are included in the Microsoft XML Resource Kit, which is available at http://www.microsoft.com/xml.

Aaron Skonnard works as a consultant, instructor, and author specializing in Windows-based technologies and Web applications. Aaron teaches courses for DevelopMentor and is the author of Essential WinInet (Addison-Wesley Longman, 1998). You can contact him at http://www.skonnard.com.

From the March 2000 issue of MSDN Magazine.
