This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
MIND

Cutting Edge

Exchanging Data Over the Internet Using XML

Dino Esposito

In the last two installments of this column, I've discussed various techniques and technologies that allow you to exchange data over the Web in a traditional, client/server-oriented fashion. I've discussed the principles and the tools you need to work with Remote Scripting (RS) and the whole set of objects that Remote Data Services (RDS), part of Microsoft® Data Access Components (MDAC) 2.x, makes available. For the most part, I've considered distributed modules that are part of the same application and need to exchange structured information. As long as your platform is Windows®-based, ActiveX® Data Objects (ADO), and in particular the recordset object, is a favorite part of many developers' toolboxes.
      If you want to adopt the ADO recordset object as the transport mechanism for sending blocks of data back and forth over the Web, there are a couple of separate problems to address. First, a recordset only defines the way you package your data. In other words, a recordset object is just the external envelope that wraps and protects your data during the trip. Recordsets still need some sort of lower-level technology to provide the physical connection between the sender and the receiver.
      RDS, RS, and data binding are three approaches that let you retrieve data from the server without leaving or fully refreshing the current Web page. While RS is the most portable of these solutions, in pure performance terms it also turns out to be the least efficient, since it relies entirely on ECMAScript code, and script code is always interpreted. RDS and data binding, on the other hand, take advantage of compiled COM components; although their underlying code is substantially faster, they also narrow the range of supported browsers and client platforms.
      In this column, I'll introduce a fourth possibility that is based on COM and specializes in handling XML data. This component, called XMLHttpRequest, is part of the built-in XML support provided by Microsoft Internet Explorer 5.0.
      Before going any further in describing the features and capabilities of XMLHttpRequest, let me recap a few key points about using XML as a sort of universal glue that lets heterogeneous systems interoperate and collaborate. In doing so, I'll touch on two topics that will have a significant impact on the Web applications of the near future, BizTalk and Web services, which address the two most prominent perspectives in Windows DNA 2000: business integration and data integration.

XML Overview

      XML is an emerging component technology. If you still have doubts about this, please read Don Box's "XML Manifesto" on MSDN™ Online (http://msdn.microsoft.com/xml/articles/xmlmanifesto.asp) or a summary of it that appeared in Don's House of COM column in the January 2000 issue of MSJ (http://www.microsoft.com/msj/0100/com.asp). Aaron Skonnard's article, "SOAP: The Simple Object Access Protocol," in the January 2000 issue of MIND (http://www.microsoft.com/mind/0100/soap/soap.asp) will also help you understand the key role of XML in the future of distributed systems.
      As a component technology, XML's primary goal is to make cooperation and interoperation easier between modules belonging to different systems, and even to different organizations. XML is a relatively new entry in the arena of component technology. Before it came to prominence, the world had already seen other software technologies that connected users and systems; COM, DCOM, and CORBA are the biggest and most important of them.
      So why has XML become such a high-profile technology so rapidly? I think the answer is simple: XML is plain old text, but it's also smart. Simple as it sounds, a markup language makes it easy to identify (and later retrieve) specific pieces of information that carry special meanings. Using plain text and holding universally parsable content are two prerequisites for a technology to succeed across heterogeneous systems.
      A string of text can be transferred without carrying platform-specific artifacts and can be handled smoothly on any target platform. However, if you just plan to define a text-based protocol to make your modules communicate, you don't necessarily need XML; plain ASCII has been around for a long time. The difference lies in the tags XML uses to delimit, and name, each piece of information.
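To make the difference concrete, here is a small sketch in Python, chosen only because its standard `xml.dom.minidom` module makes the idea easy to try; the article itself works with VBScript and the Microsoft XMLDOM. The invoice data, field names, and helper functions are hypothetical. The same record is encoded once as a positional, comma-separated ASCII line and once as XML: the ASCII reader must hard-code the field order, while the XML reader finds each value by tag name, so the fields can appear in any order.

```python
from xml.dom import minidom

# A hypothetical invoice as positional ASCII: meaning depends on field order.
ascii_record = "1001,ACME Corp,249.50"

# The same data as XML: each value is delimited and named by its tag.
# The fields deliberately appear in a different order than in the ASCII line.
xml_record = (
    "<invoice>"
    "<total>249.50</total>"
    "<number>1001</number>"
    "<customer>ACME Corp</customer>"
    "</invoice>"
)


def parse_ascii(line):
    """Split a positional record; the reader must know the field order."""
    number, customer, total = line.split(",")
    return {"number": number, "customer": customer, "total": total}


def parse_xml(text):
    """Locate each field by tag name, regardless of its position."""
    doc = minidom.parseString(text)

    def field(tag):
        return doc.getElementsByTagName(tag)[0].firstChild.data

    return {"number": field("number"), "customer": field("customer"), "total": field("total")}


print(parse_ascii(ascii_record) == parse_xml(xml_record))  # True
```

Both readers recover the same values here, but only the XML reader would keep working if a sender added or reordered fields.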
      The major strength of XML is also, today, its main weakness. XML is too generic to be used without first fixing the exact syntax of a document and how each piece of exchanged data can be located and retrieved. Such standardization is simpler to achieve within a single corporate application, even if it spans various heterogeneous hardware and software platforms. Getting the same result in a wider context requires a tremendous effort to agree on standard definitions, a process that may take years to complete.
      This is why, when it comes to integrating systems from different organizations, especially in an open way, XML shows some limitations. XML is about data description, but it's only useful when people agree on how data should be described. An invoice, for example, is logically the same type of document for any process that manipulates it, but different parties may store it in different binary formats. Regardless of the physical storage format, the document needs to be transmitted and received in a commonly recognized format available on all platforms. Of course, XML can serve as this format.
      BizTalk (http://www.biztalk.org) is a worldwide initiative that aims to create a database of XML-based document formats. More than 100 companies have already signed on to BizTalk, which offers more than 30 different XML schemas that interested companies can refer to.
      A BizTalk-based document is an XML file that uses the tags from a certain vocabulary and follows the rules the organization has defined for that type of document. Such a document is exchanged by two BizTalk servers across a network. Figure 1 shows the overall BizTalk architecture, illustrating the role of XML in exchanging data between commercial partners. Both parties continue to manage documents in their own native formats on their own platforms, but data moves back and forth, despite architectural differences, thanks to XML.
Figure 1 BizTalk Architecture

      BizTalk is central to Windows DNA 2000, and will be one of the key tools for building e-commerce solutions. More often than not, today's e-commerce solutions require integration with existing information systems and data residing on host machines, mostly mainframes. Windows DNA 2000 also addresses legacy data integration through a product code-named "Babylon." Babylon is an integration server that will support something called an XML Transaction Integrator (XMLTI). If you're already familiar with COMTI, you're halfway to understanding XMLTI. An XMLTI is a component that generates XML code to wrap a COMTI-based component that, in turn, embeds a CICS or IMS procedure running on a mainframe. By means of XMLTI-generated code, you can have any XML document start and control transactions.

Exploiting XML

      So far I've discussed XML's overall position within the new generation of Web applications, and particularly in the Windows DNA 2000 architecture. Assuming that XML is a candidate to become the universal key that unlocks worldwide data exchange, let's see which software and component technologies take advantage of XML today. BizTalk, Babylon, and many other products that form the Windows DNA 2000 infrastructure (see Figure 2) are still in the testing phase. They are due out soon, but they are not yet widely used in real-world solutions.
      The Microsoft XML Document Object Model (XMLDOM) is an important element in the XML world. The XMLDOM lets you view and manipulate the content of an XML document as an object model, making it accessible via COM and script. This component is installed automatically with Internet Explorer 5.0, as well as with Windows 2000 and Internet Information Services (IIS) 5.0. It's also available in a free-threaded version, which is suitable when multiple threads can access the DOM simultaneously. The progID for the object model is Microsoft.XMLDOM if you choose the single-threaded model, and Microsoft.FreeThreadedXMLDOM otherwise. Apart from the threading model, the two objects behave identically, so choose according to the number of threads that will access the DOM.
      Another interesting object included with Internet Explorer 5.0 is XMLHttpRequest. This object lets a client communicate with HTTP servers using XML, but it can actually do much more for you than this description might suggest. From the client side of the application you instantiate XMLHttpRequest and, through its methods, issue an HTTP command such as GET, POST, or PUT to the Web server. If that doesn't sound particularly impressive, consider that you can send an entire XMLDOM as the body of the request (typically with POST or PUT). This is the real added value of such an object!
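If you want to experiment with DOM-style programming outside Internet Explorer 5.0, Python's `xml.dom.minidom` exposes a similar W3C-style object model. The sketch below is only an analog, not the Microsoft component: `parseString` plays roughly the role of the XMLDOM's `loadXML`, and `toxml()` roughly the role of its `xml` property. The `<issue>` element is a hypothetical addition used to show tree manipulation.

```python
from xml.dom import minidom

# Roughly analogous to:
#   Set xmldom = CreateObject("Microsoft.XMLDOM")
#   xmldom.loadXML "<magazine>MIND</magazine>"
doc = minidom.parseString("<magazine>MIND</magazine>")

root = doc.documentElement        # the <magazine> element
print(root.tagName)               # magazine
print(root.firstChild.data)       # MIND

# Manipulate the tree: append <issue>April 2000</issue> under the root.
issue = doc.createElement("issue")
issue.appendChild(doc.createTextNode("April 2000"))
root.appendChild(issue)

# Roughly analogous to reading xmldom.xml: serialize the element back to text.
print(root.toxml())               # <magazine>MIND<issue>April 2000</issue></magazine>
```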

The XMLHttpRequest Object

      Figure 3 illustrates the typical way you might use XML over the Internet to integrate heterogeneous systems. It's relatively easy to find an XML parser for virtually any platform. On Win32®-based platforms (with the current exception of Windows CE) you can use the XMLDOM object to parse and process the contents of an XML file. On other platforms, you can use whatever native tools you prefer, since a given stream of bytes will be read the same way everywhere. But beware: this only means that tools on both ends will recognize the same vocabulary of tags, the same attributes for each tag, and the dependencies between them. XML itself says nothing about how the rest of each system will then interpret those tags and attributes.
Figure 3 XML as a Universal Glue

       Figure 4 shows a scenario that involves only Win32-based platforms and Internet Explorer 5.0. There's no need for you to write a component that takes care of packing the XML data over the network and causes the receiver to invoke the proper parser for it. In fact, XMLHttpRequest does this for you, for free. In addition, XMLHttpRequest allows you to send the XMLDOM directly as an object without first saving it to a file. Let's see how it works with a short example.
Figure 4 Using XMLHttpRequest

      The following code snippet comes from a VBScript file and illustrates a simple interaction between a client application and a URL, in particular an ASP page:

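' Create the XMLHttpRequest object exposed by Internet Explorer 5.0
Set xmlhttp = CreateObject("Microsoft.XMLHTTP")
' Method, target URL, and async flag (False = wait for the response)
xmlhttp.open "POST", "http://expoware/test.asp", False

' Build an XMLDOM from a string and send it as the request body
xmltext = "<magazine>MIND</magazine>"
Set xmldom = CreateObject("Microsoft.XMLDOM")
xmldom.loadXML xmltext
xmlhttp.send xmldom

' Show whatever the ASP page wrote back
MsgBox xmlhttp.responseText
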
With XMLHttpRequest you can issue any generic HTTP request to a Web server and, in doing so, use XML to send and receive data. By XML I don't mean simple XML strings; in fact, you don't necessarily need XMLHttpRequest to send an XML string to a Web server. The added value of XMLHttpRequest is that you can send an entire XMLDOM as-is to the server page and get it back, modified as needed.
      In the previous code, after creating an instance of the component, I open the connection by specifying the type of operation requested, the target URL, and a flag that controls asynchronous processing (False here, so the call is synchronous); open also accepts an optional user name and password. Calling open alone sends nothing: the request actually goes out only when you call send. The argument to send is delivered to the target URL as the request body and can be either an XML string or an XMLDOM instance. You can populate that XMLDOM instance from a well-formed XML document, from an XML string, or from another XMLDOM instance.
      The interaction between the client-side code and the Web server can take place synchronously or asynchronously, depending on the value of the third parameter passed to open. When the call completes, a set of properties and methods gives you access to the response data. You can view the data as a raw XML string or as an XMLDOM object, and you can read all the HTTP headers, as shown in Figure 5. Furthermore, the response is available as a stream (through the IStream interface) and as an array of unsigned bytes. The responseText property used in the example simply returns what the URL sent back as raw text.
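The split between configuring a request and actually sending it is not peculiar to XMLHttpRequest. As a rough analog, this Python sketch uses the standard `urllib.request.Request` class to prepare a POST whose body is a serialized DOM; nothing goes over the wire until the request object is handed to `urllib.request.urlopen`, much as nothing happens until `send` is called. The URL is the article's example host, which is assumed to be unreachable from your machine, so the sketch stops short of sending.

```python
import urllib.request
from xml.dom import minidom

# Build the document to send (the counterpart of xmldom.loadXML).
doc = minidom.parseString("<magazine>MIND</magazine>")
body = doc.documentElement.toxml().encode("utf-8")

# Roughly the counterpart of xmlhttp.open: method, URL, and headers are
# recorded, but no connection is opened yet.
req = urllib.request.Request(
    "http://expoware/test.asp",
    data=body,
    method="POST",
    headers={"Content-Type": "text/xml"},
)

print(req.get_method())  # POST
print(req.data)          # b'<magazine>MIND</magazine>'

# Roughly the counterpart of a synchronous xmlhttp.send; commented out
# because the host from the article is not expected to exist:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Note that `urllib` normalizes header names, so the header is stored under the key `Content-type`.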
Figure 5 XMLHttpRequest Headers
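All of these views of the response (raw text, parsed DOM, stream, byte array) derive from the same bytes. The Python sketch below mimics that idea with standard-library types; the helper `response_views` and its dictionary keys are hypothetical labels that merely echo the XMLHttpRequest property names responseText, responseXML, responseStream, and responseBody. It assumes a UTF-8 body.

```python
import io
from xml.dom import minidom


def response_views(body, encoding="utf-8"):
    """Expose one response body in several forms (hypothetical helper)."""
    return {
        "responseText": body.decode(encoding),     # raw text
        "responseXML": minidom.parseString(body),  # parsed DOM
        "responseStream": io.BytesIO(body),        # file-like stream
        "responseBody": bytes(body),               # raw bytes
    }


views = response_views(b"<magazine>MIND</magazine>")
print(views["responseText"])                                 # <magazine>MIND</magazine>
print(views["responseXML"].documentElement.firstChild.data)  # MIND
print(views["responseStream"].getvalue() == views["responseBody"])  # True
```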

      On the other end of the network, a URL commonly represents an ASP page that encapsulates some sort of server-side service. If you're making extensive use of XML on both the client and the server, XMLHttpRequest can help you set up an entirely XML-based conversation between the two. The advantage is that you don't need to write any marshaling code: you can send the XMLDOM back and forth over the network using just the methods of this component. In this sense, XMLHttpRequest is to XML what the RDS components are to ADO recordsets.
      On the server side, an ASP page can retrieve the input data through the Request object and process it as XML by loading the Request object into an XMLDOM instance. From there, you can either manipulate the XML data as needed or prepare a new XML stream to return.
      The world's simplest XML-generating ASP page looks like this:

<%
  Response.Write "<xml>Hello</xml>"
%>
A slightly more complex page would at least retrieve the passed data and do some work on it before returning:
<%
  ' Load the body of the incoming request into a new XMLDOM
  Set xmldom = Server.CreateObject("Microsoft.XMLDOM")
  xmldom.async = False
  xmldom.Load Request
  ' Echo the document back as text
  strText = "You sent me: " & xmldom.xml
  Response.Write strText
%>
Notice that the XMLDOM's Load method accepts the ASP Request object directly and reads the XML from the body of the request. All the methods and properties of XMLHttpRequest are listed in Figure 6.
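The same server-side pattern can be sketched outside ASP. The following Python WSGI application is a hypothetical stand-in for the second ASP page above (WSGI is Python's standard web-server interface and is not something the article uses): it reads the request body, loads it into a DOM, and echoes the serialized document back. The helper `call_app` fakes a POST in-process with `wsgiref.util.setup_testing_defaults`, so no running server is needed to try it.

```python
import io
from wsgiref.util import setup_testing_defaults
from xml.dom import minidom


def echo_app(environ, start_response):
    """Load the POSTed XML into a DOM and echo it, like the ASP sample."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    doc = minidom.parseString(raw)  # rough counterpart of xmldom.Load Request
    text = "You sent me: " + doc.documentElement.toxml()
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [text.encode("utf-8")]


def call_app(app, body):
    """Invoke a WSGI app in-process with a fake POST request."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks).decode("utf-8")


status, reply = call_app(echo_app, b"<magazine>MIND</magazine>")
print(status)  # 200 OK
print(reply)   # You sent me: <magazine>MIND</magazine>
```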

XMLDOM Manipulation

      In a typical scenario, what the client and the server exchange is an XMLDOM holding the XML data involved in the HTTP conversation. This means that the ASP page (the server page) can retrieve the XMLDOM and modify it, if necessary. The changes reach the client only if the page writes the modified document back to the response; the client can then read it as soon as the call to send returns.
      Within the ASP page, loading the Request object into an XMLDOM gives you access to the input XML, and you can modify its structure and content simply by using the methods of the object. For example, the following code illustrates a very simple scenario in which the server page changes the content of the <magazine> node from MIND to MSDN Magazine:
<%
' Load the XML posted by the client
Set xmldom = Server.CreateObject("Microsoft.XMLDOM")
xmldom.async = False
xmldom.Load Request
' Locate the <magazine> element
Set myNode = xmldom.selectSingleNode("magazine")
If Not myNode Is Nothing Then
  myNode.text = "MSDN Magazine"
End If
' Return the modified document
strText = "You sent me: " & xmldom.xml
Response.Write strText
%>
The selectSingleNode method shown here lets you locate a specific node in the tree by means of a query expression. What follows is pure XMLDOM programming!
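The same edit looks much alike in any DOM implementation. In this Python sketch, `xml.dom.minidom` has no `selectSingleNode` (it offers no XPath support), so `getElementsByTagName` stands in for it; the function name `rename_magazine` is hypothetical.

```python
from xml.dom import minidom


def rename_magazine(xml_text, new_title):
    """Replace the text of the first <magazine> element and return the XML."""
    doc = minidom.parseString(xml_text)
    nodes = doc.getElementsByTagName("magazine")
    if not nodes:
        return doc.documentElement.toxml()  # nothing to change
    node = nodes[0]
    # Remove the existing children, then add a single new text node
    # (the DOM counterpart of assigning myNode.text).
    while node.firstChild is not None:
        node.removeChild(node.firstChild)
    node.appendChild(doc.createTextNode(new_title))
    return doc.documentElement.toxml()


print(rename_magazine("<magazine>MIND</magazine>", "MSDN Magazine"))
# <magazine>MSDN Magazine</magazine>
```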

Towards Web Services

      Thanks to the XMLHttpRequest object you can now easily implement a Web service on the Win32 platform, particularly where machines running Windows 2000 are involved. In more general terms, a Web service is a piece of functionality that you can access through the Web both programmatically and interactively. Wouldn't it be nice if lots of Web sites exposed their content in a way that made it easy to access programmatically? Active Channel™ servers actually provide this functionality, but they just return HTML pages. With some extra coding you can arrange one or more ASP pages so they act as Web-level libraries of functions. Client code invokes them via HTTP, and they accept and return XML data. On both sides of the connection the XML is managed and traversed through the XMLDOM.
      The code in Figure 7 represents an ASP page that extracts information from a database about the topics of a certain seminar and formats it as XML. The following VBScript code shows how to access this function over the Web:
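' Call the seminar service, passing the class ID in the query string
url = "http://expoware/xmlhttp/seminars.asp?classID=1"
Set xmlhttp = CreateObject("Microsoft.XMLHTTP")
xmlhttp.open "GET", url, False
' A GET request carries no body, so send an empty string
xmlhttp.send ""
' Display the XML returned by the page
MsgBox xmlhttp.responseText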
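Because the code of Figure 7 is not reproduced here, the following Python sketch only approximates what such a page might do: it reads `classID` from the query string, looks up topics in a hypothetical in-memory table standing in for the database query, and returns them as XML built with `xml.etree.ElementTree`. The element names `seminar` and `topic` and all of the sample data are invented for illustration; WSGI again stands in for ASP.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults
import xml.etree.ElementTree as ET

# Hypothetical data standing in for the database used by the real page.
SEMINARS = {
    "1": ["XML basics", "XMLDOM programming", "XMLHttpRequest"],
}


def seminars_app(environ, start_response):
    """Return the topics of one seminar as an XML document."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    class_id = query.get("classID", [""])[0]
    root = ET.Element("seminar", classID=class_id)
    for title in SEMINARS.get(class_id, []):
        ET.SubElement(root, "topic").text = title
    body = ET.tostring(root)  # bytes, no XML declaration
    start_response("200 OK", [("Content-Type", "text/xml")])
    return [body]


# Simulate GET seminars?classID=1 in-process, without a Web server.
environ = {}
setup_testing_defaults(environ)
environ["QUERY_STRING"] = "classID=1"
reply = b"".join(seminars_app(environ, lambda status, headers: None))
print(reply.decode("ascii"))
```

An unknown `classID` simply yields an empty `<seminar>` element, which a client can treat as "no topics found".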

Recordsets or XML?


      In my last column, I discussed how and when to use an ADO recordset as the main structure for packing your data over the Net. RDS is a layer of code that allows you to exchange recordsets between the client and the server. Recordsets are more specific and powerful than XML strings, but they only work if you have ADO installed. This rules out RDS in environments involving heterogeneous platforms.
      XML, however, is pure text and language-neutral. When you have to decide how to implement a protocol for data exchange between heterogeneous systems, definitely consider XML. Finally, remember that you can convert a recordset into XML, but without a particular, optimized schema and without compression, text-based XML will usually be larger than the equivalent recordset.
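To get a feel for the overhead that tags add, this Python sketch serializes a few hypothetical rows twice: once as XML with one element per row and one attribute per field (a generic layout, not ADO's own XML persistence format), and once as fixed-width binary records packed with `struct`, which is only a stand-in for a binary recordset. The exact numbers depend entirely on the data and the schema; the point is that tag and attribute names are repeated in every row.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical rows: (id, price, quantity).
ROWS = [(1, 19.95, 3), (2, 4.50, 10), (3, 120.00, 1)]


def rows_to_xml(rows):
    """One <row> element per record, one attribute per field."""
    root = ET.Element("rows")
    for row_id, price, qty in rows:
        ET.SubElement(root, "row", id=str(row_id), price=str(price), qty=str(qty))
    return ET.tostring(root)


def rows_to_binary(rows):
    """Fixed-width little-endian records: int32, float64, int32 (16 bytes each)."""
    return b"".join(struct.pack("<idi", *row) for row in rows)


xml_bytes = rows_to_xml(ROWS)
bin_bytes = rows_to_binary(ROWS)
print(len(xml_bytes), len(bin_bytes))
```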

Dino Esposito is a senior trainer and consultant based in Rome. He has recently written Windows Script Host Programmer's Reference (WROX, 1999). You can reach Dino at desposito@vb2themax.com.

From the April 2000 issue of MSDN Magazine.
