MSDN Magazine > Issues and Downloads > 2000 > July >  House of COM: Using ADO to Create XML-Based Rec...
This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
MSDN Magazine

Using ADO to Create XML-based Recordsets
Don Box

Code for this article: HouseofCOM0700.exe (46KB)

T
here is a lot of mythology surrounding ActiveX® Data Objects (ADO) and XML. ADO has supported XML since version 2.1. In general, this is a good thing, especially for developers looking for better ways to reach out to the four million programmers who use Visual Basic® and who also worship at the altar of the disconnected recordset. However, simply supporting XML in your application is not good enough because ADO uses a very stylized XML-based vocabulary to perform its magic. The way ADO uses XML allows ADO to interoperate with applications written in programming languages such as Java, Perl, or Python and on platforms such as Unix, Linux, and Apache. Yes, the developer who uses Visual Basic can deal with arbitrary XML using any number of XML parsers. However, morphing your XML into the vocabulary of ADO makes it a no-brainer for any typical application-focused programmer to access your information with Visual Basic.
      The utility of the XML support in ADO is especially apparent when crossing platform boundaries. Consider the case of a server appliance running Linux and Apache in a production environment. A system administrator running Windows® on her desktop may want to monitor activity on the server machine without resorting to a circa-1980 telnet session. Assuming that the Perl program shown in Figure 1 is installed under an Apache-executable directory on the Linux machine (typically in /cgi-bin), you could run a simple Visual Basic-based program as shown in Figure 2 to display the currently running processes. The code for this application is shown in Figure 3, but you should note that most of this code is simply the Visual Basic form template code. The Refresh button handler that does all of the work is only four lines long.
Figure 2 Viewing a Recordset in Visual Basic
Figure 2 Viewing a Recordset in Visual Basic

      Despite the simplicity of the code in Figure 3, you cannot simply sprinkle the magic XML pixie dust on an application and expect ADO to perform miracles. ADO is useless for reading or writing arbitrary XML vocabularies. ADO is not a replacement for the Simple API for XML (SAX) or the Document Object Model (DOM). Rather, ADO recordsets support an XML-based serialization format in parallel with its native Advanced Data Tablegram (ADTG) format. The two formats are functionally identical. The difference between the two is that ADTG is a binary format that is proprietary to Microsoft. XML is a text-based format that is supported by virtually every vendor, platform, and language on the planet. If you understand the XML vocabulary that ADO is expecting, you can generate a serialized ADO recordset without ever having ADO (or OLE DB, a database, or Microsoft® Windows) anywhere near your code.
      Technically, ADO does very little to support recordset persistence. ADO simply utilizes the OLE DB persistence provider to do the heavy lifting. If you are working in OLE DB natively, you must use the ADO Recordset Construction interface to create a transient recordset on top of your existing OLE DB rowset objects. The following code illustrates this technique from C++:



HRESULT GetRecordset(IRowset *prs, _Recordset **pprs) {
  CComPtr<IADORecordsetConstruction> prc;
  HR(prc.CoCreateInstance(__uuidof(Recordset)));
  HR(prc->put_Rowset(prs));
  return prc->QueryInterface(pprs);
}
Translating in the other direction is equally simple:


HRESULT GetRowset(_Recordset *prs, IRowset   **pprs) {
  CComPtr<IADORecordsetConstruction> prc;
  HR(prs->QueryInterface(&prc));
  CComPtr<IUnknown> punk;
  HR(prc->get_Rowset(&punk));
  return punk->QueryInterface(pprs);
}
At the time of this writing, there is no documented way to access the persistence provider except via ADO.
      To invoke the services of the OLE DB persistence provider, you call the _Recordset::Save method on an ADO Recordset. Save is defined as follows


HRESULT Save([in, optional] VARIANT Destination, 
             [in, optional, defaultvalue(0)] 
             PersistFormatEnum PersistFormat);
where PersistFormatEnum is an enumeration with two values:


enum {
    adPersistADTG = 0,
    adPersistXML = 1
} PersistFormatEnum;
The first parameter to Save can be either a file name or an IStream interface pointer. The following line of JScript® serializes a recordset to a file:


rs.Save("C:\\foo.rs");
Because no second parameter was provided, the format will be ADTG, not XML, just as if the method was called like this:


rs.Save("C:\\foo.rs", 0);
To force the serialization format to be XML, the second parameter must be adPersistXML or 1:


rs.Save("C:\\foo.rs", 1);
      When ADO serializes to XML, it uses the UTF-8 character encoding. Additionally, ADO does not emit the initial XML declaration, which is legal given that UTF-8 is in use.
      To recreate an ADO recordset from its serialized form, you call the _Recordset15::Open method:


HRESULT Open([in, optional] VARIANT Source, 
             [in, optional] VARIANT ActiveConnection, 

             [in, optional, defaultvalue(-1)] 
                            CursorTypeEnum CursorType, 
             [in, optional, defaultvalue(-1)] 
                            LockTypeEnum LockType, 
             [in, optional, defaultvalue(-1)] long Options);
When used with serialized recordsets, the first parameter of Open can be a file name, a URL, or an IStream implementation. The following JScript restores a serialized recordset that was saved as either ADTG or XML:


rs.Open("C:\\foo.rs");
The recordset can tell which format was used based on the initial few bytes of the file.
      The first parameter of Save can also be an IStream interface pointer. This means that you can pass an MSXML DOMDocument object to Save in order to use the facilities of the MSXML DOM over the underlying data.


var dom = new ActiveXObject("MSXML.DOMDocument");
rs.Save(dom, 1);
The data from the recordset can now be used with XPath, XSLT, or other XML-based technologies. The ASP Response object can also be used to stream an ADO recordset in an HTTP response message. Figure 4 shows an ASP file that returns a serialized ADO recordset corresponding to the Authors table in the Pubs database that ships with Microsoft SQL Server™. Assuming that this page is exposed via the URL http://example.com/get.asp, you could load the recordset directly via HTTP using the following code:


rs.Open("http://example.com/get.asp");
      When a URL is passed instead of a file name, the recordset simply uses WinInet to contact the server and fetch either the ADTG or XML, which it then processes as if it were a local file.
      Sending an ADO recordset over HTTP is trivial when the recordset is being sent from the HTTP server to the HTTP client. Sending a recordset in the other direction is fairly trivial as well if you are willing to use the MSXML XMLHTTPRequest object. The XMLHTTPRequest object wraps the WinInet calls needed to send an HTTP request and receive an HTTP response. Despite the fact that the XMLHTTPRequest object ships as part of an XML parser, it can be used to send any text-based MIME type, not just XML. Figure 5 shows the overall architecture of the process.
Figure 5 Database Updates via HTTP
Figure 5 Database Updates via HTTP

Figure 6 shows a client-side JScript file that fetches an ADO recordset from the get.asp script, modifies a record, and sends it back to the server using the XMLHTTPRequest object. Figure 7 shows the corresponding ASP script that receives the recordset and sends the changes back to the database using the UpdateBatch method. Note that because the Open method accepts an IStream reference, you can pass the ASP Request object to Open to read the recordset from the HTTP POST payload:



rs.Open(Request);
This technique avoids the problems related to calling Request.BinaryRead from scripting languages.
      So far I have avoided a discussion of the XML vocabulary used by ADO. In general, you can lead a rich and full life without ever seeing this vocabulary provided the Save and Open methods of the Recordset object are all you use. However, to write interoperable programs that do not use ADO, but instead generate or consume XML directly, you must understand the vocabulary well enough to fool ADO into thinking that the data was generated by the OLE DB persistence provider and not your random calls to printf.
      ADO expects a serialized recordset to be contained in an XML document whose root element is named xml. The root element of the document must contain two child elements. The first element contains the schema information describing the type of each row and column. The second child element represents the serialized OLE DB rowset. This rowset element contains one child element per row. Each row element subsequently contains one attribute per non-null column value.
      ADO makes extensive use of XML namespaces. In particular, the three XML namespaces shown in Figure 8 are used in a serialized ADO recordset. Note that the URIs identifying both the schema and datatypes namespaces do not match those found in the XML Data Reduced (XDR) specification. One reason they don't match is that ADO uses its own dialect of XDR that, at the time of this writing, is not supported outside of ADO.
       Figure 9 shows a minimal serialized recordset. The first child element contains an XDR description of the row element type:


<xdr:ElementType name='row' content='eltOnly'>
     <xdr:AttributeType name='name' />
     <xdr:AttributeType name='age'  />
     <xdr:extends type='rowset:rowbase' />
</xdr:ElementType>
This type definition is roughly analogous to the following Document Type Definition (DTD) fragment:


<!ELEMENT row EMPTY>
<!ATTLIST row name CDATA #REQUIRED age  CDATA #REQUIRED>
Both the XDR type definition and the DTD assume that row elements will look something like this:


<target:row name='Don' age='25' />
      Note that the element type definition is contained in a Schema element with an id attribute:


<xdr:Schema id='R' >
  <!-- type definition goes here -->
</xdr:Schema>
This schema is bound to a namespace prefix using a namespace declaration whose URI reference is simply a fragment identifier


xmlns:target='#R'
This means that while the following element is bound to the element type definition


<target:row name='Don' age='25' />
this one is not:


<row name='Don' age='25' />
Support for mapping schema definitions using fragment identifiers is a fairly new feature of MSXML.
      The OLE DB persistence provider expects to see the row data immediately following the schema definition. The data is contained in a data element that contains zero or more row elements as described by the XDR schema definition.


<rowset:data>
  <target:row name='Don' age='25' />
  <target:row name='Aaron' age='20' />
</rowset:data>
      Upon loading the serialized rowset, the persistence provider will simply create a row for each row element and populate the column data based on the name and age attributes.
      The schema definition shown earlier is technically legal and is sufficient if all you want to do is use the resultant recordset to display simple text-based information. However, if you want type-aware access to the individual fields, you must flesh out the AttributeType definitions to provide type descriptions of each indi- vidual column. This is accomplished via XDR's datatype element:


<xdr:AttributeType name='name' >
     <xdr:datatype ... />
</xdr:AttributeType>
      The datatype element allows the schema designer to describe the type of the corresponding character data used in the attribute value (if not specified, the assumed type is BSTR). The primary attribute of this element is the dt:type attribute, which must reference one of the predefined XDR datatypes shown in Figure 10.


<xdr:AttributeType name='age' >
     <xdr:datatype dt:type='i2' />
</xdr:AttributeType>
      This schema fragment states that the age attribute is a two-byte signed integer. For variable-length datatypes like string, XDR also allows a maximum length to be specified:


<xdr:AttributeType name='name' >
     <xdr:datatype dt:type='string' 
                   dt:maxLength='32'/>
</xdr:AttributeType>
In SQL terms, this type corresponds to the VARCHAR(32) datatype.
      The datatypes supported by OLE DB cannot be adequately described using XDR's intrinsic datatype support. For example, the type definition just shown does not distinguish between Unicode and ANSI strings, or between variable-length and fixed-length strings. To support full-fidelity type information, the OLE DB persistence provider augments the datatype definition with namespace-qualified attributes that capture the OLE DB characteristics without changing the core datatype described by the XDR schema definition. The first extended attribute I will look at is the rs:dbtype attribute. This attribute allows the corresponding OLE DB type name to be specified alongside the XDR datatype name.


   <xdr:datatype dt:type='string' rs:dbtype='str' 
              dt:maxLength='32' />
      By default, the XDR string type corresponds to the DBTYPE_WSTR datatype, which is Unicode. The rs:dbtype='str' attribute informs the persistence provider that the corresponding type is DBTYPE_STR, not DBTYPE_WSTR. Additionally, you can indicate whether a column is fixed-length or variable using the rs:fixedLength attribute:


<xdr:datatype dt:type='string' rs:dbtype='str' 
              dt:maxLength='32' rs:fixedLength='false'/>
<xdr:datatype dt:type='string' rs:dbtype='str' 
              dt:maxLength='32' rs:fixedLength='true'/>
The former datatype corresponds to the SQL VARCHAR(32) type; the latter corresponds to the SQL CHAR(32) type. Figure 10 shows how each of the OLE DB datatypes is represented as an extended XDR datatype.
      The persistence provider uses another extended attribute to indicate whether a given column can accept null values. This is indicated using the rs:maybenull attribute:


<xdr:datatype dt:type='i2' rs:maybenull='true'/>
This attribute allows the corresponding column attribute to be missing, which is interpreted by the persistence provider as a NULL value. Figure 11, Figure 12, and Figure 13 show the remaining OLE DB column attributes and properties and how they are serialized to XML.
      The OLE DB persistence provider supports a variety of rowset properties. To provide more granular control over resynchronizing the rowset with the original data source, the persistence provider supports its own property set (DBPROPSET_ADC) whose values are shown in Figure 14. The corresponding XML attribute must appear on the ElementType element in the schema definition. In particular, if you allow updating of the rowset, the updatable attribute must be set to true:


<xdr:ElementType name='row' content='eltOnly'
     rowset:updatable='true' >
Additionally, each column that is writable must have its AttributeType marked as either write or writeunknown:


<xdr:AttributeType name='age' rowset:write='true'/>
Recordsets that are opened with the adLockBatchOptimistic lock mode are written with these values automatically.
      To support ADO's optimistic concurrency control, when a persisted recordset is reserialized after being modified, the changes are not simply inserted into the stream of row elements. Rather, the updates, inserts, and deletes are called out using special elements that contain the original column values in order to detect concurrency violations at UpdateBatch time. These distinguished elements appear after the unmodified rows.
      When an existing row is updated, its serialized row element is removed from the normal sequence. Its original form will appear beneath an update element alongside the updated fields. Assuming that the following row


<target:row name='Don' age='25' />
were to be modified using the following JScript


rs.Fields(0) = "Donald";
the serialized form of the update would look like this:


<rowset:update>
    <rowset:original>
        <target:row name='Don' age='25' />
    </rowset:original>
    <target:row name='Donald' />
</rowset:update>
One of these update elements is written for each row that is modified. When a new row is inserted via ADO's AddNew method, the corresponding row element is serialized inside an insert element:


<rowset:insert>
    <target:row name='John' age='22' />
</rowset:insert>
      Finally, when a row is deleted via ADO's Delete method, it will not appear in the normal list of rows. Rather, its original form will appear as a child of a delete element:


<rowset:delete>
    <target:row name='Aaron' age='20' />
</rowset:delete>
The original values are written so that Update-Batch can verify that no changes have occurred since the original serialization.
      The update, insert, and delete elements must appear as child elements of the data element. All three element types must appear after any original rows. Additionally, all updates must precede the insert and delete elements, and all delete elements must follow the insert elements as if the following pseudo-DTD fragment were in effect:


<!ELEMENT rowset:data (target:row*,
                       rowset:update*, 
                       rowset:insert*, 
                       rowset:delete*)>
      At UpdateBatch time, the persistence provider attempts to submit all changes at once based on the update criteria rowset property. This property can use either timestamps or the stability of selected fields to determine whether a concurrency violation has occurred.
      It is interesting to compare the XML format used by the OLE DB persistence provider (and ADO) and the format used by the SQL Server XML technology preview (which is a precursor to SQL Server 2000). The OLE DB persistence provider uses a fairly inflexible XML vocabulary as its serialization format. There has been no attempt to make this vocabulary useful to the world at large. Rather, the format is completely dedicated to needs of the persistence provider. The SQL Server XML technology preview, on the other hand, seeks to satisfy a broader audience by providing a great deal of flexibility in terms of the generated XML. It is ironic, however, that the current technology preview does not support the OLE DB persistence provider's format as one of its native modes of operation. I hope that Microsoft's database and data access technologies will share a common vocabulary by the time SQL Server 2000 is released.

Don Box is a cofounder of DevelopMentor, a COM think tank that educates the software industry in COM, MTS, and ATL. Don wrote Essential COM, and coauthored the follow-up Effective COM (Addison Wesley). Reach Don at http://www.develop.com/dbox.

From the July 2000 issue of MSDN Magazine.

Page view tracker