Cutting Edge

Binary Serialization of ADO.NET Objects

Dino Esposito

Code download available at: CuttingEdge0212.exe (42 KB)

Contents

Serializing DataTable Objects
Inside the DataView Object
Serializing DataView Objects
Serializing DataRow Objects
ADO.NET Runtime Serialization
Binary ADO.NET Runtime Serialization
Custom Binary Serialization for ADO.NET Objects
Creating a Ghost Serializer
Deserializing Data
What About Surrogate Types?
Conclusion

One of the key elements that makes ADO.NET superior to ADO is the level of integration with XML. In ADO, XML is just an I/O format. In addition, the ADO XML format is totally inflexible and doesn't allow any type of customization. Fortunately, since XML is much more deeply integrated into ADO.NET, you can use it to serialize ADO.NET objects. This deeper integration also allows for the dual programming interface which enables you to work on the same data set both hierarchically and relationally. In this column, I'll focus on the first aspect: ADO.NET object serialization. If you'd like to investigate the second aspect, a good place to start is with the characteristics of the XmlDataDocument class, which is covered in the MSDN® documentation.

The serialization of ADO.NET objects can be accomplished in two functionally different ways. You can serialize ADO.NET objects through the standard Microsoft® .NET Framework mechanism based on formatters, such as the binary formatter and the SOAP formatter. As an alternative, you can serialize ADO.NET objects using their built-in methods that dump the object's contents to an XML document according to a given schema.

It's important to note that only the DataSet and DataTable ADO.NET objects are truly serializable. These are the only two objects that implement the ISerializable interface, thus making them accessible to any .NET Framework formatter. However, only the DataSet exposes additional methods (such as WriteXml) that let you explicitly save its contents to XML. Luckily, extending this capability to other ADO.NET objects such as DataTable, DataView, and DataRow is not a particularly difficult task to accomplish.

In this column, I'll first discuss how to enhance ADO.NET objects to make them support built-in XML serialization. Next, I'll consider ADO.NET object serialization as a special case of .NET runtime object serialization. Jeffrey Richter covered the basics (and much more) of the .NET Framework runtime serialization in his .NET columns that appeared in the April, July, and September 2002 issues of MSDN Magazine. In those columns, he discusses how you can sometimes override the way a type serializes itself. It's the idea of overriding, rather than the actual implementation based on surrogate types, that is the key to significantly enhancing the performance of runtime formatters working on ADO.NET objects.

Serializing DataTable Objects

The DataSet object, alone in the System.Data namespace, supplies a series of methods to serialize and deserialize itself in a class-defined XML format. I personally call this format the ADO.NET XML normal form. The WriteXml method of the DataSet class saves the contents of all the child tables and relations to various output streams, including disk files and text writers.

The DataTable class doesn't provide the same WriteXml method to persist its contents to XML. This means that if you have a standalone DataTable object, which is an object that's not included as a child in any parent DataSet object, you must resort to a trick to have it persisted as XML.

When you persist a DataSet object to XML, any contained DataTable object is rendered as XML, but the necessary methods are not publicly available. To work around the issue, you simply create a temporary, empty DataSet object, add the table to it, and then serialize the DataSet object to XML. The code in Figure 1 demonstrates a static method that takes a DataTable object and serializes it to a disk file using the same set of writing modes supported by DataSet objects. The WriteDataTable method has several overloads and mimics the DataSet object's WriteXml method as much as possible. In Figure 1, the input DataTable object is incorporated in a temporary DataSet named DataTable. Of course, in your own implementation you can change this name or, better yet, make it configurable by the user. Note that the name you use slightly influences the resulting XML output. The name of the DataSet, in fact, represents the root node of the final XML document:

DataSet ds = new DataSet("DataTable");
if (dt.DataSet == null)
    ds.Tables.Add(dt);
else
    ds.Tables.Add(dt.Copy());

Figure 1 MsdnMagAdoNetSerializer Class

public class MsdnMagAdoNetSerializer
{
    public static void WriteDataTable(DataTable dt, XmlWriter writer)
    {
        WriteDataTable(dt, writer, XmlWriteMode.IgnoreSchema);
    }
    public static void WriteDataTable(DataTable dt, XmlWriter writer,
        XmlWriteMode mode)
    {
        DataSet tmp = CreateTempDataSet(dt);
        tmp.WriteXml(writer, mode);
    }
    public static void WriteDataTable(DataTable dt, TextWriter writer)
    {
        WriteDataTable(dt, writer, XmlWriteMode.IgnoreSchema);
    }
    public static void WriteDataTable(DataTable dt, TextWriter writer,
        XmlWriteMode mode)
    {
        DataSet tmp = CreateTempDataSet(dt);
        tmp.WriteXml(writer, mode);
    }
    public static void WriteDataTable(DataTable dt, Stream stm)
    {
        WriteDataTable(dt, stm, XmlWriteMode.IgnoreSchema);
    }
    public static void WriteDataTable(DataTable dt, Stream stm,
        XmlWriteMode mode)
    {
        DataSet tmp = CreateTempDataSet(dt);
        tmp.WriteXml(stm, mode);
    }
    public static void WriteDataTable(DataTable dt, string outputFile)
    {
        WriteDataTable(dt, outputFile, XmlWriteMode.IgnoreSchema);
    }
    public static void WriteDataTable(DataTable dt, string outputFile,
        XmlWriteMode mode)
    {
        DataSet tmp = CreateTempDataSet(dt);
        tmp.WriteXml(outputFile, mode);
    }
    private static DataSet CreateTempDataSet(DataTable dt)
    {
        // Create a temporary DataSet
        DataSet ds = new DataSet("DataTable");
        // Make sure the DataTable does not already belong to a DataSet
        if(dt.DataSet == null)
            ds.Tables.Add(dt);
        else
            ds.Tables.Add(dt.Copy());
        return ds;
    }
}

Also note that a DataTable object can't be linked to more than one DataSet object at a time. If a DataTable object belongs to a parent object, its DataSet property is not null. In this case, the temporary DataSet object used to serialize the table must be linked to an in-memory copy of the table. The Copy method creates a full duplicate, both structure and data, of the specified DataTable object. If you prefer, you can implement the ISerializable interface to serialize the DataTable object (see the sidebar "Implementing ISerializable").

The class library that contains the WriteDataTable overloads is available with this month's source code at the link at the top of this article and is named MsdnMagAdoNetSerializer. A client application makes use of the library as follows:

StringWriter writer = new StringWriter();
MsdnMagAdoNetSerializer.WriteDataTable(table, writer);
// Show the serialization output
OutputText.Text = writer.ToString();
writer.Close();

In this code snippet, the WriteDataTable method writes its output as a string. The StringWriter class accepts data through the text writer interface, then exposes it as a string via ToString.

So much for DataTable objects. Let's see what you can do to serialize the content of an in-memory (and possibly filtered) view.

Inside the DataView Object

The DataView class represents a customized view of a DataTable object. The relationship between DataTable and DataView objects is governed by the rules of a well-known software design pattern—the document/view model. According to this model, the DataTable object acts as the document and the DataView object behaves as the view. At any time, you can have a number of different views of the underlying data. More importantly, each view is managed as an independent object with its own set of properties, methods, and events. Additionally, managing views does not produce any type of data duplication or replication.

Under the hood, the DataView is tightly coupled with a particular instance of the DataTable class. The view is implemented as an array that contains the indexes of the original table rows that match the criteria set on the view. By default, the table view is unfiltered and contains references to all the records included in the table.

Figure 2 DataView Architecture


By working on the DataView's RowFilter and RowStateFilter properties, you can narrow the set of rows that fit into a particular view. Using the Sort property, you can even apply a sort expression to the rows in the view. Figure 2 illustrates the internal architecture of the DataView object. When any of the filter properties is set, the DataView object gets from the underlying DataTable object an updated index of the rows that match the criteria. The index is an array of absolute positions. No row objects are physically copied, or even referenced, until they are specifically requested by the client. The link between the DataTable object and the DataView object can be established at creation time by passing the DataTable to the DataView class constructor:

public DataView(DataTable table);

You could also create a new view and associate it with a table at a later time using the DataView object's Table property, as shown in these lines:

DataView dv = new DataView();
dv.Table = dataSet.Tables["Employees"];

The relationship between DataTable and DataView objects also works the other way around. In fact, you can obtain a DataView object from any table using the DataTable's DefaultView property. The property returns an unfiltered DataView object initialized to work on that table:

DataView dv = dt.DefaultView;

The view object you get in this way includes as many elements as there are rows in the table. The content of a DataView object can be walked through with a variety of programming interfaces, including collections, lists, and enumerators. Enumerators, in particular, ensure through the GetEnumerator method that you can walk your way through the records in the view using the familiar foreach statement. The following code shows how to access all the rows that fit into the view:

DataView myView = new DataView(table);
foreach(DataRowView rowview in myView)
{
    // Dereference the DataRow object
    DataRow row = rowview.Row;
    •••
}

When client apps access a particular row in the view, the class expects to find it in an internal rows cache (see Figure 2). A row that is already in the cache gets packed into an intermediate DataRowView object and returned to the caller. The DataRowView object is a wrapper for the reference to the DataRow object that contains the actual data. The DataRowView object acts as the programmable interface that controls and references the row object as a whole. If you need it, you can access the underlying DataRow object from a DataRowView instance through its Row property.

If the row that the client requested is not already in the internal DataView cache, then the DataView class loads it from the original table. More specifically, the DataView rows cache is emptied and filled in a single shot during the first request. The rows cache can be empty either because it has not yet been used or because the sort expression or the filter string have been changed. Whenever you reset the filter or the sort expression, the cache is emptied. The first time a row is requested and the cache is empty, the DataView object promptly fills it up with an array of DataRowView objects, each of which references an original DataRow object.

Serializing DataView Objects

Let's extend the MsdnMagAdoNetSerializer class shown in Figure 1 to make it also provide overloaded methods to serialize a DataView object. The idea is that you build a copy of the original DataTable object with the rows that fit into the view (see Figure 3).

Figure 3 Copy of DataTable

public class MsdnMagAdoNetSerializer
{
    •••
    public static void WriteDataView(DataView dv, string outputFile,
        XmlWriteMode mode)
    {
        DataTable dt = CreateTempTable(dv);
        WriteDataTable(dt, outputFile, mode);
    }
    private static DataTable CreateTempTable(DataView dv)
    {
        // Create a temporary DataTable with the same structure
        // as the original
        DataTable dt = dv.Table.Clone();
        // Fill the DataTable with all the rows in the view
        foreach(DataRowView rowview in dv)
            dt.ImportRow(rowview.Row);
        return dt;
    }
}

First you create a temporary DataTable with the same structure as the table behind the DataView object you want to save. Next, you fill this temporary copy of the DataTable with the rows referenced in the view. As the final step, you serialize the table to XML using any of the previously defined WriteDataTable methods.

The DataTable's ImportRow method plays a key role in the code shown in Figure 3. ImportRow works by creating a new DataRow object in the context of the table on which the method is invoked. Just like other ADO.NET objects, a DataRow cannot be referenced by two container objects at the same time. Using ImportRow is logically equivalent to cloning the row and then adding the clone to the table. It is worth noting that ImportRow preserves any property settings, as well as original and current values. Unlike the NewRow method, which creates a new row with default values, calling ImportRow maintains the existing state of the imported row, including its current DataRowState value.

Serializing DataRow Objects

The two examples discussed so far indicate a standard way to persist ADO.NET objects to XML. The idea consists of creating a hierarchy of parent objects from the actual object up to the DataSet. So to serialize a single, standalone DataRow object, you have to add it to a temporary DataTable object and, in turn, add the DataTable to a worker DataSet.
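As a sketch of this technique, a WriteDataRow overload for the serializer class might look like the following. The method name and its placement in MsdnMagAdoNetSerializer are my assumptions; the downloadable library may organize this differently:

```csharp
// Hypothetical extension to the MsdnMagAdoNetSerializer class.
// Wraps a standalone DataRow in a temporary DataTable, which
// WriteDataTable in turn wraps in a worker DataSet.
public static void WriteDataRow(DataRow row, string outputFile,
    XmlWriteMode mode)
{
    // Clone the structure of the table the row was created from
    DataTable dt = row.Table.Clone();
    // Import a copy of the row; a DataRow can't belong to two tables
    dt.ImportRow(row);
    // Reuse the DataTable serialization path defined earlier
    WriteDataTable(dt, outputFile, mode);
}
```

Note that even a standalone DataRow carries a reference to the DataTable it was created from, which is what makes the Clone call above possible.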

ADO.NET Runtime Serialization

As I mentioned earlier, there are basically two ways to serialize ADO.NET objects: using the object's own XML interface when available or using the standard .NET Framework data formatters. So far I've discussed methods for serializing ADO.NET object data to XML, including the necessary API extensions to have the feature supported by objects other than the DataSet. Now let's look at what's needed to serialize ADO.NET objects using the standard .NET Framework runtime object serialization engine.

The big difference between methods like WriteXml and .NET Framework data formatters is that, in the former case, the object controls its own serialization process. When .NET Framework data formatters are involved, on the other hand, an object can behave in one of two ways. It can declare itself serializable, passively letting the formatter extrapolate any significant information that needs to be serialized. This type of object serialization is enabled on a per-class basis using the [Serializable] attribute. The serialization process utilizes the .NET Framework reflection API to list all the properties that make up the state of an object.

The second option requires that the object implement the ISerializable interface, making it responsible for passing the formatter any data to be serialized. After this step, however, the object no longer controls the process. A class that is not marked with the [Serializable] attribute and that does not implement the ISerializable interface can't be serialized. No ADO.NET classes declare themselves as serializable, and only DataSet and DataTable implement the ISerializable interface. For example, you can't serialize a DataColumn or a DataRow object with any .NET Framework formatter. As mentioned earlier, surrogate types sometimes represent a system-provided workaround for such a problem, that is, making relatively simple non-serializable types serializable to .NET formatters. But let's first briefly recall the key aspects of .NET runtime serialization.

Implementing ISerializable

Another more elegant and efficient way to serialize a DataTable object to a truly binary stream is to create a derived class and make it implement the ISerializable interface, the .NET Framework way in which a serializable class can control its serialization process. What's the advantage? When you use ISerializable, you touch data only once. In contrast, a solution based on the ghost serializer class first copies data from the DataTable to the properties of the ghost serializer and then from there up to the streaming context for the actual serialization. As you can see, at the end of the day, you've moved the data twice. In addition, a solution based on the ISerializable interface will be more forward-compatible with future versions of ADO.NET.

The BinDataTable class outlined as follows can use the same tricks of the ghost serializer to do its job:

[Serializable]
public class BinDataTable : DataTable, ISerializable
{
    protected BinDataTable(SerializationInfo si, StreamingContext ctx)
    {...}
    void System.Runtime.Serialization.ISerializable.GetObjectData(
        SerializationInfo si, StreamingContext ctx)
    {...}
}

The GetObjectData method copies the DataTable-sensitive information up to the streaming context. The protected serialization class constructor will do the reverse—extract data from the stream and initialize a brand new BinDataTable object.

To take this route, though, you need to work with a different class and apply a cast wherever a plain DataTable object is required. The full source code demonstrating this approach is included in this month's download.

Binary ADO.NET Runtime Serialization

The .NET Framework comes with two predefined formatter objects defined in the System.Runtime.Serialization.Formatters namespace: the binary formatter and the SOAP formatter. The classes that provide these two serializers are BinaryFormatter and SoapFormatter. The former produces more compact output, while the latter is designed for interoperability and generates a SOAP-based description of the class.

The following code shows what's needed to serialize a DataTable object using a binary formatter:

BinaryFormatter bin = new BinaryFormatter();
StreamWriter dat = new StreamWriter(outputFile);
bin.Serialize(dat.BaseStream, dataTable);
dat.Close();

The formatter's Serialize method causes the formatter to flush the contents of an object to a binary stream. The Deserialize method does the reverse. In particular, it reads from a previously created binary stream, rebuilds the object, and returns it to the caller:

BinaryFormatter bin = new BinaryFormatter();
StreamReader reader = new StreamReader(sourceFile);
DataTable table = (DataTable) bin.Deserialize(reader.BaseStream);
reader.Close();

So far, so good. However, when you run this code, something unexpected happens. If you serialize a DataTable object or a DataSet object using the binary formatter, you still get a binary file, but a fairly large one, because it's filled with a ton of XML data. Unfortunately, XML data in a binary file makes for a huge file that lacks the portability and readability advantages XML normally provides. Consequently, deserializing such a file might take seconds to complete and end up occupying much more memory than really needed. As a result, if you choose binary serialization of ADO.NET objects because you need more compact output, you will be disappointed. Binary serialization is still the most space-effective approach, but with ADO.NET objects it doesn't prove to be as effective as it should. Is there an architectural reason for this rather odd behavior?

The ADO.NET objects that are serializable through formatters—only the DataTable and DataSet classes—correctly implement the ISerializable interface, which makes them responsible for providing the data being serialized. The ISerializable interface consists of a single method, GetObjectData, whose output the formatter takes and flushes into the output stream.

By design, the DataTable and DataSet classes describe themselves to serializers using XML data. In particular, a DiffGram document is used—that is, plain and verbose XML. Oblivious to the potential inefficiency of the approach, the binary formatter takes this rather long string and appends it to the stream. In this way, DataSet and DataTable objects are always remoted across domains using XML—which is great. Unfortunately, if you are looking for a more compact representation of persisted tables, the ordinary .NET Framework runtime serialization for ADO.NET objects is not for you (not without some adjustments, at least).
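For reference, the DiffGram that ends up inside the binary stream looks roughly like the following. This is a hand-written, abbreviated sample (the table and column names are invented), not actual formatter output:

```xml
<diffgr:diffgram
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
    xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
  <NewDataSet>
    <Employees diffgr:id="Employees1" msdata:rowOrder="0">
      <employeeid>1</employeeid>
      <lastname>Davolio</lastname>
    </Employees>
  </NewDataSet>
</diffgr:diffgram>
```

Every value travels as verbose element text, which goes a long way toward explaining the size of the resulting "binary" file.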

The following pseudocode illustrates how DataSet and DataTable objects serialize to a .NET formatter:

void GetObjectData(SerializationInfo info, StreamingContext context)
{
    info.AddValue("XmlSchema", this.GetXmlSchema());
    StringWriter strWriter = new StringWriter();
    this.WriteXml(strWriter, XmlWriteMode.DiffGram);
    info.AddValue("XmlDiffGram", strWriter.ToString());
}

The class passes its data to the formatter by adding entries to the SerializationInfo object using the AddValue method. For example, a DataSet object serializes itself by adding a couple of entries for schema and data. The DataTable object does the same, but uses a temporary DataSet object to get its schema and data expressed as XML strings. The information stored in the SerializationInfo is then flushed to a binary stream or a SOAP stream, depending on the formatter in use.
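The reverse path is symmetrical. Here is a sketch, my own reconstruction rather than the Framework's actual source, of how a deserialization routine could read those same entries back:

```csharp
// Hypothetical counterpart of the GetObjectData pseudocode above.
// The entry names match the AddValue calls; the rest is a sketch.
void LoadObjectData(DataSet ds, SerializationInfo info)
{
    string schema = (string) info.GetValue("XmlSchema", typeof(string));
    string diffGram = (string) info.GetValue("XmlDiffGram", typeof(string));
    // Rebuild the structure first, then reload data and row states
    ds.ReadXmlSchema(new StringReader(schema));
    ds.ReadXml(new StringReader(diffGram), XmlReadMode.DiffGram);
}
```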

Custom Binary Serialization for ADO.NET Objects

To optimize the binary representation of a DataTable object (or a DataSet object), you have no choice but to map the class to an intermediate object whose serialization process is under your complete control. In theory, there are a couple of ways you can implement this algorithm. For example, you can design your application to use custom classes that wrap the functionality of DataTable and DataSet objects while providing a different, custom-made scheme for runtime serialization. As an alternative, you could exploit the .NET resource designed expressly for these situations: surrogate types. A surrogate type is a class that takes over the task of serializing and deserializing a given type, whether or not that type is serializable. You register the surrogate type for a given class with the formatter of choice and then proceed as usual. Under the hood, the formatter detects the surrogate type and serializes it in lieu of the original type. As I'll show later in this column, for architectural reasons surrogates don't work with DataTable objects.

Customizing the binary serialization of ADO.NET objects requires you to create a custom class that supports the desired type of serialization, provide the class with the data to serialize, and finally serialize and deserialize as usual.

If you create a custom class that wholly manages the DataTable (or DataSet) serialization, you need to either create a sort of ghost serializer class and mark it as serializable, or implement the ISerializable interface. Next, you copy the key properties of the DataTable object to the members of the class. Which members you actually map is up to you. However, for DataTable objects the list must include the column names and types, plus the data rows. You serialize this new class to the binary formatter instead of the DataTable and, when deserialization occurs, use the restored information to build a new instance of the DataTable object. Let's take some time to explore these steps in more detail and then analyze the results.

Creating a Ghost Serializer

Assuming that you need to persist only columns and rows of a DataTable object, you can quickly create a ghost class. In the following snippet, the ghost serializer class is named GhostDataTableSerializer:

[Serializable]
public class GhostDataTableSerializer
{
    protected ArrayList colNames;
    protected ArrayList colTypes;
    protected ArrayList dataRows;
    public GhostDataTableSerializer()
    {...}
    public void Load(DataTable dt)
    {...}
    public DataTable Save()
    {...}
}

The full source of the class is shown in Figure 4. The class consists of three ArrayList objects which contain column names, column types, and data rows. Notice that ArrayList is a serializable class. The application will serialize the DataTable object through the services of the ghost serializer class, as shown here:

void BinarySerialize(DataTable dt, string outputFile)
{
    BinaryFormatter bin = new BinaryFormatter();
    StreamWriter dat = new StreamWriter(outputFile);
    // Instantiate and fill the ghost serializer
    GhostDataTableSerializer ghost = new GhostDataTableSerializer();
    ghost.Load(dt);
    // Serialize the object
    bin.Serialize(dat.BaseStream, ghost);
    dat.Close();
}

The key thing to focus on here is how the DataTable object is mapped to the GhostDataTableSerializer class. The mapping takes place inside the Load method.

Figure 4 GhostDataTableSerializer Class

using System;
using System.Collections;
using System.Data;

// Worker class for binary serialization
[Serializable]
public class GhostDataTableSerializer
{
    public GhostDataTableSerializer()
    {
        colNames = new ArrayList();
        colTypes = new ArrayList();
        dataRows = new ArrayList();
    }

    // Properties
    protected ArrayList colNames;
    protected ArrayList colTypes;
    protected ArrayList dataRows;

    // Populate the custom table object for custom serialization
    public void Load(DataTable dt)
    {
        // Insert column information (names and types)
        foreach(DataColumn col in dt.Columns)
        {
            colNames.Add(col.ColumnName);
            colTypes.Add(col.DataType.FullName);
        }
        // Insert row information
        foreach(DataRow row in dt.Rows)
            dataRows.Add(row.ItemArray);
    }

    // Return a DataTable object filled with the embedded data
    public DataTable Save()
    {
        DataTable dt = new DataTable();
        // Add columns
        for(int i=0; i<colNames.Count; i++)
        {
            DataColumn col = new DataColumn(colNames[i].ToString(),
                Type.GetType(colTypes[i].ToString()));
            dt.Columns.Add(col);
        }
        // Add rows
        for(int i=0; i<dataRows.Count; i++)
        {
            DataRow row = dt.NewRow();
            row.ItemArray = (object[]) dataRows[i];
            dt.Rows.Add(row);
        }
        dt.AcceptChanges();
        return dt;
    }
}

The Load method populates the worker arrays: colNames and colTypes receive the name and type of each column, while dataRows receives, for each row, the array of objects that represents all the values in the row.

public void Load(DataTable dt)
{
    foreach(DataColumn col in dt.Columns)
    {
        colNames.Add(col.ColumnName);
        colTypes.Add(col.DataType.FullName);
    }
    foreach(DataRow row in dt.Rows)
        dataRows.Add(row.ItemArray);
}

The DataRow object's ItemArray property is an array of objects. It turns out to be particularly handy because it lets you handle the contents of the entire row as a single, monolithic piece of data. Internally, the get accessor of the ItemArray property is implemented as a simple loop that reads and stores one column after the next. The set accessor is even more valuable because it automatically groups all the changes in a pair of BeginEdit/EndEdit calls and fires column-change events as appropriate.

Notice that the call to AcceptChanges (see Figure 4) resets the status of all the rows to Unchanged, violating one of the key goals of serialization: preserving the state of the object. Because the rows are re-added to a brand-new table, if you omitted the call to AcceptChanges they would all be marked as Added. In other words, the state of the rows is not persisted; to obtain that, you would need the ghost serializer to save, and then restore, row state information as well.
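A sketch of such an extension follows. The rowStates array and the per-row AcceptChanges trick are my own assumptions about one way to do it, not code from this month's download; note also that faithfully restoring Modified rows would additionally require saving their original values, which this sketch ignores:

```csharp
// Hypothetical addition to the ghost serializer: persist row states.
// In Load, next to dataRows.Add(row.ItemArray), also do:
//     rowStates.Add(row.RowState);
protected ArrayList rowStates = new ArrayList();

// Replacement for the row-adding loop in Save
void AddRowsPreservingState(DataTable dt)
{
    for(int i=0; i<dataRows.Count; i++)
    {
        DataRow row = dt.NewRow();
        row.ItemArray = (object[]) dataRows[i];
        dt.Rows.Add(row);                   // row is now in the Added state
        DataRowState state = (DataRowState) rowStates[i];
        if(state != DataRowState.Added)
            row.AcceptChanges();            // demote to Unchanged
        if(state == DataRowState.Deleted)
            row.Delete();                   // re-mark as Deleted
    }
}
```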

Figure 5 Sample App in Action


Figure 5 shows the sample application in action. As the figure illustrates, a DataTable object serialized using a ghost class can be up to 80 percent smaller than an identical object serialized the standard way through the same binary serializer. In particular, consider the DataTable object resulting from the following query:

SELECT * FROM [Order Details]

The table contains five columns and more than 2,000 records. It would take up half a megabyte if serialized to the binary formatter as a DataTable object. By using an intermediate ghost serializer, the size of the output is roughly 83 percent smaller. Looking at things the other way around, the result of the standard serialization process is in this case about six times larger than what you obtain through the ghost class.

Of course, you don't get such impressive results in all cases. The greater the portion of the table that consists of numbers, the more space you save. The more BLOB fields you have, the less space you save. Try running the following query on the SQL Server™ Northwind database, in which photo is the BLOB field that contains the employee's picture:

SELECT photo FROM employees

The savings here amount to only 25 percent, which represents the lower end of the scale in my tests. The reason for this difference should be self-evident: BLOB fields are already made of binary data, so the amount of space you can save is limited.

Deserializing Data

Once the binary data has been deserialized, you hold an instance of the ghost class that must be transformed back into a usable DataTable object. Let's see how the sample application can accomplish this:

DataTable BinaryDeserialize(string sourceFile)
{
    BinaryFormatter bin = new BinaryFormatter();
    StreamReader reader = new StreamReader(sourceFile);
    GhostDataTableSerializer ghost;
    ghost = (GhostDataTableSerializer) bin.Deserialize(reader.BaseStream);
    reader.Close();
    return ghost.Save();
}

The Save method uses the information stored in the ghost arrays to add columns and rows to a newly created and returned DataTable object (see Figure 6).

Figure 6 Add Rows and Columns

public DataTable Save()
{
    DataTable dt = new DataTable();
    // Add columns
    for(int i=0; i<colNames.Count; i++)
    {
        DataColumn col = new DataColumn(colNames[i].ToString(),
            Type.GetType(colTypes[i].ToString()));
        dt.Columns.Add(col);
    }
    // Add rows
    for(int i=0; i<dataRows.Count; i++)
    {
        DataRow row = dt.NewRow();
        row.ItemArray = (object[]) dataRows[i];
        dt.Rows.Add(row);
    }
    dt.AcceptChanges();
    return dt;
}

What About Surrogate Types?

Serializing a DataTable object using a surrogate class is simple and effective and does not pose a problem. However, what happens during the deserialization process makes surrogates impractical for ADO.NET objects. The following code snippet shows a portion of the code you might want to write to serialize a DataTable object through a surrogate class named DataTableSurrogate:

SurrogateSelector ss = new SurrogateSelector();
DataTableSurrogate dts = new DataTableSurrogate();
ss.AddSurrogate(typeof(DataTable),
    new StreamingContext(StreamingContextStates.All), dts);
formatter.SurrogateSelector = ss;
formatter.Serialize(dat.BaseStream, dt);

You first create a surrogate selector and then add the surrogate type to the chain of surrogates. Next, you associate the selector with the formatter. A surrogate class implements the ISerializationSurrogate interface, which consists of a couple of methods—GetObjectData and SetObjectData. GetObjectData adds items to the SerializationInfo collection, as shown in Figure 7. Figure 8 shows a possible implementation of the method that mimics the behavior of the previously examined ghost serializer class. In terms of performance, the results are nearly identical.

Figure 7 Surrogate Class

public class DataTableSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj, SerializationInfo info,
        StreamingContext context)
    {
        DataTable dt = (DataTable) obj;
        Fill(dt, info);
    }
    public object SetObjectData(object obj, SerializationInfo info,
        StreamingContext context, ISurrogateSelector selector)
    {
        DataTable dt = (DataTable) obj;
        ReadSerializationInfo(dt, info);
        return null;
    }
}

Figure 8 Mimicking the Ghost Serializer

protected void Fill(DataTable dt, SerializationInfo info)
{
    ArrayList colNames = new ArrayList();
    ArrayList colTypes = new ArrayList();
    ArrayList dataRows = new ArrayList();
    // Insert column information (names and types) into worker arrays
    foreach(DataColumn col in dt.Columns)
    {
        colNames.Add(col.ColumnName);
        colTypes.Add(col.DataType.FullName);
    }
    // Insert row information into a worker array
    foreach(DataRow row in dt.Rows)
        dataRows.Add(row.ItemArray);
    // Add the arrays to the serialization info structure
    info.AddValue("ColNames", colNames);
    info.AddValue("ColTypes", colTypes);
    info.AddValue("DataRows", dataRows);
}

As I mentioned, the practical reason you can't use surrogates to serialize ADO.NET objects is architectural. ADO.NET objects are fairly complex objects in which the constructors play a key role. To have a DataSet or a DataTable object up and running, you must call one of its predefined constructors. When handling surrogates, though, the .NET runtime doesn't use constructors to instantiate objects. Rather, it resorts to the static GetUninitializedObject method of the FormatterServices class, and the new instance of the object is initialized to zero. Because no constructor runs, the instance may not be in a consistent state, and otherwise safe operations can unexpectedly fail.
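The effect of GetUninitializedObject is easy to demonstrate with a trivial class of your own. The Person class below is purely illustrative:

```csharp
using System;
using System.Runtime.Serialization;

public class Person
{
    public string Name;
    public Person() { Name = "(unnamed)"; }
}

public class Demo
{
    public static void Main()
    {
        // new runs the constructor; GetUninitializedObject does not
        Person p1 = new Person();
        Person p2 = (Person) FormatterServices.GetUninitializedObject(
            typeof(Person));
        Console.WriteLine(p1.Name);          // (unnamed)
        Console.WriteLine(p2.Name == null);  // True: field left zeroed
    }
}
```

A DataTable's internal helper objects suffer the same fate, which is why a surrogate would have to rebuild them by hand.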

For example, the default constructor of the DataTable object performs a long list of member initialization operations. Not all of the DataTable's internal members can simply be left null: the class relies on a number of internal arrays and other helper objects that are initialized by the constructor. The GetUninitializedObject method merely zeroes a block of memory and initializes none of these objects. At the same time, a surrogate class rarely has intimate knowledge of the class it replaces. In other words, to know exactly what's needed to correctly initialize a DataTable object, you would first have to look at the source code. And even assuming you could do that (using a .NET decompiler), you would have to make intensive use of reflection to access internal and private DataTable members. Using the ghost serializer saves about the same amount of space and is much easier to code.

Conclusion

To top off the discussion, I have to mention that the objective difficulty of providing surrogate types for ADO.NET serialization stems from two factors. One is certainly the rather complex and sophisticated nature of the objects involved. According to fellow columnist Jeffrey Richter, though, another subtle yet transient reason should be mentioned. A bug in the formatter's code currently prevents you from making SetObjectData return a completely new type. Once this bug is fixed, you could easily work around the system's use of the GetUninitializedObject method by calling the object's constructor within SetObjectData and then returning the newly created object to the formatter. For more information on the bug, see the .NET column in the September 2002 issue.

Send your questions and comments for Dino to cutting@microsoft.com.

Dino Esposito is an instructor and consultant based in Rome, Italy. He is the author of Building Web Solutions with ASP.NET and ADO.NET and Applied XML Programming for Microsoft .NET, both from Microsoft Press (2002). Reach Dino at dinoe@wintellect.com.