From the July 2002 issue of MSDN Magazine
Run-time Serialization, Part 2
In my last column, I explained how to use the two serialization formatters included in the Framework Class Library (FCL) and how to define types that allow instances of themselves to be serialized. I'll continue the discussion of serialization by focusing on more advanced techniques that allow a type to completely define how it serializes and deserializes itself.
      Occasionally, you will design a type that requires complete control over how it is serialized and deserialized. The System.Collections.Hashtable type is just such a type. When serialized, a Hashtable object and all the objects it references must be written to the stream. Upon deserialization, a new Hashtable object must be constructed, all the objects managed by the Hashtable object must be constructed, and all the object references must be set correctly. The problem is that hash codes are not guaranteed to be the same for the newly deserialized objects. So deserializing a Hashtable requires that all the objects be deserialized first and then each of the objects must be manually added to the Hashtable object using each object's new hash code value.
      For a type to have complete control over how it is serialized and deserialized, the type must implement the System.Runtime.Serialization.ISerializable interface, which is defined as follows:
public interface ISerializable {
   void GetObjectData(SerializationInfo info, StreamingContext context);
}
This interface has just one method, but any type that implements this interface must implement two methods: the GetObjectData method and a special constructor that I'll describe shortly.
      When a formatter serializes an object graph, it looks at each object's type. If the type implements the ISerializable interface, then the formatter constructs a new System.Runtime.Serialization.SerializationInfo object. This object contains the actual set of values that should be serialized for the object.
      When constructing a SerializationInfo, the formatter passes two parameters: a Type and a System.Runtime.Serialization.IFormatterConverter. The Type parameter identifies the object that is being serialized. Two pieces of information are required to uniquely identify a type: the string name of the type and its assembly's identity (which includes the assembly name, version, culture, and public key). When a SerializationInfo object is constructed, it obtains the type's full name (by internally querying Type's FullName property) and stores this string in a private field. You can obtain the type's full name by querying SerializationInfo's FullTypeName property. Likewise, the constructor obtains the type's defining assembly (by internally querying Type's Module property, then Module's Assembly property, then Assembly's FullName property) and stores this string in a private field. You can obtain the assembly's identity by querying SerializationInfo's AssemblyName property.
      While you can set a SerializationInfo's FullTypeName and AssemblyName properties, I would discourage it. If you want to change the type that is being serialized, it is recommended that you call SerializationInfo's SetType method, passing a reference to the desired Type object. Calling SetType ensures that the type's full name and defining assembly are set correctly. Later I'll show you a sample call to SetType.
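      As a minimal sketch of the identity information described above, the following builds a SerializationInfo by hand, roughly as a formatter would, and reads back the type and assembly names before and after a SetType call. The InfoIdentityDemo class and its Create helper are hypothetical names used only for illustration, and the assembly string printed depends on the runtime in use.

```csharp
using System;
using System.Collections;
using System.Runtime.Serialization;

public static class InfoIdentityDemo {
   // Hypothetical helper: builds a SerializationInfo for the given type,
   // passing the same converter type the Microsoft formatters use.
   public static SerializationInfo Create(Type type) {
      return new SerializationInfo(type, new FormatterConverter());
   }

   public static void Main() {
      SerializationInfo info = Create(typeof(Hashtable));
      Console.WriteLine(info.FullTypeName);   // System.Collections.Hashtable
      Console.WriteLine(info.AssemblyName);   // assembly identity; varies by runtime

      // SetType updates the full type name and the assembly identity together,
      // which is why it is preferred over assigning the two properties.
      info.SetType(typeof(ArrayList));
      Console.WriteLine(info.FullTypeName);   // System.Collections.ArrayList
   }
}
```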
      Once the SerializationInfo object is constructed and initialized, the formatter calls the type's GetObjectData method, passing it the reference to the SerializationInfo object. The GetObjectData method is responsible for determining what information is necessary to serialize the object and adds this information to the SerializationInfo object. GetObjectData indicates what information to serialize by calling one of the overloaded AddValue methods provided by the SerializationInfo type. AddValue is called once for each piece of data that you want to add.
      The code in Figure 1 shows how the Hashtable type implements the ISerializable and IDeserializationCallback interfaces to take control over the serialization and deserialization of its objects. Each AddValue method takes a String name and some data. Usually, the data is of a simple value type like Boolean, Char, Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Single, Double, Decimal, or DateTime. You can also call AddValue, passing it a reference to an Object such as a String. After GetObjectData has added all of the necessary serialization information, it returns to the formatter.
      Always call one of the overloaded AddValue methods to add serialization information for your type. If a field's type implements the ISerializable interface, don't call GetObjectData on the field. Instead, call AddValue to add the field; the formatter will see that the field's type implements ISerializable and will call GetObjectData for you. If you called GetObjectData on the field object, the formatter wouldn't know to create a new object when deserializing the stream.
      The formatter now takes all of the values added to the SerializationInfo object and serializes each of them to the byte stream. You'll notice that the GetObjectData method is passed another parameter, a reference to a System.Runtime.Serialization.StreamingContext object. Most types' GetObjectData methods will completely ignore this parameter, so I'll discuss that later.
      Now that you know how to set all of the information used for serialization, let's turn to deserialization. As the formatter extracts an object from the byte stream, it allocates memory for the new object (by calling the System.Runtime.Serialization.FormatterServices type's static GetUninitializedObject method). Initially, all of this object's fields are set to 0 or null. Then the formatter checks if the type implements the ISerializable interface. If the type implements this interface, the formatter attempts to call a special constructor whose parameters are identical to those of the GetObjectData method.
      If your class is sealed, I highly recommend that you declare this special constructor to be private. This will prevent any code from accidentally calling it. If your class is not sealed, then you should declare this special constructor as protected so that only derived classes can call it. The runtime has an internal mechanism that allows it to call this special constructor no matter how it is declared.
      This constructor receives a reference to a SerializationInfo object containing all of the values added to it when the object was serialized. The special constructor can call any of the GetXxx methods (GetBoolean, GetChar, GetByte, GetSByte, GetInt16, GetUInt16, GetInt32, GetUInt32, GetInt64, GetUInt64, GetSingle, GetDouble, GetDecimal, GetDateTime, GetString, or GetValue), passing in the string name of the value to retrieve. The value returned from each of these methods is then used to initialize the fields of the new object. When the constructor returns, the object should be fully constructed.
      When deserializing an object's fields, you should call the Get method that matches the type of value that was passed to the AddValue method when the object was serialized. In other words, if the GetObjectData method called AddValue passing it an Int32 value, then the GetInt32 method should be called when deserializing the object. If the value's type in the stream doesn't match the type that you're trying to get, then the formatter will attempt to use an IFormatterConverter object to cast the stream's value to the type that is desired.
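      The two halves described so far, GetObjectData and the special constructor, can be sketched as follows. LabeledPoint and RoundTripDemo are hypothetical names and are not taken from Figure 1. To keep the example independent of any particular formatter, the RoundTrip helper stands in for one: it builds the SerializationInfo, allocates an uninitialized object, and invokes the private special constructor through reflection. That is an approximation of what a formatter does, not its actual implementation.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical sealed type that controls its own serialization.
[Serializable]
public sealed class LabeledPoint : ISerializable {
   private readonly int x, y;
   private readonly string label;

   public LabeledPoint(int x, int y, string label) {
      this.x = x; this.y = y; this.label = label;
   }

   public int X { get { return x; } }
   public int Y { get { return y; } }
   public string Label { get { return label; } }

   // Serialization side: one AddValue call per piece of data.
   public void GetObjectData(SerializationInfo info, StreamingContext context) {
      info.AddValue("x", x);
      info.AddValue("y", y);
      info.AddValue("label", label);
   }

   // Deserialization side: private because the class is sealed.
   // Each Get call matches the type passed to AddValue above.
   private LabeledPoint(SerializationInfo info, StreamingContext context) {
      x = info.GetInt32("x");
      y = info.GetInt32("y");
      label = info.GetString("label");
   }
}

public static class RoundTripDemo {
   // Stand-in for a formatter (an assumption made for this sketch).
   public static LabeledPoint RoundTrip(LabeledPoint original) {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      SerializationInfo info =
         new SerializationInfo(typeof(LabeledPoint), new FormatterConverter());
      original.GetObjectData(info, context);

      // Allocate memory without running a constructor, then run the
      // special constructor on that memory.
      object raw = FormatterServices.GetUninitializedObject(typeof(LabeledPoint));
      ConstructorInfo ctor = typeof(LabeledPoint).GetConstructor(
         BindingFlags.Instance | BindingFlags.NonPublic, null,
         new Type[] { typeof(SerializationInfo), typeof(StreamingContext) }, null);
      ctor.Invoke(raw, new object[] { info, context });
      return (LabeledPoint) raw;
   }

   public static void Main() {
      LabeledPoint copy = RoundTrip(new LabeledPoint(3, 4, "corner"));
      Console.WriteLine("{0} {1} {2}", copy.X, copy.Y, copy.Label);
   }
}
```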
      As I mentioned earlier, when a SerializationInfo object is constructed, it is passed an object whose type implements the IFormatterConverter interface. Since the formatter is responsible for constructing the SerializationInfo object, it chooses whatever IFormatterConverter type it wants. The Microsoft®-provided BinaryFormatter and SoapFormatter types always construct an instance of the System.Runtime.Serialization.FormatterConverter type. Microsoft formatters don't offer any way for you to select a different IFormatterConverter type.
      The FormatterConverter type knows how to convert values between the core types, such as converting an Int32 to an Int64. To convert a value between other types, the FormatterConverter casts the serialized (or original) type to an IConvertible interface and then calls the appropriate interface method. To allow objects of a serializable type to be deserialized as a different type, you may want to consider having your type implement the IConvertible interface. The FormatterConverter object is only used when deserializing objects and when you're calling a Get method whose type doesn't match the type of the value in the stream.
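      A short sketch of the conversion path: the value below is added as an Int32 but read back with GetInt64, so the SerializationInfo hands it to its FormatterConverter. ConverterDemo and the value name "count" are hypothetical.

```csharp
using System;
using System.Runtime.Serialization;

public static class ConverterDemo {
   public static long ReadInt32AsInt64(int value) {
      SerializationInfo info =
         new SerializationInfo(typeof(object), new FormatterConverter());
      info.AddValue("count", value);    // stored as an Int32

      // The requested type differs from the stored type, so the
      // converter passed to the constructor performs the conversion.
      return info.GetInt64("count");
   }

   public static void Main() {
      Console.WriteLine(ReadInt32AsInt64(42));
   }
}
```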
      Instead of calling the various Get methods, the special constructor could instead call GetEnumerator, which returns a System.Runtime.Serialization.SerializationInfoEnumerator object that iterates through all the values contained within the SerializationInfo object. Each value enumerated is a System.Runtime.Serialization.SerializationEntry object.
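      Enumerating the values might look like the sketch below. EnumerateDemo and its Describe helper are hypothetical; a special constructor could run the same loop over the SerializationInfo it receives.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

public static class EnumerateDemo {
   // Returns one descriptive line per value held by the SerializationInfo.
   public static List<string> Describe(SerializationInfo info) {
      List<string> lines = new List<string>();
      SerializationInfoEnumerator e = info.GetEnumerator();
      while (e.MoveNext()) {
         SerializationEntry entry = e.Current;
         lines.Add(entry.Name + " (" + entry.ObjectType.Name + ") = " + entry.Value);
      }
      return lines;
   }

   public static void Main() {
      SerializationInfo info =
         new SerializationInfo(typeof(object), new FormatterConverter());
      info.AddValue("id", 7);
      info.AddValue("name", "widget");
      foreach (string line in Describe(info)) Console.WriteLine(line);
   }
}
```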
      Of course, you are welcome to define a type of your own that derives from a type that implements ISerializable's GetObjectData method and special constructor. If your type also implements ISerializable, then your implementation of GetObjectData and your implementation of the special constructor must call the corresponding members in the base class; otherwise the object will not be serialized and deserialized correctly. Later I'll explain how to properly define an ISerializable type whose base type doesn't implement this interface.
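      The chaining requirement can be sketched with a hypothetical Shape base class and Circle derived class. Circle's GetObjectData calls the base implementation before adding its own value, and its special constructor chains to the base special constructor. The Rebuild helper only approximates a formatter by invoking the protected constructor through reflection.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public class Shape : ISerializable {
   private string name;
   public Shape(string name) { this.name = name; }
   public string Name { get { return name; } }

   // Protected so that derived types can chain to it.
   protected Shape(SerializationInfo info, StreamingContext context) {
      name = info.GetString("name");
   }

   public virtual void GetObjectData(SerializationInfo info, StreamingContext context) {
      info.AddValue("name", name);
   }
}

[Serializable]
public class Circle : Shape {
   private double radius;
   public Circle(string name, double radius) : base(name) { this.radius = radius; }
   public double Radius { get { return radius; } }

   // Chain to the base special constructor, then read this type's values.
   protected Circle(SerializationInfo info, StreamingContext context)
      : base(info, context) {
      radius = info.GetDouble("radius");
   }

   // Let the base type add its values first, then add this type's values.
   public override void GetObjectData(SerializationInfo info, StreamingContext context) {
      base.GetObjectData(info, context);
      info.AddValue("radius", radius);
   }
}

public static class DerivedDemo {
   // Stand-in for a formatter (an assumption made for this sketch).
   public static Circle Rebuild(Circle original) {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      SerializationInfo info =
         new SerializationInfo(typeof(Circle), new FormatterConverter());
      original.GetObjectData(info, context);
      ConstructorInfo ctor = typeof(Circle).GetConstructor(
         BindingFlags.Instance | BindingFlags.NonPublic, null,
         new Type[] { typeof(SerializationInfo), typeof(StreamingContext) }, null);
      return (Circle) ctor.Invoke(new object[] { info, context });
   }

   public static void Main() {
      Circle copy = Rebuild(new Circle("unit", 1.5));
      Console.WriteLine(copy.Name + " " + copy.Radius);
   }
}
```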
      If your derived type doesn't have any additional fields and therefore has no special serialization/deserialization needs, then you do not have to implement ISerializable at all. Like all interface members, GetObjectData is virtual and will be called to properly serialize the object. In addition, the formatter treats the special constructor as "virtualized." That is, during deserialization, the formatter will check the type that it is trying to instantiate. If that type doesn't offer the special constructor, then the formatter will scan all of the base classes until it finds one that implements the special constructor.
      A type may include fields that refer to other objects. When the special constructor is called, any fields that refer to other objects are guaranteed to be set correctly. That is, the fields' values will contain references to allocated objects. As these referenced objects may not have had their fields initialized yet, you should not execute any code in the special constructor that accesses any members on a referenced object.
      If your type must access members (such as call methods) on a referenced type, then it is recommended that your type also implement the IDeserializationCallback interface's OnDeserialization method. When this method is called, all objects have had their fields set. But there's no way to tell in what order multiple objects will have their OnDeserialization methods called. So, while the fields may be initialized, you still don't know if a referenced object is completely deserialized if that referenced object also implements the IDeserializationCallback interface.
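      The pattern can be sketched with a hypothetical WordIndex type that, like Hashtable, defers rebuilding its lookup structure until OnDeserialization. The special constructor only stores the reference to the array; the work that reads the array's elements happens in the callback. The Rebuild helper approximates a formatter by calling the constructor through reflection and then calling OnDeserialization directly, and the null sender is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public class WordIndex : ISerializable, IDeserializationCallback {
   private string[] words;
   private Dictionary<string, int> positions;   // rebuilt, never serialized

   public WordIndex(string[] words) {
      this.words = words;
      BuildIndex();
   }

   // Store the reference only. With a real formatter, the referenced
   // array may not have had its elements filled in at this point.
   protected WordIndex(SerializationInfo info, StreamingContext context) {
      words = (string[]) info.GetValue("words", typeof(string[]));
   }

   public virtual void GetObjectData(SerializationInfo info, StreamingContext context) {
      info.AddValue("words", words);
   }

   // Called once all objects have had their fields set, so it is
   // now safe to read the array's elements.
   public virtual void OnDeserialization(object sender) {
      BuildIndex();
   }

   private void BuildIndex() {
      positions = new Dictionary<string, int>();
      for (int i = 0; i < words.Length; i++) positions[words[i]] = i;
   }

   public int PositionOf(string word) { return positions[word]; }
}

public static class CallbackDemo {
   // Stand-in for a formatter (an assumption made for this sketch).
   public static WordIndex Rebuild(WordIndex original) {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      SerializationInfo info =
         new SerializationInfo(typeof(WordIndex), new FormatterConverter());
      original.GetObjectData(info, context);
      ConstructorInfo ctor = typeof(WordIndex).GetConstructor(
         BindingFlags.Instance | BindingFlags.NonPublic, null,
         new Type[] { typeof(SerializationInfo), typeof(StreamingContext) }, null);
      WordIndex copy = (WordIndex) ctor.Invoke(new object[] { info, context });
      copy.OnDeserialization(null);   // a formatter makes this call last
      return copy;
   }

   public static void Main() {
      WordIndex copy = Rebuild(new WordIndex(new string[] { "alpha", "beta", "gamma" }));
      Console.WriteLine(copy.PositionOf("beta"));
   }
}
```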
      The Hashtable serialization and deserialization code in Figure 1 demonstrates these mechanisms.

ISerializable

      As I mentioned, the ISerializable interface is extremely powerful since it allows a type to take complete control over how instances of the type get serialized and deserialized. This power comes at a cost; the type is now responsible for serializing all of its base type's fields as well. Serializing the base type's fields is easy if the base type also implements the ISerializable interface—you just call the base type's GetObjectData method. Someday you may find yourself defining a type that needs to take control of its serialization, but whose base type does not implement the ISerializable interface. In this case, your class must manually serialize the base type's fields. Figure 2 shows how to implement ISerializable's GetObjectData method and its implied constructor so that the base type's fields are serialized.
      In Figure 2, there is a base class, Person, marked only with the SerializableAttribute custom attribute. Derived from Person is Employee, which also is marked with only the SerializableAttribute attribute. Derived from Employee is Manager, which is marked with the SerializableAttribute attribute and also implements the ISerializable interface. To make the situation more interesting, notice that all three classes define a private String field called title. When calling SerializationInfo's AddValue method, you can't add multiple values with the same name. The code in Figure 2 correctly handles this situation by identifying each field by its class name prepended to the field's name. For example, when Manager's GetObjectData method calls AddValue to serialize Employee's title field, the name of the value is written as "Employee+title" as you'll see when you build this code and examine the resulting file.
      Each class also overrides the ToString method, causing it to display the hierarchy of titles. In the Main method, ToString is called before serialization as a reference to see that each class's title field is displayed correctly. Then, after deserialization, ToString is called again so that its output can be compared with the original object's output, ensuring that all the title fields were serialized and deserialized correctly.
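      Figure 2 itself is not reproduced here, so the following is only a sketch of the idea, reusing the class names from the description above. It walks the type hierarchy with reflection and names each value with the declaring class name, a plus sign, and the field name. The constructors, the "/" separator in ToString, and the BaseFieldsDemo helpers are assumptions made for illustration, and a fuller version would also skip fields marked with NonSerializedAttribute.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public class Person {
   private string title;
   protected Person() { }
   public Person(string title) { this.title = title; }
   public override string ToString() { return title; }
}

[Serializable]
public class Employee : Person {
   private string title;
   protected Employee() { }
   public Employee(string personTitle, string title) : base(personTitle) {
      this.title = title;
   }
   public override string ToString() { return base.ToString() + "/" + title; }
}

[Serializable]
public class Manager : Employee, ISerializable {
   private const BindingFlags Declared = BindingFlags.Instance |
      BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
   private string title;

   public Manager(string personTitle, string employeeTitle, string title)
      : base(personTitle, employeeTitle) { this.title = title; }

   // Adds this type's fields and every base type's fields, using
   // names such as "Employee+title" to keep them distinct.
   public virtual void GetObjectData(SerializationInfo info, StreamingContext context) {
      for (Type t = typeof(Manager); t != typeof(object); t = t.BaseType)
         foreach (FieldInfo f in t.GetFields(Declared))
            info.AddValue(t.Name + "+" + f.Name, f.GetValue(this));
   }

   // Restores the same fields under the same names.
   protected Manager(SerializationInfo info, StreamingContext context) {
      for (Type t = typeof(Manager); t != typeof(object); t = t.BaseType)
         foreach (FieldInfo f in t.GetFields(Declared))
            f.SetValue(this, info.GetValue(t.Name + "+" + f.Name, f.FieldType));
   }

   public override string ToString() { return base.ToString() + "/" + title; }
}

public static class BaseFieldsDemo {
   public static SerializationInfo Capture(Manager m) {
      SerializationInfo info =
         new SerializationInfo(typeof(Manager), new FormatterConverter());
      m.GetObjectData(info, new StreamingContext(StreamingContextStates.All));
      return info;
   }

   // Stand-in for a formatter (an assumption made for this sketch).
   public static Manager Rebuild(SerializationInfo info) {
      ConstructorInfo ctor = typeof(Manager).GetConstructor(
         BindingFlags.Instance | BindingFlags.NonPublic, null,
         new Type[] { typeof(SerializationInfo), typeof(StreamingContext) }, null);
      return (Manager) ctor.Invoke(new object[] {
         info, new StreamingContext(StreamingContextStates.All) });
   }

   public static void Main() {
      Manager m = new Manager("Dr.", "Engineer", "Director");
      Console.WriteLine(m);                     // before
      Console.WriteLine(Rebuild(Capture(m)));   // after
   }
}
```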

Streaming Contexts

      There are many destinations for a serialized set of objects: same process, different process on the same machine, different process on a different machine, and so on. In some rare situations, an object might want to know where it is going to be deserialized so that it can emit its state differently. For example, an object that wraps a Windows® semaphore object might decide to serialize its kernel handle if the object knows that it will be deserialized into the same process. The object might decide to serialize the semaphore's string name if it knows that the object will be deserialized on the same machine. Finally, the object might decide to throw an exception if it knows that it will be deserialized within a process running on a different machine.
      A number of the methods mentioned earlier, such as GetObjectData and the special constructor, have a StreamingContext as one of their parameters. A StreamingContext structure is a very simple value type offering just two public read-only properties, as shown in Figure 3.
      A method that receives a StreamingContext structure can examine the State property's bit flags to determine the source or destination of the objects being serialized/deserialized. Figure 4 shows the possible bit flag values.
      Now that you know how to get this information, I'll show you how to set it. The IFormatter interface (implemented by both the BinaryFormatter and the SoapFormatter types) defines a read/write StreamingContext property called Context. When you construct a formatter, the formatter initializes its Context property so that StreamingContextStates is set to All and the reference to the additional state object is set to null.
      After the formatter is constructed, you can construct a StreamingContext structure using any of the StreamingContextStates bit flags and you can optionally pass a reference to an object containing any additional context information you need. Now, all you need to do is set the formatter's Context property with this new StreamingContext object before calling the formatter's Serialize or Deserialize methods. Code demonstrating how to tell a formatter that you are serializing/deserializing an object graph for the sole purpose of cloning all the objects in the graph is shown in the DeepClone method I presented in my previous column.
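      The semaphore scenario described earlier in this section can be sketched as follows. SemaphoreWrapper is a hypothetical type whose handle field is a plain number standing in for a kernel handle; no real semaphore is created. The demo passes a StreamingContext directly to GetObjectData, whereas with a formatter the same structure would be assigned to the Context property.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public sealed class SemaphoreWrapper : ISerializable {
   private readonly long handle;    // stand-in for a kernel handle
   private readonly string name;

   public SemaphoreWrapper(long handle, string name) {
      this.handle = handle; this.name = name;
   }

   // Emits different state depending on where the object might go.
   public void GetObjectData(SerializationInfo info, StreamingContext context) {
      StreamingContextStates state = context.State;
      if ((state & StreamingContextStates.CrossMachine) != 0)
         throw new SerializationException("Cannot leave this machine.");
      if ((state & StreamingContextStates.CrossProcess) != 0)
         info.AddValue("name", name);       // valid across processes
      else
         info.AddValue("handle", handle);   // valid in this process only
   }

   // Reads whichever value the matching context would have written.
   private SemaphoreWrapper(SerializationInfo info, StreamingContext context) {
      if ((context.State & StreamingContextStates.CrossProcess) != 0)
         name = info.GetString("name");
      else
         handle = info.GetInt64("handle");
   }
}

public static class ContextDemo {
   // Returns the name of the single value written for the given flags.
   public static string WrittenValueName(StreamingContextStates states) {
      SerializationInfo info =
         new SerializationInfo(typeof(SemaphoreWrapper), new FormatterConverter());
      SemaphoreWrapper wrapper = new SemaphoreWrapper(0x1234, "demo-semaphore");
      wrapper.GetObjectData(info, new StreamingContext(states));
      SerializationInfoEnumerator e = info.GetEnumerator();
      e.MoveNext();
      return e.Name;
   }

   public static void Main() {
      Console.WriteLine(WrittenValueName(StreamingContextStates.Clone));
      Console.WriteLine(WrittenValueName(StreamingContextStates.CrossProcess));
      try {
         WrittenValueName(StreamingContextStates.All);
      } catch (SerializationException ex) {
         Console.WriteLine(ex.Message);
      }
   }
}
```

      Because All has every flag set, including CrossMachine, this sketch refuses to serialize under a formatter's default context. Whether that is the right default is a design choice for the type's author.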

Serializing and Deserializing

      The .NET Framework serialization infrastructure is quite rich and in this section I'll discuss how you can design a type which can serialize or deserialize itself into a different type or object. There are many examples of this.
      Some types (such as System.DBNull and System.Reflection.Missing) are designed to have only one instance per AppDomain. These types are frequently called singletons. If you have a 10 element array where each element refers to a DBNull object, serializing and deserializing the elements in this array should not cause 10 new DBNull objects to exist in the AppDomain. After deserializing the array, each element should refer to the AppDomain's already-existing DBNull object.
      Some types (such as System.Type, System.Reflection.Assembly, and other reflection types like MemberInfo) have one instance per type, assembly, member, and so on. Imagine you have an array where each element references a MemberInfo object. It's possible that five array elements reference a single MemberInfo object. After serializing and deserializing this array, the five elements that referred to a single MemberInfo object should still all refer to a single MemberInfo object. What's more, these elements should refer to the one MemberInfo object that exists for the specific member (in the AppDomain). This could be useful for polling database connection objects.
      For remotely controlled objects, the CLR serializes information about the server object that, when deserialized on the client, causes the CLR to create a proxy object. This type of proxy object is a different type than the server object, but is transparent to the client code. When the client calls instance methods on the proxy object, the proxy code internally remotes the call to the server which actually performs the request.
      Figure 5 shows how to properly serialize and deserialize a singleton type. The Singleton class represents a type that allows only one instance of itself to exist per AppDomain. The code in Figure 6 tests the Singleton's serialization and deserialization code to ensure that only one instance of the Singleton type exists in the AppDomain.
      When the Singleton type is loaded into the AppDomain, the CLR calls its static constructor, which then constructs a Singleton object and saves a reference to it in a static field, theOneObject. The Singleton class doesn't offer any public constructors, which therefore prevents any other code from constructing any other instances of this class.
      In SingletonSerializationTest, an array is created consisting of two elements; each element can reference a Singleton object. The two elements are initialized by calling Singleton's static GetSingleton method. This method returns a reference to the one Singleton object. The first call to Console's WriteLine method displays "True," verifying that both array elements refer to the same exact object.
      Now, SingletonSerializationTest calls the formatter's Serialize method to serialize the array and its elements. When serializing the first Singleton, the formatter detects that the Singleton type implements the ISerializable interface and calls the GetObjectData method. This method calls SetType passing in the SingletonSerializationHelper type, which tells the formatter to serialize the Singleton object as a SingletonSerializationHelper object instead. Since AddValue is not called, no additional field information is written to the byte stream.
      Since the formatter automatically detected that both array elements refer to a single object, it only serializes one object.
      After serializing the array, SingletonSerializationTest calls the formatter's Deserialize method. When deserializing the stream, the formatter tries to deserialize a SingletonSerializationHelper object since this is what the formatter was "tricked" into serializing. (In fact, this is why the Singleton class doesn't provide the special constructor that is usually required when implementing the ISerializable interface.) After constructing the SingletonSerializationHelper object, the formatter sees that this type implements the System.Runtime.Serialization.IObjectReference interface. This interface is defined in the FCL as follows:
public interface IObjectReference {
   Object GetRealObject(StreamingContext context);
}
      When a type implements this interface, the formatter calls the GetRealObject method. This method returns a reference to the object that you really want a reference to now that deserialization of the object has completed. In my example, the SingletonSerializationHelper type has GetRealObject return a reference to the Singleton object that already exists in the AppDomain. So, when the formatter's Deserialize method returns, the a2 array contains two elements, both of which refer to the AppDomain's Singleton object. The SingletonSerializationHelper object used to help with the deserialization is immediately unreachable and will be garbage collected in the future.
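      Figures 5 and 6 are not reproduced here, so the following is only a sketch of the mechanism, reusing the type names from the description above. The Resolve helper approximates what a formatter does: it reads the type recorded by SetType, creates an instance of it, and calls GetRealObject when the instance implements IObjectReference. Reading the type through SerializationInfo's ObjectType property is an assumption about the framework version, since that property was added after this column was written.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public sealed class Singleton : ISerializable {
   private static readonly Singleton theOneObject;

   // Runs once when the type is first used.
   static Singleton() { theOneObject = new Singleton(); }

   // No public constructor, so no other code can create an instance.
   private Singleton() { }

   public static Singleton GetSingleton() { return theOneObject; }

   // Serialize as the helper type instead; no values are added.
   public void GetObjectData(SerializationInfo info, StreamingContext context) {
      info.SetType(typeof(SingletonSerializationHelper));
   }

   // No special constructor: a Singleton is never deserialized directly.
}

[Serializable]
internal sealed class SingletonSerializationHelper : IObjectReference {
   // Hand back the instance that already exists in the AppDomain.
   public object GetRealObject(StreamingContext context) {
      return Singleton.GetSingleton();
   }
}

public static class SingletonDemo {
   // Stand-in for a formatter (an assumption made for this sketch).
   public static object Resolve(Singleton original) {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      SerializationInfo info =
         new SerializationInfo(typeof(Singleton), new FormatterConverter());
      original.GetObjectData(info, context);

      // The recorded type is now the helper, not Singleton.
      object created = Activator.CreateInstance(info.ObjectType, true);
      IObjectReference reference = created as IObjectReference;
      return reference != null ? reference.GetRealObject(context) : created;
   }

   public static void Main() {
      Singleton s = Singleton.GetSingleton();
      Console.WriteLine(object.ReferenceEquals(s, Resolve(s)));
   }
}
```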
      The second call to WriteLine displays "True," verifying that both of a2's array elements refer to the same object. The third and last call to WriteLine also displays "True," which shows that the elements in both arrays all refer to the same object.

Conclusion

      This month I've shown you how to define types that have complete control over how instances of themselves are serialized and deserialized. In the next column, the last of this three-part series, I'll continue my discussion of serialization by focusing on how an application can override the method by which types serialize themselves. In particular, I'll discuss techniques that allow you to serialize a type that was not designed for serialization and to deserialize an object to a different version of a type.

Send questions and comments for Jeff to dot-net@microsoft.com.
Jeffrey Richter is a cofounder of Wintellect (http://www.Wintellect.com), a training, debugging, and consulting firm specializing in .NET and Windows technologies. He is the author of Applied Microsoft .NET Framework Programming (Microsoft Press, 2002) and several programming books on Windows.
