.NET

Run-time Serialization, Part 3

Jeffrey Richter

Contents

Overriding How a Type Serializes Itself
Registering a Surrogate Object with a Formatter
Surrogate Selector Chains
Overriding the Assembly or Type When Deserializing an Object
Conclusion

This is the third part of my three-part series on serialization. In part one, I showed how to serialize and deserialize objects. I also showed how to define types that allow instances of themselves to be serialized. In part two, I showed how you can define types that have complete control over how they serialize and deserialize instances of themselves. In this last part, I'll show how an application can override the way types serialize themselves. In particular, I'll discuss techniques that allow you to serialize a type that was not designed for serialization and to deserialize an object to a different version of a type.

Overriding How a Type Serializes Itself

Up until now, I've been discussing how to modify a type's implementation to control how a type serializes and deserializes instances of itself. However, the Microsoft® .NET Framework remoting architecture also allows code that is not part of the type's implementation to override how a type serializes and deserializes its objects. There are two main reasons why application code might want to override a type's behavior. First, it provides you the ability to serialize a type that was not originally designed to be serialized. And second, you can provide a way to map one version of a type to a different version of a type.

Basically, to make this mechanism work, you first define a "surrogate type" that takes over the actions required to serialize and deserialize an existing type. Then you register an instance of your surrogate type with the formatter, telling the formatter which existing type your surrogate type is responsible for acting upon. When the formatter detects that it is trying to serialize or deserialize an instance of the existing type, it will call methods defined by your surrogate object. I'll build a sample that demonstrates how all this works.

First, suppose that there is an existing type, Car, defined in an assembly for which you do not have the source code:

class Car {
   public String make, model;

   public Car(String make, String model) {
      this.make  = make;
      this.model = model;
   }
}

Notice that this type doesn't have SerializableAttribute applied to it and it doesn't implement the ISerializable interface either; this type does not allow instances of itself to be serialized. By defining a surrogate type, you can force the formatter to serialize objects of this type. A serialization surrogate type must implement the System.Runtime.Serialization.ISerializationSurrogate interface, which is defined in the Framework Class Library (FCL) as follows:

public interface ISerializationSurrogate {
   void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context);

   Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector);
}

Using this interface, a Car serialization surrogate type would be defined as shown in Figure 1. The GetObjectData method in Figure 1 works just like the ISerializable interface's GetObjectData method. The only difference is that ISerializationSurrogate's GetObjectData method takes one additional parameter, a reference to the "real" object that is to be serialized. In the GetObjectData method shown in Figure 1, this object is cast to a Car and the object's field values are added to the SerializationInfo object.

Figure 1 CarSerializationSurrogate

sealed class CarSerializationSurrogate : ISerializationSurrogate {

   // Method called to serialize a Car object
   public void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context) {

      Car c = (Car) obj;
      info.AddValue("make",  c.make);
      info.AddValue("model", c.model);
   }

   // Method called to deserialize a Car object
   public Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector) {

      Car c = (Car) obj;
      c.make  = info.GetString("make");
      c.model = info.GetString("model");
      return null;   // Formatters ignore this return value
   }
}

Since the surrogate type is typically not defined by the same company that created the existing type, the surrogate type may not have intimate knowledge of the existing type. Specifically, the surrogate type probably doesn't know exactly how the existing type uses its fields or exactly what its fields mean to the type. And while the surrogate type can easily access the existing type's public fields, it can't do the same with its nonpublic fields. However, it may be able to use reflection to access the nonpublic fields subject to security checks. Since surrogates don't have intimate knowledge of the existing type, surrogate types tend to be useful only for simple types where the developer just plain forgot to make the type serializable.
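As a minimal sketch of the reflection approach mentioned above, the following surrogate reads and writes a private field. The Account type, its owner field, and the RoundTrip helper are hypothetical names invented for this illustration. The surrogate is driven by hand rather than through a formatter, and the code assumes it runs with enough trust to reflect over nonpublic members.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical existing type with a private field and no serialization support
class Account {
   private String owner;
   public Account(String owner) { this.owner = owner; }
   public String Describe() { return "owner = " + owner; }
}

sealed class AccountSerializationSurrogate : ISerializationSurrogate {
   // Look up the private field once; subject to reflection security checks
   private static readonly FieldInfo ownerField = typeof(Account).GetField(
      "owner", BindingFlags.Instance | BindingFlags.NonPublic);

   public void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context) {
      info.AddValue("owner", (String) ownerField.GetValue(obj));
   }

   public Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector) {
      ownerField.SetValue(obj, info.GetString("owner"));
      return null;
   }
}

static class ReflectionSurrogateDemo {
   // Calls the surrogate's two methods directly, in the order a formatter would
   public static String RoundTrip(String owner) {
      ISerializationSurrogate surrogate = new AccountSerializationSurrogate();
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      SerializationInfo info =
         new SerializationInfo(typeof(Account), new FormatterConverter());
      surrogate.GetObjectData(new Account(owner), info, context);

      // Allocate the target without running a constructor
      Object empty = FormatterServices.GetUninitializedObject(typeof(Account));
      surrogate.SetObjectData(empty, info, context, null);
      return ((Account) empty).Describe();
   }

   static void Main() {
      Console.WriteLine(RoundTrip("Jeff"));   // owner = Jeff
   }
}
```

Caching the FieldInfo in a static field avoids repeating the lookup for every object, but the surrogate still breaks if the existing type renames the field, which is one more reason surrogates suit only simple types.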

The SetObjectData method is called in order to deserialize a Car object. When this method is called, it is passed a reference to a Car object that has been allocated (via FormatterServices' static GetUninitializedObject method). This means that the object's fields are all null and no constructor has been called on the object. SetObjectData simply initializes the fields of this object using the values from the passed-in SerializationInfo object.
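The state of the object handed to SetObjectData can be observed directly. This small sketch repeats the Car type from earlier and adds a hypothetical static constructorCalls counter purely to show that no constructor runs.

```csharp
using System;
using System.Runtime.Serialization;

class Car {
   public String make, model;
   public static Int32 constructorCalls = 0;   // Hypothetical counter for the demo

   public Car(String make, String model) {
      constructorCalls++;
      this.make  = make;
      this.model = model;
   }
}

static class UninitializedObjectDemo {
   static void Main() {
      // Allocate a Car the way a formatter does before calling SetObjectData
      Car c = (Car) FormatterServices.GetUninitializedObject(typeof(Car));

      Console.WriteLine(c.make == null);         // True
      Console.WriteLine(c.model == null);        // True
      Console.WriteLine(Car.constructorCalls);   // 0
   }
}
```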

Notice that the SetObjectData method has a return type of Object. This would make you think that SetObjectData could actually return a reference to a completely different type of object. However, the Microsoft SoapFormatter and BinaryFormatter types ignore this return value completely, which is why I returned null in the previous example. Obviously, there is a bug here; the return value should allow SetObjectData to return a different object.

Microsoft has scheduled this bug for repair. Once it is fixed, a null return value will tell the formatter to use the object that it passed to SetObjectData, and a non-null return value will tell the formatter to use the returned object instead. Unfortunately, until this bug is fixed, you can't deserialize a value type in the straightforward way. To verify this, define Car as a struct instead of a class and rerun the surrogate from Figure 1. When you rebuild the code and run it, the deserialized Car object will not have its fields initialized.

If you want to define a surrogate type for a value type, it is possible to do, but it is not pretty—you must use reflection to set the fields in the boxed value type. For example, if Car were defined as a struct, then you'd change CarSerializationSurrogate's SetObjectData method to read as follows:

// Method called to deserialize a Car object
public Object SetObjectData(Object obj,
   SerializationInfo info, StreamingContext context,
   ISurrogateSelector selector) {

   // Use reflection to change the boxed Car object's fields
   typeof(Car).GetField("make").SetValue(obj, info.GetString("make"));
   typeof(Car).GetField("model").SetValue(obj, info.GetString("model"));
   return null;   // Formatters ignore this return value
}

Calling GetField and SetValue causes the boxed value type's field to be changed. If you're using a programming language (like C++ with managed extensions or intermediate language assembly language) that allows you to modify a boxed value type's fields directly, then you can avoid calling the GetField and SetValue methods. Unfortunately C# and Visual Basic® don't allow you to modify fields in a boxed value type.
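The difference between changing a boxed value type in place and changing an unboxed copy is easy to see outside of any formatter. In this sketch, Car is redefined as a struct, and the two helper method names are hypothetical.

```csharp
using System;

struct Car {
   public String make, model;
}

static class BoxedFieldDemo {
   // Reflection writes into the boxed instance itself
   public static String SetViaReflection() {
      Object boxed = new Car();
      typeof(Car).GetField("make").SetValue(boxed, "Toyota");
      return ((Car) boxed).make;
   }

   // Casting unboxes into a copy, so the boxed instance never changes
   public static String SetViaUnboxedCopy() {
      Object boxed = new Car();
      Car copy = (Car) boxed;
      copy.make = "Toyota";
      return ((Car) boxed).make;
   }

   static void Main() {
      Console.WriteLine(SetViaReflection());            // Toyota
      Console.WriteLine(SetViaUnboxedCopy() == null);   // True
   }
}
```

The second method mirrors what the cast-and-assign code in Figure 1 does when Car is a struct, which is why the deserialized fields stay uninitialized.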

Once Microsoft fixes the bug in the formatters, the code to deserialize a Car value type can be written like this:

// Method called to deserialize a Car object
public Object SetObjectData(Object obj,
   SerializationInfo info, StreamingContext context,
   ISurrogateSelector selector) {

   // Unbox obj into a local copy, set the copy's fields, and return it
   Car c = (Car) obj;
   c.make  = info.GetString("make");
   c.model = info.GetString("model");
   return c;   // Return the object that we deserialized
}

The bug fix will allow you to implement the SetObjectData method for a type without having to worry about whether you're working with a value type or a reference type.

Registering a Surrogate Object with a Formatter

At this point, I'm sure you're wondering how the formatter knows to use this ISerializationSurrogate type when it tries to serialize or deserialize a Car object. The code in Figure 2 demonstrates how to test the CarSerializationSurrogate type. After the five steps have executed, the formatter is ready to use the registered surrogate types. When the formatter's Serialize method is called, each object's type is looked up in the set maintained by the SurrogateSelector. If a match is found, then the ISerializationSurrogate object's GetObjectData method is called to get the information that should be written out to the byte stream.

Figure 2 Testing the Type

static void SerializationSurrogateDemo() {
   // 1. Construct the desired formatter
   IFormatter formatter = new SoapFormatter();

   // 2. Construct a SurrogateSelector object
   SurrogateSelector ss = new SurrogateSelector();

   // 3. Construct an instance of our serialization surrogate type
   CarSerializationSurrogate css = new CarSerializationSurrogate();

   // 4. Tell the surrogate selector to use our object when a 
   //    Car object is serialized/deserialized
   ss.AddSurrogate(typeof(Car), 
      new StreamingContext(StreamingContextStates.All), 
      css);
   // NOTE: AddSurrogate can be called multiple times to register
   // more types with their associated surrogate types

   // 5. Have the formatter use our surrogate selector
   formatter.SurrogateSelector = ss;

   // Try to serialize a Car object to an in-memory stream
   Stream stream = new MemoryStream();
   formatter.Serialize(stream, new Car("Toyota", "Celica"));

   // Rewind the stream and try to deserialize the Car object
   stream.Position = 0;
   Car c = (Car) formatter.Deserialize(stream);

   // Display the make/model to prove it worked
   Console.WriteLine("make = {0}, model = {1}", c.make, c.model);
}

When the formatter's Deserialize method is called, the type of the object about to be deserialized is looked up in the formatter's SurrogateSelector and, if a match is found, the ISerializationSurrogate object's SetObjectData method is called to set the fields within the object being deserialized.

Internally, a SurrogateSelector object maintains a private Hashtable. When AddSurrogate is called, the Type and StreamingContext make up the key and the ISerializationSurrogate object is the key's value. If a key with the same Type and StreamingContext already exists, then AddSurrogate throws an ArgumentException. By including a StreamingContext in the key, you can register one surrogate object that knows how to serialize and deserialize a Car object to a file and a different surrogate object that knows how to serialize and deserialize a Car object to a different process.
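The following sketch registers two surrogates for Car under different StreamingContext values and then looks each one up. NamedSurrogate is a hypothetical do-nothing surrogate that carries a name so the lookups can be told apart, and Lookup is a helper invented for this example.

```csharp
using System;
using System.Runtime.Serialization;

class Car { public String make, model; }

// Hypothetical surrogate that does no work; it exists only to be looked up
sealed class NamedSurrogate : ISerializationSurrogate {
   public readonly String Name;
   public NamedSurrogate(String name) { Name = name; }

   public void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context) { }

   public Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector) { return null; }
}

static class SurrogateKeyDemo {
   static readonly StreamingContext toFile =
      new StreamingContext(StreamingContextStates.File);
   static readonly StreamingContext toProcess =
      new StreamingContext(StreamingContextStates.CrossProcess);

   public static SurrogateSelector BuildSelector() {
      SurrogateSelector ss = new SurrogateSelector();
      ss.AddSurrogate(typeof(Car), toFile,    new NamedSurrogate("file"));
      ss.AddSurrogate(typeof(Car), toProcess, new NamedSurrogate("process"));
      return ss;
   }

   public static String Lookup(SurrogateSelector ss, StreamingContext context) {
      ISurrogateSelector found;
      return ((NamedSurrogate) ss.GetSurrogate(typeof(Car), context, out found)).Name;
   }

   static void Main() {
      SurrogateSelector ss = BuildSelector();
      Console.WriteLine(Lookup(ss, toFile));      // file
      Console.WriteLine(Lookup(ss, toProcess));   // process

      try {
         // Same Type and StreamingContext as an existing key
         ss.AddSurrogate(typeof(Car), toFile, new NamedSurrogate("duplicate"));
      }
      catch (ArgumentException) {
         Console.WriteLine("Duplicate key rejected");
      }
   }
}
```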

While done very infrequently, it is also possible to remove a surrogate from a SurrogateSelector object by calling the SurrogateSelector's RemoveSurrogate method:

public void RemoveSurrogate(Type type, StreamingContext context);

Surrogate Selector Chains

Multiple SurrogateSelector objects can be chained together. For example, you could have a SurrogateSelector that maintains a set of serialization surrogates that are used for serializing types into proxies that get remoted across the wire or between AppDomains. You could also have a separate SurrogateSelector object that contains a set of serialization surrogates that are used to convert Version 1 types into Version 2 types.

If you have multiple SurrogateSelector objects that you'd like the formatter to use, you must chain them together into a linked list. The FCL's SurrogateSelector type implements the ISurrogateSelector interface, which defines three methods. All three of these methods are related to chaining. Here is how the ISurrogateSelector interface is defined:

public interface ISurrogateSelector {
   void ChainSelector(ISurrogateSelector selector);

   ISurrogateSelector GetNextSelector();

   ISerializationSurrogate GetSurrogate(Type type,
      StreamingContext context, out ISurrogateSelector selector);
}

The ChainSelector method inserts an ISurrogateSelector object immediately after the ISurrogateSelector object being operated on (the this object). The GetNextSelector method returns a reference to the next ISurrogateSelector object in the chain or null if the object being operated on is the end of the chain.

The GetSurrogate method looks up a Type/StreamingContext pair in the ISurrogateSelector object identified by this. If the pair cannot be found, then the next ISurrogateSelector object in the chain is accessed, and so on. If a match is found, then GetSurrogate returns the ISerializationSurrogate object that handles the serialization and deserialization of the type that's being looked up. In addition, GetSurrogate also returns the ISurrogateSelector object that contained the match; this is usually not needed and is ignored. If none of the ISurrogateSelector objects in the chain have a match for the Type/StreamingContext pair, GetSurrogate returns null.
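A short sketch can make the chaining behavior concrete. Here two SurrogateSelector objects are chained and a lookup that misses in the first is satisfied by the second. NoOpSurrogate and the variable names are hypothetical, chosen for this illustration only.

```csharp
using System;
using System.Runtime.Serialization;

class Car { public String make, model; }

// Hypothetical surrogate that does no work; it exists only to be looked up
sealed class NoOpSurrogate : ISerializationSurrogate {
   public void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context) { }

   public Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector) { return null; }
}

static class SelectorChainDemo {
   static void Main() {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);

      SurrogateSelector first  = new SurrogateSelector();   // Knows nothing about Car
      SurrogateSelector second = new SurrogateSelector();
      second.AddSurrogate(typeof(Car), context, new NoOpSurrogate());

      // Link second immediately after first
      first.ChainSelector(second);
      Console.WriteLine(ReferenceEquals(first.GetNextSelector(), second));   // True
      Console.WriteLine(second.GetNextSelector() == null);                   // True

      // The lookup starts at first and walks the chain until a match is found
      ISurrogateSelector owner;
      ISerializationSurrogate surrogate =
         first.GetSurrogate(typeof(Car), context, out owner);
      Console.WriteLine(surrogate != null);                // True
      Console.WriteLine(ReferenceEquals(owner, second));   // True

      // No selector in the chain has an entry for String
      Console.WriteLine(
         first.GetSurrogate(typeof(String), context, out owner) == null);   // True
   }
}
```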

The FCL defines an ISurrogateSelector interface, then defines a SurrogateSelector type that implements this interface. However, it is extremely rare that anyone will ever have to define their own type that implements the ISurrogateSelector interface. The only reason to define your own type that implements this interface is if you need to have more flexibility over mapping one type to another. For example, you might want to serialize all types that inherit from a specific base class in a special way. The FCL's System.Runtime.Remoting.Messaging.RemotingSurrogateSelector class is a perfect example. When serializing objects for remoting purposes, the CLR formats the objects using the RemotingSurrogateSelector. This surrogate selector serializes all objects that derive from System.MarshalByRefObject in a special way so that deserialization causes proxy objects to be created on the client side.
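To illustrate the base-class scenario, here is a sketch of a custom selector that hands out one surrogate for every type deriving from a given base type. This is not how RemotingSurrogateSelector is implemented; BaseTypeSurrogateSelector, Vehicle, Truck, and NoOpSurrogate are hypothetical names, and ChainSelector is simplified to overwrite any previously chained selector.

```csharp
using System;
using System.Runtime.Serialization;

class Vehicle { }
class Truck : Vehicle { }

// Hypothetical surrogate that does no work; it exists only to be looked up
sealed class NoOpSurrogate : ISerializationSurrogate {
   public void GetObjectData(Object obj,
      SerializationInfo info, StreamingContext context) { }

   public Object SetObjectData(Object obj,
      SerializationInfo info, StreamingContext context,
      ISurrogateSelector selector) { return null; }
}

sealed class BaseTypeSurrogateSelector : ISurrogateSelector {
   private readonly Type baseType;
   private readonly ISerializationSurrogate surrogate;
   private ISurrogateSelector next;

   public BaseTypeSurrogateSelector(Type baseType,
      ISerializationSurrogate surrogate) {
      this.baseType  = baseType;
      this.surrogate = surrogate;
   }

   // Simplification: replaces any selector that was chained earlier
   public void ChainSelector(ISurrogateSelector selector) { next = selector; }

   public ISurrogateSelector GetNextSelector() { return next; }

   public ISerializationSurrogate GetSurrogate(Type type,
      StreamingContext context, out ISurrogateSelector selector) {

      // Match the base type itself and anything derived from it
      if (baseType.IsAssignableFrom(type)) {
         selector = this;
         return surrogate;
      }

      // Otherwise defer to the rest of the chain, if there is one
      if (next != null) return next.GetSurrogate(type, context, out selector);

      selector = null;
      return null;
   }
}

static class CustomSelectorDemo {
   static void Main() {
      StreamingContext context = new StreamingContext(StreamingContextStates.All);
      ISurrogateSelector selector =
         new BaseTypeSurrogateSelector(typeof(Vehicle), new NoOpSurrogate());

      ISurrogateSelector owner;
      Console.WriteLine(
         selector.GetSurrogate(typeof(Truck), context, out owner) != null);    // True
      Console.WriteLine(
         selector.GetSurrogate(typeof(String), context, out owner) == null);   // True
   }
}
```

A formatter would use such a selector the same way it uses the FCL's SurrogateSelector: assign it to the formatter's SurrogateSelector property before calling Serialize or Deserialize.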

Overriding the Assembly or Type When Deserializing an Object

When serializing an object, formatters output the type's full name and the full name of the type's defining assembly. When deserializing an object, formatters use this information to know exactly what type of object to construct and initialize. The earlier discussion about the ISerializationSurrogate interface showed a mechanism that allowed you to take over the serialization and deserialization duties for a specific type. A type that implements the ISerializationSurrogate interface is tied to a specific type in a specific assembly.

However, there are times when the ISerializationSurrogate mechanism doesn't provide enough flexibility. Here are some scenarios when it might be useful to deserialize an object into a different type than the one it was serialized as:

  • You might decide to move a type's implementation from one assembly to another. For example, the assembly's version number changes, making the new assembly different from the original one.
  • An object on a server gets serialized into a byte stream that is then sent to a client. When the client processes the stream, it could deserialize the object to a completely different type whose code knows how to remote method calls to the server's object.
  • You make a new version of a type. In this case, you want to deserialize any already-serialized objects into the new version of the type.

The System.Runtime.Serialization.SerializationBinder class makes deserializing an object to a different type very easy. To do this, you first define your own type that derives from the abstract SerializationBinder type (see Figure 3).

Figure 3 Deserializing to a Different Type

sealed class FooVer1ToVer2DeserializationBinder : SerializationBinder {
   public override Type BindToType(
      string assemblyName, string typeName) {

      Type typeToDeserialize = null;

      // For each assemblyName/typeName that you wish to deserialize
      // to a different type, set typeToDeserialize to the desired type
      String assemVer1 = "MyAssem, Version=1.0.0.0, " +
         "Culture=neutral, PublicKeyToken=b77a5c561934e089";
      String typeVer1 = "Foo";

      if (assemblyName == assemVer1 && typeName == typeVer1) {
         assemblyName = assemblyName.Replace("1.0.0.0", "2.0.0.0");
      }

      // To return the type, do this:
      typeToDeserialize = Type.GetType(String.Format("{0}, {1}", 
         typeName, assemblyName));

      return typeToDeserialize;
   }
}

After you construct a formatter, construct an instance of FooVer1ToVer2DeserializationBinder and set the formatter's Binder read/write property to refer to the binder object. After setting the Binder property, you can now call the formatter's Deserialize method. The following code demonstrates how to construct a formatter and assign an instance of the FooVer1ToVer2DeserializationBinder binder to it:

IFormatter formatter = new BinaryFormatter();
formatter.Binder = new FooVer1ToVer2DeserializationBinder();

// The code below deserializes a version 1.0.0.0 Foo object to a 
// version 2.0.0.0 Foo object. It is assumed that this code is defined
// in the same assembly that defines version 2.0.0.0 of the Foo type.
Foo f = (Foo) formatter.Deserialize(stream);

During deserialization, the formatter sees that a binder has been set. As each object is about to be deserialized, the formatter calls the binder's BindToType method, passing it the assembly name and type that the formatter wants to deserialize. At this point, BindToType decides what type should actually be constructed and returns this type.
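The whole round trip can be sketched in a single assembly. Instead of two versions of MyAssem, this example uses two hypothetical types, OldFoo and NewFoo, with identical field names, and a hypothetical OldFooToNewFooBinder that redirects one to the other. It assumes the .NET Framework, where BinaryFormatter is available and enabled; later runtimes restrict or remove it.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical "old" and "new" types with the same field name and type
[Serializable] class OldFoo { public Int32 x = 5; }
[Serializable] class NewFoo { public Int32 x; }

sealed class OldFooToNewFooBinder : SerializationBinder {
   public override Type BindToType(string assemblyName, string typeName) {
      // Redirect OldFoo to NewFoo
      if (typeName == typeof(OldFoo).FullName) return typeof(NewFoo);

      // Resolve every other type exactly as it was recorded in the stream
      return Type.GetType(String.Format("{0}, {1}", typeName, assemblyName));
   }
}

static class BinderDemo {
   public static Object RoundTrip() {
      MemoryStream stream = new MemoryStream();

      // Serialize an OldFoo with no binder involved
      new BinaryFormatter().Serialize(stream, new OldFoo());

      // Rewind and deserialize with the binder installed
      stream.Position = 0;
      BinaryFormatter formatter = new BinaryFormatter();
      formatter.Binder = new OldFooToNewFooBinder();
      return formatter.Deserialize(stream);
   }

   static void Main() {
      Object o = RoundTrip();
      Console.WriteLine(o.GetType().Name);   // Type the binder selected
      Console.WriteLine(((NewFoo) o).x);     // Value carried over from OldFoo
   }
}
```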

Note that the original type and the new type must have exactly the same field names and types if the new type uses simple serialization via the Serializable custom attribute. However, the new version of the type could implement the ISerializable interface, in which case its special constructor gets called and the type can examine the values in the SerializationInfo object to determine how to deserialize itself.
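As a sketch of that second option, the version 2 type below tolerates streams that lack a field. The priority field, its default of 1, and the FromInfo helper are hypothetical, and the SerializationInfo is filled in by hand to stand in for what a formatter would read from a version 1 stream.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
sealed class Foo : ISerializable {
   public String name;       // Present in version 1
   public Int32  priority;   // Hypothetical field added in version 2

   public Foo(String name, Int32 priority) {
      this.name     = name;
      this.priority = priority;
   }

   // Special constructor that a formatter calls during deserialization
   private Foo(SerializationInfo info, StreamingContext context) {
      priority = 1;   // Default used when the stream has no "priority" value

      // Examine whatever values the stream actually supplied
      foreach (SerializationEntry entry in info) {
         switch (entry.Name) {
            case "name":     name     = (String) entry.Value;         break;
            case "priority": priority = Convert.ToInt32(entry.Value); break;
         }
      }
   }

   public void GetObjectData(SerializationInfo info, StreamingContext context) {
      info.AddValue("name", name);
      info.AddValue("priority", priority);
   }

   // Hypothetical helper so the demo can reach the private constructor
   public static Foo FromInfo(SerializationInfo info) {
      return new Foo(info, new StreamingContext(StreamingContextStates.All));
   }
}

static class VersionTolerantDemo {
   static void Main() {
      // Simulate the values from a version 1 stream: only "name" is present
      SerializationInfo v1 =
         new SerializationInfo(typeof(Foo), new FormatterConverter());
      v1.AddValue("name", "widget");

      Foo f = Foo.FromInfo(v1);
      Console.WriteLine("{0}, {1}", f.name, f.priority);   // widget, 1
   }
}
```

Enumerating the SerializationInfo avoids the exception that GetInt32 would throw for a missing name, at the cost of a switch statement that has to be kept in step with GetObjectData.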

Conclusion

In this three-part series, I covered just about everything you can do with the .NET Framework serialization engine. There are many uses for serialization, and employing it properly can greatly reduce the effort required to build applications, persist data, and transfer data between AppDomains, processes, and even machines.

Send questions and comments for Jeff to dot-net@microsoft.com.

Jeffrey Richter is a cofounder of Wintellect (https://www.Wintellect.com), a training, debugging, and consulting firm specializing in .NET and Windows technologies. He is the author of Applied Microsoft .NET Framework Programming (Microsoft Press, 2002) and several programming books on Windows.