From the April 2002 issue of MSDN Magazine

Run-time Serialization
Download the code for this article: NET0204.exe (41KB)
Serialization is the process of converting an object or a connected graph of objects into a contiguous stream of bytes. Deserialization is the process of converting a contiguous stream of bytes back into its graph of connected objects. The ability to convert objects to and from a byte stream is an incredibly useful mechanism. Here are some examples:

  • An application's state (object graph) can easily be saved in a disk file or database and then restored the next time the application is run. ASP.NET saves and restores session state by way of serialization and deserialization.
  • A set of objects can easily be copied to the system's clipboard and then pasted into the same or another application. In fact, Windows® Forms uses this procedure.
  • A set of objects can be cloned and set aside as a backup while a user manipulates the main set of objects.
  • A set of objects can easily be sent over the network to a process running on another machine. The Microsoft® .NET Framework remoting architecture serializes and deserializes objects that are marshaled by value.

      In addition to these four examples, once you have serialized objects into a byte stream in memory, it is quite easy to perform other useful operations on the data, such as encrypting or compressing it.
      Since serialization is so useful, it's no wonder that many programmers have spent countless hours developing code to perform these types of actions. Historically, this serialization code has been difficult to write, extremely tedious, and error prone. Some of the difficult issues that developers need to grapple with are communication protocols, client/server data type mismatches (such as little-endian/big-endian issues), error handling, objects that refer to other objects, in and out parameters, arrays of structures, and the list goes on and on.
      Well, you'll be happy to know that the .NET Framework has fantastic support for serialization and deserialization. This means that all of the difficult issues I mentioned are now handled completely and transparently by the .NET Framework. As a developer, you can work with your objects before serialization and after deserialization and have the .NET Framework handle the stuff in the middle.
      In this three-part series of columns, I will explain how the .NET Framework exposes its serialization and deserialization services. For almost all data types, the default behavior of these services will be sufficient, meaning that it takes almost no work for you to make your own types serializable. However, there is a small minority of types for which the serialization service's default behavior will not be sufficient. Fortunately, the serialization services are very extensible, and I will explain in these three columns how to tap this extensibility to do some pretty powerful things when serializing or deserializing objects. For example, I'll demonstrate how to serialize Version 1 of an object out to a disk file and then deserialize it a year later into an object of Version 2.

Serialization/Deserialization Quick Start

      Let's start off by looking at some code that serializes an object graph (a connected set of objects) into a byte stream in memory (see Figure 1). Wow, look how simple this is! First, this method constructs a System.IO.MemoryStream object. This object identifies where the serialized block of bytes is to be placed. Then the method constructs a BinaryFormatter object (which can be found in the System.Runtime.Serialization.Formatters.Binary namespace). A formatter is a type (implementing the System.Runtime.Serialization.IFormatter interface) that knows how to serialize and deserialize an object graph. The Framework Class Library ships with two formatters: the BinaryFormatter (used in Figure 1) and a SoapFormatter (which is found in the System.Runtime.Serialization.Formatters.Soap namespace and is implemented in the System.Runtime.Serialization.Formatters.Soap.dll assembly).
      Developers can define their own formatter type if they need to serialize/deserialize an object graph using a format other than binary or SOAP. However, implementing your own formatter is a relatively complex task. Fortunately, most developers will not need to do this—the BinaryFormatter and SoapFormatter types are expected to be sufficient for almost all developers' needs.
      In order to serialize a graph of objects, just call the formatter's Serialize method and pass it two things: a reference to a stream object and a reference to the object graph that you want to serialize. The stream object identifies where the serialized bytes should be placed and can be an object of any type derived from the System.IO.Stream abstract base class. This means that you can serialize an object graph to a MemoryStream, a FileStream, a NetworkStream, and so on.
      The second parameter to Serialize is a reference to an object. This object could be anything—an Int32, a String, a DateTime, an Exception, an Image, an ArrayList, a Hashtable, and so on. The object referred to by the objGraph parameter may refer to other objects. For example, objGraph may refer to a Hashtable, which refers to a set of objects. These objects may also refer to other objects. When the formatter's Serialize method is called, all objects in the graph are serialized to the stream.
      Formatters know how to serialize the complete object graph by referring to the metadata that describes each object's type. The Serialize method uses reflection to see what instance fields are in each object's type as it is serialized. If any of these fields refer to other objects, then the formatter's Serialize method knows to serialize these objects, too.
      Formatters have very intelligent algorithms. They know to serialize each object in the graph out to the stream no more than once. That is, if two objects in the graph refer to each other, then the formatter detects this, serializes each object just once, and avoids entering into an infinite loop.
      In my SerializeToMemory method, when the formatter's Serialize method returns, the MemoryStream is simply returned to the caller. The application uses the contents of this flat byte array any way it desires. For example, it could save it in a file, copy it to the clipboard, send it over a wire, or whatever.
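      As a minimal sketch of a method along the lines just described (not necessarily the code in Figure 1), the following assumes the .NET Framework, where BinaryFormatter is available. The QuickStart class name and the ArrayList sample graph are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class QuickStart {
   // Serializes an object graph into an in-memory byte stream
   public static MemoryStream SerializeToMemory(Object objectGraph) {
      // The stream identifies where the serialized bytes are placed
      MemoryStream stream = new MemoryStream();

      // Construct a serialization formatter that does the hard work
      BinaryFormatter formatter = new BinaryFormatter();

      // Serialize the root object and everything it refers to
      formatter.Serialize(stream, objectGraph);

      // The caller decides what to do with the bytes
      return stream;
   }

   public static void Main() {
      // Hypothetical graph: an ArrayList that refers to two strings
      ArrayList graph = new ArrayList();
      graph.Add("first");
      graph.Add("second");

      MemoryStream stream = SerializeToMemory(graph);
      Console.WriteLine("Serialized {0} bytes", stream.Length);
   }
}
```

      Note that the returned stream's position is at the end of the written bytes; a caller that wants to read the stream back would first set its Position to 0.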
      Now that you see how easy it is to serialize an entire object graph, let's look at some code that deserializes a memory byte stream back into an object graph (see Figure 2). Double wow! This is even simpler than serializing an object graph. In this code, a BinaryFormatter is constructed and then its Deserialize method is called. This method takes the stream as a parameter and returns a reference to the root object within the deserialized object graph.
      Internally, the formatter's Deserialize method examines the contents of the byte stream, constructs instances of all the objects that are in the stream, and initializes the fields in all these objects so that they have the same values they had when the object graph was serialized. Typically, you will cast the object reference that was returned from the Deserialize method into the type that your application is expecting to receive.
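      A deserializing counterpart might look like the following sketch. It assumes the .NET Framework (where BinaryFormatter is available); the class name and the "Hello" sample value are hypothetical, and the code is not necessarily what Figure 2 contains.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class QuickStartDeserialize {
   // Deserializes a byte stream back into an object graph
   // and returns a reference to the graph's root object
   public static Object DeserializeFromMemory(Stream stream) {
      BinaryFormatter formatter = new BinaryFormatter();
      return formatter.Deserialize(stream);
   }

   public static void Main() {
      // Produce some serialized bytes to work with
      MemoryStream stream = new MemoryStream();
      new BinaryFormatter().Serialize(stream, "Hello");

      // Rewind the stream so that reading starts at the first byte
      stream.Position = 0;

      // Cast the returned root object to the type the application expects
      String s = (String) DeserializeFromMemory(stream);
      Console.WriteLine(s);   // Hello
   }
}
```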
      Figure 3 shows a fun, useful method that uses serialization to make a deep copy, or clone, of an object.
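      One way such a deep-copy method could be written is sketched below, assuming the .NET Framework and a graph in which every type is serializable. The CloneHelper name and the nested ArrayList sample are hypothetical, and the code in Figure 3 may differ.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class CloneHelper {
   // Makes a deep copy by serializing the graph to memory
   // and immediately deserializing it into new objects
   public static Object DeepClone(Object original) {
      using (MemoryStream stream = new MemoryStream()) {
         BinaryFormatter formatter = new BinaryFormatter();
         formatter.Serialize(stream, original);

         // Rewind before reading the bytes back
         stream.Position = 0;
         return formatter.Deserialize(stream);
      }
   }

   public static void Main() {
      ArrayList inner = new ArrayList();
      inner.Add(1);
      ArrayList outer = new ArrayList();
      outer.Add(inner);

      ArrayList copy = (ArrayList) DeepClone(outer);

      // Changing the copy's inner list leaves the original untouched
      ((ArrayList) copy[0]).Add(2);
      Console.WriteLine(inner.Count);                   // 1
      Console.WriteLine(((ArrayList) copy[0]).Count);   // 2
   }
}
```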
      At this point, I'd like to add a few notes to the discussion. First, it is up to you to ensure that your code uses the same formatter for both serialization and deserialization. For example, don't write code that serializes an object graph using the SoapFormatter and then deserializes the graph using the BinaryFormatter. If Deserialize can't decipher the contents of the stream, then a System.Runtime.Serialization.SerializationException will be thrown.
      Second, it is possible and also quite useful to serialize multiple object graphs out to a single byte stream. Here is an example:

  void SaveApplicationState(Stream stream) {
     // Construct a serialization formatter that does the hard work
     BinaryFormatter formatter = new BinaryFormatter();

     // Serialize our application's entire state
     formatter.Serialize(stream, customers);
     formatter.Serialize(stream, pendingOrders);
     formatter.Serialize(stream, processedOrders);
  }

 

In order to reconstruct the application's state, you could simply do the following:

  void RestoreApplicationState(Stream stream) {
     // Construct a serialization formatter that does the hard work
     BinaryFormatter formatter = new BinaryFormatter();

     // Deserialize our application's entire state
     // (in the same order in which it was serialized)
     customers = (Customers) formatter.Deserialize(stream);
     pendingOrders = (Orders) formatter.Deserialize(stream);
     processedOrders = (Orders) formatter.Deserialize(stream);
  }

 

      Finally, when serializing an object, the full name of the type and the name of the type's defining assembly are written to the byte stream. By default, the BinaryFormatter and SoapFormatter types output the assembly's full identity, which includes the assembly's file name (without extension), version number, culture, and public key information. However, you can make these formatters write the simple assembly name (just file name; no version, culture, or public key information) for each serialized type by setting the formatter's AssemblyFormat property to FormatterAssemblyStyle.Simple. This read/write property is offered by both the BinaryFormatter and SoapFormatter types.
      When deserializing an object, the formatter first grabs the assembly identity and ensures that the assembly is loaded into the executing AppDomain. If the formatter's AssemblyFormat property is set to FormatterAssemblyStyle.Full (the default), then the assembly is loaded by calling System.Reflection.Assembly's Load method. This method looks in the Global Assembly Cache (GAC) for the specified assembly and then looks in the application's directory. If the assembly can't be found, then an exception is thrown and no more objects can be deserialized.
      However, if the formatter's AssemblyFormat property is set to FormatterAssemblyStyle.Simple, then the assembly is loaded by calling System.Reflection.Assembly's LoadWithPartialName method. This method looks in the application's directory first for a matching assembly. If the application's directory doesn't have a matching assembly, then the GAC is scanned, looking for the highest version numbered assembly with the same file name. However, since no culture or public key information is specified, which assembly actually gets loaded is effectively undefined. Therefore, you should never set the AssemblyFormat property to FormatterAssemblyStyle.Simple when serializing or deserializing a byte stream. Chapter 20 of my book, Applied Microsoft .NET Framework Programming (Microsoft Press, 2002) has more information about Assembly's Load and LoadWithPartialName methods.
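      To avoid depending on the property's default value, code can set AssemblyFormat explicitly. The sketch below assumes the .NET Framework; the AssemblyFormatDemo class and its CreateFormatter helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters;          // FormatterAssemblyStyle
using System.Runtime.Serialization.Formatters.Binary;   // BinaryFormatter

public static class AssemblyFormatDemo {
   // Returns a formatter that always writes full assembly identities
   public static BinaryFormatter CreateFormatter() {
      BinaryFormatter formatter = new BinaryFormatter();
      formatter.AssemblyFormat = FormatterAssemblyStyle.Full;
      return formatter;
   }

   public static void Main() {
      MemoryStream stream = new MemoryStream();
      BinaryFormatter formatter = CreateFormatter();
      formatter.Serialize(stream, new ArrayList());
      Console.WriteLine(formatter.AssemblyFormat);   // Full
   }
}
```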
      After an assembly has been loaded, the formatter looks in the assembly for a type matching that of the object being deserialized. If the assembly doesn't contain a matching type, an exception is thrown and no more objects can be deserialized. If a matching type is found, an instance of the type is created and its fields are initialized from the values contained in the byte stream. If the type's fields don't exactly match the names of the fields as read from the byte stream, then a SerializationException exception is thrown and no more objects can be deserialized. Later in this column, I'll discuss some sophisticated mechanisms that allow you to override some of this behavior.
      Some extensible applications use Assembly.LoadFrom to load an assembly and then construct objects from types defined in the loaded assembly. These objects can be serialized to a stream without any trouble. However, when deserializing this stream, the formatter attempts to load the assembly by calling Assembly's Load or LoadWithPartialName method instead of calling the LoadFrom method. In most cases, the Common Language Runtime (CLR) will not be able to locate the assembly file, causing a SerializationException exception to be thrown. This catches many developers by surprise since the objects serialized correctly, and they expect that the objects will deserialize correctly as well.
      If your application serializes objects whose types are defined in an assembly your application loads using Assembly.LoadFrom, then I recommend you implement a method whose signature matches the System.ResolveEventHandler delegate and register this method with System.AppDomain's AssemblyResolve event just before calling a formatter's Deserialize method. (Unregister this method with the event after Deserialize returns.) Now, whenever the formatter fails to load an assembly, the CLR calls your ResolveEventHandler method. The identity of the assembly that failed to load is passed to this method. The method can extract the assembly file name from the assembly's identity and use this name to construct the path where the application knows the assembly file can be found. Then, the method can call Assembly.LoadFrom to load the assembly and return the resulting Assembly reference back from the ResolveEventHandler method. The files in the LoadFromSerialization directory (see the link at the top of this article) demonstrate this technique.
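      The registration pattern just described might be sketched as follows. This is an illustration under stated assumptions, not the code from the download: the PluginDeserializer class, the "Plugins" subdirectory, and the ".dll" file extension are all hypothetical choices, and the sketch assumes the .NET Framework.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Serialization.Formatters.Binary;

public static class PluginDeserializer {
   // Hypothetical directory where the application keeps assemblies
   // that it originally loaded with Assembly.LoadFrom
   static readonly String pluginDirectory =
      Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Plugins");

   // Signature matches the System.ResolveEventHandler delegate
   static Assembly ResolveFromPluginDirectory(Object sender, ResolveEventArgs args) {
      // args.Name holds the identity of the assembly that failed to load;
      // extract just the file name portion from it
      String simpleName = new AssemblyName(args.Name).Name;

      // Assumption: the assembly file is <name>.dll in the plug-in directory
      String path = Path.Combine(pluginDirectory, simpleName + ".dll");

      // Returning null tells the CLR that this handler couldn't help
      return File.Exists(path) ? Assembly.LoadFrom(path) : null;
   }

   public static Object Deserialize(Stream stream) {
      ResolveEventHandler handler =
         new ResolveEventHandler(ResolveFromPluginDirectory);

      // Register just before deserializing...
      AppDomain.CurrentDomain.AssemblyResolve += handler;
      try {
         return new BinaryFormatter().Deserialize(stream);
      }
      finally {
         // ...and unregister after Deserialize returns (or throws)
         AppDomain.CurrentDomain.AssemblyResolve -= handler;
      }
   }
}
```

      The try/finally block is a design choice in this sketch: it ensures the handler is removed even when Deserialize throws, so the handler doesn't linger and affect unrelated assembly loads.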
      This section covered the basics of how to serialize and deserialize object graphs. In the remaining sections, I'll look at what you must do in order to define your own serializable types and also will explore various mechanisms that allow you to have greater control over serialization and deserialization.

Defining Serializable Types

      When a type is designed, the developer must make the conscious decision as to whether to allow instances of the type to be serializable. By default, types are not serializable. For example, the code in Figure 4 does not perform as expected. If you were to build and run this program, you'd see that the formatter's Serialize method throws a System.Runtime.Serialization.SerializationException exception. The problem is that the developer of the Point type has not explicitly indicated that Point objects may be serialized. To solve this problem, the developer must apply the System.SerializableAttribute custom attribute to this type as follows. (Note that this attribute is defined in the System namespace, not the System.Runtime.Serialization namespace.)

  [Serializable]
  struct Point {
     public Int32 x, y;

     public Point(Int32 x, Int32 y) {
        this.x = x;
        this.y = y;
     }
  }

 

      Now, if you rebuild the application and run it, it does perform as expected and the Point object will be serialized to the stream. When serializing an object graph, the formatter checks that every object's type is serializable. If any object in the graph is not serializable, the formatter's Serialize method throws the SerializationException exception.
      When serializing a graph of objects, some of the objects' types may be serializable while others may not be. For performance reasons, formatters do not verify that all of the objects in the graph are serializable before serializing the graph. So, when serializing an object graph, it is entirely possible that some objects may be serialized to the byte stream before the SerializationException is thrown. If this happens, the byte stream is corrupt.
      Your application code should try to recover gracefully from this situation. If you think you may be serializing an object graph where some objects may not be serializable, I recommend that you serialize the objects into a MemoryStream first. Then, if all objects are successfully serialized, you can copy the bytes in the MemoryStream to whatever stream (file or network, for example) you really want the bytes written to.
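      The buffering approach recommended above could be sketched like this, assuming the .NET Framework. SafeSerializer, TrySerialize, and the NotMarked type are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type that is NOT marked with [Serializable]
class NotMarked {
   public Int32 value = 5;
}

public static class SafeSerializer {
   // Writes to destination only if the entire graph serialized successfully
   public static Boolean TrySerialize(Stream destination, Object objectGraph) {
      MemoryStream buffer = new MemoryStream();
      try {
         // Serialize into memory first; a failure here
         // leaves the real destination stream untouched
         new BinaryFormatter().Serialize(buffer, objectGraph);
      }
      catch (SerializationException) {
         return false;
      }

      // Everything serialized, so copy the bytes to where they belong
      buffer.WriteTo(destination);
      return true;
   }

   public static void Main() {
      MemoryStream good = new MemoryStream();
      Console.WriteLine(TrySerialize(good, "some state"));      // True

      MemoryStream bad = new MemoryStream();
      Console.WriteLine(TrySerialize(bad, new NotMarked()));    // False
      Console.WriteLine(bad.Length);                            // 0
   }
}
```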
      The SerializableAttribute custom attribute may be applied to reference types (class), value types (struct), enumerated types (enum), and delegate types (delegate) only. (Note that enumerated and delegate types are always serializable so there is no need to explicitly apply the SerializableAttribute attribute to these types.) In addition, the SerializableAttribute attribute is not inherited by derived types. So, given the following two type definitions, a Person object can be serialized but an Employee object cannot:

  [Serializable]
  class Person {
     •••
  }

  class Employee : Person {
     •••
  }

 

To fix this, you would just apply the SerializableAttribute attribute to the Employee type as well:

  [Serializable]
  class Person {
     •••
  }

  [Serializable]
  class Employee : Person {
     •••
  }

 

      This problem was easy to fix. However, the reverse—defining a type derived from a base type that doesn't have the SerializableAttribute attribute applied to it—is not. But, this is by design. If the base type doesn't allow instances of its type to be serialized, its fields cannot be serialized since a base object is effectively part of the derived object. This is why System.Object has the SerializableAttribute attribute applied to it.
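      The fact that the attribute is not inherited can be observed through reflection by using System.Type's IsSerializable property. In this small sketch, the empty Person and Employee bodies and the AttributeInheritanceCheck class are stand-ins for illustration.

```csharp
using System;

[Serializable]
class Person {
}

// Derived from a serializable type, but not itself marked
class Employee : Person {
}

public static class AttributeInheritanceCheck {
   public static void Main() {
      Console.WriteLine(typeof(Person).IsSerializable);     // True
      Console.WriteLine(typeof(Employee).IsSerializable);   // False
   }
}
```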
      In general, it is recommended that most types you define be serializable. After all, this grants a lot of flexibility to users of your types. However, you must be aware that serialization reads all of an object's fields regardless of whether the fields are declared as public, protected, internal, or private. You might not want to make a type serializable if it contains sensitive or secure data (such as passwords for example) or if the data would have no meaning or value if transferred.
      If you find yourself using a type that was not designed for serialization and you do not have the source code of the type to add serialization support, all is not lost. In Part 2 of this series, I will explain how you can make any nonserializable type serializable.

Controlling which Fields are Serialized

      When you apply the SerializableAttribute custom attribute to a type, all instance fields (public, private, protected, and so on) are serialized. However, a type may define some instance fields that should not be serialized. In general, there are two reasons why you would not want some of a type's instance fields to be serialized. The first reason is that the field contains information that would not be valid when deserialized. For example, an object that contains a handle to a Windows kernel object (such as a file, process, thread, mutex, event, semaphore, and so on) would have no meaning when deserialized into another process or machine since Windows kernel handles are process-relative values. The second reason is that the field contains information that is easily calculated. In this case, you can exclude the field from serialization, improving your application's performance by reducing the amount of data transferred.
      The following code uses the System.NonSerializedAttribute custom attribute to indicate which fields of the type should not be serialized. (Note that this attribute is also defined in the System namespace, not the System.Runtime.Serialization namespace.)

  [Serializable]
  class Circle {
     Double radius;

     [NonSerialized]
     Double area;

     public Circle(Double radius) {
        this.radius = radius;
        area = Math.PI * radius * radius;
     }

     •••
  }

 

In this code, objects of Circle may be serialized. However, the formatter will only serialize the values in the object's radius field. The value in the area field will not be serialized because it has the NonSerializedAttribute attribute applied to it. This attribute can only be applied to a type's fields and it continues to apply to this field when inherited by another type. Of course, you may apply the NonSerializedAttribute attribute to multiple fields within a type.
      So, let's say that the code constructs a Circle object as follows:

  Circle c = new Circle(10);
  

Internally, the area field is set to a value of approximately 314.159. When this object gets serialized, only the value of the radius field (10) gets written to the stream. This is exactly what you want, but now you have a problem when the stream is deserialized back into a Circle object. When deserialized, the Circle object will get its radius field set to 10, but its area field will be initialized to 0, not to 314.159!
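      The problem can be reproduced with a short round trip. The following sketch assumes the .NET Framework; the Radius and Area properties and the NonSerializedDemo class are hypothetical additions that exist only so the private fields can be inspected.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Circle {
   Double radius;

   [NonSerialized]
   Double area;

   public Circle(Double radius) {
      this.radius = radius;
      area = Math.PI * radius * radius;
   }

   // Hypothetical accessors, added for this demonstration
   public Double Radius { get { return radius; } }
   public Double Area   { get { return area; } }
}

public static class NonSerializedDemo {
   // Serializes the object to memory and deserializes it again
   public static Circle RoundTrip(Circle c) {
      MemoryStream stream = new MemoryStream();
      BinaryFormatter formatter = new BinaryFormatter();
      formatter.Serialize(stream, c);
      stream.Position = 0;
      return (Circle) formatter.Deserialize(stream);
   }

   public static void Main() {
      Circle copy = RoundTrip(new Circle(10));
      Console.WriteLine(copy.Radius);   // 10
      Console.WriteLine(copy.Area);     // 0
   }
}
```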
      The code in Figure 5 demonstrates how to modify the Circle type in order to fix this problem. First, I've changed Circle so that it now implements the System.Runtime.Serialization.IDeserializationCallback interface:

  public interface IDeserializationCallback {
     void OnDeserialization(Object sender);
  }

 

During deserialization, a formatter checks whether the object it is deserializing comes from a type that implements the IDeserializationCallback interface. If the type does implement this interface, then the formatter adds this object's reference to an internal list. After all of the objects have been deserialized, the formatter walks this list and calls each object's OnDeserialization method. When this method is called, all the serializable fields will be set correctly and they may be accessed to perform any additional work that would be necessary to fully deserialize the object.
      In the modified version of Circle (see Figure 5), I made the OnDeserialization method simply calculate the area of the circle using the radius field, placing the result in the area field. Now, area will have the value 314.159. Currently, Microsoft's formatters always pass null to OnDeserialization's sender parameter, which is why this parameter should not be used in the method. The Hashtable code shown in the next section demonstrates another way of using the IDeserializationCallback interface and its OnDeserialization method.
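      A version of Circle along these lines might look like the following sketch. It assumes the .NET Framework and is not necessarily identical to Figure 5; the Area property and the CallbackDemo class are hypothetical additions for the demonstration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Circle : IDeserializationCallback {
   Double radius;

   [NonSerialized]
   Double area;

   public Circle(Double radius) {
      this.radius = radius;
      area = Math.PI * radius * radius;
   }

   // Hypothetical accessor, added for this demonstration
   public Double Area { get { return area; } }

   // Called by the formatter after the whole graph has been deserialized;
   // the sender parameter is deliberately ignored
   void IDeserializationCallback.OnDeserialization(Object sender) {
      area = Math.PI * radius * radius;
   }
}

public static class CallbackDemo {
   // Serializes the object to memory and deserializes it again
   public static Circle RoundTrip(Circle c) {
      MemoryStream stream = new MemoryStream();
      BinaryFormatter formatter = new BinaryFormatter();
      formatter.Serialize(stream, c);
      stream.Position = 0;
      return (Circle) formatter.Deserialize(stream);
   }

   public static void Main() {
      Circle copy = RoundTrip(new Circle(10));

      // The area is recomputed from the deserialized radius
      Console.WriteLine(copy.Area.ToString("F3"));
   }
}
```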

The SerializableAttribute Attribute

      In this section, I'll provide a bit more insight into how a formatter serializes an object's fields. This knowledge can help you understand the more advanced serialization and deserialization techniques that I'll explain in the remainder of this column and in my next column.
      To make things easier for a formatter, the Framework Class Library offers a FormatterServices type in the System.Runtime.Serialization namespace. This type has only static methods in it and no instances of the type may be instantiated. The following steps describe how a formatter automatically serializes an object whose type has the SerializableAttribute attribute applied to it:

  1. The formatter calls FormatterServices' GetSerializableMembers method:
      public static MemberInfo[] GetSerializableMembers(Type type);
    This method uses reflection to get the type's instance fields (excluding any fields marked with the NonSerializedAttribute attribute). The method returns an array of MemberInfo objects, one for each serializable instance field.
  2. The object being serialized and the array of System.Reflection.MemberInfo objects are then passed to FormatterServices' static GetObjectData method:
      public static Object[] GetObjectData(Object obj,
         MemberInfo[] members);
    This method returns an array of Objects where each element holds the value of a field in the object being serialized. This Object array and the MemberInfo array are parallel. That is, element 0 in the Object array is the value of the member identified by element 0 in the MemberInfo array.
  3. The formatter writes the assembly's identity and the type's full name to the byte stream.
  4. Finally, the formatter enumerates over the elements in the two arrays, writing each member's name and value to the byte stream.
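      The serialization steps above can be imitated by hand with the same FormatterServices methods. In this sketch (assuming the .NET Framework), lines of text stand in for the byte stream; the FormatterStepsDemo class and its Describe method are hypothetical, and a real formatter's stream layout is different.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
class Circle {
   Double radius;

   [NonSerialized]
   Double area;

   public Circle(Double radius) {
      this.radius = radius;
      area = Math.PI * radius * radius;
   }

   public Double Area { get { return area; } }
}

public static class FormatterStepsDemo {
   // Returns the information a formatter would write for obj
   public static String[] Describe(Object obj) {
      Type type = obj.GetType();

      // Step 1: get the serializable instance fields
      MemberInfo[] members = FormatterServices.GetSerializableMembers(type);

      // Step 2: get the parallel array of field values
      Object[] values = FormatterServices.GetObjectData(obj, members);

      String[] lines = new String[members.Length + 2];

      // Step 3: the assembly's identity and the type's full name
      lines[0] = type.Assembly.FullName;
      lines[1] = type.FullName;

      // Step 4: each member's name and value
      for (Int32 i = 0; i < members.Length; i++)
         lines[i + 2] = members[i].Name + " = " + values[i];
      return lines;
   }

   public static void Main() {
      foreach (String line in Describe(new Circle(10)))
         Console.WriteLine(line);
   }
}
```

      For this Circle, only the radius field is reported; the area field is skipped because it is marked with the NonSerializedAttribute attribute.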

      The following steps describe how a formatter automatically deserializes an object whose type has the SerializableAttribute attribute applied to it.

  1. The formatter reads the assembly's identity and full type name from the byte stream. If the assembly is not currently loaded into the AppDomain, it is loaded now (as described earlier). If the assembly can't be loaded, a SerializationException exception is thrown and the object cannot be deserialized. Once the assembly is loaded, the formatter passes a reference to it and the type's full name to FormatterServices' static GetTypeFromAssembly method, as you can see here:
      public static Type GetTypeFromAssembly(Assembly assem,
         String name);
    This method returns a System.Type object, indicating the type of object that is being deserialized.
  2. The formatter calls FormatterServices' static GetUninitializedObject method:
      public static Object GetUninitializedObject(Type type);
    This method allocates memory (all bytes are initialized to null or 0) for a new object, but does not call a constructor for the object.
  3. The formatter now constructs and initializes a MemberInfo array like it did before by calling FormatterServices' GetSerializableMembers method. This method returns the set of fields that were serialized and that now will need to be deserialized.
  4. The formatter creates and initializes an Object array from the data contained in the byte stream.
  5. The reference to the newly allocated object, the MemberInfo array, and the parallel Object array of field values are passed to FormatterServices' static PopulateObjectMembers method, like so:
      public static Object PopulateObjectMembers(Object obj,
         MemberInfo[] members, Object[] data);
    This method enumerates over the arrays, initializing each field to its corresponding value. At this point, you can be sure the object has been completely deserialized.
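      The deserialization steps can likewise be walked through by hand. In the sketch below (assuming the .NET Framework), the field values are hard-coded instead of being read from a byte stream, and the Point class, its constructorCalls counter, and the ManualDeserializeDemo class are hypothetical names used only for illustration.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
class Point {
   // Static fields are not part of an object's serialized state;
   // this one counts how many times the constructor runs
   public static Int32 constructorCalls = 0;

   public Int32 x, y;

   public Point(Int32 x, Int32 y) {
      constructorCalls++;
      this.x = x;
      this.y = y;
   }
}

public static class ManualDeserializeDemo {
   public static Point Build(Int32 x, Int32 y) {
      // Step 1: find the type in its (already loaded) assembly
      Type type = FormatterServices.GetTypeFromAssembly(
         typeof(Point).Assembly, typeof(Point).FullName);

      // Step 2: allocate zeroed memory without calling a constructor
      Object obj = FormatterServices.GetUninitializedObject(type);

      // Step 3: get the fields that need to be deserialized
      MemberInfo[] members = FormatterServices.GetSerializableMembers(type);

      // Step 4: build the parallel array of values (a real formatter
      // would read these from the byte stream)
      Object[] data = new Object[members.Length];
      for (Int32 i = 0; i < members.Length; i++)
         data[i] = (members[i].Name == "x") ? x : y;

      // Step 5: initialize each field to its corresponding value
      return (Point) FormatterServices.PopulateObjectMembers(obj, members, data);
   }

   public static void Main() {
      Point p = Build(3, 4);
      Console.WriteLine("{0}, {1}", p.x, p.y);     // 3, 4
      Console.WriteLine(Point.constructorCalls);   // 0
   }
}
```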

Conclusion

      In this column I've shown how to use the two serialization formatters included in the Framework Class Library and how to define types that allow instances of themselves to be serialized. In my next two columns, I'll continue the discussion of serialization by focusing on more advanced techniques that allow a type to completely define how it serializes and deserializes itself. I'll also discuss how these more sophisticated techniques are used by various features offered by the .NET Framework.

Send questions and comments for Jeff to dot-net@microsoft.com.

Jeffrey Richter is the author of Applied Microsoft .NET Framework Programming (Microsoft Press, 2002) and Programming Applications for Microsoft Windows (Microsoft Press, 1999). He is a cofounder of Wintellect (https://www.Wintellect.com), and specializes in programming for .NET and Win32.