Serialization and Deserialization

Windows Communication Foundation (WCF) includes a new serialization engine, the DataContractSerializer. The DataContractSerializer translates between .NET Framework objects and XML, in both directions. This topic explains how the serializer works.

When serializing .NET Framework objects, the serializer understands a variety of serialization programming models, including the new data contract model. For a full list of supported types, see Types Supported by the Data Contract Serializer. For an introduction to data contracts, see Using Data Contracts.

When reading or writing XML, the serializer uses the XmlReader and XmlWriter classes. It also supports the XmlDictionaryReader and XmlDictionaryWriter classes to enable it to produce optimized XML in some cases, such as when using the WCF binary XML format.

WCF also includes a companion serializer, the NetDataContractSerializer. The NetDataContractSerializer:

  • Is not secure. For more information, see the BinaryFormatter security guide.
  • Is similar to the BinaryFormatter and SoapFormatter serializers because it also emits .NET Framework type names as part of the serialized data.
  • Is used when the same types are shared on the serializing and the deserializing ends.

Both DataContractSerializer and NetDataContractSerializer derive from a common base class, XmlObjectSerializer.

Warning

The DataContractSerializer serializes strings containing control characters with a hexadecimal value below 20 as XML entities. This may cause a problem with a non-WCF client when sending such data to a WCF service.

Creating a DataContractSerializer Instance

Constructing an instance of the DataContractSerializer is an important step. After construction, you cannot change any of the settings.

Specifying the Root Type

The root type is the type of which instances are serialized or deserialized. The DataContractSerializer has many constructor overloads, but, at a minimum, a root type must be supplied using the type parameter.

A serializer created for a certain root type cannot be used to serialize (or deserialize) another type, unless the type is derived from the root type. The following example shows two classes.

[DataContract]
public class Person
{
    // Code not shown.
}

[DataContract]
public class PurchaseOrder
{
    // Code not shown.
}

<DataContract()> _
Public Class Person
    ' Code not shown.
End Class

<DataContract()> _
Public Class PurchaseOrder
    ' Code not shown.
End Class

This code constructs an instance of the DataContractSerializer that can be used only to serialize or deserialize instances of the Person class.

DataContractSerializer dcs = new DataContractSerializer(typeof(Person));
// This can now be used to serialize/deserialize Person but not PurchaseOrder.

Dim dcs As New DataContractSerializer(GetType(Person))
' This can now be used to serialize/deserialize Person but not PurchaseOrder.

Specifying Known Types

If polymorphism is involved in the types being serialized that is not already handled using the KnownTypeAttribute attribute or some other mechanism, a list of possible known types must be passed to the serializer’s constructor using the knownTypes parameter. For more information about known types, see Data Contract Known Types.

The following example shows a class, LibraryPatron, that includes a collection of a specific type, LibraryItem. The second class defines the LibraryItem type. The third and fourth classes (Book and Newspaper) inherit from the LibraryItem class.

[DataContract]
public class LibraryPatron
{
    [DataMember]
    public LibraryItem[] borrowedItems;
}

[DataContract]
public class LibraryItem
{
    // Code not shown.
}

[DataContract]
public class Book : LibraryItem
{
    // Code not shown.
}

[DataContract]
public class Newspaper : LibraryItem
{
    // Code not shown.
}

<DataContract()> _
Public Class LibraryPatron
    <DataMember()> _
    Public borrowedItems() As LibraryItem
End Class

<DataContract()> _
Public Class LibraryItem
    ' Code not shown.
End Class

<DataContract()> _
Public Class Book
    Inherits LibraryItem
    ' Code not shown.
End Class

<DataContract()> _
Public Class Newspaper
    Inherits LibraryItem
    ' Code not shown.
End Class

The following code constructs an instance of the serializer using the knownTypes parameter.

// Create a serializer for the inherited types using the knownTypes parameter.
Type[] knownTypes = new Type[] { typeof(Book), typeof(Newspaper) };
DataContractSerializer dcs =
    new DataContractSerializer(typeof(LibraryPatron), knownTypes);
// All types are known after construction.

' Create a serializer for the inherited types using the knownTypes parameter.
Dim knownTypes() As Type = {GetType(Book), GetType(Newspaper)}
Dim dcs As New DataContractSerializer(GetType(LibraryPatron), knownTypes)
' All types are known after construction.
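The effect of the knownTypes parameter can be sketched with a small round trip through a MemoryStream. This is a minimal, illustrative sketch: the Title member, the KnownTypesDemo class, and the RoundTrip helper are assumptions added for the example and are not part of the original sample.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class LibraryItem
{
    [DataMember]
    public string Title; // Hypothetical member, added so there is data to compare.
}

[DataContract]
public class Book : LibraryItem { }

[DataContract]
public class Newspaper : LibraryItem { }

[DataContract]
public class LibraryPatron
{
    [DataMember]
    public LibraryItem[] borrowedItems;
}

public static class KnownTypesDemo
{
    // Serializes a patron to memory and reads it back with the same serializer.
    public static LibraryPatron RoundTrip(LibraryPatron patron, Type[] knownTypes)
    {
        var dcs = new DataContractSerializer(typeof(LibraryPatron), knownTypes);
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, patron);
            ms.Position = 0;
            return (LibraryPatron)dcs.ReadObject(ms);
        }
    }

    public static void Main()
    {
        var patron = new LibraryPatron
        {
            borrowedItems = new LibraryItem[]
            {
                new Book { Title = "A Book" },
                new Newspaper { Title = "A Newspaper" }
            }
        };

        // With the derived types listed, the concrete types survive the round trip.
        LibraryPatron copy = RoundTrip(patron, new[] { typeof(Book), typeof(Newspaper) });
        Console.WriteLine(copy.borrowedItems[0].GetType().Name);
        Console.WriteLine(copy.borrowedItems[1].GetType().Name);

        // Without them, the serializer does not expect Book or Newspaper.
        try
        {
            RoundTrip(patron, new Type[0]);
        }
        catch (SerializationException ex)
        {
            Console.WriteLine("Serialization failed: " + ex.Message);
        }
    }
}
```

In this sketch, the second call is expected to fail because neither derived type is declared through the constructor or through a KnownTypeAttribute on LibraryItem.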

Specifying the Default Root Name and Namespace

Normally, when an object is serialized, the default name and namespace of the outermost XML element are determined according to the data contract name and namespace. The names of all inner elements are determined from data member names, and their namespace is the data contract’s namespace. The following example sets Name and Namespace values in the constructors of the DataContractAttribute and DataMemberAttribute classes.

[DataContract(Name = "PersonContract", Namespace = "http://schemas.contoso.com")]
public class Person2
{
    [DataMember(Name = "AddressMember")]
    public Address theAddress;
}

[DataContract(Name = "AddressContract", Namespace = "http://schemas.contoso.com")]
public class Address
{
    [DataMember(Name = "StreetMember")]
    public string street;
}

<DataContract(Name:="PersonContract", [Namespace]:="http://schemas.contoso.com")> _
Public Class Person2
    <DataMember(Name:="AddressMember")> _
    Public theAddress As Address
End Class

<DataContract(Name:="AddressContract", [Namespace]:="http://schemas.contoso.com")> _
Public Class Address
    <DataMember(Name:="StreetMember")> _
    Public street As String
End Class

Serializing an instance of the Person2 class produces XML similar to the following.

<PersonContract xmlns="http://schemas.contoso.com">
  <AddressMember>
    <StreetMember>123 Main Street</StreetMember>
  </AddressMember>
</PersonContract>

However, you can customize the default name and namespace of the root element by passing the values of the rootName and rootNamespace parameters to the DataContractSerializer constructor. Note that the rootNamespace does not affect the namespace of the contained elements that correspond to data members. It affects only the namespace of the outermost element.

These values can be passed as strings or instances of the XmlDictionaryString class to allow for their optimization using the binary XML format.
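As a rough illustration of the rootName and rootNamespace parameters, the following sketch serializes a simplified Person2 type with a custom root element. The NameMember data member, the "Customer" root name, the "http://www.contoso.com/root" namespace, and the RootNameDemo helper are hypothetical values chosen for the example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract(Name = "PersonContract", Namespace = "http://schemas.contoso.com")]
public class Person2
{
    [DataMember(Name = "NameMember")]
    public string name; // Hypothetical member used only for this sketch.
}

public static class RootNameDemo
{
    // Serializes the object with an explicit root element name and namespace
    // and returns the resulting XML as a string.
    public static string Serialize(Person2 p, string rootName, string rootNamespace)
    {
        var dcs = new DataContractSerializer(typeof(Person2), rootName, rootNamespace);
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, p); // The Stream overload writes UTF-8 XML.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        var p = new Person2 { name = "Jay Hamlin" };
        Console.WriteLine(Serialize(p, "Customer", "http://www.contoso.com/root"));
    }
}
```

The outermost element should be named Customer and placed in the custom namespace, while the NameMember element stays in the data contract namespace.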

Setting the Maximum Objects Quota

Some DataContractSerializer constructor overloads have a maxItemsInObjectGraph parameter. This parameter determines the maximum number of objects the serializer serializes or deserializes in a single ReadObject method call. (The method always reads one root object, but this object may have other objects in its data members. Those objects may have other objects, and so on.) The default is 65536. Note that when serializing or deserializing arrays, every array entry counts as a separate object. Also, note that some objects may have a large memory representation, and so this quota alone may not be sufficient to prevent a denial of service attack. For more information, see Security Considerations for Data. If you need to increase this quota beyond the default value, do so on both the sending (serializing) and receiving (deserializing) sides, because the quota applies both when reading and when writing data.
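The quota can be observed with a small experiment. The sketch below sets the limit through the DataContractSerializerSettings class (available in .NET Framework 4.5 and later) rather than through a constructor overload; the Item type and the QuotaDemo helper are hypothetical names introduced for the example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Item
{
    [DataMember]
    public int Id;
}

public static class QuotaDemo
{
    // Returns true if an array of 'count' items can be serialized
    // under the given maxItemsInObjectGraph quota, false otherwise.
    public static bool TrySerialize(int count, int maxItemsInObjectGraph)
    {
        var items = new Item[count];
        for (int i = 0; i < count; i++)
        {
            items[i] = new Item { Id = i };
        }

        var settings = new DataContractSerializerSettings
        {
            MaxItemsInObjectGraph = maxItemsInObjectGraph
        };
        var dcs = new DataContractSerializer(typeof(Item[]), settings);

        try
        {
            using (var ms = new MemoryStream())
            {
                dcs.WriteObject(ms, items);
            }
            return true;
        }
        catch (SerializationException)
        {
            // Thrown when the object graph exceeds the quota.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TrySerialize(3, 100));  // Well under the quota.
        Console.WriteLine(TrySerialize(100, 10)); // Every array entry counts.
    }
}
```

Because every array entry counts as a separate object, the second call is expected to exceed the quota even though the array is a single data member.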

Round Trips

A round trip occurs when an object is deserialized and re-serialized in one operation. Thus, it goes from XML to an object instance, and back again into an XML stream.

Some DataContractSerializer constructor overloads have an ignoreExtensionDataObject parameter, which is set to false by default. In this default mode, data can be sent on a round trip from a newer version of a data contract through an older version and back to the newer version without loss, as long as the data contract implements the IExtensibleDataObject interface. For example, suppose version 1 of the Person data contract contains the Name and PhoneNumber data members, and version 2 adds a Nickname member. If IExtensibleDataObject is implemented, when sending information from version 2 to version 1, the Nickname data is stored, and then re-emitted when the data is serialized again; therefore, no data is lost in the round trip. For more information, see Forward-Compatible Data Contracts and Data Contract Versioning.

Security and Schema Validity Concerns with Round Trips

Round trips may have security implications. For example, deserializing and storing large amounts of extraneous data may be a security risk. There may also be security concerns about re-emitting data that cannot be verified, especially if digital signatures are involved. For example, in the previous scenario, the version 1 endpoint could be signing a Nickname value that contains malicious data. Finally, there may be schema validity concerns: an endpoint may want to always emit data that strictly adheres to its stated contract and not any extra values. In the previous example, the version 1 endpoint’s contract says that it emits only Name and PhoneNumber, and if schema validation is being used, emitting the extra Nickname value causes validation to fail.

Enabling and Disabling Round Trips

To turn off round trips, do not implement the IExtensibleDataObject interface. If you have no control over the types, set the ignoreExtensionDataObject parameter to true to achieve the same effect.
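The version 1 and version 2 scenario described above can be sketched as follows. The PersonV1 and PersonV2 class names, the shared contract name and namespace, and the RoundTripDemo helper are assumptions made for the example; the ignoreExtensionDataObject value is supplied here through the DataContractSerializerSettings class instead of a constructor overload.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Version 1 of the contract: knows only Name and PhoneNumber.
[DataContract(Name = "Person", Namespace = "http://schemas.contoso.com")]
public class PersonV1 : IExtensibleDataObject
{
    [DataMember] public string Name;
    [DataMember] public string PhoneNumber;

    // Holds data from newer versions that this version does not understand.
    public ExtensionDataObject ExtensionData { get; set; }
}

// Version 2 adds Nickname. Order = 2 places the new member after the old ones.
[DataContract(Name = "Person", Namespace = "http://schemas.contoso.com")]
public class PersonV2 : IExtensibleDataObject
{
    [DataMember] public string Name;
    [DataMember] public string PhoneNumber;
    [DataMember(Order = 2)] public string Nickname;

    public ExtensionDataObject ExtensionData { get; set; }
}

public static class RoundTripDemo
{
    static byte[] Write(DataContractSerializer dcs, object graph)
    {
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, graph);
            return ms.ToArray();
        }
    }

    static object Read(DataContractSerializer dcs, byte[] xml)
    {
        using (var ms = new MemoryStream(xml))
        {
            return dcs.ReadObject(ms);
        }
    }

    // Sends a version 2 object through version 1 and back,
    // then returns the Nickname seen by version 2 afterwards.
    public static string NicknameAfterRoundTrip(bool ignoreExtensionDataObject)
    {
        var v1Settings = new DataContractSerializerSettings
        {
            IgnoreExtensionDataObject = ignoreExtensionDataObject
        };
        var v1 = new DataContractSerializer(typeof(PersonV1), v1Settings);
        var v2 = new DataContractSerializer(typeof(PersonV2));

        var original = new PersonV2 { Name = "Jay Hamlin", PhoneNumber = "555-0100", Nickname = "Jay" };

        var asV1 = (PersonV1)Read(v1, Write(v2, original)); // v2 -> v1
        var back = (PersonV2)Read(v2, Write(v1, asV1));     // v1 -> v2
        return back.Nickname;
    }

    public static void Main()
    {
        Console.WriteLine(NicknameAfterRoundTrip(false) ?? "(lost)");
        Console.WriteLine(NicknameAfterRoundTrip(true) ?? "(lost)");
    }
}
```

With the default setting the Nickname value should survive the trip through version 1; with ignoreExtensionDataObject set to true on the version 1 serializer, it should be dropped.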

Object Graph Preservation

Normally, the serializer does not care about object identity, as in the following code.

[DataContract]
public class PurchaseOrder
{
    [DataMember]
    public Address billTo;
    [DataMember]
    public Address shipTo;
}

[DataContract]
public class Address
{
    [DataMember]
    public string street;
}

<DataContract()> _
Public Class PurchaseOrder

    <DataMember()> _
    Public billTo As Address

    <DataMember()> _
    Public shipTo As Address

End Class

<DataContract()> _
Public Class Address

    <DataMember()> _
    Public street As String

End Class

The following code creates a purchase order.

// Construct a purchase order:
Address adr = new Address();
adr.street = "123 Main St.";
PurchaseOrder po = new PurchaseOrder();
po.billTo = adr;
po.shipTo = adr;

' Construct a purchase order:
Dim adr As New Address()
adr.street = "123 Main St."
Dim po As New PurchaseOrder()
po.billTo = adr
po.shipTo = adr

Notice that the billTo and shipTo fields are set to the same object instance. However, the generated XML duplicates the information, and looks similar to the following XML.

<PurchaseOrder>  
  <billTo><street>123 Main St.</street></billTo>  
  <shipTo><street>123 Main St.</street></shipTo>  
</PurchaseOrder>  

However, this approach has the following characteristics, which may be undesirable:

  • Performance. Replicating data is inefficient.

  • Circular references. If objects refer to themselves, even through other objects, serializing by replication would result in an infinite loop. (The serializer throws a SerializationException if it detects this.)

  • Semantics. Sometimes it is important to preserve the fact that two references are to the same object, and not to two identical objects.

For these reasons, some DataContractSerializer constructor overloads have a preserveObjectReferences parameter (the default is false). When this parameter is set to true, a special method of encoding object references, which only WCF understands, is used. When set to true, the XML code example now resembles the following.

<PurchaseOrder ser:id="1">  
  <billTo ser:id="2"><street ser:id="3">123 Main St.</street></billTo>  
  <shipTo ser:ref="2"/>  
</PurchaseOrder>  

The "ser" namespace refers to the standard serialization namespace, http://schemas.microsoft.com/2003/10/Serialization/. Each piece of data is serialized only once and given an ID number, and subsequent uses result in a reference to the already serialized data.
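The difference between the two modes can be checked by comparing object identity after a round trip. The following is a minimal sketch: it sets the option through the DataContractSerializerSettings class rather than a constructor overload, and the ReferenceDemo helper is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Address
{
    [DataMember]
    public string street;
}

[DataContract]
public class PurchaseOrder
{
    [DataMember]
    public Address billTo;
    [DataMember]
    public Address shipTo;
}

public static class ReferenceDemo
{
    // Round-trips a purchase order whose billTo and shipTo point to the same
    // Address, and reports whether they are still the same instance afterwards.
    public static bool SharedAfterRoundTrip(bool preserveObjectReferences)
    {
        var settings = new DataContractSerializerSettings
        {
            PreserveObjectReferences = preserveObjectReferences
        };
        var dcs = new DataContractSerializer(typeof(PurchaseOrder), settings);

        var adr = new Address { street = "123 Main St." };
        var po = new PurchaseOrder { billTo = adr, shipTo = adr };

        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, po);
            ms.Position = 0;
            var copy = (PurchaseOrder)dcs.ReadObject(ms);
            return ReferenceEquals(copy.billTo, copy.shipTo);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SharedAfterRoundTrip(false)); // Data is replicated.
        Console.WriteLine(SharedAfterRoundTrip(true));  // Identity is preserved.
    }
}
```

With the option off, the deserialized order holds two equal but distinct Address objects; with it on, both fields should refer to one instance again.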

Important

If both "id" and "ref" attributes are present on the data contract XML element, then the "ref" attribute is honored and the "id" attribute is ignored.

It is important to understand the limitations of this mode:

  • The XML the DataContractSerializer produces with preserveObjectReferences set to true is not interoperable with any other technologies, and can be accessed only by another DataContractSerializer instance, also with preserveObjectReferences set to true.

  • There is no metadata (schema) support for this feature. The schema that is produced is valid only for the case when preserveObjectReferences is set to false.

  • This feature may cause the serialization and deserialization process to run slower. Although data does not have to be replicated, extra object comparisons must be performed in this mode.

Caution

When the preserveObjectReferences mode is enabled, it is especially important to set the maxItemsInObjectGraph value to the correct quota. Due to the way arrays are handled in this mode, it is easy for an attacker to construct a small malicious message that results in large memory consumption limited only by the maxItemsInObjectGraph quota.

Specifying a Data Contract Surrogate

Some DataContractSerializer constructor overloads have a dataContractSurrogate parameter, which may be set to null. Otherwise, you can use it to specify a data contract surrogate, which is a type that implements the IDataContractSurrogate interface. You can then use the interface to customize the serialization and deserialization process. For more information, see Data Contract Surrogates.

Serialization

The following information applies to any class that inherits from the XmlObjectSerializer, including the DataContractSerializer and NetDataContractSerializer classes.

Simple Serialization

The most basic way to serialize an object is to pass it to the WriteObject method. There are three overloads, one each for writing to a Stream, an XmlWriter, or an XmlDictionaryWriter. With the Stream overload, the output is XML in the UTF-8 encoding. With the XmlDictionaryWriter overload, the serializer optimizes its output for binary XML.

When using the WriteObject method, the serializer uses the default name and namespace for the wrapper element and writes it out along with the contents (see the previous "Specifying the Default Root Name and Namespace" section).

The following example demonstrates writing with an XmlDictionaryWriter.

Person p = new Person();
DataContractSerializer dcs =
    new DataContractSerializer(typeof(Person));
XmlDictionaryWriter xdw =
    XmlDictionaryWriter.CreateTextWriter(someStream, Encoding.UTF8);
dcs.WriteObject(xdw, p);

Dim p As New Person()
Dim dcs As New DataContractSerializer(GetType(Person))
Dim xdw As XmlDictionaryWriter = _
    XmlDictionaryWriter.CreateTextWriter(someStream, Encoding.UTF8)
dcs.WriteObject(xdw, p)

This produces XML similar to the following.

<Person>  
  <Name>Jay Hamlin</Name>  
  <Address>123 Main St.</Address>  
</Person>  
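For comparison, the Stream overload and a binary XmlDictionaryWriter can be exercised side by side. This sketch is illustrative only: the Person members and the WriteObjectDemo helper are assumptions, and the writer is flushed explicitly because buffered output may not reach the stream otherwise.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

[DataContract]
public class Person
{
    [DataMember] public string Name;
    [DataMember] public string Address;
}

public static class WriteObjectDemo
{
    // Uses the Stream overload, which produces UTF-8 text XML.
    public static string ToTextXml(Person p)
    {
        var dcs = new DataContractSerializer(typeof(Person));
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, p);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Uses the XmlDictionaryWriter overload with the WCF binary XML format,
    // then reads the object back with a matching binary reader.
    public static Person RoundTripBinary(Person p)
    {
        var dcs = new DataContractSerializer(typeof(Person));
        using (var ms = new MemoryStream())
        {
            XmlDictionaryWriter xdw = XmlDictionaryWriter.CreateBinaryWriter(ms);
            dcs.WriteObject(xdw, p);
            xdw.Flush(); // Push buffered output into the stream before reading.

            ms.Position = 0;
            XmlDictionaryReader xdr =
                XmlDictionaryReader.CreateBinaryReader(ms, XmlDictionaryReaderQuotas.Max);
            return (Person)dcs.ReadObject(xdr);
        }
    }

    public static void Main()
    {
        var p = new Person { Name = "Jay Hamlin", Address = "123 Main St." };
        Console.WriteLine(ToTextXml(p));
        Console.WriteLine(RoundTripBinary(p).Name);
    }
}
```

XmlDictionaryReaderQuotas.Max is used here only to keep the sketch short; for untrusted input, tighter quotas are the safer choice.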

Step-By-Step Serialization

Use the WriteStartObject, WriteObjectContent, and WriteEndObject methods to write the start element, write the object contents, and close the wrapper element, respectively.

Note

There are no Stream overloads of these methods.

This step-by-step serialization has two common uses. One is to insert contents such as attributes or comments between WriteStartObject and WriteObjectContent, as shown in the following example.

dcs.WriteStartObject(xdw, p);
xdw.WriteAttributeString("serializedBy", "myCode");
dcs.WriteObjectContent(xdw, p);
dcs.WriteEndObject(xdw);

dcs.WriteStartObject(xdw, p)
xdw.WriteAttributeString("serializedBy", "myCode")
dcs.WriteObjectContent(xdw, p)
dcs.WriteEndObject(xdw)

This produces XML similar to the following.

<Person serializedBy="myCode">  
  <Name>Jay Hamlin</Name>  
  <Address>123 Main St.</Address>  
</Person>  

Another common use is to avoid using WriteStartObject and WriteEndObject entirely, and to write your own custom wrapper element (or even skip writing a wrapper altogether), as shown in the following code.

xdw.WriteStartElement("MyCustomWrapper");
dcs.WriteObjectContent(xdw, p);
xdw.WriteEndElement();

xdw.WriteStartElement("MyCustomWrapper")
dcs.WriteObjectContent(xdw, p)
xdw.WriteEndElement()

This produces XML similar to the following.

<MyCustomWrapper>  
  <Name>Jay Hamlin</Name>  
  <Address>123 Main St.</Address>  
</MyCustomWrapper>  

Note

Using step-by-step serialization may result in schema-invalid XML.

Deserialization

The following information applies to any class that inherits from the XmlObjectSerializer, including the DataContractSerializer and NetDataContractSerializer classes.

The most basic way to deserialize an object is to call one of the ReadObject method overloads. There are three overloads, one each for reading with an XmlDictionaryReader, an XmlReader, or a Stream. Note that the Stream overload creates a textual XmlDictionaryReader that is not protected by any quotas, and should be used only to read trusted data.

Also note that the object the ReadObject method returns must be cast to the appropriate type.

The following code constructs an instance of the DataContractSerializer and an XmlDictionaryReader, then deserializes a Person instance.

DataContractSerializer dcs = new DataContractSerializer(typeof(Person));
FileStream fs = new FileStream(path, FileMode.Open);
XmlDictionaryReader reader =
    XmlDictionaryReader.CreateTextReader(fs, new XmlDictionaryReaderQuotas());

Person p = (Person)dcs.ReadObject(reader);

Dim dcs As New DataContractSerializer(GetType(Person))
Dim fs As New FileStream(path, FileMode.Open)
Dim reader As XmlDictionaryReader = _
   XmlDictionaryReader.CreateTextReader(fs, New XmlDictionaryReaderQuotas())

Dim p As Person = CType(dcs.ReadObject(reader), Person)

Before calling the ReadObject method, position the XML reader on the wrapper element or on a non-content node that precedes the wrapper element. You can do this by calling the Read method of the XmlReader or its derivation, and testing the NodeType, as shown in the following code.

DataContractSerializer ser = new DataContractSerializer(typeof(Person),
    "Customer", @"http://www.contoso.com");
FileStream fs = new FileStream(path, FileMode.Open);
XmlDictionaryReader reader =
    XmlDictionaryReader.CreateTextReader(fs, new XmlDictionaryReaderQuotas());
while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            // Deserialize only when the reader is on the expected wrapper element.
            if (ser.IsStartObject(reader))
            {
                Console.WriteLine("Found the element");
                Person p = (Person)ser.ReadObject(reader);
                Console.WriteLine("{0} {1}",
                    p.Name, p.Address);
            }
            Console.WriteLine(reader.Name);
            break;
    }
}

Dim ser As New DataContractSerializer(GetType(Person), "Customer", "http://www.contoso.com")
Dim fs As New FileStream(path, FileMode.Open)
Dim reader As XmlDictionaryReader = XmlDictionaryReader.CreateTextReader(fs, New XmlDictionaryReaderQuotas())

While reader.Read()
    Select Case reader.NodeType
        Case XmlNodeType.Element
            If ser.IsStartObject(reader) Then
                Console.WriteLine("Found the element")
                Dim p As Person = CType(ser.ReadObject(reader), Person)
                Console.WriteLine("{0} {1}", _
                                   p.Name, p.Address)
            End If
            Console.WriteLine(reader.Name)
    End Select
End While

Note that you can read attributes on this wrapper element before handing the reader to ReadObject.

When using one of the simple ReadObject overloads, the deserializer looks for the default name and namespace on the wrapper element (see the preceding section, "Specifying the Default Root Name and Namespace") and throws an exception if it finds an unknown element. In the preceding example, the serializer is constructed with the root name Customer and the root namespace http://www.contoso.com, so a <Customer> wrapper element in that namespace is expected. The IsStartObject method is called to verify that the reader is positioned on an element that is named as expected.

There is a way to disable this wrapper element name check; some overloads of the ReadObject method take the Boolean parameter verifyObjectName, which is set to true by default. When set to false, the name and namespace of the wrapper element are ignored. This is useful for reading XML that was written using the step-by-step serialization mechanism described previously.
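The interaction between a custom wrapper element and the verifyObjectName parameter can be sketched as follows. The Person members, the contract namespace, and the WrapperDemo helper are hypothetical; the wrapper name MyCustomWrapper follows the earlier step-by-step example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

[DataContract(Namespace = "http://schemas.contoso.com")]
public class Person
{
    [DataMember] public string Name;
    [DataMember] public string Address;
}

public static class WrapperDemo
{
    // Writes the object inside a custom wrapper element, then reads it back.
    // verifyObjectName controls whether the wrapper name is checked on read.
    public static Person RoundTrip(Person p, bool verifyObjectName)
    {
        var dcs = new DataContractSerializer(typeof(Person));
        using (var ms = new MemoryStream())
        {
            XmlDictionaryWriter xdw = XmlDictionaryWriter.CreateTextWriter(ms, Encoding.UTF8);
            xdw.WriteStartElement("MyCustomWrapper"); // Replaces WriteStartObject.
            dcs.WriteObjectContent(xdw, p);
            xdw.WriteEndElement();                    // Replaces WriteEndObject.
            xdw.Flush();

            ms.Position = 0;
            XmlDictionaryReader reader =
                XmlDictionaryReader.CreateTextReader(ms, new XmlDictionaryReaderQuotas());
            return (Person)dcs.ReadObject(reader, verifyObjectName);
        }
    }

    public static void Main()
    {
        var p = new Person { Name = "Jay Hamlin", Address = "123 Main St." };

        // The wrapper name is ignored, so the custom element is accepted.
        Console.WriteLine(RoundTrip(p, false).Name);

        // With the check enabled, the unexpected wrapper name is rejected.
        try
        {
            RoundTrip(p, true);
        }
        catch (SerializationException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```

Only the wrapper element check is relaxed in this sketch; the data members inside the wrapper are still matched by name and namespace.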

Using the NetDataContractSerializer

The primary difference between the DataContractSerializer and the NetDataContractSerializer is that the DataContractSerializer uses data contract names, whereas the NetDataContractSerializer outputs full .NET Framework assembly and type names in the serialized XML. This means that the exact same types must be shared between the serialization and deserialization endpoints. It also means that the known types mechanism is not required with the NetDataContractSerializer, because the exact types to be deserialized are always known.

However, several problems can occur:

  • Security. Any type found in the XML being deserialized is loaded. This can be exploited to force the loading of malicious types. Using the NetDataContractSerializer with untrusted data should be done only if a Serialization Binder is used (using the Binder property or constructor parameter). The binder permits only safe types to be loaded. The Binder mechanism is identical to the one that types in the System.Runtime.Serialization namespace use.

  • Versioning. Using full type and assembly names in the XML severely restricts how types can be versioned. The following cannot be changed: type names, namespaces, assembly names, and assembly versions. Setting the AssemblyFormat property or constructor parameter to Simple instead of the default value of Full allows for assembly version changes, but not for generic parameter types.

  • Interoperability. Because .NET Framework type and assembly names are included in the XML, platforms other than the .NET Framework cannot access the resulting data.

  • Performance. Writing out the type and assembly names significantly increases the size of the resulting XML.

This mechanism is similar to binary or SOAP serialization used by .NET Framework remoting (specifically, the BinaryFormatter and the SoapFormatter).

Using the NetDataContractSerializer is similar to using the DataContractSerializer, with the following differences:

  • The constructors do not require you to specify a root type. You can serialize any type with the same instance of the NetDataContractSerializer.

  • The constructors do not accept a list of known types. The known types mechanism is unnecessary if type names are serialized into the XML.

  • The constructors do not accept a data contract surrogate. Instead, they accept an ISurrogateSelector parameter called surrogateSelector (which maps to the SurrogateSelector property). This is a legacy surrogate mechanism.

  • The constructors accept a parameter called assemblyFormat of type FormatterAssemblyStyle that maps to the AssemblyFormat property. As discussed previously, this can be used to enhance the versioning capabilities of the serializer. This is identical to the FormatterAssemblyStyle mechanism in binary or SOAP serialization.

  • The constructors accept a StreamingContext parameter called context that maps to the Context property. You can use this to pass information into types being serialized. This usage is identical to that of the StreamingContext mechanism used in other System.Runtime.Serialization classes.

  • The Serialize and Deserialize methods are aliases for the WriteObject and ReadObject methods. These exist to provide a more consistent programming model with binary or SOAP serialization.

For more information about these features, see Binary Serialization.

The XML formats that the NetDataContractSerializer and the DataContractSerializer use are normally not compatible. That is, attempting to serialize with one of these serializers and deserialize with the other is not a supported scenario.

Also, note that the NetDataContractSerializer does not output the full .NET Framework type and assembly name for each node in the object graph. It outputs that information only where the type would otherwise be ambiguous, that is, at the root object level and for any polymorphic cases.

See also