Understanding Encodings

Silverlight

A character encoding system describes the rules by which a character set can be translated into numbers. Any character encoding system consists of two separate components:

  • An encoder, which translates a sequence of characters into a sequence of numeric values (a sequence of bytes).

  • A decoder, which translates a sequence of bytes into a series of characters.
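The two components can be sketched as a round trip. This is a minimal illustration rather than a definitive implementation: the class name RoundTripSketch and its method names are hypothetical, and UTF-8 is chosen only as an example encoding.

```csharp
using System;
using System.Text;

public static class RoundTripSketch
{
    // Encoder direction: a sequence of characters becomes a sequence of bytes.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decoder direction: a sequence of bytes becomes a sequence of characters.
    public static string Decode(byte[] bytes)
    {
        // The (bytes, index, count) overload is used here on the assumption
        // that it is the most widely available GetString overload.
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }
}
```

For valid text, RoundTripSketch.Decode(RoundTripSketch.Encode("Hello")) returns the original string.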

The .NET Framework for Silverlight includes two encoding classes:

  • The UTF8Encoding class, which uses UTF-8 encoding to represent a character in one to four bytes. UTF8Encoding has been tuned for speed and is typically faster than the other encodings.

  • The UnicodeEncoding class, which uses UTF-16 encoding to represent a character in either two or four bytes. UTF-16 encoded bytes can be in either little-endian format (least significant byte first) or big-endian format (most significant byte first). For example, the space character (\u0020) is encoded as 0x20 0x00 in little-endian format and as 0x00 0x20 in big-endian format. Internally, the .NET Framework stores text using UTF-16 encoding in a little-endian format.

Both of these classes inherit from the Encoding class.
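The byte layouts described above can be checked with a short sketch. The class name EndiannessSketch and its method names are hypothetical; the sketch assumes the UnicodeEncoding(Boolean, Boolean) constructor, whose parameters are bigEndian and byteOrderMark.

```csharp
using System;
using System.Text;

public static class EndiannessSketch
{
    // Returns the UTF-16 bytes for the text in the requested byte order.
    // No byte order mark is requested, and GetBytes does not emit one.
    public static byte[] Utf16Bytes(string text, bool bigEndian)
    {
        var encoding = new UnicodeEncoding(bigEndian, false);
        return encoding.GetBytes(text);
    }

    // Returns how many bytes UTF-8 needs to represent the text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

With this sketch, Utf16Bytes(" ", false) yields 0x20 0x00 and Utf16Bytes(" ", true) yields 0x00 0x20. Utf8ByteCount returns 1 for "A", 2 for "\u00E9", and 3 for "\u20AC". A supplementary character such as "\uD83D\uDE00", which is a surrogate pair in a .NET string, takes four bytes in both UTF-8 and UTF-16.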

If you require an encoding that is not available in the .NET Framework for Silverlight, you have two options:

When a method tries an encoding or decoding operation but no mapping exists, it must implement a fallback strategy, which determines how the failed mapping should be handled. There are two types of fallback strategies:

  • Default

    If the attempt to encode a character fails, it is replaced by the byte sequence for the REPLACEMENT CHARACTER (U+FFFD). This is 0xFD 0xFF for little-endian Unicode, 0xFF 0xFD for big-endian Unicode, and 0xEF 0xBF 0xBD for UTF-8. The default fallback is used with all Encoding objects except those instantiated by calling the UTF8Encoding::UTF8Encoding(Boolean, Boolean) and UnicodeEncoding::UnicodeEncoding(Boolean, Boolean, Boolean) constructors with the throwOnInvalidBytes parameter set to true.

  • Application-defined

    If an Encoding object is instantiated by calling the UTF8Encoding::UTF8Encoding(Boolean, Boolean) or UnicodeEncoding::UnicodeEncoding(Boolean, Boolean, Boolean) constructor with the throwOnInvalidBytes parameter set to true, an EncoderFallbackException is thrown if an encoding method cannot successfully map a character to a byte sequence, and a DecoderFallbackException is thrown if a decoding method cannot successfully map a byte sequence to a character. By handling the exception, the application can define a substitute byte sequence for an encoding operation or a specific replacement character for a decoding operation.
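The two fallback strategies can be contrasted in a short sketch. The class name FallbackSketch, its method names, and the choice of "?" as the substitute are hypothetical and only illustrate one way an application might react. A lone surrogate ("\uD800") serves as an example of a character that cannot be encoded, and the byte 0xFF as an example of input that is not valid UTF-8.

```csharp
using System;
using System.Text;

public static class FallbackSketch
{
    // Default strategy: Encoding.UTF8 replaces data it cannot map
    // instead of rejecting it.
    public static byte[] EncodeWithReplacement(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Exception strategy: throwOnInvalidBytes is true, so a character that
    // cannot be encoded raises EncoderFallbackException. As a deliberately
    // crude, application-chosen reaction, the handler returns a single '?'
    // byte (0x3F) for the whole input.
    public static byte[] EncodeOrSubstitute(string text)
    {
        var strict = new UTF8Encoding(false, true);
        try
        {
            return strict.GetBytes(text);
        }
        catch (EncoderFallbackException)
        {
            return new byte[] { 0x3F };
        }
    }

    // Same idea for decoding: invalid bytes raise DecoderFallbackException,
    // and the handler returns an application-chosen replacement string.
    public static string DecodeOrSubstitute(byte[] bytes)
    {
        var strict = new UTF8Encoding(false, true);
        try
        {
            return strict.GetString(bytes, 0, bytes.Length);
        }
        catch (DecoderFallbackException)
        {
            return "?";
        }
    }
}
```

Here EncodeWithReplacement("\uD800") produces the UTF-8 bytes of the replacement character, while EncodeOrSubstitute("\uD800") and DecodeOrSubstitute(new byte[] { 0xFF }) take the exception path and return the application's substitute. A real application would more likely substitute only the offending portion of the data. The exception's properties can help locate it.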
