
UTF32Encoding Class

 

Represents a UTF-32 encoding of Unicode characters.

Namespace: System.Text
Assembly: mscorlib (in mscorlib.dll)

System.Object
  System.Text.Encoding
    System.Text.UTF32Encoding

[SerializableAttribute]
public sealed class UTF32Encoding : Encoding

Constructors

UTF32Encoding()
    Initializes a new instance of the UTF32Encoding class.

UTF32Encoding(Boolean, Boolean)
    Initializes a new instance of the UTF32Encoding class. Parameters specify whether to use the big endian byte order and whether the GetPreamble method returns a Unicode byte order mark.

UTF32Encoding(Boolean, Boolean, Boolean)
    Initializes a new instance of the UTF32Encoding class. Parameters specify whether to use the big endian byte order, whether to provide a Unicode byte order mark, and whether to throw an exception when an invalid encoding is detected.

Properties

BodyName
    When overridden in a derived class, gets a name for the current encoding that can be used with mail agent body tags. (Inherited from Encoding.)

CodePage
    When overridden in a derived class, gets the code page identifier of the current Encoding. (Inherited from Encoding.)

DecoderFallback
    Gets or sets the DecoderFallback object for the current Encoding object. (Inherited from Encoding.)

EncoderFallback
    Gets or sets the EncoderFallback object for the current Encoding object. (Inherited from Encoding.)

EncodingName
    When overridden in a derived class, gets the human-readable description of the current encoding. (Inherited from Encoding.)

HeaderName
    When overridden in a derived class, gets a name for the current encoding that can be used with mail agent header tags. (Inherited from Encoding.)

IsBrowserDisplay
    When overridden in a derived class, gets a value indicating whether the current encoding can be used by browser clients for displaying content. (Inherited from Encoding.)

IsBrowserSave
    When overridden in a derived class, gets a value indicating whether the current encoding can be used by browser clients for saving content. (Inherited from Encoding.)

IsMailNewsDisplay
    When overridden in a derived class, gets a value indicating whether the current encoding can be used by mail and news clients for displaying content. (Inherited from Encoding.)

IsMailNewsSave
    When overridden in a derived class, gets a value indicating whether the current encoding can be used by mail and news clients for saving content. (Inherited from Encoding.)

IsReadOnly
    When overridden in a derived class, gets a value indicating whether the current encoding is read-only. (Inherited from Encoding.)

IsSingleByte
    When overridden in a derived class, gets a value indicating whether the current encoding uses single-byte code points. (Inherited from Encoding.)

WebName
    When overridden in a derived class, gets the name registered with the Internet Assigned Numbers Authority (IANA) for the current encoding. (Inherited from Encoding.)

WindowsCodePage
    When overridden in a derived class, gets the Windows operating system code page that most closely corresponds to the current encoding. (Inherited from Encoding.)

Methods

Clone()
    When overridden in a derived class, creates a shallow copy of the current Encoding object. (Inherited from Encoding.)

Equals(Object)
    Determines whether the specified Object is equal to the current UTF32Encoding object. (Overrides Encoding.Equals(Object).)

GetByteCount(Char*, Int32)
    Calculates the number of bytes produced by encoding a set of characters starting at the specified character pointer. (Overrides Encoding.GetByteCount(Char*, Int32).)

GetByteCount(Char[])
    When overridden in a derived class, calculates the number of bytes produced by encoding all the characters in the specified character array. (Inherited from Encoding.)

GetByteCount(Char[], Int32, Int32)
    Calculates the number of bytes produced by encoding a set of characters from the specified character array. (Overrides Encoding.GetByteCount(Char[], Int32, Int32).)

GetByteCount(String)
    Calculates the number of bytes produced by encoding the characters in the specified String. (Overrides Encoding.GetByteCount(String).)

GetBytes(Char*, Int32, Byte*, Int32)
    Encodes a set of characters starting at the specified character pointer into a sequence of bytes that are stored starting at the specified byte pointer. (Overrides Encoding.GetBytes(Char*, Int32, Byte*, Int32).)

GetBytes(Char[])
    When overridden in a derived class, encodes all the characters in the specified character array into a sequence of bytes. (Inherited from Encoding.)

GetBytes(Char[], Int32, Int32)
    When overridden in a derived class, encodes a set of characters from the specified character array into a sequence of bytes. (Inherited from Encoding.)

GetBytes(Char[], Int32, Int32, Byte[], Int32)
    Encodes a set of characters from the specified character array into the specified byte array. (Overrides Encoding.GetBytes(Char[], Int32, Int32, Byte[], Int32).)

GetBytes(String)
    When overridden in a derived class, encodes all the characters in the specified string into a sequence of bytes. (Inherited from Encoding.)

GetBytes(String, Int32, Int32, Byte[], Int32)
    Encodes a set of characters from the specified String into the specified byte array. (Overrides Encoding.GetBytes(String, Int32, Int32, Byte[], Int32).)

GetCharCount(Byte*, Int32)
    Calculates the number of characters produced by decoding a sequence of bytes starting at the specified byte pointer. (Overrides Encoding.GetCharCount(Byte*, Int32).)

GetCharCount(Byte[])
    When overridden in a derived class, calculates the number of characters produced by decoding all the bytes in the specified byte array. (Inherited from Encoding.)

GetCharCount(Byte[], Int32, Int32)
    Calculates the number of characters produced by decoding a sequence of bytes from the specified byte array. (Overrides Encoding.GetCharCount(Byte[], Int32, Int32).)

GetChars(Byte*, Int32, Char*, Int32)
    Decodes a sequence of bytes starting at the specified byte pointer into a set of characters that are stored starting at the specified character pointer. (Overrides Encoding.GetChars(Byte*, Int32, Char*, Int32).)

GetChars(Byte[])
    When overridden in a derived class, decodes all the bytes in the specified byte array into a set of characters. (Inherited from Encoding.)

GetChars(Byte[], Int32, Int32)
    When overridden in a derived class, decodes a sequence of bytes from the specified byte array into a set of characters. (Inherited from Encoding.)

GetChars(Byte[], Int32, Int32, Char[], Int32)
    Decodes a sequence of bytes from the specified byte array into the specified character array. (Overrides Encoding.GetChars(Byte[], Int32, Int32, Char[], Int32).)

GetDecoder()
    Obtains a decoder that converts a UTF-32 encoded sequence of bytes into a sequence of Unicode characters. (Overrides Encoding.GetDecoder().)

GetEncoder()
    Obtains an encoder that converts a sequence of Unicode characters into a UTF-32 encoded sequence of bytes. (Overrides Encoding.GetEncoder().)

GetHashCode()
    Returns the hash code for the current instance. (Overrides Encoding.GetHashCode().)

GetMaxByteCount(Int32)
    Calculates the maximum number of bytes produced by encoding the specified number of characters. (Overrides Encoding.GetMaxByteCount(Int32).)

GetMaxCharCount(Int32)
    Calculates the maximum number of characters produced by decoding the specified number of bytes. (Overrides Encoding.GetMaxCharCount(Int32).)

GetPreamble()
    Returns a Unicode byte order mark encoded in UTF-32 format, if the UTF32Encoding object is configured to supply one. (Overrides Encoding.GetPreamble().)

GetString(Byte*, Int32)
    When overridden in a derived class, decodes a specified number of bytes starting at a specified address into a string. (Inherited from Encoding.)

GetString(Byte[])
    When overridden in a derived class, decodes all the bytes in the specified byte array into a string. (Inherited from Encoding.)

GetString(Byte[], Int32, Int32)
    Decodes a range of bytes from a byte array into a string. (Overrides Encoding.GetString(Byte[], Int32, Int32).)

GetType()
    Gets the Type of the current instance. (Inherited from Object.)

IsAlwaysNormalized()
    Gets a value indicating whether the current encoding is always normalized, using the default normalization form. (Inherited from Encoding.)

IsAlwaysNormalized(NormalizationForm)
    When overridden in a derived class, gets a value indicating whether the current encoding is always normalized, using the specified normalization form. (Inherited from Encoding.)

ToString()
    Returns a string that represents the current object. (Inherited from Object.)

Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.

The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point. The Unicode Standard uses the following UTFs:

  • UTF-8, which represents each code point as a sequence of one to four bytes.

  • UTF-16, which represents each code point as a sequence of one or two 16-bit integers.

  • UTF-32, which represents each code point as a 32-bit integer.
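
As a rough illustration of the three formats listed above, the following sketch prints how many bytes a single code point occupies in each UTF. The class name UtfSizes and the helper ByteLength are hypothetical names chosen for this sketch; the sample code points are arbitrary.

```csharp
using System;
using System.Text;

public static class UtfSizes
{
    // Hypothetical helper: number of bytes the text occupies in the given encoding
    // (GetByteCount does not include a byte order mark).
    public static int ByteLength(Encoding enc, string text)
    {
        return enc.GetByteCount(text);
    }

    public static void Main()
    {
        // U+0041 (A), U+03A0 (Pi), and U+1F600, which lies outside the
        // Basic Multilingual Plane and needs a surrogate pair in UTF-16.
        string[] samples = { "A", "\u03A0", "\U0001F600" };
        foreach (string s in samples)
        {
            Console.WriteLine("U+{0:X4}: UTF-8 = {1}, UTF-16 = {2}, UTF-32 = {3} bytes",
                              Char.ConvertToUtf32(s, 0),
                              ByteLength(Encoding.UTF8, s),
                              ByteLength(Encoding.Unicode, s),
                              ByteLength(Encoding.UTF32, s));
        }
    }
}
// Expected output:
//    U+0041: UTF-8 = 1, UTF-16 = 2, UTF-32 = 4 bytes
//    U+03A0: UTF-8 = 2, UTF-16 = 2, UTF-32 = 4 bytes
//    U+1F600: UTF-8 = 4, UTF-16 = 4, UTF-32 = 4 bytes
```

UTF-32 is the only one of the three in which every code point has the same width.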

For more information about the UTFs and other encodings supported by System.Text, see .

The UTF32Encoding class represents a UTF-32 encoding. The encoder can use the big endian byte order (most significant byte first) or the little endian byte order (least significant byte first). For example, the Latin Capital Letter A (code point U+0041) is serialized as follows (in hexadecimal):

  • Big endian byte order: 00 00 00 41

  • Little endian byte order: 41 00 00 00

It is generally more efficient to store Unicode characters using the native byte order. For example, it is better to use the little endian byte order on little endian platforms, such as Intel computers. UTF32Encoding corresponds to the Windows code pages 12000 (little endian byte order) and 12001 (big endian byte order). You can determine the "endianness" of a particular architecture by checking the BitConverter.IsLittleEndian field.
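
The byte orders and code pages described above can be checked with a short sketch such as the following. The class name ByteOrderDemo and the helper Hex are hypothetical; BOM output is disabled here only to keep the printed bytes short.

```csharp
using System;
using System.Text;

public static class ByteOrderDemo
{
    // Hypothetical helper: formats a byte array as space-separated hexadecimal pairs.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        var bigEndian = new UTF32Encoding(true, false);
        var littleEndian = new UTF32Encoding(false, false);

        Console.WriteLine("Big endian    (code page {0}): {1}",
                          bigEndian.CodePage, Hex(bigEndian.GetBytes("A")));
        Console.WriteLine("Little endian (code page {0}): {1}",
                          littleEndian.CodePage, Hex(littleEndian.GetBytes("A")));

        // Choose the platform's native byte order at run time.
        var native = new UTF32Encoding(!BitConverter.IsLittleEndian, false);
        Console.WriteLine("Native byte order: {0}", Hex(native.GetBytes("A")));
    }
}
// Expected output (first two lines):
//    Big endian    (code page 12001): 00 00 00 41
//    Little endian (code page 12000): 41 00 00 00
```

The third line depends on the machine that runs the sketch, so no output is shown for it.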

Optionally, the UTF32Encoding object provides a byte order mark (BOM), which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. When a byte stream begins with a BOM, the BOM helps the decoder determine the byte order and the transformation format (UTF) of the bytes that follow.

If the UTF32Encoding instance is configured to provide a BOM, you can retrieve it by calling the GetPreamble method; otherwise, the method returns an empty array. Note that, even if a UTF32Encoding object is configured for BOM support, you must include the BOM at the beginning of the encoded byte stream as appropriate; the encoding methods of the UTF32Encoding class do not do this automatically.
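
The following sketch illustrates the point above: GetBytes does not emit the BOM, so the caller has to prepend the preamble. The class name BomDemo and the helper EncodeWithBom are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the encoding's preamble followed by the encoded text.
    public static byte[] EncodeWithBom(Encoding enc, string text)
    {
        byte[] bom = enc.GetPreamble();
        byte[] body = enc.GetBytes(text);
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        var withBom = new UTF32Encoding(false, true);      // little endian, BOM enabled
        var withoutBom = new UTF32Encoding(false, false);  // little endian, BOM disabled

        Console.WriteLine("Preamble (BOM enabled):  {0}",
                          BitConverter.ToString(withBom.GetPreamble()));
        Console.WriteLine("Preamble (BOM disabled): {0} bytes",
                          withoutBom.GetPreamble().Length);
        Console.WriteLine("GetBytes alone:          {0}",
                          BitConverter.ToString(withBom.GetBytes("A")));
        Console.WriteLine("With explicit BOM:       {0}",
                          BitConverter.ToString(EncodeWithBom(withBom, "A")));
    }
}
// Expected output:
//    Preamble (BOM enabled):  FF-FE-00-00
//    Preamble (BOM disabled): 0 bytes
//    GetBytes alone:          41-00-00-00
//    With explicit BOM:       FF-FE-00-00-41-00-00-00
```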

To enable error detection and to make the class instance more secure, you should instantiate a UTF32Encoding object by calling the UTF32Encoding(Boolean, Boolean, Boolean) constructor and setting its throwOnInvalidCharacters argument to true. With error detection, a method that detects an invalid sequence of characters or bytes throws an ArgumentException. Without error detection, no exception is thrown, and the invalid sequence is handled by the replacement fallback, which substitutes REPLACEMENT CHARACTER (U+FFFD).
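
Error detection applies to encoding as well as decoding. The sketch below, which assumes a string containing an unpaired high surrogate as the invalid input, compares an instance created with the third constructor argument set to true against one created with it set to false. The class name ErrorDetectionDemo and the helper EncodingThrows are hypothetical.

```csharp
using System;
using System.Text;

public static class ErrorDetectionDemo
{
    // Hypothetical helper: returns true if encoding the text throws.
    public static bool EncodingThrows(Encoding enc, string text)
    {
        try
        {
            enc.GetBytes(text);
            return false;
        }
        catch (ArgumentException)
        {
            // EncoderFallbackException derives from ArgumentException.
            return true;
        }
    }

    public static void Main()
    {
        // U+D800 is a high surrogate that is not followed by a low surrogate.
        string invalid = "a\uD800b";

        var strict = new UTF32Encoding(false, true, true);
        var lenient = new UTF32Encoding(false, true, false);

        Console.WriteLine("Error detection enabled, throws:  {0}",
                          EncodingThrows(strict, invalid));
        Console.WriteLine("Error detection disabled, throws: {0}",
                          EncodingThrows(lenient, invalid));
    }
}
// Expected output:
//    Error detection enabled, throws:  True
//    Error detection disabled, throws: False
```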

You can instantiate a UTF32Encoding object in a number of ways, depending on whether you want it to provide a byte order mark (BOM), whether you want big-endian or little-endian encoding, and whether you want to enable error detection. The following table lists the UTF32Encoding constructors and the Encoding property that return a UTF32Encoding object.

Member | Endianness | BOM | Error detection
Encoding.UTF32 | Little-endian | Yes | No (Replacement fallback)
UTF32Encoding.UTF32Encoding() | Little-endian | Yes | No (Replacement fallback)
UTF32Encoding.UTF32Encoding(Boolean, Boolean) | Configurable | Configurable | No (Replacement fallback)
UTF32Encoding.UTF32Encoding(Boolean, Boolean, Boolean) | Configurable | Configurable | Configurable
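
One way to confirm the defaults listed above is to inspect each instance directly. This is a minimal sketch; the class name DefaultsDemo and the helper Describe are hypothetical, and the output format is an arbitrary choice.

```csharp
using System;
using System.Text;

public static class DefaultsDemo
{
    // Hypothetical helper: summarizes BOM support and the decoder fallback of an encoding.
    public static string Describe(Encoding enc)
    {
        bool hasBom = enc.GetPreamble().Length > 0;
        bool throwsOnError = enc.DecoderFallback is DecoderExceptionFallback;
        return String.Format("code page {0}, BOM: {1}, error detection: {2}",
                             enc.CodePage, hasBom, throwsOnError);
    }

    public static void Main()
    {
        Console.WriteLine("Encoding.UTF32:                    {0}", Describe(Encoding.UTF32));
        Console.WriteLine("UTF32Encoding():                   {0}", Describe(new UTF32Encoding()));
        Console.WriteLine("UTF32Encoding(true, false):        {0}", Describe(new UTF32Encoding(true, false)));
        Console.WriteLine("UTF32Encoding(false, false, true): {0}", Describe(new UTF32Encoding(false, false, true)));
    }
}
```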

The GetByteCount method determines how many bytes result from encoding a set of Unicode characters, and the GetBytes method performs the actual encoding.

Likewise, the GetCharCount method determines how many characters result from decoding a sequence of bytes, and the GetChars and GetString methods perform the actual decoding.

For an encoder or decoder that is able to save state information when encoding or decoding data that spans multiple blocks (such as a string of 1 million characters that is encoded in 100,000-character segments), use the GetEncoder and GetDecoder methods, respectively.
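
A typical use of the count methods is to size a buffer exactly before encoding or decoding into it. The following is a minimal sketch of that pattern; the class name BufferSizingDemo and the helper RoundTrip are hypothetical.

```csharp
using System;
using System.Text;

public static class BufferSizingDemo
{
    // Hypothetical helper: encodes and then decodes the text, sizing each
    // buffer with GetByteCount and GetCharCount respectively.
    public static string RoundTrip(Encoding enc, string text)
    {
        char[] source = text.ToCharArray();

        byte[] bytes = new byte[enc.GetByteCount(source, 0, source.Length)];
        enc.GetBytes(source, 0, source.Length, bytes, 0);

        char[] chars = new char[enc.GetCharCount(bytes, 0, bytes.Length)];
        enc.GetChars(bytes, 0, bytes.Length, chars, 0);

        return new string(chars);
    }

    public static void Main()
    {
        var enc = new UTF32Encoding(false, false, true);
        string text = "Pi (\u03A0)";

        // Six characters, four bytes each.
        Console.WriteLine("Byte count: {0}", enc.GetByteCount(text));
        Console.WriteLine("Round trip preserved the text: {0}", RoundTrip(enc, text) == text);
    }
}
// Expected output:
//    Byte count: 24
//    Round trip preserved the text: True
```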
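
To illustrate why saved state matters, the sketch below decodes a byte array in 3-byte chunks, so that every 4-byte UTF-32 code unit is split across calls. The chunk size, the class name ChunkedDecodeDemo, and the helper DecodeInChunks are assumptions made for this illustration.

```csharp
using System;
using System.Text;

public static class ChunkedDecodeDemo
{
    // Hypothetical helper: decodes the bytes chunk by chunk with a single
    // Decoder, which carries incomplete code units over to the next call.
    public static string DecodeInChunks(Encoding enc, byte[] bytes, int chunkSize)
    {
        Decoder decoder = enc.GetDecoder();
        var result = new StringBuilder();
        char[] buffer = new char[enc.GetMaxCharCount(chunkSize)];

        for (int offset = 0; offset < bytes.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, bytes.Length - offset);
            int charsDecoded = decoder.GetChars(bytes, offset, count, buffer, 0);
            result.Append(buffer, 0, charsDecoded);
        }
        return result.ToString();
    }

    public static void Main()
    {
        var enc = new UTF32Encoding(false, false, true);
        string text = "za\u0306\u03B2";
        byte[] bytes = enc.GetBytes(text);

        string decoded = DecodeInChunks(enc, bytes, 3);
        Console.WriteLine("Chunked decoding preserved the text: {0}", decoded == text);
    }
}
// Expected output:
//    Chunked decoding preserved the text: True
```

Calling enc.GetString on each 3-byte chunk separately would not work, because each call would see an incomplete code unit.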

The following example demonstrates the behavior of UTF32Encoding objects with and without error detection enabled. It creates a byte array whose last four bytes are not valid UTF-32: they hold a high surrogate value (0xD8FF) followed by 0x01FF, which is outside the range of low surrogates (0xDC00 through 0xDFFF), and read as a single 32-bit integer they exceed the largest Unicode code point (U+10FFFF). Without error detection, the UTF-32 decoder uses replacement fallback to replace the invalid bytes with REPLACEMENT CHARACTER (U+FFFD).

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      // Create a UTF32Encoding object with error detection enabled.
      var encExc = new UTF32Encoding(!BitConverter.IsLittleEndian, true, true);
      // Create a UTF32Encoding object with error detection disabled.
      var encRepl = new UTF32Encoding(!BitConverter.IsLittleEndian, true, false);

      // Create a byte array from a string, and add an invalid surrogate pair, as follows.
      //    Latin Small Letter Z (U+007A)
      //    Latin Small Letter A (U+0061)
      //    Combining Breve (U+0306)
      //    Latin Small Letter AE With Acute (U+01FD)
      //    Greek Small Letter Beta (U+03B2)
      //    a high-surrogate value (U+D8FF)
      //    an invalid low surrogate (U+01FF)
      String s = "za\u0306\u01FD\u03B2";

      // Encode the string using the platform's native byte order.
      int index = encExc.GetByteCount(s);
      Byte[] bytes = new Byte[index + 4];
      encExc.GetBytes(s, 0, s.Length, bytes, 0);
      bytes[index] = 0xFF;
      bytes[index + 1] = 0xD8;
      bytes[index + 2] = 0xFF;
      bytes[index + 3] = 0x01;

      // Decode the byte array with error detection.
      Console.WriteLine("Decoding with error detection:");
      PrintDecodedString(bytes, encExc);

      // Decode the byte array without error detection.
      Console.WriteLine("Decoding without error detection:");
      PrintDecodedString(bytes, encRepl);
   }

   // Decode the bytes and display the string.
   public static void PrintDecodedString(Byte[] bytes, Encoding enc)
   {
      try {
         Console.WriteLine("   Decoded string: {0}", enc.GetString(bytes, 0, bytes.Length));
      }
      catch (DecoderFallbackException e) {
         Console.WriteLine(e.ToString());
      }
      Console.WriteLine();
   }
}
// The example displays the following output:
//    Decoding with error detection:
//    System.Text.DecoderFallbackException: Unable to translate bytes [FF][D8][FF][01] at index
//    20 from specified code page to Unicode.
//       at System.Text.DecoderExceptionFallbackBuffer.Throw(Byte[] bytesUnknown, Int32 index)
//       at System.Text.DecoderExceptionFallbackBuffer.Fallback(Byte[] bytesUnknown, Int32 index
//    )
//       at System.Text.DecoderFallbackBuffer.InternalFallback(Byte[] bytes, Byte* pBytes)
//       at System.Text.UTF32Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDeco
//    der)
//       at System.Text.UTF32Encoding.GetString(Byte[] bytes, Int32 index, Int32 count)
//       at Example.PrintDecodedString(Byte[] bytes, Encoding enc)
//
//    Decoding without error detection:
//       Decoded string: zăǽβ�

The following example encodes a string of Unicode characters into a byte array by using a UTF32Encoding object. The byte array is then decoded into a string to demonstrate that there is no loss of data.

using System;
using System.Text;

public class Example
{
    public static void Main()
    {
        // The encoding.
        var enc = new UTF32Encoding();

        // Create a string.
        String s = "This string contains two characters " +
                   "with codes outside the ASCII code range: " +
                   "Pi (\u03A0) and Sigma (\u03A3).";
        Console.WriteLine("Original string:");
        Console.WriteLine("   {0}", s);

        // Encode the string.
        Byte[] encodedBytes = enc.GetBytes(s);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        for (int ctr = 0; ctr < encodedBytes.Length; ctr++) {
            Console.Write("[{0:X2}]{1}", encodedBytes[ctr],
                                         (ctr + 1) % 4 == 0 ? " " : "" );
            if ((ctr + 1) % 16 == 0) Console.WriteLine();
        }
        Console.WriteLine();

        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = enc.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded string:");
        Console.WriteLine("   {0}", decodedString);
    }
}
// The example displays the following output:
//    Original string:
//       This string contains two characters with codes outside the ASCII code range:
//    Pi (π) and Sigma (Σ).
//
//    Encoded bytes:
//    [54][00][00][00] [68][00][00][00] [69][00][00][00] [73][00][00][00]
//    [20][00][00][00] [73][00][00][00] [74][00][00][00] [72][00][00][00]
//    [69][00][00][00] [6E][00][00][00] [67][00][00][00] [20][00][00][00]
//    [63][00][00][00] [6F][00][00][00] [6E][00][00][00] [74][00][00][00]
//    [61][00][00][00] [69][00][00][00] [6E][00][00][00] [73][00][00][00]
//    [20][00][00][00] [74][00][00][00] [77][00][00][00] [6F][00][00][00]
//    [20][00][00][00] [63][00][00][00] [68][00][00][00] [61][00][00][00]
//    [72][00][00][00] [61][00][00][00] [63][00][00][00] [74][00][00][00]
//    [65][00][00][00] [72][00][00][00] [73][00][00][00] [20][00][00][00]
//    [77][00][00][00] [69][00][00][00] [74][00][00][00] [68][00][00][00]
//    [20][00][00][00] [63][00][00][00] [6F][00][00][00] [64][00][00][00]
//    [65][00][00][00] [73][00][00][00] [20][00][00][00] [6F][00][00][00]
//    [75][00][00][00] [74][00][00][00] [73][00][00][00] [69][00][00][00]
//    [64][00][00][00] [65][00][00][00] [20][00][00][00] [74][00][00][00]
//    [68][00][00][00] [65][00][00][00] [20][00][00][00] [41][00][00][00]
//    [53][00][00][00] [43][00][00][00] [49][00][00][00] [49][00][00][00]
//    [20][00][00][00] [63][00][00][00] [6F][00][00][00] [64][00][00][00]
//    [65][00][00][00] [20][00][00][00] [72][00][00][00] [61][00][00][00]
//    [6E][00][00][00] [67][00][00][00] [65][00][00][00] [3A][00][00][00]
//    [20][00][00][00] [50][00][00][00] [69][00][00][00] [20][00][00][00]
//    [28][00][00][00] [A0][03][00][00] [29][00][00][00] [20][00][00][00]
//    [61][00][00][00] [6E][00][00][00] [64][00][00][00] [20][00][00][00]
//    [53][00][00][00] [69][00][00][00] [67][00][00][00] [6D][00][00][00]
//    [61][00][00][00] [20][00][00][00] [28][00][00][00] [A3][03][00][00]
//    [29][00][00][00] [2E][00][00][00]
//
//    Decoded string:
//       This string contains two characters with codes outside the ASCII code range:
//    Pi (π) and Sigma (Σ).

The following example uses the same string as the previous one, except that it writes the encoded bytes to a file and prefixes the byte stream with a byte order mark (BOM). It then reads the file in two different ways: as a text file by using a StreamReader object; and as a binary file. As you would expect, in neither case is the BOM included in the newly read string.

using System;
using System.IO;
using System.Text;

public class Example
{
    public static void Main()
    {
        // Create a UTF-32 encoding that supports a BOM.
        var enc = new UTF32Encoding();

        // A Unicode string with two characters outside an 8-bit code range.
        String s = "This Unicode string has 2 characters " +
                   "outside the ASCII range: \n" +
                   "Pi (\u03A0), and Sigma (\u03A3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(s);
        Console.WriteLine();

        // Encode the string.
        Byte[] encodedBytes = enc.GetBytes(s);
        Console.WriteLine("The encoded string has {0} bytes.\n",
                          encodedBytes.Length);

        // Write the bytes to a file with a BOM.
        var fs = new FileStream(@".\UTF32Encoding.txt", FileMode.Create);
        Byte[] bom = enc.GetPreamble();
        fs.Write(bom, 0, bom.Length);
        fs.Write(encodedBytes, 0, encodedBytes.Length);
        Console.WriteLine("Wrote {0} bytes to the file.\n", fs.Length);
        fs.Close();

        // Open the file using StreamReader.
        var sr = new StreamReader(@".\UTF32Encoding.txt");
        String newString = sr.ReadToEnd();
        sr.Close();
        Console.WriteLine("String read using StreamReader:");
        Console.WriteLine(newString);
        Console.WriteLine();

        // Open the file as a binary file and decode the bytes back to a string.
        // GetString does not strip the BOM, so skip the preamble explicitly.
        Byte[] bytes = File.ReadAllBytes(@".\UTF32Encoding.txt");

        String decodedString = enc.GetString(bytes, bom.Length, bytes.Length - bom.Length);
        Console.WriteLine("Decoded bytes from binary file:");
        Console.WriteLine(decodedString);
    }
}
// The example displays the following output:
//    Original string:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (π), and Sigma (Σ).
//
//    The encoded string has 340 bytes.
//
//    Wrote 344 bytes to the file.
//
//    String read using StreamReader:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (π), and Sigma (Σ).
//
//    Decoded bytes from binary file:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (π), and Sigma (Σ).

Universal Windows Platform: Available since 10
.NET Framework: Available since 2.0

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

© 2016 Microsoft