
UnicodeEncoding Class

Represents a UTF-16 encoding of Unicode characters.

For a list of all members of this type, see UnicodeEncoding Members.

System.Object
   System.Text.Encoding
      System.Text.UnicodeEncoding

[Visual Basic]
<Serializable>
Public Class UnicodeEncoding
   Inherits Encoding
[C#]
[Serializable]
public class UnicodeEncoding : Encoding
[C++]
[Serializable]
public __gc class UnicodeEncoding : public Encoding
[JScript]
public
   Serializable
class UnicodeEncoding extends Encoding

Thread Safety

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

Remarks

This class encodes Unicode characters in Unicode Transformation Format, 16-bit encoding form (UTF-16); that is, each character from U+0000 through U+FFFF is encoded in a single 16-bit field consisting of two consecutive bytes. An extension mechanism, using pairs of fields called surrogates, enables another 2^20 (1,048,576) characters, U+10000 through U+10FFFF, to be encoded.
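
As a worked illustration of the surrogate mechanism, the sketch below splits a character outside the 16-bit range into its two surrogate fields by hand and then checks the result against UnicodeEncoding. The class SurrogateDemo and the method ToSurrogatePair are hypothetical names, the example character U+1D11E (MUSICAL SYMBOL G CLEF) is an arbitrary choice, and the code assumes a current .NET runtime.

```csharp
using System;
using System.Text;

public class SurrogateDemo
{
    // Hypothetical helper: splits a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
    public static char[] ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");
        int offset = codePoint - 0x10000;              // 20-bit value, so 2^20 possibilities
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits go in the high surrogate
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits go in the low surrogate
        return new char[] { high, low };
    }

    public static void Main()
    {
        char[] pair = ToSurrogatePair(0x1D11E);
        Console.WriteLine("High: U+{0:X4}, Low: U+{1:X4}", (int)pair[0], (int)pair[1]);

        // Little-endian UTF-16 without a byte order mark: two 16-bit fields, so 4 bytes.
        UnicodeEncoding uni = new UnicodeEncoding(false, false);
        byte[] bytes = uni.GetBytes(new string(pair));
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

Running it should print `High: U+D834, Low: U+DD1E` followed by `34-D8-1E-DD`.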

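The two bytes of an encoded character are stored in either little-endian or big-endian byte order. For UnicodeEncoding this is a property of the encoding instance rather than of the computer architecture: it is chosen when the object is constructed, and the parameterless constructor uses little-endian order. In big-endian order the most significant byte is written and read first, while in little-endian order the least significant byte is written and read first.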
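
The difference between the two byte orders comes down to which half of the 16-bit field is emitted first. The minimal sketch below serializes one code unit both ways with shifts and masks, then compares the result with the predefined Encoding.BigEndianUnicode and Encoding.Unicode objects. ByteOrderDemo, ToBigEndian and ToLittleEndian are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public class ByteOrderDemo
{
    // Hypothetical helper: most significant byte first (big-endian).
    public static byte[] ToBigEndian(char c)
    {
        return new byte[] { (byte)(c >> 8), (byte)(c & 0xFF) };
    }

    // Hypothetical helper: least significant byte first (little-endian).
    public static byte[] ToLittleEndian(char c)
    {
        return new byte[] { (byte)(c & 0xFF), (byte)(c >> 8) };
    }

    public static void Main()
    {
        char pi = '\u03A0';
        Console.WriteLine("Manual big-endian:    " + BitConverter.ToString(ToBigEndian(pi)));
        Console.WriteLine("Manual little-endian: " + BitConverter.ToString(ToLittleEndian(pi)));

        // The predefined big-endian and little-endian UTF-16 encodings should agree.
        Console.WriteLine("BigEndianUnicode:     " + BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes("\u03A0")));
        Console.WriteLine("Unicode:              " + BitConverter.ToString(Encoding.Unicode.GetBytes("\u03A0")));
    }
}
```

For Pi (U+03A0) the big-endian lines show `03-A0` and the little-endian lines show `A0-03`.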

A UTF-16 encoding can be prefixed by a byte order mark (U+FEFF) to indicate the byte order used. The encoding is assumed to be big-endian if the first two bytes are 0xFE 0xFF, and little-endian if they are 0xFF 0xFE. (A byte order mark can also precede files encoded in UTF-8, but in that case the byte order mark only identifies the encoding as UTF-8; it does not indicate byte order. For more information, see the UTF8Encoding class.)
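
A reader that has to pick the byte order from the data itself can inspect the first two bytes as described above. The following is a minimal sketch under that assumption; BomSniffer and DetectUtf16 are hypothetical names, and the helper deliberately ignores other encodings (for example, a UTF-32 little-endian mark, FF FE 00 00, begins with the same two bytes and would be misread here).

```csharp
using System;
using System.Text;

public class BomSniffer
{
    // Hypothetical helper: chooses a UTF-16 byte order from a leading byte order mark.
    // Returns null when the data does not start with a UTF-16 byte order mark.
    public static UnicodeEncoding DetectUtf16(byte[] data)
    {
        if (data.Length >= 2)
        {
            if (data[0] == 0xFE && data[1] == 0xFF)
                return new UnicodeEncoding(true, true);   // big-endian
            if (data[0] == 0xFF && data[1] == 0xFE)
                return new UnicodeEncoding(false, true);  // little-endian
        }
        return null;
    }

    public static void Main()
    {
        // U+FEFF followed by "Hi", written in big-endian order.
        byte[] data = { 0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69 };
        UnicodeEncoding enc = DetectUtf16(data);
        if (enc == null)
        {
            Console.WriteLine("No UTF-16 byte order mark found.");
            return;
        }
        // GetString does not strip the mark, so skip its two bytes explicitly.
        Console.WriteLine(enc.GetString(data, 2, data.Length - 2));
    }
}
```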

The UnicodeEncoding constructor has an overload, UnicodeEncoding(Boolean, Boolean), that specifies whether the encoding is big-endian or little-endian and whether a byte order mark is used.
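
To see what the two constructor flags control, the sketch below builds all four combinations and reports the preamble and the encoded bytes for the letter "A". It assumes a current .NET runtime; ConstructorDemo and Describe are hypothetical names. Note that GetBytes never writes the byte order mark itself; the mark is only available through GetPreamble.

```csharp
using System;
using System.Text;

public class ConstructorDemo
{
    // Hypothetical helper: summarizes the preamble and the encoding of "A" for one flag combination.
    public static string Describe(bool bigEndian, bool byteOrderMark)
    {
        UnicodeEncoding enc = new UnicodeEncoding(bigEndian, byteOrderMark);
        byte[] preamble = enc.GetPreamble();   // empty when byteOrderMark is false
        byte[] data = enc.GetBytes("A");       // byte order follows bigEndian; no mark is added
        return "preamble [" + BitConverter.ToString(preamble) + "], data [" + BitConverter.ToString(data) + "]";
    }

    public static void Main()
    {
        bool[] flags = { false, true };
        foreach (bool bigEndian in flags)
        {
            foreach (bool byteOrderMark in flags)
            {
                Console.WriteLine("bigEndian={0}, byteOrderMark={1}: {2}",
                    bigEndian, byteOrderMark, Describe(bigEndian, byteOrderMark));
            }
        }
    }
}
```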

This class contains the GetCharCount method that reports the number of Unicode characters that result from decoding an array of bytes, and the GetChars method that actually decodes an array of bytes. The GetByteCount method reports the number of bytes that result from encoding strings or arrays of Unicode characters, and the GetBytes method actually encodes characters into an array of bytes.
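
The count methods are typically paired with the conversion methods: call GetByteCount or GetCharCount to size a buffer exactly, then call GetBytes or GetChars to fill it. The minimal sketch below shows that pattern in both directions; CountAndConvertDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Text;

public class CountAndConvertDemo
{
    // Hypothetical helper: encodes into an exactly sized byte buffer, then decodes the same way.
    public static char[] RoundTrip(UnicodeEncoding uni, char[] chars, out int byteCount)
    {
        byteCount = uni.GetByteCount(chars, 0, chars.Length);
        byte[] bytes = new byte[byteCount];
        uni.GetBytes(chars, 0, chars.Length, bytes, 0);

        int charCount = uni.GetCharCount(bytes, 0, bytes.Length);
        char[] decoded = new char[charCount];
        uni.GetChars(bytes, 0, bytes.Length, decoded, 0);
        return decoded;
    }

    public static void Main()
    {
        char[] chars = { '\u03A0', '\u03A3' };   // Pi and Sigma
        int byteCount;
        char[] decoded = RoundTrip(new UnicodeEncoding(), chars, out byteCount);
        Console.WriteLine("{0} chars -> {1} bytes -> {2} chars", chars.Length, byteCount, decoded.Length);
    }
}
```

Two characters from the 16-bit range produce four bytes, and decoding those bytes yields the same two characters.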

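The GetDecoder method obtains a Decoder object that converts bytes to characters and preserves state between calls. The GetPreamble method returns the encoded Unicode byte order mark, or an empty array if the instance was created without one.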
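
The state kept by a Decoder matters when bytes arrive in pieces, for example from a stream read in chunks: a chunk can end in the middle of a 16-bit field, and decoding each chunk independently would have no way to carry the trailing byte over. The sketch below, assuming a current .NET runtime, feeds two chunks through one Decoder; DecoderDemo and DecodeChunks are hypothetical names, and the split point is chosen only to force a broken field.

```csharp
using System;
using System.Text;

public class DecoderDemo
{
    // Hypothetical helper: decodes a sequence of byte chunks with a single Decoder,
    // so a 16-bit field split across two chunks is completed on the next call.
    public static string DecodeChunks(Encoding encoding, byte[][] chunks)
    {
        Decoder decoder = encoding.GetDecoder();
        StringBuilder result = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            int count = decoder.GetCharCount(chunk, 0, chunk.Length);
            char[] buffer = new char[count];
            int produced = decoder.GetChars(chunk, 0, chunk.Length, buffer, 0);
            result.Append(buffer, 0, produced);
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Pi and Sigma in little-endian UTF-16 are A0 03 A3 03; split after the third byte.
        byte[][] chunks = { new byte[] { 0xA0, 0x03, 0xA3 }, new byte[] { 0x03 } };
        string text = DecodeChunks(new UnicodeEncoding(), chunks);
        Console.WriteLine(text.Length);   // both characters survive the split
    }
}
```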

For more information about Unicode, see the Unicode Standard at www.unicode.org.

This class implements the Encoding abstract base class.
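
Because UnicodeEncoding derives from Encoding, code written against the base class works with it unchanged. The sketch below passes a UnicodeEncoding through an Encoding-typed parameter and reads the CodePage property, which identifies little-endian UTF-16 as code page 1200 and big-endian UTF-16 as 1201. BaseClassDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Text;

public class BaseClassDemo
{
    // Hypothetical helper: works with any Encoding; the UTF-16 behavior comes from the argument.
    public static string RoundTrip(Encoding encoding, string text)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Encoding utf16 = new UnicodeEncoding();
        Console.WriteLine(utf16.CodePage);                            // 1200, little-endian UTF-16
        Console.WriteLine(new UnicodeEncoding(true, true).CodePage);  // 1201, big-endian UTF-16
        Console.WriteLine(RoundTrip(utf16, "Pi (\u03A0)") == "Pi (\u03A0)");
    }
}
```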

Example

[Visual Basic, C#, C++] The following example demonstrates how to encode the string of Unicode characters unicodeString into the byte array encodedBytes using a UnicodeEncoding. The byte array is decoded back into a string to demonstrate that there is no loss of data.

[Visual Basic] 
Imports System
Imports System.Text
Imports Microsoft.VisualBasic.Strings

Class UnicodeEncodingExample
    
    Public Shared Sub Main()
        ' The encoding.
        Dim uni As New UnicodeEncoding()
        
        ' Create a string that contains Unicode characters.
        Dim unicodeString As String = _
            "This Unicode string contains two characters " & _
            "with codes outside the traditional ASCII code range, " & _
            "Pi (" & ChrW(928) & ") and Sigma (" & ChrW(931) & ")."
        Console.WriteLine("Original string:")
        Console.WriteLine(unicodeString)
        
        ' Encode the string.
        Dim encodedBytes As Byte() = uni.GetBytes(unicodeString)
        Console.WriteLine()
        Console.WriteLine("Encoded bytes:")
        Dim b As Byte
        For Each b In  encodedBytes
            Console.Write("[{0}]", b)
        Next b
        Console.WriteLine()
        
        ' Decode bytes back to string.
        ' Notice Pi and Sigma characters are still present.
        Dim decodedString As String = uni.GetString(encodedBytes)
        Console.WriteLine()
        Console.WriteLine("Decoded string:")
        Console.WriteLine(decodedString)
    End Sub
End Class

[C#] 
using System;
using System.Text;

class UnicodeEncodingExample {
    public static void Main() {
        // The encoding.
        UnicodeEncoding unicode = new UnicodeEncoding();
        
        // Create a string that contains Unicode characters.
        String unicodeString =
            "This Unicode string contains two characters " +
            "with codes outside the traditional ASCII code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = unicode.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes) {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();
        
        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = unicode.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded string:");
        Console.WriteLine(decodedString);
    }
}

[C++] 
#using <mscorlib.dll>
using namespace System;
using namespace System::Text;
using namespace System::Collections;

int main()
{
   // The encoding.
   UnicodeEncoding* unicode = new UnicodeEncoding();

   // Create a string that contains Unicode characters.
   String * unicodeString =
      S"This Unicode string contains two characters with codes outside the traditional ASCII code range, Pi (\u03a0) and Sigma (\u03a3).";
   Console::WriteLine(S"Original string:");
   Console::WriteLine(unicodeString);

   // Encode the string.
   Byte encodedBytes[] = unicode -> GetBytes(unicodeString);
   Console::WriteLine();
   Console::WriteLine(S"Encoded bytes:");
   // Print each encoded byte; __box wraps the Byte so it can be passed as Object*.
   for (int i = 0; i < encodedBytes->Length; i++)
   {
      Console::Write(S"[{0}]", __box(encodedBytes[i]));
   }
   Console::WriteLine();

   // Decode bytes back to a string.
   // Notice Pi and Sigma characters are still present.
   String * decodedString = unicode -> GetString(encodedBytes);
   Console::WriteLine();
   Console::WriteLine(S"Decoded string:");
   Console::WriteLine(decodedString);
}

[JScript] No example is available for JScript.

Requirements

Namespace: System.Text

Platforms: Windows 98, Windows NT 4.0, Windows Millennium Edition, Windows 2000, Windows XP Home Edition, Windows XP Professional, Windows Server 2003 family, .NET Compact Framework

Assembly: Mscorlib (in Mscorlib.dll)

See Also

UnicodeEncoding Members | System.Text Namespace

© 2014 Microsoft. All rights reserved.