UTF8Encoding Class

Represents a UTF-8 encoding of Unicode characters.

For a list of all members of this type, see UTF8Encoding Members.

System.Object
   System.Text.Encoding
      System.Text.UTF8Encoding

[Visual Basic]
<Serializable>
Public Class UTF8Encoding
   Inherits Encoding
[C#]
[Serializable]
public class UTF8Encoding : Encoding
[C++]
[Serializable]
public __gc class UTF8Encoding : public Encoding
[JScript]
public
   Serializable
class UTF8Encoding extends Encoding

Thread Safety

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

Remarks

This class encodes Unicode characters using UCS Transformation Format, 8-bit form (UTF-8). This encoding supports all Unicode character values and surrogates. For more information regarding surrogate pairs, see UnicodeCategory.

This class contains the GetCharCount method that reports the number of Unicode characters that result from decoding an array of bytes, and the GetChars method that actually decodes an array of bytes. The GetByteCount method reports the number of bytes that result from encoding strings or arrays of Unicode characters, and the GetBytes method actually encodes characters into an array of bytes.
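The count-then-convert pattern described above can be sketched as follows. This is a minimal illustration rather than a prescribed usage; the class name CountThenConvertSketch and the helper methods Encode and Decode are hypothetical names introduced only for this sketch.

```csharp
using System;
using System.Text;

class CountThenConvertSketch
{
    // Hypothetical helper: sizes the byte array with GetByteCount,
    // then fills it with GetBytes.
    public static byte[] Encode(UTF8Encoding utf8, string text)
    {
        char[] chars = text.ToCharArray();
        byte[] bytes = new byte[utf8.GetByteCount(chars, 0, chars.Length)];
        utf8.GetBytes(chars, 0, chars.Length, bytes, 0);
        return bytes;
    }

    // Hypothetical helper: sizes the character array with GetCharCount,
    // then fills it with GetChars.
    public static string Decode(UTF8Encoding utf8, byte[] bytes)
    {
        char[] chars = new char[utf8.GetCharCount(bytes, 0, bytes.Length)];
        utf8.GetChars(bytes, 0, bytes.Length, chars, 0);
        return new string(chars);
    }

    public static void Main()
    {
        UTF8Encoding utf8 = new UTF8Encoding();
        string text = "Pi (\u03a0)";

        // Five ASCII characters take one byte each; Pi takes two.
        byte[] bytes = Encode(utf8, text);
        Console.WriteLine("{0} characters -> {1} bytes", text.Length, bytes.Length);

        // Decoding the bytes yields the original string.
        Console.WriteLine(Decode(utf8, bytes) == text);
    }
}
```

Calling the counting method first lets the caller allocate a buffer of exactly the required size instead of guessing a worst-case length.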

The GetDecoder method obtains an object to convert (decode) UTF-8 encoded bytes into Unicode characters, while the GetEncoder method obtains an object to convert (encode) Unicode characters into UTF-8 encoded bytes. The GetPreamble method obtains a Unicode byte order mark, which, when prefixed to a series of bytes, indicates how those bytes are encoded.
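One reason to obtain a Decoder object rather than call GetChars directly is that a Decoder keeps track of incomplete byte sequences between calls, which matters when data arrives in pieces. The following sketch assumes the input arrives as two chunks that split the two-byte sequence for Pi (0xCE 0xA0). The class name StatefulDecoderSketch and the method DecodeChunks are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

class StatefulDecoderSketch
{
    // Hypothetical helper: decodes all chunks with a single Decoder so that
    // a character split across two chunks is still reassembled.
    public static string DecodeChunks(UTF8Encoding utf8, byte[][] chunks)
    {
        Decoder decoder = utf8.GetDecoder();
        StringBuilder result = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            // GetCharCount takes the decoder's pending bytes into account.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            result.Append(chars, 0, written);
        }
        return result.ToString();
    }

    public static void Main()
    {
        UTF8Encoding utf8 = new UTF8Encoding();

        // "A", Pi, "B" encoded as 0x41 0xCE 0xA0 0x42, split in the middle of Pi.
        byte[][] chunks = new byte[][]
        {
            new byte[] { 0x41, 0xCE },
            new byte[] { 0xA0, 0x42 }
        };

        string decoded = DecodeChunks(utf8, chunks);
        Console.WriteLine(decoded == "A\u03a0B");
    }
}
```

Decoding each chunk independently with GetString would treat the split bytes as two incomplete sequences, so a single Decoder is the safer choice for streamed input.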

UTF-8 encodes Unicode characters with a variable number of bytes per character. This encoding is optimized for the 128 ASCII characters (U+0000 through U+007F), each of which is encoded in a single byte, so English text is encoded compactly while the full Unicode range remains available. The UTF-8 identifier is the Unicode byte order mark, hexadecimal 0xFEFF, which is represented in UTF-8 as hexadecimal 0xEF 0xBB 0xBF. The byte order mark is used to distinguish UTF-8 text from other encodings.
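The variable width and the three-byte identifier can be observed directly. The sketch below is illustrative only; the class name VariableWidthSketch and the helpers ByteCount and PreambleHex are hypothetical. It assumes the UTF8Encoding constructor that takes a single Boolean argument is used to request the UTF-8 identifier.

```csharp
using System;
using System.Text;

class VariableWidthSketch
{
    // Hypothetical helper: number of UTF-8 bytes needed to encode the text.
    public static int ByteCount(string text)
    {
        return new UTF8Encoding().GetByteCount(text);
    }

    // Hypothetical helper: the UTF-8 preamble formatted as hexadecimal text.
    public static string PreambleHex()
    {
        // Passing true asks the encoding to supply the UTF-8 identifier.
        UTF8Encoding utf8WithIdentifier = new UTF8Encoding(true);
        return BitConverter.ToString(utf8WithIdentifier.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine("U+0041:  {0} byte(s)", ByteCount("A"));            // 1
        Console.WriteLine("U+00E9:  {0} byte(s)", ByteCount("\u00e9"));       // 2
        Console.WriteLine("U+03A0:  {0} byte(s)", ByteCount("\u03a0"));       // 2
        Console.WriteLine("U+20AC:  {0} byte(s)", ByteCount("\u20ac"));       // 3
        // A surrogate pair represents one character outside the 16-bit range.
        Console.WriteLine("U+1F600: {0} byte(s)", ByteCount("\ud83d\ude00")); // 4

        Console.WriteLine("Preamble: {0}", PreambleHex());                    // EF-BB-BF
    }
}
```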

This class offers an error detection feature that can be turned on when you create an instance of the class using the UTF8Encoding constructor that takes a throwOnInvalidBytes parameter. To turn on error detection, set the value of throwOnInvalidBytes to true. For security reasons, it is recommended that you use this constructor.

Certain methods in this class, such as GetChars, check for invalid sequences of surrogate pairs. If error detection is turned on and an invalid sequence is found, ArgumentException is thrown. If error detection is not turned on and an invalid sequence is found, no exception is thrown and execution continues in a manner defined by that method.

The error detection feature also works during decoding operations. If error detection is on and an invalid byte sequence is found, ArgumentException is thrown. Examples of invalid byte sequences include invalid leading or trailing UTF-8 bytes, UTF-8 byte sequences consisting of more than four bytes, and non-shortest forms as defined in Unicode 3.0.1. When error detection is off, invalid bytes are discarded.
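The error detection behavior described above can be sketched as follows. The sketch assumes the two-argument constructor, with throwOnInvalidBytes as the second argument, and uses the non-shortest-form sequence 0xC0 0x80 and an unpaired high surrogate as sample invalid input. The class name ErrorDetectionSketch and its helper methods are hypothetical. Catching ArgumentException follows the text above; later versions of the .NET Framework may throw a more specific exception type derived from it.

```csharp
using System;
using System.Text;

class ErrorDetectionSketch
{
    // Hypothetical helper: true if decoding the bytes with error detection
    // turned on throws ArgumentException.
    public static bool DecodeThrows(byte[] bytes)
    {
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Hypothetical helper: true if encoding the text with error detection
    // turned on throws ArgumentException.
    public static bool EncodeThrows(string text)
    {
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetBytes(text);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // 0xC0 0x80 is a non-shortest form and therefore invalid UTF-8.
        byte[] invalidBytes = new byte[] { 0x41, 0xC0, 0x80, 0x42 };
        Console.WriteLine("Invalid bytes rejected: {0}", DecodeThrows(invalidBytes));

        // A high surrogate with no low surrogate is an invalid sequence.
        string unpaired = new string(new char[] { 'A', (char)0xD800 });
        Console.WriteLine("Unpaired surrogate rejected: {0}", EncodeThrows(unpaired));

        // Valid input is accepted by the same strict instance.
        byte[] validBytes = new byte[] { 0x41, 0xCE, 0xA0, 0x42 };
        Console.WriteLine("Valid bytes rejected: {0}", DecodeThrows(validBytes));
    }
}
```

With error detection off, the same invalid input does not raise an exception, so malformed data can pass through unnoticed; this is the security concern behind the recommendation to turn detection on.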

This class inherits from the Encoding class.

Example

[Visual Basic, C#, C++] The following example demonstrates how to use a UTF8Encoding object to encode a string of Unicode characters and store the resulting bytes in the byte array encodedBytes. Notice that when encodedBytes is decoded back to a string, there is no loss of data.

[Visual Basic] 
Imports System
Imports System.Text
Imports Microsoft.VisualBasic.Strings

Class UTF8EncodingExample
    
    Public Shared Sub Main()
        ' Create a UTF-8 encoding.
        Dim utf8 As New UTF8Encoding()
        
        ' A Unicode string with two characters outside an 8-bit code range.
        Dim unicodeString As String = _
            "This unicode string contains two characters " & _
            "with codes outside an 8-bit code range, " & _
            "Pi (" & ChrW(928) & ") and Sigma (" & ChrW(931) & ")."
        Console.WriteLine("Original string:")
        Console.WriteLine(unicodeString)
        
        ' Encode the string.
        Dim encodedBytes As Byte() = utf8.GetBytes(unicodeString)
        Console.WriteLine()
        Console.WriteLine("Encoded bytes:")
        Dim b As Byte
        For Each b In encodedBytes
            Console.Write("[{0}]", b)
        Next b
        Console.WriteLine()
        
        ' Decode bytes back to string.
        ' Notice Pi and Sigma characters are still present.
        Dim decodedString As String = utf8.GetString(encodedBytes)
        Console.WriteLine()
        Console.WriteLine("Decoded bytes:")
        Console.WriteLine(decodedString)
    End Sub
End Class

[C#] 
using System;
using System.Text;

class UTF8EncodingExample {
    public static void Main() {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();
        
        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This unicode string contains two characters " +
            "with codes outside an 8-bit code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes) {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();
        
        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}

[C++] 
#using <mscorlib.dll>
using namespace System;
using namespace System::Text;
using namespace System::Collections;

int main()
{
   // Create a UTF-8 encoding.
   UTF8Encoding* utf8 = new UTF8Encoding();

   // A Unicode string with two characters outside an 8-bit code range.
   String * unicodeString =
      S"This unicode string contains two characters with codes outside an 8-bit code range, Pi (\u03a0) and Sigma (\u03a3).";
   Console::WriteLine(S"Original string:");
   Console::WriteLine(unicodeString);

   // Encode the string.
   Byte encodedBytes[] = utf8 -> GetBytes(unicodeString);
   Console::WriteLine();
   Console::WriteLine(S"Encoded bytes:");
   IEnumerator* myEnum = encodedBytes->GetEnumerator();
   while (myEnum->MoveNext())
   {
      Byte b = *__try_cast<Byte __gc*>(myEnum->Current);
      Console::Write(S"[{0}]", __box(b));
   }
   Console::WriteLine();

   // Decode bytes back to string.
   // Notice Pi and Sigma characters are still present.
   String * decodedString = utf8 -> GetString(encodedBytes);
   Console::WriteLine();
   Console::WriteLine(S"Decoded bytes:");
   Console::WriteLine(decodedString);
}

[JScript] No example is available for JScript.

Requirements

Namespace: System.Text

Platforms: Windows 98, Windows NT 4.0, Windows Millennium Edition, Windows 2000, Windows XP Home Edition, Windows XP Professional, Windows Server 2003 family, .NET Compact Framework

Assembly: Mscorlib (in Mscorlib.dll)

See Also

UTF8Encoding Members | System.Text Namespace | Decoder | Encoder

© 2014 Microsoft