Export (0) Print
Expand All

UTF8Encoding Class

Represents a UTF-8 encoding of Unicode characters.

Namespace: System.Text
Assembly: mscorlib (in mscorlib.dll)

[SerializableAttribute] 
[ComVisibleAttribute(true)] 
public class UTF8Encoding : Encoding
/** @attribute SerializableAttribute() */ 
/** @attribute ComVisibleAttribute(true) */ 
public class UTF8Encoding extends Encoding
SerializableAttribute 
ComVisibleAttribute(true) 
public class UTF8Encoding extends Encoding
Not applicable.

Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.

UTF-8 encoding represents each code point as a sequence of one to four bytes. For more information about the UTFs and other encodings supported by System.Text, see Understanding Encodings and Using Unicode Encoding.

The GetByteCount method determines how many bytes result in encoding a set of Unicode characters, and the GetBytes method performs the actual encoding.

Likewise, the GetCharCount method determines how many characters result in decoding a sequence of bytes, and the GetChars and GetString methods perform the actual decoding.

UTF8Encoding corresponds to the Windows code page 65001.

The encoder can use the big endian byte order (most significant byte first) or the little endian byte order (least significant byte first). It is generally more efficient to store Unicode characters using the native byte order. For example, it is better to use the little endian byte order on little endian platforms, such as Intel computers.

Optionally, the UTF8Encoding object provides a preamble, which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. If the preamble contains a byte order mark (BOM), it helps the decoder determine the byte order and the transformation format or UTF. The GetPreamble method retrieves an array of bytes that can include the BOM. For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.

NoteNote:

To enable error detection and to make the class instance more secure, the application should use the UTF8Encoding constructor that takes a throwOnInvalidBytes parameter and set that parameter to true. With error detection, a method that detects an invalid sequence of characters or bytes throws a ArgumentException. Without error detection, no exception is thrown, and the invalid sequence is generally ignored.

NoteNote:

The state of a UTF-8 encoded object is not preserved if the object is serialized and deserialized using different .NET Framework versions.

The following example demonstrates how to use a UTF8Encoding to encode a string of Unicode characters and store them in a byte array. Notice that when encodedBytes is decoded back to a string there is no loss of data.

using System;
using System.Text;

class UTF8EncodingExample {
    public static void Main() {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();
        
        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This unicode string contains two characters " +
            "with codes outside an 8-bit code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes) {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();
        
        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}

import System.*;
import System.Text.*;

class UTF8EncodingExample
{
    public static void main(String[] args)
    {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();

        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString = "This unicode string contains two characters "
            + "with codes outside an 8-bit code range, " 
            + "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        ubyte encodedBytes[] = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        for (int iCtr = 0; iCtr < encodedBytes.length; iCtr++) {
            ubyte b = encodedBytes[iCtr];
            Console.Write("[{0}]", String.valueOf(b));
        }
        Console.WriteLine();

        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    } //main
} //UTF8EncodingExample

System.Object
   System.Text.Encoding
    System.Text.UTF8Encoding

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

Windows 98, Windows Server 2000 SP4, Windows CE, Windows Millennium Edition, Windows Mobile for Pocket PC, Windows Mobile for Smartphone, Windows Server 2003, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP SP2, Windows XP Starter Edition

The Microsoft .NET Framework 3.0 is supported on Windows Vista, Microsoft Windows XP SP2, and Windows Server 2003 SP1.

.NET Framework

Supported in: 3.0, 2.0, 1.1, 1.0

.NET Compact Framework

Supported in: 2.0, 1.0

XNA Framework

Supported in: 1.0

Community Additions

ADD
Show:
© 2014 Microsoft