UTF8Encoding Class
 

Represents a UTF-8 encoding of Unicode characters.

Namespace:   System.Text
Assembly:  mscorlib (in mscorlib.dll)

System.Object
  System.Text.Encoding
    System.Text.UTF8Encoding

[SerializableAttribute]
[ComVisibleAttribute(true)]
public class UTF8Encoding : Encoding
[SerializableAttribute]
[ComVisibleAttribute(true)]
public ref class UTF8Encoding : Encoding
[<SerializableAttribute>]
[<ComVisibleAttribute(true)>]
type UTF8Encoding = 
    class
        inherit Encoding
    end
<SerializableAttribute>
<ComVisibleAttribute(True)>
Public Class UTF8Encoding
	Inherits Encoding
| Constructor | Description |
| --- | --- |
| UTF8Encoding() | Initializes a new instance of the UTF8Encoding class. |
| UTF8Encoding(Boolean) | Initializes a new instance of the UTF8Encoding class. A parameter specifies whether to provide a Unicode byte order mark. |
| UTF8Encoding(Boolean, Boolean) | Initializes a new instance of the UTF8Encoding class. Parameters specify whether to provide a Unicode byte order mark and whether to throw an exception when an invalid encoding is detected. |

| Property | Description |
| --- | --- |
| BodyName | When overridden in a derived class, gets a name for the current encoding that can be used with mail agent body tags. (Inherited from Encoding.) |
| CodePage | When overridden in a derived class, gets the code page identifier of the current Encoding. (Inherited from Encoding.) |
| DecoderFallback | Gets or sets the DecoderFallback object for the current Encoding object. (Inherited from Encoding.) |
| EncoderFallback | Gets or sets the EncoderFallback object for the current Encoding object. (Inherited from Encoding.) |
| EncodingName | When overridden in a derived class, gets the human-readable description of the current encoding. (Inherited from Encoding.) |
| HeaderName | When overridden in a derived class, gets a name for the current encoding that can be used with mail agent header tags. (Inherited from Encoding.) |
| IsBrowserDisplay | When overridden in a derived class, gets a value indicating whether the current encoding can be used by browser clients for displaying content. (Inherited from Encoding.) |
| IsBrowserSave | When overridden in a derived class, gets a value indicating whether the current encoding can be used by browser clients for saving content. (Inherited from Encoding.) |
| IsMailNewsDisplay | When overridden in a derived class, gets a value indicating whether the current encoding can be used by mail and news clients for displaying content. (Inherited from Encoding.) |
| IsMailNewsSave | When overridden in a derived class, gets a value indicating whether the current encoding can be used by mail and news clients for saving content. (Inherited from Encoding.) |
| IsReadOnly | When overridden in a derived class, gets a value indicating whether the current encoding is read-only. (Inherited from Encoding.) |
| IsSingleByte | When overridden in a derived class, gets a value indicating whether the current encoding uses single-byte code points. (Inherited from Encoding.) |
| WebName | When overridden in a derived class, gets the name registered with the Internet Assigned Numbers Authority (IANA) for the current encoding. (Inherited from Encoding.) |
| WindowsCodePage | When overridden in a derived class, gets the Windows operating system code page that most closely corresponds to the current encoding. (Inherited from Encoding.) |

| Method | Description |
| --- | --- |
| Clone() | When overridden in a derived class, creates a shallow copy of the current Encoding object. (Inherited from Encoding.) |
| Equals(Object) | Determines whether the specified object is equal to the current UTF8Encoding object. (Overrides Encoding.Equals(Object).) |
| Finalize() (protected) | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.) |
| GetByteCount(Char*, Int32) | Calculates the number of bytes produced by encoding a set of characters starting at the specified character pointer. (Overrides Encoding.GetByteCount(Char*, Int32).) |
| GetByteCount(Char[]) | When overridden in a derived class, calculates the number of bytes produced by encoding all the characters in the specified character array. (Inherited from Encoding.) |
| GetByteCount(Char[], Int32, Int32) | Calculates the number of bytes produced by encoding a set of characters from the specified character array. (Overrides Encoding.GetByteCount(Char[], Int32, Int32).) |
| GetByteCount(String) | Calculates the number of bytes produced by encoding the characters in the specified String. (Overrides Encoding.GetByteCount(String).) |
| GetBytes(Char*, Int32, Byte*, Int32) | Encodes a set of characters starting at the specified character pointer into a sequence of bytes that are stored starting at the specified byte pointer. (Overrides Encoding.GetBytes(Char*, Int32, Byte*, Int32).) |
| GetBytes(Char[]) | When overridden in a derived class, encodes all the characters in the specified character array into a sequence of bytes. (Inherited from Encoding.) |
| GetBytes(Char[], Int32, Int32) | When overridden in a derived class, encodes a set of characters from the specified character array into a sequence of bytes. (Inherited from Encoding.) |
| GetBytes(Char[], Int32, Int32, Byte[], Int32) | Encodes a set of characters from the specified character array into the specified byte array. (Overrides Encoding.GetBytes(Char[], Int32, Int32, Byte[], Int32).) |
| GetBytes(String) | When overridden in a derived class, encodes all the characters in the specified string into a sequence of bytes. (Inherited from Encoding.) |
| GetBytes(String, Int32, Int32, Byte[], Int32) | Encodes a set of characters from the specified String into the specified byte array. (Overrides Encoding.GetBytes(String, Int32, Int32, Byte[], Int32).) |
| GetCharCount(Byte*, Int32) | Calculates the number of characters produced by decoding a sequence of bytes starting at the specified byte pointer. (Overrides Encoding.GetCharCount(Byte*, Int32).) |
| GetCharCount(Byte[]) | When overridden in a derived class, calculates the number of characters produced by decoding all the bytes in the specified byte array. (Inherited from Encoding.) |
| GetCharCount(Byte[], Int32, Int32) | Calculates the number of characters produced by decoding a sequence of bytes from the specified byte array. (Overrides Encoding.GetCharCount(Byte[], Int32, Int32).) |
| GetChars(Byte*, Int32, Char*, Int32) | Decodes a sequence of bytes starting at the specified byte pointer into a set of characters that are stored starting at the specified character pointer. (Overrides Encoding.GetChars(Byte*, Int32, Char*, Int32).) |
| GetChars(Byte[]) | When overridden in a derived class, decodes all the bytes in the specified byte array into a set of characters. (Inherited from Encoding.) |
| GetChars(Byte[], Int32, Int32) | When overridden in a derived class, decodes a sequence of bytes from the specified byte array into a set of characters. (Inherited from Encoding.) |
| GetChars(Byte[], Int32, Int32, Char[], Int32) | Decodes a sequence of bytes from the specified byte array into the specified character array. (Overrides Encoding.GetChars(Byte[], Int32, Int32, Char[], Int32).) |
| GetDecoder() | Obtains a decoder that converts a UTF-8 encoded sequence of bytes into a sequence of Unicode characters. (Overrides Encoding.GetDecoder().) |
| GetEncoder() | Obtains an encoder that converts a sequence of Unicode characters into a UTF-8 encoded sequence of bytes. (Overrides Encoding.GetEncoder().) |
| GetHashCode() | Returns the hash code for the current instance. (Overrides Encoding.GetHashCode().) |
| GetMaxByteCount(Int32) | Calculates the maximum number of bytes produced by encoding the specified number of characters. (Overrides Encoding.GetMaxByteCount(Int32).) |
| GetMaxCharCount(Int32) | Calculates the maximum number of characters produced by decoding the specified number of bytes. (Overrides Encoding.GetMaxCharCount(Int32).) |
| GetPreamble() | Returns a Unicode byte order mark encoded in UTF-8 format, if the UTF8Encoding encoding object is configured to supply one. (Overrides Encoding.GetPreamble().) |
| GetString(Byte*, Int32) | When overridden in a derived class, decodes a specified number of bytes starting at a specified address into a string. (Inherited from Encoding.) |
| GetString(Byte[]) | When overridden in a derived class, decodes all the bytes in the specified byte array into a string. (Inherited from Encoding.) |
| GetString(Byte[], Int32, Int32) | Decodes a range of bytes from a byte array into a string. (Overrides Encoding.GetString(Byte[], Int32, Int32).) |
| GetType() | Gets the Type of the current instance. (Inherited from Object.) |
| IsAlwaysNormalized() | Gets a value indicating whether the current encoding is always normalized, using the default normalization form. (Inherited from Encoding.) |
| IsAlwaysNormalized(NormalizationForm) | When overridden in a derived class, gets a value indicating whether the current encoding is always normalized, using the specified normalization form. (Inherited from Encoding.) |
| MemberwiseClone() (protected) | Creates a shallow copy of the current Object. (Inherited from Object.) |
| ToString() | Returns a string that represents the current object. (Inherited from Object.) |

Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.

UTF-8 is a Unicode encoding that represents each code point as a sequence of one to four bytes. Unlike the UTF-16 and UTF-32 encodings, the UTF-8 encoding does not require "endianness"; the encoding scheme is the same regardless of whether the processor is big-endian or little-endian. UTF8Encoding corresponds to the Windows code page 65001. For more information about the UTFs and other encodings supported by System.Text, see Character Encoding in the .NET Framework.
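
As a rough illustration of the one-to-four-byte range described above, the following sketch counts the UTF-8 bytes needed for four sample code points. The class name Utf8WidthDemo and the helper ByteLength are hypothetical names chosen for this illustration; only GetByteCount comes from the class documented here.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8WidthDemo
{
    // Returns the number of UTF-8 bytes needed to encode the given text.
    public static int ByteLength(string text)
    {
        return new UTF8Encoding().GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(ByteLength("A"));           // U+0041  -> 1
        Console.WriteLine(ByteLength("\u03A0"));      // U+03A0  -> 2
        Console.WriteLine(ByteLength("\u20AC"));      // U+20AC  -> 3
        Console.WriteLine(ByteLength("\U0001F600"));  // U+1F600 -> 4
    }
}
```

The last sample is a single code point that occupies two Char values (a surrogate pair) in a .NET string, yet it still encodes to one four-byte UTF-8 sequence.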

You can instantiate a UTF8Encoding object in a number of ways, depending on whether you want it to provide a byte order mark (BOM) and whether you want to enable error detection. The following table lists the constructors and the Encoding property that return a UTF8Encoding object.

| Member | BOM | Error detection |
| --- | --- | --- |
| Encoding.UTF8 | Yes | No (Replacement fallback) |
| UTF8Encoding.UTF8Encoding() | No | No (Replacement fallback) |
| UTF8Encoding.UTF8Encoding(Boolean) | Configurable | No (Replacement fallback) |
| UTF8Encoding.UTF8Encoding(Boolean, Boolean) | Configurable | Configurable |
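
The options in the table above can be sketched as follows. This is a minimal illustration, not a recommended pattern: Utf8CreationDemo and PreambleLength are hypothetical names, and the preamble length is used only as a convenient way to observe whether an instance is configured to supply a BOM.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8CreationDemo
{
    // Length of the byte order mark the encoding is configured to supply
    // (0 when no BOM is configured).
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    public static void Main()
    {
        // Encoding.UTF8 property: BOM, no error detection.
        Console.WriteLine(PreambleLength(Encoding.UTF8));            // 3

        // Parameterless constructor: no BOM, no error detection.
        Console.WriteLine(PreambleLength(new UTF8Encoding()));       // 0

        // One Boolean: BOM is configurable (enabled here).
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));   // 3

        // Two Booleans: BOM disabled, error detection enabled.
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        Console.WriteLine(PreambleLength(strict));                   // 0
    }
}
```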

The GetByteCount method determines how many bytes result from encoding a set of Unicode characters, and the GetBytes method performs the actual encoding.

Likewise, the GetCharCount method determines how many characters result from decoding a sequence of bytes, and the GetChars and GetString methods perform the actual decoding.
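
One way to pair the counting and converting methods is to size each buffer exactly before filling it. The sketch below assumes a hypothetical helper named RoundTrip in a hypothetical class Utf8BufferDemo; the UTF8Encoding calls are the array-based overloads listed in the method table.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8BufferDemo
{
    // Encodes text into an exactly sized byte buffer, then decodes it back
    // into an exactly sized char buffer. byteCount receives the encoded size.
    public static string RoundTrip(string text, out int byteCount)
    {
        var utf8 = new UTF8Encoding();
        char[] chars = text.ToCharArray();

        // Ask how many bytes are needed, then encode into that buffer.
        byte[] bytes = new byte[utf8.GetByteCount(chars, 0, chars.Length)];
        byteCount = utf8.GetBytes(chars, 0, chars.Length, bytes, 0);

        // Ask how many chars are needed, then decode into that buffer.
        char[] decoded = new char[utf8.GetCharCount(bytes, 0, byteCount)];
        int charCount = utf8.GetChars(bytes, 0, byteCount, decoded, 0);

        return new string(decoded, 0, charCount);
    }

    public static void Main()
    {
        int byteCount;
        string result = RoundTrip("Pi (\u03A0)", out byteCount);
        Console.WriteLine(byteCount);                // 7
        Console.WriteLine(result == "Pi (\u03A0)");  // True
    }
}
```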

For an encoder or decoder that is able to save state information when encoding or decoding data that spans multiple blocks (such as a string of 1 million characters that is encoded in 100,000-character segments), use the GetEncoder and GetDecoder methods, respectively.
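
The following sketch suggests why the saved state matters when a multi-byte sequence is split across blocks. The class Utf8ChunkDemo and its two helpers are hypothetical; the chunk size of 5 is chosen only so that the two bytes of Pi (CE A0) land in different chunks.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8ChunkDemo
{
    // Decodes data in fixed-size chunks with one Decoder, which carries
    // incomplete byte sequences over from one call to the next.
    public static string DecodeInChunks(byte[] data, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var utf8 = new UTF8Encoding();
        Decoder decoder = utf8.GetDecoder();
        char[] buffer = new char[utf8.GetMaxCharCount(chunkSize)];
        var result = new StringBuilder();

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            int charCount = decoder.GetChars(data, offset, count, buffer, 0);
            result.Append(buffer, 0, charCount);
        }
        return result.ToString();
    }

    // Decodes each chunk independently with GetString, so no state is kept.
    public static string DecodeWithoutState(byte[] data, int chunkSize)
    {
        var utf8 = new UTF8Encoding();
        var result = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            result.Append(utf8.GetString(data, offset, count));
        }
        return result.ToString();
    }

    public static void Main()
    {
        string text = "Pi (\u03A0), and Sigma (\u03A3).";
        byte[] data = new UTF8Encoding().GetBytes(text);

        Console.WriteLine(DecodeInChunks(data, 5) == text);      // True
        Console.WriteLine(DecodeWithoutState(data, 5) == text);  // False
    }
}
```

The stateless variant fails because each half of the split sequence is invalid on its own, so the decoded text no longer matches the original.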

Optionally, the UTF8Encoding object provides a byte order mark (BOM), which is an array of bytes that can be prefixed to the beginning of the byte stream that results from the encoding process. If a UTF-8 encoded byte stream is prefaced with a byte order mark (BOM), it helps the decoder determine the byte order and the transformation format or UTF. Note, however, that the Unicode Standard neither requires nor recommends a BOM in UTF-8 encoded streams. For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.

If the encoder is configured to provide a BOM, you can retrieve it by calling the GetPreamble method; otherwise, the method returns an empty array. Note that, even if a UTF8Encoding object is configured for BOM support, you must include the BOM at the beginning of the encoded byte stream as appropriate; the encoding methods of the UTF8Encoding class do not do this automatically.
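
Because the encoding methods do not emit the BOM themselves, the caller has to prepend it. A minimal sketch, assuming a hypothetical helper EncodeWithBom in a hypothetical class Utf8PreambleDemo:

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8PreambleDemo
{
    // Returns the preamble followed by the encoded text.
    public static byte[] EncodeWithBom(string text)
    {
        var utf8 = new UTF8Encoding(true);
        byte[] preamble = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(text);   // does not include the BOM

        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        // GetBytes alone yields only the encoded character.
        Console.WriteLine(BitConverter.ToString(new UTF8Encoding(true).GetBytes("A")));  // 41

        // With the preamble prepended explicitly.
        Console.WriteLine(BitConverter.ToString(EncodeWithBom("A")));  // EF-BB-BF-41
    }
}
```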

Note

To enable error detection and to make the class instance more secure, you should call the UTF8Encoding constructor and set the throwOnInvalidBytes parameter to true. With error detection enabled, a method that detects an invalid sequence of characters or bytes throws an ArgumentException exception. Without error detection, no exception is thrown, and the invalid sequence is generally ignored.
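
The effect of throwOnInvalidBytes can be sketched with a byte sequence that is not valid UTF-8. Utf8StrictDemo and TryDecodeStrict are hypothetical names for this illustration. The byte 0xFF never occurs in well-formed UTF-8; in this sketch the instance without error detection substitutes the replacement character U+FFFD for it, in line with the replacement fallback listed for the constructors.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the .NET Framework.
public static class Utf8StrictDemo
{
    // Tries to decode with error detection enabled. Returns false when the
    // input contains an invalid byte sequence.
    public static bool TryDecodeStrict(byte[] bytes, out string text)
    {
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            text = strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException)
        {
            // The exception thrown for invalid input derives from ArgumentException.
            text = null;
            return false;
        }
    }

    public static void Main()
    {
        byte[] invalid = { 0x41, 0xFF, 0x42 };   // 'A', invalid byte, 'B'
        string text;

        Console.WriteLine(TryDecodeStrict(invalid, out text));   // False

        // Without error detection no exception is thrown.
        string lenient = new UTF8Encoding().GetString(invalid);
        Console.WriteLine(lenient.IndexOf('\uFFFD') >= 0);       // True
    }
}
```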

Note

The state of a UTF-8 encoded object is not preserved if the object is serialized and deserialized using different .NET Framework versions.

The following example uses a UTF8Encoding object to encode a string of Unicode characters and store them in a byte array. The Unicode string includes two characters, Pi (U+03A0) and Sigma (U+03A3), that are outside the ASCII character range. When the encoded byte array is decoded back to a string, the Pi and Sigma characters are still present.

using System;
using System.Text;

class Example
{
    public static void Main()
    {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();

        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This Unicode string has 2 characters outside the " +
            "ASCII range: \r\n" +
            "Pi (\u03a0), and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        for (int ctr = 0; ctr < encodedBytes.Length; ctr++) {
            Console.Write("{0:X2} ", encodedBytes[ctr]);
            if ((ctr + 1) %  25 == 0)
               Console.WriteLine();
        }
        Console.WriteLine();

        // Decode bytes back to string.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}
// The example displays the following output:
//    Original string:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
//
//    Encoded bytes:
//    54 68 69 73 20 55 6E 69 63 6F 64 65 20 73 74 72 69 6E 67 20 68 61 73 20 32
//    20 63 68 61 72 61 63 74 65 72 73 20 6F 75 74 73 69 64 65 20 74 68 65 20 41
//    53 43 49 49 20 72 61 6E 67 65 3A 20 0D 0A 50 69 20 28 CE A0 29 2C 20 61 6E
//    64 20 53 69 67 6D 61 20 28 CE A3 29 2E
//
//    Decoded bytes:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
Imports System.Text

Class Example
    Public Shared Sub Main()
        ' Create a UTF-8 encoding.
        Dim utf8 As New UTF8Encoding()

        ' A Unicode string with two characters outside an 8-bit code range.
        Dim unicodeString As String = _
            "This Unicode string has 2 characters outside the " &
            "ASCII range: " & vbCrLf &
            "Pi (" & ChrW(&h03A0) & "), and Sigma (" & ChrW(&h03A3) & ")."
        Console.WriteLine("Original string:")
        Console.WriteLine(unicodeString)

        ' Encode the string.
        Dim encodedBytes As Byte() = utf8.GetBytes(unicodeString)
        Console.WriteLine()
        Console.WriteLine("Encoded bytes:")
        For ctr As Integer = 0 To encodedBytes.Length - 1
            Console.Write("{0:X2} ", encodedBytes(ctr))
            If (ctr + 1) Mod 25 = 0 Then Console.WriteLine
        Next
        Console.WriteLine()

        ' Decode bytes back to string.
        Dim decodedString As String = utf8.GetString(encodedBytes)
        Console.WriteLine()
        Console.WriteLine("Decoded bytes:")
        Console.WriteLine(decodedString)
    End Sub
End Class
' The example displays the following output:
'    Original string:
'    This Unicode string has 2 characters outside the ASCII range:
'    Pi (Π), and Sigma (Σ).
'
'    Encoded bytes:
'    54 68 69 73 20 55 6E 69 63 6F 64 65 20 73 74 72 69 6E 67 20 68 61 73 20 32
'    20 63 68 61 72 61 63 74 65 72 73 20 6F 75 74 73 69 64 65 20 74 68 65 20 41
'    53 43 49 49 20 72 61 6E 67 65 3A 20 0D 0A 50 69 20 28 CE A0 29 2C 20 61 6E
'    64 20 53 69 67 6D 61 20 28 CE A3 29 2E
'
'    Decoded bytes:
'    This Unicode string has 2 characters outside the ASCII range:
'    Pi (Π), and Sigma (Σ).
using namespace System;
using namespace System::Text;

int main()
{
   // Create a UTF-8 encoding.
   UTF8Encoding^ utf8 = gcnew UTF8Encoding;

   // A Unicode string with two characters outside an 8-bit code range.
   String^ unicodeString = L"This Unicode string has 2 characters "
                           L"outside the ASCII range: \r\n"
                           L"Pi (\u03a0), and Sigma (\u03a3).";
   Console::WriteLine("Original string:");
   Console::WriteLine(unicodeString);

   // Encode the string.
   array<Byte>^ encodedBytes = utf8->GetBytes(unicodeString );
   Console::WriteLine();
   Console::WriteLine("Encoded bytes:");
   for (int ctr = 0; ctr < encodedBytes->Length; ctr++) {
      Console::Write( "{0:X2} ", encodedBytes[ctr]);
      if ((ctr + 1) % 25 == 0)
         Console::WriteLine();
   }

   Console::WriteLine();

   // Decode bytes back to string.
   String^ decodedString = utf8->GetString(encodedBytes);
   Console::WriteLine();
   Console::WriteLine("Decoded bytes:");
   Console::WriteLine(decodedString);
}
// The example displays the following output:
//    Original string:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
//
//    Encoded bytes:
//    54 68 69 73 20 55 6E 69 63 6F 64 65 20 73 74 72 69 6E 67 20 68 61 73 20 32
//    20 63 68 61 72 61 63 74 65 72 73 20 6F 75 74 73 69 64 65 20 74 68 65 20 41
//    53 43 49 49 20 72 61 6E 67 65 3A 20 0D 0A 50 69 20 28 CE A0 29 2C 20 61 6E
//    64 20 53 69 67 6D 61 20 28 CE A3 29 2E
//
//    Decoded bytes:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).

The following example uses the same string as the previous example, except that it writes the encoded bytes to a file and prefixes the byte stream with a byte order mark (BOM). It then reads the file in two different ways: as a text file by using a StreamReader object, and as a binary file. In neither case should the BOM appear in the resulting string: the StreamReader detects and consumes it automatically, whereas bytes read in binary must be decoded starting after the preamble, because the GetString method does not strip a BOM.

using System;
using System.IO;
using System.Text;

public class Example
{
   public static void Main()
   {
        // Create a UTF-8 encoding that supports a BOM.
        Encoding utf8 = new UTF8Encoding(true);

        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This Unicode string has 2 characters outside the " +
            "ASCII range: \r\n" +
            "Pi (\u03A0), and Sigma (\u03A3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);
        Console.WriteLine();

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine("The encoded string has {0} bytes.",
                          encodedBytes.Length);
        Console.WriteLine();

        // Write the bytes to a file with a BOM.
        var fs = new FileStream(@".\UTF8Encoding.txt", FileMode.Create);
        Byte[] bom = utf8.GetPreamble();
        fs.Write(bom, 0, bom.Length);
        fs.Write(encodedBytes, 0, encodedBytes.Length);
        Console.WriteLine("Wrote {0} bytes to the file.", fs.Length);
        fs.Close();
        Console.WriteLine();

        // Open the file using StreamReader.
        var sr = new StreamReader(@".\UTF8Encoding.txt");
        String newString = sr.ReadToEnd();
        sr.Close();
        Console.WriteLine("String read using StreamReader:");
        Console.WriteLine(newString);
        Console.WriteLine();

        // Open the file as a binary file and decode the bytes back to a string.
        fs = new FileStream(@".\UTF8Encoding.txt", FileMode.Open);
        Byte[] bytes = new Byte[fs.Length];
        fs.Read(bytes, 0, (int)fs.Length);
        fs.Close();

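        // Decode the bytes read from the file, skipping the BOM
        // (GetString does not strip the preamble).
        String decodedString = utf8.GetString(bytes, bom.Length,
                                              bytes.Length - bom.Length);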
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
   }
}
// The example displays the following output:
//    Original string:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
//
//    The encoded string has 88 bytes.
//
//    Wrote 91 bytes to the file.
//
//    String read using StreamReader:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
//
//    Decoded bytes:
//    This Unicode string has 2 characters outside the ASCII range:
//    Pi (Π), and Sigma (Σ).
Imports System.IO
Imports System.Text

Class Example
    Public Shared Sub Main()
        ' Create a UTF-8 encoding that supports a BOM.
        Dim utf8 As New UTF8Encoding(True)

        ' A Unicode string with two characters outside an 8-bit code range.
        Dim unicodeString As String = _
            "This Unicode string has 2 characters outside the " &
            "ASCII range: " & vbCrLf &
            "Pi (" & ChrW(&h03A0) & "), and Sigma (" & ChrW(&h03A3) & ")."
        Console.WriteLine("Original string:")
        Console.WriteLine(unicodeString)
        Console.WriteLine()

        ' Encode the string.
        Dim encodedBytes As Byte() = utf8.GetBytes(unicodeString)
        Console.WriteLine("The encoded string has {0} bytes.",
                          encodedBytes.Length)
        Console.WriteLine()

        ' Write the bytes to a file with a BOM.
        Dim fs As New FileStream(".\UTF8Encoding.txt", FileMode.Create)
        Dim bom() As Byte = utf8.GetPreamble()
        fs.Write(bom, 0, bom.Length)
        fs.Write(encodedBytes, 0, encodedBytes.Length)
        Console.WriteLine("Wrote {0} bytes to the file.", fs.Length)
        fs.Close()
        Console.WriteLine()

        ' Open the file using StreamReader.
        Dim sr As New StreamReader(".\UTF8Encoding.txt")
        Dim newString As String = sr.ReadToEnd()
        sr.Close()
        Console.WriteLine("String read using StreamReader:")
        Console.WriteLine(newString)
        Console.WriteLine()

        ' Open the file as a binary file and decode the bytes back to a string.
        fs = new FileStream(".\UTF8Encoding.txt", FileMode.Open)
        Dim bytes(CInt(fs.Length) - 1) As Byte
        fs.Read(bytes, 0, bytes.Length)
        fs.Close()

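        ' Decode the bytes read from the file, skipping the BOM
        ' (GetString does not strip the preamble).
        Dim decodedString As String = utf8.GetString(bytes, bom.Length,
                                                     bytes.Length - bom.Length)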
        Console.WriteLine("Decoded bytes:")
        Console.WriteLine(decodedString)
    End Sub
End Class
' The example displays the following output:
'    Original string:
'    This Unicode string has 2 characters outside the ASCII range:
'    Pi (Π), and Sigma (Σ).
'
'    The encoded string has 88 bytes.
'
'    Wrote 91 bytes to the file.
'
'    String read using StreamReader:
'    This Unicode string has 2 characters outside the ASCII range:
'    Pi (Π), and Sigma (Σ).
'
'    Decoded bytes:
'    This Unicode string has 2 characters outside the ASCII range:
'    Pi (Π), and Sigma (Σ).
Universal Windows Platform: Available since 8
.NET Framework: Available since 1.1
Portable Class Library: Supported in portable .NET platforms
Silverlight: Available since 2.0
Windows Phone Silverlight: Available since 7.0
Windows Phone: Available since 8.1

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
