.NET Framework Class Library
Encoding.UTF8 Property

Gets an encoding for the UTF-8 format.

Namespace: System.Text
Assembly: mscorlib (in mscorlib.dll)

Syntax

Visual Basic (Declaration)
Public Shared ReadOnly Property UTF8 As Encoding
Visual Basic (Usage)
Dim value As Encoding

value = Encoding.UTF8
C#
public static Encoding UTF8 { get; }
C++
public:
static property Encoding^ UTF8 {
    Encoding^ get ();
}
J#
/** @property */
public static Encoding get_UTF8 ()
JScript
public static function get UTF8 () : Encoding

Property Value

An Encoding for the UTF-8 format.
Remarks

The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point. The Unicode Standard version 3.2 uses the following UTFs:

  • UTF-8, which represents each code point as a sequence of one to four bytes.

  • UTF-16, which represents each code point as a sequence of one to two 16-bit integers.

  • UTF-32, which represents each code point as a 32-bit integer.

For more information on UTFs, see The Unicode Standard at www.unicode.org.

Example

The following code example determines the number of bytes required to encode a character array, encodes the characters, and displays the resulting bytes.

Visual Basic
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Public Class SamplesEncoding   

   Public Shared Sub Main()

      ' The characters to encode:
      '    Latin Small Letter Z (U+007A)
      '    Latin Small Letter A (U+0061)
      '    Combining Breve (U+0306)
      '    Latin Small Letter AE With Acute (U+01FD)
      '    Greek Small Letter Beta (U+03B2)
      '    a high-surrogate value (U+D8FF)
      '    a low-surrogate value (U+DCFF)
      Dim myChars() As Char = {"z"c, "a"c, ChrW(&H0306), ChrW(&H01FD), ChrW(&H03B2), ChrW(&HD8FF), ChrW(&HDCFF)}
 

      ' Get different encodings.
      Dim u7 As Encoding = Encoding.UTF7
      Dim u8 As Encoding = Encoding.UTF8
      Dim u16LE As Encoding = Encoding.Unicode
      Dim u16BE As Encoding = Encoding.BigEndianUnicode
      Dim u32 As Encoding = Encoding.UTF32

      ' Encode the entire array, and print out the counts and the resulting bytes.
      PrintCountsAndBytes(myChars, u7)
      PrintCountsAndBytes(myChars, u8)
      PrintCountsAndBytes(myChars, u16LE)
      PrintCountsAndBytes(myChars, u16BE)
      PrintCountsAndBytes(myChars, u32)

   End Sub 'Main


   Public Shared Sub PrintCountsAndBytes(chars() As Char, enc As Encoding)

      ' Display the name of the encoding used.
      Console.Write("{0,-30} :", enc.ToString())

      ' Display the exact byte count.
      Dim iBC As Integer = enc.GetByteCount(chars)
      Console.Write(" {0,-3}", iBC)

      ' Display the maximum byte count.
      Dim iMBC As Integer = enc.GetMaxByteCount(chars.Length)
      Console.Write(" {0,-3} :", iMBC)

      ' Encode the array of chars.
      Dim bytes As Byte() = enc.GetBytes(chars)

      ' Display all the encoded bytes.
      PrintHexBytes(bytes)

   End Sub 'PrintCountsAndBytes


   Public Shared Sub PrintHexBytes(bytes() As Byte)

      If bytes Is Nothing OrElse bytes.Length = 0 Then
         Console.WriteLine("<none>")
      Else
         Dim i As Integer
         For i = 0 To bytes.Length - 1
            Console.Write("{0:X2} ", bytes(i))
         Next i
         Console.WriteLine()
      End If

   End Sub 'PrintHexBytes 

End Class 'SamplesEncoding


'This code produces the following output.
'
'System.Text.UTF7Encoding       : 18  23  :7A 61 2B 41 77 59 42 2F 51 4F 79 32 50 2F 63 2F 77 2D
'System.Text.UTF8Encoding       : 12  24  :7A 61 CC 86 C7 BD CE B2 F1 8F B3 BF
'System.Text.UnicodeEncoding    : 14  16  :7A 00 61 00 06 03 FD 01 B2 03 FF D8 FF DC
'System.Text.UnicodeEncoding    : 14  16  :00 7A 00 61 03 06 01 FD 03 B2 D8 FF DC FF
'System.Text.UTF32Encoding      : 24  32  :7A 00 00 00 61 00 00 00 06 03 00 00 FD 01 00 00 B2 03 00 00 FF FC 04 00
C#
using System;
using System.Text;

public class SamplesEncoding  {

   public static void Main()  {

      // The characters to encode:
      //    Latin Small Letter Z (U+007A)
      //    Latin Small Letter A (U+0061)
      //    Combining Breve (U+0306)
      //    Latin Small Letter AE With Acute (U+01FD)
      //    Greek Small Letter Beta (U+03B2)
      //    a high-surrogate value (U+D8FF)
      //    a low-surrogate value (U+DCFF)
      char[] myChars = new char[] { 'z', 'a', '\u0306', '\u01FD', '\u03B2', '\uD8FF', '\uDCFF' };

      // Get different encodings.
      Encoding  u7    = Encoding.UTF7;
      Encoding  u8    = Encoding.UTF8;
      Encoding  u16LE = Encoding.Unicode;
      Encoding  u16BE = Encoding.BigEndianUnicode;
      Encoding  u32   = Encoding.UTF32;

      // Encode the entire array, and print out the counts and the resulting bytes.
      PrintCountsAndBytes( myChars, u7 );
      PrintCountsAndBytes( myChars, u8 );
      PrintCountsAndBytes( myChars, u16LE );
      PrintCountsAndBytes( myChars, u16BE );
      PrintCountsAndBytes( myChars, u32 );

   }


   public static void PrintCountsAndBytes( char[] chars, Encoding enc )  {

      // Display the name of the encoding used.
      Console.Write( "{0,-30} :", enc.ToString() );

      // Display the exact byte count.
      int iBC  = enc.GetByteCount( chars );
      Console.Write( " {0,-3}", iBC );

      // Display the maximum byte count.
      int iMBC = enc.GetMaxByteCount( chars.Length );
      Console.Write( " {0,-3} :", iMBC );

      // Encode the array of chars.
      byte[] bytes = enc.GetBytes( chars );

      // Display all the encoded bytes.
      PrintHexBytes( bytes );

   }


   public static void PrintHexBytes( byte[] bytes )  {

      if (( bytes == null ) || ( bytes.Length == 0 ))
         Console.WriteLine( "<none>" );
      else  {
         for ( int i = 0; i < bytes.Length; i++ )
            Console.Write( "{0:X2} ", bytes[i] );
         Console.WriteLine();
      }

   }

}


/* 
This code produces the following output.

System.Text.UTF7Encoding       : 18  23  :7A 61 2B 41 77 59 42 2F 51 4F 79 32 50 2F 63 2F 77 2D
System.Text.UTF8Encoding       : 12  24  :7A 61 CC 86 C7 BD CE B2 F1 8F B3 BF
System.Text.UnicodeEncoding    : 14  16  :7A 00 61 00 06 03 FD 01 B2 03 FF D8 FF DC
System.Text.UnicodeEncoding    : 14  16  :00 7A 00 61 03 06 01 FD 03 B2 D8 FF DC FF
System.Text.UTF32Encoding      : 24  32  :7A 00 00 00 61 00 00 00 06 03 00 00 FD 01 00 00 B2 03 00 00 FF FC 04 00

*/
C++
using namespace System;
using namespace System::Text;
void PrintCountsAndBytes( array<Char>^chars, Encoding^ enc );
void PrintHexBytes( array<Byte>^bytes );
int main()
{
   
   // The characters to encode:
   //    Latin Small Letter Z (U+007A)
   //    Latin Small Letter A (U+0061)
   //    Combining Breve (U+0306)
   //    Latin Small Letter AE With Acute (U+01FD)
   //    Greek Small Letter Beta (U+03B2)
   //    a high-surrogate value (U+D8FF)
   //    a low-surrogate value (U+DCFF)
   array<Char>^myChars = gcnew array<Char>{
      L'z','a',L'\u0306',L'\u01FD',L'\u03B2',L'\xD8FF',L'\xDCFF'
   };
   
   // Get different encodings.
   Encoding^ u7 = Encoding::UTF7;
   Encoding^ u8 = Encoding::UTF8;
   Encoding^ u16LE = Encoding::Unicode;
   Encoding^ u16BE = Encoding::BigEndianUnicode;
   Encoding^ u32 = Encoding::UTF32;
   
   // Encode the entire array, and print out the counts and the resulting bytes.
   PrintCountsAndBytes( myChars, u7 );
   PrintCountsAndBytes( myChars, u8 );
   PrintCountsAndBytes( myChars, u16LE );
   PrintCountsAndBytes( myChars, u16BE );
   PrintCountsAndBytes( myChars, u32 );
}

void PrintCountsAndBytes( array<Char>^chars, Encoding^ enc )
{
   
   // Display the name of the encoding used.
   Console::Write( "{0,-30} :", enc );
   
   // Display the exact byte count.
   int iBC = enc->GetByteCount( chars );
   Console::Write( " {0,-3}", iBC );
   
   // Display the maximum byte count.
   int iMBC = enc->GetMaxByteCount( chars->Length );
   Console::Write( " {0,-3} :", iMBC );
   
   // Encode the array of chars.
   array<Byte>^bytes = enc->GetBytes( chars );
   
   // Display all the encoded bytes.
   PrintHexBytes( bytes );
}

void PrintHexBytes( array<Byte>^bytes )
{
   if ( (bytes == nullptr) || (bytes->Length == 0) )
      Console::WriteLine( "<none>" );
   else
   {
      for ( int i = 0; i < bytes->Length; i++ )
         Console::Write( "{0:X2} ", bytes[ i ] );
      Console::WriteLine();
   }
}

/* 
This code produces the following output.

System.Text.UTF7Encoding       : 18  23  :7A 61 2B 41 77 59 42 2F 51 4F 79 32 50 2F 63 2F 77 2D
System.Text.UTF8Encoding       : 12  24  :7A 61 CC 86 C7 BD CE B2 F1 8F B3 BF
System.Text.UnicodeEncoding    : 14  16  :7A 00 61 00 06 03 FD 01 B2 03 FF D8 FF DC
System.Text.UnicodeEncoding    : 14  16  :00 7A 00 61 03 06 01 FD 03 B2 D8 FF DC FF
System.Text.UTF32Encoding      : 24  32  :7A 00 00 00 61 00 00 00 06 03 00 00 FD 01 00 00 B2 03 00 00 FF FC 04 00

*/
J#
import System.*;
import System.Text.*;
import System.Byte;

public class SamplesEncoding
{
    public static void main(String[] args)
    {
        // The characters to encode:
        //    Latin Small Letter Z (U+007A)
        //    Latin Small Letter A (U+0061)
        //    Combining Breve (U+0306)
        //    Latin Small Letter AE With Acute (U+01FD)
        //    Greek Small Letter Beta (U+03B2)
        //    a high-surrogate value (U+D8FF)
        //    a low-surrogate value (U+DCFF)
        char myChars[] = new char[] {
            'z', 'a', '\u0306', '\u01FD', '\u03B2', '\uD8FF', '\uDCFF'
        };

        // Get different encodings.
        Encoding u7 = Encoding.get_UTF7();
        Encoding u8 = Encoding.get_UTF8();
        Encoding u16LE = Encoding.get_Unicode();
        Encoding u16BE = Encoding.get_BigEndianUnicode();
        Encoding u32 = Encoding.get_UTF32();

        // Encode the entire array, and print out the counts
        // and the resulting bytes.
        PrintCountsAndBytes(myChars, u7);
        PrintCountsAndBytes(myChars, u8);
        PrintCountsAndBytes(myChars, u16LE);
        PrintCountsAndBytes(myChars, u16BE);
        PrintCountsAndBytes(myChars, u32);
    } //main

    public static void PrintCountsAndBytes(char chars[], Encoding enc)
    {
        // Display the name of the encoding used.
        Console.Write("{0,-30} :", enc.toString());

        // Display the exact byte count.
        int iBC = enc.GetByteCount(chars);
        Console.Write(" {0,-3}", String.valueOf(iBC));

        // Display the maximum byte count.
        int iMBC = enc.GetMaxByteCount(chars.length);
        Console.Write(" {0,-3} :", String.valueOf(iMBC));

        // Encode the array of chars.
        ubyte bytes[] = enc.GetBytes(chars);

        // Display all the encoded bytes.
        PrintHexBytes(bytes);
    } //PrintCountsAndBytes

    public static void PrintHexBytes(ubyte bytes[])
    {
        if(bytes == null || bytes.length == 0) {
            Console.WriteLine("<none>");
        }
        else {
            for(int i = 0; i < bytes.length; i++) {
                Console.Write("{0:X2} ",
                        ((System.Byte)bytes[i]).ToString("X2"));
            }
            Console.WriteLine();
        }
    } //PrintHexBytes
} //SamplesEncoding

/* 
This code produces the following output.

System.Text.UTF7Encoding       : 18  23  :7A 61 2B 41 77 59 42 2F 51 4F 79 32 50
 2F 63 2F 77 2D
System.Text.UTF8Encoding       : 12  24  :7A 61 CC 86 C7 BD CE B2 F1 8F B3 BF
System.Text.UnicodeEncoding    : 14  16  :7A 00 61 00 06 03 FD 01 B2 03 FF D8 FF
 DC
System.Text.UnicodeEncoding    : 14  16  :00 7A 00 61 03 06 01 FD 03 B2 D8 FF DC
 FF
System.Text.UTF32Encoding      : 24  32  :7A 00 00 00 61 00 00 00 06 03 00 00 FD
 01 00 00 B2 03 00 00 FF FC 04 00

*/
Platforms

Windows 98, Windows 2000 SP4, Windows CE, Windows Millennium Edition, Windows Mobile for Pocket PC, Windows Mobile for Smartphone, Windows Server 2003, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP SP2, Windows XP Starter Edition

The .NET Framework does not support all versions of every platform. For a list of the supported versions, see System Requirements.

Version Information

.NET Framework

Supported in: 2.0, 1.1, 1.0

.NET Compact Framework

Supported in: 2.0, 1.0
See Also

Tags :


Community Content

Shawn Steele [MSFT]
Which Unicode Encoding to use?

The choice of which Unicode Encoding to use depends on several factors.  Your application can use UTF-8, UTF-16 (UnicodeEncoding), UTF-32 or even UTF-7.  Internally the .Net Framework strings are native UTF-16 strings.

UTF-7 is generally not used except for special cases where 7 bit compatibility is required. 

UTF-8 allows encoding using 8 bit data sizes and usually works well with many existing systems.  For the ASCII range of characters UTF-8 is identical and allows a broader set of characters.  For CJK scripts however UTF-8 can require 3 bytes for each character, potentially causing larger data sizes than UTF-16.  Sometimes the amount of ASCII data such as HTML tags mitigates the increase in size for the CJK range, providing a negligible net impact.

UTF-16 is often used natively, such as in the Microsoft.Net char type, the windows WCHAR and other common types.  Most common Unicode code points also only take one UTF-16 code point (2 bytes).  Unicode supplementary characters U+10000+ still require 2 UTF-16 surrogate code points.

UTF-32 is used when applications want to avoid the surrogate code point behavior of UTF-16 on systems where encoded space is less of a concern.  Note that single "glyphs" rendered on a display could still be encoded with more than one UTF-32 characters.  Also the supplementary characters susceptible to this behavior are currently much rarer than the Unicode BMP characters.

Personally I prefer UTF-16 data types internally and UTF-8 for data storage and transmission if the data size is important, such as for HTTP.  For some scripts I may choose UTF-16 instead of UTF-8, however often times the number of ASCII control codes even in non-ASCII scripts allow for smaller file sizes in UTF-8 than UTF-16. 

I only choose non-Unicode encodings if I must conform to a legacy protocol or standard that doesn't support Unicode.

Tags :

Shawn Steele [MSFT]
Bytes aren't Unicode (or Strings)

Random collections of numbers, either bytes or chars, don't make a valid String or valid Unicode.  You cannot convert a byte[] to or from Unicode and expect it to work.  Certain characters and code point sequences are illegal in Unicode 5 and will not convert with any of the Unicode Encodings.

If you need to pass binary data in a string format, use base 64 or some other format that is designed for that purpose.

Tags :

Page view tracker