Encoding Base Types
Characters are abstract entities that can be represented using many different character schemes or code pages. For example, Unicode UTF-16 encoding represents characters as sequences of 16-bit integers, whereas Unicode UTF-8 represents the same characters as sequences of 8-bit bytes. The common language runtime uses Unicode UTF-16 (Unicode Transformation Format, 16-bit encoding form) to represent characters.
Applications that target the common language runtime use encoding to map character representations from the native character scheme to other schemes. Applications use decoding to map characters from non-native schemes to the native scheme. The following table lists the most commonly used classes in the System.Text namespace to encode and decode characters.
ASCII encoding (ASCIIEncoding class): Converts to and from ASCII characters.
Multiple encoding (Encoding class): Converts characters to and from various encodings as specified in the Convert method.
UTF-16 Unicode encoding (UnicodeEncoding class): Converts to and from UTF-16 encoding. This scheme represents characters as sequences of 16-bit integers.
UTF-8 Unicode encoding (UTF8Encoding class): Converts to and from UTF-8 encoding. This variable-width encoding scheme represents characters using one to four bytes.
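To make the difference between the UTF-16 and UTF-8 schemes concrete, the following sketch encodes the same two-character string with both and then converts between them using the static Encoding.Convert method. The sample text and the class and variable names are illustrative assumptions, not part of the original material.

```csharp
using System;
using System.Text;

class EncodingComparison
{
    static void Main()
    {
        // Hypothetical sample text: 'A' is an ASCII character, U+00E9 (e with acute accent) is not.
        string text = "A\u00E9";

        // UTF-16 uses one 16-bit code unit (two bytes) for each of these characters.
        byte[] utf16Bytes = new UnicodeEncoding().GetBytes(text);

        // UTF-8 uses one byte for 'A' and two bytes for U+00E9.
        byte[] utf8Bytes = new UTF8Encoding().GetBytes(text);

        Console.WriteLine(utf16Bytes.Length); // 4
        Console.WriteLine(utf8Bytes.Length);  // 3

        // Encoding.Convert re-encodes a byte array from one scheme to another.
        byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
        Console.WriteLine(BitConverter.ToString(converted)); // 41-C3-A9
    }
}
```

Both byte arrays decode back to the same string; only the byte-level representation differs.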
The following code example converts a Unicode string into an array of bytes using the ASCIIEncoding.GetBytes method. Each byte in the array represents the ASCII value for the letter in that position of the string.
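A minimal sketch of such a conversion, assuming the input string is "Encoding String." (the value implied by the byte values and decoded text reported in this section); the class and variable names are illustrative:

```csharp
using System;
using System.Text;

class AsciiEncodeExample
{
    static void Main()
    {
        // Assumed input string, inferred from the output shown in this section.
        string text = "Encoding String.";

        // Encode each character of the string as its one-byte ASCII value.
        ASCIIEncoding ascii = new ASCIIEncoding();
        byte[] encodedBytes = ascii.GetBytes(text);

        // Print the byte values separated by spaces.
        Console.WriteLine(string.Join(" ", encodedBytes));
    }
}
```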
This example displays the following to the console. The byte 69 is the ASCII value for the E character; the byte 110 is the ASCII value for the n character, and so on.
69 110 99 111 100 105 110 103 32 83 116 114 105 110 103 46
The following code example converts the preceding array of bytes into an array of characters using the ASCIIEncoding class. The GetChars method is used to decode the array of bytes.
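A minimal sketch of the decoding step, assuming the byte array holds the values listed earlier in this section; the class and variable names are illustrative:

```csharp
using System;
using System.Text;

class AsciiDecodeExample
{
    static void Main()
    {
        // The ASCII byte values produced by the encoding step.
        byte[] encodedBytes =
        {
            69, 110, 99, 111, 100, 105, 110, 103,
            32, 83, 116, 114, 105, 110, 103, 46
        };

        // Decode each byte back into the character it represents.
        ASCIIEncoding ascii = new ASCIIEncoding();
        char[] decodedChars = ascii.GetChars(encodedBytes);

        // Build a string from the character array and print it.
        Console.WriteLine(new string(decodedChars));
    }
}
```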
The preceding code displays the text "Encoding String." to the console.