Unicode in the .NET Framework

The .NET Framework uses Unicode UTF-16 (Unicode Transformation Format, 16-bit encoding form) to represent characters. In some cases, the .NET Framework uses UTF-8 internally.
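As a rough illustration of what a 16-bit encoding form means, the sketch below splits a string into its UTF-16 code units. It is written in Python purely for demonstration and is not .NET code; the helper name utf16_code_units is hypothetical. A character in the Basic Multilingual Plane occupies one 16-bit code unit, while a supplementary character occupies two (a surrogate pair).

```python
# Illustration only: Python is used here to inspect UTF-16 code units.
# The helper name `utf16_code_units` is hypothetical, not a .NET or Python API.

def utf16_code_units(text):
    """Return the 16-bit code units of `text` in the UTF-16 encoding form."""
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

# U+0041 lies in the Basic Multilingual Plane: one code unit.
bmp_units = utf16_code_units("A")

# U+1D11E (MUSICAL SYMBOL G CLEF) is a supplementary character: two code units.
supplementary_units = utf16_code_units("\U0001D11E")

print([f"{u:04X}" for u in bmp_units])            # ['0041']
print([f"{u:04X}" for u in supplementary_units])  # ['D834', 'DD1E']
```

The practical consequence is that the number of code units in a UTF-16 string can be larger than the number of characters a reader would count.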

The Unicode Standard is the universal character encoding scheme for characters and text. It assigns a unique numeric value, called a code point, and a name to each character used in the written languages of the world. For example, the character "A" is represented by the code point U+0041 and the name "LATIN CAPITAL LETTER A". The Unicode codespace runs from U+0000 through U+10FFFF, a total of 1,114,112 code points: 65,536 in the Basic Multilingual Plane, with room for about one million more in the supplementary planes. For more information, see The Unicode Standard at the Unicode home page.
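The mapping from a character to its code point and name can be checked with a few lines of code. The following is a minimal sketch in Python, used only for illustration; it relies on the standard unicodedata module, and the helper name describe is hypothetical.

```python
import sys
import unicodedata

def describe(ch):
    """Return the 'U+XXXX' notation and the Unicode name of a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

print(describe("A"))  # ('U+0041', 'LATIN CAPITAL LETTER A')

# The highest code point in the Unicode codespace is U+10FFFF.
print(f"U+{sys.maxunicode:X}")  # U+10FFFF
```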

In the past, the differing language requirements of different cultures forced applications to use diverse encodings to represent data internally. These diverse encoding schemes forced developers to create fragmented code bases for operating systems and applications, such as single-byte editions for European languages, double-byte editions for Asian languages, and bidirectional editions for Middle Eastern languages. This fragmentation made it difficult to share data between cultures and even more difficult to develop world-ready applications that support a multilingual user interface.

The Unicode data encoding scheme simplifies world-ready application development because it allows all international characters to be represented in a single encoding. Application developers no longer have to keep track of the encoding scheme that was used to produce characters for a specific language, and data can be shared among systems internationally without being corrupted.
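The difference between legacy encodings and a single Unicode encoding can be sketched as follows. The example is in Python for illustration only; the sample text, the list of legacy code pages, and the helper names can_encode and round_trips are assumptions chosen for the demonstration. No single legacy code page in the list can represent the mixed-language sample, whereas UTF-8 and UTF-16 both round-trip it unchanged.

```python
# Illustration only: sample text and helper names are hypothetical.
SAMPLE = "R\u00e9sum\u00e9 \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac \u05e2\u05d1\u05e8\u05d9\u05ea \u65e5\u672c\u8a9e"

# Western European, Greek, Hebrew, and Japanese legacy encodings.
LEGACY_CODE_PAGES = ("cp1252", "cp1253", "cp1255", "shift_jis")

def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def round_trips(text, encoding):
    """Return True if encoding and then decoding `text` gives it back unchanged."""
    return text.encode(encoding).decode(encoding) == text

for code_page in LEGACY_CODE_PAGES:
    print(code_page, can_encode(SAMPLE, code_page))

for unicode_encoding in ("utf-8", "utf-16"):
    print(unicode_encoding, round_trips(SAMPLE, unicode_encoding))

# Bytes written under one code page and read under another are silently corrupted.
misread = "R\u00e9sum\u00e9".encode("cp1252").decode("cp1253")
print(misread == "R\u00e9sum\u00e9")
```

With a Unicode encoding, the reader of the data does not need to know which language-specific code page the writer used.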

See Also

Concepts

Encoding Base Types

Other Resources

Encoding and Localization