Single-Byte and Multibyte Character Sets
The new home for Visual Studio documentation is Visual Studio 2017 Documentation on docs.microsoft.com.
The latest version of this topic can be found at Single-Byte and Multibyte Character Sets.
The ASCII character set defines characters in the range 0x00 – 0x7F. There are a number of other character sets, primarily European, that define the characters within the range 0x00 – 0x7F identically to the ASCII character set and also define an extended character set from 0x80 – 0xFF. Thus an 8-bit, single-byte-character set (SBCS) is sufficient to represent the ASCII character set as well as the character sets for many European languages. However, some non-European character sets, such as Japanese Kanji, include many more characters than can be represented in a single-byte coding scheme, and therefore require multibyte-character set (MBCS) encoding.
Many |
A multibyte character set may consist of both one-byte and two-byte characters. Thus a multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte. In a particular multibyte-character set, the lead bytes fall within a certain range, as do the trail bytes. When these ranges overlap, it may be necessary to evaluate the particular context to determine whether a given byte is functioning as a lead byte or a trail byte.