Double-byte Character Sets
A double-byte character set (DBCS), also known as an "expanded 8-bit character set", is an extended single-byte character set (SBCS), implemented as a code page. DBCSs were originally developed to extend the SBCS design to handle languages such as Japanese and Chinese. Some characters in a DBCS, including the digits and letters used for writing English, have single-byte code values. Other characters, such as Chinese ideographs or Japanese kanji, have double-byte code values. A DBCS can correspond to either a Windows code page or an OEM code page. A DBCS code page can also be a non-native code page, for example, an EBCDIC code page. For definitions of these code pages, see Code Pages.
To interpret a DBCS string, an application must start at the beginning of the string and scan forward. Whenever it encounters a lead byte, it treats the next byte as the trail byte of the same character. If the application instead scans the string one byte at a time and encounters a byte that appears to be the code value for a backslash ("\"), that byte might actually be the trail byte of a two-byte character. The application cannot resolve the ambiguity by backing up one byte to check whether the preceding byte is a lead byte, because that byte value might be valid as both a lead byte and a trail byte. The preceding byte is therefore just as ambiguous as the possible backslash. As a result, substring searches are much more complicated with a DBCS than with either an SBCS or Unicode. Accordingly, applications that support a DBCS must use special functions, such as _mbsstr, instead of the StrStr function.
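The forward scan described above can be sketched in portable C. This is a minimal illustration, not a replacement for _mbsstr: the helper names (is_lead_byte_932, find_byte_naive, find_single_byte_char) are hypothetical, and the lead-byte ranges are hard-coded on the assumption that the string is encoded in Shift-JIS (Windows code page 932). A real application would obtain lead-byte information from the system instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: lead-byte ranges of Shift-JIS (code page 932),
   0x81-0x9F and 0xE0-0xFC. Other DBCS code pages use other ranges. */
int is_lead_byte_932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive byte-wise search. It can report a trail byte as a match,
   which is the failure mode described in the text. */
const char *find_byte_naive(const char *s, char target)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == target)
            return s;
    }
    return NULL;
}

/* DBCS-aware search for a single-byte character. It scans forward from
   the start of the string and skips the trail byte after every lead byte,
   so a trail byte is never compared against the target. */
const char *find_single_byte_char(const char *s, char target)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (is_lead_byte_932((unsigned char)*s)) {
            if (s[1] == '\0')
                break;      /* truncated character at end of string */
            s += 2;         /* skip lead byte and trail byte together */
            continue;
        }
        if (*s == target)
            return s;
        ++s;
    }
    return NULL;
}
```

For the byte sequence 0x95 0x5C 'a' '\' 'b', where 0x95 0x5C is a single two-byte character in code page 932, the naive search reports the trail byte at offset 1 as a backslash, while the DBCS-aware search skips it and finds the real backslash at offset 3.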
Applications use DBCS Windows code pages with the "A" versions of Windows functions. For details, see Conventions for Function Prototypes and Code Pages. To help identify a DBCS code page, an application can use the GetCPInfo or GetCPInfoEx function. An application can use the IsDBCSLeadByte function to determine whether a given value can be used as the lead byte of a two-byte character. In addition, an application can use the MultiByteToWideChar and WideCharToMultiByte functions to map between Unicode and DBCS strings.
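As a rough sketch of how lead-byte information might be consumed, the following portable C code expands a list of lead-byte ranges into a 256-entry lookup table and uses it to count characters. The range layout assumed here is the one documented for the LeadByte member of the CPINFO structure filled in by GetCPInfo: inclusive (first, last) pairs terminated by two zero bytes. The function names build_lead_byte_table and dbcs_char_count are hypothetical, and the code does not call the Windows API itself.

```c
#include <assert.h>
#include <string.h>

/* Set table[b] to 1 for every lead byte b, and to 0 otherwise.
   Assumption: ranges holds inclusive (first, last) byte pairs ending
   with two zero bytes; max_len bounds the scan if no terminator is found. */
void build_lead_byte_table(const unsigned char *ranges, size_t max_len,
                           unsigned char table[256])
{
    size_t i;
    unsigned int b;

    assert(ranges != NULL && table != NULL);
    memset(table, 0, 256);
    for (i = 0; i + 1 < max_len && (ranges[i] != 0 || ranges[i + 1] != 0); i += 2) {
        for (b = ranges[i]; b <= ranges[i + 1]; ++b)
            table[b] = 1;
    }
}

/* Count characters (not bytes) by scanning forward from the start.
   A lead byte with no following byte is counted as one character. */
size_t dbcs_char_count(const char *s, const unsigned char table[256])
{
    size_t n = 0;

    assert(s != NULL && table != NULL);
    while (*s != '\0') {
        if (table[(unsigned char)*s] && s[1] != '\0')
            s += 2;     /* two-byte character */
        else
            s += 1;     /* single-byte character */
        ++n;
    }
    return n;
}
```

With the sample ranges { 0x81, 0x9F, 0xE0, 0xFC, 0, 0 }, which are assumed here to describe code page 932, the three-byte string 0x95 0x5C 'a' counts as two characters. On Windows, the same table could instead be filled from the ranges that GetCPInfo returns for the code page in use.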