Encoding Support for Code Pages

The use of Unicode in the .NET Framework simplifies the development of world-ready applications because the applications no longer need to reference a code page. A code page is a list of selected character codes (characters represented as code points) in a certain order. Code pages are usually defined to support specific languages or groups of languages that share common writing systems.

Single-byte Windows code pages contain 256 code points and are zero-based. In most code pages, the code points 0 through 127 represent the same characters, which preserves continuity with legacy code. The code points 128 through 255 differ significantly between code pages. For example, code page 1253 provides the character codes that are required for the Greek writing system. Code page 1252 provides the characters for Latin writing systems, including English, German, and French. The last 128 code points in code page 1253 contain the Greek characters, and the last 128 code points in code page 1252 contain the accented characters. As a result, your application cannot store Greek and German in the same data stream unless it includes an identifier that indicates the referenced code page.
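
The difference between the upper halves of two code pages can be seen by decoding the same bytes twice. The following is a minimal sketch, not a definitive implementation. The class name CodePageComparison and the helper Decode are hypothetical names chosen for illustration, and the RegisterProvider call is an assumption that applies only when the code runs on .NET Core or .NET 5 and later.

```csharp
using System;
using System.Text;

public static class CodePageComparison
{
    // Hypothetical helper: decodes the given bytes using the specified code page.
    public static string Decode(int codePage, byte[] data)
    {
        return Encoding.GetEncoding(codePage).GetString(data);
    }

    public static void Main()
    {
        // Assumption: required only on .NET Core and .NET 5+, where code page
        // encodings are not registered by default. Remove on the .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // 0x41 is below 128 and is shared; 0xE4 is above 127 and is not.
        byte[] data = { 0x41, 0xE4 };

        string latin = Decode(1252, data);
        string greek = Decode(1253, data);

        // Print code points rather than glyphs so that the result does not
        // depend on the console's own encoding.
        Console.WriteLine("1252: U+{0:X4} U+{1:X4}", (int)latin[0], (int)latin[1]);
        Console.WriteLine("1253: U+{0:X4} U+{1:X4}", (int)greek[0], (int)greek[1]);
        // 1252: U+0041 U+00E4  (A, Latin small letter a with diaeresis)
        // 1253: U+0041 U+03B4  (A, Greek small letter delta)
    }
}
```

The byte 0x41 decodes to the same character in both code pages, while the byte 0xE4 does not, which is why the stream must carry an indication of the code page it was written with.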

The double-byte character set (DBCS) scheme was developed for languages, such as Chinese, Japanese, and Korean, whose writing systems use more than 256 characters. In a DBCS, a pair of bytes (a double byte) represents each of these characters. When an application handles DBCS data, the first byte of a DBCS character (the lead byte) is not processed by itself. It is processed in combination with the trail byte that immediately follows it. This scheme still does not allow two languages, such as Japanese and Chinese, to be combined in the same data stream, because the same pair of bytes can represent different characters, depending on the code page.
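
The same ambiguity can be sketched for double-byte data. The example below assumes code page 932 (Japanese) and code page 936 (Simplified Chinese) as the two DBCS code pages; the class name DbcsComparison and the helper Reinterpret are hypothetical, and the RegisterProvider call is again an assumption for .NET Core and .NET 5 and later.

```csharp
using System;
using System.Text;

public static class DbcsComparison
{
    // Hypothetical helper: encodes text with one code page and decodes the
    // resulting bytes with another, as happens when the code page is unknown.
    public static string Reinterpret(string text, int writtenWith, int readWith)
    {
        byte[] bytes = Encoding.GetEncoding(writtenWith).GetBytes(text);
        return Encoding.GetEncoding(readWith).GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: required only on .NET Core and .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string original = "\u65E5"; // a single CJK character

        // In code page 932 this character is a lead byte followed by a trail byte.
        byte[] bytes = Encoding.GetEncoding(932).GetBytes(original);
        Console.WriteLine("Bytes in 932: {0}", BitConverter.ToString(bytes));

        string asJapanese = Reinterpret(original, 932, 932);
        string asChinese = Reinterpret(original, 932, 936);

        Console.WriteLine("Read as 932: U+{0:X4}", (int)asJapanese[0]);
        Console.WriteLine("Read as 936: U+{0:X4}", (int)asChinese[0]);
        Console.WriteLine("Same character: {0}", asJapanese == asChinese);
    }
}
```

Reading the bytes with the code page that wrote them recovers the original character, while reading them with the other code page yields a different result.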

The .NET Framework provides support for characters encoded using code pages. Your application can use the Encoding.GetEncoding method to create an encoding object for a specified code page. The following code example creates an encoding for code page 1252.

Encoding enc = Encoding.GetEncoding(1252);

After your application creates an Encoding object that corresponds to a specified code page, the application can use the object to perform other operations supported by the Encoding class. For an example of using this class, see the "Using the Encoding Class" subtopic of the Using Unicode Encoding topic.
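
As a rough illustration of such operations, the sketch below encodes a string to code page 1252 bytes, decodes it again, and converts the bytes to UTF-8. The class name EncodingOperations, the helper ToUtf8, and the sample text are hypothetical choices for illustration; the RegisterProvider call is an assumption that applies only on .NET Core and .NET 5 and later.

```csharp
using System;
using System.Text;

public static class EncodingOperations
{
    // Hypothetical helper: converts bytes from the given code page to UTF-8.
    public static byte[] ToUtf8(byte[] codePageBytes, int codePage)
    {
        return Encoding.Convert(Encoding.GetEncoding(codePage), Encoding.UTF8, codePageBytes);
    }

    public static void Main()
    {
        // Assumption: required only on .NET Core and .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding enc = Encoding.GetEncoding(1252);

        // German sample text: G, r, u with diaeresis, sharp s, e.
        string text = "Gr\u00FC\u00DFe";

        // Code page 1252 stores each of these characters in one byte.
        byte[] ansi = enc.GetBytes(text);
        string roundTrip = enc.GetString(ansi);

        // UTF-8 needs two bytes for each of the two non-ASCII characters.
        byte[] utf8 = ToUtf8(ansi, 1252);

        Console.WriteLine("Code page 1252 bytes: {0}", ansi.Length); // 5
        Console.WriteLine("UTF-8 bytes: {0}", utf8.Length);          // 7
        Console.WriteLine("Round trip preserved text: {0}", roundTrip == text);
    }
}
```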
