© 2004 Microsoft Corporation. All rights reserved.

Unicode Encodings
Each Unicode code page corresponds to one of the Unicode encodings, such as UCS-2, UTF-16, or UTF-8. A Unicode encoding translates Unicode characters into a sequence of bytes that your application can store or transmit. If you use 32-bit words and save a file such that each group of four bytes represents one character, you are using UCS-4. If you write exactly two bytes per character, you are using UCS-2, which can represent only characters with values up to 0xFFFF. UTF-16 also uses 16-bit values, but it represents each character above 0xFFFF as a pair of 16-bit values (a surrogate pair). UTF-8 is a variable-length encoding that uses one to four bytes per character. In UTF-8, a byte that has the high bit cleared is a character having a value of less than 0x80. In other words, ASCII letters are directly readable, and pure ASCII text is unchanged if encoded in UTF-8. On Windows, text described simply as Unicode is typically stored as 16-bit values (UTF-16), so most characters occupy two bytes. For more information, see http://www.microsoft.com/typography/unicode/cs.htm.
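
As a rough illustration of how the encodings differ in size, the following Python sketch encodes a few sample characters and counts the resulting bytes. The function name encoded_sizes and the sample characters are chosen for this example only. Python's utf-32 codec is used here to stand in for UCS-4, and the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the number of bytes each encoding needs for `text`."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # little-endian, no byte order mark
        "utf-32": len(text.encode("utf-32-le")),  # four bytes per character, like UCS-4
    }


# An ASCII letter, an accented Latin letter, and a character above 0xFFFF.
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "MUSICAL SYMBOL G CLEF": "\U0001D11E",
}

for name, character in samples.items():
    print(name, encoded_sizes(character))

# Pure ASCII text is byte-for-byte identical after UTF-8 encoding.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))
```

The ASCII letter takes one byte in UTF-8 and two in UTF-16, while the character above 0xFFFF takes four bytes in every encoding shown, because UTF-16 needs a surrogate pair for it.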

Language Preferences Set by Users
Most Web pages contain information that tells the browser what language encoding (the language and character set) to use. If a page does not include that information but the user has the Language Encoding Auto-Select feature on (from the Encoding item on the View menu), Microsoft Internet Explorer can usually determine the language encoding. Users can set their language preferences from the Internet Options item on the Tools menu. Users who speak several languages can arrange them by priority. If a Web site offers content in multiple languages, the content appears in the offered language to which the user has assigned the highest priority. The browser sends these preferences to the server, where they are available in the HTTP_ACCEPT_LANGUAGE variable, so the contents of that variable depend on whether the user has selected a language preference.
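
The priority-based selection described above could be sketched on the server side as follows. This is a minimal Python illustration, not a description of how any particular server implements it. It assumes the value of HTTP_ACCEPT_LANGUAGE follows the usual Accept-Language header form (for example, fr-CA,fr;q=0.8,en;q=0.5), and the function names parse_accept_language and choose_language, along with the fallback default of "en", are hypothetical.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, highest priority first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has the default weight of 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable weight as "not acceptable"
        if quality > 0:
            # Sort by descending weight, then by original position in the header.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]


def choose_language(header, available, default="en"):
    """Pick the highest-priority language that the site offers (hypothetical helper)."""
    offered = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(header):
        lowered = tag.lower()
        if lowered in offered:
            return offered[lowered]
        primary = lowered.split("-")[0]  # fall back from "fr-CA" to "fr"
        if primary in offered:
            return offered[primary]
    return default


print(parse_accept_language("fr-CA,fr;q=0.8,en;q=0.5"))
print(choose_language("de;q=0.4,ja;q=0.9", ["en", "de", "ja"]))
```

If the user has not selected a language preference, the header may be empty or absent, in which case this sketch falls back to the default language.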