Overview and Description
To create software that is locale and culture aware, you'll need to first understand what is meant by the term "locale," as well as the vital role locale variables play in the development process. As you'll see, the manner in which locales are handled and interpreted differs depending on whether you're working in a traditional Win32 programming environment or within the .NET Framework. The following sections will help you understand the locale settings you'll encounter among various systems and environments.
Language Groups and Language Collections
The different languages and scripts that an operating system can support, once the user installs them, are known as "language groups" in Windows 2000. Generally speaking, a language group contains code-page information, keyboard layouts, and fonts. Some language groups also have a scripting engine, which enables the user to edit supported scripts within the operating system.
With Windows 2000 the user can add support for many languages and scripts (including, among others, Western European, Central European, Arabic, Indic, and Turkic languages). Since Windows 2000 and Windows XP are both single, world-ready binaries, functionality for all supported scripts is available on all language versions of these operating systems. If additional languages are needed, the user only needs to install these separately either during or after setup.
In Windows XP, Microsoft has simplified the process of installing these groups by gathering languages with similar properties into a language collection. On English Windows XP, support for all European languages (including Baltic languages, Cyrillic languages, Greek, and Turkic languages), known as the "Basic Language Collection," is installed by default. Additional language collections (such as East Asian for Japanese, Chinese, and Korean, as well as complex scripts for Arabic, Hebrew, the Indic family of languages, and Thai) can also be installed. (After installing a language group or collection, the user will need to restart the computer.)
Figure 1: Regional And Language Options property sheet of Windows XP, where East Asian, complex script, and right-to-left language support have been installed.
System Locale
The system locale determines which code page the system uses by default. On operating systems that use Unicode as their native encoding, such as Windows 2000 and Windows XP, this code page is used to convert text data between Unicode and the code page whenever legacy non-Unicode applications are involved. In fact, in Windows XP the system locale is called "Language for non-Unicode programs." Only applications that do not use Unicode as their default character-encoding mechanism are affected by this setting; therefore, applications that are already Unicode-encoded can safely ignore the value and functionality of this setting. (See Figure 2.) Support for new scripts in Windows 2000 and Windows XP (such as for the Indic family, Armenian, and Georgian languages) has been provided through Unicode encoding and, therefore, these scripts are also free of any system-locale limitations.
In this example, the system locale has been set to Romanian (Windows code page 1250 and Original Equipment Manufacturer [OEM] code page 852), a Central European language. This means that all Central European-language applications that are based on code pages can run safely in this configuration, since they are part of the same language collection.
So, for example, a Polish application (another Central European language) does not need a system-locale change. Also, since English is part of the invariant American Standard Code for Information Interchange (ASCII) range of all code pages, English legacy applications will always run properly.
Sometimes there is no noticeable difference between two system locales. For example, the German (Standard) and German (Austria) system locales are identical, since they share the same OEM and Windows code pages; therefore, the behavior of non-Unicode-based applications will be identical in both scenarios. In general, system locales of a language group are very similar and might only be different in the OEM code page.
As its name suggests, the system locale is a unique setting for each system. Only an administrator has the right to change the system locale, and the computer will need to be restarted in order for changes to take effect. The administrator can only select a system locale if the appropriate language group, along with its associated script support, is installed.
The system locale is a system variable that cannot be changed programmatically. The only way to change it is for an administrator to do so manually.
Figure 2: System locale can be set from the Advanced tab of the Regional And Language Options property sheet in Windows XP
User Locale
The user locale determines which default settings a user wants for formatting dates, times, currency, and large numbers. Although it's presented as a language (in some cases in combination with a country), it's not a language setting. That is, choosing the Hebrew user locale means that the user wants to adhere to the standards of Israel, not really of the Hebrew language. To avoid any confusion with this naming, the .NET Framework calls the user locale "culture information."
As its name implies, the user locale (known in Windows XP as "Standards and formats") is a variable that each individual user can set. This can be done on the fly by selecting changes from the Regional Options tab in the Regional And Language Options property sheet. (See Figure 3.) Locale-aware applications should use this value to display formatted data.
When changes are made, all locale-aware applications should monitor the window message WM_SETTINGCHANGE and should be able to update their displayed data accordingly. Numbers, currency, date, and time are some of the variables that are affected by the user-locale setting. The user locale is a user variable that cannot be changed programmatically. The only way to change it is for the user to do so manually.
Figure 3: User locale set to Konkani
Input Locale
Known as "input locale" in Windows 2000 and "input language" in Windows XP, this variable describes a language a user wants to enter into an application (not necessarily type) and the method of input. There can be multiple input locales installed and the user can switch between them. The default input locale is the locale that is active when a new application is started (or in some applications, when a new window is opened). Switching to a different input locale is done on a per-thread basis; that is, you can have two different input locales in two different applications.
Figure 4: Users can add and remove input languages from the Languages tab of the Regional And Language Options property sheet in Windows XP
Location or Geographic ID
This variable is available in Windows XP (and was also available in Microsoft Windows Millennium Edition [Windows Me] though not in Windows 2000) to define the country or location where the user lives. Each user can change this variable on the fly by selecting changes from the Regional Options tab of the Regional And Language Options property sheet. Any changes made are also applied on the fly. By selecting a particular location, the user has set a variable that a Web service (such as one that deals with weather) can check, thus allowing the Web service to deliver information and services specific to the region or country the user has selected.
Figure 5: Regional Options with the location variable set to Malaysia
Thread Locale
The thread locale defaults to the currently selected user locale and determines the formatting of dates, times, currency, and large numbers for the thread. It can be changed programmatically using the API SetThreadLocale, but in most cases the thread locale should not be overwritten.
On Microsoft Windows NT 4.0, many applications used the thread locale to define which language resources should be retrieved and displayed. However, this practice is a misuse of the thread locale. (In Windows 2000 and Windows XP, the system's resource loader does not default to the thread locale variable.) As you'll see, resource languages should always be driven by and follow the user interface (UI) language variable.
User Interface Language
The user interface (UI) language variable allows each user to select the language of the UI for such things as dialog boxes, menus, and Help files. This option is only available on the MUI Pack of Windows XP Professional and on the MultiLanguage version of Windows 2000 Professional.
It's important to distinguish between the system UI language and the user UI language. Though it is true that the user UI language is sometimes the same as the system UI, in other instances it is not. The system language is the language of the localized version that was used to set up Windows 2000 Professional or Windows XP Professional. All menus, dialog boxes, error messages, and Help files are in this language, except on multilanguage versions (such as on the MUI Pack of Windows XP Professional and the MultiLanguage version of Windows 2000 Professional), where the user can select a different language.
The user UI language on a non-MUI machine would be the same as the system UI language. With MUI, however, the user can change the language by clicking the Languages tab within the Regional And Language Options property sheet. (See Figure 6.) To see the effect of this change, the user will have to log off and then log back on.
Figure 6: The Languages tab of the Regional And Language Options property sheet, where the user UI language can be changed
Browser Language Setting
Microsoft Internet Explorer versions 4 and later share the same functionality across all Windows platforms. Since versions of Windows prior to Windows 2000 did not offer the same flexibility in terms of locale selection (no distinction between user and system locale, for example), Internet Explorer allows the user to select the browser language setting. Web sites can use this setting to offer their content in the user's selected language and to format and display data using the selected locale standards, as in the case of http://www.msn.com. To try out the browser language setting, from Internet Options, click the Languages button (see Figure 7), set your browser language to French (France), for example, and go to the MSN Hotmail site at http://www.hotmail.com. MSN Hotmail reads the selected browser setting and redirects you to the French Hotmail version.
All the locale variables described previously function independently of one another, and changing one of them does not affect the setting of the other variables. To summarize this section, here's an example involving an English version of Windows XP.
Figure 7: The browser language setting can be accessed through Internet Options on the Tools menu by clicking the Languages button
Retrieving the User Locale in Win32
Since all APIs and functions that allow locale-specific formatting take as an argument the locale ID or the culture name for which this formatting should be performed, the first step in writing locale-aware software is to retrieve the proper locale. How you retrieve this locale will depend on whether you're dealing with Win32 applications, Web pages, or the .NET Framework. In Win32, use the GetUserDefaultLCID API to retrieve the user locale value, as follows:
Since this is a user-specific setting, there is no API made public to alter its value. The user must manually change the value. Besides retrieving the currently selected user locale, you might also want to enumerate and find out about all installed or supported locales in the system. An installed locale is a locale for which the appropriate code page and language support have been installed on the system; a supported locale is the locale for which the system provides appropriate NLS information with appropriate script support installed. The code sample that follows enumerates all installed locales in the system and finds the appropriate language name associated with each LCID.
The big limitation of this enumeration is the fact that LCIDs are returned as character strings, and yet in other NLS APIs, a numeric value is expected. To address this, you can write your own string-to-integer transformation function. (Keep in mind that the character string is the actual hexadecimal representation of the LCID.) In the code sample you've just seen, the call to uiConvertStrToInt is an internal call to this transformation function.
Another thing that's noteworthy about this code sample is the call to a commonly used API called GetLocaleInfo. GetLocaleInfo can retrieve a variety of locale-specific information, from localized names for days of the week to the default paper size used for each locale. You can either specify the LCID of the locale for which the information is being retrieved, or use the predefined flag LOCALE_USER_DEFAULT. The latter defaults to the currently selected user locale and saves you from having to call GetUserDefaultLCID.
The user locale represents the user's preference for formatting locale-sensitive data for Win32 applications. In Web content, you will need to retrieve the browser language setting to represent content that corresponds to the language and locale of the user.
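The string-to-integer transformation mentioned above amounts to parsing a hexadecimal string. The following is a minimal sketch in Python, used here only for illustration rather than as the Win32 implementation; the helper names lcid_from_string and split_lcid are hypothetical, and lcid_from_string merely plays the role of uiConvertStrToInt. The second helper relies on the documented LCID layout, in which the low 16 bits hold the language identifier and the next 4 bits hold the sort identifier.

```python
def lcid_from_string(lcid_text):
    """Convert an LCID given as a hexadecimal string (for example
    "00000409") into the integer form that other NLS calls expect."""
    return int(lcid_text.strip(), 16)


def split_lcid(lcid):
    """Split a numeric LCID into (language_id, sort_id).

    Bits 0-15 are the language identifier; bits 16-19 are the sort
    identifier.
    """
    language_id = lcid & 0xFFFF
    sort_id = (lcid >> 16) & 0xF
    return language_id, sort_id


if __name__ == "__main__":
    lcid = lcid_from_string("00000409")   # English (United States)
    print(hex(lcid), split_lcid(lcid))
```

A caller would typically run each enumerated string through the conversion once and pass the resulting integer to any API that takes an LCID.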
Retrieving the Current CultureInfo in .NET Framework
The CultureInfo class holds culture-specific information, such as the associated language, country or region, calendar, and cultural conventions. The CurrentCulture property gets the CultureInfo instance that represents the culture used by the current thread and returns the value of Thread.CurrentCulture. By default, the CurrentCulture property is set to the currently selected user locale of the system, as set by the user in the Regional And Language Options property sheet. Properties and standards of this default culture should be used to represent formatted data to the user. You can also explicitly set the value of CurrentCulture to a given culture name in your code (for the purpose and usage of your application only and not through the system). The following example sets the CurrentCulture to Finnish (Finland) in C#:
Thread.CurrentThread.CurrentCulture = new CultureInfo("fi-FI");
The CurrentCulture property expects a culture that is associated with both a language and a region, such as "es-ES" for Spanish in Spain. Because a language is often spoken in more than one country or region, the regional information is necessary to determine the appropriate formatting conventions to use. A neutral culture cannot be assigned to the CurrentCulture property. If you only have access to a neutral culture, you can create a CultureInfo object in the format that the CurrentCulture property expects. This is done by using the CultureInfo.CreateSpecificCulture method. This method maps a neutral culture to the default specific culture it is associated with, and then creates a CultureInfo object that represents that specific culture. The following code example uses the CultureInfo.CreateSpecificCulture method to map the neutral culture "it" for Italian to the specific culture "it-IT." It then uses the resulting CultureInfo object to initialize the value of the CurrentCulture property.
Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("it");
Similar to the Win32 NLS paradigm, the CultureInfo class allows you to enumerate all installed or supported cultures. The CultureInfo.GetCultures method can be used for that purpose, as shown in the following example where all supported locales are enumerated:
Retrieving the Browser Language Setting in Web Pages
By default, the global locale of your Web content will always match the following:
When it comes to locale-aware and culture-aware Web design, it's important to represent the data in the client-side format rather than defaulting to the server-side setting. Suppose your server is hosted on an English machine with an English (United States) user locale, but its content is viewed by an English (United Kingdom) user, where the date formatting goes from the English (United States) format of mm/dd/yy to the English (United Kingdom) format of dd/mm/yy.
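The difference between the two formats can be made concrete with a small sketch. It is written in Python purely for illustration; the SHORT_DATE_PATTERNS table and the format_short_date helper are assumptions made for this example, not part of any Windows or ASP API. A real site would take the pattern from the locale data of the platform rather than from a hard-coded table.

```python
from datetime import date

# Hypothetical lookup table covering only the two locales discussed above.
SHORT_DATE_PATTERNS = {
    "en-US": "%m/%d/%y",  # mm/dd/yy
    "en-GB": "%d/%m/%y",  # dd/mm/yy
}


def format_short_date(value, locale_name):
    """Format a date with the short-date pattern of the client's locale."""
    return value.strftime(SHORT_DATE_PATTERNS[locale_name])


if __name__ == "__main__":
    sample = date(2003, 4, 5)  # 5 April 2003
    print(format_short_date(sample, "en-US"))
    print(format_short_date(sample, "en-GB"))
```

The same date renders as 04/05/03 for the United States reader and 05/04/03 for the United Kingdom reader, which is why a server-side default can silently mislead the client.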
The browser language setting is commonly used by multilingual Web sites to define the default language in which their content should be presented to the user, as well as the locale whose standards the data formatting should follow. The technique of trying to get the browser-setting information from the client side is usually referred to as "browser sniffing." The user can set the language in the browser. For example, in Internet Explorer choose Internet Options from the Tools menu, and then click the Languages button of the General tab to choose one or more preferred languages. Other browsers support the same functionality; Netscape 4.x and 6.x allow the user to set this information by clicking Preferences in the Edit menu. If using Internet Explorer, the user can also change regional settings and have the browser automatically pick up the new language choice. This information is sent to the server in the form of a server variable known as "HTTP_ACCEPT_LANGUAGE." You can retrieve it in ASP with VBScript code such as the following:
This string, now sitting in the stLang variable, can be used in many different ways to control the content of your site. For example, you can:
The only difficulty in retrieving the browser language is that if you select multiple languages, your HTTP_ACCEPT_LANGUAGE string will look something like the following:
In this string, locales are separated by a comma (in this case, Estonian, English (United States), Farsi, Italian (Italy), and French (France)). The "q=" represents the priority of each language to help create a fallback mechanism. Suppose the preferred language is Estonian (q=1 by default). However, if you are not offering any support for this language, you can parse the HTTP_ACCEPT_LANGUAGE string for the next preferred language (q=0.8 for English).
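The fallback mechanism described above can be sketched as follows. The example is in Python for illustration only; the function names parse_accept_language and pick_language are hypothetical. The header value in the demonstration is also an assumption: only q=0.8 for English is taken from the text, and the remaining q-values are made up for the example.

```python
def parse_accept_language(header):
    """Return (language_tag, q) pairs ordered from most to least preferred.

    A tag with no explicit "q=" gets q=1.0. Ties keep their order of
    appearance in the header.
    """
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        entries.append((tag.strip().lower(), quality, position))
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, quality) for tag, quality, _ in entries]


def pick_language(header, supported, default="en-us"):
    """Pick the most preferred tag that the site actually supports."""
    supported_tags = {tag.lower() for tag in supported}
    for tag, quality in parse_accept_language(header):
        if quality > 0 and tag in supported_tags:
            return tag
    return default


if __name__ == "__main__":
    # Hypothetical header; only q=0.8 for English comes from the text.
    header = "et,en-us;q=0.8,fa;q=0.6,it-it;q=0.4,fr-fr;q=0.2"
    print(pick_language(header, ["en-us", "fr-fr"]))
```

With a site that offers only English (United States) and French (France), Estonian is skipped and the sketch falls back to en-us, the next preferred language in the string.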
You can then use the SetLocale function to set the global locale of your Web content. (This setting will only be applicable to your session and context.) The following example explicitly sets this locale to Estonian, and the original variable keeps the previous or original locale:
The navigator object navigator.userLanguage can also be used to retrieve the browser locale.
Locale Model in Console Mode
Language-specific operations of console applications can follow the locale settings of the system. This will guarantee locale-authentic formatting of numbers, date, time, currency values, and collation (sorting) operations. Locale support is available for text-mode applications through both C run-time (CRT) and Win32. However, these mechanisms should never be mixed. For new Windows applications, Win32 mechanisms are preferred over those in CRT with regard to world-ready code.
C Run Time
CRT locale support is built around the (_w)setlocale(category, locale) call. A call to this function defines the results of all subsequent CRT-based locale-sensitive operations, not only the character encoding. The category argument defines the scope of the environment changes made when setlocale is called.
In order to set the rules for formatting locale-sensitive data in accordance with the user locale, the following calls can be executed:
".OCP" and ".ACP" parameters always refer to the settings of the user locale, not the system locale. While selecting this locale for LC_CTYPE or LC_ALL is not a good choice, all other categories should be set to match the user locale, unless your console must be explicitly independent of the user's settings.
Win32 and .NET Framework
Console applications based on the Win32 API and .NET Framework are no different from other Win32 or .NET applications in their ability to access the Win32 NLS API or the System.Globalization namespace. Using them, however, might cause some display problems because of the special rules that apply to console input/output (I/O). The user locale or CurrentCulture might require the date and time to be formatted using characters in a range unsupported by the console. For example, you cannot write out dates in a long Hebrew format or times with Arabic A.M./P.M. symbols. Some special design decisions must be made to avoid displaying culturally correct yet unreadable data. However, you should avoid taking the easy route of hard-coding the format for your output.