Locale Hero

Enable Global Apps With Locale Builder And Windows Vista

Kieran Snyder and Shawn Steele

This article is based on a prerelease version of Windows Vista. All information contained herein is subject to change.

This article discusses:
  • How locale data affect your application
  • Creating custom locales that work
  • Dates, times, numbers, and calendars
  • Text rendering in a locale
This article uses the following technologies:
Windows Vista

Contents

Microsoft Locale Builder
Creating or Modifying a Locale
Creating Custom Locales that Work
Date and Time Formatting
Numbers, Currency, and Text
How Locale Data Affect Your Application
Time, Dates, and Calendars
Text Rendering in a Locale
When Not to Use Locale Data
Pseudo-Locales for Testing
What's Next?

Internationalization support in Windows Vista will exist on a scale not seen in previous releases of Windows®. When Windows Vista™ ships, it will have built-in enablement infrastructure for over 130 languages. This will offer an enhanced experience not only for languages that Windows has supported before, but also for many more customers around the globe, including users in Latin America, Asia, and Africa. As a result, the expanded internationalization infrastructure introduced in Windows Vista and the Microsoft® .NET Framework 2.0 opens up a new world of potential customers for application developers as well.

Windows enablement support is organized around the concept of locales. A locale is a set of data associated with a particular language-region pairing (for example, English in Canada, Greek in Greece, or Telugu in India) that may be used to help shape the user experience. Windows Vista will include over 200 unique locales that will provide data appropriate for customers around the world, including calendar support with pertinent date and time formats; text formatting, including a set of relevant Unicode code points and information about text directionality and writing system; font and text rendering support; appropriate numeral and currency standards and formatting; language and region display names; linguistically appropriate sorting behavior; and keyboard layout or other relevant input method.

The locale the user selects informs many aspects of his user experience. Locale choice determines whether the user sees a 12- or 24-hour clock; what keyboard he uses by default while typing; whether algebra.txt or zebra.png shows up first when he looks at a list of file names; and what calendar he sees in Microsoft Outlook®. The effects of locale selection are pervasive, and the range of locale data creates some unique opportunities and challenges for developers who want to make their applications globally aware. By using locale data correctly, a developer can tailor applications to create a linguistically and culturally appropriate user experience for a wide range of customers.

And the range seen in existing Windows Vista locales is just the beginning. With the introduction of Microsoft Locale Builder, you will have the ability to create your own highly customized locale support and share it with customers, colleagues, and broad user communities. Locale Builder allows users to replace a particular piece of data inside a built-in Windows locale or to create a locale for a language-region pairing that is not already supported by Windows. This means that developers will have the opportunity to return highly customized locale data in their applications.

By using the Windows font and rendering support that is available for many of the world's writing systems along with Locale Builder and Microsoft Keyboard Layout Creator, you can create applications that are linguistically accessible for over 90 percent of the world's literate population. That's over 3.5 billion people!

In this article we introduce you to Locale Builder and give you some tips on how to create a custom locale with data that is optimized for application functionality. We also include some development best practices for creating applications that can take advantage of a broad range of locale data to create a customized experience for global customers. Our goal is to enable customers to use our locale infrastructure and tools to provide a globally appropriate user experience across Windows-based applications.

Microsoft Locale Builder

Windows has always provided a way for users to customize their locale data, using the Regional and Language Options control panel to select their locale and customize the time or date formats. However, customization through Regional and Language Options is limited to existing locales and relatively simple variations. Additionally, any changes specified in Regional and Language Options do not persist if the locale selection is changed and then changed back.

Although the level of locale support in Windows Vista represents a significant increase in global coverage, it is still relatively small compared to the number of languages and regions in the world. As we go to press, Ethnologue.com lists 6,912 known living languages. Multiply that times a couple hundred countries. Maintaining data on even a fraction of the possible combinations is logistically challenging, particularly for regions where computers are still rare. Additionally, the data for a specific language or region is not always consistent across the preferences of an individual or a group. In these cases, it makes sense to allow users to build the exact locale they need.

Locale Builder was created to help Windows Vista users and administrators easily create new locales or customize existing locales to meet their own needs. Once you've created a custom locale, you can install it on your machine, include it as a package to be installed on a customer's machine, or share it with other people who might be interested in your locale.

Creating or Modifying a Locale

The first step when creating locale support is to see if Windows already supports the locale you want to produce or one that is similar. It is also possible that other users have created a locale for the language or culture that you need. Engaging with a local university or government may be all you need to find a locale appropriate for your use.

If you cannot find an existing locale appropriate for your needs, you can use Locale Builder to make your own. You may want to modify an existing locale if the current support for your language and region doesn't match your expectations, or if a value changes or is wrong, such as when a country adopts the Euro currency after Microsoft has released the new locale. We use the term replacement locale to describe a locale with customized data that is intended to replace a locale shipped by Microsoft. Replacement locales will change the behavior of the target locale for all users of the computer. For instance, if you create a replacement locale for en-US that uses the 24-hour clock, then all applications calling Windows APIs for en-US time support will use the 24-hour setting.

You can also choose to create an entirely new locale, known as a supplemental locale. Locale Builder guides you to use an existing locale as a template for your new locale. This is useful particularly if the language or region for which you are creating support already exists in another locale.

When you start Locale Builder you will be asked for the name of your new locale and the existing locale on which you'd like to base your custom solution, as shown in Figure 1. Names generally follow the RFC 4646 language-region form, which is built from ISO language and country codes, such as haw-US for Hawaiian (United States). In the case of a replacement locale, the new locale name will be the same as the existing locale name. Locale Builder gives a short preview of what the source locale data look like. Click Next when the name and template locale have been selected.

Figure 1 Selecting the Starting Point for a Custom Locale


After selecting the basic locale name, you have the opportunity to edit the properties of that locale from the locale formats window as shown in Figure 2. Choose from items such as locale names and standards information, number and currency formatting, time and dates patterns, or other kinds of linguistic and cultural information. Each group allows a range of fields that can be customized to the needs of your particular locale.

Figure 2 Editing Locale Properties


After the locale has been created, you will want to save a Microsoft Installer .msi file so that you can install it, and you will also want to save a source .ldml file so that you can easily make changes later. Click the Save button in the locale editing window to see the save dialog. Choose your file name and location. The locale name usually makes an appropriate file name.

To install your custom locale, just double-click the .msi that you created. Note that you have to be an administrator to install a custom locale. If you want to share the locale with other users you can share either the .msi or the .ldml file. If you need to install the locale as part of a larger distribution you may want to create an .msm file to use with that distribution's installer.

Once it's installed you can then select your new custom locale from the Regional and Language Options control panel (see Figure 3). Verify that the examples display the content you expect and then test your custom locale in appropriate applications, such as Outlook, Windows Explorer, and any applications important to your needs. You can use the Uninstall a Program Control Panel option to remove a locale previously installed via the .msi installer.

Figure 3 Saving a Custom Locale


Creating Custom Locales that Work

So now you know how to use Locale Builder, but the big reason you're building a custom locale is that existing Windows locales don't meet all of your requirements. Maybe there's a locale for your language and region, but you prefer a 24-hour clock. Maybe the default character set is Cyrillic, but new legislation has passed requiring school systems in your region to use Latin. Perhaps your country has recently joined the European Union and is adopting the Euro, and you need a currency symbol update. Or maybe Windows doesn't support your locale at all.

We've received feedback from customers concerning all of these scenarios and more. The fact is that linguistically and culturally appropriate data changes over time, and standards and user expectations change to match. This can create sets of users that have very different target user experiences. If you're building a custom locale, it's because you or your customers want something different from what Windows provides today. And there's a pretty good chance that you have a clear idea of just what it is that you need to add or change, whether it's an entire locale or just a piece of an existing locale.

This section presents some best practices for moving forward once you've identified linguistically and culturally appropriate data for your custom locale. By following these best practices, you can ensure that your locale data are optimized for application functionality.

Windows locales have historically been labeled with numeric identifiers called LCIDs. Windows Vista introduces a set of APIs that use string-based locale identifiers to complement the APIs that take LCIDs. Microsoft is migrating away from LCIDs in favor of string-based identifiers for our new locale support, as the meaning of string-based identifiers is more transparent and strings are more extensible for users creating custom locales. If you use Locale Builder to create a supplemental locale, one thing you'll need to do is select an appropriate identifier.

The identifiers used for Microsoft locales in both Windows and .NET mirror the Internet Engineering Task Force (IETF) standard for locale identifiers as closely as possible. The IETF standard relies on RFC 4646, which in turn uses ISO 639 tags to identify languages and ISO 3166 tags to identify regions. The standard relies on full, descriptive Unicode script names and provides a syntax for combining these tags to identify a set of locale data. Identifiers are of the form:

ll(l)-Ssss-CC

In this syntax, ll(l) denotes the two-letter language tag (or the three-letter tag if there is no ISO two-letter code for your language), Ssss denotes an optional script tag, and CC denotes the region tag. In Windows locales, we only include the script tag if there are multiple locales supporting a particular language-region pairing but using different writing systems. Some examples of Windows locale identifiers are shown in Figure 4.

Figure 4 Examples of Windows Locale Identifiers

Identifier    Meaning
en-CA         English (Canada)
fr-FR         French (France)
ja-JP         Japanese (Japan)
quz-BO        Quechua (Bolivia)
uz-Latn-UZ    Uzbek (Latin, Uzbekistan)
uz-Cyrl-UZ    Uzbek (Cyrillic, Uzbekistan)

Rarely, users may need language codes for concepts that aren't addressed by ISO 639 or RFC 4646. In those cases ISO 639 provides the user-assigned codes qaa through qtz, which you may use. Similarly, ISO 3166 provides user-assigned region codes such as QMA to QZZ and 900 to 999. RFC 4646 also allows any label after an x label to be user defined. Some examples of user-assigned codes would be en-QMA, qaa-QZZ, and en-x-Aviation. Should you find that you need custom codes for locales, please follow the standards.
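As a rough illustration of the identifier form and the user-assigned ranges described above, the following Python sketch splits a locale name into its parts and flags user-assigned codes. The names LocaleName, parse_locale_name, and uses_user_assigned_codes are hypothetical, and the pattern is a simplification that covers only the shapes shown in this article, not the full RFC 4646 grammar.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    private: Optional[str]


# Simplified pattern for ll(l)-Ssss-CC plus an optional "-x-..." label.
# Assumption: strict casing (language lower, Script title, REGION upper).
_NAME = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:-(?P<region>[A-Z]{2,3}|[0-9]{3}))?"
    r"(?:-x-(?P<private>[A-Za-z0-9]+))?$"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into language, script, region, and private label."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not in ll(l)-Ssss-CC form: {name!r}")
    return LocaleName(**match.groupdict())


def uses_user_assigned_codes(locale: LocaleName) -> bool:
    """Report whether any part falls in a user-assigned range."""
    lang = locale.language
    region = locale.region or ""
    user_language = len(lang) == 3 and "qaa" <= lang <= "qtz"
    user_region = (len(region) == 3 and "QMA" <= region <= "QZZ") or (
        region.isdigit() and 900 <= int(region) <= 999
    )
    return user_language or user_region or locale.private is not None
```

For example, parse_locale_name("uz-Latn-UZ") yields the language uz, the script Latn, and the region UZ, while qaa-QZZ and en-x-Aviation are both reported as relying on user-assigned codes.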

We recommend that you choose your locale identifier so that the intended contents of the locale are as transparent as possible to potential consumers of your locale data. If you create a custom locale to share with others, using an identifier that takes advantage of an international standard is a good way to ensure that you are clear about the set of users your locale is intended to support.

You will also want to choose the display name for your locale such that it clearly states the language, region, and, if appropriate, script that your locale supports. The display name you choose will be visible (and selectable) in the Regional Settings control panel on any machine where the locale is installed, and applications may choose to display it as part of selectable UI. The clearer your display name is, the more likely it is that it will be selected by your target users.

Date and Time Formatting

Windows-based applications use the date, time, and calendar properties that you specify much more frequently than other properties. The calendar type that you select for your custom locale will determine which calendar your locale's users see in every calendaring application that calls into Windows for support, including the Windows calendar and Outlook.

There are a few things to keep in mind when selecting your calendar data. First, short day names really are short. We recommend that super-short day names be no more than two characters long. Calendaring applications will often use these abbreviations to display weekdays at the top of calendar rows where UI space is at a premium, as in the Māori Gregorian calendar in Outlook 2007 shown in Figure 5.

Figure 5 Māori


Short day names should also be unique. You'll need to make sure that the super-short day-name abbreviations that you select are not all identical to one another so that calendar labels mean something. If every day in your language starts with P, you'll probably want to include a second code point in your short day name abbreviations.
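The two recommendations above (length and uniqueness) are easy to check mechanically before you ship a locale. The following Python sketch is one way to do it; check_short_day_names is a hypothetical helper, and it counts Unicode code points rather than rendered glyphs, which is an assumption that may not hold for scripts that use combining marks.

```python
def check_short_day_names(names, max_length=2):
    """Return a list of problems found in a set of super-short day names."""
    problems = []
    if len(names) != 7:
        problems.append(f"expected 7 day names, got {len(names)}")
    for name in names:
        # len() counts code points, not user-perceived characters.
        if len(name) > max_length:
            problems.append(f"{name!r} is longer than {max_length} characters")
    if len(set(names)) != len(names):
        problems.append("short day names are not unique")
    return problems
```

An empty result means the names pass both checks. Seven copies of "P" would be reported as not unique, which is the situation the second code point is meant to resolve.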

Abbreviated day-names are longer than short day names, but not by much. Calendars frequently use these day-name abbreviations as well as abbreviated month names, so you will want to be mindful of their usual space constraints on UI. For instance, the Windows calendar uses abbreviated day names to label days of the week as shown in Figure 6.

Figure 6 Yoruba Windows Calendar


Choose your time and date patterns wisely. Locale Builder offers you a selection from among existing Windows patterns, or you can choose to create a custom pattern (see Figure 7).

Figure 7 Customizing Time and Date Formats


If you choose to create a custom pattern including text or unique date separators, you will want to make sure that you distinguish between formatting characters (M, d, y) and actual text to be rendered in the presentation of date and time formats. The way to distinguish formatting characters from actual text is by escaping the text to be rendered inside single quotes, as seen in the long date formats for Simplified Chinese:

  • yyyy'年'M'月'd'日'
  • yyyy'年'M'月'd'日',dddd
  • dddd,yyyy'年'M'月'd'日'

Any data within single quotes will be displayed without the quotes. Any sequence requiring a single quote can be escaped by using two single quotes: dd'|'MM'|'yyyy will display in the form 18|08|2006, and hh:mm''''ss'''' will display in the form 12:09'53'' (note that the second quote is a double quote).

It's also important to keep in mind that when it comes to formatting characters, capitalization matters. For example, M is the formatting character representing months in date formats, while m is the formatting character representing minutes in time formats. Things like this may seem small, but if your custom locale gets them wrong, then every application that uses your date and time formats will be broken.
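To make the quoting and capitalization rules concrete, here is a small Python sketch of a pattern renderer. It is not the Windows implementation; tokenize and render are hypothetical names, only a handful of numeric fields are supported (day and month names such as dddd are left out because they require locale data), and it assumes that a doubled single quote is treated as an escape only inside a quoted run.

```python
from datetime import datetime

# Numeric fields only; note that "MM" (month) and "mm" (minute) differ by case.
FIELDS = {
    "d": lambda t: str(t.day),
    "dd": lambda t: f"{t.day:02d}",
    "M": lambda t: str(t.month),
    "MM": lambda t: f"{t.month:02d}",
    "yy": lambda t: f"{t.year % 100:02d}",
    "yyyy": lambda t: f"{t.year:04d}",
    "HH": lambda t: f"{t.hour:02d}",
    "hh": lambda t: f"{(t.hour % 12) or 12:02d}",
    "mm": lambda t: f"{t.minute:02d}",
    "ss": lambda t: f"{t.second:02d}",
}


def tokenize(pattern):
    """Split a pattern into ("field", text) and ("literal", text) tokens."""
    tokens, i, n = [], 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "'":
            i += 1
            buf = []
            while i < n:
                if pattern[i] == "'":
                    if i + 1 < n and pattern[i + 1] == "'":
                        buf.append("'")  # doubled quote inside a quoted run
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(pattern[i])
                i += 1
            tokens.append(("literal", "".join(buf)))
        elif ch.isascii() and ch.isalpha():
            j = i
            while j < n and pattern[j] == ch:
                j += 1
            tokens.append(("field", pattern[i:j]))
            i = j
        else:
            tokens.append(("literal", ch))  # unquoted separator
            i += 1
    return tokens


def render(pattern, when):
    """Render a datetime with a quoted format pattern."""
    out = []
    for kind, text in tokenize(pattern):
        if kind == "literal":
            out.append(text)
        elif text in FIELDS:
            out.append(FIELDS[text](when))
        else:
            raise ValueError(f"unsupported field: {text!r}")
    return "".join(out)
```

With this sketch, render("dd'|'MM'|'yyyy", datetime(2006, 8, 18)) gives 18|08|2006, and the quoted characters in the Simplified Chinese long date pattern pass through untouched while the unquoted letters are replaced by date values.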

Both short and long date formats should contain content to represent day, month, and year, with short dates typically using abbreviated month names and long dates using the full string.

You can choose to include multiple calendar types in your locale, but only the Gregorian calendar can be localized. Locale Builder allows users to select multiple calendar types for a custom locale, with one calendar designated as the default.

The Gregorian calendar is automatically available for all custom locales, and it is currently the only calendar for which you will need to specify localized day and month names. If you select another calendar type to be included, the data that will be used for month and day names will come from built-in Windows data. If you select a non-Gregorian calendar to be the default for your custom locale, you will want to make sure that the built-in calendar name data work for your intended users.

Separators matter. The Windows Vista Regional and Language Options Control Panel expects only the four separator types shown in Figure 8. If your locale requires some other date separator, you will want to make sure that you escape it with single quotes in the way described previously.

Figure 8 Date Separators

Date Format    Description
dd-MM-yy       Default short date pattern for Kannada (India)
dd.MM.yyyy     Default short date pattern for French (Switzerland)
dd/MM/yyyy     Default short date pattern for Irish (Ireland)
d MMMM yyyy    Default long date pattern for Tajik (Tajikistan)

Numbers, Currency, and Text

The data that you specify for formatting numbers and currencies can be used by a range of applications, including money management tools, spreadsheets, currency converters, and other business applications. It is important that you select your values such that the resulting formatting presents an intuitive user experience across the range of applications that might use the data.

The use of native digits may vary. Not all locales use ASCII 0-9 as the characters that represent their base 10 digits. Even for those locales that do use 0-9, there may be other native forms that are used in special contexts. The Locale Builder allows you to specify the native digits for your locale as needed.

Once you have specified the native digits, you will have to decide where you would like them to be used. If you select National digit substitution, then the native digits will be used everywhere that numbers are needed. If you select Never, then ASCII 0-9 will be used. If you select Context, then the native digits will be used sometimes but not others. For instance, in Windows, Arabic and Thai digits are context-sensitive. Even if your locale does have native digits, you may or may not wish to apply digit substitution across the board. You will have to consider the expectations of your locale's intended audience in choosing the digit substitution setting.

You will be asked to provide both group and decimal separators for your custom locale, where the decimal separator is used to separate whole from fractional digits and the group separator is used to separate strings of whole digits into groups. Be aware that these separators should be distinct characters; if they are the same, then it will not be possible to compute the intended value of a numeric string. For instance, in the en-US locale, the decimal separator is a period and the group separator is a comma, while in the fr-FR locale, it is the reverse:

  • 123,456.7890 (digit formatting for en-US users)
  • 123.456,7890 (digit formatting for fr-FR users)

You will also be asked to provide the increment at which digits should be grouped. In the examples just shown, the group size is 3, meaning that there must be 3 digits between group separator characters.
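The interaction of the decimal separator, group separator, and group size can be sketched in a few lines of Python. The function format_number is a hypothetical helper written for this illustration, not a Windows or .NET API, and it assumes a single fixed group size.

```python
from decimal import Decimal


def format_number(value, decimal_sep=".", group_sep=",", group_size=3):
    """Format a decimal string using the given separators and group size."""
    if decimal_sep == group_sep:
        # Identical separators would make the result impossible to parse back.
        raise ValueError("decimal and group separators must be distinct")
    if group_size < 1:
        raise ValueError("group size must be at least 1")
    text = format(Decimal(value), "f")
    sign = "-" if text.startswith("-") else ""
    whole, _, fraction = text.lstrip("-").partition(".")
    groups = []
    while len(whole) > group_size:
        groups.insert(0, whole[-group_size:])
        whole = whole[:-group_size]
    groups.insert(0, whole)
    result = sign + group_sep.join(groups)
    return result + decimal_sep + fraction if fraction else result
```

Calling format_number("123456.7890") produces 123,456.7890, and swapping the two separators produces 123.456,7890. Passing the same character for both raises an error, which mirrors the reason the separators need to be distinct.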

In addition to specifying basic text properties such as line directionality and default paper size, you will need to designate the Unicode ranges and other code page assignments for your locale. Windows uses this information to tell applications how users will expect their text to be displayed. Be sure to include exactly those Unicode ranges that represent the character set that is required to support your locale's intended users.

Using Unicode is a requirement for developing really globally extensible applications, but we are aware that some developers are still relying on ANSI and other encoding schemes. Locale Builder lets you specify code pages so that your locale's users can also use non-Unicode applications if they need to.

You will also need to specify the Unicode script name for the writing system that your locale requires. The script name that you select is important in making sure that anti-spoofing mitigation works on your machine, so you will want to make sure that your selection reflects the major writing system used for your locale (see Figure 9).

Figure 9 Selecting Script Name


The most challenging part of number formatting is handling locales that swap the decimal and grouping separators. Mistaking a period for a comma can cause values to be incorrect by orders of magnitude. It is recommended that applications persist numbers in a binary or application-specified number format. Of course, when displaying numbers it is important to use the user preferences.

Number representation can also use non-ASCII native digits, so your application may encounter characters other than 0-9 as inputs. Avoid filtering on U+0030 through U+0039 to prevent frustration for users who are trying to enter data using non-ASCII digits.
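One way to accept native digits instead of filtering on U+0030 through U+0039 is to map any Unicode decimal digit to its ASCII equivalent before parsing. The Python sketch below relies on the standard unicodedata module; to_ascii_digits and parse_whole_number are hypothetical helper names, and separators and signs are deliberately out of scope.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII 0-9 equivalent."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


def parse_whole_number(text: str) -> int:
    """Parse an unsigned whole number written with ASCII or native digits."""
    cleaned = to_ascii_digits(text.strip())
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a whole number: {text!r}")
    return int(cleaned)
```

Input typed with Arabic-Indic or Thai digits then parses to the same value as its ASCII counterpart, so users are not forced to switch keyboards to enter a number.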

How Locale Data Affect Your Application

So now there's a custom locale installed on your computer. You've selected it as your default in the Regional and Language Options control panel, allowing applications to use your custom data whenever they ask for the user default locale. The question remains: you want to use a custom locale, but how well do your applications work with it?

The discussion about custom locales can be a little bit scary for the application developer. After all, we generally expect users in the U.S. to use a 12-hour clock with A.M. and P.M., we expect numbers in the U.S. to use a period as the decimal separator and commas to mark thousands, and we really expect the spelling of January to stay January. With custom locales it is possible for these values to change, even for en-US.

If a user chooses a 24-hour clock, wants spaces instead of commas in their numbers, or even installs a locale that uses janvier instead of January, she probably has a reason for doing so. It is important to distinguish between data representations that are important to the user and those that are important to the application. The intent of the locale data is for presentation, and applications should respect the user or system administrator's choices whenever possible.

Realize that locale data can change. Microsoft locales themselves have evolved in response to changing cultural preferences, new international standards, or feedback from users. What happens to your application when the date pattern changes from MM/dd/yyyy to dd/MM/yy? How will your application respond if the user's long date pattern produces strings like "The week day of Monday, day 14 of the month August (yes, August), of the year 2006"? What if the locale uses Unicode PUA (private use) characters or characters that don't appear in the computer's ANSI code page?

First, assume that locale data can and will change, even for en-US. Honor the user's selected locale, and allow user overrides. The user wouldn't have specified a preference if she didn't intend for it to be used. Remember that different machines may have different versions of locale data, even for en-US, and that applications cannot specify a specific locale data version.

Allow for unexpected values. Numbers may not be parseable, dates may return longer or shorter strings than expected, or values may be incomplete. When persisting data for future machine reading, use a format that is locale-independent. And finally, use the new named APIs for locale information, such as GetLocaleInfoEx and GetDateFormatEx.

Custom locales have an LCID of 0x1000 (LOCALE_CUSTOM_UNSPECIFIED) or 0x0c00 (LOCALE_CUSTOM_DEFAULT). So you can't tell Hawaiian from Fijian from a custom English from Klingon by using the LCID. Therefore avoid any dependencies on the LCID by calling GetLocaleInfoEx instead of GetLocaleInfo.
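The consequence for application data structures can be shown with a toy example. In the Python sketch below, the two constants carry the values given above, while the list of custom locales and the index_by helper are hypothetical. Anything keyed on the LCID collapses every custom locale into one bucket, whereas a name-keyed index keeps them apart.

```python
LOCALE_CUSTOM_DEFAULT = 0x0C00
LOCALE_CUSTOM_UNSPECIFIED = 0x1000

# Hypothetical custom locales installed on one machine.
# Each entry is (name, LCID); every custom locale reports the same LCID.
custom_locales = [
    ("haw-US", LOCALE_CUSTOM_UNSPECIFIED),
    ("fj-FJ", LOCALE_CUSTOM_UNSPECIFIED),
    ("en-x-Aviation", LOCALE_CUSTOM_UNSPECIFIED),
]


def index_by(locales, key_position):
    """Group locale names by the chosen tuple position (0 = name, 1 = LCID)."""
    index = {}
    for entry in locales:
        index.setdefault(entry[key_position], []).append(entry[0])
    return index


by_lcid = index_by(custom_locales, 1)  # one key shared by all three locales
by_name = index_by(custom_locales, 0)  # one key per locale
```

A cache, registry key, or database column keyed on the LCID would have the same problem as by_lcid, which is why string names are the safer key.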

Replacement locales will have the same string identifiers as the Microsoft locales on which they are based, such as en-US, zh-CN, and so on. Supplemental locales provided by third parties should use the same format. For example, fj-FJ would be the expected string identifier for Fijian (Fiji). While this naming scheme makes it easy to identify the intended use of the locale, a custom locale may differ from Microsoft-shipped, replacement, or other custom locales that use the same identifier. Hopefully all locales identified as fj-FJ actually do represent Fijian (Fiji), but there's no way to guarantee that the custom locale was designed correctly, or that identical versions are installed on different machines. Even en-US could be customized with data that differs from the en-US data that we ship with Windows.

Custom locales can specify their own display names. Variations of similar locales could result in multiple locales with the same display name. For example, an en-Latn-US locale could be created that is identical to the shipped en-US, including the display name "English (United States)." Does your application need to distinguish between these cases in a list?

Another oddity of custom locales is that the concept of a localized name is impractical. If we take the 6,912 languages mentioned before and localize each language name into every one of those languages, we end up with 47,775,744 strings that would need to be maintained. For that reason, calls for localized names of custom locales generally display the native name of the locale.

Time, Dates, and Calendars

Locales don't always use the Gregorian calendar by default, so your application should make sure it can handle non-Gregorian dates. If you only display formatted dates this may not be a problem, but date entry devices should consider whether a non-Gregorian calendar is a possibility.

Most developers are aware of the difference between MM/dd/yyyy and dd.MM.yy type strings, but more variations are possible, like "day 12, of month 08, of year 2006." In some cultures the inclusion of this sort of rendered text in date patterns is quite common. Date entry can also be tricky. A date format like 04/05/06 doesn't allow easy determination of which value is the month, day, and year. Consider specifying the expected format for the user or using a calendar date picking control. In the data we ship, about half of the long date formats have day of week names by default and half do not. Your application may need to add the day of the week itself when it is required.
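The ambiguity of a string like 04/05/06 is easy to demonstrate. The Python sketch below tries three plausible field orders; the CANDIDATE_FORMATS table and possible_dates function are hypothetical, and the interpretation of two-digit years follows Python's strptime, which is an assumption rather than a Windows rule.

```python
from datetime import datetime

# Three plausible readings of a numeric date with two-digit fields.
CANDIDATE_FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def possible_dates(text):
    """Return every valid interpretation of an ambiguous numeric date."""
    results = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this field order does not yield a real date
    return results
```

For 04/05/06 all three readings are valid and all three are different dates, so the application cannot recover the user's intent from the string alone.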

Time formats are surprisingly consistent across locales, with the obvious exception of preferences for the 12/24-hour clock and associated A.M./P.M. indicators. Time zones, however, are more challenging. Time zones are not, strictly speaking, part of the locale, but users can specify a time zone for their system. The bigger problem here is that some time zones switch between standard time and daylight savings time according to principles that are not always algorithmic.

Time data should usually be stored in Coordinated Universal Time (UTC) and converted to local time for display, because this minimizes the chance of confusion and error. Some cases don't fit this generalization, though. Specific appointments can be stored in UTC, but a recurring appointment (for example, a meeting that occurs every Wednesday at 3:00 P.M.) usually needs to carry the time zone information along with it, because the UTC offset is going to be different in the winter than in the summer due to daylight savings time.
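The recurring-appointment case can be sketched in Python as follows. This assumes Python 3.9 or later and a time zone database available on the system; the zone name America/Los_Angeles, the chosen dates, and the occurrence_utc helper are illustrative assumptions, not part of any Windows API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def occurrence_utc(year, month, day, hour, minute, zone_name):
    """Convert one occurrence of a local-time appointment to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(zone_name))
    return local.astimezone(timezone.utc)


# The same "Wednesday at 3:00 P.M." meeting, once in winter and once in summer.
winter = occurrence_utc(2007, 1, 17, 15, 0, "America/Los_Angeles")
summer = occurrence_utc(2007, 7, 18, 15, 0, "America/Los_Angeles")
```

The winter occurrence lands at 23:00 UTC and the summer occurrence at 22:00 UTC. Storing only the first UTC value and repeating it weekly would shift the meeting by an hour for half the year, which is why the rule (local time plus zone) is what needs to be persisted.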

Some events may need local time. For example, if you are interested in behavioral patterns, it might be interesting to know that your customers access your Web site around noon local time. So this may require storing the time zone as well. Even time zone data that are usually predictable can change, as seen with the recent U.S. government decision to extend daylight savings time into November. In those cases the system rules and any persisted rules may need to be updated.

One very common software problem involves switching to and from daylight savings time. Applications developed in the summer can easily find themselves off by an hour in the winter or vice versa. Other applications may miss events that happen during the transition. Make sure your application's design and test plan consider these scenarios.

Text Rendering in a Locale

Numerous interesting things happen to text rendering in various locales. Users of some locales expect text to be written from left to right (such as English, Russian, and Hindi), while others prefer right to left (such as Arabic and Hebrew). Some locales require unique casing rules, like mapping i to I (English), or i to İ (Turkish). Users also have locale-specific expectations around linguistic sorting behavior, so that German speakers expect ähnelte to sort before plante, but Swedish speakers want it the other way around. Finally, although storing Unicode character data allows applications to support the broadest range of locales, some legacy applications and protocols require code page encodings, and different locales may require different encoding schemes.

The likely behavior of right-to-left (RTL) custom locales can be tested by using a locale like Arabic or Hebrew, or the RTL qps-mirr mirrored pseudo locale. The behavior of custom locales requiring large script size can be tested with an Asian locale like ja-JP or zh-CN or the qps-asia East Asian pseudo locale.

Applications should use Unicode when possible, but some legacy protocols may require using encodings like ASCII, windows-1252, or iso-2022-jp. In those cases using a newer protocol that supports Unicode is a good idea. If Unicode is not possible, then your application may run into difficulties if the character repertoire of a locale does not match that of the computer's system locale. Locale Builder does not allow users to change the code page assignments for replacement locales.

Different locales can follow different casing rules. In English i and I are different cases of the same character. In Turkish there are two pairs: ı and I, and i and İ. Applications that assume that i and I are equal could run into trouble. For some operations, other operating system or protocol limitations may require the use of one specific casing algorithm to avoid multiple conflicting mappings.
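To make the two Turkish pairs concrete, here is a small sketch in Python (an assumption for illustration). Python's built-in upper() and lower() apply the default Unicode mappings regardless of locale, so the Turkish rules are written out by hand; the helper names upper_turkish and lower_turkish are hypothetical.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı


def upper_turkish(text):
    """Uppercase with Turkish rules: i becomes İ and ı becomes I."""
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def lower_turkish(text):
    """Lowercase with Turkish rules: İ becomes i and I becomes ı."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


keyword = "file"
print(keyword.upper())          # default mapping
print(upper_turkish(keyword))   # Turkish mapping

# A case-insensitive match that mixes the two mappings fails.
print(lower_turkish("FILE") == keyword)
```

This is the reason protocol keywords, file names, and other identifiers are usually compared with one fixed, locale-independent casing rule, while text shown to the user follows the user's locale.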

Similarly, sorting and comparison rules can differ across locales. In some languages factors like diacritics and capitalization can radically alter sorting expectations, while in other languages such factors are only considered as tie-breakers to put otherwise identical words into some kind of an order.
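The German and Swedish expectations for ähnelte and plante can be imitated with hand-written sort keys. This is only a sketch under simplifying assumptions, written in Python for illustration: real applications should rely on the platform's linguistic sorting, and the key functions and alphabet string below are hypothetical simplifications that cover just enough to show the difference.

```python
import unicodedata


def german_key(word):
    """Diacritics are ignored at first and only break ties (simplified)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, word)


SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {letter: rank for rank, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """å, ä, and ö are separate letters that follow z (simplified)."""
    normalized = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_RANK.get(c, len(SWEDISH_ALPHABET)) for c in normalized]


words = ["plante", "ähnelte"]
print(sorted(words, key=german_key))   # ähnelte first
print(sorted(words, key=swedish_key))  # plante first
```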

Comparisons are also complicated by the indexing that is required for a database application. If comparison rules change, as they did between Windows XP and Windows Vista, then databases may need to be reindexed. Such changes can also happen between custom locale versions. One fj-FJ locale may have chosen en-US as its sorting locale, but another may have chosen fr-FR. Your application can call GetNLSVersionEx to get the current sort version for a specific locale and determine if reindexing is necessary.
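The reindexing decision itself is simple once the sort version is stored next to the index. The sketch below, in Python for illustration, shows only that decision. It does not call GetNLSVersionEx; the SortVersion record and its fields are hypothetical stand-ins for whatever version data your application persists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SortVersion:
    """Hypothetical record of the sort version an index was built with."""
    locale_name: str
    version: int  # treated as opaque; only equality matters here


def needs_reindex(stored: Optional[SortVersion], current: SortVersion) -> bool:
    """Rebuild when no version was recorded or when anything has changed."""
    if stored is None:
        return True
    return stored != current


built_with = SortVersion("fj-FJ", 1)
reported_now = SortVersion("fj-FJ", 2)
print(needs_reindex(built_with, reported_now))
```

Recording the version at index-build time is the important design choice: without it, the application has no way to tell whether the comparison rules have changed underneath existing data.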

When Not to Use Locale Data

Not all data manipulation is intended for presentation. If your application wants to save a bunch of numbers to a file so that it can read them back later, or send a bunch of data across the Web to a SOAP server, then formatting the data with a locale is not the appropriate choice. Remember that the target machine may have a different locale, so the commas and periods in the numbers might be backwards from your expectations. When sending data to a machine or storing it for later retrieval, use a format that is guaranteed to be readable later. Store binary data, use data standards when appropriate, format data with the invariant locale, or store that data using specific formatting strings (like yyyy/MM/dd) that your application will use to read the data later.
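A minimal sketch of the fixed-format approach follows, written in Python as an assumption for illustration. Numbers are written as JSON, which always uses a period as the decimal separator, and dates use one fixed pattern (the Python pattern %Y/%m/%d corresponds to yyyy/MM/dd). The function names and record layout are hypothetical.

```python
import json
from datetime import date, datetime

DATE_PATTERN = "%Y/%m/%d"  # same layout as yyyy/MM/dd


def save_record(amount, when):
    """Serialize with locale-independent rules for storage or transmission."""
    return json.dumps({"amount": amount, "date": when.strftime(DATE_PATTERN)})


def load_record(text):
    """Read the record back using the same fixed rules."""
    raw = json.loads(text)
    return raw["amount"], datetime.strptime(raw["date"], DATE_PATTERN).date()


stored = save_record(1234.56, date(2006, 3, 8))
print(stored)
print(load_record(stored))

# By contrast, a number formatted for display in a German locale
# cannot be parsed by code that expects a period as the decimal separator.
try:
    float("1.234,56")
except ValueError:
    print("display-formatted number could not be parsed")
```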

Pseudo-Locales for Testing

Sometimes testing data for unfamiliar locales can be difficult. Windows Vista includes three built-in pseudo-locales for use in testing. The qps-ploc pseudo-locale creates longer than average strings and uses unexpected characters outside of the normal en-US Latin range to imitate situations that may occur in other locales. For example, a long date in the qps-ploc locale looks like “[Шěđлеśđαỳ !!!], 8 ōf [Μäŕςћ !!] ōf 2006”. Enabling pseudo-locales on Windows Vista is discussed on the MSDN® site. You may also build your own pseudo or custom locales for testing. Examples might be locales with really short strings or other data interesting to your app.
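If you want similar test strings in your own tooling, the general idea can be sketched in a few lines. This is not the algorithm Windows uses for qps-ploc. It is a hypothetical imitation, written in Python for illustration, that swaps vowels for accented look-alikes, pads the string, and brackets it so truncation is easy to spot.

```python
# Map plain vowels to accented look-alikes (an arbitrary illustrative choice).
ACCENTED = str.maketrans("aeiouyAEIOUY", "äéîōüÿÄÉÎŌÜŸ")


def pseudo_localize(text):
    """Return a longer, accented, bracketed version of text for UI testing."""
    swapped = text.translate(ACCENTED)
    padding = "!" * max(1, len(text) // 3)  # grow the string by about a third
    return "[" + swapped + " " + padding + "]"


for label in ("Wednesday", "March", "OK"):
    print(pseudo_localize(label))
```

Strings that lose their closing bracket on screen reveal layouts that cannot accommodate longer translations, and missing accents point to code paths that do not handle characters outside ASCII.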

What's Next?

Microsoft Locale Builder and other similar tools are the first steps towards creating an extensible model for globalization support that allows customers to create personalized user experiences that are appropriate to a broad range of expectations. Microsoft has also released Keyboard Layout Creator, which allows users to create and share customized keyboard layouts, and Transliteration Utility, a tool to allow users to transliterate text from one writing system to another.

These tools are just the beginning. As more and more customers use the tools to create custom solutions, there will be a need for a centralized portal where people can obtain the tools and share the solutions that they create with others of similar linguistic or cultural background. We also recognize that there are other aspects of globalization support that are just as important. We are seeking feedback from our customers as to which aspects of user experience are crucial for them to be able to customize.

Kieran Snyder is a Program Manager in the Windows Globalization team at Microsoft where she drives core language enablement support and extensibility tools across Microsoft products. Kieran holds a PhD in linguistics from the University of Pennsylvania and blogs at blogs.msdn.com/kierans. She can be reached at kierans@microsoft.com.

Shawn Steele is a Software Design Engineer at Microsoft working on Windows and the .NET Framework. He is primarily responsible for culture/locale data (naDev tlhInganpu' tu'lu', Qapla'!), code pages/encodings, normalization, and IDN. Shawn blogs at blogs.msdn.com/shawnste and can be reached at shawnste@microsoft.com.