Microsoft .NET Framework 4 (Beta 1): What is New in Globalization
By Mohamed Elgazzar
At first glance, you may think that the globalization features in the .NET Framework 4 are the same as in the .NET Framework 3.5. That impression is understandable, because the classes and enumerations of the System.Globalization namespace show few interface changes, but it is inaccurate. So what is new in globalization?
Renovating Globalization Information
In the real world, the globalization information is constantly changing because of cultural developments in the local markets, because of new standards which update the culture sensitive information frequently, or because Microsoft finds more accurate information about different markets or expands into more markets. Microsoft .NET Framework 4 supports a minimum of 354 cultures compared to a minimum of 203 cultures in the previous release. Many of those cultures are neutrals that were added to complete the parent chain to the root neutral culture. For example, three Inuktitut neutrals were added to the already existing cultures Inuktitut (Syllabics, Canada) and Inuktitut (Latin, Canada) as shown in the following table.
| Culture Display Name | Culture Name | LCID |
| Inuktitut | iu | 0x005D |
| Inuktitut (Syllabics) | iu-Cans | 0x785D |
| Inuktitut (Syllabics, Canada) | iu-Cans-CA | 0x045D |
| Inuktitut (Latin) | iu-Latn | 0x7C5D |
| Inuktitut (Latin, Canada) | iu-Latn-CA | 0x085D |
Table – Example of many new neutrals
New specific cultures were introduced, such as the new Serbian cultures (see the following table). The old Serbian cultures were renamed to Serbian (Cyrillic, Serbia and Montenegro (Former)) and Serbian (Latin, Serbia and Montenegro (Former)) to avoid display name collisions. Those cultures remain in the .NET Framework 4 and keep their information, including the culture name and culture ID.
| Culture Display Name | Culture Name | LCID |
| Serbian - Serbia (Latin) | sr-Latn-RS | 0x241A |
| Serbian - Serbia (Cyrillic) | sr-Cyrl-RS | 0x281A |
| Serbian - Montenegro (Latin) | sr-Latn-ME | 0x2C1A |
| Serbian - Montenegro (Cyrillic) | sr-Cyrl-ME | 0x301A |
Table – New specific cultures
The display names of several Chinese cultures changed to follow the naming convention Language Name ([Script,] Country/Region Name). The Chinese cultures listed in the following table are enumerated when you specify CultureTypes.AllCultures in a call to CultureInfo.GetCultures(). In this release, the display names of zh-CHS and zh-CHT have the word “Legacy” appended to differentiate them from zh-Hans and zh-Hant. The zh culture, which was recently introduced into Windows, has “Chinese” as its display name.
| Display Name | Culture Name | LCID |
| Chinese | zh | 0x7804 |
| Chinese (Simplified) Legacy | zh-CHS | 0x0004 |
| Chinese (Traditional) Legacy | zh-CHT | 0x7C04 |
| Chinese (Simplified) | zh-Hans | 0x0004 |
| Chinese (Traditional) | zh-Hant | 0x7C04 |
| Chinese (Simplified, PRC) | zh-CN | 0x0804 |
| Chinese (Traditional, Hong Kong S.A.R.) | zh-HK | 0x0C04 |
| Chinese (Traditional, Macao S.A.R.) | zh-MO | 0x1404 |
| Chinese (Simplified, Singapore) | zh-SG | 0x1004 |
| Chinese (Traditional, Taiwan) | zh-TW | 0x0404 |
Table – Framework Supported Chinese cultures
The parent chain of the Chinese cultures now includes the root Chinese culture. Here are two examples that show the complete parent chain for two of the Chinese specific cultures.
- zh-CN → zh-CHS → zh-Hans → zh → Invariant
- zh-TW → zh-CHT → zh-Hant → zh → Invariant
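The parent chain can be pictured as a walk over parent links until the invariant culture is reached. The following is a minimal Python sketch of that idea, not the .NET CultureInfo API: the `PARENT` table and the helper names are hypothetical, and the links are copied from the two chains listed above. It assumes the invariant culture is represented by an empty name.

```python
# Illustrative model only: the parent links are copied from the two chains
# listed in the article. This is not the .NET CultureInfo API.
PARENT = {
    "zh-CN": "zh-CHS",
    "zh-CHS": "zh-Hans",
    "zh-Hans": "zh",
    "zh-TW": "zh-CHT",
    "zh-CHT": "zh-Hant",
    "zh-Hant": "zh",
    "zh": "",  # assumption: the empty name stands for the invariant culture
}


def parent_chain(name):
    """Return the culture names from `name` up to the invariant culture."""
    chain = [name]
    while name != "":
        name = PARENT[name]  # raises KeyError for a culture not in the table
        chain.append(name)
    return chain


def render(chain):
    """Format a chain the way the article writes it."""
    return " → ".join(n if n else "Invariant" for n in chain)


print(render(parent_chain("zh-CN")))
print(render(parent_chain("zh-TW")))
```

A resource lookup that falls back along the parent chain would visit the cultures in exactly this order, which is why adding the zh root changes which fallback resources the Chinese cultures can reach.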
Tibetan (PRC), French (Monaco), Tamazight (Latin, Algeria) and Spanish (Spain, International Sort) display names were updated as well. When the Display Name changes, usually the English and Native names reflect this change; however, the changes could include the ISO and abbreviated names of script, language and country.
There are many other updates to the values of the globalization properties, such as currency, date and time formats, day and month names, AM and PM designators, and some number formatting properties. The following tables show two examples: currency names in RegionInfo and short date patterns in the DateTimeFormatInfo class.
| Culture Name | v3.5 Currency Name | v4 Currency Name |
| mt-MT | Maltese Lira | Euro |
| sk-SK | Slovak Koruna | Euro |
| sl-SI | Slovenian Tolar | Euro |
| tr-TR | New Turkish Lira | Turkish Lira |
| Culture Name | v3.5 Short Date Pattern | v4 Short Date Pattern |
| ar-SA | dd/MM/yy | dd/MM/yyyy |
| prs-AF | dd/MM/yy | yyyy/M/d |
| ps-AF | dd/MM/yy | yyyy/M/d |
| pt-BR | d/M/yyyy | dd/MM/yyyy |
Table – Short date pattern updates in DateTimeFormatInfo and currency updates in RegionInfo
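To see what a pattern update means in practice, the following Python sketch renders the same date with the v3.5 and v4 short date patterns listed for pt-BR. It is only an illustration: `format_short_date` is a hypothetical helper that understands just the `d`, `dd`, `M`, `MM`, `yy`, and `yyyy` tokens, it always uses the Gregorian calendar, and it is not how the .NET Framework formats dates.

```python
import re
from datetime import date


def format_short_date(d, pattern):
    """Render `d` using a small subset of .NET-style date tokens."""
    def token(match):
        t = match.group(0)
        if t == "dd":
            return f"{d.day:02d}"
        if t == "d":
            return str(d.day)
        if t == "MM":
            return f"{d.month:02d}"
        if t == "M":
            return str(d.month)
        if t == "yyyy":
            return f"{d.year:04d}"
        if t == "yy":
            return f"{d.year % 100:02d}"
        raise ValueError(f"unsupported token: {t}")

    # Separators such as "/" are not matched and pass through unchanged.
    return re.sub(r"d+|M+|y+", token, pattern)


sample = date(2009, 5, 7)
print(format_short_date(sample, "d/M/yyyy"))    # pt-BR pattern in v3.5
print(format_short_date(sample, "dd/MM/yyyy"))  # pt-BR pattern in v4
```

The two results differ only in zero padding, but that is enough to break code that compares or parses stored, culture-formatted strings, which is one reason to persist dates in a culture-independent form.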
Some calendar data also changed, such as day and month names for many locales; one example is CultureInfo.DateTimeFormat.ShortestDayNames for the Arabic locales. In addition, some right-to-left locales, such as prs-AF, ps-AF, and ug-CN, had wrong values for the TextInfo.IsRightToLeft property; these are fixed in this version.
Getting Current Globalization Information
One of the main globalization features of the .NET Framework 4 is the ability to provide the most recent information where available. The oldest globalization information that this release provides is the data available at shipping time, and that data is used only when running on versions of Windows prior to Windows 7. When running on Windows 7 and later releases, the globalization information is retrieved directly from the operating system, which means that customers get the current globalization information when they upgrade to a new version of Windows. There is a second benefit on Windows 7 and later versions: customers see a unified globalization experience for both native (Win32) and managed (.NET) applications.
Because the world is ever-changing, globalization information is subject to change at any time; developers should not expect the values of the globalization properties to persist between releases, or even within the same release of the .NET Framework. This is not entirely new behavior for .NET Framework users. The properties of the Windows-only cultures, which have been supported since the .NET Framework 2, could already have different values when running on different versions of Windows.
The culture name is the most stable property of the culture information and is expected to remain stable in future releases. Other properties could change at any time according to standards or in-country changes. The culture display name is one example of a property that could change. Applications should not take any dependency on the spelling of the display name or on any other textual or numerical data.
The globalization information retrieval mechanism is changing in the .NET Framework 4. When running on Windows 7 and later, the globalization information is retrieved directly from Windows. When running on pre-Windows 7 releases (such as Windows Vista, Windows XP, Windows Server 2003, and Windows Server 2008), the globalization information is retrieved from an internal data store to ensure that your application is not retrieving very old data. The following architectural diagram visualizes the globalization information retrieval model.
Diagram – Globalization Properties Architecture in .NET Framework 4
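The retrieval rule described above reduces to a single version check. The following Python sketch models that decision only; the function name and the returned labels are hypothetical, and the version tuples rely on the fact that Windows 7 reports itself as version 6.1 while Windows Vista and Windows Server 2008 report 6.0, Windows Server 2003 reports 5.2, and Windows XP reports 5.1.

```python
WINDOWS_7 = (6, 1)  # Windows 7 reports itself as version 6.1


def globalization_data_source(os_version):
    """Return where culture data would come from for a (major, minor) version.

    Hypothetical model of the rule in the article, not framework code.
    """
    if os_version >= WINDOWS_7:
        return "operating system"
    return "internal data store"


for label, version in [
    ("Windows XP", (5, 1)),
    ("Windows Server 2003", (5, 2)),
    ("Windows Vista / Server 2008", (6, 0)),
    ("Windows 7", (6, 1)),
]:
    print(f"{label}: {globalization_data_source(version)}")
```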
With this new design, the definition of some of the CultureTypes values changes, because the globalization information is retrieved from different locations depending on the hosting operating system. The CultureTypes values WindowsOnlyCultures and FrameworkCultures are now obsolete. If you use them, the compiler issues a warning, but the compilation succeeds. Using WindowsOnlyCultures returns no cultures, whereas FrameworkCultures returns the same results as in the .NET Framework 2. The other CultureTypes values keep the same definition as before.
| “warning CS0618: 'System.Globalization.CultureTypes.FrameworkCultures' is obsolete: 'This value has been deprecated. Please use other values in CultureTypes.'” |
String Handling Story
Sorting, casing, normalization, and Unicode character information behaviors live in many classes of the Microsoft .NET Framework. Some of the most obvious implementations of those features are in the System.Globalization namespace, in classes such as CharUnicodeInfo, CompareInfo, StringInfo, TextInfo, and TextElementEnumerator. In the .NET Framework 4, the behavior of those features was upgraded to be synchronized with Windows 7, which provides richer linguistic sorting and casing capabilities for the CJK languages, and to fix many issues reported by customers over the last few years for other languages as well. The most important change in this area is compliance with the Unicode Standard 5.1. This standard refresh added support for approximately 1,400 characters, including new symbols, arrows, diacritics, punctuation, mathematical symbols, CJK strokes, ideographs, and Malayalam and Telugu numeric characters. In addition, it improved sorting and casing for characters within the following existing scripts: Latin, Myanmar, Arabic, Greek, Mongolian, Cyrillic, Gurmukhi, Oriya, Tamil, Telugu, and Malayalam. It also added support for the following new scripts: Sundanese, Lepcha, Ol Chiki, Vai, Saurashtra, Kayah Li, Rejang, and Cham.
Because many scenarios, such as database indexing, require consistent string handling across different versions of Windows, the .NET Framework 4 guarantees consistent behavior of string handling operations regardless of the version of Windows that hosts the application. Future releases of the Framework may reevaluate the consistency requirements based on customer feedback.
Some applications may have taken a dependency on the sorting and casing behavior of the .NET Framework 2–3.5. This is not a recommended practice in general, unless the application has a mitigation plan for any change in sorting or casing behavior. Examples of such scenarios are creating database indexes and storing sort keys. To avoid breaking those applications through data loss or failures, the .NET Framework 4 will give developers a way to opt in to the sorting and casing behavior of the .NET Framework 2–3.5 if their applications depend on it. The details of how to opt in to the old behavior will be shared in the Beta 2 time frame.
Alternate sort orders provide more than one sort behavior for some cultures. For example, the German (Germany) culture has the dictionary sort order as the default behavior, but it also supports the phone book sort as an alternate sort order. As another example, the Chinese (Simplified, PRC) culture supports sort by pronunciation as the default behavior and sort by stroke count as an alternate sort order. To specify an alternate sort order, you create a CultureInfo object using the LCID or name of the alternate sort order. Three alternate sort locales are removed from the .NET Framework 4 because they have been deprecated since Windows 2000. Other alternate sort orders that were supported by the Framework remain intact. The following table shows the three removed alternate sort orders.
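One way to get a feel for the new scripts is to look up a character from each in any Unicode character database at version 5.1 or later. The following Python sketch does this with the standard `unicodedata` module. It is an illustration only: it exercises Python's Unicode data rather than CharUnicodeInfo, and the sample code points are my own choice of one assigned character per script, not values taken from the article.

```python
import unicodedata

# One assigned code point per script added in Unicode 5.1 (sample choice is
# an assumption made for this sketch).
SAMPLES = {
    "Sundanese": 0x1B83,
    "Lepcha": 0x1C00,
    "Ol Chiki": 0x1C50,
    "Vai": 0xA500,
    "Saurashtra": 0xA880,
    "Kayah Li": 0xA900,
    "Rejang": 0xA930,
    "Cham": 0xAA00,
}


def script_is_known(script, code_point):
    """True if the character has a name that starts with the script name."""
    name = unicodedata.name(chr(code_point), "")
    return name.startswith(script.upper())


for script, cp in SAMPLES.items():
    print(f"U+{cp:04X} {script}: {script_is_known(script, cp)}")
```

A character database older than Unicode 5.1 would have no names for these code points, which is the kind of gap the refresh closes.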
| Culture name | Language-country/region | Default sort name and LCID | Alternate sort name and LCID |
| zh-HK | Chinese - Hong Kong SAR | Default: 0x00000c04 | zh-HK_stroke: 0x00020c04 |
| ja-JP | Japanese – Japan | Default: 0x00000411 | ja-JP_unicod: 0x00010411 |
| ko-KR | Korean – Korea | Default: 0x00000412 | ko-KR_unicod: 0x00010412 |
Table – Deprecated Alternate Sort Orders
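The LCIDs in the table differ from the default LCIDs only in the digit above the low 16 bits. Under the Windows LCID layout, the low 16 bits hold the language identifier and the next 4 bits hold the sort identifier. The following Python sketch splits the values from the table along those lines; `split_lcid` and the lookup table are hypothetical helpers written for this illustration, not .NET APIs.

```python
# Alternate sort LCIDs copied from the table above.
DEPRECATED_ALTERNATE_SORTS = {
    0x00020C04: "zh-HK_stroke",
    0x00010411: "ja-JP_unicod",
    0x00010412: "ko-KR_unicod",
}


def split_lcid(lcid):
    """Split an LCID into (sort identifier, language identifier)."""
    sort_id = (lcid >> 16) & 0xF   # bits 16-19
    lang_id = lcid & 0xFFFF        # bits 0-15
    return sort_id, lang_id


def is_deprecated_alternate_sort(lcid):
    """True for the three alternate sort LCIDs listed in the table."""
    return lcid in DEPRECATED_ALTERNATE_SORTS


for lcid, name in DEPRECATED_ALTERNATE_SORTS.items():
    sort_id, lang_id = split_lcid(lcid)
    print(f"{name}: sort id {sort_id}, default LCID 0x{lang_id:08x}")
```

An application that stores LCIDs can use a check like this to find persisted values that would need to be mapped to the default sort before moving to the .NET Framework 4.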
Applications that try to construct a CultureInfo with the LCID of one of the deprecated alternate sort orders will get a culture-not-found exception, because the sort order is no longer supported. The same exception is thrown for any culture that is not supported (enumerated cultures and alternate sort orders). More information about the exception will be available in the Beta 2 time frame.
One of Microsoft's long-term strategies for globalization features is to reduce the use of locale identifiers (LCIDs). In the .NET Framework 4, CompareInfo.ToString() and TextInfo.ToString() return only culture names for all cultures, instead of a culture name and an LCID as part of the class name. For example, the .NET Framework 4 returns “en-US CompareInfo - en-US” instead of “en-US CompareInfo - 1033”, which was the return value in the .NET Framework 2–3.5.
Being more specific!
Previous releases of the .NET Framework throw an exception if an application tries to access some of the neutral culture properties, such as the CultureInfo.DateTimeFormat.FirstDayOfWeek property. In the .NET Framework 4, all neutral culture properties return values, which come from the specific culture that is most dominant for that neutral culture. For example, the French neutral culture retrieves the values of most of its properties from French (France). The CultureInfo.DateTimeFormat.FirstDayOfWeek property returns Monday for French, which matches the value in the French (France) culture.
Some properties are an exception to this rule and have values that differ from those of the dominant culture, such as the language name. For example, the language name of the Norwegian neutral culture is Norwegian, while the language name of the specific culture Norwegian, Bokmål (Norway) is Norwegian (Bokmål).
Some properties and methods of neutral cultures now return specific cultures instead of neutral cultures, such as the KeyboardLayoutId property and the GetConsoleFallbackUICulture method of the CultureInfo class.
| KeyboardLayoutId |
| Culture Name | v3.5 | v4.0 |
| ar | 1 | 1025 |
| es | 10 | 1034 |
| fr | 12 | 1036 |
| zh-CHS | 4 | 2052 |
| GetConsoleFallbackUICulture |
| Culture Name | v3.5 | v4.0 |
| af | af | af-ZA |
| de | de | de-DE |
| en | en | en-US |
| ja | ja | ja-JP |
Table – Specific cultures are returned instead of neutral cultures
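The KeyboardLayoutId values in the table show the move from neutral to specific identifiers quite directly. Under the Windows language identifier layout, the low 10 bits hold the primary language and the bits above them hold the sublanguage. The following Python sketch checks that each v4.0 value keeps the v3.5 primary language and adds a sublanguage; the helper names are hypothetical, and the data is copied from the table.

```python
# (v3.5 value, v4.0 value) pairs copied from the KeyboardLayoutId table.
KEYBOARD_LAYOUT_ID = {
    "ar": (1, 1025),
    "es": (10, 1034),
    "fr": (12, 1036),
    "zh-CHS": (4, 2052),
}


def primary_language(lang_id):
    """Low 10 bits of a Windows language identifier."""
    return lang_id & 0x3FF


def sublanguage(lang_id):
    """Bits above the primary language; 0 means a neutral identifier."""
    return lang_id >> 10


for name, (v35, v40) in KEYBOARD_LAYOUT_ID.items():
    # The primary language is unchanged between releases.
    assert primary_language(v40) == v35
    print(f"{name}: 0x{v35:04X} (sublanguage {sublanguage(v35)}) -> "
          f"0x{v40:04X} (sublanguage {sublanguage(v40)})")
```

In every row the v3.5 value has sublanguage 0, the mark of a neutral identifier, while the v4.0 value carries a nonzero sublanguage that selects one specific culture.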
Custom Cultures Changes
One of the main changes in custom cultures is that neutral replacement cultures created with the .NET Framework 2 will not load in the .NET Framework 4.
After registering a replacement culture using the CultureAndRegionInfoBuilder class, the overridden information from the custom culture will not be available immediately to the process that created the custom culture. However, processes launched after registering this custom culture will be able to read the overridden information.
What Was Not Changed!
Some globalization feature areas are not changed in the .NET Framework 4 such as text information, encoding, calendar functionality, and IDN features. Those areas function the same way as in the previous release.