Case mappings, alphabetical order, and conventions for sequencing items vary from culture to culture. You should be aware of these variations and understand that they can cause the results of string operations to vary depending on culture.
The unique case-mapping rules for the Turkish alphabet illustrate how uppercase and lowercase mappings differ from language to language even when the languages use most of the same letters. In most Latin alphabets, the character "i" (Unicode 0069) is the lowercase version of the character "I" (Unicode 0049). However, the Turkish alphabet has two versions of the character "I": one with a dot and one without a dot. In Turkish, the character "I" (Unicode 0049) is considered the uppercase version of a different character, "ı" (Unicode 0131). The character "i" (Unicode 0069) is considered the lowercase version of yet another character, "İ" (Unicode 0130). As a result, a case-insensitive string comparison of the characters "i" (Unicode 0069) and "I" (Unicode 0049) that succeeds for most cultures fails for the culture Turkish (Turkey), designated "tr-TR".
Note |
|---|
The culture Azerbaijani (Azerbaijan, Latin), designated "az-Latn-AZ", also uses this case-mapping rule. |
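The mapping between these four code points can be observed directly by changing case under an explicit culture. The following is a minimal sketch, not part of the original sample: the class name `TurkishCasingSketch` and the helper `CodePoints` are hypothetical names introduced for illustration, and the sketch assumes the runtime has Turkish culture data available (a runtime configured for invariant-only globalization would behave differently).

```csharp
using System;
using System.Globalization;

public static class TurkishCasingSketch
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX so that
    // dotted and dotless forms can be told apart in console output.
    public static string CodePoints(string s)
    {
        string[] parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        CultureInfo turkish = new CultureInfo("tr-TR");
        CultureInfo invariant = CultureInfo.InvariantCulture;

        // Uppercasing "i" (U+0069) under each culture.
        Console.WriteLine("ToUpper(i), invariant: " + CodePoints("i".ToUpper(invariant)));
        Console.WriteLine("ToUpper(i), tr-TR:     " + CodePoints("i".ToUpper(turkish)));

        // Lowercasing "I" (U+0049) under each culture.
        Console.WriteLine("ToLower(I), invariant: " + CodePoints("I".ToLower(invariant)));
        Console.WriteLine("ToLower(I), tr-TR:     " + CodePoints("I".ToLower(turkish)));
    }
}
```

With Turkish culture data present, the two "tr-TR" lines should report U+0130 and U+0131 respectively, while the invariant lines report U+0049 and U+0069, matching the rule described above.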
The following code example demonstrates how the result of a case-insensitive String.Compare operation performed on the strings "FILE" and "file" differs depending on culture. String.Compare returns 0 when it considers the strings equal, so the example prints the result of testing the return value against 0. The test evaluates to true if the Thread.CurrentThread.CurrentCulture property is set to the culture English (United States), designated "en-US". It evaluates to false if the current culture is set to Turkish (Turkey), designated "tr-TR".
```vb
Imports System
Imports System.Globalization
Imports System.Threading

Public Class TurkishISample
    Public Shared Sub Main()
        ' Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
        Console.WriteLine("Culture = {0}", _
            Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", String.Compare("file", _
            "FILE", True) = 0)

        ' Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")
        Console.WriteLine("Culture = {0}", _
            Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", String.Compare("file", _
            "FILE", True) = 0)
    End Sub
End Class
```
```csharp
using System;
using System.Globalization;
using System.Threading;

public class TurkishISample
{
    public static void Main()
    {
        // Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        Console.WriteLine("Culture = {0}",
            Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}",
            (string.Compare("file", "FILE", true) == 0));

        // Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        Console.WriteLine("Culture = {0}",
            Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}",
            (string.Compare("file", "FILE", true) == 0));
    }
}
```
The following output illustrates how the results vary by culture, because the case-insensitive comparison of "i" and "I" evaluates to true for the "en-US" culture and false for the "tr-TR" culture.
```
Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False
```
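For contrast, a comparison that does not consult the current culture gives the same answer under both settings. The sketch below is an illustration rather than something this section prescribes; it uses the `StringComparison.OrdinalIgnoreCase` and `StringComparison.CurrentCultureIgnoreCase` options, and the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class CultureInsensitiveCompareSketch
{
    // Culture-sensitive: the outcome depends on the current thread culture.
    public static bool EqualsCurrentCulture(string a, string b)
    {
        return string.Compare(a, b, StringComparison.CurrentCultureIgnoreCase) == 0;
    }

    // Ordinal, ignoring case: does not consult the current thread culture.
    public static bool EqualsOrdinal(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        foreach (string name in new string[] { "en-US", "tr-TR" })
        {
            Thread.CurrentThread.CurrentCulture = new CultureInfo(name);
            Console.WriteLine("{0}: culture-sensitive = {1}, ordinal = {2}",
                name,
                EqualsCurrentCulture("file", "FILE"),
                EqualsOrdinal("file", "FILE"));
        }
    }
}
```

The ordinal column should read True for both cultures, whereas the culture-sensitive column follows the behavior shown in the output above.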
In addition to the unique case mappings used in the Turkish and Azerbaijani alphabets, there are other custom case mappings and sorting rules that you should be aware of when considering string operations. The alphabets of nine cultures in the ASCII range (Unicode 0000 through Unicode 007F) contain two-letter pairs for which the result of a case-insensitive comparison, for example, using String.Compare, does not evaluate to equal when the case is mixed. These cultures are:
- Croatian (Croatia), "hr-HR"
- Czech (Czech Republic), "cs-CZ"
- Slovak (Slovakia), "sk-SK"
- Danish (Denmark), "da-DK"
- Norwegian (Bokmål, Norway), "nb-NO"
- Norwegian (Nynorsk, Norway), "nn-NO"
- Hungarian (Hungary), "hu-HU"
- Vietnamese (Vietnam), "vi-VN"
- Spanish (Spain, Traditional Sort), "es-ES_tradnl"
For example, in the Danish language, a case-insensitive comparison of the two-letter pairs "aA" and "AA" is not considered equal. In the Vietnamese alphabet, a case-insensitive comparison of the two-letter pairs "nG" and "NG" is not considered equal. Although you should be aware that these rules exist, in practice, it is unusual to run into a situation where a culture-sensitive comparison of these pairs creates problems, since they are uncommon in fixed strings or identifiers.
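One way to check how a given runtime treats these letter pairs is to pass the culture explicitly to String.Compare. The sketch below is hedged accordingly: the class and method names are hypothetical, and no expected output is shown for the Danish and Vietnamese cases because the result can depend on the runtime version and on the globalization data it uses.

```csharp
using System;
using System.Globalization;

public static class LetterPairSketch
{
    // Case-insensitive comparison of two strings under a named culture.
    public static bool EqualIgnoringCase(string a, string b, string cultureName)
    {
        CultureInfo culture = new CultureInfo(cultureName);
        return string.Compare(a, b, true, culture) == 0;
    }

    public static void Main()
    {
        // Pairs mentioned in the text, each checked under its own culture
        // and under en-US for comparison.
        Console.WriteLine("aA vs AA, da-DK: " + EqualIgnoringCase("aA", "AA", "da-DK"));
        Console.WriteLine("aA vs AA, en-US: " + EqualIgnoringCase("aA", "AA", "en-US"));
        Console.WriteLine("nG vs NG, vi-VN: " + EqualIgnoringCase("nG", "NG", "vi-VN"));
        Console.WriteLine("nG vs NG, en-US: " + EqualIgnoringCase("nG", "NG", "en-US"));
    }
}
```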
The alphabets of six cultures within the ASCII range have standard casing rules, but different sorting rules. These cultures are:
- Estonian (Estonia), "et-EE"
- Finnish (Finland), "fi-FI"
- Hungarian (Hungary, Technical Sort Order), "hu-HU_technl"
- Lithuanian (Lithuania), "lt-LT"
- Swedish (Finland), "sv-FI"
- Swedish (Sweden), "sv-SE"
For example, in the Swedish alphabet, the letter "w" sorts as if it is the letter "v". In application code, sorting operations tend to be used less frequently than equality comparisons and therefore are less likely to create problems.
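The effect of culture-specific sorting rules can be inspected by sorting the same words with different comparers. This is a minimal sketch with hypothetical class and method names and an arbitrary word list chosen for illustration; it deliberately shows no expected output, because the ordering a runtime produces for "sv-SE" depends on the collation data it ships with and may not match the rule described above.

```csharp
using System;
using System.Globalization;

public static class CultureSortSketch
{
    // Returns a copy of the words sorted with the named culture's rules (case-sensitive).
    public static string[] SortFor(string cultureName, string[] words)
    {
        string[] copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Create(new CultureInfo(cultureName), false));
        return copy;
    }

    // Returns a copy of the words sorted by UTF-16 code unit value, ignoring culture.
    public static string[] SortOrdinal(string[] words)
    {
        string[] copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    public static void Main()
    {
        // Arbitrary sample words (an assumption for illustration).
        string[] words = { "wagon", "vy", "vas", "zebra", "ärlig" };

        Console.WriteLine("sv-SE:   " + string.Join(", ", SortFor("sv-SE", words)));
        Console.WriteLine("en-US:   " + string.Join(", ", SortFor("en-US", words)));
        Console.WriteLine("ordinal: " + string.Join(", ", SortOrdinal(words)));
    }
}
```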
An additional 35 cultures have custom case mappings and sorting rules outside of the ASCII range. These rules are generally confined to the alphabets used by the specific cultures. Therefore, the likelihood that they will cause problems is low.
For details about the custom case mappings and sorting rules that apply to specific cultures, see The Unicode Standard at the Unicode home page.