Custom Case Mappings and Sorting Rules

11/16/2012

Case mappings, alphabetical order, and conventions for sequencing items vary from culture to culture. You should be aware of these variations and understand that they can cause the results of string operations to vary depending on culture.

The unique case-mapping rules for the Turkish alphabet illustrate how uppercase and lowercase mappings differ from language to language even when the languages use most of the same letters. In most Latin alphabets, the character "i" (Unicode 0069) is the lowercase version of the character "I" (Unicode 0049). However, the Turkish alphabet has two versions of the letter I: one with a dot and one without a dot. In Turkish, the character "I" (Unicode 0049) is considered the uppercase version of a different character, "ı" (Unicode 0131), and the character "i" (Unicode 0069) is considered the lowercase version of yet another character, "İ" (Unicode 0130). As a result, a case-insensitive string comparison of the characters "i" (Unicode 0069) and "I" (Unicode 0049) that succeeds for most cultures fails for the culture Turkish (Turkey), designated "tr-TR".
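
The mappings described above can be observed directly with culture-aware casing. The following is a minimal sketch, not part of the original sample set. It assumes a runtime that has culture data available, which means the app is not running in .NET's globalization-invariant mode. The class and method names (TurkishCasingDemo, UpperOfSmallI, LowerOfCapitalI) are hypothetical. The sketch prints the code point produced by uppercasing "i" and by lowercasing "I" for "en-US" and "tr-TR".

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; assumes culture data is available (not invariant mode).
public static class TurkishCasingDemo
{
    // Returns the culture-specific uppercase form of "i" (U+0069).
    public static string UpperOfSmallI(string cultureName) =>
        "i".ToUpper(CultureInfo.GetCultureInfo(cultureName));

    // Returns the culture-specific lowercase form of "I" (U+0049).
    public static string LowerOfCapitalI(string cultureName) =>
        "I".ToLower(CultureInfo.GetCultureInfo(cultureName));

    public static void Main()
    {
        foreach (string name in new[] { "en-US", "tr-TR" })
        {
            // Print the resulting code points so that dotted and dotless forms are unambiguous.
            Console.WriteLine("{0}: upper(i) = U+{1:X4}, lower(I) = U+{2:X4}",
                name, (int)UpperOfSmallI(name)[0], (int)LowerOfCapitalI(name)[0]);
        }
    }
}
// Expected output when culture data is available:
//   en-US: upper(i) = U+0049, lower(I) = U+0069
//   tr-TR: upper(i) = U+0130, lower(I) = U+0131
```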

Note

The culture Azerbaijani (Azerbaijan, Latin), designated "az-Latn-AZ", also uses this case-mapping rule.

The following example demonstrates how the result of a case-insensitive String.Compare operation performed on the strings "file" and "FILE" differs depending on culture. The comparison returns 0, indicating that the strings are equal, if the CurrentCulture property is set to the culture English (United States), designated "en-US". It returns a nonzero value if CurrentCulture is set to Turkish (Turkey), designated "tr-TR". The output illustrates how the results vary by culture: the case-insensitive comparison of "i" and "I" evaluates to true for the "en-US" culture and false for the "tr-TR" culture.

Imports System
Imports System.Globalization
Imports System.Threading

Public Class TurkishISample
    Public Shared Sub Main()
        ' Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US")
        Console.WriteLine("Culture = {0}", _
            Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", _
                          String.Compare("file", "FILE", True) = 0)
        Console.WriteLine()

        ' Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR")
        Console.WriteLine("Culture = {0}", _
            Thread.CurrentThread.CurrentCulture.DisplayName)
        Console.WriteLine("(file == FILE) = {0}", _
                          String.Compare("file", "FILE", True) = 0)
    End Sub 
End Class 
' The example displays the following output: 
'       Culture = English (United States) 
'       (file == FILE) = True 
'        
'       Culture = Turkish (Turkey) 
'       (file == FILE) = False

using System;
using System.Globalization;
using System.Threading;

public class TurkishISample
{
    public static void Main()
    {
        // Set the CurrentCulture property to English in the U.S.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
        Console.WriteLine("Culture = {0}",
                          Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}\n",
                          string.Compare("file", "FILE", true) == 0);

        // Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR");
        Console.WriteLine("Culture = {0}",
                          Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}",
                          string.Compare("file", "FILE", true) == 0);
    }
}
// The example displays the following output: 
//       Culture = English (United States) 
//       (file == FILE) = True 
//  
//       Culture = Turkish (Turkey) 
//       (file == FILE) = False
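
When a comparison should not depend on the current culture, as is often the case with identifiers such as file names, one option is to request an ordinal case-insensitive comparison explicitly. The following is a minimal sketch that uses the real StringComparison.OrdinalIgnoreCase and StringComparison.CurrentCultureIgnoreCase options. The class and helper names are hypothetical. It sets the current culture to "tr-TR" as in the example above, so only the culture-sensitive comparison is affected by the Turkish rules.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical demo class contrasting culture-sensitive and ordinal comparisons.
public static class CultureIndependentCompareDemo
{
    // Case-insensitive comparison that ignores CurrentCulture entirely.
    public static bool EqualsIgnoreCaseOrdinal(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Case-insensitive comparison that applies the rules of CurrentCulture.
    public static bool EqualsIgnoreCaseCurrentCulture(string a, string b) =>
        string.Compare(a, b, StringComparison.CurrentCultureIgnoreCase) == 0;

    public static void Main()
    {
        // Set the CurrentCulture property to Turkish in Turkey.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR");

        Console.WriteLine("CurrentCultureIgnoreCase: {0}",
                          EqualsIgnoreCaseCurrentCulture("file", "FILE"));
        Console.WriteLine("OrdinalIgnoreCase:        {0}",
                          EqualsIgnoreCaseOrdinal("file", "FILE"));
    }
}
// Expected output when culture data is available:
//   CurrentCultureIgnoreCase: False
//   OrdinalIgnoreCase:        True
```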

Additional Custom Case Mappings and Sorting Rules

In addition to the unique case mappings used in the Turkish and Azerbaijani alphabets, there are other custom case mappings and sorting rules that you should be aware of when considering string operations. The alphabets of nine cultures contain two-letter pairs within the ASCII range (Unicode 0000 through Unicode 007F) for which a case-insensitive comparison, such as one performed by String.Compare, does not evaluate to equal when the case of the pair is mixed. These cultures are:

Croatian (Croatia), "hr-HR"
Czech (Czech Republic), "cs-CZ"
Slovak (Slovakia), "sk-SK"
Danish (Denmark), "da-DK"
Norwegian (Bokmål, Norway), "nb-NO"
Norwegian (Nynorsk, Norway), "nn-NO"
Hungarian (Hungary), "hu-HU"
Vietnamese (Vietnam), "vi-VN"
Spanish (Spain, Traditional Sort), "es-ES_tradnl"

For example, in the Danish language, a case-insensitive comparison of the two-letter pairs "aA" and "AA" is not considered equal. In the Vietnamese alphabet, a case-insensitive comparison of the two-letter pairs "nG" and "NG" is not considered equal. Although you should be aware that these rules exist, in practice, it is unusual to run into a situation where a culture-sensitive comparison of these pairs creates problems, since they are uncommon in fixed strings or identifiers.
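
The following minimal sketch runs these pair comparisons through CompareInfo.Compare with CompareOptions.IgnoreCase. It is illustrative only: the class and helper names are hypothetical, and the results depend on the collation data your runtime uses (for example, Windows NLS data or ICU). For that reason, no expected output is listed for the Vietnamese line.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class for mixed-case two-letter pairs.
public static class TwoLetterPairDemo
{
    // True if the named culture's case-insensitive comparison treats a and b as equal.
    public static bool EqualIgnoringCase(string cultureName, string a, string b)
    {
        CompareInfo compare = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        return compare.Compare(a, b, CompareOptions.IgnoreCase) == 0;
    }

    public static void Main()
    {
        // Each row: culture name, first string, second string.
        string[,] cases =
        {
            { "en-US", "aA", "AA" },
            { "da-DK", "aA", "AA" },
            { "en-US", "nG", "NG" },
            { "vi-VN", "nG", "NG" }
        };

        for (int i = 0; i < cases.GetLength(0); i++)
        {
            Console.WriteLine("{0}: \"{1}\" vs \"{2}\" equal ignoring case = {3}",
                cases[i, 0], cases[i, 1], cases[i, 2],
                EqualIgnoringCase(cases[i, 0], cases[i, 1], cases[i, 2]));
        }
    }
}
```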

The alphabets of six cultures use standard casing rules but different sorting rules within the ASCII range. These cultures are:

Estonian (Estonia), "et-EE"
Finnish (Finland), "fi-FI"
Hungarian (Hungary, Technical Sort Order), "hu-HU_technl"
Lithuanian (Lithuania), "lt-LT"
Swedish (Finland), "sv-FI"
Swedish (Sweden), "sv-SE"

For example, in the Swedish alphabet, the letter "w" is sorted as if it were the letter "v". In application code, sorting operations tend to be used less frequently than equality comparisons and therefore are less likely to create problems.
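
A culture-specific sort can be sketched with StringComparer.Create, which builds a comparer from a CultureInfo. The following is a minimal sketch with hypothetical class and method names. Depending on the collation data your runtime uses (for example, Windows NLS data or ICU), the sv-SE line may place "wb" between "va" and "vc", as the rule above describes, or may sort it after them. Only the en-US ordering is stated here with certainty.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class for culture-specific sorting.
public static class SwedishSortDemo
{
    // Returns a sorted copy of words, ordered by the named culture's linguistic rules.
    public static string[] SortForCulture(string cultureName, string[] words)
    {
        string[] sorted = (string[])words.Clone();
        StringComparer comparer =
            StringComparer.Create(CultureInfo.GetCultureInfo(cultureName), ignoreCase: false);
        Array.Sort(sorted, comparer);
        return sorted;
    }

    public static void Main()
    {
        string[] words = { "wb", "va", "vc" };
        foreach (string name in new[] { "en-US", "sv-SE" })
        {
            Console.WriteLine("{0}: {1}", name, string.Join(", ", SortForCulture(name, words)));
        }
        // en-US prints "va, vc, wb"; the sv-SE order depends on the collation data in use.
    }
}
```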

An additional 35 cultures have custom case mappings and sorting rules outside of the ASCII range. These rules are generally confined to the alphabets used by the specific cultures. Therefore, the likelihood that they will cause problems is low.

For details about the custom case mappings and sorting rules that apply to specific cultures, see The Unicode Standard at the Unicode home page.
