
Custom Case Mappings and Sorting Rules

Case mappings, alphabetical order, and conventions for sequencing items vary from culture to culture. You should be aware of these variations and understand that they can cause the results of string operations to vary depending on culture.

The unique case-mapping rules for the Turkish alphabet illustrate how uppercase and lowercase mappings differ from language to language, even when the languages share most of the same letters. In most Latin alphabets, the character "i" (U+0069) is the lowercase version of the character "I" (U+0049). However, the Turkish alphabet has a dotted and a dotless form of both the uppercase and the lowercase "I". In Turkish, the character "I" (U+0049) is considered the uppercase version of the character "ı" (U+0131), whereas "İ" (U+0130) is considered the uppercase version of the character "i" (U+0069). As a result, a case-insensitive string comparison of the characters "i" (U+0069) and "I" (U+0049) that succeeds for most cultures fails for the Turkish (Turkey) culture, designated tr-TR.

Note

The Azerbaijani culture (Azerbaijan, Latin), designated az-Latn-AZ, also uses this case-mapping rule.
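You can observe the case mappings directly through TextInfo.ToUpper and TextInfo.ToLower. The following minimal sketch prints the code point that each mapping produces under the invariant culture and under tr-TR. It assumes a runtime with full globalization support (NLS or ICU data, not globalization-invariant mode). The class name CaseMappingExample is illustrative only.

```csharp
using System;
using System.Globalization;

public class CaseMappingExample
{
    public static void Main()
    {
        TextInfo invariant = CultureInfo.InvariantCulture.TextInfo;
        TextInfo turkish = new CultureInfo("tr-TR").TextInfo;

        // Print code points rather than glyphs so the output does not
        // depend on the console's encoding or font.
        Console.WriteLine("Invariant ToUpper('i') = U+{0:X4}", (int)invariant.ToUpper('i'));
        Console.WriteLine("tr-TR     ToUpper('i') = U+{0:X4}", (int)turkish.ToUpper('i'));
        Console.WriteLine("Invariant ToLower('I') = U+{0:X4}", (int)invariant.ToLower('I'));
        Console.WriteLine("tr-TR     ToLower('I') = U+{0:X4}", (int)turkish.ToLower('I'));
    }
}
// Expected output (with full globalization support):
//      Invariant ToUpper('i') = U+0049
//      tr-TR     ToUpper('i') = U+0130
//      Invariant ToLower('I') = U+0069
//      tr-TR     ToLower('I') = U+0131
```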

The following code example shows how the result of a case-insensitive String.Compare operation on the strings "FILE" and "file" depends on culture. String.Compare returns 0 when the strings are equal, so the test String.Compare(...) == 0 evaluates to true when the Thread.CurrentThread.CurrentCulture property is set to English (United States), designated en-US. It evaluates to false when the current culture is set to Turkish (Turkey), designated tr-TR.

using System;
using System.Globalization;
using System.Threading;

public class Example
{
    public static void Main()
    {
       // Set the CurrentCulture property to English in the U.S.
       Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
       Console.WriteLine("Culture = {0}",   
                         Thread.CurrentThread.CurrentCulture.DisplayName);
       Console.WriteLine("(file == FILE) = {0}\n", (string.Compare("file", 
                         "FILE", true) == 0));

       // Set the CurrentCulture property to Turkish in Turkey.
       Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
       Console.WriteLine("Culture = {0}",
                         Thread.CurrentThread.CurrentCulture.DisplayName);
       Console.WriteLine("(file == FILE) = {0}", (string.Compare("file", 
                         "FILE", true) == 0));
    }
}
// The example displays the following output: 
//      Culture = English (United States) 
//      (file == FILE) = True 
// 
//      Culture = Turkish (Turkey) 
//      (file == FILE) = False
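Some strings, such as file names or protocol keywords, must match regardless of the user's culture. For these, one option is to request a culture-independent comparison explicitly instead of relying on the current culture. The sketch below makes the same full-globalization-support assumption as before. It sets the current culture to tr-TR and then compares "file" and "FILE" three ways: with the current (Turkish) rules, with StringComparison.OrdinalIgnoreCase, and with the invariant culture. The class name is illustrative.

```csharp
using System;
using System.Globalization;
using System.Threading;

public class ExplicitComparisonExample
{
    public static void Main()
    {
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

        // Culture-sensitive, case-insensitive comparison that uses the current (Turkish) culture.
        bool turkish = string.Compare("file", "FILE", true) == 0;

        // Ordinal comparison with case-insensitive matching; ignores the current culture.
        bool ordinal = string.Equals("file", "FILE", StringComparison.OrdinalIgnoreCase);

        // Culture-sensitive comparison pinned to the invariant culture.
        bool invariant = string.Compare("file", "FILE",
                             CultureInfo.InvariantCulture, CompareOptions.IgnoreCase) == 0;

        Console.WriteLine("tr-TR:             {0}", turkish);
        Console.WriteLine("OrdinalIgnoreCase: {0}", ordinal);
        Console.WriteLine("InvariantCulture:  {0}", invariant);
    }
}
// Expected output (with full globalization support):
//      tr-TR:             False
//      OrdinalIgnoreCase: True
//      InvariantCulture:  True
```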

In addition to the unique case mappings used in the Turkish and Azerbaijani alphabets, there are other custom case mappings and sorting rules that you should be aware of when considering string operations. The alphabets of nine cultures contain two-letter pairs in the ASCII range (U+0000 through U+007F) for which a case-insensitive comparison, such as one performed with String.Compare, does not report equality when the case of the pair is mixed. These cultures are:

  • Croatian (Croatia), hr-HR

  • Czech (Czech Republic), cs-CZ

  • Slovak (Slovakia), sk-SK

  • Danish (Denmark), da-DK

  • Norwegian (Bokmål, Norway), nb-NO

  • Norwegian (Nynorsk, Norway), nn-NO

  • Hungarian (Hungary), hu-HU

  • Vietnamese (Vietnam), vi-VN

  • Spanish (Spain, Traditional Sort), es-ES_tradnl

For example, in the Danish language, a case-insensitive comparison of the two-letter pairs "aA" and "AA" is not considered equal. In the Vietnamese alphabet, a case-insensitive comparison of the two-letter pairs "nG" and "NG" is not considered equal. Although you should be aware that these rules exist, in practice, it is unusual to run into a situation where a culture-sensitive comparison of these pairs creates problems, since they are uncommon in fixed strings or identifiers.
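The following minimal sketch shows the Danish case. It compares "aA" with "AA" case-insensitively under en-US and under da-DK, and passes each culture explicitly so that the current culture does not matter. It assumes full globalization support. Because the exact collation data differs between runtimes (NLS on Windows, ICU elsewhere), the program prints the results rather than stating them in advance. The class name is illustrative.

```csharp
using System;
using System.Globalization;

public class DanishPairExample
{
    public static void Main()
    {
        string[] cultureNames = { "en-US", "da-DK" };
        foreach (string name in cultureNames)
        {
            CultureInfo culture = new CultureInfo(name);

            // Case-insensitive, culture-sensitive comparison under the given culture.
            int result = string.Compare("aA", "AA", true, culture);

            Console.WriteLine("{0}: \"aA\" equals \"AA\" (ignoring case) -> {1}",
                              name, result == 0);
        }
    }
}
```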

The alphabets of six cultures use standard casing rules within the ASCII range but have different sorting rules. These cultures are:

  • Estonian (Estonia), et-EE

  • Finnish (Finland), fi-FI

  • Hungarian (Hungary, Technical Sort Order), hu-HU_technl

  • Lithuanian (Lithuania), lt-LT

  • Swedish (Finland), sv-FI

  • Swedish (Sweden), sv-SE

For example, in the Swedish alphabet, the letter "w" sorts as if it were the letter "v". In application code, sorting operations are used less often than equality comparisons and are therefore less likely to cause problems.
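You can request a specific culture's sort order by passing a culture-specific comparer to Array.Sort. The sketch below sorts the same hypothetical word list under en-US and under sv-SE, using StringComparer.Create to build each comparer. It again assumes full globalization support. Whether "w" is actually sorted as "v" depends on the collation data (NLS or ICU) used by the runtime, so no sv-SE output is claimed here.

```csharp
using System;
using System.Globalization;

public class SwedishSortExample
{
    public static void Main()
    {
        // Hypothetical sample words that begin with "v" or "w".
        string[] words = { "wine", "vodka", "water", "vinegar" };

        foreach (string name in new[] { "en-US", "sv-SE" })
        {
            // Sort a copy so that each culture starts from the same input order.
            string[] sorted = (string[])words.Clone();

            // StringComparer.Create returns a comparer that applies the given culture's sort rules.
            Array.Sort(sorted, StringComparer.Create(new CultureInfo(name), ignoreCase: false));

            Console.WriteLine("{0}: {1}", name, string.Join(", ", sorted));
        }
    }
}
```

Under en-US, the words sort in ordinary alphabetical order (vinegar, vodka, water, wine). Comparing that line with the sv-SE line shows whether the runtime applies the Swedish rule.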

An additional 35 cultures have custom case mappings and sorting rules outside of the ASCII range. These rules are generally confined to the alphabets used by the specific cultures. Therefore, the likelihood that they will cause problems is low.

For details about the custom case mappings and sorting rules that apply to specific cultures, see The Unicode Standard at the Unicode home page.

© 2013 Microsoft. All rights reserved.