Comparing and Sorting Data for a Specific Culture

Conventions for sorting and ordering data vary from culture to culture. For example, sort order may be case-sensitive or case-insensitive. It may be based on phonetics or on the visual representation of characters. In East Asian languages, characters are sorted by the stroke and radical of ideographs. Sorting also depends on the order languages and cultures use for the alphabet. For example, the Danish language has an "Æ" character that it sorts after "Z" in the alphabet. A world-ready application must be able to compare and sort data on a per-culture basis to support culture-specific and language-specific sorting conventions.

Note

In some scenarios, culture-sensitive behavior is not desirable. For more information about when and how to perform culture-insensitive operations, see Culture-Insensitive String Operations.

The CompareInfo class provides methods you can use to perform culture-sensitive string comparisons. The CultureInfo class has a CompareInfo property that gets an instance of the CompareInfo class. A CompareInfo object defines how to compare and sort strings for a specific culture. The String.Compare method uses the information in a culture's CompareInfo object to compare strings.

The following example illustrates how the String.Compare method evaluates two strings ("Apple" and "Æble") differently, depending on the culture used for the comparison. First, the System.Threading.Thread.CurrentThread.CurrentCulture property is set to da-DK for the Danish (Denmark) culture. The Danish language treats the character "Æ" as an individual letter and sorts it after "Z" in the alphabet. Therefore, the string "Æble" is determined to be greater than "Apple" for the Danish (Denmark) culture. Next, the System.Threading.Thread.CurrentThread.CurrentCulture property is set to en-US for the English (United States) culture. The English language treats the character "Æ" as a special symbol and sorts it before the letter "A" in the alphabet. Therefore, the string "Æble" is determined to be less than "Apple" for the English (United States) culture.

using System;
using System.Globalization;
using System.Threading;

public class CompareStringSample
{
   public static void Main()
   {
      string str1 = "Apple";
      string str2 = "Æble"; 

      // Sets the CurrentCulture to Danish in Denmark.
      Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
      // Compares the two strings. 
      int result1 = String.Compare(str1, str2);
      Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe " + 
                        "result of comparing {0} with {1} is: {2}", str1, str2, 
                        result1);

      // Sets the CurrentCulture to English in the U.S.
      Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
      // Compares the two strings. 
      int result2 = String.Compare(str1, str2);
      Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe " + 
                        "result of comparing {0} with {1} is: {2}", str1, str2, 
                        result2);
   }
}
// The example displays the following output: 
//    When the CurrentCulture is "da-DK",
//    the result of comparing Apple with Æble is: -1 
//     
//    When the CurrentCulture is "en-US",
//    the result of comparing Apple with Æble is: 1
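The same comparison can be made without changing the thread's current culture. The following is a minimal sketch, assuming the String.Compare(String, String, CultureInfo, CompareOptions) overload and the CompareInfo.Compare(String, String) method. The class name ExplicitCultureCompare and the helper CompareFor are illustrative names that are not part of the original example. The numeric results are not listed because they depend on the collation data of the platform.

```csharp
using System;
using System.Globalization;

public class ExplicitCultureCompare
{
   // Hypothetical helper: compares two strings by using the sorting
   // conventions of the named culture instead of the current culture.
   public static int CompareFor(string cultureName, string strA, string strB)
   {
      CultureInfo culture = new CultureInfo(cultureName);
      return String.Compare(strA, strB, culture, CompareOptions.None);
   }

   public static void Main()
   {
      string str1 = "Apple";
      string str2 = "Æble";

      foreach (string name in new[] { "da-DK", "en-US" })
      {
         // The same comparison made directly on the culture's CompareInfo object.
         int viaCompareInfo = new CultureInfo(name).CompareInfo.Compare(str1, str2);
         Console.WriteLine("{0}: String.Compare = {1}, CompareInfo.Compare = {2}",
                           name, CompareFor(name, str1, str2), viaCompareInfo);
      }
   }
}
```

Passing the culture explicitly keeps the comparison independent of whatever culture the calling thread happens to use, which can be helpful in library code.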

For more information about comparing strings, see Comparing Strings.

Some cultures support more than one sort order. For example, the zh-CN (Chinese - PRC) culture supports two sort orders: by pronunciation (default) and by stroke count. When you create a CultureInfo object by using a culture name (for example, zh-CN), the default sort order is used. To specify the alternate sort order, create a CultureInfo object by calling the CultureInfo.CultureInfo(Int32) or CultureInfo.CultureInfo(Int32, Boolean) constructors and using the identifier for the alternate sort order, and then obtain a CompareInfo object from the CompareInfo property to use in string comparisons. Alternatively, you can create a CompareInfo object directly by using the CompareInfo.GetCompareInfo method, and specify the identifier for the alternate sort order.

The following table lists the cultures that support alternate sort orders and the identifiers for the default and alternate sort orders.

Culture name   Culture                   Default sort name and identifier   Alternate sort name and identifier
------------   -----------------------   --------------------------------   -------------------------------------
es-ES          Spanish (Spain)           International: 0x00000C0A          Traditional: 0x0000040A
zh-TW          Chinese (Taiwan)          Stroke Count: 0x00000404           Bopomofo: 0x00030404
zh-CN          Chinese (PRC)             Pronunciation: 0x00000804          Stroke Count: 0x00020804
zh-HK          Chinese (Hong Kong SAR)   Stroke Count: 0x00000c04           Stroke Count: 0x00020c04
zh-SG          Chinese (Singapore)       Pronunciation: 0x00001004          Stroke Count: 0x00021004
zh-MO          Chinese (Macao SAR)       Pronunciation: 0x00001404          Stroke Count: 0x00021404
ja-JP          Japanese (Japan)          Default: 0x00000411                Unicode: 0x00010411
ko-KR          Korean (Korea)            Default: 0x00000412                Korean Xwansung - Unicode: 0x00010412
de-DE          German (Germany)          Dictionary: 0x00000407             Phone Book Sort DIN: 0x00010407
hu-HU          Hungarian (Hungary)       Default: 0x0000040e                Technical Sort: 0x0001040e
ka-GE          Georgian (Georgia)        Traditional: 0x00000437            Modern Sort: 0x00010437
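As an illustration of selecting an alternate sort order, the following minimal sketch uses the German (Germany) phone book identifier from the table above with both the CultureInfo(Int32) constructor and the CompareInfo.GetCompareInfo(Int32) method. The class name AlternateSortSample and the sample strings are assumptions chosen for illustration. Alternate sort identifiers might not be available on every platform or runtime, so the sketch catches CultureNotFoundException, and no output is listed because the results depend on the platform.

```csharp
using System;
using System.Globalization;

public class AlternateSortSample
{
   // Identifier for the German (Germany) phone book sort order, taken from the table.
   public const int GermanPhoneBook = 0x00010407;

   public static void Main()
   {
      string str1 = "Müller";
      string str2 = "Mueller";

      try
      {
         // Default (dictionary) sort order, created from the culture name.
         CompareInfo dictionary = new CultureInfo("de-DE").CompareInfo;
         // Alternate sort order, obtained through a CultureInfo object.
         CompareInfo phoneBook = new CultureInfo(GermanPhoneBook).CompareInfo;
         // Alternate sort order, obtained directly from CompareInfo.
         CompareInfo direct = CompareInfo.GetCompareInfo(GermanPhoneBook);

         Console.WriteLine("Dictionary sort: {0}", dictionary.Compare(str1, str2));
         Console.WriteLine("Phone book sort: {0}", phoneBook.Compare(str1, str2));
         Console.WriteLine("Phone book sort (direct): {0}", direct.Compare(str1, str2));
      }
      catch (CultureNotFoundException e)
      {
         // Raised when the identifier is not supported on this system.
         Console.WriteLine("Alternate sort order is not available: {0}", e.Message);
      }
   }
}
```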

You can call the overloaded CompareInfo.IndexOf method to retrieve the zero-based index of a character or substring within a specified string. The method returns -1 if the character or substring is not found. When searching for a specified character, the IndexOf overloads that accept a parameter of type CompareOptions may perform the comparison differently from the method overloads that do not accept this parameter. The method overloads without this parameter perform a culture-sensitive, case-sensitive search. For example, a Unicode value that represents a precomposed character such as the ligature "Æ" (\u00C6) might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041\u0045), depending on the culture. To perform an ordinal (culture-insensitive) search for exact Unicode values, use one of the CompareInfo.IndexOf overloads that take a parameter of type CompareOptions and set the parameter to Ordinal.

You can also call overloads of the String.IndexOf method that search for a character to perform an ordinal (culture-insensitive) search. Note that the overloads of this method that search for a string perform a culture-sensitive search.

The following example illustrates the difference in the results returned by the CompareInfo.IndexOf method depending on culture. The example creates a CultureInfo object for the Danish (Denmark) and English (United States) cultures and uses the overloads of the CompareInfo.IndexOf method to search for the character "æ" in the strings "æble" and "aeble". For the Danish (Denmark) culture, the CompareInfo.IndexOf(String, Char) method and the CompareInfo.IndexOf(String, Char, CompareOptions) method that has a comparison option of CompareOptions.Ordinal return the same value for each string. This indicates that the character "æ" is considered equivalent only to the Unicode value \u00E6. For the English (United States) culture, the two overloads return different results when searching for "æ" in the string "aeble". This indicates that the culture-sensitive comparison performed by the CompareInfo.IndexOf(String, Char) method evaluates the character "æ" as equivalent to its components "a" and "e".

using System;
using System.Globalization;
using System.Threading;

public class Example
{
   public static void Main()
   {
      string str1 = "æble";
      string str2 = "aeble";
      char find = 'æ';

      // Create CultureInfo objects representing the Danish (Denmark) 
      // and English (United States) cultures.
      CultureInfo[] cultures = { CultureInfo.CreateSpecificCulture("da-DK"), 
                                 CultureInfo.CreateSpecificCulture("en-US") };

      foreach (var ci in cultures) {
         Thread.CurrentThread.CurrentCulture = ci;

         int result1 = ci.CompareInfo.IndexOf(str1, find);
         int result2 = ci.CompareInfo.IndexOf(str2, find);
         int result3 = ci.CompareInfo.IndexOf(str1, find,  
                                              CompareOptions.Ordinal);
         int result4 = ci.CompareInfo.IndexOf(str2, find, 
                                              CompareOptions.Ordinal);      

         Console.WriteLine("\nThe current culture is {0}", 
                           CultureInfo.CurrentCulture.Name);
         Console.WriteLine("\n   CompareInfo.IndexOf(string, char) method:");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str1, result1);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char) method:");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str2, result2);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str1, result3);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str2, result4);
         Console.WriteLine();
      }   
   }
}
// The example displays the following output:
//    The current culture is da-DK 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string aeble: -1 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string aeble: -1 
//     
//     
//    The current culture is en-US 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string aeble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string aeble: -1
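The String.IndexOf behavior described before the preceding example can be sketched in the same way. The following is a minimal sketch, assuming the String.IndexOf(Char), String.IndexOf(String), and String.IndexOf(String, StringComparison) overloads. The class name StringIndexOfSample and its helper methods are illustrative names. Only the ordinal results are annotated, because the culture-sensitive result depends on the current culture and platform.

```csharp
using System;

public class StringIndexOfSample
{
   // Character overload: searches for the exact Unicode value (ordinal).
   public static int FindChar(string source, char value)
   {
      return source.IndexOf(value);
   }

   // String overload with an explicit ordinal comparison.
   public static int FindOrdinal(string source, string value)
   {
      return source.IndexOf(value, StringComparison.Ordinal);
   }

   public static void Main()
   {
      string str = "aeble";

      // The character 'æ' (\u00E6) does not occur in "aeble".
      Console.WriteLine(FindChar(str, 'æ'));       // -1
      Console.WriteLine(FindOrdinal(str, "æ"));    // -1

      // String overload without a comparison type: culture-sensitive,
      // so the result depends on the current culture.
      Console.WriteLine(str.IndexOf("æ"));
   }
}
```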

You can use some of the overloads of the Array.Sort method to sort arrays based on the current culture. The following example creates an array of three strings. First, it sets the System.Threading.Thread.CurrentThread.CurrentCulture property to en-US and calls the Array.Sort(Array) method. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture property to da-DK and calls the Array.Sort method again. Notice how the resulting sort order differs from the en-US results because it uses the sorting conventions for Danish (Denmark).

using System;
using System.Globalization;
using System.Threading;

public class ArraySort 
{
   public static void Main(String[] args) 
   {
      // Create and initialize a new array to store the strings. 
      string[] stringArray = { "Apple", "Æble", "Zebra"};

      // Display the values of the array.
      Console.WriteLine( "The original string array:");
      PrintIndexAndValues(stringArray);

      // Set the CurrentCulture to "en-US".
      Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
      // Sort the values of the array.
      Array.Sort(stringArray);

      // Display the values of the array.
      Console.WriteLine("After sorting for the culture \"en-US\":");
      PrintIndexAndValues(stringArray); 

      // Set the CurrentCulture to "da-DK".
      Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
      // Sort the values of the Array.
      Array.Sort(stringArray);

      // Display the values of the array.
      Console.WriteLine("After sorting for the culture \"da-DK\":");
      PrintIndexAndValues(stringArray); 
   }
   public static void PrintIndexAndValues(string[] myArray)  
   {
      for (int i = myArray.GetLowerBound(0); i <= 
            myArray.GetUpperBound(0); i++ )
         Console.WriteLine("[{0}]: {1}", i, myArray[i]);
      Console.WriteLine();      
   }
}
// The example displays the following output: 
//       The original string array: 
//       [0]: Apple 
//       [1]: Æble 
//       [2]: Zebra 
//        
//       After sorting for the culture "en-US":
//       [0]: Æble 
//       [1]: Apple 
//       [2]: Zebra 
//        
//       After sorting for the culture "da-DK":
//       [0]: Apple 
//       [1]: Zebra 
//       [2]: Æble
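An array can also be sorted for a specific culture without changing the current culture. The following is a minimal sketch, assuming the StringComparer.Create(CultureInfo, Boolean) method and the Array.Sort overload that accepts a comparer. The class name ComparerSortSample and the helper SortFor are illustrative names. The output is not listed here; on the .NET Framework it is expected to follow the same culture-specific ordering as the preceding example.

```csharp
using System;
using System.Globalization;

public class ComparerSortSample
{
   // Hypothetical helper: returns a copy of the array sorted by using the
   // conventions of the named culture (case-sensitive).
   public static string[] SortFor(string cultureName, string[] values)
   {
      string[] copy = (string[]) values.Clone();
      StringComparer comparer =
         StringComparer.Create(new CultureInfo(cultureName), false);
      Array.Sort(copy, comparer);
      return copy;
   }

   public static void Main()
   {
      string[] stringArray = { "Apple", "Æble", "Zebra" };

      foreach (string name in new[] { "en-US", "da-DK" })
      {
         Console.WriteLine("After sorting for the culture \"{0}\":", name);
         foreach (string value in SortFor(name, stringArray))
            Console.WriteLine("   {0}", value);
      }
   }
}
```

Because SortFor works on a copy, the original array keeps its order, which makes it easy to compare the results for several cultures side by side.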

The .NET Framework uses sort keys to support culturally sensitive sort operations. Each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic. A sort key provides a repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. For additional information about sort keys, see Unicode Technical Standard #10: Unicode Collation Algorithm.

In the .NET Framework, the SortKey class maps strings to their sort keys. You can use the CompareInfo.GetSortKey method to create a sort key for a string that you specify. The result is a sequence of bytes that can differ depending on the culture of the CompareInfo object and the CompareOptions value specified. For example, if you specify the value CompareOptions.IgnoreCase when creating a sort key, a string comparison operation using the sort key is case-insensitive.
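The effect of the CompareOptions value on a sort key can be sketched as follows. This is a minimal sketch, assuming the CompareInfo.GetSortKey(String, CompareOptions) overload and the SortKey.Compare method. The class name SortKeyOptionsSample, the helper CompareKeys, and the sample strings are illustrative choices.

```csharp
using System;
using System.Globalization;

public class SortKeyOptionsSample
{
   // Hypothetical helper: creates en-US sort keys for both strings with the
   // given options and compares the keys.
   public static int CompareKeys(string strA, string strB, CompareOptions options)
   {
      CompareInfo ci = new CultureInfo("en-US").CompareInfo;
      SortKey keyA = ci.GetSortKey(strA, options);
      SortKey keyB = ci.GetSortKey(strB, options);
      return SortKey.Compare(keyA, keyB);
   }

   public static void Main()
   {
      // Keys created without options retain the case weights,
      // so the two keys are not equal.
      Console.WriteLine(CompareKeys("apple", "APPLE", CompareOptions.None));

      // Keys created with IgnoreCase omit the case difference.
      Console.WriteLine(CompareKeys("apple", "APPLE", CompareOptions.IgnoreCase));   // 0
   }
}
```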

After you create a sort key for a string, you can pass it as a parameter to methods provided by the SortKey class. The SortKey.Compare method lets you compare sort keys. Because this method performs a simple byte-by-byte comparison, it is much faster than the String.Compare method. If your application performs a large number of sorting operations, you can improve its performance by generating and storing sort keys for all the strings that it uses. When a sort or comparison operation is required, the application can use the sort keys instead of the strings.

The following example creates sort keys for two strings (str1 and str2) when the CurrentCulture property is set to da-DK. It compares the two strings by using the SortKey.Compare method and displays the results. The method returns a negative integer if str1 is less than str2, 0 (zero) if str1 and str2 are equal, and a positive integer if str1 is greater than str2. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture property to en-US and creates new sort keys for the same strings. The example compares the sort keys and displays the results. Notice that the sort results differ based on the setting for the current culture. Although the results of the following example are identical to the results of the Comparing Strings example earlier in this topic, using the SortKey.Compare method is faster than using the String.Compare method.

using System;
using System.Threading;
using System.Globalization;

public class SortKeySample 
{
   public static void Main(String[] args) 
   {
      String str1 = "Apple";
      String str2 = "Æble";

      // Set the CurrentCulture to "da-DK".
      CultureInfo dk = new CultureInfo("da-DK");
      Thread.CurrentThread.CurrentCulture = dk;

      // Create a culturally sensitive sort key for str1.
      SortKey sc1 = dk.CompareInfo.GetSortKey(str1);
      // Create a culturally sensitive sort key for str2.
      SortKey sc2 = dk.CompareInfo.GetSortKey(str2);

      // Compare the two sort keys and display the results. 
      int result1 = SortKey.Compare(sc1, sc2);
      Console.WriteLine("When the CurrentCulture is \"da-DK\",");
      Console.WriteLine("the result of comparing {0} with {1} is: {2}\n", 
                        str1, str2, result1);

      // Set the CurrentCulture to "en-US".
      CultureInfo enus = new CultureInfo("en-US");
      Thread.CurrentThread.CurrentCulture = enus ;

      // Create a culturally sensitive sort key for str1.
      SortKey sc3 = enus.CompareInfo.GetSortKey(str1);
      // Create a culturally sensitive sort key for str2.
      SortKey sc4 = enus.CompareInfo.GetSortKey(str2);

      // Compare the two sort keys and display the results. 
      int result2 = SortKey.Compare(sc3, sc4);
      Console.WriteLine("When the CurrentCulture is \"en-US\",");
      Console.WriteLine("the result of comparing {0} with {1} is: {2}", 
                        str1, str2, result2);
   }
}
// The example displays the following output: 
//       When the CurrentCulture is "da-DK",
//       the result of comparing Apple with Æble is: -1 
//        
//       When the CurrentCulture is "en-US",
//       the result of comparing Apple with Æble is: 1
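The suggestion to generate sort keys once and reuse them for sorting can be sketched as follows. This is a minimal sketch, assuming the Array.Sort overload that sorts an array of items by a parallel array of keys, and the Comparer<T>.Create method, which requires the .NET Framework 4.5 or later. The class name CachedSortKeySample and the helper SortByKeys are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class CachedSortKeySample
{
   // Hypothetical helper: creates one sort key per string, then sorts a copy
   // of the strings by comparing the precomputed keys.
   public static string[] SortByKeys(string[] values, CultureInfo culture)
   {
      string[] items = (string[]) values.Clone();
      SortKey[] keys = new SortKey[items.Length];
      for (int i = 0; i < items.Length; i++)
         keys[i] = culture.CompareInfo.GetSortKey(items[i]);

      // SortKey.Compare performs the comparison on the stored key data.
      Array.Sort(keys, items, Comparer<SortKey>.Create(SortKey.Compare));
      return items;
   }

   public static void Main()
   {
      string[] stringArray = { "Zebra", "Æble", "Apple" };
      foreach (string value in SortByKeys(stringArray, new CultureInfo("da-DK")))
         Console.WriteLine(value);
   }
}
```

In an application that sorts the same strings repeatedly, the keys array would typically be stored alongside the strings so that the keys are created only once.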

You can normalize strings to uppercase or lowercase before sorting them. Rules for string sorting and casing are language-specific, and rules vary even within Latin script-based languages. Only a few languages (including English) provide a sort order that matches the order of the code points; for example, A [65] comes before B [66]. For this reason, do not rely on code points to perform accurate sorting and string comparisons.
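The difference between code point order and a culture-sensitive order can be seen even for English. The following minimal sketch compares "a" and "B" by using String.CompareOrdinal and a culture-sensitive String.Compare call for en-US. The class name OrdinalVersusCultureSample and its helper methods are illustrative names, and only the sign of each result is displayed.

```csharp
using System;
using System.Globalization;

public class OrdinalVersusCultureSample
{
   // Sign of the code point (ordinal) comparison.
   public static int OrdinalSign(string strA, string strB)
   {
      return Math.Sign(String.CompareOrdinal(strA, strB));
   }

   // Sign of the culture-sensitive comparison for en-US.
   public static int CultureSign(string strA, string strB)
   {
      CultureInfo culture = new CultureInfo("en-US");
      return Math.Sign(String.Compare(strA, strB, culture, CompareOptions.None));
   }

   public static void Main()
   {
      // "a" is U+0061 (97) and "B" is U+0042 (66), so by code point
      // "a" comes after "B".
      Console.WriteLine(OrdinalSign("a", "B"));   // 1

      // By English sorting conventions, "a" comes before "B".
      Console.WriteLine(CultureSign("a", "B"));   // -1
   }
}
```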

The .NET Framework supports all Unicode normalization forms, and does not enforce or guarantee a specific form of normalization. You are responsible for choosing the appropriate normalization for your applications.
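As an illustration of choosing a normalization form explicitly, the following minimal sketch uses the String.Normalize(NormalizationForm) and String.IsNormalized(NormalizationForm) methods. The class name NormalizationSample, the helper EqualAfterFormC, and the sample strings are illustrative choices. Form C is used here only as an example, not as a recommendation.

```csharp
using System;
using System.Text;

public class NormalizationSample
{
   // Hypothetical helper: ordinal equality after both strings are
   // converted to normalization form C.
   public static bool EqualAfterFormC(string strA, string strB)
   {
      return String.Equals(strA.Normalize(NormalizationForm.FormC),
                           strB.Normalize(NormalizationForm.FormC),
                           StringComparison.Ordinal);
   }

   public static void Main()
   {
      string composed = "\u00E9";      // "é" as a single code point
      string decomposed = "e\u0301";   // "e" followed by a combining acute accent

      // The two strings contain different code points.
      Console.WriteLine(String.Equals(composed, decomposed,
                                      StringComparison.Ordinal));    // False

      // The decomposed string is not in normalization form C.
      Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC));   // False

      // After normalization to form C, the strings are ordinally equal.
      Console.WriteLine(EqualAfterFormC(composed, decomposed));      // True
   }
}
```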

For more information about string normalization, see the "Normalization" section in the String class topic.
