Comparing and Sorting Data for a Specific Culture

Alphabetical order and conventions for sequencing items vary from culture to culture. For example, sort order can be case-sensitive or case-insensitive. It can be phonetically based or based on the appearance of the character. In East Asian languages, sorts are ordered by the stroke and radical of ideographs. Sorts can also vary depending on the fundamental order the language and culture use for the alphabet. For example, the Danish language has an "Æ" character that it sorts after "Z" in the alphabet. The German language also has this character, but sorts it like "ae", after "A" in the alphabet. A world-ready application must be able to compare and sort data on a per-culture basis to support culture-specific and language-specific sorting conventions.

Note: In some scenarios culture-sensitive behavior is not desirable. For more information about when and how to perform culture-insensitive operations, see Culture-Insensitive String Operations.

The CompareInfo class provides a set of methods you can use to perform culture-sensitive string comparisons. The CultureInfo class has a CompareInfo property that is an instance of this class. This property defines how to compare and sort strings for a specific culture. The static String.Compare method uses the information in the CultureInfo.CompareInfo property to compare two strings. The String.Compare method returns a negative integer if the first string precedes the second string in the sort order, zero if the two strings are equal, and a positive integer if the first string follows the second string in the sort order.

The following example illustrates how two strings can be evaluated differently by the String.Compare method, depending upon the culture used to perform the comparison. First, the Thread.CurrentCulture is set to da-DK for the Danish (Denmark) culture, and the strings "Apple" and "Æble" are compared. The Danish language treats the character "Æ" as an individual letter, sorting it after "Z" in the alphabet. Therefore, the string "Æble" is greater than "Apple" for the Danish culture. Next, the Thread.CurrentCulture is set to en-US for the English (United States) culture, and the strings "Apple" and "Æble" are compared again. This time, the string "Æble" is determined to be less than "Apple". The English language treats the character "Æ" as a special symbol, sorting it before the letter "A" in the alphabet.

using System;
using System.Globalization;
using System.Threading;

public class CompareStringSample
{
   public static void Main()
   {
      string str1 = "Apple";
      string str2 = "Æble"; 

      // Set the CurrentCulture to Danish in Denmark.
      Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
      // Compare the two strings. 
      int result1 = String.Compare(str1, str2);
      Console.WriteLine("When the CurrentCulture is \"da-DK\",\n" + 
            " the result of comparing {0} with {1} is: {2}",
            str1, str2, result1);

      // Set the CurrentCulture to English in the U.S.
      Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
      // Compare the two strings. 
      int result2 = String.Compare(str1, str2);
      Console.WriteLine("When the CurrentCulture is \"en-US\",\n" +  
            " the result of comparing {0} with {1} is: {2}",
            str1, str2, result2);
   }
}
// The example displays the following output: 
//       When the CurrentCulture is "da-DK",
//        the result of comparing Apple with Æble is: -1 
//       When the CurrentCulture is "en-US",
//        the result of comparing Apple with Æble is: 1
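
The example above changes the culture of the current thread. As a minimal sketch of an alternative, assuming you would rather leave the thread culture untouched, the comparison can name the culture explicitly through CompareInfo.Compare or the String.Compare overload that takes a CultureInfo. The class name ExplicitCultureCompare and the helper CompareFor are hypothetical names used only for illustration. Only the sign of each result is meaningful, and the sign can depend on the collation data available on the platform.

```csharp
using System;
using System.Globalization;

public static class ExplicitCultureCompare
{
   // Compares two strings using the rules of the named culture,
   // without changing Thread.CurrentThread.CurrentCulture.
   // (Hypothetical helper, not part of the .NET Framework.)
   public static int CompareFor(string cultureName, string a, string b)
   {
      CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
      return culture.CompareInfo.Compare(a, b, CompareOptions.None);
   }

   public static void Main()
   {
      string str1 = "Apple";
      string str2 = "\u00C6ble";   // "Æble"

      foreach (string name in new[] { "da-DK", "en-US" })
      {
         // Two equivalent ways to request a comparison for a specific culture.
         int viaCompareInfo = CompareFor(name, str1, str2);
         int viaString = String.Compare(str1, str2,
                                        CultureInfo.GetCultureInfo(name),
                                        CompareOptions.None);

         Console.WriteLine("{0}: CompareInfo.Compare sign = {1}, String.Compare sign = {2}",
                           name, Math.Sign(viaCompareInfo), Math.Sign(viaString));
      }
   }
}
```

Passing the culture as an argument keeps the comparison independent of whatever culture the calling thread happens to use.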

For more information on comparing strings, see Comparing Strings.

Some cultures support more than one sort order. For example, the culture Chinese (PRC), with the name zh-CN, supports a sort by pronunciation (default) and a sort by stroke count. When your application creates a CultureInfo object using a culture name, for example, zh-CN, the default sort order is used. To specify the alternate sort order, the application should create a CultureInfo object using the identifier for the alternate sort order. Then, the application should obtain a CompareInfo object from the CultureInfo.CompareInfo property to use in string comparisons. Alternatively, your application can create a CompareInfo object directly by calling the static CompareInfo.GetCompareInfo(Int32) method and specifying the identifier for the alternate sort order.

The following table lists the cultures that support alternate sort orders and the identifiers for the default and alternate sort orders.

Culture name   Culture                   Default sort name and identifier   Alternate sort name and identifier
es-ES          Spanish (Spain)           International: 0x00000C0A          Traditional: 0x0000040A
zh-TW          Chinese (Taiwan)          Stroke Count: 0x00000404           Bopomofo: 0x00030404
zh-CN          Chinese (PRC)             Pronunciation: 0x00000804          Stroke Count: 0x00020804
zh-HK          Chinese (Hong Kong SAR)   Stroke Count: 0x00000c04           Stroke Count: 0x00020c04
zh-SG          Chinese (Singapore)       Pronunciation: 0x00001004          Stroke Count: 0x00021004
zh-MO          Chinese (Macao SAR)       Pronunciation: 0x00001404          Stroke Count: 0x00021404
ja-JP          Japanese (Japan)          Default: 0x00000411                Unicode: 0x00010411
ko-KR          Korean (Korea)            Default: 0x00000412                Korean Xwansung - Unicode: 0x00010412
de-DE          German (Germany)          Dictionary: 0x00000407             Phone Book Sort DIN: 0x00010407
hu-HU          Hungarian (Hungary)       Default: 0x0000040e                Technical Sort: 0x0001040e
ka-GE          Georgian (Georgia)        Traditional: 0x00000437            Modern Sort: 0x00010437
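
As a rough sketch of how an identifier from the table above might be used, the following code requests the German phone book sort (0x00010407) through CompareInfo.GetCompareInfo(Int32). The class AlternateSortSample and the helper TryGetCompareInfo are hypothetical. The sketch assumes that an unsupported identifier surfaces as an ArgumentException (CultureNotFoundException derives from it); whether a given alternate sort is available depends on the platform, so no output is shown.

```csharp
using System;
using System.Globalization;

public static class AlternateSortSample
{
   // Identifier from the table: German (Germany), Phone Book Sort DIN.
   private const int GermanPhoneBookSort = 0x00010407;

   // Returns a CompareInfo for the given identifier, or null when the
   // platform does not recognize it. (Hypothetical helper.)
   public static CompareInfo TryGetCompareInfo(int sortId)
   {
      try
      {
         return CompareInfo.GetCompareInfo(sortId);
      }
      catch (ArgumentException)
      {
         // Assumption: unknown identifiers are reported this way.
         return null;
      }
   }

   public static void Main()
   {
      CompareInfo phoneBook = TryGetCompareInfo(GermanPhoneBookSort);
      if (phoneBook == null)
      {
         Console.WriteLine("Alternate sort 0x{0:X8} is not available here.",
                           GermanPhoneBookSort);
         return;
      }

      // Default (dictionary) sort order for de-DE, created by culture name.
      CompareInfo dictionary = CultureInfo.GetCultureInfo("de-DE").CompareInfo;

      string a = "M\u00FCller";   // "Müller"
      string b = "Mueller";

      // Print only the sign; the two sort orders may rank the pair differently.
      Console.WriteLine("Dictionary sign: {0}", Math.Sign(dictionary.Compare(a, b)));
      Console.WriteLine("Phone book sign: {0}", Math.Sign(phoneBook.Compare(a, b)));
   }
}
```

The same identifier could instead be passed to the CultureInfo(Int32) constructor, with the CompareInfo object then read from the CultureInfo.CompareInfo property.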

Your application can use the overloaded CompareInfo.IndexOf method to retrieve the zero-based index of a character or substring within a specified string. The method returns a negative integer if the character or substring is not found in the specified string. When searching for a character using CompareInfo.IndexOf, the application should take into account that the overloads that accept a CompareOptions parameter perform the comparison differently from the overloads that do not accept this parameter. The overloads that search for a character and do not take a CompareOptions parameter perform a culture-sensitive search. Thus, if a Unicode value represents a precomposed character, such as the ligature "Æ" (\u00C6), it might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041\u0045), depending on the culture. To perform an ordinal (culture-insensitive) search, in which a character is considered equivalent to another character only if their Unicode values are the same, the application should use one of the CompareInfo.IndexOf overloads that take a CompareOptions parameter and set the parameter to the Ordinal value.

Your applications can also use overloads of the String.IndexOf method that search for a character to perform an ordinal (culture-insensitive) search. Note that the overloads of this method that search for a string perform a culture-sensitive search.
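
The difference between the String.IndexOf overloads can be sketched as follows. The class StringIndexOfSample and its helper methods are hypothetical names for illustration. The sketch passes a StringComparison value to the string overload so that the comparison rule is stated explicitly; the culture-sensitive result is printed without an expected value because it depends on the current culture.

```csharp
using System;

public static class StringIndexOfSample
{
   // String.IndexOf(char) performs an ordinal search: it looks for
   // exactly this UTF-16 code unit.
   public static int FindCharOrdinal(string source, char value)
   {
      return source.IndexOf(value);
   }

   // String.IndexOf(string, StringComparison) lets the caller choose
   // between ordinal and culture-sensitive searching.
   public static int FindString(string source, string value,
                                StringComparison comparison)
   {
      return source.IndexOf(value, comparison);
   }

   public static void Main()
   {
      string text = "aeble";

      // Ordinal searches: "aeble" contains no U+00E6, so both print -1.
      Console.WriteLine(FindCharOrdinal(text, '\u00E6'));
      Console.WriteLine(FindString(text, "\u00E6", StringComparison.Ordinal));

      // Culture-sensitive search: the result depends on the current culture.
      Console.WriteLine(FindString(text, "\u00E6", StringComparison.CurrentCulture));
   }
}
```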

The following example illustrates how the results retrieved by the IndexOf method depend on culture. CultureInfo objects are created for da-DK, the Danish (Denmark) culture, and for en-US, the English (United States) culture. For each culture, overloads of the CompareInfo.IndexOf method are used to search for the character "æ" in the strings "æble" and "aeble". Note that, for da-DK, the CompareInfo.IndexOf overload that takes a CompareOptions parameter set to Ordinal and the overload that does not take this parameter retrieve the same results. The character "æ" is only considered equivalent to the Unicode code value \u00E6.

using System;
using System.Globalization;
using System.Threading;

public class Example
{
   public static void Main()
   {
      string str1 = "æble";
      string str2 = "aeble";
      char find = 'æ';

      // Create CultureInfo objects representing the Danish (Denmark) 
      // and English (United States) cultures.
      CultureInfo[] cultures = { CultureInfo.CreateSpecificCulture("da-DK"), 
                                 CultureInfo.CreateSpecificCulture("en-US") };

      foreach (var ci in cultures) {
         Thread.CurrentThread.CurrentCulture = ci;

         int result1 = ci.CompareInfo.IndexOf(str1, find);
         int result2 = ci.CompareInfo.IndexOf(str2, find);
         int result3 = ci.CompareInfo.IndexOf(str1, find,  
                                              CompareOptions.Ordinal);
         int result4 = ci.CompareInfo.IndexOf(str2, find, 
                                              CompareOptions.Ordinal);      

         Console.WriteLine("\nThe current culture is {0}", 
                           CultureInfo.CurrentCulture.Name);
         Console.WriteLine("\n   CompareInfo.IndexOf(string, char) method:");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str1, result1);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char) method:");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str2, result2);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str1, result3);

         Console.WriteLine("\n   CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
         Console.WriteLine("   Position of {0} in the string {1}: {2}", 
                           find, str2, result4);
         Console.WriteLine();
      }   
   }
}
// The example displays the following output 
//    The current culture is da-DK 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string aeble: -1 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string aeble: -1 
//     
//     
//    The current culture is en-US 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char) method: 
//       Position of æ in the string aeble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string æble: 0 
//     
//       CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method 
//       Position of æ in the string aeble: -1

For en-US, the CompareInfo.IndexOf overload with the CompareOptions parameter set to Ordinal and the overload without this parameter retrieve different results, as the en-US section of the output shows. The culture-sensitive comparison performed by the IndexOf method evaluates the character "æ" as equivalent to its components "ae", so the search in "aeble" returns 0. The ordinal (culture-insensitive) comparison does not treat the character "æ" as equivalent to "ae" because their Unicode code values do not match, so the same search returns -1.

The Array class provides an overloaded Sort method that allows your application to sort arrays based on the CultureInfo.CurrentCulture property. In the following example, an array of three strings is created. First, the current culture of the thread is set to en-US, and the Array.Sort method is called. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the current culture of the thread is set to da-DK, and the Array.Sort method is called again. Notice how the resulting sort order differs from the en-US results because the sorting conventions for da-DK are used.

using System;
using System.Threading;
using System.Globalization;

public class ArraySort 
{
   public static void Main() 
   {
      string[] stringArray = { "Apple", "Æble", "Zebra" };

      // Display the values of the array.
      Console.WriteLine("The array initially contains the following strings:");
      PrintIndexAndValues(stringArray);

      // Sets the current culture to "en-US".
      Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
      // Sort the values of the array.
      Array.Sort(stringArray);
      // Display the values of the array.
      Console.WriteLine( "After sorting for the \"en-US\" culture:");
      PrintIndexAndValues(stringArray); 

      // Set the current culture to "da-DK".
      Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("da-DK");
      // Sort the values of the array.
      Array.Sort(stringArray);
      // Display the values of the array.
      Console.WriteLine( "After sorting for the \"da-DK\" culture:");
      PrintIndexAndValues(stringArray); 
   }

   public static void PrintIndexAndValues(string[] values)  
   {
      foreach (var value in values)
         Console.WriteLine(value);
      Console.WriteLine();
   }
}
// The example displays the following output: 
//       The array initially contains the following strings: 
//       Apple 
//       Æble 
//       Zebra 
//        
//       After sorting for the "en-US" culture:
//       Æble 
//       Apple 
//       Zebra 
//        
//       After sorting for the "da-DK" culture:
//       Apple 
//       Zebra 
//       Æble
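
As a possible variation on the example above, the sort can take a culture-specific comparer instead of relying on the thread's current culture. This is a minimal sketch using StringComparer.Create; the class ComparerSortSample and the helper SortFor are hypothetical names. No output is shown because the order depends on the collation data of the platform.

```csharp
using System;
using System.Globalization;

public static class ComparerSortSample
{
   // Returns a sorted copy of the input, ordered by the rules of the
   // named culture. The input array is left unchanged.
   // (Hypothetical helper, not part of the .NET Framework.)
   public static string[] SortFor(string cultureName, string[] values)
   {
      string[] copy = (string[])values.Clone();
      StringComparer comparer =
         StringComparer.Create(CultureInfo.GetCultureInfo(cultureName), false);
      Array.Sort(copy, comparer);
      return copy;
   }

   public static void Main()
   {
      string[] words = { "Apple", "\u00C6ble", "Zebra" };   // "Æble"

      foreach (string name in new[] { "en-US", "da-DK" })
      {
         Console.WriteLine("{0}: {1}", name,
                           String.Join(", ", SortFor(name, words)));
      }
   }
}
```

Because the comparer carries its own culture, two sorts for different cultures can run without either one changing shared thread state.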

Sort keys are used to support culturally sensitive sorts. Based on the Unicode Standard, each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic weights. A sort key serves as the repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. For additional information on sort key concepts, see The Unicode Standard at the Unicode home page.

In the .NET Framework, the SortKey class maps strings to their sort keys, and vice versa. Your applications can use the CompareInfo.GetSortKey method to create a sort key for a string that you specify. The resulting sort key for a specified string is a sequence of bytes that can differ depending upon the culture of the CompareInfo object and the CompareOptions value specified. For example, if the application specifies the value IgnoreCase when creating a sort key, a string comparison operation using the sort key ignores case.
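
The effect of the CompareOptions value on a sort key can be sketched as follows. The class SortKeyOptionsSample and the helper CompareKeys are hypothetical names; the sketch uses the invariant culture only to keep the example independent of the machine's settings.

```csharp
using System;
using System.Globalization;

public static class SortKeyOptionsSample
{
   // Creates a sort key for each string with the given options and
   // compares the two keys. (Hypothetical helper.)
   public static int CompareKeys(CompareInfo compareInfo, string a, string b,
                                 CompareOptions options)
   {
      SortKey keyA = compareInfo.GetSortKey(a, options);
      SortKey keyB = compareInfo.GetSortKey(b, options);
      return SortKey.Compare(keyA, keyB);
   }

   public static void Main()
   {
      CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;

      // Without IgnoreCase, the keys carry case weights and differ.
      Console.WriteLine("None:       keys equal = {0}",
         CompareKeys(ci, "apple", "APPLE", CompareOptions.None) == 0);

      // With IgnoreCase, case weights are left out and the keys match.
      Console.WriteLine("IgnoreCase: keys equal = {0}",
         CompareKeys(ci, "apple", "APPLE", CompareOptions.IgnoreCase) == 0);
   }
}
```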

After creating a sort key for a string, the application can pass it as a parameter to methods provided by the SortKey class. The Compare method allows comparison of sort keys. Because this method performs a simple byte-by-byte comparison, using it is much faster than using String.Compare. Applications that are sorting-intensive can improve performance by generating and storing sort keys for all the strings that are used. When a sort or comparison operation is required, the application can use the sort keys instead of the strings.

The following code example creates sort keys for two strings using the da-DK culture. It compares the two sort keys using the SortKey.Compare method and displays the results. The method returns a negative integer if the first sort key is less than the second, zero (0) if the two sort keys are equal, and a positive integer if the first sort key is greater than the second. Next, the CurrentCulture property is set to en-US and sort keys are created for the same strings using the en-US culture. The sort keys for the strings are compared and the results are displayed. Notice that the sort results differ based on the culture. Although the results of the following code example are identical to the results of comparing these strings in the String.Compare example earlier in this topic, comparing existing sort keys with the SortKey.Compare method is faster than using the String.Compare method.

using System;
using System.Threading;
using System.Globalization;

public class SortKeySample 
{
   public static void Main(String[] args) 
   {
      String str1 = "Apple";
      String str2 = "Æble";

      // Set the current culture to "da-DK".
      CultureInfo dk = CultureInfo.CreateSpecificCulture("da-DK");
      Thread.CurrentThread.CurrentCulture = dk;

      // Create a culturally sensitive sort key for str1.
      SortKey sc1 = dk.CompareInfo.GetSortKey(str1);
      // Create a culturally sensitive sort key for str2.
      SortKey sc2 = dk.CompareInfo.GetSortKey(str2);

      // Compare the two sort keys and display the results. 
      int result1 = SortKey.Compare(sc1, sc2);
      Console.WriteLine("Current culture: {0}", 
                        CultureInfo.CurrentCulture.Name);
      Console.WriteLine("Result of comparing {0} with {1}: {2}\n", 
                        str1, str2, result1);

      // Set the current culture to "en-US".
      CultureInfo enus = CultureInfo.CreateSpecificCulture("en-US");
      Thread.CurrentThread.CurrentCulture = enus ;

      // Create a culturally sensitive sort key for str1.
      SortKey sc3 = enus.CompareInfo.GetSortKey(str1);
      // Create a culturally sensitive sort key for str2.
      SortKey sc4 = enus.CompareInfo.GetSortKey(str2);

      // Compare the two sort keys and display the results. 
      int result2 = SortKey.Compare(sc3, sc4);
      Console.WriteLine("Current culture: {0}", 
                        CultureInfo.CurrentCulture.Name);
      Console.WriteLine("Result of comparing {0} with {1}: {2}\n", 
                        str1, str2, result2);
   }
}
// The example displays the following output: 
//       Current culture: da-DK 
//       Result of comparing Apple with Æble: -1 
//        
//       Current culture: en-US 
//       Result of comparing Apple with Æble: 1
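
The advice to generate sort keys once and reuse them for sorting-intensive work might look like the following sketch. The class SortKeyCacheSample and the helper SortWithKeys are hypothetical names. The sketch builds one sort key per string, sorts the keys with SortKey.Compare, and reads the strings back through SortKey.OriginalString. No performance measurement is implied.

```csharp
using System;
using System.Globalization;

public static class SortKeyCacheSample
{
   // Generates each sort key once, sorts by comparing keys only, and
   // returns the original strings in sorted order. (Hypothetical helper.)
   public static string[] SortWithKeys(string[] values, CompareInfo compareInfo)
   {
      SortKey[] keys = new SortKey[values.Length];
      for (int i = 0; i < values.Length; i++)
         keys[i] = compareInfo.GetSortKey(values[i]);

      // Each comparison during the sort is a byte-by-byte key comparison.
      Array.Sort(keys, (x, y) => SortKey.Compare(x, y));

      string[] sorted = new string[keys.Length];
      for (int i = 0; i < keys.Length; i++)
         sorted[i] = keys[i].OriginalString;
      return sorted;
   }

   public static void Main()
   {
      string[] words = { "cherry", "apple", "banana" };
      CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;

      // Prints: apple, banana, cherry
      Console.WriteLine(String.Join(", ", SortWithKeys(words, ci)));
   }
}
```

An application that sorts the same strings repeatedly could keep the SortKey array and skip the key-generation loop on later sorts.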

Your application can normalize strings to either uppercase or lowercase before sorting. Rules for string sorting and casing are language-specific. For example, even within Latin script-based languages, there are different composition and sorting rules. There are only a few languages (including English) for which the sort order matches the order of the code points, for example, A [65] comes before B [66].

Your applications should not rely on code points to perform accurate sorting and string comparisons. In addition, the .NET Framework does not enforce or guarantee a specific form of normalization. You are responsible for performing the appropriate normalization in the applications that you develop.
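
To illustrate why code-point order is not a reliable sort order, the following sketch sorts the same words ordinally, first as written and then after uppercasing them with ToUpperInvariant. The class CaseNormalizationSample and its helpers are hypothetical names. The sketch shows case normalization only; it is not a substitute for a culture-sensitive sort.

```csharp
using System;

public static class CaseNormalizationSample
{
   // Sorts a copy of the input by code point (ordinal) order.
   public static string[] SortOrdinal(string[] values)
   {
      string[] copy = (string[])values.Clone();
      Array.Sort(copy, StringComparer.Ordinal);
      return copy;
   }

   // Uppercases every string with the invariant culture, then sorts
   // the results by code point order.
   public static string[] SortUppercased(string[] values)
   {
      string[] upper = new string[values.Length];
      for (int i = 0; i < values.Length; i++)
         upper[i] = values[i].ToUpperInvariant();
      Array.Sort(upper, StringComparer.Ordinal);
      return upper;
   }

   public static void Main()
   {
      string[] words = { "apple", "Zebra", "banana" };

      // 'Z' (90) precedes 'a' (97), so this prints: Zebra, apple, banana
      Console.WriteLine(String.Join(", ", SortOrdinal(words)));

      // After case normalization this prints: APPLE, BANANA, ZEBRA
      Console.WriteLine(String.Join(", ", SortUppercased(words)));
   }
}
```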

For more information on string normalization, see Normalization and Sorting.

© 2014 Microsoft