International Features
CompareStringEx

Compares two Unicode (wide character) strings, for a locale specified by name.

int CompareStringEx (
  LPCWSTR lpLocaleName,
  DWORD dwCmpFlags,
  LPCWSTR lpString1,         
  int cchCount1,
  LPCWSTR lpString2,         
  int cchCount2,
  LPNLVERSIONINFO lpVersionInformation,
  LPVOID lpReserved,
  LPARAM lParam
);

Parameters

lpLocaleName
[in] Optional. Pointer to a locale name, or one of the following predefined values. Details of these values are provided in LCTYPE Constants (National Language Support).
  • LOCALE_NAME_SYSTEM_DEFAULT
  • LOCALE_NAME_USER_DEFAULT
dwCmpFlags
[in] Flags that indicate how the function compares the two strings. By default, these flags are not set. This parameter can specify a combination of any of the following values, or it can be set to zero to obtain the default behavior.
  • LINGUISTIC_IGNORECASE. Ignore case, as linguistically appropriate.
  • LINGUISTIC_IGNOREDIACRITIC. Ignore nonspacing characters, as linguistically appropriate.
  • NORM_IGNORECASE. Ignore case. With this flag set, the function ignores the distinction between the wide and narrow forms of the CJK compatibility characters.
  • NORM_IGNOREKANATYPE. Do not differentiate between Hiragana and Katakana characters. Corresponding Hiragana and Katakana characters compare as equal.
  • NORM_IGNORENONSPACE. Ignore nonspacing characters.
  • NORM_IGNORESYMBOLS. Ignore symbols and punctuation.
  • NORM_IGNOREWIDTH. Ignore the difference between half-width and full-width characters, for example, Ｃａｔ == cat. The full-width form is a formatting distinction used in Chinese and Japanese scripts.
  • NORM_LINGUISTIC_CASING. Use linguistic rules for casing, instead of file system rules (default).
  • SORT_STRINGSORT. Treat punctuation the same as symbols.
lpString1
[in] Pointer to the first string to compare.
cchCount1
[in] Length of the string indicated by lpString1, excluding the null terminator. The application can supply a negative value if the string is null-terminated. In this case, the function determines the length automatically.
lpString2
[in] Pointer to the second string to compare.
cchCount2
[in] Length of the string indicated by lpString2, excluding the null terminator. The application can supply a negative value if the string is null-terminated. In this case, the function determines the length automatically.
lpVersionInformation
[in] Optional. Reserved; must be set to a null pointer.
lpReserved
[in] Optional. Reserved; must be set to a null pointer.
lParam
[in] Optional. Reserved; must be set to zero.

Return Values

Returns one of the following values if successful. To maintain the C runtime convention of comparing strings, the value 2 can be subtracted from a nonzero return value. Then, the meaning of <0, ==0 and >0 is consistent with the C runtime.

  • CSTR_LESS_THAN. The string indicated by lpString1 is less in lexical value than the string indicated by lpString2.
  • CSTR_EQUAL. The string indicated by lpString1 is equivalent in lexical value to the string indicated by lpString2. The two strings are equivalent for collating purposes, although not necessarily identical.
  • CSTR_GREATER_THAN. The string indicated by lpString1 is greater in lexical value than the string indicated by lpString2.

The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:

  • ERROR_INVALID_FLAGS
  • ERROR_INVALID_PARAMETER

Remarks

This function tests for linguistic equality. Applications that are concerned with linguistic equality should use this function, CompareString, lstrcmp, or lstrcmpi. For an overview of the use of the string functions, see Strings.

This function can raise security issues when used for a non-linguistic comparison, because two strings that are distinct in their binary representation can be linguistically equivalent.

Equivalent Strings

Usually CompareString, CompareStringEx, lstrcmp, and lstrcmpi evaluate strings character by character. However, many languages have multiple-character elements, such as the two-character pair "CH" in traditional Spanish. CompareString and CompareStringEx use the application-supplied locale identifier or name to identify multiple-character elements. In contrast, lstrcmp and lstrcmpi use the user's locale.

Another example is Vietnamese, which contains many two-character elements, such as the valid uppercase, title case, and lowercase forms of "GI", which are "GI", "Gi", and "gi", respectively. Any of these forms is treated as a single collation element and, if casing is ignored, compares as equal. However, because "gI" is not valid as a single element, CompareString, CompareStringEx, lstrcmp, and lstrcmpi treat "gI" as two separate elements.

Evaluating Strings

Typically, strings are compared using a "word sort" technique in which all punctuation marks and other nonalphanumeric characters, except for the hyphen and the apostrophe, come before any alphanumeric character. The hyphen and the apostrophe are treated differently from the other nonalphanumeric characters to ensure that words such as "coop" and "co-op" stay together in a sorted list.

If the SORT_STRINGSORT flag is specified, CompareStringEx compares strings using a "string sort" technique. This type of sort treats the hyphen and apostrophe just like any other nonalphanumeric character. Their positions in the collating sequence are before the alphanumeric characters.

The following table compares the results of a word sort with the results of a string sort.

Word Sort    String Sort
billet       bill's
bills        billet
bill's       bills
cannot       can't
cant         cannot
can't        cant
con          co-op
coop         con
co-op        coop

The lstrcmp and lstrcmpi functions use a word sort. CompareString, CompareStringEx, LCMapString, LCMapStringEx, FindNLSString, and FindNLSStringEx use a word sort by default. However, these functions use a string sort if the application sets the SORT_STRINGSORT flag.

Choosing a String Comparison Function

When the comparison follows the user's language preference, for example, when sorting items for an ordered ListView control, the application can do one of the following:

  • Call lstrcmp or lstrcmpi, which compare using the user's locale.
  • Call CompareString or CompareStringEx to define a locale for the comparison, to pass additional flags, to embed null characters, or to pass explicit lengths to match parts of a string.

When the results of the comparison should be consistent regardless of locale, for example, when comparing retrieved data against a predefined list or an internal value, the application should use CompareString with the locale set to LOCALE_INVARIANT, or CompareStringEx with the locale name set to LOCALE_NAME_INVARIANT. A case-insensitive comparison made this way matches a mixed-case constant such as "InLap" even if mystr is "INLAP". By contrast, a locale-sensitive call to lstrcmpi fails to match if the current locale is Vietnamese.

Note: Your application can also use CompareStringOrdinal for locale-insensitive string comparisons.

Performance

The CompareStringEx function is optimized to run at the highest speed when dwCmpFlags is set to 0 or NORM_IGNORECASE, cchCount1 and cchCount2 are set to -1, and the locale does not support any linguistic compressions, such as the traditional Spanish treatment of "ch" as a single character.

Language-specific Notes

The NORM_IGNORENONSPACE flag has an effect only for locales in which accented characters are sorted in a second pass, separately from the main characters. Normally, all characters in the strings are first compared without regard to accents and, if the strings are equal, a second pass over the strings compares accents. This flag suppresses the second pass. For locales that sort accented characters in the first pass, this flag has no effect.

For many scripts, the NORM_IGNORENONSPACE flag coincides with LINGUISTIC_IGNOREDIACRITIC, and NORM_IGNORECASE coincides with LINGUISTIC_IGNORECASE. Notably, this is true for Latin scripts. Exceptions are as follows:

  • NORM_IGNORENONSPACE ignores any secondary distinction, whether it is actually a diacritic or not. Scripts for Korean, Japanese, Chinese, and Indic languages, among others, use this distinction for purposes other than diacritics. LINGUISTIC_IGNOREDIACRITIC ignores only actual diacritics, instead of simply ignoring the second collation weight.
  • NORM_IGNORECASE ignores any "tertiary distinction," whether it is actually linguistic case or not. For example, in Arabic and Indic scripts, this distinguishes alternate forms of a character, but the differences do not correspond to linguistic case. LINGUISTIC_IGNORECASE ignores only actual linguistic casing, instead of simply ignoring the third collation weight.

Normally, for case-insensitive comparisons, CompareStringEx maps the lowercase "i" to the uppercase "I", even when the locale is Turkish or Azeri. The NORM_LINGUISTIC_CASING flag overrides this behavior for Turkish or Azeri. If this flag is specified in conjunction with Turkish or Azeri, LATIN SMALL LETTER DOTLESS I (U+0131) is the lowercase form of LATIN CAPITAL LETTER I (U+0049) and LATIN SMALL LETTER I (U+0069) is the lowercase form of LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).

CompareStringEx ignores Arabic kashidas during the comparison. Thus, if two strings are identical except for the presence of kashidas, the function returns CSTR_EQUAL.

Custom Locales

Windows Vista and later: CompareStringEx can return data from custom locales. Locales are not guaranteed to be the same from computer to computer or between runs of an application. If your application must persist or transmit data, see Using Persistent Locale Data.

Warning: Using CompareStringEx incorrectly can compromise the security of your application. Strings that are not compared correctly can produce invalid input. Test strings to make sure they are valid before using them, and provide error handlers. For more information, see Security Considerations: International Features.

Example

An example showing the use of this function is provided in NLS: Name-based APIs Sample.

Requirements

  Windows Vista: Supported in Windows Vista and later.
  Header: Declared in Winnls.h; include Windows.h.
  Library: Use Kernel32.lib.

See Also

National Language Support Functions, Collation, Using Unicode Normalization to Represent Strings, Security Considerations: International Features