Equivalent Strings
Usually, CompareString, CompareStringEx, lstrcmp, and lstrcmpi evaluate strings character by character. However, many languages have multiple-character elements, such as the two-character pair 'CH' in Traditional Spanish. CompareString uses the locale passed in its Locale parameter, and CompareStringEx uses the locale named by lpLocaleName, to identify multiple-character elements. lstrcmp and lstrcmpi use the user locale.
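A small model shows why a multiple-character element changes the result. The toy comparator below illustrates the idea and is not the Windows algorithm. It assumes lowercase ASCII input and a single traditional-Spanish-style rule in which "ch" is one element that sorts after every other "c" sequence and before "d". The function names are hypothetical.

```c
/* Toy model of traditional-Spanish-style collation for lowercase ASCII only.
   "ch" is a single collation element ranked between 'c' and 'd'.
   Illustration only; not how Windows computes collation weights. */

/* Reads one collation element at *p, advances *p past it, returns its rank. */
static int next_element(const char **p)
{
    const char *s = *p;
    if (s[0] == 'c' && s[1] == 'h') {
        *p += 2;
        return ('c' - 'a') * 2 + 1;  /* after 'c', before 'd' */
    }
    *p += 1;
    return (s[0] - 'a') * 2;         /* single letters get even ranks */
}

/* Returns <0, 0, or >0, like strcmp. */
int toy_spanish_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ra = next_element(&a);
        int rb = next_element(&b);
        if (ra != rb)
            return ra < rb ? -1 : 1;
    }
    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;      /* a prefix sorts first */
}
```

Under this rule "cuna" sorts before "chico", whereas a plain byte comparison such as strcmp puts "chico" first because 'h' precedes 'u'.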
Vietnamese is another example. It contains many two-character elements, such as 'GI', whose valid uppercase, title case, and lowercase forms are 'GI', 'Gi', and 'gi', respectively. Each of these forms is treated as a single collation element, and if case is ignored, the three forms compare as equal. However, because 'gI' is not a valid form of the element, CompareString, CompareStringEx, lstrcmp, and lstrcmpi treat 'gI' as two separate elements.
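The Vietnamese rule can be sketched as a tokenizer that splits a string into collation elements. This hypothetical illustration recognizes only the 'GI' element described above and treats every other character as an element of its own; real Vietnamese collation has many more multiple-character elements.

```c
#include <stddef.h>

/* Toy tokenizer: "GI", "Gi", and "gi" form one collation element,
   while "gI" (not a valid form) is split into two. ASCII only. */

/* Returns the length (1 or 2) of the collation element starting at s. */
static size_t element_length(const char *s)
{
    if ((s[0] == 'G' && (s[1] == 'I' || s[1] == 'i')) ||
        (s[0] == 'g' && s[1] == 'i'))
        return 2;
    return 1;
}

/* Counts the collation elements in a NUL-terminated string. */
size_t count_elements(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s += element_length(s);
        n++;
    }
    return n;
}
```

For example, "Gia" yields two elements under this model, while "gIa" yields three, because "gI" is not a valid form of the element.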
In contrast, CompareStringOrdinal, introduced in Windows Vista, performs a strictly binary comparison. Apart from the option of ignoring case, it disregards all non-binary equivalences, and, unlike CompareString, it tests all code points for equality, including those that are not given any weight in linguistic collation schemes.
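For contrast, an ordinal comparison fits in a few lines. This is a model under stated assumptions, not CompareStringOrdinal itself: it compares UTF-16 code units by numeric value, takes explicit lengths instead of accepting -1, and folds case only for ASCII letters, whereas the real function uses the operating system's uppercase table. ordinal_compare is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: case folding covers ASCII letters only. */
static uint16_t fold_ascii(uint16_t c)
{
    return (c >= 'a' && c <= 'z') ? (uint16_t)(c - 'a' + 'A') : c;
}

/* Compares UTF-16 code units one by one; no code point is ignored.
   Returns <0, 0, or >0. */
int ordinal_compare(const uint16_t *a, size_t len_a,
                    const uint16_t *b, size_t len_b, bool ignore_case)
{
    size_t n = len_a < len_b ? len_a : len_b;
    for (size_t i = 0; i < n; i++) {
        uint16_t ca = ignore_case ? fold_ascii(a[i]) : a[i];
        uint16_t cb = ignore_case ? fold_ascii(b[i]) : b[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (len_a == len_b)
        return 0;
    return len_a < len_b ? -1 : 1;   /* a prefix sorts first */
}
```

Because nothing is skipped, inserting a SOFT HYPHEN (U+00AD) into "coop" makes the strings unequal under this model, even though linguistic comparisons commonly treat that character as ignorable.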
Evaluating Strings
Typically, strings are compared using what is called a "word sort" technique. In a word sort, all punctuation marks and other nonalphanumeric characters, except for the hyphen and the apostrophe, come before any alphanumeric character. The hyphen and the apostrophe are treated differently from the other nonalphanumeric symbols so that words such as "coop" and "co-op" stay together within a sorted list.
If the SORT_STRINGSORT flag is specified, strings are compared using what is called a "string sort" technique. In a string sort, the hyphen and apostrophe are treated just like any other nonalphanumeric symbols: they come before the alphanumeric symbols in the collating sequence.
The following table shows a list of words sorted both ways.
| Word Sort | String Sort | | Word Sort | String Sort |
|---|---|---|---|---|
| billet | bill's | | t-ant | t-ant |
| bills | billet | | tanya | t-aria |
| bill's | bills | | t-aria | tanya |
| cannot | can't | | sued | sue's |
| cant | cannot | | sues | sued |
| can't | cant | | sue's | sues |
| con | co-op | | went | we're |
| coop | con | | were | went |
| co-op | coop | | we're | were |
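The two orderings in the table can be reproduced with a simplified model for lowercase ASCII words. This sketch is not the Windows collation algorithm. Its word sort skips hyphens and apostrophes, breaks ties in favor of the word with fewer skipped marks, and ignores case, accents, and other symbols. The function names are hypothetical.

```c
#include <string.h>

static int is_word_punct(char c) { return c == '-' || c == '\''; }

/* Word sort model: compare letters with hyphens and apostrophes skipped;
   if the letters tie, the word with fewer skipped marks sorts first. */
int word_sort_compare(const char *a, const char *b)
{
    const char *p = a, *q = b;
    int skipped_a = 0, skipped_b = 0;

    for (;;) {
        while (is_word_punct(*p)) { p++; skipped_a++; }
        while (is_word_punct(*q)) { q++; skipped_b++; }
        if (*p == '\0' || *p != *q)
            break;
        p++;
        q++;
    }
    if (*p != *q)                 /* letters differ, or one is a prefix */
        return (unsigned char)*p < (unsigned char)*q ? -1 : 1;
    if (skipped_a != skipped_b)   /* same letters: fewer marks first */
        return skipped_a < skipped_b ? -1 : 1;
    return strcmp(a, b);          /* final tie-break */
}

/* String sort model: hyphen and apostrophe are ordinary symbols that sort
   before letters. In ASCII, '\'' (0x27) and '-' (0x2D) already precede
   digits and lowercase letters, so a byte comparison suffices here. */
int string_sort_compare(const char *a, const char *b)
{
    return strcmp(a, b);
}
```

The skip-and-tie-break approach keeps "coop" and "co-op" adjacent, while the byte comparison moves "co-op" ahead of "con".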
The lstrcmp and lstrcmpi functions use a word sort. The CompareString, LCMapString, and FindNLSString functions (and the corresponding locale-name-based CompareStringEx, LCMapStringEx, and FindNLSStringEx functions) default to using a word sort, but use a string sort if their caller sets the SORT_STRINGSORT flag.
When to use CompareString, lstrcmp, and lstrcmpi
When the comparison should follow the user's language preferences, for example, when sorting items for an ordered ListView control, do either of the following:
- Call lstrcmp or lstrcmpi, which use the user locale, or
- Call CompareString or CompareStringEx to define a locale for the comparison, to pass additional flags, to embed null characters, or to pass explicit lengths to match parts of a string.
When the results of the comparison should be consistent regardless of locale, for example, when comparing retrieved data against a predefined list or an internal value, use CompareString with the Locale parameter set to LOCALE_INVARIANT, or CompareStringEx with lpLocaleName set to LOCALE_NAME_INVARIANT. Either of the following calls returns CSTR_EQUAL even if mystr is "INLAP", whereas a locale-sensitive call to lstrcmpi fails to match if the current locale is Vietnamese. CompareStringOrdinal can also be used for locale-insensitive comparisons.
- On Windows XP and later:

  ```c
  /* Returns CSTR_EQUAL if mystr matches "InLap", ignoring case. */
  int result = CompareString(LOCALE_INVARIANT, NORM_IGNORECASE, mystr, -1, _T("InLap"), -1);
  ```

- For earlier operating systems:

  ```c
  LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
  int result = CompareString(lcid, NORM_IGNORECASE, mystr, -1, _T("InLap"), -1);
  ```
CompareString can return data from custom locales. Custom locales allow administrators to change many aspects of locale formats, but such changes do not affect collation behavior.
Applications that are intended to run only on Windows Vista and later should use CompareStringEx in preference to CompareString, because CompareStringEx provides good support for supplemental locales. However, CompareStringEx is not supported on versions of Windows prior to Windows Vista.
Performance
CompareString runs fastest when dwCmpFlags is set to 0 or NORM_IGNORECASE, cchCount1 and cchCount2 are both -1, and the locale passed in does not support any linguistic compressions (such as Traditional Spanish collation treating "ch" as a single character).
Language-specific Notes
If you call the ANSI version (CompareStringA), it converts its string parameters using the default ANSI code page of the locale you pass in, not the default system code page. Among other things, this means that you cannot use CompareStringA to handle 8-bit Unicode Transformation Format (UTF-8) text.
The NORM_IGNORENONSPACE flag only has an effect for locales in which accented characters are sorted in a second pass, separately from the main characters. All characters in the strings are first compared without regard to accents and, if the strings are equal, a second pass over the strings compares accents. For these locales, the flag causes the second pass to be skipped. (In effect, NORM_IGNORENONSPACE causes diacritics to be stripped from the string before comparison. For a discussion of the consequences, see the blog entry "Why can't we strip the diacritics?" (http://blogs.msdn.com/shawnste/archive/2007/06/08/why-can-t-we-strip-the-diacritics.aspx).) For locales that sort accented characters in the first pass, this flag has no effect.
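The two-pass behavior can be modeled with characters that are already split into a base letter and an accent code. The toy_char representation and its accent codes are hypothetical; the sketch only illustrates why skipping the second pass makes accents irrelevant once the base letters match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-split character: base letter plus accent code. */
typedef struct {
    char base;             /* e.g. 'e' for both "e" and an accented "e" */
    unsigned char accent;  /* 0 = none, 1 = acute, 2 = grave, ... */
} toy_char;

/* Compares one level: base letters (accents == false) or accents only. */
static int cmp_level(const toy_char *a, size_t na,
                     const toy_char *b, size_t nb, bool accents)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        int x = accents ? a[i].accent : (unsigned char)a[i].base;
        int y = accents ? b[i].accent : (unsigned char)b[i].base;
        if (x != y)
            return x < y ? -1 : 1;
    }
    if (na == nb)
        return 0;
    return na < nb ? -1 : 1;
}

/* Pass 1 compares base letters; pass 2 (accents) runs only if pass 1 ties
   and ignore_nonspace is false, modeling NORM_IGNORENONSPACE. */
int toy_two_pass_compare(const toy_char *a, size_t na,
                         const toy_char *b, size_t nb, bool ignore_nonspace)
{
    int r = cmp_level(a, na, b, nb, false);
    if (r != 0 || ignore_nonspace)
        return r;
    return cmp_level(a, na, b, nb, true);
}
```

With ignore_nonspace set, "resume" and "résumé" compare as equal in this model; without it, the unaccented form sorts first. A difference in base letters always outranks a difference in accents, because accents are examined only after the first pass finds the strings equal.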
For many scripts, NORM_IGNORENONSPACE will coincide with LINGUISTIC_IGNOREDIACRITIC and NORM_IGNORECASE will coincide with LINGUISTIC_IGNORECASE. Notably, this is true for Latin scripts. However:
- NORM_IGNORENONSPACE ignores any "secondary distinction," whether it is actually a diacritic or not. Scripts for Korean, Japanese, Chinese, and Indic languages (among others) use this distinction for purposes other than diacritics. LINGUISTIC_IGNOREDIACRITIC ignores only actual diacritics, instead of simply ignoring the second collation weight.
- NORM_IGNORECASE ignores any "tertiary distinction," whether it is actually linguistic case or not. For example, in Arabic and Indic scripts this distinction marks alternate forms of a character, but the differences do not correspond to linguistic case. LINGUISTIC_IGNORECASE ignores only actual linguistic casing, instead of simply ignoring the third collation weight.
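The difference between dropping a whole weight level and dropping only genuine diacritics or case can be modeled with tagged weights. Every name, weight value, and tag in this sketch is hypothetical; Windows does not expose weights in this form, and the sketch handles only equal-length strings. The four flags stand in for the four Windows flags named in their comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a secondary or tertiary weight actually encodes (hypothetical). */
typedef enum { W_NONE, W_DIACRITIC, W_CASE, W_OTHER } weight_kind;

typedef struct {
    int primary;
    int secondary;
    weight_kind secondary_kind;
    int tertiary;
    weight_kind tertiary_kind;
} toy_weights;

typedef struct {
    bool ignore_secondary;  /* models NORM_IGNORENONSPACE */
    bool ignore_tertiary;   /* models NORM_IGNORECASE */
    bool ignore_diacritic;  /* models LINGUISTIC_IGNOREDIACRITIC */
    bool ignore_case;       /* models LINGUISTIC_IGNORECASE */
} toy_flags;

/* Returns the weight to compare, or 0 if the flags say to ignore it. */
static int effective(int weight, weight_kind kind, bool drop_level,
                     bool drop_tagged, weight_kind tag)
{
    return (drop_level || (drop_tagged && kind == tag)) ? 0 : weight;
}

/* Compares equal-length strings level by level: all primary weights,
   then all secondary, then all tertiary. Returns <0, 0, or >0. */
int toy_compare(const toy_weights *a, const toy_weights *b, size_t n,
                toy_flags f)
{
    for (int level = 0; level < 3; level++) {
        for (size_t i = 0; i < n; i++) {
            int x, y;
            if (level == 0) {
                x = a[i].primary;
                y = b[i].primary;
            } else if (level == 1) {
                x = effective(a[i].secondary, a[i].secondary_kind,
                              f.ignore_secondary, f.ignore_diacritic, W_DIACRITIC);
                y = effective(b[i].secondary, b[i].secondary_kind,
                              f.ignore_secondary, f.ignore_diacritic, W_DIACRITIC);
            } else {
                x = effective(a[i].tertiary, a[i].tertiary_kind,
                              f.ignore_tertiary, f.ignore_case, W_CASE);
                y = effective(b[i].tertiary, b[i].tertiary_kind,
                              f.ignore_tertiary, f.ignore_case, W_CASE);
            }
            if (x != y)
                return x < y ? -1 : 1;
        }
    }
    return 0;
}
```

In this model, a secondary weight tagged W_OTHER is dropped by the level-wide flag but survives the diacritic-only flag, which mirrors the distinction drawn in the list above.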
Normally, for case-insensitive comparison, this function maps the lowercase "i" to the uppercase "I", even when the Locale parameter specifies Turkish or Azeri. The NORM_LINGUISTIC_CASING flag overrides this behavior for Turkish and Azeri. If this flag is specified together with a Locale value that indicates Turkish or Azeri, LATIN SMALL LETTER DOTLESS I (U+0131) is the lowercase form of LATIN CAPITAL LETTER I (U+0049), and LATIN SMALL LETTER I (U+0069) is the lowercase form of LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
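The Turkish and Azeri casing rule can be modeled as a case-insensitive comparison over code points. This is a hypothetical sketch: it handles only ASCII letters and the Turkish I variants, and its turkish_linguistic argument stands in for passing NORM_LINGUISTIC_CASING together with a Turkish or Azeri locale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy uppercase mapping; assumption: ASCII plus the Turkish I variants. */
static uint32_t toy_upper(uint32_t cp, bool turkish_linguistic)
{
    if (turkish_linguistic) {
        if (cp == 0x0069) return 0x0130;  /* i -> CAPITAL I WITH DOT ABOVE */
        if (cp == 0x0131) return 0x0049;  /* DOTLESS i -> I */
    }
    if (cp >= 'a' && cp <= 'z')
        return cp - 'a' + 'A';
    return cp;
}

/* Returns true if two NUL-terminated code-point strings match
   case-insensitively under the toy mapping. */
bool toy_equal_ignore_case(const uint32_t *a, const uint32_t *b,
                           bool turkish_linguistic)
{
    for (; *a != 0 && *b != 0; a++, b++)
        if (toy_upper(*a, turkish_linguistic) != toy_upper(*b, turkish_linguistic))
            return false;
    return *a == *b;  /* both strings must end at the same point */
}
```

Under the default mapping in this model, "i" matches "I"; with turkish_linguistic set, "i" matches U+0130 instead, and U+0131 matches "I".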
For double-byte character set (DBCS) locales, the flag NORM_IGNORECASE has an effect on all the wide (two-byte) characters as well as the narrow (one-byte) characters. This includes the wide Greek and Cyrillic characters.
CompareString ignores Arabic kashidas during the comparison. Thus, if two strings differ only in the presence of kashidas, CompareString returns CSTR_EQUAL; the strings are considered "equal" in the collation sense, though they are not necessarily identical.
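Skipping kashidas can be sketched as a comparison that steps over U+0640 ARABIC TATWEEL (the kashida) in both strings. This is a simplified model: after skipping, it compares raw code point values, whereas CompareString applies full linguistic weights. The function name is hypothetical.

```c
#include <stdint.h>

#define ARABIC_TATWEEL 0x0640u  /* U+0640, the kashida character */

/* Compares NUL-terminated code-point strings, skipping kashidas.
   Returns <0, 0, or >0. */
int toy_compare_ignoring_kashida(const uint32_t *a, const uint32_t *b)
{
    for (;;) {
        while (*a == ARABIC_TATWEEL) a++;
        while (*b == ARABIC_TATWEEL) b++;
        if (*a != *b)
            return *a < *b ? -1 : 1;
        if (*a == 0)
            return 0;  /* both strings ended together */
        a++;
        b++;
    }
}
```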
Custom locales
Windows Vista and later: This function can return data from custom locales (see Custom Locale Information). Custom locales allow administrators to change many aspects of locale formats, but such changes do not alter collation behavior.
Security Alert: Using this function incorrectly can compromise the security of your application. Strings that are not compared correctly can produce invalid input. Test strings to make sure they are valid before using them, and provide error handlers. For more information, see Security Considerations: International Features.
Windows 95/98/Me: CompareStringW is supported by the Microsoft Layer for Unicode (MSLU). To use it, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.