International Features
LCMapString
For a locale specified by identifier, maps one input character string to another using a specified transformation, or generates a sort key for the input string.
int LCMapString(
  LCID Locale,
  DWORD dwMapFlags,
  LPCTSTR lpSrcStr,
  int cchSrc,
  LPTSTR lpDestStr,
  int cchDest
);
Parameters
- Locale
- [in] Locale identifier that specifies the locale. You can use the MAKELCID macro to create a locale identifier or use one of the following predefined values. For more information, see Locale Identifier Constants and Strings.
- LOCALE_INVARIANT
- LOCALE_SYSTEM_DEFAULT
- LOCALE_USER_DEFAULT
Windows Vista and later: The following custom locale identifiers are also supported.
- LOCALE_CUSTOM_DEFAULT
- LOCALE_CUSTOM_UI_DEFAULT
- LOCALE_CUSTOM_UNSPECIFIED
- dwMapFlags
- [in] Flags specifying the type of transformation to use during string mapping or the type of sort key to generate.
| Flag | Meaning |
| LCMAP_BYTEREV | Windows NT/2000/XP/Server 2003/Vista: Use byte reversal. For example, if the application passes in 0x3450 0x4822, the result is 0x5034 0x2248. |
| LCMAP_FULLWIDTH | Use wide characters where applicable. |
| LCMAP_HALFWIDTH | Use narrow characters where applicable. |
| LCMAP_HIRAGANA | Map all Katakana characters to Hiragana. |
| LCMAP_KATAKANA | Map all Hiragana characters to Katakana. |
| LCMAP_LINGUISTIC_CASING | Windows NT 4.0 and later: Use linguistic rules for casing, instead of file system rules (default). This flag is valid with LCMAP_LOWERCASE or LCMAP_UPPERCASE only. |
| LCMAP_LOWERCASE | For locales and scripts capable of handling uppercase and lowercase, map all characters to lowercase. |
| LCMAP_SIMPLIFIED_CHINESE | Windows NT 4.0 and later: Map traditional Chinese characters to simplified Chinese characters. |
| LCMAP_SORTKEY | Produce a normalized wide character sort key. If the LCMAP_SORTKEY flag is not specified, the function performs string mapping. For details of sort key generation and string mapping, see the Remarks section. |
| LCMAP_TRADITIONAL_CHINESE | Windows NT 4.0 and later: Map simplified Chinese characters to traditional Chinese characters. |
| LCMAP_UPPERCASE | For locales and scripts capable of handling uppercase and lowercase, map all characters to uppercase. |
The following flags can be used alone, with one another, or with the LCMAP_SORTKEY and/or LCMAP_BYTEREV flags. However, they cannot be combined with the other flags listed above.
| Flag | Meaning |
| NORM_IGNORENONSPACE | Ignore nonspacing characters. |
| NORM_IGNORESYMBOLS | Ignore symbols. |
The flags listed below are used only with the LCMAP_SORTKEY flag.
| Flag | Meaning |
| LINGUISTIC_IGNORECASE | Ignore case, as linguistically appropriate. |
| LINGUISTIC_IGNOREDIACRITIC | Ignore nonspacing characters, as linguistically appropriate. |
| NORM_IGNORECASE | Ignore case. |
| NORM_IGNOREKANATYPE | Do not differentiate between Hiragana and Katakana characters. Corresponding Hiragana and Katakana compare as equal. |
| NORM_IGNOREWIDTH | Do not differentiate between a single-byte character and the same character as a double-byte character. |
| NORM_LINGUISTIC_CASING | Use linguistic rules for casing, instead of file system rules (default). |
| SORT_STRINGSORT | Treat punctuation the same as symbols. |
- lpSrcStr
- [in] Pointer to a source string that the function maps or uses for sort key generation. This string cannot have a size of 0.
- cchSrc
- [in] Size, in TCHAR values, of the source string indicated by lpSrcStr. The size can include the terminating null character, but does not have to. Including it does not change how the rest of the string is mapped, because the terminating null character is considered unsortable and always maps to itself.
The application cannot set this parameter to 0. The application can set the parameter to any negative value to specify that the source string is null-terminated. In this case, if LCMapString is being used in its string-mapping mode, the function calculates the string length itself, and null-terminates the mapped string indicated by lpDestStr.
- lpDestStr
- [out] Pointer to a buffer that receives the mapped string or sort key. If the application specifies LCMAP_SORTKEY, the function stores a sort key in the buffer, as an array of byte values in the following format:
[all Unicode sort weights] 0x01 [all Diacritic weights] 0x01 [all Case weights] 0x01 [all Special weights] 0x01 0x00
Note that the sort key is null-terminated, regardless of the value of cchSrc. Even if some of the sort weights are absent from the sort key, due to the presence of one or more ignore flags in dwMapFlags, the 0x01 separators and the 0x00 terminator are still present.
- cchDest
- [in] Size, in TCHAR values, of the destination string indicated by lpDestStr. If the application is using the function for string mapping, it supplies a character count for this parameter. If space for a terminating null character is included in cchSrc, cchDest must also include space for a terminating null character.
If the application is using the function to generate a sort key, it supplies a byte count for the size. This byte count must include space for the sort key 0x00 terminator.
The application can set cchDest to 0. In this case, the function does not use the lpDestStr parameter and returns the required buffer size for the mapped string or sort key.
Return Values
Returns the number of characters or bytes in the translated string or sort key, including a terminating null character, if successful. If the function succeeds and the value of cchDest is 0, the return value is the size of the buffer required to hold the translated string or sort key, including a terminating null character.
This function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:
- ERROR_INSUFFICIENT_BUFFER
- ERROR_INVALID_FLAGS
- ERROR_INVALID_PARAMETER
Remarks
The ANSI version of this function maps strings to and from Unicode based on the default Windows (ANSI) code page associated with the specified locale. When the ANSI version of this function is used with a Unicode-only locale, the function can succeed because the operating system uses the CP_ACP value, representing the default system code page. However, characters that are undefined in the system code page appear in the string as a question mark (?). To determine the identifiers that are Unicode-only, see Locale Identifier Constants and Strings.
Windows 95/98/Me: The Unicode version of this function is supported by the Microsoft Layer for Unicode. To use this version, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.
Windows Vista and later: This function can return data from custom locales. Locales are not guaranteed to be the same from computer to computer or between runs of an application. If your application must persist or transmit data, see Using Persistent Locale Data.
Applications that are intended to run only on Windows Vista and later should use LCMapStringEx in preference to this function. LCMapStringEx provides good support for supplemental locales. However, LCMapStringEx is not supported for versions of Windows prior to Windows Vista.
String Mapping
If the application does not specify LCMAP_SORTKEY, the LCMapString function performs string mapping. The mapped string is null-terminated if the source string is null-terminated. The following restrictions apply:
- LCMAP_LOWERCASE and LCMAP_UPPERCASE are mutually exclusive.
- LCMAP_HIRAGANA and LCMAP_KATAKANA are mutually exclusive.
- LCMAP_HALFWIDTH and LCMAP_FULLWIDTH are mutually exclusive.
- LCMAP_TRADITIONAL_CHINESE and LCMAP_SIMPLIFIED_CHINESE are mutually exclusive.
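The restrictions listed above can be checked before the call is made. The following sketch is illustrative only: the function name mapping_flags_compatible is hypothetical, and the flag values are repeated from Winnls.h solely so that the fragment compiles without the Windows headers.

```c
#include <stdbool.h>

/* Flag values as defined in Winnls.h; repeated here only so that this
   sketch compiles on its own. With Windows.h included, the guard skips
   these definitions. */
#ifndef LCMAP_LOWERCASE
#define LCMAP_LOWERCASE            0x00000100
#define LCMAP_UPPERCASE            0x00000200
#define LCMAP_HIRAGANA             0x00100000
#define LCMAP_KATAKANA             0x00200000
#define LCMAP_HALFWIDTH            0x00400000
#define LCMAP_FULLWIDTH            0x00800000
#define LCMAP_SIMPLIFIED_CHINESE   0x02000000
#define LCMAP_TRADITIONAL_CHINESE  0x04000000
#endif

/* True if both flags of a mutually exclusive pair are set. */
static bool both_set(unsigned long flags, unsigned long a, unsigned long b)
{
    return (flags & a) != 0 && (flags & b) != 0;
}

/* Hypothetical pre-check: returns true when the string-mapping flags
   respect the four mutual-exclusion rules. */
bool mapping_flags_compatible(unsigned long flags)
{
    return !both_set(flags, LCMAP_LOWERCASE, LCMAP_UPPERCASE)
        && !both_set(flags, LCMAP_HIRAGANA, LCMAP_KATAKANA)
        && !both_set(flags, LCMAP_HALFWIDTH, LCMAP_FULLWIDTH)
        && !both_set(flags, LCMAP_TRADITIONAL_CHINESE,
                     LCMAP_SIMPLIFIED_CHINESE);
}
```

The check covers only the pairs named in the list. It does not validate the other flag rules given in the Parameters section.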
If LCMAP_UPPERCASE or LCMAP_LOWERCASE is set, the lpSrcStr and lpDestStr pointers can be the same. Otherwise, the lpSrcStr and lpDestStr values must not be the same. If they are the same, the function fails, and GetLastError returns ERROR_INVALID_PARAMETER.
When transforming between uppercase and lowercase, this function always maps a single character to a single character. For example, the LCMAP_LOWERCASE and LCMAP_UPPERCASE flags map the German Sharp S ("ß") to itself. The LCMAP_UPPERCASE flag does not map "ß" to "SS". The LCMAP_LOWERCASE flag never maps "SS" to "ß".
When transforming between uppercase and lowercase, this function is not sensitive to context. For example, while the LCMAP_UPPERCASE flag correctly maps both Greek lowercase sigma ("σ") and Greek lowercase final sigma ("ς") to Greek uppercase sigma ("Σ"), the LCMAP_LOWERCASE flag always maps "Σ" to "σ", never to "ς".
By default, the function maps the lowercase "i" to the uppercase "I", even when the Locale parameter specifies Turkish or Azeri. To override this behavior for Turkish or Azeri, the application should specify LCMAP_LINGUISTIC_CASING. If this flag is specified with the appropriate locale, "ı" (lowercase dotless I) is the lowercase form of "I" (uppercase dotless I) and "i" (lowercase dotted I) is the lowercase form of "İ" (uppercase dotted I).
If the LCMAP_HIRAGANA flag is specified to map Katakana characters to Hiragana characters, and LCMAP_FULLWIDTH is not specified, the function maps only full-width Katakana characters to Hiragana. In this case, any half-width Katakana characters are copied to the output string as is, with no mapping to Hiragana. The application must specify LCMAP_FULLWIDTH to map half-width Katakana characters to Hiragana. The reason for this restriction is that all Hiragana characters are full-width characters.
The application can call this function with the NORM_IGNORESYMBOLS flag and the NORM_IGNORENONSPACE or NORM_IGNORE_DIACRITICS flag set, and all other flags cleared, to strip characters from the source string. If the application does this with a source string that is not null-terminated, it is possible for LCMapString to return an empty string and not return an error.
For many scripts, NORM_IGNORENONSPACE coincides with LINGUISTIC_IGNOREDIACRITIC and NORM_IGNORECASE coincides with LINGUISTIC_IGNORECASE. This is notably true for Latin scripts. The following exceptions apply:
- NORM_IGNORENONSPACE ignores any secondary distinction, whether it is a diacritic or not. Scripts for Korean, Japanese, Chinese, and Indic languages, among others, use this distinction for purposes other than diacritics. LINGUISTIC_IGNOREDIACRITIC causes the function to ignore only actual diacritics, instead of ignoring the second collation weight.
- NORM_IGNORECASE ignores any tertiary distinction, whether it is actually linguistic case or not. For example, in Arabic and Indic scripts, this flag distinguishes alternate forms of a character, but the differences do not correspond to linguistic case. LINGUISTIC_IGNORECASE causes the function to ignore only actual linguistic casing, instead of ignoring the third collation weight.
For double-byte character set (DBCS) locales, NORM_IGNORECASE has an effect on all wide (two-byte) characters as well as narrow (one-byte) characters, including wide Greek and Cyrillic characters.
Creating Sort Keys
When the application specifies LCMAP_SORTKEY, this function generates a sort key. For either the ANSI or the Unicode version of the function, the output is an array of byte values. To compare sort keys, the application should use a byte-by-byte comparison.
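The byte layout given in the lpDestStr description (four weight sections, each followed by 0x01, then a 0x00 terminator) can be walked as in the following sketch. It rests on an assumption that this page does not state: weight bytes are never 0x00 or 0x01, so the first four 0x01 bytes are the separators. The function name sort_key_sections is hypothetical, and the sketch applies only to keys generated without LCMAP_BYTEREV.

```c
#include <stddef.h>

/* Hypothetical helper. Fills lens[0..3] with the byte lengths of the
   Unicode, diacritic, case, and special weight sections of a sort key.
   ASSUMPTION: weight bytes are never 0x00 or 0x01.
   Returns 1 if the key matches the documented layout, 0 otherwise. */
int sort_key_sections(const unsigned char *key, size_t key_len,
                      size_t lens[4])
{
    size_t pos = 0;
    for (int s = 0; s < 4; ++s) {
        size_t start = pos;
        /* Advance to the 0x01 separator that closes this section. */
        while (pos < key_len && key[pos] != 0x01) {
            if (key[pos] == 0x00)
                return 0;          /* terminator arrived too early */
            ++pos;
        }
        if (pos == key_len)
            return 0;              /* separator missing */
        lens[s] = pos - start;     /* may be 0: the section is empty */
        ++pos;                     /* skip the separator */
    }
    /* Exactly one byte must remain: the 0x00 terminator. */
    return pos + 1 == key_len && key[pos] == 0x00;
}
```

A section length of 0 corresponds to the case in which ignore flags remove a class of weights while the separators remain in place.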
For generating sort keys, LCMAP_LINGUISTIC_CASING is not relevant. Instead, NORM_LINGUISTIC_CASING has a similar effect.
Comparing two sort keys with memcmp produces the same order as comparing the original strings with CompareString. The application should use memcmp instead of strcmp, because the sort key can have embedded null bytes. When the LCMAP_SORTKEY flag is specified, the output is placed in a string buffer, but its values are not meaningful display characters.
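A byte-by-byte comparison of two sort keys might look like the following sketch. The helper name compare_sort_keys is hypothetical. The lengths are the byte counts that the function returned when each key was generated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: orders two sort keys generated with LCMAP_SORTKEY.
   Returns a negative value, zero, or a positive value when key a sorts
   before, equal to, or after key b. */
int compare_sort_keys(const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;

    /* memcmp examines all n bytes; strcmp would stop at the first
       embedded null byte. */
    int r = memcmp(a, b, n);
    if (r != 0)
        return r;

    /* The common prefix is identical: the shorter key sorts first. */
    if (a_len < b_len)
        return -1;
    if (a_len > b_len)
        return 1;
    return 0;
}
```

Both keys should be generated with the same locale and the same flags. Keys produced under different settings are not comparable.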
When the application uses this function to generate a sort key, the value retrieved in lpDestStr can contain an odd number of bytes. The LCMAP_BYTEREV flag only reverses an even number of bytes. The last byte (odd-positioned) in the sort key is not reversed. If the terminating 0x00 byte is an odd-positioned byte, it remains the last byte in the sort key. If the terminating 0x00 byte is an even-positioned byte, it exchanges positions with the byte that precedes it.
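The effect of byte reversal on an odd-length and an even-length key can be reproduced with a short portable routine. The following sketch only imitates the documented result. It is not the system implementation, and the name swap_byte_pairs is hypothetical.

```c
#include <stddef.h>

/* Illustration only: swaps each complete pair of bytes in place, as
   LCMAP_BYTEREV is described to do. A trailing unpaired byte (when len
   is odd) is left where it is. */
void swap_byte_pairs(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

For a five-byte key, the final 0x00 stays in the last position. For a four-byte key, the final 0x00 changes places with the byte before it, so the reversed key no longer ends in 0x00.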
When generating sort keys, this function ignores the Arabic kashida. If an application calls the function to create a sort key for a string containing an Arabic kashida, the function creates no sort key value for the kashida.
When generating sort keys, the function treats the hyphen and apostrophe differently from other punctuation symbols, so that words such as "coop" and "co-op" stay together in a list. All punctuation symbols other than the hyphen and apostrophe sort before alphanumeric characters. The application can change this behavior by setting the SORT_STRINGSORT flag, as defined for the CompareString function.
Windows NT/2000/XP/Vista: Included in Windows NT 3.1 and later.
Windows 95/98/Me: Included in Windows 95 and later.
Header: Declared in Winnls.h; include Windows.h.
Library: Use Kernel32.lib.
Unicode: Implemented as Unicode and ANSI versions on Windows NT/2000/XP. Also supported by Microsoft Layer for Unicode.
See Also
- National Language Support
- National Language Support Functions
- Handling Collation in Your Applications
- FindNLSString
- LCMapStringEx
- GetNLSVersion
- MAKELCID