
NormalizeString function

Normalizes the characters of a text string according to Unicode 4.0 TR#15. For more information, see Using Unicode Normalization to Represent Strings.

Syntax


int NormalizeString(
  _In_       NORM_FORM NormForm,
  _In_       LPCWSTR lpSrcString,
  _In_       int cwSrcLength,
  _Out_opt_  LPWSTR lpDstString,
  _In_       int cwDstLength
);

Parameters

NormForm [in]

Normalization form to use. NORM_FORM specifies the standard Unicode normalization forms.

lpSrcString [in]

Pointer to the non-normalized source string.

cwSrcLength [in]

Length, in characters, of the buffer containing the source string. Set this parameter to -1 to have the function treat the string as null-terminated and calculate its length automatically.

lpDstString [out, optional]

Pointer to a buffer in which the function returns the normalized string. This parameter can be NULL if cwDstLength is set to 0.

Note: The function does not null-terminate the output string if the input length is specified explicitly and does not include a terminating null character. To obtain a null-terminated output string, set cwSrcLength to -1 or include the terminating null character in the explicit input length.

cwDstLength [in]

Length, in characters, of the destination buffer. Alternatively, set this parameter to 0 to have the function return the estimated size required for the destination buffer.

Return value

Returns the length of the normalized string in the destination buffer. If cwDstLength is set to 0, the function returns the estimated buffer length required to do the actual conversion.

If cwSrcLength is -1, or if the explicitly specified source length includes the terminating null character, the string written to the destination buffer is null-terminated and the returned length includes the terminating null character.
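The null-termination rule depends only on how the source length is passed. The following sketch restates it as a predicate. The helper output_is_null_terminated is hypothetical and is not part of the Win32 API; it assumes that an explicit cwSrcLength includes the terminating null only when the last counted character is that null.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper (not a Win32 API): predicts whether NormalizeString
 * null-terminates its output, based only on how the source length is passed. */
static bool output_is_null_terminated(const wchar_t *src, int cwSrcLength)
{
    if (cwSrcLength == -1)
        return true; /* The function measures the string itself, null included. */

    /* Explicit length: terminated only if the last counted character is the null. */
    return cwSrcLength > 0 && src[cwSrcLength - 1] == L'\0';
}
```

For example, with the source L"abc", a cwSrcLength of 3 predicts an unterminated result, while 4 or -1 predicts a terminated one.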

The function returns a value that is less than or equal to 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:

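  • ERROR_INSUFFICIENT_BUFFER. A supplied buffer was not large enough, or it was incorrectly set to NULL. In this case, the negative of the return value can serve as a new estimate of the required buffer length.
  • ERROR_INVALID_PARAMETER. One of the parameter values was invalid.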
  • ERROR_NO_UNICODE_TRANSLATION. Invalid Unicode was found in a string. The return value is the negative of the index of the location of the error in the input string.
  • ERROR_SUCCESS. The action completed successfully but yielded no results.
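The following sketch shows one way an application might interpret a return value together with the code that GetLastError returns immediately after the call. The OUTCOME_* enumeration and classify_normalize_result are hypothetical names used for illustration only. The error constants are defined locally only if they are not already defined, so the sketch also compiles without Windows.h; the values match winerror.h.

```c
/* Win32 error codes (values from winerror.h), defined here only if missing. */
#ifndef ERROR_SUCCESS
#define ERROR_SUCCESS 0L
#endif
#ifndef ERROR_INVALID_PARAMETER
#define ERROR_INVALID_PARAMETER 87L
#endif
#ifndef ERROR_INSUFFICIENT_BUFFER
#define ERROR_INSUFFICIENT_BUFFER 122L
#endif
#ifndef ERROR_NO_UNICODE_TRANSLATION
#define ERROR_NO_UNICODE_TRANSLATION 1113L
#endif

/* Hypothetical result categories, for illustration only. */
typedef enum {
    OUTCOME_OK,        /* ret > 0: ret characters were written */
    OUTCOME_EMPTY,     /* ERROR_SUCCESS: succeeded but produced no results */
    OUTCOME_RETRY,     /* buffer too small: retry with -ret characters */
    OUTCOME_BAD_INPUT, /* invalid Unicode at position -ret of the input */
    OUTCOME_FAILED     /* invalid parameter or any other error */
} NormalizeOutcome;

/* Interprets a NormalizeString return value together with the value
 * GetLastError() returned right after the call. */
static NormalizeOutcome classify_normalize_result(int ret, unsigned long lastError)
{
    if (ret > 0)
        return OUTCOME_OK;

    switch (lastError) {
    case ERROR_SUCCESS:                return OUTCOME_EMPTY;
    case ERROR_INSUFFICIENT_BUFFER:    return OUTCOME_RETRY;
    case ERROR_NO_UNICODE_TRANSLATION: return OUTCOME_BAD_INPUT;
    case ERROR_INVALID_PARAMETER:      /* fall through */
    default:                           return OUTCOME_FAILED;
    }
}
```

For OUTCOME_RETRY, the Remarks sample uses -ret as the next buffer length; for OUTCOME_BAD_INPUT, -ret is the position of the invalid character in the input string.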

Remarks

Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. The Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that obey different rules, as described in Using Unicode Normalization to Represent Strings. Win32 and the .NET Framework support normalization forms C, D, KC, and KD, as defined in Unicode Standard Annex #15: Unicode Normalization Forms. Normalized strings are typically compared using an ordinal (binary) comparison.
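As a worked example of canonical equivalence, both arrays below encode "é", but as different code point sequences, so an ordinal comparison treats them as unequal until both are brought into the same normalization form. The code points are hardcoded from the Unicode standard rather than produced by NormalizeString, so this sketch runs without Windows; the array names and ordinal_equal are illustrative only.

```c
#include <wchar.h>

/* "é" in normalization form C: one precomposed character. */
static const wchar_t e_acute_nfc[] = { 0x00E9, 0 };         /* U+00E9 LATIN SMALL LETTER E WITH ACUTE */
/* "é" in normalization form D: base letter plus combining mark. */
static const wchar_t e_acute_nfd[] = { 0x0065, 0x0301, 0 }; /* U+0065 + U+0301 COMBINING ACUTE ACCENT */

/* Ordinal comparison: code unit by code unit, with no linguistic rules.
 * Returns nonzero when the two strings are identical. */
static int ordinal_equal(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```

On Windows, passing either string to NormalizeString with NormalizationC should yield the one-character form, after which an ordinal comparison of the two results succeeds.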

The following code demonstrates the use of the buffer length estimate:


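#include <windows.h>

// Normalizes strInput to the given form and returns a newly allocated,
// null-terminated string, or NULL on failure. The caller releases the
// result with HeapFree(GetProcessHeap(), 0, result).
LPWSTR NormalizeAlloc(NORM_FORM form, LPCWSTR strInput)
{
    const int maxIterations = 10;
    LPWSTR strResult = NULL;
    HANDLE hHeap = GetProcessHeap();

    // First call: request only an estimate of the required buffer length.
    int iSizeEstimated = NormalizeString(form, strInput, -1, NULL, 0);
    if (iSizeEstimated <= 0)
        return NULL; // Invalid input; call GetLastError for details.

    for (int i = 0; i < maxIterations; i++)
    {
        if (strResult)
            HeapFree(hHeap, 0, strResult);
        strResult = (LPWSTR)HeapAlloc(hHeap, 0, iSizeEstimated * sizeof(WCHAR));
        if (strResult == NULL)
            return NULL; // Out of memory.

        iSizeEstimated = NormalizeString(form, strInput, -1, strResult, iSizeEstimated);
        if (iSizeEstimated > 0)
            return strResult; // Success; the length includes the terminating null.

        if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
            break; // Real error, not a buffer-size error.

        // Buffer too small: the new guess is the negative of the return value.
        iSizeEstimated = -iSizeEstimated;
        if (iSizeEstimated <= 0)
            break; // No usable size hint.
    }

    // A real error occurred or the iteration limit was reached.
    if (strResult)
        HeapFree(hHeap, 0, strResult);
    return NULL;
}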


Windows XP, Windows Server 2003: The required header file and DLL are part of the "Microsoft Internationalized Domain Name (IDN) Mitigation APIs" download, available at the MSDN Download Center.

Examples

An example showing the use of this function can be found in NLS: Unicode Normalization Sample.

Requirements

Minimum supported client

Windows Vista [desktop apps | Windows Store apps]

Minimum supported server

Windows Server 2008 [desktop apps | Windows Store apps]

Redistributable

Microsoft Internationalized Domain Name (IDN) Mitigation APIs on Windows XP with SP2 and later, or Windows Server 2003 with SP1

Header

Winnls.h (include Windows.h)

DLL

Normaliz.dll

See also

National Language Support
National Language Support Functions
Using Unicode Normalization to Represent Strings
IsNormalizedString
NORM_FORM

© 2014 Microsoft