Maps a character string to a UTF-16 (wide character) string. The character string is not necessarily from a multibyte character set.
int MultiByteToWideChar( _In_ UINT CodePage, _In_ DWORD dwFlags, _In_ LPCSTR lpMultiByteStr, _In_ int cbMultiByte, _Out_opt_ LPWSTR lpWideCharStr, _In_ int cchWideChar );
- CodePage [in]
Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.
- dwFlags [in]
Flags indicating the conversion type. The application can specify a combination of the following values, with MB_PRECOMPOSED being the default. MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. MB_USEGLYPHCHARS and MB_ERR_INVALID_CHARS can be set regardless of the state of the other flags.
Always use decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. For example, Ä is represented by A + ¨: LATIN CAPITAL LETTER A (U+0041) + COMBINING DIAERESIS (U+0308). Note that this flag cannot be used with MB_PRECOMPOSED.
Fail if an invalid input character is encountered.
Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag.
Windows 2000 with SP4 and later, Windows XP: If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
Default; do not use with MB_COMPOSITE. Always use precomposed characters, that is, characters having a single character value for a base or nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character. If a single Unicode code point is defined for a character, the application should use it instead of a separate base character and a nonspacing character. For example, Ä is represented by the single Unicode code point LATIN CAPITAL LETTER A WITH DIAERESIS (U+00C4).
Use glyph characters instead of control characters.
For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.
Note For UTF-8 or code page 54936 (GB18030, starting with Windows Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.
- 57002 through 57011
- 65000 (UTF-7)
- 42 (Symbol)
- lpMultiByteStr [in]
Pointer to the character string to convert.
- cbMultiByte [in]
Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.
- lpWideCharStr [out, optional]
Pointer to a buffer that receives the converted string.
- cchWideChar [in]
Size, in characters, of the buffer indicated by lpWideCharStr. If this value is 0, the function returns the required buffer size, in characters, including any terminating null character, and makes no use of the lpWideCharStr buffer.
Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr. If the input byte/char sequences are invalid, returns U+FFFD for UTF encodings.
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:
- ERROR_INSUFFICIENT_BUFFER. A supplied buffer size was not large enough, or it was incorrectly set to NULL.
- ERROR_INVALID_FLAGS. The values supplied for flags were not valid.
- ERROR_INVALID_PARAMETER. Any of the parameter values was invalid.
- ERROR_NO_UNICODE_TRANSLATION. Invalid Unicode was found in a string.
The default behavior of this function is to translate to a precomposed form of the input character string. If a precomposed form does not exist, the function attempts to translate to a composite form.
The use of the MB_PRECOMPOSED flag has very little effect on most code pages because most input data is composed already. Consider calling NormalizeString after converting with MultiByteToWideChar. NormalizeString provides more accurate, standard, and consistent data, and can also be faster. Note that for the NORM_FORM enumeration being passed to NormalizeString, NormalizationC corresponds to MB_PRECOMPOSED and NormalizationD corresponds to MB_COMPOSITE.
As mentioned in the caution above, the output buffer can easily be overrun if this function is not first called with cchWideChar set to 0 in order to obtain the required size. If the MB_COMPOSITE flag is used, the output can be three or more characters long for each input character.
The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns the value ERROR_INVALID_PARAMETER.
MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the terminating null character for the input string.
The function fails if MB_ERR_INVALID_CHARS is set and an invalid character is encountered in the source string. An invalid character is one of the following:
- A character that is not the default character in the source string, but translates to the default character when MB_ERR_INVALID_CHARS is not set
- For DBCS strings, a character that has a lead byte but no valid trail byte
Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems.
Windows XP: To prevent the security problem of the non-shortest-form versions of UTF-8 characters, MultiByteToWideChar deletes these characters.
Starting with Windows 8: MultiByteToWideChar is declared in Stringapiset.h. Before Windows 8, it was declared in Winnls.h.
Minimum supported client
|Windows 2000 Professional [desktop apps | Windows Store apps]|
Minimum supported server
|Windows 2000 Server [desktop apps | Windows Store apps]|
Minimum supported phone
|Windows Phone 8|