MultiByteToWideChar function

Applies to: desktop apps | Metro style apps

Maps a character string to a UTF-16 (wide character) string. The character string is not necessarily from a multibyte character set.

Caution  Using the MultiByteToWideChar function incorrectly can compromise the security of your application. Calling this function can easily cause a buffer overrun because the size of the input buffer indicated by lpMultiByteStr equals the number of bytes in the string, while the size of the output buffer indicated by lpWideCharStr equals the number of characters. To avoid a buffer overrun, your application must specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.

Note  The ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 or UTF-16, instead of a specific code page, unless legacy standards or data formats prevent the use of Unicode. If using Unicode is not possible, applications should tag the data stream with the appropriate encoding name when protocols allow it. HTML and XML files allow tagging, but text files do not.

Syntax

int MultiByteToWideChar(
  __in       UINT CodePage,
  __in       DWORD dwFlags,
  __in       LPCSTR lpMultiByteStr,
  __in       int cbMultiByte,
  __out_opt  LPWSTR lpWideCharStr,
  __in       int cchWideChar
);

Parameters

CodePage [in]

Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.

Value    Meaning
CP_ACP

The system default Windows ANSI code page.

Note  This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.

CP_MACCP

The current system Macintosh code page.

Note  This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.

Note   This value is used primarily in legacy code and should not generally be needed since modern Macintosh computers use Unicode for encoding.

CP_OEMCP

The current system OEM code page.

Note  This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.

CP_SYMBOL

Symbol code page (42).

CP_THREAD_ACP

The Windows ANSI code page for the current thread.

Note  This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.

CP_UTF7

UTF-7. Use this value only when forced by a 7-bit transport mechanism. Use of UTF-8 is preferred.

CP_UTF8

UTF-8.

dwFlags [in]

Flags indicating the conversion type. The application can specify a combination of the following values, with MB_PRECOMPOSED being the default. MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. MB_USEGLYPHCHARS and MB_ERR_INVALID_CHARS can be set regardless of the state of the other flags.

Value    Meaning
MB_COMPOSITE

Always use decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. For example, Ä is represented by A + ¨: LATIN CAPITAL LETTER A (U+0041) + COMBINING DIAERESIS (U+0308). Note that this flag cannot be used with MB_PRECOMPOSED.

MB_ERR_INVALID_CHARS

Fail if an invalid input character is encountered.

Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag.

Windows 2000 with SP4 and later, Windows XP: If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.

MB_PRECOMPOSED

Default; do not use with MB_COMPOSITE. Always use precomposed characters, that is, characters having a single character value for a base or nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character. If a single Unicode code point is defined for a character, the application should use it instead of a separate base character and a nonspacing character. For example, Ä is represented by the single Unicode code point LATIN CAPITAL LETTER A WITH DIAERESIS (U+00C4).

MB_USEGLYPHCHARS

Use glyph characters instead of control characters.

For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.

  • 50220
  • 50221
  • 50222
  • 50225
  • 50227
  • 50229
  • 57002 through 57011
  • 65000 (UTF-7)
  • 42 (Symbol)

Note  For UTF-8 or code page 54936 (GB18030, starting with Windows Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

lpMultiByteStr [in]

Pointer to the character string to convert.

cbMultiByte [in]

Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

lpWideCharStr [out, optional]

Pointer to a buffer that receives the converted string.

cchWideChar [in]

Size, in characters, of the buffer indicated by lpWideCharStr. If this value is 0, the function returns the required buffer size, in characters, including any terminating null character, and makes no use of the lpWideCharStr buffer.

Return value

Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr. For UTF encodings, invalid input byte sequences are converted to U+FFFD in the output, unless MB_ERR_INVALID_CHARS is set, in which case the function fails.

The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:

  • ERROR_INSUFFICIENT_BUFFER. A supplied buffer size was not large enough, or it was incorrectly set to NULL.
  • ERROR_INVALID_FLAGS. The values supplied for flags were not valid.
  • ERROR_INVALID_PARAMETER. Any of the parameter values was invalid.
  • ERROR_NO_UNICODE_TRANSLATION. Invalid Unicode was found in a string.

Remarks

The default behavior of this function is to translate to a precomposed form of the input character string. If a precomposed form does not exist, the function attempts to translate to a composite form.

The use of the MB_PRECOMPOSED flag has very little effect on most code pages because most input data is composed already. Consider calling NormalizeString after converting with MultiByteToWideChar. NormalizeString provides more accurate, standard, and consistent data, and can also be faster. Note that for the NORM_FORM enumeration being passed to NormalizeString, NormalizationC corresponds to MB_PRECOMPOSED and NormalizationD corresponds to MB_COMPOSITE.

As mentioned in the caution above, the output buffer can easily be overrun if this function is not first called with cchWideChar set to 0 in order to obtain the required size. If the MB_COMPOSITE flag is used, the output can be three or more characters long for each input character.

The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns the value ERROR_INVALID_PARAMETER.

MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the terminating null character for the input string.

The function fails if MB_ERR_INVALID_CHARS is set and an invalid character is encountered in the source string. An invalid character is one of the following:

  • A character that is not the default character in the source string, but translates to the default character when MB_ERR_INVALID_CHARS is not set
  • For DBCS strings, a character that has a lead byte but no valid trail byte

Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written for earlier versions of Windows that relies on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems.

Windows XP: To prevent the security problem of the non-shortest-form versions of UTF-8 characters, MultiByteToWideChar deletes these characters.

Starting with Windows 8 Consumer Preview: MultiByteToWideChar is declared in Stringapiset.h. Before Windows 8, it was declared in Winnls.h.

Requirements

Minimum supported client

Windows 2000 Professional

Minimum supported server

Windows 2000 Server

Header

Stringapiset.h (include Windows.h)

Library

Kernel32.lib

DLL

Kernel32.dll

See also

Unicode and Character Sets
Unicode and Character Set Functions
WideCharToMultiByte

Build date: 3/6/2012

Community Content
make sure there is always an eos      ArnoudMulder

The code below ensures there is always an eos (end-of-string terminator) in the output.

#include <windows.h>

/***************************/
/* ansi-unicode conversion */
/***************************/

/* Converts a null-terminated ANSI string. in_MaxLen is the capacity of
   out_Dst in WCHARs. Returns FALSE only for an unusable buffer size; if the
   conversion itself fails, out_Dst is set to an empty string. */
BOOL AnsiToUnicode16(CHAR *in_Src, WCHAR *out_Dst, INT in_MaxLen)
{
    /* locals */
    INT lv_Len;

    // do NOT decrease maxlen for the eos
    if (in_MaxLen <= 0)
        return FALSE;

    // let windows find out the meaning of ansi
    // - SrcLen=-1 makes MBTWC convert the eos as well; it fails if MaxLen is too small
    // - if SrcLen is specified then no eos is added
    // - if (SrcLen+1) is specified then the eos IS added
    lv_Len = MultiByteToWideChar(CP_ACP, 0, in_Src, -1, out_Dst, in_MaxLen);

    // validate (MBTWC returns 0 on failure; the negative check is only defensive)
    if (lv_Len < 0)
        lv_Len = 0;

    // ensure eos, watch out for a full buffer
    // - if the buffer is full without an eos then clear the output,
    //   as is done when the output buffer is too small
    // - unfortunately there is no way to let MBTWC return shortened strings,
    //   if the output buffer is too small then it fails completely
    if (lv_Len < in_MaxLen)
        out_Dst[lv_Len] = 0;
    else if (out_Dst[in_MaxLen-1])
        out_Dst[0] = 0;

    // done
    return TRUE;
}


/* Same as AnsiToUnicode16, but the source length in bytes is passed
   explicitly in in_SrcLen (or -1 for a null-terminated source). */
BOOL AnsiToUnicode16L(CHAR *in_Src, INT in_SrcLen, WCHAR *out_Dst, INT in_MaxLen)
{
    /* locals */
    INT lv_Len;

    // do NOT decrease maxlen for the eos
    if (in_MaxLen <= 0)
        return FALSE;

    // let windows find out the meaning of ansi
    // - SrcLen=-1 makes MBTWC convert the eos as well; it fails if MaxLen is too small
    // - if SrcLen is specified then no eos is added
    // - if (SrcLen+1) is specified then the eos IS added
    lv_Len = MultiByteToWideChar(CP_ACP, 0, in_Src, in_SrcLen, out_Dst, in_MaxLen);

    // validate (MBTWC returns 0 on failure; the negative check is only defensive)
    if (lv_Len < 0)
        lv_Len = 0;

    // ensure eos, watch out for a full buffer
    // - if the buffer is full without an eos then clear the output,
    //   as is done when the output buffer is too small
    // - unfortunately there is no way to let MBTWC return shortened strings,
    //   if the output buffer is too small then it fails completely
    if (lv_Len < in_MaxLen)
        out_Dst[lv_Len] = 0;
    else if (out_Dst[in_MaxLen-1])
        out_Dst[0] = 0;

    // done
    return TRUE;
}

Convert from UTF-8 to UTF-16 sample code (C++, Win32, STL)      Giovanni Dicanio
Sample C++ code showing how to convert Unicode text from UTF-8 to UTF-16 encoding using MultiByteToWideChar can be found here:

http://msmvps.com/blogs/gdicanio/archive/2011/02/04/conversion-between-unicode-utf-8-and-utf-16-with-stl-strings.aspx

MB_ERR_INVALID_CHARS      Philipp Stephani
About MB_ERR_INVALID_CHARS:
"Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag."
Instead, the function seems to replace illegal code unit sequences with the Unicode replacement character (U+FFFD). It would be nice if this behavior were documented.
Cannot get ERROR_NO_UNICODE_TRANSLATION      Vincent_Yang
Even if I set the MB_ERR_INVALID_CHARS flag, this API can "successfully" convert an invalid char/byte to a garbled char.
For example, 0x81, 0x8D, 0x8F, 0x90, and 0x9D are invalid bytes for cp1252, but when the original multibyte string contains such byte values, this API doesn't fail with an ERROR_NO_UNICODE_TRANSLATION error.
I googled the error; other developers also complain that it doesn't work as described.

Does anybody know an example of "A character that is not the default character in the source string, but translates to the default character when MB_ERR_INVALID_CHARS is not set"?
Another MB_COMPOSITE warning      Shawn Steele [MSFT]

Your output buffer isn't 1-1 if you pass this flag.

This is (in my opinion) a badly named flag. It decomposes an input character into possibly multiple output characters. Amongst other things this means that the output can be 2, 3 or more (I think like 6) characters long for each input character, so if you use this flag make sure to "count" first (pass a zero-length null output buffer) to get the size of the required output buffer.

Don't use MB_PRECOMPOSED and MB_COMPOSITE      Shawn Steele [MSFT]
First of all, MB_COMPOSITE is a horrible name. It kind of means "normalization form D", which I think of as "decomposed". MB_PRECOMPOSED is sort of "normalization form C".
Instead use NormalizeString() after conversion. NormalizeString() provides more accurate, standard, and consistent data than these flags. It can also be faster.
Also note that MB_PRECOMPOSED has very little effect on most code pages because most of the input data is composed anyway, so the extra processing is often wasted effort.
© 2012 Microsoft. All rights reserved.