WideCharToMultiByte

Maps a wide character string to a new character string. The new character string is not necessarily from a multibyte character set.

Note: The ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 (code page 65001) or UTF-16, instead of a specific code page, unless legacy standards or data formats prevent the use of Unicode. If use of Unicode is not possible, applications should tag the data stream with the appropriate encoding name when protocols allow it. HTML, XML, and HTTP files allow tagging, but text files do not.

int WideCharToMultiByte(
  UINT CodePage, 
  DWORD dwFlags, 
  LPCWSTR lpWideCharStr,
  int cchWideChar, 
  LPSTR lpMultiByteStr, 
  int cbMultiByte,
  LPCSTR lpDefaultChar,    
  LPBOOL lpUsedDefaultChar
);

Parameters

CodePage
[in] Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.
Value Meaning
CP_ACP The current system Windows ANSI code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible.
CP_MACCP The current system Macintosh code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible.

Note: This value is used primarily in legacy code and should not generally be needed since modern Macintosh computers use Unicode for encoding.

CP_OEMCP The current system OEM code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible.
CP_SYMBOL Windows 2000 and later: Symbol code page (42).
CP_THREAD_ACP Windows 2000 and later: The Windows ANSI code page for the current thread. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible.
CP_UTF7 Windows 98/Me, Windows NT 4.0 and later: UTF-7. Use this value only when forced by a 7-bit transport mechanism. Use of UTF-8 is preferred. With this value set, lpDefaultChar and lpUsedDefaultChar must be set to null pointers.
CP_UTF8 Windows 98/Me, Windows NT 4.0 and later: UTF-8. With this value set, lpDefaultChar and lpUsedDefaultChar must be set to null pointers.

Note: On Windows 95, the Microsoft Layer for Unicode enables WideCharToMultiByte to support CP_UTF7 and CP_UTF8.

dwFlags
[in] Flags indicating the conversion type. The application can specify a combination of the following values. The function performs more quickly when none of these flags is set. The application should specify WC_NO_BEST_FIT_CHARS and WC_COMPOSITECHECK with the specific value WC_DEFAULTCHAR to retrieve all possible conversion results. If all three values are not provided, some results will be missing.
Value Meaning
WC_NO_BEST_FIT_CHARS Windows 98/Me and Windows 2000 and later: Translate any Unicode characters that do not translate directly to multibyte equivalents to the default character specified by lpDefaultChar. In other words, if translating from Unicode to multibyte and back to Unicode again does not yield the same Unicode character, the function uses the default character. This flag can be used by itself or in combination with the other defined flags.
WC_COMPOSITECHECK Convert composite characters, consisting of a base character and a nonspacing character, each with different character values. Translate these characters to precomposed characters, which have a single character value for a base-nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character.

Your application can combine WC_COMPOSITECHECK with any one of the following flags, with the default being WC_SEPCHARS. These flags determine the behavior of the function when no precomposed mapping for a base-nonspacing character combination in a wide character string is available. If none of these flags is supplied, the function behaves as if the WC_SEPCHARS flag is set. For more information, see WC_COMPOSITECHECK and related flags in the Remarks section.

WC_DISCARDNS Discard nonspacing characters during conversion.
WC_SEPCHARS Default. Generate separate characters during conversion.
WC_DEFAULTCHAR Replace exceptions with the default character during conversion.

WC_ERR_INVALID_CHARS Windows Vista and later: Fail if an invalid input character is encountered. In that case, a call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. If this flag is not set, the function silently drops illegal code points. Note that this flag only applies when CodePage is specified as CP_UTF8. It cannot be used with other code page values.

For the code pages listed below, dwFlags must be 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.

  • 50220
  • 50221
  • 50222
  • 50225
  • 50227
  • 50229
  • 52936
  • 54936
  • 57002 through 57011
  • 65000 (UTF7)
  • 42 (Symbol)

Note: For the code page 65001 (UTF-8), dwFlags must be set to either 0 or WC_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

lpWideCharStr
[in] Pointer to the wide character string to convert.
cchWideChar
[in] Size, in WCHAR values, of the string indicated by lpWideCharStr. If this parameter is set to -1, the function assumes the string to be null-terminated and calculates the length automatically, including the null terminator. If cchWideChar is set to 0, the function fails.
lpMultiByteStr
[out] Pointer to a buffer that receives the converted string.
cbMultiByte
[in] Size, in bytes, of the buffer indicated by lpMultiByteStr. If this parameter is set to 0, the function returns the required buffer size for lpMultiByteStr and makes no use of the output parameter itself.
lpDefaultChar
[in] Pointer to the character to use if a wide character cannot be represented in the specified code page. The application sets this parameter to a null pointer if the function is to use a system default value. To obtain the system default character, the application can call the GetCPInfo or GetCPInfoEx function.

For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

lpUsedDefaultChar
[out] Pointer to a flag that indicates if the function has used a default character in the conversion. The flag is set to TRUE if one or more characters in the source string cannot be represented in the specified code page. Otherwise, the flag is set to FALSE. This parameter can be set to a null pointer.

For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

Return Values

Returns the number of bytes written to the buffer pointed to by lpMultiByteStr if successful. The number includes the byte for the terminating null character when the input includes one, for example when cchWideChar is -1.

If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr.

The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:

  • ERROR_INSUFFICIENT_BUFFER
  • ERROR_INVALID_FLAGS
  • ERROR_INVALID_PARAMETER

Remarks

Security Alert: Using the WideCharToMultiByte function incorrectly can compromise the security of your application. Calling this function can easily cause a buffer overrun because the size of the input buffer indicated by lpWideCharStr equals the number of WCHAR values in the string, while the size of the output buffer indicated by lpMultiByteStr equals the number of bytes. To avoid a buffer overrun, your application must specify a buffer size appropriate for the data type the buffer receives.

Data converted from Unicode UTF-16 to non-Unicode code pages (code pages other than UTF-7 or UTF-8) is subject to data loss, because a code page might not be able to represent every character used in the specific Unicode data. For more information, see Security Considerations: International Features.

For strings that require validation, such as file, resource, and user names, the application should always use the WC_NO_BEST_FIT_CHARS flag with WideCharToMultiByte. This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases, the semantic change can be extreme. For example, the symbol for "∞" (infinity) maps to 8 (eight) in some code pages.

The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns ERROR_INVALID_PARAMETER.

WideCharToMultiByte does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the null terminator for the input string.

If cbMultiByte is less than cchWideChar, this function writes the number of characters specified by cbMultiByte to the buffer indicated by lpMultiByteStr. However, if CodePage is set to CP_SYMBOL and cbMultiByte is less than cchWideChar, the function writes no characters to lpMultiByteStr.

The WideCharToMultiByte function operates most efficiently when both lpDefaultChar and lpUsedDefaultChar are set to null pointers. The following table shows the behavior of the function for the four possible combinations of these parameters.

lpDefaultChar   lpUsedDefaultChar   Result
NULL            NULL                No default checking. These parameter settings are the most efficient ones for use with this function.
non-NULL        NULL                Uses the specified default character, but does not set lpUsedDefaultChar.
NULL            non-NULL            Uses the system default character and sets lpUsedDefaultChar if necessary.
non-NULL        non-NULL            Uses the specified default character and sets lpUsedDefaultChar if necessary.

Windows Vista and later: This function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written for earlier versions of Windows that relies on this behavior to encode random non-text binary data might run into problems. However, code that uses this function to produce valid UTF-8 strings will behave the same way on Windows Vista and later as on earlier Windows operating systems.

Windows 95 and Windows NT 4.0: The WC_NO_BEST_FIT_CHARS flag is not available on these operating systems. If your application must run on these platforms, you can "round-trip" the string using MultiByteToWideChar. Any code point that does not round-trip is a best-fit character.

Windows 95/98/Me: A version of WideCharToMultiByte is included in these operating systems, but a more extensive version of the function is supported by the Microsoft Layer for Unicode. To use this version, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.

WC_COMPOSITECHECK and related flags

As discussed in the Unicode Normalization topic, Unicode allows multiple representations of the same string (interpreted linguistically). For example, Capital A with dieresis (umlaut) can be represented either precomposed as a single Unicode code point "Ä" (U+00C4) or decomposed as the combination of Capital A and the combining dieresis character ("A" + "¨", that is U+0041 U+0308). However, most code pages provide only composed characters.

The WC_COMPOSITECHECK flag causes the WideCharToMultiByte function to test for decomposed Unicode characters and attempt to compose them before converting them to the requested code page. This flag is only available for conversion to single byte (SBCS) or double byte (DBCS) code pages (code pages < 50000, excluding code page 42). If your application needs to convert decomposed Unicode data to single byte or double byte code pages, this flag might be useful. However, not all characters can be converted this way and it is more reliable to save and store such data as Unicode.

When an application is using WC_COMPOSITECHECK, some character combinations might remain incomplete or might have additional nonspacing characters left over. For example, A + ¨ + ¨ combines to Ä + ¨. Using the WC_DISCARDNS flag causes the function to discard additional nonspacing characters. Using the WC_DEFAULTCHAR flag causes the function to use the default replacement character (typically "?") instead. Using the WC_SEPCHARS flag causes the function to attempt to convert each additional nonspacing character to the target code page. Usually this flag also causes the use of the replacement character ("?"). However, for code page 1258 (Vietnamese) and 20269, nonspacing characters exist and can be used. The conversions for these code pages are not perfect. Some combinations do not convert correctly to code page 1258, and WC_COMPOSITECHECK corrupts data in code page 20269. As mentioned earlier, it is more reliable to design your application to save and store such data as Unicode.

Windows normally represents Unicode strings with precomposed data, making the use of the WC_COMPOSITECHECK flag unnecessary. The less common applications that create decomposed data cannot accurately represent many decomposed character combinations in most code pages. Unicode is the preferred way to save and store such data.

Example

For an example, see Looking Up a User's Full Name.

Requirements

  Windows NT/2000/XP/Vista: Included in Windows NT 3.1 and later.
  Windows 95/98/Me: Included in Windows 95 and later.
  Header: Declared in Winnls.h; include Windows.h.
  Library: Use Kernel32.lib.

See Also

Unicode and Character Sets, Unicode and Character Set Functions, MultiByteToWideChar

Community Content
In general best fit should be avoided.      Shawn Steele - MSFT

See http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx and http://blogs.msdn.com/shawnste/archive/2007/06/08/why-can-t-we-strip-the-diacritics.aspx for some of my reasoning.

Best fit behavior causes some characters to behave identically to others. This can change the linguistic meaning in a very unfortunate way sometimes, and it can cause security problems.

If you do use best fit, realize that the mappings are somewhat random and that you might mangle customers' names. Do any security verification after the conversion, in case the conversion introduces new security-relevant mappings for your app.

Use Unicode :)      Shawn Steele - MSFT
Of course, it's usually best to use Unicode (unless dealing with an older system that can't handle the conversion of legacy data). Then you don't have to worry about best fit, changing code pages, or the like. http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx has some more information. If you have to read legacy data, it may be best to move it to Unicode early on and then keep it that way. (Remember, most of our ANSI APIs internally convert their input to Unicode and then call the Unicode version of the API, so, if nothing else, ANSI APIs are generally slower.)
CP_ACP is pretty scary.      Shawn Steele - MSFT
CP_ACP gives you the code page currently configured as the system code page. This might be different on your friend's or customer's machine and lead to data corruption. This is often what happens when you see ? or "funny" characters on web pages, like for fancy quotation marks. For uniform readability by all machines Unicode is often a better choice than doing code page conversions.
UTF-7/8 Conversion problem      Skyfaller ... Shawn Steele - MSFT
I'm having trouble converting to UTF7/8. Whenever lpDefaultChar and lpUsedDefaultChar are set to non-zero, the function fails. However, it works with the same arguments if the last two are set to NULL. This was observed on Windows XP SP2.
----------

Shawn Steele

The replacement character isn't interesting for Unicode since any valid string should be convertible to UTF-8. (UTF-7 isn't terribly secure and is not recommended.) The only exception to that is an invalid surrogate pair (high surrogate with no following low surrogate, etc.), in which case it'll be replaced with U+FFFD.

Re: UTF-7/8 Conversion problem      Sam Mesh ... Shawn Steele - MSFT
This is already documented on this page:
lpUsedDefaultChar
...
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

P.S. Probably, this was added after Skyfaller's comment? :)

(Shawn Steele - No, it was there before :))

WC_COMPOSITECHECK isn't very helpful      Shawn Steele - MSFT
WC_COMPOSITECHECK isn't complete and is a bit slow. If you have Form D (decomposed) data that you need to compose before encoding with WideCharToMultiByte(), then try using NormalizeString() before calling WideCharToMultiByte(). It'll be faster and more accurate.
lpMultiByteStr is not zero-terminated      Old Lucky Luke
Here is a note for anybody using this function. ALWAYS check the return code, and do NOT use the content of lpMultiByteStr if the function returned 0.

As it turns out, WideCharToMultiByte() fills the buffer lpMultiByteStr even when it is too small to hold the converted wide string; however, it does not zero-terminate it. I'm not sure why the function would fill the resulting string if it knows that it's too small and then not zero-terminate it. I can only assume it's because it converts byte after byte.

Consider this sample code:

wchar_t wstrTest[] = L"12345678";
char strTest[4] = { '\0' };

WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, sizeof(strTest), NULL, NULL);
printf("Converted: %s\n", strTest);

If you don't check the return value, then any subsequent string operation will go past the size of strTest.

Converted: 1234╠╠╠╠╠╠╠╠1

WideCharToMultiByte() returns 0 in this case, but still fills strTest.

In my opinion, WideCharToMultiByte() should either NOT fill the destination variable if it's too small, or terminate it with a 0 character.

If it is "ok" that the resulting multibyte string is smaller and that data might be lost (which can be the case when you work with a fixed output buffer), then it's better to call the function like this:

WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, (sizeof(strTest) - sizeof(char)), NULL, NULL);

... assuming that strTest has been zeroed-out first.
Re: not zero-terminated      Shawn Steele - MSFT