WideCharToMultiByte

This content has moved to another location. See WideCharToMultiByte for the latest version.



Community Content

Shawn Steele [MSFT]
In general best fit should be avoided.

See http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx and http://blogs.msdn.com/shawnste/archive/2007/06/08/why-can-t-we-strip-the-diacritics.aspx for some of my reasoning.

Best fit behavior maps some distinct characters to the same output, so they become indistinguishable after conversion. This can sometimes change the linguistic meaning in a very unfortunate way, and it can cause security problems.

If you do use best fit, realize that the mappings are somewhat arbitrary and may mangle customers' names, etc. Also, do any security verification after the conversion, in case the conversion creates new security-relevant mappings for your app.


Shawn Steele [MSFT]
Use Unicode :)
Of course, it's usually best to use Unicode (unless you are dealing with an older system that can't handle the conversion of legacy data). Then you don't have to worry about best fit, changing code pages, or the like. http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx has some more information. If you have to read legacy data, it may be best to move it to Unicode early on and then keep it that way. (Remember, most of our ANSI APIs internally convert their input to Unicode and then call the Unicode version of the API, so, if nothing else, the ANSI APIs are generally slower.)

Shawn Steele [MSFT]
CP_ACP is pretty scary.
CP_ACP gives you the code page currently configured as the system code page. This might be different on a friend's or customer's machine and can lead to data corruption. This is often what's happening when you see ? or "funny" characters on web pages, for example in place of curly quotation marks. For text that every machine reads the same way, Unicode is often a better choice than code page conversions.

Shawn Steele [MSFT]
UTF-7/8 Conversion problem
I'm having trouble converting to UTF-7/UTF-8. Whenever lpDefaultChar and lpUsedDefaultChar are set to non-NULL values, the function fails. However, it works with the same arguments if those last two parameters are set to NULL. This was observed on Windows XP SP2.
----------

Shawn Steele

The replacement character isn't relevant for the Unicode code pages, since any valid string should be convertible to UTF-8. (UTF-7 isn't terribly secure and is not recommended.) The only exception is an invalid surrogate (a high surrogate with no following low surrogate, etc.), which is replaced with U+FFFD.
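To make the behavior described above concrete, here is a small portable model. It is not Windows' implementation, and it uses uint16_t for UTF-16 units because wchar_t is 32 bits on some platforms. Every valid UTF-16 sequence encodes to UTF-8, and only an unpaired surrogate is replaced, with U+FFFD (bytes EF BF BD). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point cp to out; returns bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Model of UTF-16 -> UTF-8: valid code points are encoded as-is, and an
   unpaired surrogate becomes U+FFFD. out must hold at least 3 * len bytes
   (the worst case for UTF-16 input). Returns the number of bytes written. */
size_t utf16_to_utf8_model(const uint16_t *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* High surrogate followed by low surrogate: one supplementary code point. */
            u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            /* Lone high or low surrogate: substitute the replacement character. */
            u = 0xFFFD;
        }
        o += put_utf8(u, out + o);
    }
    return o;
}
```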


Shawn Steele [MSFT]
Re: UTF-7/8 Conversion problem
This is already documented on this page:
lpUsedDefaultChar
...
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

P.S. Probably, this was added after Skyfaller's comment? :)

(Shawn Steele - No, it was there before :))


Shawn Steele [MSFT]
WC_COMPOSITECHECK isn't very helpful
WC_COMPOSITECHECK is incomplete and a bit slow. If you have Form D (decomposed) data that you need to compose before encoding it with WideCharToMultiByte(), try calling NormalizeString() before calling WideCharToMultiByte() instead. It'll be faster and more accurate.

WizardOz
lpMultiByteStr is not zero-terminated
Here is a note for anybody using this function. ALWAYS check the return code, and do NOT use the content of lpMultiByteStr if the function returned 0.

As it turns out, WideCharToMultiByte() fills the buffer lpMultiByteStr even when it is too small to hold the converted wide string, but it does not zero-terminate it. I'm not sure why the function would fill the resulting string if it knows it's too small and then not zero-terminate it. I can only assume it's because it converts byte by byte.

Consider this sample code:

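wchar_t wstrTest[] = L"12345678";
char strTest[4] = { '\0' };   // room for 4 bytes; the converted string needs 9

// Fails (returns 0, ERROR_INSUFFICIENT_BUFFER) yet still writes "1234" with no terminator.
WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, sizeof(strTest), NULL, NULL);
printf("Converted: %s\n", strTest);   // reads past the end of strTest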

If you don't check the return value, then any subsequent string operation will run past the end of strTest:

Converted: 1234╠╠╠╠╠╠╠╠1

WideCharToMultiByte() returns 0 in this case, but still fills strTest.

In my opinion, WideCharToMultiByte() should either NOT fill the destination variable if it's too small, or terminate it with a 0 character.

If it is "ok" that the resulting multibyte string is smaller and that data might be lost (which can be the case when you work with a fixed output buffer), then it's better to call the function like this:

WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, (sizeof(strTest) - sizeof(char)), NULL, NULL);

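... assuming that strTest has been zeroed out first, so that its last byte, which the function never writes to, stays a null terminator. (The function still returns 0 in this case, so the result is truncated.)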

Shawn Steele [MSFT]
Re: not zero-terminated
It's worth noting that many of these functions are designed to also accept input strings that are not null-terminated (i.e., counted strings). So the output isn't always null-terminated, even on success: if the input doesn't contain a trailing null (which pretty much means an explicit length was passed in for the input), the output won't have one either.

Thomas Lee
wording correction for "valid UTF-8 operating systems"

correction: under "Windows Vista and later:" it says: "code that uses this function on valid UTF-8 operating systems will behave the same way on Windows Vista and later as on earlier Windows operating systems."

It probably meant to say valid UTF-8 strings, not valid UTF-8 operating systems.

Tags : contentbug
