WideCharToMultiByte

This content has moved to another location. See WideCharToMultiByte for the latest version.

Community Content
In general best fit should be avoided.      Shawn Steele [MSFT]

See http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx and http://blogs.msdn.com/shawnste/archive/2007/06/08/why-can-t-we-strip-the-diacritics.aspx for some of my reasoning.

Best fit behavior causes some characters to behave identically to others. This can change the linguistic meaning in a very unfortunate way sometimes, and it can cause security problems.

If you do use best fit, realize that the mappings are somewhat random and that you might mangle customers' names, among other things. Do any security verification after the conversion, in case the conversion maps characters onto ones that matter to your app's security checks.

Use Unicode :)      Shawn Steele [MSFT]
Of course, it's usually best to use Unicode (unless dealing with an older system that can't handle the conversion of legacy data). Then you don't have to worry about best fit, changing code pages, or the like. http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx has some more information. If you have to read legacy data, it may be best to move it to Unicode early on and then keep it that way. (Remember, most of our ANSI APIs internally convert their input to Unicode and then call the Unicode version of the API, so, if nothing else, ANSI APIs are generally slower.)
CP_ACP is pretty scary.      Shawn Steele [MSFT]
CP_ACP gives you the code page currently configured as the system code page. This might be different on your friend's or customer's machine and lead to data corruption. This is often what happens when you see ? or "funny" characters on web pages, like for fancy quotation marks. For uniform readability by all machines Unicode is often a better choice than doing code page conversions.
UTF-7/8 Conversion problem      Skyfaller ... Shawn Steele [MSFT]
I'm having trouble converting to UTF-7/8. Whenever lpDefaultChar and lpUsedDefaultChar are set to non-zero values, the function fails. However, it works with the same arguments if the last two are set to NULL. This was observed on Windows XP SP2.
----------

Shawn Steele

The replacement character isn't interesting for Unicode, since any valid string should be convertible to UTF-8. (UTF-7 isn't terribly secure and is not recommended.) The only exception is an invalid surrogate pair (a high surrogate with no following low surrogate, etc.), in which case it'll be replaced with U+FFFD.

Re: UTF-7/8 Conversion problem      Sam Mesh at hotmail ... Shawn Steele [MSFT]
This is already documented on this page:
lpUsedDefaultChar
...
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

P.S. Probably, this was added after Skyfaller's comment? :)

(Shawn Steele - No, it was there before :))

WC_COMPOSITECHECK isn't very helpful      Shawn Steele [MSFT]
WC_COMPOSITECHECK isn't complete and is a bit slow. If you have Form D (decomposed) data that you need to compose before encoding with WideCharToMultiByte(), then try using NormalizeString() before calling WideCharToMultiByte(). It'll be faster and more accurate.
lpMultiByteStr is not zero-terminated      WizardOz
Here is a note for anybody using this function. ALWAYS check the return code, and do NOT use the content of lpMultiByteStr if the function returned 0.

As it turns out, WideCharToMultiByte() fills the buffer lpMultiByteStr even when it is too small to hold the converted wide string; however, it does not zero-terminate it. I'm not sure why the function would fill the resulting string if it knows that it's too small and then not zero-terminate it. I can only assume it's because it converts byte after byte.

Consider this sample code:

wchar_t wstrTest[] = L"12345678";
char strTest[4] = { '\0' };

WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, sizeof(strTest), NULL, NULL);
printf("Converted: %s\n", strTest);

If you don't check the return value, then any subsequent string operation will go past the size of strTest.

Converted: 1234╠╠╠╠╠╠╠╠1

WideCharToMultiByte() returns 0 in this case, but still fills strTest.

In my opinion, WideCharToMultiByte() should either NOT fill the destination variable if it's too small, or terminate it with a 0 character.

If it is "ok" that the resulting multibyte string is smaller and that data might be lost (which can be the case when you work with a fixed output buffer), then it's better to call the function like this:

WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR) wstrTest, -1, strTest, (sizeof(strTest) - sizeof(char)), NULL, NULL);

... assuming that strTest has been zeroed-out first.
Re: not zero-terminated      Shawn Steele [MSFT]
It's worth noting that many of these functions are designed to also allow non-null-terminated input strings (i.e., counted behavior). So the output isn't always null-terminated, even in success cases, if the input doesn't contain a trailing null (which pretty much means an explicit length was passed in for the input).
wording correction for "valid UTF-8 operating systems"      Ben Bryant ... Thomas Lee

Correction: under "Windows Vista and later:" the page says: "code that uses this function on valid UTF-8 operating systems will behave the same way on Windows Vista and later as on earlier Windows operating systems."

It probably meant to say valid UTF-8 strings, not valid UTF-8 operating systems.

© 2009 Microsoft Corporation. All rights reserved.