Unicode Programming Summary


To take advantage of the MFC and C run-time support for Unicode, you need to:

  • Define _UNICODE.

    Define the symbol _UNICODE before you build your program.

  • Specify entry point.

    In the Output category of the Link tab in the Project Settings dialog box, set the Entry Point Symbol to wWinMainCRTStartup.

  • Use “portable” run-time functions and types.

    Use the proper C run-time functions for Unicode string handling. You can use the wcs family of functions, but you may prefer the fully “portable” (internationally enabled) _TCHAR macros. These macros are all prefixed with _tcs; they substitute, one for one, for the str family of functions, and they map to the wcs functions when _UNICODE is defined. They are described in detail in the Internationalization section of the Run-Time Library Reference. For more information, see Generic-Text Mappings in TCHAR.H.

    Use _TCHAR and the related portable data types described in Support for Unicode.

  • Handle literal strings properly.

    The Visual C++ compiler interprets a literal string coded as

    L"this is a literal string"
    

    to mean a string of Unicode characters. You can use the same prefix for literal characters. Use the _T macro to code literal strings generically, so they compile as Unicode strings under Unicode or as ANSI strings (including MBCS) without Unicode. For example, instead of:

    pWnd->SetWindowText( "Hello" );
    

    use:

    pWnd->SetWindowText( _T("Hello") );
    

    With _UNICODE defined, _T translates the literal string to the L-prefixed form; otherwise, _T translates the string without the L prefix.

    Tip: The _T macro is identical to the _TEXT macro.

  • Be careful passing string lengths to functions.

    Some functions want the number of characters in a string; others want the number of bytes. For example, if _UNICODE is defined, the following call to a CArchive object will not work (str is a CString):

    archive.Write( str, str.GetLength( ) );    // invalid
    

    In a Unicode application, GetLength returns the number of characters, not the number of bytes. Because each character is two bytes wide, the call above writes only the first half of the string. Instead, you must use:

    archive.Write( str, str.GetLength( ) * sizeof( _TCHAR ) );    // valid
    

    which specifies the correct number of bytes to write.

    However, MFC member functions that are character-oriented, rather than byte-oriented, work without this extra coding:

    pDC->TextOut( str, str.GetLength( ) );
    

    CDC::TextOut takes a number of characters, not a number of bytes.
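
The generic-text and string-length tasks above can be tried out without MFC. The following is a minimal sketch, not the TCHAR.H or MFC implementation: SketchChar, SKETCH_T, SketchLen, SketchByteCount, and SketchSelfCheck are hypothetical names standing in for _TCHAR, _T, _tcslen, and the byte-count arithmetic, defined here so that the example builds with or without _UNICODE and on compilers that do not ship TCHAR.H.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for the TCHAR.H generic-text mappings.
// These are NOT the real definitions; they only mirror the idea.
#ifdef _UNICODE
typedef wchar_t SketchChar;                  // plays the role of _TCHAR
#define SKETCH_T(x) L##x                     // plays the role of _T
inline std::size_t SketchLen(const SketchChar* s) { return std::wcslen(s); }
#else
typedef char SketchChar;                     // plays the role of _TCHAR
#define SKETCH_T(x) x                        // plays the role of _T
inline std::size_t SketchLen(const SketchChar* s) { return std::strlen(s); }
#endif

// Converts a length in elements into a length in bytes, which is what
// byte-oriented functions expect.
inline std::size_t SketchByteCount(const SketchChar* s)
{
    return SketchLen(s) * sizeof(SketchChar);
}

// The same source line is valid in both builds; only the element size differs.
inline void SketchSelfCheck()
{
    const SketchChar* text = SKETCH_T("Hello");
    assert(SketchLen(text) == 5);
    assert(SketchByteCount(text) == 5 * sizeof(SketchChar));
}
```

The point of the sketch is that the element count stays the same in both builds, while the byte count scales with the size of the character type. This is why byte-oriented calls need the multiplication by the element size and character-oriented calls do not.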

To summarize, MFC and the run-time library provide the following support for Unicode programming under Windows NT:

  • Except for database class member functions, all MFC functions are Unicode-enabled, including CString. CString also provides Unicode/ANSI conversion functions.

  • The run-time library supplies Unicode versions of all string-handling functions. (The run-time library also supplies “portable” versions suitable for Unicode or for MBCS. These are the _tcs macros.)

  • TCHAR.H supplies portable data types and the _T macro for translating literal strings and characters. See Generic-Text Mappings in TCHAR.H.

  • The run-time library provides a wide-character version of main. Use wmain to make your application “Unicode-aware.”
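
As an illustration of the last point, here is a small sketch of a console program that uses wmain. It assumes the Microsoft run-time library, which calls wmain with wide-character arguments; other toolchains may ignore wmain or need an extra option to use it. TotalArgumentLength is a hypothetical helper introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: total number of wide characters in all arguments.
inline std::size_t TotalArgumentLength(int argc, wchar_t* argv[])
{
    assert(argc >= 0);
    std::size_t total = 0;
    for (int i = 0; i < argc; ++i)
        total += std::wcslen(argv[i]);   // counts characters, not bytes
    return total;
}

// Wide-character entry point (Microsoft-specific). The command line
// arrives as wchar_t strings, so no conversion from ANSI is needed.
int wmain(int argc, wchar_t* argv[])
{
    std::wprintf(L"%d argument(s), %lu wide characters in total\n",
                 argc,
                 static_cast<unsigned long>(TotalArgumentLength(argc, argv)));
    return 0;
}
```

Keeping the counting logic in a separate function means it can be exercised directly, independent of how the entry point is selected by the toolchain.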