Unicode-Enabled Applets on Windows NT

Glossary

  • Byte order mark (BOM): The Unicode character U+FEFF or its non-character mirror image, U+FFFE, used to indicate the byte order of a text stream. The presence of a BOM is a strong clue that a file is encoded in Unicode.

The applications GridFont and UniPut included with the Win32 SDK and Visual C++ 2 demonstrate various techniques for programming for Unicode. Under Windows NT 3.5, the shell applets and File Manager fully support Unicode. You can use the system's Notepad and Character Map applets to create files containing Unicode text by following these steps:

  1. Enable the Lucida Sans Unicode font. (If it doesn't show up in the font dialog in Notepad, you need to add it from your SYSTEM directory using the Fonts icon in Control Panel.)
  2. Select the Lucida Sans Unicode font in Notepad (Edit/Set Font), in Character Map (Font list box), and in File Manager (Options/Font).
  3. In Character Map (shown below), select Basic Greek from the Subset list box.

  4. Choose several characters to copy to the clipboard by double-clicking characters in the map. Click the Copy button.
  5. Paste the characters into a document in Notepad.

  6. Now try to save the file. Most likely, Notepad will issue a warning that it cannot convert some of the characters to the current code page. To create a Unicode file, select Unicode Text as the data type in the Save As dialog box.

    On an NTFS file system you can even use Unicode characters in the filename.
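
The warning in step 6 comes from characters that have no equivalent in the current code page. As a rough illustration only, the following Python sketch checks whether a string fits a given code page. The helper name fits_code_page and the choice of code page 1252 are assumptions made for this example, and Python's codec is a strict mapping that may differ from the conversion Windows NT itself performs.

```python
def fits_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character in text can be encoded in code_page.

    fits_code_page is a hypothetical helper for illustration; cp1252 is an
    assumed stand-in for "the current code page".
    """
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


greek = "\u03b1\u03b2\u03b3"  # Greek small letters alpha, beta, gamma

print(fits_code_page("hello"))  # True: plain ASCII fits code page 1252
print(fits_code_page(greek))    # False: these letters are not in code page 1252

# A lossy conversion substitutes a placeholder for each unmappable character.
print(greek.encode("cp1252", errors="replace"))  # b'???'
```

Text that fails this kind of check is the text that needs the Unicode Text data type to be saved without loss.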

When saving as a Unicode Text file, Notepad always writes out a byte order mark (BOM)—Unicode character U+FEFF—as the first Unicode character in a file. It uses this character (and not the file extension) to help it distinguish Unicode text from other data.
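
The layout of such a file can be sketched in a few lines of Python. This is not Notepad's code; it assumes little-endian 16-bit code units (the byte order used on x86 systems), and the function name encode_unicode_text is hypothetical.

```python
import codecs


def encode_unicode_text(text: str) -> bytes:
    """Encode text as little-endian UTF-16 preceded by the BOM U+FEFF.

    Assumption: little-endian byte order, so the BOM appears on disk
    as the two bytes FF FE.
    """
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


data = encode_unicode_text("\u03b1\u03b2")  # Greek alpha, beta
print(data.hex(" "))  # ff fe b1 03 b2 03
```

The first two bytes, FF FE, are the BOM. A reader that assumed the opposite byte order would see them as U+FFFE, the non-character mirror image, which is how the mark reveals the byte order.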

Open one of your favorite applications and try to paste in the clipboard text. You will see that Windows NT 3.5 converts as much as it can to the code page used by the application; no modifications to the application are required.

When you open a file in Notepad, Notepad calls a Win32 function named IsTextUnicode to determine whether the file uses Unicode. If the file starts with the conventional signature for Unicode, the BOM U+FEFF, Notepad treats the file as Unicode. (Notepad always adds a BOM to a Unicode file when saving it and hides it again when opening the file.) If there is no BOM, IsTextUnicode can only guess whether the file uses Unicode, based on a number of rules (described in the Visual C++ 2 documentation of IsTextUnicode).
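
The BOM check alone is simple enough to sketch. The Python below covers only the signature test and does not reproduce the guessing rules of IsTextUnicode. The names sniff_bom and read_unicode_text, and the fallback to code page 1252 when no BOM is found, are assumptions made for this example.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, payload) based on a leading 16-bit BOM.

    encoding is None when no BOM is present; payload is the data with
    the BOM removed, mirroring how an editor hides the mark on open.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le", data[2:]
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be", data[2:]
    return None, data


def read_unicode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode data as Unicode if it carries a BOM, else use a code page.

    The fallback code page is an assumption; a real application would
    use the system's current code page or further heuristics.
    """
    encoding, payload = sniff_bom(data)
    return payload.decode(encoding or fallback)


print(read_unicode_text(b"\xff\xfe\xb1\x03") == "\u03b1")  # True
print(read_unicode_text(b"abc"))                            # abc
```

Because the same character U+FEFF appears as FF FE in one byte order and FE FF in the other, the single check identifies both that the file is Unicode and how its bytes are ordered.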

Even in Windows NT 3.5, several features are not Unicode-enabled, such as the help system, WinHlp32. More important for developers, source files for compilers and resource compilers are still based on code pages, for compatibility reasons.
