
Globalization Step-by-Step

Fonts

 


Overview and Description

One of the biggest challenges in enabling the operating system for international character sets is the ability to select and display the right character or glyph. When editing a multilingual document, the user should not be expected to select a different font for each one of the scripts he or she wants to view because:

  • The average user might not know which font is the most suitable choice.
  • Simple applications such as Notepad.exe only allow one font for the whole document.
  • This type of font selection would impose a big productivity overhead.

Therefore, in addition to the font substitution (also known as "font association") technique used since the early versions of Windows, new features–OpenType fonts, font fallback, and font linking–were introduced in Windows 2000 to solve these types of font–selection problems.

OpenType Fonts

The Unicode–based OpenType font format has been developed jointly by Microsoft and Adobe; it extends the TrueType font file format originally designed by Apple. OpenType fonts allow rich mapping between characters and glyphs, thus enabling support for ligatures, positional forms, alternates, and other substitutions. OpenType fonts can also include information that supports two–dimensional glyph positioning and glyph attachment, and can contain either TrueType or PostScript outlines. Layout features within OpenType fonts are organized by scripts and languages, allowing a single font to support multiple writing systems, even within the same script.

The Windows core fonts (Times New Roman, Courier New, Arial, Microsoft Sans Serif, and Tahoma) contain Latin, Hebrew, Arabic, Greek, and Cyrillic scripts but do not contain East Asian script characters; instead, they link to fonts that do. East Asian scripts are excluded mainly because their glyphs would introduce a massive performance overhead in font loading and mapping in GDI. In addition, these scripts would make the font files several times bigger: instead of having instructions on how to create glyphs for several hundred characters, a font would need instructions for some 6,000 or 7,000 characters.

Font Fallback

One benefit of Unicode is the ability to represent many languages and scripts in a single string. This is also a problem, since very few fonts support more than a couple of scripts. Indeed, it's very difficult to do a good job of making fonts with glyphs for different scripts such that all conform to one set of vertical metrics. To overcome this limitation, and in order to accommodate complex scripts, Uniscribe can detect if the currently selected font doesn't support a particular script and can automatically switch–or fall back–to a predefined font that has appropriate glyphs for the desired script. All these operations are transparent to the user.

Here is an example to better understand this mechanism. A user running Windows XP selects the Tahoma font to enter some text first in English, next in Hebrew, and then in Telugu. Since Tahoma is an OpenType font, it provides support for Latin and Hebrew scripts, but does not contain any Telugu glyphs. Uniscribe detects this lack of font support and automatically renders the Telugu script by using its fallback font, which is Gautami.

Although font fallback can accommodate Indic scripts such as Telugu in Windows XP or later Windows versions, no such mechanism existed in Windows 2000. For most of the scripts, the fallback font is set to Microsoft Sans Serif (an OpenType font). For the Indic family of languages, the fallback is set to another appropriate font. Font fallback is internal to Uniscribe, and applications cannot add new fallback fonts or modify existing ones.

Font Linking

Unlike font fallback, in which the selected font is internally replaced by a predefined font, font linking lets you link one or more fonts (called "linked fonts") to another font (called the "base font"). Once you link fonts, you can use the base font to display code points that do not exist in the base font, but that do exist in one of the linked fonts. For example, linking a Hangul font and a Japanese font to the Tahoma font allows you to display both Korean and Japanese characters in the Tahoma font.

Note: Font linking can only add glyphs to a base font; you cannot override or replace glyphs in the base font.

If font linking is enabled on your device, you can examine the registry by enumerating the values under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink to determine the mappings of linked fonts to base fonts. You can add links by using Regedit to create additional entries. Once you have located this registry key, highlight the font face name of the base font you want to link to, and then, from the Edit menu, click Modify. On a new line in the "Value data" field of the Edit Multi-String dialog box, enter the path and file name of the font to link, followed by the face name of that font. Use a comma to separate the font file name and the font face name. Figure 1 below demonstrates how to enter the value for font linking.

Figure 1: RegEdit's "Edit Multi-String" dialog box


Caution: Editing the font link entries in the registry is possible, but it is NOT supported by Microsoft. A wrong font link entry can leave the system unstable and degrade machine performance.

Note: After using Regedit to add a font link, you must log off Windows and log back on for the new font link to take effect.

Important: Font linking is a mechanism enabled within GDI and takes priority over font fallback.
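
The lookup order described in this section (base font first, then linked fonts, then the fallback font) can be modeled in a few lines of portable C. This is only a sketch of the behavior, not GDI or Uniscribe code: the FontModel type and the resolve_font function are hypothetical names invented for illustration, each font is reduced to a single code point range, and a single fallback font stands in for the per-script fallback that Uniscribe actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a font: a face name plus one covered code point range. */
typedef struct {
    const char   *face;
    unsigned long first;   /* first covered code point */
    unsigned long last;    /* last covered code point  */
} FontModel;

/* Returns nonzero if the modeled font has a glyph for code point cp. */
int covers(const FontModel *font, unsigned long cp)
{
    return cp >= font->first && cp <= font->last;
}

/* Picks the font that would render cp:
   1. the base font, if it has the glyph (so links can never override it);
   2. otherwise the first linked font that has the glyph;
   3. otherwise the fallback font, if it has the glyph;
   4. otherwise NULL, meaning the default glyph would be shown. */
const FontModel *resolve_font(const FontModel *base,
                              const FontModel *linked, size_t linked_count,
                              const FontModel *fallback,
                              unsigned long cp)
{
    size_t i;

    assert(base != NULL);
    if (covers(base, cp))
        return base;
    for (i = 0; i < linked_count; i++) {
        if (covers(&linked[i], cp))
            return &linked[i];
    }
    if (fallback != NULL && covers(fallback, cp))
        return fallback;
    return NULL;
}
```

Because the base font is always consulted first, the model also reflects the earlier note that a linked font can add glyphs but never replace one that the base font already provides.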

With font fallback and font linking, the font size of the newly selected font will be the same as that of the original font. For example, if an 8–point Tahoma font was selected to type English and now the user enters some Japanese text, an 8–point MS UI Gothic font will be automatically selected. The 8–point font size might not be the best choice for some scripts, since it can make them hard to read.

Both font fallback and font linking contain logic to estimate an appropriate font size, but both mechanisms have to use metrics exposed by the font that might or might not actually match the way the font appears. Consider the difference in the height of English letters among 8–point Microsoft Sans Serif, 8–point Traditional Arabic, and 8–point Angsana New:

Even though all of these are supposedly 8-point fonts, the actual size of the English letters varies widely. Font fallback and font linking are no substitutes for choosing the right font in the first place. Rather, these mechanisms are simply a means of sparing the user from having to select a font manually; additionally, they prevent UI text from being displayed as a default glyph.

Even so, when font linking occurs, GDI will attempt to scale the linked font so that its glyphs appear to match the size of the glyphs from the base font. Windows XP used an algorithm that operates in terms of various font metrics. In Windows Vista, this algorithm was found not to give satisfactory results in all scenarios; in particular, it did not give good results when linking to new East Asian fonts that have no embedded bitmaps. To resolve this problem, an alternate scaling mechanism was introduced: explicit scaling factors for particular linked fonts can be specified in font linking registry entries. Scaling factors are specified as a pair of positive integers. For instance, the value

MEIRYO.TTC,Meiryo,128,85

indicates that the scaling algorithm should apply the scaling factors 128 and 85 whenever the given base font is linked to the Meiryo font.

Note that GDI+ is not able to parse these scaling factors. Thus, references to fonts with scaling factors are repeated without these scaling factors. In GDI+, the first reference, with the scaling factors, will appear to be to an unrecognized font and will be ignored. In GDI, the second reference will be treated as redundant and ignored.
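
To make the entry format concrete, the following portable C sketch parses one line of a font linking value, with or without the optional pair of scaling factors. It is an illustration only: the FontLinkEntry type and the parse_font_link function are hypothetical names, the 63-character field limit is an arbitrary choice, and the code makes no claim about how GDI itself parses these entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed line of a font linking value. */
typedef struct {
    char file[64];      /* font file name, e.g. "MEIRYO.TTC"        */
    char face[64];      /* font face name, e.g. "Meiryo"            */
    int  has_scaling;   /* nonzero if both scaling factors were set */
    int  scale1;        /* first scaling factor                     */
    int  scale2;        /* second scaling factor                    */
} FontLinkEntry;

/* Parses "file,face" or "file,face,n1,n2".
   Returns 1 on success and 0 if the line is malformed. */
int parse_font_link(const char *line, FontLinkEntry *out)
{
    int fields;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    fields = sscanf(line, "%63[^,],%63[^,],%d,%d",
                    out->file, out->face, &out->scale1, &out->scale2);

    if (fields == 2)
        return 1;                       /* plain link, no scaling factors */
    if (fields == 4 && out->scale1 > 0 && out->scale2 > 0) {
        out->has_scaling = 1;           /* both factors must be positive  */
        return 1;
    }
    return 0;                           /* missing face or only one factor */
}
```

A parser written this way accepts both forms of an entry, so it can read a value in which the same font appears once with scaling factors and once without them.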

Font Substitution

Font substitution is used by GDI to translate a request for one face name into a request for another face name. Substitutions are also sensitive to charsets, so that a request for Arial with Western charset (0) can be translated into a request for Arial with Greek charset (161), for instance.

Font substitution is set with the registry entries under the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontSubstitutes. For instance, the registry entry “Helvetica” with the value “Arial” substitutes the Arial font for the Helvetica font, and the registry entry “Arial,0” with the value “Arial,161” substitutes Arial with GREEK_CHARSET for Arial with ANSI_CHARSET.
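
The two kinds of entries can be modeled with a small lookup in portable C. This is a simplified sketch and not the GDI implementation: FaceSpec, SubstEntry, parse_face_spec, and substitute are hypothetical names, ANY_CHARSET is an invented marker for an entry that names no charset, and face names are compared case-sensitively for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ANY_CHARSET (-1)   /* hypothetical marker: the entry names no charset */

/* A face name with an optional charset, as in "Arial" or "Arial,161". */
typedef struct {
    char face[32];
    int  charset;
} FaceSpec;

/* One substitution entry: registry entry name and its value. */
typedef struct {
    const char *from;
    const char *to;
} SubstEntry;

/* Parses "Face" or "Face,charset". Returns 1 on success, 0 if text is empty. */
int parse_face_spec(const char *text, FaceSpec *out)
{
    int fields;

    assert(text != NULL && out != NULL);
    out->face[0] = '\0';
    out->charset = ANY_CHARSET;     /* stays unchanged if no charset is given */
    fields = sscanf(text, "%31[^,],%d", out->face, &out->charset);
    return fields >= 1;
}

/* Rewrites *req in place using the first matching entry.
   Returns 1 if a substitution was applied, 0 otherwise. */
int substitute(const SubstEntry *table, size_t count, FaceSpec *req)
{
    size_t i;

    for (i = 0; i < count; i++) {
        FaceSpec from, to;

        if (!parse_face_spec(table[i].from, &from) ||
            !parse_face_spec(table[i].to, &to))
            continue;                               /* skip malformed entries */
        if (strcmp(from.face, req->face) != 0)
            continue;                               /* different face name    */
        if (from.charset != ANY_CHARSET && from.charset != req->charset)
            continue;                               /* charset-specific entry */

        strcpy(req->face, to.face);
        if (to.charset != ANY_CHARSET)
            req->charset = to.charset;
        return 1;
    }
    return 0;
}
```

With a table holding the two entries from the text, a request for Helvetica becomes a request for Arial with its charset unchanged, and a request for Arial with charset 0 becomes a request for Arial with charset 161.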


Solution & Code Samples

Font Selection in Win32

Fonts that have glyphs for all supported scripts and for all Unicode characters are very rare. Arial Unicode MS is one of the most complete fonts for Windows, and yet it does not contain all the glyphs associated with all Unicode code points. In order for your application to take advantage of the font support described previously, you should, as a general rule, adhere to the following guidelines:

  • Do not hard–code font face names. Font face names might have variations to specify the default character set (charset). For example, the Arial font face name on an Arabic machine is known as "Arial, 178."
  • Do not assume a given font is installed. The user might delete or uninstall fonts (even the font that comes with the Windows system!).
  • Do not assume a selected font supports the desired script. For example, it is impossible to use Miriam (a Hebrew font) to represent hiragana script.

Along the same lines, it's strongly suggested that you do not hard–code the font size that you use, and that you make this variable customizable according to the script to be displayed; since some scripts are more complicated than others, they need more pixels to be displayed properly. For example, most English characters can be displayed on a 5x7 grid, but Japanese characters need a grid of at least 16x16 to be seen clearly. Chinese characters, on the other hand, need a 24x24 grid. Thai characters only need 8 pixels for width, but they need at least 22 pixels for height. Thus it is easy to understand why some characters in a small font size might not be legible. See Figure 2 below:

Figure 2: Comparison of English and Japanese characters in varying font sizes.
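
As a rough illustration of why the font size should depend on the script, the sketch below converts the minimum grid heights quoted above into point sizes for a given display resolution, using the fact that one point is 1/72 of an inch. The constant names and the min_point_size function are hypothetical, and treating the grid height as the full font height is a simplifying assumption; real fonts distribute their height differently.

```c
#include <assert.h>

/* Minimum legible heights in pixels, taken from the grid sizes quoted above. */
enum {
    MIN_PX_ENGLISH  = 7,    /* 5x7 grid   */
    MIN_PX_JAPANESE = 16,   /* 16x16 grid */
    MIN_PX_THAI     = 22,   /* 8x22 grid  */
    MIN_PX_CHINESE  = 24    /* 24x24 grid */
};

/* Returns the smallest whole point size whose height is at least
   min_pixels on a display with the given vertical resolution (DPI).
   pixels = points * dpi / 72, so points = ceil(min_pixels * 72 / dpi). */
int min_point_size(int min_pixels, int dpi)
{
    assert(min_pixels > 0 && dpi > 0);
    return (min_pixels * 72 + dpi - 1) / dpi;   /* integer ceiling division */
}
```

At 96 DPI this model gives 6 points for English, 12 points for Japanese, 17 points for Thai, and 18 points for Chinese, which is why a single hard-coded size such as 8 points cannot suit every script.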


Fonts in Dialog Resource Files

System fonts are different from one language version to another (even within the same version of the operating system). Those in charge of translating resource files often do not have enough information or technical background to know how to change the font face name for each target language, whether the change involves replacing the entire name or only modifying it slightly, such as by adding charset information. In the following example, MS Sans Serif, a bitmap font that only contains glyphs for Western European languages, is used in the dialog resources. If the application is localized into Turkish or Japanese, for example, and run on Windows 2000 or earlier versions without changing the font face name, the UI text will be displayed with the default glyph, as empty squares. (See Figure 3.) This happens for three reasons: MS Sans Serif is not an OpenType font that can accommodate the target script, it is not font linked in the system, and Windows 2000 and earlier versions have no font fallback.

 

DLG_NLS DIALOG DISCARDABLE 0, 0, 344, 260
STYLE DS_MODALFRAME | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "NLS APIs"
FONT 8,   "MS Sans Serif"

Figure 3: A property sheet translated into Korean on Windows 2000, where the font is set to MS Sans Serif. The system fails to find appropriate glyphs for this font and ends up displaying the default glyph.



 

Because the desired behavior is to have the UI font of your application follow the desktop (Shell) UI font, and because the default Shell font is different from one localized language of the operating system to another (for example, Microsoft Sans Serif for English, Tahoma for Arabic, and so on), the best practice is to always use the higher–level font face name known as "MS Shell Dlg." MS Shell Dlg is actually not a font. Rather, it is a font face name that gets mapped to the right font depending on the font–substitution settings of the operating system. By setting your default resource font as MS Shell Dlg, you are assured of providing the appropriate font solution, not only on Windows 2000 and Windows XP, but also on all versions of Windows since Windows 95!

Font Selection at Run Time

Hard–coding the font name in your code can have the same result as hard–coding it in dialog resource files. Either action can break your UI. Here again, your font selection should be flexible and context–dependent. Instead of a direct call to the CreateFont or CreateFontIndirect APIs, where the font attributes are hard–coded in a LOGFONT structure, you should use EnumFontFamiliesEx. EnumFontFamiliesEx enumerates all fonts in the system that match the font characteristics specified by the LOGFONT structure–in this case, by character set instead of by font face name.

DEFAULT_CHARSET in LOGFONT

DEFAULT_CHARSET is not a real charset; in reality on Windows 2000 and Windows XP it does two things:

  • It tries to select the named font with the current system character set.
  • If the named font exists but does not support the system character set, it will still select the font with a charset that the font does support.
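
The two rules above can be expressed as a small decision function. The sketch below is a portable C model, not the GDI font mapper: resolve_default_charset is a hypothetical name, the numeric charset values are the ones defined in the Windows headers, and choosing the first charset in the font's list when the system charset is unsupported is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Charset values as defined in the Windows headers. */
enum {
    CS_ANSI     = 0,     /* ANSI_CHARSET     */
    CS_SHIFTJIS = 128,   /* SHIFTJIS_CHARSET */
    CS_GREEK    = 161,   /* GREEK_CHARSET    */
    CS_ARABIC   = 178    /* ARABIC_CHARSET   */
};

/* Models DEFAULT_CHARSET for a named font that exists on the system.
   font_charsets lists the charsets the font supports.
   Returns the charset that would be selected, or -1 if the list is empty. */
int resolve_default_charset(int system_charset,
                            const int *font_charsets, size_t count)
{
    size_t i;

    assert(font_charsets != NULL || count == 0);

    /* Rule 1: prefer the current system charset if the font supports it. */
    for (i = 0; i < count; i++) {
        if (font_charsets[i] == system_charset)
            return system_charset;
    }

    /* Rule 2: otherwise still select the font, with a charset it supports
       (assumption: the first one listed). */
    return count > 0 ? font_charsets[0] : -1;
}
```

For a font whose supported charsets are listed as 0, 161, and 178, a system charset of 178 is selected as is, while a system charset of 128 results in charset 0 being selected.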

DEFAULT_CHARSET should be used when displaying a string of characters encoded with Unicode. In the example that follows, the code identifies the charset corresponding to the currently selected input language and also enumerates a set of compatible fonts. hWnd is the window handle where the font will be used, and hDlg is a dialog box to display the font list.

DWORD       dwCodePage;
HKL         hkl = (HKL) lParam;
LOGFONT     lf;
HDC         hDc;
CHARSETINFO cs;
TCHAR       szLocaleData [BUFFER_SIZE];

// Initialize the LOGFONT to be used. EnumFontFamiliesEx examines only
//    lfCharSet, lfFaceName, and lfPitchAndFamily (which must be 0),
//    so clear the whole structure first.
ZeroMemory (&lf, sizeof (lf));
lf.lfCharSet = DEFAULT_CHARSET;

// This is a workaround for Hindi and Tamil, since they
//    don't have charsets. Mangal and Latha are the
//    fonts for Hindi and Tamil shipping with Windows NT,
//    2000, and XP. A better workaround would be to put
//    these strings in data files that can be updated with new
//    font face names. You would then call
//    EnumFontFamiliesEx once per face name.
if (LOWORD(hkl) == MAKELANGID(LANG_HINDI, SUBLANG_DEFAULT))
   _tcscpy (lf.lfFaceName, TEXT("Mangal"));
else if (LOWORD(hkl) == MAKELANGID(LANG_TAMIL, SUBLANG_DEFAULT))
   _tcscpy (lf.lfFaceName, TEXT("Latha"));
else
{
   // Find out what charset the new keyboard layout wants.
   GetLocaleInfo (LOWORD(hkl), LOCALE_IDEFAULTANSICODEPAGE,
      szLocaleData, 6);
   dwCodePage = _ttol (szLocaleData);
   // With TCI_SRCCODEPAGE, the first argument carries the code page
   //    value itself rather than a pointer to it.
   if (TranslateCharsetInfo ((DWORD *)(DWORD_PTR) dwCodePage, &cs,
      TCI_SRCCODEPAGE))
   {
      lf.lfCharSet = (BYTE) cs.ciCharset;
   }
}
// Get list of fonts that support this charset.
// hDc is needed by EnumFontFamiliesEx.
hDc = GetDC (hWnd);

// Callback uses hDlg.
EnumFontFamiliesEx (hDc, &lf, (FONTENUMPROC)EnumFontProc,
   (LPARAM)hDlg, (DWORD) 0);

ReleaseDC (hWnd, hDc);

In this example, the callback function passed to EnumFontFamiliesEx is as follows:

int CALLBACK EnumFontProc (ENUMLOGFONTEX* lpelfe,
      NEWTEXTMETRICEX* lpntme, int iFontType, LPARAM lParam)
{
   // Room for both names plus " (", ")", and the terminating null.
   TCHAR szFaceName [4+LF_FULLFACESIZE+LF_FACESIZE];
   _stprintf (szFaceName, TEXT("%s (%s)"), lpelfe->elfFullName,
      lpelfe->elfScript);

   // Add string to list box to describe this font.
   SendDlgItemMessage ((HWND)lParam,
      IDC_FONTLIST, LB_ADDSTRING, (WPARAM) 0, (LPARAM) szFaceName);
   return TRUE;   // A nonzero return value continues the enumeration.
}

Each time the callback function is called, it builds a string containing the full font name and the script name, and adds that string to the IDC_FONTLIST list box (a control that is part of the dialog template).

As you can see from the previous code sample, Indic scripts must be handled separately because they have no charset values. Since there is no default ANSI or Windows code page (ACP) value for Indic scripts, none of the Win32 ANSI entry points (the "A" routines) will work with Indic, Georgian, or Armenian text, or with any new scripts for which system support is provided exclusively through Unicode encoding. (See "Unicode Enabled.")

The font resource for many East Asian languages has two names: an English name and a localized name. For Windows 95, Windows 98, and Windows NT 4, the localized name only works on a system locale that matches the language, while the English name works on all other system locales. This can be a problem when calling CreateFont or CreateFontIndirect. The best method is to try one name and, if that fails, try the other. EnumFonts, EnumFontFamilies, and EnumFontFamiliesEx return the English font face name if the system locale does not match the language of the font. On Windows 2000 and later versions, this is no longer a problem because both names will be valid for any locale.

NOTE: In Windows 2000 and Windows XP, you can use both English and localized font face names in CreateFontXXX. However, when you enumerate these fonts using EnumFontXXX, you will only get English font face names if the system locale does not match the font's intended face name language.

Another approach to run–time font selection is to display a font selection common dialog box, from which the user can select the desired font. (See Figure 4.) With the ChooseFont API, you can control the list of fonts that are returned to the user, and you can limit the fonts to a given character set.

Figure 4: A simplified font selection dialog box.



 

The following code example initializes a font for the IDC_EDITWIN edit control. Then upon the user's selection, the code adjusts the font used in the edit control:

static CHOOSEFONT      cf;
static LOGFONT         lf;
HFONT                  hOldFont;
// Fill out our CHOOSEFONT and LOGFONT structures
// with default and predefined values.
InitializeFont(hDlg, 24, &cf, &lf);
// Create this font.
hEditFont = CreateFontIndirect(&lf);

// Set the font in our edit control.
SendDlgItemMessage(hDlg, IDC_EDITWIN, WM_SETFONT,
  (WPARAM) hEditFont, MAKELPARAM(TRUE, 0));
// Upon user's request, create a font selection common
// dialog box and use the new font.
if (ChooseFont(&cf))
{
   // ChooseFont stored the user's selection in lf (through cf.lpLogFont).
   hOldFont  = hEditFont;
   hEditFont = CreateFontIndirect(&lf);
   SendDlgItemMessage (hDlg, IDC_EDITWIN, WM_SETFONT,
     (WPARAM) hEditFont, MAKELPARAM(TRUE, 0));
   // Release the previous font now that the control no longer uses it.
   DeleteObject(hOldFont);
}

Where the InitializeFont function looks like the following:

void InitializeFont(HWND hWnd, LONG lHeight, LPCHOOSEFONT lpCf, LPLOGFONT lpLf)

{
   lpCf->lStructSize        = sizeof(CHOOSEFONT);
   lpCf->hwndOwner          = hWnd;
   lpCf->hDC                = NULL;
   lpCf->lpLogFont          = lpLf;
   lpCf->iPointSize         = 10;
   lpCf->Flags              = CF_SCREENFONTS|CF_INITTOLOGFONTSTRUCT|CF_NOSIZESEL;
   lpCf->rgbColors          = RGB(0,0,0);
   lpCf->lCustData          = 0;
   lpCf->lpfnHook           = NULL;
   lpCf->lpTemplateName     = NULL;
   lpCf->hInstance          = g_hInst;
   lpCf->lpszStyle          = NULL;
   lpCf->nFontType          = SIMULATED_FONTTYPE;
   lpCf->nSizeMin           = 0;
   lpCf->nSizeMax           = 0;
   lpLf->lfHeight           = lHeight;
   lpLf->lfWidth            = 0;
   lpLf->lfEscapement       = 0;
   lpLf->lfOrientation      = 0;
   lpLf->lfWeight           = FW_DONTCARE;
   lpLf->lfItalic           = FALSE;
   lpLf->lfUnderline        = FALSE;
   lpLf->lfStrikeOut        = FALSE;
   lpLf->lfCharSet          = DEFAULT_CHARSET;
   lpLf->lfOutPrecision     = OUT_DEFAULT_PRECIS;
   lpLf->lfClipPrecision    = CLIP_DEFAULT_PRECIS;
   lpLf->lfQuality          = DEFAULT_QUALITY;
   lpLf->lfPitchAndFamily   = DEFAULT_PITCH | FF_DONTCARE;
   _tcscpy(lpLf->lfFaceName, TEXT("MS Shell Dlg"));
}

Font Manipulation in Web Pages

When creating Web pages, avoid placing font attribute values into inline styles, as shown below:

<SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> Hello </SPAN>

This approach makes font customization per language or script a difficult task, since a technical localizer would need to scan the entire Web content for all instances of the font definition one language at a time. If the font didn't have glyphs to handle the new language, changes would have to be made on a per–language basis.

A better way to handle font attributes is to use cascading style sheets (CSS) in which corresponding font attributes and styles are defined. In the following example, the CSS file creates a style class called "myStyle," which contains the font family and font size. You can allow these attributes to change depending on the language into which you are rendering your content. For the HTML file, all you need to do for the Web page is "span" whatever text you want formatted with the myStyle class.

 

<STYLE>
.myStyle {font-size: 10pt; font-family: Arial;}
</STYLE>

<SPAN class=myStyle> Hello </SPAN>

 

Now adapting the font to be used per language or per script becomes a much easier job, since it requires a single change in one specific file. You can extend this notion and define a specific style for all the scripts that you want to render in your multilingual Web site. In the following example, the CSS file defines an appropriate font style for each script that will be used thereafter in the inline text.

The CSS file would look like this:

 

.clsDescriptor{COLOR: #bdbddd;FONT: 0.7em/1em Verdana;}
.clsEnglish {FONT: 1.1em/1.3em "Palatino Linotype";}
.clsTitle {COLOR: darkred; FONT: 1.4em/1.6em "Palatino Linotype";}
.clsArabic {FONT: 1.1em/1.3em "Arabic Transparent";}
.clsArmenian {FONT: 1.3em/1.3em Sylfaen;}
.clsHindi {FONT: 1.1em/1.3em Mangal;}

 

And the HTML file would look like this:

The output would look like this:

Figure 5: Output of code in which an appropriate font style has been defined for each script.


Font Manipulation in .NET Framework

See "Font Manipulation in Web Pages" above
