Globalization Step-by-Step

String Handling

Overview and Description

Once you have chosen a resource store, you can move on to the next area that causes a lot of localizability nightmares: the way strings are created for display in the UI or for placement in persistent storage. Many a translator has spent sleepless nights trying to decipher what a creative developer is attempting to convey to the user. There are at least as many ways to create strings of text as there are developers, so it would be impossible to cover all of them here. Instead, this section focuses on five major areas to watch out for in relation to strings.

Avoid Run-Time Composite Strings

When creating strings for output, well-intentioned programmers use a coding trick that has been passed down from generation to generation of developers as a good coding practice. That trick is using text strings with variables that get composed at the application's run time. Thus you see code as shown here that is written to produce the following English text:

"Are you sure you want to delete the file?";
"Are you sure you want to delete the directory?";
"Are you sure you want to delete the subdirectory?";

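// Anti-pattern: the sentence is assembled from fragments at run time.
char szString[] = "Are you sure you want to delete the ";
char szFinalString[cbMaxSz];
snprintf(szFinalString, cbMaxSz, "%s%s?", szString, szDelObject);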

Note that szDelObject is a variable that can contain "file," "directory," or "subdirectory." This practice works well for the programmer, but all the translator sees in the resource file is "Are you sure you want to delete the "; the translator has no idea what is being deleted. The value of the variable often determines whether a different syntax or definite article needs to be used.

The reason is that in some languages nouns have gender. For instance, in German nouns can be masculine, feminine, or neuter. In Romance languages such as French or Portuguese, nouns are either masculine or feminine. Depending on the gender of the noun in such languages, the definite article that precedes the noun will vary. For example, French uses "le" in front of masculine nouns (such as "le fils" for "the son") and "la" in front of feminine nouns (such as "la fille" for "the daughter"). Thus for the previous code, a French translator would not know whether to use "le" or "la," because the actual value of the variable (in this case, the noun) is not specified.

The best way to avoid this ambiguity is to write out each sentence completely, if possible. This way the translator knows the context of the sentence and can translate it correctly. Thus in your resource store you would have:

Del_File = "Are you sure you want to delete the file?";
Del_Dir = "Are you sure you want to delete the directory?";
Del_SubD = "Are you sure you want to delete the subdirectory?";
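
As a minimal sketch of this approach in C, the table below holds one complete sentence per case, so that each can be translated independently. The DelObject enumeration, the kDeletePrompts table, and the delete_prompt function are hypothetical names introduced for illustration; a real application would load the sentences from its resource store rather than compile them in.

```c
#include <stddef.h>

/* Hypothetical identifiers for the three kinds of object that can be deleted. */
typedef enum { DEL_FILE, DEL_DIR, DEL_SUBD, DEL_COUNT } DelObject;

/* One complete, independently translatable sentence per case.
   In a real product these would come from the resource store. */
static const char *const kDeletePrompts[DEL_COUNT] = {
    "Are you sure you want to delete the file?",
    "Are you sure you want to delete the directory?",
    "Are you sure you want to delete the subdirectory?"
};

/* Returns the full prompt for the given object kind, or NULL if out of range. */
const char *delete_prompt(DelObject kind)
{
    if ((int)kind < 0 || (int)kind >= (int)DEL_COUNT)
        return NULL;
    return kDeletePrompts[kind];
}
```

The caller selects a whole sentence by identifier and never concatenates fragments, so the translator sees every sentence in full.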
 

Another potential shortcut involves declaring a single string, such as "none," "blue," or "first," and displaying it in a number of different contexts: on a menu, in a dialog box, and perhaps in several messages. The problem with using "all-purpose" strings is that in European languages, adjectives (and some nouns) have anywhere from 4 to 14 different forms (for example, masculine, feminine, and neuter singular; and masculine, feminine, and neuter plural) that must match the nouns they modify. For instance, in Spanish the word "first" can be conveyed by the words "primero," "primera," "primer," "primeros," and "primeras," depending on the gender and number of the noun or the particular sentence structure. A single string displayed in different contexts will be correct in gender and number in some cases but incorrect in others. The user will consider such a translation amateurish. Again, the way to handle this problem is to store a separate string for each context in which the word is used, such as the following strings:

Menu_open = "open"
Dialog_open = "open"
Button_open = "open"
 

 

When Variables Are Necessary, Use Unique Names

Although translators would love to have strings with no variables as well as strings that are repeated for usage in different contexts, there are times when it is necessary to use variables. When there is only one variable in a string, like in the example that involved deleting a file, directory, or subdirectory, it is best to use a function that allows you to at least put a placeholder in the text. Thus the earlier example would become:

Del_File = "Are you sure you want to delete the %s?";

To help the translators, document all the values that the variable can be. This way they will know what goes in the variable and will be able to translate it using the correct syntax and grammar for the target language.

Because word order can vary significantly from language to language, using identical placeholders can cause problems if you have a string that has two or more variables. For example, suppose you have the following string:

 Memory_Error = "Not enough memory to %s the file %s."; 

The first %s is an operation such as open, close, or save, and the second %s is the name of the file you are trying to open, close, or save. Thus one of the possible sentences generated with this string might be:

"Not enough memory to open the file FileName1."

Suppose you plan to translate this text into Swedish and Finnish. The Swedish translation causes no problem because its translation is:

"Det finns inte tillräckligt med minne för att öppna filen FileName1." 

Notice that the command "open" ("öppna") comes before the file name. Now look at the Finnish translation:

"Liian vähän muistia tiedoston FileName1 avaamiseen." 

Notice here that the command "open" ("avaamiseen") comes after the file name. In most programming languages, however, if you used %s for both variables (as in the earlier example), the string your program creates would actually be:

"Liian vähän muistia tiedoston avaamiseen FileName1." 

This is because the variables in the original call that outputs this string were passed in the order of command first, file name second. Therefore, it would be better if you had some way to show variable order, as in this string in English:

"Not enough memory to %1 the file %2." 

The Swedish and Finnish translations, respectively, would be:

"Det finns inte tillräckligt med minne för att %1 filen %2."
"Liian vähän muistia tiedoston %2 %1."

The following shows the Win32 and .NET Framework mechanisms created to handle variable order.
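
To make the idea concrete before looking at those mechanisms, here is a minimal portable C sketch of numbered-placeholder substitution. It is not how FormatMessage or String.Format is implemented; format_positional and its parameters are hypothetical names, and the sketch handles only %1 through %9 with string arguments.

```c
#include <stddef.h>
#include <string.h>

/* Expands %1..%9 in fmt using args[0..nargs-1]; "%%" yields a literal '%'.
   Output is truncated to fit and is always NUL-terminated.
   Returns the number of bytes written, excluding the terminator. */
size_t format_positional(char *out, size_t out_size, const char *fmt,
                         const char *const args[], size_t nargs)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    while (*fmt != '\0' && n + 1 < out_size) {
        if (fmt[0] == '%' && fmt[1] >= '1' && fmt[1] <= '9') {
            size_t idx = (size_t)(fmt[1] - '1');
            const char *s = (idx < nargs) ? args[idx] : "";
            size_t len = strlen(s);
            if (len > out_size - 1 - n)
                len = out_size - 1 - n;   /* truncate to the space left */
            memcpy(out + n, s, len);
            n += len;
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == '%') {
            out[n++] = '%';
            fmt += 2;
        } else {
            out[n++] = *fmt++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the arguments { "avaamiseen", "FileName1" } and the Finnish template "Liian vähän muistia tiedoston %2 %1.", the file name is emitted before the command even though the argument order in the call is unchanged.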

Win32 -- FormatMessage

Win32 supports two resource types for storing strings: string tables and message tables. (For more information on message tables, see the section on Multilanguage User Interface "MUI".) String tables make sense for short strings and for strings containing only one replacement parameter; message tables are more convenient for alert and error messages that contain more than one replacement parameter. (Message tables support up to 99 parameters.) The FormatMessage API will substitute variables according to each placemarker's numeric label and not according to the label's position in the string. Localizers can freely change a string's word order and FormatMessage will still return correct results. The file format of the message table is not complicated; you can create message tables with a simple text editor. The following code appears in a message table that contains English and German translations of the same strings. The German translation of IDS_OTHERIMAGE reverses the positions of the replacement parameters.

;// Sample.mc
LanguageNames=(German=0x407:MSG00407)

MessageId=1
SymbolicName=IDS_NOFILE
Language=English
Cannot open file %1.
.
Language=German
Die Datei %1 kann nicht geöffnet werden.
.

MessageId=2
SymbolicName=IDS_OTHERIMAGE
Language=English
%1 is a %2 image.
.
Language=German
%2-Abbild ein %1 ist.
.

The syntax for formatting the first message is as follows:

// lpBuf must be large enough to hold the formatted message!
DWORD langID = MAKELANGID(LANG_GERMAN, SUBLANG_GERMAN);
HMODULE hModule = LoadLibrary(...);  // module that contains the message table
TCHAR lpBuf[60];
DWORD_PTR lppArgs[10];               // fill before the call: lppArgs[0] supplies %1, lppArgs[1] supplies %2, ...
DWORD len = FormatMessage(
    FORMAT_MESSAGE_FROM_HMODULE | FORMAT_MESSAGE_ARGUMENT_ARRAY,
    hModule, idMsg, langID, lpBuf, sizeof(lpBuf)/sizeof(TCHAR),
    (va_list *)lppArgs);             // len is 0 if the call fails

You can use FormatMessage with string tables as well as with message tables, but it is more efficient to use the function with message tables. FormatMessage can retrieve strings from message tables directly, but it cannot access string tables. To format a string from a string table, you would first have to retrieve it with the LoadString function and then pass it in a buffer to FormatMessage. Not only is this an extra step, it's a convoluted extra step.

FormatMessage is particularly useful for a number of reasons. It allows you to specify the language of the string you want to retrieve from a message table, whereas LoadString can retrieve only resources associated with the language of the current thread's locale. In conjunction with the API call GetLastError, you can also use it to format error messages returned by the system. An example of this technique is given in the following ReportError routine.

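void ReportError()
{
    LPTSTR lpMessage = NULL;
    DWORD dwErrCode = GetLastError();
    DWORD len = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
        FORMAT_MESSAGE_FROM_SYSTEM |
        FORMAT_MESSAGE_IGNORE_INSERTS, // system text may contain inserts we cannot supply
        NULL, // no source buffer needed
        dwErrCode, // error code for this message
        0, // default language ID
        (LPTSTR)&lpMessage, // allocated by fcn
        0, // minimum size of buffer
        NULL); // no inserts

    if (len == 0)
        return; // FormatMessage failed; nothing to display

    MessageBox(NULL, lpMessage, TEXT("File Error"),
        MB_ICONSTOP | MB_OK);
    LocalFree(lpMessage); // release the buffer that FormatMessage allocated
}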

.NET Framework -- String.Format

The .NET Framework has a counterpart to Win32's FormatMessage: the String.Format method. It allows you to number your variable placeholders so that localizers can rearrange them depending on the syntax of the target language, without having to rearrange the variables themselves in the method call. The difference between FormatMessage and String.Format is notational: instead of FormatMessage's percent sign (%) followed by a number that starts at 1, String.Format encloses a zero-based number in braces, "{ }". So the earlier example would be changed from the FormatMessage syntax of:

"Not enough memory to %1 the file %2."

to the String.Format syntax of:

"Not enough memory to {0} the file {1}."

The call in your program would look like this:

String.Format("Not enough memory to {0} the file {1}.", sFunc,
sFile);

The format parameter contains zero or more format specifications of the form { N [, M ][: formatString ]}, where:

  • N is a zero-based integer indicating the argument to be formatted.
  • M is an optional integer indicating the width of the region to contain the formatted value, padded with spaces. If the sign of M is negative, the formatted value is left-justified in the region; if the sign of M is positive, the value is right-justified.
  • formatString is an optional string of formatting codes.

If the value of format is

"Brad's dog has {0,-8:G} fleas."

and arg0 is a 16-bit integer with the value 42, then the return value will be as follows (in this example, underscores represent padding spaces):

"Brad's dog has 42______ fleas."

The .NET Framework also supports argument placeholders in its Console.Write method. An application that uses the standard input, output, and error streams can therefore use the same syntax as the String.Format method to make its console output more localizable, as shown here:

Console.Write("Not enough memory to {0} the file {1}.", sFunc,
sFile);

Do Not Compound Several Variables

As stated before, there are times when variables are necessary because you might not know what needs to go in the string until run time. Dates, times, temperatures, and numbers of sales are just a few examples of such variables. Besides giving these variables unique names, make sure that you don't compound several variables together. For example, a translator ran across the following during a localization job:

"%d:%d%s on %s, %s %d, %d" 

Needless to say, the translator did not know how to translate the word "on," since it was ambiguous what the word meant or referred to in this particular context. Once the programmer was contacted, the localizer learned what the variables stood for:

1st %d: Hour in 12-hour format (01-12)
2nd %d: Minute, as decimal number (00-59)
1st %s: Current locale's A.M./P.M. indicator for 12-hour clock
2nd %s: Full name for the day of the week
3rd %s: Full name for the month
3rd %d: Day of month, as decimal number (01-31)
4th %d: Year with century, as decimal number
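
A string like this can often be avoided entirely by keeping the moment in a language-neutral structure and letting a locale-aware library routine produce the text. The sketch below uses the standard C library's strftime with the %c conversion as a stand-in for a platform globalization service; format_timestamp is a hypothetical helper name.

```c
#include <stddef.h>
#include <time.h>

/* Formats a language-neutral struct tm using the current locale's own
   date-and-time representation ("%c"), so no connecting word and no
   field order is hard-coded in a translatable string.
   Returns the number of bytes written, or 0 if the buffer is too small. */
size_t format_timestamp(char *out, size_t out_size, const struct tm *when)
{
    return strftime(out, out_size, "%c", when);
}
```

By default a C program runs in the "C" locale; calling setlocale(LC_TIME, "") at startup asks the library to use the user's locale instead.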
 

To avoid confusion, the programmer should have used unique names. Even better, the programmer should have stored the data in language-neutral data variables (such as a time structure) and then used one of the Windows globalization services, such as National Language Support (NLS) or the .NET Framework, to format it. (For more information on these globalization services, see "Locale Model.") Remember that even though you use unique variable names, you still need to leave enough information so the translators know what you are trying to say. The following string, once again, not only lacks unique names for the variables, but also compounds these variables together:

"%d %1 has %2 internal %3." 

Keep Sentences in a Single String

You have seen that good localizability practices include minimizing the use of variables, providing unique names for variables, and supplying enough information for the translators to understand the context of what needs to be translated. It is also important to write things out rather than compounding variables together in a string, as in the two previous examples.

Additionally, it is important to keep sentences in a single string. When a sentence is broken up into several strings, the strings do not necessarily appear consecutively in the localizer's string table, and it is very time-consuming to piece them together to form a correct sentence. Nor is it possible to merely translate word by word using the same syntax as the original language, since sentence structures differ from language to language. Sentences broken up into several strings, coupled with linguistic differences among languages, make it hard for automated translation tools to create one-to-one glossaries without errors.

For example, take the following English sentence that has been broken up into three strings:

"When this box is checked, Windows NT does not"
"automatically display the user name of the last person"
"to log on in the Authentication dialog box." 

To accommodate the syntactic complexity and additional length of the localized version, the German translation uses four strings:

"Wenn dieses Kontrollkästchen aktiviert ist, zeigt"
"Windows NT nicht automatisch den Namen des"
"Benutzers an, der sich zuletzt in dem Dialogfeld"
""Authentifizierung" angemeldet hat." 

Translated back into English, this sentence literally means:

"When this controlbox checked is, -plays"
"Windows NT not automatically the name of"
"the user -dis, who him/herself last in the dialog box"
""Authentication" logged has." 

Keeping sentences in single strings allows automated translation tools to help your translators be more precise and more productive, which in turn saves you money.

Watch Your String Buffer Sizes

Length restrictions for strings are potential bugs that can cause major problems. For example, a product build was broken because of the German translation for the following string:

"Press Ctrl+Alt+Del to restart." 

When translated into German, this string almost doubled in size to:

"Drücken Sie Strg+Alt+Entf, um den Computer neu zu starten." 

The increased string size caused the buffer to overflow, which in turn crashed the system. The message was supposed to go into a boot sector of a certain byte size, and there was no space for a longer message. The only thing that can be done for this type of problem is to make sure that you communicate to localizers how much room they have for their translation.
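
One way to catch such an overflow before it ships is to check every translation against the byte budget at build or test time. The following is a minimal sketch, assuming a hypothetical fits_in_budget helper and an illustrative 40-byte limit (the actual size of the boot sector message area is not given here).

```c
#include <stddef.h>
#include <string.h>

/* Illustrative limit only; the real budget depends on the target area. */
#define BOOT_MSG_MAX_BYTES 40

/* Returns 1 if text, including its terminating NUL, fits in max_bytes bytes. */
int fits_in_budget(const char *text, size_t max_bytes)
{
    return strlen(text) + 1 <= max_bytes;
}
```

The check counts bytes, not characters, because a translated character may occupy more than one byte in the target encoding.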

Along the same lines, buffer overruns are a notorious source of bugs even in nonlocalized software, and even when there is no physical size restriction. Localizing strings tends to reveal these problems. They can be eliminated if buffers are allocated dynamically or are sized for the longest string they might have to hold. Estimating size requirements will be discussed in the next section, which will explain how an optimal UI design makes localization easier.
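
As a minimal sketch of the dynamic approach in standard C (C99 or later), the helper below measures the formatted length first and then allocates exactly that much. The name format_alloc is hypothetical, and the sketch assumes the template comes from a trusted resource and contains exactly one %s.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats a template containing a single %s into a newly allocated buffer
   that is exactly large enough, whatever the length of the translation.
   The caller must free() the result.
   Returns NULL on formatting or allocation failure. */
char *format_alloc(const char *tmpl, const char *arg)
{
    int needed = snprintf(NULL, 0, tmpl, arg);   /* measure only */
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);      /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)needed + 1, tmpl, arg);
    return buf;
}
```

Because the buffer is sized from the translated template at run time, a longer translation cannot overflow it.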
