Capitalization, Uppercasing, and Lowercasing
Overview and Description
When creating a locale-aware application, you need to account for linguistic nuances. These nuances might seem trivial, but they can have a large impact on application design and functionality. For example, Windows allows you to convert characters into either uppercase or lowercase equivalents. Some applications use this feature to automatically convert the first letter of every sentence into uppercase or to assume that certain types of words should always be capitalized. In Russian, however, names of the days of the week are never capitalized: capitalizing the word for "Wednesday" changes the meaning to "environment," and capitalizing the word for "Sunday" changes the meaning to "resurrection."
In the past, as localized products were developed, language-sensitive issues such as casing were sometimes handled with what were thought of as well-designed, intelligent algorithms. For example, an uppercasing macro that relies on the code-point numbers of ASCII characters and the linear relationship between uppercase characters (A = 0x41) and lowercase characters (a = 0x61) can be written as:
#define ToUpper(ch) ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')
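As a rough illustration, the sketch below wraps the macro in a helper and feeds it a few byte values. The helper names `naive_upper` and `show_naive_upper` are hypothetical, and the non-ASCII samples assume an ISO 8859-1 (Latin-1) interpretation of the bytes.

```c
#include <stdio.h>

/* The macro from the text, reproduced unchanged. */
#define ToUpper(ch) ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')

/* Hypothetical helper: applies the macro to a single byte value (0..255). */
int naive_upper(int ch)
{
    return ToUpper(ch);
}

/* Prints the input and output code points for a few sample bytes. */
void show_naive_upper(void)
{
    /* '{' is ASCII punctuation; 0xDF and 0xFF are ß and ÿ in ISO 8859-1. */
    static const int samples[] = { 'a', 'z', '{', 0xDF, 0xFF };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        printf("0x%02X -> 0x%02X\n",
               (unsigned)samples[i], (unsigned)naive_upper(samples[i]));
}
```

Only the two ASCII letters come out right. The macro subtracts 0x20 from anything above 'Z', so '{' (0x7B) becomes '[' (0x5B), ß (0xDF) becomes ¿ (0xBF), and ÿ (0xFF) becomes ß (0xDF).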
This English-centric approach breaks down when uppercasing non-Latin scripts or languages with accented characters, where character mapping doesn't follow the assumed relationship between lowercase and uppercase characters. There are several other reasons why algorithmic solutions for case folding do not cover all occurrences.
First, some languages do not have a one-to-one mapping between their uppercase and lowercase characters. For instance, the uppercase equivalent of the German ß is "SS." Second, some characters have different mappings depending upon the language in which they are used. For example, the lowercase "i" in English maps to a dotless uppercase letter: "I." However, in Turkish the lowercase "i" maps to a dotted uppercase letter: "İ." Finally, most non-Latin scripts do not even use the concept of lowercase and uppercase, as in the case of Chinese, Japanese, and Korean; Arabic, Farsi, and Hebrew; as well as Thai. For example, since Farsi has no notion of uppercasing, applying the English-centric macro to a Farsi string produces output composed of random and unsupported glyphs.
Figure: The English-centric uppercasing macro applied to an English string and to a Farsi string, where the notion of casing does not exist.
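The three cases described above (one-to-many mappings, language-dependent mappings, and caseless scripts) can be sketched with a small rule-based function. This is an illustration only, not a real casing table: the function name `upper_code_point` and its `turkish` flag are assumptions made for this example, and it handles just ASCII letters, ß (U+00DF), and the Turkish dotted i.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: uppercases one Unicode code point into out[] and
 * returns how many code points were written (1 or 2).
 * `turkish` is a hypothetical flag that selects the Turkish rule for 'i'.
 */
size_t upper_code_point(uint32_t cp, int turkish, uint32_t out[2])
{
    if (cp == 0x00DF) {            /* German ß uppercases to two letters: "SS" */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    if (cp == 'i' && turkish) {    /* Turkish: i -> İ (U+0130, dotted capital I) */
        out[0] = 0x0130;
        return 1;
    }
    if (cp >= 'a' && cp <= 'z') {  /* plain ASCII letters */
        out[0] = cp - ('a' - 'A');
        return 1;
    }
    out[0] = cp;                   /* caseless scripts pass through unchanged */
    return 1;
}
```

Because the result can be longer than the input, a caller has to size its destination buffer from the returned counts rather than from the source length. That is one reason an in-place, character-by-character macro cannot be correct in general.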
Capitalization, Uppercasing and Lowercasing in Win32
The CharUpper and CharUpperBuff functions convert the lowercase characters of a string or a buffer, respectively, to uppercase characters. This uppercasing is done with regard to the currently selected user-locale value and the linguistic uppercasing rules associated with that locale. The CharLower and CharLowerBuff functions convert the uppercase characters of a string or a buffer, respectively, to lowercase characters. This lowercasing is done with regard to the currently selected user-locale value and the linguistic lowercasing rules associated with that locale.
If you want to perform the casing operation based on locale standards other than the currently selected user-locale rules (something that the functions just mentioned do not allow), you can use LCMapString, as shown here:
int LCMapString(
LCID Locale, // locale identifier whose rule will be used
// to perform the casing
DWORD dwMapFlags, // mapping transformation (LCMAP_LOWERCASE or
// LCMAP_UPPERCASE)
LPCTSTR lpSrcStr, // source string
int cchSrc, // number of characters in source string
LPTSTR lpDestStr, // destination buffer
int cchDest // size of destination buffer
);
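A call might look like the following sketch, which uppercases a string using Turkish rules regardless of the user locale. Several things here are assumptions rather than part of the text above: the helper name `upper_turkish` is hypothetical, the wide-character variant LCMapStringW is called explicitly, and LCMAP_LINGUISTIC_CASING is added to request linguistic rather than file-system casing rules. The non-Windows branch exists only so the sketch builds elsewhere; it is not Turkish-aware.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: uppercases src into dst, whose capacity is dst_len
 * wide characters including the terminator. Returns 1 on success, 0 on failure.
 */
int upper_turkish(const wchar_t *src, wchar_t *dst, int dst_len)
{
#ifdef _WIN32
    /* Build an LCID for Turkish with the default sort order. */
    LCID tr = MAKELCID(MAKELANGID(LANG_TURKISH, SUBLANG_DEFAULT), SORT_DEFAULT);

    /* cchSrc = -1 means src is null-terminated; the terminator is copied too.
       LCMapStringW returns 0 on failure, such as a destination that is too small. */
    return LCMapStringW(tr, LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                        src, -1, dst, dst_len) != 0;
#else
    /* Fallback for non-Windows builds: uses the C library's towupper only. */
    size_t n = wcslen(src);
    size_t i;

    if (dst_len <= 0 || n >= (size_t)dst_len)
        return 0;
    for (i = 0; i < n; ++i)
        dst[i] = (wchar_t)towupper((wint_t)src[i]);
    dst[n] = L'\0';
    return 1;
#endif
}
```

On Windows, passing L"istanbul" should yield a string that begins with the dotted capital İ (U+0130) instead of the plain "I" that an English-locale conversion produces.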
Capitalization, Uppercasing and Lowercasing in .NET Framework
The String class provides a set of methods you can use to perform culture-sensitive string manipulation once you set CurrentCulture to the desired culture. The String.ToUpper and String.ToLower methods convert a character string to uppercase or lowercase for a given culture.
Another major area that pertains to locale awareness is sorting in a way that matches the particular locale. The sections that follow examine the best way to accommodate the multiple sort orders that exist in various countries and regions. They will also illustrate the most efficient ways to perform string comparison for Win32 applications and in the .NET Framework.
Capitalization, Uppercasing and Lowercasing in Web Pages
Scripts running in a browser might need to manipulate the character casing. VBScript and Microsoft JScript both provide means for case conversions that operate on multilingual input. In VBScript, use UCase to convert a string to uppercase and LCase to convert a string to lowercase; in JScript, use String.toUpperCase and String.toLowerCase, respectively.
The following example uses the UCase function to return an uppercase version of a string:
MyWord = UCase("Hello World") ' Returns "HELLO WORLD".
Scripting technology does not offer the same flexibility for manipulating casing as the NLS APIs do in Win32 programming. However, the available functions let you change the case of any string, though the conversion is not locale-sensitive.