Frequently Asked Questions

Complex Scripts

A script is the set of symbols required to represent a single writing system, which may in turn be used to represent several languages. Latin, Arabic, and Thai are examples of scripts. English, French, German, and Latin are all languages written using the Latin script.

A complex script is one that cannot be displayed or edited correctly without special processing. In practice, it is a script that the usual mechanisms in the GDI and USER components of Windows 2000 cannot display or otherwise process.

The special processing required by a complex script can involve one or more of the following: character reordering; contextual shaping; display of combining characters and diacritics; specialized word-break and justification rules; and filtering out illegal character combinations.

Character reordering, necessary for the Arabic, Hebrew, and Indic scripts, is the rearrangement of characters from their logical order (the order in which they are input) to their visual order (the order in which they are displayed). Scripts from the Middle East, such as Arabic (which is also used to write Persian) and Hebrew, are complex because of their bidirectional layout requirements: words are written right-to-left, but numerals are displayed left-to-right.
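
As a rough illustration of the logical-to-visual step, the sketch below handles only the simplest case: a line that is entirely right-to-left apart from embedded European digits. It is a toy and not the Unicode Bidirectional Algorithm or the logic Windows uses; the function name toy_visual_order is hypothetical. The only library call it relies on is unicodedata.bidirectional from the Python standard library.

```python
import unicodedata


def toy_visual_order(logical: str) -> str:
    """Toy reordering for a purely right-to-left line with embedded digits.

    Assumption: no left-to-right letters, no nesting, no mirroring.
    """
    # Step 1: reverse the whole line so right-to-left letters come out in
    # the order they appear on screen when read from the left edge.
    reversed_chars = list(reversed(logical))

    # Step 2: re-reverse every run of European digits (bidi class "EN"),
    # because numerals are still displayed left-to-right.
    out = []
    run = []
    for ch in reversed_chars:
        if unicodedata.bidirectional(ch) == "EN":
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# Hebrew "shalom" followed by the number 123, in logical (typing) order.
logical = "\u05E9\u05DC\u05D5\u05DD 123"
visual = toy_visual_order(logical)
# The digits keep their left-to-right order; the letters are reversed.
print([hex(ord(ch)) for ch in visual])
```

A real implementation resolves embedding levels for every character rather than special-casing digits, so this sketch should not be used beyond illustration.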

Many Indic scripts, such as Devanagari and Tamil, also require reordering, because vowel signs often appear to the left of, below, or above a character that they follow in logical order.
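
The following sketch shows the idea for one narrow case: a vowel sign that is stored after its consonant but drawn to its left. The set PRE_BASE_VOWEL_SIGNS and the function toy_glyph_order are hypothetical names, and the set lists only a few signs for illustration; real shaping engines work on whole syllable clusters, including conjuncts, which this toy ignores.

```python
# Vowel signs that follow the consonant in logical order but are drawn
# to its left. Illustrative subset only.
PRE_BASE_VOWEL_SIGNS = {
    "\u093F",  # DEVANAGARI VOWEL SIGN I
    "\u0BC6",  # TAMIL VOWEL SIGN E
    "\u0BC7",  # TAMIL VOWEL SIGN EE
    "\u0BC8",  # TAMIL VOWEL SIGN AI
}


def toy_glyph_order(logical: str) -> list:
    """Return characters in left-to-right drawing order (toy version).

    Assumption: each pre-base vowel sign attaches to the single
    character immediately before it.
    """
    chars = list(logical)
    for i in range(1, len(chars)):
        if chars[i] in PRE_BASE_VOWEL_SIGNS:
            # Swap the sign with the consonant it logically follows.
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return chars


# Devanagari KA + VOWEL SIGN I: typed consonant first, drawn sign first.
print([hex(ord(ch)) for ch in toy_glyph_order("\u0915\u093F")])
```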

In some scripts, such as Arabic and Indic scripts, the glyph displayed depends on the surrounding characters. A single Arabic character, for example, can take different shapes depending on whether it is the first, middle, or last character in a word. Some Arabic characters must form ligatures, and others will if the appropriate glyph is available in the selected font. Contextual shaping is the formation of correct sequences of glyphs given these contexts.
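
To make the positional idea concrete, the sketch below picks the isolated, initial, medial, or final form of the Arabic letter BEH (U+0628) from the Unicode Arabic Presentation Forms-B block. It is a deliberately narrow illustration: it assumes the word consists only of BEH, which joins on both sides, and it ignores letters that join on one side only, ligatures, and font lookups. The names BEH_FORMS, positional_form, shape_beh_word, and describe are hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH

# Presentation forms of BEH keyed by position in the word.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def positional_form(index: int, length: int) -> str:
    """Name the form for the letter at `index` in a word of `length`."""
    if length == 1:
        return "isolated"
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def shape_beh_word(word: str) -> str:
    """Map each BEH to its positional glyph (toy: BEH-only words)."""
    if any(ch != BEH for ch in word):
        raise ValueError("this sketch only handles the letter BEH")
    return "".join(
        BEH_FORMS[positional_form(i, len(word))] for i in range(len(word))
    )


def describe(word: str) -> list:
    """Return the Unicode names of the shaped glyphs."""
    return [unicodedata.name(ch) for ch in shape_beh_word(word)]


print(describe(BEH * 3))
```

Modern fonts usually carry these forms as glyph substitutions rather than as separate code points, so the presentation-form characters here stand in for glyph choices.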

Indic scripts also require contextual shaping, because the form of a vowel sign depends on the character to which it is attached.

Stacking or combining multiple characters into one "pile" is another issue that must be addressed for Arabic, Thai, Hebrew, and Indic scripts. The Unicode standard has many combining characters, but for European languages they are often optional or can be replaced with precomposed characters. This is not the case for the Indic scripts, or for others such as Thai. A Thai syllable, for example, usually consists of a consonant followed by a vowel and optionally a tone mark. The vowel can be placed to the left of, below, above, or to the right of the consonant, and the tone mark is stacked above it.
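
The difference between optional and mandatory combining characters can be checked with Unicode normalization. The sketch below, which uses only unicodedata.normalize from the Python standard library, composes each sequence with NFC and counts the code points that remain. The helper name nfc_length is hypothetical.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


# Latin: "e" + COMBINING ACUTE ACCENT has a precomposed equivalent,
# so NFC collapses the pair into a single code point.
latin = "e\u0301"

# Thai: KO KAI + tone mark MAI THO has no precomposed form,
# so the combining mark must be stacked at display time.
thai = "\u0E01\u0E49"

# Devanagari: KA + VOWEL SIGN I likewise stays as two code points.
devanagari = "\u0915\u093F"

for label, text in [("Latin", latin), ("Thai", thai), ("Devanagari", devanagari)]:
    print(label, len(text), "->", nfc_length(text))
```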

Thai and some other languages require special word-break logic because their words are not delimited by any fixed set of characters, in the way that white space delimits words in European languages.
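
One common way to find word boundaries without delimiters is to match the text against a dictionary. The sketch below uses greedy longest-match segmentation with a tiny made-up word list; it is an assumption for illustration and not the method any particular system uses. The function name segment and the sample dictionary are hypothetical.

```python
def segment(text: str, dictionary: set) -> list:
    """Greedy longest-match segmentation (toy version).

    At each position, take the longest dictionary word that matches.
    Characters that match nothing are emitted one at a time.
    """
    words = []
    longest = max((len(w) for w in dictionary), default=1)
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here; fall back to one character.
            words.append(text[i])
            i += 1
    return words


# Tiny illustrative word list: "language", "Thai", "good".
sample_words = {"ภาษา", "ไทย", "ดี"}
print(segment("ภาษาไทยดี", sample_words))
```

Greedy matching can choose the wrong boundary when one word is a prefix of another, which is why production word breakers combine a much larger dictionary with additional rules or statistics.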

Since Thai syllables consist of a consonant optionally followed by one vowel and/or one tone mark, some character combinations (e.g., two vowel marks in succession) are nonsensical. Thus, one of the tasks of complex script enabling is to filter out or disallow illegal character combinations.
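
A filter of this kind can be sketched as a small state machine that follows the simplified rule stated above: a base character, then at most one vowel mark, then at most one tone mark. The character sets below cover only the Thai above and below vowel marks and the four tone marks, and every other character is treated as a base, so this is an illustrative assumption and not a complete Thai input rule set. The names VOWEL_MARKS, TONE_MARKS, and is_plausible are hypothetical.

```python
# Thai vowel marks written above or below the consonant (subset).
VOWEL_MARKS = set("\u0E31\u0E34\u0E35\u0E36\u0E37\u0E38\u0E39")
# Thai tone marks: MAI EK, MAI THO, MAI TRI, MAI CHATTAWA.
TONE_MARKS = set("\u0E48\u0E49\u0E4A\u0E4B")


def is_plausible(text: str) -> bool:
    """Reject sequences with stray or repeated marks (toy rule set)."""
    state = "start"
    for ch in text:
        if ch in VOWEL_MARKS:
            # A vowel mark must directly follow a base character.
            if state != "base":
                return False
            state = "vowel"
        elif ch in TONE_MARKS:
            # A tone mark may follow a base or a single vowel mark.
            if state not in ("base", "vowel"):
                return False
            state = "tone"
        else:
            # Anything else is treated as a base character.
            state = "base"
    return True


print(is_plausible("\u0E17\u0E35\u0E48"))  # consonant, vowel mark, tone mark
print(is_plausible("\u0E14\u0E35\u0E34"))  # two vowel marks in succession
```

An input method could call a check like this on each keystroke and refuse the character that would make the sequence invalid.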

