Glossary of Terms Used on this Site
Portions of this glossary reproduced with permission from Developing International Software, published by Microsoft Press.
.NET Framework: A platform that enables the creation and use of Extensible Markup Language (XML)-enabled applications, processes, and Web sites as services that share and combine information and functionality with each other by design. The .NET Framework is available for Windows platforms or smart devices, providing tailored solutions for organizations and individuals. See Common Language Runtime (CLR).
–A APIs: The Win32 application programming interface (API) entry points that usually expect string parameters to be encoded in Windows code pages (also called “ANSI code pages”) or sometimes OEM code pages. See –W APIs.
Accelerator: An Alt+character combination used to activate menus, menu items, and dialog box items in Windows. The character that activates the menu or dialog box item is underlined. It is also called a “hot key.”
Accessibility: The extent to which computers are easy to use and available to a wide range of users, including people with disabilities.
ACP: Acronym for the Windows (ANSI) code page in use. Windows NT uses this code page to convert to and from Unicode (UTF-16) automatically whenever an application calls one of the “A” entry points of Win32 APIs.
Active Input Method Manager (IMM): An ActiveX control that provides limited IMM service on non-Asian language versions of Windows 95, Windows 98, Windows Me, and Windows NT 4 platforms. It is replaced by the more general Text Services Framework in Windows XP. Active IMM is also known as “Global IME.”
ADO.NET: Stands for Microsoft ActiveX Data Objects for the .NET Framework. A set of classes that expose data-access services to the .NET programmer. ADO.NET supplies a rich set of components for creating distributed, data-sharing applications. It is an integral part of the .NET Framework, providing access to relational data, XML integration, and application data.
Alphabet: Elements of a writing system composed of a collection of letters that have a one-to-one relationship with a sound. See Letter.
Alphanumeric: Consisting of either letters or numbers, or both.
Alternates: In fonts, alternates are similar to positional forms. For example, kanji uses alternative forms of parentheses when positioned vertically.
AltGr: The Alt key on the right on some non-U.S. Windows keyboard layouts. The AltGr key is equivalent to the Ctrl+Alt key combination and is used to create an alternative shift state for accessing additional characters on some keys.
Alt+Numpad: A method of entering characters by typing in the character’s decimal code with the Numeric Pad keys (Num Lock turned on). In Windows:
ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that persists in the Windows community. The usage stems from the fact that Windows code page 1252 was originally based on an ANSI draft, which later became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” usually refers to non-Unicode or code page–based applications.
ANSI C: The standardized C programming language.
Anti-aliasing: A software technique for smoothing the jagged appearance of curved or diagonal lines caused by poor resolution on a display screen.
Application programming interface (API): A set of functions supported by the operating system.
ASCII: Acronym for American Standard Code for Information Interchange, a 7-bit encoding. Although primitive, ASCII’s set of 128 characters is the one common denominator contained in most of the other standard character sets and in all Windows and OEM code pages.
ASP.NET: Stands for Microsoft Active Server Pages for the .NET Framework. The new generation of Active Server Pages (ASP) files, written in a managed language on the Common Language Runtime (CLR) using the .NET Framework. Also known as “ASP+” and “ASPX.”
Attribute: (1) In C/C++ programming, attributes contain information or parameters that define an object class's properties and/or behaviors. (2) In HTML, an attribute is a parameter that defines a special property of an HTML element. Attributes are specified within start tags. For example, <IMG SRC=“image.gif”> means that the element IMG has an attribute called “SRC,” which is assigned the indicated value. (3) In XML, an attribute is similar to HTML attributes, but describes additional properties on any XML element.
Base character: An encoding code point that does not graphically combine with preceding characters and that is neither a control nor a format character. The Latin “a” is an example of a base character.
Beta testing: Distributing pre-release software to future users and potential customers in order to get feedback and bug reports.
Bidirectional (BiDi) rendering: Refers to the script’s ability to handle text that reads both left to right and right to left. For example, in the bidirectional rendering of Arabic, the default reading direction for text is right-to-left, but for numbers, it is left-to-right. Processing a complex script must account for the difference between the logical (keystroke) order of input and the visual order of the output glyphs. In addition, processing must properly deal with caret movement and hit testing. The mapping between screen position and a character index for, say, selection of text or caret display requires knowledge of the layout algorithms.
Bidirectional text: A mixture of characters that are read from left to right and from right to left. Most Arabic and Hebrew strings of text, for example, are read from right to left, but numbers and embedded Western terms within Arabic or Hebrew text are read from left to right.
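As a rough illustration of how software can tell these character classes apart, the sketch below queries the Unicode bidirectional category of a few characters with Python's standard unicodedata module. The choice of sample characters is an assumption made for the example, and this is only the first step of real bidirectional layout, not a full algorithm.

```python
import unicodedata

# One Latin letter, one Hebrew letter, one Arabic letter, and one digit.
samples = ["a", "\u05d0", "\u0627", "1"]

# Map each character to its Unicode bidirectional category.
bidi_classes = {ch: unicodedata.bidirectional(ch) for ch in samples}

for ch, cls in bidi_classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The categories returned here are L (left-to-right), R (right-to-left), AL (Arabic letter, also right-to-left), and EN (European number).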
Big-5: The multibyte encoding for Traditional Chinese characters standardized by Taiwan.
Big-endian: A computer architecture that stores multibyte numerical values with the most significant byte values first. On systems using big endian architecture, the letter “A” (U+0041) is stored as 0x00 0x41. See Little-endian.
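The byte layout described above can be checked with a short sketch. It uses Python's built-in UTF-16 codecs to serialize the letter “A” in both byte orders; the variable names are illustrative only.

```python
# Encode "A" (U+0041) as UTF-16 in each byte order.
big = "A".encode("utf-16-be")     # most significant byte first
little = "A".encode("utf-16-le")  # least significant byte first

print(big.hex(" "))     # 00 41
print(little.hex(" "))  # 41 00
```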
Binary file: A file that has been encrypted, encoded, or compiled, as opposed to a plain text file.
Bitmap font: A font whose characters are represented by bitmaps or by a pattern of dots, as opposed to vector, TrueType, or OpenType fonts, whose characters are represented by lines and curves. Bitmap fonts are generally less scalable and more jagged than TrueType fonts. See TrueType.
Bopomofo: A set of characters used to teach the phonetics of Chinese. These characters are used in teaching materials such as dictionaries, but do not appear in the actual writing of the Chinese language.
Boundary: The point of interaction between systems or applications that use different character encodings.
BSTR: Known as a “basic string” or “binary string.” A BSTR is a pointer to a wide-character string used by Automation data-manipulation functions. BSTR strings are also used for passing string data between Component Object Model (COM) components, by utilizing strings allocated with COM’s memory allocator.
Byte-order mark (BOM): The Unicode character U+FEFF—or its noncharacter mirror-image, U+FFFE—used to indicate the byte order of a text stream. The presence of a BOM at the beginning of a text file is a strong clue that a file is encoded in Unicode.
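A minimal sketch of BOM detection follows, assuming the data is UTF-16 if it carries a BOM at all. The function name bom_byte_order is hypothetical; the BOM constants come from Python's standard codecs module. The sketch deliberately ignores UTF-8 and UTF-32 signatures.

```python
import codecs
from typing import Optional

def bom_byte_order(data: bytes) -> Optional[str]:
    """Return 'big', 'little', or None when no UTF-16 BOM is present."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE (U+FFFE when misread)
        return "little"
    return None

# U+FEFF written in big-endian order, followed by the text "Hi".
sample = "\ufeffHi".encode("utf-16-be")
print(bom_byte_order(sample))  # big
```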
Candidate window: The window of an Input Method Editor (IME) that lists characters the user can choose to replace the text highlighted in the composition window.
Cardinal splines: In GDI+, a sequence of individual curves joined to form a larger curve.
Caret: The blinking line indicating the space into which you insert text.
Case: Two distinct variations or forms of the same character within the same alphabet. These variants, which differ in shape and size, are called “uppercase” letters (also known as “capital” or “majuscule” letters) and “lowercase” letters (also known as “small” or “minuscule” letters).
Case-folding: Taking a string of text and converting everything into either lowercase or uppercase.
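As a hedged illustration, the sketch below compares plain lowercasing with Python's str.casefold(), which applies the more aggressive Unicode case-folding rules intended for caseless comparison. The sample words are assumptions chosen for the example.

```python
words = ["Straße", "STRASSE", "strasse"]

# Plain lowercasing keeps the sharp s, so two distinct forms remain.
lowered = {w.lower() for w in words}

# Case folding expands the sharp s to "ss", so all three forms match.
folded = {w.casefold() for w in words}

print(len(lowered), len(folded))  # 2 1
```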
Chang Jei: An input method that uses radicals to build Chinese characters. See Radicals. Twenty-five radicals are assigned to the letters “A” through “Y.” The letter “X” is used to generate more complex radicals. See Input Method Editor (IME).
Character: (1) The smallest components of a writing system or script that have semantic value. A character refers to an abstract idea rather than to a specific glyph or shape that a character might have once rendered or displayed. (2) A code element.
Charset: Stands for “character set.” A set of characters used in Windows. Charsets refer to the same collections of characters as those defined by Windows code pages. See Code page below.
Client coordinates: Relative coordinates of a window or client area as specified by the system or applications. Client coordinates ensure that an application can use consistent coordinate values while drawing in the window, regardless of the position of the window on the screen.
Clipboard: A Windows utility used as a buffer for copying and pasting text.
Clusters: The sequence of characters or glyphs between points at which the Unicode representation of a string aligns with the glyph representation. For simple text, where each code point is represented by a single glyph, the cluster is the character and its glyph. For simple ligatures, where two or more code points are represented by a single glyph, the cluster is the sequence of code points and the single glyph. The most complex case is when a sequence of code points is represented by a sequence of glyphs with no internal alignment between characters and glyphs. This can occur, for example, in the case of reordering within Indic syllables.
Code page: An ordered set of characters of a given script in which a numeric index (code-point value) is associated with each character. In this book, this term is generally used in the context of code pages defined by Windows and can also be called a “character set” or “charset.”
Code point, or code element: (1) The minimum bit combination that can represent a unit of encoded text for processing or exchange. (2) An index into a code page or a Unicode standard.
Collation: Refers to a set of rules that determine how textual data is sorted and compared.
Combining character: A character that graphically combines with a preceding base character. The combining character is said to apply to that base character. The combining acute accent (U+0301) is an example of a combining character.
Combining-mark sequence: An alphabetic base character followed by one or more combining-mark characters such as acute and grave accents.
Combined characters or ligatures: Characters that join into one character when placed together. One example is the “ae” combination in English; it is sometimes represented by a single character. Arabic is a script that has many combined characters.
Commenting model: A way for the development team to pass along localization instructions to the localization team. This information might include whether strings can be localized or whether limitations on the string length exist, for instance, or can just consist of general comments added to the resource files. See localization.
Common controls: Within the context of Windows, a set of controlling elements (windows) that are implemented by the common control library, which is a dynamic-link library (DLL) included with the Windows operating system. Like other control windows, a common control is a child window that an application uses in conjunction with another window to perform input/output (I/O) tasks.
Common dialog boxes: Standard dialog boxes defined by Windows for operations found in numerous applications; these operations include Open, Save As, Print, Page Setup, Color Selection, Font Selection, and Find. Applications can call common dialog box API functions directly instead of having to supply a custom dialog template and dialog procedure.
Common Language Runtime (CLR): The core of the .NET Framework. At the base level, it is the infrastructure that executes applications and allows them to interact with the other parts of the .NET Framework. It also provides services for optimizing and securing applications, along with capabilities such as application deployment and side-by-side execution. See .NET Framework.
Compatibility zone: The area in Unicode repertoire from U+F900 through U+FFEF that is assigned to characters from other standards. These characters are variants of other Unicode characters.
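To illustrate the "variants of other Unicode characters" point, the sketch below takes two characters from this range and maps them to their ordinary counterparts with NFKC normalization from Python's unicodedata module. The two sample characters are assumptions picked for the example.

```python
import unicodedata

# Two characters from the U+F900..U+FFEF range.
ligature_fi = "\ufb01"  # LATIN SMALL LIGATURE FI
fullwidth_a = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A

# NFKC replaces compatibility variants with their regular equivalents.
normalized = {ch: unicodedata.normalize("NFKC", ch) for ch in (ligature_fi, fullwidth_a)}

for ch, plain in normalized.items():
    print(f"U+{ord(ch):04X} -> {plain!r}")
```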
Complex scripts: Scripts that require special handling when it comes to shaping and laying out characters in software applications. This special handling is closely related to linguistic requirements of these scripts. Complex scripts can have any combination of the following attributes: bidirectional rendering, contextual shaping, combining characters, as well as specialized word-breaking and justification rules. See Contextual analysis below, Rendering, and Uniscribe.
Component Object Model (COM): A specification that Microsoft developed for building software components that can be assembled into programs or that add functionality to existing programs running on Microsoft Windows platforms.
Console: The Windows subsystem that runs character-based applications, as opposed to applications that have a graphical user interface (GUI).
Constant: A numeric value, typically an integer, that refers to a character value, the size of a buffer, the position of a character in a string, and so forth. It is assumed that the value does not change during the time a program is running.
Content recycling: Reusing content in localization. Content recycling saves time and money because the content is researched, presented, edited, reviewed, and translated only once.
Contextual analysis: A process for determining how to handle text based on surrounding characters, as in Arabic, in which a glyph changes shape depending on its position in a word. See Complex scripts above, Rendering, and Uniscribe.
Control Panel: A group of Windows utilities used to edit system settings, including international preferences.
Conversion or composition window: The window of an IME that displays text typed by the user, either just the way it is entered or after it is converted to ideographic form.
Cross-platform: Portable or applicable to more than one operating system.
Cultural convention: Data or data formats that are specific to a language, local dialect, or geographic location. Examples are currency symbols, date formats, calendars, numeric separators, and sort orders.
Cursive attachment: Used when adjacent glyphs need to be positioned in order to join them cursively. It is heavily utilized in fonts that support cursive scripts like Arabic. See Kashida.
Cyrillic script: The script traditionally used for writing various Slavic languages, including Russian. Over the past two centuries, the Cyrillic script has been extended so that some of the other non-Slavic minority languages of the former Soviet Union could be written. Cyrillic script is written in linear sequence from left to right.
Date picture string/time picture string: A string used to represent a date or time format—for example, "dd MMMM, yyyy".
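The sketch below shows one way a picture string such as "dd MMMM, yyyy" could be interpreted, by translating a handful of tokens into Python strftime codes. The token table and the function name picture_to_strftime are hypothetical and cover only a subset of tokens; quoted literals and single-letter tokens are not handled.

```python
import re
from datetime import date

# Hypothetical mapping from a few picture-string tokens to strftime codes.
PICTURE_TO_STRFTIME = {
    "dddd": "%A", "ddd": "%a", "dd": "%d",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "yyyy": "%Y", "yy": "%y",
}

# Try longer tokens first so "MMMM" is not read as two "MM" tokens.
TOKEN_PATTERN = re.compile(
    "|".join(sorted(PICTURE_TO_STRFTIME, key=len, reverse=True))
)

def picture_to_strftime(picture: str) -> str:
    """Translate the supported picture tokens; leave other text untouched."""
    return TOKEN_PATTERN.sub(lambda m: PICTURE_TO_STRFTIME[m.group()], picture)

fmt = picture_to_strftime("dd MMMM, yyyy")
print(fmt)  # %d %B, %Y
print(date(2002, 3, 5).strftime(fmt))  # month name depends on the active locale
```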
Dead key: A key that does not produce a character by itself, such as the accent key on the international keyboard. However, when the user types in a character after pressing the accent key, an accented character appears.
Decomposition: The breakdown of an accented character or a precomposed character into an ordered set of character components. For “ã” the components are “a” followed by the combining character “~”.
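The decomposition of “ã” can be reproduced with a short sketch that uses NFD normalization from Python's unicodedata module. The precomposed character is written as an escape so the example does not depend on how this text itself is encoded.

```python
import unicodedata

precomposed = "\u00e3"  # LATIN SMALL LETTER A WITH TILDE

# NFD splits the character into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
code_points = [f"U+{ord(c):04X}" for c in decomposed]
print(code_points)  # ['U+0061', 'U+0303']

# NFC recomposes the sequence into the single precomposed character.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)  # True
```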
Determined string: A string that has been converted from a phonetic representation into ideographs.
Device context: A Graphics Device Interface (GDI) structure that defines a set of graphics objects and their associated attributes, as well as the graphics modes that affect output. The graphics objects include a pen for line drawing, a brush for painting and filling, a bitmap for copying or scrolling parts of the screen, a palette for defining the set of available colors, a region for clipping and other operations, and a path for painting and drawing operations.
Device Driver Kit (DDK): A set of tools and libraries for creating Windows-based software to run hardware devices such as printers, along with documentation.
Diacritic: A character that is attached to or overlays a preceding base character. For example, a mark placed over, under, or through a Latin-based character—such as “~”—to indicate a change in phonetic value from the unmarked state. Most diacritics are nonspacing characters that don’t increase the width of the base character. See Accented character.
Diaeresis: Two dots placed over a Latin vowel to indicate that the vowel is pronounced as a separate syllable (as in the word “naïve”). Typically used when two vowels are adjacent, but should be pronounced separately rather than as a diphthong. See Umlaut.
Digital Dashboard: A container that provides a customized display of information consolidated from various information sources. Every Digital Dashboard is a Web page containing one or more Web Parts. See Web Parts.
Digraph: A combination of characters that is written separately but forms a single lexical unit—for example, the Danish “aa” and the Spanish “ch” and “ll”.
Double-byte character set (DBCS): A character encoding in which the code points can be either 1 or 2 bytes. Used, for example, to encode Chinese, Japanese, and Korean languages. See Multibyte character set (MBCS), and CJK/CJKV.
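As an illustration of mixed 1-byte and 2-byte code points, the sketch below encodes a short string with Python's cp932 codec (the Japanese Windows code page, chosen here as an assumed example of a DBCS) and reports how many bytes each character takes.

```python
text = "A日本"

# Encode each character separately to see its byte length in code page 932.
byte_lengths = [len(ch.encode("cp932")) for ch in text]
print(byte_lengths)  # [1, 2, 2]

# The whole string therefore occupies 5 bytes for 3 characters.
encoded = text.encode("cp932")
print(len(text), len(encoded))  # 3 5
```

Code that assumes one byte per character will miscount or split such strings, which is why DBCS-aware code has to check for lead bytes.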
Dynamic-link library (DLL): A module containing functions or resources that other programs or DLLs can utilize. DLLs cannot run by themselves; other programs have to load them.
EBCDIC: Extended Binary Coded Decimal Interchange Code. These types of code pages are used on IBM and other manufacturers' mainframes.
Element: The basic unit of information in an HTML or XML document. Elements are arranged hierarchically to define the overall document structure.
Enabling: Altering program code to handle input, display, and editing of bidirectional or East Asian languages, such as Arabic and Japanese, respectively.
Encoding: A method or system of assigning numeric values to characters (for example, ASCII, Unicode, Windows 1252).
End-User Defined Character (EUDC): A special character, such as a rare ideograph, that the user creates with a EUDC editor and assigns to a code point within a reserved range.
Extended characters: (1) Characters above the ASCII range (0 through 127) in single-byte character sets, that is, code points 128 through 255. (2) Accented characters.
Extensible Stylesheet Language (XSL): An XML language used for transforming XML documents into something that can be displayed, such as HTML.
Floating accent: See Diacritic and Floating diacritic below.
Floating diacritic: A combining character that overlays the preceding base character; it can potentially change position or shape according to the shape of the base character. The combining right arrow above character (U+20D7) is an example of a floating diacritic.
Following characters: Characters, such as closing quotation marks, closing parentheses, and punctuation marks, that shouldn’t be separated from preceding characters.
Font: Any of numerous sets of graphical representations of characters that can be installed on a computer, printer, or another graphic output device.
Font association: The automatic pairing of a font that contains ideographs with a font that does not contain ideographs. This allows the user to enter ideographic characters regardless of which font is selected.
Font fallback: A mechanism for providing an alternate font for runs of characters that the original font cannot represent. It takes complex scripts and typographic effects into account. Font fallback relies on a hard-coded list of standard fallback fonts selected according to the Unicode character range.
Font substitution: The explicit replacement of any reference or call to a given font face name with another face name. For example, MS Shell Dlg is only a face name and has no associated physical font. It is substituted by Microsoft Sans Serif in the English version of Windows XP, and varies per localized version.
Front-end processor: See Input Method Editor (IME).
Full-width character: Characters whose glyph image extends across the entire character display cell. In legacy character sets, full-width characters are normally encoded in 2 or more bytes. The Japanese term for full-width characters is “zenkaku.” See Half-width character.
GB 2312-80: A multibyte encoding standardized by the People’s Republic of China.
Generic data type: A macro, such as TCHAR, that resolves to either an ANSI type or a wide-character (Unicode) type, depending on compile-time flags.
Generic prototype: A macro representing an API call or a function call. The macro resolves to an entry point that expects either ANSI parameters or wide-character (Unicode) parameters, depending on compile-time flags.
Global.asa: A file typically containing scripts that initialize application or session variables, connect to databases, send cookies, and perform other operations pertaining to the ASP application or to the user’s session with an ASP application as a whole.
Globalization: The process of developing a program core whose features and code design are not solely based on a single language or locale. Instead, their design is developed for the input, display, and output of a defined set of Unicode-supported language scripts and data related to specific locales. See Internationalization.
Globalized functionality testing: Functionality testing that has been enhanced to include verifying the world-readiness of a product.
G11N: Abbreviated form of GLOBALIZATION. G + 11 characters + N.
Glyph: The actual shape (bit pattern, outline, and so forth) of a character image. For example, an italic “a” and a roman “a” are two different glyphs representing the same character.
GMT: Greenwich Mean Time. See UTC.
Graphics Device Interface (GDI): In Windows, a graphics display system used by applications to display or print bitmapped text (TrueType fonts), images, and other graphic elements. The GDI, in particular, is responsible for drawing dialog boxes, buttons, and other elements.
Gregorian calendar: A solar-based dating system used as the default calendar of countries in the Western hemisphere and also used widely in other parts of the world.
Group Policy: Centralized policy-based administration that enables an administrator to control or specify registry-based policy settings, security settings, software installation, scripts to run at computer startup and operating-system shutdown, Internet Explorer maintenance, and folder redirection. Some group policy features may require the installation of Active Directory to work.
Half-width character: Characters whose glyph image occupies half of the character display cell. In legacy character sets, half-width characters are normally encoded in a single byte. The Japanese term for half-width characters is “hankaku.” See Full-width character.
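Unicode carries a related notion in its East Asian Width property. The sketch below, which assumes that property is an acceptable stand-in for the legacy full-width and half-width distinction, queries it with Python's unicodedata module for four sample characters.

```python
import unicodedata

samples = {
    "A": "ASCII letter",
    "\uff21": "full-width Latin A",
    "\u30a2": "katakana A",
    "\uff71": "half-width katakana A",
}

# F = fullwidth, H = halfwidth, W = wide, Na = narrow.
widths = {ch: unicodedata.east_asian_width(ch) for ch in samples}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {widths[ch]}")
```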
Han unification: The process of assigning the same code point to characters historically perceived as being the same character but represented as unique in more than one East Asian ideographic character standard. This results in a group of ideographs shared by several cultures and significantly reduces the number of code points needed to encode them.
Hangul: The native name for the Korean alphabet, the script used to write the Korean language.
Hanja: The Korean name for ideographic characters of Chinese origin.
Hanzi (hantsu): The Chinese name for ideographic characters of Chinese origin.
Hard-coding: (1) Putting string or character literals in the main body of code, such as the .C files or the .H files, instead of in external resource files. (2) Basing numeric constants on the assumed length of a string or having any assumptions about language- or culture-specific matters fixed in the code (such as length of strings, formats of dates, and so on).
Hebrew lunar calendar: A calendar based on the cycle of the moon around the Earth. The length of this cycle, the lunar month, is about 29½ days. Twelve lunar months make, therefore, about 354 days.
Hijri (Islamic lunar) calendar: A purely lunar calendar, as opposed to a solar or luni-solar one. As a result, the Islamic (Hijri) year is shorter than the Gregorian year by about 11 days, and its months are not tied to the seasons, which are fundamentally determined by the solar cycle.
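The arithmetic behind the two lunar calendar entries above can be written out as a small sketch. It uses the approximate 29½-day lunar month from the Hebrew lunar calendar entry and assumes a 365-day Gregorian year, ignoring leap years.

```python
LUNAR_MONTH_DAYS = 29.5    # approximate length of one lunar month
GREGORIAN_YEAR_DAYS = 365  # assumption: common (non-leap) year

# Twelve lunar months make up one lunar year.
lunar_year_days = 12 * LUNAR_MONTH_DAYS
print(lunar_year_days)  # 354.0

# The lunar year falls short of the solar year by about 11 days.
shortfall_days = GREGORIAN_YEAR_DAYS - lunar_year_days
print(shortfall_days)  # 11.0
```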
Hiragana: The Japanese cursive script. Each hiragana character represents a phonetic syllable. See Katakana.
Hub and spoke model: Within the context of the .NET Framework, the hub is the main assembly that contains the nonlocalizable executable code and the resources for a single culture, called the “neutral” or “default” culture. The default culture is the fallback culture for the application. Each spoke connects to a satellite assembly that contains the resources for a single culture, but does not contain any code.
Ideographic character: A character of Chinese origin representing a word or a syllable that is generally used in more than one Asian language. Sometimes referred to as a “Chinese character.” See Kanji, Hanzi, and Hanja.
Indexing Service: A base service of Windows NT, Windows 2000, and Windows XP that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching. Indexing Service can extract both text and property information from files on the local host and on remote, networked hosts.
Input context: An internal structure that stores IME-related status information. Windows supports multiple IME contexts, automatically creating an input context for each active thread.
Input language handle (HKL): A data type to indicate language/layout pairs.
Input locale: The pairing of an input language (LANGID) and an input method, which determines what language is currently being entered and how. See Locale, System locale, User locale, and User-interface (UI) language.
Input method: Any method used to enter text. These methods include different keyboard layouts and IMEs, as well as newer input services such as voice-recognition engines or handwriting-recognition engines.
Input Method Editor (IME): A program that performs the conversion between keystrokes and ideographs or other characters, usually by user-guided dictionary lookup.
Input Method Manager (IMM): The module on Windows 2000 and Windows XP that handles communication between IMEs and applications.
Internal code input method: An input method that allows the user to select a character by typing in its Big-5 code-point index.
Internationalization: Term used outside of Microsoft to indicate globalization and localizability. I18N is a common abbreviation for “Internationalization” because the “I” in “International” is followed by 18 letters and ends with the letter “N.” See Globalization and Localizability.
I18N: Abbreviated form of INTERNATIONALIZATION. I + 18 characters + N.
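The abbreviation pattern used for I18N and G11N (first letter, count of the letters in between, last letter) can be sketched as a small function. The name numeronym is illustrative only.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word.upper()  # too short to have any inner letters
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()

print(numeronym("internationalization"))  # I18N
print(numeronym("globalization"))         # G11N
```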
Invariant culture: Whereas the neutral culture is associated with a language, but not with any particular country or region, the invariant culture is neither associated with a language nor with a particular country or region. It can be used in almost any method in the Globalization namespace that requires a culture. The invariant culture must be used only by processes that require culture- and language-independent results, such as system services; otherwise, it produces results that might be linguistically incorrect or culturally inappropriate.
ISO: International Organization for Standardization. It is a worldwide federation of national-standards bodies.
ISO 4217 currency symbol: Three-letter ISO codes for representing currencies and funds (such as CAD for the Canadian dollar).
ISO 8859: The International Organization for Standardization’s 8-bit encoding that served as the basis for the Windows (ANSI) code page. Variants of this standard (for example, 8859-2, 8859-5, 8859-13) target different scripts, and each variant corresponds to different Windows code pages.
ISO 10646: The International Organization for Standardization’s encoding that is code-for-code equivalent to Unicode.
Isolate, initial, medial, and final character forms: The different shapes of an Arabic character that correspond to its position in a word.
Item: In scripts, a character string having all the same script and direction attributes.
Jamos: The 24 basic elements of the Korean script.
Japanese Emperor Era: The Japanese calendar that works exactly like the Gregorian calendar, except that the year and era are different. The Japanese calendar recognizes one era for every emperor’s reign. The current era is the Heisei era, which began in the Gregorian calendar year 1989. The era name is typically displayed before the year. For example, the Gregorian calendar year 2002 would be Heisei 14 in the Japanese calendar.
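The year conversion in this entry can be sketched as follows. The function name heisei_year is hypothetical, and the sketch is a simplification: it treats the whole of 1989 as Heisei 1, although eras change partway through a year, and it knows nothing about other eras.

```python
HEISEI_FIRST_YEAR = 1989  # Gregorian year in which the Heisei era began

def heisei_year(gregorian_year: int) -> int:
    """Convert a Gregorian year to a year number within the Heisei era."""
    if gregorian_year < HEISEI_FIRST_YEAR:
        raise ValueError("year precedes the Heisei era")
    # The first year of an era is year 1, not year 0.
    return gregorian_year - HEISEI_FIRST_YEAR + 1

print(f"Heisei {heisei_year(2002)}")  # Heisei 14
```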
Johab: The Korean standard character set (KS C-5601-1992), which corresponds to Windows code page 1361. This character set includes all possible Hangul character combinations.
Kana: The set of Japanese hiragana and katakana characters.
Kashida: Character added to justify lines and paragraphs in Arabic. (“Kashida” means “stretch” in Arabic.)
Katakana: A Japanese script of phonetic syllables, chiefly used to spell words borrowed from other languages. Each katakana character represents a phonetic syllable. See Hiragana.
Keyboard layout: A standard arrangement of characters on a keyboard that defines which keys produce particular characters or scan codes.
Korean Tangun Era calendar: According to Korean legend, the god-king Tangun founded the Korean nation in BC 2333. Early Korea used a lunar calendar. As the rest of the world encroached on Korea, it eventually went to the solar, Gregorian calendar. Yet much of the country still uses the lunar calendar to keep track of births and deaths and some traditional holidays.
Korean KSC: Character encoding established by the Korean Industrial Standards Association. Now known as “KSX.”
KS C-5601-1987: The multibyte Wansung encoding standardized by Korea.
KS C-5601-1992: The multibyte Johab encoding standardized by Korea.
Language enabling: (1) Adding support to software for document content in a particular language. In this sense, to enable an application for Japanese means to modify the software so that the user can enter, display, edit, and print text containing Japanese. (2) Modifying software so that it can be localized into a particular language. In this sense, enabling for Japanese means to modify software so that it can display Japanese text correctly in menus, dialog boxes, and other user-interface elements. Note that in either sense, an enabled product can still have the user interface in English such as when the product is not localized.
Language group: Term used to describe the supported script families in Windows 2000.
Language ID (LANGID): A 16-bit value defined by Windows, consisting of a primary language ID and a secondary language ID. Used as a parameter to several Win32 functions and messages.
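The sketch below shows how the two parts fit into the 16-bit value. The bit layout (primary language in the low 10 bits, secondary language in the high 6 bits) and the constant values are not stated in this glossary; they mirror the MAKELANGID, PRIMARYLANGID, and SUBLANGID macros of the Windows SDK headers, and the Python function names are illustrative.

```python
LANG_ENGLISH = 0x09        # primary language ID for English
SUBLANG_ENGLISH_US = 0x01  # secondary language ID for U.S. English

def make_langid(primary: int, sub: int) -> int:
    """Pack a primary (10-bit) and secondary (6-bit) ID into 16 bits."""
    return ((sub & 0x3F) << 10) | (primary & 0x3FF)

def primary_langid(langid: int) -> int:
    return langid & 0x3FF

def sub_langid(langid: int) -> int:
    return (langid >> 10) & 0x3F

en_us = make_langid(LANG_ENGLISH, SUBLANG_ENGLISH_US)
print(hex(en_us))  # 0x409
```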
Language/layout pair: (1) A language installed on the system and the input method associated with it. (2) The input language.
Latin script: The set of 26 characters (A–Z) inherited from the Roman Empire that, together with later additions, is used to write languages throughout Africa, the Americas, parts of Asia, Europe, and Oceania. The Windows Latin 1 character set covers Western European languages and languages that use the same alphabet, and the Latin 2 character set covers many languages in Central and Eastern Europe. In addition, there are other Windows code pages that support Turkic and Baltic languages written in the Latin script.
Layout: The order, positioning, and spacing of text or other user-interface elements.
Lead-byte: The first byte of a 2-byte code point in a DBCS code page. See Double-byte character set (DBCS).
Leading characters: Characters—such as opening quotation marks, opening parentheses, and currency signs—that shouldn’t be separated from succeeding characters.
Left-to-right (LTR) text: Text that flows from left to right. There are two primary kinds of text: left-to-right text such as in English and Latin languages, and right-to-left (RTL) text such as in Arabic and Hebrew.
Left-to-right embedding (LRE) mark: In a document, it signals that a piece of text is to be treated as embedded left to right. For example, an English quotation in the middle of an Arabic sentence could be marked as being embedded left to right. (LRE affects word order, not character order.)
Left-to-right override (LRO) mark: A Unicode control character (U+202D) that forces characters following it to be treated as strong left-to-right characters. Allows for nested directional overrides of bidirectional characters. See Right-to-left override (RLO) mark.
Letter: (1) The basic element of an alphabet. (2) A higher level of abstraction than Character. For example, both the Spanish “ch” and the Danish “aa” can be considered as single letters for some purposes. (Both sort as a single character.) See Text element and Alphabet.
Levels of localization: The amount of translation and customization necessary to create different language editions. The levels, which are determined by balancing risk and return, range from translating nothing to shipping a completely translated product with customized features.
Ligature: Two or more characters combined to represent a single typographical character. The modern Latin script uses only a few. Other scripts use many ligatures that depend on font and style. Some languages, such as Arabic, have mandatory ligatures. Other languages have characters that were derived from ligatures, such as the German ligature of long and short “s” (ß) and the ampersand (&), which is the contracted form of the Latin word “et.”
Linear gradient brush: In GDI+, can be horizontal or vertical. For example, a horizontal gradient brush can be configured to change color as you move from the left side of a figure to the right side.
Literal: In program code, a string surrounded by double quotation marks or a character surrounded by single quotation marks.
Little-endian: A computer architecture that stores multibyte numerical values with the least significant byte values first. On systems using little-endian architecture, the letter “A” (U+0041) is stored as 0x41 0x00. See Big-endian.
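The byte order described above can be reproduced with a short Python sketch that uses only the standard `struct` module and the built-in UTF-16 codecs. It is an illustration of the storage order, not of any particular Windows API.

```python
import struct

CODE_POINT_A = 0x0041  # the letter "A"

# "<H" packs an unsigned 16-bit value least significant byte first,
# ">H" packs it most significant byte first.
little = struct.pack("<H", CODE_POINT_A)
big = struct.pack(">H", CODE_POINT_A)

print(little)  # b'A\x00'  (0x41 0x00)
print(big)     # b'\x00A'  (0x00 0x41)

# The explicit-endian UTF-16 codecs produce the same byte sequences.
assert "A".encode("utf-16-le") == little
assert "A".encode("utf-16-be") == big
```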
Locale: The collection of features of the user’s environment that is dependent on language, country/region, and cultural conventions. The locale determines conventions such as sort order; keyboard layout; and date, time, number, and currency formats. In Windows, locales usually provide more information about cultural conventions than about languages.
Locale-aware: Exhibiting different behavior or returning different data, depending on the locale. For example, the Win32 sorting functions return different results depending on the locale parameter sent to each function.
Locale ID (LCID): A 32-bit value defined by Windows that consists of a language ID, a sort ID, and reserved bits.
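A minimal Python sketch of the LCID layout, assuming the arrangement used by the Windows SDK macros MAKELCID, LANGIDFROMLCID, and SORTIDFROMLCID (language ID in the low 16 bits, sort ID in the next 4 bits, remaining bits reserved). The function names are illustrative only.

```python
# Assumed layout: bits 0-15 language ID, bits 16-19 sort ID,
# bits 20-31 reserved.

def make_lcid(langid: int, sort_id: int = 0) -> int:
    """Combine a language ID and a sort ID into a 32-bit locale ID."""
    return ((sort_id & 0xF) << 16) | (langid & 0xFFFF)

def langid_from_lcid(lcid: int) -> int:
    """Extract the 16-bit language ID."""
    return lcid & 0xFFFF

def sortid_from_lcid(lcid: int) -> int:
    """Extract the 4-bit sort ID."""
    return (lcid >> 16) & 0xF

# German (Germany) with the default sort, then with sort ID 1
# (the phone-book sort).
print(hex(make_lcid(0x0407)))     # 0x407
print(hex(make_lcid(0x0407, 1)))  # 0x10407
```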
Locale-sensitive: Exhibiting different behavior or returning different data, depending on the locale. For example, the Win32 sort functions return different results depending on the locale parameter sent to each function.
Localizable resource: Any element of a program’s UI that requires translation or modification for different languages. These elements are either UI resources or resources that need to be modified for the adaptation of a localized product (font information, locale information, folder names, account names, and so on).
Localizability: The design of the software code base and resources such that a program can be localized into different language editions without any changes to the source code.
Localization: The process of adapting a program for a specific local market, which includes translating the user interface, resizing dialog boxes, customizing features (if necessary), and testing results to ensure that the program still works.
L10N: Abbreviated form of LOCALIZATION. L + 10 characters + N.
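The "first letter + count + last letter" pattern can be written as a small Python helper. The name `numeronym` is just an illustrative choice, and the sketch assumes words of at least three letters.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of omitted letters + last letter."""
    if len(word) < 3:
        return word  # nothing to abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("localization").upper())          # L10N
print(numeronym("internationalization").upper())  # I18N
```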
Localization kit: A subset of tools, source files, and binary files that can be used to create a localized edition of a program. Generally given to translators or third-party contractors.
Logical order: In the same order in which it is typed. Generally refers to text that might be displayed in a different order, such as Arabic, Hebrew, or bidirectional text. See Visual order.
Logograph, or logographic: From the Greek “logo,” meaning “word”: a letter, symbol, or sign used to represent an entire word. Chinese characters are more properly termed “logographic” than “ideographic” because they represent words or parts of words rather than abstract concepts.
Lowercase: Denotes letters that are not capitalized. For instance, the word “nationality” is all lowercase. The notion of lowercase does not apply to East Asian and Middle Eastern scripts.
Mark: In typography, a glyph for a character like a diacritic or tone mark that combines with other marks or characters. Marks can be spacing or nonspacing glyphs in a font.
Mark attachment: Used in typography when marks need to be attached or positioned to other marks, base glyphs, or ligatures.
Message table: A Win32 resource that uses sequential numbers rather than escape letters to mark replacement parameters, making it convenient to store alert messages and error messages that contain several replacement parameters.
Microsoft Layer for Unicode (MSLU): Enables Unicode applications to run on code page–based versions of Windows (Windows 95, Windows 98, and Windows Me).
Mirroring: System-provided support that offers a true right-to-left (RTL) look and feel to the user interface when creating localized applications for RTL languages (such as for Arabic and Hebrew versions of Windows 98, Windows Me, Windows 2000, and Windows XP).
Mixed environment: A computer environment, usually a network, in which the operating systems of different computers are based on different character encodings.
MLang: A Component Object Model (COM) component that provides a variety of services. These services include detecting the character encoding used by Web pages and e-mails, converting text from one encoding to another as part of an import or export operation, and displaying characters that are not included within the font specified for parts of a Web page.
MM_TEXT: Unit of measure used to convert logical units to device units; it also defines the orientation of the device’s x- and y-axes. GDI uses the mapping mode to convert logical coordinates into the appropriate device coordinates. The MM_TEXT mode allows applications to work in device pixels, where 1 unit is equal to 1 pixel. The physical size of a pixel varies from device to device.
Mode biasing: Incorporation of logic that biases the method used for input (mode) toward the type of input that is expected. For example, the name field on a Contacts dialog box will have information telling the input method that people’s names are expected. With this information, the input method biases toward those results and thus provides more accurate input.
Modeless input: A method of entering East Asian–language input with an IME. In contrast with modal input, modeless input allows you to easily and seamlessly switch back and forth between the composition mode and the direct mode (the document itself). You can easily correct, navigate, and make input to the general document, whereas with modal input the rest of the document is temporarily unavailable as long as you are in composition mode.
Morpheme: The smallest meaningful unit of a word. The word “dog” is one morpheme. The word “dogs” is two morphemes: “dog” + the plural marker “s.” Many ideographs are based on morphemes.
Multilingual: Supporting more than one language simultaneously. Often implies the ability to handle more than one script or character set.
Multilingual User Interface (MUI) Pack: A set of language-specific resources that can be added to the English version of Windows XP Professional and the .NET Server. Once installed, MUI Packs allow the UI language of the operating system to be changed to one of 33 supported languages, depending on user preference. The MUI Pack is the same as the MultiLanguage version of Windows 2000 Professional and Server, though it provides additional functionality.
Multiple-document interface (MDI): A UI in an application that allows the user to have more than one document open at the same time.
Mutex: A synchronization object that ensures only one thread at a time can access a shared resource. A thread must have ownership of the mutex before it can access the resource. The thread becomes blocked if the mutex is owned by another thread.
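The ownership rule can be illustrated with Python's `threading.Lock`, which stands in here for a Win32 mutex. This is an analogy under that assumption rather than the Windows API itself, and the names `add_many` and `run` are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # plays the role of the mutex

def add_many(n: int) -> None:
    """Increment the shared counter n times, one owner at a time."""
    global counter
    for _ in range(n):
        # A thread blocks here while another thread owns the lock.
        with counter_lock:
            counter += 1

def run(workers: int = 4, n: int = 10_000) -> int:
    """Start several threads that all update the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run())
```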
National standard: A linguistic rule, measurement, educational guideline, or technology-related convention as defined by a government or by the International Organization for Standardization. Examples include character sets, keyboard layouts, and some cultural conventions, such as punctuation.
Network News Transfer Protocol (NNTP): The Internet protocol that governs the transmission of newsgroups.
Neutral character: A character whose directionality (right-to-left or left-to-right) is dependent on the directionality of the characters that surround it. See Contextual analysis.
Neutral culture: In the .NET Framework, refers to cultures identified by language only (with no associated geographic region). A neutral culture is indicated by a two-letter code, such as “de” for German.
NLS API: Acronym for National Language Support API. The set of system functions in 32-bit Windows containing information that is based on language and cultural conventions.
No-compile localization: A process of localization where the code is not recompiled.
Noncompile mirroring: Activating mirroring for the binary dynamic-link library resource files of an application, without the need for rebuilding the application.
Nonspacing character: A character, such as a diacritic, that has no meaning by itself but overlaps an adjacent character to form a third character.
OpenType: An extension of the TrueType font format, adding support for PostScript font data.
OpenType Layout: An extension to the OpenType format designed to provide support for international and high-end typography.
Original Equipment Manufacturer (OEM): See OEMCP.
OEMCP: Default OEM code page of the system. The OEM code page is used for conversions of MS-DOS-based, text-mode applications.
Outline: A series of contours made up of straight lines and curves that define the shape of a glyph.
Overflow characters: Punctuation characters that are allowed to extend beyond the right margin for horizontal text or below the bottom margin for vertical text.
Overload: Within the context of the .NET Framework, the concept of defining a procedure in multiple versions, using the same name but different argument lists. The purpose of an overload is to define several closely related versions of a procedure without having to differentiate them by name. You do this by varying the argument list.
Path gradient brush: In GDI+, a path gradient brush can be configured to change color as you move from the center of a figure toward the boundary.
Phoneme: A unique individual sound used in a language.
Plaintext: Computer-encoded text that contains only code elements and no other formatting or structural information (for example, font size, font type, or other layout information). Plaintext exchange is commonly used between computer systems that might have no other way to exchange information.
Pop directional formatting (PDF): In HTML, PDF terminates the effects of the last explicit code (either embedding or override) and restores the bidirectional state to what it was before the last left-to-right embedding (LRE), right-to-left embedding (RLE), right-to-left override (RLO), or left-to-right override (LRO) control characters.
Positional forms: Refers to the shape of a character that varies with the character’s position in a word. For example, the Arabic character “ha” can take any of four shapes, depending on whether the character stands alone or whether it falls at the beginning, middle, or end of a word.
Precomposed character: A character that is equivalent to a sequence of one or more characters. It is also known as a “composed character” or a “composite character.” Thus the combining character sequence “a + ' ” forms the precomposed, composed, or composite character “á.”
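The equivalence between a combining sequence and its precomposed form can be checked with Python's standard `unicodedata` module. This sketch uses Unicode normalization forms NFC and NFD, which the entry above does not name, to compose and decompose the example character.

```python
import unicodedata

combining_sequence = "a\u0301"  # "a" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"          # LATIN SMALL LETTER A WITH ACUTE

# NFC composes the sequence into the single precomposed character.
composed = unicodedata.normalize("NFC", combining_sequence)
# NFD decomposes the precomposed character back into the sequence.
decomposed = unicodedata.normalize("NFD", precomposed)

print(len(combining_sequence), len(composed))  # 2 1
print(composed == precomposed)                 # True
print(decomposed == combining_sequence)        # True
```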
Private-use zone: The area in Unicode repertoire from U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD that is set aside for vendor-specific or user-designed characters.
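A short Python sketch that tests whether a character falls in one of the three ranges listed above. The helper name `is_private_use` is hypothetical. The standard `unicodedata` module reports the general category `Co` for the same characters.

```python
import unicodedata

# The three private-use ranges given in the entry above.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(ch: str) -> bool:
    """Return True if the character lies in a private-use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

for ch in ("\ue000", "\U000f0000", "A"):
    print(f"U+{ord(ch):04X}", is_private_use(ch), unicodedata.category(ch))
```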
Radicals: A group of strokes in a Chinese character that are treated as a unit for the purposes of sorting, indexing, and classification. A character can contain more than one element that is recognized as a radical, but each character contains only one element, called the “main radical,” that is used as the indexing radical. The main radical often gives a hint as to the general meaning of the character, and other radicals in the character might indicate how the character is pronounced.
Rapid Application Development: Provides the ability to develop and deploy applications quickly by automating much of the development process and eliminating repetitive tasks.
RCDATA resource: A custom Windows resource element.
Reading order: The overall direction of a sequence of text. Whereas words in a given script always flow in the direction associated with that script (for example, LTR for Latin, RTL for Arabic and Hebrew), the flow of the sentence itself depends on the reading order. For example, a mixture of Arabic and French text can be regarded as French-embedded in an overall Arabic sentence, implying RTL reading order, or as Arabic-embedded in French, implying LTR reading order.
Registry: A Windows file that stores user preferences, including international settings as well as application-specific settings.
Release delta: The time between the release of the domestic product and the release of the localized edition.
Remoting: In the .NET Framework, remoting allows objects to interact with one another across application domains. The framework provides a number of services, including activation and lifetime support, as well as communication channels responsible for transporting messages to and from remote applications.
Request For Comments (RFC) documents: The written definitions of Internet protocols and policies.
Resource: (1) An element, such as a string, icon, bitmap, cursor, dialog, accelerator, or menu, that is included in a Windows resource (.RC) file. (2) Any item that needs to be translated.
Rich text: Text saved with formatting instructions that multiple applications, including compatible Microsoft applications, can read and interpret.
Right-to-left (RTL) text: Text that flows from right to left. Examples are Arabic and Hebrew.
Right-to-left embedding (RLE) mark: In a document, it signals that a piece of text is to be treated as embedded right to left. For example, a Hebrew phrase in the middle of an English quotation could be marked as being embedded right to left. (RLE affects word order, not character order.)
Right-to-left override (RLO) mark: A Unicode control character (U+202E) that forces characters following it to be treated as strong right-to-left characters. Allows for nested directional overrides of bidirectional characters. See Left-to-right override (LRO) mark.
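The embedding, override, and pop characters defined in this glossary can be inspected with Python's standard `unicodedata` module. The sketch below lists their code points and bidirectional classes. The helper `force_ltr` is a hypothetical illustration of wrapping text in an override that is closed again by pop directional formatting (U+202C).

```python
import unicodedata

# Explicit directional formatting characters, keyed by abbreviation.
BIDI_CONTROLS = {
    "LRE": "\u202a",
    "RLE": "\u202b",
    "PDF": "\u202c",
    "LRO": "\u202d",
    "RLO": "\u202e",
}

for abbr, ch in BIDI_CONTROLS.items():
    print(f"U+{ord(ch):04X}", abbr, unicodedata.name(ch))

def force_ltr(text: str) -> str:
    """Wrap text in LRO ... PDF so a renderer treats it as strong left-to-right."""
    return BIDI_CONTROLS["LRO"] + text + BIDI_CONTROLS["PDF"]

wrapped = force_ltr("abc")
print([f"U+{ord(c):04X}" for c in wrapped])
```

How the wrapped text is displayed depends on the rendering engine; the code only inserts the control characters.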
Romaji: A writing system based on the Latin alphabet that is used to represent Japanese text.
Round-trip conversion: Mapping a character from one character encoding to another and back. Of particular interest is how well information is preserved during round-trip conversion.
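A minimal Python sketch of a round-trip check between Unicode text and a code page, using the standard codecs. The helper name `round_trips` is hypothetical, and Windows code page 1252 is chosen only as an example target encoding.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives encoding to `encoding` and back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # At least one character has no mapping in the target encoding.
        return False

print(round_trips("café", "cp1252"))  # True
print(round_trips("日本", "cp1252"))   # False

# With a replacement policy the conversion succeeds but loses information.
lossy = "日本".encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # ??
```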
Ruby: An annotation or pronunciation guide for a string of text. The string of text annotated with ruby text is referred to as the “base text.”
Run-time library: Functions included with a C compiler that programs can call to perform various basic operations.
Runs: In scripts, portions of an item that have continuous formatting attributes.
Scan code: The value sent from the keyboard to the keyboard driver that represents which key was pressed.
Screen coordinates: The system and applications specify the position of a window on the screen in screen coordinates. The full position of a window is often described by a RECT structure containing the screen coordinates of two points that define the upper-left and lower-right corners of the window.
Screen dump: A bitmap of an element in a program’s graphical user interface, such as a dialog box or menu.
Script: A collection of characters for displaying written text, all of which have a common characteristic that justifies their consideration as a distinct set. One script can be used for several different languages (for example, Latin script, which covers all of Western Europe). Some written languages require multiple scripts (for example, Japanese, which requires at least three scripts—the hiragana and katakana syllabaries and the kanji ideographs imported from China). This sense of the word “script” has nothing to do with programming scripts such as Perl or Visual Basic Scripting Edition (VBScript).
Scripting engine: Component that handles character line measurement, display, caret movement, character selection, justification, shaping, and line breaking for complex scripts.
Separators: Symbols used to separate items in a list, mark the thousands place in numbers, or represent the decimal point. Different locales follow different conventions for separators.
Setup project: Allows you to create installers in order to distribute an application in Visual Studio .NET.
Shift-JIS: The multibyte encoding developed by Microsoft for Japanese that is based on Japanese Industrial Standard (JIS) X 0208. The name comes from the way the lead bytes in Shift-JIS shift around the encoding range of half-width katakana in JIS X 0201.
Shortcut key: A keyboard combination that activates a program command directly, as an alternative to activating the command through the program menus.
Simple Mail Transfer Protocol (SMTP): A protocol for sending messages from one computer to another on a network; used on the Internet to route e-mail.
Simplified Chinese: The Chinese script used in the People’s Republic of China and Singapore. It consists of several thousand ideographic characters that are simplified versions of traditional Chinese characters.
Simultaneous ship, or “sim ship”: The release of localized editions of a product at the same time as the domestic product; or a short release delta, usually within 30 days (to allow for nondevelopment needs).
Single binary: A functional binary that is fully globalized and can be used as is for any language version of the software.
Single-byte character set (SBCS): A character encoding in which each character is represented by 1 byte. Single-byte character sets are mathematically limited to 256 characters.
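The 256-character limit follows from the 256 possible values of one byte. A brief Python sketch, using ISO 8859-1 (`latin-1`) as the example because Python's codec for it maps every byte value to a character:

```python
# Every possible single-byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decode each byte as one character in a single-byte character set.
chars = all_bytes.decode("latin-1")

print(len(chars))       # 256
print(len(set(chars)))  # 256 distinct characters, and no room for more

# Each character encodes back to exactly one byte.
assert all(len(c.encode("latin-1")) == 1 for c in chars)
```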
Slant: The obliqueness or tilt of the glyphs in a font. The most common slants are “regular” and “italic.”
Smart tags: A technology introduced in Office XP that provides users with the ability to associate text and data with actions. Smart tags can provide actions for names, dates, times, telephone numbers, addresses, stock ticker symbols, and so on.
Software Development Kit (SDK): A set of tools and libraries for creating software applications for Windows operating systems.
Sort keys: Numeric representations of a sort element based on locale-specific sorting rules. A sort key consists of several weighted components that represent a character’s script, diacritics, case, and so on.
Spacing character: A character with a nonzero width.
Specialized word break and justification: Refers to scripts, such as Thai, that have complex rules for dividing words between lines or justifying text on a line.
Specification, or “spec”: A detailed plan of a program’s user-interface design and the expected functionality of program features.
SQL-92: A standard for relational database management systems (RDBMSs), designed by Technical Committee H2 on Database of the InterNational Committee for Information Technology Standards (INCITS), formerly known as the National Committee for Information Technology Standards (NCITS).
SQL Query Analyzer: A graphical user interface for designing and testing Transact-SQL statements, batches, and scripts interactively.
SQL Server Enterprise Manager: Allows enterprise-wide configuration and management of SQL Server and SQL Server objects.
Status window: The window of an IME in which the user can change the IME’s conversion mode or input mode.
Stroke count: The number of strokes it takes to draw an ideographic character.
Strong character: A character from which text direction can be determined.
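Python's standard `unicodedata` module exposes each character's bidirectional class, which gives a quick way to see which characters are strong. The sketch assumes the three strong classes of the Unicode bidirectional algorithm (`L`, `R`, and `AL`). The helper name `is_strong` is hypothetical.

```python
import unicodedata

# Bidirectional classes treated as strong: left-to-right,
# right-to-left, and right-to-left Arabic letter.
STRONG_CLASSES = {"L", "R", "AL"}

def is_strong(ch: str) -> bool:
    """Return True if the character determines text direction by itself."""
    return unicodedata.bidirectional(ch) in STRONG_CLASSES

for ch in ("A", "\u05d0", "\u0627", "1", "!", " "):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), is_strong(ch))
```

Digits, punctuation, and spaces fall outside the strong classes, so their direction is resolved from the surrounding text.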
Substitutions: English, French, and other languages based on Latin can substitute a single ligature, such as “fi,” for this particular ligature’s component glyphs “f” and “i.” Conversely, the individual “f” and “i” glyphs could replace the ligature, possibly to give a text-processing application more flexibility when spacing glyphs to fill a line of justified text.
Syllabary: A set of written characters in which each character represents a syllable (for example, a consonant sound followed by a vowel sound). Examples of syllabaries include Japanese katakana and hiragana and the Indic scripts.
System locale: The system locale determines the Windows code page used by the ANSI (non-Unicode) version of Win32 APIs. String and character parameters passed to a Win32 ANSI API are converted from this Windows code page to Unicode.
Taiwan calendar: The Taiwan calendar works exactly like the Gregorian calendar, except the year is different. Years in Taiwan are calculated from the year 1912, when Dr. Sun Yat-sen founded a republic in China. 1912 represents Year 1 of the Taiwan calendar. To calculate the year in the Taiwan calendar, take the year in the Gregorian calendar and subtract 1911. For instance, the Gregorian calendar year 1971 would be 60 in the Taiwan calendar.
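The conversion rule above amounts to a single subtraction. A small Python sketch, with the hypothetical helper names `taiwan_year` and `gregorian_year`:

```python
TAIWAN_EPOCH_OFFSET = 1911  # Gregorian 1912 is Year 1

def taiwan_year(gregorian: int) -> int:
    """Convert a Gregorian year to a Taiwan calendar year."""
    if gregorian < 1912:
        raise ValueError("the Taiwan calendar starts in Gregorian year 1912")
    return gregorian - TAIWAN_EPOCH_OFFSET

def gregorian_year(taiwan: int) -> int:
    """Convert a Taiwan calendar year back to a Gregorian year."""
    return taiwan + TAIWAN_EPOCH_OFFSET

print(taiwan_year(1912))  # 1
print(taiwan_year(1971))  # 60
```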
Text element: A script’s smallest unit of text that can be displayed or edited.
Text Object Model (TOM) interfaces: A substantial set of text-manipulation interfaces. Text solutions such as Microsoft Word and rich edit controls support the TOM feature set. Since rich edit controls ship with Windows operating systems, they are the standard means of obtaining TOM functionality.
Thai calendar: The Thai calendar uses the Buddhist Era (BE), which is 543 years older than the Christian Era (AD). To convert from a BE date to an AD date, subtract 543. Thus BE 2543 = AD 2000. Before April 1, 1889 AD, Thailand used a lunar calendar of 12 or 13 months, each with 29 or 30 days and each starting with the new moon. The Gregorian calendar was adopted on April 1, 1889.
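The year conversion described above, as a small Python sketch. The helper names `be_to_ad` and `ad_to_be` are hypothetical, and the sketch covers only the year offset, not the older lunar calendar.

```python
BUDDHIST_ERA_OFFSET = 543

def be_to_ad(be_year: int) -> int:
    """Convert a Buddhist Era year to a Christian Era (AD) year."""
    return be_year - BUDDHIST_ERA_OFFSET

def ad_to_be(ad_year: int) -> int:
    """Convert a Christian Era (AD) year to a Buddhist Era year."""
    return ad_year + BUDDHIST_ERA_OFFSET

print(be_to_ad(2543))  # 2000
print(ad_to_be(2000))  # 2543
```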
Thread locale: The locale of a given thread. It is inherited upon creation from the current user locale and can be changed at run time to any valid locale (per thread). Calls to NLS APIs can use this locale to format numbers, dates, and times.
Traditional Chinese: The set of Chinese characters, used in such countries/regions as Hong Kong SAR, Macau SAR, and Taiwan, that is consistent with the original form of Chinese ideographic characters that are several thousand years old.
Trail-byte: The second byte of a 2-byte code point in a DBCS code page. See Double-byte character set (DBCS).
Transact-SQL: A method to communicate with and access data on SQL Server. Applications that communicate with SQL Server do so by sending Transact-SQL statements to the server, regardless of an application’s UI.
TrueType: A digital font technology (designed by Apple Computer and now used by both Apple and Microsoft in their operating systems) that offers superior display quality on computer screens and printers. See Bitmap font.
Typeface: Name given to a particular style of text. In contrast, a font is an implementation of a typeface.
Typography: The process of displaying text in a variety of fonts, sizes, and styles.
Umlaut: The two dots placed above a vowel, such as “ä,” “ö,” and “ü,” which are used in German and other European languages to indicate a change in the pronunciation of the vowel. See Diaeresis.
Unicode: A worldwide character encoding that includes most of the world’s scripts; it is developed, maintained, and promoted by the Unicode Consortium, a nonprofit computer industry organization. (The official Unicode Consortium Web site is http://www.unicode.org.)
Uppercase: Denotes letters that are capitalized. For instance, acronyms (such as “HTML”) typically consist of all uppercase letters. The notion of uppercase does not apply to East Asian and Middle Eastern scripts.
Usability testing: A series of tests in which users are observed trying to complete a given set of tasks. The purpose of usability testing is to determine how intuitive test subjects find new program features.
User assistance (UA): Refers to any form of documentation (including printed documents, online documents, and Help files) that corresponds to a particular product.
User-defined character: See End-User Defined Character (EUDC).
User locale: The user preferences for formatting of dates, currencies, numbers, and so on. The user locale is a per-user setting, and does not require the user to restart or to log off or log on the computer.
UTC: Coordinated Universal Time (often spelled out as “Universal Coordinated Time”). It is the standard time common to every place in the world. Formerly and still widely called “Greenwich Mean Time” (GMT) and also “World Time.” It is expressed using a 24-hour clock.
Version stamp: In Windows, the information included in the resource file that specifies the company name, application name, copyright, version number, and language edition of a program.
Vertical kerning: A feature used in Arabic to position diacritics at different heights, since it is considered rudimentary to display diacritics all at the same height.
Vertical metrics: A collective term to indicate information from a set of fields that is available in various font tables for determining vertical spacing when laying out text. Some of these fields can be used to arrive at the vertical distance required between two consecutive horizontal baselines, while others can be used to determine vertical spacing for each glyph when text is laid out vertically.
Visual C++: Microsoft’s C and C++ compiler and development system.
Visual order: The ordering used to display glyphs on a screen, printed page, or other medium. Usually used with bidirectional text, because reordering is required to go from logical order to visual order. See Logical order.
Wansung: The Korean standard character set (KS C-5601-1987), which corresponds to Windows code page 949. It covers the most common Hangul character combinations. Extended Wansung covers all possible Hangul combinations.
–W APIs: The Win32 API entry points that expect string parameters to be wide characters (encoded in Unicode). See –A APIs.
wchar_t: The ANSI C–defined wide-character type, usually either 16 or 32 bits. ANSI rules say that wchar_t should be at least as wide as the char data type, and that the wide-character equivalents of the C language source character set should be created by simple zero or sign extension.
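The platform-dependent width of `wchar_t` can be observed from Python through the standard `ctypes` module, whose `c_wchar` type corresponds to the C `wchar_t` of the platform's compiler. This is a quick check rather than a portable guarantee.

```python
import ctypes

# Size of the platform's wchar_t, in bits.
WCHAR_BITS = ctypes.sizeof(ctypes.c_wchar) * 8

# Typically 16 on Windows and 32 on most Unix-like systems.
print(f"wchar_t is {WCHAR_BITS} bits wide on this platform")
```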
Web Forms: Generally applies to the design-time technology that enables the authoring of ASP.NET pages in a visual designer.
Web Parts: Reusable components that contain Web-based content, such as XML code, HTML pages, or script. Web Parts support a set of standard properties that determine how the Web Parts are rendered in a Digital Dashboard. Office XP Developer makes it possible for you to create and customize Digital Dashboards and Web Parts. See Digital Dashboard.
Weight: The thickness or darkness of glyphs in a font. The most common weights are “regular” and “bold,” but some font families can include such weights as “light,” “demi,” “heavy,” “extra bold,” and so on.
Wide character: A character encoded by a wchar_t or a 16-bit (WORD) data type. Often used to refer to UTF-16-encoded characters.
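A Python sketch that splits text into 16-bit UTF-16 code units, the values a 16-bit wide character would hold. The helper name `utf16_units` is hypothetical. It shows that characters outside the Basic Multilingual Plane occupy two 16-bit units (a surrogate pair), a detail the entry above does not spell out.

```python
def utf16_units(text: str) -> list:
    """Return the 16-bit UTF-16 code units for the given text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

print([hex(u) for u in utf16_units("A")])           # ['0x41']
print([hex(u) for u in utf16_units("\U0001F600")])  # ['0xd83d', '0xde00']
```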
Win32 API: The set of 32-bit functions supported by Windows.
Win32s API: A subset of the Win32 API that makes it possible to create a single binary that runs on all 32-bit versions of the Windows platform, including Windows 3.1/95/98/Me.
Windows Forms: The new platform for Windows application development, based on the .NET Framework. Provides a clear, object-oriented, extensible set of classes that enables you to develop rich Windows applications. Additionally, Windows Forms can act as the local UI in a multitier, distributed solution.
Windows Forms controls: Reusable components that encapsulate UI functionality and are used in client-side Windows applications. Not only do Windows Forms provide many ready-to-use controls, they also provide the infrastructure for developing your own controls.
Windows Forms Designer: Provides a rapid development solution for creating Windows applications. It is the locus of visual, client-based forms design. Using the designer, you can add components, data controls, or Windows-based controls to a form.
Windows Services: The idea (originating with MS-DOS) of a program operating in the background while a user is doing something else in the foreground has evolved on Windows systems to a Windows Service. Examples include plug-and-play device detection, running message queues, file indexing, and task scheduling. In addition to the services the operating system installs itself, programs such as SQL Server install their own Windows Services to implement functionality that must be available to all users.
World-ready: A program that has been properly globalized and developed for ease of localization (known as “localizability”). See Localizability.
Writing system: The collection of scripts and orthography required to represent a given human language in visual media.
XML: Acronym for Extensible Markup Language. An open standard for exchanging structured documents and data over the Internet that was introduced by the World Wide Web Consortium (W3C) in November 1996. XML is a simplified version of Standard Generalized Markup Language (SGML).