7.5.2 Unisort.txt Data

Windows Server 2003 and Windows XP get their sorting data from a source file named Unisort.txt (for more information, see [MS-ASRT]). Unisort.txt is a UTF-8 file. All machine-readable data is in ASCII, although some comments contain UTF-8 data. Code points are labeled using UTF-16 values.

The file is arranged as a group of sections. Each section consists of records, and each record consists of tab-delimited fields. Comments are optional and begin with a semicolon. Each section has a label and can also have a subsection label.

Figure 3: Unisort.txt file arrangement

Note that a label is any field that does not begin with a numeric (0xNNNN) value. Blank lines, and all characters from a semicolon to the end of the line, are ignored.
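
The reading rules above can be sketched in Python. This is a minimal illustration, not the implementation used by Windows: the function name classify_lines is hypothetical, and the sketch assumes that a label sits on a line of its own and that empty fields produced by indentation or trailing tabs can be dropped.

```python
import re

# A record's first field begins with a numeric value written as 0xNNNN.
_NUMERIC = re.compile(r"0x[0-9A-Fa-f]+")


def classify_lines(text):
    """Yield ("label", fields) or ("record", fields) for each meaningful line."""
    for raw in text.splitlines():
        # Characters following a semicolon are a comment and are ignored.
        line = raw.split(";", 1)[0]
        if not line.strip():
            continue  # blank line or comment-only line
        # Fields are tab delimited. Assumption: empty fields caused by
        # indentation or a trailing tab carry no data and can be dropped.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        # Assumption: a line is a label line when its first field is not numeric.
        kind = "record" if _NUMERIC.match(fields[0]) else "label"
        yield kind, fields
```

Calling list(classify_lines(text)) on the decoded file contents yields the labels and records in file order; grouping records under their section and subsection labels is left out because the exact nesting layout is not described here.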

This document will use the following notation to describe the processing of the file:

"open" indicates that subsequent queries are made against the records of a specific section. For example, the following syntax opens the section with the SORTKEY label and the DEFAULT sublabel. The opened section is then accessible using the name "DefaultTable".

open section DefaultTable
 where name is SORTKEY\DEFAULT from unisort.txt

"select" assigns a record (one line of the data file) to a variable name so that it can be referenced later. For example, the following notation selects the highlighted row in the preceding figure. The selected row is then accessible using the name "CharacterRow".

set UnicodeChar to 0x0041
select record CharacterRow from DefaultTable
 where field 1 matches UnicodeChar

Values from a selected record are referenced by field number. The following assigns the individual data fields of the selected row to the members of CharacterWeight.

set CharacterWeight.ScriptMember to CharacterRow.Field2
set CharacterWeight.PrimaryWeight to CharacterRow.Field3
set CharacterWeight.DiacriticWeight to CharacterRow.Field4
set CharacterWeight.CaseWeight to CharacterRow.Field5
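
As an illustration only, the "select" and field-reference notation could map onto Python as follows. The names select_record, weight_from_record, and parse_number are hypothetical, an opened section is assumed to be a list of records that are each a list of field strings, and the weight values in the sample table are made-up placeholders rather than data taken from Unisort.txt.

```python
from dataclasses import dataclass


@dataclass
class CharacterWeight:
    script_member: int
    primary_weight: int
    diacritic_weight: int
    case_weight: int


def parse_number(field):
    # Assumption: a field is either 0xNNNN hexadecimal or plain decimal.
    return int(field, 16) if field.lower().startswith("0x") else int(field)


def select_record(table, unicode_char):
    """Return the first record whose field 1 matches unicode_char, else None."""
    for record in table:
        if parse_number(record[0]) == unicode_char:
            return record
    return None


def weight_from_record(record):
    # Field numbers in the notation are 1-based: Field2 is record[1], and so on.
    return CharacterWeight(*(parse_number(f) for f in record[1:5]))


# Placeholder table standing in for the opened SORTKEY\DEFAULT section.
default_table = [["0x0041", "14", "2", "2", "18"]]
character_row = select_record(default_table, 0x0041)
character_weight = weight_from_record(character_row)
```

Returning None when no record matches is a choice made for this sketch; the notation itself does not say what happens when a select finds nothing.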

Some sections of the data file are specific to a locale and are referenced by its LCID (locale identifier).

Figure 4: Record selection

To select the highlighted record, notation such as the following will be used.

set Character1 to 0x0043
set Character2 to 0x0068
set SortLocale to 0x0405
open section CompressionTable
    where name is SORTTABLES\COMPRESSION\LCID[SortLocale]\TWO from unisort.txt
select record CompressionRow from CompressionTable
    where field 1 matches Character1 and field 2 matches Character2
set CharacterWeight.ScriptMember to CompressionRow.Field3
set CharacterWeight.PrimaryWeight to CompressionRow.Field4
set CharacterWeight.DiacriticWeight to CompressionRow.Field5
set CharacterWeight.CaseWeight to CompressionRow.Field6
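
The locale-specific lookup above might be sketched in Python as shown below. This is a hedged illustration under several assumptions: open_section, select_record, and parse_number are hypothetical helpers, parsed sections are assumed to be stored in a dictionary keyed by their label path with the LCID as one path element, and the weight values are made-up placeholders.

```python
def parse_number(field):
    # Assumption: a field is either 0xNNNN hexadecimal or plain decimal.
    return int(field, 16) if field.lower().startswith("0x") else int(field)


def open_section(sections, *path):
    """Return the records stored under a label path, or an empty list."""
    return sections.get(path, [])


def select_record(table, *keys):
    """Return the first record whose leading fields match keys, else None."""
    for record in table:
        if len(record) >= len(keys) and all(
            parse_number(record[i]) == key for i, key in enumerate(keys)
        ):
            return record
    return None


# Placeholder data standing in for SORTTABLES\COMPRESSION\LCID[0x0405]\TWO.
sections = {
    ("SORTTABLES", "COMPRESSION", 0x0405, "TWO"): [
        ["0x0043", "0x0068", "14", "46", "2", "18"],
    ],
}

character1, character2, sort_locale = 0x0043, 0x0068, 0x0405
compression_table = open_section(
    sections, "SORTTABLES", "COMPRESSION", sort_locale, "TWO"
)
compression_row = select_record(compression_table, character1, character2)

character_weight = None
if compression_row is not None:
    # Fields 3 through 6 of the notation are record[2] through record[5].
    names = ("ScriptMember", "PrimaryWeight", "DiacriticWeight", "CaseWeight")
    values = (parse_number(f) for f in compression_row[2:6])
    character_weight = dict(zip(names, values))
```

Matching on two leading fields instead of one is the only difference from the single-character lookup, which is why select_record takes a variable number of keys in this sketch.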