7.5.2 Unisort.txt Data
Windows Server 2003 and Windows XP get their sorting data from a source file named Unisort.txt (for more information, see [MS-ASRT]). Unisort.txt is a UTF-8 file. All machine-readable data is in ASCII, although some comments contain UTF-8 data. Code points are labeled using UTF-16 values.
The file is organized into sections. Each section contains records, and each record consists of tab-delimited fields. Optional comments begin with a semicolon. Each section has a label and may also have a subsection label.
Figure 3: Unisort.txt file arrangement
Note that a label is any field that does not begin with a numeric (0xNNNN) value. Blank lines and any characters following a semicolon are ignored.
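The line-level rules above can be sketched in Python as follows. This is a minimal illustration, not the parser used by Windows: the function name classify_line, the decision to drop empty fields produced by leading or trailing tabs, and the reading of "numeric" as a complete 0x-prefixed hexadecimal field are all assumptions, and the sample weights in the usage lines are made up.

```python
import re

# Assumption: a "numeric (0xNNNN) value" is a field made up entirely of
# "0x" followed by hexadecimal digits.
_NUMERIC_FIELD = re.compile(r"0[xX][0-9A-Fa-f]+")


def classify_line(line):
    """Classify one line of the file as 'blank', 'label', or 'record'.

    Returns a (kind, fields) tuple, where fields is the list of
    non-empty tab-delimited fields left after removing any comment.
    """
    # Everything after the first semicolon is a comment and is ignored.
    content = line.split(";", 1)[0]
    # Fields are tab delimited; empty fields are dropped (an assumption).
    fields = [f.strip() for f in content.split("\t") if f.strip()]
    if not fields:
        return ("blank", [])
    if _NUMERIC_FIELD.fullmatch(fields[0]):
        return ("record", fields)
    # Anything that does not start with a numeric field is a label.
    return ("label", fields)


# Hypothetical sample lines; the weight values are illustrative only.
print(classify_line("SORTKEY"))
print(classify_line("0x0041\t14\t2\t2\t18\t; sample comment"))
print(classify_line("; a comment-only line"))
```

A caller would typically feed each line of the file through such a function and start a new section or subsection whenever a label is returned.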
This document will use the following notation to describe the processing of the file:
"open" will be used to indicate that queries for records in a specific section will be made. To open the section with the SORTKEY label and DEFAULT sublabel, the following syntax will be used. The open section will be accessible using the "DefaultTable" name.
open section DefaultTable where name is SORTKEY\DEFAULT from unisort.txt
"select" will assign a line from the data file to be referenced by the assigned variable name. To select the highlighted row in the preceding figure, this document will use this notation. The selected row will be accessible using the name "CharacterRow".
set UnicodeChar to 0x0041
select record CharacterRow from DefaultTable where field 1 matches UnicodeChar
Values from selected records will be referenced by field number. The following would select the individual data fields from the selected row.
set CharacterWeight.ScriptMember to CharacterRow.Field2
set CharacterWeight.PrimaryWeight to CharacterRow.Field3
set CharacterWeight.DiacriticWeight to CharacterRow.Field4
set CharacterWeight.CaseWeight to CharacterRow.Field5
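As a rough illustration of the "select" and field-reference notation, the following Python sketch models an opened section as a list of records, each record being a list of field strings. The names select_record and weight_from_record are hypothetical, the weights are kept as the raw strings read from the file because this section does not state their numeric base, and the sample rows are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CharacterWeight:
    # Stored as raw field text; the numeric base is not assumed here.
    script_member: str
    primary_weight: str
    diacritic_weight: str
    case_weight: str


def select_record(table, code_point):
    """Return the first record whose field 1 matches code_point, else None."""
    for record in table:
        # Field 1 is written as 0xNNNN, so compare it as a hexadecimal value.
        if record and int(record[0], 16) == code_point:
            return record
    return None


def weight_from_record(record, first_field=2):
    """Build a CharacterWeight from four consecutive fields.

    first_field is the 1-based number of the ScriptMember field
    (Field2 in the notation above).
    """
    start = first_field - 1
    return CharacterWeight(*record[start:start + 4])


# Hypothetical stand-in for the opened SORTKEY\DEFAULT section.
default_table = [
    ["0x0041", "14", "2", "2", "18"],
    ["0x0042", "14", "9", "2", "18"],
]

character_row = select_record(default_table, 0x0041)
character_weight = weight_from_record(character_row)
print(character_weight)
```

Passing the starting field number as a parameter lets the same helper serve sections whose records carry more than one key field before the weights.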
Some sections of the data file are referenced by a locale LCID (locale identifier).
Figure 4: Record selection
To select the highlighted record, notation such as the following will be used.
set Character1 to 0x0043
set Character2 to 0x0068
set SortLocale to 0x0405
open section CompressionTable where name is SORTTABLES\COMPRESSION\LCID[SortLocale]\TWO from unisort.txt
select record CompressionRow from CompressionTable where field 1 matches Character1 and field 2 matches Character2
set CharacterWeight.ScriptMember to CompressionRow.Field3
set CharacterWeight.PrimaryWeight to CompressionRow.Field4
set CharacterWeight.DiacriticWeight to CompressionRow.Field5
set CharacterWeight.CaseWeight to CompressionRow.Field6
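The locale-keyed lookup can be sketched in Python along the same lines. Here the opened sections are modeled as a dictionary keyed by (LCID, subsection label); that layout, the function name select_compression, and the sample weight values are assumptions made for the example, since the exact way the LCID label is written in the file is not described in this section.

```python
def select_compression(tables, sort_locale, character1, character2):
    """Look up the weights for a two-character compression.

    tables maps (lcid, subsection) to a list of records; each record is
    a list of field strings. Returns a dict of weight fields, or None
    when the locale has no TWO table or no record matches.
    """
    compression_table = tables.get((sort_locale, "TWO"), [])
    for record in compression_table:
        if len(record) < 6:
            continue  # not enough fields to hold two keys and four weights
        # Fields 1 and 2 hold the two code points as 0xNNNN values.
        if (int(record[0], 16) == character1
                and int(record[1], 16) == character2):
            return {
                "ScriptMember": record[2],      # Field3
                "PrimaryWeight": record[3],     # Field4
                "DiacriticWeight": record[4],   # Field5
                "CaseWeight": record[5],        # Field6
            }
    return None


# Hypothetical data standing in for SORTTABLES\COMPRESSION\LCID[0x0405]\TWO.
tables = {
    (0x0405, "TWO"): [
        ["0x0043", "0x0068", "14", "46", "2", "18"],
    ],
}

print(select_compression(tables, 0x0405, 0x0043, 0x0068))
```

Returning None for a missing locale or pair leaves it to the caller to decide how to proceed; this section does not specify that behavior.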