2.15 Index Lexicon File

The index lexicon file is a Unicode-encoded text file that lists the most frequent tokens appearing in the content index file of a master full-text index component of the current full-text index catalog. The query server uses it to determine alternative spelling variants for tokens encountered in received queries.

In a binary representation, the format of the file is as follows.


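 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
|        Unicode marker         |    ListOfTokens (variable)    |
+-------------------------------+-------------------------------+
|                              ...                              |
+---------------------------------------------------------------+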

Unicode marker (2 bytes): A 2-byte field that identifies a text file as Unicode-encoded. The bytes MUST be 0xFF followed by 0xFE, which is the byte order mark of little-endian UTF-16.

ListOfTokens (variable): An array of Unicode characters representing the list of the most frequent tokens in the catalog. Tokens are separated by newline characters, and each token consists of 1 to 64 non-space characters.
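
The layout above can be illustrated with a minimal reader sketch. It is not part of the specification and rests on several assumptions: the function name parse_lexicon is hypothetical, the token text is decoded as little-endian UTF-16 (inferred from the 0xFF 0xFE marker), both CR LF and LF are accepted as token separators, blank lines are ignored, and the 64-character limit is counted in Unicode code points rather than UTF-16 code units.

```python
import codecs

# The Unicode marker: 0xFF followed by 0xFE (the UTF-16 little-endian BOM).
UNICODE_MARKER = codecs.BOM_UTF16_LE
MAX_TOKEN_CHARS = 64  # upper bound on token length stated for ListOfTokens


def parse_lexicon(data: bytes) -> list[str]:
    """Return the tokens found in the raw bytes of an index lexicon file.

    Hypothetical helper for illustration only.
    """
    if data[:2] != UNICODE_MARKER:
        raise ValueError("missing Unicode marker (expected 0xFF 0xFE)")

    # Assumption: everything after the marker is UTF-16 LE text.
    text = data[2:].decode("utf-16-le")

    tokens = []
    for line in text.splitlines():
        if not line:
            # Assumption: tolerate empty lines, such as one produced by
            # a trailing newline at the end of the file.
            continue
        if len(line) > MAX_TOKEN_CHARS or any(ch.isspace() for ch in line):
            raise ValueError(f"invalid token: {line!r}")
        tokens.append(line)
    return tokens
```

A caller would typically read the file in binary mode and pass the bytes to parse_lexicon; a stricter reader might reject blank lines or count the length limit in UTF-16 code units instead.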