2.6 Compressed Token Sequence

The compressed token sequence (bitstream) contains the Huffman-encoded matches and literals using the Huffman trees specified in the block header. Decompression continues until the number of decompressed bytes corresponds exactly to the number of uncompressed bytes indicated in the block header.

The representation of an unmatched literal character in the output is simply the appropriate element index [0..255] from the main Huffman tree.

The representation of a match in the output involves several transformations, as shown in the following diagram. At the top of the diagram are the match length [2..257] and the match offset [0..WINDOW_SIZE-3]. The match offset and match length are split into subcomponents and encoded separately. For matches of length [258..32768], the token indicates match length 257, and then the additional value of the Extra Length field is encoded in the bitstream following the other match subcomponent fields.

The match subcomponents are shown in the following figure.

Match encoding subcomponents

Figure 1: Match encoding subcomponents