2.4.1 Retrieving Text

The following algorithm specifies how to find the text at a particular character position (cp). Negative character positions are not valid.

  1. Read the FIB from offset zero in the WordDocument Stream.

  2. All versions of the FIB contain exactly one FibRgFcLcb97, though it can be nested in a larger structure. FibRgFcLcb97.fcClx specifies the offset in the Table Stream of a Clx. FibRgFcLcb97.lcbClx specifies the size, in bytes, of that Clx. Read the Clx from the Table Stream.

  3. The Clx contains a Pcdt, and the Pcdt contains a PlcPcd. Find the largest i such that PlcPcd.aCp[i] ≤ cp. As with all Plcs, the elements of PlcPcd.aCp are sorted in ascending order. Recall from the definition of a Plc that the aCp array has one more element than the aPcd array. Thus, if the last element of PlcPcd.aCp is less than or equal to cp, cp is outside the range of valid character positions in this document.

  4. PlcPcd.aPcd[i] is a Pcd. Pcd.fc is an FcCompressed that specifies the location in the WordDocument Stream of the text at character position PlcPcd.aCp[i].

  5. If FcCompressed.fCompressed is zero, the character at position cp is a 16-bit Unicode character at offset FcCompressed.fc + 2(cp - PlcPcd.aCp[i]) in the WordDocument Stream. That is, the text at character position PlcPcd.aCp[i] begins at offset FcCompressed.fc in the WordDocument Stream and each character occupies two bytes.

  6. If FcCompressed.fCompressed is 1, the character at position cp is an 8-bit ANSI character at offset (FcCompressed.fc / 2) + (cp - PlcPcd.aCp[i]) in the WordDocument Stream, unless it is one of the special values in the table defined in the description of FcCompressed.fc. That is, the text at character position PlcPcd.aCp[i] begins at offset FcCompressed.fc / 2 in the WordDocument Stream and each character occupies one byte.
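
Steps 3 through 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the FIB and Clx have already been parsed (steps 1 and 2), that PlcPcd.aCp is available as a list of integers, and that each Pcd has been reduced to a hypothetical (fc, f_compressed) pair taken from its FcCompressed. The function name char_at and its parameters are invented for this example. It also assumes 16-bit characters are stored little-endian, and it does not apply the special-value table from the description of FcCompressed.fc; compressed bytes are returned as the code point of the same numeric value.

```python
from bisect import bisect_right
import struct


def char_at(cp, a_cp, a_pcd, word_document):
    """Return the character at character position cp.

    a_cp          -- PlcPcd.aCp as a list of ints, sorted ascending
    a_pcd         -- one (fc, f_compressed) pair per Pcd (hypothetical layout)
    word_document -- the WordDocument Stream as a bytes object
    """
    if cp < 0:
        raise ValueError("negative character positions are not valid")
    if len(a_cp) != len(a_pcd) + 1:
        raise ValueError("aCp must have one more element than aPcd")
    # Step 3: cp is out of range if the last aCp element is <= cp
    # (or if no element of aCp is <= cp at all).
    if cp < a_cp[0] or a_cp[-1] <= cp:
        raise IndexError("cp is outside the range of valid character positions")

    # Step 3: largest i such that a_cp[i] <= cp.
    i = bisect_right(a_cp, cp) - 1

    # Step 4: the Pcd for this piece gives the location of its text.
    fc, f_compressed = a_pcd[i]

    if not f_compressed:
        # Step 5: two bytes per character, starting at fc.
        offset = fc + 2 * (cp - a_cp[i])
        # Assumption: 16-bit code units are little-endian.
        (code_unit,) = struct.unpack_from("<H", word_document, offset)
        return chr(code_unit)

    # Step 6: one byte per character, starting at fc / 2.
    offset = fc // 2 + (cp - a_cp[i])
    # Assumption: the special-value table is NOT applied here; the byte
    # is mapped to the code point with the same numeric value.
    return chr(word_document[offset])
```

For example, with a_cp = [0, 3, 5] and a_pcd = [(4, 0), (20, 1)], positions 0 through 2 are read as 16-bit characters starting at byte offset 4, and positions 3 and 4 are read as single bytes starting at byte offset 10. The binary search via bisect_right is a design choice for this sketch; a linear scan over aCp would give the same index.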