3.1 Example of a Clx
The following is an example of a Clx. This structure demonstrates the mapping between CP elements and the location of text in the file. See section 2.4.1, Retrieving Text.
Offset | Size | Structure | Value |
---|---|---|---|
0000009A | 02E8 | FibRgFcLcb97 - rgFcLcb97 | |
0000009A | 0108 | ... (omitted for brevity) - | |
000001A2 | 0004 | - fcClx | 0x000001F8 |
000001A6 | 0004 | - lcbClx | 0x0000002D |
000001AA | 01D8 | ... (omitted for brevity) - | |
Figure 3: Portions of the FibRgFcLcb97 structure, with emphasis on fcClx and lcbClx
As with all Word Binary files, this file has a Fib at an offset of zero in the WordDocument Stream. The preceding figure shows a portion of the FibRgFcLcb97 that is contained in that Fib. The FibRgFcLcb97 is very large. Most fields have been omitted here, for brevity.
fcClx: 0x000001F8 specifies the offset, in bytes, of a location in the Table Stream. A Clx begins at this offset.
lcbClx: 0x0000002D specifies the size, in bytes, of the Clx at offset 0x000001F8 in the Table Stream.
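The lookup described above can be sketched in a few lines. This is a minimal illustration, not a full Fib parser: it assumes the fcClx and lcbClx offsets shown in the preceding figure, reads them as little-endian 32-bit integers, and uses a synthetic zero-filled buffer in place of a real WordDocument Stream. The names `FC_CLX_OFFSET` and `read_clx_location` are hypothetical.

```python
import struct

# Offset of fcClx in this example's WordDocument Stream (taken from the figure);
# lcbClx follows it immediately.
FC_CLX_OFFSET = 0x1A2

def read_clx_location(word_document):
    """Return (fcClx, lcbClx) as read from the WordDocument Stream bytes."""
    fc_clx, lcb_clx = struct.unpack_from("<II", word_document, FC_CLX_OFFSET)
    return fc_clx, lcb_clx

# Synthetic stand-in for the stream: zeros, with the two example values patched in.
stream = bytearray(0x382)
struct.pack_into("<II", stream, FC_CLX_OFFSET, 0x000001F8, 0x0000002D)

fc_clx, lcb_clx = read_clx_location(bytes(stream))
# The Clx would then occupy table_stream[fc_clx : fc_clx + lcb_clx].
```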
The following shows the top level of the Clx at offset 0x000001F8 in the Table Stream.
Offset | Size | Structure | Value |
---|---|---|---|
000001F8 | 002D | Clx - Clx | |
000001F8 | 0000 | RgPrc - RgPrc | |
000001F8 | 002D | Pcdt - Pcdt | |
000001F8 | 0001 | BYTE - clxt | 0x02 |
000001F9 | 0004 | ULONG - lcb | 0x00000028 |
000001FD | 0028 | PlcPcd - PlcPcd | |
Figure 4: A Clx structure
RgPrc: This optional member is not present in this Clx structure. Because the first byte of this Clx structure is 0x02, the Clx begins with a Pcdt structure and does not contain an array of Prc structures.
Pcdt.clxt: 0x02 specifies that this is a Pcdt structure, as opposed to a Prc structure.
Pcdt.lcb: 0x00000028 specifies the size, in bytes, of PlcPcd. A PlcPcd is a Plc structure whose data members are Pcd structures. A Pcd is 8 bytes in size, so this PlcPcd consists of three Pcd structures and four CP elements.
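The arithmetic behind "three Pcd structures and four CP elements" follows from the Plc layout: n data elements are preceded by n + 1 CPs, so lcb = 4(n + 1) + 8n. The sketch below checks this for the example. It assumes a Clx that starts directly with a Pcdt, as in this file, and does not attempt to skip Prc structures. The function name `parse_pcdt_header` is hypothetical, and the Clx bytes are synthetic.

```python
import struct

CP_SIZE = 4   # each CP is a 4-byte value
PCD_SIZE = 8  # each Pcd is 8 bytes

def parse_pcdt_header(clx):
    """Return (lcb, number_of_pcds) for a Clx that begins with a Pcdt."""
    if clx[0] != 0x02:
        raise ValueError("this sketch only handles a Clx that starts with a Pcdt")
    (lcb,) = struct.unpack_from("<I", clx, 1)
    # lcb = CP_SIZE * (n + 1) + PCD_SIZE * n, solved for n
    n_pcd, remainder = divmod(lcb - CP_SIZE, CP_SIZE + PCD_SIZE)
    if remainder:
        raise ValueError("lcb is not a valid PlcPcd size")
    return lcb, n_pcd

# Synthetic Clx: clxt, lcb, then 0x28 placeholder bytes for the PlcPcd.
clx = b"\x02" + struct.pack("<I", 0x28) + bytes(0x28)
lcb, n_pcd = parse_pcdt_header(clx)
n_cp = n_pcd + 1
```

The total length of this synthetic Clx is 1 + 4 + 0x28 = 0x2D bytes, which matches lcbClx.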
The following shows the top-level expansion of the PlcPcd that is contained in this Clx. The Pcd structures, which are expanded in later tables, specify the locations of text in the file.
Figure 5: The top-level expansion of a PlcPcd
cp[0]: 0x00000000 specifies that pcd[0] applies to text starting at CP zero. Because cp[1] is 0x00000006, pcd[0] applies to CP values zero through 5, inclusive.
cp[1]: 0x00000006 specifies that pcd[1] applies to text starting at CP 0x00000006. Because cp[2] is 0x0000000D, pcd[1] applies to CP values 0x00000006 through 0x0000000C, inclusive.
cp[2]: 0x0000000D specifies that pcd[2] applies to text starting at CP 0x0000000D. Because cp[3] is 0x0000000E, pcd[2] applies only to CP value 0x0000000D.
cp[3]: 0x0000000E specifies that the last CP value to which pcd[2] applies is 0x0000000D.
pcd[0]: Specifies the location of text for CP values zero through 5, inclusive. This structure is expanded in the following table.
pcd[1]: Specifies the location of text for CP values 0x00000006 through 0x0000000C, inclusive. This structure is expanded in a later table.
pcd[2]: Specifies the location of text for CP value 0x0000000D. This structure is expanded in a later table.
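The relationship between the CP array and the Pcd array can be expressed compactly: pcd[i] covers CPs from cp[i] through cp[i+1] - 1, inclusive. The following sketch, with the hypothetical helper name `piece_ranges`, derives the three ranges listed above from the four CP values.

```python
def piece_ranges(cps):
    """Return one inclusive (first_cp, last_cp) range per Pcd."""
    return [(cps[i], cps[i + 1] - 1) for i in range(len(cps) - 1)]

ranges = piece_ranges([0x00000000, 0x00000006, 0x0000000D, 0x0000000E])
for index, (first, last) in enumerate(ranges):
    print(f"pcd[{index}] covers CP 0x{first:08X} through 0x{last:08X}")
```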
The following table shows the expansion of pcd[0]. This structure specifies the location of the text at CP zero through 5, inclusive.
Offset | Size | Structure | Value |
---|---|---|---|
0000020D | 0008 | Pcd - pcd | |
0000020D | 1 bit | USHORT - fNoParaLast | 0x1 |
0000020D | 1 bit | USHORT - fR1 (ignored) | 0x0 |
0000020D | 1 bit | USHORT - fDirty (ignored) | 0x0 |
0000020D | 13 bits | USHORT - fR2 (ignored) | 0x0006 |
0000020F | 0004 | FcCompressed - fc | |
0000020F | 30 bits | ULONG - fc | 0x00000C22 |
0000020F | 1 bit | ULONG - fCompressed | 0x0 |
0000020F | 1 bit | ULONG - r1 (ignored) | 0x0 |
00000213 | 0002 | Prm0 - prm0 | |
00000213 | 1 bit | USHORT - fComplex | 0x0 |
00000213 | 7 bits | USHORT - isprm | 0x00 |
00000213 | 8 bits | USHORT - val | 0x00 |
Figure 6: The expansion of pcd[0]
fNoParaLast: 0x1 specifies that the text that is referenced by this Pcd structure does not contain any paragraph marks.
fc.fc: 0x00000C22 specifies the offset, in bytes, in the WordDocument Stream where the text at CP zero begins. Because cp[1] is 0x00000006, there are 6 characters of text at this offset.
fc.fCompressed: 0x0 specifies that the text at offset fc.fc in the WordDocument Stream consists of 16-bit Unicode characters.
prm0.fComplex: 0x0 specifies that this is a Prm0 structure, as opposed to a Prm1 structure.
prm0.isprm: 0x00 specifies that sprmCLbcCRJ is applied to the range of CPs that are referenced by this Pcd structure. However, an isprm of 0x0000, combined with a val of 0x0000, is a special case that specifies that the CPs that are referenced by this Pcd have no additional formatting from their Pcd structure.
prm0.val: 0x00, combined with isprm 0x0000, specifies that the CPs that are referenced by this Pcd have no additional formatting from their Pcd.
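The field breakdown above can be reproduced by unpacking the 8 bytes of a Pcd as a USHORT, a ULONG, and a USHORT, then masking out the bit fields. The sketch below assumes little-endian storage and that bit fields are packed starting from the least significant bit, in the order in which the table lists them. The raw bytes are rebuilt from the field values in the table rather than copied from the file, and `decode_pcd` is a hypothetical helper name.

```python
import struct

def decode_pcd(raw):
    """Split one 8-byte Pcd into the non-ignored fields listed in the table."""
    flags, fc_word, prm = struct.unpack("<HIH", raw)
    return {
        "fNoParaLast": flags & 0x1,           # lowest bit of the first USHORT
        "fc": fc_word & 0x3FFFFFFF,           # low 30 bits of the ULONG
        "fCompressed": (fc_word >> 30) & 0x1, # bit 30 of the ULONG
        "fComplex": prm & 0x1,                # lowest bit of the Prm
        "isprm": (prm >> 1) & 0x7F,           # next 7 bits
        "val": (prm >> 8) & 0xFF,             # high 8 bits
    }

# Assumed encoding of pcd[0]: fNoParaLast = 1 and fR2 = 0x0006 give 0x0031.
raw_pcd0 = struct.pack("<HIH", 0x0031, 0x00000C22, 0x0000)
pcd0 = decode_pcd(raw_pcd0)
```

With these inputs, `pcd0["fc"]` is 0x00000C22 and `pcd0["fCompressed"]` is 0, which matches the values in the table.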
The following shows the expansion of pcd[1]. This structure specifies the location of the text at CP 0x00000006 through 0x0000000C, inclusive.
Offset | Size | Structure | Value |
---|---|---|---|
00000215 | 0008 | Pcd - pcd | |
00000215 | 1 bit | USHORT - fNoParaLast | 0x0 |
00000215 | 1 bit | USHORT - fR1 (ignored) | 0x0 |
00000215 | 1 bit | USHORT - fDirty (ignored) | 0x0 |
00000215 | 13 bits | USHORT - fR2 (ignored) | 0x0006 |
00000217 | 0004 | FcCompressed - fc | |
00000217 | 30 bits | ULONG - fc | 0x00000800 |
00000217 | 1 bit | ULONG - fCompressed | 0x1 |
00000217 | 1 bit | ULONG - r1 (ignored) | 0x0 |
0000021B | 0002 | Prm0 - prm0 | |
0000021B | 1 bit | USHORT - fComplex | 0x0 |
0000021B | 7 bits | USHORT - isprm | 0x00 |
0000021B | 8 bits | USHORT - val | 0x00 |
Figure 7: The expansion of pcd[1]
fNoParaLast: 0x0 specifies that the text that is referenced by this Pcd might or might not contain a paragraph mark. A value of 0x1 would specify that the referenced text contains no paragraph mark.
fc.fc: 0x00000800 specifies the offset, in bytes, in the WordDocument Stream where the text at CP 0x00000006 begins. Because fCompressed is 1, the actual offset is fc/2, or 0x00000400. Because cp[2] is 0x0000000D, there are 7 characters at this offset.
fc.fCompressed: 0x1 specifies that the text at offset fc/2 consists of 8-bit ANSI characters, except for the values that are listed in the table in the specification of FcCompressed (section 2.9.73).
prm0.fComplex: 0x0 specifies that this is a Prm0 structure, as opposed to a Prm1 structure.
prm0.isprm: 0x00 specifies that sprmCLbcCRJ is applied to the range of CPs that are referenced by this Pcd. However, an isprm of 0x0000, combined with a val of 0x0000, is a special case that specifies that the CPs referenced by this Pcd have no additional formatting from their Pcd.
prm0.val: 0x00, combined with isprm 0x0000, specifies that the CPs that are referenced by this Pcd structure have no additional formatting from their Pcd.
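The offset rule for compressed and uncompressed text can be summarized as a small function. This is a sketch under stated assumptions: the helper name `text_location` is hypothetical, and `cp1252` is used as a stand-in codec for the 8-bit case, which does not account for the exception values listed in the specification of FcCompressed.

```python
def text_location(fc, f_compressed, n_chars):
    """Return (byte offset, byte length, codec name) for one piece of text."""
    if f_compressed:
        # Compressed text: one byte per character, stored at fc / 2.
        return fc // 2, n_chars, "cp1252"
    # Uncompressed text: two bytes per character, stored at fc.
    return fc, 2 * n_chars, "utf-16-le"

# pcd[1]: fc = 0x00000800, fCompressed = 1, 7 characters (cp[2] - cp[1]).
offset, length, codec = text_location(0x00000800, 1, 0x0D - 0x06)
print(f"offset 0x{offset:08X}, {length} bytes, {codec}")
```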
The following shows the expansion of pcd[2]. This structure specifies the location of the text at CP 0x0000000D.
Offset | Size | Structure | Value |
---|---|---|---|
0000021D | 0008 | Pcd - pcd | |
0000021D | 1 bit | USHORT - fNoParaLast | 0x0 |
0000021D | 1 bit | USHORT - fR1 (ignored) | 0x0 |
0000021D | 1 bit | USHORT - fDirty (ignored) | 0x0 |
0000021D | 13 bits | USHORT - fR2 (ignored) | 0x0006 |
0000021F | 0004 | FcCompressed - fc | |
0000021F | 30 bits | ULONG - fc | 0x0000080E |
0000021F | 1 bit | ULONG - fCompressed | 0x1 |
0000021F | 1 bit | ULONG - r1 (ignored) | 0x0 |
00000223 | 0002 | Prm0 - prm0 | |
00000223 | 1 bit | USHORT - fComplex | 0x0 |
00000223 | 7 bits | USHORT - isprm | 0x00 |
00000223 | 8 bits | USHORT - val | 0x00 |
Figure 8: The expansion of pcd[2]
fNoParaLast: 0x0 specifies that the text that is referenced by this Pcd might or might not contain a paragraph mark. A value of 0x1 would specify that the referenced text contains no paragraph mark.
fc.fc: 0x0000080E specifies the offset, in bytes, in the WordDocument Stream where the text at CP 0x0000000D begins. Because fCompressed is 1, the actual offset is fc/2, or 0x00000407. Because cp[3] is 0x0000000E, there is 1 character at this offset.
fc.fCompressed: 0x1 specifies that the text at offset fc/2 consists of 8-bit ANSI characters, except for the values that are listed in the table in the specification of FcCompressed (section 2.9.73).
prm0.fComplex: 0x0 specifies that this is a Prm0 structure, as opposed to a Prm1 structure.
prm0.isprm: 0x00 specifies that sprmCLbcCRJ is applied to the range of CPs that are referenced by this Pcd. However, an isprm of 0x0000, combined with a val of 0x0000, is a special case that specifies that the CPs that are referenced by this Pcd have no additional formatting from their Pcd.
prm0.val: 0x00, combined with isprm 0x0000, specifies that the CPs that are referenced by this Pcd have no additional formatting from their Pcd.
The following shows the Unicode text at offset 0x00000C22 in the WordDocument Stream. This is an array of two-byte characters. This array is not null-terminated.
Offset | Size | Structure | Value |
---|---|---|---|
00000C22 | 000C | USHORT array - text | |
00000C22 | 0002 | USHORT - text[0] | 0x0048 |
00000C24 | 0002 | USHORT - text[1] | 0x0065 |
00000C26 | 0002 | USHORT - text[2] | 0x006C |
00000C28 | 0002 | USHORT - text[3] | 0x006C |
00000C2A | 0002 | USHORT - text[4] | 0x006F |
00000C2C | 0002 | USHORT - text[5] | 0x0020 |
Figure 9: The text at offset 0x00000C22 in the WordDocument Stream
text[0]: 0x0048 Unicode 'H'.
text[1]: 0x0065 Unicode 'e'.
text[2]: 0x006C Unicode 'l'.
text[3]: 0x006C Unicode 'l'.
text[4]: 0x006F Unicode 'o'.
text[5]: 0x0020 Unicode space.
The following shows the ANSI text at offset 0x00000400 in the WordDocument Stream. This is an array of single-byte characters. This array is not null-terminated.
Offset | Size | Structure | Value |
---|---|---|---|
00000400 | 0007 | BYTE array - text | |
00000400 | 0001 | BYTE - text[0] | 0x57 |
00000401 | 0001 | BYTE - text[1] | 0x6F |
00000402 | 0001 | BYTE - text[2] | 0x72 |
00000403 | 0001 | BYTE - text[3] | 0x6C |
00000404 | 0001 | BYTE - text[4] | 0x64 |
00000405 | 0001 | BYTE - text[5] | 0x2E |
00000406 | 0001 | BYTE - text[6] | 0x0D |
Figure 10: The text at offset 0x00000400 in the WordDocument Stream
text[0]: 0x57 ANSI 'W'.
text[1]: 0x6F ANSI 'o'.
text[2]: 0x72 ANSI 'r'.
text[3]: 0x6C ANSI 'l'.
text[4]: 0x64 ANSI 'd'.
text[5]: 0x2E ANSI period ('.').
text[6]: 0x0D ANSI paragraph mark.
The following shows the ANSI text at offset 0x00000407 in the WordDocument Stream. This is an array of single-byte characters. This array is not null-terminated.
Offset | Size | Structure | Value |
---|---|---|---|
00000407 | 0001 | BYTE array - text | |
00000407 | 0001 | BYTE - text[0] | 0x0D |
Figure 11: The text at offset 0x00000407 in the WordDocument Stream
text[0]: 0x0D ANSI paragraph mark.
The complete text of this document is therefore "Hello World", followed by a period and two paragraph marks.
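The whole walk-through can be tied together in a short sketch that reads each piece in CP order and concatenates the results. It assumes a synthetic WordDocument Stream that contains only the three text runs from this example at their stated offsets. The helper name `read_piece` is hypothetical, and `cp1252` again stands in for the 8-bit encoding without handling the FcCompressed exception values.

```python
def read_piece(stream, fc, compressed, n_chars):
    """Decode n_chars characters of one piece from the WordDocument Stream."""
    if compressed:
        start = fc // 2  # compressed text is stored at fc / 2, one byte per character
        return stream[start:start + n_chars].decode("cp1252")
    return stream[fc:fc + 2 * n_chars].decode("utf-16-le")

# Synthetic stream holding only the example's text runs.
buffer = bytearray(0xC2E)
buffer[0x400:0x407] = b"World.\r"
buffer[0x407:0x408] = b"\r"
buffer[0xC22:0xC2E] = "Hello ".encode("utf-16-le")
stream = bytes(buffer)

cps = [0x00, 0x06, 0x0D, 0x0E]
pieces = [(0x00000C22, False), (0x00000800, True), (0x0000080E, True)]  # (fc.fc, fc.fCompressed)

text = "".join(
    read_piece(stream, fc, compressed, cps[i + 1] - cps[i])
    for i, (fc, compressed) in enumerate(pieces)
)
print(repr(text))
```

Each 0x0D character in the result is a paragraph mark, so the decoded string ends with two of them.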