1.3.1 Characters


The fundamental unit of a Word binary file is a character. This includes visual characters such as letters, numbers, and punctuation. It also includes formatting characters such as paragraph marks, end-of-cell marks, line breaks, and section breaks. Finally, it includes anchor characters such as footnote reference characters, picture anchors, and comment anchors.

Characters are indexed by their zero-based Character Position, or CP (section 2.2.1). This documentation is generally concerned with CPs, not with the underlying text. Section 2.4.1 specifies an algorithm for determining the text at a particular CP, but the text is only one of many pieces of information an application might look up for a given CP. The reader needs to understand that this documentation is concerned much more with logical characters in a document than with physical bytes in a file.
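The following Python sketch illustrates why a CP is a logical index rather than a byte offset. It assumes a hypothetical, simplified layout in which the text is split into pieces. Each piece covers a half-open range of CPs and stores its characters starting at byte offset `fc`, either as 1-byte units (decoded here as cp1252) or as 2-byte UTF-16LE units. The names `Piece`, `byte_offset_for_cp`, and `char_at_cp` are illustrative only. The authoritative procedure for finding the text at a CP is the algorithm in section 2.4.1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Piece:
    """A hypothetical, simplified run of contiguous text in the file.

    cp_start/cp_end: half-open range [cp_start, cp_end) of logical CPs.
    fc: byte offset in the file where this run's text begins.
    compressed: True if each character takes 1 byte, False if 2 bytes (UTF-16LE).
    """
    cp_start: int
    cp_end: int
    fc: int
    compressed: bool


def _find_piece(pieces: List[Piece], cp: int) -> Piece:
    """Return the piece whose CP range contains cp."""
    for piece in pieces:
        if piece.cp_start <= cp < piece.cp_end:
            return piece
    raise ValueError(f"CP {cp} is not covered by any piece")


def byte_offset_for_cp(pieces: List[Piece], cp: int) -> int:
    """Map a logical CP to a physical byte offset (simplified model)."""
    piece = _find_piece(pieces, cp)
    width = 1 if piece.compressed else 2  # bytes per character in this piece
    return piece.fc + (cp - piece.cp_start) * width


def char_at_cp(data: bytes, pieces: List[Piece], cp: int) -> str:
    """Decode the single character stored at a logical CP (simplified model)."""
    piece = _find_piece(pieces, cp)
    offset = byte_offset_for_cp(pieces, cp)
    if piece.compressed:
        return data[offset:offset + 1].decode("cp1252")
    return data[offset:offset + 2].decode("utf-16-le")


if __name__ == "__main__":
    # "Hi" stored as 1-byte characters, then "é!\r" stored as 2-byte characters.
    data = b"Hi" + "\u00e9!\r".encode("utf-16-le")
    pieces = [Piece(0, 2, 0, True), Piece(2, 5, 2, False)]
    text = "".join(char_at_cp(data, pieces, cp) for cp in range(5))
    print(repr(text))                     # prints 'Hié!\r'
    print(byte_offset_for_cp(pieces, 3))  # prints 4
```

In this example, CP 3 (the `!`) sits at byte offset 4 rather than 3, because the preceding `é` occupies two bytes. The logical sequence of characters stays the same no matter how the bytes are laid out. This is the view this documentation takes.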