2.6.4 Red-Black Tree
Each set of sibling objects in one level of the containment hierarchy (all child objects under a storage object) is represented as a red-black tree. The parent object of this set of siblings will have a pointer to the top of this tree.
A red-black tree is a special type of binary search tree where each node has a color attribute of red or black. It allows efficient searching in the list of child objects under a storage object. The constraints on a red-black allow the binary tree to be roughly balanced, so that insertion, deletion, and searching operations are efficient.
The red-black tree MUST maintain the following constraints in order for it to be valid.
-
The root storage object MUST always be black. Because the root directory does not have siblings, its color is irrelevant and can therefore be either red or black.
-
Two consecutive nodes MUST NOT both be red.
-
The left sibling MUST always be less than the right sibling. This sorting relationship is defined as follows.
-
A node with a shorter name is less than a node with a longer name (compare the length of the names from the Directory Entry Name Length field).
-
For nodes with the same name length from Directory Entry Name Length, iterate through each UTF-16 code point, one at a time, from the beginning of the Unicode string.
-
For each UTF-16 code point, convert to upper-case with the Unicode Default Case Conversion Algorithm, simple case conversion variant (simple case foldings), with the following notes.<2> Compare each upper-cased UTF-16 code point binary value.
-
Unicode surrogate characters are never upper-cased, because they are represented by two UTF-16 code points, while the sorting relationship upper-cases a single UTF-16 code point at a time.
-
Lowercase characters defined in a newer, later version of the Unicode standard can be upper-cased by an implementation that conforms to that later Unicode standard.
-
The simplest implementation of the above invariants would be to mark every node as black, in which case the tree is simply a binary tree. However, keeping the red-black tree balanced will typically result in better read performance.
All sibling objects within a storage object (all immediate child objects in one level of hierarchy) MUST have unique names in the Directory Entry Name field, where uniqueness is determined by the sorting relationship above.