2.6.4 Red-Black Tree

Each set of sibling objects in one level of the containment hierarchy (all child objects under a storage object) is represented as a red-black tree. The parent object of this set of siblings will have a pointer to the top of this tree.

A red-black tree is a special type of binary search tree where each node has a color attribute of red or black. It allows efficient searching in the list of child objects under a storage object. The constraints on a red-black tree allow the binary tree to be roughly balanced, so that insertion, deletion, and searching operations are efficient.

To be valid, the red-black tree MUST maintain the following constraints:

  1. The root storage object MUST always be black. Because the root directory does not have siblings, its color is irrelevant and can therefore be either red or black.

  2. Two consecutive nodes MUST NOT both be red.

  3. The left sibling MUST always be less than the right sibling. This sorting relationship is defined as follows:

    • A node that has a shorter name is less than a node that has a longer name. (Compare the length of the names from the Directory Entry Name Length field.)

    • For nodes that have the same name length from Directory Entry Name Length, iterate through each UTF-16 code point, one at a time, from the beginning of the Unicode string.

    • For each UTF-16 code point, convert to uppercase by using the Unicode Default Case Conversion Algorithm, simple case conversion variant (simple case foldings), with the following notes.<3>Compare each uppercased UTF-16 code point binary value.

    • Unicode surrogate characters are never uppercased, because they are represented by two UTF-16 code points, while the sorting relationship uppercases a single UTF-16 code point at a time.

    • Lowercase characters that are defined in a newer, later version of the Unicode standard can be uppercased by an implementation that conforms to that later Unicode standard.

The simplest implementation of the preceding invariants would be to mark every node as black, in which case the tree is simply a binary tree. However, keeping the red-black tree balanced will typically result in better read performance.

All sibling objects within a storage object (all immediate child objects in one level of the hierarchy) MUST have unique names in the Directory Entry Name field, where uniqueness is determined by the sorting relationship.