Exporteren (0) Afdrukken
Alles uitvouwen
EN
Deze inhoud is niet beschikbaar in uw taal, maar wel in het Engels.

An Extensive Examination of Data Structures

Visual Studio .NET 2003
 

Scott Mitchell
4GuysFromRolla.com

January 2004

Summary: This article, the third in a six-part series on data structures in the .NET Framework, looks at a common data structure that is not included in the .NET Framework Base Class Library: binary trees. Whereas arrays arrange data linearly, binary trees can be envisioned as storing data in two dimensions. A special kind of binary tree, called a binary search tree, or BST, allows for a much more optimized search time than with arrays. (31 printed pages)

Scott Mitchell continues his study of data structures, focusing on binary trees and BSTs, a common data structure that is not include in the .NET Framework Class Library.


Download the BinaryTrees.msi sample file.

Contents

Introduction
Arranging Data in a Tree
Understanding Binary Trees
Improving the Search Time with Binary Search Trees (BSTs)
Binary Search Trees in the Real-World

Introduction

In Part 1 of this series, we looked at what data structures are, how their performance can be evaluated, and how these performance considerations play into choosing which data structure to utilize for a particular algorithm. In addition to reviewing the basics of data structures and their analysis, we also looked at the most commonly used data structure—the array—and its relative, the ArrayList. In Part 2 we looked at the cousins of the ArrayList—the Stack and the Queue, which store their data like an ArrayList, but limit the means by which their contained data can be accessed. In Part 2 we also looked at the Hashtable, which is essentially an array that is indexed by some arbitrary object as opposed to by an ordinal value.

The ArrayList, Stack, Queue, and Hashtable all use an underlying array as the means by which their data is stored. This means that, under the covers, these four data structures are bound by the limitations imposed by an array. Recall from Part 1 that an array is stored linearly in memory, requires explicit resizing when the array's capacity it reached, and suffers from linear searching time.

In this third installment of the article series, we will examine a new data structure—the binary tree. As we'll see, binary trees store data in a non-linear fashion. After discussing the properties of binary trees, we'll look at a more specific type of binary tree—the binary search tree (BST). A BST imposes certain rules on how the items of the tree are arranged. These rules provide BSTs with a sub-linear search time, making them more efficient for searching than arrays.

Arranging Data in a Tree

If you've ever looked at a genealogy table, or at the chain of command in a corporation, you've seen data arranged in a tree. A tree is composed of a collection of nodes, where each node has some associated data and a set of children. A node's children are those nodes that appear immediately beneath the node itself. A node's parent is the node immediately above it. A tree's root is the single node that contains no parent.

Figure 1 shows an example of the chain of command in a fictitious company.

Figure 1. Tree view of a chain of command in a fictitious company

In this example, the tree's root is Bob Smith, CEO. This node is the root because it has no parent. The Bob Smith node has one child, Tina Jones, President, whose parent is Bob Smith. The Tina Jones node has three children:

  • Jisun Lee, CIO
  • Frank Mitchell, CFO
  • Davis Johnson, VP of Sales

Each of these nodes' parent is the Tina Jones node. The Jisun Lee node has two children:

  • Tony Yee
  • Sam Maher

The Frank Mitchell node has one child:

  • Darren Kulton

The Davis Johnson node has three children:

  • Todd Brown
  • Jimmy Wong
  • Sarah Yates

All trees exhibit the following properties:

  • There is precisely one root.
  • All nodes except the root have precisely one parent.
  • There are no cycles. That is, starting at any given node, there is not some path that can take you back to the starting node. The first two properties—that there exists one root and that all nodes save the root have one parent—guarantee that no cycles exist.

Trees are useful for arranging data in a hierarchy. As we will discuss later on in this article, the time to search for an item can be drastically reduced by intelligently arranging the hierarchy. Before we can arrive at that topic, though, we need to first discuss a special kind of tree, the binary tree.

Note   Throughout this article we will be looking at numerous new terms and definitions. These definitions, along with being introduced throughout the text, are listed in Appendix A.

Understanding Binary Trees

A binary tree is a special kind of tree, in which all nodes have at most two children. For a given node in a binary tree, the first child is referred to as the left child, while the second child is referred to as the right child. Figure 2 depicts two binary trees.

Figure 2. Illustration of two binary trees

Binary tree (a) has 8 nodes, with node 1 as its root. Node 1's left child is node 2; node 1's right child is node 3. Notice that a node doesn't need to have both a left child and right child. In binary tree (a), node 4, for example, has only a right child. Furthermore, a node can have no children. In binary tree (b), nodes 4, 5, 6, and 7 all have no children.

Nodes that have no children are referred to as leaf nodes. Nodes that have one or two children are referred to as internal nodes. Using these new definitions, the leaf nodes in binary tree (a) are nodes 6 and 8, and the internal nodes are nodes 1, 2, 3, 4, 5, and 7.

Unfortunately, the .NET Framework does not contain a binary tree class, so in order to better understand binary trees, let's take a moment to create our own binary tree class.

The First Step: Creating a Node Class

The first step in designing our binary tree class is to create a Node class. The Node class abstractly represents a node in the tree. Realize that each node in a binary tree contains two things:

  1. Data
  2. 0, 1, or 2 children

The data stored in the nodes depends on what it is that you need to store. Just like arrays can be created to hold instances of integers, strings, or other classes, nodes, too, are designed to hold some instance of a class. To make the Node class as general as possible, we will have the node store data of type object, just like the ArrayList, Queue, Stack, and Hashtable use an underlying array of objects to store data of any type.

Note   With C# Version 2.0 you will be able to use Generics to create a strongly typed Node class, rather than having to use a Node class that holds objects. For more information on using Generics, be sure to read Juval Lowy's article An Introduction to C# Generics.

The following is the code for the Node class:

public class Node
{
   private object data;
   private Node left, right;

   #region Constructors
   public Node() : this(null) {}
   public Node(object data) : this(data, null, null) {}
   public Node(object data, Node left, Node right)
   {
      this.data = data;
      this.left = left;
      this.right = right;
   }
   #endregion

   #region Public Properties
   public object Value
   {
      get
      {
         return data;
      }
      set
      {
         data = value;
      }
   }

   public Node Left
   {
      get
      {
         return left;
      }
      set
      {
         left = value;
      }
   }

   public Node Right
   {
      get
      {
         return right;
      }
      set
      {
         right = value;
      }
   }
   #endregion
}

Note that the Node class has three private member variables:

  1. data, of type object. This member variable contains the data stored in the node.
  2. left, of type Node. This member variable contains a reference to the node's left child.
  3. right, of type Node. This member variable contains a reference to the node's right child.
  4. The remainder of the class contains the constructors and the public properties, which provide access to these three member variables. Notice that the left and right member variables are of type Node. That is, the Node class has member variables that are Node class instances themselves.

Creating the BinaryTree Class

With the Node class complete, the BinaryTree class is a cinch to develop. The BinaryTree class contains a single private member variables—root—which is of type Node and represents the root of the binary tree. This private member variable is exposed as a public property.

The BinaryTree class has a single public method, Clear(), which clears out the contents of the tree. Clear() works by simply setting root to null. Below is the code for the BinaryTree class.

public class BinaryTree
{
   private Node root;

   public BinaryTree()
   {
      root = null;
   }

   #region Public Methods
   public virtual void Clear()
   {
      root = null;
   }
   #endregion

   #region Public Properties
   public Node Root
   {
      get
      {
         return root;
      }
      set
      {
         root = value;
      }
   }
   #endregion
}

The following code illustrates how to use the BinaryTree class to generate a binary tree with the same data and structure as binary tree (a) shown in Figure 2.

BinaryTree btree = new BinaryTree();
btree.Root = new Node(1);
btree.Root.Left = new Node(2);
btree.Root.Right = new Node(3);

btree.Root.Left.Left = new Node(4);
btree.Root.Right.Right = new Node(5);

btree.Root.Left.Left.Right = new Node(6);
btree.Root.Right.Right.Right = new Node(7);

btree.Root.Right.Right.Right.Right = new Node(8);

Note that we start by creating a BinaryTree class instance, and then create its root. We then must manually add new Node class instances to the appropriate left and right children. For example, to add node 4, which is the left child of the left child of the root, we use:

btree.Root.Left.Left = new Node(4);

Recall from Part 1 of this article series that an array's elements are stored in a contiguous block of memory. By doing so, arrays exhibit constant-time lookups. That is, the time it takes to access a particular element of an array does not change as the number of elements in the array increases.

Binary trees, however, are not stored contiguously in memory, as Figure 3 illustrates. Rather, the BinaryTree class instance has a reference to the root Node class instance. The root Node class instance has references to its left and right child Node instances; these child instances have references to their child instances, and so on. The point is, the various Node instances that makeup a binary tree can be scattered throughout the CLR managed heap. They are not necessarily contiguous, as are the elements of an array.

Figure 3. Binary trees stored in memory

Imagine that we wanted to access a particular node in a binary tree. To accomplish this task, we need to search the binary tree's set of nodes, looking for the particular node. There's no direct access to a given node as with an array. Searching a binary tree takes linear time, as potentially all nodes need to be examined. That is, as the number of nodes in the binary tree increases, the number of steps to find an arbitrary node increases as well.

So, if a binary tree's lookup time is linear, and its search time is linear, how is the binary tree any better than an array, whose search time is linear, but whose lookup time is constant? Well, a generic binary tree doesn't offer us any benefit over an array. However, by intelligently organizing the items in a binary tree, we can greatly improve the search time (and therefore the lookup time as well).

Improving the Search Time with Binary Search Trees (BSTs)

A binary search tree is a special kind of binary tree designed to improve the efficiency of searching through the contents of a binary tree. Binary search trees exhibit the following property: for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n.

A subtree rooted at node n is the tree formed by imaging node n was a root. That is, the subtree's nodes are the descendants of n and the subtree's root is n itself. Figure 4 illustrates the concept of subtrees and the binary search tree property.

Figure 4. Subtrees and the binary search tree property

Figure 5 shows two examples of binary trees. The one on the right, binary tree (b), is a BST because it exhibits the binary search tree property. Binary tree (a), however, is not a BST because not all nodes of the tree exhibit the binary search tree property. Namely, node 10's right child, 8, is less than 10 but it appears in node 10's right subtree. Similarly, node 8's right child, node 4, is less than 8 but appears on node 8's right subtree. This property is violated in other locations, too. For example, node 9's right subtree contains values less than 9 (8 and 4).

Figure 5. Comparison of non-BST binary tree (a) and a BST binary tree (b)

Note that for the binary search tree property to be upheld, the data stored in the nodes of a BST must be able to be compared to one another. Specifically, given two nodes, a BST must be able to determine if one is less than, greater than, or equal to the other.

Now, imagine that you want to search a BST for a particular node. For example, for the BST in Figure 5 (the binary tree (b)), imagine that we wanted to search for the node 10. A BST, like a regular binary tree, only has a direct reference to one node—its root. Can you think of an optimal way to search the tree to see if node 10 exists? There's a better way than searching through each node in the tree.

To see if 10 exists in the tree, we can start with the root. We see that the root's value (7) is less than the value of the node we are looking for. Therefore, if 10 does exist in the BST, it must be in the root's right subtree. Therefore, we continue our search at node 11. Here we notice that 10 is less than 11, so if 10 exists in the BST it must exist in the left subtree of 11. Moving onto the left child of 11, we find node 10, and have located the node we want.

What happens if we search for a node that does not exist in the tree? Imagine that we wanted to find node 9. We'd start by repeating the same steps above. Upon reaching node 10, we'd see that node 10 was greater than 9, so 9, if it exists, must be in 10's left subtree. However, we'd notice that 10 has no left child, therefore 9 must not exist in the tree.

More formally, our searching algorithm goes as follows. We have a node n we wish to find (or determine if it exists), and we have a reference to the BST's root. This algorithm performs a number of comparisons until a null reference is hit or until the node we are searching for is found. At each step we are dealing with two nodes: a node in the tree, call it c, that we are currently comparing with n, the node for which we are looking. Initially, c is the root of the BST. We apply the following steps:

  1. If c is a null reference, then exit the algorithm—n is not in the BST.
  2. Compare c's value and n's value.
  3. If the values are equal, then we found n.
  4. If n's value is less than c's then n, if it exists, must be in the c's left subtree. Therefore, return to step 1, letting c be c's left child.
  5. If n's value is greater than c's then n, if it exists, must be in the c's right subtree. Therefore, return to step 1, letting c be c's right child.

We applied these steps earlier when searching for node 10. We started with the root node and noted that 10 was greater than 7, so we repeated our comparison with the root's right child, 11. Here, we noted 10 was less than 11, so we repeated our comparison with 11's left child. At this point we had found node 10. When searching for node 9, which did not exist, we wound up performing our comparison with 10's left child, which was a null reference. Hence we deduced that 9 did not exist in the BST.

Analyzing the BST Search Algorithm

For finding a node in a BST, at each stage we ideally reduce the number of nodes we have to check by half. For example, consider the BST in Figure 6, which contains 15 nodes. When starting our search algorithm at the root, our first comparison will take us to either the root's left or right child. In either case, once this step is made the number of nodes that need to be considered has just halved, from 15 down to 7. Similarly, at the next step the number is halved again, from 7 down to 3, and so on.

Figure 6. BST with 15 nodes

The important concept to understand here is that ideally at each step in the algorithm the number of nodes that have to be considered has been cut in half. Compare this to searching an array. When searching an array we have to search all elements, one element at a time. That is, when searching an array with n elements, after we check the first element, we still have n – 1 elements to check. With a BST of n nodes, however, after checking the root we have whittled the problem down to searching a BST with n/2 nodes.

Searching a binary tree is similar in analysis to searching a sorted array. For example, imagine you wanted to find if there is a John King in the phonebook. You could start by flipping to the middle of the phone book. Here, you'd likely find people with last names starting with the letter M. Since K comes before M alphabetically, you would then flip halfway between the start of the phonebook and the page you had reached in the Ms. Here, you might end up in the Hs. Since K comes after H, you'd flip half way between the Hs and the Ms. This time you might hit the Ks, where you could quickly see if James King was listed.

This is similar to searching a BST. With a BST the midpoint is the root. We then traverse down the tree, navigating to the left and right children as needed. These approaches cut the search space in half at each step. Such algorithms that exhibit this property have an asymptotic running time of log2 n, commonly abbreviated log n. Recall from our mathematical discussions in Part 1 of this article series that log2 n = y means that 2y = n. That is, as n grows, log2 n grows very slowly. The growth rate of log2 n compared to linear growth is shown in the graph in Figure 7. Due to log2 n's slower growth than linear time, algorithms that run in asymptotic time log2 n are said to be sublinear.

Figure 7. Comparison of linear growth rate vs. log2 n

While it may appear that the logarithmic curve is flat, it is increasing, albeit rather slowly. To appreciate this, consider searching an array with 1,000 elements versus searching a BST with 1,000 elements. For the array, we'll have to search up to 1,000 elements. For the BST, we'd ideally have to search no more than ten nodes! (Note that log10 1024 equals 10.)

Throughout our analysis of the BST search algorithm, I've repeatedly used the word "ideally." This is because the search time for a BST depends upon its topology, or how the nodes are laid out with respect to one another. For a binary tree like the one in Figure 6, each comparison stage in the search algorithm trims the search space in half. However, consider the BST shown in Figure 8. The topology of the BST is synonymous to how an array is arranged.

Figure 8. Example of BST that will be searched in linear time

Searching the BST in Figure 8 will take linear time because after each comparison the problem space is only reduced by 1 node, not by half of the current nodes, as with the BST in Figure 6.

Therefore, the time it takes to search a BST is dependent upon its topology. In the best case, the time is on the order of log2 n, but in the worst case it requires linear time. As we'll see in the next section, the topology of a BST is dependent upon the order with which the nodes are inserted. Therefore, the order with which the nodes are inserted affects the running time of the BST search algorithm.

Inserting Nodes into a BST

We've seen how to search a BST to determine if a particular node exists, but we've yet to look at how to add a new node. When adding a new node, we can't arbitrarily add the new node, but rather we have to add the new node such that the binary search tree property is maintained.

When inserting a new node we always insert the new node as a leaf node. The only challenge, then, is finding the node in the BST, which become this new node's parent. Like with the searching algorithm, we'll be making comparisons between a node c and the node to be inserted, n. We'll also need to keep track of c's parent node. Initially, c is the BST root and parent is a null reference. Locating the new parent node is accomplished by using the following algorithm:

  1. If c is a null reference, then parent will be the parent of n. If n's value is less than parent's value, then n will be parent's new left child; otherwise n will be parent's new right child.
  2. Compare c and n's values.
  3. If c's value equals n's value, then the user is attempting to insert a duplicate node. Either simply discard the new node, or raise an exception. (Note that the nodes' values in a BST must be unique.)
  4. If n's value is less than c's value, then n must end up in c's left subtree. Let parent equal c and c equal c's left child, and return to step 1.
  5. If n's value is greater than c's value, then n must end up in c's right subtree. Let parent equal c and c equal c's right child, and return to step 1.

This algorithm terminates when the appropriate leaf is found, which attaches the new node to the BST by making the new node an appropriate child of parent. There's one special case you have to worry about with the insert algorithm. If the BST does not contain a root, then there parent will be null, so the step of adding the new node as a child of parent is bypassed. Furthermore, in this case the BST's root must be assigned to the new node.

Figure 9 depicts the BST insert graphically.

Figure 9. Insert into a BST

Both the BST search and insert algorithms share the same running time: log2 n in the best case, and linear in the worst case. The insert algorithm's running time mimics the search's because it essentially uses the same tactics used by the search algorithm to find the location for the newly inserted node.

The order of insertion determines the BST's topology

Since newly inserted nodes into a BST are inserted as leaves, the order of insertion directly affects the topology of the BST itself. For example, imagine we insert the following nodes into a BST: 1, 2, 3, 4, 5, 6. When 1 is inserted, it is inserted as the root. Next, 2 is inserted as 1's right child. 3 is inserted as 2's right child, 4 as 3's right child, and so on. The resulting BST is one whose structure is precisely that of the BST in Figure 8.

If the values 1, 2, 3, 4, 5, and 6 had been inserted in a more intelligent manner, the BST would have had more breadth, and would have looked more like the tree in Figure 6. The ideal insertion order would be: 4, 2, 5, 1, 3, 6. This would put 4 at the root, 2 as 4's left child, 5 as 4's right child, 1 and 3 as 2's left and right children, and 6 as 5's right child.

Since the topology of a BST can greatly affect the running time of search, insert, and (as we will see in the next section) delete, inserting data in ascending or descending order (or in near order) can have devastating results on the efficiency of the BST. We'll discuss this topic in more detail a bit later in the article.

Deleting Nodes from a BST

Deleting nodes from a BST is slightly more difficult than inserting a node because deleting a node that has children requires that some other node be chosen to replace the hole created by the deleted node. If the node to replace this hole is not chosen with care, the binary search tree property may be violated. For example, consider the BST in Figure 6. If the node 150 is deleted, some node must be moved to the hole created by the deletion of node 150. If we arbitrarily choose to move, say node 92 there, the BST property is deleted since 92's new left subtree will have nodes 95 and 111, both of which are greater than 92 and thereby violating the binary search tree property.

The first step in the algorithm to delete a node is to first locate the node to delete. This can be done using the searching algorithm discussed earlier, and therefore has a log2 n running time. Next, a node from the BST must be selected to take the deleted node's position. There are three cases to consider when choosing the replacement node, all of which are illustrated in Figure 10 below.

Case 1: If the node being deleted has no right child, then the node's left child can be used as the replacement. The binary search tree property is maintained because we know that the deleted node's left subtree itself maintains the binary search tree property, and that the values in the left subtree are all less than or all greater than the deleted node's parent, depending on whether the deleted node is a left or right child. Therefore, replacing the deleted node with its left subtree maintains the binary search tree property.

Case 2: If the deleted node's right child has no left child, then the deleted node's right child can replace the deleted node. The binary search tree property is maintained because the deleted node's right child is greater than all nodes in the deleted node's left subtree and is either greater than or less than the deleted node's parent, depending on whether the deleted node was a right or left child. Therefore, replacing the deleted node with its right child maintains the binary search tree property.

Case 3: Finally, if the deleted node's right child does have a left child, then the deleted node needs to be replaced by the deleted node's right child's left-most descendant. That is, we replace the deleted node with the right subtree's smallest value.

Note   Realize that for any BST, the smallest value in the BST is the left-most node, while the largest value is the right-most node.

This replacement choice maintains the binary search tree property because it chooses the smallest node from the deleted node's right subtree, which is guaranteed to be larger than all nodes in the deleted node's left subtree. Also, since it's the smallest node from the deleted node's right subtree, by placing it at the deleted node's position, all of the nodes in its right subtree will be greater.

Figure 10 illustrates the node to choose for replacement for each of the three cases.

Figure 10. Cases to consider when deleting a node

As with the insert and searching algorithms, the asymptotic running time of delete is dependent upon the BST's topology. Ideally, the running time is on the order of log2 n. In the worst case, though, it takes linear time.

Traversing the Nodes of a BST

With the linear, contiguous ordering of an array's elements, iterating through an array is a straightforward manner. Start at the first array element, and step through each element sequentially. For BSTs there are three different kinds of traversals that are commonly used:

  • Preorder traversal
  • Inorder traversal
  • Postorder traversal

Essentially, all three traversals work in roughly the same manner. They start at the root and visit that node and its children. The difference among these three traversals is the order with which they visit the node itself versus visiting its children. To help clarify this, consider the BST in Figure 11. (Note that the BST in Figure 11 is the same as the BST in Figure 6 and is repeated here for convenience.)

Figure 11. A sample Binary Search Tree

Preorder traversal

Preorder traversal starts by visiting the current node—call it c—then its left child, and then its right child. Starting with the BST's root as c, this algorithm can be written out as:

  1. Visit c. (This might mean printing out the value of the node, adding the node to an ArrayList, or something else. It depends on what you want to accomplish by traversing the BST.)
  2. Repeat step 1 using c's left child.
  3. Repeat step 1 using c's right child.

Imagine that in step 1 of the algorithm we were printing out the value of c. In this case, what would the output be for a preorder traversal of the BST in Figure 11? Well, starting with step 1 we would print the root's value. Step 2 would have us repeat step 1 with the root's left child, so we'd print 50. Step 2 would have us repeat step 1 with the root's left child's left child, so we'd print 20. This would repeat all the way down the left side of the tree. When 5 was reached, we'd first print out its value (step 1). Since there are no left or right children of 5, we'd return to node 20, and perform its step 3, which is repeating step 1 with 20's right child, or 25. Since 25 has no children, we'd return to 20, but we've done all three steps for 20, so we'd return to 50, and then take on step 3 for node 50, which is repeating step 1 for node 50's right child. This process would continue on until each node in the BST had been visited. The output for a preorder traversal of the BST in Figure 11 would be:

90, 50, 20, 5, 25, 75, 66, 80, 150, 95, 92, 111, 175, 166, 200

Understandably, this may be a bit confusing. Perhaps looking at some code will help clarify things. The following code shows the PreorderTraversal() method from the BST class we'll be building later on in this article. Note that this method takes in a Node class instance as one of its input parameters. This input node is the node c from the list of the algorithm's steps. Realize that a preorder traversal of the BST would begin by calling PreorderTraversal() and passing in the BST's root.

protected virtual string PreorderTraversal(Node current, string separator)
{
   if (current != null)
   {
      StringBuilder sb = new StringBuilder();
      sb.Append(current.Value.ToString());
      sb.Append(separator);

      sb.Append(PreorderTraversal(current.Left, separator));
      sb.Append(PreorderTraversal(current.Right, separator));
      return sb.ToString();
   }
   else
      return String.Empty;
}

Note that the result of the traversal is being gathered in a string using a StringBuilder. First the data of the current Node is being added, and then the resulting strings from visiting the left and right children of current.

Inorder traversal

Inorder traversal starts by visiting the current node's left child, then the current node, and then its right child. Starting with the BST's root as c, this algorithm can be written out as:

  1. Visit c's left child. (This might mean printing out the value of the node, adding the node to an ArrayList, or something else. It depends on what you want to accomplish by traversing the BST.)
  2. Repeat step 1 for c.
  3. Repeat step 1 using c's right child.

The code for InorderTraversal() is just like PreorderTraversal() except that adding the current Node's data to the StringBuilder occurs after another call to InorderTraversal(), passing in current's left child.

protected virtual string InorderTraversal 
                (Node current, string separator)
{
   if (current != null)
   {
      StringBuilder sb = new StringBuilder();
      sb.Append(InorderTraversal(current.Left, separator));

      sb.Append(current.Value.ToString());
      sb.Append(separator);

      sb.Append(InorderTraversal(current.Right, separator));
      return sb.ToString();
   }
   else
      return String.Empty;
}

Applying an inorder traversal to the BST in Figure 11, the output would be:

5, 20, 25, 50, 66, 75, 80, 90, 92, 95, 111, 150, 166, 175, 200

Note that the results returned are in ascending order.

Postorder traversal

Finally, postorder traversal starts by visiting the current node's left child, then its right child, and finally the current node itself. Starting with the BST's root as c, this algorithm can be written out as:

  1. Visit c's left child. (This might mean printing out the value of the node, adding the node to an ArrayList, or something else. It depends on what you want to accomplish by traversing the BST.)
  2. Repeat step 1 using c's right child.
  3. Repeat step 1 for c.

The output of a postorder traversal for the BST in Figure 11 would be:

5, 25, 20, 66, 80, 75, 50, 92, 111, 95, 166, 200, 175, 150, 90

Note   The download included with this article contains the complete source code for the BST and BinaryTree classes, along with a Windows Forms testing application for the BST class. Of particular interest, the Windows Forms application allows you to view the contents of the BST in either preorder, inorder, or postorder.

Realize that all three traversal times exhibit linear running time. This is because each traversal option visits each and every node in the BST exactly one time. So, if the number of nodes in the BST are doubled, the amount of work for a traversal doubles as well.

Implementing a BST Class

While the standard Java SDK includes a BST class (called TreeMap), the .NET Framework Base Class Library does not include such a class. Therefore, let's create one. As with the binary tree, the first order of business is to create a Node class. We cannot simply reuse the Node class from the general binary tree because the nodes for a BST must contain data that is comparable. Therefore, rather than having allowing data of type object, we restrict data to those classes that implement IComparable.

Additionally, the BST Node class implements ICloneable because we might want to allow a developer using the BST class to clone the BST (that is, to make a deep copy). By making the Node class cloneable, we can clone the entire BST by simply returning a clone of its root. The Node class follows:

public class Node : ICloneable
{
   private IComparable data;
   private Node left, right;

   #region Constructors
   public Node() : this(null) {}
   public Node(IComparable data) : this(data, null, null) {}
   public Node(IComparable data, Node left, Node right)
   {
      this.data = data;
      this.left = left;
      this.right = right;
   }
   #endregion

   #region Public Methods
   public object Clone()
   {
      Node clone = new Node();
      if (data is ICloneable)
         clone.Value = (IComparable) ((ICloneable) data).Clone();
      else
         clone.Value = data;

      if (left != null)
         clone.Left = (Node) left.Clone();
      
      if (right != null)
         clone.Right = (Node) right.Clone();

      return clone;
   }
   #endregion

   #region Public Properties
   public IComparable Value
   {
      get
      {
         return data;
      }
      set
      {
         data = value;
      }
   }

   public Node Left
   {
      get
      {
         return left;
      }
      set
      {
         left = value;
      }
   }

   public Node Right
   {
      get
      {
         return right;
      }
      set
      {
         right = value;
      }
   }
   #endregion
}

Note that the BST Node class is quite similar to the binary tree Node class. The only differences are that the data is of type IComparable instead of object, and the Node class implements ICloneable, and therefore has a Clone() method.

Let's now turn our attention to creating the BST class, which represents a binary search tree. Over the next few sections we'll look at each of the major methods of the class. For the class's full source code, be sure to download the source code that accompanies this article, which also includes a Windows Forms application for testing the BST class.

Searching for a node

The reason BSTs are important to understand is that they offer sublinear search times. Therefore, it only makes sense to first examine the BST's Search() method. The Search() method accepts a single input parameter that implements IComparable. It then calls a private helper function, SearchHelper(), passing in the BSTs root and the data for which it's searching.

SearchHelper() recursively percolates down the tree until it either reaches a null reference—in which case null is returned—or until it finds the targeted node. The end result is that Search() returns either a null, meaning the data being searched for is not in the BST, or a reference to the node that contains the data in the search parameters.

public virtual Node Search(IComparable data)
{
   return SearchHelper(root, data);
}

protected virtual Node SearchHelper(Node current, IComparable data)
{
   if (current == null)
      return null;   // node was not found
   else
   {
      int result = current.Value.CompareTo(data);
      if (result == 0)
         // they are equal - we found the data
         return current;
      else if (result > 0)
      {
         // current.Value > n.Value
         // therefore, if the data exists it is in current's left subtree
         return SearchHelper(current.Left, data);
      }
      else // result < 0
      {
         // current.Value < n.Value
         // therefore, if the data exists it is in current's right subtree
         return SearchHelper(current.Right, data);
      }
   }
}

Adding a node to the BST

Unlike the BinaryTree class we created earlier, the BST class does not provide direct access to its root. Rather, nodes are added to the BST through the BST's Add() method. Add() takes as input some instance of a class that implements IComparable. It then snakes its way through the BST, looking for its new parent. (Recall that the new node will be inserted as a leaf node.) Once the parent is found, the new node is made either the left or right child of the parent, depending on if its value is less than or greater than the parent's value.

public virtual void Add(IComparable data)
{
   // first, create a new Node
   Node n = new Node(data);
   int result;

   // now, insert n into the tree
   // trace down the tree until we hit a NULL
   Node current = root, parent = null;
   while (current != null)
   {
      result = current.Value.CompareTo(n.Value);
      if (result == 0)
         // they are equal - inserting a duplicate - do nothing
         return;
      else if (result > 0)
      {
         // current.Value > n.Value
         // therefore, n must be added to current's left subtree
         parent = current;
         current = current.Left;
      }
      else if (result < 0)
      {
         // current.Value < n.Value
         // therefore, n must be added to current's right subtree
         parent = current;
         current = current.Right;
      }
   }

   // ok, at this point we have reached the end of the tree
   count++;
   if (parent == null)
      // the tree was empty
      root = n;
   else
   {
      result = parent.Value.CompareTo(n.Value);
      if (result > 0)
         // parent.Value > n.Value
         // therefore, n must be added to parent's left subtree
         parent.Left = n;
      else if (result < 0)
         // parent.Value < n.Value
         // therefore, n must be added to parent's right subtree
         parent.Right = n;
   }
}

Whereas the Search() method used recursion to percolate down the BST, the Add() method uses a simple while loop. Either approach works, although the while loop approach is likely to be slightly more efficient since there is a slight performance cost associated with invoking a function that is not incurred through a simple loop. The point is to realize that method working with BSTs can typically be rewritten in one of two ways—either through recursive techniques or through a while loop. (Personally I find the recursive approach to be easier to read and understand.)

Note   As you can see by examining the code, if the user attempts to add a duplicate, the Add() method exists without inserting a node. If you want, you could alter this code to raise an exception.

Deleting a node from the BST

Recall that deleting a node from a BST is the trickiest of the BST operations. The trickiness is due to the fact that deleting a node from a BST necessitates that a replacement node be chosen to occupy the space once held by the deleted node. Care must be taken when selecting this replacement node so that the binary search tree property is maintained.

Earlier, in the Deleting Nodes from a BST section, we discussed how there were three different scenarios for deciding what node to choose to replace the deleted node. These cases were summarized in Figure 10. Below you can see how the cases are identified and handled in the Delete() method.

public void Delete(IComparable data)
{
   // find n in the tree
   // trace down the tree until we hit n
   Node current = root, parent = null;
   int result = current.Value.CompareTo(data);
   while (result != 0 && current != null)
   {            
      if (result > 0)
      {
         // current.Value > n.Value
         // therefore, n must be added to current's left subtree
         parent = current;
         current = current.Left;
      }
      else if (result < 0)
      {
         // current.Value < n.Value
         // therefore, n must be added to current's right subtree
         parent = current;
         current = current.Right;
      }

      result = current.Value.CompareTo(data);
   }

   // if current == null, then we did not find the item to delete
   if (current == null)
      throw new Exception("Item to be deleted does not exist in the BST.");


   // at this point current is the node to delete, and parent is its parent
   count--;
   
   // CASE 1: If current has no right child, then current's left child becomes the
   // node pointed to by the parent
   if (current.Right == null)
   {
      if (parent == null)
         root = current.Left;
      else
      {
         result = parent.Value.CompareTo(current.Value);
         if (result > 0)
            // parent.Value > current
            // therefore, the parent's left subtree is now current's Left subtree
            parent.Left = current.Left;
         else if (result < 0)
            // parent.Value < current.Value
            // therefore, the parent's right subtree is now current's left subtree
            parent.Right = current.Left;
      }
   }
   // CASE 2: If current's right child has no left child, then current's right child replaces
   // current in the tree
   else if (current.Right.Left == null)
   {
      if (parent == null)
         root = current.Right;
      else
      {
         result = parent.Value.CompareTo(current.Value);
         if (result > 0)
            // parent.Value > current
            // therefore, the parent's left subtree is now current's right subtree
            parent.Left = current.Right;
         else if (result < 0)
            // parent.Value < current.Value
            // therefore, the parent's right subtree is now current's right subtree
            parent.Right = current.Right;
      }
   }   
   // CASE 3: If current's right child has a left child, replace current with current's
   // right child's left-most node.
   else
   {
      // we need to find the right node's left-most child
      Node leftmost = current.Right.Left, lmParent = current.Right;
      while (leftmost.Left != null)
      {
         lmParent = leftmost;
         leftmost = leftmost.Left;
      }

      // the parent's left subtree becomes the leftmost's right subtree
      lmParent.Left = leftmost.Right;
      
      // assign leftmost's left and right to current's left and right
      leftmost.Left = current.Left;
      leftmost.Right = current.Right;

      if (parent == null)
         root = leftmost;
      else
      {
         result = parent.Value.CompareTo(current.Value);
         if (result > 0)
            // parent.Value > current
            // therefore, the parent's left subtree is now current's right subtree
            parent.Left = leftmost;
         else if (result < 0)
            // parent.Value < current.Value
            // therefore, the parent's right subtree is now current's right subtree
            parent.Right = leftmost;
      }
   }
}

Note   Notice that the Delete() method throws an exception if the item to be deleted was not found in the BST.

The remaining BST methods and properties

There are a few other BST methods and properties not examined in this article. Be sure to download this article's accompanying code for a complete look at the BST class. The remaining methods are:

  • Clear(): removes all of the nodes from the BST.
  • Clone(): clones the BST (creates a deep copy).
  • Contains(IComparable): returns a Boolean indicating if specified data exists in the BST or not.
  • GetEnumerator(): returns an enumerator that enumerates through the BST nodes using the inorder traversal technique. This method allows one to iterate through the nodes in a BST using a foreach loop.
  • PreorderTraversal() / InorderTraversal() / PostorderTraversal(): we examined these methods in the "Traversing the Nodes of a BST" section.
  • ToString(): returns a string representation of the BST using a specified traversal technique.
  • Count: a public, read-only property that returns the number of nodes in the BST.

Binary Search Trees in the Real World

While binary search trees ideally exhibit sub-linear running times for insertions, searches, and deletions, the running time is dependent upon the BST's topology. The topology, as we discussed in the Inserting Nodes into a BST section, is dependent upon the order with which the data is added to the BST. Data being entered that is ordered or near-ordered will cause the BST's topology to resemble a long, thin tree, rather than a short, wide one. In many real-world scenarios, data is naturally in an ordered or near-ordered state.

The problem with BSTs is that they can become easily unbalanced. A balanced binary tree is one that exhibits a good ratio of breadth to depth. As we will examine in the next part of this article series, there are a special class of BSTs that are self-balancing. That is, as new nodes are added or existing nodes are deleted, these BSTs automatically adjust their topology to maintain an optimal balance. With an ideal balance, the running time for insertion, searches, and deletion, even in the worst case, is log2 n. I mentioned earlier that the Java SDK ships with a BST class, TreeMap. This class actually uses one of the intelligent, self-balancing BST derivatives, the red-black tree.

In the next installment of this article series, we'll look at a couple self-balancing BST derivatives, including the red-black tree, and then focus on a data structure known as a SkipList, which exhibits the benefits of self-balancing binary trees, but without the need for restructuring the topology. (Restructuring the topology, while efficient, requires a good deal of difficult code and is anything but readable.)

Until then, happy programming!

Appendix A: Definitions

The following is an alphabetized list of definitions for terms used throughout this article:

Term Definition
Binary tree A tree where each node has at most two children.
Binary search tree A binary tree that exhibits the following property: for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n.
Cycle A cycle exists if starting from some node n there exists a path that returns to n.
Internal node A node that has one or more children.
Leaf node A node that has no children.
Node Nodes are the building blocks of trees. A node contains some piece of data, a parent, and a set of children. (Note: the root node does not contain a parent; leaf nodes do not contain children.)
Root The node that has no parent. All trees contain precisely one root.
Subtree A subtree rooted at node n is the tree formed by imaging node n was a root. That is, the subtree's nodes are the descendants of n and the subtree's root is n itself.

Appendix B: Running Time Summaries

The following table lists the common operations that are performed on a BST and their associated running times.

Operation Best Case Running Time Worst Case Running Time
Search log2 n n
Insert log2 n n
Delete log2 n n
Preorder Traversal n n
Inorder Traversal n n
Postorder Traversal n n

References

Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest. "Introduction to Algorithms." MIT Press. 1990.

Scott Mitchell, author of five books and founder of 4GuysFromRolla.com, has been working with Microsoft Web technologies for the past five years. Scott works as an independent consultant, trainer, and writer, and recently completed his Masters degree in Computer Science at the University of California – San Diego. He can be reached at mitchell@4guysfromrolla.com or through his blog at http://ScottOnWriting.NET.

Weergeven:
© 2014 Microsoft