15-111 Lecture 15 (Tuesday, June 10, 2003)

Binary Search Trees (BSTs)

Much like the binary heaps we discussed last class, a binary search tree is a binary tree with special properties. Whereas the heap property provided for a partial ordering among the nodes of the tree, the properties of a BST provide for a total ordering.
In other words, given a collection of items that can be ordered, a BST ensures that we can visit every node in sorted order. By contrast, a heap only told us how a parent compares with its children; it said nothing about how either child compares with its sibling, the other child.
Specifically, a BST guarantees the following:

The nodes in the left subtree have data with lower values than the root
The nodes in the right subtree always have data with higher values than the root

This property recursively applies to all subtrees, so that, for any node, the data on its left is lower and the data on its right is higher.
Let's construct a binary search tree by inserting the letters of "HELLO WORLD" into the tree one at a time. For convenience, we will ignore duplicate letters.

How did this work? Let's go through the string one letter at a time.

"H" - the tree is initially empty, so "H" becomes the root
"E" - "E" comes before "H", so it goes on the left
"L" - "L" comes after "H", so it goes on the right
"L" - "L" is already in the tree, so we ignore it
"O" - "O" is greater than "H", so it goes to the right of it, and it is greater than "L", so it goes to the right of that, too
"W" - "W" is greater than "H", so it goes to the right of it, it is greater than "L", so it goes to the right of that, and it is greater than "O", so it goes to the right once again
"O" - "O" is already in the tree, so we ignore it
"R" - "R" comes after "H", "L", and "O", so it goes to the right of all of them, and then it comes before "W" so it goes to the left of it.
"L" - "L" is already in the tree, so we ignore it
"D" - "D" comes before "H", so it goes to the left of it, and it also comes before "E", so it goes to the left of that, too.

The result is the tree you see. If you look at any node in the tree, you will see that the binary search tree ordering property holds. As an aside, unlike heaps, which only provided a partial ordering, binary search trees provide a total ordering.
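
The walk-through above can be replayed in code. What follows is a minimal sketch, not the lecture's implementation: the class name HelloWorldTree, its Node type, and this insert() method are hypothetical names introduced here, and the space in "HELLO WORLD" is skipped on the assumption that only letters are inserted. For each letter it reports the left/right moves taken from the root, or notes that the letter is a duplicate.

```java
// Illustrative sketch: replay the "HELLO WORLD" insertions and record each path.
// All names here (HelloWorldTree, Node, insert) are hypothetical.
class HelloWorldTree {
    static class Node {
        char data;
        Node left, right;
        Node(char data) { this.data = data; }
    }

    Node root;

    // Inserts c and returns the moves taken from the root ("L" = left, "R" = right).
    // Returns null if c is already in the tree (duplicates are ignored).
    String insert(char c) {
        if (root == null) {
            root = new Node(c);
            return "";
        }
        StringBuilder path = new StringBuilder();
        Node current = root;
        while (true) {
            if (c == current.data) {
                return null;                       // duplicate: ignore it
            } else if (c < current.data) {
                path.append('L');
                if (current.left == null) {
                    current.left = new Node(c);
                    return path.toString();
                }
                current = current.left;
            } else {
                path.append('R');
                if (current.right == null) {
                    current.right = new Node(c);
                    return path.toString();
                }
                current = current.right;
            }
        }
    }

    public static void main(String[] args) {
        HelloWorldTree tree = new HelloWorldTree();
        for (char c : "HELLO WORLD".toCharArray()) {
            if (c == ' ') continue;                // assumption: only letters are inserted
            String path = tree.insert(c);
            if (path == null) {
                System.out.println(c + ": already in the tree, ignored");
            } else if (path.isEmpty()) {
                System.out.println(c + ": becomes the root");
            } else {
                System.out.println(c + ": " + path);
            }
        }
    }
}
```

For "R" the recorded path is RRRL: right of "H", "L", and "O", then left of "W", matching the step described above.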

Why Use A Binary Search Tree?

Wouldn't we be able to do more with a tree which doesn't adhere to such strict guidelines? As with the stack and queue, sometimes limiting a structure gives us more control.
If we have a linked list or a generic tree, to find a value we need to look at every single item in the list or tree before we know that it is not in there. If there are a lot of items, this can take a long time. With a binary search tree, we can eliminate part of the tree with each step because we have a better understanding of where to look for the data we are trying to find.
Suppose we're looking for the letter F in the "HELLO WORLD" tree. We can immediately eliminate everything to the right of the H, because we know that F can't be there because F comes before H. We move on to E, and we can eliminate everything on its left, because we know that it can't be there because it comes after E. E's right child is null, so we have determined that F is not in the tree by only looking at 2 of the 7 nodes. This is clearly an improvement over a linked list, where we would have needed to look at all 7 nodes.
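
To make the count concrete, here is a small sketch that builds the seven-node tree by hand and counts how many nodes a search examines. The names SearchCount, Node, and nodesExamined() are made up for this illustration and are not from the lecture.

```java
// Illustrative sketch: count the nodes examined while searching the "HELLO WORLD" tree.
class SearchCount {
    static class Node {
        char data;
        Node left, right;
        Node(char data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // The tree from the walk-through, built by hand.
    static Node helloWorldTree() {
        Node d = new Node('D', null, null);
        Node e = new Node('E', d, null);
        Node r = new Node('R', null, null);
        Node w = new Node('W', r, null);
        Node o = new Node('O', null, w);
        Node l = new Node('L', null, o);
        return new Node('H', e, l);
    }

    // Returns how many nodes are examined before target is found or ruled out.
    static int nodesExamined(Node root, char target) {
        int count = 0;
        Node current = root;
        while (current != null) {
            count++;
            if (target == current.data) break;
            current = (target < current.data) ? current.left : current.right;
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = helloWorldTree();
        System.out.println("F: " + nodesExamined(root, 'F') + " nodes examined"); // H, then E
        System.out.println("R: " + nodesExamined(root, 'R') + " nodes examined"); // H, L, O, W, R
    }
}
```

Searching for F examines 2 nodes, as in the text. Searching for R examines 5, because this particular tree leans heavily to the right; that is why the analysis that follows restricts itself to balanced trees.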
How many nodes do we have to look at in a tree before we know that an item is not in the tree? To simplify this, we will only look at the best possible trees of a given size. These best trees are complete and balanced, meaning that the path from the root to the farthest leaf is at most one step longer than the path from the root to the closest leaf. The following are examples of balanced trees:

So how many nodes do we have to look at in the worst case? If there are N nodes in the tree:

N nodes    # looked at
-------    -----------
   1            1
   2            2
   3            2
   4            3
   5            3
   6            3
   7            3
   8            4

You can probably see a pattern here. The number of items you need to look at grows every time we reach a power of 2. We would only need to look at 4 items for trees with between 8 and 15 nodes and then we would have to look at 5 items in a tree with 16 nodes. This is known as logarithmic growth, and we can create a formula for the number of items we need to look at in a tree with N nodes.
# items = ⌊log₂ N⌋ + 1, where ⌊ ⌋ means "round down to a whole number"
In a balanced tree with 1,000,000 nodes, we would only need to look at 20, since 2¹⁹ ≤ 1,000,000 < 2²⁰. This is a significant improvement over linearly searching a linked list.
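
The formula is easy to check numerically. The sketch below is an illustration only; LookupCount and worstCaseLookups() are hypothetical names. It computes log₂ N rounded down to a whole number, plus one, using integer arithmetic: that value equals the number of bits needed to write N in binary.

```java
// Illustrative sketch: worst-case number of nodes examined in a balanced BST of n nodes.
class LookupCount {
    // floor(log2(n)) + 1, which equals the number of bits in the binary form of n.
    static int worstCaseLookups(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " nodes -> " + worstCaseLookups(n));
        }
        System.out.println("1000000 nodes -> " + worstCaseLookups(1_000_000));
    }
}
```

The values for 1 through 8 nodes reproduce the table above (1, 2, 2, 3, 3, 3, 3, 4), and 1,000,000 nodes gives 20.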

The Comparable Interface

As we start developing code to implement Binary Search Trees, we're going to need a way of comparing different Objects. The Java Comparable interface provides this functionality in the form of the compareTo() method. So, instead of dealing with Objects, our trees will store Comparable Objects -- only those Objects that implement the Comparable interface. The Comparable Interface defines only one method: int compareTo(Object o).
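Let's consider a.compareTo(b). In this case, compareTo() will return a positive number if a is greater than b, 0 if the two are equal, or a negative number if a is less than b. Many implementations return exactly 1 and -1, but only the sign is guaranteed, so code should test the result against 0 (for example, > 0) rather than against 1 or -1.
Remember, the compareTo() method must be defined in each class that implements the Comparable interface. It is in this definition that the implementor specifies how objects of that particular type are compared.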
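
As an illustration, here is a sketch of a class that implements Comparable. The Student class and its id and name fields are invented for this example. It uses the generic form Comparable<Student>, available since Java 5, so compareTo() takes a Student rather than an Object; the pre-generics signature described above behaves the same way apart from needing a cast.

```java
// Illustrative sketch: a hypothetical Student class ordered by id number.
class Student implements Comparable<Student> {
    final int id;
    final String name;

    Student(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Negative if this student sorts before other, zero if their ids are equal,
    // positive if this student sorts after other.
    @Override
    public int compareTo(Student other) {
        return Integer.compare(this.id, other.id);
    }

    public static void main(String[] args) {
        Student a = new Student(17, "Ann");
        Student b = new Student(42, "Bob");
        System.out.println(a.compareTo(b) < 0);    // true: 17 sorts before 42
        System.out.println(b.compareTo(a) > 0);    // true
        System.out.println(a.compareTo(a) == 0);   // true
        // Only the sign matters: String's compareTo() returns -3 here, not -1.
        System.out.println("a".compareTo("d"));
    }
}
```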

Inserting Into A Binary Search Tree

We already went through the process of building a tree when we created the "HELLO WORLD" tree, so now let's take a look at some code to perform the insertion. Since trees are naturally recursive, we will use recursion. The root parameter in this code is the root of the current subtree, not necessarily the root of the original tree. We search for the position at which to insert the new node by comparing the data with the current root and discarding the subtree that cannot contain it; in a balanced tree, each comparison cuts the remaining tree roughly in half. We then call the insert method recursively on the remaining subtree, passing it either the left child or the right child of the root from the previous recursive activation.
// "root" here is the root of the current subtree
void BSTinsert(BinaryTreeNode root, Comparable data)
{
   // if the tree is initially empty, the data we
   // add becomes the root of the tree
   if (null == this.root)
   {
      this.root = new BinaryTreeNode(data);
      return;
   }

   // if the current data matches the data we want to
   // insert, it is already in the tree so we ignore it
   if (root.data().compareTo(data) == 0)
   {
      return;
   }

   // if the data we want to add is less than the root's data,
   // we need to go to the left
   if (root.data().compareTo(data) > 0)
   {
      // if the left is null, we can add data there
      if (root.left() == null)
      {
         root.setLeft(new BinaryTreeNode(data));
         return;
      }
      // if not, we need to recursively insert into the
      // subtree on the left
      else
      {
         BSTinsert(root.left(), data);
         return;
      }
   }
   // if the current data is less than the one we want
   // to add, we need to go to the right
   else
   {
      // if the right is null, we can add data there
      if (root.right() == null)
      {
         root.setRight(new BinaryTreeNode(data));
         return;
      }
      // if not, we need to recursively insert into the
      // subtree on the right
      else
      {
         BSTinsert(root.right(), data);
         return;
      }
   }
}
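
The method above relies on a BinaryTreeNode class and a root field that these notes do not show. Here is one way they might look; this is a guess at the supporting code, not the class used in lecture. The accessor names data(), left(), right(), setLeft(), and setRight() are taken from the calls above, while the wrapper class BST, the generic type parameter, and the one-argument insert() are assumptions. The recursion is also restructured slightly: each call returns the root of the subtree it was given, which removes the separate null checks on each child.

```java
// Illustrative sketch of the supporting classes assumed by BSTinsert().
class BST<T extends Comparable<T>> {
    private BinaryTreeNode<T> root;

    BinaryTreeNode<T> root() { return root; }

    // Entry point: start the recursion at the root of the whole tree.
    void insert(T data) { root = insert(root, data); }

    // Returns the root of the subtree after inserting data into it.
    private BinaryTreeNode<T> insert(BinaryTreeNode<T> node, T data) {
        if (node == null) return new BinaryTreeNode<>(data);   // empty spot: add here
        int cmp = node.data().compareTo(data);
        if (cmp > 0) {
            node.setLeft(insert(node.left(), data));            // data is smaller: go left
        } else if (cmp < 0) {
            node.setRight(insert(node.right(), data));          // data is larger: go right
        }
        // cmp == 0: already in the tree, so ignore it
        return node;
    }

    public static void main(String[] args) {
        BST<Integer> tree = new BST<>();
        for (int value : new int[] {10, 5, 15, 3, 9, 12, 18, 8}) {
            tree.insert(value);
        }
        System.out.println(tree.root().data());                        // 10
        System.out.println(tree.root().left().right().left().data());  // 8
    }
}

class BinaryTreeNode<T> {
    private final T data;
    private BinaryTreeNode<T> left, right;

    BinaryTreeNode(T data) { this.data = data; }

    T data() { return data; }
    BinaryTreeNode<T> left() { return left; }
    BinaryTreeNode<T> right() { return right; }
    void setLeft(BinaryTreeNode<T> node) { left = node; }
    void setRight(BinaryTreeNode<T> node) { right = node; }
}
```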
  

Searching in a Binary Search Tree

As with insert(), this is implemented recursively. It returns the data, if found, or throws an exception, otherwise.

Comparable BSTfind(BinaryTreeNode root, Comparable findMe) throws NotFoundException
{
   // if the current subtree is null, then findMe
   // can't possibly be in it
   if (null == root)
   {
      throw new NotFoundException("Item not found in BST.");
   }

   // if the current data matches findMe, we have
   // found it so we can return it
   if (root.data().compareTo(findMe) == 0)
   {
     return root.data();
   }

   // if the current data is greater than findMe, then
   // if findMe is in the tree it must be to the left,
   // so we will recursively search the left subtree
   if (root.data().compareTo(findMe) > 0)
   {
      return BSTfind(root.left(), findMe);
   }
   // if the current data is less than findMe, then
   // if findMe is in the tree it must be to the right,
   // so we will recursively search the right subtree
   else
   {
      return BSTfind(root.right(), findMe);
   }
}
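
These notes do not show a definition of NotFoundException, so a caller needs one before the method above will compile. A minimal sketch follows; the exception class, the FindDemo class, and the small hand-built tree are all assumptions made for illustration, and the Node type is a stripped-down stand-in for BinaryTreeNode.

```java
// Illustrative sketch: a hypothetical NotFoundException and a caller that handles it.
class FindDemo {
    // Assumed definition; the lecture does not show this class.
    static class NotFoundException extends Exception {
        NotFoundException(String message) { super(message); }
    }

    static class Node {
        final Integer data;
        final Node left, right;
        Node(Integer data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Same logic as the recursive search above.
    static Integer find(Node root, Integer findMe) throws NotFoundException {
        if (root == null) {
            throw new NotFoundException("Item not found in BST.");
        }
        int cmp = root.data.compareTo(findMe);
        if (cmp == 0) return root.data;
        return (cmp > 0) ? find(root.left, findMe) : find(root.right, findMe);
    }

    public static void main(String[] args) {
        //      10
        //     /  \
        //    5    15
        Node root = new Node(10, new Node(5, null, null), new Node(15, null, null));
        try {
            System.out.println(find(root, 15));    // prints 15
            System.out.println(find(root, 7));     // throws: 7 is not in the tree
        } catch (NotFoundException e) {
            System.out.println(e.getMessage());    // prints "Item not found in BST."
        }
    }
}
```

Because the exception is checked in this sketch, every caller must either catch it or declare it, which forces the "not found" case to be handled explicitly.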

Total Ordering? Let's Print the Nodes In Order

I mentioned earlier that a BST provides for a total ordering of the elements it contains. This implies that we should be able to print the nodes in order. So, how do we do that?
Well, in a balanced binary tree, the root node is in the middle of the items in the tree. Even in an unbalanced tree, we know that if there are items that are lower than the root, they'll be to its left. Since we want to print the lowest valued items first, we want to move to the left. And, we want to continue to do this, until we can't move left any more - this will get us to the lowest item in the tree. We can then print it.
Given that we are at the lowest item in the tree, and assuming for the moment that it has no right child, we know that its parent (if it has one; the root has no parent) is the next lowest item in the tree. So, we want to move back up to its parent, and then print it. From there, we know that we've printed the parent and everything less than it, so the next greatest item will be to the right of the parent. So, we move right.
Having moved right, we want to repeat this whole process, working our way to the left, and then back up and to the right.
So, since we defined a tree using nodes that don't have parent references, how do we move back up to the parent? The easy answer is to use recursion. Using recursion, the runtime stack will hold the path from the parent down to the current node. By returning, we can get back to our parent.
Actually, the whole operation is defined recursively in a very straightforward way:
  1.  void inOrder(BinaryTreeNode root)
  2.  {
  3.      if (null == root) return;
  4. 
  5.      inOrder(root.left());   // print the entire left subtree
  6. 
  7.      System.out.println(root.data());
  8. 
  9.      inOrder(root.right());  // print the entire right subtree
  10.
  11.     return;
  12. }
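
To see what the runtime stack is doing for us, the same traversal can be written with an explicit stack in place of recursion. This is an alternative formulation offered only as a sketch, not the version used in lecture; InOrderIterative, Node, and leaf() are hypothetical names, and the sample tree matches the one used in the hand trace in these notes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: in-order traversal with an explicit stack instead of recursion.
class InOrderIterative {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    static List<Integer> inOrder(Node root) {
        List<Integer> visited = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();   // holds the path back up to the ancestors
        Node current = root;
        while (current != null || !stack.isEmpty()) {
            while (current != null) {             // go left as far as possible
                stack.push(current);
                current = current.left;
            }
            current = stack.pop();                // "return" to the nearest unvisited ancestor
            visited.add(current.data);            // visit it
            current = current.right;              // then explore its right subtree
        }
        return visited;
    }

    static Node leaf(int data) { return new Node(data, null, null); }

    public static void main(String[] args) {
        Node nine = new Node(9, leaf(8), null);
        Node five = new Node(5, leaf(3), nine);
        Node fifteen = new Node(15, leaf(12), leaf(18));
        Node root = new Node(10, five, fifteen);
        System.out.println(inOrder(root));        // [3, 5, 8, 9, 10, 12, 15, 18]
    }
}
```

The recursive version is shorter because the language maintains this stack automatically; the explicit version simply makes the bookkeeping visible.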
  

In class, we went through several traces by hand. The value of these is significantly lost in lecture notes, because they are not interactive. But, I'll include one trace, just for completeness.

Let's consider the following tree:

             10
            /  \
           /    \
          5     15
         / \   /  \
        /   \ 12  18
       3     9
            /
           8

Initially, we begin in some CallingMethod() by calling inOrder() passing it the root of the tree, which I'll symbolize as Node-10. This puts us at line 1 of inOrder(), I'll note this as: inOrder (Node-10):1.

So, the stack looks like this:

     inOrder (Node-10):1
     CallingMethod():?

Node-10 isn't null, so we pass the test at line 3 and continue to line 5, where we go left, activating another instance of the inOrder() method, this time rooted at Node-5. Just before that call, we have a stack that looks like this:

     inOrder (Node-10):5
     CallingMethod():?

And, once we make the recursive call, it looks like this:

     inOrder (Node-5):1
     inOrder (Node-10):5
     CallingMethod():?

Since Node-5 is not null, this process repeats. Again, we push another stack frame onto the stack.

     inOrder (Node-3):1
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

The stack above shows that we are three levels deep in the tree (the stack depth is three -- three calls). We have gone left (line 5) twice, and are now beginning the third instance of the inOrder() method (line 1).

So, since node-3 is not null, we continue past line 3 to line 5, and go left again:

     inOrder (Node-null):1
     inOrder (Node-3):5
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

This time, the root is null -- node-3 didn't have a left child. So, at line 3, we return and pick up where we left off, popping the stack as shown:

     inOrder (Node-3):5 //Continuing from here
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

After line 5 in inOrder(Node-3), we hit line 7, and print the node ...so, we print out "3".

Then we continue at line 9 and try to go right:

     inOrder (Node-null):1 
     inOrder (Node-3):9
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

But, the node is null, so at line 3 of inOrder (node-null), we return, and pop the stack, so we continue from here:

     inOrder (Node-3):9 // pick up here
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

We next hit line 11 of inOrder(Node-3), which returns, so we pop the runtime stack again:

     inOrder (Node-5):5 // Continue from here
     inOrder (Node-10):5 
     CallingMethod():?

So, we pick up where we left off in inOrder(Node-5), and print the node at line 7...so, we print out 5. We have now printed 3 and 5.

Next, we continue to line 9, where we go right:

     inOrder (Node-9):1 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

Since Node-9 is not null, we continue to line 5, where we go left:

     inOrder (Node-8):1 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

And, we do the same in the next activation of inOrder(): inOrder(Node-8) reaches line 5 and recursively calls inOrder(Node-null):

     inOrder (Node-null):1 
     inOrder (Node-8):5 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

This call returns at line 3, since the root is null, again popping the stack:

     inOrder (Node-8):5 // continue from here
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

So, inOrder(Node-8) continues to line 7, printing 8. We have now printed 3, 5, and 8, in that order.

inOrder(Node-8) then continues to line 9, where it calls itself recursively on its right child:

     inOrder (Node-null):1
     inOrder (Node-8):9 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

But, since the right child was null (didn't exist), this activation of the method returns at line 3, popping the stack:

     inOrder (Node-8):9 // continue from here.
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

inOrder(Node-8) then continues where it left off, right after line 9, and reaches line 11, where it returns, again popping the stack:

     inOrder (Node-9):5 // Continue from here
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

Now, we're back to inOrder (Node-9), which picks up after line 5, at line 7, and prints the node. We've now printed 3, 5, 8, and 9, in that order.

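It then continues to line 9, where the recursive call on its null right child returns immediately at line 3, and on to line 11, where it returns, again popping the stack: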

     inOrder (Node-5):9 // continue from here
     inOrder (Node-10):5 
     CallingMethod():?

So, we find ourselves back in inOrder(Node-5), after line 9, so we hit line 11, and return, again, popping the stack:

     inOrder (Node-10):5 // Continue from here
     CallingMethod():?

So, at this point, the recursion has unwound and we have found ourselves back in the first call to inOrder(), inOrder(Node-10). We have now printed the entire left subtree. So, we continue to line 7, where we print out the node. We've now printed out 3, 5, 8, 9, and 10, in order.

We then continue to line 9 of inOrder(Node-10), where we begin exploring the right subtree, by calling inOrder(Node-15). As before, we push the new call onto our stack, and once again begin a growing, recursive phase (as opposed to the shrinking, unwinding phase):

     inOrder (Node-15):1
     inOrder (Node-10):9 
     CallingMethod():?

Since the node isn't null, we proceed in inOrder(Node-15) past line 3 to line 5. Here we call inOrder() on the left subtree, again pushing the new call, inOrder(Node-12) onto the stack:

     inOrder (Node-12):1
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-12) again passes the test at line three and "goes deeper" at line 5, calling inOrder() on the left subtree: inOrder(Node-null):

     inOrder (Node-null):1
     inOrder (Node-12):5
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Since the node is null, the test at line three causes it to return, popping the stack:

     inOrder (Node-12):5 // Continue from here
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-12) then picks up where it left off, continuing to line 7, where it prints the node. We've now printed 3, 5, 8, 9, 10, and 12, in order.

It then proceeds to line 9 and explores the right subtree, which is null:

     inOrder (Node-null):1 
     inOrder (Node-12):9
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Since this root is null, it is caught by the test at line 3, and returns, popping the stack:

     inOrder (Node-12):9 // Continue from here
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Now we're back in inOrder(Node-12), just after line 9. Execution proceeds to line 11, where it returns, again unwinding.

     inOrder (Node-15):5 // Continue from here
     inOrder (Node-10):9 
     CallingMethod():?

So, inOrder(Node-15) continues after line 5, printing the node at line 7. We've now printed 3, 5, 8, 9, 10, 12, and 15, in order.

Execution then continues to line 9, where we go to the right, inOrder(Node-18):

     inOrder (Node-18):1
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Since Node-18 is not null, inOrder (Node-18) continues to line 5, where the left sub-tree will be explored:

     inOrder (Node-null):1
     inOrder (Node-18):5
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Unfortunately, the root passed into inOrder() is null, so it hits line 3 then returns, again popping the stack:

     inOrder (Node-18):5 // Execution continues here
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Execution continues after line 5 of inOrder(Node-18). At line 7, 18 is printed. We've now printed 3, 5, 8, 9, 10, 12, 15, and 18, in order.

We then try to explore the right sub-tree of Node-18, by continuing to line 9, where it makes a recursive call, passing its null right child as the root:

     inOrder (Node-null):1 
     inOrder (Node-18):9 
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

This activation of inOrder() returns at line 3, because the root is null. This again pops the stack:

     inOrder (Node-18):9 
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Notice that, in the stack shown above, each activation is at line 9. Upon return, each, in turn, will proceed to line 11 and return. We've seen this behavior before; it is called unwinding. There is nothing in the recursive method after line 9, except the return.

As a result, we'll just see the stack shrink as each activation picks up after line 9, reaches line 11, and returns, popping the stack:

     inOrder (Node-15):9 // Continue here
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-15) continues after line 9, reaching line 11, and returning, again popping the stack:

     inOrder (Node-10):9 // Continue here 
     CallingMethod():?

And, the same is true of inOrder (Node-10). At this point, we're back to the calling function which, having printed the elements of the tree in order, continues along its merry way:

     CallingMethod():? // Continue here
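
The trace can be cross-checked mechanically. The sketch below instruments the recursive method to record the print order, the total number of activations, and the largest number of inOrder() frames ever on the stack at once. The class TraceCheck, its fields, and the extra depth parameter are hypothetical, introduced only for this check.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: instrument inOrder() to confirm the hand trace.
class TraceCheck {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    final List<Integer> printed = new ArrayList<>();
    int calls = 0;      // total activations of inOrder(), including those on null
    int maxDepth = 0;   // most inOrder() frames on the stack at one time

    void inOrder(Node root, int depth) {
        calls++;
        maxDepth = Math.max(maxDepth, depth);
        if (root == null) return;
        inOrder(root.left, depth + 1);
        printed.add(root.data);
        inOrder(root.right, depth + 1);
    }

    static Node leaf(int data) { return new Node(data, null, null); }

    // The tree used in the trace above.
    static Node sampleTree() {
        Node nine = new Node(9, leaf(8), null);
        Node five = new Node(5, leaf(3), nine);
        return new Node(10, five, new Node(15, leaf(12), leaf(18)));
    }

    public static void main(String[] args) {
        TraceCheck trace = new TraceCheck();
        trace.inOrder(sampleTree(), 1);
        System.out.println(trace.printed);    // [3, 5, 8, 9, 10, 12, 15, 18]
        System.out.println(trace.calls);      // 17: 8 nodes plus 9 null children
        System.out.println(trace.maxDepth);   // 5: Node-10, Node-5, Node-9, Node-8, null
    }
}
```

The maximum depth of 5 corresponds to the tallest stack drawn in the trace, the one reached while exploring the null left child of Node-8.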