15-111 Lecture 19 (Wednesday, March 21, 2007)

The Comparable Interface

As we start developing code to implement Binary Search Trees, we're going to need a way of comparing different Objects. The Java Comparable interface provides this functionality in the form of the compareTo() method. So, instead of dealing with Objects, our trees will store Comparable Objects -- only those Objects that implement the Comparable interface. Although you have used the compareTo() method in lab, we never did formally introduce it in lecture -- so we'll do that now.
The Comparable Interface defines only one method: int compareTo(Object o).
Let's consider a.compareTo(b). In this case, compareTo() will return a positive integer if a is greater than b, 0 if the two are equal, or a negative integer if a is less than b. Many implementations return exactly 1 or -1, but callers should test only the sign of the result.
Remember, the compareTo() method must be defined in each Object that implements the Comparable interface. It is in this definition that the implementor decides how the particular type of Object is compared.
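As a minimal sketch of what such a definition might look like, consider a hypothetical Student class (the class and its fields are made up for illustration; they are not part of the course code) that is ordered by ID number. The sketch uses the generic form of the interface, Comparable<Student>, so compareTo() takes a Student rather than an Object; the idea is the same.

```java
// Hypothetical example class: the name and fields are illustrative only.
class Student implements Comparable<Student> {
    private final String name;
    private final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Orders students by id: negative if this student comes first,
    // zero if the ids match, positive if this student comes later.
    @Override
    public int compareTo(Student other) {
        return Integer.compare(this.id, other.id);
    }
}

class ComparableDemo {
    public static void main(String[] args) {
        Student a = new Student("Ann", 7);
        Student b = new Student("Bob", 12);
        System.out.println(a.compareTo(b) < 0);   // true: 7 is less than 12
        System.out.println(b.compareTo(a) > 0);   // true: 12 is greater than 7
        System.out.println(a.compareTo(a) == 0);  // true: equal to itself
    }
}
```

Note that the demo checks only whether the result is negative, zero, or positive, never whether it is exactly -1 or 1.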

The Big Picture

Much like our LinkedList and DoublyLinkedList classes, our BST will require two related classes: a BSTNode to represent the data and the left and right subtrees, and the BST, itself, which will contain the root of the tree and all of the methods, such as insert() and find(), that manipulate it.
The root of the BST class serves a very similar purpose to the head of the LinkedList -- it gives us a place to start. And the left and right references within the BSTNode are analogous to the prev and next references within a doubly linked list node. They name other, related, nodes that are part of the tree structure. And, as before, the data member will be accessible, but immutable. The other references within the BSTNode will be mutable.
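A node class along these lines might look like the following sketch. The accessor names (data(), left(), right(), setLeft(), setRight()) and the class name BinaryTreeNode are taken from the code in these notes; the generic type parameter is an assumption added here so that the sketch compiles without warnings.

```java
// Sketch of a tree node: the data is fixed at construction time,
// while the child references can change as the tree grows or shrinks.
class BinaryTreeNode<T extends Comparable<T>> {
    private final T data;               // accessible, but immutable
    private BinaryTreeNode<T> left;     // mutable: root of the left subtree
    private BinaryTreeNode<T> right;    // mutable: root of the right subtree

    BinaryTreeNode(T data) {
        this.data = data;
    }

    T data() { return data; }
    BinaryTreeNode<T> left() { return left; }
    BinaryTreeNode<T> right() { return right; }

    void setLeft(BinaryTreeNode<T> left) { this.left = left; }
    void setRight(BinaryTreeNode<T> right) { this.right = right; }
}
```

There is deliberately no setData() method: once a node is created, only its links to other nodes can be changed.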

Inserting Into A Binary Search Tree

We already went through the process of building a tree when we created the "HELLO WORLD" tree, so now let's take a look at some code to perform the insertion. Since trees are naturally recursive, we will use recursion. The root parameter in this code is the current subtree, not necessarily the root of the original tree. We search for the position to insert the new node by discarding one subtree with each examination (half of the remaining tree, if the tree is balanced) and determining which side to search. We then call the insert method recursively on the correct side of the tree, by passing it either the left child or the right child of the root from the previous recursive activation.
// "root" here is the root of the current subtree
void BSTinsert(BinaryTreeNode root, Comparable data)
{
   // if the tree is initially empty, the data we
   // add becomes the root of the tree
   if (null == this.root)
   {
      this.root = new BinaryTreeNode(data);
      return;
   }

   // if the current data matches the data we want to
   // insert, it is already in the tree so we ignore it
   if (root.data().compareTo(data) == 0)
   {
      return;
   }

   // if the current data is greater than the one we
   // want to add, we need to go to the left
   if (root.data().compareTo(data) > 0)
   {
      // if the left is null, we can add data there
      if (root.left() == null)
      {
         root.setLeft(new BinaryTreeNode(data));
         return;
      }
      // if not, we need to recursively insert into the
      // subtree on the left
      else
      {
         BSTinsert(root.left(), data);
         return;
      }
   }
   // if the current data is less than the one we want
   // to add, we need to go to the right
   else
   {
      // if the right is null, we can add data there
      if (root.right() == null)
      {
         root.setRight(new BinaryTreeNode(data));
         return;
      }
      // if not, we need to recursively insert into the
      // subtree on the right
      else
      {
         BSTinsert(root.right(), data);
         return;
      }
   }
}
  

Searching in a Binary Search Tree

As with insert(), this is implemented recursively. It returns the data, if found, or throws an exception, otherwise.

Comparable BSTfind(BinaryTreeNode root, Comparable findMe) throws NotFoundException
{
   // if the current subtree is null, then findMe
   // can't possibly be in it
   if (null == root)
   {
      throw new NotFoundException("Item not found in BST.");
   }

   // if the current data matches findMe, we have
   // found it so we can return it
   if (root.data().compareTo(findMe) == 0)
   {
      return root.data();
   }

   // if the current data is greater than findMe, then
   // if findMe is in the tree it must be to the left,
   // so we will recursively search the left subtree
   if (root.data().compareTo(findMe) > 0)
   {
      return BSTfind(root.left(), findMe);
   }
   // if the current data is less than findMe, then
   // if findMe is in the tree it must be to the right,
   // so we will recursively search the right subtree
   else
   {
      return BSTfind(root.right(), findMe);
   }
}
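To see insertion and searching work together, here is a self-contained sketch under a few assumptions: the class name BstSketch, its nested Node class, and the public insert()/find() wrappers that start the recursion at the root are all invented for this example, and the standard NoSuchElementException stands in for the NotFoundException used above.

```java
import java.util.NoSuchElementException;

// Illustrative sketch only; the names here are not from the course code.
class BstSketch<T extends Comparable<T>> {
    private static class Node<T> {
        final T data;
        Node<T> left, right;
        Node(T data) { this.data = data; }
    }

    private Node<T> root;

    // Public entry point: handles the empty tree, then recurses.
    public void insert(T data) {
        if (root == null) {
            root = new Node<>(data);
            return;
        }
        insert(root, data);
    }

    private void insert(Node<T> current, T data) {
        int cmp = current.data.compareTo(data);
        if (cmp == 0) {
            return;                              // already in the tree
        }
        if (cmp > 0) {                           // current is greater: go left
            if (current.left == null) current.left = new Node<>(data);
            else insert(current.left, data);
        } else {                                 // current is less: go right
            if (current.right == null) current.right = new Node<>(data);
            else insert(current.right, data);
        }
    }

    public T find(T findMe) {
        return find(root, findMe);
    }

    private T find(Node<T> current, T findMe) {
        if (current == null) {
            throw new NoSuchElementException("Item not found in BST.");
        }
        int cmp = current.data.compareTo(findMe);
        if (cmp == 0) return current.data;
        return cmp > 0 ? find(current.left, findMe) : find(current.right, findMe);
    }

    public static void main(String[] args) {
        BstSketch<Integer> tree = new BstSketch<>();
        for (int value : new int[] {10, 5, 15, 3, 9, 8, 12, 18}) {
            tree.insert(value);
        }
        System.out.println(tree.find(9));        // prints 9
        try {
            tree.find(7);
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());  // prints Item not found in BST.
        }
    }
}
```

Storing the result of compareTo() in a local variable avoids calling it twice for the same pair of items, which the step-by-step version above does for the sake of clarity.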

Total Ordering? Let's Print the Nodes In Order

I mentioned earlier that a BST provides for a total ordering of the elements it contains. This implies that we should be able to print the nodes in order. So, how do we do that?
Well, in a balanced binary tree, the root node is in the middle of the items in the tree. Even in an unbalanced tree, we know that if there are items that are lower than the root, they'll be to its left. Since we want to print the lowest valued items first, we want to move to the left. And, we want to continue to do this, until we can't move left any more - this will get us to the lowest item in the tree. We can then print it.
Given that we are at the lowest item in the tree, and assuming it has no right subtree, we know that its parent is the next lowest item in the tree (unless the lowest item is the root, which has no parent). So, we want to move back up to its parent, and then print it. From there, we know that we've printed the parent and everything less than it, so the next greatest item will be to the right of the parent. So, we move right.
Having moved right, we want to repeat this whole process, working our way to the left, and then back up and to the right.
So, since we defined a tree using nodes that don't have parent references, how do we move back up to the parent? The easy answer is to use recursion. Using recursion, the runtime stack will hold the path from the parent down to the current node. By returning, we can get back to our parent.
Actually, the whole operation is defined recursively in a very straightforward way:
  1.  void inOrder(BinaryTreeNode root)
  2.  {
  3.      if (null == root) return;
  4. 
  5.      inOrder(root.left());   // print the entire left subtree
  6. 
  7.      System.out.println(root.data());
  8. 
  9.      inOrder(root.right());  // print the entire right subtree
  10.
  11.     return;
  12. }
  
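To experiment with this method, a self-contained sketch such as the following could be used. The InOrderDemo class, its simplified Node class with public int data, and the exampleTree() helper are assumptions made for this example; the method collects values in a list instead of printing them so that the result is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the names here are not from the course code.
class InOrderDemo {
    static class Node {
        final int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Same shape as inOrder() above: left subtree, then this node,
    // then right subtree. Values are appended to out instead of printed.
    static void inOrder(Node root, List<Integer> out) {
        if (root == null) return;
        inOrder(root.left, out);
        out.add(root.data);
        inOrder(root.right, out);
    }

    // Builds a small example tree by hand:
    // 10 at the root, 5 (with children 3 and 9, and 8 under 9) on the left,
    // and 15 (with children 12 and 18) on the right.
    static Node exampleTree() {
        Node nine = new Node(9, new Node(8, null, null), null);
        Node five = new Node(5, new Node(3, null, null), nine);
        Node fifteen = new Node(15, new Node(12, null, null), new Node(18, null, null));
        return new Node(10, five, fifteen);
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        inOrder(exampleTree(), out);
        System.out.println(out);  // prints [3, 5, 8, 9, 10, 12, 15, 18]
    }
}
```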

In class, we went through several traces by hand. The value of these is significantly lost in lecture notes, because they are not interactive. But, I'll include one trace, just for completeness.

Let's consider the following tree:

             10
            /  \
           /    \
          5     15
         / \   /  \
        /   \ 12  18
       3     9
            /
           8

Initially, we begin in some CallingMethod() by calling inOrder() passing it the root of the tree, which I'll symbolize as Node-10. This puts us at line 1 of inOrder(); I'll note this as: inOrder (Node-10):1.

So, the stack looks like this:

     inOrder (Node-10):1
     CallingMethod():?

Since Node-10 isn't null, we continue past line 3 to line 5, where we will go left, activating another instance of the inOrder() method, this time rooted at Node-5. Just before that call, we have a stack that looks like this:

     inOrder (Node-10):5
     CallingMethod():?

And, once we make the recursive call, it looks like this:

     inOrder (Node-5):1
     inOrder (Node-10):5
     CallingMethod():?

Since Node-5 is not null, this process repeats. Again, we push another stack frame onto the stack.

     inOrder (Node-3):1
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

The stack above shows that we are three levels deep in the tree (the stack depth is three -- three calls). We have gone left (line 5) twice, and are now beginning the third instance of the inOrder() method (line 1).

So, since Node-3 is not null, we continue past line 3 to line 5, and go left again:

     inOrder (Node-null):1
     inOrder (Node-3):5
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

This time, the root is null -- Node-3 didn't have a left child. So, at line 3, we return and pick up where we left off, popping the stack as shown:

     inOrder (Node-3):5 //Continuing from here
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

After line 5 in inOrder(Node-3), we hit line 7, and print the node ...so, we print out "3".

Then we continue at line 9 and try to go right:

     inOrder (Node-null):1 
     inOrder (Node-3):9
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

But, the node is null, so at line 3 of inOrder (Node-null), we return, and pop the stack, so we continue from here:

     inOrder (Node-3):9 // pick up here
     inOrder (Node-5):5
     inOrder (Node-10):5 
     CallingMethod():?

We next hit line 11 of inOrder(Node-3), which returns, so we pop the runtime stack again:

     inOrder (Node-5):5 // Continue from here
     inOrder (Node-10):5 
     CallingMethod():?

So, we pick up where we left off in inOrder(Node-5), and print the node at line 7...so, we print out 5. We have now printed 3 and 5.

Next, we continue to line 9, where we go right:

     inOrder (Node-9):1 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

Since Node-9 is not null, we continue to line 5, where we go left:

     inOrder (Node-8):1 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

And, we do the same in the next activation of inOrder(): inOrder(Node-8) reaches line 5 and recursively calls inOrder(Node-null):

     inOrder (Node-null):1 
     inOrder (Node-8):5 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

This call returns at line 3, since the root is null, again popping the stack:

     inOrder (Node-8):5 // continue from here
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

So, inOrder(Node-8) continues to line 7, printing 8. We have now printed 3, 5, and 8, in that order.

inOrder(Node-8) then continues to line 9, where it calls itself recursively on its right child:

     inOrder (Node-null):1
     inOrder (Node-8):9 
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

But, since the right child was null (didn't exist), this activation of the method returns at line 3, popping the stack:

     inOrder (Node-8):9 // continue from here.
     inOrder (Node-9):5 
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

inOrder(Node-8) then continues where it left off, right after line 9, and reaches line 11, where it returns, again popping the stack:

     inOrder (Node-9):5 // Continue from here
     inOrder (Node-5):9
     inOrder (Node-10):5 
     CallingMethod():?

Now, we're back to inOrder (Node-9), which picks up after line 5, at line 7, and prints the node. We've now printed 3, 5, 8, and 9, in that order.

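It continues to line 9, where the recursive call on its null right child returns immediately at line 3, and then to line 11, where it returns, again popping the stack: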

     inOrder (Node-5):9 // continue from here
     inOrder (Node-10):5 
     CallingMethod():?

So, we find ourselves back in inOrder(Node-5), after line 9, so we hit line 11, and return, again, popping the stack:

     inOrder (Node-10):5 // Continue from here
     CallingMethod():?

So, at this point, the recursion has unwound and we have found ourselves back in the first call to inOrder(), inOrder(Node-10). We have now printed the entire left subtree. So, we continue to line 7, where we print out the node. We've now printed out 3, 5, 8, 9, and 10, in order.

We then continue to line 9 of inOrder(Node-10), where we begin exploring the right subtree, by calling inOrder(Node-15). As before, we push the new call onto our stack, and begin a recursive phase (as opposed to the shrinking unwinding phase), again:

     inOrder (Node-15):1
     inOrder (Node-10):9 
     CallingMethod():?

Since the node isn't null, we proceed in inOrder(Node-15) past line 3 to line 5. Here we call inOrder() on the left subtree, again pushing the new call, inOrder(Node-12) onto the stack:

     inOrder (Node-12):1
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-12) again passes the test at line three and "goes deeper" at line 5, calling inOrder() on the left subtree: inOrder(Node-null):

     inOrder (Node-null):1
     inOrder (Node-12):5
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Since the node is null, the test at line three causes it to return, popping the stack:

     inOrder (Node-12):5 // Continue from here
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-12) then picks up where it left off, continuing to line 7, where it prints the node. We've now printed 3, 5, 8, 9, 10, and 12, in order.

It then proceeds to line 9 and explores the right subtree, which is null:

     inOrder (Node-null):1 
     inOrder (Node-12):9
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Since this root is null, it is caught by the test at line 3, and returns, popping the stack:

     inOrder (Node-12):9 // Continue from here
     inOrder (Node-15):5
     inOrder (Node-10):9 
     CallingMethod():?

Now we're back in inOrder(Node-12), just after line 9. Execution proceeds to line 11, where it returns, again unwinding.

     inOrder (Node-15):5 // Continue from here
     inOrder (Node-10):9 
     CallingMethod():?

So, inOrder(Node-15) continues after line 5, printing the node at line 7. We've now printed 3, 5, 8, 9, 10, 12, and 15, in order.

Execution then continues to line 9, where we go to the right, inOrder(Node-18):

     inOrder (Node-18):1
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Since Node-18 is not null, inOrder (Node-18) continues to line 5, where the left sub-tree will be explored:

     inOrder (Node-null):1
     inOrder (Node-18):5
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Unfortunately, the root passed into inOrder() is null, so it hits line 3 then returns, again popping the stack:

     inOrder (Node-18):5 // Execution continues here
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Execution continues after line 5 of inOrder(Node-18). At line 7, 18 is printed. We've now printed 3, 5, 8, 9, 10, 12, 15, and 18, in order.

We then try to explore the right sub-tree of Node-18, by continuing to line 9, where it makes a recursive call, passing its null right child as the root:

     inOrder (Node-null):1 
     inOrder (Node-18):9 
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

This activation of inOrder() returns at line 3, because the root is null. This again pops the stack:

     inOrder (Node-18):9 
     inOrder (Node-15):9
     inOrder (Node-10):9 
     CallingMethod():?

Notice that, in the stack shown above, each activation is at line 9. Upon return, each, in turn, will proceed to line 11 and return. We've seen this behavior before; it is called unwinding. There is nothing in the recursive method after line 9, except the return.

As a result, we'll just see the stack shrink as each activation picks up after line 9, reaches line 11, and returns, popping the stack:

     inOrder (Node-15):9 // Continue here
     inOrder (Node-10):9 
     CallingMethod():?

inOrder(Node-15) continues after line 9, reaching line 11, and returning, again popping the stack:

     inOrder (Node-10):9 // Continue here 
     CallingMethod():?

And, the same is true of inOrder (Node-10). At this point, we're back to the calling function which, having printed the elements of the tree in order, continues along its merry way:

     CallingMethod():? // Continue here
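The role the runtime stack plays in this trace can be made visible by managing a stack by hand. The following sketch is an alternative, iterative formulation, not the recursive method from lecture; the class name ExplicitStackInOrder, its simplified Node class, and the exampleTree() helper are assumptions made for this example. Pushing a node corresponds to making the call at line 5, and popping corresponds to returning to the parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only; the names here are not from the course code.
class ExplicitStackInOrder {
    static class Node {
        final int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    static List<Integer> inOrder(Node root) {
        List<Integer> visited = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node current = root;
        while (current != null || !stack.isEmpty()) {
            // Go left until we can't, remembering each node we pass.
            while (current != null) {
                stack.push(current);
                current = current.left;
            }
            // Popping takes us back to the most recent unfinished node.
            current = stack.pop();
            visited.add(current.data);   // the "print" step
            current = current.right;     // then explore its right subtree
        }
        return visited;
    }

    // Builds the tree used in the trace above.
    static Node exampleTree() {
        Node nine = new Node(9, new Node(8, null, null), null);
        Node five = new Node(5, new Node(3, null, null), nine);
        Node fifteen = new Node(15, new Node(12, null, null), new Node(18, null, null));
        return new Node(10, five, fifteen);
    }

    public static void main(String[] args) {
        System.out.println(inOrder(exampleTree()));  // prints [3, 5, 8, 9, 10, 12, 15, 18]
    }
}
```

Unlike the runtime stack in the trace, this stack never holds an entry for a null node, and a node is removed before its right subtree is explored, so the two stacks do not match frame for frame.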

Deleting from a Binary Search Tree

What is it that makes a Binary Search Tree what it is? Of course, it is the fact that all nodes to the left of node a will be less than a, and all nodes to the right of a will be greater. Adding nodes to a BST is easy: all you have to do is traverse down the tree until you find a spot where you can add it safely, and then add.
But what about deleting? The above fact about BSTs is what makes deleting difficult.
A Binary Search Tree becomes completely useless if it loses its order property that is described above. What is it about deleting that might cause this property to be in danger?
Deleting a leaf is the most trivial of deletes. All you need to do is set the reference to that particular node to null, because there are no nodes under it, and you don't have to worry about restructuring the tree. Of course, the reference to the node you want to delete will lie in its parent! So how can you go about doing this? The solution lies in recursion, and that's where we're headed now.
So deleting seems pretty easy when deleting a leaf, but what about when you want to delete the root of the tree (or any subtree)? Let's take a quick look at a common situation.
             10
            /  \
           /    \
          5     15
         / \   /  \
        /   \ 12  18
       3     9
            /
           8
  
So here we want to delete the root of the tree, which is 10. What would we make the root? The tree, before the delete, represents the list
  3, 5, 8, 9, 10, 12, 15, 18
  
Notice that the root, 10, divides the subtrees rooted at 5 and at 15. So, if we delete 10, we must replace it with a number that will divide these two subtrees -- either 9 or 12.
To select these, we look for the right-most item in the left subtree, which is 9, or the left-most item in the right subtree, which is 12.
The right-most item in the left subtree can be found by "going right until we can't go right anymore" in the left tree. Similarly, the left-most item in the right subtree can be found by "going left until we can't go left anymore". It is important to realize that these traversals never change direction -- always left, or always right. Changing direction would move us away from the extreme end of the list, which is the middle of the whole tree.
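These two one-directional walks can be sketched as a pair of short loops. The class name ExtremeNodeDemo, its simplified Node class, and the helper names rightMost(), leftMost(), and exampleTree() are assumptions made for this example; the tree it builds has 10 at the root, 5 and 15 as its children, and 9 (with left child 8) and 12 as the two candidates.

```java
// Illustrative sketch only; the names here are not from the course code.
class ExtremeNodeDemo {
    static class Node {
        final int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Go right until we can't go right anymore.
    // Assumes subtree is not null.
    static Node rightMost(Node subtree) {
        Node current = subtree;
        while (current.right != null) {
            current = current.right;
        }
        return current;
    }

    // Go left until we can't go left anymore.
    // Assumes subtree is not null.
    static Node leftMost(Node subtree) {
        Node current = subtree;
        while (current.left != null) {
            current = current.left;
        }
        return current;
    }

    static Node exampleTree() {
        Node nine = new Node(9, new Node(8, null, null), null);
        Node five = new Node(5, new Node(3, null, null), nine);
        Node fifteen = new Node(15, new Node(12, null, null), new Node(18, null, null));
        return new Node(10, five, fifteen);
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(rightMost(root.left).data);  // prints 9
        System.out.println(leftMost(root.right).data);  // prints 12
    }
}
```

Each loop stops at a node whose child in the direction of travel is null, which is why the node it finds can have at most one child.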
Now, we've got another problem. What about the item we moved? Do we need to recursively delete it? Fortunately, not. Remember, that node has at most one child, not two: If there were another child, we would have gone one step further in that direction to find the "extreme" item on that side. We can just connect the moved item's parent to its only child. So, if we delete 10 by replacing it with 9, we connect 5 to 8.
The only complication is that if the tree is being implemented with recursion, to the exclusion of parent pointers, the code will be a bit ticklish. We'll need to "look two levels deep" going down the tree to connect the moved node's parent to its child, rather than working our way down to the moved node, and using a child reference to find the replacement and a (non-existent) parent reference to do the attachment.
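One common way to sidestep the "two levels deep" bookkeeping, which is not necessarily the approach the course code takes, is to have each recursive call return the root of the subtree as it stands after the deletion, and let the caller store that result in its own child reference. The sketch below assumes int data and invented names (BstDeleteSketch, removeRightMost(), and so on). Because the data in a node is immutable, it builds a new node to hold the replacement value.

```java
// Illustrative sketch only; the names here are not from the course code.
class BstDeleteSketch {
    static class Node {
        final int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    static Node insert(Node root, int data) {
        if (root == null) return new Node(data);
        if (data < root.data) root.left = insert(root.left, data);
        else if (data > root.data) root.right = insert(root.right, data);
        return root;
    }

    // Returns the root of this subtree after removing data from it.
    static Node delete(Node root, int data) {
        if (root == null) return null;                 // not in the tree
        if (data < root.data) {
            root.left = delete(root.left, data);
            return root;
        }
        if (data > root.data) {
            root.right = delete(root.right, data);
            return root;
        }
        // Found it. With zero or one child, splice the node out.
        if (root.left == null) return root.right;
        if (root.right == null) return root.left;
        // Two children: use the right-most item in the left subtree.
        Node moved = root.left;
        while (moved.right != null) moved = moved.right;
        Node replacement = new Node(moved.data);
        replacement.left = removeRightMost(root.left);
        replacement.right = root.right;
        return replacement;
    }

    // Removes the right-most node by connecting its parent to its only child.
    static Node removeRightMost(Node root) {
        if (root.right == null) return root.left;
        root.right = removeRightMost(root.right);
        return root;
    }

    static void inOrder(Node root, StringBuilder out) {
        if (root == null) return;
        inOrder(root.left, out);
        out.append(root.data).append(' ');
        inOrder(root.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int value : new int[] {10, 5, 15, 3, 9, 8, 12, 18}) {
            root = insert(root, value);
        }
        root = delete(root, 10);
        StringBuilder out = new StringBuilder();
        inOrder(root, out);
        System.out.println(root.data);            // prints 9
        System.out.println(out.toString().trim()); // prints 3 5 8 9 12 15 18
    }
}
```

In the demo, deleting 10 makes 9 the new root and leaves 8 as the right child of 5, matching the hand-worked example above.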