15-111 Lecture 21 (Wednesday, March 18, 2009)

Binary Search Trees

Today we talked about a very important type of tree, the Binary Search Tree. In a BST, the ordering property states that every value in a node's left subtree is smaller than the node's value, and every value in its right subtree is larger.

`Left Child < Parent < Right Child `

This relationship gives us a complete ordering of the elements. That is, given two trees containing the same set of elements, an inorder traversal will ALWAYS walk through the nodes in the same order, even if the trees are shaped differently.

For example, given the numbers 1 through 7, let's look at two different valid BSTs.

```
       4                   2
     /   \               /   \
    2     6             1     4
   / \   / \                 / \
  1   3 5   7               3   6
                               / \
                              5   7
```

For the left tree, our traversal will first go left. Now, it must traverse the subtree containing 1, 2, and 3. To traverse this, it will first visit the left child (1), then visit the parent (2), and finally the right child (3). Next, we go back to the root (4). Now we go to the right subtree containing 5, 6, and 7. First, we visit the left child (5), then the parent (6), and finally the right child (7). So the inorder traversal is 1, 2, 3, 4, 5, 6, 7.

For the right tree, we first visit the left child (1). Then we go back to the root (2). Now we visit the right subtree, whose root is 4. So we first traverse 4's left child (3), then we go back and visit 4, and finally we traverse the right subtree, which, like in the left tree, will be in the order 5, 6, 7. So the traversal is again 1, 2, 3, 4, 5, 6, 7.

The moral of the story is that the parent-child relationship in a binary search tree defines a complete ordering. Any inorder traversal of a BST will traverse the nodes in order. In the case of integers, this will probably be from smallest to largest; with strings, maybe it will traverse them in alphabetical order, or in order by length of the string, depending on how you define your order.
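The traversal walked through above can be sketched as code. This is a minimal sketch, assuming a bare-bones node class with integer values; the class and method names here are illustrative, not from the lecture.

```java
import java.util.ArrayList;
import java.util.List;

class Node {
    int value;
    Node left, right;
    Node(int v) { value = v; }
}

public class InorderDemo {
    // Inorder: visit the left subtree, then the node itself, then the right subtree.
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;   // empty subtree -- nothing to visit
        inorder(n.left, out);
        out.add(n.value);
        inorder(n.right, out);
    }

    // Builds the left example tree (4 at the root) and returns its inorder walk.
    static List<Integer> example() {
        Node root = new Node(4);
        root.left = new Node(2);
        root.right = new Node(6);
        root.left.left = new Node(1);
        root.left.right = new Node(3);
        root.right.left = new Node(5);
        root.right.right = new Node(7);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(example());  // prints [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Building the right example tree instead (2 at the root) would produce the same list, which is exactly the point.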

Building a Binary Search Tree

Now let's talk about how to build a binary search tree given a list of elements. First, we note that an empty tree and a tree with one element are both valid binary search trees. Without more than one item, it is impossible to violate the parent-child relationship.

Now, say we have a tree with one element, and we want to insert a second element. We simply compare the new element with the value stored in the root, and then we have two choices: either we place it in the left subtree, or we place it in the right subtree. In other words, if the new item is less than the item in the root, we insert the new item into the left subtree. Otherwise, we insert into the right subtree. We can do this recursively, where our base case is inserting into an empty tree, which is simple.
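The recursive insertion just described can be sketched like this -- again assuming a bare-bones integer node class whose names are mine, not the lecture's:

```java
class Node {
    int value;
    Node left, right;
    Node(int v) { value = v; }
}

public class InsertDemo {
    // Base case: inserting into an empty subtree creates a new one-node tree.
    // Otherwise, smaller values go left; everything else goes right.
    static Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value)
            n.left = insert(n.left, v);
        else
            n.right = insert(n.right, v);
        return n;   // this subtree's root doesn't change
    }

    // Convenience: build a tree by inserting values in the given order.
    static Node build(int[] vals) {
        Node root = null;
        for (int v : vals) root = insert(root, v);
        return root;
    }

    public static void main(String[] args) {
        Node root = build(new int[]{4, 2, 6, 1, 3, 7, 5});
        // Matches the walkthrough below: 4 at the root, 2 and 6 as its children.
        System.out.println(root.value + " " + root.left.value + " " + root.right.value);  // prints 4 2 6
    }
}
```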

Let's try this with the sequence of numbers 4, 2, 6, 1, 3, 7, 5.

We first insert the 4 into the empty tree. Next, we insert 2. Since 2 < 4, we insert it into the left subtree. Since the left subtree is empty, 2 becomes the new left subtree.

```
  4
 /
2
```

Next, we insert 6 into the tree. Since 6 > 4, we insert into the right subtree.

```
  4
 / \
2   6
```

Now, we insert 1. Since 1 < 4, we insert into the left subtree. Since 1 < 2, we insert into 2's left subtree.

```
    4
   / \
  2   6
 /
1
```

Next, lets insert 3. Since 3 < 4, we insert into the left subtree. Since 3 > 2, we insert into 2's right subtree.

```
    4
   / \
  2   6
 / \
1   3
```

Using the same method, we insert the 7 and then the 5, to get the final tree.

```
     4
   /   \
  2     6
 / \   / \
1   3 5   7
```

This is a pretty nice looking tree. It has no holes in it, and it branches out at every possible opportunity. We call this a balanced tree. Notice that the depth of this tree is very small.

What if we insert the objects in the order 1,2,3,4,5,6,7? The result will still be a valid BST, but it will look very different. First, we insert 1 as the root. Then we add 2 to the right subtree.

```
1
 \
  2
```

Now, we insert 3. Since 3 > 1, we insert into the right subtree. Since 3 > 2, we insert into 2's right subtree.

```
1
 \
  2
   \
    3
```

You can probably see where this is going. Each new element will be added in a new level, all the way on the right. Our final tree will look like this.

```
1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          6
           \
            7

 - or -

1 - 2 - 3 - 4 - 5 - 6 - 7
```

This actually looks an awful lot like a linked list. The result is that when our tree looks like this, we get no benefit from using a BST. The depth is the same as the number of elements in the tree. We call this a degenerate tree.
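We can see the difference in shape concretely by measuring the depth a given insertion order produces. A sketch, reusing the same kind of assumed node class and insert routine (names illustrative):

```java
class Node {
    int value;
    Node left, right;
    Node(int v) { value = v; }
}

public class HeightDemo {
    static Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value) n.left = insert(n.left, v);
        else             n.right = insert(n.right, v);
        return n;
    }

    // Height = number of levels on the longest root-to-leaf path.
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(int[] vals) {
        Node root = null;
        for (int v : vals) root = insert(root, v);
        return height(root);
    }

    public static void main(String[] args) {
        // The balanced order from earlier: only 3 levels for 7 elements.
        System.out.println(heightAfterInserting(new int[]{4, 2, 6, 1, 3, 7, 5}));  // prints 3
        // Sorted order: every element adds a level -- the degenerate "linked list".
        System.out.println(heightAfterInserting(new int[]{1, 2, 3, 4, 5, 6, 7}));  // prints 7
    }
}
```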

Searching a Binary Search Tree

Let's say we want to search a binary search tree. This is easy, since given an item to search for, we can just compare it to the root and we instantly know one of three things. 1.) We found what we're looking for, 2.) The object we're looking for is in the left subtree, or 3.) The object we're looking for is in the right subtree, if it's there at all.

```
public boolean binarySearch(TreeNode current, Comparable target){
    if(current == null)
        return false;  // fell off the tree -- the target isn't here
    if(target.compareTo(current.getValue()) < 0)
        return binarySearch(current.getLeftSubtree(), target);
    else if(target.compareTo(current.getValue()) > 0)
        return binarySearch(current.getRightSubtree(), target);
    else
        return true;  // compareTo returned 0! We found it!
}
```

This means we only need to travel down a single path of the tree. We can do this because we have the assurance that all objects in a node's left subtree are smaller than that node, and all objects in its right subtree are larger. So if we're looking for something that is larger than the node's value, we know for certain that it CAN'T be found in the left subtree.

This means that our search time is proportional to the maximum depth of the tree. What is this depth? The answer is that it depends on the order in which the elements were inserted. In the worst case, which is the degenerate tree above, the search is linear, or O(n). There is only one path to the bottom of the tree, and it is of length n. If we want to search for 7, we would have to traverse all the way down all 7 levels until we find it.

However, in our balanced tree, we only need to check 3 levels to find the 7. So what is the runtime of the search if we have a balanced tree? Well, each level of depth can hold twice as many elements as the previous.

The root level contains 1 item. The second level contains 2 items. The third level contains 4 items.

```
           50                     Level capacity: 1 = 2^0   Total capacity:  1 = 2^1 - 1
        /      \
      25        75                Level capacity: 2 = 2^1   Total capacity:  3 = 2^2 - 1
     /  \      /  \
   10    32  62    80             Level capacity: 4 = 2^2   Total capacity:  7 = 2^3 - 1
  / \   / \  / \   / \
 5  12 26 35 55 65 77 85          Level capacity: 8 = 2^3   Total capacity: 15 = 2^4 - 1
```

As you can see, the total capacity of a balanced tree of height h is equal to 2^h - 1. For the purposes of runtime analysis, we'll forget about the -1. So for a balanced BST containing n objects, the height h is related to n by the formula...

`  n = 2^h `

If we take the log of both sides, we get the property that

` h = log(n) `

Since we determined above that the search is proportional to the height of the BST, a search in a balanced tree will be O(h), or O(log(n)). However, as we saw earlier, for degenerate trees, the search will be O(n). Luckily, randomly created trees have a tendency to be roughly balanced on average, so in general, searches run in log(n) time. But be aware that big O notation refers to the worst case scenario, so for searching a BST, it is still O(n).
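One way to make the O(log n) vs. O(n) gap concrete is to count how many nodes a search actually visits in each shape of tree. A sketch, under the same assumed node class (names illustrative):

```java
class Node {
    int value;
    Node left, right;
    Node(int v) { value = v; }
}

public class SearchDemo {
    static Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value) n.left = insert(n.left, v);
        else             n.right = insert(n.right, v);
        return n;
    }

    static Node build(int[] vals) {
        Node root = null;
        for (int v : vals) root = insert(root, v);
        return root;
    }

    // Returns how many nodes we visit before finding the target (0 if absent).
    static int nodesVisited(Node n, int target) {
        int visited = 0;
        while (n != null) {
            visited++;
            if (target < n.value)      n = n.left;
            else if (target > n.value) n = n.right;
            else return visited;   // found it
        }
        return 0;   // fell off the tree: not present
    }

    public static void main(String[] args) {
        Node balanced   = build(new int[]{4, 2, 6, 1, 3, 7, 5});
        Node degenerate = build(new int[]{1, 2, 3, 4, 5, 6, 7});
        System.out.println(nodesVisited(balanced, 7));    // prints 3 -- one node per level
        System.out.println(nodesVisited(degenerate, 7));  // prints 7 -- the whole "list"
    }
}
```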

Deleting from a Binary Search Tree

What is it that makes a Binary Search Tree what it is? Of course, it is the fact that all nodes to the left of a node a will be less than a, and all nodes to the right of a will be greater. Adding nodes to a BST is easy: all you have to do is traverse down the tree until you find a spot where you can add the new node safely, and then add it.

But what about deleting? The above fact about BST's is what makes deleting difficult.

A Binary Search Tree becomes completely useless if it loses its ordering property described above. What is it about deleting that might cause this property to be in danger?

Deleting a leaf is the most trivial of deletes. All you need to do is set the reference to that particular node to null, because there are no nodes under it, and you don't have to worry about restructuring the tree. Of course, the reference to the node you want to delete will lie in its parent! So how can you go about doing this? The solution lies in recursion, and that's where we're headed now.

So deleting seems pretty easy when deleting a leaf, but what about when you want to delete the root of the tree? Let's take a quick look at a common situation.

```
         10
        /  \
       /    \
      7      15
     / \    /  \
    5   9  12   18
       /
      8
```

So here we want to delete the root of the tree, which is 10. What would we make the root? The tree, before the delete, represents the list

```
5, 7, 8, 9, 10, 12, 15, 18
```

Notice that the root, 10, divides the subtrees rooted at 7 and at 15. So, if we delete 10, we must replace it with a number that will still divide these two subtrees -- either 9 or 12.

To find these candidates, we look for the right-most item in the left subtree, which is 9, or the left-most item in the right subtree, which is 12.

The right-most item in the left subtree can be found by "going right until we can't go right anymore" in the left subtree. Similarly, the left-most item in the right subtree can be found by "going left until we can't go left anymore". It is important to realize that these traversals never change direction -- always left, or always right. Changing direction would move us away from the extreme end of the list, which is the middle of the whole tree.

Once we find the right item, we copy it into the hole created by the deletion, and then delete it from its old position. Things are now simpler than they seem -- the node we copied from has 0 or 1 children. This is because if it had two children, we would have gone down further as we went "left as far as we can go" or "right as far as we can go".

If that node has 0 children, we're done. If it has 1 child, we just elevate the child to fill the hole, taking its children, if any, with it.
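The whole deletion procedure can be put together as a recursive sketch, assuming the same minimal integer node class (the names are mine, not the lecture's). It handles the leaf, one-child, and two-children cases described above, using the right-most item of the left subtree as the replacement.

```java
class Node {
    int value;
    Node left, right;
    Node(int v) { value = v; }
}

public class DeleteDemo {
    // Returns the root of the subtree with v removed (if v was present).
    static Node delete(Node n, int v) {
        if (n == null) return null;   // not found -- nothing to do
        if (v < n.value) {
            n.left = delete(n.left, v);
        } else if (v > n.value) {
            n.right = delete(n.right, v);
        } else if (n.left == null) {
            return n.right;   // 0 or 1 children: elevate the child (maybe null)
        } else if (n.right == null) {
            return n.left;
        } else {
            // Two children: copy in the right-most value of the left subtree,
            // then delete that value from its old position.
            n.value = rightMost(n.left);
            n.left = delete(n.left, n.value);
        }
        return n;
    }

    // "Go right until we can't go right anymore."
    static int rightMost(Node n) {
        while (n.right != null) n = n.right;
        return n.value;
    }

    // Builds the example tree from the figure and deletes its root, 10.
    static Node example() {
        Node root = new Node(10);
        root.left = new Node(7);
        root.right = new Node(15);
        root.left.left = new Node(5);
        root.left.right = new Node(9);
        root.left.right.left = new Node(8);
        root.right.left = new Node(12);
        root.right.right = new Node(18);
        return delete(root, 10);
    }

    public static void main(String[] args) {
        System.out.println(example().value);  // prints 9 -- the new root
    }
}
```

After deleting 10, the 9 is copied into the hole and the 8 is elevated into 9's old spot, so the tree still represents the list 5, 7, 8, 9, 12, 15, 18.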