October 25, 2007 (Lecture 15)

Today's Example

Need It Here, Need It There, Need It Everywhere

Last class we left off with the realization that we'd need the equivalent of Java's compareTo() or Comparator. We need a way of comparing the items. This, really, is no surprise. And it is no problem. The user of the BST library can simply pass in the compare function as needed, exactly as we did with the print function for our linked list of last week. But is this the right approach?
The print function for the linked list was only needed for one method as, in fact, it will be for the BST. But the compare function threatens to be more pervasive. The code will get pretty ugly if we pass it in every time we need it. And we'll be open to a new class of errors. If the user accidentally passes in the wrong function on some occasion, the tree won't work -- certain items might be compared upon insertion based on one property and other items based on another, leaving a tree that isn't consistently ordered by either.
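To make that failure concrete, here is a minimal sketch. It uses a sorted array rather than a tree, purely to keep it short, and the person type, byId, byAge, insertSorted, and isSorted are all hypothetical names invented for the illustration:

```c
#include <stddef.h>

/* Hypothetical record type with two properties one could order by. */
typedef struct { int id; int age; } person;

int byId (const void *a, const void *b) {
  const person *x = a;
  const person *y = b;
  return (x->id > y->id) - (x->id < y->id);
}

int byAge (const void *a, const void *b) {
  const person *x = a;
  const person *y = b;
  return (x->age > y->age) - (x->age < y->age);
}

/* Insert item into the sorted array items[0..n-1], trusting whatever
 * compare function this particular call happens to be given.
 * The caller must leave room for one more entry. Returns the new length. */
size_t insertSorted (const person **items, size_t n, const person *item,
                     int (*cmp) (const void *, const void *)) {
  size_t i = n;

  while (i > 0 && cmp (items[i-1], item) > 0) {
    items[i] = items[i-1];
    i--;
  }
  items[i] = item;

  return n + 1;
}

/* Returns 1 if items[0..n-1] is in non-decreasing order under cmp. */
int isSorted (const person **items, size_t n,
              int (*cmp) (const void *, const void *)) {
  size_t i;

  for (i = 1; i < n; i++)
    if (cmp (items[i-1], items[i]) > 0) return 0;

  return 1;
}
```

Insert {1, 40} and {3, 20} with byId, then {2, 30} with byAge by mistake, and the array ends up ordered by neither property. Storing the compare function once, inside the structure, removes the opportunity to make that mistake.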

Almost Object-Oriented

So what we are going to do instead is to restructure our BST's definition and constructor to make it more like a class in Java. We are going to put the function pointers into the BST's struct -- that way they are there and waiting for us wherever we need them.
If we do this and then, in effect, take a step back and look at things, we'll see something interesting. For each and every one of our BST functions, we pass a reference to the tree as the first argument. This reference is, in effect, a "this" reference. In Java, it was passed implicitly. Here, we pass it explicitly. But that's what it really is. So as we develop our BST, stay tuned and watch out for it!
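As a minimal sketch of the idea, consider the fragment below. The collection type and matchesCollection are hypothetical stand-ins, not part of the BST library; they exist only to show a function pointer stored in a struct and the explicit "this" argument that reaches it:

```c
#include <stddef.h>

/* Hypothetical stand-in for the BST struct, reduced to the part being
 * discussed: a "method" stored alongside the data it works with. */
typedef struct {
  unsigned long count;
  int (*comparefn) (const void *, const void *);
} collection;

/* Java:  boolean matches(Object a, Object b)  -- "this" is implicit.
 * C:     the same reference is simply the first, explicit, argument. */
int matchesCollection (const collection *self, const void *a, const void *b) {
  if (!self || !self->comparefn) return 0;

  return 0 == self->comparefn (a, b);
}

/* Example compare function for ints: returns -1, 0, or 1. */
int compareInts (const void *a, const void *b) {
  int x = *(const int *)a;
  int y = *(const int *)b;

  return (x > y) - (x < y);
}
```

The caller supplies compareInts once, when the struct is filled in. After that, every function that receives the struct can reach the compare function through its first argument.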

The New BST struct and Initialization

So once we've made this change, our BST's struct and initialization method change to look as follows. Notice how it looks almost like a Java class definition with instance variables and methods:

From bst.h:

  typedef struct {
    struct node_t *root;
    unsigned long count;
  
    int (*printfn) (const void *);
    int (*comparefn) (const void *, const void *);

  } bst;

From bst.c:

  int initBST (bst **tree, int (*printfn) (const void *), 
                         int (*comparefn) (const void *, const void *)) {

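    if (!tree) return NULL_BST;
    if (!comparefn) return BAD_COMPARE_BST;

    *tree = malloc (sizeof(bst));
    if (!*tree) return NO_MEMORY_BST;  /* assumed name for an out-of-memory code */
    (*tree)->root = NULL;
    (*tree)->count = 0;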
  
    (*tree)->printfn = printfn;
    (*tree)->comparefn = comparefn;

    return 0;
  }
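
From the caller's side, using this might look something like the sketch below. The student type, compareStudents, and printStudent are hypothetical. The struct and initBST are condensed copies of the ones above so that the sketch compiles on its own, and the numeric values of the error codes (as well as the NO_MEMORY_BST name) are assumptions:

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed values; the real ones live in the library's header. */
#define BAD_COMPARE_BST (-1)
#define NO_MEMORY_BST   (-2)   /* hypothetical out-of-memory code */

struct node_t;

typedef struct {
  struct node_t *root;
  unsigned long count;
  int (*printfn) (const void *);
  int (*comparefn) (const void *, const void *);
} bst;

int initBST (bst **tree, int (*printfn) (const void *),
                         int (*comparefn) (const void *, const void *)) {
  if (!comparefn) return BAD_COMPARE_BST;

  *tree = malloc (sizeof(bst));
  if (!*tree) return NO_MEMORY_BST;

  (*tree)->root = NULL;
  (*tree)->count = 0;
  (*tree)->printfn = printfn;
  (*tree)->comparefn = comparefn;

  return 0;
}

/* Hypothetical item type and the two functions a caller would supply. */
typedef struct { int id; const char *name; } student;

int compareStudents (const void *a, const void *b) {
  const student *x = a;
  const student *y = b;

  return (x->id > y->id) - (x->id < y->id);   /* exactly -1, 0, or 1 */
}

int printStudent (const void *item) {
  const student *s = item;

  return printf ("%d %s\n", s->id, s->name);
}
```

A caller would then declare "bst *tree = NULL;" and call "initBST (&tree, printStudent, compareStudents);", checking for a zero return before using the tree. Both functions are handed over exactly once, here, and never again.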

addBST(...)

Below is the implementation of addBST that we developed in class. It follows the algorithm we developed yesterday. It starts out at the root, compares the node's item with the item we want to add, and moves left or right as appropriate. If the two match, it refuses the duplicate and returns an error. Otherwise, it continues to do this until it can't move, because there is no subtree in that direction. The present node need not be a leaf -- it only has to lack a child on that side. When that happens, it adds the new node in the direction it otherwise would have travelled to continue its search. The big special case is that of an empty tree, in which case it adds directly at the root reference.
The real thing that might be surprising is that I defined the required compare method a bit differently than was done in Java. Under my definition, it returns exactly -1, 0, or 1 -- not any number to be interpreted as "negative for less than", "positive for greater than", or "zero" for equal. I required that anything that would otherwise have been negative be exactly -1 and did the same for anything that would otherwise have been positive. I did this just so that I could easily #define RIGHT, MATCH, and LEFT as -1, 0, and 1 -- I think it makes the code more readable. But, this isn't a universal truth. It is just my opinion -- do what pleases you here.
  
  int addBST (bst *tree, const void *item) {

    node *index;
    node *newnode;
    int difference;
    int posnfound;

    /* Preflight */
    if (!tree) return NULL_BST;
    if (!item) return NULL_ITEM_BST;
  
  
    /* Special case: Empty tree */
    if (!tree->count) {
      tree->root = malloc (sizeof(node));
      if (!tree->root) return NO_MEMORY_BST;  /* assumed name for an out-of-memory code */
      tree->root->item = (void *)item;
      tree->root->left = tree->root->right = NULL;
    
      tree->count = 1;
    
      return 0;
    }
      

    /* Common case */

    /* Search for the right position within the tree */
    for (index=tree->root, posnfound=0; (!posnfound); ) {
  
      difference = tree->comparefn (index->item, item);
      switch (difference) {

        case MATCH: return DUPLICATE_ITEM_BST;
    
        case LEFT: if (index->left) index=index->left;
                   else posnfound=1;
                   break;
               
        case RIGHT: if (index->right) index=index->right;
                    else posnfound=1;
                    break;
  
        default: fprintf (stderr, 
            "addBST: Compare function didn't return LEFT, RIGHT, or MATCH\n");
            return BAD_COMPARE_BST;
      }
    }


    /* Actually add the node here */
    newnode = malloc (sizeof(node));
    if (!newnode) return NO_MEMORY_BST;  /* assumed name for an out-of-memory code */
    newnode->item = (void *)item;
    newnode->left = newnode->right = NULL;
  
    if (RIGHT == difference)
      index->right = newnode;
    else
      index->left = newnode;

    tree->count++;
  
  
    return 0;

  }
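
The -1, 0, or 1 convention described above can be sketched as follows. The three #defines use the values given in the text. The normalize helper and compareNames are hypothetical names, shown only as one way a caller might wrap an existing comparison such as strcmp:

```c
#include <string.h>

/* The three legal results, named for the direction addBST should move. */
#define RIGHT (-1)
#define MATCH 0
#define LEFT  1

/* Hypothetical helper: squash any negative/zero/positive result
 * (strcmp's, say) down to exactly RIGHT, MATCH, or LEFT. */
static int normalize (int raw) {
  if (raw < 0) return RIGHT;
  if (raw > 0) return LEFT;
  return MATCH;
}

/* Example compare function over C strings. addBST calls it as
 * comparefn (nodeItem, newItem), so LEFT means the new item sorts
 * before the node's item and belongs in its left subtree. */
int compareNames (const void *nodeItem, const void *newItem) {
  return normalize (strcmp ((const char *)nodeItem, (const char *)newItem));
}
```

Parenthesizing the -1 in the #define is a small defensive habit: it keeps the macro from combining with a neighboring operator when it is expanded inside a larger expression.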
  

Printing the Tree

In printing the tree, we've got to first decide which traversal we should use. There are some good arguments for making a decision different than the one we made -- but I'll leave them to 15-111 and 15-211. We chose to print the tree using an "in-order" traversal -- one that produces the items in sorted order. We did this because users of data structures often want a sorted list of the contents -- and that is exactly what an in-order traversal provides.
An in-order traversal is best described recursively. At each step, we go left if we can. This brings us to the least item in the subtree. Then, we print the node, knowing that it is the least -- and thereby the next one to be printed. Then, we try to go right to explore the right subtree. Once there, we repeat the process -- we want the least item in this tree. Then, once done, we've explored the left subtree and the right subtree, so we return.
Don't spend too much time with the paragraph above. Instead, read the code below -- and do it with a sample tree and a pencil in hand. Once you trace it, it'll all make sense. There's nothing I can do here in the notes that'll do that for you.
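If you'd like a tree to trace, here is a small, self-contained sketch. It is not the library code: tnode, inorder, record, and traceSample are hypothetical names, the items are plain ints rather than void pointers, and the visitor records what it sees instead of printing, so that the order can be checked:

```c
#include <stddef.h>

/* Hypothetical, simplified node holding an int rather than a void *. */
typedef struct tnode {
  int item;
  struct tnode *left;
  struct tnode *right;
} tnode;

/* Visit every item in sorted order: left subtree, this node, right subtree. */
void inorder (const tnode *root, void (*visit) (int)) {
  if (!root) return;

  inorder (root->left, visit);
  visit (root->item);
  inorder (root->right, visit);
}

/* A visitor that records each item, in the order it is visited. */
static int seen[16];
static size_t nseen = 0;

static void record (int item) {
  if (nseen < sizeof(seen) / sizeof(seen[0])) seen[nseen++] = item;
}

/* Build the sample tree
 *
 *           5
 *         /   \
 *        3     9
 *       / \   /
 *      1   4 7
 *
 * traverse it, and return how many items were visited. */
size_t traceSample (void) {
  tnode n1 = { 1, NULL, NULL };
  tnode n4 = { 4, NULL, NULL };
  tnode n7 = { 7, NULL, NULL };
  tnode n3 = { 3, &n1, &n4 };
  tnode n9 = { 9, &n7, NULL };
  tnode n5 = { 5, &n3, &n9 };

  nseen = 0;
  inorder (&n5, record);

  return nseen;
}
```

After traceSample runs, seen holds 1, 3, 4, 5, 7, 9. Trace it by hand first, then compare against that.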

The Implementation of the Print Function

The print method below proceeds mostly as you would expect. There are only two things I'd like to note. The first is that, since it is recursive, and we don't want to expose the node structure, an internal part of our representation, to the user, we made the recursive part "static". This prevents the user from calling it -- and leaves them only with the "clean" version that doesn't require a node reference.
We also took the opportunity to restructure our header files to put all of the "internal" stuff into bst-int.h -- the prototypes of the private functions and the definition of the node. We left the definition of the bst, itself, and the other methods, within the usual, public, header file for inclusion and reference by the library's user.

  int printBST (const bst *tree) {

    if (!tree) return NULL_BST;
  
    return printRecursiveBST(tree, tree->root);
  
  }


  static int printRecursiveBST (const bst *tree, const node *root) {

    if (!root) return 0;
  
    /* Go left */
    if (root->left)
      printRecursiveBST(tree, root->left);
  
    tree->printfn (root->item);
  
    /* Go right */
    if (root->right) 
      printRecursiveBST(tree, root->right);
    
    return 0;
  }
  

A Second Look: See the "This" Reference?

Take a look back at the functions we've written today. Notice that the first argument is always a "bst *" (or, for initBST, which has to create the tree, a "bst **") -- even for the recursive print helper, which is not directly called by the user. Remember, this is basically an explicit version of the "this" pointer that was implicitly passed around in Java.
Every function we write that is, in effect, a method upon a data structure will have one of these, just as it did implicitly in Java. It doesn't matter if it is a public method, such as printBST(...), or a private method, such as our "static" helper. They are methods upon a particular tree -- and need to know which tree.
We didn't pass the tree reference into the recursive helper just because we needed the pointer to the caller-supplied print function, even though we surely did. Instead, we did it to be consistent with our object-oriented philosophy. And, by sticking with that approach, we -- no great surprise -- didn't run into any trouble getting access to the needed properties of the tree we were asked to print.

Some Things To Try

For more practice, you might try some of these:

Write a method that determines the depth of the BST.
Write a method that determines the depth of the most shallow leaf.
Write a method that determines if two BSTs are the same -- both in what they contain and also in the shape of the tree.
Write a method that verifies that the supplied binary tree is, in fact, a BST -- that it is ordered properly.