October 9, 2008 (Lecture 12)

Today's Example

Linked Lists

Users typically don't notice slowdowns if they occur a little at a time. It is when the program stops or freezes that users notice and are unhappy. While arrays are great for features such as indexing, they do have their drawbacks. When you add to an array that is full, the standard idiom is to create a new, larger array to replace it, copy over the contents, and reset the pointer. With a linked list, however, the data structure can grow and shrink as needed, rather than in bursts.
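The grow-and-copy idiom for a full array might look something like the sketch below. The helper name growIntArray, the choice of doubling, and the 0/-1 return values are assumptions made for illustration; they are not part of the lecture's code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: replace a full array of ints with one twice as big.
 * On success, *array and *capacity are updated and 0 is returned.
 * On failure, -1 is returned and the old array is left untouched.
 */
int growIntArray (int **array, size_t *capacity, size_t used) {
  size_t newcapacity;
  int *bigger;

  if (!array || !capacity || used > *capacity) return -1;

  newcapacity = (*capacity) ? (*capacity * 2) : 1;

  /* Create the new, larger array */
  bigger = malloc (newcapacity * sizeof(int));
  if (!bigger) return -1;

  /* Copy over the contents: this is the "burst" of work */
  if (used) memcpy (bigger, *array, used * sizeof(int));

  /* Release the old array and reset the pointer */
  free (*array);
  *array = bigger;
  *capacity = newcapacity;

  return 0;
}
```

Every item already in the array gets copied on each grow, which is exactly the kind of pause a user might notice.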
Insertion and removal within an array are also a bit tough, at least if you want the array to remain dense and don't want anything destroyed in the process. They require shifting other elements over to fill the space or to make room. This shifting operation really amounts to copying each item, one at a time, into its new position. With a linked list, if you already have an index to the right position, you can just insert or remove -- without copying or moving the other elements.
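To make the shifting concrete, here is a small sketch of inserting into a dense array of ints. The function name insertIntAt and its convention of returning the new item count are assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical helper: insert "value" at position "pos" in a dense array
 * that holds "used" items and has room for at least one more.
 * Returns the new number of items, or "used" unchanged if pos is out of range.
 */
size_t insertIntAt (int *array, size_t used, size_t pos, int value) {
  size_t i;

  if (!array || pos > used) return used;

  /* Shift everything at or after pos one slot right, working back to front
   * so that nothing is overwritten before it has been copied */
  for (i = used; i > pos; i--)
    array[i] = array[i-1];

  array[pos] = value;
  return used + 1;
}
```

Inserting near the front copies nearly every item; a linked list with a reference to the right spot would touch only a couple of pointers.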
It is also possible to create the storage before it is actually needed and keep it available in a "pool" or "unused list" until it is needed and added into the real, live list.

So, What is a Linked List?

As you probably know, a list is an ordered collection of items. Like any good collection, you want to be able to add and remove items from your list, as well as traverse the list to look at each item or to look for a particular item.
A linkedlist is a collection of nodes. Conceptually, a node is an object that contains two references: a reference to the data item, and a reference to the next node in the list. Basically, it is the node that keeps the items in order.
Below is a linkedlist consisting of five nodes. Each of these five nodes contains two references: one to the data item, in this case a string, and one to the next node in the list.

With the exception of the last node, each node in the linkedlist leads to its successor node. This organization is important, because it explains why you can't jump around in a linked list as you can in an array -- instead, you just have to walk there, following the links created by the "next" pointers.
The list also contains three other references to nodes:
  node *head
  node *tail
  node *index
Now look at head, tail, and index. These references refer to nodes in the linkedlist, too. Take a look and see which nodes each of them refers to.
head refers to the first node in the linkedlist. tail refers to the last node in the linkedlist. index can be used by the programmer to keep track of a particular node in the list.

Inserting a node at the Back of the Linked List

If we neglect the special case of an empty list, adding a node to the end of the list is straightforward. We create a new node and initialize it so that its "item" pointer identifies the data item and so that its "next" pointer identifies its intended successor: nothing (NULL), since it is the last one. Then we set the existing tail's "next" reference to identify this new node as its successor. Finally, we update the list's understanding of the tail and of the count of items within the list.

Inserting a node at the Front of the Linked List

If we neglect the special case of an empty list, adding a node to the front of the list is straightforward. We create a new node and initialize it so that its "item" pointer identifies the data item and so that its "next" pointer identifies its intended successor: the old head node. Then we set the existing "head" reference to identify this new node as the first one within the list. Finally, we update the list's understanding of the count of items within the list.

Deleting a node from a linkedlist

How would you remove a node from a linkedlist? We basically need to use a temporary reference, we'll call it "index", to find the node's predecessor. We need to find the predecessor, because it is the predecessor's "next" reference that will need to change -- the node we're trying to delete will soon be gone, after all. We walk index from the start of the list to the predecessor, then set the predecessor's "next" to skip past the deleted node and adjust our count.
There are only two tricks. The first is that we've got to be careful in the case of a one-item list. The second is that we'll need to free the node, so we'll need another reference to "remember" it long enough to do that -- the predecessor's "next" will be reset. And, we can't reverse the order of this, because we need the soon-to-be-deleted node's "next" reference to set its predecessor's "next" reference.

Implementing a Linked List: The Data Structures

Now that we've had the really quick nickel introduction/refresher to linked lists, we'll go about the business of implementing one. We'll begin by defining the two basic data structures that we'll need: the linked list and the nodes that compose it.
We'll begin by defining the "node". If we were working in Java, we'd obviously use a class. But, in C, as close as we'll get is a struct. We'll bind the operations together, as we have in the past, using our standard library idiom. We'll end up tightening this up a bit next class. But, for now, let's just dump the data structures into "linkedlist.h".
Our node is a struct composed of a pointer to the data item and a pointer to the next node. We know that the pointer to the "next" item needs to be a pointer to our struct. But, what should we use for our data?
We'll use a "void pointer", sometimes called a "generic pointer". We've used them before -- they are basically untyped addresses. You can't get the size of what they point to. Pointer arithmetic isn't going to work as you'd like. But, they let you point at anything -- without compiler warnings.
Most modern compilers will let you use a reference to the node type, even within the node type, without any special incantations. But, older compilers require a "forward reference". The forward reference tells the compiler of the existence of the type, but not the details, before it is used. The line that reads, "struct node_t;" is the forward reference for older compilers:
  struct node_t;

  typedef struct node_t {
    void *data;
    struct node_t *next;
  } node;
  
It is worth noting that although we can use a pointer to a struct within the definition of the very same struct -- the same is not true of a "typedef" name, which doesn't exist until the definition is complete. As a result, when we define "next", we use the struct name, not the typedef name. That's okay, the two types are completely compatible -- in both directions. But, it does mean that this struct can't be anonymous -- we do need to name it, as we did (node_t).
The linked list, itself, is going to need to keep the three pieces of state: the beginning, the end, and the count. So, we've got another struct:
  typedef struct {
    node *head;
    node *tail;
    unsigned long count;
  } linkedlist;
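To see how these two structs fit together, here is a small sketch that wires three nodes by hand and then walks the "next" pointers. The helper names wireThree and walkLL are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

typedef struct {
  node *head;
  node *tail;
  unsigned long count;
} linkedlist;

/* Hypothetical helper: link three caller-provided nodes, in order */
void wireThree (linkedlist *list, node *first, node *second, node *third) {
  first->next = second;
  second->next = third;
  third->next = NULL;  /* The last node has no successor */

  list->head = first;
  list->tail = third;
  list->count = 3;
}

/* Hypothetical helper: follow "next" from the head, counting the steps */
unsigned long walkLL (linkedlist *list) {
  unsigned long steps = 0;
  node *index;

  if (!list) return 0;

  for (index = list->head; index; index = index->next)
    steps++;

  return steps;
}
```

Notice that there is no way to reach the third node except through the first two.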
  

Implementing the Linked List Library: Initializing the List

There is really only one surprising thing about our implementation of this function: We are passing the linkedlist in by reference, that is, via a pointer. We aren't trying to allocate it within the function. But, we do need to change the caller's struct, and a function handed a copy could only initialize the copy.
Passing a pointer also helps performance. C will let you pass a struct by value, but the whole thing gets copied. As you'll learn in 15-213, a big struct doesn't fit in registers, so on many platforms the compiler makes the copy and passes a pointer to it -- it's just that the compiler will do some smoke and mirrors to hide the pointer from you.
Regardless, copying a big struct is slow. So, we just pass in by reference. Next class, we'll learn about "const", a way of making it clear to the compiler that we don't intend to change what a pointer refers to. But, it isn't necessary -- and that is tomorrow, this is today.
  int initLL (linkedlist *list) {

    if (!list) return NULL_LIST_LL;

    list->head = list->tail = NULL;
    list->count = 0;
  
    return SUCCESS_LL;
  }
  

Implementing the Linked List Library: Adding at the Front of the List

There really is nothing surprising about this method. The comments tell the story. The only thing to watch is that pesky special case at the end: Adding to a previously empty list. In this case, "tail" needs to be set in addition to head. DRAW THIS OUT -- REALLY. Unless you do, you really might not catch it.

  int addHeadLL (linkedlist *list, void *data) {

    /*
     * This will point to the new node. It can be "Local", because
     * it isn't needed outside of this function -- by the time
     * we leave this function the new node is referenced by its
     * predecessor in the list which is not local
     */
    node *newnode; 

    /*
     * okay, a pre-flight sanity check: 
     *  1) Do we have a valid list
     *  2) Do we have something to add
     */
    if (!list) return NULL_LIST_LL;
    if (!data) return NULL_DATA_LL;

    /* Allocate the space for the node */
    newnode = malloc (sizeof(node));
  
    /* Set its "data" and "next" references. At this point, the
     * node is ready to go and knows its successor in the list, if
     * any. If the list is empty, "head" will be NULL, so this node's
     * next will be NULL -- and that is correct.
     */
    newnode->next = list->head;
    newnode->data = data;

    /* Okay. Set the head so that it points to this new node instead
     * of the old head. The old head, remember, is now being pointed
     * to by the newly created node. 
     *
     * And, of course, bump the count forward.
     */
    list->head = newnode;
    (list->count)++;

    /* If list was previously empty, set tail to only node, newly added */
    if (!list->tail) list->tail = list->head;

    return SUCCESS_LL;
  }
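
Adding at the back, as described earlier, follows the same pattern. The sketch below is not part of the lecture's code: the name addTailLL, the numeric values of the error codes, and the NO_MEMORY_LL code are all assumptions for illustration.

```c
#include <stdlib.h>

/* Assumed values: the notes use these names without showing definitions */
#define SUCCESS_LL    0
#define NULL_LIST_LL -1
#define NULL_DATA_LL -2
#define NO_MEMORY_LL -4  /* Hypothetical: not in the notes */

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

typedef struct {
  node *head;
  node *tail;
  unsigned long count;
} linkedlist;

/* Sketch of adding at the back; mirrors addHeadLL */
int addTailLL (linkedlist *list, void *data) {
  node *newnode;

  if (!list) return NULL_LIST_LL;
  if (!data) return NULL_DATA_LL;

  newnode = malloc (sizeof(node));
  if (!newnode) return NO_MEMORY_LL;

  /* The new node is last, so it has no successor */
  newnode->data = data;
  newnode->next = NULL;

  /* Special case: in an empty list the new node is also the head.
   * Otherwise, the old tail gets the new node as its successor. */
  if (!list->tail)
    list->head = newnode;
  else
    list->tail->next = newnode;

  list->tail = newnode;
  (list->count)++;

  return SUCCESS_LL;
}
```

Because the list keeps a "tail" reference, there is no need to walk the list to find the end.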

Implementing the Linked List Library: Testing

So, at this point, we can populate a list, so we should begin to test. It makes sense to build up a list and print it out. But, how do we print out some unknown data type associated with a void pointer?
Well, we'll cross that bridge tomorrow. For today, we're just going to write a quick and dirty debug function. It'll assume that this is a list of "int" items. As before, the comments tell most of the story.
But, do check out the "#ifdef DEBUG" and "#endif" guarding. We can "define" DEBUG by adding the "-D" flag to gcc, e.g., "-DDEBUG". Notice the two Ds: one for the -D and one to begin "DEBUG". Defining something this way does not put it into the program at runtime. It just makes it available during preprocessing.
Also, for brevity, I'm not showing the partially developed test program here. But you surely can get to it.
  #ifdef DEBUG
  int printIntLL (linkedlist *list) {
    node *index; /* Used for keeping our place in the list */

    if (!list) return NULL_LIST_LL;

    /*
     * The loop below, piece by piece:
     *   index=list->head    Start out at the beginning
     *   index               Last node's next is NULL, so we'll get set
     *                       to that at the end, and stop
     *   index=index->next   Move one forward each iteration
     */
    for (index=list->head; index; index=index->next) {

      /* Ugly cast to (int *) before dereferencing:
       * This is necessary, because it is a void pointer. 
       * It must be cast to something, or it can't be dereferenced:
       * Otherwise, what type? Or even how big?
       */
      printf ("%d\n", *((int *)index->data)); 
    }

    return SUCCESS_LL;
  }
  #endif
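
For a rough idea of what a quick test might check, here is a minimal sketch. It is not the lecture's test program: smokeTestLL is a hypothetical name, the numeric error-code values and NO_MEMORY_LL are assumptions, and initLL and addHeadLL are repeated in condensed form only so the sketch stands alone.

```c
#include <stdlib.h>

/* Assumed values: the notes use these names without showing definitions */
#define SUCCESS_LL    0
#define NULL_LIST_LL -1
#define NULL_DATA_LL -2
#define NO_MEMORY_LL -4  /* Hypothetical: not in the notes */

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

typedef struct {
  node *head;
  node *tail;
  unsigned long count;
} linkedlist;

int initLL (linkedlist *list) {
  if (!list) return NULL_LIST_LL;
  list->head = list->tail = NULL;
  list->count = 0;
  return SUCCESS_LL;
}

int addHeadLL (linkedlist *list, void *data) {
  node *newnode;

  if (!list) return NULL_LIST_LL;
  if (!data) return NULL_DATA_LL;

  newnode = malloc (sizeof(node));
  if (!newnode) return NO_MEMORY_LL;

  newnode->next = list->head;
  newnode->data = data;
  list->head = newnode;
  (list->count)++;

  /* Only a previously empty list needs its tail set */
  if (!list->tail) list->tail = newnode;

  return SUCCESS_LL;
}

/* Hypothetical check: add 1, 2, 3 at the head, expect to walk 3, 2, 1.
 * Returns 1 if everything looks right, 0 otherwise. */
int smokeTestLL (void) {
  static int values[] = { 1, 2, 3 };
  int expected = 3;
  int ok = 1;
  int i;
  linkedlist list;
  node *index, *next;

  if (initLL (&list) != SUCCESS_LL) return 0;

  for (i = 0; i < 3; i++)
    if (addHeadLL (&list, &values[i]) != SUCCESS_LL) ok = 0;

  /* The first item added should have ended up as the tail */
  if (list.count != 3) ok = 0;
  if (ok && list.tail->data != &values[0]) ok = 0;

  for (index = list.head; index; index = index->next)
    if (*((int *)index->data) != expected--) ok = 0;

  /* Free the nodes, but not the data: we did not allocate it */
  for (index = list.head; index; index = next) {
    next = index->next;
    free (index);
  }

  return ok;
}
```

Checking the tail explicitly is worthwhile here, since a walk from the head alone would never notice a wrong "tail".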
  

Implementing the Linked List Library: Removing the First Item

OK, a couple of things worth highlighting here. The first is that we pass in a "handle" (pointer to pointer) to the data item. We do this because we are, in effect, returning it. We need to be able to change the value of the "void *", so we need a pointer to it, a "void **". This is no different, really, than the swap function of a few classes ago.
The second thing is that we need to save a reference to the node before nuking it -- or else we won't have a reference to it to free the space.
And, the third is that we don't free the individual data item. This is because we didn't allocate it. And, this is probably the best design. The user is always welcome to copy it before passing it in, if they want us to have a deep copy (Robot says: Danger! Danger! How does it get freed?).
But, we can imagine that the user wants to use several lists to organize the same records -- for example student records sorted by last name in one list and by GPR in another. If a piece of information is changed in the student record, it should be seen in both lists -- if our list would make a copy, this would break, and break in a way the user would have some difficulty fixing. (For the curious, they could pass a pointer to a pointer, so we only copy the pointer).
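Here is a small sketch of that shared-record situation. The "student" struct, its fields, and the function name sharedUpdateDemo are hypothetical, and the two lone nodes simply stand in for nodes that would live in two different lists.

```c
#include <stddef.h>

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

/* Hypothetical record type, for illustration only */
typedef struct {
  char *lastname;
  double gpr;
} student;

/*
 * Both nodes refer to the same record, so one update is seen through both.
 * Returns 1 if the change is visible via each node, 0 otherwise.
 */
int sharedUpdateDemo (void) {
  student s = { "Smith", 3.0 };
  node byname = { NULL, NULL };  /* Stands in for a node in the by-name list */
  node bygpr  = { NULL, NULL };  /* Stands in for a node in the by-GPR list */

  byname.data = &s;
  bygpr.data = &s;

  s.gpr = 3.5;  /* Change the record once */

  return ((student *)byname.data)->gpr == 3.5
      && ((student *)bygpr.data)->gpr == 3.5;
}
```

Had each list made its own copy of the record, the update would have reached neither copy.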
But, from a more practical standpoint, we don't know how to copy their data item. It, itself, might have references to other items. Do they get copied, too? Imagine that each of their items is, itself, a linked list. Gulp. Just copying the linkedlist struct won't really be a deep copy, will it?
We have a symmetric problem when it comes to freeing user data -- we don't know the organization or composition. How can we free it? Really now? Answer: We can't.
So, what to say? Remember the golden rule: Free what you allocate -- but not what you don't!
  int removeHeadLL (linkedlist *list, void **data) {

    node *zombie; /* Will be used to keep reference, until we can free */

    /* Now routine sanity checking */
    if (!list) return NULL_LIST_LL;
    if (!data) return NULL_DATA_LL; /* Nowhere to put the item */
    if (!list->count) return EMPTY_LIST_LL;
  
    /* Save a reference to the head, so we can deallocate its space later */
    zombie = list->head;
  
    /* Step right over it, taking it out, and reset the count */
    list->head = zombie->next;
    (list->count)--;

    /* Save the data for the caller */
    *data = zombie->data;

    /* Give back the space from the node, not the user data */
    free (zombie);
  
    /* If we took out the last one, the tail needs to be reset, too */
    if (!list->count) list->tail = NULL;

    return SUCCESS_LL;
  }
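
Removing a node from the middle of the list uses the predecessor walk described earlier. The sketch below is one way it might look, not the lecture's code: the name removeDataLL, the choice to match on the data pointer itself, the numeric error-code values, and NOT_FOUND_LL are all assumptions.

```c
#include <stdlib.h>

/* Assumed values: the notes use these names without showing definitions */
#define SUCCESS_LL     0
#define NULL_LIST_LL  -1
#define NULL_DATA_LL  -2
#define EMPTY_LIST_LL -3
#define NOT_FOUND_LL  -5  /* Hypothetical: not in the notes */

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

typedef struct {
  node *head;
  node *tail;
  unsigned long count;
} linkedlist;

/* Sketch: remove the first node whose data pointer equals "data" */
int removeDataLL (linkedlist *list, void *data) {
  node *index;   /* Walks to the predecessor */
  node *zombie;  /* Remembers the node until we can free it */

  if (!list) return NULL_LIST_LL;
  if (!data) return NULL_DATA_LL;
  if (!list->count) return EMPTY_LIST_LL;

  if (list->head->data == data) {
    /* No predecessor: the head itself goes */
    zombie = list->head;
    list->head = zombie->next;
    if (!list->head) list->tail = NULL;  /* One-item list is now empty */
  } else {
    /* Walk index to the predecessor of the matching node */
    for (index = list->head; index->next; index = index->next)
      if (index->next->data == data) break;
    if (!index->next) return NOT_FOUND_LL;

    /* Remember the node first, then skip past it */
    zombie = index->next;
    index->next = zombie->next;
    if (zombie == list->tail) list->tail = index;
  }

  (list->count)--;
  free (zombie);  /* The node only, never the user's data */

  return SUCCESS_LL;
}
```

The order matters: the zombie's "next" is read before the node is freed, just as in removeHeadLL.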
  

Implementing the Linked List Library: Deleting the List

Given our implementation of removeHeadLL(), nuking the whole list boils down to a simple loop. There are really only two things worth thinking about here.
The first is that we are not freeing the user data. Remember, we have no idea how to do that. So, if this needs to happen, they really need to call removeHeadLL() themselves and free the data as they go. I hate to say it, but this method has a very high probability of generating a memory leak: Who's going to remember to do that? Maybe we'll learn something else we can do, if we want to be complicated about it, next class.
The second thing is the funny "notused" void pointer. This is because removeHeadLL() returns the head item via an argument. We don't need it, so we just take the pointer that we need to take to make it work. We've already discussed why this might, or might not, be a memory leak.
  int destroylist (linkedlist *list) {
    void *notused;

    if (!list) return NULL_LIST_LL;

    while (list->count)
      removeHeadLL(list, &notused);
    
    return SUCCESS_LL;
  }
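
For a caller who does own the data, the "call removeHeadLL() themselves and free the data as they go" approach might look like the sketch below. The name drainAndFreeLL is hypothetical, the numeric error-code values are assumptions, and the sketch assumes every data item came from a single malloc() by the caller. removeHeadLL is repeated in condensed form only so the sketch stands alone.

```c
#include <stdlib.h>

/* Assumed values: the notes use these names without showing definitions */
#define SUCCESS_LL     0
#define NULL_LIST_LL  -1
#define EMPTY_LIST_LL -3

typedef struct node_t {
  void *data;
  struct node_t *next;
} node;

typedef struct {
  node *head;
  node *tail;
  unsigned long count;
} linkedlist;

/* removeHeadLL as given in the notes, condensed */
int removeHeadLL (linkedlist *list, void **data) {
  node *zombie;

  if (!list) return NULL_LIST_LL;
  if (!list->count) return EMPTY_LIST_LL;

  zombie = list->head;
  list->head = zombie->next;
  (list->count)--;
  *data = zombie->data;
  free (zombie);
  if (!list->count) list->tail = NULL;

  return SUCCESS_LL;
}

/*
 * Hypothetical caller-side cleanup. Assumes each data item was obtained
 * from one malloc() call by the caller, so a plain free() is enough.
 * Returns the number of items freed.
 */
unsigned long drainAndFreeLL (linkedlist *list) {
  void *item;
  unsigned long freed = 0;

  if (!list) return 0;

  while (removeHeadLL (list, &item) == SUCCESS_LL) {
    free (item);  /* The caller allocated it, so the caller frees it */
    freed++;
  }

  return freed;
}
```

This only works because the caller knows how the items were allocated, which is exactly the knowledge the list library lacks.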
  

Implementing a Linked List: A BROKEN Example

In the example implementation of addHeadLL below, we do something tempting -- but so very broken.
Here is the FLAWED logic: Since we are only using one node, not some variable number of nodes, we can just statically allocate it as a local variable, instead of malloc()ing it.
If we do this, the code will work for a while. But, things will break. The statically allocated "newnode" is a local variable. It is allocated on the stack when the method is called -- and deallocated upon its return. After that, it might still work -- but will become corrupted as soon as a subsequent function call allocates the same chunk of stack space and scribbles on it as part of its own stack frame.
This error is quite insidious, because the results are not immediately visible. Another form is when a pointer to a local variable or argument is returned, either as a formal return value or via an argument. Some compilers will catch the "returning value from stack" thing -- but miss the other forms. Some compilers won't help you at all. Careful. Careful.
Below is the BROKEN anti-example:
#ifdef BROKEN_EXAMPLE
int addHeadBROKENBADLYDLL (linkedlist *list, void *data) {

  node newnode; /* This is on the stack: Load gun */

  if (!list) return NULL_LIST_LL;
  if (!data) return NULL_DATA_LL;

  newnode.next = list->head;
  newnode.data = data;
  list->head = &newnode; /*Saved addr of local to non-local: Aim at foot/head */
  (list->count)++;

  /* If list was previously empty, set tail to only node, newly added */
  if (!list->tail) list->tail = list->head;

  return SUCCESS_LL; /* Fire! Our newnode is deallocated upon this return */
}
#endif