October 29, 2008 (Lecture 17)

Today's Example

"const"

The "const" qualifier can be used to tell the compiler how a variable is to be used. It indicates that the variable is a "constant" -- that its value will not be changed by the programmer. If the programmer does try to assign to the variable directly, the compiler rejects the assignment with a compile-time error. If, instead, the "const" is stripped away, for example by handing the variable's address to a non-const pointer, the compiler will usually generate only a "warning" ("discards qualifier") rather than an "error" -- the code will still compile. But, the warning properly indicates that there is a problem with the semantics.
We often use const, instead of #define, to define constants. When we do this for global constants, we want to put the definitions into the "globals.c" file and the "externs" into the "globals.h" file:
  const double PI = 3.14;
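
As a minimal sketch of that layout, the two files might look like the following. They are collapsed into one listing here so that it compiles on its own. The circle_area() helper is hypothetical, added only to show the constant being used from another file.

```c
/* --- globals.h: the "extern" declaration that other files include --- */
extern const double PI;

/* --- globals.c: the one and only definition --- */
const double PI = 3.14;

/* --- any other .c file that includes globals.h --- */

/* Hypothetical helper: it can read PI, but it cannot assign to it. */
double circle_area(double radius) {
  return PI * radius * radius;
}
```

Because PI is a real variable, it shows up in a debugger by name, which is one of the trade-offs discussed next.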
  
There are some trade-offs involved in choosing between "const" and #define:

#defines are substituted by the preprocessor prior to compilation. They are gone well before execution, even if debugging information is requested with the -g flag. As a result, they aren't visible while debugging. This is not true for variables qualified as "const", as they are variables like any others.

Sometimes it is desirable to share constant values between a library's internals and its users. Consider, for example, return values, error codes, and option flags. Since we don't define variables in header files, these "consts" cannot be made available to both the library and its users in this way. But, with #defines, we can put them in the header file and include them into both the library and the caller.
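
A small sketch of this sharing, assuming a made-up library called "mylib": every name below (the MYLIB_ macros, mylib_first_char(), describe_status()) is hypothetical. The header, the library, and the caller are shown in one listing so that it compiles by itself.

```c
#include <ctype.h>
#include <stddef.h>

/* --- mylib.h (hypothetical): included by the library AND its callers --- */
#define MYLIB_OK          0
#define MYLIB_ERR_EMPTY   (-1)
#define MYLIB_OPT_UPPER   0x01

/* --- mylib.c: the library uses the shared codes and flags --- */
int mylib_first_char(const char *s, int options, char *out) {
  if (s == NULL || s[0] == '\0') {
    return MYLIB_ERR_EMPTY;
  }
  if (options & MYLIB_OPT_UPPER) {
    *out = (char)toupper((unsigned char)s[0]);
  } else {
    *out = s[0];
  }
  return MYLIB_OK;
}

/* --- caller.c: the caller interprets the very same codes --- */
const char *describe_status(int status) {
  switch (status) {
    case MYLIB_OK:        return "ok";
    case MYLIB_ERR_EMPTY: return "empty input";
    default:              return "unknown";
  }
}
```

Note that the case labels work because the preprocessor has already replaced the names with integer constants by the time the compiler sees them.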

"const *"

The declaration "const *" is a bit confusing. It declares a "pointer to a constant", not a "constant pointer". In other words, the value it points to can't be changed through the pointer, but the pointer itself can still be reassigned.
If one assigns a "const *" pointer the address of data that isn't actually declared as "const", this isn't an error. Instead, it just generates a constant view of the data. In other words, although the value might be changeable -- it isn't changeable via that particular pointer. The opposite, pointing a non-const pointer at "const" data, is, of course, not safe -- and not allowed without a cast.
It is always legal to copy the value of a "const" into a non-const variable -- this doesn't place the constant value in any danger.
Consider the example below:
  int x = 5;
  int y = 6;

  int *ip = &x;
  const int *cip = &x;

  /* Legal */
  x = 7; 
  *ip = 8; 

  /* NOT Legal */
  *cip = 9;
  

"const" Pointer

What if we do want a "constant pointer", a pointer which cannot be reassigned? Well, we can do this -- but the syntax looks really weird.
The name of the game here is that C, like other programming languages, is designed to make the common case convenient. As a consequence, the syntax for declaring constants favors constant values, not constant pointers -- constant pointers are just much less common.
Also do note that, just like any other constant variable, constant pointers should be initialized at declaration. If they aren't -- they can't later be assigned a value -- so they are basically not useful.
So, let's take a look at this by example:
  int x = 5;
  int y = 6;

  /* This declares a "constant pointer", the pointer itself can't be 
   * reassigned 
   */
  int * const cp = &x;

  /* This declares a constant pointer to constant data. 
   * The pointer can't be assigned. And, the value, itself, can't be 
   * changed via this pointer 
   */
  int const * const cpc = &x;

  cp = &y; /* NOT legal */
  *cp = 7; /* Legal */

  cpc = &y; /* NOT legal */
  *cpc = 7; /* NOT legal */
  

"const" and Function Arguments

On many occasions we use "const" when we pass arguments to functions by reference -- but don't intend to change them. This, for example, often happens when we pass strings into functions. Since we don't have a first-class string type, but instead must use a pointer to an array of chars, strings are always passed by reference.
So, if a programmer is looking at a function prototype and notices that a string is being passed into the function -- it is unclear if the function intends to change the string or just read it. We can clarify the intent, and also protect against accidental misuse, by marking the parameter as "const".
Consider strcpy(), a function which copies from one string to another. Notice that the src is annotated as "const", but the destination isn't.
  char *strcpy (char *dest, const char *src);
  
We run into an analogous situation when we pass structs by reference. We often do this, whether or not the function needs to change the struct, just for performance reasons. But, when we don't actually intend to change the struct, we should note that it is "const":
  int printRecord (const struct studentRec *student) {...}
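
To make this concrete, here is a hedged sketch. The fields of struct studentRec are assumptions, since the notes don't show them, and formatRecord() is a hypothetical cousin of printRecord() that writes into a caller-supplied buffer instead of printing.

```c
#include <stdio.h>

/* Hypothetical record layout -- the real one isn't shown in the notes. */
struct studentRec {
  char name[32];
  int id;
  double gpa;
};

/* Formats the record into buf. Returns the number of characters that
 * snprintf() reports for the formatted text. */
int formatRecord(char *buf, size_t size, const struct studentRec *student) {
  /* student->id = 0;   would not compile: student points to const data */
  return snprintf(buf, size, "%d %s %.2f",
                  student->id, student->name, student->gpa);
}
```

The struct is still passed by address, so nothing is copied, but the prototype alone tells the caller that the record comes back unchanged.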
  
We can pass "constant pointers" into functions. If we do this, it prevents the function, internally, from assigning the pointer to the address of a different object. But, in practice, this is almost never done.
First, since the pointer, itself, is passed by value, the caller is protected from any changes -- they only affect the function's local copy. Second, it muddies up the interface -- the internal constant use is exposed to the caller, who doesn't care. If it really is important to mark the variables as constant as a measure of safety within the function, they can be assigned to constant locals -- without it leaking out to the interface.
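
The "constant locals" idea might be sketched like this. The function count_char() is hypothetical, made up just for this illustration.

```c
#include <stddef.h>

/* The interface advertises only what the caller cares about:
 * the string is read, not changed. */
size_t count_char(const char *s, char target) {
  /* Constant local: a pointer that can't be reassigned, to data that
   * can't be changed through it. The caller never sees this detail. */
  const char * const start = s;
  size_t count = 0;
  const char *p;

  for (p = start; *p != '\0'; p++) {
    if (*p == target) {
      count++;
    }
  }
  /* start = p;   would not compile: start is a constant pointer */
  return count;
}
```

The prototype stays as simple as strcpy()'s, while the function still gets the compiler's help in keeping track of where the string began.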

"const" and Return Values

This is a short section: There is no such thing as a meaningful "const" return value. A return value is an "rvalue" by definition -- it can't be assigned to, anyway.

Linked List Implementation

For fun, we went through and annotated various parts of our linked list code with "const". The updated version is linked at the top.

Guidelines

For our purposes, we should always use "const" in the circumstances below. All other uses are optional:

Genuine constant values
Passing structs by reference, where we would otherwise pass by value, but for performance concerns.
Passing strings into functions, except where we intend to change them

Memory Errors In C

By now, you guys have probably realized that the most insidious errors in C programs are very often memory-related problems. These problems are nasty because they are related to the language and environment -- not the problem that one is trying to solve.
We see memory errors in a lot of different ways, a few of which are listed below:

Allocating too little space, as might happen if a string length is wrong, an input exceeds the expected and allocated size, or sizeof() is applied to the wrong type

Walking past the end of an array (really a type of the above)
Freeing memory, then continuing to use it. This can happen, for example, if an object is freed upon detecting an error, but a caller retries the operation, or if both the caller and the callee free the same object.

"Leaking memory" by reassigning a pointer, without first freeing the object, or using malloc for a local purpose within a function, but not freeing it within that function.

So, it is pretty clear that if we use memory that is not properly allocated, one of three things can happen:

Nothing, we wrote onto unused space. This might, for example, happen if our request was rounded up to some standard size by malloc, so there is some extra at the end of the array, anyway. This type of error might seem innocuous enough, but it can be trouble. After a different feature is exercised, a recompile with optimization, a port to a different system, a new version of tools, or anything else -- the error can suddenly morph into a different, more potent form.

We damage something. We scribble on top of something important. Later on our program relies on this damaged value and either crashes or generates incorrect results, or both. This situation is nasty, because the appearance of the errant behavior and the execution of the broken code are separated in time. This can make debugging challenging.

A segmentation fault or bus error. The pointer is bogus and doesn't point to allocated space. As a result, the hardware catches us and the OS strikes us dead.
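
Two of the mistakes listed earlier, allocating too little space for a string and freeing the same object twice, can be guarded against with a little discipline. The sketch below is one way to do that, not the only one. Both helper names, dup_string() and release_string(), are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Copies src into freshly allocated space. The "+ 1" is the byte that
 * is most often forgotten: room for the terminating '\0'. */
char *dup_string(const char *src) {
  size_t len = strlen(src);
  char *copy = malloc(len + 1);

  if (copy == NULL) {
    return NULL;            /* let the caller decide what to do */
  }
  memcpy(copy, src, len + 1);
  return copy;
}

/* Frees the string and sets the caller's pointer to NULL, so that a
 * second call is harmless: free(NULL) does nothing. */
void release_string(char **sp) {
  free(*sp);
  *sp = NULL;
}
```

Setting the pointer to NULL doesn't help if other copies of the pointer exist elsewhere, so it reduces the risk rather than removing it.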

But, what if we "leak" memory? Well, the textbook answer is that eventually the system will run out of memory and either malloc() will fail or the program will be killed by the OS for exceeding some resource limit. And, this can surely happen.
But, these days most VM systems are backed by not only a large amount of RAM -- but a truly huge amount of disk. Well before malloc() fails or the system kills off a process, things are likely to slow down, perhaps dramatically, due to paging.
You'll learn about paging in 15-213 and, in depth, in OS. But, to make a long story short, when a computer doesn't have enough memory, it plays a shell game and temporarily frees some memory by writing pages of memory off to disk. Then, should they be needed in the future, they can be read back in -- perhaps after writing out other pages to make room. This shell game dramatically hampers system performance because the disk, which is being used in place of RAM, is much, much, much, much slower.
Those of you who were in class got to hear a story about my master's project. For expediency one morning, I used a malloc in place of a static allocation. I knew I should remove it, but never got around to figuring out how big a buffer I needed. And, I never freed it, because, well, it was there only temporarily, anyway.
Well, I forgot about it and the software rolled out to our project's sponsor. And, with large enough inputs, my software became slow. I optimized the code. I added caching. I restructured large portions of the code. I tried to improve the algorithm.
Months after my graduation, my advisor took a look at it. Puzzled by the behavior, he started using some tools to analyze the situation. And, among those tools, he used strace -- which traces system calls. He found that in just a few seconds of execution, brk() was called some 20,000 times. You'll recall that brk() is the system call that malloc() uses to request more memory from the OS when it runs out.
He replaced my sloppy malloc() call with a proper static allocation -- and the problem was gone. It would have been similarly fixed if he had simply freed the allocated space at the end of the work loop. But, in truth, malloc() should only be used when a static allocation won't do. Static allocations are "born" with the program. But malloc() is dynamic and costs time during execution. And, in my case, it didn't make sense to free something at the bottom of a loop only to reallocate it again moments later at the top.
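
The shape of that fix might look something like the sketch below. This is not the project's actual code: the buffer size, the work being done, and the name run_work_loop() are all invented for illustration.

```c
#include <stddef.h>

#define WORK_BUF_SIZE 8   /* assumed bound on the scratch space needed */

/* Hypothetical work loop. Each round fills a scratch buffer and then
 * consumes it. The buffer is static, so there is no malloc() at the
 * top of the loop and no free() to forget at the bottom. */
long run_work_loop(int rounds) {
  static long scratch[WORK_BUF_SIZE];
  long total = 0;
  int r;
  size_t i;

  for (r = 0; r < rounds; r++) {
    for (i = 0; i < WORK_BUF_SIZE; i++) {
      scratch[i] = (long)i * r;
    }
    for (i = 0; i < WORK_BUF_SIZE; i++) {
      total += scratch[i];
    }
  }
  return total;
}
```

The price of the static buffer is that its size must be known ahead of time, which is exactly the question that was being put off in the story.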

Valgrind

Valgrind is a tremendous tool for finding memory problems in C programs. For those who might be familiar, it is similar to IBM's Rational Purify tool. Regardless, it can help you to find tons of different problems, and, of particular concern to us:

Memory leaks
The use of unallocated pointers
Walking past the bounds of dynamically allocated arrays and other objects.
The use of uninitialized variables

It is a dynamic, or runtime, analysis tool. This means that it analyzes your code while it is actually running. Basically, when you run a program using valgrind, it, at runtime, injects its code into your program (or vice-versa, really), so that it is able to trace your code.
But, like all runtime analysis tools, it checks only the code that actually runs -- not all paths. So, in any execution, it won't find problems, for example, in error handlers that don't happen to be exercised or in features that aren't invoked.
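
A small sketch shows why path coverage matters. The function sum_to() is hypothetical, and its leak is deliberate: it only happens when the second argument is nonzero.

```c
#include <stdlib.h>

/* Sums 1..n by way of a heap array. When skip_cleanup is nonzero, the
 * function "forgets" to free the array -- a deliberate leak that only
 * exists on that one path. */
long sum_to(int n, int skip_cleanup) {
  long total = 0;
  int i;
  int *values;

  if (n <= 0) {
    return 0;
  }
  values = malloc((size_t)n * sizeof *values);
  if (values == NULL) {
    return -1;
  }
  for (i = 0; i < n; i++) {
    values[i] = i + 1;
  }
  for (i = 0; i < n; i++) {
    total += values[i];
  }
  if (skip_cleanup) {
    return total;           /* leak: values is never freed */
  }
  free(values);
  return total;
}
```

Assuming this is built into a program named prog, a run under "valgrind --leak-check=full ./prog" that only ever calls sum_to(10, 0) should come back clean, even though the bug is sitting right there in the source. Only a run that actually reaches sum_to(10, 1) gives valgrind the chance to report the lost block.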
This is different than, for example, splint, which is a static tool. It analyzes the source code, rather than the execution. But, as it turns out, unless you program using a very formal and restricted style, runtime tools generally provide a better analysis.
In class we took a look at an excellent tutorial from the kind folks at cprogramming.com. I refer you there for a primer on valgrind:

Valgrind Tutorial