September 18, 2007 (Lecture 7)

Overview

Today we're going to talk a bit about parameter passing and then we'll move on and spend most of the class talking about multi-file development and variable scope, lifetime, and namespace.

Parameter Passing: Pass-by-Value

C, much like Java, uses the pass-by-value mechanism. When a function is called, the function's parameters are copies of the arguments that were passed in -- they are not the same variables. The mechanism is called "pass by value", because the values of the arguments are passed in -- not the arguments themselves. The behavior that results from this should not be a surprise to you -- it is the same mechanism as was used in Java.
The classic example of this is the doesn't-really swap() function:
  #include <stdio.h>

  void swap (int x, int y) {
    int temp = x;
    x = y;
    y = temp;
    printf ("%d %d\n", x, y); /* 7 5 */
  }

  ...

  int main () {
    int a = 5;
    int b = 7;

    printf ("%d %d\n", a, b); /* 5 7 */
    swap (a, b);
    printf ("%d %d\n", a, b); /* 5 7 */
  }
  
Notice that, in the example above, the swap() function does swap the values of "x" and "y", its parameters. But it does not affect "a" and "b", main()'s local variables that are passed in. Please also note that renaming "a" and "b" to "x" and "y" wouldn't change this. Although main()'s two local variables and swap()'s two parameters would happen to have the same names -- they would remain different variables.

The Runtime Stack

Let's talk a bit more about function calls and variables. Each time a function is called, its parameters and local variables are created. When the function returns, they go away. For this reason, once upon a time, local variables were known as automatic variables. This nomenclature isn't presently favored -- but remains valid and correct.
Each time a function is called, execution jumps to a completely different part of the program, and entirely new variables exist. These variables might have the same names as other variables in the program, but they are not the same -- they can have different values. And, once the function returns, these local variables "go away" and execution picks right up where it left off. Additionally, arguments need to be communicated to the function when called, and a return value needs to be communicated to the caller, upon return.
The compiler does this using a stack, often called the runtime stack. When a function is called, the compiler pushes the parameters onto the stack, followed by the return address. It then pushes empty space (allocates space) on the stack for all of the local variables. This state, which is stored on the runtime stack, and is associated with a single activation of a function, is known as a stack frame.
When the function returns, the compiler pops its frame from the stack, revealing the return address, and stores the return value on the stack. Upon return, the caller then reads the return value from the top of the stack.
Additionally, there is plenty of other state information associated with the activation of each function that is stored within the stack frame. For example, if there are only a few parameters, they are often passed in registers, fast memory within the CPU. These registers need to be saved to and restored from the stack when a function is called and upon its return. There are also some pieces of metadata about the stack, itself, such as pointers to the beginning of each stack frame, &c. And, to be honest, technically speaking, the parameters are part of the caller's stack frame, not the callee's stack frame.
But, for this class, we're not going to get bogged down in the details -- those are covered in 15-213. But, it is critical that you realize that each instance of a function has its own parameters and local variables. It is also very important that you realize that the compiler uses a stack to manage function invocation.

Global Variables

Java was a funny language in the sense that everything had to be part of a class. So, for example, all methods had to be defined within a class. And, all variables had to be defined within a class.
But, fortunately for us, we could create "static" methods and "static" variables. These odd animals lived within classes -- but really had nothing to do with the class or its instances. "static" methods were not able to access instance variables. And, "static" variables were shared among all instances, and if public -- among all parts of the code.
Think about that for a minute. "public static" variables, even though declared within a class, could be accessed by any method within the program. Although they were in the "name space" of the class, they were really global in nature.
The C Language has global variables proper. They are declared outside of any function. As a result, they are in "global space", not within a function's scope. Because the compiler processes each file as if from the top down in a single pass, they are usually declared at the very top. The variable will appear undeclared if it is used at a location within the file before it is declared.
Before continuing, let me offer a warning. Global variables should only be used to model global state. And, global state is somewhat rare. Global variables should never be used in place of local variables. And, global variables should never be used instead of parameter passing. These are two common misuses that generate awful code. Before using a global variable, ask yourself two questions: 1) Can this be a local? and 2) Can I pass it as an argument, where needed, rather than storing it globally?
The toy example below shows a global variable and its use. Notice all three functions can access the global variable:
  #include <stdio.h>

  int global = -1;

  void set(int value) {
    global = value;
  }

  int get() {
    return global;
  }


  int main() {

    printf ("Initial: %d\n", global);
    set (5);
    printf ("After set(): %d\n", global);
    printf ("Result from get(): %d\n", get());
    printf ("Another direct access: %d\n", global);
    global = 42;
    printf ("After main() changes the value: %d\n", global);

    return 0;
  }
  
I've got one last note. And, I know that this will be clear to many of you, especially after reading it, but it has caused some confusion over time. Variables declared within main() are not globals -- they are local to main(). main() happens to be the entry point into a C program -- but it is also a function like any other.

Function Prototypes

Let's consider a quick little program that contains add() and subt() functions.
addsubt.c
  #include <stdio.h>

  float add (float x, float y) {
    return x + y;
  }

  float subt (float x, float y) {
    return x-y;
  }

  int main() {
    float a, b, c;

    a = 5.5;
    b = 7.5;

    c = add (a,b); 
    printf ("Sum is %f\n", c);

    c = subt (c,b); 
    printf ("Difference is %f, which, incidentally, is the same as \"a\"\n", c);
  
    return 0;
  }
  
As shown above, the program compiles just fine. But, if we move the add() and subt() functions below main(), the compiler barks at us:
addsubt.c
  #include <stdio.h>

  int main() {
    float a, b, c;

    a = 5.5;
    b = 7.5;

    c = add (a,b); 
    printf ("Sum is %f\n", c);

    c = subt (c,b); 
    printf ("Difference is %f, which, incidentally, is the same as \"a\"\n", c);
  
    return 0;
  }

  float add (float x, float y) {
    return x + y;
  }

  float subt (float x, float y) {
    return x-y;
  }
  
Specifically, here's what it says:
  addsubt.c: In function 'main':
  addsubt.c:10: warning: implicit declaration of function 'add'
  addsubt.c:13: warning: implicit declaration of function 'subt'
  addsubt.c: At top level:
  addsubt.c:19: error: conflicting types for 'add'
  addsubt.c:10: error: previous implicit declaration of 'add' was here
  addsubt.c:23: error: conflicting types for 'subt'
  addsubt.c:13: error: previous implicit declaration of 'subt' was here
  
What's going on? Well, at the time that the compiler encounters each of the uses of add() and subt(), it hasn't encountered the functions, themselves -- it works through the file from top to bottom. So, it doesn't know their arguments or return types. What does it do? It assumes that the types of their arguments match those that are passed in. And, it assumes that the return type is "int". It has to assume this "default" type, because it can't use an assignment to determine the type -- the assignment is after-the-fact and might not even exist.
Then, when the compiler gets down to the bottom of the file, it sees the actual function definitions -- and observes the conflict between the actual definition and the "implied" declaration that it earlier assumed.
Rather than constraining ourselves to define functions before their first use, which is using the tail to wag the dog, we directly address the only problem -- that the compiler doesn't know the types associated with the function. We do this by placing a type of forward reference, known as a function prototype, ahead of its first use. By convention, we put it at the top of the code. This makes the compiler happy.
Notice the prototypes in the version below. They do not happen to contain the names of the arguments. This is unnecessary -- at this stage of the game, the compiler need only know the types.
addsubt.c
  #include <stdio.h>

  /* Notice these prototypes */
  float add(float, float);
  float subt(float, float);
  /* End of prototypes */

  int main() {
    float a, b, c;

    a = 5.5;
    b = 7.5;

    c = add (a,b); 
    printf ("Sum is %f\n", c);

    c = subt (c,b); 
    printf ("Difference is %f, which, incidentally, is the same as \"a\"\n", c);
  
    return 0;
  }

  float add (float x, float y) {
    return x + y;
  }

  float subt (float x, float y) {
    return x-y;
  }
  

Multiple files, And Compiling Therewith

When working on larger projects, or even well structured smaller ones, it is often advisable to break up the source code into several different files, each of which contains a small section of strongly related code -- whether that is a single function or a few.
In Java, we were mostly forced to do this, because the name of the file had to be consistent with the name of the class that it contained. In other words, we were forced to implement only one class per file. But, in C, we aren't forced into any particular code organization -- instead, we need to think about it and be smart. The goals are readability and maintainability, more generally.
Regardless, there are some details we'll need to consider in putting together multiple file projects. The first of these details is, "How do we compile?" To build a system consisting of multiple source files, we simply compile as usual, listing all of the source files. Consider the example below:
  gcc -Wall -Wextra -ansi -pedantic main.c mathlib.c -o addsubt
  

Multiple Files and Header files

Let's consider breaking our example from above into two files: main.c and mathlib.c:
main.c:
  #include <stdio.h>

  int main() {
    float a, b, c;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    c = subt (c,b);
    printf ("Difference is %f, which, incidentally, is the same as \"a\"\n", c);

    return 0;
  }

  
mathlib.c
  float add (float x, float y) {
    return x + y;
  }

  float subt (float x, float y) {
    return x-y;
  }

  
It is no great surprise that the compiler complains, much as it did before:
  main.c: In function 'main':
  main.c:9: warning: implicit declaration of function 'add'
  main.c:12: warning: implicit declaration of function 'subt'
  
But, we are actually much worse off this time. The compiler never encountered the correct types before it finished compiling main.c. So, depending on the host system, the program might actually link, even if the two sets of types, "actual" and "implicit", are incompatible. This could be a big problem. Consider this broken output on an iMac. The output is wrong because the float return of the function is interpreted as an integer.
  Sum is 1075183616.000000
  Difference is 1104151936.000000, which, incidentally, is the same as "a"
  
We could fix this, as we did before, by putting the prototypes at the top of main.c. And, in fact we will. But, instead of putting them there ourselves, we're going to let the preprocessor do it for us. We are going to put them into a header, "mathlib.h". We'll then #include mathlib.h everywhere that uses any function from mathlib.c. This idiom will provide us a general solution to the problem. And, it saves us the risk of typing something wrong.
mathlib.h
  float add (float, float);
  float subt (float, float);
  
main.c
  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    ...
  }
  
In the example above, notice that the system header file, stdio.h, was included using <>-brackets, but mathlib.h was included using ""-quotes. The brackets instruct the preprocessor to look for the file only within the pre-configured location for standard headers. The ""-quotes tell the preprocessor to look in the local directory before looking in the standard location. Since mathlib.h is our file, we have to use the ""-quotes.
We'll always use this idiom when we code. For every library file, we'll have a header file that we include where it is used. Also, one other note. Never, ever, ever #include a .c file, or any other file that contains actual code. This is another huge point penalty: Probably more than 10 points. Header files should only contain declarations -- not actual implementation. And implementation files should be linked together, not concatenated by the preprocessor. That generates an unmanageable mess and leaves the system vulnerable to a whole host of build problems and bugs.

Guarding Against Multiple Includes

In complex builds, it is possible that the same header file will be included more than once. The problem with this is that the compiler will run into the same definitions more than once. And, when it does, it'll warn about multiple definitions.
To prevent this from happening, we ask the preprocessor to include the file only if it hasn't done so already. To achieve this, we make use of the "if not defined" (#ifndef) and "define" (#define) directives. The first time the preprocessor encounters the file, we define a pre-processor macro. We put the entire content of the header file within an if-statement that excludes the code in the event that this macro is defined. In this way, the preprocessor only processes each header file once, at which time it defines the macro, which is, in effect, an annotation that flags it not to include the content of the file again.
Notice the name of the macro: MATHLIB_H. This is the convention: Capitalize the name of the file and use an _underscore in place of the .dot.
mathlib.h
  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add (float, float);
  float subt (float, float);

  #endif /* No code below here */
  

Multiple Files, and Global Variables

Okay, now let's consider using global variables across multiple files. Consider the example below:
mathlib.c
  float lastresult;  
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }

  
mathlib.h
  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add(float, float);
  float subt(float, float);
  float getlastresult();
  
  #endif 
  
main.c
  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", d);

    return 0;
  }

  
Now, if global variables are truly global, we should be able to use the one defined within mathlib.c from within main.c, right? Yep. We can. But, we've got a problem if we try. Consider the following main.c. Notice that it tries to use the "lastresult" defined within mathlib.c:
  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    printf ("The last result was: %f \n", lastresult);

    return 0;
  }

  
The compiler gets angry with us:
  main.c: In function 'main':
  main.c:14: error: 'lastresult' undeclared (first use in this function)
  main.c:14: error: (Each undeclared identifier is reported only once
  main.c:14: error: for each function it appears in.)
  
What's the problem? If it is global, why does the compiler complain that it is undeclared? The problem is that the compiler compiles one file at a time. And, in processing main.c, it can't see the global variables declared within mathlib.c. It doesn't even know that they are there. And, since the individual files are separate until the linking step, the order of compilation doesn't matter -- nothing is remembered from one file to the next.
The fix for this is to tell the compiler that the global variable exists and is in another file that the linker will find later. We do this with the "extern" keyword:
  #include <stdio.h>
  #include "mathlib.h"

  extern float lastresult;

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  
But, as with the other declarations associated with mathlib.c, we don't really want to have to do this each time we use mathlib. So, we include the extern in our header file, so it gets included everywhere that we are using mathlib, along with everything else:
mathlib.h
  #ifndef MATHLIB_H
  #define MATHLIB_H

  extern float lastresult;

  float add(float, float);
  float subt(float, float);
  float getlastresult();

  #endif

  
Now For Some Weirdness
What we are going to do below is wrong. Don't do this. Do what we did above. We just do this to learn about linking. We don't do this to learn how to code nicely.
What if, instead of extern'ing "lastresult", either in main.c or within mathlib.h, we just redefine (note: redefine, not extern) it within main.c:
  #include <stdio.h>
  #include "mathlib.h"

  float lastresult; /* this is a repeat of the definition within mathlib.c */

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }

  
As it turns out, this works with the toolchain used here. Upon linking, the linker notices two identical uninitialized definitions of the global variable and smashes them down to one. (Toolchains differ on this: GCC 10 and later default to -fno-common, which turns even this case into a multiple-definition error.)
But, just for a bit more fun, let's modify our mathlib.c so that it initializes "lastresult". Please note, we're leaving the definition within main.c, above, exactly as it was: present -- but without the initialization.
  float lastresult = -1; /* Notice this initialization */

  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }

  
Okay. So, what's the big deal? It still works -- did we really expect anything different? Well, maybe not, but let's make one more change. Let's leave that initialization in mathlib.c, just as we see it above, and also mimic it exactly in main.c as below:
  #include <stdio.h>
  #include "mathlib.h"

  float lastresult = -1; /* Notice, now initialized to -1 in both places */

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  
So, what does our friend, the compiler, think? Well, it looks like the linker has some issues.
  /usr/bin/ld: multiple definitions of symbol _lastresult
  /var/tmp//ccaJeNXt.o definition of _lastresult in section (__DATA,__data)
  /var/tmp//ccoBUcZ0.o definition of _lastresult in section (__DATA,__data)
  
Weak Symbols vs Strong Symbols
Symbols are the names associated with functions, variables, &c. The names are associated with the actual objects via a data structure maintained by the compiler known as the symbol table. The symbol table keeps the symbol, the location of the object in memory, and its size. It also notes whether the symbol is strong or weak.
Unless special non-portable compiler black magic is performed, it works like this:

uninitialized global variables are weak
initialized global variables are strong
functions are strong (because they are, in effect, initialized)

When the linker encounters an apparent redefinition of multiple weak symbols, it can take any of them. In the context of uninitialized global variables, this makes sense -- they are all equivalent.
When the linker encounters a strong symbol and one or more weak definitions of the same symbol -- it takes the strong symbol. The reason for this is easily illustrated with global variables. If we have one that is uninitialized and one that is initialized, it makes sense to keep the initialization.
If the linker encounters two or more strong definitions, it fails. The reason for this is that the two different initializations might be conflicting. It can't pick either one without changing how the variable is initialized for the code in one .c file or the other.
It is important to note that the linker can't see the actual initialization. It is looking only at the symbol table. So, it doesn't know if the actual initial values are conflicting or not. So, since it doesn't know that they are the same, it does the only safe thing -- it breaks. And, this makes the most sense, really. It would make no sense if changing the initialized value of a global variable could enable or break a build. And that could happen if the linker did actually allow two strong definitions to be collapsed into one in the special case that they happened to have the same initial value.
This is why our multiple definition of "lastresult" failed when they were both initialized, even though it was to the same value -- and didn't when exactly one of the two was initialized or neither was initialized. When they were both initialized, even though it was to the same value, they were both strong symbols and therefore in conflict at link time.

"static" Global Variables

The qualifier "static" is one of the most, if not the most, overloaded reserved words in C. Its apparent meaning changes depending on the context in which it is used. The first use of "static" that we'll examine is its use as a qualifier for a global variable.
When "static" is used as a qualifier for a global variable, it means that the "static" global variable can only be used within the file in which it is declared.
Let's go back to our lastresult/getlastresult() example from above. Let's say that we want to force users of our library to call getlastresult() instead of accessing "lastresult" directly. We might want to make "lastresult" off-limits, for example, to make sure that a caller doesn't accidentally assign it a value and, thereby, break our library's state.
We can achieve this by declaring "lastresult" to be "static", as seen in the example below:
main.c
  #include <stdio.h>
  #include "mathlib.h"

  extern float lastresult;

  int main() {
 
    extern float lastresult;
  
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  
mathlib.c
  static float lastresult;  
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }
  
mathlib.h
  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add(float, float);
  float subt(float, float);
  float getlastresult();
  
  #endif 
  
When we attempt to compile this, we see that main.c compiles as it did before -- the extern enables it to compile, even without the "lastresult" object. But, the linker does not try to use the "static" "lastresult" from mathlib.c to satisfy the need for a "lastresult" in main.c. And, since no other object is available, the link fails with an undefined-symbol error for "lastresult".

"static" Functions

"static" functions are very similar to "static" global variables. By placing the "static" qualifier before a function, we can lock it down so that it can only be called by other functions within the same file. In some respects, this is analogous to a "private" method in Java, which can only be called by other methods within the same class.

"static" Local Variables

As discussed earlier, local variables are normally "automatic" variables. Their normal lifetime is the duration of a function call. Each time a function is called, it gets brand-new local variables that are cleaned up upon the function's return.
But, if we use the "static" qualifier before a local variable, the local variable is not an "automatic" variable. It is not allocated on the runtime stack. Instead, it is allocated in the same space as the program's global variables. It isn't a global variable -- its scope is still limited to the function. But, since it isn't created and destroyed with each function call, it persists across function calls.
These are considered "static" variables, because like "static" global variables, they are not allocated on the stack and have a lifetime spanning the program's execution. And, like static global variables, they are limited in where they can be used. But, it is probably easiest to think of "static local variables" and "static global variables" as two different things.
Consider the example of static local variables below. Look specifically at count(), which simply returns a monotonically increasing number with each call. Notice that the initialization, which is no more required than it is for any other variable, happens only once -- when the program, itself, is being loaded.
Notice that, when run, the value returned by count() starts out at zero and increases by one each time the function is called: 0, 1, 2, 3, 4, 5, 6, ... Unlike an "automatic" local variable, the kind that you get without the "static" qualifier, the variable persists across calls and maintains its value.
mathlib.c
  static float lastresult;  

  int count() {
    static int currentcount=0;

    return currentcount++;
  }
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }
  
main.c
  #include <stdio.h>
  #include "mathlib.h"


  int main() {

    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());

    return 0;
  }

  
mathlib.h
  #ifndef MATHLIB_H
  #define MATHLIB_H

  int count();
  float add (float, float);
  float subt (float, float);
  float getlastresult();

  #endif /* No code below here */
  
Above Program's Output:
  0
  1
  2
  3
  4
  5
  6