February 26, 2008 (Lecture 12)

struct

The C Language supports complex data types that are composed of several individual pieces of data. One classic example of this type of complex data type is the "student record", which might be composed of a student's name, birthdate, identification number, and perhaps another complex data type, a transcript of courses. Another classic example is the price tag, which might contain both the name of the product and the price.

This type of complex data is sometimes called structured data and for this reason it is supported in the C Language with a construct known as the struct. This type of structured data is also sometimes known as a record, nomenclature that derives from databases. For this reason, some languages call the analogous construct the record.

Consider the example below. It illustrates the definition of a struct. Please notice the keyword struct, followed by the struct's identifier. It is important to realize that this does not create an actual instance -- just defines the struct and gives the new type of struct the name "person_t".

Next comes the struct's body -- the individual fields. These are the individual types that combine to form this complex, structured, type.

Please pay careful attention to the ;-semicolon after the }-closing-squiggly. This ;-semicolon is very important -- and very easy to leave out, especially for Java programmers who aren't accustomed to ending class definitions with one. If it is left out, in many cases, the C compiler will choke and produce all sorts of seemingly meaningless error messages.

  struct person_t {
    char fname[256];
    char lname[256];
    unsigned int age;
  };
  

If we actually want to declare an instance of this struct type, we can do that as below. Please notice that we must use the keyword "struct" as part of the declaration. Regardless, the line below creates an actual "struct variable" that we can use.

  struct person_t somePerson;
  

It is also possible to contract the definition and the declaration into one statement as below:

  struct person_t {
    char fname[256];
    char lname[256];
    unsigned int age;
  } somePerson;
  

Regardless, we access the fields of the struct using the "." operator as follows:

  struct person_t somePerson;

  strcpy (somePerson.fname, "Greg");
  strcpy (somePerson.lname, "Kesden");
  somePerson.age = 65;
  

When passing structs to functions, we often pass them by reference, that is, by passing a pointer to the struct, even if we don't intend to change them within the function. We do this to avoid the overhead of copying the whole struct by value -- it is much faster to copy only the pointer. Soon, we'll learn about hints we can give the compiler if we don't intend to change the struct.

This means that we'll spend a lot of time accessing structs via pointers. One way of doing this is exactly as one might expect. We dereference the pointer to get to the variable and then use the .-dot operator.

  int someFunction (struct person_t *somePerson) {
    strcpy ((*somePerson).fname, "Greg");
    strcpy ((*somePerson).lname, "Kesden");
    (*somePerson).age = 65;

    return 0;
  }
  

But, this syntax is ugly -- especially for something so common. One goal of any programming language is to make the common case convenient. And, in this case, the C Language does exactly that. It defines an arrow operator formed by combining a dash and a greater-than sign: ->. This notation is nothing more than shorthand for the star-and-dot notation: p->field means exactly (*p).field. The arrow is an ordinary operator defined by the language itself and handled by the compiler proper; the preprocessor plays no part in it. By way of example, the code below is exactly equivalent to the prior example above:

  int someFunction (struct person_t *somePerson) {
    strcpy (somePerson->fname, "Greg");
    strcpy (somePerson->lname, "Kesden");
    somePerson->age = 65;

    return 0;
  }
  

typedef

The C Language also provides a mechanism for creating new type names from old ones. This is done using a typedef as in the example below:

  typedef unsigned int uint;
  uint number1;
  unsigned int number2;
  number1 = number2;
  

"typedef" is a keyword and begins the definition. The last string on the line is the identifier, a.k.a name, of the new type. Everything in between is the description of the type using its old defintion. So, the example above defines a "uint" to be the same thing as an "unsigned int".

"typedef" is very useful in symplifying the use of structs. Notice that in every declaration of a "struct", whether as a variable or parameter, we were required to use the keyword "struct". Consider again one of the examples from above. Notice the presence of the keyword "struct". We declared the paramter to be "struct person_t *somePerson", not just "person_t *somePerson":

  int someFunction (struct person_t *somePerson) {
    strcpy (somePerson->fname, "Greg");
    strcpy (somePerson->lname, "Kesden");
    somePerson->age = 65;

    return 0;
  }
  

We can use a typedef to clean this up by defining a new type name for the struct. Consider the following definition:

  struct person_t {
    char fname[256];
    char lname[256];
    unsigned int age;
  };

  typedef struct person_t person;
  

We can now just use the type "person", in place of the type "struct person_t". Consider the revised example below:

  int someFunction (person *somePerson) {
    strcpy (somePerson->fname, "Greg");
    strcpy (somePerson->lname, "Kesden");
    somePerson->age = 65;

    return 0;
  }
  

We can actually revise our definition of the struct to contract the typedef and the struct definition as below:

  typedef struct person_t {
    char fname[256];
    char lname[256];
    unsigned int age;
  } person;
  

It is really ugly, but notice that this exactly follows the syntax of a typedef that we've seen before. To see this more easily, we can rewrite it with the meaningless whitespace formatted differently -- the "typedef" keyword, followed by the type definition, followed by the new type name.

  typedef struct person_t { char fname[256]; char lname[256]; unsigned int age; } person;
  

But, putting aside the funny spacing, we can actually simplify the definition a bit more. Since we'll be using the new type name, "person", we won't actually need to refer to it as a "struct person_t". As a result, we can use an anonymous struct within the typedef. Basically, the syntax stays the same -- except the struct identifier, person_t, can go away:

  typedef struct {
    char fname[256];
    char lname[256];
    unsigned int age;
  } person;
  

Unions

There is another construct, the union. Its syntax is very, very, very similar to that of a struct. But, as it turns out, it is very different from a struct. Consider the definition below. As an aside, we could further contract this definition, using an anonymous union, as we did for the struct definitions in the prior examples:

  union measurement_t {
    unsigned int cubiccentimeters;
    float pounds;
    float kilograms;
  };

  typedef union measurement_t measurement;
  

This union indicates that a measurement is ONE OF an unsigned int count of cubic centimeters, a float weight in pounds, or a float weight in kilograms. (C has no "unsigned float" type, so the weights are plain floats.) As it turns out, the compiler need only allocate enough space to represent any one of them -- not all of them. Consider the following example, which shows that the storage for the three members is overlapped:

  #include <stdio.h>

  typedef union {
    int cc;
    float kg;
    float lbs;
  } measurement;


  int main() {

    measurement m;

    /*
     * It makes no sense to initialize all three, we can only use one
     * at a time. But, I want to prove a point here -- they are all 0
     */
    m.cc = 0;
    m.kg = 0;
    m.lbs = 0;
  
    /*
     * Notice that, no matter how we print the union -- and there is
     * only "one thing", not three -- it is 0
     */
    printf ("%d cc, %f lbs, %f kgs\n", m.cc, m.lbs, m.kg);

    /*
     * Notice, we're only setting the value of the union using one
     * of its presentations, as "float kg"
     */
    m.kg = 10;
  
    /*
     * Notice that both of the "float" presentations have the same
     * value -- and the "int" one looks like garbage. Its bits have
     * changed, too, but they are the bit pattern of the float 10.0
     * read back as an int, so the exact number depends on the platform.
     */

    /*
     * On a typical machine with 32-bit ints and IEEE 754 floats:
     * 1092616192 cc, 10.000000 lbs, 10.000000 kgs
     */
    printf ("%d cc, %f lbs, %f kgs\n", m.cc, m.lbs, m.kg);


    return 0;
  }
  

So, why would one use a union? Often, unions are used when one wants to allow for more than one type of data -- but does not want to open things up to all types of data, as might be the case if a "void pointer", a.k.a. "generic pointer", is used. We'll talk more about "void pointers" soon.

Consider the case of a shipping company which quotes some clients rates by volume and others by weight. They might structure code as follows. Notice the combination of the struct and union: the "units" field records which member of the union currently holds a meaningful value. As an aside, we could use an enum, an enumeration, in place of the #defined constants, but we haven't learned those, yet -- and they aren't always used, in practice:

  #include <stdio.h>


  #define CC 1
  #define KG 2
  #define LBS 3


  typedef union {
    int cc;
    float kg;
    float lbs;
  } measurement;

  typedef struct {
    int units;
    measurement boxMeasurement;
  } shipmentrecord;


  int ship(shipmentrecord *, unsigned long *);


  int main() {

    shipmentrecord sr;
    unsigned long charges;

    sr.units = KG;
    sr.boxMeasurement.kg = 10;

    ship (&sr, &charges);

    return 0;
  }


  int ship (shipmentrecord *sr, unsigned long *charges) {

    switch (sr->units) {

    case CC:
      /* Compute charges by volume ... */
      break;

    case LBS:
      /* Compute charges by weight in pounds ... */
      break;

    case KG:
      /* Compute charges by mass in kilograms ... */
      break;

    }

    return 0;

  }