October 5, 2010 (Lecture 12)

Strings

We've made plenty of use of strings this semester, and we've talked a little bit about them. But, going into the test, I want to look at them systematically and for the record. Unlike Java, the C Language has only a very minimal, second-class implementation of strings. In C, a string is really just an array of characters. Consider the following example:

  #include <stdio.h>

  int main() {
    int index;
    char name[] = "George Washington";

    for (index=0; index < 6; index++) {
      printf ("%c\n", name[index]); /* Prints G, e, o, r, g, e; one letter per line */
    }
    return 0;
  }
  

One interesting aspect of strings is the way that C keeps track of where a string ends. Keep in mind that strings are represented as arrays, so, unlike Java, there is no size field. Instead, C marks the end of the string within the string itself: it adds a null character ('\0', sometimes written NUL, not to be confused with the NULL pointer) at the end of each string. For example, although the string "George" contains 6 letters, numbered 0 - 5, it actually occupies 7 characters. The "invisible" 7th character is the null character. The example below prints the numerical value of each character, including the 0 for the terminator:

  #include <stdio.h>

  int main() {
    int index;
    char name[] = "George";

    for (index=0; index < 7; index++) {
      printf ("%d\n", (int) name[index]); /* Prints 71, 101, 111, 114, 103, 101, 0; one number per line */
    }
    return 0;
  }
  

The null terminator can also be used to find the end of a string, as shown below:

  #include <stdio.h>

  int main() {
    char name[] = "George";
    char *cptr = name;

    while (*cptr) { /* stops when *cptr == '\0' (the terminator) */
      printf ("%c\n", *cptr); /* Prints G, e, o, r, g, e; one letter per line */
      cptr++;
    }
    return 0;
  }
  

The C Language has a ton of functions designed to make it easy to manipulate strings. You might want to take a look at the man pages associated with the following functions and give them a try:

Unions

There is another construct, the union. Its syntax is very, very, very similar to that of a struct. But, as it turns out, it is very different from a struct. Consider the following definition. As an aside, we could further contract this definition by typedef-ing an anonymous union, as we did for the struct definitions in the prior examples; the longer example that follows does just that:

  union measurement_t {
    unsigned int cubiccentimeters;
    float pounds;      /* C has no "unsigned float"; floats are always signed */
    float kilograms;
  };

  typedef union measurement_t measurement;
  

This union indicates that a measurement is ONE OF an unsigned int cubiccentimeters, a float pounds, or a float kilograms. As it turns out, the compiler need only allocate enough space to represent the largest one of them, not all of them. Consider the following example, which shows that the storage for the three members is overlapped:

  #include <stdio.h>

  typedef union {
    int cc;
    float kg;
    float lbs;
  } measurement;


  int main() {

    measurement m;

    /*
     * It makes no sense to initialize all three, we can only use one
     * at a time. But, I want to prove a point here -- they are all 0
     */
    m.cc = 0;
    m.kg = 0;
    m.lbs = 0;
  
    /*
     * Notice that, no matter how we print the union -- and there is
     * only "one thing", not three, it is 0
     */
    printf ("%d cc, %f lbs, %f kgs\n", m.cc, m.lbs, m.kg);

    /*
     * Notice, we're only setting the value of the union using one
     * of its presentations, as "float kg"
     */
    m.kg = 10;
  
    /*
     * Notice that both of the "float" presentations have the same
     * value, because kg and lbs occupy exactly the same bytes. The
     * "int" presentation shows those same bits reinterpreted as an
     * int: 1092616192 is 0x41200000, the IEEE 754 encoding of 10.0f
     * (assuming a 32-bit int and float, as on most machines).
     */

    /* 1092616192 cc, 10.000000 lbs, 10.000000 kgs */
    printf ("%d cc, %f lbs, %f kgs\n", m.cc, m.lbs, m.kg);


    return 0;
  }
  

So, why would one use a union? Often, unions are used when one wants to allow for more than one type of data, but does not want to open things up to all types of data, as might be the case if a "void pointer", a.k.a. "generic pointer", is used. We'll talk more about "void pointers" soon.

Consider the case of a shipping company which quotes some clients rates by volume and others by weight. They might structure code as follows. Notice the combination of the struct and union: the units field records which member of the union currently holds a meaningful value. As an aside, we could use an enum, an enumeration, in place of the #defined constants, but we haven't learned those yet, and they aren't always used in practice:

  #include <stdio.h>


  #define CC 1
  #define KG 2
  #define LBS 3


  typedef union {
    int cc;
    float kg;
    float lbs;
  } measurement;

  typedef struct {
    int units;
    measurement boxMeasurement;
  } shipmentrecord;


  int ship(shipmentrecord *, unsigned long *);


  int main() {

    shipmentrecord sr;
    unsigned long charges;

    sr.units = KG;
    sr.boxMeasurement.kg = 10;

    ship (&sr, &charges);

    return 0;
  }


  int ship (shipmentrecord *sr, unsigned long *charges) {

    switch (sr->units) {

    case CC:
      /* Compute charges by volume ... */
      break;

    case LBS:
      /* Compute charges by weight in pounds ... */
      break;

    case KG:
      /* Compute charges by mass in kilograms ... */
      break;

    default:
      /* Unknown units: the union's contents can't be interpreted */
      return -1;
    }

    return 0;

  }