Lecture 4 (September 4, 2008)

Lecture 4(Thursday, September 4, 2008)

Sharing or Fighting?

Cooperating processes (or threads) often share resources. Some of these resources can be concurrently used by any number of processes. Others can only be used by one process at a time.
The air in this room can be shared by everyone without coordination -- we don't have to coordinate our breathing. But the printer wouldn't be much use to any of us if all of us were to use it at the same time. I'm not sure exactly how that would work -- perhaps it would print all of our images superimposed, or perhaps a piece here-or-there from each of our jobs. But in any case, we would want to do something to coordinate our use of the resource. Another example might be an interesection -- unless we have some way of controlling our use of an intersection (stop signs? traffic lights?) -- smack!
The policy that defines how a resource should be shared is known as a sharing discipline. The most common sharing discipline is mutual exclusion, but many others are possible. When the mutual exclusion policy is used, the use of a resource by one process excludes its use by any other process.
How do we manipulate a resource from within a program? With code, of course. The portion of a program that manipulates a resource in a way that requires mutual exclusion, or some other type of protection, is known as a critical section. We also noted that in practice, even a single line of HLL code can create a critical section, because one line of HLL code may be translated into several lines of interruptable machine-language code.

Characteristics of a Solution

A solution to the mutual exclusion problem must satisfy three conditions:

Mutual Exclusion: Only one process can be in the critical section at a time -- otherwise what critical section?.
Progress: No process is forced to wait for an available resource -- otherwise very wasteful.
Bounded Waiting: No process can wait forever for a resource -- otherwise an easy solution: no one gets in.

Disabling Interrupts

It is actually possible to synchronize an arbitrary number of processes within an operating system without hardware support. But, this is a tangled, ticklish, and generally unfortunate businesss. In practice, in order to implement a general-purpose solution in the real world, we're going to need a small amount of hardware support to implement very basic concurrency control primitives, from which we can build up richer abstractions or simply correct, easily readable solutions to straight-forward problems.
One rudimentary form of synchronization supported by hardware is frequently used within the kernel: disabling interrupts. The kernel often guarantees that its critical sections are mutually exclusive simply by maintaining control over the execution. Most kernels disable interrupts around many critical regions ensuring that they can not be interrupted. Interrupts are reenabled immediately after the critical section.
Unfortunately, this approach only works in kernel mode and can result in a delayed service of interrupts -- or lost interrupts (if two of the same type occur before interrupts are reenabled). And it only works on uniprocesors, becuase diabling interrupts can be very time-consuming and messy if several processors are involved.
    Disable_interrupts();

    <<< critical section >>>

    Enable_interrupts();
    

Special Instructions

Another approach is to build into the hardware very simple instructions that can give us a limited guarantee of mutual exclusion in hardware. From these small guarantees, we can build more complex constructs in software. This is the approach that is, virtually universally, in use in modern hardware.

Test-and-Set

One such instruction that is commonly implemented in hardware is test-and-set. This instruction allows the atomic testing and setting of a value.
The semantics of the instruction are below. Please remember that it is atomic. It is not interruptable.
    TS (<mem loc>)
    {
       if (<memloc> == 0)
       {
           <mem loc> = 1;
           return 0;
       }
       else
       {
          return 1;
       }
    
Given the test-and-set instruction, we can build a simple synchronization primiative called the mutex or spin-lock. A process busy-waits until the test-and-set succeeds, at which time it can move into the critical section. When done, it can mark the critical section as available by setting the mutex's value back to 0 -- it is assumed that this operation is atomic.
    Acquire_mutex(<mutex>) /* Before entering critical section */
    {
        while(TS(<mutex>))
    }

    Release_mutex(<mutex>) /* After exiting critical section */
    {
        <mutex> = 0;
    }
    

Compare-and-Swap

Another common hardware-supplied instruction that is useful for building synchronization primatives is compare-and-swap. Much like test-and-set, it is atomic in hardware -- the pseudo-code below just illustrates the semantics, it is not meant to in any way suggest that the instruction is interruptable.
    CS (<mem loc>, <expected value>, <new value>)
    {
        if (<memloc> == <expected value>))
        {
           <mem loc> = <new value>;
           return 0;
        }
        else
        {
           return 1;
        }
    
You may also want to note that test-and-set is just a special case of compare-and-swap:
TS(x) == CS (x, 0, 1)

The pseudo-code below illustration the creation of a mutex (spin lock) using compare-and-swap:
    Acquire_mutex(<mutex>) /* Before entering critical section */
    {
       while (CS (<mutex>, 1, 0))
       ;
    }

    Release_mutex(<mutex>) /* After exiting critical section */
    {
       <mutex> =  1;
    }

    

Counting Semaphores

Now that we have hardware support, and a very basic primative, the mutex, we can build higher-level synchronization constructs that can make our life easier.
The first of these higher-level primatives that we'll discuss is a new type of variable called a semaphore. It is initially set to an integer value. After initialization, its value can only be affected by two operations:

P(x)
V(x)

P(x) was named from the Dutch word proberen, which means to test.
V(x) was named from the Dutch word verhogen, which means to increment.
The pseudo-code below illustrates the semantics of the two semaphore operations. This time the operations are made to be atomic outside of hardware using the hardware support that we discussed earlier -- but more on that soon.
    /* proberen - test *.
    P(sem)
    {
       while (sem <= 0)
       ;
       sem = sem - 1;
    }


    /* verhogen - to increment */
    V(sem)
    {
       sem = sem + 1;
    }
    
In order to ensure that the critical sections within the semaphores themselves remain protected, we make use of mutexes. In this way we again grow a smaller guarantee into a larger one:
    P (csem) {
       while (1)  {
          Acquire_mutex (csem.mutex);
          if (csem.value <= 0) {
             Release_mutex (csem.mutex);
             continue;
          } 
          else {
              csem.value = csem.value – 1;
              Release_mutex (csem.mutex);
              break;
          }
       }
    }


    V (csem) 
    {
        Acquire_mutex (csem.mutex);
        csem.value = csem.value + 1;
        Release_mutex (csem.mutex);
    }
    
But let's carefully consider our implementation of P(csem). If contention is high and/or the critical section is large, we could spend a great deal of time spinning. Many processes could occupy the CPU and do nothing -- other than waste cycles waiting for a process in the runnable queue to run and release the critical section. This busy-waiting makes already high resource contention worse.
But all is not lost. With the help of the OS, we can implement semaphores so that the calling process will block instead of spin in the P() operation and wait for the V() operation to "wake it up" making it runnable again.
The pseudo-code below shows the implementation of such a semaphore, called a blocking semaphore:
    P (csem) {
       while (1)  {
          Acquire_mutex (csem.mutex);
          if (csem.value <= 0) {
             insert_queue (getpid(), csem.queue);
             Release_mutex_and_block (csem.mutex); /* atomic: lost wake-up */
          } 
          else {
              csem.value = csem.value – 1;
              Release_mutex (csem.mutex);
              break;
          }
       }
    }


    V (csem) 
    {
        Acquire_mutex (csem.mutex);

        csem.value = csem.value + 1;
        dequeue_and_wakeup (csem.queue)

        Release_mutex (csem.mutex);
    }
    
Please notice that the P()ing process must atomically become unrunnable and release the mutex. This is becuase of the risk of a lost wakeup. Imagine the case where these were two different operations: release_mutex(xsem.mutex) and sleep(). If a context-switch would occur in between the release_mutex() and the sleep(), it would be possible for another process to perform a V() operation and attempt to dequeue_and_wakeup() the first process. Unfortunately, the first process isn't yet asleep, so it missed the wake-up -- instead, when it again runs, it immediately goes to sleep with no one left to wake it up.
Operating systems generally provide this support in the form of a sleep() system call that takes the mutex as a parameter. The kernel can then release the mutex and put the process to sleep in an environment free of interruptions (or otherwise protected).

Boolean Semaphores

In many cases, it isn't necessary to count resources -- there is only one. A special type of semaphore, called a boolean semaphore may be used for this purpose. Boolean semaphores may only have a value of 0 or 1. In most systems, boolean semaphores are just a special case of counting semaphores, also known as general semaphores.