
Lecture 7 (January 31, 2000)

Reading

Chapter 7

Semaphores

Semaphores For Mutual Exclusion

The example below illustrates one of the simplest uses of semaphores. It shows the use of semaphores to protect a critical section. In this case the type of semaphore used is a boolean semaphore. Boolean semaphores can hold values of 0 or 1.

In this case, the semaphore is initialized to a value of 1 and is reduced to 0 by the P() operation when a process enters the critical section. It is incremented back to 1 by the V() operation on exit from the critical section. The P() operation allows a process to atomically enter the critical section and decrement the value of the semaphore to 0. In other words, the test of the semaphore's value and the decrement of the semaphore (on entrance) are atomic.

Semaphore x = 1;

while (1)
{
  P(x);

  << critical section >>

  V(x);

  << Remainder section >>
}
Semantics of Semaphore Operations

The semantics of the semaphore operations are shown below. Please remember that this is intended to describe the behavior of the operations, not their implementation.

It is important to pay attention to the required atomicity within these operations.

The V() operation cannot be interrupted during the increment, even though the increment may take several machine-level instructions to implement.

Furthermore, the test and decrement of x within the P() operation must occur atomically, despite the while loop. But the entire operation cannot execute atomically, because that would require the process to occupy the CPU until another thread performed a V() -- which could never happen, except on a multiprocessor. The only safe place to context switch within the P() operation is after the test of x within the while loop, but only in the case that the loop will repeat. Otherwise the decrement of x must occur before any context switch, to preserve the atomicity of the test and decrement of x.

P(x):
      while (x <= 0)
        ;              // busy-wait until the semaphore is positive
      x = x - 1;       // the test above and this decrement are atomic


V(x):
      x = x + 1;       // increment; a blocked P() may now proceed
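As a concrete illustration, here is a sketch of the same mutual-exclusion pattern in Python, whose threading.Semaphore maps acquire() onto P() and release() onto V(). The worker function and iteration counts are invented for this example:

```python
import threading

x = threading.Semaphore(1)   # binary semaphore, initialized to 1
count = 0                    # shared variable updated in the critical section

def worker():
    global count
    for _ in range(10000):
        x.acquire()          # P(x): block until the semaphore is positive, then decrement
        count += 1           # << critical section >>
        x.release()          # V(x): increment, possibly waking a blocked thread

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # 40000 -- no increments are lost
```

Without the semaphore, concurrent read-modify-write sequences could interleave and lose updates; with it, each increment happens entirely inside the critical section.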

Bounded Buffering

Another classic problem is the bounded buffer problem. In this case we have a producer and a consumer that are cooperating through a shared buffer. The buffer temporarily stores the output of the producer until removed by the consumer. In the event that the buffer is empty, the consumer must pause. In the event that the buffer is full, the producer must pause. Both must cooperate in accessing the shared resource to ensure that it remains consistent.

A semaphore solution to the bounded buffering problem uses counting semaphores as well as the boolean semaphores we saw earlier. Counting semaphores are a more general semaphore than binary semaphores. The P() operation always decrements the value of the semaphore, and the V() operation always increments the value of the semaphore. The value of the semaphore can be any integer, not just 0 or 1. In truth, most implementations provide only counting semaphores. Binary semaphores are obtained through the discipline of the programmer. Even if binary semaphores are provided separately, the result is undefined if P() is called when the value is already 0, or if V() is called when the value is already 1.

The example below shows a general solution to the bounded buffer problem using semaphores. Notice the use of counting semaphores to keep track of the state of the buffer. Two semaphores are used -- one to count the empty buckets and another to count the full buckets. The producer uses empty buckets (decreasing the semaphore value with P()) and increases the number of full buckets (increasing the semaphore value with V()). It blocks on the P() operation if no empty buckets are available in the buffer. The consumer works in a symmetric fashion.

Binary semaphores are used to protect the critical sections within the code -- those sections where both the producer and the consumer manipulate the same data structure. This is necessary, because it is possible for the producer and consumer to operate concurrently, if there are both empty and full buckets within the buffer.

producer:
           while(1)
           {
             << produce item >>
             
             P(empty);
             P(mutex);

             << Critical section: Put item in buffer >>
             
             V(mutex);
             V(full);
           }


consumer:
           while(1)
           {
             P(full);
             P(mutex);

             << Critical section: Remove item in buffer >>
             
             V(mutex);
             V(empty);
           }
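The same structure can be sketched in Python with threading.Semaphore. The buffer size N, the item count, and the use of a deque are choices made for this example, not part of the pseudocode above:

```python
import threading
from collections import deque

N = 5                            # buffer capacity (chosen for the example)
buffer = deque()
empty = threading.Semaphore(N)   # counts empty buckets
full = threading.Semaphore(0)    # counts full buckets
mutex = threading.Semaphore(1)   # binary semaphore protecting the buffer

ITEMS = 100
consumed = []

def producer():
    for i in range(ITEMS):
        empty.acquire()          # P(empty): wait for an empty bucket
        mutex.acquire()          # P(mutex)
        buffer.append(i)         # << Critical section: Put item in buffer >>
        mutex.release()          # V(mutex)
        full.release()           # V(full): one more full bucket

def consumer():
    for _ in range(ITEMS):
        full.acquire()           # P(full): wait for a full bucket
        mutex.acquire()          # P(mutex)
        consumed.append(buffer.popleft())  # << Critical section: Remove item >>
        mutex.release()          # V(mutex)
        empty.release()          # V(empty): one more empty bucket

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()
print(consumed == list(range(ITEMS)))  # True -- all items arrive, in order
```

With a single producer and a single consumer the items arrive in order; the counting semaphores ensure the buffer never exceeds N items and the consumer never reads an empty buffer.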

Bounded wait?

The implementation of semaphores that we've discussed so far doesn't provide a bounded wait. If two threads are competing for the semaphore in their while loop, which one wins? There is no guarantee that any thread will eventually win.

This isn't a correct solution -- although problems are rare and require very high contention.

A correct solution needs to have some fairness requirement. The P() operation should add blocked processes to some type of queue. The V() operation should then free a process from this queue. It isn't necessary that this queue be strictly FIFO, but this is the easiest way to ensure a bounded waiting time. It should be impossible for any thread to starve.

The Readers and Writers Problem

The Readers and Writers problem is much like a version of the bounded buffers problem -- with some more restrictions. We now assume two kinds of threads, readers and writers. Readers can inspect items in the buffer, but cannot change their value. Writers can both read the values and change them. The problem allows any number of concurrent reader threads, but the writer thread must have exclusive access to the buffer.

One note is that we should always be careful to initialize semaphores. Uninitialized semaphores cause programs to react unpredictably in much the same way as uninitialized variables -- except perhaps even more unpredictably.

In this case, we will use binary semaphores like a mutex. Notice that one is acquired and released inside of the writer to ensure that only one writer thread can be active at the same time. Notice also that another binary mutex is used within the reader to prevent multiple readers from changing the rd_count variable at the same time.

A shared counter, rd_count, is used to keep track of the number of readers. Only when the number of readers drops to zero can any writes occur -- otherwise there is an outstanding P() on the writing semaphore. This outstanding P() is matched with a V() operation when the reader thread count is reduced to 0.

Shared:
        semaphore mutex  = 1;
        semaphore writing = 1;
        int rd_count = 0;

Writer:
        while (1)
        {
          P(writing);
            << Perform write >> 
          V(writing);
        }

Reader:
        while (1)
        {
          P(mutex);
          rd_count++;
          if (1 == rd_count)
          {
            P(writing);
          }
          V(mutex);

          << Perform read >>
 
          P(mutex);
          rd_count--;
          if (0 == rd_count)
          {
            V(writing);
          }
          V(mutex);
        }
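The reader/writer pseudocode can be sketched in Python as follows. The thread counts and the shared data variable are invented for illustration; each thread performs one pass rather than looping forever so the program terminates:

```python
import threading

mutex = threading.Semaphore(1)    # protects rd_count
writing = threading.Semaphore(1)  # held by a writer, or by the group of readers
rd_count = 0                      # number of active readers
data = 0                          # the shared "buffer" (a single value here)
reads = []

def writer():
    global data
    writing.acquire()             # P(writing): exclusive access
    data += 1                     # << Perform write >>
    writing.release()             # V(writing)

def reader():
    global rd_count
    mutex.acquire()
    rd_count += 1
    if rd_count == 1:
        writing.acquire()         # first reader locks out writers
    mutex.release()

    reads.append(data)            # << Perform read >> (may overlap other readers)

    mutex.acquire()
    rd_count -= 1
    if rd_count == 0:
        writing.release()         # last reader admits writers again
    mutex.release()

threads = [threading.Thread(target=writer) for _ in range(3)] + \
          [threading.Thread(target=reader) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(data, rd_count)  # 3 0
```

Every read observes some consistent value between 0 and 3, depending on how the scheduler interleaves readers and writers.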

Starvation

The solution above allows for the starvation of writers by readers. A similar solution could be implemented that allows the starvation of readers by writers. Consider the illustrations below.


Above: Starved Readers



Above: Starved Writers

Deadlock

Consider the following case:

Thread 1        Thread 2
P(x);           P(y);
P(y);           P(x);

In this case both threads are in some sense correct, but neither thread can execute until the other does. This situation is called deadlock. Semaphores by themselves cannot prevent deadlock.
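One standard way to avoid this particular deadlock -- a common technique, though not part of the lecture above -- is to impose a single global acquisition order, so that every thread takes the semaphores in the same sequence. A Python sketch:

```python
import threading

x = threading.Semaphore(1)
y = threading.Semaphore(1)

# If Thread 1 did P(x); P(y) while Thread 2 did P(y); P(x), each could grab
# its first semaphore and then block forever on the other's.  With a global
# order (always x before y), the cycle cannot form.

done = []

def task(name):
    x.acquire()      # both threads acquire in the same order: x, then y
    y.acquire()
    done.append(name)
    y.release()
    x.release()

t1 = threading.Thread(target=task, args=(1,))
t2 = threading.Thread(target=task, args=(2,))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # [1, 2] -- both threads complete; no deadlock
```

Lock ordering only prevents this style of deadlock; as the lecture notes, semaphores by themselves offer no such guarantee, which motivates higher-level constructs.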

Student Question: Couldn't we just have one super-thread that could play policeman and prevent this while ensuring mutual exclusion?

Answer: Yes. This is not a bad model. If one thread knew all of the constraints and knew the policy, it could keep the bookkeeping and enforce the policy.

To address this type of higher level problem, we need yet a higher level construct. We'll talk about one such construct next -- the monitor.

Student Question: Are we leading up from the least to the best?

Answer: No. There are trade-offs. The higher level constructs offer increased overhead, decreased flexibility, are harder to construct, and are less frequently available.

Monitors

The idea is to package all of the rules and policies and to encapsulate them into a single piece of code. If the monitor says a thread should block, it blocks. If the monitor says it should go, it goes. The threads don't synchronize themselves using semaphores or other primitives. They are left to the control of the monitor.

The original definition of monitors required support from the compiler. The compiler could then statically check to make sure that the constraints of the monitor were enforced. Virtually no current compilers support monitors. Monitors are now typically used via libraries. As a consequence, constraints are now verified at run-time, not compile time.

Monitors were developed in the 1970's. We'll discuss three different flavors: Brinch Hansen ('73), Hoare ('74), and Mesa ('7x). Please note that "Brinch Hansen" is a last name and that "Mesa" was a programming language developed at Xerox -- among their unmarketed achievements.

MONITOR name
{
  // shared variable declarations

  entry procedureA (args)
  {
    // code
  }

  entry procedureB (args)
  {
    // code
  }

  procedureC (args)
  {
    // code
  }

  // initialization code called only once -- when the monitor is created
}

Notes:

Condition Variables

Only one process can be active within the monitor at any time. This is achieved using a special kind of variable called a condition variable. Unlike semaphores, condition variables do not hold a value. They simply provide a mechanism for pairing wait and signal operations.

The notation we'll use doesn't have ()'s. This is because we are following the original notation developed by programming language theorists.

condition x;

x.signal
x.wait

The two operations defined on a condition variable are wait and signal. They are very similar to the P() and V() operations on semaphores, which your book calls wait and signal respectively. x.wait blocks the calling process and x.signal wakes up a blocked process. The big difference is that condition variables hold no value and have no memory -- they can't count or toggle.

Although we say that condition variables have no state, behind the scenes there is a queue that is used to ensure that at most one process is woken up by a signal, and that no waiting process can be starved. If a signal occurs and no process is waiting, the signal is lost.

Student Question: If several processes are waiting on the same condition variable, which one or how many are awakened?

Answer: There can be 1, 10, or 10,000 waiting. Which one is awakened? There are no guarantees, except that there is bounded wait. Most implementations use a simple FIFO queue -- but no guarantees.

How are condition variables different from semaphores?

A condition variable inside the monitor ensures that only one process is active inside of the monitor. A process checks to see if another is inside of the monitor. If not, it goes in. Otherwise it performs a wait on the condition variable. Eventually, an exiting process will signal it, allowing it to enter the monitor. Testing the monitor's state and waiting on the condition variable are ensured to be atomic by the monitor construct.

Question: Can condition variables occur only inside of monitors?

Answer: Traditionally, yes. The reason is that, unlike a semaphore, the wait operation on a condition variable always waits, so the test must happen first. If a condition variable were to occur outside of a monitor, there would need to be a mechanism for ensuring that these two operations are atomic. Otherwise a signal could occur between the test and the wait. This is called a lost signal, because the waiting process will never receive it: the process blocks after the signal was delivered, even though the condition it previously tested had already changed.
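Python's threading.Condition pairs a condition variable with a lock, which makes the test and the wait atomic in exactly this sense -- a signal cannot slip in between them. A sketch (the ready flag and the two thread roles are invented for the example):

```python
import threading

cond = threading.Condition()   # a condition variable paired with a lock
ready = False                  # the shared state the waiter tests
seen = []

def waiter():
    with cond:                 # hold the lock: test + wait are atomic
        while not ready:       # re-test after waking (guards against spurious wakeups)
            cond.wait()        # x.wait: releases the lock while blocked
        seen.append(ready)

def signaler():
    global ready
    with cond:
        ready = True           # change the state the waiter is testing
        cond.notify()          # x.signal: wake one waiting thread

w = threading.Thread(target=waiter)
w.start()
threading.Thread(target=signaler).start()
w.join()
print(seen)  # [True]
```

Because the waiter holds the lock from the test of ready through the call to wait(), the signaler cannot set ready and notify in that gap -- the lost-signal race described above cannot occur.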

Back To Monitors

What does it mean for a process to "be active within a monitor?" There are three different answers to this question. Consider the case:

P1: x.wait
P2: x.signal

Which process should be active in the monitor after the signal?

Hoare Monitors

Hoare was a mathematician and wanted to be able to prove the correctness of programs. He selected semantics that simplified this type of proof. As a consequence, Hoare answered this question like this: P2 blocks and P1 becomes active. This is good for proofs, because the conditions that were true at the time of the signal remain true when the signalled process becomes active. P2 resumes later. This is achieved by placing P2 into a signal queue. When a process exits the monitor, the exiting process will signal the signal queue, if there are any blocked processes. If not, it will signal the entry queue. Your book calls this the "first approach."

Question: Does that imply that the signal queue is a stack?

Answer: No. If the signal queue were a stack, bounded wait would be violated. Bounded wait must be guaranteed, so the simple implementation would be a FIFO queue.

Another implementation is the Mesa monitor. The Mesa monitor originated in the Mesa programming language. It does exactly the reverse when a signal occurs. If a blocked process within the monitor is signaled, it moves from a queue of blocked processes back to the entry queue. It is then readmitted into the monitor later. The signalling process continues in the monitor until it exits the monitor.

Student Question: What happens if P2 waits?

Answer: If P2 exits or waits, another process is admitted from the entry queue into the monitor.

Student Question: What is the purpose of a signal, if the signalling process keeps running?

Answer: It moves the blocked process into the entry queue so it can reenter the monitor.

The last flavor of monitor that we'll discuss is the Brinch Hansen monitor. It is the most restrictive of the three: it only allows a process or thread to signal upon exit from the monitor. At that point, the signaled process or thread can run; the signalling process has already left the monitor.