February 6, 2008 (Lecture 11)

February 6, 2008 (Lecture 11)

Reading

Coulouris, et al: 11.2
Chow and Johnson: 10.1
Chow and Johnson: 10.2

Timestamp Approach (Ricarti and Agrawala)

The Lamport approach described above was improved by Ricarti and Agrawala. Ricarti and Agrawala observed that the REPLY and RELEASE messages could be combined. This is achieved by having the process that is currently within the critical section delay its REPLY until it exists the critical section. In order to do this, each process must queue REQUESTs while within the critical section.
In many respect, this change converts this approach from a "global queue" approach to a "voting" approach. A node requests entry to the critical section and enters the critical section as soon as it has received an OK (REPLY) vote from every other node.
The details of this approach follow:

Requestor Request

Build a message
Send message to all participants

Paritipants

If not in CS and don't want in, reply OK
If in CS, enqueue request
If not in CS, but want into the CS, and the requestor's time is lower, reply OK (messages crossed, requestor was first)
If not in CS, but want into the CS, and the requestor's time is greater, enqueue request (messages crossed, participant was first)

Exit

On exit from CS, reply OK to everyone on queue (and dequeue each)

Requestor Entry

Once received OK from everyone, enter CS

This approach requires 2*(n - 1) messages, that is one message to and from everyone except self. This is an (n - 1) improvement over Lamport's approach.
But it fails to address the more serious limitation -- fault tolerance. Even a single failure can disable the entire system. Both timestamp approaches require more messages than a centralized approach -- and have lower fault tolerance. The centralized approach provides one single point of failure (SPF). These timestamp approaches have N SPFs.
In truth, it is doubful that we would every want to use either approach. In practice, centralized coordinators and ring approaches are the workhorses. Centralized coordinators can be made more fault tolerant using coordinator election (comming soon).
But these timestamp approaches are the most distributed -- they involve every host in every decision. They also illustrate some important examples of global state, logical time, &c -- and so they are a valuable part of this (and any) distributed systems course.

Mutual Exclusion: Voting

Last class we discussed the Ricarti and Agrawala approach to ensuring mutual exclusion. It was much like asking hosts to vote about who can enter the critical section and allowing access only upon unanimous consent. But is unanimous consent necessary? Can't we get away with a simple majority since two hosts can't concurrently win a majority of the votes.
In a simple form, it might operate similarly to a democratic election:
When entry into the critical section is desired:

Ask permission from all other participants via a multicast, broadcast, or collection of individual messages
Wait until more than 50% respond "OK"
Enter the critical section

When a request from another participant to enter the critical section is received:

If you haven't already voted, vote "OK."
Otherwise enqueue the request.

When a participant exits the critical section:

It sends RELEASE to those participants that voted for it.

When a participant receives RELEASE from the elected host:

It dequeues the next request (if any) and votes for it with an "OK."

Ties and Breaking Ties
So far, this approach is looking nice, but it does have a problem: ties. Imagine the case such that no processor gets a majority of the votes. Consider, for example, what would happen if each of three processors got 1/3 of the votes. Ouch!
Ties can, in fact, be broken at a somewhat high cost. If we use Lamport time with total ordering via hostid, no two messages will have concurrent time stamps. Messages that would otherwise be concurrent are ordered by hostid.
Recall that a host votes for a candidate as long as it has no outstanding votes. This becomes problematic if its vote turns out to be premature. This occurs if it votes for one candidate to later receive a request, bearing an earlier timestamp, from another candidate.
At this point, one of two things might be occuring. The system might be making progress -- the "wrong" host might have gotten more than 50% of the votes. If this is the case, we don't care. It might not be fair, but it is an edge case.
Another possibility is that no host has yet received a majority of the votes. If this is the case, it could be because of deadlock. It might be that each candidate got the same number of votes. This is the case that requires mitigation.
So, upon discovering that it voted for the "wrong" candidate, a host needs to determine which of these two situations is the case. It sends an INQUIRE message to the candidate for who it voted. If this candidate won the election, it can just ignore the INQUIRE and RELEASE normally when done. But, if it hasn't yet entered the critical section, it gives back the vote and signals this by sending back a RELINQUISH. Upon receipt of the RELINQUISH, the voter is free to vote for the preceding request.

Analysis and Looking Forward
This approach certain has some nice attributes. It does, in fact, guarantee mutual exclusion. And, it can allow a host to enter the critical section even if 1/2 of the hosts are down or unreachable.
But it has non-trivial costs. Nominally, it takes 3 messages per entry to the critical section (request, vote, release), about the same as a timestamp approach. And, in the event that votes arrive in exactly the wrong order, an INQUIRE-RELINQUISH pair of messages can occur for each host.
What we need is a way to reduce the number of hosts involved in making decisions. This way, fewer hosts need to vote, and fewer hosts need to reorganize thier votes in the event of a misvote.