February 8, 2007 (Lecture 11)


Reading

Coulouris, et al: 11.2
Chow and Johnson: 10.1
Chow and Johnson: 10.2

Timestamp Approach (Ricart and Agrawala)

The Lamport approach described above was improved by Ricart and Agrawala. Ricart and Agrawala observed that the REPLY and RELEASE messages could be combined. This is achieved by having the process that is currently within the critical section delay its REPLY until it exits the critical section. To do this, each process must queue incoming REQUESTs while it is within the critical section.
In many respects, this change converts this approach from a "global queue" approach to a "voting" approach. A node requests entry to the critical section and enters it as soon as it has received an OK (REPLY) vote from every other node.
The details of this approach follow:

Requestor Request

Build a message
Send message to all participants

Participants

If not in the CS and not trying to enter it, reply OK
If in the CS, enqueue the request
If not in the CS but waiting to enter it, and the requestor's timestamp is lower, reply OK (messages crossed; the requestor was first)
If not in the CS but waiting to enter it, and the requestor's timestamp is higher, enqueue the request (messages crossed; the participant was first)

Exit

On exit from CS, reply OK to everyone on queue (and dequeue each)

Requestor Entry

Once OK has been received from every other node, enter the CS
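
The rules above can be simulated in a few lines. The following is a minimal single-process sketch in Python, not a distributed implementation: the deque named net stands in for a reliable FIFO network, and the names Node, request, on_request, on_ok, release, and deliver are hypothetical. It assumes timestamps are (Lamport time, node ID) pairs, so that two requests made at the same Lamport time are ordered by node ID.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional, Tuple

Timestamp = Tuple[int, int]  # (Lamport time, node ID): ties broken by node ID (assumption)


@dataclass
class Node:
    node_id: int
    n: int                                        # total number of participants
    clock: int = 0                                # Lamport clock
    state: str = "RELEASED"                       # RELEASED, WANTED, or HELD (in the CS)
    request_ts: Optional[Timestamp] = None        # timestamp of our own outstanding request
    oks: set = field(default_factory=set)         # peers that have replied OK
    deferred: list = field(default_factory=list)  # requests queued until we exit the CS

    def request(self, net):
        """Requestor: build a timestamped REQUEST and send it to all other participants."""
        self.clock += 1
        self.state = "WANTED"
        self.request_ts = (self.clock, self.node_id)
        self.oks = set()
        for peer in range(self.n):
            if peer != self.node_id:
                net.append(("REQUEST", self.node_id, peer, self.request_ts))

    def on_request(self, sender, ts, net):
        """Participant: reply OK now, or enqueue the request until we exit the CS."""
        self.clock = max(self.clock, ts[0]) + 1
        in_cs = self.state == "HELD"
        we_were_first = self.state == "WANTED" and self.request_ts < ts
        if in_cs or we_were_first:
            self.deferred.append(sender)
        else:
            net.append(("OK", self.node_id, sender, None))

    def on_ok(self, sender):
        """Requestor: enter the CS once every other node has replied OK."""
        self.oks.add(sender)
        if self.state == "WANTED" and len(self.oks) == self.n - 1:
            self.state = "HELD"

    def release(self, net):
        """Exit: reply OK to everyone on the queue, dequeuing each."""
        self.state = "RELEASED"
        self.request_ts = None
        while self.deferred:
            net.append(("OK", self.node_id, self.deferred.pop(0), None))


def deliver(net, nodes):
    """Deliver queued messages in FIFO order until the network is quiet."""
    while net:
        kind, src, dst, ts = net.popleft()
        if kind == "REQUEST":
            nodes[dst].on_request(src, ts, net)
        else:
            nodes[dst].on_ok(src)


net = deque()
nodes = [Node(i, 3) for i in range(3)]
nodes[1].request(net)   # nodes 1 and 2 ask at the same Lamport time,
nodes[2].request(net)   # so the tie is broken in favor of node 1
deliver(net, nodes)
print(nodes[1].state, nodes[2].state)  # HELD WANTED
nodes[1].release(net)
deliver(net, nodes)
print(nodes[1].state, nodes[2].state)  # RELEASED HELD
```

Here the crossed REQUESTs are resolved exactly as in the participant rules: node 2 sees an older request and replies OK, while node 1 queues node 2's request and answers it only on exit.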

This approach requires 2*(n - 1) messages per entry into the critical section: one REQUEST to, and one OK from, every node except itself. That saves (n - 1) messages compared with the 3*(n - 1) required by Lamport's approach.
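
As a quick check of the arithmetic, the sketch below counts messages per critical-section entry, assuming Lamport's approach exchanges a REQUEST, a REPLY, and a RELEASE with each of the other n - 1 nodes (which is what the (n - 1) savings implies). The function names are illustrative only.

```python
def lamport_messages(n):
    """REQUEST, REPLY, and RELEASE exchanged with each of the other n - 1 nodes."""
    return 3 * (n - 1)


def ricart_agrawala_messages(n):
    """REQUEST to each other node plus one combined REPLY/OK back from each."""
    return 2 * (n - 1)


for n in (3, 10, 100):
    saved = lamport_messages(n) - ricart_agrawala_messages(n)
    print(n, lamport_messages(n), ricart_agrawala_messages(n), saved)
# 3 6 4 2
# 10 27 18 9
# 100 297 198 99
```
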
But it fails to address the more serious limitation -- fault tolerance. Even a single failure can disable the entire system. Both timestamp approaches require more messages than a centralized approach -- and have lower fault tolerance. The centralized approach provides one single point of failure (SPF). These timestamp approaches have N SPFs.
In truth, it is doubtful that we would ever want to use either approach. In practice, centralized coordinators and ring approaches are the workhorses. Centralized coordinators can be made more fault tolerant using coordinator election (coming soon).
But these timestamp approaches are the most distributed -- they involve every host in every decision. They also illustrate some important examples of global state, logical time, &c -- and so they are a valuable part of this (and any) distributed systems course.

Mutual Exclusion: Voting Districts

To reduce the number of messages required to win an election, we are going to organize the participating systems into voting districts called coteries (pronounced, "koh-tarz" or "koh-tErz"), such that winning an election within a single district implies winning the election across all districts.

Coterie is a political term that suggests a closed, somewhat intimate, and conspiring collection of actors (persons, states, trade organizations, unions, &c), e.g. a "boys' club".

This can be accomplished by requiring that elections within any district be won by unanimous vote and then Gerrymandering each processor's district to ensure that all districts intersect. Since the subset of processors that are members of more than one district can't vote twice, they ensure that only one of the districts can gain a unanimous vote.
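
The property the districts must have, that every pair of districts intersects, is easy to check mechanically. This hypothetical Python sketch (is_coterie and unanimous_winners are illustrative names, and the three-processor districts are an assumed example) checks the property and then brute-forces every way the voters could cast their single votes, confirming that at most one district is ever won unanimously.

```python
from itertools import combinations, product


def is_coterie(districts):
    """True if every pair of districts shares at least one member."""
    return all(districts[a] & districts[b] for a, b in combinations(districts, 2))


def unanimous_winners(districts, vote_of):
    """Candidates whose entire district voted for them (vote_of maps voter -> candidate)."""
    return [c for c, members in districts.items()
            if all(vote_of[v] == c for v in members)]


# Hypothetical example: three processors, each district is the owner plus one neighbor.
districts = {0: {0, 1}, 1: {1, 2}, 2: {2, 0}}
voters = sorted(set().union(*districts.values()))

print(is_coterie(districts))  # True

# Every voter casts exactly one vote; try every possible assignment of votes.
max_winners = max(
    len(unanimous_winners(districts, dict(zip(voters, votes))))
    for votes in product(districts, repeat=len(voters))
)
print(max_winners)  # 1
```

If two districts were disjoint, both could be won at once, which is exactly what the intersecting districts rule out.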

Gerrymandering is a term that was coined by Federalists in the Massachusetts election of 1812. Governor Elbridge Gerry, a Republican, won a very narrow victory over his Federalist rival in the election of 1810. In order to improve their party's chances in the election of 1812, he and his Republican conspirators in the legislature redrew the electoral districts in an attempt to concentrate much of the Federalist vote into very few districts, while creating narrow, but majority, Republican support in the others.
The resulting districts were very irregular in shape. One Federalist commented that one among the new districts looked like a salamander. Another among his cohorts corrected him and declared that it was, in fact, a "Gerrymander." The term Gerrymandering, used to describe the process of contriving political districts to affect the outcome of an election, was born.
Incidentally, Gerry himself lost the governorship in that election. He was subsequently elected Vice-President of the U.S. and served in that role until his death in 1814, less than two years later. Since that time, federal law and court decisions have outlawed some forms of Gerrymandering, notably drawing districts along racial lines.

The method of Gerrymandering districts that we'll study was developed by Maekawa and published in 1985. Using this method, processors are organized into a SQRT(N) x SQRT(N) grid. Each processor's voting district contains all processors in the same row as the processor and all processors in the same column. That is to say, the voting district of a particular processor consists of all of the systems that form a perpendicular cross through that processor within the grid. Given N nodes, each voting district contains 2*SQRT(N) - 1 nodes: SQRT(N) in the row plus SQRT(N) in the column, less one because the processor itself lies in both.
Using this approach, any pair of voting districts will intersect in at least one node (one processor's row crosses the other processor's column), so two districts cannot be won unanimously at the same time.

The voting district of processor 7
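
A sketch of the grid construction, assuming N is a perfect square and that processors are numbered 0 to N - 1 in row-major order on a 4 x 4 grid; the numbering in the original figure may differ. maekawa_districts is a hypothetical helper name.

```python
import math
from itertools import combinations


def maekawa_districts(n):
    """Map each processor to its row-plus-column district on a SQRT(N) x SQRT(N) grid."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes N is a perfect square")
    districts = {}
    for p in range(n):
        r, c = divmod(p, k)                      # row and column of processor p
        row = {r * k + j for j in range(k)}
        col = {i * k + c for i in range(k)}
        districts[p] = row | col                 # p itself is counted once
    return districts


d = maekawa_districts(16)
print(sorted(d[7]))                                    # [3, 4, 5, 6, 7, 11, 15]
print(all(len(s) == 2 * 4 - 1 for s in d.values()))    # True
print(all(d[a] & d[b] for a, b in combinations(d, 2))) # True
```
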
Here's what a node does, if it wants to enter the critical section:

Send a REQUEST to every member of its district
Wait until every member of its district votes YES
Enter the critical section
Upon exit from the CS, send RELEASE to each member of its district.

If a node gets a REQUEST, it does the following:

If it has already voted in an outstanding election (it voted, but hasn't received a corresponding RELEASE), enqueue the request.
Otherwise send YES

If a node gets a RELEASE:

Dequeue the oldest request from its queue, if any, and send a YES vote to that node.
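
The requester and voter rules above, without any deadlock handling, can be sketched as follows. This is a synchronous, single-process simulation under stated assumptions: messages are delivered instantly, only the districts of the two requesting candidates on a hypothetical 2 x 2 grid (numbered row-major) are modeled, and Voter, request, release, and in_cs are illustrative names.

```python
from collections import deque


class Voter:
    """One member of a voting district (no deadlock handling)."""

    def __init__(self):
        self.voted_for = None   # candidate holding our vote, not yet RELEASEd
        self.queue = deque()    # candidates waiting for our vote, oldest first

    def on_request(self, candidate):
        """Vote YES if we have no outstanding vote; otherwise enqueue the request."""
        if self.voted_for is None:
            self.voted_for = candidate
            return candidate
        self.queue.append(candidate)
        return None

    def on_release(self):
        """Our candidate left the CS: vote YES for the oldest queued request, if any."""
        self.voted_for = self.queue.popleft() if self.queue else None
        return self.voted_for


# Hypothetical 2 x 2 grid numbered row-major: 0 1 / 2 3.
districts = {0: {0, 1, 2}, 3: {1, 2, 3}}
voters = {v: Voter() for v in range(4)}
yes_votes = {c: set() for c in districts}


def record(voter_id, candidate):
    if candidate is not None:
        yes_votes[candidate].add(voter_id)


def request(c):
    """Send REQUEST to every member of c's district and collect the YES votes."""
    for v in districts[c]:
        record(v, voters[v].on_request(c))


def in_cs(c):
    """A candidate may enter the CS once its whole district has voted YES."""
    return yes_votes[c] == districts[c]


def release(c):
    """On exit from the CS, send RELEASE to each member of the district."""
    yes_votes[c].clear()
    for v in districts[c]:
        record(v, voters[v].on_release())


request(0)
request(3)
print(in_cs(0), in_cs(3))  # True False
release(0)
print(in_cs(3))  # True
```
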

As we saw with simple majority voting last class, this approach can deadlock if requests arrive in a different order at different voters. This can allow different voters within overlapping districts to vote for different candidates. In particular, the two voters that form the overlap between two districts can "split" their votes, one voting for each candidate, so that neither district is won unanimously.
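
The deadlock is easy to reproduce with a simple first-come voter. In this hypothetical sketch, voters 1 and 2 form the overlap between the districts of candidates 0 and 3 on an assumed 2 x 2 grid, and the arrivals table (also an assumption) has the two requests reach those voters in opposite orders.

```python
from collections import deque


class Voter:
    def __init__(self):
        self.voted_for = None   # candidate holding our vote
        self.queue = deque()    # candidates waiting for our vote

    def on_request(self, candidate):
        """Return True if we vote YES now, False if the request is queued."""
        if self.voted_for is None:
            self.voted_for = candidate
            return True
        self.queue.append(candidate)
        return False


# Districts of candidates 0 and 3 on a hypothetical 2 x 2 grid; voters 1 and 2 are the overlap.
districts = {0: {0, 1, 2}, 3: {1, 2, 3}}
voters = {v: Voter() for v in range(4)}
yes_votes = {c: set() for c in districts}

# Assumed arrival order of the two REQUESTs at each voter.
arrivals = {0: [0], 1: [0, 3], 2: [3, 0], 3: [3]}
for v, order in arrivals.items():
    for c in order:
        if voters[v].on_request(c):
            yes_votes[c].add(v)

for c in districts:
    print(c, sorted(yes_votes[c]), yes_votes[c] == districts[c])
# 0 [0, 1] False
# 3 [2, 3] False
```

Neither candidate can enter the CS, so neither will ever send the RELEASE the other is waiting for.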
Fortunately, we can use the same approach we discussed last class to recover from this situation if it becomes problematic:

A node records its Lamport time (with ties broken to give a total ordering) before it sends a request. It sends this same timestamp with the request to all members of its district.
Each voter uses a priority queue based on the time of the request.
If a node receives a request with a timestamp older than that of a request for which it has already voted, but for which it has not yet received a RELEASE, it attempts to cancel its vote. It does this by sending the candidate an INQUIRE.
If that candidate hasn't yet won the election, it forgets about our vote and sends us a RELINQUISH. Once we receive the RELINQUISH, we vote for the older request and enqueue the candidate for which we originally voted.
If the candidate was already in the CS, no harm was done -- deadlock did not actually occur. When it goes out, we can vote for the other candidate. In this case, the processors may not have entered the CS in FIFO order, but that's okay -- deadlock didn't happen.
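
A sketch of the recovery scheme above, following only the INQUIRE and RELINQUISH messages described in this lecture (Maekawa's published algorithm uses additional message types, such as FAILED, which are not modeled here). Assumptions: timestamps are (Lamport time, node ID) pairs, delivery is synchronous, a candidate already in the CS simply ignores an INQUIRE, and all names are hypothetical. It replays a crossed-request scenario in which voters 1 and 2 receive the two requests in opposite orders.

```python
import heapq


class Voter:
    """District member that can try to take back a vote (INQUIRE/RELINQUISH)."""

    def __init__(self):
        self.voted = None       # timestamp (lamport_time, node_id) we voted for
        self.waiting = []       # min-heap (priority queue) of queued request timestamps
        self.inquired = False   # already sent INQUIRE for the current vote?

    def on_request(self, ts):
        if self.voted is None:
            self.voted = ts
            return ("YES", ts)
        heapq.heappush(self.waiting, ts)
        if ts < self.voted and not self.inquired:
            # An older request arrived after we voted: ask our candidate to give the vote back.
            self.inquired = True
            return ("INQUIRE", self.voted)
        return None

    def on_relinquish(self):
        # Re-queue the candidate we originally voted for, then vote for the oldest request.
        heapq.heappush(self.waiting, self.voted)
        self.voted = heapq.heappop(self.waiting)
        self.inquired = False
        return ("YES", self.voted)

    def on_release(self):
        self.inquired = False
        self.voted = heapq.heappop(self.waiting) if self.waiting else None
        return ("YES", self.voted) if self.voted is not None else None


# Hypothetical setup: candidate 0's request is older than candidate 3's.
districts = {0: {0, 1, 2}, 3: {1, 2, 3}}
stamp = {0: (1, 0), 3: (1, 3)}
owner = {ts: c for c, ts in stamp.items()}
voters = {v: Voter() for v in range(4)}
yes = {c: set() for c in districts}
in_cs = set()


def handle(v, reply):
    """Deliver voter v's reply to the candidate it is addressed to."""
    if reply is None:
        return
    kind, ts = reply
    c = owner[ts]
    if kind == "YES":
        yes[c].add(v)
        if yes[c] == districts[c]:
            in_cs.add(c)
    elif c not in in_cs:
        # INQUIRE to a candidate that has not won yet: it forgets the vote and RELINQUISHes.
        yes[c].discard(v)
        handle(v, voters[v].on_relinquish())
    # An INQUIRE to a candidate already in the CS is ignored; its RELEASE will follow.


def release(c):
    in_cs.discard(c)
    yes[c].clear()
    for v in sorted(districts[c]):
        handle(v, voters[v].on_release())


# Voters 1 and 2 receive the two requests in opposite orders.
arrivals = {0: [0], 1: [0, 3], 2: [3, 0], 3: [3]}
for v, order in arrivals.items():
    for c in order:
        handle(v, voters[v].on_request(stamp[c]))
print(sorted(in_cs))  # [0]
release(0)
print(sorted(in_cs))  # [3]
```

Voter 2 takes its vote back from candidate 3, so the older request wins first and the other candidate follows once it exits.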

This approach requires about 3*(2*SQRT(N)-1) messages -- much nicer than 3*N messages. But it is not very fault tolerant, since a unanimous victory is required within a district. (Some failures can be tolerated, since failures outside of a district don't affect a node.)
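
The savings grow quickly with N. Here is a sketch of the arithmetic, assuming N is a perfect square so that each district has exactly 2*SQRT(N) - 1 members, and counting one REQUEST, one YES, and one RELEASE per member; the function names are illustrative.

```python
import math


def maekawa_messages(n):
    """About 3 messages (REQUEST, YES, RELEASE) per member of a 2*SQRT(N) - 1 district."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes N is a perfect square")
    return 3 * (2 * k - 1)


def all_nodes_messages(n):
    return 3 * n


for n in (16, 100, 10000):
    print(n, maekawa_messages(n), all_nodes_messages(n))
# 16 21 48
# 100 57 300
# 10000 597 30000
```
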