February 4, 2004 (Lecture 11)

February 4, 2004 (Lecture 11)

Reading

Coulouris, et al: 11.2
Chow and Johnson: 10.1

Mutual Exclusion: Voting

Last class we discussed the Ricarti and Agrawala approach to ensuring mutual exclusion. It was much like asking hosts to vote about who can enter the critical section and allowing access only upon unanimous consent. But is unanimous consent necessary? Can't we get away with a simple majority since two hosts can't concurrently win a majority of the votes.
In a simple form, it might operate similarly to a democratic election:
When entry into the critical section is desired:

Ask permission from all other participants via a multicast, broadcast, or collection of individual messages
Wait until more than 50% respond "OK"
Enter the critical section

When a request from another participant to enter the critical section is received:

If you haven't already voted, vote "OK."
Otherwise enqueue the request.

When a participant exits the critical section:

It sends RELEASE to those participants that voted for it.

When a participant receives RELEASE from the elected host:

It dequeues the next request (if any) and votes for it with an "OK."

Ties and Breaking Ties
So far, this approach is looking nice, but it does have a problem: ties. Imagine the case such that no processor gets a majority of the votes. Consider, for example, what would happen if each of three processors got 1/3 of the votes. Ouch!
Ties can, in fact, be broken at a somewhat high cost. If we use Lamport time with total ordering via hostid, no two messages will have concurrent time stamps. Messages that would otherwise be concurrent are ordered by hostid.
Recall that a host votes for a candidate as long as it has no outstanding votes. This becomes problematic if its vote turns out to be premature. This occurs if it votes for one candidate to later receive a request, bearing an earlier timestamp, from another candidate.
At this point, one of two things might be occuring. The system might be making progress -- the "wrong" host might have gotten more than 50% of the votes. If this is the case, we don't care. It might not be fair, but it is an edge case.
Another possibility is that no host has yet received a majority of the votes. If this is the case, it could be because of deadlock. It might be that each candidate got the same number of votes. This is the case that requires mitigation.
So, upon discovering that it voted for the "wrong" candidate, a host needs to determine which of these two situations is the case. It sends an INQUIRE message to the candidate for who it voted. If this candidate won the election, it can just ignore the INQUIRE and RELEASE normally when done. But, if it hasn't yet entered the critical section, it gives back the vote and signals this by sending back a RELINQUISH. Upon receipt of the RELINQUISH, the voter is free to vote for the preceding request.

Analysis and Looking Forward
This approach certain has some nice attributes. It does, in fact, guarantee mutual exclusion. And, it can allow a host to enter the critical section even if 1/2 of the hosts are down or unreachable.
But it has non-trivial costs. Nominally, it takes 3 messages per entry to the critical section (request, vote, release), about the same as a timestamp approach. And, in the event that votes arrive in exactly the wrong order, an INQUIRE-RELINQUISH pair of messages can occur for each host.
What we need is a way to reduce the number of hosts involved in making decisions. This way, fewer hosts need to vote, and fewer hosts need to reorganize thier votes in the event of a misvote. Next class, we'll discuss how to do this using voting districts.

Mutual Exclusion: Voting Districts
In order to address to reduce the number of messages required to win an election we are going to organize the participating systems into voting districts called coteries (pronounced, "koh-tarz" or "koh-tErz"), such that winning an election within a single district implies winning the election across all districts.

Coteries is a political term that suggests a closed, somewhat intimate, and conspiring collection of actors (persons, states, trade organizations, unions, &c), e.g. a "Boy's Club".

This can be accomplished by requiring that elections within any district be won by unanimous vote and then Gerrymandering each processor's district to ensure that all districts intersect. Since the subset of processors that are members of more than one district can't vote twice, they ensure that only one of the districts can gain a unanimous vote.

Gerrymandering is a term that was coined by Federalists in the Massachusetts election of 1812. Governor Elbridge Gerry, a Republican, won a very narrow victory over his Federalist rival in the election of 1810. In order to improve their party's chances in the election of 1812, he and his Republican conspirators in the legislator redrew the electoral districts in an attempt to concentrate much of the Federalist vote into very few districts, while creating narrow, but majority, Republican support in the others.
The resulting districts were very irregular in shape. One Federalist commented that one among the new districts looked like a salamander. Another among his cohorts corrected him and declared that it was, in fact, a "Gerrymander." The term Gerrymandering, used to describe the process of contriving political districts to affect the outcome of an election, was born.
Incidentally, it didn't work and the Republicans lost the election. He was subsequently appointed as Vice-President of the U.S. He served in that role for two years. Since that time both federal law and judge-made law have made Gerrymandering illegal.

The method of Gerrymandering disticts that we'll study was developed by Maekawa and published in 1985. Using this method, processor's are organized into a grid. Each processor's voting district contains all processors on the same row as the processor and all processors on the same column. That is to say that the voting district of a particular processor are all of those systems that form a perpendicular cross through the processor within the grid. Given N nodes, 2*SQRT(n) - 1 nodes will compose each voting district.
Using this approach, any pair of voting districts will intersect via at least one node, so two disticts cannot be one unanimously at the same time.

The voting district of processor 7
Here's what a node does, if it wants to enter the critical section:

Send a REQUEST to every member of its district
Wait until every member of its district votes YES
Enter the critical section
Upon exit from the CS, send RELEASE to each member of its district.

If a node gets a REQUEST, it does the following:

If it has already voted in an outstanding election (it voted, but hasn't received a corresponding RELEASE), enqueue the request.
Otherwise send YES

If a node gets a RELEASE:

Dequeue oldest request from its queue, if any. Send a YES vote to this node, if any.

As we saw with simple majority voting last class, this approach can deadlock if requests arrive in a different order at different voters. This can allow different voters within overlapping districts to vote for different candidates. In particular, it can allow for a "split" between the two voters that are the overlap between two districts.
Fortunately, we can use the same approach we discussed last class to recover from this situation if it becomes problematic:

A node records Lamport time w/total ordering before it sends a request. It sends this time with the request to all members of its district (the same time).
Each voter uses a priority queue based on the time of the request.
If a node receives a request with a time-stamp more older than the timestamp of a request for which it already voted, but for which it has not received a RELEASE, it attempts to cancel its vote. It does this by sending the candidate an INQUIRE.
If this node hasn't won the election, it forgets about our vote and sends us a RELINQUISH. Once we receive the RELINQUISEH, we vote for the older request and enqueue the candidate for which we originally voted.
If the candidate was already in the CS, no harm was done -- deadlock did not actually occur. When it goes out, we can vote for the other candidate. In this case, the processors may not have entered the CS in FIFO order, but that's okay -- deadlock didn't happen.

This approach requires about 3*(2*SQRT(N)-1) messages -- much nicer than 3*N messages. But it is not very fault tolerant, since a unanimous victory is required within a district. (Some failure can be tolerated, since failures outside of a district don't affect a node).