April 11, 2008 (Lecture 30)

April 11, 2008 (Lecture 30)

Reading

Chow and Johnson: 6.1 - 6.4
Coulouris, et al: 8.1 - 8.6

Coda and Disconnected Operation

Let's talk a little more about how AFS-2 (whole file) is organized. Typically files are collected into volumes which are nothing more than a collection of files grouped together for adminsitrative purposes. A server can hold one or more volumes. Read-only volumes may be replicated, so that a back-up copy is used if the primary copy becomes unreachable. Since writable volumes cannot be replicated write conflicts aren't a problem. Additionally, a client might have a cached copy of certain files.
Now, if a volume server should become unreachable, a client can continue to work out of its cache -- at least until it misses or tries to write a file back to the server. At this point, things break down, either because the client cannot get the files it needs, or because the client cannot inform the server of the changes, so the client cannot close the file -- or the server might violate a callback promise to other clients.
Now, let's hypothesize, for just a moment that we make sure that clients have all of the files that they need, and also allow them to close the file, even if they cannot contact the server to inform it of the change? How could we make this work?
Well, Coda has something called the hoard daemon whose job is to keep certain files in the client's cache, requesting them as necessary, just in case the client should later find itself unable to communicate with the server. The user can explicitly specify which files should be hoarded, or build a list by watching his/her activity using a tool called spy. This is the easy part (well, in some sense).
But what do we do about the delayed writes? When we can eventually tell the server about them, what do we do? Someone else may have found themselves in the same position. In orde to solve this problem, Coda keeps a version number with each file and another associated with each volume. This version number simply tracks the number of times a file has been modified. Before a client writes a file to the server, it checks the version of the file on the server. If that version number matches the version number of the file that the client read before the write, the client is safe and can send the new version of the file. The server can then increment the version number.
If the version number has increased, the client missed a callback promise. It cannot send its version -- it may be older than what is already there. This is called a conflict. Coda allows the user to send the file, but labels it as a conflict. These conflicts typically must be resolved by the user, before the file can be used. The user is basically asked, "Which one is the 'real' one?" The other one can, of course, be saved under a different name. There are some tools that can automatically merge multiple versions of certain files to automatically resolve conficts, but this is only possible for certin, well structured files -- and each one must be handed coded on an application-by-application basis.
Conflicts aren't typically a big problem, because shared files are typically not written by multiple users. As a result, the very small chance of a conflict isn't a bad price to pay for working through a failure.

Coda and Replication

Coda actually implements replication for writeable volumes. The collection of servers that main copies of a particular volume are known as a Volume Storage Group (VSG). Let's talk about how this is implemented. Coda actually doesn't maintain a single version number for each file and volume. It maintains a vector of version numbers known as the Coda Version Vector (CVV) for each file and Volume Version Vector (VVV) for each volume.
This version vector contains one entry for each replica. Each entry is the version number of the file on that replica. In the perfect case, the entry for each replica will be identical. The client actually requests the file in a two-step process.

It asks all replicas for their version number
It then asks the replica with the greatest version number for the file
If the servers don't agree about the files version, the client can direct the servers to update a client that is behind, or inform them of a conflict. CVVs are compared just like vector timestamps. A conflict exists if two CVVs are concurrent, because concurrent vectors indicate that each server involved has seen some changes to the file, but not all changes.

In the perfect case, when the client writes a file, it does it in a multi-step process:

The client sends the file to all servers, along with the original CVV.
Each server increments its entry in the file's CVV and ACKS the client.
The client merges the entries form all of the servers and sends the new CVV back to each server.
If a conflict is detected, the client can inform the servers, so that it can be resolved automatically, or flagged for mitigation by the user.

Given this process, let's consider what happens if one or more servers should fail. In this case, the client cannot contact the server, so it temporarily forgets about it. The collections of volume servers that the client can communicate with is known as the Available Volume Storage Group (AVSG). The AVSG is a subset of the VSG.
In the event that the AVSG is smaller than the VSG, the client does nothing special. It goes throguh the same process as before, but only involves those servers in the AVSG.
Eventually when the partitioned or failed server becomes accessible, it will be added back to the AVSG. At this point, it will be involved in reads and writes. When this happens, the client will begin to notice any writes it has missed, because its CVV will be behind the others in the group. This will be automatically fixed by the a write operation.
Coda clients also periodically poll the members of their VSG. If they find that hosts have appeared that are not currently in their AVSG, they add them. When they add a server in the VSG back to the AVSG, they must compare the VVV's. If the new server's VVV does not match the client's copy of the VVV, thier is a conflict. To force a resolution of this conflict, the client drops all callbacks in the volume. This is because the server had updates while it was disconnected, but the client (because it couldn't talk to the server) missed the callbacks.
Now, let's assume that the network is partitioned. Let's say that half of the network is accessible to one client and the other half to the other client. If these clients play with different files, everything works as it did above. But if they play with the same files, a write-write conflict will occur. The servers in each partition will update their own version numbers, but not the other. For example, we could see the following:
  Initial:     <1,1,1,1>
               <1,1,1,1>
               <1,1,1,1>
               <1,1,1,1>

  --------- Partition 1/2 and 3/4 ----------
  Write 1/2:   <2,2,1,1>
               <2,2,1,1>

  Write 3/4:   <1,1,2,2>
               <1,1,2,2>


  --------- Partition repaired ----------
  Read (ouch!) <1,1,2,2>
               <1,1,2,2>
               <2,2,1,1>
               <2,2,1,1>
  
The next time a client does a read (or a write), it will detect the inconsistency. This inconsistency cannot be resolved automatically and must be repaired by the user. Coda simply flags it and requests the user's involvement before permitting a subsequent access.
Notice that replication does not affect the model we discussed before -- Coda's disconnected mode is not affected. The version vectors work just like the individual version numbers in this respect.
Coda and Weakly Connected Mode
Coda operates under the assumption that servers are busy and clients have plenty of free time. This is the reason that replication and conflict detection are client-driven in Coda. But this assumption does not always hold. It would be prohibitively time consuming to implement client-based replication over a modem line. It might also be prohibitively expensive over a low-bandwith wireless connection.
Coda handles this situation with what is known as weakly connected mode. If the Coda client finds itself with a limited connection to the servers, it picks one of the servers and sends it the update. The server then propagates the change to the other servers, if possible, and the version vectors are updated on the servers, as appropriate. Basically the update generates a server-server version conflict and triggers the server to resolve it.