April 1, 2005 (Lecture 30)

Introduction to Replication, Migration, and Caching

The performance of DSMs can be drastically improved by paying careful attention to the location of information. Since DSM systems are typically implemented over some type of commodity network, they often conform to a Non-Uniform Memory Access (NUMA) model. This means that the way a processor accesses the memory in one portion of its address space may be different than the way it accesses another portion of its address space. Some of the elements may be nearby and readily accessible, whereas others might be farther away, or otherwise not as readily accessible.

By keeping needed items close by (in terms of access time), the performance of DSMs can be drastically improved. To do this, DSMs often make use of caches and other replicas, as well as migration. Migration is the movement of an element of memory from one host to another.

Snoopy Caches: Not So Simple Any More

Multiprocessor systems usually have a big "leg up" over distributed systems, because many are designed such that the processors operate over a common bus -- not a collection of networks. This means that all memory accesses cross this common bus. This allows for the design of snoopy caches. Snoopy caches listen to the bus. Every time they hear a write operation for something in their cache, they can automatically invalidate the line in their cache. For this reason, multiprocessor systems with a common bus can prevent stale cache data at very low cost.
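As a rough illustration, here is a minimal sketch of the snooping idea in Python (the SnoopyCache and Bus classes and all of their methods are invented for this example, not a real cache-coherence protocol): every cache attaches itself to the shared bus, and whenever it hears a write to an address it holds, it drops that line.

```python
# Toy model of snooping on a shared bus; class and method names are invented.

class SnoopyCache:
    """One processor's cache, listening ("snooping") on the shared bus."""

    def __init__(self, bus):
        self.lines = {}        # address -> cached value
        bus.attach(self)       # register to hear all bus traffic

    def read(self, addr, memory):
        if addr not in self.lines:
            self.lines[addr] = memory[addr]   # miss: fetch over the bus
        return self.lines[addr]

    def snoop(self, op, addr):
        if op == "write" and addr in self.lines:
            del self.lines[addr]              # invalidate the now-stale line


class Bus:
    """The common bus that every memory access crosses."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, addr, value, memory):
        memory[addr] = value
        for cache in self.caches:       # every cache hears the write...
            cache.snoop("write", addr)  # ...and drops its now-stale copy


memory = {0x10: 7}
bus = Bus()
a, b = SnoopyCache(bus), SnoopyCache(bus)
a.read(0x10, memory); b.read(0x10, memory)   # both caches now hold 0x10
bus.write(0x10, 8, memory)
assert 0x10 not in b.lines                   # stale copy invalidated for free
assert a.read(0x10, memory) == 8             # next read re-fetches fresh data
```

The whole trick is that the bus is a single, shared medium: invalidation comes for free just by listening.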

This approach is problematic for a distributed system at best -- and not workable in practice. It could only work if there were specialized support within the network infrastructure that allowed for this type of snooping. It could be done if all of the processors were, for example, on the same ethernet. But, if hosts on many networks compose the distributed system, this becomes impossible. It could also be done using a multicast or broadcast, if supported by the network or software, but this would need to be reliable -- and therefore expensive. As a whole, snoopy caches are rarely used in DSM systems. Instead, we must find other ways of keeping caches from becoming stale.

Write-Update vs. Write-Invalidate

If we choose to allow hosts to keep local copies of data that they don't own, we need to ensure that this data does not become stale. There are two ways of achieving this. The first technique is to ensure that the data is always up-to-date; this is known as a write-update approach. The other approach is to eliminate stale data; this is known as write-invalidate.

Write-update policies, while nice, are quite expensive. There are several problems. The first is determining exactly what is changing. The second is determining who has it. And the third is sending the updates.

In order to properly understand the difficulty of knowing what has changed, we need to ask ourselves, "What is the granularity of sharing?" and "What does the operating system track?". Let's assume that the unit of sharing is a whole page. Now, if only a few bytes within a page change, we don't want to send the whole page -- we want to update only those few bytes. Now, think back to operating systems -- does that sound fun? We've got to make each page read-only. Enter the fault handler, make a copy, make the change, take a "diff", format it as a message, and send it out as an update -- ouch! This could happen with every write -- think about initializing an array. Consider the alternative -- sending the whole page each time a single byte changes -- also ouch!
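To make the fault-handler-and-diff dance concrete, here is a hedged sketch in Python (the WriteUpdatePage class, its method names, and the message format are all invented; a real system would use page protection bits and a real fault handler): keep a "twin" copy of the page at the first write, then diff the modified page against the twin to build a small update message.

```python
# Sketch of diff-based write-update for one page; the class, method names,
# and message format are invented for illustration.

PAGE_SIZE = 4096

class WriteUpdatePage:
    def __init__(self, data: bytearray):
        self.data = data
        self.twin = None              # pristine copy, made at the first write

    def on_write_fault(self):
        # The page starts out read-only, so the first write traps here.
        self.twin = bytes(self.data)  # save a copy before allowing the change
        # ...a real system would now mark the page writable and resume.

    def update_message(self):
        # Diff the modified page against its twin, byte by byte, and send
        # only the changed bytes -- not the whole page.
        changed = [(i, self.data[i])
                   for i in range(len(self.data))
                   if self.data[i] != self.twin[i]]
        self.twin = None              # page goes back to read-only
        return {"changed": changed}


page = WriteUpdatePage(bytearray(PAGE_SIZE))
page.on_write_fault()                 # first write traps; twin is saved
page.data[3] = 0x41                   # the actual write
print(page.update_message())          # {'changed': [(3, 65)]} -- a few bytes
```

Notice that every one of those steps -- trap, copy, diff, message -- is paid on the write path, which is exactly why this hurts for something like initializing an array.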

Compared to the first problem, I think we can call the second and third problems "boring accounting". We need to track our replicas and send them the messages. We assume that hosts that have received pages still have them, until they tell us otherwise.

If we take a write-invalidate approach, we can punt on the first problem, leaving us with only the "accounting". Instead of figuring out exactly what has changed, we can just invalidate the whole page -- without wasting the time to resend the whole thing. If the host needs it again, it'll ask. In general, this is a much, much better approach: the cost is very low, since an invalidate message is small, and the savings are potentially big -- a whole page with each byte written. The locality benefit of holding a replica also shrinks under write-update, if each byte changed forces a whole page to be sent over the network. Of course, the balance could tip if locality were particularly strong at the non-authoritative copies and the network were cheap, fast, and reliable.
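By contrast, the owner's side of write-invalidate only has to send a tiny message naming the page. A minimal sketch, again in Python with invented names (InvalidateOwner, serve_read, and the message format are assumptions for illustration):

```python
# Sketch of the owner's side of write-invalidate; names and the message
# format are invented for illustration.

class InvalidateOwner:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.replica_hosts = set()    # the "accounting": who holds a copy

    def serve_read(self, host):
        self.replica_hosts.add(host)  # remember the replica
        return self.data              # ship the whole page once

    def write(self, new_data, send):
        self.data = new_data
        for host in self.replica_hosts:
            send(host, {"invalidate": self.page_id})   # tiny message, no page
        self.replica_hosts.clear()    # they'll re-register when they re-read


owner = InvalidateOwner("page-7", b"old contents")
owner.serve_read("hostA"); owner.serve_read("hostB")
owner.write(b"new contents", send=lambda host, msg: print(host, msg))
# prints two small {'invalidate': 'page-7'} messages instead of resending pages
```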

Migration

Sometimes it is the case that a page is needed on one host, then another host, and then another host. If this is the case, it makes sense to migrate or move the page to the host that is using it, instead of constantly sending updates over the network as would be required if the active host had only a readable replica (as is the case if it isn't the owner).

Migration brings with it a good deal of complexity. Not only does the page itself need to move -- that is the point of the exercise -- but everyone needs to know that it has moved. One approach to this problem is to use a directory to keep track of the objects and to query this directory for the location of each object. But this directory will likely become a hot spot, requiring replication. Once that happens, the directory itself becomes a "home improvement project" requiring consistency control.
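A directory scheme might look roughly like the following sketch (PageDirectory and its methods are hypothetical names): one mapping from each page to its current owner, with every lookup and every migration funneling through it -- which is exactly why it tends to become a hot spot.

```python
# Sketch of a central directory; PageDirectory and its methods are
# hypothetical names.

class PageDirectory:
    def __init__(self):
        self.owner = {}                    # page_id -> host currently holding it

    def locate(self, page_id):
        return self.owner[page_id]         # one lookup -- but a potential hot spot

    def migrated(self, page_id, new_host):
        self.owner[page_id] = new_host     # every migration funnels through here


directory = PageDirectory()
directory.migrated("page-7", "hostB")
assert directory.locate("page-7") == "hostB"
```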

Another approach is to keep the last-known location of an object on each host. Then, each time an object moves, the old location can point to the new location. These pointers can form chains so that even hosts that are far behind the times can reach objects. The chaining systems can be optimized a great deal. Think back to our discussion of mutual exclusion. Remember path compression? This approach was actually designed to manage migrating objects in a virtual memory system. What we called a "token" was actually the page. Hosts searched the chains and added themselves when they wanted to migrate the page to themselves. One could also imagine approaches based on broadcast messages, trees, or simple linked lists.
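Here is a minimal sketch of the forwarding-pointer idea with path compression, using invented names (HostHints and locate): each host keeps the last place it saw the page, a lookup chases the chain of hints, and once the true owner is found, every host along the path is rewritten to point straight at it.

```python
# Sketch of forwarding pointers with path compression; HostHints and
# locate() are invented names.

class HostHints:
    def __init__(self, name):
        self.name = name
        self.pages = set()     # pages this host currently owns
        self.hint = {}         # page_id -> host we last believed held it


def locate(page_id, hosts, start):
    """Chase the chain of stale pointers, compressing the path as we go."""
    path, current = [], start
    while page_id not in hosts[current].pages:
        path.append(current)
        current = hosts[current].hint[page_id]   # follow the forwarding pointer
    for visited in path:                         # path compression: everyone on
        hosts[visited].hint[page_id] = current   # the path now points at the owner
    return current


hosts = {name: HostHints(name) for name in ("A", "B", "C")}
hosts["C"].pages.add("page-7")       # C actually owns the page now
hosts["A"].hint["page-7"] = "B"      # A last saw it on B...
hosts["B"].hint["page-7"] = "C"      # ...and B forwarded it to C
assert locate("page-7", hosts, "A") == "C"
assert hosts["A"].hint["page-7"] == "C"   # A's stale pointer was compressed
```

Just as with the mutual exclusion tokens, the chains can grow long, but compression keeps later lookups short.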

The real cost of managing migration isn't the cost of moving the page of data -- this is most likely done anyway. The real cost is keeping track of the location of the migratable objects.

The Balancing Act

We have discussed three different approaches to managing distributed objects: replication (keeping copies near the hosts that read them), migration (moving the object to the host that is using it), and plain remote access (leaving the object where it is and crossing the network for each access).

And each of these approaches can be applied to reads, writes, both, or some of each. Systems have policies to decide which to do and when. Most of the time, these are simple policies that pair one of the approaches with reads, writes, or both -- and there are certainly more permutations.

In class we discussed the good, bad, and ugly of these and several other cases. But the bottom line from this discussion is that the right decision is based on the characteristics of the particular configuration and the workload.

When writes are extremely rare, replication is often favored. If there are many writes and they pass temporally from host to host, migration may be favored. Migration might also be favored if reads pass temporally from host to host but writes are somewhat arbitrary. If writes are very common and not temporally local to any one host, remote access may be favored.
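Just to underline that this is a workload-driven decision, here is a toy policy chooser (choose_policy, its parameters, and every threshold in it are invented for illustration; a real system would measure and tune these for its own network and workload):

```python
# Toy policy chooser; the function, its parameters, and the thresholds are
# all invented -- a real system would measure and tune for its own workload.

def choose_policy(read_rate, write_rate, activity_moves_host_to_host):
    if write_rate == 0 or write_rate < 0.01 * read_rate:
        return "replicate"       # writes are extremely rare
    if activity_moves_host_to_host:
        return "migrate"         # the action follows one host at a time
    return "remote access"       # frequent, scattered writes


print(choose_policy(read_rate=1000, write_rate=2,
                    activity_moves_host_to_host=False))   # -> "replicate"
```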