January 28, 2005 (Lecture 9)

Vector Logical Time

Vector logical time can be used to detect causality violations after the fact. Let's discuss vector logical time, and then take a look at how we can detect prior causality violations by comparing the current local time with the timestamp of an incoming message.
As with Lamport logical time, each host maintains its own notion of the local time and updates it using the timestamps placed by the sender onto messages. But with vector logical time, the time contains more information -- it is a vector with one entry per host. In other words, this vector contains not only the event count for the host itself, but also the last-known event counts on each and every other host.
The only entry in a message's timestamp vector that is guaranteed to be up-to-date is the entry that represents the sender. For this reason, it is possible that the receiver has a more up-to-date understanding of the logical time on some of the other hosts. This would be the case if another host sent a message to the receiver, and the sender has not yet heard from that host.
As a result, when a host receives a message, it merges its time vector with the timestamp sent with the message -- it selects the higher of the values for each element. This ensures that the receiver has information that is at least as up-to-date as the sender's.
Below is a summary of the rules for vector logical clocks:

Instead of just keeping our logical time, we keep a vector, V[], such that V[i] represents what we know of the logical time on processor i.
V[our_id] is our logical time.
Send the V[] vector with each message.
On receive, merge both vectors, selecting the greater of the corresponding elements from each. Then increment the component for self. The event is said to have happened at the new (incremented) time.
On send, increment the time component for self. Send the updated timestamp vector with the message. The event is said to have happened at the new (incremented) time.
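
The rules above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the class name VectorClock, the method names send and receive, and the assumption of a fixed number of hosts numbered 0 through n-1 are all mine, not part of the rules themselves.

```python
class VectorClock:
    """Vector logical clock for one host (hypothetical sketch).

    Assumes a fixed set of n hosts with ids 0..n-1.
    """

    def __init__(self, our_id, n):
        self.our_id = our_id
        self.v = [0] * n  # v[i]: what we know of the logical time on host i

    def send(self):
        """Increment our own component and return the timestamp to attach."""
        self.v[self.our_id] += 1
        return list(self.v)  # copy, so later local events don't alter the message

    def receive(self, timestamp):
        """Merge by element-wise maximum, then increment our own component."""
        self.v = [max(ours, theirs) for ours, theirs in zip(self.v, timestamp)]
        self.v[self.our_id] += 1
        return list(self.v)


# Usage: host 0 sends one message to host 1 in a three-host system.
p0 = VectorClock(0, 3)
p1 = VectorClock(1, 3)
m = p0.send()   # the send event happens at [1, 0, 0]
p1.receive(m)   # the receive event happens at [1, 1, 0]
```

Returning a copy from send matters: the timestamp on a message must stay frozen at the time of the send event, even as the sender's clock keeps advancing.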

Recall this example from earlier:

Let's label it in vector time, just for practice:

Comparing Vector Timestamps

We compare vector timestamps element by element: each element in one timestamp is compared to the corresponding element in the other timestamp.

If all of the corresponding elements in two timestamps are identical, the two events are the same event -- timestamps of different events should never be identical.
If Event_A "happens before" Event_B then each element of Event_A's timestamp is less than or equal to the corresponding element in Event_B's timestamp, and at least one element is less than the corresponding element in Event_B's timestamp.
If Event_B "happens before" Event_A then each element of Event_A's timestamp is greater than or equal to the corresponding element in Event_B's timestamp, and at least one element is greater than the corresponding element in Event_B's timestamp.
If two events are concurrent, they will have "mixed" timestamps such that at least one pair of corresponding elements is "greater than" and at least one corresponding pair of elements is "less than."
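
The four cases above can be sketched as a single comparison function. The function name compare and the string labels it returns are assumptions made for illustration; timestamps are taken to be equal-length sequences of integers.

```python
def compare(a, b):
    """Classify vector timestamp a relative to b (hypothetical helper).

    Returns "equal", "before" (a happens before b),
    "after" (b happens before a), or "concurrent".
    """
    if len(a) != len(b):
        raise ValueError("timestamps must have the same number of elements")
    some_less = any(x < y for x, y in zip(a, b))
    some_greater = any(x > y for x, y in zip(a, b))
    if some_less and some_greater:
        return "concurrent"  # "mixed" timestamps
    if some_less:
        return "before"      # every element <=, at least one strictly <
    if some_greater:
        return "after"       # every element >=, at least one strictly >
    return "equal"           # all elements identical: the same event


print(compare([1, 0, 0], [2, 0, 2]))  # before
print(compare([2, 1, 0], [1, 0, 2]))  # concurrent
```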

The above definition of vector timestamp comparison ensures both of the following properties:

Event_A "happens before" Event_B ==> Vector_Timestamp (Event_A) < Vector_Timestamp (Event_B)
Vector_Timestamp (Event_A) < Vector_Timestamp (Event_B) ==> Event_A "happens before" Event_B

Detecting Causality Violations Using Vector Timestamps

We can detect a causality violation using vector timestamps by comparing the timestamp of a newly received message to the local time. If the message's timestamp is less than the local time vector, a (potential) causality violation has occurred.
Why? For the local time to have advanced such that it is ahead of the timestamp of the newly received message, a prior message must have advanced the local time. The sender of that prior message must have seen the newly arrived message before it sent its prior message to us. Thus a (potential) causality violation occurred.
Admittedly, this doesn't fix the problem -- but at least we have a way of detecting and logging the problem. This will make it much easier to isolate and debug our system -- or at least to take mitigating action to ensure that the output from the system is correct.
Now, let's consider this familiar example again:

This time, let's label it using vector logical time and vector timestamps:
Notice that the timestamp on M₁ indicates a causality violation. M₁'s timestamp is (1,0,0). The local time on P₂ is (2,0,2). (1,0,0) is less than (2,0,2). This indicates that a causality violation has occurred -- someone who had already seen M₁ sent P₂ a message before P₂ received M₁.
If the timestamps are concurrent, this does not represent a problem -- the messages are unrelated.
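
As a rough sketch, the check can be written as follows. The function names happened_before and is_potential_violation are hypothetical, and the sketch assumes the check runs on the receiver before the incoming timestamp is merged into the local time. The second timestamp in the usage lines, (0,3,0), is made up to show the concurrent case.

```python
def happened_before(a, b):
    """True if timestamp a is less than timestamp b in the vector ordering."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def is_potential_violation(message_ts, local_time):
    """Flag a message whose timestamp is less than the receiver's local time.

    Call this before merging message_ts into local_time.
    """
    return happened_before(message_ts, local_time)


# M1 from the example: timestamp (1,0,0) arriving when local time is (2,0,2).
print(is_potential_violation((1, 0, 0), (2, 0, 2)))  # True

# A made-up concurrent timestamp: mixed elements, so no violation is flagged.
print(is_potential_violation((0, 3, 0), (2, 0, 2)))  # False
```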

Matrix Logical Clocks

Before we leave time to discuss communication, let me mention one more detail. There is actually another type of logical clock that is one step more encompassing than a vector logical clock -- the matrix logical clock. Much like a vector clock maintains the simple logical time for each host, a matrix clock maintains a vector of the vector clocks for each host.
Every time a message is exchanged, the sending host tells us not only what it knows about the global state of time, but what other hosts have told it that they know about the global state of time -- reliable gossip.
This is useful in applications such as checkpointing and recovery, and garbage collection. In these cases, having a lower bound on what another host knows can prove useful by enabling the disposal of unusable objects. In the case of garbage collection, these are objects that no other object can reference. In the case of recovery, they are logs and/or checkpoints that are no longer needed.
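
To make the "lower bound" idea concrete, here is a small sketch. It assumes one particular layout that these notes do not specify: row i of the matrix holds our latest knowledge of host i's vector clock. Under that assumption, the smallest value in column j is a lower bound on what every host knows about host j. The function name known_by_all and the sample numbers are made up for illustration.

```python
def known_by_all(matrix):
    """For each host j, return the smallest value any row holds in column j.

    Assumes matrix[i] is our latest knowledge of host i's vector clock.
    """
    return [min(column) for column in zip(*matrix)]


# Hypothetical matrix clock for a three-host system.
m = [
    [4, 2, 1],  # what host 0 is known to know
    [3, 5, 1],  # what host 1 is known to know
    [3, 2, 6],  # what host 2 is known to know
]
print(known_by_all(m))  # [3, 2, 1]
```

With these numbers, every host is known to have seen at least the first 3 events on host 0, so (under the stated assumption) bookkeeping kept only for those events is a candidate for disposal.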
We'll discuss matrix time in more detail when we discuss checkpointing and recovery -- it is much easier to understand with a clear application.