Original article: www.inlighting.org/archives/la…

Lamport Timestamp

The Lamport Timestamp is a method of measuring time and causality. In real life, many programs contain causal relationships: for example, event B can be executed only after event A has been executed.

int main() {
  create_photos(6);
  view_photos(6);
  return 0;
}

For example, in the code above, I can't view photo 6 until I've created photo 6; that is a cause-and-effect relationship. But in a distributed system, how do we measure the causal order of events?

As shown in the figure below, suppose I have three machines: Server A, Server B, and a log server. A user initiates a purchase operation: the first request creates an order on Server A, and the payment operation is then carried out on Server B. Server A and Server B each asynchronously send log-write requests to the log server in order to record the user's operations. If the two logs arrive at the same time, how should the log server order them?

The first thing we think of is using physical timestamps, judging the order of events by comparing their values. However, in a distributed system different machines may have different times, which makes this approach unreliable. You might suggest synchronizing the machines' time regularly via NTP, but even within a cluster the internal clocks of different machines can diverge: some machines' clocks run fast, while others drift slowly.

Logical Clock

Therefore, we introduce the logical clock. The most famous logical clock is the Lamport Timestamp. With logical time, we can determine the causal order of different events.

Algorithm implementation

The Lamport Timestamp algorithm follows these rules:

  • Each machine keeps an internal timestamp, with an initial value of 0.
  • After a machine executes an event, it increments its timestamp by 1.
  • When a machine sends a message to another machine, it attaches its own timestamp, e.g. <message, timestamp>.
  • When a machine receives a message, it compares its local timestamp with the message's timestamp, takes the larger of the two, and adds 1.

Some people call this value a counter, others call it a timestamp, but both names describe the same counting mechanism.

(Figure: Lamport Timestamp example)

Send a message

void sendMessage() {
  do_one_event();
  timestamp = timestamp + 1;
  send(message, timestamp);
}

Receive a message

void receiveMessage() {
  (message, remote_timestamp) = receive();
  timestamp = max(timestamp, remote_timestamp) + 1;
}
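The two snippets above can be combined into a runnable sketch. Below is an illustrative Python version; the class and method names are my own, not from the original.

```python
class LamportClock:
    """Minimal Lamport clock: a single integer counter per process."""

    def __init__(self):
        self.timestamp = 0  # rule 1: the initial value is 0

    def tick(self):
        """Rule 2: executing a local event increments the timestamp."""
        self.timestamp += 1
        return self.timestamp

    def send(self, message):
        """Rule 3: increment, then attach the timestamp to the outgoing message."""
        self.timestamp += 1
        return (message, self.timestamp)

    def receive(self, remote_timestamp):
        """Rule 4: take the max of local and remote timestamps, then add 1."""
        self.timestamp = max(self.timestamp, remote_timestamp) + 1
        return self.timestamp


# Two processes exchanging one message:
a, b = LamportClock(), LamportClock()
a.tick()                       # A executes an event: A's clock -> 1
msg, ts = a.send("write log")  # A sends a message:   A's clock -> 2
b.receive(ts)                  # B receives it:       B's clock -> max(0, 2) + 1 = 3
```

Note that the receive event on B ends with a larger timestamp than the send event on A, which is exactly what the happened-before relation requires.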

Analysis

First, define the logical timestamp of an event A as C(A). All timestamps below are logical timestamps, not machine time.

If event A occurs before event B (the happened-before relation, denoted A → B), then the timestamp of A must be smaller than that of B:

A → B ⇒ C(A) < C(B)

But it is important to note that this implication cannot be reversed: C(A) < C(B) does not imply A → B.

For example, in the figure above, although the event with timestamp 1 has a smaller timestamp than the event with timestamp 3, there is no causal relationship between them; they are concurrent (independent).

We can also see that if C(A) < C(B), then event B can never have happened before event A.

Of course, if event A and event B have the same timestamp, they are also concurrent (independent): C(A) = C(B) ⇒ A ∥ B.

You can refer to the following figure for further understanding.

Vector Clock

But the Lamport Timestamp alone is not enough in some distributed-system scenarios: it cannot tell you whether two events are related or not, and in a multi-master key-value database it cannot determine which copy to keep (usually we want the most recent one).

As shown in the figure below, you can’t tell whether event 3 in Process 1 is related to event 8 in Process 2 simply by comparing the logic clock.

If the execution of event 3 were delayed by a few seconds, event 8 would not be affected; the two events do not interfere with each other. To be able to determine this, we introduce the Vector Clock.

Algorithm implementation

A Vector Clock is an array of logical clocks: each element is one Lamport-style clock. In the figure above we have three machines, so the Vector Clock contains three elements, and each machine owns the element at its own index. We follow these rules:

  • Each machine initializes all timestamps to 0. For example, in the example above, the initial Vector Clock of every machine is [0, 0, 0].
  • When a machine processes an event, it increments the element at its own index by 1. For example, if machine 1 processes an event, machine 1's Vector Clock becomes [1, 0, 0].
  • Every time a machine sends a message, it increments its own element by 1 and attaches the whole vector to the message, e.g. <message, vector>.
  • When a machine receives a message, it compares its Vector Clock with the one in the message element by element, keeps the larger timestamp in each position (similar to the Lamport Timestamp rule), and then adds 1 to the element representing itself.
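The four rules above can be sketched in Python. This is a simplified illustration; the cluster size, process indices, and all names are assumptions of mine.

```python
class VectorClock:
    """Vector clock for a fixed-size cluster of n processes."""

    def __init__(self, n, my_index):
        self.clock = [0] * n  # rule 1: all timestamps start at 0
        self.i = my_index     # this machine's slot in the vector

    def local_event(self):
        """Rule 2: an internal event bumps only our own slot."""
        self.clock[self.i] += 1

    def send(self, message):
        """Rule 3: bump our own slot, then ship a copy of the vector."""
        self.clock[self.i] += 1
        return (message, list(self.clock))

    def receive(self, remote_clock):
        """Rule 4: element-wise max with the incoming vector, then bump our slot."""
        self.clock = [max(x, y) for x, y in zip(self.clock, remote_clock)]
        self.clock[self.i] += 1


p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
p0.local_event()          # p0: [1, 0, 0]
msg, vc = p0.send("hi")   # p0: [2, 0, 0]
p1.receive(vc)            # p1: max([0,0,0], [2,0,0]) then +1 at index 1 -> [2, 1, 0]
```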

As shown in the figure below:

From this we know that if A → B, then every element in A's Vector Clock is less than or equal to the corresponding element in B's Vector Clock, with at least one element strictly smaller.

Determine whether two events are parallel or independent

Going back to determining whether the two events are related: comparing [3, 0, 0] in Process 1 with [2, 4, 2] in Process 2, we have 3 > 2 but 0 < 4 and 0 < 2. Some elements are larger while others are smaller, so neither clock dominates the other, and we know the two events are unrelated (concurrent).
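The element-wise comparison just described can be written as a small helper function (the function and return-value names are my own):

```python
def compare(vc_a, vc_b):
    """Classify two vector clocks: 'before', 'after', or 'concurrent'."""
    a_le_b = all(x <= y for x, y in zip(vc_a, vc_b))  # A dominated by B?
    b_le_a = all(y <= x for x, y in zip(vc_a, vc_b))  # B dominated by A?
    if a_le_b and not b_le_a:
        return "before"       # A happened before B
    if b_le_a and not a_le_b:
        return "after"        # B happened before A
    return "concurrent"       # neither dominates: unrelated events

print(compare([3, 0, 0], [2, 4, 2]))  # concurrent, as in the example above
print(compare([1, 0, 0], [2, 1, 0]))  # before
```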

K/V database application

A traditional distributed database has a single leader: it receives a write request and replicates the written data to more than half of the nodes in the cluster. After a majority of nodes report success, the leader returns a write-success message to the client. This is similar to Raft, and it is a single-point write model.

If we introduce the Vector Clock, we can implement multipoint writes, as shown in Dynamo’s paper.

Suppose a key K has three copies, k1, k2, and k3 (currently identical), located on three servers M1, M2, and M3. A fault causes a network partition: the three machines can no longer communicate with each other, but each machine can still communicate with clients.

Copy k1 is then continuously updated by Client 1, and copy k2 is continuously updated by Client 2. When communication among the three machines resumes, which version should be kept during synchronization? If only k2 is kept, i.e. a last-write-wins policy is used, Client 1 will find that its written data has been lost after synchronization.

This is where we need the Vector Clock, or more specifically, the Version Clock (as described in Dynamo).

To handle this scenario, Dynamo uses the Version Clock to capture the causality between different versions of the same data (object). Each version of an object has an associated Version Clock of the form [(serverA, counter), (serverB, counter), …]. By comparing the Version Clocks of two versions of the same object, we can decide whether one version can be discarded entirely and only the other kept, or whether the two versions must be merged. If for every (server, counter) entry in version A's Version Clock there is a corresponding entry in version B's Version Clock whose counter is greater than or equal to it, then version A can be discarded; otherwise, the two versions must be merged.

Going back to the previous example: after k1 is updated, its Version Clock is [(M1, 1), (M2, 0), (M3, 0)]; after k2 is updated, its Version Clock is [(M1, 0), (M2, 1), (M3, 0)]. When the network between M1 and M2 recovers, they compare the two Version Clocks and find a conflict: neither clock dominates the other, so both versions are kept. When a client reads key K, the data of both versions together with their Version Clocks is returned, and the client merges the two versions. When the client writes key K back after the conflict merge, it sends the merged Version Clock [(M1, 1), (M2, 1), (M3, 0)] to M1 and M2 to overwrite the server versions, and the conflict is resolved.
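The reconciliation logic in the Dynamo example can be sketched as follows. The dict-based representation and function names are my assumptions, not Dynamo's actual data format.

```python
def dominates(vc_a, vc_b):
    """True if version A's counter is >= version B's for every server entry."""
    servers = set(vc_a) | set(vc_b)
    return all(vc_a.get(s, 0) >= vc_b.get(s, 0) for s in servers)

def reconcile(vc_a, vc_b):
    """Keep the dominating clock, or merge element-wise on conflict
    (merging is what the client does after reading both versions)."""
    if dominates(vc_a, vc_b):
        return vc_a  # B is an ancestor of A: discard B, keep A
    if dominates(vc_b, vc_a):
        return vc_b  # A is an ancestor of B: discard A, keep B
    servers = set(vc_a) | set(vc_b)
    return {s: max(vc_a.get(s, 0), vc_b.get(s, 0)) for s in servers}

k1 = {"M1": 1, "M2": 0, "M3": 0}
k2 = {"M1": 0, "M2": 1, "M3": 0}
merged = reconcile(k1, k2)  # neither dominates -> merge: M1=1, M2=1, M3=0
```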

Defects of the Vector Clock

Scalability defect

In fact, the Vector Clock does not scale very well: as the number of servers holding a key increases, so does the number of elements in the Vector Clock. In a cluster of 1000 machines, we would have to attach a vector of length 1000 to every message, which is hardly acceptable.

Non-uniqueness defect

Under normal conditions, we assume all messages are ordered (that is, if the same machine sends messages 1 and 2 to another machine, the other machine receives message 1 first and then message 2). We can then reconstruct the computation from each machine's Vector Clock; that is, each computation has its own unique Vector Clock.

However, if messages are not ordered and one message overtakes another, a problem arises, as shown below:

Comparing the left and right sides of the figure, they are different computations, yet in the end they produce the same Vector Clock. In other words, a Vector Clock does not identify a unique computation.

Looking at a node in this figure, we cannot tell whether its Vector Clock was produced by receiving a message passed over from the other process, or by the process handling an internal event itself and adding 1 to its own entry.

Solution 1

Add event types to the Vector Clock, for example distinguishing three kinds of events: internal, send, and receive. But there is still a problem with this:

Even though we mark both send and receive events, different computations can still produce the same Vector Clock.

Solution 2

Extend the Vector Clock so that it records both the time the message was received and the local time. Here is an example:

The single clock value in the left figure becomes a pair of values in the right figure.

In this way, each computation has a unique Vector Clock, although the clock doubles in size.

Of course, this method is not strictly necessary: the original Vector Clock is sufficient as long as we can ensure that messages are delivered in order, i.e. no message overtaking occurs.
