I am starting a new series on the main concerns in distributed systems. The pace will be relaxed: the plan is roughly one article every two weeks.


This article is the first in the series. It starts with what is generally considered the most important aspect of distributed systems: data consistency. It is suitable for readers with any amount of technical experience (>= 0 years).





First, why is a distributed system needed?

Anything that stays in use and keeps evolving must have value, and distributed systems are no exception. I think distributed systems serve two main purposes: “fast” and “massive”. “Fast” covers two aspects:

The first is how fast the system processes work.

The second is how fast the system can be developed (a shorter schedule).

Both come down to the same idea: split an action or a task into two or more parts and carry them out at the same time, so the total time shrinks. For example, suppose a task takes me two minutes on my own. If I hire two people to each do half of it, ideally it is done in one minute.

Of the two, the second limit can be worked around to some extent, but the first cannot. No single program or computer has infinite performance; if one did, distributed systems would not be as common as they are today (often, a problem that money can solve is not really a problem).

“Massive” is needed because no hard drive has infinite capacity, so data has to be spread across many drives to meet demand. Those drives may sit in different hosts, different data centers, different regions and, in the future, on different planets.


Second, side effects of distributed systems

Everything has two sides. Along with the benefits above, distributed systems bring what the industry generally considers their biggest problem: data consistency.

Systems exist for people to use, and the concepts that make up those usage scenarios are called the business. The business is the core, and for any system the business is ultimately built on data. I can tolerate slowness, downtime, and complexity; the one thing I cannot tolerate is a data problem, such as wrong or inconsistent data.

Distributed means divide, conquer, and cooperate: each person is responsible for one part of a job. Life is full of such examples. Take throwing a party: some people prepare the food, some the drinks, and some the venue. Everyone can work at the same time, but if any one part fails, or does not fit the theme of the party, the whole party is a failure. (I don’t know why, but I picture a press conference where people toast with goblets of erguotou.)

Here is another example, a program in an e-commerce system:




For the four operations here, the order does not really matter as far as the goal is concerned. What matters is that they either all succeed or all fail; if any one of them ends up out of step with the others, there is a problem. In essence this resembles the problem of communication between people. I wrote about that before; if you are interested, see “Let’s briefly talk about communication efficiency”. The one difference from human communication is that a program is not guaranteed to get a response. When something is broken into 100 parts it gets scary: from a probability standpoint, the odds of reaching a consensus are 2/5050.

The example here is not rigorous. In a real distributed system, consistency is more complicated, because there are “read” operations in addition to “write” operations. This is explained in more detail later.
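To make the all-or-nothing point concrete, here is a minimal sketch. The four step names, the `state` dictionary, and the simulated failure are all hypothetical stand-ins chosen for illustration; they are not necessarily the four operations in the example above. The sketch only shows how a failure partway through leaves data half-updated when nothing coordinates the steps.

```python
class StepFailed(Exception):
    """Raised by a step to simulate an unreachable or failing service."""


def run_steps(state, steps):
    """Apply each step in turn with no coordination; stop at the first failure.

    Returns the name of the failing step, or None if every step succeeded.
    """
    for name, step in steps:
        try:
            step(state)
        except StepFailed:
            return name  # earlier steps have already changed the state
    return None


# Hypothetical steps, each standing in for a call to a separate service.
def create_order(state):
    state["orders"] += 1


def deduct_stock(state):
    state["stock"] -= 1


def charge_payment(state):
    raise StepFailed("payment service unreachable")  # simulated failure


def add_points(state):
    state["points"] += 10


state = {"orders": 0, "stock": 5, "points": 0, "paid": 0}
failed_at = run_steps(state, [
    ("create_order", create_order),
    ("deduct_stock", deduct_stock),
    ("charge_payment", charge_payment),
    ("add_points", add_points),
])

# An order exists and stock was deducted, but nothing was paid.
is_consistent = state["orders"] == state["paid"]
print(failed_at, state, is_consistent)
```

Neither “all succeeded” nor “all failed” holds at the end, which is exactly the situation the paragraph above warns about.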


Third, reasons for data inconsistency

So what causes data inconsistency? One cause is a programming problem: the code is simply wrong. This is easy to understand, and the remedy is easy too: test more to verify that the code behaves as expected. Unit tests, interface tests, automated tests, integration tests, and so on are all cost-effective ways to push the number of bugs toward zero, which is also what makes the “test engineer” role valuable.

However, even with no bugs at all, data can still become inconsistent, because software runs on hardware and hardware brings its own failures. Most of us also have far less control over hardware than over software. The most serious hardware factor is the network. A network is larger and more complex than other components, and the uncertainty grows with its scope, from LAN to WAN and beyond. Picture each host as one tiny node in a larger network: the more links it depends on, the more likely something goes wrong.

Some readers may ask: hard disks and power supplies can fail too, so why is the network the most serious? A hard disk or power supply is like a part of your own body, such as a hand or a foot. The network is a channel of communication between people, like a phone call. Even if you never hang up, the call can be interrupted in many ways: the other party hangs up, the signal is poor, or a third party intercepts it. I think we can all agree that a phone call goes wrong far more often than our own hands and feet do.

In practice, the problems most often caused by the nature of the network are delay, packet loss, and out-of-order delivery. Many theories and solutions have been developed to address them in the decades since 1969, when ARPANET, funded by the US Department of Defense’s ARPA, linked its first four nodes. These will be reviewed in subsequent articles. In this article, we first look at what consistency means.
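The three network problems can be seen in a toy simulation. This is only a sketch under simplifying assumptions: the function name `unreliable_channel` and its parameters are made up for illustration, one message is sent per time unit, each message is lost independently, and each surviving message gets a random delay. Real networks are far messier.

```python
import random


def unreliable_channel(messages, loss_rate, max_delay, seed):
    """Return the messages in the order a receiver would see them.

    loss_rate: probability that any one message is dropped (packet loss).
    max_delay: upper bound on the random delay added to each message.
    seed:      fixes the random choices so a run can be repeated.
    """
    rng = random.Random(seed)
    arrivals = []
    for send_time, msg in enumerate(messages):
        if rng.random() < loss_rate:
            continue  # packet loss: the message never arrives
        arrival_time = send_time + rng.uniform(0, max_delay)  # delay
        arrivals.append((arrival_time, msg))
    # The receiver sees messages in arrival order, not send order, so a
    # long delay on an early message causes out-of-order delivery.
    arrivals.sort(key=lambda pair: pair[0])
    return [msg for _, msg in arrivals]


sent = ["first", "second", "third", "fourth", "fifth"]
print(unreliable_channel(sent, loss_rate=0.2, max_delay=3.0, seed=7))
```

With `loss_rate=0` and `max_delay=0` the channel is perfect and the output equals the input; raising either value makes some messages disappear or arrive in a different order.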


Fourth, consistency in detail

First of all, what does it mean to be consistent? It is easy to state:

The same thing is exactly the same at any place at any time.

Take a football match. Whether we watch from the stands or in front of a television, we see the same thing: the ball passes from player A to player B. Strictly speaking, though, this is not truly consistent. The television picture arrives through satellite signals, Internet transmission, and so on, so we see it later than the people in the stadium. Even people in the stadium theoretically see the event with a lag that depends on where they sit, but light is so fast that over a few hundred meters the delay is far too small to notice.

The conclusion is that once time is taken into account, there is no such thing as true consistency.

Distributed systems do not need true consistency anyway. The closer a system gets to it, the more it behaves like a single machine again, doing only one thing at a time and losing “fast”, one of the two purposes of a distributed system. That is why there are many variants of consistency, each suited to different scenarios. To make them easier to follow, let’s go from the weakest to the strictest.

Most of the time, to be as “fast” as possible, systems adopt what is called eventual consistency. It tolerates inconsistency under certain conditions, guarantees local consistency first, and then reaches global consistency through a series of more complex state synchronizations. Eventual consistency has many variants. Here are some common ones:

■ Causal consistency: only operations that are causally related must keep their order. Take replies in Moments: “Have you eaten yet?” must appear before the answer “Yes”.

■ Read-your-writes consistency: the name is a mouthful, but it is easy to explain. If you post a reply in Moments, your friends do not have to see it right away, but you do. Otherwise, where did your reply go?

■ Session consistency: a chat with one person can be treated as a session. Chat messages have some causal relationships, but in most cases what matters is their logical sequence. Suppose you split what you want to say into three messages: “First…”, “And then…”, “And finally…”. If consistency is not guaranteed, the other side may see “And finally…”, “First…”, “And then…”.
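As an illustration of one of these variants, here is a minimal sketch of read-your-writes consistency. Everything in it is an assumption made for the example: the `Store`, `Replica`, and `Session` classes, a single primary with one lagging replica, and version numbers used to decide where a read may go. It is one possible way to get the behavior, not a description of how any particular product does it.

```python
class Replica:
    """A copy of the data plus the version of the latest write it contains."""

    def __init__(self):
        self.version = 0
        self.data = {}


class Store:
    """One primary and one replica that only catches up when sync() runs."""

    def __init__(self):
        self.primary = Replica()
        self.replica = Replica()

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        return self.primary.version

    def sync(self):
        # Stands in for asynchronous replication finishing at some later time.
        self.replica.data = dict(self.primary.data)
        self.replica.version = self.primary.version


class Session:
    """Tracks one user's writes so that user always sees them."""

    def __init__(self, store):
        self.store = store
        self.last_written = 0

    def write(self, key, value):
        self.last_written = self.store.write(key, value)

    def read(self, key):
        # Prefer the replica, but only if it already contains this
        # session's own latest write; otherwise fall back to the primary.
        source = self.store.replica
        if source.version < self.last_written:
            source = self.store.primary
        return source.data.get(key)


store = Store()
me, friend = Session(store), Session(store)
me.write("reply", "Yes, I have eaten")
print(me.read("reply"))      # I see my own reply immediately
print(friend.read("reply"))  # my friend may not see it yet
store.sync()
print(friend.read("reply"))  # after replication, my friend sees it too
```

The design choice is that only the writer pays the cost of reading from the primary; everyone else keeps reading from the faster, possibly stale replica.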

Stricter than these local guarantees is sequential consistency [appendix 1, 1979]. It guarantees that all processes see the same global order of execution, and that each process’s own operations appear in that global order in the sequence the process actually issued them. In the football match above, if what really happened was ① Messi passed to Ronaldo, then ② Ronaldo passed back to Messi, everyone sees those events in that same order. It does not matter that the crowd in the stadium has seen a pass before we have; the sequence of events is the same for everyone.
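The two conditions in this definition can be written down as a small check. This is a simplified sketch: the function name, the event names, and the extra “Messi shoots” event are hypothetical, and the check only compares orders of events; it does not validate the values that reads return, which a full treatment would also require.

```python
def is_sequentially_consistent(program_orders, observed_orders):
    """Check the two conditions named in the text.

    program_orders:  dict mapping each process to its operations,
                     in the order that process issued them.
    observed_orders: list of the global order seen by each observer.
    """
    if not observed_orders:
        return True
    agreed = observed_orders[0]
    # Condition 1: every observer sees the same global order.
    if any(order != agreed for order in observed_orders):
        return False
    position = {op: i for i, op in enumerate(agreed)}
    # Condition 2: each process's operations keep their own order.
    for ops in program_orders.values():
        if any(op not in position for op in ops):
            return False
        indexes = [position[op] for op in ops]
        if indexes != sorted(indexes):
            return False
    return True


program_orders = {
    "messi": ["messi_passes", "messi_shoots"],
    "ronaldo": ["ronaldo_passes_back"],
}
stadium_view = ["messi_passes", "ronaldo_passes_back", "messi_shoots"]
tv_view = ["messi_passes", "ronaldo_passes_back", "messi_shoots"]
scrambled_tv = ["ronaldo_passes_back", "messi_passes", "messi_shoots"]

print(is_sequentially_consistent(program_orders, [stadium_view, tv_view]))
print(is_sequentially_consistent(program_orders, [stadium_view, scrambled_tv]))
```

The first call passes because both viewers agree; the second fails because the two viewers disagree about the order. Note that the check says nothing about when each viewer sees the events, only about the order.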

Stricter still, linearizability [appendix 2, 1990] adds a real-time requirement on top of sequential consistency. In the Messi and Ronaldo analogy, after Messi passes to Ronaldo the whole stadium “pauses” until everyone watching the match has received that event, and only then can Ronaldo pass back. Something godlike (a global clock) is needed to do the “pausing”. This is as far as we can realistically go, and the best-known system that meets this requirement is Google’s Spanner.
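The extra real-time requirement can also be sketched as a check. The function name, the event names, and the start and end times are assumptions made for illustration, and the check covers only the real-time part: if one operation finished before another started, it must also come first in the global order. Operations that overlap in time may appear in either order.

```python
def respects_real_time(global_order, intervals):
    """Return True if the global order never contradicts wall-clock time.

    global_order: list of operation names in the agreed global order.
    intervals:    dict mapping each operation to (start, end) times,
                  as measured by one shared, hypothetical global clock.
    """
    position = {op: i for i, op in enumerate(global_order)}
    for a, (_, a_end) in intervals.items():
        for b, (b_start, _) in intervals.items():
            # a finished strictly before b began, so a must come first.
            if a_end < b_start and position[a] > position[b]:
                return False
    return True


# Messi's pass completes at time 1; Ronaldo's pass back starts at time 2.
intervals = {"messi_passes": (0, 1), "ronaldo_passes_back": (2, 3)}

print(respects_real_time(["messi_passes", "ronaldo_passes_back"], intervals))
print(respects_real_time(["ronaldo_passes_back", "messi_passes"], intervals))
```

The reversed order could still pass a pure ordering check if every viewer agreed on it, but it fails here because it contradicts the clock. That gap is the difference between the two levels.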

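The consistency levels above, from weakest to strictest, are summarized as follows:

■ Eventual consistency (including causal, read-your-writes, and session consistency): tolerates temporary inconsistency and converges later.

■ Sequential consistency: everyone sees the same global order of events.

■ Linearizability: everyone sees the same global order, and that order also respects real time.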





Fifth, conclusion

That is all for this article. I originally wanted to cover everything in one go, but there turned out to be too much material, and I suspect few people would have the courage to read through an article of tens of thousands of words.

How to solve the problem of consistency



Appendix:

[1] How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, Leslie Lamport, 1979. Link: research.microsoft.com/en-us/um/pe…

[2] Linearizability: A Correctness Condition for Concurrent Objects, Maurice P. Herlihy, Jeannette M. Wing, 1990. Link: cs.brown.edu/~mph/Herlih…



Author: Zachary_Fan

Source: www.cnblogs.com/Zachary-Fan…


