Preface

Distributed systems are becoming increasingly important; almost all large websites are distributed. The hardest problem in a distributed system is keeping the state of every node synchronized. The CAP theorem is the fundamental result in this area and the starting point for understanding distributed systems.

First, the three properties of a distributed system

In 1998, Eric Brewer, a computer scientist at the University of California, Berkeley, proposed that distributed systems have three desirable properties.

- Consistency: all nodes return the same, most recent copy of the data at the same time.
- Availability: every request receives a non-error response.
- Partition tolerance: the system keeps working even if communication between servers cannot be maintained for some period of time.

Their initials are C, A, and P. Eric Brewer stated that a system cannot satisfy all three at once. This conclusion is called the CAP theorem.

Second, Partition tolerance

Let’s start with Partition tolerance.

Most distributed systems are spread over multiple subnetworks, and each subnetwork is called a partition. Partition tolerance means that communication between partitions may fail. For example, if one server is located in China and another in the United States, they are in two partitions and may be unable to communicate with each other.

In the figure above, G1 and G2 are two servers in different regions. G1 sends a message to G2, and G2 may never receive it. Systems must be designed with this in mind. In general, partitions cannot be avoided, so the P in CAP can be assumed to always hold. The CAP theorem then tells us that the remaining two, C and A, cannot both be achieved at the same time.
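A lost message between G1 and G2 can be sketched in a few lines of Python. This is only an illustrative model: the `Link` class, its `up` flag, and the `deliver` method are hypothetical names, not part of any real networking library.

```python
class Link:
    """Hypothetical network link between two partitions."""

    def __init__(self):
        self.up = True  # False while the partitions cannot communicate

    def deliver(self, inbox, message):
        # While the link is down the message is simply lost.
        # Returning False models a missing acknowledgement (e.g. a timeout).
        if self.up:
            inbox.append(message)
            return True
        return False


link = Link()
g2_inbox = []

first_ok = link.deliver(g2_inbox, "hello")    # arrives at G2
link.up = False                               # the partition begins
second_ok = link.deliver(g2_inbox, "update")  # never reaches G2

print(g2_inbox)  # ['hello']
```

The sender cannot assume delivery; it has to decide what to do when `deliver` reports failure, which is exactly the choice the CAP theorem is about.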

Third, Consistency

Consistency means that any read operation that follows a write operation must return the value that was written. For example, suppose a record holds v0 and the user sends a write to G1 that changes it to v1.

A subsequent read by the user then returns v1. This is called consistency.

The problem is that the user may instead send the read to G2. Because G2’s value has not changed, it returns v0. G1 and G2 return different results for the same read, so consistency is violated.

For G2 to change to v1 as well, G1 must send a message to G2 when it handles the write, asking G2 to update its value to v1.

In this case, a read from G2 also returns v1.
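The write-then-synchronize behavior can be sketched as follows. This is a minimal model that assumes the message from G1 to G2 always arrives; the `Node` class and its `peers` list are hypothetical names chosen for illustration.

```python
class Node:
    """Hypothetical server holding a single record."""

    def __init__(self, name, value="v0"):
        self.name = name
        self.value = value
        self.peers = []  # other nodes that must receive every write

    def write(self, value):
        # Apply the write locally, then push it to every peer
        # before the write is considered complete.
        self.value = value
        for peer in self.peers:
            peer.value = value

    def read(self):
        return self.value


g1, g2 = Node("G1"), Node("G2")
g1.peers.append(g2)
g2.peers.append(g1)

g1.write("v1")
print(g2.read())  # v1
```

Because the write does not finish until G2 has been updated, a read from either server returns the same value.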

Fourth, Availability

Availability means that a server must respond to every user request it receives.

Users can send read operations to either G1 or G2. Whichever server receives the request must answer the user with a value, whether v0 or v1; otherwise availability is not satisfied.

Fifth, the conflict between Consistency and Availability

Why can’t consistency and availability both hold? The simple answer is that communication can fail (that is, partitions occur). To guarantee consistency at G2, G1 must lock G2’s reads and writes while it handles a write, and G2 can be read and written again only after the data has been synchronized. While locked, G2 cannot serve reads or writes, so it is not available. Conversely, if G2 must stay available, it cannot be locked, so consistency cannot be guaranteed.

In summary, when communication between nodes fails, G2 cannot be both consistent and available. A system design can choose only one of the two goals. If you pursue consistency, the availability of all nodes cannot be guaranteed; if you pursue availability on all nodes, consistency cannot be achieved.
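The trade-off can be made concrete with a small simulation of G1 and G2 during a partition. This is a sketch under simplifying assumptions: the `TwoNodeSystem` class, its `mode` parameter, and the `Unavailable` exception are hypothetical, and the lock on G2 is modeled simply as G2 refusing reads while it cannot synchronize.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer in order to stay consistent."""


class TwoNodeSystem:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.g1 = "v0"
        self.g2 = "v0"
        self.partitioned = False  # True while G1 and G2 cannot communicate

    def write_to_g1(self, value):
        self.g1 = value
        if not self.partitioned:
            self.g2 = value  # the synchronization message is delivered

    def read_from_g2(self):
        # CP: G2 stays locked while it cannot synchronize with G1.
        if self.mode == "CP" and self.partitioned:
            raise Unavailable("G2 is locked until it can synchronize")
        # AP: G2 always answers, even if its value may be stale.
        return self.g2


cp = TwoNodeSystem("CP")
cp.partitioned = True
cp.write_to_g1("v1")
try:
    cp.read_from_g2()
    cp_result = "answered"
except Unavailable:
    cp_result = "unavailable"

ap = TwoNodeSystem("AP")
ap.partitioned = True
ap.write_to_g1("v1")
ap_result = ap.read_from_g2()

print(cp_result)  # unavailable
print(ap_result)  # v0
```

In CP mode G2 never returns a stale value but stops answering; in AP mode G2 keeps answering but returns v0 after G1 has already accepted v1.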

CAP requirements for service discovery and registries

  1. If consistency is chosen at the expense of availability (CP), then to keep the data on all servers consistent, all servers must stop accepting external writes as soon as one server goes down. Data stays consistent across servers, but the availability of the write service is sacrificed.
  2. If availability is chosen at the expense of consistency (AP), then to keep the service running, when one server goes down a surviving server can write the data locally and respond to the client immediately, which leaves the data on the servers inconsistent.
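The two registry behaviors above can be sketched as follows. This is a deliberately simplified model, not the design of any particular registry product: the `Replica` and `Registry` classes, the `mode` parameter, and the service name and address are all hypothetical.

```python
class RegistryUnavailable(Exception):
    """Raised when the registry refuses a write."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.services = {}  # service name -> address


class Registry:
    def __init__(self, replicas, mode):
        self.replicas = replicas
        self.mode = mode  # "CP" or "AP"

    def register(self, via, service, address):
        """Register a service through the replica `via`."""
        if not via.alive:
            raise RegistryUnavailable(via.name + " is down")
        if self.mode == "CP":
            # CP: suspend all writes as soon as any replica is down.
            if not all(r.alive for r in self.replicas):
                raise RegistryUnavailable("writes are suspended")
            for r in self.replicas:
                r.services[service] = address
        else:
            # AP: write locally, copy to reachable replicas, answer at once.
            via.services[service] = address
            for r in self.replicas:
                if r.alive:
                    r.services[service] = address


a, b = Replica("A"), Replica("B")
b.alive = False
cp = Registry([a, b], "CP")
try:
    cp.register(a, "orders", "10.0.0.5:8080")
    cp_outcome = "registered"
except RegistryUnavailable:
    cp_outcome = "rejected"

c, d = Replica("C"), Replica("D")
d.alive = False
ap = Registry([c, d], "AP")
ap.register(c, "orders", "10.0.0.5:8080")

print(cp_outcome)              # rejected
print(c.services, d.services)  # {'orders': '10.0.0.5:8080'} {}
```

The CP registry rejects the registration, so its replicas never diverge; the AP registry accepts it, and replica D is missing the entry when it comes back.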

Reference

  • mwhittaker.github.io/blog/an_ill…