With the wide adoption of microservices and distributed systems, the CAP theorem has become familiar to almost everyone: it names the three properties a distributed system has to juggle. In this article we will walk through the CAP theorem.

The CAP theorem

In 1998, Eric Brewer, a computer scientist at the University of California, Berkeley, proposed that a distributed system has three key properties:

  • Consistency.
  • Availability.
  • Partition Tolerance.

Brewer argued that it is impossible to guarantee all three at the same time. Take the first letter of each property and you get the name: the CAP theorem.

Let’s take a look at what each of these three properties means.

Consistency: reads and writes always observe the same data, in particular across the nodes of a distributed system. What does that mean in practice?

Suppose we have two instances, G1 and G2, both of which currently hold the value v0, and a client sends an update request to G1 asking it to change v0 to v1, as shown in the following figure:

Without any further processing, G1 now holds v1 while G2 still holds v0. If the client then issues a read, it gets v1 from G1 but v0 from G2: the data is inconsistent and consistency is violated. How do we ensure consistency? While G1 applies the write, it must also send a message to G2 asking G2 to change its value to v1.

In that case both instances hold v1, and the client reads the same value no matter which server it hits; that is data consistency. In plain English, the data held by multiple instances must be the same. For example, in MySQL's master-slave architecture, the binlog is used to keep the data on the master and slave servers consistent.
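To make this concrete, here is a minimal Java sketch of the idea, using two hypothetical in-memory replicas standing in for G1 and G2 (an illustration of synchronous replication, not MySQL's actual replication mechanism): the write is acknowledged only after it has been applied to both replicas, so a subsequent read returns the same value from either one.

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical in-memory replica, used only to illustrate the idea.
    class Replica {
        private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

        void apply(String key, String value) {
            store.put(key, value);
        }

        String read(String key) {
            return store.get(key);
        }
    }

    class SynchronousReplicationDemo {
        public static void main(String[] args) {
            Replica g1 = new Replica();
            Replica g2 = new Replica();
            List<Replica> all = List.of(g1, g2);

            // Initial state: both replicas hold v0.
            all.forEach(r -> r.apply("x", "v0"));

            // Consistent write: the update is applied to every replica
            // before the client is told that the write succeeded.
            all.forEach(r -> r.apply("x", "v1"));

            // A read now returns the same value no matter which replica serves it.
            System.out.println("G1 reads: " + g1.read("x")); // v1
            System.out.println("G2 reads: " + g2.read("x")); // v1
        }
    }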

Availability: the service stays highly available, especially in a distributed system. This is easy to understand: whenever I send you a request, you must give me a valid (non-error) response, even if the data it returns is not the very latest.

Partition Tolerance: the distributed system keeps responding to user requests even when a network partition occurs. What does that mean?

In a distributed system, the nodes are supposed to form one connected network. When faults disconnect some nodes, the network is split into several isolated areas with the data scattered across them; this is a partition. Partition tolerance means the system must remain usable despite such a split; in plain English, no single point of failure. Because network jitter and failures are unavoidable in a distributed system, P must always be satisfied, so the real choice is between C and A: CP or AP.

Next, let's look at these two trade-off strategies and how open source middleware applies them, to deepen our understanding of CAP.

Keep CP, abandon A

When the requirements on data consistency are relatively high, some availability can be sacrificed to guarantee it, that is, strong consistency. Take the financial industry as an example: it cannot tolerate inconsistent data at any moment, because inconsistencies would cause losses to users. In such scenarios CP must be guaranteed.

Among open source middleware, ZooKeeper adopts the strategy of keeping CP and abandoning A. Let's take a look.

In a ZooKeeper cluster, the nodes other than the Leader are called Follower nodes, and the Leader is responsible for processing write requests:

  • When a user sends a write request to a node, if the requested node happens to be the Leader, the request is processed directly.
  • If a Follower receives the request, it forwards it to the Leader. The Leader first sends a Proposal to all Followers, and only after more than half of the nodes acknowledge it does the Leader commit the write, thus ensuring strong data consistency (a simplified sketch of this write path follows the list).
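Below is a toy Java model of that "propose, collect acknowledgements, commit" pattern. It is only a sketch of the majority-ack idea, not ZooKeeper's actual ZAB implementation; the Leader, Follower, and write names here are made up for illustration.

    import java.util.List;

    // Toy model of the "propose, collect acks, commit" pattern.
    // This is an illustration only, not ZooKeeper's real ZAB protocol.
    class Follower {
        private final boolean reachable; // false simulates a crashed or partitioned node
        private String committedValue;

        Follower(boolean reachable) {
            this.reachable = reachable;
        }

        // Returns true if this Follower acknowledges the Proposal.
        boolean propose(String value) {
            return reachable;
        }

        void commit(String value) {
            if (reachable) {
                committedValue = value;
            }
        }
    }

    class Leader {
        private final List<Follower> followers;
        private String committedValue;

        Leader(List<Follower> followers) {
            this.followers = followers;
        }

        // The write is committed only if more than half of the cluster
        // (Leader included) acknowledges the Proposal.
        boolean write(String value) {
            int clusterSize = followers.size() + 1;
            int acks = 1; // the Leader acknowledges its own Proposal
            for (Follower f : followers) {
                if (f.propose(value)) {
                    acks++;
                }
            }
            if (acks > clusterSize / 2) {
                committedValue = value;
                followers.forEach(f -> f.commit(value));
                return true;
            }
            return false; // no quorum: refuse the write to preserve consistency
        }
    }

    class QuorumWriteDemo {
        public static void main(String[] args) {
            // A 5-node cluster: one Leader plus four Followers, one of them unreachable.
            Leader leader = new Leader(List.of(
                    new Follower(true), new Follower(true),
                    new Follower(true), new Follower(false)));
            System.out.println("write accepted: " + leader.write("v1")); // true (4 of 5 acked)
        }
    }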

The overall flow is illustrated in the following schematic diagram:

When a network partition occurs, if one partition contains more than half of the total nodes in the cluster, that partition can elect a new Leader and continue serving users. However, no service is available until the new Leader has been elected.

If no partition contains more than half of the cluster's nodes, the system cannot provide service to users at all and can only resume once the network recovers.
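The same majority rule decides which side of a partition can keep serving. A tiny hypothetical helper makes it explicit for a 5-node cluster split into groups of 3 and 2:

    // Hypothetical helper: a partition can elect a Leader and keep serving
    // only if it holds a strict majority of the cluster's nodes.
    class QuorumCheck {
        static boolean canElectLeader(int nodesInPartition, int clusterSize) {
            return nodesInPartition > clusterSize / 2;
        }

        public static void main(String[] args) {
            System.out.println(canElectLeader(3, 5)); // true  -> this side keeps serving
            System.out.println(canElectLeader(2, 5)); // false -> this side must wait for recovery
        }
    }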

This design guarantees data consistency, but sacrifices some availability, for example during the window when the Leader is down and a new one is being elected.

Keep AP, abandon C

Keeping AP and abandoning C is the more common strategy. To pursue high availability, the system tolerates temporary data inconsistency during network jitter or partitions, sacrificing a degree of consistency.

Once a network partition appears, data cannot be synchronized between nodes right away. To stay highly available, the system must still respond to user requests immediately, but nodes that do not yet have the latest data can only return their old local copy, which results in data inconsistency.
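Here is a minimal sketch of that behaviour, using a hypothetical AP-style node that always answers from its local copy (a toy model for illustration only): reads never block on the other replica, so the service stays available, but the answer may be stale until synchronization catches up.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy AP-style node: it always answers from its local copy,
    // even if updates from its peer have not arrived yet.
    class ApNode {
        private final Map<String, String> local = new ConcurrentHashMap<>();

        void applyLocalWrite(String key, String value) {
            local.put(key, value);
        }

        // Called later, when the partition heals and the peers exchange updates.
        void syncFrom(ApNode peer) {
            local.putAll(peer.local);
        }

        String read(String key) {
            return local.get(key); // may be stale, but never blocks
        }
    }

    class ApDemo {
        public static void main(String[] args) {
            ApNode n1 = new ApNode();
            ApNode n2 = new ApNode();
            n1.applyLocalWrite("x", "v0");
            n2.applyLocalWrite("x", "v0");

            // During a partition, only n1 receives the update.
            n1.applyLocalWrite("x", "v1");
            System.out.println("n2 reads: " + n2.read("x")); // v0: stale, but still answered

            // After the partition heals, the nodes converge.
            n2.syncFrom(n1);
            System.out.println("n2 reads: " + n2.read("x")); // v1
        }
    }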

For example, the Eureka service registry uses this strategy. In a Eureka cluster, when one instance goes down, the registry as a whole does not become unavailable: the remaining Eureka servers can still respond to external requests. When the failed server restarts, its data is inconsistent with the other instances until the first synchronization completes; after that synchronization the instances are consistent again. In other words, Eureka sacrifices some consistency to keep the system highly available.

That is the CAP theorem; I hope it helps you in your study or work. One last thought:

How did you design your company’s distributed architecture?

Feel free to follow my WeChat official account [Internet crew head brother]. Follow this Internet programmer, and may you and I keep improving together: today's best is tomorrow's minimum requirement.