Concepts

  1. Consistency: in a distributed system, consistency means that once a successful update has been returned to the client, the data on all nodes is the same at the same time.

  2. Availability: the service is always available and responds within a normal time. Good availability means that the system serves users well, without operation failures or access timeouts.

  3. Partition tolerance: the ability of a distributed system to keep providing services that meet its consistency or availability requirements even when a node fails or the network is partitioned. Partition tolerance is what lets a distributed application appear as one functioning whole. For example, if one or several machines in a distributed system break down, the remaining machines can still operate normally and meet the system's requirements, with no impact on the user experience.

Why can only two of the three CAP properties be satisfied?

  • As shown in the figure above, we use a basic scenario to demonstrate CAP. There are two nodes in the network, N1 and N2, and they are connected to each other. N1 hosts an application A and a database V; N2 hosts an application B and a database V. A and B are two parts of the distributed system, and the two copies of V are two sub-databases of the system's data store.

  • When consistency is satisfied, the data in N1 and N2 are the same (both hold V0). When availability is satisfied, users get an immediate response whether they send a request to N1 or N2. When partition tolerance is satisfied, the system keeps operating normally if either N1 or N2 goes down or the network between them is disconnected.

  • The figure above also shows the normal operation of the distributed system. A user sends a data update request to N1, and application A updates the database from V0 to V1. The distributed system then performs the synchronization operation M, which copies V1 to N2, so that V0 in N2 is also updated to V1.

  • Here we can define the three properties concretely: consistency is whether the data in database V on N1 and N2 are the same; availability is whether N1 and N2 respond to external requests; partition tolerance is whether the system tolerates failures of the network between N1 and N2. This is the normal, ideal working scenario. The harsh reality is that errors occur. When they do, can consistency, availability, and partition tolerance all be met at the same time, or is there a trade-off?

  • The biggest difference between a distributed system and a stand-alone system is the network. Now suppose an extreme case: the network between N1 and N2 is disconnected. Supporting this kind of network failure means satisfying partition tolerance. Can consistency and availability both still be satisfied, or do we have to choose between them?

  • Suppose that while the network between N1 and N2 is disconnected, a user sends a data update request to N1, and the data V0 in N1 is updated to V1. Because the network is down, the synchronization operation M cannot complete, so the data in N2 remains V0. If a user now requests the data from N2, application B cannot immediately return the latest data V1, because it has not been synchronized. What can be done?

  • There are two options. First, sacrifice data consistency to ensure availability: respond to the user with the old data V0.

  • Second, sacrifice availability to ensure data consistency: block until the network connection is restored and the synchronization operation M has completed, then return the latest data V1 to the user.

This process shows that a distributed system that tolerates partitions can only choose between consistency and availability when a partition occurs.
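
The two-node scenario can be sketched as a small simulation. This is only an illustration under stated assumptions, not a real replication protocol: the `Cluster` class, the `connected` flag, the `heal` method, and the mode labels `"AP"` and `"CP"` are hypothetical names introduced here, and "blocking" is modeled by raising an `Unavailable` exception instead of actually waiting.

```python
class Unavailable(Exception):
    """The node refuses to answer rather than risk returning stale data."""


class Cluster:
    """Two replicas, N1 and N2, joined by a network link that can be cut."""

    def __init__(self, mode):
        self.mode = mode          # "AP" or "CP" (hypothetical labels)
        self.connected = True     # state of the network between N1 and N2
        self.data = {"N1": "V0", "N2": "V0"}
        self.pending = []         # updates still waiting for synchronization M

    def write(self, node, value):
        """Update one node, then synchronize the other if the link is up."""
        peer = "N2" if node == "N1" else "N1"
        self.data[node] = value
        if self.connected:
            self.data[peer] = value              # synchronization M succeeds
        else:
            self.pending.append((peer, value))   # M must wait for the network

    def read(self, node):
        """Return the node's data, or refuse in CP mode during a partition."""
        if self.mode == "CP" and not self.connected:
            raise Unavailable(f"{node} cannot confirm it holds the latest data")
        return self.data[node]

    def heal(self):
        """Restore the network and run every delayed synchronization."""
        self.connected = True
        for peer, value in self.pending:
            self.data[peer] = value
        self.pending.clear()


for mode in ("AP", "CP"):
    cluster = Cluster(mode)
    cluster.connected = False       # the network between N1 and N2 is cut
    cluster.write("N1", "V1")       # N1 now holds V1, N2 still holds V0
    try:
        print(mode, "read from N2:", cluster.read("N2"))
    except Unavailable as err:
        print(mode, "read from N2 refused:", err)
    cluster.heal()
    print(mode, "read from N2 after recovery:", cluster.read("N2"))
```

In `"AP"` mode the read from N2 during the partition answers with the old value V0; in `"CP"` mode it is refused. In both modes N2 returns V1 once the network is restored and the delayed synchronization has run.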

How to choose?

Most large Internet applications today run on many hosts in scattered deployments, and clusters keep growing, so the number of nodes only increases. Node failures and network failures are therefore normal, and partition tolerance is something a distributed system has to provide. That leaves a choice between C and A. Traditional projects may be different. Take a bank's transfer system as an example: the consistency of data involving money cannot be compromised at all, so C must be guaranteed. Such a system would rather stop service when there is a network failure, and its choice is between A and P.
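
The bank example can be illustrated on the write path. The following is a minimal sketch, assuming a hypothetical `transfer` function that is given every replica's account balances plus a `reachable` map produced by some health check; none of these names come from a real banking or database API. The transfer is applied on all replicas or on none, and the service refuses the request when any replica cannot be reached.

```python
class ServiceStopped(Exception):
    """Raised when the system refuses a request to protect consistency."""


def transfer(replicas, reachable, src, dst, amount):
    """Move `amount` from `src` to `dst` on every replica, or on none.

    replicas:  dict of replica name -> dict of account name -> balance
    reachable: dict of replica name -> bool (hypothetical health check)
    """
    # Refuse the whole request if any replica cannot be reached.
    if not all(reachable[name] for name in replicas):
        raise ServiceStopped("a replica is unreachable; transfer refused")
    # Validate before changing anything so no replica is left half-updated.
    if any(accounts[src] < amount for accounts in replicas.values()):
        raise ValueError("insufficient funds")
    for accounts in replicas.values():
        accounts[src] -= amount
        accounts[dst] += amount


replicas = {
    "N1": {"alice": 100, "bob": 0},
    "N2": {"alice": 100, "bob": 0},
}

# Network healthy: the transfer is applied on both replicas.
transfer(replicas, {"N1": True, "N2": True}, "alice", "bob", 30)
print(replicas["N1"] == replicas["N2"])

# N2 unreachable: the service stops instead of letting the replicas diverge.
try:
    transfer(replicas, {"N1": True, "N2": False}, "alice", "bob", 30)
except ServiceStopped as err:
    print("refused:", err)
```

An Internet application that prefers availability would instead accept the write on the reachable replica and synchronize later, at the cost of temporarily inconsistent data.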

All in all, there is no single best strategy. A good system is architected according to its business scenario, and the strategy that fits the scenario is the best one.