What is the CAP theorem?

In July 2000, Professor Eric Brewer of the University of California, Berkeley proposed the CAP conjecture at the ACM PODC conference. Two years later, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology published a formal proof. Since then, CAP has been accepted as a theorem in distributed computing.

The CAP theorem states that a distributed system cannot simultaneously satisfy all three of Consistency, Availability, and Partition tolerance; it can satisfy at most two of them.

The three properties of the CAP theorem

Consistency

Consistency: after a write operation succeeds, all nodes return the same data when read at the same time.

Consistency here means strong consistency.

To learn more about consistency types or distributed consistency problems, see the article Distributed Consistency Problems.

How is consistency (strong consistency) achieved?
  1. Data written to the master database must be synchronized to the slave database.
  2. The slave database must be locked during synchronization and unlocked only after synchronization completes, so that reads against the slave database cannot return old data before the new data has arrived.
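
The two steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real database: the class names `LockedSlave` and `SyncMaster` are hypothetical, and it assumes the master and the slaves live in the same process so that an ordinary `threading.Lock` can stand in for the slave lock.

```python
import threading


class LockedSlave:
    """Hypothetical slave database: one in-memory copy guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self._data = {}

    def apply(self, key, value):
        # Called by the master while it holds self.lock.
        self._data[key] = value

    def read(self, key):
        # Blocks while a synchronization is in progress, so a reader
        # cannot fetch the old value in the middle of a synchronization.
        with self.lock:
            return self._data.get(key)


class SyncMaster:
    """Hypothetical master database that replicates synchronously."""

    def __init__(self, slaves):
        self._data = {}
        self._slaves = slaves

    def write(self, key, value):
        # Step 2: lock every slave before the new value exists anywhere.
        for slave in self._slaves:
            slave.lock.acquire()
        try:
            # Step 1: write to the master, then synchronize to each slave.
            self._data[key] = value
            for slave in self._slaves:
                slave.apply(key, value)
        finally:
            # Release the locks only after synchronization has finished.
            for slave in self._slaves:
                slave.lock.release()
```

The cost of this design is visible in `read`: while a write is being synchronized, every reader has to wait, which is exactly the availability that strong consistency gives up.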

Availability

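Note that the C in CAP and the C in transaction ACID both stand for Consistency, although they describe different guarantees: in CAP it is about nodes agreeing on the data, while in ACID it is about a transaction preserving the database's integrity constraints. The A in CAP stands for Availability and is not the same thing as the A in transaction ACID (Atomicity).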

Availability means that every request receives a proper response, without timing out or returning an error; that is, the service is always available.

How is availability achieved?
  1. Data written to the master database must be synchronized to the slave database.
  2. To keep the slave database available, its resources cannot be locked during the write process.
  3. Even if the new data has not yet been synchronized, the slave database must respond to the request immediately, returning old data if necessary; the response must not time out.
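
As a rough illustration of these three points, the sketch below lets the slave answer at once from its local copy while synchronization happens later. The names `AvailableSlave` and `LazyMaster`, and the explicit `sync()` call that stands in for a background replication job, are assumptions made for this example.

```python
from collections import deque


class AvailableSlave:
    """Hypothetical slave database that never blocks a read."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # No lock: answer immediately from the local copy,
        # even if that copy is older than the master's.
        return self.data.get(key)


class LazyMaster:
    """Hypothetical master database that synchronizes after the write."""

    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves
        self.pending = deque()  # writes not yet pushed to the slaves

    def write(self, key, value):
        # The write returns without waiting for any slave.
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Stand-in for a background job: push pending writes in order.
        while self.pending:
            key, value = self.pending.popleft()
            for slave in self.slaves:
                slave.data[key] = value
```

A short usage example shows the old value being served until synchronization catches up:

```python
slave = AvailableSlave()
master = LazyMaster([slave])
master.write("price", 100)
master.sync()
master.write("price", 120)
print(slave.read("price"))  # still the old value, returned without waiting
master.sync()
print(slave.read("price"))  # now the new value
```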

Partition tolerance

Partition tolerance means that when a node fails or the network splits into partitions, the remaining healthy nodes can still provide service.

Partition tolerance is closely related to scalability, and it is usually a basic requirement of a distributed system.

How is partition tolerance achieved?
  1. For example, synchronize data from the master database to the slave database asynchronously. This keeps the nodes loosely coupled.
  2. Add nodes so that when one slave node dies, other slave nodes can still provide service.
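
Both points can be sketched together: writes are queued and replicated later, and reads fall through to the next slave when one is down. Everything here is hypothetical and simplified (`SlaveNode`, `Cluster`, and the `alive` flag are invented for illustration), and a slave that is down during replication simply misses the update in this sketch.

```python
from collections import deque


class SlaveNode:
    """Hypothetical slave node with an on/off switch to simulate failure."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class Cluster:
    """Hypothetical master plus several slaves, replicated asynchronously."""

    def __init__(self, slaves):
        self.master_data = {}
        self.slaves = slaves
        self.queue = deque()

    def write(self, key, value):
        # Point 1: the master only enqueues the change for the slaves.
        self.master_data[key] = value
        self.queue.append((key, value))

    def replicate(self):
        # Stand-in for the asynchronous replication worker.
        while self.queue:
            key, value = self.queue.popleft()
            for slave in self.slaves:
                if slave.alive:
                    slave.data[key] = value

    def read(self, key):
        # Point 2: any surviving slave can serve the request.
        for slave in self.slaves:
            if slave.alive:
                return slave.name, slave.data.get(key)
        raise RuntimeError("no slave node is reachable")
```

With two slaves, killing the first one does not interrupt reads; the request is simply answered by the second.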

What are the trade-offs among C, A, and P?

Since at most two of the three CAP properties can be satisfied at the same time, which one should be given up? The cases are discussed below.

CA without P

This case almost never occurs in distributed systems. In a distributed environment, network partitions are inevitable; once partition tolerance is given up, a distributed system loses its reason to exist.

P is a fundamental requirement for distributed systems.

CP without A

Giving up A (availability) means sacrificing the user experience when a network failure or message loss occurs: the system is allowed to go down or remain unresponsive for long periods until all data is consistent.

This is acceptable in distributed systems that do not require high availability.

Typical examples are distributed data stores such as Redis, HBase, and ZooKeeper, which are commonly described as prioritizing CP: in extreme situations they sacrifice availability in favor of consistency.

This is understandable, because data consistency is the basic requirement of such systems. If a distributed store cannot guarantee that its data is consistent, what use is it?
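
The CP choice can be illustrated with a store that refuses writes while its replicas cannot reach each other. This is only a sketch under simplifying assumptions: `CPStore`, `ServiceUnavailable`, and the `partitioned` flag are hypothetical, and a write here needs every replica, which is not necessarily how any of the systems named above decide when to accept a write.

```python
class ServiceUnavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


class CPStore:
    """Hypothetical store that prefers consistency over availability."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.partitioned = False  # set to True to simulate a network partition

    def write(self, key, value):
        # During a partition some replicas cannot be updated, so the
        # write is rejected instead of letting the copies diverge.
        if self.partitioned:
            raise ServiceUnavailable("replicas unreachable; write refused")
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_index=0):
        # Every accepted write reached all replicas, so any copy is current.
        return self.replicas[replica_index].get(key)
```

The user-facing cost is the exception: while `partitioned` is true, callers see errors, but no replica ever holds a value that the others lack.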

AP without C

To be highly available while tolerating partitions, consistency has to be given up. Once a network problem occurs, nodes may lose contact with each other. To stay highly available, each node must respond immediately when users access it. In that case each node can only serve its local data, which leads to global data inconsistency.

Strictly speaking, what is given up here is strong consistency, not consistency altogether: the system settles for the next best thing, eventual consistency. A system whose data is never correct would have little value.
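
One simple way to picture eventual consistency is the sketch below: each node accepts writes locally during the partition, and the copies are reconciled once the nodes can talk again. The reconciliation rule used here, last-write-wins on a caller-supplied timestamp, is an assumption chosen for brevity and is not taken from the text above; `APNode` and `reconcile` are hypothetical names.

```python
class APNode:
    """Hypothetical node that always answers from its local data."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Accepted locally, even if other nodes are unreachable.
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge(self, other):
        # Last-write-wins: keep whichever entry has the newer timestamp.
        for key, (timestamp, value) in other.data.items():
            if key not in self.data or timestamp > self.data[key][0]:
                self.data[key] = (timestamp, value)


def reconcile(a, b):
    """Run once the partition heals: exchange state in both directions."""
    a.merge(b)
    b.merge(a)
```

A short usage example: the two nodes disagree during the partition and agree again after reconciliation.

```python
a, b = APNode(), APNode()
a.write("seats", "A1 sold", timestamp=1)  # written on one side of the partition
b.write("seats", "A2 sold", timestamp=2)  # written on the other side
print(a.read("seats") == b.read("seats"))  # False while partitioned
reconcile(a, b)
print(a.read("seats") == b.read("seats"))  # True after the partition heals
```

Note that last-write-wins silently discards the older write, which is one reason real systems often need a more careful merge rule.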

There are many examples of this kind of design, such as Taobao, Jingdong (JD.com), and other online malls, and the 12306 ticketing system.

What fits is best

There is no consensus on which design is best; the best design is the one that fits the scenario.

In scenarios involving money, where no compromise is acceptable, C (consistency) must be guaranteed, even if that means the service goes down.

For most other scenarios, high availability and partition tolerance are generally guaranteed, and eventual consistency is accepted in place of strong consistency to keep the data correct.