Preface

A summary of distributed system consistency issues.

CAP

Let’s start with a few definitions:

  1. Consistency

    The system must guarantee linearizability (linear consistency)

  2. Availability

    Every non-failing node returns a reasonable response within a reasonable time (neither an error nor a timeout)

  3. Partition Tolerance

    The system can continue to run when the network is partitioned

Note the term linear consistency, which requires:

The system appears to have only one copy of the data: once a new value is written or read, all subsequent reads see that value until it is overwritten again.

That is, the system ensures that the values read are up to date.
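To make this concrete, here is a minimal sketch (the names are illustrative, not from any particular library) of a register that behaves as if there were only one copy of the data: a mutex serializes every operation, so each read returns the most recently written value.

    package main

    import (
    	"fmt"
    	"sync"
    )

    // register behaves like a single copy of the data: a mutex
    // serializes every operation, so once Write(v) returns, all
    // subsequent Reads observe v until it is overwritten.
    type register struct {
    	mu    sync.Mutex
    	value int
    }

    func (r *register) Write(v int) {
    	r.mu.Lock()
    	defer r.mu.Unlock()
    	r.value = v
    }

    func (r *register) Read() int {
    	r.mu.Lock()
    	defer r.mu.Unlock()
    	return r.value
    }

    func main() {
    	r := &register{}
    	r.Write(42)
    	fmt.Println(r.Read()) // 42: the read sees the latest write
    }

A replicated system is linearizable precisely when, despite holding many physical copies, it is indistinguishable from this single-copy behavior.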

Consider the following case:

Look first at the figure above: with a multi-leader setup (setting aside the replication conflicts that concurrent writes to multiple leaders can cause), the four clients in the figure can still reach the application even while the network between the data centers is interrupted. We assume here that the application is designed so that each user is correctly routed to the corresponding data center, but the data centers cannot connect to each other. Because data written in one data center is replicated asynchronously to the other, write operations are simply queued and exchanged when the network connection is restored.
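A minimal sketch of this behavior, with hypothetical types rather than any real replication library: each data center applies writes locally right away and appends them to an outbound queue, which is drained to its peer only once the link comes back.

    package main

    import "fmt"

    type write struct{ key, value string }

    // dataCenter is a hypothetical multi-leader replica: it applies
    // writes locally right away and queues them for its peer.
    type dataCenter struct {
    	data   map[string]string
    	outbox []write // writes queued while the inter-DC link is down
    }

    func newDataCenter() *dataCenter {
    	return &dataCenter{data: map[string]string{}}
    }

    // Write succeeds locally even when the other data center is unreachable.
    func (dc *dataCenter) Write(key, value string) {
    	dc.data[key] = value
    	dc.outbox = append(dc.outbox, write{key, value})
    }

    // Sync drains the queue once the network connection is restored.
    // Conflicts between concurrent writes are ignored here, as in the
    // scenario above.
    func (dc *dataCenter) Sync(peer *dataCenter) {
    	for _, w := range dc.outbox {
    		peer.data[w.key] = w.value
    	}
    	dc.outbox = nil
    }

    func main() {
    	a, b := newDataCenter(), newDataCenter()
    	b.Write("user:1", "paid")     // link down: the write still succeeds in B
    	b.Sync(a)                     // link restored: queued writes are exchanged
    	fmt.Println(a.data["user:1"]) // prints "paid"
    }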

In the case of a single-leader database, however, the picture is not so rosy. Looking at the same figure again, but under single-leader (master-slave) replication, every write and every linearizable read must go through the leader. That is, clients connected to data center B must forward write operations to data center A. During the outage such a client cannot complete any write, and while it can still read locally, the data it reads may not be fresh. A network outage under this model therefore forces a choice between linear consistency and availability.
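The same scenario can be sketched from the follower’s side (again, all names are illustrative): a client in data center B must forward writes to the leader in data center A, so while the link is down writes can only fail, and local reads may be stale.

    package main

    import (
    	"errors"
    	"fmt"
    )

    // follower is a hypothetical replica in data center B under
    // single-leader replication: all writes must be forwarded to
    // the leader in data center A.
    type follower struct {
    	leaderReachable bool
    	staleData       map[string]string // last values replicated from the leader
    }

    // Write forwards to the leader; during a partition it can only fail.
    func (f *follower) Write(key, value string) error {
    	if !f.leaderReachable {
    		return errors.New("partition: cannot reach leader in data center A")
    	}
    	// ... forward to the leader (omitted) ...
    	return nil
    }

    // Read serves local data, which may be stale during the partition.
    func (f *follower) Read(key string) string {
    	return f.staleData[key]
    }

    func main() {
    	// leaderReachable is false: the inter-DC link is down.
    	f := &follower{staleData: map[string]string{"user:1": "pending"}}
    	fmt.Println(f.Write("user:1", "paid")) // error: writes are unavailable
    	fmt.Println(f.Read("user:1"))          // "pending": readable, but possibly stale
    }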

This problem is not merely a consequence of single-leader or multi-leader replication: it exists on any unreliable network, even within a single data center.

Broadly speaking, whenever there are multiple replicas of the data, the system must choose between linear consistency and availability, and the trade-off looks like this (see the sketch after the list):

  • If the application requires strong consistency (linear consistency) and some replicas are cut off from the others by a network failure, those replicas cannot process requests while disconnected and must wait for the network to recover.

  • If the application does not require linear consistency, replicas that are cut off from the others can still handle requests (as with multi-leader replication). The application then remains available.
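Here is that choice reduced to a single branch, as an illustrative sketch rather than a real API: a replica that has lost contact with its peers either refuses the request to preserve linear consistency, or answers from its local, possibly stale copy to preserve availability.

    package main

    import (
    	"errors"
    	"fmt"
    )

    // replica is a hypothetical node that has lost contact with the
    // other replicas; connected reports whether the partition is healed.
    type replica struct {
    	connected bool
    	local     string // this replica's possibly stale copy
    }

    // Read makes the CP-versus-AP choice explicit: require linear
    // consistency and become unavailable during the partition, or drop
    // the requirement and stay available with possibly stale data.
    func (r *replica) Read(linearizable bool) (string, error) {
    	if linearizable && !r.connected {
    		// CP: refuse rather than risk returning a stale value.
    		return "", errors.New("unavailable: waiting for the network to recover")
    	}
    	// AP: answer from the local copy, which may be out of date.
    	return r.local, nil
    }

    func main() {
    	r := &replica{connected: false, local: "v1"}
    	fmt.Println(r.Read(true))  // error: consistent but unavailable
    	fmt.Println(r.Read(false)) // "v1": available but maybe inconsistent
    }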

A CP system is consistent but unavailable during a network partition; an AP system is available but inconsistent during a network partition. Applications that do not require linear consistency are therefore more tolerant of network problems. This insight is often called the CAP theorem, though a better phrasing is: when partitioned, choose either consistency or availability. CAP was originally proposed as a rule of thumb, and we need not worry about the CA combination, because partition tolerance is assumed to be mandatory.

The formal definition of the CAP theorem covers a very narrow scope: it considers only one consistency model (namely linear consistency) and one kind of failure (a network partition, i.e. nodes that are alive but disconnected from each other). It says nothing about network latency, dead nodes, or other trade-offs. As a result, while CAP has been historically influential, it has little practical value for designing systems.

BASE

Since today’s large-scale systems run across many distributed nodes, we naturally cannot tolerate the service being unavailable. Partially giving up the insistence on linear consistency can lead to a better system design. BASE is therefore a guiding theory oriented toward practical application design.

Let’s also introduce some definitions:

  1. Basically Available

    Partial loss of availability is allowed, but core functionality remains available

  2. Soft State

    Intermediate states are allowed, provided they do not affect the overall availability of the system

  3. Eventual Consistency

    All data copies in the system eventually reach a consistent state after a certain amount of time

Again, these points are easy to understand. In high-load systems, common basic-availability practices are degraded service and increased response times; what matters most is that core functionality stays available. Soft state shows up in a payment system as a transaction sitting in an intermediate state, which waits for the final status reported by the other replicas before eventual consistency is reached.
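A small sketch of soft state and eventual consistency in a payment flow, with all names hypothetical: the payment is recorded in an intermediate PENDING state so the system stays available, and a later reconciliation step moves it to its final state once the authoritative result arrives.

    package main

    import "fmt"

    type status string

    const (
    	pending status = "PENDING" // soft state: an allowed intermediate state
    	success status = "SUCCESS"
    	failed  status = "FAILED"
    )

    // payment is a hypothetical record in a BASE-style payment system.
    type payment struct {
    	id    string
    	state status
    }

    // submit records the payment in a soft, intermediate state so the
    // core function (accepting payments) stays available.
    func submit(id string) *payment {
    	return &payment{id: id, state: pending}
    }

    // reconcile is called later with the final result reported by the
    // other replicas; after it runs, all copies converge on the same
    // final state (eventual consistency).
    func (p *payment) reconcile(ok bool) {
    	if ok {
    		p.state = success
    	} else {
    		p.state = failed
    	}
    }

    func main() {
    	p := submit("order-42")
    	fmt.Println(p.state) // PENDING: the system remains available
    	p.reconcile(true)    // the final result arrives later
    	fmt.Println(p.state) // SUCCESS: eventually consistent
    }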

Postscript

Basic availability and eventual consistency fit the business models of most Internet companies and create room for flexible, elastic system design.

References

  • Harvest, Yield, and Scalable Tolerant Systems
  • Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  • Perspectives on the CAP Theorem