Hello everyone, I'm the Third. I took a break from grinding practice problems today and I'm in a good mood, so I'd like to share two simple topics from distributed systems theory: CAP and BASE.

CAP theory

What is CAP?

The CAP principle, also known as the CAP theorem, states that a distributed system can satisfy at most two of the following three basic requirements at the same time: Consistency, Availability, and Partition tolerance.

  • Consistency: All copies of the data hold the same value, so a read from any copy returns the same result.
  • Availability: The services provided by the system are always available, and every request receives a timely, non-error response.
  • Partition tolerance: The distributed system as a whole keeps providing service even when any network partition failure occurs.

What is partitioning?

In a distributed system, nodes are spread across different sub-networks. For various reasons the network between these sub-networks can be cut while each sub-network keeps working internally. The system as a whole is then split into several isolated regions, and these regions are called partitions.

Why can't we have all three?

First of all, a distributed system cannot avoid partitions, so partition tolerance must be satisfied. Let's look at what happens to consistency and availability once partition tolerance is a given.

Suppose there are two partitions, N1 and N2. N1 holds the data copy D1 and runs the service S1; N2 holds the data copy D2 and runs the service S2.

  • To satisfy consistency, N1 and N2 must hold the same value: D1 = D2.
  • To satisfy availability, a request to either N1 or N2 must get a timely response.

OK, here's the scenario:

  • The user accesses N1 and modifies D1.
  • The user makes another request, and this one lands on N2. Because of the partition the change has not been synchronized, so D1 and D2 are inconsistent.

There are now two options:

  • Ensure consistency: D1 and D2 are inconsistent, so to preserve consistency N2 must not return the inconsistent data. Availability is not guaranteed.
  • Ensure availability: N2 responds immediately, so availability is guaranteed, but the data it returns is inconsistent with D1. Consistency is not guaranteed.

Therefore, once partition tolerance is required, consistency and availability conflict with each other.
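The two-node scenario can be played out in a few lines of Python. This is only a toy sketch: the `Cluster` class, the `mode` flag and the `Unavailable` exception are names invented for illustration, and a real CP system would usually also block or reject the write itself (for example by requiring a quorum) rather than only refusing the read.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""


class Cluster:
    """Two replicas, N1 and N2, holding copies D1 and D2 of one value."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"N1": 0, "N2": 0}   # D1 and D2
        self.partitioned = False         # True while N1 and N2 cannot talk

    def write(self, node: str, value: int) -> None:
        # The write lands on one node, as in the scenario above.
        self.data[node] = value
        if not self.partitioned:
            # Healthy network: replicate to every node right away.
            for other in self.data:
                self.data[other] = value

    def read(self, node: str) -> int:
        if self.partitioned and self.mode == "CP":
            # The node cannot confirm its copy is current, so it refuses.
            raise Unavailable(f"{node} cannot reach its peer")
        # AP (or a healthy network): answer immediately from the local copy.
        return self.data[node]


def run_scenario(mode: str) -> str:
    cluster = Cluster(mode)
    cluster.partitioned = True       # the network between N1 and N2 breaks
    cluster.write("N1", 42)          # the user modifies D1 on N1
    try:
        # The next request lands on N2.
        return f"N2 answered {cluster.read('N2')}"
    except Unavailable as exc:
        return f"N2 refused: {exc}"


print(run_scenario("CP"))  # N2 refuses: consistent but not available
print(run_scenario("AP"))  # N2 answers with the old value: available but stale
```

In CP mode N2 gives up availability to avoid returning D2, which no longer matches D1; in AP mode it answers at once with the stale value.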

CAP trade-offs

Since the three properties of CAP cannot all be satisfied at once, some trade-offs have to be made.

CA without P ❌

If P is not required (that is, partitions are assumed never to happen), then C (strong consistency) and A (availability) can both be guaranteed. But in a distributed system partitions are an objective fact, so in theory a distributed system cannot choose CA.

CP without A

If A (availability) is not required, every request must be strongly consistent across the servers. A partition (P) can then make synchronization take indefinitely long, so requests may block or fail until the partition heals, but CP is guaranteed. Many distributed transactions in traditional databases follow this pattern.

AP without C

To be highly available while tolerating partitions, consistency has to be given up. Once a partition occurs, nodes may lose contact with each other, and to stay highly available each node can only serve requests from its local data, which can lead to globally inconsistent data. Many NoSQL systems today fall into this category.

Practical application of CAP principles

We've all worked with microservices. Common components used as service registries include ZooKeeper, Eureka, Nacos…

  1. ZooKeeper guarantees CP. Read requests to ZooKeeper get a consistent result at any time. However, ZooKeeper does not guarantee that every request can be served. For example, the service is unavailable during leader election or when more than half of the machines are unavailable.
  2. Eureka guarantees AP. Eureka is designed to put A (availability) first. There is no leader node in Eureka; all nodes are peers. So, unlike ZooKeeper, Eureka does not become unavailable during an election or when more than half of the machines are down. Eureka guarantees that the failure of most nodes does not affect normal service, as long as at least one node is still available. The only caveat is that the data on that node may not be up to date.
  3. Nacos supports both CP and AP.
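The difference between the two availability rules can be sketched as two predicates. This is a deliberately simplified model, not ZooKeeper's or Eureka's actual API: the function names are made up, leader election is not modelled, and the quorum rule is assumed to be a strict majority of nodes.

```python
def cp_registry_available(alive: int, total: int) -> bool:
    """Majority-quorum rule: serve only if more than half of the nodes are up."""
    return alive > total // 2


def ap_registry_available(alive: int) -> bool:
    """Peer-to-peer rule: serve as long as at least one node is up."""
    return alive >= 1


TOTAL = 5  # hypothetical cluster size
for alive in range(TOTAL, -1, -1):
    print(
        f"{alive}/{TOTAL} nodes up: "
        f"CP serves={cp_registry_available(alive, TOTAL)}, "
        f"AP serves={ap_registry_available(alive)}"
    )
```

With five nodes, the quorum-based registry stops serving once only two nodes are left, while the peer-to-peer registry keeps answering (possibly with stale data) down to the last node.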

BASE theory

What is BASE theory?

BASE is an acronym for Basically Available, Soft state, and Eventually consistent.

BASE theory is the result of trading off consistency (C) against availability (A) in CAP. It summarizes the distributed-systems practice of large-scale Internet systems and evolved gradually from the CAP theorem, greatly relaxing the requirements we place on the system.

The core idea of BASE theory is as follows:

Even if strong consistency cannot be achieved, each application can use a method suited to its own business characteristics to achieve eventual consistency.

The three characteristics of BASE theory

Basically available

What is basic availability?

When an unexpected failure occurs, the system is allowed to lose some availability, but it must not become completely unavailable.

What does this loss of availability mean?

  • Loss in response time: normally a search engine returns results within 0.5 seconds, while in a basically available state it may take 2 seconds.

  • Loss of functionality: On an e-commerce site, users can normally complete every order without a hitch. But during a big promotion, some shoppers may be directed to a degraded page to protect the stability of the shopping system.
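Loss of functionality is usually implemented as a fallback path. The sketch below is a minimal illustration, assuming a hypothetical load figure between 0 and 1 and an arbitrary 0.9 threshold; the function names and the `Overloaded` exception are invented for this example.

```python
class Overloaded(Exception):
    """Raised when the full-featured path cannot take more traffic."""


LOAD_THRESHOLD = 0.9  # hypothetical cut-off, chosen only for illustration


def full_order_page(user: str, load: float) -> str:
    # The normal path: every feature is available.
    if load > LOAD_THRESHOLD:
        raise Overloaded(f"load {load:.2f} exceeds {LOAD_THRESHOLD}")
    return f"full order page for {user}"


def degraded_page() -> str:
    # The fallback path: a cheap page that protects the core system.
    return "degraded page: please try again shortly"


def render(user: str, load: float) -> str:
    try:
        return full_order_page(user, load)
    except Overloaded:
        # Give up some functionality instead of failing completely.
        return degraded_page()


print(render("alice", 0.50))  # normal day
print(render("alice", 0.95))  # promotion peak
```

The request always gets an answer; only the quality of the answer drops under load, which is the point of basic availability.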

Soft state

Soft state means the system allows data to be in an intermediate state (the data inconsistency described in CAP theory) and considers that this intermediate state does not affect the overall availability of the system. In other words, the system is allowed to delay data synchronization between the data copies on different nodes.

Eventual consistency

Eventual consistency emphasizes that all copies of the data in the system eventually reach a consistent state after a period of synchronization. The essence of eventual consistency is therefore that the system must guarantee that the data ends up consistent, rather than guaranteeing strong consistency in real time.

There are three levels of distributed consistency:

  1. Strong consistency: Once a write completes, every subsequent read returns the written value.
  2. Weak consistency: The system does not guarantee that the latest written value can be read, nor how long it takes before the latest data can be read. It only tries to make the data consistent at some point in time.
  3. Eventual consistency: A stronger form of weak consistency. The system guarantees that the data becomes consistent within a certain period of time.

Eventual consistency is the level the industry generally prefers, but scenarios with very strict data consistency requirements, such as bank transfers, still need strong consistency.
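A small sketch can make the intermediate state visible. It assumes a single primary that queues its writes and a replica that applies them later; the `Primary` and `Replica` classes and the `sync` method are illustrative names, not any particular database's API.

```python
from collections import deque
from typing import Deque, Optional


class Primary:
    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.pending: Deque[str] = deque()   # writes not yet replicated

    def write(self, value: str) -> None:
        self.value = value
        self.pending.append(value)           # replication happens later


class Replica:
    def __init__(self) -> None:
        self.value: Optional[str] = None

    def sync(self, primary: Primary) -> None:
        # Apply queued writes in order until nothing is pending.
        while primary.pending:
            self.value = primary.pending.popleft()


primary, replica = Primary(), Replica()
primary.write("v1")
primary.write("v2")

# Soft state: the replica lags behind, and the system tolerates it.
print("before sync:", primary.value, replica.value)

replica.sync(primary)

# After synchronization the two copies agree again.
print("after sync: ", primary.value, replica.value)
```

Between the write and the call to `sync`, a read from the replica returns old data; the guarantee is only that the copies converge once synchronization has run.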

How do you guarantee eventual consistency?

  • Read repair: Detect inconsistent data while reading and repair it. Cassandra's Read Repair is an example: when data is queried from a Cassandra cluster and the copies on different nodes are found to be inconsistent, the system repairs the data automatically.
  • Write-time repair: Detect inconsistent data while writing and repair it. Cassandra's Hinted Handoff is an example: when a remote write between nodes in a Cassandra cluster fails, the data is cached and then retransmitted periodically to repair the inconsistency.
  • Asynchronous repair: The most common approach. The consistency of the data copies is checked through periodic reconciliation, and any differences are repaired.
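The read-repair idea can be sketched as follows. This is not Cassandra's implementation or API: replicas are plain dictionaries, each value carries an integer version (an assumption made here; real systems typically compare timestamps), and `read_with_repair` is a made-up name.

```python
from typing import Dict, List, Tuple

# Each replica maps key -> (version, value). Higher version means newer.
Replica = Dict[str, Tuple[int, str]]


def read_with_repair(replicas: List[Replica], key: str) -> str:
    # Collect every copy of the key that exists on some replica.
    copies = [replica[key] for replica in replicas if key in replica]
    if not copies:
        raise KeyError(key)

    # The copy with the highest version wins.
    newest = max(copies, key=lambda copy: copy[0])

    # Repair: overwrite stale or missing copies with the newest one.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest

    return newest[1]


replicas: List[Replica] = [
    {"cart": (2, "book, pen")},   # up to date
    {"cart": (1, "book")},        # stale
    {},                           # missed the write entirely
]

print(read_with_repair(replicas, "cart"))
print(replicas)
```

After the read, all three replicas hold version 2 of the key, so the inconsistency was repaired as a side effect of serving the request.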

Conclusion

CAP is a design theory for distributed systems, BASE is an extension of the AP option in CAP, and ACID is the theory of transaction integrity in databases.

Strictly speaking, CAP is not a "pick two out of three" choice but a choice between CP and AP, because P (partition tolerance) generally has to be guaranteed.

BASE theory targets large-scale, highly available, scalable distributed systems. In contrast to the traditional ACID properties and their strong consistency model, BASE proposes sacrificing strong consistency to gain availability: data may be inconsistent for a period of time, but it must eventually reach a consistent state.

Do simple things repeatedly, do repetitive things carefully, and do serious things creatively.

I am a programmer who works hard.

Like and follow so you don't lose track of me. See you next time!


