• Distributed consistency
  • Consensus algorithm
  • Conclusion
  • References

Blockchain technology has become very popular in recent years. Cryptocurrencies led by Bitcoin are known to countless people, but few of them study the underlying technology, that is, how cryptocurrencies such as Bitcoin work as distributed networks.

Whether it is Bitcoin, Ethereum, or EOS, any distributed network must first solve the problem of distributed consistency: how all nodes reach consensus on the same proposal or value. This problem is already hard in a distributed cluster where every node can be trusted, let alone in a complex blockchain network.

Distributed consistency

In a distributed system, ensuring that all nodes in the cluster store the same data and can reach agreement on a proposal is the core problem for normal operation, and consensus algorithms are used to guarantee this consistency.

However, because a distributed system involves multiple nodes, many complex situations can arise. As the number of nodes increases, node failures and downtime become commonplace, and handling the various boundary conditions and unexpected situations further increases the difficulty of solving the distributed consistency problem.

Besides node failures, which are the main obstacle to reaching consistency, interfered or even blocked network communication between nodes and differences in the running speed of individual machines also make distributed consistency hard to achieve.

CAP

In the fall of 1998, Eric Brewer, a professor at the University of California, Berkeley, first proposed the CAP theory. The conjecture was later formalized and proved by Seth Gilbert and Nancy Lynch in the 2002 paper Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.

The paper proves two very interesting theorems. The first: in the asynchronous network model, where nodes have no clocks and can act only on the messages they receive, it is impossible to guarantee consistency, availability, and partition tolerance at the same time; every system can satisfy at most two of the three properties.

The consistency discussed here is strong consistency: all nodes that receive the same set of operations execute them in exactly the same order, and an update committed on one node is immediately reflected on every other node reachable through the asynchronous or synchronous network. If we want both consistency and partition tolerance in an asynchronous network, the only option is to store all data on a central node and have the other nodes route requests to it.

In the real world, however, there is no absolutely asynchronous network environment. If we allow each node to have its own clock, then even though the clocks show different times, they tick at the same rate, so a node can use the interval measured by its clock to judge how long it has waited for a message. Under this looser, partially synchronous model we can build more powerful services.

The paper's second theorem shows that even in a partially synchronous network we still cannot guarantee consistency, availability, and partition tolerance at the same time. The proof is actually quite simple; you can read section 4.2 of the paper directly. The difference is that clocks let a node know how long a message has gone unanswered, so the problem of lost messages can be mitigated to some extent by timeouts.
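
As a small illustration of that last point, the Go sketch below uses a local timer to turn an unanswered request into a detected failure; the channel, timeout value, and error message are made-up details for the demonstration, not part of any protocol discussed here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// callWithTimeout waits for a reply from a peer. In a partially synchronous
// network we cannot tell "slow" from "dead", but a local clock lets us give
// up after a timeout and suspect the peer has failed instead of blocking
// forever the way a clock-less asynchronous node would.
func callWithTimeout(reply <-chan string, timeout time.Duration) (string, error) {
	select {
	case r := <-reply:
		return r, nil
	case <-time.After(timeout):
		return "", errors.New("peer did not answer in time; suspect failure")
	}
}

func main() {
	reply := make(chan string)
	// Nothing ever writes to reply, simulating a lost message.
	if _, err := callWithTimeout(reply, 100*time.Millisecond); err != nil {
		fmt.Println(err)
	}
}
```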

Because networks have latency, a distributed system cannot achieve strong consistency while also guaranteeing availability. What we can do is relax the consistency requirement and trade consistency off against availability; this trade-off is the first issue to consider when designing a distributed system. Strong consistency reduces a system's availability, and simply handing the job of receiving requests over to other nodes is not a solution for highly concurrent services, so today's mainstream distributed systems choose eventual consistency.

Eventual consistency allows the states of multiple nodes to conflict, but requires that the nodes be able to communicate and resolve those conflicts within a bounded time, eventually converging to a consistent state. Two conditions matter here: first, the nodes must be able to communicate with one another normally; second, conflicts must be resolved within finite time. Eventual consistency is achieved only when both conditions hold.
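
To make these conditions concrete, here is a minimal last-write-wins register in Go. It is only an illustrative sketch of conflict resolution, not how any particular system implements eventual consistency: once every pair of replicas has exchanged and merged state, all replicas hold the same value.

```go
package main

import "fmt"

// Register is a last-write-wins register: each replica keeps a value plus
// the (logical) timestamp of its latest update. Names are illustrative.
type Register struct {
	Value     string
	Timestamp int64
}

// Merge resolves a conflict between two replicas deterministically: the
// higher timestamp wins, so replicas that keep exchanging state converge
// on the same value in finite time, eventual consistency in miniature.
func Merge(a, b Register) Register {
	if b.Timestamp > a.Timestamp {
		return b
	}
	return a
}

func main() {
	r1 := Register{Value: "x=1", Timestamp: 1}
	r2 := Register{Value: "x=2", Timestamp: 2} // concurrent update elsewhere
	fmt.Println(Merge(r1, r2).Value)           // both replicas end with "x=2"
}
```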

The Byzantine Generals Problem

The Byzantine Generals Problem is a fault-tolerance problem in distributed computing proposed by Leslie Lamport in the paper The Byzantine Generals Problem. It is the most complex and strict fault-tolerance model in the distributed domain.

In this model, the system imposes no restrictions on the nodes in the cluster: they may send arbitrary or erroneous data to other nodes, or choose not to respond to other nodes' requests at all. These unpredictable behaviors make the fault-tolerance problem far more complicated.

The Byzantine Generals Problem describes a group of generals, each commanding part of an army. No general knows whether the others are reliable or whether their messages are reliable, yet together they must vote on whether to attack or retreat:

In the figures in this section, yellow means the status is unknown, green means attack, blue means retreat, and red means the current general's information is unreliable.

Whether or not any particular general is reliable, as long as all the generals agree on the same plan, the question of attacking or retreating poses no problem:

Neither outcome would do much damage in that case. But if one of the generals tells some generals to attack and tells the others to retreat, a very serious problem arises.

Because there is a traitor among the generals, or a message is intercepted and altered in transit, some generals will choose to attack while the rest choose to retreat, each side believing its choice is the majority's; this is a serious inconsistency.

The Byzantine Generals Problem represents the strictest fault-tolerance requirement for distributed systems, but it is not what most everyday distributed systems face; what we usually encounter are nodes that fail, go down, or stop responding, which greatly simplifies the fault-tolerance requirements. However, distributed systems like Bitcoin and Ethereum do need to consider Byzantine fault tolerance, and we will explain how they address it below.

FLP

The FLP impossibility theorem is one of the most important theorems in the field of distributed systems. It gives a very important conclusion: in an asynchronous model with a reliable network where even a single node may fail, there is no deterministic algorithm that can solve the consistency problem.

As the original paper puts it: "In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable: it delivers all messages correctly and exactly once."

In fact, this theorem tells us not to waste time designing algorithms for asynchronous distributed systems that can reach consensus in every scenario: there is no guarantee that an asynchronous system can reach consensus in bounded time. We will not attempt to prove the FLP impossibility theorem here; for more information, read Impossibility of Distributed Consensus with One Faulty Process.

Consensus algorithm

In the previous section, we briefly described the problems and challenges that distributed systems face. In this section we will introduce the principles of several consensus algorithms: Paxos and Raft from traditional distributed systems, and proof of work (POW), proof of stake (POS), and delegated proof of stake (DPOS) from the cryptocurrency field. By walking through these consensus algorithms, readers should gain a deeper understanding of distributed consistency and consensus.

Paxos and Raft

Paxos and Raft are two well-known consensus algorithms for solving consistency problems in distributed systems. Both can solve such problems, but the implementation and proof of the former are very difficult to understand, while the latter is simple and intuitive; Raft was created precisely to address how hard Paxos is to understand and implement.

Let's first briefly introduce what Paxos is. Paxos is actually a family of protocols for solving distributed consistency: it keeps the nodes of a distributed network consistent even when errors occur. Proposed by Leslie Lamport, Paxos can guarantee the consistency of a system's nodes as long as there are no malicious nodes, and it was the first consensus algorithm proved to be complete. The complete consensus algorithms in use today, including Raft, are essentially variants of Paxos.

As a family of protocols, Paxos includes Basic Paxos, Multi-Paxos, Cheap Paxos, and other variants. In this section, we will briefly introduce Basic Paxos and Multi-Paxos.

Basic Paxos

Basic Paxos is the most fundamental protocol in the Paxos family. Each instance of Basic Paxos ultimately chooses a single value. In a distributed system that uses Paxos as its consensus algorithm, each node can take on three roles: Proposer, Acceptor, and Learner:

Here we will ignore the last role, Learner, to simplify the protocol for the reader. Paxos runs in two phases, Prepare and Accept; when a Proposer receives a request from a client, it enters the following two phases:

The above screenshot is taken from page 12 of the Paxos Lecture (Raft User Study).

A Proposer sends two kinds of RPC requests, Prepare and Accept, to the Acceptors while running the consensus algorithm. An Acceptor accepts or rejects a proposal based on its minProposal, acceptedProposal, and acceptedValue state; once a proposal has been accepted by a majority of Acceptors, we consider the proposal accepted by the entire cluster.

In the picture above, S1 and S5 receive requests X and Y from clients respectively. S1 first sends Prepare and Accept RPCs to S2 and S3, and all three servers accept S1's proposal X. After that, S5 issues a Prepare(2.5) request to S3 and S4; since S3 has already accepted X, it returns the accepted proposal and its value (1.1, X). S5 then adopts the received value X in place of its own proposal Y and re-sends Accept(2.5, X) to the other servers, so eventually all servers agree on and choose the same value.
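
To make the two phases more concrete, here is a minimal, single-process Go sketch of the Acceptor side. The field names mirror the minProposal, acceptedProposal, and acceptedValue state mentioned above, but everything else (the types, the absence of RPC plumbing, the replayed proposal numbers) is an illustrative assumption, not a full Paxos implementation.

```go
package main

import "fmt"

// Acceptor keeps the three pieces of state mentioned above: minProposal,
// acceptedProposal, and acceptedValue. Proposal numbers are floats only to
// mimic the "1.1", "2.5" style used in the lecture slides.
type Acceptor struct {
	minProposal      float64
	acceptedProposal float64
	acceptedValue    string
}

// Prepare promises to ignore proposals numbered below n and reports any
// already-accepted proposal so the Proposer can adopt its value.
func (a *Acceptor) Prepare(n float64) (promised bool, accN float64, accV string) {
	if n > a.minProposal {
		a.minProposal = n
		return true, a.acceptedProposal, a.acceptedValue
	}
	return false, a.acceptedProposal, a.acceptedValue
}

// Accept accepts (n, v) unless a higher-numbered Prepare has been seen.
func (a *Acceptor) Accept(n float64, v string) bool {
	if n >= a.minProposal {
		a.minProposal, a.acceptedProposal, a.acceptedValue = n, n, v
		return true
	}
	return false
}

func main() {
	// Replay S3's role in the example: it accepts (1.1, X), so a later
	// Prepare(2.5) returns the accepted pair and the proposer must adopt X.
	acc := &Acceptor{}
	acc.Prepare(1.1)
	acc.Accept(1.1, "X")
	_, n, v := acc.Prepare(2.5)
	fmt.Println(n, v) // 1.1 X
}
```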

Check out the Paxos Lecture (Raft User Study) video for more information on how the Paxos protocol works.

Multi-Paxos

Since most distributed clusters need to agree on a whole series of values, using Basic Paxos for such a stream of requests incurs a significant performance penalty. Multi-Paxos is an enhanced version of Basic Paxos: if the Leader in the cluster is stable, the Prepare phase is usually unnecessary, which cuts the number of RPCs in half.

The figure above shows Multi-Paxos during its stable phase. S1 is the Leader of the whole cluster; when the other servers receive requests from clients, they forward them to the Leader for processing.

Of course, the Leader role naturally brings another problem: how to elect the Leader. The paper Paxos Made Simple does not give a concrete implementation of Multi-Paxos or its details, so different Multi-Paxos implementations differ in all sorts of subtle ways.
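
As a rough sketch of why a stable Leader halves the RPC count, the hypothetical Go type below assigns each client command a consecutive log slot; having already run Prepare once for its proposal number, the Leader would need only a single Accept broadcast per command. All names here are invented for illustration.

```go
package main

import "fmt"

// Leader sketches a stable Multi-Paxos leader. Having completed the Prepare
// phase once for its proposal number (term), it needs only one Accept
// broadcast per client command, half the RPCs of Basic Paxos, which pays
// for Prepare + Accept on every single value.
type Leader struct {
	term     float64
	nextSlot int
	log      map[int]string
}

// Propose assigns the command the next consecutive log slot. The Accept
// RPC broadcast to the acceptors is elided; we just record it locally.
func (l *Leader) Propose(cmd string) int {
	slot := l.nextSlot
	l.nextSlot++
	l.log[slot] = cmd // stands in for Accept(term, slot, cmd) to a majority
	return slot
}

func main() {
	l := &Leader{term: 1.1, log: map[int]string{}}
	fmt.Println(l.Propose("x=1"), l.Propose("y=2")) // slots 0 and 1
}
```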

Raft

Raft is a variant of Multi-Paxos. By simplifying the Multi-Paxos model, Raft achieves a consensus algorithm that is easier to understand; both algorithms reach agreement on a continuous sequence of values.

Raft adds two restrictions on top of Multi-Paxos. First, log entries in Raft must be appended sequentially, whereas Multi-Paxos allows them to be appended concurrently, although in both algorithms the log is ordered from the perspective of the node's state machine. Second, Raft restricts Leader election: only the node with the newest and most complete log can be elected Leader, while Multi-Paxos imposes no such restriction because any node can append log entries; after a Multi-Paxos Leader is elected, the gaps in its log simply need to be filled in.

In Raft, every Follower's log is a subset of the Leader's, whereas Multi-Paxos does not guarantee this. Because Raft restricts both how logs can be appended and how elections proceed, it is much easier to implement.
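
As a small illustration of the election restriction, the Go sketch below follows the comparison described in section 5.4 of the Raft paper (compare the last log terms first, then the log lengths); the function name and the example values are my own, not taken from any real implementation.

```go
package main

import "fmt"

// isUpToDate implements Raft's voting restriction: a node grants its vote
// only if the candidate's log is at least as up-to-date as its own, which
// is what guarantees the elected Leader already holds every committed
// entry. Logs are compared by last term first, then by last index.
func isUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	// A candidate whose log ends at (term 2, index 5) cannot win the vote
	// of a node whose log ends at (term 3, index 4).
	fmt.Println(isUpToDate(2, 5, 3, 4)) // false
}
```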

In theory, Paxos supports concurrent log appends better than Raft, but it is more complex to understand and implement. Many people say that Paxos is science while Raft is engineering; when I need to implement a consensus algorithm, I choose Raft for its simpler implementation, avoiding the complications caused by boundary conditions.

This article doesn't go into the details of how Raft is implemented, but The Raft Consensus Algorithm website is a great resource for anyone interested in Raft.

POW (Proof-of-Work)

The consensus algorithms described in the previous section, Paxos and Raft, solve only non-Byzantine consistency problems and cannot handle the extreme situations that arise in open distributed networks. This is not a problem for traditional distributed systems: whether in distributed databases or message-queue clusters, internal nodes do not intentionally send wrong messages. In such systems the most common failures are nodes becoming unresponsive or going down, so under that premise these algorithms are valid and sufficient.

POW (Proof-of-Work) is a protocol proposed in 1993 by Cynthia Dwork and Moni Naor to deter denial-of-service attacks and service abuses such as spam; it can also help a distributed system achieve Byzantine fault tolerance.

The key feature of proof of work is its asymmetry: the node requesting service in a distributed system has to solve a moderately hard but feasible problem, while verifying the solution is very easy for the service provider. In other words, the problem is hard to solve but easy to verify.

This kind of problem usually requires a certain amount of CPU time to compute an answer. Bitcoin, currently the largest blockchain network, uses proof of work as its distributed consistency algorithm: all nodes in the network compete for the right to record the current block by solving the following puzzle, that is, finding a nonce for which the block header's double hash falls below the target:

$SHA256(SHA256(block\_header)) < target$

SHA-256 is a hash function, and inferring its input from its output is currently considered infeasible. The Bitcoin network therefore requires each node to constantly change the nonce to obtain different hash results until a result falls below a certain target. The current target (as of 2017.12.17) is:

0x0000000000000000000000000000000000000000000000000000017268d8a21a

The probability that a single SHA-256 computation produces a result below this target is about $1.37*10^{-65}$. The current total hash rate of the network is 13,919 PH/s, a staggering number. As the network's computing power changes, Bitcoin adjusts the difficulty of the puzzle to ensure that a block is found roughly every 10 minutes. In the Bitcoin network, whoever solves the current puzzle first gains the right to record the block and broadcasts it to as many Bitcoin nodes as possible through the Gossip protocol.
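
The mining loop itself is short. Below is a Go sketch of the same shape of puzzle with an artificially easy target: the header bytes and the difficulty are made up for the demonstration, but the double SHA-256 and the comparison against a target follow the structure of Bitcoin's puzzle.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// mine searches for a nonce such that SHA256(SHA256(header || nonce)) is
// below the target. Finding the nonce takes many hash attempts; verifying
// it takes a single double-SHA256 and one comparison.
func mine(header []byte, target *big.Int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint64(buf, nonce)
		first := sha256.Sum256(append(header, buf...))
		second := sha256.Sum256(first[:])
		if new(big.Int).SetBytes(second[:]).Cmp(target) < 0 {
			return nonce
		}
	}
}

func main() {
	// A target with ~16 leading zero bits: trivial next to Bitcoin's real
	// target, so this finishes in tens of thousands of attempts.
	target := new(big.Int).Lsh(big.NewInt(1), 256-16)
	fmt.Println("nonce:", mine([]byte("block header"), target))
}
```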

The principle behind proof of work is very simple, and the puzzle chosen by the Bitcoin network fits its definition well: solutions are hard to find yet easy to verify. We can understand proof of work as preventing erroneous or invalid requests by increasing the amount of work a client must do to request service, while a puzzle of suitable difficulty ensures that legitimate requests are not unduly affected.

Because proof of work requires enormous computing power, Bitcoin takes about 10 minutes to generate a block, and a block is only 1 MB, holding roughly three to four thousand transactions; on average only 5 to 7 transactions (single digits) can be processed per second, so congestion in the Bitcoin network is severe.

POS (Proof-of-Stake)

Proof of stake is another consensus algorithm used in blockchain networks. In proof-of-stake based cryptocurrencies, the creator of the next block is chosen at random, weighted by the stake and, in some designs, the coin age of the different nodes.
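
A stake-weighted random choice is easy to sketch. The Go snippet below picks the next block creator with probability proportional to stake; real chains derive their randomness from a verifiable seed rather than math/rand, and every name here is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickValidator chooses a node with probability proportional to its stake,
// the core selection rule of proof of stake. Map iteration order is random
// in Go, but the subtraction walk below stays proportional regardless.
func pickValidator(stakes map[string]int64, r *rand.Rand) string {
	var total int64
	for _, s := range stakes {
		total += s
	}
	n := r.Int63n(total) // uniform in [0, total)
	for node, s := range stakes {
		if n < s {
			return node
		}
		n -= s
	}
	return "" // unreachable when total > 0
}

func main() {
	stakes := map[string]int64{"A": 60, "B": 30, "C": 10}
	r := rand.New(rand.NewSource(42))
	// Over many draws, A is chosen ~60% of the time, B ~30%, C ~10%.
	fmt.Println(pickValidator(stakes, r))
}
```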

Since creating a new block costs almost no CPU time, a node loses nothing by being dishonest, which gives many nodes an incentive to cheat by mining on multiple chains simultaneously to maximize profit.

In early proof-of-stake algorithms, the network only rewarded the nodes that created blocks and imposed no penalties, so each node maximized its profit by voting on multiple chains at once, and the nodes in the network could hardly reach consensus on a single chain.

There are two ways to solve this nothing-at-stake problem. One is to use the Slasher protocol to penalize nodes that vote on multiple chains at the same time; the other is to directly penalize nodes that create blocks on the wrong chain. In short, the problem is solved not by the algorithm alone but by introducing economic incentives and penalties.

Compared with proof of work, proof of stake does not consume large amounts of electricity to secure the blockchain network, nor does it need to issue new currency in every block to motivate miners to participate, and it reduces the time needed to reach consensus; Ethereum, based on proof of stake, can process around 30 transactions per second.

DPOS (Delegated Proof-of-Stake)

Under the proof-of-stake algorithm introduced above, the whole blockchain network can be viewed as a company: those who contribute the most and hold the largest share have the most chances to get a say (that is, the right to record blocks). For minority shareholders, a stake of a few percent or even a fraction of a percent makes it hard to do anything; they can only collect the dividends and income their shares bring.

DPOS (Delegated Proof-of-Stake) lets every stakeholder elect representatives of their interests to take part in the competition for the right to record blocks, so many minority shareholders can vote for agents who protect their interests. The handful of elected nodes in the network can confirm 99.9% of transactions within one second, and EOS, which uses delegated proof of stake, can process hundreds of thousands of transactions per second while remaining relatively resistant to regulatory intervention.

In delegated proof of stake, each participant can vote for an arbitrary number of nodes to produce the next block. The N nodes receiving the most votes become block creators, and the creator of the next block is selected at random from that winning group; in addition, the value of N itself is decided by a vote of the entire network, which keeps the network as decentralized as possible.
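
The vote-tallying step can be sketched as follows; the delegate names, vote counts, and choice of N are invented for illustration, and real DPOS implementations add block-production scheduling and shuffling on top of this.

```go
package main

import (
	"fmt"
	"sort"
)

// electDelegates tallies stake-weighted votes and returns the N delegates
// with the most support, the set that takes turns producing blocks in DPOS.
// N itself is also voted on by the network; it is a plain parameter here.
func electDelegates(votes map[string]int64, n int) []string {
	type cand struct {
		name  string
		votes int64
	}
	cands := make([]cand, 0, len(votes))
	for name, v := range votes {
		cands = append(cands, cand{name, v})
	}
	// Sort by votes, descending, so the best-supported delegates come first.
	sort.Slice(cands, func(i, j int) bool { return cands[i].votes > cands[j].votes })
	if n > len(cands) {
		n = len(cands)
	}
	elected := make([]string, n)
	for i := 0; i < n; i++ {
		elected[i] = cands[i].name
	}
	return elected
}

func main() {
	votes := map[string]int64{"d1": 900, "d2": 700, "d3": 650, "d4": 100}
	fmt.Println(electDelegates(votes, 3)) // [d1 d2 d3]
}
```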

Conclusion

In this article, we first introduced the most important problem facing distributed systems, distributed consistency, and then introduced five different consensus algorithms, from Paxos and Raft, which handle only non-Byzantine faults, to POW, POS, and DPOS, which achieve Byzantine fault tolerance. It is interesting that the consensus algorithms that solve the Byzantine problem are actually the simpler ones to implement.

When an entire network needs Byzantine fault tolerance, it is very hard to achieve through the algorithm alone; additional incentives or penalties are usually needed, and the best way to safeguard consistency is to make honest behavior the most profitable choice for every node. From proof of work to proof of stake to delegated proof of stake, the different consensus algorithms produce very large differences in performance: the fewer nodes that "vote" in the network, the greater its processing capacity and the better its performance. Having delegated proof of stake elect N nodes to balance performance against decentralization is indeed a very clever design.

A more centralized network can indeed deliver better performance, but participants in cryptocurrencies tend to trust more decentralized mechanisms, because without incentives and penalties we cannot guarantee that the next node responsible for recording blocks will be honest. How to improve network performance while preserving decentralization is therefore something every blockchain network needs to consider.

References

  • Consensus (computer science)
  • Blockchain consensus algorithm (POW, POS, DPOS, PBFT) introduction and experience
  • Paxos and Raft
  • Proof-of-work system
  • Proof-of-stake
  • Proof of Stake FAQ · Ethereum Wiki
  • Delegated Proof of Stake
  • Delegated Proof-of-Stake Consensus
  • DPOS consensus algorithm – Missing white paper
  • Consensus algorithm
  • CAP theorem
  • Paxos Made Simple
  • The Raft Consensus Algorithm
  • In Search of an Understandable Consensus Algorithm
  • Talk about Paxos, Multi-Paxos, Raft
  • Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  • Eventual Consistency vs Strong Consistency?
  • Impossibility of Distributed Consensus with One Faulty Process
  • The Byzantine Generals Problem
  • Byzantine fault tolerance
  • A Brief Tour of FLP Impossibility
  • Neat Algorithms – Paxos
  • Proof of FLP Impossibility
  • Vernacular blockchain
  • Paxos
  • Paxos lecture (Raft user study)
  • Bitcoin: A Peer-to-Peer Electronic Cash System
  • Proof of Stake · Bitcoin Wiki
  • Slasher
  • Proof of Stake FAQ
