Vitalik’s recent post on his blog, “A Guide to a 99% Fault-tolerant Consensus,” led many to believe that a new consensus algorithm, like “black tech,” had been born. However, as Vitalik himself said, this consensus algorithm is still the algorithm of the classical Byzantine general problem. Through analysis, we can see that the research and innovation of consensus algorithm still need to follow the proven theories such as CAP. On this basis, all kinds of classical distributed algorithms and encryption algorithms are transformed and applied to the blockchain field, and it is possible to achieve good results.

1. A new algorithm?

Vitalik posted a post on his blog called “A guide to a 99 percent Fault Tolerant Consensus.” All of a sudden, major media have released the news that “V god released the new algorithm only 1% nodes do not do evil”. Has the blockchain world entered a new chapter overnight? The answer may be somewhat depressing: this is not a new algorithm.

In fact, Vitalik makes it clear on his blog that “Leslie Lamport’s famous 1982 paper, The Byzantine General Problem, contains an algorithm [for increasing fault tolerance], which I will attempt to describe and implement in a simplified form.” He later stressed on Twitter, “I didn’t invent a 99% fault tolerant consensus protocol, Leslie Lamport did. I just made an interpretation and adapted the algorithm to the blockchain domain.”

What the hell is going on here? In order to understand the context, we need to discuss the consensus problem and distributed systems and other theoretical issues.

2. Distributed system theory

In fact, before the birth of blockchain, there have been many studies on consistency in computer science, and a strictly proven theory of distributed systems has been formed. The more classical theories include FLP and CAP.

The principle of FLP impossibility is: “In the minimal asynchronous model system with reliable network and node failure (even if there is only one), there is no deterministic algorithm that can solve the consistency problem. “That the theoretical lower limit of the consistency problem is unsolvable. There is no consensus algorithm that can be implemented in any scenario in asynchronous distributed system.

The CAP principle is called the IMPOSSIBLE triangle of CAP, that is, Consistency, Availablity and Partition cannot be satisfied at the same time, and a certain characteristic needs to be weakened to design a distributed system. Therefore, on the premise of FLP impossibility principle, CAP principle provides theoretical guidance for engineering practice.

More generally than CAP, distributed system theory also defines network environment characteristics, including Safety, Liveness, and Unreliable communication. Through these features, we can define CAP in a more general and intuitive way:

Therefore, in general, network partition tolerance P is not an optional factor, but must be considered in the algorithm. This is why distributed systems typically have a trade-off between Safety and Liveness.

Of these, the last error of any type is the most serious and troublesome. In this scenario, where errors of any kind can occur, the server may produce output that it should not, and the system needs to be prepared for the worst. For example, when a server sends opposite messages to different servers. This type of error, known as the Byzantine error, was first described and analyzed by Pease and Lamport et al in the early 1980s through the Byzantine general problem.

Therefore, the design and implementation of the consistency problem of blockchain is more complex than that of distributed database, which is one of the reasons why blockchain is not just a simple distributed database.

3. The classical solution of the Byzantine general problem

The BFT problem itself is not described in this article. Lamport et al. in their classic paper, in addition to the Byzantine general problem, also offer two solutions.

The first is the OM(M) protocol for oral messages, which does not allow the use of any encryption algorithm except encryption security on the link. This protocol requires a large number of messages to be transmitted recursively between two pairs, so the message complexity is very high and exponential, which is not practical. However, this algorithm still has its high value. Firstly, it lays a foundation for the birth of Practical Byzantine Fault Tolerance, a polynomial complexity protocol. In addition, the number of 1/3 fault-tolerant nodes is proved to be the theoretical upper limit of this algorithm.

The second is the SM(M) protocol for “encrypted messages”. This algorithm differs from the first one in the use of signature algorithm. Each node can generate an unforgeable signature that can be verified by other nodes. After receiving a message, the node checks and verifies whether the message has been received by signing it. Message consensus ends when no message is received.

This paper has proved that the second algorithm can be fault-tolerant to any number of nodes (of course, at least 2 normal nodes should be included in the network, otherwise it is meaningless). Specific process can refer to the original paper.

However, this algorithm also has its limitations: different from many Byzantine algorithms in an asynchronous or semi-synchronous network environment, it assumes that it is carried out in a “synchronous” network, ignoring the communication delay between network nodes; In addition, the signature identity system information needs to be determined before the network operation, so it is difficult to implement expansion. Therefore, according to CAP theory, this approach achieves A high consistency (C) and availability (A) without considering the tolerance of network partitioning (P).

4. “99% fault-tolerant consensus Algorithm” and comparative analysis

As a comparison, we continue to look at what Vitalik calls the consensus algorithm (hereinafter referred to as the “implementation version”) “invented by Lamport and described and simplified by himself”; The version in Lamport’s paper becomes the “original version”).

The implementation version still retains the original digital signature system, that is, each node can generate an unforgeable signature, which can be verified by other nodes.

With the original version, in order to achieve the messaging between nodes, realize the timeout specifies the message version of the algorithm, namely node receives a message, check for news in addition to see if the signature has been received and in the collection, but also check the time should not be later than the signature of the received message corresponding to the time node.

After a certain amount of time (calculated according to the rounds), the node stops listening and selects a value from the checked valid messages according to some certain rule as the consensus result.

By comparison, we can find that the two algorithms are not fundamentally different, and both algorithms are based on signature system in essence. The Vitalik implementation version of the consensus algorithm increases the latency time requirement. This design implementation is often encountered in the concrete writing implementation of a consensus protocol to determine the propagation of the message and ensure that the message propagation ends at a specified time.

In addition, the implementation discusses the Observer as an independent role in passing messages across the network. The observer, as a passive viewer in the network, can receive, examine, and directly forward (without signing) messages to other nodes. This requires the introduction of a different delay time for the observer to solve the problem of the malicious node deliberately sending messages to the observer late, causing normal messages to time out.

5. Application and inspiration

As discussed above, the main problems of this kind of consensus approach are high synchronization requirements and poor scalability for the network. Another disadvantage of the implementation version is that the message volume is also large (the process of N nodes sending messages to other N-1 nodes in n-1 rounds is required, that is, the message complexity is), so it is difficult to directly apply this kind of consensus method in practical scenarios.

In order to be more applicable to the blockchain field, Vitalik also mentioned in his article that this method can be combined with other consensus algorithms (such as PBFT, PoS, etc.) at present. For example, some nodes can be randomly selected to run the consensus at certain intervals by using the observer mode discussed above. However, if the relevant assumptions of the two consensus algorithms cannot be satisfied, the consensus algorithm will also become invalid, that is, the improved optimization cannot violate the original theoretical system.

However, there are still many lessons to be learned: taking the classical theories of distributed systems and transforming them into consensus algorithms for blockchain can yield surprising results. For example, PBFT is combined with Satoshi consensus, SM(M) algorithm in BFT paper proposed by Vitalik is combined with existing blockchain consensus, and so on.

In addition, trying to apply a variety of encryption algorithms in the security field may also yield good results. For example, ByzCoin and others are also trying to use encryption algorithms such as aggregate signatures to optimize the consensus mechanism, which can greatly reduce communication complexity.

6. Summary

The research and innovation of consensus algorithm still need to follow the proven theories such as CAP.

On the basis of these theories, various classical algorithms in computer distributed theory and various encryption algorithms in the field of security can be adapted to apply in the field of blockchain, and it is possible to obtain good results.

The resources

  1. A Guide to 99% Fault Tolerant Consensus, Vitalik, https://vitalik.ca/general/2018/08/07/99_fault_tolerant.html

  2. “Distributed System (Third Edition) Version 3.01”, Maarten Van Steen, Andrew S. Tanenbaum.

  3. “The Byzantine Generals Problem”, Leslie Lamport, Robert Shostak, Marshall Pease https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf

  4. https://en.wikipedia.org/wiki/Byzantine_fault_tolerance#Early_solutions

  5. https://www.trustnodes.com/2018/08/10/vitalik-buterin-proposes-consensus-algorithm-requires-1-honest

Contributions: fire currency institute (https://mp.weixin.qq.com/s/fd3N7xnFGIVEIT5kW_S85Q)