Most projects today are moving toward distributed architectures, and once a system is distributed, more complex scenarios and solutions come with it. For example, if you use Elasticsearch or ZooKeeper clusters in your system, have you ever heard of the “split brain” phenomenon? Do you know how these systems solve it?

If you haven’t, your understanding of distributed systems is still fairly superficial, and I recommend you read this article.

Taking ZooKeeper as an example, this article walks through what split brain looks like in a distributed system and how it is solved.

What is a split brain?

One common feature of cluster systems like Elasticsearch and ZooKeeper is that they have a “brain”: an Elasticsearch cluster has a Master node, and a ZooKeeper cluster has a Leader node.

The Master or Leader node of a cluster is usually elected. As long as the network is healthy, a single Leader can be elected (taking ZooKeeper as the example). But if the cluster spans two machine rooms and the network between them fails, the election mechanism may end up choosing a Leader in each network partition. When the network recovers, how do the two Leaders synchronize their data? Which one should the cluster obey? This is the “split brain” phenomenon.

Colloquially, split brain means exactly that: one brain splits into two or more. Imagine a person with several brains that act independently of each other; the body would “dance around” and “not obey orders”.

With the basic concept of split brain in mind, let’s take a ZooKeeper cluster as an example and analyze how split brain occurs.

Split brain in the ZooKeeper cluster

In practice we rarely see split brain with ZooKeeper, because ZooKeeper already takes measures to reduce or avoid it. ZooKeeper’s specific solution is described later. For now, assume ZooKeeper did not take these measures, and see how the split brain problem would arise.

Suppose a cluster of six zkServer instances is deployed across two machine rooms:

Normally, the cluster has only one Leader; when the Leader fails, the other five servers elect a new one.

If the network between machine room 1 and machine room 2 fails, and we ignore ZooKeeper’s majority mechanism for the moment, the following happens:

The three servers in machine room 2 detect that they have no Leader, so they elect a new one. One cluster has effectively split into two, and two “brains” exist at the same time; this is the “split brain” phenomenon.

Since one cluster has become two, both keep serving requests, and after a while their data may diverge. When the network recovers, the cluster faces questions such as who should be the Leader, how to merge the data, and how to resolve data conflicts.

Of course, the process above assumes that ZooKeeper does nothing to prevent split brain. So how does ZooKeeper actually deal with it?

ZooKeeper’s majority (over-half) principle

There are several ways to prevent split brain; ZooKeeper’s default is the “over-half” (majority) rule: during Leader election, a zkServer can become Leader only if it obtains more than half of the votes.

The underlying source code implementation is as follows:

import java.util.Set;

public class QuorumMaj implements QuorumVerifier {
    int half;

    // The QuorumMaj constructor: n is the number of voting nodes in the cluster
    public QuorumMaj(int n) {
        this.half = n / 2;
    }

    // set.size() is the number of votes this zkServer has received
    public boolean containsQuorum(Set<Long> set) {
        return (set.size() > half);
    }
}

The code above constructs a QuorumMaj object from the number of voting nodes in the cluster. The containsQuorum method determines whether this zkServer has received more than half of the votes; set.size() is the number of votes it has received.

There are two key points in this code: first, how half is calculated; second, how the vote count is compared against half.

In the 6-node deployment shown above, half = 6 / 2 = 3, which means a server needs at least 4 votes to become Leader. Therefore, when the network between the two machine rooms is cut, neither side can elect a Leader, because each room has only 3 servers. In this case the entire cluster has no Leader at all.
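As a quick check, here is a minimal sketch of this calculation, assuming the QuorumMaj class (and its QuorumVerifier interface) shown above is on the classpath; the server ids in the vote sets are made up for illustration:

import java.util.Set;

public class SixNodeQuorumDemo {
    public static void main(String[] args) {
        QuorumMaj verifier = new QuorumMaj(6);            // 6 voting nodes, so half = 6 / 2 = 3

        Set<Long> oneRoomVotes = Set.of(1L, 2L, 3L);      // a partitioned room can gather at most 3 votes
        Set<Long> majorityVotes = Set.of(1L, 2L, 3L, 4L); // 4 votes, only possible when the rooms can talk

        System.out.println(verifier.containsQuorum(oneRoomVotes));  // false: 3 is not > 3, no Leader
        System.out.println(verifier.containsQuorum(majorityVotes)); // true:  4 > 3, a Leader can be elected
    }
}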

Without a Leader, ZooKeeper cannot serve requests, so this situation should be avoided when designing and deploying the cluster.

Now change the deployment across the two machine rooms from 3:3 to 3:2, i.e. three servers in room 1 and two servers in room 2:

In this case, half = 5 / 2 = 2, so more than 2 machines (at least 3) are needed to elect a Leader. Machine room 1 can therefore still elect a Leader normally, while room 2, with only two servers, cannot. The cluster as a whole still has exactly one Leader.
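The same check applied to the 3:2 deployment, again as a sketch reusing the QuorumMaj class from above with invented server ids:

import java.util.Set;

public class ThreeTwoPartitionDemo {
    public static void main(String[] args) {
        QuorumMaj verifier = new QuorumMaj(5);     // 5 voting nodes, half = 5 / 2 = 2

        Set<Long> room1Votes = Set.of(1L, 2L, 3L); // room 1 still has 3 servers after the partition
        Set<Long> room2Votes = Set.of(4L, 5L);     // room 2 has only 2 servers

        System.out.println(verifier.containsQuorum(room1Votes)); // true:  3 > 2, room 1 elects the Leader
        System.out.println(verifier.containsQuorum(room2Votes)); // false: 2 is not > 2, room 2 cannot
    }
}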

Conversely, suppose room 1 has only 2 servers and room 2 has 3. When the network is cut, the election situation is as follows:

By using the majority mechanism, a ZooKeeper cluster ends up with either no Leader or exactly one Leader, which avoids the split brain problem.

Besides preventing split brain, the majority mechanism also allows elections to finish quickly: a Leader can be chosen without waiting for every zkServer to vote for the same node, which is why it is also called the fast Leader election algorithm.

The struggle between the old and new Leader

The majority principle prevents split brain caused by a machine-room partition, but there is another situation: a Leader in “false death”, one that appears dead but is not.

Suppose the Leader is judged dead and the remaining followers elect a new one. If the old Leader then “comes back to life” and still believes it is the Leader, its write requests to the followers will simply be rejected.

ZooKeeper maintains a variable called epoch. Every time a new Leader is elected, a new epoch is generated (marking the current Leader’s “reign”), and epochs only increase. Any request carrying an epoch smaller than the current Leader’s epoch is rejected.

Could some followers be unaware that a new Leader exists? Yes, but they can never be the majority; otherwise the new Leader could not have been elected in the first place. ZooKeeper writes also follow the majority rule, so a write that is not acknowledged by a majority is invalid, and the old Leader has no effect even if it still considers itself the Leader.
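The epoch check can be pictured with a small sketch. This is not ZooKeeper’s actual implementation, only an illustration of the idea: a follower remembers the highest epoch it has seen and rejects any request carrying a smaller one.

public class EpochCheckSketch {
    private long currentEpoch; // highest Leader epoch this follower has acknowledged

    // Called when a Leader (old or new) sends a write proposal
    public boolean acceptProposal(long proposalEpoch, byte[] data) {
        if (proposalEpoch < currentEpoch) {
            return false;               // stale request from a "falsely dead" old Leader: reject
        }
        currentEpoch = proposalEpoch;   // follow the newer (or current) Leader
        // ... apply the proposal locally and acknowledge it ...
        return true;
    }
}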

Why deploy an odd number of nodes in a ZooKeeper cluster

So much for the majority rule. Since ZooKeeper uses this strategy by default, another question arises: how many nodes should the cluster have? The number of ZooKeeper nodes is usually odd. Why?

First, the cluster can keep serving requests as long as more than half of its machines are working properly. Let’s enumerate a few cases and see how fault-tolerant the cluster is in each.

With 2 nodes, as soon as one node goes down the cluster becomes unavailable. The fault tolerance is 0.

With 3 nodes, if one node fails, the remaining two are still a majority, so a new Leader can be elected and service continues. The fault tolerance is 1.

With 4 nodes, if one node fails, the remaining three are still a majority and a Leader can be re-elected; but if a second node fails, only two are left and neither election nor service works. The fault tolerance is still 1.

And so on: 5 nodes give a tolerance of 2, and 6 nodes also give a tolerance of 2.

So 3 and 4 nodes, or 5 and 6 nodes, tolerate the same number of failures: in general, 2n-1 and 2n nodes both have a tolerance of n-1. The extra node adds no fault tolerance while costing resources and adding election and communication overhead, so there is no reason to deploy it. This is why ZooKeeper clusters are deployed with an odd number of nodes, as the small calculation below confirms.
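This is a plain calculation, not ZooKeeper code: a cluster of n nodes needs a majority of n / 2 + 1 to keep working, so it tolerates n - (n / 2 + 1) failures.

public class ToleranceDemo {
    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++) {
            int quorum = n / 2 + 1;     // smallest possible majority
            int tolerance = n - quorum; // failures the cluster can survive
            System.out.println(n + " nodes -> tolerance " + tolerance);
        }
        // Prints: 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
        // 2n-1 and 2n nodes share the same tolerance of n-1, so the extra even node buys nothing
    }
}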

Common solutions to split brain

The majority principle used by ZooKeeper was covered above; here is a summary of the common approaches to the split brain problem.

Method 1: Quorums

For example, in a 3-node cluster the Quorum is 2, so the cluster can tolerate one node failure; with two nodes left, a Leader can still be elected and the cluster remains available. In a 4-node cluster the Quorum is 3, so the tolerance is still only 1; if two nodes fail, the whole cluster becomes unavailable. This is the default method ZooKeeper uses to prevent “split brain”.

Method 2: Redundant heartbeat lines

Use multiple communication channels within the cluster, so that the failure of a single channel does not leave nodes unable to reach each other.

For example, add a second heartbeat line. With only one heartbeat line, if it is cut, no heartbeats are received and each side assumes the other is dead. With two heartbeat lines, if one is cut the other still carries heartbeats, so the cluster keeps running normally. The heartbeat lines can also be made highly available themselves and monitor each other: if one line fails, the other immediately takes over, while under normal circumstances the backup line does nothing and consumes no resources.
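A rough sketch of that logic (the types and method names here are invented, not a real HA product’s API): a node declares its peer dead only when every redundant heartbeat channel has gone silent.

import java.util.List;

public class HeartbeatMonitorSketch {
    // One instance per physical heartbeat line to the peer node
    interface HeartbeatChannel {
        boolean peerAlive(); // true if a heartbeat arrived recently on this line
    }

    // Declare the peer dead only when every redundant line is silent
    static boolean peerConsideredDead(List<HeartbeatChannel> channels) {
        return channels.stream().noneMatch(HeartbeatChannel::peerAlive);
    }
}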

Method 3: Disk locks

A disk lock ensures that only the node holding the lock can serve requests, which avoids data corruption. But there is a problem: if the lock-holding Leader dies without releasing the lock, no other node can ever acquire the shared resource. So a “smart” lock was designed for HA setups: a node only tries to grab the disk lock when it finds that all of its heartbeat lines are down, and in normal operation the lock is not held.
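The idea can be sketched with an ordinary file lock standing in for the shared disk lock (java.nio’s FileLock is only an analogy here; real HA setups use mechanisms such as SCSI reservations): a node tries to grab the lock only after all of its heartbeat lines are down, and serves only while it holds the lock.

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SmartDiskLockSketch {
    // Try to become the serving node; the path stands in for the shared disk resource
    static FileLock tryTakeOver(String lockPath, boolean allHeartbeatsLost) throws Exception {
        if (!allHeartbeatsLost) {
            return null; // "smart" behaviour: do not hold the lock during normal operation
        }
        FileChannel channel = new RandomAccessFile(lockPath, "rw").getChannel();
        FileLock lock = channel.tryLock(); // null if another node already holds the lock
        if (lock == null) {
            channel.close();
        }
        return lock; // serve requests only while this lock is held
    }
}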

Method 4: Arbitration mechanism

The result of split brain is that nodes do not know which Leader to follow, and an external arbitrator can resolve this. For example, agree on a reference IP address: when the heartbeat link goes down, each node pings the reference IP, and a node that cannot reach it concludes that its own network is the problem.
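A minimal version of that check might look like the following; the reference address is hypothetical, and InetAddress.isReachable is a best-effort reachability test:

import java.net.InetAddress;

public class ArbitrationCheckSketch {
    // Returns true if this node can still reach the agreed reference IP
    static boolean canReachReferenceIp() {
        try {
            return InetAddress.getByName("10.0.0.254") // hypothetical reference/gateway address
                              .isReachable(3000);      // 3-second timeout
        } catch (Exception e) {
            return false; // treat any error as "my own network is broken"
        }
    }
}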

These methods can be combined to reduce the chance of split brain in a cluster, but none of them is a complete guarantee. For example, if the two machines in an arbitrated HA pair both break down at the same time, the cluster has no Leader at all, and human intervention is required.

Summary

We often say that our systems are distributed, but do we really understand the scenarios that come with distribution and their solutions? What have you learned from this analysis of the split brain scenario and its solutions? Let’s keep learning together.

Author of SpringBoot Tech Insider, loves to delve into technology and write about it.

WeChat official account: “Program New Horizon”, the blogger’s official account; you are welcome to follow it.

Technical exchange: contact the blogger on WeChat, ID: Zhuan2quan