preface

① Summary of previous situations

In the last article, we mentioned the simple implementation of distributed lock based on ZooKeeper. We respectively implemented a distributed lock through the mechanism of non-name of nodes +watch (not recommended), the principle of number taking + minimum number taking Lock + Watch. The second one is similar to the first number of the bank to handle business. A form of a waiting call.

Zookeeper is simple, orderly, replicable, and fast (section 2-④ of Basic article 1). It has a data model, session mechanism, and watch monitoring mechanism (section 3-① ~ ④ of Basic article 1). Zookeeper and its corresponding rich third-party clients for us to develop, so zooKeeper introduction to the first two articles to end.

② The link of getting Started:

High concurrency from scratch (1) – basic concept of ZooKeeper

High concurrency from zero (2) – Zookeeper implements distributed locking

Content 1: Set up a Zookeeper cluster

Originally, it should be divided into P with leader election, but because the cluster construction process is relatively uncomplicated, the main content is actually the paxOS algorithm, so let’s synthesize 1P. The content looks more, but actually there are more screenshots. I hope you can be patient

① Why do we need a ZooKeeper cluster

In order to provide reliable ZooKeeper services, we need the support of a cluster. Another feature of this cluster is that most of the nodes in the cluster are ready to use the service. A fault-tolerant cluster setup requires at least three servers, it is highly recommended that you use an odd number of servers, and that each service run on a separate machine.

② Set up the cluster

Here we are demonstrating the pseudo-clustering approach, but there are certainly more details to be paid attention to when doing a pseudo-cluster on one machine than when doing clusters on several machines separately. It can be said that if the pseudo cluster can be configured properly, to the real cluster configuration will be more straightforward.

1. You need to know the configuration parameters

InitLimit: specifies the maximum number of heartbeats that the junior follower server in the cluster can accept when initializing the synchronization connection between the junior follower server and the leader server. If the cluster environment is very large and data synchronization takes a long time, you need to adjust this parameter appropriately. Note that in ZooKeeper, any time set is defined as a multiple of TickTime if we set initLimit=2. 2. SyncLimit: Specifies the maximum number of heartbeats that can be received between the follower and the leader. Each machine is given a server ID by creating a "myId" file in the respective dataDir directory. This ID is usually represented by two base numbers: the first for followers to connect to the leader, and the second for elects the leaderCopy the code

2. Modify the zoo. CFG file

Here we open the conf folder in the zooKeeper root directory and find the zoo.cfg file.


1 the dataDir and dataLogDir

Zookeeper officially recommends that we add DataLogDir to store transaction logs. If the dataDir directory is not dataLogDir, the dataDir directory will be stored in the dataDir directory. What is the difference between a transaction log and a run log? This transaction log is like our ZooKeeper database, and it needs such a database to recover when it needs to use the read and recovery functions.


② Port Precautions

Our ZooKeeper cluster will have three ports when it is enabled. The first port number is the port through which clients connect to it. ClientPort is the port that our Java client uses to connect to it,

The second is the port number for the follower to synchronize data with the leader, and the third is the port number for the leader election. Add our node configuration at the end, 192.168.1.104 is the local IP address, and 1001 is the port number used by the younger brother to synchronize data with the leader. In case our leader crashes and hangs up, we need to re-elect, and then we need to use the third port. This is the port number 2001


(3) Construction of pseudo cluster

At this time, I copied two copies of ZooKeeper to make pseudo cluster


Open zookeeper-3.5.2-alpha 02/conf/setting. XML, which is the second ZooKeeper setting. XML


Open zookeeper-3.5.2-alpha 03/conf/setting. XML, which is the third zooKeeper setting. XML


④ Precautions before cluster startup –myid

Myid is a number, so we use 1,2,3 as the id of our zookeeper01, 02,03.

Note that myID is a line of text containing only the machine ID. The id must be unique in the cluster, and the value should be between 1 and 255. We use the word should here, not limit. There is no space or other characters in the file, just need to type 1,2,3 ···, you can.


There are two possible errors that can result from this

If zooKeeper fails to run, we can go to zkserver. CMD, right-click on it, and add pause to the end to see the error message

One is that myID is not in the right directory and needs to be in the path of our dataDir


The other is that there are other characters in myID besides numbers, here I deliberately typed a space before the 1


These are mistakes that need to be avoided as much as possible. Don’t be too quick to pay attention to small details.

For my Zookeeper01 example, just put myid file in the Data folder. This directory structure is best created manually

Myid is used not only to start the cluster, but also for elections and communication between nodes, which we will discuss later


⑤ ConnectException is reported when the cluster is started

Connection refused () : Connection refused () : Connection refused () : Connection refused () : Connection refused Connect, this is because we failed to establish communication with 02,03, so we only need to open other nodes normally at this time


③ Connect to our ZooKeeper cluster

All nodes in a cluster can provide services. When a client connects to a node, the connection string can specify the IP addresses of all or multiple nodes in the cluster. When a node fails to connect to a client, the client automatically switches to another node. When we specify the address, we use the comma, which is the comma of the English input

1. Monitoring the ZooKeeper cluster using the command

2. Cluster monitoring of ZooKeeper –JMX

Java Management Extensions (JMX) is a framework that implements Management functions for applications, devices, systems, etc

Run jconsole.exe in the bin directory of our JDK

After opening JConsole, we can see that we are running several services, for example in the image below I am running jConsole itself, Maven from Idea, and three ZooKeeper services we are running

We select an open button at random, select use insecure connection, and switch to the Mbean view

In ReplicatedServer, id1 represents the value of myID that we set to 1

Property we see a number QuorumSize that represents how many services are in the cluster

At this point, it is acting as the younger follower, and we can also try to open another node. For example, my second node is acting as the leader node

With an understanding of the tools, let’s finally get started on today’s topic, the leader election of the cluster

Content 2: The soul of the ZooKeeper cluster — leader

① Who is the leader

Every server can become a leader. How is a leader elected?

Distributed consistency algorithm — PAxOS algorithm

1. What is PaxOS – the official English description

Here is an attempt to translate

P1a: The proposer selects a proposal number N and sends a proposal prerequest P1b with number N to most recipients: If the receiver receives the pre-request number n and n is greater than the pre-request number of the proposal that has been previously responded, the receiver will respond by promising not to receive any pre-request of the proposal whose number is smaller than n, and will carry the highest proposal P2a that he has received in the response: If the proposer receives a majority of the recipients' proposals for N, then the proposer will send an acceptance request to each of the recipients' services. The acceptance request will contain the proposal number n with a value. How can the value be valued? If there is no proposal value at all in the response, P2b can be arbitrary: If the receiver receives a receive request numbered N, it accepts the proposal, but if it will respond to a pre-request greater than number n, it will not accept the proposal for NCopy the code

2. Paxos roles exist

Ps: The Byzantine general problem does not exist in zooKeeper’s cluster

3. Paxos algorithm process analysis (combined with official process description)

First we must understand two constraints of the PaxOS algorithm

1. Finally, only one proposal will be selected, and only the selected proposal value will be learner to record 2. Eventually there is a proposal that goes into effect, and the PAXOS protocol allows the proposal that a proposer sends to move toward a proposal that is accepted by a majority of acceptors, ensuring that the final result is generated.Copy the code

We will now begin the process analysis


(1) Pre-request the first stage

The proposer sends two kinds of messages: Prepare indicates a request, and Accept indicates a request

The proposal request of the proposer constitutes a structure (n,v), where n is the ordinal number and v is the proposal value

The proposer A sent A prepare request(n=2,v=5) to the receiver C,D and E. C and D received the request normally, but E was disconnected and did not receive it indirectly at the first time

The proposer B sends its prepare request(n=4,v=8) to C,D and E, and then E has resumed communication.


② Pre-request the second stage

What do we make of this? First of all, in the first stage, BOTH C and D have received the pre-request from proposer A normally. At this time, they have received no previous request, so they will make A prepare response to the request from proposer A. The corresponding concept here is P2b: If the receiver receives an receive request numbered N, it accepts the proposal

So at this point, they make their response to proposer A by setting the current offer they receive to be (n=2,v=5), and make A promise to not accept any more offers with sequence number less than 2, which corresponds to P1b: If the recipient has received the advance request number n, n is greater than the front has been the response of the proposal request number, because here have no one request before, so the response before the request number is 0, meet the conditions, then the receiver C, D to respond, promise not to receive number smaller than n proposal request, So, they will respond with the highest number of proposals they have received, which is the one they received (n=2,v=5).

At this point, the receiver E has received the proposal from proposer B, which also corresponds to P1b and P2b. The receiver promises not to receive any proposal with a number smaller than 4, and sets the highest number proposal he has ever received to (n=4,v=8).


③ The first stage of acceptance

Then our recipients C and D will also receive the proposal request from proposer B. The previous proposal of C and D is (n=2,v=5). At this time, because B has proposed (n=4,v=8), 4>2, they will also send a proposal response to proposer B, indicating that they have proposed (n=2,v=8), and then they will make modifications by themselves. Change the proposal to (n=4,v=5) and repromise not to accept requests for proposals with n less than 4. In this case, why does v have a value of 5

Because the recipient E received the proposal from proposer B first (n=4,v=8), and then received the proposal from proposer A (n=2,v=5), 4 was greater than 2, so the recipient E directly ignored A’s proposal and did not reply to A


④ Accept the second stage

When proposerB receives more than half of the Acceptor requests, he sends a receive request to all the acceptors

The proposer A will send its proposal acceptance request to C and D (n=2,v=5) after receiving the two proposals in the second pre-request stage, but C and D are obviously no longer its person, so C and D will ruthlessly discard the acceptance request it sent

When the proposer B receives the response (n=2,v=5) from C and D, it will send a receiving request (n=4,v=5). Why does it use its own number 4 but take 5 as the proposal value? Because 5 here is widely expected and is the largest number among the responses from C and D? It is not the value of your proposal received by the receiver that determines the final proposal value, but the value corresponding to the maximum number received by the proposer itself. Please note the description of the PAxOS algorithm in P2a, which takes the value corresponding to the highest number in the recipient’s response

(5) Learner participation process

After the second stage of acceptance, proposerB will send the message (n=4,v=5) to all the recipients he can contact. After everyone has received the message, A will vote again, because the maximum value of v corresponding to the message he can receive is 5. So 5 is the end result of this algorithm, and you can’t change it. At this point the learner will synchronize

③ Leader election requirements of the ZooKeeper cluster

Requirements for the election leader:

The selected leader node with the highest ZXID must be agreed by more than half of the nodesCopy the code

Built-in implementation of the election algorithm

LeaderElection FastLeaderElection (default) AuthFastLeaderElectionCopy the code

(4) Content and concept of leader election mechanism in ZooKeeper cluster

Server ID -- myID Transaction ID -- maximum ZXID logical clock stored in the server -- Number of voting rounds initiated Election status: LOOKING: election status FOLLOWING: leader status, participant voting: LEADING: The state of the leaderCopy the code

⑤ Leader election algorithm of the ZooKeeper cluster

The steps of the election algorithm

2. When other service instances receive the voting invitation, they compare whether the transaction ID of the initiator is larger than their latest transaction ID. If the transaction ID is larger, they vote for the initiator; if the transaction ID is smaller, they do not vote for the initiator; if the transaction ID is equal, they vote for the initiator; if the transaction ID is larger, they vote for the initiator. After receiving the voting feedback, the initiator checks whether the number of votes (including his/her own votes) is greater than half of the cluster. If the number is greater than half, the initiator becomes the leader. If the number is less than half and the leader is not elected, the initiator initiates voting againCopy the code

How do we select node 2 as the leader of the cluster

First we node is up and running, and it because no connection, so when the second node startup, the second node set a vote for their own, as the first node to broadcast their information to the other node, this third node does not start up, however, so it can only be broadcast to a second stage, Then the second node of the radio and also sent to the first node, this time they held each node 1, 2, at this time they will first compare the transaction id, then, might the two nodes in the early years of the transaction id is 0, that is what is the situation of 0, then compare the server id, then our node 1 myid is 1, The myID of node 2 is 2, so node 1 will vote for node 2. In this case, node 2 has two votes and node 1 has one vote

Node 3 will receive a broadcast message from node 1 and node 2, and it will send a broadcast message to node 1 and node 2 that it wants to be the leader. However, it will find that node 2 already has two votes, so it can only accept node 2 as the leader.

⑥ Simple analysis by starting log

CMD startup log for node 1

Scroll down slightly to the state where node 2 is started

The status of the last channel node 3 started

CMD startup log for node 2

CMD for node 3

finally

It’s a lot of stuff. Just a quick review

Cluster setup must pay attention to the location of myID and edit when don’t hand fast with some unrecognizable characters, the rest of what to say

Paxos algorithm is difficult to understand, but also dare not say how deep their research, if not appropriate or have other ideas, you can leave a message to me, will be improved

The election of ZooKeeper is not hard to understand. To test it, just add a few more clusters to the set up cluster and look at jConsole or run logs to test the different results. Other changes can be made when the network is volatile

Next: High Concurrency from Scratch (4) – A classic application scenario of Zookeeper