
A look at the ZooKeeper source code

Start ZooKeeper from the source code

ZooKeeper download address:

// select branch 3.5.8
https://github.com/apache/zookeeper.git

After importing the source into IDEA, org.apache.zookeeper.Version fails to compile because it depends on org.apache.zookeeper.version.Info, so you need to create that auxiliary interface by hand:

package org.apache.zookeeper.version;

// Hand-written stand-in for the build-generated version info, just so the
// source compiles in the IDE; the exact values do not matter for a local run.
public interface Info {
    int MAJOR = 1;
    int MINOR = 0;
    int MICRO = 0;
    String QUALIFIER = null;
    int REVISION = -1;
    String REVISION_HASH = "1";
    String BUILD_DATE = "2020-10-15";
}

Then compile by running the following at the project root:

mvn clean install -DskipTests

For an open-source project you usually locate the entry class from the startup scripts; zkServer.sh and zkServer.cmd in the bin directory show that the main class to run is:

org.apache.zookeeper.server.quorum.QuorumPeerMain

In the conf directory, copy zoo_sample.cfg to zoo.cfg and pass the location of zoo.cfg as a program argument when starting QuorumPeerMain. Before starting the zookeeper-server module you also need to comment out the provided scope on its pom.xml dependencies (all except jline) so those jars end up on the runtime classpath, and copy log4j.properties from the conf folder into the zookeeper-server module's target/classes directory so the project prints logs when it starts.
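For reference, a minimal standalone zoo.cfg might look like the following; the values and the dataDir path are just examples, adjust them to your environment (comments are kept on their own lines so the properties parser does not treat them as part of a value):

# basic time unit in milliseconds
tickTime=2000
# directory where snapshots and the myid file are stored (example path)
dataDir=/tmp/zookeeper/data
# port clients connect to
clientPort=2181

The path to this file is then what you pass to QuorumPeerMain as the program argument.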

bin/zkCli.sh -server 192.168.50.190:2181

To run the client from the source (org.apache.zookeeper.ZooKeeperMain), note that you need to supply launch parameters, see below:
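When launching org.apache.zookeeper.ZooKeeperMain from the IDE, the program argument is the same -server option the shell client uses; the address below is just an example:

-server 127.0.0.1:2181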

Start the ZooKeeper cluster from the source code

Make three copies of zoo.cfg, adjust the cluster configuration in each, create a myid file in each node's data directory containing that machine's id, and create three run configurations, one per node, each pointing to its own config file, as sketched below. Run each node and the cluster is up.
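As a sketch (ports and paths are examples), the three configs differ only in dataDir and clientPort, while the server list is identical, and each data directory contains a myid file whose single line is that node's id:

# zoo1.cfg (zoo2.cfg and zoo3.cfg change dataDir and clientPort)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper/node1
clientPort=2181
# server.<id>=<host>:<quorum-port>:<election-port>
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

# /tmp/zookeeper/node1/myid contains just:
1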

Leader election at startup or after a Leader failure

Leader election: a multi-layer queue architecture

The ZooKeeper election machinery can be divided into an election application layer and a message transport layer. The application layer has its own queues for receiving and sending votes, and the transport layer also has its own queues, but these are partitioned by target machine so that sends to different machines do not affect each other: if sending to one machine fails, messages to the healthy machines are unaffected.
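A minimal Java sketch of the "one send queue per target machine" idea at the transport layer; in the real source this corresponds roughly to the per-server-id send queues in QuorumCnxManager, but the class and method names below are illustrative, not the actual ZooKeeper API:

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: one bounded queue per peer, so a slow or unreachable peer
// only blocks its own queue and never delays votes destined for healthy peers.
public class ElectionTransportSketch {

    private final Map<Long, BlockingQueue<ByteBuffer>> sendQueues = new ConcurrentHashMap<>();

    // Queue a vote for a specific server id (sid); if the queue is full,
    // drop the stale vote and keep only the latest one.
    public void toSend(long sid, ByteBuffer vote) {
        BlockingQueue<ByteBuffer> queue =
                sendQueues.computeIfAbsent(sid, k -> new ArrayBlockingQueue<>(1));
        if (!queue.offer(vote)) {
            queue.poll();
            queue.offer(vote);
        }
    }
}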

Leader election source code flow chart

An overview of the ZAB protocol

ZooKeeper as a whole is an implementation of a multi-node distributed consistency algorithm, and its underlying protocol is ZAB, short for ZooKeeper Atomic Broadcast. ZooKeeper is an efficient and reliable distributed coordination service for distributed applications. It does not use Paxos to solve distributed consistency; it uses the ZAB protocol, which can be seen as a simplified implementation of the Paxos algorithm. By definition, ZAB is an atomic broadcast protocol with crash recovery designed specifically for the distributed coordination service ZooKeeper, and those two aspects, atomic broadcast and crash recovery, are what we focus on here. Based on this protocol, ZooKeeper implements an active/standby architecture that keeps the replicas in the cluster consistent, as shown in the figure below.

The figure shows how ZooKeeper handles data in a cluster: all writes from clients go to the Leader node and are then replicated by the Leader to the Follower nodes, which keeps the replicas consistent. How does that replication work? The process is similar to a two-phase commit (2PC), except that ZAB only requires more than half of the participants (counting the Leader's own ACK) to return an ACK before the commit is executed, which greatly reduces synchronization blocking and improves availability.

With that brief introduction, let's focus on message broadcast and crash recovery. ZooKeeper switches between these two modes: while the Leader is available it runs in message broadcast mode, and when the Leader becomes unavailable it enters crash recovery mode.
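The "more than half" rule itself is simple; a hedged, simplified version of the majority check (the real source expresses this in its quorum verifier classes) looks like this:

import java.util.Set;

// Simplified majority check: a proposal can commit once the set of ACKing
// server ids (including the leader itself) exceeds half of the ensemble size.
public class MajorityQuorumSketch {

    private final int half; // ensembleSize / 2

    public MajorityQuorumSketch(int ensembleSize) {
        this.half = ensembleSize / 2;
    }

    public boolean containsQuorum(Set<Long> ackedServerIds) {
        return ackedServerIds.size() > half;
    }
}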

Message broadcast

The message broadcast phase of the ZAB protocol uses an atomic broadcast protocol, similar to a two-phase commit. All write requests from clients are received by the Leader, which wraps each request in a transaction Proposal and sends it to all Followers; based on the Followers' feedback, if more than half (including the Leader itself) respond successfully, the commit is executed. Through these steps, data consistency across the cluster is maintained.

A few more details. After receiving a client request, the Leader wraps it in a transaction and assigns it a globally increasing, unique ID, the transaction ID (ZXID). The ZAB protocol must guarantee transaction ordering, so transactions are processed in ZXID order, mainly through message queues: there is a message queue between the Leader and each Follower, which decouples them and removes synchronization blocking. To keep every process in the ZooKeeper cluster executing in the same order, only the Leader handles write requests; even if a Follower receives a write request from a client, it forwards the request to the Leader and only processes read requests itself. The ZAB protocol also states that if a transaction is committed on one machine, it must eventually be committed on all machines, even machines that fail and crash.
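A condensed sketch of the leader-side flow just described: a single counter yields the increasing zxid, and per-follower FIFO queues preserve that order on the way out (class and field names are illustrative, not the real ZooKeeper classes):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: zxid ordering comes from one counter on the leader plus
// per-follower FIFO queues, so every follower sees proposals in the same order.
public class BroadcastSketch {

    static final class Proposal {
        final long zxid;
        final byte[] data;
        Proposal(long zxid, byte[] data) { this.zxid = zxid; this.data = data; }
    }

    private final AtomicLong zxidCounter = new AtomicLong(0);
    private final List<BlockingQueue<Proposal>> followerQueues = new ArrayList<>();

    public BroadcastSketch(int followerCount) {
        for (int i = 0; i < followerCount; i++) {
            followerQueues.add(new LinkedBlockingQueue<>());
        }
    }

    // Leader side: wrap the write request in a proposal and enqueue it for every follower.
    public Proposal propose(byte[] request) {
        Proposal p = new Proposal(zxidCounter.incrementAndGet(), request);
        for (BlockingQueue<Proposal> q : followerQueues) {
            q.offer(p);
        }
        return p;
    }
}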

Crash recovery

What if the Leader crashes during message broadcast? Can data still be kept consistent? When the Leader crashes, meaning it loses contact with more than half of the Followers, the cluster enters the crash recovery mode mentioned at the beginning. Consider two scenarios. Scenario 1: the Leader has replicated the data to all Followers but crashes before receiving their ACKs. What then? Scenario 2: the Leader has received the ACKs, committed locally, and sent COMMIT to only some of the Followers before crashing. What then? To handle these cases, ZAB defines two principles: transactions that were only proposed/replicated by the Leader but never committed must be discarded, and transactions that were already committed on the Leader must eventually be committed by all servers. ZAB therefore designs the leader election so that transactions already committed by the old Leader are committed while those that were only proposed are discarded. If the election algorithm guarantees that the newly elected Leader holds the largest ZXID of all machines in the cluster, then the new Leader is guaranteed to hold every committed proposal; this also saves the new Leader from having to check which transactions to commit or discard.
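That "largest ZXID wins" requirement shows up in how votes are compared during election; a simplified version of the ordering (modeled on FastLeaderElection's comparison of epoch, then zxid, then server id) is sketched below:

// Simplified vote ordering used during election: a candidate beats the current
// vote if it has a higher epoch, or the same epoch and a higher zxid, or the
// same epoch and zxid but a higher server id (the final tie-breaker).
public class VoteOrderSketch {

    public static boolean newVoteWins(long newEpoch, long newZxid, long newSid,
                                      long curEpoch, long curZxid, long curSid) {
        if (newEpoch != curEpoch) {
            return newEpoch > curEpoch;
        }
        if (newZxid != curZxid) {
            return newZxid > curZxid;
        }
        return newSid > curSid;
    }
}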

Data synchronization

After crash recovery, before it officially starts working (accepting client requests), the Leader first confirms that transactions have been committed by more than half of the Followers, that is, that data synchronization is complete, so that the data stays consistent. Once a Follower has synchronized successfully, the Leader adds it to the list of available servers.

The Leader decides whether to apply or discard transactions based on the ZXID, so how is the ZXID generated? In the ZAB protocol the transaction number ZXID is a 64-bit value. The lower 32 bits are a simple incrementing counter: for each client transaction request, the Leader increments the counter by one when it produces a new transaction Proposal. The higher 32 bits hold the epoch (the Leader election cycle): when a new Leader is elected, it takes the largest Proposal's ZXID from its local log, extracts the epoch from it, increments it by one, and resets the counter to 0. So the high 32 bits uniquely identify each Leader generation, the low 32 bits uniquely identify each transaction within that generation, and Followers can tell different Leaders apart by the high 32 bits, which simplifies data recovery. Based on this, when a Follower connects to the Leader, the Leader compares the last committed ZXID on its side with the ZXID on the Follower, and the result of that comparison determines whether the Follower rolls back or synchronizes with the Leader.
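The 64-bit layout described above can be expressed directly in Java; the real source has similar helpers in ZxidUtils, and the snippet below is a re-statement of that layout rather than the actual implementation:

// ZXID layout: the high 32 bits hold the leader epoch, the low 32 bits hold a
// counter that restarts at 0 whenever a new leader (new epoch) is elected.
public class ZxidSketch {

    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }
}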

Source-code analysis of the ZAB protocol on the ZooKeeper write path