————— The next day —————























— — — — — —















Zookeeper’s data model


What does Zookeeper’s data model look like? It closely resembles the tree from data structures, and also the directory hierarchy of a file system.




A tree is made up of nodes, and Zookeeper’s data store is also built from nodes, called Znodes.


However, unlike the nodes of an ordinary tree, Znodes are referenced by path, similar to file paths:


/animals/hamsters


/plants/lotuses


This hierarchy gives every Znode a unique path and, like a namespace, keeps different kinds of information clearly separated.









A Znode contains the following elements:


Data:

Data stored on the Znode.


ACL:

Records the access permissions of the Znode, that is, which users or IP addresses can access it.


Stat:

Contains various metadata for the Znode, such as transaction ID, version number, timestamp, size, and so on.


Child:

References to the child nodes of the current node, similar to the left-child and right-child references of a binary tree, except that a Znode can have any number of children.


One thing to note here is that Zookeeper is designed for read-heavy, write-light scenarios. Znodes are not meant to store large volumes of business data, only small amounts of status and configuration information. By default, the data of a single node cannot exceed 1 MB.
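The data model above can be sketched in a few lines of Python. This is only a toy in-memory model, not the Zookeeper client API: the class names, the ACL string format, and the stat fields are assumptions made for illustration.

```python
class Znode:
    """A simplified Znode holding data, an ACL, stat metadata, and children."""

    MAX_DATA_BYTES = 1024 * 1024  # the 1 MB limit mentioned above

    def __init__(self, data=b"", acl=None):
        self.data = data
        self.acl = acl or ["world:anyone"]  # hypothetical ACL entry format
        self.stat = {"version": 0, "size": len(data)}
        self.children = {}  # child name -> Znode


class ZnodeTree:
    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        """Follow a path such as /animals/hamsters down from the root."""
        node = self.root
        for name in [part for part in path.split("/") if part]:
            node = node.children[name]  # KeyError if the path does not exist
        return node

    def create(self, path, data=b""):
        if len(data) > Znode.MAX_DATA_BYTES:
            raise ValueError("Znode data cannot exceed 1 MB")
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[name] = Znode(data)

    def get_data(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


tree = ZnodeTree()
tree.create("/animals")
tree.create("/animals/hamsters", b"small and round")
tree.create("/plants")
print(tree.get_children("/"))
print(tree.get_data("/animals/hamsters"))
```

Because every Znode is reached by walking its path from the root, two Znodes can share a name as long as their parents differ, which is what gives the hierarchy its namespace-like separation.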







Basic operations and event notification of Zookeeper



What basic operations does Zookeeper provide? Here are some of the more common APIs:


create

Create a node


delete

Delete a node


exists

Check whether the node exists


getData

Get the data of a node


setData

Set the data of a node


getChildren

Obtain all child nodes under the node


Of these, exists, getData, and getChildren are read operations. When a Zookeeper client issues a read operation, it can choose whether to set a Watch.


What does “Watch” mean?


We can think of a Watch as a trigger registered on a particular Znode. When the Znode changes, that is, when create, delete, or setData is called on it, the event registered on the Znode fires, and the client that requested the Watch receives an asynchronous notification.


The specific interaction process is as follows:


1. The client calls getData with the watch parameter set to true. The server receives the request, returns the node data, and inserts the path of the watched Znode, together with its Watcher list, into the corresponding hash table.



2. When the watched Znode is deleted, the server looks up all the Watchers registered for that Znode in the hash table, notifies the clients asynchronously, and removes the corresponding key-value entry from the hash table.
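The two steps above can be sketched as a small simulation. This is not Zookeeper's server code: the class and method names are made up for illustration, and the notification is a plain synchronous callback, whereas the real one is asynchronous. The event names follow Zookeeper's NodeDataChanged and NodeDeleted event types.

```python
from collections import defaultdict


class WatchServer:
    """Toy server keeping a hash table of Znode path -> Watcher list."""

    def __init__(self):
        self.data = {}                        # Znode path -> data
        self.watch_table = defaultdict(list)  # Znode path -> Watcher callbacks

    def create(self, path, data):
        self.data[path] = data

    def get_data(self, path, watcher=None):
        # Step 1: return the data and record the Watcher, if one was requested.
        if watcher is not None:
            self.watch_table[path].append(watcher)
        return self.data[path]

    def _fire(self, path, event):
        # Step 2: notify every Watcher, then drop the hash table entry,
        # so each Watch fires at most once.
        for watcher in self.watch_table.pop(path, []):
            watcher(event, path)

    def set_data(self, path, data):
        self.data[path] = data
        self._fire(path, "NodeDataChanged")

    def delete(self, path):
        del self.data[path]
        self._fire(path, "NodeDeleted")


events = []
server = WatchServer()
server.create("/animals/hamsters", b"v1")
server.get_data("/animals/hamsters", watcher=lambda e, p: events.append((e, p)))
server.set_data("/animals/hamsters", b"v2")  # triggers the Watch
server.set_data("/animals/hamsters", b"v3")  # no Watch left, nothing fires
print(events)
```

Since the entry is removed once it fires, a client that wants to keep observing a Znode has to set the Watch again on its next read.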




Consistency of Zookeeper







What does Zookeeper’s cluster look like? It looks like this:





A Zookeeper service cluster consists of one master node (the Leader) and multiple slave nodes (the Followers).


When data is updated, the update is first applied to the master node (here a node means a server, not a Znode) and then synchronized to the slave nodes.


When reading data, a client can read directly from any slave node.


To keep the data on the master and slave nodes consistent, Zookeeper uses the ZAB protocol, which is very similar to the consensus algorithms Paxos and Raft.






Before learning ZAB, we first need to understand the three node states defined by the ZAB protocol:


Looking: the election state.


Following: the state of a Follower node (slave node).


Leading: the state of the Leader node (master node).



We also need to know the concept of maximum ZXID:


The maximum ZXID is the number of the latest transaction a node has recorded locally. It has two parts, the epoch and the counter. The epoch identifies a period of leadership and corresponds to the term used in the Raft algorithm’s leader election.
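To make the two parts concrete, here is a minimal sketch that packs an epoch and a counter into one integer. The article does not state the layout; the sketch assumes the 64-bit layout Zookeeper uses, with the epoch in the high 32 bits and the counter in the low 32 bits. The helper names are made up.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack epoch (high bits) and counter (low bits) into a single ZXID."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return the (epoch, counter) pair encoded in a ZXID."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


old = make_zxid(epoch=3, counter=120)
new = make_zxid(epoch=4, counter=1)
print(split_zxid(new))
print(new > old)
```

With this layout, any transaction from a newer epoch compares larger than every transaction from an older one, no matter how high the old counter got. That is why comparing ZXIDs is enough to tell whose data is newer.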



If the current primary Zookeeper node fails, the cluster will perform crash recovery. ZAB’s crash recovery is divided into three phases:


1. Leader election


During the election phase, the nodes in the cluster are in the Looking state. They each send a vote to the other nodes containing their server ID and the latest transaction ID (ZXID).



Then, the node will compare its own ZXID with the ZXID received from other nodes. If it finds that the ZXID of others is larger than its own, that is, the data is newer than its own, it will vote for the node with the largest known ZXID.



After each round of voting, the servers count the votes and check whether any node has received more than half of them. If such a node exists, it becomes the prospective Leader and its state changes to Leading. The state of the other nodes changes to Following.




It is like a group of martial arts masters electing the leader of the martial arts world through fierce competition.
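The election rule can be sketched as follows. This is a heavily simplified model, not the real election algorithm: every live node is assumed to hear every other live node's vote in a single round, and ties between equal ZXIDs are broken by the larger server ID, which is an assumption the text above does not spell out.

```python
from collections import Counter


def elect_leader(zxids, alive):
    """zxids maps server ID -> latest local ZXID; alive is the set of
    reachable server IDs. Returns the winning server ID, or None."""
    votes = Counter()
    for _voter in alive:
        # Each voter backs the node with the largest known ZXID
        # (ties broken by server ID, an assumption of this sketch).
        candidate = max(alive, key=lambda sid: (zxids[sid], sid))
        votes[candidate] += 1
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    # A node wins only with more than half of the whole cluster's votes.
    return winner if count > len(zxids) // 2 else None


cluster = {1: 5, 2: 7, 3: 7, 4: 6, 5: 2}
print(elect_leader(cluster, alive={1, 2, 3}))
print(elect_leader(cluster, alive={1, 2}))
```

In the first call three of the five nodes are reachable and agree on server 3, so it wins. In the second call only two nodes are reachable, no candidate can collect more than half of the votes, and the function returns None.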



2. Discovery


The discovery phase is used to discover the latest ZXID and transaction log among the Follower nodes. Some people may ask: since the Leader was elected as the master node, it already holds the latest data in the cluster, so why do we still need to look for the latest transactions on the other nodes?


This is to prevent unexpected situations, such as when multiple leaders were generated in the previous phase for network reasons.


Therefore, in this phase, the Leader pools information by receiving the latest epoch value from every Follower. The Leader picks the largest epoch, increases it by 1 to produce a new epoch, and distributes the new epoch to each Follower.


After receiving the new epoch, each Follower returns an ACK to the Leader carrying its own largest ZXID and its historical transaction log. The Leader selects the largest ZXID and updates its own history log accordingly.
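The discovery phase can be sketched as a single function. The data shapes are assumptions made for illustration: each Follower is a dictionary holding its epoch and a history, and the history is simply an ordered list of ZXIDs.

```python
def discovery(leader_history, follower_states):
    """Return (new_epoch, updated_leader_history).

    follower_states is a list of dicts with keys 'epoch' and 'history',
    where 'history' is an ordered list of ZXIDs (hypothetical shape).
    """
    # The Leader takes the largest epoch reported and adds 1.
    new_epoch = max(f["epoch"] for f in follower_states) + 1

    # Each Follower ACKs the new epoch with its largest ZXID and its history.
    acks = [(max(f["history"], default=0), f["history"]) for f in follower_states]

    # The Leader adopts the history that reaches the largest ZXID,
    # unless its own history is already at least as new.
    best_zxid, best_history = max(acks, key=lambda ack: ack[0])
    if best_zxid > max(leader_history, default=0):
        leader_history = list(best_history)
    return new_epoch, leader_history


followers = [
    {"epoch": 3, "history": [1, 2, 3]},
    {"epoch": 4, "history": [1, 2, 3, 4]},
]
print(discovery([1, 2], followers))
```

Here the second Follower has seen epoch 4 and transaction 4, so the Leader moves the cluster to epoch 5 and adopts that Follower's longer history.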



3. Synchronization


In the synchronization phase, the Leader synchronizes the latest historical transaction log it has just collected to all the Followers in the cluster. The prospective Leader becomes the official Leader only after half of the Followers have synchronized successfully.


At this point, the recovery is complete.










What is Broadcast? Simply put, whenever Zookeeper performs a routine data update, the Leader broadcasts it to all the Followers. The process is as follows:


1. The client sends a data write request to any Follower.


2. The Follower forwards the data write request to the Leader.


3. The Leader broadcasts a Propose message to the Followers, in a manner similar to two-phase commit.


4. After receiving the Propose message, each Follower returns an ACK message to the Leader.


5. Once the Leader has received ACK messages from more than half of the nodes, it returns a success message to the client and broadcasts a Commit request to the Followers.
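The broadcast steps can be sketched with two small classes. This is a toy model under stated assumptions: messages are direct method calls rather than network traffic, the Leader counts its own vote toward the majority, and an offline Follower simply never answers. None of the names come from Zookeeper itself.

```python
class Follower:
    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.pending = {}    # ZXID -> (key, value) proposed but not committed
        self.committed = {}  # key -> value

    def on_propose(self, zxid, key, value):
        if not self.online:
            return False     # no ACK from an unreachable Follower
        self.pending[zxid] = (key, value)
        return True          # ACK

    def on_commit(self, zxid):
        if zxid in self.pending:
            key, value = self.pending.pop(zxid)
            self.committed[key] = value


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.committed = {}
        self.next_zxid = 1

    def write(self, key, value):
        zxid = self.next_zxid
        self.next_zxid += 1
        # Propose to every Follower and count ACKs; the Leader's own
        # vote is included (an assumption of this sketch).
        acks = 1 + sum(f.on_propose(zxid, key, value) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks <= cluster_size // 2:
            return False     # no majority, the write is not committed
        self.committed[key] = value
        for f in self.followers:
            f.on_commit(zxid)
        return True


a, b = Follower("a"), Follower("b")
c, d = Follower("c", online=False), Follower("d", online=False)
leader = Leader([a, b, c, d])
print(leader.write("config", "v1"))
print(a.committed, c.committed)
```

With two of the four Followers offline, the Leader still collects three votes out of five, so the write commits on the reachable nodes while the offline ones stay behind until they are synchronized again.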





The ZAB protocol provides neither strong consistency nor weak consistency, but a monotonic consistency that lies in between. It relies on transaction IDs and version numbers to ensure that data is updated and read in order.





Applications of Zookeeper











1. Distributed lock



This was the original purpose for which Yahoo researchers designed Zookeeper. With Zookeeper’s ephemeral sequential nodes, a distributed lock is easy to implement.
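The idea behind such a lock can be sketched in memory. This is not the Zookeeper client API: the class stands in for a single lock Znode, and its name and methods are hypothetical. The rule it models is the usual one, where each client creates an ephemeral sequential node and the client holding the smallest sequence number owns the lock.

```python
class LockDirectory:
    """Stands in for one lock Znode whose children are the waiting clients."""

    def __init__(self):
        self.next_seq = 0
        self.nodes = {}  # sequence number -> client name

    def create_ephemeral_sequential(self, client):
        """Register a client and return the sequence number it was given."""
        seq = self.next_seq
        self.next_seq += 1
        self.nodes[seq] = client
        return seq

    def holder(self):
        """The client with the smallest sequence number holds the lock."""
        return self.nodes[min(self.nodes)] if self.nodes else None

    def release(self, seq):
        # Also models a client session ending: its ephemeral node disappears.
        self.nodes.pop(seq, None)


lock = LockDirectory()
a = lock.create_ephemeral_sequential("client-A")
b = lock.create_ephemeral_sequential("client-B")
print(lock.holder())
lock.release(a)
print(lock.holder())
```

Because the nodes are ephemeral, a client that crashes while holding the lock loses its node when its session ends, and the lock passes to the next client in sequence instead of staying stuck.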


2. Service registration and discovery



Using Znode and Watcher, you can register and discover distributed services. The most famous application is Dubbo, Alibaba’s distributed RPC framework.


3. Share configuration and status information



Codis, a distributed solution for Redis, uses Zookeeper to store its data routing table and the metadata of the codis-proxy nodes. In addition, commands initiated by codis-config are synchronized to every live codis-proxy through Zookeeper.


In addition, Kafka, HBase, and Hadoop also rely on Zookeeper to synchronize node information to achieve high availability.







A few additions:


1. The ZAB protocol is fairly complex, and Xiao Grey has only a superficial understanding of it. Interested readers can turn to the official community to learn more.



2. This comic is just for fun. Please try to cherish your current work and don’t imitate Xiao Grey.


