1. What is ZooKeeper?

ZooKeeper is an open source distributed coordination service. It is a software that provides consistency services for distributed applications. Distributed applications can implement functions such as data publishing/subscription, load balancing, naming service, distributed coordination/notification, cluster management, Master election, distributed lock and distributed queue based on Zookeeper.

ZooKeeper aims to encapsulate key services that are complex and error-prone, and provide users with easy-to-use interfaces, efficient performance, and stable functions.

Zookeeper ensures the following distributed consistency features:

(1) Sequential consistency

(2) atomicity

(3) Single view

(4) Reliability

(5) Real-time (final consistency)

Read requests from clients can be processed by any machine in the cluster, and if a listener is registered on the node, the listener is also handled by the connected ZooKeeper machine. For write requests, these requests are sent to other ZooKeeper machines at the same time and are agreed upon before the request returns success. Therefore, as the number of Cluster machines in ZooKeeper increases, read request throughput increases but write request throughput decreases.

Order is an important feature of ZooKeeper. All updates are globally ordered. Each update has a unique timestamp, called zooKeeper Transaction Id (ZXID). However, the read request will only be ordered relative to the update, that is, the return result of the read request will contain the latest ZXID of the ZooKeeper.

2. What does ZooKeeper offer?

  • The file system
  • A notification mechanism

3. Zookeeper file system

Zookeeper provides a multi-tier node namespace (called zNode). Unlike file systems, where only file nodes can hold data and not directory nodes, these nodes can set associated data.

To ensure high throughput and low latency, Zookeeper maintains the tree directory structure in memory. This feature prevents Zookeeper from storing large amounts of data. Each node can store up to 1 MB of data.

4. How does Zookeeper synchronize the status of the primary and secondary nodes?

At the core of Zookeeper is the atomic broadcast mechanism, which ensures synchronization between servers. The protocol that implements this is called the Zab protocol. Zab protocol has two modes: recovery mode and broadcast mode.

  1. Recovery mode

Zab goes into recovery mode when the service starts or after the leader crashes, and the recovery mode ends when the leader is elected and most servers have completed state synchronization with the Leader. State synchronization ensures that the leader and Server have the same system state.

  1. Broadcasting mode

Once the leader has synchronized the status of most followers, it can start to broadcast messages. When a server is added to the ZooKeeper service, it starts in recovery mode, discovers the Leader, and synchronizes the status with the Leader. When the synchronization ends, it also participates in the message broadcast. The ZooKeeper service remains Broadcast until the Leader crashes or the leader loses most of the followers.

5. Four types of data nodes zNodes

(1) Persistent-persistent node

The node exists on Zookeeper unless manually deleted

(2) EPHEMERAL- The temporary node

The life cycle of the temporary node is bound to the client session. Once the client session fails (the client disconnects from ZooKeeper and the session does not necessarily fail), all the temporary nodes created by the client are removed.


The basic features are the same as the persistent node, but with the addition of sequential attributes. The node name is followed by an increment integer maintained by the parent node.

(4) EPHEMERAL_SEQUENTIAL- temporary sequential node

Basic features are the same as temporary nodes, with sequential attributes added and a self-incrementing integer number maintained by the parent node appended to the node name.

6. Zookeeper Watcher mechanism – Data change notification

Zookeeper allows the client to register a Watcher listener with a Znode on the server. When the Watcher is triggered by some specified event on the server, the server sends an event notification to the specified client to implement the distributed notification function. The client then makes business changes based on Watcher notification status and event type.

Working mechanism:

(1) The client registers watcher

(2) The server handles watcher

(3) The client calls back watcher

Watcher features:

(1) Disposable

Whenever a Watcher is touched, Zookeeper removes it from its storage, either on the server or the client. Such a design effectively relieves the pressure on the server. Otherwise, the server will constantly send event notifications to the client for frequently updated nodes. Both the network and the server are under great pressure.

(2) Serial execution by the client

The process of the client Watcher callback is a serial synchronization process.

(3) Light weight

3.1. Watcher notifications are very simple. They tell the client that an event has occurred, but they do not explain what the event is.

3.2 When a client registers a Watcher with a server, it does not pass the client’s actual Watcher object entity to the server. It simply marks the request with a Boolean attribute.

Watcher events are sent asynchronously from the server to the client. This creates a problem. Different clients communicate with each other through sockets. Zookeeper provides the ordering Guarantee, that is, the client can sense the changes of the ZNode monitored only after the monitoring event is monitored. So with Zookeeper we can’t expect to monitor every node change. Zookeeper only ensures final consistency, not strong consistency.

(5) Register watcher getData, exists, getChildren

(6) Trigger watcher create, delete, setData

(7) When a client connects to a new server, watch will be triggered by any session event. When a server is disconnected, the watch cannot be received. When the client reconnects, if necessary, all the previously registered watches will be re-registered. Usually this is completely transparent. Only in one special case can a watch be lost: for an uncreated ZNode exist Watch, the watch event can be lost if it is created during the client disconnection and then deleted before the client connects.

7. The client registers the Watcher implementation

(1) Call getData()/getChildren()/exist() API and pass Watcher object

(2) Tag request request, encapsulate Watcher to WatchRegistration

(3) Encapsulate it into a Packet object, and send the server to send request

(4) after receiving the response from the server, register Watcher with ZKWatcherManager for management

(5) Request return, complete registration.

8. The server handles the Watcher implementation

(1) The server receives Watcher and stores it

After receiving the client request, process the request to determine whether it is necessary to register Watcher. If necessary, connect the node path of the data node with ServerCnxn (ServerCnxn represents the connection between a client and a server, realizing the Process interface of Watcher. This can be viewed as a Watcher object) stored in WatchTable and watch2Paths of the WatcherManager.

(2) Watcher trigger

Take the NodeDataChanged event triggered by the setData() transaction request received by the server as an example:

2.1 packaging WatchedEvent

Encapsulate the notification status (SyncConnected), event type (NodeDataChanged), and node path into a WatchedEvent object

2.2 query Watcher

Look up Watcher based on node path from WatchTable

2.3 Not found; Note No client has registered Watcher on the data node

2.4 find; Extract and remove the Watcher from WatchTable and Watch2Paths.

(3) Call the process method to trigger Watcher

In this case, process mainly sends Watcher event notifications through the TCP connection corresponding to ServerCnxn.

9. The client calls back Watcher

The SendThread thread receives the event notification, and the EventThread calls back Watcher.

The client-side Watcher mechanism is also one-time, and once triggered, the Watcher is invalidated.

10. ACL permission control mechanism

UGO (User/Group/Others)

Currently used on Linux/Unix file systems, it is also the most widely used permission control method. Is a coarse-grained file system permission control mode.

Access Control List (ACL) Indicates an Access Control List

It includes three aspects:

Permission Mode (Scheme)

(1) IP: implements permission control by IP address granularity

(2) Digest: the most commonly used, use a permission identifier similar to username:password for permission configuration, easy to distinguish different applications for permission control

(3) World: The most open access control mode, a special Digest mode with only one access identifier: “World: Anyone”

(4) Super: indicates the Super user

Authorization object

An authorization object is a user or a specified entity to which the permission is granted, such as an IP address or a machine light.

Access Permission

(1) CREATE: data node creation permission, allowing authorized objects to CREATE child nodes under this Znode

(2) DELETE: child node deletion permission, allowing the authorized object to DELETE the child nodes of the data node

(3) READ: READ permission of a data node, allowing authorized objects to access the data node and READ its data content or child node list

(4) WRITE: data node update permission, allowing authorized objects to update the data node

(5) ADMIN: indicates the management permission of a data node, allowing authorized objects to set acLs on the data node

11. Chroot feature

Since version 3.2.0, the Chroot feature has been added, which allows each client to set a namespace for itself. If a client has Chroot set, any operations that the client can do to the server will be restricted to its own namespace.

By setting Chroot, a client can be applied to a subtree of the Zookeeper server. In scenarios where multiple applications share the same Zookeeper, different applications are isolated from each other.

12. Session management

Bucket policy: Similar sessions are managed in the same block so that Zookeeper can isolate sessions from different blocks and process the same block in a unified manner.

Allocation principle: ExpirationTime for each session

Calculation formula:

ExpirationTime_ = currentTime + sessionTimeout

ExpirationTime = (ExpirationTime_ / ExpirationInrerval + 1) *

Context Context ExpirationInterval Indicates the ExpirationInterval for a Zookeeper session. The default ExpirationInterval is tickTime

13. Server roles


(1) The only scheduler and handler of transaction requests to ensure the sequence of cluster transaction processing

(2) Scheduler of each service in the cluster


(1) Process the non-transaction request of the client and forward the transaction request to the Leader server

(2) Participate in the voting of the transaction request Proposal

(3) Participate in Leader election voting


(1) A server role introduced after version 3.0 that improves the non-transactional capabilities of the cluster without affecting its transactional capabilities

(2) Process the non-transaction request of the client and forward the transaction request to the Leader server

(3) Do not participate in any form of voting

14. Running status of the Zookeeper Server

The server has four states, which are LOOKING, FOLLOWING, LEADING, and OBSERVING.

(1) LOOKING: find the Leader state. When the server is in this state, it considers that there is no Leader in the cluster and therefore needs to enter the Leader election state.

(2) FOLLOWING the status of the FOLLOWING. The current server role is Follower.

(3) LEADING: LEADING state Indicates that the current server role is Leader.

(4) OBSERVING: the state of OBSERVING. Indicates that the current server role is Observer.

15. Data synchronization

After the Leader election is complete, the Learner (collectively named Follower and Observer) returns to the Leader server to register. When the Learner server wants the Leader server to complete the registration, the data synchronization process starts.

Data synchronization process :(all in the form of message passing)

Learner registers with Learder

Data synchronization

Synchronous confirm

Zookeeper data synchronization is classified into four types:

(1) Direct differential synchronization (DIFF synchronization)


(3) Only rollback synchronization (TRUNC synchronization)

(4) Full synchronization (SNAP synchronization)

Before data synchronization, the Leader server completes data synchronization initialization:


· Extract lastZxid (the ZXID last processed by the Learner server) from ACKEPOCH message sent when the Learner server registers


· Leader server Proposal minimum ZXIDmaxCommittedLog of cache queue:

· Leader server Proposal Maximum ZXID direct differential synchronization (DIFF synchronization) in the committedLog cache queue

· Scenario: peerLastZxid between minCommittedLog and maxCommittedLog rollback and differential synchronization (TRUNC+DIFF synchronization)

Scenario: When the new Leader server finds that a Learner server contains a transaction record that it does not have, it needs to ask the Learner server to roll back the transaction to the one that exists on the Leader server, The ZXID that is closest to peerLastZxid is TRUNC only

· Scenario: peerLastZxid is greater than maxCommittedLog

Full synchronization (SNAP synchronization)

· Scenario 1: peerLastZxid is less than minCommittedLog

· Scenario 2: There is no Proposal cache queue on the Leader server and peerLastZxid is not equal to lastProcessZxid

16. How does ZooKeeper ensure transaction order consistency?

Zookeeper uses the globally incrementing transaction Id to identify all proposals. All proposals are presented with an ZXID. The ZXID is actually a 64-bit number with the highest 32 bits being epoch (period; Era; The world; New era is used to identify the leader cycle. If a new leader is generated, the epoch will be incremented, and the lower 32 bits are used to increment the count. When a new proposal is generated, a transaction execution request will be sent to other servers according to the two-stage process of the database. If more than half of the machines can execute it and succeed, the execution will start.

17. Why is there a Master node in a distributed cluster?

In a distributed environment, some business logic only needs to be executed by one machine in the cluster, and the results can be shared by other machines. In this way, repeated computing can be greatly reduced and performance can be improved. Therefore, leader election is required.

18. What can I do if the ZK node is down?

Zookeeper is also a cluster. You are advised to configure at least three servers. Zookeeper itself ensures that when one node goes down, other nodes continue to provide services.

If one Follower crashes, there are still two servers that provide access to the Zookeeper data. Since there are multiple copies of the data on Zookeeper, the data is not lost.

If a Leader fails, Zookeeper elects a new Leader.

The mechanism of the ZK cluster is that as long as more than half of the nodes are normal, the cluster can provide services normally. The cluster fails only if there are so many ZK nodes hanging that only half or less of them work.


A 3-node cluster can fail 1 node (the leader can get 2 votes >1.5)

A 2-node cluster cannot fail any node (the leader gets 1 vote <=1)

19. Differences between ZooKeeper and NGINx load balancing

Zk load balancing is adjustable, nginx can only adjust the weight, other need to control the need to write plug-ins; However, the throughput of Nginx is much higher than that of ZK.

20. What are the deployment modes of Zookeeper?

Zookeeper can be deployed in three modes:

  1. Single-node deployment: Runs on one cluster.
  2. Cluster deployment: Multiple clusters run.
  3. Pseudo-cluster deployment: One cluster starts multiple Zookeeper instances.

21. How many machines are required in a cluster? What are the clustering rules? There are three servers in the cluster and one node is down. Can Zookeeper still be used?

The cluster rule is 2N+1 (N>0, that is, three nodes). Can continue to use, singular servers as long as more than half of the servers are not down can continue to use.

22. Does the cluster support dynamically adding machines?

In fact, the horizontal expansion, Zookeeper is not very good at this aspect. Two ways:

Restart all: Stop all Zookeeper services, modify the configurations, and start them. Previous client sessions are not affected.

Restart one by one: Under the principle that more than half of the VMS are available, the restart of one vm does not affect the external services provided by the entire cluster. This is a common way to do it.

Version 3.5 supports dynamic capacity expansion.

23. Is Zookeeper’s watch monitoring notification on a node permanent? Why not permanent?

It isn’t. Official statement: A Watch event is a one-time trigger. When the data of the Watch is changed, the server sends the change to the client with the Watch set, so that they can be notified.

Why is it not permanent? For example, if the server changes frequently, and the listening client in many cases, every change has to be notified to all the clients, putting a lot of pressure on the network and the server.

Generally, the client executes getData(“/node A “,true). If node A changes or is deleted, the client will get its Watch event. However, after node A changes again, and the client does not set the watch event, it will not send to the client.

In practice, in many cases, our clients don’t need to know every change on the server side, I just need the latest data.

24. What Are the Java clients of Zookeeper?

Java client: ZK client and Apache open source Curator.

25. What is Chubby and how do you compare it to Zookeeper?

Chubby is Google’s fully implemented PaxOS algorithm and is not open source. Zookeeper is an open source implementation of Chubby, using the ZAB protocol, a variant of the PaxOS algorithm.

26. Describe some common ZooKeeper commands.

Common commands include ls get set create delete.

27. The connection and difference between ZAB and Paxos algorithms?


(1) Both have a role similar to the Leader process, which is responsible for coordinating the running of multiple Follower processes

(2) The Leader process will wait for more than half of the followers to give correct feedback before submitting a proposal

(3) In ZAB protocol, each Proposal contains an epoch value to represent the current Leader cycle. In Paxos, the name is Ballot


ZAB is used to build a highly available distributed data master/slave system (Zookeeper), and Paxos is used to build a distributed consistent state machine system.

28. Typical Zookeeper application scenarios

Zookeeper is a typical publish/subscribe distributed data management and coordination framework that developers can use to publish and subscribe distributed data.

By cross-using the rich data nodes in Zookeeper and cooperating with Watcher event notification mechanism, it is very convenient to build a series of core functions involved in distributed applications, such as:

(1) Data publishing/subscription

(2) Load balancing

(3) Naming service

(4) Distributed coordination/notification

(5) Cluster management

(6) Master election

(7) Distributed lock

(8) Distributed queue

Data publish/subscribe


A data publish/subscribe system, known as a configuration center, is where publishers publish data for subscribers to subscribe to.


Dynamically retrieving data (configuration information)

Centralized management of data (configuration information) and dynamic update of data

Design patterns

Push model

The Pull model

Data (configuration information) feature

(1) The amount of data is usually small

(2) Data content will be dynamically updated during operation

(3) All machines in the cluster share the same configuration

For example: machine list information, run time switch configuration, database configuration information, etc

Implementation based on Zookeeper

· Data storage: Stores data (configuration information) to a data node on Zookeeper

· Data acquisition: The application reads data from the Zookeeper data node at startup initialization and registers a data change Watcher on the node

· Data change: When the data is changed, the corresponding node data of Zookeeper is updated. Zookeeper sends the data change notification to each client. After receiving the notification, the client can read the changed data again.

Load balancing

Zk naming service

Naming service is to obtain the address of the resource or service by the specified name, using ZK to create a global path, this path can be used as a name, pointing to the cluster in the cluster, the address of the service, or a remote object and so on.

Distributed notification and coordination

For system scheduling: the operator sends notifications that actually change the state of a node through the console, and ZK sends those changes to all the clients of the Watcher that have registered the node.

For performance reporting: each worker process creates a temporary node in a directory. It also carries the progress data of the work, so that the summary process can monitor changes in the sub-nodes of the directory to get a real-time global picture of the work progress.

Zk naming Service (File System)

Naming service is to obtain the address of the resource or service by the specified name, using ZK to create a global path, that is, a unique path, this path can be used as a name, pointing to the cluster in the cluster, the address of the service, or a remote object and so on.

Zk configuration Management (file system, notification mechanism)

Program distributed deployment in different machines, the program configuration information in the ZK znode, when the configuration changes, that is, when znode changes, you can change the content of a directory node in ZK, using Watcher notification to each client, so as to change the configuration.

Zookeeper Cluster management (file system, notification mechanism)

Cluster management does not care about two things: whether a machine exits or joins, or elects a master.

For the first point, all machines have a convention to create a temporary directory node under the parent directory and then listen on the parent directory node

Child node change message of. Once a machine dies, its connection to ZooKeeper is disconnected, the temporary directory node it created is deleted, and all other machines are notified that a sibling directory has been deleted, so everyone knows it’s on board.

In the same way, all machines are notified that the new sibling directory has been added, and the highcount is now available again. For the second point, we change it slightly. All machines create a temporary sequentially numbered directory node, and select the machine with the smallest number each time as the master.

Zookeeper distributed lock (file system, notification mechanism)

With ZooKeeper’s consistent file system, locking issues are made easier. Locking services can be divided into two categories, one for holding exclusivity and the other for controlling timing.

For the first type, we treat a ZNode on ZooKeeper as a lock, implemented by createZNode. All clients create the /distribute_lock node, and the client that is successfully created owns the lock. When the distribute_lock node you created is deleted, the lock is released.

For the second type, /distribute_lock already exists, and all clients create a temporary sequentially numbered directory node under it. As with master, the least numbered directory node obtains the lock and is deleted when it is used up.

Zookeeper queue Management (file system, notification mechanism)

There are two types of queues:

(1) Synchronous queue, when the members of a queue are all together, the queue can be available, otherwise it has been waiting for all members to arrive.

(2) Queue entry and exit operations are carried out in FIFO mode.

First, create temporary directory nodes under the convention directory, and listen for the required number of nodes.

The second type is consistent with the basic principle of the control sequence scenario in distributed lock service. The entry column is numbered and the exit column is numbered. A PERSISTENT_SEQUENTIAL node is created in a specific directory, Watcher notifies the waiting queue when it is successfully created, and the queue deletes the node with the smallest sequence number for consumption. In this scenario, the ZNode of Zookeeper is used for message storage. The data stored in the ZNode is the message content in the message queue, and the SEQUENTIAL serial number is the message number. Since the nodes created are persistent, you don’t have to worry about losing queue messages.

29. What are the functions of Zookeeper?

  1. Cluster management: Monitors node survival status and running requests.
  2. Primary node election: If the primary node fails, a new primary node can be elected from the standby node. Primary node election refers to the election process. Zookeeper can help complete this process.
  3. Distributed lock: Zookeeper provides two types of locks: exclusive lock and shared lock. An exclusive lock means that only one thread can use a resource at a time. A shared lock means that read locks are shared. Read and write locks are mutually exclusive, that is, multiple threads can read the same resource at the same time. Zookeeper controls distributed locks.
  4. Naming service: In distributed systems, client applications can obtain information such as the address and provider of a resource or service by using a named naming service.

30. What is the notification mechanism of Zookeeper?

The client will set up a Watcher event for a Znode. When the Znode changes, these clients will receive a notification from ZK, and then the client can make business changes according to the zNode changes.

31. The relationship between Zookeeper and Dubbo?

Functions of Zookeeper:

Zookeeper is used to register services and load balance. Which service is provided by which machine must be made known to the caller. In simple terms, it is the corresponding relationship between IP address and service name. Of course, this correspondence could also be hard-coded in the caller’s business code, but if the machine providing the service dies the caller will not know about it and will continue to request the service from the failed machine if the code is not changed. Using the heartbeat mechanism, ZooKeeper detects the failed server and deletes the IP address and service mapping of the failed server from the list. As for supporting high concurrency, it’s simply scaling out, adding machines to increase computing power without changing the code. By adding new machines to register services with ZooKeeper, more service providers can serve more customers.


Dubbo is a tool for managing the middle tier, where there are many services to access and service providers to schedule between the business tier and the data warehouse. Dubbo provides a framework to solve this problem. Note that the Dubbo here is just a frame, and what you put on the shelf is entirely up to you, just like a car skeleton that you need to match with your wheels and engine. The only way to do scheduling in this framework is to have a distributed registry that stores metadata for all the services, you can use ZK, you can use anything else, but everyone uses ZK.

Relationship between ZooKeeper and Dubbo:

Dubbo will abstract the registry, it can external to different storage media to provide services to the registry, there are ZooKeeper, Memcached, Redis and so on.

ZooKeeper is introduced as a storage medium, and the features of ZooKeeper are introduced. First, load balancing. The carrying capacity of a single registry is limited. When the traffic reaches a certain level, it needs to be distributed. Resource synchronization: Load balancing is not enough. Data and resources between nodes need to be synchronized. The ZooKeeper cluster naturally provides this function. Service providers write their URL addresses to /dubbo/${serviceName}/providers directories on ZooKeeper at startup. This operation is completed to publish the service. Other features include Mast elections, distributed locks, and more.

Source: author: thinkwon thinkwon.blog.csdn.net/article/det…