1. What is ZooKeeper?

ZooKeeper is an open-source distributed coordination service that provides consistency services for distributed applications. Based on ZooKeeper, distributed applications can implement functions such as data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.

The goal of ZooKeeper is to encapsulate complex and error-prone key services and provide users with an easy-to-use interface and an efficient and stable system.

ZooKeeper guarantees the following distributed consistency features:

(1) sequential consistency

(2) Atomicity

(3) Single system image

(4) Reliability

(5) Timeliness (eventual consistency)

A client’s read request can be processed by any machine in the cluster, and if the read request registers a listener on a node, that listener is also handled by the ZooKeeper machine the client is connected to. Write requests are sent to the other ZooKeeper machines, and the request returns success only after agreement is reached. Therefore, as the number of machines in the ZooKeeper cluster grows, read throughput increases while write throughput decreases.

Ordering is a very important feature of ZooKeeper. All updates are globally ordered, and each update carries a unique timestamp called a ZXID (ZooKeeper Transaction ID). Reads are ordered relative to updates: the response to a read request carries the latest ZXID processed by the server that handled it.

2. What does ZooKeeper provide?

  • The file system
  • A notification mechanism

3. ZooKeeper file system

ZooKeeper provides a multi-level namespace of nodes (called znodes). Unlike a file system, where only file nodes can store data while directory nodes cannot, every znode can hold associated data.

To ensure high throughput and low latency, ZooKeeper keeps this tree-shaped directory structure in memory. This prevents ZooKeeper from being used to store large amounts of data; the data on each znode is limited to 1 MB.
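As an illustration of this hierarchical model, here is a minimal in-memory sketch (purely illustrative, not ZooKeeper's actual implementation) in which every node, directory-like or not, can hold data:

```python
# Illustrative sketch of a znode tree: unlike a file system, every node can
# both hold data and have children. Names and structure are hypothetical.

class ZNode:
    def __init__(self, data=b""):
        self.data = data          # every znode may carry data (<= 1 MB in ZooKeeper)
        self.children = {}        # child name -> ZNode

class ZNodeTree:
    def __init__(self):
        self.root = ZNode()

    def create(self, path, data=b""):
        parts = [p for p in path.split("/") if p]
        node = self.root
        for name in parts[:-1]:
            node = node.children[name]   # parent znodes must already exist
        node.children[parts[-1]] = ZNode(data)

    def get(self, path):
        node = self.root
        for name in (p for p in path.split("/") if p):
            node = node.children[name]
        return node.data

tree = ZNodeTree()
tree.create("/app", b"config-root")      # a "directory" node that also stores data
tree.create("/app/db", b"host=10.0.0.1")
```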

4. How does ZooKeeper ensure the state synchronization between master and slave nodes?

At the heart of ZooKeeper is the atomic broadcasting mechanism, which ensures synchronization between servers. The protocol that implements this mechanism is called the ZAB protocol. ZAB protocol has two modes, they are recovery mode and broadcast mode.

  1. Recovery mode

Zab goes into recovery mode when the service starts or after the leader crashes, and the recovery mode ends when the leader is elected and most servers have finished synchronizing with the leader’s state. State synchronization ensures that the leader and the server have the same system state.

  2. Broadcast mode

Once the leader has synchronised its status with most of the followers, it can start broadcasting a message, thus entering the broadcast state. When a server joins the ZooKeeper service, it starts in recovery mode, finds the leader, and synchronizes the state with the leader. At the end of the synchronization, it also participates in the message broadcast. The ZooKeeper service remains in the Broadcast state until the leader crashes or loses most of the followers’ support.
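The commit decision in broadcast mode can be sketched as follows (an illustrative simulation, not ZooKeeper's actual code): the Leader counts its own implicit ack plus follower acks and commits only when more than half of the ensemble has acknowledged the proposal.

```python
# Illustrative simulation of the ZAB broadcast commit rule. Followers are
# modeled as callables that return True when they persist and ack a proposal.
def broadcast(proposal, followers):
    acks = 1                          # the Leader implicitly acks its own proposal
    for follower in followers:
        if follower(proposal):
            acks += 1
    ensemble = len(followers) + 1     # Leader plus followers
    return acks > ensemble // 2       # commit only with a strict majority
```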

5. The four types of data nodes (znodes)

(1) Persistent – Persistent nodes

The node remains on ZooKeeper unless it is removed manually.

(2) Ephemeral – Temporary node

The lifetime of an ephemeral node is tied to the client session: all ephemeral nodes created by a client are removed when that client’s session expires (note that a client disconnecting from ZooKeeper does not necessarily mean the session has expired).

(3) PERSISTENT_SEQUENTIAL – persistent sequential node

The basic features are the same as for persistent nodes, with an added ordering property: the node name is appended with a monotonically increasing integer maintained by the parent node.

(4) EPHEMERAL_SEQUENTIAL – ephemeral sequential node

The basic features are the same as for ephemeral nodes, with an added ordering property: the node name is appended with a monotonically increasing integer maintained by the parent node.
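The parent-maintained counter behind sequential node names can be sketched like this (a hypothetical simulation; ZooKeeper appends a 10-digit, zero-padded sequence):

```python
# Hypothetical sketch of sequential-node naming: the parent node keeps the
# counter and the sequence is appended as a 10-digit, zero-padded number.
class ParentNode:
    def __init__(self):
        self._next_seq = 0

    def create_sequential(self, prefix):
        name = "%s%010d" % (prefix, self._next_seq)
        self._next_seq += 1
        return name
```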

6. ZooKeeper Watcher mechanism — data change notification

ZooKeeper allows a client to register a Watcher with a server ZNode. When the Watcher is triggered by a specified event on the server, the server sends an event notification to the specified client to implement distributed notification. The client then makes business changes based on the Watcher notification status and event type.

Working mechanism:

(1) Register Watcher on the client side

(2) The server handles Watcher

(3) The client calls back watcher

Watcher features summary:

(1) Disposable

Once a Watcher is triggered, ZooKeeper removes it from storage on both the server and the client. This design effectively relieves pressure on the server; otherwise, for frequently updated nodes, the server would constantly send event notifications to clients, putting great pressure on both the network and the server.

(2) Serial execution of the client

The process of the client Watcher callback is a serial synchronous process.

(3) Lightweight

3.1 Watcher notifications are very simple: they only tell the client that an event has occurred, without describing the specific content of the event.

3.2 When a client registers a Watcher with the server, it does not send the actual Watcher object to the server; the request merely carries a boolean flag set by the client.

(4) Asynchronous delivery. Watcher notification events are sent asynchronously from the server to the client. Since clients and servers communicate over sockets, network delay or other factors may cause a client to receive a notification late. ZooKeeper itself only guarantees ordering: a client will not see a change to a monitored znode before it receives the corresponding event. Therefore, ZooKeeper cannot be expected to report every change to a node; it guarantees eventual consistency, not strong consistency.

(5) Watchers are registered via getData(), exists(), and getChildren()

(6) Watchers are triggered by create(), delete(), and setData()

(7) When a client connects to a new server, its watches are triggered with a session event. Watches cannot be received while the connection to a server is lost. When the client reconnects, previously registered watches are re-registered as needed; usually this is completely transparent. A watch can be lost in only one special case: an exists() watch on a not-yet-created znode is lost if the znode is created and then deleted while the client is disconnected.

7. The client registers the Watcher implementation

(1) Call the getData()/getChildren()/exists() API and pass a Watcher object

(2) Mark the request and encapsulate the Watcher in a WatchRegistration

(3) Encapsulate it as a Packet object and send the request

(4) After receiving the response from the server, register Watcher in ZKWatcherManager for management

(5) Request return and complete registration.

8. Watcher implementation of server-side processing

(1) The server receives and stores Watcher

After receiving the client request, the server determines whether a Watcher needs to be registered. If so, it associates the data node’s path with the ServerCnxn (ServerCnxn represents a connection between the client and the server, and implements the process interface of Watcher), and stores them in the WatcherManager’s watchTable and watch2Paths.

(2) Watcher trigger

Take the NodeDataChanged event triggered by a server receiving a request for a setData() transaction as an example:

2.1 packaging WatchedEvent

The notification state (SyncConnected), event type (NodeDataChanged), and node path are encapsulated as a WatchedEvent object

2.2 query Watcher

Find Watcher by node path from WatchTable

2.3 If none is found: no client has registered a Watcher on this data node

2.4 If found: extract the Watchers and delete them from watchTable and watch2Paths (server-side Watchers are one-time and are invalidated once triggered)

(3) Call the process method to trigger the Watcher

Here, process mainly sends the Watcher event notification through the TCP connection of the corresponding ServerCnxn.

9. The client calls back Watcher

The client SendThread thread receives the event notification, which is passed to the EventThread thread to call back to Watcher.

The Watcher mechanism on the client side is also one-time: once triggered, the Watcher is invalidated.

10. ACL permission control mechanism

UGO (User/Group/Others)

Used today in Linux/UNIX file systems, it is the most widely applied form of permission control, and a coarse-grained one.

ACL (Access Control List)

Including three aspects:

Permission Mode (Scheme)

(1) IP: Permission control from IP address granularity

(2) Digest: the most commonly used; permissions are configured with username:password-style credentials, making it easy to distinguish different applications for access control

(3) World: the most open access-control mode; a special Digest mode with the single access identifier “world:anyone”

(4) Super: Super user

Authorization object

An authorization object is the user or entity to which a permission is granted, such as an IP address or a machine.

Access Permission

(1) CREATE: permission to CREATE data nodes, which allows authorized objects to CREATE child nodes under this ZNode

(2) DELETE: Allows an authorized object to DELETE a child node of the data node

(3) READ: read permission on the data node, allowing the authorized object to access the node and read its data content, child-node list, etc.

(4) Write: The authorization object is allowed to update the data node

(5) Admin: data node management authority, allowing the authorization object to carry out ACL related setting operations on the data node

11. Chroot feature

Since version 3.2.0, the Chroot feature has been added, which allows each client to set a namespace for itself. If a client sets up Chroot, any actions that the client does to the server will be restricted to its own namespace.

By setting up the Chroot, you can apply a client to a sub-tree of the ZooKeeper server, which is useful for isolating different applications from each other in situations where multiple applications share a ZooKeeper group.

12. Session management

Bucket strategy: manage sessions with similar expiration times in the same bucket, making it easy for ZooKeeper to isolate sessions in different buckets and to process the sessions in the same bucket together.

Allocation principle: each session is assigned an ExpirationTime

Calculation formula:

ExpirationTime_ = currentTime + sessionTimeout

ExpirationTime = (ExpirationTime_ / ExpirationInterval + 1) * ExpirationInterval

where ExpirationInterval is the interval at which the ZooKeeper server checks sessions for expiration.
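A sketch of the bucketing calculation above (the 2000 ms ExpirationInterval is an illustrative assumption, not a fixed ZooKeeper value):

```python
# Sketch of the session-expiration bucketing formula. The 2000 ms
# ExpirationInterval is an illustrative assumption.
def bucket_expiration(current_time_ms, session_timeout_ms, expiration_interval_ms=2000):
    raw = current_time_ms + session_timeout_ms            # ExpirationTime_
    # Round up to the next multiple of ExpirationInterval so that sessions
    # expiring around the same time land in the same bucket.
    return (raw // expiration_interval_ms + 1) * expiration_interval_ms
```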

13. Server role


Leader

(1) The only scheduler and processor of transaction requests, guaranteeing the ordering of cluster transaction processing

(2) The scheduler of each service within the cluster


Follower

(1) Handles non-transactional client requests and forwards transaction requests to the Leader server

(2) Participate in the voting of the transaction request Proposal

(3) Participate in Leader election voting


Observer

(1) A server role introduced after version 3.0 that improves the cluster’s non-transactional processing capacity without affecting its transactional processing capacity

(2) Handle the non-transactional requests of the client and forward the transaction requests to the Leader server

(3) Do not vote in any form

14. Server working status under ZooKeeper

The server has four states, which are LOOKING, FOLLOWING, LEADING, and OBSERVING.

(1) LOOKING: looking-for-Leader state. A server in this state believes there is currently no Leader in the cluster and therefore needs to enter Leader election.

(2) FOLLOWING: follower state. The current server’s role is Follower.

(3) LEADING: leading state. The current server’s role is Leader.

(4) OBSERVING: observing state. The current server’s role is Observer.

15. Data synchronization

After the whole cluster completes Leader election, Learners (the collective name for Followers and Observers) register with the Leader server. Once a Learner has completed registration with the Leader, it enters the data synchronization phase.

Data synchronization process (all done via message passing):

Learner registers with the Leader

Data synchronization

Synchronization acknowledgment

ZooKeeper’s data synchronization generally falls into four categories:

(1) Direct differentiation synchronization (Diff synchronization)

(2) first rollback and then differentiated synchronization (TRUNC+DIFF synchronization)

(3) Rollback synchronization only (TRUNC synchronization)

(4) Full synchronization (SNAP synchronization)

Before data synchronization, the Leader server completes synchronization initialization:

· peerLastZxid is extracted from the ACKEPOCH message the Learner server sends during registration (the last ZXID processed by that Learner)

· minCommittedLog: the minimum ZXID in the Leader server’s Proposal cache queue committedLog

· maxCommittedLog: the maximum ZXID in the Leader server’s Proposal cache queue committedLog

Direct differential synchronization (DIFF synchronization)

· Scenario: peerLastZxid lies between minCommittedLog and maxCommittedLog

Rollback followed by differential synchronization (TRUNC+DIFF synchronization)

· Scenario: the new Leader server finds that a Learner server contains transaction records it does not have itself. The Learner must then roll back those transactions, back to the ZXID that exists on the Leader server and is closest to peerLastZxid

Rollback-only synchronization (TRUNC synchronization)

· Scenario: peerLastZxid is greater than maxCommittedLog

Full synchronization (SNAP synchronization)

· Scenario 1: peerLastZxid is less than minCommittedLog

· Scenario 2: the Leader server has no Proposal cache queue and peerLastZxid is not equal to lastProcessedZxid

16. How does ZooKeeper ensure the sequential consistency of transactions?

ZooKeeper identifies each proposal with a globally increasing transaction ID, the ZXID, which is a 64-bit number. The high 32 bits are the epoch, identifying the Leader period: whenever a new Leader is elected, the epoch is incremented. The low 32 bits are an incrementing counter. When a new proposal is generated, following a process similar to a database two-phase commit, the Leader first sends the transaction proposal to the other servers; execution begins only if more than half of the machines can execute it successfully.
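The ZXID layout described above can be illustrated with a few bit operations (an illustrative sketch; helper names are hypothetical):

```python
# Illustrative sketch of the 64-bit ZXID layout: high 32 bits = epoch,
# low 32 bits = per-epoch counter. Helper names are hypothetical.
def make_zxid(epoch, counter):
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def zxid_epoch(zxid):
    return zxid >> 32

def zxid_counter(zxid):
    return zxid & 0xFFFFFFFF
```

Because the epoch occupies the high bits, every ZXID of a new Leader period compares greater than any ZXID of an earlier period, which is what makes the global ordering work.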

17. Why is there a Master node in a distributed cluster?

In a distributed environment, some business logic only needs to be executed by one machine in the cluster, and other machines can share the result, which can greatly reduce repeated calculation and improve performance. Therefore, leader election is required.

18. How to deal with ZK node downtime?

ZooKeeper is itself a cluster, and it is recommended that you configure no fewer than three servers. ZooKeeper itself ensures that when one node goes down, the other nodes will continue to serve.

If one Follower fails, there are two servers to access. Since ZooKeeper has multiple copies of the data, the data is not lost.

If a Leader goes down, ZooKeeper elects a new Leader.

The mechanism of a ZK cluster is that the cluster can serve normally as long as more than half of its nodes are up. The cluster fails only when so many ZK nodes have gone down that half or fewer remain working.


A cluster of 3 nodes can tolerate 1 failed node (the leader can still get 2 votes > 1.5)

A cluster of 2 nodes cannot tolerate any failed node (the leader could get only 1 vote <= 1)
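The majority rule can be expressed directly (illustrative helper functions, not part of ZooKeeper's API):

```python
# Illustrative helpers expressing the more-than-half rule behind cluster
# availability. Function names are hypothetical.
def cluster_available(total, alive):
    # The cluster serves requests only while a strict majority is up.
    return alive > total // 2

def max_tolerated_failures(total):
    # Largest number of nodes that may fail while a majority survives.
    return (total - 1) // 2
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd ensemble sizes are recommended.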

19. The difference between ZooKeeper and Nginx load balancing

ZooKeeper-based load balancing can be adjusted and controlled by your own logic, whereas Nginx can only adjust weights; anything else requires writing your own plug-in. However, Nginx’s throughput is far higher than ZooKeeper’s, so which to use should be decided by the business.

20. What are the deployment modes of ZooKeeper?

ZooKeeper has three deployment modes:

  1. Standalone deployment: runs on a single machine;
  2. Cluster deployment: runs on multiple machines;
  3. Pseudo-cluster deployment: one machine starts multiple ZooKeeper instances.

21. What is the minimum number of machines required for clustering? What are the clustering rules? There are 3 servers in the cluster, and one of the nodes is down. Can ZooKeeper still be used?

The cluster rule is 2N+1 servers (N > 0), so the minimum is 3. The cluster keeps working as long as no more than half of the servers are down, so with 3 servers and one node down, ZooKeeper can still be used.

22. Does the cluster support dynamically adding machines?

ZooKeeper is not very good at this. Two ways:

Restart the whole cluster: shut down all ZooKeeper services, modify the configuration, and start them again. This does not affect previous client sessions.

Rolling restart: under the principle that more than half of the machines remain usable, restarting one machine at a time does not affect the service the cluster provides externally. This is the more common approach.

Dynamic capacity expansion is supported starting from version 3.5.

23. Is ZooKeeper’s watch listening notice to nodes permanent? Why not forever?

It isn’t. Official note: a Watch event is a one-time trigger that notifies the client when the data it has set a Watch on changes.

Why not permanent? If, for example, the server changed frequently and many clients were listening, every change would have to be pushed to all clients, putting a lot of pressure on the network and the server.

In general, a client executes getData(“/nodeA”, true). If node A is changed or deleted, the client receives its watch event. However, if node A changes again and the client has not set a new watch event, no notification is sent to the client.

In practice, in many cases our clients do not need to know about every change on the server; they just want the latest data.
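The one-time watch behavior described above can be simulated in a few lines (an illustrative model, not the real client API):

```python
# Illustrative model of ZooKeeper's one-time watch semantics. A watch
# registered via get_data fires for the next change only and is removed
# before it is invoked. Class and method names are hypothetical.
class OneShotWatchNode:
    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Passing a watcher registers it for the NEXT change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # One-time trigger: remove all watchers before firing them.
        fired, self._watchers = self._watchers, []
        for w in fired:
            w("NodeDataChanged")

events = []
node = OneShotWatchNode(b"v1")
node.get_data(watcher=events.append)  # like getData("/nodeA", true)
node.set_data(b"v2")                  # fires the watcher once
node.set_data(b"v3")                  # no watcher registered any more
```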

24. What are the Java clients of ZooKeeper?

Java clients: the open-source ZkClient, and Apache’s open-source Curator.

25. What is Chubby, and how do you think it compares with ZooKeeper?

Chubby is Google’s; it fully implements the Paxos algorithm and is not open source. ZooKeeper is an open-source implementation of Chubby and uses the ZAB protocol, a variant of the Paxos algorithm.

26. Tell me some common commands of ZooKeeper.

ls, get, set, create, delete

27. What are the connections and differences between Zab and Paxos?


Similarities:

(1) Both have a role similar to a Leader process that coordinates the running of multiple Follower processes

(2) The Leader process waits for more than half of the Followers to give correct feedback before committing a proposal

(3) In the ZAB protocol, each Proposal contains an epoch value representing the current Leader period; in Paxos this is called the Ballot


Differences:

ZAB is used to build a highly available distributed primary/backup data system (ZooKeeper), while Paxos is used to build distributed consistent state-machine systems.

28. Typical application scenarios of ZooKeeper

ZooKeeper is a typical distributed data management and coordination framework with a publish/subscribe model. Developers can use it to publish and subscribe distributed data.

Through the combined use of ZooKeeper’s rich data-node types and its Watcher event-notification mechanism, it is very convenient to build a series of core functions that distributed applications involve, such as:

(1) Data publish/subscribe

(2) Load balancing

(3) Naming Service

(4) Distributed coordination/notification

(5) Cluster management

(6) Master election

(7) Distributed locking

(8) Distributed Queue

Data publish/subscribe


A data publish/subscribe system, commonly known as a configuration center, is, as the name suggests, a place where publishers publish data for subscribers to consume.


Dynamic data acquisition (configuration information)

Realize centralized management of data (configuration information) and dynamic update of data

Design patterns

Push model

The Pull model

Data (configuration information) features

(1) The amount of data is usually small

(2) Data content will be dynamically updated during runtime

(3) The configuration is shared by all machines in the cluster and must be consistent

Such as: machine list information, runtime switch configuration, database configuration information, etc

Implementation based on ZooKeeper

· Data store: store data (configuration information) to a data node on ZooKeeper

· Data acquisition: the application reads the data from the ZooKeeper data node during startup initialization and registers a data-change Watcher on the node

· Data change: when the data changes, update the corresponding ZooKeeper node’s data. ZooKeeper sends a data-change notification to each client, and each client can re-read the changed data after receiving the notification.

Load balancing

ZK’s naming service

A naming service obtains the address of a resource or service by a specified name. ZK can create a global, unique path that serves as a name pointing to a cluster, the address of a provided service, a remote object, and so on.

Distributed notification and coordination

For system scheduling: an operator sends a notification by actually changing the state of a node through the console, and ZK sends the change to every client that has registered a Watcher on that node.

For performance reporting: each worker process creates an ephemeral node under a directory and carries its progress data, so an aggregating process can monitor the directory’s child-node changes to get a real-time global picture of work progress.


ZK configuration management (file system, notification mechanism)

Programs are distributed across different machines, and each program’s configuration information is placed under a ZK znode. When the configuration changes, that is, when the znode changes, the content of the directory node in ZK is updated, and Watchers notify each client to change its configuration.

ZooKeeper cluster management (file system, notification mechanism)

Cluster management cares about two things: whether machines exit and join, and electing a master.

For the first point, all machines agree to create an ephemeral node under a parent directory node and then listen for the parent’s child-node change messages. When a machine goes down, its connection to ZooKeeper is broken, the ephemeral node it created is deleted, and all other machines are notified that a sibling node was removed, so everyone knows that machine has gone offline.

When a new machine joins, all machines are notified that a new sibling node was added, and up-to-date membership information is available again. For the second point, we change this slightly: all machines create ephemeral sequential nodes, and each time the machine with the smallest number is selected as master.

ZooKeeper distributed lock (file system, notification mechanism)

With ZooKeeper’s consistent file system, the locking problem becomes easier. Lock services can be divided into two categories, one is to maintain exclusivity and the other is to control timing.

For the first category, we treat a znode on ZooKeeper as the lock, implemented via createZNode. All clients try to create the /distribute_lock node, and the client that succeeds owns the lock. Deleting the /distribute_lock node it created releases the lock.

For the second category, /distribute_lock already exists, and every client creates an ephemeral sequential node under it. As with master election, the smallest number gets the lock; the client deletes its node when done, and so on.
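The sequential-node locking scheme can be simulated as follows (illustrative only; a real implementation would use a ZooKeeper client and ephemeral sequential nodes):

```python
# Illustrative simulation of the sequential-node lock: each client creates a
# numbered node under /distribute_lock; the smallest number holds the lock.
class SeqLockDir:
    def __init__(self):
        self._next_seq = 0
        self.nodes = {}               # node name -> client id

    def acquire(self, client):
        name = "lock-%010d" % self._next_seq   # like an EPHEMERAL_SEQUENTIAL child
        self._next_seq += 1
        self.nodes[name] = client
        return name

    def holder(self):
        # Zero-padded names sort lexicographically in numeric order.
        return self.nodes[min(self.nodes)] if self.nodes else None

    def release(self, name):
        del self.nodes[name]          # deleting the node passes the lock on

lock_dir = SeqLockDir()
n1 = lock_dir.acquire("client-A")
n2 = lock_dir.acquire("client-B")
```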

ZooKeeper queue management (file system, notification mechanism)

There are two types of queues:

(1) Synchronization queue: the queue becomes usable only when all members have gathered; otherwise it waits for all members to arrive.

(2) FIFO queue: entries are enqueued and dequeued in first-in, first-out order.

For the first type, create an ephemeral node under an agreed directory and listen for whether the number of child nodes has reached the number we want.

The second type follows the same basic principle as the timing-control scenario in the distributed-lock service: entries are numbered on the way in and removed in numbered order on the way out. Create PERSISTENT_SEQUENTIAL nodes under a specific directory; when a node is created successfully, Watchers notify the waiting queue, and the queue consumes the node with the smallest sequence number. In this scenario, ZooKeeper znodes serve as message storage: the data stored in a znode is the message content, and the sequential number is the message’s number, dequeued in order. Since the nodes created are persistent, there is no need to worry about losing queue messages.
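The FIFO queue scheme can likewise be simulated (illustrative only; real code would create PERSISTENT_SEQUENTIAL znodes through a ZooKeeper client):

```python
# Illustrative simulation of the FIFO queue: messages become numbered nodes
# and are consumed smallest-number-first.
class ZSeqQueue:
    def __init__(self):
        self._next_seq = 0
        self._nodes = {}              # node name -> message payload

    def enqueue(self, payload):
        name = "msg-%010d" % self._next_seq    # like a PERSISTENT_SEQUENTIAL child
        self._next_seq += 1
        self._nodes[name] = payload

    def dequeue(self):
        if not self._nodes:
            return None
        return self._nodes.pop(min(self._nodes))  # smallest sequence number first

q = ZSeqQueue()
q.enqueue(b"first")
q.enqueue(b"second")
```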

29. What functions does ZooKeeper have?

  1. Cluster management: monitoring node survival status, running requests, etc.;
  2. Primary node election: After the primary node dies, a new round of primary election can be started from the standby node. Primary node election refers to the election process, and ZooKeeper can help complete this process.
  3. Distributed locks: ZooKeeper provides two types of locks: exclusive locks and shared locks. An exclusive lock means only one thread can use the resource at a time; a shared (read) lock allows multiple threads to read the same resource concurrently while writes remain exclusive. ZooKeeper can control distributed locks.
  4. Naming service: In a distributed system, by using a naming service, the client application can obtain the address, provider and other information of the resource or service according to the specified name.

30. What about ZooKeeper’s notification mechanism?

A client creates a watcher event for a znode. When the znode changes, ZK notifies the client, and the client can then make business changes based on the znode change.

31. What is the relationship between Zookeeper and Dubbo?

What ZooKeeper does:

ZooKeeper is used to register services and load balance them. The caller must know which service is provided by which machine. This is simply the IP address and the service name. Of course, this correspondence can also be hard-coded into the caller’s business code, but if the machine providing the service dies, the caller cannot know, and if the code is not changed, the caller will continue to request the machine providing the service. ZooKeeper uses a heartbeat mechanism to detect dead machines and remove the IP and service correspondence of dead machines from the list. As for supporting high concurrency, it’s simply scaling horizontally, adding machines to increase computing power without changing the code. By adding a new machine to register the service with ZooKeeper, the service provider can serve as many customers as possible.


Dubbo is a tool for managing the middle tier. Between the business tier and the data warehouse there are many access services and service providers that need to be scheduled, and Dubbo provides a framework to solve this problem. Note that Dubbo here is just a framework; what you put into it is up to you, like a car chassis that you still need to fit with wheels and an engine. In this framework you need a distributed registry to store the metadata of all services; you can use ZK or something else, but everyone uses ZK.

ZooKeeper and Dubbo:

Dubbo abstracts the registry, so different storage media can back it, such as ZooKeeper, Memcached, Redis, etc.

By introducing ZooKeeper as the storage medium, we also get ZooKeeper’s features. First, load balancing: the capacity of a single registry is limited, so traffic must be distributed once it reaches a certain level, and load balancing exists for this distribution. Next, resource synchronization: load balancing alone is not enough; data and resources between nodes need to be synchronized, and a ZooKeeper cluster naturally provides this capability. For example, a service provider writes its URL address to the designated node /dubbo/${serviceName}/providers on ZooKeeper at startup, which completes the publishing of the service. Other features include Master election, distributed locks, and more.

Author: thinkwon


This article was first published on the WeChat official account “Java version of the Web project”.