ZooKeeper is a service for coordinating (synchronizing) distributed processes. It provides a simple, high-performance coordination kernel on which users can build more complex distributed coordination functions.

Multiple distributed processes operate on ZNodes, the shared in-memory data objects of ZooKeeper, through the API that ZooKeeper provides, in order to reach some consistent behavior or result. This is essentially a shared-state concurrency model, the same as Java’s multi-threaded concurrency model: in both, threads or processes communicate through shared memory. Java does not directly provide a reactive notification interface for monitoring changes to an object’s state, so you either waste CPU time on polling retries, or respond to state changes through a notification mechanism that Java does provide (the built-in queues), which requires blocking calls in a loop. For the state shared among distributed processes (the data and children of a ZNode), ZooKeeper adopts, for performance reasons, an asynchronous, non-blocking active notification mode, namely the Watch mechanism, which makes the “shared state communication” between distributed processes more timely and efficient. In fact, this is the main task of ZooKeeper: coordination. Consul also implements a Watch mechanism, but it is based on blocking long polling.

  • ZooKeeper vs. JVM

In some ways, ZooKeeper is comparable to the JVM. ZooKeeper contains the state objects (ZNodes) and Zab, the underlying execution engine for distributed processes. The JVM contains the heap (a large area of object storage shared by multiple threads) and the JMM (Java Memory Model), which ensures that operations from multiple threads take effect in the correct order. The Zab protocol makes ZooKeeper’s internal state modifications strictly sequential, whereas the JVM’s internal state modifications are out of order and parallel, so additional mechanisms (memory barriers, atomic processor instructions) are needed to guarantee ordering. On the read side, both the JVM and ZooKeeper may return stale data on a direct read. ZooKeeper, however, has the Watch mechanism to make reactive reads more efficient, while the JVM can only rely on the underlying memory barriers to flush the shared state so that other threads get the correct new data when they read again.

ZooKeeper provides an interface that lets all distributed processes execute asynchronously and without blocking (wait-free), with version-based CAS operations inside, while the JVM provides a variety of blocking and non-blocking primitives: synchronized, volatile, and atomic operations. For building more complex synchronization or coordination functions on top of these interfaces, the Java concurrency library directly provides synchronization tools such as latches, cyclic barriers, semaphores, and the underlying AbstractQueuedSynchronizer. ZooKeeper instead requires users to build the various distributed coordination functions (distributed locks, distributed publish/subscribe, and cluster membership management) on top of its interface. The comparison is summarized below:

| | ZooKeeper | JVM |
|---|---|---|
| Shared state object | ZNode | Heap object |
| Underlying execution mode | Zab executes operations in sequence | Concurrent execution on multiple processors (memory barriers, atomic machine instructions) |
| API interface | Get, Watch_Get, Cas_Set, Exist | synchronized, volatile, final, Atomic |
| Coordination or synchronization functions | Distributed publish/subscribe, locks, read-write locks | Concurrency library synchronization tools, synchronization components based on AbstractQueuedSynchronizer |
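
The version-based CAS write mentioned above (Cas_Set) can be sketched in plain Java. This is only an in-memory illustration, not the ZooKeeper client API: `VersionedStore`, `Node`, and `casSet` are hypothetical names, and a `ConcurrentHashMap` stands in for the replicated ZNode tree.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a ZNode store; not the ZooKeeper API.
final class VersionedStore {
    // Immutable snapshot of a node: its data plus a version that grows on every write.
    static final class Node {
        final String data;
        final int version;

        Node(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final ConcurrentMap<String, Node> nodes = new ConcurrentHashMap<>();

    void create(String path, String data) {
        if (nodes.putIfAbsent(path, new Node(data, 0)) != null) {
            throw new IllegalStateException("node exists: " + path);
        }
    }

    Node get(String path) {
        return nodes.get(path);
    }

    // Succeeds only if the caller's expected version matches the current version.
    boolean casSet(String path, String newData, int expectedVersion) {
        Node current = nodes.get(path);
        if (current == null || current.version != expectedVersion) {
            return false; // someone else wrote first; caller should re-read and retry
        }
        // Atomic swap: Node does not override equals, so this compares by identity.
        return nodes.replace(path, current, new Node(newData, expectedVersion + 1));
    }
}
```

A writer reads the node, computes the new value, and calls `casSet` with the version it read. On `false` it re-reads and retries, so no writer ever blocks waiting for another.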

  • Watch architecture of ZooKeeper

The overall process of Watch is shown in the figure below. The client first registers with the ZooKeeper server the node state it wants to monitor, and stores the information about this listener locally in its WatchManager. When the monitored data state changes on the ZooKeeper server, ZooKeeper proactively sends the corresponding event information to the client of the relevant session, and the client responds by invoking the local callback, the Watcher handler.
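
The register, store, notify, and callback steps can be illustrated with a small single-threaded simulation. It is a sketch under simplifying assumptions, not ZooKeeper code: `SketchServer`, `SketchClient`, and `PathWatcher` are hypothetical names, there is no network or session handling, and the event type is passed as a plain string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical callback type standing in for a client-side Watcher.
interface PathWatcher {
    void process(String eventType, String path);
}

// Server side: remembers which clients are interested in which path.
final class SketchServer {
    private final Map<String, Set<SketchClient>> watchTable = new HashMap<>();

    void register(String path, SketchClient client) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(client);
    }

    // On a change, drop the registrations for the path, then notify each client once.
    int dataChanged(String path) {
        Set<SketchClient> clients = watchTable.remove(path);
        if (clients == null) {
            return 0;
        }
        for (SketchClient client : clients) {
            client.onEvent("NodeDataChanged", path);
        }
        return clients.size();
    }
}

// Client side: keeps the actual callbacks locally; the server never sees them.
final class SketchClient {
    private final Map<String, List<PathWatcher>> localWatchers = new HashMap<>();

    void watch(SketchServer server, String path, PathWatcher watcher) {
        server.register(path, this);
        localWatchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    // Called when the server pushes an event: look up and run the local callbacks.
    void onEvent(String eventType, String path) {
        List<PathWatcher> watchers = localWatchers.remove(path);
        if (watchers == null) {
            return;
        }
        for (PathWatcher watcher : watchers) {
            watcher.process(eventType, path);
        }
    }
}
```

Because both sides remove the registration before delivering the event, a second change to the same path produces no callback unless the client calls `watch` again.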

  • Watch feature of ZooKeeper
  1. A Watch is one-off and must be re-registered after each trigger. The client does not receive any notification if the session ends abnormally, while a fast reconnection does not affect receiving notifications.
  2. Watch callbacks are executed sequentially, and the client will not see the latest data before it receives the change notification for the data it is watching. In addition, take care not to let the logic inside one Watch callback block all of the client’s Watch callbacks.
  3. A Watch is lightweight. WatchEvent is the smallest communication unit and structurally contains only the notification state, the event type, and the node path. The ZooKeeper server only tells the client what happened, not the details.
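
The warning in item 2 follows from callbacks being delivered one at a time. The sketch below models that with a single-threaded executor, which is an assumption made for illustration. `SequentialDispatcher` and `DispatchDemo` are hypothetical names, not ZooKeeper classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one thread delivers every event, so callbacks run in order.
final class SequentialDispatcher {
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();

    void deliver(Runnable callback) {
        eventThread.execute(callback);
    }

    // Lets queued callbacks finish, then stops; returns false on timeout or interrupt.
    boolean drainAndStop(long timeoutMillis) {
        eventThread.shutdown();
        try {
            return eventThread.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

final class DispatchDemo {
    // Queues a slow callback followed by a fast one and records completion order.
    static List<String> run() {
        List<String> order = new CopyOnWriteArrayList<>();
        SequentialDispatcher dispatcher = new SequentialDispatcher();
        dispatcher.deliver(() -> {
            try {
                Thread.sleep(50); // simulated blocking work inside a callback
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add("slow");
        });
        dispatcher.deliver(() -> order.add("fast"));
        dispatcher.drainAndStop(5000);
        return order;
    }
}
```

In `DispatchDemo.run()` the fast callback cannot start until the slow one returns, which is why long-running work is better handed off to another thread than done inside the callback.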
  • Watcher interface design



As shown in the figure above, Watcher is designed as an interface, and any class that implements the Watcher interface is a new Watcher. Watcher contains two enumerations: KeeperState, which represents the state of the ZooKeeper connection when an event occurs, and EventType, the type of event that occurred. Event types fall mainly into two categories (changes to ZNode content and changes to a ZNode’s children). See the following table for a detailed description.

| KeeperState | EventType | TriggerCondition | EnableCalls | Desc |
|---|---|---|---|---|
| SyncConnected (3) | None (-1) | The client successfully establishes a session with the server | - | The client is connected to the server |
| SyncConnected (3) | NodeCreated (1) | The data node that the Watcher listens to is created | Exists | The client is connected to the server |
| SyncConnected (3) | NodeDeleted (2) | The data node that the Watcher listens to is deleted | Exists, GetData, and GetChildren | The client is connected to the server |
| SyncConnected (3) | NodeDataChanged (3) | The data content or data version of the data node monitored by the Watcher changes | Exists and GetData | The client is connected to the server |
| SyncConnected (3) | NodeChildrenChanged (4) | The list of child nodes of the data node monitored by the Watcher changes; a change to a child node’s content does not trigger it | GetChildren | The client is connected to the server |
| Disconnected (0) | None (-1) | The client is disconnected from the ZooKeeper server | - | The client is disconnected from the server |
| Expired (-112) | None (-1) | Session timeout | - | The client session has expired; a SessionExpiredException is usually received as well |
| AuthFailed (4) | None (-1) | Usually one of two situations: 1. an incorrect scheme is used for the permission check; 2. the SASL permission check fails | - | An AuthFailedException is received |
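
The two enumerations and their integer codes can be mirrored in a few lines of Java, which also shows how a handler might tell session events from node events. The names `SketchKeeperState`, `SketchEventType`, and `EventClassifier` are hypothetical. The codes follow ZooKeeper’s `Watcher.Event` enums, where `None` is -1 and `Expired` is -112.

```java
// Hypothetical mirror of KeeperState with the integer code carried on the wire.
enum SketchKeeperState {
    DISCONNECTED(0), SYNC_CONNECTED(3), AUTH_FAILED(4), EXPIRED(-112);

    final int code;

    SketchKeeperState(int code) {
        this.code = code;
    }

    static SketchKeeperState fromInt(int code) {
        for (SketchKeeperState s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown KeeperState code: " + code);
    }
}

// Hypothetical mirror of EventType.
enum SketchEventType {
    NONE(-1), NODE_CREATED(1), NODE_DELETED(2), NODE_DATA_CHANGED(3), NODE_CHILDREN_CHANGED(4);

    final int code;

    SketchEventType(int code) {
        this.code = code;
    }

    static SketchEventType fromInt(int code) {
        for (SketchEventType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown EventType code: " + code);
    }
}

final class EventClassifier {
    // EventType NONE means the event is about the session itself, not about a node.
    static String describe(int stateCode, int typeCode) {
        SketchKeeperState state = SketchKeeperState.fromInt(stateCode);
        SketchEventType type = SketchEventType.fromInt(typeCode);
        if (type == SketchEventType.NONE) {
            return "session:" + state;
        }
        return "node:" + type;
    }
}
```

For example, `EventClassifier.describe(3, -1)` classifies a successful connection as a session event, while `describe(3, 3)` classifies a data change as a node event.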

  • The design of the WatchEvent



As shown in the figure above, WatchEvent has two representations. One is the logical representation, WatchedEvent, which directly encapsulates the abstract logical states (KeeperState, EventType) and is suited to internal processing on both the client and the server. The other is the physical representation, WatcherEvent, which encapsulates the lower-level transmission data structure (int, String) and implements the serialization interface; it is used mainly for the underlying data transmission.
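
The split between the logical and physical representations can be sketched as follows. This is an illustration only: `LogicalEvent` and `WireEvent` are hypothetical names, only a subset of the enum values is included, and `DataOutputStream` is an assumed stand-in for ZooKeeper’s own serialization layer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Physical form: plain ints and a String, ready to be written to a byte stream.
final class WireEvent {
    final int state;
    final int type;
    final String path;

    WireEvent(int state, int type, String path) {
        this.state = state;
        this.type = type;
        this.path = path;
    }

    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(type);
            out.writeInt(state);
            out.writeUTF(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static WireEvent fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int type = in.readInt();
            int state = in.readInt();
            String path = in.readUTF();
            return new WireEvent(state, type, path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Logical form: typed enums that application code can switch on.
final class LogicalEvent {
    enum State {
        DISCONNECTED(0), SYNC_CONNECTED(3);

        final int code;

        State(int code) { this.code = code; }

        static State of(int code) {
            for (State s : values()) if (s.code == code) return s;
            throw new IllegalArgumentException("unknown state code: " + code);
        }
    }

    enum Type {
        NONE(-1), NODE_DATA_CHANGED(3);

        final int code;

        Type(int code) { this.code = code; }

        static Type of(int code) {
            for (Type t : values()) if (t.code == code) return t;
            throw new IllegalArgumentException("unknown type code: " + code);
        }
    }

    final State state;
    final Type type;
    final String path;

    LogicalEvent(State state, Type type, String path) {
        this.state = state;
        this.type = type;
        this.path = path;
    }

    // Lower the typed event to ints for transmission.
    WireEvent toWire() {
        return new WireEvent(state.code, type.code, path);
    }

    // Lift a received wire event back into typed form.
    static LogicalEvent fromWire(WireEvent wire) {
        return new LogicalEvent(State.of(wire.state), Type.of(wire.type), wire.path);
    }
}
```

The sender calls `toWire().toBytes()`, and the receiver reverses it with `LogicalEvent.fromWire(WireEvent.fromBytes(bytes))`, so application code only ever works with the typed form.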