An overview of ZooKeeper

What is Zookeeper?

There are all kinds of animals in the Zoo. Some of them are grumpy, some can climb trees, some can sing, and some can even dance. So we need a Keeper to manage them.

Our distributed programs are just as varied and unruly, so they too need a keeper to manage them!

This article jumps around a bit; if a concept is unclear, look for it in a later section.

I've put a link to the official documentation here; I think it explains things quite clearly, and I suggest reading it first. Let me know in the comments if I've gotten anything wrong; I've only been studying this for about a week.

Getting started: Hello World

This step is easy, just download, unzip, configure something, and run.

The tutorial on the official site is pretty straightforward, so I won't repeat it here.

Well, let me just say something.

Standalone mode

  • Create a zoo.cfg file in the conf folder. ZK reads the conf folder, and if a user-defined file is there, it uses it.
# The basic unit of time, in milliseconds; subsequent time settings are multiples of this
tickTime=2000
# Data directory: where ZooKeeper stores its data
dataDir=/var/lib/zookeeper
# Port the server exposes for client connections
clientPort=2181
  • Start the server from the bin folder:
./zkServer.sh start
  • Connect to the server with the command-line client, also from the bin folder:
./zkCli.sh -server 127.0.0.1:2181
  • If the connection is successful, you will see the following output:
Connecting to localhost:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
Welcome to ZooKeeper!
JLine support is enabled
[zkshell: 0]

See the official documentation for the full list of commands.

Each ZK has an initial root node '/', and you can create nodes below it. For example, if I create a test_data node, its path is /test_data. Each node can also store data. This is much like a Unix/Linux file system: think of nodes as folders and the data inside a node as a file. If the current node is not a leaf, it contains one or more "folders" plus at most one "file". Hopefully this gives you a feel for ZK's concept of nodes.

Now we create a node from the client side and write the data:

First look at the root:

There is a built-in /zookeeper node under the root node; you can ignore it.

Then we create a node, called zk_test, and write the data:

To view the status of the node and the data in the node:

We can also set new data and read:

At this point, standalone ZK is done! See the official site for the other commands.

Pseudo-cluster mode

If, like me, you don't have money for extra servers, or you just want to debug locally, consider a pseudo-cluster: run multiple instances of ZK on one machine and cluster them.

As we all know, a network process is uniquely identified by IP:PORT, so we can run multiple ZK instances locally as long as no two listen on the same port.

  • First, create a folder for each ZK instance:

I've created three folders, each holding one ZK instance. Why three, and why preferably an odd number? ZK requires that more than half of the servers in the cluster be alive for the cluster to be available. An even-sized ensemble therefore tolerates no more failures than one with one fewer server, so odd sizes are preferred, and three is the minimum that survives a failure.
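The "more than half" rule can be sketched with a few lines of arithmetic (a toy calculation, not ZK code): an ensemble of n servers needs a quorum of n/2 + 1, so a fourth server buys no extra fault tolerance over three.

```java
public class Quorum {
    // More than half of the ensemble must be alive for the cluster to serve.
    static int quorum(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers may fail while a quorum still remains.
    static int tolerated(int ensembleSize) {
        return ensembleSize - quorum(ensembleSize);
    }

    public static void main(String[] args) {
        System.out.println(tolerated(3)); // 1
        System.out.println(tolerated(4)); // 1: the 4th server adds no safety
        System.out.println(tolerated(5)); // 2
    }
}
```

So three is the smallest cluster that survives a failure, and even sizes waste a machine.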

  • Next remember to create a myid file in the data directory (which is the directory in the configuration file) that contains a number indicating the current server ID.
echo [server ID] > myid

Here, each instance's data lives in its data folder, so I cd into data, type echo [server.id] > myid, and hit enter.

  • Next, modify the configuration file for each instance:

zk1:

tickTime=2000
dataDir=/usr/local/zk/zk1/data
clientPort=8391
# Initial sync timeout: initLimit * tickTime
initLimit=10
# Sync timeout: syncLimit * tickTime; exceeding it means synchronization failed
syncLimit=5
admin.serverPort=8491
# How to locate each server: server.N=IP:quorumPort:electionPort
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

Do the same for zk2 and zk3. ⚠️ The clientPort and admin.serverPort for zk2 and zk3 need to be changed so that each instance listens on different ports.
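For example, zk2's configuration might look like this (a sketch assuming the same folder layout as zk1, with the ports simply incremented per instance; only dataDir, clientPort, and admin.serverPort differ):

```
tickTime=2000
dataDir=/usr/local/zk/zk2/data
clientPort=8392
admin.serverPort=8492
initLimit=10
syncLimit=5
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
```

Note that the server.N lines are identical across all three instances; only the per-instance settings change.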

  • Finally, we can turn on the servers to implement the pseudo-cluster.

Let’s first look at the data for each server:

Remember the node we created and the data we set in standalone mode? That instance now serves as server 1. Query server 2 and we get the following:

Clearly the data is synchronized: the data set on server 1 appears on server 2, so the cluster works!

The reader can then try creating nodes and setting data in any of the three clients and check whether the other instances stay in sync.

Cluster mode

This is as simple as changing the IPs and ports in the pseudo-cluster configuration to the actual servers' addresses, and you're done.

Design patterns

The official documentation is here.

ZK's data model. As mentioned above, each ZK has a root node '/'; every node can have whatever child nodes you like, and those children can have children in turn. Each node may have zero or more child nodes and carry at most one piece of data. If we view a ZK instance as a file system, each node is a folder that can contain at most one file and any number of subfolders.

ZNode

Each node in ZK, which we call ZNode, is uniquely identified by a path, just as folders are uniquely identified by a folder path.

Each ZNode carries a stat structure that records information such as the data version number, timestamps, and access rights. The version number and timestamp let ZK verify that an update is valid, because the version number is incremented by 1 on every update. Whenever a client reads a node's data, it also receives the data's version number; when it later updates the data, it sends that version back to ZK to be compared with the current one. If they don't match, another client has updated the data in the meantime.
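The check-then-update flow can be illustrated with a small in-memory sketch (this is not the real ZK API, just the idea): a write carries the version the client last read, and the store rejects the write if the node has moved on.

```java
// A minimal sketch of version-checked updates, mirroring how ZooKeeper
// rejects setData() calls that carry a stale version number.
public class VersionedNode {
    private byte[] data;
    private int version = 0;

    public synchronized int getVersion() { return version; }
    public synchronized byte[] getData() { return data; }

    // Succeeds and bumps the version only if expectedVersion matches.
    public synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) return false; // someone updated first
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode();
        int v = node.getVersion();                     // client A reads version 0
        node.setData("b".getBytes(), v);               // client B writes first: version becomes 1
        boolean ok = node.setData("a".getBytes(), v);  // client A's stale write
        System.out.println(ok);                        // false: A must re-read and retry
    }
}
```

A client that gets false simply re-reads the data (and the new version) and retries its update.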

ZNode is the primary target for client access, so it needs to be mentioned.

Watches

A client can add Watches to a ZNode. Whenever the node changes, for example it is deleted, a child node is deleted, its data is deleted, or its data is updated, the watch notifies the client that set it and is then removed. In other words, the lifetime of any watch is a single ZNode state change.

Data access

ZNode data is read and written atomically, each read reads the entire data, and each write overwrites the original data and writes the entire data.

In addition, a ZNode can hold at most 1 MB of data, i.e. 1024 KB. The official advice is the smaller the better: large data means more I/O and network traffic, which causes synchronization delays. You can store configuration files or key data, like Redis keys; if you really need something big, store it elsewhere and keep a pointer in the znode to where it is stored.
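The "pointer" idea can be sketched like this (the key name blob:42 is made up, and the map stands in for Redis or any external store):

```java
import java.util.HashMap;
import java.util.Map;

public class PointerPattern {
    // The znode stores only a small key; the bulky payload lives elsewhere.
    static byte[] resolve(Map<String, byte[]> externalStore, String znodeData) {
        return externalStore.get(znodeData);
    }

    public static void main(String[] args) {
        Map<String, byte[]> externalStore = new HashMap<>(); // stands in for Redis, S3, ...
        externalStore.put("blob:42", new byte[4 * 1024 * 1024]); // 4 MB, far over the 1 MB znode limit
        String znodeData = "blob:42"; // a few bytes: this is all the znode would hold
        System.out.println(resolve(externalStore, znodeData).length); // 4194304
    }
}
```

ZK still coordinates who reads which payload, while the heavy bytes never pass through the cluster.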

Temporary (ephemeral) nodes

Temporary nodes, as the name implies, are created by the client and automatically deleted when the client connection is closed.

This feature has many applications, such as cluster monitoring: each server creates a temporary node at startup. When a server goes down, the temporary node it created is automatically deleted, so the other servers immediately know who is down and who is working.
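A simulation of that idea (plain Java, no real ZK involved; the /servers path is a name I made up for illustration): registering creates an "ephemeral" entry, and closing the session deletes it, which is exactly what the surviving servers observe.

```java
import java.util.HashSet;
import java.util.Set;

public class LiveServers {
    // Stands in for the set of ephemeral znodes under /servers.
    private final Set<String> ephemeral = new HashSet<>();

    void register(String serverId) { ephemeral.add("/servers/" + serverId); }       // ephemeral create at startup
    void sessionClosed(String serverId) { ephemeral.remove("/servers/" + serverId); } // ZK auto-deletes on session loss
    boolean isAlive(String serverId) { return ephemeral.contains("/servers/" + serverId); }

    public static void main(String[] args) {
        LiveServers zk = new LiveServers();
        zk.register("server1");
        zk.register("server2");
        zk.sessionClosed("server1"); // server1 crashed: its session expires
        System.out.println(zk.isAlive("server1")); // false: everyone can see it is down
        System.out.println(zk.isAlive("server2")); // true
    }
}
```

In a real deployment the monitors would set child watches on /servers so they are pushed the change rather than polling.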

Persistent nodes

Persistent nodes, as the name implies, are not deleted when a client connection is broken.

Sequential persistent nodes

Sequential persistent nodes are ordered and are not deleted when the connection is broken. Why is it sequential?

When a client requests to create a sequential node under a node, ZK adds a counter after the newly created node name, which is monotonically increasing and of the format %010d. In this way, the nodes created each time are ordered, and the naming is also unique. Since the naming is unique, it can be used as the naming service in the cluster.
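A quick illustration of the naming (the lock- prefix is just an example name, not anything ZK imposes):

```java
public class SequentialNames {
    // ZK appends a zero-padded, monotonically increasing counter (%010d)
    // to the requested name, so siblings are both ordered and unique.
    static String sequentialName(String prefix, int counter) {
        return prefix + String.format("%010d", counter);
    }

    public static void main(String[] args) {
        System.out.println(sequentialName("lock-", 0));  // lock-0000000000
        System.out.println(sequentialName("lock-", 1));  // lock-0000000001
        System.out.println(sequentialName("lock-", 12)); // lock-0000000012
    }
}
```

Because the counter is per-parent and monotonic, sorting the children lexicographically also sorts them by creation order.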

Sequential temporary node

Ordered but temporary nodes. A good application is distributed locking.

If we want to implement distributed locking with sequential temporary nodes, we can do this:

  • Creates a temporary sequential node at the specified path (for example, /locks).
  • Get all nodes under /locks through the getChildren() method.
  • If the current node is the smallest, the lock is obtained and executed. Otherwise, a listener is registered on the node preceding the current node.
  • If you hold the lock, run your work and then release the lock by deleting your own temporary node. That triggers the watch registered on your node and notifies the next client to acquire the lock.
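The steps above can be simulated with a sorted set (plain Java, no real ZK; node names follow the sequential-suffix convention): the smallest node holds the lock, and every other client watches only the node just before its own.

```java
import java.util.List;
import java.util.TreeSet;

public class LockQueue {
    // The smallest sequence number owns the lock.
    static String holder(TreeSet<String> nodes) {
        return nodes.first();
    }

    // Each waiter watches the node immediately before its own;
    // returns null when we are the holder and have nothing to watch.
    static String watchTarget(TreeSet<String> nodes, String self) {
        return nodes.lower(self);
    }

    public static void main(String[] args) {
        TreeSet<String> nodes = new TreeSet<>(List.of(
                "lock-0000000001", "lock-0000000002", "lock-0000000003"));
        System.out.println(holder(nodes));                         // lock-0000000001
        System.out.println(watchTarget(nodes, "lock-0000000003")); // lock-0000000002
        nodes.remove("lock-0000000001"); // holder deletes its node on release
        System.out.println(holder(nodes));                         // lock-0000000002 now holds the lock
    }
}
```

Only the client watching the deleted node is woken, which is where the fairness and herd avoidance come from.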

Now let’s take a few questions.

Why temporary sequential nodes? Sequential is easy to understand: it guarantees that the smallest node in the current list acquires the lock each time. Temporary matters because if the lock holder goes down, its node is deleted automatically and no deadlock results. Second, what else is gained? One benefit is avoiding the herd effect: releasing the lock does not send every waiting process scrambling for it, since only the next node is notified, and the lock is fair.

Speaking of locks, let's also look at how to build unfair (preemptive) exclusive locks and shared locks.

Let’s start with exclusive locks:

  • On the specified path, every client tries to create a temporary node with the same name. Since ZK guarantees that only one create of a given path succeeds, only one client (the one whose create succeeded) holds the lock.
  • If the creation of a node fails, it means that another client has taken the lock. We can register a listener on this node, and when the node is deleted, we can compete again.
  • If the lock is obtained, it is executed, and then the lock is released (the node is deleted).
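These steps can be sketched with a map standing in for ZK (not the real API): putIfAbsent mimics "only one create() of the same path succeeds", and removing the key mimics deleting the lock node.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ExclusiveLock {
    // Stands in for the znode namespace; the path name is illustrative.
    private final ConcurrentHashMap<String, String> nodes = new ConcurrentHashMap<>();

    boolean tryLock(String path, String client) {
        return nodes.putIfAbsent(path, client) == null; // null = we created the node
    }

    void unlock(String path, String client) {
        nodes.remove(path, client); // delete our node to release the lock
    }

    public static void main(String[] args) {
        ExclusiveLock zk = new ExclusiveLock();
        System.out.println(zk.tryLock("/exclusive_lock", "clientA")); // true
        System.out.println(zk.tryLock("/exclusive_lock", "clientB")); // false: B must watch and retry
        zk.unlock("/exclusive_lock", "clientA");
        System.out.println(zk.tryLock("/exclusive_lock", "clientB")); // true
    }
}
```

In the real thing, the losing client sets a watch on the node and retries when the delete notification arrives; since every loser watches the same node, this variant is unfair and herd-prone.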

Take a look at shared locks:

  • Create a temporary sequential node on the specified path, marking its type by prefixing the name with R (read) or W (write). By the way, a sequential node's auto-incrementing suffix is independent of the node name: any sequential node created under the parent gets the next counter value.
  • To read: check whether you are the smallest node; if not, check whether any write node precedes you. If you are the smallest, or no write precedes you, you may read. Otherwise, register a watch on the last write node before you.
  • To write: check whether you are the smallest node; if so, you may write. Otherwise, register a watch on the node immediately before you.
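The read/write decision above can be sketched like this (a simulation, not ZK code; node names use an assumed R-/W- prefix plus the sequential suffix): a reader proceeds when no writer sits before it, a writer only when it is the smallest.

```java
import java.util.List;

public class SharedLock {
    // Decide whether the node named `self` may proceed, given all lock
    // nodes sorted by their sequential suffix.
    static boolean canProceed(List<String> sorted, String self) {
        int idx = sorted.indexOf(self);
        if (idx == 0) return true;               // the smallest node always proceeds
        if (self.startsWith("W-")) return false; // a writer must be the smallest
        // a reader proceeds only if every earlier node is also a read
        return sorted.subList(0, idx).stream().noneMatch(n -> n.startsWith("W-"));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("R-0000000001", "R-0000000002", "W-0000000003", "R-0000000004");
        System.out.println(canProceed(nodes, "R-0000000002")); // true: only reads precede it
        System.out.println(canProceed(nodes, "R-0000000004")); // false: a write sits before it
        System.out.println(canProceed(nodes, "W-0000000003")); // false: not the smallest
    }
}
```

So concurrent reads pass through together, while a write serializes everything behind it.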

The Runoob beginner tutorial explains this very clearly; everyone can take a look.

There are also ready-made ZK-based distributed lock implementations that you can use directly.

Container node

A newer feature; not covered here for now.

TTL node

A newer feature; not covered here for now.

Time in ZooKeeper

The official wording here is clear enough, so rather than translate it, I'll quote it directly:

  • Zxid Every change to the ZooKeeper state receives a stamp in the form of a zxid (ZooKeeper Transaction Id). This exposes the total ordering of all changes to ZooKeeper. Each change will have a unique zxid and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2.
  • Version numbers Every change to a node will cause an increase to one of the version numbers of that node. The three version numbers are version (number of changes to the data of a znode), cversion (number of changes to the children of a znode), and aversion (number of changes to the ACL of a znode).
  • Ticks When using multi-server ZooKeeper, servers use ticks to define timing of events such as status uploads, session timeouts, connection timeouts between peers, etc. The tick time is only indirectly exposed through the minimum session timeout (2 times the tick time); if a client requests a session timeout less than the minimum session timeout, the server will tell the client that the session timeout is actually the minimum session timeout.
  • Real time ZooKeeper doesn’t use real time, or clock time, at all except to put timestamps into the stat structure on znode creation and znode modification.

ZK stat structure

The stat structure records the state information of a node.

  • czxid The zxid of the change that caused this znode to be created.
  • mzxid The zxid of the change that last modified this znode.
  • pzxid The zxid of the change that last modified children of this znode.
  • ctime The time in milliseconds from epoch when this znode was created.
  • mtime The time in milliseconds from epoch when this znode was last modified.
  • version The number of changes to the data of this znode.
  • cversion The number of changes to the children of this znode.
  • aversion The number of changes to the ACL of this znode.
  • ephemeralOwner The session id of the owner of this znode if the znode is an ephemeral node. If it is not an ephemeral node, it will be zero.
  • dataLength The length of the data field of this znode.
  • numChildren The number of children of this znode.

ZK Watches

A watch is a one-shot event trigger: when the node it watches changes, ZK notifies the client that set the watch and then deletes the watch.

Any read of a node can set a watch at the same time; examples are getData(), getChildren(), and exists().

Through the above paragraph, we can summarize three properties about Watches:

  • Trigger once. Once a watch fires, it is removed, and the client will not be notified of further changes to the same node unless it sets the watch again.
  • Notifications to the client. If a client sets no Watches, it learns nothing about changes to ZK's nodes. And even with Watches set, there is no guarantee of always seeing the latest data. How so? Suppose a client sets a watch on a node; when the data changes, the notification is sent to that client, but while it is in flight another client modifies the data again. The node has now changed a second time, yet because the first notification was still being delivered, the client receives no notification for the second update. The root cause is that Watches are sent asynchronously: ZK does not wait for a watch notification to be delivered before processing the next operation. That is easy to accept; after all, the whole cluster cannot be made to wait on one client's slow network.
  • Watch types. Watches come in two kinds: data watches, which track a node's data, and child watches, which track its children. Although any read operation can set a watch, the watch a later write triggers depends on the write: setData() triggers the data watches of the node; a successful create() triggers the data watch of the created node and the child watch of its parent; a successful delete() triggers both the data watch and the child watch of the deleted node, as well as the child watch of its parent.

When a client disconnects and later reconnects, its previously set Watches are re-established; if it connects to a different server, session events are triggered.

Watches can only be set by the three read operations. Here is which reads can receive each event:

  • Created Event: This is triggered by exists().
  • Deleted Event: triggered by exists(), getData(), and getChildren().
  • Changed Event: Triggered by exists() and getData().
  • Child Event: Triggered by getChildren().

ZK access control

Take a look at the access control supported by ZK:

  • CREATE: you can create a child node
  • READ: you can get data from a node and list its children.
  • WRITE: you can set data for a node
  • DELETE: you can delete a child node
  • ADMIN: you can set permissions

ZK's consistency guarantees

ZK provides the following consistency guarantees:

  • Sequential consistency. Multiple update operations from the same client are applied in the order they were sent.
  • Atomicity. The update operation has only two results: success or failure.
  • Single system image. No matter which instance a client connects to, it sees the same view; the whole ZK cluster presents itself as a single image.
  • Reliability. Once the update is successful, the data is persisted until the next update occurs.
  • Timeliness. Clients are guaranteed to see an up-to-date view of the system within a certain time bound; any update becomes visible to clients within that bound.

ZooKeeper Java Client

This section mainly shows you how to use Java to access the ZK server and perform operations.

When connecting, the client can specify more than one IP:PORT. It will pick one, try to connect, and if that fails, try another until it succeeds. If it gets disconnected, it tries to reconnect.

Here is only the simplest usage:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

public class Main {
    public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        AtomicInteger atomicInteger = new AtomicInteger(0);
        // The watcher lambda logs every event with a running counter.
        ZooKeeper zooKeeper = new ZooKeeper("127.0.0.1:8392", 2000,
                event -> System.out.println(atomicInteger.incrementAndGet() + ": " + event));

        if (zooKeeper.exists("/zk_test1", true) == null) {
            // Node /zk_test1 does not exist, so create it
            zooKeeper.create("/zk_test1", "test_data".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        if (zooKeeper.exists("/zk_test1/sub1", true) == null) {
            // Node /zk_test1/sub1 does not exist, so create it
            zooKeeper.create("/zk_test1/sub1", "sub_test_data".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        if (zooKeeper.exists("/zk_test1/sub1", true) != null) {
            Stat stat = new Stat();
            zooKeeper.getData("/zk_test1/sub1", true, stat);
            System.out.println("data ver: " + stat.getVersion());
            // Update with the version we just read; a stale version would fail
            zooKeeper.setData("/zk_test1/sub1", "sub_test_data0".getBytes(), stat.getVersion());
            zooKeeper.getData("/zk_test1/sub1", true, stat);
            System.out.println("data ver: " + stat.getVersion());
        }
        LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(2)); // give watch events time to arrive
        zooKeeper.delete("/zk_test1/sub1", -1); // version -1 skips the version check
        zooKeeper.delete("/zk_test1", -1);
        zooKeeper.close();
    }
}

And then I’ll show you a couple of other examples

ZooKeeper basic operations

References

ZooKeeper official documentation

How do I build a ZooKeeper cluster