👀👀 Theory

1. Basic concepts

ZooKeeper is an open source distributed coordination service that provides a highly available, high-performance and stable distributed data consistency solution. It is commonly used to implement functions such as data publishing/subscribing, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locking, and distributed queuing.

2. ZooKeeper data model

2.1 ZNode (Data Node)

All data stored in ZooKeeper consists of ZNodes. Each node is called a ZNode and stores data in key/value form, where the node path is the key and the node content is the value. The overall structure is similar to the Linux file system, with the root path starting at /.

Reading and writing data in a ZNode is atomic, and each ZNode has an Access Control List (ACL) that limits who can do what. The data size of each ZNode cannot exceed 1 MB.

2.1.1 Znode Data node name specifications
  • The null character (\u0000) cannot be part of a ZNode path name
  • The characters \u0001 - \u001F, \u007F and \u009F, as well as \ud800 - \uF8FF and \uFFF0 - \uFFFF, cannot be used because they do not display well and look like garbage
  • “.” can form part of a name, but cannot be used alone as a path token
  • The string “zookeeper” is reserved
2.1.2 ZNode Data node components:
  • stat

State attributes consist of:

| Attribute | Description |
| --- | --- |
| czxid | Transaction ID of the change that created the node |
| mzxid | Transaction ID of the change that last modified the node |
| pzxid | Transaction ID of the change that last modified the node's child list (adding or removing a child node) |
| ctime | Time the node was created, in milliseconds |
| mtime | Time the node was last modified, in milliseconds |
| version | Number of changes to the node's data |
| cversion | Number of changes to the node's children |
| aversion | Number of changes to the node's ACL |
| ephemeralOwner | If the node is ephemeral, the ID of the session that created it; otherwise 0 |
| dataLength | Length of the node's data |
| numChildren | Number of child nodes |
  • data
  • children
2.1.3 Znode Data node type:
  • Persistent Nodes
  • Ephemeral Nodes

The lifetime of an ephemeral node is tied to the session that created it: when the session ends, the node is deleted. Ephemeral nodes are therefore not allowed to have child nodes.

  • Sequence Nodes

Sequential nodes can be persistent or ephemeral. When creating a node, you can ask for a monotonically increasing counter to be appended to the node path: ZooKeeper sets the final path by attaching a 10-digit, zero-padded sequence number to the original node name (a code sketch follows this list).

  • Container Nodes

This node type was added in version 3.6.0. A container node is a node with a special purpose, useful for recipes such as leader election and distributed locking. When the last child of a container node is deleted, the container node itself will be deleted at some point in the future.

  • Timeout Expired Nodes (TTL Nodes)

This node type was also added in version 3.6.0. When creating a persistent or persistent-sequential node, you can set a timeout (TTL) in milliseconds. If the node is not modified within that time and has no children, it becomes a deletion candidate and will be removed at some point in the future. TTL nodes are disabled by default (see the sketch below).
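To illustrate the sequence and TTL node types above, here is a minimal sketch using the raw ZooKeeper Java API. It assumes an already connected ZooKeeper handle zk and an existing parent path /app; the paths, data and TTL value are illustrative, and TTL nodes additionally require the server to be started with zookeeper.extendedTypesEnabled=true.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeTypeSketch {
    static void createExamples(ZooKeeper zk) throws Exception {
        // Sequential node: the server appends a 10-digit, zero-padded counter to the name
        String seqPath = zk.create("/app/task-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println(seqPath); // e.g. /app/task-0000000000

        // TTL node: becomes a deletion candidate after 10s with no modification and no children
        zk.create("/app/ttl-node", "data".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_WITH_TTL, null, 10000L);
    }
}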

3. Time in ZooKeeper

There are many ways to represent time in Zookeeper.

  • Zxid

The transaction ID. Each change in the Zookeeper state receives such a transaction ID

  • Version numbers

  • Ticks

  • Real time

4. ZooKeeper Sessions

When a ZooKeeper client connects to a server it establishes a session, identified by a session ID. The client keeps the session alive with heartbeats; if the server does not hear from the client within the session timeout, the session expires and all ephemeral nodes created in that session are deleted.

5. ZooKeeper Watches

5.1 Basic Concepts of Watch

All ZooKeeper read operations, such as getData(), getChildren() and exists(), take a boolean watch parameter that is used to set a watch. Depending on what is read, there are two kinds of watches: data watches and child watches. getData() and exists() read ZNode data, so they set data watches; if a ZNode's data changes, its data watch is triggered. getChildren() sets a child watch: creating a child node triggers the parent's child watch, while deleting a node triggers both its own data watch and its parent's child watch.

A watch is a one-time trigger. For example, after calling getData("/znode1", true) (true means a watch is set), the watch registered by the client fires when the node is deleted or its data changes, and it is removed once it has fired. In other words, if the data changes again later, this watch will not fire again.
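A minimal sketch of registering these one-shot watches with the raw ZooKeeper Java API, assuming a connected ZooKeeper handle zk and an existing node /znode1 (names are illustrative):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchSketch {
    static void registerWatches(ZooKeeper zk) throws Exception {
        Watcher watcher = new Watcher() {
            public void process(WatchedEvent event) {
                // A standard watch fires at most once per registration
                System.out.println("Event " + event.getType() + " on " + event.getPath());
            }
        };
        zk.getData("/znode1", watcher, null); // data watch: fires on data change or deletion
        zk.exists("/znode1", watcher);        // data watch: also fires if the node is created later
        zk.getChildren("/znode1", watcher);   // child watch: fires when children are created or deleted
    }
}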

5.2 Watch Event type

So a watch can be triggered by a data change, a node creation, a node deletion, and so on. How does the client know which kind of change the server is notifying it about? ZooKeeper provides an EventType enumeration; when the server triggers a watch, it tells the client which event type occurred.

There are many types of events as follows:

  • None
  • NodeCreated
  • NodeDeleted
  • NodeDataChanged
  • NodeChildrenChanged
  • DataWatchRemoved
  • ChildWatchRemoved
  • PersistentWatchRemoved

The last three event types, DataWatchRemoved, ChildWatchRemoved and PersistentWatchRemoved, are fired when the corresponding kind of watch is removed, as their names suggest.

5.3 Persistent recursive watches

Since 3.6.0 (inclusive), clients can use addWatch() to set a persistent, recursive watch on a ZNode. Such a watch is no longer one-shot: it can fire multiple times without being removed, and it also fires for changes anywhere under the watched path. A persistent watch is removed with removeWatches().
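A minimal sketch of registering such a watch, assuming a connected ZooKeeper 3.6+ handle zk (the path is illustrative):

import org.apache.zookeeper.AddWatchMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PersistentWatchSketch {
    static void addRecursiveWatch(ZooKeeper zk) throws Exception {
        Watcher watcher = new Watcher() {
            public void process(WatchedEvent event) {
                // Fires repeatedly, for /config and everything beneath it
                System.out.println("Event " + event.getType() + " on " + event.getPath());
            }
        };
        // PERSISTENT watches only the node itself; PERSISTENT_RECURSIVE also covers all descendants
        zk.addWatch("/config", watcher, AddWatchMode.PERSISTENT_RECURSIVE);
        // The watch stays registered until it is explicitly removed with removeWatches()
    }
}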

Watches are maintained on the ZooKeeper server, so while a client is disconnected from the server its watches cannot fire; they are restored after reconnection and can be triggered again.

5.4 Some notes about watches 🔕

1. A standard watch is one-time. Once the client receives the watch callback notification, the watch is removed; if the client wants to keep receiving notifications, it must register another watch.

2. As mentioned in the first note, when the client receives a watch callback it may immediately set a new watch to listen for the next change of the ZNode. However, if the ZNode changes several times between receiving the callback and the request that sets the new watch, the client is not notified of those intermediate changes (a sketch of this re-registration pattern follows these notes).

3. If you register the same watch through both exists() and getData(), you will still receive only a single delete notification for that watch when the node is deleted.
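A minimal sketch of the re-registration pattern described in note 1, using the raw ZooKeeper API (the class and path names are illustrative):

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RearmingWatcher implements Watcher {
    private final ZooKeeper zk;
    private final String path;

    public RearmingWatcher(ZooKeeper zk, String path) {
        this.zk = zk;
        this.path = path;
    }

    /** Register (or re-register) the one-shot watch on the node */
    public void watch() throws KeeperException, InterruptedException {
        zk.exists(path, this);
    }

    public void process(WatchedEvent event) {
        System.out.println("Got event " + event.getType() + " on " + event.getPath());
        try {
            // Standard watches are one-shot, so re-register to keep listening;
            // changes made between the callback and this call are not reported
            watch();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}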

6. ZooKeeper Access Control using ACLs

ACL stands for Access Control List. ZooKeeper's ACL implementation is similar to UNIX file system permissions. Each ACL applies only to the specified ZNode and not to its children; in other words, ACLs do not take effect recursively. If an ACL is set on /app, /app/childTest is not controlled by that ACL.

An ACL expression has the form scheme:id:permissions

6.1 Scheme –> Authorization policy

The built-in authorization schemes are:

| Scheme | Meaning |
| --- | --- |
| world | Default scheme; anyone can access |
| auth | Refers to any user already authenticated in the current session |
| digest | Authenticates with a username:password string. The username:password is sent in plaintext when authenticating; when used as an ACL id, the string is SHA-1 hashed and then base64 encoded |
| ip | Uses the client IP address as the ACL id, in addr/bits format: at least the first "bits" bits of the client address must match the address in the ACL |
| x509 | Uses the client's X.509 certificate identity (its X500 Principal) as the ACL id |
6.2 ID –> Authorization object

The id is the object being authorized; its form depends on the scheme, for example "anyone" for world, username:hashed-password for digest, or an addr/bits expression for ip.

6.3 Permissions –> permission point

| Permission | Meaning |
| --- | --- |
| CREATE | Can create child nodes |
| READ | Can get the ZNode's data and list its child nodes |
| WRITE | Can set the ZNode's data |
| DELETE | Can delete child nodes of the ZNode |
| ADMIN | Can set the ZNode's ACL; ADMIN is like the owner of the ZNode |
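To illustrate how these schemes and permission bits are used from Java, here is a minimal sketch with the raw ZooKeeper API. It assumes a connected handle zk; the user name, password and path are illustrative.

import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public class AclSketch {
    static void createWithAcl(ZooKeeper zk) throws Exception {
        // Authenticate this session with the digest scheme (username:password is sent in plaintext)
        zk.addAuthInfo("digest", "alice:secret".getBytes());

        List<ACL> acl = new ArrayList<ACL>();
        // world:anyone may read the node
        acl.add(new ACL(ZooDefs.Perms.READ, ZooDefs.Ids.ANYONE_ID_UNSAFE));
        // the authenticated user (auth scheme) gets all permissions, including ADMIN
        acl.add(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.AUTH_IDS));

        zk.create("/app", "cfg".getBytes(), acl, CreateMode.PERSISTENT);
    }
}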

7. ZooKeeper Consistency Guarantees

Zookeeper is a high-performance, scalable service that performs both read and write operations very quickly

7.1 Sequential Consistency

Updates from the client will be processed sequentially

7.2 Atomicity

Updates either succeed or fail completely; there are no partial results

7.3 Single System Image

No matter which server the client connects to, it sees the same server data model

7.4 Reliability

Once the update request is processed, the results of the changes are persisted

7.5 Timeliness

The client's view of the system is guaranteed to be up to date within a certain time bound


👊👊 Practice

1. Download and install

1.1 download

Download address: zookeeper.apache.org/releases.ht… [apache-zookeeper-3.6.2-bin.tar.gz](https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz)

1.2 Single-Machine Deployment Mode

1.2.1 Setting the Configuration File

In the conf directory there is a zoo_sample.cfg file that provides sample configuration. You only need to rename it to zoo.cfg so that ZooKeeper can recognize it. The configuration items are described as follows:

  • tickTime

Unit: milliseconds. The interval at which heartbeats are maintained between servers, or between clients and servers. The minimum session timeout is twice tickTime

  • dataDir

The address of the directory where ZooKeeper holds the data. By default, this is where transaction logs are recorded (unless otherwise specified).

  • clientPort

The port on which the server listens for client connections.

Example zoo.cfg:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
1.2.2 Starting the ZooKeeper Server (Ensure that JDK is installed on the installation machine)
./zkServer.sh start
1.2.3 Connecting the Client to the Server
./zkCli.sh -server 127.0.0.1:2181

1.3 Cluster Mode

ZooKeeper cluster deployment provides high reliability. To build a reliable fault-tolerant cluster, you need at least three servers, and an odd number of servers is recommended.

1.3.1 Setting the cluster configuration file zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

In addition to the parameters required in the single-machine deployment mode, you need to configure cluster parameters

  • initLimit

When a follower server in the cluster connects to the leader, if the leader has not received the follower's registration within initLimit * tickTime, the connection is considered failed

  • syncLimit

The maximum interval, syncLimit * tickTime, allowed when the follower and the leader exchange messages. If no response is received within that time, the follower is abandoned

  • server.id=host:port:port

Each server.id=host:port:port line in the configuration file describes one member of the cluster. There are two ports: the first is used by followers to connect to the leader, and the second is used for leader election. If you want to set up a pseudo cluster on the same machine, each server must use different ports (a hedged example follows).
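A sketch of a pseudo-cluster layout on one machine, assuming three instances, each with its own data directory and configuration file (paths and ports are illustrative):

# zoo1.cfg (zoo2.cfg and zoo3.cfg differ only in the lines marked below)
tickTime=2000
dataDir=/var/lib/zookeeper1   # zoo2: .../zookeeper2, zoo3: .../zookeeper3
clientPort=2181               # zoo2: 2182, zoo3: 2183
initLimit=5
syncLimit=2
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890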

1.3.2 Setting the MyID file

The myid file, placed in each server's dataDir, contains a single line with the id from the server.id=host:port:port configuration above. In cluster mode this id must be unique and range from 1 to 255

1.3.3 Checking the Server Status

./zkServer.sh status

1.3.4 Problems Encountered during cluster Establishment

When setting up the cluster, all three machines reported a successful start, but connecting with the client failed, and ./zkServer.sh status showed "Error contacting service. It is probably not running." Looking at the logs under logs/, I found the line "Exception when following the leader java.io.EOFException". After some digging, it turned out I had reused the client port as the port the followers use to communicate with the leader, which caused this error. So the two ports must not be the same

2. Operate Zookeeper and basic commands on the ZK client

2.1 bin/zkServer.sh

./zkServer.sh [--config <conf-dir>] {start|start-foreground|stop|version|restart|status|print-cmd}

Zk server related operations. Different parameters represent different operations

# Start the ZK server
bin/zkServer.sh start
# Restart the ZK server
bin/zkServer.sh restart
# Stop the ZK server
bin/zkServer.sh stop
# Check the ZK running status
bin/zkServer.sh status
# Check the ZK version
bin/zkServer.sh version

2.2 bin/zkCli.sh

  • Start the client to connect to the ZK server
bin/zkCli.sh -server {host}:{port}

2.3 Client Commands

  • ls

Lists all child nodes under the specified path: ls [-s] [-w] [-R] path

# List all child nodes under the root path
ls /
  • get

Queries information about the node at the specified path: get [-s] [-w] path

# View /node1 data
get /node1
# View /node1 data together with its status
get -s /node1
  • create

Creates a node: create [-s] [-e] [-c] [-t ttl] path [data] [acl]. The optional -s flag creates a sequential node, -e creates an ephemeral node, -c creates a container node, and -t ttl sets an expiration time (TTL) for the node

# Create a persistent node with path /node1 and data "data"
create /node1 data
  • set

Updates a node's data: set [-s] [-v version] path data. The optional -s flag prints the node's status after the update; the optional -v version flag only applies the update if the given version matches the node's current data version (optimistic locking).

[zk: localhost:2181(CONNECTED) 27] create /node2 data
Created /node2
[zk: localhost:2181(CONNECTED) 28] get -s /node2
data
...
dataVersion = 0
...
[zk: localhost:2181(CONNECTED) 29] set -s /node2 data2
...
dataVersion = 1
...
# The version matches, so the update succeeds
[zk: localhost:2181(CONNECTED) 30] set -v 1 /node2 data3
# The data version is now 2, so updating with -v 1 fails
[zk: localhost:2181(CONNECTED) 31] set -v 1 /node2 data3
version No is not valid : /node2
[zk: localhost:2181(CONNECTED) 32]
  • delete

Deletes a node: delete [-v version] path. The -v parameter behaves the same as in set

# Delete the /node2 node
delete /node2

3. Operate Zookeeper in Java

There are two ways to operate ZooKeeper from Java: one is the native API provided by ZooKeeper, and the other is Apache Curator.

3.1 What is Apache Curator

Apache Curator is a fairly complete ZooKeeper client framework. It wraps and extends ZooKeeper's native APIs, reducing the complexity of using ZooKeeper and making it more reliable and simpler to use. Curator is split into several artifacts, which can be pulled in according to our needs:

  • curator-recipes

This artifact contains all of the Curator recipes for ZooKeeper; most scenarios can be covered by this artifact alone.

  • curator-framework

A high-level wrapper that simplifies ZooKeeper's advanced features. The recipes artifact is built on top of it, so it is pulled in automatically when you depend on curator-recipes

  • curator-client

A low-level wrapper around the ZooKeeper client connection handling

3.2 Using Apache Curator to operate ZooKeeper

3.2.1 Introduce the POM dependency
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>5.1.0</version>
</dependency>

The ZooKeeper client version corresponding to 5.1.0 is 3.6.0

3.2.2 Creating a Connection

Creating a connection requires the ZK host address and a retry policy, and you can also set parameters such as timeouts. The entry class for creating a client connection is CuratorFrameworkFactory, whose static inner Builder class can set additional advanced connection parameters. The retry policy is described by the RetryPolicy interface (for example, ExponentialBackoffRetry).

    public static CuratorFramework createWithOptions(String connectionString, RetryPolicy retryPolicy, int connectionTimeoutMs, int sessionTimeoutMs){
        return CuratorFrameworkFactory.builder()
                .connectString(connectionString)
                .retryPolicy(retryPolicy)
                .connectionTimeoutMs(connectionTimeoutMs)
                .sessionTimeoutMs(sessionTimeoutMs)
                .build();
    }
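A usage sketch for the helper above (hedged: the connect string and timeouts are illustrative; ExponentialBackoffRetry is one of Curator's standard retry policies, and the snippet assumes the imports org.apache.curator.RetryPolicy, org.apache.curator.retry.ExponentialBackoffRetry and org.apache.curator.framework.CuratorFramework):

    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3); // base sleep 1s, at most 3 retries
    CuratorFramework client = createWithOptions("127.0.0.1:2181", retryPolicy, 15000, 60000);
    client.start(); // the client must be started before it is used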

3.2.3 Creating a node and updating node content

/**
 * Create a persistent node
 * @param client CuratorFramework client
 * @param path node path
 * @param payload node content
 * @throws Exception
 */
public static void create(CuratorFramework client, String path, byte[] payload) throws Exception {
    client.create().forPath(path, payload);
}
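The heading also mentions updating node content; here is a minimal companion sketch (not from the original) using Curator's setData() builder:

/**
 * Update the content of an existing node
 * @param client CuratorFramework client
 * @param path node path
 * @param payload new node content
 * @throws Exception
 */
public static void setData(CuratorFramework client, String path, byte[] payload) throws Exception {
    // Overwrites the node's data; use setData().withVersion(v) for an optimistic-lock update
    client.setData().forPath(path, payload);
}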

For more official examples: github.com/apache/cura…

💊💊 Application

1. Application Scenarios

  • Naming service: Identifies nodes in a cluster by name
  • Unified Configuration Management
  • Data publishing/subscribing
  • Distributed locks
  • Leader election

2. Use Zookeeper to implement unified configuration management

3. Using Zookeeper to implement distributed locks

Apache Curator provides several distributed-lock implementations:

  • InterProcessMutex: distributed reentrant exclusive lock
  • InterProcessSemaphoreMutex: distributed exclusive lock
  • InterProcessReadWriteLock: distributed re-entrant read-write lock
  • InterProcessMultiLock: A container that manages multiple locks as a single entity

3.1 Code practice

3.1.1 Distributed reentrant exclusive lock
/**
 * Distributed reentrant lock
 * @author miaomiao
 * @date 2020/10/25 11:15
 */
public class DistributReetrantLock {
    private final InterProcessMutex interProcessMutex;
    private final String lockPath;

    public DistributReetrantLock(CuratorFramework client, String lockPath) {
        this.lockPath = lockPath;
        // InterProcessMutex is Curator's reentrant lock recipe
        this.interProcessMutex = new InterProcessMutex(client, lockPath);
    }

    /**
     * Acquire the lock, blocking until it is available
     */
    public void tryLock() throws Exception {
        this.interProcessMutex.acquire();
    }

    /**
     * Try to acquire the lock within the given time; fails if the lock is not obtained in time
     * @param time timeout
     * @param unit time unit
     * @return whether the lock was obtained
     * @throws Exception
     */
    public boolean tryLock(long time, TimeUnit unit) throws Exception {
        return this.interProcessMutex.acquire(time, unit);
    }

    /**
     * Release the lock
     * @throws Exception
     */
    public void unLock() throws Exception {
        this.interProcessMutex.release();
    }
}

test

public static void main(String[] args) throws InterruptedException {
    ExecutorService executorService = Executors.newFixedThreadPool(5);
    final String lockPath = "/lock";
    for (int i = 0; i < 5; i++) {
        final int clientIndex = i;
        Callable<Void> callable = new Callable<Void>() {
            public Void call() throws Exception {
                CuratorFramework simpleClient = MyZookeeperClient.createSimpleClient("192.168.0.104:2181");
                try {
                    simpleClient.start();
                    DistributReetrantLock distributReetrantLock = new DistributReetrantLock(simpleClient, lockPath);
                    // Acquire the lock, blocking until it is available
                    distributReetrantLock.tryLock();
                    System.out.println("Client:" + clientIndex + " get lock!");
                    // Verify reentrancy: the same client can acquire the lock again
                    distributReetrantLock.tryLock();
                    System.out.println("Client:" + clientIndex + " get lock again!");
                    Thread.sleep(1000);
                    System.out.println("Client:" + clientIndex + " release lock!");
                    distributReetrantLock.unLock();
                } finally {
                    CloseableUtils.closeQuietly(simpleClient);
                }
                return null;
            }
        };
        executorService.submit(callable);
    }
    executorService.awaitTermination(10, TimeUnit.MINUTES);
}

The output

Client:4 get lock!
Client:4 get lock again!
Client:4 release lock!
Client:3 get lock!
Client:3 get lock again!
Client:3 release lock!
Client:0 get lock!
Client:0 get lock again!
Client:0 release lock!
Client:1 get lock!
Client:1 get lock again!
Client:1 release lock!
Client:2 get lock!
Client:2 get lock again!
Client:2 release lock!

3.1.2 Distributed exclusive locking
/**
 * Distributed exclusive lock
 * @author miaomiao
 * @date 2020/10/25 12:54
 */
public class DistributeLock {
    private final String lockPath;
    private InterProcessSemaphoreMutex interProcessSemaphoreMutex;

    public DistributeLock(CuratorFramework client, String lockPath) {
        this.lockPath = lockPath;
        this.interProcessSemaphoreMutex = new InterProcessSemaphoreMutex(client, lockPath);
    }

    /**
     * Acquire the lock, blocking until it is available
     */
    public void tryLock() throws Exception {
        this.interProcessSemaphoreMutex.acquire();
    }

    /**
     * Try to acquire the lock within the given time; fails if the lock is not obtained in time
     * @param time timeout
     * @param unit time unit
     * @return whether the lock was obtained
     * @throws Exception
     */
    public boolean tryLock(long time, TimeUnit unit) throws Exception {
        return this.interProcessSemaphoreMutex.acquire(time, unit);
    }

    /**
     * Release the lock
     * @throws Exception
     */
    public void unLock() throws Exception {
        this.interProcessSemaphoreMutex.release();
    }
}
3.1.3 Distributed reentrant read/write locks

A process that holds the write lock can also acquire the read lock; when the write lock is released, the holder is downgraded to a read lock (a usage sketch follows the class below).

/**
 * Distributed reentrant read/write lock
 * @author miaomiao
 * @date 2020/10/25 13:07
 */
public class DistributeReetrantReadWriteLock {
    private InterProcessReadWriteLock interProcessReadWriteLock;
    private String lockPath;

    public DistributeReetrantReadWriteLock(CuratorFramework client, String lockPath) {
        this.lockPath = lockPath;
        this.interProcessReadWriteLock = new InterProcessReadWriteLock(client, lockPath);
    }

    /**
     * Acquire a blocking read lock
     * @throws Exception
     */
    public void tryReadLock() throws Exception {
        interProcessReadWriteLock.readLock().acquire();
    }

    /**
     * Acquire a blocking write lock
     * @throws Exception
     */
    public void tryWriteLock() throws Exception {
        interProcessReadWriteLock.writeLock().acquire();
    }

    /**
     * Release the write lock
     * @throws Exception
     */
    public void unlockWriteLock() throws Exception {
        interProcessReadWriteLock.writeLock().release();
    }

    /**
     * Release the read lock
     * @throws Exception
     */
    public void unlockReadLock() throws Exception {
        interProcessReadWriteLock.readLock().release();
    }
}
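A usage sketch of the lock downgrading just described (hedged: it assumes a started CuratorFramework client named client and an illustrative lock path):

DistributeReetrantReadWriteLock rwLock = new DistributeReetrantReadWriteLock(client, "/rw_lock");
rwLock.tryWriteLock();     // acquire the write lock
rwLock.tryReadLock();      // the write-lock holder may also take the read lock
rwLock.unlockWriteLock();  // releasing the write lock downgrades the holder to a read lock
rwLock.unlockReadLock();   // finally release the read lock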

3.2 Distributed Locking Principle

3.2.1 Exclusive locking principle

Implementation idea: every client that wants the lock calls create() to create the same ephemeral node, for example /exclusive_lock/lock. ZooKeeper guarantees that only one client's create() succeeds, and that client holds the distributed lock. Every client that did not acquire the lock can register a watcher on the /exclusive_lock node to listen for child-node changes, so it can try to acquire the lock again when it is released (a sketch follows).
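A minimal sketch of this idea using the raw ZooKeeper API (hedged: error handling and retry loops are omitted, and the paths and class name are illustrative):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ExclusiveLockSketch {
    /** Try to grab the lock once; returns true if this client now holds it */
    static boolean tryAcquire(ZooKeeper zk, Watcher watcher) throws KeeperException, InterruptedException {
        try {
            // Only one client's create() can succeed; the creator holds the lock.
            // The node is ephemeral, so the lock is released automatically if the holder's session ends.
            zk.create("/exclusive_lock/lock", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Someone else holds the lock: watch the parent for child changes and retry when notified
            zk.getChildren("/exclusive_lock", watcher);
            return false;
        }
    }
}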

3.2.2 Read-write locking principle

A shared lock must allow shared reads and exclusive writes. Implementation idea: when clients request the shared lock, each creates an ephemeral sequential child node under the designated lock node. The child path encodes the client and the type of its current operation (write or read), e.g. [hostname]-request type (W/R)-sequence number. To decide whether a client holds the lock: for a read request, the lock is acquired if there is no write-request node with a smaller sequence number, or if the client's node has the smallest sequence number; for a write request, the lock is acquired only if the client's node has the smallest sequence number. A client that does not acquire the lock registers a watcher: a read request watches the last write-request node with a smaller sequence number than its own, while a write request watches the node immediately before its own in the child list.

4. Realize the Leader election through Zookeeper

In distributed systems, leader election is the process of specifying a process (one instance, one machine) as the organizer of tasks assigned to multiple servers. Before the task starts, none of the server nodes knows which node will be the leader or coordinator of the task, and then after the leader election, each node identifies a specific, unique node as the task leader.

Apache Curator provides two ways to elect a leader:

4.1 Using ephemeral sequential nodes

The simplest approach is to have an "/election" node under which clients create ephemeral sequential child nodes. Each child node automatically gets a sequence-number suffix, and the child created earliest has the smallest sequence number.

Of course this alone is not enough; there must also be a mechanism to promote a new leader when the current leader goes down. One solution is for all clients to watch the child node with the smallest sequence number to determine whether they can become the leader: if the leader client goes down, its node (the one with the smallest sequence number) disappears, a new smallest-sequence-number node exists, and a new leader emerges. But doing so creates a herd effect: all clients are notified that the smallest-sequence-number child has been removed, and they all call getChildren() to fetch the "/election" child list. If there are many clients, this puts pressure on the ZooKeeper servers. To avoid the herd effect, each client only needs to watch the node immediately before its own. That way, when the leader client goes down and the smallest-sequence node is deleted, only the client whose node comes right after it is notified, and that client becomes the new leader.

The corresponding implementation classes in Apache Curator are:

  • org.apache.curator.framework.recipes.leader.LeaderLatch

Core class, main entry

  • org.apache.curator.framework.recipes.leader.LeaderLatchListener

The LeaderLatch listener, which is called back when the leadership state changes; it has two methods:

public void isLeader();   // called when this instance becomes the leader
public void notLeader();  // called when this instance loses leadership

Example:

                        CuratorFramework simpleClient = MyZookeeperClient.createSimpleClient("192.168.0.104:2181");

                        simpleClient.start();

                        MyLeaderLatch leaderLatch = new MyLeaderLatch(simpleClient, "/leader_election", new LeaderLatchListener() {
                            public void isLeader() {
                                System.out.println("Client:" + clientIndex + " is leader");
                            }

                            public void notLeader() {
                                System.out.println("Client:" + clientIndex + " lose leader");
                            }
                        });
                        leaderLatch.start();

4.2 Implementation using distributed lock

Related classes:

  • org.apache.curator.framework.recipes.leader.LeaderSelector

The core class and main entry point for the election; a LeaderSelectorListener must be passed when constructing a LeaderSelector

  • org.apache.curator.framework.recipes.leader.LeaderSelectorListener

The LeaderSelector listener, which is called back when this instance is elected leader. When a node is elected leader, takeLeadership() is invoked; when takeLeadership() returns, the node gives up leadership, which triggers a new election. In other words, the lifetime of the leadership equals the lifetime of the takeLeadership() method.

  • org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter

LeaderSelectorListenerAdapter is an abstract implementation of LeaderSelectorListener that overrides stateChanged() and throws CancelLeadershipException when the connection is suspended or lost, so that the instance gives up leadership. Using this listener is officially recommended

  • org.apache.curator.framework.recipes.leader.CancelLeadershipException

Example:

             CuratorFramework simpleClient = MyZookeeperClient.createSimpleClient("192.168.0.104:2181");

                simpleClient.start();

                MyLeaderSelector leaderSelector = new MyLeaderSelector(simpleClient, "/leader_selector", new LeaderSelectorListenerAdapter() {

                    public void takeLeadership(CuratorFramework client) throws Exception {
                        System.out.println("Client:" + clientIndex + " get leader!");
                    }
                });
                // Start the election
                leaderSelector.start();

📝📝 Reference article

Github.com/Snailclimb/…

Zookeeper.apache.org/doc/r3.6.2/…

www.cnblogs.com/qlqwjy/p/10…

Application scenarios of ZooKeeper

Zookeeper, a distributed service framework, manages data in a distributed environment

www.runoob.com/w3cnote_gen…

curator.apache.org/

Chat 🏆 technology project stage v | distributed those things…