1. Node type
2. Structure
3. Principle of the listener
4. Election mechanism
4.1 Important parameters
4.2 Election status
4.3 Leader election at server startup
4.4 Leader election during operation

Traveling through the Sahara in the rain


1. Node type


  1. Persistent: The node is not deleted when the client disconnects from the server.
     1.1. Persistent directory node: The node still exists after the client disconnects from ZooKeeper.
     1.2. Persistent sequentially numbered directory node: The node still exists after the client disconnects from ZooKeeper, and ZooKeeper appends a sequence number to the node name.
  2. Ephemeral: The node is deleted automatically when the client disconnects from the server and its session ends.
     2.1. Ephemeral directory node: The node is deleted after the client disconnects from ZooKeeper.
     2.2. Ephemeral sequentially numbered directory node: The node is deleted after the client disconnects from ZooKeeper, and ZooKeeper appends a sequence number to the node name.

Note: When a ZNode is created with the sequence flag set, ZooKeeper appends a number to the ZNode name. That number comes from a monotonically increasing counter maintained by the parent node.

Note: In a distributed system, sequence numbers can be used to put all events into one global order, so clients can infer the order of events from the sequence numbers.
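To make the sequence-number idea concrete, here is a small toy model in Python. It does not talk to a real ZooKeeper server. `ParentNode` and `create_child` are hypothetical names made up for this sketch, and the counter here advances only for sequential children, which is a simplification. The 10-digit zero-padded suffix mirrors what ZooKeeper appends to sequential node names.

```python
class ParentNode:
    """Toy model of a parent znode that numbers its sequential children."""

    def __init__(self, path):
        self.path = path
        self.counter = 0  # monotonically increasing, owned by the parent
        self.children = []

    def create_child(self, name, sequential=False):
        if sequential:
            # Append a 10-digit, zero-padded counter value to the name.
            name = f"{name}{self.counter:010d}"
            self.counter += 1  # simplification: only sequential creates count
        self.children.append(name)
        return f"{self.path}/{name}"


locks = ParentNode("/locks")
first = locks.create_child("lock-", sequential=True)
second = locks.create_child("lock-", sequential=True)
print(first)   # /locks/lock-0000000000
print(second)  # /locks/lock-0000000001
```

Because the suffix is zero-padded, sorting the names as plain strings gives the creation order, which is what clients rely on when they infer event order from the numbers.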

2. Structure

Use the stat command to view the node details


cZxid: The transaction ID of the change that created the node.
ctime: The time when the node was created.
mZxid: The transaction ID of the last modification of the node.
mtime: The time when the node was last modified.
pZxid: The transaction ID of the last change to this node's list of children. The list changes only when a child is added or removed; changes to a child's data do not affect pZxid.
cversion: The version number of this node's child list, incremented by 1 each time a child is added or removed.
dataVersion: The version number of the data, incremented by 1 each time the data is modified.
aclVersion: The version number of the permissions (ACL), incremented by 1 each time the permissions are modified.
ephemeralOwner: The session ID of the session that created the ephemeral node. (If the node is persistent, the value is 0.)
dataLength: The length of the node's data.
numChildren: The number of children of this node (only direct children are counted).
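The difference between mZxid and pZxid is easy to mix up, so here is a rough in-memory sketch of how these fields move. `ToyTree` and its methods are hypothetical and only model a few of the fields (no times, ACLs, or sessions); it is meant as an illustration of the rules above, not as ZooKeeper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Stat:
    cZxid: int
    mZxid: int
    pZxid: int
    cversion: int = 0
    dataVersion: int = 0
    numChildren: int = 0


class ToyTree:
    """Toy znode tree that tracks a few Stat fields per path."""

    def __init__(self):
        self.zxid = 0
        self.stats = {"/": Stat(0, 0, 0)}

    def _next_zxid(self):
        self.zxid += 1
        return self.zxid

    def create(self, parent, name):
        zxid = self._next_zxid()
        path = parent.rstrip("/") + "/" + name
        self.stats[path] = Stat(cZxid=zxid, mZxid=zxid, pZxid=zxid)
        # Adding a child changes the parent's child list.
        p = self.stats[parent]
        p.pZxid = zxid
        p.cversion += 1
        p.numChildren += 1
        return path

    def set_data(self, path):
        zxid = self._next_zxid()
        # Changing data touches only the node itself, not its parent.
        s = self.stats[path]
        s.mZxid = zxid
        s.dataVersion += 1


tree = ToyTree()
app = tree.create("/", "app")      # zxid 1
cfg = tree.create(app, "cfg")      # zxid 2, updates /app's pZxid
tree.set_data(cfg)                 # zxid 3, /app's pZxid stays at 2
print(tree.stats[app].pZxid)       # 2
print(tree.stats[cfg].mZxid)       # 3
```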

3. Principle of the listener

  1. First, there is a main() thread.
  2. Creating the ZooKeeper client in the main thread starts two more threads: one for network communication (connect) and one for listening (listener).
  3. Registered listener events are sent to ZooKeeper through the connect thread.
  4. ZooKeeper adds the registered listener events to its list of registered listeners.
  5. When ZooKeeper detects a data or path change, it sends a message to the listener thread.
  6. The listener thread then calls the process() method internally.
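The flow above can be sketched with plain Python threads. Everything here is a toy: `ToyClient`, `ToyServer`, `register`, and `change` are hypothetical names, the connect thread is replaced by a direct method call, and no network is involved. The sketch also assumes the classic one-time watch behaviour, where a watch fires once and must be registered again.

```python
import queue
import threading


class ToyClient:
    """Toy client whose listener thread delivers events to process()."""

    def __init__(self, process):
        self._process = process
        self._events = queue.Queue()
        self._listener = threading.Thread(target=self._listen, daemon=True)
        self._listener.start()

    def _listen(self):
        while True:
            event = self._events.get()
            if event is None:  # shutdown marker
                break
            self._process(event)  # step 6: the listener thread calls process()

    def deliver(self, event):
        # Step 5: the server side hands the message to the listener thread.
        self._events.put(event)

    def close(self):
        self._events.put(None)
        self._listener.join()


class ToyServer:
    def __init__(self):
        self.watches = {}  # path -> clients with a registered listener

    def register(self, path, client):
        # Steps 3-4: the listener ends up in the server's registration list.
        self.watches.setdefault(path, []).append(client)

    def change(self, path):
        # Assumed one-time trigger: the registration is removed when it fires.
        for client in self.watches.pop(path, []):
            client.deliver(("NodeDataChanged", path))


seen = []
client = ToyClient(seen.append)
server = ToyServer()
server.register("/app", client)
server.change("/app")  # fires the watch
server.change("/app")  # nothing registered any more, no event
client.close()
print(seen)  # [('NodeDataChanged', '/app')]
```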


Common watches:

  1. Watch for changes to a node's data: get path [watch]
  2. Watch for children being added or removed: ls path [watch]

4. Election mechanism

ZooKeeper runs a leader election in two situations: when the servers start up, and when the leader server goes down during operation.

4.1 Important parameters

  1. Server ID (myid): The larger the number, the greater the weight in the election algorithm.
  2. Transaction ID (ZXID): The larger the value, the newer the data and the greater the weight.
  3. Epoch (logical clock): The value is the same for every vote within one round of voting and increases with each new round.
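One way to read "weight" is as a lexicographic ordering: the election compares the epoch first, then the ZXID, then the myid. A minimal sketch in Python, where `Vote` and `better` are hypothetical names chosen for illustration:

```python
from typing import NamedTuple


class Vote(NamedTuple):
    # Field order matters: tuples compare field by field, left to right.
    epoch: int
    zxid: int
    myid: int


def better(a, b):
    """Return whichever vote wins: higher epoch, then ZXID, then myid."""
    return max(a, b)


# A newer ZXID beats a larger myid.
print(better(Vote(1, 5, 1), Vote(1, 3, 2)))
# With equal epoch and ZXID, the larger myid wins.
print(better(Vote(1, 5, 1), Vote(1, 5, 2)))
```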

4.2 Election status

LOOKING: Election status. The server is looking for a leader.

FOLLOWING: Follower status. The server synchronizes with the leader and participates in voting.

OBSERVING: Observer status. The server synchronizes with the leader but does not participate in voting.

LEADING: Leader status.

4.3 Leader election at server startup

Each node starts in the LOOKING state, and then the main election process begins. Take a cluster of three machines as an example. When the first server, server1, starts, it cannot elect a leader on its own, because one machine out of three is not a majority. When the second server, server2, starts, the two machines can communicate with each other and the leader election begins.


  1. Each server issues a vote. Since this is the initial case, server1 and server2 each vote for themselves as leader. Each vote contains the server's myid, ZXID, and epoch, and is written here as (myid, zxid). At this point server1 votes (1,0) and server2 votes (2,0), and each sends its vote to the other machines in the cluster.
  2. Receive votes from the other servers. After a server receives a vote, it first checks that the vote is valid, for example whether it belongs to the current epoch and whether it comes from a server in the LOOKING state.
  3. Process the votes. Each server compares every vote it receives with its own vote, using the following rules:
     3.1. Compare the epoch first.
     3.2. Then compare the ZXID. The server with the larger ZXID is preferred as leader.
     3.3. If the ZXIDs are the same, compare the myid. The server with the larger myid is preferred as leader.
     Here server1 changes its vote to (2,0) and sends it out again, while server2 keeps its vote.
  4. Count the votes. After each round, every server counts the votes and checks whether more than half of the machines hold the same vote. Both server1 and server2 find that two of the three machines hold the vote (2,0), so server2 is elected leader.
  5. Change the server state. Once the leader is determined, each server updates its own state: a follower changes to FOLLOWING and the leader changes to LEADING. When server3 starts later, it finds that a leader already exists and changes its own state to FOLLOWING.
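The startup steps can be condensed into a few lines of Python. This is a simplified sketch, not ZooKeeper's election code: the `elect` function is hypothetical, the message exchange is collapsed into a single step, and all servers are assumed to be in the same epoch, so only (zxid, myid) is compared.

```python
from collections import Counter


def elect(servers, cluster_size):
    """servers maps myid -> zxid for the servers that are up and LOOKING.

    Returns the myid of the elected leader, or None if no vote is held
    by more than half of the cluster.
    """
    if not servers:
        return None
    # Step 1: every server first votes for itself as (myid, zxid).
    votes = {myid: (myid, zxid) for myid, zxid in servers.items()}
    # Steps 2-3: after exchanging votes, every server adopts the best one:
    # larger zxid first, then larger myid.
    best = max(votes.values(), key=lambda v: (v[1], v[0]))
    votes = {myid: best for myid in votes}
    # Step 4: count the votes and require more than half of the cluster.
    vote, count = Counter(votes.values()).most_common(1)[0]
    return vote[0] if count > cluster_size // 2 else None


print(elect({1: 0}, cluster_size=3))        # None: one of three is no majority
print(elect({1: 0, 2: 0}, cluster_size=3))  # 2: both end up voting (2, 0)
```

Note that `cluster_size` is the configured size of the cluster, not the number of servers currently running, which is why a lone server1 cannot elect itself.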

4.4 Leader election during operation

When the leader server in the cluster goes down or becomes unavailable, the cluster cannot serve external requests until a new round of leader election completes.

  1. Change the state. After the leader fails, the remaining non-Observer servers change their state to LOOKING.
  2. Each server issues one vote. During operation, the ZXID may differ from server to server.
  3. Process the votes. The rules are the same as in the startup process.
  4. Count the votes. Same as in the startup process.
  5. Change the server state. Same as in the startup process.
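Here is a rough sketch of the state changes during a re-election, again as a toy model. The `reelect` function is hypothetical; it assumes all remaining voters share the same epoch and can reach each other, and that the failed leader counted as a voting member of the cluster.

```python
def reelect(states, zxids, failed_leader):
    """states: myid -> state name; zxids: myid -> last ZXID on that server."""
    states = dict(states)
    del states[failed_leader]
    # Step 1: every remaining non-Observer server switches to LOOKING.
    voters = [m for m, s in states.items() if s != "OBSERVING"]
    for m in voters:
        states[m] = "LOOKING"
    # Voting members of the cluster, counting the failed leader.
    ensemble = len(voters) + 1
    # Step 4 (checked early): without a majority, everyone stays LOOKING.
    if len(voters) <= ensemble // 2:
        return states
    # Steps 2-3: the largest ZXID wins; myid only breaks ties.
    winner = max(voters, key=lambda m: (zxids[m], m))
    # Step 5: update the states.
    for m in voters:
        states[m] = "LEADING" if m == winner else "FOLLOWING"
    return states


before = {1: "FOLLOWING", 2: "LEADING", 3: "FOLLOWING", 4: "OBSERVING"}
after = reelect(before, zxids={1: 9, 3: 8, 4: 9}, failed_leader=2)
print(after)  # {1: 'LEADING', 3: 'FOLLOWING', 4: 'OBSERVING'}
```

In this run server1 becomes leader even though server3 has the larger myid, because server1 holds the newer ZXID. The Observer keeps its state and takes no part in the vote.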