WeChat public account: “Moon Chat Technology”. Follow and star it for in-depth content, delivered as soon as it is published! [If you find this article helpful, feel free to follow, read, like, and share it.]



Other articles in the Interview Essay series can be found here

One knowledge point a day

  • Think more, communicate more

“Directory”

  • 1. What is Zookeeper? What can be done?
  • 2. Talk about the data structure of Zookeeper
  • 3. What is stored in Znode?
  • 4. What is the system architecture of Zookeeper?
  • 5. Tell me more about the ZAB protocol
  • 6. How to initialize Zookeeper to elect the Leader?
  • 7. How to elect the Leader if the Leader fails and the crash recovery starts?
  • 8. Tell me about the Watcher mechanism and how it works.
  • 9. What are the features of Zookeeper?
  • 10. How does Zookeeper identify the order of requests?
  • 11. How is data synchronization conducted after leader election
  • 12. Does Zookeeper have inconsistent data?


1. What is Zookeeper? What can be done?

Zookeeper is an “open source” centralized service for maintaining configuration information, naming, providing “distributed” synchronization, and providing group services.


Functions such as “data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master elections, distributed locks and distributed queues” can be implemented based on Zookeeper.


One of the most common usage scenarios of Zookeeper is as a “registry”. Producers register their services with Zookeeper, consumers “get the producer’s service list” from Zookeeper, and then “call the producer’s services” directly; for example, “Dubbo and Kafka” use Zookeeper as their registry.

2. Talk about the data structure of Zookeeper


ZooKeeper provides a namespace very similar to that of a standard file system. A name is a series of path elements separated by slashes (“/”), and each znode in the ZooKeeper namespace is identified by such a path. “Each znode has a parent” whose path is a prefix of the znode’s path with one fewer element; the exception to this rule is the root (“/”), which has no parent. Also, just as in a standard file system, “a znode cannot be removed if it has child nodes.”

The main differences between ZooKeeper and a standard file system are that “each znode can have data associated with it” (every node can act as both a file and a directory) and that znodes are limited in the amount of data they can hold. ZooKeeper is designed to store coordination data: status information, configuration, location information, and so on. Such metadata is usually measured in kilobytes, if not bytes. “ZooKeeper has a built-in sanity check of 1 MB per znode to prevent it from being used as a large data store,” but in practice it is used to store far smaller pieces of data.


There are three types of ZNodes:

  • Persistent node: persists until it is explicitly deleted
  • Ephemeral node: after the client session ends, ZooKeeper automatically deletes the ephemeral node
  • Sequential node: each time a sequential node is created, ZooKeeper automatically appends a zero-padded 10-digit number to the end of the path; the counter starts at 0 and goes up to 2147483647 (2^31-1).

Znodes come in four forms:

  • Persistent node: e.g. create /test/a "hello" creates a persistent node with the plain create command
  • Persistent sequential node: specified with the create -s parameter
  • Ephemeral node: specified with the create -e parameter
  • Ephemeral sequential node: specified with the create -s -e parameters
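The sequential-node numbering described above can be illustrated in a few lines. This is a sketch, not ZooKeeper's actual server code: the server appends a zero-padded 10-digit counter to the requested path.

```python
def sequential_name(path: str, counter: int) -> str:
    """Mimic ZooKeeper's sequential-znode naming: append a
    zero-padded 10-digit counter to the requested path."""
    return f"{path}{counter:010d}"

# The counter starts at 0 and increases for each child created
# under the same parent.
print(sequential_name("/locks/lock-", 0))   # /locks/lock-0000000000
print(sequential_name("/locks/lock-", 42))  # /locks/lock-0000000042
```

This per-parent counter is exactly what makes sequential ephemeral nodes useful for distributed locks and queues: clients can order themselves by the suffix.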

3. What is stored in Znode?

A znode contains stored data, access permissions, child node references, and node status information.


  • Data: the business data stored in the znode
  • Acl: records clients’ access permissions on the znode, e.g. restricting access by IP address
  • “Child”: references to the current node’s children
  • Stat: the znode’s status information, such as transaction ID, version numbers, timestamps, and so on

4. What is the system architecture of Zookeeper?


ZooKeeper is divided into “Server” and “Client” roles. A client can connect to any server of the ZooKeeper ensemble (unless the leaderServes parameter is explicitly set so that the leader does not accept client connections). The client creates and maintains a “TCP connection” through which it sends requests, receives responses, gets watch events, and sends heartbeats.

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of the state, along with transaction logs and snapshots in persistent storage; as long as “a majority of the servers are available, the ZooKeeper service is available.”


A Server in a Zookeeper cluster has three roles: Leader, Follower, and Observer

  • Leader: initiates and decides votes, updates system status, and handles write requests
  • Follower: receives client requests, returns results, and votes in Leader elections
  • “Observer”: accepts client connections and forwards “write requests” to the Leader, but does not participate in voting and only “synchronizes the Leader’s state”. Its main purpose is “to improve read throughput”.

The purpose of dividing servers into three roles is to “keep too many nodes from participating in the majority-write (quorum) process”, which would hurt performance. Zookeeper can thus achieve high performance with a small cluster of a few voting machines, and scale reads horizontally by adding Observer nodes.

Zookeeper recommends an odd number of nodes in the cluster, and the service remains available as long as “more than half of the machines” can provide service.

When ZooKeeper starts, it elects a Leader from among the instances. The Leader is responsible for processing data updates; an update operation succeeds only when a majority of the servers have successfully applied the change in memory. Each server keeps a full copy of the data in memory.
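The “majority” (quorum) rule above is simple arithmetic. A minimal sketch (the function names are illustrative, not ZooKeeper API):

```python
def quorum_size(n_servers: int) -> int:
    """Smallest majority for an n-server ensemble."""
    return n_servers // 2 + 1

def cluster_available(n_servers: int, n_alive: int) -> bool:
    """The service is available while a majority of servers are up."""
    return n_alive >= quorum_size(n_servers)

# A 5-node ensemble tolerates 2 failures; a 6-node ensemble needs 4
# alive and so also tolerates only 2 failures, which is why odd
# cluster sizes are recommended.
print(quorum_size(5))           # 3
print(cluster_available(5, 3))  # True
print(cluster_available(6, 3))  # False
```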

Zookeeper implements data consistency by using the ZAB protocol.

5. Tell me more about the ZAB protocol

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support, designed specifically for ZooKeeper. ZooKeeper relies on the ZAB protocol to implement distributed data consistency. Based on this protocol, ZooKeeper implements an active/standby architecture to keep the replicas in the cluster consistent.

ZAB protocol includes two modes, namely crash recovery and message broadcast.

  • Crash recovery: ZAB protocol enters recovery mode and “elects a new Leader” server during the startup of the entire service framework or when the Leader server is interrupted, crashes, exits, or restarts. When a new Leader server is elected and “more than half of the machines in the cluster have completed state synchronization with the Leader server”, the ZAB protocol “exits the recovery mode”. The remaining machines will continue to synchronize “until the synchronization is complete and the node is added to the cluster.”

  • Message broadcast: once “more than half of the Follower servers have completed state synchronization with the Leader”, the framework can “enter message broadcast mode”. When a ZAB-compliant server joins the cluster after startup and “a Leader is already broadcasting messages”, the new server “automatically enters data recovery mode”: it finds the Leader, synchronizes data with it, and then joins the message broadcast process. ZooKeeper allows only the single Leader server to process transaction requests. After receiving a transaction request from a client, the Leader generates the corresponding transaction proposal and initiates a round of broadcast. If “a non-Leader machine in the cluster receives a transaction request” from a client, it “forwards the request to the Leader server first.”
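The broadcast phase described above can be sketched as a two-phase, majority-ACK loop. This is a toy model with invented class names, not ZooKeeper's implementation:

```python
class Follower:
    def __init__(self, up=True):
        self.up = up
        self.log, self.committed = [], []

    def propose(self, proposal):
        # Phase 1: log the proposal and ACK it back to the leader.
        if self.up:
            self.log.append(proposal)
            return True
        return False

    def commit(self, zxid):
        # Phase 2: apply the transaction once the leader says COMMIT.
        self.committed.append(zxid)

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0

    def broadcast(self, txn) -> bool:
        self.zxid += 1
        proposal = (self.zxid, txn)
        acks = 1  # the leader implicitly ACKs its own proposal
        for f in self.followers:
            if f.propose(proposal):
                acks += 1
        # Commit only once a majority of the ensemble has ACKed.
        if acks > (len(self.followers) + 1) // 2:
            for f in self.followers:
                if f.up:
                    f.commit(self.zxid)
            return True
        return False

# 3-node ensemble with one follower down: 2 of 3 ACKs is still a majority.
leader = Leader([Follower(), Follower(up=False)])
print(leader.broadcast("set /x 1"))  # True
```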

6. How to initialize Zookeeper to elect the Leader?


During the cluster initialization phase, take a three-node cluster as an example: Leader election starts as soon as at least two servers (say ZK1 and ZK2) are up and can communicate. The process is as follows:

  • (1) “Each server issues a vote”. In the initial election, ZK1 and ZK2 each vote for themselves as the Leader; each vote contains the proposed server’s (“myid, ZXID”). At this point ZK1’s vote is (1, 0) and ZK2’s is (2, 0). Each then “sends its vote to the other machines in the cluster”.

  • (2) Receive the votes. After each server in the cluster receives a vote, it first “checks” the vote’s “validity”, e.g. whether it belongs to the current round and comes from a server in the LOOKING state.

  • (3) Process the votes. Each server taking part in the election “compares the others’ votes with its own”, as follows:

    • Check the ZXID first: the server with the larger ZXID is preferred as the Leader. “If the ZXIDs are equal”, compare the myids: “the server with the larger myid becomes the Leader”.
  • (4) Count the votes. After each round of voting, a server tallies the votes and “checks whether more than half of the machines have cast the same vote”. For ZK1 and ZK2, both machines have received the vote (2, 0), so the required majority is reached and ZK2 is considered elected as the Leader.

  • (5) Change the server status. “Once the Leader is determined, each server updates its status.” If it is Follower, the status is changed to FOLLOWING, and if it is Leader, the status is changed to LEADING. When the new Zookeeper node ZK3 is started, it finds that the Leader already exists and changes the status from LOOKING to FOLLOWING.
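The vote comparison in step (3) orders votes by (ZXID, myid), which Python's tuple comparison expresses directly. A minimal sketch (the function name is illustrative):

```python
def better_vote(mine: tuple, other: tuple) -> tuple:
    """Each vote is (zxid, myid); the winner has the larger zxid,
    with myid as the tie-breaker. Tuple comparison does exactly
    that: it compares the first element, then the second."""
    return max(mine, other)

# Fresh cluster: all zxids are 0, so the highest myid wins.
zk1, zk2 = (0, 1), (0, 2)
print(better_vote(zk1, zk2))  # (0, 2) -> ZK2 becomes the Leader

# After a crash, a server holding newer data (larger zxid) wins even
# against a larger myid, so committed data is never lost.
print(better_vote((5, 1), (4, 3)))  # (5, 1)
```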

7. How to elect the Leader if the Leader fails and the crash recovery starts?


  • 1. Change status. After the Leader crashes, the remaining non-Observer servers change their state to LOOKING and start the Leader election process.

  • 2. Each “non-Observer” Server “issues a vote”. Consistent with the startup process.

  • 3. “Receive” votes from each server. Same process as startup.

  • 4. “Process the vote.” Same process as startup.

  • 5. “Count the votes.” Same process as startup.

  • 6. Change the Server Status. Same process as startup.

8. Tell me about the Watcher mechanism and how it works.


The specific steps are as follows:

  • Service Registration: When the Provider starts, it registers service information with the ZooKeeper server, that is, creates a node.
  • Service discovery: When a Consumer starts up, it obtains the registered service information from the ZooKeeper server based on its configured dependent service information and sets the Watch monitor. After obtaining the registered service information, it “caches” the service provider information locally and invokes the service.
  • Service notification: if a service provider goes down for some reason and stops providing its service, its connection to the ZooKeeper server is broken and the provider’s corresponding service node is deleted. The ZooKeeper server then asynchronously sends a “node deleted” notification to every consumer that registered a Watch on that node; on “receiving the notification”, the consumer pulls the latest service list and “updates its locally cached” service list.

Clients register a Watcher for a znode and “receive a notification from ZooKeeper” when that znode changes.

Four characteristics:

  • One-shot: once a Watcher fires, Zookeeper removes it from storage. To keep listening on a node, the client must re-register the watch in its callback; otherwise it receives only a single change notification for that node
  • Client-serial: the client’s “Watcher callbacks are processed serially and synchronously”, so one Watcher’s logic must not block the entire client
  • Lightweight: the unit of a Watcher notification is a WatchedEvent, which contains only the notification state, event type, and node path, not the actual event content. To get the new data, the client must fetch it itself
  • Asynchronous: the Zookeeper server delivers watcher notifications to clients asynchronously, so a client cannot expect to observe every single change to a node. Zookeeper guarantees only eventual consistency, not strong consistency.
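The one-shot property can be modeled in a few lines (an illustrative class, not the real client API): the watch list is cleared before callbacks fire, so a second change produces no event unless the watch is re-registered.

```python
class ZNode:
    """Toy znode demonstrating one-shot watch semantics."""
    def __init__(self):
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, data):
        # Remove all watchers BEFORE firing them: each fires at most once.
        fired, self.watchers = self.watchers, []
        for cb in fired:
            cb("NodeDataChanged")

events = []
node = ZNode()
node.watch(events.append)
node.set_data("v1")  # fires the watch
node.set_data("v2")  # no watch registered any more -> no event
print(events)        # ['NodeDataChanged']
```

To keep receiving events, a real client would call an exists/getData with a watch again inside the callback, as the one-shot bullet above describes.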

9. What are the features of Zookeeper?


  • Sequential consistency: The leader generates zxids based on the request sequence to ensure the execution of the request sequence.
  • Atomicity: a transaction’s result is applied consistently across all machines in the cluster; it either takes effect on all of them or on none.
  • Single view: The data displayed by the client is the same no matter which ZooKeeper server the client is connected to.
  • “Reliability” : Once a server has successfully applied a transaction and responded to the client, the server state changes caused by that transaction are retained forever.
  • Real-time: Zookeeper only ensures that the client can read the latest data status from the server within a certain period of time.

10. How does Zookeeper identify the order of requests?


After receiving a request, the Leader assigns it a globally unique, monotonically increasing transaction ID: the ZXID. The Leader then places the request into a “FIFO queue” and sends it to all Followers according to the FIFO policy, which preserves the request order.
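A ZXID is a 64-bit number whose high 32 bits hold the Leader’s epoch and whose low 32 bits hold a per-epoch counter, so plain integer comparison yields the global order. A small sketch (the helper names are illustrative):

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Compose a 64-bit zxid: high 32 bits = leader epoch,
    low 32 bits = per-epoch transaction counter."""
    return (epoch << 32) | counter

def epoch_of(zxid: int) -> int:
    return zxid >> 32

def counter_of(zxid: int) -> int:
    return zxid & 0xFFFFFFFF

z = make_zxid(epoch=2, counter=5)
print(hex(z))                      # 0x200000005
print(epoch_of(z), counter_of(z))  # 2 5

# Any zxid from a newer epoch compares greater than every zxid from an
# older epoch, even a maxed-out counter, so ordering survives re-elections.
assert make_zxid(2, 0) > make_zxid(1, 0xFFFFFFFF)
```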

11. How is data synchronization conducted after leader election


As mentioned above, the Leader is responsible for writes: it assigns each request a ZXID, puts it into a queue, and executes the requests one by one, recording the ZXID of each executed request.

We call the largest ZXID in this queue “maxZXID” and the smallest one “minZXID”.

Call the latest ZXID on an Observer or Follower its “lastSyncZXID”.

“Proposal”: the request’s information, such as its ZXID, wrapped into a proposal object.

  • 1. Differential synchronization (DIFF)

    • Trigger condition: minZXID < lastSyncZXID < maxZXID

    • “Synchronization process”:

      • 1). The Leader sends a DIFF instruction to the Observers and Followers, and differential synchronization begins
      • 2). The Leader then sends the differential proposals to the Observers and Followers, which return ACKs indicating the data has been received
      • 3). When more than half of the Observers and Followers in the cluster have responded with an ACK, the Leader sends an UPTODATE command
      • 4). The Observers and Followers return a final ACK, and the synchronization process ends
  • 2. Rollback synchronization (TRUNC)

    • Trigger condition: maxZXID < lastSyncZXID
    • “For example”: the largest ZXID in the Leader’s queue is 100. Server A receives a request with ZXID 101 but crashes before broadcasting it. After recovery, A must roll back the data with ZXID 101
    • Synchronization process:
      • 1). Roll straight back to maxZXID
  • 3. Rollback + differential synchronization

    • Trigger condition: the Leader crashes after generating a proposal but before sending it; after a new election it becomes a Follower, and the new Leader does not have that proposal’s data
    • “For example”: A is the Leader and the largest ZXID in its queue is 100. A receives a request with ZXID 101 but crashes before broadcasting it, and B becomes the Leader. The maximum ZXID in B’s queue reaches 103 by the time A recovers. A must first roll back the data with ZXID 101 and then synchronize the difference
    • Synchronization process:
      • 1). The Observer or Follower rolls back its data
      • 2). Differential synchronization follows
  • 4. Full synchronization (SNAP)

    • Trigger conditions
      • 1). lastSyncZXID < minZXID
      • 2). There is no cached queue on the Leader server and lastSyncZXID != maxZXID
    • Synchronization process: the Leader sends a SNAP command to the Observers and Followers and performs a full (snapshot) data synchronization
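The trigger conditions above can be collapsed into one selection function. This is an illustrative sketch: the mode strings follow the DIFF/TRUNC/SNAP instructions mentioned in the text, while the function itself is invented, and case 3 (rollback + diff) is folded into TRUNC followed by DIFF.

```python
def sync_mode(last_sync: int, min_zxid: int, max_zxid: int,
              queue_cached: bool = True) -> str:
    """Pick the synchronization mode for a learner based on its
    lastSyncZXID relative to the leader's queue [minZXID, maxZXID]."""
    if not queue_cached:
        # No cached proposal queue on the leader: snapshot unless equal.
        return "SNAP" if last_sync != max_zxid else "NONE"
    if last_sync == max_zxid:
        return "NONE"   # already up to date
    if last_sync < min_zxid:
        return "SNAP"   # too far behind: full snapshot sync
    if last_sync > max_zxid:
        return "TRUNC"  # learner is ahead: roll back to maxZXID
    return "DIFF"       # minZXID < lastSyncZXID < maxZXID

print(sync_mode(last_sync=150, min_zxid=100, max_zxid=200))  # DIFF
print(sync_mode(last_sync=50,  min_zxid=100, max_zxid=200))  # SNAP
print(sync_mode(last_sync=250, min_zxid=100, max_zxid=200))  # TRUNC
```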

12. Does Zookeeper have inconsistent data?

Yes, transiently. Zookeeper uses a majority-write (quorum) mechanism: in a three-server cluster, a write that succeeds on two servers is considered successful for the whole cluster. If a client then reads from the server that has not yet applied the write, it gets stale data, so the data can be temporarily inconsistent across servers. Zookeeper guarantees eventual consistency; a client that needs the latest data can call sync() before reading.
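A toy model of this stale-read window (plain dictionaries standing in for servers, not a real client): a write committed on a majority succeeds, yet the lagging replica still serves the old value until synchronization catches up.

```python
# Three replicas, all starting with x = 1.
servers = [{"x": 1}, {"x": 1}, {"x": 1}]

def quorum_write(key, value) -> bool:
    """Simulate a majority write: the update reaches only 2 of the
    3 replicas, which is enough for the cluster to report success."""
    for s in servers[:2]:
        s[key] = value
    return True  # 2/3 is a majority, so the write "succeeds"

quorum_write("x", 2)
print(servers[0]["x"])  # 2  (up-to-date replica)
print(servers[2]["x"])  # 1  (lagging replica: stale read)
```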