ZooKeeper

Long time no see

It’s been almost a month since my last post. Just think, I haven’t written anything for almost a month. Sure, there were final exams, course projects, and a driving test, but that’s no excuse!

Getting lazy over the winter is no good, so I hope you readers will keep pushing me to write. 🙄✍️

The article is very long. Like it first and then read it, make it a habit. ❤️ 🧡 💛 💚 💙 💜

What is ZooKeeper?

ZooKeeper was developed at Yahoo and later donated to Apache, where it is now an Apache top-level project. ZooKeeper is an open-source coordination server for distributed applications that provides consistency services for distributed systems. Its consistency is achieved through the ZAB protocol, which is based on the Paxos algorithm. Its main functions include configuration maintenance, distributed synchronization, cluster management, and support for distributed transactions.

zookeeper

Simply put, ZooKeeper is a distributed coordination service framework. Distributed? Coordination service? What does that even mean? 🤔

In fact, when explaining the concept of distribution, I’ve found that some people can’t clearly separate the two concepts of distributed systems and clusters. A while ago a student discussed distributed systems with me and said that distributed just means adding machines, right? One machine isn’t enough, so you add another to handle the load. Adding machines is certainly part of the picture, since a distributed system must involve multiple machines, but don’t forget that computer science has a similar concept, the cluster, and isn’t a cluster also just adding machines? Yet clusters and distributed systems are two very different things.

For example, suppose I have a flash-sale (seckill) service, but the concurrency is too high for a single machine to handle, so I add several servers that all provide the same flash-sale service. That is a cluster.

cluster

Now suppose I take a different approach. I split the flash-sale service into multiple sub-services, such as an order-creation service, a points service, a coupon-deduction service, and so on, and then deploy these sub-services on different servers. That is a distributed system.

distributed

So why do I push back on the claim that distributed just means adding machines? Because adding machines better describes building a cluster, where it really is just adding machines. With a distributed system, you first need to split the business, then add machines (so it is not merely adding machines), and on top of that you have to solve the problems that distribution brings.

For example: how to coordinate the distributed components, how to reduce coupling between the systems, how to handle distributed transactions, how to configure the whole distributed system, and so on. ZooKeeper was designed to address exactly these issues.

Consistency problem

Designing a distributed system is bound to run into a problem — partition tolerance necessarily requires a trade-off between system availability and data consistency. This is known as the CAP theorem.

In fact, it is very simple to understand. Treat a class as the whole system, and each student as an independent subsystem of it. Xiaohong and Xiaoming are secretly dating, and the class gossip Xiaohua finds out; thrilled, she tells the people around her, and the news starts spreading through the class. While it is still spreading, if you grab a classmate and ask about it and they don’t know yet, then the class system has a data inconsistency problem (because Xiaohua already knows the news). If instead they refuse to answer you until the whole class has finished spreading the message (everyone must know before the system can serve requests, in order to guarantee consistency), then the system has an availability problem.

The former is Eureka’s approach, which guarantees AP (availability); the latter is ZooKeeper’s approach, which guarantees CP (data consistency).

Consistency protocols and algorithms

To solve the data consistency problem, scientists and engineers have, through continuous exploration, come up with many consistency protocols and algorithms, such as 2PC (two-phase commit), 3PC (three-phase commit), and the Paxos algorithm.

Now think about a problem: if the students spread the news by passing notes, how do I know whether my note actually reached the person I wanted to pass it to? And what if some kid intercepts it and tampers with it?

This is where the Byzantine generals problem comes in. It says that consistency cannot be achieved by passing messages over unreliable channels, so a secure and reliable message channel is a necessary prerequisite for all of these consistency algorithms.

Why solve the data consistency problem at all? Think about it: if a flash-sale system splits its service into order placement and points crediting, and the two services are deployed on different machines, then if the points system goes down in the middle of the message exchange, you could end up placing an order without the points being added. You have to make sure the numbers match, right?

2PC (Two-Phase Commit)

Two-phase commit is a protocol for ensuring data consistency in distributed systems, and many databases use it to handle distributed transaction processing.

Before we get to 2PC, let’s think about what’s wrong with distributed transactions.

Take the flash-sale system’s order placement and points crediting again (I know, you’re probably sick of this example 🤮). After placing an order, we send a message to the points system telling it to add points. If we only send the message and don’t wait for a response, how does the order system know whether the points system received it? If we add a receive-and-reply step, so that the points system returns a response after receiving the message, but a network hiccup means the reply never arrives, should the order system assume the points system never got the message and roll back its transaction? In that case the points system actually did receive the message; it will process it and credit the user, and we end up with points added but no successful order.

So what we need to solve is the atomicity problem across all services in a distributed system: along the whole call chain, all of our data changes either succeed together or fail together.

In a two-phase commit, there are two main roles involved, the coordinator and the participant.

Phase 1: When a distributed transaction is to be executed, the transaction initiator first sends a request to the coordinator. The coordinator then sends a prepare request (containing the transaction content) to all participants, telling them: execute the transaction content I am sending you, but do not commit, and reply to me once you are done. After receiving the prepare message, each participant executes the transaction (but does not commit), writes Undo and Redo information to its transaction log, and then reports back to the coordinator whether it is ready.

Phase 2: In the second phase, the coordinator decides whether the transaction can be committed or rolled back according to the feedback of the participants.

For example, when all participants return a ready message, the transaction can be committed. The coordinator sends a Commit request to all participants; when a participant receives the Commit request, it commits the transaction it prepared earlier and, once the commit completes, sends a commit-success response back to the coordinator.

If not all participants returned a ready message in phase 1, the coordinator sends a rollback request to all participants. After receiving it, each participant rolls back the work it did in phase 1 and then returns the result to the coordinator. Finally, the coordinator collects the responses and returns a failure result to the transaction initiator.

2PC process

I think 2PC is a relatively weak solution, because it really only addresses the atomicity of each transaction while also introducing a number of problems.

  • Single point of failure: if the coordinator fails, the entire system becomes unavailable.
  • Blocking: when the coordinator sends a prepare request, a participant that can process it will execute the transaction but not commit it, so resources are held and not released. If the coordinator then dies, those resources are never released, which severely hurts performance.
  • Data inconsistency: in phase 2, if the coordinator crashes after sending only some of the commit requests, the participants that received the message commit the transaction while those that did not do not, leaving the data inconsistent.
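To make the two phases above concrete, here is a minimal, illustrative sketch of a 2PC coordinator. The `Participant` interface and all names are hypothetical, not any real library’s API; a production implementation would also need timeouts, retries, and durable logging.

```java
import java.util.List;

// Hypothetical participant interface; the names are illustrative only.
interface Participant {
    boolean prepare(String txId);  // execute the transaction, write undo/redo logs, do NOT commit
    void commit(String txId);      // commit the previously prepared transaction
    void rollback(String txId);    // undo the work done in the prepare phase
}

class TwoPhaseCommitCoordinator {
    private final List<Participant> participants;

    TwoPhaseCommitCoordinator(List<Participant> participants) {
        this.participants = participants;
    }

    /** Returns true if the distributed transaction committed, false if it was rolled back. */
    boolean execute(String txId) {
        // Phase 1: send prepare to every participant and collect the votes.
        boolean allReady = true;
        for (Participant p : participants) {
            if (!p.prepare(txId)) {   // any "no" vote (or, in practice, a timeout) aborts
                allReady = false;
                break;
            }
        }
        // Phase 2: commit if everyone was ready, otherwise roll everyone back.
        for (Participant p : participants) {
            if (allReady) {
                p.commit(txId);
            } else {
                p.rollback(txId);
            }
        }
        return allReady;
    }
}
```

Notice that if this coordinator crashes between the two loops, the participants are left holding prepared but uncommitted transactions, which is exactly the blocking and single-point-of-failure problem listed above.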

3PC (Three-Phase Commit)

Due to the series of problems with 2PC, such as the single point of failure and its fault-tolerance defects, 3PC (three-phase commit) was created. So what are the three phases?

By the way, don’t read the PC here as personal computer; it stands for phase commit.

  1. CanCommit phase: the coordinator sends a CanCommit request to all participants. After receiving it, each participant checks, based on its own state, whether it can execute the transaction. If it can, it returns YES and enters the prepared state; otherwise it returns NO.
  2. PreCommit phase: the coordinator decides whether the PreCommit operation can proceed based on the responses returned by the participants. If all participants returned YES, the coordinator sends a PreCommit (pre-commit) request to all of them. After receiving the pre-commit request, a participant executes the transaction, writes Undo and Redo information to its transaction log, and, if the execution succeeds, returns a success response to the coordinator. If in the first phase the coordinator received any NO, or did not receive responses from all participants within a certain time, the transaction is interrupted: the coordinator sends an abort request to all participants. A participant interrupts the transaction as soon as it receives the abort request, or if it has not received any request from the coordinator within a certain time.
  3. DoCommit phase: this phase is essentially the same as the second phase of 2PC. If the coordinator received YES responses from all participants in the PreCommit phase, it sends a DoCommit request to all participants. After receiving the DoCommit request, a participant commits the transaction and returns a response to the coordinator, and the coordinator completes the transaction once it has received a commit-success response from every participant. If the coordinator received a NO in the PreCommit phase, or did not receive responses from all participants within a certain time, it sends an abort request; after receiving it, each participant rolls back the transaction using the rollback log recorded above and reports the rollback status to the coordinator, who interrupts the transaction once it has received the participants’ replies.
3PC process

Above is a flow chart of 3PC in the success case. You can see that 3PC handles timeouts and interrupts in many places; for example, the coordinator interrupts the transaction if it has not received all acknowledgements within a specified time, which reduces the time spent blocked. Note also that in the 3PC DoCommit phase, if a participant does not receive the commit request from the coordinator, it will commit the transaction on its own after a certain time. Why do that? Because by this point every participant has already answered in the first phase that it can execute the transaction, so we have reason to believe the other systems are able to execute and commit it too; therefore, whether or not the coordinator’s message reaches a participant, a participant that has entered the third phase goes ahead and commits.

All in all, through its series of timeout mechanisms 3PC eases the blocking problem quite well, but the most important issue, consistency, is still not fundamentally solved. For example, in the PreCommit phase, if a participant crashes or becomes partitioned from the coordinator and the other participants after the request has been sent, the participants that did receive the message will still commit the transaction, which again leads to data inconsistency.

Therefore, to solve the consistency problem, we need to rely on the Paxos algorithm. ⭐️⭐️

Paxos algorithm

Paxos is a consistency algorithm based on message passing with a high degree of fault tolerance. It is recognized as one of the most effective algorithms for solving the distributed consistency problem, that is, how a distributed system can reach agreement on a certain value (a resolution).

There are three roles in Paxos: Proposer, Acceptor, and Learner. Like 2PC, the Paxos algorithm also has two phases: Prepare and Accept.

Prepare phase

  • Proposer (proposer): responsible for making proposals. Before submitting a proposal, each proposer first obtains a globally unique, increasing proposal number N, unique across the whole cluster, and assigns that number to the proposal it wants to make. In the first phase it sends only the proposal number to all the acceptors.
  • Acceptor (voter): after accepting a proposal, each acceptor records the proposal number N locally, so among the accepted proposals saved by each acceptor there is one with the largest number, call it maxN. An acceptor will only accept a proposal whose number is greater than its local maxN, and when it approves, it sends back the highest-numbered proposal it previously accepted as its response to the Proposer.

The following is a flowchart of the prepare phase for your reference.

Paxos Phase 1

Accept phase

After a Proposer has put forward a proposal, if it receives approval from more than half of the Acceptors (the Proposer itself counts as agreeing), it then sends the real proposal to the Acceptors, that is, it now sends both the proposal number and the proposal content.

On the Acceptor side, if the incoming proposal’s number is greater than or equal to the largest proposal number it has already approved, the Acceptor accepts the proposal and reports the result back to the Proposer. If not, it does not respond, or it returns NO.

Paxos Phase 2 1

If the Proposer receives acceptances from more than half of the Acceptors, it then asks all Acceptors to commit the proposal. Note that at this point more than half of the Acceptors have approved the proposal while the rest have not acted on it, so the proposal content and proposal number are sent to the Acceptors that have not yet approved it, for them to accept and commit unconditionally, while an Acceptor that already approved the proposal is sent only the proposal number and performs the commit.

Paxos Phase 2 2

If the Proposer does not receive acceptances from more than half of the Acceptors, it increments its proposal number and restarts from the Prepare phase.

There are many ways for a Learner to learn about the proposals accepted by the Acceptors; I won’t go into them here.
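To make the two phases a bit more tangible, here is a minimal sketch of the Acceptor side of single-value Paxos. The class and method names are my own invention, not any library’s API, and a real acceptor must persist this state to disk before replying.

```java
import java.util.Optional;

/** Illustrative single-value Paxos acceptor; names are mine, not from any library. */
class PaxosAcceptor {
    private long promisedN = -1;        // the largest proposal number we have promised (maxN)
    private long acceptedN = -1;        // number of the proposal we last accepted, if any
    private String acceptedValue = null;

    /** Phase 1: a proposer sends only its number n. */
    synchronized Optional<Promise> prepare(long n) {
        if (n > promisedN) {
            promisedN = n;
            // reply with the highest-numbered proposal we have already accepted (may be empty)
            return Optional.of(new Promise(acceptedN, acceptedValue));
        }
        return Optional.empty();        // n is not larger than maxN: ignore or reply NO
    }

    /** Phase 2: the proposer sends the number together with the value. */
    synchronized boolean accept(long n, String value) {
        if (n >= promisedN) {           // must not have promised a larger number in the meantime
            promisedN = n;
            acceptedN = n;
            acceptedValue = value;
            return true;
        }
        return false;
    }

    record Promise(long acceptedN, String acceptedValue) {}
}
```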

The Paxos algorithm’s infinite-loop problem

It is actually a bit like two people quarrelling: Xiaoming says he is right, Xiaohong says she is right, and neither will give in. 🤬

For example, proposer P1 completes the Prepare phase for its proposal M1, and then another proposer P2 completes the Prepare phase for a larger-numbered proposal M2. P1’s proposal can no longer be approved in phase 2 (because the acceptors have already approved M2, which is larger than M1), so P1 increments its proposal to M3 and gets the new M3 approved in the Prepare phase, after which M2 can no longer be approved; then M2’s proposer increments its number and re-enters the Prepare phase, and so on…

This endless stream of new proposals is the problem with the Paxos algorithm.

So how do we solve it? It is very simple: too many proposers lead to quarrels, so we simply allow only one node to make proposals.

Enter ZAB

ZooKeeper architecture

As an excellent, efficient and reliable distributed coordination framework, ZooKeeper does not directly use Paxos to solve the problem of distributed data consistency. Instead, it uses a specially designed consistency protocol called ZAB (ZooKeeper Atomic Broadcast), an atomic broadcast protocol that supports crash recovery well.

Zookeeper architecture

The three roles in ZAB

As with Paxos, before introducing ZAB, let’s first understand the three main roles in ZAB: Leader, Follower, and Observer.

  • Leader: the only handler of write requests in the cluster, and the one that can initiate votes (voting is also how write requests are processed).
  • Follower: can receive client requests; it handles read requests itself and forwards write requests to the Leader. Followers take part in the election process: they can vote and can be elected.
  • Observer: a Follower without the right to vote or to be elected.

There are also two modes defined for zkServer in the ZAB protocol, namely message broadcast and crash recovery.

Message broadcast mode

Put plainly, this is about how the ZAB protocol handles write requests. We said above that only the Leader can handle write requests, so don’t our Followers and Observers also need to update their data? Surely the data can’t be updated only on the Leader while the other roles never get the update?

So how do we keep data consistent across the whole cluster? What would you do?

The first step, of course, is for the Leader to broadcast the write request to the Followers and ask whether they agree to the update; if more than half of them agree, the update is applied on the Followers and Observers (just like Paxos). Of course that’s a bit abstract, so let me draw a picture.

Message broadcast

Hmm… 🤥 where did those two queues come from? The answer is that ZAB needs to keep the Followers and Observers in order. What does ordered mean? Suppose I have a write request A, and the Leader broadcasts request A; because only half the cluster needs to agree, one Follower F1 might not receive it due to network problems while the Leader goes on to broadcast another request B. Because of the network, F1 could actually receive request B first and request A afterwards. Processing the requests in a different order then produces different data, resulting in data inconsistency.

So on the Leader side, a queue is prepared for every other zkServer, and messages are sent first-in, first-out. Because the protocol communicates over TCP, the order in which messages are sent is preserved, and so is the order in which they are received.

In addition, ZAB defines a globally monotonically increasing transaction ID, the ZXID, which is a 64-bit long: the high 32 bits represent the epoch and the low 32 bits represent the transaction counter. The epoch changes whenever the Leader changes: when a Leader dies and a new Leader takes over, the epoch is bumped. The low 32 bits can simply be understood as an incrementing transaction ID.

The reason for defining this is, again, ordering: after each proposal is generated on the Leader, it must be sorted by its ZXID before it can be processed.
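Since the zxid is just a 64-bit long with the epoch in the high 32 bits and the counter in the low 32 bits, the two halves can be pulled apart with simple bit operations. A quick illustration (the example value is made up, and this is not ZooKeeper source code):

```java
public class ZxidDemo {
    public static void main(String[] args) {
        long zxid = 0x0000000500000007L;           // example value: epoch 5, counter 7

        long epoch = zxid >>> 32;                  // high 32 bits: changes with every new Leader
        long counter = zxid & 0xFFFFFFFFL;         // low 32 bits: increases with every proposal

        System.out.println("epoch = " + epoch + ", counter = " + counter); // epoch = 5, counter = 7
    }
}
```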

Crash recovery mode

When it comes to crash recovery, we first have to mention the Leader election algorithm in ZAB. When something in the cluster crashes, the biggest impact comes from the Leader crashing, because there is only one Leader, so when the Leader fails we are bound to re-elect one.

Leader election happens in two different situations: one is re-electing a Leader after the current Leader goes down, and the other is electing the very first Leader when ZooKeeper starts up. Let me first describe how ZAB conducts the initial election.

Assume our cluster has three machines, which means we need at least two (more than half) to agree. For example, when server1 starts, it first votes for itself; the vote consists of the server’s myid and ZXID. Since its ZXID is 0, server1 votes (1, 0). But server1 has only one vote at this point, so it cannot become the Leader; it is still in the election phase, and the whole cluster is in the Looking state.

Then server2 starts. It first votes for itself, (2, 0), and broadcasts its vote (server1 does the same, but there were no other servers around when it did). Server1 receives server2’s vote and compares it with its own: it compares the ZXIDs first, and the vote with the larger ZXID is preferred for Leader; if the ZXIDs are equal, it compares the myids, and the larger myid is preferred. Server1 therefore finds that server2 is more suitable to be the Leader, changes its own vote to (2, 0), and broadcasts it. After receiving it, server2 finds it does not need to change its vote, and its vote count has now passed half, so server2 is chosen as the Leader. Server1 also changes its state from Looking to Following (a Follower), and the whole cluster goes from Looking to normal operation.

When server3 starts and finds that the cluster is no longer in the Looking state, it simply joins the cluster as a Follower.

If server2 fails while the cluster is running, how does the cluster re-elect the Leader? It’s almost like an initial election.

First of all, there is no doubt that the remaining two followers will change their status from Following to Looking, and then each server will vote for itself in the same way it did in the initial poll.

Suppose server1 votes (1, 99) for itself and then broadcasts it to the other servers, and server3 first votes (3, 95) for itself and then broadcasts it. Server1 and server3 receive each other’s votes and, just as at the start of an election, compare the received votes with their own (the larger ZXID takes precedence, and if the ZXIDs are the same, the larger myid takes precedence). Server1 receives server3’s vote and finds it less suitable than its own, while server3 receives server1’s vote and finds it more suitable, so server3 changes its vote to (1, 99) and broadcasts it. In the end server1, having received more than half of the votes, sets itself as the Leader, and server3 becomes a Follower.
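The comparison rule used above (larger ZXID wins, and if the ZXIDs are equal the larger myid wins) boils down to a few lines. This is an illustrative sketch, not ZooKeeper’s actual FastLeaderElection code:

```java
public class VoteComparison {
    /** Returns true if the received vote (voteZxid, voteMyid) beats our current vote. */
    static boolean voteWins(long voteZxid, long voteMyid, long myZxid, long myMyid) {
        if (voteZxid != myZxid) {
            return voteZxid > myZxid;   // the server with the more up-to-date data wins
        }
        return voteMyid > myMyid;       // tie-break on the larger server id
    }

    public static void main(String[] args) {
        // server1 holds (1, 99) and receives server3's vote (3, 95): the received vote loses
        System.out.println(voteWins(95, 3, 99, 1));  // false, so server1 keeps its own vote
        // server3 holds (3, 95) and receives server1's vote (1, 99): the received vote wins
        System.out.println(voteWins(99, 1, 95, 3));  // true, so server3 switches to (1, 99)
    }
}
```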

Notice why ZooKeeper uses an odd number of nodes. With three servers, we can still work normally if one goes down, but not if two go down (fewer than half of the nodes remain, so voting and other operations are impossible). Now suppose we have four servers: we can still tolerate only one failure, because two failures again leave us without a majority. So four servers give the same fault tolerance as three while costing one more machine, which is why ZooKeeper recommends an odd number of servers.

So what is crash recovery after the Leader election in ZAB?

In fact, the main question is: when a machine in the cluster goes down, how do we ensure the data consistency of the whole cluster?

If only Followers go down, and fewer than half of them, there is nothing to worry about: as mentioned at the beginning, the Leader maintains a queue for each server, so we don’t have to fear data inconsistency caused by a server missing subsequent messages.

If the Leader fails, we are in trouble. We definitely need to pause the service, switch to the Looking state, and re-elect a Leader (as described above). Afterwards there are two situations to handle: we must ensure that proposals already committed by the old Leader are eventually committed by all Followers, and we must skip proposals that have to be discarded.

What does it mean to ensure that a proposal committed by the Leader is eventually committed by all Followers?

Suppose the Leader (server2) sends a commit request (set aside the message broadcast mode above for a moment), sends it to server3, and crashes just as it is about to send it to server1. If server1 is elected Leader during the re-election, data inconsistency is guaranteed: server3 will commit the request it received from server2, while server1, never having received the commit request, will discard it.

Crash recovery

So what’s the solution?

Sharp-eyed readers will object that server1 cannot become the Leader in this situation, because server1 and server3 compare their ZXIDs when voting, and server3’s ZXID is necessarily larger than server1’s. (If that is unclear, revisit the election algorithm above.)

So what does it mean to skip proposals that have been discarded?

Suppose the Leader (server2) approves a proposal N1 and commits the transaction on its own side, but crashes before its commit request reaches any Follower. A Leader election must then be held; say server1 becomes the Leader (it doesn’t matter which). After a while the crashed old Leader recovers and rejoins the cluster as a Follower. The point is that server2 had approved proposal N1, but no other server ever received its commit message, so the other servers cannot possibly commit N1; keeping it would cause data inconsistency, so proposal N1 ultimately has to be discarded.

Crash recovery

A few pieces of ZooKeeper theory

Understanding the ZAB protocol alone is not enough; it is only how ZooKeeper works internally. How do we actually implement some typical application scenarios with ZooKeeper, such as cluster management, distributed locks, and Master election?

That is a question of how to use ZooKeeper, but before we can use it there are a few concepts we need to understand, such as ZooKeeper’s data model, the session mechanism, ACLs, and the Watcher mechanism.

The data model

ZooKeeper’s data storage structure is very similar to a standard Unix file system, with many child nodes (a tree) hanging off a root node. However, ZooKeeper does not have the file system’s concepts of directories and files; instead, it uses the ZNode as its data node. A ZNode is the smallest data unit in ZooKeeper. Each ZNode can both store data and have child nodes mounted under it, forming a tree-shaped namespace.

Zk data model

Each ZNode has its own node type and node state.

The node types can be divided into persistent nodes, persistent sequential nodes, temporary (ephemeral) nodes, and temporary sequential nodes (a small creation sketch follows the list below).

  • Persistent node: Once created, it persists until it is removed.
  • Persistent sequential nodes: A parent node can maintain a creation sequence for its children, as reflected in the node name, which is automatically followed by a 10-digit string counting from 0.
  • Temporary node: The life cycle of a temporary node is bound to the client session. If the session disappears, the node disappears. Temporary nodes can only be leaf nodes and cannot create child nodes.
  • Temporary sequential node: a temporary node whose parent maintains the creation order of its children (just like the persistent sequential node above).
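In the ZooKeeper Java client, the four node types correspond to the four `CreateMode` constants. A minimal sketch, assuming a local server and an already existing `/app` parent node (both are placeholders):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeTypesDemo {
    public static void main(String[] args) throws Exception {
        // placeholder connection string; 5000 ms session timeout, no default watcher
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, null);
        byte[] data = "hello".getBytes();

        // persistent node: stays until explicitly deleted (assumes /app already exists)
        zk.create("/app/config", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // persistent sequential node: a 10-digit counter is appended, e.g. /app/task0000000003
        zk.create("/app/task", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);

        // temporary (ephemeral) node: removed automatically when this session ends; no children allowed
        zk.create("/app/worker", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // temporary sequential node: combination of the two, the building block for locks
        zk.create("/app/lock-", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        zk.close();
    }
}
```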

The node status contains many node attributes, such as czxid and mzxid, and ZooKeeper uses the Stat class to maintain node status. Here are some of the attributes (a small read sketch follows the list):

  • czxid: Created ZXID, the transaction ID with which the node was created.
  • mzxid: Modified ZXID, the transaction ID of the node’s most recent update.
  • ctime: Created Time, the time when the node was created.
  • mtime: Modified Time, the time when the node was last modified.
  • version: the data version number of the node.
  • cversion: the version number of the node’s child list.
  • aversion: the version number of the node’s ACL.
  • ephemeralOwner: the sessionID of the session that created the node; 0 if the node is persistent.
  • dataLength: the length of the node’s data.
  • numChildren: the number of children of the node; 0 if it is a temporary node.
  • pzxid: the transaction ID of the last change to the node’s child list. Note that this refers to the child list, not the children’s content.
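All of these fields live in the `org.apache.zookeeper.data.Stat` object returned by calls such as `exists` and `getData`. A quick illustrative read, with a placeholder path:

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class StatDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, null);

        Stat stat = new Stat();
        byte[] data = zk.getData("/app/config", false, stat);  // fills the Stat as a side effect

        System.out.println("data bytes      = " + data.length);
        System.out.println("czxid           = " + stat.getCzxid());          // zxid that created the node
        System.out.println("mzxid           = " + stat.getMzxid());          // zxid of the last update
        System.out.println("version         = " + stat.getVersion());        // data version number
        System.out.println("ephemeralOwner  = " + stat.getEphemeralOwner()); // 0 for persistent nodes
        System.out.println("numChildren     = " + stat.getNumChildren());

        zk.close();
    }
}
```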

The session

I’m sure backend developers are no strangers to sessions. In ZooKeeper, however, the session is maintained between client and server over a long-lived TCP connection; you can simply think of the session as the state of that connection.

In ZooKeeper, sessions have corresponding events, such as CONNECTION_LOSS, SESSION_MOVED, and SESSION_EXPIRED.

ACL

ACL stands for Access Control Lists, a form of permission control. ZooKeeper defines five permissions (a small sketch follows the list):

  • CREATE: Permission to create child nodes.
  • READ: Permission to obtain node data and child node list.
  • WRITE: Permission to update node data.
  • DELETE: Permission to delete child nodes.
  • ADMIN: permission to set the node’s ACL.
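In the Java client these five permissions map to the `ZooDefs.Perms` bit flags, and a list of `ACL` entries is passed to `create`. A minimal sketch using the built-in world scheme; the path and permission choice are just for illustration:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class AclDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, null);

        // world:anyone may READ, but nothing else (combine flags with | for more rights)
        ACL readOnly = new ACL(ZooDefs.Perms.READ, new Id("world", "anyone"));
        List<ACL> acl = Arrays.asList(readOnly);

        zk.create("/app/readonly", "data".getBytes(), acl, CreateMode.PERSISTENT);

        // the predefined list ZooDefs.Ids.OPEN_ACL_UNSAFE grants ALL permissions to everyone
        zk.close();
    }
}
```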

Watcher mechanism

A Watcher is an event listener; it is a very important feature of ZooKeeper and many functions depend on it. It works like a subscription: the client registers a Watcher of interest with the server, and when the server hits an event or condition the Watcher cares about, it sends an event notification to the client. After receiving the notification, the client looks up its Watcher and runs the corresponding callback method.

Watcher mechanism
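A minimal sketch of registering a Watcher and reacting to its notification. Note that ZooKeeper watches are one-shot, so the callback re-registers itself; the connection string and path are placeholders:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatcherDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, event -> {});

        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println("event " + event.getType() + " on " + event.getPath());
                if (event.getPath() == null) {
                    return;                         // session-level event, nothing to re-watch
                }
                try {
                    // watches are one-shot: re-register to keep receiving notifications
                    zk.getData(event.getPath(), this, null);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        // register the watcher: we will be notified when /app/config's data changes
        zk.getData("/app/config", watcher, null);

        Thread.sleep(Long.MAX_VALUE);               // keep the demo process alive
    }
}
```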

Typical application scenarios of Zookeeper

With all this theoretical knowledge, you may be confused. What is the use of these things? What can you do? Don’t worry. Just listen to me.

Master election

Remember the temporary nodes we talked about above? Because of its consistency guarantees, ZooKeeper can ensure that node creation is globally unique even under high concurrency (that is, the same node cannot be created twice).

Using this feature, we can have multiple clients try to create one specific node, and whichever succeeds becomes the master.

But what if the master dies?

Why do you think we create a temporary node? Remember the life cycle of temporary nodes? If the master dies, its session is broken, and when the session is broken the node disappears. And remember Watchers? If the list of child nodes changes, it means the master has died, and we can trigger a callback to elect a new one. Alternatively, we can watch the master node’s status directly and tell whether the master has died by whether the node has lost its connection, and so on.

Master election

In general, we can combine temporary nodes, node status and Watchers to implement master election: the temporary node is used mainly for the election itself, while node status and Watchers are used to judge whether the master is alive and to trigger re-election.
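A minimal master-election sketch built on exactly those three pieces (a temporary node, the creation conflict, and a Watcher). All paths and names are illustrative, and a production version would also handle session expiry and non-delete events:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MasterElection {
    private final ZooKeeper zk;
    private final String myId;

    MasterElection(ZooKeeper zk, String myId) {
        this.zk = zk;
        this.myId = myId;
    }

    void elect() throws Exception {
        try {
            // only one client can create the ephemeral node: the winner is the master
            zk.create("/master", myId.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println(myId + " is now the master");
        } catch (KeeperException.NodeExistsException e) {
            // someone else won: watch the node and re-run the election when it disappears
            zk.exists("/master", event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    try {
                        elect();
                    } catch (Exception ex) {
                        ex.printStackTrace();
                    }
                }
            });
            System.out.println(myId + " is a standby");
        }
    }
}
```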

Distributed locks

Distributed locks can be implemented in many ways, for example with Redis, with a database, or with ZooKeeper. Personally, I think ZooKeeper makes implementing a distributed lock very, very simple.

We have already mentioned that ZooKeeper guarantees globally unique node creation even under high concurrency, so it is easy to see what that gives us: a mutex. And because it works across machines, it gives us a distributed lock.

How do we do that? It is basically the same as master election: we again rely on creating a temporary node.

The first question is how to acquire the lock. Because node creation is unique, we can let multiple clients try to create the same temporary node at the same time, and the one that succeeds has acquired the lock. The clients that did not acquire the lock register a Watcher to monitor the node’s status; when the mutex is released (either because the client holding the lock crashed, or because it released the lock voluntarily), a callback fires and they can try to acquire the lock again.

With ZooKeeper we do not need to worry about locks never being released the way we do with Redis, because when the client holding the lock crashes, the node disappears and the lock is released. Isn’t that easy?

Can you use ZooKeeper to implement both shared and exclusive locks? The answer is yes, but it’s a little more complicated.

Remember sequential nodes?

Now I stipulate that all lock requests create sequential nodes. For a read request (acquiring a shared lock): if there is no node with a smaller sequence number, or all the smaller nodes are also read requests, the client can acquire the read lock and start reading. If there is a write request among the nodes smaller than the client’s, the client cannot acquire the read lock and must wait for the earlier write request to complete.

For a write request (acquiring an exclusive lock): if there is no node smaller than its own, the client can acquire the write lock directly and modify the data. If there is any node smaller than the client’s, whether a read or a write, the client cannot acquire the write lock and must wait for all earlier operations to complete.

This implements shared and exclusive locks nicely, but there is an optimization to make: if releasing a lock notifies all waiting clients, we get a herd effect. This can be avoided by having each waiting node listen only on the node in front of it.

So how do we do that concretely? A write request only needs to listen on the last node with a smaller sequence number than its own.
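A minimal sketch of the "watch only the node in front of you" idea for a plain exclusive lock (shared locks would add the read/write rules above). The `/lock` parent path is a placeholder and assumed to exist, and production code would also handle session expiry and reconnects:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkMutex {
    private final ZooKeeper zk;
    private String myNode;                          // e.g. /lock/node-0000000005

    ZkMutex(ZooKeeper zk) { this.zk = zk; }

    void lock() throws Exception {
        // every contender creates an ephemeral sequential child under /lock
        myNode = zk.create("/lock/node-", new byte[0],
                           ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren("/lock", false);
            Collections.sort(children);             // fixed-width counters sort lexicographically
            String myName = myNode.substring("/lock/".length());
            int myIndex = children.indexOf(myName);
            if (myIndex == 0) {
                return;                             // lowest sequence number: we hold the lock
            }
            // watch only the node directly in front of us to avoid the herd effect
            String previous = "/lock/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(previous, event -> latch.countDown()) != null) {
                latch.await();                      // wait until the predecessor goes away
            }
        }
    }

    void unlock() throws Exception {
        zk.delete(myNode, -1);                      // deleting our node wakes up the next waiter
    }
}
```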

The naming service

The problem with UUIDs is that they are too long… (longer is not always better, heh). So, if conditions permit, can we use ZooKeeper to implement a naming service instead?

As mentioned earlier, ZooKeeper stores its data nodes in a tree structure, which means the full path of every node is unique, so we can use a node’s full path as a name. More importantly, the path can be defined by us, which makes it easier to give understandable, semantic IDs to things.

Cluster management and registry

Zookeeper is too powerful. How can it be so capable?

Don’t worry, it can do even more. For instance, we may need to know how many machines in the whole cluster are working, collect runtime state data from each machine in the cluster, take machines in the cluster offline, and so on.

ZooKeeper naturally supports these requirements through Watchers and temporary nodes. We can create a temporary node for each machine and monitor its parent node; if the list of child nodes changes (because a temporary node was created or deleted), the Watcher bound to the parent node gives us state monitoring and callbacks.

Cluster management

As for the registry, it is also very simple. Each service provider creates a temporary node in ZooKeeper and writes its IP, port and invocation method into the node. When a service consumer needs to make a call, it looks up the address list of the corresponding service (IP, port and so on) through the registry. When the consumer actually invokes the service, it does not go through the registry; instead, it picks a provider’s server from the address list via a load-balancing algorithm and calls it directly.

When a provider’s server goes down or is taken offline, its address is removed from the provider address list. At the same time, the registry sends the new address list to the consumer’s machine, which caches it locally (of course you can also have the consumer watch the nodes itself; as I recall, Eureka relies on trying, failing over, and then refreshing its list).
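The consumer side of such a registry is essentially getChildren plus a Watcher. A minimal illustrative sketch, where the `/services` path and the "ip:port" data format are my own assumptions:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.ZooKeeper;

public class ServiceDiscovery {
    private final ZooKeeper zk;
    private volatile List<String> addresses = new ArrayList<>();   // local cache of provider addresses

    ServiceDiscovery(ZooKeeper zk) { this.zk = zk; }

    /** Refresh the address list for a service and keep watching for provider changes. */
    void watchService(String service) throws Exception {
        String path = "/services/" + service;
        List<String> children = zk.getChildren(path, event -> {
            try {
                watchService(service);          // child list changed: refresh and re-watch
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        List<String> fresh = new ArrayList<>();
        for (String child : children) {
            // each ephemeral child stores "ip:port" written by a provider when it registered
            fresh.add(new String(zk.getData(path + "/" + child, false, null)));
        }
        addresses = fresh;                      // consumers load-balance over this cached list
    }
}
```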

The registry

Conclusion

Anyone who has read this far is really patient 👍👍👍. If you think this was well written, give it a like.

I wonder whether you still remember what I said 😒.

In this article, I walked you through ZooKeeper, a powerful distributed coordination framework. Now let’s briefly recap the content of the whole article.

  • The difference between distributed and clustered

  • The principles of the consistency protocols and algorithms 2PC, 3PC and Paxos.

  • ZooKeeper’s own consistency protocol, the ZAB atomic broadcast protocol (Leader election, crash recovery, message broadcast).

  • Basic zooKeeper concepts such as ACLs, data nodes, sessions, watcher mechanisms, etc.

  • Typical application scenarios of ZooKeeper, such as master selection and registry.

    If you have forgotten some of it, go back and work through it again. If you have any questions or suggestions, feel free to send them my way 🤝🤝🤝.