Big data technology has gradually become a required subject for many programmers, and many technical forums have appeared where it is discussed. Today Good Programmers shares how ZooKeeper is used for cluster management and master election in big data systems, so we can learn together!


1. Monitor cluster machines


This is typically used in scenarios with high requirements on the state of the machines in the cluster and on machine availability, where changes to the cluster machines must be responded to quickly. In such scenarios there is usually a monitoring system that checks in real time whether each cluster machine is alive. The traditional approach is for the monitoring system to check each machine periodically by some means (such as ping), or for each machine to report periodically to the monitoring system that it is alive. This works, but it has two obvious problems:


When the machines in the cluster change, many things have to be modified.


There is a delay.


ZooKeeper has two features that make it possible to build a different kind of real-time liveness monitoring for cluster machines:


A client registers a Watcher on node x; if the child nodes of x change, the client is notified.


A node created with the EPHEMERAL type disappears as soon as the session between the client and the server ends or expires.


For example, the monitoring system registers a Watcher on the /clusterServers node, and every machine that is added dynamically creates an EPHEMERAL node under /clusterServers: /clusterServers/{hostname}. In this way the monitoring system learns in real time when machines are added or removed; what happens next is up to the monitoring system's own business logic.
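When the Watcher fires, the monitoring system typically re-reads the children of /clusterServers and compares them with the previous list. The following is a minimal sketch of that comparison step only. It does not talk to ZooKeeper; the names `diff_members` and `ClusterMonitor` are hypothetical, and the child list is assumed to be passed in by whatever client library is used.

```python
def diff_members(previous, current):
    """Return (joined, left) host names between two child-list snapshots."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


class ClusterMonitor:
    """Keeps the last known child list of /clusterServers and reports changes."""

    def __init__(self):
        self.members = []

    def on_children_changed(self, children):
        # Assumption: a real client would call this from its Watcher callback
        # with the freshly read child list of /clusterServers.
        joined, left = diff_members(self.members, children)
        self.members = list(children)
        return joined, left
```

A machine whose session ends simply shows up in the `left` list on the next notification, because its EPHEMERAL node is no longer among the children.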


2. Master election


In a distributed environment, the same business application is deployed on different machines. Some business logic (for example time-consuming computation or network I/O) often only needs to be executed by one machine in the cluster, and the other machines can share the result. This greatly reduces duplicated work and improves performance, so master election is the main problem encountered in this scenario.


ZooKeeper's strong consistency guarantees that node creation is globally unique even in a distributed, highly concurrent environment. That is, when multiple clients request to create the /currentMaster node at the same time, only one of the requests succeeds. With this feature, master election for a cluster in a distributed environment becomes easy.
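The "only one create succeeds" behavior can be illustrated with a toy in-memory model. This is a sketch for intuition only: `ToyNodeStore` is a hypothetical stand-in that uses a local lock, not the ZooKeeper API, and it ignores sessions, networking and replication.

```python
import threading


class ToyNodeStore:
    """In-memory stand-in for a ZooKeeper server (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}

    def create(self, path, data):
        # Atomic check-and-set: the first caller wins, later callers get False.
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = data
            return True

    def get(self, path):
        with self._lock:
            return self._nodes.get(path)


def run_election(store, client_ids):
    """Every client tries to create /currentMaster; return the ids that succeeded."""
    winners = []
    winners_lock = threading.Lock()

    def campaign(client_id):
        if store.create("/currentMaster", client_id):
            with winners_lock:
                winners.append(client_id)

    threads = [threading.Thread(target=campaign, args=(cid,)) for cid in client_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

However many clients campaign at once, `run_election` returns exactly one winner, and that client's id is what ends up stored at /currentMaster.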


A further evolution of this scenario is dynamic Master election, which relies on the EPHEMERAL_SEQUENTIAL node type.


As mentioned above, of all the clients' create requests, only one succeeds. With a slight variation, every request is allowed to succeed, but the created nodes are ordered, so that the requests end up on ZK as:

/currentMaster/{sessionId}-1, /currentMaster/{sessionId}-2, /currentMaster/{sessionId}-3, ... Each time, the machine with the smallest sequence number is chosen as the Master. If that machine dies, the node it created disappears immediately, so the machine with the next-smallest sequence number becomes the Master.
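The selection rule itself is small enough to sketch. The code below assumes the child names follow the {sessionId}-N pattern used above and that the children list has already been read from /currentMaster by some client; the function names `sequence_number` and `pick_master` are hypothetical.

```python
def sequence_number(node_name):
    """Extract the trailing sequence number from a name like '{sessionId}-3'."""
    # rpartition splits on the LAST hyphen, so session ids may contain hyphens.
    _, _, suffix = node_name.rpartition("-")
    return int(suffix)


def pick_master(children):
    """Return the child with the smallest sequence number, or None if empty."""
    if not children:
        return None
    # Compare numerically so that 10 sorts after 2.
    return min(children, key=sequence_number)
```

When the current Master's node disappears, calling `pick_master` again on the remaining children yields the next Master without any extra coordination.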


3. Search systems


In a search system, if every machine in the cluster generates the full index, this is not only time-consuming but also cannot guarantee that the index data is consistent across machines. So the Master in the cluster generates the full index and then synchronizes it to the other machines. In addition, as a disaster recovery measure for Master election, the Master can be specified manually at any time: when the application cannot obtain the Master information from ZK, it can obtain it from another place, for example over HTTP.
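One way to read the manual-override measure is as a fallback lookup: ask ZK first and use a second source only when that yields nothing. The sketch below makes that assumption explicit. Both lookups are passed in as plain callables, so `zk_lookup` and `fallback_lookup` are hypothetical placeholders for a real ZK read and a real HTTP request.

```python
def resolve_master(zk_lookup, fallback_lookup):
    """Return the Master address from ZK, or from the fallback source if ZK fails."""
    try:
        master = zk_lookup()
    except Exception:
        # Assumption: any error from the ZK lookup means "Master unknown".
        master = None
    if master:
        return master
    # ZK gave no answer, so use the manually maintained source (e.g. HTTP).
    return fallback_lookup()
```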


In HBase, ZooKeeper is also used to implement dynamic HMaster election. The HBase implementation stores the address of the ROOT table and the address of the HMaster on ZK, and each HRegionServer registers itself in ZooKeeper as an EPHEMERAL node. In this way, the HMaster can sense the liveness of every HRegionServer at any time. In addition, once the HMaster fails, another HMaster is elected to take over, which avoids the HMaster being a single point of failure.


To learn big data development, you can refer to the big data learning route provided by Good Programmers, which offers a complete knowledge system for big data development, including Linux and the Hadoop ecosystem, big data computing frameworks, cloud computing, and machine learning and deep learning.