1. What

  • Many components in the big data ecosystem are named after animals or insects, such as Hadoop (🐘) and Hive (🐝). ZooKeeper, as its name suggests, is the zookeeper that manages the components of this ecosystem.

2. Where

  • ZooKeeper is a classic solution for distributed data consistency. It provides distributed applications with a coordination and storage service that offers high performance, high availability, and strictly ordered access control.
  • Typical application scenarios include configuration maintenance, distributed locking, cluster management, and generation of distributed unique IDs.

2.1 Maintaining Configuration Information

  • Traditionally, configuration information is kept in configuration files. In a distributed system, however, the same configuration must stay consistent on every server, and changes to configuration items must be applied efficiently, quickly, and reliably without breaking that consistency.
  • ZooKeeper provides such a service and uses the Zab consistency protocol to keep the data consistent.
  • Many open source projects use ZooKeeper to maintain configuration. In HBase, the client connects to ZooKeeper and obtains the necessary HBase cluster configuration before performing further operations. Kafka, the open source message queue, uses ZooKeeper to maintain broker information. Dubbo, Alibaba’s open source SOA framework, also makes extensive use of ZooKeeper to manage configuration for service governance.
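
  • As a rough illustration of how a server can learn about a configuration change, the sketch below models a single configuration node in plain Python. It is not a ZooKeeper client: the class ConfigNode, the callback on_change, and the "timeout=..." values are hypothetical names chosen for the example. It assumes the behavior of ZooKeeper's standard watches, which fire once and must be registered again after each notification.

```python
class ConfigNode:
    """In-memory stand-in for one configuration znode (not a real ZooKeeper client)."""

    def __init__(self, data):
        self.data = data
        self._watchers = []  # callbacks waiting for the next change

    def get(self, watch=None):
        """Return the current value; optionally register a one-time watch."""
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set(self, data):
        """Store a new value and fire every pending watch exactly once."""
        self.data = data
        pending, self._watchers = self._watchers, []
        for callback in pending:
            callback(data)


node = ConfigNode("timeout=30")
seen = []  # values the hypothetical server has been notified about


def on_change(value):
    seen.append(value)


node.get(watch=on_change)  # read the value and ask to be notified once
node.set("timeout=60")     # the watch fires
node.set("timeout=90")     # no watch is registered any more, so nothing fires
print(seen)
```

  • In this model a server that wants to keep following the configuration would call get with a watch again inside its callback.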

2.2 Distributed Lock Service

  • A cluster is a distributed system in which multiple servers provide the same service and sometimes need to coordinate their progress: while one machine is performing an operation, the other machines must not perform it. Locks solve this problem between threads on a single machine, and distributed systems need an equivalent lock that works across machines.
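
  • The text above does not say how such a lock is built. One commonly used approach with ZooKeeper has each contender create an ephemeral sequential node under a lock directory, and the contender with the lowest sequence number holds the lock. The sketch below imitates that idea in memory only; LockQueue and the server names are hypothetical, and no real ZooKeeper calls are made.

```python
import itertools


class LockQueue:
    """In-memory stand-in for a lock directory (not a real ZooKeeper client)."""

    def __init__(self):
        self._counter = itertools.count()  # mimics the sequence number ZooKeeper appends
        self._waiting = {}                 # sequence number -> client name

    def enqueue(self, client):
        """Mimic creating an ephemeral sequential child; return its sequence number."""
        seq = next(self._counter)
        self._waiting[seq] = client
        return seq

    def holder(self):
        """The client whose node has the lowest sequence number holds the lock."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        """Mimic deleting the node, explicitly or because the session ended."""
        self._waiting.pop(seq, None)


queue = LockQueue()
a = queue.enqueue("server-a")
b = queue.enqueue("server-b")
first_holder = queue.holder()   # server-a arrived first
queue.release(a)
second_holder = queue.holder()  # the lock passes to server-b
print(first_holder, second_holder)
```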

2.3 Cluster Management

  • When servers are added to a cluster, or removed from it because of hardware, software, or network faults, ZooKeeper notifies the other servers of the change so that storage, computing, and other tasks can be reallocated in time.
  • ZooKeeper also helps detect faulty servers so that recovery can be attempted.
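
  • A minimal in-memory sketch of the membership idea, assuming each server registers itself with an ephemeral node that disappears when its session ends. ClusterView, the session ids, and the server names are hypothetical and only model the behavior; they are not part of any ZooKeeper API.

```python
class ClusterView:
    """In-memory model of a membership directory (not a real ZooKeeper client)."""

    def __init__(self):
        self._members = {}  # server name -> id of the session that registered it

    def join(self, session_id, server):
        """Mimic creating an ephemeral child node for this server."""
        self._members[server] = session_id

    def session_ended(self, session_id):
        """Mimic the removal of every ephemeral node owned by the session."""
        self._members = {
            server: sid for server, sid in self._members.items() if sid != session_id
        }

    def members(self):
        return sorted(self._members)


view = ClusterView()
view.join(session_id=1, server="server-a")
view.join(session_id=2, server="server-b")
before = view.members()
view.session_ended(2)  # server-b crashed or lost its connection
after = view.members()
print(before, after)
```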

2.4 Generating Distributed Unique IDs

  • In a distributed environment, for example when records are spread across several databases, a per-database auto_increment column cannot by itself produce an ID that is unique across all of them. ZooKeeper can be used to generate globally unique IDs in such an environment.
  • Each time a new ID is needed, a persistent sequential node is created, and the sequence number of the node returned by the create operation is the new ID.
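
  • The sketch below imitates this scheme without a server. It assumes ZooKeeper's documented naming rule for sequential nodes, a 10-digit zero-padded counter appended to the requested path; the class SequentialParent, the helper id_from_node, and the prefix /ids/id- are hypothetical.

```python
class SequentialParent:
    """In-memory stand-in for a parent znode that hands out sequence numbers."""

    def __init__(self):
        self._next = 0

    def create_sequential(self, prefix):
        """Mimic creating a persistent sequential node; return the full node name."""
        name = f"{prefix}{self._next:010d}"
        self._next += 1
        return name


def id_from_node(name, prefix):
    """Extract the numeric ID from the node name returned by the create call."""
    return int(name[len(prefix):])


PREFIX = "/ids/id-"
parent_node = SequentialParent()
names = [parent_node.create_sequential(PREFIX) for _ in range(3)]
ids = [id_from_node(name, PREFIX) for name in names]
print(names[0], ids)
```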

3. Design objectives of ZooKeeper

  • ZooKeeper aims to provide distributed applications with a coordination service that offers high performance, high availability, and strictly ordered access control.

3.1 High performance

  • ZooKeeper keeps the full data set in memory and serves all non-transactional (read) requests from clients directly from it, which makes it especially well suited to read-heavy applications.

3.2 High availability

  • ZooKeeper provides its service as a cluster. Typically, three to five ZooKeeper servers form a usable cluster. Each server keeps the current server state in memory, and the servers communicate with one another. As long as more than half of the machines in the cluster are working properly, the cluster can serve requests normally.
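
  • The "more than half" rule can be written as a small calculation. The function names below are hypothetical; the arithmetic follows directly from the rule stated above.

```python
def quorum_size(cluster_size):
    """Smallest number of working servers that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many servers may fail while the cluster keeps serving requests."""
    return cluster_size - quorum_size(cluster_size)


def is_available(cluster_size, working):
    """True if more than half of the servers are working."""
    return working >= quorum_size(cluster_size)


for size in (3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

  • A four-server cluster tolerates one failure, the same as a three-server cluster, which is one reason odd cluster sizes are the usual choice.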

3.3 Strictly sequential access

  • For each update request from a client, ZooKeeper assigns a globally unique, monotonically increasing number that reflects the order in which all transactions are executed.
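
  • The text above does not describe how this number is laid out. In ZooKeeper it is the 64-bit transaction ID (zxid), whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter. The helpers below are hypothetical names that only illustrate why comparing two such numbers gives the transaction order.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into one 64-bit number."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


earlier = make_zxid(epoch=1, counter=5)
later_same_epoch = make_zxid(epoch=1, counter=6)
new_epoch = make_zxid(epoch=2, counter=0)

# Plain integer comparison orders transactions within and across epochs.
print(earlier < later_same_epoch < new_epoch)
print(split_zxid(earlier))
```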

4. Data model

  • ZooKeeper's data model can be regarded as a tree. Each node in the tree is called a znode, and a znode can have multiple child nodes.
  • A znode is located by its path, for example /ns-1/itcast/mysql/in/table1. Each path component is one level deeper than the one before it: ns-1 is the top-level node, itcast is at level 2, mysql at level 3, and so on. ns-1 is the parent of itcast, itcast is the child of ns-1 and the parent of mysql, mysql is the child of itcast, and so on.
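
  • The parent and child relationships follow directly from the path string. The two helpers below are hypothetical, plain-Python functions (not part of any ZooKeeper client) that show how a path maps onto levels of the tree.

```python
def ancestors(path):
    """Return the path of every node from the top level down to the node itself."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def parent(path):
    """Return the parent path; the parent of a top-level node is the root '/'."""
    head = path.rsplit("/", 1)[0]
    return head or "/"


chain = ancestors("/ns-1/itcast/mysql")
print(chain)
print(parent("/ns-1/itcast"))
print(parent("/ns-1"))
```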

4.1 Znode

  • Node type: ephemeral (temporary) node or persistent node. The node type is determined at creation time and cannot be changed.
    • Ephemeral node: The life cycle of the node depends on the session that created it. When the session ends, the node is deleted automatically; it can also be deleted manually. Although each ephemeral znode is bound to one client session, it is visible to all clients. In addition, ephemeral nodes are not allowed to have child nodes.
    • Persistent node: The lifetime of the node is independent of the session; it is deleted only when a client explicitly performs a delete operation.
  • A znode consists of several parts:
    • Node data: the data stored in the znode
    • Children: the child nodes of the znode
    • Node status (stat): describes the creation and modification history of the node
      • cZxid: transaction ID of the change that created the node
      • ctime: time when the node was created
      • mZxid: transaction ID of the change that last updated the node
      • mtime: time when the node was last updated
      • pZxid: transaction ID of the last change to the children of the node
      • cversion: number of changes to the children of the node
      • dataVersion: number of changes to the data of the node
      • aclVersion: number of changes to the ACL (permissions) of the node
      • ephemeralOwner: if the node is ephemeral, the session ID of the session that created it; if the node is persistent, the value is 0
      • dataLength: length of the data content
      • numChildren: number of children of the node
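
  • To make the stat fields concrete, the sketch below mirrors them in a Python dataclass and shows which fields a data update touches. The snake_case field names, the class Stat, and the helper record_data_update are hypothetical; only the meaning of each field is taken from the list above.

```python
from dataclasses import dataclass


@dataclass
class Stat:
    """Plain-Python mirror of the stat fields listed above."""
    czxid: int
    ctime: int
    mzxid: int
    mtime: int
    pzxid: int
    cversion: int = 0
    data_version: int = 0
    acl_version: int = 0
    ephemeral_owner: int = 0  # 0 means the node is persistent
    data_length: int = 0
    num_children: int = 0

    @property
    def is_ephemeral(self):
        return self.ephemeral_owner != 0


def record_data_update(stat, new_data, zxid, now_ms):
    """Update the fields that a successful data write changes."""
    stat.mzxid = zxid
    stat.mtime = now_ms
    stat.data_version += 1
    stat.data_length = len(new_data)


stat = Stat(czxid=10, ctime=1000, mzxid=10, mtime=1000, pzxid=10)
record_data_update(stat, b"hello", zxid=11, now_ms=2000)
print(stat.data_version, stat.mzxid, stat.data_length, stat.is_ephemeral)
```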

5. How

5.1 Single-node Installation

  • First, you need a JDK 1.8 environment.
  • Download the ZooKeeper package. Version 3.4.10 is used here.
  • Unpack the archive.
    # Decompress ZooKeeper
    tar -xzvf zookeeper-3.4.10.tar.gz
  • Modify the configuration file. Change the data path in the configuration file to the actual absolute path of the data directory you created.
    cp conf/zoo_sample.cfg conf/zoo.cfg   # Copy the sample configuration file to a new file
    vim conf/zoo.cfg                      # Edit the configuration file
    # Configuration file content:
    # this directory stores ZooKeeper's in-memory data snapshots and transaction log files
    dataDir=/home/zookeeper/zookeeper-3.4.10/data
  • Start the ZooKeeper server. All the scripts to run are in the bin directory.
    cd bin
    # Start ZooKeeper
    ./zkServer.sh start
  • Stop the server and view its status.
    # Stop
    ./zkServer.sh stop
    # Check the status
    ./zkServer.sh status
  • Start the ZooKeeper client. This script is also in the bin directory.
    ./zkCli.sh -server 192.168.233.133   # Start the ZooKeeper client and connect to the server at this IP address
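
  • Before starting the server, it can help to check that the edited configuration file really contains an absolute dataDir, as the configuration step above requires. The sketch below is a hypothetical helper, not a ZooKeeper tool: it parses key=value lines like those in zoo.cfg. The sample text uses a placeholder path, and tickTime=2000 and clientPort=2181 are the defaults shipped in zoo_sample.cfg.

```python
import posixpath

SAMPLE_CFG = """\
# Placeholder content; replace dataDir with your own directory
tickTime=2000
clientPort=2181
dataDir=/path/to/zookeeper/data
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def has_absolute_data_dir(settings):
    """True if dataDir is present and is an absolute POSIX path."""
    return posixpath.isabs(settings.get("dataDir", ""))


settings = parse_zoo_cfg(SAMPLE_CFG)
print(settings["dataDir"], has_absolute_data_dir(settings))
```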