Zookeeper

  • ZooKeeper: a distributed coordination service for distributed applications
    • ZooKeeper design goals
    • ZooKeeper data model and hierarchical namespace
    • ZooKeeper nodes and ephemeral nodes
    • ZooKeeper conditional updates and watches
    • ZooKeeper guarantees
    • ZooKeeper simple API
    • ZooKeeper implementation
    • ZooKeeper uses
    • ZooKeeper performance
    • ZooKeeper reliability
    • The ZooKeeper project

ZooKeeper: a distributed coordination service for distributed applications

ZooKeeper is an open-source distributed coordination service for distributed applications. It exposes a simple set of primitives on which distributed applications can build synchronization services, configuration maintenance, naming services, and more. It is designed to be easy to program against, uses a data model styled after the familiar directory tree of a file system, runs in Java, and has bindings for both Java and C.

Coordination services are notoriously hard to get right; they are especially prone to errors such as race conditions and deadlocks. The motivation behind ZooKeeper is to relieve distributed applications of the burden of implementing coordination services from scratch.

Design goals

Zookeeper is very simple.

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace organized much like a standard file system. The namespace consists of data registers called znodes which, in ZooKeeper parlance, are very similar to files and directories in a standard file system. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which lets it achieve high throughput and low latency.

The ZooKeeper implementation puts a premium on high performance, high availability, and strictly ordered access. The performance aspect means it can be used in large distributed systems; the reliability aspect keeps it from being a single point of failure; and the strict ordering means that sophisticated synchronization primitives can be implemented at the client.

Zookeeper is replicated

Like the distributed processes it coordinates, ZooKeeper itself is replicated over a set of hosts called an ensemble.

A client connects to any ZooKeeper server. The client maintains a TCP connection through which it sends requests, receives responses, receives watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.
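As a minimal sketch of what this looks like with the Java binding (the ensemble addresses host1:2181, host2:2181, host3:2181 and the 3-second session timeout are made-up values):

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConnectDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // The connect string lists the ensemble members; the client picks
            // one server and fails over to another if the connection breaks.
            ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 3000,
                    event -> {
                        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                            connected.countDown();
                        }
                    });
            connected.await(); // block until the session is established
            System.out.println("connected, session 0x" + Long.toHexString(zk.getSessionId()));
            zk.close();
        }
    }

The later sketches in this article build on a connected handle like the zk above.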

ZooKeeper is sequential

ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use this ordering to implement higher-level abstractions, such as synchronization primitives.

ZooKeeper is fast

It is especially fast for read-dominant workloads. ZooKeeper applications run on thousands of machines, and ZooKeeper performs best where reads outnumber writes, at ratios of around 10:1.

Zookeeper data model and hierarchical namespace

ZooKeeper provides a namespace that is very similar to a standard file system's. A path is a sequence of elements separated by slashes (/), and every znode in the ZooKeeper namespace is identified by a path.
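For illustration, a sketch of building a small hierarchy, assuming the connected zk handle from the earlier example (the paths /app1 and /app1/p_1 are made up; CreateMode and ZooDefs come from org.apache.zookeeper):

    // A parent znode must exist before children can be created beneath it.
    zk.create("/app1", new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    zk.create("/app1/p_1", "some data".getBytes(),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    // "/app1/p_1" is a path of two elements separated by "/",
    // exactly as in a file system.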

ZooKeeper nodes and ephemeral nodes

Unlike a standard file system, each node in a ZooKeeper namespace can have data associated with it as well as children; it is as if a file could also be a directory. ZooKeeper is designed to store coordination data, such as status information, configuration, location information, and so on, so the data stored at each node is usually small, in the byte-to-kilobyte range. We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Each znode maintains a stat structure that includes version numbers for data changes and ACL (access control list) changes, plus timestamps, which allow cache validation and coordinated updates. Each time a znode's data changes, its version number increases. For instance, whenever a client retrieves data, it also receives the data's version.
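A sketch of how the version can drive a conditional update, again assuming the zk handle and the hypothetical /app1/p_1 znode from above (Stat is org.apache.zookeeper.data.Stat, KeeperException is org.apache.zookeeper.KeeperException):

    Stat stat = new Stat();
    byte[] current = zk.getData("/app1/p_1", false, stat); // stat is filled in by the read
    try {
        // The write succeeds only if nobody changed the znode since we read it.
        zk.setData("/app1/p_1", "new value".getBytes(), stat.getVersion());
    } catch (KeeperException.BadVersionException e) {
        // Another client updated the znode first: re-read and retry.
    }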

Data stored on each node in the namespace is read and written atomically. Reading a ZNode gets all of its data, while writing replaces all of its data.

ZooKeeper also has the notion of ephemeral nodes. An ephemeral node exists as long as the client session that created it is active; when the session ends, the node is deleted.
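As a sketch of one common use, a process might advertise its liveness with an ephemeral node (the path /app1/worker-1 is hypothetical):

    // Lives only as long as this client's session.
    zk.create("/app1/worker-1", new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    // If this client calls zk.close(), crashes, or times out, ZooKeeper
    // deletes /app1/worker-1 automatically; other clients can watch for that.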

ZooKeeper conditional updates and watches

ZooKeeper supports the concept of watches. A client can set a watch on a znode. The watch is triggered and removed when the znode changes. When a watch is triggered, the client receives a packet describing the change to the znode. And if the connection between the client and the ZooKeeper server is broken, the client receives a local notification.
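A minimal sketch of setting a watch while reading (the trigger is one-time, so a client that wants continuous notification must re-register in the callback):

    byte[] value = zk.getData("/app1/p_1", event -> {
        // Fires once, on the next change to /app1/p_1 (or a session event);
        // re-register here if you need to keep watching.
        System.out.println("watch: " + event.getType() + " on " + event.getPath());
    }, new Stat());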

ZooKeeper guarantees

Zookeeper is very fast and very simple. However, since it is intended as a basis for building more complex services such as “synchronization,” it provides a set of guarantees:

  • Sequential consistency – Change requests from clients will be applied in the order in which they were sent.
  • Atomicity – Changes either succeed or fail. There is no partial success or partial failure.
  • Single system image – Clients see the same view of the ZooKeeper service regardless of which server they are connected to.
  • Reliability – Once a change request is applied, the result of the change is persisted until overwritten by the next change.
  • Timeliness – The system view seen by the client is always up to date within a certain time frame.

Simple API

One of ZooKeeper’s design goals is to provide a very simple programming interface. Therefore, it only supports these operations:

  • create

    Creates a node at a specific address in the tree.

  • delete

    Deletes a node.

  • exists

    Tests whether a node exists at a path.

  • get data

    Get node data.

  • set data

    Writes data to a node.

  • get children

    Retrieves a list of children of a node.

  • sync

    Wait for data propagation to complete.
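To make the list concrete, here is a sketch that walks a connected handle through each operation; the paths and values are made up, and the synchronous Java forms shown are one of several variants the API offers:

    // create: returns the actual path of the new znode
    String path = zk.create("/demo", "v1".getBytes(),
                            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // exists: null if absent, otherwise the znode's stat structure
    Stat stat = zk.exists(path, false);

    // get data / set data: reads return the whole value, writes replace it
    byte[] v1 = zk.getData(path, false, stat);
    zk.setData(path, "v2".getBytes(), stat.getVersion());

    // get children: names of the direct children of a znode (java.util.List)
    List<String> children = zk.getChildren("/", false);

    // sync: asynchronous in the Java API; flushes the channel to the leader
    zk.sync("/", (rc, p, ctx) -> System.out.println("synced " + p), null);

    // delete: version -1 skips the conditional-version check
    zk.delete(path, -1);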

ZooKeeper implementation

The figure ZooKeeper Components shows the high-level components of the ZooKeeper service. With the exception of the request processor, each server that makes up the ZooKeeper service replicates its own copy of each component.

replicated database

Every ZooKeeper server services clients, and a client connects to exactly one server to submit requests. Read requests are served from the local replica of that server's database. Requests that change the state of the service, that is, write requests, are processed by an agreement protocol.

As part of this protocol, all write requests from clients are forwarded to a single server, called the leader. The remaining servers, called followers, receive message proposals from the leader and agree on message delivery. The messaging layer takes care of replacing the leader on failure and synchronizing followers with the leader.

ZooKeeper uses a custom atomic messaging protocol. Because the messaging layer is atomic, ZooKeeper can guarantee that local replicas never diverge. When the leader receives a write request, it calculates what the state of the system will be when the write is applied and transforms this into a transaction that captures the new state.

uses

ZooKeeper's programming interface is deliberately simple. With it, however, you can implement higher-order operations, such as synchronization primitives, group membership, ownership, and so on, as sketched below.
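As one illustration of such a primitive, here is a hedged barrier sketch built only from exists plus a watch. The path /barrier is hypothetical, and a production version would loop and re-register, since a data change would also consume the one-time watch:

    // Block until some coordinator deletes the /barrier znode.
    CountDownLatch barrierGone = new CountDownLatch(1);
    Stat barrier = zk.exists("/barrier", event -> {
        if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
            barrierGone.countDown();
        }
    });
    if (barrier != null) {
        barrierGone.await(); // the watch fires when /barrier disappears
    }
    // ...all waiting clients proceed past the barrier together...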

performance

ZooKeeper is designed for high performance. But is it? Results from the ZooKeeper development team at Yahoo! Research indicate that it is. (See the figure below: ZooKeeper throughput as the read/write ratio varies.) It is especially high performance in read-dominated applications, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is the typical case for a coordination service.)

Figure: ZooKeeper throughput as the read/write ratio varies

Note: read/write performance in version 3.2 is roughly twice that of the previous 3.1 release.

reliability

Benchmarks also show that ZooKeeper is reliable. The figure "Reliability in the Presence of Errors" shows how a deployment responds to various failures. The events marked in the figure are as follows:

  1. A Follower fails and then recovers.

  2. A different Follower fails and then recovers.

  3. Leader failure.

  4. Both followers fail and then recover.

  5. Another Leader fails.


The ZooKeeper project

ZooKeeper has been used successfully in many industrial applications. At Yahoo!, it serves as the coordination and failure-recovery service for Yahoo! Message Broker, a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is also used by Yahoo!'s crawler fetching service to manage failure recovery. A number of Yahoo! advertising systems use ZooKeeper as a reliable service as well.