Uber’s team of engineers has released Chaperone, an open-source Kafka monitoring tool. At Uber, it is used to monitor data loss, latency, and duplication in multiple data centers and high-volume Kafka clusters.

Uber’s Kafka pipeline now spans multiple data centers. Uber’s systems generate logs for numerous service calls and events, and these services run live across those data centers. Data flowing through Uber’s Kafka pipeline is used for both batch processing and real-time analysis.

Kafka serves as the data bus connecting Uber’s systems, together with a tool called uReplicator. uReplicator is Uber’s open-source replacement for Kafka’s MirrorMaker, used to replicate data between existing clusters. When a service emits a log message, a local agent batches it and pushes it to the regional Kafka cluster in the corresponding data center. Consumers process data both from regional Kafka clusters and from aggregate clusters that combine data from multiple data centers. Chaperone monitors these messages in real time.

Chaperone’s primary responsibility is to detect loss, delay, duplication, and other anomalies in data as it passes through the pipeline. It consists of four components:

  • Monitoring library, which collects, aggregates, and emits statistics for each audited message. It uses tumbling windows to group messages, generates monitoring messages from the aggregates, and sends them to the corresponding Kafka topic. Tumbling windows, commonly used in stream-processing systems such as Apache Flink, divide a stream into non-overlapping, fixed-size intervals.

  • Chaperone service (ChaperoneService), which consumes every message from Kafka, records its timestamp, and pushes the generated monitoring messages to the corresponding Kafka topic.

  • Chaperone collector, which takes data from ChaperoneService, stores it in a database, and displays it in the user interface, making it easy to detect and locate message loss and latency.

  • WebService, which exposes REST APIs for retrieving or processing data.
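To make the tumbling-window idea from the monitoring library concrete, here is a minimal Python sketch (not Chaperone’s actual implementation; the 10-minute window size and all names are assumptions) that buckets per-topic message counts into non-overlapping windows by flooring each event timestamp to a window boundary:

```python
from collections import defaultdict

WINDOW_MS = 10 * 60 * 1000  # hypothetical 10-minute tumbling window


class AuditAggregator:
    """Buckets per-topic message counts into non-overlapping (tumbling) windows."""

    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        # (topic, window_start_ms) -> message count
        self.counts = defaultdict(int)

    def record(self, topic, event_ts_ms):
        # Each timestamp belongs to exactly one window: floor to the boundary.
        window_start = (event_ts_ms // self.window_ms) * self.window_ms
        self.counts[(topic, window_start)] += 1

    def snapshot(self):
        # In Chaperone these aggregates would be emitted as monitoring
        # messages to a Kafka topic; here we simply return them.
        return dict(self.counts)
```

Because the windows never overlap, comparing counts for the same (topic, window) pair at different pipeline stages reveals loss or duplication in that interval.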

In Chaperone’s implementation, monitoring data must be accurate. Chaperone’s strategy for accuracy is to ensure that each message is audited once and only once. Write-ahead logging (WAL) is used here: ChaperoneService records an audit entry in the WAL before sending the corresponding monitoring message to Kafka, so that any in-flight messages can be replayed if the service goes down. This technique is common in databases such as PostgreSQL.
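As a rough illustration of the WAL pattern described above (a file-based Python sketch under assumed names, not Chaperone’s implementation), each audit record is flushed to a local log before the Kafka send would be attempted, so a restarted service can replay any records that may not have been delivered:

```python
import json
import os


class AuditWAL:
    """Minimal write-ahead log sketch: persist each audit record before
    sending it downstream, so it can be replayed after a crash."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Write and fsync *before* attempting the downstream (Kafka) send.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # On restart, return every logged record for re-sending;
        # deduplication then happens downstream.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]
```

Replaying can re-send records that were in fact delivered, which is why the “once and only once” guarantee also depends on downstream deduplication of audit messages.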

Another strategy is to use a consistent timestamp regardless of where and at what stage a monitoring message is processed. Chaperone has not solved this problem completely; it currently uses a hybrid approach based on message encoding. For Avro-schema-encoded messages, the timestamp can be read out in constant time. For JSON messages, the Chaperone team wrote a stream-based JSON parser that extracts only the timestamp rather than parsing the entire message. For messages whose event timestamp cannot be extracted, the processing timestamp on the broker client and server is still used.
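The idea of reading only the timestamp out of a JSON payload can be sketched as follows (a simplified Python illustration, not Chaperone’s parser; the `"timestamp"` field name and integer-milliseconds format are assumptions). It scans the raw bytes for the key and reads the digits after it, never materializing the full JSON object:

```python
def extract_timestamp(raw, key=b'"timestamp"'):
    """Return the integer timestamp following `key` in a raw JSON payload,
    or None if the key is absent. Avoids parsing the whole document."""
    i = raw.find(key)
    if i < 0:
        return None
    i += len(key)
    # Skip whitespace and the colon separating key and value.
    while i < len(raw) and raw[i:i + 1] in b" \t\r\n:":
        i += 1
    j = i
    while j < len(raw) and raw[j:j + 1].isdigit():
        j += 1
    return int(raw[i:j]) if j > i else None
```

A production parser would have to handle the key appearing inside string values or nested objects; the point here is only that extraction can stop as soon as the timestamp is found, keeping the cost far below a full parse for large messages.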

Chaperone is not limited to checking for data loss; it can also be used to read data from Kafka by timestamp rather than by offset. This allows users to read data in any time range, regardless of whether the data has already been processed. Chaperone can therefore also serve as a debugging tool, letting users revisit already-processed messages for further analysis.
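Reading by timestamp boils down to mapping a target time to an offset, similar in spirit to the `offsetsForTimes` lookup in Kafka’s consumer API. A toy Python sketch (an assumption-laden illustration, not Chaperone’s code) over a list of per-offset timestamps, assumed sorted for the binary search to be valid:

```python
import bisect


def offset_for_time(timestamps, target_ts):
    """Given timestamps[offset] for a partition (assumed non-decreasing),
    return the earliest offset whose timestamp is >= target_ts,
    or None if every message is older than target_ts."""
    i = bisect.bisect_left(timestamps, target_ts)
    return i if i < len(timestamps) else None
```

Once the start and end offsets for a time range are resolved this way, a consumer can seek to the start offset and read forward, regardless of where its committed offset currently sits.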

The source code for Chaperone is available on GitHub (github.com/uber/chaper…).
