Project background

CAT (Central Application Tracking) is an open-source, Java-based, distributed real-time monitoring system developed by Meituan-Dianping. The Infrastructure Department of Meituan-Dianping aims to provide industry-leading, unified solutions for basic storage, high-performance communication, large-scale online access, service governance, real-time monitoring, containerization, and intelligent cluster scheduling. CAT is currently positioned as the unified monitoring component at the application layer. It is widely integrated into middleware frameworks (RPC, database, cache, MQ, etc.) to provide performance metrics, health status, real-time alerting, and other services for every business line.

This article introduces the new CAT client features, the performance improvements, and other changes in detail. We recently published another article on CAT that describes the design of the CAT client and server in depth; for more details, please read “In-depth Analysis of Open Source Distributed Monitoring CAT”.

Product value

  • Reduce fault discovery time
  • Reduce the cost of locating faults
  • Aid in application optimization

Technical advantages

  • Real-time processing: The value of information diminishes over time, especially during incidents
  • Full data: Collects complete metric data for in-depth analysis of faults
  • High availability: Fault recovery and fault location depend on a highly available monitoring system
  • Fault tolerance: A failure of the monitoring system itself does not affect the normal running of services and is transparent to them
  • High throughput: Collecting massive volumes of monitoring data requires high throughput
  • Scalable: Supports distributed, cross-IDC deployment and horizontal scaling of the monitoring system

Current usage

CAT now covers Meituan-Dianping’s core business lines, such as food delivery, hotel and travel, travel, and finance. Almost all of Meituan-Dianping’s core applications are integrated with it, and it is widely used in production.

Since the beginning of 2016, the number of applications integrated with CAT has grown by 400% and the number of machines by 900%. CAT now processes 320 billion messages per day, stores nearly 400 TB of messages, and handles a peak cluster throughput of 6.5 million QPS.

Facing exponential traffic growth, CAT met unprecedented challenges in communication, computing, and storage. The overall architecture went through a series of upgrades, including message sampling and aggregation, message storage, multi-dimensional business metric monitoring, and unified alerting. These changes were rolled out stably and laid a solid foundation for the steady growth of the company’s business traffic over the next few years.

After 7 years of continuous development, CAT keeps evolving. We also hope to give back to the community and let more companies outside Meituan-Dianping benefit from CAT. We will continue to iterate on the open-source version this year and, going forward, will keep bringing good practices from inside the company into the open-source project. Everyone is welcome to build this open-source community with us.

New features

The CAT 3.0.0 Release Notes

Multi-language clients

As the business grows, more products and applications are written in different languages, and demand for CAT clients in those languages keeps increasing. In addition to the Java client, CAT now provides C/C++, Python, Node.js, and Golang clients, covering essentially all mainstream development languages. The core design goal for the multi-language clients is to use the C client as the underlying foundation that exposes the core API, with the SDKs for the other languages built as wrappers around it.

Usage guidelines for the main supported languages:

  • Java
  • C/C++
  • Python
  • Node.js
  • Golang

Performance improvements

  • Message sampling aggregation

    Message sampling aggregation plays an important role in helping the client cope with heavy traffic. Reporting is triggered when a sample is hit or when the in-memory queue is full. Sampling aggregation splits the message tree and classifies its messages, computes per-category statistics in local memory, and reports the aggregated data instead of the raw messages, which reduces both the number of messages the client sends and the network overhead.
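To make the aggregation step concrete, here is a minimal Java sketch under stated assumptions; it is not CAT's client API, and `SamplingAggregator`, `Stat`, `logTransaction`, and `flushAggregates` are hypothetical names. It assumes a deterministic 1-in-N sampler, a bounded in-memory queue for complete messages, and that transactions which miss the sample (or arrive while the queue is full) are classified by type and name and folded into local counters instead of being sent individually.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of client-side sampling aggregation (not CAT's real API).
 * Assumption: a deterministic 1-in-N sampler picks transactions to keep as full
 * messages; everything else, or anything arriving while the bounded queue is
 * full, is aggregated into per-(type, name) statistics in local memory.
 */
final class SamplingAggregator {
    /** Local statistics for one (type, name) category. */
    static final class Stat {
        long count;
        long totalMillis;
        long maxMillis;
    }

    private final int sampleEvery;
    private final int queueCapacity;
    private long seen;
    private final ArrayDeque<String> fullMessages = new ArrayDeque<>();
    private final Map<String, Stat> aggregates = new LinkedHashMap<>();

    SamplingAggregator(int sampleEvery, int queueCapacity) {
        if (sampleEvery <= 0 || queueCapacity <= 0) {
            throw new IllegalArgumentException("sampleEvery and queueCapacity must be positive");
        }
        this.sampleEvery = sampleEvery;
        this.queueCapacity = queueCapacity;
    }

    void logTransaction(String type, String name, long durationMillis) {
        boolean sampleHit = seen++ % sampleEvery == 0;
        if (sampleHit && fullMessages.size() < queueCapacity) {
            // Kept as a complete message, to be sent to the server as-is.
            fullMessages.add(type + "|" + name + "|" + durationMillis);
            return;
        }
        // Otherwise classify by type/name and accumulate statistics locally.
        Stat s = aggregates.computeIfAbsent(type + "|" + name, k -> new Stat());
        s.count++;
        s.totalMillis += durationMillis;
        s.maxMillis = Math.max(s.maxMillis, durationMillis);
    }

    int queuedFullMessages() {
        return fullMessages.size();
    }

    /** Returns the aggregated statistics and resets them, as a periodic report would. */
    Map<String, Stat> flushAggregates() {
        Map<String, Stat> report = new LinkedHashMap<>(aggregates);
        aggregates.clear();
        return report;
    }
}
```

With a 1-in-5 sampler, ten `URL` transactions named `/home` leave two complete messages in the queue and a single aggregate with a count of 8, so the size of the report depends on the number of distinct type/name pairs rather than on the raw message volume.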

  • Communication protocol optimization

    The communication protocol between the CAT client and server has been upgraded from a custom text protocol to a custom binary protocol, which significantly improves performance in large-scale real-time data processing scenarios. The server currently supports both protocol versions, so it remains backward compatible with older clients.

    • Test environment: CentOS 6.5, VM with 4 CPU cores and 8 GB of RAM
    • Test results: Serialization in the new version takes about one third of the time of the old version
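The difference between a text and a binary encoding can be sketched as follows. This is a hypothetical format, not CAT's actual wire protocol: `BinaryCodec` and its field layout are illustrative assumptions. Strings are written as a 4-byte length followed by UTF-8 bytes, and numeric fields are fixed-width 8-byte values, so the decoder reads them directly instead of splitting lines and parsing decimal text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical binary encoding of a single transaction (not CAT's wire format).
 * Strings are length-prefixed UTF-8; numbers are fixed-width 8-byte fields.
 */
final class BinaryCodec {
    static final class Transaction {
        final String type;
        final String name;
        final long timestampMillis;
        final long durationMicros;

        Transaction(String type, String name, long timestampMillis, long durationMicros) {
            this.type = type;
            this.name = name;
            this.timestampMillis = timestampMillis;
            this.durationMicros = durationMicros;
        }
    }

    static byte[] encode(Transaction t) {
        byte[] type = t.type.getBytes(StandardCharsets.UTF_8);
        byte[] name = t.name.getBytes(StandardCharsets.UTF_8);
        // Exact size: two length prefixes, the string bytes, and two longs.
        ByteBuffer buf = ByteBuffer.allocate(4 + type.length + 4 + name.length + 8 + 8);
        buf.putInt(type.length).put(type);
        buf.putInt(name.length).put(name);
        buf.putLong(t.timestampMillis).putLong(t.durationMicros);
        return buf.array();
    }

    static Transaction decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        String type = readString(buf);
        String name = readString(buf);
        long timestamp = buf.getLong(); // read directly, no text parsing
        long duration = buf.getLong();
        return new Transaction(type, name, timestamp, duration);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, a transaction with type `SQL` and name `select user` encodes to 38 bytes: two 4-byte length prefixes, 14 bytes of string data, and two 8-byte numbers. A real protocol would also need some way to tell protocol versions apart so the server can keep accepting older clients.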

  • Message file store

The message file storage has been redesigned to fix problems in the old version: too many index and data file nodes, which degraded disk access into increasingly random I/O.

To balance read and write performance, the new message file storage introduces a two-level index scheme that stores the data of all IP nodes of the same application together while keeping writes largely sequential. The following figure shows the minimum unit of the index structure. Each index file consists of several minimum units, and each unit is divided into 4 × 1024 buckets. The first bucket serves as the level-1 index header and stores the mapping from IP addresses and message sequence numbers to buckets. The remaining 4 × 1024 − 1 = 4,095 buckets serve as secondary indexes that store message addresses.
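The two-level lookup can be modeled in memory as below. This is a sketch under assumptions, not CAT's on-disk format: `IndexUnit`, the `slotsPerBucket` parameter, the use of a `HashMap` for the header, and 8-byte `long` addresses are all hypothetical. Bucket 0 plays the role of the level-1 header, mapping an (IP, block of message sequence numbers) pair to a bucket; buckets 1 through 4,095 hold the message addresses, so consecutive messages from one IP land in the same bucket and messages from different IPs of the same application share one unit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of one minimum index unit (not CAT's file format).
 * Bucket 0 = level-1 header; buckets 1..4095 = secondary indexes of addresses.
 * Assumption: each secondary bucket holds slotsPerBucket consecutive addresses.
 */
final class IndexUnit {
    static final int BUCKETS_PER_UNIT = 4 * 1024;

    private final int slotsPerBucket;
    // Level-1 header: "ip#block" -> bucket number, where block = serial / slotsPerBucket.
    private final Map<String, Integer> header = new HashMap<>();
    // Secondary indexes, allocated lazily; index 0 is reserved for the header.
    private final long[][] buckets = new long[BUCKETS_PER_UNIT][];
    private int nextFreeBucket = 1;

    IndexUnit(int slotsPerBucket) {
        if (slotsPerBucket <= 0) throw new IllegalArgumentException("slotsPerBucket must be positive");
        this.slotsPerBucket = slotsPerBucket;
    }

    private String headerKey(String ip, long serial) {
        return ip + "#" + (serial / slotsPerBucket);
    }

    /** Records a message address; returns false when no secondary bucket is left. */
    boolean put(String ip, long serial, long address) {
        String key = headerKey(ip, serial);
        Integer bucket = header.get(key);
        if (bucket == null) {
            if (nextFreeBucket >= BUCKETS_PER_UNIT) {
                return false; // unit full: a caller would move on to a new unit
            }
            bucket = nextFreeBucket++;
            header.put(key, bucket);
            buckets[bucket] = new long[slotsPerBucket];
            Arrays.fill(buckets[bucket], -1L); // -1 marks an empty slot
        }
        buckets[bucket][(int) (serial % slotsPerBucket)] = address;
        return true;
    }

    /** Two-level lookup: header -> bucket -> slot; returns -1 if unknown. */
    long get(String ip, long serial) {
        Integer bucket = header.get(headerKey(ip, serial));
        return bucket == null ? -1L : buckets[bucket][(int) (serial % slotsPerBucket)];
    }

    int usedBuckets() {
        return nextFreeBucket - 1;
    }
}
```

Once all 4,095 secondary buckets are assigned, `put` returns `false`; in this sketch the caller would then open a new unit, which matches the description of an index file as a sequence of minimum units.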

In the new message file storage, the number of file nodes is proportional to the number of applications, which effectively reduces random I/O and improves real-time message storage performance. The single-machine message storage comparison from Meituan-Dianping’s production CAT environment is as follows:

Future plans

  • Technology stack upgrade

    Embrace the mainstream technology stack: adopt widely used open-source tools (Spring, MyBatis, etc.) to reduce learning and development costs, and build the next generation of the open-source product.

  • Product experience

    Redesign the product and its interactions to improve the user experience.

  • Open-source community building

    Build an official product website and organize technical exchange events.

  • SDKs for more languages

About open source

Github.com/dianping/ca…

Since CAT was open-sourced in 2011, it has gained 5,900+ stars and 2,400+ forks on GitHub and has been adopted by more than 100 companies, including well-known names such as Ctrip, Lufax, Liepin, and Ping An. Through continued technical talks at events such as the annual QCon and the Global Architecture and Operation Technology Summit, CAT has earned industry recognition, and more and more enterprise partners are joining its open-source development and contributing significantly to its growth.

Huang Binqiang, head of Meituan-Dianping’s Infrastructure Department, said: “Over the past four years, Meituan-Dianping has built up extensive expertise in architecture and middleware, and many of our system services have been proven in large-scale online business. While using many open-source products from the industry, we also want to open-source the technology we have accumulated. On the one hand, we want to give back to the community and contribute to the industry ecosystem; on the other hand, we want more interested engineers to take part in upgrading and innovating system software together with us. Therefore, excellent projects like CAT will be open-sourced and maintained over the long term to ensure that the open-source software is mature, well supported, and backed by an active community. We also welcome your valuable comments and suggestions.”

Conclusion

This is a long-distance run with no finish line, and the CAT project team will keep moving forward patiently. We hope that peers in the industry will follow us and actively take part, so that together we can build an enterprise-grade, highly available, and highly reliable distributed monitoring middleware product and shape the future of CAT!

This is just a new starting point. If you have any comments or suggestions about the new version of CAT, please feel free to contact us at [email protected] or through GitHub issues.

Recruitment information

The Meituan-Dianping infrastructure team is hiring senior Java engineers and technical experts in Beijing and Shanghai. We are the group’s core team dedicated to building company-level, industry-leading infrastructure components, covering distributed monitoring, service governance, high-performance communication, message-oriented middleware, basic storage, containerization, cluster scheduling, and more. Interested candidates are welcome to send their resumes to [email protected].