Preface

Microservices are an architectural style in which a large, complex software application is composed of many small services. Each microservice can be deployed independently and is loosely coupled to the others. Each one focuses on completing a single task and doing it well.

Before microservices, most systems were single monolithic applications, where monitoring complexity was low and the scenarios were simple. With microservices, business logic is scattered across many processes (in many large businesses, a single business process involves dozens of services). Once a business problem occurs, tracing its source is like looking for a needle in a haystack. This is where a complete monitoring system is needed.

A complete monitoring system takes a relatively long time to build, and it needs continuous iteration and optimization as business scenarios change. This article only discusses how to build a unified system for collecting and displaying monitoring data, in terms of several monitoring dimensions and atomic scenarios, in the hope of encouraging further thought about how monitoring systems are built.

Several monitoring dimensions under microservices

Compared with monitoring traditional applications, the most obvious change in microservice monitoring is a change of perspective: monitoring shifts from the machine's point of view to the service's point of view. From the microservice perspective, monitoring can be layered into a data dimension, a resource dimension, and a code dimension, as shown in the following figure:

Data dimension

Web services are the mainstream today. Each web service has an entry point, whether an app or a web page, which handles user interaction and sends the user's information to the back end. The back end is generally fronted by a load balancer (LB) or gateway, which balances the load and forwards the data to a specific application for processing. After processing, the application writes the results to the database.

Resource dimension

Many services are now deployed in the cloud, which involves virtualization: virtual hosts run on physical servers and are connected to each other through virtual networks. Resource-level monitoring is therefore indispensable. We need to collect not only the performance indicators of the virtual host, but also the CPU, memory, and disk I/O data of the physical server running it, as well as the bandwidth load of the virtual network between virtual hosts.
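As a rough illustration of resource-level collection, the sketch below gathers one sample of basic indicators from the machine it runs on, using only the Python standard library. The function name `collect_host_sample` and the field names are assumptions made for this example. Memory, disk I/O, and network bandwidth are deliberately left out because the standard library does not expose them portably; a real agent would read them from platform-specific sources.

```python
import os
import shutil
import socket
import time


def collect_host_sample(path="/"):
    """Collect one sample of basic resource indicators for the local machine."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    sample = {
        "timestamp": time.time(),
        "host": socket.gethostname(),
        "cpu_count": os.cpu_count(),
        "disk_used_percent": round(used_percent, 2),
    }
    # os.getloadavg() exists only on Unix-like systems, so fall back to None.
    try:
        sample["load_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        sample["load_1m"] = None
    return sample


if __name__ == "__main__":
    print(collect_host_sample())
```

Running the same collector on both the virtual host and the physical server, and tagging each sample with its host name, is one simple way to line the two views up later.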

Code dimension

APM (application performance management), which monitors and collects data on the code side, emerged with the rise of microservices. In the microservice scenario, a business process may span dozens of services, and with only traditional monitoring data it is difficult to locate the root cause of a problem.

We can develop a dedicated collection framework for the technology stack the code uses. Within an acceptable performance overhead, it can collect the call relationships between functions and the call topology between services, and measure the response time of functions or services, so that we can optimize performance or predict faults in advance.

Scenarios for key monitoring indicators

The defining feature of microservice monitoring can be summed up in one sentence: there are many services, and the invocations between them are complex. When a system fails, key monitoring indicators are needed to quickly locate the faulty system among hundreds of interrelated services. Building on the three dimensions above, we analyzed the alarms that may be generated at each level of each dimension and summarized eight atomic monitoring scenarios, such as URL monitoring, host monitoring, and product monitoring.

URL monitoring: Both apps and web pages essentially make back-end calls through URLs. Simulated (mock) API calls can be used to obtain indicators such as response time and response status code, which show the overall health of the monitored services.

Host monitoring: An agent installed on the host collects basic monitoring information such as CPU, memory, and I/O data. In addition, users can enable data collection for open source applications such as Tomcat and Nginx through the configuration file.

Product monitoring: The public cloud provides hosts, networks, storage devices, and some middleware to users in the form of products. Each product's back-end service reports indicator data for that product, so that the health of each product resource can be monitored.

Component monitoring: Monitoring data for open source components such as Tomcat, Nginx, and Netty can be collected by having the agent on the host load the corresponding component collection program.

Custom monitoring: Service instances collect service-related data and periodically call APIs to report it. Multiple service instances can report the same monitoring item at the same time, and alarms can be queried across multiple dimensions.

Resource monitoring: Users report user-defined data on a per-resource basis. Every resource has the same set of monitoring items, and the monitoring items are independent of each other.

APM: Function call relationships and the call topology between services are collected and displayed, with a separate implementation for each language stack. Depending on the language, some implementations are intrusive and collect data through an SDK embedded in the code, while others stay decoupled from the code and collect data by overriding certain methods through metaprogramming.

Event monitoring: Provides unified storage, analysis, and display of discrete events in public cloud products and business logic, such as cloud disk unavailability events and SSD reset events.
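As an illustration of the first scenario in this list, URL monitoring, the sketch below calls a URL once and reports the response status code and response time. It is only a sketch using the Python standard library. The function name `probe_url`, the result fields, and the rule that a 2xx or 3xx status counts as healthy are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def probe_url(url, timeout=5.0, opener=urllib.request.urlopen):
    """Call `url` once and return its status code and response time.

    `opener` can be swapped out so the probe can be tried without a network.
    """
    start = time.perf_counter()
    status, error = None, None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses still carry a status code worth reporting.
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar errors.
        error = str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "url": url,
        "status": status,
        "elapsed_ms": elapsed_ms,
        "healthy": status is not None and 200 <= status < 400,
        "error": error,
    }
```

A scheduler could call `probe_url` for each monitored endpoint at a fixed interval and store the results as a time series. A run of unhealthy results or a rising `elapsed_ms` would then be a natural alarm condition.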

With data collected from the atomic scenarios above, we can display the monitoring data uniformly through the UI, designing graphical pages around the three dimensions described earlier with user experience at the core. Typically, time is the horizontal axis of a chart, showing how indicators change over time. For some statistical indicators, pie charts and bar charts can also be used to show analysis and comparison results.
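Before raw samples are drawn against a time axis, they are usually grouped into fixed-width time buckets. The sketch below shows one simple way to do this, by averaging the samples in each bucket. The function name `bucket_series`, the 60-second default, and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def bucket_series(samples, bucket_seconds=60):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns (bucket_start, average) pairs sorted by time, ready to be
    plotted with time on the horizontal axis.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, keyed by Unix timestamp.
    samples = [(0, 100), (30, 200), (60, 150), (119, 250), (120, 90)]
    for start, average in bucket_series(samples):
        print(start, average)
```

Averaging suits indicators such as response time. For counters or error totals, a sum or maximum per bucket may be the better choice.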

This article mainly describes the collection and display of data in the monitoring system. For data storage and the alarm process, interested readers can follow our later monitoring-related articles.

About the author

Dong Lei: UCloud technology expert with ten years of development experience in the IT industry. Currently responsible for the design and development of UCloud's hybrid cloud and monitoring products, with a continuing interest in microservice architecture, monitoring, DevOps, and related fields.

For more technical articles, follow “UCloud Technology Bulletin board” on WeChat.