Summary: Development, release, and deployment make up only a small part of a Spring Boot microservice's life cycle; most of its life is spent in operation and maintenance, where monitoring plays a central role. So how do we quickly implement Prometheus monitoring access for a Spring Boot application in order to continuously observe the state of the system? This article explains the complete access process and the points to watch out for in detail.

Author: Fan Xing

For developers, much of the pain of traditional SSM (Spring + Spring MVC + MyBatis) applications comes from the large amount of configuration required just to set up a project. Spring Boot was created to solve exactly this problem. As the name suggests, the core value of Spring Boot is auto-configuration: as long as the corresponding JAR package is present, Spring can configure it automatically. If the default configuration does not meet your requirements, you can also override the auto-configuration classes with custom configuration and still build enterprise-grade applications quickly.

But building a Spring Boot application is just the first step. How do we monitor it as it goes live?

Basic knowledge and concepts

Before the formal explanation, let me first go over the basic knowledge and concepts needed for this article. Generally speaking, a complete and easy-to-use monitoring system consists mainly of the following key parts.

Collection of monitoring data

At present, the common methods of collecting monitoring data in the industry fall into two modes: Push and Pull. Take the increasingly widely used Prometheus monitoring system as an example: Prometheus is a typical Pull-mode system. Application and infrastructure monitoring data is exposed to Prometheus through a standard OpenMetrics endpoint, which Prometheus periodically scrapes and stores.

Here is a quick overview of OpenMetrics. As a cloud-native, highly extensible metrics protocol, OpenMetrics defines the de facto standard for reporting cloud-native metrics at scale. It supports both a text representation and a Protocol Buffers representation, of which the text representation is more common; it is the default format Prometheus uses when scraping. An example of the text format is shown below.

The metric data model is defined by a metric name and a set of key/value labels; samples with the same metric name and labels belong to the same time series. For example, acme_http_router_request_seconds_sum{path="/api/v1",method="GET"} identifies a sample point of the metric named acme_http_router_request_seconds_sum whose label path has the value "/api/v1" and whose label method has the value "GET". The sample point contains a float64 value and a millisecond UNIX timestamp. Over time, the data collected from these sample points can be drawn as continuously updating lines on a graph.
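
As an illustration, an exposition of this metric in the OpenMetrics text format might look like the following; the label values and sample values here are illustrative, not taken from a real system.

# TYPE acme_http_router_request_seconds summary
# UNIT acme_http_router_request_seconds seconds
# HELP acme_http_router_request_seconds Latency through all of ACME's HTTP request routers.
acme_http_router_request_seconds_sum{path="/api/v1",method="GET"} 9036.32
acme_http_router_request_seconds_count{path="/api/v1",method="GET"} 807283.0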

Currently, most basic components in the cloud-native ecosystem support the OpenMetrics text format mentioned above. For components that cannot expose metrics themselves, the Prometheus community also offers a very rich set of Prometheus Exporters for developers and operators. These components (or their exporters) respond to Prometheus's periodic scrape requests and report their health and status to Prometheus in a timely manner for subsequent processing and analysis. For application developers, the Prometheus SDKs in multiple languages also provide instrumentation hooks for integrating business metrics into the Prometheus ecosystem.

Data visualization and analysis

After obtaining intuitive information such as application or infrastructure running status, resource usage, and service health, you can query and analyze it across multiple types and dimensions to easily track and compare nodes, and you can see the current state of the system through standard, easy-to-use visual dashboards. A common solution is Grafana, a popular data visualization tool in the open-source community that offers rich chart types and templates. The Alibaba Cloud ARMS Prometheus monitoring service also provides users with fully managed, Grafana-based query, analysis, and visualization of monitoring data.

Timely alerting and incident management

When a service fault occurs or is about to occur, the monitoring system needs to react quickly and notify the administrator, so that the fault can be handled promptly or prevented in advance and its impact on the service avoided. When a problem does occur, the administrator needs to claim and handle it; by analyzing the different monitoring metrics and historical data, the root cause can be found and resolved.

Overview of the access process

Next, we’ll talk about Prometheus access for Spring Boot microservice applications deployed on a Kubernetes cluster. For Spring Boot applications, the community provides the Spring Boot Actuator framework out of the box, which makes it convenient for Java developers to instrument code and collect and expose monitoring data. Starting with Spring Boot 2.0, Actuator switched its underlying implementation to Micrometer, providing stronger and more flexible monitoring capabilities. Micrometer is a metrics facade; it is to monitoring what Slf4j is to logging. Using Micrometer, applications can connect to various monitoring systems such as AppOptics, Datadog, Elastic, InfluxDB, and, of course, Prometheus, which we cover today.

Micrometer allows application developers to map Java application metrics to Prometheus metrics using four types of semantics (a short code sketch follows the list):

  • Counter in Micrometer corresponds to Counter in Prometheus and is used to describe a monotonically increasing value, such as the number of calls to an interface or the total number of cache hits/accesses. A Timer logically contains a Counter: if a Timer is used to collect the response time of each interface call, it also collects the number of calls, so there is no need to register both a Timer and a Counter for the same interface.
  • Gauge in Micrometer corresponds to Gauge in Prometheus and is used to describe a value that fluctuates within a range, such as CPU usage or the size of a thread pool's task queue.
  • Timer in Micrometer corresponds to Histogram in Prometheus and is used to describe time-related data, such as the response time (RT) distribution of an interface.
  • DistributionSummary in Micrometer corresponds to Summary in Prometheus; like Histogram, Summary is used to describe data distributions. However, because the distribution is calculated on the client side before being stored by Prometheus, Summary results cannot be aggregated across multiple machines to obtain a global view of the distribution, which limits its use somewhat.
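
To make these types concrete, here is a minimal sketch of registering each of them directly against a MeterRegistry. The class name and metric names are illustrative and not taken from the demo project.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MetricTypesExample {

    // Gauge values are read from a live object, so keep a reference to it
    private final AtomicInteger queueSize = new AtomicInteger(0);

    public MetricTypesExample(MeterRegistry registry) {
        // Counter: monotonically increasing value, e.g. number of orders created
        Counter orders = Counter.builder("orders_created_total").register(registry);
        orders.increment();

        // Gauge: value that goes up and down, e.g. current task queue size
        Gauge.builder("task_queue_size", queueSize, AtomicInteger::get).register(registry);

        // Timer: durations; exported as a histogram when percentile histograms are enabled
        Timer timer = Timer.builder("order_processing_seconds")
                .publishPercentileHistogram()
                .register(registry);
        timer.record(120, TimeUnit.MILLISECONDS);

        // DistributionSummary: distribution of arbitrary values, e.g. payload size in bytes
        DistributionSummary payloadSize = DistributionSummary.builder("request_payload_bytes")
                .register(registry);
        payloadSize.record(2048);
    }
}
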

When we need to connect a Spring Boot application deployed in a Kubernetes cluster to Prometheus, we follow the process of code instrumentation -> application deployment -> service discovery.

First, we need to introduce the Spring Boot Actuator Maven dependencies in our code, and either register the data we need to monitor or add the corresponding annotations to the Controller methods.

Second, we deploy the instrumented application to Kubernetes and register with Prometheus the endpoints from which monitoring data is pulled (i.e., Prometheus service discovery). The Alibaba Cloud Prometheus service provides service discovery through the ServiceMonitor CRD.

Finally, after Prometheus has successfully discovered the target application's metrics endpoint, we can configure the data source and the corresponding dashboards in Grafana, and configure alerts for key metrics. All of these requirements can easily be met with the Aliyun Prometheus monitoring service.

Detailed access process

Next, let's move on to the actual practice. Here we select a cloud-native microservice application (Github.com/aliyun/alib…) as the baseline for our transformation.

Our ultimate goals are:

1. Monitor the entrance of the system: the frontend service is an entry application built on Spring MVC that receives external customer traffic. We mainly care about the key RED metrics of its external interfaces (Rate of calls, number of Errors, Duration of requests);

2. Monitor the critical path of the system: monitor key objects on the critical path of the back-end service, such as the thread pool's queue status and the hit rate of the in-process Guava Cache;

3. Monitor custom metrics that are strongly tied to the business (such as the UV of an interface);

4. Monitor JVM GC and memory usage;

5. Visualize all metrics and configure alerts for the key ones.

Step 1: Introduce the Spring Boot Actuator dependencies and do the initial configuration

First, we need to introduce the Spring Boot Actuator dependencies and expose the monitoring port (defined here as 8091) in the application.properties configuration. Once this succeeds, we can access the /actuator/prometheus path on the app's port 8091 to obtain monitoring data in the OpenMetrics format.

<!-- Spring Boot Actuator dependency -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Prometheus dependency -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

# application.properties: add the following configuration to expose metrics
spring.application.name=frontend
management.server.port=8091
management.endpoints.web.exposure.include=*
management.metrics.tags.application=${spring.application.name}

Step 2: Code instrumentation and transformation

To get the RED metrics for an API, we add the @Timed annotation shown below to the corresponding interface method. Let's take the index page interface in the demo project as an example. Note that the value in the @Timed annotation is the metric name exposed at /actuator/prometheus, and histogram = true means the request duration is exposed as a Histogram-type metric, which makes it convenient to calculate the P99/P90 request time distribution later.

@Timed(value = "main_page_request_duration", description = "Time taken to return main page", Histogram = true) @apioperation (value = "后 ", Tags = {" 页 标 论 "}) @getMapping ("/") public String index(Model Model) {Model. AddAttribute ("products", tags = {" 页 标 论 "}) @getMapping ("/") public String index(Model Model) {Model. productDAO.getProductList()); model.addAttribute("FRONTEND_APP_NAME", Application.APP_NAME); model.addAttribute("FRONTEND_SERVICE_TAG", Application.SERVICE_TAG); model.addAttribute("FRONTEND_IP", registration.getHost()); model.addAttribute("PRODUCT_APP_NAME", PRODUCT_APP_NAME); model.addAttribute("PRODUCT_SERVICE_TAG", PRODUCT_SERVICE_TAG); model.addAttribute("PRODUCT_IP", PRODUCT_IP); model.addAttribute("new_version", StringUtils.isBlank(env)); return "index.html"; }Copy the code

If our application uses an in-process cache library, such as the very common Guava Cache, and we want to track the health of that in-process cache, we need to wrap the monitored objects following the approach provided by Micrometer.

  • The Guava Cache modification consists of four steps; the code changes are small and easy to make (a sketch follows the steps):

1. Inject MeterRegistry; the concrete implementation injected by Spring Boot is PrometheusMeterRegistry;

2. Use the utility API GuavaCacheMetrics.monitor to wrap the local cache;

3. Enable cache statistics recording by calling .recordStats();

4. Give the cache a name so that the corresponding metrics are generated.
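
A minimal sketch of these four steps is shown below. The holder class and cache name are illustrative and not taken from the demo project.

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.GuavaCacheMetrics;
import org.springframework.stereotype.Component;

@Component
public class ProductCacheHolder {

    private final Cache<String, Object> productCache;

    public ProductCacheHolder(MeterRegistry registry) {          // 1. inject MeterRegistry
        Cache<String, Object> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)
                .recordStats()                                   // 3. enable statistics recording
                .build();
        // 2. wrap the cache with the utility API and 4. give it a name for the metrics
        this.productCache = GuavaCacheMetrics.monitor(registry, cache, "product_cache");
    }

    public Cache<String, Object> getCache() {
        return productCache;
    }
}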

  • The thread pool transformation is a three-step process and is not complicated (a sketch follows the steps):

1. Inject MeterRegistry, where the specific implementation is PrometheusMeterRegistry;

2. Wrap the thread pool with the utility API;

3. Give the thread pool a name so that the corresponding metrics are generated.
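
A minimal sketch of these three steps is shown below, using Micrometer's ExecutorServiceMetrics; the bean and pool names are illustrative and not taken from the demo project.

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.ExecutorServiceMetrics;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExecutorConfig {

    @Bean
    public ExecutorService workerPool(MeterRegistry registry) {   // 1. inject MeterRegistry
        ExecutorService delegate = Executors.newFixedThreadPool(8);
        // 2. wrap the pool with the utility API and 3. give it a name for the metrics
        return ExecutorServiceMetrics.monitor(registry, delegate, "worker_pool");
    }
}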

Of course, during development we will also have many custom metrics that are strongly related to the business. To monitor them, after injecting MeterRegistry into a Bean, we need to construct a Counter, Gauge, or Timer according to our requirements and the corresponding scenario (the differences and usage scenarios of these types are described above), use it to record the statistics, and register it with the MeterRegistry so that the metric is exposed. Here is a simple example.

@Service
public class DemoService {

    Counter visitCounter;

    public DemoService(MeterRegistry registry) {
        visitCounter = Counter.builder("visit_counter")
            .description("Number of visits to the site")
            .register(registry);
    }

    public String visit() {
        visitCounter.increment();
        return "Hello World!";
    }    
}

At this point, our application code transformation is complete. The next step is to rebuild the application image and redeploy it to the Kubernetes cluster where ARMS Prometheus is installed. After that, we configure a ServiceMonitor in the ARMS Prometheus console for service discovery.

After adding the ServiceMonitor, we can see the newly registered application Service in the Targets list.

Step 3: Dashboard configuration

The application monitoring data has now been successfully collected and stored in ARMS Prometheus. The next, and most critical, step is to configure the corresponding dashboards and alerts based on this data. Here, we build our own business monitoring dashboard with the help of open-source dashboard templates from the Grafana community. It is based on the following two templates:

  • Spring Boot 2.1 Statistics: Grafana.com/grafana/das…

  • JVM (Micrometer): Grafana.com/grafana/das…

With these templates and the Grafana service built into ARMS Prometheus, it is easy to organize the metrics we care most about in daily development and operations into a Grafana dashboard. Here is an example of a real dashboard that we built internally based on the templates above and our own business. It contains overview information such as component uptime and memory usage, as well as more detailed metrics such as heap and off-heap memory, GC behavior by generation, and the RED metrics of interface requests. You can give full play to your imagination and build your own unique, cool dashboards.

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.