Prometheus is a next-generation monitoring framework: a Prometheus server that handles millions of metrics, combined with Grafana's wide range of visualization middleware. This article focuses on how we use Prometheus + Grafana to monitor the request response time of a Spring Boot microservice.
1. The deployment of Prometheus
To do a good job, you must first sharpen your tools, so we need to deploy Prometheus first. Whether you deploy it with Docker or start it directly from the unpacked archive, Prometheus will monitor itself by default, because it has not yet been configured with the addresses of the services it is supposed to monitor.
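For example, here is a minimal Docker Compose sketch, assuming the official prom/prometheus image and a prometheus.yml in the current directory (ports and paths may need adjusting for your environment):

version: "3"
services:
  prometheus:
    image: prom/prometheus          # official Prometheus image
    ports:
      - "9090:9090"                 # Prometheus web UI / API
    volumes:
      # Mount your own configuration over the default one inside the container.
      - ./prometheus.yml:/etc/prometheus/prometheus.yml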
2. The configuration of Prometheus
Open the Prometheus default configuration file, prometheus.yml:
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Alertmanager service address (needed only if you use Prometheus alerting).
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:8200"]

# Files that define the alert conditions; this section is ignored if you do
# not use Prometheus alerting.
rule_files:
  - "alert.rules"

# A scrape configuration containing exactly one endpoint to scrape.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any
  # time series scraped from this config.
  - job_name: 'sync_bigdata_platform'

    # Override the global default and scrape targets from
    # this job every 5 seconds.
    scrape_interval: 5s

    # The API path the service exposes to Prometheus; Prometheus pulls metrics through it.
    metrics_path: '/actuator/prometheus'

    # HTTP is used by default.
    scheme: http

    # Address where the service is exposed.
    static_configs:
      - targets: ['localhost:8080']


At this point we have finished configuring a simple job that pulls metric information from the service side. Restart Prometheus and open http://localhost:9090/targets to check whether the registered job instance is up; if it is not, Prometheus cannot pull its metric information.

3. Configure the Spring Boot service to expose its own monitoring data

How does Spring Boot collect custom metric information?
Just add the following dependencies to the Spring Boot project:
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Spring Boot Actuator provides powerful monitoring capabilities for applications. Since 2.x, the Actuator is built on Micrometer. Micrometer is a monitoring facade: through it you can connect to a variety of monitoring systems. After importing the dependencies, the configuration file needs to enable the Actuator endpoints:
management:
  endpoint:
    health:
      enabled: false
  endpoints:
    web:
      exposure:
        include: '*'
        exclude: env,beans
  metrics:
    enable:
      http: false
      hikaricp: false
Start the Spring Boot service and you can see some JVM metric statistics through the http://localhost:8080/actuator/prometheus endpoint. So how do we collect custom metric statistics?
3.1 Inject MeterRegistry into your own service class. MeterRegistry is an abstract class provided by Micrometer that acts as Micrometer's meter repository; through it you can add MeterFilters and have percentiles (0.5, 0.75, 0.9) computed for the recorded statistics. If you only need a Counter for service accesses and a Timer for server response time, the following can be used directly:
Metrics.counter("http_requests_total", Tags.of("service", service, "category", category, "method", method)).increment();
Metrics.timer("requests_latency", Tags.of("service", service, "category", category, "method", method)).record(Duration.ofMillis(time));
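Instead of (or in addition to) configuring percentiles on the MeterRegistry in code, Spring Boot can also enable them per meter in application.yml. A minimal sketch, assuming Spring Boot 2.x property binding and the requests_latency timer from the snippet above (the bracket notation preserves the underscore in the map key):

management:
  metrics:
    distribution:
      percentiles:
        "[requests_latency]": 0.5, 0.75, 0.9  # publish p50/p75/p90 for the custom timer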
3.2 On the Prometheus web UI you can view the values of the various metrics and aggregate them with PromQL.

4. Draw the required graph in Grafana

The nice thing about using the data collected by Prometheus to draw the graphs you want in Grafana is that when you record a data item it stays a single data item and does not turn into extra data items. The functions in the official Prometheus documentation also describe their usage scenarios, for example drawing the average request latency over the last minute:
  rate(requests_latency_seconds_sum{job=~"$job", instance=~"$instance", service=~"$service"}[1m])
/
  rate(requests_latency_seconds_count{job=~"$job", instance=~"$instance", service=~"$service"}[1m])

Because we want an average over an interval rather than an instantaneous value, we use the rate function; the same applies when charting the request count. The catch is that the value Prometheus calculates is only an estimate, an approximate linear fit: the samples do not fall exactly on the interval boundaries, so the result can be a decimal. In that case all we can do is round up or down (for example with ceil or floor), so that people looking at the graph do not wonder why a count of whole requests shows a decimal.

We let Prometheus pull the monitoring data from our service instead of pushing data through Prometheus's push gateway, because active pushing would be a hidden risk for our service's responsiveness: if Prometheus ever has a problem, pushes fail and our service's responses could be badly affected. With pull, if Prometheus has a problem the monitoring data simply cannot be pulled, but the service keeps responding.
4.1 Use Prometheus alerting or Grafana's alert function?
When setting up alerting, I was torn between Prometheus alerting and Grafana alerting. The most striking thing about Prometheus alerting is how it evaluates thresholds. You set a threshold that determines whether the alert condition is met. If a value suddenly spikes past that threshold, Grafana triggers the alert immediately, whereas Prometheus first puts the alert into its pending queue; only if the value stays above the threshold for the whole duration you specified is the alert moved to the firing queue and the warning actually sent. This is better, because it does not keep alerting on a single sudden outlier.

The other advantage of using Prometheus's Alertmanager is that you can write alert conditions directly in PromQL, instead of having to build aggregations of the raw data into alert conditions in Grafana. Alertmanager also provides alert silences to suppress alerts.
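As a concrete illustration of the pending/firing behaviour, here is a minimal sketch of an alert.rules file; the metric name, threshold and durations are illustrative assumptions, not taken from the original setup:

groups:
- name: latency-alerts
  rules:
  - alert: HighRequestLatency
    # PromQL condition: average request latency over the last minute above 0.5s.
    expr: rate(requests_latency_seconds_sum[1m]) / rate(requests_latency_seconds_count[1m]) > 0.5
    # The alert sits in the pending state for 5 minutes; only if the condition
    # still holds after that does it move to firing and get sent to Alertmanager.
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Average request latency has been above 0.5s for 5 minutes"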