This section continues the discussion of Prometheus + Grafana from the previous section, Monitoring with Prometheus + Grafana (1).

The goal

In this section, our goal is to build a multi-dimensional visual monitoring platform for microservices, covering Docker container monitoring, MySQL monitoring, Redis monitoring, and microservice JVM monitoring, and to send warning emails when necessary.

The main components used are Prometheus, Grafana, Alertmanager, node_exporter, mysql_exporter, redis_exporter, and cAdvisor. Their respective roles are as follows:

  1. Prometheus: scrapes and stores monitoring data for third parties to query;
  2. Grafana: provides web pages for visual presentation of the monitoring data held in Prometheus;
  3. Alertmanager: receives alerts fired by Prometheus rules, groups them, and sends alert notifications (e.g. e-mail);
  4. node_exporter: collects host-level monitoring data such as CPU, memory, and disk (used together with Prometheus);
  5. mysql_exporter: collects MySQL database monitoring data;
  6. redis_exporter: collects Redis monitoring data;
  7. cAdvisor: collects Docker container monitoring data.

Install Grafana, Prometheus, and monitoring services using Docker

In the last section we installed Grafana and Prometheus directly with the Windows installers, but day-to-day production environments run on Linux, so here we choose the more convenient Docker for installation and deployment.

  • Create prometheus.yml in your own mount directory
# Create the Prometheus mount directory
mkdir -p /dimples/volumes/prometheus
# Create the Prometheus configuration file in that directory
vim /dimples/volumes/prometheus/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
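Before wiring the file into a container, you can optionally validate it with promtool, which ships inside the prom/prometheus image; a minimal sketch, assuming the mount directory created above:

# Validate prometheus.yml without starting a server
docker run --rm -v /dimples/volumes/prometheus:/etc/prometheus \
  --entrypoint promtool prom/prometheus check config /etc/prometheus/prometheus.yml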
  • Create alertmanager.yml in your own mount directory (/dimples/volumes/alertmanager/alertmanager.yml, matching the volume mount in the compose file below)
global:
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  # qq mailbox to obtain the authorization code
  smtp_auth_password: 'xxxxxxxxxxxxxxxxx'
  smtp_require_tls: false

#templates:
# - '/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 5m
  receiver: 'default-receiver'

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
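The Alertmanager configuration can be checked in the same way with amtool, which is bundled in the prom/alertmanager image; a sketch assuming the file lives under /dimples/volumes/alertmanager:

# Validate alertmanager.yml before starting the container
docker run --rm -v /dimples/volumes/alertmanager:/etc/alertmanager \
  --entrypoint amtool prom/alertmanager check-config /etc/alertmanager/alertmanager.yml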
  • Create the docker-compose.yml file
version: '3'

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - /dimples/volumes/prometheus/:/etc/prometheus/
    ports:
      - "9090:9090"
    restart: on-failure
    command:
      # command replaces the image's default arguments, so --config.file must be passed explicitly
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'
  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
  node_exporter:
    image: prom/node-exporter
    container_name: node_exporter
    ports:
      - "9100:9100"
  redis_exporter:
    image: oliver006/redis_exporter
    container_name: redis_exporter
    command:
      - "-- redis. Addr = redis: / / 127.0.0.1:6379"
      - "--redis.password 'ZHONG9602.class'"    If there is no password, this parameter is not required
    ports:
      - "9101:9121"
    restart: on-failure
  mysql_exporter:
    image: prom/mysqld-exporter
    container_name: mysql_exporter
    environment:
      - DATA_SOURCE_NAME=root:123456@(127.0.0.1:3306)/
    ports:
      - "9102:9104"
  cadvisor:
    image: google/cadvisor
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "9103:8080"
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    volumes:
      - /dimples/volumes/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9104:9093"

Start the service with docker-compose up -d

# If you do not want to use docker-compose, the containers can also be started individually with docker run
docker run -d --name prometheus -p 9090:9090 \
  -v /dimples/volumes/prometheus:/etc/prometheus \
  prom/prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle

docker run -d --name redis_exporter -p 9101:9121 \
  oliver006/redis_exporter --redis.addr redis://127.0.0.1:6379 --redis.password 'zhong9602.class'
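Once the stack is up, it is worth confirming that every container is actually running before moving on; a quick check, using the container names defined in the compose file above:

# List the monitoring containers with their status and port mappings
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
# Inspect the logs of any container that failed to start, e.g. prometheus
docker logs prometheus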
  • Check that monitoring data is being collected

http://127.0.0.1:9090/alerts

As shown in the figure above, the two alerting rules we defined have been loaded successfully.

Then go to http://127.0.0.1:9090/targets to check the status of each job defined in the Prometheus configuration file:

You can see that all of the monitored targets are in the UP state.

You can also click the Endpoint links on that page; if the page shows the collected metrics, the endpoint is being scraped successfully. Taking mysql_exporter as an example, go to http://127.0.0.1:9102/metrics
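The same check can be scripted instead of opening a browser; a small sketch (mysql_up is one of the gauges exposed by mysqld_exporter and should be 1 when the exporter can reach MySQL):

# Fetch the exporter metrics and look for the connectivity gauge
curl -s http://127.0.0.1:9102/metrics | grep '^mysql_up'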

Visit http://127.0.0.1:9104/#/status to see whether the rules we configured in alertmanager.yml have taken effect:
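The same information is available from the Alertmanager HTTP API (v2, available in recent Alertmanager releases), which is handy on a headless server; a minimal sketch against the port mapping used above:

# Show Alertmanager status, including the loaded configuration
curl -s http://127.0.0.1:9104/api/v2/status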

Configure Java application monitoring

In the configuration above we have only displayed, through Grafana, the data Prometheus collects about itself and the exporters; the core goal is to collect data from our Java applications through Prometheus. This requires Prometheus, using its pull model, to periodically scrape the monitoring metrics that Spring Boot Actuator exposes via Micrometer.

  • The first thing to do is complete the Micrometer integration in the Java application: accessing /actuator/prometheus (or /prometheus) should return the metrics collected by Micrometer (this step was described in detail in the previous section and is not repeated here); a quick check is shown after this list.
  • Then go to the directory where Prometheus is deployed, edit prometheus.yml to configure the pull targets, and add the Java application to the scrape_configs node of the configuration file.
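A minimal sketch of that check, assuming the demo service listens on port 8001 as in the scrape configuration below:

# The endpoint should return Micrometer metrics in the Prometheus text format
curl -s http://127.0.0.1:8001/actuator/prometheus | head -n 20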

Modify prometheus.yml to configure all monitoring services

Prometheus is not yet configured to scrape anything beyond itself, so we modify the prometheus.yml file to add every data source we want to monitor, as shown below.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['127.0.0.1:9100']
        labels:
          instance: 'node_exporter'
  - job_name: 'redis_exporter'
    static_configs:
      - targets: ['127.0.0.1:9101']
        labels:
          instance: 'redis_exporter'
  - job_name: 'mysql_exporter'
    static_configs:
      - targets: ['127.0.0.1:9102']
        labels:
          instance: 'mysql_exporter'
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['127.0.0.1:9103']
        labels:
          instance: 'cadvisor'
  - job_name: 'server-demo-actuator'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:8001']
        labels:
          instance: 'server-demo'

rule_files:
  - 'memory_over.yml'
  - 'server_down.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["127.0.0.1:9104"]

PS: the targets of each job is an array, so monitoring data exposed by the corresponding exporter on several servers can be collected under one job.

Next, create the two rule files referenced above, memory_over.yml and server_down.yml.

# Create memory_over.yml
vim /dimples/volumes/prometheus/memory_over.yml

As follows:

groups:
  - name: memory_over
    rules:
      - alert: MemoryOver
        # Memory usage above 80% (same expression as in the host rules in the reference section at the end of this article)
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
        for: 20s
        labels:
          user: Dimples
        annotations:
          summary: "Instance {{ $labels.instance }} memory usage over 80%"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has had memory usage above 80% for more than 20 s."

This fires a monitoring alert when a node's memory usage exceeds 80% and stays above that level for more than 20 seconds.

Create server_down.yml:

# server_down.yml
vim /dimples/volumes/prometheus/server_down.yml

As follows:

groups:
  - name: server_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 20s
        labels:
          user: Dimples
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 20 s."

This fires an alert when a node has been down (up == 0; a value of 1 means it is running normally) for more than 20 seconds.
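Because the Prometheus container was started with --web.enable-lifecycle, the new rule files can be validated and then hot-loaded without restarting it; a sketch assuming the paths used above:

# Validate the rule files
docker run --rm -v /dimples/volumes/prometheus:/etc/prometheus \
  --entrypoint promtool prom/prometheus check rules /etc/prometheus/memory_over.yml /etc/prometheus/server_down.yml
# Ask the running Prometheus to reload its configuration and rules
curl -X POST http://127.0.0.1:9090/-/reload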

Using Grafana

Use a browser to access http://127.0.0.1:3000 (the Grafana port mapped above). The default user name and password are both admin, and you are asked to change the password on first login.

Step 1: add the Prometheus data source, which was described in detail in the previous section and is not repeated here. The result is shown in the following figure:
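If you prefer scripting this step, the data source can also be created through Grafana's HTTP API; a minimal sketch, assuming the default admin/admin credentials and the ports mapped above:

# Register the Prometheus data source via the Grafana API
# Note: the url must be reachable from the Grafana server; replace 127.0.0.1 with the host address if needed
curl -X POST http://admin:admin@127.0.0.1:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://127.0.0.1:9090","access":"proxy"}'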

After the data source has been added successfully, we can add monitoring panels. As before, we can go to the official Grafana dashboard marketplace and pick templates configured by others: grafana.com/grafana/das…

I have collected some useful monitoring templates and uploaded them to Weiyun as a backup; you only need to download and import them (link: share.weiyun.com/XDzICKtf).

The following uses MySQL monitoring as an example to demonstrate importing a template:

Click Upload JSON file and select the corresponding file. After the upload succeeds, the import screen pops up automatically; click Import.

Additional reference

Prometheus and Alertmanager support much richer alerting configurations; below is a more complete set of alerting rules covering instances, hosts, and containers:

groups:
  - name: example                     # rule group name
    rules:
      - alert: InstanceDown           # alert name
        expr: up == 0                 # PromQL expression that triggers the rule
        for: 1m                       # must hold for 1 minute
        labels:                       # labels describing severity and host
          name: instance
          severity: Critical
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute."
          value: "{{ $value }}"
  - name: Host
    rules:
      - alert: HostMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          name: Memory
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Host memory usage exceeds 80%."
          value: "{{ $value }}"
      - alert: HostCPUUsage
        expr: sum(avg without (cpu) (irate(node_cpu_seconds_total{mode!="idle"}[5m]))) by (instance, appname) > 0.65
        for: 1m
        labels:
          name: CPU
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Host CPU usage exceeds 65%."
          value: "{{ $value }}"
      - alert: HostLoad
        expr: node_load5 > 4
        for: 1m
        labels:
          name: Load
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Host 5-minute load exceeds 4."
          value: "{{ $value }}"
      - alert: HostFilesystemUsage
        expr: 1 - (node_filesystem_free_bytes / node_filesystem_size_bytes) > 0.8
        for: 1m
        labels:
          name: Disk
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Host filesystem usage exceeds 80%."
          value: "{{ $value }}%"
      - alert: HostDiskio
        expr: irate(node_disk_writes_completed_total{job=~"Host"}[1m]) > 10
        for: 1m
        labels:
          name: Diskio
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Host disk write rate is too high."
          value: "{{ $value }} iops"
      - alert: Network_receive
        expr: irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 3
        for: 1m
        labels:
          name: Network_receive
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Inbound network traffic exceeds 3 Mbps."
          value: "{{ $value }} Mbps"
      - alert: Network_transmit
        expr: irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 3
        for: 1m
        labels:
          name: Network_transmit
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }}"
          description: "Outbound network traffic exceeds 3 Mbps."
          value: "{{ $value }} Mbps"
  - name: Container
    rules:
      - alert: ContainerCPUUsage
        expr: (sum by (name, instance) (rate(container_cpu_usage_seconds_total{image!=""}[5m])) * 100) > 60
        for: 1m
        labels:
          name: CPU
          severity: Warning
        annotations:
          summary: "{{ $labels.name }}"
          description: "Container CPU usage exceeds 60%."
          value: "{{ $value }}%"
      - alert: ContainerMemUsage
        # expr: (container_memory_usage_bytes - container_memory_cache) / container_spec_memory_limit_bytes * 100 > 10
        expr: container_memory_usage_bytes{name=~".+"} / 1048576 > 1024
        for: 1m
        labels:
          name: Memory
          severity: Warning
        annotations:
          summary: "{{ $labels.name }}"
          description: "Container memory usage exceeds 1GB."
          value: "{{ $value }}G"
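To actually use these rules, save them into the mounted Prometheus directory, reference the file under rule_files in prometheus.yml, and reload; a sketch with a hypothetical file name host_container_rules.yml:

# Save the rules next to the other rule files
vim /dimples/volumes/prometheus/host_container_rules.yml
# Add '- host_container_rules.yml' to rule_files in prometheus.yml, then hot-reload
curl -X POST http://127.0.0.1:9090/-/reload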

Besides e-mail, alerts can also be received through WeChat Work (enterprise WeChat); for details see songjiayang.gitbooks.io/prometheus/…
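A minimal sketch of what such a receiver can look like in alertmanager.yml, assuming the corp_id, agent_id, and api_secret come from your WeChat Work admin console (all values below are placeholders):

receivers:
  - name: 'wechat-receiver'
    wechat_configs:
      - corp_id: 'ww0000000000000000'     # placeholder enterprise ID
        agent_id: '1000001'               # placeholder application agent ID
        api_secret: 'xxxxxxxxxxxxxxxx'    # placeholder application secret
        to_user: '@all'
        send_resolved: true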