1. How it works and the components used

Components:

  • node_exporter collects server metrics and exposes them on port 9100
  • Prometheus scrapes the monitoring data from port 9100 on each server and provides a query interface for the other components
  • Grafana visualizes the data, offering a better-looking interface for monitoring data plus simple alerting
  • Consul provides service discovery, so monitored servers can be registered automatically (if you only monitor a single machine, configure it statically in the Prometheus configuration file and skip Consul)
  • Alertmanager groups, routes, and inhibits the alerts fired by Prometheus and forwards them to the web service of prometheus-webhook-dingtalk
  • prometheus-webhook-dingtalk formats the alert messages and sends them to DingTalk
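
To make the first two components concrete: node_exporter serves plain text on port 9100, and Prometheus parses that text on every scrape. The sketch below parses a hand-written sample of that text format. The function name `parse_metrics` and the sample values are made up for illustration, and the parser assumes lines carry no trailing timestamp.

```python
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.073741824e+09
"""


def parse_metrics(text):
    """Parse 'series value' lines of the Prometheus text exposition format.

    Simplification (assumption): every sample line ends with the value,
    i.e. there is no optional timestamp after it.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field; everything
        # before it is the metric name plus its optional {labels}.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

Real scrapes go through Prometheus itself; a parser like this is only useful for checking by hand what an exporter returns.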

2. Install and configure Prometheus

Installation:

# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.7.2/prometheus-2.7.2.linux-amd64.tar.gz
# Extract
tar -xzf prometheus-2.7.2.linux-amd64.tar.gz

Basic configuration file, prometheus.yml:

global:
  scrape_interval:     15s # Collect data every 15s
  evaluation_interval: 15s # Evaluate rules every 15s

# Alertmanager connection settings
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']


# Alarm rule configuration file directory
rule_files: [ 'rules.yml' ]

scrape_configs:
  - job_name: 'prometheus-server'  
    static_configs:
      - targets: ['localhost:9100']
    
  # The following job uses Consul service discovery for automatic registration
  - job_name: 'node'
    consul_sd_configs:
      - server: '127.0.0.1:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*prometheus-target.*
        action: keep
      # Replace the IP in instance with the machine name in Consul
      - source_labels: [__meta_consul_service]
        target_label: instance
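
The `keep` rule in the 'node' job can be reasoned about offline. As a rough sketch, and not Prometheus's actual implementation: Consul service discovery joins a service's tags with a comma and wraps the result in commas as well, and relabel regexes are anchored at both ends, which is why the pattern needs the leading and trailing `.*`. The function name `keep_target` is hypothetical.

```python
import re


def keep_target(tags, pattern=r".*prometheus-target.*", separator=","):
    """Return True if a service with these Consul tags survives the keep rule."""
    # __meta_consul_tags looks like ",tag1,tag2," (separator on both ends).
    value = separator + separator.join(tags) + separator
    # Relabel regexes must match the whole value, hence fullmatch.
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    print(keep_target(["prometheus-target"]))
    print(keep_target(["web", "prod"]))
```

Because the match is anchored, a pattern of just `prometheus-target` would drop any service that has that tag, since the real value is `,prometheus-target,`.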

Start:

cd prometheus-2.7.2.linux-amd64
./prometheus &

Visit http://ip:9090/targets to see which servers are online and which are offline.

To open the Prometheus web UI, go to http://ip:9090 in a browser and query a metric such as node_load1 to check that data is arriving.
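
The same check can be scripted against the Prometheus HTTP API instead of the web UI. This is a minimal sketch that assumes Prometheus is reachable at the base URL you pass in. The helper names `instant_query_url`, `extract_values`, and `query` are illustrative, and only the standard `/api/v1/query` endpoint is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, expr):
    """Build the URL of an instant query against /api/v1/query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def extract_values(payload):
    """Map each series' instance label to its current value."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    # Each result holds a label set and a [timestamp, "value"] pair.
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def query(base_url, expr):
    """Run the query over HTTP (needs a reachable Prometheus server)."""
    with urlopen(instant_query_url(base_url, expr), timeout=5) as resp:
        return extract_values(json.load(resp))
```

For example, `query("http://ip:9090", "node_load1")` returns one entry per scraped instance. An empty dictionary means the metric has no data yet.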

3. Node_exporter installation

# Download the archive
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
# Extract
tar -xzf node_exporter-0.18.1.linux-amd64.tar.gz
# Start
cd node_exporter-0.18.1.linux-amd64
./node_exporter &

4. Grafana installation

# Download the package from the official website (change the version as needed)
wget https://dl.grafana.com/oss/release/grafana_6.2.5_amd64.deb
# Install dependencies
sudo apt-get install -y adduser libfontconfig1
# Install
sudo dpkg -i grafana_6.2.5_amd64.deb
# Start
sudo service grafana-server start
# Start on boot
sudo update-rc.d grafana-server defaults

Grafana listens on port 3000 by default. The default username and password are admin/admin.

For more details, see the official Grafana installation and usage guide.

Once Prometheus, node_exporter, and Grafana are installed, open http://ip:3000 in a browser and log in to Grafana as admin. The first login forces you to change the password.

After logging in to Grafana, add Prometheus as a data source:

Click the gear icon => Data Sources => Add data source, then choose Prometheus.

After the data source is added, you can build your own dashboards.

5. Consul installation

Installation:

# Docker installation
# Pull the image
docker pull consul
# Run and bind the port
docker run --name consul -d -p 8500:8500 consul

# Archive installation
# Download the package (amd64 build, matching the other downloads in this guide)
wget https://releases.hashicorp.com/consul/1.5.2/consul_1.5.2_linux_amd64.zip
# Extract
unzip consul_1.5.2_linux_amd64.zip
# Copy consul to a directory on PATH
cp consul /usr/local/bin/
# Start
# -bind: the server's own IP address
# -client: the address the client interfaces listen on (0.0.0.0 accepts requests from any host)
consul agent -server -ui -bootstrap-expect 1 -data-dir /tmp/consul -bind=192.168.50.19 -client 0.0.0.0 2>&1 &

Add the following job under scrape_configs in prometheus.yml:

  - job_name: 'consul-prometheus'
    consul_sd_configs:
      # Consul address
      - server: 'xx.xx.xx.xx:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*prometheus-target.*
        action: keep

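Register a service:

In the definition below, "address" is the IP address of the service being registered, "http" under "checks" is the URL polled by the health check, and "interval" is the time between health checks. JSON does not allow comments, so keep the file free of them.

{
  "id": "prometheus-server",
  "name": "prometheus-node",
  "address": "192.168.50.19",
  "port": 9100,
  "tags": ["prometheus-target"],
  "checks": [
    {
      "http": "http://www.baidu.com",
      "interval": "15s"
    }
  ]
}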

Save the service definition as a JSON file (regitor.json here) and register the service with it:

curl --request PUT --data @regitor.json http://localhost:8500/v1/agent/service/register
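
With many servers, writing each JSON file by hand becomes tedious. The sketch below builds the same service definition in Python and sends it with the same PUT request as the curl command. The function names `build_registration` and `register` are illustrative, the Consul address defaults to the `http://localhost:8500` used above, and the keys mirror the JSON in this article.

```python
import json
from urllib.request import Request, urlopen


def build_registration(service_id, name, address, port, tags, check_url, interval="15s"):
    """Return a service definition shaped like the JSON file above."""
    return {
        "id": service_id,
        "name": name,
        "address": address,
        "port": port,
        "tags": list(tags),
        "checks": [{"http": check_url, "interval": interval}],
    }


def register(definition, consul="http://localhost:8500"):
    """PUT the definition to the local Consul agent (needs a running agent)."""
    req = Request(
        consul + "/v1/agent/service/register",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    definition = build_registration(
        "prometheus-server", "prometheus-node", "192.168.50.19", 9100,
        ["prometheus-target"], "http://www.baidu.com",
    )
    print(json.dumps(definition, indent=2))
```

Keeping the "prometheus-target" tag in every definition matters, because the relabel rule in prometheus.yml keeps only services that carry it.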

Deregister a service:

Send a PUT request to http://localhost:8500/v1/agent/service/deregister/ followed by the ID of the service:

# userService1 is the ID of the service to remove
curl --request PUT http://localhost:8500/v1/agent/service/deregister/userService1

6. Install and configure Alertmanager

Installation:

# Download the package
wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
# Extract
tar -xzf alertmanager-0.17.0.linux-amd64.tar.gz
# Start
cd alertmanager-0.17.0.linux-amd64
./alertmanager

alertmanager.yml configuration:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h   # How often to resend an alert that is still firing
  receiver: 'webhook'   # Use the 'webhook' receiver, which feeds the DingTalk robot
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/ops_dingding/send'     # prometheus-webhook-dingtalk endpoint that relays the message to DingTalk
    send_resolved: true         # Whether to send a notification once the alert is resolved
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
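
The inhibit rule is the least obvious part of this file: a 'warning' alert is muted while a 'critical' alert with the same alertname, dev, and instance labels is firing. The sketch below models that decision on plain dictionaries. It is a simplified illustration, not Alertmanager's code, and the function name `is_inhibited` is hypothetical.

```python
def is_inhibited(target, firing, equal=("alertname", "dev", "instance")):
    """Return True if the alert `target` (a label dict) is muted by an alert in `firing`."""
    # The rule only ever mutes alerts matching target_match.
    if target.get("severity") != "warning":
        return False
    for source in firing:
        # Only alerts matching source_match can mute others.
        if source.get("severity") != "critical":
            continue
        # A missing label counts as empty, so two alerts that both lack
        # 'dev' still agree on it.
        if all(source.get(name, "") == target.get(name, "") for name in equal):
            return True
    return False


if __name__ == "__main__":
    critical = {"alertname": "disk alarm", "instance": "192.168.50.19:9100", "severity": "critical"}
    warning = {"alertname": "disk alarm", "instance": "192.168.50.19:9100", "severity": "warning"}
    print(is_inhibited(warning, [critical]))
```

Note that the rules in this article only set a team label, so the inhibit rule has no effect until the alert rules also carry a severity label.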

Add the following to prometheus.yml:

rule_files: [ 'rules.yml' ]

rules.yml configuration example:

groups:
- name: host_monitoring
  rules:
    - alert: memory alarm
      expr: ((node_memory_MemFree_bytes) + (node_memory_Cached_bytes) + (node_memory_Buffers_bytes)) / 1024 / 1024 < 500
      for: 2m
      labels:
        team: node
      annotations:
        #Alert_type: memory alarm
        #Server: '{{$labels.instance}}'
        #summary: "{{$labels.instance}}: High Memory usage detected"
        explain: "Free memory < 500MB, value: {{ $value }}MB"
        #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
    - alert: disk alarm
      expr: (max(node_filesystem_avail_bytes{device=~"/dev.*"}) by (instance)) / 1024 / 1024 < 1024
      for: 2m
      labels:
        team: node
      annotations:
        #Alert_type: disk alarm
        #Server: '{{$labels.instance}}'
        explain: "Available disk capacity less than 1 GiB, value: {{ $value }}MiB"
    - alert: service alarm
      expr: up == 0
      for: 2m
      labels:
        team: node
      annotations:
        #Alert_type: service alarm
        #Server: '{{$labels.instance}}'
        explain: "Node_exporter service disconnected"
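
The arithmetic in the two threshold expressions can be checked by hand. The sketch below redoes it in Python on made-up byte counts: dividing by 1024 twice converts bytes to MiB, so the memory rule fires below 500 MiB and the disk rule below 1024 MiB (1 GiB). The function names are illustrative, and the `for: 2m` hold time is not modelled.

```python
MIB = 1024 * 1024


def memory_alarm(mem_free, cached, buffers, threshold_mib=500):
    """Mirror the memory expression: (free + cached + buffers) in MiB < 500."""
    available_mib = (mem_free + cached + buffers) / MIB
    return available_mib < threshold_mib, available_mib


def disk_alarm(avail_bytes_by_device, threshold_mib=1024):
    """Mirror the disk expression: max available bytes per instance, in MiB, < 1024."""
    largest_mib = max(avail_bytes_by_device.values()) / MIB
    return largest_mib < threshold_mib, largest_mib


if __name__ == "__main__":
    print(memory_alarm(200 * MIB, 100 * MIB, 50 * MIB))
    print(disk_alarm({"/dev/sda1": 512 * MIB, "/dev/sdb1": 2048 * MIB}))
```

Because the disk expression takes the max per instance, it fires only when the filesystem with the most free space drops below 1 GiB. In the example, /dev/sda1 is down to 512 MiB, yet no alarm is raised.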

After the configuration is complete and Prometheus has been restarted to load it, the alert rules are listed at http://ip:9090/alerts.

7. Install and configure prometheus-webhook-dingtalk

Since prometheus-webhook-dingtalk is written in Go, install Go first:

# Install and configure Go

# Download the release archive
wget https://dl.google.com/go/go1.10.3.linux-amd64.tar.gz
# Extract
tar -C /usr/local -xzf go1.10.3.linux-amd64.tar.gz
# Add the Go binaries to PATH: append the two export lines below to /etc/profile, then reload it
vim /etc/profile
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin
source /etc/profile

Install and configure prometheus-webhook-dingtalk:

# Create the project's parent directory under Go's src directory and cd into it
mkdir -p /usr/local/go/src/github.com/timonwong
cd /usr/local/go/src/github.com/timonwong
# Clone the project and build it

git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make

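Start the service:

# The profile name ops_dingding must match the path in the Alertmanager webhook URL;
# dingding_webhook stands for the webhook URL of your DingTalk robot.
# The --ding.profile flag is the form used by older releases; check --help for your build.
./prometheus-webhook-dingtalk --ding.profile="ops_dingding=dingding_webhook" > dingding.log 2>&1 &
# The service listens on port 8060; run netstat -a | grep 8060 to check that it started normally

At this point, all the configuration is complete. When a server raises an alarm, the DingTalk robot automatically sends a message.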
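
To show what travels along this last hop, the sketch below turns a hand-written example of the JSON that Alertmanager posts to a webhook into a DingTalk robot markdown message. It only illustrates the data flow and is not the template prometheus-webhook-dingtalk actually uses. The function name `to_dingtalk_markdown`, the title text, and the sample alert are made up.

```python
def to_dingtalk_markdown(webhook_payload, title="Prometheus alert"):
    """Convert an Alertmanager webhook payload into a DingTalk markdown message."""
    lines = []
    for alert in webhook_payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # One heading per alert, then the fields defined in rules.yml.
        lines.append("### [%s] %s" % (alert.get("status", "").upper(), labels.get("alertname", "")))
        lines.append("- instance: %s" % labels.get("instance", ""))
        lines.append("- explain: %s" % annotations.get("explain", ""))
    return {"msgtype": "markdown", "markdown": {"title": title, "text": "\n".join(lines)}}


if __name__ == "__main__":
    sample = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "service alarm", "instance": "192.168.50.19:9100", "team": "node"},
                "annotations": {"explain": "Node_exporter service disconnected"},
            }
        ],
    }
    print(to_dingtalk_markdown(sample)["markdown"]["text"])
```

With send_resolved set to true in alertmanager.yml, the same alert arrives a second time with status "resolved", which this sketch renders as a [RESOLVED] heading.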

~~~ OVER ~~~