This is the fifth day of my participation in the More Text Challenge.

AlertManager

This post follows on from the previous one on customizing Prometheus.

Preface

Once a monitoring system is in place, an alerting mechanism is essential. It pushes messages through channels such as email, SMS, DingTalk, and enterprise WeChat, so that operations staff can find and fix problems as soon as possible.

1. Create AlertManager

As usual, start by copying the default configuration file out of an existing container:

docker cp alertmanager:/etc/alertmanager/alertmanager.yml .

Start the AlertManager

docker run --name alertmanager -d -p 9093:9093 -v /Users/yujian/Documents/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml  prom/alertmanager:latest

2. Configure the AlertManager notification method

To send alerts by email, modify alertmanager.yml:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: [email protected]
  smtp_auth_username: [email protected]
  smtp_auth_password: xxxxx
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  - to: [email protected]

At this point, the AlertManager notification configuration is complete.

3. Create an alarm rule

Alert rules define the conditions under which an alert fires. They are configured in, and evaluated by, Prometheus.

# Modify prometheus.yml
rule_files:
  - "/etc/prometheus/rules.yml"
  # - "second_rules.yml"

At this point there is no /etc/prometheus/rules.yml file yet, so we have to create one:

vi rule.yml

groups:
- name: node-up
  rules:
  - alert: cpumax  # alert name
    expr: easy_prometheus_system_cpu_percent{job="easy_prometheus"} > 20  # PromQL
    for: 4s
    labels:
      # severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} CPU usage exceeds 20%!"
      # summary: "{{ $labels.instance }} stop running!"  # for a rule whose expression ends in "== 0"

Recreate the Prometheus container with rule.yml mounted at /etc/prometheus/rules.yml. Once it has started, open the Alerts page to check whether the rule was loaded successfully.
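The effect of the for: 4s setting can be illustrated with a small Go sketch. It is a simplified model of how a threshold rule moves between inactive, pending, and firing, not Prometheus's actual implementation. The tracker type, the simulate helper, and the 2-second evaluation interval are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// tracker models a single "value > threshold for <hold>" rule (hypothetical type).
type tracker struct {
	threshold float64
	hold      time.Duration
	active    bool
	activeAt  time.Time
}

// eval returns the rule state after one evaluation at time now.
func (t *tracker) eval(now time.Time, value float64) string {
	if value <= t.threshold {
		t.active = false
		return "inactive"
	}
	if !t.active {
		// First evaluation above the threshold: start the "for" clock.
		t.active = true
		t.activeAt = now
	}
	if now.Sub(t.activeAt) >= t.hold {
		return "firing"
	}
	return "pending"
}

// simulate evaluates the values at an assumed 2s interval against
// a threshold of 20 held for 4s, and joins the resulting states.
func simulate(values []float64) string {
	t := &tracker{threshold: 20, hold: 4 * time.Second}
	start := time.Unix(0, 0)
	states := make([]string, 0, len(values))
	for i, v := range values {
		now := start.Add(time.Duration(i) * 2 * time.Second)
		states = append(states, t.eval(now, v))
	}
	return strings.Join(states, " ")
}

func main() {
	fmt.Println(simulate([]float64{10, 25, 30, 35, 15}))
}
```

In this model the alert only reaches firing once the value has stayed above the threshold for the whole hold period, and a single evaluation back under the threshold resets it.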

Webhook

route:
  group_by: ['instance']
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 20s
  # repeat_interval: 1h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://192.168.31.150:8089/webhook'
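On the receiving side, the URL in webhook_configs needs an HTTP endpoint that accepts a POST with a JSON body. The following is a minimal sketch using only the Go standard library. The names notification, webhookHandler, and post are made up for this example, and the handler is exercised in-process with httptest instead of a real listener so that the program terminates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// notification holds only the fields this sketch reads from the
// JSON body that AlertManager POSTs to a webhook receiver.
type notification struct {
	Status string `json:"status"`
	Alerts []struct {
		Status string `json:"status"`
	} `json:"alerts"`
}

// webhookHandler decodes one notification and answers 200 on success,
// 405 for non-POST requests, and 400 for a body that is not valid JSON.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	var n notification
	if err := json.NewDecoder(r.Body).Decode(&n); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	log.Printf("group status=%s alerts=%d", n.Status, len(n.Alerts))
	fmt.Fprintln(w, "ok")
}

// post sends body to h as an in-process POST request and returns the status code.
func post(h http.HandlerFunc, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(post(webhookHandler, `{"status":"firing","alerts":[{"status":"firing"}]}`))
	fmt.Println(post(webhookHandler, `not json`))
}
```

To serve real traffic, the same handler could be registered with http.HandleFunc("/webhook", webhookHandler) and started with http.ListenAndServe(":8089", nil), which matches the port and path of the URL configured above.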

The message format

{
  "receiver": "webhook",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "action": "Cpu utilization",
        "alertname": "cpumax",
        "application": "easy_prometheus",
        "cause": "The Cpu",
        "exported_application": "easy_prometheus",
        "instance": "192.168.31.150:8089",
        "job": "easy_prometheus"
      },
      "annotations": {
        "summary": "192.168.31.150:8089 CPU usage exceeds the threshold 20%!"
      },
      "startsAt": "2021-06-19T03:21:56.117Z",
      "endsAt": "2021-06-19T03:22:11.117Z",
      "generatorURL": "http://406161e43292:9090/graph?g0.expr=easy_prometheus_system_cpu_percent%7Bjob%3D%22easy_prometheus%22%7D+%3E+20\u0026g0.tab=1",
      "fingerprint": "1bcf523f0c524538"
    }
  ],
  "groupLabels": {"instance": "192.168.31.150:8089"},
  "commonLabels": {
    "application": "easy_prometheus",
    "instance": "192.168.31.150:8089",
    "job": "easy_prometheus"
  },
  "commonAnnotations": {},
  "externalURL": "http://c731ba69bfca:9093",
  "version": "4",
  "groupKey": "{}:{instance=\"192.168.31.150:8089\"}",
  "truncatedAlerts": 0
}
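To work with this message in Go, it can be decoded into typed structs. The sketch below uses a trimmed copy of the message above as sample input. The type names Alert and Payload and the helper describe are illustrative choices, not part of AlertManager or of easy-prometheus, and only a subset of the fields is mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Alert maps a subset of the per-alert fields in the webhook message.
type Alert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
	EndsAt      time.Time         `json:"endsAt"`
}

// Payload maps a subset of the top-level fields in the webhook message.
type Payload struct {
	Receiver    string            `json:"receiver"`
	Status      string            `json:"status"`
	Alerts      []Alert           `json:"alerts"`
	GroupLabels map[string]string `json:"groupLabels"`
}

// sample is a trimmed copy of the message shown above.
const sample = `{
  "receiver": "webhook",
  "status": "resolved",
  "alerts": [{
    "status": "resolved",
    "labels": {"alertname": "cpumax", "instance": "192.168.31.150:8089"},
    "annotations": {"summary": "192.168.31.150:8089 CPU usage exceeds the threshold 20%!"},
    "startsAt": "2021-06-19T03:21:56.117Z",
    "endsAt": "2021-06-19T03:22:11.117Z"
  }],
  "groupLabels": {"instance": "192.168.31.150:8089"}
}`

// describe returns one line per alert: status, alert name, summary,
// and for resolved alerts how long the alert lasted.
func describe(raw []byte) ([]string, error) {
	var p Payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(p.Alerts))
	for _, a := range p.Alerts {
		line := fmt.Sprintf("[%s] %s: %s", a.Status, a.Labels["alertname"], a.Annotations["summary"])
		if a.Status == "resolved" {
			line += fmt.Sprintf(" (lasted %s)", a.EndsAt.Sub(a.StartsAt))
		}
		lines = append(lines, line)
	}
	return lines, nil
}

func main() {
	lines, err := describe([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Using maps for labels and annotations keeps the sketch independent of which labels a particular rule attaches.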


I modified the easy-prometheus source code (the update is on GitHub) to add a handler that listens for webhook notifications.

The access_token is generated when you create a DingTalk group robot.

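import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	// plus the httpgo client package used below
)

// Ding holds the only part of the AlertManager webhook message used here:
// the summary annotation of each alert.
type Ding struct {
	Alerts []struct {
		Annotations struct {
			Summary string `json:"summary"`
		} `json:"annotations"`
	} `json:"alerts"`
}

// dingding receives an AlertManager webhook and forwards the first alert
// to a DingTalk group robot as a link message.
func dingding(w http.ResponseWriter, r *http.Request) {
	s, err := io.ReadAll(r.Body)
	if err != nil {
		log.Println(err)
		return
	}
	fmt.Println(string(s))

	ding := &Ding{}
	if err := json.Unmarshal(s, ding); err != nil || len(ding.Alerts) == 0 {
		// Avoid indexing into an empty Alerts slice.
		log.Println("no alerts in payload:", err)
		return
	}
	anno := ding.Alerts[0]

	req := &httpgo.Req{}
	x, err := req.Header("Content-Type", "application/json").
		Method(http.MethodPost).
		Url("https://oapi.dingtalk.com/robot/send?access_token=xxxxxxx").
		Params(httpgo.Query{
			"link": map[string]interface{}{
				"title": "AlertManager notification",
				"text":  "Notice" + anno.Annotations.Summary,
				// Image found online
				"picUrl": "https://photo.16pic.com/00/65/09/16pic_6509905_b.png",
				// Clicking the message title jumps straight to Prometheus
				"messageUrl": "http://localhost:9090/alerts",
			},
			"msgtype": "link",
		}).
		Go().
		Body()
	if err != nil {
		log.Println(err)
	}
	fmt.Println(x)
}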
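The request body sent to the robot can also be built with the standard library alone. This sketch only constructs the JSON for a link message using the same title, text prefix, and messageUrl as the handler above. The type names linkMessage and link and the function buildLinkMessage are made up for this example, and picUrl is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linkMessage mirrors the "link" message body used in the handler above.
type linkMessage struct {
	MsgType string `json:"msgtype"`
	Link    link   `json:"link"`
}

type link struct {
	Title      string `json:"title"`
	Text       string `json:"text"`
	PicURL     string `json:"picUrl,omitempty"` // optional; omitted when empty
	MessageURL string `json:"messageUrl"`
}

// buildLinkMessage returns the JSON body for one alert summary.
func buildLinkMessage(summary string) ([]byte, error) {
	return json.Marshal(linkMessage{
		MsgType: "link",
		Link: link{
			Title:      "AlertManager notification",
			Text:       "Notice" + summary,
			MessageURL: "http://localhost:9090/alerts",
		},
	})
}

func main() {
	body, err := buildLinkMessage(" 192.168.31.150:8089 CPU usage exceeds 20%!")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

The resulting bytes could then be sent with http.Post(url, "application/json", bytes.NewReader(body)), where url is the robot address including its access_token.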

4. Test alarms

My test is to start several applications so that CPU utilization rises above 20% and stays there for a few seconds, long enough to outlast the rule's for duration.

DingTalk