This is day 7 of my participation in the November Writing Challenge. For event details, see: The Final Writing Challenge of 2021

Clicking a Grafana DingTalk alert card does not jump to the Grafana interface

In the Grafana .ini configuration file, set:

root_url = 'xxxx'

Set root_url to the address of your Grafana server, then restart Grafana.
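As a quick sanity check before restarting, the following minimal sketch reads root_url from the [server] section of an .ini file and verifies that it looks like an absolute http(s) URL. The sample configuration text, the example domain, and the helper names are assumptions made for illustration; they are not from the original setup.

```python
import configparser
from urllib.parse import urlparse

# Hypothetical sample; in practice this would be the text of your Grafana .ini file.
SAMPLE_INI = """
[server]
root_url = https://grafana.example.com/
"""


def read_root_url(ini_text):
    """Return the root_url value from the [server] section, or None if missing."""
    # Interpolation is disabled so values containing %(...)s are read as plain text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("server", "root_url", fallback=None)


def is_absolute_http_url(url):
    """True if url has an http/https scheme and a host part."""
    if not url:
        return False
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    url = read_root_url(SAMPLE_INI)
    print(url, is_absolute_http_url(url))
```

A placeholder value such as xxxx fails this check, which is a cheap way to catch a root_url that alert links cannot use.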

I. Grafana installation and setup

I installed Grafana with Docker here.
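For reference, here is a rough sketch of what a Docker-based start could look like, written as a Python helper that only builds the docker run argument list. The container name, the host port, and the example URL are assumptions; the grafana/grafana image and the GF_SERVER_ROOT_URL environment variable (which overrides root_url in the [server] section) are standard Grafana conventions, but check the official documentation for your version.

```python
import shlex


def build_docker_run(root_url, name="grafana", host_port=3000, image="grafana/grafana"):
    """Build a `docker run` argument list for a Grafana container.

    name, host_port and root_url are illustrative values, not from the article.
    """
    return [
        "docker", "run", "-d",
        "--name", name,
        "-p", f"{host_port}:3000",              # Grafana listens on 3000 in the container
        "-e", f"GF_SERVER_ROOT_URL={root_url}",  # same effect as root_url in the .ini file
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_docker_run("https://grafana.example.com/")))
```

The list can be passed to subprocess.run once the values match your environment.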

Official documentation: grafana.com/docs/grafan…

1. Data source setup, dashboard configuration, and the meaning of each metric are not covered in detail here; please refer to this article:

www.jianshu.com/p/7e7e0d067… (on Jianshu, by "Kang teenager")

2. For a quick start, you can fork this repository: github.com/monitoringa…

It contains comprehensive, standard Grafana dashboard templates for common data sources, which you can download and import.

3. Note that template-based dashboards can only be used for monitoring and display; setting up alerts requires custom queries.

II. DingTalk robot creation and configuration

DingTalk developer documentation: ding-doc.dingtalk.com/doc#/server…

1. Create a DingTalk group and a DingTalk robot

Create a custom robot

2. Get the webhook URL from the robot settings

Get the URL for webhook

3. Configure the security settings. This step is required; I chose whitelist mode and entered the address of the Grafana server.

Security Settings – Whitelist
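Before wiring the robot into Grafana, it can help to send a test message to the webhook by hand. The sketch below builds a text message request for a DingTalk custom robot using only the Python standard library. The access token is a placeholder, and the helper names are made up for this example. With whitelist mode enabled, the request has to come from a whitelisted address, so this would be run on the Grafana server.

```python
import json
import urllib.request


def build_text_request(webhook_url, content):
    """Build a POST request carrying a DingTalk text message (nothing is sent yet)."""
    payload = {"msgtype": "text", "text": {"content": content}}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request and return the decoded JSON reply from the robot."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder URL: replace the token with the one from your robot settings.
    url = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
    req = build_text_request(url, "Test message from the Grafana server")
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment to actually send
```

If the security settings reject the sender, the robot's reply should contain a non-zero error code rather than a delivered message.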

III. Setting up alerts in Grafana

1. In the Grafana console, open the “Alerting” module in the left sidebar and create an alert notification channel.

Disable Resolve Message: when this option is enabled, no message is sent when the health check returns to [OK].

2. Create a dashboard and panel for testing, press “e” to enter edit mode, create a query, and select the data source, metric, instance ID, and data collection interval.

3. Create an alarm rule

  • Name: a custom name for the alert
  • Evaluate every: how often the health check runs
  • For: how long the rule must stay in the pending state before it changes to alerting
  • Send to: the notification channel that the alert triggers
  • Message: the alert text

4. Set a low alert threshold for testing, then go back to DingTalk to check the robot's messages.

// Remember to turn on the Disable Resolve Message option so that the [OK] state does not send a notification

IV. Other implementation details

1) In the AWS console, change the EC2 monitoring settings to enable “Detailed Monitoring”, which in practice changes the data collection interval from 5 minutes to 1 minute.

2) Following the same procedure as in the test, set up alerts for the commonly used servers; multiple queries can be placed in one panel.

Monitoring item: CPU load

Health monitoring: every minute, calculate the average CPU load over the previous 5 minutes. If the average exceeds 80, the alert condition is met.

Alarm rule: when the mean value is greater than 80, the state becomes “pending”; if the pending state lasts for 3 minutes, the state changes to alerting and the alert is sent.
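The rule above can be illustrated with a small simulation. This is a simplified model of the ok, pending, alerting transitions, not Grafana's actual implementation. It assumes one CPU sample per minute, an evaluation every minute, a 5-minute averaging window, a threshold of 80, and a "For" duration of 3 minutes. The function name and the sample data are made up.

```python
from statistics import mean


def simulate_states(samples, window=5, threshold=80.0, for_minutes=3):
    """Return one state per minute: "ok", "pending" or "alerting".

    samples holds one CPU reading per minute. Early minutes use a shorter
    window because fewer than `window` readings exist yet.
    """
    states = []
    pending_since = None  # minute at which the condition first became true
    for t in range(len(samples)):
        recent = samples[max(0, t - window + 1): t + 1]
        breached = mean(recent) > threshold
        if not breached:
            pending_since = None
            states.append("ok")
            continue
        if pending_since is None:
            pending_since = t
        # Assumption: the rule fires once it has been pending for `for_minutes`.
        if t - pending_since >= for_minutes:
            states.append("alerting")
        else:
            states.append("pending")
    return states


if __name__ == "__main__":
    # Five quiet minutes followed by ten minutes of sustained high load.
    readings = [50] * 5 + [95] * 10
    for minute, state in enumerate(simulate_states(readings)):
        print(minute, state)
```

Because the mean is taken over 5 minutes, a load spike does not put the rule into pending immediately, and the "For" setting delays the notification by a further 3 minutes. This is why short spikes do not produce DingTalk messages.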

Miscellaneous: DingTalk group announcement, coordinating responders, converting the test robot for regular use, changing the robot's profile picture

V. Refinements and extensions

Grafana's DingTalk robot integration only supports the link message type. In this article, the link is only used for text preview. Here is a sample link message:

{
    "msgtype": "link",
    "link": {
        "text": "This new version to be released, founder XX called it mangrove. Until now, when faced with a major upgrade, product managers would pick a code name for the situation. This time, why mangrove?",
        "title": "The train of the times is moving forward",
        "picUrl": "",
        "messageUrl": "https://www.dingtalk.com/s?__biz=MzA4NjMwMTA2Ng==&mid=2650316842&idx=1&sn=60da3ea2b29f1dcc43a7c8e4a7c97a16&scene=2&srcid=09189AnRJEdIiWVaKltFzNTw&from=timeline&isappinstalled=0&key=&ascene=2&uin=&devicetype=android-23&version=26031933&nettype=WIFI"
    }
}

You can modify the corresponding fields to extend what the DingTalk robot can do, for example so that clicking the link jumps directly to the service console or the monitoring dashboard.
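As a small illustration of modifying those fields, the sketch below builds a link message whose messageUrl points at a dashboard. The helper name and the dashboard URL are hypothetical; the field names are the ones used in the sample message above.

```python
import json


def build_link_message(title, text, message_url, pic_url=""):
    """Build a DingTalk link message as a plain dict."""
    return {
        "msgtype": "link",
        "link": {
            "title": title,
            "text": text,
            "picUrl": pic_url,
            "messageUrl": message_url,  # where a click on the card should lead
        },
    }


if __name__ == "__main__":
    # Hypothetical dashboard address; replace with your own Grafana URL.
    message = build_link_message(
        title="[Alerting] CPU load",
        text="Average CPU load over the last 5 minutes is above 80",
        message_url="https://grafana.example.com/d/abc123/cpu-load",
    )
    print(json.dumps(message, ensure_ascii=False, indent=2))
```

Pointing messageUrl at a console page instead of a dashboard only requires changing that one argument.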