The author is a back-end developer at TX who writes about back-end technology, machine learning, data structures and algorithms, computer fundamentals, programming interviews, and more. You are welcome to follow the WeChat public account "Ren Dong learned programming".

Customized monitoring based on Grafana


Because this article is fairly long and detailed, I have put its structure up front so you can follow along more easily. This is also my first KM article, so if you spot any shortcomings, please point them out.

<center> Article structure (table of contents) </center>

1. Background

1.1. Cross monitoring

Monitoring is a very important part of the whole product life cycle. Operations staff focus on hardware and infrastructure monitoring, developers focus on middleware and application-layer monitoring, and product teams focus on core business metrics. By self-monitoring the full pipeline of data reporting, transmission, and storage, we can collect monitoring data in real time, predict faults, and raise alarms. But if the self-monitoring itself fails, how would anyone find out? That calls for external cross monitoring that covers the life cycle of the self-monitoring system.

1.2. Selection of monitoring tools

As the saying goes, "no monitoring, no operations": the importance of a monitoring system goes without saying. For how to choose a monitoring system and for monitoring fundamentals, see the expert article on monitoring system selection. For my use case, the cross monitoring system needs to import an ES data source, visualize the data in a real-time interface, and raise alerts promptly once a threshold is crossed. Although Kibana, one of the ELK trio, can also visualize the data, it is not a good fit for QQ.

Grafana is a cross-platform, open-source visualization tool that supports complex queries and presentation of data from configurable data sources. It supports as many as 14 data sources, such as MySQL and Elasticsearch, and most of them support alert configuration. So I chose Grafana!


  • Combining external inspection with internal self-inspection (cross monitoring) guards against the case where internal monitoring fails and its data can no longer be queried, and it adds depth and layers to monitoring.
  • Multi-dimensional charts: related charts are grouped and arranged in a hierarchy. Aggregating, grouping, and layering monitoring charts is key to quickly locating the root cause of a problem.
  • Custom monitoring configuration in Grafana is simple and convenient.

2. In practice

As mentioned above, Grafana was chosen as the cross monitoring tool. It has to import an ES data source, visualize data in a real-time interface, and send alarms through multiple channels once a threshold is crossed. Now let's move on to the hands-on part.

2.1. Configure the Data Source

This step sets up the data source (here, Elasticsearch) so that dashboard panels can query it.

<center> Select the data source type </center>

<center> data source configuration </center>
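For readers who prefer scripting this step, a data source can also be created through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request; the Grafana address, API key, data source name, Elasticsearch URL, and daily index pattern are hypothetical placeholders, and the `database` and `jsonData.timeField` fields follow the older Elasticsearch provisioning format, so check them against your Grafana version.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # hypothetical Grafana address
API_KEY = "<your-api-key>"             # placeholder, replace with a real API key


def build_es_datasource_request(name, es_url, index_pattern, time_field="@timestamp"):
    """Build (without sending) a request that creates an Elasticsearch data source."""
    payload = {
        "name": name,
        "type": "elasticsearch",
        "access": "proxy",            # queries go through the Grafana backend
        "url": es_url,
        "database": index_pattern,    # index name or pattern
        "jsonData": {
            "timeField": time_field,  # field used for the time axis
            "interval": "Daily",      # matches a daily index pattern
        },
    }
    return urllib.request.Request(
        GRAFANA_URL + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


# Hypothetical values; pass the request to urllib.request.urlopen() to actually send it.
req = build_es_datasource_request(
    "es-monitor", "http://es.example.internal:9200", "[logs-]YYYY.MM.DD"
)
```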

2.2. Configure Dashboard

Once the data source is configured, you can add your own panels. Several visualization types are available:

<center> visualization method </center>

Taking Graph as an example, add or configure a dashboard as shown below. The red box in the upper-right corner contains: add, star, share, save, settings, query mode, time range, zoom out (widen the time range, i.e., switch from a shorter period to a longer one), refresh, and so on.

<center> interface overview </center>

<center> General interface </center>
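A dashboard is stored as JSON, so it can also be created through the HTTP API (`POST /api/dashboards/db`). This minimal sketch assumes a legacy `graph` panel and shows how the toolbar's time range and refresh interval map to the dashboard's `time` and `refresh` fields; the title, panel layout, and data source name are hypothetical.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # hypothetical Grafana address
API_KEY = "<your-api-key>"             # placeholder


def build_dashboard_request(title, datasource, time_from="now-6h", refresh="30s"):
    """Build (without sending) a request that creates a dashboard with one Graph panel."""
    graph_panel = {
        "id": 1,
        "type": "graph",                               # the Graph visualization
        "title": "Request count",                      # hypothetical panel title
        "datasource": datasource,
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},  # position and size on the grid
        "targets": [],                                 # queries are added later
    }
    payload = {
        "dashboard": {
            "id": None,   # serialized as null, so Grafana creates a new dashboard
            "uid": None,
            "title": title,
            "panels": [graph_panel],
            "time": {"from": time_from, "to": "now"},  # default time range
            "refresh": refresh,                        # auto-refresh interval
        },
        "overwrite": False,
    }
    return urllib.request.Request(
        GRAFANA_URL + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
        method="POST",
    )


req = build_dashboard_request("Cross monitoring", "es-monitor")
```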

2.3. Configure Variables

Template variables are defined here. They make later queries more convenient: they provide flexible drop-down selectors, let you aggregate monitoring charts as needed, and help locate problems quickly.

<center> Variable settings </center>

For the Elasticsearch data source, the variable's Query field accepts JSON queries such as the following:

Query | Description
{"find": "fields", "type": "keyword"} | Returns a list of the index's field names of type keyword.
{"find": "terms", "field": "@hostname", "size": 1000} | Returns a list of a field's values using a terms aggregation. The query uses the dashboard's current time range as its time range.
{"find": "terms", "field": "@hostname", "query": "<lucene query>"} | Returns a list of a field's values using a terms aggregation, filtered by the specified Lucene query. The query uses the dashboard's current time range as its time range.
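Writing these JSON strings by hand is error-prone because the quoting is easy to get wrong, so a small helper can generate them. This sketch uses `json.dumps` to build the query forms from the table; the `hostname_variable` dictionary mirrors how a query variable is stored in a dashboard's `templating` list, and its data source name is hypothetical. A `refresh` value of 2 means "On Time Range Change", which fits the note that terms queries use the dashboard's current time range.

```python
import json


def fields_query(field_type="keyword"):
    """Variable query that lists the names of fields of the given type."""
    return json.dumps({"find": "fields", "type": field_type})


def terms_query(field, size=None, lucene=None):
    """Variable query that lists a field's values via a terms aggregation."""
    query = {"find": "terms", "field": field}
    if size is not None:
        query["size"] = size
    if lucene is not None:
        query["query"] = lucene  # optional Lucene filter
    return json.dumps(query)


# Hypothetical variable definition as it would appear in the dashboard's "templating" list.
hostname_variable = {
    "name": "hostname",            # referenced as $hostname in panel queries
    "type": "query",
    "datasource": "es-monitor",    # hypothetical data source name
    "query": terms_query("@hostname", size=1000),
    "refresh": 2,                  # re-run when the dashboard time range changes
    "multi": True,                 # allow selecting several hosts
    "includeAll": True,            # add an "All" option
}
```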

Once the variable is configured correctly, save it to see a preview.

<center> Save Variables </center>

<center> Variables </center>

Note that this only configures the variable; the drop-down box has no effect on data queries yet. You must reference the variable in the panel's Query statement to bind it and make it take effect!

<center>Variables </center>
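Binding means referencing `$hostname` inside the panel's Lucene query. Below is a sketch of an Elasticsearch panel target that does this, together with a hypothetical `interpolate` helper that only imitates what Grafana does at query time. It is a simplified stand-in (multiple values are joined with OR), not Grafana's exact escaping or formatting rules, and the host names are made up.

```python
import re

# A panel query (target) for the Elasticsearch data source that references $hostname.
# Field names follow Grafana's Elasticsearch panel JSON; the values are hypothetical.
target = {
    "refId": "A",
    "query": "@hostname:$hostname",  # Lucene query bound to the variable
    "metrics": [{"id": "1", "type": "count"}],
    "bucketAggs": [{"id": "2", "type": "date_histogram", "field": "@timestamp"}],
    "timeField": "@timestamp",
}


def interpolate(query, selected):
    """Hypothetical, simplified stand-in for Grafana's variable substitution."""
    def substitute(match):
        value = selected.get(match.group(1))
        if value is None:
            return match.group(0)  # unknown variable: leave it untouched
        if isinstance(value, list):
            return "(" + " OR ".join(value) + ")"
        return value

    return re.sub(r"\$(\w+)", substitute, query)


print(interpolate(target["query"], {"hostname": ["web-1", "web-2"]}))
# prints @hostname:(web-1 OR web-2)
```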

2.4. Integrating with the Nebula Alarm System

Nebula is a general-purpose alarm management system. It provides alarm access and reporting, masking, subscription, convergence, recovery, querying, notification (by phone, WeChat, WeCom, email, and mini program), escalation, automatic handling, statistical analysis, and other management capabilities. Its structured alarm data model and rich, open APIs make it highly customizable. It also integrates seamlessly with Tencent Yunxing's cloud work order system, on-call system, and process engine, which gives it strong automated handling capabilities. It currently serves mainly Tencent's cloud IaaS operations scenarios, and most cloud alarms are already connected to it.

Cloud alarm management system address: alarm query

2.4.1. Configuration

  • Step 1: Check whether a Nebula alert channel already exists. If not, add one under Alerting > Notification channels.

Note: The channel only needs to be configured once, when the interface is used for the first time. If a Nebula alert channel already exists, go directly to Step 2.
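The article does not say which channel type the Nebula channel uses. Assuming it is a webhook exposed by Nebula, the channel could also be created with the legacy alerting API (`POST /api/alert-notifications`), which backs the Notification channels page. The receiver URL and channel name below are hypothetical.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"                        # hypothetical
API_KEY = "<your-api-key>"                                   # placeholder
NEBULA_WEBHOOK_URL = "http://nebula.example.internal/alarm"  # hypothetical receiver


def build_channel_request(name="nebula-alarm"):
    """Build (without sending) a request that creates a webhook notification channel."""
    payload = {
        "name": name,
        "type": "webhook",  # assumption: Nebula receives alerts over HTTP
        "isDefault": False,
        "sendReminder": False,
        "settings": {"url": NEBULA_WEBHOOK_URL, "httpMethod": "POST"},
    }
    return urllib.request.Request(
        GRAFANA_URL + "/api/alert-notifications",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
        method="POST",
    )


req = build_channel_request()
```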

  • Step 2: Configure the alarm conditions. Select the Alert option on the left side of the panel editing page to configure them.

<center> Alarms cannot be configured on queries that use template variables </center>

Alerts cannot be configured on a query that uses template variables. There are two ways around this:

1. Multiple queries: In the figure below, query A uses template variables, so no alarm can be configured on it. Instead, add a new query B whose Query statement does not use template variables, and configure the alarm on B. (Remember to hide query B so that its series does not overlap with A in the view.)

<center>add Query</center>
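In panel JSON terms, the first workaround amounts to two targets on the same panel: query A keeps the template variable for display, and query B uses a fixed value and is marked `hide` so it does not draw an overlapping series. The sketch assumes an Elasticsearch count query; the fixed host name and the `usable_for_alert` check are hypothetical illustrations of the rule, not Grafana code.

```python
import re

VARIABLE = re.compile(r"\$\{?\w+")  # matches $name or ${name}


def build_targets(display_query, alert_query):
    """Query A drives the dashboard; hidden query B has no variables and feeds the alert."""
    common = {
        "metrics": [{"id": "1", "type": "count"}],
        "bucketAggs": [{"id": "2", "type": "date_histogram", "field": "@timestamp"}],
        "timeField": "@timestamp",
    }
    return [
        {"refId": "A", "query": display_query, **common},
        {"refId": "B", "query": alert_query, "hide": True, **common},  # hidden from the graph
    ]


def usable_for_alert(target):
    """Hypothetical check mirroring the restriction: no template variables allowed."""
    return VARIABLE.search(target["query"]) is None


targets = build_targets("@hostname:$hostname", "@hostname:web-1")
```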

2. Dual views: Alternatively, set up two views, one for monitoring (Monitor) and one for alerting (Alert). Configure the second view the same way as the first.

The two methods differ little; the dual-view approach may be more convenient and intuitive.

<center> dual view configuration </center>

Having dealt with template variables, let's return to the main topic and configure the alert conditions.

<center> Configure alert conditions </center>
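In legacy Grafana alerting, the condition editor is saved as an `alert` object inside the panel JSON. A condition such as `WHEN avg() OF query(B, 5m, now) IS ABOVE 100` corresponds roughly to the structure below. The rule name, threshold, evaluation interval, message, and channel uid are hypothetical values chosen for illustration.

```python
def build_alert(query_ref="B", window="5m", threshold=100, channel_uid="nebula-alarm"):
    """Legacy panel alert: fire when avg() of the query over the window is above threshold."""
    return {
        "name": "Request count alert",  # hypothetical rule name
        "frequency": "1m",              # evaluate the rule every minute
        "for": "5m",                    # condition must hold 5m before alerting
        "conditions": [
            {
                "type": "query",
                "query": {"params": [query_ref, window, "now"]},     # query(B, 5m, now)
                "reducer": {"type": "avg", "params": []},            # avg()
                "evaluator": {"type": "gt", "params": [threshold]},  # IS ABOVE threshold
                "operator": {"type": "and"},
            }
        ],
        "noDataState": "no_data",           # state to use when the query returns nothing
        "executionErrorState": "alerting",  # treat query errors as alerts
        "notifications": [{"uid": channel_uid}],        # hypothetical channel uid
        "message": "Request count is above threshold",  # hypothetical message text
    }


alert = build_alert()
```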

  • Step 3: Configure the alert message

<center> Configure the alert message </center>

  • Step 4: Subscribe to the alerts in Nebula

<center> Subscribing to alert messages in Nebula </center>
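If the Nebula channel is a webhook (an assumption, as above), Grafana's legacy webhook notifier posts a JSON body with fields such as `ruleName`, `state`, `message`, `ruleUrl`, and `evalMatches`. A receiving service could map that body to whatever record Nebula expects. Nebula's real alarm format is not described in this article, so every output key below, and the sample payload, is hypothetical.

```python
def to_nebula_alarm(payload):
    """Map a legacy Grafana webhook body to a hypothetical Nebula alarm record."""
    state = payload.get("state", "")
    return {
        "title": payload.get("ruleName", "unknown rule"),
        "level": "critical" if state == "alerting" else "info",  # hypothetical levels
        "recovered": state == "ok",                               # "ok" signals recovery
        "content": payload.get("message", ""),
        "metrics": {m["metric"]: m["value"] for m in payload.get("evalMatches", [])},
        "link": payload.get("ruleUrl", ""),
    }


sample = {
    "ruleName": "Request count alert",
    "state": "alerting",
    "message": "Request count is above threshold",
    "ruleUrl": "http://localhost:3000/d/abc/cross-monitoring",
    "evalMatches": [{"metric": "web-1", "value": 132.5, "tags": None}],
}
alarm = to_nebula_alarm(sample)
```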

2.4.2. Testing the Alarm

  • Alarm trigger test: After configuring the alarm conditions and rules, run a test to check whether the alarm is triggered.

<center> alarm trigger test </center>
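Before relying on the test button, it can help to sanity-check a threshold against sample data offline. The sketch below is a simplified approximation of how one condition is evaluated (reduce the values in the window, then compare with the threshold). It is not Grafana's evaluator, and the sample values are made up.

```python
from statistics import mean

REDUCERS = {"avg": mean, "min": min, "max": max, "sum": sum, "last": lambda v: v[-1]}
EVALUATORS = {"gt": lambda x, t: x > t, "lt": lambda x, t: x < t}


def evaluate(values, reducer="avg", evaluator="gt", threshold=100):
    """Offline approximation of one condition: reduce the window, then compare."""
    if not values:
        return "no_data"
    reduced = REDUCERS[reducer](values)
    return "alerting" if EVALUATORS[evaluator](reduced, threshold) else "ok"


# Hypothetical sample windows.
print(evaluate([90, 110, 130]))  # avg 110 is above 100 -> alerting
print(evaluate([40, 60, 80]))    # avg 60 -> ok
print(evaluate([]))              # -> no_data
```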

  • Enable the alarm rule: If everything checks out, enable the alarm rule to start formal monitoring!

<center> Enable the alarm rule </center>
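Besides the toggle in the UI, a legacy alert rule can be paused or resumed through `POST /api/alerts/:id/pause`. This sketch assumes that "enabling" a rule corresponds to un-pausing it; the alert id is hypothetical (it can be looked up with `GET /api/alerts`), and newer Grafana versions with unified alerting use a different API.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # hypothetical
API_KEY = "<your-api-key>"             # placeholder


def build_pause_request(alert_id, paused):
    """Build (without sending) a request that pauses or resumes one legacy alert rule."""
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/alerts/{alert_id}/pause",
        data=json.dumps({"paused": paused}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
        method="POST",
    )


req = build_pause_request(42, paused=False)  # 42 is a hypothetical id; False resumes the rule
```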

  • Alarm results:

<center> Email alert </center>

<center> RTX alarm </center>

  • Alert history: Grafana also lets you view the alert history.

<center> alert history query </center>
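Legacy alert state changes are recorded as annotations, so the same history can be queried with `GET /api/annotations?type=alert`. The dashboard id and time range below are hypothetical; the sketch only builds the URL, which in practice would be requested with an `Authorization` header.

```python
from urllib.parse import urlencode

GRAFANA_URL = "http://localhost:3000"  # hypothetical


def alert_history_url(dashboard_id, from_ms, to_ms, limit=100):
    """Build a URL that lists alert state changes (stored as annotations) for one dashboard."""
    params = {
        "type": "alert",              # only alert annotations, not user annotations
        "dashboardId": dashboard_id,
        "from": from_ms,              # epoch milliseconds
        "to": to_ms,
        "limit": limit,
    }
    return f"{GRAFANA_URL}/api/annotations?{urlencode(params)}"


url = alert_history_url(7, 1600000000000, 1600086400000)
```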

  • Dashboard status display: The dashboard also shows red and green lines to indicate the different states.

<center> Status display </center>
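The states behind those red and green lines can also be listed with the legacy `GET /api/alerts` endpoint, filtered by dashboard and optionally by state. In this sketch the dashboard id is hypothetical, and the state-to-color mapping is an assumption that simply mirrors the description above (green for normal, red for alerting), not Grafana's rendering code.

```python
from urllib.parse import urlencode

GRAFANA_URL = "http://localhost:3000"  # hypothetical

# Hypothetical mapping from legacy alert states to the colors described above.
STATE_COLORS = {"ok": "green", "alerting": "red"}


def alerts_url(dashboard_id, state=None):
    """Build a URL that lists the legacy alert rules (and their states) on one dashboard."""
    params = {"dashboardId": dashboard_id}
    if state is not None:
        params["state"] = state  # e.g. "alerting", "ok", "no_data", "paused", "pending"
    return f"{GRAFANA_URL}/api/alerts?{urlencode(params)}"


def color_for(state):
    """Other states get a neutral color (assumption)."""
    return STATE_COLORS.get(state, "grey")
```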

3. References

This article mainly draws on the following articles. Thanks to their authors.

Grafana fully explained

Grafana high availability and Alerting