Author introduction: Wang Tianyi

Prometheus + Grafana is a universal monitoring system widely used in various applications.

This article mainly discusses whether the monitoring system built with TiDB + Prometheus can be migrated into an existing monitoring system.

For users with limited resources and low availability requirements, it is recommended to distinguish clusters directly with Prometheus labels and run an all-in-one Prometheus monitoring environment. Users with ample resources and high availability requirements should consider Prometheus's multi-tenant solutions.

As a stateless application, Grafana can be deployed in a highly available architecture with Keepalived + HAProxy if high availability is required.
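
As an illustration only (not part of the original setup), a minimal HAProxy backend for two Grafana instances might look like the sketch below; the addresses are examples, and Keepalived would float a virtual IP across the HAProxy nodes:

## haproxy.cfg (sketch) - one entry point in front of two Grafana instances
frontend grafana_front
    bind *:3000
    default_backend grafana_back

backend grafana_back
    balance roundrobin
    option httpchk GET /api/health          # Grafana health-check endpoint
    server grafana1 192.168.12.34:3000 check
    server grafana2 192.168.12.35:3000 check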

From this article, you will learn:

  • How to integrate the Prometheus monitoring platforms of different TiDB clusters into the same Prometheus platform

  • How to view metrics from Prometheus in Grafana

  • How to inject cluster labels for different clusters into the Grafana dashboards

  • How to import dashboards into Grafana in batches using the Grafana HTTP API

  • How to solve Prometheus performance problems caused by a large volume of metric data

  • How to import data from Prometheus into a relational database for queries or metric analysis

  • How to achieve high availability and multi-tenancy for Prometheus

Introduction to this article:

  • What I want to do: integrate each cluster's independent Prometheus + Grafana into a unified platform with a single entry point for queries

  • How Prometheus is integrated: a single Prometheus pulls metrics from different clusters and identifies them by label

  • How Grafana is integrated: inject the cluster label into each expr, generate new dashboards, and import the dashboards in batches through the Grafana HTTP API

  • Possible risks after integration: the volume of Prometheus data explodes and performance slows down

  • What to do about it: split it up! Why split a freshly merged monitoring database?

  • Goal of the split: scale Prometheus horizontally, with a remote store for centralized data storage

  • Centralized storage solution: use prometheus-postgresql-adapter + TimescaleDB to store the data

  • The problem with centralized storage: dashboard exprs would have to read from TimescaleDB, and the original PromQL-based exprs no longer work

  • How to solve the SQL-versus-PromQL problem: add a Prometheus-compatible layer on top of TimescaleDB

  • Horizontal scaling and multi-tenant solution for Prometheus: Thanos

A few things to say up front:

  • As a DBA who has worked on the front line for a long time (and sometimes on the second or third line), the three things I care about most are, in order:

  • Doing the job well: correctness; monitoring data must not go missing

  • Going to bed early: stability; no alerts paging me at night

  • Retiring in peace: slow queries are the product's problem, not something to blame on the DBA

  • With that in mind, as a not-so-famous DBA, I put together this idea for a TiDB monitoring integration solution

  • This article presents an idea; I would not dare call it a finished solution

  • It faithfully records the iterative process of proposing a plan and then overturning it

  • Every solution has its own context, so there is no best solution, only the most suitable one

Experimental cluster environment

This section describes the operating system environment

[root@r30 .tiup]# cat /etc/redhat-release
CentOS Stream release 8
[root@r30 .tiup]# uname -r
4.18.0-257.el8.x86_64

This section describes the TiDB cluster environment

As the experimental environment, we deployed two TiDB clusters, tidb-c1-v409 and tidb-c2-v409.

On a separate machine, I deployed a cluster named tidb-monitor through TiUP and kept only the Grafana and Prometheus components; the other TiDB components were removed. This tidb-monitor cluster simulates our existing monitoring platform, into which the monitoring of tidb-c1-v409 and tidb-c2-v409 will be migrated.

Overview of the current TiDB monitoring framework

Application of Prometheus in TiDB

Prometheus is a time-series database with a multidimensional data model and a flexible query language.

Prometheus is a popular open source project with an active community and numerous success stories.

Prometheus provides several components for users to use. Currently, TiDB uses the following components:

  • Prometheus Server: Collects and stores time series data

  • Client libraries: used to define the custom metrics needed in a program

  • Alertmanager: Used to implement the alarm mechanism

Application of Grafana in TiDB

Grafana is an open source metric analysis and visualization system.

TiDB uses Grafana to display the monitoring metrics of TiDB cluster components; the monitoring items are grouped by component.

Problems with Prometheus & Grafana

As the number of clusters increases, users may run into the following problems:

  • Multiple TiDB clusters cannot share one monitoring cluster

  • Prometheus itself does not provide high availability

  • As data volumes grow, Prometheus queries slow down

To address this, we looked at whether the Prometheus and Grafana instances of different clusters could be integrated, so that multiple clusters share one monitoring system.

Prometheus integration solution

Prometheus profile

TiDB uses the open source time-series database Prometheus as the storage solution for monitoring and performance metrics, and Grafana as the visualization component. In the narrow sense, Prometheus is the software itself, namely Prometheus Server; in the broad sense, it is the ecosystem of tools built around Prometheus Server. Besides Prometheus Server and Grafana, the commonly used components of this ecosystem are Alertmanager, Pushgateway, and a variety of exporters. Prometheus Server itself is a time-series database with very efficient insert and query performance and a very small storage footprint compared with Zabbix monitoring, which uses MySQL as its underlying storage. Pushgateway sits between data sources and Prometheus Server: it receives pushed metrics so that Prometheus Server can then scrape them.

The Prometheus monitoring ecosystem is very mature, and the range of objects it can monitor is rich; see the official exporter list for details. Prometheus can monitor far more than the products on that list: some products, such as TiDB, expose Prometheus metrics natively; some can be monitored through a generic exporter for a whole class of products, such as snmp_exporter; for others you can write a simple script and push the data to Pushgateway, or, with a bit of development effort, write your own exporter. Meanwhile, some products, such as Ceph, have gained native Prometheus support as new versions were released, so a separate exporter is no longer needed. With the arrival of containers and Kubernetes, and with more and more software supporting Prometheus natively, Prometheus will soon become the leading monitoring solution.
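
For example, pushing a custom value from a script to Pushgateway can be as simple as the following sketch (the gateway address and metric name are examples):

## push a custom metric to Pushgateway; Prometheus then scrapes it from the gateway
echo "backup_job_duration_seconds 37.2" | \
  curl --data-binary @- http://192.168.12.34:9091/metrics/job/backup_job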

Prometheus architecture introduction

The architecture diagram for Prometheus is as follows:

Prometheus Server is the core of the Prometheus ecosystem; it stores and retrieves monitoring data and pushes alert messages. Alertmanager receives the alerts pushed by Prometheus Server, groups and deduplicates them, routes them according to their labels, and delivers them to recipients by email, SMS, WeChat, DingTalk, or webhook. Most applications monitored with Prometheus need an exporter to collect data, but some applications, such as the TiDB components, support Prometheus natively, so their monitoring data can be collected directly without deploying a separate exporter. PromQL is the Prometheus query language: users can retrieve monitoring information by writing PromQL directly in the Prometheus Server web UI, solidify PromQL into Grafana dashboards for dynamic presentation, or build richer customizations through the API. Prometheus scrapes statically configured exporters and also monitors dynamic objects, such as Kubernetes nodes, Pods, and Services, through service discovery. Besides exporters and service discovery, users can also write scripts to collect custom information and push it to Pushgateway; to Prometheus Server, Pushgateway is just a special exporter and is scraped like any other.

Prometheus label usage rules

Prometheus identifies different metrics by their labels.

Labels can be attached to scrape targets to distinguish same-named metrics from different clusters, or to aggregate the metrics of one cluster:

## modify prometheus.yml
static_configs:
- targets: ['localhost:9090']
  labels:
    cluster: cluster_1

## check the prometheus configuration file
./promtool check config prometheus.yml

## reload the prometheus configuration
kill -HUP <prometheus_pid>

## aggregate the CPU usage of cluster_1 (PromQL)
sum(process_cpu_seconds_total{cluster='cluster_1'})

Targets can also be dropped or kept at scrape time based on their labels:

## drop the targets that carry the label cpu_metric_collection_drop
scrape_configs:
  - job_name: 'cpu_metric_collection'
    static_configs:
    - targets: ['localhost:9090']
    relabel_configs:
    - action: drop
      source_labels: ['cpu_metric_collection_drop']
      regex: 'true'

## keep only the targets that carry the label cpu_metric_collection_keep
scrape_configs:
  - job_name: 'cpu_metric_collection'
    static_configs:
    - targets: ['localhost:9090']
    relabel_configs:
    - action: keep
      source_labels: ['cpu_metric_collection_keep']
      regex: 'true'

Relabel operation for Prometheus

Labels are an important part of the Prometheus monitoring system. In a centralized, complex monitoring environment, we may not have control over the resources being monitored or their metric data. Redefining monitoring labels allows us to effectively control and manage metrics in such an environment. After Prometheus fetches exporter data, relabel_configs lets the user modify, add, or delete labels.

# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <int> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]

In the configuration above, <relabel_action> can be one of the following actions:

  • replace: if the regex matches the value extracted from source_labels, write the replacement value to target_label;

  • keep: keep the metrics whose labels match, and drop those that do not match;

  • drop: drop the metrics whose labels match, and keep those that do not match;

  • hashmod: set target_label to the hash of the source_labels value modulo the configured modulus;

  • labelmap: match the regex against all label names and copy the values of the matching labels to new label names derived from the replacement;

  • labeldrop: delete the labels whose names match the regex and keep the ones that do not match;

  • labelkeep: keep the labels whose names match the regex and delete the ones that do not match.

Given these label features, we can consider using labels to tag the metrics of different TiDB clusters.

Using labels to distinguish different TiDB clusters

Several options for modifying the Prometheus configuration file

Take the tidb job as an example to build a basic configuration. There are two ways to modify the tidb job:

  • Create one tidb job and tag its targets as tidb-c1-v409 or tidb-c2-v409 through relabel_configs

  • Create two tidb jobs, tidb-c1-v409 and tidb-c2-v409

Solution 1: Create one tidb job and use relabel_configs to distinguish the clusters

## The first way - create one job for tidb, and distinguish different clusters by relabel_configs operation
- job_name: "tidb"
  honor_labels: true # don't overwrite job & instance labels
  static_configs:
  - targets:
    - '192.168.12.31:12080'
    - '192.168.12.32:12080'
    - '192.168.12.33:12080'
    - '192.168.12.31:22080'
    - '192.168.12.32:22080'
    - '192.168.12.33:22080'
  relabel_configs:
  - source_labels: [ '__address__' ]
    regex: '(.*):12080'
    target_label: 'cluster'
    replacement: 'tidb-c1-v409'
  - source_labels: [ '__address__' ]
    regex: '(.*):22080'
    target_label: 'cluster'
    replacement: 'tidb-c2-v409'

In the configuration above:

  • __address__ is the target address filtered from targets. In this example it matches six values: 192.168.12.3{1,2,3}:{1,2}2080;

  • regex matches the values extracted by source_labels against a regular expression;

  • target_label writes the result into a new label named cluster;

  • replacement sets the value of that label to tidb-c1-v409 or tidb-c2-v409, depending on which regex matched.

  • Reload the Prometheus configuration file. On the Prometheus GUI page, go to Status -> Targets and you can see the following results.

Solution 2: Create different jobs to differentiate different clusters

## The second way - create two jobs for different clusters
- job_name: "tidb-c1-v409"
  honor_labels: true # don't overwrite job & instance labels
  static_configs:
  - targets:
    - '192.168.12.31:12080'
    - '192.168.12.32:12080'
    - '192.168.12.33:12080'
    labels:
      cluster: tidb-c1-v409
- job_name: "tidb-c2-v409"
  honor_labels: true # don't overwrite job & instance labels
  static_configs:
  - targets:
    - '192.168.12.31:22080'
    - '192.168.12.32:22080'
    - '192.168.12.33:22080'
    labels:
      cluster: tidb-c2-v409

In the configuration above:

  • The job_name serves as the identifier that distinguishes the two clusters;

  • Each job only configures the endpoints of its own cluster in targets;

  • The cluster label is attached through labels;

  • Reload the Prometheus configuration file. On the Prometheus GUI page, go to Status -> Targets and you can see the following results.

It is difficult to say which solution is better. The first reduces the number of jobs but adds labels within the job; the second reduces the labels within each job but adds more jobs. It is like asking whether 2 × 3 or 3 × 2 is better.

Example of modifying the Prometheus configuration file

In the following example, the first solution is used: clusters are distinguished by relabel_configs.

In the case of blackbox_exporter, because the two cluster deployments share some machines, only one blackbox_exporter is run per machine in a real production environment to save resources.

For the modified configuration of the experimental environment, refer to prometheus-example.

After reloading the Prometheus service, check that all jobs are UP under Status -> Targets in the Prometheus web GUI.

By spot-checking a metric such as pd_regions_status, you can see that its cluster label now has two values, tidb-c1-v409 and tidb-c2-v409.
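
For example, a quick check in the Prometheus web UI might look like this (a sketch using the metric name from above):

## count the series of pd_regions_status per cluster
count by (cluster) (pd_regions_status)

## or query a single cluster only
pd_regions_status{cluster="tidb-c1-v409"}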

Grafana integration scheme

View the Datasource information in Grafana

Because all metrics have now been consolidated into a single Prometheus instance, that Prometheus needs to be configured as the datasource in Grafana.
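
A minimal datasource provisioning file could look like the following sketch (assuming Grafana's standard provisioning directory and the integrated Prometheus address used in this experiment):

## /etc/grafana/provisioning/datasources/prometheus.yaml (sketch)
apiVersion: 1
datasources:
  - name: prometheus-integrated
    type: prometheus
    access: proxy
    url: http://192.168.12.34:9090
    isDefault: true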

View the reports in Grafana

Using the Overview dashboard as an example, the display is abnormal: the information of the two clusters is mixed together and there is no way to tell them apart. In this example, tidb-c1-v409 and tidb-c2-v409 each have three TiDB nodes, but all of these nodes appear mixed together in the Overview dashboard.

Take Overview dashboard -> Service Port Status as an example and look at the panel definition: probe_success{group="tidb"} == 1

The expression carries no cluster information, so add the cluster label manually:

count(probe_success{cluster="tidb-c1-v409", group="tidb"} == 1)

After this modification, the TiDB node information of tidb-c1-v409 is displayed correctly.

Inject cluster information into the dashboards

By manually injecting the cluster label, we verified that the dashboard displays correctly.

The following logic can be used to inject cluster information into the dashboards (a minimal sketch follows the list):

  • Query http://192.168.12.34:9090/api/v1/targets with curl -s to obtain the scrape URLs of all targets, then traverse those URLs to get the full list of metric names

  • Iterate over all metric names and inject the cluster label into each expression of the dashboard JSON, taking overview.json as an example

  • For an expr without a label selector, such as "expr": "node_memory_MemAvailable_bytes", inject the cluster label directly: "expr": "node_memory_MemAvailable_bytes{cluster=\"tidb-c1-v409\"}"

  • For an expr that already has a selector, such as "expr": "\ncount(probe_success{group=\"tidb\"} == 1)", add the cluster label to it: "expr": "\ncount(probe_success{cluster=\"tidb-c1-v409\", group=\"tidb\"} == 1)"
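
The sketch below only illustrates the idea for the selector case; it is not the real script (which also handles bare metric names, metric by metric), and the jq-based approach is an assumption of mine:

#!/bin/bash
## inject-cluster-sketch.sh (illustration only, not tidb_dashboard_inject_cluster.sh)
## usage: ./inject-cluster-sketch.sh tidb-c1-v409 /root/dashboards/overview.json
cluster="$1"
dashboard="$2"

## walk the dashboard JSON and add cluster="<cluster>" to every label selector
## that already exists inside an "expr" field
jq --arg c "$cluster" '
  walk(
    if type == "object" and (.expr? | type) == "string"
    then .expr |= gsub("\\{"; "{cluster=\"\($c)\", ")
    else .
    end)' "$dashboard" > "${dashboard}.injected"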

For details, see tidb_dashboard_inject_cluster.sh.

Run the tidb_dashboard_inject_cluster.sh script to inject cluster information. Copy the original Dashboard folder each time and run the script:

[root@r34 ~]# rm -rf dashboards && cp -r dashboards-bak/ dashboards && ./tidb_dashboard_inject_cluster.sh "tidb-c1-v409" "/root/dashboards" "192.168.12.34:9090"

Check the injected script:

[root@r34 dashboards]# cat overview.json | grep expr | grep -v tidb-c1-v409
              "expr": "sum(increase(tidb_server_execute_error_total[1m])) by (type)",
              "expr": "sum(rate(tikv_channel_full_total[1m])) by (instance, type)",
              "expr": "sum(rate(tikv_server_report_failure_msg_total[1m])) by (type,instance,store_id)",

These expressions reference metrics that do not appear in /tmp/tidb-metirc (the list of metric names collected earlier), so they were not injected automatically and can be changed by hand. Since Prometheus holds no data for these metrics anyway, they are not that important.

Import the redefined report into Grafana

You can use the script import-dashboard.sh to batch-import the dashboards into Grafana through the Grafana HTTP API.
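
A minimal sketch of such a batch import might look like this (assuming a Grafana instance on port 3000 and an API key with editor rights; import-dashboard.sh itself may differ):

## POST each dashboard JSON to the Grafana HTTP API
GRAFANA_URL="http://192.168.12.34:3000"
API_KEY="<grafana-api-key>"

for f in /root/dashboards/*.json; do
  ## wrap the dashboard in the payload expected by /api/dashboards/db
  jq -n --slurpfile d "$f" '{dashboard: ($d[0] | .id = null), overwrite: true}' |
  curl -s -X POST "$GRAFANA_URL/api/dashboards/db" \
       -H "Authorization: Bearer $API_KEY" \
       -H "Content-Type: application/json" \
       --data-binary @-
  echo
done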

See SOP Series 14 for details on how multiple TiDB clusters share a Grafana.

New problems introduced after metric integration

By doing so, we have been able to integrate Prometheus and Grafana from different clusters into the same Prometheus + Grafana monitoring platform. What are the risks of this:

  • New bugs may be introduced during the integration process. This is inevitable; customization requires some tolerance for bugs, and things will improve with later operation and maintenance

  • A large volume of metric data may cause Prometheus performance problems

  • Prometheus and Grafana are still not highly available

This one-monitoring-system-per-cluster approach is not unique to TiDB; Kubernetes monitoring collects metrics the same way. Since Prometheus itself only supports standalone deployment, not clustering, it offers neither high availability nor horizontal scaling, and its storage capacity is limited by the disk of a single machine. In the all-in-one scenario, Prometheus collects a large amount of data, consumes a lot of resources, and may not deliver optimal performance. Splitting Prometheus therefore becomes inevitable.

However, in real environments, to save resources and simplify operations, many enterprises integrate the monitoring of several clusters into one platform, as described above. "More, faster, better, cheaper" is hard to achieve on a single monitoring platform: storing more data means queries cannot stay fast, and keeping data for a long time means storage cannot stay cheap.

Performance problems can be solved from the following aspects:

  • Delete low cost-performance metrics: those with low usage but high space consumption

  • Shorten the retention period of the historical data stored in Prometheus

  • Offload Prometheus data into a data warehouse

  • Aggregate data through federation

Dealing with low cost-performance metrics

Metrics with low usage and a high storage footprint should be removed, as long as business needs are still met. Such low cost-performance metrics can easily drive Prometheus to OOM. You can use the following expression as an alerting rule to find the metrics that occupy the most space; if their usage is low, drop them with the drop action in relabel_configs.

count by (__name__)({__name__=~".+"}) > 10000
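
As a sketch of the drop side (the metric name below is hypothetical), a metric_relabel_configs block can discard such series after the scrape and before storage:

scrape_configs:
  - job_name: "tidb"
    static_configs:
    - targets: ['192.168.12.31:12080']
    metric_relabel_configs:
    - source_labels: ['__name__']
      regex: 'tidb_some_high_cardinality_metric.*'   # hypothetical low-value metric
      action: drop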

Prometheus split problem

What we just did: merge the information of different clusters into one monitoring platform. What we will do next: split the data of one monitoring platform across multiple Prometheus instances.

The splitting of Prometheus can be considered from the following dimensions:

  • Split by business dimension

  • Sharding large services

Split by business dimension

Splitting by business dimension runs counter to our goal; if we split that way, we might as well have done nothing.

Sharding large businesses

When the business is very complex or historical data must be retained for a long time, you can consider sharding a single business across multiple Prometheus instances. But if that is the situation, there was no need to integrate the data in the first place only to split it again.

Trade-offs and compromises between splitting and integration

Integration can cause performance problems; to address those problems, we split Prometheus. To be or not to be, that is the question. Suppose we take a hybrid approach as a compromise:

This introduces a new problem: our queries may now come from different datasources, and within one dashboard we cannot aggregate queries across datasources. There are basically two ways to solve this:

  • Prometheus federated queries (see the configuration sketch after this list)

    • Each line of business has its own Prometheus monitoring system, which may monitor several subsystems

    • A central Prometheus Server aggregates the data from the per-business Prometheus servers

  • Centralized data storage

    • Data from Prometheus is periodically imported into a data warehouse, and Prometheus itself retains only a short window of data

    • Prometheus acts only as an adapter and stores no data; the collected data is written directly into a central database

    • Prometheus is itself a time-series database, and it can be replaced by other databases such as InfluxDB (whose open source version does not support high availability) or TimescaleDB
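
For the federation option, a central Prometheus scrapes the /federate endpoint of each per-business Prometheus. A minimal sketch (the second address is an example) might look like this:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 60s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"tidb.*"}'        # pull only the series we care about
    static_configs:
    - targets:
      - '192.168.12.34:9090'       # business-line Prometheus 1
      - '192.168.12.35:9090'       # business-line Prometheus 2 (example)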

Store Prometheus data centrally

Prometheus + Grafana is, in essence, a monitoring system similar to a data warehouse model: a database plus a report-presentation layer.

Grafana is also a reporting tool that supports a variety of data sources. In addition to Prometheus, we can store data in relational databases such as PostgreSQL or MySQL.

We have two options for importing metrics into the database:

  • Extract the metrics into the database directly with our own program;

  • Extract the data into the database through Prometheus and an accompanying adapter: one more layer of middleware and more components, but less work.

Import Prometheus data into PostgreSQL

TimescaleDB, an open source time-series database built on PostgreSQL, is itself very similar to Prometheus.

Compared with Prometheus, it offers better query speed, high availability, and horizontal scaling, and SQL statements are friendlier to operations staff than PromQL. Timescale itself provides prometheus-postgresql-adapter, which is stable, efficient, and easy to maintain compared with other third-party tools.

For details about how to install prometheus-postgresql-adapter, see the prometheus-postgresql-adapter installation guide.
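
Once the adapter is running, Prometheus only needs its remote read/write endpoints. A minimal sketch (assuming the adapter runs on 192.168.12.34 with its default port 9201) would be:

## prometheus.yml - send samples to, and read them back from, the adapter
remote_write:
  - url: "http://192.168.12.34:9201/write"
remote_read:
  - url: "http://192.168.12.34:9201/read"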

Taking prometheus-postgresql-adapter one step further

We can now store Prometheus metric data in PostgreSQL, so how do we display the dashboards in Grafana? There are two paths we can take:

  • Use PostgreSQL directly as the Grafana datasource: a simple architecture, but a lot of dashboard rework;

  • Add another layer on top of PostgreSQL and keep reading the data with PromQL: a more complex architecture, but little rework.

At present, the prometheus-postgresql-adapter project has been superseded by Promscale. Compared with prometheus-postgresql-adapter, Promscale makes it more convenient to use TimescaleDB + PostgreSQL as remote storage for Prometheus.

Promscale provides the following features (a query sketch follows the list):

  • Dual-engine metric query and analysis with both SQL and PromQL

  • PostgreSQL provides persistent storage and good performance for historical data analysis

  • High availability of data

  • ACID properties

  • TimescaleDB provides horizontal scalability
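
As an illustration (assuming Promscale's defaults: the Prometheus-compatible HTTP API on port 9201 and per-metric views in the prom_metric schema), the same metric could be queried with either engine:

## PromQL through Promscale's Prometheus-compatible API
curl -s 'http://192.168.12.34:9201/api/v1/query?query=up'

-- SQL directly against TimescaleDB/PostgreSQL
SELECT time, value, jsonb(labels) AS labels
FROM   prom_metric.up
ORDER  BY time DESC
LIMIT  10;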

Prometheus multi-tenant and high availability solutions

Both Thanos and Cortex are high-availability and multi-tenant solutions for Prometheus, and both are CNCF incubating projects.

Cortex was built as a scalable, easy-to-operate solution for Prometheus monitoring and long-term storage. Its multi-tenant feature isolates different Prometheus sources within a single cluster, allowing untrusted parties to share one cluster. Thanos is an easy-to-install solution that runs alongside the user's existing Prometheus instances and turns them into a monitoring system with long-term storage.

Both are good Prometheus multi-tenant and high-availability solutions, but Thanos is the one chosen for this article:

  • All components in Thanos are stateless.

  • Monitoring data and cluster state are persisted to the object storage OSS

  • Support for high availability deployment of Prometheus

  • The documentation is solid and the user base is larger than that of other solutions

Thanos features and components

Sidecar:

  • Runs as a sidecar container in the same Pod as Prometheus

  • Uploads Prometheus data chunks to object storage (OSS)

  • Supports multiple object storage services (OSS), such as Aliyun, Tencent Cloud, S3, Google Cloud Storage, and Azure Storage

  • Integrates seamlessly with Prometheus Operator deployments

Store:

  • Chunks are retrieved from the object storage (OSS) for querying long-term monitoring metrics

  • Time-based partitioning queries

  • Tag-based partitioning queries

Compact:

  • Create Downsampled chunks for monitoring data in OSS to expedite data queries over long periods

Query:

  • Serves as the PromQL query entry point in place of Prometheus queries

  • Deduplicates data coming from different data sources (multiple Stores)

  • Supports partial responses

Rule:

  • A simplified Prometheus (it mainly uses the rule functionality and does not scrape data or parse PromQL queries)

  • Writes the rule results to OSS in the Prometheus 2.0 storage format

  • Also acts as a storage node for the rule results (exposing its TSDB blocks through the StoreAPI and uploading them to OSS)

Bucket:

  • Used to inspect the monitoring data stored in the object storage bucket
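
Putting the components above together, a minimal launch sketch might look like the following (addresses, paths, and the bucket config file are examples; flags are abbreviated to the essentials):

## run a Sidecar next to each Prometheus instance
thanos sidecar \
  --tsdb.path            /data/prometheus \
  --prometheus.url       http://localhost:9090 \
  --objstore.config-file bucket.yml \
  --grpc-address         0.0.0.0:10901

## run Query and let it fan out to the Sidecars of both clusters
thanos query \
  --http-address 0.0.0.0:9091 \
  --store 192.168.12.34:10901 \
  --store 192.168.12.35:10901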

Thanos’ Docker-compose case

The following project is a docker-compose implementation that connects TiDB to Thanos: tidb-thanos-quick-start

The docker-compose environment for Thanos requires the following images:

  • prom/prometheus:v2.24.1

  • quay.io/thanos/than…

  • minio/minio:RELEASE.2020-05-01T22-19-14Z

  • prom/node-exporter:v0.18.1

  • prom/alertmanager:v0.21.0

  • gcr.io/google_cont…

  • grafana/grafana:7.3.7

In this docker-compose setup, two Prometheus instances are created, each monitoring one TiDB cluster. We can deploy two TiDB clusters without their own monitoring components and let the Prometheus instances in docker-compose collect their metrics. This allows us to query the monitoring data of both clusters in the Query component at the same time, and even compare them.