“This is day 25 of my participation in the November Writing Challenge. See details: The Last Writing Challenge of 2021”

1. Overview

In the architecture of Prometheus, Prometheus Server does not monitor specific targets directly; its main tasks are collecting data, storing it, and serving queries. So to monitor something like the CPU usage of a host, we need an Exporter. Prometheus periodically pulls monitoring samples from the HTTP endpoint (usually /metrics) that the Exporter exposes.

As the description above suggests, Exporter is a fairly open concept. It can be a stand-alone application that runs separately from the monitored target, or it can be built directly into the monitored target, as long as it provides monitoring samples to Prometheus in the standard format.
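To make the "open concept" concrete, here is a minimal sketch of a stand-alone exporter written with only the Python standard library. It is not how node_exporter is implemented; the metric name demo_load1 and port 9200 are assumptions made up for this illustration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1):
    """Render one gauge in the Prometheus text exposition format."""
    # demo_load1 is a hypothetical metric name used only in this sketch.
    return (
        "# HELP demo_load1 1-minute load average (demo metric).\n"
        "# TYPE demo_load1 gauge\n"
        f"demo_load1 {load1}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the current sample; everything else is 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # os.getloadavg() is available on Unix-like systems only.
        body = render_metrics(os.getloadavg()[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block and serve /metrics; the port is an arbitrary choice for this sketch."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; anything that answers an HTTP GET with text in this format can be scraped by Prometheus in the same way.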

To collect host metrics such as CPU, memory, and disk information, we can use node_exporter.

node_exporter collects server runtime metrics, including loadavg, filesystem, and meminfo. It plays a role similar to zabbix-agent in traditional host monitoring.

node_exporter is officially provided and maintained by the Prometheus project. It is not bundled with Prometheus, but in practice it is close to a must-have exporter.

2. Functions

node_exporter exposes hardware and operating system metrics provided by *NIX kernels.

  • For Windows, use wmi_exporter
  • To collect NVIDIA GPU metrics, you can use prometheus-dcgm

Which collectors node_exporter supports depends on the *NIX operating system, for example:

  • diskstats supports Darwin and Linux
  • cpu supports Darwin, Dragonfly, FreeBSD, Linux, Solaris, etc.

For details, refer to the node_exporter documentation.

You can choose which collector modules node_exporter enables. Older releases take a -collectors.enabled parameter listing the modules; newer releases use --collector.<name> to enable a module and --no-collector.<name> to disable one. If you specify nothing, the default set of collectors is used.

3. Installation

1. Binary package

node_exporter is also written in Golang and has no third-party dependencies, so you only need to download and unpack it to run it. You can obtain the latest node_exporter binary package from prometheus.io/download/.

curl -OL https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.darwin-amd64.tar.gz
tar -xzf node_exporter-1.1.2.darwin-amd64.tar.gz

Run node_exporter:

cd node_exporter-1.1.2.darwin-amd64
cp node_exporter /usr/local/bin/
node_exporter

After a successful startup, you can see the following output:

INFO[0000] Listening on :9100                            source="node_exporter.go:76"

Visit http://localhost:9100/ to see the following results:

# curl http://localhost:9100
<html>
	<head><title>Node Exporter</title></head>
	<body>
		<h1>Node Exporter</h1>
		<p><a href="/metrics">Metrics</a></p>
	</body>
</html>

2. Docker container

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter \
  --path.rootfs /host

4. node_exporter monitoring metrics

Whether node_exporter is deployed as a binary or in Docker, you can visit http://${IP}:9100/metrics to see all the monitoring data that node_exporter has collected from the current host, as shown below:

...
node_cpu_seconds_total{cpu="2",mode="nice"} 1.43
node_cpu_seconds_total{cpu="2",mode="softirq"} 77.66
node_cpu_seconds_total{cpu="2",mode="steal"} 618.42
node_cpu_seconds_total{cpu="2",mode="system"} 20981.5
node_cpu_seconds_total{cpu="2",mode="user"} 26925.45
node_cpu_seconds_total{cpu="3",mode="idle"} 2.52970118e+06
node_cpu_seconds_total{cpu="3",mode="iowait"} 58.83
node_cpu_seconds_total{cpu="3",mode="irq"} 0
node_cpu_seconds_total{cpu="3",mode="nice"} 1.57
node_cpu_seconds_total{cpu="3",mode="softirq"} 54.37
node_cpu_seconds_total{cpu="3",mode="steal"} 538.14
node_cpu_seconds_total{cpu="3",mode="system"} 18511.33
node_cpu_seconds_total{cpu="3",mode="user"} 24297.44
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
node_disk_io_now{device="dm-0"} 0
node_disk_io_now{device="dm-1"} 0
node_disk_io_now{device="vda"} 0
# HELP node_disk_io_time_seconds_total Total seconds spent doing I/Os.
# TYPE node_disk_io_time_seconds_total counter
node_disk_io_time_seconds_total{device="dm-0"} 0.321
node_disk_io_time_seconds_total{device="dm-1"} 13765.443000000001
node_disk_io_time_seconds_total{device="vda"} 317.065
...

Each monitoring indicator is preceded by information similar to the following:

# HELP node_disk_io_time_seconds_total Total seconds spent doing I/Os.
# TYPE node_disk_io_time_seconds_total counter
node_disk_io_time_seconds_total{device="dm-0"} 0.321
node_disk_io_time_seconds_total{device="dm-1"} 13765.443000000001
node_disk_io_time_seconds_total{device="vda"} 317.065

HELP describes what the metric means, and TYPE gives the data type of the metric.
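As a rough illustration of how this format is structured, the following sketch parses HELP lines, TYPE lines, and samples with the Python standard library. It is a simplified parser written for this article, not the one Prometheus uses: it assumes label values contain no escaped quotes and it ignores optional timestamps.

```python
import re

# name{label="value",...} value   (labels are optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Return (meta, samples) parsed from text in the exposition format.

    meta maps metric name -> {"help": ..., "type": ...};
    samples is a list of (name, labels_dict, float_value).
    """
    meta = {}
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            rest = parts[3] if len(parts) > 3 else ""
            meta.setdefault(parts[2], {})[parts[1].lower()] = rest
            continue
        if line.startswith("#"):
            continue  # any other comment line
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return meta, samples


SAMPLE_TEXT = """\
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
node_disk_io_now{device="dm-0"} 0
node_disk_io_now{device="vda"} 0
"""

meta, samples = parse_exposition(SAMPLE_TEXT)
print(meta["node_disk_io_now"]["type"], len(samples))
```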

In addition to these, depending on the host system, you may also see the following metrics on this page:

  • node_boot_time_seconds: system boot time
  • node_cpu_seconds_total: CPU time used
  • node_disk_*: disk I/O
  • node_filesystem_*: filesystem usage
  • node_load1: system load
  • node_memory_*: memory usage
  • node_network_*: network bandwidth
  • node_time_seconds: current system time
  • go_*: Go runtime metrics of node_exporter
  • process_*: metrics about the node_exporter process itself

5. Configure Prometheus

To let Prometheus Server collect monitoring data from the current node, modify the Prometheus configuration file. Edit prometheus.yml and add the following under the scrape_configs node:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Collect node exporter monitoring data
  - job_name: 'node'
    static_configs:
      - targets: ['172.16.106.84:9100']

Restart Prometheus Server, then open http://${IP}:9090/ to access the Prometheus UI. Enter the expression up and click the Execute button to see one result per target:

A value of 1 means the target is healthy; a value of 0 means the scrape failed.
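The same check can be scripted against the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below uses only the Python standard library; the helper names are made up for this example, and the base URL must be replaced with the address of your own Prometheus Server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def target_health(response):
    """Map instance -> True/False from the decoded JSON of a query for `up`."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    health = {}
    for series in response["data"]["result"]:
        instance = series["metric"].get("instance", "")
        _timestamp, value = series["value"]  # the value arrives as a string
        health[instance] = (value == "1")
    return health


def fetch_health(base_url):
    """Query a running Prometheus Server (needs network access)."""
    with urlopen(build_query_url(base_url, "up")) as resp:
        return target_health(json.load(resp))
```

For example, fetch_health("http://localhost:9090") returns a dictionary keyed by instance, which is convenient for a quick script-level check that every target is up.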

Now check the targets status:

Note: A group of instances that serve the same collection purpose, such as multiple replicas of the same process, is managed as a job.

* job: node
    * instance 1: 1.2.3.4:9100
    * instance 2: 5.6.7.8:9100

The corresponding settings in the configuration file are as follows:

  - job_name: node
    static_configs:
      - targets: ['172.16.106.116:9100']
        labels:
          instance: vm-1
      - targets: ['172.16.106.119:9100']
        labels:
          instance: vm-2

6. Query monitoring data using PromQL

Prometheus UI is the built-in visual management interface of Prometheus. It lets users view the current configuration of Prometheus and the running status of monitoring tasks. In the Graph panel, users can query monitoring data in real time with PromQL expressions. For example, to see how the host load changes, query node_load1: Prometheus returns the load samples it has collected in chronological order, which shows the load trend over time.

PromQL is the powerful query language designed for Prometheus. Besides using metric names as query keywords, PromQL provides a number of built-in functions that help users process time series data further. For example, the rate() function calculates how fast the sample values change per unit of time, that is, the growth rate. Because node_cpu_seconds_total counts the CPU seconds spent in each mode, applying rate() to it approximates CPU utilization:


rate(node_cpu_seconds_total[2m])
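To see the arithmetic behind this, here is a simplified sketch of a per-second rate over the samples inside a window. It is an approximation written for illustration: it handles counter resets, but it omits the extrapolation to the window boundaries that Prometheus applies, so its results can differ slightly from rate().

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time,
    containing only the points that fall inside the query window.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _timestamp, value in samples[1:]:
        # A drop means the counter was reset (for example after a restart),
        # so the whole new value counts as increase.
        increase += value if value < previous else value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Idle CPU seconds of one core, sampled every 60 s (made-up numbers).
idle = [(0, 100.0), (60, 154.0), (120, 208.0)]
print(simple_rate(idle))  # 108 idle seconds over 120 s of wall time
```

A result close to 0.9 for mode="idle" means the core was idle about 90% of the time, which is why a rate over CPU seconds reads as utilization.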

If you do not care which CPU the time was spent on, you can use a without clause to aggregate the data after removing the cpu label:


avg without(cpu) (rate(node_cpu_seconds_total[2m]))
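The aggregation can be pictured as "drop one label, group by whatever is left, then average each group". The sketch below mimics that behaviour on plain Python data; the function name and the sample values are invented for the illustration.

```python
from collections import defaultdict


def avg_without(series, drop):
    """Average values after removing one label, grouping by the remaining labels.

    series: list of (labels_dict, value).
    Returns {tuple of remaining (label, value) pairs: average}.
    """
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop))
        groups[key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Per-core rates for two modes (made-up numbers).
rates = [
    ({"cpu": "2", "mode": "idle"}, 0.75),
    ({"cpu": "3", "mode": "idle"}, 0.25),
    ({"cpu": "2", "mode": "user"}, 0.125),
    ({"cpu": "3", "mode": "user"}, 0.375),
]
for key, value in avg_without(rates, "cpu").items():
    print(dict(key), value)
```

After the cpu label is removed, the four series collapse into one series per mode, each holding the average across cores.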

7. Data visualization

Prometheus UI makes it easy to validate PromQL quickly and offers ad hoc visualization. In most scenarios, however, introducing a monitoring system also means building dashboards for long-term use. For that, consider a third-party visualization tool such as Grafana, an open source visualization platform with full support for Prometheus.

Binary package installation:

wget https://dl.grafana.com/oss/release/grafana-7.4.5.linux-amd64.tar.gz
tar -zxvf grafana-7.4.5.linux-amd64.tar.gz

Docker installation:

docker run -d --name=grafana -p 3000:3000 grafana/grafana

To access the Grafana interface, visit http://localhost:3000 and log in with the default account admin/admin. The Grafana home page shows a getting-started wizard that covers these main steps: install, add a data source, create a dashboard, invite members, and install apps and plugins.

Here, add Prometheus as the default data source: set the data source type to Prometheus and enter the access address of Prometheus. If the configuration is correct, clicking “Add” displays a message that the connection succeeded.

After adding the data source, you can set up a node_exporter dashboard in Grafana. Grafana is open source software, and its community encourages users to share dashboards via grafana.com/dashboards, so there are plenty of dashboards you can use directly. For example, here I selected a popular template (ID: 8919).

8. Further knowledge

1. Recommended exporters

node_exporter is one of the exporters officially recommended by Prometheus, and there are other similar ones.

The officially recommended exporters live under github.com/prometheus. Beyond these, there are many third-party exporters developed and published by individuals or organizations. If you have custom collection needs, you can write your own.

2. Pay attention to the version

Because node_exporter is a relatively old component, earlier releases did not incorporate some best practices, such as following the Prometheus metric naming conventions. It is therefore recommended to use a newer version; at the time of writing (March 2021) the latest is 1.1.2.

The names of some metrics have changed (old name -> new name):

* node_cpu ->  node_cpu_seconds_total
* node_memory_MemTotal -> node_memory_MemTotal_bytes
* node_memory_MemFree -> node_memory_MemFree_bytes
* node_filesystem_avail -> node_filesystem_avail_bytes
* node_filesystem_size -> node_filesystem_size_bytes
* node_disk_io_time_ms -> node_disk_io_time_seconds_total
* node_disk_reads_completed -> node_disk_reads_completed_total
* node_disk_sectors_written -> node_disk_written_bytes_total
* node_time -> node_time_seconds
* node_boot_time -> node_boot_time_seconds
* node_intr -> node_intr_total

There are two ways to solve the versioning problem:

  • The first is to run both versions of node_exporter on the machine and have Prometheus scrape both.
  • The second is to use a metric converter, which translates the old metric names into the new ones.

For display in Grafana, you can look for a dashboard template that supports both sets of metrics.
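The metric converter mentioned above is a separate tool. Purely to illustrate the renaming idea, the following hypothetical helper rewrites old metric names in a PromQL expression, for example when migrating dashboard queries by hand. The function and its name are assumptions of this sketch, and the mapping covers only the pure renames from the list above.

```python
import re

# Old -> new names copied from the list above. The two entries whose unit also
# changed (node_disk_io_time_ms: milliseconds -> seconds, and
# node_disk_sectors_written: sectors -> bytes) are left out on purpose,
# because renaming them without rescaling would give wrong values.
RENAMED = {
    "node_cpu": "node_cpu_seconds_total",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
    "node_memory_MemFree": "node_memory_MemFree_bytes",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
    "node_disk_reads_completed": "node_disk_reads_completed_total",
    "node_time": "node_time_seconds",
    "node_boot_time": "node_boot_time_seconds",
    "node_intr": "node_intr_total",
}

# \b never matches between two word characters, so "node_cpu" inside
# "node_cpu_seconds_total" is not touched and the rewrite is idempotent.
_OLD_NAME = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RENAMED), key=len, reverse=True)) + r")\b"
)


def upgrade_expr(expr):
    """Replace old node_exporter metric names in a PromQL expression."""
    return _OLD_NAME.sub(lambda match: RENAMED[match.group(1)], expr)


print(upgrade_expr('rate(node_cpu{mode="idle"}[2m])'))
```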

3. Implementation principle

The main package of node_exporter begins with the following imports:

package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"os"
	"os/user"
	"sort"

	"github.com/prometheus/common/promlog"
	"github.com/prometheus/common/promlog/flag"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/prometheus/common/version"
	"github.com/prometheus/exporter-toolkit/web"
	"github.com/prometheus/node_exporter/collector"
	kingpin "gopkg.in/alecthomas/kingpin.v2"
)

As you can see, implementing an exporter requires importing the github.com/prometheus/client_golang/prometheus library. client_golang is the official Go library for Prometheus. It can be used to instrument existing applications, and it also serves as a base library for talking to the Prometheus HTTP API.

For example, it defines the basic metric types and their methods:

Counter: collects monotonically increasing data, such as the number of events.
Gauge: collects a current state that can go up or down, such as the number of database connections.
Histogram: samples observations into buckets to capture their distribution, such as response latency.
Summary: also samples observations such as response latency, similar to Histogram, but reports quantiles calculated on the client side.
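The semantics of the first three types can be sketched in a few lines. The classes below are a plain Python illustration of the behaviour only; they are not the client_golang API, and the class layout is an assumption of this example.

```python
import bisect


class Counter:
    """A value that only ever goes up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """A value that can be set, raised, or lowered."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations per upper bound, plus a running sum."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.sum = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the "le" bucket.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return [(upper_bound, cumulative_count), ...] as exposed to scrapers."""
        total = 0
        result = []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            result.append((bound, total))
        return result


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.5, 0.7, 3.0):
    latency.observe(seconds)
print(latency.cumulative())
```

Bucket counts are cumulative, so each upper bound reports every observation less than or equal to it, and the +Inf bucket equals the total number of observations.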

Reference address: github.com/prometheus/…

A detailed explanation of the client_golang library can be found in Prometheus principles and source code analysis

9. Summary

One might think that with a combination such as Prometheus + Grafana + node_exporter, there is basically no need to run commands by hand anymore. However, a monitoring platform can only show data that the monitored target is able to provide. A small collector like node_exporter does not capture all the performance data of the whole system, only the common counters. The values of those counters are the same whether you read them with a command or with a fancy tool. So whether the data appears on the monitoring platform or on the command line, what matters most is knowing what each value means and how its changes affect the next steps of performance testing and analysis.
