This article belongs to the K8S monitoring series, and the other articles are:

  1. K8s Monitoring (1) Install Prometheus
  2. K8s Monitoring (2) Monitoring Cluster Components and Pods
  3. K8s Monitoring (3) Prometheus-Adapter

This is the fourth article in the K8s monitoring series, and it covers monitoring the host (node) metrics. node_exporter is the official exporter and most people use it for this, but I prefer Telegraf, because node_exporter has the following major pain points:

  • There are too many metrics. Take CPU alone: each core exposes 6 metrics, so a machine with 72 cores produces 432 CPU metrics by itself (see the sample output after this list).
  • You can’t pick and choose which metrics to collect; a collector is either fully on or fully off, never just part of it.
  • Custom monitoring scripts are not supported.
  • There are no metrics for the 11 TCP states (or maybe I just don’t know where to look?), and I have no idea what to do with the pile of network metrics it does expose; I can’t make sense of them.
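To illustrate the first point, this is roughly what node_exporter's per-CPU output looks like (the numbers here are made up); one line per core per mode:

node_cpu_seconds_total{cpu="0",mode="idle"}    1.462304e+06
node_cpu_seconds_total{cpu="0",mode="iowait"}  563.21
node_cpu_seconds_total{cpu="0",mode="system"}  4823.55
node_cpu_seconds_total{cpu="0",mode="user"}    19231.79
...
node_cpu_seconds_total{cpu="71",mode="user"}   18754.02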

Telegraf has no such problems. With that in mind, this article will deploy both, and the choice is up to you.

As usual, all the YAML files have been uploaded to GitHub.

node_exporter

Just note the following:

  • Use a DaemonSet so that one pod runs on every K8s node;
  • Mount the host’s /proc and /sys, and the host’s root directory as well;
  • Use the host’s network namespace (a rough sketch of these points follows this list).
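A rough sketch of how those points translate into the DaemonSet (names, image tag, and mount paths here are placeholders; the real files on GitHub are the authoritative version):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostNetwork: true              # host network namespace
      containers:
      - name: node-exporter
        image: prom/node-exporter    # pick a concrete tag in practice
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        volumeMounts:
        - { name: proc, mountPath: /host/proc, readOnly: true }
        - { name: sys, mountPath: /host/sys, readOnly: true }
        - { name: root, mountPath: /host/root, readOnly: true }
      volumes:
      - { name: proc, hostPath: { path: /proc } }
      - { name: sys, hostPath: { path: /sys } }
      - { name: root, hostPath: { path: / } }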

There are only five deployment files, all of whose names start with node-exporter; just kubectl apply them all.
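Assuming the five files sit in your current directory, something like this applies them all:

for f in node-exporter*.yml; do kubectl apply -f "$f"; done

Once the DaemonSet pods are running, you can verify the metrics endpoint on any node: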

curl 127.0.0.1:9100/metrics

Once it’s confirmed that metrics are being collected, modify prometheus-config.yml and add the following configuration:

- job_name: node_exporter
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 30s
  scrape_timeout: 30s
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: node-exporter

Note that the - in the service label name k8s-app has to be written as _ in the meta label (hence __meta_kubernetes_service_label_k8s_app above); otherwise reloading Prometheus will report an error.
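For reference, the Service is assumed to carry a k8s-app: node-exporter label, which is what the keep rule above matches; a minimal sketch (the real file on GitHub is authoritative):

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter   # becomes __meta_kubernetes_service_label_k8s_app
spec:
  clusterIP: None
  selector:
    name: node-exporter
  ports:
  - name: metrics
    port: 9100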

After the modification, run kubectl apply -f prometheus-config.yml. At this point, it’s best to get into the Prometheus pod and confirm the configuration file is valid before reloading.
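If promtool is available inside your Prometheus container (the official image ships it), you can check the file directly; the pod name below is a placeholder, and ConfigMap changes take a short while to propagate into the pod:

kubectl -n monitoring exec PROMETHEUS_POD -- promtool check config /etc/prometheus/config/prometheus.yml

Once the config checks out, trigger the reload: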

curl -XPOST POD_IP:9090/-/reload

The new targets will then show up on the Targets page of the Prometheus web UI.
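You can also confirm it from the query page with the standard up metric, whose job label matches the job_name above; every target scraped successfully reports 1:

up{job="node_exporter"}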

telegraf

node_exporter has too many obscure metrics, which also take up extra resources, so I chose the more customizable Telegraf. Telegraf is a metrics collection tool written in Go by InfluxData, whose other product, InfluxDB, is well known; these two, together with Chronograf and Kapacitor, make up InfluxData’s TICK monitoring stack.

I won’t go into TICK here; we’re only using Telegraf. Telegraf is similar to Logstash in that it is divided into four parts: input, processor, aggregator, and output. All of Telegraf’s functionality is provided by plug-ins, and every plug-in falls into one of these four categories.

This article uses the input, output, and processor parts; the aggregator (which aggregates values over a time window to compute maximums, minimums, averages, and so on) is left for interested readers to explore.

Here we use Telegraf to collect the host’s performance metrics. Since there are quite a few of them (CPU, memory, disk, network, and so on), multiple input plug-ins are used. Some plug-ins offer options that give us finer control over which metrics to collect, which is handy and far more flexible than node_exporter’s collection.

Since the metrics are scraped by Prometheus, the prometheus_client output is used, and everything collected is exposed on the /metrics page.

Let’s start with the configuration file, which holds all of Telegraf’s configuration. Before getting into it, there are a couple of Telegraf concepts to know:

  • field: the name of a metric;
  • tag: a label attached to a metric (see the example right after this list).
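For example (values made up), a point collected by the cpu input and the metric the prometheus_client output exposes for it look like this: the measurement and field names are joined into the Prometheus metric name, and the tags become labels.

# Telegraf's internal representation (line protocol): measurement, tags, fields
cpu,cpu=cpu-total usage_idle=98.5

# What Prometheus scrapes from the /metrics page
cpu_usage_idle{cpu="cpu-total"} 98.5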

To avoid repeating things later, I’m posting the ConfigMap content directly; the Telegraf configuration itself starts at [agent].

apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    name: telegraf
data:
  telegraf.conf: |+
    [agent]
      interval = "10s"
      round_interval = true
      collection_jitter = "1s"
      omit_hostname = true
    [[outputs.prometheus_client]]
      listen = ":9273"
      collectors_exclude = ["gocollector", "process"]
      metric_version = 2
    [[inputs.cpu]]
      percpu = false
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
      [inputs.disk.tagdrop]
        path = ["/etc/telegraf", "/dev/termination-log", "/etc/hostname", "/etc/hosts", "/etc/resolv.conf"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
      fielddrop = ["slab", "wired", "commit_limit", "committed_as", "dirty", "high_free", "high_total", "huge_page_size", "huge_pages_free", "low_free", "low_total", "mapped", "page_tables", "sreclaimable", "sunreclaim", "swap_cached", "swap_free", "vmalloc_chunk", "vmalloc_total", "vmalloc_used", "write_back", "write_back_tmp"]
    [[inputs.processes]]
    [[inputs.system]]
    [[inputs.netstat]]
    [[inputs.net]]
      ignore_protocol_stats = true
      interfaces = ["eth*", "bond*", "em*"]
      fielddrop = ["packets_sent", "packets_recv"]

The configuration file

The official documentation for the Telegraf configuration file is here; there isn’t much in it, so take a look if you like, but it doesn’t matter if you skip it, because I’ll explain everything used here. Telegraf’s configuration file uses the TOML format: [] indicates a dictionary and [[]] indicates a list. Translated into YAML, the configuration above looks like this:

agent:
  # Collection interval
  interval: 30s
  # Without this, it seems metrics would only be collected once
  round_interval: true
  # If multiple inputs are collected at exactly the same time they can cause CPU spikes; use this jitter to stagger them
  collection_jitter: 1s
  # Don't add a hostname tag to any metrics
  omit_hostname: true
inputs:
  - disk:
    # The file systems listed here are not collected
    ignore_fs: []
    # If the path tag matches one of the values below, the corresponding metrics are not collected
    tagdrop:
      path: ["/etc/telegraf", "/dev/termination-log", "/etc/hostname", "/etc/hosts", "/etc/resolv.conf"]
  - system: {}
  - cpu:
    # Per-core metrics; node_exporter forces these on and you can't turn them off, Telegraf can
    percpu: false
    # Definitely turn this on: collect total CPU usage
    totalcpu: true
    # Collect CPU time; decide for yourself whether you need it
    collect_cpu_time: false
    # Adds an active metric, the sum of all states except idle. If CPU time isn't collected,
    # 100 minus idle already gives the same value
    report_active: false
  - mem:
    # See the mem input's documentation for the fields it collects
    # I don't understand many of the memory metrics, so I simply drop them; decide for yourself
    fielddrop: []
outputs:
  - prometheus_client:
    listen: ":9273"
    # Exclude the Go collectors (goroutines, GC, etc.) and the process collector
    collectors_exclude: ["gocollector", "process"]

Of Telegraf’s four major parts, only the processor has no corresponding section here; at the moment it is only used to filter inputs, outputs, and aggregators. The fielddrop and tagdrop entries in the input configuration above belong to this filtering and are used to drop metrics. The filtering keywords are:

  • namepass: filters on the metric name; pass means whitelist. Its value is a list, and the elements can use wildcards;
  • namedrop: the blacklist version. Note that the name is not the same as the field; for example, the memory metrics have a total field, but the metric’s name is mem_total;
  • fieldpass: filters on the field name; its value is also a list;
  • fielddrop: the blacklist version;
  • tagpass: only metrics whose tags contain one of the given key/values are collected. Note that its value type is a dictionary, as shown above;
  • tagdrop: the blacklist version;
  • taginclude: this one removes tags rather than metrics; its value is a list, and only the tags in the list are kept;
  • tagexclude: deletes the tags in the list.

I only used tagdrop and fielddrop; use whichever of the others you need. These filters make it easy to drop the metrics we don’t want, which is very convenient.
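As a quick illustration (not something used in this article’s config), a whitelist version might look like this:

[[inputs.mem]]
  # keep only these two fields from the mem input
  fieldpass = ["used", "available"]

[[inputs.disk]]
  # only collect disk metrics whose path tag is exactly "/"
  [inputs.disk.tagpass]
    path = ["/"]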

With all that, you should have no trouble understanding the configuration used here. I only collect a handful of common system metrics; if you need more, check the official input plug-in documentation for the full list.
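One more tip: if you want to sanity-check the TOML before loading it into the ConfigMap, the telegraf binary can run the config once and print what it would collect (the file path here is just an example):

telegraf --config telegraf.conf --test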

pod

Telegraf is, of course, also deployed as a DaemonSet. There are a few key points worth mentioning about its pod configuration:

  • Don’t mount only /proc and /sys; mount the host’s root directory, otherwise the disk metrics will be problematic.
  • Use the HOST_PROC, HOST_SYS, and HOST_MOUNT_PREFIX environment variables so that Telegraf reads from the mounted host directories.
  • hostNetwork and hostPID must both be true.
  • Use securityContext to run the pod as a non-root user. The uid you specify is a uid on the host, regardless of whether that user exists in the image. Once the pod is running, you can check which user it runs as on the host with ps -ef | grep telegraf.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: telegraf:1.13.2-alpine
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: "HOST_PROC"
          value: "/host/proc"
        - name: "HOST_SYS"
          value: "/host/sys"
        - name: "HOST_MOUNT_PREFIX"
          value: "/host"
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
          readOnly: true
        - mountPath: /host
          name: root
          readOnly: true
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      tolerations:
      - operator: Exists
      terminationGracePeriodSeconds: 30
      volumes:
      - name: config
        configMap:
          name: telegraf
      - hostPath:
          path: /
        name: root

I won’t say much about Services, which are used for discovery by Prometheus. After applying all three files, modify the Prometheus configuration.
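For completeness, the Service is assumed to look roughly like this; the k8s-app: telegraf label is what the Prometheus keep rule below matches, and 9273 is the prometheus_client listen port from the ConfigMap:

apiVersion: v1
kind: Service
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  clusterIP: None
  selector:
    name: telegraf
  ports:
  - name: metrics
    port: 9273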

Modify Prometheus configuration

The Prometheus configuration may vary depending on usage.

    - job_name: telegraf
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - monitoring
      scrape_interval: 30s
      scrape_timeout: 30s
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: keep
        source_labels:
        - __meta_kubernetes_service_label_k8s_app
        regex: telegraf
      - source_labels:
        - __meta_kubernetes_endpoint_node_name
        target_label: instance

The only configuration added here, compared with the node_exporter job, is relabeling the instance tag to the node name instead of the default __address__. If you want to keep the default instance, you can change target_label to any other name you like. The reason I replaced the default instance with the node name is explained below.

The node name is there so that it works with the kubectl top node command (covered in the previous article of this series); the value of the instance tag will then correspond exactly to what you see with kubectl get node. Of course, if you don’t need the kubectl top node command, there’s no need to add this.

After modifying and applying the file, exec into the Prometheus container again and check whether /etc/prometheus/config/prometheus.yml has been updated. Once it has, exit back to the host and run:

curl -XPOST PROMETHEUS_CONTAINER_IP:9090/-/reload

After reload, you can access the metrics page directly from the host IP.

curl IP:9273/metrics

You can see that the metrics are clear and easy to understand, and there are far fewer of them than node_exporter produces.
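For example, a few lines from the page look roughly like this (values made up; the exact set depends on the plug-ins you enabled):

cpu_usage_idle{cpu="cpu-total"} 96.8
cpu_usage_system{cpu="cpu-total"} 1.4
mem_used_percent 37.2
disk_used_percent{device="sda1",fstype="ext4",mode="rw",path="/"} 41.5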

Modify Prometheus Adapter configuration

In the previous article, we deployed the Prometheus Adapter and used it to provide the resource metrics API, which is what lets us run the kubectl top command. But because I dropped the metrics that carry the id="/" label, the default node metric queries no longer work.

If you still want to use them, you can restore the dropped metrics. The default query statements look like this:

# cpu
sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)

# memory
sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)

But if you restore all of them, the number of metrics grows considerably, and the gain isn’t worth the cost. Since we already collect the host’s own metrics, we can simply have the adapter query those instead of the container metrics. So we just replace the two queries above with these two:

# cpu
100-cpu_usage_idle{cpu="cpu-total", <<.LabelMatchers>>}

# mem
mem_used{<<.LabelMatchers>>}

But you need to make sure that the following configuration exists:

        resources:
          overrides:
            instance:
              resource: nodes

This configuration maps the node resource to the instance tag. When you run kubectl top node, the adapter fetches all the nodes and substitutes each node name into the query expression; for example, the CPU query for the node k8s-node1 becomes:

100-cpu_usage_idle{cpu="cpu-total", instance="k8s-node1"}

The complete configuration can be found on GitHub; it really is just those two query statements that changed.
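For orientation, those two queries live under resourceRules in the adapter configuration, roughly like this (a sketch only; containerQuery, containerLabel and the other fields are unchanged, and the full file on GitHub is authoritative):

resourceRules:
  cpu:
    nodeQuery: '100 - cpu_usage_idle{cpu="cpu-total", <<.LabelMatchers>>}'
    resources:
      overrides:
        instance:
          resource: nodes
  memory:
    nodeQuery: 'mem_used{<<.LabelMatchers>>}'
    resources:
      overrides:
        instance:
          resource: nodes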

After applying, delete the Prometheus Adapter pod so that it restarts; kubectl top node can then be run, but the CPU figures it shows are not correct. I’m not familiar with the adapter’s implementation logic, so if you’re interested you can dig into why. The memory figures are accurate, though, so just go by memory.