WeChat official account: Operation and Maintenance Development Story; author: Wanger


When we deployed websites earlier, the architecture diagram always contained a monitoring component. Building an effective monitoring platform is very important for operations: only then can we keep our servers and services running stably and efficiently. There are several common open-source monitoring tools: Zabbix, Nagios, Open-Falcon, and Prometheus, each with its own advantages and disadvantages; interested readers can look into them. Prometheus, however, is the friendliest for monitoring K8S clusters, so today we will look at how to deploy Prometheus to monitor K8S from every angle.

Main contents

  • 1. Prometheus architecture

  • 2.K8S monitoring indicators and implementation ideas

  • 3. Deploy Prometheus on K8S

  • 4. Configuration resolution based on K8S service discovery

  • 5. Deploy Grafana on K8S platform

  • 6. Monitor Pod, Node, and resource objects in the K8S cluster

  • 7. Visually display Prometheus monitoring data using Grafana

  • 8. Alarm rules and alarm notification

1 Prometheus architecture

What is Prometheus

Prometheus is a monitoring system originally built at SoundCloud. It has been a community open-source project since 2012 and has a very active developer and user community. To emphasize its open-source and independent governance, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming the second hosted project after Kubernetes. Website: prometheus.io; GitHub: github.com/prometheus

Composition and architecture of Prometheus

  • Prometheus Server: scrapes metrics, stores time-series data, and provides an interface for querying the data

  • Client Library: client libraries used to instrument application code and expose its metrics

  • Push Gateway: short-term storage of metric data, mainly used for short-lived jobs

  • Exporters: collect monitoring metrics from existing third-party services and expose them as metrics

  • Alertmanager: handles alerts

  • Web UI: a simple web console

    The data model

    Prometheus stores all data as time series. Samples with the same metric name and the same set of labels belong to the same time series. Each time series is uniquely identified by its metric name and a set of key-value pairs (labels). Time series format:

<metric name>{<label name>=<label value>, ...}

Example: api_http_requests_total{method="POST", handler="/messages"}

Jobs and instances

An instance is a single scrape target (usually one host:port endpoint), and a job is a collection of instances that serve the same purpose. Example scrape configuration:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9090']
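Once Prometheus is running, every scraped series automatically carries the job and instance labels from this configuration. A quick way to see them is to query the built-in up metric through the HTTP API; a minimal sketch, assuming it is run on the Prometheus host itself:

# "up" is generated per target: 1 means the last scrape succeeded, 0 means it failed.
curl -s 'http://localhost:9090/api/v1/query?query=up'
# Example series in the response: up{job="node", instance="192.168.1.10:9090"}  1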

2 K8S monitoring indicators and implementation ideas

K8S Monitoring indicators

Monitoring of Kubernetes itself

  • Node Resource Usage

  • Number of Nodes

  • Number of Pods per Node

  • Resource Object status

Pod monitoring

  • Number of Pods

  • Container resource utilization

  • Application metrics

    How Prometheus monitoring of K8S is implemented

Monitoring target        Implementation          Examples
Pod performance          cAdvisor                container CPU and memory usage
Node performance         node_exporter           node CPU and memory usage
K8S resource objects     kube-state-metrics      Pod/Deployment/Service

Service discovery: prometheus.io/docs/promet…

3 Deploy Prometheus on the K8S platform

3.1 Cluster Environment


IP address        Role          Note
192.168.73.136    nfs
192.168.73.138    k8s-master
192.168.73.139    k8s-node01
192.168.73.140    k8s-node02
192.168.73.135    k8s-node03


3.2 Project Address

You can clone the project code directly with git clone, or write the manifests yourself; the implementation code for each step is also shown below.

[root@k8s-master src]# git clone https://github.com/zhangdongdong7/k8s-prometheus.git
Cloning into 'k8s-prometheus'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
[root@k8s-master src]# cd k8s-prometheus/
[root@k8s-master k8s-prometheus]# ls
alertmanager-configmap.yaml         kube-state-metrics-rbac.yaml              prometheus-rbac.yaml
alertmanager-deployment.yaml        kube-state-metrics-service.yaml           prometheus-rules.yaml
alertmanager-pvc.yaml               node_exporter-0.17.0.linux-amd64.tar.gz   prometheus-service.yaml
alertmanager-service.yaml           node_exporter.sh                          prometheus-statefulset-static-pv.yaml
grafana.yaml                        OWNERS                                    prometheus-statefulset.yaml
kube-state-metrics-deployment.yaml  prometheus-configmap.yaml                 README.md

3.3 Authorization using RBAC

Role-Based Access Control (RBAC) is used for authorization. Write the authorization YAML:

[root@k8s-master prometheus-k8s]# vim prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system

Create it:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rbac.yaml
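As an optional sanity check (plain kubectl, nothing specific to this project), confirm that the ServiceAccount, ClusterRole, and ClusterRoleBinding were created:

kubectl get serviceaccount prometheus -n kube-system
kubectl get clusterrole prometheus
kubectl get clusterrolebinding prometheus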

3.4 Configuration Management

Use a ConfigMap to store configuration information that does not need to be encrypted. Change the Node IP addresses in the kubernetes-nodes job to match your own environment.

[root@k8s-master prometheus-k8s]# vim prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    rule_files:
    - /etc/config/rules/*.rules

    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: kubernetes-nodes
      scrape_interval: 30s
      static_configs:
      - targets:
        - 192.168.73.135:9100
        - 192.168.73.138:9100
        - 192.168.73.139:9100
        - 192.168.73.140:9100

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:80"]

Create it:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-configmap.yaml
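A brief verification, plus a note on reloading: because the StatefulSet in 3.5 starts Prometheus with --web.enable-lifecycle, later changes to this ConfigMap can be picked up with a reload call once the Service from 3.6 exists (the NodePort address below is an assumption based on this example environment):

kubectl get configmap prometheus-config -n kube-system
# After editing and re-applying the ConfigMap, trigger a reload instead of restarting the Pod:
curl -X POST http://192.168.73.139:30090/-/reload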

3.5 Stateful deployment of Prometheus

A StorageClass is used to dynamically provision persistent storage for Prometheus' data; for details, see the earlier article "NFS Dynamic Storage Provisioning in K8S". Alternatively, the statically provisioned prometheus-statefulset-static-pv.yaml can be used for persistence.
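Before applying the StatefulSet below, it is worth confirming that the StorageClass it references actually exists; if it does not, the PVC created by volumeClaimTemplates will stay Pending. A simple check, assuming the class is named managed-nfs-storage as in this environment:

kubectl get storageclass managed-nfs-storage
kubectl get pvc -n kube-system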

[root@k8s-master prometheus-k8s]# vim prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9090/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi

        - name: prometheus-server
          image: "prom/prometheus:v2.2.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          # based on 10 running nodes with 30 pods each
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
            - name: prometheus-rules
              mountPath: /etc/config/rules
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-rules
          configMap:
            name: prometheus-rules
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      storageClassName: managed-nfs-storage   # must match the StorageClass created earlier
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "16Gi"

Create it:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-statefulset.yaml

Check the status:

[root@k8s-master prometheus-k8s]# kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-5bd5f9dbd9-wv45t                1/1     Running   1          8d
kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          14d
prometheus-0                            2/2     Running   6          14d

You can see the Pod prometheus-0; it is a stateful deployment managed by the StatefulSet controller, and the deployment is healthy when both of its containers are in the Running state. If it is not Running, use kubectl describe pod prometheus-0 -n kube-system to view the error details.
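The container logs are also useful for troubleshooting; the container names below come from the StatefulSet above (prometheus-server and prometheus-server-configmap-reload):

kubectl logs prometheus-0 -c prometheus-server -n kube-system --tail=20
kubectl describe pod prometheus-0 -n kube-system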

3.6 Creating a Service to expose the access port

A NodePort Service is used so that the access port is fixed; a randomly assigned port would be inconvenient to access.

[root@k8s-master prometheus-k8s]# vim prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  type: NodePort
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090   # fixed external access port
  selector:
    k8s-app: prometheus

Create it:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-service.yaml

Check:

[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME                                        READY   STATUS    RESTARTS   AGE
pod/coredns-5bd5f9dbd9-wv45t                1/1     Running   1          8d
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          14d
pod/prometheus-0                            2/2     Running   6          14d

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP    13d
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP    16d
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP   13d

3.7 Web access

Access the UI from any Node IP and the NodePort, i.e. http://NodeIP:Port; in this example, http://192.168.73.139:30090. If the page shown below appears, the deployment succeeded:
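The same endpoint can also be checked from the command line; the targets API lists every scrape target Prometheus has discovered and its health (the address is the example NodePort above):

curl -s http://192.168.73.139:30090/api/v1/targets | head -c 500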

4 Deploy Grafana on the K8S platform

As the web UI above shows, Prometheus' built-in UI has limited functionality and its visualization is not polished enough for day-to-day monitoring, so Prometheus is usually combined with Grafana to visualize the data. github.com/kubernetes/… grafana.com/grafana/dow… The YAML for Grafana is already included in the project downloaded earlier and has been modified to suit this environment.

4.1 Deploying Grafana using StatefulSet

[root@k8s-master prometheus-k8s]# vim grafana.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
  namespace: kube-system
spec:
  serviceName: "grafana"
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
          - containerPort: 3000
            protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
          - name: grafana-data
            mountPath: /var/lib/grafana
            subPath: grafana
      securityContext:
        fsGroup: 472
        runAsUser: 472
  volumeClaimTemplates:
  - metadata:
      name: grafana-data
    spec:
      storageClassName: managed-nfs-storage   # the same StorageClass used for Prometheus
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 30091
  selector:
    app: grafana

4.2 Web access of Grafana

Use any Node IP and the NodePort to access Grafana at http://NodeIP:Port; in this example, http://192.168.73.139:30091. The login page shown below appears; the default username and password are both admin, and you can change the password after logging in.

After login, the screen is as follows

The first step is to add the data source: click the Create your first data source icon and configure it as shown below.
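For the data source address, either of the following should work; a sketch, assuming Grafana runs inside the same cluster so the Service DNS name resolves, with the NodePort address taken from section 3.6:

kubectl get svc prometheus -n kube-system   # confirms the in-cluster Service name and port
# Data source URL inside the cluster:  http://prometheus.kube-system:9090
# Data source URL from outside:        http://192.168.73.139:30090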

The second step: after filling in the data source, click the green Save & Test button at the bottom. The message "Data source is working" indicates that the data source was added successfully.

4.3 How to monitor Pod, Node, and resource-object data in the K8S cluster

1) Pods. The kubelet on each node uses the metrics interface provided by its built-in cAdvisor to expose performance metrics for all Pods and containers on that node: https://NodeIP:10255/metrics/cadvisor or https://NodeIP:10250/metrics/cadvisor
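A hedged spot check of the endpoint above from the master (assumption: the kubelet delegates authentication and authorization to the API server, the common default, so the prometheus ServiceAccount token created in 3.3 is accepted):

# Extract the ServiceAccount token (Kubernetes 1.15 still stores it in a Secret).
TOKEN=$(kubectl -n kube-system get secret \
  $(kubectl -n kube-system get sa prometheus -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d)
curl -sk -H "Authorization: Bearer $TOKEN" https://192.168.73.139:10250/metrics/cadvisor | head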

2) Nodes. The node_exporter collector is used to collect Node resource utilization. github.com/prometheus/… Usage documentation: prometheus.io/docs/guides…

  • Use the node_exporter.sh script to deploy the node_exporter collector on all servers. The script can be run without modification:
[root@k8s-master prometheus-k8s]# cat node_exporter.sh
#!/bin/bash
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter

cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
[root@k8s-master prometheus-k8s]# ./node_exporter.sh
  • Check whether the node_exporter process is running (a quick metrics check follows the output below):
[root@k8s-master prometheus-k8s]# ps -ef|grep node_exporter
root       6227      1  0 Oct08 ?        00:06:43 /usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service
root     118269 117584  0 23:27 pts/0    00:00:00 grep --color=auto node_exporter
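node_exporter listens on port 9100 by default, so each node should now be serving metrics such as node_cpu_seconds_total:

curl -s http://localhost:9100/metrics | grep node_cpu_seconds_total | head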

3) kube-state-metrics collects status information about the various resource objects in K8S and only needs to be deployed on the master node. github.com/kubernetes/…

  1. Write the RBAC YAML to authorize kube-state-metrics:
[root@k8s-master prometheus-k8s]# vim kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-rbac.yaml
  2. Write the Deployment and ConfigMap YAML to deploy the metrics Pod; it can be used without modification:
[root@k8s-master prometheus-k8s]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.3.0
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v1.3.0
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v1.3.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: lizhenliang/kube-state-metrics:v1.3.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: lizhenliang/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
      volumes:
        - name: config-volume
          configMap:
            name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-deployment.yaml

3. Write the Service YAML to expose the metrics ports:

[root@k8s-master prometheus-k8s]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-service.yaml

4. Check that pod/kube-state-metrics-7c76bdbf68-kqqgd is running and that ports 8080 and 8081 are exposed:

[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME                                        READY   STATUS    RESTARTS   AGE
pod/alertmanager-5d75d5688f-fmlq6           2/2     Running   0          9d
pod/coredns-5bd5f9dbd9-wv45t                1/1     Running   1          9d
pod/grafana-0                               1/1     Running   2          15d
pod/kube-state-metrics-7c76bdbf68-kqqgd     2/2     Running   6          14d
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          16d
pod/prometheus-0                            2/2     Running   6          15d

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/alertmanager           ClusterIP   10.0.0.207   <none>        80/TCP              13d
service/grafana                NodePort    10.0.0.74    <none>        80:30091/TCP        15d
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       14d
service/kube-state-metrics     ClusterIP   10.0.0.194   <none>        8080/TCP,8081/TCP   14d
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       17d
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      14d
[root@k8s-master prometheus-k8s]#
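kube-state-metrics can also be queried directly to confirm that resource-object metrics are being exposed (the ClusterIP 10.0.0.194 comes from the output above; run this from a cluster node):

curl -s http://10.0.0.194:8080/metrics | grep kube_pod_status_phase | head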

5 Visually display Prometheus monitoring data using Grafana

Generally, for Prometheus to collect the data we need to monitor the Pods, Nodes, and resource objects in the K8S cluster, the corresponding plug-ins and resource collectors must be installed to provide APIs for data collection; these were configured in section 4.3. We can also check the status of the various collectors on the Status → Targets page of the Prometheus UI, as shown below:

Only when the status of every target is UP can the built-in interface query the data of a given monitoring item, as shown:

As the figure above shows, Prometheus' built-in visualization is rudimentary and does not meet our needs, so we combine it with Grafana to display the monitoring data. Grafana was already deployed successfully in the previous chapter, so the remaining work is to add dashboards and panels to display the relevant monitoring items. In practice, the Grafana community already provides many mature templates that can be used directly; afterwards, adjust each panel's query statement to fetch data from your own environment. grafana.com/grafana/das…

Recommended template:

  • Cluster resource monitoring: add template 3119, as shown in the figure

  • If a panel shows no data after the template is added, click Edit on the panel to see its PromQL query, then run that query in the Prometheus UI to check whether it returns a value (see the curl example after this list). The monitoring page after adjustment is shown in the figure

  • Resource status monitoring: add template 6417 in the same way, then adjust the monitoring page as shown in the figure to get a display of the various resources in K8S

  • Node monitoring: add template 9276 in the same way and adjust the monitoring page as shown in the figure to get basic information about each Node
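A sketch of how to test a panel's query outside Grafana (assumptions: the NodePort 30090 from section 3.6 and the node_exporter metric names used by these templates); the same expression can also be pasted into the Prometheus web UI directly:

# Node CPU usage in percent, the kind of expression the node templates use per instance:
curl -G 'http://192.168.73.139:30090/api/v1/query' \
  --data-urlencode 'query=100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)'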

6 Deploy Alertmanager on K8S

6.1 Procedure for Deploying Alertmanager

6.2 Deploying alerting

We use Email to send alarm information

  1. Prepare a mailbox and enable its SMTP sending function

  2. Use a ConfigMap to store the alert rules. Write the rules YAML below and modify or extend it to suit your own situation; in this respect Prometheus is more troublesome than Zabbix, because every alert rule has to be defined by yourself

[root@k8s-master prometheus-k8s]# vim prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Instance {{ $labels.instance }} has stopped working"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} partition usage is too high"
          description: "{{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is above 80% (current value: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} memory usage is too high"
          description: "{{ $labels.instance }} memory usage is above 80% (current value: {{ $value }})"
      - alert: NodeCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage is too high"
          description: "{{ $labels.instance }} CPU usage is above 60% (current value: {{ $value }})"
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rules.yaml
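Optionally, the rule syntax can be checked before relying on it (an assumption here is that promtool is shipped inside the prom/prometheus image and that the ConfigMap has already propagated into the Pod's /etc/config/rules directory):

kubectl -n kube-system exec prometheus-0 -c prometheus-server -- \
  promtool check rules /etc/config/rules/general.rules /etc/config/rules/node.rules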

3. Write the Alertmanager ConfigMap YAML, adding the Alertmanager alerting configuration and the email sending settings

[root@k8s-master prometheus-k8s]# vim alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'xxx.com.cn:25'
      smtp_from: '[email protected]'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'xxxxx'
    receivers:
    - name: default-receiver
      email_configs:
      - to: "[email protected]"
    route:
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-configmap.yaml

4. Create a PVC for data persistence. This YAML uses the same StorageClass as the Prometheus installation for automatic provisioning; modify it to match your own environment

[root@k8s-master prometheus-k8s]# vim alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-pvc.yaml

5. Write the Deployment YAML to deploy the Alertmanager Pod

[root@k8s-master prometheus-k8s]# vim alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.14.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.14.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.14.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.14.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/alertmanager.yml
            - --storage.path=/data
            - --web.external-url=/
          ports:
            - containerPort: 9093
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 10m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
        - name: prometheus-alertmanager-configmap-reload
          image: "jimmidyson/configmap-reload:v0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9093/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
        - name: storage-volume
          persistentVolumeClaim:
            claimName: alertmanager
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-deployment.yaml

6. Create the Service that exposes the Alertmanager port

[root@k8s-master prometheus-k8s]# vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093
  selector:
    k8s-app: alertmanager
  type: "ClusterIP"
  [root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-service.yaml

7. Check the deployment status; pod/alertmanager-5d75d5688f-fmlq6 and service/alertmanager should both be running normally

[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
pod/alertmanager-5d75d5688f-fmlq6           2/2     Running   4          10d   172.17.15.2   192.168.73.140   <none>           <none>
pod/coredns-5bd5f9dbd9-qxvmz                1/1     Running   0          42m   172.17.33.2   192.168.73.138   <none>           <none>
pod/grafana-0                               1/1     Running   3          16d   172.17.31.2   192.168.73.139   <none>           <none>
pod/kube-state-metrics-7c76bdbf68-hv56m     2/2     Running   0          23h   172.17.15.3   192.168.73.140   <none>           <none>
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   6          17d   172.17.31.4   192.168.73.139   <none>           <none>
pod/prometheus-0                            2/2     Running   8          16d   172.17.83.2   192.168.73.135   <none>           <none>

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE   SELECTOR
service/alertmanager           ClusterIP   10.0.0.207   <none>        80/TCP              14d   k8s-app=alertmanager
service/grafana                NodePort    10.0.0.74    <none>        80:30091/TCP        16d   app=grafana
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       42m   k8s-app=kube-dns
service/kube-state-metrics     ClusterIP   10.0.0.194   <none>        8080/TCP,8081/TCP   15d   k8s-app=kube-state-metrics
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       18d   k8s-app=kubernetes-dashboard
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      15d   k8s-app=prometheus
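As an extra check that the mail settings from alertmanager-configmap.yaml were actually loaded, the Alertmanager status API can be queried (the ClusterIP 10.0.0.207 comes from the output above; run this from a cluster node):

curl -s http://10.0.0.207/api/v1/status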

6.3 Sending a test alert

Log in to the Prometheus web UI and open the Alerts menu; you can see the four alert rules defined in prometheus-rules.yaml

Since an InstanceDown rule is defined in the alert rules, we can stop kubelet on server 138 to test whether an alert email is sent

[root@k8s-master prometheus-k8s]# kubectl get node
NAME             STATUS   ROLES    AGE   VERSION
192.168.73.135   Ready    <none>   18d   v1.15.2
192.168.73.138   Ready    <none>   17d   v1.15.2
192.168.73.139   Ready    <none>   19d   v1.15.2
192.168.73.140   Ready    <none>   19d   v1.15.2
[root@k8s-master prometheus-k8s]# systemctl stop kubelet
[root@k8s-master prometheus-k8s]# kubectl get node
NAME             STATUS     ROLES    AGE   VERSION
192.168.73.135   NotReady   <none>   18d   v1.15.2
192.168.73.138   Ready      <none>   17d   v1.15.2
192.168.73.139   Ready      <none>   19d   v1.15.2
192.168.73.140   Ready      <none>   19d   v1.15.2

After a while, let’s refresh the alarm Rules screen on the Web and see that the InstanceDown instance is pink and shows 2 Active

After waiting the 5 minutes defined by the rule, refresh the configured inbox and you will receive an InstanceDown email notification (the sending interval can be adjusted in alertmanager-configmap.yaml). After the stopped kubelet is restored, no further alert emails are received
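While the alert is firing, it can also be seen through the Alertmanager API, which is handy when you are unsure whether the problem is rule evaluation or mail delivery (same ClusterIP assumption as above):

curl -s http://10.0.0.207/api/v1/alerts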

If you have read this far, I believe you now have some understanding of how to deploy a Prometheus monitoring platform in a K8S cluster, from data visualization through to alerting. But reading ten thousand books is no substitute for travelling ten thousand miles: I hope you will practice it yourself.

Special thanks to teacher Li Zhenliang for his guidance; interested readers can search for his courses on Tencent Classroom

If you have any questions, you are welcome to discuss and learn together


This article was synchronized using an article synchronization assistant