The default monitoring targets provided by Prometheus Operator cannot fully cover real-world monitoring requirements, so we need to add custom, service-specific monitoring. The steps to add a custom monitor are as follows:

1. Create a ServiceMonitor object for Prometheus to add the monitoring item.
2. Associate the ServiceMonitor object with a Service that exposes the metrics endpoint.
3. Verify that the Service can correctly expose the metrics data.

This article uses adding Redis monitoring as an example.

Deploy Redis

k8s-redis-and-exporter-deployment.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: redis
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121

While deploying Redis, we run redis_exporter as a sidecar alongside the redis container in the same Pod. Note that we also added the annotations prometheus.io/scrape: "true" and prometheus.io/port: "9121".
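
Before wiring this into Prometheus, it can be useful to confirm that the exporter sidecar is actually serving metrics. A minimal sanity check, assuming the Deployment above has been applied to the redis namespace:

# Forward the exporter port of the redis Deployment to a local port
kubectl -n redis port-forward deploy/redis 9121:9121 &
# redis_up should be 1 when the exporter can reach the redis container
curl -s localhost:9121/metrics | grep '^redis_up '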

Create a Redis Service

apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  namespace: redis
  labels:
    app: redis
spec:
  type: NodePort
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: redis-exporter
    port: 9121
    targetPort: 9121
  selector:
    app: redis

Check the deployed resources and verify that the metrics data can be retrieved

[root@]# kubectl get po,ep,svc -n redis
NAME                         READY   STATUS    RESTARTS   AGE
pod/redis-78446485d8-sp57x   2/2     Running   0          116m

NAME                  ENDPOINTS                               AGE
endpoints/redis-svc   100.102.126.3:9121,100.102.126.3:6379   6m5s

NAME                TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
service/redis-svc   NodePort   10.105.111.177   <none>        6379:32357/TCP,9121:31019/TCP   6m5s

Verify metrics:

[root@qd01-stop-k8s-master001 MyDefine]# curl 10.105.111.177:9121/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
...

Create ServiceMonitor

To let Prometheus scrape Redis, create the ServiceMonitor object prometheus-serviceMonitorRedis.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-k8s
  namespace: monitoring
  labels:
    app: redis
spec:
  jobLabel: redis
  endpoints:
  - port: redis-exporter
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      app: redis
  namespaceSelector:
    matchNames:
    - redis

Create it and check the ServiceMonitor:

[root@]# kubectl apply -f prometheus-serviceMonitorRedis.yaml
servicemonitor.monitoring.coreos.com/redis-k8s created

[root@]# kubectl get serviceMonitor -n monitoring
NAME                      AGE
redis-k8s                 11s
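
You can also confirm that the new target has been picked up without opening the UI. A rough check, assuming the kube-prometheus default Service name prometheus-k8s in the monitoring namespace:

# Forward the Prometheus web port to localhost
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
# List the scrape jobs currently known to Prometheus; a redis job should show up
curl -s localhost:9090/api/v1/targets | grep -o '"job":"[^"]*"' | sort -u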

Now switch to the Prometheus UI and open the Targets page; you can see the redis-k8s monitoring item we just created. You can now also query the Redis metrics collected by redis_exporter, for example:
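
A few queries worth trying on the Graph page (metric names as exposed by oliver006/redis_exporter; they may differ slightly between exporter versions):

# 1 if the exporter can reach the Redis instance, 0 otherwise
redis_up
# Memory currently used by Redis, in bytes
redis_memory_used_bytes
# Number of connected clients
redis_connected_clients
# Command throughput over the last 5 minutes
rate(redis_commands_processed_total[5m])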

Configure PrometheusRule

We can now collect Redis monitoring metrics, but no alarm rules are configured for them yet. We need to add our own alarm rules based on the metrics we actually care about. First, let's look at the default rules that ship with Prometheus, which look something like this.
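
These default rules are bundled as PrometheusRule objects and can also be listed from the command line, for example (prometheus-k8s-rules is the kube-prometheus default bundle, which also appears in the kubectl output later in this article):

# List the PrometheusRule objects in the monitoring namespace
kubectl -n monitoring get prometheusrule
# Show the beginning of the default rule bundle
kubectl -n monitoring get prometheusrule prometheus-k8s-rules -o yaml | head -n 40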

Now, before adding a rule for Redis, take a look at the Alertmanager configuration on Prometheus' Config page:
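
That section of the generated Prometheus configuration typically looks roughly like the sketch below (a simplified excerpt based on kube-prometheus defaults; the exact relabeling rules vary between versions):

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    relabel_configs:
    # Keep only endpoints that belong to the alertmanager-main Service
    - source_labels: [__meta_kubernetes_service_name]
      regex: alertmanager-main
      action: keep
    # Keep only the endpoint port named web
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep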

The alertmanagers configuration is obtained through the Kubernetes service discovery mechanism with role: endpoints. The Service name is alertmanager-main and the port name is web. Let's look at the alertmanager-main Service:

[root@]# kubectl describe svc alertmanager-main -n monitoring
Name:              alertmanager-main
Namespace:         monitoring
Labels:            alertmanager=main
Annotations:       <none>
Selector:          alertmanager=main,app=alertmanager
Type:              ClusterIP
IP:                10.111.141.65
Port:              web  9093/TCP
TargetPort:        web/TCP
Endpoints:         100.118.246.1:9093,100.64.147.129:9093,100.98.81.194:9093
Session Affinity:  ClientIP
Events:            <none>

We can see that the Service name is alertmanager-main and the port name is web, so the Prometheus and Alertmanager components are associated correctly. The corresponding alarm rule files are all the YAML files under the /etc/prometheus/rules/prometheus-k8s-rulefiles-0/ directory. Check the YAML files in that directory inside the Prometheus Pod: these YAML files are in fact the contents of the PrometheusRule objects we created earlier. The PrometheusRule named prometheus-k8s-rules in the monitoring namespace is automatically rendered into a corresponding YAML file under the prometheus-k8s-rulefiles-0 directory above, so if we need to customize an alarm rule later, we only need to define a PrometheusRule resource object.

Why is Prometheus able to recognize PrometheusRule resource objects? The Prometheus resource object (prometheus-prometheus.yaml) has a very important property, ruleSelector, which filters the rules to load: it matches PrometheusRule resource objects carrying the prometheus=k8s and role=alert-rules labels:

  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules

So to customize an alarm rule, we simply create a PrometheusRule object with the prometheus=k8s and role=alert-rules labels. For example, to add an alarm that checks whether Redis is available, i.e. whether Redis is up, use the redis_up metric and create the file prometheus-redisRules.yaml:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: redis-rules
  namespace: monitoring
spec:
  groups:
  - name: redis
    rules:
    - alert: RedisUnavailable
      annotations:
        summary: redis instance info
        description: If redis_up == 0, redis will be unavailable
      expr: |
        redis_up == 0
      for: 3m
      labels:
        severity: critical

After creating the PrometheusRule, you can see the redis-rules object we just created:

 kubectl apply -f prometheus-redisRules.yaml

kubectl get prometheusrule -n monitoring
NAME                   AGE
etcd-rules             4d18h
prometheus-k8s-rules   17d
redis-rules            15s

Note that the labels must include both prometheus=k8s and role=alert-rules. A short while after creation, check the rules folder in the container: you will see that the rule file we created has been injected into the corresponding rulefiles folder. Then go to Prometheus' Alerts page and you will see the new alert rule we just added:
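
A quick command-line check of that rules folder, assuming the kube-prometheus defaults of a Pod named prometheus-k8s-0 and a container named prometheus:

# List the generated rule files inside the Prometheus container
kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- \
  ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/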

Configure alarm notifications

Now we know how to add an alarm rule, but how do we send the alarm messages out? This is where we need to configure Alertmanager; I will use email and WeChat as examples.

The Alertmanager configuration file alertmanager.yaml is created from the alertmanager-secret.yaml file. The default configuration is as follows (cat alertmanager-secret.yaml):

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
    "inhibit_rules":
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "critical"
      "target_match_re":
        "severity": "warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "warning"
      "target_match_re":
        "severity": "info"
    "receivers":
    - "name": "Default"
    - "name": "Watchdog"
    - "name": "Critical"
    "route":
      "group_by":
      - "namespace"
      "group_interval": "5m"
      "group_wait": "30s"
      "receiver": "Default"
      "repeat_interval": "12h"
      "routes":
      - "match":
          "alertname": "Watchdog"
        "receiver": "Watchdog"
      - "match":
          "severity": "critical"
        "receiver": "Critical"
type: Opaque

Now we need to modify this file to configure the WeChat and email related information. The prerequisite is that you have prepared your enterprise WeChat account information in advance; you can search for related tutorials online. Start by creating the alertmanager.yaml file:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.51os.club:25'
  smtp_from: 'amos'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'Mypassword'
  smtp_hello: '51os.club'
  smtp_require_tls: false
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'SGGc4x-RDcVD_ptvVhYrxxxxxxxxxxOhWVWIITRxM'
  wechat_api_corp_id: 'ww419xxxxxxxx735e1c0'

templates:
- '*.tmpl'

route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - receiver: wechat
    continue: true
    match:
      alertname: Watchdog

receivers:
- name: 'default'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - send_resolved: false
    corp_id: 'ww419xxxxxxxx35e1c0'
    to_party: '13'
    message: '{{ template "wechat.default.message" . }}'
    agent_id: '1000003'
    api_secret: 'SGGc4x-RDcxxxxxxxxY6YwfZFsO9OhWVWIITRxM'

I have added two receivers here. The default one sends alarms by email; for the Watchdog alert, we send it through the wechat receiver.

I am being lazy here: since the system currently happens to have a firing Watchdog alert, I match on Watchdog. Of course, you can change it to the custom Redis alert RedisUnavailable we defined earlier, as sketched below.
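
For example, to route our Redis alert instead, the routes entry would change roughly as follows (only a sketch; the rest of alertmanager.yaml stays the same):

  routes:
  - receiver: wechat
    continue: true
    match:
      alertname: RedisUnavailable  # route our custom Redis alert instead of Watchdog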

Then create the template file used for sending WeChat messages, wechat.tmpl:

{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts -}}
{{- if eq $index 0 -}}
AlertTpye: {{ $alert.Labels.alertname }}
AlertLevel: {{ $alert.Labels.severity }}

=====================
{{- end }}
===Alert Info===
Alert Info: {{ $alert.Annotations.message }}
Alert Time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
===More Info===
{{ if gt (len $alert.Labels.instance) 0 -}}InstanceIp: {{ $alert.Labels.instance }}; {{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}InstanceNamespace: {{ $alert.Labels.namespace }}; {{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}NodeIP: {{ $alert.Labels.node }}; {{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}PodName: {{ $alert.Labels.pod_name}} {{- end }}
=====================
{{- end }}
{{- end }}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts -}}
{{- if eq $index 0 -}}
AlertTpye: {{ $alert.Labels.alertname }}
AlertLevel: {{ $alert.Labels.severity }}

=====================
{{- end }}
===Alert Info===
Alert Info: {{ $alert.Annotations.message }}
Alert Start Time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
Alert Fix Time: {{ $alert.EndsAt.Format "2006-01-02 15:04:05" }}
===More Info===
{{ if gt (len $alert.Labels.instance) 0 -}}InstanceIp: {{ $alert.Labels.instance }}; {{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}InstanceNamespace: {{ $alert.Labels.namespace }}; {{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}NodeIP: {{ $alert.Labels.node }}; {{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}PodName: {{ $alert.Labels.pod_name }}; {{- end }}
=====================
{{- end }}
{{- end }}
{{- end }}

Use alertmanager.yaml and wechat.tmpl to recreate the alertmanager-main Secret:

kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --from-file=wechat.tmpl -n monitoring
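
To double-check that the new configuration has been stored, you can decode the Secret again (a simple sanity check; it assumes base64 is available on the host):

# Print the alertmanager.yaml currently stored in the Secret
kubectl -n monitoring get secret alertmanager-main \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d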

After completing the above steps, we will soon receive a WeChat message and an alarm email in the mailbox:

If you look at the configuration information in the Alertmanager UI again, you can see that it has changed to the configuration above.