This article continues the previous one on configuring custom alerting rules with Prometheus. Here we will again walk through installing Prometheus and configuring Alertmanager to send emails when an alert fires, but this time in a simpler way: via Rancher.

This time we will do it without several of the dependencies used previously. In this article we do not need:

  • A kubeconfig specifically configured to point at the Kubernetes cluster

  • kubectl, since we can use the Rancher UI instead

  • Installation and configuration of the Helm binary

Preparation

  • A Google Cloud account (the free tier is sufficient); any other cloud provider works as well

  • Rancher v2.4.2 (the latest version at the time of writing)

  • A Kubernetes cluster running on GKE (version 1.15.11-gke.3); EKS or AKS works too

Start an instance of Rancher

First, start an instance of Rancher. You can do so by following Rancher’s quick-start instructions:

www.rancher.cn/quick-start…

Deploy a GKE cluster using Rancher

Use Rancher to set up and configure a Kubernetes cluster. You can access the documentation at the link below:

rancher2.docs.rancher.cn/docs/cluste…

Deploy Prometheus

We will use Rancher’s app catalog to install Prometheus. The catalog is essentially a collection of Helm charts, which lets users deploy applications repeatably.
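For reference only, launching this chart from the catalog is roughly equivalent to installing the community Prometheus Helm chart yourself. None of this is required when using Rancher (that is the point of this article), and the exact chart and repository names bundled by Rancher may differ; this is just a sketch of what the catalog does under the hood:

# Not needed with Rancher's catalog - shown only for context.
# Chart/repo names are assumptions based on the old community "stable" repo.
helm repo add stable https://charts.helm.sh/stable
helm install prometheus stable/prometheus --namespace prometheus --create-namespace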

Once our cluster is up and running, open the Default project created for it, go to the “Apps” tab, and click the “Launch” button.

Now let’s search for the chart we’re interested in. There are many fields we can set, but for this demo we will keep the default values. The Detailed Description section contains useful information about these values, so feel free to review it. At the bottom of the page, click Launch. Prometheus Server and Alertmanager will be installed and configured for us.

When the installation is complete, the page looks like this:

Next, we need to create Services to access Prometheus Server and Alertmanager. Under the Resources -> Workloads tab, in the Load Balancing section, we can see that nothing is configured yet. Click Import YAML, select the prometheus namespace, paste the two YAML manifests below (one at a time), and click Import. Later you’ll see how we knew to use those particular ports and component labels.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 9090
      protocol: TCP
  selector:
    component: server
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 9093
      protocol: TCP
  selector:
    component: alertmanager

Once created, the services will show as Active.

To find the external IP, open the vertical ellipsis drop-down menu on the right and click View/Edit YAML. At the bottom of the YAML, you’ll see something similar to this:

status:
  loadBalancer:
    ingress:
      - ip: 34.76.22.14

Visiting these IPs in a browser brings up the GUIs of Prometheus Server and Alertmanager. There is not much to see at this point, because no rules have been defined and no alerts have been configured yet.
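As an optional sanity check, you can hit the health endpoints that Prometheus and Alertmanager expose, using the external IPs of the two services (the first IP below is just the example value from above, the second is a placeholder):

# Both endpoints should return HTTP 200 when the components are up
curl http://34.76.22.14/-/healthy          # Prometheus Server
curl http://<alertmanager-ip>/-/healthy    # Alertmanager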

Add rules

Rules allow us to trigger alerts. They are written in Prometheus’s expression language; whenever a rule’s condition is met, an alert fires and is sent to Alertmanager.

Now let’s see how we add rules.

Under the Resources -> Workloads tab, we can see the Deployments created when the chart was launched. Let’s take a closer look at prometheus-server and prometheus-alertmanager.

We start with the first one to understand how it is configured, how we can edit it, and what port the server is running on. Click the vertical ellipsis menu button and select View/Edit YAML.

First, we see that the Deployment runs two containers: prometheus-server-configmap-reload and prometheus-server. The section specific to the prometheus-server container gives us the information we need:
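For orientation, the relevant part of that Deployment looks roughly like the excerpt below (an illustrative sketch; image tags and paths depend on the chart version). This is where the container port 9090 and the component: server label used by our prometheus-service come from:

# Illustrative excerpt only - exact values vary with the chart version
template:
  metadata:
    labels:
      app: prometheus
      component: server          # matched by the selector of prometheus-service
  spec:
    containers:
      - name: prometheus-server
        image: prom/prometheus   # tag depends on the chart version
        args:
          - --config.file=/etc/config/prometheus.yml
          - --storage.tsdb.path=/data
        ports:
          - containerPort: 9090  # the targetPort we used in prometheus-service
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config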

As we know, Prometheus is configured through prometheus.yml. This file (along with the others listed under serverFiles) is mounted into the server Pod from a ConfigMap, which can be found under the Resources -> Config tab. To add or edit rules, we need to modify this ConfigMap: click the vertical ellipsis menu button and select Edit. In the rules section, let’s add our new rules and click Save.

groups:
  - name: memory demo alert
    rules:
      - alert: High Pod Memory
        expr: container_memory_usage_bytes{pod_name=~"nginx-.*", image!="", container!="POD"} > 5000000
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: High Memory Usage

  - name: cpu demo alert
    rules:
      - alert: High Pod CPU
        expr: rate(container_cpu_usage_seconds_total{pod_name=~"nginx-.*", image!="", container!="POD"}[5m]) > 0.04
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: High CPU Usage

The rules will be automatically loaded by Prometheus Server, and we can see them in the Prometheus Server GUI:
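If you prefer the command line, the loaded rules can also be listed through Prometheus’s HTTP API (again using the example external IP from earlier):

# Returns the currently loaded alerting/recording rules as JSON
curl http://34.76.22.14/api/v1/rules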

Here’s an explanation of the above two rules:

  • container_memory_usage_bytes: current memory usage in bytes, including all memory regardless of when it was accessed

  • container_cpu_usage_seconds_total: cumulative CPU time consumed, in seconds

All metrics can be found on the following page:

github.com/google/cadv…

Prometheus uses RE2 syntax for all regular expressions. With a regex we can select only the time series for pods whose names match a particular pattern. In our example, we look for pods whose names start with nginx- and exclude the container named “POD”, because it is the pod’s parent cgroup and would report statistics for all containers inside the pod.

For container_cpu_usage_seconds_total, we use a range selector ([5m]): rate() then computes the per-second CPU usage averaged over the last five minutes.
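Before wiring them into rules, both expressions can be tried on their own in the Prometheus expression browser, for example:

# Current memory usage (bytes) of the nginx pods
container_memory_usage_bytes{pod_name=~"nginx-.*", image!="", container!="POD"}

# Per-second CPU usage averaged over the last five minutes
rate(container_cpu_usage_seconds_total{pod_name=~"nginx-.*", image!="", container!="POD"}[5m])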

If you want to learn more about queries and examples, check out the official Prometheus documentation.

Configure alerts

Alerts notify us as soon as a problem occurs, so that we immediately know something has gone wrong in the system. Prometheus provides alerting through its Alertmanager component.

On the Resources -> Workloads tab, click the vertical ellipsis menu to the right of prometheus-alertmanager, select View/Edit YAML, and review its configuration:
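As with the server, the part worth noting (a rough excerpt, details depend on the chart version) is the container port and the component label that our alertmanager-service selects on:

# Illustrative excerpt only
template:
  metadata:
    labels:
      app: prometheus
      component: alertmanager    # matched by the selector of alertmanager-service
  spec:
    containers:
      - name: prometheus-alertmanager
        image: prom/alertmanager # tag depends on the chart version
        args:
          - --config.file=/etc/config/alertmanager.yml
        ports:
          - containerPort: 9093  # the targetPort we used in alertmanager-service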

Alertmanager is configured through alertmanager.yml. This file (along with the others listed under alertmanagerFiles) is mounted into the Alertmanager Pod. To set up the email notification, we need to modify the ConfigMap associated with Alertmanager. On the Config tab, click the vertical ellipsis menu on the prometheus-alertmanager row and select Edit. Replace the basic configuration with the following:

global:
  resolve_timeout: 5m
route:
  group_by: [alertname]
  # Send all notifications to me.
  receiver: demo-alert
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  routes:
    - match:
        alertname: DemoAlertName
      receiver: "demo-alert"

receivers:
  - name: demo-alert
    email_configs:
      - to: [email protected]
        from: [email protected]
        # Your smtp server address
        smarthost: smtp.gmail.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: 16_letter_generated_token # you can use your Gmail account password, but it is better to create a dedicated app password (token) for this
        headers:
          From: [email protected]
          Subject: "Demo ALERT"

The new configuration is reloaded by Alertmanager, and we can verify it in the GUI under the Status tab.
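To double-check from the command line that Alertmanager picked up the new configuration, you can also query its API (replace the placeholder with whichever external IP your alertmanager-service received):

# Returns Alertmanager's status, including the currently loaded configuration
curl http://<alertmanager-ip>/api/v2/status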

Test the end-to-end scenario

Let’s deploy something to monitor. A simple nginx Deployment is sufficient for this exercise. In the Rancher GUI, under the Resources -> Workloads tab, click Import YAML, paste the following manifest (this time into the default namespace), and click Import:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
          ports:
            - containerPort: 80

In the Prometheus UI, let’s use one of the two expressions previously configured in our alerting rules to look at some metrics:

rate(container_cpu_usage_seconds_total{pod_name=~"nginx-.*", image!="", container!="POD"}[5m])

Let’s put some load on one of the pods to see the value change. When the value goes above 0.04, we should get an alert. To do this, select one of the nginx Deployment’s Pods and click Execute Shell, where we will run a command to generate CPU load:
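The exact command from the original demo is not reproduced here; any simple busy loop inside the container’s shell will do, for example:

# Example only - keeps one CPU core busy until interrupted with Ctrl+C
dd if=/dev/zero of=/dev/null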

An alert goes through three states:

  • Inactive: the condition is not met

  • Pending: the condition is met, but not yet for the duration set in the for clause (1m in our rules)

  • Firing: the alert is triggered and sent to Alertmanager

We have already seen the alert in the Inactive state, so increasing the CPU load lets us observe the other two states:

When the alert fires, it shows up in Alertmanager:

Since we configured Alertmanager to send emails when alerts fire, checking our inbox we should see something like this:

Summary

We all know how important monitoring is in day-to-day operations, but monitoring is incomplete without alerting. An alert fires as soon as a problem occurs, so we immediately know that something is wrong in the system. Prometheus covers both: the monitoring solution itself and the alerting functionality of its Alertmanager component. In this article we saw how easy it is to deploy Prometheus with Rancher and integrate Prometheus Server with Alertmanager. We also used Rancher to configure alerting rules and push the Alertmanager configuration, so it can notify us when problems occur. Finally, based on Alertmanager’s definitions and integrations, we received an email with the details of the triggered alert (which could also have been sent via Slack or PagerDuty).