Public account: Operation and Maintenance Development Story | Author: Jock

As Kubernetes continues to develop and the technology matures, more and more companies choose to deploy their applications on Kubernetes. But is getting an application onto Kubernetes enough? Obviously not: containerizing the application is only the first step of a long journey; the real work is keeping it running safely and stably.

This article is organized around the following aspects, which should be enough for most companies.

Node

A Node can be a physical host or a cloud host; it is what Kubernetes runs on. Most of the time we do not pay much attention to a Node unless something goes wrong with it. But as operations people, failures are the last thing we want, and that applies to Nodes as well.

Not many complicated operations are needed on the Node; the main ones are as follows:

> Kernel upgrade

CentOS is the preferred operating system for most enterprises, and the CentOS 7 series ships with kernel 3.10 by default. This kernel version has many bugs that are well known in the Kubernetes community, so it is necessary to upgrade the kernel on the nodes; alternatively, an enterprise can choose Ubuntu as the underlying operating system.

To upgrade the kernel, perform the following steps:

wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-lt-5.4.86-1.el7.elrepo.x86_64.rpm
rpm -ivh kernel-lt-5.4.86-1.el7.elrepo.x86_64.rpm
cat /boot/grub2/grub.cfg | grep menuentry
grub2-set-default 'CentOS Linux (5.4.86-1.el7.elrepo.x86_64) 7 (Core)'
grub2-editenv list
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

> Software Update

Many people avoid updating software for fear of compatibility issues. In actual production, however, software with known high-risk vulnerabilities still needs to be updated, and this can be handled case by case.

> Optimize the Docker configuration file

For the Docker configuration file, the main optimizations are the log driver, log retention size, and registry mirrors for faster image pulls; other settings depend on the situation. For example:

cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "bip": "169.254.123.1/24",
  "oom-score-adjust": -1000,
  "registry-mirrors": ["https://pqbap4ya.mirror.aliyuncs.com"],
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"],
  "live-restore": true
}
EOF

> Optimize kubelet parameters

In Kubernetes, the kubelet is the manager of each Node, responsible for almost everything that happens on it.

cat > /etc/systemd/system/kubelet.service <<EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/

[Service]
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/pids/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpu/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuacct/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/memory/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/systemd/system.slice/kubelet.service
ExecStart=/usr/bin/kubelet \
  --enforce-node-allocatable=pods,kube-reserved \
  --kube-reserved-cgroup=/system.slice/kubelet.service \
  --kube-reserved=cpu=200m,memory=250Mi \
  --eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
  --eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
  --eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
  --eviction-max-pod-grace-period=30 \
  --eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

The main purpose of this configuration is to reserve resources on each Node for the kubelet and system components, which helps prevent the Node from going down under resource pressure.

> Log configuration management

The log configuration management here refers to system logs, not the logs of in-house applications. By default system logs need no special configuration; I mention them here to ensure that logs remain traceable: if the system is compromised for some reason and its local logs are deleted, there is still a copy available for analysis.

Therefore, if conditions permit, the system logs of each Node should be backed up remotely. rsyslog can be used for this, forwarding logs to a remote log center or to object storage such as OSS.

> Security Configuration

This section does not involve much; it mainly focuses on hardening against known security problems. Here are five items (more may apply, depending on your situation):

  • SSH password expiration policy
  • Password Complexity Policy
  • Maximum number of SSH login attempts
  • System Timeout Configuration
  • History Configuration

Pod

The Pod is the smallest schedulable unit in Kubernetes and the carrier of the application, so its stability directly affects the application itself. Several aspects should be considered when deploying an application.

> Resource constraints

Pods consume host resources, and proper resource limits can effectively prevent overselling and resource contention. When configuring resource limits, determine the Pod QoS class according to the actual application; different applications warrant different QoS configurations.

If the application is of high importance, the Guaranteed QoS class is recommended, where requests equal limits:

resources:
  limits:
    memory: "200Mi"
    cpu: "700m"
  requests:
    memory: "200Mi"
    cpu: "700m"

If the application is of ordinary importance, the Burstable class is recommended, where requests are lower than limits:

resources:
  limits:
    memory: "200Mi"
    cpu: "500m"
  requests:
    memory: "100Mi"
    cpu: "100m"

Pods of the BestEffort type are strongly discouraged.

> Scheduling Policies

The scheduling policy also depends on the situation. If your application needs to be scheduled onto particular nodes, you can use node affinity, as follows:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - preference: {}
        weight: 100
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: env
              operator: In
              values:
                - uat

If a node should only accept a specific application, taint-based scheduling is required: first taint the node, and then any Pod that needs to land on it must tolerate the taint. The safest approach is a combination of labels and taints; the toleration snippet is shown below, and a combined example follows it:

tolerations:
- key: "key1"              # taint key
  operator: "Equal"
  value: "value1"          # taint value
  effect: "NoExecute"      # taint effect policy
  tolerationSeconds: 3600  # how long existing Pods stay before being evicted; only valid with effect "NoExecute", otherwise an error occurs
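As a hedged sketch of the label-plus-taint combination described above (the taint key/value, label, node name, and image are illustrative), the node is prepared with kubectl first, and the Pod then both tolerates the taint and selects the label:

# Assumes the node has been prepared beforehand, e.g.:
#   kubectl taint nodes node1 dedicated=app1:NoSchedule
#   kubectl label nodes node1 dedicated=app1
apiVersion: v1
kind: Pod
metadata:
  name: app1-pod                     # illustrative name
spec:
  nodeSelector:
    dedicated: app1                  # the label keeps this Pod on the dedicated node
  tolerations:
  - key: "dedicated"                 # the toleration matches the taint above
    operator: "Equal"
    value: "app1"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx                     # illustrative image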

Besides the relationship between Pods and Nodes, there is also the relationship between Pods themselves. For real high availability, the Pods of the same application should generally not all be scheduled onto the same Node, so Pod anti-affinity is needed, as follows:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - store
      topologyKey: "kubernetes.io/hostname"

If an application is closely related to another application, Pod affinity can be used to place them together, which can reduce network latency to some extent, as follows:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S1
      topologyKey: failure-domain.beta.kubernetes.io/zone

> Graceful upgrades

By default, Pods use the rolling update strategy. The main concern is how the old Pod hands off traffic gracefully once the new Pod is up, so that the switch is invisible to the outside world.

The simplest approach is to "sleep for a few seconds" before exiting, although this does not guarantee 100% graceful traffic handling. Here is how:

lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 15

If you use a service registry, you can deregister the instance from the registry before exiting; for example, with Nacos as the registry:

lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - 'curl -X DELETE "your_nacos_ip:8848/nacos/v1/ns/instance?serviceName=nacos.test.1&ip=${POD_IP}&port=8880&clusterName=DEFAULT" && sleep 15'

> Probe Configuration

Are probes important? Very! They are an important basis for the kubelet to determine whether a Pod is healthy.

The main probes of Pod are:

  • livenessProbe
  • readinessProbe
  • startupProbe

startupProbe was added in v1.16 and is mainly intended for applications that take a long time to start. In most cases you only need to configure livenessProbe and readinessProbe.

Typically a Pod represents one application, so the probes should reflect as directly as possible whether the application itself is healthy. Many frameworks ship with a built-in health check endpoint that can be used when configuring probes; if the framework does not have one, consider asking the developers to implement a health check interface, which also makes it easier to standardize health checks. For example:

readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: http
    scheme: HTTP
  initialDelaySeconds: 40
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 3
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: http
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 2

To configure startupProbe, perform the following operations:

startupProbe:
  httpGet:
    path: /health
    port: 80
  failureThreshold: 10
  initialDelaySeconds: 10
  periodSeconds: 10

> Protection Policies

The protection policy here mainly refers to keeping a minimum number of Pods running when Pods are voluntarily disrupted, for example when we actively evict or drain them.

Kubernetes uses a PodDisruptionBudget (PDB) for this. For important applications, a PDB should be configured, as follows:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx

In the PDB, the number of pods is controlled primarily by two parameters:

  • minAvailable: the minimum number of Pods that must remain available, expressed either as an absolute number of running Pods or as a percentage of the total.
  • maxUnavailable: the maximum number of Pods that may be unavailable, expressed either as an absolute number or as a percentage of the total.

Note: minAvailable and maxUnavailable are mutually exclusive; only one of them can be set at a time.
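For example, a sketch of a PDB that uses maxUnavailable as a percentage instead (the name and label are illustrative):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: pdb-percent-demo            # illustrative name
spec:
  maxUnavailable: "20%"             # at most 20% of the matched Pods may be down during voluntary disruptions
  selector:
    matchLabels:
      app: nginx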

Logs

Logs cover the entire application life cycle and are indispensable for troubleshooting and data analysis. They are discussed here from the following aspects.

> Log standard

Logs are generally divided into business logs and exception logs. We do not want them to be overly complicated, nor too simple; we want logs to achieve the following goals:

  1. Record and monitor the running of the program;
  2. Reveal the program's internal state in detail when necessary;
  3. Minimize the impact on system performance.

So how do you define logging standards? I would like to summarize the following points:

  • Use log levels (classification) properly
  • A unified output format
  • A unified encoding convention
  • Unified log output paths
  • Unified naming conventions for log output

The main purpose of this specification is to facilitate the collection and viewing of logs.

> Collection

Different log collection schemes are available for different log output:

  • A Logging Agent is deployed on a Node to collect data
  • Collect in Pod as Sidecar
A Logging Agent is deployed on a Node to collect data

This log collection solution mainly targets logs written to standard output; logs that are not written to stdout cannot be collected this way. A minimal sketch of the approach follows.
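A minimal sketch of the node-agent approach, assuming Fluent Bit as the agent; the image tag, namespace, and mount paths are illustrative and assume the Docker runtime keeps container logs under /var/lib/docker/containers:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: logging                 # illustrative namespace
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      tolerations:
      - operator: Exists             # run on every node, including tainted ones
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.9 # version is illustrative
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: docker-containers
        hostPath:
          path: /var/lib/docker/containers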

Collect in Pod as Sidecar

This collection scheme mainly targets logs that are not written to standard output: a log collection client runs as a sidecar inside the Pod and ships the logs to the log center (a sketch follows). However, this approach consumes extra resources, so ideally all application logs would go to stdout, making them easy to collect with the node-agent approach above.
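A minimal sidecar sketch, assuming the application writes files under /app/logs and a Filebeat configuration (provided through a ConfigMap, assumed to exist) that ships those files to your log center; names, images, and paths are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar              # illustrative name
spec:
  containers:
  - name: app
    image: your-app:latest                # illustrative image
    volumeMounts:
    - name: app-logs
      mountPath: /app/logs                # the application writes its log files here
  - name: log-collector
    image: docker.elastic.co/beats/filebeat:7.17.0   # version is illustrative
    volumeMounts:
    - name: app-logs
      mountPath: /app/logs
      readOnly: true
    - name: filebeat-config
      mountPath: /usr/share/filebeat/filebeat.yml
      subPath: filebeat.yml
  volumes:
  - name: app-logs
    emptyDir: {}                          # shared between the app and the sidecar
  - name: filebeat-config
    configMap:
      name: filebeat-config               # assumed to exist; points Filebeat at /app/logs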

> Analysis

When business is running normally we rarely look at log content and, most of the time, only analyze it when something goes wrong, so why bring it up here?

In fact, logs carry a lot of information. If we can effectively analyze logs, it can help us identify and troubleshoot many problems. For example, the log center of Ali Cloud has done a good job in log analysis.

> Alarms

Log-based alarms help us learn about problems quickly and narrow the scope of troubleshooting. But to do log alarms well, the log "keywords" must be managed carefully: make sure a given keyword accurately represents one specific problem and avoid ambiguous matches. This makes alarms more precise and prevents alarm storms or invalid alarms that gradually numb everyone.

Monitoring

Neither the cluster nor the applications can live without a monitoring system throughout their life cycle. An effective monitoring system gives us better observability and makes it easier to analyze, troubleshoot, and locate problems; coupled with effective alarm notifications, it also lets us learn about problems quickly.

Monitoring is mainly introduced from the following aspects.

> Cluster Monitoring

Prometheus is commonly used to monitor Kubernetes clusters and the applications running on them. Since the stability of the whole cluster directly affects application stability, monitoring the cluster is crucial; the concrete monitoring items depend on what you handle in actual work.
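As a hedged sketch, a Prometheus scrape job that discovers the cluster's nodes through Kubernetes service discovery might look like this; the job name, TLS paths, and relabeling are illustrative and depend on how Prometheus is deployed:

scrape_configs:
- job_name: kubernetes-nodes          # illustrative job name
  scheme: https
  kubernetes_sd_configs:
  - role: node                        # discover every node via the Kubernetes API
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)   # copy node labels onto the scraped metrics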

> Application Monitoring

In many enterprises, application monitoring is simply absent: the application does not expose any monitoring metrics. It is therefore strongly recommended to build monitoring in during development and expose metrics in the standard Prometheus format.

Besides metrics exposed by developers themselves, we can also attach an exporter via a Java agent to obtain metrics such as JVM metrics.

Application-level monitoring can be more fine-grained, which makes it easier to spot problems. A few simple application monitoring items cover the middleware and endpoints the application depends on; each of them can be collected by a corresponding exporter, for example a Redis exporter for the middleware and the blackbox exporter for probing.
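As a hedged sketch, if your Prometheus scrape configuration honors the common prometheus.io annotations (this depends on your relabel rules, so treat the annotation names as an assumption about your setup), an application Pod can be marked for scraping like this; the name, image, and port are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                       # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:
        prometheus.io/scrape: "true"   # tell Prometheus to scrape this Pod
        prometheus.io/port: "8080"     # port where metrics are exposed
        prometheus.io/path: "/metrics" # metrics endpoint path
    spec:
      containers:
      - name: app
        image: your-app:latest         # illustrative image
        ports:
        - containerPort: 8080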

> Event Monitoring

In Kubernetes, there are two types of events. One is a Warning event, indicating that the state transition that generated the event was between unexpected states. The other is a Normal event, which indicates that the desired state is the same as the current state.

Events describe what is happening or has just happened, and this information is easy to overlook in daily work, so event monitoring is worth setting up.

A common event-monitoring tool for Kubernetes is kube-eventer, which collects events from Pods, Nodes, the kubelet, and other resource objects, including custom resources, and forwards them to the relevant people.
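A rough sketch of running kube-eventer with a DingTalk sink; the image tag, webhook token, and ServiceAccount are placeholders, and the exact flags should be checked against the kube-eventer documentation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
    spec:
      serviceAccountName: kube-eventer     # assumes RBAC allowing it to get/list/watch events
      containers:
      - name: kube-eventer
        image: registry.aliyuncs.com/acs/kube-eventer:latest   # image and tag are illustrative
        args:
        - --source=kubernetes:https://kubernetes.default
        # send Warning-level events to a DingTalk robot; the token is a placeholder
        - --sink=dingtalk:https://oapi.dingtalk.com/robot/send?access_token=<your_token>&level=Warning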

Through event monitoring, we mainly pay attention to the abnormal (Warning) events; the exact items to watch depend on your environment.

> Link Monitoring

Normally, applications in Kubernetes exist independently, with no explicit connection visible between them, so we need a way to map the relationships between applications and to trace and analyze problems across the whole call chain.

There are many popular tracing tools today. I mainly use SkyWalking, which offers a rich set of agents and good extensibility; interested readers can look into it.
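As a hedged sketch of attaching the SkyWalking Java agent to a container spec (the agent path assumes the agent is baked into the image or mounted in beforehand, and the OAP address and service name are placeholders):

containers:
- name: app
  image: your-app:latest                      # illustrative image containing the agent at /skywalking/agent
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
  - name: SW_AGENT_NAME
    value: demo-app                           # service name shown in SkyWalking
  - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
    value: skywalking-oap.monitoring:11800    # placeholder OAP backend address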

Link monitoring is mainly used to trace a request across services and quickly locate where in the chain a problem lies.

> Alarm Notification

Many people overlook alarm notification and assume that having alarms is enough, but notification itself needs careful design.

Personally, the difficult part is deciding which indicators need alarms. The following rules should be observed when selecting indicators:

  • Each alarm indicator is unique
  • The alarm indicator accurately reflects the problem
  • The problem it exposes requires human intervention to resolve

With these rules in mind, it is easier to select the right indicators.

Next comes classifying urgency, which is weighed mainly by whether the problem exposed by the indicator has to be resolved immediately and how wide its impact is.

Fault escalation is a strategy of raising the fault level, and with it the urgency level, when a problem persists. Classifying notification channels helps us distinguish different alarms and receive them quickly.
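To make the channel classification concrete, here is a hedged Alertmanager routing sketch, assuming alerts carry a severity label; the receiver names and webhook URLs are placeholders:

# Route by severity: critical alerts go to a phone/SMS gateway, warnings go to IM.
route:
  receiver: team-im                 # default channel
  group_by: ['alertname', 'cluster']
  routes:
  - match:
      severity: critical
    receiver: oncall-phone          # highest urgency
  - match:
      severity: warning
    receiver: team-im
receivers:
- name: oncall-phone
  webhook_configs:
  - url: http://alert-gateway.example.com/phone   # placeholder webhook
- name: team-im
  webhook_configs:
  - url: http://alert-gateway.example.com/im      # placeholder webhook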

Final words

These are the basic operations that we "YAML engineers" need to master, and they are enough for most companies.

If there are any mistakes, or you have better suggestions, feel free to leave a message or join the group to discuss.