Public account: Operation and Maintenance Development Story | Author: Jiang Zong

Background

Today I found that many Pods were in the Evicted state. I wanted to check the Grafana dashboards to see whether anything unusual had happened, but I did not have permission to view the monitoring.

When I found the large number of Evicted Pods, I checked the node they were running on and saw that the eviction had happened 7 hours earlier. So one plausible explanation: at the moment of eviction, disk usage really did exceed the kubelet's default resource reservation threshold, but once the threshold was triggered, disk space was reclaimed, so by the time I looked, disk usage had already recovered.

On every Kubernetes node, the kubelet root directory defaults to /var/lib/kubelet, and the log directory /var/log sits on the node's system partition. This partition is also consumed by Pods' emptyDir volumes, container logs, image layers, and containers' writable layers. Ephemeral-storage is what manages this system partition.
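As a side note, Pods can also declare ephemeral-storage requests and limits so that the kubelet accounts for (and can evict on) their local-disk usage. A minimal, hypothetical container spec fragment, with sizes chosen purely for illustration:

    resources:
      requests:
        ephemeral-storage: "1Gi"   # counted against the node's system partition at scheduling time
      limits:
        ephemeral-storage: "2Gi"   # the kubelet evicts the Pod if its local usage exceeds this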

Why are there Evicted Pods?

In my experience with Kubernetes, Pods in the Evicted state are usually caused by disk pressure. If you describe such a Pod, the keyword DiskPressure normally appears in its events, indicating that the node's disk is under pressure. Perhaps your data disk usage has reached 85%.

Why 85%?

There is a basis for it, of course. The kubelet's default reservation parameter is 15%, so once your disk usage exceeds 85%, the kubelet starts garbage collection and eviction.
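To check whether a node is currently reporting disk pressure, you can look at its conditions; a quick sketch (the node name is just this cluster's example):

$ kubectl describe node node2.pre.ayunw.cn | grep -A 8 "Conditions:"
# A DiskPressure condition with status True means the kubelet has crossed an eviction threshold.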

Reference: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/

View the root directory used by docker

$ docker info
...
 Docker Root Dir: /data/docker
...

View the docker daemon.json file

$ cat /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
       "max-size": "1g",
       "max-file": "4"
    },
    "data-root": "/data/docker",
    "storage-driver": "overlay2"
}

The Docker log configuration I had set allows 1 GB per container log file with at most 4 rotated files, i.e. at most 4 GB per container. How could that lead to excessive disk usage? On second thought, I had recently started shipping the skywalking-agent via an initContainer and mounting it into an emptyDir volume on the local host. emptyDir has one characteristic: once the Pod is rebuilt, the data in the emptyDir is gone. But if data keeps being written into the emptyDir and the Pod is never restarted, the files grow larger and larger and disk usage keeps climbing.
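As a guardrail against exactly this failure mode, emptyDir also supports a size limit, under which the kubelet evicts the Pod once the volume exceeds the limit instead of letting it slowly fill the node's disk. A sketch (the 2Gi value is arbitrary):

      volumes:
        - name: vol-apm-empty
          emptyDir:
            sizeLimit: 2Gi   # kubelet evicts the Pod if this volume grows beyond 2Gi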

What happens when the disk is completely full

  • Pods cannot be created (stuck in ContainerCreating)
  • Pods cannot be deleted (stuck in Terminating)
  • Cannot exec into containers

With these questions in mind, I started the investigation step by step.

A large number of Pods in the Evicted state

$ kubectl get po -A -o wide | grep -v "Running"
NAMESPACE              NAME                                                              READY   STATUS             RESTARTS   AGE     IP              NODE                           NOMINATED NODE   READINESS GATES
nsop                   account-service-pre-master-6db67f5cc-5nrgf                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>
nsop                   account-service-pre-master-6db67f5cc-7zbrf                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>
nsop                   account-service-pre-master-6db67f5cc-h78hv                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>
nsop                   account-service-pre-master-6db67f5cc-jj4xx                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>
nsop                   account-service-pre-master-6db67f5cc-jz4cs                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>
nsop                   account-service-pre-master-6db67f5cc-km2cz                        0/1     Evicted            0          103m    <none>          node2.pre.ayunw.cn             <none>           <none>

Too many evicted Pods in a cluster can also add network load: each Pod, even after eviction, still holds a network connection, and in managed cloud Kubernetes clusters it still occupies an IP address, which can exhaust a fixed IP address pool. In addition, when there are too many Pods in the Evicted state, the output of kubectl get pod becomes hard to read because it is cluttered with evicted Pods. Of course, you can filter them out with grep or other means.
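For example, a quick way to see how many Evicted Pods have piled up cluster-wide (a simple sketch):

$ kubectl get pods -A --no-headers | grep -c Evicted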

Inspect any of the Evicted Pods and its node

Describing any of these Pods shows DiskPressure in the events, indicating that the node's disk was under pressure.

$ kubectl describe po account-service-pre-master-6db67f5cc-5nrgf -n nsop
...
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 topology.kubernetes.io/env=pre:NoSchedule
                 topology.kubernetes.io/region=bce-gz:NoSchedule
                 topology.kubernetes.io/type=appserver:NoSchedule
                 topology.kubernetes.io/zone:NoSchedule op=Exists
Events:
  Type     Reason     Age   From                         Message
  ----     ------     ----  ----                         -------
  Normal   Scheduled  100m  default-scheduler            Successfully assigned nsop/account-service-pre-master-6db67f5cc-5nrgf to node2.pre.ayunw.cn
  Warning  Evicted    100m  kubelet, node2.pre.ayunw.cn  The node had condition: [DiskPressure].

Log on to node2.pre.ayunw.cn

[[email protected] ~]# df -Th | egrep -v "overlay2|kubernetes|docker"
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs   32G     0   32G   0% /dev
tmpfs          tmpfs      32G     0   32G   0% /dev/shm
tmpfs          tmpfs      32G  5.9M   32G   1% /run
tmpfs          tmpfs      32G     0   32G   0% /sys/fs/cgroup
/dev/vda1      ext4       50G  7.9G   39G  17% /
/dev/vdb1      xfs       200G  138G   63G  69% /data
tmpfs          tmpfs     6.3G     0  6.3G   0% /run/user/0

Usage on /data is only 69%, so there seems to be no problem here; disk space looks quite adequate.

Use the iostat command to check disk I/O:

[[email protected] ~]# iostat -xk 1 3
Linux 5.10.8-1.el8.elrepo.x86_64 (node2-pre-ayunw.cn)  08/31/2021  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.86    0.00    1.77    0.03    0.00   96.34

Device   r/s     w/s    rkB/s   wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
vda      0.02    2.77    0.40   22.43    0.00    2.14   4.57  43.58     1.51     0.75    0.00     24.11      8.09   0.38   0.11
vdb      0.08  126.81    3.31  519.35    0.00    0.54   0.31   0.43     3.20     0.56    0.07     40.29      4.10   0.47   6.01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.09    0.00    3.34    0.03    0.00   93.54

Device   r/s     w/s    rkB/s   wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
vda      0.00    0.00    0.00    0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
vdb      0.00    0.00    0.00    0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.74    0.00    2.81    0.00    0.00   94.45

Device   r/s     w/s    rkB/s   wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
vda      0.00    3.00    0.00   40.00    0.00    7.00   0.00  70.00     0.00     0.67    0.00      0.00      0.00   0.00   0.00
vdb      0.00    0.00    0.00    0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00

There does not seem to be any I/O pressure either. But since I do not have monitoring access, my guess is that the Pods themselves were not monitored directly: the kubelet's hard eviction threshold was triggered at the time, some disk space was reclaimed, and the pressure dropped back down. In fact it had little to do with I/O pressure at all: a colleague with monitoring access later looked at the metrics for that earlier period and confirmed there was no disk I/O pressure whatsoever.

View logs on the node

I checked the kubelet log and the messages log on the node, but found no eviction-related entries:

$ tail -500 kubelet.log  | grep "Evicted"
$ tail -500 /var/log/messages  | grep "Evicted"
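If the kubelet on the node runs as a systemd service (an assumption about this environment), its logs may also live in the journal rather than a plain file, so it is worth grepping there as well:

$ journalctl -u kubelet --since "12 hours ago" | grep -iE "evict|disk"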

In that case, all we can do for now is clean up the Pods in the Evicted state:

$ kubectl get po -n nsop --field-selector 'status.phase!=Running' -o json | kubectl delete -f -
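A cluster-wide variant of the same cleanup, since Evicted Pods are reported with status.phase=Failed (this assumes a reasonably recent kubectl that supports --field-selector together with -A):

$ kubectl delete pods -A --field-selector=status.phase==Failed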

Since it was a disk space problem, I wanted to check whether any directories kept growing. In the end, all the Evicted Pods turned out to have one thing in common: they all had SkyWalking enabled and wrote logs to local temporary storage via emptyDir.

Our company has changed both the default Docker data directory and the kubelet directory to /data. On the node where the Pods ran, I went to /data and ran du -sh ./* | grep G, and found that the directory /data/kubernetes/kubelet/pods/xxx/volumes/kubernetes.io~empty-dir/vol-apm-empty/logs contains skywalking-api.log together with its rotated copies, and that no log retention limit is set by default.
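A sketch of how such runaway log files can be located directly; the path reflects this cluster's non-default kubelet root, and the 100M threshold is arbitrary:

$ find /data/kubernetes/kubelet/pods -path "*empty-dir*" -name "skywalking-api.log*" -size +100M -exec ls -lh {} \;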

The following parameters are enabled by default in the skywalking-agent configuration file:

$ egrep -v "^$|^#" agent.config
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
logging.level=${SW_LOGGING_LEVEL:INFO}
plugin.mount=${SW_MOUNT_FOLDERS:plugins,activations}

skywalking-agent logs under the emptyDir

[[email protected] vol-apm-empty]# cd logs/
[[email protected] logs]# ll
total 4327672
-rw-r--r-- 1 root root 260328481 Aug 31 09:43 skywalking-api.log
-rw-r--r-- 1 root root 314573222 Aug 12 02:56 skywalking-api.log.2021_08_12_02_56_35
-rw-r--r-- 1 root root 314573394 Aug 13 15:01 skywalking-api.log.2021_08_13_15_01_56
-rw-r--r-- 1 root root 314574277 Aug 15 03:12 skywalking-api.log.2021_08_15_03_12_26
-rw-r--r-- 1 root root 314574161 Aug 16 15:21 skywalking-api.log.2021_08_16_15_21_13
-rw-r--r-- 1 root root 314574334 Aug 18 03:31 skywalking-api.log.2021_08_18_03_31_18
-rw-r--r-- 1 root root 314572887 Aug 19 15:40 skywalking-api.log.2021_08_19_15_40_22
-rw-r--r-- 1 root root 314574238 Aug 21 03:44 skywalking-api.log.2021_08_21_03_44_28
-rw-r--r-- 1 root root 314574144 Aug 22 15:49 skywalking-api.log.2021_08_22_15_49_08
-rw-r--r-- 1 root root 314573963 Aug 24 03:51 skywalking-api.log.2021_08_24_03_51_28
-rw-r--r-- 1 root root 314572991 Aug 25 15:54 skywalking-api.log.2021_08_25_15_54_21
-rw-r--r-- 1 root root 314573321 Aug 27 03:57 skywalking-api.log.2021_08_27_03_57_11
-rw-r--r-- 1 root root 314572890 Aug 28 16:01 skywalking-api.log.2021_08_28_16_01_26
-rw-r--r-- 1 root root 314573311 Aug 30 04:05 skywalking-api.log.2021_08_30_04_05_34

My Docker root directory has been changed from the default /var/lib/docker to /data/docker, and my kubelet directory has also been changed, to /data/kubernetes/kubelet.

There are two ways to temporarily deal with the log overflow:

  • Check which node each Evicted Pod was scheduled to, go to /data/kubernetes/kubelet/pods on that node, use du -sh to find the Pod directories with the largest footprint, and then delete the rotated log files (the ones with timestamps such as 2021_08_17 shown in the listing above).
  • Rebuild the Pod; the emptyDir directory is deleted as soon as the Pod is recreated.

Steps

$ cd /data/kubernetes/kubelet/pods
$ du -sh ./* | grep G
1.3G    ./02c9511d-0787-49f1-8c59-0db239baee79
1.3G    ./079f3ca0-810d-468d-9136-75f3d3235b2d
4.8G    ./07fc67f7-d46d-4d0c-8e14-705ae1f6c401
3.0G    ./091594a0-b5ac-45c2-8ad9-7dcfc91c9e55
1.8G    ./130a1b35-b447-43e1-8802-eb74aefa566c
1.2G    ./1b257c27-cbaf-49f8-bca3-ceadc467aad6
2.8G    ./2ec50216-f81e-4e83-922d-14316762dee2
7.0G    ./321baae6-1efe-4535-8a20-0fdfa6cc3117
8.0G    ./46680114-11f7-47af-9platform-347f56592924
...

Here I found the Pod occupying 7.0 GB, worked out the Pod name from the directory name (which is the Pod UID), and triggered that service's CI/CD pipeline, which is equivalent to updating the Pod's deployment.yaml and running kubectl apply -f to recreate the Pod.

$ docker ps -a | grep "321baae6-1efe-4535-8a20-0fdfa6cc3117"
a69b2635ba98   registry.ayunw.cn/tsp/msmessagecenter            "/startApp.sh"            5 weeks ago   Up 5 weeks               k8s_msmessagecenter-perf-dev-v1-0-0_msmessagecenter-perf-dev-v1-0-0-7f746b84bf-wb4g5_tsp_321baae6-1efe-4535-8a20-0fdfa6cc3117_0
c8f2cc0a2737   874552b27b34                                     "sh -c 'set -ex; mkdi…"   5 weeks ago   Exited (0) 5 weeks ago   k8s_init-skywalking-agent_msmessagecenter-perf-dev-v1-0-0-7f746b84bf-wb4g5_tsp_321baae6-1efe-4535-8a20-0fdfa6cc3117_0
c415f52e7489   registry.ayunw.cn/library/k8s.gcr.io/pause:3.2   "/pause"                  5 weeks ago   Up 5 weeks               k8s_POD_msmessagecenter-perf-dev-v1-0-0-7f746b84bf-wb4g5_tsp_321baae6-1efe-4535-8a20-0fdfa6cc3117_0
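The same lookup can also be done from the Kubernetes side, since the directory name is the Pod UID; a sketch:

$ kubectl get pods -A -o custom-columns=UID:.metadata.uid,NAMESPACE:.metadata.namespace,NAME:.metadata.name | grep 321baae6-1efe-4535-8a20-0fdfa6cc3117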

After the Pod has been completely recreated, confirm that the directory is gone:

$ du -sh ./* | grep G
1.3G    ./02c9511d-0787-49f1-8c59-0db239baee79
1.3G    ./079f3ca0-810d-468d-9136-75f3d3235b2d
4.8G    ./07fc67f7-d46d-4d0c-8e14-705ae1f6c401
3.0G    ./091594a0-b5ac-45c2-8ad9-7dcfc91c9e55
1.8G    ./130a1b35-b447-43e1-8802-eb74aefa566c
1.2G    ./1b257c27-cbaf-49f8-bca3-ceadc467aad6
2.8G    ./2ec50216-f81e-4e83-922d-14316762dee2
8.0G    ./46680114-11f7-47af-9platform-347f56592924
...

A permanent fix: limit the number of retained log files

  • Change the agent parameters baked into the image: write the configuration file in advance and COPY it in when building the image.

Here I COPY the modified agent.config into the image via the Dockerfile:

$ cat Dockerfile
FROM registry.ayunw.cn/library/alpine:3.12.0

ENV LANG=C.UTF-8 \
    SKYWLKING_AGENT_VERSION=8.6.0

RUN set -eux && mkdir -p /opt/skywalking/agent \
    && apk add wget \
    && wget https://downloads.apache.org/skywalking/${SKYWLKING_AGENT_VERSION}/apache-skywalking-apm-es7-${SKYWLKING_AGENT_VERSION}.tar.gz -P /tmp/ \
    && cd /tmp && tar zxf apache-skywalking-apm-es7-${SKYWLKING_AGENT_VERSION}.tar.gz \
    && mv /tmp/apache-skywalking-apm-bin-es7/agent/* /opt/skywalking/agent \
    && rm -f /opt/skywalking/agent/optional-plugins/apm-spring-annotation-plugin-8.6.0.jar /opt/skywalking/agent/plugins/thrift-plugin-8.6.0.jar \
    && mv /opt/skywalking/agent/plugins/thrift-plugin-8.6.0.jar /tmp/thrift-plugin-8.6.0.jar \
    && cp -r /opt/skywalking/agent/optional-plugins/* /opt/skywalking/agent/plugins/ \
    && unset export \
    && rm -rf /tmp/* /opt/skywalking/agent/config/agent.config

COPY agent.config /opt/skywalking/agent/config/

WORKDIR /
$ egrep -v "^$|^#" agent.config
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
logging.level=${SW_LOGGING_LEVEL:INFO}
plugin.mount=${SW_MOUNT_FOLDERS:plugins,activations}
# The following parameter is the one I changed: it limits how many history log files are kept
logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:3}

In the original agent.config, this parameter reads logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:-1} and is commented out by default. Here I uncomment it and change the -1 to 3. The parameter sets the maximum number of history log files to keep; -1 means no limit is applied. See the SkyWalking documentation for the details of each parameter.

Then rebuild the skywalking-agent image and reference the new image in the Deployment.

$ cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
...
      dnsPolicy: ClusterFirst
      terminationGracePeriodSeconds: 10
      serviceAccountName: default
      imagePullSecrets:
        - name: registry-auth-ayunw-cn
      initContainers:
        - name: init-skywalking-agent
          image: "registry.ayunw.cn/library/skywalking-agent:33-ac402d20"
          command:
            - 'sh'
            - '-c'
            - 'set -ex; mkdir -p /skywalking/agent; cp -r /opt/skywalking/agent/* /skywalking/agent;'
          volumeMounts:
            - name: vol-apm-empty
              mountPath: /skywalking/agent
      containers:
        - name: demo-hello-pre-master
          image: "registry.ayunw.cn/paas/demo-hello:537-c87b6177"
          ...
          volumeMounts:
            - name: vol-apm-empty
              mountPath: /skywalking/agent
      volumes:
        - name: vol-apm-empty
          emptyDir: {}
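An alternative worth considering, which avoids rebuilding the image: since the agent.config entries are ${SW_*} placeholders, the same limit can presumably be injected as an environment variable on the application container. This is an assumption about how the agent resolves those placeholders, so verify it against the SkyWalking documentation for your agent version. A sketch of the extra Deployment snippet:

      containers:
        - name: demo-hello-pre-master
          env:
            - name: SW_LOGGING_MAX_HISTORY_FILES   # would override the agent.config default
              value: "3"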

How can I avoid the disk filling up in the first place?

Make sure the kubelet's garbage collection and eviction parameters are configured correctly. If they are, then even as the disk approaches the threshold, the Pods on that node will already have been evicted and rescheduled onto other nodes automatically, and you will not run into Pods stuck in ContainerCreating or Terminating.

Default hard eviction thresholds

The kubelet has the following default hard eviction thresholds:

memory.available<100Mi
nodefs.available<10%
imagefs.available<15%
nodefs.inodesFree<5% (Linux nodes)
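For reference, a minimal sketch of how these thresholds can be tuned in the kubelet's configuration file; the values are illustrative rather than recommendations, and the file path depends on how the kubelet is deployed (on these nodes the kubelet root has been moved, so the config path may differ too):

$ cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionMinimumReclaim:          # how much to claw back once eviction starts
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"

After editing the file, the kubelet has to be restarted for the change to take effect.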

To be honest, at this point I can only roughly attribute the DiskPressure to SkyWalking; without monitoring data I cannot be 100% certain that SkyWalking is the cause. To locate the problem more precisely, monitoring and other tooling are needed. If you have a better way of handling Pods in the Evicted state, you are welcome to leave a comment and share it.