I. Background

Recently, I migrated our company's inventory system to a K8S cluster. I learned that K8S can limit the resource usage of each application, so that one application's unbounded memory or excessive CPU consumption cannot bring down the whole cluster. We therefore tried to retrofit resource limits onto this existing high-concurrency system. In practice we found that, if this feature is not configured properly, it can be disastrous for the application.

II. Environment

1. Docker version

rke@k8s-master-dev-1:~$ docker -v

Docker version 18.06.0-ce, build 0ffa825

2. K8S version

root@k8s-master-dev-1:~# su - rke

rke@k8s-master-dev-1:~$ kubectl version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:52:00Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:02:35Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

rke@k8s-master-dev-1:~$

III. Problem phenomena

1. The following resource configuration was applied in K8S

Figure 1 and Figure 2: K8S resource limit configuration

- The memory request was set to 512 MB and the memory limit to 4500 MB (since the JVM's maximum heap is limited to 4096 MB, the remainder is reserved for the operating system and other processes)

- The CPU request was set to 100m and the CPU limit to 500m (millicores)
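For reference, the same requests and limits could also be applied imperatively with kubectl. This is only a minimal sketch; the deployment name inventory-service is a hypothetical placeholder, and 4500Mi approximates the 4500 MB limit described above.

# Hypothetical deployment name; adjust to the actual workload
kubectl set resources deployment inventory-service \
  --requests=cpu=100m,memory=512Mi \
  --limits=cpu=500m,memory=4500Mi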

2. Problem phenomena

(1) The application became sluggish: sometimes pages would open, sometimes they would not

After this configuration was removed, the application returned to normal

(2) While the program was running, the pod was repeatedly restarted every 20 to 30 minutes, interrupting business use

Checking the operating system log, we found that the pod had run out of memory and been killed by the kernel's OOM killer:

May 11 15:29:12 k8s-node-prod-3 kernel: [18750.337581] java invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=969

May 11 15:29:12 k8s-node-prod-3 kernel: [18750.337621]  oom_kill_process.cold.30+0xb/0x1cf
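The same conclusion can usually be cross-checked from the K8S side, since the container's last termination reason is recorded as OOMKilled. A minimal sketch, with a hypothetical pod name:

# Hypothetical pod name; look for "Reason: OOMKilled" under Last State
kubectl describe pod inventory-service-7d9c5b8f4-abcde | grep -A 5 "Last State"
# Or query the termination reason directly
kubectl get pod inventory-service-7d9c5b8f4-abcde \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'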

IV. Problem summary

1. Try not to set a CPU limit; let the application make full use of the CPU when it needs to.

2. Java applications should use -Xms and -Xmx to set the minimum and maximum heap size. It is better not to set a memory limit in K8S; otherwise a limit that is too small or poorly tuned will affect the business.

JAVA_OPTS="$JAVA_OPTS -server -Xms4096M -Xmx4096M -Xss512k -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+DisableExplicitGC -XX:MaxTenuringThreshold=15 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true"
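As a minimal sketch of how such JVM options could reach the container, they might be injected as an environment variable on the workload; the deployment name inventory-service is a hypothetical placeholder and the option string is abridged here.

# Hypothetical deployment name; pass the JVM options (abridged) as an environment variable
kubectl set env deployment inventory-service \
  JAVA_OPTS="-server -Xms4096M -Xmx4096M -Xss512k"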

Practice is the best teacher, and nothing in production is a small matter: treat every configuration change carefully and evaluate its impact on the production business.