Original: https://blog.mygraphql.com/zh/posts/cloud/containerize/java-containerize/java-containerize-resource-limit/

Origin

Back in 2017, my old team was moving onto Kubernetes, and I was lucky enough to take part and learn (mostly learn). There was a pitfall that everyone containerizing Java ran into: the "JDK8 was not designed for containers" syndrome. The simplest example is Runtime.getRuntime().availableProcessors(), which returns the number of CPUs on the host rather than the CPU share/quota the container would expect, i.e. the Kubernetes CPU request/limit.

By 2021, things should be better (though wages still can't keep up with CPI and house prices). My current project is still on JDK8, but it is JDK 1.8.0_261, and a lot of containerization features have been backported to that version. Recently I have been doing performance optimization on the project, struggling in the quagmire of Istio.

Suddenly a colleague on the front line brought good news: after changing the Pod CPU request from 2 to 4, performance improved significantly. While feeling envious 😋, I got curious and dug into why.

The principle

The logic of linear thinking

Kubernetes uses cgroup for resource limitation:

  • The CPU request maps to the cgroup cpu.shares value. It sets each container's relative priority when the host CPU is scarce and containers have to compete for it (a higher number wins more, proportionally).
  • The CPU limit maps to the cgroup CFS quota (cpu.cfs_quota_us). This is a hard cap that cannot be exceeded; going over it gets the threads throttled.
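Both values can be read from inside the container. Below is a minimal sketch, assuming cgroup v1 mounted at /sys/fs/cgroup (the usual setup at the time; cgroup v2 uses cpu.weight and cpu.max instead). The CgroupInfo class name is mine, purely for illustration:

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class CgroupInfo {
        // Read one numeric value from the container's cgroup v1 CPU controller.
        static long read(String file) throws Exception {
            return Long.parseLong(
                    Files.readAllLines(Paths.get("/sys/fs/cgroup/cpu/" + file)).get(0).trim());
        }

        public static void main(String[] args) throws Exception {
            long shares = read("cpu.shares");        // request: 2 CPUs -> 2048 (request x 1024)
            long quota  = read("cpu.cfs_quota_us");  // limit: 16 CPUs -> 1600000 (limit x period), or -1 if unlimited
            long period = read("cpu.cfs_period_us"); // usually 100000 (100 ms)

            System.out.println("cpu.shares       = " + shares + "  (~" + (shares / 1024.0) + " CPUs of weight)");
            System.out.println("cpu.cfs_quota_us = " + quota
                    + (quota > 0 ? "  (~" + (quota / (double) period) + " CPUs hard cap)" : "  (no hard cap)"));
        }
    }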

Here is the puzzle: in the test environment the host has plenty of CPU, and no container has to fight another for it. So why does a larger CPU request improve performance so clearly?

Possibilities:

  1. Linear thinking: the Linux CFS scheduler implementation is imperfect, so even when containers do not have to compete for CPU, the CPU request still affects scheduling.
  2. Skeptic: the newer JDK8 builds simply derive default configuration, such as thread pool sizes, from the CPU request.

As a programmer who only knows Java, I focused on the latter.

Verification

As a programmer who only knows how to write code, nothing speaks for you like a running program. At the very least, the machine won't run your program faster just because you are on good terms with it (don't mention Linux taskset to me), nor will it wait for you to butter it up or pad a KPI, and it won't file little reports based on who knows whom.

Let's go back and look at the Pod configuration:

    resources:
      limits:
        cpu: "16"
      requests:
        cpu: "2"

Inside the container:

$ cd /tmp
$ cat <<EOF > /tmp/Main.java
public class Main {
    public static void main(String[] args) {
        System.out.println("Runtime.getRuntime().availableProcessors() = " +
                Runtime.getRuntime().availableProcessors());
    }
}
EOF

$ javac Main.java
$ java -cp . Main
Runtime.getRuntime().availableProcessors() = 2

Now change the CPU request to 4:

    resources:
      limits:
        cpu: "16"
      requests:
        cpu: "4"

Back inside the container:

$ cd /tmp
$ java -cp . Main
Runtime.getRuntime().availableProcessors() = 4

As you can see, Java derives its CPU count from the CPU request in the container configuration: a request of 2 becomes cpu.shares = 2048, and this container-aware JDK8 build turns that back into 2048 / 1024 = 2 processors.

The influence of availableProcessors()

Next, let's look at what availableProcessors() influences. The -XX:+PrintFlagsFinal option prints the default configuration the JVM computes at startup.

$ java -XX:+PrintFlagsFinal -cp . Main > req1.txt   # with CPU request = 1
$ java -XX:+PrintFlagsFinal -cp . Main > req4.txt   # with CPU request = 4

$ diff req1.txt req4.txt
2c2
<      intx ActiveProcessorCount        = -1           {product}
---
>      intx ActiveProcessorCount       := 4            {product}
59c59
<      intx CICompilerCount            := 2            {product}
---
>      intx CICompilerCount            := 3            {product}
305c305
<     uintx MarkSweepDeadRatio          = 5            {product}
---
>     uintx MarkSweepDeadRatio          = 1            {product}
312c312
<     uintx MaxHeapFreeRatio            = 70           {manageable}
---
>     uintx MaxHeapFreeRatio            = 100          {manageable}
325c325
<     uintx MaxNewSize                 := 178913280    {product}
---
>     uintx MaxNewSize                 := 178782208    {product}
336,337c336,337
<     uintx MinHeapDeltaBytes          := 196608       {product}
<     uintx MinHeapFreeRatio            = 40           {manageable}
---
>     uintx MinHeapDeltaBytes          := 524288       {product}
>     uintx MinHeapFreeRatio            = 0            {manageable}
360c360
<     uintx NewSize                    := 11141120     {product}
---
>     uintx NewSize                    := 11010048     {product}
371c371
<     uintx OldSize                    := 22413312     {product}
---
>     uintx OldSize                    := 22544384     {product}
389c389
<     uintx ParallelGCThreads           = 0            {product}
---
>     uintx ParallelGCThreads           = 4            {product}
...
<      bool UseParallelGC               = false        {product}
<      bool UseParallelOldGC            = false        {product}
---
>      bool UseParallelGC              := true         {product}
>      bool UseParallelOldGC            = true         {product}
738c738
< Runtime.getRuntime().availableProcessors() = 1
---
> Runtime.getRuntime().availableProcessors() = 4

As you can see, availableProcessors() affects the number of GC threads, the number of JIT compiler threads, and even the choice of GC algorithm. A bigger problem is that some servlet containers (such as Jetty) and Netty also use this number by default to size their thread pools.
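To make the impact concrete, here is a hedged sketch of the sizing pattern such libraries follow (not Jetty's or Netty's actual code; Netty's default event-loop count is derived as twice the available processors and can be overridden with io.netty.eventLoopThreads):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            // Whatever the JVM reports here is what the default pool sizes are built from.
            int cores = Runtime.getRuntime().availableProcessors();

            // Typical defaults derived from that number (illustrative values):
            int workerThreads    = cores;      // e.g. a CPU-bound worker pool
            int eventLoopThreads = cores * 2;  // Netty-style default event-loop count

            ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
            System.out.println("workers = " + workerThreads + ", eventLoops = " + eventLoopThreads);
            workers.shutdown();
        }
    }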

A lingering doubt

If you still suspect that the Linux CFS scheduler is influenced by the cgroup share (CPU request) even when the host has spare CPU, that possibility needs to be ruled out. After the Pod comes up, you could modify the cgroup's share file directly from a Linux terminal, double it, and test again to find out. Yes, anti-pattern hacks like this are a common way to troubleshoot. But I didn't run this test, because I didn't want to be too scientific 🙃 Everything in moderation.

Filling the pit

Filling pits is a programmer's job, whether you like it or not, and whether you dug the pit yourself or inherited it. There are several ways to fill this one:

  1. Change the Pod CPU request to the amount actually used at peak time, i.e. increase the request and keep the limit unchanged.
  2. Upgrade to JDK11, where the PreferContainerQuotaForCPUCount option is on by default, i.e. availableProcessors() returns the CPU limit.
  3. Everywhere availableProcessors() is used as a default, specify the value explicitly instead: GC thread count, Netty thread count… (see the sketch after this list).
  4. Keep the CPU request/limit as they are (request much smaller than limit), but explicitly tell the JVM how many CPUs it may use.
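For reference, option 3 boils down to pinning each of these numbers by hand, roughly like this (illustrative flags only; the exact GC flags depend on the collector in use, and io.netty.eventLoopThreads is the system property Netty reads for its default event-loop count):

$ java -XX:ParallelGCThreads=4 \
       -XX:CICompilerCount=2 \
       -Dio.netty.eventLoopThreads=8 \
       -cp . Main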

Following international custom, I chose option 4. The reasons:

  • If the Pod is configured with a large request, it effectively reserves that much of the host, and the host's actual resource utilization drops. In reality the request only reflects peak demand at busy moments, such as JIT compilation at startup or an e-commerce flash sale.
  • Changing every place that defaults to availableProcessors() to an explicit value is a lot of work, and future, as-yet-unknown uses of availableProcessors() would be out of your control.
  • Upgrading to JDK11 is not something a mere programmer gets to decide.

Just do it.

Specifically, since JDK 8u191 the -XX:ActiveProcessorCount=<count> parameter has been supported, which tells the JVM the actual number of CPUs it may use. So all it takes is:

$ java -XX:+PrintFlagsFinal -XX:ActiveProcessorCount=$POD_CPU_LIMIT -cp . Main   # if $POD_CPU_LIMIT is too large, adjust it

For the -XX:ActiveProcessorCount documentation, see: https://www.oracle.com/java/t…
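In a Kubernetes manifest, one hedged way to wire this up (not the project's actual deployment) is to expose the Pod's CPU limit through the Downward API and hand it to the JVM via JAVA_TOOL_OPTIONS:

    env:
      - name: POD_CPU_LIMIT
        valueFrom:
          resourceFieldRef:
            resource: limits.cpu   # whole CPUs, rounded up
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:ActiveProcessorCount=$(POD_CPU_LIMIT)"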

Conclusion

Clearly, this is a blog post that should have been written years earlier. Your shop has probably long since moved off JDK8, usually straight to the JDK11 LTS. What this post really wants to convey is a method and an attitude of proving things. It may not bring you any direct benefit; sometimes it even annoys people and hurts your promotion prospects. But if an industry is to progress, it depends on that spirit. There is an English word for this attitude: nerd.


Further reading

Prehistoric availableProcessors()

Before JDK8 gained container support, you had to handle this yourself. There are several approaches (at different layers):

  1. Bind-mount over the kernel files that report the CPU count.
  2. Override the sysconf function of GNU libc.
  3. Override the JVM_ActiveProcessorCount function (from libjvm.so) via a preloaded shared library and return a customized value.

Method 3 is relatively simple, and it is the only one shown here:

Reference: https://stackoverflow.com/que…

#include <stdlib.h>
#include <unistd.h>

int JVM_ActiveProcessorCount(void) {
    char* val = getenv("_NUM_CPUS");
    return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN);
}

First, compile it into a shared library:

gcc -O3 -fPIC -shared -Wl,-soname,libnumcpus.so -o libnumcpus.so numcpus.c

Then run Java as follows:

$ LD_PRELOAD=/path/to/libnumcpus.so _NUM_CPUS=2 java AvailableProcessors
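The AvailableProcessors class referenced above is not shown in the source; a minimal version (equivalent to the earlier Main) would be:

    public class AvailableProcessors {
        public static void main(String[] args) {
            // With the library preloaded, this prints the value of _NUM_CPUS instead of the host CPU count.
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }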

Methods 1 and 2 are generic and also cover non-Java code, such as native libraries called via JNI, but implementing them requires some Linux knowledge. For reference: https://geek-tips.imtqy.com/a… , https://github.com/jvm-profil…

References

https://christopher-batey.med…

https://www.batey.info/docker…

https://mucahit.io/2020/01/27…

https://blog.gilliard.lol/201…

https://cloud.google.com/run/…

https://stackoverflow.com/que…

https://www.oracle.com/java/t…

https://stackoverflow.com/que…

https://bugs.openjdk.java.net…

https://programmer.group/5ce1…