Most companies have a standard specification for JVM parameters, often with company-wide default configurations. When a performance problem shows up in a specific usage scenario, tuning then proceeds at the code level, the JVM level, and even the Linux server level.

Heap Settings

-Xms: Sets the initial heap size

-Xmx: Sets the maximum heap size

-XX:NewSize=n: Sets the size of the young generation

-XX:NewRatio=n: Sets the ratio of the young generation to the old generation. For example, n=3 means young:old = 1:3, so the young generation accounts for 1/4 of the combined young and old generations

-XX:SurvivorRatio=n: Sets the ratio of Eden to the two Survivor spaces in the young generation. Note that there are two Survivor spaces. For example, n=3 means Eden:Survivor = 3:2, so one Survivor space accounts for 1/5 of the entire young generation

-XX:MaxPermSize=n: Sets the maximum size of the permanent generation (JDK 7 and earlier; JDK 8 replaced the permanent generation with Metaspace)
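
The two ratio parameters are easier to check with concrete numbers. The following is a minimal sketch of the arithmetic only, assuming an illustrative 400 MB heap; the class and method names are made up for this example, and the real JVM also applies alignment and adaptive sizing, so actual sizes can differ slightly.

```java
// Sketch: how -XX:NewRatio and -XX:SurvivorRatio divide a heap.
// Names are hypothetical; HotSpot's alignment and adaptive sizing are ignored.
final class GenerationSizes {

    /** Young generation size when young:old = 1:newRatio. */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Size of ONE survivor space when eden:one survivor = survivorRatio:1. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2); // two survivor spaces share the rest
    }

    /** Eden is whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 400 * mb;                     // illustrative, as if -Xms400m -Xmx400m
        long young = youngBytes(heap, 3);         // NewRatio=3: 1/4 of the heap
        long survivor = survivorBytes(young, 3);  // SurvivorRatio=3: 1/5 of young
        long eden = edenBytes(young, 3);
        System.out.printf("young=%dMB old=%dMB eden=%dMB survivor=%dMB each%n",
                young / mb, (heap - young) / mb, eden / mb, survivor / mb);
        // prints: young=100MB old=300MB eden=60MB survivor=20MB each
    }
}
```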

Collector setup

-XX:+UseSerialGC: Selects the serial collector

-XX:+UseParallelGC: Selects the parallel collector

-XX:+UseParallelOldGC: Selects the parallel old-generation collector

-XX:+UseConcMarkSweepGC: Selects the concurrent (CMS) collector

Garbage collection statistics

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Xloggc:filename
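
The flags above write GC statistics to a log. As a complement, the same kind of counters can be read from inside the application through the standard java.lang.management API. This is only a sketch; the class name GcStats is made up for illustration, and the collector names it prints depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read per-collector GC counts and times via the platform MXBeans.
final class GcStats {

    /** Total milliseconds spent in all collectors so far; undefined values (-1) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcMillis() + "ms");
    }
}
```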

Parallel collector setup

-XX:ParallelGCThreads=n: Sets the number of CPUs the parallel collector uses, that is, the number of parallel collection threads.

-XX:MaxGCPauseMillis=n: Sets the maximum pause time target for parallel collection

-XX:GCTimeRatio=n: Sets the share of the program's running time spent on garbage collection. The formula is 1/(1+n)

Concurrent collector setup

-XX:+CMSIncrementalMode: Enables incremental mode, which is intended for single-CPU machines.

-XX:ParallelGCThreads=n: Sets the number of CPUs used when the young generation of the concurrent collector is collected in parallel, that is, the number of parallel collection threads.
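
For the GCTimeRatio formula 1/(1+n) in the parallel collector settings, a few sample values make the scale clearer. The sketch below only evaluates the formula; the class name and the chosen values of n are illustrative, not recommendations.

```java
// Sketch: the GC time share implied by -XX:GCTimeRatio=n, computed as 1 / (1 + n).
final class GcTimeRatio {

    /** Fraction of total running time allowed for GC when GCTimeRatio is n. */
    static double gcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        // n = 9, 19, 99 correspond to 10%, 5% and 1% of running time in GC.
        for (int n : new int[] {9, 19, 99}) {
            System.out.println("GCTimeRatio=" + n + " -> GC share " + gcFraction(n));
        }
    }
}
```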

Tuning strategy

Young Generation setting

  • Response-time-first applications: Make the young generation as large as possible, up to the point where the system's response-time limit is approached (choose according to the actual situation). Young-generation collections are then as infrequent as possible, and fewer objects reach the old generation.
  • Throughput-first applications: Make it as large as possible, potentially into the gigabyte range. Because there is no response-time requirement, garbage collection can run in parallel; this generally suits machines with 8 or more CPUs.

Old generation setting

  • Response-time-first applications: The old generation uses a concurrent collector, so its size must be set carefully, taking application-specific parameters into account. If the heap is too small, the result can be memory fragmentation, frequent collections, and application pauses when the collector falls back to the traditional mark-sweep approach. If the heap is large, each collection takes longer. Reducing the time spent collecting the young and old generations generally improves application efficiency; the best setting is usually found by consulting:
    • Concurrent garbage collection information
    • Number of concurrent collections of the permanent generation
    • Traditional GC information
    • The proportion of time spent collecting the young generation versus the old generation
  • Throughput-first applications: These typically have a large young generation and a small old generation. The young generation can then reclaim most short-lived objects and reduce the number of medium-lived ones, while the old generation holds only long-lived objects.

Fragmentation problems caused by smaller heaps

  • -XX:+UseCMSCompactAtFullCollection: When the concurrent collector is in use, enables compaction of the old generation.

  • -XX:CMSFullGCsBeforeCompaction=0: With the option above enabled, sets how many Full GCs occur before the old generation is compacted.

The official guide

  • -Xms180m -Xmx180m Heap size: Set the whole heap to 3-4 times the size of the live old-generation objects, that is, 3-4 times the old-generation occupancy after a Full GC.

  • -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=64M Metaspace: Set it to 1.2 to 1.5 times the size of the live old-generation objects.

  • -Xmn64m Young generation: Set it to 1 to 1.5 times the size of the live old-generation objects.

  • Old generation: Set it to 2-3 times the size of the live old-generation objects.

    For example,

  jstat -gc pid
   S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
  13824.0 22528.0 13377.0  0.0   548864.0 535257.2  113152.0   46189.3   73984.0 71119.8 9728.0 9196.2     14    0.259   3      0.287    0.546

OU shows that the old generation currently occupies 46189.3 KB (about 45 MB). Based on that, the JVM parameters would be adjusted as follows:

-XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -Xms256m -Xmx256m
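
The multipliers from the guide can be applied mechanically to the OU value. The sketch below does only that arithmetic, assuming the OU reading of 46189.3 KB from the jstat output was taken right after a Full GC (the guide bases its ratios on post-Full-GC occupancy); the class and method names are invented for this example. In practice the results are rounded to convenient sizes, so the final flags need not match these numbers exactly.

```java
// Sketch: derive size ranges from the live old-generation data using the guide's multipliers.
final class HeapSizingFromJstat {

    /** Returns {low, high} in MB for a live-data size in KB and a multiplier range. */
    static long[] rangeMb(double liveKb, double lowFactor, double highFactor) {
        double liveMb = liveKb / 1024.0;
        return new long[] {Math.round(liveMb * lowFactor), Math.round(liveMb * highFactor)};
    }

    static void describe(String label, long[] range) {
        System.out.println(label + ": " + range[0] + "-" + range[1] + " MB");
    }

    public static void main(String[] args) {
        double ouKb = 46189.3; // OU column from the jstat output above
        describe("heap (-Xms/-Xmx)", rangeMb(ouKb, 3.0, 4.0));  // 135-180 MB
        describe("metaspace", rangeMb(ouKb, 1.2, 1.5));         // 54-68 MB
        describe("young generation (-Xmn)", rangeMb(ouKb, 1.0, 1.5)); // 45-68 MB
        describe("old generation", rangeMb(ouKb, 2.0, 3.0));    // 90-135 MB
    }
}
```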

Production configuration suggestions

Recommended JVM parameters for critical systems

  1. Disable biased locking: -XX:-UseBiasedLocking

  2. Enlarge the Integer cache: -XX:AutoBoxCacheMax=20000

  3. Touch and zero the memory pages at startup: -XX:+AlwaysPreTouch

  4. Speed up SecureRandom seeding: -Djava.security.egd=file:/dev/./urandom
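
The effect of the Integer cache setting can be observed directly. The following sketch compares autoboxed values by identity, which is normally a bug but is done here on purpose to see whether two boxes are the same cached object. The class name is made up. Only the -128 to 127 range is guaranteed to be cached; results for larger values depend on the AutoBoxCacheMax setting and on JIT optimizations, so no fixed output is claimed for them.

```java
// Sketch: observe the Integer autobox cache. Run once without flags and once with
// -XX:AutoBoxCacheMax=20000 and compare the output for the larger values.
final class IntegerCacheDemo {

    /** True if boxing the same value twice yields the same cached object. */
    static boolean isCached(int value) {
        Integer first = value;   // autoboxing goes through Integer.valueOf
        Integer second = value;
        return first == second;  // deliberate identity comparison, not equals()
    }

    public static void main(String[] args) {
        System.out.println("127 cached:   " + isCached(127));
        System.out.println("128 cached:   " + isCached(128));
        System.out.println("20000 cached: " + isCached(20000));
    }
}
```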

Optional performance parameters

  1. -XX:+PerfDisableSharedMem: Stops the JVM from writing its performance statistics to the memory-mapped hsperfdata file. Writes to that file can otherwise stall a thread while the JVM is trying to finish a stop-the-world safepoint.
  2. -XX:-UseCounterDecay: Disables decay of the JIT invocation counters. By default the counters are halved on each GC, so some merely warm methods may never reach the C2 compilation threshold of 10,000 invocations.
  3. -XX:-TieredCompilation: Tiered compilation is a flagship feature that is enabled by default in JDK 8. Methods are first compiled by C1 and then, once enough profiling data has been sampled, by C2. However, in our field tests the final performance dropped slightly, by about 2%, probably because some methods were never recompiled by C2 after C1 had compiled them. There were also more occasional service timeouts at application startup, perhaps because the JVM was busy compiling. So we disabled it, but remember to also set -XX:-UseCounterDecay so that warm methods are not left in the interpreter forever.

GC policy

For the sake of robustness, CMS is still the better choice for heaps under 8 GB. G1 is now the default, but it does not perform better on small heaps; ZGC in JDK 11 is worth looking forward to.

-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

Throughput-first garbage collector

-XX:+UseParallelGC

Promotion threshold

-XX:MaxTenuringThreshold=2

This is the parameter whose change has the most visible effect. It is the maximum number of Young GCs an object survives in the Survivor spaces before it is promoted to the old generation. The default is 6 for CMS in JDK 8 and 15 for other collectors such as G1.

Young GC is one of the largest sources of application pauses, and the number of objects still alive after a Young GC directly affects the pause time. So if you know how often Young GC runs and the longest lifetime of most temporary objects in the application, you can set the threshold lower, so that young-generation objects that are not temporary are promoted to the old generation quickly instead of lingering.

Observe with -XX:+PrintTenuringDistribution. If the sizes of the later ages always stay the same, objects that pass a certain age will be promoted to the old generation anyway, and the promotion threshold can be lowered. For JMeter, for example, 2 is enough.
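
The observation rule above can be turned into a small heuristic. The sketch below takes the per-age sizes reported by the tenuring distribution and returns the first age after which the size no longer shrinks noticeably. The class name, the 10% tolerance, and the sample numbers are all invented for illustration; they are not taken from a real log.

```java
// Sketch: suggest a tenuring threshold from per-age sizes (illustrative heuristic only).
final class TenuringAdvisor {

    /**
     * bytesByAge[i] is the total size of objects of age i + 1.
     * Returns the first age whose objects almost all survive to the next age
     * (shrinkage below the tolerance), or the highest age if sizes keep shrinking.
     */
    static int suggestThreshold(long[] bytesByAge, double tolerance) {
        for (int i = 0; i + 1 < bytesByAge.length; i++) {
            if (bytesByAge[i] == 0) {
                continue; // nothing of this age, no survival rate to compute
            }
            double survivalRate = (double) bytesByAge[i + 1] / bytesByAge[i];
            if (survivalRate >= 1.0 - tolerance) {
                return i + 1; // ages are 1-based
            }
        }
        return bytesByAge.length;
    }

    public static void main(String[] args) {
        // Made-up sizes for ages 1..6: a sharp drop after age 1, then nearly flat.
        long[] sample = {8_000_000, 2_000_000, 1_900_000, 1_880_000, 1_870_000, 1_870_000};
        System.out.println("suggested MaxTenuringThreshold: " + suggestThreshold(sample, 0.10));
        // prints: suggested MaxTenuringThreshold: 2
    }
}
```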

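-XX:+ExplicitGCInvokesConcurrent, but do not add -XX:+DisableExplicitGC

Makes a full GC triggered by System.gc() run with the CMS algorithm instead of pausing the application for its whole duration. Mandatory.

Disabling System.gc() altogether is not necessarily a good thing. As long as you are not using some badly written library that calls it needlessly, there is no reason to suppress it, so do not add that flag.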

Monitoring configuration

1. -XX:+PrintCommandLineFlags

Operations staff sometimes make temporary changes to the startup parameters, so print the parameters to stdout on every start for future reference. The flag prints the parameters set on the command line together with the parameters they implicitly affect, such as -XX:+UseParNewGC when CMS is turned on.
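
The arguments actually passed to the JVM can also be read from inside the application, for example to write them to the application's own log at startup. This is a minimal sketch using the standard RuntimeMXBean; the class name is made up. Unlike -XX:+PrintCommandLineFlags, it lists only the arguments given explicitly, not the flags they imply.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: list the JVM input arguments (the -X, -XX and -D options) at startup.
final class JvmArgs {

    /** Arguments passed to the JVM itself, excluding the arguments to main(). */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = inputArguments();
        System.out.println("JVM started with " + jvmArgs.size() + " argument(s):");
        for (String arg : jvmArgs) {
            System.out.println("  " + arg);
        }
    }
}
```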

2. -XX:-OmitStackTraceInFastThrow

Building the stack trace for an exception is an expensive operation, so once the application has thrown the same exception from the same place N times (20,000?), the JVM optimizes certain exceptions such as NPE and array index out of bounds and no longer attaches the stack trace. At that point you may see a run of NullPointerExceptions without stacks in the log, while the earlier entries with the complete stack have long since scrolled away, so you have no idea where the NPE happened. So disable this optimization; Elasticsearch does the same.
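
The optimization can be observed with a small experiment. The sketch below throws the same NPE from the same place in a loop and reports the first iteration at which the exception arrives without a stack trace. The class name and the iteration count are arbitrary choices for this example. Whether and when the stack disappears depends on the JIT compiler, so the result varies between runs and JVM versions; with -XX:-OmitStackTraceInFastThrow the method is expected to return -1.

```java
// Sketch: detect when the JVM starts omitting stack traces for a hot NullPointerException.
final class FastThrowDemo {

    static Object target = null; // always null, so the call below always throws an NPE

    /** Returns the iteration at which an NPE first had an empty stack trace, or -1 if never. */
    static int firstStacklessThrow(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                target.hashCode();
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int at = firstStacklessThrow(200_000);
        System.out.println(at < 0
                ? "stack traces were always present"
                : "first stackless NPE at iteration " + at);
    }
}
```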

Crash file

1. -XX:ErrorFile

When the JVM crashes, HotSpot generates an error file that provides details about the state of the JVM. As mentioned earlier, write it to a fixed directory so you do not have to hunt for it. %p in the file name is automatically replaced with the application PID.

-XX:ErrorFile=/var/log/hs_err_pid%p.log

2. coredump

Of course, it is even better to generate a core dump, from which a heap dump, a thread dump, and the crash scene can all be extracted.

Add ulimit -c unlimited to the startup script, or set the limit some other way. If you have root permission, it is better to also set the output directory:

echo "/{MYLOGDIR}/coredump.%p" > /proc/sys/kernel/core_pattern

What? You don't know what a core dump is for? Then you must be a lucky person who has never run into a JVM segmentation fault.

3. -XX:+HeapDumpOnOutOfMemoryError (optional)

When the JVM is dying from an OutOfMemoryError, the heap is dumped to the specified path. Without it, developers often have no idea how to reproduce the error.

Point the path at a directory only; the JVM keeps the file name unique by calling it java_pid${pid}.hprof. If you point at a specific file and it already exists, the dump cannot be written.

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${LOGDIR}/

However, in a container environment, writing a 4 GB heap dump saturates the disk I/O of an ordinary hard disk for more than 20 seconds, which makes the process a real bad neighbor that affects all other containers on the same host.