Summary of JVM troubleshooting

Since JDK 1.8, the JVM has removed the permanent generation (PermGen) and moved class metadata to Metaspace, so a previously configured MaxPermSize has no effect in JDK 1.8. The initial Metaspace threshold (MetaspaceSize) is small by default, about 20.8 MB on a 64-bit server VM, and MaxMetaspaceFreeRatio defaults to 70. Frequent Metaspace expansion can trigger Full GCs and may stall the service. If Metaspace utilization is high, consider tuning the JVM memory configuration.

To master GC troubleshooting systematically, this article follows a learning path of four steps, and its overall structure is organized the same way.

Build a body of knowledge: learn the basics of GC, from the JVM memory structure to garbage collection algorithms and collectors, and master some common GC analysis tools.
Determine the metrics: understand the basic GC metrics, how to set targets for an individual system, and how to judge whether GC is a problem in a given business scenario.
Scenario tuning practice: use this knowledge and the system metrics to analyze and solve nine common GC problem scenarios in CMS.
Summary of optimization experience: summarize the whole process, put forward suggestions, and feed the experience back into the body of knowledge.

1. GC

1.1 Explanation of some nouns

TLAB: short for Thread Local Allocation Buffer. Mutator threads preferentially allocate objects in a block of memory in Eden. Because each TLAB belongs exclusively to one thread, Java threads allocate without lock contention, so allocation is faster.
Mutator: the producer of garbage, that is, our application. It allocates and frees memory through the Allocator.

1.2 Memory Division

At the time of writing, the latest version of Java is Java 16, with Java 17 upcoming. Java 11 and Java 8 are the current LTS versions, and the JVM specification changes with each iteration. Since this article focuses on CMS, the memory structure of Java 8 is used here. GC works mainly in the Heap area and the Metaspace area. Metaspace lives in direct (native) memory. If you use DirectByteBuffer there, GC manages that memory indirectly through the Cleaner when the allocated memory runs short. We’ll cover the basics of any automatic memory management system: allocating space for new objects, and then collecting the space of garbage objects.

1.3 Assigning Objects

In Java, object address operations are mainly performed through Unsafe, which calls the allocate and free methods of C. There are two allocation methods:

Free list: records free addresses in additional storage, turning random I/O into sequential I/O, at the cost of extra space.
Bump pointer: a pointer serves as the boundary between used and free memory. To allocate, you only need to move the pointer toward the free end by a distance equal to the object size. Allocation is efficient, but the applicable scenarios are limited.
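To make the two allocation methods concrete, here is a minimal sketch over an abstract address range. The class and method names (AllocatorSketch, BumpPointer, FreeList) are made up for illustration and have nothing to do with HotSpot internals; the free list uses a simple first-fit search and does not merge neighbouring blocks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy allocators over an abstract address range [0, capacity). Illustrative names only. */
class AllocatorSketch {

    /** Bump pointer: everything below 'top' is used, everything above is free. */
    static final class BumpPointer {
        private final int capacity;
        private int top = 0;

        BumpPointer(int capacity) { this.capacity = capacity; }

        /** Returns the start address of the new object, or -1 if it does not fit. */
        int allocate(int size) {
            if (size <= 0 || size > capacity - top) return -1;
            int address = top;
            top += size; // the only bookkeeping: move the pointer
            return address;
        }
    }

    /** Free list: extra records of free blocks, searched first-fit. */
    static final class FreeList {
        private final List<int[]> blocks = new ArrayList<>(); // each entry: {start, length}

        FreeList(int capacity) { blocks.add(new int[] {0, capacity}); }

        /** Returns the start address of the new object, or -1 if no block is large enough. */
        int allocate(int size) {
            if (size <= 0) return -1;
            for (Iterator<int[]> it = blocks.iterator(); it.hasNext(); ) {
                int[] block = it.next();
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size;
                    block[1] -= size;
                    if (block[1] == 0) it.remove();
                    return address;
                }
            }
            return -1;
        }

        /** Returns a block to the list (no coalescing in this sketch). */
        void free(int start, int length) { blocks.add(new int[] {start, length}); }
    }

    public static void main(String[] args) {
        BumpPointer bump = new BumpPointer(100);
        System.out.println(bump.allocate(40)); // 0
        System.out.println(bump.allocate(40)); // 40
        System.out.println(bump.allocate(40)); // -1, only 20 units are left

        FreeList list = new FreeList(100);
        int a = list.allocate(40);
        list.allocate(40);
        list.free(a, 40);                      // leaves a hole at [0, 40)
        System.out.println(list.allocate(30)); // 0, the hole is reused
    }
}
```

The bump pointer needs a contiguous free area, which is why it pairs naturally with collectors that copy or compact, while the free list can reuse holes left behind by a non-moving collector.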

1.4 Collecting Objects

1.4.1 Identifying garbage

Reference Counting: counts the references to each object. Each time a reference is made the counter is incremented by 1, and each time a reference becomes invalid it is decremented by 1. The count is stored in the object header. Although circular references can be handled by the Recycler algorithm, reference count changes require costly synchronization in multithreaded environments, so performance is low. Early programming languages used this algorithm.
Reachability analysis: objects are searched starting from the GC Roots, and every object that can be found is reachable. This alone is not enough to decide whether an object is alive or dead. Multiple rounds of marking are needed to decide more accurately. Currently, mainstream Java VMs use this algorithm.
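The difference between the two approaches shows up most clearly with a reference cycle. The following is a toy model, not JVM code: Node, link and reachableFrom are hypothetical names, and the counter is maintained by hand. Two nodes that reference each other keep a non-zero count even though nothing reachable from a root points at them, while a search from the roots simply never visits them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy object graph comparing reference counts with reachability from roots. */
class ReachabilitySketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount = 0; // incoming references, maintained by link()

        Node(String name) { this.name = name; }
    }

    /** Adds a reference from one node to another and updates the counter. */
    static void link(Node from, Node to) {
        from.refs.add(to);
        to.refCount++;
    }

    /** Marks every node that can be found by following references from the roots. */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node node = worklist.pop();
            if (marked.add(node)) {
                worklist.addAll(node.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        link(root, a);
        link(b, c); // b and c only reference each other
        link(c, b);

        Set<Node> live = reachableFrom(List.of(root));
        for (Node node : List.of(root, a, b, c)) {
            System.out.println(node.name + ": refCount=" + node.refCount
                    + ", reachable=" + live.contains(node));
        }
    }
}
```

Here b and c each report refCount=1 but reachable=false, which is exactly the circular reference case that plain reference counting cannot reclaim.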

Tips: In current JVMs, objects that can serve as GC Roots include the following:

Objects referenced in the virtual machine stack (local variables in stack frames), such as the parameters, local variables, and temporary variables used in the methods called by each thread
Objects referenced by static class fields in the method area, such as reference-type static variables of a Java class
Objects referenced by constants in the method area, such as references in the string constant pool
Objects referenced by JNI in the native method stack
References internal to the Java virtual machine, such as Class objects for primitive types, some resident exception objects, and the system class loader
Objects held by synchronization locks
JMXBeans that reflect the internal state of the Java virtual machine, callbacks registered in JVMTI, native code caches, and so on

1.4.2 Collection Algorithm

Mark-sweep: the collection process has two stages. The first stage is Tracing, which starts from the GC Roots, walks the object graph, and marks every object encountered. The second stage is Sweep, which checks every object in the heap and reclaims all unmarked objects. No objects are moved. Different implementations use techniques such as Tricolour Abstraction and BitMap to improve efficiency. The algorithm is more efficient when there are many live objects.
Mark-compact: the primary goal of this algorithm is to resolve the fragmentation that occurs in a non-moving collector. It also has two phases. The first phase is similar to Mark-sweep, and the second phase compacts live objects in a given Compaction Order. Implementations include the Two-Finger algorithm, the Lisp2 algorithm, and the Threaded Compaction algorithm.
Copying: divides the space into two halves of similar size, From and To, and uses only one at a time. On each collection, surviving objects are copied from one half to the other. There are recursive (Robert R. Fenichel and Jerome C. Yochelson) and iterative (Cheney) algorithms, as well as approximately depth-first search algorithms that address the recursion stack and cache line problems of the first two. The copying algorithm can allocate memory quickly with a bump pointer, but it has the disadvantage of low space utilization, and the copying cost is high when live objects are large.
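The fragmentation that mark-compact is meant to resolve can be seen on a toy heap. In the sketch below the heap is an array of slots holding object ids (0 means free) and the mark bits are supplied as a boolean array. All names are hypothetical, and the compaction is a simple order-preserving slide rather than any specific algorithm named above.

```java
import java.util.Arrays;

/** Toy heap of slots: an object id per slot, 0 means free. Illustrative only. */
class SweepVsCompactSketch {
    static final int FREE = 0;

    /** Sweep: reclaim unmarked slots in place; live objects do not move. */
    static int[] sweep(int[] heap, boolean[] marked) {
        int[] out = heap.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] != FREE && !marked[i]) {
                out[i] = FREE;
            }
        }
        return out;
    }

    /** Compact: slide marked objects toward index 0, preserving their order. */
    static int[] compact(int[] heap, boolean[] marked) {
        int[] out = heap.clone();
        int next = 0; // next slot to fill; never ahead of the slot being read
        for (int i = 0; i < out.length; i++) {
            if (out[i] != FREE && marked[i]) {
                out[next++] = out[i];
            }
        }
        Arrays.fill(out, next, out.length, FREE);
        return out;
    }

    /** Length of the longest run of free slots, i.e. the largest object that still fits. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6};
        boolean[] marked = {true, false, true, false, true, false};

        int[] swept = sweep(heap, marked);
        int[] compacted = compact(heap, marked);
        System.out.println(Arrays.toString(swept) + " largest free run: " + largestFreeRun(swept));
        System.out.println(Arrays.toString(compacted) + " largest free run: " + largestFreeRun(compacted));
    }
}
```

Both results have three free slots, but after the sweep they are scattered single slots, while after compaction they form one contiguous run that a bump pointer could allocate from.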

Some comparisons of the three algorithms in terms of whether objects are moved, space, and time, assuming that the number of live objects is L and the heap size is H:

1.5 Collectors

Currently, HotSpot VM collectors fall into two main categories, generational collection and partitioned collection, as the figure below shows, and the trend is toward partitioned collection. Within Meituan, ZGC is already used by some businesses (interested readers can refer to the article "Exploration and practice of new generation garbage collector ZGC"), while most of the rest remain on CMS and G1. In addition, Epsilon (a no-op garbage collector), which performs no garbage collection at all, has been available since JDK 11 for performance analysis. Another is Azul’s Zing JVM, whose C4 (Concurrent Continuously Compacting Collector) is also influential in the industry.

1.5.1 Generational collector

ParNew: a multi-threaded collector that uses the copying algorithm and works mainly in the Young region. The number of collection threads can be controlled with the -XX:ParallelGCThreads parameter. The whole process is STW. It is often used in combination with CMS.
CMS: aims for the shortest collection pause time. It uses the mark-sweep algorithm and collects garbage in four steps (initial mark, concurrent mark, remark, concurrent sweep), of which initial mark and remark are STW. It is mostly used on the server side of Internet sites and B/S systems. It was deprecated in JDK 9 and removed in JDK 14.

1.5.2 Partition collector

G1: a server-side garbage collector (currently mainstream) for multi-processor, large-memory environments. It aims for high throughput while meeting garbage collection pause-time targets as far as possible.
ZGC: a low-latency garbage collector introduced in JDK 11 for large-memory, low-latency services. SPECjbb 2015 benchmark tests show a maximum pause time of only 1.68 ms under a 128 GB heap, much better than G1 and CMS.

1.5.3 Common collectors

The most used collectors are CMS and G1, both of which have the concept of generation and the main memory structure is as follows:

1.6 Common Tools

1.6.1 Cli Terminal

Standard terminal tools: jps, jinfo, jstat, jstack, and jmap
Integrated tools: jcmd, vjtools, Arthas, Greys

1.6.2 Visual Interface

Easy: JConsole, JVisualVM, HA, GCHisto, GCViewer
Advanced: MAT, JProfiler

2. GC problem determination

2.1 Determining whether GC has a problem

2.1.1 Evaluation criteria

There are two core metrics for evaluating GC:

Latency: also known as maximum pause time, that is, the longest STW of a single garbage collection. The shorter the better. A higher collection frequency is acceptable to some extent in exchange, and this is the main development direction of GC technology.
Throughput: over the life cycle of the application, GC threads take CPU clock cycles away from the Mutator, so throughput is the percentage of total running time that the Mutator spends doing useful work. For example, if the system runs for 100 minutes and GC takes 1 minute, the system throughput is 99%. A throughput-first collector can accept longer pauses.

At present, the systems of major Internet companies generally pursue low latency to avoid hurting user experience with long GC pauses. The metrics need to be combined with the SLA of the application service. In short: the pause time should not exceed the TP9999 of the application service, and GC throughput should not be less than 99.99%. For example, if the TP9999 of service A is 80 ms and the average GC pause is 30 ms, then the maximum pause time of the service should not exceed 80 ms and the GC frequency should be kept to no more than once every 5 minutes. If this cannot be met, tuning is needed, or more resources for parallel redundancy. (You can stop for a moment and check the minute-level gc.meantime metric on your monitoring platform. If it exceeds 6 ms, the single-machine GC throughput falls below four 9s.)
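The numbers in this section follow from simple arithmetic, sketched below. The class and method names are illustrative; the inputs are the ones quoted in the text (1 minute of GC in 100 minutes, a 99.99% target, and a 30 ms average pause once every 5 minutes).

```java
/** Arithmetic behind the throughput and pause-budget figures. Illustrative names only. */
class GcBudgetSketch {

    /** Throughput: share of total time left to the mutator, as a percentage. */
    static double throughputPercent(double totalMillis, double gcMillis) {
        return (totalMillis - gcMillis) / totalMillis * 100.0;
    }

    /** GC time allowed inside a window if throughput must stay at or above the target. */
    static double gcBudgetMillis(double windowMillis, double targetThroughputPercent) {
        return windowMillis * (100.0 - targetThroughputPercent) / 100.0;
    }

    public static void main(String[] args) {
        double minute = 60_000.0;

        // 100 minutes of running time with 1 minute of GC.
        System.out.printf("throughput: %.2f%%%n", throughputPercent(100 * minute, minute));

        // GC time allowed per minute at a 99.99% target.
        System.out.printf("budget per minute: %.2f ms%n", gcBudgetMillis(minute, 99.99));

        // One 30 ms pause every 5 minutes.
        System.out.printf("30 ms every 5 min: %.2f%%%n", throughputPercent(5 * minute, 30.0));
    }
}
```

The second line of output is where the 6 ms per minute figure comes from, and the third shows why a 30 ms pause once every 5 minutes sits right at the 99.99% boundary.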

2.1.2 Learning GC Logs

After getting the GC log, we can do a simple analysis of the GC situation. Some tools show the distribution of GC causes intuitively; the figure below was drawn with GCEasy. Common GC causes:

System.gc(): manually triggers a GC operation.
CMS: actions during the execution of a CMS GC. Focus on the two STW phases, CMS Initial Mark and CMS Final Remark.
Promotion Failure: the Old region does not have enough space for objects promoted from the Young region (even though the total available memory is large enough).
Concurrent Mode Failure: during a CMS GC run, the Old region does not have enough space for new objects, and the collector degrades, severely affecting GC performance. A case is shown later in this section.
GCLocker Initiated GC: if a GC is needed while a thread is executing in a JNI critical region, GCLocker blocks the GC and prevents other threads from entering the JNI critical region. A GC is triggered when the last thread exits the critical region.

Taking CMS as an example, the log analysis goes as follows. First, identify the six phases of a CMS GC:

1. Initial mark: the application threads are suspended to collect the application's object references. When this phase is complete, the application threads start again.
2. Concurrent mark: all other object references are traversed starting from those collected in the first phase.
3. Concurrent pre-cleanup: picks up the object reference changes made by application threads while phase 2 was running, to update the results of phase 2.
4. Remark: since the third phase is concurrent, object references may change further. The application threads are therefore paused again to apply these changes and ensure a correct view of object references before the actual cleanup. This phase is important because objects that are still referenced must not be collected.
5. Concurrent sweep: all objects that are no longer referenced are removed from the heap.
6. Concurrent reset: the collector does some finishing work so that the next GC cycle starts from a clean state.

Four of these phases (those with Concurrent in the name) run concurrently with the application, while the other two, initial mark and remark, require suspending the application threads (STW).

Analyze the log with the following code and corresponding GC parameters

import java.util.ArrayList;
import java.util.List;

public class GCTest {
    private static final int _10MB = 10 * 1024 * 1024;

    /**
     * @param args unused
     * @throws InterruptedException
     */
    public static void main(String[] args) throws Exception {
        test();
    }

    /**
     * VM args: -Xms100m (initial heap size) -Xmx100m (maximum heap size)
     * -Xmn50m (young generation size)
     * -XX:+PrintGCDetails (print detailed GC logs)
     * -XX:+UseConcMarkSweepGC (use the CMS collector)
     * -XX:+UseParNewGC (parallel GC in the young generation; enabled automatically by UseConcMarkSweepGC)
     * -XX:SurvivorRatio=8 (Eden : from survivor : to survivor = 8:1:1)
     * -XX:MaxTenuringThreshold=1 (number of minor GCs an object survives before promotion to the old generation, default 15)
     * -XX:+PrintTenuringDistribution (print the age distribution of objects in the survivor area)
     * -XX:CMSInitiatingOccupancyFraction=68 (old generation occupancy that triggers the first CMS GC; the default depends on the JDK version)
     * @throws InterruptedException
     */
    public static void test() throws InterruptedException {
        List<byte[]> list = new ArrayList<>();
        // Allocate seven 10 MB arrays and keep them all reachable.
        for (int n = 1; n < 8; n++) {
            byte[] alloc = new byte[_10MB];
            list.add(alloc);
        }
        Thread.sleep(Integer.MAX_VALUE);
    }
}

Obtain logs and analyze them

D:\platform\git\demo>java -Xms100m -Xmx100m -Xmn50m -XX:+PrintGCDetails 
-XX:+UseConcMarkSweepGC -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:+PrintTenuringDistribution GCTest
[GC (Allocation Failure) 
[ParNew Desired survivor size 2621440 bytes, new threshold 1 (max 1)
- age   1:     636784 bytes,     636784 total: 33997K->632K(46080K), 0.0105313 secs] 
33997K->31354K(97280K), 0.0107013 secs]
[Times: user=0.01 sys=0.00, real=0.01 secs]
// Step 1: young GC. When the young generation has no room for new objects, a young GC (also called minor GC) occurs.
// 33997K->632K(46080K), 0.0105313 secs: the young generation goes from 33997K to 632K after the GC.
// Heap size: 33997K->31354K(97280K)
// The young generation uses the copying algorithm and copies surviving objects into the survivor region.
// Objects above a size threshold can be allocated directly in the old generation, which involves the allocation guarantee policy
// (parameter -XX:PretenureSizeThreshold=<value>).
[GC (Allocation Failure) [ParNew: 32961K->32961K(46080K), 0.0001067 secs]
[CMS: 30722K->40960K(51200K), 0.0111639 secs] 63683K->62060K(97280K), 
[Metaspace: 2509K->2509K(1056768K)], 0.0118014 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
// Step 2: the old generation occupancy reaches the threshold, triggering the first old generation GC, namely the CMS GC.
// CMS: 30722K->40960K(51200K): the old generation goes from 30722K to 40960K, total size 51200K.
[GC (CMS Initial Mark) [1 CMS-initial-mark: 40960K(51200K)] 72995K(97280K), 0.0004270 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 3: initial mark
[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 4: concurrent mark
[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 5: concurrent pre-cleanup
[CMS-concurrent-abortable-preclean-start]
[CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[GC (CMS Final Remark) [YG occupancy: 32035 K (46080 K)][Rescan (parallel) , 0.0003930 secs]
[weak refs processing, 0.0000436 secs][class unloading, 0.0001861 secs]
[scrub symbol table, 0.0003188 secs]
[scrub string table, 0.0000946 secs]
[1 CMS-remark: 40960K(51200K)] 72995K(97280K), 0.0013492 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 6: remark
// weak refs processing: handles weak references in the old generation, used to reclaim native memory
// class unloading: reclaims the SystemDictionary
[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 7: concurrent sweep
[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// Step 8: concurrent reset
Heap
 par new generation   total 46080K, used 32445K [0x00000000f9c00000, 0x00000000fce00000, 0x00000000fce00000)
  eden space 40960K,  79% used [0x00000000f9c00000, 0x00000000fbbaf5b0, 0x00000000fc400000)
  from space 5120K,   0% used [0x00000000fc900000, 0x00000000fc900000, 0x00000000fce00000)
  to   space 5120K,   0% used [0x00000000fc400000, 0x00000000fc400000, 0x00000000fc900000)
 concurrent mark-sweep generation total 51200K, used 40960K [0x00000000fce00000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 2515K, capacity 4486K, committed 4864K, reserved 1056768K
  class space    used 269K, capacity 386K, committed 512K, reserved 1048576K

Basically, the format is: size occupied by the region before collection -> size occupied by the region after collection (total size of the region), time taken. Take the most typical part for analysis:

[ParNew: 32961K->32961K(46080K), 0.0001067 secs][CMS: 30722K->40960K(51200K), 0.0111639 secs] 63683K->62060K(97280K),
[Metaspace: 2509K->2509K(1056768K)], 0.0118014 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
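The before->after(total) segments can be pulled out of a log line mechanically. The sketch below is one way to do it with a regular expression. The class name and the pattern are my own and only cover the K-suffixed form that appears in this log, not every GC log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts "<before>K-><after>K(<total>K)" segments from a GC log line. Illustrative only. */
class GcLogSegmentSketch {
    private static final Pattern SEGMENT =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns one {before, after, total} triple (in KB) per segment, in order of appearance. */
    static List<long[]> parse(String line) {
        List<long[]> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(line);
        while (m.find()) {
            segments.add(new long[] {
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3))});
        }
        return segments;
    }

    public static void main(String[] args) {
        String line = "[ParNew: 32961K->32961K(46080K), 0.0001067 secs]"
                + "[CMS: 30722K->40960K(51200K), 0.0111639 secs] 63683K->62060K(97280K), "
                + "[Metaspace: 2509K->2509K(1056768K)], 0.0118014 secs]";
        String[] labels = {"young", "old", "heap", "metaspace"};
        List<long[]> segments = parse(line);
        for (int i = 0; i < segments.size(); i++) {
            long[] s = segments.get(i);
            System.out.printf("%-9s before=%dK after=%dK total=%dK%n", labels[i], s[0], s[1], s[2]);
        }
    }
}
```

For this line the four segments are, in order, the young generation (ParNew), the old generation (CMS), the whole heap, and Metaspace. Note that the old generation grows from 30722K to 40960K while the young generation stays at 32961K, which is what you would expect when objects are promoted rather than freed.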

ParNew shows that the young generation is collected in parallel. The young generation label in the log depends on the collector in use:

Serial collector: DefNew appears when using -XX:+UseSerialGC (both the young and old generations use the serial collector).
Parallel collector: ParNew appears when using either -XX:+UseParNewGC (the young generation uses the parallel collector, the old generation uses the serial collector) or -XX:+UseConcMarkSweepGC (the young generation uses the parallel collector, the old generation uses CMS).
PSYoungGen appears when using -XX:+UseParallelOldGC (both the young and old generations use the parallel collector) or -XX:+UseParallelGC (the young generation uses the parallel collector, the old generation uses the serial collector).
garbage-first heap appears when using -XX:+UseG1GC (G1 collector).

Heap area situation

[Metaspace: 2509K->2509K(1056768K)], 0.0118014 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]

In practice, some special cases occur during execution, such as a concurrent mode failure, which here ends in an OOM because newly created objects exceed the maximum heap memory. The GC log is as follows:

[GC [ParNew: 31539K->496K(46080K), 0.0137601 secs] 31539K->31218K(97280K), 0.0152885 secs] 
[Times: user=0.02 sys=0.00, real=0.02 secs]
[GC [ParNew: 33275K->33275K(46080K), 0.0003866 secs][CMS: 30722K->40960K(51200K), 0.0217084 secs]
63997K->61912K(97280K), [CMS Perm : 2534K->2532K(21248K)], 0.0231226 secs] 
[Times: user=0.02 sys=0.00, real=0.02 secs]
[GC [1 CMS-initial-mark: 40960K(51200K)] 72152K(97280K), 0.0004554 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC [CMS[CMS-concurrent-mark: 0.010/0.011 secs] 
[Times: user=0.00 sys=0.00, real=0.01 secs]
 (concurrent mode failure): 40960K->40960K(51200K), 0.0159043 secs] 72152K->72152K(97280K), [CMS Perm : 2532K->2532K(21248K)], 0.0165056 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
[Full GC [CMS: 40960K->40960K(51200K), 0.0075621 secs] 
72152K->72138K(97280K), [CMS Perm : 2532K->2532K(21248K)], 0.0084089 secs]
[Times: user=0.00 sys=0.00, real=0.01 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at GCTest.test(GCTest.java:19)
        at GCTest.main(GCTest.java:12)

The reason is that during the CMS GC, objects need to be placed in the old generation but the old generation does not have enough space (the allocation guarantee fails). Promotion failed is similar: during a Minor GC (Young GC), the survivor spaces cannot hold the objects, so they can only be placed in the old generation, which does not have enough space either.

2.2 My GC troubleshooting process

2.2.1 GC stack information

First, use the jstat command (which shows virtual machine runtime information) to look at the GC statistics of the Java process:

jstat -gcutil 26819
S0    S1      E     O      M      CCS   YGC YGCT   FGC  FGCT       GCT
0.00 43.75 0.00 42.22 67.19 50.93 4955 30.970 4890 3505.049 3536.020

You can see that M (Metaspace utilization) is 67.19% and O (old generation utilization) is 42.22%.

top -H -p 26819
26821 appdev 20 0 6864m 1.2g 13m R 87.6 7.5 53:40.18 java
26822 appdev 20 0 6864m 1.2g 13m R 87.6 7.5 53:41.40 java
26823 appdev 20 0 6864m 1.2g 13m R 87.6 7.5 53:43.64 java
26824 appdev 20 0 6864m 1.2g 13m R 85.6 7.5 53:41.59 java
26825 appdev 20 0 6864m 1.2g 13m R 85.6 7.5 53:43.82 java
26826 appdev 20 0 6864m 1.2g 13m R 85.6 7.5 53:40.47 java
26827 appdev 20 0 6864m 1.2g 13m R 85.6 7.5 53:45.05 java
26828 appdev 20 0 6864m 1.2g 13m R 83.6 7.5 53:39.08 java

Threads 26821 to 26828 have very high CPU usage. Convert 26821 to hexadecimal, which gives 68c5, and look it up in the jstack output.
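The conversion is needed because top reports thread ids in decimal while jstack prints them as hexadecimal nid values. A small sketch of the mapping for the thread ids listed by top (the class and method names are illustrative):

```java
/** Converts the decimal thread ids shown by top into the nid form printed by jstack. */
class ThreadIdHexSketch {

    static String toNid(int threadId) {
        return "0x" + Integer.toHexString(threadId);
    }

    public static void main(String[] args) {
        for (int id = 26821; id <= 26828; id++) {
            System.out.println(id + " -> nid=" + toNid(id));
        }
    }
}
```

The first line printed is 26821 -> nid=0x68c5 and the last is 26828 -> nid=0x68cc, which is the range to search for in the thread dump.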

jstack 26819 > 26819.text
vim 26819.text
Then search for 68c5 to 68cc:
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f0aa401e000 nid=0x68c5 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f0aa4020000 nid=0x68c6 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f0aa4021800 nid=0x68c7 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f0aa4023800 nid=0x68c8 runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f0aa4025800 nid=0x68c9 runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f0aa4027000 nid=0x68ca runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f0aa4029000 nid=0x68cb runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f0aa402a800 nid=0x68cc runnable

The GC threads are executing continuously and occupying a lot of CPU. This indicates that the condition for Full GC keeps being met but the memory cannot be reclaimed, so the GC consumes a large amount of CPU and the program becomes unavailable. The startup parameters are as follows:

-Xms1000m -Xmx1000m -XX:MaxNewSize=256m -XX:ThreadStackSize=256 -XX:MetaspaceSize=38m -XX:MaxMetaspaceSize=380m

Now analyze the logic of the program: it is a public service that loads many JARs into memory. Many colleagues upload JARs, and the program loads each JAR through a ClassLoader (which lives in the heap and is cleaned up during Full GC) for analysis and saving.

2.2.2 Problem Analysis

According to the Full GC trigger condition for Metaspace in JDK 8, with an initial MetaspaceSize of 38m the first GC is performed when the loaded class metadata first reaches 38m (in JDK 8, both G1 and CMS collect Metaspace well, usually together with a Full GC). After that GC, the JVM adjusts the MetaspaceSize threshold dynamically.

Like the permanent generation, Metaspace is an implementation of the method area in the JVM specification. The biggest difference is that Metaspace is not inside the virtual machine's memory but uses native memory. By default, therefore, its size is limited only by native memory, but you can specify its size with the following parameters:

In JDK 8, classes metadata is now stored in the native heap and this space is called Metaspace. There are some new flags added for Metaspace in JDK 8:
-XX:MetaspaceSize=<NNN> where <NNN> is the initial amount of space (the initial high-water-mark) allocated for class metadata (in bytes) that may induce a garbage collection to unload classes. The amount is approximate. After the high-water-mark is first reached, the next high-water-mark is managed by the garbage collector.
-XX:MaxMetaspaceSize=<NNN> where <NNN> is the maximum amount of space to be allocated for class metadata (in bytes). This flag can be used to limit the amount of space allocated for class metadata. This value is approximate. By default there is no limit set.
-XX:MinMetaspaceFreeRatio=<NNN> where <NNN> is the minimum percentage of class metadata capacity free after a GC to avoid an increase in the amount of space (high-water-mark) allocated for class metadata that will induce a garbage collection.
-XX:MaxMetaspaceFreeRatio=<NNN> where <NNN> is the maximum percentage of class metadata capacity free after a GC to avoid a reduction in the amount of space (high-water-mark) allocated for class metadata that will induce a garbage collection.

Translation

-XX:MetaspaceSize: the initial size of the space. Reaching it triggers a garbage collection to unload classes, and the GC then adjusts the value: if a lot of space was freed, the value is lowered appropriately; if very little was freed, the value is raised appropriately, but never beyond MaxMetaspaceSize.
-XX:MaxMetaspaceSize: the maximum size of the space, unlimited by default.
In addition to these two size options, there are two GC-related options:
-XX:MinMetaspaceFreeRatio: the minimum percentage of Metaspace capacity that should be free after a GC, used to reduce garbage collections caused by allocating space.
-XX:MaxMetaspaceFreeRatio: the maximum percentage of Metaspace capacity that should be free after a GC, used to reduce garbage collections caused by freeing space.

By default class metadata allocation is only limited by the amount of available native memory. We can use the new option MaxMetaspaceSize to limit the amount of native memory used for the class metadata. It is analogous to MaxPermSize. A garbage collection is induced to collect the dead classloaders and classes when the class metadata usage reaches MetaspaceSize (12Mbytes on the 32bit client VM and 16Mbytes on the 32bit server VM with larger sizes on the 64bit VMs). Set MetaspaceSize to a higher value to delay the induced garbage collections. After an induced garbage collection, the class metadata usage needed to induce the next garbage collection may be increased.

According to the description:

1. A GC is triggered when metadata usage reaches MetaspaceSize (the default MetaspaceSize is 20.8m on a 64-bit server VM).

2. -XX:MinMetaspaceFreeRatio (default 40) is used to avoid triggering a GC because the next request for free metadata space is larger than the free space currently available. For example, when Metaspace usage reaches an initial threshold of 6m, an expansion is attempted (MinMetaspaceExpansion and MaxMetaspaceExpansion have already been tried and still failed). After the GC, because very little memory was reclaimed, the JVM computes (memory to be committed [unused]) / (memory to be committed + committed memory) == 40%. If (memory to be committed + committed memory) is greater than MetaspaceSize, the JVM tries to raise the threshold that triggers a Metaspace GC. The increment must be at least MinMetaspaceExpansion, otherwise the threshold is not raised. This parameter mainly avoids the situation where the Metaspace GC threshold and the amount committed after GC are too close: the threshold is enlarged to reduce the probability of the next GC.

3. For the same reason, -XX:MaxMetaspaceFreeRatio (default 70) is used to avoid a GC when the next request for free metadata space is much smaller than the free space currently available. Its main purpose is to reduce unnecessary memory footprint.
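The effect of the two ratios on the threshold can be illustrated with a deliberately simplified model. This is a sketch under my own assumptions, not HotSpot's actual implementation: the method name, the use of KB units, the integer arithmetic, and the immediate (undamped) shrink are all simplifications, and both ratios are assumed to be below 100.

```java
/** Simplified model of adjusting the Metaspace GC threshold after a GC. Not HotSpot code. */
class MetaspaceThresholdSketch {

    /**
     * @param usedAfterGc      metadata still in use after the GC (KB)
     * @param currentThreshold current high-water mark that triggers a Metaspace GC (KB)
     * @param minFreeRatio     MinMetaspaceFreeRatio as a percentage, assumed below 100
     * @param maxFreeRatio     MaxMetaspaceFreeRatio as a percentage, assumed below 100
     * @param minExpansion     smallest increment worth applying (KB), standing in for MinMetaspaceExpansion
     * @return the threshold to use for the next cycle (KB)
     */
    static long nextThreshold(long usedAfterGc, long currentThreshold,
                              int minFreeRatio, int maxFreeRatio, long minExpansion) {
        // Smallest capacity that still leaves minFreeRatio percent free.
        long minDesired = usedAfterGc * 100 / (100 - minFreeRatio);
        if (currentThreshold < minDesired) {
            long expand = minDesired - currentThreshold;
            // Increments below the minimum expansion are skipped.
            return expand >= minExpansion ? currentThreshold + expand : currentThreshold;
        }
        // Largest capacity that leaves at most maxFreeRatio percent free.
        long maxDesired = usedAfterGc * 100 / (100 - maxFreeRatio);
        if (currentThreshold > maxDesired) {
            return maxDesired; // too much headroom: lower the threshold
        }
        return currentThreshold;
    }

    public static void main(String[] args) {
        long minExpansion = 332; // hypothetical value in KB

        // Little was reclaimed: 36000K still used against a 38m (38912K) threshold.
        System.out.println(nextThreshold(36_000, 38_912, 40, 70, minExpansion)); // 60000

        // A lot was reclaimed: 10000K used against a 60000K threshold.
        System.out.println(nextThreshold(10_000, 60_000, 40, 70, minExpansion)); // 33333

        // Free space already between 40% and 70%: threshold unchanged.
        System.out.println(nextThreshold(30_000, 60_000, 40, 70, minExpansion)); // 60000
    }
}
```

In the first case the GC barely frees anything, so the threshold jumps well above current usage to avoid an immediate repeat GC. When class metadata keeps growing and is never unloaded, this keeps happening until MaxMetaspaceSize is reached, at which point the threshold can no longer be raised.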

Full GC caused by Metaspace in JDK 8

-XX:MaxMetaspaceSize plays the same role as the previous MaxPermSize, and setting -XX:MaxMetaspaceSize can cause the same OOM problems as MaxPermSize did.

We can achieve the famed OOM error by setting the MaxMetaspaceSize argument to JVM and running the sample program provided.

Default initial MetaspaceSize: 12 MB on the 32-bit client VM and 16 MB on the 32-bit server VM, with larger sizes on 64-bit VMs. We can set the initial size we want with -XX:MetaspaceSize. Setting a larger size delays the first Full GC.

P.S.: The process ID below differs from the one above.

jstat -gc 1706

S0C	S1C	S0U	S1U	EC	EU	OC	OU	MC	MU	CCSC	CCSU	YGC	YGCT	FGC	FGCT	GCT
31744.0	32768.0	0.0	21603.6	195584.0	192805.8	761856.0	384823.3	467712.0	309814.3	65536.0	36929.1	101	2.887	3	1.224	4.112

Analysis: MC is the committed memory and MU is the memory currently in use. One open question is whether MC is the total memory that Metaspace has already used, since this value has reached MaxMetaspaceSize, and why MU differs from MC. My guess is memory fragmentation. Readers who know the answer are welcome to tell me.
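To compare MU against MC (or OU against OC) without reading columns by eye, the header and value rows can be paired up in a few lines. This is a sketch with names of my own choosing. It assumes a header row and a single value row separated by whitespace, as in the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Pairs a jstat -gc header row with its value row. Illustrative names only. */
class JstatGcSketch {

    static Map<String, Double> parse(String header, String values) {
        String[] names = header.trim().split("\\s+");
        String[] numbers = values.trim().split("\\s+");
        if (names.length != numbers.length) {
            throw new IllegalArgumentException("column count mismatch");
        }
        Map<String, Double> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], Double.parseDouble(numbers[i]));
        }
        return row;
    }

    static double percent(double used, double capacity) {
        return used / capacity * 100.0;
    }

    public static void main(String[] args) {
        String header = "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT";
        String values = "31744.0 32768.0 0.0 21603.6 195584.0 192805.8 761856.0 384823.3 "
                + "467712.0 309814.3 65536.0 36929.1 101 2.887 3 1.224 4.112";
        Map<String, Double> row = parse(header, values);

        System.out.printf("Metaspace used/committed: %.2f%%%n", percent(row.get("MU"), row.get("MC")));
        System.out.printf("Old gen used/capacity:    %.2f%%%n", percent(row.get("OU"), row.get("OC")));
        System.out.printf("Average Full GC time:     %.3f s%n", row.get("FGCT") / row.get("FGC"));
    }
}
```

For the row above this gives roughly 66% for Metaspace and 51% for the old generation, so about a third of the committed Metaspace is not counted as used.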

When MaxMetaspaceSize was reached, Full GC was performed three times. After that, memory kept being requested and Full GCs kept running, but they could not reclaim the memory, so the Full GC frequency rose sharply. Checking the CPU with top -H -p 1706 and then jstack shows many high-CPU threads performing Full GC.

jmap -clstats 1706

The first: total = 131 8016 13892091 N/A alive=45, dead=86 N/A

The second: total = 1345 37619 77242171 N/A alive=1170, dead=175 N/A
The alive classloaders are the ones the program creates itself, and no GC run reclaims these ClassLoaders.

A class in the VM can be collected by GC (that is, unloaded) only if it meets the following three conditions:

All instances of the class have been collected, meaning that no instance of the class exists in the JVM.
The ClassLoader that loaded the class has been collected.
The java.lang.Class object of the class is not referenced anywhere, for example its methods cannot be accessed through reflection anywhere.

jcmd 1706 GC.class_stats | awk '{print $13}' | sort | uniq -c | sort -nrk1 > topclass.txt

In conclusion, the custom classloaders load the same classes repeatedly and the number keeps growing. The output shows a large number of duplicate classes.
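The pattern can be reproduced in miniature. The sketch below is not the service's code: it stands in for "one new ClassLoader per uploaded JAR" by creating a fresh anonymous ClassLoader per iteration and defining a dynamic proxy class in it. All names are hypothetical. Because the loaders are kept in a list, neither they nor their classes can be unloaded.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Each new loader gets its own copy of the "same" class. Illustrative only. */
class LoaderLeakSketch {

    /** A fresh loader per call, standing in for one loader per uploaded JAR. */
    static ClassLoader newLoader() {
        return new ClassLoader(LoaderLeakSketch.class.getClassLoader()) { };
    }

    /** Defines a proxy class for Runnable inside the given loader and returns it. */
    static Class<?> defineClassIn(ClassLoader loader) {
        InvocationHandler noop = (proxy, method, args) -> null;
        Object instance = Proxy.newProxyInstance(loader, new Class<?>[] {Runnable.class}, noop);
        return instance.getClass();
    }

    /** Creates the given number of loaders, retains them, and counts the distinct classes defined. */
    static int distinctClasses(int loaders, List<ClassLoader> retained) {
        Set<Class<?>> classes = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < loaders; i++) {
            ClassLoader loader = newLoader();
            retained.add(loader); // the retained reference is what prevents unloading
            classes.add(defineClassIn(loader));
        }
        return classes.size();
    }

    public static void main(String[] args) {
        List<ClassLoader> retained = new ArrayList<>();
        int count = distinctClasses(100, retained);
        System.out.println(retained.size() + " loaders -> " + count + " distinct classes");
    }
}
```

Every loader ends up with its own class even though the definition is identical, so class metadata grows with the number of loaders. Reusing one loader, or dropping all references to a loader once its classes are no longer needed, removes the growth.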

2.2.3 GC Log Analysis

[Heap Dump (before Full GC):, 0.4032181 secs]2018-01-10T16:37:44.658+0800: 21.673: [Full GC (Metadata GC Threshold) [PSYoungGen: 14337K->0K(235520K)] [ParOldGen: 18787K->30930K(761856K)] 33125K->30930K(997376K), [Metaspace: 37827K->37827K(1083392K)], 0.1360661 secs]
Here [Metaspace: 37827K->37827K(1083392K)] reaches our initial value of 38m, and the GC does not reclaim any memory. The value 1083392K is suspected to come from CompressedClassSpaceSize = 1073741824 (1024.0MB).
[Heap Dump (before Full GC):, 5.3642805 secs]2018-01-10T16:53:43.811+0800: 980.825: [Full GC (Metadata GC Threshold) [PSYoungGen: 21613K->0K(231424K)] [ParOldGen: 390439K->400478K(761856K)] 412053K->400478K(993280K), [Metaspace: 314108K->313262K(1458176K)], 1.2320834 secs] [Times: user=7.86 sys=0.06, real=1.23 secs]
Here [Metaspace: 314108K->313262K(1458176K)] reached our specified MinMetaspaceFreeRatio, and the GC reclaimed almost no memory. The value 1458176K is suspected to be a combination of CompressedClassSpaceSize = 1073741824 (1024.0MB) and MaxMetaspaceSize = 503316480 (480.0MB). This is followed by very frequent repeated Full GCs.

With the above foundation, the solution to this problem becomes clear. ClassLoaders keep being created and keep loading classes, and the earlier ClassLoaders and classes are not reclaimed at Full GC. The fixes are:

In the program, avoid creating duplicate ClassLoaders and reduce the number of ClassLoaders created.
Increase -XX:MinMetaspaceFreeRatio (default 40); the Metaspace utilization observed above is 67.19%.
Set a larger MaxMetaspaceSize.

Extended learning

1. System.gc() is called. Calling System.gc() only suggests a Full GC to the JVM, but in many cases it does trigger one, which increases the Full GC frequency and thus the number of intermittent pauses. It is strongly recommended not to call this method and to let the VM manage its own memory. You can pass -XX:+DisableExplicitGC to prevent RMI from calling System.gc().

2. Insufficient space in the old generation (Old/Tenured). The old generation runs short of space only when objects are promoted from the young generation or when large objects or large arrays are created. If space is still insufficient after a Full GC, the JVM throws java.lang.OutOfMemoryError. To avoid Full GC caused by these two situations, tune the application so that objects are collected during Minor GC, let objects survive longer in the young generation, and avoid creating overly large objects and arrays.

3. Insufficient permanent generation space (PermGen in JDK 7 and earlier, Metaspace in JDK 8). The method area of the runtime data areas in the JVM specification is commonly called the permanent generation in the HotSpot virtual machine. The permanent generation stores class information, constants, static variables, and similar data. When the system loads many classes or uses many reflective classes and method calls, the permanent generation may fill up, and a Full GC is performed if CMS GC is not configured. If the Full GC still cannot reclaim enough space, the JVM throws java.lang.OutOfMemoryError. To avoid Full GC caused by a full PermGen, increase the PermGen space or switch to CMS GC.

4. CMS GC reports promotion failed or concurrent mode failure. Watch the GC logs for these two messages in particular, because either one may trigger a Full GC. Promotion failed occurs when, during a Minor GC, the survivor space cannot hold the surviving objects, so they must be placed in the old generation, and the old generation cannot hold them either. Concurrent mode failure occurs when objects need to be placed in the old generation while a CMS GC is running and the old generation does not have enough space (sometimes the shortage is caused by too much floating garbage during the CMS GC, which temporarily exhausts the space and triggers a Full GC). The countermeasures are to increase the survivor space or the old generation space, or to lower the occupancy at which concurrent GC starts (-XX:CMSInitiatingOccupancyFraction=70 starts it at 70% occupancy). However, in JDK 5.0+ and 6.0+, JDK bug29 may cause CMS to start the sweeping phase long after the remark phase has finished. This situation can be avoided by setting -XX:CMSMaxAbortablePrecleanTime=5 (in ms).

5. The average size promoted to the old generation by Minor GC (Eden to S2 and S1 to S2) is greater than the remaining space in the old generation. This is a more complicated trigger. To avoid a promotion failure, HotSpot checks before a Minor GC whether the average size promoted to the old generation in previous Minor GCs is greater than the remaining space in the old generation, and if so it triggers a Full GC directly. For example, if 6MB of objects were promoted to the old generation by the first Minor GC, then when the next Minor GC occurs the JVM first checks whether the old generation has more than 6MB free, and performs a Full GC if it has less. With the PS GC the check is also made after the Minor GC: in this example, after the first Minor GC the PS GC checks whether the remaining space in the old generation is greater than 6MB, and triggers an old generation collection if it is less.

In addition to the cases above, Sun JDK applications that use RMI for RPC or management perform a Full GC once an hour by default. The interval can be changed at startup with -Dsun.rmi.dgc.client.gcInterval=3600000, or the System.gc() calls made by RMI can be disabled with -XX:+DisableExplicitGC.
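The average-promotion check can be illustrated with a few lines of arithmetic. This is a deliberately simplified model: it uses a plain arithmetic mean, which is an assumption made for readability and not how HotSpot computes its statistic, and the class PromotionGuard is hypothetical.

```java
/** Simplified, hypothetical model of the "average promoted size vs. old generation free space" check. */
final class PromotionGuard {
    private long totalPromotedBytes;
    private long minorGcCount;

    /** Records how many bytes a Minor GC promoted to the old generation. */
    void recordMinorGc(long promotedBytes) {
        totalPromotedBytes += promotedBytes;
        minorGcCount++;
    }

    /** Plain arithmetic mean of the recorded promotions (0 when nothing was recorded). */
    long averagePromotedBytes() {
        return minorGcCount == 0 ? 0 : totalPromotedBytes / minorGcCount;
    }

    /** True when the average promotion would not fit into the old generation's free space. */
    boolean shouldRunFullGc(long oldGenFreeBytes) {
        return averagePromotedBytes() > oldGenFreeBytes;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        PromotionGuard guard = new PromotionGuard();
        guard.recordMinorGc(6 * mb); // first Minor GC promoted 6MB, as in the example above
        System.out.println("5MB free, Full GC needed: " + guard.shouldRunFullGc(5 * mb));
        System.out.println("8MB free, Full GC needed: " + guard.shouldRunFullGc(8 * mb));
    }
}
```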

6. Large objects are allocated on the heap. A large object is a Java object that needs a large amount of contiguous memory, such as a long array. Such an object goes straight to the old generation. If the old generation has plenty of free space overall but no contiguous block large enough for the object, the JVM is forced to run a Full GC. To address this, the CMS collector provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation pass after a Full GC. This removes the fragmentation, but the pause time becomes longer. The JVM designers therefore also provide -XX:CMSFullGCsBeforeCompaction, which sets how many Full GCs are performed without compaction before one with compaction is run.
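The interaction between the two parameters can be pictured as a simple counter. The sketch below models only the behavior described in the paragraph above (N Full GCs without compaction, then one with compaction); it is not taken from the HotSpot source, and the class CompactionSchedule is hypothetical.

```java
/** Hypothetical counter model of -XX:CMSFullGCsBeforeCompaction as described in the text. */
final class CompactionSchedule {
    private final int fullGcsBeforeCompaction;
    private int uncompactedSinceLastCompaction;

    CompactionSchedule(int fullGcsBeforeCompaction) {
        if (fullGcsBeforeCompaction < 0) {
            throw new IllegalArgumentException("value must not be negative");
        }
        this.fullGcsBeforeCompaction = fullGcsBeforeCompaction;
    }

    /** Call once per Full GC; returns true when this Full GC would also compact in this model. */
    boolean onFullGc() {
        if (uncompactedSinceLastCompaction >= fullGcsBeforeCompaction) {
            uncompactedSinceLastCompaction = 0;
            return true;
        }
        uncompactedSinceLastCompaction++;
        return false;
    }

    public static void main(String[] args) {
        CompactionSchedule schedule = new CompactionSchedule(2);
        for (int i = 1; i <= 6; i++) {
            System.out.println("Full GC #" + i + " compacts: " + schedule.onFullGc());
        }
    }
}
```

With a value of 2 in this model, every third Full GC compacts. With a value of 0, every Full GC compacts, which trades longer pauses for less fragmentation.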