This article analyzes in detail how the ParNew + CMS collector pair lays out memory according to generational collection theory, then walks through the implementation to find where unnecessary GC can be triggered and derives optimization ideas from that. Because the article is already long, some concepts are not expanded; feel free to leave a comment with questions ~

1. GC

1.1 Generation collection theory

The theory of generational collection is based on three hypotheses:

  • Weak generational hypothesis: most objects die young -> hence the young generation;
  • Strong generational hypothesis: the more garbage collections an object has survived, the less likely it is to die -> hence the old generation and the promotion mechanism from young to old;
  • Intergenerational reference hypothesis: cross-generation references are rare compared with same-generation references -> hence remembered sets.

Therefore, a garbage collector built on generational theory divides the Heap area of the JVM process's memory space into a young generation and an old generation, applies different collection algorithms based on the characteristics of the objects in each, and subdivides the memory areas accordingly.

1.2 Basic Concepts:

1.2.1 GC classification and trigger timing

  • Young/Minor GC: garbage collection that occurs in the young-generation memory region; a Minor GC is generally triggered when the young generation no longer has room to allocate new objects.

  • Major/Old GC: garbage collection that occurs in the old-generation memory region; currently only CMS has a standalone old-generation collection. By default it is triggered when the used space of the old generation reaches 92%, or when memory fragmentation prevents young-generation objects from being promoted normally.

  • Mixed GC: garbage collection covering the whole young generation and part of the old generation; currently only available in the G1 collector.

  • Full GC: unlike the three partial collections above, a Full GC collects the entire Java heap as well as the method area. In general, a Full GC is triggered when the method area is full or when the young-generation allocation guarantee fails. This situation is analyzed in detail below.

  • Safe points (when to enter GC): safe points are generally placed at code with long-running characteristics, such as loop back-edges, exception jumps, and method calls. Instead of pausing threads anywhere in the instruction stream, the garbage collector lets all threads run to the nearest safe point before pausing them to start collection. HotSpot uses active interruption: when a collection is about to start it sets a flag; each thread polls this flag whenever it reaches a safe point and suspends there if the flag is set.

1.2.2 Object Allocation Policy

  • New objects are preferentially allocated in the young generation, so the TLAB regions live in the young generation;
  • If a new object is larger than the size specified by -XX:PretenureSizeThreshold, it is allocated directly in the old generation;
  • A young-generation object whose age exceeds -XX:MaxTenuringThreshold is moved from the young generation to the old generation; the object's age is stored in the Mark Word of its object header.
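A minimal sketch of how these allocation rules can be observed (the flag values, class name, and the 4 MB array size are illustrative assumptions; note that PretenureSizeThreshold is only honored by the Serial/ParNew young-generation collectors):

import java.util.ArrayList;
import java.util.List;

// Run with (illustrative values):
// -Xms64m -Xmx64m -Xmn32m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
// -XX:PretenureSizeThreshold=3145728 -XX:MaxTenuringThreshold=15
public class AllocationPolicyDemo {
    public static void main(String[] args) {
        // Small objects: allocated in Eden (via a TLAB) first.
        List<byte[]> small = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            small.add(new byte[512 * 1024]); // 512 KB each
        }
        // A 4 MB object exceeds the 3 MB PretenureSizeThreshold above,
        // so it should be allocated directly in the old generation.
        byte[] big = new byte[4 * 1024 * 1024];
        System.out.println("allocated " + small.size()
                + " small blocks and one big block of " + big.length + " bytes");
    }
}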

1.3 General steps for GC

Currently, mainstream garbage collectors adopt the reachability analysis algorithm, so the general steps of garbage collection are as follows:

  1. Collect GC Roots, including references held by class static fields, references held by constants in the method area, and references in the virtual machine stacks and native stacks. For a partial collection, objects referenced across generations (recorded in the remembered set) must also be added.
  2. Mark live objects: starting from the GC Roots, traverse the object reference graph and mark every node reachable from the roots as live, usually via the tricolor marking (tricolor abstraction) described below;
  3. Collect garbage objects in the regions covered by the triggered GC type;
  4. Reset the data structures associated with the collection algorithm in preparation for the next round of garbage collection.
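As an illustration of steps 1 and 2, here is a minimal, single-threaded sketch of reachability marking from the GC Roots (the ObjectNode type and the iterative traversal are simplifications invented for illustration, not HotSpot's actual implementation):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified object graph node: its outgoing references.
class ObjectNode {
    List<ObjectNode> references;
    ObjectNode(List<ObjectNode> references) { this.references = references; }
}

public class MarkPhaseSketch {
    // Everything reachable from the GC Roots is "live"; everything else is garbage.
    static Set<ObjectNode> mark(List<ObjectNode> gcRoots) {
        Set<ObjectNode> marked = new HashSet<>();               // visited objects
        Deque<ObjectNode> pending = new ArrayDeque<>(gcRoots);  // "gray" objects
        while (!pending.isEmpty()) {
            ObjectNode current = pending.pop();
            if (!marked.add(current)) continue;                  // already scanned
            if (current.references == null) continue;
            for (ObjectNode ref : current.references) {
                if (!marked.contains(ref)) pending.push(ref);    // white -> gray
            }
        }
        return marked; // anything not in this set can be reclaimed
    }
}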

All garbage collectors perform these four steps but implement them differently: some run GC threads concurrently with user threads, some run multiple GC threads in parallel, and so on. The common goal is to reduce the Stop The World time caused by garbage collection.

1.3.1 Procedure for marking live objects

Collectors such as Serial and ParNew simply wait until all threads reach a safe point and then mark with the world stopped. CMS and G1 instead adopt concurrent marking, but during concurrent marking the reference relationships between objects keep changing: a node that was only indirectly reachable from the GC Roots may have its path broken or rebuilt in the middle of marking. The resulting concurrency problems can be analyzed with the tricolor marking algorithm.

Tricolor marking

Tricolor marking means marking the state of each object with a different color:

  • Black: the object has been visited by the garbage collector and all of the references it holds have been scanned;
  • White: the object has not yet been visited by the garbage collector; at the start of concurrent marking, all objects in the target region are obviously white;
  • Gray: the object has been visited by the garbage collector, but not all of the references it holds have been scanned yet.

The collector's efficiency is also optimized based on these tricolor properties: if a reference being traversed from a gray object points to a black object, there is no need to scan that black node again, since all of its references have already been scanned; this also prevents the recursive marking from entering an infinite loop.

But this optimization raises two problems:

  • During concurrent marking, a node already marked black gains a new reference to a white object; because the collector will not traverse the black node again, the white node is never marked live, and the collector reclaims an object that should survive.
  • During concurrent marking, a gray node currently being scanned loses a reference to a not-yet-scanned white object, and every other direct or indirect path from gray nodes to that white object is also cut, so the white object will never be traversed. On its own this is harmless, but if a black object still holds a reference to that object, the result is again an incorrect reclamation.

Wilson showed that an object which should survive can only be wrongly reclaimed when both of these conditions hold at the same time, so breaking either one is enough to prevent the problem. The two solutions are incremental update and snapshot at the beginning (the "original snapshot").

  • Incremental update: breaks the first condition. When a black object gains a new reference to a white object, the newly inserted reference is recorded; after concurrent marking ends, the black objects in these records are used as roots and re-scanned during the final remark phase. CMS adopts this strategy, and the remark phase requires an STW pause.

  • Original snapshot (snapshot at the beginning, SATB): breaks the second condition. When a gray object is about to delete a reference to a white object, the reference to be deleted is recorded; after concurrent marking ends, the gray objects in these records are used as roots and re-scanned. G1 and Shenandoah use SATB.
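Both strategies are usually implemented with write barriers. The sketch below only illustrates the idea; the Node type, the queues, and the barrier methods are invented for illustration, while real collectors emit these barriers in compiled code:

import java.util.ArrayDeque;
import java.util.Queue;

class Node {
    Node child;
}

class WriteBarriers {
    // Incremental update (CMS): record the NEWLY INSERTED reference so the
    // recorded targets are re-scanned during the remark (STW) phase.
    static final Queue<Node> insertedRefs = new ArrayDeque<>();
    static void incrementalUpdateWrite(Node holder, Node newValue) {
        if (newValue != null) insertedRefs.add(newValue); // remember the insertion
        holder.child = newValue;
    }

    // SATB (G1, Shenandoah): record the reference ABOUT TO BE DELETED so the
    // old target is still treated as live in this marking cycle.
    static final Queue<Node> deletedRefs = new ArrayDeque<>();
    static void satbWrite(Node holder, Node newValue) {
        if (holder.child != null) deletedRefs.add(holder.child); // remember the old value
        holder.child = newValue;
    }
}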

1.3.2 Garbage object collection process

Whichever of the generational collection algorithms a collector uses, that is, mark-sweep, mark-copy, or mark-compact, they all require the marking already done in step 2 above. In addition, each collector plans the heap layout according to the algorithm it uses; the concrete layout is described in the next sections on the young-generation and old-generation collection strategies.

1.4 Young generation collection strategy

Because most objects in the young generation die quickly, the cost of the mark-copy algorithm is proportional to the number of objects that survive a young GC. After a mark-copy collection the memory region has no fragmentation, so new objects can be allocated by simple pointer bumping. For these reasons the mark-copy algorithm is used for young-generation collection.

Fenichel originally proposed the semispace copying algorithm: the young generation is divided into two equally sized halves, only one of which is in use at a time; when that half fills up, a Minor GC copies the surviving objects into the other half.

1.4.1 Appel Collection

Semispace copying halves the usable young-generation memory, which wastes a lot of space. But according to IBM's research, 98% of young-generation objects do not survive their first collection, so there is clearly no need to split the space in half.

Therefore Andrew Appel refined semispace copying: the young generation is divided into one large Eden and two small Survivor spaces; only Eden and one Survivor are used for allocation at a time. When garbage collection occurs, the surviving objects of those two regions are copied into the other Survivor, and the two regions used for allocation are emptied. This is the Young Generation layout shown in the figure below:

Young-generation collectors such as Serial and ParNew subdivide young-generation memory this way based on Appel-style collection. By default the ratio of Eden to each Survivor is 8:1, and it can be customized with the VM option -XX:SurvivorRatio.
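For example, with -Xmn32m and the default -XX:SurvivorRatio=8, the young generation is split 8:1:1 between Eden and the two Survivors; the small calculation below sketches the resulting sizes (the values are illustrative):

public class SurvivorRatioMath {
    public static void main(String[] args) {
        long youngGen = 32L * 1024 * 1024;   // -Xmn32m
        int survivorRatio = 8;               // -XX:SurvivorRatio=8 (default)
        long survivor = youngGen / (survivorRatio + 2); // two Survivor spaces
        long eden = survivor * survivorRatio;
        // Only Eden + one Survivor are usable for allocation at any time (~90%).
        System.out.printf("Eden = %d MB, each Survivor = %d MB, usable = %d MB%n",
                eden >> 20, survivor >> 20, (eden + survivor) >> 20);
    }
}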

1.4.2 Cost of Appel-style collection: the allocation guarantee

However, with Appel-style collection it can happen that more than 10% of objects survive a Minor GC and the remaining Survivor cannot hold them all, so an allocation guarantee mechanism has to be introduced.

The allocation guarantee mechanism has two cases:

  • If the largest continuous free space in the old generation is larger than the total size of all young-generation objects, the collector can safely perform a Young GC.
  • Otherwise, if the largest continuous free space in the old generation is larger than the average size of objects promoted by previous Young GCs, the collector can attempt a Young GC and accept the risk that the guarantee fails. If the old generation does not have enough memory, only a Full GC can be triggered for a global collection, and that is when the cost becomes high.

Therefore, if the guarantee mechanism frequently triggers Full GC, you can adjust -XX:SurvivorRatio to enlarge the Survivor spaces (a smaller ratio value means larger Survivors) and reduce the Full GC frequency to improve performance. Examples are given in the practice section below.

1.4.3 Dynamic age determination

The designers of the garbage collector were also aware of this Full GC problem, so they proposed dynamic (promotion) age determination: the age at which objects are promoted from the young to the old generation is no longer decided solely by -XX:MaxTenuringThreshold. If the total size of all objects of a given age in the To Survivor space is greater than half of the Survivor space, then objects with age greater than or equal to that age are promoted directly to the old generation.

In this way, when objects of the same age occupy most of a Survivor space they can be moved to the old generation early, relieving the pressure on the Survivor space; otherwise an overly large -XX:MaxTenuringThreshold would pile objects up in the Survivor space and cause the allocation guarantee mechanism to be triggered frequently. A sketch of the rule follows.
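A conceptual version of the dynamic tenuring rule (the data structures and values are invented for illustration; HotSpot computes this from its internal age table after each copy):

public class DynamicTenuring {
    // Find the smallest age whose cumulative object size exceeds half of a
    // Survivor space; objects at or above that age are promoted.
    static int dynamicThreshold(long[] bytesPerAge, long survivorBytes, int maxTenuringThreshold) {
        long total = 0;
        for (int age = 0; age < bytesPerAge.length; age++) {
            total += bytesPerAge[age];
            if (total > survivorBytes / 2) {
                return Math.min(age, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold; // fall back to -XX:MaxTenuringThreshold
    }

    public static void main(String[] args) {
        long[] bytesPerAge = {2 << 20, 1 << 20, 512 << 10}; // 2 MB at age 0, 1 MB at age 1, 0.5 MB at age 2
        // With a 4 MB Survivor, ages 0+1 exceed 2 MB, so the threshold drops to 1.
        System.out.println(dynamicThreshold(bytesPerAge, 4 << 20, 15));
    }
}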

1.4.4 Cost of dynamic age determination

The strategy is straightforward: the cost of an allocation-guarantee failure (a Full GC) is high, so the mechanism tries to reduce how often it is triggered. But dynamic age determination brings a new problem. Right after the JVM starts, most objects generated by the program are allocated in the young generation with age 0 and only a few have age 1, so based on the rule above the VM may apply a promotion threshold as low as age >= 0, causing a large number of young-generation objects to be copied straight into the old generation. Moving objects is relatively expensive: it involves allocating new space, transferring the objects, and rewriting the reference addresses held by the program, as described below.

  • Allocation of new space: the operating system usually gives the process virtual memory and only backs it with physical memory when the process actually touches it, so first use of the new pages carries extra cost.
  • Rewriting reference addresses in the program: Java object variables hold references, so in a statement such as Object object = new Object(), object holds the address of a memory block; when an object is moved, the VM must replace that reference address transparently. The replacement strategy varies by collector: ZGC uses colored pointers, Shenandoah uses Brooks pointers, and other collectors mainly rely on read barriers.

Therefore, a large number of objects entering the old generation at once is expensive and shows up as long GC pauses when a project starts. There is currently no good way to bypass this mechanism; you can size the Survivor spaces reasonably by observing how much space the startup objects occupy, or use rollout measures such as gray (canary) releases to let traffic ramp up slowly on new instances.

It is also worth asking whether the project can simply accept one or two long GC pauses right after startup. Modifying JVM parameters is generally a last resort: do not change them lightly, and remember to distinguish premature optimization from early detection of problems.

At this point, the young-generation mechanisms have been covered at a macro level.

1.5 Old generation collection strategy

According to the object allocation policy, there are two ways for objects to enter the old generation:

  • Objects larger than the size specified by -XX:PretenureSizeThreshold are allocated directly in the old generation;
  • Objects are promoted from the young generation to the old generation through mechanisms such as dynamic age determination and the allocation guarantee.

The old-generation layout is therefore divided into Tenured and Humongous regions, the latter used to store large objects.

Currently, CMS (Concurrent Mark Sweep) is the only garbage collector that performs a standalone concurrent collection of the old generation. It has two trigger conditions:

  • The old-generation usage reaches 92%; setting -XX:+UseCMSInitiatingOccupancyOnly makes the -XX:CMSInitiatingOccupancyFraction parameter take effect so the threshold can be changed.
  • After each Young GC completes, the CMS background polling thread (concurrentMarkSweepThread) checks whether the previous Young GC failed, or whether the next Young GC may fail based on the average size of objects promoted per Young GC; if so, the background polling thread triggers a Major GC.

The phases of a CMS GC are: initial mark (STW), concurrent mark, final remark (STW), and concurrent sweep (corresponding to the general GC steps discussed above).

1.5.1 Final Remark

In the first half of the remark phase, CMS uses the incremental-update approach described above to re-traverse the parts of the object reference graph that changed during concurrency, just as in the concurrent-marking stage.

In the second half of the remark phase, CMS traverses the card table (its remembered set) and processes the objects in the ReferenceQueue. Objects enter a ReferenceQueue when they are wrapped by one of the Reference implementation classes in the code.

  • ReferenceQueue object clearing: besides objects wrapped by WeakReference, which are cleaned up directly, and objects wrapped by SoftReference, which are cleaned up only just before a memory exception would be thrown, there is the GC-related but easily overlooked FinalReference. At class-load time the JVM marks classes whose finalize() method is non-empty (note that Object.finalize() itself is an empty method). During marking, if such an object is found to be otherwise unreachable, it is put into the Finalizer queue, i.e. the object becomes referenced by a Finalizer; at that point it will not be garbage collected until the low-priority FinalizerThread gets a time slice, dequeues the objects one by one (at which point the Finalizer reference is dropped), and executes each object's finalize() method. If the object has not been re-referenced by then, it is collected in the next garbage collection.
  • ReferenceQueue object clearing can cause memory leaks: a fairly common case is SocksSocketImpl, whose superclass implements finalize() precisely in case the code forgets to close the socket. When a socket is used up but never closed, the now-unreachable socket object is added to the Finalizer queue, and it is unknown when FinalizerThread will get a time slice to process it; if these objects are not processed for a long time, a large number of leftover socket objects get promoted to the old generation and trigger Major GC. The current solution is to close the socket explicitly in code. The practice section explains how to use VM tools to analyze the objects causing such a leak.
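A minimal demonstration of the FinalReference behavior described above (the class, payload size, and sleep times are illustrative): an object that overrides finalize() survives at least one extra GC cycle, and if finalize() is slow the FinalizerThread queue backs up and instances pile up in the heap:

public class FinalizeLeakDemo {
    private final byte[] payload = new byte[1024 * 1024]; // 1 MB, to make the pile-up visible

    @Override
    protected void finalize() throws Throwable {
        // A slow finalizer: the low-priority FinalizerThread processes the queue
        // one object at a time, so unreachable instances accumulate meanwhile.
        Thread.sleep(100);
        super.finalize();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) {
            new FinalizeLeakDemo(); // instantly unreachable, but not instantly collectable
            if (i % 100 == 0) {
                System.gc();        // suggest a GC; finalizable objects are only queued here
                Thread.sleep(50);
            }
        }
    }
}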

1.5.2 Concurrent sweep phase

CMS uses a concurrent sweep strategy. Both the concurrency and the sweeping bring some problems:

  • Concurrency: unlike Serial Old and Parallel Old, CMS runs the GC threads concurrently with the user threads, so while CMS is collecting the objects marked in the previous step, the user threads are still generating new garbage. Those objects can only be disposed of in the next collection and are called floating garbage.

  • Concurrency issues: precisely because the old-generation sweep runs concurrently, unlike the fully stop-the-world Young GC, a certain amount of memory must be left for the user threads, which is why the Major GC is triggered in advance. If -XX:CMSInitiatingOccupancyFraction is set too high for the application, the old generation cannot hold all the objects that arrive during the concurrent phase, a Full GC is triggered, and the log frequently shows Concurrent Mode Failure. (A Full GC can also be triggered by memory fragmentation, which the following points address.)

  • Sweeping: the drawback of the mark-sweep algorithm is that it produces memory fragmentation and needs an extra free-list data structure to track the available space. Fragmentation can trigger a Full GC even though the old generation still has enough total space, just not a large enough contiguous block for a bigger object. CMS addresses this with the -XX:CMSFullGCsBeforeCompaction parameter: after the configured number of Full GCs, the next Full GC is preceded by defragmentation. The default is 0, meaning fragmentation is compacted before every Full GC; the Full GC still proceeds if allocation remains impossible after compaction.

  • Sweep-related problem: although the measures above mitigate the allocation problems caused by fragmentation to some extent, they also prevent CMS from sweeping concurrently, because defragmentation moves objects, and moving objects requires rewriting the references that point to them; the user threads and GC threads could clearly operate on the same object at the same time, which is not concurrency-safe, so only STW can be used, resulting in relatively long GC pauses.
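For reference, here is one example combination of the CMS parameters discussed in this section; the values are illustrative assumptions, not recommendations, and should be tuned against the application's actual promotion rate:

-XX:+UseConcMarkSweepGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75
-XX:CMSFullGCsBeforeCompaction=5
-XX:+PrintGCDetails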

At this point, the old-generation mechanisms have been covered at a macro level ~

1.6 Non-heap: method area memory reclamation policy

The method area lives in non-heap memory. Since JDK 8 the string constant pool has been moved from the method area to the heap, which reduces the method area's storage pressure. The method area mainly stores class information loaded by the VM, static variables, constants, and code produced by the JIT compiler.

Generational garbage collectors used to treat the method area as the permanent generation (PermGen) so that it fell under the collector's jurisdiction, with -XX:MaxPermSize setting its upper limit. However, the unloading conditions for type information are strict and the default method-area size is not large, so with heavy use of reflection, dynamic proxies, CGLib, and other bytecode frameworks, new Class information keeps being loaded until the method area gradually fills up, causing Full GC or even OOM.

In JDK 8 the concept of the permanent generation was removed and replaced by the Metaspace, which is implemented in the process's native memory.

1.6.1 Method area OOM solution

As mentioned above, method-area OOM is most often caused by dynamic class-loading mechanisms continuously loading class information. When the method area hits OOM, a command along these lines can be used to confirm which packages are accumulating classes over time, and the source of those packages can then be inspected:

jcmd <pid> GC.class_stats | awk '{print $13}' | sed 's/\(.*\)\.\(.*\)/\1/g' | sort | uniq -c | sort -nrk1

You can also dump a heap snapshot and analyze it with tools such as JProfiler or MAT. If nothing obvious shows up at the package level, set the -XX:+TraceClassLoading and -XX:+TraceClassUnloading parameters to observe class loading and unloading in detail.

1.7 Out-of-heap memory reclamation policy

The most intuitive definition of off-heap memory is: memory that belongs to the JVM process but is not part of the runtime data areas and is not directly managed by the virtual machine. It mainly falls into Direct Memory and JNI (Java Native Interface) memory:

  • Direct Memory: off-heap memory requested through native methods such as Unsafe.allocateMemory and ByteBuffer.allocateDirect, operated on via a DirectByteBuffer object stored in the JVM heap that holds a reference to that memory. For example, the common NIO classes request off-heap memory through native methods; because this avoids copying data back and forth between the JVM heap and the native heap (zero copy), it can improve efficiency in some scenarios. You can run jcmd <pid> VM.native_memory detail to view the distribution of direct memory ~

  • JNI Memory: off-heap memory requested by native code called through JNI, via malloc, mmap, brk, and so on.

The JVM uses the -XX:MaxDirectMemorySize parameter to limit how much off-heap memory can be requested; if it is not configured, in Java 8 it defaults to the same value as -Xmx.

For direct memory: when the VM creates a DirectByteBuffer in the heap, it associates a PhantomReference object, a Cleaner, with it. Through the phantom-reference mechanism, when the DirectByteBuffer is collected the Cleaner is triggered and indirectly calls Unsafe.freeMemory(long address) to free the off-heap memory.
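A hedged sketch of releasing direct memory eagerly instead of waiting for GC to trigger the Cleaner; the cast to sun.nio.ch.DirectBuffer is JDK-internal API that works on JDK 8 but is restricted on newer, module-enforcing JDKs:

import java.nio.ByteBuffer;

public class DirectBufferRelease {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024); // 1 MB off-heap
        // ... use the buffer ...

        // Normally the off-heap block is freed only when GC collects the
        // DirectByteBuffer and its Cleaner (a PhantomReference) runs.
        // On JDK 8 it can be freed eagerly through internal API:
        ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean();
    }
}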

For JNI memory: since there is no corresponding reference object in the heap, the calling code has to free it itself.

1.7.1 OOM caused by off-heap memory leaks

Off-heap memory is not handled by the garbage collector but managed directly by the operating system. When the process runs short of memory, the OS starts swapping to disk, GC times soar, and user threads get blocked.

The main cause of off-heap OOM is that the application requests off-heap memory and never tells the operating system to release it.

  • Direct memory requested but not released: the JVM designers anticipated this; every time a DirectByteBuffer is created, Bits.reserveMemory is called, and when the limit is hit System.gc() triggers a Full GC to collect all unreachable DirectByteBuffer references in the young and old generations and release their memory. However, if the JVM is started with -XX:+DisableExplicitGC, System.gc() becomes a no-op and cannot trigger this collection, eventually producing OutOfMemoryError: Direct buffer memory.

  • Solution: production environments often set -XX:+DisableExplicitGC to disable manual GC and avoid the STW unavailability caused by a foreground System.gc(), but as shown above this breaks direct-memory reclamation. A better option is to keep System.gc() enabled and add -XX:+ExplicitGCInvokesConcurrent and -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, which turn a manual GC into a concurrent background collection so it no longer causes a long STW.

  • JNI off-heap memory not released: cases involving JNI memory are harder to handle because there is no intuitive tool for analysis. You can try gperftools, which while the program is running dynamically redirects malloc calls into its wrapper libtcmalloc.so and then accounts for the allocations. However, since native code can allocate not only with malloc but also with mmap/brk, this approach only gives a rough analysis.

1.7.2 GC delays caused by JNI methods and safe points

Because JNI methods can reference objects in the JVM heap, if a GC moved those objects the JNI method might afterwards read a stale memory address, causing a runtime error. JNI memory is not managed by the garbage collector, so write barriers, Brooks pointers, colored pointers, and similar measures do not apply to it.

Therefore, while such a JNI method is running, the GC must wait for it to exit, and once a GC is pending no thread is allowed to enter a new JNI method of this kind.

The impact:

  • If the young generation cannot satisfy an allocation while such a JNI method has not yet exited, the Young GC cannot proceed and the allocation goes straight into the old generation;
  • If the old generation cannot satisfy the allocation either, the thread can only block and wait for the JNI method to exit.

2. Practice

2.1 Out-of-heap Memory OOM

2.1.1 Without -XX:+DisableExplicitGC

import java.nio.ByteBuffer;

public class NonHeapOOM {
    // -Xmx64m -Xms64m -Xmn32m -XX:+UseConcMarkSweepGC
    // -XX:+PrintGCDetails -XX:MaxDirectMemorySize=10m
    public static void main(String[] args) throws InterruptedException {
        int i = 1;
        while (true) {
            System.out.println("allocation #" + (i++));
            Thread.sleep(1000L);
            // allocate 1 MB of off-heap (direct) memory each iteration
            ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1024 * 1024);
        }
    }
}

With the off-heap limit set to 10m and 1 MB allocated per iteration, the 11th allocation cannot reserve memory, so Bits.reserveMemory calls System.gc(); the earlier, now-unreachable DirectByteBuffers are collected and their off-heap memory is freed. The GC log shows these periodic Full GCs triggered by System.gc(), as in the figure below:

2.1.2 With -XX:+DisableExplicitGC

-Xmx64m -Xms64m -Xmn32m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:MaxDirectMemorySize=10m -XX:+DisableExplicitGC

Now System.gc() is a no-op, so no collection is triggered to reclaim the unreachable DirectByteBuffer objects and their off-heap memory is never freed; the 11th allocation throws OutOfMemoryError: Direct buffer memory directly:

3. References and summary

The practice section's own examples honestly do not add much reference value, so all the GC case-study articles consulted while writing this post are listed below; feel free to read them ~

In-depth Understanding of the Java Virtual Machine

This article is enough for GC principles and performance tuning practices!

Here’s your online GC case

Analysis and solution of 9 common CMS GC problems in Java

Practical GC tuning cases – Analysis and resolution of long pauses in young GC

One online JVM tuning practice, FullGC40 times a day to 10 times a day

Record a Metaspace OOM problem

Netty memory leak detection feast

Troubleshooting of “out-of-heap memory leakage” caused by Spring Boot and experience summary

Troubleshooting an OUT-OF-heap OOM problem

Linux memory allocation summary – malloc, brk, mmap

JVM out-of-heap memory utilization improvements: DirectBuffer details