The material in this chapter is drawn from Chapter 3 of Understanding the Java Virtual Machine. Garbage collectors are designed around three core questions:

  1. Which objects need to be reclaimed?
  2. When should they be reclaimed?
  3. How should they be reclaimed?

For Java developers, garbage collection is handled by the virtual machine. However, this "automated" machinery still has to be monitored and tuned when troubleshooting memory overflows and leaks. The following is the mind map for this chapter:

Virtual machine garbage collection mainly targets the heap space, which holds all kinds of objects. Therefore, determining which objects are "dead" and which are "alive" is the first step of the garbage collector's work, and there are two approaches: reference counting and reachability analysis.

1. Reference counting method

The logic of this method is simple: attach a reference counter to each object, increment it by one each time the object is referenced, and decrement it by one when a reference to it is released. This approach still works in some applications, but it is not the approach the JVM adopts. One example suffices: circular references.

public class Recycle {
    /** VM args: -XX:+PrintGCDetails */
    public static void main(String[] args) {

        Obj a = new Obj();
        Obj b = new Obj();

        a.reference = b;
        b.reference = a;

        a = null;
        b = null;
        System.gc();
    }
}

class Obj {

    private static final int _1MB = 1024 * 1024;
    // Takes up some space so the memory change is visible in the GC log.
    private byte[] box = new byte[_1MB];

    public Obj reference = null;
}


Here is the result of the program:

[GC (System.gc()) [PSYoungGen: 5379K->872K(38400K)] 5379K->880K(125952K), 0.0010217 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC (System.gc()) [PSYoungGen: 872K->0K(38400K)] [ParOldGen: 8K->656K(87552K)] 880K->656K(125952K), [Metaspace: 3206K->3206K(1056768K)], 0.0048771 secs] [Times: user=0.09 sys=0.00, real=0.00 secs]

PSYoungGen shrank from 5379K to 872K, which shows the virtual machine did not give up on reclaiming objects a and b despite their "left foot standing on right foot" trick. The objects referenced by a and b are clearly unreachable, yet neither reference count is zero. This demonstrates that the JVM does not implement memory reclamation with reference counting.

PS here refers to the Parallel Scavenge collector; YoungGen refers to the young generation.
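To see why a naive counter cannot reclaim the cycle above, here is a minimal reference-counting simulation. This is a toy model with invented class names, not JVM internals: it rebuilds the same a <-> b cycle, drops the "stack" references, and shows that both counters stay stuck at 1.

```java
import java.util.Arrays;

// A toy reference-counting model (NOT the JVM's mechanism) showing why
// a naive counter can never reclaim a cycle of unreachable objects.
public class RefCountDemo {

    static class Counted {
        int refCount;
        Counted reference;  // a counted edge to another object
    }

    // Builds the same a <-> b cycle as the JVM example, drops the stack
    // references, and returns both counters.
    static int[] buildCycleAndDrop() {
        Counted a = new Counted();
        Counted b = new Counted();
        a.refCount++;                    // the local variable a references it
        b.refCount++;                    // the local variable b references it

        a.reference = b; b.refCount++;   // a.reference = b
        b.reference = a; a.refCount++;   // b.reference = a

        a.refCount--;                    // a = null
        b.refCount--;                    // b = null

        // Both objects are now unreachable, yet each still counts the
        // other's field reference, so neither counter ever reaches zero.
        return new int[]{a.refCount, b.refCount};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildCycleAndDrop()));  // [1, 1]
    }
}
```

A pure reference-counting collector would leak both objects forever; the reachability analysis described next has no such problem.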

2. Reachability analysis

Reachability analysis is the mainstream approach to garbage collection in current languages. The idea is to start from a set of GC Roots and search along the reference chains. If an object is not on any reference chain (i.e., it is unreachable from the GC Roots), it can never be referenced again.

GC Roots form a collection. Before the formal reachability analysis, the virtual machine first performs root enumeration (described later) to find the objects that can serve as GC Roots. The objects that can go into this collection include:

  1. Objects referenced in the local variable tables of stack frames, such as parameters, local variables, and temporary variables in methods currently being executed by threads.
  2. Objects referenced by static fields of classes in the method area.
  3. Objects referenced by constants in the method area, such as references in the string constant pool (String Table).
  4. References resident inside the JVM, such as the Class objects for primitive types, resident exception objects, and the system class loader.
  5. Objects referenced by JNI (native methods) in the native method stacks.
  6. Objects held by synchronization locks (the synchronized keyword).
  7. Objects reflecting the JVM's internal state, such as JMXBeans and JVMTI-registered callbacks.
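The first few categories can be pointed at directly in ordinary code. The snippet below is purely illustrative (the names are invented for the demo); each comment maps a line to one of the numbered root categories above:

```java
// Illustrative only: each commented line shows where one category of
// GC Root comes from (names here are invented for the demo).
public class GcRootsExamples {

    // (2) an object referenced by a static field in the method area
    static Object staticRoot = new Object();

    // (3) a reference that lives in the string constant pool
    static final String INTERNED = "a pooled string";

    public static void main(String[] args) {
        // (1) an object referenced from the local variable table of an
        //     active stack frame
        Object stackRoot = new Object();

        // (6) an object held as a synchronization lock
        synchronized (stackRoot) {
            System.out.println(staticRoot + " " + INTERNED + " " + stackRoot);
        }
        // Categories (4), (5) and (7) -- Class objects for primitive types,
        // JNI references, JVM-internal objects -- are created by the runtime
        // itself and cannot be shown in ordinary Java source code.
    }
}
```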

3. To Die or to Live?

Whether an object is referenced determines whether it is worth keeping alive. Before JDK 1.2, references were divided in a "one size fits all" fashion: an object was either referenced or discarded. But these two states are too crude to describe objects that are "tasteless to keep, yet a pity to throw away."

Take a computer's cache files as an analogy: when storage is plentiful, we tolerate cache files (they generally speed up access, so they are fine to keep around). But when storage runs tight, we clear them to make room for other, more important files.

So, from JDK 1.2 onward, Java expanded the concept of references, dividing them into four levels of decreasing strength:

  1. Strong reference. The common case in ordinary assignments, such as Obj a = new Obj();. As long as a strong reference exists, the object will never be reclaimed.
  2. Soft reference. Describes objects that are useful but not required (similar to a cache). Before an OOM error occurs, the VM puts softly referenced objects on a reclaim list for a second collection; if that collection still does not free enough memory, the OOM error is thrown.
  3. Weak reference. The associated object survives only until the next garbage collection, regardless of whether memory is sufficient.
  4. Phantom reference. Also called a "ghost" reference, it does not affect the life cycle of the associated object at all; its sole purpose is to let the system receive a notification, and make a record, when the object is reclaimed.

3.1 Code practices of the four references

Strong references are the most common case in development. For this type of reference, the virtual machine would rather throw an OOM error than reclaim the memory. If you want the object behind a strong reference to become reclaimable at some point, assign null to the reference; the collector reclaims the orphaned object's space the next time it runs.

Object a = new Object();
a = null;  // the object it pointed to becomes eligible for collection

For soft references, we use the SoftReference<T> generic class. Before running the experiment, it is a good idea to cap the heap with VM parameters so the result appears quickly. The idea: create a soft reference, then allocate enough strongly referenced memory to force the virtual machine into a garbage collection, and check whether the softly referenced object survives.

Note that the SoftReference object itself is held by a strong reference here and will not be reclaimed (unless that reference is set to null). It is the referent, the object returned by .get(), that is softly reachable and subject to the soft-reference rules.

public class Cycle {
    /**
     * The heap is fixed at 20MB and not allowed to expand.
     * VM args: -Xms20m -Xmx20m
     */
    public static void main(String[] args) {

        int _1MB = 1024 * 1024;
        // Allocate a 10MB array and associate it through a soft reference.
        SoftReference<byte[]> softReference = new SoftReference<>(new byte[_1MB * 10]);

        // The softly referenced object still exists.
        System.out.println(softReference.get());

        // Allocate another 10MB behind a strong reference; without a GC this would OOM.
        byte[] _10MB = new byte[_1MB * 10];

        // To avoid the OOM, the softly referenced object was reclaimed: prints null.
        System.out.println(softReference.get());
    }
}

The practice for weak references is simple: create a weak reference, trigger a garbage collection, and see whether the referent still exists afterwards.

WeakReference<Object> weakObj = new WeakReference<>(new Object());
System.out.println(weakObj.get());  // the object is still alive
System.gc();
System.out.println(weakObj.get());  // null: reclaimed by the very next GC

A phantom reference behaves differently from soft and weak references. For example:

// Ignore the second argument for now and pass in null.
PhantomReference<Object> phantomReference = new PhantomReference<>(new Object(), null);
System.out.println(phantomReference.get());

You can observe that even when memory is abundant and no collection has run, .get() on a phantom reference always returns null. When the associated object is reclaimed, the phantom reference is placed on a reference queue for processing, usually to record that the object has been reclaimed.

You can think of it like the observer pattern: a phantom reference acts only as a sentinel for another object; it does not hold a usable reference to that object and therefore does not affect its life cycle.

Therefore, in the code block above, the second constructor parameter of PhantomReference should actually be a ReferenceQueue. When the object in the first argument is reclaimed, the phantom reference to it is placed on that queue for processing. Here's an example:

public class Recycle {
    /** VM args: -XX:+PrintGCDetails -Xms20m -Xmx20m */
    public static void main(String[] args) {

        ReferenceQueue<Obj> notifyQueue = new ReferenceQueue<>();
        WeakReference<Obj> obj = new WeakReference<>(new Obj());
        PhantomReference<Obj> phantomReference = new PhantomReference<>(obj.get(), notifyQueue);
        System.out.println("phantomReference(hashcode=" + phantomReference.hashCode()
                + ") is now associated with the weakly referenced Obj");

        // The weakly referenced object is reclaimed by this GC.
        System.gc();

        // A new thread keeps polling notifyQueue for notifications.
        new Thread(() -> {
            while (true) {
                Reference<? extends Obj> poll = notifyQueue.poll();
                if (poll != null) {
                    System.out.println("phantomReference(hashcode=" + poll.hashCode()
                            + ") -- the associated Obj was reclaimed");
                }
            }
        }).start();
    }
}

class Obj {}

4. The finalize() method

Anything finalize() can do, a try-finally block can do better. We cover it here only to understand its effect on garbage collection; do not actively use it in real development.

Objects found "unreachable" by the first round of reachability analysis are not reclaimed immediately. The virtual machine first determines whether the object needs to run its finalize() method, and deems execution unnecessary in two cases:

  1. The object's class does not override the finalize() method.
  2. The virtual machine has already called the object's finalize() method once.

For objects that do need finalize(), the virtual machine places them in a queue named F-Queue and executes their finalize() methods on a separate low-priority Finalizer thread. Note that the virtual machine is not guaranteed to wait for every finalize() call to complete before reclaiming space, nor to run the queued objects' finalize() methods in order, because it prioritizes resources for the more valuable user threads.

Objects entering the F-Queue still have a chance to save themselves: if an object successfully reconnects itself to the GC Roots reference chain inside finalize(), the virtual machine removes it from the "condemned" set during the second marking pass, which checks the F-Queue for exactly such "escapees."

From the point of view of an object, it has gone through this journey from being marked to being reclaimed:

Here’s an example in code.

public class Recycle {

    // Obj can save itself by linking itself to this static reference.
    static Obj SAVE_POINT = null;

    public static void main(String[] args) throws InterruptedException {

        Obj obj = new Obj();
        System.out.println("Obj object hashcode = " + obj.hashCode());

        obj = null;
        System.gc();

        // Sleep for 500 ms because the Finalizer thread has low priority.
        Thread.sleep(500);

        if (SAVE_POINT != null) {
            System.out.println("SAVE_POINT reference hashcode = " + SAVE_POINT.hashCode());
        } else {
            System.out.println("The first self-rescue failed.");
        }

        SAVE_POINT = null;
        System.gc();
        Thread.sleep(500);

        if (SAVE_POINT != null) {
            System.out.println("SAVE_POINT reference hashcode = " + SAVE_POINT.hashCode());
        } else {
            System.out.println("The second self-rescue failed.");
        }
    }
}

class Obj {
    @Override
    protected void finalize() {
        System.out.println("finalize() was called!");
        Recycle.SAVE_POINT = this;
    }
}

Because the Obj class overrides finalize(), the virtual machine does not reclaim the Obj instance immediately; it first places it in the F-Queue "death row." The program prints:

Obj object hashcode = 1639705018
finalize() was called!
SAVE_POINT reference hashcode = 1639705018
The second self-rescue failed.

The code contains two identical blocks, but the object only managed to save itself in the first one. The reason is the second case mentioned above: the virtual machine never runs an object's finalize() method twice, so an object gets exactly one chance to "rescue" itself through the code in finalize().

However, finalize() must not be understood as a destructor in the C/C++ sense. It is expensive to run and its timing is uncontrollable, so the official recommendation is not to override or call it. If you need to do cleanup work such as closing system resources, a try-finally block does it better.

5. Space reclamation in the method area

Prior to JDK 7, most people equated the method area with the permanent generation, hence the term "permanent generation collection." The Java Virtual Machine Specification does not actually require a virtual machine to implement garbage collection in the method area at all. Collection there mainly means unloading a class itself. To be judged "useless," a class must meet all of the following stringent criteria:

  1. All instances of the class (and of its subclasses) have been reclaimed.
  2. The ClassLoader that loaded the class (used to read the external .class file and turn it into a java.lang.Class object) has been reclaimed. In ordinary application scenarios this condition is hard to bring about deliberately.
  3. The class's java.lang.Class object is not referenced anywhere, and the class is not accessed via reflection anywhere.

Even when all three conditions hold, the virtual machine merely marks the class as reclaimable; it does not unload it immediately. A single GC in the Java heap (especially in the young generation) typically reclaims 70% to 90% of the space, whereas garbage collection in the method area is a high-cost, low-return activity.

However, in scenarios that generate large amounts of bytecode dynamically through reflection, dynamic proxies, or CGLib, the virtual machine does need class-unloading capability to keep the method area under control. The garbage collection discussed in the rest of this chapter is all about Java heap space, so only the "young generation" and "old generation" are covered below.

6. Garbage collection theory

From the perspective of how object death is determined, garbage collection divides into "reference-counting GC" and "tracing GC." Since real virtual machines use the latter, this article focuses only on it.

Commercial virtual machines design their garbage collectors around three empirical hypotheses:

  1. The Weak Generational Hypothesis: most objects die young.
  2. The Strong Generational Hypothesis: the more collections an object has survived, the harder it is to reclaim.
  3. The Intergenerational Reference Hypothesis: objects that reference each other tend to be born and die together, so cross-generation references are only a tiny minority.

The first two hypotheses lay down the generational design principle: the majority of objects that die quickly (the young generation) and the minority that endure multiple GC rounds (the old generation) should be stored separately in the Java heap. The virtual machine can then improve collection efficiency by concentrating its GC work on the young generation and promoting the few "tough" objects to the old generation (being in the old generation does not mean never being collected).

6.1 Granularity of GC collection

Based on the characteristics of the young generation (the majority of objects, with high GC returns) and the old generation (the minority, with low GC returns), three garbage collection algorithms emerged: mark-sweep, mark-copy, and mark-compact. In addition, GC work is divided into finer granularities:

  1. Partial GC: collects part of the Java heap space.
    • Minor GC / Young GC: collects garbage in the young generation.
    • Major GC / Old GC: collects garbage in the old generation.
    • Mixed GC: the current operating mode of the G1 collector, which does not partition the heap into young and old generations in the traditional way.
  2. Full GC: collects the entire Java heap, roughly equivalent to a Minor GC plus a Major GC.

When a new object is created, it is first allocated in the young generation of the Java heap. There are two conditions under which a young-generation object is promoted:

  1. It has survived enough rounds of young-generation collection (the age threshold).
  2. It is so large that it is placed directly in the old generation from the "moment of birth," to save the cost of copying it around the young generation.
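The two promotion rules can be sketched as a toy model. This is a simulation, not JVM code: the classes are invented, the 15-round threshold mirrors HotSpot's default -XX:MaxTenuringThreshold, and the "large object" cutoff is an arbitrary stand-in for pretenuring:

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of the two promotion rules: survive enough minor GCs
// (the age threshold), or be so large that you are allocated in the
// old generation from the start (pretenuring).
public class PromotionModel {
    static final int MAX_TENURING_THRESHOLD = 15;   // HotSpot's default age limit
    static final int PRETENURE_SIZE = 1024 * 1024;  // invented cutoff for "large"

    static class Obj { int age = 0; final int size; Obj(int size) { this.size = size; } }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    void allocate(int size) {
        if (size >= PRETENURE_SIZE) old.add(new Obj(size));  // rule 2: straight to old gen
        else young.add(new Obj(size));                        // everyone else starts young
    }

    // One minor GC: every survivor ages by one; old-enough objects promote.
    void minorGc() {
        List<Obj> stillYoung = new ArrayList<>();
        for (Obj o : young) {
            o.age++;
            if (o.age >= MAX_TENURING_THRESHOLD) old.add(o);  // rule 1: age threshold
            else stillYoung.add(o);
        }
        young.clear();
        young.addAll(stillYoung);
    }

    public static void main(String[] args) {
        PromotionModel heap = new PromotionModel();
        heap.allocate(64);                 // small: starts in the young generation
        heap.allocate(2 * 1024 * 1024);    // large: pretenured into the old generation
        for (int i = 0; i < 15; i++) heap.minorGc();
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());  // young=0 old=2
    }
}
```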

6.2 GC trigger timing of different granularity

Apart from the G1 collector, which manages the Java heap in Regions and involves Mixed GC, most collectors perform either Minor or Full GCs. Collectors always prefer Minor GC because Major GC is far less efficient (so a standalone Major GC is rare, though not unheard of: the Parallel Scavenge collector supports one). If space is still insufficient after a Minor GC, a Major GC follows (amounting to a Full GC).

The terms "Eden," "Survivor," and "allocation guarantee" used below are explained in the "Appel-style" collection process covered under the mark-copy algorithm.

Minor GC

When the Eden space in the young generation does not have enough space to allocate, Minor GC is performed for that space first.

Full GC

  1. After each Minor GC, some objects are promoted from Survivor space to the old generation. The garbage collector records the size of this promoted set as an empirical value. Before the next Minor GC, it "decides" from past values whether the old generation has enough room for this round's promotions. If it judges the space insufficient, it proceeds with a Major GC.

  2. In addition, since large objects are promoted directly to the old generation, a Major GC is also performed when the old generation lacks enough contiguous space for them.

If there is still no free space after the Major GC (Full GC), an OOM error is thrown: the heap is out of space.

6.3 Cross-generation Reference Problems

Splitting the Java heap in two creates an awkward problem of cross-generation references: during a Minor GC, might a young-generation object slated for collection still be referenced by an old-generation object? If so, it must not be reclaimed, lest the old-generation object be left with a dangling reference. To prevent accidents, the garbage collector would have to additionally search the entire old generation. The same situation arises, reversed, during a Major GC.

The third hypothesis says that cross-generation references are only a minority phenomenon, and it follows from the first two. For example, if a young-generation object references an old-generation object, the cross-generation reference makes the young object hard to reclaim too; over time that object will itself be promoted to the old generation, and the cross-generation reference disappears.

Therefore, following this hypothesis, the garbage collector need not scan an entire generation again and again from the GC Roots. It only needs to maintain an index recording which memory contains cross-generation references (the Remembered Set, discussed later) for targeted processing. This adds the cost of maintaining an extra data structure, but it is still a bargain compared with scanning the whole old generation.

7. Garbage collection algorithms

The following three garbage collection algorithms are widely used by the garbage collector, each with its own advantages and disadvantages:

7.1 Mark-sweep algorithm

Proposed in 1960 by John McCarthy, the father of Lisp, this is the most basic and logically simplest garbage collection algorithm. The collector first marks all reclaimable objects in the target region, then reclaims every marked object in a second pass. Its most significant drawback is that a single GC leaves the heap riddled with fragmented space, so when a large object later fails to find contiguous memory, another round of GC is triggered and a vicious cycle can set in.

The following is a schematic illustration of the fragmentation the mark-sweep algorithm produces:
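The two phases can also be shown as a minimal simulation over a toy object graph (invented classes, not JVM code): mark everything reachable from the roots, then sweep whatever stayed unmarked, including unreachable cycles:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// A minimal mark-sweep over a toy object graph: mark everything
// reachable from the roots, then sweep whatever stayed unmarked.
public class MarkSweep {
    static class Node { boolean marked; List<Node> refs = new ArrayList<>(); }

    static void mark(Node n) {
        if (n == null || n.marked) return;
        n.marked = true;
        for (Node ref : n.refs) mark(ref);
    }

    // Returns the surviving objects; everything else is "reclaimed".
    static List<Node> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) mark(root);                  // phase 1: mark
        List<Node> survivors = new ArrayList<>();
        for (Node n : heap) if (n.marked) survivors.add(n);  // phase 2: sweep
        for (Node n : survivors) n.marked = false;           // reset for the next cycle
        return survivors;
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node(), d = new Node();
        a.refs.add(b);                  // a -> b: both reachable from the root
        c.refs.add(d); d.refs.add(c);   // c <-> d: a dead cycle, still reclaimed
        List<Node> heap = Arrays.asList(a, b, c, d);
        List<Node> live = collect(heap, Collections.singletonList(a));
        System.out.println("survivors = " + live.size());  // survivors = 2
    }
}
```

Note that, unlike reference counting, the c <-> d cycle is reclaimed because neither node is reachable from a root.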

7.2 Mark-copy algorithm (suited to the young generation)

In 1969, Fenichel proposed Semispace Copying, a "use half, keep half" algorithm: only half the space is used at a time; when that half fills up, a GC runs and the surviving objects move to the other, completely empty half, which then becomes the active half for the next round.

Most virtual machines today collect the young generation with this algorithm (assumed by default in the rest of this chapter), because only a few young-generation objects survive each collection, which keeps the cost of moving objects low. Moreover, half of the memory is left completely clean after each GC, so there is no memory-fragmentation worry as with mark-sweep.

But the downside is obvious: available space is halved. IBM research quantified the "live fast, die young" behavior: about 98% of objects are disposable, i.e., they do not survive their first garbage collection, so reserving backup space at a 1:1 ratio is unnecessary.

In 1989, Andrew Appel proposed a generational copying strategy with more reasonable proportions, also called "Appel-style collection": the young generation is divided into one Eden space and two Survivor spaces, and only Eden plus one Survivor space are in use at any time (the Survivor space in use is called the "From" space; the spare one, the "To" space). Eden and each Survivor space are sized in the ratio 8:1:1; in other words, the usable fraction of the young generation rises from 50% to 90%.

The process is easier to follow with a diagram: Eden occupies most of the space and holds the mass of "live fast, die young" objects. After a GC, most objects in Eden are dead; the survivors from Eden, together with the objects already in the From space, are moved into the reserved To space. The To space then becomes the From space of the next round, and the original From space, once cleaned, becomes the new To space. This filtering repeats round after round.

Being moved to the To space means the object survived at least one GC. Objects that hop between the From/To spaces and are still alive after a certain number of rounds (the default for HotSpot virtual machines is 15) are eventually promoted to the old generation.

Of course, there is no guarantee that one Survivor space can hold all the survivors of a given GC. If the survivors exceed the 10% of the young generation that a Survivor space occupies, other memory regions must step in through Handle Promotion (the allocation guarantee). The guarantee means a "guarantor" supplies the remaining required space when Survivor runs out, avoiding an OOM, and this guarantor is mostly the old generation.

In addition, as mentioned earlier, the garbage collector avoids moving large objects back and forth frequently, so when an object takes up a certain amount of space, it will be promoted directly to the old age.
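The Eden/From/To shuffle above can be sketched as a toy simulation (invented classes; real collectors decide liveness by marking, which is passed in here as a ready-made set):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A toy Appel-style young generation: Eden plus two Survivor spaces.
// Each "GC" copies the live objects into the spare (To) space and
// then swaps the two Survivor roles.
public class AppelYoungGen {
    List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>();
    List<String> to = new ArrayList<>();

    // liveSet: the objects that survive this round (what marking would decide).
    void minorGc(Set<String> liveSet) {
        for (String obj : eden) if (liveSet.contains(obj)) to.add(obj);
        for (String obj : from) if (liveSet.contains(obj)) to.add(obj);
        eden.clear();
        from.clear();
        List<String> tmp = from; from = to; to = tmp;  // To becomes the next From
    }

    public static void main(String[] args) {
        AppelYoungGen gen = new AppelYoungGen();
        gen.eden.addAll(Arrays.asList("a", "b", "c"));
        gen.minorGc(new HashSet<>(Arrays.asList("a")));       // only "a" survives
        System.out.println(gen.from);                          // [a]
        gen.eden.add("d");
        gen.minorGc(new HashSet<>(Arrays.asList("a", "d")));  // "a" survives again
        System.out.println(gen.from);                          // [d, a]
    }
}
```

The sketch deliberately omits aging and the allocation guarantee; its point is only the copy-and-swap rhythm between the From and To spaces.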

7.3 Mark-compact algorithm (suited to the old generation)

The mark-copy algorithm moves large numbers of objects when the survival rate is high, which makes it a poor fit for the old generation. In 1974, Edward Lueders proposed the mark-compact algorithm with the old generation's characteristics in mind. It differs from mark-sweep in that the surviving objects are slid together into compact order; in other words, mark-sweep is non-moving while mark-compact is moving. Whether to move objects is a trade-off with advantages and disadvantages either way:

If you choose to move objects, every reference to them must be updated, and to do that safely all user threads must be paused, or, more vividly, "Stop The World." If you sweep without moving, the memory-fragmentation problem must be handled by a more complex memory management scheme, such as free-list allocation. Yet memory allocation is one of the most frequent operations a user program performs; if a lot of time is lost in allocation, overall application throughput suffers.

In summary, moving objects complicates the reclamation process, while not moving them complicates the allocation process. Measured by collector pause times, not moving objects requires almost no pause. But measured by program throughput, moving objects is the more far-sighted choice: the pause time saved by not moving is outweighed by the cost of ever more complex memory allocation.

There is a compromise, though: while space is plentiful, use the low-latency mark-sweep algorithm and tolerate some memory fragmentation; once fragmentation grows bad enough to obstruct large-object allocation, run a mark-compact pass to clean it all up. This is exactly what the CMS collector does.

8. Enumerating root nodes

Root enumeration, the process by which the garbage collector finds all possible GC Roots, is a step that precedes reachability analysis. To date, every garbage collector has had to suspend all user threads for this step: "Stop The World."

The references that can serve as GC Roots are mainly those in the local variable tables of stack frames and global references in the method area. For a large Java application, the method area alone can run to hundreds of megabytes, to say nothing of scanning, on the fly, every thread's stack, every frame in each stack, and every local variable table in each frame.

Instead, when a class finishes loading, the virtual machine works out which offsets in its instances hold references and over which regions those references are valid, and stores the result in a data structure called an OopMap (short for Ordinary Object Pointer Map). For just-in-time compiled code, the virtual machine likewise records in advance which stack slots and registers hold references at key points. This largely spares the virtual machine a global scan of the stack frames and makes enumeration efficient.

For example, I used the HSDIS disassembly tool to convert a .class file's compiled code into native code and captured part of the output:

[Constants]
  # {method} {0x0000000017472b48} 'newObj' '()V' in 'com/jitT/TestMainClass'
  #           [sp+0x20]  (sp of caller)
  0x0000000002ba72a0: mov    %eax,-0x6000(%rsp)
  ...
  0x0000000002ba72f5: mov    %rdx,%rbp
  0x0000000002ba72f8: data16 xchg %ax,%ax
  0x0000000002ba72fb: callq  0x0000000002ae61a0  ; OopMap{rbp=Oop off=96}
                                                 ; *invokespecial <init>
                                                 ; - com.jitT.TestMainClass::newObj@4 (line 12)
                                                 ; {optimized virtual_call}
  0x0000000002ba7300: add    $0x10,%rsp
  0x0000000002ba7304: pop    %rbp

At the callq instruction at 0x0000000002ba72fb we see the annotation OopMap{rbp=Oop off=96}, indicating an ordinary object pointer in the rbp register. Its valid range runs from the callq instruction to 0x0000000002ba72a0 (the start of the instruction stream, hex) + 96 (decimal) = 0x0000000002ba7300, i.e., up to the following add instruction, because the rbp register's content is popped at instruction 0x0000000002ba7304. (For setting up the HSDIS and JITWatch tools, refer to this link; this Baidu cloud disk link (extract code: DEh4) provides a ready-made 64-bit Windows hsdis-amd64 with all dependent libraries.)

Because reference relationships can change at any moment as a thread executes, in theory every instruction that changes a reference would need its own OopMap. Recording OopMaps that exhaustively, however, would cost far too much storage.

8.1 Safe points

So the virtual machine actually records OopMaps only at the "key points" emphasized above, which are called safe points. Suppose all user threads are currently running: when root enumeration begins, they do not stop instantly, but run on to the nearest safe point, ensuring the program's actual reference state agrees with the reference information the OopMap provides at that safe point.

The criterion for placing a safe point is whether execution can dwell there for a long time. Root enumeration cannot start until every user thread has paused safely, so a few slow threads "crawling" toward their safe points and delaying the overall progress would be unacceptable. Straight-line instruction sequences generally finish quickly, but constructs involving instruction-sequence reuse, such as method calls, loop jumps, and exception jumps, tend to execute for a long time, and these are exactly where safe points are placed.

When it comes to bringing all user threads to a safe point, there are two options; in practice, virtually all virtual machines choose the latter:

Preemptive suspension: the system first forcibly pauses all user threads, then gives CPU time back to any thread that has not yet reached a safe point until it gets there.

Voluntary suspension: each thread keeps polling a shared flag as it executes. When the flag is set, every thread runs to its nearest safe point and then suspends itself.
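Voluntary suspension can be sketched in miniature with plain Java threads. This is a simulation of the protocol, not HotSpot's actual mechanism (which uses a polling page and JIT-emitted checks); all names here are invented:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Voluntary suspension in miniature: worker threads poll a shared flag
// at their "safe points" and park there until the GC phase finishes.
public class SafepointDemo {
    static final AtomicBoolean safepointRequested = new AtomicBoolean(false);
    static volatile boolean gcDone = false;

    static void pollSafepoint(CountDownLatch parked) {
        if (safepointRequested.get()) {        // the poll a JIT would emit
            parked.countDown();                // report "I have stopped here"
            while (!gcDone) Thread.yield();    // wait out the GC pause
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch parked = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                for (int n = 0; n < 1_000_000; n++) {
                    if (n % 1024 == 0) pollSafepoint(parked);  // poll on "loop back-edges"
                }
                // keep polling after the loop so late threads still stop
                while (!gcDone) pollSafepoint(parked);
            }).start();
        }
        safepointRequested.set(true);  // the "Stop The World" request
        parked.await();                // root enumeration may start only now
        System.out.println("all threads parked at safe points");
        gcDone = true;                 // resume the user threads
    }
}
```

The key property is that the coordinator (standing in for the VM) does nothing until every worker has reported in at a poll site, mirroring why slow-to-arrive threads delay the whole pause.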

8.2 Safe regions

However, safe points assume all user threads are currently running. A thread suspended in a system call, or one that has used up its CPU time slice, cannot respond to the flag change. And the virtual machine obviously lacks the patience to wait for such threads to be rescheduled onto a CPU before walking to a safe point.

The solution is to "stretch" safe points into safe regions: stretches of code within which the reference relationships are guaranteed not to change. This way, when the virtual machine enumerates root nodes, even if a thread is suspended somewhere other than a designated safe point, garbage collection can still proceed as long as the thread sits inside a region that cannot affect the OopMaps.

From a thread's perspective: on entering safe-region code, it sets a flag announcing that it is in a "safe state." On leaving the region, it checks whether root enumeration has finished; if so, it carries on as usual, and if not, it waits until it receives the signal that it may leave the safe region.
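The enter/leave protocol can be sketched as follows. Again a simulation with invented names, not JVM internals; the two flags stand in for the thread-state bookkeeping a real VM keeps:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A sketch of the safe-region protocol: a thread flags itself "safe"
// before it blocks, and on the way out it must wait if a GC pause is
// still in progress.
public class SafeRegionDemo {
    static volatile boolean gcInProgress = false;
    final AtomicBoolean inSafeRegion = new AtomicBoolean(false);

    void enterSafeRegion() {
        inSafeRegion.set(true);   // GC may now treat this thread as stopped
    }

    void leaveSafeRegion() throws InterruptedException {
        // Must not resume mutating references while a pause is running.
        while (gcInProgress) Thread.sleep(1);
        inSafeRegion.set(false);
    }

    public static void main(String[] args) throws InterruptedException {
        SafeRegionDemo thread = new SafeRegionDemo();
        thread.enterSafeRegion();      // e.g. about to make a blocking syscall
        gcInProgress = true;           // root enumeration can ignore this thread
        System.out.println("thread is safe: " + thread.inSafeRegion.get());
        gcInProgress = false;          // the pause is over
        thread.leaveSafeRegion();      // now allowed to continue
        System.out.println("thread resumed: " + !thread.inSafeRegion.get());
    }
}
```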

9. Remembered Sets and Card Tables

First of all, the Remembered Set is used to identify "cross-generational references" so that the collector can treat these "special cases" separately. In fact, every collector that performs a Partial GC — G1, ZGC, and others — faces the same problem. The idea is to shrink the search scope (think of it as an index): logically divide the whole Java heap into blocks of a specific size. If we can record in which block a cross-generational reference occurred, the collector only needs to search that block, avoiding a scan of the entire Java heap.

As for the granularity of these blocks, there are several options, listed below for reference; the third is the one actually chosen:

  1. Word precision: each record maps to one machine word (32 or 64 bits, say) that contains a cross-generational pointer.
  2. Object precision: each record maps to one object whose fields contain cross-generational pointers.
  3. Card precision: each record maps to one block of memory in which some object contains a cross-generational pointer.

For card precision, each mapped memory block is called a Card Page, and the array storing the (card page -> state) records is called the Card Table. In the HotSpot virtual machine a card page is 2^9 bytes, i.e. 512 bytes, and a single card page may hold several objects. Its card table marking logic is:

CARD_TABLE [this address >> 9] = 0; 

Each card page corresponds to one byte in the card table: a clean value means no cross-generational references have occurred in the page, a dirty value means at least one has. (Conceptually the book describes dirty as 1 and clean as 0; in HotSpot's actual implementation the dirty value happens to be 0, which is why the pseudocode above assigns 0.)
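The address-to-card mapping can be made concrete with a small sketch. This is a toy model, not HotSpot code; for readability it writes 1 as the dirty value even though HotSpot itself uses 0, and all names are our own:

```java
public class CardTableSketch {
    static final int CARD_SHIFT = 9;              // 2^9 = 512-byte card pages
    static final byte DIRTY = 1;                  // 1 = dirty here (HotSpot actually uses 0)
    static byte[] cardTable = new byte[1 << 11];  // 1 byte per card page: covers a 1 MB toy "heap"

    // invoked when the field at `fieldAddress` receives a cross-generational reference
    static void markCard(long fieldAddress) {
        cardTable[(int) (fieldAddress >> CARD_SHIFT)] = DIRTY;
    }

    public static void main(String[] args) {
        markCard(1500);                           // 1500 >> 9 == 2: card page #2 turns dirty
        System.out.println((int) cardTable[2]);   // 1
        System.out.println((int) cardTable[3]);   // 0 — neighboring pages stay clean
    }
}
```

At collection time, the collector only scans the objects inside dirty card pages instead of the whole old generation.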

9.1 Write Barriers

The next question: when a cross-generational reference is created, at what moment should the corresponding card-table element be marked dirty? Obviously, this should happen at the moment the reference assignment takes place.

For interpreted bytecode, the virtual machine has ample room to intervene in every instruction it executes; but code that has been just-in-time compiled is already a pure stream of machine instructions. HotSpot therefore maintains the card table's state through Write Barriers, which insert extra actions before and after reference assignments. The mechanism can be compared to around advice in AOP, and write barriers are likewise divided into pre-write barriers and post-write barriers.

After applying an unconditional write barrier, the virtual machine performs card-table maintenance on every reference assignment, whether or not it is a cross-generational one. This adds extra overhead, but it is cheap compared with scanning the entire old generation during a Minor GC. Under high concurrency, however, the card table can suffer from the false-sharing problem.

A quick aside, a popular explanation of compiled execution versus interpreted execution: compiled execution is like writing .java files ahead of time and compiling them with javac into .class files ready for the JVM to load; interpreted execution resembles the "question-and-answer" interaction we have with the JVM through a REPL terminal, except that the latter produces no .class file.

9.2 The False Sharing Problem

Modern CPU caches operate in units of cache lines (Cache Line). If a cache line is 64 bytes and each card-table element occupies 1 byte, then one cache line holds 64 card-table elements, which together map 64 × 512 B = 32 KB of memory.

In a concurrent scenario, if two threads modify reference relationships within that same 32 KB region at the same time, both of their card-table updates land in the same cache line. The two threads then interfere with each other (write-backs, invalidations, synchronization), causing performance degradation.

Even though the threads are touching different card-table elements, any write forces the whole shared cache line to be refreshed — this is the "false sharing" problem. For write barriers, the mitigation is to "avoid writing to the card table whenever possible": mark a card dirty only when it is not already dirty (HotSpot exposes this as the -XX:+UseCondCardMark parameter).
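Conditional card marking can be sketched by counting how many stores actually hit the card table. A toy model (names ours), not HotSpot's implementation of -XX:+UseCondCardMark:

```java
public class CondCardMark {
    static final byte DIRTY = 1;
    static byte[] cardTable = new byte[64];   // one 64-byte cache line's worth of cards
    static long writes = 0;                   // counts actual card-table stores

    // unconditional barrier: writes the card even when it is already dirty
    static void unconditionalMark(int card) { cardTable[card] = DIRTY; writes++; }

    // conditional barrier: store only if the card is not yet dirty,
    // so repeated assignments in the same card page stop touching the shared cache line
    static void conditionalMark(int card) {
        if (cardTable[card] != DIRTY) { cardTable[card] = DIRTY; writes++; }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) conditionalMark(7); // 1000 assignments, same card page
        System.out.println(writes); // 1 — only the first assignment writes to the cache line
    }
}
```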

10. Concurrent Reachability Analysis

Reachability analysis takes all the GC Roots nodes as starting points and performs a BFS over the "reference graph", which consumes time. Let us start with the case in which all user threads are frozen (the serial case), i.e. reference relationships do not change while the garbage collector works. Here we introduce Tri-color Marking to illustrate, under the following conventions:

  1. Black: the object has been scanned, along with all the references it contains.
  2. Gray: the object is being scanned.
  3. White: the object has not been scanned yet.

Reachability analysis thus moves objects from white (not scanned) through gray (being scanned) to black (scan complete); note that objects already marked black are not rescanned.
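The serial case can be demonstrated with a small BFS over a hypothetical five-object graph (all names ours); whatever stays white at the end is garbage:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class TriColorDemo {
    static final int WHITE = 0, GRAY = 1, BLACK = 2;

    // adjacency list: object -> objects it references; object 4 is unreachable
    static int[][] refs = { {1, 2}, {3}, {}, {}, {} };
    static int[] color = new int[5];

    static void mark(int root) {
        Deque<Integer> grayQueue = new ArrayDeque<>();
        color[root] = GRAY;
        grayQueue.add(root);
        while (!grayQueue.isEmpty()) {
            int obj = grayQueue.poll();          // take a gray object and scan it
            for (int child : refs[obj]) {
                if (color[child] == WHITE) {     // newly discovered: white -> gray
                    color[child] = GRAY;
                    grayQueue.add(child);
                }
            }
            color[obj] = BLACK;                  // fully scanned: gray -> black
        }
    }

    public static void main(String[] args) {
        mark(0); // object 0 plays the role of GC Roots
        System.out.println(Arrays.toString(color)); // [2, 2, 2, 2, 0] — object 4 stays white
    }
}
```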

The author uses PPT animation to demonstrate the progress of a garbage collector under normal conditions (reference relationships do not change dynamically) :

In reality, however, the collector's marking process runs concurrently with user threads. This can have two kinds of unintended consequences: first, objects that should be collected get marked "alive" — so-called floating garbage; second, objects that should survive get collected (hereinafter the "object disappearance" problem), which is obviously far more serious.

An analogy: a fully stop-the-world collector is like being told to drop everything and sit still in your chair while your mom cleans the house — if she cleans while you keep throwing confetti, the house will never get properly clean.

10.1 Object Disappearance Problem

Object disappearance occurs when both conditions are met:

  1. One or more new references from black objects (say, black object A) to white object B are added.
  2. All direct or indirect references from gray objects (say, gray object C) to white object B are removed.

Black object A adds a reference to white object B, while gray object C removes its direct reference to white object B, causing the "object disappearance":

Black object A adds a reference to white object B, while gray object C removes its indirect reference to white object B, causing the "object disappearance":

To prevent “object disappearance”, it is only necessary to break either of these conditions, so two solutions are born:

Incremental Update: when a reference is added to a black object, that object turns gray again, and these gray objects are rescanned after the concurrent phase ends. This breaks the first condition.

Snapshot At The Beginning (SATB): whenever a reference is deleted, the old referent is recorded, and after the concurrent phase the gray objects in this recorded snapshot are rescanned. This breaks the second condition.

The HotSpot virtual machine uses both approaches: CMS is based on incremental update, while G1 is based on the original snapshot (SATB).
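The two remedies can be contrasted in a toy write-barrier sketch. This is a simplified model under our own names, not CMS or G1 internals: incremental update reacts in a post-write barrier when a black object gains a reference; SATB reacts in a pre-write barrier by recording the referent about to be disconnected:

```java
import java.util.ArrayList;
import java.util.List;

public class BarrierSketch {
    // toy object: one reference field plus a tri-color mark
    static class Obj {
        Obj field; String color = "white"; final String name;
        Obj(String n) { name = n; }
    }

    static List<Obj> grayQueue = new ArrayList<>(); // objects to rescan (incremental update)
    static List<Obj> satbQueue = new ArrayList<>(); // old referents recorded before deletion (SATB)

    // CMS-style post-write barrier: a black object that gains a reference turns gray again,
    // breaking condition 1
    static void incrementalUpdateWrite(Obj holder, Obj value) {
        holder.field = value;
        if (holder.color.equals("black")) { holder.color = "gray"; grayQueue.add(holder); }
    }

    // G1-style pre-write barrier: before a reference is overwritten, remember the old referent,
    // breaking condition 2
    static void satbWrite(Obj holder, Obj value) {
        if (holder.field != null) satbQueue.add(holder.field);
        holder.field = value;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"); a.color = "black";
        Obj b = new Obj("B");                     // white
        Obj c = new Obj("C"); c.color = "gray"; c.field = b;

        incrementalUpdateWrite(a, b); // condition 1: A must be rescanned, so it turns gray
        satbWrite(c, null);           // condition 2: B is recorded before the link disappears

        System.out.println(a.color + " / recorded: " + satbQueue.get(0).name);
    }
}
```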

11. Classic garbage collector

In practice, virtual machines tend to pair two garbage collectors and have each specialize in Minor GC or Major GC, depending on the collector's nature.

A garbage collector can be measured by three metrics: memory Footprint, Throughput, and Latency. In the spirit of the "no free lunch" theorem, few garbage collectors perform well on all three, because a performance advantage on one must come at the expense of the others. These three metrics are therefore also called an "impossible triangle".

Therefore, different garbage collectors (or combinations) should be chosen for application scenarios that prioritize different metrics.

11.1 Serial/Serial Old

Serial is the most basic garbage collector and was the only choice for the HotSpot virtual machine before JDK 1.3. It later spawned the Serial Old collector, dedicated to collecting the old generation and based on the mark-compact algorithm. Both are "single-threaded" collectors: not only does the collector use a single CPU to do its work, but all other user threads are suspended while it runs, the phenomenon known as "Stop The World":

Even today, the Serial collector remains the default young-generation collector for the HotSpot virtual machine in client mode. Its advantage is simplicity and efficiency: on single-core processors, or machines with few cores, Serial has no thread-interaction overhead, so it can concentrate on garbage collection and maximize single-threaded efficiency.

11.2 ParNew (Young Generation)

ParNew is essentially the parallel version of the Serial collector: it shares Serial's characteristics except that it uses multiple GC threads working together. The reason it is still worth covering is that in JDK 7 and earlier it was the preferred young-generation collector, because it was the only one that could work with the CMS collector described below.

The ParNew collector never outperforms Serial in a single-core environment; because of thread-interaction overhead, it is not even guaranteed to beat Serial on the pseudo-dual-core produced by hyper-threading. But the more processors a server has, the greater ParNew's advantage.

11.3 Parallel Scavenge (Young Generation)

The Parallel Scavenge is also a young-generation collector, likewise based on the mark-copy algorithm and on multiple GC threads working in parallel. What makes it special is that it aims at a controllable throughput, where throughput = user-thread execution time / (user-thread execution time + garbage-collector execution time). For example, if in 100 minutes of working time the virtual machine runs user threads for 99 minutes, the throughput is 99%.
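The throughput formula above can be checked with a trivial sketch (class and method names are ours):

```java
public class ThroughputDemo {
    // throughput = user time / (user time + GC time), as defined above
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 99 minutes of user work plus 1 minute of GC over a 100-minute window
        System.out.println(throughput(99, 1)); // 0.99
    }
}
```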

Compared with garbage collectors that pursue low pause times, collectors of this type are better suited to analytical tasks that require little interaction with the user; they are also called "throughput-first collectors". Their antithesis is the "low-latency garbage collectors", typified by Shenandoah, ZGC, and the like.

The Parallel Scavenge collector provides two virtual machine startup parameters to control throughput: -XX:MaxGCPauseMillis, which strictly limits the maximum pause time per GC, and -XX:GCTimeRatio, which sets the throughput ratio directly. Lower pause times are not automatically better, however: with the same amount of space to reclaim, shorter pauses inevitably mean more frequent collections. A similar trade-off appears in the G1 collector's "pause time model".

If hand-tuning these parameters proves difficult, the Parallel Scavenge collector also offers -XX:+UseAdaptiveSizePolicy. Under this adaptive size policy, the garbage collector adjusts the appropriate Eden/Survivor space ratio, the object-size threshold for direct promotion to the old generation, and so on, based on actual runtime behavior. This adaptivity is also one of the important differences between the Parallel Scavenge collector and ParNew.

Interestingly, before JDK 6 the Parallel Scavenge collector, focused on young-generation collection, could not find a suitable old-generation partner — not even the already-available CMS garbage collector. So once it was chosen as the young-generation collector, there seemed to be no alternative to the relatively weak Serial Old collector for the old generation.

11.4 Parallel Old

When the Parallel Old collector appeared in JDK 6, the throughput-first combination finally came into its own. Parallel Old is the old-generation counterpart of the Parallel Scavenge collector, supporting parallel collection and based on the mark-compact algorithm. The Parallel Scavenge + Parallel Old combination is worth considering wherever throughput matters or processor resources are scarce.

11.5 CMS (Old Generation)

CMS, short for Concurrent Mark Sweep, is a garbage collector that pursues the shortest collection pause time, making it well suited to B/S-architecture servers, which emphasize response speed to user requests. As its name suggests, CMS is fundamentally based on the mark-sweep algorithm; as mentioned earlier, it resorts to a time-consuming defragmentation (compaction) of the Java heap only when memory fragmentation becomes excessive.

It is an old-generation collector that usually works with ParNew for garbage collection. Its workflow is divided into four steps:

CMS initial mark: marks only the objects directly associated with GC Roots. All user threads must be suspended, but it takes a short time.

CMS concurrent mark: traverses deep into the object graph starting from the initially marked objects. This is time-consuming, but it runs concurrently with user threads without suspending them.

CMS remark: corrects the marks of objects whose reference relationships changed during concurrent marking, in order to avoid the object-disappearance problem (see incremental update above). All user threads must pause, and multiple GC threads complete this step in parallel.

CMS concurrent sweep: sweeps away all objects judged to be dead; since surviving objects are not moved, this step can also run concurrently with user threads.

The following flow chart describes the workflow of a CMS, using only a quad-core environment as an example:

Overall, CMS is an excellent garbage collector, because it was the HotSpot virtual machine's first successful attempt at pursuing low pauses. However, no garbage collector maintains the best "cost performance" across all scenarios; CMS in particular still has the following drawbacks:

First, a collector designed around concurrency is in fact very sensitive to processor resources. Although CMS does not "Stop The World" in every phase of its work, the concurrent phases still slow the application down (by the no-free-lunch theorem, no garbage collector both avoids pauses and leaves application performance untouched). The default number of concurrent threads started by CMS = (number of CPUs + 3) / 4. For machines with four or more cores, CMS takes up no more than 25% of CPU resources; for machines with fewer than four cores, CMS can greatly affect user threads.
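The (CPUs + 3) / 4 formula quoted above can be checked with a one-liner (class and method names are ours):

```java
public class CmsThreads {
    // default CMS concurrent GC thread count, per the formula quoted above
    static int concurrentThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        System.out.println(concurrentThreads(16)); // 4  -> 25% of a 16-core machine
        System.out.println(concurrentThreads(4));  // 1  -> 25% of a 4-core machine
        System.out.println(concurrentThreads(2));  // 1  -> a full 50% of a 2-core machine
    }
}
```

The last case illustrates why CMS hurts user threads so much on machines with fewer than four cores.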

Second, because the CMS cleanup process runs concurrently with user threads, new "floating garbage" generated during this time must wait for the next GC to be collected. This means the collector always has to "work ahead" and keep space in reserve (at least enough to hold what user threads allocate during the concurrent sweep), rather than waiting until the old generation is nearly full before starting garbage collection.

In JDK 5, CMS by default activates once the old generation is 68% used. But if the old generation fills slowly (i.e. most objects are short-lived), the VM startup parameter -XX:CMSInitiatingOccupancyFraction can be used to raise the percentage that triggers CMS, leaving more CPU resources to user threads.

If the reserved space cannot satisfy new object allocations, the virtual machine has to "Stop The World" again and temporarily fall back to the Serial Old garbage collector for a special collection of the old generation. That pause is much longer, so a higher CMS trigger percentage is not always better.

Third, because CMS is based on the mark-sweep algorithm, it inherits that algorithm's defects, such as memory fragmentation, which can leave no contiguous space for large objects and force a Full GC.

11.6 G1 (Mixed Collection)

G1, the Garbage First collector, earns its place here because it represents another milestone in garbage collection technology. Since G1, the design orientation of state-of-the-art collectors has shifted from "cleaning the whole heap in one go" to "keeping up with the application's memory allocation rate." G1 was offered to developers as an "experimental" collector starting in JDK 6, and finally, with JDK 8 Update 40, it gained concurrent class unloading, completing everything originally planned for it.

G1 is a garbage collector aimed at server-side applications. With the release of JDK 9, G1 was announced as the default, replacing the Parallel Scavenge + Parallel Old combination, while the CMS garbage collector was deprecated. Its designers wanted G1 to establish a "pause time model": a collector that, within a user-specified time slice of M milliseconds, will most likely spend no more than N milliseconds on garbage collection.

To achieve this goal, G1 departs from traditional collector design in one respect: it divides the entire Java heap into equal Regions with no fixed generational role. Any Region can act as Eden space, Survivor space, or old-generation space, depending on actual need. Each collection gathers only the Regions to be reclaimed into a Collection Set (CSet); G1's Partial GC is therefore also called Mixed GC.

As for what to reclaim, the measure is no longer simply young generation versus old generation: G1 evaluates the "cost performance" of reclaiming each Region and keeps a priority list ordered by collection value. In addition, the user can set the virtual machine parameter -XX:MaxGCPauseMillis (200 milliseconds by default), based on which the G1 collector decides which Regions are most worth cleaning.

The Region layout also includes special Humongous Regions for storing "large" objects: G1 considers any object occupying more than half a Region to be "large". A "super-large" object spanning more than one Region is placed in N contiguous Humongous Regions.

Where CMS uses incremental update to avoid marking errors caused by reference changes during concurrent marking, G1 mainly relies on the original snapshot (SATB). In addition, G1 maintains two TAMS (Top at Mark Start) pointers for each Region and reserves some space above them: new objects created during concurrent collection are allocated above the TAMS positions, and G1 treats them as implicitly alive, not reclaiming them in that collection.

The G1 collector also works in four steps:

Initial Marking: marks only the objects directly reachable from GC Roots and changes the TAMS pointer values, so that user threads running next can correctly allocate new objects in the available Region space. This requires a short "Stop The World".

Concurrent Marking: performs the complete reachability analysis starting from GC Roots, and afterwards processes the reference changes recorded by SATB. This process is time-consuming but can run concurrently with user threads.

Final Marking: pauses user threads briefly once more to process the small number of SATB records that remain after the concurrent marking phase ends.

Live Data Counting and Evacuation: gathers statistics on the Regions due for reclamation (which form the CSet) and evacuates their live objects to free Regions. Because object movement is involved, this phase must suspend all user threads and is completed by multiple GC threads in parallel.

In contrast to the CMS collector, G1 is not purely about low latency; rather, it tries to strike a balance between low latency and high throughput, leaving the pause-time expectation to the user's choice of whether low pauses or high throughput matter more. In general the default 200-millisecond target is sufficient: if the pause target is set excessively low (say, 20 milliseconds), the G1 collector can only "hastily" reclaim a small portion of Region space per collection — possibly slower than user threads allocate — eventually triggering a Full GC and degrading performance.

11.7 Common GC collector combinations and startup parameters

In JDK 7/8, the default garbage collector combination is Parallel Scavenge + Parallel Old. From JDK 9 onward, collection of both the young and the old generation is handed uniformly to the G1 garbage collector, which supports Mixed GC.

| No. | Young generation | Old generation | Launch parameters | Eden space name | Survivor space name | Old gen space name |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Serial | Serial Old | -XX:+UseSerialGC | "Eden Space" | "Survivor Space" | "Tenured Gen" |
| 1 | Serial | CMS | -XX:-UseParNewGC -XX:+UseConcMarkSweepGC | "Eden Space" | "Survivor Space" | "CMS Old Gen" |
| 2 | ParNew | CMS | -XX:+UseConcMarkSweepGC | "Par Eden Space" | "Par Survivor Space" | "CMS Old Gen" |
| 3 | ParNew | Serial Old | -XX:+UseParNewGC | "Par Eden Space" | "Par Survivor Space" | "Tenured Gen" |
| 4 | Parallel Scavenge | Serial Old | -XX:+UseParallelGC | "PS Eden Space" | "PS Survivor Space" | "Tenured Gen" |
| 5 | Parallel Scavenge | Parallel Old | -XX:+UseParallelOldGC | "PS Eden Space" | "PS Survivor Space" | "PS Old Gen" |
| 6 | G1 | G1 | -XX:+UseG1GC | "G1 Eden Space" | "G1 Survivor Space" | "G1 Old Gen" |

12. Reference materials

  • Let’s talk about collections of memories in the JVM
  • Why must Java main methods be static?
  • Why only one Eden and two survivors?
  • Pseudo sharing problem
  • In-depth understanding of JVM learning Notes: Analysis of garbage collection algorithms for young and old generations
  • Java strong weak virtual references, only experienced, will remember
  • One of the most common Java reference types you can’t miss is virtual references
  • Interviewer: You say you are familiar with the JVM? So you talk about concurrent accessibility analysis
  • Introduction to the Java GC combinator