Preface

Earlier articles in this Learning JVM series covered the JVM specification, the class file format, reading bytecode, bytecode processing with ASM, the class life cycle and custom class loaders, memory allocation, and the bytecode execution engine. This article introduces the basics of garbage collection and explains how the heap of a running Java program is reclaimed automatically.

If you are interested in any of those earlier topics, see the previous articles in the series:

Android engineers learn the JVM (6): Bytecode execution engine

Android engineers learn the JVM (5): Memory allocation

Android engineers learn the JVM (4): Class loading, linking, initialization, and unloading

Android engineers learn the JVM (3): The ASM bytecode framework

Android engineers learn the JVM (2): Reading Java bytecode

Android engineers learn the JVM (1): Overview of the JVM

1. What is garbage?

Memory that is no longer used is garbage.

Garbage collection is mainly concerned with the Java heap, which is where objects live. "Memory that is no longer used" therefore means objects that are no longer used.

How can we tell that an object is no longer in use?

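Method 1: Reference counting

Add a reference counter to each object. The counter is incremented by one whenever a new reference to the object is created and decremented by one whenever a reference becomes invalid. An object whose counter is zero is no longer in use.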

void demo1() {
    Object ref1 = new Object(); // The Object's reference count is 1
    Object ref2 = ref1;         // The Object's reference count is 2
    ref1 = null;                // One reference is dropped, so the count falls to 1
    ref2 = null;                // The last reference is dropped, so the Object can be reclaimed
}

Reference counting has one obvious drawback, however: circular references.

class Obj {
    public Object prop;
}

class Demo {
    public static void main(String[] args) {
        Obj obj1 = new Obj(); // The first Obj's reference count is 1
        Obj obj2 = new Obj(); // The second Obj's reference count is 1
        obj1.prop = obj2;     // The second Obj's reference count is 2
        obj2.prop = obj1;     // The first Obj's reference count is 2
        obj1 = null;          // The first Obj's reference count falls to 1
        obj2 = null;          // The second Obj's reference count falls to 1
        // obj1 and obj2 are both null, but because of the circular reference
        // neither count ever reaches zero
    }
}

In this case, you have to break the cycle manually before setting obj1 and obj2 to null, so that the memory can be freed:

obj1.prop = null;
obj2.prop = null;
obj1 = null;
obj2 = null;

Circular references are quite common in real programs, and guarding against them by hand is painful. Fortunately, mainstream Java virtual machines do not rely on reference counting to manage the heap.
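To make the counting concrete, here is a minimal simulation of the example above. It is only a sketch: the `Counted` class and the `retain`/`release` helpers are hypothetical names invented for illustration, and the counters are maintained by hand rather than by any real runtime.

```java
// Toy model of reference counting. This is NOT how HotSpot manages memory;
// Counted, retain and release are hypothetical names used for illustration.
class RefCountDemo {
    /** An object that tracks how many references point at it. */
    static class Counted {
        int refCount = 0;
        Counted prop; // outgoing reference to another Counted
    }

    /** Simulates "a new reference to target was created". */
    static void retain(Counted target) { target.refCount++; }

    /** Simulates "a reference to target was dropped". */
    static void release(Counted target) { target.refCount--; }

    /** Builds two objects that refer to each other, then drops the local references. */
    static int[] countsAfterDroppingLocals(boolean breakCycleFirst) {
        Counted a = new Counted();
        Counted b = new Counted();
        retain(a);               // local variable obj1
        retain(b);               // local variable obj2
        a.prop = b; retain(b);   // obj1.prop = obj2
        b.prop = a; retain(a);   // obj2.prop = obj1

        if (breakCycleFirst) {
            a.prop = null; release(b);
            b.prop = null; release(a);
        }
        release(a);              // obj1 = null
        release(b);              // obj2 = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] intact = countsAfterDroppingLocals(false);
        int[] broken = countsAfterDroppingLocals(true);
        System.out.println("cycle left intact: " + intact[0] + ", " + intact[1]);
        System.out.println("cycle broken first: " + broken[0] + ", " + broken[1]);
    }
}
```

With the cycle left intact both counters stay at 1, so a pure reference-counting collector would never free either object. Breaking the cycle first lets both counters reach 0.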

So what is actually used?

Method 2: Reachability analysis

Reachability analysis borrows from graph theory in discrete mathematics. The program treats all reference relationships as a graph. Starting from a set of nodes called GC Roots, it looks up the nodes they reference, then the nodes those nodes reference, and so on. Once the search is finished, every node that was not reached is considered unreferenced, that is, garbage.

Both the virtual machine stack and the native method stack are thread-private memory areas, so objects referenced from them are kept alive as long as the thread has not terminated. Objects referenced by static class fields in the method area are obviously alive. Objects referenced by constants may be in use at any moment, so they can serve as GC Roots as well. In summary:

The objects that can serve as GC Roots in Java are:

1. Objects referenced from the virtual machine stack (the local variable table)

2. Objects referenced by static fields in the method area

3. Objects referenced by constants in the method area

4. Objects referenced from the native method stack (by native methods)
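The search described above can be sketched as a breadth-first traversal. This is a simplified model, not a real collector: the `Node` class is a hypothetical stand-in for a heap object, and the `gcRoots` list stands in for the four root sources just listed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    /** Hypothetical heap object: a name plus its outgoing references. */
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Breadth-first search from the roots; returns every node that was reached. */
    static Set<Node> reachable(List<Node> gcRoots) {
        Set<Node> marked = new HashSet<>(gcRoots);   // Node uses identity equality
        Deque<Node> queue = new ArrayDeque<>(gcRoots);
        while (!queue.isEmpty()) {
            Node current = queue.poll();
            for (Node next : current.refs) {
                if (marked.add(next)) {              // true only the first time we see it
                    queue.add(next);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        root.refs.add(a);
        // b and c reference each other, but nothing reachable references them
        b.refs.add(c);
        c.refs.add(b);

        Set<Node> live = reachable(Collections.singletonList(root));
        for (Node n : Arrays.asList(root, a, b, c)) {
            System.out.println(n.name + (live.contains(n) ? " is reachable" : " is garbage"));
        }
    }
}
```

Note that `b` and `c` form exactly the kind of cycle that defeats reference counting, yet the traversal classifies both as garbage because no path leads to them from a root.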

When an object is unreachable, does it have to be reclaimed?

The answer is no. An unreachable object is declared dead only after it has been marked at least twice:

1. If reachability analysis finds no reference chain connecting the object to the GC Roots, the object is marked for the first time and then filtered according to whether its finalize() method needs to be executed.

2. If the object does not override finalize(), or its finalize() method has already been called by the virtual machine, the virtual machine treats both cases as "no need to execute" and marks the object a second time directly.

3. If finalize() does need to be executed, the object is placed in a queue called the F-Queue, and finalize() is run later by a low-priority Finalizer thread that the virtual machine creates automatically.

Here’s an example:

public class FinalizerTest {
    public static FinalizerTest object;

    public void isAlive() {
        System.out.println("I'm alive");
    }

    // finalize() is deprecated since Java 9; it is used here only to show the mechanism
    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("method finalize is running");
        object = this; // re-attach this object to a GC root (a static field) to rescue it
    }

    public static void main(String[] args) throws Exception {
        object = new FinalizerTest();
        // First attempt: finalize() rescues the object
        object = null;
        System.gc();

        // The Finalizer thread has low priority, so give it time to run
        Thread.sleep(500);
        if (object != null) {
            object.isAlive();
        } else {
            System.out.println("I'm dead");
        }

        // Second attempt: finalize() has already run once and is not called again
        object = null;
        System.gc();

        Thread.sleep(500);
        if (object != null) {
            object.isAlive();
        } else {
            System.out.println("I'm dead");
        }
    }
}

The output is as follows:

method finalize is running
I'm alive
I'm dead

If finalize() is not overridden, the output is as follows:

I'm dead
I'm dead

As the output shows, finalize() runs at the first GC and the object rescues itself before it is collected. At the second GC the object is reclaimed, because the JVM calls an object's finalize() method at most once.

In other words, finalize() can be used to rescue an object, but this is not recommended: the program cannot control when, or whether, GC runs, so the behavior is highly unpredictable.

2. Classification of references

Section 1 described reference counting and reachability analysis, both of which decide whether an object is still referenced. But does Java garbage collection only distinguish between "referenced" and "not referenced"?

The answer is no. A plain referenced/unreferenced distinction would be too coarse. Java actually has four kinds of references:

1. Strong references are the ordinary references found throughout program code, such as "Object obj = new Object()". The garbage collector never reclaims an object while a strong reference to it still exists.

2. Soft references describe objects that are useful but not essential. Before the system throws an out-of-memory error, these objects are included in a second round of collection.

3. Weak references also describe non-essential objects, but an object associated only with weak references survives only until the next garbage collection. When the collector runs, such objects are reclaimed regardless of whether memory is sufficient.

4. Phantom references (sometimes translated as "virtual references") are the weakest kind. An object instance cannot be obtained through a phantom reference. The sole purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector.
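The four kinds map onto classes in the standard `java.lang.ref` package. The sketch below only shows how each kind is created and the two behaviors that are guaranteed; what the last two lines print depends on whether the JVM actually honors the `System.gc()` request, so no output is promised for them. The class and method names (`ReferenceKindsDemo` and its helpers) are made up for this example.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    /** A weak reference resolves normally while a strong reference still exists. */
    static boolean weakResolvesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    /** PhantomReference.get() always returns null, so the object cannot be obtained. */
    static Object phantomGet() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get();
    }

    public static void main(String[] args) {
        System.out.println("weak resolves while strongly held: " + weakResolvesWhileStronglyHeld());
        System.out.println("phantom.get(): " + phantomGet());

        // Objects held ONLY by a soft or weak reference
        SoftReference<Object> soft = new SoftReference<>(new Object());
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // a request only; the JVM is free to ignore it
        // Typically the weak referent is cleared and the soft referent survives
        // while memory is plentiful, but neither outcome is guaranteed.
        System.out.println("weak after gc request: " + weak.get());
        System.out.println("soft after gc request: " + soft.get());
    }
}
```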


3. Garbage collection algorithms

3.1. Mark-sweep algorithm

The mark-sweep algorithm has two phases: first mark the objects to be reclaimed, then reclaim all marked objects in one pass.

It has two major disadvantages:

1. Efficiency: neither the mark phase nor the sweep phase is very efficient.

2. Space: marking and sweeping leave behind a large number of non-contiguous memory fragments.
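The fragmentation problem is easy to see in a toy model. The sketch below assumes a heap of equal-sized slots represented as a `boolean[]` (true means occupied) and takes the result of the mark phase as a given array. All names here are hypothetical, and real collectors deal with variable-sized objects.

```java
class MarkSweepDemo {
    /**
     * heap[i] is true when slot i holds an object; marked[i] is true when the
     * mark phase flagged that object for reclamation. Returns the heap after sweeping.
     */
    static boolean[] sweep(boolean[] heap, boolean[] marked) {
        boolean[] after = new boolean[heap.length];
        for (int i = 0; i < heap.length; i++) {
            after[i] = heap[i] && !marked[i]; // marked slots are freed in place, nothing moves
        }
        return after;
    }

    /** Total number of free slots. */
    static int freeSlots(boolean[] heap) {
        int free = 0;
        for (boolean used : heap) {
            if (!used) free++;
        }
        return free;
    }

    /** Longest run of adjacent free slots, i.e. the largest object that could be allocated. */
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean used : heap) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] heap = { true, true, true, true, true, true, true, true };
        // Assume the mark phase flagged every second object as garbage
        boolean[] marked = { false, true, false, true, false, true, false, true };
        boolean[] after = sweep(heap, marked);
        System.out.println("free slots: " + freeSlots(after));
        System.out.println("largest contiguous free run: " + largestFreeRun(after));
    }
}
```

In this example half of the heap is free after the sweep, yet no two free slots are adjacent, so an object needing two slots could not be allocated without another collection.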

3.2. Copying algorithm

The copying algorithm divides memory into two equal halves and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared in one go.

Advantages: simple to implement, efficient to run, and no memory fragmentation to worry about

Disadvantages: serious memory waste, because only half of the memory is usable at any time
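A minimal sketch of the copy step, under the same kind of simplification as before: each space is a `String[]` of equal-sized slots, and the set of survivors is passed in as a `live` array rather than computed. `CopyingDemo` and `copyLive` are illustrative names only.

```java
import java.util.Arrays;

class CopyingDemo {
    /**
     * Copies the live objects of fromSpace into a fresh to-space, packed from
     * index 0. The caller then discards fromSpace as a whole.
     */
    static String[] copyLive(String[] fromSpace, boolean[] live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0; // bump pointer: next free slot in to-space
        for (int i = 0; i < fromSpace.length; i++) {
            if (fromSpace[i] != null && live[i]) {
                toSpace[next++] = fromSpace[i];
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        String[] fromSpace = { "a", "b", "c", "d" };
        boolean[] live = { true, false, false, true };
        String[] toSpace = copyLive(fromSpace, live);
        System.out.println(Arrays.toString(toSpace)); // prints [a, d, null, null]
    }
}
```

The survivors end up contiguous, so the free memory is a single block and allocation can proceed by simply advancing a pointer. The cost is proportional to the number of survivors, not to the size of the space.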

Current commercial virtual machines use this algorithm to collect the young generation. Because most objects are short-lived, there is no need to split memory 1:1. Instead, memory is divided into one large Eden space and two small Survivor spaces, and at any time Eden and one of the Survivor spaces are in use.

During a collection, the surviving objects in Eden and the Survivor space in use are copied to the other Survivor space in one pass, and then Eden and the Survivor space that was just used are cleared. In HotSpot the default ratio is Eden:Survivor = 8:1, so 90% of the young generation (Eden plus one Survivor space) is available for use, and only 10% is "wasted" by the copying algorithm.
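The 90% figure follows directly from the ratio. As a small worked example (the method name is made up; the ratio corresponds to HotSpot's `-XX:SurvivorRatio` option, whose default of 8 matches the 8:1 above):

```java
class SurvivorRatioDemo {
    /**
     * Fraction of the young generation that is usable at any time, given
     * Eden:Survivor = survivorRatio:1 and two Survivor spaces.
     */
    static double usableFraction(int survivorRatio) {
        // Measured in "Survivor units": young = Eden + 2 Survivors = ratio + 2,
        // usable = Eden + 1 Survivor = ratio + 1.
        return (survivorRatio + 1) / (double) (survivorRatio + 2);
    }

    public static void main(String[] args) {
        System.out.println("8:1 -> usable fraction " + usableFraction(8)); // 9 of 10 units
        System.out.println("1:1 -> usable fraction " + usableFraction(1)); // 2 of 3 units
    }
}
```

With a ratio of 8 the young generation is 10 units, of which 9 are usable; the one idle Survivor space is the 10% overhead.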

If the Survivor space is too small to hold the survivors, the collector relies on the old generation as an allocation guarantee: the objects that do not fit go directly into the old generation.

Allocation guarantee: when a young-generation collection leaves more live objects than the Survivor space can hold, the excess objects must be placed in the old generation. In other words, the old generation guarantees space for young-generation GC.

The guarantee does not always succeed, because the largest contiguous free space in the old generation may be too small. The JVM follows this set of rules:

1. Before a Minor GC, the JVM checks whether the largest contiguous free space in the old generation is larger than the total size of all objects in the young generation. If it is, the Minor GC is safe (even if it reclaims nothing, every surviving object still fits in the old generation).

2. If it is smaller, the JVM checks whether guarantee failure is allowed. If it is, the JVM then checks whether the largest contiguous free space in the old generation is larger than the average size of the objects promoted to the old generation in previous collections.

3. If it is larger, the JVM attempts a Minor GC.

4. If it is smaller, or if guarantee failure is not allowed, the JVM performs a Full GC.
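The four rules amount to a small decision function. The sketch below restates them in code; the enum, the method, and the parameter names are hypothetical, and all sizes are plain byte counts supplied by the caller rather than values read from a real JVM.

```java
class AllocationGuaranteeDemo {
    enum Decision { SAFE_MINOR_GC, ATTEMPT_MINOR_GC, FULL_GC }

    /**
     * Decides which collection to run before a young-generation GC.
     *
     * @param oldMaxContiguousFree largest contiguous free space in the old generation
     * @param youngTotalObjectSize total size of all objects in the young generation
     * @param allowGuaranteeFailure whether the JVM is configured to allow guarantee failure
     * @param avgPromotedSize average size promoted to the old generation in past collections
     */
    static Decision decide(long oldMaxContiguousFree, long youngTotalObjectSize,
                           boolean allowGuaranteeFailure, long avgPromotedSize) {
        if (oldMaxContiguousFree > youngTotalObjectSize) {
            return Decision.SAFE_MINOR_GC;      // rule 1: worst case still fits
        }
        if (allowGuaranteeFailure && oldMaxContiguousFree > avgPromotedSize) {
            return Decision.ATTEMPT_MINOR_GC;   // rules 2 and 3: likely to fit, based on history
        }
        return Decision.FULL_GC;                // rule 4
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 80, false, 10)); // old gen can absorb everything
        System.out.println(decide(50, 80, true, 10));   // relies on the historical average
        System.out.println(decide(50, 80, true, 60));   // average promotion would not fit
        System.out.println(decide(50, 80, false, 10));  // guarantee failure not allowed
    }
}
```

The second case is only an attempt: the historical average is a heuristic, so a collection that promotes far more than usual can still overflow the old generation.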

3.3. Mark-compact algorithm

When many objects survive, the copying algorithm becomes inefficient because more copying is needed, and it also wastes space. For these reasons it is generally not used for the old generation, which mostly uses the mark-compact algorithm instead.

In mark-compact, the marking phase is the same as in mark-sweep, but instead of freeing the reclaimable objects in place, all surviving objects are moved toward one end of the memory, and everything beyond the resulting boundary is then cleared directly.
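The compaction step can be sketched as an in-place slide over a slot array. As in the earlier toy models, slots are equal-sized, the survivors are given by a `live` array, and the names are invented for this example. A real collector must also update every reference to a moved object, which this sketch ignores.

```java
import java.util.Arrays;

class MarkCompactDemo {
    /**
     * Slides live objects to the start of the heap in place and clears the
     * rest. Returns the boundary: the index of the first free slot.
     */
    static int compact(String[] heap, boolean[] live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[boundary++] = heap[i]; // move the survivor toward the low end
            }
        }
        // Everything at or beyond the boundary is cleared in one pass
        for (int i = boundary; i < heap.length; i++) {
            heap[i] = null;
        }
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e" };
        boolean[] live = { true, false, true, false, true };
        int boundary = compact(heap, live);
        System.out.println(Arrays.toString(heap)); // prints [a, c, e, null, null]
        System.out.println("boundary: " + boundary); // prints boundary: 3
    }
}
```

Unlike the copying algorithm, no second space is needed, and unlike mark-sweep, the free memory ends up as one contiguous block.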

3.4. Generational collection algorithm

Garbage collectors in current commercial VMs use generational collection, which divides memory into several regions according to object lifetime and applies a different collection algorithm to each region according to its characteristics.

The Java heap is generally divided into a young generation and an old generation, so that the most suitable algorithm can be chosen for each. In the young generation, every collection finds that most objects are dead and only a few survive, so the copying algorithm is used: the collection costs only the copying of a small number of live objects. In the old generation, objects have a high survival rate and there is no extra space to act as an allocation guarantee, so it must be collected with the mark-sweep or mark-compact algorithm.

4. Summary

1. Garbage collection mainly targets heap memory.

2. Before collecting garbage, the collector has to know what garbage is, namely memory that is no longer used. Reference counting and reachability analysis are two ways to decide whether memory is garbage.

3. The JVM does not only distinguish between "referenced" and "not referenced": references are divided into strong, soft, weak, and phantom references.

4. The common garbage collection algorithms are mark-sweep, copying, and mark-compact. Current commercial VMs combine them in a generational collection scheme.