JVM – Garbage collection

Note: This article is a learning series. Some of the content will be taken from related books or online resources, with notes in the middle and at the end

The significance of garbage recycling

It takes the memory management aspect of Java programmers off the radar.

Garbage collection automatically manages the JVM’s memory space by cleaning up garbage objects that are no longer being used, freeing up more space for other objects.

What is a reference to an object?

Garbage collection in Java typically takes place in the Java heap, which holds almost all object instances in Java

In Java, the concept of a reference is summarized as follows (the strength of a reference decreases in order) :

  • Strong references: This type of reference is the most common in Java programs, and as long as strong references exist, the garbage collector will never reclaim the referenced object

  • SoftReference: objects that are not required and will be collected by the garbage collector if the system memory is insufficient. The JDK provides a SoftReference class to implement soft references

  • WeakReference: used to describe non-essential objects that are collected by the garbage collector whenever GC occurs, regardless of whether memory is sufficient. The JDK provides a WeakReference class to implement weak references

  • Virtual references: Unlike the other references, this does not affect the life cycle of an object. If the object is virtual, it can be reclaimed at any time, just as if there were no references at all. The JDK provides PhantomReference classes to implement virtual references

Example code is shown below

public class ReferenceDemo {
    public static void main(String[] arge) {
        / / strong reference
        Object object = new Object();
        Object[] objects = new Object[100];

        / / soft references
        SoftReference<String> stringSoftReference = new SoftReference<>(new String("SoftReference"));
        System.out.println(stringSoftReference.get());
        System.gc();
        System.out.println(stringSoftReference.get()); // Manual GC, when there is enough memory and the object is not collected

        System.out.println();

        / / weak references
        WeakReference<String> stringWeakReference = new WeakReference<>(new String("WeakReference"));
        System.out.println(stringWeakReference.get());
        System.gc();
        System.out.println(stringWeakReference.get()); // Manual gc, in which case null is returned and the object has been collected

        System.out.println();

        / / reference
        // Virtual references are mainly used to track the activity of objects being collected by the garbage collector.
        // one difference between a virtual reference and a soft or weak reference is that a virtual reference must be used in conjunction with a ReferenceQueue.
        When the garbage collector attempts to reclaim an object and finds that it has a virtual reference, it adds the virtual reference to its associated reference queue before reclaiming the object's memory
        ReferenceQueue<String> stringReferenceQueue = new ReferenceQueue<>();
        PhantomReference<String> stringPhantomReference = new PhantomReference<>(new String("PhantomReference"), stringReferenceQueue); System.out.println(stringPhantomReference.get()); }}Copy the code

Of course, there are a lot of knowledge points about these references, this article will only give a brief introduction, and there will be a chance to introduce them in detail in a separate article.

How do I determine which garbage objects need to be collected?

Reference counter

Each object has a reference counter, which is +1 when a reference is added, -1 when a reference is released, and 0 when the counter is reclaimed

The reference counting algorithm is easy to implement and efficient to determine, and it is a good choice in most cases, while the Java language does not choose this algorithm for garbage collection, mainly because it is difficult to solve the problem of circular references between objects

public class LoopReferenceDemo {

    public static void main(String[] args) {
        TestA a = new TestA(); / / 1
        TestB b = new TestB(); / / 2
        a.b = b; / / 3
        b.a = a; / / 4
        a = null; / / 5
        b = null; / / 6}}class TestA {
    public TestB b;

}

class TestB {
    public TestA a;
}Copy the code

Although both a and B are null, a and B have circular references, such that a and B are never collected

If you search the Internet for the keyword “reference counter”, you will usually get this result. However, it is not clear why a and B are not returned. Here is a brief explanation:

  • First line: the reference counter for TestA is incremented by 1, so that the number of references to TestA is 1

  • Line 2: The reference counter for TestB is incremented by 1, and the number of references for TestB is 1

  • Line 3: The reference counter for TestB is incremented by 1, and the number of references for TestB is 2

  • Line 4: The reference counter for TestA is incremented by 1, and the number of references for TestA is 2

The following figure shows memory distribution

Reference counter example -1

  • Line 5: Set the a variable to null so that it no longer refers to the reference in the heap, so TestA’s reference counter is reduced by 1 and the number of references to TestA is 1

  • Line 6: Set the b variable to null so that it no longer refers to the reference in the heap, so TestB’s reference counter is reduced by 1 and the number of references in TestB is 1

The following figure shows memory distribution

Reference counter example -2

  • TestA and TestB still hold references to each other in the heap. The reference counter is still not equal to 0. This is called a circular reference.

The above points of reference: www.zhihu.com/question/21…

Accessibility analysis

Although the above “reference counter” algorithm has the problem of “circular reference”, currently mainstream VIRTUAL machines use “GC Roots Tracing” algorithm to mark which objects can be reclaimed.

The algorithm starts from GC Roots and searches down the path, which is called reference chain. When an object is not connected to GC Roots by any reference chain, the object is not available. It’s called “unreachable objects”

GC Roots include:

  • A reference object in the virtual machine stack (the local variable table in the stack frame)

  • The object referenced by the static property entity in the method area

  • The object referenced by the constant in the method area

  • Objects referenced by JNI(Native methods) in the Native method stack

In fact, to actually declare an object dead in a root search algorithm, there are at least two marking processes:

  • If an object is found to have no reference chain connected to GC Roots after the root search, it will be marked for the first time and filtered based on whether it is necessary to execute the Finalize () method

  • When an object does not overwrite a Finalize () method, or a Finalize () method has already been called by a virtual machine, the virtual machine considers both cases unnecessary

  • If the object is determined to be necessary to finalize (), then the object will be placed ina Queue named F-Queue, and finalize () will be executed later by a low-priority Finalizer thread automatically set up by the virtual machine

  • Finalize () method is the last chance for objects to escape death (because the finalize () method of an object can only be called automatically by the system once at most). Later, GC will make a second small tag for objects in the F-queue. If you want to save yourself successfully in Finalize () method, Just make the object rereference any object on the chain in the Finalize () method to establish the association

  • If the object is not yet associated with any reference on the chain, it will be recycled

As shown in the figure below

Accessibility analysis

From the above picture,reference1,2,3 are gc roots

Reference1 points to Instance1,reference2 to Instance4, instance4 to Instance6, and reference3 to Instance2

So instance1,2,4, and 6 are all gc roots reachable and live objects that can’t be collected by the garbage collector

Instance3 and 5 do not have GC roots reachable, so they are not available and will be collected by the garbage collector

TestA and TestB hold references to each other, but do not have GC roots reachability. Therefore, under the reachability analysis algorithm, they will be collected by the garbage collector

Algorithms for garbage collection

Mark-clear algorithm

The algorithm is divided into two stages: “mark” and “clear”. The first stage is to mark the objects to be recycled, and after the completion of the marking, all the previously marked objects are recycled. It is the most basic collection algorithm. Subsequent collection algorithms are based on this idea and improve on its shortcomings

Mark-clear

Main disadvantages:

  • Efficiency issues: Need to mark and clear two scans

  • Space issues: Marking and cleaning creates a large number of discrete memory fragments, which may cause the program to be unable to find enough contiguous memory to allocate a large space and have to start another garbage collection operation early

Replication algorithm

Divide the available memory into two pieces by capacity, use one piece at a time, and when the memory is used up, copy the remaining objects onto the other piece, and then clean up the first piece at a time

copy

Advantages:

  • Only one piece of memory is operated at a time. When allocating memory, you do not need to consider the situation of memory fragmentation. You only need to move the pointer of the object and allocate memory in sequence

Disadvantages:

  • It shrinks the memory by half

  • Continuous copying of long-lived objects leads to a loss of efficiency (no understanding) (For objects with high survival, multiple copies are made, resulting in a loss of efficiency)

Mark-compression algorithm

It is the same as the mark-clear algorithm, except instead of clearing, the action is to move all objects towards one end and then clean up the objects beyond the boundary (the marked objects)

Mark-compress

Features:

  • The replication algorithm is more suitable for the new generation. In the old age, the survival rate of objects is relatively high. If more replication operations are performed, the efficiency will be lower

  • The marking process of this algorithm is the same as the marking process in the mark-clean algorithm, but the processing of the garbage objects after the marking is different. Instead of cleaning the recyclable objects directly, it moves all objects to one end and then directly cleans up the memory beyond the end boundary

Generational collection algorithm

The Java heap is divided into “new generation” and “old generation “, and different algorithms are used for different generations

Generational management

In the new generation, due to the very short life cycle of objects, a large number of objects will die and only a small number of objects will survive each garbage collection. Thus, the “replication algorithm” only needs to pay a small amount of the replication cost of living objects to complete the collection

In the old days, due to the long life cycle of the object, the survival rate is high, there is no extra space to allocate and guarantee it, so it must use “mark-clean algorithm” or “mark-compression algorithm” for recycling

Minor GC: Collecting memory from a young generation space (including Eden and Survivor regions) is called a Minor GC

Major GC: Clean up the old days

Full GC: Cleans the entire heap space – both young and old generations

Young generation: This is where all new objects are created. The young generation is divided into three parts (Enden and two Survivor zones, also called From and To), and when Eden’s zone fills up with objects,Minor GC is performed and all surviving objects are transferred To one of the Survivor zones (forms),Minor GC also checks surviving objects and moves them To another Survivor zone (To), so that there is always an empty survivor zone for a period of time, and after many GC cycles, Objects that still survive are moved to the old generation memory space. This is done by setting an age threshold before the young generation is eligible to be promoted to the old generation. Note that the two Survivor regions are symmetric and not sequential, and from and to are relative.

Old age: Objects that have not been cleared after N collections in the young generation are put into the old generation with a long life cycle. For older generations, a Major GC is performed to clean up. In some cases, the Full GC is triggered to clean up the entire heap memory

Meta-space: A portion of memory outside the heap, usually used directly by the system, to hold the runtime constant pool, etc., without significant impact on the meta-space

Garbage collector

The garbage collector is a concrete implementation of the memory reclamation algorithm. The Java Virtual Machine specification does not specify how the garbage collector should be implemented, so the garbage collector provided by different vendors and different versions of virtual machines may vary greatly

The Sun HotSpot VIRTUAL Machine version 1.6 includes the following collectors: Serial, ParNew, Parallel Insane, CMS, Serial Old, Parallel Old

These collectors work together in different combinations to complete garbage collection in different generations. Here is a brief introduction of garbage collectors:

Serial collector

The serial collector, the oldest, most stable, and efficient collector, may cause the program to pause for a long time, using only one thread to collect. New generation, old generation using serial recycling

New generation uses “copy algorithm”

In the old days, they used the tag compression algorithm.

Garbage collection “stops the world”

ParNew collector

Is the multithreaded version of Serial collector, the new generation parallel, the old Serial

New generation uses “copy algorithm”

In the old days, they used the tag compression algorithm.

Garbage collection “stops the world”

Paralle collector

Similar to the ParNew collector, but more concerned with the throughput of the system.

You can enable the Adaptive tuning policy by setting parameters. The vM collects performance monitoring information based on the current system running status and dynamically adjusts these parameters to provide the most appropriate pause time and maximum throughput

You can also use parameters to control how many milliseconds or percentage of time the GC takes

New generation uses “copy algorithm”

In the old days, they used the tag compression algorithm.

Parallel Old collector

Paralle is an older version of the Paralle collector, which uses multithreading and a “mark-collation algorithm”. This collector was only introduced in JDK1.6

CMS collector

Is based on the “mark-clear” algorithm, its operation process is relatively more complex than the previous collector, the whole process is divided into four steps, including:

  • CMS Initial Mark

  • CMS Concurrent Mark

  • Re-marking (CMS Remark)

  • CMS Concurrent sweep

Initial and concurrent tags still need to Stop the World.

The initial tag simply marks objects that GC Root can be directly associated with, which is fast.

The concurrent marking phase is the process of GC Root Tracing.

Relabelling this is to correct the part of the mark record during the concurrent marking that the mark changes as the user program continues to operate. The pause time during this phase is slightly longer than the initial marking phase, but much shorter than the concurrent marking phase

The collector thread can work with the user thread during the concurrent marking and concurrent cleaning processes that are the longest in the process, so in general, the CMS collector’s memory reclamation is performed concurrently with the user thread

Advantages: Concurrent collection, low pauses

Disadvantages: generates a lot of space debris, and the concurrency phase reduces throughput

G1 collector

In contrast to the CMS collector project, the G1 collector has the following features:

  • Spatial integration:

    • The G1 collector uses a mark-collation algorithm and does not generate space debris. Allocating large objects does not trigger the next GC prematurely because contiguous space cannot be found
  • Predictable pause:

    • Reducing pause times is a common focus of G1 and CMS

    • In addition to pursuing low pauses, G1 models predictable pause times.

    • The ability to explicitly specify that no more than N milliseconds should be spent on garbage collection within a time segment of N milliseconds is almost characteristic of real-time Java (RTSJ) garbage collection

The garbage collector mentioned above collects the entire New generation or the old generation, which is no longer the case with G1.

When using the G1 collector, the memory layout of the JAVA heap is quite different from that of other collectors. It divides the JAVA heap into independent regions of equal size. While the concept of new generation and old generation is retained, the new generation and old generation are no longer physically separated, but are all part of (and can be discontinuous) regions A collection of

Similar to ParNew, G1’s Cenozoic collector triggers a collection when Cenozoic usage reaches a certain percentage

Like CMS, the G1 collector has a brief pause while collecting old objects

The collection procedure is as follows:

  • Marking stage:

    • Initial mark, which stops the world event and triggers a normal Mintor GC(reclaim from young space)
  • Root Region Scanning :

    • Survivor zones are reclaimed during running the program, which must be done before the Young GC
  • Concurrent Marking :

    • Concurrent marking (and application concurrent execution) throughout the Java heap may be interrupted by the Young GC

    • If all objects in a region object are found to be garbage, that region is immediately reclaimed

    • At the same time, the object activity (the percentage of living objects in the region) is calculated for each region during concurrent marking.

  • Remark :

    • Again, there will be a short pause (STW)

    • Is used to collect concurrent marking phases, generating new garbage (concurrent phases are executed with the application)

    • G1 uses a faster initial snapshot algorithm than CMS: Snapshot-at-the-beginning (SATB)

  • Copy / Clean up :

    • Multithreaded cleanup of inactivated objects will have STW

    • G1 copies the live objects of the reclaimed region to the new region, clears the Remember Sets, simultaneously clears the reclaimed region, and returns it to the list of free regions

  • After copying/cleaning process:

    • Newly copied in Young Generation and recently copied in Old Generation regions, active objects from the reclaimed region have been copied by the collector

reference

<< In-depth understanding of the JVM virtual machine >>

The end of the

This article covers many points, including object references, how to define garbage objects, GC algorithms, existing garbage collectors, and so on.

Due to space and time reasons, each point is not covered in depth (of course, each point in this article is enough to write a book, hehe).

I will look for opportunities to discuss these points one by one.

In a word, “there is no end to learning”.

Code warehouse(Blog matching code)

  • Udf-starter: Base project, scaffolding, frame

  • Udf-sample: indicates an integration example


For the fastest updates, please follow our official account

For the fastest updates, please follow our official account