Overview of garbage collection and related algorithms

  1. The difference between Java and C++ lies in garbage collection technology and dynamic memory allocation. C++ does not have garbage collection technology and requires manual collection by programmers.
  2. Garbage collection is not a companion product of the Java language. Back in 1960, Lisp was the first language to use dynamic memory allocation and garbage collection techniques.
  3. There are three classic questions about garbage collection:
    • What memory needs to be reclaimed?
    • When is it recycled?
    • How to recycle?
  4. Garbage collection is Java’s signature capability and greatly improves development efficiency. Nowadays, garbage collection has almost become the standard of modern languages. Even after such a long time of development, Java garbage collection mechanism is still evolving. Different sizes of devices and application scenarios with different characteristics pose new challenges to garbage collection, which is certainly a hot topic for interviews.

Big factory interview question

The ant gold dress

  1. Which garbage collectors do you know, and their pros and cons, specifically CMS and G1?
  2. What are the JVM GC algorithms, and what is the collection algorithm used in the current JDK version?
  3. G1 collector What is GC? Why GC?
  4. Two kinds of determination methods of GC? Features of CMS collector and G1 collector

baidu

  1. Let’s talk about GC. Let’s talk about generational collection
  2. Garbage collection policies and algorithms

Tmall

  1. How does a JVM reclaim memory
  2. CMS features, garbage collection algorithms? What are their strengths and weaknesses? What are their common weaknesses?

drops

  1. What are the Java garbage collectors? Tell us about G1 application scenarios. How do you usually use garbage collectors together

jingdong

  1. Do you know which garbage collectors, their pros and cons, specifically CMS and G1,
  2. Including principles, processes, advantages and disadvantages. The realization principle of garbage collection algorithm

Ali.

  1. Let’s talk about garbage collection algorithms.
  2. When is garbage collection triggered?
  3. How to choose the right garbage collection algorithm?
  4. What are the three garbage collectors for the JVM?

Bytes to beat

  1. What are the common garbage collector algorithms, and what are the pros and cons of each?
  2. What do system.gc () and Runtime.gc() do?
  3. Java GC mechanism? What GC Roots do you have?
  4. Java object recycling method, recycling algorithm.
  5. CMS and G1, what kind of problem does CMS solve? Talk a little bit about the recycling process.
  6. CMS recycle paused several times, why did it pause twice?

What is garbage?

  1. Garbage is an object that has no pointer to it in the running program. This object is garbage that needs to be collected.
  2. An object is considered garbage when it can no longer be reached from any pointer in the running program.
  3. If garbage is not cleaned up in time, the memory space occupied by these garbage objects will be retained until the end of the application, and the reserved space cannot be used by other objects. It may even cause memory overflow.

More than a decade ago in the days of disk defragmentation

Why is GC needed?

To learn about GC, you need to understand why GC is needed in the first place. [Obtaining resources]

  1. For high-level languages, the basic understanding is that without garbage collection, memory will run out sooner or later, because constantly allocating memory without recycling is like constantly producing garbage without ever cleaning it up.
  2. In addition to releasing useless objects, garbage collection can also clear memory of record fragments. Defragmentation moves occupied heap memory to one end of the heap so that the JVM can allocate it to new objects.
  3. As applications deal with larger and more complex businesses and more users, they cannot be kept running without GC. The GC that often causes STW can’t keep up with the actual requirements, so there are constant attempts to optimize GC.

Early garbage collection

  1. In the early days of C/C++, garbage collection was largely manual. Developers can use the new keyword for memory allocation and the delete keyword for memory release. For example:

MibBridge *pBridge= newCmBaseGroupBridge ();// If the registration fails, use Delete to free the memory area occupied by the object
if(pBridge->Register (kDestroy)! = NO ERROR)deletePBridge;Copy the code
  1. This method can flexibly control the time of memory release, but it will bring management burden to the developer to frequently apply for and release memory. If an area of memory is forgotten to be collected due to programmer coding problems, a memory leak can occur, and garbage objects can never be cleaned up. As the system runs longer, garbage objects can continue to consume more memory until they overflow and cause the application to crash.
  2. With the garbage collection mechanism, the above code will most likely look like this
MibBridge *pBridge=new cmBaseGroupBridge(); 
pBridge->Register(kDestroy);
Copy the code
  1. Now, in addition to Java, C#, Python, Ruby and other languages all use the idea of automatic garbage collection, which is also the future trend. It can be said that this automatic memory allocation and collection method has become a necessary standard of modern development languages.

Java garbage collection mechanism

Automatic memory management

Website: docs.oracle.com/javase/8/do…

Advantages of automatic memory management

  1. Automatic memory management reduces the risk of memory leaks and spills by eliminating the need for developers to manually allocate and reclaim memory
  2. Without the garbage collector, Java would be like CPP, with dangling Pointers, wild Pointers, and leaks.
  3. The automatic memory management mechanism frees programmers from the heavy memory management and allows them to concentrate more on business development

Concerns about automatic memory management

  1. Automatic memory management is a black box for Java developers, and relying too heavily on “automatic” can be a disaster, at its worst weakening the Java developer’s ability to locate and resolve problems when a program runs out of memory.
  2. At this point, it is important to understand how the JVM automatically allocates and reclaims memory. Only with a true understanding of how the JVM manages memory can we quickly locate and resolve outofMemoryErrors based on the error exception log.
  3. These “automated” technologies need to be monitored and tuned when it comes to troubleshooting memory spills and leaks, and when garbage collection becomes a bottleneck for higher concurrency.

What areas should be concerned about recycling?

  1. The garbage collector can collect from the young generation, the old generation, and even the full stack and method area.
  2. The Java heap is the focus of the garbage collector
  3. In terms of times:
    1. Frequent collection of Young areas
    2. Less collection of Old zones
    3. Almost no Perm area is collected

Garbage collection related algorithms

Marking phase: reference counting algorithm

Mark the purpose of the phase

Garbage marking phase: The main purpose is to determine whether the object is alive or not

  1. Almost all Java object instances are stored in the heap, and before the GC can perform garbage collection, it first needs to distinguish between alive and dead objects in memory. Only objects marked as dead are freed by GC during garbage collection, so this process can be referred to as the garbage marking phase.
  2. So how exactly do you mark a dead object in the JVM? Simply put, an object is declared dead when it is no longer referenced by any living object.
  3. There are two ways to judge the survival of objects: reference counting algorithm and reachability analysis algorithm.

Reference counting algorithm

  1. The Reference Counting algorithm is relatively simple and stores a Reference counter attribute of an integer for each object. Used to record when an object is referenced.
  2. For an object A, if any object references A, then A’s reference counter is incremented by 1. When a reference is invalid, the reference counter is decayed by one. As long as the reference counter of object A has A value of 0, it indicates that object A can no longer be used and can be reclaimed.
  3. Advantages: simple implementation, garbage object easy to identify; High judgment efficiency and no delay in recovery.
  4. Disadvantages:
    1. It requires separate fields to store counters, which adds to the storage overhead.
    2. Each assignment requires updating the counter, along with addition and subtraction, which adds time overhead.
    3. A serious problem with reference counters is that they cannot handle circular references. This is a fatal flaw that prevents such algorithms from being used in the Java garbage collector.

A circular reference

When the pointer to P breaks, the internal references form a loop, and the counters are still 1 and cannot be reclaimed. This is a circular reference, causing a memory leak

Proof: Java does not use reference counting algorithms[Obtaining resources]

/** * -xx :+PrintGCDetails */
public class RefCountGC {
    // The only function of this member attribute is to take up some memory
    private byte[] bigSize = new byte[5 * 1024 * 1024];//5MB

    Object reference = null;

    public static void main(String[] args) {
        RefCountGC obj1 = new RefCountGC();
        RefCountGC obj2 = new RefCountGC();

        obj1.reference = obj2;
        obj2.reference = obj1;

        obj1 = null;
        obj2 = null;
        // Explicitly perform garbage collection
        Can obj1 and obj2 be collected?System.gc(); }}Copy the code

  • If you’re not careful, just take itobj1.referenceandobj2.referenceSet to null. The two pieces of memory in the Java heap remain references to each other and cannot be reclaimed

When there is no GC

Comment out the following lines of code before it’s too late

System.gc();Comment out this line of code
CODE
Heap
 PSYoungGen      total 38400K, used 14234K [0x00000000d5f80000.0x00000000d8a00000.0x0000000100000000)
  eden space 33280K, 42% used [0x00000000d5f80000.0x00000000d6d66be8.0x00000000d8000000)
  from space 5120K, 0% used [0x00000000d8500000.0x00000000d8500000.0x00000000d8a00000)
  to   space 5120K, 0% used [0x00000000d8000000.0x00000000d8000000.0x00000000d8500000)
 ParOldGen       total 87552K, used 0K [0x0000000081e00000.0x0000000087380000.0x00000000d5f80000)
  object space 87552K, 0% used [0x0000000081e00000.0x0000000081e00000.0x0000000087380000)
 Metaspace       used 3496K, capacity 4498K, committed 4864K, reserved 1056768K
  class space    used 387K.capacity 390K.committed 512K.reserved 1048576K

Process finished with exit code 0
Copy the code

The GC

Open a comment for that line of code

[GC (System.gc()) [PSYoungGen: 13569K->808K(38400K)] 13569K->816K(125952K), 0.0012717 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[Full GC (System.gc()) [PSYoungGen: 808K->0K(38400K)] [ParOldGen: 8K->670K(87552K)] 816K->670K(125952K), [Metaspace: 3491K->3491K(1056768K)], 0.0051769 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
Heap
 PSYoungGen      total 38400K, used 333K [0x00000000d5f80000.0x00000000d8a00000.0x0000000100000000)
  eden space 33280K, 1% used [0x00000000d5f80000.0x00000000d5fd34a8.0x00000000d8000000)
  from space 5120K, 0% used [0x00000000d8000000.0x00000000d8000000.0x00000000d8500000)
  to   space 5120K, 0% used [0x00000000d8500000.0x00000000d8500000.0x00000000d8a00000)
 ParOldGen       total 87552K, used 670K [0x0000000081e00000.0x0000000087380000.0x00000000d5f80000)
  object space 87552K, 0% used [0x0000000081e00000.0x0000000081ea7990.0x0000000087380000)
 Metaspace       used 3498K, capacity 4498K, committed 4864K, reserved 1056768K
  class space    used 387K, capacity 390K, committed 512K, reserved 1048576K

Process finished with exit code 0
Copy the code

1. It is obvious from the printed log that GC has been performed

2. If the reference counting algorithm is used, the two objects will not be recycled. Now that both objects are reclaimed, Java does not use a reference counting algorithm for markup.

summary

  1. Reference counting algorithms are the recycle of choice for many languages, such as Python, which has become more popular due to artificial intelligence, and supports both reference counting and garbage collection.
  2. Which is optimal depends on the scenario, and there are large-scale attempts to retain only reference counting mechanisms in practice to improve throughput.
  3. Java did not choose reference counting because of the basic difficulty of handling circular reference relationships.
  4. How does Python solve circular references?
    • Manual dereference: It is well understood that when appropriate, the dereference relationship.
    • Use weakreference WeakRef, a standard library provided by Python that is designed to address circular references.

Marking stage: Reachability analysis algorithm

Reachability analysis algorithm: also known as root search algorithm, traceable garbage collection

  1. Compared with reference counting algorithm, reachability analysis algorithm not only has the characteristics of simple implementation and efficient execution, but also can effectively solve the problem of cyclic reference in reference counting algorithm and prevent the occurrence of memory leakage.
  2. In contrast to the reference counting algorithm, the reachability analysis is chosen by Java and C#. This type of Garbage Collection is often called Tracing Garbage Collection

The idea of reachability analysis[Obtaining resources]

  • The “GCRoots” root set is a set of references that must be active
  • The basic idea is as follows:
  1. The reachability analysis algorithm takes the root object set (GCRoots) as the starting point and searches whether the target object connected by the root object set is reachability from top to bottom.
  2. After using the reachability analysis algorithm, the living objects in memory are directly or indirectly connected by the root object set, and the path searched is called the Reference Chain.
  3. If the target object is not connected by any reference chain, it is unreachable, which means that the object is dead and can be marked as a garbage object.
  4. In the reachability analysis algorithm, only the objects that can be connected directly or indirectly by the root object set are the viable objects.

What elements can GC Roots be?

  1. Object referenced in the virtual machine stack
    • For example: parameters, local variables, etc. used in methods called by each thread.
  2. Objects referenced by JNI (commonly referred to as local methods) in the local method stack
  3. The object referenced by the class static property in the method area
    • Example: Java class reference type static variable
  4. The object referenced by the constant in the method area
    • For example: references in a StringTable constant pool
  5. All objects held by synchronized
  6. Internal references of the Java VIRTUAL machine.
    • Class objects corresponding to basic data types, some resident exception objects (NullPointerException, OutofMemoryError), system Class loaders.
  7. Jmxbeans that reflect Java virtual machine internals, callbacks registered in JVMTI, local code caches, and so on.

  1. In a word, all references to the heap space except those surrounding the heap space, such as virtual machine stack, local method stack, method area, string constant pool, etc., can be used as GC Roots for accessibility analysis

  2. In addition to these fixed COLLECTION of GC Roots, other objects can be added “temporarily” to form a complete collection of GC Roots, depending on the garbage collector selected by the user and the memory region currently reclaimed. Such as:

    Generational collection

    And partial recovery (PartialGC).

    • If you do garbage collection for only one area of the Java heap (for example: Typical only for new generation), must consider the memory region is a virtual machine implementation details, more is not isolated, sealed the area of the object can be referenced in other parts of the object, then you need will be along with all the associated region object also add to the collection of GC Roots to consider, to ensure the accuracy of the accessibility analysis.

tip

Since Root stores variables and Pointers on a stack, a pointer that holds objects in the heap but is not in the heap itself is Root.

Pay attention to

  1. If a reachability analysis algorithm is used to determine whether memory is retrievable, the analysis must be performed in a consistent snapshot. If this is not satisfied, the accuracy of the analysis results cannot be guaranteed.
  2. This is also an important reason why GC must “Stop The World”. Even CMS collectors, which claim to be (almost) non-pausing, are required to pause when enumerating root nodes.

Finalization mechanisms for objects

Finalize () method mechanism

Callback function before object destruction: Finalize ()

  1. The Java language provides an object finalization mechanism that allows developers to provide custom processing logic for objects before they are destroyed.
  2. When the garbage collector finds that there is no reference to an object, that is, before garbage collection of the object, the Finalize () method of the object is always called.
  3. The Finalize () method allows to be overridden in subclasses for resource release when objects are reclaimed. This method usually does some resource freeing and cleaning, such as closing files, sockets, and database connections.

Object class finalize() source code

// Wait to be overwritten
protected void finalize(a) throws Throwable { }` </pre>
Copy the code
  1. Never actively call a Finalize () method of an object. Instead, let the garbage collector call it. There are three reasons:
    1. Objects may be resurrected when finalize().
    2. The execution time of Finalize () method is not guaranteed. It is completely determined by GC thread. In extreme cases, if GC does not happen, Finalize () method will have no chance to execute.
    3. A bad Finalize () can seriously impact GC performance. For example, Finalize is an infinite loop
  2. In terms of function, finalize() method is similar to the destructor in C++, but Java adopts automatic memory management mechanism based on garbage collector, so finalize() method is fundamentally different from the destructor in C++.
  3. Finalize () method corresponds to a Finalize thread, because of its low priority, even if the method is called actively, it will not be directly recycled

To be or not to be?

Objects ina virtual machine generally have three possible states, thanks to finalize().

  1. If an object is unreachable from all root nodes, it is no longer in use. In general, this object needs to be reclaimed. But in fact, they are not necessarily dead. They are on probation.

    An unreachable object may “resurrect” itself under certain conditions

    If so, it does not make sense to recycle it immediately. To do this, define three possible states for objects in the virtual machine. As follows:

    1. Accessible: This object can be reached from the root node.
    2. Resurrectible: All references to objects are released, but objects may be resurrected in Finalize ().
    3. Untouchable: Objects will enter the untouchable state when Finalize () is called and not resurrected. Untouchable objects cannot be resurrected, because Finalize () will only be called once.
  2. The above three states are distinguished by finalize() method. Objects can only be reclaimed if they are not reachable.

The specific process

To determine whether an object objA is recyclable, we must go through at least two marking processes:

  1. The first mark is made if the object objA to GC Roots has no chain of references.
  2. Filter to determine if this object needs to be finalize()
    1. If the object objA does not override the Finalize () method, or if the Finalize () method has already been called by a virtual machine, then the virtual machine considers it “unnecessary to execute” and objA is judged untouchable.
    2. If the object objA overrides the Finalize () method and has not yet executed it, then the object will be inserted into the F-Queue and its Finalize () method will be triggered by a low-priority Finalizer thread automatically created by the virtual machine.
    3. The Finalize () method is the last chance for objects to escape death, and objects in the F-queue are later marked a second time by GC. If objA is linked to any object in the reference chain in the Finalize () method, then on the second markup, objA is removed from the “to be collected” collection. After that, the object will again have no references. In this case, the Finalize () method will not be called again, and the object will directly become the untouchable state, that is, the Finalize () method of an object will only be called once.

View the Finalizer thread using JVisual VM

The code shows that the Finalize () method can resurrect objects

We override the CanReliveObj class’s Finalize () method to refer obj to the current class object this when calling its Finalize () method

/** * Tests the Finalize () method of the Object class, the finalization mechanism for objects. * * /
public class CanReliveObj {
    public static CanReliveObj obj;// Class variable, belonging to GC Root

    // This method can only be called once
    @Override
    protected void finalize(a) throws Throwable {
        super.finalize();
        System.out.println("Call finalize() method overridden by the current class");
        obj = this;// The current object to be collected is linked to obj, an object on the reference chain, in finalize()
    }

    public static void main(String[] args) {
        try {
            obj = new CanReliveObj();
            // The object successfully saves itself for the first time
            obj = null;
            System.gc();// Call the garbage collector
            System.out.println("First GC");
            // Because the Finalizer thread has a low priority, pause for 2 seconds to wait for it
            Thread.sleep(2000);
            if (obj == null) {
                System.out.println("obj is dead");
            } else {
                System.out.println("obj is still alive");
            }
            System.out.println("Second GC");
            // The following code is exactly the same as the above, but the self-rescue fails
            obj = null;
            System.gc();
            // Because the Finalizer thread has a low priority, pause for 2 seconds to wait for it
            Thread.sleep(2000);
            if (obj == null) {
                System.out.println("obj is dead");
            } else {
                System.out.println("obj is still alive"); }}catch(InterruptedException e) { e.printStackTrace(); }}}Copy the code

If you comment out the Finalize () method

// This method can only be called once
   @Override
   protected void finalize(a) throws Throwable {
       super.finalize();
       System.out.println("Call finalize() method overridden by the current class");
       obj = this;// The current object to be collected is linked to obj, an object on the reference chain, in finalize()
   }
Copy the code

Output result:

The first1Gc obj is dead2Gc obj is dead '</pre>Copy the code

Let go of the Finalize () method

Output result:

The first1The next GC call to finalize() method obj is still alive2Gc obj is deadCopy the code

The first self-rescue succeeds, but the second self-rescue fails because the Finalize () method will be executed only once

Tracing GC Roots of MAT and JProfiler

MAT is introduced

  1. MAT, short for Memory Analyzer, is a powerful Java heap Memory Analyzer. Used to find memory leaks and view memory consumption.
  2. MAT is based on Eclipse and is a free performance analysis tool.
  3. You can download and use MA at www.eclipse.org/mat/…

1. While Jvisualvm is powerful, MAT is better for memory analysis

2. This section is mainly to analyze what GC Roots are in real time, and a dump file is needed in the middle

How to obtain dump files

Method 1: Use jmap for command lines

Method 2: Use JVisualVM

  1. The captured heap dump file is a temporary file that is automatically deleted when JVisualVM is shut down. To save it, you need to save it as a file. Heap dump can be captured by:
  2. The procedure is shown below

Capture dump example

Use JVisualVM to capture heap dumps

Code:

  • NumList and Birth took the first memory snapshot for GC Roots
  • NumList and Birth are then null, the reference object is reclaimed, and the second time the memory snapshot is taken, it is no longer GC Roots
public class GCRootsTest {
    public static void main(String[] args) {
        List<Object> numList = new ArrayList<>();
        Date birth = new Date();

        for (int i = 0; i < 100; i++) {
            numList.add(String.valueOf(i));
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        System.out.println("Data added, please do:");
        new Scanner(System.in).next();
        numList = null;
        birth = null;

        System.out.println("NumList, birth have been null, please operate:");
        new Scanner(System.in).next();

        System.out.println("The end"); }}Copy the code

How do I take a heap memory snapshot

1. Perform the first step and then stop to generate the step dump file

2, Click heap Dump

3. Right-click – > Save as

4. Enter the command and continue to execute the program

5. We then take the second heap memory snapshot

Use MAT to view heap memory snapshots

1, Open MAT, select File – > Open File, Open the first two dump files

You can also click Open Heap Dump

2. Choose Java Basics — > GC Roots

3. GC Roots contains two local variables we defined for the first heap snapshot, of type ArrayList and Date, Total:21

4. Open the second dump file and on the second memory snapshot, the objects referenced by the two local variables are released so they are no longer GC Roots, as can be seen from Total Entries = 19 (two GC Roots missing)

JProfiler GC Roots tracing

1. In actual development, we rarely look at all GC Roots. It is common to check the GC Root of one or more objects, a process called GC Roots tracing

2. Next, we use JProfiler to demonstrate GC Roots tracing

Again, use the following code

public class GCRootsTest {
    public static void main(String[] args) {
        List<Object> numList = new ArrayList<>();
        Date birth = new Date();

        for (int i = 0; i < 100; i++) {
            numList.add(String.valueOf(i));
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        System.out.println("Data added, please do:");
        new Scanner(System.in).next();
        numList = null;
        birth = null;

        System.out.println("NumList, birth have been null, please operate:");
        new Scanner(System.in).next();

        System.out.println("The end"); }}Copy the code

1.

2,

You can see the color turns green, you can see the change dynamically

3. Right-click the object and select Show Selection In Heap Walker to view an object individually

4. The Incoming References to trace GC Roots

Click Show Paths To GC Roots and select default Settings in the pop-up screen

OOM JProfiler analysis

Here is a brief introduction, which will be explained in more detail in a later chapter.

/ - Xms8m - Xmx8m * - * * * XX: + HeapDumpOnOutOfMemoryError is the meaning of the parameters when the program appears OOM is in the current project directory to generate a dump file * /
public class HeapOOM {
    byte[] buffer = new byte[1 * 1024 * 1024];//1MB

    public static void main(String[] args) {
        ArrayList<HeapOOM> list = new ArrayList<>();

        int count = 0;
        try{
            while(true) {list.add(newHeapOOM()); count++; }}catch (Throwable e){
            System.out.println("count = "+ count); e.printStackTrace(); }}}Copy the code

Program output log

com.atguigu.java.HeapOOM
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid14608.hprof ...
java.lang.OutOfMemoryError: Java heap space
	at com.atguigu.java.HeapOOM.<init>(HeapOOM.java:12)
	at com.atguigu.java.HeapOOM.main(HeapOOM.java:20)
Heap dump file created [7797849 bytes in 0.010 secs]
count = 6
Copy the code

Open the dump file

1. Look at this huge object

2. Identify the offending code in the main() thread

Clear stage: mark-clear algorithm

Garbage removal phase

  • After successfully separating live and dead objects in memory, the next task of the GC is to perform garbage collection to free up the memory space occupied by the unused objects so that there is enough available memory to allocate memory for the new objects. The three garbage collection algorithms that are currently common in the JVM are fetch Resources
  1. Mark-sweep algorithm
  2. Copying algorithms
  3. Mark-compact algorithm

background

Mark-sweep is a very basic and common garbage collection algorithm, which was proposed by J.McCarthy et al in 1960 and applied to Lisp.

Implementation process

When available memory in the heap is exhausted, the entire program (also known as stop the world) is stopped and two tasks are performed, the first is marked and the second is cleared

  1. Tag: Collector traverses from the reference root node, marking all referenced objects. It is usually recorded as reachable in the Header of an object.
    • Note: it marks referenced objects, which are reachable objects, not garbage objects that are about to be cleared
  2. Cleanup: The Collector traverses the heap memory linearly from beginning to end and reclaims an object if it is not marked as reachable in its headers

Disadvantages of the mark-clear algorithm

  1. The efficiency of the tag – clearing algorithm is not high
  2. During GC, the entire application needs to be stopped and the user experience is poor
  3. The free memory cleared in this way is discontinuous, resulting in internal fragmentation and the need to maintain a free list

Note: What is clearance?

This does not mean that the object addresses need to be cleared are stored in the free address list. The next time a new object needs to be loaded, determine whether there is enough space for garbage, and if so, store it (that is, overwrite the original address).

The free list was mentioned when allocating memory for objects:

  1. If the memory is neat
    • Memory is allocated by means of pointer collision
  2. If the memory is not tidy
    • The virtual machine needs to maintain a free list
    • Use the free list to allocate memory

Cleanup phase: copy algorithm

background

  1. In order to address the shortcomings of mark-sweep algorithms in garbage collection efficiency, M.L. Insky published a famous paper in 1963, Lisp Garbage Collector CA Lisp Garbage Collector Algorithm Using Serial Secondary Storage. The algorithms m.L. Insky described in this paper are known as Copying algorithms, which m.L. Insky himself successfully introduced into an implementation of Lisp.

core idea

The living memory space is divided into two pieces and only one piece is used each time. During garbage collection, the living objects in the used memory are copied to the unused memory block. After that, all objects in the used memory block are cleared, the roles of the two memory blocks are swapped, and finally garbage collection is completed

In the new generation, the replication algorithm is used. The surviving objects in Eden and S0 are copied to S1 as a whole

Advantages and disadvantages of replication algorithms

advantages

  1. No marking and cleaning process, simple implementation, efficient operation
  2. After copying the past to ensure the continuity of space, there will be no “fragmentation” problem.

disadvantages

  1. The disadvantage of this algorithm is also obvious, that is, it requires twice the memory space.
  2. For GC with a large number of regions, such as G1, replication rather than movement means that the GC needs to maintain object reference relationships between regions, which is not a small amount of memory or time

Application scenarios of the replication algorithm

  1. If there are a lot of junk objects in the system, the number of viable objects to be copied by the replication algorithm is not too large and the efficiency is high
  2. When a large number of objects survive in the old age, there will be many objects to copy, which will be inefficient
  3. In the new generation, garbage collection for conventional applications can typically reclaim 70 to 99 percent of memory space at a time. Recycling is cost-effective. So now commercial virtual machines are using this collection algorithm to recycle the new generation.

Clear phase: mark-compress algorithm

Mark-compress (or mark-tidy, Mark-compact) algorithm

background

  1. The high efficiency of the replication algorithm is based on the premise of fewer live objects and more garbage objects. This happens a lot in the new generation, but in the old age, it’s more common for most objects to be living objects. If the replication algorithm is still used, the cost of replication will be high due to the large number of living objects. Therefore, due to the nature of old-time garbage collection, other algorithms need to be used.
  2. The mark-clean algorithm can be used in the old days, but it is not only inefficient, but also generates memory fragmentation after a collection, so JVM designers need to improve on this. The mark-compact algorithm was born.
  3. Around 1970, g.L. Steele, C.J.Chene, D.S. Ise and other researchers published mark-compression algorithms. In many modern garbage collectors, mark-compression algorithms or improved versions of them are used.

Implementation process

  1. The first stage marks all referenced objects from the root node, as in the mark-clearing algorithm
  2. The second phase compresses all living objects to one end of memory and discharges them in sequence. After that, clean up all the space outside the boundary.

Comparison between mark-compression algorithm and mark-clear algorithm

  1. The end result of the mark-sweep-compact algorithm is the same as the memory defragmentation once the mark-sweep-compact algorithm is complete, so it can also be called a Mark-sweep-compact algorithm.
  2. The essential difference between them is that the mark-sweep algorithm is a non-mobile recovery algorithm, while the mark-compression algorithm is mobile. Whether or not to move a reclaimed live object is a risky decision with both advantages and disadvantages.
  3. As you can see, marked living objects are sorted by memory address, while unmarked memory is cleaned up. This way, when we need to allocate memory for new objects, the JVM only needs to hold the starting address of memory, which is significantly less overhead than maintaining a free list.

Advantages and disadvantages of mark-compression algorithms

advantages

  1. Eliminating the fragmentation of memory in the mark-clean algorithm, the JVM only needs to hold the starting address of memory when allocating memory to a new object.
  2. Eliminates the high cost of halving memory in the replication algorithm.

disadvantages

  1. In terms of efficiency, the mark-collation algorithm is lower than the copy algorithm.
  2. As well as moving objects, if the object is referenced by another object, you need to adjust the address of the reference (because the HotSpot VIRTUAL machine is not a handle pool, but a direct pointer).
  3. You need to pause the user application throughout the movement. Namely: STW

Garbage collection algorithm summary

Compare the three algorithms of clearing stage

  1. In terms of efficiency, the replication algorithm is the leader, but it wastes too much memory.
  2. In order to take into account the three indicators mentioned above, the mark-collation algorithm is relatively smoother, but its efficiency is not satisfactory. It has one more mark stage than the copy algorithm, and one more memory collation stage than the mark-clean algorithm.
Mark clear Tag to sort out copy
rate medium The slowest The fastest
The space overhead Less (but will pile up debris) Less (do not accumulate debris) Usually requires twice as much space as live objects (no shards)
Moving objects no is is

Generational collection algorithm[Obtaining resources]

Q: Isn’t there an optimal algorithm?

A: No, there is no best algorithm, only the most suitable algorithm

Why use a generational collection algorithm

  1. None of these algorithms can completely replace other algorithms, they all have their own unique advantages and characteristics. Generational collection algorithm came into being.
  2. The generational collection algorithm is based on the fact that the life cycles of different objects are different. Therefore, objects with different life cycles can be collected in different ways to improve collection efficiency. The Java heap is typically divided into new generation and old generation, so that different collection algorithms can be used according to the characteristics of each generation to improve the efficiency of garbage collection.
  3. During the running of a Java program, a large number of objects are generated, some of which are related to business information:
    • For example, Session objects, threads, and Socket connections in Http requests are directly linked to services, so their life cycle is relatively long.
    • But there are some objects, mainly temporary variables generated in the process of running the program, the life cycle of these objects will be relatively short, such as: String object, because of its invariable class characteristics, the system will produce a large number of these objects, some objects can even be recycled only once.

At present, almost all GC uses generational mobile algorithm to perform garbage collection

In HotSpot, based on the concept of generation, the memory reclamation algorithm used by GC must combine the characteristics of both the young generation and the old generation.

  1. Young Gen
    • Characteristics of the young generation: the area is smaller than that of the old generation, the object life cycle is short, the survival rate is low, and the recycling is frequent.
    • In this case, the recovery of the replication algorithm is the fastest. The efficiency of the replication algorithm depends only on the size of the current living object, so it is suitable for young generation recycling. The problem of low memory utilization of the replication algorithm is alleviated by the two survivor designs in hotspot.
  2. Tenured Gen
    • Characteristics of the old age: large area, long life cycle of objects, high survival rate, recycling less frequent than the young generation.
    • In this case, there are a large number of objects with high survival rates, and the replication algorithm becomes obviously unsuitable. It is usually implemented by mark-clean or a mixture of mark-clean and mark-clean.
      • The cost of the Mark phase is proportional to the number of viable objects.
      • The overhead of the Sweep phase is positively related to the size of the area managed.
      • The overhead of the Compact phase is proportional to the data of the living object.
  3. Take the CMS collector in HotSpot as an example. CMS is implemented based on Mark-sweep and has a high recycling efficiency for objects. For the fragmentation problem, CMS uses Serial Old collector based on mark-Compact algorithm as a compensation measure: When memory collection is poor (due to a Concurrent Mode Failure due to fragmentation), Serial Old is used to perform Full GC to defragment Old memory.
  4. The generational idea is widely used by existing virtual machines. Almost all garbage collectors distinguish between the new generation and the old generation

Incremental collection algorithm and partitioning algorithm

Incremental collection algorithm

With the existing algorithm, the application will be in a Stop the World state during garbage collection. In the Stop the World state, all threads of the application are suspended, suspending all normal work and waiting for the garbage collection to complete. If garbage collection takes too long, the application will be suspended for a long time, which will seriously affect user experience or system stability. In order to solve this problem, the research on real-time garbage collection algorithm directly led to the birth of Incremental Collecting algorithm.

Basic idea of incremental collection algorithm

  1. If processing all the garbage at once requires a long system pause, alternate the garbage collection thread with the application thread. Each time, the garbage collection thread collects only a small area of memory space, and then switches to the application thread. Repeat until garbage collection is complete.
  2. In general, incremental collection algorithms are still based on traditional mark-sweep and copy algorithms. Incremental collection algorithms allow garbage collection threads to mark, clean, or copy in a phased manner by properly handling conflicts between threads

Disadvantages of incremental collection algorithms

Using this approach reduces system downtime because application code is also intermittently executed during garbage collection. However, the overall cost of garbage collection increases due to the consumption of thread switches and context transitions, resulting in a decrease in system throughput.

Partitioning algorithm

This is primarily for the G1 collector

  1. In general, under the same conditions, the larger the heap space, the longer the time required for a GC and the longer the pauses associated with GC. To better control the pauses generated by GC, reduce pauses generated by a GC by splitting a large memory area into smaller chunks and reasonably reclaiming several cells at a time, rather than the entire heap space, based on the target pause times.
  2. The generation algorithm divides the heap space into two parts according to the life cycle of the object, and the partition algorithm divides the whole heap space into continuous different cells. Each area is used independently and recycled independently. The advantage of this algorithm is that it can control how many cells are recycled at a time.

Write in the last

Note that these are just the basic algorithm ideas, the actual GC implementation process is much more complex, currently under development frontier GC are composite algorithms, and both parallel and concurrent.

In the end, I wish you all success as soon as possible, get satisfactory offer, fast promotion and salary increase, and walk on the peak of life.

If you can, please give me a three support me?????? [Obtain information]