An overview of the

Garbage Collection (GC) refers to the various parts of Java memory, including program counters, virtual machine stacks, and local method stacks, which are born and die with a thread, and the stack frames in the stack are executed methodically as methods enter and exit. How much memory is allocated in each stack frame is basically known when the class structure is determined, so the memory allocation and reclamation in these areas are deterministic. In this area, there is no need to think too much about how to recycle. When the method ends or the thread ends, the memory will naturally follow the reclamation.


Two areas, the Java heap and the method area, have uncertainty: An interface of multiple implementation classes need memory may not be the same, a method to implement the branch memory needed for different conditions may also be different, only in the operating range, we can know the program will create those objects, creating many objects, this part is dynamic memory allocation and recovery. The garbage collector is concerned with how this portion of memory is managed.

1. Memory allocation and reclaiming policy


The allocation of object memory is all allocated on the heap

The Java Heap is the primary area managed by the Garbage collector and is therefore also known as the Garbage Collected Heap. From the point of view of garbage collection, the Java heap can be subdivided into: new generation and old generation: Eden space, From Survivor, To Survivor space, etc. The purpose of further partitioning is to better reclaim memory, or to allocate memory faster.

Basic structure of heap space:

As shown in the figure, Eden zone, From Survivor0 zone and To Survivor1 zone all belong To Cenozoic era, and Old Memory belongs To Old age

1.1 Objects are preferentially allocated in Eden

In most cases, objects will be allocated first in Eden. When Eden does not have enough space to allocate, the VIRTUAL machine will initiate a Minor GC. After a garbage collection, if the object is still alive, it will enter S0 or S1, and the age of the object will be increased by one. When it reaches a certain age (15 by default), it is placed in the old age. The age threshold for storing an object to the old age can be set by using -xx :MaxTenuringThreshold.

Case study:

public class GCTest {

    public static void main(String[] args) {
        byte[] allocation1, allocation2,allocation3,allocation4,allocation5;
        allocation1 = new byte[32000*1024];
        allocation2 = new byte[32000*1024];
        allocation3 = new byte[1000*1024];
        allocation4 = new byte[1000*1024];
        allocation5 = new byte[1000*1024]; }}Copy the code

GC output:

[GC (Allocation Failure) [PSYoungGen: 37244K->928K(76288K)] 37244K->32936K(251392K), 0.0229665 secs] [Times: user=0.10 sys=0.01, real=0.02 secs] 
Heap
 PSYoungGen      total 76288K, used 36825K [0x000000076ab00000.0x0000000774000000.0x00000007c0000000)
  eden space 65536K, 54% used [0x000000076ab00000.0x000000076ce0e518.0x000000076eb00000)
  from space 10752K, 8% used [0x000000076eb00000.0x000000076ebe8000.0x000000076f580000)
  to   space 10752K, 0% used [0x0000000773580000.0x0000000773580000.0x0000000774000000)
 ParOldGen       total 175104K, used 32008K [0x00000006c0000000.0x00000006cab00000.0x000000076ab00000)
  object space 175104K, 18% used [0x00000006c0000000.0x00000006c1f42010.0x00000006cab00000)
 Metaspace       used 3138K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 345K.capacity 388K.committed 512K.reserved 1048576K
Copy the code

As you can see from the output, the virtual machine will initiate a Minor GC, because the FROM region has occupied 8% and the old generation has occupied 18%. So there will be no Full GC. After Minor GC is performed, memory is allocated in Eden region if the later allocated object has Eden region.


1.2 Large objects directly into the old age


Large objects are Java objects that require a large amount of contiguous memory space. The most typical large objects are long strings or arrays with a large number of elements, such as the byte[] array in the above example. We should avoid them when writing programs.


The reason to avoid large objects in the Java VIRTUAL machine is that when allocating space, it tends to trigger garbage collection in advance to obtain enough contiguous space to place them, and when copying objects, large objects imply high memory copying overhead. The HotSpot VIRTUAL machine provides the -xx: PretenureSizeThreshold parameter, which specifies that objects larger than this value are allocated directly in the older generation. This is to avoid replication between Eden and two Survivor zones, resulting in a large number of replication operations.


1.3 Long-lived objects will enter the old age


Most collectors in the HotSpot VIRTUAL machine use generational phones to manage the heap memory, so memory collection must be able to decide which live objects should be stored in the new generation and your live objects in the old generation. To do this, the virtual machine defines an Age counter for each object, which is stored in the object header. The object is usually born in Eden, and if it survives the first Minor GC and can be held by Survivor, it is moved to Survivor space. If the age is set to 1 year, the object increases its age by one year if it has not survived a Minor GC in a Survivor zone, and when it reaches a certain age (15 by default) it is promoted to the old age. The age threshold for the object to be promoted to the old age can be set by using -xx: MaxTenuringThreshold.


1.4 Dynamic object age determination


The HotSpot virtual machine does not always require objects to be aged –XX in order to be able to better accommodate the memory of different applications: MaxTenuringThreshold can be promoted to the old age. If the total size of all objects of the same age in Survivor space is greater than half of Survivor space, objects older than or equal to this age can directly enter the old age without waiting until -xx: MaxTenuringThreshold Specifies the required age.

Case study:

public class GCTest {

    private static final int _1MB = 1024 * 1024;


    public static void main(String[] args) {
        testTenuringThreshold2();
    }
    /** * VM parameters: -verbose:gc -Xms20M -Xmx20M -Xmn10M -XX:+PrintGCDetails -XX:SurvivorRatio=8 * -XX:MaxTenuringThreshold=15 * -XX:+PrintTenuringDistribution */
    @SuppressWarnings("unused")
    public static void testTenuringThreshold2(a)
    {
        byte[] allocation1, allocation2, allocation3, allocation4;
        allocation1 = new byte[_1MB / 4];
        // Allocation1 + Allocation2 is greater than half of survivo
        allocation2 = new byte[_1MB / 4];
        allocation3 = new byte[4 * _1MB];
        allocation4 = new byte[4 * _1MB];
        allocation4 = null;
        allocation4 = new byte[4* _1MB]; }}Copy the code

Execution Result:

Heap
 PSYoungGen      total 9216K, used 7541K [0x00000007bf600000.0x00000007c0000000.0x00000007c0000000)
  eden space 8192K, 92% used [0x00000007bf600000.0x00000007bfd5d738.0x00000007bfe00000)
  from space 1024K, 0% used [0x00000007bff00000.0x00000007bff00000.0x00000007c0000000)
  to   space 1024K, 0% used [0x00000007bfe00000.0x00000007bfe00000.0x00000007bff00000)
 ParOldGen       total 10240K, used 8192K [0x00000007bec00000.0x00000007bf600000.0x00000007bf600000)
  object space 10240K, 80% used [0x00000007bec00000.0x00000007bf400020.0x00000007bf600000)
 Metaspace       used 3113K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 342K.capacity 388K.committed 512K.reserved 1048576K
Copy the code


1.5 Area for GC


For Hotspot VM implementations, there are two broad categories of GC

Partial GC:

  • Minor GC/Young GC: Collects garbage only for Young generations
  • Major GC/Old GC: Only Old GC is collected
  • Mixed GC: Integrate new generation and part old generation garbage collection

Whole heap collection:

  • Full GC: Collects the entire Java heap and method area.


2. How can I determine whether the object is dead?


The heap holds almost all object instances in the Java world, and the first thing the garbage collector needs to do before collecting the heap is to determine which objects are “alive” and which are “dead”.

2.1 Reference counting method


Many textbook algorithms for determining whether an object is alive are as follows: add a reference counter to the object and increment it by one each time it is referenced; When a reference is invalid, the counter value is reduced by one; An object whose counter is zero at any point in time cannot be used.

Objectively speaking, although Reference Counting algorithm occupies some extra memory space for Counting, its principle is simple and judgment efficiency is high, so it is a good algorithm in most cases. However, in the Java world, at least in mainstream Java virtual machines, reference counting algorithms are not used to manage memory, mainly because this seemingly simple algorithm has a lot of exceptions to consider and requires a lot of extra processing to work correctly. Reference counting alone, for example, is difficult to solve the problem of circular references between objects.


/** * objA and objB will be GC after testGC() is executed. * /
public class ReferenceCountingGc {
    Object instance = null;
    public static void main(String[] args) {
        ReferenceCountingGc objA = new ReferenceCountingGc();
        ReferenceCountingGc objB = new ReferenceCountingGc();
        objA.instance = objB;
        objB.instance = objA;
        objA = null;
        objB = null;
		// Can objA and objB be reclaimed if GC occurs on this line?System.gc(); }}Copy the code


2.2 Accessibility analysis


The basic idea of this algorithm is to start with a series of objects called “GC Roots”, from which you start to search down. The path of the nodes is called the reference chain. When an object is not connected to GC Roots by any reference chain, it is proved that the object is not available.

Objects that can be used as GC Roots include:

  • The object referenced in the virtual machine stack (the local variable table in the stack frame)
  • Reference objects in the Native method stack
  • Class static properties in the method area reference objects
  • The object referenced by the constant in the method area
  • All objects held by the synchronization lock


2.3 reference


Whether the reference count algorithm is used to judge the number of references to the object, or whether the reference chain is reachable by the reachability analysis algorithm, whether the object is alive or not is inseparable from “reference”.

Prior to JDK1.2, a reference was traditionally defined in Java: a reference was a reference to a block of memory or an object if the value stored in it represented the starting address of another block of memory. There is nothing wrong with this definition, but now it seems a little too narrow. An object has only two states of “referenced” or “unreferenced” under this definition, and it is unable to describe some “tasteless” objects. For example, we want to describe a class of objects: They can be kept in memory when there is enough space, and can be discarded if the space is still too tight after garbage collection — a scenario that many systems’ caching capabilities fit into.

After JDK1.2, Java expanded the concept of references, References are classified into Strongly re-reference, Soft Reference, Weak Reference, and Phantom Reference. The strength of these four references decreases successively.

  • Strong reference

    Strong references are the most traditional definition of “reference”, referring to the reference assignments that are common in program code, such as “Object obj=new Object()”. In any case, as long as strong references exist, the garbage collector will never reclaim the referenced object.

  • Soft references

    Soft references are used to describe objects that are useful but not necessary. Only objects that are associated with soft references are listed in the recycle range for a second collection when they are about to run out of memory. If there is not enough memory in the recycle range, an out of memory exception is thrown. The SoftReference class was provided to implement soft references after JDK 1.2.

  • A weak reference

    Weak references are also used to describe non-essential objects, but they are weaker than soft references, and objects associated with weak references only survive until the next garbage collection occurs. When the garbage collector starts working, it reclaims associated objects that are weakly referenced, regardless of whether there is currently enough memory. WeakReference classes were provided after JDK 1.2 to implement weak references.

  • Phantom reference

    Virtual references, also known as “ghost references” or “phantom references”, are the weakest type of reference relationship. The existence of a reference to an object has no effect on its lifetime and cannot be removed by virtual reference. The sole purpose of setting a virtual reference association for an object is to receive a system notification when the object is reclaimed by the collector. The PhantomReference class was provided after JDK 1.2 to implement virtual references.


2.4 Whether the object lives or dies


Even in the reachability analysis, unreachable objects are not necessarily dead. At this time, they are in the “probation stage”. In order to truly declare an object dead, at least two experience processes must be experienced. Unreachable objects in the reachability analysis method are marked for the first time and screened. The screening condition is whether it is necessary to implement finalize method for the object. When objects do not overwrite the Finalize method, or finalize method has been called by the virtual machine, the virtual machine considers these two cases as unnecessary to execute.

Objects judged to need to be executed are marked a second time in a queue, and are actually reclaimed unless the object is associated with any other object in the reference chain.


2.5 Recycling method area


The method area garbage collection is divided into two main parts: obsolete constants and no longer used types. Recycling deprecated constants is very similar to recycling objects in the Java heap.

  • Obsolete constant

Deprecated constants: Suppose a string “Java” was entered into the constant pool, but there are no string objects with the value “Java” in the system. In other words, there are no string objects that reference “Java” in the constant pool, and there are no other references to the literal in the virtual machine. If a memory collection occurs at this point and the garbage collector determines that it is necessary, the “Java” constant is cleaned out of the constant pool by the system. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are similar.

  • Type not in use

It is relatively easy to determine whether a constant is “obsolete,” while the criteria for determining whether a type is “no longer in use” are more demanding. The following three conditions must be met simultaneously:

  1. All instances of the class are already recycled, that is, there are no instances of the class or any of its derived children in the Java heap.
  2. The classloader that loaded the class has already been recycled, a condition that is usually difficult to achieve unless it is a carefully designed alternative classloader scenario, such as OSGi, JSP reloading, and so on.
  3. The java.lang.Class object corresponding to this Class is not referenced anywhere, and the method of this Class cannot be accessed anywhere through reflection.


A virtual machine can recycle useless classes that meet the above three conditions. This is only “yes”, not necessarily recycled when they are no longer used.


Garbage collection algorithms



3.1 Generational collection algorithm


Most of the current garbage collectors for commercial virtual machines are designed to follow the “Generational Collection” theory, which is essentially a set of rules of thumb for how most programs actually run. It’s based on two Generational hypotheses

  1. The Weak Generational Hypothesis: the overwhelming majority of subjects were born and died.
  2. The Strong Generational Hypothesis: the more times an object goes through a garbage collection, the less it survives.

Together, these two generational hypotheses form a consistent design principle for several common garbage collectors: the collector should divide the Java heap into different regions and then allocate the collected objects to different regions for storage based on their age (that is, the number of times the object has survived the garbage collection process). Obviously, if most of the objects in an area is in the evening out, it is difficult to survive garbage collection, then put them on together, each time recycling focuses only on how to keep a small amount of live rather than a tag which a large number of objects will be recycled, is at a relatively low cost recovery to a lot of space if all the rest is difficult to the demise of the object, By putting them together, the virtual machine can recycle this area at a low frequency, which allows for both the time cost of garbage collection and the efficient use of memory space.

Designers typically divide the Java heap at least into Young Generation and Old Generation regions [2]. As the name implies, in the new generation, a large number of objects are found dead during each garbage collection, and the few objects that survive after each collection will be gradually promoted to the old age for storage


3.2 Mark clearing algorithm


One of the earliest and most basic garbage algorithms is the “Mark-sweep” algorithm, which, as its name suggests, has two phases: “Mark” and “Sweep” : First mark all the objects that need to be recycled, after the mark is completed, all the marked objects are recycled, or you can go back, mark the surviving objects, all the unmarked objects are recycled. The marking process is to determine whether an object is garbage

There are two obvious problems with this garbage algorithm

  • The efficiency problem
  • Space issues (large number of discontinuous fragments after marker clearing)


3.3 Mark-copy algorithm


Mark-copy algorithm is often referred to as replication algorithm for short. In order to solve the problem of low efficiency of mark-clean algorithm in the face of a large number of recyclable objects, “mark-copy” collection algorithm appears. It divides memory into two equally sized pieces and uses one piece at a time. When this area of memory is used up, the surviving objects are copied to another area, and then the used space is cleaned up again. In this way, half of the memory range is reclaimed each time.


3.4 Mark-collation algorithm


The efficiency of mark-copy algorithm will decrease when the survival rate of the object is high. More importantly, if you do not want to waste 50% of the space, you need to have extra space for allocation guarantee, in order to cope with the extreme case that all objects in the used memory are 100% alive, so this algorithm generally cannot be directly selected in the old days.

Object in view of the old s survival characteristics, this paper puts forward another targeted “tag – the whole” (Mark – Compact) algorithm, the marking process still like “tag – clear” algorithm, but the subsequent steps are not directly to recyclable objects to clean up, but let all live objects move one end to the memory space, Then clean up memory directly outside the boundary.



Garbage collector


If collection algorithms are the methodology of memory collection, then the garbage collector is the practice of memory collection.

Although we than on the collector of a, but not to choose a best collector, although the garbage collector technology is improving, but until now is not the best collector, more do not exist “universal” collector, so we choose only the most suitable collector for specific applications.


4.1 Serial Collector


The Serial collector is the most basic and oldest collector and was the only choice for the new generation of HotSpot virtual machine collectors prior to JDK1.3. The collector is a collector of the single thread work, but it’s the meaning of “single” is not just that it can only use a single processor or a collection of threads to complete garbage collection work, more important is emphasised on its garbage collection, must suspend all other worker threads, until it is about the end of the collection.

The designers of The virtual machine were certainly aware of The poor user experience of Stop The World, so The pauses were reduced in subsequent garbage collector designs (there were still pauses, and The search for The best garbage collector continued).

But does the Serial collector have any advantages over other garbage collectors? Of course it does, it’s simple and efficient (compared to the single-threaded efforts of other collectors). The Serial collector naturally achieves high single-thread collection efficiency because it has no overhead of thread interaction. Serial collectors are a good choice for virtual machines running in Client mode.


4.2 ParNew collector


The ParNew collector is essentially a multithreaded parallel version of the Serial collector, with all control parameters available to the Serial collector (e.g. -xx: SurvivorRatio, -xx: SurvivorRatio, -xx: SurvivorRatio,) except for garbage collection using multiple threads at the same time. PretenureSizeThreshold, -XX: HandlePromotionFailure, etc.), collection algorithms, Stop The World, object allocation rules, and reclamation policies are all identical to Serial collector

In addition to supporting multithreaded parallel collection, the ParNew collector, especially the preferred next-generation collector for legacy systems prior to JDK 7, is currently the only one that works with the CMS collector, in addition to the Serial collector.


4.3 Parallel Scavenge


The Parallel Collector is a new generation collector. It is also based on the mark-copy algorithm and is a multi-threaded collector that can collect in Parallel

  • The Parallel Insane and ParNew
-xx :+UseParallelOldGC :+UseParallelOldGC :+UseParallelOldGCCopy the code

The Parallel Avenge collector is also often referred to as a “through-first collector” because of its affinity for throughput. The Parallel Scavenge collector focuses on throughput (efficient CPU utilization). Garbage collectors like CMS focus more on user thread pause times (to improve the user experience). Throughput is the ratio of the amount of time the CPU spends running user code to the total CPU elapsed time.

The Parallel Exploiter provides a number of parameters to find the most appropriate pause times or maximum throughput. Use the Parallel Exploiter to support an adaptive adjustment strategy if manual optimization is difficult and you don’t know how the collector operates. Leaving memory management optimization to the virtual machine is also a good option. This kind of priority, priority, priority


This is the JDK1.8 default collector

Run the Java -xx :+PrintCommandLineFlags -version command

-XX:InitialHeapSize=262921408 -XX:MaxHeapSize=4206742528 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
java version "1.8.0 comes with _211"
Java(TM) SE Runtime Environment (build 1.8. 0 _211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

Copy the code

Insane +UseParallelGC -xx :+UseParallelOldGC JDK1.8 insane +UseParallelOldGC You can disable this feature by using -xx: -useParalleloldgc


4.4 Seria Old Collector


Serial Old is an older version of the Serial collector, which is also a single-threaded collector. It is used primarily for two purposes: as a companion to the Parallel Scavenge collector in JDK1.5 and earlier releases, and as a fallback to the CMS collector.


4.5 Parallel Old Collector


Parallel Old is an older version of the Parallel Avenge collector, supported by multiple threads for concurrent collection and implemented on a mark-collation algorithm. The Parallel Avenge and Parallel Old collectors are preferred in applications where throughput and CPU resources are important.


4.6 CMS Collector


The CMS (Concurrent Mark Sweep) collector is a collector whose goal is to obtain the shortest collection pause time. At present, a large part of Java applications are set on Internet websites or B/S system servers based on browsers. Such applications usually pay more attention to the response speed of services and hope that the system pause time is as short as possible to bring good interactive experience to users. The CMS collector does not often meet the needs of such applications.

As the name (including “Mark Sweep”) suggests, the CMS collector is based on a mark-sweep algorithm. Its operation is more complex than the previous collectors, and the whole process is divided into four steps:

  1. CMS Initial mark: Suspend all other threads and record objects directly connected to root. Very fast
  2. CMS Concurrent mark: The process of traversing the entire object graph from directly associated objects in GC Roots. This process takes a long time but does not require the user thread to pause and can be run concurrently with the card and phone threads
  3. CMS remark: In order to correct the marking record of the part of the object whose marking changed because the user program continued to operate during concurrent marking, the pause time during this phase is usually slightly longer than the initial marking phase, but also much shorter than the concurrent marking phase
  4. CMS Concurrent sweep: Forcibly removes dead objects judged to be dead by the marking phase. Since living objects do not need to be moved, the entire phase can also be concurrent with the user thread.

As its name suggests, it is an excellent garbage collector with its main advantages: concurrent collection and low pauses. But it has the following three obvious disadvantages:

  • Sensitive to CPU resources
  • Unable to handle floating garbage
  • The collection algorithm it uses – the “mark-sweep” algorithm – results in a large amount of space debris at the end of the collection.


4.6 Garbage First


The Garbage First (G1 for short) collector is a milestone in the history of Garbage collector technology. It pioneered the idea of local collection-oriented design and region-based memory layout.

G1 (garbage-First) is a server-based Garbage collector, mainly for machines equipped with multiple processors and large memory capacity. High throughput performance characteristics while meeting the GC pause time requirements with extremely high probability

  • Concurrency and parallelism: The G1 takes full advantage of CPU, multi-core hardware, and uses multiple cpus (cpus or CPU cores) to reduce stop-the-world pause times. While other collectors would have paused GC actions performed by Java threads, the G1 collector can still allow Java programs to continue executing concurrently.
  • Generational collection: Although G1 can manage the entire GC heap independently without the cooperation of other collectors, the concept of generational collection is retained.
  • Spatial integration: Different from CMS’s “mark-clean” algorithm, G1 is a collector based on the “mark-clean” algorithm as a whole; Locally, it is based on the mark-copy algorithm.
  • Predictable pauses: This is another big advantage G1 has over CMS. Reducing pause times is a common concern of BOTH G1 and CMS, but G1 also models predictable pause times, allowing users to explicitly specify a time segment of M milliseconds in length.

The G1 collector operates in the following steps:

  • Initial tag
  • Concurrent tags
  • In the end tag
  • Screening of recycling

The G1 collector maintains a priority list behind the scenes, prioritizing the Region with the greatest collection value (hence its name garbage-first) based on the allowed collection time each time. This use of regions and prioritized Region collection ensures that the G1 collector can collect as efficiently as possible in a limited amount of time (dividing memory into pieces).


4.6 ZGC Collector


ZGC (Z Garbage Collector) is an experimental [1] low-latency Garbage Collector that was added in JDK 11 and developed by Oracle. In 2018 Oracle created JEP 333 to submit ZGC to OpenJDK, pushing it into the Release manifest of OpenJDK 11.

Like ParNew and G1 in CMS, ZGC uses a mark-copy algorithm, although it makes significant improvements to the algorithm.

ZGC operation process:

  • Concurrent tags
  • Concurrent preparatory reallocation
  • Concurrent redistribution
  • Concurrent remapping


“When there is a high wind, life does not give up”