Hi, I’m Jack Xu. We know that the JVM tuning is the main tone of the choice of the garbage collector and parameter setting, so we must grasp knowledge of recycling, otherwise how tuning, then what is garbage, garbage in our analogy life, is not, to clear, so the first step is to find the garbage, In Java we have two approaches.

How to define garbage

Reference counting

Add a reference counter to the object, and the counter value is +1 every time it is referenced somewhere. When the reference is invalidated, the counter value is -1. An object with a counter of 0 at any time cannot be in use, is deemed unreachable and waits for GC to clean up. This method is almost useless because if AB and AB hold references to each other, they can never be recycled.

/ * * *@author jack xu
 */
public class ReferenceCount {

    public Object instance = null;

    public static void main(String[] args) {
        testGC();
    }

    public static void testGC(a) {
        ReferenceCount a = new ReferenceCount();
        ReferenceCount b = new ReferenceCount();

        a.instance = b;
        b.instance = a;

        a = null;
        b = null; }}Copy the code

As you can see in the figure below, the last two objects are no longer accessible, but because they reference each other, their reference count will never be zero, and the reference counting algorithm will never tell the GC collector to collect them. This leads to a memory leak, which can eventually lead to a memory overflow.

Accessibility analysis

The basic idea of the accessibility analysis algorithm is that some objects called GC Roots are taken as the starting point, and the search is started from these nodes, and the path searched is called (Reference Chain). When an object is not linked by any reference chain to the GC Roots (i.e. unreachable from the GC Roots node to the node), the object is proved to be unavailable, as shown in the figure below.

  • Objects referenced in the Java virtual machine stack (the local variable table in the stack frame)
  • Objects referenced by JNI (also known as Native methods in general) in the Native method stack.
  • An object referenced by a class static property in the method area
  • A reference object to a constant in the method area.
  • Class loader
  • Thread Thread class

Garbage collection algorithm

After determining which garbage can be collected, all the garbage collector has to do is start garbage collection, but there is a problem involved: how to do it efficiently? There are three algorithms.

Mark-sweep

This is the most basic garbage collection algorithm. The mark-clean algorithm is divided into two phases: the mark-clean phase and the cleanup phase. The marking phase marks all the objects that need to be reclaimed, and the clearing phase reclaims the space occupied by the marked objects.

  • Position discontinuous, resulting in debris
  • Low efficiency, two scanning, marking and clearing are time-consuming

Copying algorithms

In order to solve the defects of mark-Sweep algorithm, Copying algorithm was proposed. It divides the available memory into two equal sized chunks by capacity, using only one chunk at a time. When the memory of this piece is used up, the objects that are still alive are copied to another piece, and then the used memory space is cleared up once, so that memory fragmentation is not easy to occur.

Cons: 50% of the memory space is always free and wasted!

Mark-compact

In order to solve the defects of Copying algorithm and make full use of memory space, the Mark-Compact algorithm was proposed. The algorithm performs the same marking phase as Mark-Sweep, but instead of cleaning up the retractable objects directly after marking, it moves all the living objects to one end and then cleans up the memory beyond the end boundary.

Disadvantages: low efficiency, double scan, pointer needs to be adjusted

Generational collection algorithm

Generational Collection is not strictly an idea or theory, but a combination of different algorithms for different situations by combining the above three basic algorithm ideas.

We know that most objects are ephemeral, so we divide the heap into new generation, old generation, and eternal generation (called meta-space in JDK8), so that they can be collected in different generations. The new generation is further divided into Eden and Survivor zones, with a ratio of 8:1:1.

Here’s a graphic illustration of the generational garbage collection process:

Step 1: The newly allocated objects are placed in the Garden of Eden. When the Garden of Eden is full, a minor GC is triggered. The minor GC will clean up all unused garbage in the young generation, including S0 and S1.

Step 2: The remaining objects in The Garden of Eden are the ones that have not been cleared. Place their age +1 in s0

Step 3: When the garden of Eden is full, minor GC is fired again. The surviving objects in The Garden of Eden age +1 are placed in s1, and the surviving objects in S0 age +1 are placed in S1, leaving s0 empty

Step 5: After the survivor zone reaches the age, enter the old age. The default age is 15 (6 in CMS). This age can be changed by yourself

After looking at the above diagram, we know that the young substitute is the replication algorithm, because most of the object life cycle is short, recycling is very frequent, using the replication algorithm is high efficiency; The old generation uses a tag clearing or tag sorting algorithm, because objects live longer in the old age, and it is not necessary to copy to copy.

Garbage collector

Garbage collector is the specific implementation of garbage collection algorithm, to put it bluntly is landing, we introduce the following several commonly used collectors.

Serial/Serial Old

The Serial/Serial Old collector is The most basic and oldest collector. It is a single-threaded collector, and while it is garbage collecting, it must pause all user threads, stopping The World. Serial collectors are targeted at the new generation of collectors and adopt Copying algorithm. The Serial Old collector is an Old collector that uses the Mark-Compact algorithm. The advantage is that it is simple and efficient to implement, but the disadvantage is that it can cause pauses for the user.

ParNew

The ParNew collector is essentially a multithreaded version of the Serial collector, with multiple threads doing garbage collection in parallel. When a multi-core CPU is used, it is more efficient than Serial. When a single-core CPU is used, it is similar to Serial. In the new generation, the use of replication algorithm, with CMS use.

Parallel Scavenge/Parallel Old

The Parallel Scavenge collector is a new breed of multithreaded collector. It’s also Parallel collecting, which looks like ParNew, but Parallel Scanvenge is more concerned with system throughput, using Copying algorithms.

Parallel Old is an Old version of the Parallel scexploiture. It is the Parallel collector. It uses multithreading and Mark-Compact algorithms, and is also focused on throughput.

Parameter control: -xx: + useParallelGC-xx: +UseParallelOldGC

CMS

Concurrent Mark Sweep (CMS) collector is a collector aiming at obtaining the shortest collection pause time. It is a Concurrent collector using the Mark-Sweep algorithm.

Note: Parallel means that the garbage collection threads execute in parallel, and parallel means that the user thread and the garbage collection thread execute together.

Its operation process is more complicated than the previous collectors, and the whole process is divided into four steps, including:

  • CMS Initial Mark
  • CMS Concurrent Mark
  • (CMS remark)
  • CMS Concurrent Sweep

The initial marking and re-marking steps still require “Stop The World”. The initial marking is only to mark the objects that can be directly associated with GC Roots, which is very fast. The concurrent marking stage is the process of GC Roots Tracing, while the re-marking stage is to correct the marking records of the part of objects whose marking changes due to the continuous operation of the user program during the concurrent marking. The pause time for this phase is typically slightly longer than for the initial markup phase, but much shorter than for concurrent markup.

Since the collector threads can work with the user threads during the longest concurrent marking and concurrent cleanup, the CMS collector’s memory reclamation process is generally performed concurrently with the user threads.

Parameter control: -xx :+UseConcMarkSweepGC

G1

The G1 collector is the latest achievement of collector technology. It is a service-oriented collector that can make full use of multi-CPU and multi-core environments. So it’s a parallel and concurrent collector, and it can set pause times. Compared with the CMS collector, the G1 collector has the following features:

1. Spatial integration. The G1 collector adopts mark-collation algorithm and will not generate memory space fragmentation. Allocating large objects does not trigger the next GC prematurely because contiguic space cannot be found.

2, predictable pauses, this is another big advantage of G1, reduce the pause time is the common concern of G1 and CMS, but G1 in addition to the pursuit of low pause, also can establish predictable pauses model, can let the user specify in a length of N segment within milliseconds, time on garbage collection may not consume more than N milliseconds, This is almost already a feature of the real-time Java (RTSJ) garbage collector.

The above mentioned garbage collectors collect the entire new generation or the old generation, and G1 is no longer the case. When using the G1 collector, the memory layout of the Java heap is very different from other collectors. It divides the entire Java heap into independent regions of equal size. Although the concept of the new generation and the old age is still retained, the new generation and the old age are no longer physically separated. They are all sets of partial (and possibly discontinuous) regions.

The working process is as follows:

  • Initial Marking: Marking the objects that GC Roots can associate with and changing the value of TAMS requires suspending the user thread
  • Concurrent Marking: Perform an accessibility analysis from GC Roots to find surviving objects and execute them concurrently with user threads
  • Final Marking: Fix the concurrent Marking phase where the user thread needs to be suspended because of changes in data caused by concurrent execution of the user program
  • Live Data Counting and Evacuation (Counting and Evacuation) : Sorts the collection value and cost of each Region, and makes a collection plan based on the expected GC pause time

    Parameter control: -XX: +UseG1GC

Choice of garbage collector

The choice of garbage collector is based on two key metrics, pause time and throughput.

  • Pause time: The time it takes for the garbage collector to collect the terminal application to respond
  • Throughput: User-code run time /(User-code run time + garbage collection time)

The shorter the pause time, the better the application that needs to interact with the user, and the better the response time improves the user experience; High throughput can efficiently utilize CPU time and complete the operation tasks of the program as soon as possible. It is mainly suitable for tasks that are performed in the background and do not require too much interaction.

Let’s divide the garbage collector described above into classes

  • Concurrency collectors [pause time first] : CMS, G1 —-> Suitable for scenarios where relative time is required, such as the Web
  • Parallel collector [throughput first] : Parallel Scanvent and Parallel Old —-> Suitable for scientific computing, background processing, and other interactive scenarios
  • Serial collector: Serial and Serial Old —-> suitable for relatively small memory, embedded devices

Finally, JamesGosling, when he was designing Java in 1995, didn’t realize that The language would have more Web development in The future, a smaller pause time scenario, so it was serialized at first, and you had to Stop The World, which is unthinkable today. Imagine you on Taobao is high, suddenly the web page can not open, this you can endure? Later, there is the default PS+Parallel Old Java8, which is throughput first. Later, Java8 and Java9 have higher requirements for time, and there are CMS, G1 and ZGC, which is not introduced in this article. So with the progress of The Times, the garbage collector is also constantly transformed and upgraded.

Finally, look at the Oracle website to tell us how to choose a garbage collector, because the web is all blog translation, not quite correct, we look at the official website will be more authoritative. Original is not easy, if you think the writing is good, please click like!