Java GC garbage collection is one of the most common JVM questions to be asked in an interview. In this article, you will learn about the underlying principles of Java GC.

Wedges -JVM memory structure replenishment

There are some things that we didn’t cover in the previous article, “Inside the JVM Architecture”, but this article uses garbage collection mechanisms to learn about them. Remember the heap structure diagram in the JVM?

The figure shows three regions in the heap: Eden, From Survivor, and To Survivor. You can also see the size ratio in the picture: 8:1:1 to be exact. The reason for this is explained later in this article, but it depends on the specific situation of garbage collection.

Remember the common arguments like -xms and -xmx when setting up the JVM? That’s right. They’re used to set the size of regions in the heap.

(Photo from Internet)

Control parameters:

  • -Xms Sets the minimum size of the heap.
  • -Xmx Sets the maximum size of the heap.
  • -Initial and maximum sizes of Cenozoic and Cenozoic Xmn stacks (NewSize and MaxNewSize are refined).
  • -xx :NewSize Sets the minimum space size of the new generation.
  • -xx :MaxNewSize Sets the maximum space size of the new generation.
  • -xx :PermSize Sets the minimum space size of the permanent generation.
  • -xx :MaxPermSize Sets the maximum space of the permanent generation.
  • -Xss sets the stack size for each thread.

See if these parameters are not as boring as they used to be. They all have their corresponding positions in the diagram.

Did you find that there is no parameter to directly set the size of the old age space? And we just do a simple calculation.

Old age space size = heap space size - young generation large space sizeCopy the code

Have you had trouble remembering the above parameters immediately? Well, the following mnemonics may help you better remember and understand the meaning of parameters.

Xmx (Memory maximum), Xms (Memory Startup), Xmn (Memory nursery/new), and Xss (stack size).

The format of the parameter can be understood as follows:

  • -: standard VM options, VM specification options.
  • -x: indicates a non-standard VM option that is not supported by all VMS.
  • -xx: advanced option, advanced features, but unstable option.

Summary of the GC

Garbage Collection is often referred to as “GC” and is “automated” by virtual machines.

Consider the question, why should developers learn about GC and memory allocation when GC is automatically collected? In order to be able to configure the above parameter configuration? What are the parameters for?

“When it comes to running out of memory, memory leaks, and garbage as a bottleneck for higher concurrency, we need to monitor and regulate the automatic collection of GC.”

Three areas of the JVM — the program counter, the virtual machine stack, and the local method stack — are created and destroyed by a thread. The stack frame performs loading and unloading operations with the method entering and exiting, realizing automatic memory cleaning. Their memory allocation and reclamation are deterministic.

Therefore, GC garbage collection is concentrated in the heap and method areas, where memory is allocated and used dynamically during program execution.

Let’s look at the GC garbage collection process through concepts and specific algorithms.

How do you determine if an object is alive

There are generally two methods for judging objects: reference counting algorithm and Reachability Analysis.

Reference counting algorithm: Adds a reference counter to an object, increments it every time a reference is made to it, decays it every time a reference is released, and can be reclaimed when the counter is 0.

Reference counting algorithm is simple to implement and efficient to judge. It is widely used in Microsoft COM and Python languages, but it is not used in mainstream Java virtual machines, mainly because it cannot solve the problem of object reference cycle.

Accessibility analysis algorithm: The basic idea is through a series called “the GC Root” objects (such as the system class loader, objects in the stack, active threads, etc.) as a starting point, based on the relationship between object references, began to search down, path through the known as the reference, when there is no reference to an object to the GC Root chain is linked together, prove that the object is not available.

In the figure above, the middle green is the live object and the gray is the recoverable object. Although the gray parts are still internally related, they are not reachable to GC Root.

Interview questions

Interviewer, what algorithms are used for Java GC? Where are they applied?

A: Copy algorithm, tag clearing, tag collation…

Are you still learning by rote? As you read on, you will see that you don’t have to memorize any more.

Mark clearing algorithm

The mark-sweep algorithm consists of two phases: first, all objects that need to be reclaimed are marked, and then all marked objects are uniformly reclaimed after the marking is complete.

The tag clearing algorithm is the most basic collection algorithm, and the subsequent collection algorithms are based on this idea and improve its shortcomings.

The main disadvantages: one is the efficiency problem, the efficiency of the marking and cleaning process is not high; In addition, there is a space problem. After the mark is cleared, there will be a large amount of discontinuous memory fragmentation. Too much space fragmentation may cause the program to be unable to find enough contiguous memory when it needs to allocate large objects in the later run and have to trigger another garbage collection action in advance.

Replication algorithm

Copying algorithm: Dividing available memory into two equivalent pieces by capacity and using only one piece at a time. When a block of memory is used up, the surviving objects are copied onto another block and the previous block is cleaned up.

Each time half memory reclamation, memory allocation does not have to consider the complex situation of memory fragmentation, as long as the heap top pointer is moved, in order to allocate memory, simple implementation, efficient operation.

Disadvantages: Reduced memory to half, low cost performance, continuous replication of long-lived objects leads to low efficiency.

The replication algorithm is used in the next generation of JVM heap. Go back to the original push allocation structure diagram.

During GC collection, when the Eden zone is full, the surviving objects are copied to one of the Survivor zones; When it is recycled, Eden and any surviving objects from the used Survivor zone are copied to another Survivor zone, and Eden and used Survivor zones are cleaned up.

If the other Survivor region does not have enough memory storage, the old age is entered.

There is a mechanism for which objects will enter the old age: each time an object undergoes replication, age is increased by 1, and after reaching the promotion age threshold, it will be moved to the old age.

During the whole process, as the objects in Eden are “born and die instantly” like duckweed, the ratio of memory allocation is 8:1:1 instead of 1:1.

And for those like “water bear”, after many cleaning still survive the object, will enter the old age, and the old cleaning algorithm will use the “mark sorting algorithm” described below.

Tag sorting algorithm

Mark-compact: The marking process is the same as the mark-clean algorithm, but instead of cleaning up the recyclable objects directly, the next step is to move all surviving objects to one end and then clean up memory directly beyond the end boundary.

This algorithm not only does not waste 50% of the memory, but also solves the problem of low efficiency of the replication algorithm when the object survival rate is high.

Generational collection algorithm

The basic idea of generational collection algorithm: The Java heap memory is logically divided into two parts, the new generation and the old generation, and different garbage collection strategies are adopted for objects with different life cycles and different sizes.

However, in the new generation, most objects are transient objects, and only a small number of objects survive. Therefore, the replication algorithm is adopted to clean up the few objects. However, for the objects in the old age, the survival rate is higher, and there is no extra guaranteed memory, so the algorithm of tag collation is adopted.

In fact, looking back, generational collection algorithm is the new generation and old algorithms from the strategic dimension of planning.

summary

At this point, when the interviewer asks which garbage collection algorithms are used in Java GC and in which scenarios, it is no longer necessary to memorize them.

We will continue to update the Java GC as well as the garbage collector and garbage collection tuning in future articles. Please follow the public account “Program New Horizons” for first hand updates.

Stop asking me about Java GC garbage collection.

Resources: Understanding the Java Virtual Machine in Depth.


Program new horizon