What is garbage?

Memory that has been allocated but is no longer used by the program is garbage. In Java, memory is allocated dynamically and reclaimed automatically. Understanding the garbage collection mechanism and its tuning strategies helps you diagnose and handle memory leaks in day-to-day work.



The Java virtual machine's runtime data areas are the program counter, the Java virtual machine stack, the native method stack, the heap, and the method area.

Of these, the program counter, the virtual machine stack, and the native method stack are thread-private. Their memory is released automatically when the thread ends, so they need no management.

So garbage collection only needs to focus on the heap and method area.

Memory allocation in Java

On the heap

  • If thread-local allocation buffers (TLABs) are enabled, an object is allocated in the thread's own TLAB first.
  • Otherwise, objects are allocated in Eden first.
  • Large objects go directly into the old generation.
  • Long-lived objects are promoted to the old generation.

On the stack

This relies on escape analysis. If an object is only ever used inside a method and never escapes from it, the JVM can allocate it on the stack instead of on the heap. The object is then destroyed automatically when the method returns, which reduces garbage collection pressure.
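To make "escaping" concrete, here is a minimal sketch. The class, the Point type, and the method names are hypothetical and exist only for illustration. Whether the JIT compiler actually removes the heap allocation depends on the JVM, its flags (HotSpot exposes -XX:+DoEscapeAnalysis), and how hot the method is, so the code only shows which object is a candidate.

```java
class EscapeAnalysisSketch {
    // Hypothetical value type used only for this example.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // A static field gives objects a way to outlive the method that created them.
    static Point lastPoint;

    // The Point never leaves this method, so it does not escape and is a
    // candidate for stack allocation.
    static int sumNoEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // The Point is published through a static field, so it escapes and
    // has to live on the heap.
    static int sumWithEscape(int x, int y) {
        Point p = new Point(x, y);
        lastPoint = p;
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(sumNoEscape(1, 2));
        System.out.println(sumWithEscape(3, 4));
    }
}
```

Both methods return the same kind of result; the only difference is whether the object is still referenced after the method returns.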

How to Determine Garbage

Reference counting

A reference counter is added to each object. Each time a reference to the object is created, the counter is incremented by 1; each time a reference becomes invalid, it is decremented by 1. An object whose counter is 0 is considered garbage.

  • Advantages: simple to implement and efficient.
  • Disadvantages: it cannot handle circular references between objects. In a multithreaded environment, counter updates must also be synchronized, which is expensive and lowers performance.
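The circular reference problem can be shown with a toy model. This is only a sketch: Java does not expose reference counts, so the Counted class and its manually maintained refCount field are hypothetical stand-ins.

```java
class RefCountSketch {
    // Hypothetical object whose reference count is maintained by hand.
    static final class Counted {
        final String name;
        int refCount = 0;
        Counted field; // a single outgoing reference

        Counted(String name) { this.name = name; }

        // Re-points the outgoing reference and keeps both counters up to date.
        void setField(Counted target) {
            if (field != null) field.refCount--;
            field = target;
            if (target != null) target.refCount++;
        }
    }

    // Builds a two-object cycle, drops the external references,
    // and returns the counts that are left behind.
    static int[] cycleCountsAfterRelease() {
        Counted a = new Counted("a");
        Counted b = new Counted("b");
        a.refCount++; // the local variable 'a' refers to the object
        b.refCount++; // the local variable 'b' refers to the object
        a.setField(b);
        b.setField(a);
        // Simulate the local variables going out of scope.
        a.refCount--;
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCountsAfterRelease();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]);
    }
}
```

Both counters stay at 1 even though nothing outside the cycle refers to either object, so a pure reference-counting collector would never reclaim them.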

Reachability analysis

This is the algorithm that mainstream virtual machines use today. Starting from a set of root objects (GC Roots), it searches downward through object references; the path it follows is called a reference chain. If no reference chain connects an object to any GC Root, the object is considered garbage.
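The search can be sketched as a plain graph traversal. The Node class and the object graph below are made up for illustration and say nothing about how a real JVM represents objects; the point is that an unreachable cycle (c and d) is still recognized as garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
    // Hypothetical heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every node that some reference chain connects to a root.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit only
                pending.addAll(n.refs); // follow the reference chain
            }
        }
        return marked;
    }

    // root -> a -> b is reachable; c <-> d is a cycle with no path from the root.
    static List<String> demoGarbageNames() {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);

        Set<Node> live = reachableFrom(Collections.singletonList(root));
        List<String> garbage = new ArrayList<>();
        for (Node n : Arrays.asList(root, a, b, c, d)) {
            if (!live.contains(n)) garbage.add(n.name);
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println(demoGarbageNames());
    }
}
```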

GC Roots

Objects that can serve as GC Roots include:

  • Objects referenced from the virtual machine stack (local variables in stack frames)
  • Objects referenced by static fields of classes in the method area
  • Objects referenced by constants in the method area
  • Objects referenced by JNI (native methods) in the native method stack

Reference types

Whether an object survives depends on the references to it. Before JDK 1.2, a reference was simply a value that held the address of another block of memory. Since JDK 1.2, Java references fall into the following four categories.

  • Strong reference: Object obj = new Object(); This is the most common kind; creating an object with new and assigning it produces a strong reference. As long as an object is strongly referenced and reachable from the GC Roots, it will not be reclaimed.
  • Soft reference: implemented by the SoftReference class. It represents objects that are useful but not essential. The garbage collector reclaims softly referenced objects before an OutOfMemoryError (OOM) would be thrown, that is, only when memory is running low.
  • Weak reference: implemented by the WeakReference class. It represents non-essential objects, which are reclaimed at the next GC, for example a young GC (YGC). Because the timing of a YGC is unpredictable, a weakly referenced object can be reclaimed at any time.
  • Phantom reference: implemented by the PhantomReference class. The referent cannot be obtained through it, and the referent can be reclaimed at any time. Phantom references are mainly used to track objects that are being reclaimed by the garbage collector, and they must be used together with a reference queue.

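For example:

import java.lang.ref.*;

ReferenceQueue<String> queue = new ReferenceQueue<>();
// A fresh String, not a literal: an interned literal stays strongly reachable and would never be enqueued.
PhantomReference<String> phantomReference = new PhantomReference<>(new String("Hello"), queue);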

When the garbage collector is about to reclaim an object and finds that it has a phantom reference, it adds the phantom reference to the associated reference queue. By checking whether the phantom reference has appeared in the queue, the program learns that the referent is being reclaimed and can then run its own cleanup.
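The following sketch puts the three reference classes side by side. It only checks behavior that holds while the referent is still strongly reachable, because when the collector actually clears or enqueues a reference is not deterministic. The class and method names are hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesSketch {
    // Returns { notYetEnqueued, softSeesObject, weakSeesObject, phantomAlwaysNull }.
    static boolean[] demo() {
        Object strong = new Object(); // strong reference keeps the object alive
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // Nothing has been reclaimed yet, so the queue is empty.
        boolean notYetEnqueued = queue.poll() == null;
        // While a strong reference exists, soft and weak references still return the object.
        boolean softSeesObject = soft.get() == strong;
        boolean weakSeesObject = weak.get() == strong;
        // A phantom reference never hands out its referent.
        boolean phantomAlwaysNull = phantom.get() == null;

        return new boolean[] { notYetEnqueued, softSeesObject, weakSeesObject, phantomAlwaysNull };
    }

    public static void main(String[] args) {
        for (boolean b : demo()) {
            System.out.println(b);
        }
    }
}
```

Once the strong reference is gone, a later GC may clear the weak reference and enqueue the phantom reference, but the timing is up to the collector, so the sketch does not try to observe it.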

Garbage collection algorithms

Mark-Sweep

This is the most basic algorithm. The first step is marking: starting from the GC Roots, every object reachable through the reference relationships is marked. The second step is sweeping: the objects that were not marked are cleared.

Disadvantages: first, efficiency, since neither marking nor sweeping is very efficient; second, sweeping leaves behind many discontiguous memory fragments. As a result, when a large object must be allocated there may be no contiguous block large enough, which triggers another GC early.
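The fragmentation problem can be illustrated with a toy heap of fixed-size slots. This is a simplified sketch: the boolean arrays stand in for "slot is occupied" and "object was marked", and the names are hypothetical.

```java
class MarkSweepSketch {
    // Sweep phase: a slot stays occupied only if it held an object that was marked.
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    // Total number of free slots.
    static int freeSlots(boolean[] heap) {
        int free = 0;
        for (boolean used : heap) {
            if (!used) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean used : heap) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] marked   = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, marked);
        System.out.println("free slots: " + freeSlots(after));
        System.out.println("largest contiguous run: " + largestFreeRun(after));
    }
}
```

With every second object dead, half of the heap is free after the sweep, yet no two free slots are adjacent, so an object that needs two contiguous slots cannot be allocated.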

Copying

The copying algorithm divides memory into two areas of equal size and uses only one of them at a time. When that area is used up, the surviving objects are copied to the other area, and the first area is then cleared in one go. There is no memory fragmentation, but memory utilization is low: half of the space is always idle.

Most young-generation objects die shortly after they are created, so HotSpot by default divides the young generation into one large Eden region and two smaller Survivor regions. During a GC, the surviving objects in Eden and in one Survivor region are copied into the other Survivor region. By default the ratio of Eden to the two Survivor regions is 8:1:1, so only 10% of the space is held in reserve.
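The 10% figure follows directly from the ratio. The sketch below assumes the ratio is expressed the way HotSpot's -XX:SurvivorRatio flag does it (Eden size divided by the size of one Survivor region); that flag is not mentioned above and is brought in here only to name the parameter.

```java
class SurvivorRatioSketch {
    // survivorRatio = Eden size / size of ONE Survivor region (8 by default).
    // The young generation is then split into survivorRatio + 2 equal parts:
    // Eden takes survivorRatio of them and each Survivor region takes one.

    // Fraction of the young generation available for allocation (Eden + one Survivor).
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1) / (double) (survivorRatio + 2);
    }

    // Fraction kept empty as the copy target (the other Survivor).
    static double reservedFraction(int survivorRatio) {
        return 1.0 / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        System.out.println("usable:   " + usableFraction(8));
        System.out.println("reserved: " + reservedFraction(8));
    }
}
```

For comparison, the plain two-area copying scheme corresponds to reserving one half of the memory instead of one tenth.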

Mark-Compact

Like mark-sweep, it marks first, but it does not then reclaim the dead objects in place. Instead, live objects are moved toward one end, and the memory beyond the end boundary is cleared directly.

Because the copying algorithm is inefficient and wastes space when many objects survive, the old generation generally uses the mark-compact algorithm.
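The compaction step can be sketched on an array that plays the role of the heap. The assumption here is that marking has already happened and dead objects are represented by null slots; the names are hypothetical.

```java
import java.util.Arrays;

class MarkCompactSketch {
    // Slides live objects (non-null slots) to the start of the heap, preserving
    // their order, then clears everything past the last live object.
    // Returns the number of live objects, which is also the new end boundary.
    static int compact(String[] heap) {
        int next = 0; // next free position at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                heap[next++] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null);
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "A", null, "B", null, null, "C" };
        int boundary = compact(heap);
        System.out.println(boundary + " live objects: " + Arrays.toString(heap));
    }
}
```

After compaction all free space is one contiguous block, which is what mark-sweep cannot guarantee. The cost is that every surviving object may have to be moved.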

Comparison of three GC algorithms

Algorithm    | Advantages                         | Disadvantages
-------------|------------------------------------|--------------------------
Mark-sweep   | Simple to implement                | Memory fragmentation
Copying      | No fragmentation, good performance | Low memory utilization
Mark-compact | No fragmentation                   | Compaction is expensive

Generational collection

Generational collection combines the algorithms above, choosing whichever suits each part of the heap. The heap is generally divided into the young generation and the old generation, and each uses the algorithm that matches its characteristics. In the young generation, most objects die in every GC and only a few survive, so the copying algorithm is used: only the few survivors need to be copied, and no fragmentation results. In the old generation, where the survival rate is high, mark-compact or mark-sweep is used.

Garbage collectors

Terminology

  • STW: short for “Stop The World”, also called a global pause. While it lasts, the application is suspended and does not respond.
  • Serial collection: a single GC thread reclaims memory while all user threads are suspended. Examples: Serial, Serial Old.
  • Parallel collection: multiple GC threads collect in parallel while the user threads are suspended. Example: Parallel Scavenge.
  • Concurrent collection: user threads and GC threads run at the same time, without pausing the user threads. It suits scenarios where response time is critical. Example: the CMS collector.

Serial/Serial Old

Serial is a single-threaded collector: it uses only one CPU or one collector thread for garbage collection, and it stops all worker threads until the collection is complete. Serial collects the young generation with the copying algorithm, while Serial Old collects the old generation with the mark-compact algorithm.

Features: single-threaded collection, STW

Enable Serial + Serial Old with -XX:+UseSerialGC
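To check which collectors a given set of flags actually selected, one option is to ask the running JVM through the standard java.lang.management API. This is a small sketch with a hypothetical class name; the collector names it prints are JVM-specific, so treat them as informational only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsSketch {
    // Returns the names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running the same class with different collector flags, for example with and without -XX:+UseSerialGC, shows how the list changes.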

ParNew

ParNew is the multithreaded version of Serial: it uses multiple threads for garbage collection. It is a young-generation collector and is designed to work with the old-generation CMS collector, so when CMS is enabled, ParNew is the default young-generation collector.

The number of collector threads can be set with the -XX:ParallelGCThreads parameter. The collection is still STW.

Parallel Scavenge/Parallel Old

Parallel Scavenge is a multithreaded young-generation collector that uses the copying algorithm and focuses on controllable throughput. Parallel Old is its old-generation counterpart and uses the mark-compact algorithm.

Main parameters

  • -XX:+UseParallelGC enables the collector
  • -XX:MaxGCPauseMillis sets the maximum garbage collection pause time
  • -XX:GCTimeRatio sets the throughput target

Controllable throughput

The maximum GC pause time is controlled with the -XX:MaxGCPauseMillis parameter, and -XX:GCTimeRatio sets the throughput target.

Higher throughput uses CPU time more efficiently and completes the program's work sooner.
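As a worked example of what a -XX:GCTimeRatio value means: with a value of N, the collector aims to spend at most 1/(1+N) of total time in GC. The sketch below only does that arithmetic; the class and method names are hypothetical.

```java
class GcTimeRatioSketch {
    // Target upper bound on the share of total time spent in GC.
    static double maxGcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Throughput = share of total time spent running application code.
    static double throughput(int gcTimeRatio) {
        return 1.0 - maxGcTimeFraction(gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=99 -> GC share " + maxGcTimeFraction(99));
        System.out.println("GCTimeRatio=19 -> GC share " + maxGcTimeFraction(19));
    }
}
```

So a value of 99 corresponds to a 1% GC budget (99% throughput), and a value of 19 to a 5% budget.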

Adaptive GC strategy

Besides the two parameters above for controlling throughput, Parallel Scavenge can enable an adaptive GC policy with -XX:+UseAdaptiveSizePolicy. Once it is on, you no longer need to set the young generation size, the Eden/Survivor ratio, and similar parameters by hand. The virtual machine adjusts them dynamically based on the running state of the system to find the most suitable pause time or the maximum throughput.

CMS

A CMS (Concurrent Mark Sweep) collection runs in five phases:

  • 1. Initial marking: marks only the objects that GC Roots reference directly. This phase is STW.
  • 2. Concurrent marking: traces the object graph from the GC Roots. GC threads and user threads run at the same time.
  • 3. Remarking: corrects the marks of objects whose state changed because the program kept running during concurrent marking. This phase is STW.
  • 4. Concurrent sweeping: garbage objects are reclaimed while GC threads and user threads run at the same time.
  • 5. Concurrent reset: clears the CMS GC context information in preparation for the next GC.

Advantages: low pause times, concurrent execution

Disadvantages:

  • Because it runs concurrently with the application, it puts considerable pressure on CPU resources.
  • Floating garbage created while a collection is running cannot be reclaimed until the next cycle.
  • Because it uses the mark-sweep algorithm, it produces a lot of fragmentation. There may then be no contiguous block large enough for a large object, which triggers a Full GC.

Use -XX:+UseConcMarkSweepGC to enable the ParNew + CMS + Serial Old combination: ParNew collects the young generation, CMS collects the old generation, and Serial Old is the fallback when CMS fails. To deal with fragmentation, set -XX:+UseCMSCompactAtFullCollection, which makes the JVM compact the old generation during a Full GC; the compaction is STW. To reduce how often that STW compaction happens, set -XX:CMSFullGCsBeforeCompaction=n so that the old generation is compacted only after every n Full GCs.

JDK 9 marked CMS as deprecated, and CMS was removed in JDK 14.

G1

G1 is a newer collector, introduced in JDK 7 and aimed at server applications. Unlike the collectors above, G1 works on the entire heap, whereas the others work only on the young generation or only on the old generation.

G1 divides the Java heap into regions of equal size. The region size is set with the -XX:G1HeapRegionSize parameter; the value ranges from 1 MB to 32 MB and must be a power of 2. G1 classifies each region as Eden, Survivor, Old, or Humongous. A Humongous region is effectively a large Old region used to store large objects.
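The constraint on the region size is easy to state in code. The sketch below checks a candidate value and computes how many regions a heap of a given size would be split into; the class and method names are hypothetical, and the heap sizes are just sample numbers.

```java
class G1RegionSizeSketch {
    // Valid sizes are powers of two between 1 MB and 32 MB.
    static boolean isValidRegionSizeMb(int mb) {
        return mb >= 1 && mb <= 32 && Integer.bitCount(mb) == 1;
    }

    // Number of equal-sized regions for a heap of heapMb megabytes.
    static long regionCount(long heapMb, int regionMb) {
        if (!isValidRegionSizeMb(regionMb)) {
            throw new IllegalArgumentException("region size must be a power of 2 in [1, 32] MB");
        }
        return heapMb / regionMb;
    }

    public static void main(String[] args) {
        System.out.println(isValidRegionSizeMb(16)); // power of two within range
        System.out.println(isValidRegionSizeMb(24)); // not a power of two
        System.out.println(regionCount(4096, 2));    // 4 GB heap, 2 MB regions
    }
}
```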

The G1 heap memory layout is different from the traditional heap memory layout.

G1 tracks how much garbage has accumulated in each region and maintains a priority list so that it collects the regions with the most garbage first. This is why it is called Garbage-First.
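The idea of ranking regions and collecting the most rewarding ones first can be sketched as a greedy selection under a pause budget. This is a simplification and not G1's actual implementation: the Region fields, the cost estimates, and the efficiency measure (garbage reclaimed per millisecond) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GarbageFirstSketch {
    // Hypothetical per-region bookkeeping.
    static final class Region {
        final String id;
        final long garbageBytes; // how much would be reclaimed
        final double costMs;     // estimated time to collect the region
        Region(String id, long garbageBytes, double costMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMs = costMs;
        }
        double efficiency() { return garbageBytes / costMs; }
    }

    // Picks regions in order of efficiency until the pause budget is used up.
    static List<String> chooseCollectionSet(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs <= pauseBudgetMs) {
                chosen.add(r.id);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    static List<String> demo() {
        List<Region> regions = Arrays.asList(
            new Region("r1", 800, 4.0),  // 200 bytes per ms
            new Region("r2", 300, 1.0),  // 300 bytes per ms
            new Region("r3", 900, 9.0),  // 100 bytes per ms
            new Region("r4", 100, 2.0)); //  50 bytes per ms
        return chooseCollectionSet(regions, 6.0);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a 6 ms budget the sketch picks r2 and then r1 (5 ms in total) and skips r3 and r4 because either would exceed the budget.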

Characteristics of G1 compared with CMS

  • Parallelism and concurrency: G1 makes full use of multiple CPUs to shorten STW pauses. The concurrent marking phase runs concurrently with the user threads, and the final marking phase is executed by multiple GC threads in parallel.
  • Generational collection: G1 still distinguishes generations, but it can manage the entire GC heap on its own without cooperating with another collector.
  • Space compaction: viewed as a whole, G1 is based on the mark-compact algorithm; viewed locally, for example between Eden and Survivor regions, it behaves like the copying algorithm. Either way, it avoids memory fragmentation.
  • Predictable pauses: besides aiming for low pauses, G1 builds a predictable pause-time model that lets users specify that, within a time slice of M milliseconds, no more than N milliseconds are spent on garbage collection.

G1 garbage collection modes

Young GC

  • 1. When all Eden regions are full, a Young GC is triggered.
  • 2. Surviving objects in Eden regions are copied to Survivor regions.
  • 3. Surviving objects in the original Survivor regions are copied to another Survivor region or promoted to an Old region.
  • 4. The freed regions are put on the free list for reuse.

Mixed GC

When the old generation's share of the total heap reaches a threshold (-XX:InitiatingHeapOccupancyPercent, 45% by default), a Mixed GC is triggered. It collects the entire young generation and part of the old generation.

Mixed GC collection process

  • 1. Initial marking: marks only the objects that GC Roots reference directly and updates the TAMS (top-at-mark-start) value. This phase is STW.
  • 2. Concurrent marking: starting from the GC Roots, performs reachability analysis on the objects in the heap to find the surviving ones. This phase runs concurrently with the user threads.
  • 3. Final marking: corrects the marks that changed because user threads kept running during concurrent marking. This phase is STW, but it is executed by multiple GC threads in parallel.
  • 4. Selection and evacuation: ranks the regions by reclamation value and cost, and builds a collection plan that fits the user's expected pause time.

Full GC

A Full GC happens when there is no free memory left to copy surviving objects into, or when a large enough allocation cannot be satisfied. G1 then falls back to a single-threaded, Serial Old style collection of the whole heap, which is a long STW pause.

How do I reduce Full GCs?

  • Increase -XX:G1ReservePercent to enlarge the reserved memory.
  • Lower -XX:InitiatingHeapOccupancyPercent, so that a Mixed GC is triggered earlier, when the old generation reaches the lower value.
  • Increase -XX:ConcGCThreads to add threads to the concurrent phases.

Summary

A summary of the garbage collectors introduced above:

  • Serial: serial, young generation, copying algorithm
  • ParNew: parallel, young generation, copying algorithm
  • Serial Old: serial, old generation, mark-compact algorithm
  • Parallel Scavenge: parallel, young generation, copying algorithm
  • Parallel Old: parallel, old generation, mark-compact algorithm
  • CMS: concurrent, old generation, mark-sweep algorithm
  • G1: concurrent and parallel, entire heap, copying plus mark-compact

References

In-depth Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices