JVM memory generations and garbage collection

Copyright notice: This article is an original article by Gu Dong. It can be reproduced at will, but the source must be noted in a clear place.

Personal Homepage: Gudong.name

Gudong.name /2017/04/24/…

There are many articles about the JVM memory model and garbage collection on the Internet. I had read a number of them, but never studied the topic systematically. Recently I came across Zhou Zhiming’s “In-depth Understanding of Java Virtual Machine”, and its content is very good.

This article briefly records and shares what I learned about the JVM memory model, out-of-memory errors, memory generations, and garbage collection algorithms. The original book spends far more space on these topics; if this article piques your interest, read the book.

JVM memory regions

As you may know, JVM memory is divided into five regions. If this is unfamiliar, refer to the previous article, Introduction to JVM memory regions.

Here are the five regions:

Program counter: a small area of memory that can be thought of as a line-number indicator for the bytecode being executed by the current thread. Thread private.
Java virtual machine stack: the memory model of Java method execution. Each method call, from invocation to completion, corresponds to a stack frame being pushed onto and popped off the virtual machine stack. Thread private.
Native method stack: similar to the virtual machine stack, except that it serves native methods. Thread private.
Java heap: this area exists for the sole purpose of storing object instances. Almost all object instances in an application are allocated here, and all threads share this area.
Method area: stores class information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and similar data. Shared by all threads.
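
As a rough illustration of the regions listed above, the following sketch annotates a small class with the region each piece conceptually belongs to. The class `RegionDemo` and its members are hypothetical names made up for this example, and the comments follow the conceptual model in the list rather than what a particular JVM does after optimization.

```java
final class RegionDemo {
    // Static variable: conceptually kept in the method area with the class information.
    static int instancesCreated = 0;

    // Instance field: stored inside the object, which lives on the Java heap.
    private final int[] payload;

    RegionDemo(int size) {
        this.payload = new int[size]; // the array object is allocated on the heap
        instancesCreated++;
    }

    int sum() {
        // 'total' and 'i' are local variables: they live in this call's stack frame,
        // which is private to the calling thread.
        int total = 0;
        for (int i = 0; i < payload.length; i++) {
            total += payload[i];
        }
        return total;
    }
}
```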

OOM

The memory the operating system gives any running application is limited, so managing it sensibly and effectively is particularly important.

As the previous section shows, most object allocation happens on the Java heap, so most of the memory management discussed here refers to heap memory. The program counter and the virtual machine stacks are born and die with their threads, so their memory is comparatively easy to manage and causes fewer problems.

As an application runs, it keeps creating new objects, and these objects are stored in the heap. The heap is limited in size, while the demand for new objects is not. When an allocation finds that the heap really has no space left for the new object, the JVM throws an OutOfMemoryError (OOM), and the program usually crashes.
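
The situation described above can be reproduced with a minimal sketch like the one below. It assumes a hypothetical helper class `HeapFiller` that keeps every allocated chunk reachable so the collector cannot reclaim it; the chunk size and the cap are arbitrary values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class HeapFiller {
    /**
     * Allocates chunks of chunkBytes and keeps them reachable, so GC cannot free them.
     * Returns how many chunks were allocated before the heap ran out or maxChunks was reached.
     */
    static int fill(int chunkBytes, int maxChunks) {
        List<byte[]> retained = new ArrayList<>();
        int allocated = 0;
        try {
            while (allocated < maxChunks) {
                retained.add(new byte[chunkBytes]);
                allocated++;
            }
        } catch (OutOfMemoryError e) {
            // Drop the references first so the handler itself has memory to work with.
            retained.clear();
            System.out.println("OutOfMemoryError after " + allocated + " chunks: " + e.getMessage());
        }
        return allocated;
    }

    public static void main(String[] args) {
        // 64 chunks of 1 MB: with a small heap (for example -Xmx16m) the error is
        // reached before the cap; with a large heap the loop simply stops at the cap.
        System.out.println(fill(1024 * 1024, 64));
    }
}
```

Catching OutOfMemoryError is done here only to report the count; in a real application the error normally propagates and ends the thread.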

This is only a simplified picture. OOM is far from that simple, and there are a few other things you need to know about it:

The JVM performs garbage collection (GC) before an OOM occurs
There are many different garbage collection algorithms
To manage memory better, the heap is divided into generations
The new generation and the old generation of the heap use different garbage collection algorithms

So to understand OOM thoroughly, you need to understand the points above. Next, let’s talk about memory generations.

Memory generations

When an application starts, the operating system allocates it an initial amount of memory. As noted above, most of this memory is heap memory. To make better use of it and manage it more easily, the JVM divides the heap into the new generation and the old generation.
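
One way to see this division on a running JVM is to ask the standard `java.lang.management` API which heap memory pools exist. This is a small sketch using a hypothetical class name `HeapPools`; the pool names it prints depend on the JVM and on the collector in use, so no particular output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** Returns the names of the memory pools the running JVM reports as heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically shows separate pools for the new generation areas and the old generation.
        heapPoolNames().forEach(System.out::println);
    }
}
```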

Objects are created in the new generation. If the new generation has no space for a new object, a garbage collection is triggered there; this is called a Minor GC. Each time an object in the new generation survives a Minor GC, its age counter is incremented by one. The counter records how many Minor GCs the object has been through, and in Sun’s HotSpot virtual machine, once it exceeds 15 the object is moved to the old generation.
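
The counting rule above can be written down as a toy model. Everything here is hypothetical (`TenuringModel`, `TrackedObject`, `PROMOTION_AGE_LIMIT`); it only mirrors the rule as stated in the text, and a real collector applies additional promotion rules that this sketch ignores.

```java
final class TenuringModel {
    // The limit quoted in the text for HotSpot; treated here as a plain constant.
    static final int PROMOTION_AGE_LIMIT = 15;

    /** Hypothetical stand-in for an object whose header records its age. */
    static final class TrackedObject {
        int age = 0;
        boolean inOldGeneration = false;
    }

    /** Applies one Minor GC to an object that is still reachable. */
    static void surviveMinorGc(TrackedObject obj) {
        if (obj.inOldGeneration) {
            return; // a Minor GC only touches the new generation
        }
        obj.age++;
        if (obj.age > PROMOTION_AGE_LIMIT) {
            obj.inOldGeneration = true; // moved to the old generation
        }
    }
}
```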

Over time, if the old generation does not have enough space for objects, it in turn triggers a GC, which is called a Full GC.

Full GC occurs less often than Minor GC, but each Full GC has to collect the entire heap, so it affects program performance far more than a Minor GC does. We should therefore try to avoid or minimize Full GC.
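
To check how often each kind of collection actually runs in a given program, the standard `GarbageCollectorMXBean` interface exposes a collection count per collector. The sketch below uses a hypothetical class name `GcCounts`; collector names differ between JVMs and collector choices, so which entry corresponds to the new or the old generation has to be read from the names printed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class GcCounts {
    /** Maps each collector reported by the JVM to the number of collections it has run so far. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount()); // -1 means the count is undefined
        }
        return counts;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, count) -> System.out.println(name + ": " + count));
    }
}
```

Taking a snapshot before and after a workload and comparing the two gives a quick view of how many collections of each kind the workload caused.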

Meanwhile, the most common GC in the heap is the Minor GC of the new generation: all objects are allocated in the new generation first, so its memory turns over quickly. A GC is triggered only when that memory runs out, but a typical Minor GC runs much faster than a Full GC.

Why is that? Because the new generation and the old generation use different garbage collection algorithms.

Garbage collection algorithms

Mark-sweep algorithm

This is the most basic collection algorithm. As its name suggests, it has two phases, “mark” and “sweep”:

First, all objects that need to be collected are marked; once marking is complete, all marked objects are reclaimed in one pass.
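
The two phases can be sketched on a toy object graph. Note one deliberate difference from the wording above: this sketch marks the objects reachable from a set of roots and reclaims everything left unmarked, which is the complementary way of expressing the same idea. `MarkSweepDemo`, `Node`, and the list standing in for the heap are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MarkSweepDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked = false;

        Node(String name) {
            this.name = name;
        }
    }

    /** Mark phase: flag every node reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) {
                continue; // already visited, also protects against cycles
            }
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: remove unmarked nodes from the heap list and reset marks on survivors. */
    static List<String> sweep(List<Node> heap) {
        List<String> reclaimed = new ArrayList<>();
        heap.removeIf(n -> {
            if (n.marked) {
                n.marked = false; // ready for the next collection cycle
                return false;
            }
            reclaimed.add(n.name);
            return true;
        });
        return reclaimed;
    }
}
```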

It is called the most basic collection algorithm because the later algorithms all build on this idea and improve on its shortcomings. It has two main shortcomings:

Efficiency problem

Neither marking nor sweeping is very efficient.

Space problem

Mark-sweep leaves behind a large number of discontiguous memory fragments. Too much fragmentation makes collections more frequent: when the program later needs to allocate a large object and cannot find a contiguous block of sufficient size, it has to trigger another garbage collection early.
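
The space problem is easy to see with a toy heap modeled as an array of slots. In this sketch, `FragmentationDemo` and its methods are hypothetical; `used[i] == true` means slot i still holds a live object.

```java
final class FragmentationDemo {
    /** Counts the free slots in the toy heap. */
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) {
                free++;
            }
        }
        return free;
    }

    /** Length of the longest contiguous run of free slots. */
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** An object needing 'slots' contiguous slots fits only if some free run is long enough. */
    static boolean canAllocate(boolean[] used, int slots) {
        return largestFreeRun(used) >= slots;
    }
}
```

For a heap where every second slot was swept, such as `{true, false, true, false, true, false, true, false}`, half of the slots are free, yet an object that needs two contiguous slots cannot be placed.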

Copying algorithm

To solve the efficiency problem, a collection algorithm called Copying emerged. It divides the available memory into two halves of equal capacity and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in one pass.

Each collection therefore reclaims a whole half at once, and allocation does not need to deal with fragmentation: memory is handed out in order simply by moving the heap-top pointer, which is simple to implement and efficient to run. The cost is that usable memory shrinks to half of the original, which is rather high.
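
A minimal sketch of the two-halves scheme with pointer-bump allocation follows. It is a toy model under simplifying assumptions: each object occupies one slot, and the caller tells the collector which slots are still alive instead of the collector tracing references. `SemiSpaceDemo` and its methods are hypothetical names.

```java
final class SemiSpaceDemo {
    private long[] from;  // the half currently used for allocation
    private long[] to;    // the half kept empty until the next collection
    private int top = 0;  // bump pointer: index of the next free slot in 'from'

    SemiSpaceDemo(int halfSize) {
        from = new long[halfSize];
        to = new long[halfSize];
    }

    /** Allocates one slot by moving the pointer; returns its index, or -1 if the half is full. */
    int allocate(long value) {
        if (top == from.length) {
            return -1;
        }
        from[top] = value;
        return top++;
    }

    /** Copies the live slots into the other half, swaps the halves, and returns the new indices. */
    int[] collect(int[] liveIndices) {
        int newTop = 0;
        int[] forwarded = new int[liveIndices.length];
        for (int i = 0; i < liveIndices.length; i++) {
            to[newTop] = from[liveIndices[i]];
            forwarded[i] = newTop++;
        }
        long[] tmp = from;
        from = to;
        to = tmp;
        top = newTop; // everything past 'top' is free again, with no fragmentation
        return forwarded;
    }

    long read(int index) {
        return from[index];
    }

    int used() {
        return top;
    }
}
```

After `collect`, the survivors sit packed at the start of the active half, so the next allocation is again a single pointer move.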

Even so, the algorithm is efficient enough that today’s commercial virtual machines use it to collect the new generation. Why can the new generation use a copying algorithm?

Research by IBM shows that 98% of objects in the new generation are short-lived, dying soon after they are created, so a 1:1 split of the memory space is unnecessary. In view of this, the new generation adopts the following division strategy.

The new generation is subdivided into three parts: one larger Eden space and two smaller Survivor spaces.

During garbage collection, the surviving objects in Eden and in the Survivor space in use are copied in one pass to the other Survivor space, and then Eden and the Survivor space that was just used are cleaned up. In the HotSpot virtual machine the default size ratio of Eden to a Survivor space is 8:1, which means the memory available for allocation in the new generation is 90% (80% + 10%) of its total capacity, and only 10% is “wasted”.
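
The arithmetic behind the 90% figure can be checked with a few lines. This sketch assumes a hypothetical helper `NewGenLayout` that takes the new generation size and the Eden-to-Survivor ratio (8 in the text; in HotSpot this ratio is what the `-XX:SurvivorRatio` option sets) and ignores any alignment or rounding a real JVM applies.

```java
final class NewGenLayout {
    /** Returns { edenBytes, survivorBytes } where survivorBytes is the size of ONE Survivor space. */
    static long[] split(long newGenBytes, int survivorRatio) {
        // Eden counts for 'survivorRatio' parts and each of the two Survivor spaces for 1 part.
        long survivor = newGenBytes / (survivorRatio + 2);
        long eden = newGenBytes - 2 * survivor;
        return new long[] { eden, survivor };
    }

    /** Fraction of the new generation usable for allocation: Eden plus one Survivor space. */
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1) / (double) (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long[] sizes = split(100L * 1024 * 1024, 8);
        System.out.println("Eden bytes: " + sizes[0]);
        System.out.println("Survivor bytes (each): " + sizes[1]);
        System.out.println("Usable fraction: " + usableFraction(8));
    }
}
```

For a 100 MB new generation and a ratio of 8, this gives 80 MB for Eden and 10 MB for each Survivor space, so 90 MB is usable at any time.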

Once the cleanup is complete, the Survivor space that was just used is empty and stays empty until the next Minor GC, when it becomes the destination for the surviving objects. The two Survivor spaces thus take turns acting as the staging area for the new generation’s survivors during GC.

However, if the region collected by the copying algorithm contains a large number of surviving objects, the copying algorithm struggles: it needs a larger Survivor space to hold those objects, possibly even a 1:1 ratio. So for the old generation of the heap, there is the following algorithm.

Mark-compact algorithm

The marking phase is the same as in the mark-sweep algorithm, but instead of reclaiming the collectable objects in place, the next step moves all surviving objects toward one end and then clears the memory beyond the end boundary. This avoids fragmentation and needs no extra block of memory, which suits the old generation.
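
The move-then-clear step described above can be sketched on a toy heap of slots. This assumes the marking has already happened and is handed in as a `live` array; `CompactDemo` and `compact` are hypothetical names, and a real collector would also have to update every reference to a moved object, which this sketch leaves out.

```java
final class CompactDemo {
    /**
     * Slides live entries toward index 0, preserving their order, then clears
     * everything past the last live entry. Returns the new boundary, which is
     * the number of live entries.
     */
    static int compact(String[] heap, boolean[] live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                heap[boundary] = heap[i]; // move the survivor toward the start
                live[boundary] = true;
                boundary++;
            }
        }
        for (int i = boundary; i < heap.length; i++) {
            heap[i] = null; // memory beyond the boundary is reclaimed in one go
            live[i] = false;
        }
        return boundary;
    }
}
```

The extra pass that moves objects is where the additional collection time comes from, while the result is a single contiguous free area with no second half kept in reserve.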

However, although this algorithm occupies less memory than the copying algorithm, its garbage collection takes longer (it trades time for space). That is why, as mentioned above, we should try to avoid or reduce the occurrence of Full GC.

The two algorithms can be summed up very concisely:

Copying algorithm: trade space for time
Mark-compact algorithm: trade time for space

In short, you can’t have your cake and eat it too, but each is the best choice for its generation, the new and the old respectively.

Conclusion

A brief recap of the points covered in this article:

For better management of heap memory, the heap is divided into the new generation and the old generation
Collection occurs more frequently in the new generation than in the old generation
A collection in the new generation is called a Minor GC; a GC in the old generation is called a Full GC
The new generation uses the copying algorithm for garbage collection; the old generation uses the mark-compact algorithm
To manage new generation memory more efficiently, based on the copying algorithm and IBM’s research, the new generation is divided into three parts: one relatively large Eden space and two relatively small Survivor spaces, in a ratio of 8:1:1

References

Zhou Zhiming, “In-depth Understanding of Java Virtual Machine”

Explore the principle of Android GC
