preface

In a sunny noon, and colleagues xiao Yong walking in the company downstairs in the small park, see a lot of little sister, thought when can and these little sister together to discuss life ah, happy, hey hey hey.

Pack up your slaver, xiao Yong always makes a sound at this time, quite pleasing (FU) happy (CK). Xiao Yong: smallholder, don’t you advocate garbage classification now? What is garbage after all? Xiao Yong always asks such thoughtful questions when I walk with him! I: what is rubbish, you not be rubbish? Xiao Yong: Fuck you, man. Me: xiao Yong, promise me that we will discuss some relaxed problems when we take a walk in the future, ok? What is garbage, garbage is the object without reference is garbage. Let’s have a lunch break

Me: Come on, that’s it. Don’t you want to be on your resume: familiar with GC algorithms, familiar with garbage collectors, with actual JVM tuning experience? Promise to brighten your mind, and you’ll be fine to tell them when you go in for a future interview.

Xiao Yong: I’m kind of interested, but if you don’t understand, then you’ve wasted my time. Dinner is on you. I have no problem, but my three fans will not say yes to you, Xiao Yong: you have no problem, please start your performance ~

What is garbage

Garbage is an object or objects that have no reference to them (circular references) but still occupy memory space.

GC is an automatic storage management mechanism. When some occupied memory is no longer needed, it should be released. This storage resource management is called garbage collection.

Like our chest, we might hold that many clothes inside, there may be a few months or years will not through the time, but we don’t wear the clothes have been take in our wardrobe (memory), we don’t put these clothes away or give away, so we can put more can wear clothes, this is similar to the “recycling”.

In GC, there are only recyclable and non-recyclable, as shown below:

1.1 differences between Java and C++ garbage collection

Java is you can just throw garbage, Java will automatically handle it for you, while C++ has to handle it manually, but it is easy to cause a problem is to forget to recycle or recycle multiple times

  • java

    1. Garbage disposal by GC
    2. High development efficiency, low execution efficiency
  • C++

    1. Manual garbage disposal
    2. Forgetting to reclaim will result in a memory leak
    3. Reclaim multiple times, illegal access
    4. Develop efficiently and execute efficiently

How to find garbage?

Above we know what is garbage, so how do we find garbage?

The heap holds almost all object instances in Java, and the first thing the garbage collector needs to do before recycling the heap is to determine which of these objects are “alive” and which need to be recycled (that is, objects that are no longer referenced)

There are two algorithms for finding garbage

  • Reference count algorithm
  • Root Searching (Root Reachable Algorithm)

1. Reference counting

A reference counter is added to the object. Whenever it is referenced somewhere, the value of the counter is +1. When the reference is invalid, the value of the counter is -1. An object with a value of 0 cannot be used again.

When become 0 in this picture, this time using reference counting algorithm can determine it is junk, but reference counting method can’t solve a problem, is when the object is a circular reference counter value is 0, the reference counter cannot inform GC collector to recycle them, as shown in the figure below:

That’s when we need to use ourRoot reachable algorithm

2. Root reachable algorithm

When a program is started, the objects immediately needed are called root objects. The so-called root reachable algorithm is to find the root object first, and then follow the line to find those useful. For example, the Java program main() method is run. A main() method starts a thread.

Thread stack variables: Threads have a thread stack and a main stack frame. Objects that start inside main() are our root objects.

Static variable: A class has a static variable that is initialized as soon as it is loaded into memory, so the object that the static variable goes to is called the root object.

Constant pool: If your class uses objects from other classes, these are the root objects.

JNI: If we call classes or objects that are used by native methods written in C and C++

Figure of object5 and object6 although reference each other between them, but couldn’t find it from the root, so is rubbish, and object8 without any reference to natural is rubbish, other Object can be found from the root, so it is useful, will not be garbage collected.

3. The difference between

algorithm thought advantages disadvantages
Reference counting method Add a reference counter to the object. Each time a reference is made to the object, the counter is +1. When the reference is invalid, the counter value is -1 High judgment efficiency It can not solve the situation of reference between objects, the overhead is relatively large, frequent and large number of reference changes, bringing a lot of extra operations
Accessibility analysis Through a series of objects called “GC Roots” as starting points, search down from these nodes. When GC Roots reach an object that is unreachable, the object is recoverable More precise and rigorous, can analyze the circular data structure reference to each other Implementation is complex, requiring analysis of a large amount of data and consuming a lot of time

How to clean up trash

After we find the corresponding garbage, what if we go to clean up the garbage? There are three commonly used GC algorithms:

  • Mark-sweep
  • Copying (copy)
  • Mark-compact (Mark compression)

1. Mark-clear algorithm

Just like its name, the algorithm can be divided into “tags” and “remove” in two stages, first mark all need recycled object, after the completion of the unified recycle all the tags were labeled objects, this is the most basic collection algorithms, why do you say it is the most basic level, the basis for subsequent collector is this line of thinking and its deficiency were improved.

Tag removal algorithm, it has its own small problems, you can see the picture above, we found that cannot be recycled from GC roots, is not green, purple can be recycled, we recycle it after they became free, the algorithm is relatively simple, in the case of live objects more efficiency is higher, it need to experience two scan, The first scan is to find what works, and the second scan is to find what doesn’t and clean it up. There are two problems: A tag is efficiency, and the removal of two process efficiency is not high, another is the space problem, the tag will produce a large number of discrete space debris after clear, if too much space debris will lead to future application in the running process needs to allocate large objects, unable to find enough contiguous memory and had to trigger another garbage collection action in advance.

2. Replication algorithm

In order to solve the problem of efficiency, so the emergence of a copy (Copying) algorithm, it will be available memory according to the capacity is divided into two pieces of equal size, using only one piece, every time when it ran out of a piece of memory, will also living object assignment to another piece of the above, then has used the memory space of a clear, So that every time memory on the whole half area for recycling, also won’t have to consider when the memory allocation of memory fragments, such as complicated situation, only need to move on the top of the heap pointer, this applies to the circumstances of less live objects, so suit to Eden, scan only once, and improves the efficiency without fragmentation, but will cause the waste of the space, to reduce to half of the original memory, This is too high, and moving the copied object requires adjusting the object’s reference

3. Mark compression algorithm

Tag compression is the process of putting everything in order, and the process of cleaning up is compressed at the same time. Before recycling, useful to the front walk, will sort out the remaining, but mark compression algorithm still has its problems, we found that cannot be recycled by GC Roots of object, and then to move forward, cannot be recovered at this time we need the scanning twice and moving objects, the first time scanning the useful object, a second time to move, And moving around requires synchronization if it’s multi-threaded, so this is much less efficient, but it doesn’t fragment, and allocating objects doesn’t halve memory.

4. To summarize

  • Mark-sweep: As soon as it is marked as garbage, it is then swept away. The rest of the space is still fixed, and it is fairly efficient, but it is prone to debris
  • Copying: It’s more efficient to cut memory in two and use only one half. If there’s too much garbage, copy what’s useful to another one and then clean up the rest directly from memory
  • Mark-compact: Bring all the objects together, clean up all the garbage, and then the remaining space is still contiguous, so you allocate whatever you want to allocate directly into it

Logical partition of heap memory

Hot Spots in the JVM use generational algorithms

The new generation is divided into Eden and Survivor

Eden (Eden) : The default ratio is 8: it is the area survivor that we throw into after new objects are created. The default ratio is 1: it is the area survivor that we run to after recycling once. Different objects are installed in this area, so different algorithms are adopted

Because there are very few living objects in the new generation and many dead objects, the algorithm used is the replication algorithm

Living objects are particularly suitable for: tag clearing and tag compression algorithms

An object goes from birth to death

After an object is generated, it will be allocated on the stack first. If it is not allocated on the stack, it will enter Eden area. Eden area will enter Surivivor area after garbage collection, and survivor area will enter another survivor area after garbage collection. At the same time, some objects in the Eden region also enter another survivot, and when they are old enough, they enter the old region, which is a logical movement of the whole object.

So when is it going to be allocated on the stack, and when is it going to be allocated in Eden?

1 stack allocation

Stack allocation:

  • Thread-private small object: small object, thread-private
  • No escape: used in a piece of code that no one else knows about
  • Support for scalar substitution: This means that replacing an object with a common attribute and a common type is called scalar substitution

Allocation on the stack is faster than Allocation on the heap. If the Allocation on the stack is insufficient, the Allocation on the stack takes precedence over TLAB(Thread local Allocation Buffer) : In Eden area, many threads will allocate objects to it, but when allocating objects, we will certainly carry out space expropriation. Whoever grabs the object will be the one who gets it. The efficiency of multi-threaded synchronization will be reduced, so THE TLAB mechanism is designed

  • Occupying Eden, the default value is 1%. It takes 1% of the space in Eden, which is called thread-only space. When allocating objects, it first allocates the space unique to the thread
  • Multi-threaded users can apply for space without competing for Eden, which improves efficiency

2 the old s

When does the object enter the old age?

How many times has it been recycled into the old age?

  • More thanXX:MaxTenuringThresholdSpecified times (YGC)
    1. Avenge 15 times into the Old age
    2. CMS entered the old age 6 times
    3. G1 enters the old age 15 times

The Internet says you can increase the number of times, this is impossible

Dynamic age judgment

In order to be able to adapt to the memory status of different programs, virtual machines do not always require that the age of objects must reach MaxTenuringThreshold to be promoted to the old age. If the sum of all object sizes of the same age in Surivivor space is greater than half of Survivor space, Objects older than or equal to this age can go straight to the old age without waiting until the age specified in MaxTenuringThreshold.

Copy and copy between two Survivor as long as more than 50 percent of the time put the oldest into the old zone, that is not necessarily 15 years old.

Copy there is so many objects in s1 to s2 in more than fifty percent, s1 inside inside the area and Eden, the entire copy an object at once into the s2, after a garbage collection, after the past, this time the whole combined object already more than half of s2, which some of the oldest object directly into the old district, This is called dynamic youth judgment

Large object directly into old age, the so-called large object refers to need large amounts of memory space Java objects in a row, the most typical large objects is that kind of a long string and array, often appear large objects can easily lead to memory and triggers garbage collection in advance when I was a lot of space to get enough contiguous memory space

First start new an object, then that is allocated on the stack, if able to allocated on the stack, is allocated on the stack, stack pop-up directly, pop-up end, if not allocated on the stack, judge whether the object is large objects, if it is a large object, directly into old age, after the FGC end if not, enter the thread local distribution (TLAB), In any case, GC cleaning will be carried out in Eden area. If the cleaning is completed, it will be directly ended; if not, IT will enter S1, and S1 will continue GC cleaning; if it is old enough, it will enter S2, and S2 will continue GC cleaning

MinorGC/YGC: Triggered when the young generation runs out of space. MajorGC/FullGC: triggered when the old generation can no longer allocate space

Common garbage collector

Cenozoic collectors: Serial, ParNew, Parallel Insane

Serial Old, CMS, Parallel Old collector

Cenozoic and old collectors: G1, ZGC, Shenandoah

Each garbage collector does not operate independently. The following figure shows that there are wires between garbage collectors that can be used cooperatively:

New generation garbage collector

1. Serial collector

Serial collector is the most basic and oldest collector, collector is a single thread, it’s the meaning of the “single thread” is not to say that he will only use a single processor or a collection of threads to complete garbage collection, more important is to emphasize on its garbage collection, will suspend all other worker threads, until it is about the end of the collection

When Serial is running, it pauses all The threads, “stops The World”, and when The GC is finished, The application thread continues executing. It’s like if you have three girlfriends and they want you to go shopping with them at The same time, you can only go with one of them before The other one. The other girls have to wait while one is with them, but garbage collection is more complicated than that!

Advantages: Because of the single-threaded approach, this is one of the most efficient disadvantages of the other types of collectors for a single CPU: the risk and experience of suspending all threads without the user’s knowledge and control is not acceptable

Use the command to enable Serial as a Cenozoic collector

-xx :+UserSerialGC # select Serial as the new generation garbage collector

2. ParNew collector

The ParNew collector is essentially a multithreaded parallel version of The Serial collector. In addition to using multiple threads at The same time for garbage collector, The rest of The control parameters available to The Serial collector, collection algorithm, Stop The Wrold, object allocation rules, and so on are exactly The same as The Serial collector. On multi-core machines, the number of mobile threads enabled by default is the same as the number of cpus, but this can be modified by the parameters

-xx :ParallelGCThreads # Set the number of threads for JVM garbage collection

The ParNew collector does not offer much innovation over the Serial collector, except that it supports multi-threaded parallel collection, but it is the preferred new generation collector for HotSpot VMS running in server mode, especially legacy systems prior to JDK 7. One reason that has nothing to do with functionality or performance but is important is that it is currently the only one besides the Serial collector that works with the CMS collector.

Advantages: With efficient UTILIZATION of CPU, there are advantages for efficient utilization of system resources in GC. Disadvantages: Same as Serial. ParNew is the preferred next-generation collector for many virtual machines running in Server mode, because CMS can only be used with Serial or ParNew. In today’s multi-core environment, multi-threaded parallel ParNew is preferred. ParNew collector is used when CMS is activated (using -xx: +UseConcMarkSweepGC option), it can also be forcibly specified or disabled using the -xx: +/ -useparnewGC option

Applicability. 3

The Parallel Avenge collector is a new generation of collector. It is based on the mark-copy algorithm and the Parallel Avenge collector. The Parallel avenge collector is a multi-threaded collector that can be exploited as a Parallel collector. While collectors such as CMS focus on minimizing the downtime of user threads during garbage collection, the Parallel Scavenge collector aims at a controlled throughput, which is the amount of time the processor spends running user code as a percentage of the total consumption of the processor, as shown in the following figure:

If the virtual machine completes a task and the user code plus garbage collection takes 100 minutes, of which garbage collection takes 1 minute, then the throughput is 99%. The shorter the pause time, the better the application needs to interact with the user or ensure the quality of service response, good response speed can improve the user experience.

The garbage collector collects every 100 seconds and pauses for 10 seconds, and the garbage collector collects every 50 seconds and pauses for 7 seconds, which is shorter but has lower overall throughput and lower overall CPU utilization.

Collect frequency Each pause time throughput
Collect every 100 seconds 10 seconds 91%
Collect every 50 seconds 7 seconds 88%

You can set the maximum amount of time for the collector to complete memory collection by -xx :MaxGCPauseMillis, and you can control throughput precisely by -xx :GCTimeRatio.

Parallel collector and Parallel Old collector combined garbage collection diagram, in the new generation, when the user threads are executed to the safe point, all threads suspend execution, ParNew collector to multithreading, using the replication algorithm garbage collection work, after collection, The user thread continues to start execution; In the Old days, when all user threads had reached a safe point, all threads would suspend execution. Parallel Old collector uses multiple threads and a tag collation algorithm for garbage collection.

Old age garbage collector

1. Serial Old collector

Serial Old is an older version of the Serial collector, which is also a single-threaded collector using a mark-collation algorithm. The main purpose of this collector is to be used by HotSpot virtual machines in client mode. If used on the server side with the Parallel Scavenge collector, the other is used as a fallback in case the CMS collector fails.

Serial collector and Serial Old collector running schematic:

Applicable scenarios:Client mode; Single-core server; Avenge with the Parallel Scavenge; As a fallback to the CMS collector in the event of Concurrent Mode Failure in Concurrent collections

Parallel Old collector

The Parallel Old Exploiter is an older version of the Parallel Avenge collector, which supports concurrent collection through multiple threads and is based on the mark-collation algorithm to exploit the computing power of multi-core cpus. Insane /Parallel Avenge

2. CMS collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is to obtain the shortest collection pause time. CMS collector is based on mark-clear algorithm implementation, the operation of this collector is a little more complex than the previous several collectors, the whole process is divided into four steps:

1) CMS Initial mark: Just mark objects that GC Roots can be directly associated with, which is very fast

CMS Concurrent mark: The process of traversing the entire object graph from directly associated objects in GC Roots, which takes a long time but does not require the user thread to pause and can be run concurrently with the garbage collection thread

3) CMS remark: During the correction of concurrent marking, the pause time in this stage will be slightly longer than that in the initial marking stage due to the continuous operation of the user program

CMS Concurrent sweep: Removes dead objects judged by the marking phase, which can also be concurrent with user threads because there is no need to move living objects.

The initial tagging and re-tagging steps still require “Stop The World” to suspend all user threads. Since The garbage collector thread can work with The user thread during The concurrent tagging and concurrent cleanup phases, which are The longest in The process, in general, The CMS collector’s memory reclamation process is executed concurrently with the user thread, as shown below:

Advantages:CMS collector is an excellent collector, it is mainly reflected in: concurrent collection, low pause.

Disadvantages:

The CMS collector is very sensitive to processor resources, and while it does not cause user threads to pause during the concurrent phase, it can slow down the application and degrade overall throughput by taking up a portion of the threads. The default number of threads for CMS collection is (number of processor cores +3)/4. That is to say, if the number of processor cores is greater than or equal to four, garbage collection threads will occupy no more than 25% of processor computing resources in concurrent collection. Processor resources will decrease with the increase of CPU. The CMS’s impact on user programs can become significant. The CMS collector is unable to handle The “floating garbage”, which may result in a “Concurrent Mode Failure” leading to another Full “Stop The World” GC during The CMS Concurrent marking and Concurrent cleanup phases. The user thread is still continuing, and the program will naturally generate new garbage objects during the running, but this part of the garbage object is after the marking process, CMS can not deal with them in the current collection, so it has to be left for the next garbage collection to clean up. This part of the garbage is called a “floating garbage” because of a CMS is a collector based on “tag – clear” algorithm, therefore when it has a lot of space debris gather over, space debris too much, will bring big trouble to large objects, may have to ahead Full GC operation, but through parameters: – XX: +UseCMS-CompactAtFullCollection for optimization

New generation and old generation garbage collectors

G1 collector

The Garbage First (G1 for short) collector is a milestone in the history of Garbage collector technology, which pioneered the idea of collector design oriented to local collection and region-based memory layout.

The G1 collector, a garbage collector for server-side applications, became the default garbage collector in server-side mode when JDK9 was released, while CMS was reduced to a deprecated collector

Features:

In G1 collector before all other collector, target range are either new or old age, or Java heap, but G1 do the comprehensiveness, any part of it can be geared to the needs of heap memory to form back to collected for recycling, measure is no longer belongs to which it points, but the block of memory to store up the amount of waste, recycling benefits most, This is the Mixed GC mode of the G1 collector. The region-based heap memory layout pioneered by G1 is key to achieving this goal.

Although G1 still retains the concepts of Cenozoic and old age, Cenozoic and old age are no longer fixed. They are both dynamic collections of a series of regions. G1 can build predictable pause time models because it uses regions as the minimum unit of a single collection

G1 no longer insists on a fixed size and number of generation regions, but divides the continuous Java heap into multiple independent regions of equal size. Each Region can act as the Eden space, Survivor space or old chronspace of the new generation according to needs. The collector can apply different policies to regions that play different roles.

Region A special Humongous Region used to store large objects. G1 considers an object that is more than half the size of a Region to be a large object.

The G1 collector runs as follows:

  • Initial Marking: Mark objects to which GC Roots can be directly associated, and change the value of the TAMS pointer so that the next phase of user threads running concurrently will allocate new objects correctly in the available regions. This requires a shorter pause thread, but is done synchronically with the Minor GC, so there are no actual additional pauses during this phase

  • Concurrent Marking: Analyze the reacability of objects in the heap starting from GC Roots, recursively scan the object graph in the whole heap to find the objects to be reclaimed. This stage is time-consuming, but can be executed concurrently with user programs.

  • Final Marking: Another short pause is made for the user thread, and the user processes the last few SATB records that remain after the concurrent phase is over

  • Live Data Counting and Evacuation: Update statistics of regions, sort the reclamation value and cost of each Region, and make a reclamation plan based on the pause time expected by users. You can select multiple regions to form a collection, and then assign the surviving objects of the Region to be reclaimed to an empty Region. Then clear the entire Region.

conclusion

Do you understand, Yong? Little yong, little yong, don’t fall asleep, I’m not finished yet! Xiao Yong wake up!! Small brave daze of say: how, get off work? . Off duty what? Do you understand what I mean by “GC”? Xiao Yong: I understand. I will go to the interview tomorrow. What you said is great! . I have put everything in my notes, if you are interested, you can come and have a look, that’s all for today, let’s go up

end….

I am a small farmer, afraid of what infinite truth, further have further joy, everyone come on!