Preface

JVM tuning is a must for Java developers who want to improve the performance of their applications.

JVM tuning is a body of knowledge without a single clearly defined right way to do it. Every program has different priorities, so the only way to find the right settings is to keep adjusting them based on how the program actually runs.

Readers may think this sounds like nonsense: if there is no clear definition of how to tune, is tuning just guesswork?

Of course not. You do not need to know everything about tuning, but you do need to understand the class loading mechanism, the garbage collection (GC) mechanism, and the JVM configuration parameters.

Only by understanding the underlying logic of how a Java program runs can we accurately determine which parameters the current program needs and what effect they will have. This is why Java programmers are so often asked about the JVM in interviews: it is not rocket science, just fundamentals.

Almost all information about the garbage collection mechanism comes from the documents that Oracle makes publicly available, which is why descriptions of it look almost the same everywhere.

This article covers the garbage collection mechanism:

  • How to mark garbage?
  • How to do garbage collection?
  • Which garbage collectors are the concrete implementations of garbage collection?
  • What is the difference between these collectors?

Automatic garbage collection

Automatic garbage collection is the process of looking at heap memory, identifying which objects are in use and which are not, and deleting the unused objects. An in-use object, or referenced object, is one that some part of the program still holds a pointer to. An unused object, or unreferenced object, is no longer referenced by any part of the program, so the memory it occupies can be reclaimed.

In programming languages like C, allocating and freeing memory is a manual process. In Java, deallocating memory is handled automatically by the garbage collector.

How to determine whether memory needs to be reclaimed

This process is called marking. This is where the garbage collector identifies which memory is in use and which is not

How to mark

Objects to reclaim

Reference counting

The basic idea of reference counting is to attach a counter to each object: the counter is incremented by 1 whenever a new reference to the object is created and decremented by 1 whenever a reference goes away, and an object whose count reaches 0 is reclaimed. The problem with this approach is circular references. Suppose object A refers to object B, and object B refers to object A. When the GC runs, neither object can be collected, because each still has a reference pointing to it.

When the collector tries to reclaim A, it finds that B still references A, so A cannot be reclaimed. When it tries to reclaim B, it finds that A still references B, so B cannot be reclaimed either. This is the circular reference problem.
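The circular reference problem can be reproduced with a toy counter. The following is a minimal sketch, not how HotSpot manages memory: the Node class, its refCount field, and setField are hypothetical names invented for this illustration.

```java
// Toy model of reference counting; all names here are illustrative only.
final class RefCountDemo {
    static final class Node {
        final String name;
        int refCount;   // number of references currently pointing at this node
        Node field;     // a single outgoing reference

        Node(String name) { this.name = name; }

        // Re-points the outgoing reference and keeps both counters up to date.
        void setField(Node target) {
            if (field != null) field.refCount--;
            field = target;
            if (target != null) target.refCount++;
        }
    }

    /** Returns the counts of A and B after the program drops its own references. */
    static int[] countsAfterDroppingExternalRefs() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++;      // reference held by the program
        b.refCount++;      // reference held by the program

        a.setField(b);     // A -> B
        b.setField(a);     // B -> A

        a.refCount--;      // the program drops its reference to A
        b.refCount--;      // the program drops its reference to B
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalRefs();
        // Neither count reaches 0, so a pure reference counter never frees A or B.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]); // A=1, B=1
    }
}
```

Both objects are garbage from the program's point of view, yet each keeps the other's count at 1.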

Reachability analysis

In simple terms, think of objects and their references as a graph. A set of live objects is selected as GC Roots, and the reference chains are traced from them. An object is considered reclaimable if it is unreachable from every GC Root, meaning no reference chain leads to it.

The JVM starts garbage collection by finding all objects that can serve as GC Roots. It then follows the references from each GC Root downward, marking every object it reaches as in use. Once marking is complete, the objects that were not marked are reclaimed.

A GC Root is the starting point of a reference chain. If a root goes away, the objects that were reachable only through that root become reclaimable. In other words, an object that cannot be reached from any root is considered garbage.
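The marking step can be pictured as an ordinary graph traversal. Here is a minimal sketch under simplifying assumptions: the Obj class and the mark method are hypothetical, and a real JVM marks objects in the heap rather than collecting them into a Set.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model of reachability analysis; not JVM internals.
final class ReachabilityDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();   // outgoing references
        Obj(String name) { this.name = name; }
    }

    /** Marks every object reachable from the given roots. */
    static Set<Obj> mark(List<Obj> gcRoots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    /** Builds root -> a -> b plus an isolated cycle c <-> d, then marks from root. */
    static Set<String> demoReachableNames() {
        Obj root = new Obj("root");
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but nothing reachable from the root points at them

        Set<String> names = new TreeSet<>();
        for (Obj o : mark(Arrays.asList(root))) names.add(o.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demoReachableNames()); // [a, b, root]
    }
}
```

The cycle between c and d is never marked, so unlike reference counting, this approach treats it as garbage.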

Objects that can serve as GC Roots

  • Objects referenced from the virtual machine stack
  • Objects referenced from the native method stack
  • Objects referenced by static fields
  • Objects referenced by constants in the method area

Reference type and reachability level

The reachability algorithm relies on the concept of a reference, and Java has several different kinds of references.

In addition to GC Roots, understanding reachability analysis requires understanding reference types and reachability levels.

Reference types

  • Strong reference (an ordinary object reference; there is no dedicated class): the most common kind. An object is never reclaimed as long as a strong reference to it exists
  • Soft reference (SoftReference): the JVM only attempts to reclaim softly referenced objects when it considers itself low on memory. Typical use: caches
  • Weak reference (WeakReference): although it is a reference, it does not keep the object alive, so the object can be reclaimed at any time
  • Phantom reference (PhantomReference): the object cannot be accessed through it. It provides a mechanism for running specified logic after the object has been finalized (for example, Cleaner)
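A short sketch of how these reference types look in code, using the classes in java.lang.ref. The class and method names ReferenceTypesDemo and demo are made up for this example. It only shows behavior that does not depend on when the GC runs: while a strong reference exists, soft and weak references still return the object, and a phantom reference never returns it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceTypesDemo {
    /** Returns {soft sees object, weak sees object, phantom sees nothing}. */
    static boolean[] demo() {
        Object payload = new Object();   // strong reference: keeps the object alive

        SoftReference<Object> soft = new SoftReference<>(payload);
        WeakReference<Object> weak = new WeakReference<>(payload);
        // A phantom reference is always registered with a queue.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(payload, queue);

        // The object is still strongly reachable here, so get() returns it.
        boolean softSeesObject = soft.get() == payload;
        boolean weakSeesObject = weak.get() == payload;
        // PhantomReference.get() always returns null by design.
        boolean phantomSeesNothing = phantom.get() == null;

        return new boolean[] { softSeesObject, weakSeesObject, phantomSeesNothing };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("soft=" + r[0] + ", weak=" + r[1] + ", phantom returns null=" + r[2]);
    }
}
```

What happens once the strong reference is gone depends on the collector and on memory pressure, so the sketch deliberately does not try to demonstrate it.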

Reachability levels

  • Strongly reachable: the object can be reached by one or more threads without going through any reference object
  • Softly reachable: the object can only be reached through a soft reference
  • Weakly reachable: the object can only be reached through a weak reference. Once the weak reference is cleared, the object is eligible for destruction
  • Phantom reachable: no other references remain, the object has been finalized, and only phantom references point to it
  • Unreachable: the object can be reclaimed

Summary

Reachability analysis finds the objects that can act as entry points (these are usually referenced from constants and stacks), then traces from them to find and mark every object that is still in use. Once marking is complete, the unmarked objects are unreachable and can be reclaimed.

Method area recovery

Classes and class metadata stored in the method area are unloaded when the class loader that loaded them is reclaimed.

Garbage collection algorithm

Mark-sweep algorithm

All objects to be reclaimed are first marked and then swept.

Marking and sweeping are relatively inefficient, and the algorithm causes memory fragmentation, so it is not suitable for very large heaps.

The other collection algorithms are improvements on the mark-sweep idea.

Memory fragmentation means that free memory is not contiguous. Picture a memory area made of four small blocks of 8 bytes each, 32 bytes in total. After blocks 2 and 4 are reclaimed, 16 bytes are free, but the free blocks are separated by blocks 1 and 3, which are still in use. A new 16-byte object cannot be placed, because blocks 2 and 4 hold only 8 bytes each and are not adjacent. If blocks 1 and 3 were placed next to each other, the same memory area could still hold a 16-byte object.
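The four-block example can be checked with a few lines of code. This is only a sketch of the arithmetic: the boolean array stands in for the four 8-byte blocks, and FragmentationDemo and its methods are hypothetical names.

```java
// Models a tiny heap as blocks that are either in use (true) or free (false).
final class FragmentationDemo {
    static final int BLOCK_BYTES = 8;

    /** Total free bytes, regardless of where the free blocks are. */
    static int totalFree(boolean[] inUse) {
        int free = 0;
        for (boolean used : inUse) {
            if (!used) free++;
        }
        return free * BLOCK_BYTES;
    }

    /** Size in bytes of the largest run of adjacent free blocks. */
    static int largestContiguousFree(boolean[] inUse) {
        int best = 0;
        int run = 0;
        for (boolean used : inUse) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best * BLOCK_BYTES;
    }

    public static void main(String[] args) {
        // Blocks 1 and 3 are live, blocks 2 and 4 were just swept.
        boolean[] fragmented = { true, false, true, false };
        System.out.println(totalFree(fragmented));             // 16
        System.out.println(largestContiguousFree(fragmented)); // 8

        // Same live data, but blocks 1 and 3 sit next to each other.
        boolean[] packed = { true, true, false, false };
        System.out.println(largestContiguousFree(packed));     // 16
    }
}
```

In both layouts 16 bytes are free, but only the packed layout can satisfy a 16-byte allocation.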

Copying algorithm

Divide memory into two areas of equal size, and copy the live objects into the other area during collection.

Objects are stored sequentially as they are copied into the new area, so the copying algorithm avoids memory fragmentation. The drawback is that the copying work and the reserved area waste a certain amount of memory.

The waste is easy to see: an area of memory must always be kept in reserve for the live objects. If your program needs 8 GB of memory to run, you have to set aside another 8 GB just so the copying algorithm can avoid fragmentation.
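A minimal sketch of the copy step, assuming a toy heap where each array slot holds one object and null stands for an object already found to be dead. CopyingDemo and copyLive are illustrative names, and a real collector would also update every reference to the moved objects.

```java
import java.util.Arrays;

// Toy semispace copy: live entries move to the front of a second, equally sized area.
final class CopyingDemo {
    /** Copies live (non-null) entries from fromSpace into a fresh toSpace. */
    static String[] copyLive(String[] fromSpace) {
        String[] toSpace = new String[fromSpace.length]; // the reserved second area
        int next = 0;                                    // next free slot in toSpace
        for (String obj : fromSpace) {
            if (obj != null) {
                toSpace[next++] = obj;                   // survivors are packed sequentially
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        String[] fromSpace = { "A", null, "C", null };
        System.out.println(Arrays.toString(copyLive(fromSpace))); // [A, C, null, null]
    }
}
```

The survivors end up contiguous, and the price is the second array, which sits unused between collections.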

Mark-compact algorithm

Similar to mark-sweep, but to avoid memory fragmentation it moves the surviving objects during cleanup so that they occupy contiguous memory.

In essence it is mark-sweep with an additional compaction step, whose main purpose is to avoid the memory fragmentation problem.
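The compaction step can be sketched as sliding the survivors toward the start of the same area, with no second area needed. As before, this is a toy model: null marks a dead slot, CompactDemo and compact are hypothetical names, and a real collector must also fix up references to the objects it moves.

```java
import java.util.Arrays;

// Toy in-place compaction: survivors slide to the front of the same array.
final class CompactDemo {
    /** Moves live (non-null) entries to the front; returns how many are live. */
    static int compact(String[] heap) {
        int free = 0;                       // first slot not yet occupied by a survivor
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                if (i != free) {
                    heap[free] = heap[i];   // slide the survivor down
                    heap[i] = null;         // its old slot becomes free
                }
                free++;
            }
        }
        return free;
    }

    public static void main(String[] args) {
        String[] heap = { "A", null, "C", null };
        int live = compact(heap);
        System.out.println(live + " live: " + Arrays.toString(heap)); // 2 live: [A, C, null, null]
    }
}
```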

Generational collection

Each algorithm has its own advantages and disadvantages, so which one should be used for garbage collection? The JVM has already worked this out for us. Since every algorithm has strengths and weaknesses, the heap is divided into several regions according to object lifetimes, and each region uses the collection algorithm that suits it best. This is generational collection.

The HotSpot heap is divided into two main regions: the young generation and the old generation.

The young generation is further divided into three regions:

  • Eden (stores newly created objects)
  • Survivor0 (stores objects from Eden that survived a garbage collection)
  • Survivor1 (stores objects from Survivor0 that survived a garbage collection)

Default size ratios of the generations

  • Young generation, S0:S1:Eden -> 1:1:8
  • Young generation : old generation -> 1:2 (HotSpot's default -XX:NewRatio=2)
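To make the 1:1:8 split concrete, here is a small arithmetic sketch. It assumes the split is controlled by HotSpot's -XX:SurvivorRatio flag, where a value of 8 means Eden is 8 times the size of one Survivor space. YoungGenSizing and split are hypothetical names, and the real JVM aligns and may adapt these sizes, so the numbers are approximate.

```java
// Back-of-the-envelope split of the young generation for a given survivor ratio.
final class YoungGenSizing {
    /** Returns {eden, survivor0, survivor1} in the same unit as youngGenSize. */
    static long[] split(long youngGenSize, int survivorRatio) {
        // The young generation holds one Eden plus two Survivor spaces:
        // survivorRatio parts for Eden and one part for each Survivor.
        long survivor = youngGenSize / (survivorRatio + 2);
        long eden = youngGenSize - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    public static void main(String[] args) {
        long[] sizes = split(100, 8);   // a 100 MB young generation, ratio 8
        System.out.println("Eden=" + sizes[0] + " S0=" + sizes[1] + " S1=" + sizes[2]);
        // Eden=80 S0=10 S1=10
    }
}
```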

New objects are allocated in Eden. An object larger than the -XX:PretenureSizeThreshold threshold is treated as a large object and allocated directly in the old generation.

The following figure

An important part of understanding generational collection is understanding why each generation exists and how the GC collects each one.

The figure above illustrates generational collection and its use of different algorithms: each area gets the algorithm that runs most efficiently there. The young generation uses the copying algorithm, and the old generation uses the mark-compact algorithm.

Objects in the young generation are short-lived, typically living only from the start to the end of a thread's work, so few of them survive a collection and the copying algorithm is used.

Objects in the old generation live for a long time and may be large, so the mark-compact algorithm is used to avoid wasting memory.

Garbage collector

Having covered marking and the collection algorithms, let's turn to what actually carries out garbage collection: the garbage collectors.

Java has many garbage collectors, and they differ in the collection algorithm they use, whether they run serially or in parallel, and how long their GC pauses are.

Serial collector

A single thread performs all garbage collection work. It is suitable for single-processor machines and is the JVM's default collector in Client mode.

Serial GC

Enabled with -XX:+UseSerialGC

The serial collector for the young generation, which uses the copying algorithm.

Serial old

Enabled together with Serial GC by -XX:+UseSerialGC; HotSpot has no separate flag for it

The serial collector for the old generation. Unlike the young generation's copying algorithm, it uses the mark-compact algorithm.

Overview

Green represents the user threads and red represents the GC thread. The collector starts a single thread to do the marking and cleaning. The most important thing to note is that while the GC is working, the user threads are stopped and must wait for the GC thread to finish before user code runs again. This applies to both the young generation and the old generation.

The serial collector cannot clean up garbage while garbage is still being produced, which is easy to understand. User threads constantly create objects as they run and release many of them afterwards. Because the serial collector has only one GC thread, it has to stop the user threads, finish the collection, and only then let them continue. It is like cleaning up trash while someone keeps throwing more of it on the other side: the job would never get done. This collector is rarely used today and is mostly found in client applications; servers with multi-core CPUs generally do not use it.

A major goal of GC tuning is to reduce GC time so the program runs faster. The program is paused during a collection, so if the pause is long, the whole program responds slowly and the user experience suffers.
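Before tuning, it helps to see how much time the running JVM has actually spent in GC. One way, sketched below, is the standard java.lang.management API. The class name GcTimeReport is made up, and the reported time is the approximate accumulated collection time per collector, which for concurrent collectors is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints, for each collector in this JVM, how often it ran and for how long.
final class GcTimeReport {
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", approx. time (ms)=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The collector names depend on which GC the JVM was started with.
        System.out.print(report());
    }
}
```

The collector names in the output also reveal which young-generation and old-generation collectors are active.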

Parallel collector

The JVM's default GC in Server mode. The overall algorithm is similar to Serial GC; the difference is that both the young-generation and old-generation collections are performed by multiple threads in parallel.

Parallel GC

Enabled with -XX:+UseParallelGC

The parallel collector for the young generation, which uses the copying algorithm.

ParNew GC

Enabled with -XX:+UseParNewGC

A parallel collector for the young generation that uses the copying algorithm. It is essentially a multithreaded version of Serial GC.

Its most common use is as the young-generation partner of the old-generation CMS collector.

Parallel Old GC

Enabled with -XX:+UseParallelOldGC

The parallel collector for the old generation, which uses the mark-compact algorithm.

Overview

The parallel collector is also known as the throughput-first GC: throughput = application run time / (application run time + GC time)

You can set a goal for GC pause time or throughput, and the collector can automatically adjust the Eden and Survivor sizes and the MaxTenuringThreshold value.

Common parameter settings for this collector

  • -XX:ParallelGCThreads: sets the number of threads used for garbage collection. It can usually be set equal to the number of CPUs
  • -XX:MaxGCPauseMillis: sets the maximum garbage collection pause time. The value is an integer greater than 0
  • -XX:GCTimeRatio: sets the throughput goal. The value is an integer between 0 and 100
  • -XX:+UseAdaptiveSizePolicy: enables the adaptive GC policy, which automatically balances heap size, throughput, and pause time
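The throughput formula and the GCTimeRatio parameter can be connected with a little arithmetic. The sketch below assumes the meaning of -XX:GCTimeRatio=n given in Oracle's tuning documentation, where GC is allowed roughly 1/(1+n) of the total time. ThroughputDemo and its methods are hypothetical names.

```java
// Relates measured throughput to the goal implied by -XX:GCTimeRatio.
final class ThroughputDemo {
    /** throughput = application run time / (application run time + GC time) */
    static double throughput(double appTime, double gcTime) {
        return appTime / (appTime + gcTime);
    }

    /** Throughput goal implied by GCTimeRatio=n: GC may take 1/(1+n) of total time. */
    static double goalFromGcTimeRatio(int n) {
        return 1.0 - 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        // 99 seconds of application work and 1 second of GC.
        System.out.println(throughput(99, 1));          // 0.99
        // GCTimeRatio=19 allows about 5% of total time for GC.
        System.out.println(goalFromGcTimeRatio(19));
    }
}
```

A higher GCTimeRatio therefore asks for less time in GC, which the collector typically tries to achieve by growing the heap.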

The defining characteristic of the parallel collector is that it turns the single-threaded processing of the serial collector into multithreaded processing.

But even with parallel collection, stop-the-world pauses still exist: the user threads cannot run while the GC is working. These pauses are usually very short, and users will hardly notice them.

Concurrent collector

All of the serial and parallel collectors described so far have the stop-the-world problem, and long GC pauses are unfriendly to the whole application. The JVM team has kept looking for and optimizing alternative implementations.

CMS (Concurrent Mark Sweep)

Enabled with -XX:+UseConcMarkSweepGC

Designed for the old generation and based on the mark-sweep algorithm, with the goal of minimizing pause times.

Because it does as much work as possible concurrently with the user threads, it reduces pause times. This matters for latency-sensitive systems such as Internet services, many of which still use CMS today.

Not every CMS step runs concurrently with the user threads. It still has stop-the-world pauses, mainly during the initial mark and remark phases. Concurrent mark and concurrent sweep, however, run alongside the user threads.
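The four phases named above can be summarized in a small table-like sketch. This is only a simplified model for reading purposes: the enum and its names are made up, and the real collector has additional phases that are left out here.

```java
// Simplified view of the CMS cycle: which phases pause the user threads.
final class CmsPhases {
    enum Phase {
        INITIAL_MARK(true),       // stop-the-world
        CONCURRENT_MARK(false),   // runs alongside user threads
        REMARK(true),             // stop-the-world
        CONCURRENT_SWEEP(false);  // runs alongside user threads

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (Phase p : Phase.values()) {
            sb.append(p)
              .append(p.stopTheWorld ? ": pauses user threads" : ": concurrent with user threads")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```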

Because it uses the mark-sweep algorithm, it suffers from memory fragmentation, and over long runs a Full GC can occur, causing a bad pause.

CMS also uses more CPU resources, competing with the user threads for them.

CMS was deprecated in JDK 9 and removed in JDK 14, because there was little room left to improve its overall GC performance and algorithm.

G1

The G1 collector became available in Java 7. It is designed for large heaps, balances throughput and pause time, was designed to replace CMS, and has been the default collector since JDK 9.

G1 divides the heap into fixed-size regions, each of which can belong to either the young generation or the old generation.

Between regions G1 uses a copying algorithm, but viewed as a whole it behaves like a mark-compact algorithm, which effectively avoids memory fragmentation.

Left figure: overall memory layout

  • Red: the young generation (Eden and Survivor)
  • Blue: the old generation
  • A Full GC is performed when a large enough block of free memory cannot be found

Right figure: GC execution phases. Each circle represents a stop-the-world pause

  • Yellow: marking
  • Blue: young-generation collection
  • Red: mixed collection (besides collecting garbage, it also defragments the space)

The advantage of G1 is most apparent when the Java heap is particularly large, because it breaks the heap into many small regions and manages each region individually.

This brings a noticeable improvement to the system. Overall, both its throughput and its pause times are quite good.

Garbage collector combination

There are seven garbage collectors in total: some for the young generation, some for the old generation, and G1, which handles both.

In JDK 8, the default combination is Parallel Scavenge + Parallel Old.

In most cases the default garbage collector is the right choice. Switch to another collector and adjust its parameters only when a change is really necessary.