1. Theoretical basis of G1

Blogs and books have been read about the G1 collector collection process, and almost nine times out of ten they tell a different story. To determine which one is correct, you have to look at the implementation code. Of course, I am not going to learn C language to see the code, the next will be combined with their own understanding and information to introduce G1 collector collection process, and strive to do the first contact G1 can also see.

1.1. The G1 is introduced

G1 was a landmark product, almost 10 years from its introduction to commercial use, and the default garbage collector from JDK9 onwards is G1, which will gradually replace CMS. G1 has significant differences from previous garbage collectors. For example, it can do new era and old era collection, while other collectors are a combination of Cenozoic collectors and old era collectors; At the same time, the division of the heap is obviously different from the traditional division of the heap.

If the permanent generation is ignored, the heap can be divided into 1 Eden + 2 Survivor of the new generation and a large chunk of the old generation and then collected by generation, as shown in the figure below:

In the G1 collector, the whole heap is divided into many regions of the same size, with the size of two to the NTH power, and regions are not required to be spatially contiguous. Each Region can play one of three roles: Eden, survivor, and Old. There’s also a humonous area, which is used to store a large object, and a large object is defined as an object half the size of a region, because G1 treats it as an old age, so I put it together as an old age. In addition, the role of each region is not constant.

This is why the G1 collector is named: The collector plans the collection according to the pause time specified by the user. In simple terms, the collection is arranged in order of the collection value of each region. For example, if a region has very few objects surviving, the region must be ranked high in order to be collected first. Because collecting this region requires only a small number of objects to move, it takes only a small amount of time and can release a large amount of memory. Therefore, this region must be collected first. That’s how G1(Garbage First) got its name.

G1 has only YongGC (YGC) and Mixed GC. YGC is used to collect all the Cenozoic generation, while Mixed GC is used to collect all the young generation and the old generation with high income. When a Full GC occurs, it uses a single thread for GC, which is time-consuming and should be avoided at all costs. The following illustrates the advantages of using G1 over other collectors by introducing the G1 collection process.

1.2. YongGC

Specific GC process will not say too fine, otherwise have to write tens of thousands of words, because also introduced memory set, card table, etc..

Collected as follows:

As shown in the figure, after the young generation is selected, the surviving object will be moved to an empty region during collection, and then the original memory space will be cleaned up to release the memory. It should be emphasized that YGC targets at the whole new generation. In many cases, there is 1-2G in the original Eden region, and all will be cleared after one YGC. On the other hand, the memory usage of survivor region will increase a little bit. After all, most objects are dead overnight, so YGC is still very efficient. The following points need to be emphasized:

  1. Basically all Eden area is cleared, 1666M -> 0;
  2. The surviving objects of the new generation will move to one or more empty regions, which will be marked as survivor regions. However, if the survivor object reaches the age, it will enter the old age. If the space is insufficient, the survivor object in Eden area will also enter the old age.
  3. The large object will enter the humogous, and the collector will treat it as the old age;
  4. YGC stops the world (STW), but generally due to YGC’s high efficiency (using multiple threads and collecting at the same time), STW times are short.

1.3. mixed gc

Mixed GC is unique to G1. Mixed GC collects all new generations and old generations with high revenue. Mixed GC is divided into two main steps:

  • Global Concurrent marking, which provides marking services for mixed GC;
  • Hybrid garbage collection, including copying living objects and cleaning up memory;

The global concurrent markup is divided into four steps:

  1. Initial tag (including root scan);

  2. Concurrent token;

  3. Relabeling;

  4. Clean up;

    Initial marking phase: marking root objects directly reachable from GC Roots, requires STW. The initial tag in the hybrid collection is almost the same as the initial tag in the New generation, so it’s not uncommon to see articles that say it’s done at the same time as YGC, so the G1 collector actually has no extra pauses at this stage. In fact, the initial mark of the mixed collection is borrowed from the result of the Cenozoic collection, that is, the Survivor zone of the Cenozoic after the Cenozoic garbage collection as the root, so the mixed collection must occur after the Cenozoic collection, and there is no need for another initial mark. In the GC log, it is gc Pause (young)(inital-mark).

    Concurrent marking phase: When YGC finishes, if the conditions for concurrent marking are found to be met, the concurrent thread begins to perform concurrent marking. Concurrency markers are started based on Survivor partition of the new generation and RSet of the old generation. Time is after YGC, only achieve InitiatingHeapOccupancyPercent threshold, will trigger a concurrent tags. InitiatingHeapOccupancyPercent 45, the default value is said is when already allocated memory and the memory allocated more than 45% of the amount of ram you can begin to concurrent tags. The concurrent tag marks all partitions. STW is not required at this stage.

    Tips: If you find that the heap memory occupied after collection reaches 40% of the total heap memory, mixed GC may occur next time. If you find that the heap memory is 2G and becomes 1.3g next time, mixed GC may occur, and frequent mixed GC will definitely be a problem. How to view valid information in logs will be explained later.

    Relabelling phase: Used to mark objects whose references have changed during the concurrent marking phase, requiring a brief STW on the user thread.

    Cleanup phase: STW is also required in this phase. The main surviving objects will be counted. The statistical results will be used to sort partitioned regions for the selection of the next Collect Set (CSet for short, see the introduction at the end of this section). Newly allocated objects, that is, newly allocated objects that are not within the scope of this concurrent tag, need to be considered active.

    One of the more misleading aspects of this phase is that the cleanup operation does not clean up the garbage objects and does not perform a copy of the living objects. That is, in extreme cases, the JVM’s memory usage is unchanged after this phase is over.

Mixed garbage collection stage: Mixed garbage collection is actually the same as YGC. The first step is to select several partitions from partitions for recycling. These selected partitions are called Collect Set (CSet for short). The second step is to copy the surviving objects in these partitions to the free partitions and put the reclaimed partitions into the free partition list. The log is marked as [GC Pause (mixed)], meaning that both the young generation and the old generation are collected at the same time. See below:

After recycling, it looks like this:

CSet: A collection set (CSet) is a collection of partitions that can be reclaimed. Data that survives in a CSet is moved to another availability zone during GC, and partitions in a CSet can come from Eden space, survivor space, or old age.

There are two submodes for selecting CSet in G1 mode, corresponding to Young GC and Mixed GC respectively:

  • Young GC: CSet is the Region of all Young generations;
  • Mixed GC: CSet is all the regions in the young generation plus the high-yield regions marked in the global concurrent marking phase;

1.4 extensions

1.4.1. Memory sets and card tables

The G1 collector will have to talk about the Remembered Set (RSet) and the Card Table (Card Table). Many articles will be interspersed with the GC process, and it will be difficult to read without understanding these two data structures, but it is easier to understand the GC process without introducing them.

Consider the following problem: YGC aims at all the Cenozoic generation, so when selecting GC Roots, it is difficult to determine which ones are Roots because of cross-generation references. In fact, many Roots are in the old age. But because the old age is so big, you can’t scan the whole old age every time, which is like scanning the whole heap, because the new generation must scan the whole heap. Therefore, a data structure called memory set (RSet) was introduced in the Cenozoic era to record which Region objects point to objects in the current partition (memory set is an abstract data structure used to record the collection of Pointers from non-collection Region to collection Region, so as to avoid adding the whole old age into GC Roots scanning range). Let’s draw a simple picture to understand this.

The easy thing to know is that if you ignore the old age as the root and don’t scan it, obviously object E is junk, which leads to misjudgment; However, it cannot scan all the Old ages. Therefore, the RSet used in the Eden Region records that objects in the Old Region point to objects in the local Region. That probably gives you a sense of what an RSet is. The key is the starting address of another Region, and the value is a set of elements in the set.

A card table divides a partitioned Region into contiguous regions of fixed size. Each Region is called a card. The card table is typically a byte array, with each partition’s address identified by the card’s index (array subscript). By default, each card is not referenced. When a card is referenced, the value of the array subscript corresponding to the card address is marked as 0. At the same time, record it in the RSet as required. For example, IF O points to E, then the index of the card table O in the Old region will be recorded in the RSet of the Eden region. See the figure below to see what I mean.

Note: Authoritative books and some articles write that memory set is a kind of abstraction, card table is its implementation, the relationship between the two is like Map and HashMap, but because the above way is easier for me to understand, so I introduce the above way.

1.4.2. STAB and TAMS

The second question is: how do you ensure that the collection thread and the user thread run without interfering with each other during the concurrent marking phase?

The first thing to be solved is that when the user thread changes the object reference relationship, it must ensure that it cannot break the original object graph structure, resulting in errors in the marking result.

G1 uses The snapshot-at-the-beginning (SATB) algorithm. SATB can be understood as taking a Snapshot of objects in The heap before GC starts, and The live objects are considered to be alive, thus forming an object graph. At the time of GC collection, the new generation objects are also considered live objects, and other unreachable objects are considered garbage objects. SATB algorithm is mainly divided into the following three steps:

  1. A snapshot is generated at the start of the tag, marking the live object.
  2. All changed objects are enqueued during concurrent marking (making all old references non-white in write barrier);
  3. There may be floating garbage that will be collected next time;

Step 2 can be interpreted as follows: suppose there is a relationship A.b= B, C.b=null(where B is white), and a change in the concurrency token A.b=null,C.b=B. B is marked as non-white because the old reference A refers to B. In addition, there are gray and black (black is a viable object, gray may be garbage or not). This belongs to the content of the three-color tag algorithm, so as to avoid more confusion (SATB actually relies on three-color tag). The following diagram shows the change process.

The second problem to solve is that the newly allocated objects in the concurrent process are considered viable objects, so how do you find the newly allocated objects in the GC process?

Each region records two top-at-Mark-start (TAMS Pointers, prev TAMS and Next TAMS). Objects above TAMS are newly assigned and are therefore considered implicitly marked. This way we find the newly allocated objects in the GC process and treat them as live objects.

Finally, having solved the problem of objects being allocated during GC, what about references changing during GC? G1 offers a solution through a Write Barrier. A Write Barrier is the extra processing required to assign values to reference fields. The extra processing is that all references refer to non-white objects, which reference objects have changed and which reference objects have changed

1.5. Advantages of G1 over CMS

  • The G1 has advantages in space compression;
  • The G1 avoids memory fragmentation by dividing memory into regions.
  • Eden, Survivor and Old regions are no longer fixed, and are more flexible in terms of memory usage efficiency.
  • G1 uses Pause Time to control garbage collection, while CMS uses stop the World (STW) to control garbage collection.
  • G1 will be used in the Young GC, while CMS can only be used in the O sector

G1 Log interpretation and experience sharing

2.1. Log Interpretation Key information interpretation

GC Pause (G1 Evacuation Pause) (young) :

Class space used 5944K, Capacity 6094K, committed 6144K, Reserved 1048576K 2020-03-23T17:32:14.984+ 0800:9.925: [GC Pause (G1 Evacuation Pause) (young)2020-03-23T17:32:15.005+ 0800:9.947: [SoftReference, 0 refs, 0.0000722 secs]2020-03-23T17:32:15.005+ 0800:9.947: [WeakReference, 71 refs, 0.0000329 secs]2020-03-23T17:32:15.005+ 0800:9.947: [FinalReference, 1660 refs, 0.0018747 secs]2020-03-23T17:32:15.007+ 0800:9.949: [PhantomReference, 0 refs, 3 refs, 0.0000193 secs]2020-03-23T17:32:15.007+ 0800:9.949: [JNI Weak Reference, 0.0000216 secs], 0.0252556 secs] [Parallel Time: 20.4ms, GC Workers: 23] [GC Worker Start (ms): Min: 9925.6, Avg: 9925.7, Max: 9925.8, Diff: 0.2] [Ext Root Scanning (MS): Min: 0.9, Avg: 2.3, Max: 10.1, Diff: 0.2] [Processed Buffers: Min: 0.0, Avg: 1.2, Max: 2.8, Diff: 2.8, Sum: 27.9] [Processed Buffers: Min: 0.0, Avg: 1.2, Diff: 2.8, Sum: 27.9] 3.3, Max: 13, Diff: 13, Sum: 75] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 75] 2.9] [Code Root Scanning (MS): Min: 0.0, Avg: 0.7, Max: 4.3, Diff: 4.3, Sum: 15.0] [Object Copy (MS): Min: 8.7, Avg: 14.6, Max: 18.6, Diff: 10.0, Sum: 336.1] [Termination (MS): Min: 0.0, Avg: 1.1, Max: 1.4, Diff: 1.4, Sum: 31.2] 25.4] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 23] 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5] [GC Worker Total (ms): Min: 19.9, Avg: 20.1, Max: 20.2, Diff: 0.3, Sum: 0.5] 461.7] [GC Worker End (ms): Min: 9945.8, Avg: 9945.8, Max: 9945.8, Diff: 0.1] 0ms] [Purge: 0ms] [Purge: 0ms] [Purge: 0ms] [Purge: 0ms] [Purge: 0ms] [Purge: 0ms] [Purge: 0ms] 0.0 ms] [Redirty Cards: 0.3ms] [Humongous Register: 0.3ms] [Humongous Reclaim: 0.3ms] [Free CSet: 0.3ms] [Eden: 0.3ms] 284.0 M (284.0 M) - > 0.0 B (310.0 M) Survivors: 20.0 M - > 38.0 M Heap: Heap after GC invocations=8 (full 0): Heap after GC invocations=8 (full 0): Heap after GC invocations=8 (full 0): garbage-first heap total 4194304K, used 96923K [0x00000006c0000000, 0x00000006c0204000, 0x00000007c0000000) region size 2048K, 19 young (38912K), 19 survivors (38912K) Metaspace used 49410K, capacity 49926K, committed 50428K, reserved 1093632K class space used 5944K, capacity 6094K, committed 6144K, Reserved 1048576K} [Times: user=0.38 sys=0.06, real= 0.02secs]Copy the code

I won’t go into the details of the YGC process, but only the log information that I think is critical:

[Object Copy (ms): Min: 8.7, Avg: 14.6, Max: 18.6, Diff: 10.0, Sum: 336.1]
[Times: user=0.38 sys=0.06, real=0.02 secs]
Copy the code

The two lines represent the time and time taken to copy the object, mainly Avg and real. If you find that YGC takes a long time, pay special attention to these two lines, such as real=0.56 secs; Object Copy (MS): Avg: 501 Then you know that Object Copy takes too much time, but the new generation almost dies overnight, so why are so many objects still alive? Next, we need to make a specific analysis according to our own business, which will be combined with the problems encountered in our project.

 [Eden: 284.0M(284.0M)->0.0B(310.0M) Survivors: 20.0M->38.0M Heap: 336.1M(4096.0M)->94.7M(4096.0M)]
Copy the code

This line represents the situation of memory reclamation. It can be seen that all Eden area will be reclaimed, the survival area increases by 18M, and the final heap memory is from 336.1m before reclamation to 94.7m after reclamation. If you find intermittent survival areas that go from a small footprint to a large footprint after reclaiming, you probably have a problem with a large number of objects surviving. If heap memory or rising after recycling, of course, we here is the maximum heap memory of 4 g, for example, if the recovery has been soaring after the 2000 m (usually takes up 45% or so) that you might want to analysis whether the normal, such as whether there are often large object application memory, large objects and didn’t get back into the old s lead to rising. However, mixed GC is usually carried out at this time, and some problems encountered will be analyzed later.

As for mixed GC, it is similar to YGC analysis. Since the initial tag in the global concurrent tag is accompanied by YGC, what you see at this stage is actually YGC content

The 2020-03-24 T19: surely. 416 + 0800:7388.991: [GC Pause (G1 Evacuation Pause) (Young) (Initial-Mark)2020-03-24T19:34:12.447+ 0800:7389.022: [SoftReference, 0 refs, 0.0000844secs]2020-03-24T19:34:12.447+ 0800:7389.022: [WeakReference, 24 refs, 0.0000157 secs]2020-03-24T19:34:12.447+ 0800:7389.022: [FinalReference, 20 refs, 0.0000212 secs]2020-03-24T19:34:12.447+ 0800:7389.022: [PhantomReference, 0 refs, 0 refs, 0.0000098 secs]2020-03-24T19:34:12.447+ 0800:7389.022: [JNI Weak Reference, 0.0000400 secs], 0.0336560 secs] [Parallel Time: 28.3ms, GC Workers: 18] [GC Worker Start (MS): MinAvg: 7388994.1, Max: 7388994.7, Diff: 1.2]... YGC 2020-03-24T19:34:12.450+ 0800:7389.025: [GC concurrent-root-region-scan-start] 2020-03-24T19:34:12.450+0800: 7389.025: the Total timefor whichApplication Threads were stopped: Stopping Threads took: 0.0006589 seconds 2020-03-24 T19: surely. 461 + 0800:7389.036: [GC concurrent-root-region-scan-end, 0.0111133 secs] 2020-03-24T19:34:12.461+ 0800:7389.036: [GC concurrent - mark - start] 2020-03-24 T19: surely. 917 + 0800:7389.493: [GC concurrent-mark-end, 0.4565315 secs] 2020-03-24T19:34:12.919+ 0800:7389.494: [GC 2020-03-24T19:34:12.919+ 0800:7389.494: [GC 2020-03-24T19:34:12.920+0800] 7389.496: [GC ref - proc2020-03-24 t19: surely. 920 + 0800:7389.496: [SoftReference, 2129 Refs, 0.0008546 secs]2020-03-24T19:34:12.921+ 0800:7389.497: [WeakReference, 1891 refs, 0.0006409 secs]2020-03-24T19:34:12.922+ 0800:7389.497: [FinalReference, 947 refs, 0.0009644 secs]2020-03-24T19:34:12.923+ 0800:7389.498: [PhantomReference, 2 refs, 279refs, 0.0002176 secs]2020-03-24T19:34:12.923+ 0800:7389.498: [JNI Weak Reference, 0.0000991 secs], 0.0030448 secs] 2020-03-24T19:34:12.923+ 0800:7389.499: [Times: user= 1, class = 1, real= 1] [TBM: TBM = 1] 7389.553: the Total timefor whichApplication Threads were stopped: Stopping Threads took: 0.0002143 seconds 2020-03-24T19:34:12.979+ 0800:7389.554: [GC cleanup 1993M->1296M, 0.0411176secs] [Times: User =0.04 sys=0.31, real=0.05 SECS] 2020-03-24T19:34:13.020+ 0800:7389.596: Total timefor whichApplication Threads were stopped: Stopping Threads took: 2020-03-24T19:34:13.021+0800 seconds 2020-03-24T19:34:13.021+0800 seconds 2020-03-24T19:34:13.021+0800 seconds 2020-03-24T19:34:13.021+0800 seconds 2020-03-24T19:34:13.021+0800 seconds 2020-03-24T19:34:13.021+0800 seconds [GC concurrent-cleanup-end, 0.0009866 secs] 2020-03-24T19:36:25.421+ 0800:7521.996: [GC Pause (G1 Evacuation Pause) (mixed)2020-03-24T19:36:25.438+ 0800:7522.013: [SoftReference, 0 refs, 0.0000595 secS]2020-03-24T19:36:25.438+ 0800:7522.013: [WeakReference, 45 refs, 0.0000194 secs] 2020-03-24T19:25.438 + 0800:7522.013: [FinalReference, 0 refs, 0.0000086 secs] 2020-03-24T19:25.438 + 0800:7522.013: [PhantomReference, 0 refs, 5 refs, 0.0000110 secs] 2020-03-24T19:25.438 + 0800:7522.013: [JNI Weak Reference, 0.0000482 secs], 0.0193059 secs] [Parallel Time: 13.0ms, GC Workers: 18] [GC Worker Start (MS): MinAvg: 7521999.7, Max: 7522000.2, Diff: 0.9]... Content with YGCCopy the code

Note that YGC pauses (G1 Evacuation pauses) (young) (initial-mark) and GC pauses (G1 Evacuation pauses) (mixed) may be interspersed, The content of these two paragraphs is the same as that of YGC, so there is no need to explain. Steps for mixed GC are easy to read from the log information, such as initial flag GC Pause (G1 Evacuation pause) (young) (initial-mark), root Region scan, concurrent flag, etc., The steps described in 1.3 are easy to follow.

Generally speaking, after a collection (mostly YGC) the heap memory usage reaches around 45%, and if it doesn’t go down later it will soon be mixed GC to clean up old ages. If the program is normal, mixed GC frequency should be very low. Our project may have some problems, but it is usually only once every 30 minutes. If your project comes up frequently, say 1-5 minutes, you may need to consider whether there is a problem. Frequent mixed GC is likely to indicate that memory is not released in the old days, and may be increasing. In the end, the speed of memory collection can not keep pace with the speed of allocation, and then the full GC will be required, which will take more than ten seconds at a time, and will continue. Since the full GC is a single-threaded STW, the service will be basically unavailable. So full GC is something to avoid at all costs.

2.2. Some parameters and important tips

-xx :+CrashOnOutOfMemoryError If this option is enabled, the JVM will crash and produce text and binary crash files (if core files are enabled), which jdk8 adds when out of memory errors occur. - XX: GCLogFileSize = XXXXXXXXX GC log file size limit - XX: NumberOfGCLogFiles = 100 GC cap on the number of log files - XX: + HeapDumpOnOutOfMemoryError Dump file -xx :InitialHeapSize=4294967296 Initial heap memory size -xx :MaxDirectMemorySize=805306368 Maximum direct memory size -xx :MaxHeapSize=4294967296 Max heap memory size -xx :MaxGCPauseMillis=200 Max STW pause time -xx :+PrintGCDateStamps print GC time (current time, For example, 2020-03-23T17:33:07.939+0800) -xx :+PrintGCDetails Prints GC details -xx :+PrintGCTimeStamps Print GC time (the amount of time the JVM takes until garbage collection takes place, such as 63 for 64 seconds) -xx :+UseG1GC uses the G1 collector -xx :G1HeapRegionSize=n Size of each Region -g1heapwastePercent After the end of global Concurrent marking, we can know how much space in old Generegions will be recycled. After each YGC and before the Mixed GC occurs again, we will check whether the garbage proportion reaches this parameter. Won't happen next time Mixed GC - G1MixedGCLiveThresholdPercent old generation of live objects in the region accounted for, only under this parameter, Cset-g1mixedgccounttarget Specifies the maximum number of times a MixedGC is executed after a global Concurrent markingCopy the code

When writing parameters, remember that the initial heap memory and maximum heap memory are the same, to avoid the JVM reallocating memory after each garbage collection, which can cause full GC problems if inconsistent.

For G1 collectors, do not specify the size of the Cenozoic (-xMN) and old generations, and let the collector adjust dynamically. This may cause two problems

  • G1 will no longer consider pause time goals. So, in essence, setting the size of the younger generation blocks the pause time goal;
  • The G1 no longer has the ability to expand and contract the space of the younger generation as needed. Because the size is fixed, you can’t change the size;

-xx: MaxGCPauseMillis=x allows you to set the time for starting the application to pause, and G1 uses this parameter to select CSet when running to meet the response time Settings. In general, this value can be set to 100ms or 200ms (depending on the situation), but 50ms is not reasonable. If the pause time is set too short, G1 can’t keep up with garbage generation. Eventually degenerate to Full GC. So tuning this parameter is an ongoing process, gradually getting to the best state. A tip on how to set the pause time is less than or equal to 10% of the application response time to the client. For example, the HTTP service response time to the browser is generally 1000ms, so you can set the pause time to 100ms, of course, the specific debugging. Remember: Time out is a goal and is not always guaranteed to be met.

Finally, be aware of Evacuation failures, full X, which should not occur and should be addressed immediately if they occur.

2.3. Experience sharing

2.3.1. Full GC experience sharing

After the operation and maintenance personnel informed the project to stop full GC, they first logged in the dump file on the machine, and then restarted the machine to temporarily restore online, and then sent the dump file to the machine for analysis using Mat.

1. The dump file:

-jmap -dump:format=b,file=./myfile 324576 -myfile = dump file name, 324576 = pid 1. Jmap-histo :live PID Check the distribution of live objects (do not use full GC online, if you do not add LIVE, you will force a full GC); 2. Jmap-heap pid View heap parameters and heap usage.Copy the code

2. Download the dump file to a local directory:

-command: SCP -r./myfile [email protected]: /home/ Tom /mydir - Add a. Hprof suffix to the file after downloading (or when dumping), because mat only selects hprof when opening dump.Copy the code

3. Download MAT and analyze:

- download address: https://www.eclipse.org/mat/downloads.php, pay attention to choose domestic mirror; - The dump file (6 gb) cannot be opened after downloading, so you need to use the command line to start MAT and specify the maximum heap memory parameter; - Go to the installation directory and start memoryAnalyzer. exe -vmargs -XMx8gCopy the code

When you open it, you see

Obviously there is a problem with objects that take up 3 gigabytes. Click on the icon in the upper left corner to check the object distribution and find objects that take up too much memory

Basically, an object occupies about 5M, by this point, you will basically know where the problem is, find the code to solve the BUG, in fact, many times the full GC code is written with the problem, such code really exists, encountered several times. Full gc(system.gc ()) is a package that can be used as a reference. For example, when the project is just started, it may do some loading and will be used once, and then it will not appear again.

2.3.2. Log analysis

So if you look at the monitor, the heap is going up over this period of time.

cat gc.log | grep 'Eden'|less
Copy the code

After recycling, it is basically around 3000M, so much is occupied in the old era, so mixed GC will be carried out in large quantities basically, check the frequency

$ cat gc.log | grep 'mixed'|awk '{print $1}'
$ cat gc.log | grep 'real'|awk '{print $4}'|sort -n -k 2 -t =  # You can skip this one
Copy the code

In fact, this frequency is not reasonable, it is almost one mixed GC per minute. Let’s look at the gc content: From the mixed GC, we find that the yong GC before the mixed GC always has large object memory allocation request.

Note large objects constantly apply for memory, and then the large objects directly enter the old Humongous area (if the value is 5M, then three consecutive Humongous areas are obtained). As a result, the memory continues to increase, but the large objects continue to apply for memory after they are not released. If you use jmap-histo PID, you can almost find what the object is.

2.4.2. Analysis of the problem that YGC time is too long

In general, due to the low survival rate of newborn objects, the time for YGC is very short, as there is no need to copy a large number of living objects. However, I really saw the problem of YGC time being too long (1-3 seconds) twice, which needs to be solved because it can STW. Some people reported that intermittent jitter existed in the service before (memory jitter also existed. At that time, only 2G memory was allocated, so the problem was very obvious. Later, 4G memory was changed to improve a lot, and the following analysis is based on the improved analysis).

Grep ‘Eden’ again:

The front is fortunately as above, belong to normal. But a closer look revealed a problem:

The memory suddenly increased, of course, after several gc, it came down again. Obviously, mixed GC must be done here (maybe several times), check some mixed GC frequency and find that it is every 30 minutes (it is clear if the heap memory is monitored). In this case, you can guess whether the problem is caused by a timed task (for example, the topology should refresh every n minutes). You can use Jmap to check the memory usage of the object and determine which function the object belongs to to determine the problem.

When you get into the log content, you can see that a lot of memory copying takes up too much time

Due to the scheduled task, a lot of objects are created and allocated in Eden area. After GC, these objects are copied, which leads to the explosion of survivor area. At the same time, because Survivor cannot fully accommodate these objects, it directly enters the old age, thus occupying 1G of memory. At this point, 2G memory has already accounted for half of the maximum heap of memory, and mixed GC will be carried out to collect all the new generation and some valuable old generations.