TL;DR: what this article covers

  • What about the sudden surge in CPU usage of online machines?
  • How to deal with frequent full GC of online machines?
  • OOM: how do you avoid learning about it only after the accident? How do you prevent it before it happens?

Introduction

Let me show you a picture

That's right, our old friend Arthas, the wonder tool. I deployed it on the server just this afternoon and am very excited to have a new toy (hahaha).

Of course, it doesn't matter if you don't have Arthas. This article will get you started. It covers the following points:

  • How to observe heap status online and nip problems in the bud. I have covered jmap and jstat before, but here I take a closer look at how to read their metrics.

  • How to locate the fault when online CPU usage suddenly surges. This involves a little JVM configuration, the so-called tuning.

Previous articles have mentioned some of these:

How to eliminate online bugs with Arthas

OOM troubleshooting process

1: Nip it in the bud by watching the heap indicators

Commonly used indicators and what they mean

For processes, the main things to watch are %CPU and %MEM. They are easy to understand.

Process indicators:

top 
ps -ef | grep "java"

GC indicators:

Understanding the GC metrics requires some background.

1: The heap memory

Our heap memory is laid out like this. This article uses JDK 1.8, whose default garbage collector is ParallelGC.

All Java objects allocated on the heap first occupy a space in Eden. The Garden of Eden is where humanity's ancestors lived, roughly the place where human beings were born. The Eden region can likewise be thought of as the place where objects (meaning objects allocated on the heap) are born. Take A a = new A(). The new keyword requests a piece of memory from the JVM, which becomes the house for the newborn object. The house has a hexadecimal value such as 0x2222211 that marks its address in the JVM (similar to my house in building 21, no. 10 Yard east of Northwest Wang Dong Road). This house is in Eden.

Through reachability analysis starting from the GC roots, the JVM knows which objects are still alive. Each object lives in a small house. The JVM manages a large housing estate, such as the one below, where each cell is a house with an object living in it.

Reachability analysis marks the houses that are still inhabited as gray cells. Houses that have never been occupied are green cells. Houses that were once lived in but are now empty are black cells [recyclable]. The black cells are the focus: we need to get rid of them so that newcomers have a house to live in when they arrive.

At cleanup time, most of the residents of Eden are newcomers. They are unstable and move out quickly, so there are a lot of empty black cells. To clear these out quickly and rebuild the houses for newcomers, the JVM came up with a trick: during garbage collection, it moves everyone still living in the neighborhood (the gray cells) to the neighborhood next door. Now that the old neighborhood is empty, the JVM blows it flat and builds a new Eden on top of it. New objects can then move into Eden again. This method is called copying and is one of the three basic garbage collection algorithms.

Before long, Eden again has a lot of empty houses, and the second cleanup begins. This time, the JVM moves the residents still living in Eden, together with those who had been living in S0, to S1, another neighborhood to the right of S0. It then blows Eden and S0 flat and, bang, bang, bang, builds a new Eden and S0 on the same spot. The second garbage collection is complete.

After that, houses empty out again, and garbage collection runs a third time. Boom. S1 and Eden are blown flat, and the residents go to S0. And then, boom, boom, boom, everything is rebuilt.

Boom, boom, boom.

YGC is triggered whenever Eden runs out of houses for newcomers. Eden and the S areas are collectively called the young generation, and YGC stands for Young GC. It just blows everything flat and then rebuilds. It is worth mentioning that residents of the young generation who survive enough cleanups earn the JVM's respect and move next door to the old generation, also known as the tenured area. How many cleanups is that? It is configurable through the JVM parameter -XX:MaxTenuringThreshold: the default is 6 for CMS and 15 for the other collectors. As more and more residents move into the old generation, it eventually runs out of houses for newcomers, and FGC is triggered. FGC is Full GC. Basically, it blows everyone flat and then, bang, bang, bang, rebuilds.
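The moving and flattening described above can be mimicked with a toy model. This is only a sketch for intuition, not how HotSpot is implemented: the class ToyYoungGen, its fields, and the live flag (standing in for "reachable from a GC root") are all made up for illustration, and the promotion rule is simplified to "age reaches the threshold".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a copying young-generation collector (illustrative only). */
final class ToyYoungGen {
    /** A heap resident; 'live' stands in for "reachable from a GC root". */
    static final class Obj {
        final String name;
        boolean live = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();  // survivor space currently holding residents
    List<Obj> to = new ArrayList<>();    // empty survivor space that receives the copies
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;
    int youngGcCount = 0;

    ToyYoungGen(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    /** New objects are always born in Eden. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Copy live objects out of Eden and 'from', flatten both, then swap survivor roles. */
    void youngGc() {
        youngGcCount++;
        copySurvivors(eden);
        copySurvivors(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copySurvivors(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) continue;  // dead objects are simply left behind
            o.age++;
            if (o.age >= tenuringThreshold) old.add(o); else to.add(o);
        }
    }

    public static void main(String[] args) {
        ToyYoungGen heap = new ToyYoungGen(3);
        Obj longLived = heap.allocate("cache");
        Obj shortLived = heap.allocate("request");
        shortLived.live = false;  // nothing references it any more
        for (int i = 0; i < 3; i++) heap.youngGc();
        System.out.println("cache promoted to old generation: " + heap.old.contains(longLived));
        System.out.println("young GCs so far: " + heap.youngGcCount);
    }
}
```

The survivor spaces swap roles after every collection, which is why one of S0 and S1 is always empty between collections.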

To sum up:

  • Eden, S0, S1: the small neighborhoods of the young generation.
  • Tenured (old): the place where the old generation lives.
  • YGC: Young GC, triggered when the young generation runs out of space.
  • FGC: Full GC, triggered when the old generation runs out of space. The young generation and the old generation are collected together.
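To see these areas on a running JVM without any external tool, the standard java.lang.management API can list the heap memory pools. The sketch below is a minimal example; the class name HeapPools is made up, and the pool names printed depend on the collector in use (with ParallelGC on JDK 8 they are typically "PS Eden Space", "PS Survivor Space", and "PS Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** Returns one line per heap pool: name, used bytes, committed bytes. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;  // skip Metaspace, code cache, ...
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue;                      // pool is no longer valid
            lines.add(String.format("%-25s used=%,d B committed=%,d B",
                    pool.getName(), usage.getUsed(), usage.getCommitted()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```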

2: Observe GC with jstat

jstat -gccapacity 20606 [GC capacity]

  • NGC: the young generation currently has about 168 MB of memory
  • NGCMX: the maximum young generation size is about 524 MB (NGCMN is the minimum; not explained here)
  • S0C: no object exists in the S0 area
  • S1C: S1 is about 50 MB
  • EC: Eden is about 166 MB (NGC
  • OGC: the old generation is currently about 352 MB
  • YGC: young generation garbage collection has occurred 41266 times
  • FGC: full garbage collection has occurred 14 times
jstat -gcutil 20606 [GC utilization]

  • S0: no object exists in the S0 area
  • S1: S1 is 100% used
  • E: Eden is 54.07% used
  • O: the old generation is 85.51% used
  • YGC: young generation garbage collection has occurred 41329 times
  • YGCT: young generation garbage collection has taken 1187.704 s in total
  • FGC: full garbage collection has occurred 14 times
  • FGCT: full garbage collection has taken 18.581 s in total
  • GCT: total garbage collection time is 1206.285 s

jstat -gcnew 20606 or jstat -gcold 20606 [young or old generation heap sizes]
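The collection counts and times that jstat prints can also be read from inside the application through the standard GarbageCollectorMXBean. A minimal sketch follows; the class name GcCounters is made up, and note that getCollectionTime() reports milliseconds. The bean names depend on the collector (with ParallelGC they are typically "PS Scavenge" for young GC and "PS MarkSweep" for full GC).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCounters {
    /** Sums the collection counts of all collectors; -1 (undefined) values are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```

Logging these numbers periodically and alerting when the full GC count climbs quickly is one way to notice trouble before it becomes an outage.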

3: Observe GC with jmap

jmap -histo:live pid | sort -k 2 -g -r | less [instance count, memory footprint, and fully qualified name of each class]

jmap -heap pid [current heap status]

4: Observe GC with the invincible Arthas

The commands I use most are dashboard, jvm, thread, watch, and trace. But Arthas has so many uses that it is best to follow the official Arthas documentation. Alibaba, the god of Java.

dashboard

thread
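If Arthas is not available, a rough in-process approximation of "which threads are burning CPU" can be built on the standard ThreadMXBean. The sketch below is an assumption-laden stand-in, not what Arthas does internally: the class name BusyThreads is made up, it ranks threads by CPU time accumulated since they started rather than over a sampling interval, and it only sees Java threads, so GC threads do not show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class BusyThreads {
    /** Returns up to n entries of {threadId, cpuNanos}, highest CPU time first. */
    static List<long[]> topByCpuTime(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> samples = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) return samples;
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);  // -1 if the thread has died meanwhile
            if (cpu >= 0) samples.add(new long[] {id, cpu});
        }
        // Sort on the snapshot so the ordering stays stable while threads keep running.
        samples.sort((a, b) -> Long.compare(b[1], a[1]));
        return new ArrayList<>(samples.subList(0, Math.min(n, samples.size())));
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (long[] sample : topByCpuTime(3)) {
            ThreadInfo info = mx.getThreadInfo(sample[0], 5);  // keep at most 5 stack frames
            if (info == null) continue;                        // thread exited in the meantime
            System.out.printf("%s (id=%d) cpu=%d ms%n",
                    info.getThreadName(), sample[0], sample[1] / 1_000_000);
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```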

2: What to do about a sudden surge in online CPU

What to do?

Reboot first.

Hey, net admin! Add $5 to my account and bring me some instant noodles.


I am a serious dividing line.

2: How to locate the problem of sudden surge of online CPU

First and most important, use the commands just covered to find the objects and threads that consume the most. Especially threads.

Threads are divided into business threads and garbage collection threads. If it is the former, it is easy: use Arthas to find the thread ID, use the ID to get the stack trace, and locate the faulty statement from the stack. For memory usage, go straight to checking whether loops or unbounded queues/linked lists create too many objects that cannot be garbage collected. The two ways of writing below differ only slightly; try to figure out the difference.

Object o = null;
for (;;) {
    o = new Object();
}

for (;;) {
    Object o = new Object();
}

Here is the difference between a memory leak and a memory overflow.

  • Memory leak: an object that is no longer needed but can still be reached by reachability analysis from a GC root. It is still referenced, so it is never reclaimed. A nail house.
  • Memory overflow: the space is used up, and a new object cannot obtain memory.
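The distinction can be made concrete with a small sketch. The class LeakSketch and its REGISTRY list are hypothetical; the point is only that an object parked in a static collection stays reachable from a GC root and so survives every collection, while an object nobody references is eligible for reclamation. System.gc() is just a request, so the second printed line may vary between runs and JVMs.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class LeakSketch {
    // A static collection stays reachable from a GC root for as long as the class is loaded.
    static final List<Object> REGISTRY = new ArrayList<>();

    /** Creates an object, optionally parks it in REGISTRY, and returns only a weak reference. */
    static WeakReference<Object> create(boolean register) {
        Object o = new Object();
        if (register) REGISTRY.add(o);  // forgetting to remove it later is the "leak"
        return new WeakReference<>(o);
    }

    public static void main(String[] args) {
        WeakReference<Object> leaked = create(true);
        WeakReference<Object> garbage = create(false);
        System.gc();  // only a hint; the JVM is free to ignore it
        System.out.println("registered object still reachable: " + (leaked.get() != null));
        System.out.println("unregistered object reclaimed: " + (garbage.get() == null));
    }
}
```

Keep adding to such a collection without ever removing, and the leak eventually turns into an overflow: the old generation fills up, full GCs reclaim nothing, and allocation fails with an OutOfMemoryError.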

Back to the question of thread type. If it is the garbage collection threads that are hogging the CPU, the general idea is to investigate objects: take a heap dump with jmap -dump and analyze the dump file. Use the analysis software to find the reference chain of the objects that occupy the most memory, and figure out how they get created but never reclaimed. Thread pools, connection pools, or queues of all kinds. Those who know, know.
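A heap dump can also be triggered from inside the application, which may be handy when shell access to the machine is restricted. The sketch below assumes a HotSpot-based JVM, because it relies on com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumper and the file name app.hprof are made up. The target file must not exist yet, and recent JDKs require the .hprof extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {
    /** Writes a heap dump to 'target'; liveOnly=true dumps only reachable objects. */
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("app.hprof"), true);
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting file can be opened in the same analysis software as a dump produced by jmap. Dumping pauses the application, so on a busy online machine it is worth doing deliberately rather than on a timer.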

I drew a picture for you, embarrassing:

It is very well made and can be understood by people with long experience in tuning.

Cue the applause.

Extra time

There are a lot of topics that I want to write about that I haven’t written about yet, including:

  • Arthas Major Walkthrough
  • MySQL: alter table
  • Distributed communication large framework RPC/MQ

Come on 2021 ⛽️⛽️.