This article is from: PerfMa Technology Community

PerfMa website

Issue an overview

Last night, my colleague suddenly asked me a question about how a System GC occurred as soon as it was started. According to my experience, it was probably caused by insufficient memory outside the heap, but the startup was not enough, which seemed impossible, so I asked whether IT was called by myself. I did a search and there’s no room for it.

Preliminary Problem location

The breakpoint is set on the system.gc method so that when it is called you can see the stack. It is called in the run method of a Daemon class called sun.misc.gc

Of course, if you are in production environment, you can not just let you debug, we can also see the call stack through tools such as btrace, there are many ways

The Demo emersion

When I see the call stack, I immediately think of RMI. RMI is really useful in our System because the System will communicate with remote processes through RMI. In my experience, RMI will perform System GC on a regular basis and the default time is one hour.

-Dsun.rmi.dgc.client.gcInterval=36000000
-Dsun.rmi.dgc.server.gcInterval=36000000
Copy the code

In order to make the problem more simple investigation, so I wrote a demo to simulate

To run the above program, we can run a random Java process locally, get the corresponding process number, and pass it as the first program parameter. We can print the GC log by adding several JVM parameters

Problem analysis

By default, it takes 1 hour for System GC to reappear.

Estimate some people guessed the RMI, also know the above parameters can control the GC interval, as to why will immediately trigger System GC also wondered at a time, I didn’t notice this before, but after analysis, the real situation is far more than you can see this kind of situation, the truth is either immediately, or may not happen regularly, Depending on your GC algorithm, depending on how long it takes your operating system to restart, is this getting more interesting?

OK, so let’s start our analysis tour by looking at our Daemon’s run method

MaxObjectInspectionAge is a native method. When executed for the first time, the latencyTarget value is 36000000. If the value returned by maxObjectInspectionAge is greater than an hour, the System GC will be triggered immediately. Now the focus is on the maxObject TinspectionAge native method

When was the last time the heap was gc? For the sake of analysis, I will directly use CMS GC

First, it takes nanoseconds, and it depends on the operating system, so I’m going to do it on Linux

If Linux supports CLOCK_MONOTONIC, the value returned is the number of nanoseconds relative to the time the OS started. I won’t say much about CLOCK_MONOTONIC, but you can search for several POSIX-defined clocks. Our test system supports this. So what is directly returned is the number of nanoseconds relative to the operating system boot

Then the last GC time of Cenozoic and old generation is traversed

JavaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos :javaTimeNanos Last_gc = last_TIME_of_last_GC; last_GC = last_time_of_last_GC; t2 = last_time_of_last_GC; Therefore, the native method of maxObjectInspectionAge above returns the difference between the current nanosecond and t2 time, so when the difference between the two time exceeds 1 hour, System GC will be performed

The problem is that if we have no GC, or only one GC, such as only YGC, or only Old GC, then the value of T2 above is dependent on the GC algorithm and will be initialized to 0 by default under PS GC.

jlong               PSParallelCompact::_time_of_last_gc = 0;
jlong               PSMarkSweep::_time_of_last_gc   = 0;
Copy the code

But under CMS GC we see that there is no initialization, and without initialization, the value is uncertain, depending on the compiler

So I’m going to use the PS algorithm as an example

  • If the operating System has been running for more than an hour, the first System GC will be triggered immediately

  • If the operating system runs for 20 minutes, the first Sysmte GC will be triggered after 40 minutes

Problem conclusion

So when we see the System GC firing, in addition to the familiar scenarios, the first System GC firing in the RMI scenario is also determined by a number of factors. Hopefully this article will benefit you

When it comes to System GC, I’m sure some of you will think of the DisableExplicitGC JVM parameter, which can be turned on to disable System GC. However, I don’t recommend setting this parameter. Since the JVM wants to do System GC, it must be done. The risk of banning JVM parameters is even greater. In order to give you a better understanding of how JVM parameters are implemented, PerfMa recently held a course to explain JVM parameters in detail.

Let’s learn together:

PerfMa KO series of JVM parameters

A “weird” JVM cache loading problem check