
Recently, with a traffic peak approaching, we needed to run a round of load tests against all of the modules in production. For one module, the load test produced the result shown in the figure below:

From the load-test results shown in the figure above alone, it was impossible to tell why this was happening.

Problem follow-up

Alibaba Cloud monitoring showed that the service's CPU was close to 100% at the time, and both young GC (YGC) and Full GC were running very frequently. We were able to reproduce the problem 100% of the time in a simulation environment.

  1. Check the JVM information using one of the usual commands:

  2. You can see that there are a lot of young GCs and that Full GCs are frequent (see the sketch below);
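The exact command and its output are not shown here; as a minimal sketch, one common way to get this view is jstat (the pid is the same one passed to jmap below):

```bash
# Sample GC utilisation every second, 10 times: the S0/S1/E/O/M columns show
# survivor, eden, old-generation and metaspace usage in percent, while
# YGC/FGC give the young and Full GC counts (and YGCT/FGCT their total times).
jstat -gcutil 23638 1000 10
```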

Let’s look at the GC information with jmap -heap 23638:

  3. The Survivor spaces and the old generation are almost completely used up.

Now I wondered whether there were any large objects, so I ran jmap -histo again and also opened the result with JProfiler (available as an IDEA plugin; you can usually start the service locally and inspect all kinds of runtime data with it).

No large objects were found (a sample invocation follows);
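As a hedged example of the histogram step (the exact options used are not shown above), a typical invocation looks like this:

```bash
# Class histogram of the running JVM: instance count and bytes per class.
# ":live" forces a full GC first so only live objects are counted, and
# head keeps just the top entries, where any large object would show up.
jmap -histo:live 23638 | head -n 20
```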

  4. Next, I wondered whether the startup parameters were misconfigured:

jinfo -flags 24445

GCPauseIntervalMillis=8000

-XX:MaxGCPauseMillis=xxx

This flag sets the target maximum pause time, in milliseconds, for each GC. The VM adjusts the Java heap size and other GC-related parameters to try to keep GC pauses below the given value, i.e. to ensure, as far as possible, that memory reclamation takes no longer than the configured target. Note that this can reduce overall throughput (throughput = time spent running user code / total VM running time), and in some cases the VM will simply not reach the requested pause-time goal. By default the VM has no pause-time target. The actual pause time of a GC depends mainly on the amount and size of live data in the heap, so use this parameter with caution: if the value is too small, the system ends up spending far too much time on garbage collection, because in order to meet the pause-time goal the VM sets up a smaller heap that holds relatively few objects, which raises the collection rate and leads to much more frequent GC.
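As a hypothetical illustration only (the heap sizes and app.jar are placeholders, not the service's real configuration), this is roughly how the pause-time goal is passed to the JVM; with G1, a target in the low hundreds of milliseconds is a common starting point, whereas a very small value forces exactly the behaviour described above:

```bash
# Illustrative startup flags, not the article's actual configuration.
# MaxGCPauseMillis is the soft pause-time goal discussed above: G1 resizes
# the young generation and heap regions to try to stay under this value.
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -jar app.jar
```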

At this point the problem was finally found.

Conclusion

Let’s take a quick look at the JVM heap and the two collectors we commonly use, G1 and CMS:

  • G1 collector

A garbage collector aimed at server-side applications. Advantages: parallelism and concurrency, generational collection, space compaction (no memory fragmentation), and predictable pauses. Phases: initial marking, concurrent marking, final marking, and live data counting and evacuation.
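As a small sketch (the log path is a placeholder), these standard HotSpot flags enable G1 on JDK 8 and write a detailed GC log in which the phases listed above (initial mark, concurrent mark, remark, evacuation pauses) can be observed:

```bash
# Enable G1 and log GC details with timestamps to a file; on JDK 9+ the
# logging flags are replaced by -Xlog:gc*. /tmp/gc.log is a placeholder.
java -XX:+UseG1GC \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -Xloggc:/tmp/gc.log \
     -jar app.jar
```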

  • CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause times and is based on the mark-sweep algorithm. Phases: initial mark (CMS initial mark), which marks the objects directly reachable from GC Roots; concurrent mark (CMS concurrent mark); remark (CMS remark), which corrects the markings for objects that changed during concurrent marking; and concurrent sweep (CMS concurrent sweep). Drawbacks: sensitivity to CPU resources, inability to handle floating garbage, and space fragmentation caused by the mark-sweep algorithm.
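For completeness, a hedged sketch of how CMS is typically enabled on JDK 8 (the threshold value is illustrative, not a recommendation from the article):

```bash
# Use CMS for the old generation. CMSInitiatingOccupancyFraction=70 starts
# a CMS cycle once the old generation is about 70% full, and
# UseCMSInitiatingOccupancyOnly stops the JVM from adjusting that threshold
# on its own. Note that CMS was deprecated in JDK 9 and removed in JDK 14.
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar app.jar
```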