Premise:

A large cross-border e-commerce business was growing rapidly, and online machines were being added frequently. However, there was no unified standard for running those machines, JVM memory settings in particular, that could be handed to the owner of each application service. After the 618 promotion, I talked with the operations team about standardizing the JVM parameters of the online servers, so that one set could be given to every application. This would improve the stability of the online servers and reduce the time everyone spends tuning JVM parameters.

Drawing on experience from Taobao and Tmall, and after discussion, we settled on a recommended default JVM template based on the JDK version and the online machine configuration:



The final recommended JVM template:

JDK version | Machine configuration | Recommended JVM parameters | Remarks

JDK 1.7 | 6v8g | -server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs | Front end

JDK 1.7 | 8v8g | -server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs | Front end

JDK 1.7 | 4v8g | -server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs | Front end

JDK 1.7 | 6v8g | -server -Xms4g -Xmx4g -XX:MaxPermSize=512m -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCTimeStamps | Back end
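Once a template like this is rolled out, it helps to confirm that a running JVM really picked up the flags. One way is to read the startup arguments back through the standard java.lang.management API. The sketch below is only an illustration: the class name JvmFlagCheck and the short list of expected flags are my own assumptions, not part of the template process described here.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original template): reports which of the
// expected flags are missing from the arguments a JVM was started with.
final class JvmFlagCheck {

    // Returns the expected flags that do not appear in the actual arguments.
    static List<String> missingFlags(List<String> actual, List<String> expected) {
        List<String> missing = new ArrayList<String>();
        for (String flag : expected) {
            if (!actual.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup, as reported by the JVM itself.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // A few flags from the front-end template, chosen for illustration.
        List<String> expected = Arrays.asList(
                "-Xms4g", "-Xmx4g", "-Xmn2g", "-XX:+UseConcMarkSweepGC");
        System.out.println("Missing flags: " + missingFlags(actual, expected));
    }
}
```

The comparison is an exact string match, so a flag written as -Xmx4096m would be reported as missing even though it is equivalent to -Xmx4g.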





Recommended configuration of a BAT company:









Configuration description:

1. Heap settings

-Xms: initial heap size

-Xmx: maximum heap size

-XX:NewSize=n: sets the size of the young generation

-XX:NewRatio=n: sets the ratio of the young generation to the old generation. For example, a value of 3 means the ratio of young to old is 1:3, so the young generation accounts for 1/4 of the combined young and old generations

-XX:SurvivorRatio=n: ratio of the Eden space to the two Survivor spaces in the young generation. Note that there are two Survivor spaces. For example, 3 means Eden : Survivor = 3:2, so one Survivor space accounts for 1/5 of the whole young generation

-XX:MaxPermSize=n: sets the permanent generation size
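The two ratio parameters are easy to misread, so here is the arithmetic as a small sketch. The class name GenerationSizes and the 4000 MB heap are illustrative assumptions; a real JVM also aligns region sizes, so actual numbers will differ slightly.

```java
// Illustrative arithmetic only; the JVM rounds and aligns the real sizes.
final class GenerationSizes {

    // Young generation size for a heap when old:young = newRatio:1 (-XX:NewRatio).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // Size of ONE survivor space when Eden:one survivor = survivorRatio:1
    // (-XX:SurvivorRatio); the young generation holds Eden plus two survivors.
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden is what remains of the young generation after the two survivors.
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 4000;                  // assumed heap size for round numbers
        long young = youngMb(heapMb, 3);     // NewRatio=3 -> 1/4 of the heap
        System.out.println("young    = " + young + " MB");
        System.out.println("old      = " + (heapMb - young) + " MB");
        System.out.println("survivor = " + survivorMb(young, 3) + " MB (each)");
        System.out.println("eden     = " + edenMb(young, 3) + " MB");
    }
}
```

With these inputs the young generation is 1000 MB, each Survivor space is 200 MB (1/5 of young), and Eden is 600 MB, which matches the 3:2 split described above.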

2. Collector settings

-XX:+UseSerialGC: selects the serial collector

-XX:+UseParallelGC: selects the parallel collector

-XX:+UseParallelOldGC: selects the parallel old-generation collector

-XX:+UseConcMarkSweepGC: selects the concurrent collector

3. Garbage collection statistics

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Xloggc:filename



4. Parallel collector settings

-XX:ParallelGCThreads=n: sets the number of threads (and so the number of CPUs) the parallel collector uses for collection.

-XX:MaxGCPauseMillis=n: sets the maximum pause time target for parallel collection

-XX:GCTimeRatio=n: sets the share of the program's running time spent in garbage collection, computed as 1 / (1 + n)
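The 1 / (1 + n) formula is easier to judge with a couple of concrete values. A minimal sketch, with a class name (GcTimeRatio) and sample values of n that are my own choices:

```java
// Evaluates the GCTimeRatio formula from the text: GC share = 1 / (1 + n).
final class GcTimeRatio {

    // Fraction of total running time the collector is allowed to use.
    static double gcTimeFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration.
        System.out.println("n=99 -> " + gcTimeFraction(99)); // 1% of the time in GC
        System.out.println("n=19 -> " + gcTimeFraction(19)); // 5% of the time in GC
    }
}
```

So a larger n means a stricter throughput goal: n=99 asks the collector to stay within 1% of total running time.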

5. Concurrent collector settings

-XX:+CMSIncrementalMode: enables incremental mode, which is intended for single-CPU machines.

-XX:ParallelGCThreads=n: sets the number of threads (CPUs) used when the concurrent collector collects the young generation in parallel.



Parameter Description:



-Xms3072m -Xmx3072m

-Xms and -Xmx set the minimum and maximum sizes of the JVM heap

-Xmn1024m sets the size of the young generation to 1024m

Total JVM memory size = young generation size + old generation size + permanent generation size (perm).
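To see how a running JVM actually splits its memory into these areas, the standard MemoryPoolMXBean API lists every pool with its type and configured maximum. This is a minimal sketch; the class name MemoryPools is illustrative, and the pool names printed depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the memory pools of the running JVM (young, old, permanent generation
// and others), one line per pool.
final class MemoryPools {

    static List<String> describeAll() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();          // null if the pool is no longer valid
            long maxBytes = (usage == null) ? -1 : usage.getMax();
            // getMax() returns -1 when no maximum is defined for the pool.
            String max = (maxBytes < 0) ? "undefined" : (maxBytes / (1024 * 1024)) + " MB";
            lines.add(pool.getType() + " | " + pool.getName() + " | max " + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeAll()) {
            System.out.println(line);
        }
    }
}
```

Pools reported as heap memory are the ones bounded by -Xms and -Xmx; the permanent generation shows up as non-heap memory.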



-Xss768k sets the stack size of each thread. Since JDK 5.0 the default stack size per thread is 1M; before JDK 5.0 it was 256K. Adjust it according to how much stack the application's threads need. Reducing the value allows more threads to be created in the same physical memory. However, the operating system limits the number of threads in a process, so threads cannot be created indefinitely; a rule of thumb is about 3000~5000.
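To make the trade-off concrete, the sketch below multiplies a thread count by a few -Xss values. The class name ThreadStackBudget is an assumption, and the result is an upper bound on the address space reserved for stacks, not the memory actually touched.

```java
// Rough upper bound on memory reserved for thread stacks.
final class ThreadStackBudget {

    // Reserved stack space in MB for a thread count and an -Xss value in KB.
    static long stackReserveMb(int threads, int xssKb) {
        return (long) threads * xssKb / 1024;
    }

    public static void main(String[] args) {
        int threads = 3000; // lower end of the rule of thumb above
        System.out.println("-Xss256k : " + stackReserveMb(threads, 256) + " MB");
        System.out.println("-Xss768k : " + stackReserveMb(threads, 768) + " MB");
        System.out.println("-Xss1024k: " + stackReserveMb(threads, 1024) + " MB");
    }
}
```

For 3000 threads this gives 750 MB at 256k, 2250 MB at 768k and 3000 MB at 1024k, which is why a smaller stack leaves room for more threads.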



-XX:PermSize=512m -XX:MaxPermSize=512m

The permanent generation usually has a fixed size (64m by default), so for a given total, increasing the young generation reduces the size of the old generation. The young generation size (-Xmn) has a significant impact on system performance; Sun officially recommends setting it to 3/8 of the entire heap.

-XX:PermSize sets the initial size of the permanent generation (non-heap memory), and -XX:MaxPermSize sets its maximum. Note that the defaults of 1/64 and 1/4 of physical memory apply to the initial and maximum heap size (-Xms and -Xmx), not to the permanent generation, whose defaults are small fixed sizes.



-XX:+UseConcMarkSweepGC

The CMS collector is also known as the low-pause concurrent collector. It collects the old generation, doing most of its work concurrently with the application on multiple threads to minimize the pauses caused by garbage collection. For the young generation, CMS uses the same algorithm as the Parallel collector. This collector suits applications that cannot tolerate long pauses and need fast responses.






-XX:+CMSClassUnloadingEnabled

With CMSClassUnloadingEnabled on, garbage collection also cleans the permanent generation and removes classes that are no longer used. This parameter only has an effect when UseConcMarkSweepGC is also enabled.



-XX:+DisableExplicitGC disables System.gc(), so that programmers cannot hurt performance by accidentally triggering a GC.



-XX:+UseCMSInitiatingOccupancyOnly

This flag tells the JVM not to start CMS collection cycles based on statistics gathered at run time. Instead, with the flag on, the JVM uses the CMSInitiatingOccupancyFraction value for every CMS cycle, not just the first one. Keep in mind, however, that most of the time the JVM makes better garbage collection decisions than we do. The flag should therefore only be used when there is a good reason (such as testing) and a deep understanding of the life cycle of the objects the application creates.



-XX:CMSInitiatingOccupancyFraction=68

With this setting, CMS starts a collection when the tenured generation is 68% full. Increase the value if your tenured generation does not grow quickly and you want fewer CMS collections.
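As a quick worked example of what 68 means in practice, the sketch below applies the fraction to the old generation of the front-end template (-Xms4g with -Xmn2g leaves 2048 MB for the old generation). The class name CmsTrigger is an assumption, and the JVM's own accounting is more detailed than this.

```java
// Occupancy at which a CMS cycle starts for a given old generation size.
final class CmsTrigger {

    // Old-generation occupancy (MB) that triggers CMS at the given percentage.
    static long triggerMb(long oldGenMb, int fractionPercent) {
        return oldGenMb * fractionPercent / 100;
    }

    public static void main(String[] args) {
        long oldGenMb = 4096 - 2048; // -Xmx4g minus -Xmn2g
        long trigger = triggerMb(oldGenMb, 68);
        System.out.println("CMS starts at about " + trigger + " MB");
        // Headroom left for objects promoted while CMS runs concurrently.
        System.out.println("headroom: about " + (oldGenMb - trigger) + " MB");
    }
}
```

With a 2048 MB old generation the cycle starts at about 1392 MB, leaving roughly 656 MB for promotions during the concurrent phases. Raising the fraction shrinks that headroom, which increases the risk of a concurrent mode failure.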



-XX:+UseParNewGC: collects the young generation with multiple threads.





-XX:HeapDumpPath

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Xloggc:/usr/aaa/dump/heap_trace.txt

The parameters above set the heap dump location and the GC log output



-XX:+HeapDumpOnOutOfMemoryError

This parameter makes the JVM write a heap dump when an OutOfMemoryError occurs
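On HotSpot-based JVMs, the effective value of these options can be read from inside the process through com.sun.management.HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot JVM (the bean is not available elsewhere), and the class name HeapDumpOptionCheck is illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the current value of a HotSpot VM option from the running JVM.
// Assumes a HotSpot-based JVM; other JVMs do not provide this bean.
final class HeapDumpOptionCheck {

    // Returns the option's value as a string; throws IllegalArgumentException
    // if the JVM does not know the option.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

An empty HeapDumpPath means no path was configured for the dump.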





As you may have noticed, CMS is the recommended garbage collector.

CMS is a collector whose goal is the shortest possible collection pause, so it can effectively reduce server pause times.

The CMS GC threads use a relatively large share of CPU, but CMS still performs well on multi-core servers, and it is deployed on the major e-commerce websites in China. So it is highly recommended!



Concept of CMS:

The CMS collector is also known as the low-pause concurrent collector. It collects the old generation, doing most of its work concurrently with the application on multiple threads to minimize GC pauses. For the young generation it uses the same algorithm as the Parallel collector. It suits applications that cannot tolerate long pauses and need fast responses. CMS uses several techniques to minimize GC pause times, and so the pauses of the user program, at the expense of CPU throughput. Put simply, it trades throughput for shorter pauses.



Rollout pace:

To avoid affecting online applications, the adjustment was split into three steps:

Step 1: Apply the change to a small number of machines and compare them with unadjusted machines to observe the result;

Step 2: Adjust the parameters of some applications, run load tests, and observe the effect under high concurrency;

Step 3: Adjust the JVM parameters of some core applications and verify the effect during the 818 promotion.

The 818 promotion is now over, so it is a good time to sum up.



One: Long-term performance

First change: FGC frequency dropped by more than half.

Mobile project: before the adjustment there were basically 1-2 FGCs per day; after the adjustment, about one every 2-3 days:







Online (another project): the statistics show that FGC is much less frequent;











Second change: FGC time reduced

















An FGC used to take nearly 500ms; now it takes only about 100ms.

So the biggest benefit of CMS turns out to be shorter FGC pauses.



Two: Performance under load testing and during the promotion

Overall, FGC time is greatly shortened, while young GC (YGC) time is longer, with little change in frequency.

Data source: load test team summary



Metric | xxxx-online4.server.org (CMS) | xxxx-online1.server.org (CMS) | xxxx-online34.server.org (default garbage collector) | Notes
Full GC count | 1 | 1 | 1 |
Full GC total time | 343 | 250 | 1219 |
Default garbage collector / CMS (Full GC time) | 3.55 | 4.88 | | CMS Full GC time is significantly less than the default garbage collector's.
Full GC point in time | 2:48:36 | 3:14:36 | 5:30:36 |
CPU usage % at Full GC | 40% | 10% | 16% |
Load average at Full GC | 1.19 | 0.49 | 1.21 |
Young GC total count | 1094 | 1098 | 1078 |
Young GC total time | 44093 | 44632 | 30387 |
Young GC average time | 40.30 | 40.65 | 28.19 |
Young GC maximum time | 1332 | 1268 | 928 |
CMS / default garbage collector (young GC total time) | 1.45 | 1.47 | | CMS young GC time is longer than the default garbage collector's.
CMS / default garbage collector (young GC average time) | 1.43 | 1.44 | | CMS young GC time is longer than the default garbage collector's.
CMS / default garbage collector (young GC maximum time) | 1.44 | 1.37 | | The CMS young GC worst case is worse than the default garbage collector's worst case.
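The ratio rows in the load test data are plain divisions of the measurements. A short sketch to reproduce them (the class name GcRatioCheck is illustrative; the inputs are the figures reported by the load test team):

```java
import java.util.Locale;

// Recomputes the ratio rows of the load test summary from the raw numbers.
final class GcRatioCheck {

    // Ratio of two measurements, rounded to two decimals.
    static String ratio(double numerator, double denominator) {
        return String.format(Locale.ROOT, "%.2f", numerator / denominator);
    }

    public static void main(String[] args) {
        // Full GC total time: default collector (online34) over each CMS machine.
        System.out.println("FGC default/CMS online4: " + ratio(1219, 343));
        System.out.println("FGC default/CMS online1: " + ratio(1219, 250));
        // Young GC total time: each CMS machine over the default collector.
        System.out.println("YGC CMS online4/default: " + ratio(44093, 30387));
        System.out.println("YGC CMS online1/default: " + ratio(44632, 30387));
    }
}
```

The results are 3.55, 4.88, 1.45 and 1.47, matching the summary: Full GC is 3.5 to 4.9 times shorter with CMS, while young GC costs about 45% more in total.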




Three: How sentry counts Full GC

We can safely say:

1. Full GC == Major GC: a GC of the old generation / permanent generation that stops the world

2. Full GC count = the number of stop-the-world pauses during old-generation GC

3. Full GC time = the total stop-the-world time during old-generation GC

4. CMS is not the same as Full GC. CMS is divided into several phases, and only the stop-the-world phases are counted toward the Full GC count and time; the phases that run concurrently with business threads are not counted as Full GC



CMS initial mark and remark both stop the world, so one cycle is recorded as two Full GCs. CMS may also fail and trigger a further Full GC:

If a concurrent mode failure occurs during a CMS concurrent GC, a mark-sweep-compact full GC is performed, which is completely stop-the-world.



This is why CMS updates the full GC counter twice per concurrent GC cycle, once for initial mark and once for the final remark. If a concurrent mode failure occurs, the resulting full GC is counted separately.
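These counters can be inspected from inside the JVM with the standard GarbageCollectorMXBean API, which exposes a collection count and an accumulated time per collector. A minimal sketch follows; the class name GcCounters is illustrative, and whether sentry reads exactly these beans is an assumption on my part.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the per-collector counters that monitoring tools typically sample.
final class GcCounters {

    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) since JVM start; -1 if undefined.
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

With ParNew and CMS enabled, the two beans are typically named ParNew and ConcurrentMarkSweep. Sampling the second one before and after a single CMS cycle is a simple way to check the two-per-cycle behavior described above.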



Four: Several problems encountered:

Problem 1: Stack overflow

After the -Xss256k parameter was applied, we received feedback that it might be affecting trace calls. The following error was reported:

java.lang.StackOverflowError

at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visitBinaryExpression(ExpressionDeParser.java:278)

at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visit(ExpressionDeParser.java:246)

at net.sf.jsqlparser.expression.operators.conditional.OrExpression.accept(OrExpression.java:37)

at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visitBinaryExpression(ExpressionDeParser.java:278)

at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visit(ExpressionDeParser.java:246)

The reason is that this parameter sets the stack size of each thread. Since JDK 5.0 the default stack size per thread is 1M; before JDK 5.0 it was 256K. Reducing the value allows more threads in the same physical memory, but it also leaves less room for deep call chains such as the recursive visitor calls in the trace above.

So today the -Xss256k parameter is being removed from one Inventory machine to see whether this is the cause.
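The link between stack size and recursion depth can be checked in isolation. The sketch below runs a trivial recursion on threads created with different requested stack sizes, using the Thread constructor that takes a stackSize argument. The class name StackDepthProbe is an assumption; the JVM treats the size as a hint and may round it up or ignore it, so the depths vary by platform and JVM.

```java
// Measures how deep a trivial recursion gets for a requested thread stack size.
final class StackDepthProbe {

    // Recurses until the stack is exhausted and remembers the depth reached.
    private static final class Probe implements Runnable {
        int depth;

        private void descend() {
            depth++;
            descend();
        }

        @Override
        public void run() {
            try {
                descend();
            } catch (StackOverflowError expected) {
                // Stack exhausted; depth now holds the number of frames reached.
            }
        }
    }

    // Runs the probe on a thread with the requested stack size in bytes.
    static int maxDepth(long stackBytes) {
        Probe probe = new Probe();
        Thread thread = new Thread(null, probe, "stack-probe", stackBytes);
        thread.start();
        try {
            thread.join(); // join() makes the probe's depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("256k  stack: depth " + maxDepth(256 * 1024L));
        System.out.println("1024k stack: depth " + maxDepth(1024 * 1024L));
    }
}
```

Running the same probe with the frames of the real workload, here the jsqlparser visitor recursion on a long OR chain, would give a better estimate of the stack size the application needs.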



Problem 2: The initial mark phase takes too long

It is generally recommended that neither of the two STW pauses in a CMS cycle exceed 200ms. If the pause is too long because of the CMS initial mark phase, the STW overhead can be reduced with:

-XX:+CMSParallelInitialMarkEnabled

This enables parallelism during initial marking, which makes the initial mark faster.

Problem 3: The STW time in the remark phase is too long

The diagram below:









This can be addressed by running a YGC before the CMS remark. The purpose is to reduce the cross-references between the old and young generations that remark has to process, and so reduce the cost of remark. Generally, about 80% of CMS GC time is spent in the remark phase.

-XX:+CMSScavengeBeforeRemark

Jmap analysis:









OutOfMemoryError caused by an NIO framework occupying DirectMemory

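Solution: Use -XX:+DisableExplicitGC with caution (with it enabled, the System.gc() calls that some NIO frameworks rely on to release DirectMemory do nothing)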

Increase the size of DirectMemory

1. DirectMemory is not part of the Java heap. Allocating it is actually a native allocation through the os::malloc() function.

2. The capacity can be set with -XX:MaxDirectMemorySize. If not specified, it defaults to the maximum Java heap size (set by -Xmx). Note that on the IBM JVM the default Direct Memory size is not directly tied to -Xmx.

3. Using Direct Memory avoids copying data back and forth between the Java heap and the native heap, which improves performance in some scenarios.

4. Direct ByteBuffer objects clean up their native buffers automatically, but only as part of a Java heap GC, so they do not respond on their own to pressure on the native heap.

5. A GC occurs only when the Java heap is too full to satisfy an allocation request, or when a Java application explicitly calls System.gc() to free memory (some NIO frameworks use this to release occupied DirectMemory).

6. Improper use of this area can also cause OutOfMemoryError.

7. DirectBuffer should not be used when buffers are created frequently, because creating and destroying a DirectBuffer is expensive. If a DirectBuffer can be reused, however, it can greatly improve performance under frequent reads and writes. (Reads and writes on a DirectBuffer are faster than on a normal Buffer, but creation and destruction are slower.)
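The reuse advice in point 7 can be sketched as follows: allocate one direct buffer up front and clear it for each message instead of allocating a new one every time. The class name DirectBufferReuse and the round-trip method are illustrative assumptions; a real NIO application would pass the buffer to a channel rather than read it straight back.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Allocates one direct buffer and reuses it, instead of paying the cost of
// creating and destroying a DirectBuffer per message.
final class DirectBufferReuse {

    private final ByteBuffer buffer;

    DirectBufferReuse(int capacity) {
        // Native memory outside the Java heap, counted against MaxDirectMemorySize.
        this.buffer = ByteBuffer.allocateDirect(capacity);
    }

    boolean isDirect() {
        return buffer.isDirect();
    }

    // Writes the message into the shared buffer and reads it back.
    // The encoded message must fit in the buffer's capacity.
    String roundTrip(String message) {
        byte[] in = message.getBytes(StandardCharsets.UTF_8);
        buffer.clear();            // reset position and limit for reuse
        buffer.put(in);
        buffer.flip();             // switch from writing to reading
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DirectBufferReuse reuse = new DirectBufferReuse(1024);
        System.out.println(reuse.roundTrip("first message"));
        System.out.println(reuse.roundTrip("second message")); // same buffer, no new allocation
    }
}
```

Because the buffer lives for the lifetime of the object, its native memory is released only when that object is garbage collected, which fits the long-lived, reused case and avoids relying on System.gc().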