
The previous article covered common OOM scenarios and their solutions. This article walks through JVM performance optimization cases. It starts with JMeter, a performance testing tool widely used with Java.

JMeter performance testing tool

Basic overview

Apache JMeter is a Java-based load testing tool developed by the Apache Software Foundation. It was originally designed for testing web applications and has since expanded to other areas. It can test both static and dynamic resources, such as static files, CGI scripts, Java objects, databases, and FTP servers. JMeter can simulate heavy load on a server, network, or object, test its robustness under different load types, and analyze overall performance. Here we cover only the parts needed for the cases that follow.

The main interface

Usage workflow

Create a thread group

Create a test thread group and set the number of threads and how they start up. Select “Test Plan” in the tree on the left and right-click to add a thread group, as shown in the figure below:

Configure the thread group as shown in the figure below: 10 threads, each sending 1,000 requests. Tomcat will therefore receive 10,000 requests in one run of this thread group.
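To make the arithmetic concrete, here is a minimal Java sketch of what a thread group does conceptually: a fixed number of threads, each repeating the same task a fixed number of times. The class MiniThreadGroup and its run method are hypothetical names used only for illustration, this is not how JMeter itself is implemented, and a no-op task stands in for the real HTTP request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a thread group: `threads` workers, each running the task `loops` times.
class MiniThreadGroup {

    // Returns how many task executions completed without throwing.
    static long run(int threads, int loops, Runnable task) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < loops; i++) {
                    try {
                        task.run();
                        completed.incrementAndGet();
                    } catch (RuntimeException e) {
                        // A failed request is simply not counted in this sketch.
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // 10 threads x 1000 loops; the empty lambda is a placeholder for one request.
        long total = run(10, 1000, () -> { });
        System.out.println("requests issued: " + total);
    }
}
```

Replacing the empty lambda with a real request would turn this into a very rough load generator, but JMeter adds ramp-up control, timers, and reporting that this sketch leaves out.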

Add a JMeter component

Add an HTTP sampler. A sampler collects performance data for a specific kind of request, as shown in the figure below. This case samples HTTP requests.

Set the request target, such as the server address, port number, and path, as shown in the figure below. JMeter then sends batches of requests to that target according to these settings.

Add a listener

JMeter can present the results of a batch of requests as a report. Under Listener, add an aggregate report, as shown in the following figure:

Run and view the results

Debug and run the test plan, analyze the metrics, locate performance bottlenecks, and evaluate the overall performance of the system.

Adjust heap size to improve service throughput

Tomcat JVM configuration

In a production environment, Tomcat variables should not be set directly in catalina.sh. Set them instead in setenv.sh, which lives in the same bin directory as catalina.sh.

So to change the JVM memory configuration, we edit setenv.sh (the file does not exist by default, so we need to create it).

The initial configuration

The contents of setenv.sh are as follows:

export CATALINA_OPTS="$CATALINA_OPTS -Xms30m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:SurvivorRatio=8"
export CATALINA_OPTS="$CATALINA_OPTS -Xmx30m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseParallelGC"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDetails"
export CATALINA_OPTS="$CATALINA_OPTS -XX:MetaspaceSize=64m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDateStamps"
export CATALINA_OPTS="$CATALINA_OPTS -Xloggc:/opt/tomcat8.5/logs/gc.log"
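After editing setenv.sh it can be useful to confirm that the JVM actually picked up the options. The sketch below reports the maximum heap and the startup arguments of the JVM it runs in, so to inspect Tomcat it would have to be called from code deployed inside Tomcat. HeapSettingsCheck is a hypothetical class name, and with some collectors the reported maximum can be slightly lower than the -Xmx value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the heap limit and startup flags of the current JVM.
class HeapSettingsCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Options passed to the JVM at startup, e.g. -Xms30m or -XX:+UseParallelGC.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```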

Looking at the GC log:

There are a lot of Full GC entries. Now check the JMeter aggregate report:

The throughput is 866.9 requests/sec.

The optimized configuration

Next we test another configuration that increases the initial and maximum heap size:

export CATALINA_OPTS="$CATALINA_OPTS -Xms120m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:SurvivorRatio=8"
export CATALINA_OPTS="$CATALINA_OPTS -Xmx120m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseParallelGC"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDetails"
export CATALINA_OPTS="$CATALINA_OPTS -XX:MetaspaceSize=64m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDateStamps"
export CATALINA_OPTS="$CATALINA_OPTS -Xloggc:/opt/tomcat8.5/logs/gc.log"

Restart Tomcat and look at gc.log.

Searching for the keyword Full turns up only one Full GC, as shown in the figure below. After increasing the initial and maximum heap size, the number of Full GCs dropped significantly.

The JMeter aggregate report now shows a throughput of 1142.1 requests/sec, roughly 32% higher than the 866.9 requests/sec measured before. This shows that server performance improves noticeably once the heap is enlarged, which is the point of this case.
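The keyword search can also be scripted. The following is a minimal sketch that counts log lines containing "Full GC"; it assumes the JDK 8 style log produced by -XX:+PrintGCDetails, where each Full GC event is written on a line containing that text. FullGcCounter is a hypothetical helper, not part of any JDK tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts Full GC events in a GC log by keyword.
class FullGcCounter {

    // Counts the lines that mention "Full GC".
    static long countFullGc(List<String> logLines) {
        return logLines.stream().filter(line -> line.contains("Full GC")).count();
    }

    // Convenience overload that reads the log file first.
    static long countFullGc(Path gcLog) {
        try {
            return countFullGc(Files.readAllLines(gcLog));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println("Full GC events: " + countFullGc(Paths.get(args[0])));
        } else {
            System.out.println("usage: java FullGcCounter <path-to-gc.log>");
        }
    }
}
```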

JIT optimization

Is the heap the only option for allocating objects?

Understanding the Java Virtual Machine describes Java heap memory as follows: as JIT compilers evolve and escape analysis matures, stack allocation and scalar replacement will bring some subtle changes, and the rule that all objects are allocated on the heap is becoming less “absolute”.

It is common knowledge that the Java virtual machine allocates objects in the Java heap. There is one special case, however: if escape analysis shows that an object does not escape the method that creates it, the object may be allocated on the stack instead. It then needs no heap memory and no garbage collection. This is also the most common off-heap storage technique.

Compilation overhead

Time overhead

Interpreted execution, in the abstract, looks like this: input code -> [interpreter interprets and executes] -> execution result

JIT compilation followed by execution looks like this: input code -> [compiler compiles] -> compiled code -> [execute] -> execution result

Note: saying that JIT is faster than interpretation means that executing the compiled code is faster than having the interpreter execute the original code. It does not mean that compiling is faster than interpreting. However fast JIT compilation is, it is at least somewhat slower than one pass of interpretation, and the compiled code still has to be executed to get the final result. For code that runs only once, interpretation is therefore always faster than compiling and then executing. What counts as code that runs only once? Roughly speaking, code is strictly executed only once when both of the following conditions hold:

  1. It is called only once, for example a class initializer (<clinit>()).
  2. It contains no loops.

JIT-compiling code that runs only once and then executing it is not worth the cost. For code that runs only a few times, the speedup from JIT compilation may not offset the overhead of the initial compilation either.

JIT compilation guarantees positive benefits only for frequently executed code (hot code).
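The trade-off can be written as a simple cost model. Interpreting n invocations costs n × interp, while compiling first costs compile + n × compiled, so compilation pays off once n > compile / (interp − compiled). The sketch below computes that break-even point. The model and the class JitBreakEven are illustrative assumptions, and the numbers in main are made up rather than measured.

```java
// Hypothetical cost model for "interpret vs. compile then run"; not a JVM API.
class JitBreakEven {

    // Smallest invocation count n for which compile + n * compiled < n * interp.
    static long breakEvenInvocations(double compileCost, double interpCost, double compiledCost) {
        if (compiledCost >= interpCost) {
            throw new IllegalArgumentException("compiled code must be faster than interpretation");
        }
        return (long) Math.floor(compileCost / (interpCost - compiledCost)) + 1;
    }

    public static void main(String[] args) {
        // Made-up cost units: compiling costs 1000, one interpreted run 10, one compiled run 1.
        long n = breakEvenInvocations(1000, 10, 1);
        System.out.println("compilation pays off from invocation " + n);
    }
}
```

Under this model, code that runs once or only a handful of times never reaches the break-even point, which matches the argument above.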

Space overhead

For typical Java methods, it is normal for the compiled code to be 10 or more times the size of the bytecode. As with the time overhead above, the space overhead is worth paying only for frequently executed code. Compiling everything would significantly increase the space occupied by code and lead to code bloat. This explains why some JVMs do not JIT-compile everything and instead use a mixed execution engine that combines an interpreter with a JIT compiler.

Just-in-time compilation optimizations for code

Escape analysis

  1. Moving an object allocation from the heap to the stack relies on escape analysis.
  2. Escape analysis is one of the most advanced optimization techniques in Java virtual machines. It is a cross-function, global data flow analysis algorithm that can effectively reduce synchronization overhead and heap allocation pressure in Java programs.
  3. Through escape analysis, the HotSpot compiler can work out the scope in which a reference to a new object is used and decide whether the object needs to be allocated on the heap.
  4. The basic job of escape analysis is to analyze the dynamic scope of an object:
    • When an object is defined in a method and used only inside that method, it is considered not to escape.
    • When an object is defined in a method and then referenced from outside that method, it is considered to escape. An example is passing it elsewhere as a call argument.

Objects that do not escape can be allocated on the stack, and the stack space is released when the method finishes executing.

An object can escape in the following ways:

  1. Global variable assignment escape
  2. Method return value escape
  3. Instance reference escape
  4. Thread escape: assignment to a class variable or to an instance variable that can be accessed from another thread
Parameter Settings

The relevant JVM options:

  1. Escape analysis has been enabled by default in HotSpot since JDK 6u23.
  2. If you are using an earlier version, you can:
    • Explicitly enable escape analysis with the option “-XX:+DoEscapeAnalysis”
    • View the results of escape analysis with the option “-XX:+PrintEscapeAnalysis”

Conclusion: in development, if a local variable will do, do not define the variable outside the method.

Code optimization 1: Stack allocation

Using escape analysis, the compiler can optimize code through stack allocation, that is, by converting a heap allocation into a stack allocation. If escape analysis shows that an object does not escape the method, the object may be allocated on the stack. It then needs no heap memory and no garbage collection, which reduces both GC time and GC frequency.
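A common way to observe this effect is to allocate many short-lived objects that never leave their method, and run the program once with -XX:+DoEscapeAnalysis and once with -XX:-DoEscapeAnalysis while watching GC activity. The sketch below is such a test program. StackAllocationDemo and User are hypothetical names, no timings are claimed here, and results depend on the JVM version and on whether the loop gets JIT-compiled.

```java
// Hypothetical micro-test: every User is created and dropped inside alloc().
class StackAllocationDemo {

    static class User {
        int id;
    }

    // The User never leaves this method, so it is a candidate for the optimization.
    static int alloc(int i) {
        User u = new User();
        u.id = i;
        return u.id;
    }

    // Calls alloc() `iterations` times and returns a checksum so the work is not dead code.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += alloc(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = run(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum " + sum + ", elapsed " + elapsedMs + " ms");
    }
}
```

Running it with a small heap (for example -Xmx64m) and GC logging enabled makes the difference in GC activity between the two settings easier to see.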

Code optimization 2: Synchronization elision (lock elimination)

Synchronization elision: if an object is found to be accessible from only one thread, operations on it need no synchronization.

  1. Thread synchronization is quite expensive, and its consequence is reduced concurrency and performance.
  2. When a synchronized block is dynamically compiled, the JIT compiler can use escape analysis to determine whether the lock object used by the block is accessible from only one thread and has not been published to other threads. If it has not been published, the JIT compiler removes the synchronization when it compiles the block. This can greatly improve concurrency and performance. Removing synchronization in this way is called synchronization elision, also known as lock elimination.
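The following sketch shows the kind of code this applies to. The lock object and the StringBuffer are both local and never published, so no other thread can contend on them. LockElisionDemo and its method names are hypothetical, and afterElision only illustrates what the compiled code is conceptually equivalent to; the bytecode itself still contains the monitor instructions.

```java
// Hypothetical examples of synchronization on objects that never leave the method.
class LockElisionDemo {

    // The lock object is local and unpublished, so the synchronized block is a removal candidate.
    static int withLocalLock(int value) {
        Object lock = new Object();
        synchronized (lock) {
            return value + 1;
        }
    }

    // Conceptual equivalent after lock elimination.
    static int afterElision(int value) {
        return value + 1;
    }

    // StringBuffer methods are synchronized, but sb is only reachable from this method.
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withLocalLock(1) + " " + afterElision(1) + " " + join("a", "b"));
    }
}
```

In HotSpot this behavior is controlled by the -XX:+EliminateLocks option, which is on by default in the server compiler.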

Code optimization 3: Scalar replacement

  1. A scalar is a piece of data that cannot be broken down into smaller pieces. The primitive data types in Java are scalars.
  2. In contrast, data that can be decomposed is called an aggregate. Objects in Java are aggregates because they can be decomposed into other aggregates and scalars.
  3. At the JIT stage, if escape analysis shows that an object is not accessed from outside, the JIT compiler breaks the object up into the member variables it contains and uses those in its place. This process is called scalar replacement.
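As an illustration, the two methods below compute the same result. The first allocates a Point that never escapes; the second shows, conceptually, what remains after the object has been replaced by its fields. ScalarReplacementDemo and Point are hypothetical names, and the second method is a hand-written illustration rather than actual compiler output.

```java
// Hypothetical before/after view of scalar replacement.
class ScalarReplacementDemo {

    static class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // As written in source: an aggregate (Point) is allocated and used locally.
    static int original(int a, int b) {
        Point p = new Point(a, b);
        return p.x * p.y;
    }

    // Conceptual result: the aggregate is replaced by two scalars, so no object is needed.
    static int afterScalarReplacement(int a, int b) {
        int x = a;
        int y = b;
        return x * y;
    }

    public static void main(String[] args) {
        System.out.println(original(3, 4) + " " + afterScalarReplacement(3, 4));
    }
}
```

In HotSpot the corresponding option is -XX:+EliminateAllocations, which is enabled by default in the server compiler.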

Summary of escape analysis

In short, escape analysis is not yet fully mature:

  1. Papers on escape analysis were published as early as 1999, but it was not implemented until JDK 1.6, and the technique is still not fully mature.
  2. The fundamental reason is that there is no guarantee that the performance gained from escape analysis will exceed its cost. Scalar replacement, stack allocation, and lock elimination can all be done after escape analysis, but escape analysis itself requires a series of complex analyses and is a relatively time-consuming process.
  3. An extreme example is an analysis that finds every object escapes. In that case the whole analysis was wasted.
  4. Although the technique is not very mature, it is still a very important tool in just-in-time compiler optimization.
  5. Note that some claim that, through escape analysis, the JVM allocates non-escaping objects on the stack. This is theoretically possible, but whether it happens depends on the choices made by the JVM’s designers.
  6. Most books are still based on JDK versions before JDK 7. The JDK has changed a lot since then: the interned string cache and static variables used to be allocated in the permanent generation, which has since been replaced by the metaspace. However, the interned string cache and static variables were not moved to the metaspace. They are allocated directly on the heap, which is consistent with the previous point: object instances are allocated on the heap.

Properly configure heap memory

The recommended configuration

Adding memory improves system performance, and the effect is significant. The question is how much memory to add. If the heap is too large, a Full GC will take a relatively long time when it happens. If the heap is too small, GC will be triggered frequently. So how do we choose a reasonable heap size?

Analysis: follow the sizing formulas recommended in Java Performance.

  1. Whole Java heap: set -Xms and -Xmx to 3 to 4 times the size of the live objects in the old generation, that is, 3 to 4 times the old generation occupancy after a Full GC.
  2. Method area (PermSize and MaxPermSize for the permanent generation, or MetaspaceSize and MaxMetaspaceSize for the metaspace): set it to 1.2 to 1.5 times the occupancy of that area after a Full GC.
  3. Young generation: set -Xmn to 1 to 1.5 times the size of the live objects in the old generation.
  4. Old generation: set its size to 2 to 3 times the size of the live objects in the old generation.

These figures are not absolute, however. They are reference values derived from many tuning exercises, and we can use them to choose initial memory settings. Once the program runs normally, we still need to check the GC reclamation rate and GC pause times and judge from the actual memory data. Ideally, Full GC should almost never occur. If Full GCs do occur, take a memory dump, analyze it, and then adjust the memory allocation accordingly.
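The heap-related rules reduce to simple multiplication once the old generation live size is known. Below is a minimal sketch of that calculation. HeapSizing is a hypothetical helper, the 100 MB input is only an example value, and the method area rule is left out because it needs a separate measurement.

```java
// Hypothetical calculator for the heap sizing ranges; each method returns {lower, upper} in MB.
class HeapSizing {

    // Whole heap (-Xms / -Xmx): 3 to 4 times the old generation live size.
    static long[] heapRangeMb(long oldGenLiveMb) {
        return new long[] {3 * oldGenLiveMb, 4 * oldGenLiveMb};
    }

    // Young generation (-Xmn): 1 to 1.5 times the old generation live size.
    static long[] youngRangeMb(long oldGenLiveMb) {
        return new long[] {oldGenLiveMb, oldGenLiveMb * 3 / 2};
    }

    // Old generation: 2 to 3 times the old generation live size.
    static long[] oldRangeMb(long oldGenLiveMb) {
        return new long[] {2 * oldGenLiveMb, 3 * oldGenLiveMb};
    }

    public static void main(String[] args) {
        long live = 100; // example value: 100 MB live in the old generation after Full GC
        System.out.println("heap:  " + heapRangeMb(live)[0] + " to " + heapRangeMb(live)[1] + " MB");
        System.out.println("young: " + youngRangeMb(live)[0] + " to " + youngRangeMb(live)[1] + " MB");
        System.out.println("old:   " + oldRangeMb(live)[0] + " to " + oldRangeMb(live)[1] + " MB");
    }
}
```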

How to determine the size of live objects in the old generation

One more point needs attention: how to determine the size of the old generation live objects mentioned above.

Method 1: View logs

Add GC logging to the JVM parameters. The GC log records the size of each generation after every Full GC, so we can observe the old generation occupancy after GC. Observe the memory state after Full GC over a period of time (for example, two days), and estimate the size of the live objects in the old generation from the old generation occupancy after several Full GCs (for example, by averaging it).

Recommended/safe!
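The averaging step can be automated. The sketch below extracts the old generation occupancy after each Full GC and averages it. It assumes the JDK 8 parallel collector log format, in which a Full GC line contains a section such as "[ParOldGen: 20000K->15000K(20480K)]"; other collectors and JDK versions print different text, so the pattern is an assumption to adapt. OldGenLiveSize is a hypothetical helper.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: averages old generation occupancy after Full GC from log lines.
class OldGenLiveSize {

    // Assumed format: "ParOldGen: <before>K-><after>K"; group 1 captures the size after GC.
    private static final Pattern OLD_GEN = Pattern.compile("ParOldGen: \\d+K->(\\d+)K");

    // Returns the average old generation size after Full GC in KB, or empty if none matched.
    static OptionalDouble averageAfterFullGcKb(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Full GC"))
                .map(OLD_GEN::matcher)
                .filter(m -> m.find())
                .mapToLong(m -> Long.parseLong(m.group(1)))
                .average();
    }
}
```

Calling averageAfterFullGcKb with the lines of gc.log gives a first estimate of the old generation live size, which can then be fed into the sizing rules above.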

Method 2: Force a Full GC

  1. This affects the online service, so use it with caution!
  2. Method 1 is more practical, but it requires changing JVM parameters and analyzing logs. Also, with the CMS collector a Full GC may never be triggered, so none is recorded in the log, which makes the analysis harder. It is therefore sometimes necessary to force a Full GC in order to observe the size of the live objects in the old generation afterwards.
  3. Note: forcing a Full GC causes a stop-the-world (STW) pause that interrupts the online service, so be careful! The recommended procedure is to take the service node out of rotation before forcing the Full GC, then put it back into rotation afterwards so that it serves traffic again. Trigger Full GCs at different times of day, and estimate the size of the live objects in the old generation from the old generation occupancy after several Full GCs.

How do I force a Full GC?

  1. jmap -dump:live,format=b,file=heap.bin <pid> dumps the currently live objects to a file and triggers a Full GC in the process.
  2. jmap -histo:live <pid> prints the number of instances, the memory usage, and the fully qualified name of each class, and also triggers a Full GC.
  3. In a performance test environment, a Full GC can be triggered from a Java monitoring tool such as VisualVM (which integrates JConsole) or JConsole itself. Both have a button that triggers GC.
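A collection can also be requested from inside the process through the standard management API, which is convenient in a test environment. The sketch below requests a GC and then reads the occupancy of the old generation memory pool. ForceGcProbe is a hypothetical helper, matching the pool by "Old" or "Tenured" in its name is an assumption that holds for the common HotSpot collectors but not all of them, and the request is ignored when the JVM runs with -XX:+DisableExplicitGC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical probe: request a GC, then report old generation usage in bytes.
class ForceGcProbe {

    // Returns used bytes of the old generation pool after requesting a GC, or -1 if no pool matched.
    static long oldGenUsedAfterGc() {
        // Effectively the same as System.gc(): a request, not a guarantee.
        ManagementFactory.getMemoryMXBean().gc();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean looksLikeOldGen = name.contains("Old") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && looksLikeOldGen) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    return usage.getUsed();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("old generation used after GC (bytes): " + oldGenUsedAfterGc());
    }
}
```

As with the jmap commands, this causes a pause, so the same caution about online services applies.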

Conclusion

  1. When memory is relatively tight, tune memory as described above to find a setting where both GC frequency and GC time are acceptable, so that a smaller amount of memory can meet the current service needs.
  2. When memory is relatively plentiful, you can give the service somewhat more memory, which reduces GC frequency, though each GC takes correspondingly longer. In general, services with low-latency requirements can be given more memory, while services without strict latency requirements can use a smaller setting chosen as described above.
  3. If you observe OutOfMemoryError in the garbage collection log, try increasing the Java heap, up to 80% to 90% of physical memory. Pay particular attention to which space caused the OutOfMemoryError and increase that space.
    • For example, increase -Xms and -Xmx to resolve an OutOfMemoryError caused by the old generation.
    • Increase -XX:PermSize and -XX:MaxPermSize to resolve an OutOfMemoryError caused by the permanent generation (up to JDK 7), or -XX:MetaspaceSize and -XX:MaxMetaspaceSize to resolve one caused by the metaspace (JDK 8 and later).
  4. Keep in mind that the Java heap size is limited by the hardware and by whether a 64-bit JVM is used. After increasing the Java heap size, check the garbage collection log until no OutOfMemoryError appears. Once the application runs in a steady state without OutOfMemoryError, you can proceed to the next step, calculating the size of the live objects.

How to estimate GC frequency

Normally we should estimate memory needs for our own system, and this can be tested in the test environment. At the start we can set the memory fairly large, for example 4 GB. The figure can also be estimated from the business system.

For example, suppose one record fetched from the database takes 128 bytes and each request fetches 1,000 records. The memory read per request is then 128 B × 1,000 / 1024 / 1024 ≈ 0.122 MB. If the program handles 100 such reads per second, it allocates about 0.122 MB × 100 = 12.2 MB per second. With a 1 GB heap, the young generation is about 333 MB and Eden is about 80% of that, so Eden fills in roughly 333 MB × 80% / 12.2 MB/s ≈ 21.84 s. In other words, the program triggers a young GC two or three times per minute. This gives us a rough estimate for the system.
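The same estimate can be expressed as a small calculation so that other record sizes and request rates can be tried. YoungGcEstimate and its method names are hypothetical, and the inputs in main are the example figures from the text. Because the code does not round the allocation rate to 12.2 MB/s, it yields about 21.8 s, slightly below the rounded figure above.

```java
// Hypothetical estimator: how long Eden takes to fill at a given allocation rate.
class YoungGcEstimate {

    // Allocation rate in MB per second for the given record size and request volume.
    static double allocationRateMbPerSec(long bytesPerRecord, long recordsPerRequest, long requestsPerSec) {
        return bytesPerRecord * recordsPerRequest * requestsPerSec / (1024.0 * 1024.0);
    }

    // Seconds between young GCs: usable Eden space divided by the allocation rate.
    static double secondsBetweenYoungGc(double youngGenMb, double edenFraction, double allocMbPerSec) {
        return youngGenMb * edenFraction / allocMbPerSec;
    }

    public static void main(String[] args) {
        double rate = allocationRateMbPerSec(128, 1000, 100);       // 128 B x 1000 records x 100 req/s
        double interval = secondsBetweenYoungGc(333, 0.8, rate);    // 333 MB young generation, 80% Eden
        System.out.printf("allocation rate: %.2f MB/s%n", rate);
        System.out.printf("young GC roughly every %.1f s (%.1f per minute)%n", interval, 60 / interval);
    }
}
```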

Summary

This article introduced several common JVM performance optimization cases in Java: adjusting heap size to improve service throughput, JIT optimization, and configuring heap memory sensibly. Specific tuning depends on the business scenario, and the approach differs from one scenario to another. The next article covers more common JVM performance tuning cases.

You are welcome to follow the public account (MarkZoe) so we can learn from each other and exchange ideas.