Java application performance optimization is a commonplace topic. Typical performance problems include slow page responses, interface timeouts, high server load, low concurrency, and frequent database deadlocks. Especially now that the “rough and fast” Internet development mode is popular, as system traffic grows and code becomes increasingly bloated, all kinds of performance problems start to surface. The potential bottlenecks of a Java application are numerous: system factors such as disk, memory, and network I/O, as well as the application code itself, JVM GC, the database, and caching. Java performance optimization can be divided into four layers: the application layer, the database layer, the framework layer, and the JVM layer.

The difficulty of optimization increases step by step from one layer to the next, and the knowledge involved and the problems solved differ at each layer.

  • The application layer requires understanding the code logic and locating problematic lines of code through the Java thread stack.
  • The database layer requires analyzing SQL, locating deadlocks, and so on.
  • The framework layer requires reading the source code and understanding the framework's mechanisms.
  • The JVM layer requires an in-depth understanding of GC types and how they work, as well as the effects of the various JVM parameters.

There are two basic approaches to Java performance analysis: on-site analysis and post-hoc analysis. On-site analysis preserves the live environment and uses diagnostic tools to analyze it and locate the problem; it has a large impact on online services and is inappropriate in some scenarios, especially when critical online services are involved. Post-hoc analysis collects as much live data as possible, restores service immediately, and then analyzes and reproduces the problem from the collected data. Starting with performance diagnostic tools, let's review some classic cases and practices from the HeapDump performance community.

Performance diagnostic tools

One type of performance diagnosis targets systems and code already confirmed to have performance problems; the other tests a system's performance ahead of launch to determine whether it meets the go-live requirements. This article focuses on the former; the latter can be tested with various load-testing tools, such as JMeter, and is beyond the scope of this article. For Java applications, performance diagnostic tools fall into two layers: the OS level and the Java application level (including application code diagnosis and GC diagnosis).

OS diagnosis

OS diagnosis focuses on CPU, memory, and I/O.

CPU diagnosis

For the CPU, focus on load average, CPU usage, and the number of context switches.

You can run the top command to view the average load and CPU usage. PerfMa's open-source XPocket plug-in container integrates top_x, an enhanced version of Linux top that displays CPU usage, load, and the list of processes consuming CPU and memory. The plug-in splits the complex output of the original top command by function and sorts it, making it clearer and easier to use; it supports pipelines and, in particular, can directly obtain the TID or PID of the top processes or threads. Its mem_s command sorts processes by swap size, extending the original top functionality.

In the example output, more than 51% of the system's CPU is in use. If a process's CPU usage is high, you can run top_x's cpu_t command to automatically obtain thread-level CPU usage for the busiest process, or specify a process PID with the -p parameter before running cpu_t. The vmstat command can be used to check the number of CPU context switches; XPocket also integrates a vmstat tool.
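
If you need the same numbers from inside a Java process (as a quick cross-check rather than a replacement for top), the JDK's standard management API exposes the system load average. A minimal sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        // 1-minute load average; returns -1.0 on platforms where it is unavailable
        double load = os.getSystemLoadAverage();
        System.out.printf("cpus=%d, loadAvg(1m)=%.2f, perCpu=%.2f%n", cpus, load, load / cpus);
    }
}

A sustained per-CPU load well above 1.0 is the usual signal to dig further into top or vmstat output.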

Context switches occur in the following scenarios:

  • The current task's time slice runs out, and the CPU schedules the next task
  • The current task is preempted by a higher-priority task
  • The task blocks on I/O during execution, so it is suspended and the CPU switches to the next task
  • User code actively suspends the current task to yield the CPU
  • Multiple tasks contend for a resource, and a task is suspended because it fails to acquire it
  • A hardware interrupt occurs.

Java thread context switching comes primarily from contention for shared resources. Locking a single object is rarely a system bottleneck unless the lock granularity is too coarse, but in a frequently executed code block that keeps locking multiple objects, a large number of context switches can occur and become the system bottleneck. One community case: writing logs asynchronously through Log4j2's AsyncLogger caused frequent CPU context switching, which led to a service avalanche. AsyncLogger uses the Disruptor framework, which handles MultiProducer on top of its core data structure, the RingBuffer, and calls Unsafe.park to suspend the current thread. In short, when consumption could not keep up with production, the producer thread retried indefinitely with a retry interval of 1 nanosecond, so the CPU was suspended and woken frequently, producing a large number of context switches and eating CPU resources. The fix was to upgrade Disruptor to 3.3.6 and Log4j2 to 2.7.
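
To make the contention pattern concrete (a generic sketch, not the Log4j2/Disruptor code), here is a hot path that serializes every caller on one coarse lock; under load, threads constantly park and wake, which shows up as voluntary context switches in vmstat or pidstat:

import java.util.ArrayList;
import java.util.List;

public class CoarseLockHotPath {
    private final Object lock = new Object();
    private final List<String> buffer = new ArrayList<>();

    // Called at high frequency from many threads: all callers contend for
    // the same monitor, so losers are parked and later rescheduled,
    // driving up the context-switch count.
    public void append(String event) {
        synchronized (lock) {
            buffer.add(event);
            // ...any extra work here widens the critical section and
            // makes the contention worse
        }
    }
}

Shrinking the critical section, sharding the lock, or switching to a concurrent data structure are the usual ways out.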

Memory

From the operating system's point of view, the question is whether memory is sufficient for the application processes. You can run free -m to check memory usage, and top to view the virtual memory (VIRT) and physical memory (RES) used by each process; from the formula VIRT = SWAP + RES you can work out how much of the swap partition a given application is using. Heavy swap usage can degrade Java application performance, since disk is far slower than memory, so it is advisable to set the swappiness value as low as possible.
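
On Linux the same per-process numbers can be read programmatically from /proc (an illustrative, Linux-only sketch; the field names come from /proc/[pid]/status):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ProcMemCheck {
    public static void main(String[] args) throws IOException {
        // VmSize ≈ VIRT, VmRSS ≈ RES, VmSwap = swapped-out portion; values are in kB
        try (Stream<String> lines = Files.lines(Paths.get("/proc/self/status"))) {
            lines.filter(l -> l.startsWith("VmSize")
                           || l.startsWith("VmRSS")
                           || l.startsWith("VmSwap"))
                 .forEach(System.out::println);
        }
    }
}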

I/O

I/O includes disk I/O and network I/O; disks are more prone to I/O bottlenecks. You can run the iostat command to check the disk read/write status, and watch the CPU's I/O wait percentage to judge whether disk I/O is normal. If disk I/O stays very high, the disk is slow or faulty and has become a performance bottleneck, requiring application optimization or disk replacement.

In addition to common commands such as top, ps, vmstat, and iostat, other Linux tools can be used to diagnose system problems, such as mpstat, tcpdump, netstat, pidstat, and sar. Here is a summary of Linux performance diagnostic tools for different device types, as shown in the following figure for your reference.

Java application diagnostic tools

Application code diagnostics

Application-code performance problems are relatively easy to solve. If application-level monitoring and alerting can pinpoint the problematic function or code path, the code can be located directly. Alternatively, top + jstack can find the stack of the problematic thread and locate the offending code. For more complex code with convoluted logic, printing performance logs with a stopwatch can also locate most application-code performance issues.
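
A minimal stopwatch-style timing sketch (using System.nanoTime directly rather than any particular library; doBusinessStep is an invented stand-in for the suspected slow segment):

public class StopwatchDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        doBusinessStep();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // in real code this line would go to the logging framework
        System.out.println("doBusinessStep took " + elapsedMs + " ms");
    }

    private static void doBusinessStep() {
        try {
            Thread.sleep(120); // simulate work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}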

Common Java application diagnostics include threading, stack, GC, and so on.

jstack

The jstack command is usually used together with top: locate the Java process and its threads with top -H -p pid, then export the thread stack with jstack -l pid. Since a thread stack is a transient snapshot, it should be dumped several times, typically three dumps at roughly five-second intervals. Convert the thread PID reported by top to hexadecimal to obtain the nid used in the Java thread stack, then find the stack of the corresponding problem thread.
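
The decimal-to-hex conversion can be done with any tool; as a trivial Java illustration (the thread id here is a made-up example):

public class TidToNid {
    public static void main(String[] args) {
        int tid = 28712; // thread PID as reported by top -H
        // jstack prints thread ids in hex, e.g. nid=0x7028
        System.out.println("nid=0x" + Integer.toHexString(tid));
    }
}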

XPocket also integrates the jstack_x tool: with its stack -t nid command you can view the call stack of a thread waiting on a lock and locate the business code from there.

XElephant, XSheepdog

XElephant is a free online analyzer for Java memory dump files provided by the HeapDump performance community. It makes the dependencies between in-memory objects clearer, requires no software installation, offers an upload mode, is not limited by the memory of your local machine, and supports analyzing large dump files.

XSheepdog is a free online analyzer for thread dump files, also from the HeapDump performance community. It presents threads, thread pools, stacks, methods, and lock relationships from multiple perspectives, so thread problems become clear at a glance.

GC diagnosis

The Java GC spares programmers the risk of managing memory themselves, but the application pauses it introduces are another problem that must be dealt with. The JDK provides a series of tools to locate GC problems, including jstat and jmap, plus third-party tools such as MAT.

jstat

The jstat command prints GC details, Young and Full GC counts, heap information, and more. The command format is jstat -gc<option> -t pid, for example jstat -gcutil -t pid.

MAT

MAT is a powerful tool for analyzing Java heaps and provides intuitive diagnostic reports. Its built-in OQL allows SQL-like queries against the heap (for example, SELECT * FROM java.util.HashMap), which is powerful, and outgoing/incoming references make it possible to trace object references back to their source.

MAT shows two size columns for each object: shallow size and retained size. The former is the size of the object itself, excluding the objects it references; the latter is the sum of the object's own shallow size and the shallow sizes of all objects it directly or indirectly references, i.e. the amount of memory the GC would free once the object is collected, which is usually the number to watch. For some large heaps (tens of gigabytes), MAT itself needs a lot of memory to open the dump, generally more than a local development machine has; it is advisable to install a graphical environment and MAT on an offline server and open it remotely, or to run the mat command to generate a heap index and copy the index locally, although the index alone shows limited heap information.
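
A toy illustration of the difference between the two sizes (figures are indicative, not exact):

public class SizeDemo {
    static class Holder {
        // Holder's shallow size is just the object header plus one reference
        // field (around 16 bytes on a 64-bit JVM with compressed oops).
        byte[] payload = new byte[1024 * 1024]; // ~1 MB array
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        // If nothing else references the array, Holder's retained size is
        // roughly shallow(Holder) + shallow(byte[]) ≈ 1 MB: collecting h
        // frees the whole megabyte.
        System.out.println(h.payload.length);
    }
}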

To diagnose GC problems, it is recommended to add -XX:+PrintGCDateStamps (usually together with -XX:+PrintGCDetails and -Xloggc) to the JVM parameters.

For Java applications, top + jstack + jmap + MAT can locate most application and memory problems and is an essential toolset. In some cases, Java application diagnosis also needs OS-level information, and more comprehensive diagnostic tools such as Zabbix, which integrates OS and JVM monitoring, can be used. In distributed environments, infrastructure such as distributed tracing systems also provides strong support for application performance diagnosis.

Performance optimization practices

After introducing some of the commonly used performance diagnostic tools, we'll share examples from the JVM layer, the application code layer, and the database layer, drawing on some of our practices in Java application tuning.

JVM tuning: GC pain

One recorded case involved long interface response times and images that could not be accessed. After ruling out the database, blocking problems such as synchronization, and system-level problems, the investigation turned to GC. Running the jstat command produced the following output:

bash-4.4$ /app/jdk1.8.0_192/bin/jstat -gc 1 2s
 S0C       S1C       S0U   S1U   EC        EU        OC         OU        MC       MU       CCSC    CCSU     YGC   YGCT    FGC   FGCT     GCT
 170496.0  170496.0  0.0   0.0   171008.0  130368.9  1024000.0  590052.8  70016.0  68510.8  8064.0  7669.0    983  13.961  1400  275.606  289.567
 170496.0  170496.0  0.0   0.0   171008.0   41717.2  1024000.0  758914.9  70016.0  68510.8  8064.0  7669.0    987  14.011  1401  275.722  289.733
 170496.0  170496.0  0.0   0.0   171008.0  126547.2  1024000.0  770587.2  70016.0  68510.8  8064.0  7669.0    990  14.091  1403  275.986  290.077
 170496.0  170496.0  0.0   0.0   171008.0   45488.7  1024000.0  650767.0  70016.0  68531.9  8064.0  7669.0    994  14.148  1405  276.222  290.371
 170496.0  170496.0  0.0   0.0   171008.0  146029.1  1024000.0  714857.2  70016.0  68531.9  8064.0  7669.0    995  14.166  1406  276.366  290.531
 170496.0  170496.0  0.0   0.0   171008.0  118073.5  1024000.0  669163.2  70016.0  68531.9  8064.0  7669.0    998  14.226  1408  276.736  290.962
 170496.0  170496.0  0.0   0.0   171008.0    3636.1  1024000.0  687630.0  70016.0  68535.6  8064.0  7669.6   1001  14.342  1409  276.871  291.213
 170496.0  170496.0  0.0   0.0   171008.0   87247.2  1024000.0  704977.5  70016.0  68535.6  8064.0  7669.6   1005  14.463  1411  277.099  291.562

There is a Full GC almost every second, and the pauses are quite long; note that the survivor utilization columns (S0U/S1U) stay at zero. In this case the JVM's adaptive size policy was implicated, and the problem was resolved by adding -XX:-UseAdaptiveSizePolicy to the startup parameters.

GC tuning is still necessary for high-concurrency, high-volume applications, especially since the default JVM parameters often do not meet business requirements and need dedicated tuning. Interpreting GC logs is well covered by public material and is not discussed in this article. There are three basic ideas for GC tuning, and an illustrative parameter sketch follows: (1) reduce GC frequency by enlarging the heap and generating fewer unnecessary objects; (2) reduce GC pause time, e.g. by shrinking the heap and using the CMS GC algorithm; (3) avoid Full GC by adjusting the CMS trigger ratio, avoiding promotion failed and concurrent mode failure (give the old generation more space, increase the number of GC threads to speed up collection), reducing the creation of large objects, and so on.
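
Purely as an illustration of what such tuning looks like on a JDK 8 / CMS setup (the sizes and paths are placeholders that must be derived from the actual workload, not recommendations):

-Xms4g -Xmx4g -Xmn2g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=4
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log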

Application layer tuning: sniffing out bad code

Analyzing the root causes of degraded code efficiency from the application layer is undoubtedly a good way to improve a Java application's performance. The article “FGC practice: bad code causing frequent FGC and unresponsiveness” documents a case where bad code led to a memory leak, excessive CPU usage, and a large number of interface timeouts. Analyzing the JVM heap with MAT, the pie chart showed most of the heap occupied by the same memory; looking into the heap details and tracing back to the top-level references quickly exposed the culprit. Once the leaking object was found, a global search of the project for its name showed it to be a Bean object, and the leak was narrowed down to one of its properties, a Map whose values were ArrayLists storing the result of every probe-interface response by type. After each probe the result was inserted into the ArrayList for analysis; since the Bean was never recycled and the property had no clearing logic, over ten days without a service restart the Map grew until it filled memory. With memory exhausted, no more memory could be allocated for HTTP response results, so threads got stuck on readLine. The interfaces with large I/O volumes raised especially many alarms, presumably because their larger responses required more memory.
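
Reduced to a schematic sketch, the leak pattern looks like this (names are invented for illustration; the real Bean and value types differ):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProbeStats { // a long-lived bean that is never recycled
    private final Map<String, List<String>> resultsByType = new HashMap<>();

    public void record(String type, String probeResult) {
        // Grows without bound: nothing ever removes entries, so after days
        // of uptime the map retains the entire history and fills the heap.
        resultsByType.computeIfAbsent(type, k -> new ArrayList<>())
                     .add(probeResult);
    }

    // Fix: bound the list, aggregate instead of keeping raw results, or
    // clear entries after each analysis pass.
}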

For locating bad code, in addition to conventional code review, tools such as MAT can quickly locate system performance bottlenecks to a certain extent. However, when the problem is bound to a specific scenario or to business data, auxiliary code walk-throughs, performance detection tools, data simulation, and even online traffic diversion may be required to finally pin down the source. Some common characteristics of bad code, for reference: (1) code that is unreadable and follows no basic programming conventions; (2) excessive object creation or large-object creation, memory leaks, etc.; (3) too many I/O stream operations, or streams left unclosed; (4) too many database operations and overly long transactions; (5) incorrect use of synchronization; (6) time-consuming operations inside loops; and so on.

Database layer tuning: deadlock nightmare

For most Java applications, it is common to interact with databases, especially for OLTP applications that require high data consistency. The performance of the database directly affects the performance of the entire application.

Generally speaking, database layer tuning starts from the following aspects: (1) optimization at the SQL statement level: slow-SQL analysis, index analysis and tuning, transaction splitting, etc.; (2) optimization at the database configuration level: field design, cache sizing, disk I/O and other database parameter tuning, data sharding, etc.; (3) optimization at the database structure level: consider vertical and horizontal splitting of the database; (4) choosing the right database engine or type for the scenario, such as introducing NoSQL.

Summary and Suggestions

Performance tuning also follows the 80/20 rule: 80% of performance problems are caused by 20% of the code, so optimizing the critical code pays off disproportionately. At the same time, optimization should be done on demand; over-optimization may introduce more problems. For Java performance optimization, you need to understand not only the system architecture and application code but also the JVM layer and even the underlying operating system. To sum up, consider mainly the following points:

1) Basic performance tuning. Basic performance here refers to upgrades and optimization at the hardware or operating-system level, such as network tuning, operating-system version upgrades, and hardware device optimization. For example, using an F5 load balancer or introducing SSDs, as well as NIO improvements in newer Linux versions, can give applications a big performance boost;

2) Database performance optimization, including common transaction splitting, index tuning, SQL optimization, and the introduction of NoSQL: for example, splitting transactions off into asynchronous processing with eventual consistency, or introducing the various NoSQL databases for suitable scenarios, can greatly alleviate the weaknesses of traditional databases under high concurrency;

3) Application architecture optimization: introduce new computing or storage frameworks and use their new features to break through the original cluster's performance bottlenecks; or introduce distributed strategies to scale computing and storage horizontally, including precomputing results ahead of time, the classic trade of space for time. This can relieve system load to a certain extent;

4) Business-level optimization. Technology is not the only means of improving system performance. In many scenarios with performance problems, most turn out to be triggered by particular business scenarios, and avoiding or adjusting them at the business level is often the most effective fix.