As long as the business logic is written correctly and concurrent access to shared state across threads is handled properly, there is usually little need for tuning. At most, the performance monitoring platform flags an interface whose invocation time is too high, and the cause turns out to be a slow SQL statement or a third-party interface timing out. If you work on middleware or IM/communications projects, however, you may need deeper CPU, disk, network, and memory troubleshooting and tuning skills

  • Troubleshooting high CPU usage
  • Linux memory
  • Disk I/O
  • Network I/O
  • Java application memory leaks and frequent GC
  • Java thread troubleshooting
  • Common JVM startup parameter tuning


Troubleshooting high Linux CPU usage

CPU metrics explained

  • Load average
    • When the load average equals the number of logical CPUs, every CPU is exactly fully utilized. When the load average exceeds the number of logical CPUs, the system is overloaded
  • Process context switches
    • Voluntary context switches, caused by a process being unable to obtain a resource
    • Involuntary context switches, caused by forced scheduling by the system
  • CPU utilization
    • User CPU usage, which includes user-mode usage (user) and low-priority user-mode usage (nice), is the percentage of time the CPU spends running in user mode. High user CPU usage means some applications are busy
    • System CPU usage is the percentage of time the CPU spends running in kernel mode (excluding interrupts). High system CPU usage means the kernel is busy
    • I/O-wait CPU usage, known as iowait, is the percentage of time spent waiting for I/O. High iowait means I/O interactions between the system and hardware devices are taking a long time
    • Soft- and hard-interrupt CPU usage is the percentage of time the kernel spends in soft- and hard-interrupt handlers, respectively. High values here indicate a large number of interrupts are occurring

View the load average of the system

$ uptime
10:54:52 up 1124 days, 16:31,  6 users,  load average: 3.67, 2.13, 1.79
  • 10:54:52 is the current time; up 1124 days, 16:31 is how long the system has been running; 6 users is the number of logged-in users; and the last three numbers are the load averages over the past 1, 5, and 15 minutes. The load average is the average number of processes in the runnable or uninterruptible state per unit of time
  • When the load average exceeds 70% of the number of CPUs, you should start analyzing and troubleshooting the high load. Once the load is too high, processes may respond slowly and normal service is affected
  • Relationship between load average and CPU usage
    • CPU-intensive processes: heavy CPU use drives up the load average and CPU usage together, so the two are consistent
    • I/O-intensive processes: waiting for I/O also drives up the load average, but CPU usage is not necessarily high
    • A large number of processes waiting for the CPU also drives up both the load average and CPU utilization
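The 70% rule of thumb above can be checked with a short script — a sketch assuming a Linux host with /proc mounted; the threshold is this article's guideline, not a hard limit:

```shell
#!/bin/sh
# Compare the 1-minute load average against 70% of the logical CPU count.
cpus=$(nproc)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
# sh has no floating-point math, so the comparison is done in awk
awk -v load="$load1" -v cpus="$cpus" 'BEGIN {
    verdict = (load > cpus * 0.7) ? "investigate" : "ok"
    printf "load1=%.2f cpus=%d verdict=%s\n", load, cpus, verdict
}'
```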

CPU Context Switch

  • Process context switches:
    • A process runs in both kernel space and user space. When code makes a system call (to access a restricted resource), the CPU switches context into kernel space, and switches back to user space when the system call returns; one system call means two CPU context switches
    • Normally, the scheduler switches between processes according to its scheduling policy, causing context switches
    • A context switch also occurs when a process blocks waiting for access to a resource
    • A context switch occurs when a process suspends itself, for example via the sleep function
    • When a higher-priority process needs to run, the current process is suspended so that the higher-priority process can run
  • Thread context switches:
    • Threads in the same process share the same virtual memory and global resources, which do not change on a thread context switch
    • A thread's private data, such as its stack and registers, does need to be saved across a context switch
  • Interrupt context switches:
    • To respond quickly to hardware events, an interrupt suspends the normal scheduling and execution of the current process and invokes an interrupt handler to service the device event

Check the context switch status of the system:

Use vmstat and pidstat: vmstat shows system-wide metrics, while pidstat shows metrics for each process

$ vmstat 2 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free    buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 3498472 315836 3819540    0    0     0     1    2    0  3  1 96  0  0
------ metric notes ------
cs (context switch):  context switches per second
in (interrupt):       interrupts per second
r (Running/Runnable): length of the ready queue, i.e. processes running or waiting
                      for a CPU; when it exceeds the CPU count, there is a CPU bottleneck
b (Blocked):          processes in uninterruptible sleep
# pidstat -w
Linux 3.10.0-862.el7.x86_64 (8f57ec39327b)   07/11/2021   _x86_64_   (6 CPU)

06:43:23 PM   UID   PID   cswch/s  nvcswch/s  Command
06:43:23 PM     0     1      0.00       0.00  java
06:43:23 PM     0   102     0.00       0.00  bash
06:43:23 PM     0   150     0.00       0.00  pidstat
------ metric notes ------
PID:       process ID
cswch/s:   voluntary (active) task context switches per second
nvcswch/s: involuntary (passive) task context switches per second; involuntary
           switches occur when many processes compete for the CPU

Troubleshooting high CPU usage

  • Run the top command to view system metrics. To sort by a particular metric, use top -o <field name>, for example top -o %CPU; the -o option sorts by the given field from largest to smallest
# top -o %MEM
top - 18:20:27 up 26 days, 8:30, 2 users, load average: 0.04, 0.09, 0.13
Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.5 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem:  32762356 total, 14675196 used, 18087160 free,     884 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free. 6580028 cached Mem

  PID USER   PR NI    VIRT    RES   SHR S  %CPU  %MEM   TIME+ COMMAND
 2323 mysql  20  0 19.918g 4.538g  9404 S 0.333 14.52 35:51.44 mysqld
 1260 root   20  0 7933492 1.173g 14004 S 0.333 3.753 58:20.74 java
 1520 daemon 20  0  358140   3980   776 S 0.333 0.012  6:19.55 httpd
 1503 root   20  0   69172   2240  1412 S 0.333 0.007  0:48.05 httpd
------ metric notes ------
First line (summary):
  18:20:27                          current time
  up 26 days, 8:30                  system uptime
  2 users                           number of logged-in users
  load average: 0.04, 0.09, 0.13    load averages over the past 1, 5, and 15 minutes
Tasks: process counts (total, running, sleeping, stopped, zombie)
%Cpu(s):
  us   user-space CPU usage (programs not scheduled via nice)
  sy   kernel-space CPU usage
  ni   CPU usage of user-space programs scheduled via nice
  id   idle CPU
  wa   time the CPU spends waiting for I/O
  hi   hard interrupts handled by the CPU
  si   soft interrupts handled by the CPU
Mem:  total / used / free physical memory; buffers is the kernel block cache
Swap: total / used / free swap space; cached is the page cache
  • Once the relevant process is found, use top -Hp <pid> or pidstat -t -p <pid> to check the CPU usage of the process's individual threads and pinpoint the specific threads driving CPU usage up
    • If %us is too high, look into the corresponding Java service by thread ID to check for infinite loops or long-blocking calls; for Java services, use jstack
    • If %sy is too high, use strace first to locate the specific system call, then find the application code that triggers it
    • If %si is too high, a network problem may be spiking the soft-interrupt frequency
    • If %wa is too high, frequent disk reads and writes are the cause
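One detail worth spelling out: top -Hp prints thread IDs in decimal, while jstack reports them as hexadecimal nid=0x… values, so a quick conversion is needed before searching the thread dump (the thread ID below is purely illustrative):

```shell
#!/bin/sh
# Convert a decimal thread ID (as shown by `top -Hp <pid>`) into the
# hex form that matches the nid=0x... field in jstack output.
tid=28764                     # illustrative thread ID
printf 'nid=0x%x\n' "$tid"    # prints: nid=0x705c
```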

Linux memory

Check memory usage

  • Use the top, free, or vmstat commands
# top
top - 18:20:27 up 26 days, 8:30, 2 users, load average: 0.04, 0.09, 0.13
Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.5 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem:  32762356 total, 14675196 used, 18087160 free,     884 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free. 6580028 cached Mem

  PID USER  PR NI    VIRT    RES   SHR S  %CPU  %MEM    TIME+ COMMAND
 2323 mysql 20  0 19.918g 4.538g  9404 S 0.333 14.52 352:51.44 mysqld
 1260 root  20  0 7933492 1.173g 14004 S 0.333 3.753  58:20.74 java
...
  • cachestat, cachetop, and memleak from the bcc-tools package
    • cachestat shows read/write cache hits for the whole system
    • cachetop shows cache hits per process
    • memleak can check for memory leaks in C and C++ programs

Memory metrics in the free command

# free -m 
                total used   free   shared  buffers  cached 
Mem:            32107 30414  1692   0       1962     8489 
-/+ buffers/cache:    19962  12144 
Swap:               0     0     0

  • shared is the size of shared memory; it is rarely used by most systems and is generally 0
  • buffers/cache is the size of the buffer and page cache: buffers is the cache of raw disk blocks, while cache is the page cache used when reading files through the file system
  • available is the amount of memory available to new processes

High memory swap usage

Swap is simply a chunk of disk space, or a local file, used as memory. Swapping out stores idle memory pages to disk and frees the memory they occupy; swapping in reads those pages from disk back into memory when the process accesses them again

  • Swap and the memory reclamation mechanism
    • Memory reclamation covers both file pages (memory-mapped pages backed by disk files) and anonymous pages (memory dynamically allocated by processes)
    • File pages can be reclaimed directly from the cache, or, for dirty pages, written back to disk first and then reclaimed
    • Reclaiming anonymous pages is essentially the Swap mechanism: they are written to disk and their memory is freed
  • Excessively high swap usage causes serious performance problems: page faults make pages swap frequently between memory and disk
    • Most production servers have plenty of memory, so swap can simply be disabled
    • You can tune /proc/sys/vm/min_free_kbytes to adjust the threshold for periodic memory reclamation, or /proc/sys/vm/swappiness to adjust the preference between reclaiming file pages and anonymous pages
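As a sketch of those knobs (the values below are illustrative, not recommendations, and writing them requires root):

```shell
# Inspect the current settings (read-only, safe anywhere)
cat /proc/sys/vm/swappiness        # 0-100; higher = more eager to swap anonymous pages
cat /proc/sys/vm/min_free_kbytes   # watermark that triggers background reclamation

# Illustrative tuning: prefer reclaiming file pages over swapping
sysctl -w vm.swappiness=10
sysctl -w vm.min_free_kbytes=65536

# Or, as suggested above for large-memory servers, disable swap entirely
swapoff -a
```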

Linux disk I/O problems

File systems and disks

  • A disk is a storage device (strictly speaking, a block device) that can be divided into partitions. A file system can be created on a disk or partition and mounted to a directory, after which the system reads and writes files through that mount point
  • A disk is the block device that stores data and the carrier of the file system; the file system ultimately relies on the disk for persistent storage of data
  • When the system reads or writes ordinary files, the I/O request first goes through the file system, which then interacts with the disk. When reading or writing block-device files directly, the system skips the file system and talks to the disk itself
  • Buffers in Linux memory are temporary storage for raw disk blocks, i.e. a cache of disk data, and are usually not particularly large (around 20 MB). They let the kernel consolidate scattered writes, optimizing disk writes
  • Cached in Linux memory is the page cache for files read from disk; it caches data read from or written to files, so the next access can be served quickly from memory without touching the disk again
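Both counters can be read directly from /proc/meminfo, which is where free itself gets them — a sketch assuming a Linux host:

```shell
#!/bin/sh
# Buffers = raw block-device cache; Cached = file page cache.
grep -E '^(Buffers|Cached):' /proc/meminfo
```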

Disk performance metrics

  • Utilization: the percentage of time the disk spends processing I/O. High utilization (for example, above 80%) usually means disk I/O has hit a performance bottleneck
  • Saturation: how busy the disk is processing I/O. Excessive saturation means a serious performance bottleneck; at 100% saturation, the disk cannot accept new I/O requests
  • IOPS (Input/Output Per Second): the number of I/O requests per second
  • Throughput: the amount of data transferred per second
  • Response time: the interval between sending an I/O request and receiving the response

When I/O is too high, how do I find the problem and tune it

  • View overall system disk I/O
# iostat -x -k -d 1 1
Linux 4.4.73-5-default (ceshi44)   07/08/2021   _x86_64_   (40 CPU)

Device: rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.08   2.48 0.37 11.71  27.80 507.24    88.53     0.02  1.34   14.96    0.90  0.09  0.10
sdb       0.00   1.20 1.28 16.67  30.91 647.83    75.61     0.17  9.51    9.40    9.52  0.32  0.57
------ metric notes ------
rrqm/s:   merged read requests per second (the file system merges reads of the same block)
wrqm/s:   merged write requests per second
r/s:      completed read requests per second
w/s:      completed write requests per second
rkB/s:    data read per second (KB)
wkB/s:    data written per second (KB)
avgrq-sz: average data per I/O operation (in sectors)
avgqu-sz: average length of the pending I/O request queue
await:    average I/O request wait time, including queueing and service time (ms)
svctm:    average I/O request service time (ms)
%util:    percentage of time in the interval spent on I/O, i.e. how often the I/O queue is non-empty
  • View process-level I/O
# pidstat -d
Linux 3.10.0-862.el7.x86_64 (8f57ec39327b)   07/11/2021   _x86_64_   (6 CPU)

06:42:35 PM   UID   PID  kB_rd/s  kB_wr/s  kB_ccwr/s  Command
06:42:35 PM     0     1     1.05     0.00       0.00  java
06:42:35 PM     0   102     0.04     0.05       0.00  bash
------ metric notes ------
kB_rd/s:   KB read from disk per second
kB_wr/s:   KB written to disk per second
kB_ccwr/s: KB of disk writes cancelled by the task, e.g. when it truncates a dirty pagecache
Command:   the command the process is executing
  • After pidstat -d identifies the offending application service, the next step is to use strace and lsof to find which files being read or written are causing the high I/O
$ strace -p 18940
strace: Process 18940 attached
...
mmap(NULL, 314576896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f7aee9000
mmap(NULL, 314576896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f682e8000
write(3, "2018-12-05 15:23:01,709 - __main"..., 314572844) = 314572844
munmap(0x7f0f682e8000, 314576896) = 0
write(3, "\n", 1) = 1
munmap(0x7f0f7aee9000, 314576896) = 0
close(3) = 0
stat("/tmp/logtest.txt.1", {st_mode=S_IFREG|0644, st_size=943718535, ...}) = 0
  • The strace output shows that process 18940 is writing 300 MB at a time to /tmp/logtest.txt.1
$ lsof -p 18940
COMMAND   PID USER  FD   TYPE DEVICE  SIZE/OFF    NODE NAME
java    18940 root cwd    DIR   0,50      4096 1549389 /
...
java    18940 root   2u   CHR  136,0       0t0       3 /dev/pts/0
java    18940 root   3w   REG    8,1 117944320     303 /tmp/logtest.txt
------ column notes ------
FD:   file descriptor number
TYPE: file type
NAME: file path
  • lsof confirms that process 18940 is writing to /tmp/logtest.txt, 300 MB at a time

Linux network I/O problems

When a network frame arrives at the NIC, the NIC places it into the receive queue via DMA and then raises a hard interrupt to tell the interrupt handler that a packet has been received. Next, the NIC interrupt handler allocates a kernel data structure (sk_buff) for the frame and copies it into the sk_buff buffer, then raises a soft interrupt to notify the kernel that a new frame has arrived. Finally, the kernel protocol stack takes the frame from the buffer and processes it layer by layer through the network stack

  • Hard interrupts: generated automatically by peripherals attached to the system (such as NICs and hard disks) to notify the operating system of a change in device state. For example, the NIC raises a hard interrupt when it receives a packet
  • Soft interrupts: to keep interrupt handling fast (a real-time requirement), Linux lets the hard interrupt handle only the work that can be done quickly, while a soft interrupt does the longer-running part of processing the event
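Both kinds of interrupts are observable under /proc (a sketch assuming a Linux host); for the network path, the NET_RX and NET_TX soft-interrupt rows are the ones to watch:

```shell
#!/bin/sh
# Per-CPU soft-interrupt counts; a rapidly rising NET_RX row means heavy
# receive-side network processing.
grep -E 'NET_RX|NET_TX' /proc/softirqs
# Per-CPU hard-interrupt counts, e.g. the NIC's IRQ line.
head -n 3 /proc/interrupts
```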

Network I/O metrics

  • Bandwidth: the maximum transmission rate of a link, usually in b/s (bits per second)
  • Throughput: the amount of data successfully transferred per unit of time, usually in b/s (bits per second) or B/s (bytes per second). Throughput is limited by bandwidth, and throughput divided by bandwidth gives the network utilization
  • Latency: the delay from sending a network request until the remote response is received. Its exact meaning varies with the scenario: it can be the time needed to establish a connection (such as TCP handshake latency) or the round-trip time of a packet (such as RTT)
  • PPS (Packets Per Second): the transmission rate measured in network packets. PPS is usually used to evaluate forwarding capability; a hardware switch can usually achieve line-rate forwarding (PPS at or near the theoretical maximum), while Linux-server-based forwarding is easily affected by packet size
  • Network connectivity
  • Number of concurrent (TCP) connections
  • Packet loss rate (percentage of packets lost)

Viewing network I/O metrics

  • Viewing Network Configurations
# ifconfig em1
em1       Link encap:Ethernet  HWaddr 80:18:44:EB:18:99
          inet addr:192.168.0.44  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::8218:44ff:feeb:1898/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3098067963 errors:0 dropped:5379363 overruns:0 frame:0
          TX packets:2804983784 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1661766458875 (1584783.9 Mb)  TX bytes:1356093926505 (1293271.9 Mb)
          Interrupt:83
------ metric notes ------
If errors, dropped, overruns, carrier, or collisions in the TX/RX sections are non-zero, network I/O problems are occurring:
errors:     packets with errors, such as checksum or frame-synchronization errors
dropped:    discarded packets, i.e. packets that reached the ring buffer but were dropped for lack of memory
overruns:   packets exceeding capacity, i.e. network I/O arriving faster than the ring buffer can drain, causing drops
carrier:    packets with carrier errors, e.g. duplex mismatch or physical cable problems
collisions: collision packets
  • Network throughput and PPS
# sar -n DEV 1
Linux 4.4.73-5-default (ceshi44)   03/31/2022   _x86_64_   (40 CPU)

15:39:40   IFACE  rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
15:39:41     em1  1241.00  1022.00  600.48  590.39     0.00     0.00    165.00     0.49
15:39:41      lo   636.00   636.00    0.00    0.00     0.00     0.00      0.00     0.00
15:39:41     em2    26.00    20.00    6.63    8.80     0.00     0.00      0.00     0.01
------ metric notes ------
rxpck/s, txpck/s: received and transmitted PPS (packets per second)
rxkB/s, txkB/s:   received and transmitted throughput (KB/s)
rxcmp/s, txcmp/s: received and transmitted compressed packets per second
  • Bandwidth
# ethtool em1 | grep Speed 
Speed: 1000Mb/s
  • Connectivity and latency
# ping www.baidu.com
PING www.a.shifen.com (14.215.177.38) 56(84) bytes of data.
64 bytes from 14.215.177.38: icmp_seq=1 ttl=56 time=53.9 ms
64 bytes from 14.215.177.38: icmp_seq=2 ttl=56 time=52.3 ms
64 bytes from 14.215.177.38: icmp_seq=3 ttl=56 time=53.8 ms
64 bytes from 14.215.177.38: icmp_seq=4 ttl=56 time=56.0 ms
  • TCP connection statistics: ss and netstat
# ss -ant | awk '{++S[$1]} END {for(a in S) print a, S[a]}'
LISTEN 96
CLOSE-WAIT 527
ESTAB 8520
State 1
SYN-SENT 2
TIME-WAIT 660

# netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
CLOSE_WAIT 530
ESTABLISHED 8511
FIN_WAIT2 3
TIME_WAIT 809

Network requests are slow: how do I tune them

  • Under high concurrency, TCP requests surge and a large number of connections pile up in the TIME_WAIT state, occupying substantial memory and port resources. The kernel options related to the TIME_WAIT state can then be tuned
    • Increase the number of connections allowed in TIME_WAIT state (net.ipv4.tcp_max_tw_buckets) and the size of the connection-tracking table (net.netfilter.nf_conntrack_max)
    • Reduce net.ipv4.tcp_fin_timeout and net.netfilter.nf_conntrack_tcp_timeout_time_wait so the resources they hold are released sooner
    • Enable port reuse with net.ipv4.tcp_tw_reuse, so that ports occupied by TIME_WAIT connections can also be used for new connections
    • Widen the local port range (net.ipv4.ip_local_port_range) to support more connections and improve overall concurrency
    • Raise the maximum number of file descriptors: fs.nr_open and fs.file-max raise the per-process and system-wide maximums, respectively
  • SYN flood attacks on TCP cause performance problems; the kernel options related to SYN state can be optimized
    • Increase the maximum number of TCP half-open connections (net.ipv4.tcp_max_syn_backlog), or enable TCP SYN cookies (net.ipv4.tcp_syncookies) to work around the half-open connection limit
    • Reduce the number of SYN+ACK retransmissions for connections in SYN_RECV state (net.ipv4.tcp_synack_retries)
  • To reclaim idle TCP long connections faster, optimize the keepalive-related kernel options
    • Shorten the idle time between the last data packet and the first keepalive probe (net.ipv4.tcp_keepalive_time)
    • Shorten the interval between keepalive probes (net.ipv4.tcp_keepalive_intvl)
    • Reduce the number of failed keepalive probes before the connection is declared dead and the application notified (net.ipv4.tcp_keepalive_probes)
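Collected in one place, the options above might look like the following /etc/sysctl.conf fragment (every value here is illustrative and workload-dependent; apply with sysctl -p and measure before and after):

```shell
# /etc/sysctl.conf fragment -- illustrative values only
# TIME_WAIT pressure
net.ipv4.tcp_max_tw_buckets = 1048576
net.netfilter.nf_conntrack_max = 1048576
net.ipv4.tcp_fin_timeout = 15
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10000 65535
fs.nr_open = 1048576
fs.file-max = 1048576
# SYN flood resistance
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
# Faster keepalive reclamation of idle long connections
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
```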

Java application memory leaks and frequent GC

Distinguish between memory overflow, memory leak, and memory escape

  • Memory leak: allocated memory that can never be reclaimed, wasting memory space
  • Memory overflow: there is not enough memory available when an allocation is requested
    • 1 - the memory limit is set too small
    • 2 - too much data is loaded into memory
    • 3 - memory is allocated repeatedly without being collected, i.e. a memory leak
  • Memory escape: data that should be allocated on the stack ends up allocated on the heap at runtime
    • Java objects are normally allocated on the heap, and garbage collection reclaims those no longer in use, but identifying reclaimable objects, collecting them, and compacting memory all take time. If escape analysis can prove an object never escapes its method, the object can be allocated on the stack instead, and its memory is destroyed when the stack frame pops, reducing the garbage collector's burden
    • Thread synchronization is itself time-consuming. If escape analysis can prove a variable cannot escape its thread and be accessed by other threads, there is no contention on reads and writes of that variable, and its synchronization can be eliminated
    • Primitive types in the Java virtual machine (int, long, references, and so on) cannot be decomposed further; they are called scalars. Data that can be decomposed further is called an aggregate, and the most typical aggregate in Java is an object. If escape analysis proves an object is never accessed outside the method and can be decomposed, the program may not create the object at runtime at all, instead creating only the member variables the method actually uses. The decomposed variables can then be analyzed and optimized individually and allocated in stack frames or registers, with no need to allocate space for the original object as a whole
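The three optimizations above (stack allocation, lock elimination, scalar replacement) hinge on HotSpot's escape analysis, which can be toggled with standard VM flags; a sketch for inspection (all three are enabled by default in modern HotSpot):

```shell
# DoEscapeAnalysis:     determine whether objects escape their method/thread
# EliminateLocks:       remove synchronization on objects proven thread-local
# EliminateAllocations: scalar replacement of non-escaping, decomposable objects
java -XX:+DoEscapeAnalysis -XX:+EliminateLocks -XX:+EliminateAllocations \
     -XX:+PrintFlagsFinal -version | grep -E 'DoEscapeAnalysis|EliminateLocks|EliminateAllocations'
```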

How can I locate and handle memory leaks

  • Use jmap -histo:live [pid] or jmap -dump:format=b,file=filename [pid]: the former summarizes the size and count of heap objects, while the latter dumps the whole heap
  • Specify -XX:+HeapDumpOnOutOfMemoryError in the startup parameters to save a dump file when an OOM occurs
  • Open the heap dump in JProfiler or MAT to inspect heap objects and find the leaking ones more intuitively

Java thread troubleshooting

Java thread state

  • NEW: the thread has not been started yet (the newly created state)
  • RUNNABLE: a combination of the ready and running states
  • BLOCKED, WAITING, and TIMED_WAITING: all three are forms of blocking
    • wait and join without a timeout put the thread in WAITING
    • sleep, and wait/join with a timeout set, put the thread in TIMED_WAITING
    • Contending for a monitor lock (entering a synchronized block) puts the thread in BLOCKED; note that a thread blocked on an I/O stream is still reported as RUNNABLE
  • TERMINATED: the thread has finished (the dead state)

The thread is deadlocked or blocked

  • jstack -l <pid> | grep -iE 'blocked|deadlock'; with the -l option, jstack quickly prints the code responsible for the deadlock
# jstack -l 28764
Full thread dump Java HotSpot(TM) 64-Bit Server VM (13.0.2+8, mixed mode, sharing):
...
"Thread-0" #14 prio=5 os_prio=0 cpu=0.00ms elapsed=598.37s tid=0x000001b3c25f7000 nid=0x4abc waiting for monitor entry  [0x00000061661fe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.Test$DieLock.run(Test.java:52)
        - waiting to lock <0x0000000712d7c230> (a java.lang.Object)
        - locked <0x0000000712d7c220> (a java.lang.Object)
        at java.lang.Thread.run(java.base@13.0.2/Thread.java:830)

   Locked ownable synchronizers:
        - None

"Thread-1" #15 prio=5 os_prio=0 cpu=0.00ms elapsed=598.37s tid=0x000001b3c25f8000 nid=0x1984 waiting for monitor entry  [0x00000061662ff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.Test$DieLock.run(Test.java:63)
        - waiting to lock <0x0000000712d7c220> (a java.lang.Object)
        - locked <0x0000000712d7c230> (a java.lang.Object)
        at java.lang.Thread.run(java.base@13.0.2/Thread.java:830)
...
Found one Java-level deadlock:
=============================
"Thread-0":
  waiting to lock monitor 0x000001b3c1e4c480 (object 0x0000000712d7c230, a java.lang.Object),
  which is held by "Thread-1"
"Thread-1":
  waiting to lock monitor 0x000001b3c1e4c080 (object 0x0000000712d7c220, a java.lang.Object),
  which is held by "Thread-0"

Java stack information for the threads listed above:
===================================================
"Thread-0":
        at com.Test$DieLock.run(Test.java:52)
        - waiting to lock <0x0000000712d7c230> (a java.lang.Object)
        - locked <0x0000000712d7c220> (a java.lang.Object)
        at java.lang.Thread.run(java.base@13.0.2/Thread.java:830)
"Thread-1":
        at com.Test$DieLock.run(Test.java:63)
        - waiting to lock <0x0000000712d7c220> (a java.lang.Object)
        - locked <0x0000000712d7c230> (a java.lang.Object)
        at java.lang.Thread.run(java.base@13.0.2/Thread.java:830)

Found 1 deadlock.
  • The jstack output shows the threads blocked at Test.java:52 and Test.java:63, and reports that a deadlock occurred

Common JVM tuning startup parameters

  • -verbose:gc prints information about each GC
  • -verbose:jni prints information about native method calls; generally used to diagnose JNI invocation errors
  • -Xms<n> sets the initial JVM heap size. The default is 1/64 of physical memory, with a 1 MB minimum. Units such as k or m can be given; without a unit, the value is in bytes
  • -Xmx<n> sets the maximum JVM heap size. The default is 1/4 of physical memory or 1 GB, with a 2 MB minimum. Units are the same as for -Xms
  • -Xss<n> sets the stack size of a single thread, 512 KB by default
  • -XX:NewRatio=4 sets the ratio of the young generation (Eden plus the two Survivor spaces) to the old generation (excluding the permanent generation). With 4, young:old is 1:4 and the young generation takes 1/5 of the heap
  • -Xmn sets the size of the young generation. Total heap = young generation + old generation + permanent generation. The permanent generation has a fixed default of 64 MB, so enlarging the young generation shrinks the old generation. This value significantly affects system performance; Sun officially recommends setting it to 3/8 of the heap
  • -XX:SurvivorRatio=4 sets the size ratio of Eden to a Survivor space in the young generation. With 4, the ratio of the two Survivor spaces to Eden is 2:4, so one Survivor space takes 1/6 of the young generation
  • -XX:MaxTenuringThreshold=0 sets the maximum tenuring age. With 0, young-generation objects skip the Survivor spaces and go straight to the old generation, which can improve efficiency for applications with many long-lived objects. A larger value makes objects be copied between the Survivor spaces multiple times, extending their lifetime in the young generation and increasing the chance that they are collected there
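Put together, a startup command line using these flags might look like this sketch (the heap sizes and paths are illustrative, and the right values depend entirely on the workload):

```shell
# Illustrative JVM startup line -- sizes and paths are examples only:
#   -Xms/-Xmx equal                  -> fixed heap size, avoids resize pauses
#   -Xmn                             -> young generation size
#   -XX:SurvivorRatio=8              -> Eden : one Survivor = 8 : 1
#   -XX:+HeapDumpOnOutOfMemoryError  -> dump the heap when an OOM occurs
java -Xms4g -Xmx4g -Xss512k -Xmn1536m \
     -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=10 \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/app.hprof \
     -verbose:gc \
     -jar app.jar
```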

Corrections are welcome

Reference articles

  • Linux Performance Optimization Practice (Ni Pengfei)
  • JAVA online troubleshooting routine, from CPU, disk, memory, network to GC one-stop!
  • Java application online troubleshooting ideas, tools summary
  • Double eleven pressure test & Summary of Java application performance problems
  • Learning so much from one online JVM Young GC tuning!