A friendly note: the GIFs have been compressed, so readers on mobile data can view them without worry. There is not much to say about the CPU itself, but let's learn some commands. This is the second article in the "Cast Away" series, and it takes a vertical look at the CPU. See the previous article:

Linux "Cast Away" (1): preparation

How to make a CPU

A CPU is a kind of chip. Taking the Hanxin ("Han Core") chip as an example, here are the seven steps of production.

  • Purify the silicon to eleven nines (99.999999999%)
  • Produce the wafer
  • Use a lithography machine to process the wafer
  • Use an etching machine to cut the grooves
  • Complete the production of the P-type semiconductor
  • Use coarse 200-grit sandpaper to sand off the original logo
  • Paint on the new logo. Bingo, done!

Small as a CPU is, the equipment that produces it is anything but simple. The photolithography machine shown below weighs more than ten tons and covers an area of hundreds of square meters.

Find the thread with the highest CPU usage

Let's look at a practical example. The company was on a tight budget, so several Java applications shared one machine. Then one day the CPU usage shot through the roof, and we had to find out who was causing it. Finding the process is not enough; it is the thread that gets us closest to the truth.

The traditional way

The usual practice is:

  • Run top on the command line, then press Shift+P to sort by CPU usage, and note down the ID of the process with the highest CPU usage
  • Run top -Hp <process ID> to view the threads of that process with the highest CPU usage
  • Use printf 0x%x <thread ID> to get the thread ID in hexadecimal
  • Use jstack <process ID> to get the Java thread stacks, then grep for the hexadecimal thread ID

Here is a screen recording of the steps first.

Pulling up the radish brings up the mud

But I want to do the same thing in a different (and more roundabout) way, so that we meet a few other commonly used commands along the way.

ps -eo %cpu,pid |sort -n -k1 -r | head -n 1 |  awk '{print $2}' |xargs  top -b -n1 -Hp | grep COMMAND -A1 | tail -n 1 | awk '{print $1}' | xargs printf 0x%x

This one-liner finds the hexadecimal ID of the busiest thread in the process that uses the most CPU. That's a long command, isn't it? Don't be afraid; we'll take it apart piece by piece. Generally speaking, once you have practiced these few commands, you can handle about 50% of everyday tasks.
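To make the long pipeline easier to read, here is a rough sketch that splits it into three small stages. The function names (busiest_pid, busiest_tid, to_hex) are made up for illustration, and the parsing assumes the usual column layout of ps and of top in batch mode, which can differ between versions.

```shell
#!/bin/sh
# Stage 1: read `ps -eo %cpu,pid` output on stdin and print the PID
# with the highest %CPU. The header line is skipped.
busiest_pid() {
    awk 'NR > 1 && (pid == "" || $1 + 0 > max) { max = $1 + 0; pid = $2 }
         END { print pid }'
}

# Stage 2: read `top -b -n1 -Hp PID` output on stdin and print the first
# thread ID listed after the header row (top sorts by CPU by default).
busiest_tid() {
    awk 'seen { print $1; exit } $NF == "COMMAND" { seen = 1 }'
}

# Stage 3: convert a decimal thread ID to the hex form that jstack prints.
to_hex() {
    printf '0x%x\n' "$1"
}

# Putting it together on a live system (not run here):
#   pid=$(ps -eo %cpu,pid | busiest_pid)
#   tid=$(top -b -n1 -Hp "$pid" | busiest_tid)
#   to_hex "$tid"
```

Each stage reads plain text, so it can be tried on saved output before being pointed at a production machine.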

Commands for viewing the CPU

top

As the command above suggests, top and ps overlap a great deal and differ mainly in how they present the data, so we take top as the example. Press 1 to display CPU usage per core. The field names are mostly abbreviations that you won't forget after reading them a few times. For example:

us ==> user CPU time

  • If the load exceeds the number of CPU cores, the load is too high
  • If wa is too high, suspect an I/O problem
  • If any of sy, si, hi, or st is above 5%, something is wrong
  • Processes that stay in the D, Z, or T state for a long time deserve attention
  • If CPU usage is unevenly spread across cores, check affinity and priority
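The 5% rule of thumb above can be checked mechanically. The sketch below is only an illustration: flag_cpu_fields is a made-up helper, and it assumes the newer top summary format ("%Cpu(s):  3.1 us,  7.2 sy, ...") with a dot as the decimal separator. Older versions of top print the line differently.

```shell
#!/bin/sh
# Read top output on stdin and print every one of sy, si, hi, st whose
# value is above the limit (default 5, taken from the rule of thumb).
flag_cpu_fields() {
    awk -v limit="${1:-5}" '
        /Cpu\(s\)/ {
            sub(/^[^:]*:/, "")              # drop the "%Cpu(s):" prefix
            n = split($0, parts, ",")
            for (i = 1; i <= n; i++) {
                split(parts[i], kv, " ")    # kv[1] = value, kv[2] = name
                if (kv[2] ~ /^(sy|si|hi|st)$/ && kv[1] + 0 > limit)
                    print kv[2], kv[1]
            }
        }'
}

# On a live system (not run here):
#   top -b -n1 | flag_cpu_fields 5
```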

vmstat

vmstat presents some of the same information in a different form, as shown in the figure:

  • b: the number of kernel threads placed on the wait queue (waiting for resources or for input/output). A consistently large number means many tasks are blocked, which usually points at I/O rather than at the CPU itself.
  • cs: if context switches are frequent, consider whether too many threads have been started
  • si/so: swap-in and swap-out activity on the swap partition, which sometimes causes CPU problems
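If you only care about these columns, a small filter can pull them out by name instead of by position. This is a sketch under the assumption that the vmstat header row starts with "r  b" as it does in procps; vmstat_watch is a hypothetical helper name.

```shell
#!/bin/sh
# Read vmstat output on stdin and print only the b, cs, si and so columns.
# Columns are located by name from the header row, not by fixed position.
vmstat_watch() {
    awk '
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("cs" in col) && $1 ~ /^[0-9]+$/ {
            print "b=" $(col["b"]), "cs=" $(col["cs"]), "si=" $(col["si"]), "so=" $(col["so"])
        }'
}

# On a live system (not run here), sample once per second, five times:
#   vmstat 1 5 | vmstat_watch
```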

sar

sar is one of the most comprehensive system performance analysis tools available on Linux today, but it may not be pre-installed. Run the following command on CentOS to install it.

yum install sysstat -y

The main benefits of sar are that it keeps history, presents the data in a friendly way, and produces results that are easy to post-process. sar also has a graphical tool, kSar. Running sar -A prints all the data.

https://github.com/vlsi/ksar

In terms of CPU, we focus on:

  • sar -u: the default report, overall CPU utilization
  • sar -P ALL: the usage of each CPU
  • sar -q: the length of the run queue; runq-sz greater than the CPU count indicates a bottleneck
  • sar -w: context switches per second

Those few points are all we focus on. mpstat and pidstat, as well as the colorful dstat, do much the same job; being familiar with one of them is enough.

Where the data comes from

So where does the data come from? The /proc directory is a virtual directory that exposes a set of special files describing the running kernel. You can not only view state there, but even modify some of the values to change the behavior of the system.

Take the load shown by top (the uptime command gives the same result): it is read from the /proc/loadavg file, while the per-core CPU figures are read from the /proc/stat file.

These commands simply parse the information under /proc and present it in a friendly way; the values themselves have already been calculated by the Linux kernel and are just lying there.
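To see how thin the layer between the tools and /proc is, here is a minimal sketch that reads the same two files directly. The helper names load1 and core_busy are made up. Note one deliberate simplification: core_busy reports the average since boot, whereas top computes its percentages from the difference between two samples.

```shell
#!/bin/sh
# 1-minute load average: the first field of /proc/loadavg (read from stdin).
load1() {
    awk '{ print $1; exit }'
}

# Busy percentage per core since boot, from /proc/stat lines on stdin.
# Fields after the name: user nice system idle iowait irq softirq steal ...
# Only user..steal (fields 2-9) are summed, since guest time is already
# included in user time.
core_busy() {
    awk '/^cpu[0-9]+ / {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        idle = $5 + $6                      # idle + iowait
        if (total > 0)
            printf "%s %.1f\n", $1, 100 * (total - idle) / total
    }'
}

# On a live system (not run here):
#   load1 < /proc/loadavg
#   core_busy < /proc/stat
```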

A few examples

This article belongs to the WeChat public account "little sister taste"; please credit the source when reprinting.

High CPU usage is a symptom. Apart from the system genuinely being overloaded, it is caused by something else, such as I/O or devices. Those causes will be discussed in other chapters.

High CPU caused by GC

Let's go back to our original example. Searching the jstack output for the hexadecimal thread ID shows that the culprit is a GC thread.

"VM Thread" prio=10 tid=0x00007f06d8089000 nid=0x58c7 runnable 
 
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f06d801b800 nid=0x58d7 runnable 
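The lookup itself is a one-line grep once the thread ID is in hexadecimal. A small sketch, with find_thread as a made-up helper name; it relies on the nid=0x... field that appears in the jstack lines above.

```shell
#!/bin/sh
# Read a jstack dump on stdin and print the header line of the thread
# whose native ID matches the given decimal thread ID (as shown by top -H).
find_thread() {
    nid=$(printf '0x%x' "$1")
    # The trailing space keeps 0x58d7 from also matching 0x58d70.
    grep -F "nid=$nid "
}

# On a live system (not run here), with the PID and thread ID from top:
#   jstack <pid> | find_thread 22743
```

Thread ID 22743 is 0x58d7 in hexadecimal, which is the nid of the ParallelGC thread in the dump above.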

In this case the JVM is running out of memory and GC is going crazy, either because sockets or threads were never closed, or because large objects are not being collected. Usually the only immediate remedy is a restart. Before restarting, use jmap to dump the heap. Of course, you may run into JDK version issues when doing so.

st% is too high

st (steal time) appears only on virtual machines; a high value means the physical CPU is short of resources. If the virtual machine you bought shows a consistently high st, your service provider may be overselling, with other tenants eating into your resources. Don't believe it? Take a look at your virtual machine during Double 11.

High CPU caused by the network adapter

The business team runs several Kafka instances. Overall CPU usage is at a normal level, only about 10%, but one core is under particularly heavy load and its si is extremely high.

Run the mpstat -I SUM -P ALL command to check the interrupts handled by each CPU.

20:15:18     CPU    intr/s
20:15:23     all  34234.20
20:15:23       0   9566.20
20:15:23       1      0.00
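With many cores, spotting the hot one by eye gets tedious. The following sketch picks the CPU with the highest intr/s from output shaped like the above; busiest_irq_cpu is a made-up helper, and it assumes the CPU number and the rate are the last two columns.

```shell
#!/bin/sh
# Read `mpstat -I SUM -P ALL` output on stdin and print the CPU number
# with the highest intr/s, followed by that rate. The "all" row and the
# header are skipped because their CPU column is not a number.
busiest_irq_cpu() {
    awk '$(NF-1) ~ /^[0-9]+$/ && (cpu == "" || $NF + 0 > max) {
             max = $NF + 0; cpu = $(NF-1); rate = $NF
         }
         END { if (cpu != "") print cpu, rate }'
}

# On a live system (not run here), one 5-second sample:
#   mpstat -I SUM -P ALL 5 1 | busiest_irq_cpu
```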

When a network adapter needs the CPU's attention, it raises an interrupt to tell the CPU what happened, and the CPU has to stop what it is doing to handle it. Here, by default, all interrupt handling was concentrated on CPU0, which overloaded the server: CPU0 became the bottleneck while the other CPUs sat idle. There are two ways out. The first is to set affinity so that Kafka stays off the CPU the network adapter uses; interrupt affinity is configured through /proc/irq/{seq}/smp_affinity. The second is to install irqbalance directly and start it.

yum install irqbalance -y 
service irqbalance start
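If you take the manual route instead, the value written to /proc/irq/{seq}/smp_affinity is a hexadecimal bitmask in which bit N stands for CPU N. A minimal sketch of computing that mask follows; cpu_mask is a made-up helper and it only covers CPU indexes below 32.

```shell
#!/bin/sh
# Print the hexadecimal affinity mask that selects a single CPU.
# CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, CPU 3 -> 8, ...
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Applying it needs root and a real IRQ number in place of {seq}
# (not run here). A running irqbalance may overwrite the setting.
#   cpu_mask 2 > /proc/irq/{seq}/smp_affinity
```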

Low CPU usage but high load

The CPU id% (idle) is high, say 90%, yet the load average is very high, for example 10 on a 4-core machine.

Analysis: a high load average means tasks are queuing and many of them are waiting. In this case there may be a large number of uninterruptible processes.

Use top or ps to check the state of the processes.

ps aux 

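Linux process states (the STAT column of ps):

  • R: running or runnable
  • S: interruptible sleep
  • D: uninterruptible sleep (usually I/O)
  • T: stopped
  • Z: zombie
  • X: dead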
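A quick way to see whether many processes are stuck in one state is to count them. The sketch below does that from the STAT column; count_states is a made-up helper, and only the first character of each state is used, so modifiers such as "s", "+", or "<" are ignored.

```shell
#!/bin/sh
# Read `ps -eo stat` output on stdin (header included) and print how many
# processes are in each state, keyed by the first letter of STAT.
count_states() {
    awk 'NR > 1 { n[substr($1, 1, 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On a live system (not run here):
#   ps -eo stat | count_states
```

A large D count next to an idle CPU matches the "low usage, high load" picture described above.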

A frequently asked question: load

What load stands for

In plain English, load describes how long the queue of processes on your system is. Picture a road that holds 8 cars:

  • With only 4 cars on the road, the load is about 0.5
  • With 8 cars on the road, which can just pass through bumper to bumper, the load is about 1
  • With 12 cars, 8 are on the road and another 4 wait outside at the intersection; capacity is exceeded and cars have to queue, so the load is about 1.5

Load equals 1

There is still a lot of misunderstanding on this point. Many people think that once the load reaches 1 the system has hit a bottleneck, but that is not quite right. What a load value means depends on the number of CPU cores:

  • A single-core CPU at 100% gives a load of about 1
  • A dual-core CPU with both cores at 100% gives a load of about 2
  • A quad-core CPU with all four cores at 100% gives a load of about 4

So a machine with a load of 10 but 16 cores is still some way from its limit.
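The comparison is simple enough to script. A small sketch follows; load_exceeds_cores is a made-up helper name, and awk does the comparison because load averages are decimals, which plain shell arithmetic cannot handle.

```shell
#!/bin/sh
# Succeed (exit status 0) when the load is greater than the core count,
# that is, when tasks have to queue.
load_exceeds_cores() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 > cores + 0) }'
}

# The two machines from the text:
#   load_exceeds_cores 10 4    succeeds: 10 on 4 cores is overloaded
#   load_exceeds_cores 10 16   fails: 10 on 16 cores still has headroom
#
# On a live system (not run here):
#   load_exceeds_cores "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "tasks are queuing"
```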

Closing

This article contains little real troubleshooting, because CPU problems usually come together with other problems. But the commands presented here are not simple, especially given their rich set of options. You can read about those options with man. For example:

man top

Of course, you can do that