This article is an entry in the "30 Years of Linux" themed essay event.

CPU usage is the most intuitive and commonly used indicator of system performance, and the first indicator we usually focus on when troubleshooting performance problems. So we need to be familiar with what it means.

What is CPU usage

CPU usage is the percentage of a unit of time during which the CPU is busy. As a multitasking operating system, Linux divides the time of each CPU into very short time slices, which the scheduler allocates to each task in turn, creating the illusion that many tasks run at the same time.

To keep track of CPU time, Linux triggers timer interrupts at a predefined tick rate and uses them to collect usage statistics.

Key terms

Tick rate (HZ): a kernel configuration option that can be set to 100, 250, 1000, and so on (that is, 100, 250, or 1000 timer interrupts per second). Different systems may use different values, and you can check the setting in the kernel configuration file under /boot. Linux triggers a timer interrupt at this predefined tick rate (written as HZ in the kernel) and records the number of ticks since boot in the global variable jiffies, which increases by 1 on every timer interrupt.

Check the kernel tick rate:

$ grep 'CONFIG_HZ=' /boot/config-$(uname -r)

USER_HZ (user-space tick rate): because HZ is a kernel option, user-space programs cannot access it directly. The kernel therefore exposes a user-space tick rate, USER_HZ, which is fixed at 100, so one tick is 1/100 of a second. This way, a user-space program does not need to care what HZ is set to in the kernel, because it always sees a fixed value of 100.
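As a small illustration of the point above, a user-space program can ask the C library for the tick rate instead of hard-coding it. The sketch below uses Python's os.sysconf("SC_CLK_TCK"). The helper names user_hz and ticks_to_seconds are made up for this example, and the fallback of 100 is an assumption for systems where sysconf is not available.

```python
import os


def user_hz(default=100):
    """Return the user-space tick rate reported by the C library.

    Falls back to `default` where sysconf is unavailable (an assumption
    made so the sketch also runs on non-Unix systems).
    """
    try:
        return os.sysconf("SC_CLK_TCK")
    except (AttributeError, ValueError, OSError):
        return default


def ticks_to_seconds(ticks, hz):
    """Convert a tick count expressed in units of 1/hz seconds to seconds."""
    return ticks / hz


if __name__ == "__main__":
    hz = user_hz()
    print(f"USER_HZ = {hz}")
    print(f"250 ticks = {ticks_to_seconds(250, hz)} s")
```

With a tick rate of 100, 250 ticks correspond to 2.5 seconds of CPU time.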

Important indicators related to CPU usage

# /proc/stat holds statistics about CPUs and tasks in the system; keep only the CPU lines
$ cat /proc/stat | grep ^cpu

The first column is the CPU number. The first row, labeled cpu with no number, is the sum over all CPUs.

The other columns show the cumulative number of ticks the CPU has spent in different states, in units of USER_HZ (10 ms, or 1/100 of a second), so they are effectively the CPU time spent in each state. You can learn more about the meaning of each column in the documentation (man proc).

  1. user (often abbreviated us) is user-mode CPU time. Note that it does not include the nice time below, but it does include guest time.
  2. nice (often abbreviated ni) is low-priority user-mode CPU time, that is, time spent in processes whose nice value has been adjusted to between 1 and 19. Note that nice values range from -20 to 19, and the higher the value, the lower the priority.
  3. system (often abbreviated sy) is kernel-mode CPU time.
  4. idle (often abbreviated id) is idle time. Note that it does not include time spent waiting for I/O (iowait).
  5. iowait (often abbreviated wa) is CPU time spent waiting for I/O.
  6. irq (often abbreviated hi) is CPU time spent handling hard interrupts.
  7. softirq (often abbreviated si) is CPU time spent handling soft interrupts.
  8. steal (often abbreviated st) is CPU time taken by other virtual machines while the system itself is running in a virtual machine.
  9. guest (usually written as guest) is time spent running other operating systems through virtualization, that is, CPU time spent running virtual machines.
  10. guest_nice (gnice for short) is time spent running virtual machines at low priority.
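To make the column layout concrete, here is a minimal sketch that splits one cpu line into named fields. The function name parse_cpu_line and the sample line are made up for illustration, and the numbers are not real measurements.

```python
# Column names in the order they appear on a "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Split one "cpu..." line into (label, {field: ticks})."""
    label, *values = line.split()
    # zip stops at the shorter sequence, so lines with fewer
    # columns still parse; the missing fields are simply absent.
    ticks = dict(zip(FIELDS, map(int, values)))
    return label, ticks


# Made-up sample line for illustration only.
sample = "cpu0 4705 150 1120 16250 520 30 45 0 200 10"
label, ticks = parse_cpu_line(sample)
print(label, ticks["user"], ticks["idle"])
```

Each value is a tick count in USER_HZ units, so dividing by 100 gives seconds.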

CPU usage calculation method
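A commonly used calculation, given here as an assumption rather than as the author's exact formula, takes two samples of the /proc/stat counters and computes usage = 1 - (idle delta / total delta) over the interval. The sketch below works on plain lists of tick counts. The function name cpu_usage and the sample values are invented for illustration. It treats only the idle column as idle time; some tools also count iowait as idle.

```python
def cpu_usage(prev, curr):
    """Return average CPU usage (0.0 to 1.0) between two samples.

    Each sample holds the first eight /proc/stat columns in order:
    user, nice, system, idle, iowait, irq, softirq, steal.
    guest and guest_nice are left out because they are already
    counted in user and nice.
    """
    idle_delta = curr[3] - prev[3]
    total_delta = sum(curr) - sum(prev)
    if total_delta <= 0:
        return 0.0  # no ticks elapsed between the two samples
    return 1.0 - idle_delta / total_delta


# Hypothetical samples taken some interval apart (values invented).
prev = [1000, 0, 500, 8000, 100, 0, 0, 0]
curr = [1300, 0, 600, 8500, 200, 0, 0, 0]
print(f"{cpu_usage(prev, curr):.1%}")
```

In practice the two samples would come from reading the aggregate cpu line twice, a few seconds apart, which gives the average usage over that interval rather than since boot.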

How to check CPU usage

To compute CPU usage yourself, you would first read the /proc/stat and /proc/[pid]/stat files and then apply the calculation from the previous section. In practice, the various performance analysis tools already do this math for us.
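For the per-process file, a rough sketch of the same idea: man proc documents utime as field 14 and stime as field 15 of /proc/[pid]/stat, both in clock ticks. The helper names, the sample line, and the default tick rate of 100 below are assumptions made for this illustration.

```python
def parse_pid_stat(text):
    """Return (utime, stime) in clock ticks from /proc/[pid]/stat content.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so split after the LAST ')'.
    """
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N sits at rest[N - 3].
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    return utime, stime


def process_cpu_percent(prev, curr, interval_s, hz=100):
    """CPU usage of one process between two (utime, stime) samples,
    as a percentage of a single CPU. hz is the assumed USER_HZ."""
    used_ticks = (curr[0] + curr[1]) - (prev[0] + prev[1])
    return 100.0 * used_ticks / (hz * interval_s)


# Made-up, truncated stat line for illustration only.
sample = "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 1 0"
print(parse_pid_stat(sample))
```

A result above 100 simply means a multithreaded process kept more than one CPU busy during the interval.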

top

top shows the overall CPU and memory usage of the system, as well as the resource usage of each process. By default, top displays the average across all CPUs; press the 1 key to switch to per-CPU usage.

ps

ps shows only the resource usage of each process.

pidstat

pidstat breaks down the CPU usage of each process in detail:

  • User-mode CPU usage (%usr);
  • Kernel-mode CPU usage (%system);
  • CPU usage spent running a virtual machine (%guest);
  • CPU usage spent waiting to run on a CPU (%wait);
  • And total CPU usage (%CPU).

What can I do if the CPU usage is too high?

GDB

GDB (the GNU Project Debugger) is a powerful program debugger, but it is not suitable for the early stages of performance analysis. Debugging with GDB interrupts program execution, which is often not acceptable in a production environment. GDB is therefore better suited to the later stages of performance analysis: once you have roughly identified the problematic function, you can use it offline to debug the internals of that function.

perf

perf is a performance analysis tool built into Linux since version 2.6.31. Based on sampling of performance events, it can be used not only to analyze system events and kernel performance, but also to analyze performance problems in specific applications.

perf top can be used to find hot functions, because it displays in real time the functions or instructions that consume the most CPU cycles.

The first line of the output contains three values: the number of samples (Samples), the event type (event), and the total event count (Event count). The columns below it are:

  1. Overhead: the percentage of all samples in which this symbol's performance events occur.
  2. Shared Object: the dynamic shared object where the function or instruction resides, such as the kernel, a process name, a dynamic link library name, or a kernel module name.
  3. The type marker printed before the symbol: for example, [.] stands for a user-space executable or dynamically linked library, while [k] stands for kernel space.
  4. Symbol: the function name. When the name is unknown, a hexadecimal address is shown instead.

perf record saves the sampled data, and the saved data must then be parsed and displayed with perf report. Adding the -g option enables sampling of call relationships, which makes it easier to analyze performance along the call chain.

Conclusion

  • If user CPU and nice CPU are high, user-mode processes are consuming a lot of CPU, so focus on the performance of those processes.
  • If system CPU is high, kernel mode is consuming a lot of CPU, so focus on the performance of kernel threads or system calls.
  • If iowait is high, the CPU is spending a long time waiting for I/O, so check whether the storage system has an I/O problem.
  • If soft and hard interrupt time is high, interrupt handlers are consuming a lot of CPU, so focus on the interrupt service routines in the kernel.
  • When CPU usage rises, you can use tools such as top and pidstat to identify the source of the problem, and then use tools such as perf to pinpoint the specific functions that are causing it.
