
Load average

You can check the system load by running the top or uptime command, as shown in the figure:

What each field of the output means:

The first line shows the current time, how long the system has been running, and the number of logged-in users.

Load average: the average load over the past 1 minute, 5 minutes, and 15 minutes, respectively.
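On Linux, the kernel exposes these three values in /proc/loadavg. The sketch below is a minimal example of parsing that file; the function name parse_loadavg is hypothetical. The standard library's os.getloadavg() returns the same three averages on Unix systems.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg (Linux).

    Format: "<1min> <5min> <15min> <runnable>/<total> <last_pid>"
    """
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "runnable": runnable,  # scheduling entities currently runnable
        "total": total,        # total scheduling entities on the system
        "last_pid": int(fields[4]),
    }


if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            print(parse_loadavg(f.read()))
    # Portable alternative on Unix: only the 1-, 5- and 15-minute averages.
    print(os.getloadavg())
```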

CPU utilization

CPU utilization measures how busy the CPU is per unit of time. It does not map directly to the load average. For example:

  • CPU-intensive processes: heavy CPU use raises the load average, so the two rise together;

  • I/O-intensive processes: waiting for I/O also raises the load average, but CPU utilization is not necessarily high;

  • A large number of processes waiting for the CPU raises both the load average and CPU utilization.

Meaning of load average

The load average is the average number of processes in the runnable state and in the uninterruptible state per unit of time. It includes not only processes that are using the CPU, but also processes waiting for the CPU and processes waiting for I/O. In other words, it is the average number of active processes, and it is not directly related to CPU utilization.

  • Runnable processes are processes that are using the CPU or waiting for it. In ps output they are in the R state (Running or Runnable).

  • Uninterruptible processes are in a critical kernel-mode code path and cannot be interrupted. The most common case is a process waiting for an I/O response from a hardware device, which ps shows as the D state (Uninterruptible Sleep, also known as Disk Sleep). The uninterruptible state is in fact a mechanism the system uses to protect processes and hardware devices.
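To see how many processes are "active" in this sense right now, you can count R and D states the same way ps does, by reading the state field from /proc/<pid>/stat. This is a rough sketch under stated assumptions: it gives an instantaneous snapshot rather than an average, and it only reads process-level entries while the kernel actually counts individual threads. The helper names are hypothetical.

```python
import os


def state_from_stat(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so split after the *last* ')' rather than on whitespace.
    """
    rest = stat_line[stat_line.rfind(")") + 2:]
    return rest[0]


def count_active(states):
    """Count tasks in R (runnable) and D (uninterruptible) states."""
    running = sum(1 for s in states if s == "R")
    blocked = sum(1 for s in states if s == "D")
    return running, blocked


def current_states():
    """Snapshot the state of every process listed under /proc (Linux only)."""
    states = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                states.append(state_from_stat(f.read()))
        except OSError:
            pass  # the process exited between listdir() and open()
    return states


if __name__ == "__main__" and os.path.isdir("/proc"):
    r, d = count_active(current_states())
    print(f"R: {r}  D: {d}  active: {r + d}")
```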

What load average is appropriate?

Ideally, exactly one process is running on each CPU, so that every CPU is fully used. So when evaluating the load average, first find out how many CPUs the system has, which you can query with the following command:

# For grep and wc usage, see their man pages
grep 'model name' /proc/cpuinfo | wc -l
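The same count can be done in Python. This sketch mirrors the grep | wc -l pipeline on a /proc/cpuinfo text; on x86 Linux each logical CPU has its own "model name" line, but the field can differ on other architectures, so os.cpu_count() is shown as a fallback. The function name is hypothetical.

```python
import os


def count_cpus_from_cpuinfo(text):
    """Equivalent of: grep 'model name' /proc/cpuinfo | wc -l"""
    return sum(1 for line in text.splitlines() if "model name" in line)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("model name lines:", count_cpus_from_cpuinfo(f.read()))
    except OSError:
        pass  # not Linux, or /proc is unavailable
    # Logical CPUs as seen by Python (None if it cannot be determined).
    print("os.cpu_count():", os.cpu_count())
```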

What does a load average of 2 mean?

  • On a system with exactly 2 CPUs, all CPUs are just fully occupied.
  • On a 4-CPU system, the CPUs are 50% idle.
  • On a system with only 1 CPU, half of the processes cannot get a CPU.

When the load average is greater than the number of CPUs, the system is already overloaded.
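The three cases above come down to dividing the load average by the CPU count. A small sketch of that reasoning; like the bullets, it treats the whole load as CPU demand and ignores that some of it may be D-state tasks waiting for I/O. The function name is hypothetical.

```python
def describe_load(load, ncpus):
    """Interpret a load average relative to the number of CPUs."""
    per_cpu = load / ncpus
    if per_cpu > 1:
        return per_cpu, "overloaded: some tasks must wait for a CPU"
    if per_cpu == 1:
        return per_cpu, "every CPU is just fully occupied"
    return per_cpu, f"roughly {100 * (1 - per_cpu):.0f}% CPU capacity idle"


for n in (1, 2, 4):
    print(n, describe_load(2.0, n))
```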

Interpreting changes in the load average

If the 1-minute value is much smaller than the 15-minute value, the load has dropped over the last minute but was heavy over the past 15 minutes.

If the 1-minute value is much higher than the 15-minute value, the load has risen over the last minute. The rise may be temporary or may keep climbing, so keep watching it. Once the 1-minute load average approaches or exceeds the number of CPUs, the system is overloaded, and it is time to investigate the cause and find ways to optimize.
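Why does the 1-minute value react so much faster? Linux maintains the three figures as exponentially damped moving averages of the number of active (R plus D) tasks, updated about every 5 seconds. The sketch below reproduces that idea in floating point (the kernel uses fixed-point arithmetic), with made-up workloads: a task that becomes busy for 60 seconds, and a task that was busy for 15 minutes and then went idle for 2 minutes.

```python
import math

INTERVAL = 5.0                  # seconds between kernel updates (approximate)
WINDOWS = (60.0, 300.0, 900.0)  # 1-, 5- and 15-minute windows


def update(loads, active):
    """One damped-average step: load = load * e + active * (1 - e)."""
    new = []
    for load, window in zip(loads, WINDOWS):
        e = math.exp(-INTERVAL / window)
        new.append(load * e + active * (1 - e))
    return tuple(new)


def simulate(active_per_tick, start=(0.0, 0.0, 0.0)):
    """Feed a sequence of active-task counts, one per 5-second tick."""
    loads = start
    for active in active_per_tick:
        loads = update(loads, active)
    return loads


if __name__ == "__main__":
    rising = simulate([1] * 12)               # idle system, then 60 s busy
    print("rising:  %.2f %.2f %.2f" % rising)
    falling = simulate([1] * 180 + [0] * 24)  # 15 min busy, then 2 min idle
    print("falling: %.2f %.2f %.2f" % falling)
```

In the first case the 1-minute value is far above the 15-minute value; in the second it has already dropped well below it, matching the two patterns described above.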

In general, when the load average exceeds 70% of the number of CPUs, it is time to analyze and troubleshoot the high load.

However, 70% is not a hard rule. The recommended approach is to monitor the system's load average continuously and judge the trend from historical data. When the load rises significantly, for example when it doubles, analyze and investigate.
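The two rules of thumb can be combined into a simple check. This is a sketch assuming you already collect historical load samples; the function name, using the median as the baseline, and the default factors are illustrative choices, not a standard.

```python
from statistics import median


def needs_investigation(load1, ncpus, history, threshold=0.7, growth=2.0):
    """Return a list of reasons to investigate, or [] if none.

    threshold: fraction of the CPU count (the 70% rule of thumb)
    history:   past load samples used to build a baseline
    growth:    factor over the baseline that counts as significant (2 = doubling)
    """
    reasons = []
    if load1 > threshold * ncpus:
        reasons.append(f"load {load1:.2f} exceeds {threshold:.0%} of {ncpus} CPUs")
    if history:
        baseline = median(history)
        if baseline > 0 and load1 >= growth * baseline:
            reasons.append(f"load {load1:.2f} is >= {growth:g}x baseline {baseline:.2f}")
    return reasons


print(needs_investigation(1.5, 4, [0.6, 0.7, 0.65]))
```

Here the load is still well under 70% of 4 CPUs, but it has more than doubled relative to recent history, which is exactly the kind of change worth looking into.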

Troubleshooting a high load average: case studies

Use the iostat, mpstat, and pidstat tools to find the root cause of a high load average.

Preparation

Install the stress and sysstat packages in advance, for example with apt install stress sysstat.

  • stress: a Linux stress-testing tool, used here as the misbehaving process to simulate an elevated load average.
  • sysstat: contains common Linux performance tools for monitoring and analyzing system performance. The examples use two commands from this package, mpstat and pidstat.
    • mpstat is a common multi-core CPU performance analysis tool. It shows, in real time, the performance metrics of each CPU and the average across all CPUs.
    • pidstat is a common process performance analysis tool. It shows real-time performance metrics of processes, such as CPU, memory, I/O, and context switching.

Scenario 1: CPU-intensive processes

  1. In the first terminal, run stress --cpu 1 --timeout 600 to simulate a scenario where one CPU is 100% busy (you can skip this if you do not want to run the simulation).
  2. In the second terminal, run watch -d uptime to watch the load average change (the -d option highlights the parts that change).
  3. In the third terminal, run mpstat -P ALL 5 to watch CPU usage change (-P ALL monitors all CPUs, and the trailing 5 prints one set of data every 5 seconds).

    From [2], the 1-minute load average slowly rises to 1.00, and from [3], exactly one CPU is at 100% utilization while its iowait is 0. This shows that the rise in the load average is caused by the 100% CPU utilization.

  4. Run pidstat -u 5 1 to find out which process is causing the 100% CPU utilization (it prints one set of data after a 5-second interval).

    The output makes it clear that the stress process is using 100% of a CPU.
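mpstat builds columns such as %usr and %iowait from the cumulative per-CPU tick counters in /proc/stat, by comparing two samples taken an interval apart. The sketch below is a simplified version of that calculation, not mpstat's actual source: it assumes a kernel that reports at least the eight fields user, nice, system, idle, iowait, irq, softirq, steal, and it ignores guest-time accounting. The helper names are hypothetical.

```python
import os
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse a 'cpuN ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percent of elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def sample():
    """Read the aggregate 'cpu' line and per-CPU lines from /proc/stat."""
    with open("/proc/stat") as f:
        return dict(parse_cpu_line(l) for l in f if l.startswith("cpu"))


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = sample()
    time.sleep(1)
    second = sample()
    for cpu in sorted(first):
        p = cpu_percentages(first[cpu], second[cpu])
        print(f"{cpu:6} usr={p['user']:5.1f}% sys={p['system']:5.1f}% "
              f"iowait={p['iowait']:5.1f}% idle={p['idle']:5.1f}%")
```

A CPU-bound process like stress --cpu 1 shows up as one CPU with user close to 100% and iowait near 0.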

Scenario 2: I/O-intensive processes

  1. In the first terminal, run stress -i 1 --timeout 600. This time it simulates I/O pressure by calling sync continuously.
  2. In the second terminal, run watch -d uptime to watch the load average change (the -d option highlights the parts that change).
  3. In the third terminal, run mpstat -P ALL 5 1 to watch CPU usage change (-P ALL monitors all CPUs; 5 sets a 5-second interval and the trailing 1 limits the output to one report).

    From [2], the 1-minute load average slowly rises to 1.06; from [3], the system CPU usage (%sys) of one CPU rises to 23.87% while iowait reaches 67.53%. This shows that the rise in the load average is caused by the rise in iowait.

  4. Run pidstat -u 5 1 to find out which process is causing such high iowait (it prints one set of data after a 5-second interval).

    The output shows that the stress process is responsible.
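The reasoning in the two scenarios so far (high user or system time with near-zero iowait points at CPU-bound work, while high iowait points at I/O) can be written down as a rough rule. The thresholds below are hypothetical, chosen only so that the numbers from the two scenarios fall on the right side; real triage should also look at trends and at per-process data from pidstat.

```python
def likely_cause(usr, system, iowait, busy=80.0, io=30.0):
    """Rough guess at why the load average is elevated.

    usr, system, iowait: percentages for the busiest CPU (e.g. from mpstat)
    busy, io:            hypothetical thresholds in percent
    """
    if iowait >= io:
        return "I/O-bound: tasks are waiting on I/O"
    if usr + system >= busy:
        return "CPU-bound: the CPU is saturated"
    return "unclear: check per-process statistics"


# Scenario 1: one CPU at 100%, iowait 0
print(likely_cause(usr=100.0, system=0.0, iowait=0.0))
# Scenario 2: %sys 23.87, iowait 67.53
print(likely_cause(usr=0.0, system=23.87, iowait=67.53))
```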

Scenario 3: A large number of processes

  1. Run stress -c 8 --timeout 600, this time simulating 8 processes (on a system with only 2 CPUs, this severely overloads the CPUs).

  2. In the second terminal, run mpstat -P ALL 5 1 to watch CPU usage change (-P ALL monitors all CPUs; 5 sets a 5-second interval and the trailing 1 limits the output to one report).
  3. Run pidstat -u 5 1 to look at the processes (it prints one set of data after a 5-second interval).

    As you can see, 8 processes compete for 2 CPUs, and each process spends 75% of its time waiting for a CPU (the %wait column in the output). These processes exceed the computing capacity of the CPUs, which ultimately overloads them.
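The 75% figure follows from simple arithmetic: with N equally CPU-hungry processes sharing M CPUs (N > M) under a fair scheduler, each process gets about M/N of a CPU and waits for the remaining 1 - M/N of the time. A sketch of that idealized estimate, which ignores scheduler overhead and other processes on the system:

```python
def expected_wait_share(nprocs, ncpus):
    """Idealized fraction of time each CPU-bound process waits for a CPU."""
    if nprocs <= ncpus:
        return 0.0  # every process can have a CPU to itself
    return 1 - ncpus / nprocs


print(f"{expected_wait_share(8, 2):.0%}")
```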

Conclusion

The load average offers a quick view of overall system performance and reflects the overall load. However, the load average alone does not tell you where the bottleneck is.

When interpreting the load average, keep in mind:

  • A high load average can be caused by CPU-intensive processes;
  • A high load average does not necessarily mean high CPU utilization; it may also mean busier I/O;
  • When the load is high, tools such as mpstat and pidstat help analyze where the load comes from.

Other

pidstat does not display %wait

Upgrade sysstat to 11.5.5 or later:

#Download the source code with Git
git clone https://github.com/sysstat/sysstat
#Configure the build
cd sysstat/
./configure
#Compile and install (installing usually requires root)
make && make install
#Verify that the installation succeeded
mpstat -V
#If this reports "/usr/bin/mpstat: No such file or directory", copy the built binaries into /usr/bin:
cp pidstat /usr/bin/
cp mpstat /usr/bin/


The article is updated every week. You can search WeChat for "ten minutes to learn programming" to read new posts first. If you found this article useful, your support and recognition are the biggest motivation for my writing. See you in the next article!