One of the biggest fears on the back end is a sudden surge in server load, which could mean a flurry of incoming calls. When this happens, your first reaction is to log in to the server and type a top command to see Load Average. Today, this article will tell you how to look at this “Load Average”.

Load Average

Many people say that Load Average alone is an indicator of high system Load, and this is true. So where exactly is the pressure? How do you calculate these three numbers? It may be difficult for many people to say.

Let’s start with the definition: the sum of the number of processes that are being processed and waiting to be processed by the CPU over a period of time. The three numbers represent the statistics of 1, 5 and 15 minutes respectively.

So, this number does reflect the load on the server. However, a high number does not directly indicate a problem with the machine’s performance. This could be because CPU intensive computations are taking place, or the run queue is blocked because of an I/O problem. So when we see a spike in numbers, it’s going to be a case by case basis. Upgrading machines directly is simple and crude, but it is a symptom rather than a cure.

Look at the top command line by line

The top command outputs a number of parameters, and the actual server load will be seen in combination with other parameters.

Top-20:41:08 UP 18 days, 5:24, 2 Users, load average: 0.04, 0.03, 0.05 Top: current time Up: number of users: number of current users Load Average: indicates the load in the past 1 minute, 5 minutes, and 15 minutes respectivelyCopy the code

The three values of Load Average really need to be paid attention to. As we all know, a CPU can only run one process in a time slice, and the number of CPU cores directly affects the number of processes that the machine can run at the same time. Therefore, generally speaking, the value of Load Average should not exceed the total number of cores of this machine, and there is basically no problem.

Second line: Tasks: 216 Total, 1 running, 215 sleeping, 0 stopped, 0 zombie Tasks: Number of processes running: running process sleeping: dormant process stopped: Zombie: Stopped processCopy the code

The more RUNNING, the greater the server’s natural stress.

%Cpu(s): 0.2us, 0.sy, 0.0Ni, 99.8ID, 0.0wa, 0.0hi, 0.0Si, 0.0st US: user process Cpu usage sy: system process Cpu usage ni: User process space changed priority ID: idle CPU usage WA: percentage of CPU time waiting for input and output HI: hardware interrupt request SI: software interrupt request ST: Steal TimeCopy the code

This line represents CPU usage. The US is chronically high, indicating that user processes are consuming a lot of CPU time. If the us+ SY exceeds 80 or 90 for a long period of time, it may indicate poor CPU performance and need more CPU.

KiB Mem: 65810456 Total, 30324416 Free, 9862224 Used, 25623816 buff/cache KiB Swap 7999484 Total, 7999484 Free, 0 Used. 54807988 Avail Mem Total: total memory free: free memory Used: Used buffer/cache: write cache/read cacheCopy the code

The fourth and fifth lines are memory information and swap information, respectively. All applications run in memory, so memory performance is very important to the server. However, when the memory free becomes less, in fact, we don’t need to be too nervous. What you really need to look at is the used information in Swap. A Swap partition is a Swap partition provided by hard disks. When the physical memory is insufficient, the OPERATING system (OS) puts unused data into a Swap partition. So when this number gets high, it means that memory is really low.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 19868 root 20 0 19.733g 369980 15180 S 0.7 0.6 129:53.91 Java 19682 root 20 0 19.859g 5.766g 22252 S 0.3 9.2 139:42.81 Java 54625 100 20 0 50868 33512 4104 S 0.3 0.1 0:04.68 Fluentd PID: indicates the process ID. USER: indicates the process owner. PR: indicates the priority. NI: Nice value. A negative value indicates a high priority. VIRT: total virtual memory used by a process SWAP: swapped out size of virtual memory used by a process RES: physical memory used by a process but not swapped out SHR: shared memory size SHR: shared memory size S: process status. D is the uninterruptible state of sleep; R stands for run; S stands for sleep; T stands for trace/stop; Z stands for zombie process. %CPU: percentage of CPU usage since last update. %MEM: percentage of physical memory used by the process; TIME+: total CPU TIME used by a process (unit: 1/100 second). COMMAND: indicates the COMMAND name/COMMAND lineCopy the code

This is the process information, which gives you an overview of which processes are consuming system resources.

Other commands

Top is, of course, our most common command to check the health of the system. There are many other commands. Vmstat, w, uptime, and iostat are common commands.

To sum up

Once you understand these parameters, you will know which aspects of your code need to be improved, whether it is to optimize memory consumption or to optimize your code logic. Of course, the mineless heap machine can also be improved, as long as you convince the boss!