This is the sixth day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

Preface

In production, we typically use Linux commands to monitor host load, such as CPU usage and per-program memory usage. The command I use most in production is top, which shows a handful of metrics that tell us how the machine is holding up under load.

Usage

In a Linux environment, running the top command opens its interactive, continuously refreshing interface.
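For scripting, top can also be run non-interactively. The following is a minimal sketch, assuming a procps-style top that accepts -b (batch mode) and -n (number of iterations); the helper name top_snapshot is hypothetical, and it returns None when top is missing or rejects those flags.

```python
import shutil
import subprocess


def top_snapshot(timeout_s=10):
    """Return one batch-mode frame of top as text, or None if unavailable."""
    if shutil.which("top") is None:
        return None  # top is not installed on this host
    try:
        # -b: batch mode (plain text output); -n 1: exit after one iteration
        result = subprocess.run(
            ["top", "-b", "-n", "1"],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


frame = top_snapshot()
if frame is not None:
    # Show only the summary area at the top of the frame
    print("\n".join(frame.splitlines()[:5]))
```

Capturing a single frame like this makes it easy to log the summary lines periodically instead of watching the screen.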

Metric analysis

To get more out of top, we need to understand what each metric means.

Line 1: Load condition

  1. Two times: the current system time and the machine's uptime

  2. Three load averages: the machine's load over the last 1, 5, and 15 minutes. Each number is the average count of running plus waiting processes over that window
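As a rough sketch of how the load averages might be read programmatically, the snippet below parses them out of a summary line. The sample line is illustrative rather than taken from a real host, the function names are hypothetical, and the parser assumes a locale that uses a dot as the decimal separator. Dividing by the CPU count is a common rule of thumb, not something top reports itself.

```python
import re


def parse_load_averages(summary_line):
    """Extract the 1-, 5- and 15-minute load averages from top's first line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", summary_line
    )
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of logical CPUs."""
    return load / cpu_count


# Illustrative sample line (assumption, not real measurements)
sample = "top - 10:15:32 up 12 days,  3:41,  2 users,  load average: 3.20, 1.85, 0.97"
one_min, five_min, fifteen_min = parse_load_averages(sample)
print(one_min, five_min, fifteen_min)
# A per-CPU value persistently above 1 suggests processes are queuing for CPU
print(load_per_cpu(one_min, cpu_count=4))
```

Comparing the 1-minute value with the 15-minute value also hints at whether load is rising or falling.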

Line 2: Task situation

This line summarizes the processes on the machine: how many tasks there are in total, and how many are running, sleeping, stopped, or in the zombie state.

Line 3: CPU information

This line shows the host's CPU information. Press 1 to toggle a per-CPU breakdown, which also reveals how many CPUs there are.

  1. US/SY: CPU time spent in user processes and in the kernel (system), respectively
  2. NI (nice): CPU time spent on user processes whose priority has been adjusted with nice. Normally this should be small
  3. ID: idle time
  4. WA: time the CPU spends waiting for I/O to complete. It rises when disk or other I/O becomes the bottleneck
  5. HI: hardware interrupts, usually raised by peripherals. A spike in HI often points to a problem with a peripheral at the hardware level
  6. SI: software interrupts
  7. ST: steal. It is shown when the host is a virtual machine, and is the percentage of time the hypervisor took the CPU away from this virtual machine to serve others
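To make the fields above concrete, here is a small sketch that turns the CPU summary line into a dictionary. The sample line is illustrative, the function names are hypothetical, and the exact layout of this line varies between top versions, so the pattern is an assumption that may need adjusting.

```python
import re


def parse_cpu_line(cpu_line):
    """Parse top's CPU summary line into a {field: percent} mapping."""
    pairs = re.findall(
        r"([\d.]+)\s*%?\s*(us|sy|ni|id|wa|hi|si|st)\b", cpu_line
    )
    return {name: float(value) for value, name in pairs}


def busy_percent(fields):
    """Everything that is not idle counts as busy."""
    return 100.0 - fields["id"]


# Illustrative sample line (assumption, not real measurements)
sample_cpu = "%Cpu(s):  5.9 us,  2.0 sy,  0.0 ni, 91.5 id,  0.3 wa,  0.0 hi,  0.3 si,  0.0 st"
fields = parse_cpu_line(sample_cpu)
print(fields)
print(busy_percent(fields))
```

With the values in a dictionary, a script can alert on individual fields, for example a sustained rise in wa or st.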

Lines 4 and 5: Memory and swap, with a focus on buffer and cache

  1. Buffer holds data waiting to be processed, which smooths over speed mismatches between components (for example, memory and disk); cache holds result data that has already been read so it can be reused
  2. Swap partition: uses disk as an extension of memory. Frequent swapping means memory is insufficient

Process list description

  1. PID: process ID; USER: owner of the process; PR: priority; VIRT: virtual memory
  2. RES: resident memory, the physical memory the process actually occupies, as opposed to the memory it has requested
  3. SHR: shared memory. The memory used exclusively by the process is therefore roughly RES - SHR
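The RES - SHR estimate can be sketched as follows. This assumes top's default column order and plain KiB values without unit suffixes such as m or g; the sample row and the function names are made up for illustration.

```python
def parse_process_row(row):
    """Split one row of top's process list, assuming the default columns."""
    columns = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
               "%CPU", "%MEM", "TIME+", "COMMAND"]
    # COMMAND may contain spaces, so split at most len(columns) - 1 times
    values = row.strip().split(None, len(columns) - 1)
    if len(values) != len(columns):
        raise ValueError("unexpected number of columns")
    return dict(zip(columns, values))


def non_shared_kib(process):
    """Estimate the memory used only by this process: RES - SHR, in KiB."""
    return int(process["RES"]) - int(process["SHR"])


# Illustrative sample row (assumption, not real measurements)
sample_row = "12345 app       20   0 4567890 812345  23456 S 153.3  10.1  12:34.56 java"
process = parse_process_row(sample_row)
print(process["PID"], process["COMMAND"], non_shared_kib(process))
```

If top has been configured to show scaled units, the RES and SHR strings would need converting before the subtraction.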

Abnormal CPU usage

When a running Java program shows high CPU usage, we can troubleshoot along the following lines.

  1. Find the PID of the process with high CPU usage
  2. Run top -Hp PID to view the TIDs of the threads in that process
  3. Convert the TID from decimal to hexadecimal to get the nid
  4. Run jstack PID to dump the threads, several times if needed, and check the state of the thread whose nid matches that value
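Steps 3 and 4 can be sketched in a few lines. The snippet assumes the dump prints nid as lowercase hexadecimal with a 0x prefix, as the conversion step implies; the dump fragment, thread names, and function names are made up for illustration.

```python
import re


def tid_to_nid(tid):
    """Convert a decimal thread ID from top into hexadecimal with a 0x prefix."""
    return hex(int(tid))


def find_thread_headers(dump_text, tid):
    """Return the dump header lines whose nid matches the given TID."""
    # The trailing \b stops 0x3039 from also matching 0x30391
    pattern = re.compile(rf"\bnid={tid_to_nid(tid)}\b")
    return [line for line in dump_text.splitlines() if pattern.search(line)]


# Illustrative dump fragment (assumption, not output from a real JVM)
sample_dump = '''\
"http-nio-8080-exec-1" #25 daemon prio=5 os_prio=0 tid=0x00007f3c1c0a1000 nid=0x3039 runnable [0x00007f3bf5ffe000]
   java.lang.Thread.State: RUNNABLE
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f3c1c01f800 nid=0x30391 runnable
'''

print(tid_to_nid(12345))
for header in find_thread_headers(sample_dump, 12345):
    print(header)
```

Running the search against several dumps taken a few seconds apart shows whether the same thread stays in the same state, which is usually the thread worth investigating.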

Conclusion

That covers the basic usage of top. For large clusters, we can use dedicated monitoring tools to watch all the hosts.