nvidia-smi

The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on the NVIDIA Management Library (NVML), for managing and monitoring NVIDIA GPU devices.

Viewing GPU Parameters

View the GPU running status

nvidia-smi
Sun Mar 28 02:40:38 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:02:00.0 Off |                  N/A |
| 23%   29C    P8     9W / 250W |    611MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:03:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  On   | 00000000:82:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  On   | 00000000:83:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     33777      C   /usr/bin/python                              601MiB |
+-----------------------------------------------------------------------------+

This is the status output of a server with four GeForce GTX 1080 Ti GPUs.

  • The first line shows the nvidia-smi version, the GPU driver version, and the CUDA version.
  • GPU: the GPU index (0 to 3 on this machine). Fan: fan speed, 0 to 100%.
  • Name: the graphics card model. Temp: GPU temperature in degrees Celsius.
  • Perf: performance state, from P0 (maximum performance) to P12 (minimum performance).
  • Persistence-M: persistence mode. It is off by default to save power; when it is on, the card draws more power while idle, but new GPU applications start faster. Pwr:Usage/Cap: current power draw and power limit, in watts.
  • Bus-Id: the PCI bus ID of the GPU, in the form domain:bus:device.function.
  • Disp.A: whether a display is initialized on the GPU. Memory-Usage: GPU memory in use and total GPU memory.
  • Volatile Uncorr. ECC: uncorrectable ECC (Error Correcting Code) errors counted since the driver was last loaded; N/A means the card does not support ECC. GPU-Util: GPU utilization. Compute M.: compute mode.
  • The Processes table at the bottom shows the GPU memory used by each process.

Note: GPU memory usage and GPU utilization are two different things. A graphics card consists of the GPU and its memory, and the relationship between them is roughly the same as the relationship between a CPU and system RAM: a process can hold a lot of GPU memory while the GPU itself is idle.
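
To compare the two numbers side by side, both can be read with the --query-gpu option of nvidia-smi. The following is a minimal sketch rather than a finished tool: the function name gpu_usage_report is hypothetical, and the sample rows reuse the values from the status output shown earlier, where GPU 0 holds 611 MiB of memory at 0% utilization.

```shell
#!/bin/sh
# gpu_usage_report (hypothetical helper): reads CSV rows of the form
#   index, utilization.gpu, memory.used, memory.total
# on stdin and prints GPU utilization next to the share of memory in use.
gpu_usage_report() {
    awk -F', *' '{
        printf "GPU %s: util %d%%, memory %d%% (%d/%d MiB)\n",
               $1, $2, ($3 * 100) / $4, $3, $4
    }'
}

# Sample rows taken from the status output shown earlier.
printf '%s\n' '0, 0, 611, 11178' '1, 0, 0, 11178' | gpu_usage_report

# On a machine with an NVIDIA driver, feed it live data instead:
# nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits | gpu_usage_report
```

For the sample rows, the first line of the report reads "GPU 0: util 0%, memory 5% (611/11178 MiB)": memory is occupied even though the GPU is doing no work.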

Obtain GPU ID information

nvidia-smi -L

Each line shows, from left to right, the GPU index, the GPU model, and the GPU UUID.

GPU 0: GeForce GTX 1080 Ti (UUID: GPU-5da6e67e-fd5a-88fb-7a0e-109c3284f7bf)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-ce9189e4-2e58-3a19-4332-cb5c7fac1aa6)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-242b3020-8e5c-813a-42d9-475766d52f9d)
GPU 3: GeForce GTX 1080 Ti (UUID: GPU-8f3d825f-7246-3daf-eaa1-37845b03aa03)

Extract only the GPU index

nvidia-smi -L | cut -d ' ' -f 2 | cut -d ':' -f 1
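
When a script needs the UUID together with the index, the same listing can be parsed with sed. This is a minimal sketch that assumes the line format shown in the listing above; the function name gpu_index_uuid is hypothetical.

```shell
#!/bin/sh
# gpu_index_uuid (hypothetical helper): reads `nvidia-smi -L` style lines on
# stdin and prints "<index> <uuid>" for each GPU line it recognizes.
gpu_index_uuid() {
    sed -n 's/^GPU \([0-9][0-9]*\): .*(UUID: \(GPU-[0-9a-f-]*\))$/\1 \2/p'
}

# Sample lines copied from the listing above.
gpu_index_uuid <<'EOF'
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-5da6e67e-fd5a-88fb-7a0e-109c3284f7bf)
GPU 3: GeForce GTX 1080 Ti (UUID: GPU-8f3d825f-7246-3daf-eaa1-37845b03aa03)
EOF

# On a machine with an NVIDIA driver: nvidia-smi -L | gpu_index_uuid
```

The UUID is useful because it stays attached to the physical card, and the -i option of nvidia-smi accepts it in place of the index.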

GPU Common Settings

Persistence Mode Setting

This addresses slow GPU initialization when a new GPU application starts.

Enable GPU persistence mode (Persistence-M):

sudo nvidia-smi -pm 1
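
Before changing the setting, it can help to check which GPUs already have persistence mode on. The following is a minimal sketch built on the persistence_mode query field; the helper name gpus_without_persistence and the sample rows are made up for illustration.

```shell
#!/bin/sh
# gpus_without_persistence (hypothetical helper): reads "index, persistence_mode"
# CSV rows on stdin and prints the index of every GPU reported as Disabled.
gpus_without_persistence() {
    awk -F', *' '$2 == "Disabled" { print $1 }'
}

# Made-up sample rows for illustration.
printf '%s\n' '0, Enabled' '1, Disabled' | gpus_without_persistence

# On a machine with an NVIDIA driver (root is required to change the mode):
# for i in $(nvidia-smi --query-gpu=index,persistence_mode \
#                       --format=csv,noheader | gpus_without_persistence); do
#     sudo nvidia-smi -i "$i" -pm 1
# done
```

A mode set this way does not survive a reboot, so the command is usually reapplied at boot.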

GPU Allocation

Performance can be uneven across cards. On a four-card machine, if only two GPUs are needed, prefer GPUs 0 and 3: cards in the outer slots dissipate heat better.
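
One common way to restrict a job to the chosen cards is the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes that the nvidia-smi index order reflects the physical slot order, which is not guaranteed on every chassis; the helper name outer_gpus and the script name train.py are hypothetical.

```shell
#!/bin/sh
# outer_gpus (hypothetical helper): given the number of GPUs in the machine,
# prints the first and last index as a comma-separated list.
outer_gpus() {
    count=$1
    if [ "$count" -lt 2 ]; then
        echo 0
    else
        echo "0,$((count - 1))"
    fi
}

# Make CUDA enumerate devices in PCI bus order, matching nvidia-smi.
export CUDA_DEVICE_ORDER=PCI_BUS_ID
# For the four-card machine in this article this selects GPUs 0 and 3.
export CUDA_VISIBLE_DEVICES="$(outer_gpus 4)"
echo "$CUDA_VISIBLE_DEVICES"

# A job started from this shell then sees only those two cards, e.g.:
# python train.py
```

Setting CUDA_DEVICE_ORDER matters here because CUDA does not order devices by PCI bus by default, so without it index 3 in a CUDA program may not be the card that nvidia-smi lists as GPU 3.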

Appendix

  • Developer.nvidia.com/nvidia-syst…