Introduction to perf

perf is a performance analysis tool for Linux, built on a kernel subsystem called Performance Counters. It supports analysis of both hardware events (from the CPU's Performance Monitoring Unit, or PMU) and software events (software counters and tracepoints).

Events in perf

Like other performance tuning tools, perf samples the monitored object and infers the behavior of the whole program from the distribution of the sample points. The perf list command shows the many events perf can sample, such as branch-misses and cpu-clock. These predefined events come in different types: hardware events (for example, cache hits and branch misses) and software events (for example, context switches and page faults).
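The sampling idea can be illustrated with an idealized Python simulation. A made-up program trace records which function was running and for how long, and the sampler checks the trace at fixed intervals, roughly the way a timer-driven event such as cpu-clock interrupts a program periodically. The trace, the function names, the period and the helper name sample_profile are all hypothetical; real perf sampling is driven by the kernel, not by a loop like this.

```python
import bisect
from collections import Counter
from itertools import accumulate


def sample_profile(segments, period):
    """Estimate each function's share of run time by sampling at fixed intervals.

    segments: (function_name, duration) pairs, in execution order, describing
    a hypothetical program trace.
    """
    names = [name for name, _ in segments]
    # End time of each segment, e.g. durations [6, 1, 2] -> ends [6, 7, 9].
    ends = list(accumulate(duration for _, duration in segments))
    hits = Counter()
    t = period / 2  # take each sample in the middle of its interval
    while t < ends[-1]:
        # The segment running at time t is the first one that ends after t.
        hits[names[bisect.bisect_right(ends, t)]] += 1
        t += period
    total = sum(hits.values())
    return {name: count / total for name, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical trace: 80% compute, 10% waiting on I/O, 10% logging.
    trace = [("compute", 6), ("io_wait", 1), ("compute", 2), ("log", 1)] * 100
    for name, share in sorted(sample_profile(trace, 0.5).items()):
        print(f"{name:8s} {share:.1%}")
```

With this trace the estimated shares match the true ones (compute 80.0%, io_wait 10.0%, log 10.0%). With too few samples, or a period that lines up with a repeating pattern in the program, the estimate can drift, which is why profilers collect many samples.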

Tracepoints

Tracepoints are hooks placed in the Linux kernel. When a tracepoint is enabled, it fires whenever the code containing it executes, so that tools can observe the internal state of the running system. perf can record tracepoint events, collect statistics on them, and generate analysis reports.

Usage

The general form of a perf command is:

perf [--version] [--help] COMMAND [ARGS]

The full list of commands can be viewed by running perf --help. A few commonly used commands are described below.

perf stat

perf stat runs a command and collects performance counter statistics while it executes, giving an overview of how the program performs. For example:

user@localhost:~$ perf stat hostname
localhost

 Performance counter stats for 'hostname':

          0.313464      task-clock (msec)         #    0.481 CPUs utilized
                 2      context-switches          #    0.006 M/sec
                 0      cpu-migrations            #    0.000 K/sec
               153      page-faults               #    0.488 M/sec
            896723      cycles                    #    2.861 GHz
            620709      instructions              #    0.69  insn per cycle
            121143      branches                  #  386.465 M/sec
              6247      branch-misses             #    5.16% of all branches

       0.000651441 seconds time elapsed

In the example above, perf stat runs the hostname command and summarizes metrics collected during the run, such as task-clock and context-switches. By default, perf stat reports statistics for several common events:

  • task-clock (msec): CPU time consumed by the task, also shown as CPU utilization
  • context-switches: number of context switches
  • page-faults: number of page faults
  • cpu-migrations: number of times the scheduler moved the process from one CPU to another
  • cycles: number of processor clock cycles; a single machine instruction may take several cycles
  • instructions: number of machine instructions executed
  • branches: number of branch instructions executed
  • branch-misses: number of branch instructions that were mispredicted
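Most of the numbers after # in the perf stat output are ratios derived from the raw counts. As a check on how each one is defined, the Python sketch below recomputes them from the hostname example. The formulas are inferred from the printed values (they reproduce 0.481 CPUs utilized, 2.861 GHz, 0.69 instructions per cycle and 5.16% branch misses) rather than copied from perf's source, and the variable and function names are illustrative.

```python
# Raw counts copied from the "perf stat hostname" example output.
task_clock_ms = 0.313464
elapsed_s = 0.000651441
context_switches = 2
page_faults = 153
cycles = 896723
instructions = 620709
branches = 121143
branch_misses = 6247


def derived_metrics():
    """Recompute the '#' comment column of perf stat from the raw counts."""
    task_clock_s = task_clock_ms / 1000.0
    return {
        # CPU time divided by wall-clock time.
        "cpus_utilized": task_clock_s / elapsed_s,
        # Events per second of task-clock, in millions.
        "context_switches_m_per_sec": context_switches / task_clock_s / 1e6,
        "page_faults_m_per_sec": page_faults / task_clock_s / 1e6,
        "branches_m_per_sec": branches / task_clock_s / 1e6,
        # Effective clock frequency while the task was running.
        "ghz": cycles / task_clock_s / 1e9,
        # Instructions per cycle (IPC).
        "insn_per_cycle": instructions / cycles,
        # Share of branches that were mispredicted, in percent.
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:28s} {value:.3f}")
```

Note that the per-second rates are normalized by task-clock (CPU time), not by the elapsed wall-clock time.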

You can also use the -e option to specify the events you are interested in, for example:

user@localhost:~$ perf stat -e cache-misses hostname
localhost

 Performance counter stats for 'hostname':

          682      cache-misses                                                

       0.000646676 seconds time elapsed
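To post-process these counts in a script, the human-readable report can be parsed with a small regular expression. This Python sketch assumes each counter line starts with the count followed by the event name, as in the outputs shown here; parse_perf_stat is a hypothetical helper, and the layout can vary between perf versions (thousands separators, <not counted> markers and so on). For anything long-lived, perf stat's CSV mode (-x followed by a separator character) is the sturdier choice.

```python
import re

# A counter line looks like "      682      cache-misses   # optional comment".
# Assumption: the count comes first (starting with a digit), then the event name.
COUNTER_LINE = re.compile(r"^\s*(\d[\d.,]*)\s+(\S+)")


def parse_perf_stat(text: str) -> dict[str, float]:
    """Extract {event: count} from human-readable perf stat output."""
    counts = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0]  # drop the derived-metric comment column
        match = COUNTER_LINE.match(body)
        if match is None:
            continue  # header, blank line, program output, ...
        raw, event = match.groups()
        if event == "seconds":
            continue  # "... seconds time elapsed" is a summary, not a counter
        counts[event] = float(raw.replace(",", ""))  # strip thousands separators
    return counts


if __name__ == "__main__":
    sample = """localhost

 Performance counter stats for 'hostname':

          682      cache-misses

       0.000646676 seconds time elapsed
"""
    print(parse_perf_stat(sample))
```

perf stat writes its report to standard error, so when running it through subprocess.run(..., capture_output=True, text=True), the text to pass to parse_perf_stat is in the result's .stderr.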

perf top

perf top displays system-wide performance statistics in real time. perf stat analyzes one specific program, but sometimes we do not know which program is hurting system performance; in that case perf top helps find the suspects. For example:

Samples: 775  of event 'cpu-clock', Event count (approx.): 92931021
Overhead  Shared Object       Symbol
   8.93%  [kernel]            [k] vsnprintf
   7.73%  perf                [.] rb_next
   5.92%  [kernel]            [k] kallsyms_expand_symbol.clone.0
   5.07%  [kernel]            [k] format_decode
   4.59%  [kernel]            [k] number
   3.40%  perf                [.] symbols__insert
   3.03%  libslang.so.2.2.1   [.] SLtt_smart_puts

The example above shows perf sampling the cpu-clock event and listing symbols sorted by their share of the samples. As with perf stat, the -e option selects a different event; for example, perf top -e context-switches shows where context switches occur most often.
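The Overhead column is each symbol's share of all samples collected for the event. The Python sketch below shows only that aggregation step: it assumes the samples have already been resolved to symbol names (perf does this itself from the binaries' symbol tables), the sample list is made up, and overhead_table is a hypothetical helper name.

```python
from collections import Counter


def overhead_table(samples: list[str]) -> list[tuple[float, str]]:
    """Return (percent, symbol) pairs sorted by descending share of samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    rows = [(100.0 * n / total, symbol) for symbol, n in counts.items()]
    # Highest overhead first; ties broken by symbol name for stable output.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows


if __name__ == "__main__":
    # Hypothetical samples, one resolved symbol per sample.
    samples = ["vsnprintf", "rb_next", "vsnprintf", "format_decode"]
    for percent, symbol in overhead_table(samples):
        print(f"{percent:6.2f}%  {symbol}")
```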

perf record & perf report

perf record, like perf stat, runs a command and collects data, but instead of displaying the results it writes them to a file (perf.data by default). That file can then be analyzed with perf report.
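The record-then-report workflow is easy to script. In this Python sketch, build_perf_commands and record_and_report are hypothetical helper names; the perf options used are perf record's -o (output file), perf report's -i (input file) and --stdio (plain-text output instead of the interactive interface). It assumes perf is installed and that you have permission to profile the command.

```python
import shutil
import subprocess


def build_perf_commands(command: list[str], data_file: str = "perf.data"):
    """Return the argv lists for `perf record` and the matching `perf report`."""
    # "--" separates perf's own options from the command being profiled.
    record = ["perf", "record", "-o", data_file, "--", *command]
    report = ["perf", "report", "-i", data_file, "--stdio"]
    return record, report


def record_and_report(command: list[str], data_file: str = "perf.data") -> str:
    """Profile `command` with perf record and return perf report's text output."""
    if shutil.which("perf") is None:
        raise RuntimeError("perf is not installed or not on PATH")
    record, report = build_perf_commands(command, data_file)
    subprocess.run(record, check=True)  # writes samples to data_file
    result = subprocess.run(report, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, print(record_and_report(["hostname"])) profiles hostname and prints the report. Keeping the argv construction separate from the execution makes the commands easy to inspect before anything is run.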

perf record also accepts the -g option, which records a call graph (the call chain) with each sample. This helps attribute cost to higher-level callers rather than only to the functions that were executing when the samples were taken.
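With call chains recorded, each sample can be charged both to the function that was executing (its self cost) and to every function on the stack, so a function's inclusive cost covers all samples in which it appears anywhere on the stack; perf report typically shows these as the Self and Children columns. The Python sketch below reproduces only that arithmetic on made-up stacks, each listed from the outermost caller to the leaf; self_and_children is a hypothetical helper name.

```python
from collections import Counter


def self_and_children(stacks: list[list[str]]) -> dict[str, tuple[float, float]]:
    """Compute (self %, inclusive %) per function from sampled call stacks.

    Each stack is ordered from the outermost caller to the leaf function
    that was executing when the sample was taken.
    """
    self_counts = Counter()
    inclusive_counts = Counter()
    for stack in stacks:
        self_counts[stack[-1]] += 1  # only the leaf is charged self cost
        for func in set(stack):  # each function on the stack is charged once
            inclusive_counts[func] += 1
    total = len(stacks)
    return {
        func: (100.0 * self_counts[func] / total, 100.0 * inclusive_counts[func] / total)
        for func in inclusive_counts
    }


if __name__ == "__main__":
    # Hypothetical samples: main calls parse (which calls strlen) and render.
    stacks = [
        ["main", "parse", "strlen"],
        ["main", "parse", "strlen"],
        ["main", "render"],
        ["main", "parse"],
    ]
    for func, (self_pct, incl_pct) in sorted(self_and_children(stacks).items()):
        print(f"{func:8s} self {self_pct:5.1f}%  inclusive {incl_pct:5.1f}%")
```

Here main has 0% self cost but 100% inclusive cost, which is exactly the higher-level picture a flat profile hides. Iterating over set(stack) counts a function only once per sample, so recursion cannot push its inclusive share above 100%.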

Other

The examples show that the Symbol column in perf's output contains the names of C functions. For Java programs, however, JIT-compiled code does not appear under its Java method names in the Symbol column, which makes problems hard to locate. Additional means are needed to map these symbols to the Java program's symbol table; this will be discussed later.