Software performance analysis often comes down to looking at where CPU time goes and finding the bottleneck.

The flame graph is a useful tool for this kind of analysis. This article introduces its basic usage.

1. The perf command

Let’s start with the perf command (short for performance), a performance analysis tool native to Linux. It reports the name of the function the CPU is currently executing, together with the call stack.

Typically, it samples at a frequency of 99 Hz (99 times per second). If all 99 samples taken in a second return the same function name, the CPU spent that whole second in the same function, which may indicate a performance problem.


$ sudo perf record -F 99 -p 13204 -g -- sleep 30

In the command above, perf record starts recording, -F 99 sets the sampling frequency to 99 times per second, -p 13204 is the ID of the process to analyze, -g records the call stack, and sleep 30 keeps the recording running for 30 seconds.

This produces a large amount of data, saved by default in a file named perf.data. If a server has 16 CPUs and samples 99 times per second for 30 seconds, you get up to 47,520 call stacks, which run to hundreds of thousands or even millions of lines when printed as text.
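The sample count is simple arithmetic, sketched here in Python. The function name expected_samples is made up for illustration, and the result is an upper bound that assumes every CPU yields one sample on every tick.

```python
def expected_samples(cpus: int, frequency_hz: int, seconds: int) -> int:
    """Upper bound on collected call stacks: one sample per CPU per tick."""
    return cpus * frequency_hz * seconds


# 16 CPUs sampled 99 times per second for 30 seconds
print(expected_samples(16, 99, 30))  # 47520
```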

For ease of reading, the perf report command counts how often each call stack occurs, expresses it as a percentage, and sorts the stacks from highest to lowest.


$ sudo perf report -n --stdio

The result is still hard to read, hence the flame graph.

2. The meaning of the flame graph

A flame graph is an SVG image generated from perf results that shows the CPU's call stacks.

The Y-axis represents the call stack, and each layer is a function. The deeper the call stack, the higher the flame, with the executing function at the top and its parent functions below.

The X-axis represents the number of samples. The wider a function is along the X-axis, the more often it was sampled, which means the longer it spent executing. Note that the X-axis does not represent the passage of time: all call stacks are merged and then arranged in alphabetical order.
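The way widths and positions follow from sample counts can be sketched in a few lines of Python. This is a minimal illustration, not the code of any real flame graph tool: the function layout, its input format (a dict from a semicolon-separated, root-first stack to a sample count), and the stack names are all assumptions made for the example.

```python
def layout(folded):
    """Return (depth, x_offset, width, name) tuples, one per drawn frame.

    Widths and offsets are measured in samples, not in time.
    """
    # Merge all stacks into a tree; each node sums the samples passing through it.
    root = {"count": 0, "children": {}}
    for stack, count in folded.items():
        node = root
        for name in stack.split(";"):
            node = node["children"].setdefault(name, {"count": 0, "children": {}})
            node["count"] += count

    rects = []

    def walk(node, depth, x):
        # Siblings are placed in alphabetical order, not in the order they ran.
        for name in sorted(node["children"]):
            child = node["children"][name]
            rects.append((depth, x, child["count"], name))
            walk(child, depth + 1, x)
            x += child["count"]

    walk(root, 0, 0)
    return rects


# Hypothetical sample data: two distinct stacks, seen 1 and 2 times.
folded = {
    "start_thread;func_a;func_b;func_c": 1,
    "start_thread;func_a;func_d": 2,
}
for rect in layout(folded):
    print(rect)
```

In this sketch func_d comes out twice as wide as func_b because it appeared in twice as many samples, and it sits to the right of func_b only because "d" sorts after "b".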

When reading a flame graph, look for the function at the top that takes up the most width. Any “plateau” (flat top) indicates that the function may have a performance problem.
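One way to make the plateau idea concrete is to count, for each function, how many samples have it at the very top of the stack (self samples) and how many contain it anywhere (total samples). The Python sketch below assumes the same hypothetical input as a semicolon-separated, root-first stack mapped to a sample count; the function name self_and_total and the data are illustrative only.

```python
from collections import Counter


def self_and_total(folded):
    """Count self samples (function is on top) and total samples per function."""
    self_counts, total_counts = Counter(), Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        self_counts[frames[-1]] += count  # last frame is the one executing
        for name in set(frames):  # a recursive function counts once per stack
            total_counts[name] += count
    return self_counts, total_counts


# Hypothetical sample data
folded = {
    "start_thread;func_a;func_b;func_c": 1,
    "start_thread;func_a;func_d": 2,
}
self_counts, total_counts = self_and_total(folded)
print(self_counts.most_common(1))  # widest plateau
print(total_counts["func_a"], self_counts["func_a"])
```

A function with a large total but few self samples is wide in the graph yet mostly waiting on its children. A large self count is what shows up as a flat top.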

Colors have no special meaning. Warm colors are generally used because a flame graph shows how busy the CPU is.

3. Interactivity

Flame graphs are SVG images, so the user can interact with them.

(1) Mouse hover

Each layer of the flame is labeled with the function name. Hovering the mouse over a layer shows the full function name, the number of samples, and the percentage of total samples. Here’s an example.

mysqld`JOIN::exec (272,959 samples, 78.34 percent)

(2) Click to enlarge

Click on a layer and the flame graph zooms in horizontally: that layer takes up the full width and shows its details.

“Reset Zoom” will also appear in the upper left corner. Click the link and the image will be restored to its original state.

(3) Search

Pressing Ctrl + F displays a search box where the user can enter a keyword or regular expression, and all matching function names are highlighted.
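The matching behind such a search can be approximated in Python. This is only a sketch: the real SVG runs its search in the browser, and the function name matching_frames and every frame name except the tooltip example above are hypothetical.

```python
import re


def matching_frames(frame_names, pattern):
    """Return the frame names that a regular-expression search would highlight."""
    regex = re.compile(pattern)
    return [name for name in frame_names if regex.search(name)]


# Hypothetical frame names, apart from the tooltip example
frames = ["mysqld`JOIN::exec", "mysqld`sub_select", "libc.so.6`memcpy"]
print(matching_frames(frames, r"JOIN|memcpy"))
```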

4. An example flame graph

Here is a simplified example of a flame graph.

First, the CPU sampled three call stacks.


func_c 
func_b 
func_a 
start_thread 

func_d 
func_a 
start_thread 

func_d 
func_a 
start_thread

In the stacks above, start_thread is the thread's entry point, and it calls func_a. func_a in turn calls func_b and func_d, and func_b calls func_c.

After merging, the result below is obtained: there are two distinct call stacks, the first sampled 1 time and the second sampled 2 times.

start_thread;func_a;func_b;func_c 1
start_thread;func_a;func_d 2
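The merging step can be sketched in Python as follows. This is a minimal illustration of the idea rather than the code of any particular tool. The function name fold_stacks is made up, and the input is assumed to be a list of stacks written as in the example above, with the executing function first.

```python
from collections import Counter


def fold_stacks(samples):
    """Merge identical call stacks and count how often each one was sampled."""
    counts = Counter()
    for frames in samples:
        # Samples list the executing function first; the merged form is root-first.
        counts[";".join(reversed(frames))] += 1
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


samples = [
    ["func_c", "func_b", "func_a", "start_thread"],
    ["func_d", "func_a", "start_thread"],
    ["func_d", "func_a", "start_thread"],
]
for line in fold_stacks(samples):
    print(line)
```

Run on the three sampled stacks, it prints the same two merged lines, one per distinct stack with its count.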

With these call stack statistics, the flame graph tool can generate an SVG image.

In the image above, the function at the top, g(), takes the most CPU time. d() is the widest, but it consumes very little CPU directly. b() and c() do not consume CPU directly. Therefore, when investigating performance issues, look at g() first and i() second.

In addition, the figure shows that a() has two branches, b() and h(), which suggests that a() may contain a conditional statement, and that the b() branch consumes much more CPU than the h() branch.

5. Limitations

In the following two cases, a flame graph cannot be drawn correctly, and the system behavior needs to be corrected first.

(1) The call stack is incomplete

When the call stack is too deep, some systems return only the first part (such as the first 10 layers).

(2) Missing function name

Some functions have no name, so the compiler refers to them only by memory address (anonymous functions, for example).

6. Flame graphs for Node applications

A flame graph for a Node application is produced by sampling the Node process, just as for any other application.


$ perf record -F 99 -p `pgrep -n node` -g -- sleep 30

See this article for details.

7. Flame graphs in the browser

Chrome can generate flame graphs of page scripts for CPU analysis.

Open developer tools and switch to the Performance panel. Then, click the “Record” button to start recording data. At this point, you can do various things on the page and then stop recording.

At this point, the developer tools display a timeline. Below it is the flame chart.

The browser's flame chart differs from a standard flame graph in two ways: it is inverted (that is, the function at the top of the call stack is drawn at the bottom), and the X-axis is a time axis, not the number of samples.

8. Reference links

  • Paper introducing flame graphs
  • Flame graph official home page
  • Flame graph generation tool

(End)