Daily open source | a good performance tuning tool: flame figure

1, the preface

The evolution of tools has always been a sign of the progress of human productivity. The rational use of tools can greatly improve our work efficiency. When encountering problems, the rational use of tools can accelerate the progress of troubleshooting problems. That’s why I love the Shell so much. Its rich command line toolset plus pipeline features work with text data sets with such precision and elegance that it’s mesmerizing.

But in many cases, the expressive force of the text is very limited, which can be said to be deficient. When expressing absolute value, it is naturally advantageous, but when displaying relative value, it is somewhat inadequate, let alone multidimensional data.

A shell can query text very quickly for accumulations, maxima, and so on, but when it comes to correlation analysis between two sets of values, there’s nothing you can do. At this point, another analysis tool is needed – graphs, such as scatter plots, that clearly show correlations.

Today, I am going to introduce a graph, flame graph. I have shared the usage method of it before, but I have not used it for a long time, so I am not impressed by it. Recently, I tried it when investigating the load problem of our Java application, so I have some experience about its use.

2, introduce

primers

When troubleshooting performance problems, we usually dump the thread stack and use it

grep --no-group-separator -A 1 java.lang.Thread.State jstack.log | awk 'NR%2==0' | sort | uniq -c | sort -nr
Copy the code

Similar shell statement to see what most thread stacks are doing. The frequency of thread stack occurrences is used to infer the most costly calls within the JVM.

As for how this works, imagine a large screen in a square that continuously plays various advertisements. If we take a picture of a large screen at random, many times, and count the frequency of each AD in the picture, we can basically figure out how long each AD is running.

The resource in our application is like a large screen, and each call is like playing an advertisement. If we count the proportion of thread stacks in dump, we can basically see the time ratio of thread stacks. Although there is some error, it should be close to the difference after multiple counts. This is why some parents every time into the child’s room found that the child at the system desktop after thinking that children usually like to be in a daze on the desktop. 🙂

2444 at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1200)1587 at sun.misc.Unsafe.park(Native Method)795 at java.security.Provider.getService(Provider.java:1035)293 at java.lang.Object.wait(Native Method)292 at java.lang.Thread.sleep(Native Method) 73 at org.apache.logging.log4j.core.layout.TextEncoderHelper.copyDataToDestination(TextEncoderHelper.java:61) 71 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 70 at java.lang.Class.forName0(Native Method) 54 at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:217)
Copy the code

But there are some problems with this. Writing the shell in the first place takes a lot of work, and if I want to see the most calls from the second stack at the top, even if I change the shell command, the results are not intuitive.

The main reason for this problem is that our thread stack has a call relationship, that is, we need to consider the call chain and frequency of the thread stack two dimensions, and it is difficult to represent these two dimensions in a single text, so the famous performance analysis master Brendan Gregg proposed the flame chart.

3, to introduce

Flame map, named because it looks like a flame, its open source address:

Github.com/brendangreg…

It is an SVG interactive graphic that allows us to show more information by clicking and mouse-pointing. The figure below is a typical flame diagram. Structurally, it is made up of squares of different sizes and colors. Each square has a character on it, and they are connected together at the bottom to form a base of flame.

When we click on the square, the image expands upwards from the base of the square we clicked on, and when we point over the square, a detailed description of the square is displayed.

features

Before introducing the analysis of the flame chart, we should first explain its characteristics:

A unique chain of calls can be traced from bottom to top, with the bottom square being the parent of the top square.
Squares of the same parent call are arranged alphabetically from left to right.
The characters on the square represent the name of a call, and in parentheses are the number of times the flame map refers to the call in the flame map and the percentage of the width of the square in the bottom square.
The color of the squares has no real meaning, and the color difference between adjacent squares is just for ease of viewing.

Analysis of the

So, give us a fire map, how can we see where the system is broken?

According to the characteristics of the flame chart above, we should focus on the width of the block when viewing the flame chart, because the width represents the number of times that the call stack appears globally, and the number represents the frequency of occurrence, which can also explain the time consuming.

But it doesn’t make much sense to look at the width ratio of the bottom or middle squares of the flame chart, as shown above. The do_redirections function in the middle is 24.87% wide, which means it takes up about a quarter of the application’s time, but it’s not the DO_redirections function that really consumes time. It is another function called within do_reDirections, and its sub-calls are divided into many, and the time of each call is normal.

We should pay more attention to some of the “mesas” at the top of the fire chart, where the top indicates that it has no child calls, and the square width indicates that it takes a long time, hangs for a long time, or is called very frequently, which are the main causes of performance problems.

Find the exception call and optimize it directly, or go down the call chain of the flame diagram and find our business code and optimize it, and you’re done.

Application scenarios

Each tool has its own application scenarios, and flame maps are suitable for:

Code loop analysis: If there are large loops or dead-loops in the code, there will be a distinct “flat top” from the top of the flame chart or near the top of the item, indicating that the code is frequently moving up and down a thread stack. However, it should be noted that if the total time of the cycle is not long, it will not be obvious on the flame chart.
IO bottleneck/lock analysis: In our application code, our calls are generally synchronous, which means that during network calls, file I/O operations, or unsuccessful lock acquisition, the thread will hang on a call waiting for an I/O response or lock. If the wait is too long, the thread will hang on a call. This will be very clear on the fire chart. In contrast, our fire map of application threads does not accurately represent CPU consumption, because there is no system call stack in the application thread, and when the application thread stack hangs, the CPU may be doing other things, resulting in a long time, while the CPU is idle.
Flame map inversion for global code analysis: Flame map inversion can also be useful. If N different branches of our code call a method, and all the same calls at the top of the stack are combined, we can see the total time taken for the method, and it is easy to evaluate the benefits of optimizing the method.

4, implementation,

So how do we do that if the fire map is so powerful?

Generation tool

Brendan Gregg has implemented the method of generating flame charts in Perl. The open source code is in the Github repository above, and the flamegraph.pl file in the root directory is the executable Perl file.

This command can also pass in various parameters, allowing us to modify the color, size, and so on.

But flamegraph.pl can only handle files in certain formats, like:

a; b; c 12a; d 3b; c 3z; d 5a; c; e 3Copy the code

Before is the call chain, between each call; The number after each line is the number of times the call stack occurs.

With the above data, the flame map generated by Flamegraph.pl is shown below:

Data preparation

As for how our jStack information is processed into the above format, we have tools for common dump formats, such as Stackcollapse-perf.pl, which can handle the output of perf commands. Stackcollapsor-jstack.pl handles jstack output, stackcollapsor-gdb.pl handles stacks of GDB output, etc.

You can also use a shell to simply implement jStack processing:

grep -v -P '.+prio=d+ os_prio=d+' | grep -v -E 'locked <' | awk '{if ($0==""){print $0}else{printf"%s;" ,$0}}' | sort | uniq -c | awk '{a=$1; $1 = ""; print $0,a}'Copy the code

5, summary

After the conclusion of the flame diagram, there is another way to deal with performance problems.

Making the address

Github.com/brendangreg…

Gitee address

Gitee.com/gober/Flame…

PS: In case you can’t find this article, please click “like” to browse and find it.

Daily open source | a good performance tuning tool: flame figure

1, the preface

2, introduce

3, to introduce

4, implementation,

5, summary

Related Posts

[translation] Flutter will win | technical reviews on the Web

The Walking Dead: IT jobs destined for obsolescence

Data structure and algorithm (1) construction of development environment