Golang Performance Optimization Analysis Tool: pprof (Part 1)
1. Golang program performance tuning
What do you need to debug and optimize in a Golang program?
The general areas are:
- CPU: the program's CPU usage - time spent, proportion of total, etc.
- Memory: the application's memory usage - amount used, percentage, memory leaks, etc.; in Go, this includes heap and stack usage
- I/O: I/O usage - which parts of the program spend more time on I/O
Specific to Golang:
- Goroutine: goroutine usage and call chains
- Goroutine leak: checking for leaked goroutines
- Deadlock: deadlock detection and analysis
- Data Race Detector: data race analysis, which is also related to deadlock analysis
These are the main elements of performance tuning in Golang.
What tools are available to debug and optimize golang programs?
For example, for CPU performance debugging on Linux there are tools such as top, dstat, and perf.
So what are the analytical methods in Golang?
Methods for performance debugging and optimization in Golang:
- Benchmark: tests specific code for running time, memory allocations, etc.
- Profiling: program analysis based on data sampled while the program runs - a running "portrait" of the program
- Trace: analyzes the program by collecting event data during its execution (a minimal sketch follows below)
What is the difference between profiling and tracing? Profiling data has no timeline; trace data does.
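As a minimal, hedged sketch of the Trace method mentioned above (using the standard runtime/trace package; the output file name trace.out is just an example), a program can record an execution trace like this:

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Create a file to hold the execution trace (the file name is arbitrary).
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Start collecting trace events; stop before the program exits.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the program work to be traced goes here ...
}
```

The resulting file can then be inspected with go tool trace trace.out.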
So what tool in Golang applies these methods?
This article introduces pprof, the Golang tool that helps you debug and optimize your programs.
It originates from gperftools (github.com/gperftools/...); Go's pprof is derived from it.
2. Introduction to pprof
pprof is the official performance analysis and tuning tool provided by Golang. It can analyze a program's performance and visualize the data, which is quite intuitive. Use this tool to debug and optimize your Go application when it hits performance bottlenecks.
This article uses the following two pprof-related performance monitoring packages in Golang:
- runtime/pprof: collects data from the running program for performance analysis. It is generally used for tool-like background applications that run for a while and then exit.
- net/http/pprof: a thin wrapper around runtime/pprof, usually used for service applications such as a web server that runs continuously. This package serves the collected profiling data over HTTP (see the sketch after this list).
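As a minimal, hedged sketch (the listen address :8080 and the /hello handler are placeholders, not part of the original example), enabling net/http/pprof in a service only needs a blank import; the profiling endpoints are then served under /debug/pprof/:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
	// An ordinary application handler.
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// The pprof endpoints (e.g. /debug/pprof/profile, /debug/pprof/heap)
	// are served by the same default mux.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```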
When pprof is enabled, it collects the stack information of the current program at regular intervals to obtain the CPU, memory, and other usage of the function. Through the analysis of the sampled data, a data analysis report is formed.
Pprof stores data in profile.proto format and can then generate a visual analysis report from this data, supporting both text and graphical reports. The specific data format in profile.proto is protocol buffers.
What methods are used to analyze the data to produce text or graphical reports?
Use the command-line tool go tool pprof.
pprof usage modes:
- Report generation
- Interactive terminal
- Web interface
3. runtime/pprof
Prerequisites
To debug and analyze a Golang program, first enable profiling and sample some data. Also install the pprof tool with go get github.com/google/pprof; it will be used in the analysis later.
Sampling data:
- The first way is to add profiling code to the Go program:
StartCPUProfile enables CPU profiling for the current process; StopCPUProfile stops the current CPU profile and returns only after all writes for the profile have completed.
// StartCPUProfile: pprof.StartCPUProfile(w io.Writer)
// StopCPUProfile:  pprof.StopCPUProfile()
WriteHeapProfile writes the memory heap profile to a file:
pprof.WriteHeapProfile(w io.Writer)
- The second way is when running benchmarks with go test (see the benchmark sketch after this list):
go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
- The third way is to collect data from a running HTTP server:
go tool pprof $host/debug/pprof/profile
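Relating to the benchmark method above, here is a minimal, hedged sketch of a benchmark that the go test command shown there would profile (the package, the sum function, and the benchmark itself are made up for illustration):

```go
package sum

import "testing"

// sum is a trivial function to benchmark.
func sum(nums []int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

// result is a package-level sink so the compiler cannot discard the calls.
var result int

// BenchmarkSum is picked up by `go test -bench .`; adding -cpuprofile and
// -memprofile writes cpu.prof and mem.prof for later analysis with pprof.
func BenchmarkSum(b *testing.B) {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i
	}
	b.ResetTimer()
	var r int
	for i := 0; i < b.N; i++ {
		r = sum(nums)
	}
	result = r
}
```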
The sample program
Go version go1.13.9
Example 1
We use the first method to add analysis code to the program, demo.go:
package main

import (
	"bytes"
	"flag"
	"log"
	"math/rand"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
var memprofile = flag.String("memprofile", "", "write mem profile to `file`")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		defer f.Close()
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile()
	}

	var wg sync.WaitGroup
	wg.Add(200)
	for i := 0; i < 200; i++ {
		go cyclenum(30000, &wg)
	}
	writeBytes()
	wg.Wait()

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal("could not create memory profile: ", err)
		}
		defer f.Close()
		runtime.GC()
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Fatal("could not write memory profile: ", err)
		}
	}
}

// cyclenum appends values to a slice in nested loops to burn CPU and memory.
func cyclenum(num int, wg *sync.WaitGroup) {
	slice := make([]int, 0)
	for i := 0; i < num; i++ {
		for j := 0; j < num; j++ {
			j = i + j
			slice = append(slice, j)
		}
	}
	wg.Done()
}

// writeBytes fills a buffer with random digit characters.
func writeBytes() *bytes.Buffer {
	var buff bytes.Buffer
	for i := 0; i < 30000; i++ {
		buff.Write([]byte{'0' + byte(rand.Intn(10))})
	}
	return &buff
}
Compile the program, collect data, and analyze it:
- Compile demo.go:
go build demo.go
- Run the program to collect profiling data:
./demo.exe --cpuprofile=democpu.pprof --memprofile=demomem.pprof
(On Windows the compiled binary is demo.exe; on Linux it is ./demo.)
- Analyze the collected data:
go tool pprof democpu.pprof
The general form is: go tool pprof [binary] [source]
- binary: the application's binary file, used to resolve symbols
- source: the source of the profile data; it can be a local file or an HTTP address
To learn more about the go tool pprof command, see its documentation: go tool pprof --help
Note:
Profiling data is sampled while the program runs. To capture meaningful data, make sure the application or service is under real load - for example, a service doing actual work, or load simulated with another tool. Otherwise, if the application is idle (an idle HTTP service, for instance), the results may not be meaningful. (You will run into this later: when the HTTP web service is idle, the collected profile comes back empty.)
There are two basic models for analyzing data:
- One is the command line interaction analysis mode
- One is graphical visualization analysis mode
A: Command-line interactive analysis
- To analyze the data collected above, run:
go tool pprof democpu.pprof
| Field | Description |
| --- | --- |
| Type | The profile type; here it is cpu |
| Duration | The duration of program execution |
Below Duration there is another line showing that we are now in interactive mode (type help for help information, o for options).
As you can see, Go's pprof has many other commands.
- Enter the help command to get a lot of help information:
Under Commands there is plenty of command information; the top and text commands have the same description, so let's try both:
- Enter the top and text commands
The top command sorts functions by CPU usage and percentage.
top can also take a number, such as top15.
The text command outputs the same information.
| Field | Description |
| --- | --- |
| flat | CPU time consumed by the current function itself |
| flat% | flat as a percentage of the total sampled CPU time |
| sum% | the cumulative flat% of the current row and all rows above it, adding up to 100% |
| cum | CPU time consumed by the current function plus the functions it calls |
| cum% | cum as a percentage of the total sampled CPU time |
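As a hedged illustration of flat versus cum (the functions spin, caller, and callee below are made up for this example and are not from the demo program):

```go
package main

// spin burns some CPU so the functions show up in a CPU profile.
func spin(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i % 7
	}
	return s
}

// callee only does its own work, so its flat and cum are roughly equal.
func callee() int {
	return spin(30000000)
}

// caller does a little of its own work (counted in its flat) and spends
// most of its time inside callee(); that time is counted in caller's cum
// but not in its flat.
func caller() int {
	own := spin(10000000)
	return own + callee()
}

func main() {
	_ = caller()
}
```

In the top output for such a program, callee's flat and cum would be close, while caller's cum would also include the time spent in callee.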
From these fields we can see which function is the most time-consuming, and then analyze that function further. The command for that is list.
The list command lists the most time-consuming lines of code in a function; the format is list followed by the function name.
From the sampled data above, the function with the longest total time is main.cyclenum, which can be analyzed with the list cyclenum command, as shown in the following figure:
The most time-consuming code is line 62, slice = append(slice, j), which took 1.47s and can be optimized.
The cost here should come from the slice repeatedly growing as elements are appended; pre-allocating capacity with make (for example make([]int, 0, num*num) as an upper bound, instead of make([]int, 0)) avoids the repeated reallocation. A generic sketch of the idea follows below.
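A minimal, hedged sketch of the pre-allocation idea in general (the fill functions and the 1<<20 element count are made up for illustration and are not part of the demo program):

```go
package main

// fillGrowing appends without reserving capacity, so the backing array is
// reallocated and copied every time it fills up.
func fillGrowing(n int) []int {
	s := make([]int, 0)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// fillPrealloc reserves the final capacity up front, so append never needs
// to grow the slice.
func fillPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	_ = fillGrowing(1 << 20)
	_ = fillPrealloc(1 << 20)
}
```

Profiling the two variants side by side (with the CPU profiling code shown earlier, or a benchmark) should show the growing version spending extra time in the runtime's slice-growing code.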
B: The analysis data is directly output on the command line
The command output is in the following format:
go tool pprof format [options] [binary] source
Enter the command: go tool pprof -text democpu.pprof
And the output:
Visual analysis
A. Pprof graphic visualization
In addition to the command line interaction analysis above, you can also use graphics to analyze program performance.
To do that, you need to install Graphviz,
- Download address: Graphviz address
After the installation is complete, add Graphviz's bin directory to the Path environment variable, then run dot -version in a terminal to check whether the installation succeeded.
Generate visual files:
There are two steps to visualize using the data file democpu.pprof collected above:
- Run the go tool pprof democpu.pprof command
- Enter the web command
Typing the web command in the interactive terminal generates an SVG file, which you can open and view in a browser.
Execute the above two commands as shown below:
View the generated SVG diagram in a browser:
(The file is too large, only a small part of the graph is captured, please generate the complete graph by yourself)
A note about graphics:
- Each box represents a function, and the larger the box, the more CPU resources it consumes
- The line between boxes represents the call relationship between functions, and the number on a line indicates how much time is attributed to calls along that path
- The numbers in each box show the function's own (flat) CPU usage and its cumulative (cum) CPU usage including the functions it calls, as percentages of the total
B. Web visualization - Viewing data in a browser
Run the go tool pprof -http=:8080 democpu.pprof command:
$ go tool pprof -http=:8080 democpu.pprof
Serving web UI on http://localhost:8080
After the command is executed, the address http://localhost:8080/ui/ opens automatically in the browser,
and we can view the analysis data there. The graph displayed is the same one generated with the web command above.
If the web UI does not offer this many menu options, install the standalone pprof tool (go get github.com/google/pprof, mentioned in the prerequisites) and run it directly:
pprof -http=:8080 democpu.pprof
You can also view the flame graph at http://localhost:8080/ui/flamegraph, or simply click the Flame Graph option under the VIEW menu. There are other options as well, such as Top and Graph; choose according to your needs.
C. Flame Graph
In fact, the web visualization above already includes the flame graph, which is now integrated into pprof. But in honor of performance expert Brendan Gregg, let's also walk through the flame graph generation process.
The Flame Graph is a performance analysis graph created by performance optimization expert Brendan Gregg; it visualizes profiled code.
The shape of the flame diagram is as follows:
(From: github.com/brendangreg...
To convert the sample data generated by pprof into a flame graph, we need a conversion tool called go-torch. This tool was open-sourced by Uber; it is written in Go and can read the data collected by pprof directly and generate a flame graph in SVG format.
- Install go-torch:
go get -v github.com/uber/go-torch
- Install FlameGraph:
git clone github.com/brendangreg...
Add the location of the FlameGraph installation directory to the Path.
- Install the Perl environment:
FlameGraph, the program that generates the flame chart, is written in Perl, so install an environment that executes perl first.
- Install the Perl environment: www.perl.org/get.html
- Add Perl's bin directory to the Path
- Run the following command on the terminal:
perl -h
If the help information is displayed, the installation is successful
- Verify that FlameGraph is installed successfully:
Go to the FlameGraph installation directory and run ./flamegraph.pl --help
If help information is displayed, the installation was successful.
- Generate flame map:
Re-enter the directory of the democpu.pprof file and run the following command:
go-torch -b democpu.pprof
The command above generates a file named torch.svg by default; open it in your browser to view it:
To customize the output file name, add the -f parameter:
go-torch -b democpu.pprof -f cpu_flamegraph.svg
Reading the flame graph:
In the flame graph SVG file, you can click each box to view and analyze its contents.
The flame graph reads from bottom to top: each box represents a function, the layer above a box shows the functions it calls, and the width of a box represents how much CPU time it uses.
Format of the go-torch command:
go-torch [options] [binary] profile source
go-torch help documentation:
To learn more about go-torch, view its help documentation with go-torch --help.
Or check out the go-torch README.