Golang Performance Optimization Analysis Tool: pprof (Part 1)
1. Golang program performance tuning
What do you need to debug and optimize in a Golang program?
The general areas are:
- CPU: the program's CPU usage - time spent, proportion of total, etc.
- Memory: the application's memory usage - amount used, percentage, memory leaks, etc.; in Go, this includes heap and stack usage
- I/O: I/O usage - which parts of the program spend more time on I/O
Specific to Golang:
- Goroutine: goroutine usage and call chains
- Goroutine leak: checking for leaked goroutines
- Deadlock: deadlock detection and analysis
- Data Race Detector: data race analysis, which is also related to deadlock analysis
These are the main elements of performance tuning in Golang.
What tools are available to debug and optimize golang programs?
For example, for CPU performance debugging on Linux there are tools such as top, dstat, and perf.
So what are the analytical methods in Golang?
Methods for performance debugging and optimization in Golang:
- Benchmark: tests specific code for running time, memory allocations, etc.
- Profiling: program analysis based on data sampled while the program runs - a running "portrait" of the program
- Trace: analyzes the program by collecting event data during its execution (a minimal sketch follows below)
What is the difference between profiling and tracing? Profiling data has no timeline; trace data does.
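As a minimal, hedged sketch of the Trace method mentioned above (using the standard runtime/trace package; the output file name trace.out is just an example), a program can record an execution trace like this:

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Create a file to hold the execution trace (the file name is arbitrary).
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Start collecting trace events; stop before the program exits.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the program work to be traced goes here ...
}
```

The resulting file can then be inspected with go tool trace trace.out.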
So what tool in Golang applies these methods?
This article introduces pprof, the Golang tool that helps you debug and optimize your programs.
It originates from gperftools (github.com/gperftools/...); Go's pprof is derived from it.
2. Introduction to pprof
pprof is the official performance analysis and tuning tool provided by Golang. It can analyze a program's performance and visualize the data, which is quite intuitive. Use this tool to debug and optimize your Go application when it hits performance bottlenecks.
This article uses the following two pprof-related performance monitoring packages in Golang:
- runtime/pprof: collects data from the running program for performance analysis. It is generally used for tool-like background applications that run for a while and then exit.
- net/http/pprof: a thin wrapper around runtime/pprof, usually used for service applications such as a web server that runs continuously. This package serves the collected profiling data over HTTP (see the sketch after this list).
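As a minimal, hedged sketch (the listen address :8080 and the /hello handler are placeholders, not part of the original example), enabling net/http/pprof in a service only needs a blank import; the profiling endpoints are then served under /debug/pprof/:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
	// An ordinary application handler.
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// The pprof endpoints (e.g. /debug/pprof/profile, /debug/pprof/heap)
	// are served by the same default mux.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```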
When pprof is enabled, it collects the stack information of the current program at regular intervals to obtain the CPU, memory, and other usage of the function. Through the analysis of the sampled data, a data analysis report is formed.
Pprof stores data in profile.proto format and can then generate a visual analysis report from this data, supporting both text and graphical reports. The specific data format in profile.proto is protocol buffers.
What methods are used to analyze the data to produce text or graphical reports?
Use the command-line tool go tool pprof.
pprof usage modes:
- Report generation
- Interactive terminal
- Web interface
3. runtime/pprof
Prerequisites
To debug and analyze a Golang program, first enable profiling and sample some data. Also install the pprof tool with go get github.com/google/pprof; it will be used in the analysis later.
Sampling data:
- The first way is to add profiling code to the Go program:
StartCPUProfile enables CPU profiling for the current process; StopCPUProfile stops the current CPU profile and returns only after all writes for the profile have completed.
// StartCPUProfile: pprof.StartCPUProfile(w io.Writer)
// StopCPUProfile:  pprof.StopCPUProfile()
WriteHeapProfile writes the memory heap profile to a file:
pprof.WriteHeapProfile(w io.Writer)
- The second way is when running benchmarks with go test (see the benchmark sketch after this list):
go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
- The third way is to collect data from a running HTTP server:
go tool pprof $host/debug/pprof/profile
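Relating to the benchmark method above, here is a minimal, hedged sketch of a benchmark that the go test command shown there would profile (the package, the sum function, and the benchmark itself are made up for illustration):

```go
package sum

import "testing"

// sum is a trivial function to benchmark.
func sum(nums []int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

// result is a package-level sink so the compiler cannot discard the calls.
var result int

// BenchmarkSum is picked up by `go test -bench .`; adding -cpuprofile and
// -memprofile writes cpu.prof and mem.prof for later analysis with pprof.
func BenchmarkSum(b *testing.B) {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i
	}
	b.ResetTimer()
	var r int
	for i := 0; i < b.N; i++ {
		r = sum(nums)
	}
	result = r
}
```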
The sample program
Go version go1.13.9
Example 1
We use the first method to add analysis code to the program, demo.go:
package main

import (
	"bytes"
	"flag"
	"log"
	"math/rand"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
var memprofile = flag.String("memprofile", "", "write mem profile to `file`")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		defer f.Close()
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile()
	}

	var wg sync.WaitGroup
	wg.Add(200)
	for i := 0; i < 200; i++ {
		go cyclenum(30000, &wg)
	}
	writeBytes()
	wg.Wait()

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal("could not create memory profile: ", err)
		}
		defer f.Close()
		runtime.GC()
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Fatal("could not write memory profile: ", err)
		}
	}
}

// cyclenum appends values to a slice in nested loops to burn CPU and memory.
func cyclenum(num int, wg *sync.WaitGroup) {
	slice := make([]int, 0)
	for i := 0; i < num; i++ {
		for j := 0; j < num; j++ {
			j = i + j
			slice = append(slice, j)
		}
	}
	wg.Done()
}

// writeBytes fills a buffer with random digit characters.
func writeBytes() *bytes.Buffer {
	var buff bytes.Buffer
	for i := 0; i < 30000; i++ {
		buff.Write([]byte{'0' + byte(rand.Intn(10))})
	}
	return &buff
}
Compile the program, collect data, and analyze it:
- Compile demo.go:
go build demo.go
- Run the program to collect profiling data:
./demo.exe --cpuprofile=democpu.pprof --memprofile=demomem.pprof
(On Windows the compiled binary is demo.exe; on Linux it is ./demo.)
- Analyze the collected data:
go tool pprof democpu.pprof
The general form is: go tool pprof [binary] [source]
- binary: the application's binary file, used to resolve symbols
- source: the source of the profile data; it can be a local file or an HTTP address
To learn more about the go tool pprof command, see its documentation: go tool pprof --help
Note:
Profiling data is sampled while the program runs. To capture meaningful data, make sure the application or service is under real load - for example, a service doing actual work, or load simulated with another tool. Otherwise, if the application is idle (an idle HTTP service, for instance), the results may not be meaningful. (You will run into this later: when the HTTP web service is idle, the collected profile comes back empty.)
There are two basic models for analyzing data:
- One is the command line interaction analysis mode
- One is graphical visualization analysis mode
A: Command-line interactive analysis
- To analyze the data collected above, run:
go tool pprof democpu.pprof
| Field | Description |
| --- | --- |
| Type | The profile type; here it is cpu |
| Duration | The duration of program execution |
Below Duration there is another line showing that we are now in interactive mode (type help for help information, o for options).
As you can see, Go's pprof has many other commands.
- Enter the help command to get a lot of help information:
Under Commands there is plenty of command information; the top and text commands have the same description, so let's try both:
- Enter the top and text commands
The top command sorts functions by CPU usage and percentage.
top can also take a number, such as top15.
The text command outputs the same information.
| Field | Description |
| --- | --- |
| flat | CPU time consumed by the current function itself |
| flat% | flat as a percentage of the total sampled CPU time |
| sum% | the cumulative flat% of the current row and all rows above it, adding up to 100% |
| cum | CPU time consumed by the current function plus the functions it calls |
| cum% | cum as a percentage of the total sampled CPU time |
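As a hedged illustration of flat versus cum (the functions spin, caller, and callee below are made up for this example and are not from the demo program):

```go
package main

// spin burns some CPU so the functions show up in a CPU profile.
func spin(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i % 7
	}
	return s
}

// callee only does its own work, so its flat and cum are roughly equal.
func callee() int {
	return spin(30000000)
}

// caller does a little of its own work (counted in its flat) and spends
// most of its time inside callee(); that time is counted in caller's cum
// but not in its flat.
func caller() int {
	own := spin(10000000)
	return own + callee()
}

func main() {
	_ = caller()
}
```

In the top output for such a program, callee's flat and cum would be close, while caller's cum would also include the time spent in callee.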
From these fields we can see which function is the most time-consuming, and then analyze that function further. The command for that is list.
The list command lists the most time-consuming lines of code in a function; the format is list followed by the function name.
From the sampled data above, the function with the longest total time is main.cyclenum, which can be analyzed with the list cyclenum command, as shown in the following figure:
The most time-consuming code is line 62, slice = append(slice, j), which took 1.47s and can be optimized.
The cost here should come from the slice repeatedly growing as elements are appended; pre-allocating capacity with make (for example make([]int, 0, num*num) as an upper bound, instead of make([]int, 0)) avoids the repeated reallocation. A generic sketch of the idea follows below.
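A minimal, hedged sketch of the pre-allocation idea in general (the fill functions and the 1<<20 element count are made up for illustration and are not part of the demo program):

```go
package main

// fillGrowing appends without reserving capacity, so the backing array is
// reallocated and copied every time it fills up.
func fillGrowing(n int) []int {
	s := make([]int, 0)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// fillPrealloc reserves the final capacity up front, so append never needs
// to grow the slice.
func fillPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	_ = fillGrowing(1 << 20)
	_ = fillPrealloc(1 << 20)
}
```

Profiling the two variants side by side (with the CPU profiling code shown earlier, or a benchmark) should show the growing version spending extra time in the runtime's slice-growing code.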
B: The analysis data is directly output on the command line
The command output is in the following format:
go tool pprof format [options] [binary] source
Enter the command: go tool pprof -text democpu.pprof
And the output:
Visual analysis
A. Pprof graphic visualization
In addition to the command line interaction analysis above, you can also use graphics to analyze program performance.
To do that, you need to install Graphviz,
- Download address: Graphviz address
After the installation is complete, add Graphviz's bin directory to the Path environment variable, then run dot -version in a terminal to check whether the installation succeeded.
Generate visual files:
There are two steps to visualize using the data file democpu.pprof collected above:
- Run the go tool pprof democpu.pprof command
- Enter the web command
Typing the web command in the interactive terminal generates an SVG file, which you can open and view in a browser.
Execute the above two commands as shown below:
View the generated SVG diagram in a browser:
(The file is too large, only a small part of the graph is captured, please generate the complete graph by yourself)
A note about graphics:
- Each box represents a function, and the larger the box, the more CPU resources it consumes
- The line between boxes represents the call relationship between functions, and the number on a line indicates how much time is attributed to calls along that path
- The numbers in each box show the function's own (flat) CPU usage and its cumulative (cum) CPU usage including the functions it calls, as percentages of the total
B. Web visualization - Viewing data in a browser
Run the go tool pprof -http=:8080 democpu.pprof command:
$ go tool pprof -http=:8080 democpu.pprof
Serving web UI on http://localhost:8080
After the command is executed, the address http://localhost:8080/ui/ opens automatically in the browser,
and we can view the analysis data there. The graph displayed is the same one generated with the web command above.
If the web UI does not offer this many menu options, install the standalone pprof tool (go get github.com/google/pprof, mentioned in the prerequisites) and run it directly:
pprof -http=:8080 democpu.pprof
You can also view the flame graph at http://localhost:8080/ui/flamegraph, or simply click the Flame Graph option under the VIEW menu. There are other options as well, such as Top and Graph; choose according to your needs.
C. Flame Graph
In fact, the web visualization above already includes the flame graph, which is now integrated into pprof. But in honor of performance expert Brendan Gregg, let's also walk through the flame graph generation process.
The Flame Graph is a performance analysis graph created by performance optimization expert Brendan Gregg; it visualizes profiled code.
The shape of the flame diagram is as follows:
(From: github.com/brendangreg...
To convert the sample data generated by pprof into a flame graph, we need a conversion tool called go-torch. This tool was open-sourced by Uber; it is written in Go and can read the data collected by pprof directly and generate a flame graph in SVG format.
- Install go-torch:
go get -v github.com/uber/go-torch
- Install FlameGraph:
git clone github.com/brendangreg...
Add the location of the FlameGraph installation directory to the Path.
- Install the Perl environment:
FlameGraph, the program that generates the flame chart, is written in Perl, so install an environment that executes perl first.
- Install the Perl environment: www.perl.org/get.html
- Add Perl's bin directory to the Path
- Run the following command on the terminal:
perl -h
If the help information is displayed, the installation is successful
- Verify that FlameGraph is installed successfully:
Go to the FlameGraph installation directory and run ./flamegraph.pl --help
If help information is displayed, the installation was successful.
- Generate flame map:
Re-enter the directory of the democpu.pprof file and run the following command:
go-torch -b democpu.pprof
The command above generates a file named torch.svg by default; open it in your browser to view it:
To customize the output file name, add the -f parameter:
go-torch -b democpu.pprof -f cpu_flamegraph.svg
Reading the flame graph:
In the flame graph SVG file, you can click each box to view and analyze its contents.
The flame graph reads from bottom to top: each box represents a function, the layer above a box shows the functions it calls, and the width of a box represents how much CPU time it uses.
Format of the go-torch command:
go-torch [options] [binary] profile source
go-torch help documentation:
To learn more about go-torch, view its help documentation with go-torch --help.
Or check out the go-torch README.