Hello, I’m fried fish.

This time, I'm sharing PProf, the heavyweight performance-analysis tool covered in "Go Programming Journey", in a 13,000+ word article. I believe it is one of the most complete PProf write-ups around. Welcome, everyone, to pick up more advanced Go skills and knowledge together with Fried Fish.

I hope more Go lovers will learn how to use PProf, so that when they find themselves playing firefighter for production issues they can stand out, become the core force of the team — and still leave work by 7 PM.

The contents of this article are as follows:

If you find it useful, feel free to like it and forward it to more friends. Thank you.

Preface

When an application is running, unexpected problems always crop up: an alarm suddenly fires while you're on call, monitoring reports that a process's CPU usage is too high, memory usage keeps growing (a suspected leak), temporary memory is not released long after a burst of requests, goroutines leak or their number explodes and keeps climbing, or an application inexplicably stops serving for hours after a release…

When this happens, it is worrying. Besides putting protections in place, how do we troubleshoot the problem while it is happening? In this chapter we introduce one of the troubleshooting tools: PProf, the Go performance-analysis tool chain. It is a must-know skill for Go developers.

What is PProf

In Go, PProf is a tool for visualizing and analyzing profiling data. PProf reads collections of samples in the profile.proto format and generates reports that visualize the data and aid analysis (both text and graphical reports are supported).

profile.proto is a Protocol Buffers v3 description file that describes a set of call stacks and symbolization information. It serves as the sampled call-stack data for statistical analysis and is a common format for stack-trace profiles.

What are the sampling methods

  • runtime/pprof: collects runtime data for a specified block of the program (not a server) for analysis.

  • net/http/pprof: runs on an HTTP server and collects runtime data for analysis.

  • go test: collects by running test cases with the relevant flags.

What usage patterns are supported

  • Report generation.

  • Interactive terminal use.

  • Web interface.

What can be done

  • CPU Profiling: collects the monitored application's CPU usage (including registers) at a configurable frequency, and can determine where an application spends its time while actively consuming CPU cycles.

  • Memory Profiling: records stack traces when the application makes heap allocations; used to monitor current and historical memory usage and to check for memory leaks.

  • Block Profiling: records where goroutines block waiting on synchronization primitives (including timer channels). Disabled by default; you need to call runtime.SetBlockProfileRate to enable it.

  • Mutex Profiling: reports mutex contention. Disabled by default; you need to call runtime.SetMutexProfileFraction to enable it.

  • Goroutine Profiling: reports the stack traces of all goroutines currently running in the application.

Features such as Goroutine Profiling are often used in actual Profiling.

That is because many problems surface as a goroutine explosion, and one of the first things we need to do is look at what the goroutines in the application are doing and what is blocking them, and then take the next step from there.

Introduction and Use

A simple example

Let’s create a new main.go file for the subsequent profiling demonstrations and write the following code:

var datas []string

func main() {
	go func() {
		for {
			log.Printf("len: %d", Add("go-programming-tour-book"))
			time.Sleep(time.Millisecond * 10)
		}
	}()

	_ = http.ListenAndServe("0.0.0.0:6060", nil)
}

func Add(str string) int {
	data := []byte(str)
	datas = append(datas, string(data))
	return len(datas)
}

The next, most important step is to add a blank import of net/http/pprof, as follows:

import (
 _ "net/http/pprof"
 ...
)


Next, we run the program and visit http://127.0.0.1:6060/debug/pprof/ to check that it responds normally.

Access through a browser

The first way is to view it directly through a browser. As a first step we can check the overview page by visiting http://127.0.0.1:6060/debug/pprof/, which looks like this:

/debug/pprof/

Types of profiles available:
Count Profile
3 allocs
0 block
0 cmdline
8 goroutine
3 heap
0 mutex
0 profile
11 threadcreate
0 trace
full goroutine stack dump

  • allocs: a sample of all past memory allocations, at $HOST/debug/pprof/allocs.

  • block: stack traces that led to blocking on synchronization primitives, at $HOST/debug/pprof/block.

  • cmdline: the full command-line invocation of the current program.

  • goroutine: stack traces of all current goroutines, at $HOST/debug/pprof/goroutine.

  • heap: memory allocations of live objects, at $HOST/debug/pprof/heap.

  • mutex: stack traces of holders of contended mutexes, at $HOST/debug/pprof/mutex.

  • profile: CPU profiling; samples for 30 seconds by default and produces a profile file for analysis, at $HOST/debug/pprof/profile.

  • threadcreate: stack traces that led to the creation of new OS threads, at $HOST/debug/pprof/threadcreate.

If you append ?debug=1, you can view the profile directly in the browser, as follows:

The debug mode

If the debug parameter is not added, the corresponding profile file will be downloaded directly.

In deployment environments, for network security, we usually do not expose pprof's ports directly to the Internet; instead we pull profile files indirectly with tools such as curl or wget.

Another point to note is that what debug mode shows is a point-in-time snapshot. In real scenarios we often need to save the current profile file promptly so it can be analyzed again later.

Used through interactive terminals

The second way is to fetch and analyze the running application's profiles directly from the command line.

CPU Profiling

$ go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.002.pb.gz
Type: cpu
Duration: 1mins, Total samples = 37.25s (61.97%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

After executing this command, you need to wait 60 seconds (the seconds value can be adjusted) while pprof performs CPU profiling. When it finishes, pprof enters interactive command-line mode by default, where the results can be inspected or exported. If your HTTP server uses TLS, change the invocation to: go tool pprof https+insecure://localhost:6060/debug/pprof/profile\?seconds\=60.

Next, we enter the command top10 to view the top 10 functions by resource cost (for CPU profiles this is execution time; for memory profiles, allocation size), as follows:

(pprof) top10
Showing nodes accounting for 36.23s, 97.26% of 37.25s total
Dropped 80 nodes (cum <= 0.19s)
Showing top 10 nodes out of 34
      flat  flat%   sum%        cum   cum%
    32.63s 87.60% 87.60%     32.70s 87.79%  syscall.syscall
     0.87s  2.34% 89.93%      0.88s  2.36%  runtime.stringtoslicebyte
     0.69s  1.85% 91.79%      0.69s  1.85%  runtime.memmove
     0.52s  1.40% 93.18%      0.52s  1.40%  runtime.nanotime
     ...
(pprof)
  • flat: the time spent in the function itself.

  • flat%: the proportion of total CPU time spent in the function itself.

  • sum%: the cumulative total of flat% down to and including this row.

  • cum: the time spent in the function itself plus the functions it calls.

  • cum%: that combined time as a proportion of total CPU time.

  • Name: the function name.

In most cases these five columns let us get a sense of how the application is running — which functions are doing what, how much resource each one consumes, and who is the biggest spender — and find an initial direction for analysis.

pprof supports many other commands in interactive mode; type help to see the usage instructions.

Heap Profiling

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.011.pb.gz
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)


As you can see, the Type option defaults to inuse_space. Heap profiles can in fact be analyzed under several sample types, the most common being:

  • inuse_space: analyzes the application's resident memory usage.

$ go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap
(pprof) top
Showing nodes accounting for 4.01GB, 100% of 4.01GB total
      flat  flat%   sum%        cum   cum%
    4.01GB   100%   100%     4.01GB   100%  main.Add
         0     0%   100%     4.01GB   100%  main.main.func1

  • alloc_objects: analyzes the application's temporary memory allocations.

$ go tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap
(pprof) top
Showing nodes accounting for 215552550, 100% of 215560746 total
Dropped 14 nodes (cum <= 1077803)
      flat  flat%   sum%        cum   cum%
  86510197 40.13% 40.13%   86510197 40.13%  main.Add
  85984544 39.89% 80.02%   85984544 39.89%  fmt.Sprintf
  43057809 19.97%   100%  215552550   100%  main.main.func1
         0     0%   100%   85984544 39.89%  log.Printf

In addition, there are the inuse_objects and alloc_space sample types, which correspond to the number of live objects per function and the cumulative bytes allocated, respectively; choose according to the situation.

Goroutine Profiling

$ go tool pprof http://localhost:6060/debug/pprof/goroutine
Fetching profile over HTTP from http://localhost:6060/debug/pprof/goroutine
Saved profile in /Users/eddycjy/pprof/pprof.goroutine.003.pb.gz
Type: goroutine
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) 


To inspect goroutines, you can use the traces command, which prints the call stack and metrics for every entry. With it you can easily see the whole call chain, how many goroutines are involved, and who the real caller is. The output looks like this:

(pprof) traces
Type: goroutine
-----------+-------------------------------------------------------
         2   runtime.gopark
             runtime.netpollblock
             internal/poll.runtime_pollWait
             ...
-----------+-------------------------------------------------------
         1   runtime.gopark
             runtime.netpollblock
             ...
             net/http.ListenAndServe
             main.main
             runtime.main


In each call stack the display order is bottom-up: runtime.main calls main.main, and main.main calls net/http.ListenAndServe — which makes it very easy to scan.

Each call-stack entry is separated by a ----------- divider, and the number in front of a function is the metric for the current profile type: for goroutine profiling it is the number of goroutines occupying that stack, while for heap profiling it is the amount of memory in use, as follows:

$ go tool pprof http://localhost:6060/debug/pprof/heap
...
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) traces
Type: inuse_space
-----------+-------------------------------------------------------
    bytes:  13.55MB
   13.55MB   main.Add
             main.main.func1
-----------+-------------------------------------------------------

In fact, virtually every pprof command shows different results depending on the profile type.

Mutex Profiling

Blocking can be caused by chan (channels), sync.Mutex (locks), time.Sleep(), and so on. To verify the stack traces of contended mutex holders, let's adapt the earlier example to use sync.Mutex, like this:

func init() {
	runtime.SetMutexProfileFraction(1)
}

func main() {
	var m sync.Mutex
	var datas = make(map[int]struct{})
	for i := 0; i < 999; i++ {
		go func(i int) {
			m.Lock()
			defer m.Unlock()

			datas[i] = struct{}{}
		}(i)
	}

	_ = http.ListenAndServe(":6061", nil)
}

Pay special attention to the runtime.SetMutexProfileFraction statement: to collect mutex profiles, you must call this method to set the sampling fraction. If it is not set, or is set to a value less than or equal to 0, nothing is collected by default.

Next, we run go tool pprof to analyze it, as follows:

$ go tool pprof http://localhost:6061/debug/pprof/mutex
Fetching profile over HTTP from http://localhost:6061/debug/pprof/mutex
Saved profile in /Users/eddycjy/pprof/pprof.contentions.delay.010.pb.gz
Type: delay
Entering interactive mode (type "help" for commands, "o" for options)


Let's call the top command to see the ranking of mutex contention:

(pprof) top
Showing nodes accounting for 653.79us, 100% of 653.79us total
      flat  flat%   sum%        cum   cum%
  653.79us   100%   100%   653.79us   100%  sync.(*Mutex).Unlock
         0     0%   100%   653.79us   100%  main.main.func1


We can then call the list command to see the source of a given function (annotated with concrete metrics, such as time). If you are unsure of the exact function name, the argument is matched as a pattern, as follows:

(pprof) list main
Total: 653.79us
ROUTINE ======================== main.main.func1 in /eddycjy/main.go
         0   653.79us (flat, cum)   100% of Total
         .          .     40:	go func(i int) {
         .          .     41:		m.Lock()
         .          .     42:		defer m.Unlock()
         .          .     43:
         .          .     44:		datas[i] = struct{}{}
         .   653.79us     45:	}(i)
         .          .     46:	}
         .          .     47:
         .          .     48:	err := http.ListenAndServe(":6061", nil)
         .          .     49:	if err != nil {
         .          .     50:		log.Fatalf("http.ListenAndServe err: %v", err)
(pprof)

From the output we can see exactly which function causes the mutex contention and which line carries the lock cost — in this case, line 45.

Block Profiling

Similar to runtime.SetMutexProfileFraction for mutexes, block profiling also requires a call to runtime.SetBlockProfileRate(); otherwise it is off by default. A value less than or equal to 0 is treated as disabled.

Compared with the Mutex section, the main code stays the same; we only add the runtime.SetBlockProfileRate() call, as follows:

func init() {
 runtime.SetBlockProfileRate(1)
 ...
}


Let's call the top command to see the ranking of blocking:

$ go tool pprof http://localhost:6061/debug/pprof/block
Fetching profile over HTTP from http://localhost:6061/debug/pprof/block
Saved profile in /Users/eddycjy/pprof/pprof.contentions.delay.017.pb.gz
Type: delay
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 74.54ms, 100% of 74.54ms total
      flat  flat%   sum%        cum   cum%
   74.54ms   100%   100%    74.54ms   100%  sync.(*Mutex).Lock
         0     0%   100%    74.54ms   100%  main.main.func1

Similarly, we can call the list command to see the specific blocking site; the workflow and troubleshooting approach are the same as outlined earlier.

Viewing the Visual Interface

Let’s continue with the previous example program, re-run it, and then execute the following command in another window:

$ wget http://127.0.0.1:6060/debug/pprof/profile

By default this takes 30 seconds. When it completes, the collected profile file can be found in the current directory. To reach the visual interface, we can take either of two approaches:

  1. Method one (recommended): this command runs a pprof analysis site on the specified port.
$ go tool pprof -http=:6001 profile 

Copy the code
  2. Method two: use the web command to write the graph to an SVG file and open it in a browser.
$ go tool pprof profile
Type: cpu
Time: Feb 1, 2020 at 12:09pm (CST)
Duration: 30s, Total samples = 60ms ( 0.2%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web

If you see the error "Could not execute dot; may need to install graphviz", it means you need to install the Graphviz component.

In addition, the site run by method one actually includes what method two produces (the SVG image) and is more flexible, so unless there is a special reason we will use method one to run the site for observation and analysis.

Analyze the content

Through the visual interface provided by pprof, we can see the Go application's call chains and resource usage far more conveniently and intuitively. In the View menu bar, pprof also supports switching among several analysis views, as follows:

The view menu bar

We will walk through the views using a profile captured with CPU profiling, but the other profile types work the same way — once you understand one, the rest follow.

Top

The top section

This view has the same function and meaning as the top subcommand explained earlier, so I won’t repeat it.

Graph

Graph columns

This view shows the overall function call flow. The bigger the box, the thicker the line, and the redder the color, the more time a function takes and the higher its cost; conversely, a smaller, lighter box means a relatively small share of the overall call flow's cost.

So we can use this view to analyze who is the most expensive and what call flow it is invoked for.

Peek

Peek columns

Compared with the Top view, this view adds the surrounding context: each function's callers and callees.

Source

The source program

This view mainly adds source-oriented tracing and analysis, so you can see where most of the overhead is going.

Flame Graph

An overview of the flame graph

The flame graph is interactive; calls read top-down (A -> B -> C -> D), each block represents a function, and the redder and wider a block, the more CPU time it occupies. Clicking a block drills into it for deeper analysis.

We select the main.main.func1 block on the page, which will go to the next level below it, as follows:

Take a closer look at Flame Graph

In this way, we can analyze different functions according to their multidimensional levels, and better observe their flow and find problems.

Profiling through test cases

In the previous sections we collected data by instrumenting the running application; there is another way to profile exactly the flow or function you care about, more precisely: test cases.

First we move the Add method written earlier into a separate package, in a file called add.go. Then we create a new add_test.go file and write the following test code:

func TestAdd(t *testing.T) {
 _ = Add("go-programming-tour-book")
}

func BenchmarkAdd(b *testing.B) {
 for i := 0; i < b.N; i++ {
  Add("go-programming-tour-book")
 }
}


After finishing coding, we go back to the command line window and execute the following collection command:

$ go test -bench=. -cpuprofile=cpu.profile


A cpu.profile file is generated in the current directory; simply run the go tool pprof command against it to inspect it, as shown below:

cpu profile

In addition to CPU analysis, we can also adjust the options to analyze the memory condition, as follows:

$ go test -bench=. -memprofile=mem.profile


Then, as before, run the go tool pprof command to inspect it, as shown below:

Further inspection

Write files to parse by Lookup

Besides the net/http/pprof handlers and go test, we can also collect profiles from inside the program itself via the Lookup method provided by runtime/pprof.

It supports six profiles, namely:

  • goroutine

  • threadcreate

  • heap

  • allocs

  • block

  • mutex

The specific code is as follows:

type LookupType int8

const (
 LookupGoroutine LookupType = iota
 LookupThreadcreate
 LookupHeap
 LookupAllocs
 LookupBlock
 LookupMutex
)

func pprofLookup(lookupType LookupType, w io.Writer) error {
 var err error
 switch lookupType {
 case LookupGoroutine:
  p := pprof.Lookup("goroutine")
  err = p.WriteTo(w, 2)
 case LookupThreadcreate:
  p := pprof.Lookup("threadcreate")
  err = p.WriteTo(w, 2)
 case LookupHeap:
  p := pprof.Lookup("heap")
  err = p.WriteTo(w, 2)
 case LookupAllocs:
  p := pprof.Lookup("allocs")
  err = p.WriteTo(w, 2)
 case LookupBlock:
  p := pprof.Lookup("block")
  err = p.WriteTo(w, 2)
 case LookupMutex:
  p := pprof.Lookup("mutex")
  err = p.WriteTo(w, 2)
 }

 return err
}


This method takes an io.Writer, so as long as the destination implements Write, we can send the profile anywhere, as follows:

...
func init() {
 runtime.SetMutexProfileFraction(1)
 runtime.SetBlockProfileRate(1)
}

func main() {
 http.HandleFunc("/lookup/heap", func(w http.ResponseWriter, r *http.Request) {
  _ = pprofLookup(LookupHeap, os.Stdout)
 })
 http.HandleFunc("/lookup/threadcreate", func(w http.ResponseWriter, r *http.Request) {
  _ = pprofLookup(LookupThreadcreate, os.Stdout)
 })
 http.HandleFunc("/lookup/block", func(w http.ResponseWriter, r *http.Request) {
  _ = pprofLookup(LookupBlock, os.Stdout)
 })
 http.HandleFunc("/lookup/goroutine", func(w http.ResponseWriter, r *http.Request) {
  _ = pprofLookup(LookupGoroutine, os.Stdout)
 })
 _ = http.ListenAndServe("0.0.0.0:6060", nil)
}


In the code above we write the collected results to the console. We can verify this by calling http://127.0.0.1:6060/lookup/heap; the console output is as follows:

$ go run main.go
heap profile: 0: 0 [0: 0] @ heap/1048576

# runtime.MemStats
# Alloc = 180632
# TotalAlloc = 180632
# Sys = 69928960
# Lookups = 0
...


Why initialize NET/HTTP /pprof

In our example, you can see that we imported net/http/pprof with a blank identifier (_) purely for its initialization side effects. If you omitted this import, you would find that pprof's endpoints cannot be reached at all. Its init function registers the handlers:

func init() {
 http.HandleFunc("/debug/pprof/", Index)
 http.HandleFunc("/debug/pprof/cmdline", Cmdline)
 http.HandleFunc("/debug/pprof/profile", Profile)
 http.HandleFunc("/debug/pprof/symbol", Symbol)
 http.HandleFunc("/debug/pprof/trace", Trace)
}


Digging a little further into net/http/pprof, we can see that http.HandleFunc registers these routes on the default DefaultServeMux:

var DefaultServeMux = &defaultServeMux
var defaultServeMux ServeMux

func HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
 DefaultServeMux.HandleFunc(pattern, handler)
}


The HTTP server used in this example is the standard library's default one, so everything fits together perfectly — which is exactly the minimal-example pattern.

Another thing you may notice: real projects usually have their own, relatively independent ServeMux. In that case, we just register the pprof routes on it ourselves, as follows:

 mux := http.NewServeMux()
 mux.HandleFunc("/debug/pprof/", pprof.Index)
 mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
 mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
 mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
 mux.HandleFunc("/debug/pprof/trace", pprof.Trace)


conclusion

In this article, we introduce the use of Pprof in Go language in detail, and explain some commonly used suites. Pprof plays a very important role in our performance analysis and troubleshooting.

Day to day, as long as you follow a sound troubleshooting approach, I believe you will be able to solve problems big and small by following the clues pprof gives you — and get off work on time!

I’m fried fish. See you next time 🙂

If you have any questions, welcome feedback and communication in the comments section. The best relationship is mutual achievement. Your praise is the biggest motivation for the creation of fried fish, thank you for your support.

This article is continuously updated. You can read it by searching "Brain into fried fish" on WeChat. It is also included on GitHub at github.com/eddycjy/blo… — stars and nudges for updates are welcome.