Memory leaks can occur in many forms throughout a system. Beyond sloppy code that forgets to close a resource that should be closed, memory leaks are more often caused by poor design decisions or careless business logic that fails to consider certain boundary conditions.

For example, a database query that fails to apply a filter condition under certain circumstances forces the program to hold a large result set. If, over a period of time, more and more threads perform the same task, memory leaks away.

Go provides us with the pprof tool. In this article, we will focus on how to use pprof to troubleshoot memory leaks.

I will not go into detail about how to add pprof to a system developed in Go, because my previous articles explained the installation and use of pprof in detail:

  • Golang program performance analysis (1) PPROF and flame map analysis
  • Using Pprof in Echo and Gin frameworks
  • Golang program performance analysis (3) gRPC service performance analysis with Pprof

Of course, if you want to try something smarter and let the application monitor itself, dumping memory and CPU call stack information when jitter occurs, take a look at the following two articles, where I introduce the approach and a useful library.

  • Learn these tricks and let Go monitor itself
  • Design and implementation of automatic sampling performance analysis for Go service

Which metric should we look at for memory leaks?

The pprof tool set provides the ability to sample a variety of performance metrics within a Go program. The metrics we use most often are:

  • Profile: CPU sampling
  • Heap: A sampling of the memory allocation of active objects in the heap
  • Goroutine: Stack information of all goroutines
  • Allocs: Samples the memory allocation information of all objects since the program started (including memory already collected by the GC)
  • Threadcreate: Stack traces that led to the creation of new operating system threads

Heap and allocs are the two memory-related metrics. The allocs metric samples the memory allocation information of all objects since the program started. We usually look at allocs when we want to find code whose memory allocation can be made more efficient.

Heap

Pprof’s heap metric is a sampling of the memory allocation of live objects in the heap. The heap is where memory for the objects our code uses is allocated, and where the GC reclaims memory. Which objects in Go are allocated on the heap? In general, objects referenced by more than one function, global variables, and objects larger than a certain size (32KB) are allocated on the heap; Go’s escape analysis also moves objects to the heap in other situations.

We won’t go into detail here about which variables are allocated on the heap or how memory escapes. For details, see the following two articles.

  • Illustrates the memory allocation strategy for Go memory manager
  • Go memory management code escape analysis

Heap sampling

To collect samples for the heap metric with pprof, one approach is to use the “net/http/pprof” package.

import (
	"net/http"
	"net/http/pprof"
)

func main() {
	http.HandleFunc("/debug/pprof/heap", pprof.Index)
	......
	http.ListenAndServe(":80", nil)
}

The sample is then fetched with an HTTP request:

curl -sk -v https://example.com/debug/pprof/heap > heap.out

The other major approach is to save the sample to a file using the method provided by runtime/pprof.

pprof.Lookup("heap").WriteTo(profile_file, 0)

How these two packages are used, and how to write the samples to files, is explained in detail in the article on automatic sampling, so I won’t spend more time on it here.

So how do we use pprof to find out where in the code the memory leak originates?

Use pprof to find out where the memory leaks are

Pprof samples the heap metric through runtime.MemProfile, which by default records allocation information once per 512KB of allocated memory. We could set runtime.MemProfileRate to collect information about every allocation, but this would have a serious impact on application performance.

Once we have the sample file, we can load it into an interactive console with go tool pprof.

> go tool pprof heap.out

After entering the interactive console, you will be prompted as follows:

File: heap.out
Type: inuse_space
Time: Feb 1, 2022 at 10:11am (CST)
Entering interactive mode (type "help" for commands, "o" for options)

Where Type: inuse_space specifies the type of samples in the file. Possible values of Type are:

  • inuse_space – memory allocated but not yet freed
  • inuse_objects – number of allocated but not yet freed objects
  • alloc_space – total amount of memory allocated (including memory already freed)
  • alloc_objects – total number of objects allocated (including objects already freed)
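You can also switch which sample type pprof displays without re-collecting: go tool pprof accepts a -sample_index flag (and shorthand flags named after the sample types), assuming heap.out is the file collected earlier.

```shell
# Show total allocations including freed memory, instead of the default inuse_space
go tool pprof -sample_index=alloc_space heap.out

# Shorthand flags for the same sample types also exist
go tool pprof -alloc_objects heap.out
```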

Next comes the pprof interactive command top, which can also take a number, such as top10. Similar to the top command on Linux, it outputs the top N functions by resource usage.

(pprof) top10
Showing nodes accounting for 134.55MB, 92.16% of 145.99MB total
Dropped 60 nodes (cum <= 0.73MB)
Showing top 10 nodes out of 117
      flat  flat%   sum%        cum   cum%
   60.53MB 41.46% 41.46%    85.68MB 58.69%  github.com/jinzhu/gorm.glob..func2
   18.65MB 12.77% 54.24%    18.65MB 12.77%  regexp.(*Regexp).Split
   16.95MB 11.61% 65.84%    16.95MB 11.61%  github.com/jinzhu/gorm.(*Scope).AddToVars
    8.67MB  5.94% 71.78%   129.05MB 88.39%  example.com/xxservice/dummy.GetLargeData
    6.50MB  4.45% 87.08%     6.50MB  4.45%  fmt.Sprintf
       4MB  2.74% 89.82%        4MB  2.74%  runtime.malg
    1.91MB  1.31% 91.13%     1.91MB  1.31%  strings.Replace
    1.51MB  1.03% 92.16%     1.51MB  1.03%  bytes.makeSlice

Here, most of the top entries are gorm library methods. gorm is an ORM library, so the memory is more likely being leaked by the business logic behind it: the dummy.GetLargeData method.

In the output of the top command, we see two values, flat and cum.

  • flat: the memory allocated and held by the function itself.
  • cum: the memory allocated by the function plus the functions below it in its call stack.

In addition, sum% is the running total of the flat% column. For example, the sum% value of the fourth line above, 71.78%, is the sum of its own flat% and the flat% values of the three lines above it.

After locating the function that caused the memory leak, the next step is to drill into the function to look for misuse or logic omissions, such as the example at the beginning where a query condition is not applied in some cases.

Of course, if you want to find out exactly which line of code inside a function is responsible for the allocations, you can use the list command.

The list command shows the memory allocated at run time by each line of code within a function (or CPU usage, when analyzing a CPU profile).

(pprof) list dummy.GetLargeData
Total: 814.62MB
ROUTINE ======================== dummy.GetLargeData in /home/xxx/xxx/xxx.go
  814.62MB   814.62MB (flat, cum)   100% of Total
         .          .     20:	}()
         .          .     21:
         .          .     22:	tick := time.Tick(time.Second / 100)
         .          .     23:	var buf []byte
         .          .     24:	for range tick {
  814.62MB   814.62MB     25:		buf = append(buf, make([]byte, 1024*1024)...)
         .          .     26:	}
         .          .     27: }
         .          .     28:

Conclusion

This article briefly summarized how to use pprof to detect memory leaks in applications. Of course, if your company has the capacity for continuous profiling, or the automatic sampling scheme described in my previous article, it is better to use that and let the machines do the work for us.

However, whatever method we use, it can only help us locate where the memory leak occurs. As for how to fix the problem, I feel most memory leaks come down to business logic and system design; cases where a resource is simply neglected, forgotten, or misused are relatively rare. So the fix has to be worked out case by case.