Allocation Efficiency in High-Performance Go Services

About tooling

Our first piece of advice: don’t optimize prematurely. Go provides excellent performance-tuning tools that point directly at the places where your code allocates heavily. There’s no need to reinvent the wheel; I suggest reading this great post on Go’s official blog, which walks through tuning CPU and memory with pprof step by step. At Segment we also used these tools to find performance bottlenecks in our projects.

Use data to drive optimization.

Escape analysis

Go automatically manages memory, which helps us avoid a whole class of bugs, but it doesn’t completely free programmers from thinking about allocation. Because Go does not provide a way to manipulate memory directly, developers must understand its internals in order to get the best performance out of it.

If you can only remember one thing after reading this article, remember this: stack allocation is cheap, heap allocation is expensive. Now let’s dig a little deeper into what this means.

Go has two places to allocate memory: a global heap for dynamic allocations, and a stack that each goroutine owns. Go prefers the stack when it can; a Go program allocates most of its memory there. Stack allocation is cheap because it requires only two CPU instructions: one to push the data onto the stack, and one to pop it off when it is released.

Unfortunately, not all memory can be allocated on the stack. Stack allocation requires that the lifetime and memory footprint of a variable be known at compile time. Otherwise, a dynamic allocation on the heap happens at run time: the allocator must find a chunk of memory large enough to hold the new value, and later the garbage collector must scan the heap for objects that are no longer in use. Needless to say, this is dramatically more expensive than the two instructions of a stack allocation.

The memory footprint represents all the blocks of memory associated with a variable. For example, if a struct contains a member *int, then the memory block referred to by *int is part of the footprint of the struct.

The compiler uses escape analysis to choose between the two. The basic idea is to do the work of garbage collection at compile time: the compiler tracks the scope of a variable across code blocks, and the variable must pass a set of checks proving that its entire lifetime is known at compile time. If it passes, the variable can be allocated on the stack. Otherwise it escapes, and it must be allocated on the heap.

The rules of escape analysis are not part of the Go language specification; the most effective way for Go programmers to learn them is through experience. The build command go build -gcflags '-m' makes the compiler print the results of escape analysis. Let’s look at an example:

package main

import "fmt"

func main() {
        x := 42
        fmt.Println(x)
}
$ go build -gcflags '-m' ./main.go
# command-line-arguments
./main.go:7: x escapes to heap
./main.go:7: main ... argument does not escape

We see x escapes to heap, which means it will be dynamically allocated on the heap at run time. This example is a bit puzzling: intuitively, it’s obvious that x does not outlive main(), and the compiler doesn’t explain why it thinks the variable escapes. To get more detail, pass the -m flag more than once, which prints increasingly verbose output:

$ go build -gcflags '-m -m' ./main.go
# command-line-arguments
./main.go:5: cannot inline main: non-leaf function
./main.go:7: x escapes to heap
./main.go:7:         from ... argument (arg to ...) at ./main.go:7
./main.go:7:         from *(... argument) (indirection) at ./main.go:7
./main.go:7:         from ... argument (passed to call[argument content escapes]) at ./main.go:7
./main.go:7: main ... argument does not escape

Yes, it shows that the variable x escaped because it was passed into a function that escaped.

This mechanism may seem elusive at first, but with a few uses of the tool, the pattern becomes clear. To make a long story short, here are some typical cases we found that can cause variables to escape to the heap:

  • Sending a pointer, or a value containing a pointer, to a channel. At compile time there is no way to know which goroutine will receive the data, so the compiler cannot determine when the variable will be released.

  • Storing pointers, or values containing pointers, in a slice. A typical example is []*string. This causes the contents of the slice to escape: even though the backing array may be allocated on the stack, the values it references must live on the heap.

  • The backing array of a slice is reallocated because an append exceeds its capacity (cap). If the size of a slice is known at compile time, it can start out on the stack; if its backing store must grow based on runtime data, it is allocated on the heap.

  • Calling a method on an interface type. Method calls on interface types are dynamically dispatched: the concrete implementation is known only at run time. Consider a variable r of type io.Reader. Calling r.Read(b) causes both the value of r and the backing array of the slice b to escape, so both are allocated on the heap.

In our experience, these four points are the most common causes of heap allocation in Go programs. Fortunately, there is a solution! Let’s dive into some concrete examples of how to locate memory performance problems on online systems.
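To make the first case concrete, here is a minimal sketch (the names sendPtr and sendVal are hypothetical, not from any codebase). Compiling it with go build -gcflags '-m' should report the variable in sendPtr moving to the heap, while the one in sendVal stays on the stack:

```go
package main

import "fmt"

// sendPtr sends a pointer on a channel. The compiler cannot tell which
// goroutine will dereference it or when, so v is moved to the heap.
func sendPtr(ch chan *int) {
	v := 42
	ch <- &v
}

// sendVal copies the value into the channel; nothing here needs to
// outlive the function's stack frame.
func sendVal(ch chan int) {
	v := 42
	ch <- v
}

func main() {
	pc := make(chan *int, 1)
	vc := make(chan int, 1)
	sendPtr(pc)
	sendVal(vc)
	p := <-pc
	v := <-vc
	fmt.Println(*p + v) // prints "84"
}
```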

About pointers

A good rule of thumb: the data a pointer points to is allocated on the heap. Therefore, reducing the use of pointers in a program reduces heap allocation. This is not absolute, but we have found it to be the most common cause in real-world problems.

In general, we assume: “Copying values is expensive, so use a pointer instead.” However, in many cases copying a value directly is much cheaper than using a pointer. Why?

  • The compiler emits checks when dereferencing a pointer, so it can panic() if the pointer is nil rather than corrupting memory. That is extra code executed at run time. When data is passed by value, none of this is needed: a value cannot be nil.

  • Pointers usually have poor locality of reference. All values used inside a function are allocated on its stack space. Locality of reference is an important part of efficient code: it keeps hot data in the CPU caches, reducing the chance of a cache miss during instruction prefetch.

  • Copying objects that fit within a cache line is roughly as efficient as copying a pointer. CPUs move memory between cache levels and main memory in cache lines of fixed size, 64 bytes on x86 machines. Furthermore, Go uses Duff’s device to make routine memory operations like copying and zeroing more efficient.

Pointers should primarily be used to express ownership and mutability of data. Using pointers merely to avoid copying should be rare in real projects. Don’t fall into the trap of premature optimization: get into the habit of passing by value, and reach for pointers only when needed. A bonus is that you’ll see fewer nil-related safety issues.
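The nil point is easy to demonstrate. In this hypothetical sketch (byValue, byPointer, and config are illustrative names), the value parameter needs no guard at all, while the pointer parameter must be checked before use:

```go
package main

import "fmt"

type config struct{ retries int }

// byValue receives a copy; it can never be nil, so no check is needed.
func byValue(c config) int { return c.retries }

// byPointer must guard against nil or risk a panic at the dereference.
func byPointer(c *config) int {
	if c == nil { // without this guard, byPointer(nil) would panic
		return 0
	}
	return c.retries
}

func main() {
	fmt.Println(byValue(config{retries: 3}), byPointer(nil)) // prints "3 0"
}
```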

Another advantage of reducing pointer use is that if the compiler can prove a block of memory contains no pointers, the garbage collector can skip it entirely. For example, the heap memory backing a []byte never needs to be scanned. The same is true for arrays and structs whose types contain no pointers.

When the garbage collector scans a variable, it checks whether its type contains pointers. If so, it must follow each pointer and check whether the memory it points to is still reachable, and so on, recursively. If the type contains no pointers, the garbage collector can skip the recursive scan entirely and reclaim the memory directly.

Reducing pointer use doesn’t just lower garbage-collection work; it also produces more cache-friendly code. Reading memory moves data from main memory into the CPU caches. Cache space is limited, so other data must be evicted to make room, and the evicted data may belong to another part of the program. The resulting cache thrashing can cause unexpected, sudden latency spikes in online services.

More on pointers

Reducing pointer use means digging into our custom data types. One of our services uses a circular buffer built from a pair of data structures to queue failed operations for retry; it looks something like this:

type retryQueue struct {
    buckets       [][]retryItem // each bucket represents a 1 second interval
    currentTime   time.Time
    currentOffset int
}

type retryItem struct {
    id   ksuid.KSUID // ID of the item to retry
    time time.Time   // exact time at which the item has to be retried
}

The buckets array has a fixed size, but the number of items in each []retryItem changes at runtime: the more retries, the larger those slices grow.

Digging into the retryItem implementation, we find that KSUID is a type alias for [20]byte, which contains no pointers, so it can be ruled out. currentOffset is an int, also of fixed size. Next, look at the implementation of time.Time:

type Time struct {
    sec  int64
    nsec int32
    loc  *Location // pointer to the time zone structure
}

The time.Time struct contains a pointer member, loc. Using it inside retryItem means that every time the GC passes over this area of the heap, it has to chase the pointer inside the structure.

We found that this case is typical. Failures are rare during normal operation. Only a small amount of memory is used to store retry operations. When failures suddenly spike, the number of objects in the retry queue grows by thousands per second, putting a lot of pressure on the garbage collector.

In this case, the time-zone information in time.Time is unnecessary: these in-memory timestamps are never serialized. So we can rewrite the data structure to avoid it:

type retryItem struct {
    id   ksuid.KSUID
    nsec uint32
    sec  int64
}

func (item *retryItem) time() time.Time {
    return time.Unix(item.sec, int64(item.nsec))
}

func makeRetryItem(id ksuid.KSUID, time time.Time) retryItem {
    return retryItem{
        id:   id,
        nsec: uint32(time.Nanosecond()),
        sec:  time.Unix(),
    }
}

Notice that retryItem no longer contains any pointers. This greatly reduces GC pressure, because the entire footprint of a retryItem is now known at compile time.

About slices

Slices are fertile ground for inefficient allocation. Unless the size of a slice is known at compile time, its backing array (like a map’s storage) is allocated on the heap. Let’s look at a couple of ways to get slices allocated on the stack instead.

One of our projects relies heavily on MySQL, and its overall performance depends heavily on the MySQL client driver. After analyzing allocations with pprof, we found that the driver’s code for serializing time.Time values was very inefficient.

The profiler showed that a large percentage of heap allocations went to serializing time.Time; this was the driver’s hot spot.


Go tool pprof result

The inefficient code calls the Format() method of time.Time, which returns a string. Wait, weren’t we talking about slices? Well, according to the Go blog, a string is essentially a read-only []byte with a little extra language support; the memory-allocation rules are the same.
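A tiny demonstration of that equivalence, and of the read-only part: converting between string and []byte copies the bytes, so writing to the copy never mutates the original string:

```go
package main

import "fmt"

func main() {
	s := "slice"
	b := []byte(s) // the conversion copies: a string is read-only
	b[0] = 'S'
	fmt.Println(s, string(b)) // prints "slice Slice": the original string is unchanged
}
```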

The analysis shows that 12.38% of memory allocation is caused by Format(). What does Format() do?


It turns out the standard library offers a more efficient way to achieve the same result: Format() is convenient, but AppendFormat() is friendlier to the allocator. Reading the time package source, we see that the standard library itself uses AppendFormat() internally, which hints at its better performance.


In effect, the Format() function is just a wrapper around AppendFormat().

func (t Time) Format(layout string) string {
          const bufSize = 64
          var b []byte
          max := len(layout) + 10
          if max < bufSize {
                  var buf [bufSize]byte
                  b = buf[:0]
          } else {
                  b = make([]byte, 0, max)
          }
          b = t.AppendFormat(b, layout)
          return string(b)
}

More importantly, AppendFormat() gives the programmer room for optimization. Instead of returning a new string, it appends into a slice the caller supplies. Using AppendFormat() instead of Format() lets the same work happen in a caller-owned, fixed-size buffer, and those operations can stay entirely in stack space.

Interface type

It is well known that calling a method through an interface is slower than calling it directly on a struct. Interface method calls are dynamically dispatched, which severely limits what the compiler can determine about how the code will execute at run time. So far we’ve talked about shaping code so the compiler can understand its behavior at compile time; interface types throw much of that away.

Unfortunately, interface types are also a very useful abstraction: they let us write more extensible code. A common example is the hash functions in the standard library’s hash package, which defines a common interface and then provides several concrete implementations. Let’s look at an example:

package main

import (
        "fmt"
        "hash/fnv"
)

func hashIt(in string) uint64 {
        h := fnv.New64a()
        h.Write([]byte(in))
        out := h.Sum64()
        return out
}

func main() {
        s := "hello"
        fmt.Printf("The FNV64a hash of '%v' is '%v'\n", s, hashIt(s))
}

Compiling the previous code, along with the escape analysis parameters, produces the following output:

./foo1.go:9:17: inlining call to fnv.New64a
./foo1.go:10:16: ([]byte)(in) escapes to heap
./foo1.go:9:17: hash.Hash64(&fnv.s·2) escapes to heap
./foo1.go:9:17: &fnv.s·2 escapes to heap
./foo1.go:9:17: moved to heap: fnv.s·2
./foo1.go:8:24: hashIt in does not escape
./foo1.go:17:13: s escapes to heap
./foo1.go:17:59: hashIt(s) escapes to heap
./foo1.go:17:12: main ... argument does not escape

This shows that the hash object, the input string, and the []byte all escape to the heap. To the naked eye it’s obvious that none of this data actually outlives the call, but the interface type ties the compiler’s hands: there is no way to safely invoke the concrete implementation without going through the interface. So unless you use an implementation that avoids interfaces entirely, there’s no good way around this.

A little trick

This last one is more amusing than useful, but it offers a deeper look into the compiler’s escape analysis. While reading library source code to chase performance problems, we came across code like this:

// noescape hides a pointer from escape analysis. noescape is
// the identity function but escape analysis doesn't think the
// output depends on the input. noescape is inlined and currently
// compiles down to zero instructions.
// USE CAREFULLY!
//go:nosplit
func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

This function hides the pointer argument from the compiler’s escape analysis. What does that mean? Let’s take a look at an example.

package main

import (
    "unsafe"
)

type Foo struct {
    S *string
}

func (f *Foo) String() string {
    return *f.S
}

type FooTrick struct {
    S unsafe.Pointer
}

func (f *FooTrick) String() string {
    return *(*string)(f.S)
}

func NewFoo(s string) Foo {
    return Foo{S: &s}
}

func NewFooTrick(s string) FooTrick {
    return FooTrick{S: noescape(unsafe.Pointer(&s))}
}

func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

func main() {
    s := "hello"
    f1 := NewFoo(s)
    f2 := NewFooTrick(s)
    s1 := f1.String()
    s2 := f2.String()
    _, _ = s1, s2 // keep the compiler happy: s1 and s2 are otherwise unused
}

The code contains two implementations of the same functionality: each holds a string and returns it from a String() method. However, the compiler’s escape analysis output shows that the FooTrick version does not escape:

./foo3.go:24:16: &s escapes to heap
./foo3.go:23:23: moved to heap: s
./foo3.go:27:28: NewFooTrick s does not escape
./foo3.go:28:45: NewFooTrick &s does not escape
./foo3.go:31:33: noescape p does not escape
./foo3.go:38:14: main &s does not escape
./foo3.go:39:19: main &s does not escape
./foo3.go:40:17: main f1 does not escape
./foo3.go:41:17: main f2 does not escape

The key is these two lines

./foo3.go:24:16: &s escapes to heap
./foo3.go:23:23: moved to heap: s

The compiler recognizes that NewFoo() takes the address of the string and stores it in the structure, so the string escapes. There is no such output for NewFooTrick(), and if you remove the call to noescape(), the escape reappears. So what happened?

func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

The noescape() function breaks the dependency between input and output. The compiler doesn’t think p can escape via x, because the uintptr() conversion produces a value that escape analysis treats as opaque. The built-in uintptr type may look like a real pointer type, but from the compiler’s perspective it is just an integer that happens to be big enough to hold a pointer. The final line then converts that integer back into an unsafe.Pointer.

noescape() is used heavily in the runtime package wherever unsafe.Pointer appears. It is useful when the author knows for certain that the data referenced by an unsafe.Pointer does not escape, but the compiler cannot prove it.

But keep in mind: we strongly discourage using this technique. There’s a reason the package is called unsafe, and a reason the source code carries the USE CAREFULLY! comment.

Tips:

  1. Don’t optimize prematurely; use data to drive your optimization work.
  2. Stack space allocation is cheap, heap space allocation is expensive.
  3. Understanding escape allows us to write more efficient code.
  4. Heavy pointer use makes stack allocation less likely.
  5. Find apis that provide allocation control in inefficient blocks of code.
  6. Use interfaces sparingly where they are frequently invoked.