Preallocate slice memory

When a slice's capacity is less than 1024, append grows the capacity by a factor of two; above 1024, it grows by a factor of roughly 1.25 (the exact strategy varies by Go version). Take a look at the following example:

func appendOne(num int) []int {
	var res []int
	for i := 0; i < num; i++ {
		res = append(res, i)
	}
	return res
}

func appendMany(num int) []int {
	res := make([]int, 0, num)
	for i := 0; i < num; i++ {
		res = append(res, i)
	}
	return res
}

appendOne does not specify an initial capacity; appendMany does. Here's a benchmark:

func BenchmarkAppendOne(b *testing.B) {
	num := 10000
	for i := 0; i < b.N; i++ {
		_ = appendOne(num)
	}
}

func BenchmarkAppendMany(b *testing.B) {
	num := 10000
	for i := 0; i < b.N; i++ {
		_ = appendMany(num)
	}
}

Run the test

$ go test -bench=. -benchmem                                                      
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkAppendOne-4               23163             50675 ns/op          386296 B/op         20 allocs/op
BenchmarkAppendMany-4              96781             12241 ns/op           81920 B/op          1 allocs/op
PASS

appendMany performs a single allocation (81920 B) and takes 12241 ns per operation, while appendOne needs 20 allocations and roughly four times as long. Without preallocation, every time append exceeds the current capacity the runtime allocates a larger backing array and copies the old data over, leaving the old arrays behind as garbage; preallocating avoids both the copies and the garbage, which reduces GC pressure.
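The doubling-then-1.25x growth described above can be observed directly. The exact capacities printed depend on your Go version, so treat this as an illustrative sketch rather than a guaranteed sequence:

```go
package main

import "fmt"

func main() {
	var s []int
	prevCap := -1
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		// print every time append reallocates the backing array
		if cap(s) != prevCap {
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}
```

Each printed line marks a reallocation plus a full copy of the existing elements, which is exactly the work a size hint avoids.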

Similarly, you can specify the size of a new map.

func makeMap(num int) {
	m := make(map[int]int, num)
	for i := 0; i < num; i++ {
		m[i] = i
	}
}

This reduces the overhead of memory copying as well as rehash overhead.
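As a sketch (not from the original article), testing.AllocsPerRun can make the difference visible; the helper fill and the element count are illustrative assumptions:

```go
package main

import (
	"fmt"
	"testing"
)

// fill is a hypothetical helper that inserts n entries into m.
func fill(m map[int]int, n int) {
	for i := 0; i < n; i++ {
		m[i] = i
	}
}

func main() {
	const n = 10000
	noHint := testing.AllocsPerRun(10, func() {
		fill(make(map[int]int), n) // grows and rehashes repeatedly
	})
	withHint := testing.AllocsPerRun(10, func() {
		fill(make(map[int]int, n), n) // buckets allocated up front
	})
	fmt.Printf("no hint: %.0f allocs, with hint: %.0f allocs\n", noHint, withHint)
}
```

The size-hinted map should report far fewer allocations per run, since the buckets never need to be grown and rehashed.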

Store values in maps instead of pointers, and consider sharding large maps

In the following example, maps hold pointers and values respectively, with and without sharding.

func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func mapPointer(num int) {
	m := make(map[int]*int, num)
	for i := 0; i < num; i++ {
		m[i] = &i
	}
	runtime.GC()
	fmt.Printf("With %T, GC took %s\n", m, timeGC())
	_ = m[0]
}

func mapValue(num int) {
	m := make(map[int]int, num)
	for i := 0; i < num; i++ {
		m[i] = i
	}
	runtime.GC()
	fmt.Printf("With %T, GC took %s\n", m, timeGC())
	_ = m[0]
}

func mapPointerShard(num int) {
	shards := make([]map[int]*int, 100)
	for i := range shards {
		shards[i] = make(map[int]*int)
	}
	for i := 0; i < num; i++ {
		shards[i%100][i] = &i
	}
	runtime.GC()
	fmt.Printf("With map shards (%T), GC took %s\n", shards, timeGC())
	_ = shards[0][0]
}

func mapValueShard(num int) {
	shards := make([]map[int]int, 100)
	for i := range shards {
		shards[i] = make(map[int]int)
	}
	for i := 0; i < num; i++ {
		shards[i%100][i] = i
	}
	runtime.GC()
	fmt.Printf("With map shards (%T), GC took %s\n", shards, timeGC())
	_ = shards[0][0]
}

const N = 5e7 // 50 million

func BenchmarkMapPointer(b *testing.B) {
	mapPointer(N)
}

func BenchmarkMapValue(b *testing.B) {
	mapValue(N)
}

func BenchmarkMapPointerShard(b *testing.B) {
	mapPointerShard(N)
}

func BenchmarkMapValueShard(b *testing.B) {
	mapValueShard(N)
}

Run the benchmarks

$ go test -bench=^BenchmarkMapPointer$ -benchmem
With map[int]*int, GC took 545.139836ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapPointer-4          1     9532798100 ns/op    1387850488 B/op     724960 allocs/op

$ go test -bench=^BenchmarkMapPointerShard$ -benchmem
With map shards ([]map[int]*int), GC took 688.39764ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapPointerShard-4     1    20670458639 ns/op    4286763416 B/op    1901279 allocs/op

$ go test -bench=^BenchmarkMapValueShard$ -benchmem
With map shards ([]map[int]int), GC took 1.965519ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapValueShard-4       1    16190847776 ns/op    4385268936 B/op    1918445 allocs/op

$ go test -bench=^BenchmarkMapValue$ -benchmem
With map[int]int, GC took 22.993926ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapValue-4            1     8253025035 ns/op    1444338752 B/op     724512 allocs/op

As you can see, the sharded map that stores values has by far the lowest GC time (1.97 ms). Add GODEBUG=gctrace=1 to trace the GC:

$ GODEBUG=gctrace=1 go test -bench=^BenchmarkMapPointer$ -benchmem
...
gc 3 @0.130s 19%: 0.006+424+0.013 ms clock, 0.027+0.18/424/848+0.055 ms cpu, 1224->1224->1224 MB, 1225 MB goal, 4 P
gc 4 @9.41s 2%: 0.005+543+0.002 ms clock, 0.022+0/543/1628+0.011 ms cpu, 1325->1325->1323 MB, 2448 MB goal, 4 P (forced)
gc 5 ...: 0.003+547+0.003 ms clock, 0.013+0/547/1631+0.013 ms cpu, 1323->1323->1323 MB, 2647 MB goal, 4 P (forced)
With map[int]*int, GC took 550.40821ms

To read the log, take 0.013+0/547/1631+0.013 ms CPU as an example. A GC cycle has three phases:

  • Sweep termination (STW): the leading 0.013 is the stop-the-world pause before marking begins.
  • Concurrent marking: 0/547/1631 breaks the marking CPU time down into mutator assist time (0), time spent by dedicated and fractional GC workers (547), and idle-time marking (1631).
  • Mark termination (STW): the trailing 0.013 is the stop-the-world pause that ends marking.
$ GODEBUG=gctrace=1 go test -bench=^BenchmarkMapValue$ -benchmem
...
gc 3 @0.018s 0%: ... 1224->1224->1224 MB, 1225 MB goal, 4 P
gc 4 @8.334s 0%: 0.006+21+0.003 ms clock, 0.027+0/6.4/21+0.013 ms cpu, 1379->1379->1334 MB, 2448 MB goal, 4 P
gc 5 ...: 0.003+19+0.003 ms clock, 0.014+0/5.0/20+0.015 ms cpu, 1334->1334->1334 MB, 2668 MB goal, 4 P (forced)

As you can see, the map storing values spends far less time in GC than the one storing pointers, and the difference is almost entirely in the marking phase.

Converting between string and []byte

In Go, strings are immutable by design, so converting between string and []byte normally produces a new copy of the data.

func Example() {
	s := "Hello,world"
	b := []byte(s)
	_ = b
}

If you are sure the converted string/[]byte will not be modified, you can convert directly without copying: the new variable shares the original's underlying data pointer.

func String2Bytes(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := reflect.SliceHeader{
		Data: stringHeader.Data,
		Len:  stringHeader.Len,
		Cap:  stringHeader.Len,
	}
	return *(*[]byte)(unsafe.Pointer(&bh))
}

func Bytes2String(b []byte) string {
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := reflect.StringHeader{
		Data: sliceHeader.Data,
		Len:  sliceHeader.Len,
	}
	return *(*string)(unsafe.Pointer(&sh))
}
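On Go 1.20 and later, the same zero-copy conversions can be written with the unsafe.String/unsafe.Slice helpers instead of the reflect header types (which are now deprecated). A minimal sketch:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Zero-copy conversions using the Go 1.20+ unsafe helpers.
// The result aliases the input's memory and must not be modified.
func String2Bytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func Bytes2String(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := String2Bytes("Hello,world")
	fmt.Println(Bytes2String(b)) // Hello,world
}
```

The same caveat applies as with the reflect-based version: writing to the returned []byte mutates the immutable string and is undefined behavior.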

Return values from functions, not pointers

For a small, frequently called function, returning a pointer makes the result escape to the heap, so memory that could have lived on the stack is heap-allocated instead. Copying a small object on the stack is cheap, much cheaper than a heap allocation plus the GC work that follows. The following two functions return the same struct by value and by pointer.

type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

func byCopy() S {
	return S{
		a: 1, b: 1, c: 1,
		e: "lyp", f: "lyp",
		g: 1.0, h: 1.0, i: 1.0,
	}
}

func byPointer() *S {
	return &S{
		a: 1, b: 1, c: 1,
		e: "lyp", f: "lyp",
		g: 1.0, h: 1.0, i: 1.0,
	}
}

Benchmark functions

func BenchmarkMemoryStack(b *testing.B) {
	var s S

	f, err := os.Create("stack.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byCopy()
	}

	trace.Stop()

	b.StopTimer()
	_ = fmt.Sprintf("%v", s.a)
}

func BenchmarkMemoryHeap(b *testing.B) {
	var s *S

	f, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byPointer()
	}

	trace.Stop()

	b.StopTimer()
	_ = fmt.Sprintf("%v", s.a)
}

Run the benchmarks

$ go test ./... -bench=BenchmarkMemoryHeap -benchmem -run=^$ -count=10
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMemoryHeap-4      19625536    53.0 ns/op    96 B/op    1 allocs/op

$ go test ./... -bench=BenchmarkMemoryStack -benchmem -run=^$ -count=10
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMemoryStack-4    163253341    7.22 ns/op     0 B/op    0 allocs/op

As you can see, stack allocation (returning a value) takes 7.22 ns/op with zero heap allocations, while heap allocation (returning a pointer) takes 53.0 ns/op with one allocation per call.

Optimization using struct{}

Go has no built-in set type. To implement one, use a map with struct{} values rather than bool.

func assign(num int) {
	m := make(map[int]bool, num)
	for i := 0; i < num; i++ {
		m[i] = true
	}
}

func assignStruct(num int) {
	m := make(map[int]struct{}, num)
	for i := 0; i < num; i++ {
		m[i] = struct{}{}
	}
}

struct{} occupies zero bytes, and the compiler makes every struct{}{} value refer to the same memory address (runtime.zerobase), so the set stores no payload at all.

Tools for GC analysis

  • go tool pprof
  • go tool trace
  • go build -gcflags="-m"
  • GODEBUG=gctrace=1
