In the previous article on string concatenation, Efficient String Concatenation in Go (2), strings.Builder finally lived up to its name: the more strings we concatenated, the more obvious its performance advantage became.

At the end of that article I left a bit of suspense: there was still room for optimization. That is the topic of today's article, the third and final one in the string concatenation series, which looks at how to squeeze even more performance out of Builder. For the first article in the series, see Efficient String Concatenation (1).

Where is Builder slow

Since we want to optimize Builder concatenation, we should at least know where it is slow. Let's continue with the test cases from the previous article and run them to see the performance.

Builder10-8     5000000     258 ns/op       480 B/op        4 allocs/op
Builder100-8    1000000     2012 ns/op      6752 B/op       8 allocs/op
Builder1000-8   100000      21016 ns/op     96224 B/op      16 allocs/op
Builder10000-8  10000       195098 ns/op    1120226 B/op    25 allocs/op

To exercise Builder concatenation, four different input sizes were tested: 10, 100, 1000, and 10000 strings. Each operation performs a different number of memory allocations, and the more allocations there are, the slower it gets; if GC is triggered, it gets slower still. So the first thing to optimize is the number of memory allocations.
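As a side note, you do not have to rely on the benchmark output alone to see the allocations. Below is a minimal sketch (my own illustration, not code from the article) that counts allocations directly with testing.AllocsPerRun from the standard testing package; the function name allocsForBuilder and the strs parameter are made up for the example.

// Counts the average number of heap allocations per concatenation,
// using the standard library's testing.AllocsPerRun.
// Needs: import ("strings"; "testing")
func allocsForBuilder(strs []string) float64 {
	return testing.AllocsPerRun(100, func() {
		var b strings.Builder
		for _, s := range strs {
			b.WriteString(s)
		}
		_ = b.String()
	})
}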

Memory allocation optimization

According to the CPU profile (you can generate one with go test's -cpuprofile flag and inspect it as a flame graph with go tool pprof), the runtime.growslice function is called frequently and accounts for a large share of the running time. Let's look at the source code of Builder.WriteString:

func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}

It is the append call that triggers runtime.growslice: when b.buf does not have enough capacity (cap) for the new content, runtime.growslice has to be called to grow b.buf before the new bytes s... can be appended. Growing the slice means allocating memory and copying data, and the more content we append, the more times this reallocation happens, which matches the numbers in the performance test above.
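To see the reallocation in action, here is a small self-contained example (mine, not from the article) that appends to a byte slice and prints its capacity whenever it changes; every capacity change means append had to allocate a new, larger backing array and copy the old data over, which is exactly the work runtime.growslice does.

package main

import "fmt"

func main() {
	var buf []byte
	prevCap := cap(buf)
	for i := 0; i < 1000; i++ {
		buf = append(buf, "hello"...)
		if cap(buf) != prevCap {
			// capacity grew: a new backing array was allocated and the
			// old contents were copied into it
			fmt.Printf("len=%d cap=%d\n", len(buf), cap(buf))
			prevCap = cap(buf)
		}
	}
}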

Now that the cause of the problem has been identified, the optimization is to reduce the number of runtime.growslice calls, or avoid them entirely. Following this idea, we need to allocate the capacity for b.buf in advance. Fortunately, Builder provides a method for exactly that: Grow. We can call Grow to pre-allocate capacity before calling WriteString.

Now let’s transform our StringBuilder function.

// blog: www.flysnow.org
// WeChat official account: flysnow_org
func StringBuilder(p []string, cap int) string {
	var b strings.Builder
	l := len(p)
	b.Grow(cap)
	for i := 0; i < l; i++ {
		b.WriteString(p[i])
	}
	return b.String()
}

The new cap parameter lets the caller tell us how much capacity is needed. The implementation behind Grow is very simple: the unexported grow method uses make to allocate a larger b.buf and then copies the old b.buf into it.

func (b *Builder) grow(n int) {
	buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
	copy(buf, b.buf)
	b.buf = buf
}
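For completeness, the exported Grow method is just a thin wrapper that validates its argument and only calls the grow shown above when the existing spare capacity is not enough; in the standard library it looks roughly like this (quoted from memory of the Go source at the time of writing, so treat it as a sketch):

func (b *Builder) Grow(n int) {
	b.copyCheck()
	if n < 0 {
		panic("strings.Builder.Grow: negative count")
	}
	if cap(b.buf)-len(b.buf) < n {
		b.grow(n)
	}
}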

So now our performance test cases look like this:

func BenchmarkStringBuilder10(b *testing.B) {
	p := initStrings(10)
	cap := 10 * len(BLOG)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		StringBuilder(p, cap)
	}
}

func BenchmarkStringBuilder1000(b *testing.B) {
	p := initStrings(1000)
	cap := 1000 * len(BLOG)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		StringBuilder(p, cap)
	}
}
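These benchmarks rely on initStrings and the BLOG constant from the test setup in the previous article. They are not repeated there, but a minimal hypothetical version consistent with how they are used above might look like this (the actual BLOG value in the original article may differ):

// Hypothetical reconstruction of the helpers from the previous article:
// BLOG is the string being concatenated, initStrings returns n copies of it.
const BLOG = "www.flysnow.org"

func initStrings(n int) []string {
	p := make([]string, n)
	for i := 0; i < n; i++ {
		p[i] = BLOG
	}
	return p
}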

To keep the code short, only the 10-element and 1000-element cases are shown; the rest are similar. To maximize performance, I allocate exactly as much capacity as needed in one go. Now let's run the performance tests again.

Builder10-8     10000000    123 ns/op       352 B/op    1 allocs/op
Builder100-8    2000000     898 ns/op       2688 B/op   1 allocs/op
Builder1000-8   200000      7729 ns/op      24576 B/op  1 allocs/op
Builder10000-8  20000       78678 ns/op     237568 B/op 1 allocs/op

Performance more than doubled, each operation now makes only a single memory allocation, and the memory used per operation dropped by more than half, which also reduces GC pressure.

Summary

That's it for this round of optimization. The change itself is easy to write, and the principle behind it is very general: pre-allocate memory so that append does not have to repeatedly reallocate and copy data, which buys a large performance improvement. So whenever the final length is foreseeable, request the memory you need up front.
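The same idea applies to plain slices, not just strings.Builder. A minimal sketch (my example, not from the article): make([]T, 0, n) creates an empty slice whose backing array already has room for n elements, so the appends never trigger growslice.

// Pre-allocating the backing array: length 0, capacity n,
// so the loop below never reallocates or copies.
func squares(n int) []int {
	result := make([]int, 0, n)
	for i := 0; i < n; i++ {
		result = append(result, i*i)
	}
	return result
}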

That brings the string concatenation series to an end. There are three articles in total, and I hope they have been helpful.

This is an original article; please credit the source when reprinting. Follow the WeChat official account flysnow_org or visit www.flysnow.org/ to be the first to see follow-up articles. If you found it useful, feel free to share it; thank you for your support.