How to efficiently concatenate strings in Go

Preface

Hello, everyone, I’m Asong

String concatenation comes up constantly in day-to-day development, and different languages implement it in different ways. In Go, there are six ways to concatenate strings. Which one is the most efficient? In this article, we will analyze them together.

This article uses Go language version 1.17.1

The string type

Let’s first look at the official definition of string in Go:

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string

In other words, a string is a sequence of 8-bit bytes that usually, but not necessarily, represents UTF-8 encoded text. A string can be empty, but it cannot be nil, and its value cannot be changed.

At runtime, a string is essentially a small struct, defined as follows:

type stringStruct struct {
    str unsafe.Pointer
    len int
}

stringStruct is similar to a slice: str points to the beginning of an array and len is the length of that array. What kind of array does str point to? Let’s look at a function the runtime calls when it constructs a string:

//go:nosplit
func gostringnocopy(str *byte) string {
	ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
	s := *(*string)(unsafe.Pointer(&ss))
	return s
}

The input parameter is a pointer to a byte. From this we can see that a string is an array of bytes underneath.

A string is essentially an array of bytes. In Go, as in many other languages, strings are designed to be immutable. In a concurrent scenario, this means the same string can be used from many places without locking, so it can be shared efficiently without safety concerns.
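
To make the byte-array view concrete, here is a minimal sketch. The helper names byteAt and headerMatchesLayout are made up for this illustration and are not part of the Go runtime. It shows that len counts bytes rather than characters, that indexing yields a byte, and that a string value is the size of one pointer plus one int, which matches the stringStruct definition above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteAt returns the i-th byte of s. Indexing a string yields bytes, not characters.
func byteAt(s string, i int) byte {
	return s[i]
}

// headerMatchesLayout reports whether a string value occupies exactly
// one pointer plus one int, as the stringStruct definition suggests.
func headerMatchesLayout() bool {
	var s string
	return unsafe.Sizeof(s) == unsafe.Sizeof(uintptr(0))+unsafe.Sizeof(int(0))
}

func main() {
	s := "héllo" // 'é' takes two bytes in UTF-8

	fmt.Println(len(s))                // 6: number of bytes
	fmt.Println(len([]rune(s)))        // 5: number of Unicode code points
	fmt.Println(byteAt(s, 0))          // 104: the byte value of 'h'
	fmt.Println(headerMatchesLayout()) // true
}
```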

The contents of a string cannot be changed, but a string variable can be replaced: the str pointer in its stringStruct can be pointed at different data, while the bytes it points to cannot be modified. In other words, every time a string is "changed", memory has to be allocated for the new value, and the old memory is eventually reclaimed by the GC.
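
A small sketch of what this means in practice. The function replaceFirst is a hypothetical helper written for this example. Because a string cannot be modified in place, the bytes are copied into a slice, changed there, and converted back into a new string, while any other variable holding the old string is unaffected.

```go
package main

import "fmt"

// replaceFirst returns a copy of s whose first byte is c.
// Writing s[0] = c does not compile, so the bytes are copied
// into a slice, changed there, and converted back.
func replaceFirst(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // allocates and copies the bytes of s
	b[0] = c
	return string(b) // allocates again for the new string
}

func main() {
	a := "asong"
	b := a // copies only the header; both variables refer to the same bytes
	a = replaceFirst(a, 'A')

	fmt.Println(a) // Asong
	fmt.Println(b) // asong
}
```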

With that background on strings in place, let’s turn to string concatenation.

Six ways to concatenate strings and how they work

Native concatenation with “+”

The Go language supports direct concatenation of two strings using the + operator, as shown in the following example:

var s string
s += "asong"
s += "Handsome"

This is the easiest approach and is available in almost all languages. Concatenation with the + operator works out the combined length, allocates a new block of memory, and copies both original strings into it.
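
As an illustration of why this gets expensive in a loop, here is a minimal sketch. The name concatPlus is made up for this example. Every += produces a brand-new string, so the bytes accumulated so far are copied again on each iteration.

```go
package main

import "fmt"

// concatPlus joins parts with the + operator.
// Each += allocates a new string and copies everything built so far.
func concatPlus(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	fmt.Println(concatPlus([]string{"asong", "Handsome"})) // asongHandsome
}
```

For a handful of strings this cost is negligible; it mainly matters when the loop runs many times.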

String formatting function fmt.Sprintf

Go provides the standard function fmt.Sprintf for string formatting, so string concatenation can also be done this way:

str := "asong"
str = fmt.Sprintf("%s%s", str, str)

fmt.Sprintf is implemented mainly with reflection. A detailed source analysis is omitted here for reasons of length, but wherever reflection is involved there is a performance cost.
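
A short sketch of where fmt.Sprintf does earn its keep, and what the plain alternative looks like. The function names labelSprintf and labelConcat are hypothetical. Sprintf receives its arguments as interface values and inspects them at run time, while the second version converts the number explicitly with strconv.Itoa and uses +.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelSprintf formats a name and a number with fmt.Sprintf.
// The arguments are passed as interface values and examined at run time.
func labelSprintf(name string, n int) string {
	return fmt.Sprintf("%s-%d", name, n)
}

// labelConcat builds the same string with strconv.Itoa and +,
// without parsing a format string.
func labelConcat(name string, n int) string {
	return name + "-" + strconv.Itoa(n)
}

func main() {
	fmt.Println(labelSprintf("asong", 7)) // asong-7
	fmt.Println(labelConcat("asong", 7))  // asong-7
}
```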

strings.Builder

Go provides the strings package for manipulating strings. Its strings.Builder type can be used to concatenate strings through the WriteString method, as follows:

var builder strings.Builder
builder.WriteString("asong")
builder.String()

The implementation of strings.Builder is very simple. Its structure is as follows:

type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}

The WriteString method appends the data to buf, which is a byte slice:

func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}

The String method converts the []byte to a string. To avoid copying memory, it uses an unsafe cast instead of a standard conversion:

func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
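
Putting the pieces together, a typical use of strings.Builder in a loop might look like the sketch below. The name concatBuilder is made up for this example. All writes go into the same underlying buffer, and String hands that buffer back without a further copy.

```go
package main

import (
	"fmt"
	"strings"
)

// concatBuilder appends every element of parts to one strings.Builder
// and returns the accumulated string.
func concatBuilder(parts []string) string {
	var b strings.Builder // the zero value is ready to use
	for _, p := range parts {
		b.WriteString(p) // the returned error is always nil
	}
	return b.String()
}

func main() {
	fmt.Println(concatBuilder([]string{"asong", "Handsome"})) // asongHandsome
}
```

Because of the addr field shown earlier, a Builder that has already been written to should be passed around by pointer rather than copied by value.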

bytes.Buffer

Since a string is an array of bytes underneath, we can also concatenate with Go’s bytes.Buffer, a buffer that holds bytes. The usage is as follows:

buf := new(bytes.Buffer)
buf.WriteString("asong")
buf.String()

Underneath, bytes.Buffer is also a []byte slice, with the following structure:

type Buffer struct {
	buf      []byte // contents are the bytes buf[off : len(buf)]
	off      int    // read at &buf[off], write at &buf[len(buf)]
	lastRead readOp // last read operation, so that Unread* can work correctly.
}

Because a bytes.Buffer can keep writing data at its tail while data is read from its head, the off field records the read position, and the length of the slice gives the write position. That is not the focus of this article; what matters here is how the WriteString method concatenates strings:

func (b *Buffer) WriteString(s string) (n int, err error) {
	b.lastRead = opInvalid
	m, ok := b.tryGrowByReslice(len(s))
	if !ok {
		m = b.grow(len(s))
	}
	return copy(b.buf[m:], s), nil
}

No memory is allocated when the buffer is created; memory is allocated only when data is first written. The size of that first allocation is the size of the data being written, except that a write of up to 64 bytes requests 64 bytes. The buffer then grows dynamically as needed, and each appended string is copied to the end with copy, a built-in function that copies bytes without allocating.

However, converting the []byte to a string still uses a standard type conversion, so a memory allocation occurs:

func (b *Buffer) String() string {
	if b == nil {
		// Special case, useful in debugging.
		return "<nil>"
	}
	return string(b.buf[b.off:])
}
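
The role of the read offset can be seen in a small sketch. The helper unreadAfter is a hypothetical name used only here. After some bytes have been consumed with Next, String returns just the unread part, which corresponds to b.buf[b.off:] in the source above.

```go
package main

import (
	"bytes"
	"fmt"
)

// unreadAfter writes s to a fresh buffer, consumes the first n bytes,
// and returns what String reports afterwards (only the unread part).
func unreadAfter(s string, n int) string {
	var buf bytes.Buffer // the zero value is an empty buffer ready to use
	buf.WriteString(s)
	buf.Next(n) // advances the read offset by up to n bytes
	return buf.String()
}

func main() {
	fmt.Println(unreadAfter("asongHandsome", 5)) // Handsome
}
```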

strings.Join

The strings.Join function concatenates a slice of strings into a single string, placing a separator of your choice between the elements:

baseSlice := []string{"asong", "Handsome"}
strings.Join(baseSlice, "")

strings.Join is also implemented on top of strings.Builder. The code is as follows:

func Join(elems []string, sep string) string {
	switch len(elems) {
	case 0:
		return ""
	case 1:
		return elems[0]
	}
	n := len(sep) * (len(elems) - 1)
	for i := 0; i < len(elems); i++ {
		n += len(elems[i])
	}

	var b Builder
	b.Grow(n)
	b.WriteString(elems[0])
	for _, s := range elems[1:] {
		b.WriteString(sep)
		b.WriteString(s)
	}
	return b.String()
}

The only difference is that Join calls b.Grow(n) to allocate the initial capacity, where the n computed beforehand is the total length of the string to be built. Because the slice we pass in has a fixed length, allocating the capacity in advance reduces the number of memory allocations, which is very efficient.
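
The same idea can be applied directly when using strings.Builder. The sketch below uses a made-up function name, concatPrealloc. It adds up the lengths first, reserves that much space with Grow, and only then writes, so the buffer should not need to be reallocated while it fills.

```go
package main

import (
	"fmt"
	"strings"
)

// concatPrealloc sums the lengths of parts, reserves that much space
// with Grow, and then writes every part into the builder.
func concatPrealloc(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}

	var b strings.Builder
	b.Grow(n) // one allocation up front instead of repeated growth
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"asong", "Handsome"}
	fmt.Println(concatPrealloc(parts) == strings.Join(parts, "")) // true
}
```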

Slice append

Since a string is also a byte array underneath, we can declare a new byte slice and concatenate strings with append, as follows:

buf := make([]byte, 0)
base := "asong"
buf = append(buf, base...)
result := string(buf) // a standard conversion copies the bytes

If you want to reduce memory allocation, consider using an unsafe cast instead of a standard conversion when converting the []byte to a string.
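
A minimal sketch of that cast, using the same pointer conversion that strings.Builder uses in its String method. The names bytesToString and concatAppend are hypothetical, and preallocating the slice capacity is an addition of this example rather than part of the snippet above. The important assumption is that the byte slice is never modified after the conversion, otherwise the supposedly immutable string would change too.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// Assumption: the caller never modifies b afterwards.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// concatAppend appends every part to one byte slice whose capacity
// is reserved in advance, then converts it without a copy.
func concatAppend(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	buf := make([]byte, 0, n)
	for _, p := range parts {
		buf = append(buf, p...)
	}
	return bytesToString(buf)
}

func main() {
	fmt.Println(concatAppend([]string{"asong", "Handsome"})) // asongHandsome
}
```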

Benchmark comparison

We have now covered six methods and roughly how each one works, so let’s use Go benchmarks to see which one is more efficient. We analyze two cases:

  • Small string concatenation
  • Large string concatenation

Because there is rather a lot of code, only the results are shown below. The full code has been uploaded to GitHub: github.com/asong2020/G…

Let’s start by defining a base string:

var base  = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"

For the small test, we concatenate base with itself once. The benchmark results are:

goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/once
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16           21338802                49.19 ns/op          128 B/op          1 allocs/op
BenchmarkSprintfString-16        7887808               140.5 ns/op           160 B/op          3 allocs/op
BenchmarkBuilderString-16       27084855                41.39 ns/op          128 B/op          1 allocs/op
BenchmarkBytesBuffString-16      9546277               126.0 ns/op           384 B/op          3 allocs/op
BenchmarkJoinstring-16          24617538                48.21 ns/op          128 B/op          1 allocs/op
BenchmarkByteSliceString-16     10347416               112.7 ns/op           320 B/op          3 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/once     8.412s

For the large test, we first build a string slice of length 200:

var baseSlice []string
for i := 0; i < 200; i++ {
		baseSlice = append(baseSlice, base)
}

We then iterate over this slice, concatenating each element in turn. The benchmark results are:

goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/muliti
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16                       7396            163612 ns/op         1277713 B/op        199 allocs/op
BenchmarkSprintfString-16                   5946            202230 ns/op         1288552 B/op        600 allocs/op
BenchmarkBuilderString-16                 262525              4638 ns/op           40960 B/op          1 allocs/op
BenchmarkBytesBufferString-16             183492              6568 ns/op           44736 B/op          9 allocs/op
BenchmarkJoinstring-16                    398923              3035 ns/op           12288 B/op          1 allocs/op
BenchmarkByteSliceString-16               144554              8205 ns/op           60736 B/op         15 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/muliti   10.699s
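
The benchmark source is only linked, not reproduced here. As a rough idea of how such a comparison can be set up, the sketch below measures two of the methods with testing.Benchmark from an ordinary program. It is not the author's code: the helpers makeParts, concatPlus, concatBuilder and run are made up for this illustration, and the numbers it prints will differ from machine to machine.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// makeParts returns n copies of s, similar to the 200-element slice above.
func makeParts(s string, n int) []string {
	parts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		parts = append(parts, s)
	}
	return parts
}

// concatPlus joins parts with the + operator.
func concatPlus(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder joins parts with a strings.Builder.
func concatBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// run benchmarks f over parts and prints its time and allocation figures.
func run(name string, parts []string, f func([]string) string) {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			f(parts)
		}
	})
	fmt.Println(name, res.String(), res.MemString())
}

func main() {
	testing.Init() // registers testing flags when not running under "go test"

	parts := makeParts("asong", 200)
	run("plus   ", parts, concatPlus)
	run("builder", parts, concatBuilder)
}
```

In a real project the same loops would normally live in a _test.go file as BenchmarkXxx functions and be run with go test -bench . -benchmem.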

Conclusion

Using the + operator is efficient when concatenating a small number of strings. However, it performs poorly when concatenating a large number of strings.

fmt.Sprintf is not well suited to string concatenation: regardless of how many strings are concatenated, its performance cost is high.

strings.Builder performs consistently well whether a small or a large number of strings is concatenated, which is why strings.Builder is officially recommended for string concatenation in Go. When using strings.Builder, it is best to call Grow to preallocate capacity. The benchmark for strings.Join shows why: because Grow reserves the memory in advance, the buffer does not have to be reallocated and copied while the strings are written, which gives the best performance and the lowest memory consumption among the strings.Builder-based approaches.

bytes.Buffer performs worse than strings.Builder. When a bytes.Buffer is converted to a string, it allocates new memory to hold the resulting string, whereas strings.Builder returns its underlying []byte as a string directly, so bytes.Buffer uses more memory.

Final conclusions:

String concatenation with strings.Builder is the most efficient choice in general, but to get the most out of it, remember to call Grow to preallocate capacity. The performance of strings.Join is approximately equal to that of strings.Builder; it is a good fit when you already have a slice of strings, but is not recommended when the strings are not already in a slice. For a small number of strings, the + operator is the most convenient and performs well, so strings.Builder is not needed there.

Overall performance ranking:

strings.Join ≈ strings.Builder > bytes.Buffer > []byte converted to string > “+” > fmt.Sprintf

Summary

In this article, we introduced six ways to concatenate strings and compared their efficiency with benchmarks. Whatever the situation, you will not go wrong using strings.Builder.

The code has been uploaded to GitHub: github.com/asong2020/G…

Well, that’s the end of this article. I am asong, and I’ll see you next time.

You are welcome to follow my official account: Golang Dream Factory