Background

I am responsible for an in-house Dubbo registry that had recently been triggering frequent CPU usage alarms. I went through a round of optimization with decent results, so I decided to share the thinking and the optimization process, in the hope that it helps you.

What exactly is a Dubbo registry? I drew a sketch to give you a feel for it; if it is unfamiliar, don't worry, it won't affect the rest of the article.

  • Consumers and Providers send their service-discovery requests (register, unregister, subscribe) to the Agent, which acts as their proxy
  • The Registry and the Agent maintain a long-lived gRPC connection, whose purpose is to promptly push Provider changes to the affected Consumers. To guarantee data correctness, a combined push-and-pull mechanism is used: the Agent also pulls the subscribed service list from the Registry at fixed intervals
  • The Agent is deployed on the same machine as the Service, similar to a Service Mesh sidecar, which minimizes intrusion into the Service and enables fast iteration

Back to today's point: the CPU usage of this registry had been sitting at a mid-to-high level for a while, and during occasional application releases, when the push volume spiked, the CPU was even maxed out.

This alarm had not fired before because only a few applications were connected; over the past few months more applications came online and usage crossed the alarm threshold.

Finding the optimization points

Since this project is written in Go (don't worry if you don't know Go; this article is about algorithmic optimization, not tooling), finding where the CPU goes is relatively easy: just enable pprof and sample online for a while.

For the specific steps, you can refer to my previous article; all the knowledge and tools used today are covered there.

Here is a cropped part of the CPU profile (the rest is not important). You can see that the CPU is mostly consumed by the AssembleCategoryProviders method and what it directly calls:

  • Two Redis-related methods
  • A method called assembleUrlWeight

A bit of explanation: AssembleCategoryProviders assembles the Dubbo provider URLs to return, and because the returned URLs need some processing (such as adjusting weights), it involves parsing the Dubbo URL. And because of the push-pull model, the more a service is used online, the higher the QPS of this code path, so it is not surprising that it takes up most of the CPU.

The two Redis operations presumably spend their CPU on serialization, which is expected; the even larger share concentrated in assembleUrlWeight is more puzzling.

Let's look at how to optimize assembleUrlWeight first, since it consumes the most CPU and offers the biggest win.

Here is the pseudocode for assembleUrlWeight:

func AssembleUrlWeight(rawurl string, lidcWeight int) string {
	u, err := url.Parse(rawurl)
	if err != nil {
		return rawurl
	}

	values, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return rawurl
	}

	if values.Get("lidc_weight") != "" {
		return rawurl
	}

	endpointWeight := 100
	if values.Get("weight") != "" {
		endpointWeight, err = strconv.Atoi(values.Get("weight"))
		if err != nil {
			endpointWeight = 100
		}
	}

	values.Set("weight", strconv.Itoa(lidcWeight*endpointWeight))

	u.RawQuery = values.Encode()
	return u.String()
}

The parameter rawurl is the Dubbo Provider's URL, and lidcWeight is the weight of the equipment room (IDC). The weight in the URL is recalculated based on the configured room weight, so that traffic can be distributed across multiple rooms proportionally.

The process parses the URL's parameters, recalculates the weight, and finally reassembles everything back into a URL.

The structure of a Dubbo URL is the same as that of an ordinary URL, just with more query parameters and no #fragment part.

The main CPU cost is in the two parses and the final reassembly, and the sole purpose of the two parses is to read the lidc_weight and weight parameters from the URL.

url.Parse and url.ParseQuery are part of Go's standard library, and equivalents exist in virtually every language. The core of url.Parse is to decompose the URL into an object, so that each part of the URL can be accessed easily.
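As a concrete illustration, here is what the two standard-library calls produce for a Dubbo-style URL. The address and parameters below are made up, in the style of the examples later in this article:

```go
package main

import (
	"fmt"
	"net/url"
)

// DemoParse shows what url.Parse and url.ParseQuery yield for a
// Dubbo-style URL: the scheme, the host, and one query parameter.
func DemoParse(raw string) (scheme, host, weight string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	values, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Host, values.Get("weight"), nil
}

func main() {
	s, h, w, _ := DemoParse("dubbo://127.0.0.1:20880/org.newboo.basic.MyDemoService?anyhost=true&weight=100&version=1.0.0")
	fmt.Println(s, h, w) // dubbo 127.0.0.1:20880 100
}
```

Note that both calls parse and allocate for every part and every parameter of the URL, even though only one value is wanted — which is exactly the redundancy discussed next.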

If you know the idea of information entropy, you can tell this is something that can be optimized. Shannon, borrowing a concept from thermodynamics, called the average amount of information left after removing redundancy the information entropy.

In this scenario, the parsing done by url.Parse and url.ParseQuery is clearly redundant, and redundancy means the CPU is doing extra work.

A Dubbo URL usually carries many parameters; we only need two of them, yet url.Parse parses all of them.

For example, to find the maximum value of an array, sorting the array first and then taking the last element is obviously redundant.

A sorted array gives you not only the maximum but also the second largest, the third largest… all the way down to the minimum. That extra information is redundancy, so sorting first is certainly not the best way to find the maximum.
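To make the analogy concrete: a single O(n) pass is all the maximum needs, as in this trivial sketch:

```go
package main

import "fmt"

// maxOf scans the slice once, in O(n): finding the maximum never requires sorting.
func maxOf(nums []int) int {
	m := nums[0]
	for _, v := range nums[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

func main() {
	fmt.Println(maxOf([]int{3, 9, 1, 7})) // 9
}
```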

Optimization

Optimizing URL parameter reads

The first idea: instead of parsing the entire URL, just extract the parameter we need, much like a hand-written string algorithm. For example, the weight parameter can only appear in one of two forms (there is no #fragment, which makes this much easier):

  • dubbo://127.0.0.1:20880/org.newboo.basic.MyDemoService?weight=100&…
  • dubbo://127.0.0.1:20880/org.newboo.basic.MyDemoService?xx=yy&weight=100&…

That is, the parameter starts with either ?weight= or &weight=, and its value ends either at the next & or at the end of the string. This code is easy to write:

var UrlParamNotExist = errors.New("url param not exist")

func GetUrlQueryParam(u string, key string) (string, error) {
	// Match "?key=" or "&key=" so that e.g. "weight" does not
	// accidentally match inside "lidc_weight"
	index := strings.Index(u, "?"+key+"=")
	if index == -1 {
		index = strings.Index(u, "&"+key+"=")
	}
	if index == -1 {
		return "", UrlParamNotExist
	}

	value := strings.Builder{}
	// The value starts after the separator, the key and the "="
	for i := index + len(key) + 2; i < len(u); i++ {
		if u[i] == '&' {
			break
		}
		value.WriteByte(u[i])
	}
	return value.String(), nil
}

The original parse-based way of reading a parameter can be extracted into its own function for comparison:

func getParamByUrlParse(ur string, key string) string {
	u, err := url.Parse(ur)
	if err != nil {
		return ""
	}

	values, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return ""
	}

	return values.Get(key)
}

Benchmark these two functions:

func BenchmarkGetQueryParam(b *testing.B) {
	for i := 0; i < b.N; i++ {
		getParamByUrlParse(u, "anyhost")
		getParamByUrlParse(u, "version")
		getParamByUrlParse(u, "not_exist")
	}
}

func BenchmarkGetQueryParamNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		GetUrlQueryParam(u, "anyhost")
		GetUrlQueryParam(u, "version")
		GetUrlQueryParam(u, "not_exist")
	}
}

Benchmark results are as follows:

BenchmarkGetQueryParam-4          103412              9708 ns/op
BenchmarkGetQueryParam-4          111794              9685 ns/op
BenchmarkGetQueryParam-4          115699              9818 ns/op
BenchmarkGetQueryParamNew-4      2961254               409 ns/op
BenchmarkGetQueryParamNew-4      2944274               406 ns/op
BenchmarkGetQueryParamNew-4      2895690               405 ns/op

As you can see, performance improved by roughly 20x.

The new method has two small details. First, the return value distinguishes whether the parameter exists, which will be used later. Second, strings.Builder is used for the string operations; in my tests it beat both + concatenation and fmt.Sprintf.

Optimizing URL parameter writes

After the weight is calculated, it has to be written back into the URL. Here is the optimized code:

func AssembleUrlWeightNew(rawurl string, lidcWeight int) string {
	if lidcWeight == 1 {
		return rawurl
	}

	lidcWeightStr, err1 := GetUrlQueryParam(rawurl, "lidc_weight")
	if err1 == nil && lidcWeightStr != "" {
		return rawurl
	}

	endpointWeight := 100
	weightStr, err2 := GetUrlQueryParam(rawurl, "weight")
	if weightStr != "" {
		w, err := strconv.Atoi(weightStr)
		if err == nil {
			endpointWeight = w
		}
	}

	if err2 != nil { // There is no weight parameter in the url: append one
		finUrl := strings.Builder{}
		finUrl.WriteString(rawurl)
		if strings.Contains(rawurl, "?") {
			finUrl.WriteString("&weight=")
		} else {
			finUrl.WriteString("?weight=")
		}
		finUrl.WriteString(strconv.Itoa(lidcWeight * endpointWeight))
		return finUrl.String()
	}

	// There is already a weight parameter in the url: replace it in place
	oldWeightStr := strings.Builder{}
	oldWeightStr.WriteString("weight=")
	oldWeightStr.WriteString(weightStr)

	newWeightStr := strings.Builder{}
	newWeightStr.WriteString("weight=")
	newWeightStr.WriteString(strconv.Itoa(lidcWeight * endpointWeight))
	return strings.ReplaceAll(rawurl, oldWeightStr.String(), newWeightStr.String())
}

Depending on whether the URL already contains a weight parameter:

  • If the URL has no weight parameter, append one, using ? or & as appropriate
  • If the URL already has a weight parameter, do a direct string replacement
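The two branches can be sketched in isolation like this. setWeight is a simplified, hypothetical helper, not the article's real method: it omits the lidc_weight handling and assumes no other parameter name ends in "weight":

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// setWeight demonstrates the two write paths: append the parameter when
// absent, splice the new value over the old one when present.
func setWeight(rawurl string, newWeight int) string {
	const key = "weight="
	i := strings.Index(rawurl, key)
	if i == -1 { // no weight parameter yet: append with '?' or '&'
		sep := "?"
		if strings.Contains(rawurl, "?") {
			sep = "&"
		}
		return rawurl + sep + key + strconv.Itoa(newWeight)
	}
	// weight present: find where the old value ends, then splice
	end := i + len(key)
	for end < len(rawurl) && rawurl[end] != '&' {
		end++
	}
	return rawurl[:i] + key + strconv.Itoa(newWeight) + rawurl[end:]
}

func main() {
	fmt.Println(setWeight("dubbo://127.0.0.1:20880/Svc", 60))                // dubbo://127.0.0.1:20880/Svc?weight=60
	fmt.Println(setWeight("dubbo://127.0.0.1:20880/Svc?weight=100&x=y", 60)) // dubbo://127.0.0.1:20880/Svc?weight=60&x=y
}
```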

One more small detail: when lidcWeight is 1, the URL is returned unchanged, because the default Dubbo weight is 100 and multiplying it by 1 changes nothing.

With everything optimized, run an overall benchmark:

func BenchmarkAssembleUrlWeight(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for _, ut := range []string{u, u1, u2, u3} {
			AssembleUrlWeight(ut, 60)
		}
	}
}

func BenchmarkAssembleUrlWeightNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for _, ut := range []string{u, u1, u2, u3} {
			AssembleUrlWeightNew(ut, 60)
		}
	}
}

The results are as follows:

BenchmarkAssembleUrlWeight-4               34275             33289 ns/op
BenchmarkAssembleUrlWeight-4               36646             32432 ns/op
BenchmarkAssembleUrlWeight-4               36702             32740 ns/op
BenchmarkAssembleUrlWeightNew-4           573684              1851 ns/op
BenchmarkAssembleUrlWeightNew-4           646952              1832 ns/op
BenchmarkAssembleUrlWeightNew-4           563392              1896 ns/op

That is roughly an 18x improvement, and this is probably close to the worst case; when lidcWeight = 1 is passed in, the gain is even bigger.

Results

After the optimization, I wrote unit tests for the modified methods, and once everything was confirmed correct, deployed and observed online: CPU idle (the idle rate) improved by more than 10%.

Closing thoughts

This article actually walks through a very routine, relatively simple performance optimization of a Go program. After reading, you may have questions:

  • Why parse URLs on every push and pull? Couldn't the result be computed once and cached?
  • Why optimize only this point? Could the other points be optimized too?

For the first question: this is a historical legacy. The system was already like this when I took it over, and changing it would mean reworking the whole mechanism, with the risk of breaking the program.

That also answers the second question: optimizing in this way minimizes the change while maximizing the benefit, and the other points are not so easy to change. In the short term, the most important thing is the immediate gain. Of course, we plan to refactor the system later, but until then this optimization is enough to solve the problem.


Search for and follow my WeChat official account "bug catching master" for back-end technology sharing: architecture design, performance optimization, source code reading, troubleshooting, and hands-on practice. You are also welcome to add my personal WeChat MrRoshi and follow my Moments.