This article is divided into three parts: Douyu's application scenarios for Go, how we optimize Go in our business, and the pitfalls we have run into with Go.

1. Application scenarios of Go

At Douyu, we divide our Go application scenarios into three categories: cache-type data, real-time data, and CPU-intensive tasks. Each of the three has its own characteristics.

  • Cache-type data, in Douyu's case, backs the home page, list pages, and similar interfaces. Its characteristic is that different users requesting at the same time receive the same data. The cached payloads are usually large, and because the data is not user-specific yet has real value, it is easily scraped by crawlers.

  • Real-time data, in Douyu's case, is the video-stream and focus data, whose characteristic is that every request returns different data. It is also prone to sudden traffic shifts from certain business events: an anchor's broadcast reminder or the start of a big event floods in a large number of users within a short time, and server traffic spikes sharply.

  • The CPU-intensive task, in Douyu's case, is our list-sorting engine. Douyu's list sorting has many data sources and a complicated algorithm model; computing all of this in a short time while keeping the lists timely is a big challenge for us.

We took many detours optimizing these three business scenarios. Like many programmers, we easily got caught up in particular techniques and ways of thinking. A simple example: in the early days of optimizing the Go sorting engine, we tried all sorts of algorithmic optimizations, including introducing skip lists and merge sort. They seemed to improve performance a lot, and the benchmark numbers looked good. But in fact, the time spent in the sorting algorithm and the time spent fetching the data to be sorted differ by orders of magnitude; if optimization heads in the wrong direction, the effort yields little. So in later work, we first locate where the business spends most of its time, using latency ranges like those shown in the figure below.

The figure enumerates several typical latency ranges to give you a feel for these values:

  • Client → CDN → origin equipment room: roughly 50ms to 300ms
  • Server-to-server communication inside the equipment room: roughly 5ms to 50ms
  • Redis, the in-memory database we access: roughly 500µs to 1ms per response
  • Reading in-process memory data in Go: nanoseconds

Once we know where the business spends most of its time, we know which parts to focus on optimizing.

2. Go business optimization

2.1 Cache data optimization

Suppose a user accesses a URL; let's say the URL is /hello, and it returns the same data structure for every user. It can be implemented as in the example below. For developers, code is the most intuitive and controllable tool, but this approach often only implements the feature without improving the user experience: for cache-type data, there is no need to send every request from the CDN back to the origin equipment room, which only lengthens the user's access path.

    // Echo instance
    e := echo.New()
    e.Use(mw.Cache)

    // Routers
    e.GET("/hello", handler(HomeHandler))

2.1.1 Adding CDN Cache

Therefore, for cache-type data we do not cache in Go; instead, we optimize caching at the CDN in front of it. The CDN link is shown below.

To help you understand the CDN better, let me first ask a question: how long does light take to travel from Beijing to Shenzhen? The answer is about 7ms. So, as the figure shows, when a user accesses cache-type data, we should try to cache it at a CDN node close to the user. This optimization method is called CDN cache optimization. With it, the CDN node aggregates the requests of nearby users and goes back to the origin equipment room only once. This not only saves the equipment room's bandwidth, it also physically shortens the access link, so users get cached data faster. To better simulate CDN caching, we use Nginx + Go to describe the process: Nginx plays the CDN edge node in the figure, and the Go service plays the origin equipment room in Beijing. The Nginx configuration is as follows:

    server {
        listen 8088;
        location ~ /hello {
            access_log /home/www/logs/hello_access.log;
            proxy_pass http://127.0.0.1:9090;
            proxy_cache vipcache;
            proxy_cache_valid 200 302 20s;
            proxy_cache_use_stale error timeout invalid_header updating
                                  http_500 http_502 http_503 http_504 http_403 http_404;
            add_header Cache-Status "$upstream_cache_status";
        }
    }

The Go code is shown below:

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        http.Handle("/hello", &ServeMux{})
        err := http.ListenAndServe(":9090", nil)
        if err != nil {
            fmt.Println("err", err.Error())
        }
    }

    type ServeMux struct{}

    func (p *ServeMux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        fmt.Println("get one request")
        fmt.Println(r.RequestURI)
        io.WriteString(w, "hello world")
    }

After starting both services, we can observe the following:

  • On the first visit to /hello, both Nginx and Go receive the request. The Nginx response carries Cache-Status: MISS, indicating that the request passed through Nginx to Go.

  • On the second visit to /hello (within the 20s cache validity), only Nginx receives the request; Go does not. The Nginx response carries Cache-Status: HIT, indicating that the request was served from Nginx's cache and did not go back to Go.

  • As an added bonus of this Nginx configuration (proxy_cache_use_stale), the cached URL /hello can still return data even if the backend Go service goes down. Nginx returns as follows; a quick Go check of the Cache-Status behavior is sketched after this list.
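To verify the MISS/HIT behavior, a minimal Go client can replay the two requests and print the Cache-Status header. This is just a sketch, assuming the Nginx instance above is listening on 127.0.0.1:8088:

    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        for i := 1; i <= 2; i++ {
            resp, err := http.Get("http://127.0.0.1:8088/hello")
            if err != nil {
                fmt.Println("request failed:", err)
                return
            }
            resp.Body.Close()
            // Expect MISS on the first request and HIT on the second
            // (within the 20s proxy_cache_valid window).
            fmt.Printf("request %d Cache-Status: %s\n", i, resp.Header.Get("Cache-Status"))
        }
    }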

2.1.2 Stripping the question mark in the CDN cache

Normal users reach the /hello URL through in-app navigation and then get the /hello data. Crawler users, however, in order to fetch fresher data, append random numbers, such as /hello?123456. This invalidates the CDN cache and sends many requests back to the origin equipment room, causing extra pressure. In this situation we can generally strip the query string at the CDN so all variants share one cache entry. This behavior can be simulated with Nginx, configured as follows:

    server {
        listen 8088;
        if ( $request_uri ~* "^/hello" ) {
            # the trailing "?" in the replacement drops the query string,
            # so /hello?123456 is cached under the same key as /hello
            rewrite ^/hello /hello? break;
        }
        location ~ /hello {
            access_log /home/www/logs/hello_access.log;
            proxy_pass http://127.0.0.1:9090;
            proxy_cache vipcache;
            proxy_cache_valid 200 302 20s;
            proxy_cache_use_stale error timeout invalid_header updating
                                  http_500 http_502 http_503 http_504 http_403 http_404;
            add_header Cache-Status "$upstream_cache_status";
        }
    }

2.1.3 Locking Heavy Traffic

We mentioned earlier that when a big event suddenly goes live, a flood of users arrives at once. At that moment the cache entry may not have been established yet, and a large number of requests may still go back to the origin equipment room, leaving the service under high load. For this we can add the proxy_cache_lock and proxy_cache_lock_timeout parameters: with the lock on, only one request per cache key is passed through to populate the cache, while the rest wait (up to the timeout) for the cached entry.

    server {
        listen 8088;
        if ( $request_uri ~* "^/hello" ) {
            rewrite ^/hello /hello? break;
        }
        location ~ /hello {
            access_log /home/www/logs/hello_access.log;
            proxy_pass http://127.0.0.1:9090;
            proxy_cache vipcache;
            proxy_cache_valid 200 302 20s;
            proxy_cache_use_stale error timeout invalid_header updating
                                  http_500 http_502 http_503 http_504 http_403 http_404;
            proxy_cache_lock on;
            proxy_cache_lock_timeout 1s;
            add_header Cache-Status "$upstream_cache_status";
        }
    }

2.1.4 Data optimization

Above we mentioned Douyu's cache-type home page and list pages. These interfaces usually return a lot of data. Here we use Go to simulate fetching 120 items in a single request, in three variants: a slice without preallocated capacity, a slice with preallocated capacity, and preallocated slices reused through a sync.Pool. The code is shown below; each goroutine stands in for one HTTP request. Let's run it with benchmarks:

    package slice_test

    import (
        "strconv"
        "sync"
        "testing"
    )

    // go test -bench="."

    type Something struct {
        roomId   int
        roomName string
    }

    func BenchmarkDefaultSlice(b *testing.B) {
        b.ReportAllocs()
        var wg sync.WaitGroup
        for i := 0; i < b.N; i++ {
            wg.Add(1)
            go func(wg *sync.WaitGroup) {
                // No capacity hint: append reallocates as the slice grows.
                output := make([]Something, 0)
                for i := 0; i < 120; i++ {
                    output = append(output, Something{
                        roomId:   i,
                        roomName: strconv.Itoa(i),
                    })
                }
                wg.Done()
            }(&wg)
        }
        wg.Wait()
    }

    func BenchmarkPreAllocSlice(b *testing.B) {
        b.ReportAllocs()
        var wg sync.WaitGroup
        for i := 0; i < b.N; i++ {
            wg.Add(1)
            go func(wg *sync.WaitGroup) {
                // Capacity preset to 120: a single allocation per request.
                output := make([]Something, 0, 120)
                for i := 0; i < 120; i++ {
                    output = append(output, Something{
                        roomId:   i,
                        roomName: strconv.Itoa(i),
                    })
                }
                wg.Done()
            }(&wg)
        }
        wg.Wait()
    }

    func BenchmarkSyncPoolSlice(b *testing.B) {
        b.ReportAllocs()
        var wg sync.WaitGroup
        var SomethingPool = sync.Pool{
            New: func() interface{} {
                s := make([]Something, 120)
                return &s
            },
        }
        for i := 0; i < b.N; i++ {
            wg.Add(1)
            go func(wg *sync.WaitGroup) {
                // Slices are reused across requests via the pool: no allocation
                // on the hot path once the pool is warm.
                obj := SomethingPool.Get().(*[]Something)
                some := *obj
                for i := 0; i < 120; i++ {
                    some[i].roomId = i
                    some[i].roomName = strconv.Itoa(i)
                }
                SomethingPool.Put(obj)
                wg.Done()
            }(&wg)
        }
        wg.Wait()
    }

The results are as follows: the per-request cost drops from about 12µs in the slowest variant to about 1µs.

2.2 Real-time data optimization

2.2.1 Reducing I/O operations

We mentioned above that when the business suddenly diverts traffic, our service can receive a flood of requests in a short time, which, if unhandled, could overwhelm the backend data source. Moreover, with burst traffic such as video streams, if a request takes too long, the user may not get data for a while and will refresh the page, requesting the interface again and producing a second wave of load. So we optimized this real-time interface.

We apply three levels of caching to high-volume real-time data. The first level is a whitelist: data preset in memory through manual intervention. The second level puts our more important room information into service memory, selected by algorithm. The third level dynamically adjusts the share of requests allowed through. With this three-level cache design, when a big event starts or a top anchor goes live, requests do not penetrate to the data source: the data is returned directly from the service's memory. This reduces I/O operations and smooths the traffic that does reach the data source.
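A minimal sketch of this read path in Go, to make the three levels concrete. All names here (whitelist, hotCache, fetchFromSource) are our illustration, not the real service code, which also coalesces concurrent misses:

    package main

    import (
        "fmt"
        "sync"
    )

    var (
        mu        sync.RWMutex
        whitelist = map[int]string{1: "big-event-room"} // level 1: manually preset entries
        hotCache  = map[int]string{}                    // level 2: hot rooms chosen by algorithm
    )

    // fetchFromSource stands in for the real data source (e.g. Redis).
    func fetchFromSource(roomID int) string {
        return fmt.Sprintf("room-%d", roomID)
    }

    func GetRoom(roomID int) string {
        mu.RLock()
        if v, ok := whitelist[roomID]; ok { // level 1: whitelist
            mu.RUnlock()
            return v
        }
        if v, ok := hotCache[roomID]; ok { // level 2: in-process hot cache
            mu.RUnlock()
            return v
        }
        mu.RUnlock()

        // Level 3: only a dynamically controlled share of requests should
        // fall through to the data source.
        v := fetchFromSource(roomID)
        mu.Lock()
        hotCache[roomID] = v
        mu.Unlock()
        return v
    }

    func main() {
        fmt.Println(GetRoom(1)) // served from the whitelist
        fmt.Println(GetRoom(2)) // first call goes to the source, then is cached
        fmt.Println(GetRoom(2)) // served from the hot cache
    }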

Other small-volume, non-real-time data is pushed to the services through etcd.

2.2.2 Tuning Redis parameters

Fully understand the Redis client parameters; only then can you tune them properly for the business and get the best performance. Setting maxIdle high enough ensures there are warm connections to Redis during traffic bursts, so no new connections need to be established under heavy load. The maxActive, readTimeout, and writeTimeout settings protect Redis: they amount to a simple rate-limiting and load-shedding that the Go service applies to Redis.

    // Redigo parameter tuning
    maxIdle      = 30
    maxActive    = 500
    dialTimeout  = "1s"
    readTimeout  = "500ms"
    writeTimeout = "500ms"
    idleTimeout  = "60s"
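For reference, a sketch of how these values map onto a redigo (github.com/gomodule/redigo) connection pool; the newPool helper itself is our own illustration:

    package redistune

    import (
        "time"

        "github.com/gomodule/redigo/redis"
    )

    func newPool(addr string) *redis.Pool {
        return &redis.Pool{
            MaxIdle:     30,               // keep connections warm for traffic bursts
            MaxActive:   500,              // cap concurrency: a simple limiter protecting Redis
            IdleTimeout: 60 * time.Second, // recycle idle connections
            Dial: func() (redis.Conn, error) {
                return redis.Dial("tcp", addr,
                    redis.DialConnectTimeout(time.Second),
                    redis.DialReadTimeout(500*time.Millisecond),
                    redis.DialWriteTimeout(500*time.Millisecond),
                )
            },
        }
    }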

2.2.3 Service and Redis tuning

Because Redis is an in-memory database, its responses are fast, and services tend to use it heavily. Often when we load-test a service, the bottleneck is not the code we wrote but Redis throughput. Since Redis is single-threaded, to improve speed we usually use pipeline commands and add Redis replicas, so that Go can pull data concurrently, matching the number of replicas, for the best performance. Here we simulate this scenario:

    package redis_test

    import (
        "fmt"
        "sync"
        "testing"
        "time"
    )

    // go test

    func Test_OneRedisData(t *testing.T) {
        t1 := time.Now()
        for i := 0; i < 120; i++ {
            getRemoteOneRedisData(i)
        }
        fmt.Println("Test_OneRedisData cost: ", time.Since(t1))
    }

    func Test_PipelineRedisData(t *testing.T) {
        t1 := time.Now()
        ids := make([]int, 0, 120)
        for i := 0; i < 120; i++ {
            ids = append(ids, i)
        }
        getRemotePipelineRedisData(ids)
        fmt.Println("Test_PipelineRedisData cost: ", time.Since(t1))
    }

    func Test_GoroutinePipelineRedisData(t *testing.T) {
        t1 := time.Now()
        ids := make([]int, 0, 120)
        for i := 0; i < 120; i++ {
            ids = append(ids, i)
        }
        getGoroutinePipelineRedisData(ids)
        fmt.Println("Test_GoroutinePipelineRedisData cost: ", time.Since(t1))
    }

    // Simulates a single Redis request; one round trip is assumed to cost 600µs.
    func getRemoteOneRedisData(i int) int {
        time.Sleep(600 * time.Microsecond)
        return i
    }

    // Simulates a pipelined fetch; each item inside the pipeline is assumed to cost 500µs.
    func getRemotePipelineRedisData(ids []int) []int {
        length := len(ids)
        time.Sleep(time.Duration(length) * 500 * time.Microsecond)
        return ids
    }

    // Splits the ids into four chunks and pipelines each chunk in its own
    // goroutine, as if pulling from four Redis replicas concurrently.
    func getGoroutinePipelineRedisData(ids []int) []int {
        idsNew := make(map[int][]int, 4)
        idsNew[0] = ids[0:30]
        idsNew[1] = ids[30:60]
        idsNew[2] = ids[60:90]
        idsNew[3] = ids[90:120]

        resp := make([]int, 0, 120)
        var mu sync.Mutex // resp is shared across goroutines; serialize the appends
        var wg sync.WaitGroup
        for j := 0; j < 4; j++ {
            wg.Add(1)
            go func(wg *sync.WaitGroup, j int) {
                data := getRemotePipelineRedisData(idsNew[j])
                mu.Lock()
                resp = append(resp, data...)
                mu.Unlock()
                wg.Done()
            }(&wg, j)
        }
        wg.Wait()
        return resp
    }

From the results, we can see that concurrent pulls combined with pipelining improve performance about fivefold. There are many more ways to optimize Redis, for example:

1. Use pipeline commands to batch requests
2. For batch data, pull concurrently with goroutines according to the number of Redis replicas
3. …
4. Simplify key fields
5. Serialize Redis values with msgpack
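For instance, a real pipelined fetch with redigo looks roughly like the sketch below; getPipelineFromRedis and the room:%d key format are illustrative assumptions, and the pool is the one sketched in section 2.2.2:

    package redistune

    import (
        "fmt"

        "github.com/gomodule/redigo/redis"
    )

    // getPipelineFromRedis queues all GETs first, flushes them in one network
    // round trip, and then reads the replies in order.
    func getPipelineFromRedis(pool *redis.Pool, ids []int) ([]string, error) {
        conn := pool.Get()
        defer conn.Close()

        for _, id := range ids {
            if err := conn.Send("GET", fmt.Sprintf("room:%d", id)); err != nil {
                return nil, err
            }
        }
        if err := conn.Flush(); err != nil {
            return nil, err
        }

        resp := make([]string, 0, len(ids))
        for range ids {
            v, err := redis.String(conn.Receive())
            if err != nil && err != redis.ErrNil { // ErrNil: key missing, keep going
                return nil, err
            }
            resp = append(resp, v)
        }
        return resp, nil
    }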

3. Pitfalls we have run into with Go

The code for these pitfalls is at: https://github.com/askuy/gopherlearn

3.1 Pointer-type strings and numbers

3.2 Locking multiple Maps
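The details of this pit were shown as images in the original post. As a generic sketch of the underlying problem (names below are illustrative): concurrent writes to a Go map crash with "fatal error: concurrent map writes", so shared maps must be guarded, e.g. with a sync.RWMutex:

    package main

    import "sync"

    // SafeCounter guards a map with a RWMutex; without the lock, the
    // concurrent writes in main would crash the process.
    type SafeCounter struct {
        mu sync.RWMutex
        m  map[string]int
    }

    func (c *SafeCounter) Inc(key string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.m[key]++
    }

    func (c *SafeCounter) Get(key string) int {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.m[key]
    }

    func main() {
        c := &SafeCounter{m: make(map[string]int)}
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                c.Inc("requests")
            }()
        }
        wg.Wait()
        println(c.Get("requests")) // 100
    }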

3.3 Channel Usage Problems

4. Related literature

There are more pitfalls than we can explain here. The problems above are all covered in the related literature; for the specific reasons, please read the documentation: https://stackoverflow.com/questions/18435498/why-are-receivers-pass-by-value-in-go/18435638

When are function parameters passed by value? As in all languages in the C family, everything in Go is passed by value. That is, a function always gets a copy of the thing being passed, as if there were an assignment statement assigning the value to the parameter. For instance, passing an int value to a function makes a copy of the int, and passing a pointer value makes a copy of the pointer, but not the data it points to. (See a later section for a discussion of how this affects method receivers.) Map and slice values behave like pointers: they are descriptors that contain pointers to the underlying map or slice data. Copying a map or slice value doesn't copy the data it points to.
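A tiny example of the quoted semantics (our own illustration, not from the original post):

    package main

    import "fmt"

    func mutate(s []int, m map[string]int) {
        s[0] = 100       // visible to the caller: both slices share the backing array
        m["k"] = 100     // visible to the caller: both maps share the underlying data
        s = append(s, 1) // not visible: append may allocate a new backing array,
                         // and s itself is a copy of the caller's descriptor
    }

    func main() {
        s := []int{1}
        m := map[string]int{"k": 1}
        mutate(s, m)
        fmt.Println(s, m) // [100] map[k:100]
    }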