preface

People in the Go-Zero group often ask,

What algorithm is used to implement service monitoring?

How does a sliding window work? Can you explain how this works?

How is the circuit breaker algorithm designed? Why isn’t it half-open and half-closed?

In this article, to analyze the implementation algorithm and logic behind the index statistics in GO-Zero.

How to count indicators

We’ll go straight to breaker:

type googleBreaker struct {
  k     float64
  stat  *collection.RollingWindow
  proba *mathx.Proba
}
Copy the code

The default Breaker in Go-Zero is based on Google SRE.

When the breaker intercepts a request, it records the success/failure rate of the current request:

func (b *googleBreaker) doReq(req func(a) error.fallback func(err error) error.acceptable Acceptable) error{...// Execute the actual request function
  err := req()
  if acceptable(err) {
    // Execute: b.tat.Add(1)
    +1 = +1 = +1 = +1
    b.markSuccess()
  } else {
    // Same principle as above
    b.markFailure()
  }

  return err
}
Copy the code

So the bottom line is: when the request completes, the internal statistics structure will add statistics (either positive or negative) based on the number of errors. Statistics also need to evolve over time as they migrate over time.

Simply put: time series memory database [also not so fierce database, is a storage, just a memory version of]

Let’s talk about what data structure is used to organize this time series.

The sliding window

Let’s look at the rollingWindow definition data structure:

type RollingWindow struct {
    lock          sync.RWMutex
    size          int
    win           *window
    interval      time.Duration
    offset        int
    ignoreCurrent bool
    lastTime      time.Duration
  }
Copy the code

In the above structural definition, the window stores the attribute of the metric record.

A rollingWindow contains several buckets (this is up to the developer to define) :

Each bucket stores: Sum total number of successful requests, Count total number of requests. So when the breaker does the final calculation, it adds Sum cumulatively to accept Sum and Count cumulatively to total, so that the current error rate can be calculated.

How does the slide happen

First of all, for breaker, it needs to count the request status within a unit of time (such as 1s). Corresponding to the above bucket, we only need to record the indicator data within a unit of time in this bucket.

How can we ensure that the specified Bucket stores data per unit of time as time progresses?

The first thought: start a timer in the background to create a bucket every unit of time. Then when the request is made, the current timestamp falls in the bucket to record the current request status. Periodically creating buckets is a contradiction between critical conditions, data coming in, buckets not yet built.

The second way is to lazily create a bucket and then check and create a bucket when a data is encountered. Sometimes we have buckets, sometimes we don’t have buckets, and we create buckets in large numbers. Can we reuse buckets?

The Go-Zero approach is that the RollingWindow is created directly in advance, and the current time of the request is determined by an algorithm to the bucket and the request status is recorded.

Here’s how the breaker calls b.stat.add (1) :

func (rw *RollingWindow) Add(v float64) {
  rw.lock.Lock()
  defer rw.lock.Unlock()
  // The sliding action happens here
  rw.updateOffset()
  rw.win.add(rw.offset, v)
}

func (rw *RollingWindow) updateOffset(a) {
  span := rw.span()
  if span <= 0 {
    return
  }

  offset := rw.offset
  // Resets an expired bucket
  for i := 0; i < span; i++ {
    rw.win.resetBucket((offset + i + 1) % rw.size)
  }

  rw.offset = (offset + span) % rw.size
  now := timex.Now()
  // Update time
  rw.lastTime = now - (now-rw.lastTime)%rw.interval
}

func (w *window) add(offset int, v float64) {
  // Adds the specified index to the executing bucket
  w.buckets[offset%w.size].add(v)
}
Copy the code

The image above shows the window changes that happen to the bucket during Add(Delta). Explain:

  1. updateOffsetIs to dobucketUpdate, and determine where the current time fallsbucketIf the number of buckets exceeds, return the number of bucketsbucketreset
    • Determine the current time relative tobucket intervalIf the number of buckets exceeds, return the number of buckets directly.
    • Will span withinbucketAll clear the data.reset
    • updateoffsetIs the one that will be written tobucket
    • Update execution timelastTimeAnd make a sign for the next move
  2. Updated from last timeoffset, to the correspondingbucketWrite data

In this process, how to determine the bucket expiration point and update time. The most important part of the sliding window is the time update. The following figure is used to explain this process:

Since(rw.lastTime)/rw.interval


In this way, in the process of Add(), the window is sliding through the annotation of lastTime and nowTime, and the window is constantly resetting. New data is constantly filling in, thus realizing the window calculation.

conclusion

This paper analyzes the basic encapsulation of index statistics in go-zero framework and the implementation of sliding window rollingWindow. Of course, in addition, store/ Redis also exist indicators, this inside does not need to sliding window counting, because itself only need to calculate the hit rate, hit +1, miss +1 can be counted by indicators, the last statistics will know the hit rate.

Sliding window is suitable for the calculation of indicators in flow control, but also can achieve flow control.

For more articles on the design and implementation of Go-Zero, you can follow the “Microservices Practice” public account.

The project address

Github.com/tal-tech/go…

Welcome to Go-Zero and star support us!

Wechat communication group

Pay attention to the public account of “micro-service Practice” and click on the exchange group to obtain the QR code of the community group.