Preface

In high-concurrency systems, there are three powerful tools for protecting the system: caching, degradation, and rate limiting. Common rate-limiting scenarios are sudden bursts of concurrent traffic, such as flash sales (seckill), order placement, and commenting.

  1. The purpose of caching is to improve system access speed and increase system throughput.

  2. Degradation means that when a service has a problem, or a non-core service affects the performance of the core flow, it is temporarily disabled and re-enabled after the traffic peak has passed or the problem has been resolved.

  3. Some scenarios cannot be solved by caching or degradation, such as scarce resources (flash sales, panic buying), write services (comments, orders), and frequent complex queries (e.g. the latest comments). These scenarios need a way to limit the amount of concurrency/requests, which is called rate limiting.

Main text

The purpose of rate limiting

Rate limiting protects the system by capping the rate of concurrent access/requests, or the number of requests within a time window. Once the limit is reached, the system can deny service (redirect to an error page or report that the resource is unavailable), queue or wait (e.g. flash sales, comments, orders), or degrade (return fallback or default data; for example, a product detail page can show inventory as available by default).

Approaches to rate limiting

  1. Limit total concurrency (e.g. database connection pools, thread pools)

  2. Limit the number of instantaneous concurrent connections (e.g. Nginx's limit_conn module)

  3. Limit the average rate within a time window (e.g. Guava's RateLimiter and Nginx's limit_req module, which limit the average rate per second)

  4. Limit the remote interface call rate

  5. Limit the consumption rate of MQ

  6. You can also limit traffic based on the number of network connections, network traffic, or CPU or memory load

Rate-limiting algorithms

1. Token bucket

2. Leaky bucket

3. Counter

Sometimes a counter can also be used for rate limiting. It is mainly used to cap a total number of concurrent requests, such as the size of a database connection pool, the size of a thread pool, or the concurrency of a flash sale: traffic is limited by a threshold on the global total of requests, or on the total number of requests within a certain period. This is a simple, crude form of rate limiting, not average-rate limiting.

Token buckets vs. leaky buckets

The token bucket limits the average inflow rate and allows a certain amount of burst traffic: requests can arrive in bursts as long as tokens remain in the bucket.

The leaky bucket limits the outflow to a constant rate, smoothing out bursty inflow.
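To make the contrast concrete, here is a minimal token-bucket sketch in Java (the TokenBucket class and all its names are illustrative, not from any library): tokens accumulate at a fixed rate up to the capacity, so a burst can be served immediately as long as tokens remain.

/** A minimal token-bucket sketch: refills at a fixed rate, allows bursts up to capacity. */
public class TokenBucket {
    private final long capacity;         // maximum burst size
    private final double refillPerNano;  // tokens added per nanosecond
    private double tokens;               // currently available tokens
    private long lastRefill;             // timestamp of the last refill (nanos)

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Returns true if the request is allowed (a token was consumed)
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}

A leaky bucket would instead queue incoming requests and drain them at the fixed rate, so the outflow never exceeds the configured rate even during a burst.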

Application-level rate limiting

1. Limiting the total number of resources

You can use pooling techniques to limit the total number of resources, such as connection pools and thread pools. For example, if each application is allocated 100 database connections, it can use at most 100 of them; beyond that, requests can either wait or throw an exception.
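As a minimal sketch of this idea (the class name and the limit of 100 are illustrative), a counting Semaphore can cap how many callers use a resource at once:

import java.util.concurrent.Semaphore;

public class ResourceLimiter {
    // At most 100 callers may use the resource at the same time
    private final Semaphore permits = new Semaphore(100);

    public void execute(Runnable task) {
        if (!permits.tryAcquire()) {
            // Over the limit: fail fast (alternatively, block with permits.acquire())
            throw new IllegalStateException("resource limit reached");
        }
        try {
            task.run();
        } finally {
            permits.release(); // always return the permit
        }
    }
}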

2. Limiting the total number of concurrent connections/requests

If you have used Tomcat, its Connector configuration includes the following parameters:

  • maxThreads: the maximum number of threads Tomcat starts to process requests. If the request volume keeps far exceeding what the threads can handle, the server may become unresponsive.

  • maxConnections: the maximum number of connections accepted and processed at any time; connections beyond this number are queued.

  • acceptCount: if all Tomcat threads are busy handling requests, new connections are placed in a queue; once that queue is full, new connections are rejected.
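For reference, these parameters are set on the Connector element in Tomcat's server.xml; a sketch with illustrative values:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="200"
           maxConnections="10000"
           acceptCount="100" />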

3. Limiting the total concurrency/requests on an interface

Using AtomicLong in Java, the code looks like this (limit is the maximum concurrency allowed):

try {
    if (atomic.incrementAndGet() > limit) {
        // Reject the request
    } else {
        // Process the request
    }
} finally {
    atomic.decrementAndGet();
}

4. Limiting the number of requests in a time window on an interface

Using Guava's Cache, the code looks like this:

LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long seconds) throws Exception {
                return new AtomicLong(0);
            }
        });
long limit = 1000;
while (true) {
    // Get the current second
    long currentSeconds = System.currentTimeMillis() / 1000;
    // Count this request against the current second's counter
    if (counter.getUnchecked(currentSeconds).incrementAndGet() > limit) {
        System.out.println("Rate limited: " + currentSeconds);
        continue;
    }
    // Business processing
}
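Note that this is a fixed-window counter: a burst that straddles a second boundary can briefly let through up to twice the limit, which is one reason to use the smoothing approach described next.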

5. Smoothing the request rate on an interface

The rate-limiting approaches above cannot handle sudden bursts of requests well: an instantaneous burst may be let through entirely, causing problems. In some scenarios, therefore, bursty requests need to be reshaped so that they are processed at an average rate.

Guava's RateLimiter provides implementations of the token bucket algorithm:

  1. SmoothBursty

  2. SmoothWarmingUp

SmoothBursty

// 5 permits are generated per second; SmoothBursty lets unused permits accumulate for bursts
RateLimiter limiter = RateLimiter.create(5);
// acquire() blocks until a permit is available and returns the time spent waiting, in seconds
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());

You get output similar to the following:

0.0
0.198239
0.196083
0.200609
0.199599
0.19961

SmoothWarmingUp

RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);
for (int i = 1; i < 5; i++) {
    System.out.println(limiter.acquire());
}

Thread.sleep(1000L);
for (int i = 1; i < 5; i++) {
    System.out.println(limiter.acquire());
}

You get output similar to the following:

0.0
0.51767
0.357814
0.219992
0.199984
0.0
0.360826
0.220166
0.199723
0.199555

SmoothWarmingUp is created by:

RateLimiter.create(double permitsPerSecond, long warmupPeriod, TimeUnit unit);
  • permitsPerSecond: the number of new permits (tokens) generated per second
  • warmupPeriod: the time it takes to go from the cold-start rate to the average rate

The rate rises in a trapezoidal fashion: during a cold start the limiter begins at a slower rate and gradually speeds up until it reaches the average rate, then stays there. Adjusting the warmupPeriod parameter can yield a smooth, fixed rate from the start.

Distributed rate limiting

The key to distributed rate limiting is to make the rate-limiting operation atomic across application nodes; this can be implemented with Redis + Lua or Nginx + Lua.
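As an illustration of the Redis + Lua approach, the sketch below implements a per-second fixed-window counter whose check-and-increment runs atomically inside Redis; the use of the Jedis client, the key naming, and the allow method are assumptions for this example:

import java.util.Collections;
import redis.clients.jedis.Jedis;

public class RedisRateLimiter {
    // Executed atomically by Redis: increment this second's counter,
    // set it to expire, and compare against the limit
    private static final String SCRIPT =
            "local current = redis.call('incr', KEYS[1]) " +
            "if current == 1 then redis.call('expire', KEYS[1], 1) end " +
            "if current > tonumber(ARGV[1]) then return 0 else return 1 end";

    public static boolean allow(Jedis jedis, String resource, long limit) {
        String key = "rate:" + resource + ":" + (System.currentTimeMillis() / 1000);
        Object result = jedis.eval(SCRIPT, Collections.singletonList(key),
                Collections.singletonList(String.valueOf(limit)));
        return Long.valueOf(1L).equals(result);
    }
}

Because the script runs on the Redis server, all application instances share a single counter without any race between the read and the increment.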

Rate limiting at the access layer

The access layer usually refers to the entry point for request traffic. Its main purposes are:

  • Load balancing
  • Illegal request filtering
  • Request aggregation
  • Caching, degradation, and rate limiting
  • A/B testing
  • Service quality monitoring

For Nginx, there are two main modules for access-layer rate limiting: the connection-count module ngx_http_limit_conn_module and the leaky-bucket request module ngx_http_limit_req_module. For more complex rate-limiting scenarios, you can also use the Lua module lua-resty-limit-traffic provided by OpenResty.

  • limit_conn: limits the total number of network connections for a given key, for example per IP address or per domain name.

  • limit_req: limits the average request rate for a given key, in either smooth mode (delay) or burst-with-no-delay mode (nodelay); a configuration sketch follows below.

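A minimal configuration sketch of the two modules (the zone names, sizes, and rates are illustrative):

http {
    # limit_conn: at most 10 concurrent connections per client IP
    limit_conn_zone $binary_remote_addr zone=perip:10m;

    # limit_req: average 10 requests/second per client IP
    limit_req_zone $binary_remote_addr zone=perip_req:10m rate=10r/s;

    server {
        location / {
            limit_conn perip 10;
            limit_req zone=perip_req burst=20;  # smooth mode; append "nodelay" for burst mode
        }
    }
}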


Welcome to follow the WeChat public account: Zero One Technology Stack

This account will continue to share learning materials and articles on back-end technologies, including virtual machine fundamentals, multithreaded programming, high-performance frameworks, asynchronous processing, caching and messaging middleware, distributed systems and microservices, and architecture learning and growth.