Every system is designed with an estimated capacity; if its TPS/QPS threshold is exceeded for an extended period, the system may be overwhelmed and the whole service can become unavailable. To avoid this, we need to limit the rate of interface requests.

The purpose of traffic limiting is to protect the system by limiting the number of concurrent requests or the number of requests within a time window. Once the rate reaches the limit, the system can deny service, queue requests, or make callers wait.

Common limiting modes include controlling concurrency (limiting the number of concurrent requests), controlling the access rate, and limiting the number of requests per unit time window.

Controlling the number of concurrent requests is a common way to limit traffic; in practice it can be implemented with a semaphore mechanism (such as Semaphore in Java). For example, suppose we provide a service interface that allows at most 10 concurrent requests. The code is as follows:

```java
import java.util.concurrent.Semaphore;

public class DubboService {

    private final Semaphore permit = new Semaphore(10, true);

    public void process() {
        try {
            permit.acquire();
            try {
                // Handle the request here.
            } finally {
                // Release only after the permit was actually acquired.
                permit.release();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
```

With this code, even if 30 threads call process() at the same time, only 10 are allowed to execute concurrently. Semaphore(int permits) takes an integer indicating the number of available permits; Semaphore(10) means that 10 threads may hold a permit at once, i.e., the maximum concurrency is 10. Semaphore is also very simple to use: a thread first calls acquire() to obtain a permit, and after processing calls release() to return it; alternatively, tryAcquire() attempts to obtain a permit without blocking.
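To make this concrete, here is a minimal driver (the demo class below is our own addition, not part of the original example): 30 tasks are submitted, but the semaphore keeps at most 10 of them inside process() at any moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SemaphoreDemo {
    public static void main(String[] args) {
        DubboService service = new DubboService();
        // 30 threads compete, but at most 10 hold a permit concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(30);
        for (int i = 0; i < 30; i++) {
            pool.execute(service::process);
        }
        pool.shutdown();
    }
}
```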

To control the access rate, the token bucket algorithm is what we most commonly use in engineering practice. Other algorithms, such as the leaky bucket, can also control the rate, but we have not used them in our practice and do not cover them here; interested readers can look them up on their own.

The token bucket algorithm is described on Wikipedia as follows:

A token is added to the bucket every 1/r seconds.

The bucket can hold at most b tokens; if the bucket is full, newly added tokens are discarded.

When an n-byte packet arrives, n tokens are consumed and the packet is sent.

If fewer than n tokens are available in the bucket, the packet is buffered or discarded.

The token bucket controls the amount of data that passes within a time window. In API terms, QPS and TPS are precisely the number of requests or transactions in a time window, with the window fixed at 1s. Tokens are put into the bucket at a constant rate; a request must first obtain a token from the bucket before it is processed, and when no token is available the service is denied. Another benefit of the token bucket is that the rate is easy to change: simply adjust the rate at which tokens are put into the bucket as needed.
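To make the mechanics concrete, here is a minimal token-bucket sketch under our own naming (TokenBucket, capacity, refillRate are illustrative, not from any library): tokens are refilled lazily from the elapsed time, and a request is denied when the bucket cannot supply enough tokens.

```java
// A minimal token-bucket sketch; not production code.
public class TokenBucket {

    private final long capacity;      // b: maximum tokens the bucket holds
    private final double refillRate;  // r: tokens added per second
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Try to take n tokens; returns false (deny service) if the bucket
    // cannot supply them.
    public synchronized boolean tryAcquire(int n) {
        refill();
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }

    // Refill lazily: add r tokens per elapsed second, capped at capacity.
    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillRate);
        lastRefillNanos = now;
    }
}
```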

In our engineering practice, we usually use RateLimiter in Guava to control the rate. For example, suppose we do not want to submit more than 2 tasks per second:

```java
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;
import java.util.concurrent.Executor;

// The rate is two permits per second.
final RateLimiter rateLimiter = RateLimiter.create(2.0);

void submitTasks(List<Runnable> tasks, Executor executor) {
    for (Runnable task : tasks) {
        rateLimiter.acquire(); // may block while waiting for a permit
        executor.execute(task);
    }
}
```
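Note that acquire() blocks until a permit becomes available. When blocking is not acceptable, Guava's RateLimiter also provides tryAcquire(), which returns immediately; a sketch of that variant:

```java
// Non-blocking variant: reject or drop the task instead of waiting.
if (rateLimiter.tryAcquire()) {
    executor.execute(task);
} else {
    // Over the rate limit: drop the task, queue it elsewhere, or return an error.
}
```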

In some scenarios, we want to limit the number of requests or calls to an interface or service per second/minute/day, i.e., to control the number of requests within a unit time window. For example, to limit the number of service calls per second to 50, one implementation is as follows:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// One counter per second; entries expire 2 seconds after being written.
private LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long seconds) throws Exception {
                return new AtomicLong(0);
            }
        });

public static long permit = 50;

// ResponseEntity is assumed to be the project's own response wrapper
// with a code/msg builder, not Spring's class of the same name.
public ResponseEntity getData() throws ExecutionException {
    // Get the current second.
    long currentSeconds = System.currentTimeMillis() / 1000;
    if (counter.get(currentSeconds).incrementAndGet() > permit) {
        return ResponseEntity.builder().code(404).msg("Access rate too fast").build();
    }
    // Not rate-limited: process the request and return the normal response.
}
```

This concludes the section on traffic limiting. The next article will cover downgrading and circuit breakers.