The long-dormant "Interview Coaching" series is back! With campus recruiting season approaching, it's time to put it on the agenda again. While combing through the standard interview questions, I also want to deepen my own understanding. I hope this helps everyone~

Overview

After giving a few readers interview coaching recently, I found some problems! In interviews, a lot of questions were aimed at the candidates' real project experience; I don't know whether you've all noticed this, orz.

The main project-related questions boil down to the following:

  • Why did you build a seckill system?
  • Where does your seckill design work well, and where does it fall short?
  • What is your system still missing?
  • Do you understand every part of your system: functionality, performance, data flow, and deployment architecture?
  • Technology selection: why Redis? Why MQ?
  • Technical risks: what benefits and risks does introducing this middleware bring to your system?
  • How do you handle disaster recovery and monitoring?

Today's article covers rate limiting, a key technical point in a seckill system. I'll approach it from several angles: technical principles, technology selection, and usage scenarios, so that you can hold your own in an interview.

What is rate limiting

An example everyone can picture: the Three Gorges Dam discharging water.

  • The Three Gorges reservoir storing water: the users queuing up for our seckill activity
  • Opening the gates: the activity begins
  • Water flowing downstream: users seckill successfully

What if there were no gates? The villages downstream would be flooded, and likewise, your system would collapse!

You may wonder: if I skip the impoundment step (i.e., the Three Gorges never held back that much water), do I still need to limit the flow? Yes. We all know how much the Three Gorges has done to tame historically devastating floods; there's plenty of popular science on the topic.

Back to our seckill system: how do we know when a surge of users will arrive? If we're not prepared, don't we lose those users? And a paralyzed system hurts our existing users too. Lose-lose.

What do I need this iron rod for?

This: rate limiting is the sea-calming iron rod of our system, the thing that lets the system stay calm.

Finally, some concrete numbers to illustrate a rate-limiting scenario:

1 product, 100 units of stock, within 1 second
5,000 users: 1,000 reach the order page, 4,000 get a timeout page
Of those 1,000: 100 place an order, 900 hit "insufficient stock"
Result: 100 successful orders, 4,900 failed orders
Rate limit: 1,000

Something to consider

Ask yourself: what is the maximum concurrency my service can handle?

How to implement rate limiting

Here is a simple call chain:

H5/Client -> Nginx -> Tomcat -> Seckill system -> DB

Roughly, this breaks down into:

  • Gateway rate limiting
    • Nginx rate limiting
    • Tomcat rate limiting
  • Server-side rate limiting
    • Single-machine rate limiting
    • Distributed (cluster) rate limiting

Gateway rate limiting

Nginx rate limiting

Nginx ships with two rate-limiting modules:

  • Connection-count limiting module: ngx_http_limit_conn_module
  • Request limiting module, implementing the leaky bucket algorithm: ngx_http_limit_req_module

1. ngx_http_limit_conn_module

This module is mainly used to defend against script attacks. When our seckill campaign starts, a hacker (let's pretend there is one; after all, our system is going to get bigger and stronger!) running an attack script will waste bandwidth and generate a large number of invalid requests. For that kind of traffic, we can limit the number of connections per IP.

We can implement this restriction by adding the following configuration inside the http {} block of nginx.conf:

# allocate a 10MB shared-memory zone keyed by client IP
limit_conn_zone $binary_remote_addr zone=one:10m;
limit_conn_log_level error;
limit_conn_status 503;
# limit each client IP to 1 concurrent connection
limit_conn one 1;

2. ngx_http_limit_req_module

That covers per-IP connection counts, but what if we want to control the number of requests? This module limits requests using a leaky bucket algorithm: a fixed number of requests is processed per second, and excess requests are delayed. If the request rate exceeds the configured limit for the zone, processing of the excess is delayed or the requests are discarded, so all requests are handled at the defined rate.

# 10MB zone keyed by client IP, allowing at most 1 request per second
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
# apply the limit, queueing bursts of up to 5 extra requests
limit_req zone=one burst=5;

3. How to understand connection limiting vs. request limiting

  • Connection limiting (ngx_http_limit_conn_module): we accept only one connection per IP, and take the next one only after the current one finishes. (Only one connection per IP is being processed at any moment.)

A crude analogy: each IP gets a toilet with a single stall; only after I finish can the next person go in.

  • Request limiting (ngx_http_limit_req_module): uses the leaky bucket algorithm to release requests at a fixed rate per unit of time, regardless of whether your server has finished the previous ones. I release them on schedule, no matter what!

A crude analogy: the toilet has five stalls; I let five people in per minute, and five more the next minute. There might be five people inside, there might be ten; I don't know.

4. How to choose?

Once the interviewer hears that you know about Nginx rate limiting, they may ask which strategy you would use in which situation:

  • Connection (IP) limiting: can be configured before the activity starts, and also guards against script attacks (IP proxies are another story)
  • Request limiting: can be configured around the activity schedule to keep a sudden burst of traffic from crushing our servers

Leaky bucket algorithm

The main concepts of the leaky bucket algorithm are as follows:

  • The bucket has a fixed capacity, and water leaks out at a constant rate;
  • If the bucket is empty, no water flows out;
  • Water can flow into the bucket at any rate;
  • If the incoming water exceeds the bucket's capacity, it overflows (is discarded), while the bucket's capacity stays constant.
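
To make the mechanics concrete, here is a minimal single-machine leaky bucket sketched in Java. This is purely illustrative (Nginx's real implementation is in C and shared across worker processes); the class and field names are my own:

public class LeakyBucket {
    private final long capacity;        // how much "water" (queued requests) the bucket holds
    private final double leakRatePerMs; // constant outflow rate, requests per millisecond
    private double water = 0;           // current water level
    private long lastLeakMs = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakRatePerMs) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerMs;
    }

    // returns true if the request fits in the bucket, false if it overflows
    public synchronized boolean tryAccept() {
        long now = System.currentTimeMillis();
        // drain the water that leaked out since the last call
        water = Math.max(0, water - (now - lastLeakMs) * leakRatePerMs);
        lastLeakMs = now;
        if (water + 1 > capacity) {
            return false; // bucket full: this drop overflows and is discarded
        }
        water += 1;
        return true;
    }
}

Inflow can happen at any rate, but outflow is pinned to leakRatePerMs, which is exactly the smoothing property described in the list above.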

Tomcat rate limiting

This one doesn't come up much in practice, but it's worth knowing.

Tomcat now ships embedded in Spring Boot, which makes us developers lazier and lazier. But laziness is the root driver of human progress, so that's not a bad thing.

In the Tomcat configuration file there is a maxThreads setting:

<Connector port="8080" connectionTimeout="30000" protocol="HTTP/1.1"
          maxThreads="1000" redirectPort="8000" />

There isn't much else to introduce here, but if your load tests can't push concurrency any higher, check this setting.
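
Since Spring Boot embeds Tomcat, the same knob is usually set through configuration properties instead of server.xml. A sketch for application.properties, assuming Spring Boot 2.3+ property names (older versions used server.tomcat.max-threads):

# application.properties (property names assume Spring Boot 2.3+)
server.tomcat.threads.max=300
# requests that queue up when all threads are busy
server.tomcat.accept-count=100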

In a past interview, the interviewer grilled me on Tomcat:

What is Tomcat's default maximum number of connections? How many threads did your server configure? How much memory does each thread consume?

Summary

Nginx's ngx_http_limit_conn_module lets an IP hold only one connection at a time, which stops a single user from spamming requests and relieves pressure on the service. Once users are on the order page, though, each of them generates multiple requests per unit of time, so use ngx_http_limit_req_module to limit the request rate instead, and avoid losing orders to an overly strict per-IP connection limit.

In addition, before the service goes live, load-test the server for its maximum concurrency (say, 200 concurrent requests), and then set Tomcat's maximum threads slightly higher (say, 300) to leave headroom for other requests.

Server-side rate limiting

Single-machine rate limiting

If our system is deployed on just one machine, we can go straight to a single-machine rate-limiting solution (after all, distributed rate limiting for a single machine would be a bit much~). The classic choice is Guava's RateLimiter; add the dependency:

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>30.1.1-jre</version>
</dependency>

Sample code:

import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.RateLimiter;

public static void main(String[] args) throws InterruptedException {
    // 1 token generated per second, with a 1-second warmup period
    RateLimiter rt = RateLimiter.create(1, 1, TimeUnit.SECONDS);
    System.out.println("try acquire token: " + rt.tryAcquire(1) + " time:" + System.currentTimeMillis());
    System.out.println("try acquire token: " + rt.tryAcquire(1) + " time:" + System.currentTimeMillis());
    Thread.sleep(2000);
    System.out.println("try acquire token: " + rt.tryAcquire(1) + " time:" + System.currentTimeMillis());
    System.out.println("try acquire token: " + rt.tryAcquire(1) + " time:" + System.currentTimeMillis());

    System.out.println("------------- separator -----------------");
}

Both RateLimiter.tryAcquire() and RateLimiter.acquire() obtain tokens from the limiter.

1. tryAcquire

It accepts a wait timeout; canAcquire works out when the earliest token will be generated and decides whether it is worth waiting for the next token.

public boolean tryAcquire(int permits, long timeout, TimeUnit unit);

// acquirable only if the earliest available token arrives within the timeout window
private boolean canAcquire(long nowMicros, long timeoutMicros) {
    return queryEarliestAvailable(nowMicros) - timeoutMicros <= nowMicros;
}

Example code:

public static void main(String[] args) throws InterruptedException {
    // 1 token generated per second, with a 3-second warmup period
    RateLimiter rt = RateLimiter.create(1, 3, TimeUnit.SECONDS);
    System.out.println("try acquire token: " + rt.tryAcquire(1, TimeUnit.SECONDS) + " time:" + System.currentTimeMillis());
    System.out.println("try acquire token: " + rt.tryAcquire(5, TimeUnit.SECONDS) + " time:" + System.currentTimeMillis());
    Thread.sleep(10000);
    System.out.println("------------- separator -----------------");
    System.out.println("try acquire token: " + rt.tryAcquire(1, TimeUnit.SECONDS) + " time:" + System.currentTimeMillis());
    System.out.println("try acquire token: " + rt.tryAcquire(1, TimeUnit.SECONDS) + " time:" + System.currentTimeMillis());
}

Output results:

2. acquire

acquire() blocks until a token is obtained.

Example code:

// 1 token generated per second
RateLimiter rt = RateLimiter.create(1);
for (int i = 0; i < 11; i++) {
    new Thread(() -> {
        // block until 1 token is acquired
        rt.acquire();
        System.out.println("Try acquire token success, time:" + System.currentTimeMillis() + " ThreadName:" + Thread.currentThread().getName());
    }).start();
}

Output results:

Token bucket algorithm

In Nginx we met the leaky bucket algorithm; RateLimiter is built on the token bucket algorithm.

Picture a bucket of limited capacity into which tokens are added at a fixed rate. Because the capacity is limited, tokens cannot pile up forever: if the bucket is already full when a new token arrives, that token is discarded. Each request removes n tokens from the bucket; if fewer than n tokens remain, the request is rejected or blocked.
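
To contrast with the leaky bucket above, here is a minimal token bucket sketched in Java (illustrative only; Guava's real implementation, whose key fields are shown next, refills lazily on each call rather than on a timer):

public class TokenBucket {
    private final double maxPermits;   // bucket capacity
    private final double permitsPerMs; // token production rate
    private double storedPermits = 0;  // tokens currently in the bucket
    private long lastRefillMs = System.currentTimeMillis();

    public TokenBucket(double maxPermits, double permitsPerSecond) {
        this.maxPermits = maxPermits;
        this.permitsPerMs = permitsPerSecond / 1000.0;
    }

    // take n tokens if available; production beyond capacity is discarded
    public synchronized boolean tryAcquire(int n) {
        long now = System.currentTimeMillis();
        // top up with tokens produced since the last call, capped at maxPermits
        storedPermits = Math.min(maxPermits, storedPermits + (now - lastRefillMs) * permitsPerMs);
        lastRefillMs = now;
        if (storedPermits < n) {
            return false; // not enough tokens: reject (or block, in a blocking variant)
        }
        storedPermits -= n;
        return true;
    }
}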

Guava's implementation has a few key attributes:

  /** The currently stored permits. */
  double storedPermits; // tokens currently in the bucket

  /** The maximum number of stored permits. */
  double maxPermits; // maximum tokens the bucket can hold

  private long nextFreeTicketMicros = 0L; // earliest time the next token can be handed out

Before a token is handed out, there is a check: has enough time passed, relative to the last grant time plus the token-production interval, for the requested token to be available?

For example: if I last took a token at second 100 and a new token is produced every 10 seconds, then when I come back at second 105 I cannot get one, regardless of what is in the bucket.

  private boolean canAcquire(long nowMicros, long timeoutMicros) {
    return queryEarliestAvailable(nowMicros) - timeoutMicros <= nowMicros;
  }

Here comes the key point!!

So what is the token count (the number of tokens a request needs) actually for? We can charge different requests a different number of tokens: a high-priority request needs only 1 token, while a low-priority one needs several. Once the token-availability time check passes, the next layer checks whether enough tokens are stored, so high-priority requests (the ones needing few tokens) can be released immediately!
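
A quick Guava illustration of this weighting idea (the priority framing is mine; RateLimiter itself simply charges more permits for bigger requests):

RateLimiter limiter = RateLimiter.create(10); // 10 tokens per second
// high-priority path: cheap, slips through quickly
double waitedSeconds = limiter.acquire(1);
// low-priority path: charged 5 tokens, so it waits longer under load
double waitedLonger = limiter.acquire(5);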

The token-refresh algorithm in RateLimiter:

  void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    if (nowMicros > nextFreeTicketMicros) {
      double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
      storedPermits = min(maxPermits, storedPermits + newPermits);
      nextFreeTicketMicros = nowMicros;
    }
  }

Cluster rate limiting

As our seckill system gets bigger and stronger, one machine can no longer meet our needs, and the deployment evolves into a cluster (simplified): Nginx fanning requests out across multiple application servers.

Before getting into cluster rate limiting, here is a question worth considering:

Can a cluster deployment not simply reuse the single-machine solution?

It can: we could extend the single-machine scheme to every machine in the cluster, with each machine running the same rate-limiting code (the RateLimiter implementation).

So what’s wrong with this scheme?

  • Uneven traffic distribution
  • False limiting and missed limiting
  • Slow threshold updates

Our servers receive requests as Nginx distributes them. Within a given window the distribution can be uneven: say a 60/30/10 split across three machines, each limited to 50 QPS. The first machine then triggers its limit even though the cluster-wide threshold is 150 QPS and actual traffic is only 100 QPS. Limiting at 100 QPS against a 150 QPS budget simply won't do!

Redis implementation

Reference: juejin.cn/post/684490…

We can use Redis's sorted set (ZSet) to implement time-window rate limiting. The idea: the ZSet key stores the rate-limit identifier, and each member's score stores the time of a request. Whenever a request arrives, first clear out the records belonging to the previous time window; if the count in the current window is already at or above the allowed maximum, return false and apply limiting; otherwise let the business logic run and add a record of this access to the ZSet.
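
Here is a sketch of that flow using the Jedis client (key and parameter names are mine; note that the check-then-add sequence below is exactly the non-atomic gap called out in the drawbacks next):

import redis.clients.jedis.Jedis;

public class RedisSlidingWindowLimiter {
    private final Jedis jedis;

    public RedisSlidingWindowLimiter(Jedis jedis) {
        this.jedis = jedis;
    }

    // true if this request is allowed inside the current window
    public boolean isAllowed(String limitId, long windowMs, long maxCount) {
        long now = System.currentTimeMillis();
        // 1. evict records that fell out of the window
        jedis.zremrangeByScore(limitId, 0, now - windowMs);
        // 2. count the requests still inside the window
        if (jedis.zcard(limitId) >= maxCount) {
            return false; // over the limit
        }
        // 3. record this access (the member must be unique per request)
        jedis.zadd(limitId, now, now + "-" + System.nanoTime());
        return true;
    }
}

Wrapping steps 1 to 3 in a Lua script (or MULTI/EXEC) is the usual way to close the atomicity gap.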

There are two disadvantages to this implementation:

  • ZSet stores a record for every access, so with heavy traffic it eats a lot of memory; imagine a 60s window that allows 1,000,000 requests;
  • The execution is non-atomic: it checks first and adds afterwards, and other business logic can interleave in the gap, producing inaccurate results.

Rate-limiting middleware

Sentinel is a lightweight, highly available flow-control component for distributed service architectures, developed by Alibaba's middleware team. Sentinel takes traffic as its entry point and helps protect service stability along several dimensions: flow control, circuit breaking and degradation, and system load protection.

The principles behind rate-limiting middleware matter a lot, so here I only sketch the differences; I will write a separate article on how Sentinel is implemented! For now it is enough to know that the underlying mechanism is based on sliding windows.
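
Before that deep dive, here is roughly what Sentinel flow control looks like in code. A minimal sketch using Sentinel's core API; the resource name and the 100 QPS threshold are made-up values:

import java.util.Collections;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class SentinelDemo {
    public static void main(String[] args) {
        // limit the "seckillOrder" resource to 100 QPS
        FlowRule rule = new FlowRule("seckillOrder");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(100);
        FlowRuleManager.loadRules(Collections.singletonList(rule));

        Entry entry = null;
        try {
            entry = SphU.entry("seckillOrder");
            // protected business logic: place the order
        } catch (BlockException e) {
            // the request was rate limited: degrade gracefully here
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
    }
}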

Sliding window algorithm

Sentinel and Hystrix both use sliding windows under the hood, so here is a quick sketch of the idea. Suppose within 1s I allow 5 requests, counted in buckets of 0~200ms, 200~400ms, and so on. As time moves on, the window slides, say to 200ms~1200ms, so the very first request drops out of the statistics. The running total is now 4 requests, which means a new request can be accepted!
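
A hedged sketch of such a bucketed sliding window in Java (5 buckets of 200 ms covering a 1 s window; the bookkeeping is much simplified compared to Sentinel's LeapArray):

public class SlidingWindowLimiter {
    private static final int BUCKETS = 5;      // 5 buckets...
    private static final long BUCKET_MS = 200; // ...of 200 ms each = 1 s window
    private final long[] counts = new long[BUCKETS];
    private final long[] starts = new long[BUCKETS];
    private final long limit;

    public SlidingWindowLimiter(long limit) {
        this.limit = limit;
    }

    public synchronized boolean tryPass() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / BUCKET_MS) % BUCKETS);
        long bucketStart = now - now % BUCKET_MS;
        if (starts[idx] != bucketStart) { // this slot held an old bucket: reset it
            starts[idx] = bucketStart;
            counts[idx] = 0;
        }
        long total = 0;
        for (int i = 0; i < BUCKETS; i++) {
            if (now - starts[i] < BUCKETS * BUCKET_MS) { // still inside the sliding window
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false; // window already full: reject this request
        }
        counts[idx]++;
        return true;
    }
}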

Conclusion

One last ramble: in the diagram I drew, I listed Hystrix, Sentinel, and Ant Group's internal Guardian. They all share one theme: protection. A porcupine's hard quills protect its soft body, and sentinels and guardians protect whatever stands behind them.

When an interviewer asks why you use rate limiting, your first instinct should be: to protect the system, to keep the system from harm! That is the fundamental reason behind all the different limiting strategies.

When we talk about high availability, we think of peak shaving, rate limiting, and circuit breaking. Their shared goal is to protect the system and raise its availability; the "nines" of availability we like to quote are earned through exactly these high-availability strategies.

Follow-up plan:

  • Circuit breaking: combined with Sentinel, introducing how a seckill system uses circuit breaking
  • Peak shaving: combined with RocketMQ, weighing the pros and cons of peak shaving and introducing the costs and risks of MQ

Follow me, and you won't get lost

Well folks, that's all for this post. I'll keep publishing a few quality articles every week on big-tech interviews and common technology stacks. Thanks for reading this far; if you think this article is well written, please like, comment, and share!! Creating content isn't easy; thank you for your support and recognition, and see you in the next article!

I'm Jiuling. Folks who want to chat can add me on WeChat: JayCE-K. Follow my official account, Java tutorial, to get first-hand material!

If there are any mistakes in this post, please point them out in the comments. Many thanks!