Most people have seen seckill (flash sale) systems, such as the flash sales on JD.com or Taobao, or Xiaomi's phone flash sales. So how is the backend of a seckill system implemented? How do we design one? What must be considered, and how do we make it robust? This article discusses these questions.

What questions should be considered

Oversold problem

When analyzing the business scenario of a seckill, the most important point is the oversold problem. Suppose there are only 100 items in stock but 200 end up being sold. Seckill prices are generally very low, so overselling seriously hurts the company's financial interests. The first thing to solve is therefore the oversold problem.

High concurrency

A seckill is short-lived and highly concurrent: the event usually lasts only a few minutes. To create a sensation, companies attract users with very low prices, so a great many users participate in the buying. A huge number of requests flood in within a short window, and we must consider how to keep cache breakdown or cache failure in the backend from letting that traffic crush the database.

Interface brushing

Nowadays most seckills attract dedicated seckill software. Such tools simulate clients and fire requests at the backend server continuously; hundreds of requests per second are common. How to block these repeated, invalid requests is another thing we need to consider.

The seckill URL

Ordinary users see a simple seckill page: the buy button is grayed out until the specified time arrives, then becomes clickable. That is enough for casual users, but anyone with a little technical background can open the browser's developer tools (F12), watch the network tab to find the seckill URL, and drive it with a script. If the URL is knowable ahead of time, a single direct request completes the seckill. This is a problem we need to think about.

Database design

There is a risk that the seckill traffic will bring down our database server, and if the seckill shares a database with our other businesses, it may drag them down too. How do we prevent this? Even if the seckill database crashes or hangs, normal online services should be affected as little as possible.

Mass request problem

As noted under "high concurrency", even caching alone may be insufficient to absorb a short burst of this scale. Carrying such a huge number of visits while keeping the service stable and low-latency is a major challenge. Let's do some arithmetic: a single Redis server can sustain roughly 40,000 QPS, but a popular seckill can drive single-instant QPS into the hundreds of thousands, so one Redis instance cannot support such a load. The cache gets broken through, requests penetrate straight to the DB, MySQL goes down, and the backend fills with errors.

Seckill system design and technical solutions

Seckill system database design

For the "database design" problem, a separate database should be used, so that high concurrent access to the activity cannot crash the whole site. Only two tables are strictly needed here: one for seckill orders and one for seckill goods.

In practice there would be a few more tables. A goods table, joined on goods_id, holds the full product information: image, name, usual price, seckill price, and so on. A user table, keyed on user_id, holds the nickname, phone number, shipping address and other details. Those are omitted from this example.
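As a rough sketch, the two seckill tables might look like the following. The table name miaosha_goods matches the SQL used later in this article; all column names here are illustrative assumptions, not a prescribed schema.

```sql
-- Seckill goods: one row per item participating in the activity.
CREATE TABLE miaosha_goods (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    goods_id      BIGINT NOT NULL,           -- joins to the main goods table
    seckill_price DECIMAL(10,2) NOT NULL,
    stock         INT NOT NULL,
    version       INT NOT NULL DEFAULT 0,    -- for the optimistic lock used later
    start_time    DATETIME NOT NULL,
    end_time      DATETIME NOT NULL
);

-- Seckill orders: one row per successful seckill.
CREATE TABLE miaosha_order (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id     BIGINT NOT NULL,
    goods_id    BIGINT NOT NULL,
    create_time DATETIME NOT NULL,
    UNIQUE KEY uk_user_goods (user_id, goods_id)  -- at most one order per user per item
);
```

The unique key on (user_id, goods_id) is a cheap database-level guard against one user placing duplicate orders.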

Seckill URL design

To stop people with programming experience from bypassing the ordering page and hitting the backend interface directly, we make the URL dynamic; even the developers of the system cannot know it before the seckill starts. The specific approach is to hash a random string with MD5 to form part of the seckill URL. The front end first asks the backend for the concrete URL, and the backend verifies it before allowing the seckill to proceed.
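A minimal sketch of this idea in Java: generate a random MD5 path per user and activity, store it server-side (in practice in Redis with a short TTL), and verify it on the actual seckill request. Class, method, and salt names here are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.UUID;

public class SeckillUrl {
    // Hypothetical salt; in a real system keep it server-side, per activity.
    private static final String SALT = "seckill-salt";

    // Generate a one-off path segment for this user's seckill request.
    public static String createPath(long goodsId) {
        String random = UUID.randomUUID().toString();
        return md5Hex(random + "_" + goodsId + "_" + SALT);
    }

    // On the real seckill request, compare the path the client sent
    // with the one we stored for this user when they fetched the URL.
    public static boolean checkPath(String storedPath, String requestPath) {
        return storedPath != null && storedPath.equals(requestPath);
    }

    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The flow is: the front end calls an endpoint that returns createPath(...) after the start time; the seckill endpoint itself only proceeds when checkPath(...) succeeds.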

Static seckill pages

The product description, parameters, transaction records, images, reviews and so on are all baked into a static page, so user requests do not need to hit the application server or go through the database; the page is delivered straight to the client, reducing server pressure as much as possible. This can be done with FreeMarker templating: create a page template, fill it with data, and render the page ahead of time.
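For illustration, a FreeMarker template for the goods page might look like the fragment below; the field names on the goods model are assumptions, not a fixed API.

```html
<!-- goods_detail.ftl: minimal FreeMarker sketch; model field names are illustrative -->
<html>
<body>
  <h1>${goods.name}</h1>
  <img src="${goods.imgUrl}"/>
  <p>Usual price: ${goods.price} &mdash; seckill price: ${goods.seckillPrice}</p>
  <p>Starts at: ${goods.startTime?string("yyyy-MM-dd HH:mm:ss")}</p>
</body>
</html>
```

The rendered HTML is then pushed to a CDN or served as a static file, so the seckill page itself never touches the database.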

Upgrade from a single Redis to a Redis cluster

Seckill is a read-heavy, write-light scenario, which makes Redis an ideal cache. However, considering the cache breakdown problem, we should deploy Redis as a cluster with Sentinel, which improves both the performance and the availability of Redis.

Use Nginx

Nginx is a high-performance web server that can handle tens of thousands of concurrent connections, whereas a single Tomcat handles only a few hundred. Having Nginx accept client requests and distribute them across a cluster of back-end Tomcat servers greatly improves the concurrency we can serve.
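A minimal sketch of that setup as an Nginx config, assuming three Tomcat instances behind it; the addresses are placeholders.

```nginx
# Load-balance client requests across a Tomcat cluster.
upstream tomcat_cluster {
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
    server 192.168.1.13:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

By default Nginx round-robins across the upstream servers; weights or least_conn can be added per machine capacity.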

Streamline the SQL

A typical scenario is deducting stock. The traditional approach is to query the stock first and then update it, which takes two SQL statements; in fact it can be done in one: update miaosha_goods set stock = stock - 1 where goods_id = #{goodsId} and version = #{version} and stock > 0; This guarantees the stock cannot be oversold and updates it in a single statement. Note also that an optimistic lock with a version number is used, which performs better than a pessimistic lock.
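Spelled out (and with the version bump that an optimistic lock needs so that the next writer sees a new version), the statement might look like this; the #{...} placeholders follow MyBatis-style parameter syntax:

```sql
-- Deduct stock only if it is still positive and the row has not changed
-- since we read it (optimistic lock on the version column).
UPDATE miaosha_goods
SET stock = stock - 1,
    version = version + 1
WHERE goods_id = #{goodsId}
  AND version = #{version}
  AND stock > 0;
-- If 0 rows were affected, another request won the race:
-- either retry with the fresh version or fail this purchase.
```

Checking the affected-row count is what turns this single statement into the compare-and-swap that prevents overselling.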

Redis pre-deduction of stock

Many incoming requests all need the backend to check stock, a very read-heavy operation, so we can pre-deduct stock in Redis. Seed the value, e.g. redis.set(goodsId, 100), pre-storing a stock of 100. Then read it back with Integer stock = (Integer) redis.get(goodsId); and if the value is greater than zero, decrement it. Note that a cancelled order must add its stock back, and when adding stock back the total must not exceed the configured amount. The stock check and the decrement must be a single atomic operation, which is where a Lua script comes in. When an order is placed, the stock can then be read directly from Redis.
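Because Redis runs each script atomically, the check-and-decrement the paragraph describes can be written as a small Lua script executed via EVAL; the key name is an illustrative assumption.

```lua
-- Atomically check and decrement stock for one item.
-- KEYS[1] = stock key, e.g. "seckill:stock:1001" (name is illustrative)
-- Returns the remaining stock on success, or -1 if sold out.
local stock = tonumber(redis.call('GET', KEYS[1]) or '-1')
if stock > 0 then
    return redis.call('DECR', KEYS[1])
else
    return -1
end
```

The application calls EVAL (or caches the script with SCRIPT LOAD and calls EVALSHA) and treats a -1 return as "seckill over" without ever touching the database.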

Interface rate limiting

Ultimately a seckill boils down to a database update, but there are a great many invalid requests along the way. What we really have to do is filter them out and keep them from reaching the database. There are several levels at which to limit traffic:

Front-end rate limiting

The first step is front-end limiting: once the seckill button is clicked, it is disabled for the next 5 seconds so the user cannot click it again. This small measure is cheap to build but effective.

Repeated requests from the same user within xx seconds are rejected

The exact number of seconds depends on the business and the scale of the seckill; 10 seconds is a common limit. Look the user up with String value = redis.get(userId); If the value is null, this is the first request in the window and it is allowed; record it with redis.setex(userId, 10, value), storing any value (preferably something meaningful to the business) under the userId key with a 10-second expiry. If the value is not null, it is a repeated request and is discarded.
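The same dedup logic can be sketched in-process; this is a stand-in for the Redis SETEX pattern above (in production the state must live in Redis so it is shared across servers), and the class and method names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Accept the first request from a user inside the window; reject repeats.
public class RequestDeduper {
    private final Map<String, Long> lastAccepted = new ConcurrentHashMap<>();
    private final long windowMillis;

    public RequestDeduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the request should be processed, false if discarded.
    public boolean tryAccept(String userId, long nowMillis) {
        Long last = lastAccepted.get(userId);
        if (last != null && nowMillis - last < windowMillis) {
            return false; // repeated request inside the window: discard
        }
        lastAccepted.put(userId, nowMillis); // start a new window for this user
        return true;
    }
}
```

With Redis, SETEX gives the expiry for free; in this sketch expired entries are simply overwritten on the next accepted request.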

Token bucket algorithm rate limiting

There are many strategies for limiting traffic on an interface; here we use the token bucket algorithm. The basic idea is that every request tries to obtain a token and the backend only processes requests that hold one, while we control the rate at which tokens are produced. Guava provides the RateLimiter API for this. A simple example (note that Guava must be added as a dependency):

```java
public class TestRateLimiter {
    public static void main(String[] args) {
        // Produce 1 token per second
        final RateLimiter rateLimiter = RateLimiter.create(1);
        for (int i = 0; i < 10; i++) {
            // acquire() blocks the thread until a token can be taken from the bucket
            double waitTime = rateLimiter.acquire();
            System.out.println("task " + i + " executed, waitTime " + waitTime);
        }
        System.out.println("end of execution");
    }
}
```

The code above limits the token bucket to producing one token per second (deliberately slow) and loops through 10 tasks. rateLimiter.acquire() blocks the current thread until a token is acquired, that is, a task that cannot get a token waits, so each request is held for some time before it proceeds. The method returns how long the thread waited. The output looks like this:

As the output shows, the first task does not wait, because a token has already been produced in the first second of execution. Each subsequent task must wait until the bucket generates another token, blocking in the meantime (you can see the pauses in the run). However, this approach is not ideal for a seckill: if too many client requests pile up, they all stall waiting for token production (a poor user experience) and no task is ever abandoned. We need a better strategy: if a token cannot be obtained within a certain time, simply reject the task. Here is another example:

```java
public class TestRateLimiter2 {
    public static void main(String[] args) {
        final RateLimiter rateLimiter = RateLimiter.create(1);
        for (int i = 0; i < 10; i++) {
            // Wait at most 500 ms for a token. (Beware: "(long) 0.5" seconds
            // truncates to 0, so use milliseconds for sub-second timeouts.)
            boolean isValid = rateLimiter.tryAcquire(500, TimeUnit.MILLISECONDS);
            System.out.println("task " + i + " got a token: " + isValid);
            if (!isValid) {
                continue;
            }
            System.out.println("task " + i + " executing");
        }
        System.out.println("end");
    }
}
```

This version uses tryAcquire, which waits up to a timeout for a token and returns true if one is obtained and false otherwise; we simply skip the tasks that fail. With production set to 1 token per second and each task willing to wait 0.5 seconds, a task that cannot get a token in time is skipped (in a seckill environment, the request is just discarded). The program runs as follows:

Since the bucket produces only 1 token per second and each task waits at most 0.5 seconds, in this run only the first task (whose token is already available) succeeds; the remaining attempts cannot get a token within their timeout and return false.

How effective is this strategy? Suppose 4 million requests arrive almost instantly, token production is set to 20 per second, and each attempt waits 0.05 seconds for a token: in testing only around 4 requests get through at a time, and the vast bulk are rejected. That is the beauty of the token bucket algorithm.

Asynchronous ordering

To improve ordering throughput and keep the ordering service from being overwhelmed, the order operation should be processed asynchronously. The most common approach is a message queue, which brings three clear benefits: asynchrony, peak shaving, and decoupling. This can be done with RabbitMQ: after rate limiting and the stock check, valid requests reach this step and are sent to the queue; a consumer receives each message and places the order asynchronously. Once the order is safely stored, the user can be notified of the successful seckill by SMS; on failure, a compensation mechanism can retry.
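The shape of that producer/consumer split can be sketched with an in-JVM queue; in production the queue would be RabbitMQ and the consumer a separate service, and every name below is illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of asynchronous ordering via a queue.
public class AsyncOrderDemo {
    static class OrderMessage {
        final long userId, goodsId;
        OrderMessage(long userId, long goodsId) {
            this.userId = userId;
            this.goodsId = goodsId;
        }
    }

    private final BlockingQueue<OrderMessage> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger ordersPlaced = new AtomicInteger();

    // Producer side: after rate limiting and the Redis stock check pass,
    // enqueue the order message and return to the user immediately.
    public void submit(long userId, long goodsId) {
        queue.offer(new OrderMessage(userId, goodsId));
    }

    // Consumer side: here we drain synchronously for the demo; a real
    // consumer is a long-lived listener on the RabbitMQ queue.
    public int drain() {
        int n = 0;
        while (queue.poll() != null) {
            ordersPlaced.incrementAndGet(); // real code: insert the order, then notify by SMS
            n++;
        }
        return n;
    }

    public int placedCount() {
        return ordersPlaced.get();
    }
}
```

The point of the split is that the seckill endpoint never waits on MySQL: it answers as soon as the message is queued, and the consumer absorbs the database writes at its own pace (peak shaving).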

Service degradation

If a server goes down or a service becomes unavailable during the seckill, fallback work should be prepared. An earlier blog post discussed using Hystrix for circuit breaking and degradation. You can build a fallback service so that if a server really does go down, users get a friendly message back instead of a frozen page or a blunt server error.

Conclusion

Seckill flow chart:

This is the seckill flow chart I designed. Of course, different traffic volumes call for different technology choices: this design can support hundreds of thousands of requests, but tens of millions or more would require a redesign, for example sharding the database and tables, swapping the queue for Kafka, and enlarging the Redis cluster. The main goal of this design is to show how to reason about high concurrency and start solving it. Thinking more and building more at work is how we raise our skill level. Keep at it!