Preface

If you were asked how to design a seckill (flash-sale) system, I believe you could sketch a rough answer, but once the discussion gets into the details, many people would struggle to respond quickly. This article briefly covers, from six angles, how a seckill system should be designed and what the main concerns are.

01 | Five architecture principles to keep in mind when designing a seckill system

Speaking of seckill, I'm sure you're familiar with it. Over the past couple of years the "seckill" scene has been everywhere, from shopping on Singles' Day to snatching red envelopes during the Spring Festival to grabbing train tickets on 12306. Simply put, a seckill is a large number of requests competing to buy the same item and complete the transaction at the same time, or in technical jargon, a large number of concurrent reads and writes. Concurrency is one of the biggest headaches for programmers in any language.

The same is true for software. You can quickly put together a seckill system with basic create, read, update, and delete functionality, but making it support high concurrency is not so easy. For example, how do you keep the system from collapsing under millions of requests? How do you ensure data consistency under high concurrency? Do you just pile on more servers? That is clearly not the best solution.

In my opinion, a seckill system is essentially a distributed system that must deliver high concurrency, high performance, and high availability. Today we'll talk about how to push performance to the limit for this kind of business while keeping the distributed system well structured.

Architecture principle: "4 yes, 1 no"

If you are an architect, you first need to outline what it takes to build a system that handles high-traffic concurrent reads and writes with high performance and high availability. I summarize these requirements as "4 yes, 1 no".

**1. Keep data to a minimum.** First, the user's request should carry as little data as possible. The requested data includes both the data uploaded to the system and the data the system returns to the user (usually a web page). Why "keep data to a minimum"? First, transmitting data over the network takes time; second, both request and response data must be processed by the server, and writing to the network usually involves compression and character encoding, which are very CPU-intensive. Reducing the amount of data transmitted therefore significantly reduces CPU usage. For example, we can simplify the page and remove unnecessary decorative effects.

Second, "keep data to a minimum" also means the system should depend on as little data as possible, including the data it must read and write to complete its business logic, which typically involves backend services and databases. Calling other services involves serializing and deserializing data, which is a major CPU killer and also adds latency. The database itself can easily become a bottleneck, so the less you interact with it the better, and the simpler and smaller the data, the better.

**2. Keep the number of requests to a minimum.** After a page is returned, the browser issues additional requests to render it. For example, the CSS/JavaScript, images, and Ajax calls the page depends on are all "additional requests", and they should be kept to a minimum. Every request the browser makes has a cost, such as the three-way handshake to establish a connection, limits on the number of connections per domain, and serial loading of some resources (such as JavaScript). In addition, if different requests go to different domain names, DNS resolution of those domains is involved, which can take even longer. So keep in mind that reducing the number of requests significantly reduces resource consumption from all of these factors. For example, one of the most common ways to reduce the request count is to merge CSS and JavaScript files: multiple JavaScript files are combined into one request by listing them, separated by commas, in a single URL (for example g.xxx.com/tm/xx-b/4.0…), and the server dynamically merges the files and returns them together.

**3. Keep the path as short as possible.** The "path" is the number of intermediate nodes a request passes through between the user sending it and the data coming back. These nodes are typically either a system or a new Socket connection (for example, a proxy server creates a new Socket connection just to forward the request). Each additional node adds a new source of uncertainty. Statistically, if a request passes through five nodes and each node is 99.9% available, the availability of the whole request is 99.9% to the power of 5, roughly 99.5%. Shortening the request path therefore not only increases availability, it also improves performance (fewer intermediate nodes means less serialization and deserialization) and reduces latency (less network transfer time). One way to shorten the path is to deploy strongly dependent applications together, turning remote procedure calls (RPCs) into method calls within the same JVM. In my book, Technical Architecture Evolution and Performance Optimization for Large Web Sites, I have a chapter on the detailed implementation of this technique.

**4. Keep dependencies to a minimum.** A dependency is a system or service that must be called to complete a user request; here we mean strong dependencies. For example, to display a seckill page, the page strongly depends on product information and user information, while things like coupons and deal lists are non-essential (weak dependencies) and can be dropped in an emergency. To reduce dependencies, we can classify systems into, say, level 0, level 1, level 2, and level 3 systems. If level 0 systems are the most important, then the systems that level 0 systems strongly depend on are also the most important, and so on. Note that level 0 systems should minimize their strong dependence on level 1 systems, to prevent important systems from being dragged down by less important ones. For example, if the payment system is a level 0 system and coupons are a level 1 system, coupons can be degraded in extreme cases so that they do not drag down the payment system.

**5. Do not have single points.** A single point is a taboo in system architecture, because a single point means no backup and uncontrollable risk; the most important principle in designing a distributed system is to "eliminate single points". How do you avoid them? I think the key is to avoid binding the state of a service to a specific machine, i.e., make the service stateless so it can be moved between machines freely. How do you decouple service state from the machine? There are many ways. For example, make machine-specific configuration dynamic: these parameters can be pushed from a configuration center and pulled down dynamically when the service starts, and rules in the configuration center make it easy to change the mappings. Stateless applications are an effective way to avoid single points, but it is hard to make storage services stateless, because data must live on disk and is therefore bound to a machine; in that case the single-point problem is usually addressed with redundant copies.

I have covered several design principles, but have you noticed that I keep saying "as much as possible" rather than "absolutely"? You may ask whether the minimum is always best, and my answer is "not necessarily". We used to inline some CSS into the page, which removed one CSS request and sped up rendering of the first screen, but it also increased the page size, which goes against the principle of "keep data to a minimum". So, to improve first-screen rendering, only the CSS needed for the first screen is inlined, and the rest is still loaded as a separate file, striking a balance between first-screen speed and overall page-load performance. Architecture is the art of balance, and the "best" architecture is meaningless out of context. What I want you to keep in mind is that the points above are directions: work toward them, but balance them against other factors.

I mentioned some architectural principles earlier, but what does a good architecture look like for the seckill scenario? Here I use the evolution of Taobao's early seckill system architecture as the main thread to help you sort out the best seckill architecture at different request volumes. If you want to quickly set up a simple seckill system, you only need to add a "timed listing" feature to your purchase page, so users see the buy button only when the seckill starts and the sale ends when the item is out of stock. This is how the first version of the seckill system was implemented. However, as the request volume grew (say, from 10,000 requests/s to 100,000 requests/s), this simple architecture quickly hit bottlenecks and needed changes to improve performance. These architectural changes include:

  • Carve the seckill out into a separate system that can be optimized in a targeted way; for example, the standalone system drops the shop-decoration features to reduce page complexity;
  • Deploy this system on its own machine cluster, so the heavy seckill traffic does not affect the load of the cluster serving normal product purchases;
  • Put hot data (such as inventory data) into a separate cache system to improve "read performance";
  • Add a seckill question step to prevent automated seckill tools from grabbing orders.

The system architecture at this point looks like the figure below. Most importantly, the seckill detail page became a separate new system, some of the core data was placed in the Cache, and the related systems were deployed on a dedicated cluster.

However, this architecture still cannot support more than 1,000,000 requests/s, so to further improve the performance of the seckill system we upgraded the architecture again, for example:

  • Fully separate the dynamic and static parts of the page, so that during the seckill users do not need to refresh the whole page but only click the "grab" button, keeping the data that must be refreshed to a minimum;
  • Cache seckill items locally on the server side, so the system does not need to call dependent backend services for data, or even query the shared cache cluster; this both reduces system calls and avoids crushing the shared cache cluster;
  • Add rate limiting to protect the system in the worst case.

After these optimizations, the system architecture looks like the figure below. Here we made the page even more static, so the whole page does not need to be refreshed during the seckill and only a very small amount of dynamic data is requested from the server. In addition, local caches were added to the most critical detail and transaction systems to hold seckill item information in advance, the hot-item database was deployed independently, and so on.

Looking back at these upgrades, the more targeted the optimization, the less generic the system becomes. For example, caching seckill items in the memory of each machine clearly does not work well when too many seckill items are on sale at the same time, because the memory of a single machine is always limited. So to reach extreme performance you have to make sacrifices elsewhere (for example in generality, ease of use, and cost).

02 | How to separate dynamic and static data? What are the options?

Dynamic/static separation of data: have you heard of this approach before? Whether you have or not, I suggest you pause and think about its value. If your system does not yet use dynamic/static separation, ask yourself why: did you simply not think of it, or has your traffic never required it? I can safely say that if you are at a fast-growing company and deeply involved in the architecture or development of its seckill system, you will sooner or later arrive at dynamic/static separation. Why? Very simply, in the seckill scenario the requirements on the system come down to three words: fast, accurate, and stable.

So how do you make it fast? In the abstract there are only two levers: improve the efficiency of a single request, and cut unnecessary requests. Today's topic, "dynamic/static separation", aims at exactly this. You may remember that the earliest seckill system refreshed the whole page, whereas later you only had to click a "refresh to grab" button. The essence of that change is dynamic/static separation: after the separation, the client requests far less data. Isn't that naturally "fast"?

What is static data

So what exactly is dynamic/static separation? It simply means splitting the data users request (such as an HTML page) into "dynamic data" and "static data".

Simply put, the main difference between "dynamic data" and "static data" is whether the data output in the page depends on the URL, the viewer, time, or region, and whether it contains private data such as Cookies. For example:

On many media sites, the content of an article is the same whether you visit it or I visit it. So it is typically static data, even though the page is dynamically generated.

If we open the Taobao home page right now, each of us may see a different page: it contains a lot of content recommended based on the visitor's characteristics, and this personalized data can be regarded as dynamic data.

Also, when we talk about static data, we should not think of it only as a traditional HTML page sitting on disk. It can also be a page generated by a Java system, as long as the output itself does not contain the factors above. "Dynamic" versus "static" is not about whether the data itself changes on screen, but about whether it contains personalized data tied to the visitor.

It is also important to note that "not contained" in the page means "not contained in the page's HTML source code". With static and dynamic data understood, you can easily see the idea behind "dynamic/static separation": once static data is separated out, we can cache it, and caching naturally improves the "access efficiency" of static data. So how do you cache static data? I have summarized a few key points here.

**First, cache static data as close to the user as possible.** Static data is relatively unchanging, so it can be cached. Where? There are three common places: in the user's browser, on the CDN, or in the server-side Cache. Depending on the situation, cache it as close to the user as possible.

**Second, static transformation means caching the HTTP connection directly.** You have probably heard of static transformation of a system, as opposed to ordinary data caching. As shown in the figure below, the Web proxy server fetches the corresponding HTTP response header and body directly by request URL and returns them as-is; the process is so simple that the HTTP response does not need to be reassembled, and even the HTTP request headers do not need to be parsed.

**Third, who caches the static data also matters.** Cache software written in different languages handles cached data differently. For Java, you can cache at the Web-server layer rather than in the Java layer, because a Java system has its own weaknesses (it is not good at handling huge numbers of connections, each connection consumes a lot of memory, and the Servlet container is slow at parsing HTTP). Caching outside the Java layer masks those weaknesses, and Web servers (such as Nginx, Apache, and Varnish) are also much better at serving large numbers of concurrent static requests.

How to implement the dynamic/static separation transformation

Now that we understand the "why" and "what" of dynamic and static data, let's move on to the "how". How do we turn a dynamic page into a cache-friendly static page? It is actually quite simple: remove the influencing factors mentioned earlier and handle them separately, i.e., separate the dynamic parts from the static parts.

Let me take a typical product detail system as an example. You can open a product detail page on JD.com or Taobao and see what dynamic data it contains. We separate the dynamic content along the following five dimensions.

**URL uniqueness.** A product detail system naturally has unique URLs; for example, each product is identified by its ID, so item.xxx.com/item.htm?id… can serve as the unique URL identifier. Why must the URL be unique? Because we want to cache the entire HTTP connection, and the cache Key is the URL, for example id=xxx.

**Separate viewer-related factors.** Viewer-related factors include whether the user is logged in and the user's identity; these are separated out and obtained through dynamic requests.

**Separate the time factor.** The server-side output time is also obtained through a dynamic request.

**Asynchronize the region factor.** Region-dependent parts of the detail page are fetched asynchronously; dynamic requests also work, but asynchronous fetching fits better here.

**Remove Cookies.** Cookies contained in the server's page output can be stripped by the caching software; for example, the Web server Varnish can remove them with the unset req.http.Cookie directive. Note that removing Cookies does not mean the page the client receives contains no Cookies; it means the cached static data contains none.

Once the dynamic content is separated out, how to organize it becomes the critical question. The main thing to keep in mind is that much of this dynamic content is used by other modules in the page, such as whether the user is logged in and whether the user's identity matches, so we should organize this data in JSON format so the front end can fetch it easily.
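As a minimal sketch of that idea, the snippet below bundles the separated dynamic factors (login state, server time, stock status) into one JSON payload that the cached static page can fetch with a single asynchronous request. The use of Jackson, the field names, and the DynamicDataService-style lookup are my own assumptions for illustration, not part of the original system.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

// Sketch: return the separated dynamic data as one JSON document.
public class DynamicDataEndpoint {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public String dynamicData(long itemId, String userId) throws Exception {
        Map<String, Object> data = new HashMap<>();
        data.put("loggedIn", userId != null);               // viewer-related factor
        data.put("userId", userId);
        data.put("serverTime", System.currentTimeMillis()); // time factor
        data.put("stockStatus", lookupStock(itemId));       // inventory/region info fetched separately
        return MAPPER.writeValueAsString(data);             // JSON is easy for the front end to consume
    }

    private String lookupStock(long itemId) {
        return "IN_STOCK"; // placeholder for a real inventory lookup
    }
}
```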

We handled static data with caching in the previous section. For the dynamic content there are two main approaches: ESI (Edge Side Includes) and CSI (Client Side Include).

**ESI solution (or SSI):** the dynamic content request is made on the Web proxy server and the result is inserted into the static page, so the user receives a complete page. This puts some load on the server, but the user experience is better.

**CSI solution:** the page issues a separate asynchronous JavaScript request to fetch the dynamic content from the server. This is easier on the server, but the dynamic part of the page may appear with a delay, so the experience is slightly worse.

Several architectural options for dynamic/static separation

Having separated static and dynamic data through the transformation above, how do we recombine them in the system architecture and deliver a complete page to the user? This comes down to how the user's request path is architected. In order of architectural complexity, there are three options: single-node deployment on physical machines; a unified Cache layer; and moving the cache onto the CDN.

Solution 1: Physical-machine single-node deployment

In this solution, virtual machines are replaced with physical machines to increase the Cache capacity, and a consistent-Hash grouping scheme is used to balance hit ratio and hotspot access: the Cache is divided into several groups. The fewer the groups, the higher the Cache hit ratio, but a single hot item then concentrates in one group, which easily leads to Cache breakdown, so we should add an appropriate number of identical groups to balance hotspot access against hit ratio. The structure of the physical-machine deployment is shown below:

Physical-machine deployment has the following advantages: there is no network bottleneck and plenty of memory is available; it can both improve the hit ratio and reduce Gzip compression; and it reduces Cache-invalidation pressure, because invalidation can be purely time-based, for example caching for only three seconds. In this scenario, the obvious gain from moving a Java application that would normally run in a virtual machine or container onto a physical machine is the larger memory of a single machine, but it also wastes CPU to some extent, because a single Java process can hardly saturate the CPU of an entire physical machine.

In addition, using Java applications deployed on physical machines as a Cache adds operational complexity, so this is a compromise. If no other system in your company has similar needs, it can still be appropriate; if several business systems need static transformation, it is better to split out a shared Cache layer, as in Solution 2 below.

Solution 2: Unified Cache layer

The unified Cache layer pulls the single-machine Cache out into a separate Cache cluster. This is a more ideal and more scalable solution. Its structure is shown below:

**Managing the Cache layer centrally reduces operation and maintenance costs and makes it easy for other systems with static content to plug in.** It also has some other advantages:

A single Cache layer lowers the cost for multiple applications to use caching: each application only needs to maintain its own Java system rather than its own Cache, and only cares about how to use the unified Cache. It is also easier to maintain, since one set of monitoring and configuration-automation tooling is enough, and unified maintenance and upgrades are convenient. Finally, memory can be shared to maximize utilization and can be dynamically shifted between systems, which helps withstand various attacks.

While this solution is easier to maintain, it also introduces other problems; for example, the Cache becomes more centralized, which leads to:

The internal switching network of the Cache layer can become a bottleneck; the cache servers' network cards can also become bottlenecks; and with fewer machines the risk is higher, since the failure of one machine affects a large share of the cached data.

To mitigate these problems, you can Hash the caches into groups, that is, keep several groups of caches with the same contents, so that hotspot data is not overly concentrated and no new bottleneck forms.
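Here is a minimal sketch of that grouping idea, assuming a CacheGroup interface as a stand-in for whatever cache client is actually used: several groups hold identical contents, and each read picks one group, so a hot key's traffic is spread across groups instead of hammering a single node.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: route reads across identical cache groups to spread hotspot load.
public class GroupedCacheRouter {

    private final List<CacheGroup> groups; // identical replicas of the cached content

    public GroupedCacheRouter(List<CacheGroup> groups) {
        this.groups = groups;
    }

    public String get(String key) {
        // Picking a random replica spreads hotspot reads across groups;
        // picking by hash(key) instead would maximize hit ratio but concentrate hot keys.
        CacheGroup group = groups.get(ThreadLocalRandom.current().nextInt(groups.size()));
        return group.get(key);
    }

    public interface CacheGroup {
        String get(String key);
    }
}
```

The trade-off in the comment mirrors the one described above: fewer groups (or key-based routing) raise the hit ratio, while more identical groups (or random routing) dilute hotspot pressure.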

Solution 3: Move the Cache onto the CDN

After separating the whole system into static and dynamic parts, the natural next step is to move the Cache even further forward, onto the CDN, which is closest to the user and therefore more effective. To do so, however, several problems must be solved.

**The invalidation problem.** I mentioned cache invalidation earlier; let me explain it again. The key phrase I used for static data was "relatively unchanging", which also means "it may change". An article, for example, stays the same for now, but if a typo is found, does it change? If the cache lifetime is long, clients will see the wrong content for a long time. So in this solution we also need the CDN to invalidate the Cache distributed across the country within seconds, which places high demands on the CDN's invalidation system.

**The hit-ratio problem.** One of the most important metrics of a Cache is a high hit ratio; otherwise the Cache is meaningless. If the data is spread across CDN nodes all over the country, the Cache inevitably becomes fragmented, and fragmentation reduces the probability that requests hit the same Cache, so the hit ratio becomes a problem.

**The release and update problem.** If a business system releases changes every week, the release system must be simple and efficient, and you also need to consider how quickly you can roll back and troubleshoot when problems occur.

From this analysis, deploying the product detail system to every CDN node nationwide is not realistic, because of the invalidation, hit-ratio, and release/update problems. Could we instead pick a few nodes and try it there? The answer is "yes", but such nodes **need to meet several conditions:** they are close to regions where traffic is concentrated; they are relatively far from the origin site; the network between the node and the origin is good and stable; and the node has large capacity and will not consume too many resources of other CDN tenants.

And last but not least: don't use too many nodes. Based on these factors, the CDN's level-2 Cache is a good fit, because there are fewer level-2 Cache nodes and each has larger capacity. Users' requests go first to the level-2 Cache of the CDN; if it cannot serve the request, the request falls back to the origin site to fetch the data.

The deployment mode is as follows:

Using the CDN's level-2 Cache can achieve a hit ratio comparable to the current server-side static Cache. Because the nodes are few, the Cache is not very scattered, and the traffic is concentrated, the hit-ratio problem is solved and users get the best access experience; it is one of the more ideal CDN solutions at present. In addition, this CDN deployment has the following characteristics:

  • The entire page is cached in the user's browser;
  • If the user forces a full page refresh, the request still goes to the CDN;
  • The only truly effective requests are users' clicks on the "refresh to grab" button.

In this way, 90% of the static data is cached on the client or the CDN. When the seckill actually starts, the user only needs to click the special "refresh to grab" button instead of refreshing the whole page, so the system requests only a very small amount of valid data from the server and does not repeatedly request large amounts of static data. The dynamic data of the seckill page is also smaller than that of an ordinary detail page, and performance improves by more than 3 times.

The "refresh to grab" design therefore lets us request only the latest dynamic data from the server without refreshing the page.

03 | Handle the system's "hotspot data" in a targeted way

What is a "hotspot"

Hotspots are divided into hotspot operations and hotspot data. "Hotspot operations" include things like massive page refreshes, massive add-to-cart actions, and the flood of orders placed at midnight on Double 11. For the system, these operations can be abstracted as "read requests" and "write requests". The two are handled quite differently: read requests leave much more room for optimization, while the bottleneck for write requests generally sits in the storage layer, and the optimization approach is to strike a balance according to the CAP theorem; I cover that in detail in the section on "reducing inventory".

"Hotspot data" is easier to understand: it is the data targeted by users' hotspot requests. Hotspot data is divided into static hotspot data and dynamic hotspot data.

Static hotspot data is hotspot data that can be predicted in advance. For example, we can screen out hot items ahead of time through seller sign-up and tag them via a registration system. We can also use big-data analysis to discover hot items in advance, for example by analyzing historical transaction records and users' shopping-cart records to see which items are likely to be popular and sell well. These are all hot items that can be identified beforehand.

"Dynamic hotspot data" is hotspot data that cannot be predicted in advance and emerges while the system is running. For example, a seller advertises on Douyin, the product goes viral, and it is bought in large quantities within a short time. Since hotspot operations are user behavior, we cannot change them, only restrict and protect against them, so this section mainly covers how to optimize for hotspot data.

Discovering Hotspot Data

Earlier I described how to separate the page data of a single seckill item into static and dynamic parts so the static data can be processed optimally. This raises another key question: how do we find these seckill items, or more precisely, how do we find hot items?

You might say, "A seckill item is just a seckill item." True, but the key is: how does the system know which item is a seckill item? You need a mechanism in advance to distinguish normal items from seckill items.

Let's look at this from two angles: discovering static hotspots and discovering dynamic hotspots.

Discovering static hotspot data

As mentioned earlier, static hotspot data can be filtered out in advance through business means, for example by requiring sellers to sign up: an operations system tags the items registered for the event, and a backend system then preprocesses them, for example by caching them in advance. However, screening through sign-up brings new problems: it raises sellers' costs, reacts slowly, and is not very flexible. Besides sign-up, you can also predict hotspots with technical means, for example using big data to compute which items buyers visit most each day and treating the TOP N items as hot items.

Discovering dynamic hotspot data

We can predict static hotspot data through seller sign-up or big-data prediction, but both have a pain point: poor real-time responsiveness. It would be ideal if the system could automatically discover hot items within seconds. Real-time dynamic hotspot discovery is valuable not only for seckill items but also for other popular items, so we need to find a way to implement it.

Here is a concrete implementation of a dynamic hotspot discovery system.

1. Build an asynchronous system that collects hotspot Keys from the middleware products on each link of the transaction chain, such as Nginx, the cache, and the RPC framework (some middleware products have their own hotspot-statistics modules).
2. Establish a standardized hotspot reporting and subscription service, so that each system on the transaction chain (detail page, shopping cart, trading, promotions, inventory, logistics, and so on) can, in order of access time, pass hotspots discovered upstream down to downstream systems so they can arrange protection in advance. For example, for a traffic rush the detail system is the first to know, and an Nginx module at the unified access layer counts hot URLs.
3. Send the hotspot data collected from upstream systems to a hotspot service console; downstream systems (such as the trading system) then know which items will be called frequently and can apply hotspot protection.

The figure below illustrates this. A user's route touches many items, and we mainly rely on the upstream "guide" pages (the home page, search pages, product detail pages, the shopping cart, and so on) to identify high-traffic items in advance; the middleware in these systems collects the hotspot data and writes it to logs.

We use an Agent deployed on each machine to ship the logs to an aggregation and analysis cluster, and then push hot keys that match certain rules to the corresponding systems through a subscription/distribution system. You can pre-populate the Cache with the hot data, push it directly into the memory of the application servers, or use it for interception; the downstream system subscribes to the data and decides how to handle it according to its own needs.
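As a minimal sketch of the detection step in that pipeline, the snippet below counts key accesses over a short window and reports keys that cross a threshold to subscribers. The window length, threshold, and publish mechanism are illustrative assumptions; a real system would feed this from access logs or middleware hooks and publish through the subscription/distribution system described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Sketch: per-window access counters that flag hot keys.
public class HotKeyDetector {

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final long threshold;

    public HotKeyDetector(long threshold, long windowSeconds) {
        this.threshold = threshold;
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flushWindow, windowSeconds, windowSeconds, TimeUnit.SECONDS);
    }

    // Called asynchronously for each access seen in logs or middleware hooks.
    public void record(String itemId) {
        counters.computeIfAbsent(itemId, k -> new LongAdder()).increment();
    }

    private void flushWindow() {
        counters.forEach((itemId, count) -> {
            if (count.sum() >= threshold) {
                publishHotKey(itemId); // hand off to the subscription/distribution system
            }
        });
        counters.clear(); // start a fresh window
    }

    private void publishHotKey(String itemId) {
        System.out.println("hot item detected: " + itemId); // placeholder for a real publish call
    }
}
```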

Based on my experience building hotspot discovery systems, here are a few points to keep in mind.

1. The hotspot service backend should capture hotspot data logs asynchronously, because "asynchronous" both makes it easier to stay generic and avoids affecting the main flow of the business systems and middleware products.
2. Hotspot-service discovery coexists with the middleware's own hotspot-protection modules; each middleware and application still needs to protect itself. The hotspot service console provides hotspot data collection and subscription services, transparently exposing each system's hotspot data.
3. Hotspot discovery should be near real time (discovering hotspot data within 3s); only then is dynamic discovery meaningful and able to provide real-time protection for downstream systems.

Processing hotspot data

There are generally three ways to handle hotspot data: optimization, restriction, and isolation. Let's start with optimization. The most effective optimization for hotspot data is to cache it: if it has been separated from static data, the static part can be cached for a long time, but the cache for hotspot data is more of a "temporary" cache: whether static or dynamic, the data is cached for a few seconds in a bounded queue, and since the queue length is limited, entries can be evicted with the LRU algorithm.
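A minimal sketch of such a bounded LRU cache is shown below, built on LinkedHashMap's access-order mode. The capacity is an assumption, and a production version would also attach a per-entry TTL of a few seconds and wrap the map for thread safety.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: bounded LRU cache for short-lived hotspot data.
// Wrap with Collections.synchronizedMap (or use a concurrent cache) for multi-threaded use.
public class HotDataLruCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public HotDataLruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry when full
    }
}
```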

Now for restriction. Restriction is more of a protection mechanism, and there are many ways to do it. For example, compute a consistent Hash of each item ID, bucket requests by that Hash, and give each bucket its own processing queue. This confines a hot item to a single request queue, preventing it from eating up too much of the server's resources and leaving other requests with nothing.
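Below is a simplified sketch of that bucketing idea. It uses a plain modulo hash rather than a full consistent-hash ring, and the bucket count and queue size are assumptions; the point is only that one hot item can saturate at most its own bounded queue and cannot starve requests for other items.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: hash item IDs into buckets, each with its own bounded request queue.
public class BucketedRequestLimiter {

    private final BlockingQueue<Runnable>[] buckets;

    @SuppressWarnings("unchecked")
    public BucketedRequestLimiter(int bucketCount, int queueSizePerBucket) {
        buckets = new BlockingQueue[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new ArrayBlockingQueue<>(queueSizePerBucket);
        }
    }

    /** Returns false if the bucket for this item is already full (request rejected). */
    public boolean submit(long itemId, Runnable request) {
        int bucket = Math.floorMod(Long.hashCode(itemId), buckets.length);
        return buckets[bucket].offer(request); // non-blocking: overflow is rejected, not queued forever
    }
}
```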

Finally, isolation. The first principle of seckill system design is to isolate this hot data, so that the 1% of requests do not affect the other 99%; once isolated, that 1% is also easier to optimize.

In the case of seckilling, isolation can be implemented at several levels.

**Service isolation.** Run the seckill as a marketing event that sellers must sign up for separately. Technically, once sellers sign up, the hotspot is known in advance and can be prepared for.

**System isolation.** System isolation is mostly runtime isolation: the seckill can be deployed on a separate group of machines, split off from the other 99% of traffic, and can even use its own domain name so that requests land on a different cluster.

**Data isolation.** Most of the data the seckill touches is hot data, so a separate Cache cluster or MySQL database can be used to store it. The goal is to keep 0.01% of the data from affecting the other 99.99%.

04 | Traffic peak shaving: how should it be done?

Why shave the peak?

We know that a server's processing resources are fixed: the capacity is there whether you use it or not. So it is easy to be frantically busy at peak times and idle otherwise. Yet to guarantee quality of service, many of these resources have to be provisioned for the busiest periods, which wastes resources.

It is like the morning and evening rush hours, which is why staggered-travel schemes exist. Peak shaving makes server-side processing smoother and saves server resources. For the seckill scenario, peak shaving essentially means delaying user requests more, so as to reduce and filter out invalid requests; it follows the principle of "keep the number of requests to a minimum".

Today I will introduce a few ideas for shaving traffic peaks: queuing, answering questions, and layered filtering. These approaches are lossless (they do not damage users' requests). There are also lossy ways to shave a peak, including some of the stability measures introduced later, such as rate limiting and machine-load protection; those compulsory measures can also protect the system from being flooded, but they belong to a different category, so I won't classify them here.

Queuing

The simplest peak-shaving solution is to use a message queue to buffer instantaneous traffic, turning synchronous direct calls into asynchronous indirect pushes: a queue absorbs the instantaneous spike at one end and pushes messages out smoothly at the other. Here the message queue acts like a "reservoir", holding back the upstream flood and reducing the peak flow into the downstream channel. Buffering instantaneous traffic with a message queue looks like this:

However, if the traffic peak lasts for a while and the message queue reaches its limit, for example the local message backlog hits the maximum storage space, the queue itself will be overwhelmed. That still protects the downstream system, but it is not much different from simply dropping the requests, just as a reservoir is of little help once the flood is large enough.

Besides message queues, there are other similar queuing approaches: 1. using a thread pool with lock waiting is also a common queuing method; 2. first-in-first-out (FIFO), first-in-last-out, and other common in-memory queuing algorithms; 3. serializing requests to a file and then replaying the file sequentially to restore the requests (for example, MySQL binlog-based synchronization).
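To make the "reservoir" idea concrete, here is a minimal in-process sketch using a bounded queue: the front end accepts a request only if the reservoir still has room, and a single worker drains it at a steady pace. The queue size is an assumption, and a production system would use a real message queue product rather than an in-memory queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: a bounded in-memory "reservoir" that smooths a burst of requests.
public class OrderPeakShaver {

    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000);

    /** Front end: accept the request only if the reservoir still has room. */
    public boolean accept(String orderRequest) {
        return buffer.offer(orderRequest); // full queue => reject instead of crushing downstream
    }

    /** Back end: drain requests at a pace the downstream system can sustain. */
    public void startConsumer() {
        Thread consumer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String request = buffer.poll(1, TimeUnit.SECONDS);
                    if (request != null) {
                        processDownstream(request);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    private void processDownstream(String request) {
        // placeholder for the real order-creation call
    }
}
```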

Answer the questions

If you remember, the earliest seckills were purely about refreshing the page and clicking the buy button; the question step was added only later. So why add it?

The main purpose of this is to increase the complexity of the purchase, which serves two purposes.

The first purpose is to prevent some buyers from cheating with seckill tools. When seckills became hugely popular in 2011, seckill bots were rampant too, which defeated the goal of broad participation and marketing, so the system added questions to hold the bots back. After the questions were added, the time to place an order stretched to roughly 2s, and the proportion of orders placed by bots dropped sharply. The answering page is shown below.

The second purpose is to spread requests out and flatten the peak of request traffic so the system can better handle the instantaneous spike. This stretches the peak of order requests from under 1s to roughly 2s-10s, sharding the peak requests over time. This time-based sharding is very important for handling concurrency on the server and greatly reduces the pressure. Also, because requests now arrive in sequence, later requests may find no inventory left and never reach the final ordering step, so truly concurrent writes become very limited. This design idea is now very common; Alipay's "Shoop Shoop" and WeChat's "Shake" work in a similar way.

Here I focus on the design of the seckill answering feature.

As shown in the figure above, the seckill answering logic has three main parts.

The question bank generation module mainly generates questions and answers. The questions themselves do not need to be complex; what matters is preventing a machine from computing the result, i.e., preventing seckill bots from answering.

The question bank push module pushes the questions to the detail system and the trading system ahead of the seckill. Pushing the question bank in advance mainly ensures that every question a user receives is unique, again to prevent cheating.

The question image generation module renders the questions as images and adds noise to them. This also prevents machines from answering directly, so that only a human can read the question. Note, too, that the network is congested during the answering phase, so the question images should be pushed to the CDN and warmed up in advance; otherwise the images may load slowly when users actually request them, hurting the answering experience.

The actual answer-checking logic is relatively simple: compare the answer the user submits with the answer stored for the question; if they match, continue to the ordering logic, otherwise fail. We can hash the question and the answer into MD5 keys like this:

Question key: userId + itemId + question_Id + time + PK
Answer key: userId + itemId + answer + PK

The validation logic is shown below:

Note that besides verifying the answer itself, the verification logic also checks the user's identity, for example whether the user is logged in, whether the user's Cookie is complete, and whether the user is submitting repeatedly and too frequently. In addition to correctness checks, we can also limit the timing of answer submission, for example requiring more than 1s between showing the question and receiving the answer, since a human can hardly answer in under 1s; this also helps block machine answering.
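The sketch below illustrates that check under the key layout given above: the server derives an MD5 key from the user, item, submitted answer, and a private salt (PK), compares it with the key computed when the question was issued, and rejects answers that arrive in under 1s. The class and parameter names are illustrative, and the identity checks (login, Cookie) are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: verify a seckill answer against its precomputed MD5 key.
public class SeckillAnswerVerifier {

    private static final String PK = "server-side-secret"; // private salt, never sent to clients

    public boolean verify(String userId, long itemId, String submittedAnswer,
                          String expectedAnswerKey, long questionShownAtMillis) {
        // Reject answers submitted in under 1 second: almost certainly a machine.
        if (System.currentTimeMillis() - questionShownAtMillis < 1000) {
            return false;
        }
        String actual = md5(userId + itemId + submittedAnswer + PK);
        return actual.equals(expectedAnswerKey);
    }

    private String md5(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(input.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b)); // lowercase hex encoding of the digest
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```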

Layered filtering

The queuing and question-answering approaches above either reduce the number of requests sent or buffer incoming requests. A third approach for the seckill scenario is to filter requests layer by layer, discarding invalid requests along the way. Layered filtering uses a "funnel" design to process requests, as shown in the figure below.

Suppose a request passes in turn through the CDN, the foreground read system (such as the product detail system), the background system (such as the trading system), and the database. Then:

Most of the data and traffic are served from the user's browser or the CDN, and this layer intercepts most of the reads. At the second layer (the foreground system), data (including data that needs strong consistency) should be served from the Cache as much as possible to filter out some invalid requests. At the third layer, the background system mainly re-validates the data, protects the system, and applies rate limiting, so the data volume and request count shrink further. Finally, consistency checks on the data are completed at the data layer.

This is like a funnel, trying to filter and reduce the amount of data and requests layer by layer.

The core idea of hierarchical filtering is to filter out as many invalid requests as possible at different levels, leaving valid requests at the end of the “funnel”. To achieve this effect, we must perform hierarchical verification of the data.

**The basic principles of layered validation are as follows:**

Use Caches for the read data of dynamic requests at the Web layer to filter out invalid reads; do not apply strong-consistency checks to read data, to avoid the bottleneck such checks would create; shard write data sensibly by time to filter out requests that have expired; rate-limit write requests to drop the portion that exceeds system capacity; and apply strong-consistency checks only to write data, keeping just the final valid data.

05 | What factors affect performance? How do you improve system performance?

Factors affecting performance

So what factors affect performance? Before answering that, let's define performance. The definition varies with the device being measured: for a CPU it is the clock frequency, for a disk it is IOPS (Input/Output Operations Per Second).

Today we focus on the performance of the system's servers, which is generally measured in QPS (Queries Per Second). Closely related to QPS is the response time (RT), which can be understood as the time the server takes to process a request.

Normally, the shorter the response time (RT), the more requests are processed per second (QPS). In the single-threaded case this looks like a linear relationship: minimize the response time per request and performance is maximized.

But response time always has a floor; it cannot shrink indefinitely. So there is another dimension: handling requests with multiple threads. In theory this gives "total QPS = (1000 ms / response time) x number of threads", so performance depends on how long the server spends on one response and on how many threads are processing requests.

First, let’s look at how response time relates to QPS.

For most Web systems, response time is generally the sum of CPU execution time and thread wait time (RPC, I/O wait, Sleep, Wait, etc.); in other words, part of handling a request is spent on the CPU and part is spent in various kinds of waiting.

If a proxy server itself consumes no CPU, then adding a delay to each proxied request (which increases response time) has little effect on the proxy's own throughput, because the proxy's resources are still not being consumed; we can simply increase the proxy's thread count to offset the effect of response time on its QPS.

What really matters is the CPU execution time, which makes sense, since CPU execution is what actually consumes server resources. In practical tests, cutting CPU execution time in half roughly doubles QPS.

That said, we should focus on reducing CPU execution time.

Second, let’s look at the impact of the number of threads on QPS.

Looking at the "total QPS" formula, you might think that more threads always means higher QPS, but that is obviously not the case: threads themselves consume resources and are subject to other constraints. For example, the more threads a system has, the higher the cost of thread switching, and each thread also consumes a certain amount of memory.

So what is a reasonable number of threads? Many multithreaded scenarios default to "threads = 2 x CPU cores + 1". Beyond this default, there is a formula based on best practice:

Number of threads = [(thread wait time + thread CPU time) / thread CPU time] x number of CPUs
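To make the formula concrete, here is a tiny worked example with made-up numbers: if a request spends 80 ms waiting (RPC, I/O) and 20 ms on the CPU, and the machine has 8 cores, the suggested thread count is ((80 + 20) / 20) x 8 = 40.

```java
// Sketch: plug illustrative numbers into the thread-count formula above.
public class ThreadCountEstimator {

    public static int estimate(double waitTimeMs, double cpuTimeMs, int cpuCores) {
        return (int) Math.round((waitTimeMs + cpuTimeMs) / cpuTimeMs * cpuCores);
    }

    public static void main(String[] args) {
        System.out.println(estimate(80, 20, 8)); // prints 40
    }
}
```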

Of course, the best way to find the optimal thread count is through performance testing. In other words, to improve performance we need to reduce CPU execution time and set a reasonable number of concurrent threads; both can significantly improve server performance. Now that you know how to improve performance, your next question is probably: how do I find out where my system spends the most CPU?

How to find bottlenecks

In the case of a server, there are many areas where bottlenecks can occur, such as CPU, memory, disk, and network. In addition, different systems pay different attention to bottlenecks. For example, for a caching system, the bottleneck is memory, while for a storage system I/O is more likely to be the bottleneck.

The scenario we are targeting is seckill, where the bottleneck is more on the CPU.

How do you quickly determine whether the CPU is the bottleneck? One way is to check whether the server's CPU usage exceeds 95% when QPS has peaked. If it does not, there is still headroom: either a lock is limiting concurrency, or there is too much local I/O waiting.

How to optimize the system

There are many ways to optimize a Java system. Here I focus on several particularly effective ones for your reference: reduce encoding, reduce serialization, extreme Java optimization, and concurrent-read optimization. Let's look at each.

1. Reduce encoding

Java's character encoding is slow, and this is a real weakness of Java. In many scenarios, any operation involving strings is CPU-intensive, whether it is disk I/O or network I/O, because characters must be converted to bytes and back, and that conversion requires encoding. Each character encoded requires a table lookup, which is very resource-intensive, so reducing conversions between characters and bytes can greatly improve performance.

2. Reduce serialization

Serialization is also a natural enemy of Java performance, and reducing serialization in Java can greatly improve performance. And because serialization often happens at the same time as encoding, reducing serialization reduces encoding.

Serialization mostly happens around RPC, so avoiding or reducing RPC reduces serialization; current serialization protocols are already well optimized, but a newer approach is to "merge deploy" applications that are closely related, since reducing RPCs between applications also reduces serialization overhead.

"Merge deployment" means taking two applications that used to run on different machines and deploying them on the same machine, and not just on the same machine but in the same Tomcat container, without even going through the local Socket, thus avoiding serialization altogether.

We can do even more extreme things for seckill scenarios, which brings us to Point 3: Java extreme optimization.

3. Java extreme optimization

Java is less capable than general-purpose Web servers (such as Nginx or Apache) at handling large numbers of concurrent HTTP requests, so for high-traffic Web systems we tend to make the pages static and let most requests and data be returned directly by Nginx or by Web proxy servers (Varnish, Squid, etc.), which also avoids serializing and deserializing that data; the Java layer then only handles dynamic requests for small amounts of data. For those requests, the following optimizations help:

1. Handle requests directly with Servlets and avoid a traditional MVC framework; this bypasses a lot of complex and unnecessary processing logic and can save about 1 ms (depending on how heavily you rely on the MVC framework). 2. Use resp.getOutputStream() instead of resp.getWriter(), which skips some of the encoding of immutable character data and improves performance. 3. Output data as JSON rather than rendering pages with a template engine (which is typically interpreted).
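As a minimal sketch of those three points together, the Servlet below writes a small JSON response through getOutputStream(). It assumes the classic javax.servlet API; the hand-built JSON and the stock lookup are placeholders for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: plain Servlet, no MVC framework, bytes written directly as JSON.
public class SeckillDynamicDataServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String itemId = req.getParameter("itemId");
        String json = "{\"itemId\":\"" + itemId + "\",\"stock\":" + queryStock(itemId) + "}";

        resp.setContentType("application/json");
        resp.setCharacterEncoding("UTF-8");
        // getOutputStream() writes bytes directly and skips the Writer's character-encoding path.
        resp.getOutputStream().write(json.getBytes(StandardCharsets.UTF_8));
    }

    private long queryStock(String itemId) {
        return 0L; // placeholder for a local-cache or cache-cluster lookup
    }
}
```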

4. Concurrent read optimization

Some readers may think this problem is easy to solve: just put the data in a centralized cache such as Tair. Centralized caches generally use consistent hashing to keep the hit ratio high, so the same key always lands on the same machine. A single cache machine may support 300,000 requests/s, but that is far from enough for a hot item at "big seckill" scale. So how do we eliminate this single-point bottleneck completely?

The answer is an application-level LocalCache: cache the data related to the seckill items on each machine of the seckill system.

How do you Cache data? You need to separate dynamic and static data for processing:

1. Static data such as the item "title" and "description" is pushed in full to the machines before the seckill starts and cached until the seckill ends;

2. Dynamic data, such as inventory, is cached for a short time (usually a few seconds) using "passive invalidation"; after the entry expires, the latest data is pulled from the central cache again.
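Here is a minimal sketch of that passive-invalidation LocalCache for the dynamic part: each entry carries a load timestamp, and a read that finds the entry older than a few seconds treats it as a miss and reloads from the central cache or database. The 3-second TTL is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: local cache whose entries expire passively after a few seconds.
public class PassiveTtlLocalCache<K, V> {

    private static final long TTL_MILLIS = 3_000; // cache for a few seconds only

    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        Entry<V> entry = cache.get(key);
        long now = System.currentTimeMillis();
        if (entry == null || now - entry.loadedAt > TTL_MILLIS) {
            V fresh = loader.apply(key);               // pull the latest value after expiry
            cache.put(key, new Entry<>(fresh, now));
            return fresh;
        }
        return entry.value;
    }
}
```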

06 | Designing "inventory reduction", the core logic of a seckill system

Never oversold. That’s the main premise.

There are several ways to reduce inventory

In a normal e-commerce shopping scenario, a user's purchase generally has two steps: placing the order and paying. Say you want to buy an iPhone: you click "Buy now" on the product page, check the information, and click "Submit order"; that step is placing the order. The purchase is only truly complete once you pay, i.e., once the money is "safely in the bag".

So if you are the architect, at which point do you reduce inventory? In summary, inventory can be reduced in the following ways:

**Reduce inventory when the order is placed.** Subtract the quantity the buyer purchases from the item's total inventory at order time. This is the simplest approach and also the one with the most precise control: the database's transaction mechanism controls the inventory at order time, so overselling cannot happen. But be aware that some buyers may never pay after placing the order.

**Reduce inventory at payment time.** The inventory is not reduced when the buyer places the order, but only when a user actually pays, so that stock is not held for buyers who never pay. However, because the deduction happens only at payment time, under high concurrency a buyer may place an order and then find, when trying to pay, that the item has already been bought by someone else.

**Withhold (pre-deduct) inventory.** This approach is more complex: when the buyer places an order, the inventory is held for them for a period of time (for example 10 minutes); after that the hold is released automatically, and other buyers can then purchase the item. Before the buyer pays, the system checks whether the order's inventory is still held: if not, it tries to withhold it again; if stock is insufficient (the withhold fails), payment is not allowed to proceed; if the withhold succeeds, the payment completes and the inventory is actually reduced.

Each of these approaches has its problems; let's look at them next.

Possible problems in reducing inventory

Because the purchase process has two or more steps, each choice of when to reduce inventory leaves loopholes that malicious buyers can exploit, for example by placing malicious orders.

If we reduce inventory at order time, i.e., subtract stock as soon as the user places an order, then under normal circumstances the probability that buyers pay after ordering is very high, so there is not much of a problem. There is an exception, though: when a seller joins a promotion, the promotion window is the item's prime selling time. If a competitor maliciously places orders for all of the seller's stock, driving the inventory to zero, the item can no longer be sold normally, and the malicious actors never actually pay. That is the weakness of "reduce inventory at order time".

Since "reduce inventory at order time" can be abused by malicious orders and hurt the seller's sales, is there a way around it? You might think: wouldn't "reduce inventory at payment time" work? It would, but it introduces another problem: overselling.

If there are 100 units, 300 people may place orders successfully, because inventory is not reduced at order time, so the number of successful orders can far exceed the real inventory; this is especially likely for popular promotional items. The result is many buyers who ordered successfully but cannot pay, and their shopping experience is naturally poor.

So whether you reduce inventory at order time or at payment time, the inventory ends up not matching actual sales exactly; selling goods in precisely the right quantity is clearly not easy!

So, since both "order to reduce inventory" and "pay to reduce inventory" have shortcomings, can we combine the two and link the operations: pre-deduct at order time, and release the inventory if payment does not arrive within a specified time? That is exactly the "withhold inventory" approach.

This scheme alleviates the problems above to some extent. But does it solve them for good? Not really. In the malicious-order case, even though the payment window is set to 10 minutes, a malicious buyer can simply place an order again after 10 minutes, or order many pieces at once to drain the inventory. The real answer here is to combine inventory reduction with security and anti-cheating measures.

For example, identify and flag buyers who frequently place orders without paying (so that their orders no longer reduce inventory), set a maximum purchase quantity for certain categories (for example, at most 3 items per person in a promotion), and limit repeated ordering without payment. A sketch of a per-user purchase cap follows below.
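One of these measures, the per-person purchase cap, can be sketched roughly as follows. It assumes a Redis counter per user and event; the key layout and the `try_take_quota` helper are hypothetical, and the limit of 3 follows the example above.

```python
import redis

r = redis.Redis(decode_responses=True)
MAX_PER_USER = 3   # "at most 3 items per person", as in the example above


def try_take_quota(event_id: str, user_id: str, quantity: int = 1) -> bool:
    """Count a user's purchases for an event and reject anything over the cap."""
    key = f"quota:{event_id}:{user_id}"
    bought = r.incrby(key, quantity)      # INCRBY is atomic even under concurrent requests
    if bought > MAX_PER_USER:
        r.decrby(key, quantity)           # roll the counter back and reject the order
        return False
    return True
```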

As for "oversold inventory", the number of orders placed within the 10-minute window may still exceed the stock. Here we can only treat goods differently: for ordinary goods whose orders exceed the inventory, replenishment solves the problem; but some sellers do not allow negative inventory at all, and in that case we can only tell the buyer at payment time that stock has run out.

How to reduce inventory in a large-scale seckill?

Today the most common choice in business systems is the withholding scheme. For example, when you buy a plane ticket or a movie ticket, there is usually an "effective payment time" after you place the order, and the order is released automatically once that time passes; this is a typical withholding-inventory scheme. So which scheme is best for the seckill scenario specifically?

Because seckill goods are generally "a bargain if you can grab them", the case of ordering successfully but not paying is rare. In addition, sellers put strict limits on the stock of seckill goods, so "order to reduce inventory" is the more reasonable choice for them. Moreover, "order to reduce inventory" is logically simpler than "withhold inventory" or "pay to reduce inventory", which involve third-party payment, so it also performs better.

For "order to reduce inventory", data consistency mainly means ensuring that the inventory field in the database never goes negative under concurrent requests. There are several common ways to guarantee this: one is to check inside a transaction in the application, i.e., roll back if the inventory would become negative after the deduction; another is to define the inventory column as an unsigned integer, so that the SQL statement itself reports an error if the value would drop below zero; a third is to use a CASE WHEN statement, such as this SQL:

UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory - xxx ELSE inventory END
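The same "never go negative" guarantee can also be expressed from the application side by putting the stock check into the UPDATE's WHERE clause and testing the affected-row count. A minimal sketch, using SQLite purely for illustration (a real seckill would sit on MySQL) and a hypothetical `item` table:

```python
import sqlite3


def deduct_inventory(conn: sqlite3.Connection, item_id: int, quantity: int) -> bool:
    """Deduct only when enough stock remains; 0 rows affected means sold out."""
    cur = conn.execute(
        "UPDATE item SET inventory = inventory - ? "
        "WHERE id = ? AND inventory >= ?",
        (quantity, item_id, quantity),
    )
    conn.commit()
    return cur.rowcount == 1


# Tiny usage example with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, inventory INTEGER)")
conn.execute("INSERT INTO item VALUES (1, 100)")
print(deduct_inventory(conn, 1, 2))    # True  -> stock is now 98
print(deduct_inventory(conn, 1, 999))  # False -> not enough stock, nothing deducted
```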

Ultimate optimization of inventory reduction in a seckill

In the trading flow, "inventory" is a key piece of data and also hot data, because every step of the transaction may need to query it. However, as I mentioned earlier when introducing hierarchical filtering, a seckill does not require perfectly consistent reads of the inventory, so storing the inventory data in a cache greatly improves read performance. Large concurrent reads can be handled with a LocalCache (caching item-related data on each machine of the seckill system) and hierarchical filtering of the data. But the large concurrent writes of inventory reduction cannot be avoided in any case, and that is the core technical problem of the seckill scenario.

Therefore, I would like to focus on the ultimate optimization of inventory reduction in the second kill scenario, including inventory reduction in the cache and inventory reduction in the database.

Inventory reduction for seckill goods differs somewhat from that of ordinary goods: the quantity is small and the transaction window is short. So here is a bold hypothesis: can we move inventory reduction for seckill goods directly into the cache, that is, reduce inventory directly in the cache or in a system that adds persistence on top of the cache (such as Redis)?

If the inventory-reduction logic for seckill items is very simple, for example there is no complex linkage between SKU-level inventory and total inventory, I think doing it in the cache is fine. But if the logic is more complicated, or transactions are needed, then the deduction still has to be done in the database.
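As a rough illustration of the cache-side case, here is a minimal sketch assuming the stock has been preloaded into Redis under a hypothetical key such as `seckill:stock:{item_id}`. The Lua script makes "check then decrement" a single atomic step, so concurrent buyers cannot drive the stock negative.

```python
import redis

r = redis.Redis(decode_responses=True)

# Check-then-decrement must be atomic, otherwise two buyers can both pass the check.
DEDUCT_LUA = """
local stock = tonumber(redis.call('GET', KEYS[1]) or '-1')
local want  = tonumber(ARGV[1])
if stock >= want then
    return redis.call('DECRBY', KEYS[1], want)   -- remaining stock
end
return -1                                        -- not enough stock
"""


def deduct_in_cache(item_id: str, quantity: int = 1) -> bool:
    remaining = r.eval(DEDUCT_LUA, 1, f"seckill:stock:{item_id}", quantity)
    return remaining >= 0   # successful deductions are later persisted to the database
```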

Because MySQL stores the inventory of one item in a single row, a large number of threads end up competing for the InnoDB row lock on that row. The higher the concurrency, the more threads wait: TPS (Transactions Per Second) drops, response time (RT) rises, and the throughput of the database suffers badly.

This can lead to a single hot item dragging down the performance of the whole database, so that 0.01% of items affect the sale of the other 99.99%, which we do not want to see. One solution is to follow the principles described earlier and isolate hot items into a separate hotspot database. This, however, brings maintenance problems of its own, such as dynamically migrating hotspot data and running separate databases.

Separating hot items into a separate database still doesn’t solve the problem of concurrent locking. What should we do? There are two ways to solve the problem of concurrent locking:

**The application layer does the queuing.** This reduces the concurrency of operations on the same database row coming from a single machine, and it also lets us control how many database connections a single item occupies, preventing hot items from taking up too many connections. A single-machine sketch follows below.

**The database layer does the queuing.** The application layer can only queue within one machine, and since the number of application machines is large, this kind of queuing has limited control over overall concurrency; the ideal is therefore a global queue at the database layer. For this, Ali's database team developed a patch on MySQL's InnoDB layer that queues concurrent operations on a single row inside the database itself.
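The single-machine, application-layer version can be sketched like this, assuming the hot item IDs are known in advance; the cap value and all names are hypothetical, and `db_deduct` stands for whatever function performs the actual database deduction.

```python
import threading

MAX_CONCURRENT_PER_ITEM = 4              # assumed tuning knob, not a value from the article
HOT_ITEMS = ["item_1001", "item_1002"]   # hypothetical hot item IDs, identified in advance

# One semaphore per hot item caps how many threads on this machine may hit the same
# database row (and how many connections one hot item may occupy) at the same time.
_item_gates = {item: threading.Semaphore(MAX_CONCURRENT_PER_ITEM) for item in HOT_ITEMS}


def deduct_with_queue(item_id: str, quantity: int, db_deduct) -> bool:
    gate = _item_gates.get(item_id)
    if gate is None:                     # not a hot item: go straight to the database
        return db_deduct(item_id, quantity)
    with gate:                           # extra threads wait here instead of on the row lock
        return db_deduct(item_id, quantity)
```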

You may be wondering: queuing and lock contention both mean waiting, so what is the difference?

If you are familiar with MySQL, you will know that InnoDB's internal deadlock detection and the switching between MySQL Server and the InnoDB engine both consume performance. Taobao's MySQL core team has also made many other optimizations, such as the COMMIT_ON_SUCCESS and ROLLBACK_ON_FAIL patches: combined with hints in the SQL, the transaction no longer waits for the application layer to issue COMMIT; after the last SQL statement is executed, it commits or rolls back directly based on the result of TARGET_AFFECT_ROW, which saves a network round trip (about 0.7 ms on average). As far as I know, the Ali MySQL team has open-sourced MySQL with these patches.

Besides the hotspot isolation and queuing described above, there are also fields (such as a commodity's lastModifyTime) that are updated very frequently. In some scenarios these SQL updates can be merged, executing only the last one within a given time window, in order to reduce the number of database update operations. A sketch of this kind of merging follows below.
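The merging idea can be sketched as follows, under the assumption that losing the intermediate values is acceptable (as it is for something like lastModifyTime); the class and parameter names are hypothetical.

```python
import threading
import time


class UpdateCoalescer:
    """Keep only the latest value per item and flush once per interval."""

    def __init__(self, flush_fn, interval: float = 1.0):
        self._pending = {}                  # item_id -> latest value
        self._lock = threading.Lock()
        self._flush_fn = flush_fn           # e.g. runs one UPDATE per item
        threading.Thread(target=self._loop, args=(interval,), daemon=True).start()

    def submit(self, item_id: str, value) -> None:
        with self._lock:
            self._pending[item_id] = value  # a newer value simply overwrites older ones

    def _loop(self, interval: float) -> None:
        while True:
            time.sleep(interval)
            with self._lock:
                batch, self._pending = self._pending, {}
            for item_id, value in batch.items():
                self._flush_fn(item_id, value)  # only the last value per window is written
```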

The content of this article is part of the course “How to design a second kill system” by Xu Lingbo.