This article is published with the authorization of Xu Lingbo and the CSDN public account, and has been re-edited for readers.

Abstract: The seckill (flash-sale) system grew out of the regular listing feature on Taobao's detail pages: some sellers set extremely low prices to attract attention, which put enormous pressure on the detail system. The seckill system was designed to isolate that burst traffic. This article introduces the Big Seckill system and shares the ideas and practical experience behind solving its typical hot-spot problem of reading data.

Xu Lingbo, researcher at Didi, joined Taobao in 2009 and is now responsible for product-detail business and stability work. He has long focused on performance optimization, took part in the major optimization projects of Taobao's high-traffic web systems, and is the author of the book "In-depth Analysis of Java Web Technology Inside". Personal website: http://xulingbo.net

Some numbers

Do you still remember the Xiaomi flash sales in 2013? 110,000 units of each of three Xiaomi phone models were sold, all through the Big Seckill system. Three minutes in, Xiaomi became the first flagship store to pass 100 million yuan in sales on that Singles' Day, and the fastest to do so. According to log statistics, the peak effective QPS of the front-end system on Double 11 was about 600,000/s; in the back-end cache, the peak was nearly 20 million/s across the cluster and 300,000/s on a single cache machine. The traffic that reached the real write path was much smaller: the highest single-item inventory-deduction TPS at the time, set by the Redmi sale, was 1,500/s.

Hotspot isolation

The first principle in designing a seckill system is to isolate the hot data, so that the 1% of requests cannot affect the other 99%, and so that the 1% is easier to optimize once separated. We isolate seckill traffic at several levels:

  • Business isolation. Seckill is run as a marketing activity, and sellers who want to participate must sign up for it separately. From a technical point of view, this means that once sellers have signed up, the hot items are known to us in advance, so we can warm things up before the event actually starts.

  • System isolation. System isolation is mostly runtime isolation: the seckill traffic can be separated from the other 99% by deploying to a dedicated group of machines. The seckill system also applies for its own domain name, again so that requests land on a different cluster.

  • Data isolation. Most of the data a seckill touches is hot data, so it is served from, for example, a separate cache cluster or MySQL database. Again, we do not want the 0.01% of data to affect the other 99.99%.

There are, of course, many ways to implement isolation. For example, you can distinguish by user, assigning different cookies to different users and routing them to different service interfaces at the access layer; you can also configure rate limits for different URL paths at the access layer; the service layer can invoke different service interfaces; and the data layer can mark the data with a special tag. The goal is the same in every case: to separate identified hot requests from ordinary ones.
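As a concrete illustration of this kind of routing, here is a minimal Java servlet-filter sketch. The cookie name, path prefix, and seckill domain are hypothetical; in practice this routing would more likely live in the load balancer or access layer (e.g. Tengine) rather than in application code.

```java
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

// Minimal sketch: route identified seckill traffic away from ordinary requests.
// The cookie name "seckill_user" and the target domain are made up for illustration.
public class SeckillRoutingFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        boolean hotPath = request.getRequestURI().startsWith("/seckill/");
        boolean hotUser = hasCookie(request, "seckill_user");

        if (hotPath || hotUser) {
            // Send hot traffic to the dedicated seckill cluster (separate domain),
            // so it cannot affect the other 99% of requests.
            response.sendRedirect("https://seckill.example.com" + request.getRequestURI());
            return;
        }
        chain.doFilter(req, resp); // ordinary traffic stays on the normal cluster
    }

    private boolean hasCookie(HttpServletRequest request, String name) {
        Cookie[] cookies = request.getCookies();
        if (cookies == null) return false;
        for (Cookie c : cookies) {
            if (name.equals(c.getName())) return true;
        }
        return false;
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```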

Dynamic and static separation

The principle above is isolation at the system level. Next comes dynamic/static separation of hot data, another important principle for high-traffic systems. I described how to carry out this static transformation in detail in an earlier article, "Static Architecture Design of High-Traffic Systems" in Programmer magazine, which covers the static design ideas of Taobao's product system; interested readers can look it up there. Our Big Seckill system evolved from the product detail system, so it already has dynamic/static separation in place, as shown in Figure 1.



Figure 1 Dynamic/static separation in the Big Seckill system

In addition, it has the following characteristics:

  • The entire page is cached in the user's browser

  • A forced refresh of the whole page only goes as far as the CDN

  • The only truly dynamic request is the one behind the "refresh to grab" button

In this way, 90% of the data, the static part, is cached on the client or the CDN. During the actual seckill, the user only clicks the special "refresh to grab" button rather than refreshing the whole page, so only a small amount of valid data is requested from the server and the large volume of static data is never requested repeatedly. The seckill page carries even less dynamic data than an ordinary detail page, and its performance is more than three times better. The "refresh to grab" design thus neatly solves the problem of fetching the latest dynamic data from the server without refreshing the page.
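To make the split concrete, here is a minimal sketch of the HTTP caching rules behind it, written as a Java servlet filter. The paths and max-age value are illustrative only; the real system pushes pages to the CDN rather than relying solely on response headers.

```java
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

// Minimal sketch of dynamic/static separation at the HTTP level.
// The "/grab" path and the one-hour max-age are hypothetical values.
public class CacheHeaderFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        if (request.getRequestURI().startsWith("/grab")) {
            // The "refresh to grab" endpoint returns a tiny dynamic payload
            // that must never be cached by the browser or the CDN.
            response.setHeader("Cache-Control", "no-store");
        } else {
            // The rest of the detail page is static for the whole event
            // and can be cached by the browser and the CDN.
            response.setHeader("Cache-Control", "public, max-age=3600");
        }
        chain.doFilter(req, resp);
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```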

Peak shaving based on time slicing

Anyone familiar with Taobao seckills knows that the earliest seckill system did not have the answer-a-question step; it was added later. The question step is very important for keeping seckill bots out: in 2011, seckills became extremely popular, bot tools ran rampant, and the marketing goal of broad public participation was not being met, so questions were added to limit the bots. After the questions were introduced, order submission was pushed back to roughly 2 seconds or later, and the proportion of orders placed by bots dropped below 5%. The new answer page is shown in Figure 2.



Figure 2 The seckill answer page

The question step actually serves another important function: it stretches the peak of order requests from under 1 second to roughly 2-10 seconds, slicing the request peak across time. This time-based sharding matters a great deal for handling concurrency on the server side: it relieves a lot of pressure, and because requests now arrive in sequence, the later ones naturally find no stock left and never reach the final ordering step, so real concurrent writes are very limited. This design idea is quite common now, for example in Alipay's "Shoo Shoo" and WeChat's "Shake".

Besides shaving the peak on the client side through the question step, instantaneous requests are generally also controlled on the server side by locking or queuing.
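As one illustration of server-side control, the sketch below bounds concurrent order processing with a semaphore and rejects the overflow quickly. The permit count and wait time are made-up values, not the real system's settings.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Minimal sketch of server-side queuing: at most N order requests proceed at
// once; the rest fail fast instead of piling up behind a lock.
public class OrderGate {

    private final Semaphore permits = new Semaphore(100); // illustrative limit

    public boolean tryOrder(long itemId, Runnable placeOrder) throws InterruptedException {
        // Wait only briefly; under seckill load it is better to reject quickly
        // than to let threads queue without bound.
        if (!permits.tryAcquire(50, TimeUnit.MILLISECONDS)) {
            return false; // tell the client "sold out / try again"
        }
        try {
            placeOrder.run();
            return true;
        } finally {
            permits.release();
        }
    }
}
```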

Hierarchical data validation



Figure 3 Hierarchical validation

Hierarchical validation is the most important design principle for the data layer of a high-traffic system. It gives the flood of requests a "funnel"-shaped design, as shown in Figure 3: invalid requests are filtered out at each successive layer, and only valid requests come out of the end of the funnel. To achieve this, the data must be validated layer by layer, following these principles:

  • First, separate the data into dynamic and static parts

  • Cache 90% of the data in the client browser

  • Cache dynamically requested read data on the web tier

  • Do not perform strong-consistency checks on read data

  • Shard write data sensibly by time

  • Rate-limit write requests

  • Perform strong-consistency checks on write data

The seckill system is designed following these principles, as shown in Figure 4.



Figure 4 Hierarchical architecture of the seckill system

Put the large volume of static data that needs no validation as close to the user as possible. In the front-end read system, check basic information: whether the user is qualified for this seckill, whether the item's state is normal, whether the user answered the question correctly, whether the seckill has already ended, and so on. In the write system, check things such as whether the request is illegitimate, whether the marketing budget (Taobao Gold Coins and the like) is sufficient, and the consistency of the written data, for example whether stock actually remains. Finally, the database layer guarantees the accuracy of the data, for example that inventory can never go negative.
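The following sketch shows the funnel shape of these checks in code. Every predicate is a hypothetical stand-in; the point is the ordering: cheap read-side checks that tolerate dirty data run first, and the strong-consistency check happens only at the write step.

```java
// Minimal sketch of the "funnel": each check is cheap and runs before anything
// more expensive, so invalid requests are dropped as early as possible.
public class SeckillFunnel {

    public String handle(long userId, long itemId, String answer) {
        // Read-side checks: allowed to use slightly stale (dirty) data.
        if (!userQualified(userId))         return "not qualified";
        if (!itemOnSale(itemId))            return "item not available";
        if (!answerCorrect(itemId, answer)) return "wrong answer";
        if (seckillEnded(itemId))           return "ended";

        // Write-side check: strong consistency is enforced only here.
        if (!deductStockInDb(itemId))       return "sold out";

        return createOrder(userId, itemId);
    }

    // Hypothetical stand-ins for the real checks.
    private boolean userQualified(long userId) { return true; }
    private boolean itemOnSale(long itemId) { return true; }
    private boolean answerCorrect(long itemId, String a) { return true; }
    private boolean seckillEnded(long itemId) { return false; }
    private boolean deductStockInDb(long itemId) { return true; } // e.g. UPDATE ... WHERE stock > 0
    private String createOrder(long userId, long itemId) { return "order-id"; }
}
```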

Real-time hotspot discovery

In essence, the seckill system is a hot-spot problem on read data, and it is the simplest kind: as mentioned above, thanks to business isolation (the separate sign-up) we can identify the hot data in advance and put protection in place ahead of time. Hot spots that can be identified early are relatively easy to handle. For example, analyzing historical sales records reveals which items are popular, and analyzing users' shopping-cart records reveals which items are likely to sell well; these are hot spots that can be analyzed in advance. The harder case is items that cannot be predicted and suddenly become hot, which requires real-time hot-spot analysis. Our current design discovers hot data on the trading link in real time, within 3 seconds, after which each system can protect itself based on that data. The concrete implementation is as follows:

  • Build an asynchronous statistics module that collects hot keys from each middleware product on the trading link, such as Tengine, the Tair cache, and HSF (middleware products such as Tengine and the Tair cache come with their own hot-spot statistics modules).

  • Establish a standard for hot-spot reporting and for delivering hot-spot services by subscription. The main purpose is to exploit the time differences between the systems on the trading link (detail, shopping cart, trade, promotion, inventory, logistics) to pass hot spots discovered upstream transparently to downstream systems, protecting them in advance. For example, during the peak of a big promotion, the detail system is the first to know, through the hot-URL statistics of the Tengine module at the access layer.

  • Upstream systems collect the hot-spot data and send it to the hot-spot server; downstream systems, such as the trading system, then learn which items are being invoked frequently and apply hot-spot protection. See Figure 5.



Figure 5 The real-time hot-spot discovery backend

Key elements include:

  • The hot-spot service backend should capture hot-spot data logs asynchronously. This makes it easier to keep the mechanism generic, and it avoids affecting the main flow of the business systems and middleware products.

  • The hot-spot service backend does not replace existing middleware or applications; each middleware component and application still has to protect itself. The backend provides a unified standard and tooling for collecting hot-spot data, plus a hot-spot subscription service, making it easy to pass hot-spot data transparently between systems.

  • Hot-spot discovery must be real-time (within 3 seconds).
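A minimal sketch of what such a hot-key statistics module might look like in Java is shown below. The threshold, window, and publishing hook are assumptions for illustration; the real system aggregates statistics from Tengine, Tair, HSF and other middleware rather than from application code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of asynchronous hot-key discovery: count accesses per key and
// every 3 seconds report keys above a threshold, then reset the counters.
public class HotKeyDetector {

    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final long threshold = 10_000; // illustrative: hits per 3s window
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::reportAndReset, 3, 3, TimeUnit.SECONDS);
    }

    // Called on the access path; LongAdder keeps write contention low.
    public void record(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    private void reportAndReset() {
        for (Map.Entry<String, LongAdder> e : counters.entrySet()) {
            long count = e.getValue().sumThenReset();
            if (count > threshold) {
                publishHotKey(e.getKey(), count); // push to downstream subscribers
            }
        }
    }

    private void publishHotKey(String key, long count) {
        System.out.println("hot key " + key + ": " + count + " hits in 3s");
    }
}
```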

Key technical optimizations

The preceding sections introduced principles for designing high-traffic read systems. But once all those measures are in place, how do you cope when a flood of traffic still pours in? The seckill system has several key problems to solve.

Java optimizations for handling highly concurrent dynamic requests

Java is actually weaker than general-purpose web servers (Nginx or Apache) at handling highly concurrent HTTP requests, so high-traffic web systems are usually made as static as possible: let most requests and data be returned directly from the Nginx server or a web proxy server (Varnish, Squid, etc.), which also avoids serializing and deserializing data on the Java side, and keep those requests off the Java tier altogether, leaving it to handle only dynamic requests with small payloads. For those requests, a few optimizations apply:

  • Handle requests with servlets directly. Depending on how much you rely on MVC, bypassing a traditional MVC framework skips a lot of complex and, here, useless processing logic and can save about 1 ms.

  • Write the output stream directly. Using resp.getOutputStream() instead of resp.getWriter() saves the encoding of immutable character data and improves performance; it is also recommended to output page data as JSON rather than through a template engine (which is typically interpreted at runtime).
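Putting both points together, the sketch below is a plain servlet with no MVC framework that writes pre-encoded JSON bytes straight to the output stream. The payload fields and the stock lookup are illustrative.

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Minimal sketch: a bare servlet, no MVC framework, no template engine.
public class DynamicDataServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/json;charset=UTF-8");

        // Build the small dynamic payload by hand; resp.getOutputStream()
        // avoids the character-encoding work of the Writer.
        String json = "{\"stock\":" + currentStock() + ",\"serverTime\":"
                + System.currentTimeMillis() + "}";
        byte[] body = json.getBytes("UTF-8");

        resp.setContentLength(body.length);
        resp.getOutputStream().write(body);
    }

    private int currentStock() { return 42; } // hypothetical stand-in
}
```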

Highly concurrent reads of the same item

The centralized Tair cache uses consistent hashing to keep its hit ratio high, so the same key always lands on the same machine. Even though each Tair machine can support about 300,000 requests/s, that is far from enough for items as hot as seckill items. How can this single-machine bottleneck be eliminated completely? The answer is a LocalCache at the application layer: cache the item's data directly on each machine of the seckill system. Which data, and how? Again, split it into static and dynamic parts:

  • Data that does not change, such as the title and description, is pushed to the seckill machines before the event starts and cached until the event ends.

  • Dynamic data such as inventory is cached with passive expiry for a short period (usually a few seconds); once it expires, the latest value is pulled from the Tair cache (a minimal sketch follows this list).
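Here is a minimal sketch of such a LocalCache for inventory, assuming a hypothetical Tair client call. Entries expire passively after a few seconds, at which point the latest value is pulled from the centralized cache.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the per-machine LocalCache for dynamic data.
public class StockLocalCache {

    private static final long TTL_MILLIS = 3_000; // "a few seconds", illustrative

    private static final class Entry {
        final int stock;
        final long loadedAt;
        Entry(int stock, long loadedAt) { this.stock = stock; this.loadedAt = loadedAt; }
    }

    private final ConcurrentHashMap<Long, Entry> cache = new ConcurrentHashMap<>();

    public int getStock(long itemId) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(itemId);
        if (e != null && now - e.loadedAt < TTL_MILLIS) {
            return e.stock; // possibly a few seconds stale; acceptable on the read path
        }
        int fresh = fetchFromTair(itemId); // pull the latest value on expiry
        cache.put(itemId, new Entry(fresh, now));
        return fresh;
    }

    private int fetchFromTair(long itemId) { return 0; } // hypothetical Tair client call
}
```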

You might wonder: if data like inventory is updated this frequently, will inconsistencies cause overselling? Here we rely on the hierarchical-validation principle for read data introduced earlier: the read path is allowed to serve some dirty data, because a misjudgment there only means that a small number of orders are attempted against stock that has actually run out; when the data is finally written, consistency is checked strictly, guaranteeing eventual correctness. This balances high availability against consistency, and solves the problem of highly concurrent reads.

Highly concurrent updates of the same data

LocalCache and hierarchical data validation handle highly concurrent reads, but highly concurrent writes, such as inventory deduction, are unavoidable no matter what, and they are the core technical problem of the seckill scenario.

In the database (MySQL), a given piece of data is stored in a single row, so a large number of threads end up competing for the InnoDB row lock. The higher the concurrency, the more threads wait: TPS drops, RT rises, and database throughput suffers badly. This creates the problem that a single hot item can drag down the performance of the whole database, exactly the "0.01% affecting the 99.99%" we want to avoid. One idea, following the isolation principle introduced first, is to put hot items into a separate hot-item database, but that undoubtedly brings maintenance complications (live migration of hot data, operating separate databases, and so on).

Separating hot items into their own database still does not solve the problem of concurrent lock contention, which can be addressed at two layers:

  • Queue at the application layer. This reduces concurrent operations on the same database row coming from any single machine, and it also allows controlling how many database connections a single item occupies, preventing hot items from taking up too many connections (see the sketch at the end of this section).

  • Queue at the database layer. The application layer can only queue within a single machine, and since many machines are in use, this control over concurrency is still limited. Ideally, queuing would be global, at the database layer. For this, Taobao's database team developed a MySQL patch that queues concurrent operations on a single record inside the InnoDB layer; see Figure 6.



Figure 6 Concurrent queuing of single-record operations at the database layer

You may ask: doesn't queuing involve waiting just like lock contention? What is the difference? Anyone familiar with MySQL knows that InnoDB's internal deadlock detection and the switching between MySQL Server and InnoDB consume a lot of performance. Taobao's MySQL core team has made many other optimizations as well, such as a patch adding hints like COMMIT_ON_SUCCESS and ROLLBACK_ON_FAIL to SQL: without waiting for the application layer to issue an explicit commit, the transaction is committed or rolled back according to TARGET_AFFECT_ROW after the last SQL statement executes, which cuts network wait time (about 0.7 ms on average). As far as I know, Ali's MySQL team has submitted these patches to the official MySQL project for review.
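To make the application-layer queuing from the first option above concrete, here is a minimal sketch that serializes all stock deductions for one item on one machine through a single-threaded executor. The database call is a stub; real code would also need to shut these executors down and cap the map's size.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of application-layer queuing: all deductions for one item on
// this machine go through one single-threaded executor, so the machine issues
// at most one UPDATE per item at a time and a hot item cannot monopolize
// database connections.
public class StockDeductionQueue {

    private final ConcurrentHashMap<Long, ExecutorService> queues = new ConcurrentHashMap<>();

    public Future<Boolean> deduct(long itemId) {
        ExecutorService queue = queues.computeIfAbsent(
                itemId, id -> Executors.newSingleThreadExecutor());
        return queue.submit(() -> deductInDb(itemId));
    }

    private boolean deductInDb(long itemId) {
        // e.g. UPDATE stock SET num = num - 1 WHERE item_id = ? AND num > 0
        return true; // hypothetical stand-in
    }
}
```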

Further thoughts on hot-spot problems in big promotions

Based on years of experience, I have distilled some general principles: isolation, dynamic/static separation, and hierarchical validation. Every link must be considered and optimized from the perspective of the whole chain. Beyond optimizing the system for performance, rate limiting and protection must also be in place.

Besides the hot-spot problems described above, Taobao deals with several other kinds of hot-data problems:

  • Data-access hot spots. In the Detail system, for example, some hot items receive extremely heavy access, and even the Tair cache itself has a bottleneck: once the request volume reaches a single machine's limit, hot-spot protection becomes a problem. The solution can look simple, say, rate limiting, but consider what happens when a hot spot trips one machine's limiting threshold: the data in that machine's cache becomes unreachable, which indirectly breaches the cache and lets requests avalanche onto the application layer's database. This problem needs a solution designed together with the specific cache product. A general approach is a LocalCache on the cache's client side: when hot data is detected, it is cached directly in the client rather than on the cache server.

  • Data-update hot spots. Besides the hot-spot isolation and queuing introduced above, some scenarios allow other techniques. For example, an item's lastModifyTime field may be updated very frequently; in some scenarios these SQL statements can be merged, executing only the last one within a given time window, which reduces the number of database updates (a sketch follows this list). In addition, automatic migration of hot items could in theory be done in the data-routing layer, using the real-time hot-spot discovery described above to move hot items automatically from the ordinary database into a separate hot-item database.

  • Hot data created by indexes built on a particular dimension. In real-time search, for example, some hot items have so many reviews that the search system cannot hold the review index built by item ID; order data indexed on the trade dimension has the same problem. This kind of hot data requires hashing the data, adding another dimension, and reorganizing it.
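As a sketch of the update-merging idea mentioned above, the code below buffers frequent lastModifyTime updates and flushes only the most recent value per item once per window. The window length and the database call are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of merging hot updates: writes to a frequently updated field
// (e.g. lastModifyTime) are buffered, and each window only the latest value
// per item is flushed to the database.
public class MergedUpdateBuffer {

    private final ConcurrentHashMap<Long, Long> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS); // 1s window, illustrative
    }

    // Called on every update; later timestamps simply supersede earlier ones.
    public void touch(long itemId, long lastModifyTime) {
        pending.merge(itemId, lastModifyTime, Math::max);
    }

    private void flush() {
        for (Long itemId : pending.keySet()) {
            Long ts = pending.remove(itemId);
            if (ts != null) {
                updateInDb(itemId, ts); // one UPDATE instead of many per window
            }
        }
    }

    private void updateInDb(long itemId, long ts) { /* hypothetical DB call */ }
}
```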