
Disclaimer

The content of this article is entirely the author's own technical analysis and summary. Given the author's limited skill and experience, if anything is wrong, please forgive it and leave plenty of comments. Thank you.

Seckill system – Background

I believe we have all used platforms and websites with huge traffic such as Sina Weibo, Taobao and JD.com. "High traffic" and "high concurrency" are heavy problems that we [technology developers] have to face. It seems that to make it in this business, even if you never encounter a real high-concurrency scenario, you still have to create your own high-concurrency scenarios, because that is how you get ahead!

Seckill system – Introduction

What we are going to introduce today is one of the most extreme high-concurrency scenarios: the "seckill" (flash sale). The term usually shows up during "big promotions", and of course also in some platform activities. So some of you will surely ask: what problems does a seckill system have to pay attention to? Why is it harder?

Seckill system – Feature analysis

  • Instantaneous surge: traffic starts to enter at a certain moment (there is rarely a warm-up and slow growth mechanism). When a large number of users snap up the same product at the same time, the site instantly surges in traffic.

  • Demand far exceeds supply: product inventory is limited, the number of seckill orders will far exceed the inventory, and only a small number of users can seckill successfully.

  • Resource locking: the seckill business process is relatively simple; generally it is just placing an order and deducting inventory. Inventory is the "resource" users compete for, and the "resource" actually consumed cannot exceed the "resource" planned to be sold, i.e. it cannot be "oversold".

Seckill system – Difficulty analysis

Its difficulty lies here: to build even a barely passing (60 out of 100) seckill system, you must take care of at least the following three aspects, namely "service availability", "data consistency" and "fast response", which is a little "demanding"!

In today's scenario it is hard to imagine an architecture that is not distributed. Speaking of distributed architecture, we all know CAP theory. It's okay if you don't.

CAP theory, also known as the CAP theorem, refers to Consistency at the data level, Availability of the service itself, and Partition tolerance between different nodes in the network of a distributed system.

P is the basis of A and C: if you want different nodes to still serve data even when there is a problem in the network, the most direct approach is to deploy redundant nodes; otherwise everything is in vain. So, as a distributed system, P cannot be ignored; we can regard it as the basis of A and C.

Summary of CAP system

  • A system that guarantees only A and C is a single application, not distributed at all; this is how systems were built before distribution. If the node fails there is no split-brain, but the whole system goes down.

  • Furthermore, the more nodes there are in the network, the higher the partition tolerance, but also the more data has to be copied and updated, and the harder it is to ensure consistency.

  • The stronger the consistency guarantee, the longer it takes to update the data on all nodes, and the lower the availability.

These three elements are in tension: the CAP principle states that at most two of the three can be achieved at the same time, never all of them.

Back to the main topic: the three elements of a seckill system are not exactly the same as the three CAP elements; if anything they are even more demanding, built on top of them at a higher level.

Availability of services

Service availability means staying available under the impact of high concurrent traffic and ensuring that the service can keep delivering its capability to the outside world, without downtime or resource exhaustion; even when memory, network or hardware resources are constrained, the service must not be destroyed and must not "die".

For example, if you keep fish and dump in more feed than the fish can handle, the fish will choke or be stuffed to death. The fish is like your system: you must keep the fish healthy!

Consistency of data

As we all know, our programs and most of today's servers (databases, for example) are likely to have multiple threads modifying the same row or the same block of memory at the same time. From the Java perspective there are consistency problems, and the same holds from the application and middleware perspectives: the order of simultaneous modifications gets scrambled, the data becomes disordered, operations get repeated, and the result differs from what we expect.

  • Unless you can serialize them, one by one, never letting them modify or operate on the data at the same time. This is the most essential and safest way, but also the most costly in terms of performance (pessimistic locks, synchronous queues).

  • The other way is to perform the modification atomically, at the level closest to the bottom of the computer, or to build a single serialization point at the application/middleware level shared by all nodes (typically Redis or the database), with write and read barriers on top: before each change, check whether the data still matches what is expected, and if not, do not modify it. This is the famous optimistic lock! (A minimal sketch follows.)
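As a rough illustration of this "check before you change" idea, here is a minimal CAS-style inventory decrement sketch in Java (the class and the in-memory counter are assumptions for the example, not the article's concrete implementation):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal CAS-style "optimistic" inventory decrement (illustrative only).
public class CasStockExample {
    // Hypothetical in-memory stock counter; real systems would keep this in Redis or the DB.
    private final AtomicInteger stock = new AtomicInteger(10);

    /** Try to deduct one unit; returns false if sold out or the race was lost too often. */
    public boolean tryDeduct() {
        while (true) {
            int current = stock.get();          // read the expected value
            if (current <= 0) {
                return false;                   // sold out
            }
            // compareAndSet only succeeds if nobody changed the value in between
            if (stock.compareAndSet(current, current - 1)) {
                return true;
            }
            // otherwise another thread won the race: re-read and retry
        }
    }
}
```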

Quick Response of services

Generally speaking, this is about user experience: a reasonably qualified seckill system should not keep users waiting for a long time, but should feed results back as quickly as possible.

  • To respond quickly, you do not have to return the final result synchronously; you can accept the request and return asynchronously, notifying the user of the outcome later.

  • Alternatively, compute the result for the user as fast as possible and return it directly.

To summarize

  1. (Asynchronous return + synchronous processing): in short, return asynchronously while processing serially (synchronously); this guarantees both a fast response and data consistency.

  2. (Asynchronous return + optimistic lock processing): in short, return asynchronously while updating with an optimistic lock; this also guarantees both a fast response and data consistency.

After the background analysis, it's time for the main event: the technical analysis.

Seckill system – Architecture design

We divide the seckill architecture into three levels: from the outside to the inside, respectively: application layer, service layer, data access layer.

Seckill architecture design points

Application layer architecture design

Dynamic/static separation + CDN technology

Dynamic/static separation analysis
  • Scenario analysis: Before the seckill starts, users will keep refreshing the browser page (commonly known as F5) to make sure they do not miss the seckill products.

  • According to the common website application architecture:

    • We assume that if these useless requests frequently hit our backend, passing through the web server (LVS, Nginx, etc.) -> application server (Tomcat or Jetty, etc.) -> database (MySQL), they will undoubtedly put great pressure on the backend services and servers.
  • Solution: Redesign the seckill product page; do not use the site's original product details page. Make the page content static, and reduce/isolate useless requests before they reach the backend services.

CDN technical analysis
Sudden increase in network and server bandwidth

Suppose the static page of the website is 100 KB and 10,000 users request it at the same moment: the required extra network and server bandwidth is then roughly 1 GB (100 KB × 10,000). This bandwidth is needed only because of the seckill activity and far exceeds what the website normally uses.

Even after the dynamic business page is converted to a static page, the seckill activity will still dramatically increase network bandwidth consumption, and by itself this does not ease the pressure on the front-end web servers. So, if possible, the product pages should be cached further out on the CDN, not only on our front-end Nginx servers; the newly added outbound bandwidth therefore needs to be temporarily rented from the CDN service provider.

Preventing caches from interfering with refreshes of the seckill page
Pass a random number + status flag through a JavaScript file!

Add a reference to a JavaScript file, containing the seckill start flag set to no, to the static seckill page.

  • When the seckill starts, generate a new JavaScript file (the filename stays the same, but the content is different): update the seckill start flag to yes, and add the URL of the order page plus a random-number parameter (this random number is generated only once, i.e. everyone sees the same URL; the server side can store it in a Redis distributed cache). The file is loaded by the user's browser and controls whether the seckill product page is displayed.

  • This JavaScript file can be loaded with a random version number (e.g. xx.js?v=32353823), so it will not be cached by browsers, CDNs or reverse proxy servers.

  • The JavaScript file is so small that even accessing the JavaScript file server with every browser refresh would not put too much strain on the server cluster and network bandwidth.

To summarize: the front-end seckill page should be a dedicated page, consisting of static HTML and dynamic JS, both of which need to be cached on the CDN.

Limit access frequency based on UID

To uphold the principle of fairness: scalpers and certain "hacktivists" will use "high technology", for example crawler scripts that frantically refresh the page. To prevent a few people from spoiling things for everyone, we use a uniform standard to control each UID's (user ID's) access frequency; once a user exceeds the frequency threshold, the amount of data their interactive window can access or refresh is limited!

For example, you can use Redis to keep access statistics for each user and control the user's access frequency for a given commodity according to the user's ID and the commodity's identifier. When the frequency is exceeded, that user's requests are temporarily cut off. A minimal sketch follows.
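A minimal per-UID counter sketch, assuming the Jedis client; the key format, threshold and window length are made-up example values:

```java
import redis.clients.jedis.Jedis;

// Per-UID rate-limit sketch using Redis INCR + EXPIRE (illustrative only;
// a real system would use a connection pool instead of a single Jedis instance).
public class UidRateLimiter {
    private static final int LIMIT = 5;           // max requests per window (example value)
    private static final int WINDOW_SECONDS = 10; // window length (example value)

    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Returns true if this user may still access this item within the current window. */
    public boolean allow(long userId, long itemId) {
        String key = "seckill:rate:" + userId + ":" + itemId;
        long count = jedis.incr(key);             // atomic counter per user + item
        if (count == 1) {
            jedis.expire(key, WINDOW_SECONDS);    // start the window on the first hit
        }
        return count <= LIMIT;                    // cut off once the threshold is exceeded
    }
}
```

Requests from a UID that has exceeded the threshold can then simply be rejected at the application layer before they touch any backend service.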

Reverse proxy + load balancer

  • A seckill system is necessarily a clustered system, and if the hardware cannot be upgraded, using Nginx for load balancing is also a good choice.

Load Balance is an application of Cluster technology. It can distribute work tasks to multiple processing units to improve concurrent processing capacity and improve the performance of medium and large websites.

You need to use service clustering and horizontal scaling to distribute “peak” requests to different servers for processing.

HTTP redirection protocol implements load balancing

Based on the user's HTTP request, the load balancer computes a real web server address, writes that address into an HTTP redirect response, and returns it to the browser, which then accesses the new address. This method is relatively simple, but its performance is poor.

Generally speaking, the commonly used Spring Cloud Gateway and Netflix Zuul belong to this category.

Protocol layer: DNS domain name resolution load balancing

Multiple A records mapping the domain name to different IP addresses are configured on the DNS server. This hands the load-balancing work directly to DNS, saving the website a lot of management and maintenance trouble while improving access speed and effectively improving performance.

Commonly used DNS servers or domestic DNS servers are of this type.

Protocol layer: reverse proxy load balancer

In addition to providing load balancing functions, the reverse proxy server manages a group of Web servers and forwards the browser access requests to different Web servers for processing based on the load balancing algorithm. The reverse proxy server returns the processing results to the browser.

In this mode, the web servers' addresses are not exposed directly and do not need external IP addresses. However, since the reverse proxy server acts as the communication bridge, it needs two network adapters and two sets of IP addresses, one internal and one external.

Nginx or HaProxy, which are commonly used, fall into this category.

Network layer IP load balancing

The network layer performs load balancing by changing the destination IP address of the request. This responds faster than reverse-proxy load balancing, but when the returned data is large (big videos or files), the response speed is slow.

LVS (in NAT mode), which is commonly used, falls into this category.

Data link layer load balancing

The data link layer performs load balancing by changing the destination MAC address. The IP address of the load-balancing server is the same as the virtual IP address of the web server group it manages, so the load-balancing server does not need to translate IP addresses, but this places high demands on the load-balancing server's network adapter bandwidth.

LVS and so on, which are commonly used, fall into this category.

F5 and A10 load balancers

F5 refers to the F5 BIG-IP GTM, a hardware load-balancing device with strong concurrency capability. It implements load balancing and redundancy across multiple ISP links, providing high availability between them.

Service layer architecture design

Analysis of cache technology

Persistent I/O operations on hard disks consume a lot of resources, so we decided to use Redis, which operates in memory and copes well with intensive I/O.

Batch release + queue processing

  • No matter how many applications we scale, how many application servers we use, or how many load balancers we deploy, there will always be times when we can’t handle the volume of requests.
  • Therefore, at this level we need to consider how to limit the flow; once the system exceeds its limit, we must decisively block the influx of requests.
Queuing

The queuing mechanism works just like queuing when we buy things offline: requests are handled in order, none are left unprocessed, and the correctness of data execution is also guaranteed!

Requests are queued directly, FIFO (First In, First Out), so that no request ends up never able to grab the lock. As you can see, mechanisms like this turn multi-threaded processing into single-threaded processing, which can significantly affect efficiency and performance!

Java’s three commonly used concurrent queues
  • ArrayBlockingQueue is a blocking queue with a fixed initial capacity; it can be used as the queue of successful seckill orders to be written to the database. For example, if there are 10 items, we can create an array queue of size 10.

  • ConcurrentLinkedQueue is a non-blocking queue implemented with CAS primitives (lock-free). Enqueueing is fast, while dequeueing is slower.

  • LinkedBlockingQueue is also a blocking queue; both enqueue and dequeue take a lock, and the consumer thread blocks temporarily when the queue is empty.

During the pre-purchase phase, incoming requests far outnumber outgoing ones, so the queue will not be empty. We can therefore choose ConcurrentLinkedQueue as our request-queue implementation, or even use the Disruptor asynchronous processing framework. A small sketch of the queue-based approach follows.
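A minimal sketch of how web threads and a single worker might share a ConcurrentLinkedQueue (the Request class and the handling logic are placeholders invented for this example):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Request-queue sketch: many producers, one consumer (illustrative only).
public class RequestQueueDemo {

    static class Request {
        final long userId;
        final long itemId;
        Request(long userId, long itemId) { this.userId = userId; this.itemId = itemId; }
    }

    private final Queue<Request> queue = new ConcurrentLinkedQueue<>();

    /** Called by web threads: enqueueing is lock-free and fast. */
    public void accept(Request req) {
        queue.offer(req);
    }

    /** Called by a single worker thread: take requests out one by one and process them. */
    public void processLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            Request req = queue.poll();      // returns null when the queue is empty
            if (req == null) {
                Thread.onSpinWait();         // brief back-off instead of hard busy-looping
                continue;
            }
            handle(req);                     // e.g. deduct inventory and create the order
        }
    }

    private void handle(Request req) { /* placeholder for order/inventory logic */ }
}
```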

Batch release

On the basis of synchronous queuing, we can add a batch release execution processing mechanism.

As the name implies, to improve performance we can wait until a predetermined threshold is reached before invoking the back-end service; this improves performance and reduces both the number of back-end requests and the pressure on the back end, as sketched below:
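A minimal batch-release sketch (only an illustration: the queue type, batch size and back-end call are assumptions). A single worker blocks for the first request and then drains up to a full batch before making one back-end call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Batch release: requests queue up and are flushed to the back end in groups.
public class BatchReleaseDemo {
    private static final int BATCH_SIZE = 100;   // example threshold

    private final BlockingQueue<Long> orderRequests = new LinkedBlockingQueue<>();

    public void submit(long userId) {
        orderRequests.offer(userId);
    }

    /** Single worker: wait for at least one request, then drain up to a full batch. */
    public void flushLoop() throws InterruptedException {
        while (true) {
            List<Long> batch = new ArrayList<>(BATCH_SIZE);
            batch.add(orderRequests.take());                  // block until the first request arrives
            orderRequests.drainTo(batch, BATCH_SIZE - 1);     // grab the rest without blocking
            callBackendInBatch(batch);                        // one back-end call for the whole batch
        }
    }

    private void callBackendInBatch(List<Long> batch) { /* placeholder back-end call */ }
}
```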

Caching and queuing technologies are also used here to reduce the application's processing pressure and to achieve eventual consistency through asynchronous requests.

Rate limiting

Leaky bucket algorithm

The idea of leaky bucket algorithm is very simple. The water (request) enters the leaky bucket first, and the leaky bucket flows out of the water at a certain speed. When the inflow speed is too high, the water directly overflows.

  • Set the leak rate and the total capacity of the bucket. When a request arrives, check whether the bucket is already full: if not, the request is stored in the bucket; otherwise, the request is discarded.
  • A separate thread fetches requests from the bucket and processes them at the set rate (see the sketch below).
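A minimal leaky-bucket sketch, with a bounded queue as the bucket and a scheduled task as the fixed leak rate (capacity and rate are example values only):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Leaky bucket: bounded queue = bucket capacity, scheduled worker = fixed outflow rate.
public class LeakyBucketDemo {
    private final BlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(1000); // bucket capacity
    private final ScheduledExecutorService leaker = Executors.newSingleThreadScheduledExecutor();

    public LeakyBucketDemo() {
        // leak (process) one request every 10 ms, i.e. a fixed outflow of roughly 100 req/s
        leaker.scheduleAtFixedRate(() -> {
            Runnable task = bucket.poll();
            if (task != null) {
                task.run();
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
    }

    /** Returns false (request discarded) when the bucket is already full. */
    public boolean tryAccept(Runnable task) {
        return bucket.offer(task);
    }
}
```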
Disadvantages of the algorithm
  • Since it can only process requests at a certain rate, determining that rate is a core issue. Setting the rate too low will waste performance resources, while setting it too high will result in insufficient resources.

It is not sensitive to changes in the input rate: no matter how the input fluctuates, the server never sees it. Even when resources are idle, burst requests cannot be handled promptly, so this method is not recommended when burst traffic needs to be processed.

Token bucket algorithm

The token bucket algorithm works by putting tokens into the bucket at a constant rate. If the request needs to be processed, a token needs to be fetched from the bucket first. When no token is available in the bucket, the service is denied.

Implementation principle

Set the rate at which tokens are added to the bucket and the maximum number of tokens the bucket can hold. When a request arrives, it asks the bucket for tokens (one or more, depending on the application's needs). If there are enough tokens, the corresponding number is removed and the request passes; if there are not enough tokens in the bucket, the rate-limiting rule is triggered. A small sketch follows.
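A minimal token-bucket sketch (the refill rate and capacity are example values; production systems usually use a ready-made limiter such as a library implementation instead):

```java
// Token bucket: refill at a constant rate, deny requests when tokens run out.
public class TokenBucket {
    private final long capacity;          // maximum tokens the bucket can hold
    private final double refillPerMilli;  // tokens added per millisecond
    private double tokens;                // current token count
    private long lastRefillTime;

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    /** Try to take the given number of tokens; false means the limit was hit. */
    public synchronized boolean tryAcquire(int needed) {
        refill();
        if (tokens >= needed) {
            tokens -= needed;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        // top up tokens for the elapsed time, capped at the bucket capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMilli);
        lastRefillTime = now;
    }
}
```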

To solve the problem of traffic bursts at the boundary where one counting window switches to the next, a weakness of fixed-window counting, sliding-window counting can be used. Sliding-window counting is essentially still fixed-window counting; the difference is that the counting period is subdivided.

The sliding window

Compared with the fixed-window counting method, sliding-window counting adds a third parameter M, the number of sliding windows within the period T, on top of the two original parameters: the period T and the maximum number of visits (calls) N allowed within the period. A small counting sketch follows.
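A minimal sliding-window counter sketch using the three parameters T, M and N described above (all values and names are illustrative):

```java
// Sliding-window counter: period T is split into M small buckets and the
// counts of the buckets inside the last period are summed against the limit N.
public class SlidingWindowCounter {
    private final long bucketMillis;   // T / M, the width of one small window
    private final int buckets;         // M, number of small windows in the period
    private final long maxRequests;    // N, max calls allowed within the period
    private final long[] counts;       // per-bucket counters
    private final long[] bucketIndex;  // which bucket index the data in each slot belongs to

    public SlidingWindowCounter(long periodMillis, int buckets, long maxRequests) {
        this.bucketMillis = periodMillis / buckets;
        this.buckets = buckets;
        this.maxRequests = maxRequests;
        this.counts = new long[buckets];
        this.bucketIndex = new long[buckets];
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long currentBucket = now / bucketMillis;
        int slot = (int) (currentBucket % buckets);
        if (bucketIndex[slot] != currentBucket) {   // the slot holds stale data: reset it
            bucketIndex[slot] = currentBucket;
            counts[slot] = 0;
        }
        long total = 0;                              // sum only the buckets inside the period
        for (int i = 0; i < buckets; i++) {
            if (currentBucket - bucketIndex[i] < buckets) {
                total += counts[i];
            }
        }
        if (total >= maxRequests) {
            return false;                            // over the limit for the sliding period
        }
        counts[slot]++;
        return true;
    }
}
```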

Data access layer

MySQL suffers especially severe performance degradation under high concurrency.

Data update point (inventory deduction)

Pessimistic locks update data

We can approach it first from the "pessimistic lock" direction.

  • Pessimistic locking means that while data is being modified it is held in a locked state, excluding any external modification requests; whoever encounters the lock must wait.

  • While it solves the thread-safety problem, in "high concurrency" scenarios there are many such modification requests, each waiting for the "lock"; some threads may never get a chance to grab it, and their requests will die there.

  • At the same time, such requests instantly push up the system's average response time; as a result, available connections are exhausted and the system runs into exceptions.

Row locks, page locks, table locks, synchronization locks, distributed locks, distributed queues, and so on. A database-level sketch follows.
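At the database level, pessimistic locking is typically done with SELECT ... FOR UPDATE inside a transaction. A minimal JDBC sketch (the seckill_stock table and its columns are assumptions for the example):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Pessimistic deduction: the row lock serializes concurrent buyers of the same item.
public class PessimisticDeduct {

    /** Deduct one unit of stock inside a transaction; other transactions wait on the row lock. */
    public boolean deduct(Connection conn, long itemId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement lock = conn.prepareStatement(
                 "SELECT stock FROM seckill_stock WHERE item_id = ? FOR UPDATE");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE seckill_stock SET stock = stock - 1 WHERE item_id = ?")) {

            lock.setLong(1, itemId);
            try (ResultSet rs = lock.executeQuery()) {      // acquires the row lock
                if (!rs.next() || rs.getInt("stock") <= 0) {
                    conn.rollback();                        // sold out: release the lock
                    return false;
                }
            }
            update.setLong(1, itemId);
            update.executeUpdate();
            conn.commit();                                  // commit releases the row lock
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```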

Optimistic lock update data

Discuss the idea of “optimistic lock”.

  • The optimistic lock is a looser locking mechanism than the "pessimistic lock"; it is mostly implemented as an update with a version number, or a compare-and-update based on state.

  • The idea is that every request for this data is allowed to try to modify it, but each first obtains the data's version number; only the request whose version number still matches succeeds in updating, and the others are told the purchase failed.

  • This way we do not need to worry about queues, although it increases CPU computation overhead. When conflicts are few, it is the better solution.

Cache-based optimistic locks and database optimistic locks both come down to checking whether the number of updated rows is greater than 0.
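A minimal sketch of the database-side optimistic update, where the conditional UPDATE plus the "affected rows > 0" check replaces an explicit lock (table and column names are assumptions; a version column would work the same way with an extra "AND version = ?" condition):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Optimistic deduction: the WHERE clause guards the update, no row lock is held up front.
public class OptimisticDeduct {

    /** Returns true only if the conditional UPDATE actually changed a row. */
    public boolean deduct(Connection conn, long itemId) throws SQLException {
        String sql = "UPDATE seckill_stock SET stock = stock - 1 " +
                     "WHERE item_id = ? AND stock > 0";        // check and update in one atomic statement
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, itemId);
            int affected = ps.executeUpdate();
            return affected > 0;                                // 0 rows updated: sold out or lost the race
        }
    }
}
```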

Companion piece: ["Top Secret Files" revealed: the complete seckill architecture design and its technical key points]

The extended topics to be covered there are as follows:

  • Hot separation

    • Hot Spot identification technology
    • Hot spot isolation technology
    • Hot spot optimization technology
  • Transaction integrity

    • Interface idempotency
  • Distributed transaction system

    • Transaction de-duplication method
    • Idempotency of transaction processing
  • High data availability

    • Reading and writing separation
    • Database and table sharding
    • Database cluster
    • Optimizing the database
  • Concurrent queuing mechanism for single-row records at the DB level

  • Service degradation analysis

  • Peak-shaving methods (delayed processing)

    • Delay queue mechanism (Single point method)
    • Delay queue mechanism (distributed)