Author: draw your whole life

Link: https://juejin.cn/post/6844903949632274445

Every holiday season, people in first- and second-tier cities who return to their hometowns or travel almost all face the same problem: grabbing train tickets!

Ticket grabbing: concurrency thinking pushed to its limit

Although tickets can be booked in most cases nowadays, I believe everyone has deeply experienced the scenario where not a single ticket is left the instant tickets are released.

Especially during the Spring Festival, people not only use 12306, but also consider Zhixing and other ticket-snatching apps, as hundreds of millions of people across the country scramble for tickets during this time.

The “12306 service” is burdened with a QPS unmatched by any seckill system in the world, and millions of concurrent requests are perfectly normal!

The author made a special study of the “12306” server architecture and learned many highlights of its system design. Here I would like to share them with you and simulate an example: how to provide normal, stable service when one million people grab 10,000 train tickets at the same time.

https://github.com/GuoZhaoran/spikeSystem

1. Large-scale high-concurrency system architecture

A high-concurrency system architecture is deployed as a distributed cluster. Load balancing sits above the service layer, and various disaster-recovery measures (such as dual active data centers, node fault tolerance, and server disaster recovery) ensure high availability of the system. Traffic is also balanced across different servers according to their load capacity and configured policies.

1.1 Overview of Load Balancing

A user request passes through three layers of load balancing on its way to the server; each layer is briefly introduced below.

① OSPF (Open Shortest Path First) is an Interior Gateway Protocol (IGP).

OSPF advertises the status of network interfaces between routers to build a link-state database and generate a shortest-path tree. OSPF automatically calculates the Cost value of a routed interface, but the interface Cost can also be specified manually.

The Cost calculated by OSPF is inversely proportional to interface bandwidth: the higher the bandwidth, the lower the Cost (for example, with a 100 Mbps reference bandwidth, a 100 Mbps link has Cost 1 and a 10 Mbps link has Cost 10). Paths that reach the same destination with equal Cost can be load-balanced, with at most six links balanced simultaneously.

②LVS (Linux Virtual Server)

It is a cluster technology that uses IP load balancing together with content-based request distribution.

The scheduler has good throughput: it distributes requests evenly across servers for execution and automatically shields server failures, turning a group of servers into a single high-performance, highly available virtual server.

③ Nginx

It is a very high-performance HTTP proxy/reverse-proxy server, frequently used for load balancing in service development.

Nginx implements load balancing in three main ways (a sketch of the weighted algorithm follows this list):

  • Round robin (the default)

  • Weighted round robin

  • IP hash
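
For intuition about how weighted scheduling spreads traffic, here is a minimal Go sketch of the smooth weighted round-robin algorithm that nginx uses for weighted upstream selection. This is an approximation for illustration, not nginx's actual source; all names are made up:

```go
package main

import "fmt"

// peer models one upstream server in smooth weighted round robin.
type peer struct {
	name          string
	weight        int // configured weight
	currentWeight int // adjusted on every pick
}

// pick raises every peer's currentWeight by its weight, selects the peer
// with the highest currentWeight, then lowers the winner by the weight sum.
func pick(peers []*peer) *peer {
	total := 0
	var best *peer
	for _, p := range peers {
		p.currentWeight += p.weight
		total += p.weight
		if best == nil || p.currentWeight > best.currentWeight {
			best = p
		}
	}
	best.currentWeight -= total
	return best
}

func main() {
	peers := []*peer{
		{name: ":3001", weight: 1},
		{name: ":3002", weight: 2},
		{name: ":3003", weight: 3},
		{name: ":3004", weight: 4},
	}
	// Over 10 picks the peers are chosen 1, 2, 3 and 4 times respectively.
	for i := 0; i < 10; i++ {
		fmt.Print(pick(peers).name, " ")
	}
}
```

Over any window of ten consecutive picks, the four peers are chosen 1, 2, 3, and 4 times respectively, which is exactly the ratio we configure next.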

Let’s configure and test weighted round robin for Nginx.

1.2 Nginx weighted round robin demo

The following is a weighted round robin load-balancing configuration: on local ports 3001–3004, I configure weights of 1, 2, 3, and 4 respectively:

```nginx
upstream load_rule {
    server 127.0.0.1:3001 weight=1;
    server 127.0.0.1:3002 weight=2;
    server 127.0.0.1:3003 weight=3;
    server 127.0.0.1:3004 weight=4;
}
...
server {
    listen 80;
    server_name load_balance.com www.load_balance.com;
    location / {
        proxy_pass http://load_rule;
    }
}
```

(To test locally, map www.load_balance.com to 127.0.0.1 in /etc/hosts so the domain resolves.)

Next, use Go to start four HTTP services listening on these ports. Below is the program listening on port 3001; for the others, only the port needs to change:

```go
package main

import (
	"net/http"
	"os"
	"strings"
)

func main() {
	http.HandleFunc("/buy/ticket", handleReq)
	http.ListenAndServe(":3001", nil)
}

// handleReq records which port served the request.
func handleReq(w http.ResponseWriter, r *http.Request) {
	failedMsg := "handle in port:3001"
	writeLog(failedMsg, "./stat.log")
}

// writeLog appends a message line to the given log file.
func writeLog(msg string, logPath string) {
	fd, _ := os.OpenFile(logPath, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
	defer fd.Close()
	content := strings.Join([]string{msg, "\r\n"}, "")
	buf := []byte(content)
	fd.Write(buf)
}
```

Each request’s port information is written to the ./stat.log file, and then the ab benchmarking tool is used to generate load:

```bash
ab -n 1000 -c 100 http://www.load_balance.com/buy/ticket
```
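
To verify that the weights take effect, one can tally how many requests each port handled. The following is a minimal sketch, assuming the log format (`handle in port:3001`) written by the Go program above; with weights 1:2:3:4 over 1,000 requests, the counts should come out at roughly 100/200/300/400:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Tally per-port request counts from stat.log. Assumes each line looks like
// "handle in port:3001", as written by the writeLog helper above.
func main() {
	data, err := os.ReadFile("./stat.log")
	if err != nil {
		panic(err)
	}
	counts := map[string]int{}
	for _, line := range strings.Split(string(data), "\n") {
		if port, ok := strings.CutPrefix(strings.TrimSpace(line), "handle in port:"); ok {
			counts[port]++
		}
	}
	// With weights 1:2:3:4 over 1,000 requests, expect roughly 100/200/300/400.
	fmt.Println(counts)
}
```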

For more details on how Nginx implements upstream load balancing, see this reference:

https://www.kancloud.cn/digest/understandingnginx/202607

2. Seckill system scheme selection

To solve this problem, we first need to figure out one thing: when should inventory be deducted? There are three main options, analyzed below.

2.1 Deduct inventory when the order is placed

  • Under extreme concurrency, the details of any memory operation critically affect performance. Logic such as order creation, in particular, typically has to be persisted to a disk-backed database, which puts incredible strain on the database.

  • Malicious users can place orders without paying, which reduces inventory and causes many tickets to go unsold. The server can limit the number of orders per IP or per user, but that is not a good solution either.

2.2 Deduct inventory on payment

Waiting until payment to deduct inventory avoids unpaid orders locking up stock, but since concurrent orders can far exceed the remaining inventory, users may place orders that can never be fulfilled.

2.3 Withholding inventory

Withholding combines the two approaches: inventory is deducted (withheld) when the order is placed, and typically released back if payment does not arrive within a time window. This is the approach explored below.

3. The art of withholding inventory

On a single machine with low concurrency, we usually implement inventory withholding like this: take a lock around the stock counter, check that stock remains, and decrement it.
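
A minimal Go sketch of this single-machine scheme (illustrative names, not the repository's code):

```go
package main

import (
	"fmt"
	"sync"
)

// inventory is a mutex-guarded counter that withholds one unit of stock
// per successful order.
type inventory struct {
	mu    sync.Mutex
	stock int64
}

// withhold reserves one ticket if any stock remains.
func (inv *inventory) withhold() bool {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if inv.stock > 0 {
		inv.stock-- // reserve one ticket
		return true
	}
	return false // sold out
}

func main() {
	inv := &inventory{stock: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(inv.withhold()) // true, true, false
	}
}
```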

Then each machine carries a local stock of 100 train tickets, and the total stock across the 100 servers is still 10,000, which guarantees that stock orders are not oversold. The cluster architecture we are describing looks like this:

Based on this architecture, let’s now move to a concrete implementation.

4. Code demonstration

4.1 Initialization

The service uses the Redigo library to operate Redis. Here is the initialization code for the local and unified inventory structures:

```go
//localSpike package structure
package localSpike

// LocalSpike holds the per-machine inventory counters.
type LocalSpike struct {
	LocalInStock     int64 // local stock
	LocalSalesVolume int64 // local sales volume
}
...
//remoteSpike package structure
package remoteSpike

// RemoteSpikeKeys holds the Redis key names for the unified inventory.
type RemoteSpikeKeys struct {
	SpikeOrderHashKey  string // Redis hash key of the seckill order
	TotalInventoryKey  string // hash field: total inventory
	QuantityOfOrderKey string // hash field: quantity already sold
}

// NewPool returns a Redis connection pool.
func NewPool() *redis.Pool {
	return &redis.Pool{
		MaxIdle:   10000,
		MaxActive: 12000, // max number of connections
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", ":6379")
			if err != nil {
				panic(err.Error())
			}
			return c, err
		},
	}
}
...
func init() {
	localSpike = localSpike2.LocalSpike{
		LocalInStock:     150,
		LocalSalesVolume: 0,
	}
	remoteSpike = remoteSpike2.RemoteSpikeKeys{
		SpikeOrderHashKey:  "ticket_hash_key",
		TotalInventoryKey:  "ticket_total_nums",
		QuantityOfOrderKey: "ticket_sold_nums",
	}
	redisPool = remoteSpike2.NewPool()
	done = make(chan int, 1)
	done <- 1
}
```

4.2 Local and unified inventory withholding

The local inventory logic is very simple: the user request increments the sales volume, which is then compared against the local stock, and a bool is returned. Note that this method on its own is not thread-safe; the caller serializes access with a global lock (see Section 4.3):

```go
package localSpike

// LocalDeductionStock deducts local stock and returns whether it succeeded.
// Not thread-safe on its own; callers must serialize access.
func (spike *LocalSpike) LocalDeductionStock() bool {
	spike.LocalSalesVolume = spike.LocalSalesVolume + 1
	return spike.LocalSalesVolume < spike.LocalInStock
}
```

The unified inventory lives in Redis. Because Redis is single-threaded, but our deduction involves a sequence of steps (reading data, computing, writing data), we package the commands into a Lua script to guarantee the atomicity of the whole operation:

```go
package remoteSpike
......

const LuaScript = `
	local ticket_key = KEYS[1]
	local ticket_total_key = ARGV[1]
	local ticket_sold_key = ARGV[2]
	local ticket_total_nums = tonumber(redis.call('HGET', ticket_key, ticket_total_key))
	local ticket_sold_nums = tonumber(redis.call('HGET', ticket_key, ticket_sold_key))
	-- Check whether tickets remain, and if so increment the sold count
	if (ticket_total_nums >= ticket_sold_nums) then
		return redis.call('HINCRBY', ticket_key, ticket_sold_key, 1)
	end
	return 0
`

// RemoteDeductionStock deducts the unified stock in Redis atomically.
func (RemoteSpikeKeys *RemoteSpikeKeys) RemoteDeductionStock(conn redis.Conn) bool {
	lua := redis.NewScript(1, LuaScript)
	result, err := redis.Int(lua.Do(conn, RemoteSpikeKeys.SpikeOrderHashKey, RemoteSpikeKeys.TotalInventoryKey, RemoteSpikeKeys.QuantityOfOrderKey))
	if err != nil {
		return false
	}
	return result != 0
}
```
Before the test, initialize the unified inventory in Redis:

```bash
hmset ticket_hash_key "ticket_total_nums" 10000 "ticket_sold_nums" 0
```
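
As a quick sanity check that the hash was initialized, the values can be read back through Redigo. This is a hypothetical snippet, not part of the original repo; it assumes the gomodule/redigo import path and the key names from init() above:

```go
package main

import (
	"fmt"

	"github.com/gomodule/redigo/redis"
)

// Read the inventory hash back from Redis and print the two counters.
func main() {
	conn, err := redis.Dial("tcp", ":6379")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	vals, err := redis.Int64Map(conn.Do("HGETALL", "ticket_hash_key"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%d sold=%d\n", vals["ticket_total_nums"], vals["ticket_sold_nums"])
}
```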

4.3 Responding to User Information

We start an HTTP service that listens on a port:

```go
package main
...
func main() {
	http.HandleFunc("/buy/ticket", handleReq)
	http.ListenAndServe(":3005", nil)
}
```

Now that all the initialization is done, the logic of handleReq is quite clear: deduct the local stock, then deduct the unified stock in Redis, serializing the whole section through the done channel, which acts as a global lock (it has capacity 1 and is seeded in init()).

```go
// handleReq processes a purchase request and writes the result to the log.
func handleReq(w http.ResponseWriter, r *http.Request) {
	redisConn := redisPool.Get()
	LogMsg := ""
	<-done
	// global read-write lock
	if localSpike.LocalDeductionStock() && remoteSpike.RemoteDeductionStock(redisConn) {
		util.RespJson(w, 1, "ticket grabbed successfully", nil)
		LogMsg = LogMsg + "result:1,localSales:" + strconv.FormatInt(localSpike.LocalSalesVolume, 10)
	} else {
		util.RespJson(w, -1, "sold out", nil)
		LogMsg = LogMsg + "result:0,localSales:" + strconv.FormatInt(localSpike.LocalSalesVolume, 10)
	}
	done <- 1
	// write the ticket-grabbing result to stat.log
	writeLog(LogMsg, "./stat.log")
}

func writeLog(msg string, logPath string) {
	fd, _ := os.OpenFile(logPath, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
	defer fd.Close()
	content := strings.Join([]string{msg, "\r\n"}, "")
	buf := []byte(content)
	fd.Write(buf)
}
```

4.4 Single-machine load test

Start the service, then run a load test with the ab tool:

```bash
ab -n 10000 -c 100 http://127.0.0.1:3005/buy/ticket
```

Here is the benchmark output from my modest local Mac:

```
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        127.0.0.1
Server Port:            3005

Document Path:          /buy/ticket
Document Length:        29 bytes

Concurrency Level:      100
Time taken for tests:   2.339 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1370000 bytes
HTML transferred:       290000 bytes
Requests per second:    4275.96 [#/sec] (mean)
Time per request:       23.390 [ms] (mean)
Time per request:       0.234 [ms] (mean, across all concurrent requests)
Transfer rate:          572.08 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    8  14.7      6     223
Processing:     2   15  17.6     11     232
Waiting:        1   11  13.5      8     225
Total:          7   23  22.8     18     239

Percentage of the requests served within a certain time (ms)
  50%     18
  66%     24
  75%     26
  80%     28
  90%     33
  95%     39
  98%     45
  99%     54
 100%    239 (longest request)
```

In addition, checking the logs shows that throughout the run requests were served normally, traffic was even, and Redis behaved normally. Successes stop once the 150-ticket local stock is exhausted:

```
//stat.log
...
result:1,localSales:145
result:1,localSales:146
result:1,localSales:147
result:1,localSales:148
result:1,localSales:149
result:1,localSales:150
result:0,localSales:151
result:0,localSales:152
result:0,localSales:153
result:0,localSales:154
result:0,localSales:156
...
```

5. Review

① Load balancing, divide and conquer
② Use concurrency and asynchrony properly

