Throughput

Before looking at QPS, TPS, RT, and concurrency, it is important to understand what a system’s throughput actually means. In general terms, throughput describes how much load a system can bear: the maximum number of requests (or users) it can handle per second.

The throughput of a system is usually determined by two values: QPS (or TPS) and concurrency. Every system has a practical limit for each, and once either limit is reached, the system’s throughput stops increasing.

QPS

QPS stands for Queries Per Second. Here a query is a request sent by the client to the server that receives a successful response; put simply, query = request.

QPS = Number of requests per second

TPS

TPS stands for Transactions Per Second. A transaction is the full round trip in which a client sends a request to a server and the server responds. The client starts timing when the request is sent and stops when the response is received, which gives both the time taken and the number of transactions completed.

For a single interface, TPS can be considered equivalent to QPS. For a page they differ: opening /index.html once is one transaction (one TPS), but that page may hit the server three times, for example for the CSS, the JS, and the index interface itself, which counts as three queries (three QPS).

TPS = Transactions per second

RT

RT is short for Response Time: the time interval between the system receiving an input and producing its output. More broadly, it is the time between the client sending a request and receiving the complete response from the server. The average response time is usually what gets reported.

Concurrency

In short, the number of requests/transactions the system can process simultaneously.

Calculation

QPS = concurrency / RT, or equivalently, concurrency = QPS × RT (with RT in seconds)

Here’s an example:

Suppose that between 9 a.m. and 10 a.m. each of the company’s 3,600 employees visits the bathroom once, and each visit takes an average of 10 minutes. Let’s do the math.

QPS = 3600 / (60 × 60) = 1

RT = 10 × 60 = 600 seconds

Concurrency = QPS × RT = 1 × 600 = 600

This means that for the best possible experience, the company will need 600 slots to accommodate staff who otherwise have to wait in line for the loo.
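The same arithmetic can be written as a small sketch. Only the numbers come from the example above; the class and variable names are made up purely for illustration:

```java
// A rough sketch of the bathroom example; 3,600 visits per hour and a
// 10-minute average stay come from the text, everything else is illustrative.
public class ThroughputExample {
    public static void main(String[] args) {
        double requestsPerHour = 3600;            // one visit per employee in the hour
        double qps = requestsPerHour / 3600.0;    // 3600 requests / 3600 seconds = 1
        double rtSeconds = 10 * 60;               // 10 minutes = 600 seconds per visit
        double concurrency = qps * rtSeconds;     // concurrency = QPS * RT = 600 slots

        System.out.printf("QPS = %.0f, RT = %.0f s, concurrency = %.0f%n",
                qps, rtSeconds, concurrency);
    }
}
```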

Performance thinking

Starting from QPS = concurrency / RT, consider a single-threaded scenario: the concurrency is 1, so QPS = 1 / RT. Since the response time is made up of CPU time plus CPU wait time (I/O, network, and so on), this becomes QPS = 1 / (CPU time + CPU wait time).

Optimal thread count calculation

Suppose the CPU time is 49 ms and the CPU wait time is 200 ms, so a single thread gives QPS = 1000 ms / 249 ms ≈ 4.01. While one request is waiting for those 200 ms, the CPU is idle and could in theory serve another 200 / 49 ≈ 4 requests. Ignoring context switching and other overhead, the total number of threads the CPU can keep busy is (200 + 49) / 49 ≈ 5. Taking multiple cores and CPU utilization into account, we can roughly say:

**Optimal number of threads = (RT / CPU time) × number of CPU cores × CPU utilization**

Then the maximum QPS formula can be deduced as follows:

Maximum QPS = optimal number of threads × single-thread QPS = (RT / CPU time × number of CPU cores × CPU utilization) × (1 / RT) = number of CPU cores × CPU utilization / CPU time
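Here is a minimal sketch of both formulas using the 49 ms CPU time and 200 ms wait time from the example; the core count and utilization figures are assumptions chosen purely for illustration:

```java
// A sketch of the optimal-thread-count and maximum-QPS formulas above.
// CPU time (49 ms) and wait time (200 ms) come from the text;
// the core count and utilization are assumed values.
public class CapacityEstimate {
    public static void main(String[] args) {
        double cpuTimeMs = 49;                    // time spent on the CPU per request
        double waitTimeMs = 200;                  // time spent waiting (I/O etc.)
        double rtMs = cpuTimeMs + waitTimeMs;     // single-request RT = 249 ms

        double singleThreadQps = 1000.0 / rtMs;   // ≈ 4.01

        int cores = 8;                            // assumed
        double utilization = 0.8;                 // assumed

        // optimal threads = (RT / CPU time) * cores * utilization
        double optimalThreads = (rtMs / cpuTimeMs) * cores * utilization;

        // maximum QPS = cores * utilization / CPU time
        double maxQps = cores * utilization / (cpuTimeMs / 1000.0);

        System.out.printf("single-thread QPS ≈ %.2f, optimal threads ≈ %.1f, max QPS ≈ %.0f%n",
                singleThreadQps, optimalThreads, maxQps);
    }
}
```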

So does this mean that we can increase QPS indefinitely just by increasing the number of CPU cores?

Amdahl’s Law

In 1967, G.M. Amdahl proposed Amdahl’s Law, which provides a model for the scalability of parallel processing. It points out that the achievable speedup is limited by the fraction of the work that can be parallelized; in other words, it tells us how much acceleration a program can theoretically gain from additional computing resources.

The speedup is 1 / ((1 − Par) + Par / p), where Par is the proportion of the computation that can run in parallel and p is the number of parallel processing nodes.

Suppose you want to go from Wangjing to Shunyi and the drive takes 3 hours. Even if you now have 3 cars, you cannot get there in 1 hour: nothing about the trip can be parallelized, so Par = 0% and, even with p = 3, the speedup is still 1. It does not get any faster.
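A minimal sketch of this formula, using the trip above (Par = 0) and, for contrast, an assumed 90% parallel workload:

```java
// A sketch of Amdahl's Law: speedup = 1 / ((1 - par) + par / p),
// where par is the parallel proportion and p the number of processing nodes.
public class AmdahlLaw {
    static double speedup(double par, int p) {
        return 1.0 / ((1 - par) + par / p);
    }

    public static void main(String[] args) {
        // The Wangjing-to-Shunyi trip: nothing is parallelizable (par = 0),
        // so three cars give a speedup of exactly 1.
        System.out.printf("par = 0.0, p = 3 -> speedup ≈ %.2f%n", speedup(0.0, 3)); // 1.00

        // An assumed 90% parallel workload: three nodes help,
        // but the speedup stays well below 3.
        System.out.printf("par = 0.9, p = 3 -> speedup ≈ %.2f%n", speedup(0.9, 3)); // 2.50
    }
}
```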

Gustafson’s Law

Gustafson’s Law, also known as scaled speedup, describes the relationship between the number of processors, the serial proportion, and the speedup, but with a different emphasis from Amdahl’s Law: instead of fixing the problem size, it fixes the run time and lets the problem grow with the number of processors.
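As a sketch, the commonly quoted form of the scaled speedup is S = (1 − Par) + Par × p; the parallel proportions and node counts below are assumed values for illustration:

```java
// A sketch of Gustafson's scaled speedup: S = (1 - par) + par * p,
// i.e. the run time is held fixed and the problem grows with the node count.
public class GustafsonLaw {
    static double scaledSpeedup(double par, int p) {
        return (1 - par) + par * p;
    }

    public static void main(String[] args) {
        System.out.printf("par = 0.9, p = 3  -> S ≈ %.1f%n", scaledSpeedup(0.9, 3));  // ≈ 2.8
        System.out.printf("par = 0.9, p = 32 -> S ≈ %.1f%n", scaledSpeedup(0.9, 32)); // ≈ 28.9

        // Unlike Amdahl's Law, the speedup keeps growing roughly linearly
        // as nodes are added, because the workload is scaled up with them.
    }
}
```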

According to Amdahl’s Law and the QPS formula, increasing the number of CPU cores will raise the maximum QPS as long as CPU time and CPU utilization stay the same, and increasing the number of parallel nodes p improves efficiency as long as Par is not zero. In practice, however, more requests bring more context switching, more GC, and more lock contention. Higher QPS means more objects are created and GC runs more often, which hurts both CPU time and CPU utilization, especially in the serial parts. Lock spinning, adaptive spinning, biased locking, and similar mechanisms also affect the parallel proportion Par.

In conclusion, to reach the best possible performance we need to keep running performance tests, tune the thread count (pool size) and related parameters, and find the settings that actually deliver the improvement we are after.
