95% of the content of this article was translated from github.com/brettwooldr…

While researching HikariCP, a database connection pool, I came across an article on HikariCP's GitHub Wiki (the link above) that cleared a lot of things up for me, so I'm sharing a translation of it here.

Now for the main text.

Configuring a database connection pool is something many developers get wrong, and getting it right requires understanding a few counterintuitive principles.

10,000 concurrent users

Imagine that you have a site that, while not as stressful as Facebook, has around 10,000 concurrent visits — that’s around 20,000 TPS. How big should this site’s database connection pool be? The results may surprise you, because the correct way to ask this question is:

“How small should this site’s database connection pool be?”

This video from the Oracle Real World Performance Group answers the question; please watch: www.dailymotion.com/video/x2s8u…

In the video, an Oracle database was put under a stress test: 9,600 concurrent threads performed database operations, sleeping 550 ms between two consecutive database accesses. Initially, the middleware connection pool was set to 2048:

Initial configuration

After the stress test ran, the results looked like this:

Performance data with 2048 connections

Each request waited 33 ms in the connection pool queue and then took 77 ms to execute its SQL after obtaining a connection.

All kinds of buffer busy waits

Then the connection pool was reduced to 1024, with the concurrency and everything else unchanged, and the performance data looked like this:

Connection pool down to 1024

The wait time for obtaining a connection barely changed, but the SQL execution time decreased. In the graph below, the top half is wait events and the bottom half is throughput:

Wait and Throughput

As you can see, after the middleware connection pool was halved from 2048, throughput stayed the same while wait events were cut in half. Next, the connection pool was reduced to 96, with the number of concurrent threads still at 9,600.

Performance data at 96 connections

The average queue wait time is 1ms, and the average SQL execution time is 2ms.

Wait events are almost gone and throughput is up.

Without adjusting anything else, just shrinking the database connection pool in the middleware layer reduced the request response time from around 100ms to 3ms.

But why?

Why does nginx, with only 4 threads, perform so much better than Apache HTTPD with 100 processes? Think back to the basics of computer science and the answer is quite obvious.

Even a computer with a single CPU core can run hundreds of threads "simultaneously." But we all [should] know that this is just a trick the operating system plays with time slicing. A CPU core can execute only one thread at a time; then the operating system switches context and the core runs another thread's code, and so on. It is a basic law of computing that, given a single CPU core, executing A and B sequentially is always faster than executing A and B "simultaneously" through time slicing, because the context switches themselves cost time. Once the number of threads exceeds the number of CPU cores, adding more threads makes the system slower, not faster.
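As a rough illustration (a minimal, self-contained Java sketch, not from the original article): submit a batch of purely CPU-bound tasks to a thread pool sized to the core count, then to a heavily oversubscribed pool, and compare the wall-clock time. The exact numbers depend entirely on your hardware, but the oversubscribed run will not be faster:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuBoundDemo {
    static volatile long sink;  // keeps the JIT from discarding the work

    // A purely CPU-bound task: no I/O, no blocking.
    static void spin() {
        long x = 0;
        for (int i = 0; i < 50_000_000; i++) {
            x += i ^ (x << 1);
        }
        sink = x;
    }

    static long runWith(int threads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.execute(CpuBoundDemo::spin);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = cores * 16;
        System.out.println(cores + " threads: " + runWith(cores, tasks) + " ms");
        System.out.println((cores * 64) + " threads: " + runWith(cores * 64, tasks) + " ms");
    }
}
```

On a typical machine the two runs take roughly the same time, with the oversubscribed pool a little slower; what it never does is finish sooner.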

It’s almost true…

Limited resources

The statement above is only close to the truth; things are not quite that simple, and a few other factors have to be taken into account. When we look for database performance bottlenecks, they can always be grouped into three categories: CPU, disk, and network. Memory could be added to the list, but its bandwidth is orders of magnitude higher than that of disk or network, so I'll leave it out.

If we ignore disk and network, the conclusion really is that simple: on an 8-core server, setting the connection/thread count to 8 gives optimal performance, and adding more connections only degrades performance through context-switching overhead. But we cannot ignore disk and network. A database usually stores its data on disk, and a traditional disk consists of rotating metal platters and a read/write head mounted on a stepper motor. The read/write head can only be in one place at a time; to perform another read or write it must "seek" to a different location. So there is seek time, and there is also rotational latency: the head has to wait for the target data on the platter to "rotate into place" before it can operate. Caching certainly helps, but the principle still holds.

During that time (the "I/O wait"), the thread is blocked waiting for the disk, and the operating system can use the idle CPU core to serve other threads. So, because threads are always blocking on I/O, we can have more threads/connections than CPU cores and still get more work done in the same amount of time.
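The opposite effect is easy to see with tasks that spend most of their time blocked. The sketch below (again hypothetical, using Thread.sleep to stand in for disk or network waits) runs the same kind of batch with a pool equal to the core count and with a much larger pool; here the larger pool finishes far sooner, because blocked threads cost almost no CPU:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IoBoundDemo {
    // A task that is almost entirely "I/O wait": the sleep stands in for a disk or network call.
    static void ioTask() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static long runWith(int threads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.execute(IoBoundDemo::ioTask);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = 400;
        System.out.println(cores + " threads: " + runWith(cores, tasks) + " ms");
        System.out.println((cores * 8) + " threads: " + runWith(cores * 8, tasks) + " ms");
    }
}
```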

So how many more should there be? It depends on the disk. Newer SSDs have no seek time and no rotating platters. Don't jump to "SSDs are faster, so we should have more threads." On the contrary: no seek time and no rotational latency means less blocking, so fewer threads [closer to the number of CPU cores] will give the best performance. More threads only help when blocking creates opportunities for other work to execute.

Network is similar to disk. Reading and writing data over an Ethernet interface also blocks; a 10 Gbit/s link blocks less than a 1 Gbit/s link, and a 1 Gbit/s link blocks less than a 100 Mbit/s link. But network is usually a secondary concern, and some people ignore it in their calculations entirely.

PostgreSQL benchmark data shows that TPS growth starts to flatten out at around 50 connections. In the Oracle video above, they reduced the connection pool from 2048 down to 96, and even 96 is probably too high unless the server has 16 or 32 cores.

A formula for calculating the number of connections

The following formula comes from PostgreSQL, but we believe it applies broadly to most database products. You should use it as a starting point: model your expected traffic and test your application around it to find the most appropriate number of connections.

connections = ((core_count * 2) + effective_spindle_count)

The core count should not include hyper-threads, even if hyper-threading is enabled. The effective spindle count is roughly the number of disks actively doing work: if the active data set is fully cached it is 0, and as the cache hit rate falls it approaches the actual number of disks. How well this formula works with SSDs has not been analyzed.
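As a concrete sketch of how the formula might be applied (not from the original article; the JDBC URL and credentials are made-up placeholders), here is a HikariCP configuration that derives the pool size from the core and spindle counts:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSizing {
    public static void main(String[] args) {
        // core_count should be physical cores; availableProcessors() usually reports
        // logical cores, so halve it if hyper-threading is enabled.
        int coreCount = Runtime.getRuntime().availableProcessors();
        int effectiveSpindleCount = 1;  // assumption: one busy disk, mostly-cached working set

        int poolSize = (coreCount * 2) + effectiveSpindleCount;

        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com/app");  // placeholder URL
        config.setUsername("app");                                  // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(poolSize);  // the small pool this article argues for
        config.setMinimumIdle(poolSize);      // keep the pool at a fixed size

        HikariDataSource ds = new HikariDataSource(config);
        System.out.println("pool size = " + poolSize);
    }
}
```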

Using this formula, your 4-core i7 database server should get a connection pool of ((4 * 2) + 1) = 9 connections. Call it 10. Does that seem too small? Run a performance test and we guarantee it can easily handle 3,000 users running simple queries concurrently at 6,000 TPS. If the pool size goes much beyond 10, you will see response times start to climb and TPS start to drop.

Note: this formula is not only applicable to database connection pools; most programs that mix computation with I/O can use it as a reference for setting their thread count. In an earlier stress test of a messaging service I wrote with Netty, the optimal number of threads turned out to be exactly twice the number of CPU cores.

Axiom: you need a small pool of connections and a queue full of threads waiting for them

If you have 10,000 concurrent users, setting the pool size to 10,000 would be insane. 1,000 is still terrible. Even 100 is too many. You want a small pool of around 10 connections and let the remaining business threads wait in a queue for one. The pool size should equal the number of queries your database can actually execute efficiently at the same time, which is usually no more than twice the number of CPU cores.
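To make the "small pool, queued threads" idea concrete, here is a hypothetical sketch (again with placeholder connection details): a 10-connection HikariCP pool shared by far more business threads. A thread that calls getConnection() while all 10 connections are in use simply waits inside the pool, up to connectionTimeout, until one is returned.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SmallPoolManyThreads {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com/app");  // placeholder URL
        config.setMaximumPoolSize(10);        // a small pool...
        config.setConnectionTimeout(30_000);  // ...callers wait up to 30 s for a free connection
        HikariDataSource ds = new HikariDataSource(config);

        // Hundreds of business threads share those 10 connections; the excess ones queue
        // inside getConnection() rather than piling more work onto the database.
        ExecutorService workers = Executors.newFixedThreadPool(500);
        for (int i = 0; i < 10_000; i++) {
            workers.execute(() -> {
                try (Connection c = ds.getConnection()) {
                    // run a short query/transaction; the connection returns to the pool on close
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        workers.shutdown();
    }
}
```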

It is common to see a small web application with barely a dozen concurrent users running a 100-connection pool, which places a completely unnecessary burden on the database.

A few caveats

The size of the connection pool ultimately depends on the system characteristics.

A system with a mix of long and short transactions, for example, is often difficult to tune for any connection pool. The best approach is to create two connection pools, one for long and one for short transactions.
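A sketch of the two-pool idea, assuming a HikariCP setup with placeholder connection details: give short interactive transactions most of the connections and long-running work a small pool of its own, so reports can never starve the interactive traffic.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class TwoPools {
    static HikariDataSource newPool(String name, int size) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com/app");  // placeholder URL
        config.setPoolName(name);
        config.setMaximumPoolSize(size);
        return new HikariDataSource(config);
    }

    // Short OLTP-style transactions get most of the connections; long-running
    // reports get a couple of their own.
    static final HikariDataSource shortTxPool = newPool("short-tx", 8);
    static final HikariDataSource longTxPool  = newPool("long-tx", 2);
}
```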

As another example, if a system runs a task queue and only a certain number of tasks are allowed to execute simultaneously, the number of concurrent tasks should be adjusted to the number of connections in the connection pool, not the other way around.
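For that task-queue case, the matching sketch (hypothetical names) is simply to size the executor that runs the tasks to the connection pool, so the number of tasks in flight can never exceed the number of connections:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskQueueSizedToPool {
    public static void main(String[] args) {
        int poolSize = 10;  // same value as the connection pool's maximumPoolSize

        // At most poolSize tasks run at once, each holding one connection; everything
        // else waits in the executor's queue instead of waiting on the database.
        ExecutorService taskRunner = Executors.newFixedThreadPool(poolSize);
        taskRunner.shutdown();
    }
}
```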