How many threads are appropriate for a server with a multi-core CPU?

First of all, we need to clarify three concepts: the number of CPUs, the number of cores, and the number of processors.

For example, when you run top to view the load and press 1, the entries CPU0 through CPUn actually correspond to processors.




Using cat /proc/cpuinfo, you can see CPU cores and processors in the output.


So how do these concepts differ?

CPU: an independent central processing unit, corresponding to a physical socket on the motherboard; a motherboard may hold multiple CPUs.

CPU cores: each CPU may contain multiple cores, each with its own set of ALU, FPU, cache, and so on. These are also known as physical cores.

Processors: thanks to hyper-threading technology, a single physical core can present itself as multiple logical cores, known as processors. Simply put, when there are multiple computing tasks, one can use the ALU while another uses the FPU. In this way the components of the physical core are fully utilized, and multiple computing tasks can be processed in parallel on the same physical core.
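These three counts can be read programmatically. Below is a minimal Python sketch that parses /proc/cpuinfo (so it assumes Linux); it counts sockets from the distinct `physical id` values, physical cores from distinct `(physical id, core id)` pairs, and processors from the `processor` entries, falling back to os.cpu_count() when the file is unavailable:

```python
import os
import re

def cpu_topology():
    """Return (sockets, physical_cores, processors).

    Linux-only sketch based on /proc/cpuinfo; falls back to
    os.cpu_count() for everything when the file can't be read.
    """
    try:
        with open("/proc/cpuinfo") as f:
            info = f.read()
    except OSError:
        n = os.cpu_count() or 1
        return 1, n, n

    # One "processor : N" line per logical processor.
    processors = len(re.findall(r"^processor\s*:", info, re.M))
    sockets = set(re.findall(r"^physical id\s*:\s*(\d+)", info, re.M))

    # A physical core is identified by its (socket, core id) pair.
    cores = set()
    socket_id = "0"
    for line in info.splitlines():
        if line.startswith("physical id"):
            socket_id = line.split(":")[1].strip()
        elif line.startswith("core id"):
            cores.add((socket_id, line.split(":")[1].strip()))

    return max(len(sockets), 1), max(len(cores), 1), processors

print(cpu_topology())
```

On the test server described below this would report (2, 24, 48). Note that some virtual machines omit the `physical id`/`core id` fields, in which case the core count here degrades to 1.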

With these concepts in mind, how should you set the number of threads in your application?

In theory, for computationally intensive tasks, the number of threads should match the degree of parallelism the CPU can provide. But should that parallelism be the number of physical cores or the number of processors?

Let the facts speak for themselves.

I tested on a server with 2 CPUs, 12 physical cores per CPU, and 2 logical processors per physical core (24 physical cores and 48 processors in total), using thread counts of 6, 10, 12, 30, 48, and 96.
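The original benchmark code isn't shown; below is a minimal Python sketch of the same kind of sweep. It uses sha256 over large buffers as the CPU-bound task, because CPython releases the GIL when hashing buffers larger than 2 KiB, so the work genuinely runs in parallel across threads; the task and buffer sizes are placeholders:

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

BUF = b"x" * (1 << 20)  # 1 MiB; hashlib releases the GIL on large buffers

def hash_task(rounds):
    """CPU-bound work that parallelizes even under CPython's GIL."""
    digest = b""
    for _ in range(rounds):
        digest = hashlib.sha256(BUF).digest()
    return digest

def throughput(threads, tasks=16, rounds=10):
    """Tasks completed per second with a pool of `threads` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(hash_task, [rounds] * tasks))
    return tasks / (time.perf_counter() - start)

for n in (6, 10, 12, 30, 48, 96):  # the thread counts used in the test
    print(n, round(throughput(n), 1))
```

In a real test you would replace hash_task with your own workload and measure latency percentiles as well as throughput.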


As the results show, throughput drops significantly when the number of threads is greater than the number of processors (48) or less than the number of physical cores (24). So for the computationally intensive task under test, the thread count should be set between 24 and 48.

More specifically, as the thread count went from 24 (the number of physical cores) to 48 (the number of processors), throughput and CPU load showed no significant change, but p99 latency rose slowly (by about 10%) and average latency dropped slightly (by about 4%).

Looking at the latency statistics in more detail, you can also see that the spikes in the latency curve shrink as the thread count decreases.

So why doesn’t hyper-threading here increase parallelism and thus throughput, as the theory suggests?

My guess is that because the use of the computing components (ALU/FPU) is not evenly distributed in my program (and in most programs), hyper-threading cannot provide much extra parallelism, and the slight improvement it does bring is offset by the cost of thread switching.

In summary, for computationally intensive tasks it is generally recommended to set the number of threads to the number of physical cores. In practice, you should run stress tests against your own program to find the right setting.




If you reprint this article, please credit the source: blog.guoyb.com/2018/08/18/…
