

Recently, while moving one of our services to a new server model, I ran into questions about thread counts, multicore CPUs, and hyperthreading.

I previously discussed the relationship between physical cores and hyperthreading in Threads vs. Multicore cpus, and suggested that for compute-intensive tasks the number of threads should be set to the number of physical cores.

Even so, this server replacement still ran into problems.

Multicore and clock frequency

Originally, our service ran on a server with a single CPU that had 4 physical cores and 8 logical cores (hyperthreading, HTT), clocked at 3.4 GHz. The new model is a dual-CPU server: each CPU has 20 physical cores and 40 logical cores (HTT), for 80 logical cores in total, but the clock frequency is only 2.5 GHz.
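To make the physical versus logical core distinction concrete, here is a minimal sketch that counts both from text in the format of Linux's /proc/cpuinfo. The function name count_cores and the SAMPLE text are made up for illustration; the sample describes a hypothetical 2-core, 4-thread machine, not either of the servers above.

```python
def count_cores(cpuinfo_text):
    """Return (physical_cores, logical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()          # unique (physical id, core id) pairs
    package_id = core_id = None
    # The extra "" flushes the last block if the text has no trailing blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical processor's block.
            if package_id is not None and core_id is not None:
                physical.add((package_id, core_id))
            package_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            package_id = value
        elif key == "core id":
            core_id = value
    return len(physical), logical


# Hypothetical sample: one package, two cores, two hyperthreads per core.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_cores(SAMPLE))  # (2, 4)
```

On a real Linux machine the same function could be fed the contents of /proc/cpuinfo, assuming the "physical id" and "core id" fields are present, which is typical on x86 but not guaranteed on every platform.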

Since this is a latency-sensitive service, my heart sank when I saw the clock frequency. However, there are plenty of claims online (mainly in Intel's official press releases) that, thanks to a smaller process node and an updated architecture, a newer CPU with a lower clock frequency can match or even exceed the performance of an older CPU with a higher one.

Why do CPUs with many cores have lower clock frequencies? Mainly because of power consumption. With 40 physical cores packed densely onto one board, a high clock frequency means high power consumption, and heat dissipation becomes a serious problem. As a result, CPUs with dozens of cores currently on the market mostly run at a little over 2 GHz.

The second concern was that taking full advantage of a multicore CPU requires far more threads in the program. Would that cause lock contention and degrade performance? Fortunately, the approach described in Lock-free With Double Buffers had already removed most of the locking, and the service is now essentially a lock-free, compute-intensive application.

Hyperthreading

So we deployed, tested, load-tested, and went live.

As request volume increased, latency on the new machine more than doubled.

I consulted colleagues on the operations team who regularly configure and use this new model, and they suggested trying to turn off hyperthreading. Sure enough, with hyperthreading off, latency was less than 10% higher than on the original 8-core machine and did not change significantly as load increased, which met the latency requirements of the upstream service.

So why does turning off hyperthreading work?

First, we need to understand what hyperthreading is.

Hyperthreading is a processor technique for increasing parallelism in CPU computation: it presents one physical core as two logical cores. The two logical cores have their own interrupts and state, but share the computing resources of the physical core. The aim is to increase utilization of the CPU's computing resources and thereby improve parallelism.

Hyperthreading is based on the observation that most programs leave CPU resources underutilized. For example, on a cache miss, a branch misprediction, or while waiting for data, the computing resources in the CPU sit idle. Hyperthreading lets the hardware hand those idle resources to instructions from another thread, which raises the overall utilization of the CPU.
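The idea of filling idle slots can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how real hardware schedules instructions: each thread is a string of cycles, "C" for a cycle that needs the core's single execution unit and "S" for a stall cycle (for example, waiting on memory). The function name run_on_one_core and the fixed priority of thread 0 are assumptions made for illustration.

```python
def run_on_one_core(threads):
    """Simulate hardware threads sharing one execution unit.

    threads: list of strings made of 'C' (compute) and 'S' (stall).
    Returns (total_cycles, finish_cycle_per_thread).
    """
    pos = [0] * len(threads)      # next step of each thread
    finish = [0] * len(threads)   # cycle in which each thread completed
    cycle = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        cycle += 1
        unit_busy = False
        for i, t in enumerate(threads):
            if pos[i] >= len(t):
                continue                  # already finished
            if t[pos[i]] == "S":
                pos[i] += 1               # stalls overlap freely
            elif not unit_busy:
                unit_busy = True          # this thread gets the unit
                pos[i] += 1
            # otherwise the thread wants to compute but must wait
            if pos[i] == len(t):
                finish[i] = cycle
    return cycle, finish


stall_heavy = "CSSCSS"
print(run_on_one_core([stall_heavy]))               # (6, [6])
print(run_on_one_core([stall_heavy, stall_heavy]))  # (7, [6, 7])

compute_bound = "CCCCCC"
print(run_on_one_core([compute_bound, compute_bound]))  # (12, [6, 12])
```

In this model, two stall-heavy tasks finish in 7 cycles when they share the core, versus 12 if run one after the other, yet the second task takes 7 cycles instead of the 6 it would need alone. Two compute-bound tasks gain nothing from sharing, and the second one finishes twice as late.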

So why does turning off hyperthreading improve our program's latency?

There may be several reasons for this:

  • Hyperthreading is nowhere near as effective as adding a separate physical core, because it adds no computing resources; it only lets two tasks share the existing resources of one physical core. If the processor has plenty of underutilized computing resources, hyperthreading can give a big boost, because another thread can slot in whenever some resources are idle, and more work gets done in the same number of clock cycles. This is mainly a gain in throughput rather than in the latency of a single task. If an application has few dependencies between consecutive computations, so that it compiles into instructions that can execute independently, higher CPU-level throughput can also reduce application-level latency. However, few workloads meet this requirement; most applications have dependencies between steps.

  • How well hyperthreading is used depends on the operating system. If the operating system is not hyperthreading-aware (e.g., Linux 2.6, Windows Server 2003), it may place two tasks that could have run on two separate physical cores onto the two hyperthreads of a single physical core, which costs performance.

  • Some studies show that hyperthreading relies on the operating system's CPU scheduling. If an application enables CPU binding, it can interfere with that scheduling and cause a performance penalty. Unfortunately, the internal logic of our application requires CPU binding. This is probably a major factor.

  • Implementing hyperthreading requires additional logical processing circuitry inside the physical core. Even when the second logical core has no instructions to execute, this extra logic still occupies some of the physical core's resources and affects performance.
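The placement problem described in the list above, where two tasks land on the two hyperthreads of one physical core, can be sketched as follows. This is an illustration under stated assumptions, not what our service does: the function names and the TOPOLOGY mapping are hypothetical, and the only real API used is os.sched_setaffinity, which is available on Linux only.

```python
import os


def one_cpu_per_core(topology):
    """Pick one logical CPU per physical core.

    topology: dict mapping logical CPU number -> (package_id, core_id).
    Returns the lowest-numbered logical CPU of each physical core, sorted.
    """
    chosen = {}
    for cpu in sorted(topology):
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(topology[cpu], cpu)
    return sorted(chosen.values())


def pin_current_process(cpus):
    """Restrict the current process to the given logical CPUs (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpus))


# Hypothetical topology: 1 package, 4 cores, 8 logical CPUs,
# where logical CPUs n and n + 4 are siblings on the same core.
TOPOLOGY = {cpu: (0, cpu % 4) for cpu in range(8)}

print(one_cpu_per_core(TOPOLOGY))  # [0, 1, 2, 3]
```

A binding scheme built this way never places two busy threads on sibling hyperthreads. It does not remove the second logical core, so it is not equivalent to disabling hyperthreading in the BIOS or the operating system.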

In addition, running with hyperthreading can significantly increase power consumption. On servers, that means more heat, which can cause cooling problems and in turn lower clock frequencies; on today's smartphones, it can seriously hurt battery life.

Conclusion

Hyperthreading is now almost standard on server CPUs, but how much of a performance boost it gives depends on your application and operating system.

If you find that server performance falls short of expectations, try turning off hyperthreading.
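Before and after such an experiment it helps to confirm whether hyperthreading is actually on. The sketch below assumes a reasonably recent Linux kernel that exposes SMT state under /sys/devices/system/cpu/smt; on older kernels or other operating systems the files do not exist and the function returns None. The function name smt_status is made up for illustration.

```python
from pathlib import Path


def smt_status(sysfs_dir="/sys/devices/system/cpu/smt"):
    """Return {'control': str, 'active': bool}, or None if not exposed."""
    base = Path(sysfs_dir)
    try:
        # 'control' holds values such as "on", "off", or "notsupported".
        control = (base / "control").read_text().strip()
        # 'active' is "1" when sibling threads are currently online.
        active = (base / "active").read_text().strip() == "1"
    except OSError:
        return None
    return {"control": control, "active": active}


print(smt_status())
```

On kernels that support it, writing "off" to the control file as root disables SMT at runtime; otherwise the setting usually lives in the BIOS.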




Illustration: Russell_Yan

Authorization: CC0 protocol