Authors: Chang Huaixin, Ding Tianchen

CPU throttling hurts container performance, and sometimes one has to sacrifice container deployment density to avoid it. We designed the CPU Burst technology to guarantee the quality of service of containers without reducing deployment density. CPU Burst has been merged into Linux 5.14, and Anolis OS 8.2, Alibaba Cloud Linux 2, and Alibaba Cloud Linux 3 also support it.

In Kubernetes container scheduling, the CPU limit of a container is specified by the CPU limits parameter. Capping CPU resources limits how much CPU time an individual container can consume and ensures that other containers receive sufficient CPU. In the Linux kernel, CPU limits are implemented by the CPU Bandwidth Controller, which restricts a cgroup's resource consumption through CPU bandwidth throttling. When the processes in a container use more CPU than the limit allows, they are throttled: their CPU time is capped, and some of the container's key latency metrics degrade.

Faced with this situation, what should we do? Typically, we set a container's CPU limits to its daily peak CPU utilization multiplied by a relatively safe factor. This avoids quality-of-service degradation caused by throttling while keeping CPU utilization reasonable. As a simple example, suppose a container's daily peak CPU usage is around 250%. We set the container's CPU limits to 400% to preserve its quality of service, and the container's CPU utilization is then 62.5% (250% / 400%).

But is life really that good? Apparently not: CPU throttling occurs more often than expected. What can we do? It seems the only way out is to raise CPU limits even further. In most cases, a container's quality of service is only guaranteed once its CPU limits are raised 5 to 10 times, which drives the container's overall CPU utilization down to 10 to 20 percent. So to cope with potential spikes in container CPU usage, deployment density has to be reduced significantly.

Historically, people have fixed bugs in the CPU Bandwidth Controller that caused spurious throttling. We found that the unexpected throttling seen today is caused by bursts of CPU usage at the 100ms level. The CPU Burst technology allows such short bursts to proceed without throttling as long as the average CPU utilization stays below the limit. In cloud computing scenarios, the value of CPU Burst is:

  1. Improve the quality of CPU resource service without increasing the CPU resource configuration;
  2. Allow resource owners to reduce the CPU resource configuration and improve CPU utilization without sacrificing service quality;
  3. Reduce the total cost of ownership (TCO) of resources.

The CPU utilization you see is not the whole story

CPU usage observed at the one-second level does not reflect CPU usage at the 100ms level at which the Bandwidth Controller operates, and this is what causes unexpected CPU throttling.

The Bandwidth Controller applies to CFS tasks and uses period and quota to manage the CPU usage of a cgroup. If a cgroup's period is 100ms and its quota is 50ms, the cgroup's processes can use at most 50ms of CPU time in every 100ms period. When their usage exceeds 50ms within a 100ms period, they are throttled, capping the cgroup's CPU usage at 50%.
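As a concrete illustration, the following minimal sketch (ours, not from the article) applies this 100ms/50ms example through the cgroup v1 CPU controller interface; the cgroup path is hypothetical, the cgroup must already exist, and the script needs root privileges.

```python
# Minimal sketch: limit a cgroup to 50ms of CPU time per 100ms period (a 50% cap).
# Assumes a cgroup v1 hierarchy and that the cgroup "demo" already exists; run as root.
CG = "/sys/fs/cgroup/cpu/demo"   # hypothetical cgroup path

with open(f"{CG}/cpu.cfs_period_us", "w") as f:
    f.write("100000")            # period = 100ms
with open(f"{CG}/cpu.cfs_quota_us", "w") as f:
    f.write("50000")             # quota = 50ms -> at most 50% of one CPU per period
```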

CPU usage is the average usage over some time window. The coarser the granularity at which it is computed, the more stable it looks; as the observation granularity becomes finer, the bursty nature of CPU usage becomes more pronounced. We observed the same container load simultaneously at 1s and at 100ms granularity. At 1s granularity, the average CPU utilization was about 250%; at the 100ms granularity the Bandwidth Controller works at, the observed CPU utilization peaks exceeded 400%.
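To see this difference yourself, a small sketch like the one below (our example; the cgroup path is hypothetical) can sample a cgroup's cumulative CPU time every 100ms and print per-window utilization, which makes the fine-grained bursts visible.

```python
import time

# Sample a cgroup's cumulative CPU time (cgroup v1 cpuacct controller, in nanoseconds)
# every 100ms and print the utilization of each 100ms window.
USAGE = "/sys/fs/cgroup/cpuacct/demo/cpuacct.usage"   # hypothetical cgroup path

def read_usage_ns():
    with open(USAGE) as f:
        return int(f.read())

window = 0.1                                          # 100ms observation window
prev = read_usage_ns()
while True:
    time.sleep(window)
    cur = read_usage_ns()
    print(f"{(cur - prev) / (window * 1e9) * 100:6.1f}%")  # CPU% in this window
    prev = cur
```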

Based on the 250% CPU usage observed at the second level, quota and period are set to 400ms and 100ms respectively. The container processes' fine-grained bursts are then throttled by the Bandwidth Controller, hurting the container's performance.

How to improve

We meet this need for fine-grained bursts with the CPU Burst technology, which introduces the concept of burst on top of the traditional quota and period of the CPU Bandwidth Controller. When the container's CPU usage falls below its quota, the unused quota accumulates as burst resources; when the container's CPU usage exceeds the quota, it is allowed to spend the accumulated burst resources. The net effect is that the container's average CPU usage is still limited to its quota over a longer time scale, while short bursts above the quota are allowed.
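On kernels that ship the feature (upstream Linux 5.14+, Anolis OS, Alibaba Cloud Linux), the burst budget is configured next to quota and period: cgroup v1 exposes cpu.cfs_burst_us and cgroup v2 exposes cpu.max.burst. The sketch below (ours, with a hypothetical cgroup path) continues the earlier 50ms/100ms example and allows up to 100ms of accumulated burst.

```python
# Minimal sketch: on top of quota = 50ms / period = 100ms, let the cgroup accumulate
# up to 100ms of unused quota and spend it on short bursts.
# cgroup v1 uses cpu.cfs_burst_us; on cgroup v2 the same knob is cpu.max.burst.
CG = "/sys/fs/cgroup/cpu/demo"          # hypothetical cgroup path

with open(f"{CG}/cpu.cfs_burst_us", "w") as f:
    f.write("100000")                   # burst buffer = 100ms (2x quota in this example)
```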

By analogy, if the Bandwidth Controller algorithm were used to manage vacation time, the period would be one year and the annual vacation days would be the quota, with unused days expiring at year end. With CPU Burst, the vacation days left over this year can be carried over and taken later.

After enabling CPU Burst in this container scenario, the quality of service of the test container improved significantly: the mean RT dropped by 68% (from 30+ms to 9.6ms), and the 99th-percentile RT dropped by 94.5% (from 500+ms to 27.37ms).

CPU Bandwidth Controller guarantee

Using the CPU Bandwidth Controller prevents certain processes from consuming too much CPU time and ensures that every process that needs CPU gets enough of it. This stability guarantee exists because configuring the Bandwidth Controller requires the scheduling stability constraint to hold:

$$\sum_{i} quota_i \le period$$

where $quota_i$ is the quota of the i-th cgroup, i.e. the upper bound on that cgroup's CPU demand within one period. The Bandwidth Controller accounts CPU time separately for each period. The scheduling stability constraint ensures that all tasks submitted within a period can be processed within that period; for each CPU cgroup, this means a task submitted at any moment can be completed within one period, i.e. the task real-time constraint:

$$WCET \le period$$

Regardless of task priority, the worst-case execution time (WCET) does not exceed one period.

If instead

$$\sum_{i} quota_i > period$$

keeps occurring, scheduler stability is broken, tasks pile up in every period, and the execution time of newly submitted jobs keeps growing.
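For a concrete sense of the constraint, here is a small worked instance (the numbers are ours, purely for illustration): with period = 100 ms and four cgroups each given quota = 25 ms of one CPU,

$$\sum_{i=1}^{4} quota_i = 4 \times 25\,\mathrm{ms} = 100\,\mathrm{ms} \le period,$$

so any task submitted at the start of a period can finish within it. Raising one cgroup's quota to 50 ms would make the sum 125 ms > period, and unfinished work would start carrying over from period to period.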

The impact of using CPU Burst

We use CPU Burst to allow bursty CPU usage in order to improve quality of service, so what does that do to scheduler stability? The answer is that when multiple cgroups burst their CPU usage at the same time, the scheduling stability constraint and the task real-time guarantee may be broken. What matters then is the probability that the two constraints still hold. If that probability is very high, task real-time behavior is guaranteed in the vast majority of periods and CPU Burst can be used with confidence. If the probability is very low, CPU Burst cannot be used directly to improve service quality; deployment density should first be reduced to increase the CPU resources available.

So the next question is: how do we calculate the probability that the two constraints are broken in a particular scenario?

Assess the impact

This quantitative question can be framed as a classical queuing theory problem and solved with Monte Carlo simulation. The results show that the average CPU utilization and the number of cgroups are the main factors that determine whether CPU Burst can be used in a given scenario. The lower the CPU utilization, or the larger the number of cgroups, the less likely the two constraints are to be broken and the safer it is to use CPU Burst. Conversely, if CPU utilization is high or the number of cgroups is small, the way to eliminate the impact of throttling on process execution is to reduce deployment density and increase the CPU configuration before turning to CPU Burst.

The problem is defined as follows: there are m cgroups, each limited to a quota of 1/m, and each generating per-period computing demand (CPU utilization) according to some specific distribution; these distributions are independent of one another. Assume tasks arrive at the beginning of each period. If the total CPU demand in a period exceeds 100%, the WCET of the tasks in that period exceeds one period, and the excess is carried over and processed together with the newly generated CPU demand in the next period. The input is the number of cgroups m and the distribution each cgroup's CPU demand follows; the output is the probability that a period ends with WCET > period, and the expectation of WCET.

Take CPU demand following a Pareto distribution with m = 10/20/30 as an example. We chose the Pareto distribution because it produces plenty of long-tailed CPU bursts, which are the most likely to cause interference. Each data item in the table has the format

$$E[WCET] \;/\; P(WCET > period)$$

where the WCET expectation (measured in periods) is better the closer it is to 1, and the probability of exceeding one period is better the lower it is.

The results match intuition. On one hand, the higher the CPU demand (CPU utilization), the more likely a burst is to break the stability constraint and the longer the expected task WCET becomes. On the other hand, the more cgroups there are with independent CPU demand distributions, the less likely they are to burst at the same time, the easier it is to maintain scheduler stability, and the closer the WCET expectation stays to one period.

Scenario and parameter settings

We assume there are m cgroups in the system, each getting an equal share of 100% of the total CPU resources, i.e. quota = 1/m. Each cgroup generates computing demand and submits it to the CPU according to the same rule (independent and identically distributed).

Following the queuing theory model, we treat each cgroup as a customer and the CPU as the service desk, with each customer's service time limited by its quota. To simplify the model, we discretize the arrival interval of all customers as a constant, during which the CPU can serve up to 100% of the computing demand; this interval is one period.

We then need to define each customer's service time within a period. We assume the computing demand each customer generates is independently and identically distributed, with a mean of u_avg times its own quota. The amount of service time a customer submits to the service desk in each period depends on its own computing demand and the maximum CPU time it is allowed to use (its quota plus the tokens accumulated in previous periods).

Finally, CPU Burst has an adjustable buffer, which is the maximum amount of tokens allowed to accumulate. It determines the instantaneous burst capability of each cgroup and is expressed here as b times the quota.

We set the parameters defined above (u_avg, m, and the buffer b) to several combinations, and consider two distributions for the per-period computing demand:

The negative exponential distribution is one of the most common distributions in queuing theory models. Its density function is

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

where $\lambda$ is chosen so that the mean demand $1/\lambda$ equals $u_{avg} \cdot quota$.

The Pareto distribution is common in computer scheduling systems; it can model a long latency tail and therefore shows the effect of CPU Burst well. Its density function is:

$$f(x) = \frac{a \, x_{min}^{a}}{x^{a+1}}, \quad x \ge x_{min}$$

To keep the tail of the distribution from becoming too extreme, we choose the shape parameter a so that the tail is suppressed. With this setting, when u_avg = 30%, the largest computing demand generated is around 500%.
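The sketch below is our own minimal Monte Carlo implementation of the model described above, not the simulation tool released with this article; the function name `simulate`, the Pareto shape parameter `a = 2.5`, and the default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate(m=10, u_avg=0.3, b=2.0, periods=100_000, a=2.5, seed=0):
    """Monte Carlo sketch: m cgroups share 100% CPU, each with quota = 1/m and a
    burst buffer of b*quota, generating i.i.d. Pareto-distributed demand with
    mean u_avg*quota per period. Returns (E[WCET] in periods, P(WCET > period))."""
    rng = np.random.default_rng(seed)
    quota = 1.0 / m
    x_min = u_avg * quota * (a - 1) / a       # scale so the Pareto mean is u_avg*quota
    tokens = np.zeros(m)                      # accumulated burst tokens per cgroup
    backlog = 0.0                             # work carried over from earlier periods
    wcet = np.empty(periods)
    for t in range(periods):
        demand = x_min * (1.0 + rng.pareto(a, m))      # per-cgroup demand this period
        allowed = np.minimum(demand, quota + tokens)   # capped by quota + burst tokens
        tokens = np.minimum(tokens + quota - allowed, b * quota)
        submitted = backlog + allowed.sum()            # total work the CPU must serve
        wcet[t] = submitted                            # last task finishes after this many periods
        backlog = max(submitted - 1.0, 0.0)            # CPU serves at most 100% per period
    return wcet.mean(), (wcet > 1.0 + 1e-9).mean()

# Example: 10 cgroups, 30% average utilization, buffer = 200% of quota.
print(simulate(m=10, u_avg=0.3, b=2.0))
```

Calling `simulate` with different m, u_avg, and b values reproduces the qualitative trends discussed below, though the exact numbers depend on the assumed shape parameter.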

What the data show

The results of the Monte Carlo simulation with the above parameter settings are shown below. We inverted the Y-axis of the first chart (WCET expectation) to make it more intuitive. Likewise, the second chart (probability that WCET equals one period) shows the probability, expressed as a percentage, that scheduling timeliness is preserved.

Negative exponential distribution

Pareto distribution

Conclusion

In general, the higher u_avg (the computing demand, i.e. load) and the smaller m (the number of cgroups), the larger the WCET. The former is an obvious conclusion; the latter holds because the more independently and identically distributed tasks there are, the more the aggregate demand tends toward the average, so demand above quota from some tasks is more likely to be covered by CPU time freed up by tasks below quota.

Increasing the buffer lets CPU Burst work better, and the optimization benefit for an individual task is more pronounced. However, it also increases WCET, meaning more interference with neighboring tasks. This too is an intuitive conclusion.

When choosing a buffer size, we recommend deciding based on the specific business scenario: the computing demand (both its distribution and its mean), the number of containers, and the service's own requirements. If you want to raise overall system throughput and optimize container performance, and the average load is not high, you can increase the buffer. Conversely, if you want to preserve scheduling stability and fairness and reduce interference between containers when the overall load is high, the buffer can be reduced accordingly.

In general, with average CPU utilization below 70%, CPU Burst does not have a significant impact on neighboring containers.

The simulation tool and how to use it

After all this data and these conclusions, many readers are probably wondering: will CPU Burst affect my actual business scenario? To answer that, we adapted the tool used for the Monte Carlo simulations so that you can test the effect in your own real-world scenarios.

The tool is available here: codeup.openanolis.cn/codeup/ying…

Detailed instructions are included in the README; here, let's walk through an example.

Little A wants to deploy 10 containers running the same service on his server. To obtain accurate measurements, he first starts one instance of the service as a normal container bound to the cgroup cg1, without any throttling, to capture the service's real behavior.

He then runs sample.py to collect data (for this demonstration only 1000 samples are collected; in practice, the more samples the better, if conditions allow):

The data is stored in ./data/cg1_data.npy. The output shows that the service's average CPU utilization is about 6.5%, so deploying 10 such containers would bring the machine's average utilization to about 65%. (The variance is also printed for reference; generally, the larger the variance, the more the service stands to gain from CPU Burst.)
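For reference, this kind of summary can be reproduced with a few lines like the following (our sketch, not part of the tool); we assume ./data/cg1_data.npy holds the per-sample CPU utilization values, as a fraction of one CPU, collected by sample.py.

```python
import numpy as np

# Assumption: the file holds an array of per-sample CPU utilization values
# (fraction of one CPU) collected by sample.py for the cg1 cgroup.
samples = np.load("./data/cg1_data.npy")

print(f"mean utilization : {samples.mean() * 100:.1f}%")   # ~6.5% for cg1 in this example
print(f"variance         : {samples.var():.4f}")           # larger variance -> more to gain from burst
print(f"10 such containers ~ {samples.mean() * 10 * 100:.0f}% total CPU")
```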

Next, he uses simu_from_data.py to calculate the impact of setting the buffer to 200% when 10 cgroups with the same behavior as cg1 are deployed:

According to the simulation results, enabling CPU Burst has almost no negative impact on the containers in this business scenario, and Little A can use it with confidence.

To learn more about the tool, or to vary the distributions and study the simulation results out of theoretical interest, visit the repository link above.

About the authors

Chang Huaixin (Yizhai) is an engineer on the Alibaba Cloud kernel team, specializing in CPU scheduling.

Ding Tianchen (Yingyu) joined the Alibaba Cloud kernel team in 2021 and is currently working on scheduling.
