Brief introduction: This article focuses on the container priority and quality-of-service (QoS) model at the K8s layer of colocation, in the hope of offering some ideas to the industry.

Author: Nan Yi

Introduction

Since 2014, Alibaba's online-offline colocation technology has passed seven years of Double 11 tests and has been rolled out at scale internally, saving Alibaba Group billions in resource costs every year. Overall resource utilization has reached about 70%, leading the industry. Over the past two years, we have begun to productize the colocation technology used inside the group and deliver it to the industry, installing it seamlessly on standard native K8s clusters as plug-ins. With colocation scheduling and operations capabilities, we improve cluster resource utilization and the overall user experience of the product.

Because colocation is a complex technology and operations system spanning K8s scheduling, OS-level isolation, observability, and other technologies, this article focuses on the container priority and quality-of-service model at the K8s layer, in the hope of offering some ideas to the industry.

K8s native model

In actual production practice, even many technicians familiar with cloud native and K8s often confuse scheduling priority with QoS.

Therefore, before discussing the colocation model, we first give a detailed introduction to the native K8s concepts, as shown in the following table:

For a detailed description at the API level, see the table below.
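Beyond the table, the distinction can also be seen directly in native K8s objects. Below is a minimal sketch (the names, image, and resource figures are illustrative, not from any product): scheduling priority is an explicit, cluster-scoped PriorityClass that a Pod references by name, whereas the QoS class is never declared at all but is derived by K8s from the relationship between requests and limits.

# Scheduling priority: an explicit, cluster-scoped object.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority        # illustrative name
value: 1000000               # higher value = scheduled first, may preempt lower values
globalDefault: false
description: "For latency-sensitive online services."
---
# QoS: derived automatically from requests/limits, never declared.
apiVersion: v1
kind: Pod
metadata:
  name: online-demo          # illustrative
spec:
  priorityClassName: high-priority   # ties the Pod to the scheduling priority above
  containers:
  - name: app
    image: nginx             # illustrative
    resources:
      requests:              # requests == limits for every resource
        cpu: "2"             #   => K8s derives QoS class "Guaranteed"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi

The two knobs are orthogonal: this Pod could keep its Guaranteed QoS while referencing a low-value PriorityClass, and vice versa.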

Problems to be solved by colocation

The main problem colocation solves is making full use of idle resources in the cluster to improve overall cluster utilization, on the premise that the service level objectives (SLOs) of the deployed applications are guaranteed.

When a cluster runs only online services, containers are sized to peak resource specifications because online applications demand strong guarantees, which can leave actual utilization low.

We want to oversell these allocated-but-unused resources to offline jobs with lower SLOs, raising the overall machine water level. This requires SLO-based scheduling capability that also takes the machine's real resource water level into account to avoid hotspots.

In addition, online SLOs are usually high and offline SLOs low. When the overall machine water level rises too far, the online application's SLO can be guaranteed by preempting offline jobs, and kernel-level cgroup isolation features are needed to safeguard jobs with high and low SLOs.

Between these online and offline pods, we therefore need different scheduling priorities and quality-of-service levels to meet real online and offline operating requirements; a native sketch of priority-based preemption follows.
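As a rough native analogy only (names and values are illustrative; colocation products layer SLO-aware and utilization-aware logic on top of this), K8s PriorityClass-based preemption already lets a higher-priority online pod evict a lower-priority offline pod when a node is full:

# A low scheduling priority for offline jobs.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: offline-low          # illustrative
value: 1000                  # far below the online priority value
preemptionPolicy: Never      # offline jobs themselves never preempt others
description: "Best-effort offline batch jobs."
---
apiVersion: v1
kind: Pod
metadata:
  name: offline-job          # illustrative
spec:
  priorityClassName: offline-low
  containers:
  - name: batch
    image: busybox           # illustrative
    command: ["sh", "-c", "sleep 3600"]

If a pod referencing a higher-value PriorityClass cannot be scheduled for lack of room, the scheduler may evict this pod to make space, which matches the "online preempts offline" behavior described above.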

Application-tier model defined by cloud-native colocation

First, let's look at the YAML definition of a colocated Pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    alibabacloud.com/qosClass: BE          # one of {LSR, LS, BE}
  labels:
    alibabacloud.com/qos: BE               # one of {LSR, LS, BE}
spec:
  containers:
  - resources:
      limits:
        alibabacloud.com/reclaimed-cpu: 1000      # unit: millicores; 1000 means 1 core
        alibabacloud.com/reclaimed-memory: 2048   # unit: bytes; the common memory units Gi, Mi, Ki, GB, MB, KB are also supported
      requests:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048

This is the Pod grading we introduced for colocation. The difference from community-native K8s is that we explicitly declare three grades in the annotation and label: LSR, LS, and BE. These three grades are associated with both scheduling priority and QoS.

For the specific resource usage of each container, LSR and LS still use the native cpu/memory configuration. BE tasks are special: their resources are declared as community-standard extended resources.
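For background on the extended-resource mechanism: such resources only become schedulable after a node advertises them in its status, and in a colocation system a node agent typically recomputes and reports the reclaimable amount dynamically. Purely as an illustration of what the node ends up reporting (all figures are hypothetical; in practice this is written by an agent via a PATCH to the node's status subresource, not by hand), the node status would contain something like:

# Illustrative fragment of a Node object's status:
status:
  capacity:
    cpu: "32"
    memory: 128Gi
    alibabacloud.com/reclaimed-cpu: "12000"           # hypothetical: 12 reclaimable cores, in millicores
    alibabacloud.com/reclaimed-memory: "34359738368"  # hypothetical: 32Gi reclaimable, in bytes
  allocatable:
    cpu: "31"
    memory: 126Gi
    alibabacloud.com/reclaimed-cpu: "12000"
    alibabacloud.com/reclaimed-memory: "34359738368"

The scheduler then places BE pods against these reclaimed figures rather than against the node's nominal cpu/memory, which is what keeps the oversold offline capacity from being double-counted against online requests.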

So what are the run-time implications of these three grades? Take a look at how these three kinds of applications behave on the CPU at run time, with a rough native analogy sketched below.
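As a rough native analogy only, not the product's actual mechanism: LSR resembles a Guaranteed pod with integer CPU requests running under the kubelet's static CPU manager policy, which pins it to exclusive cores, while LS- and BE-like pods share the remaining cores. A minimal sketch (name, image, and figures are illustrative; assumes the kubelet runs with --cpu-manager-policy=static):

apiVersion: v1
kind: Pod
metadata:
  name: lsr-analogy            # illustrative
spec:
  containers:
  - name: app
    image: nginx               # illustrative
    resources:
      requests:
        cpu: "4"               # integer CPU count + requests == limits
        memory: 8Gi            #   => Guaranteed QoS => exclusive pinned cores
      limits:                  #      under the static CPU manager policy
        cpu: "4"
        memory: 8Gi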

And the detailed implications for other resource dimensions:

As you can see, these grades affect not only the CPU and memory of a Pod running on a single machine but also the full-link priority of network QoS, so that low-priority offline tasks cannot grab all of the network bandwidth.

Alibaba's kernel-level work effectively guarantees application stability at run time. During Double 11 in 2021, Alibaba became the world's first major technology company to run all of its core business on its own public cloud, which means Alibaba Cloud is able to cope with difficult technical challenges in complex environments. It has also brought significant technical dividends: Alibaba's business R&D efficiency increased by 20%, CPU resource utilization increased by 30%, applications became 100% cloud native, and online business containers reached the million scale. Meanwhile, computing efficiency improved greatly, and the overall computing cost of Singles' Day has decreased by 30% over three years. Colocation technology plays an important role in this process. The kernel team and cloud-native team engineers have stepped through countless pitfalls to develop advanced features including elastic CPU bandwidth, Group Identity, SMT Expeller, memcg asynchronous reclaim, memory watermark tiering, memcg OOM, and so on, leading the industry. This work will be covered in a series of follow-up articles.

What happens when tasks of these three priority grades are actually scheduled and run is shown in the table below.

In other words, colocation priority is applied both at scheduling time and at run time, so that cluster resources are used to the maximum for the high- and medium-priority tasks that carry high SLOs.

Quotas, watermarks, and multi-tenant isolation

This article focuses only on the scheduling priority and QoS of a single K8s Pod. In practice, guaranteeing application SLOs also requires node-level watermarks, tenant quotas, OS isolation capabilities, and so on, which will be discussed in detail in subsequent articles; a minimal quota sketch follows.
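As one small example of the tenant-quota piece (the namespace name and all figures are hypothetical), the native ResourceQuota API can already cap how much of the reclaimed extended resources a tenant's namespace may request:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: offline-quota          # hypothetical
  namespace: tenant-a          # hypothetical tenant namespace
spec:
  hard:
    # Extended resources are quota-ed via the "requests." prefix.
    requests.alibabacloud.com/reclaimed-cpu: "64000"            # hypothetical: 64 cores, in millicores
    requests.alibabacloud.com/reclaimed-memory: "137438953472"  # hypothetical: 128Gi, in bytes

BE pods in tenant-a whose cumulative reclaimed-resource requests exceed these limits would be rejected at admission, one building block of the multi-tenant isolation described above.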

Related solutions

Entering 2021, colocation has become a very mature technology inside Alibaba, saving the company billions in costs every year, and is a fundamental capability of Alibaba's data centers. Over the past two years, Alibaba Cloud has distilled these mature technologies into colocation products and has begun serving customers across industries.

In the Alibaba Cloud product family, we deliver the colocation capability through ACK Agile Edition and the CNStack (CloudNative Stack) product family, combined with the Anolis operating system (OpenAnolis), forming a complete integrated colocation solution for the cloud-native data center that we deliver to our customers.


This article is original content from Alibaba Cloud and may not be reproduced without permission.