Author | Alibaba senior technical expert

Serverless Kubernetes is the Alibaba Cloud Container Service team's exploration of the future evolution of Kubernetes. By subtracting from Kubernetes, it reduces the burden of operations and maintenance, simplifies cluster management, and turns Kubernetes from complex to simple.

Background

As a general-purpose container orchestration system, Kubernetes supports a wide range of applications and scenarios, including CI/CD, data computing, online applications, and AI. However, because of its versatility and complexity, managing a Kubernetes cluster remains challenging for many users, mainly in three respects:

  • High learning cost;
  • High cluster O&M cost, including node management, capacity planning, and node fault locating;
  • Suboptimal computing cost in many scenarios. For example, in a cluster that only runs scheduled Jobs, holding a resource pool long-term is wasteful and leads to low resource utilization.

Subtracting from the Kubernetes cluster

No node management

We believe that in the future users will focus more on application development than on infrastructure maintenance. In a Kubernetes cluster, we want users to concentrate on application orchestration semantics such as pod/service/ingress/job, and less on the underlying nodes.

Removing node management significantly reduces cluster O&M costs. Statistics show that most common Kubernetes problems are node-related, such as the Node NotReady problem. Users also no longer need to worry about node security or about upgrading and maintaining base system software.

In an ASK cluster, we use a Virtual-Kubelet virtual node instead of ECS nodes. The capacity of the virtual node can be considered "infinite", so users do not need to worry about cluster capacity or do capacity planning in advance.
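To make the virtual-node idea concrete, here is a minimal sketch of a pod manifest that opts in to running on a virtual-kubelet node. The node selector label and toleration key below are taken from the open-source Virtual Kubelet project's conventions and are assumptions here, not necessarily what ASK itself requires.

```python
import json

# Minimal sketch (assumed Virtual Kubelet conventions, not ASK-specific):
# the toleration lets the scheduler place the pod on a virtual node,
# which normally taints itself to repel ordinary workloads.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [{"name": "demo", "image": "nginx:alpine"}],
        # Hypothetical label used by virtual-kubelet nodes.
        "nodeSelector": {"type": "virtual-kubelet"},
        "tolerations": [{
            "key": "virtual-kubelet.io/provider",
            "operator": "Exists",
            "effect": "NoSchedule",
        }],
    },
}
print(json.dumps(pod, indent=2))
```

Since ASK clusters contain only the virtual node, a plain `kubectl apply` of an ordinary manifest is typically enough; the explicit selector/toleration matters mainly in mixed clusters.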

No master management

As with the managed version of ACK, ASK's master components (API Server, CCM, KCM, etc.) are hosted by the Container Service platform, so users do not need to manage the upgrade, operation, or maintenance of these core components, and they incur no cost.

Minimal K8s base operating environment

Beyond removing node and master management, Kubernetes cluster management is further simplified: many addons are hosted by default, so users no longer need to manage or pay for basic addons. Relying on Alibaba Cloud's native network and storage capabilities, together with a unique hosted architecture design, we provide an extremely simplified yet fully functional Kubernetes base runtime environment.

| Function | ACK | ASK |
| --- | --- | --- |
| Storage | Requires deploying aliyun-disk-controller/FlexVolume | No deployment required (in support) |
| CNI network | Requires deploying the Terway/Flannel daemonset | No deployment required; pods communicate over the VPC network |
| CoreDNS service discovery | Requires deploying two replicas of CoreDNS | No deployment required; access is based on PrivateZone |
| kube-proxy | Requires deploying the kube-proxy daemonset | No deployment required; access is based on PrivateZone |
| Ingress | Requires deploying nginx-ingress-controller | No deployment required; layer-7 forwarding is based on SLB |
| ACR password-free image pull | Requires deploying acr-credential-helper | No deployment required; supported by default |
| SLS log collection | Requires deploying the Logtail daemonset | No deployment required; supported by default |
| Metrics collection | Requires deploying metrics-server | No deployment required; out of the box |
| EIP mounting | Requires deploying Terway | No deployment required; specified via annotation |
| Mounting cloud disks to pods | Relies on aliyun-disk-controller | No deployment required; supported by default |
| Elastic scaling | Requires deploying cluster-autoscaler | No deployment required |
| GPU plugin | Requires deploying nvidia-docker | No deployment required; out of the box |

To sum up, an ACK cluster requires at least two ECS machines just to run these basic addons, while an ASK cluster makes them invisible, allowing users to create an out-of-the-box Kubernetes cluster at zero cost.

Simplified elastic scaling

Since there is no node management or capacity planning, scaling out the cluster no longer involves node-level expansion; users only need to care about pod-level scaling, which greatly improves scaling speed and efficiency. Some customers already use ASK/ECI specifically to absorb business traffic peaks quickly.

At present, ASK/ECI can fully start 500 pods (to the Running state) within 30 seconds, and a single pod can start within 10 seconds.
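As quick arithmetic on the figures quoted above, the aggregate startup throughput works out to roughly 17 pods per second:

```python
# Back-of-the-envelope throughput from the quoted figures:
# 500 pods reach Running within 30 seconds.
pods = 500
total_seconds = 30
throughput = pods / total_seconds  # pods reaching Running per second
print(round(throughput, 1))  # 16.7
```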

Lower cost

Beyond the low cost of creating an ASK cluster itself, on-demand pod usage yields optimal resource utilization in many scenarios. In many Job or data-computing scenarios, users do not need to maintain a fixed resource pool long-term, and ASK/ECI supports these demands well.

Experience shows that when pods run less than 16 hours a day, the ASK/ECI approach is more economical than keeping an ECS resource pool.
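The 16-hour figure can be sanity-checked with simple break-even arithmetic. The prices below are hypothetical placeholders (a per-hour ECI premium of 1.5x over a reserved ECS node of equal size), not real Alibaba Cloud list prices; with a different premium, the break-even point shifts accordingly.

```python
# Hypothetical unit prices, NOT real list prices.
ecs_price_per_hour = 1.0   # reserved ECS node, billed 24 h/day
eci_price_per_hour = 1.5   # on-demand ECI pod of equal size

# A reserved node costs the same regardless of how long workloads run.
daily_ecs_cost = 24 * ecs_price_per_hour

def eci_daily_cost(hours_running: float) -> float:
    """Pay-per-use: cost accrues only while the pod runs."""
    return hours_running * eci_price_per_hour

# Break-even: hours/day at which both options cost the same.
break_even_hours = daily_ecs_cost / eci_price_per_hour
print(break_even_hours)  # 16.0 with these assumed prices

assert eci_daily_cost(10) < daily_ecs_cost  # short-lived jobs: ECI cheaper
assert eci_daily_cost(20) > daily_ecs_cost  # near-continuous: ECS pool cheaper
```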

ECI: an elastic compute service for rapid delivery of container resources

Any discussion of ASK must mention ECI, the resource base of ASK. ECI is a stable, efficient, and highly elastic container instance service built by Alibaba Cloud on the ECS IaaS resource pool. ECI makes containers first-class citizens of the public cloud: users can deploy container applications directly without purchasing or managing ECS. This simplified container instance product and ASK form a perfect combination.

Users can create container instances directly through the ECI OpenAPI, but in container scenarios they generally need an orchestration system to provide container scheduling, high-availability orchestration, and similar capabilities; ASK is exactly that Kubernetes orchestration layer.

For ASK, ECI means the container service does not need to build a backend compute resource pool, let alone worry about its capacity. Being based on ECI means being based on Alibaba Cloud's entire IaaS-scale resource pool, with natural inventory and elasticity advantages: for example, the ECS specification underlying an ECI instance can be specified through annotations, and most ECS specifications can be used in ASK, meeting the needs of many computing scenarios. In addition, because ECI and ECS share resource pools, we can maximize economies of scale and offer lower-cost computing services to users.
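As a sketch of the annotation mechanism mentioned above, the snippet below pins a pod to a specific ECS instance type. The annotation key `k8s.aliyun.com/eci-use-specs` and the instance type `ecs.c6.xlarge` are assumptions for illustration; check the current ASK/ECI documentation for the exact supported keys.

```python
import json

# Sketch: requesting a specific underlying ECS specification for an
# ECI-backed pod via a metadata annotation. Key and value are assumed
# for illustration, not verified against current ECI docs.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "spark-exec",
        "annotations": {"k8s.aliyun.com/eci-use-specs": "ecs.c6.xlarge"},
    },
    "spec": {"containers": [{"name": "main", "image": "spark:latest"}]},
}
print(json.dumps(pod, indent=2))
```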

Container ecological support

ASK provides complete support for the Kubernetes container ecosystem, and a large number of customers are currently using ASK to support a variety of scenarios:

  • CI/CD: GitLab-Runner, Jenkins/Jenkins-X
  • Data computing: Spark/Spark-operator, Flink, Presto, Argo
  • AI: TensorFlow/Arena
  • Service Mesh: Istio, Knative
  • Testing: Locust, Selenium

The ASK cluster does not support Helm v2. ACK/ASK will release Helm v3 support in the near future, after which users can easily deploy charts in ASK clusters.

More ASK reference documentation
