Author: Chu Xingjuan

Interviewee: Wang Siyu (alias: Wine Wishes)

In December 2021, the CNCF open source project OpenKruise officially announced the release of its v1.0 major version.

OpenKruise is a Kubernetes-based extension suite that focuses on automating cloud native applications, covering deployment, release, operation and maintenance (O&M), and availability protection. As of v1.0, OpenKruise mainly provides enhanced application workloads, sidecar management, enhanced O&M capabilities, elastic policies for partitioned deployment, and application availability protection, all of which help cloud native applications land in production.

At present, OpenKruise has 35+ officially registered adopters. Alibaba, Ant Group, Meituan, Ctrip, NetEase, Xiaomi, OPPO and Suning all use OpenKruise in their production environments. Overseas companies such as Lyft in North America, Bringg in Israel and Shopee in Southeast Asia also use it.

Built for more complex scenarios

OpenKruise originates from years of best practices in large-scale application deployment, release and management across the Alibaba economy. Alibaba runs Internet applications at very large scale, and the vast majority of its many business lines and application instances run as containers in the clusters maintained by Alibaba Cloud’s cloud native platform.

In 2011, Alibaba began to develop container technology based on LXC, and then gradually containerized all of the group’s business deployments. In recent years, with the development of cloud technology and the rise of cloud native, Alibaba has migrated its earlier T4 containers to a new architecture, ASI (Alibaba Serverless Infrastructure). ASI builds on native Kubernetes and uses standard extension mechanisms to provide enhanced functions and the adaptations needed inside Alibaba Group, supporting a variety of complex scenarios and requirements.

As more and more diversified businesses migrated to the ASI cloud native clusters, Alibaba began to consider opening these components to Kubernetes users around the world, hence the open source project OpenKruise. In June 2019, the first preview version of OpenKruise was released, and the project was announced as open source at the KubeCon cloud native technology summit.

In the view of the Alibaba Cloud technology team, open source is more than just copying code and opening it up. “We’ve seen open source projects that just selectively push some of their internal code to GitHub every few months or even less often. This is not a healthy, sustainable way to do open source, and it does not build community cohesion,” said Wang Siyu, a technical expert at Alibaba Cloud.

Therefore, in the more than two months between the initial concept and the release of the first open source version, the Alibaba Cloud technology team focused on the following two things:

  • Design the collaboration process between open source and internal development. After much consideration, the team decided to host OpenKruise’s base repository entirely in the community, maintain only a fork internally, and continuously sync code from the upstream GitHub repository. As a result, all OpenKruise features are developed, submitted and reviewed on GitHub, the whole process is open to the community, and anyone can participate. Alibaba’s internal fork retains only a small number of adaptation interfaces, and more than 95% of the code is identical between the internal and external repositories.

  • Plan a reasonable path for open sourcing features. The extension features in ASI are very rich, but not all of them suit every native Kubernetes cluster, and many are not yet polished enough and could be better designed and implemented. So Alibaba chose to start with features that were mature, easy to use, universal and backward compatible, and to open them to the community gradually.

In November 2020, Alibaba donated OpenKruise to the CNCF, which now hosts the project, and the team plans to apply for CNCF Incubation in early 2022.

Why is it a major upgrade

In March 2021, OpenKruise released v0.8.0. Prior to this release, OpenKruise focused more on the Workload area, with CloneSet, Advanced StatefulSet, SidecarSet and other features catering to a wide variety of business and container deployment scenarios.

However, the Alibaba Cloud technology team believes that OpenKruise, as a project for automated management of Kubernetes applications, should not be limited to application “deployment”. Therefore, in 2021 the team proposed the direction “More than Workloads”, and from v0.8.0 to v1.0 OpenKruise has kept expanding the range of application management it supports.

Multiple types of enhanced Workload

First, in the latest v1.0 major release, OpenKruise offers several enhanced Workload types.

According to Wang Siyu, Kubernetes’ native workloads can satisfy only about 40% to 60% of real production scenarios, the simple and general ones, and these do not include many of the super-large and complex business scenarios at Alibaba and other Internet companies. OpenKruise has therefore made many improvements for these scenarios, such as CloneSet, a workload for managing stateless applications that is the counterpart of the native Deployment.

The following table compares CloneSet and Deployment in terms of scaling and release capabilities. As you can see, CloneSet meets many real production requirements that Deployment cannot.

In-place upgrades have been significantly improved

In v1.0, OpenKruise also made significant enhancements to the core functionality of in-place upgrading.

Compared with a Deployment upgrade, which deletes and recreates Pods, an in-place upgrade keeps the Pod object, node, IP, mounted volumes and data unchanged. Even while one container in a Pod is being upgraded in place, the other containers keep running normally.

Reportedly, in super-large clusters and at release peaks, in-place upgrades not only keep releases stable but also improve release efficiency by 60% to 80% compared with upgrades that rebuild large numbers of Pods. There are two main types of in-place upgrade:

  • In-place upgrade of container images. The Kruise controller modifies the image field in the Pod. Kubelet then detects that the hash of the corresponding container in the Pod has changed, stops the old container, and then pulls the image and creates and starts a new container from the updated container spec in the Pod.

  • In-place upgrade of fields such as container environment variables defined through the Downward API. The kruise-daemon component on each node includes the Downward API values when it computes the container’s real hash. When that hash changes, that is, when the labels/annotations referenced through the Downward API are updated, kruise-daemon stops the current container through the CRI interface. Kubelet then finds that the container has stopped and rebuilds it according to the Pod, so the new environment variables and other configuration take effect.

According to Wang Siyu, because of the architectural and design changes involved, the Kubernetes community currently has only a proposal for in-place updates of Pod resources (for VPA), while in-place upgrades of other fields, such as images, are offered only by OpenKruise in the cloud native community. As of v1.0, OpenKruise supports in-place upgrades of the container image and of fields such as env/command/args through the Downward API.
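The first mechanism above can be illustrated with a toy model. This is a minimal sketch, not Kubelet or Kruise code: the `spec_hash` function, the `Pod` and `RunningContainer` classes and the `sync_pod` helper are all hypothetical names invented here, and the real container hash is computed differently. It only shows the idea that a changed spec hash recreates one container while the Pod, its IP and its other containers stay untouched.

```python
import hashlib
import json
from dataclasses import dataclass, field


def spec_hash(spec: dict) -> str:
    """Deterministic hash of a container spec (stand-in for the real container hash)."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()[:12]


@dataclass
class RunningContainer:
    name: str
    running_hash: str
    restarts: int = 0


@dataclass
class Pod:
    name: str
    ip: str
    specs: dict  # container name -> desired spec
    running: dict = field(default_factory=dict)  # container name -> RunningContainer


def sync_pod(pod: Pod) -> list:
    """Recreate only the containers whose desired hash differs from the running one."""
    recreated = []
    for name, spec in pod.specs.items():
        wanted = spec_hash(spec)
        current = pod.running.get(name)
        if current is None:
            # First start of this container.
            pod.running[name] = RunningContainer(name, wanted)
        elif current.running_hash != wanted:
            # Stop the old container, then pull, create and start the new one.
            current.running_hash = wanted
            current.restarts += 1
            recreated.append(name)
    return recreated


pod = Pod("web-0", "10.0.0.7", {
    "app": {"image": "app:v1"},
    "sidecar": {"image": "proxy:v1"},
})
sync_pod(pod)                         # initial start of both containers
pod.specs["app"]["image"] = "app:v2"  # the controller patches only the image field
recreated = sync_pod(pod)
print(recreated, pod.ip)              # ['app'] 10.0.0.7
```

In this model the sidecar is never restarted and the Pod keeps its IP, which is the property that distinguishes an in-place upgrade from deleting and recreating the Pod.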

High availability protection is improved

As we all know, Kubernetes’ desired-state-oriented automation is a “double-edged sword”: it brings declarative deployment to applications, but it can also amplify a mistaken operation as the system converges on the desired state. For example, under normal conditions (non-orphan deletion), once a parent resource is deleted, all of its child resources are deleted along with it:

  • When a CRD is deleted, all of its corresponding CRs are cleaned up.

  • When a namespace is deleted, all resources in the namespace, including Pods, are deleted at the same time.

  • When a workload (Deployment/StatefulSet/…) is deleted, all of its Pods are deleted.

No enterprise can afford a large-scale accidental deletion in production, and many Kubernetes users and developers in the community complain about this kind of “cascading deletion” problem. Therefore, the first protection feature in OpenKruise guards against cascading deletion.

Put simply, once a user has labeled a CRD, namespace or workload for cascading-deletion protection, Kruise checks for cascading risk whenever that resource is about to be deleted. For example, if a namespace still contains running Pods that are serving traffic, Kruise forbids deleting the namespace directly, so that the service Pods are not deleted by mistake.

In addition, OpenKruise offers PodUnavailableBudget (PUB), an enhanced version of the native PodDisruptionBudget (PDB). PDB protects only against Pod eviction, while PUB protects against every operation that can make Pods unavailable: eviction, as well as Pod deletion, in-place upgrade and so on.
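The check described here can be sketched as a small admission-style function. This is an illustration under stated assumptions, not Kruise’s implementation: the label key `example.io/delete-protection`, its `Cascading` value, the `Resource` class and `validate_delete` are hypothetical names chosen for this sketch, and the real label and semantics should be taken from the OpenKruise documentation.

```python
from dataclasses import dataclass, field

# Hypothetical label key and value used only in this sketch.
PROTECTION_LABEL = "example.io/delete-protection"


@dataclass
class Resource:
    kind: str
    name: str
    labels: dict = field(default_factory=dict)
    active_children: int = 0  # e.g. running Pods under a namespace or workload


def validate_delete(res: Resource) -> tuple:
    """Return (allowed, reason) for a delete request on res."""
    if res.labels.get(PROTECTION_LABEL) == "Cascading" and res.active_children > 0:
        return False, f"{res.kind}/{res.name} still has {res.active_children} active children"
    return True, "ok"


ns = Resource("Namespace", "shop", {PROTECTION_LABEL: "Cascading"}, active_children=3)
print(validate_delete(ns))  # (False, 'Namespace/shop still has 3 active children')

ns.active_children = 0      # once the namespace is empty, deletion is allowed
print(validate_delete(ns))  # (True, 'ok')
```

Unlabeled resources pass straight through in this sketch, which mirrors the opt-in nature of the protection described in the text.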
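The difference in scope between the two budgets can be made concrete with a toy model. The `Budget` class, its fields and the operation names below are assumptions made for illustration and do not reflect the actual PDB or PUB API fields; the sketch only shows that the same unavailability limit guards a wider set of operations under PUB.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_unavailable: int
    protected_ops: frozenset  # operations this budget is consulted for

    def allows(self, op: str, unavailable_now: int) -> bool:
        """Allow op unless it is protected and the budget is already used up."""
        if op not in self.protected_ops:
            return True  # outside this budget's scope, so never blocked by it
        return unavailable_now < self.max_unavailable


# PDB-like budget: only eviction is guarded.
pdb = Budget(1, frozenset({"evict"}))
# PUB-like budget: eviction, deletion and in-place upgrade are all guarded.
pub = Budget(1, frozenset({"evict", "delete", "in_place_update"}))

# One Pod is already unavailable, so the budget of 1 is exhausted.
for op in ("evict", "delete", "in_place_update"):
    print(op, pdb.allows(op, unavailable_now=1), pub.allows(op, unavailable_now=1))
# evict False False
# delete True False
# in_place_update True False
```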

O&M capabilities are enhanced

One aspect of Kubernetes that draws a lot of flak right now is that it encapsulates the underlying container runtime too tightly.

At the runtime layer, the Pod resource is the only way to create containers. Beyond that, the Kubernetes API offers users no interface for runtime-related operations such as pulling images or restarting containers, even though these are real needs in business scenarios.

Since Kubelet lacks a plugin-like extension mechanism, OpenKruise created a node component called kruise-daemon. kruise-daemon understands the CRDs and extended protocols defined by OpenKruise and talks to the Container Runtime Interface (CRI) on its own node to operate on that node’s containers. On this basis, OpenKruise provides CRDs for image pre-pulling (preheating) and container restart. Users can submit YAML to specify the images to pre-pull, or to restart one or more containers in a Pod.

In addition, the latest version of OpenKruise supports O&M functions such as distributing resources across namespaces and controlling container startup order. The former lets you distribute a ConfigMap or Secret to a set of namespaces, while the latter lets you control the startup order of strongly dependent containers in a Pod.

Next step: runtime

According to Wang Siyu, different users focus on different parts of OpenKruise.

Alibaba, Ctrip and other companies have adopted OpenKruise workloads as the unified application workload for business deployment. For example, most of Alibaba’s e-commerce and lifestyle services are deployed and managed through CloneSet, while middleware such as Nacos is deployed through Advanced StatefulSet. Other companies use parts of OpenKruise on demand: some use SidecarSet to manage, inject and upgrade sidecar containers independently, while others rely on enhanced O&M capabilities such as image pre-pulling and container restart.

In Wang Siyu’s opinion, OpenKruise’s workloads are now mature and cover most common application deployment and release scenarios, but around Kubernetes runtime problems there is still a lot to improve.

“More than once, users have told us that a misconfigured or faulty probe in native Kubernetes’ LivenessProbe caused every Pod in an application to restart abnormally even though the processes in the Pods were healthy. The entire application went out of service, triggering a major failure.” According to Wang, OpenKruise will next define a set of out-of-band (bypass) LivenessProbes and let users define a rate limiting policy for the restarts they trigger, so that not all Pods of an application become unavailable at once.

According to Wang, OpenKruise is also working on an exploratory project called ControllerMesh. It uses a proxy container to intercept the communication between a user’s operator (controller) and kube-apiserver, modifying and forwarding request and response data at the proxy layer. In this way, policies such as multi-tenant deployment, dynamic isolation, canary (grayscale) upgrade, fault injection and client-side circuit breaking can be applied to operators.

“This is an unprecedented and powerful extension to the Kubernetes controller runtime, without any intrusion on the user’s operator itself,” Wang Siyu said.
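Since this feature is described only as a plan, the following is a speculative sketch of what limiting probe-triggered restarts could look like, not a description of any OpenKruise API. The `RestartLimiter` class and its `max_restarting` policy are hypothetical. The point is that when a bad probe fails on every Pod at once, only a bounded number of Pods is allowed to restart at the same time.

```python
from dataclasses import dataclass, field


@dataclass
class RestartLimiter:
    max_restarting: int  # hypothetical policy: Pods allowed to restart concurrently
    restarting: set = field(default_factory=set)

    def request_restart(self, pod: str) -> bool:
        """Grant a restart only while fewer than max_restarting Pods are restarting."""
        if pod in self.restarting:
            return True
        if len(self.restarting) >= self.max_restarting:
            return False
        self.restarting.add(pod)
        return True

    def restart_finished(self, pod: str) -> None:
        """Free the slot once the Pod is serving again."""
        self.restarting.discard(pod)


limiter = RestartLimiter(max_restarting=2)
pods = [f"web-{i}" for i in range(10)]

# A misconfigured probe fails on every Pod at the same moment.
granted = [p for p in pods if limiter.request_restart(p)]
print(granted)  # ['web-0', 'web-1']
```

With this policy the other eight Pods keep serving while the first two restart, so a probe mistake degrades the application instead of taking it fully out of service.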
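The interception idea can be sketched in a few lines. This is a conceptual toy, not ControllerMesh code: `fake_apiserver_list`, `make_proxy` and the namespace-based isolation rule are assumptions chosen for illustration. It shows how a proxy that rewrites responses can restrict what an unmodified operator sees.

```python
from typing import Callable, Dict, List

Obj = Dict[str, str]


def fake_apiserver_list() -> List[Obj]:
    """Stand-in for a list response from kube-apiserver."""
    return [
        {"namespace": "team-a", "name": "cart"},
        {"namespace": "team-b", "name": "search"},
        {"namespace": "team-a", "name": "pay"},
    ]


def make_proxy(upstream: Callable[[], List[Obj]],
               allowed_namespaces: set) -> Callable[[], List[Obj]]:
    """Wrap an upstream list call so the caller only sees its own namespaces."""
    def proxied_list() -> List[Obj]:
        # Forward the request unchanged, then filter the response.
        return [o for o in upstream() if o["namespace"] in allowed_namespaces]
    return proxied_list


# The operator calls the proxied function exactly as it would call the API server.
operator_a_list = make_proxy(fake_apiserver_list, {"team-a"})
print([o["name"] for o in operator_a_list()])  # ['cart', 'pay']
```

Because the filtering happens in the wrapper, the operator’s own code does not change, which is the non-intrusive property the text emphasizes.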

Guest introduction:

Wang Siyu (alias: Wine Wishes) is a technical expert at Alibaba Cloud, an OpenKruise maintainer, a Kubernetes member and a speaker at several KubeCon summits, with many years of experience in scheduling and managing super-large-scale containers and in the cloud native field.
