At KubeCon 2021, Shen Yifan, cloud platform architect at the ICBC Software Development Center, and Wang Zefeng, head of Huawei Cloud's cloud native open source team, delivered a talk titled "Sailing with Karmada: Multi-cluster Management of Massive Nodes", sharing ICBC's experience in managing multiple Kubernetes clusters.

* The introduction to the Karmada project was presented by Wang Zefeng; the rest was presented by Shen Yifan.

Current state of the ICBC cloud platform

  • ICBC's business runs on the cloud across diverse scenarios: core business applications such as the payment lines behind Spring Festival red envelopes, technical support applications such as MySQL and Redis, and newer technology areas such as blockchain and artificial intelligence.
  • The ICBC cloud platform is built on mainstream open source projects with deep in-house customization, which keeps the platform autonomous and controllable.
  • In terms of scale, it is also the largest container cloud in the industry, currently running more than 280,000 containers.

Typical business requirements and the state of cloud native infrastructure

Against this large-scale cloud native backdrop, ICBC's businesses place the following requirements on us, and the infrastructure has the current state described below.

Typical business requirements include highly available deployment, elastic scaling and scheduling across clusters, and, for some business products, a dependency on a specific Kubernetes version.

Given these requirements, the current state of ICBC's cloud native infrastructure is as follows:

  • High reliability requirements per cluster. We keep the number of nodes in each cluster below 2,000 in order to limit the blast radius of a single cluster failure.
  • Resource pools that grow rapidly with the business. All new business applications are born on the cloud, existing applications are continuously migrating, and the core applications are now all running on the cloud.
  • Heterogeneous clusters at the business level. Some businesses depend on a particular Kubernetes version, and there is considerable heterogeneity in CNI and CSI plug-ins as well as in the underlying hardware.
  • Multi-site, multi-center, multi-cloud deployment. ICBC's clouds include the head office cloud, branch clouds, ecosystem clouds, and so on. Fault domains follow a two-site, three-center data center layout, with finer-grained fault-domain divisions inside each data center.

Key challenges

Given all of the above, ICBC now runs more than 100 Kubernetes clusters, centrally managed by the container cloud platform. As the platform continues to grow, however, we face the following problems:

  • Limited availability: each Kubernetes cluster is itself a failure domain, and there is currently no automatic recovery across failure domains.
  • Limited resources: application scheduling and elastic scaling are confined to a single cluster.
  • Lack of cluster transparency: because clusters carry heterogeneity, fault-domain, and other attributes, business teams must be aware of the underlying clusters in order to choose one, so the clusters are not transparent to upper-layer applications.
  • Repeated configuration: although configuration is entered once on the unified cloud management platform, it must be delivered to every cluster, and each cluster must be kept in sync.

Design goals

Facing these challenges, we formulated the following design goals for multi-cluster management:

  • Multi-cluster management plane: cluster management and full lifecycle management of clusters, with a unified, standard API entry point.
  • Resource management: support for the full range of Kubernetes resources across multiple API versions, plus multi-dimensional resource overrides (a sketch follows this list).
  • Cross-cluster scheduling: automatic scheduling based on fault domains and available resources, with automatic scaling across clusters.
  • Disaster recovery (DR): automatic cross-cluster recovery of resources, with the management plane decoupled from the member service clusters.
  • Compatibility: the large number of existing heterogeneous clusters must be brought under management smoothly, and the project itself must be highly extensible with an active community.
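As a rough illustration of the multi-dimensional override goal above, the sketch below shows what a Karmada OverridePolicy can look like: it rewrites the image registry only for one member cluster, while every other cluster keeps the template unchanged. The field names follow the policy.karmada.io/v1alpha1 API as we understand it, and the policy, workload, cluster, and registry names are hypothetical; check the documentation of the Karmada release you run for the exact schema.

```yaml
# Hedged sketch: override the image registry only for a hypothetical
# branch-cloud cluster; all other clusters keep the resource template as-is.
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-branch-override          # hypothetical name
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx                      # hypothetical workload
  overrideRules:
    - targetCluster:
        clusterNames:
          - branch-cloud-01            # hypothetical member cluster
      overriders:
        imageOverrider:
          - component: Registry
            operator: replace
            value: registry.branch.example.com   # hypothetical registry
```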

Joint innovation

With the design goals clear, the next question was how to implement them. Commercial products were ruled out first: they would lock us in to a single vendor and could not meet our requirements for autonomy and controllability. We then investigated KubeFed, but found that its non-native API would make migrating our existing clusters too difficult, and that its community activity had declined recently. In the end, we decided to build the solution ourselves through joint research and development, in the way that best fits our needs. Since ICBC Financial Cloud is itself built on open source, we also wanted to invest in open source and help drive a virtuous cycle of community development, so we chose to co-launch the Karmada project.

Karmada project

Why launch a new project instead of building on the existing KubeFed?

We did in fact consider a "Kubernetes Federation v3" at the start of the project, and a prototype was completed very quickly. However, in talking with many community users, we found that the narrow scope of the Federation project does not cover the full capability map people expect. Beyond the multi-cluster application workload management that Federation provides, we also want to offer multi-cluster resource scheduling, failover and auto scaling, multi-cluster service discovery, data automation, and cluster lifecycle management across multiple platforms, so that users truly get a multi-cloud, multi-cluster open source stack that works out of the box. A neutral open source project hosted by the CNCF is therefore better suited to long-term technical evolution and community development.

Overview of Karmada technology

The core architecture of Karmada

Karmada's core architecture draws on the multi-cluster management experience of several community sponsors, and focuses on Kubernetes-native API support and the extensibility of the framework.

The architecture is broadly similar to that of a single Kubernetes cluster:

  • The Karmada control plane has its own API server, which serves the Kubernetes-native API as well as Karmada's extended APIs.
  • The Karmada scheduler supports multi-dimensional, multi-weight scheduling policies based on fault domains, cluster resources, Kubernetes versions, and the plug-ins enabled in each cluster, and is easy for users to customize.
  • For synchronization with member clusters, the pull mode of the Karmada agent effectively spreads the control-plane load, making very large multi-cluster resource pools manageable (a registration sketch follows this list).
  • Karmada also provides an execution controller and integrates with KubeEdge, so it can directly manage Kubernetes clusters in public cloud, private cloud, and edge network environments.
  • With its execution space design, Karmada isolates access credentials and resources between different clusters, meeting the security requirements of multi-cluster scenarios.
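To make the push/pull distinction above concrete, below is a rough sketch of the Cluster object that Karmada keeps for each member cluster; in practice it is normally created for you (for example by karmadactl when joining a push-mode cluster, or by the karmada-agent when it registers in pull mode) rather than written by hand. The cluster name, endpoint, and secret names are hypothetical, and the exact fields depend on the Karmada version.

```yaml
# Hedged sketch of a member cluster registration record.
# In Push mode the control plane reaches out to the member cluster;
# in Pull mode the karmada-agent running inside the member cluster
# pulls work from the control plane, which spreads out the load.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: member-cluster-01                # hypothetical cluster name
spec:
  syncMode: Pull                         # Push or Pull
  apiEndpoint: https://10.0.0.10:6443    # hypothetical endpoint
  secretRef:                             # credentials for access
    namespace: karmada-cluster
    name: member-cluster-01
```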

Karmada core concepts

In Karmada's core concepts, every workload-related resource object that a user submits is treated as a resource template. Templates use the strict Kubernetes-native API, covering Deployment, Service, ConfigMap, and so on, as well as user-defined CRD objects. Users can therefore create multi-cluster applications with their original single-cluster YAML or API calls, without any modification, and any custom development that business platforms have already built on top of Kubernetes also needs no changes.

For splitting and scheduling applications across clusters, the extended PropagationPolicy allows a single policy definition to be reused by multiple applications. Thanks to this decoupled design, in an organization like ICBC where the platform team and the business teams operate independently, the platform team can define the common high-availability deployment patterns, while the business teams keep using the native single-cluster Kubernetes API for day-to-day rollouts and version upgrades.

Case study: how users manage their business with Karmada

Zero retrofit: deploying a highly available application across three availability zones using native APIs

In this example, the first object is a PropagationPolicy. In it, the platform team sets a resource selector that matches every Deployment carrying a specific label, with the HA mode set to multi-zone replication, and requires those applications to be strictly distributed across three zones. The second object is the familiar standard Deployment definition. Putting the two together, the platform team focuses on the common application deployment model, while the business team focuses on its own application details such as image, version, and container ports; Karmada combines the needs of both teams to make the application highly available across zones (both objects are sketched below). If any underlying cluster fails, dynamic scheduling automatically recreates the application instances lost with that cluster or availability zone, achieving cross-cluster failover.
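The two definitions described above might look roughly like the sketch below: the PropagationPolicy is the platform team's piece, the Deployment is the business team's piece. The label key, zone spread values, and application details are hypothetical, and the placement fields follow the policy.karmada.io/v1alpha1 API; verify them against the Karmada release in use.

```yaml
# Platform team: any Deployment labeled for multi-zone HA must be spread
# across exactly three zones (label key/value are hypothetical).
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: ha-multi-zone
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      labelSelector:
        matchLabels:
          ha-mode: multi-zone          # hypothetical label
  placement:
    spreadConstraints:
      - spreadByField: zone
        minGroups: 3
        maxGroups: 3
---
# Business team: an ordinary, unmodified single-cluster Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                       # hypothetical application
  labels:
    ha-mode: multi-zone
spec:
  replicas: 6
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: demo-app:1.0.0        # hypothetical image
          ports:
            - containerPort: 8080
```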

Practice results

Through the iterative cycle of Karmada design, development, practice, and redesign, ICBC has distilled a number of advantages, spanning resource scheduling, disaster recovery (DR), cluster management, and resource management. In my view, the following three points deserve special attention, as they stood out most during real-world adoption:

  • Karmada supports scheduling multiple resources as a bound unit, ensuring that all the Kubernetes resources an application needs can be scheduled together, which also greatly improves the timeliness of resource provisioning (a sketch follows this list).
  • Support for native Kubernetes objects means our many existing Kubernetes clients need little or no modification.
  • Karmada supports both Push and Pull distribution modes for different scenarios. With a large number of clusters in particular, Pull mode greatly reduces the load on the Karmada control plane.
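As a rough illustration of the first point, a single PropagationPolicy can select an application together with the configuration it depends on, so that both are propagated to the same member clusters under one placement. This is a hedged sketch, not the only way Karmada handles dependencies; the resource names and cluster names are hypothetical.

```yaml
# Hedged sketch: propagate a workload and the ConfigMap it depends on
# under one policy, so both land in the same member clusters together.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: demo-app-bundle
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-app                   # hypothetical workload
    - apiVersion: v1
      kind: ConfigMap
      name: demo-app-config            # hypothetical configuration
  placement:
    clusterAffinity:
      clusterNames:
        - member-cluster-01            # hypothetical clusters
        - member-cluster-02
```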

Follow-up plans

For large-scale production, we want the container cloud platform to remain the user-facing platform, with Karmada underneath providing unified multi-cluster resource management and scheduling, so that we can manage 100+ Kubernetes clusters, including heterogeneous ones.

On the community side, we will keep contributing. Key features we care about include smooth migration of existing applications, so that they can be automatically taken over by Karmada federation, as well as continued optimization and implementation of cross-cluster scaling, application migration, and data linkage.

Finally, please visit the Karmada project on GitHub and check the release notes and community documentation for more features and details. If you have any suggestions or feedback while using Karmada, join the community group and discuss with us!

Appendix: Karmada community links


Project address: github.com/karmada-io/…

Slack: karmada-io.slack.com