Abstract: As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be the era of programmatic multi-cloud management services.

This article is shared from the Huawei Cloud Community post "Huawei Cloud MCP Multi-cloud Cross-cloud Container Governance and Practice", by the original author, Technology Torchbearer.

At the Huawei Developer Conference (Cloud) 2021, Zhang Yuxin, CTO of Huawei Cloud, announced that Karmada, a multi-cloud container orchestration project, was officially open sourced. On April 26, Wang Zefeng, head of Huawei Cloud's cloud native open source efforts, and Xu Zhonghu, a senior engineer at Huawei Cloud, delivered a keynote speech entitled "Huawei Cloud MCP Multi-cloud Cross-cloud Container Governance and Practice", sharing the development of Huawei Cloud MCP and the core technologies of the Karmada project.

The speech mainly includes five aspects:

1) Cloud native multi-cloud status and challenges
2) History of Huawei Cloud MCP
3) The Karmada project
4) Multi-cluster service governance
5) Summary and outlook

Cloud native multi-cloud status and challenges

According to a recent survey report, more than 93% of enterprises are using services from multiple cloud vendors. As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be the era of programmatic multi-cloud management services. When a business is deployed across clouds or across multiple clusters, adoption typically falls into several stages:

Typical stage 1: Deploy across multiple clouds and locations, unify operations and maintenance, and reduce repetitive work

In the first stage, the focus is mainly on deployment and operations management, which can be understood as basic interoperability. Interoperability here means that the technology stack is standardized across different environments and different clouds: when switching between public cloud 1 and public cloud 2, the operating commands and requests are exactly the same. However, there is little or no business correlation between them. At this stage, unified application delivery, such as deployment and O&M, can be handled by manually repeating commands, by scripting, or simply by stacking a CI/CD system on top. Most services are still relatively fixed: which public cloud, data center, or equipment room they are deployed in does not require much dynamism or variability.

Typical stage 2: A unified multi-cloud resource pool to cope with fluctuations in business load

The second stage unifies the resource pool, which brings requirements for making that pool dynamic. Application delivery here is no longer simple CI/CD, because we want deployments to be dynamic and traffic to migrate along with them. In this case, application delivery needs automatic scheduling capabilities, and traffic should follow the distribution of instances on its own. Of course, there are other situations, such as handling traffic with simple scripts, which can also count as stage two; ideally, though, all of this would be fully automated.

Typical stage 3: Multi-cloud collaboration, a unified application platform, and cross-cloud deployment of services

The third stage is what we believe is the foreseeable final, and ideal, shape of a multi-cloud, multi-cluster environment. Throughout the history of cloud computing, whether with Kubernetes clusters or earlier virtual machines, we have constantly been breaking or redefining boundaries. In the earliest days, deploying a new application or service required a physical server, a very inflexible boundary. Virtual machines and then containers made the granularity ever smaller, but accessing workloads across different machines and environments created many new challenges, so after containers brought this fine granularity, Kubernetes emerged and redrew the boundary around a large cluster.

Multi-cloud builds on these constantly evolving boundaries: once development reaches a certain stage, it is constrained by individual data centers or clouds, and multi-cloud technology can be used to break through the boundaries of both clouds and clusters.

In practice, however, multi-cloud under the cloud native umbrella still faces many challenges, for the following reasons:

  • Clusters are numerous: tedious and repetitive cluster configurations, cluster management differences among cloud vendors, and fragmented API access points
  • Business decentralization: Differentiated configuration of applications across clusters, cross-cloud access of services, and application synchronization across clusters
  • Cluster boundary constraints: resource scheduling is cluster bound, application availability is cluster bound, and elastic scaling is cluster bound
  • Vendor lock-in: business deployment "stickiness," lack of automatic failover, lack of a neutral open source multi-cluster orchestration project

History of Huawei Cloud MCP

The concept of multi-cluster appeared very early in Kubernetes, and Huawei was one of the earliest initiators. In 2015 we proposed the concept of Federation in the community; development of Federation V1 started in 2016, and in 2017 it was spun out as an independent Kubernetes sub-project. In mid-2017 we launched the development of Federation V2. In terms of commercialization, Huawei started building the overall commercial platform in mid-2018 and delivered the commercial capability at the end of that year. However, over a long period of serving customers we also found some problems, so in 2020 we started a new engine, Karmada.

Developed based on Kubernetes Federation V1 and V2, Karmada can run cloud native applications across multiple Kubernetes clusters and clouds without any changes to the applications. By directly using the native Kubernetes API and providing advanced scheduling capabilities, Karmada enables a truly open, multi-cloud Kubernetes.

Karmada project

The image above shows our view of the multi-cloud, multi-cluster technology landscape that we believe should exist in the open source community; the gray boxes are the capabilities that Karmada aims to cover. From the perspectives of the data plane, storage, and operations, we need to address multi-cloud, multi-cluster container networking, multi-cloud, multi-cluster service discovery and even traffic governance, and data persistence, all of which the Karmada project will cover in the community.

In the initial stage we will focus on several aspects. One is compatibility with the native K8s API. This was actually a stumbling block for the original Federation V2, because people are used to the K8s API rather than a new API, so in the new Karmada project we directly adopt the native API to provide the ability to deploy applications across multiple clusters.

In terms of cluster synchronization, we will support a variety of network modes, including a control plane in the public cloud with member clusters in a private cloud, or vice versa, and even edge scenarios; all of these can be covered by the Karmada project, and we will provide built-in, out-of-the-box capabilities to minimize the cost of adaptation.

The Karmada architecture diagram shows a unified control plane: Karmada runs its own API server, which serves the native Kubernetes API as well as the additional policy APIs provided by Karmada to support its core advanced scheduling functions. For cluster synchronization there are two modes, a central Controller and an Agent, corresponding to whether the control plane sits in a public cloud with member clusters in a private cloud, or the other way around.
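As a rough illustration of the central Controller (push) mode, a member cluster can be registered to the Karmada control plane with karmadactl. This is only a sketch: the cluster name and kubeconfig paths are placeholders, and exact flags may differ between Karmada versions.

# Push mode: the Karmada control plane accesses the member cluster directly
# using the kubeconfig supplied at join time ("member1" and both paths are placeholders).
karmadactl join member1 \
  --kubeconfig=$HOME/.kube/karmada-apiserver.config \
  --cluster-kubeconfig=$HOME/.kube/member1.config

In the Agent (pull) mode, a karmada-agent deployed inside the member cluster connects out to the control plane instead, which suits private or edge clusters that the control plane cannot reach directly.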

In a cloud-edge network environment, edge clusters also need to be managed, so we will combine this with KubeEdge's optimizations for such networks to provide edge cluster management capability.

Core values of Karmada project:

  • K8S native API compatibility, rich cloud native ecology
  • Embedded policy, out of the box
  • Rich multi-cluster scheduling support
  • Cluster-level resource isolation
  • Multi-mode cluster synchronization that shields differences in region and network restrictions

Multi-cluster application deployment

1) Zero modification: deploy a multi-cluster application using the native K8s API

Example policy: configure a multi-AZ HA deployment scenario for all Deployments

Applications are defined as Deployments using the standard K8s API:

kubectl create -f nginx-deployment.yaml
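Here nginx-deployment.yaml is just an ordinary single-cluster manifest with no Karmada-specific changes; a minimal sketch (the image tag and replica count are arbitrary) could look like this:

# nginx-deployment.yaml: a standard Kubernetes Deployment, unchanged for Karmada
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.19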

2) PropagationPolicy: a reusable multi-cluster application scheduling policy

resourceSelector

  • Support for associating multiple resource types
  • Support for object filtering using Name or LabelSelector

placement

clusterAffinity:

  • Defines the target clusters for affinity-based scheduling
  • Supports filtering by Names or LabelSelector

clusterTolerations:

  • Similar to Pod tolerations and node taints in a single cluster

spreadConstraints:

  • Defines the HA policy for application distribution
  • Supports dynamic grouping of clusters: grouping by Region, AZ, or feature label to achieve different levels of HA (see the example below)
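Putting these fields together, a minimal PropagationPolicy sketch might look like the following. The policy name and the member cluster names (member1, member2) are placeholders, and the exact schema should be checked against the Karmada version in use.

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    # Select the nginx Deployment created above by kind and name
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      # Target clusters for scheduling; a LabelSelector can be used instead of names
      clusterNames:
        - member1
        - member2
    spreadConstraints:
      # HA policy: spread replicas across clusters (grouping by region or zone
      # is also possible via spreadByField)
      - spreadByField: cluster
        maxGroups: 2
        minGroups: 2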

3) OverridePolicy: a reusable cross-cluster differentiated configuration policy

resourceSelector

  • Support for object filtering using Name or LabelSelector

overriders

  • Many overrider plug-in types are supported
  • PlaintextOverrider: a basic overrider that modifies fields through plain-text operations
  • ImageOverrider: a dedicated plug-in for differentiated configuration of container images (see the example below)
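A minimal OverridePolicy sketch, assuming we want a different image registry and replica count in one member cluster, might look like this. The cluster name, registry, and path are illustrative only, and the OverridePolicy schema has evolved across Karmada versions.

apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-override
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  targetCluster:
    clusterNames:
      - member1
  overriders:
    # ImageOverrider: rewrite only the registry part of the container image
    imageOverrider:
      - component: Registry
        operator: replace
        value: registry.example.com
    # PlaintextOverrider: JSON-patch style replacement by field path
    plaintext:
      - path: /spec/replicas
        operator: replace
        value: 2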

Multi-cluster service governance

Multi-cluster service governance issues to be addressed

  • Service discovery
  • DNS resolution
  • Advanced traffic management such as load balancing, circuit breaking, fault injection, and traffic splitting
  • Cross-cloud access security

The advantages of a service mesh

The diagram above shows a typical Istio service mesh architecture. Istio is completely non-intrusive to applications: it intercepts the traffic that applications send via automatically injected sidecar proxies and manages that traffic through them.

The basic functions of Istio:

1) Traffic management

  • Resilience features (circuit breaking, timeouts, retries, fault injection, etc.) that improve the resilience of the whole system
  • Grayscale (canary) releases, making it easier and faster to roll out new versions
  • Load balancing and route matching, which can largely replace traditional microservice governance frameworks (a sketch follows this list)
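As a concrete illustration of this kind of traffic management, the hypothetical Istio VirtualService below splits traffic 90/10 between two versions of a reviews service and adds retries and a timeout; the host and the v1/v2 subsets are placeholders that assume a matching DestinationRule.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        # Grayscale release: 90% of traffic to v1, 10% to the new v2
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
      # Resilience: retry failed requests and bound overall request latency
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 10s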

2) Security: data encryption, authentication, and authorization

3) Observability: makes it easier for operations personnel to diagnose system faults. The three typical observability signals here are metrics, traces, and access logs.

How do we choose a service mesh technology for multi-cluster, multi-cloud scenarios?

Next, the technical details are explained along the following three dimensions:

  • Flat network vs. non-flat network
  • Single service mesh vs. multiple service meshes
  • Single control plane vs. multiple control planes

1) Multi-cluster service mesh – flat network

Advantage:

  • Low latency for east-west service access

Disadvantages:

  • Networking complexity
  • Security: all workloads share the same network, so isolation is weaker
  • Scalability: Pod and Service IP addresses must not conflict across clusters

2) Multi-cluster service mesh – non-flat network

Advantages:

  • Network isolation, and therefore relatively higher security
  • Simpler networking
  • Scalability: cluster network address spaces are independent and can scale freely

Disadvantages:

  • Cross-cluster service access must go through an east-west gateway
  • The gateway relies on TLS auto passthrough

3) Non-flat network – single control plane

  • A single control plane (deployed in a user cluster or fully hosted)
  • Service discovery
  • Configuration discovery
  • Split Horizon EDS
  • East-West gateway

4) Non-flat network – multiple control planes

  • A control plane is deployed in each cluster
  • Service discovery
  • Configuration discovery
  • Sidecars connect to the Istio control plane within their own cluster, providing better performance and availability than a single control plane (see the sketch below)
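For reference, in Istio's multi-network, multi-control-plane topology each cluster is installed with its own mesh, cluster, and network identifiers, which the control planes and east-west gateways use to distinguish networks; a sketch of the per-cluster install values (mesh1, cluster1, and network1 are placeholder names) looks like this:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        # Unique name of this cluster within the mesh
        clusterName: cluster1
      # Clusters on different networks use different network names, so
      # cross-network traffic is routed through the east-west gateway
      network: network1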

5) Non-flat network – east-west gateway

  • Gateway address acquisition
  • Split horizon EDS:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - hosts:
        - '*.local'
      port:
        name: tls
        number: 15443
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH

Network filter: "envoy.filters.network.sni_cluster"

P.S. Karmada community contact information:

The address of the project: https://github.com/karmada-io…

Slack address: https://karmada-io.slack.com
