Author: Situ Fang

Review & proofread: Tian Weijing, Xi Yang

Editing & Typesetting: Wen Yan

Introduction: In the cloud native era, using Kubernetes and cloud infrastructure directly is too complex: users must learn a great many low-level details, application management costs are high, the process is error-prone, and failures are frequent. As cloud computing spreads, different clouds differ in their details, further exacerbating the problem.

This article introduces how to build a new application management platform on Kubernetes that provides a layer of abstraction, encapsulating the underlying logic and exposing only the interfaces users care about, so that users can focus solely on their own business logic and manage applications faster and more safely.

The cloud native era is an exciting one: we are facing a disruptive change in technology and a comprehensive, end-to-end refactoring. Three key technologies have emerged in the evolution of cloud native:

  • The first is containerization. As a standardized medium of interaction, containers bring great improvements over traditional methods in operation and maintenance efficiency, deployment density, and resource isolation. According to CNCF’s latest research report, 92% of enterprises now use containers in production systems.

  • The second is Kubernetes, which abstracts and manages the infrastructure and has become the standard of cloud native.

  • The third is the Operator pattern for automated operations. Through its controller and custom resource mechanisms, Kubernetes can operate not only stateless applications but also execute user-defined operational logic, enabling more complex automated deployment and management of applications.

These three key technologies have evolved step by step, and the theory of application delivery has evolved along with them. The rise of cloud native has brought comprehensive upgrades and breakthroughs in delivery media, infrastructure management, operation and maintenance models, and continuous delivery theory, accelerating the arrival of the cloud computing era.

Figure 1: The CNCF cloud native landscape

From the cloud native landscape released by CNCF (see Figure 1), we can see a thriving ecosystem. Among the 900+ logos in the picture are open source projects and startups, and it is from these that the cloud native technologies of the future will emerge.

Application delivery challenges posed by Kubernetes, the cloud native “operating system”

As mentioned above, Kubernetes has become the standard of cloud native. By encapsulating the differences of the underlying infrastructure, it can deploy and operate all kinds of applications: stateless applications and microservices, as well as stateful, batch-processing, big data, AI, and blockchain workloads built on newer technologies. Kubernetes has become a real-world “operating system”: it is to the cloud what Android is to mobile devices. Why say that? Android runs not only on our phones but also on smart devices such as cars, TVs, and Tmall Genie speakers, letting mobile applications run everywhere. Kubernetes shows the same potential and trend, not in smart home appliances of course, but in public clouds, self-built data centers, and edge clusters. We expect Kubernetes to become as ubiquitous as Android in the future.

Does container plus the Kubernetes interface, then, solve all delivery problems? Definitely not. Just think: if our phones ran only Android, would that meet our work and life needs? No, we need a wide variety of software applications. Likewise, cloud native needs not only Kubernetes as the “operating system” but also a set of application delivery capabilities. On mobile devices, applications can be installed through app stores such as Pea Pod; in the cloud native era, applications need to be deployed onto different Kubernetes clusters. But because Kubernetes exposes a huge number of trivial infrastructure details through a complex description language, deployment runs into all kinds of problems. What is needed is a cloud native “Pea Pod” to solve this: an application management platform that masks the complexity of delivery.

There are two main models of application management platform in the industry. The first is the traditional platform model, which “puts a big hat” over Kubernetes to mask all its complexity, and on top of it provides a layer of simplified application abstractions on demand. Although this makes the platform easy to use, every new capability must be developed by the platform itself, which makes extension difficult and iteration slow, unable to keep up with growing application management demands.

The other is the container platform model. This model is closer to cloud native: its components are open and extensible. But it lacks an application-layer abstraction, which causes many problems, such as a steep learning curve for developers. For example, business developers who submit code to such a platform need to write a Kubernetes Deployment to deploy the application, Prometheus rules to configure monitoring, an HPA to set up autoscaling, Istio rules to control routing, and so on, none of which is what business developers want to be doing.
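To make that learning curve concrete, below is a condensed, illustrative sketch of just two of the raw objects such a developer would have to hand-write; all names, images, and thresholds are placeholders, and a real setup would also need Service, monitoring, and routing objects.

```yaml
# Deployment: run and update the application pods (illustrative values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example/my-service:v1
          ports:
            - containerPort: 8080
---
# HorizontalPodAutoscaler: scale the Deployment on CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```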

So each solution has its advantages and disadvantages, and one must trade off between them. How, then, can a platform encapsulate complexity while remaining highly extensible? That is what we have been exploring.

An application management platform that eliminates the complexity of cloud native application delivery

In 2012, Alibaba began research on containerization. The initial aim was to improve resource utilization, and we started developing container virtualization technology ourselves. As the machine resources needed to absorb peak promotional traffic kept growing, we adopted an elastic hybrid-cloud container architecture in 2015, using Alibaba Cloud’s public computing resources to support traffic peaks. This was the early stage of Alibaba’s cloud native journey.

The turning point came in 2018. After Alibaba adopted open source Kubernetes for underlying scheduling, we moved from a scripted installation and deployment model oriented toward virtual machines to deploying applications on a standard container scheduling system, comprehensively pushing forward the Kubernetes upgrade of Alibaba’s infrastructure. But a new problem soon emerged: there were no standards and no uniformity; every team was doing things its own way.

Therefore, in 2019 we released the Open Application Model (OAM) jointly with Microsoft and began rebuilding our platforms around it. Everything went smoothly: in 2020, KubeVela, the OAM implementation engine, was officially open-sourced, and several internal application management platforms evolved on top of OAM and KubeVela. This also drove a “trinity” strategy: the same technology is used in Alibaba’s internal core systems, in commercial cloud products for customers, and in open source, engaging the whole OAM and KubeVela community by fully embracing open source.

In this exploration we took many detours and accumulated a great deal of experience. Next, we will introduce KubeVela’s design principles and usage in detail, to help developers understand a complete solution for a cloud native application management platform and to improve application developers’ experience and delivery efficiency.

Cloud native application management platform solution

In exploring solutions for a cloud native application management platform, we encountered four major challenges and summarized four core principles, described below.

Challenge 1: Application platforms are rebuilt redundantly for different scenarios, with no unified interface.

Although cloud native has the Kubernetes system, different scenarios still build different application platforms with completely different interfaces and very different delivery capabilities. For example, AI, middleware, Serverless, and e-commerce online business each have their own service platforms. Repeated development and operations work is therefore unavoidable when building application management platforms. The ideal, of course, is reuse, but the architectures of these platforms differ, so interoperability cannot be achieved. Moreover, when business developers deliver applications in different scenarios, the APIs are completely different and delivery capabilities vary greatly. That was our first challenge.

Challenge 2: “End-state oriented” delivery cannot satisfy a process-oriented delivery approach.

End-state oriented design is popular in the cloud native era because it spares users from caring about the implementation process: users only describe what they want, and the system gets there automatically without a detailed execution plan. In actual use, however, the delivery process usually requires human intervention such as approval, pausing for observation, and adjustment. For example, our Kubernetes system is tightly controlled during delivery, with releases subject to approval. Alibaba Group’s Change Management Specification stipulates that “for online changes, each of the first X batches in the production environment must be observed for more than Y minutes after the change,” and that “releases must first go through the Safe Production Environment (SPE); only after the SPE verifies there are no problems can a grayscale release proceed in the online production environment.” Application delivery is therefore a process-oriented rather than an end-state oriented execution, and we must consider how to make it fit the process better.

Challenge 3: The complexity of extending platform capabilities is too high.

As mentioned above, application platforms in the traditional model have poor extensibility. What are the common extension mechanisms in the cloud native era? In the Kubernetes ecosystem you can use template languages such as Go templates directly for deployment, but the drawback is limited flexibility: a full set of templates becomes structurally complex and hard to maintain at scale. Some experts may say, “I can write a custom set of Kubernetes Controllers; that surely is extensible!” True, but few people understand Kubernetes and its CRD extension mechanism well, and even an expert who writes a Controller still has plenty of work to do, such as compiling it and installing it to run on Kubernetes, and the number of Controllers cannot keep growing without bound. Building a highly extensible application platform is therefore a big challenge.

Challenge 4: Delivery varies dramatically across environments and scenarios.

In application delivery, operational capabilities differ greatly between environments. For example, the development and test environment values development and joint-debugging efficiency: each modification is hot-reloaded rather than repackaged and pushed through a full image-deployment process, and independent environments are deployed on demand for developers. The pre-release joint-debugging environment, by contrast, has daily operational demands such as attack-and-defense drills and fault injection. The production environment requires operational capabilities for production safety and service high availability. In addition, the component dependencies of the same application vary greatly: databases, load balancing, and storage differ from cloud to cloud.

In view of these four challenges, we summarized four core design principles for a modern application management platform:

  1. A unified, infrastructure-independent open application model.

  2. Declarative delivery around workflow.

  3. Highly extensible and easy to program.

  4. Design for hybrid environments.

Principle 1: A unified, infrastructure-independent open application model.

How do we distill a unified, infrastructure-independent open application model? Take the Open Application Model (OAM) as an example. First, its design is very simple and greatly simplifies use of the management platform: where users previously faced hundreds of APIs, OAM abstracts them into four types of delivery models. Second, from the business developer’s perspective, OAM describes the components to be delivered, the operational capabilities to use, and the delivery policies, while platform developers provide the implementations of those capabilities and policies, shielding developers from the details and differences of the infrastructure. Through its component model, OAM can describe artifacts such as containers, virtual machines, cloud services, Terraform components, Helm charts, and so on.

Figure 2: An example of application delivery described with the Open Application Model

Figure 2 is an example of KubeVela application delivery described in OAM, containing the four types of models mentioned above. First, it describes the components to be delivered when the application is deployed, usually images, artifact packages, cloud services, and so on. Second, it describes the traits to be used after deployment, such as routing rules and automatic scale-up and scale-down rules; these operational capabilities attach to components. Third, delivery policies, such as multi-cluster distribution policies, health check policies, and firewall rules, can be declared and executed at this stage. Finally, there is the workflow definition, such as blue-green deployment, traffic-based progressive deployment, manual approval, and any pipelined continuous delivery strategy.
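As a concrete illustration, here is a minimal sketch of such an application in KubeVela’s YAML format. The webservice component, scaler trait, topology policy, and deploy workflow step are KubeVela built-in types; the application name, image, and cluster name are placeholders.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: example-app
spec:
  components:                      # what to deliver
    - name: web
      type: webservice
      properties:
        image: example/web:v1
        ports:
          - port: 8080
            expose: true
      traits:                      # operational capabilities for the component
        - type: scaler
          properties:
            replicas: 3
  policies:                        # delivery strategy
    - name: target-prod
      type: topology
      properties:
        clusters: ["prod-cluster"]
  workflow:                        # how the delivery proceeds
    steps:
      - name: deploy-prod
        type: deploy
        properties:
          policies: ["target-prod"]
```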

Principle 2: Declarative delivery revolves around workflow.

Workflow is the core of the four models above. Application delivery is essentially an orchestration: components, operational capabilities, delivery policies, and workflow steps are defined in order in a directed acyclic graph (DAG).

Figure 3: An example of KubeVela delivering an application through workflow orchestration

For example, the first step before the application is delivered is to install system-level dependencies and run initialization checks; this is described in the delivery policy and executed at the start of delivery. The second step is deploying dependencies: if the application depends on a database, we can create the relevant cloud resource through a component, or reference an existing database resource, and inject the database connection string into the application environment as an environment parameter. The third step is deploying the application itself with components, including the image version, exposed ports, and so on. The fourth step applies operational capabilities, such as the monitoring mode, elastic scaling policy, and load balancing. The fifth step inserts a manual approval in the online environment, checking whether the application started correctly and letting the workflow continue only after human confirmation. The sixth step deploys the remaining resources in parallel and then sends a callback via a DingTalk message to tell developers the deployment result. This is our delivery process in a real scenario.
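Here is a hedged sketch of the middle of such a workflow using KubeVela’s built-in apply-component, suspend, and notification step types; the component names and the DingTalk webhook token are placeholders, and the full pipeline described above would contain more steps.

```yaml
workflow:
  steps:
    - name: deploy-db               # step 2: deploy the database dependency
      type: apply-component
      properties:
        component: database
    - name: deploy-app              # step 3: deploy the application itself
      type: apply-component
      properties:
        component: web
    - name: manual-approval         # step 5: pause until a human resumes
      type: suspend
    - name: notify-developers       # step 6: call back with a DingTalk message
      type: notification
      properties:
        dingding:
          url:
            value: https://oapi.dingtalk.com/robot/send?access_token=<token>
          message:
            msgtype: text
            text:
              content: "Application delivered."
```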

The greatest value of this workflow is that it describes a complex, process-oriented delivery in a standardized manner.

Principle 3: Highly extensible and easy to program.

We’ve always wanted to be able to assemble application modules like Lego bricks, so that platform developers can easily extend the platform’s capabilities and business developers can easily use them. But as mentioned above, template languages are not flexible or extensible enough, while writing Kubernetes Controllers is too complex and demands a high level of expertise. So how can a platform be both highly extensible and flexibly programmable? We finally drew on CUE, a configuration language that grew out of Google’s Borg configuration practice and is well suited to data templating and data delivery. It fits naturally with Go, integrates easily with the Kubernetes ecosystem, and is highly flexible. And because CUE is a dynamic configuration language, there is no need to compile and release: responses are fast, and as soon as a rule is published to Kubernetes it takes effect.

Figure 4: KubeVela’s dynamic extension mechanism

Taking KubeVela’s dynamic extension mechanism as an example: platform developers write component templates such as web services and scheduled tasks, and operational capability templates such as elastic scaling and rolling upgrade, then register them as OAM X-Definition capability templates in the corresponding environment. Based on the template contents, KubeVela installs the dependencies the capabilities need at runtime onto the clusters of that environment. From then on, application developers can use the templates the platform developers just wrote: they assemble an Application YAML by choosing components and operational capabilities, and publish the YAML to the KubeVela control plane. KubeVela orchestrates the application according to the Application YAML, runs the corresponding capability templates, and finally delivers the application to the Kubernetes cluster. The whole process, from capability definition through application description to final delivery, is complete.
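Below is a simplified sketch of such a capability template: a ComponentDefinition whose embedded CUE template renders a Kubernetes Deployment from two parameters. The schema follows KubeVela’s X-Definition format, while the definition name and parameters are illustrative and trimmed for brevity.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: ComponentDefinition
metadata:
  name: simple-web                 # illustrative definition name
spec:
  workload:
    definition:
      apiVersion: apps/v1
      kind: Deployment
  schematic:
    cue:
      template: |
        // output is the workload KubeVela renders per component instance
        output: {
          apiVersion: "apps/v1"
          kind:       "Deployment"
          spec: {
            selector: matchLabels: app: context.name
            template: {
              metadata: labels: app: context.name
              spec: containers: [{
                name:  context.name
                image: parameter.image
                ports: [{containerPort: parameter.port}]
              }]
            }
          }
        }
        // parameter is the interface exposed to application developers
        parameter: {
          image: string      // container image to run
          port:  *8080 | int // container port, defaults to 8080
        }
```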

Principle 4: Design for hybrid environments.

From the very beginning of KubeVela’s design, we assumed that future application delivery would happen in hybrid environments (hybrid cloud, multi-cloud, distributed cloud, edge) and would differ greatly between environments and scenarios. We did two things. First, the KubeVela control plane is kept completely separate and does not intrude into business clusters: any Kubernetes plug-in from the community can still be used in a business cluster to deploy and manage applications, with KubeVela responsible for managing those plug-ins from the control plane. Second, instead of using technologies like KubeFed that generate large numbers of federated objects, KubeVela delivers directly to multiple clusters, keeping the experience consistent with single-cluster management. It supports both Push and Pull modes by integrating multi-cluster management solutions such as OCM and Karmada. In centrally managed and heterogeneous-network scenarios, KubeVela enables secure cluster governance, differentiated environment configuration, multi-cluster grayscale release, and more.
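As an illustration, here is a sketch of the policies and workflow of an application delivered to multiple clusters from the KubeVela control plane. The topology, override, and deploy types are KubeVela built-ins; the cluster names, label selector, and replica override are placeholders.

```yaml
policies:
  - name: central                  # target the centrally managed cluster
    type: topology
    properties:
      clusters: ["central-cluster"]
  - name: edge                     # target all clusters labeled as edge
    type: topology
    properties:
      clusterLabelSelector:
        region: edge
  - name: edge-small               # differentiated edge configuration
    type: override
    properties:
      components:
        - name: web
          traits:
            - type: scaler
              properties:
                replicas: 1        # run fewer replicas at the edge
workflow:
  steps:
    - name: deploy-central
      type: deploy
      properties:
        policies: ["central"]
    - name: deploy-edge            # fan out to edge clusters with overrides
      type: deploy
      properties:
        policies: ["edge", "edge-small"]
```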

Take an internal Alibaba Cloud edge computing product as an example: developers simply write an image and a KubeVela application file and publish them to the KubeVela control plane, and the control plane distributes the application components to the centrally managed cluster or to edge clusters. Edge clusters can use an edge cluster management solution such as OpenYurt. Because KubeVela is the unified control plane across clusters, it can uniformly orchestrate application components, configure cloud and edge clusters differently, and aggregate all the underlying monitoring information, achieving unified observability and drawing cross-cluster resource topologies.

Conclusion

In summary, KubeVela’s four core design principles can be stated as follows:

  1. OAM abstracts away the underlying details of the infrastructure; users only need to care about four delivery models.

  2. Declarative delivery centered on workflow, requiring no extra processes or containers and standardizing the delivery process.

  3. Highly extensible and easy to program: operational logic is written in the CUE language, which is more flexible than template languages and an order of magnitude simpler than writing Controllers.

  4. Designed for hybrid environments, providing abstractions around the application, such as environments and clusters, and centrally managing the resources (including cloud services) that all applications depend on.

Figure 5: KubeVela’s position in Alibaba Cloud’s cloud native infrastructure

KubeVela has now become part of Alibaba Cloud’s cloud native infrastructure. As Figure 5 shows, we have built many extensions on top of Kubernetes, including resource pooling, node and cluster management capabilities, and support for workloads and automated operations. KubeVela builds a unified application delivery and management layer on top of these capabilities so that the group’s businesses can adapt to different scenarios.

How will cloud native evolve in the future? Looking back at the past decade of cloud native development, one irreversible trend is the steady upward movement of standardized interfaces. Why? From around 2010, when cloud computing first emerged, to its firm foothold today, the computing power of the cloud has become widely available. Around 2015, containers were adopted at scale, standardizing the delivery medium. Around 2018, Kubernetes standardized infrastructure management by abstracting cluster scheduling and operations. In the past two years, Prometheus and OpenTelemetry have been unifying monitoring, and Service Mesh technologies such as Envoy and Istio are making traffic management more universal. From this development we saw the problems of technology fragmentation and application delivery complexity in the cloud native field, and proposed the Open Application Model (OAM) and open-sourced KubeVela to solve them. We believe application-layer standardization will be the trend of the cloud native era.


You can learn more about KubeVela and the OAM project in the following materials:

Project repository: github.com/oam-dev/kubevela (Star, Watch, and Fork are welcome!)

Project homepage and documentation: kubevela.io. Chinese and English documentation has been available since version 1.1; developers are welcome to contribute documentation in more languages.

Project DingTalk group: 23310022; Slack: CNCF #kubevela channel

WeChat group: add the maintainer WeChat IDs below to join the KubeVela user group.

About the author:

Situ Fang, alias “Ji Feng” | Senior technical expert at Alibaba Cloud and head of the Alibaba Cloud application PaaS and Serverless product lines. Since joining Alibaba in 2010, he has been deeply involved in the cross-generational evolution of service-oriented and cloud native architecture, including link tracing, container virtualization, full-link stress testing, multi-site active-active architecture, moving middleware to the cloud, and cloud native on the cloud. He is responsible for and leads the building of Alibaba’s open source technologies and commercial products in microservices, observability, Serverless, and other fields, and is committed to providing mature, stable Internet architecture solutions and products to external enterprises through cloud native technology. He has participated in or led the design of open source projects including KubeVela, Spring Cloud Alibaba, Apache Dubbo, and Nacos.