On June 26, KubeCon + CloudNativeCon, the leading conference in the cloud native field, came to a close. As a leading provider of cloud native technology and applications, Alibaba Cloud showcased its full cloud native product family and open source landscape, helping enterprises and developers benefit from cloud technology.

Today, Alibaba Cloud is the technology company with the most comprehensive cloud native open source contributions in China, covering scheduling, job management, serverless frameworks, and more:

● Led the maintenance of etcd, containerd, Dragonfly and other flagship CNCF projects, with more than 10 projects included in the CNCF landscape;
● Project development: actively contributes to Kubernetes, ranking among the top 10 contributors worldwide;
● Open source ecosystem support: joined CNCF, OCI, CDF and other foundations, becoming a top-tier member of several of them, to help build the open source ecosystem.

During the conference, Li Xiang, senior technical expert at Alibaba Cloud and the first CNCF TOC member from China, shared Alibaba’s experience in exploring cloud native technology, as well as solutions to the challenges of putting cloud native applications into production.

Large-scale practice is the only path to cloud native adoption

Cloud native is a new concept, but it is also a methodology that requires the entire IT technology stack to work together. Product research and development alone is far from enough; only large-scale, real-world practice can deliver across-the-board efficiency gains and turn the concept into a working implementation.

Ten years ago, Alibaba faced the challenge of massive traffic earlier than other companies, so it decided to upgrade its IT architecture and began to relieve the traffic pressure by building a container-based cloud native technology system. With no example in the industry to refer to, it gradually developed a container infrastructure architecture that serves the whole group and is comparable to that of the world’s top-tier technology companies, setting a precedent for Chinese companies applying cloud native technology at scale in e-commerce, finance, manufacturing and other fields.

Although this exploration was a lonely one, Alibaba has persisted with it to this day. Through this all-in process of technological exploration, it experienced every key milestone of the cloud native wave firsthand, becoming not only an important witness to this technological revolution but also, gradually, one of the promoters and leaders of China’s cloud native technology system.

There is no doubt that Alibaba’s Internet scale and complex business scenarios are natural advantages in driving cloud native adoption. Under the cost pressure of Double 11, optimizing resource cost and efficiency became the starting point. That is, starting with containers, Alibaba researched low-cost virtualization and scheduling technologies:

  • Provide flexible, standard deployment units;
  • Replace static resource allocation with dynamic on-demand scheduling to further improve deployment efficiency, solve the problem of resource fragmentation, and increase deployment density;
  • Use technologies such as storage network virtualization and storage-compute separation to enhance task mobility, improve resource reliability, and reduce resource costs.
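
The effect of replacing static allocation with on-demand scheduling can be illustrated with a small sketch. It is not Alibaba's scheduler: the function names, the fixed slot size, and the CPU requests below are all hypothetical, and first-fit decreasing bin packing is used only as a simple stand-in for "dynamic on-demand scheduling".

```python
from typing import List


def static_nodes_needed(num_workloads: int, slot_size: int, node_capacity: int) -> int:
    """Static allocation: every workload gets one fixed-size slot,
    no matter how little it actually requests."""
    slots_per_node = node_capacity // slot_size
    # Ceiling division: nodes needed to give each workload its own slot.
    return -(-num_workloads // slots_per_node)


def dynamic_nodes_needed(requests: List[int], node_capacity: int) -> int:
    """On-demand scheduling, sketched as first-fit decreasing bin packing:
    place each workload on the first node that still has room for it."""
    free: List[int] = []  # remaining capacity of each node already in use
    for req in sorted(requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for i, remaining in enumerate(free):
            if req <= remaining:
                free[i] -= req
                break
        else:
            # No existing node fits this workload, so open a new one.
            free.append(node_capacity - req)
    return len(free)


# Hypothetical CPU requests (cores) for twelve workloads.
requests = [4, 1, 2, 8, 1, 2, 3, 1, 6, 2, 1, 1]
node_capacity = 16  # cores per node (assumed)
slot_size = 8       # static slot sized for the largest workload (assumed)

print("static :", static_nodes_needed(len(requests), slot_size, node_capacity))
print("dynamic:", dynamic_nodes_needed(requests, node_capacity))
```

With these made-up numbers the static scheme needs 6 nodes while the packed placement needs 2, which is the kind of density gain the list above describes. A real scheduler also has to weigh memory, affinity, and failure domains, which this sketch ignores.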

Driven by resource costs, Alibaba completed full containerization, and resource allocation was taken over by an efficient scheduling platform. Alibaba’s cloud native effort did not stop there: improving R&D efficiency and shortening the iteration cycle is the secret weapon for strengthening Alibaba’s business.

To make application deployment easier and more automated, Alibaba adopted Kubernetes as its container orchestration platform and kept improving the performance and scalability of Kubernetes. It introduced standardized application management tools such as Helm and, at the same time, began to explore service mesh, with the aim of making service governance more general and standardized and lowering the barrier for developers.

In March this year, Zhang Jianfeng, president of Alibaba Cloud Intelligence, announced that Alibaba Group would move fully onto the cloud within the next two years. After its cloud native exploration and transformation, Alibaba’s infrastructure is now modern and standardized:

  • Container technology decouples applications from the host machine;
  • Kubernetes abstractions such as Pod and Volume unify many kinds of resources;
  • Intelligent scheduling and the PaaS platform make it possible to automatically migrate applications and repair sources of instability.

Through cloud native technology, Alibaba has greatly reduced the difficulty of moving to the cloud.

In the process of improving the efficiency of resources and personnel, the entire infrastructure has also become more open: it is connected to the open source ecosystem and continually absorbs and contributes good ideas and technologies through exchange and interaction. Today, Alibaba Cloud not only supports Double 11, the largest cloud native application in China, but also owns the largest public cloud cluster and image registry in China. As the only vendor selected by Gartner for its competitive landscape of public cloud container services, Alibaba Cloud has also accumulated the richest and most valuable body of customer practice.

Continuous optimization to improve efficiency for enterprises and developers

Elasticity and scale are the key factors that underpin Alibaba’s many complex scenarios and traffic peaks. Alibaba’s ongoing performance optimization falls into four dimensions: workload tracking, performance analysis, customized scheduling, and large-scale image distribution. First, a complete tracking and replay mechanism was built for workload scheduling; then every performance problem was analyzed carefully so that the technical bottlenecks could be overcome one by one.

After continuous refinement, Alibaba has achieved remarkable results in the scale and performance of Kubernetes: the number of stored objects increased 25-fold, the number of supported nodes grew from 5,000 to tens of thousands, and end-to-end scheduling latency dropped from 5 s to 100 ms.

Much of this work was done together with the community, and the results have been contributed back to it, so other enterprises and developers can also benefit from the technology Alibaba developed at scale.

Kubernetes itself is highly customizable. Alibaba has developed custom scheduling capabilities and an image distribution system for its own business scenarios. For example, the open source Dragonfly project, which originated from Double 11, provides powerful image distribution capabilities.

Generally speaking, Alibaba’s adoption of Kubernetes can be divided into three stages:

  • First, Kubernetes supplies resources but does not interfere much with the operation and maintenance process. The system container at this stage is a rich container, which brings image standardization and lightweight virtualization to the PaaS platform above it.
  • Second, the operation and maintenance processes of the PaaS platform are rebuilt as Kubernetes Controllers, bringing stronger end-state-oriented automation to the PaaS.
  • Finally, traditional heavyweight patterns such as the rich runtime environment are replaced by the lightweight pattern of native containers and Pods. Meanwhile, the PaaS capabilities are moved entirely into Kubernetes Controllers, forming a fully cloud native architecture.
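
The "end-state automation" that a controller provides in the second stage can be sketched as a reconcile loop: the operator declares the desired state, and the controller repeatedly compares it with what it observes and takes one corrective step at a time. The sketch below is a toy model under stated assumptions; `Cluster`, `reconcile`, and `run_controller` are hypothetical names, and the real Kubernetes controller machinery (watches, work queues, API objects) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the observed cluster state."""
    pods: List[str] = field(default_factory=list)
    next_id: int = 0


def reconcile(desired_replicas: int, cluster: Cluster) -> bool:
    """Take at most one corrective step toward the desired state.
    Return True when the observed state already matches it."""
    if len(cluster.pods) < desired_replicas:
        cluster.pods.append(f"pod-{cluster.next_id}")  # create a missing pod
        cluster.next_id += 1
        return False
    if len(cluster.pods) > desired_replicas:
        cluster.pods.pop()  # remove a surplus pod
        return False
    return True


def run_controller(desired_replicas: int, cluster: Cluster, max_steps: int = 100) -> int:
    """Reconcile until converged; return how many corrective steps were taken."""
    for steps in range(max_steps + 1):
        if reconcile(desired_replicas, cluster):
            return steps
    raise RuntimeError("state did not converge within max_steps")


cluster = Cluster()
run_controller(3, cluster)    # initial rollout: three pods are created
cluster.pods.remove("pod-1")  # simulate a pod being lost
run_controller(3, cluster)    # the same loop repairs the gap
print(cluster.pods)
```

The point of the design is that rollout and repair are the same code path: nobody writes a separate "recover from failure" procedure, because any drift from the declared end state is corrected by the next reconcile pass.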

How can enterprises respond to the challenges of cloud native adoption?

Alibaba Cloud’s cloud native exploration started with a self-developed container and scheduling system and now embraces standardized open source technology. Kubernetes has become the cornerstone of the cloud native ecosystem: it hides the underlying details from the layers above and supports the surrounding business ecosystem built on top of it. In addition, more and more open source projects are being built around Kubernetes, such as service mesh and Kubeflow.

However, the evolution toward a cloud native architecture brings many challenges, and the hardest one actually comes from managing Kubernetes itself. Because Kubernetes is relatively young, the ecosystem for operating and managing it is not yet mature. For Alibaba, managing tens of thousands of clusters is crucial. We have explored and summarized four methods:

  • Kubernetes on Kubernetes: use K8s to manage K8s itself;
  • Node release and rollback strategy: roll out changes gradually (grayscale release) according to defined rules;
  • Environment separation: split the environment into a simulation environment and a production environment;
  • Heavy investment in monitoring: make Kubernetes more of a transparent white box so that problems can be detected early, prevented, and resolved.
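
The node release and rollback strategy in the list above can be sketched as a batched rollout that checks health after each batch and restores every node if a check fails. This is only an illustration of the general grayscale pattern, not Alibaba's tooling; the function name, the batch sizes, and the `is_healthy` callback are all hypothetical.

```python
from typing import Callable, Dict, List


def grayscale_release(
    versions: Dict[str, str],
    new_version: str,
    batch_sizes: List[int],
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade nodes batch by batch. If any node in a batch fails its health
    check, restore every node to its previous version and return False."""
    previous = dict(versions)  # snapshot used for rollback
    pending = [n for n in sorted(versions) if versions[n] != new_version]
    # The listed batches go first; a final batch takes whatever is left.
    sizes = list(batch_sizes) + [len(pending)]
    for size in sizes:
        batch, pending = pending[:size], pending[size:]
        if not batch:
            break
        for node in batch:
            versions[node] = new_version
        if not all(is_healthy(node, new_version) for node in batch):
            versions.clear()
            versions.update(previous)  # roll everything back
            return False
    return True


# Six hypothetical nodes, released as 1 node, then 2 nodes, then the rest.
nodes = {f"n{i}": "v1" for i in range(1, 7)}
ok = grayscale_release(nodes, "v2", [1, 2], lambda node, version: True)
print(ok, nodes)
```

Starting with a single node keeps the blast radius of a bad release small; the growing batch sizes trade that caution for speed once the first batches have passed their checks.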

At KubeCon, Alibaba Cloud announced two major projects: Cloud Native App Hub, a Kubernetes application management center for all developers, and OpenKruise, a set of open source Kubernetes automation projects that originated in some of the world’s most demanding Internet scenarios.

The Cloud Native App Hub makes it easy for users to obtain application resources and greatly simplifies the steps for deploying and installing an application on Kubernetes. The OpenKruise/Kruise project aims to become a “cloud native application automation engine” that solves many operation and maintenance pain points in large-scale application scenarios.

This article is original content of the Yunqi Community and may not be reproduced without permission.