From May 20 to 22, the 13th China System Architects Conference (SACC 2021) was held online with the theme "Digital Transformation, Architecture Remodeling." Jiang Cen, an edge cloud native technology expert at Alibaba Cloud, shared Alibaba Cloud's exploration and practice in edge cloud native, and explained the product's core competitiveness in terms of how it handles technical challenges and designs its system architecture, showing how innovative technology drives business development.

Cloud native development and current situation


As cloud computing technology matures, most enterprises choose it to rapidly deploy their business operations. The large-scale commercialization of 5G is connecting tens of billions of terminal devices worldwide, and customer demand for low-latency, high-bandwidth, near-real-time computing close to the end user will grow sharply. Growth in the edge cloud computing market comes on one hand from central businesses sinking to the edge, and on the other from the emergence and development of innovative edge scenarios such as cloud gaming and smart cities.

Jiang Cen believes that moving an enterprise business system to the cloud, whether to the central cloud or the edge cloud, typically goes through three stages:

When migrating from self-built IDCs, the service architecture is usually not heavily adjusted, for reasons of stability and disaster recovery; most businesses use only the most basic cloud services, such as ECS, SLB, and VPC.

Once the business is fully on the cloud, the architecture gradually begins to evolve toward cloud patterns, applied through gray-scale releases, in order to comprehensively reuse cloud capabilities, improve efficiency, and reduce cost.

When everything is in place, businesses begin to embrace cloud native in a big way.

At present, many businesses on the cloud are already adopting cloud native on a large scale.

The concept of cloud native originated with the CNCF (Cloud Native Computing Foundation) and the Kubernetes platform incubated by Google. Founded at the end of 2015, CNCF has incubated a large number of high-quality projects that meet cloud native standards, covering databases, message-oriented middleware, application orchestration and scheduling, CI/CD continuous integration, RPC, service mesh, container services, cloud native networking, and more.

Today, cloud native technology is no longer limited to the container/Kubernetes domain; it is increasingly becoming the standard, vendor-neutral software and hardware infrastructure architecture for cloud vendors. Edge computing has gradually emerged over the past 3-5 years with the adoption of 5G and Internet of Things technology, and its maturity is far below that of central cloud computing. At present, few CNCF projects involve edge computing. As edge scenarios and supporting capabilities improve, large numbers of central workloads will sink to the edge, and innovative edge scenarios will keep emerging, which will inevitably give rise to cloud native technologies suited to edge characteristics.

Challenges facing edge cloud native evolution


When discussing how cloud native technology evolves toward the edge, Jiang Cen mentioned three technical challenges:

From the resource perspective, the edge differs from the center's large-scale, centralized layout: it is built for distribution and broad regional coverage. In addition to standard cloud servers like those in the center, the edge contains a large number of heterogeneous resources, including Internet of Things devices, MEC, and co-built nodes. Cloud native technology places specific requirements on the deployment environment, so it must flexibly adapt to these massive heterogeneous edge resources. Moreover, edge nodes are small and numerous, so improving resource reuse is key; this requires flexible scheduling built on resource pooling and resource performance.
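The pooling-plus-scheduling idea above can be sketched as a toy placement function. This is an illustrative simplification, not Alibaba Cloud's actual scheduler: the node names, fields, and the "pack onto the most utilized feasible node" policy are all assumptions chosen to show how small heterogeneous nodes can be reused efficiently.

```python
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    arch: str       # e.g. "x86_64" or "arm64" -- edge resources are heterogeneous
    region: str
    capacity: int   # total vCPUs contributed to the pool
    used: int = 0

def schedule(nodes, arch, region, vcpus):
    """Pick the feasible node with the highest current utilization
    (bin-packing style), so small edge nodes are filled before new
    ones are touched and overall reuse stays high."""
    feasible = [n for n in nodes
                if n.arch == arch and n.region == region
                and n.capacity - n.used >= vcpus]
    if not feasible:
        return None  # no nearby capacity; a real system might fall back to the center
    best = max(feasible, key=lambda n: n.used / n.capacity)
    best.used += vcpus
    return best

# Hypothetical pooled nodes in one region.
nodes = [EdgeNode("hz-edge-1", "x86_64", "hangzhou", 32, used=20),
         EdgeNode("hz-edge-2", "x86_64", "hangzhou", 16, used=2),
         EdgeNode("hz-arm-1",  "arm64",  "hangzhou", 64)]

n = schedule(nodes, "x86_64", "hangzhou", 8)
print(n.name)  # hz-edge-1: the most utilized node that still fits the request
```

A production scheduler would also weigh network position, resource performance, and traffic policies, but the core trade-off (spread for coverage, pack for reuse) is the one shown here.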

From the perspective of technical capability, edge cloud infrastructure lets cloud native capabilities sink applications directly to the edge. Besides providing disaster recovery, security isolation, autonomy, and architecture awareness on par with the center, it must also continuously improve high-speed cloud-edge and edge-edge channels, which raises the difficulty of construction.

Even when resources and technical capabilities are in place, maintaining a consistent user experience remains a challenge. Sinking business from the center is a long process, and for a single business the center and the edge may coexist for a long time. Capability gaps between the center and the edge cloud are likely to exist, yet most of them should be imperceptible to users. Packaging products so that cost, function, performance, stability, and other aspects deliver a consistent cloud-edge experience is therefore very challenging.

Ali Cloud edge cloud system construction

Relying on 2,800+ edge cloud nodes around the world, Alibaba Cloud provides users with secure, stable, and reliable edge computing and content delivery acceleration services, building the edge cloud infrastructure closest to users. A single node is a small IDC, ranging from a few to dozens of servers. In the early stage, edge cloud nodes were built separately from CDN nodes, which prevented resource sharing and left service gaps. The current strategy is to promote fused CDN-on-ENS resource production, integrating edge computing resources; after fusion, time-sharing reuse of resources becomes far more feasible.


As the most mature edge cloud application scenario, CDN has undergone a long technical architecture evolution, and its hardware and software infrastructure can be reused for edge cloud technology. The origin site is usually a server built by the enterprise, whose scale and performance are limited compared with the central cloud. As the business grows and faces massive client requests, without a CDN the enterprise can only keep adding resources; otherwise server responses may time out or the service may collapse outright. Through multi-level caching and global DNS scheduling, a CDN lets users access the resources they need nearby (especially static resources such as images and videos), avoiding excessive pressure on origin bandwidth and servers. Because users in different regions can access it conveniently, CDN can be considered to have the typical edge cloud characteristics of low latency and large global bandwidth. The monitoring, data intelligence, and configuration management systems supporting CDN already handle standard edge-scale data distribution, processing, and interaction with the center, and will gradually evolve into the standard supporting systems of edge cloud native.
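The multi-level caching described above can be sketched as a chain of cache tiers: an edge node asks a regional cache on a miss, the regional cache asks the origin, and each tier populates itself on the way back down. This is a minimal illustration with hypothetical tier names, not the architecture of any specific CDN product.

```python
class CacheTier:
    """One cache level; `parent` is the next tier toward the origin."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.store = name, parent, {}

    def get(self, url, fetch_origin):
        if url in self.store:                 # cache hit at this tier
            return self.store[url], self.name
        if self.parent:                       # miss: ask the upper tier
            body, hit_tier = self.parent.get(url, fetch_origin)
        else:                                 # top tier: go back to the origin
            body, hit_tier = fetch_origin(url), "origin"
        self.store[url] = body                # populate caches on the way down
        return body, hit_tier

origin_calls = []
def fetch_origin(url):
    """Stand-in for the enterprise's origin server."""
    origin_calls.append(url)
    return f"<content of {url}>"

regional = CacheTier("regional-cache")
edge = CacheTier("edge-node", parent=regional)

edge.get("/img/logo.png", fetch_origin)       # first request walks up to the origin
body, tier = edge.get("/img/logo.png", fetch_origin)
print(tier, len(origin_calls))                # edge-node 1: repeats never reach the origin
```

The point of the sketch is the bandwidth math: however many users request the object, the origin serves it once per cache hierarchy rather than once per user.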


According to the capability model of Alibaba Cloud's edge cloud: on the resource side, heterogeneous resources (traditional physical machines, co-built cloud nodes, IoT/MEC devices, ARM array servers, etc.) are pooled into the cloud, and an edge cloud node operating system virtualizes compute, storage, and network resources. On top of this, standard container/Kubernetes cloud native capabilities are combined to build modular capabilities, extended to edge standards and contributed back to the community. These include business-oriented, network-wide application lifecycle management, orchestration, and release capabilities; correspondingly, Alibaba Cloud defines the edge CRD operator EdgeWorkload and OAM orchestration extension capabilities. Platform administration capabilities such as multi-cluster management, tenant isolation, and metadata management also need to be customized for the edge's massive data and user volumes. In addition, with large numbers of distributed heterogeneous resources at the edge, maximizing resource utilization depends on a global container scheduler combined with service-related global traffic scheduling and distribution policies. Elastic scaling (HPA/VPA) is likewise adapted for edge scenarios.
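For the elastic-scaling piece, the standard Kubernetes HPA rule gives a feel for how replica counts track load: desired replicas scale with the ratio of the observed metric to its target. The function below is a minimal sketch of that rule, not Alibaba Cloud's edge-specific autoscaler; the metric values are invented for illustration.

```python
import math

def desired_replicas(current, current_metric, target_metric, max_replicas):
    """Minimal HPA-style rule: replicas = ceil(current * observed/target),
    clamped to [1, max_replicas]. E.g. CPU at 90% with a 60% target
    means the workload needs roughly 1.5x its current replicas."""
    ratio = current_metric / target_metric
    return max(1, min(max_replicas, math.ceil(current * ratio)))

# 4 replicas running at 90% CPU against a 60% target -> scale out.
print(desired_replicas(4, current_metric=90, target_metric=60, max_replicas=10))  # 6
```

An edge-aware variant would apply this per node or per region rather than per cluster, since capacity on any single edge node is small and traffic must stay near the user.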

Alibaba Cloud has resources all over the world, so its heterogeneous resource management modules need regional planning strategies and planned access, centered on a model of central control + edge autonomy + multi-level caching.

Considering the architectural complexity of the edge cloud, the massive number of nodes, and the differences among heterogeneous resources, Alibaba Cloud improves system stability by continuously improving system observability and strengthening DevOps operation and maintenance capabilities.

At the same time, Alibaba Cloud's edge cloud native stack has technical advantages such as wide coverage with heterogeneous fusion, a consistent cloud-edge experience, compatibility with standard cloud native, and computing power mobility across all regions.

Typical edge cloud service applications


The early CDN node architecture was planned and deployed mainly around resources: 2 LVS machines plus no more than 4 control machines, with the rest as cache machines. This plan-first deployment mode left many resources idle and wasted construction cost. The full rollout of CDN-on-ENS edge fusion computing can greatly improve resource utilization efficiency.

Moving intelligent terminals to the cloud is an important scenario for large-scale IoT device access in the future, and it involves the typical edge coordination of global container scheduling and traffic scheduling. The central management controller applies for resources based on the estimated number of users, connects clusters, and deploys containers on edge nodes. When a user requests a connection, an idle edge container is obtained from the central controller according to predefined traffic scheduling policies, and the user device is bound to the serving container. When the user disconnects, the container is destroyed and rebuilt for subsequent use by other services, avoiding data leakage. The central controller also dynamically scales containers in real time based on core indicators such as concurrent requests.
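The bind / destroy-on-disconnect / scale flow above can be sketched as a small controller. This is a hypothetical model of the described behavior, not Alibaba Cloud's actual control plane; class and container names are invented for illustration.

```python
import uuid

class CentralController:
    """Toy central controller: pre-warms idle containers, binds one per
    user session, and destroys-and-rebuilds on disconnect so a container
    is never reused across users (avoiding data leakage)."""
    def __init__(self, prewarm=2):
        self.idle, self.bound = [], {}
        self.scale_out(prewarm)          # pre-provision for estimated users

    def scale_out(self, n):
        for _ in range(n):
            self.idle.append(f"ctr-{uuid.uuid4().hex[:8]}")

    def connect(self, user):
        if not self.idle:                # reactive scale-out under pressure
            self.scale_out(1)
        ctr = self.idle.pop()
        self.bound[user] = ctr
        return ctr

    def disconnect(self, user):
        old = self.bound.pop(user)       # destroy the user's container...
        self.scale_out(1)                # ...and rebuild a fresh one in its place
        return old

cc = CentralController()
c1 = cc.connect("alice")
cc.disconnect("alice")
assert c1 not in cc.idle   # the old container is never handed to another user
```

A real controller would additionally track per-node placement and drive scaling from concurrency metrics, but the session lifecycle (bind, serve, destroy, replace) is the part that protects user data.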

The number of large central regions is limited. When customers are latency-sensitive, services are preferentially deployed on nearby edge nodes, which process customer requests. To ensure a consistent cloud-edge experience, the service control system needs to obtain service data from both the center and the edge, then distribute traffic according to user requests. This reduces central bandwidth cost and resource usage while improving user experience.
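The edge-first-with-central-fallback routing decision can be sketched as follows. The site names, health flags, and slot counts are invented for illustration; a real system would use measured latency and the global traffic scheduling policies mentioned earlier.

```python
def route(request_region, edge_sites, central_region="center"):
    """Prefer a healthy edge site in the user's region with free capacity;
    fall back to the central cloud otherwise, so the request is always
    served even when nearby edge capacity is exhausted or unhealthy."""
    site = edge_sites.get(request_region)
    if site and site["healthy"] and site["free_slots"] > 0:
        site["free_slots"] -= 1
        return site["name"]
    return central_region

edge_sites = {
    "shanghai": {"name": "sh-edge-1", "healthy": True,  "free_slots": 1},
    "beijing":  {"name": "bj-edge-1", "healthy": False, "free_slots": 5},
}

print(route("shanghai", edge_sites))  # sh-edge-1 (nearby edge serves the request)
print(route("shanghai", edge_sites))  # center (edge capacity exhausted)
print(route("beijing", edge_sites))   # center (edge site unhealthy)
```

The fallback path is what keeps the cloud-edge experience consistent: the user always gets a response, and the edge simply absorbs as much latency-sensitive traffic as its capacity allows.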

Finally, Jiang Cen said that Alibaba Cloud's edge cloud native technology will continue to improve scheduling, resources, collaboration, and other capabilities, providing the best cloud native application experience for industry customers and partners and jointly creating innovative edge cloud applications.