Summary: With a traditional microservice stack, users have to build many components themselves: a PaaS layer, microservice frameworks, observability components, and the operation and maintenance of IaaS and K8s. SAE provides a holistic solution covering all of these aspects, so that users only need to focus on their own business systems, which greatly lowers the barrier to adopting microservice technology. Author | Chen Tao

Advantages and pain points of a microservices architecture

1. Background of the microservices architecture

In the early days of the Internet, the Web 1.0 era, portal websites dominated, monolithic applications were the mainstream, and R&D teams were relatively small. The main challenges then were technical complexity and a shortage of engineers. In the Internet era of the new century, large-scale applications such as social networking and e-commerce appeared, traffic and business complexity increased significantly, and R&D teams of hundreds or even thousands of people emerged. SOA was a product of this era, with distribution and service splitting at its core, but because the ESB was a single-point component, SOA never became truly widespread. Technologies Alibaba launched at the time, such as HSF and the open-source Dubbo, were in fact distributed solutions of a similar kind, and the ideas behind the microservices architecture were already present.

The term "microservices architecture" was formally coined in the mobile Internet era. By then life had moved fully online and all kinds of lifestyle apps had emerged; compared with the previous era, the number of users and the volume of traffic had grown dramatically, and even larger R&D teams had become the norm. Efficiency became a general demand rather than something only a few giants needed. The introduction of the microservices architecture, and the popularity of frameworks such as Spring Cloud and Dubbo, greatly accelerated the adoption of microservice technology.

We have now entered a comprehensively digital era in which society as a whole runs on the Internet, and all kinds of organizations, including governments, enterprises, and relatively traditional institutions, need strong R&D capabilities. Growing traffic, business complexity, and R&D team sizes have raised the demands on efficiency even further, and the microservices architecture has been promoted and popularized accordingly.

After so many years of development, the microservices architecture has proven to be an enduring technology. Why has it continued to thrive?

2. Advantages of microservices architecture

Let's review the differences between a microservices architecture and a monolithic architecture, as well as the core benefits of the former.

The core problem of a monolithic architecture is that its conflict domain is too large. The shared codebase is especially prone to conflicts during development, and unclear boundaries and module sizes further reduce team productivity.

The core of the microservices architecture is decoupling, at both development time and deployment time, which greatly unlocks a team's R&D efficiency. This is one of the reasons the microservices architecture has remained so vital.

3. Pain points in the microservices era

According to the law of conservation of complexity, when we solve a problem it reappears in another form, which we then have to solve as well. The microservices era introduced many pain points, and the core one is stability. Once local calls become remote calls, the number of potential points of instability surges, including call amplification: a problem in a low-level remote call can destabilize the layers above it. Rate limiting and degradation, call-chain tracing, and similar mechanisms therefore become necessary.

The complexity of locating a problem has also grown exponentially in the microservices era, creating the need for service governance. And without good design and some forethought, the number of microservice applications can explode, bringing collaboration problems between developers and testers.
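To make the rate-limiting-and-degradation idea concrete, here is a minimal, hypothetical sketch (not SAE's implementation; production systems would use a governance framework such as Sentinel, and all names here are illustrative): when a downstream call exceeds a concurrency budget, the caller returns a degraded fallback instead of amplifying the failure upward.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Crude concurrency limiter with a degradation fallback (illustrative only).
public class LimiterSketch {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;

    public LimiterSketch(int limit) { this.limit = limit; }

    // Run the remote call if under the limit, otherwise return a degraded result.
    public String call(Supplier<String> remote, String fallback) {
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet();
            return fallback;            // degrade instead of cascading the overload
        }
        try { return remote.get(); }
        finally { inFlight.decrementAndGet(); }
    }

    public static void main(String[] args) {
        LimiterSketch limiter = new LimiterSketch(1);
        System.out.println(limiter.call(() -> "ok", "degraded"));
    }
}
```

The same shape generalizes to token buckets or sliding windows; the point is that the protection sits at the call boundary, which is exactly the boundary that turning local calls into remote calls created.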

After so many years of microservices development, the industry has in fact produced some solutions.

As shown above, to make good use of microservice technology, in addition to developing your own business system you may need to build multiple supporting systems: CI/CD and release systems, an R&D workflow, microservice components and related tooling, and observability, including real-time monitoring, alerting, service governance, and call-chain tracing. You also need to operate and maintain the underlying IaaS resources, and in this era that probably means maintaining a K8s cluster yourself as well.

Against this background, many enterprises set up an operations team or a middleware team, or have some back-end engineers handle it part-time. But consider: how many companies are actually satisfied with the systems they have built internally? How quickly do those systems iterate? Have they stepped into open-source pitfalls, and have those pitfalls been resolved? These remain constant pain points in the minds of enterprise CTOs and architects.

Solutions in the Serverless era

1. The Serverless era

Serverless was first proposed in 2012 and briefly reached the peak of its influence after the launch of a blockbuster product, Lambda, in 2014. However, when such a new concept was dropped suddenly into real, complex production environments, there was plenty it could not yet handle and plenty that needed improvement, so over the following years it entered a trough.

Nevertheless, the Serverless philosophy of "leave the simple to the user and the complex to the platform" points in the right direction, so the open-source community and the industry have continued to explore and develop Serverless.

Aliyun launched Function Compute (FC) in 2017 and the Serverless application engine SAE in 2018. In 2019 and the years since, Aliyun has continued to invest in the Serverless field, adding support for image-based deployment, reserved capacity, microservice scenarios, and more.

2. Serverless market overview

In Forrester's latest assessment, in 2021, Aliyun's Serverless product capability ranked first in China and among the leaders worldwide, and Aliyun also has the largest share of Serverless users in China. This indicates that Aliyun Serverless is increasingly entering real enterprise production environments, and that more and more enterprises recognize the capability and value of Serverless and of Aliyun Serverless.

3. SAE solutions

As we have seen, under the traditional microservices architecture, enterprises need to build a great deal themselves to make good use of microservice technology. How, then, does SAE solve this problem in the Serverless era?

SAE takes the Serverless concept to the extreme: it hosts not only the IaaS resources but also the K8s layer above them, and it further integrates a white-screen (fully graphical) PaaS, an enterprise-grade microservices suite, and an observability suite. All of these are well integrated into the overall SAE solution, providing users with an out-of-the-box microservices experience that makes it easy for enterprises and developers to use microservices.

1. Zero-threshold PaaS

As the figure shows, at the top SAE provides users with a white-screen (graphical) operations console. Its design closely follows that of a typical enterprise PaaS or release system, as well as some open-source PaaS systems, which greatly lowers the threshold for enterprises to adopt SAE, even to zero. It also incorporates some of Alibaba's release best practices, namely the "three axes" of releasing: observability, gray release, and rollback.

It also provides some enterprise-grade enhancements, including namespace-based environment isolation and fine-grained permission control. As the figure shows, if an enterprise has two modules that are relatively independent of each other, they can be isolated from each other with namespaces.

2. Enhanced microservice governance

For microservice governance enhancement, especially for the Java language, SAE uses a Java Agent, which is non-intrusive, imperceptible to users, and requires no upgrades on their part. The Agent is fully compatible with the open-source ecosystem, so with almost no code changes users gain graceful (lossless) service online/offline, API management, rate limiting and degradation, call-chain tracing, and more.
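The non-intrusive mechanism here is the standard JVM agent hook: logic loaded via `-javaagent` runs before the application's `main` method, so governance capabilities can be woven into classes as they load, without any code change. The sketch below illustrates only that mechanism; the class name and behavior are illustrative, not SAE's actual agent.

```java
import java.lang.instrument.Instrumentation;

// Minimal sketch of the JVM agent hook that non-intrusive governance relies on.
public class GovernanceAgent {
    // Invoked by the JVM before the application's main when started with
    // -javaagent:agent.jar; a real agent would register a ClassFileTransformer
    // here to weave in rate limiting, tracing, and graceful on/offline hooks.
    public static void premain(String args, Instrumentation inst) {
        System.out.println("governance agent attached, classes loaded so far: "
                + inst.getAllLoadedClasses().length);
    }

    // Stand-in entry point so the sketch compiles and runs on its own.
    public static void main(String[] args) {
        System.out.println(describe());
    }

    static String describe() {
        return "attach with: java -javaagent:agent.jar -jar app.jar";
    }
}
```

Because the agent sits below the application, upgrading governance capabilities never requires rebuilding or redeploying user code, which is exactly the "zero upgrade" property described above.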

3. Front-to-back full-link gray release

Two capabilities deserve expansion here. The first is front-to-back full-link gray release. Using the Agent technology mentioned above, SAE threads the gray tag through the entire link, from the Web request to the gateway to the consumer to the provider; with a very simple white-screen (console) configuration, users can implement a gray release scenario. Anyone who has had to build this themselves knows how complex it is.
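The core idea of full-link gray release can be sketched as follows: a "gray" tag attached to the request is re-propagated at every hop, and each hop selects gray or stable instances by that tag. This is a hypothetical toy model, not SAE's routing API; in practice the tag rides in an HTTP header or RPC attachment and the agent handles propagation.

```java
import java.util.List;

// Toy model of tag-based routing at one hop of a full-link gray release.
public class GrayRouter {
    record Instance(String addr, boolean gray) {}

    // Pick candidate instances for this hop based on the propagated tag.
    static List<Instance> route(List<Instance> all, String tag) {
        boolean wantGray = "gray".equals(tag);
        return all.stream().filter(i -> i.gray() == wantGray).toList();
    }

    public static void main(String[] args) {
        List<Instance> all = List.of(
                new Instance("10.0.0.1:8080", false),
                new Instance("10.0.0.2:8080", true));
        // The tag would normally arrive in a request header and be re-attached
        // to every downstream call, so the whole chain stays in the gray lane.
        System.out.println(route(all, "gray"));
    }
}
```

The hard part in real systems is not this filter but guaranteeing the tag survives every hop, thread switch, and middleware boundary, which is why doing it in an agent rather than in business code matters.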

4. Cloud Toolkit local-cloud joint debugging

The second capability is Cloud Toolkit's local-cloud joint debugging. As we all know, the number of applications tends to explode under a microservices architecture. With so many applications, how can a developer working locally connect to a service in the cloud safely and conveniently? With Cloud Toolkit, users can easily reach the cloud environment from their local machine for end-to-end joint debugging, greatly lowering the threshold for development and testing.

5. Powerful application monitoring & diagnosis

In a microservices scenario, locating a problem is very complex, because microservices diverge rapidly and call links grow enormously. SAE integrates Aliyun's various observability products, including Prometheus, SLS, IaaS-level and basic monitoring, and provides rich Tracing, Logging, and Metrics capabilities: request call-chain queries, metric analysis for common diagnostic scenarios, basic monitoring, real-time logs, event notification, and so on. This can greatly reduce the day-to-day burden of problem localization that enterprises face when operating microservices.
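On the Tracing side, the mechanism that makes call-chain queries possible is simple in principle: a trace ID generated at the entry point is carried through every hop, so the platform can reassemble the chain afterward. The following is a toy model of that idea, not the API of any Aliyun observability product.

```java
import java.util.UUID;

// Toy model of trace-ID propagation across a call chain.
public class TraceSketch {
    record Span(String traceId, String service, String op) {}

    // The entry point mints a new trace ID.
    static Span entrySpan(String service, String op) {
        return new Span(UUID.randomUUID().toString(), service, op);
    }

    // Every downstream hop reuses the parent's trace ID.
    static Span childSpan(Span parent, String service, String op) {
        return new Span(parent.traceId(), service, op);
    }

    public static void main(String[] args) {
        Span root = entrySpan("gateway", "GET /order");
        Span child = childSpan(root, "order-service", "createOrder");
        System.out.println(root.traceId().equals(child.traceId())); // same chain
    }
}
```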

SAE technical principles and extreme elastic construction

I have already covered three parts: zero-threshold PaaS, the enterprise microservices suite, and observability. Now let us turn to one of the core Serverless modules: O&M-free IaaS and the construction of elastic capability.

1. SAE business architecture

From SAE's business architecture diagram it is fairly clear that with SAE, users do not need to care about IaaS resources such as storage and network. SAE also hosts the K8s component of the PaaS layer, so users do not need to operate and maintain K8s themselves. On top of the K8s layer, SAE provides enhanced capabilities for microservice governance, application lifecycle management, and more. In terms of elasticity, SAE can scale out in as little as 15 seconds, which in many enterprise scenarios helps developers cope with traffic spikes. In addition, through multiple environments and some best practices, costs can be reduced and efficiency increased.

2. SAE technical architecture

So how does SAE make IaaS and K8s resources something the user does not have to manage?

As the figure above shows, the foundation of SAE is secure container technology. Compared with Docker's runC, a secure container provides virtual-machine-level security isolation. In the runC scenario the kernel is shared, so on a public cloud product user A could potentially break out into one of user B's containers, creating a security risk. Secure container technology, i.e. virtual-machine-grade security technology, achieves production-grade security isolation, and secure containers have joined the K8s and container ecosystems. Combining secure containers with the container ecosystem thus strikes a good balance between security and efficiency.

In addition, for storage and network isolation, SAE must consider not only the network isolation of traditional K8s, but also the fact that on a public cloud most users already have substantial storage and network resources that need to be connected.

SAE uses the cloud's ENI (elastic network interface) technology to attach the ENI directly to the secure sandbox, so users get both compute-layer isolation and network-layer connectivity.

The mainstream secure container technologies today include Kata, Firecracker, gVisor, and so on. SAE adopts Kata, the earliest and most mature of these, to achieve compute-level security isolation. Moreover, secure containers provide not only security isolation but also performance isolation and fault isolation.

For example, in a runC shared-kernel scenario, one user's container can trigger a kernel failure that directly affects the whole physical machine. With SAE's secure containers there is no such risk: at most, the failure affects that one secure container.

3. Extreme elasticity and extreme cost

As the figure below shows, if elastic efficiency is pushed to the extreme, the user's cost can also be driven to the extreme. Comparing the left and right figures, you can see the effect elasticity can have on user cost.

1. SAE extreme elasticity construction: deployment & restart

What has SAE done for elasticity? The traditional K8s Pod creation process goes through scheduling, init container creation, pulling the user image, creating the user container, starting the user container, and running the application. This conforms to K8s design concepts and specifications, but in production it does not meet enterprise requirements in scenarios that demand higher efficiency. SAE adopted the in-place upgrade strategy of CloneSet, an Alibaba open-source component: instead of rebuilding the entire Pod, only the container inside is rebuilt, eliminating the scheduling and init container creation steps and improving deployment efficiency by 42%.

2. SAE extreme elasticity construction: elastic scale-out

SAE also implements parallel scheduling with image pre-warming. In the standard flow, scheduling and pulling the user image are serial steps. The optimization here is that once the scheduler recognizes which physical machine the Pod will land on, it starts pulling the user image in parallel, which improves elastic efficiency by a further 30%.
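The serial-to-parallel change can be illustrated with a small simulation: once the target node is known, all image pulls are kicked off concurrently rather than awaited one after another. Names and the pull behavior are hypothetical; this is the shape of the optimization, not SAE's scheduler code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative simulation: start every image pull at once, then wait for all.
public class ParallelPrefetch {
    // Stand-in for an asynchronous image pull on the chosen node.
    static CompletableFuture<String> pullImage(String image) {
        return CompletableFuture.supplyAsync(() -> image + ":pulled");
    }

    static List<String> prefetch(List<String> images) {
        List<CompletableFuture<String>> pulls =
                images.stream().map(ParallelPrefetch::pullImage).toList();
        // Total latency is bounded by the slowest pull, not the sum of all pulls.
        return pulls.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        System.out.println(prefetch(List.of("app:v1", "sidecar:v2")));
    }
}
```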

3. SAE extreme elasticity construction: Java startup acceleration

In the application startup phase we have also done work to improve elastic efficiency. Java applications, for example, have long suffered from slow startup in Serverless scenarios; the core reason is that Java classes are loaded one by one, and in some enterprise applications loading thousands of classes can be quite slow.

SAE works with Alibaba's open-source Dragonwell to implement AppCDS: at the application's first startup the loaded classes are packed into an archive, and subsequent startups only need to load that archive, eliminating a large amount of serial class loading. This improved deployment efficiency by 45%.

4. SAE extreme elasticity construction: runtime enhancement

Finally, we also made elasticity enhancements at application runtime. Microservice applications often need to configure a very large number of threads, which normally map one-to-one onto underlying Linux threads; in high-concurrency scenarios this causes heavy thread-switching overhead. Using Alibaba's open-source Dragonwell and its WISP thread technology, SAE maps hundreds of upper-layer threads onto a dozen or so underlying threads, greatly reducing thread-switching overhead.

The figure above shows data from one of our stress tests. The red line uses Dragonwell with WISP, showing roughly a 20% improvement in runtime efficiency.
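The M:N idea behind WISP can be illustrated with the standard JDK: many logical tasks multiplexed onto a few OS threads. On Dragonwell, WISP does this transparently for ordinary `java.lang.Thread` code (enabled via a JVM flag); the small executor pool below is only an analogy for that model, not WISP itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy for the M:N threading model: 200 logical tasks on 8 carrier threads.
public class WispAnalogy {
    public static int runTasks(int tasks, int carriers) {
        ExecutorService pool = Executors.newFixedThreadPool(carriers); // few OS threads
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(done::incrementAndGet);                        // many logical tasks
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(200, 8));
    }
}
```

The difference is that WISP applies this multiplexing underneath existing thread-per-request code, so applications keep their familiar programming model while the kernel sees far fewer threads to switch between.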

The above covers the technical principles and results of SAE's work on IaaS hosting, K8s hosting, and elastic efficiency in the Serverless model.

Summary and Outlook

Traditionally, microservice users had to build many components themselves: a PaaS layer, microservice frameworks, observability components, and the operation and maintenance of IaaS and K8s. SAE provides a holistic solution covering all of these aspects, so that users only need to focus on their own business systems, which greatly lowers the barrier to adopting microservice technology.

SAE will also continue to build out the capabilities of each module, including:

• For zero-threshold PaaS, we will continue to integrate cloud products, including the CI/CD toolchain, and enhance enterprise-grade capabilities such as approval flows.

• For O&M-free Serverless operation and extreme elasticity, we will keep building out elastic capacity, elasticity metrics, and elastic efficiency, and will provide elasticity solutions such as AI-based prediction to reduce the mental burden on users when configuring elasticity metrics.

• For the microservices ecosystem, we will integrate further with the enterprise microservices suite, such as chaos engineering and remote debugging capabilities, to further lower the barrier to microservice technology.

Finally, on observability: since SAE effectively operates users' applications, observability is a very important capability for SAE as a platform itself, and we will continue to build monitoring and alerting here, including for capacity planning and gray release. Users also host their applications on SAE, so the product must lower the barrier for them as well, building out capabilities such as the application dashboard and event center.

This article is original content from Aliyun and may not be reproduced without permission.