Author | Ke-ming Fang (Creek Weng), Technical Expert, Alibaba Cloud Middleware Technology

Introduction: Cloud native has become the future-oriented technical infrastructure for the entire Alibaba economy. Service Mesh, as one of the key cloud-native technologies, successfully completed its production validation on the Double 11 core applications under harsh and complex conditions. In this article, the author shares the challenges we faced and overcame in reaching this goal.

Deployment architecture

Before getting into the topic, we need to explain the deployment architecture of the Double 11 core applications, as shown in the figure below. In this article, we focus on meshing the RPC traffic between Service A and Service B.

The figure shows the three planes that make up the Service Mesh: the data plane, the control plane, and the operations plane. The data plane is the open-source Envoy (the Sidecar in the figure above; note that the two terms are used interchangeably in this article), the control plane is the open-source Istio (of which only the Pilot component is currently used), and the operations plane is entirely self-developed.

Compared with our rollout six months ago, for the Double 11 core applications Pilot is now set up as an independent cluster rather than being deployed into the business pod alongside Envoy. With this change, the deployment of the control plane has reached the final form intended for the Service Mesh.

Challenges

The Double 11 core applications we selected are all implemented in Java. During the rollout we faced the following challenges.

1. How to mesh applications when the SDK cannot be upgraded

When we decided to mesh the Double 11 core applications, the version of the RPC SDK that the Java applications depend on had already been frozen, so there was no time to develop and roll out an RPC SDK tailored for the Mesh. The technical question facing the team at that time was: how do we mesh the RPC traffic without upgrading the SDK?

Those familiar with Istio know that Istio implements the Mesh by using the iptables NAT table to transparently intercept traffic and hijack it into Envoy. Unfortunately, the nf_conntrack kernel module required by the NAT table had been removed from Alibaba's production machines because of its poor efficiency, so the community solution could not be used directly. Fortunately, at the beginning of this year we established a collaboration with the Alibaba OS team, which took responsibility for the two foundational capabilities the Service Mesh needs: transparent traffic interception and network acceleration. After close collaboration between the two teams, the OS team worked out a transparent interception scheme that identifies traffic by user ID and packet mark, and implemented a new transparent interception component based on the iptables mangle table.
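To make the mark-based idea concrete, the Go sketch below (a minimal illustration, not the OS team's actual component) shows how a sidecar process could tag its own outbound connections with SO_MARK, so that mangle-table rules can recognize sidecar traffic by its mark and avoid intercepting it a second time; the mark value and the upstream address are made up for the example.

```go
// Minimal sketch (Linux-only): a sidecar marks its own outbound sockets with
// SO_MARK so that iptables mangle rules can recognize, and skip re-intercepting,
// traffic that already went through the sidecar.
// Setting SO_MARK requires CAP_NET_ADMIN; the mark value below is arbitrary.
package main

import (
	"fmt"
	"net"
	"syscall"
)

const sidecarMark = 0x539 // example value; must match the iptables mangle rules

func main() {
	dialer := &net.Dialer{
		// Control runs on the raw socket before connect(2), which is where
		// the mark has to be applied.
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET,
					syscall.SO_MARK, sidecarMark)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}

	conn, err := dialer.Dial("tcp", "127.0.0.1:12200") // upstream address is illustrative
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected with SO_MARK set; mangle rules can bypass this flow")
}
```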

The figure below illustrates how an RPC call flows when the transparent interception component is in place. Inbound traffic is incoming traffic (its receiver plays the Provider role), and outbound traffic is outgoing traffic (its sender plays the Consumer role). An application generally plays both roles at the same time, so inbound and outbound traffic coexist.

With the transparent interception component, meshing is completely invisible to the application, which greatly eases the rollout. Of course, since the RPC SDK still carries its original service discovery and routing logic, and the hijacked traffic passes through Envoy where the same work is done again, outbound calls pay an RT penalty for running service discovery and routing twice; this is reflected in the data section later. Clearly, in the final form of the Service Mesh, the service discovery and routing logic in the RPC SDK should be removed to save the corresponding CPU and memory cost.

2. Supporting the e-commerce business's complex service governance features in a short time

Routing

Routing requirements in Alibaba's e-commerce scenarios are diverse. Beyond routing strategies such as unitization and environment isolation, service routing also has to be decided by the method name, call parameters, and application name of an RPC request. Alibaba's internal Java RPC framework supports these routing policies through embedded Groovy scripts: business teams configure Groovy routing templates on the operations console, and the SDK executes them at call time to apply the routing policy.

Future iterations of the Service Mesh do not intend to offer the same flexibility as Groovy scripts for customizing routing policies, so as not to constrain the evolution of the Service Mesh itself. We therefore decided to use meshing as the opportunity to retire the Groovy scripts. By analyzing the scenarios in which the onboarded applications used them, we abstracted a cloud-native alternative: extend Istio's native VirtualService and DestinationRule CRDs with the routing configuration sections the RPC protocol needs, and express the routing policies there.
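As a rough illustration of what such an extension has to express (the sketch below is a hypothetical Go model, not the actual CRD schema; all field names and values are invented), an RPC-aware route match needs to carry the method name, selected call arguments, and the calling application, and a request is matched against the rules in order:

```go
// Hypothetical model of an RPC-aware route match, extending the idea of
// Istio's VirtualService match conditions with method / argument / caller
// application conditions. Field names and values are illustrative only.
package main

import "fmt"

// RPCRouteMatch describes one match condition of an extended VirtualService.
type RPCRouteMatch struct {
	Method    string         // RPC method name, e.g. "queryItem"
	Arguments map[int]string // expected values of selected positional arguments
	SourceApp string         // name of the calling application
	Subset    string         // DestinationRule subset to route to on a match
}

// RPCRequest is the per-call information available to the router.
type RPCRequest struct {
	Method    string
	Arguments []string
	SourceApp string
}

// selectSubset returns the subset of the first rule that matches the request,
// falling back to a default subset.
func selectSubset(rules []RPCRouteMatch, req RPCRequest) string {
	for _, r := range rules {
		if r.Method != "" && r.Method != req.Method {
			continue
		}
		if r.SourceApp != "" && r.SourceApp != req.SourceApp {
			continue
		}
		matched := true
		for idx, want := range r.Arguments {
			if idx >= len(req.Arguments) || req.Arguments[idx] != want {
				matched = false
				break
			}
		}
		if matched {
			return r.Subset
		}
	}
	return "default"
}

func main() {
	rules := []RPCRouteMatch{
		{Method: "queryItem", Arguments: map[int]string{0: "unit-sh"}, Subset: "unit-shanghai"},
		{SourceApp: "trade-center", Subset: "isolated-env"},
	}
	req := RPCRequest{Method: "queryItem", Arguments: []string{"unit-sh"}, SourceApp: "buy"}
	fmt.Println(selectSubset(rules, req)) // -> unit-shanghai
}
```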

At present, strategies such as unitization and environment isolation are customizations inside the standard Istio/Envoy routing module, which inevitably involves some hacks. In the future, we plan to design a WASM-based routing plug-in mechanism alongside Istio/Envoy's standard routing policies, so that these simpler routing policies can live as plug-ins. This both reduces the intrusion into the standard routing module and, to a certain extent, meets the business side's need to customize service routing. The proposed architecture is shown below:

Rate limiting

For performance reasons, Alibaba's internal Service Mesh solution does not adopt Istio's Mixer component; rate limiting is instead provided by Sentinel, a component already widely used inside Alibaba. This not only creates synergy with the open-sourced Sentinel, it also lowers the migration cost for Alibaba's internal users (their existing rate-limiting configurations work as-is). To ease integration with the Mesh, several teams collaborated to develop a C++ version of Sentinel. The whole rate-limiting feature is implemented through Envoy's Filter mechanism (a Filter is an independent module in the request-processing pipeline); we built the Filter on top of the Dubbo protocol, and every request is processed by the Sentinel Filter. The configuration needed for rate limiting is fetched from Nacos by Pilot and pushed to Envoy via the xDS protocol.
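Conceptually, the Sentinel Filter makes a per-resource flow-control decision for every request before letting it continue down the filter chain. The real component is the C++ Sentinel embedded in an Envoy filter; the Go sketch below only illustrates the idea with a simple fixed-window QPS check keyed by service and method, and the rule values are made up.

```go
// Conceptual sketch of a per-resource QPS check, as a stand-in for the kind of
// flow-control decision the Sentinel Filter makes for each request. The real
// component is a C++ Sentinel running inside an Envoy filter.
package main

import (
	"fmt"
	"sync"
	"time"
)

// qpsLimiter is a simple fixed-window counter per resource (service:method).
type qpsLimiter struct {
	mu     sync.Mutex
	window time.Time
	counts map[string]int
	maxQPS map[string]int // rule: allowed requests per second per resource
}

func newQPSLimiter(rules map[string]int) *qpsLimiter {
	return &qpsLimiter{
		window: time.Now().Truncate(time.Second),
		counts: make(map[string]int),
		maxQPS: rules,
	}
}

// Allow reports whether a request for the given resource may pass.
func (l *qpsLimiter) Allow(resource string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now().Truncate(time.Second)
	if now.After(l.window) { // new one-second window: reset counters
		l.window = now
		l.counts = make(map[string]int)
	}
	limit, ok := l.maxQPS[resource]
	if !ok {
		return true // no rule configured for this resource
	}
	if l.counts[resource] >= limit {
		return false // over the limit: reject the request
	}
	l.counts[resource]++
	return true
}

func main() {
	// Example rule (values invented): at most 2 calls/second to this method.
	limiter := newQPSLimiter(map[string]int{"com.example.ItemService:queryItem": 2})
	for i := 0; i < 3; i++ {
		fmt.Println(limiter.Allow("com.example.ItemService:queryItem"))
	}
	// Output: true true false
}
```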

3. Envoy's resource overhead is too high

One of the core problems Envoy was designed to solve is service observability, so Envoy has had a large number of stats (statistics) built in from the start to help observe services.

Envoy's stats are very fine-grained, going down to the per-IP level across all clusters. In Alibaba's environment, the Consumer and Provider services of some e-commerce applications add up to hundreds of thousands of IP addresses (each IP carries different meta information under different services, so the same IP is counted independently per service), which makes Envoy's memory overhead in this area quite high. To address this, we added a stats switch to Envoy that turns IP-level stats on or off; turning them off directly saved 30% of memory. Next we will adopt the community's stats symbol table solution to eliminate duplicated strings in stats names, which will reduce the memory overhead further.
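The symbol-table approach is essentially string interning: each stat name is stored as a sequence of small integer symbols that reference a shared dictionary of name components, so a component such as a cluster name or an IP is kept in memory only once no matter how many stats mention it. The Go sketch below illustrates the idea; it is a conceptual sketch, not Envoy's actual SymbolTable implementation.

```go
// Illustration of the stats symbol-table idea: stat names are encoded as
// symbol IDs that point into a shared dictionary of name components, so
// repeated components (cluster names, IPs, suffixes) are stored only once.
package main

import (
	"fmt"
	"strings"
)

type symbolTable struct {
	ids   map[string]uint32
	names []string
}

func newSymbolTable() *symbolTable {
	return &symbolTable{ids: make(map[string]uint32)}
}

// encode turns a dotted stat name into a slice of symbol IDs, interning
// each component the first time it is seen.
func (t *symbolTable) encode(statName string) []uint32 {
	parts := strings.Split(statName, ".")
	syms := make([]uint32, 0, len(parts))
	for _, p := range parts {
		id, ok := t.ids[p]
		if !ok {
			id = uint32(len(t.names))
			t.ids[p] = id
			t.names = append(t.names, p)
		}
		syms = append(syms, id)
	}
	return syms
}

// decode rebuilds the human-readable stat name from its symbols.
func (t *symbolTable) decode(syms []uint32) string {
	parts := make([]string, len(syms))
	for i, s := range syms {
		parts[i] = t.names[s]
	}
	return strings.Join(parts, ".")
}

func main() {
	t := newSymbolTable()
	a := t.encode("cluster.item-center.upstream_rq_total")
	b := t.encode("cluster.item-center.upstream_rq_time")
	// "cluster" and "item-center" are shared between the two stats.
	fmt.Println(t.decode(a), t.decode(b), "distinct components:", len(t.names))
}
```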

4. Decoupling the business from the infrastructure so the infrastructure can be upgraded transparently

One of the core values of adopting a Service Mesh is that the infrastructure and the business logic are fully decoupled and can evolve independently. To deliver this value, the Sidecar must support hot upgrades so that it can be upgraded without interrupting business traffic, which poses a considerable challenge to both solution design and technical implementation.

Our hot upgrade uses a dual-process scheme: a new Sidecar container is pulled up first, and the old Sidecar hands its run-time data over to the new one. Once the new Sidecar is ready to send and receive traffic and take over, the old Sidecar waits for a grace period and then exits, so service traffic is never disrupted. The core techniques are Unix Domain Sockets and the RPC framework's graceful node offlining. The following diagram outlines the key steps.
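To make the Unix Domain Socket part concrete, here is a minimal Go sketch of handing a listening socket from the old process to the new one via SCM_RIGHTS. It only illustrates the fd-passing step; the real hot-upgrade flow also transfers run-time data and drains in-flight connections as described above.

```go
// Minimal sketch of handing a listening socket from an old Sidecar process to
// a new one over a Unix Domain Socket using SCM_RIGHTS. The real hot-upgrade
// protocol additionally transfers run-time data and drains old connections.
package hotupgrade

import (
	"errors"
	"net"
	"os"
	"syscall"
)

// sendListener passes the file descriptor of a TCP listener to the peer
// connected on the Unix domain socket conn (run in the old process).
func sendListener(conn *net.UnixConn, ln *net.TCPListener) error {
	f, err := ln.File() // duplicate the listener's fd
	if err != nil {
		return err
	}
	defer f.Close()
	rights := syscall.UnixRights(int(f.Fd()))
	_, _, err = conn.WriteMsgUnix([]byte("listener"), rights, nil)
	return err
}

// recvListener receives a file descriptor from the peer and rebuilds a
// net.Listener from it (run in the new process).
func recvListener(conn *net.UnixConn) (net.Listener, error) {
	buf := make([]byte, 8)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one fd
	_, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
	if err != nil {
		return nil, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return nil, err
	}
	if len(msgs) == 0 {
		return nil, errors.New("no control message received")
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return nil, err
	}
	f := os.NewFile(uintptr(fds[0]), "inherited-listener")
	defer f.Close()
	return net.FileListener(f)
}
```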

Performance data

Publishing performance data can easily invite controversy and misunderstanding, because performance depends on many scenario variables: concurrency, QPS, payload size, and so on all have a decisive impact on the final numbers. For this very reason, Envoy's author Matt Klein has never officially published data like that shown in this article, for fear of it being misread. It is also important to note that, under severe time pressure, the Service Mesh we landed is neither optimal nor in its final form (for example, routing still happens twice on the Consumer side). We choose to share the data anyway so that more peers can see our progress and current state.

This article only lists data from one of the Double 11 core applications that went live. Looking at single-machine RT samples, on a machine with the Service Mesh deployed the average RT is 5.6 ms on the Provider side and 10.36 ms on the Consumer side. The RT of this machine around Double 11 midnight is shown in the figure below:

On a machine without the Service Mesh, the average RT is 5.34 ms on the Provider side and 9.31 ms on the Consumer side. The RT of this machine around Double 11 midnight is shown below:

Comparing the two, meshing adds 0.26 ms of RT on the Provider side and 1.05 ms on the Consumer side. Note that this difference includes all of the time spent between the business application and the Sidecar as well as inside the Sidecar itself. The figure below illustrates the hops in the call path that introduce the extra latency.

At the whole-application level, comparing the averages of all meshed machines against all non-meshed machines of this core application over the same period, the Provider-side RT increased by 0.52 ms after meshing and the Consumer-side RT increased by 1.63 ms.

As for CPU and memory overhead: after meshing, Envoy's CPU consumption stays at around 0.1 core across all core applications, with spikes whenever Pilot pushes data. Smoothing out these spikes will require incremental push between Pilot and Envoy in the future. The memory overhead varies greatly with an application's number of services and cluster size, and there appears to be considerable room to optimize Envoy's memory usage.

Judging from the data of all the core applications that went live for Double 11, the RT impact and CPU cost of introducing the Service Mesh are basically consistent across applications, while the memory cost varies greatly with service dependencies and cluster size.

Looking ahead

Riding the cloud-native wave, Alibaba is committed to building future-oriented technical infrastructure on top of this generation of technology. Along the way, we will follow the principle of "drawing from open source and giving back to open source", use open source to make the technology broadly accessible, and contribute to the wider adoption of cloud-native technology.

Next, our overall technical focus is:

  • Work with the Istio open-source community to strengthen Pilot's data push capability. The super-scale scenarios of Alibaba's Double 11 place extreme demands on Pilot's push capability, and we believe that pursuing this limit together with the open-source community will accelerate the joint construction of a de-facto global standard. Inside Alibaba, we have already completed the joint work with the Nacos team and will connect to Nacos through the community's MCP protocol, so that Alibaba's various open-source technology components can work together systematically.

  • Taking Istio and Envoy as the focus, further optimize their protocol and their respective management data structures, reducing memory overhead through more refined and better-designed data structures;

  • Focus on building the operations capabilities needed to run Sidecars at large scale, so that Sidecar upgrades support grayscale (canary) release, monitoring, and rollback;

  • Deliver the value of the Service Mesh: letting the business and the technical infrastructure evolve independently of each other, each with greater efficiency.

