Author | Liu Yun (alias: Zhijian), Senior Technical Expert at Alibaba Cloud

In 2019, the maturity of open source Service Mesh products did not change substantially worldwide, but there were some noteworthy events in China. For example, Alibaba rolled out a complete Service Mesh solution on some of its core e-commerce applications for Double 11, and used the demanding business scenarios of Double 11 to complete preliminary technical validation ahead of large-scale adoption. In this article, the author shares his observations and thoughts from putting Service Mesh into practice at Alibaba.

Is Service Mesh old wine in a new bottle?

Whenever a new technology emerges, its value proposition is bound to spark discussion, and Service Mesh is no exception.

In the past, skepticism about the value of Service Mesh fell into two main categories.

  • The first group has too few applications to reach a meaningful scale. Because Service Mesh adds operational and deployment complexity, they judge its cost and complexity to outweigh its benefits.

Fundamentally, this group does not really doubt the value of Service Mesh. Rather, because Service Mesh is not yet fully mature and widespread, it argues for adopting it at a suitable time in the future. There are exceptions, of course. Some external customers I have spoken with still want to use Service Mesh to solve service governance problems, such as distributed tracing, in non-Java programming languages (such as Go). Relatively mature solutions for these capabilities exist in the Java world, but they are rare outside it, so turning to Service Mesh is a natural idea.

  • The second group has a large number of applications and understands well the challenges of running distributed applications at scale, but over the course of its development has accumulated technologies comparable in capability to Service Mesh. As a result, when they first encountered Service Mesh, it looked like “old wine in a new bottle” and they did not recognize its value. Alibaba used to be in this camp.

Alibaba has spent more than a decade exploring end-to-end solutions for developing and governing distributed applications. That exploration has been continuously tested and refined in rigorous scenarios such as Double 11, and has produced a complete technology stack built on a single language, Java. Even so, coping with distributed applications at scale is still not easy. The difficulty shows up as a lack of top-level design and too little attention to the user experience of technical products, which leads to high operation and maintenance costs and a high technical barrier to entry. It was amid these difficulties that the concept of cloud native came clearly into view.

Cloud native holds that technology products should deliver a defined quality of service and good elasticity even in the most demanding scenarios. It also stresses that the products themselves should be easy to use, and that they should underpin the IT infrastructure enterprises will need for multi-cloud and hybrid cloud in the future (that is, help make distributed applications portable).

The concept of cloud native not only addresses Alibaba Group’s pressing pain points in technology development, but also matches Alibaba’s original intention of making cloud computing a group strategy and making it broadly beneficial to society. Against this background, Alibaba decided to go fully cloud native. Service Mesh, as one of the key technologies in the cloud native concept, is naturally part of that decision.

The value Service Mesh brings to Alibaba

The first change that Service Mesh brings is a shift in service governance from framework thinking to platform thinking.

This shift does not mean the latter negates the former; rather, the two combine so that each plays to its strengths. The biggest difference between the two ways of thinking is that platform thinking not only decouples applications from the technical infrastructure more cleanly, but also lets systematic top-level design take shape through the aggregation effect of a platform.

In execution, the shift from framework thinking to platform thinking takes the form of “lightweight” and “sinking”.

  • Lightweight means removing volatile features from the framework’s SDK. Applications that use the SDK become lighter, and the inefficiency of constantly upgrading them to keep up with those volatile features goes away. App developers are freed to focus entirely on the business logic itself;

  • Sinking means that the functionality removed from the framework is placed in the Sidecar of the Service Mesh.

As a platform technology, Service Mesh will be operated and provided by cloud vendors. Once a global de facto standard built through open source is adopted by all cloud vendors and offered as products, application portability follows naturally.

Alibaba also saw the value of function sinking while rolling out Service Mesh. Alibaba’s core e-commerce applications are almost all built with Java. Before meshing, RPC service discovery and routing were done inside the SDK. To protect the consumer experience during traffic peaks such as Double 11, pushes of service address changes were degraded through contingency plans, so that frequent pushes would not trigger Full GC in the application process. After meshing, this SDK functionality moves into a separate process, the Sidecar (developed in C++), which frees the Java applications entirely from the Full GC problems that arise in such scenarios.
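The before-and-after described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Alibaba’s implementation: the names Sidecar, ThinClient, on_address_push, and route are hypothetical, round-robin is chosen only for simplicity, and the Sidecar is modeled as an in-process object even though the real one is a separate process.

```python
class Sidecar:
    """Stand-in for the out-of-process proxy that owns discovery and routing."""

    def __init__(self) -> None:
        self._endpoints = {}  # service name -> list of addresses
        self._cursor = {}     # service name -> next round-robin position

    def on_address_push(self, service: str, addresses: list) -> None:
        # Address-change pushes land here, not in the application's heap,
        # so frequent pushes no longer create garbage in the application.
        self._endpoints[service] = list(addresses)
        self._cursor[service] = 0

    def route(self, service: str) -> str:
        # Hypothetical policy: plain round-robin over the current addresses.
        addresses = self._endpoints.get(service)
        if not addresses:
            raise LookupError(f"no endpoints known for {service!r}")
        index = self._cursor[service] % len(addresses)
        self._cursor[service] = index + 1
        return addresses[index]


class ThinClient:
    """What remains of the SDK: it only knows how to reach the local Sidecar."""

    def __init__(self, sidecar: Sidecar) -> None:
        self._sidecar = sidecar

    def call(self, service: str, payload: str) -> str:
        # The application never sees the address list or its updates.
        target = self._sidecar.route(service)
        return f"{payload} -> {target}"
```

In this sketch the application holds no address list at all; an address push replaces state only inside the Sidecar, which is the property the Full GC discussion relies on.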

The quality of a software design shows mainly in two words: “concept” and “relationship”.

For systems with the same functionality, different ways of shaping and dividing up concepts produce completely different designs, and even affect the engineering quality and efficiency of the final software product. Once the concepts are established, so are the relationships between them, and the quality of a relationship is reflected in its degree of decoupling. Service Mesh makes the relationship between applications and the technical infrastructure looser and more stable. Hot upgrades that do not disrupt traffic let applications and the technical infrastructure evolve independently, which speeds up the evolution of both. The real risk for software is not being immature or imperfect, but evolving too slowly and being too heavy.

Alibaba came to appreciate the great engineering value of loose coupling while rolling out Service Mesh. Once an application is meshed, subsequent upgrades of the technical infrastructure become transparent to it, and the human coordination previously needed for upgrades can be replaced entirely by technical means. In addition, the application process used to contain both business logic and base technology functions, so it was hard to say clearly how much computing resource each consumed. With Service Mesh, the base technology runs in an independent process, so its consumption can be isolated and quantified. Only with quantified results can the technology be optimized further.

The second change that Service Mesh brings is a shift in technology platform construction from a single programming language to multiple programming languages.

For start-ups and small enterprises, developing business applications in a single programming language has obvious advantages: everyone masters the same technology stack, which makes collaboration efficient. But once an enterprise grows into a larger, more diversified and cross-domain stage, demand for more programming languages arises. This is especially true for a cloud vendor like Alibaba, whose cloud offerings cannot place heavy constraints on the programming languages its customers use. The reason behind the demand for multiple programming languages is that each language has its own strengths and scope of application, and those strengths are needed to accelerate exploration and innovation. Technically, this shift means:

  • First, the capabilities of the technology platform should be exposed as services wherever possible. Incomplete servitization forces the introduction of an SDK, which in turn creates the multi-language problem (a programming language cannot be used because no SDK exists for it).
  • Second, where an SDK cannot be avoided, the best way to support SDKs in multiple programming languages is to define them with an IDL (interface definition language), keeping each SDK light and stable enough to reduce the engineering cost of platformization and multi-language support.
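As a rough illustration of the second point, the sketch below derives a thin client stub from a language-neutral interface description. Everything in it is an assumption made for illustration: the dictionary named IDL stands in for a real interface definition language, ConfigService and its methods are invented, and make_stub is a hypothetical helper rather than part of any SDK mentioned in this article.

```python
import json
from typing import Callable

# Hypothetical, language-neutral description of one platform service.
IDL = {
    "service": "ConfigService",
    "methods": {"get": ["key"], "put": ["key", "value"]},
}


def make_stub(idl: dict, transport: Callable[[str], str]):
    """Build a thin client whose methods are derived entirely from the IDL.

    The stub only serializes requests and hands them to `transport`
    (for example, a call to the local Sidecar). It carries no logic of its own.
    """

    def make_method(name: str, params: list):
        def method(self, *args):
            if len(args) != len(params):
                raise TypeError(f"{name}() expects arguments {params}")
            request = {
                "service": idl["service"],
                "method": name,
                "args": dict(zip(params, args)),
            }
            return json.loads(transport(json.dumps(request)))

        return method

    namespace = {
        name: make_method(name, params)
        for name, params in idl["methods"].items()
    }
    stub_class = type(idl["service"] + "Stub", (), namespace)
    return stub_class()
```

Because the stub contains nothing but serialization, the same description could drive equally small generated stubs in other languages, which is what keeps the per-language cost low.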

At the organizational level, this shift means the platform technology team itself needs skills in multiple programming languages. It is hard for a single-language team to build a technology platform for multiple languages, not only because its perspective is narrow, but also because it cannot “eat its own dog food” and feel the pain of multi-language development first-hand.

Opportunities for Service Mesh

With these two changes in mind, let’s talk about the opportunities that Service Mesh offers.

  • First, Service Mesh creates an opportunity to build a developer-centric distributed application development platform for the future.

Before Service Mesh, the various technology products for distributed service governance were developed without a strong lever for pulling them together through systematic, cross-cutting design and full capability reuse. Inconsistent abstractions and reinvented wheels were the unavoidable result, and each product ended up with its own set of concepts and its own operations console. It is easy to overlook how difficult and inefficient it is for developers to be handed multiple consoles and left to work out each product’s concepts and how they relate to one another.

Essentially, the emergence of Service Mesh addresses the complexities that lie between applications in microservices software architectures. Its emergence brings all the governance issues of distributed applications under one roof. In other words, with the advent of Service Mesh, we have the opportunity to do a global design for the governance of distributed applications, and to integrate various technical products without duplication.

The distributed application development platform of the future will have to be built on Service Mesh as its foundational technology. We therefore need to take this opportunity to reshape the developer’s mental model around ease of use. An ease-of-use mindset means developers do as little as possible on a single operations console; shielding them from the implementation details behind it lowers the mental burden of use and reduces the likelihood of incidents caused by operational errors.

In theory, this could have been done without Service Mesh, but in practice it never took hold because there was no concrete cross-cutting technology to act as the lever.

  • Second, Service Mesh creates an opportunity for other technology products to rethink themselves for the cloud native era.

With Service Mesh, many previously independent technology products (for example, the service registry, the messaging system, and the configuration center) become Backend as a Service (BaaS). The Sidecar of the Service Mesh is responsible for connecting to them. Applications access these services through the Sidecar, and some BaaS services are even terminated at the Sidecar and are completely invisible to the application.

These changes do not diminish the importance of those BaaS services. What matters instead is integrating them well with the Service Mesh to serve applications, while exploring enhancements to their capabilities. For example, not every BaaS service can support, as is, the grayscale release of application versions that Service Mesh supports (including blue-green release, canary release, and A/B testing); some will need to be modified accordingly. Note that this is mainly about the grayscale release of the application, not of the BaaS service itself. Of course, this does not stop us from exploring how Service Mesh could make grayscale release of the BaaS service itself easy and low-risk.
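For readers unfamiliar with grayscale release, here is a minimal sketch of the kind of decision a Sidecar might make for a canary rollout. It is an assumption-laden illustration: pick_version, the "stable" and "canary" labels, and hashing the user ID with SHA-256 are all choices made for this example, not a description of how Istio, Envoy, or Alibaba implement traffic splitting.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Deterministically assign a user to the 'canary' or 'stable' version.

    The same user always lands in the same bucket, so a user does not
    flip between versions from one request to the next.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

A request that reaches a canary instance may then touch a BaaS service such as a message queue. Presumably that service has to honor the same version label for the rollout to hold end to end, which is one way to read the modifications the paragraph above calls for.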

The competitive advantage of many future technology products will be their seamless integration with Service Mesh.

The core driver of seamless integration is users’ need for technology products that are easy to use and for applications that are portable. With this in mind, Alibaba is moving the heavy logic of the RocketMQ/MetaQ messaging client out of the client and into the Envoy Sidecar (again the “sinking” idea), and is making technical changes based on the capabilities Service Mesh provides so that RocketMQ/MetaQ can properly support grayscale release of applications. This kind of thinking and action will appear in more technology products in the future.

  • Third, Service Mesh provides an opportunity to explore how the technical infrastructure can work better with business base technology.

Each business (e-commerce, for example) builds a technology base for its own domain, which we call business base technology. When Alibaba wants to offer the base technology of a particular business to external customers, the question is how to deliver it as a service that works with whatever programming language the customer has chosen. Otherwise, a business base technology built on Java is hard to use together with an application written in Go.

As Service Mesh solves the servitization problem, it is worth exploring whether the capabilities of business base technology can be “extended” onto the Service Mesh as plug-ins through some technical means. When a business base technology exists as a plug-in, it does not need to run as a separate process, which gives better performance, and the mechanism can be reused by different businesses. Envoy, the Sidecar software used in Alibaba’s Service Mesh solution, is actively exploring a Wasm-based plug-in mechanism for traffic processing; evolving it further into a plug-in mechanism for business base technology is worth exploring.

The following figure illustrates the plug-in mechanism for business base technology.

The two colors in the figure represent different businesses (for example, one represents e-commerce and the other logistics). The base technology of the two businesses is not developed as two independent applications (processes) that are then released, operated, and maintained separately. Instead, each is implemented as a business technology plug-in in any programming language that Wasm supports. This can be understood as servitizing business technology in a multi-language way, without being forced to use the same programming language as the Sidecar. Plug-ins are managed through the operation and maintenance platform of the Service Mesh, which provides installation, grayscale release, upgrade, and monitoring capabilities.

Because plug-ins “grow” on the Service Mesh, turning a business technology into a plug-in is the same as turning it into a service.

In addition, Service Mesh needs to let the application developers or operators of a business choose which plug-ins they want on their machines (think of a plug-in marketplace). Another point to note is that the Service Mesh platform provides the operation and management capabilities for plug-ins and certain quality assurance measures, but responsibility for operating, managing, and assuring the quality of each plug-in lies with its provider. This division of labor spares the Service Mesh platform the inefficiency of looking after the quality of every plug-in; divide and conquer remains a sound way to improve the efficiency of many engineering efforts.
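The division of responsibility described above can be sketched as a small plug-in catalog. This is a toy model under assumptions: PluginRegistry and its methods are hypothetical names, the plug-in and team names in the usage note are invented, and nothing here reflects an actual Envoy, Wasm, or Alibaba API.

```python
class PluginRegistry:
    """Toy plug-in marketplace: the platform runs it, providers own the plug-ins."""

    def __init__(self) -> None:
        self._catalog = {}  # plug-in name -> {"owner": ..., "version": ...}
        self._enabled = {}  # application name -> set of plug-in names

    def publish(self, name: str, owner: str, version: str) -> None:
        # The provider publishes (or upgrades) a plug-in and remains its owner,
        # and therefore the party accountable for its quality.
        self._catalog[name] = {"owner": owner, "version": version}

    def enable(self, app: str, name: str) -> None:
        # The application's developer or operator opts in to a plug-in.
        if name not in self._catalog:
            raise KeyError(f"unknown plug-in {name!r}")
        self._enabled.setdefault(app, set()).add(name)

    def plugins_for(self, app: str) -> list:
        # What the platform would install next to this application.
        return sorted(self._enabled.get(app, set()))

    def owner_of(self, name: str) -> str:
        # Who to contact when the plug-in misbehaves.
        return self._catalog[name]["owner"]
```

For example, an e-commerce team could publish a plug-in named "trade-base" and a logistics team one named "route-base"; an application then enables only the ones it needs, and owner_of tells the platform whom to contact when one of them fails.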

  • Finally, Service Mesh opens the door to exploring future-oriented, holistic technology solutions for multi-site active-active deployment and always-online applications.

Interconnecting services, and controlling, observing, and securing service traffic, are the key problems to solve in a microservices architecture. They bear directly on the availability and security of services at scale. In the future, there is a great deal that can be done to improve the efficiency of application release and operations through the traffic control capabilities of Service Mesh, and we will see a truly dynamic cloud platform.

The “Trinity” approach to Service Mesh

As a supplier of cloud computing technology, Alibaba, in exploring Service Mesh, is considering not only how to realize the dividends of cloud native technology within Alibaba, but also how to bring those dividends to more Aliyun customers. For this reason, Alibaba’s overall approach to developing Service Mesh follows a “trinity”: the technology used inside Alibaba, the corresponding commercial products on Aliyun, and the open source software all share the same codebase.

In our experience, Aliyun customers go out of their way to adopt technology solutions that are not tied to a specific cloud vendor, so that they are not locked in and their future development is not hindered. Moreover, only by adopting open source, de facto standard software can they realize an enterprise multi-cloud and hybrid cloud strategy. Because of this demand, we make a particular point of taking part in building open source de facto standards as we develop Service Mesh. In both the Istio and Envoy open source projects, we are committed to contributing back to the community some of the improvements we have made internally.

In the future, we will continue to explore Service Mesh, and we will continue to share our findings and thoughts with you.

