Introduction: In the FaaS scenario, cost and R&D efficiency are the most attractive factors for users. Cost savings come mainly from on-demand allocation and extreme elasticity. Application developers look to FaaS for a multi-language programming environment and better development efficiency, including faster startup, faster releases, and higher coding efficiency.

Author | Cao Shengli

What is a Service Mesh?

SOA architecture has been popular among medium and large Internet companies since 2010, and Alibaba open-sourced Dubbo in 2012. Microservice architecture then took off, and a large number of Internet and traditional enterprises invested in building microservices. In China, Dubbo and Spring Cloud gradually formed two microservice camps. In 2016, a more forward-looking microservice solution, aligned with containers and Kubernetes, emerged: Service Mesh. Today the concept of Service Mesh is widely accepted, and many companies are active in the Service Mesh field.

Service Mesh definition

A Service Mesh is an infrastructure layer that handles communication between services, enabling reliable request delivery across the complex service topology of a cloud-native application. A Service Mesh runs in Sidecar mode: an independent Service Mesh process runs next to the application and takes over communication with remote services. A military sidecar motorcycle is a good analogy for the Service Mesh: one soldier drives while the other fires.

Pain points Service Mesh solves

Traditional microservice architecture is mostly built on an RPC communication framework whose SDK provides service registration/discovery, service routing, load balancing, full-link tracing, and other capabilities. Running business logic and the RPC SDK in the same process brings many challenges: middleware code intrudes into business code, creating high coupling, and upgrading the RPC SDK is very costly, which leads to SDK version fragmentation. This approach also demands a lot from application developers, such as rich service-governance and operations skills plus middleware background knowledge, raising the barrier to using middleware.

Service Mesh sinks some RPC capabilities into the infrastructure, achieving separation of concerns and clear responsibility boundaries. With the development of containers and Kubernetes, Service Mesh has become part of the cloud-native infrastructure.

Istio introduction

Istio is the undisputed king of the Service Mesh world. It consists of a control plane and a data plane. In a Service Mesh, services communicate with each other through proxy Sidecars. Istio's core function is traffic management, coordinated between the data plane and the control plane. Istio was initiated by Google, IBM, and Lyft; it has the purest lineage in the Service Mesh corner of the CNCF landscape and is expected to become the de facto standard for Service Mesh.

Istio's default data plane is Envoy, the community's default and best-regarded data plane. The interface protocol between Istio's control plane and data plane is xDS.

Service Mesh summary

Finally, summarize the Service Mesh:

  • Service Mesh is positioned as the infrastructure for communication between services; the community mainly supports RPC and HTTP.
  • With the Sidecar deployment mode, it can run on Kubernetes and on VMs.
  • A Service Mesh forwards traffic in its original protocol, which is why it is also called a network proxy. This is precisely what makes zero intrusion into the application possible.

What is Dapr?

Challenges encountered by Service Mesh

Users deploy services on the cloud either as ordinary applications or as FaaS functions. In the FaaS scenario, cost and R&D efficiency are the most attractive factors for users. Cost savings come mainly from on-demand allocation and extreme elasticity. Application developers look to FaaS for a multi-language programming environment and better development efficiency, including faster startup, faster releases, and higher coding efficiency.

In essence, a Service Mesh forwards traffic in its original protocol, which gives it the advantage of zero intrusion into the application. However, original-protocol forwarding also brings problems. The middleware SDK on the application side still has to implement serialization and codecs, so supporting multiple languages carries a real cost. And as open-source technologies keep evolving, migrating from Spring Cloud to Dubbo, for example, requires either that application developers switch the SDK they depend on, or that the Service Mesh performs protocol translation, which is expensive.

Service Mesh focuses on communication between services and offers very little support for other kinds of Mesh. Envoy, for instance, has only really succeeded in RPC, despite attempts in fields such as Redis and messaging. Ant Group's MOSN supports integrating RPC and messaging. The need for a unified multi-Mesh form exists, but individual Mesh products evolve independently, lacking abstraction and standards. Should the many Mesh forms share one process? If they share a process, should they share a port? Many such questions remain unanswered. As for the control plane, its functions mostly revolve around traffic: looking at the contents of the xDS protocol, the core is service discovery and routing. Other kinds of distributed capabilities are largely absent from the Service Mesh control plane, let alone an abstraction over xDS-like protocols to support them.

As FaaS gains adoption for its cost and R&D-efficiency benefits, demand grows for multi-language support and API friendliness, but Service Mesh provides no additional value to customers in these areas.

Requirements for distributed applications

Bilgin Ibryam is the author of Kubernetes Patterns and the lead middleware architect at Red Hat, and he is very active in the Apache community. He published an article abstracting the difficulties and problems of today's distributed applications, dividing their requirements into four broad categories: lifecycle, network, state, and binding. Each category contains sub-capabilities, such as the classic middleware capabilities of point-to-point messaging, pub/sub, and caching. Applications require so many distributed capabilities that Service Mesh clearly cannot meet all of their needs. Bilgin Ibryam also proposed the concept of Multiple Runtime in his article to resolve the dilemma of Service Mesh.

Multiple Runtime derivation

In the traditional middleware model, the application and its distributed capabilities are integrated in one process by means of an SDK. As infrastructure sinks, distributed capabilities move out of the application and into the infrastructure. For example, Kubernetes handles lifecycle-related requirements, while Istio and Knative take on other distributed capabilities. Moving each of these capabilities into its own separate runtime is unacceptable from both an operational and a resource perspective, so the runtimes need to be consolidated, ideally into a single one. This approach is called Mecha, as in the Japanese anime where the hero pilots a mecha: each component of the mecha is like a distributed capability, and the pilot inside corresponds to the main application, also called the Micrologic Runtime. The two runtimes can be deployed one-to-one in Sidecar mode, which is ideal for traditional applications, or many-to-one in Node mode, which suits edge scenarios or network-management scenarios.

So the goal, a Mecha runtime that integrates distributed capabilities, is not itself controversial; the question is how to achieve it. What are the requirements for a Mecha?

  1. Mecha’s components are abstract, so that any open-source product can be quickly extended and integrated.
  2. Mecha must be configurable: capabilities can be configured and activated with YAML/JSON, and these file formats should align with the mainstream cloud-native approach.
  3. Mecha provides a standard API, and all communication between the main application and Mecha is based on this API rather than original-protocol forwarding, which greatly simplifies component extension and SDK maintenance.
  4. Among the distributed capabilities, lifecycle can largely be delegated to the underlying infrastructure such as Kubernetes, though some complex scenarios may require K8s, the application, and the Mecha runtime to cooperate.

So why is it called Multiple Runtime if only one runtime is left? Because the application itself is also a runtime; together with the Mecha runtime, there are at least two.

Dapr introduction

The Multiple Runtime described above is abstract; Dapr makes it concrete. Dapr is a faithful practitioner of Multiple Runtime, so it must coexist with the application, in either Sidecar mode or Node mode. The name Dapr is an acronym for Distributed Application Runtime. Dapr's icon is a hat, specifically a waiter's hat, signifying good service for applications.

Dapr was open-sourced by Microsoft, with Alibaba as a deeply involved partner. Dapr has since released version 1.1 and is approaching production readiness.

Since Dapr is the best practitioner of Multiple Runtime, its operating mechanism is built on the Multiple Runtime concept. Dapr abstracts distributed capabilities and defines a set of APIs for them, built on HTTP and gRPC; in Dapr this abstraction is called a Building Block. To let different products, open source and commercial alike, extend distributed capabilities in Dapr, there is an internal SPI extension mechanism called Components. With Dapr, application developers simply program against the distributed-capability APIs without worrying about the concrete implementation, and Dapr activates the corresponding components according to YAML files.
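As a sketch of what this capability-oriented programming looks like, the snippet below builds requests against Dapr's state HTTP API (POST/GET under `/v1.0/state`). It assumes the default sidecar port 3500, and `statestore` is a hypothetical component name; a real application would send these requests with an HTTP client or use a Dapr SDK.

```python
import json

# Default Dapr sidecar HTTP port; "statestore" is a hypothetical component name.
DAPR_PORT = 3500
BASE = f"http://localhost:{DAPR_PORT}/v1.0"

def save_state_request(store: str, key: str, value) -> tuple:
    """Build (method, url, body) for Dapr's state save API: POST /v1.0/state/<store>."""
    body = json.dumps([{"key": key, "value": value}])
    return ("POST", f"{BASE}/state/{store}", body)

def get_state_request(store: str, key: str) -> tuple:
    """Build (method, url, body) for Dapr's state get API: GET /v1.0/state/<store>/<key>."""
    return ("GET", f"{BASE}/state/{store}/{key}", None)

# In a real app these would be sent with an HTTP client, e.g.
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
method, url, body = save_state_request("statestore", "order-1", {"qty": 2})
print(method, url)  # → POST http://localhost:3500/v1.0/state/statestore
```

The application only knows "save state" and "get state"; whether the store behind `statestore` is Redis, Cassandra, or a cloud product is decided by the activated component.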

Dapr features

Application developers use Dapr SDKs in a variety of languages to access distributed capabilities directly; they can also make calls over plain HTTP or gRPC. Dapr runs in most environments: your own machine, any Kubernetes cluster, edge computing scenarios, or cloud vendors such as Aliyun, AWS, and GCP.

The Dapr community has integrated 70+ component implementations, so application developers can quickly pick and use them. Components with similar capabilities can be swapped inside Dapr, transparently to the application.

Dapr core module

Let’s look at Dapr module by module to see why Dapr is a good practice of Multiple Runtime.

The Component mechanism ensures that capabilities can be extended rapidly; the community now has more than 70 implementations, including not only open-source products but also commercial products on the cloud.

Building Blocks currently support only 7 distributed capabilities; more will be needed in the future. Building Blocks are exposed over HTTP and gRPC, two open and very popular protocols. Dapr relies on YAML files to determine which Components are activated under a Building Block. Because Dapr exposes capabilities over HTTP and gRPC, supporting standard API programming interfaces in multiple languages on the application side becomes easier.

Dapr Core: Component & Building Block

Dapr Component is the core of Dapr's plug-in extension mechanism, its SPI. Currently supported Components include Bindings, Pub/Sub, Middleware, Service Discovery, Secret Stores, and State. Some extension points are functional (Bindings, Pub/Sub, State, etc.) and some are horizontal (Middleware). To integrate Redis into Dapr, all you need to do is implement the Dapr State Component. Dapr Building Block is the capability Dapr provides, supporting gRPC and HTTP; it currently includes Service Invocation, State, Pub/Sub, and more.

A Building Block is composed of one or more Components; the Bindings building block, for example, comprises the Bindings and Middleware components.
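As a concrete illustration of the Component mechanism, a Redis-backed State component is activated with a YAML file like the following sketch (`statestore` is an arbitrary example name; the fields follow Dapr's v1alpha1 component schema):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore        # the name the application's state API calls refer to
spec:
  type: state.redis       # which Component implementation to activate
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
```

Switching the backing store means changing `spec.type` and its metadata; the application-facing name, and therefore the application code, stays the same.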

Dapr overall architecture

Dapr, like Istio, has a data plane and a control plane. The control plane contains Actor Placement, Sidecar Injector, Sentry, and Operator. Actor Placement serves Actors, Sentry handles security and certificates, and Sidecar Injector is responsible for injecting the Dapr Sidecar. In Dapr, activating a component implementation is done through a YAML file, which can be supplied in two ways: specified locally as a runtime parameter, or through the control-plane Operator, which stores the activated components as K8s CRDs and delivers them to the Dapr Sidecar. The two core control-plane components depend on K8s to run. The current Dapr Dashboard is still weak and unlikely to be significantly enhanced in the short term; after integrating the various components, their operation and maintenance must still be done in their original consoles, since the Dapr control plane does not participate in operating the concrete component implementations.

By default Dapr runs in the same Pod as the application but in a separate container. The rest of Dapr has been covered above, so we will leave it here.

Dapr Microsoft landing scene

Dapr has been under development for about two years. How has it landed inside Microsoft?

There are two Dapr-related projects on GitHub: Workflows and Azure Functions Dapr Extensions. Azure Logic Apps is Microsoft’s cloud-based automated workflow platform, and Workflows integrates Azure Logic Apps with Dapr. Among the key concepts of Azure Logic Apps, Trigger and Connector fit Dapr well. A Trigger can be implemented with Dapr Input Binding, and the large number of component implementations behind Dapr Input Binding expands the types of traffic entry. A Connector matches Dapr’s Output Binding or Service Invocation well, giving quick access to external resources. Azure Functions Dapr Extensions provide Dapr support built on Azure Function Extensions, letting Azure Functions quickly use Dapr’s Building Block capabilities while giving Function developers a relatively simple and consistent multi-language programming experience.

The Azure API Management Service scenario differs from the two above: it assumes the application is already attached to a Dapr Sidecar and exposes its services through Dapr. In this case, if non-K8s applications or applications in other clusters want to access the current cluster's services, a gateway is needed that directly exposes Dapr's capabilities and adds security and permission controls. Currently three Building Blocks are supported: Service Invocation, Pub/Sub, and Resource Bindings.

Dapr summary

Dapr provides capability-oriented APIs that give developers a consistent programming experience across languages, while the SDKs for these APIs remain relatively lightweight. These traits suit the FaaS scenario well. As Dapr's integration ecosystem keeps improving, the advantage of capability-oriented programming will grow further: a Dapr component implementation can be replaced easily without any code changes by developers, provided the old and new implementations offer the same type of distributed capability.

Dapr vs. Service Mesh:

Capabilities: Service Mesh focuses on service invocation; Dapr provides a wider range of distributed capabilities, covering multiple distributed primitives.

Working principle: Service Mesh uses original-protocol forwarding to achieve zero intrusion; Dapr uses multi-language SDKs plus a standard API plus various distributed capabilities.

Target domain: Service Mesh offers friendly, non-intrusive upgrades for traditional microservices; Dapr provides a more user-friendly programming experience for application developers.

Ali’s exploration on Dapr

Ali’s path in Dapr

In October 2019, Microsoft open-sourced Dapr with version 0.1.0. Alibaba and Microsoft learned about each other's interest in Dapr through their existing cooperation on OAM, and Alibaba began evaluating it. At the beginning of 2020, Alibaba and Microsoft had a round of in-person discussions about Dapr, covering Microsoft's views on it, its investment, and its development plans. By then Alibaba had concluded that Dapr had great value, and in mid-2020 work on Dapr began. In October, Dapr started serving a gray-released portion of production traffic in the Function Compute scenario. Today the gray release of all Function Compute features that involve Dapr is basically complete and open for public beta. In February 2021, Dapr 1.0 was finally released.

Aliyun function computing integrated Dapr

Besides operational benefits such as extreme elasticity, Function Compute differs from middle-platform applications in that it pays more attention to developer experience and overall R&D efficiency. The value Dapr brings to Function Compute is a unified, capability-oriented, multi-language programming interface, with no need for developers to study specific products. For example, to use Aliyun's OSS service from Java, you would normally introduce a Maven dependency and write OSS-specific code; with Dapr, you just call the Binding method of the Dapr SDK, which is convenient to program against, and the deployable package carries no redundant dependencies and stays manageable.
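The contrast with a vendor SDK can be sketched as follows: uploading an object becomes a generic call to Dapr's bindings HTTP API (`POST /v1.0/bindings/<name>`). The component name `oss-binding` and the metadata key are illustrative assumptions; the request shape follows Dapr's bindings API.

```python
import json

def binding_request(name: str, operation: str, data, metadata=None) -> tuple:
    """Build (method, url, body) for a Dapr output binding invocation.

    "oss-binding" below is a hypothetical component name; the payload
    shape ({"operation", "data", "metadata"}) is Dapr's bindings API.
    """
    body = json.dumps({
        "operation": operation,    # e.g. "create" to write an object
        "data": data,
        "metadata": metadata or {},
    })
    return ("POST", f"http://localhost:3500/v1.0/bindings/{name}", body)

# Uploading a file to OSS becomes a generic binding call; swapping OSS for
# another storage product only changes the component YAML, not this code.
method, url, body = binding_request("oss-binding", "create",
                                    "hello", {"key": "greeting.txt"})
```

The same call shape works for any output binding, which is exactly the "no vendor SDK in the application" benefit described above.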

Alibaba Cloud's function service is called Function Compute, or FC for short. The FC architecture comprises several systems, including the Function Compute Gateway and the environment in which functions run. The FC Gateway receives traffic and scales function instances up or down based on traffic volume and current CPU and memory usage. The function runtime environment is deployed in a Pod, with the function instance in the primary container and Dapr in the sidecar container. When external traffic reaches a function's service, it first arrives at the Gateway, which forwards it to a function instance serving that content. If the instance then needs an external resource, it initiates the call through Dapr's multi-language SDK; the SDK issues a gRPC request to the Dapr instance, which selects the corresponding capability and component implementation according to the request type and body, and then calls the external resource.

In the Service Mesh scenario, the Mesh exists as a Sidecar, deployed as a second container in the same Pod as the application, which satisfies Service Mesh requirements. In the Function Compute scenario, however, running Dapr as an independent container costs too many resources, and multiple function instances are themselves deployed in one Pod to save cost and achieve second-level elasticity. Therefore, in Function Compute, the function instance and the Dapr process are deployed in the same container, as two processes.

In Function Compute you can set a number of reserved instances, the minimum number of instances for a function. If reserved instances receive no traffic for a long time, they are put into a pause/sleep state, consistent with AWS. When a function hibernates, the processes or threads in the instance must stop running. An Extension mechanism was added to the function runtime to schedule Dapr's lifecycle: when the function instance goes to sleep, the Extension notifies Dapr to sleep; when the instance runs again, the Extension notifies Dapr to restore its previous running state. Component implementations inside Dapr need to support this kind of lifecycle management. Take Dubbo as an example: Dubbo's registry, Nacos, requires clients to send periodic heartbeats to the Nacos server to stay alive, and a Dapr-integrated Dubbo consumer likewise needs heartbeats toward the Dubbo provider. On entering the paused state the heartbeats must stop; on resuming, the whole healthy state must be restored.
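The pause/resume contract described above can be sketched as follows. This is a hypothetical illustration, not the real Dapr or Nacos API: a component that keeps a registry heartbeat stops it on pause and restores it on resume, driven by the runtime Extension.

```python
# Hypothetical lifecycle interface; the names LifecycleAware and
# HeartbeatComponent are illustrative, not part of Dapr itself.
class LifecycleAware:
    def pause(self): ...
    def resume(self): ...

class HeartbeatComponent(LifecycleAware):
    """A component that must keep a heartbeat alive, e.g. toward Nacos."""

    def __init__(self):
        self.heartbeat_running = True  # stands in for a real heartbeat loop

    def pause(self):
        # Called by the runtime Extension when the instance hibernates:
        # stop sending heartbeats so the registry sees the instance as gone.
        self.heartbeat_running = False

    def resume(self):
        # Called when the instance serves traffic again: restore the
        # previous healthy state, including the heartbeat.
        self.heartbeat_running = True

# The Extension would invoke pause()/resume() around hibernation:
component = HeartbeatComponent()
component.pause()   # instance goes to sleep
component.resume()  # instance wakes up
```

The point is that every component implementation with background activity (heartbeats, pollers, consumers) needs both hooks, or a hibernated instance would keep registry sessions alive incorrectly.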

The Function Compute + Dapr combination above is driven by external traffic through the Gateway; what about asynchronous traffic? Can message traffic flow directly to Dapr without passing through the Gateway? To achieve this, the Dapr Sidecar must report performance data to the Gateway in time, so that the Gateway can still drive resource elasticity.

SaaS business on the cloud

As Alibaba incubates more and more SaaS businesses internally, the demand to serve external customers is very strong. SaaS businesses strongly need multi-cloud deployment: customers expect a SaaS product to run on Alibaba Cloud's public cloud or on Huawei's private cloud, and they expect the underlying technologies to be open source or standard commercial offerings from cloud vendors.

Take one Alibaba SaaS business moving to the cloud as an example: on the left is the original internal Alibaba system, on the right the system after transformation. The goal is to switch dependencies on internal systems to open-source software: Alibaba's internal RPC to Dubbo, and the internal Cache, Message, and Config systems to Redis, RocketMQ, and Nacos, respectively. Dapr is expected to make this switch at minimal cost.

Since we want Dapr to accomplish this, the simplest, bluntest way is to make the application depend on the Dapr SDK directly. But that is too expensive a transformation, so instead we keep the original APIs unchanged and adapt their underlying implementations to the Dapr SDK. Applications can then reach Dapr through the original API by merely upgrading the version of the dependent JAR. After the transformation, developers still program against the original SDK, but underneath it has been replaced with Dapr's capability-oriented programming, so one codebase serves the whole migration, with no separate branches per cloud environment or technology. Internally, the Dapr Sidecar activates component implementations with rpc.yaml, cache.yaml, msg.yaml, and config.yaml; on the public cloud, dubbo.yaml, redis.yaml, rocketmq.yaml, and so on are used instead. Shielding component implementations by activating different YAML files in different environments makes the SaaS multi-cloud deployment pattern easy.
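A minimal sketch of the YAML switching described above, assuming Dapr's v1alpha1 component schema: the application always publishes to the component named `message`, while each environment activates a different backing type (`pubsub.internal-mq` is a placeholder for the internal implementation; `pubsub.rocketmq` is a community component type):

```yaml
# msg.yaml — internal deployment (type is a placeholder for the internal MQ)
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: message            # the name the application code binds to
spec:
  type: pubsub.internal-mq
  version: v1
---
# rocketmq.yaml — public cloud: same component name, different backend
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: message
spec:
  type: pubsub.rocketmq
  version: v1
```

Because only the activated YAML differs between environments, the application binary and its configuration of "which component name to call" never change.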

DingTalk is an important Dapr partner and promoter, working with the cloud-native team to bring Dapr to production. By sinking some middleware capabilities into the Dapr Sidecar, the underlying implementations of similar capabilities are shielded from applications. DingTalk also has its own business pain point: its common business components are tightly bound to particular businesses and need per-business customization, which keeps reusability low. DingTalk therefore expects to sink some business-component capabilities into Dapr as well, so that different businesses share the same programming experience while the component maintainer only maintains the Components implementation.

Dapr outlook

Infrastructure sinking becomes a software trend

The history of software architecture is fascinating, and reviewing the evolution of Alibaba's systems helps one understand the development of software architecture in China and even worldwide. When Taobao was founded, it was a monolithic application; as the business grew, the system was first scaled up by upgrading hardware. That approach soon ran into all kinds of problems, so service-oriented architecture was introduced in 2008. SOA made the system distributed, which, for stability and observability, required high-availability mechanisms such as circuit breaking, isolation, and full-link monitoring. The next problem was how to reach 99.99%+ availability SLAs at the machine-room and IDC level, which led to solutions such as dual data centers in the same city and multi-site active-active across regions. As cloud technology matured, Alibaba embraced and helped guide cloud-native technology, carrying out a K8s-based cloud-native upgrade.

From this history we can see that software architecture keeps generating new demands that the underlying infrastructure could not satisfy, so they could only be met by rich SDKs on the application side. After K8s and containers became the standard, microservices and some distributed capabilities returned to the infrastructure. The future trend is the sinking of distributed capabilities, represented by Service Mesh and Dapr, releasing the dividends of cloud and cloud-native technology.

Appeal of application developers in cloud native scenarios

Future application developers should be able to enjoy a development experience that is capability-oriented, transparent, and not tied to any specific cloud vendor or technology, while gaining the cost advantage of extreme elasticity from cloud technology. I believe this ideal can be reached one day. From today's vantage point, how do we get there?

  1. The Multiple Runtime concept must truly take off and keep growing;
  2. Taking Dapr as an example, its distributed-capability APIs should be promoted into an industry standard, and that standard must keep evolving;
  3. The continued development of K8s and Serverless technology will make extreme elasticity achievable.

Dapr Community Direction

Finally, a look at Dapr’s community development:

  1. Promote API standardization and integrate more distributed capabilities;
  2. Integrate more Components and improve the Dapr ecosystem;
  3. Bring in more companies, expand the product's boundaries, and polish Dapr to production readiness;
  4. Join the CNCF and become the de facto standard for cloud-native Multiple Runtime.


This article is original content from Alibaba Cloud and may not be reproduced without permission.