On February 23 (Beijing time), at IstioCon 2021, the first community summit for Istio, Zhang Chaomeng, chief architect of Huawei Cloud Application Service Mesh, delivered the keynote "Best Practice: From Spring Cloud to Istio", sharing practical examples of Istio in production.

Talk introduction

istio.io/istiocon-20…

The following is the full text of the speech.

Hello everyone, I am an engineer from Huawei Cloud. I am honored to have the opportunity to share with you a practical example of Istio being used in production.

Huawei Cloud Application Service Mesh was launched on the public cloud in 2018. As one of the earliest mesh services in the world, it has witnessed the journey from the early exploration of and experimentation with service mesh to today's large-scale use. As we serve more and more customers, the scenarios grow more complex. Most of the resulting general-purpose features have been contributed back to the Istio community, and we would like to take this opportunity to exchange our solution-level practice with everyone.

The topic I have chosen this time is from Spring Cloud to Istio: a case of combining and migrating our customers' Spring Cloud projects to Istio.

The speech mainly consists of four parts:

1) Background

2) Problems encountered using the Spring Cloud microservices framework

3) Solutions

4) Use examples to describe the practical details of the solution

Background

Let's start from microservices. The advantages of microservices are obvious, but so is the complexity they add to the overall system. Once a monolithic system becomes distributed, it faces network problems: service discovery (how services find and reach their peers), fault tolerance and protection for network access, and so on. Even something as simple as locating a problem, which used to be done from the call stack in a log, now requires distributed tracing. How do we address these challenges brought by microservices?

The microservices SDK used to be the common solution: encapsulate the general-purpose capabilities in a development framework, have developers write their business code on top of it, and the resulting microservices naturally carry those capabilities built in. For a long time this form was the standard for microservice governance, so much so that beginners thought these SDKs were what microservices meant.

The service mesh provides governance capabilities in a different form. Unlike the SDK approach, service governance capabilities are provided by a separate proxy process, completely decoupled from development. The difference may sound small, but when we compare the two concepts in terms of architecture and real-world cases, we will see that the former is a development framework while the latter is infrastructure.

Spring Cloud is the most influential representative of the SDK form. Spring Cloud provides a toolset for building distributed applications, as shown in the list. Most developers are familiar with its microservices-related projects, such as Eureka for service registration and discovery, Config for configuration management, Ribbon for load balancing, Hystrix for circuit breaking and fault tolerance, Sleuth for tracing instrumentation, and Zuul or Spring Cloud Gateway as the gateway. In this talk, Spring Cloud refers specifically to Spring Cloud's microservices development suite.

The most influential project in the service mesh space is Istio. This Istio architecture diagram will appear frequently during this talk. As background, it is enough to know that the architecture consists of a control plane and a data plane: the control plane manages the services in the mesh and the rules configured for them, while on the data plane the inbound and outbound traffic of each service is intercepted by a data plane proxy deployed alongside the service's Pod, which performs the traffic management actions.

In addition to the architecture, as another piece of background, let's pick two basic features and look at them a little more closely to see the similarities and differences in design and implementation. The first is service discovery and load balancing.

On the left is Spring Cloud. All microservices first register themselves with the registry, usually Eureka, and service discovery is then based on the registered data.

Istio, on the right, does not require a separate registration step; it simply fetches the service-to-instance mapping from the underlying platform, Kubernetes. During a service call, the data plane proxy, Envoy, intercepts the traffic and selects a target instance to send the request to. In both cases, client-side load balancing is performed based on service discovery data; the difference lies in where the discovery data comes from and which component executes the load balancing.

Here is a comparison of circuit breakers:

On the left is the classic Hystrix state transition diagram. If the number of consecutive errors for an instance exceeds the threshold within a time window, the circuit enters the open state and the instance no longer receives requests. After being isolated for a while, it moves to the half-open state: if requests succeed, the circuit returns to the closed state and the instance can receive traffic again; if not, it goes back to the open state.

Istio does not present such a state diagram, but anyone familiar with Istio's rules and behavior will recognize that the threshold rules of its OutlierDetection are designed the same way. The difference is that in Spring Cloud the circuit breaking is performed by Hystrix inside the SDK, while in Istio it is performed by the data plane proxy. Because Hystrix lives in the business code, it also lets the user exercise some control programmatically.
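To make the comparison concrete, below is a minimal sketch of what such threshold rules look like as an Istio OutlierDetection configuration. The service name "producer" and the specific threshold values are illustrative assumptions, not settings from the talk.

```yaml
# DestinationRule sketch: eject an instance after 5 consecutive 5xx errors,
# re-evaluate every 10s, and keep it ejected for at least 30s before probing
# it again (roughly the "open" / "half-open" behavior described above).
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: producer
spec:
  host: producer                   # hypothetical Kubernetes service name
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5      # error threshold that trips the breaker
      interval: 10s                # how often instances are analyzed
      baseEjectionTime: 30s        # minimum isolation time for an ejected instance
      maxEjectionPercent: 100      # allow every unhealthy instance to be ejected
```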

The above analysis shows that the capabilities and mechanisms for service discovery, load balancing, and circuit breaking are similar. If you ignore some details in the diagrams, the rough block models are essentially the same; usually only one item in the comparison table differs, namely where the capability is executed, and that single difference leads to very large differences in practice.

Problems encountered using the Spring Cloud microservices framework

The focus of this talk is practice. The following are the top problems reported by our customers, and we will analyze the issues users run into when using a traditional microservices framework. For most of them, these problems are the biggest motivation for choosing a service mesh.

1) The multi-language problem

In enterprise application development, it is reasonable and common for a business to use a unified development framework. To improve efficiency, many development teams maintain a common framework for their company or team. Since most business systems are developed in Java, Spring Cloud, or one of the various frameworks derived from it, is especially widely used.

However, in cloud native scenarios the business is generally more complex and diverse, and in particular involves many existing legacy systems. We cannot require a set of mature services to be rewritten in Spring Cloud just to adopt microservices. What users really want is a way to manage application-layer service access without rewriting the original systems.

2) Spring Cloud microservices running on K8s are prone to service discovery delays

As mentioned above, Spring Cloud service discovery relies on each microservice first registering itself with the registry. In the traditional Spring Cloud scenario, where microservices are deployed on VMs, services do not change very dynamically; at most some instances stop running normally, and the registry's health check is sufficient to detect that. In the K8s scenario, however, dynamic migration of service instances is the norm. As shown in the diagram, one of the producer's Pods has migrated from one node to another. In this case the new producer instance in Pod2 needs to register with Eureka, and the old Pod1 instance needs to be deregistered.

If this happens frequently, the registry data may not be kept up to date, so service discovery and load balancing may still select the old instance Pod1, causing access failures.

3) All applications must be upgraded to meet changing service governance requirements

The third problem is quite typical. The customer has a platform team that maintains its own development framework based on Spring Cloud, and every time the framework is upgraded it has to beg the business teams to upgrade their services. The SDK itself often does not require much testing, but recompiling, repackaging, and upgrading thousands of services built on that SDK takes a long cycle, often with the business teams making the changes overnight. Given the workload and online risk of such an upgrade, and the fact that their own code has barely changed, the business teams are generally not motivated.

4) Moving from a monolithic architecture to a microservice architecture

This is a fairly common problem: gradual migration to microservices. In the well-known article on breaking a monolith into microservices (martinfowler.com/articles/br…), Martin Fowler also advocates progressive decomposition: how a large system can be separated along business lines, decoupled, and then gradually turned into microservices. He emphasizes decoupling business capabilities rather than code, and leaves the decoupling of code to developers.

But incremental migration is not easy from a developer's perspective. Taking microservices built on Spring Cloud as an example, in order to have uniform service discovery and load balancing and to consume and enforce the same governance policies across all microservices, every microservice must be developed on the same SDK, often even the same SDK version. In such cases some of our customers have adapted at the API level so that the original non-microservice parts and the new microservices can coexist, but this kind of gray-release approach is very cumbersome.

A customer once asked whether it was possible to avoid maintaining two parallel setups: break some large monoliths into microservices right away, while leaving other monoliths completely untouched for a long time, until there is time, or it is safe, to change them.

Solutions

For each of the four typical microservice framework problems customers actually encounter, our recommended solution is the service mesh. Let's look at how Istio solves each of them.

First, the multi-language problem. With a service mesh, the business and the governance data plane do not need to run in the same process or be compiled together, so there is no binding to a language or framework. A service developed in any language can be managed by the mesh as long as it exposes a port speaking a supported application protocol that can be accessed from outside. Through the unified mesh control plane, unified governance rules are delivered to the unified mesh data plane for execution, enabling unified governance actions including gray release, traffic management, security, and observability.

For Spring Cloud services running on Kubernetes, the original service registration and discovery is not timely. The root cause is the inconsistency between two parallel service discovery mechanisms, so the fix is relatively simple: unify service discovery. Since K8s already maintains the service-to-instance data as part of Pod scheduling, there is no need to build a separate naming service, nor to go through laborious explicit registration and discovery.

Compared with the earlier Spring Cloud registry diagram, the registry is gone, and the registry-based registration and discovery steps are no longer needed. Istio uses the K8s service discovery data directly, which is much simpler from an architectural point of view.
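As a minimal sketch of what this looks like in practice, a plain Kubernetes Service is the only "registry" entry a workload needs. The name, labels, and port below are illustrative placeholders.

```yaml
# A plain Kubernetes Service is the only "registry" entry needed: the mesh
# resolves the name "producer" to the healthy Pods matched by the selector.
apiVersion: v1
kind: Service
metadata:
  name: producer                   # hypothetical service name
spec:
  selector:
    app: producer                  # matches the labels on the producer Pods
  ports:
    - name: http                   # port name hints the application protocol to Istio
      port: 8080
      targetPort: 8080
```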

We have also observed that this problem mostly occurs when a microservices framework is migrated from VMs to K8s in a way that treats the container like the old VM: K8s is used only as a platform to deploy and run containers, without using K8s Services.

The solution to SDK upgrades forcing upgrades of every service is to decouple the common governance capabilities from the services themselves. In the mesh, business development, deployment, and upgrades are decoupled from the service governance infrastructure by sinking the governance capabilities into that infrastructure. Business developers focus on their own business; as long as the business code does not change, there is no recompilation and no change to running services.

Upgrading governance capabilities then only requires upgrading the infrastructure, and that upgrade has no impact on user services. Most managed mesh providers, such as Huawei Cloud ASM, can perform a one-click upgrade that is completely transparent to users.

The problem of progressive migration to microservices can be solved elegantly with the Istio service mesh. Istio governs access between services: as long as a service is accessed by other services, it can be managed through Istio, whether it is a microservice or a monolith. Once Istio takes over the service traffic, both monoliths and microservices receive the same rules and the same unified management.

As shown in the figure, during the migration the monolithic application svc1 can be split first into three microservices svc11, svc12, and svc13 according to business boundaries. svc2, another monolith that svc1 depends on, does not need to change; running in the mesh, it can be managed just like the other three microservices. After running for a while, svc2 can be split into microservices according to its own business needs. In this way we avoid the workload and the migration risk of a big-bang refactoring, and truly practice the gradual decomposition that Martin Fowler advocates.

Practice

The above are solutions to several typical problems customers encounter in real work. How are these solutions implemented in practice? The following is a summary of actual customer cases, sharing the specific practices.

Our main idea is decoupling and offloading: strip the non-development functions out of the original SDK, so the SDK only provides the code framework, application protocols, and other development-oriented functions, and offload everything related to microservice governance to the infrastructure.

As you can see from the diagram, the framework the developer touches becomes thinner, so the cost of learning, using, and maintaining it drops accordingly. The infrastructure becomes thicker, providing non-intrusive service governance capabilities in addition to the basic capabilities needed to run the service. More and more generic capabilities will be distilled out of the business and handed over to the infrastructure and cloud vendors, so that customers can shed these tedious non-business concerns and devote more time and energy to business innovation. With this division of labor, the SDK truly returns to its roots as a development framework.

To use mesh capabilities, the microservices' traffic has to flow through the mesh data plane. The main migration work is on the service-caller side of the microservices. We recommend three steps:

Step 1: Retire the original microservice registry and use K8s Services instead.

Step 2: Short-circuit the SDK's service discovery and load balancing logic, and access the target service directly by its K8s service name and port.

Step 3: Depending on the project's needs, gradually replace the corresponding functions provided by the original SDK with the governance capabilities of the mesh. This step is optional: if the original functions, such as tracing instrumentation, already meet the requirements, they can be kept as part of the application itself.

To achieve the above migration, we have two approaches for different user scenarios.

The first is to change only the configuration. Besides Eureka-based server discovery, Spring Cloud can also configure a static list of server addresses for the Ribbon. We use this mechanism to put the Kubernetes service name and port of the target service into the backend address list of the corresponding microservice client.

When the original server microservice name is accessed through the Spring Cloud framework, the request is forwarded to the K8s service name and port. The access is then intercepted by the mesh data plane, so service discovery, load balancing, and the various traffic rules can all use the mesh's capabilities.

This approach essentially chains the SDK access path to the mesh data plane access path with minimal modification. On our platform, pipeline tools can help reduce the effort and mistakes of editing configuration files by hand. As you can see in the actual project, only the project's application.yaml configuration file was modified; the rest of the code required zero changes. Of course, the same semantic change is needed for annotation-based configuration.
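For reference, here is a minimal sketch of what such an application.yaml change might look like, assuming a hypothetical downstream client named producer exposed as a Kubernetes Service on port 8080; the property names follow Spring Cloud Netflix Ribbon, and the exact keys may vary with the version in use.

```yaml
# application.yaml (sketch): bypass Eureka-based discovery and point the
# Ribbon client straight at the Kubernetes Service name and port.
ribbon:
  eureka:
    enabled: false                 # Ribbon no longer resolves servers via Eureka

eureka:
  client:
    enabled: false                 # optionally stop registering with Eureka at all

# "producer" is the hypothetical client name used in @FeignClient / RestTemplate calls
producer:
  ribbon:
    listOfServers: producer.default.svc.cluster.local:8080
```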

The previous approach changes the original project very little, but the project's Spring Cloud dependencies remain.

Some of our customers chose a simpler and more thorough approach: since the service discovery, load balancing, and other governance capabilities in the original SDK are no longer needed, they remove those dependencies altogether. Judging from the final image size, the footprint of the whole project shrinks considerably. Customers trim according to their actual usage, and in the end most of Spring Cloud degenerates into Spring Boot.

Another special part of the migration is the gateway for external access to the microservices.

Spring Cloud has two gateways with similar functionality, Zuul and Spring Cloud Gateway. Based on Eureka service discovery, they map internal microservices to external services and provide security, traffic splitting, and other capabilities at the entry point. When switching to K8s and Istio, the service discovery for these inbound services is migrated to K8s, just like for the internal services.

The difference is that if users have developed a lot of private, business-specific filters on the gateway, the gateway is effectively a facade service for the microservices. For business continuity, it can simply be deployed into the mesh and managed as an ordinary microservice.

In most cases, however, we recommend replacing the microservice gateway directly with Istio's ingress gateway, which provides TLS termination, rate limiting, traffic splitting, and other capabilities at the edge in a non-intrusive way.
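A minimal sketch of such a replacement is shown below, assuming a hypothetical external host demo.example.com and an internal service producer; these names are not from the actual project.

```yaml
# Sketch: expose the service through the Istio ingress gateway instead of
# Zuul / Spring Cloud Gateway. Host and service names are placeholders.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gateway
spec:
  selector:
    istio: ingressgateway          # bind to the default Istio ingress gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "demo.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: producer
spec:
  hosts:
    - "demo.example.com"
  gateways:
    - demo-gateway
  http:
    - route:
        - destination:
            host: producer         # internal Kubernetes service behind the gateway
            port:
              number: 8080
```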

After the simple transformation above, services developed in different languages and with different frameworks can be managed uniformly through Istio, as long as they speak interoperable application protocols that the mesh can handle and can reach each other.

Unified service governance rules are configured on the control plane. On the data plane, Envoy provides unified service discovery, load balancing, and other traffic, security, and observability capabilities. Services on the data plane can run in containers or on virtual machines, and can span multiple K8s clusters.

Of course, during the migration we also support keeping the original microservice framework's registry for a transition period, so that Istio and the old service discovery can coexist in an intermediate state and services in the mesh can still reach services known only to the microservice registry.
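One common way to express such an intermediate state in Istio is a ServiceEntry that makes a service still managed by the old registry (for example, one running on a VM without a sidecar) addressable from inside the mesh. The sketch below is illustrative only; the host and endpoint address are placeholders, and this is not necessarily the specific mechanism used in the talk.

```yaml
# Sketch only: make a legacy service that is still known only to the old
# registry (e.g. running on a VM) addressable from inside the mesh.
# Host and endpoint address are placeholders, not values from the talk.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: legacy-order
spec:
  hosts:
    - order.legacy.internal
  location: MESH_EXTERNAL          # the workload has no sidecar of its own
  resolution: STATIC
  ports:
    - number: 8080
      name: http
      protocol: HTTP
  endpoints:
    - address: 192.168.1.10        # VM instance still managed by the old registry
```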

Here is an example of a Spring Cloud service running on the Istio service mesh doing a gray release. The upper log is from the sidecar of the service caller, where you can see the mesh distributing traffic to different service backends. The lower screenshot uses the gray-release feature of the Huawei Cloud ASM service: through the traffic-splitting policy configured in the mesh, the Spring Cloud service sends 30% of its traffic to the grayscale version.
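Expressed as Istio configuration, such a 30% gray-release policy might look like the following sketch, assuming a hypothetical service producer with subsets v1 (stable) and v2 (grayscale) defined in a matching DestinationRule.

```yaml
# Sketch of a 70/30 split between the stable and grayscale versions.
# The subsets v1 / v2 are assumed to be defined in a matching DestinationRule.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: producer
spec:
  hosts:
    - producer
  http:
    - route:
        - destination:
            host: producer
            subset: v1             # stable version
          weight: 70
        - destination:
            host: producer
            subset: v2             # grayscale version
          weight: 30
```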

Here is an example of a Spring Cloud service using Istio's circuit-breaking feature. The process is the Hystrix state transition diagram from the earlier principles section put into practice, except that the implementation is based on Istio. With the service mesh, access can be protected by circuit breaking regardless of the language or framework in which the service is developed.

The screenshot here is the application topology of Huawei Cloud Application Service Mesh (ASM). It shows traffic changes at both the service level and the service instance level, as well as the health status of services and instances, illustrating the whole process of a faulty Spring Cloud instance being isolated. In the topology you can see that an instance's errors reach the circuit-breaking threshold and trip the breaker: the traffic distributed to the faulty instance gradually decreases until it receives no traffic at all, that is, the faulty instance is isolated. This circuit-breaking protection preserves the overall success rate of service access.

The following three traffic topologies illustrate the process of fault recovery.

As you can see, in the initial state the faulty instance is isolated and receives no traffic.

Once the instance itself returns to normal, the mesh data plane will try to send traffic to it again after the configured isolation interval. When the thresholds are satisfied, the instance is considered healthy again and can receive requests like the other two instances.

You end up seeing requests processed evenly across all three instances. That is, fault recovery is achieved.

Finally, we conclude today with a diagram of microservices, containers, K8S, and Istio:

1) Microservices and containers share the characteristics of being lightweight and agile; containers are a very suitable runtime environment for microservices.

2) In cloud native and microservice scenarios, containers never exist in isolation; using K8s to orchestrate containers has become the de facto standard;

3) Istio and K8s combine closely in both architecture and application scenarios, together providing an end-to-end platform for running and governing microservices.

4) This is also our recommended solution: using Istio for microservice governance is becoming the technology choice of more and more users.

These four relationships are combined clockwise to form a complete closed loop for our solution.

Speaker: Zhang Chaomeng

Appendix

As early as 2018, Huawei Cloud released the world's first commercial Istio service, Application Service Mesh (ASM). Huawei Cloud Application Service Mesh is a fully managed service mesh with high performance, high reliability, and ease of use. It supports multiple infrastructures such as virtual machines and containers, supports unified management of multi-cloud and multi-cluster services across regions, and provides users with traffic management, operation monitoring, access security, and service release capabilities in a non-intrusive, infrastructure-based way.

To date, Huawei Cloud Application Service Mesh has served nearly a thousand customers in the Internet, automotive, manufacturing, logistics, government, and other industries, meeting the business needs of customers across these sectors. Huawei Cloud turns the rich experience accumulated in this process into code and contributes it to the Istio community, greatly promoting the maturity of Istio technology and the rapid development of the community. At the same time, Huawei Cloud invests heavily in service mesh evangelism, spreading and popularizing service mesh technology through community forums, technical conferences, live streams, professional books, and other channels.

Welcome to follow the Huawei cloud native team; we will provide you with:

Daily updates on cloud native technology trends, hands-on progress, application cases, and more;

A community where you can learn together with industry technology leaders and more than 10,000 cloud native enthusiasts;

Occasional invitations to senior cloud technologists to share hands-on technical experience.
