In 2021, "stability" took precedence for the service mesh. Whether in the development of the native communities or in the implementation of industry practices, "stability" was the first priority. Gone are the great leaps forward and architectural overhauls of a few years ago; 2021 brought more pragmatic, down-to-earth industry exploration and practice. The service mesh has grown from an impetuous "young star" into a mature practitioner, gradually stabilizing and being accepted by more industries, enterprises, and standardization organizations. This article reviews the service mesh in 2021 from the perspectives of community progress, practical adoption, industry standards, and the technology ecosystem, to help readers understand the overall progress of the service mesh over the past year and to provide some references for enterprises selecting and implementing a service mesh.

Community progress: Steady and pragmatic

In 2021, the Istio community released a version every three months as promised: 1.9, 1.10, 1.11, and 1.12. The steady release cadence is a sign that the Istio community is becoming more disciplined, and it makes it easier for companies to choose the right version. Overall, the Istio community did not announce any major architectural changes or disruptive new capabilities in 2021, but instead delivered more native support in areas such as access, operations, and APIs:

  • 1.9 — **Improved virtual machine integration (Beta)**, request classification (Beta), Kubernetes Service API support (Alpha), integration with external authorization systems (experimental), etc. Virtual machine integration continues the optimization of the VM onboarding experience that began with smart DNS in 1.8 (which resolves service names across environments), further enhancing the service mesh's ability to manage non-container environments.

  • 1.10 — **Kubernetes resource discovery selectors**, stable revision tags, Sidecar networking changes, etc. Discovery selectors limit the set of configuration Istiod watches and processes from Kubernetes; combined with the Sidecar CRD/API resource, they further reduce the volume of configuration Istiod pushes to Envoy.

  • 1.11 — **CNI plug-in (Beta)**, **external control plane (Beta)**, gateway injection, **updates to revision and tag deployments**, Kubernetes multi-cluster services support (experimental). The CNI plug-in gives users in Kubernetes environments an alternative to the istio-init container (no elevated Kubernetes permissions required); the external control plane lets users run the mesh control plane in a management cluster; the updates to revision and tag deployments make deploying and upgrading Istio itself easier, reducing the operational risk of running Istio.

  • 1.12 — **WebAssembly API**, **Telemetry API**, Kubernetes Gateway API. The WasmPlugin resource was added as the WebAssembly API, improving the experience of extending Istio with WebAssembly plug-ins.

Looking at the four releases of the Istio community in 2021, it’s easy to see:

  1. No major architectural changes, no disruptive new capabilities: there is no particular threshold for enterprises choosing an Istio version.

  2. Easier access: added support for VMs, the CNI plug-in, and WebAssembly provides native capabilities for more complex service deployment environments, more demanding container environments, and a wider range of language extension requirements.

  3. Improved operations: stable revision tags and the external control plane provide better native support for operating Istio itself and for multi-cluster management.

  4. API standardization: including the WebAssembly API, the Kubernetes Gateway API, and Kubernetes Service API support. Whether standardizing Istio's own APIs or supporting standard Kubernetes APIs, the Istio community continues to work on API standardization.

Practice and adoption: expanding across industries

Service mesh technology originated from large Internet companies (Google, IBM, Twitter/Buoyant), and its early adopters were mostly Internet companies: with their deep technical skills and sustained investment, the Internet giants have in recent years taken the service mesh from initial exploration to large-scale production use. Small and medium-sized Internet companies have followed in their footsteps, riding the wave of cloud native technology to complete their "first experience" with the service mesh. In 2021, enterprises in more industries began to try to adopt the service mesh.

Enterprise demands

Take the financial industry, known for its large scale, high stability, and strong security requirements, as an example. In 2021, the infrastructure teams of many large state-owned banks, joint-stock banks, and securities brokers began introducing service mesh technology for technical research, platform building, and business trials. Based on several leading financial enterprises we served in 2021 and other publicly available technical material, this article summarizes the typical demands financial enterprises place on service mesh technology.

  1. Zero-threshold adoption

In our microservices review for 2020, we proposed that "smooth adoption support" is one of the two key elements for an enterprise adopting a service mesh. This is particularly true in the financial sector: zero-threshold adoption is one of the core demands enterprises place on the service mesh.

We summarize the "three elements" a service mesh needs to support enterprise adoption: communication protocols, registries, and deployment environments.

  • Communication protocols: the service communication protocols the mesh supports, such as HTTP, gRPC, and Dubbo, as well as industry-specific private RPC protocols;

  • Registries: the registries the mesh can manage, including the common Eureka, Consul, Nacos, ZooKeeper, and Kubernetes (etcd);

  • Deployment environments: the business deployment environments the mesh can support; in addition to the naturally cloud native Kubernetes + Docker, the virtual machines and physical machines where legacy systems live should be treated equally.

Only after meeting these "three elements" can a service mesh reach the "passing line" for business adoption.

In addition, we found further obstacles in the financial sector:

  • Strict environmental controls: the deployment platforms (containers, virtual machines, physical machines) and the base platforms (microservices, middleware) belong to different teams. Due to the division of corporate responsibilities, financial compliance requirements, and other factors, service mesh adoption is subject to additional restrictions on network environments, management permissions, and financial regulations.

  • Complex existing systems: leading enterprises in the financial industry already have fairly complete distributed systems, but many complex, heterogeneous legacy systems remain. Because of factors such as development language, communication protocol, code that cannot be modified, and the lack of a service registration/discovery mechanism, many systems cannot be brought under management and become "isolated islands" within the distributed system.

  2. Matching architecture scenarios

Unlike traditional business scenarios, where service governance capabilities are repeatedly rebuilt on the microservice framework side, the service mesh focuses on solving enterprise architecture scenarios. Beyond providing management and governance of microservices in the cloud native system, it also needs to cover architecture scenarios such as unified governance of heterogeneous applications and migration of legacy systems, so as to genuinely solve the integration problems that remain after an enterprise adopts microservices.

We summarize the typical architecture-scenario demands of financial enterprises as follows:

  • Multi-cluster, multi-data-center service management: normal service discovery, invocation, and governance across clusters and data centers, plus cross-region disaster recovery.

  • Long-term, smooth, and stable migration from the existing monolithic and microservice architectures to a cloud native service mesh architecture: the evolution should proceed in grayscale, transparently to the business, with services interoperable, manageable, and observable throughout the migration, and with a high SLA guaranteed.

Core value

After forming an initial understanding of the service mesh, enterprise users often ask the soul-searching questions: why adopt a service mesh at all? What value does it bring?

In general, the "standard answers" about the core value of the service mesh are:

  • Microservice components transparent to the business: architecture support, network communication, governance, and other related capabilities sink into the infrastructure layer, so business teams no longer need dedicated staff to develop and maintain them, which effectively reduces R&D and maintenance costs under a microservice architecture;

  • Support for multiple development languages and frameworks: the service mesh naturally does not constrain development languages or frameworks, providing multi-language service governance;

  • Zero-cost framework upgrades: hot upgrades are supported, reducing the cost of upgrading middleware, technical-framework clients, and SDKs;

  • Unified management and evolution of the microservice system: existing microservice clusters, legacy systems, and outsourced systems are managed and evolved in a unified manner.

Different teams within an enterprise focus on different aspects of the service mesh's value:

  • Infrastructure/platform teams pay more attention to the architecture scenarios the service mesh covers:

      • support for multiple development languages, independent of frameworks, able to onboard a wide variety of business applications;

      • zero-cost framework upgrades, with no business restarts and no business awareness;

      • unified management and evolution of the microservice system, so that existing microservice clusters, legacy systems, and outsourced systems are managed and evolved together.

  • Business development teams pay more attention to the business scenarios the service mesh covers:

      • one-click access to a full set of microservice governance and monitoring capabilities, such as circuit breaking, rate limiting, degradation, fault tolerance, fault injection, metrics monitoring, and distributed tracing;

      • legacy and outsourced systems can be brought into unified governance, gain the same governance and monitoring capabilities, and interconnect with other business microservices;

      • with no awareness of microservice components, business developers no longer need to learn, research, and maintain microservice-related technologies and frameworks.

Facing challenges

Even though Istio releases have stabilized and many Internet companies have successfully implemented the service mesh, enterprises in other industries still face challenges when adopting it.

  1. Technology: zero-threshold access is hard

From a technical perspective, there are three major challenges to achieving a "zero threshold":

  • Communication protocol extension: as the first of the "three elements" of enterprise adoption, implementing the full set of capabilities for a communication protocol, including proxying, parsing, governance, and observability, is a huge undertaking, especially for private RPC protocols (particularly common in the financial industry) whose designs are far removed from general-purpose protocols like HTTP and gRPC. A clever and complete extension mechanism is required.

  • Custom plug-in extension: most developers cannot directly write Envoy extensions in C++, Envoy's native Lua extensions are weak, and the performance of Wasm (WebAssembly), which the community has long anticipated, is still far from production-ready. A custom Envoy plug-in extension mechanism that is genuinely usable in production is needed (see the sketch after this list).

  • Virtual machine/physical machine management: although the Istio community keeps improving the mesh's VM/physical machine management experience and public cloud vendors have provided their own enhanced capabilities, workloads deployed outside containers still feel like "second-class citizens": it is hard for them to obtain the same mesh capabilities as workloads in container environments, so more complete and compatible solutions for Sidecar management, traffic interception, and so on in non-container environments are required.
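As an illustration of the Wasm extension route mentioned above, here is a minimal sketch of an HTTP filter written against the open source proxy-wasm-go-sdk (compiled with TinyGo to a .wasm module and loaded into Envoy/Istio). The header name and value are made up for the example; this is only one possible way to extend the data plane, not an official recipe.

```go
package main

import (
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
)

func main() {
	// Register the VM context; the host (Envoy) drives everything from here.
	proxywasm.SetVMContext(&vmContext{})
}

type vmContext struct {
	types.DefaultVMContext // embed defaults so we only override what we need
}

func (*vmContext) NewPluginContext(contextID uint32) types.PluginContext {
	return &pluginContext{}
}

type pluginContext struct {
	types.DefaultPluginContext
}

func (*pluginContext) NewHttpContext(contextID uint32) types.HttpContext {
	return &httpContext{}
}

type httpContext struct {
	types.DefaultHttpContext
}

// OnHttpRequestHeaders is called for every inbound HTTP request.
func (*httpContext) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
	// Add a custom header (hypothetical name/value) to demonstrate mutation.
	if err := proxywasm.AddHttpRequestHeader("x-mesh-plugin", "demo"); err != nil {
		proxywasm.LogErrorf("failed to add header: %v", err)
	}
	return types.ActionContinue
}
```

In Istio 1.12 and later, a module like this could be attached to workloads through the WasmPlugin API mentioned earlier.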

  2. Scenarios: complex scenarios are hard to cover

Financial-industry businesses are often deployed and operated under a variety of environmental and regulatory constraints. Add the complexity of the business systems themselves, with existing, legacy, and outsourced systems coexisting, and the service mesh faces scenario challenges like the following in the financial industry:

  • Multi-cluster, multi-data-center deployment of services, interoperable cross-cluster and cross-data-center invocation, unified governance, disaster recovery, and various high-availability guarantees all require the service mesh system to be highly adaptable.

  • The business architecture must evolve smoothly from existing monolithic and microservice architectures toward a cloud native service mesh architecture, which implies a long period in which microservice frameworks and the service mesh coexist "across generations": service discovery, technology stacks, access, governance, and observation must all genuinely adapt to the business-architecture migration scenario, with SLA guarantees.

Industry standards: setting sail

As service mesh technology has gradually stabilized in both community progress and practical adoption, corresponding industry standards and standard platforms have emerged and set sail.

CAICT standards

In July 2021, at the Trusted Cloud Conference 2021 hosted by the China Academy of Information and Communications Technology (CAICT), the "Service Mesh Technical Capability Requirements" standard was officially released. Alibaba, NetEase, ByteDance, and Flomesh passed the first batch of evaluations and received the highest service mesh rating. Interestingly, these first four companies can be seen as typical representatives of cloud computing giants, established Internet companies, newer Internet companies, and technology startups, which reflects how much importance enterprises of all kinds attach to promoting service mesh standards and adoption.

Standard platform

In 2021, the standard service mesh platforms offered by cloud vendors gradually improved and matured; enterprises can choose a standard platform as needed, or build a service mesh jointly with a vendor.

The standard platforms offered by different vendors differ slightly in type:

  • Native Istio resources + public cloud infrastructure + ecosystem integration: focuses on compatibility with native Istio and integration with the existing public cloud ecosystem;

  • Native Istio as a platform + private deployment + third-party integration: extends and enhances Istio, shields the complexity of native Istio, focuses on unified management, control, and governance of the microservice system, and adapts to, remains compatible with, and integrates into the enterprise's private environment;

  • Partially or fully self-developed service mesh: not limited to Istio or other open source communities, strengthening the weak points of open source service meshes.

Each kind of platform has its own applicable scenarios, strengths, and weaknesses; enterprises can choose according to their own situation.

Technology ecosystem: a hundred schools of thought contend

While the service mesh itself entered a period of stability in 2021, its technology ecosystem blossomed.

Open source projects

In 2021, a number of excellent Istio-related projects were open-sourced, enhancing Istio in terms of usability, extensibility, and operability:

  • Slime: an Istio-based smart service mesh manager that adds a non-intrusive management plane to Istio. Open-sourced by NetEase in January 2021.

  • GetMesh: an Istio integration and command-line management tool that can manage multiple Istio versions. Open-sourced by Tetrate in February 2021.

  • Aeraki: manages any layer-7 traffic for Istio, adding multi-protocol extension support to the service mesh. Open-sourced by Tencent in March 2021.

  • Layotto: a cloud native application runtime that can also serve as Istio's data plane. Open-sourced by Ant Group in June 2021.

  • Hango Gateway: an API gateway built on Envoy and Istio, naturally compatible with Istio, providing high performance and rich proxy capabilities. Open-sourced by NetEase in August 2021.

The emergence of so many service mesh ecosystem projects confirms the vitality of the field.

Multi-runtime

Similar to the idea of the service mesh sinking microservice governance into the infrastructure layer (the Sidecar), multi-runtime was proposed by Bilgin Ibryam in 2020 as a generalization of the various forms of the Sidecar pattern. Its characteristics can be summarized as follows:

  • Capabilities: provides broader distributed capabilities than the service mesh, such as middleware proxying and message pub/sub;

  • Deployment: can be 1:1 (per-pod) or 1:N (per-node), deployed in multiple environments as needed, with components combined on demand;

  • Interaction: the application communicates with the runtime through a standard API. There is no insistence on zero intrusion; an SDK carrying the standard API typically lives inside the application.

A typical multi-runtime open source framework is Microsoft's Distributed Application Runtime (Dapr), which reached its landmark 1.0 release in 2021, entered the CNCF Sandbox for incubation, and is still developing rapidly.
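As a rough illustration of the "interaction through a standard API" idea, the sketch below calls the Dapr sidecar's documented HTTP API from Go. The pub/sub component name ("pubsub"), topic ("orders"), and default port 3500 are assumptions for the example and depend on how Dapr is actually configured.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// The Dapr runtime sits next to the application (per-pod or per-node)
	// and exposes its building blocks over a standard HTTP API, by default
	// on localhost:3500. Here we publish an event through the pub/sub
	// building block; component and topic names must match the environment.
	payload := []byte(`{"orderId": "42"}`)
	url := "http://localhost:3500/v1.0/publish/pubsub/orders"

	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatalf("publish failed: %v", err)
	}
	defer resp.Body.Close()

	// Dapr returns 204 No Content when the event is accepted.
	fmt.Println("publish status:", resp.Status)
}
```

The point is that the application only speaks a standard API; which message broker or middleware sits behind it is a deployment-time decision of the runtime.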

From the perspective of practical adoption, multi-runtime showed good potential and momentum in 2021:

  • Its forward-looking concepts may represent the future direction of distributed architecture;

  • Led by major companies, the community has developed rapidly, and several large players have entered the field to explore;

  • Overall maturity is still not high, with gaps in point-to-point service communication governance, completeness of capabilities, API stability, and other areas;

  • It can integrate ecologically with existing technologies such as the service mesh to make up for its shortcomings.

eBPF

eBPF makes it possible to program and run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules, letting developers enhance system observability and optimize networking and security from inside the kernel. In the service mesh domain, eBPF can be used to accelerate Sidecar networking and to observe deeper information from the bottom up, such as kernel message queues, task queues, network packets, and connections.
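As a very rough sketch of how such kernel-level acceleration is typically wired up, the Go program below uses the open source cilium/ebpf library to load a pre-compiled sock_ops program and attach it to the root cgroup. The object file name ("sockops_accel.o") and program name are hypothetical placeholders; a real deployment (as in Cilium or similar projects) is considerably more involved.

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Load a pre-compiled eBPF object (hypothetical "sockops_accel.o") that
	// short-circuits localhost traffic between an application and its Sidecar
	// at the socket layer, instead of traversing the full TCP/IP stack.
	coll, err := ebpf.LoadCollection("sockops_accel.o")
	if err != nil {
		log.Fatalf("load eBPF collection: %v", err)
	}
	defer coll.Close()

	// Attach the sock_ops program to the root cgroup so it observes every
	// socket established on the node.
	l, err := link.AttachCgroup(link.CgroupOptions{
		Path:    "/sys/fs/cgroup",
		Attach:  ebpf.AttachCGroupSockOps,
		Program: coll.Programs["sockops_accel"], // hypothetical program name
	})
	if err != nil {
		log.Fatalf("attach sock_ops: %v", err)
	}
	defer l.Close()

	log.Println("sock_ops program attached; kernel-level redirection active")
	select {} // keep the loader running so the attachment stays alive
}
```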

In 2021, Cilium (an open source framework built on eBPF) proposed replacing the Sidecar with eBPF to implement a kernel-level service mesh (a kernel-level data-plane proxy), addressing the resource consumption and latency overhead of independent Sidecars and truly sinking traffic control and observation into the infrastructure layer. Cilium's bold idea quickly drew pushback from the "traditional" service mesh camp, which cited the limits of what eBPF can implement in a service proxy, operational complexity, the complexity of protocol processing, and kernel version dependencies.

In any case, integrating eBPF into the service mesh ecosystem is a new trend. Even if it cannot fully replace the Sidecar, eBPF can serve as a powerful complement to it, with the two cooperating along the traffic path.

Proxyless

In the early days of the service mesh, an independent Sidecar proxy was responsible for proxying, governing, and observing traffic; service mesh frameworks organized the data plane around an independent proxy by default, drawing a clear line between themselves and traditional in-process microservice frameworks. The proxy pattern seemed to be the standard pattern for the mesh data plane. In 2021, however, the "dimensional wall" between in-process frameworks and standalone Sidecar proxies began to break down, and the concept of Proxyless was mentioned more and more often.

WHY Proxyless (essentially a response to the "drawbacks" of the independent Sidecar proxy model):

  • Performance: the extra deployment resources and added latency of an independent proxy;

  • Traffic interception: most traffic interception by an independent proxy relies on technologies such as iptables, which requires elevated permissions, involves complicated logic, and is hard to troubleshoot;

  • Governance granularity: an independent proxy works outside the application process and is stateless, so it cannot govern or observe what happens inside the process at the program and method level.

WHAT Proxyless (capabilities it can complement in various distributed scenarios):

  • Service mesh optimization: fine-grained governance, monitoring, and traffic interception inside the application;

  • Multi-runtime operation: a standard SDK inside the application provides the interface through which services operate infrastructure resources;

  • Continued sinking of capabilities: traffic handling, governance, and observation implemented in the operating system kernel.

HOW Proxyless (several common implementations):

  • Framework/SDK: the classic approach, a return to the old ways;

  • Non-intrusive agent: enhances business code non-intrusively; for the principle, see the "service framework: non-intrusive agent service governance" part of "Evolution Practice of NetEase Qingzhou Dual-Engine Multi-Mode Service Governance: from Service Framework to Service Mesh";

  • Native RPC support: newer versions of gRPC provide governance functions and support the standard xDS protocol to connect directly to the control plane (see the sketch after this list);

  • eBPF: handles, governs, and observes traffic in the Linux kernel.
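To make the "native RPC support" item concrete, here is a minimal sketch of a proxyless gRPC client in Go: importing the grpc/xds package and dialing an xds:/// target lets the client receive routing and endpoint configuration directly from an xDS control plane (such as Istiod), with no Sidecar in the data path. The service name, port, and the use of the stock helloworld proto are assumptions for the example; in practice a bootstrap file referenced by GRPC_XDS_BOOTSTRAP is also required.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	_ "google.golang.org/grpc/xds" // registers the xds:// resolver and balancer

	pb "google.golang.org/grpc/examples/helloworld/helloworld"
)

func main() {
	// The xds:/// scheme tells gRPC to fetch listener/route/cluster/endpoint
	// configuration from the xDS control plane named in the bootstrap file,
	// instead of relying on a Sidecar proxy for routing and load balancing.
	// Plaintext is used here for brevity; real meshes typically use xDS-managed
	// transport credentials instead.
	conn, err := grpc.Dial("xds:///greeter-service:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	reply, err := pb.NewGreeterClient(conn).SayHello(ctx, &pb.HelloRequest{Name: "mesh"})
	if err != nil {
		log.Fatalf("call failed: %v", err)
	}
	log.Printf("reply: %s", reply.GetMessage())
}
```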

From the perspective of architecture evolution, Proxyless may look like a step "against the current". From the perspective of practical adoption, however, the capabilities Proxyless adds alongside the proxy may better help enterprises complete a gradual migration from traditional architecture to cloud native architecture.

Future

Our review of the service mesh in 2021 ends here, and we are confident about its future. To close, here is our outlook for the service mesh:

Zero threshold

As service mesh technology matures and adoption experience accumulates across more and more industries, the technical and scenario challenges will eventually be overcome, and the threshold for adopting a service mesh will approach zero.

Standardization

The technical capabilities and scenario coverage of the service mesh will be highly abstracted and generalized, and service mesh platforms/products will become highly standardized accordingly, making it easier for enterprises to choose among them.

Comprehensive and unified

Service mesh technologies such as Envoy and Istio will help unify related fields. For example, more L7 traffic proxies will be built with Envoy at their core, and data planes and control planes will interact through the xDS protocol. The unified, global governance of distributed architecture that enterprise architects have long wanted will no longer be a dream.

Ecosystem integration: Proxyless + Proxy + eBPF + multi-runtime

The different ecosystems around the service mesh will not stand in opposition; they will eventually join forces pragmatically for mutual benefit: Proxyless, Proxy, and eBPF cooperating along the traffic path to complement one another, and multi-runtime integrating the mature capabilities of the service mesh to make up for its current shortcomings and accelerate its development.

References (with special thanks to the many practitioners and sharers in the service mesh space):

Evolution Practice of NetEase Qingzhou Dual-Engine Multi-Mode Service Governance: from Service Framework to Service Mesh: https://www.infoq.cn/article/KNp1ibj40vS8IIZCizMW

Microservices 2020 in Review: Frameworks on the Left, Mesh on the Right — Where Is the Road for Microservices in the Cloud Native Era: https://www.infoq.cn/article/4Zog2lMBqKjAeMTc8Add

Envoy Gateway, the Traffic Entrance of the Cloud Native Era: https://www.infoq.cn/article/SF5sl4IlUtUxuED3Musl

Istio 1.9 Released — the Key to Improving Istio Day-2 Operations: https://mp.weixin.qq.com/s/E7iwBF6hhPm5aTukTlTCMg

Istio 1.10 Released, and the New Website: https://mp.weixin.qq.com/s/Lq6zF90FR-ohT9ON-88Z_Q

Istio 1.11 Released: https://mp.weixin.qq.com/s/QkLUFOCQz2AWt2En-G-VQg

Istio 1.12 Released: https://mp.weixin.qq.com/s/Q52IQrXxxHEn2c8rkAVTgA

Proxyless Service Mesh Without Sidecars, Based on gRPC and Istio: https://mp.weixin.qq.com/s/aYwo2criOotqNp8lD39QAA

What the Service Mesh Community Discussed in 2021: https://mp.weixin.qq.com/s/ZDDC4YAebbdws8Md9zCrqQ

Dapr v1.0 Outlook: from Service Mesh to Cloud Native: https://skyao.io/talk/202103-dapr-from-servicemesh-to-cloudnative

Goodbye Sidecar: Using eBPF to Unlock the Kernel-Level Service Mesh: https://mp.weixin.qq.com/s/W9NySdKnxuQ6S917QQn3PA

Will the Service Mesh Use eBPF? Yes, but the Envoy Proxy Will Continue to Exist: https://mp.weixin.qq.com/s/iZYXPec7Lh0fhflA42d8gA

About the author:

**Pei Fei**, senior technical expert and senior architect at NetEase Shufan. With more than 10 years of experience in enterprise platform architecture and development, he now leads the NetEase Shufan microservices team, focusing on research into and implementation of enterprise microservice architecture and cloud native technology. He has led the team to deliver several projects within NetEase Group, including the Qingzhou (Light Boat) service mesh, microservice framework, and API gateway, as well as their commercial productization, and has led the development of several cloud native open source projects such as Slime and Hango.