Existing System Architecture



  • Most business services have been converted to stateless microservices. They run as Docker containers in a Kubernetes cluster, and Kubernetes handles their deployment and scaling. Some services, however, have only been containerized without being split into microservices. These services still follow an SOA architecture, and a single service may expose multiple business APIs, similar to the situation Mr. Ao Xiaojian describes in his article series “Multi-protocol Universal Solution in SOFAMesh”.
  • Some stateful shared services, such as databases, FTP servers, and shared caches, are not yet part of the Kubernetes cluster, but business services depend on them heavily.
  • Other services also sit outside the Kubernetes cluster, such as those provided by legacy systems and third-party systems. Some business services and these external services need to call each other.





The service registry






github.com/ZTE/Knitter














Service discovery


API Gateway














  • Collects performance data for external requests and runs statistical analysis on it.
  • Calls the authentication service to authenticate external requests.
  • Implements rate limiting, circuit breaking, blacklists and whitelists, and similar functions.
  • Supports grayscale (canary) releases of applications through traffic-splitting rules.
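
As an illustration of the rate limiting function listed above, here is a minimal token-bucket sketch. The article does not say which algorithm the gateway uses, so the token bucket, the class name `TokenBucket`, and its parameters are assumptions chosen for illustration only. The caller passes the current time explicitly so that the behavior is easy to follow and test.

```python
class TokenBucket:
    """Hypothetical limiter: about `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum number of stored tokens
        self.tokens = float(capacity)    # start with a full bucket
        self.last = now                  # time of the last refill

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        # Refill according to the elapsed time, never exceeding the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Each admitted request consumes one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A gateway would typically keep one such bucket per client or per API and reject a request when `allow` returns False, for example with an HTTP 429 response.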





Pain points

  • In the early stage of the system, most microservices were written in Java. We encapsulated the point-to-point communication logic between services, such as service discovery, retries, rate limiting, and circuit breaking, in an SDK and provided it to each business microservice. As the system evolved, more and more microservices were written in Golang, along with a fair number in Python, and as a platform we expect to see more languages in the future. Writing and maintaining an SDK for every language becomes increasingly difficult, so the trend is to sink the microservice communication layer into the mesh layer.
  • The API Gateway can run statistical analysis on the performance data of external requests, but it cannot collect or process performance data for calls between microservices inside the system. Collecting that data inside the service code would require every language and framework to adopt a standard set of interfaces, plus a corresponding SDK for each language. The maintenance workload would be very large, and the approach would place significant constraints on business service code. Collecting performance data for calls between microservices through sidecars is therefore a more reasonable approach.
  • Using traffic-splitting rules at the API Gateway for grayscale releases is very limited. The gateway can only split traffic for an application as a whole; it cannot split traffic between different versions of a single microservice inside the application. As a result, grayscale releases cannot really be used for rapid, iterative upgrades at microservice granularity.
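
To make the last limitation concrete, the sketch below shows what per-service, per-version traffic splitting looks like in principle: each microservice has its own weighted list of versions, instead of one rule for the whole application. The service names, version labels, weights, and function names are all hypothetical; this is not the gateway's or Istio's implementation.

```python
# Hypothetical per-service rules: (version, weight in percent), summing to 100.
ROUTES = {
    "order":   [("v1", 90), ("v2", 10)],  # 10% of calls try the new version
    "billing": [("v1", 100)],             # not being upgraded right now
}


def pick_version(weighted_versions, bucket):
    """Pick a version for a request that falls into `bucket` (0 <= bucket < 100)."""
    if sum(weight for _, weight in weighted_versions) != 100:
        raise ValueError("weights must sum to 100")
    upper = 0
    for version, weight in weighted_versions:
        upper += weight
        if bucket < upper:
            return version
    raise ValueError("bucket must be in the range [0, 100)")


def route(service, bucket):
    """Resolve the version of `service` that should handle this request."""
    return pick_version(ROUTES[service], bucket)
```

In practice `bucket` would be derived from something like a random number or a hash of a user ID. The point is that the rule for `order` can change without touching `billing`, which an application-level split at the gateway cannot express.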





Istio integration solution



Control plane




















The data plane








Current progress

  1. We have completed the integration of Pilot and Mixer. We have not yet considered integrating Citadel, because the system already uses its own security scheme.
  2. We have validated service discovery, routing rules, metrics collection, and distributed tracing.
  3. We have developed a user-friendly interface for configuring and managing Istio routing rules.
  4. Based on Istio routing rules and Kubernetes, we have developed online grayscale upgrades for microservices.
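
For readers unfamiliar with Istio routing rules, the sketch below builds a weighted `VirtualService` as a plain dictionary, which is the kind of rule a grayscale upgrade can adjust step by step. It assumes the `networking.istio.io/v1alpha3` API and assumes the version subsets are defined in a separate `DestinationRule`. The helper name `virtual_service` and the service name `reviews` are hypothetical, and the article does not describe how the authors' own tooling generates or applies rules.

```python
import json


def virtual_service(service, weights):
    """Build a VirtualService dict that splits traffic across subsets by weight.

    `weights` maps subset name to a percentage; the subsets themselves are
    assumed to be declared in a matching DestinationRule (not shown here).
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    routes = [
        {"destination": {"host": service, "subset": subset}, "weight": weight}
        for subset, weight in weights.items()
    ]
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": service},
        "spec": {"hosts": [service], "http": [{"route": routes}]},
    }


if __name__ == "__main__":
    # Example: send 10% of traffic to the new version during an upgrade.
    print(json.dumps(virtual_service("reviews", {"v1": 90, "v2": 10}), indent=2))
```

A grayscale upgrade would then raise the weight of the new subset gradually, for example 10, 50, then 100, and roll back by restoring the previous weights if problems appear.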





Pitfalls encountered and problems to be solved

  1. Istio does not support multiple network planes. In our environment, this caused an infinite loop when Envoy forwarded service traffic, and Envoy restarted repeatedly after exhausting its file descriptors. The problem was well hidden and took almost two weeks to locate and troubleshoot. We are working on an Istio extension to support multiple network planes and intend to contribute the code to the community.
  2. The community does not provide detailed documentation on deploying Mixer outside of Kubernetes, and we ran into numerous pitfalls when deploying it.
  3. Mixer has no adapter for Consul environments, so some attributes are missing from the reported metrics. We are evaluating and writing such an adapter and plan to contribute it to the community.
  4. Istio currently supports HTTP and gRPC but does not support asynchronous messaging systems such as Kafka. Asynchronous messaging is essential for business-level end-to-end routing control. The Envoy community has plans to support Kafka. (github.com/envoyproxy/… 2)
  5. For IT applications, Service Mesh mainly handles user traffic (northbound) and inter-service traffic (east-west). For CT applications, however, a large part of the traffic, from network elements (NEs) to the management system (southbound), uses custom protocols that Istio cannot understand. The Ali SOFAMesh extension scheme that Mr. Ao Xiaojian introduces in “Multi-protocol General Solution X-Protocol Introduction Series (3): TCP Protocol Extension” offers a good idea.






zhaohuabing.com/post/2 … …