Inter-service communication and service governance are two core problems in implementing a microservices architecture.

This article shares our experience with both, drawing on practice in the Scallop production environment (a scenario with around a million daily active users).

Service to service communication

Communication between services falls into roughly three types: synchronous calls, asynchronous calls, and broadcasts.

When you start designing microservices, think through the call relationships and call styles up front: which calls must be synchronous, which can be asynchronous, and which need to be broadcast. It is best for the whole team to share the same understanding.

Next, choose the call protocols. Common choices are:


  • Synchronous calls: HTTP REST, gRPC, Thrift, etc.
  • Asynchronous calls: Redis, various AMQP implementations, Kafka, etc.
  • Broadcast: various AMQP implementations, Kafka, etc.
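
To make the three call styles concrete, here is a minimal in-process sketch using only the Python standard library. It is an illustration, not a replacement for gRPC, AMQP, or Kafka: the function and class names (`get_user_name`, `run_worker`, `Broadcaster`) are hypothetical, and the queues stand in for a real message broker.

```python
import queue
import threading


# Synchronous call: the caller blocks until the callee returns a result.
def get_user_name(user_id: int) -> str:
    return f"user-{user_id}"


# Asynchronous call: the caller enqueues a message and moves on;
# exactly one consumer handles it later.
def run_worker(tasks: queue.Queue, handled: list) -> None:
    while True:
        message = tasks.get()
        if message is None:  # sentinel value: stop the worker
            break
        handled.append(f"sent email to {message}")


# Broadcast: every subscriber receives its own copy of the event.
class Broadcaster:
    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self) -> queue.Queue:
        inbox = queue.Queue()
        self.subscribers.append(inbox)
        return inbox

    def publish(self, event: str) -> None:
        for inbox in self.subscribers:
            inbox.put(event)


def demo() -> dict:
    # 1. Synchronous: we need the answer before continuing.
    name = get_user_name(42)

    # 2. Asynchronous: fire the task, let a worker process it.
    tasks = queue.Queue()
    handled = []
    worker = threading.Thread(target=run_worker, args=(tasks, handled))
    worker.start()
    tasks.put("alice")
    tasks.put(None)
    worker.join()

    # 3. Broadcast: two independent consumers both see the event.
    bus = Broadcaster()
    search_inbox = bus.subscribe()
    stats_inbox = bus.subscribe()
    bus.publish("user_created:42")

    return {
        "sync": name,
        "async": handled,
        "broadcast": [search_inbox.get(), stats_inbox.get()],
    }
```

The useful question for each interaction is whether the caller needs the answer now (synchronous), needs the work done eventually by one consumer (asynchronous), or needs to notify everyone who cares (broadcast).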

The choice needs to be weighed from several angles. There are plenty of “X vs Y” articles and Q&A threads on the Internet, but they really come down to a few perspectives: performance, maturity, ease of use, ecosystem, and the technical team. Our advice: prioritize technologies that have an active community and fit your team’s technology stack. An active community usually signals momentum and a healthy ecosystem, and compatibility with the team’s stack is what makes a successful rollout achievable.

For example, Scallop chose gRPC as the interface protocol for synchronous calls, for two main reasons: 1. The community is active and the project is developing rapidly. 2. It is based on HTTP/2 and fits our technology stack.

Beyond protocol selection, managing interface documentation matters even more. Ideally, the interface documentation is tightly coupled to the code, the way a gRPC proto file is to the code generated from it. Otherwise people get lazy, and the code is likely to change while the documentation stays the same.
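
The "documentation follows the code" idea can be shown in miniature without gRPC. The sketch below is an analogy only: it derives a one-line interface description from a Python function's own signature and docstring, so the description cannot drift from the code. `get_word` and `render_doc` are hypothetical names; with gRPC the proto file plays this single-source-of-truth role instead.

```python
import inspect


def get_word(word_id: int, lang: str = "en") -> dict:
    """Return the dictionary entry for one word."""
    return {"id": word_id, "lang": lang}


def render_doc(func) -> str:
    """Build an interface description from the function itself."""
    signature = inspect.signature(func)   # parameters, defaults, return type
    summary = inspect.getdoc(func) or ""  # cleaned-up docstring
    return f"{func.__name__}{signature}: {summary}"


doc_line = render_doc(get_word)
```

If a parameter is renamed or a default changes, `doc_line` changes with it, which is the property the paragraph above asks for.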

In short, to do inter-service communication well:


  • Define the interface conventions: what should use synchronous calls, what should use asynchronous calls, and what should be broadcast, and which protocol each of them uses
  • Settle on the interface protocol, and think about how to manage interface documentation so that documentation and code stay tightly coupled

Service governance

Service governance is a very big topic; covering it properly would take more than a few articles. Here we briefly look at the problems service governance tries to solve, along with some current solutions and trends. Finally, using Scallop as an example, we briefly describe how service governance is done in a real production environment with around a million daily active users.

What is service governance

Moving to microservices brings many benefits, such as reducing complexity by splitting a complex system into services that a small development team can easily understand and maintain. It also brings many challenges: connecting microservices, service registration, service discovery, load balancing, monitoring, A/B testing, canary releases, rate limiting, access control, and so on. Addressing these challenges is what service governance is about.

Existing programs

Service governance has a long history, and it became especially prominent once microservices became widespread. The mainstream solutions are built on frameworks such as Spring Cloud or Dubbo. These solutions have two problems: 1. They are intrusive, which means switching frameworks requires changing a lot of code. 2. They are language-specific (Java), so they do not work if we use Go or Python, or if our microservices are not all written in Java.

Service Mesh

Service Mesh appeared seemingly out of nowhere in 2017 and opened our eyes. There is plenty of material about Service Mesh online, so we will not repeat it here. As we understand it, the core idea of Service Mesh is “proxying traffic”: the Service Mesh sends and receives all of a microservice’s traffic through proxies. By controlling these proxies, the functions related to service governance (service connection, registration, discovery, load balancing, circuit breaking, monitoring, and so on) can all be implemented. In other words, the microservice code does not need to implement service governance itself; service governance is transparent to microservice developers. In the accompanying figure, the green squares are microservices, the blue squares are the Service Mesh proxies, and the blue lines are the communication between services. The blue squares and lines together form a grid, and this network is the Service Mesh.
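
The "proxy traffic" idea can be sketched in a few lines. This is a toy, in-process stand-in, not how Envoy or Linkerd is implemented: `SidecarProxy` and `word_service` are hypothetical names, the addresses are made up, and round robin is just one possible load-balancing policy. The point it illustrates is that the business function contains no balancing or metrics logic.

```python
import itertools
from collections import Counter


class SidecarProxy:
    """Toy mesh proxy: picks a backend, forwards the request, counts results."""

    def __init__(self, backends):
        # backends: mapping of instance address -> callable that handles a request
        self._backends = dict(backends)
        self._round_robin = itertools.cycle(sorted(self._backends))
        self.stats = Counter()

    def call(self, request):
        address = next(self._round_robin)  # load balancing: round robin
        self.stats["rq_total"] += 1
        try:
            response = self._backends[address](request)
        except Exception:
            self.stats["rq_error"] += 1    # monitoring happens in the proxy
            raise
        self.stats["rq_success"] += 1
        return address, response


def word_service(request):
    # Business code only: no discovery, balancing, or metrics logic here.
    return request.upper()


proxy = SidecarProxy({
    "10.0.0.1:50051": word_service,
    "10.0.0.2:50051": word_service,
})
first = proxy.call("hello")
second = proxy.call("world")
```

Consecutive calls land on different instances and the counters fill up without `word_service` knowing about either, which is the transparency described above.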


It is generally accepted that there are two generations of Service Mesh: the first generation, Linkerd and Envoy, and the second generation, Istio and Conduit. The first generation is relatively mature and stable and can be used directly in production, while the second generation is still immature at the time of writing (early 2018) and is not recommended for production use.

Scallop’s Service Mesh is built on Envoy together with Kubernetes.

gRPC service discovery, load balancing, and rate limiting

This section uses Scallop as an example to briefly describe how we do service governance.

Some background first: all Scallop microservices are containerized and orchestrated by Kubernetes; synchronous calls use gRPC and asynchronous calls use RabbitMQ. The main development languages are Python 3, Node.js, and Go. The Service Mesh is based on Envoy.

The overall solution, illustrated in the accompanying figure, is as follows: Envoy is deployed on every Kubernetes Node as a DaemonSet, using the Host network mode. All microservice Pods send their gRPC requests to the Envoy on their own Node, which load-balances them.
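
One practical consequence of this layout is that each Pod has to know the address of the Envoy on its own Node. A minimal sketch of one way to do that follows. It rests on assumptions not stated in the article: that the Pod spec exposes the Node IP via the Kubernetes Downward API (`status.hostIP`) in an environment variable named `HOST_IP`, and that Envoy listens on port 9211. Both the variable names and the port are hypothetical.

```python
import os

# Hypothetical default port for the node-local Envoy listener.
DEFAULT_ENVOY_PORT = 9211


def envoy_target(environ=os.environ) -> str:
    """Return the host:port a gRPC client should dial to reach the node-local Envoy.

    Assumes HOST_IP is injected from the Downward API field status.hostIP.
    """
    host_ip = environ.get("HOST_IP")
    if not host_ip:
        raise RuntimeError("HOST_IP is not set; cannot locate the node-local Envoy")
    port = int(environ.get("ENVOY_PORT", DEFAULT_ENVOY_PORT))
    return f"{host_ip}:{port}"
```

A Python gRPC client would then open its channel against this target, for example with `grpc.insecure_channel(envoy_target())`, instead of dialing the destination service directly.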


Here is a more detailed explanation.


  1. In Envoy, a Route is similar to an Nginx location, a Cluster is similar to an Nginx upstream, and an Endpoint corresponds to an entry in an Nginx upstream.
  2. We chose Envoy rather than Linkerd because, at the time, Envoy had the best HTTP/2 support and consumed fewer resources.
  3. We chose the Host network mode to maximize performance.
  4. For Envoy, service discovery means telling Envoy the IPs of the instances that provide the service behind each Cluster (in Kubernetes terms, the Pod IPs).
  5. Initially, service discovery relied on Kubernetes DNS, so each Service had to be created with ClusterIP: None.
  6. Later, service discovery moved to an Envoy EDS implementation built on the Kubernetes Endpoint API (we will open-source this project on GitHub later).
  7. For Envoy, circuit breaking of this kind only requires implementing a rate limit service. Ours is built on lyft/ratelimit.
  8. All calls between microservices show up in Envoy’s statistics, so we build monitoring and alerting for service calls on the statistics API.
  9. Similarly, call logging can be implemented with Envoy’s access log.
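
The EDS approach mentioned in the list above boils down to a translation step: read the Pod IPs from a Kubernetes Endpoints object and hand them to Envoy as the endpoints of a Cluster. The sketch below shows only that translation as a pure function. It is not the authors' project: the function name is hypothetical, the input is assumed to be an Endpoints object already parsed from JSON, and the output uses the field names of Envoy's `ClusterLoadAssignment` message from the v2 and later xDS APIs, which may differ from the API version Scallop used in early 2018.

```python
def endpoints_to_cluster_load_assignment(cluster_name: str, k8s_endpoints: dict) -> dict:
    """Convert a Kubernetes Endpoints object into an Envoy ClusterLoadAssignment (JSON form)."""
    lb_endpoints = []
    for subset in k8s_endpoints.get("subsets") or []:
        # "addresses" lists ready Pods only; not-ready Pods are reported separately.
        for address in subset.get("addresses") or []:
            for port in subset.get("ports") or []:
                lb_endpoints.append({
                    "endpoint": {
                        "address": {
                            "socket_address": {
                                "address": address["ip"],
                                "port_value": port["port"],
                            }
                        }
                    }
                })
    return {
        "cluster_name": cluster_name,
        "endpoints": [{"lb_endpoints": lb_endpoints}],
    }


# Example input shaped like a Kubernetes Endpoints object (values are made up).
sample_endpoints = {
    "subsets": [
        {
            "addresses": [{"ip": "10.2.0.5"}, {"ip": "10.2.0.6"}],
            "ports": [{"port": 50051}],
        }
    ]
}
assignment = endpoints_to_cluster_load_assignment("word-service", sample_endpoints)
```

A real EDS server would additionally watch the Endpoints API for changes and push or serve updated assignments to Envoy; that part is omitted here.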

To be continued

Beyond service governance, upcoming articles will cover our DevOps practices, our monitoring and alerting system, how we built our logging system, and more. Finally, if you are also interested in microservices, Kubernetes, Service Mesh, and DevOps, please join us!


You are welcome to follow our Zhihu column: Scallop Technical Architecture Team