Istio is one of the hottest projects in the Service Mesh space. It is responsible for managing traffic in the mesh, including dynamic service discovery, service routing, and resilience features. Istio serves as the control plane of the Service Mesh and, together with the Envoy data plane, forms the mesh's traffic management system.

The traffic management configuration and related mechanisms in Istio are quite complicated, so this article describes them as a whole, without dwelling on details, in order to build a macro-level understanding.

Why do we need a Service Mesh

To understand why an architecture like Service Mesh was born in the first place, we need to compare it with two things: the traditional microservice architecture and native Kubernetes.

Traditional microservices

We know that existing microservice architectures are mature, with complete service registration/discovery and service governance features (rate limiting, circuit breaking, load balancing, traffic routing, and so on), so what still bothers us? Almost all existing service frameworks are integrated as a client SDK: developers embed the framework SDK, which carries the governance features, into their own applications. This leads to two almost inevitable problems. First, applications must update their code to upgrade the client SDK, which can produce all kinds of conflicts. Second, integrating and using the client SDK correctly depends on the skill of each developer, which often leads to inefficiency.

In this case, we naturally think of using a sidecar: move the logic from the SDK into a sidecar process and attach it to the application service at deployment time, forming a pattern more advanced than SDK integration. This approach has little to no intrusion into the application code and does not depend on developer skill, completely separating control from logic. Of course, we trade some performance for this ease of use.

The sidecar pattern is one of the main advantages of Service Mesh over traditional microservices; the sidecar is what the Service Mesh architecture calls the "data plane".

Native Kubernetes

So what are we not satisfied with in native Kubernetes? Kubernetes uses kube-dns and kube-proxy together with the Service abstraction to support service registration and discovery, which means we already have the foundation for building microservices on Kubernetes. However, in actual production we often need to manage traffic more finely: we want to manage the traffic of a single pod or a group of pods, not just at the Service level. For example, one requirement is that some of the pods behind a set of endpoints run version v1 and the rest run version v2, with requests prefixed /v1 routed to v1 and requests prefixed /v2 routed to v2.
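The semantics of this requirement can be sketched in a few lines of Go. This is a hypothetical illustration only; in Istio the matching is configured declaratively rather than hand-coded, and the default subset here is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// subsetFor sketches the routing rule described above: requests prefixed
// with /v1 go to the v1 subset of pods, /v2 to the v2 subset. Plain
// Kubernetes Services cannot express this; a mesh data plane can.
func subsetFor(path string) string {
	switch {
	case strings.HasPrefix(path, "/v1"):
		return "v1"
	case strings.HasPrefix(path, "/v2"):
		return "v2"
	default:
		return "v1" // assumed default subset for unmatched paths
	}
}

func main() {
	fmt.Println(subsetFor("/v1/orders")) // v1
	fmt.Println(subsetFor("/v2/orders")) // v2
}
```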

The sidecar is deployed as the data plane in the same pod as each application, which means we can manage traffic at the pod level. This logic is driven by the other layer of the Service Mesh: the "control plane".

Istio architecture

Istio is the most popular Service Mesh implementation. Let's take a look at its overall architecture.

Istio is divided into the data plane and the control plane. The data plane is deployed alongside applications in the form of sidecars to send and receive their traffic, while the control plane expresses the network logic as configuration and control messages and pushes them down to the data plane.

Let’s talk briefly about the components:

  • Envoy: Envoy is a high-performance proxy developed in C++ that serves as Istio's data plane, controlling the inbound and outbound traffic of applications. In Istio it provides dynamic service discovery, load balancing, HTTP/2 and gRPC proxying, circuit breaking, health checks, fault injection, and other features; of course, these rely on instructions from the control plane.

  • Mixer: Mixer is the Istio control plane component that performs access control and telemetry collection (no longer present in recent versions).

  • Pilot: Pilot is the key component for traffic management, providing service discovery, intelligent routing, and resilience features (timeouts/retries/circuit breaking) to the Envoys. We describe it in detail below.

  • Citadel: Citadel supports powerful service-to-service and end-user authentication and traffic encryption with built-in identity and certificate management.

  • Galley: Galley is the configuration validation, extraction, processing, and distribution component of Istio.

Concept of traffic management in Istio

Traffic management is the most basic and important feature in Istio. As we mentioned earlier, our real need is often to direct traffic to a specific pod or set of pods. This could be done by hand, but it is more elegant to delegate it to Istio itself, which takes care of it through Pilot and the Envoy proxies.

How is this done? Since the Kubernetes Service abstraction does not meet our needs, it is natural to solve the problem with some higher-level abstractions, and that is exactly what Istio does. Let's go through the basic concepts (configuration examples are skipped for space):

  • VirtualService: As the name implies, a layer of service abstraction. It defines routing rules that control how matching traffic is routed to subsets of the destination service. In addition, you can set independent network-resilience properties for each route, such as timeouts and retries.

  • DestinationRule: Defines the policies applied to the destination service after VirtualService routing takes effect, such as circuit breaking and load balancing. It also defines the routable subsets, so the routing rules in a VirtualService can refer entirely to subsets declared in the DestinationRule.

  • ServiceEntry: Adds services outside the Istio service mesh to the internal service registry, so that services in the mesh can access external services.

  • Gateway: Controls north-south traffic. Binding a VirtualService to a Gateway controls inbound HTTP/TCP traffic.

  • EnvoyFilter: Configures filters for the Envoys, mainly to dynamically extend Envoy's capabilities.
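The relationship between the first two resources can be made concrete with a minimal sketch. The Go types below are hypothetical simplifications, not the real CRD schemas: a DestinationRule names subsets via label selectors, and a VirtualService route refers to a subset by name:

```go
package main

import (
	"fmt"
	"strings"
)

// DestinationRule (simplified): names routable subsets of a host's endpoints.
type DestinationRule struct {
	Host    string
	Subsets map[string]string // subset name -> label selector, e.g. "version=v1"
}

// VirtualService (simplified): maps a URI prefix to a subset of the host.
type VirtualService struct {
	Host   string
	Routes []Route
}

type Route struct {
	URIPrefix string
	Subset    string
}

// resolve evaluates the VirtualService routes for a path and looks up the
// matched subset's label selector in the DestinationRule.
func resolve(vs VirtualService, dr DestinationRule, path string) (string, bool) {
	for _, r := range vs.Routes {
		if strings.HasPrefix(path, r.URIPrefix) {
			sel, ok := dr.Subsets[r.Subset]
			return sel, ok
		}
	}
	return "", false
}

func main() {
	dr := DestinationRule{
		Host:    "reviews",
		Subsets: map[string]string{"v1": "version=v1", "v2": "version=v2"},
	}
	vs := VirtualService{
		Host:   "reviews",
		Routes: []Route{{"/v1", "v1"}, {"/v2", "v2"}},
	}
	sel, _ := resolve(vs, dr, "/v2/ratings")
	fmt.Println(sel) // version=v2
}
```

Note the delegation mentioned above: the VirtualService never mentions pod labels directly; it only names a subset, and the DestinationRule owns the mapping from subset to labels.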

Traffic management in the control plane: Pilot

In the control plane of Istio, Pilot is the component responsible for traffic management and the protagonist of this article. The overall architecture of Pilot is as follows (the diagram comes from the old pilot repo, but the structure is largely unchanged):

We can see that Pilot mainly consists of two components: Discovery Services and the Agent.

  • Agent: This corresponds to the pilot-agent process, which generates the Envoy configuration and manages the Envoy life cycle. It is deployed in the same pod as the proxy (i.e., Envoy) and the application service A/B.

  • Discovery Services: This corresponds to the pilot-discovery process, which carries the most critical logic in Pilot: service discovery and traffic management. Discovery Services are typically deployed in a separate Deployment from the applications. The process deals with two kinds of data. One is service information from the Kubernetes API Server shown in the figure: Service, Endpoints, Pod, Node, and other resources. The other is CRD resources in the Kubernetes API Server, including the traffic rule configuration of the control-plane objects described above: VirtualService, DestinationRule, Gateway, ServiceEntry, and so on. Discovery Services converts these two kinds of data into a format the data plane understands and pushes them to the sidecars via standard APIs.

Pilot Agent

The Pilot Agent is not the protagonist of this article, so we introduce it only briefly. Its main work includes:

  • Generating Envoy-related configuration. This means only a few static configuration files, since most dynamic configuration is retrieved from Pilot via the standard xDS interface.

  • Monitoring and managing the Envoy process, for example restarting Envoy after it crashes or triggering a reload after configuration changes.

  • Starting the Envoy process.

Pilot-discovery

Pilot-discovery is the key component of Pilot, so we describe it in more detail.

The overall model

The overall model of Pilot-Discovery is as follows:

Pilot-discovery has two sources of input:

  • Configuration from the Istio control plane (the Rules API in the figure), including VirtualService, DestinationRule, and so on. This information is stored in the Kubernetes API Server as Kubernetes CRD resources.

  • Service registration information from a service registry such as Kubernetes, Mesos, or Cloud Foundry. We assume Kubernetes by default, as do the descriptions below.

To support different service registries, there needs to be a unified data model, the Abstract Model, and a converter, the Platform Adapter, that transforms data from each service registry into the Abstract Model. Besides registry data, the Platform Adapter also converts CRD resources such as VirtualService and DestinationRule into the Abstract Model.

Based on the Abstract Model, pilot-discovery exposes control information services, known as the xDS API services, to deliver control information to the Envoys on the data plane.
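The adapter idea can be sketched as a small Go interface. The type and method names below are hypothetical, chosen only to illustrate the pattern of many registries feeding one abstract model:

```go
package main

import "fmt"

// ServiceInstance is a sketch of one record in the abstract model: a unified
// representation of a service endpoint, independent of where it came from.
type ServiceInstance struct {
	Service string
	Address string
	Labels  map[string]string
}

// PlatformAdapter is a hypothetical converter interface: each registry
// (Kubernetes, Mesos, ...) implements it to feed the shared abstract model.
type PlatformAdapter interface {
	Instances(service string) []ServiceInstance
}

// kubeAdapter is a stub standing in for the Kubernetes registry adapter.
type kubeAdapter struct{}

func (kubeAdapter) Instances(service string) []ServiceInstance {
	// A real adapter would list Endpoints from the API server here.
	return []ServiceInstance{{
		Service: service,
		Address: "10.0.0.5:8080",
		Labels:  map[string]string{"version": "v1"},
	}}
}

func main() {
	var registry PlatformAdapter = kubeAdapter{}
	for _, inst := range registry.Instances("reviews") {
		fmt.Println(inst.Service, inst.Address, inst.Labels["version"])
	}
}
```

The point of the indirection is that the xDS layer only ever sees `ServiceInstance` values, so adding a new registry means writing one adapter, not touching the discovery services.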

Let's walk through the whole workflow of pilot-discovery.

Initialization work

The first thing pilot-discovery does is initialization, which includes:

  • Creating a Kubernetes client. To interact with the Kubernetes API Server, pilot-discovery needs a kubeClient. There are two ways to create one: use a specific kubeconfig file to connect to the API Server, or use in-cluster config, which configures the client automatically from the cluster context when running inside a Kubernetes cluster.

  • Multi-cluster configuration. In reality we may have multiple Kubernetes clusters. One option is to deploy one Istio per cluster, but multiple clusters can also share a single Istio. In that case Istio needs to connect to the other Kubernetes clusters, called remote clusters, and it keeps the access information for each remote cluster in a map.

  • Mixer-related configuration, which is not described in detail here.

  • Initializing the connection to the configuration store. As mentioned earlier, many traffic management configurations in Istio, including VirtualService and DestinationRule, need to be kept in a configuration store (Istio does not access etcd directly). Istio supports either file storage or Kubernetes CRDs for these configurations, mostly the latter. Once the CRD resources are registered, a Config Controller is created to handle their CRUD events.

  • Configuring the connection to the service registry, which in practice means Kubernetes. As mentioned above, pilot-discovery needs to watch Pods, Services, Endpoints, and so on in Kubernetes, and a Service Controller is created to handle their events.

  • Initializing the discovery service. Because the Envoy sidecars connect to the pilot-discovery service to obtain service registration/discovery information and traffic control policies, the discovery service must be initialized: both the REST service and the gRPC service.

  • Setting up health checks and monitoring.
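The kubeClient decision in the first step can be sketched with the standard library alone. This is a hypothetical illustration: the real code uses client-go (`rest.InClusterConfig` and `clientcmd`), and the token path below is the conventional service-account mount, used here only as a stand-in for "am I inside a cluster?":

```go
package main

import (
	"fmt"
	"os"
)

// configSource picks between the two client-creation modes described above:
// in-cluster config when running inside a pod (detected, as a simplification,
// by the presence of the mounted service-account token), otherwise an
// explicit kubeconfig file.
func configSource(tokenPath, kubeconfig string) string {
	if _, err := os.Stat(tokenPath); err == nil {
		return "in-cluster"
	}
	return "kubeconfig:" + kubeconfig
}

func main() {
	src := configSource(
		"/var/run/secrets/kubernetes.io/serviceaccount/token",
		os.Getenv("HOME")+"/.kube/config",
	)
	fmt.Println(src)
}
```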

Processing CRD information such as traffic policies

As we mentioned before, traffic policies and related rules in Istio, including VirtualService and DestinationRule, are stored as Kubernetes CRDs in the etcd behind the Kubernetes API Server. When these CRD resources are registered, a Config Controller is created to handle their events.

The Config Controller mainly implements a list/watch for each CRD resource and a unified processing flow for its Add, Update, and Delete events: each CRD object event is wrapped into a Task and pushed into a queue. A goroutine then processes the CRD resource events in turn, fetching each Task from the queue and invoking its ChainHandler.
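The queue-plus-worker flow can be sketched as follows. This is a hypothetical simplification of the controller pattern, not pilot's actual code; the real ChainHandler carries more context than a plain function slice:

```go
package main

import (
	"fmt"
	"sync"
)

// Event sketches a CRD change event pushed onto the controller queue.
type Event struct {
	Kind   string // Add / Update / Delete
	Object string // e.g. a VirtualService name
}

// Task pairs an event with the handler chain (a simplified ChainHandler)
// that should process it.
type Task struct {
	Event    Event
	Handlers []func(Event)
}

// drain starts one worker goroutine that pops tasks from the queue and runs
// each handler in the task's chain, in order, until the queue is closed.
func drain(queue chan Task) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for task := range queue {
			for _, h := range task.Handlers {
				h(task.Event)
			}
		}
	}()
	return &wg
}

func main() {
	queue := make(chan Task, 8)
	wg := drain(queue)
	log := func(e Event) { fmt.Println(e.Kind, e.Object) }
	queue <- Task{Event{"Add", "reviews-vs"}, []func(Event){log}}
	queue <- Task{Event{"Update", "reviews-vs"}, []func(Event){log}}
	close(queue)
	wg.Wait()
}
```

Serializing events through a single queue is what lets the handlers stay free of locking: only the worker goroutine touches the derived state.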

Service registration information processing

The service registry here refers by default to the Kubernetes registry (other registries are supported but less common), i.e., Pod, Service, Endpoints, Node, and other resources. As mentioned earlier, Pilot creates a Service Controller to watch and handle these Kubernetes resources.

The logic of the Service Controller is basically the same as that of the Config Controller: implement a list/watch for each Kubernetes resource and wrap its Add, Update, and Delete events into Task objects pushed into a queue. A goroutine then fetches them from the queue and calls the ChainHandler.

Expose the control message service for Envoy

We know that the Envoys interact with the information services exposed by Pilot to obtain service discovery information, routing policies, and so on. Pilot-discovery creates a gRPC discovery service that serves the Envoys via the xDS API, including EDS, CDS, RDS, LDS, and more. The details are described below.

The data plane: Envoy and the xDS services

The data plane is an intelligent proxy deployed as a sidecar; in Istio it is Envoy. Envoy mediates all network traffic between services in the mesh and is the de facto enforcer of the control plane's traffic management.

Envoy's xDS is a set of APIs designed for communication from the Istio control plane to the data plane, and is the key to delivering traffic management configuration.

Basic concepts in Envoy

First, before looking at xDS, we need to understand some basic concepts in Envoy:

  • Host: An entity that can communicate on the network. A host in Envoy is a logical network application; a single piece of physical hardware can run multiple hosts as long as they are individually addressable.

  • Downstream: A downstream host connects to Envoy, sends requests, and receives responses.

  • Upstream: An upstream host receives connections and requests from Envoy and returns responses.

  • Cluster: A set of upstream hosts that Envoy connects to. Envoy discovers the members of a cluster through service discovery and can use active health checking to determine their health.

  • Endpoint: Identifies an upstream host, including its address and health check configuration.

  • Listener: Envoy exposes one or more listeners to downstream hosts. When a listener receives a request, its processing is abstracted into filters, such as ReadFilter, WriteFilter, HttpFilter, and so on.

  • Filter: A pluggable and composable layer of processing logic; filters are Envoy's core logic-processing units.

  • HTTP route table: The HTTP routing rules, for example which domain names are forwarded to which clusters.
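The filter abstraction above can be sketched in a few lines. This is a hypothetical model of the idea only; real Envoy filters operate on buffered network data and connection state, not strings:

```go
package main

import "fmt"

// Filter sketches Envoy's pluggable processing unit: each filter may
// transform the request and decide whether the chain continues.
type Filter func(req string) (string, bool)

// runChain applies filters in order, stopping if one rejects the request.
func runChain(filters []Filter, req string) (string, bool) {
	for _, f := range filters {
		var ok bool
		if req, ok = f(req); !ok {
			return req, false
		}
	}
	return req, true
}

func main() {
	addHeader := func(req string) (string, bool) { return req + " +x-header", true }
	denyAdmin := func(req string) (string, bool) { return req, req != "GET /admin" }

	out, ok := runChain([]Filter{denyAdmin, addHeader}, "GET /products")
	fmt.Println(out, ok) // GET /products +x-header true
	_, ok = runChain([]Filter{denyAdmin, addHeader}, "GET /admin")
	fmt.Println(ok) // false
}
```

Composability is the point: routing, authentication, and telemetry can each be a filter, and a listener's behavior is just the ordered chain of them.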

Envoy architecture

The overall architecture of the Envoy is as follows:

In Envoy's workflow, the listeners accept requests from downstream hosts, the processing of each request is abstracted into filter chains, and, based on the traffic-policy configuration, requests are routed to the upstream host clusters, completing routing and forwarding, load balancing, and traffic policy enforcement.

The traffic-policy configuration mentioned above is mainly delivered dynamically through the xDS API, and that is the focus of our attention. In addition, Envoy as a traffic proxy can have some of its configuration written statically to a file and loaded directly at startup.

The xDS services

In xDS, the x is a placeholder. In Istio these services include the Cluster Discovery Service (CDS), Endpoint Discovery Service (EDS), Route Discovery Service (RDS), and Listener Discovery Service (LDS); the Aggregated Discovery Service (ADS) can wrap all of these APIs.

In the concepts introduced by these APIs, an endpoint is a specific application instance corresponding to an IP address and port, similar to a Pod in Kubernetes. A cluster is an application cluster corresponding to one or more endpoints, similar to the Kubernetes Service concept (though actually finer-grained). A route comes into play when we do gray or canary releases: the same service runs multiple versions, each version corresponds to a cluster, and the route rules specify how requests are routed to the cluster for a given version.

In the actual request flow, the xDS interfaces are invoked in the following order:

  1. CDS updates the Cluster data first
  2. EDS updates the Endpoint information for those Clusters
  3. LDS updates the Listeners after the CDS/EDS updates
  4. RDS finally updates the Route configuration for those Listeners
  5. Any CDS/EDS configuration that is no longer used is deleted

At present, however, this flow has been consolidated into ADS, the Aggregated Discovery Service, which guarantees the call order of the individual xDS interfaces over a single gRPC stream.
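The ordering guarantee can be sketched as a tiny helper that sorts a batch of pending updates into the safe apply sequence. This is a hypothetical illustration of the make-before-break idea, not ADS's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// priority encodes the safe apply order from the list above: clusters and
// their endpoints must exist before listeners and routes reference them,
// and stale config is removed last.
var priority = map[string]int{"CDS": 0, "EDS": 1, "LDS": 2, "RDS": 3, "DELETE": 4}

// orderUpdates returns a batch of pending update types in the order they
// must reach Envoy, so a listener or route never points at a missing cluster.
func orderUpdates(updates []string) []string {
	out := append([]string(nil), updates...)
	sort.SliceStable(out, func(i, j int) bool {
		return priority[out[i]] < priority[out[j]]
	})
	return out
}

func main() {
	fmt.Println(orderUpdates([]string{"RDS", "DELETE", "CDS", "LDS", "EDS"}))
	// [CDS EDS LDS RDS DELETE]
}
```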

We mentioned earlier that pilot-discovery creates a discovery service. It establishes a bidirectional streaming gRPC connection with each Envoy, and each Envoy proxy issues requests through the ADS interface following the invocation order above. In addition, Pilot assembles its key traffic-management concepts, such as VirtualService and DestinationRule, into Envoy configuration: clusters, endpoints, routes, listeners, and so on. These dynamic configurations are eventually applied to each Envoy proxy, so that as the Envoy listeners receive downstream host requests, actual dynamic service discovery, traffic management, and other features take effect.

Resources

  1. Istio Handbook
  2. Istio source code analysis: the pilot-discovery module
  3. Istio source code analysis: the pilot-discovery module (continued)
  4. Envoy official documentation (Chinese translation)
  5. Istio Prelim 1.6 documentation