Istio Technology and Practice 05: How to Use Istio for Traffic Management

What is Istio?

Istio 1.0 was released in the early hours of August 1st, with its core features declared ready for production use, and the news quickly spread across WeChat official accounts and blogs. So what exactly is Istio, and what problems does it solve?

Istio is another open source project from Google, following Kubernetes. The main participating companies include Google, IBM, and Lyft. It provides a complete, non-invasive microservice governance solution that addresses application networking concerns such as microservice management, network connectivity, and security.
It enables load balancing, service-to-service authentication and authorization, and traffic monitoring and governance for microservices without modifying any application code. From an infrastructure perspective, it can be understood as a microservice-oriented management platform that complements the PaaS platform.

Istio and Kubernetes

Kubernetes provides deployment, upgrades, and limited runtime traffic management. It uses the Service mechanism for service registration, discovery, and forwarding, and kube-proxy gives it basic forwarding and load balancing. It does not, however, offer higher-level capabilities such as circuit breaking, rate limiting, degradation, and call-chain governance. Istio complements Kubernetes with these microservice governance capabilities and is built on top of Kubernetes, rather than rebuilding everything from scratch as Spring Cloud and Netflix do. Istio is a key part of Google's microservices governance.

Istio's traffic management capabilities

Istio is used to connect, secure, control, and observe services. Today we look at the first of these: connecting services. Three questions arise:

How does Istio connect services?
What traffic management capabilities are available after connection?
How can Istio be told to leverage these capabilities?

1. How does Istio connect services?

2. What traffic management capabilities are available after connection?

In terms of traffic management between services, Istio implements the following four functions: request routing, service discovery and load balancing, fault handling, and fault injection.

A. Request routing

In addition to splitting traffic between versions by percentage, you can route requests to different versions based on their content.
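
As a minimal sketch of content-based routing, the VirtualService below sends requests that carry a particular HTTP header to v2 and everything else to v1. The service name helloworld, the header name end-user, and the value jason are illustrative assumptions, and the v1 and v2 subsets are assumed to be defined in a matching DestinationRule.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld
  http:
  # Rules are evaluated in order; the first match wins.
  - match:
    - headers:
        end-user:        # hypothetical header name
          exact: jason   # hypothetical value
    route:
    - destination:
        host: helloworld
        subset: v2
  # Fallback for requests that did not match the rule above.
  - route:
    - destination:
        host: helloworld
        subset: v1
```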

B. Service discovery and load balancing

A service mesh involves three closely related processes: service registration, service discovery, and load balancing.

Container management platforms such as Kubernetes and Mesos typically already provide a service registry that tracks the workload instances of each service, so Pilot can easily read the registry for all services in the mesh and pass this information to the proxies of every service. Each proxy performs service discovery based on this information and dynamically updates its load-balancing pool. A service usually has multiple workload instances. When Service A calls Service B, different load-balancing modes can be configured: round robin, random, and weighted least request. Suppose a workload instance of Service B fails: because the proxy in Service A periodically performs service discovery, the failed instance is removed from its load-balancing pool.
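
The load-balancing mode is chosen per destination. A minimal sketch, assuming a service named helloworld and the v1alpha3 API used elsewhere in this article:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: helloworld
spec:
  host: helloworld
  trafficPolicy:
    loadBalancer:
      simple: RANDOM   # ROUND_ROBIN is the other common choice
```

The enum name for the least-request mode differs between Istio releases, so check the DestinationRule reference for the version you run before using it.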

C. Fault handling

Envoy provides a set of out-of-the-box, opt-in failure recovery features that the services in your application can benefit from. These features include:

Timeouts
Bounded retries with a timeout budget and variable jitter between retries
Limits on the number of concurrent connections and requests to upstream services
Active (periodic) health checks on each member of the load-balancing pool
Fine-grained circuit breakers (passive health checks), applied per instance in the load-balancing pool

Take Service A calling Service B as an example.

For function 1 (timeouts): if it is known that a request to Service B that has not returned within 10s is bound to fail, it is wise to shorten the timeout so that Service A learns the result sooner and can react to it.

For function 2 (retries): when Service B is overloaded, jitter between retries greatly reduces the extra load that retries put on it, and the timeout budget ensures that Service A gets a response (success or failure) within a predictable time frame.

For function 3 (connection and request limits): limiting the number of connections and requests that Service A or other services make to Service B protects Service B from DDoS attacks and from crashing under heavy traffic.

For functions 4 and 5 (active and passive health checks): combining the two minimizes the chance of reaching an unhealthy instance in the load-balancing pool. When combined with platform-level health checks, such as those supported by Kubernetes or Mesos, applications can ensure that unhealthy workload instances are quickly removed from the service mesh, minimizing request failures and their impact on latency.

Together, these features enable the service mesh to tolerate failing nodes and prevent local failures from cascading to other nodes.
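
The features above map onto Istio configuration roughly as follows. This is a sketch under stated assumptions rather than a recommended setup: the service name service-b and every numeric value are hypothetical, and the field names follow the v1alpha3 API of the 1.0 era, some of which (for example consecutiveErrors) may differ in later releases.

```yaml
# Timeouts and retries are set per route in a VirtualService.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service-b              # hypothetical service
spec:
  hosts:
  - service-b
  http:
  - route:
    - destination:
        host: service-b
    timeout: 5s                # overall budget for the request, retries included
    retries:
      attempts: 3              # number of retries for a request
      perTryTimeout: 1s        # budget for each individual attempt
---
# Connection limits and passive health checks are set in a DestinationRule.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: service-b
spec:
  host: service-b
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # cap on concurrent TCP connections
      http:
        http1MaxPendingRequests: 10  # cap on queued HTTP/1.1 requests
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutiveErrors: 5     # eject an instance after 5 consecutive errors
      interval: 10s            # how often instances are evaluated
      baseEjectionTime: 30s    # minimum time an instance stays ejected
      maxEjectionPercent: 50   # never eject more than half the pool
```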

D. Fault injection

Although the proxies provide the failure recovery mechanisms described in the previous section for services running on Istio, it is still necessary to test the end-to-end failure recovery of the application as a whole. Misconfigured failure recovery policies (for example, incompatible or overly restrictive timeouts across service calls) can leave critical services in an application persistently unavailable, damaging the user experience.

Istio can inject protocol-specific faults into the network, rather than killing workload instances or delaying or corrupting packets at the TCP layer. The reasoning is that the failures observed at the application layer are the same regardless of the failure at the network level, and that more meaningful failures (for example, HTTP 4xx and 5xx error codes) can be injected at the application layer to verify and improve application resilience.

Operators can configure faults for requests that meet certain criteria, and can further limit the percentage of requests that are subjected to faults. Two types of faults can be injected: delays and aborts. A delay is a timing fault that simulates increased network latency or an overloaded upstream service. An abort is a crash fault that simulates a failure in an upstream service. Aborts usually take the form of HTTP error codes or TCP connection failures.

Again, take Service A calling Service B as an example.

If a 10s delay or a 503 abort is configured for Service B, Service A will either wait at least 10s for the response or receive a 503 error. By testing a range of such scenarios, you can see how Service A behaves overall and make targeted improvements to increase its resilience.
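
A sketch of what such a fault could look like in a VirtualService. The service name service-b and the 10% shares are assumptions for illustration, and the percent field follows the v1alpha3 API of the 1.0 era (later releases express the share differently, so check the reference for your version).

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service-b          # hypothetical service
spec:
  hosts:
  - service-b
  http:
  - fault:
      delay:
        percent: 10        # share of requests that are delayed
        fixedDelay: 10s    # delay added before the request is forwarded
      abort:
        percent: 10        # share of requests that are aborted
        httpStatus: 503    # status code returned to the caller
    route:
    - destination:
        host: service-b
```

Delay and abort are configured independently, so either one can be used on its own to cover a single scenario.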

3. How can Istio be told to leverage these capabilities?

Istio has four configuration resources that let us customize all of the traffic management requirements above: VirtualService, DestinationRule, ServiceEntry, and Gateway:

By configuring VirtualService, you can implement request routing, as well as per-route timeouts, retries, and fault injection.
By configuring DestinationRule, you can define service subsets and the policies applied after routing: load balancing, connection limits, and circuit breaking.
By configuring ServiceEntry, services in the service mesh can reach the outside world.
By configuring Gateway, services in the service mesh can be reached from the outside world.

With these four tools, all of our traffic management needs for the service mesh can be met.

Given limited space, here are three simple examples:

Suppose our service mesh has one service, explorer, with only a v1 version, and another service, helloworld, with v1 and v2 versions.

(1) To send explorer's requests to helloworld v1 with 75% probability and to v2 with 25% probability, just configure the following two resources: VirtualService and DestinationRule.

# Split helloworld traffic 75/25 between subsets v1 and v2.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld
  http:
  - route:
    - destination:
        host: helloworld
        subset: v1
      weight: 75
    - destination:
        host: helloworld
        subset: v2
      weight: 25
---
# Define the subsets by the version label on the workload instances.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: helloworld
spec:
  host: helloworld
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

(2) If explorer needs to access www.google.com to see what the outside world looks like, configure the following two resources: ServiceEntry and DestinationRule:

# Register the external hosts so that services in the mesh can reach them.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: googleapis
spec:
  hosts:
  - "*.google.com"
  ports:
  - number: 443
    name: https
    protocol: HTTPS   # TLS traffic on port 443
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: googleapis
spec:
  host: "*.google.com"

(3) If helloworld needs to be accessible from outside the service mesh, and not only by explorer, configure the following two resources: Gateway and VirtualService:

# Expose port 80 on the ingress gateway for helloworld.com.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: helloworld-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - 'helloworld.com'
---
# Route traffic arriving through the gateway to the helloworld service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - 'helloworld.com'
  gateways:
  - helloworld-gateway
  http:
  - route:
    - destination:
        host: helloworld
        port:
          number: 9080

To summarize: Pilot and the proxies provided by Istio weave hundreds of services into a service mesh. On top of it, we can implement traffic management capabilities such as request routing, service discovery and load balancing, fault handling, and fault injection. All of this takes only simple configuration of four resources: VirtualService, DestinationRule, ServiceEntry, and Gateway.