1 Introduction

The name Istio comes from the Greek word for "sail."

Istio is a fully open source service mesh that layers transparently onto existing distributed applications. It manages traffic between services, enforces access policies, and aggregates telemetry data, all without requiring changes to application code. By layering onto existing distributed applications transparently, Istio reduces deployment complexity.

It is also a platform that can integrate with any logging, telemetry, or policy system. It serves microservice architectures, providing a unified approach to securing, connecting, and monitoring microservices.

Architecturally, Istio adds a control plane on top of the existing data plane of proxies.

Why did it become so popular?

  • Timely release (version 0.1 in May 2017)
  • Backing from major vendors (Google, IBM, and Lyft)
  • A second-generation service mesh
  • A boost from building on the proven Envoy proxy
  • A powerful feature set

Advantages

  • Easy to build a service mesh
  • No changes to application code required
  • Powerful feature set

2 Core Functions

2.1 Traffic Management

Traffic management covers routing and traffic splitting, traffic entering and leaving the mesh, and network resilience and testing.

2.1.1 Core Resources (CRDs)

Virtual services and destination rules are the key building blocks of Istio traffic routing.

2.1.1.1 Virtual Service

Virtual services let you configure how requests are routed to services within the service mesh, building on the basic connectivity and service discovery provided by Istio and the platform. Each virtual service contains a set of routing rules that Istio evaluates in order, matching each request to a specific real destination. Your mesh can have many virtual services or none, depending on your use case.

  • Routes traffic to a specified destination
  • Decouples the request address from the real workload
  • Contains a set of routing rules
  • Usually paired with destination rules
  • Supports rich routing match conditions
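As a sketch of how these pieces fit together (the `reviews` host, subsets, and header match below are illustrative, in the style of Istio's Bookinfo sample), a virtual service might look like:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews                 # the request address, decoupled from real workloads
  http:
  - match:                  # rich match conditions: headers, URI, method, ...
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2          # subset defined in a paired destination rule
  - route:                  # fallback rule, evaluated last
    - destination:
        host: reviews
        subset: v1
```

Rules are evaluated top to bottom, and the first match wins, so the catch-all route goes last.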

2.1.1.2 Destination Rule

A destination rule defines the real destinations, the subsets, that a virtual service's routes resolve to, and sets the load-balancing policy:

  • Random
  • Weighted
  • Least requests
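A sketch of a matching destination rule (the `reviews` host and version labels are illustrative); it defines the subsets that a virtual service can route to and configures load balancing:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM        # also ROUND_ROBIN, LEAST_CONN, PASSTHROUGH
  subsets:
  - name: v1
    labels:
      version: v1           # selects pods labeled version=v1
  - name: v2
    labels:
      version: v2
```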

2.1.1.3 Gateway

A gateway manages inbound and outbound traffic at the edge of the mesh. An ingress gateway handles incoming traffic; an egress gateway is optional.
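A minimal ingress gateway sketch (the resource name and host are hypothetical); a virtual service then binds to it via its `gateways:` field to route the traffic it admits:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: demo-gateway        # hypothetical name
spec:
  selector:
    istio: ingressgateway   # binds to Istio's default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "demo.example.com"    # hypothetical host
```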

2.1.1.4 Service Entry

  • A service entry adds an entry to the service registry that Istio maintains internally, registering an external service with the mesh.

Once the service entry is added, the Envoy proxies can send traffic to the external service as if it were inside the mesh.

Configuring service entries lets you manage traffic for services running outside the mesh, including the following capabilities:

  • Redirect and forward requests to external targets, such as APIs consumed from the web or traffic to services in legacy systems.
  • Define retry, timeout, and fault injection policies for external targets.
  • Add a service running on a virtual machine to extend the mesh.
  • Logically add services from a different cluster to the mesh, to build a multi-cluster Istio mesh on Kubernetes.
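A sketch of a service entry for an external HTTPS API (the name and host are hypothetical):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api        # hypothetical name
spec:
  hosts:
  - api.example.com         # external host to register in the mesh
  location: MESH_EXTERNAL   # the workload runs outside the mesh
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS           # resolve the host via DNS
```

With this in place, virtual services and destination rules can apply timeouts, retries, and other policies to `api.example.com` just like an in-mesh service.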

You do not need to add a service entry for every external service your mesh uses. By default, Istio configures the Envoy proxies to pass through requests to unknown services. However, you cannot use Istio's features to control traffic to destinations that are not registered in the mesh.

2.1.1.5 Sidecar

By default, Istio lets every Envoy proxy accept requests on all ports of its associated workload and forward traffic to every workload in the mesh. A Sidecar configuration lets you:

  • Fine-tune the set of ports and protocols the Envoy proxy accepts.
  • Limit the set of services the Envoy proxy can reach.

You may want to limit sidecar reachability in larger applications: configuring every proxy to reach every service in the mesh can hurt mesh performance due to high memory usage.

You can apply a Sidecar configuration to all workloads in a particular namespace, or select specific workloads with a workloadSelector.
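A sketch of a namespace-wide Sidecar that restricts egress reachability (the namespace is hypothetical):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: demo           # hypothetical namespace
spec:
  egress:
  - hosts:
    - "./*"                 # reach only services in the same namespace...
    - "istio-system/*"      # ...plus the Istio control plane namespace
```

This keeps each proxy's configuration small, since Istio no longer pushes it routes for every service in the mesh.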

2.1.2 Network Resilience and Testing

In addition to routing traffic in your mesh, Istio provides opt-in failure recovery and fault injection features that you can configure dynamically at runtime. These features help applications run reliably, ensure the service mesh tolerates failing nodes, and prevent local failures from cascading to other nodes.

Timeouts

A timeout is the amount of time an Envoy proxy waits for a response from a given service, ensuring that callers do not hang indefinitely and that calls succeed or fail within a predictable window. The default timeout for HTTP requests is 15 seconds: if the service does not respond within 15 seconds, the call fails.

For some applications and services, Istio's default timeout may not be appropriate. Too long a timeout adds excessive delay while waiting on a failed service; too short a timeout can fail an operation unnecessarily while it waits on multiple upstream services. To find and use the best timeout settings, Istio lets you adjust timeouts dynamically, per service, in a virtual service without touching business code.
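For example, a per-route timeout override might look like this (the `ratings` host is illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    timeout: 10s            # fail the call if no response within 10s
```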

Retries

The retry setting specifies the maximum number of times an Envoy proxy attempts to reach a service if the initial call fails. Retries improve service availability and application performance by ensuring calls do not fail permanently due to transient problems such as a temporarily overloaded service or network. The interval between retries (25ms or more) is variable and determined automatically by Istio, preventing the called service from being overwhelmed with requests. By default, HTTP requests are retried twice before an error is returned.

As with timeouts, Istio's default retry behavior may not suit your application's latency needs (too many retries against a failed service can slow things down) or availability needs. You can adjust retry settings on a per-service basis in a virtual service without modifying business code. You can further refine retry behavior by adding a per-retry timeout, specifying how long each attempt may wait to successfully connect to the service.
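A retry policy sketch with a per-try timeout (the `ratings` host is illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3                   # up to 3 retries after the initial failure
      perTryTimeout: 2s             # budget for each individual attempt
      retryOn: 5xx,connect-failure  # which failures are worth retrying
```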

Circuit breakers

Circuit breakers are another useful mechanism Istio provides for building resilient microservice applications. In a circuit breaker, you set limits on calls to individual hosts within a service, such as the number of concurrent connections or the number of failed calls to that host. Once a limit is reached, the circuit breaker "trips" and stops further connections to the host. Circuit breaking lets you fail fast rather than have clients keep trying to reach an overloaded or failing host.

Circuit breaking applies to the "real" mesh destinations in the load-balancing pool; you configure the thresholds in a destination rule, and they apply to each individual host in the service.
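A circuit-breaking sketch in a destination rule, combining connection-pool limits with outlier detection to eject failing hosts (the `demo` host and thresholds are illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: demo
spec:
  host: demo                    # hypothetical service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100     # cap concurrent connections per host
      http:
        http1MaxPendingRequests: 10
    outlierDetection:           # eject misbehaving hosts from the pool
      consecutive5xxErrors: 5   # trip after 5 consecutive 5xx responses
      interval: 10s             # how often hosts are scanned
      baseEjectionTime: 30s     # minimum ejection duration
```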

Fault injection

After you have configured your network, including failure recovery policies, you can use Istio's fault injection mechanism to test the failure recovery capacity of your application as a whole. Fault injection is a testing method that introduces errors into a system to verify that it can withstand and recover from error conditions. It is particularly useful for making sure your failure recovery policies are compatible with one another and not too restrictive, which could leave critical services unavailable.

Unlike other error injection mechanisms, such as delaying packets or killing pods at the network layer, Istio allows error injection at the application layer. This allows you to inject more relevant faults, such as HTTP error codes, to get more relevant results.

There are two types of faults you can inject, both configured in a virtual service:

  • Delays

Delays are timing failures. They simulate increased network latency or an overloaded upstream service.

  • Aborts

Aborts are crash failures. They mimic the failure of an upstream service, usually in the form of an HTTP error code or a TCP connection failure.
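Both fault types can be sketched in one virtual service (the `ratings` host and percentages are illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10.0       # delay 10% of requests...
        fixedDelay: 5s      # ...by a fixed 5 seconds
      abort:
        percentage:
          value: 1.0        # abort 1% of requests...
        httpStatus: 500     # ...with an HTTP 500
    route:
    - destination:
        host: ratings
```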

Traffic mirroring

Traffic mirroring, also called shadowing, is a powerful way to bring changes to production with as little risk as possible. Mirroring sends a copy of live traffic to a mirrored service, outside the critical request path of the primary service.

In a typical task, all traffic is first routed to the v1 version of a test service; a rule then mirrors a portion of that traffic to v2.
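That flow can be sketched as follows (the `demo` host and mirror percentage are illustrative; responses from the mirror are discarded):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: demo
spec:
  hosts:
  - demo
  http:
  - route:
    - destination:
        host: demo
        subset: v1          # live traffic is served by v1
    mirror:
      host: demo
      subset: v2            # a copy is shadowed to v2, fire-and-forget
    mirrorPercentage:
      value: 50.0           # mirror half of the live traffic
```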

2.2 Observability

Observability ≠ monitoring

Monitoring means passively checking system behavior and status from an operations perspective: probing a running system from the outside. Observability means actively exposing the system's state from the developer's perspective, deciding during development which metrics the system should expose. In the early days we all inspected a system's behavior through logs, so observability is also a conceptual shift.

Composition

Metrics

Use aggregated data to monitor how your application performs. To monitor service behavior, Istio generates metrics for all service traffic entering and leaving the mesh. These metrics cover behavior such as total traffic, error rates, and request response times.

In addition to monitoring the behavior of services in the mesh, it is important to monitor the behavior of the mesh itself. Istio components export metrics on their own internal behavior, providing insight into the health and function of the mesh control plane.

Istio metric collection is driven by operator configuration. Operators decide how and when metrics are collected and how detailed they are, which gives the flexibility to tune metric collection to individual needs. Istio classifies metrics as follows:

Proxy-level metrics

Istio metric collection starts with the sidecar proxies (Envoy). Each proxy generates a rich set of metrics for all traffic passing through it, inbound and outbound, along with detailed statistics on its own administrative functions, including configuration and health information.

Envoy-generated metrics monitor the mesh at the granularity of Envoy resources such as listeners and clusters, so monitoring them requires understanding how mesh services map onto Envoy resources.

Istio lets operators choose which Envoy metrics are generated and collected on each workload instance. By default, Istio enables only a small subset of Envoy's statistics, to avoid depending on too many backend services and to reduce the CPU overhead of metric collection. Operators can easily extend the collected set as needed, enabling targeted debugging of network behavior while keeping the overall cost of mesh-wide monitoring down.
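One mechanism for extending the collected set has been the `sidecar.istio.io/statsInclusionPrefixes` pod annotation (availability and syntax vary by Istio version; the deployment below is a hypothetical sketch):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
      annotations:
        # ask the sidecar to report additional Envoy stats by prefix
        sidecar.istio.io/statsInclusionPrefixes: "cluster.outbound,listener"
    spec:
      containers:
      - name: app
        image: demo:latest   # hypothetical image
```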

The Envoy documentation includes detailed instructions for collecting Envoy statistics, and the Envoy statistics operations guide provides more information on controlling proxy-level metric generation.

Examples of proxy-level metrics:

# Total number of 2xx requests from upstream services in the cluster
envoy_cluster_internal_upstream_rq{response_code_class="2xx",
	cluster_name="xds-grpc"} 7163

# Number of requests completed by upstream services
envoy_cluster_upstream_rq_completed{cluster_name="xds-grpc"} 7164

# Number of SSL connection errors
envoy_cluster_ssl_connection_error{cluster_name="xds-grpc"} 0

Service-level metrics

Service-oriented metrics monitor service-to-service communication.

  • There are four basic service monitoring signals: latency, traffic, errors, and saturation. Istio ships with a default set of dashboards for monitoring service behavior based on these metrics.
  • The default Istio metrics are defined by configuration that Istio supplies and are exported to Prometheus by default. Operators are free to modify the shape and content of these metrics and to change their collection mechanism to meet their own monitoring needs.

The metrics collection task provides more details on customizing Istio metric generation.

  • Using service-level metrics is entirely optional; operators can disable metric generation and collection to suit their needs.

Example of a service-level metric:

istio_requests_total{
  connection_security_policy="mutual_tls",
  destination_app="details",
  destination_principal="cluster.local/ns/default/sa/default",
  destination_service="details.default.svc.cluster.local",
  destination_service_name="details",
  destination_service_namespace="default",
  destination_version="v1",
  destination_workload="details-v1",
  destination_workload_namespace="default",
  reporter="destination",
  request_protocol="http",
  response_code="200",
  response_flags="-",
  source_app="productpage",
  source_principal="cluster.local/ns/default/sa/default",
  source_version="v1",
  source_workload="productpage-v1",
  source_workload_namespace="default"
} 214
Control-plane metrics

Each Istio component (Pilot, Galley, Mixer) exposes its own collection of monitoring metrics. These metrics allow monitoring of Istio's own behavior, as distinct from the services within the mesh.

Access logs

Monitor your application through the events it generates. Access logs provide a way to monitor and understand behavior from the perspective of an individual workload instance. Istio can generate access logs for service traffic in a configurable set of formats, giving operators full control over how, what, when, and where logs are recorded. Istio exposes full source and destination metadata to the access logging mechanism, allowing detailed inspection of network traffic.

  • Where logs are generated is configurable

Access logs can be written locally or exported to a custom backend infrastructure, including Fluentd.

  • Log contents

Both application logs and Envoy access logs are available; a sidecar's logs can be read with, for example, kubectl logs -l app=demo -c istio-proxy.

Example Istio access log (JSON format):

{
  "level": "info",
  "time": "2019-06-11T20:57:35.424310Z",
  "instance": "accesslog.instance.istio-control",
  "connection_security_policy": "mutual_tls",
  "destinationApp": "productpage",
  "destinationIp": "10.44.2.15",
  "destinationName": "productpage-v1-6db7564db8-pvsnd",
  "destinationNamespace": "default",
  "destinationOwner": "kubernetes://apis/apps/v1/namespaces/default/deployments/productpage-v1",
  "destinationPrincipal": "cluster.local/ns/default/sa/default",
  "destinationServiceHost": "productpage.default.svc.cluster.local",
  "destinationWorkload": "productpage-v1",
  "httpAuthority": "35.202.6.119",
  "latency": "35.076236ms",
  "method": "GET",
  "protocol": "http",
  "receivedBytes": 917,
  "referer": "",
  "reporter": "destination",
  "requestId": "e3f7cffb-5642-434d-ae75-233a05b06158",
  "requestSize": 0,
  "requestedServerName": "outbound_.9080_._.productpage.default.svc.cluster.local",
  "responseCode": 200,
  "responseFlags": "-",
  "responseSize": 4183,
  "responseTimestamp": "2019-06-11T20:57:35.459150Z",
  "sentBytes": 4328,
  "sourceApp": "istio-ingressgateway",
  "sourceIp": "10.44.0.8",
  "sourceName": "ingressgateway-7748774cbf-bvf4j",
  "sourceNamespace": "istio-control",
  "sourceOwner": "kubernetes://apis/apps/v1/namespaces/istio-control/deployments/ingressgateway",
  "sourcePrincipal": "cluster.local/ns/istio-control/sa/default",
  "sourceWorkload": "ingressgateway",
  "url": "/productpage",
  "userAgent": "curl/7.54.0",
  "xForwardedFor": "10.128.0.35"
}

Distributed tracing

Request tracing is used to understand the call relationships between services, for troubleshooting and performance analysis. Distributed tracing provides a way to monitor and understand behavior by following individual requests as they flow through the mesh. Tracing lets mesh operators understand service dependencies and the sources of latency within the service mesh.

Istio supports distributed tracing through the Envoy proxies. The proxies automatically generate trace spans on behalf of the applications they front; the applications only need to forward the appropriate request context (trace headers).

Istio supports many tracing backends, including Zipkin, Jaeger, LightStep, and Datadog. Operators control the sampling rate (the rate at which trace data is generated per request), which governs how much trace data the mesh produces and how quickly.

More information about Istio distributed tracing can be found in the Distributed Tracing FAQ.

Istio generates distributed trace data like this for each sampled request.

2.3 Security

Authorization and authentication.

2.4 Policies

Rate limiting, blacklists, and whitelists.

References

  • istio.io/latest/useful/d…
  • istio.io/latest/useful/d…