Key takeaways

  • A microservice-style architecture simplifies the development of individual services. However, managing the communication, monitoring, and security of hundreds of microservices is not a simple task.
  • A service mesh provides a transparent, programming language-neutral way to automate operations such as network configuration, security configuration, and telemetry, flexibly and easily. In essence, it decouples service development from operations.
  • The Istio service mesh consists of two parts. 1. A data plane composed of Envoy proxies that intercept network traffic and control communication between services. 2. A control plane that supports runtime management of services, providing policy enforcement, telemetry collection, and certificate rotation.
  • The immediate project goal is to release Istio 1.0 (including support for hybrid environments) once key features are in beta.
  • The long-term goal is to integrate Istio into a variety of environments.

It is no exaggeration to say that Istio popularized the concept of the “service mesh”. Before diving into the details of Istio, let’s take a quick look at what a service mesh is and why it matters. We’ve all seen the challenges of monolithic applications, and one obvious solution is to break them down into multiple microservices. While this approach simplifies the development of individual services, managing the communication, monitoring, and security of hundreds of microservices is not straightforward. Until now, the answer has been to stitch services together with custom scripts and libraries, and to dedicate staff to the administrative tasks of running a distributed system. But this approach reduces the efficiency of each team and increases maintenance costs. This is where a service mesh comes in.

A service mesh provides a transparent, programming language-neutral way to automate operations such as network configuration, security configuration, and telemetry, flexibly and easily. In essence, it decouples service development from operations. If you’re a developer, you don’t have to worry about the operational impact on your distributed system of deploying new services or modifying existing ones. Similarly, operations personnel can change the controls that apply between services without redeploying a service or modifying its source code. This layer of infrastructure between the services and the underlying network is what is usually called the service mesh.

Inside Google, we manage services through a distributed platform, with proxies handling internal and external protocols. Behind these proxies is a control plane that provides an additional layer of abstraction between developers and operations personnel, on top of which services are managed across languages and platforms. The architecture has proven itself across Google’s services, delivering high scalability, low latency, and rich features.

In 2016, we decided to develop an open source project for managing microservices, one very similar to the platform we use internally at Google. We named the project “Istio”, which means “sail” in Greek. The name fits because, from the start of the project, we decided that it needed to support Kubernetes, which in Greek means “helmsman” or “pilot”. It is important to note that Istio is not tied to a single deployment environment; its development goal is to manage services running in different environments.

At about the same time that we started work on Istio, IBM released an open source project called Amalgam8, a content-based routing solution for microservices built on NGINX. IBM realized that the two projects overlapped heavily in use cases and product vision, and agreed to become our partner, shelving Amalgam8 and building Istio with us on top of Lyft’s Envoy project.

How does Istio work?

Broadly, an Istio service mesh consists of two parts. 1. A data plane composed of Envoy proxies that intercept network traffic and control communication between services. 2. A control plane that supports runtime management of services, providing policy enforcement, telemetry collection, and certificate rotation.

Image credit: Istio project PM, Dan Ciruli

Envoy

Envoy is a high-performance, open source distributed proxy written in C++ by Lyft, which uses it internally to handle network traffic in production. Deployed as a sidecar, Envoy intercepts all incoming and outgoing network traffic, applies network policies, and integrates with the Istio control plane. Istio leverages a number of features built into Envoy, such as service discovery and load balancing, traffic splitting, fault injection, circuit breakers, and staged rollouts.

Pilot

As an important part of the control plane, Pilot manages the configuration of the proxies and distributes service communication policies to all Envoy instances in the Istio mesh. It takes high-level rules (such as rollout policies), translates them into low-level Envoy configuration, and pushes them to the sidecars without causing outages or requiring redeployments. Although Pilot itself is independent of the underlying platform, operations personnel can use platform-specific adapters to push service discovery information to Pilot.
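As a rough illustration of that translation step, the sketch below expands one high-level rollout rule into a flat routing table and hands the same table to every sidecar. It is a toy model, not Pilot's or Envoy's actual API: `RolloutRule`, `to_proxy_config`, `distribute`, and the endpoint addresses are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutRule:
    """High-level rule (hypothetical model): split one service's traffic by version."""
    service: str
    weights: dict  # version name -> percentage of traffic


def to_proxy_config(rule, endpoints):
    """Expand a high-level rule into a flat routing table for a proxy.

    `endpoints` maps (service, version) -> list of "host:port" strings and
    stands in for the service discovery data a platform adapter would supply.
    """
    if sum(rule.weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    clusters = []
    for version, weight in sorted(rule.weights.items()):
        hosts = endpoints.get((rule.service, version), [])
        if weight > 0 and not hosts:
            raise ValueError(f"no endpoints for {rule.service}/{version}")
        clusters.append(
            {"name": f"{rule.service}-{version}", "weight": weight, "hosts": hosts}
        )
    return {"route": rule.service, "clusters": clusters}


def distribute(config, proxies):
    """Give every sidecar the same generated config, replacing the old one in place."""
    for proxy in proxies:
        proxy["config"] = config  # the service itself is not restarted or redeployed
    return proxies


rule = RolloutRule("reviews", {"v1": 90, "v2": 10})
endpoints = {
    ("reviews", "v1"): ["10.0.0.1:9080"],
    ("reviews", "v2"): ["10.0.0.2:9080"],
}
proxies = distribute(
    to_proxy_config(rule, endpoints), [{"id": "sidecar-a"}, {"id": "sidecar-b"}]
)
print(proxies[0]["config"]["clusters"])
```

The point of the split is that the operator only edits the rule; the per-proxy detail (which hosts back which version) is derived from discovery data.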

Mixer

Mixer integrates Istio with an ecosystem of infrastructure back-end systems. It does so through a standard configuration model and a set of plug-and-play adapters, which let Istio fit easily into existing environments. Adapters extend Mixer’s capabilities and expose specific interfaces for monitoring, logging, tracing, quota management, and other functions. Adapters are loaded on demand and take effect at run time, as configured by the operations staff.
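The plug-and-play idea can be sketched as a small registry: adapters register under a name, and the operator's run-time configuration decides which ones receive each telemetry report. This is only an illustration of the pattern; the `adapter` decorator, `dispatch`, and the report fields are hypothetical and are not Mixer's real interfaces.

```python
from typing import Callable, Dict

# Registry of available adapters, keyed by name (all names are hypothetical).
ADAPTERS: Dict[str, Callable[[dict], str]] = {}


def adapter(name):
    """Register a function as a plug-in adapter under `name`."""
    def register(fn):
        ADAPTERS[name] = fn
        return fn
    return register


@adapter("metrics")
def metrics_adapter(report):
    return f"metric request_count service={report['service']} code={report['code']}"


@adapter("logging")
def logging_adapter(report):
    return f"log {report['service']} responded {report['code']} in {report['latency_ms']}ms"


def dispatch(report, enabled):
    """Send one telemetry report to each adapter the operator has enabled."""
    unknown = [name for name in enabled if name not in ADAPTERS]
    if unknown:
        raise KeyError(f"unknown adapters: {unknown}")
    return [ADAPTERS[name](report) for name in enabled]


operator_config = ["metrics", "logging"]  # chosen at run time, not compiled in
report = {"service": "reviews", "code": 200, "latency_ms": 12}
for line in dispatch(report, operator_config):
    print(line)
```

Adding a new back end then means registering one more function; neither the services nor the dispatch logic change.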

Citadel

Citadel, formerly known as Istio Auth, handles certificate signing and rotation for service-to-service communication across the mesh, and provides mutual authentication and authorization. Using the certificates issued by Citadel, Envoy transparently wraps each call in mutual TLS, so traffic is encrypted and secured through automated identity and credential management. In keeping with Istio’s overall design, authentication and authorization can be configured with little or no change to service code, and work seamlessly across multiple clusters and platforms.
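To make "rotation" concrete, here is a minimal sketch of one pass of a rotation loop. The policy it uses (reissue once 80% of a certificate's lifetime has elapsed) is an assumption chosen for illustration, not Citadel's documented behavior, and `Cert`, `needs_rotation`, `rotate`, and `tick` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    identity: str     # workload identity the certificate is issued to
    issued_at: float  # seconds since an arbitrary epoch
    ttl: float        # lifetime in seconds

    def expires_at(self):
        return self.issued_at + self.ttl


def needs_rotation(cert, now, fraction=0.8):
    """True once `fraction` of the lifetime has elapsed (assumed policy)."""
    return now >= cert.issued_at + fraction * cert.ttl


def rotate(cert, now):
    """Issue a replacement for the same identity with the same lifetime."""
    return Cert(cert.identity, issued_at=now, ttl=cert.ttl)


def tick(certs, now):
    """One pass of the loop: replace only certificates that are close to expiry."""
    return [rotate(c, now) if needs_rotation(c, now) else c for c in certs]


certs = [Cert("reviews", issued_at=0, ttl=100), Cert("ratings", issued_at=40, ttl=100)]
for cert in tick(certs, now=85):
    print(cert.identity, "valid until", cert.expires_at())
```

Rotating well before expiry is what lets the proxies pick up new credentials without any call ever seeing an expired certificate.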

Why Istio?

Istio is highly modular and applies to a wide range of scenarios. A detailed account of all its benefits is beyond the scope of this article, but I will give a brief taste of how it can simplify the day-to-day work of network operations, security operations, and DevOps teams.

Resiliency

Istio protects applications from flaky networks and cascading failures. If you work in network operations, you can systematically test the resiliency of your application by using fault injection to introduce faults such as network latency and network isolation. If you want to migrate from one version of a service to another, you can reduce the risk by gradually shifting traffic to the new version with weighted traffic routing. Better yet, you can mirror live traffic to the newly deployed version to see how it behaves before making the actual switch. In addition, you can load balance incoming and outgoing traffic through the Istio Gateway and apply routing rules such as timeouts, retries, and circuit breakers to reduce potential failures and recover from the ones that occur.
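Two of these mechanisms are easy to sketch in isolation. The code below shows weighted version selection and a deliberately simple circuit breaker that opens after a run of consecutive failures. It is a toy under stated assumptions, not how Istio or Envoy implement either feature: `pick_version` and `CircuitBreaker` are hypothetical names, and the breaker omits the timed recovery (half-open) state that production implementations usually have.

```python
import random


def pick_version(weights, rng):
    """Choose a version with probability proportional to its weight."""
    versions = sorted(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then rejects calls."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def call(self, fn):
        if self.is_open:
            # Fail fast instead of piling more load on a struggling backend.
            raise RuntimeError("circuit open: request rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result


# Shift roughly 10% of requests to the new version.
rng = random.Random(0)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_version({"v1": 90, "v2": 10}, rng)] += 1
print(counts)  # close to a 9000 / 1000 split
```

Raising the `v2` weight step by step is the gradual migration described above; the breaker is what keeps a failing version from dragging its callers down with it.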

Security

One of the main use cases for Istio is securing communication between services in heterogeneous systems. Security operations staff can apply controls uniformly and at scale: enabling traffic encryption, locking down access to one service without breaking others, requiring mutual authentication, whitelisting services with access control lists (ACLs), authorizing service-to-service communication, and analyzing the security posture of services. Operations personnel can enforce these security policies for a single service, a single namespace, or the entire mesh. These capabilities reduce dependence on the firewall layer and lighten the workload of security operations staff.
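The three scopes mentioned above (service, namespace, mesh) suggest a simple lookup in which the narrowest matching policy applies. The sketch below assumes that precedence order and a made-up policy shape (`mtls` flag plus an `allow` set of caller identities); `effective_policy` and `is_allowed` are hypothetical helpers, not Istio APIs.

```python
def effective_policy(policies, namespace, service):
    """Return the narrowest policy that applies to `namespace`/`service`.

    Lookup order (an assumption of this sketch): service, namespace, mesh.
    """
    for scope in ((namespace, service), (namespace, None), (None, None)):
        if scope in policies:
            return policies[scope]
    return {"mtls": False, "allow": None}  # nothing configured: no restrictions


def is_allowed(policies, caller, namespace, service):
    """Check a caller identity against the effective policy's allow list."""
    allow = effective_policy(policies, namespace, service)["allow"]
    return allow is None or caller in allow


policies = {
    (None, None): {"mtls": True, "allow": None},                     # whole mesh
    ("prod", None): {"mtls": True, "allow": {"frontend", "batch"}},  # one namespace
    ("prod", "payments"): {"mtls": True, "allow": {"frontend"}},     # one service
}

print(is_allowed(policies, "batch", "prod", "payments"))  # service rule applies
print(is_allowed(policies, "batch", "prod", "orders"))    # namespace rule applies
```

Because the policy lives outside the services, tightening access to `payments` in this model is a one-line change that no service team has to deploy.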

Observability

One of the challenges posed by microservices is getting a clear picture of how the infrastructure is performing. Until recently, the best approach was to instrument each service individually to enable end-to-end monitoring of service delivery. Unless you are prepared to devote a team of people to tweaking every binary, it remains hard to get a holistic view of the platform and hard to track down system bottlenecks.

With Istio’s built-in capabilities, you can visualize key metrics of your system and trace requests across services. That, in turn, lets you do things like scale automatically based on application metrics. Istio supports back ends such as Prometheus, Stackdriver, Zipkin, and Jaeger, to name a few, but it is not tied to any particular back-end platform. If you can’t find the right tool, you can write your own adapter and integrate it with Istio.
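As one example of what mesh-level metrics make possible, the sketch below computes a replica count from an observed per-replica request rate using a simple proportional rule (the same general shape as the formula Kubernetes documents for its Horizontal Pod Autoscaler). The function name, parameters, and bounds are assumptions for illustration; nothing here is an Istio API.

```python
import math


def desired_replicas(current, observed_per_replica, target_per_replica,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to observed load versus target.

    `observed_per_replica` would come from mesh telemetry, for example
    requests per second averaged over the running replicas.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(current * observed_per_replica / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas each seeing 150 req/s against a target of 100 req/s.
print(desired_replicas(4, observed_per_replica=150, target_per_replica=100))
```

The clamp to `min_replicas` and `max_replicas` keeps a noisy metric from scaling the service to zero or without bound.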

What is the status of Istio?

New features are being added to Istio, and existing features are being improved. Istio development follows a standard agile style in which each feature moves through its own lifecycle (dev/alpha/beta/stable). While some features are still in progress, many are already ready for production use (beta/stable). Check out the latest list of features on istio.io.

Istio follows a strict release cadence. We provide daily and weekly builds, but we don’t support them or vouch for their reliability. Monthly snapshot releases, on the other hand, are relatively safe and often contain new features. However, if you plan to use Istio in a production environment, choose a release that carries the “LTS” (Long Term Support) label. At the time of this writing, the latest LTS version is 0.8. You can find this and other versions on GitHub.

What are your future plans?

It has been a year since the official release of Istio 0.1 at GlueCon. Although we have made great progress, there is still much work to be done. The near-term goal is to release Istio 1.0 once key features are in beta (and, in some cases, stable). It is important to note that this release will not cover the full list of Istio features, only the most important ones, which we selected based on community feedback. For this release, we are also working on non-functional requirements such as performance and scalability, as well as on improving our documentation and the hands-on experience.

An important goal of Istio is to support hybrid environments. For example, users can run virtual machines on GCE, on-premises Cloud Foundry clusters, or services on other public clouds. Istio provides a unified view of the overall service platform, managing and securing the connections between these environments. We are currently working on a multi-cluster architecture that lets you join multiple Kubernetes clusters into a single mesh on a flat network and enables service discovery across clusters; this is in alpha in the 0.8 LTS release. In the near future, it will also support global cluster-level load balancing and, through gateway peering, non-flat networks.

In addition to the 1.0 release, our other focus is on API management capabilities. To take just one example, we plan to launch a Service Broker API that will provide discovery and provisioning of individual services, connecting service consumers with service operators. We will also provide a unified interface for API management functions such as API business analytics, API key validation, authentication (such as JWT and OAuth), transcoding (JSON/REST to gRPC), routing, and integration with API management systems such as Google Endpoints and Apigee.

All of these short-term goals are aimed at achieving our long-term goal of integrating Istio into different environments. According to Sven Mawson, our technical lead and founder of Istio, “What we want to achieve is to be able to integrate Istio into every environment, no matter what environment or platform you are using, and provide you with service management capabilities.”

Istio is still in its early stages, but its development and adoption are gaining momentum at a steady pace. For mainstream cloud vendors and individual contributors alike, Istio has become synonymous with service mesh and an integral part of their infrastructure roadmaps. Every release brings us one step closer to our goal.

About the author

Jasmine Jaksic works at Google as the Technical Project Manager for the Istio project and has 15 years of experience in the development and support of software products and services. She is also a co-founder of Posture Monitor, which enables posture correction with a 3D camera, and a contributing writer for the New York Times, Wired magazine, and the Huffington Post. Follow her on Twitter at @Jasminejaksic.

Istio and the Future of Service Meshes