Preface

From the launch of Istio in 2017 to the arrival of several competing Service Mesh products in 2020, there is no doubt that Service Mesh has redefined microservice governance, and Istio remains its de facto standard. However, even though Istio claimed in mid-2020 that version 1.7 was a truly production-ready release, it still has a number of problems. Just as Docker's glory was eclipsed by Kubernetes, will Istio, like Docker, go from being the first mover to being merely a pathfinder?

Holding a hammer, looking for nails

In 2020, I was lucky enough to take part in my company's Istio adoption project. Although the adoption ultimately succeeded, the process was tortuous and the result not quite perfect. In our early technical research we all understood the theory of Service Mesh, but working out how to adopt it and migrate from our existing microservice architecture was a painful process of thinking and exploration. Unlike Kubernetes, for which complete solutions exist, our biggest problem was: once Istio is installed in the Kubernetes cluster, what do we do with it next? No one will tell you; we had to explore on our own, like workers with hammers looking for nails. After a lot of trial and error, we figured out how to call services by name, how to make full use of Istio's traffic management for grayscale releases and traffic governance, how to make cross-cluster calls based on Istio Gateway plus mTLS, how to visualize network links with Kiali and monitor and analyze link performance with SkyWalking, and how to integrate all of this into our existing microservice governance system.
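
To give a flavor of one of those pieces, here is a minimal sketch of the kind of east-west Gateway Istio uses for cross-cluster calls over mTLS; the resource name is illustrative, and the port and hosts follow Istio's multi-cluster conventions of that era rather than anything from our project:

```yaml
# Sketch: expose in-mesh services to a remote cluster through a dedicated
# gateway. AUTO_PASSTHROUGH forwards the mTLS connection as-is, so the
# workload sidecars on both ends still authenticate each other end to end.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway   # illustrative name
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 15443           # Istio's conventional multi-cluster TLS port
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH  # mTLS terminates at the target sidecar
      hosts:
        - "*.local"
```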

Migrating from old systems to new

Of course, there is another problem that Istio has to face: migrating from old systems to new. The migration process is very long. We had hundreds of microservices running in production, and it took a lot of manpower and time to sort out the existing call links and handle compatibility during the migration, because unlike traditional IT services, an Internet business does not give you the opportunity to shut down for maintenance. Although our previous microservices were already running in a container environment, the biggest problem with the migration was not containerization but the change of call links. Even though we spent a lot of time working out a migration plan, plenty of problems still surfaced during the migration itself. Besides omissions in our analysis and historical baggage, some services also involved storage changes that we hadn't considered initially. So we had to migrate as we went, finding problems and fixing them as quickly as possible. Once you decide to move to Istio, you must make a long-term migration plan.

Increased latency and resource consumption on complex links

The sidecar pattern frees microservice governance from the development language, but it creates new problems. Network links in a microservice architecture inevitably become more complex as the system iterates, and the number of sidecar proxies each request passes through grows with them. Even though the sidecar is an efficient network proxy and the latency added by a single sidecar is negligible, on a complex link the accumulated latency added by Envoy cannot be ignored. In our actual tests, average request latency increased by about 20% after moving to the service mesh. Our team didn't have the time or manpower to optimize Envoy or develop our own sidecar proxy, but there was no turning back; we had no choice. Sometimes, though, the added latency is too much for the business, and in some cases we had to temporarily abandon the service mesh. In addition, running Envoy as a proxy is not free: interestingly, a large proportion of our microservices have a low initial memory and CPU footprint, and in many cases the sidecar's footprint exceeds that of the application container itself. For this kind of business scenario we actually started to consider serverless instead. After all, Service Mesh is not a silver bullet; we should weigh the cost as well as the R&D pain points it solves.
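
When the sidecar dwarfs a small application, one knob worth knowing is that Istio lets you shrink the injected proxy's resource requests per workload through pod annotations. A minimal sketch, assuming a hypothetical Deployment named demo-app:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app               # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:
        # Lower the injected istio-proxy's resource requests so a small
        # application is not dwarfed by its own sidecar.
        sidecar.istio.io/proxyCPU: "50m"
        sidecar.istio.io/proxyMemory: "64Mi"
    spec:
      containers:
        - name: app
          image: demo-app:latest   # illustrative image
```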

Sidecar container startup and shutdown sequence

The problem of sidecar startup and shutdown ordering has existed since Istio's first release. Since the application container's traffic has to pass through the sidecar container, if the sidecar starts after the application container, requests from the application cannot be sent for a period of time. Conversely, if the sidecar container stops before the application container during shutdown, some in-flight requests will fail because the sidecar is already gone, even if the application container itself shuts down gracefully. Of course, the root cause is not Istio itself but the fact that Kubernetes provides no effective mechanism to guarantee this ordering. With the release of Istio 1.7, the startup ordering can be resolved by injecting the sidecar first and holding the application with a postStart lifecycle hook, while the shutdown ordering will have to wait for the sidecar container lifecycle planned for a future version of Kubernetes.
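
For reference, this is roughly how the startup-ordering fix is switched on; a minimal sketch using the IstioOperator API, applied mesh-wide (Istio also allows setting it per pod through the proxy.istio.io/config annotation):

```yaml
# Sketch: enable the Istio 1.7 startup-ordering fix for the whole mesh.
# With this flag set, the injector places istio-proxy first in the pod's
# container list and adds a postStart hook that blocks until Envoy is
# ready, so the application container starts only once the proxy can
# carry its traffic.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      holdApplicationUntilProxyStarts: true
```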

Refinement of traffic management

Compared to Kubernetes itself, Istio's traffic management is already very fine-grained. In plain Kubernetes, a canary release with a given traffic ratio can only be approximated by controlling the number of Pods, while with Istio you can easily control traffic at the network level through a VirtualService; you no longer need to maintain a Pod ratio and can scale up and down with confidence. However, some scenarios demand even finer control. For example, as the provider of a RESTful API, a service sometimes needs rate limiting or circuit breaking only for a particular endpoint, whereas Istio's rate limiting and circuit breaking operate at the service level. To make use of these functions you would have to split your microservices into even finer-grained services, which runs counter to Istio's idea of being non-intrusive to applications. So there is still a lot of work to do to make Istio and microservices work better together.
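
As an illustration of the Pod-ratio point, here is a minimal sketch of weight-based canary routing; the reviews service and its v1/v2 subsets are hypothetical names, not taken from our project:

```yaml
# Sketch: send 10% of traffic to a canary, independent of replica counts.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90   # the stable version keeps 90% of requests
        - destination:
            host: reviews
            subset: v2
          weight: 10   # the canary gets 10%, whatever the Pod ratio is
```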

The dispute over trademark and commercialization

On July 8, 2020, the Istio community blog announced that ownership of the Istio trademark was being transferred to Open Usage Commons, an organization Google had just launched that focuses on managing and guiding open source project trademarks in a manner consistent with the definition of open source. However, IBM, one of the founders of Istio, was unhappy with the decision, saying it violated the original vendor-neutral commitment to donate the project to the CNCF. Google's approach is easy to understand: it open-sourced Kubernetes without making much money from it, while cloud vendors such as AWS profited handsomely from Kubernetes. Yet it is precisely Kubernetes' de facto vendor neutrality that made it popular among those vendors. The commercialization controversy may have limited impact on Service Mesh technology as a whole, but Istio itself may well be affected by the emergence of other Service Mesh vendors and the establishment of Service Mesh standards.

Conclusion

Although Istio has its problems at present, our experience shows that it has brought us great benefits. Pushing multi-language microservice governance down into the infrastructure frees our developers from spending large amounts of time maintaining and upgrading SDKs in different languages; Istio's traffic management gives us blue-green and canary release capabilities with ease; and relying on Istio's mutual TLS (mTLS) and service access control, we have been able to fully guarantee application security and service governance across multiple clouds and multiple data centers.