Why ServiceMesh

UCloud App Engine on Kubernetes (hereinafter “UAEK”) is a Kubernetes-based computing resource delivery platform built by UCloud. It offers high availability, cross-datacenter disaster recovery, automatic scaling, multi-dimensional monitoring, log collection, and simple operation and maintenance. Its purpose is to use container technology to make internal R&D and operations more efficient: developers can devote more energy to the business itself, while operations staff can handle resource expansion, gray (canary) releases, version changes, alert monitoring, and other daily work with less stress.

Kubernetes was built for automated deployment, scaling, and management of containerized applications, so once the UCloud UAEK team had completed the research, design, and implementation of IPv6 networking, a mature container management platform was soon officially launched in multiple availability zones of the Beijing II region. Compared with the previous approach of deploying services on VMs through the application management system, Kubernetes brings real convenience: automatic scaling is easy and flexible, a microservice architecture is within easy reach, and cross-availability-zone disaster recovery takes only simple configuration.

However, adopting microservices brings many new problems to the system architecture, such as service discovery, monitoring, gray release control, overload protection, and request call tracing. We have long operated and maintained a set of ZooKeeper clusters for service discovery and client-side load balancing. Can we use UAEK without having to operate ZooKeeper? To monitor the running status of services, we have all had to add out-of-band reporting logic to our code. Can UAEK provide non-intrusive, zero-coupling monitoring and reporting?

In addition, many calls between system modules used to lack a circuit-breaking strategy, so a traffic peak would bring them down as soon as it hit. Can UAEK help business teams add that protection without large-scale refactoring? Troubleshooting, especially of slow calls, has always taken time and effort. Can UAEK provide a convenient tool for locating bottlenecks?

Obviously, a stable Kubernetes platform alone is not enough to solve these problems. From the start of the UAEK project, the team therefore treated ServiceMesh as a must-have. Any TCP backend service deployed on UAEK enjoys the following features:

- Sidecar deployment with zero intrusion: microservice governance code is fully decoupled from business code.
- Service discovery and load-balanced scheduling integrated with the Kubernetes platform.
- Flexible, real-time gray traffic management based on layer-7 request information, with no restarts required.
- A unified, abstract data reporting API layer for monitoring and access policy control.
- A distributed request tracing system for quickly tracking down bugs and locating performance bottlenecks.
- Overload protection that automatically trips a circuit breaker when request volume exceeds the system's design capacity.
- Fault injection scripts for rehearsing failure handling before a service goes live.

With application services deployed on UAEK, you can therefore watch the monitoring data and readily handle rolling back an abnormal version, widening the gray scope, releasing in full, overload protection, and locating and tracing abnormal requests.

Why Istio?

For the ServiceMesh implementation, we focused on Istio. Preliminary investigation and testing showed that several features of Istio meet UAEK’s requirements well:

- Excellent support for the Kubernetes platform.
- Separation of the control plane from the data-forwarding plane.
- Sidecar deployment that takes over all inter-service call traffic, with no limit on what can be controlled.
- Envoy as the sidecar implementation: written in C++11, event-driven and multi-threaded, with good concurrency performance comparable to NGINX.
- Zero intrusion into business code and configuration files.
- Simple configuration, easy operation, and a complete API.

The entire service mesh is divided into two parts: the control plane and the data plane. The data plane consists of the Envoy containers injected into application Pods, which act as proxies and dispatch all traffic between modules. The control plane is divided into three modules, Pilot, Mixer, and Citadel, with the following functions:

- Pilot retrieves and watches service discovery information for the entire cluster from the Kubernetes API, and delivers that information, together with user-defined routing rules and policies, to Envoy.
- Mixer has two sub-modules, Policy and Telemetry. Policy provides Envoy with access policy control, blacklist and whitelist control, and QPS rate limiting. Telemetry provides data reporting and log collection for monitoring, alerting, and log queries.
- Citadel provides authentication and authorization, credential management, and RBAC for services and users.

In addition, Istio provides a command-line tool called istioctl for operations staff, similar to kubectl in Kubernetes. After writing a YAML file of routing rules, you can use istioctl to submit them to the cluster.
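To give a feel for what such a routing rule contains, here is a minimal Python sketch that builds a VirtualService (the v1alpha3 routing resource discussed later in this article) for gray release based on a request header. The service name `orders`, the header `x-gray-user`, and the subset names `v1` and `v2` are hypothetical, and the field layout should be checked against the Istio reference for the version in use.

```python
import json


def gray_virtual_service(host, header, value, gray_subset, stable_subset):
    """Build a VirtualService that routes matching requests to the gray subset."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # Requests carrying the header go to the gray version.
                    "match": [{"headers": {header: {"exact": value}}}],
                    "route": [
                        {"destination": {"host": host, "subset": gray_subset}}
                    ],
                },
                {
                    # Everything else stays on the stable version.
                    "route": [
                        {"destination": {"host": host, "subset": stable_subset}}
                    ],
                },
            ],
        },
    }


# Hypothetical service, header, and subset names.
manifest = gray_virtual_service("orders", "x-gray-user", "true", "v2", "v1")
print(json.dumps(manifest, indent=2))
```

The printed JSON is also valid YAML, so it could be saved to a file and submitted to the cluster. The subsets it refers to would still have to be defined separately in a DestinationRule.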

Istio’s overall working principles and processes are very complex, and the technology stack involved is both deep and broad. Here is a quick overview of the process:

- Operations staff use istioctl or call the API to create and modify routing rules and policies on the control plane.
- Pilot obtains and watches cluster service discovery information from the Kubernetes API server.
- When an application is deployed, Istio injects an Envoy container into the Pod’s deployment configuration. Envoy hijacks and proxies all TCP traffic in the Pod via an iptables NAT redirect.
- Envoy receives the cluster’s service discovery information and routing rules and policies from Pilot in real time, and uses them to schedule traffic intelligently within the cluster.
- Before each request is sent, Envoy sends a Check request to Mixer Policy to check whether the request is restricted by a policy or quota. After each request completes, Envoy reports basic information about it to Mixer Telemetry, such as whether the call succeeded, the returned status code, and the elapsed time.
- Citadel implements mutual TLS: it generates and injects client certificates, delivers and injects server keys and certificates, and provides Kubernetes RBAC access control.

After this investigation and a series of tests, the UAEK team fully recognized Istio’s design concept and potential value, and hoped that Istio’s rich and powerful microservice governance features would attract more internal teams to migrate their services to UAEK.

In reality, however, integrating Istio into UAEK was not smooth sailing. When we first investigated Istio it was at version 0.6, was not yet feature-complete, and could not be used out of the box in the UAEK environment.

IPv6 problem solving

The first problem we encountered was that UAEK is a pure IPv6 network environment, while Istio did not fully support IPv6 traffic, and some components could not even be deployed in an IPv6 environment.

Before introducing the specific modifications, let’s take a look at how the Istio sidecar takes over the traffic of business applications.

As the figure above shows, Istio injects two containers into the application Pod: proxy-init and envoy. The proxy-init container initializes the iptables settings so that a NAT redirect sends all TCP traffic to port 15001, where Envoy listens. Taking incoming traffic as an example, Envoy’s service port receives the redirected TCP connection, uses the getsockopt(2) system call with the SO_ORIGINAL_DST option to find the connection’s true destination IP address, and forwards the request to that address.

However, we found that in the IPv6 environment Envoy could not hijack Pod traffic. Packet captures and source code tracing showed that when a Pod starts, an iptables initialization script configures the NAT redirect inside the Pod and hijacks inbound and outbound TCP traffic to Envoy’s listening port. The script had no corresponding ip6tables operations and dropped all IPv6 traffic, so we modified it to hijack IPv6 traffic as well.
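The idea behind the change can be sketched as emitting the same NAT redirect rules for both address families. The Python helper below is a simplified illustration, not Istio's actual initialization script: the function name is hypothetical, and a real script also has to exclude Envoy's own traffic from redirection and usually works through dedicated chains.

```python
def redirect_commands(proxy_port, binaries=("iptables", "ip6tables")):
    """Return NAT REDIRECT commands for inbound and outbound TCP, per binary."""
    commands = []
    for binary in binaries:
        # PREROUTING catches inbound traffic, OUTPUT catches outbound traffic.
        for chain in ("PREROUTING", "OUTPUT"):
            commands.append(
                f"{binary} -t nat -A {chain} -p tcp "
                f"-j REDIRECT --to-ports {proxy_port}"
            )
    return commands


for command in redirect_commands(15001):
    print(command)
```

Passing `binaries=("iptables",)` reproduces the IPv4-only behaviour described above, in which IPv6 connections are never redirected to the proxy.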

No sooner had one problem been solved than another appeared. With IPv6 traffic hijacking in place, we found that every TCP connection to a business service port was reset, and inside the Envoy container port 15001 was not open on any IPv6 address: the listener was bound to an IPv4 address. We needed Envoy to listen on [::0]:15001, so we went on to modify the Pilot source code.
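The required behaviour is simply to pick the wildcard listen address that matches the Pod's address family. The sketch below shows that decision in Python. It is an illustration of the logic only: Pilot is not written in Python, and the function name is hypothetical.

```python
import ipaddress


def listener_address(pod_ip, port):
    """Return the wildcard listen address matching the Pod's IP family."""
    if ipaddress.ip_address(pod_ip).version == 6:
        return f"[::0]:{port}"
    return f"0.0.0.0:{port}"


print(listener_address("10.0.0.5", 15001))
print(listener_address("2001:db8::5", 15001))
```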

After these efforts, the application server Pod finally accepted the TCP connections we initiated. But our connections were soon closed by the server: as soon as the client connected, it received a TCP FIN segment, and the request still failed. Envoy’s run log showed that Envoy could not find a matching layer-4 filter after receiving the TCP request.

Digging into the source code, we found that Envoy relies on the getsockopt(2) system call to retrieve the real destination address of a hijacked request, but the relevant implementation had a bug in the IPv6 case, as shown in the code below. Because the code did not check the type of the socket fd, it passed IPv4 parameters to getsockopt(2) even for IPv6 sockets, so Envoy could not find the request’s real destination address, reported an error, and immediately closed the client connection.

Once the problem was discovered, the UAEK team immediately modified the Envoy source code to make the SO_ORIGINAL_DST option of getsockopt(2) IPv6-compatible. We then submitted the change to the Envoy open source community, which merged it into the master branch, and it was included in the Envoy image used by Istio 1.0.
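In essence, the fix means choosing the getsockopt level and buffer size according to the socket's address family. The following Python sketch illustrates that dispatch on Linux. It is not Envoy's C++ code: the helper names are hypothetical, and the constant 80 is the Linux value of SO_ORIGINAL_DST (and of its IPv6 counterpart), which Python's socket module does not export.

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # Linux netfilter value, identical for IPv4 and IPv6


def original_dst_query(family):
    """Return (level, optname, buffer_size) for the given socket family."""
    if family == socket.AF_INET:
        return socket.IPPROTO_IP, SO_ORIGINAL_DST, 16    # sizeof(sockaddr_in)
    if family == socket.AF_INET6:
        return socket.IPPROTO_IPV6, SO_ORIGINAL_DST, 28  # sizeof(sockaddr_in6)
    raise ValueError(f"unsupported family: {family}")


def parse_original_dst(family, raw):
    """Decode the sockaddr bytes returned by getsockopt into (ip, port)."""
    port = struct.unpack_from("!H", raw, 2)[0]  # port is in network byte order
    if family == socket.AF_INET:
        return socket.inet_ntop(socket.AF_INET, raw[4:8]), port
    # sockaddr_in6: the 16 address bytes follow a 4-byte flowinfo field.
    return socket.inet_ntop(socket.AF_INET6, raw[8:24]), port


def original_dst(conn):
    """Look up the pre-redirect destination of an accepted connection (Linux)."""
    level, optname, size = original_dst_query(conn.family)
    raw = conn.getsockopt(level, optname, size)
    return parse_original_dst(conn.family, raw)
```

Calling `original_dst(conn)` only works on a connection that netfilter has actually redirected. The two helper functions can be exercised on their own with hand-built sockaddr bytes.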

At this point, the Istio sidecar could finally schedule traffic between services in the UAEK IPv6 environment.

In addition, we found that the Pilot and Mixer modules hit array out-of-bounds errors and crashed when handling IPv6 address formats, and we fixed these issues one by one.
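The article does not describe the individual bugs, but a common source of this class of failure is address-parsing code that assumes a single colon separates host and port. As an assumed illustration, the Python sketch below shows a parser that handles the bracketed IPv6 form and rejects ambiguous input instead of indexing past the end of a split result. The function name is hypothetical.

```python
def split_host_port(addr):
    """Split 'host:port' or '[ipv6]:port' into (host, port)."""
    if addr.startswith("["):
        end = addr.find("]")
        # A bracketed IPv6 literal must be followed by ':port'.
        if end == -1 or addr[end + 1:end + 2] != ":":
            raise ValueError(f"malformed bracketed address: {addr!r}")
        return addr[1:end], int(addr[end + 2:])
    host, sep, port = addr.rpartition(":")
    # A bare IPv6 literal has several colons and is ambiguous without brackets.
    if not sep or ":" in host:
        raise ValueError(f"expected host:port or [ipv6]:port, got {addr!r}")
    return host, int(port)


print(split_host_port("10.0.0.1:80"))
print(split_host_port("[2001:db8::1]:9080"))
```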

Performance evaluation

Before the release of Istio 1.0, performance had been the focus of industry criticism. We first examined whether adding Envoy, with its extra copy of the traffic and the Check request to Mixer Policy before each request is dispatched, would cause unacceptable latency for the business. After extensive testing, we found that latency in the UAEK environment increased by about 5 ms compared with running without Istio, which is perfectly acceptable for most internal services.

Next, we studied the architecture of the entire Istio mesh and concluded that Mixer Policy and Mixer Telemetry could easily become the performance bottlenecks of the whole cluster. Because a Check request must be sent to the Policy service before each request is dispatched, both the latency of the business request and the load on Policy, a single point, increase. Testing with HTTP/1.1 requests as a sample, we found that when the QPS of the whole mesh reached 2000-3000, Policy hit a serious load bottleneck, and the time taken by every Check request rose sharply, from the normal 2-3 ms to 100-150 ms. This significantly increased the latency of all business requests, which is clearly unacceptable.

More seriously, in Istio 0.8 and earlier, Policy was a stateful service. Some features, such as the global QPS rate-limit quota, require a single Policy process to record real-time data for the entire mesh, which means the Policy service cannot relieve its performance bottleneck by scaling out horizontally. After weighing the trade-offs, we have for now turned off the Policy service and dropped some features, such as the global QPS quota limit.
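Why in-process state blocks horizontal scaling can be seen in a toy model. The Python sketch below is not Mixer's implementation: the class, the limit of 100, and the round-robin spreading are all assumptions. It compares one counter enforcing a global limit with three replicas that each keep their own count.

```python
class WindowCounter:
    """Fixed-window counter that admits at most `limit` requests per window."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def admitted(counters, requests):
    """Spread requests round-robin over the counters; return how many pass."""
    passed = 0
    for i in range(requests):
        if counters[i % len(counters)].allow():
            passed += 1
    return passed


GLOBAL_LIMIT = 100  # hypothetical global quota for one window
single = admitted([WindowCounter(GLOBAL_LIMIT)], 500)
# Three replicas that each enforce the full limit without sharing state.
replicated = admitted([WindowCounter(GLOBAL_LIMIT) for _ in range(3)], 500)
print(single, replicated)  # prints "100 300"
```

With independent counters the mesh admits three times the intended quota, so a correct global limit needs either a single process or shared external state.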

As mentioned earlier, Mixer Telemetry is mainly responsible for collecting the call information that Envoy reports for each request. Mixer Telemetry in version 0.8 also had serious performance problems: when cluster QPS reached 2000 or above, the memory usage of Telemetry instances rose dramatically.

Analysis showed that Telemetry’s memory grew because the back-end adapters consumed data more slowly than Envoy reported it, so data not yet processed by the adapters piled up rapidly in memory. The problem was greatly alleviated once we removed Istio’s stdio log collection function, which was of little practical use to us. Fortunately, the memory backlog was resolved with the release of Istio 1.0: under the same test conditions, a single Telemetry instance could collect and report at least 35,000 QPS.
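The backlog mechanism can be made concrete with a small calculation. The Python sketch below uses made-up rates, not measurements from UAEK, to track how many unprocessed records accumulate each second when reporting outpaces adapter consumption.

```python
def backlog_over_time(report_rate, drain_rate, seconds):
    """Return the in-memory backlog after each second at constant rates."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # Records arrive at report_rate and leave at drain_rate, never below 0.
        backlog = max(0, backlog + report_rate - drain_rate)
        history.append(backlog)
    return history


# Hypothetical rates: 2000 records/s reported, 1500 records/s consumed.
print(backlog_over_time(2000, 1500, 5))  # [500, 1000, 1500, 2000, 2500]
print(backlog_over_time(1000, 1500, 3))  # [0, 0, 0]
```

As long as the consumption rate stays below the reporting rate, the backlog grows linearly without bound, which matches the steadily rising memory usage described above.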

Problems, hopes and the future

After many problems along the way, a production-ready ServiceMesh finally went live in the UAEK environment. In the process, other teams in the department, influenced by the UAEK team, began to learn about Istio and try it in their own projects. However, there is still a gap between the current situation and our original goal.

Istio continues to iterate rapidly, and both Istio itself and the Envoy proxy change every day. Each release brings more powerful features, more concise API definitions, and a more complex deployment architecture. From 0.7.1 to 0.8, the new v1alpha3 routing rules were completely incompatible with the previous API, and the new VirtualService is completely different from the original RouteRule, which caused a lot of trouble for every user.
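To illustrate how different the two shapes are, the Python sketch below translates a simple weighted RouteRule into the VirtualService and DestinationRule pair that v1alpha3 uses for the same intent. It is a rough illustration rather than a migration tool: it handles only version-labelled weighted routes, the service name `orders` is hypothetical, and the field layouts should be checked against the Istio reference for each version.

```python
def route_rule_to_v1alpha3(rule):
    """Translate a weighted, version-labelled RouteRule into v1alpha3 resources."""
    service = rule["spec"]["destination"]["name"]
    subsets, routes = [], []
    for entry in rule["spec"]["route"]:
        version = entry["labels"]["version"]
        # v1alpha3 names a subset once and refers to it from the route.
        subsets.append({"name": version, "labels": {"version": version}})
        routes.append({
            "destination": {"host": service, "subset": version},
            "weight": entry["weight"],
        })
    virtual_service = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": service},
        "spec": {"hosts": [service], "http": [{"route": routes}]},
    }
    destination_rule = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": service},
        "spec": {"host": service, "subsets": subsets},
    }
    return virtual_service, destination_rule


# A hypothetical pre-0.8 rule sending 10% of traffic to version v2.
old_rule = {
    "apiVersion": "config.istio.io/v1alpha2",
    "kind": "RouteRule",
    "metadata": {"name": "orders-gray"},
    "spec": {
        "destination": {"name": "orders"},
        "precedence": 1,
        "route": [
            {"labels": {"version": "v1"}, "weight": 90},
            {"labels": {"version": "v2"}, "weight": 10},
        ],
    },
}
vs, dr = route_rule_to_v1alpha3(old_rule)
print(vs["kind"], dr["kind"])
```

Even in this small case, one old resource becomes two new ones, and the `precedence` field has no direct counterpart in the translated output.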

How can the negative impact of an Istio upgrade on the production network be avoided entirely? The project has still not provided a complete and smooth upgrade plan. In addition, although the performance of every component improved significantly from 0.8 to 1.0, industry feedback suggests that it still does not satisfy everyone. How far the Mixer Check caching mechanism can relieve the performance pressure on Policy remains to be seen.
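The general idea of Check caching can be sketched as a client-side cache that remembers a verdict for a short time, so that repeated requests with the same attributes do not each call the policy service. The Python sketch below illustrates that idea only and is not Mixer's actual caching design. The class name, the TTL, and the request key are assumptions.

```python
import time


class CheckCache:
    """Cache Check verdicts per request key for a fixed time-to-live."""

    def __init__(self, check_fn, ttl_seconds, clock=time.monotonic):
        self.check_fn = check_fn      # stands in for the remote Check call
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}             # key -> (verdict, expiry time)
        self.remote_calls = 0

    def check(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]             # still fresh: no remote call needed
        self.remote_calls += 1
        verdict = self.check_fn(key)
        self.entries[key] = (verdict, now + self.ttl)
        return verdict


# A fake clock keeps the example deterministic.
fake_now = [0.0]
cache = CheckCache(lambda key: True, ttl_seconds=5, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.check(("svc-a", "svc-b"))
print(cache.remote_calls)  # prints 1
fake_now[0] = 6.0
cache.check(("svc-a", "svc-b"))
print(cache.remote_calls)  # prints 2
```

The saving depends on how often keys repeat within the TTL, and a longer TTL trades policy freshness for fewer remote calls.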

It’s worth noting that many of the bugs we found were also being discovered and fixed by other developers in the community. We are glad that the UAEK team is not an information island. We can feel the official Istio community iterating hard and fast, always trying to solve the problems developers care about, and the issues we submitted received responses within hours. All of this makes us believe that Istio is a project with great potential and will succeed as Kubernetes has.

Judging from the experience of users onboarded to UAEK, using Istio correctly currently requires in-depth study of the Istio documentation. UAEK will continue to focus on simplifying this process, so that users can customize their own routing rules as they wish through a foolproof, UI-driven workflow.

The UAEK team is committed to reforming UCloud’s internal R&D process, so that R&D becomes more efficient, operations become less of a burden, and everyone can work happily. In addition to continuing to improve ServiceMesh functionality, in the second half of the year UAEK will open up more regions and availability zones, offer a richer console, and release continuous integration (CI/CD) features that automate code management and packaging.

About the author

Chen Sui, a senior R&D engineer at UCloud, has been responsible for developing the monitoring system, Serverless products, and the PaaS platform’s ServiceMesh, and has extensive experience in distributed system development.