As an important carrier of the information age, the network has developed a distinctive “virtual network” service architecture and model amid the rapid growth of cloud services. On December 19th, the 2020 China Cloud Network Summit was held in Beijing. At the summit, Chen Huangdong, head of UCloud's VPC virtual network, gave a talk titled “The Road of UCloud VPC Technology Evolution”. This article summarizes the problems UCloud encountered while evolving its virtual network, and the practices, in both hardware and software, used to address them.
UCloud's VPC network has undergone three major technological evolutions since its launch in 2012, moving from the earliest classic network to a VPC network, and finally to the current VPC 3.0 architecture, which integrates software and hardware.
The early classic network
In its early days, UCloud's data center network was essentially a classic network. Cloud hosts and physical hosts resided in one large Layer 2 network, with forwarding relying on Linux Bridge and isolation relying on iptables and ebtables rules maintained by the control plane.
As the network scale grew, the classic network exposed many problems:
- Scale: a classic network relies on a large Layer 2 domain and is limited by its broadcast domain. As the broadcast domain expands, broadcast storms and exhaustion of switch MAC address tables cause network failures, capping the network's scale.
- Performance: forwarding relies on Linux Bridge, and the ever-growing set of iptables isolation rules makes matching inefficient, so overall performance is poor.
- Networking: because IP addresses in a classic network are allocated centrally, customers cannot define their own network topology or address space; and because IP addresses cannot be reused, the address space runs short.
VPC 2.0 architecture: VPC network based on SDN technology
To address these problems, UCloud developed and launched a VPC network based on the VPC 2.0 architecture at the end of 2016, and eventually helped customers migrate seamlessly to it.
In the VPC 2.0 architecture, we virtualized the network with SDN technology: Open vSwitch and OpenFlow were introduced on the forwarding plane to handle Overlay traffic forwarding and isolation, and an SDN controller was developed on the control plane to manage the flows.
In VPC 2.0, flow rules are delivered through a combination of proactive push and Packet-In. Routing-table and ACL-related flows are pushed proactively, while point-to-point forwarding flows rely on the Packet-In mechanism: the first packet is sent to the controller, which computes and dispatches the flow.
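As an illustration, the two delivery paths can be sketched in Python. All class and method names here are hypothetical, not the real controller's API:

```python
# Hypothetical sketch of VPC 2.0 flow delivery: static flows are pushed
# proactively, point-to-point flows are computed on Packet-In.

class FlowController:
    def __init__(self):
        self.flow_table = {}  # (src, dst) -> action

    def push_static_flows(self, routes, acls):
        """Routing-table and ACL flows are pushed proactively."""
        for src, dst, action in routes + acls:
            self.flow_table[(src, dst)] = action

    def on_packet_in(self, src, dst):
        """The first packet of a new point-to-point flow reaches the
        controller, which computes and dispatches the flow; this round
        trip is the source of the first-packet delay."""
        if (src, dst) not in self.flow_table:
            self.flow_table[(src, dst)] = f"output:{dst}"
        return self.flow_table[(src, dst)]

ctl = FlowController()
ctl.push_static_flows(routes=[("10.0.0.0/24", "10.0.1.0/24", "route")], acls=[])
print(ctl.on_packet_in("10.0.0.2", "10.0.1.3"))  # first packet triggers dispatch
```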
In addition, we developed many east-west and north-south gateways based on DPDK, such as load-balancing gateways, hybrid-cloud gateways, and bare-metal physical-cloud gateways. These gateways interact directly with the VPC to forward traffic, giving the VPC access to heterogeneous networks.
Over its long run in production, VPC 2.0 also revealed many problems:
- First-packet delay from Packet-In: a new flow must first be computed by the controller before it can be installed, delaying the first packet and hurting the customer experience. With the rise of Kubernetes containers and Serverless workloads, this delay has an even greater impact on the business.
- Coupling between the forwarding plane and the control plane: precisely because of the Packet-In mechanism, forwarding-plane traffic and control-plane traffic are coupled. DDoS attacks and intranet scanning traffic can thus penetrate to the control plane, putting huge pressure on it and affecting service stability.
- Coupling with heterogeneous networks: in the VPC 2.0 architecture, the VPC must communicate directly with various heterogeneous networks, so every gateway has to understand the VPC's routing details. This scatters control-plane logic, enlarges the network boundary, and lengthens the iteration cycle for shipping new features.
- Insufficient OVS forwarding performance: OVS forwarding relies on the Linux kernel, which was not designed for high-speed packet forwarding; its many locks and queues leave OVS forwarding performance poor.
VPC 3.0 architecture: a new generation of VPC network integrating software and hardware
To solve these VPC 2.0 problems, we explored and improved our virtual network technology extensively, eventually arriving at the software-hardware integrated VPC 3.0 architecture.
The defining characteristic of the VPC 3.0 architecture is hardware-software cooperation. The forwarding plane introduces many forwarding elements, including kernel OVS, hardware-offloaded OVS, smart NICs, P4, and DPDK. How to adapt to these many forwarding elements while maintaining good extensibility is therefore a key question for the control plane.
Adapting to network elements
In the VPC 3.0 control plane, we introduced the concepts and services of the Model Layer, Middle Layer, Mapping Layer, and Datapath Layer. When adapting to a network element, a unified business object (such as a Subnet) is generated at the model layer and routed by the middle layer to the mapping layer. The mapping layer translates the business object into objects for the different network elements, such as an OpenFlow object, a P4 object, or a TC object.
The network element objects are then routed through the middle layer again to the datapath (push) layer, which concerns itself only with the specific element being targeted, achieving efficient, high-performance delivery of forwarding objects.
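A toy sketch of how one business object might travel through these layers; the function names and object shapes below are invented purely for illustration:

```python
# Hypothetical layering sketch: a Subnet business object flows
# Model -> Middle -> Mapping -> Datapath, and the mapping layer
# translates it into per-element objects (OpenFlow / P4 / TC).

def model_layer(subnet_cidr):
    # Produce the unified business object.
    return {"kind": "Subnet", "cidr": subnet_cidr}

def mapping_layer(obj, element):
    # One business object maps to different network-element objects.
    mappers = {
        "ovs": lambda o: {"type": "OpenFlow", "match": o["cidr"]},
        "p4":  lambda o: {"type": "P4", "table_entry": o["cidr"]},
        "tc":  lambda o: {"type": "TC", "filter": o["cidr"]},
    }
    return mappers[element](obj)

def datapath_layer(element_obj):
    # The push layer cares only about delivering the element object.
    return f"pushed {element_obj['type']} object"

subnet = model_layer("192.168.0.0/24")
for element in ("ovs", "p4", "tc"):  # the middle layer routes to each mapper
    print(datapath_layer(mapping_layer(subnet, element)))
```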
Dynamic learning
To solve the Packet-In problem of the VPC 2.0 architecture, we introduced a flow-delivery mode that combines proactive push with dynamic learning. We developed the VPC gateway BGW on P4 and a programmable chip. BGW runs DCP (DataPath Control Protocol) with the DataPath Controller located on each compute node to accomplish flow-table learning and traffic offload.
It works as follows: when the existing rules on OVS cannot satisfy a forwarding request, the traffic is forwarded to BGW through a default flow. Besides forwarding the traffic correctly to its destination, BGW also constructs a UDP message per the DCP protocol and sends it to the Datapath Controller at the source. From this message the controller learns the key information needed for the flow, realizing dynamic flow learning and offloading the flow from BGW back to OVS.
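The learning handshake described above can be sketched as follows; all names and data structures are illustrative, not the real DCP wire format:

```python
# Hedged sketch of the DCP learning loop: an OVS miss hits the default
# flow to BGW; BGW forwards the packet AND emits a learn message back to
# the source-side Datapath Controller, which installs a local flow so
# later packets bypass BGW.

class OVS:
    def __init__(self):
        self.flows = {}  # dst -> next hop, populated by the controller

    def forward(self, dst, bgw, controller):
        if dst in self.flows:
            return f"local flow -> {self.flows[dst]}"
        return bgw.forward(dst, controller)  # default flow: punt to gateway

class BGW:
    def __init__(self, fib):
        self.fib = fib  # the gateway knows the full VPC routing state

    def forward(self, dst, controller):
        next_hop = self.fib[dst]
        controller.learn(dst, next_hop)   # DCP UDP message to source host
        return f"via BGW -> {next_hop}"   # traffic still forwarded, no delay

class DatapathController:
    def __init__(self, ovs):
        self.ovs = ovs

    def learn(self, dst, next_hop):
        self.ovs.flows[dst] = next_hop  # install the learned flow

ovs = OVS()
bgw = BGW({"10.0.1.5": "host-B"})
ctl = DatapathController(ovs)
print(ovs.forward("10.0.1.5", bgw, ctl))  # first packet goes via BGW
print(ovs.forward("10.0.1.5", bgw, ctl))  # later packets use the learned flow
```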
Compared with the Packet-In mechanism of VPC 2.0, dynamic learning brings the following benefits:
- Flow-table learning happens on the forwarding plane, whose performance is far higher than the control plane's.
- While a flow is being learned, traffic is still forwarded normally by BGW, so the service is unaffected and there is no first-packet delay.
- Compared with a full proactive push, on-demand learning greatly reduces the number of flows to push and improves push performance.
Control plane middle platform
We also built middle-platform capabilities into the control plane. The middle layer implements many general capabilities, including routing, cache consistency, object sharding, and object grayscale, so that different products can reuse these well-defined general capabilities when developing and adapting to different forwarding planes, improving control-plane reliability and performance.
On the forwarding plane, we have also gradually moved from software to hardware. From early kernel-bridge and kernel-OVS forwarding, we have switched over to hardware network elements such as hardware-offloaded OVS and smart NICs to accelerate forwarding. On Kuaijie Cloud hosts, offloading OVS raises the network forwarding capability to 25G of bandwidth, 10 million PPS, and 10G of external network bandwidth.
Meanwhile, the gateways have evolved from DPDK to today's P4-based programmable chips. In our P4 practice, we developed and launched the VPC gateway BGW and the load-balancing gateway CGW. The VPC gateway mainly handles Layer 2 and Layer 3 traffic forwarding and ARP generation within the VPC, and supports Flow Offload. The load-balancing gateway can seamlessly replace traditional switch ECMP to implement gateway clusters; it supports consistent hashing (Maglev hashing), hashing on arbitrary fields (VNI, inner IP, and port), and the IPv4/IPv6 overlay protocols. One use case for CGW is sharding and grayscale rollout of gateway clusters.
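For reference, here is a minimal Python sketch of Maglev-style consistent hashing: each backend gets a permutation of table slots, and the slots are filled round-robin. In production the table size is a large prime and the hashes are fast non-cryptographic ones; the names and parameters below are illustrative only:

```python
import hashlib

def _h(s, salt):
    # Stand-in hash; real Maglev uses two distinct fast hash functions.
    return int(hashlib.md5((salt + s).encode()).hexdigest(), 16)

def maglev_table(backends, m=13):
    """Build the lookup table; m should be a prime much larger than
    len(backends) in practice (13 just keeps the example small)."""
    perms = []
    for b in backends:
        offset, skip = _h(b, "off") % m, _h(b, "skip") % (m - 1) + 1
        perms.append([(offset + j * skip) % m for j in range(m)])
    table, nexts, filled = [None] * m, [0] * len(backends), 0
    while filled < m:
        for i in range(len(backends)):
            slot = perms[i][nexts[i]]
            while table[slot] is not None:  # skip already-claimed slots
                nexts[i] += 1
                slot = perms[i][nexts[i]]
            table[slot] = backends[i]
            nexts[i] += 1
            filled += 1
            if filled == m:
                break
    return table

def pick(table, flow_key):
    """flow_key can be built from any fields: VNI, inner IP, port..."""
    return table[_h(flow_key, "key") % len(table)]

table = maglev_table(["gw-1", "gw-2", "gw-3"])
print(pick(table, "vni100|10.0.0.5|443"))  # same key always hits the same gateway
```

Because removing one backend only reassigns its own slots, most flows keep hashing to the same gateway, which is what makes cluster scaling and grayscale replacement non-disruptive.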
In the VPC 3.0 architecture, we introduced a centralized gateway, UXR, to decouple heterogeneous networks: they can communicate with the VPC without knowing the details of the VPC network, narrowing the network boundary and making the network more cohesive.
In addition, we introduced the VPC gateway to handle VPC traffic forwarding and dynamic flow-table learning.
At the same time, our bare-metal physical cloud products have moved to smart NICs. On top of them we offloaded kernel OVS and the NVGRE tunnel, and raised network bandwidth to 40G through bonding.
Service architecture: toward microservices
On the service architecture side, we have also evolved from a monolith to microservices.
Under the monolithic architecture, we ran a distributed system built on our own framework over TCP and Protobuf, but long-term maintenance exposed many problems:
- Complicated logic: as the product portfolio grew and iterated, application logic became ever more complex; the monolithic application became “large and all-encompassing”, iteration cycles lengthened, and flexibility suffered.
- Poor scaling: service deployment and management remained fairly traditional, with poor elasticity.
- Complex traffic management: since RPC calls ran over raw TCP and Protobuf, traffic management was costly. To support grayscale releases we built an internal grayscale gateway, and retry and rate-limiting logic had to be implemented inside the application code.
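To illustrate the last point, here is the kind of rate limiter each service had to embed before this logic moved into infrastructure; the class is a generic token-bucket sketch, not UCloud's actual code:

```python
import time

class TokenBucket:
    """Illustrative in-process rate limiter of the kind that had to live
    inside application code under the monolithic architecture."""

    def __init__(self, rate, burst):
        self.rate, self.capacity = rate, burst  # tokens/sec, max burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self):
        # Refill tokens based on elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=5, burst=2)
print([tb.allow() for _ in range(4)])  # burst of 2 allowed, then throttled
```

Maintaining such logic per service, per language, is exactly the duplication that a sidecar-based service mesh eliminates.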
We therefore split the monolithic applications into microservices along service boundaries and introduced frameworks and components such as Istio, Kubernetes, and gRPC.
The microservice architecture brought the following advantages:
- Cohesive services and fast iteration: through the split, each microservice's logic is cohesive and simple enough that services iterate faster and are easier to release via grayscale.
- Strong elastic scalability: with Kubernetes, we get good elasticity and deployment capability.
- Refined grayscale capability: with Istio, we can perform fine-grained grayscale, including request grayscale by traffic ratio, customer, customer tier, VPC, and even individual VM.
- Istio-based traffic management: through Istio we implemented traffic management in the sidecar, including retries, rate limiting, and circuit breaking.
Migrating to microservices also brought many challenges. Maintaining a large microservice system adds complexity and uncertainty to the overall service and tests our service governance capability, so we have invested heavily behind the migration.
Telemetry and fault location
As cloud computing develops, cloud networks keep growing. To locate network problems in an increasingly complex cloud network environment, we repeatedly have to answer three pain-point questions about internal traffic tracing:
- Is communication normal: is end-to-end communication working, is there a fault, and where is it?
- Is the delay normal: is the end-to-end communication delay within expectations?
- Which path does the traffic take: among the many ECMP paths and hashes, how do we determine the traffic's actual path?
To answer these questions, we designed and developed UCloud's full-link, high-performance detection system.
The system's biggest feature is that it places minimal requirements on network elements: overlay/underlay elements only need to support traffic mirroring (ERSPAN), with no programmability required. By mirroring INT packets (specially colored TCP packets) at the ingress and egress of each network element to the Telemetry Cluster, we construct an end-to-end packet bitmap. From this we derive the communication result (whether communication is normal, whether and where packets were lost), the approximate end-to-end delay, and the actual end-to-end communication path.
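A toy model of how such a bitmap could answer all three questions at once; the hop names and record format are invented for illustration, while the real system reconstructs this from mirrored INT packets:

```python
# Reduce mirrored probe records to a per-hop bitmap: for each expected
# element, did the colored packet enter and leave it?

PATH = ["host-A-ovs", "BGW", "host-B-ovs"]  # expected hops, source to dest

def analyze(records):
    """records: (element, direction, timestamp_us) tuples for one probe."""
    seen = {(e, d): ts for e, d, ts in records}
    for elem in PATH:
        if (elem, "in") not in seen:
            return f"lost before {elem}"
        if (elem, "out") not in seen:
            return f"lost inside {elem}"
    delay = seen[(PATH[-1], "out")] - seen[(PATH[0], "in")]
    return f"ok, ~{delay}us via {' -> '.join(PATH)}"

print(analyze([("host-A-ovs", "in", 0), ("host-A-ovs", "out", 3),
               ("BGW", "in", 40), ("BGW", "out", 45),
               ("host-B-ovs", "in", 80), ("host-B-ovs", "out", 83)]))
print(analyze([("host-A-ovs", "in", 0), ("host-A-ovs", "out", 3)]))
```

The first probe reports success with its approximate delay and path; the second, missing all records past the first hop, localizes the loss to before BGW.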
In addition, combined with our active-flow analysis system, we can quickly probe the active flows within a VPC. During a change, this mechanism lets us quickly verify the status of active communication before and after the change, and whether any communication link is abnormal, so potential problems are found quickly and reliably.
Under the latest VPC 3.0 architecture, UCloud VPC supports high-performance network forwarding: up to 10 million PPS of internal packet throughput, with a single EIP supporting up to 10Gb of external network bandwidth. Beyond IPv4, UCloud VPC also provides native IPv6 support, helping customers quickly build IPv6 VPC networks. With ACLs and security groups, users can enforce fine-grained access control over resources in the VPC.
Going forward, the UCloud VPC team will keep a close eye on developments in network software and hardware, absorbing new technologies that fit its needs, so as to keep delivering secure, stable, and high-performance VPC cloud services to users.