Here’s a look at some of the things the open source Kube-OVN project from Alauda is doing in terms of cloud native networking practice, and how Kube-OVN keeps evolving based on issues encountered in everyday use.

1. Subnet model

A major change in Kube-OVN is the improved subnet model, moving from subnet-per-node to subnet-per-namespace. This has several advantages. First, a subnet can be distributed across all nodes, so network addresses are no longer bound to host addresses. When the network needs to scale, there is no need to touch every machine in the cluster; subnets can simply be added or removed.

When a subnet is associated with a namespace, many settings become independent. For example, each namespace can have its own address space, which on the user side often corresponds to a project group or a tenant. The mapping is straightforward: a tenant maps to a namespace, and the namespace maps to a subnet, so the two stay closely related. Once a subnet is bound to a namespace, firewall policies can be defined per tenant, and the gateway, NAT policy, and IPv4 or IPv6 protocol can all be set at the subnet level instead of being tied to a node. An example subnet definition is sketched below.
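
As a rough illustration, a subnet bound to a namespace might look like the following (the field names follow the Kube-OVN Subnet CRD as documented at the time of writing and may differ between versions; the subnet and namespace names are made up):

```yaml
# Hypothetical subnet for one tenant; Pods in the "tenant-a" namespace
# get addresses from this CIDR regardless of which node they land on.
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-a-subnet
spec:
  protocol: IPv4            # IPv4 or IPv6, chosen per subnet
  cidrBlock: 10.16.0.0/16
  gateway: 10.16.0.1
  excludeIps:
    - 10.16.0.1             # keep the gateway address out of the allocation pool
  namespaces:
    - tenant-a              # namespaces whose Pods are placed in this subnet
```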

2. Fixed IP

Kube-OVN also provides full support for fixed IP.

• A Pod can be assigned a fixed IP and MAC address.

• Workload types (Deployment/StatefulSet/DaemonSet/Job) can be given fixed IPs through an ippool annotation, so that addresses are reused throughout the workload’s life cycle.

• StatefulSet supports reuse of allocated IPs by Pod name throughout the life cycle. The IPs are not known before the StatefulSet is created, but once it is created, each Pod name keeps its IP.

We’ve also added some extra support for StatefulSets, which get fixed IPs by default. This is debatable, and some people think the feature should not be used this way, but in our practice we leave the final choice to the user. Users who do not need fixed IPs can simply let addresses be assigned randomly. But when, for organizational reasons, either the application or the underlying infrastructure would have to change, and the application side does not want to change yet still wants to move to containers with fixed IPs, we can provide that capability as well. The relevant annotations are sketched below.
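
As an illustration (the annotation keys are the ones documented by Kube-OVN; the Pod, image, and address values here are made up), a Pod can pin its IP and MAC, and a workload can draw its addresses from a fixed pool:

```yaml
# A single Pod with a fixed IP and MAC from the tenant subnet above.
apiVersion: v1
kind: Pod
metadata:
  name: fixed-ip-pod
  namespace: tenant-a
  annotations:
    ovn.kubernetes.io/ip_address: 10.16.0.15
    ovn.kubernetes.io/mac_address: 00:00:00:53:6B:B6
spec:
  containers:
    - name: app
      image: nginx:alpine
---
# A Deployment whose replicas reuse addresses from a fixed pool across
# restarts; the same annotation works for StatefulSet/DaemonSet/Job.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ippool-demo
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ippool-demo
  template:
    metadata:
      labels:
        app: ippool-demo
      annotations:
        ovn.kubernetes.io/ip_pool: 10.16.0.20,10.16.0.21
    spec:
      containers:
        - name: app
          image: nginx:alpine
```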

3. Gateway

The next topic is egress to networks outside the cluster. In practice, part of an application may run in Kubernetes while other parts, or some associated systems, do not, which creates the need for traffic between the inside and outside of the cluster. We offer several ways to reach external networks. The default is a distributed gateway, similar to the default behaviour of Flannel and Calico: when a Pod wants to access an external network, its traffic leaves directly from the node it runs on. But this approach does not fit every scenario.

With a distributed gateway, if Pods float between nodes, the egress addresses float as well, which makes whitelisting and auditing very difficult: every machine in the cluster has to be whitelisted, and when auditing, it is unclear which node traffic should have come from. So we added a namespace-level option, tied to the subnet described earlier: by configuring the subnet’s gateway, you configure how that subnet reaches the outside.

Each namespace can therefore set its own gateway node for egress. All outbound traffic then passes through that specific node, so it leaves the cluster in a fixed, predictable way, and external auditing or whitelisting only has to deal with a single node. The gateway node can also be made highly available, currently in an active/standby mode: if the current node goes down or is taken offline, traffic automatically switches to the next standby node.

On top of these two egress modes there is one more setting: whether Pod traffic is NATed on the way out. With NAT, traffic leaves with the IP of the current host node; without NAT, Pod addresses stay visible outside and the inside and outside of the cluster can communicate directly. Our practice is to provide all of these capabilities and let the user choose; a sketch of a subnet configured this way follows.
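
A sketch of how this looks at the subnet level (same hedge as before: field names as documented by Kube-OVN at the time of writing, node and subnet names hypothetical):

```yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-a-subnet
spec:
  cidrBlock: 10.16.0.0/16
  gateway: 10.16.0.1
  namespaces:
    - tenant-a
  gatewayType: centralized    # "distributed" is the default; "centralized" fixes the egress point
  gatewayNode: node1,node2    # egress node(s); multiple nodes act as active/standby
  natOutgoing: true           # true: SNAT to the node IP on egress; false: Pod IPs stay visible outside
```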

4. Performance

The last of these points is performance, which is always a topic of concern. Kube-OVN’s engineering work here falls into two areas: control plane performance and data plane performance.

Early on we focused on control plane performance, since Kube-OVN was one of the first projects to use OVN as its control plane. We found performance problems quite early: with thousands of Pods, control plane lag could become very serious. We did a lot of work to mitigate the latency, such as reducing the number of rules that change, merging the changes sent to OVN, reducing the overall number of changes, and speeding up their application.

There is also work happening upstream: moving from full updates to incremental updates. Previously, every network change in OVN required each node to pull the entire network topology, which was very slow at scale; now updates are incremental, and only new or changed rules are applied. That is a significant change: according to upstream tests, pushing out tens of thousands of flows used to take about six seconds, and with incremental processing it takes 200 to 300 milliseconds. This capability will also be added to Kube-OVN, at which point the control plane should support clusters on the order of tens or hundreds of thousands of Pods and Services.

Another approach is to partition the network. The Kubernetes clusters used in the field can be extremely large, and in a large cluster it is a big overhead to synchronize the full network topology to every node. We therefore divide large clusters into network partitions by tenant or availability zone. As long as each partition follows its own internal rules and crosses partitions only through specific gateways, the number of flow tables the control plane has to manage, and the burden on each node, can be reduced.

Another scenario where control plane performance matters most is the speed of disaster recovery. When the cluster is running normally, the rate of create, update, and delete operations is manageable. But if the master node goes down, or the whole OVN cluster or several machines go down, recovery involves resynchronizing the network. Kube-OVN’s controller policy is to verify all Pods: after the controller starts, it checks whether every configuration has been synchronized to the latest state.

As you can imagine, with 10,000 Pods or Services, all of them have to be reconfigured in OVN at startup. In our tests, recovery from such a failure used to take more than ten minutes. We made optimizations in two main directions: first, reducing duplicated network configuration; second, cleaning up evicted, failed, and other stale network resources in real time.

Having addressed control plane performance, we recently also did some work on the data plane. By default Kube-OVN uses Geneve, the encapsulation mode recommended by OVN. Geneve places few requirements on the physical network topology, but encapsulating and decapsulating packets does cost some performance.

We now also support a VLAN mode, which does place requirements on the underlying switches: they must carry the VLANs Kube-OVN uses, and packets only need a lightweight VLAN tag before being forwarded by the physical switch. We have tested the performance, and it is close to that of the physical network. This came from the PR contributed by Ruijie in the community, as sketched below.
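
The exact resources for VLAN/underlay mode have changed across Kube-OVN releases, so the following is only a hedged sketch of the general shape (a Vlan object plus a Subnet that references it); check the field names against the release in use:

```yaml
# Hypothetical VLAN definition; newer releases describe the physical
# provider network with a separate ProviderNetwork resource.
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan100
spec:
  id: 100
  provider: net1
---
# A subnet carried on that VLAN; packets are switched by the physical
# network instead of being encapsulated in Geneve.
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: underlay-subnet
spec:
  protocol: IPv4
  cidrBlock: 192.168.100.0/24
  gateway: 192.168.100.1
  vlan: vlan100
```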

Both of the above use the kernel network stack. We also have telecom and 5G scenarios where edge nodes have extremely high network performance requirements that the regular kernel stack cannot meet. So we now also support a DPDK user-space stack: a container is attached to a vhost-user socket and can run high-performance DPDK applications.

5. Monitoring and troubleshooting

The features above, performance included, matter mostly when choosing a solution, but in practice you find that operations and monitoring are even more important. When something goes wrong it shakes the confidence of many teams in the container network, so Kube-OVN integrates many commonly used tools, including tcpdump built into its plug-in to make it easy to capture and export traffic. There is also ovn-trace: if the network is not working between two points, the tracing tool can show whether something breaks along the logical path a packet takes from source to destination.

In addition, a large amount of monitoring is integrated. Most people look at per-container network monitoring such as bandwidth and error rates; on top of that we added regular active probing to monitor overall network quality. It is divided into the following dimensions:

• Pod -> Pod

• Pod -> Node

• Pod -> Service

• Pod -> DNS

• Pod -> External Address

We regularly measure latency and connectivity quality across these paths, and all of the data is fed into Prometheus and Grafana, so operators and network managers can see the current network quality in real time instead of tracing back to the network only after an application has a problem. When an application does have a problem, the monitoring data helps determine whether the fault lies in the container network or at the application layer. There is also some work on traffic mirroring: all container traffic can be mirrored so that traditional analysis tools can do fine-grained traffic analysis, deep inspection, and security detection, giving more fine-grained monitoring of network performance and quality.

6. User cases

Openness

After the Kube-OVN technology community took shape, it attracted the attention of a large number of developers and users. A number of vendors have joined as development partners in Kube-OVN’s feature work, operations, and performance improvements. For example, as many people know, we recently cooperated with an Intel development team to add OVS-DPDK support and a related performance optimization scheme.

OpenNESS is a 5G edge computing suite open sourced and led by Intel that lets Kubernetes run at the edge. It chose Kube-OVN as its default scheme for node network configuration and management, as illustrated in the accompanying diagram.

It is now being trialled by telecom operators in Europe and Japan. Together with Multus, it offers the industry’s first combination of OVS-DPDK and a container network, allowing high-performance DPDK applications to run in containers.

We welcome edge computing network solution vendors in China to get in touch with us.

Edge container scenarios at top-three public cloud vendors in China

With the power of the community, Kube-OVN is already in use in several projects and production environments. Two public cloud vendors in China have adopted Kube-OVN, with similar businesses and scenarios, both in edge computing. Instead of selling Kubernetes directly to users, they sell containers, in much the same way virtual machines are sold. One requirement is dynamic bandwidth control: because capacity may be oversold, if traffic rises they have to adjust bandwidth per user dynamically.

Second, their containers are exposed directly to the outside and need corresponding public IPs. Their decision to integrate Kube-OVN was mainly based on two points:

• Use the QoS capability of Kube-OVN to dynamically adjust the bandwidth

• Combine the fixed IP feature with NAT to map public IPs directly to container IPs

The container ecosystem already has a bandwidth control plugin, but it is one-shot: the bandwidth can only be specified when the container starts, and changing it later means rebuilding the container for the new value to take effect. Kube-OVN adjusts bandwidth dynamically; the service does not need to restart, and bandwidth can be changed in real time, as in the example below.
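
As an example (the annotation keys are the ones documented by Kube-OVN, with rates in Mbit/s; the Pod name and values here are arbitrary), the bandwidth limits are plain Pod annotations, so they can be updated on a running Pod without recreating it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  annotations:
    ovn.kubernetes.io/ingress_rate: "10"   # inbound limit in Mbit/s
    ovn.kubernetes.io/egress_rate: "5"     # outbound limit in Mbit/s
spec:
  containers:
    - name: app
      image: nginx:alpine
```

Overwriting these annotations later (for example with kubectl annotate --overwrite) changes the limits in place, which is the dynamic adjustment described above.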

There is also the fixed IP feature: users can map a public IP directly to a container IP, so that when a container comes up there is no dynamic unbinding and rebinding. The rule is written against the fixed IP, avoiding the problems caused by frequent changes.