Introduction

Source code

GitHub: k8s-cni-test

Prerequisites

In the previous article, “In-depth understanding of CNI in the K8s network”, we briefly introduced the K8s CNI network and skimmed the K8s source code to see how it calls a CNI plugin. At the end we said that next time we would try to implement a CNI plugin ourselves, so let’s do it today.


Cluster environment:

  1. OS: Ubuntu 20.04 VMs
  2. K8s version: v1.23.0
  3. Go version: go1.16 linux/amd64
  4. GCC version: 9.3.0 (go test requires GCC)
  5. etcd version: 3.2.26
  6. etcd API version: 3.2

Node environment:

  1. Three VM nodes
  2. Node IPs:
    1. Master: 192.168.98.143
    2. Node-1: 192.168.98.144
    3. Node-2: 192.168.98.145
  3. Node hostnames (can be set with hostnamectl):
    1. Master: ding-net-master
    2. Node-1: ding-net-node-1
    3. Node-2: ding-net-node-2
  4. The three VMs can ping each other
  5. etcd address: https://192.168.98.143:2379

Note: On each node, add the hostname and IP of the other nodes to /etc/hosts

Note: the cluster can be set up quickly with kubeadm; there are plenty of guides online, so I won’t repeat them here


A quick review

Before we begin, let’s briefly review what we learned in the previous article:

Kubelet talks to the container runtime through the Container Runtime Interface (CRI): Kubelet acts as the gRPC client, and the CRI implementation, typically containerd, runs the gRPC server. In other words, containerd starts a gRPC server and receives the “create Pod” request from Kubelet.

On receiving this request, containerd calls its “create sandbox” interface, which first starts a hidden pause container that holds stable network and storage resources for the Pod’s containers. containerd, however, is a “high-level runtime”; the component that actually starts containers is the OCI runtime, also called the “low-level runtime”, typically runc or Kata (runc being by far the most common). To create network resources for the sandbox, the runtime first loads the network configuration file from /etc/cni/net.d and looks up the binary plugin in /opt/cni/bin according to the type field in that configuration. The container’s runtime information is passed to this binary as environment variables, and the configuration file is passed to it on standard input.
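As a side note, these are the environment variables the CNI spec defines for passing the runtime information; a tiny standalone sketch that just dumps them might look like this (in the plugin we build below, skel.PluginMain reads them for us, so this is for illustration only):

package main

import (
    "fmt"
    "os"
)

func main() {
    for _, key := range []string{
        "CNI_COMMAND",     // ADD / DEL / CHECK / VERSION
        "CNI_CONTAINERID", // container ID
        "CNI_NETNS",       // path to the network namespace, e.g. /proc/<pid>/ns/net
        "CNI_IFNAME",      // interface name to create inside the container, e.g. eth0
        "CNI_ARGS",        // extra key=value arguments
        "CNI_PATH",        // search path for plugin binaries, e.g. /opt/cni/bin
    } {
        fmt.Printf("%s=%s\n", key, os.Getenv(key))
    }
    // the network configuration (/etc/cni/net.d/xxx.conf) arrives on standard input
}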

The binary plugin mainly has to achieve three things: 1. manage Pod IPs, 2. enable Pod-to-Pod communication within a host, 3. enable Pod-to-Pod communication across hosts. Once all three are done, the plugin prints a result on standard output, which the CRI reads for subsequent operations.

That is roughly what CNI is about; if anything is unclear, feel free to go back to the previous article for another look ~


Main text: implementing a CNI network plugin from scratch

The overall architecture

Before we get started, we can take a quick look at how other network plugins are implemented, for example plugins/bridge.go:

The CNI project officially implements some basic plugins, such as ipam and bridge (bridge is linked above). Let’s take a look at the bridge source:

The code is pretty long, but we only need to focus on the main function: it calls skel.PluginMain and passes cmdAdd, cmdCheck and cmdDel as arguments.

We can also look at other official implementations, such as ipvlan in the same directory as bridge:

As you can see, the basic framework is the same: main calls skel.PluginMain directly.

So this method is the key to implementing a CNI plugin. Let’s briefly look at its source, which lives in the official repository under pkg/skel:

PluginMain first reads the configuration passed in on standard input (the file from /etc/cni/net.d) together with the environment variables, and performs a version check.

After the version check, it branches to different functions depending on whether the command is ADD/DEL/CHECK; these are the cmdAdd and friends passed in from the plugin’s main function. The CRI issues different commands at different times: when creating a Pod, Kubelet first issues the VERSION command, and once the version check succeeds it issues ADD; when the Pod is deleted, it issues DEL.

Having read this far, we can see that implementing a CNI plugin essentially comes down to implementing cmdAdd, cmdDel and cmdCheck (cmdCheck can even just return nil).

So the first thing we need to do is create a skeleton that follows this pattern:

func cmdAdd(args *skel.CmdArgs) error {
    utils.WriteLog("entered cmdAdd")
    utils.WriteLog(
        "CmdArgs are: ",
        "ContainerID: ", args.ContainerID,
        "Netns: ", args.Netns,
        "IfName: ", args.IfName,
        "Args: ", args.Args,
        "Path: ", args.Path,
        "StdinData: ", string(args.StdinData))
    return nil
}

func cmdDel(args *skel.CmdArgs) error {
    utils.WriteLog("entered cmdDel")
    utils.WriteLog(
        "CmdArgs are: ",
        "ContainerID: ", args.ContainerID,
        "Netns: ", args.Netns,
        "IfName: ", args.IfName,
        "Args: ", args.Args,
        "Path: ", args.Path,
        "StdinData: ", string(args.StdinData))
    // If cmdDel returns an error, the CRI keeps retrying the DEL command:
    // return errors.New("test cmdDel")
    return nil
}

func cmdCheck(args *skel.CmdArgs) error {
    utils.WriteLog("entered cmdCheck")
    utils.WriteLog(
        "CmdArgs are: ",
        "ContainerID: ", args.ContainerID,
        "Netns: ", args.Netns,
        "IfName: ", args.IfName,
        "Args: ", args.Args,
        "Path: ", args.Path,
        "StdinData: ", string(args.StdinData))
    return nil
}

func main() {
    skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, bv.BuildString("testcni"))
}

Here we use a utils.WriteLog method. It is very simple: it builds a log file path and appends the given strings to that file, so that we can inspect the CNI plugin’s execution logs.

Why not just use fmt.Print? One reason is that the plugin is invoked by K8s in the background, so its output is not easy to watch in the foreground (journalctl -xeu kubelet helps somewhat). The other reason is that when the CNI plugin finishes, it has to print JSON directly on standard output for K8s to read; log messages must not corrupt the format of that JSON, otherwise kubelet will report errors along the lines of “illegal string”.
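For illustration, a minimal sketch of what such a WriteLog helper might look like (the actual utils.WriteLog in the repository may differ; the log path here is just an example):

package utils

import (
    "os"
    "strings"
)

const logPath = "/tmp/testcni.log" // example path, pick any writable file

// WriteLog appends the given message parts to a log file instead of stdout,
// so the JSON result printed for the CRI is not polluted.
func WriteLog(parts ...string) {
    f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return // logging must never break the plugin
    }
    defer f.Close()
    f.WriteString(strings.Join(parts, " ") + "\n")
}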

(The compiled plugin binary will later be placed under /opt/cni/bin; more on that below.)

Note also that the cmdDel method must eventually return nil. If it returns something like errors.New(), the CRI will keep sending DEL to the plugin until cmdDel returns nil.


Get the necessary information

We have set up the whole CNI plugin skeleton above; now we need to get some necessary information.

As we mentioned earlier, the necessary information comes from two places: the configuration file in /etc/cni/net.d is handed to the CNI plugin on standard input, and the container runtime information is handed to it through environment variables.

So these are the two places we need to read first. The good news is that we don’t have to do it ourselves: the skel.PluginMain function used in main already handles a lot of this for us:

This function collects the container runtime information from the environment variables, including ContainerID, Netns, IfName (the name of the NIC inside the container), Args (taken from the CNI_ARGS environment variable), Path (the search path for CNI binaries) and StdinData (the configuration file read from standard input, as a []byte).

This CmdArgs struct is then passed as the parameter to cmdAdd, which is the main method we need to implement.

The configuration file arrives as a plain byte array, so we need to unmarshal it into a struct. Before declaring that struct, let’s define the configuration file that goes into /etc/cni/net.d:

0.3.0 "{" cniVersion" : ", "name" : "testcni", "type" : "testcni", "bridge" : "testcni0", "subnet configures" : "10.244.0.0/16}"Copy the code

This configuration doesn’t need to be as complicated as calico’s; it only contains the functionality we need.

cniVersion, name and type must be specified; type is what the runtime uses to find the corresponding binary under /opt/cni/bin.

Since every binary plugin is implemented differently, each has its own configuration format, so we declare the struct ourselves in the plugin code:

type PluginConf struct {
    // types.NetConf carries the plugin's basic information,
    // such as CNIVersion, Name, Type, PrevResult, etc.
    types.NetConf

    // RuntimeConfig: in /etc/cni/net.d/xxx.conf you can configure something like
    //   "capabilities": {"xxx": true, "yyy": false}
    // to enable the xxx capability at runtime but not the yyy capability.
    // Then, before the container is pulled up, the runtime can set
    //   export CAP_ARGS='{"xxx": "aaaa", "yyy": "bbbb"}'
    // to enable or disable some of these capabilities.
    // The data read from standard input will then carry a runtimeConfig attribute,
    // i.e. runtimeConfig: {"xxx": "aaaa"}
    // (yyy is dropped because it is set to false in /etc/cni/net.d/xxx.conf).
    // See https://kubernetes.feisky.xyz/extension/network/cni
    // and the CNI implementation: /cni/libcni/api.go:injectRuntimeConfig
    RuntimeConfig *struct {
        TestConfig map[string]interface{} `json:"testConfig"`
    } `json:"runtimeConfig"`

    Bridge string `json:"bridge"`
    Subnet string `json:"subnet"`
}

types.NetConf contains the most basic information from the configuration file, such as the plugin name and version; it is provided by the CNI repo for direct use. RuntimeConfig is optional and its effect is described in the code comments above. Finally, Bridge and Subnet are the attributes our plugin needs, so we define them here; if your plugin needs more information, add it here as well.

A quick word on the Bridge and Subnet parameters. Bridge is the name of the bridge on each node (more on what the bridge does later), and Subnet is the subnet from which we want to assign Pod IPs; here we give the Pods 10.244.0.0/16.

Then we can go back to the cmdAdd function:

func cmdAdd(args *skel.CmdArgs) error {
    utils.WriteLog("entered cmdAdd")
    utils.WriteLog(
        "CmdArgs are: ",
        "ContainerID: ", args.ContainerID,
        "Netns: ", args.Netns,
        "IfName: ", args.IfName,
        "Args: ", args.Args,
        "Path: ", args.Path,
        "StdinData: ", string(args.StdinData))

    pluginConfig := &PluginConf{}
    if err := json.Unmarshal(args.StdinData, pluginConfig); err != nil {
        utils.WriteLog("failed to unmarshal args.StdinData into pluginConfig")
        return err
    }
    return nil
}

As mentioned above, the data the CRI sends on standard input is processed by skel.PluginMain and handed straight to cmdAdd, so we can read the configuration file from args.StdinData and unmarshal it into the PluginConf struct we just declared.


We now have the basic information necessary to complete the plug-in’s functionality.

Now let’s review what the plug-in needs to do:

  1. Pod IP address management
  2. Communication between pods within a node
  3. Communication between pods on different nodes

Let’s take it step by step

Set up the etcd client

First, we need to implement the Pod IP address management function. In calico or flannel this is done through custom CRDs, which the plugin then manipulates via the API server (ultimately the data still lives in etcd): calico.yaml

For example, in this CRD the Pod IP subnet is specified through a cidr field, and every IP address in use is recorded in these CRDs.

In our scenario, instead of such a complex design, we simply implemented IP address management.

But the question is: how should we design the IPAM (IP Address Management) service?

First of all, we are running a distributed cluster with many nodes, so once a pod on one node uses a certain IP address, no other pod, on this node or any other, may use it. Therefore there has to be a single place, unique in the whole cluster, that records IP address usage.

So we naturally think of etcd; after all, a CRD-based approach like calico’s is ultimately stored in etcd too.

First we create the etcd client:

import ( etcd "go.etcd.io/etcd/client/v3" ) func newEtcdClient(config *EtcdConfig) (*etcd.Client, error) { var etcdLocation []string if config.EtcdAuthority ! = "" { etcdLocation = []string{config.EtcdScheme + "://" + config.EtcdAuthority} } if config.EtcdEndpoints ! = "" { etcdLocation = strings.Split(config.EtcdEndpoints, ",") } if len(etcdLocation) == 0 { return nil, Errors. New(" etcd not found ")} tlsInfo := transport. tlsInfo {CertFile: config.EtcdCertFile, KeyFile: = errors.New(" etcd not found ")} tlsInfo := transport. tlsInfo {CertFile: config.EtcdCertFile, KeyFile: config.EtcdKeyFile, TrustedCAFile: config.EtcdCACertFile, } tlsConfig, err := tlsInfo.ClientConfig() client, err := etcd.New(etcd.Config{ Endpoints: etcdLocation, TLS: tlsConfig, DialTimeout: clientTimeout, }) if err ! Func GetEtcdClient() (*EtcdClient, error) {// Omit some operations if _client! Return _client, nil} else {client, err := newEtcdClient(&etcdConfig {EtcdEndpoints: "Https://192.168.98.143:2379", EtcdCertFile: "/ etc/kubernetes/pki/etcd/healthcheck - client. CRT", EtcdKeyFile: "/etc/kubernetes/pki/etcd/healthcheck-client.key", EtcdCACertFile: "/ etc/kubernetes/pki/etcd/ca. CRT",}) / / omit some code... Return client, nil} return nil, errors.New(" failed to initialize etcd client ")}Copy the code

Connecting to etcd is very simple: we just call the official etcd client package and give it a configuration. I hard-coded the endpoint to the etcd master address of my test environment, so change it when you try this yourself. For more details see the source (k8s-cni-test/client.go).

Note, however, that the K8s CA root certificate and the client certificate and private key required by etcd must be passed to the etcd package as configuration items, otherwise we cannot connect to the etcd cluster over HTTPS.

Note: you can apt install etcd-client on a cluster node and access etcd from the command line as follows. The command is a bit long, but it works!

ETCDCTL_API=3 etcdctl --endpoints https://192.168.98.143:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  get / --prefix --keys-only

Create the IPAM service

When I implemented the IPAM service, since this is my own project and I could play around freely, I wanted to try different programming styles, so I wrapped the IPAM code in a chained-call API. Perhaps because my skill isn’t quite there yet, the implementation turned out rather long, so I won’t go through it line by line.

Here is a brief outline of my IPAM design.

First I need to create an IP address pool in etcd:

The pool lives under an etcd key of the form /testcni/ipam/{subnet}/pool; all of the available network segments are stored under this key.

The purpose of this pool is that whenever a node joins the cluster, the first time the kubelet on that node calls our plugin, the IPAM service inside the plugin is initialized; during initialization on that node, it takes an unused network segment out of the pool as the segment for its own node.

In the figure above, the available segments under this key (listed via etcdctl) start from 10.244.3.0, because my test cluster has three nodes that have already taken 10.244.0.0, 10.244.1.0 and 10.244.2.0. If another node joins, the pool shrinks by one more segment.
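To make the “take a segment from the pool” step concrete, here is a rough sketch of how a node could claim the next free segment atomically with an etcd transaction, so that two nodes can never grab the same one. The key name and the storage format (a “;”-separated list) are assumptions for illustration, not necessarily what the repository does:

package ipam

import (
    "context"
    "errors"
    "strings"

    etcd "go.etcd.io/etcd/client/v3"
)

const poolKey = "/testcni/ipam/10.244.0.0/16/pool" // hypothetical key

// claimSegment pops the first segment from the pool and returns it. The
// transaction only succeeds if the pool was not modified by another node
// since we read it; callers should retry on failure.
func claimSegment(ctx context.Context, cli *etcd.Client) (string, error) {
    resp, err := cli.Get(ctx, poolKey)
    if err != nil || len(resp.Kvs) == 0 {
        return "", errors.New("pool not found")
    }
    kv := resp.Kvs[0]
    segments := strings.Split(string(kv.Value), ";")
    if len(segments) == 0 || segments[0] == "" {
        return "", errors.New("pool exhausted")
    }
    claimed, rest := segments[0], strings.Join(segments[1:], ";")

    txnResp, err := cli.Txn(ctx).
        If(etcd.Compare(etcd.ModRevision(poolKey), "=", kv.ModRevision)).
        Then(etcd.OpPut(poolKey, rest)).
        Commit()
    if err != nil {
        return "", err
    }
    if !txnResp.Succeeded {
        return "", errors.New("pool changed concurrently, retry")
    }
    return claimed, nil
}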


Besides the pool of segments in etcd, I also create a key for each node that records “the IPs currently used by this node”:

When a new pod is created on a node, the plugin iterates over all IP addresses already recorded under this key, takes the last IP + 1 as the new pod’s IP address, and writes it back to the key.
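As a rough illustration of the “last used IP + 1” idea, incrementing an IPv4 address in Go can look like the sketch below; a real allocator of course also has to stay inside the node’s segment and skip addresses that are reserved or already taken:

package ipam

import "net"

// nextIP returns ip + 1, e.g. 10.244.1.4 -> 10.244.1.5.
// A real allocator would also check that the result still falls inside the
// node's /24 segment and is not already in use.
func nextIP(ip net.IP) net.IP {
    next := make(net.IP, len(ip.To4()))
    copy(next, ip.To4())
    for i := len(next) - 1; i >= 0; i-- {
        next[i]++
        if next[i] != 0 { // no carry needed
            break
        }
    }
    return next
}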


Finally, I also record in etcd the network segment allocated to each node:

The reason for recording the segments is that, when we later implement pod communication across nodes, every node needs to know the segments of the other nodes, so a key records them in advance.


Because this part of the code is quite lengthy, I won’t explain it all, but you can check the IPAM test file to see which methods I implemented (source: k8s-cni-test/ipam_test.go):

func TestIpam(t *testing.T) {
    test := assert.New(t)

    // Initialize the ipam service. On initialization it creates the pool on etcd;
    // if the pool was already initialized by another node, it is not created again.
    Init("10.244.0.0", "16")
    is, err := GetIpamService()
    if err != nil {
        fmt.Println("failed to initialize ipam: ", err.Error())
        return
    }
    fmt.Println("mask ip: ", is.MaskIP)
    test.Equal(is.MaskIP, "255.255.0.0")

    // Get().CIDR() returns the pod CIDR assigned to a given node
    cidr, _ := is.Get().CIDR("ding-net-master")
    test.Equal(cidr, "10.244.0.0/24")
    cidr, _ = is.Get().CIDR("ding-net-node-1")
    test.Equal(cidr, "10.244.1.0/24")
    cidr, _ = is.Get().CIDR("ding-net-node-2")
    test.Equal(cidr, "10.244.2.0/24")

    // Get().UnusedIP() returns the next unused IP in the current node's pod segment
    newIp, err := is.Get().UnusedIP()
    if err != nil {
        fmt.Println("err: ", err.Error())
        return
    }
    fmt.Println("unused ip: ", newIp)

    // Release().IPs() releases IPs of the current node's pod segment in batch
    err = is.Release().IPs(newIp)
    if err != nil {
        fmt.Println("err: ", err.Error())
        return
    }

    // Get().NodeNames() returns the hostnames of all nodes in the cluster
    names, err := is.Get().NodeNames()
    if err != nil {
        fmt.Println("err: ", err.Error())
        return
    }
    test.Equal(len(names), 3)

    for _, name := range names {
        // Get().NodeIp() returns a node's IP from its hostname.
        // Note: this is not the pod segment or a pod IP,
        // but the node's own IP, i.e. 192.168.98.143/144/145
        ip, err := is.Get().NodeIp(name)
        if err != nil {
            fmt.Println("err: ", err.Error())
            return
        }
        fmt.Println("node ip: ", ip)
    }

    // Get().AllHostNetwork() returns the hostname, pod segment and node IP of every node
    nets, err := is.Get().AllHostNetwork()
    if err != nil {
        fmt.Println("err: ", err.Error())
        return
    }
    for _, net := range nets {
        fmt.Println("network: ", net)
    }

    // Get().HostNetwork() returns the current node's IP, pod segment, hostname, NIC name, etc.
    currentNet, err := is.Get().HostNetwork()
    if err != nil {
        fmt.Println("err: ", err.Error())
        return
    }
    fmt.Println("current network: ", currentNet)
}

If you are interested in the IPAM implementation itself, see the source: k8s-cni-test/ipam.go


With IPAM in place, the first step of our CNI plugin, “IP address management”, is basically done. What’s left is to actually assign these IPs to pods.

Next, we will try to make pods on the same host able to communicate with each other.

To let two isolated pods on the same node communicate, the usual approach is a veth pair + bridge.

There are plenty of articles about veth pairs and bridge devices on the web, so I won’t explain in depth how they work; briefly:

  1. Linux supports creating a veth pair: a pair of virtual devices that you can think of as a network cable with a plug on each end
  2. Either plug can be inserted into a Linux netns or into another virtual network device, and packets sent into one end come out of the other
  3. A bridge is also a Linux virtual network device; it behaves like a physical bridge, or rather like a switch, with both layer-2 forwarding and layer-3 routing capabilities
  4. We will create a bridge (a minimal sketch of this step follows the list)
  5. Then create a veth pair
  6. Plug one end of the veth into the bridge and the other end into the netns where the pod lives; this netns is passed in when the CRI calls the plugin
  7. Use IPAM to fetch the current node's pod segment from etcd, e.g. 10.244.0.0, and by default use that segment's IP + 1 as the bridge's address; for example the master node's segment is 10.244.0.0, so its bridge gets 10.244.0.1, and the other nodes' bridges get 10.244.1.1 / 10.244.2.1 / 10.244.3.1 / …
  8. The moment 10.244.0.1 is assigned to the bridge, the bridge gains layer-3 routing capability and can act as the gateway for every network device plugged into it
  9. Set the default route and gateway inside the pod's netns to the bridge, so the pod can reach addresses outside 10.244.0.0
  10. Whenever a new pod is created, repeat steps 5, 6, 7 and 9 above
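Before looking at the plugin’s own code, here is a minimal sketch of the bridge-creation step (step 4) using the vishvananda/netlink package. It is only an approximation of what a CreateBridge helper might do, not the repository’s exact implementation:

package nettools

import (
    "github.com/vishvananda/netlink"
)

// createBridge creates (or reuses) a linux bridge, assigns the gateway address
// to it (e.g. "10.244.0.1/24") and brings it up. Simplified sketch.
func createBridge(name, gatewayCIDR string, mtu int) (*netlink.Bridge, error) {
    // reuse the bridge if it already exists on this node
    if link, err := netlink.LinkByName(name); err == nil {
        if br, ok := link.(*netlink.Bridge); ok {
            return br, nil
        }
    }

    br := &netlink.Bridge{
        LinkAttrs: netlink.LinkAttrs{Name: name, MTU: mtu},
    }
    if err := netlink.LinkAdd(br); err != nil {
        return nil, err
    }

    // giving the bridge an IP is what turns it into the pods' gateway
    addr, err := netlink.ParseAddr(gatewayCIDR)
    if err != nil {
        return nil, err
    }
    if err := netlink.AddrAdd(br, addr); err != nil {
        return nil, err
    }
    if err := netlink.LinkSetUp(br); err != nil {
        return nil, err
    }
    return br, nil
}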

Let’s take a quick look at the code implementation:

func cmdAdd(args *skel.CmdArgs) error {
    pluginConfig := &PluginConf{}
    if err := json.Unmarshal(args.StdinData, pluginConfig); err != nil {
        utils.WriteLog("failed to unmarshal args.StdinData into pluginConfig")
        return err
    }

    // initialize ipam with the subnet passed down by kubelet (containerd)
    ipam.Init(pluginConfig.Subnet)
    ipamClient, err := ipam.GetIpamService()

    // get the gateway address, and the gateway together with its mask segment
    gateway, err := ipamClient.Get().Gateway()
    gatewayWithMaskSegment, err := ipamClient.Get().GatewayWithMaskSegment()

    // get the bridge name
    bridgeName := pluginConfig.Bridge
    if bridgeName == "" {
        bridgeName = "testcni0"
    }

    // the MTU here is 1500; with a vxlan device it would need to be 1460,
    // because vxlan adds a 40-byte header
    mtu := 1500
    // the NIC name that containerd passed in
    ifName := args.IfName
    // get the netns passed in by containerd
    netns, err := ns.GetNS(args.Netns)
    // get an unused IP address from ipam
    podIP, err := ipamClient.Get().UnusedIP()
    // the pod IP stored in etcd has no mask, so append it here
    // podIP = podIP + "/" + ipamClient.MaskSegment
    podIP = podIP + "/" + "24"

    /**
     * With the preparation done, we can call the network tools to create the network:
     *   1. create a bridge with the given name
     *   2. create a veth pair
     *   3. plug one end into the pod (netns) as IfName
     *   4. plug the other end into the bridge
     *   5. bring the bridge and both ends of the veth up
     *   6. create a default route inside the pod (netns) that sends all 0.0.0.0 traffic out of IfName
     *   7. set iptables on the host so that all traffic from bridgeName is forwarded
     *      (docker may set iptables rules that drop forwarding)
     */
    err = nettools.CreateBridgeAndCreateVethAndSetNetworkDeviceStatusAndSetVethMaster(
        bridgeName, gatewayWithMaskSegment, ifName, podIP, mtu, netns)

    // the rest of the code is omitted ......
    return nil
}

The overall flow matches the outline above: first gather the necessary information such as the gateway address, bridge name, MTU, netns path and an unused pod IP.

Once everything is gathered, we call a function with the gloriously long name CreateBridgeAndCreateVethAndSetNetworkDeviceStatusAndSetVethMaster. The name is a mouthful, but it makes what happens clear at a glance, and the comment above the call spells out the concrete steps, so I won’t repeat them here; let’s just look at the implementation:

func CreateBridgeAndCreateVethAndSetNetworkDeviceStatusAndSetVethMaster(
    brName, gw, ifName, podIP string, mtu int, netns ns.NetNS,
) error {
    // create the bridge
    br, err := CreateBridge(brName, gw, mtu)

    err = netns.Do(func(hostNs ns.NetNS) error {
        // create a veth pair
        containerVeth, hostVeth, err := CreateVethPair(ifName, mtu)

        // move the host end of the veth pair into the host netns
        err = SetVethNsFd(hostVeth, hostNs)

        // set the pod IP on the container end of the veth
        err = SetIpForVeth(containerVeth, podIP)

        // bring the container end up.
        // After testing, this has to happen here; doing it inside the hostNs.Do below fails.
        err = SetUpVeth(containerVeth)

        gwNetIP, _, err := net.ParseCIDR(gw)

        // add a default route inside the pod: the gateway is the bridge,
        // the device is the container end of the veth
        err = SetDefaultRouteToVeth(gwNetIP, containerVeth)

        hostNs.Do(func(_ ns.NetNS) error {
            // fetch the host veth again, because it changed after switching netns
            _hostVeth, err := netlink.LinkByName(hostVeth.Attrs().Name)
            hostVeth = _hostVeth.(*netlink.Veth)
            // bring it up
            err = SetUpVeth(hostVeth)
            // plug the host end of the veth into the bridge
            err = SetVethMaster(hostVeth, br)
            // docker may have set iptables to drop forwarding,
            // so allow the bridge to forward via iptables
            err = SetIptablesForBridgeToForwardAccept(br)
            return nil
        })
        return nil
    })
    if err != nil {
        return err
    }
    return nil
}

I’ve omitted some error handling, but the main flow is visible, and it follows the same idea as described above.

Note that if docker is installed on the host, docker sets iptables rules that drop forwarded traffic, which breaks the bridge’s forwarding; change the relevant rule to ACCEPT, otherwise the bridge will have an IP but still won’t forward packets.
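If you prefer to do that from Go rather than by hand, a hedged sketch using the github.com/coreos/go-iptables package could look like this (the repository may achieve the same thing differently):

package nettools

import (
    "github.com/coreos/go-iptables/iptables"
)

// allowBridgeForward appends "-i <bridge> -j ACCEPT" to the filter/FORWARD
// chain so that traffic coming from the bridge is not dropped by docker's
// default FORWARD policy. Sketch only.
func allowBridgeForward(bridgeName string) error {
    ipt, err := iptables.New()
    if err != nil {
        return err
    }
    return ipt.AppendUnique("filter", "FORWARD", "-i", bridgeName, "-j", "ACCEPT")
}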

If forwarding still doesn’t work after fixing iptables, also check that /proc/sys/net/ipv4/ip_forward is set to 1.

This is a kernel parameter; a value of 0 means the kernel discards packets that it would otherwise have to forward.
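If you want to set it from code instead of by hand, a tiny sketch that simply writes the sysctl file mentioned above (not something the plugin in this article necessarily does):

package nettools

import "os"

// enableIPForward turns on kernel IP forwarding by writing the sysctl file.
func enableIPForward() error {
    return os.WriteFile("/proc/sys/net/ipv4/ip_forward", []byte("1"), 0644)
}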

When all of the above succeeds, pods on the same node should be able to ping each other, because they are plugged into the same bridge via veth pairs.

You can check this with the cnitool provided by the CNI project, or with some of the test helpers in this article’s source; for the test steps, see k8s-cni-test/README.md.


Implementing pod communication between different nodes

Now that pods on the same node can talk to each other, let’s try to make pods on different nodes communicate.

There are many ways to connect devices on different nodes, but broadly speaking there are four:

  1. SDN, which I personally don't know much about, so I won't go into it
  2. Static routes on the hosts — relatively simple, and the approach we use here
  3. Overlay networks, also known as tunnel networks
  4. Dynamically routed networks

The second approach is what flannel calls host-gw. The third can use VXLAN or other tunnelling technologies. The fourth is calico’s main mode, which uses BGP to install dynamic routes on the hosts.

We use the second approach, which is simpler and easier to understand (we may try a tunnel-network mode later if time allows).


First, a brief look at how host routing works; it is actually very simple:

  1. Get from etcd the pod segments of all nodes other than this one
  2. Get from etcd the IP addresses of all nodes other than this one
  3. Add these to the local routing table: the other nodes' pod segments as the destination (dst) and the other nodes' IPs as the gateway, i.e. the next hop (via)
  4. Set iptables on the host NIC to allow IP forwarding

In short, when traffic from a pod on the current node reaches the host, the routing table tells it the next hop for the target pod, which is the real IP of another node; the packet is therefore sent to that node. When the other node receives it, its own routing table plus the destination IP in the packet forward the traffic to the bridge on that host, the bridge passes it to the attached veth, and the veth delivers it into the corresponding netns (pod).

Let’s implement some code:

func cmdAdd(args *skel.CmdArgs) error {
    // previous code omitted ......

    // get the network information of every node in the cluster from etcd, via ipam
    networks, err := ipamClient.Get().AllHostNetwork()

    // get the current node's own network information
    currentNetwork, err := ipamClient.Get().HostNetwork()

    // for every other node, create a route on the current host:
    // that node's pod CIDR as the destination, its node IP as the next hop
    err = nettools.SetOtherHostRouteToCurrentHost(networks, currentNetwork)

    link, err := netlink.LinkByName(currentNetwork.Name)
    // set iptables on the current host's external NIC to allow forwarding
    err = nettools.SetIptablesForDeviceToFarwordAccept(link.(*netlink.Device))

    // finally assemble the necessary information, such as the gateway
    _gw := net.ParseIP(gateway)

    result := &current.Result{
        CNIVersion: pluginConfig.CNIVersion,
        IPs: []*current.IPConfig{
            {
                Address: *_podIP,
                Gateway: _gw,
            },
        },
    }
    // print the result to standard output for the CRI to read
    types.PrintResult(result, pluginConfig.CNIVersion)
    return nil
}

In essence we just add the other hosts’ pod segments and node IPs to the current host’s routing table.
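For reference, the core of a helper like SetOtherHostRouteToCurrentHost boils down to adding one such route per remote node with the netlink package; here is a rough sketch (the signature and parameters are illustrative, not the repository’s exact code):

package nettools

import (
    "net"

    "github.com/vishvananda/netlink"
)

// addHostGWRoute adds a route like "ip route add <podCIDR> via <nodeIP> dev <link>"
// for one remote node. podCIDR is e.g. "10.244.2.0/24", nodeIP e.g. "192.168.98.145".
func addHostGWRoute(link netlink.Link, podCIDR, nodeIP string) error {
    _, dst, err := net.ParseCIDR(podCIDR)
    if err != nil {
        return err
    }
    return netlink.RouteAdd(&netlink.Route{
        LinkIndex: link.Attrs().Index,
        Dst:       dst,                 // the remote node's pod segment
        Gw:        net.ParseIP(nodeIP), // next hop: the remote node itself
        Scope:     netlink.SCOPE_UNIVERSE,
    })
}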

Finally, don’t forget to print the result to standard output, because K8s reads this JSON-formatted information from stdout for its subsequent operations.


Give it a spin!

With the code above, the “add network” part of our CNI plugin is, in theory, basically done. Of course we still have to implement cmdDel to tear the network down, but that is relatively simple: just do the reverse and delete the veth pair created earlier. Remember to keep the bridge, though, since new pods may use it later.
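As a rough sketch of that cleanup (the helper calls here are illustrative, not necessarily the repository’s exact API): enter the pod’s netns, delete the container end of the veth (which also removes the host end), give the IP back to IPAM, and always return nil:

import (
    "encoding/json"

    "github.com/containernetworking/cni/pkg/skel"
    "github.com/containernetworking/plugins/pkg/ns"
    "github.com/vishvananda/netlink"
)

func cmdDel(args *skel.CmdArgs) error {
    pluginConfig := &PluginConf{}
    if err := json.Unmarshal(args.StdinData, pluginConfig); err != nil {
        return nil // never block pod deletion because of a bad config
    }

    // deleting one end of a veth pair removes its peer as well,
    // so it is enough to remove the container-side interface
    if args.Netns != "" {
        _ = ns.WithNetNSPath(args.Netns, func(_ ns.NetNS) error {
            if link, err := netlink.LinkByName(args.IfName); err == nil {
                return netlink.LinkDel(link)
            }
            return nil
        })
    }

    // hypothetical IPAM call: mark the pod's IP as free again in etcd
    // ipam.Init(pluginConfig.Subnet)
    // if is, err := ipam.GetIpamService(); err == nil {
    //     _ = is.Release().IPs( /* the pod's IP */ )
    // }

    // always return nil, otherwise the CRI keeps retrying DEL
    return nil
}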


Then we run:

go build main.go & mv main /opt/cni/bin/testcni
Copy the code

This produces the binary, moves it into /opt/cni/bin and renames it testcni. The name can be anything, as long as it matches the type field in the configuration file.

Note that the configuration file has to go into /etc/cni/net.d and the binary into /opt/cni/bin on every node.

You can use K8s itself, i.e. its ability to run commands in pods on every node, to copy these files into the corresponding directories.


Finally, let’s start two pods and try to ping between them.

I provide a simple busybox test Deployment (source: k8s-cni-test/test-busybox.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  selector:
    matchLabels:
      app: busybox
  replicas: 2
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command:
          - sleep
          - "36000"
        imagePullPolicy: IfNotPresent

Then in the cluster:

k apply -f test-busybox.yaml

Then check the pod status and exec into a pod:

k exec -it pods/xxxxx -- sh

Finally ping each other:

As you can see, busybox starts two replicas, one on node-1 and one on node-2, in different segments; one gets IP 10.244.1.2 and the other 10.244.2.2.

Then enter one pod and try to ping the pod on the other node, and it works!

OK, that’s a wrap!


Starting from scratch, we have implemented a simple CNI network plugin step by step. There are plenty of places where the implementation falls short; corrections and suggestions are always welcome.


A small favor

Finally, here is the source address again: k8s-cni-test

If you found this helpful, please give the repo a little star. Thanks ~