Flannel is an open source CNI plug-in from CoreOS. The following figure, taken from Flannel's official website, shows packet encapsulation, transmission, and decapsulation. As the figure shows, the docker0 bridges of the two machines are on different subnets: Web App Frontend1 Pod (10.1.15.2) sits behind 10.1.15.1/24, and Backend Service2 Pod (10.1.20.3) behind 10.1.20.1/24. Network packets travel from host 192.168.0.100 to host 192.168.0.200. The inner container packets are encapsulated in UDP on the host, with the host's IP and MAC addresses wrapped around them as the outer layer. This is a classic overlay network: because a container's IP is an internal address that cannot be routed across hosts, the container network has to be overlaid on the host network.

Flannel supports multiple network modes, including VXLAN, UDP, host-gw, IPIP, GCE, and AliCloud. The difference between VXLAN and UDP is that VXLAN encapsulation is done in the kernel, while UDP encapsulation is done by the flanneld user-space program, so UDP's performance is lower. host-gw is a host-gateway mode: a container's gateway to another host is set to that host's NIC address. This is very similar to Calico, except that Calico announces routes via BGP, while host-gw distributes them through a central etcd. host-gw forwards directly, with no overlay encapsulation and decapsulation, so its performance is relatively high. However, the biggest disadvantage of host-gw mode is that it requires a Layer 2 network: the next hop of each route must be reachable in the neighbor table, otherwise packets cannot be forwarded.

In production environments, VXLAN is the most commonly used mode, so we will first look at how it works and then walk through the implementation in the source code.

The installation process is very simple, mainly divided into two steps:

Step 1: install Flannel.

Run yum install flannel, or deploy flannel as a Kubernetes DaemonSet, and configure the etcd address for flanneld.

Step 2: configure the cluster network.

curl -L http://etcdurl:2379/v2/keys/flannel/network/config -XPUT -d value='{"Network": "10.254.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan", "VNI": 1}}'
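The value stored under /flannel/network/config is plain JSON, so its shape is easy to see in code. Below is a minimal Go sketch that unmarshals a config of this shape; the struct names are illustrative, not Flannel's internal types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative types mirroring the JSON keys in the etcd config above;
// these are not Flannel's internal structs.
type Backend struct {
	Type string // network mode, e.g. "vxlan"
	VNI  int    // VXLAN network identifier
}

type Config struct {
	Network   string // cluster-wide pod CIDR
	SubnetLen int    // per-host subnet size, e.g. 24
	Backend   Backend
}

func parseConfig(raw string) (Config, error) {
	var c Config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	raw := `{"Network": "10.254.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan", "VNI": 1}}`
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s len=%d backend=%s vni=%d\n", c.Network, c.SubnetLen, c.Backend.Type, c.Backend.VNI)
}
```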

Then the flanneld program is started on each node.

Working principle:

1. How container addresses are assigned

When a Docker container starts, it gets an IP address via docker0. Flannel assigns each machine an IP segment, which is configured on docker0; after a container starts, an unused IP address is selected from that segment.
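The allocation itself is ordinary IPAM: pick the first unused host address within the machine's /24. A toy sketch of that selection (purely illustrative; Docker's real IPAM is more involved):

```go
package main

import (
	"fmt"
	"net"
)

// nextFreeIP returns the first host address in cidr not present in used.
// The network address (.0) and the gateway (.1) are skipped. Illustrative
// only; assumes an IPv4 /24 subnet.
func nextFreeIP(cidr string, used map[string]bool) (net.IP, error) {
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	base := ip.Mask(ipnet.Mask).To4()
	for i := 2; i < 255; i++ { // skip .0 (network) and .1 (gateway)
		cand := net.IPv4(base[0], base[1], base[2], byte(i))
		if !used[cand.String()] {
			return cand, nil
		}
	}
	return nil, fmt.Errorf("subnet %s exhausted", cidr)
}

func main() {
	used := map[string]bool{"10.254.44.2": true, "10.254.44.3": true}
	ip, _ := nextFreeIP("10.254.44.1/24", used)
	fmt.Println(ip) // 10.254.44.4
}
```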

First look at flannel's systemd unit file, /usr/lib/systemd/system/flanneld.service:

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/flanneld
ExecStart=/usr/bin/flanneld-start $FLANNEL_OPTIONS
ExecStartPost=/opt/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker

The mk-docker-opts.sh script writes Docker's environment variables to the file /run/flannel/docker:

DOCKER_OPT_BIP="--bip=10.251.81.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=10.251.81.1/24 --ip-masq=false --mtu=1450"

This file is referenced by Docker's unit file, /usr/lib/systemd/system/docker.service:

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/flannel/docker
EnvironmentFile=-/etc/sysconfig/docker

This is how the docker0 bridge address gets set.

In the development environment, there are three machines assigned to the following network segments:

host-139.245   10.254.44.1/24
host-139.246   10.254.60.1/24
host-139.247   10.254.50.1/24

2. How containers communicate

The preceding section described how containers get their IP addresses. So how do containers on different hosts communicate? We use the most common mode, VXLAN, and there are three key elements: routes, the ARP table, and the FDB table. Let's analyze the role of each by following a packet out of a container. First, the packet leaving the container passes through docker0. Then, does it go out through the host network directly, or is it forwarded with VXLAN encapsulation? That decision is made by the routing table on each machine:

# ip route show dev flannel.1
10.254.50.0/24 via 10.254.50.0 onlink
10.254.60.0/24 via 10.254.60.0 onlink

As you can see, each host has a route to the subnets of the other two machines. These are onlink routes. The onlink flag forces the kernel to treat the gateway as "on the link" even though there is no link-layer route to it; without it, Linux would refuse to add a route whose gateway is in a different subnet. With these routes in place, packets destined for containers on other hosts are handed to the flannel.1 device.

The flannel.1 virtual network device then encapsulates the packet. But the next question is: what is the MAC address of the gateway? Since the gateway was set via onlink, flanneld delivers the MAC address itself. Check the ARP table:

# ip neigh show dev flannel.1
10.254.50.0 lladdr ba:10:0e:7b:74:89 PERMANENT
10.254.60.0 lladdr 92:f3:c8:b2:6e:f0 PERMANENT

Here are the MAC addresses of the gateways, so the inner packet can be fully encapsulated.

Last question: what is the destination IP of the outer packet? In other words, to which host should the encapsulated packet be sent? The kernel's default VXLAN implementation broadcasts the first packet for an unknown destination, but Flannel sidesteps this by writing the FDB entries directly:

# bridge fdb show dev flannel.1
92:f3:c8:b2:6e:f0 dst 10.100.139.246 self permanent
ba:10:0e:7b:74:89 dst 10.100.139.247 self permanent

In this way, the destination host IP for each MAC address is known, and the outer packet can be addressed.
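Putting the three tables together, the forwarding decision can be sketched in a few lines of Go. The table contents below are copied from the listings above (host-139.245's view); the lookup functions are illustrative, standing in for what the kernel does:

```go
package main

import (
	"fmt"
	"net"
)

// Static copies of the three tables shown above.
var routes = map[string]string{ // dst subnet -> gateway (remote flannel.1 IP)
	"10.254.50.0/24": "10.254.50.0",
	"10.254.60.0/24": "10.254.60.0",
}
var arp = map[string]string{ // gateway IP -> VTEP MAC
	"10.254.50.0": "ba:10:0e:7b:74:89",
	"10.254.60.0": "92:f3:c8:b2:6e:f0",
}
var fdb = map[string]string{ // VTEP MAC -> remote host IP
	"92:f3:c8:b2:6e:f0": "10.100.139.246",
	"ba:10:0e:7b:74:89": "10.100.139.247",
}

// resolve follows route -> ARP -> FDB and returns the inner destination MAC
// and the outer destination host IP for a container-to-container packet.
func resolve(dst string) (mac, hostIP string, err error) {
	ip := net.ParseIP(dst)
	for cidr, gw := range routes {
		_, ipnet, _ := net.ParseCIDR(cidr)
		if ipnet.Contains(ip) {
			mac = arp[gw]
			return mac, fdb[mac], nil
		}
	}
	return "", "", fmt.Errorf("no route to %s", dst)
}

func main() {
	mac, host, _ := resolve("10.254.50.7")
	fmt.Println(mac, host) // ba:10:0e:7b:74:89 10.100.139.247
}
```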

Note that the entries in both the ARP table and the FDB are PERMANENT, which means they are written and maintained by flanneld rather than learned. Traditionally, ARP obtains neighbor information by broadcast: when a reply is received, the neighbor is marked reachable; if the peer is later found unresponsive, the entry becomes stale, then passes through the delay and probe states; if probing fails, it is marked failed. The reason for reviewing these ARP basics is that older versions of Flannel did not use the permanent entries described above, but a temporary ARP scheme in which the delivered entries were in the reachable state. When a container needs a peer's ARP address, the kernel first sends unicast ARP requests, the number of retries controlled by

/proc/sys/net/ipv4/neigh/$NIC/ucast_solicit

If those go unanswered, an ARP request event is sent to user space, controlled by

/proc/sys/net/ipv4/neigh/$NIC/app_solicit

Previous versions of Flannel took advantage of this feature:

# cat /proc/sys/net/ipv4/neigh/flannel.1/app_solicit
3

Thus, flanneld receives the L3MISS events the kernel sends to user space, looks up in etcd the MAC address corresponding to the IP, and installs the entry in the reachable state. From this analysis it is clear that if the flanneld process exits, communication between containers breaks, which needs attention. The startup process of Flannel is shown below.

By default, etcd is used for subnet storage. If --kube-subnet-mgr is specified, the Kubernetes API is used to store the data instead.

The specific code is as follows:

func newSubnetManager() (subnet.Manager, error) {
    if opts.kubeSubnetMgr {
        return kube.NewSubnetManager(opts.kubeApiUrl, opts.kubeConfigFile)
    }

    cfg := &etcdv2.EtcdConfig{
        Endpoints: strings.Split(opts.etcdEndpoints, ","),
        Keyfile:   opts.etcdKeyfile,
        Certfile:  opts.etcdCertfile,
        CAFile:    opts.etcdCAFile,
        Prefix:    opts.etcdPrefix,
        Username:  opts.etcdUsername,
        Password:  opts.etcdPassword,
    }

    // Attempt to renew the lease for the subnet specified in the subnetFile
    prevSubnet := ReadCIDRFromSubnetFile(opts.subnetFile, "FLANNEL_SUBNET")

    return etcdv2.NewLocalManager(cfg, prevSubnet)
}

The SubnetManager is used to obtain the network configuration, including the backend type and subnet information. For VXLAN, the network manager is created through NewManager using a simple factory pattern: each network mode's manager registers itself in an init function.

For example, vxlan:

func init() {
    backend.Register("vxlan", New)
}

And udp:

func init() {
    backend.Register("udp", New)
}

In the same way, each backend's constructor is registered in a map, so the right network manager can be started according to the network mode configured in etcd.
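The registration mechanism is just a map from mode name to constructor. A stripped-down sketch of the pattern (the types here are simplified stand-ins, not Flannel's backend interfaces):

```go
package main

import "fmt"

// newFunc constructs a backend; simplified from Flannel's real signature.
type newFunc func() string

// constructors maps a mode name ("vxlan", "udp", ...) to its constructor.
var constructors = map[string]newFunc{}

// Register records a backend constructor under its mode name.
func Register(name string, f newFunc) { constructors[name] = f }

// New looks up and runs the constructor for the mode configured in etcd.
func New(name string) (string, error) {
	f, ok := constructors[name]
	if !ok {
		return "", fmt.Errorf("unknown backend type %q", name)
	}
	return f(), nil
}

func init() {
	Register("vxlan", func() string { return "vxlan backend" })
	Register("udp", func() string { return "udp backend" })
}

func main() {
	b, _ := New("vxlan")
	fmt.Println(b) // vxlan backend
}
```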

The third step is registering the network.

In RegisterNetwork, the flannel.[VNI] NIC is created first; the default VNI is 1, hence flannel.1. Flannel then registers a lease with etcd and obtains its subnet. Old versions of flannel obtained a new subnet on every start; newer versions traverse the leases already registered in etcd, find the previously allocated subnet, and continue to use it.
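Subnet acquisition amounts to choosing a free /24 within the cluster /16, preferring the one recorded in the local subnet file. A toy sketch of that selection (illustrative only; the real lease logic also handles TTLs and races through etcd, and the 10.254.0.0/16 range follows the examples above):

```go
package main

import "fmt"

// pickSubnet returns prev if it is still free, otherwise the first /24
// (by third octet) not present in taken. Illustrative only.
func pickSubnet(prev string, taken map[string]bool) string {
	if prev != "" && !taken[prev] {
		return prev // reuse the previously leased subnet across restarts
	}
	for i := 1; i < 255; i++ {
		s := fmt.Sprintf("10.254.%d.0/24", i)
		if !taken[s] {
			return s
		}
	}
	return "" // range exhausted
}

func main() {
	taken := map[string]bool{"10.254.1.0/24": true, "10.254.50.0/24": true}
	// The previously held subnet is still free, so it is reused.
	fmt.Println(pickSubnet("10.254.44.0/24", taken)) // 10.254.44.0/24
	// Without a previous lease, the first free /24 is chosen.
	fmt.Println(pickSubnet("", taken)) // 10.254.2.0/24
}
```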

Finally, WriteSubnetFile writes the local subnet file:

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.254.0.0/16
FLANNEL_SUBNET=10.254.44.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Docker uses this file to set up its network. Careful readers may have noticed that the MTU is not the 1500 bytes Ethernet requires; that is because the outer VXLAN headers take up 50 bytes.
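The 50 bytes break down into the extra headers VXLAN wraps around the inner Ethernet frame. A quick check of the arithmetic:

```go
package main

import "fmt"

// Per-packet VXLAN overhead relative to the host Ethernet MTU, in bytes.
const (
	innerEthernet = 14 // inner Ethernet header carried inside the VXLAN payload
	vxlanHeader   = 8  // VXLAN header (flags + VNI)
	outerUDP      = 8  // outer UDP header
	outerIPv4     = 20 // outer IPv4 header
)

func main() {
	overhead := innerEthernet + vxlanHeader + outerUDP + outerIPv4
	fmt.Println(overhead)        // 50
	fmt.Println(1500 - overhead) // 1450, matching FLANNEL_MTU above
}
```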

Of course, after startup flanneld must also watch the data in etcd, so that when a flannel node is added, removed, or changed, the other nodes can dynamically update the three tables above. The main handling is in handleSubnetEvents:

func (nw *network) handleSubnetEvents(batch []subnet.Event) {
    for _, event := range batch {
        // ... (lease and attribute extraction elided)
        switch event.Type {
        // A new subnet was added (a new host joined the cluster)
        case subnet.EventAdded:
            // Update the routing table
            if err := netlink.RouteReplace(&directRoute); err != nil {
                log.Errorf("Error adding route to %v via %v: %v", sn, attrs.PublicIP, err)
                continue
            }
            // Add an ARP entry
            log.V(2).Infof("adding subnet: %s PublicIP: %s VtepMAC: %s", sn, attrs.PublicIP, net.HardwareAddr(vxlanAttrs.VtepMAC))
            if err := nw.dev.AddARP(neighbor{IP: sn.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                log.Error("AddARP failed: ", err)
                continue
            }
            // Add an FDB entry; roll back the ARP entry on failure
            if err := nw.dev.AddFDB(neighbor{IP: attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                log.Error("AddFDB failed: ", err)
                if err := nw.dev.DelARP(neighbor{IP: event.Lease.Subnet.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelARP failed: ", err)
                }
                continue
            }
        // A subnet was removed (a host left the cluster)
        case subnet.EventRemoved:
            // Delete the route
            if err := netlink.RouteDel(&directRoute); err != nil {
                log.Errorf("Error deleting route to %v via %v: %v", sn, attrs.PublicIP, err)
            } else {
                log.V(2).Infof("removing subnet: %s PublicIP: %s VtepMAC: %s", sn, attrs.PublicIP, net.HardwareAddr(vxlanAttrs.VtepMAC))
                // Delete the ARP entry
                if err := nw.dev.DelARP(neighbor{IP: sn.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelARP failed: ", err)
                }
                // Delete the FDB entry
                if err := nw.dev.DelFDB(neighbor{IP: attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelFDB failed: ", err)
                }
                if err := netlink.RouteDel(&vxlanRoute); err != nil {
                    log.Errorf("failed to delete vxlanRoute (%s -> %s): %v", vxlanRoute.Dst, vxlanRoute.Gw, err)
                }
            }
        default:
            log.Error("internal error: unknown event type: ", int(event.Type))
        }
    }
}

In this way, any flannel host additions or deletions are sensed by the other nodes, which update their local kernel forwarding tables accordingly.

Author: Chen Xiaoyu

Original: http://college.creditease.cn/#/detail/15/176