Calico is a pure Layer 3 data center networking solution that integrates with IaaS cloud architectures such as OpenStack to provide controllable IP communication between VMs, containers, and bare-metal machines. Why “pure Layer 3”? Every packet is delivered to the right host and container by IP routing, and the routes are synchronized to all machines or data centers through BGP, which completes the interconnection of the whole network.

In simple terms, Calico creates a set of veth pairs on the host, with one end on the host and the other in the container’s network namespace, and then sets up a few routes in the container and on the host to complete the network interconnection.

1. Unveiling the Calico network model

Here we use a concrete example to explain how communication works in a Calico network. Pick any node in the K8s cluster as the experimental node, enter container A, and check its IP address:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if771: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
    link/ether 66:fb:34:db:c9:b4 brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.2/32 scope global eth0
       valid_lft forever preferred_lft forever

Here the container gets a /32 host address, which means container A is treated as a single-host network of its own.

Take a look at container A’s default route:

$ ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

The routing table shows that 169.254.1.1 is the default gateway of the container, but there is no network card corresponding to this IP address. What is this?

Don’t panic. First, recall what happens when the destination address of a packet is not local: the sender consults its routing table, finds the gateway, obtains the gateway’s MAC address through ARP, and then sets the destination MAC address of the outgoing frame to the gateway’s MAC. The gateway’s IP address never appears in any packet header. In other words, nobody cares what that IP address is, as long as something answers ARP for it and supplies a MAC address.
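The two lookups just described (routing table first, ARP cache second) can be sketched in shell. This is only an illustration: the helper names default_gateway and neigh_mac are hypothetical, not part of Calico or iproute2, and they merely parse the text that ip route and ip neigh print.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Calico or iproute2.

# Read `ip route` output on stdin and print the default gateway, if any.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Read `ip neigh` output on stdin and print the MAC cached for the IP in $1.
neigh_mac() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") { print $(i + 1); exit }
    }'
}

# Usage inside the container (requires the ip command):
#   gw=$(ip route | default_gateway)
#   ip neigh | neigh_mac "$gw"
```

The MAC printed by the second step is what ends up in the Ethernet header; the gateway IP is only used as the key for the lookup.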

With that in mind, we can go a step further and use the ip neigh command to check the local ARP cache:

$ ip neigh
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee REACHABLE

This MAC address was presumably hard-coded by Calico, and something is answering ARP requests with it. But how exactly does that work?

Normally, the kernel broadcasts an ARP request asking the whole Layer 2 network who owns 169.254.1.1, and the device that owns the address replies with its MAC address. Here, though, the situation looks awkward: no interface actually holds this IP address, and the MAC address that comes back is the meaningless ee:ee:ee:ee:ee:ee. By rights, the container and the host network should not be able to communicate at all. So how does Calico do it?

I won’t beat around the bush: Calico relies on the proxy ARP function of the network interface. Proxy ARP is a variant of ARP in which a gateway that receives an ARP request for an address on another network segment answers with its own MAC address on behalf of the real owner. Here’s an example:

In the figure above, the PC sends an ARP request for the MAC address of the server 8.8.8.8. On receiving the request, the router (gateway) determines that the destination 8.8.8.8 is not on the local network segment and replies with the MAC address of its own interface. When the PC later accesses the server, it simply sets the destination MAC of its frames to that address, MAC254.
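The router’s decision can be written down as a small predicate. This is a simplified model, assuming the usual Linux behaviour that a proxy reply is sent only when the requested address is routed out of a different interface than the one the request arrived on; the function name proxy_arp_replies is hypothetical and corner cases are ignored.

```shell
#!/bin/sh
# Simplified model of the proxy ARP decision (hypothetical helper, not kernel code).
# Arguments:
#   $1  proxy_arp setting of the interface that received the request (0 or 1)
#   $2  forwarding setting of the machine (0 or 1)
#   $3  interface the ARP request arrived on
#   $4  interface the route to the requested IP points out of
# Returns 0 to answer with our own MAC, 1 to stay silent.
proxy_arp_replies() {
    [ "$1" = "1" ] || return 1      # proxy ARP must be enabled on the receiving interface
    [ "$2" = "1" ] || return 1      # the machine must be willing to forward packets
    [ "$3" != "$4" ] || return 1    # same segment: leave the answer to the real owner
    return 0
}

# Example: a request arrives on eth1 for an address that is routed out of eth0.
if proxy_arp_replies 1 1 eth1 eth0; then
    echo "answer with the MAC address of eth1"
fi
```

In the router example, the request for 8.8.8.8 arrives on the LAN-facing interface while the route to 8.8.8.8 leaves through another one, so the predicate holds and the router answers.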

Now that we know that Calico essentially told a “white lie” using proxy ARP, let’s make sure.

View the NIC and routing information on the host:

$ ip addr
...
771: calicba2f87f6bb@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 14
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
...

$ ip route
... dev calicba2f87f6bb scope link ...

Check whether proxy ARP is enabled.

$ cat /proc/sys/net/ipv4/conf/calicba2f87f6bb/proxy_arp
1
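To check every Calico interface at once rather than one at a time, a small loop over the same proc files is enough. The sketch below assumes the interfaces share the cali name prefix seen above; the function name proxy_arp_report and its two optional arguments are hypothetical.

```shell
#!/bin/sh
# Print "<interface> <proxy_arp value>" for every interface whose name starts
# with a given prefix. Hypothetical helper for illustration.
#   $1  directory holding the per-interface settings (default: the proc path used above)
#   $2  interface name prefix (default: cali)
proxy_arp_report() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    prefix=${2:-cali}
    for f in "$conf_dir"/"$prefix"*/proxy_arp; do
        [ -r "$f" ] || continue     # also skips the unexpanded pattern when nothing matches
        iface=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

# Usage on the host:
#   proxy_arp_report
```

Any line ending in 0 would point to a workload interface on which proxy ARP is not enabled.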

If you are not sure, you can use tcpdump to capture packets for verification:

$ tcpdump -i calicba2f87f6bb -e -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on calicba2f87f6bb, link-type EN10MB (Ethernet), capture size 262144 bytes
14:27:13.565539 ee:ee:ee:ee:ee:ee > 0a:58:ac:1c:ce:12, ethertype IPv4 (0x0800), length 4191: 10.96.0.1.443 > 172.17.8.2.36180: Flags [P.], seq 403862039:403866164, ack 2023703985, win 990, options [nop,nop,TS val 331780572 ecr 603755526], length 4125
14:27:13.565613 0a:58:ac:1c:ce:12 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: 172.17.8.2.36180 > 10.96.0.1.443: Flags [.], ack 4125, win 2465, options [nop,nop,TS val 603758497 ecr 331780572], length 0
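When the capture is long, it helps to reduce it to the distinct source and destination MAC pairs. The filter below is a sketch that assumes the tcpdump -e line layout shown above (timestamp, source MAC, >, destination MAC followed by a comma); the function name mac_pairs is hypothetical.

```shell
#!/bin/sh
# Read `tcpdump -e -nn` output on stdin and print each distinct
# "source MAC -> destination MAC" pair once, in order of first appearance.
# Hypothetical helper for illustration.
mac_pairs() {
    awk '$3 == ">" {
        dst = $4
        sub(/,$/, "", dst)          # tcpdump prints a comma after the destination MAC
        pair = $2 " -> " dst
        if (!(pair in seen)) { seen[pair] = 1; print pair }
    }'
}

# Usage on the host (stops after 20 packets):
#   tcpdump -i calicba2f87f6bb -e -nn -c 20 | mac_pairs
```

If proxy ARP is doing its job, every pair should have ee:ee:ee:ee:ee:ee on one side and the container's MAC on the other.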

Conclusion:

  1. Calico cleverly steers all workload traffic to a special gateway, 169.254.1.1, which diverts it to the calixxx network device on the host, so that all Layer 2 and Layer 3 traffic ends up being forwarded at Layer 3.
  2. With proxy ARP enabled on the host, ARP broadcasts are suppressed at the host, which prevents broadcast storms and keeps ARP tables from growing.

2. Simulating the network

Now that we know how Calico’s networking works, it’s time to manually simulate and verify. The architecture is shown in the figure below:

Run the following command on Host0:

$ ip link add veth0 type veth peer name eth0
$ ip netns add ns0
$ ip link set eth0 netns ns0
$ ip netns exec ns0 ip a add 10.20.1.2/24 dev eth0
$ ip netns exec ns0 ip link set eth0 up
$ ip netns exec ns0 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns0 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.2 dev veth0 scope link
$ ip route add 10.20.1.3 via 192.168.1.16 dev ens192
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp

Run the following command on Host1:

$ ip link add veth0 type veth peer name eth0
$ ip netns add ns1
$ ip link set eth0 netns ns1
$ ip netns exec ns1 ip a add 10.20.1.3/24 dev eth0
$ ip netns exec ns1 ip link set eth0 up
$ ip netns exec ns1 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns1 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.3 dev veth0 scope link
$ ip route add 10.20.1.2 via 192.168.1.32 dev ens192
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp
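The two command lists differ only in the namespace name and the addresses, so they can be generated from one template. The sketch below only prints the commands so they can be reviewed before running them as root; the function name calico_sim_cmds and its argument order are hypothetical. It also assumes IPv4 forwarding is already enabled on each host (net.ipv4.ip_forward=1), which the commands above do not set.

```shell
#!/bin/sh
# Print (do not run) the setup commands for one host of the experiment.
# Hypothetical helper for illustration.
# Usage: calico_sim_cmds NS LOCAL_IP REMOTE_IP REMOTE_HOST_IP NIC
calico_sim_cmds() {
    ns=$1 local_ip=$2 remote_ip=$3 remote_host=$4 nic=$5
    cat <<EOF
ip link add veth0 type veth peer name eth0
ip netns add $ns
ip link set eth0 netns $ns
ip netns exec $ns ip addr add $local_ip/24 dev eth0
ip netns exec $ns ip link set eth0 up
ip netns exec $ns ip route add 169.254.1.1 dev eth0 scope link
ip netns exec $ns ip route add default via 169.254.1.1 dev eth0
ip link set veth0 up
ip route add $local_ip dev veth0 scope link
ip route add $remote_ip via $remote_host dev $nic
echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp
EOF
}

# Review the output first, then apply it as root, for example:
#   Host0: calico_sim_cmds ns0 10.20.1.2 10.20.1.3 192.168.1.16 ens192 | sh -e
#   Host1: calico_sim_cmds ns1 10.20.1.3 10.20.1.2 192.168.1.32 ens192 | sh -e
```

Printing instead of executing keeps the helper safe to try on any machine, and piping into sh -e stops at the first command that fails.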

Network connectivity test:

# Host0
$ ip netns exec ns0 ping 10.20.1.3
PING 10.20.1.3 (10.20.1.3) 56(84) bytes of data.
64 bytes from 10.20.1.3: icmp_seq=1 ttl=62 time=0.303 ms
64 bytes from 10.20.1.3: icmp_seq=2 ttl=62 time=0.334 ms

The experiment is a success!

The specific forwarding process is as follows:

  1. All packets leaving the ns0 namespace are sent toward the virtual gateway address 169.254.1.1, so ns0 first sends an ARP request for it.
  2. Because proxy ARP is enabled, the veth end on Host0 answers the ARP request with its own MAC address.
  3. ns0 sends the IP packet destined for ns1.
  4. Because the packet was sent via 169.254.1.1, Host0 treats it as Layer 3 forwarding and looks up its local route, 10.20.1.3 via 192.168.1.16 dev ens192, which sends the packet to Host1. If BGP is configured, the proto field of this route shows bird.
  5. When Host1 receives the packet for 10.20.1.3, it matches the local route 10.20.1.3 dev veth0 scope link and forwards the packet to veth0, through which it reaches ns1.
  6. The return path works the same way.
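Steps 4 and 5 are both lookups of a host route in the local routing table. The sketch below mimics that lookup on the text printed by ip route; it only matches exact host entries and does no longest-prefix matching, and the function name host_route is hypothetical. On a real host, ip route get 10.20.1.3 gives the kernel's own answer.

```shell
#!/bin/sh
# Read `ip route` output on stdin and print the entry for the exact host
# address given in $1. Hypothetical helper for illustration; it does not
# implement longest-prefix matching.
host_route() {
    awk -v dst="$1" '$1 == dst || $1 == (dst "/32") { print; exit }'
}

# Usage:
#   Host0: ip route | host_route 10.20.1.3    # expect a "via ... dev ens192" entry
#   Host1: ip route | host_route 10.20.1.3    # expect a "dev veth0 scope link" entry
```

Seeing a via entry on the sending host and a veth entry on the receiving host for the same address is a quick confirmation that both halves of the path are in place.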

This experiment gives a clear picture of how Calico forwards data. First, a special route is configured in every namespace, and the proxy ARP function of the veth device turns everything leaving the namespace into Layer 3 routed traffic; the host’s routes then take over the forwarding. This approach handles both Layer 2 and Layer 3 forwarding within a single host and forwarding across hosts.