This is the 9th day of my participation in the Gwen Challenge.

Docker network introduction

Introduction

This article briefly introduces how each Docker network type moves traffic, how Docker containers communicate with each other while isolated by network namespaces, and how containers on different hosts communicate.

Preparation

I created two CentOS 7 VMs, each with SELinux, swap, and firewalld disabled. The two hosts can ping each other.

The hosts:

- 172.16.3.141
- 172.16.3.142

1. Bridge network

When multiple containers need to communicate on the same Docker host, a user-defined bridge network is the best choice.

Bridge is the network model Docker uses by default. Let's look at how bridge mode lets two containers communicate with each other; at this point everything on the host is still at its default configuration.


```
[root@localhost ~]# docker -v
Docker version 20.10.7, build f0df350
```

We can see that a docker0 bridge device exists; containers on the same host communicate with each other through this docker0 bridge.

```
[root@localhost ~]# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:d2ff:fe1d:22a5  prefixlen 64  scopeid 0x20<link>
        ether 02:42:d2:1d:22:a5  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 446 (446.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.3.141  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::8d1e:c5fb:7a84:89a  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:81:ed:94  txqueuelen 1000  (Ethernet)
        RX packets 104250  bytes 139817031 (133.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 42132  bytes 4974745 (4.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 198  bytes 16420 (16.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 198  bytes 16420 (16.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```
Let's create two busybox containers and see what happens to the network interfaces:

```
docker run -it -d --name busybox busybox
docker run -it -d --name busybox1 busybox
```
```
[root@localhost ~]# docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED          STATUS          PORTS     NAMES
c7d208c90e23   busybox   "sh"      19 minutes ago   Up 19 minutes             busybox1
15e045ae83f8   busybox   "sh"      25 minutes ago   Up 25 minutes             busybox
```

Two new interfaces, veth7c9ae7d and vethd5dab10, have appeared. A veth pair is a virtual device that always exists in pairs: a packet received on one end immediately appears on its peer. In other words, each of the two veth endpoints on the host is the peer of eth0 inside one of the containers.

```
veth7c9ae7d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::bc27:7fff:fe48:9a77  prefixlen 64  scopeid 0x20<link>
        ether be:27:7f:48:9a:77  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 656 (656.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vethd5dab10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::e409:5dff:fef4:6293  prefixlen 64  scopeid 0x20<link>
        ether e6:09:5d:f4:62:93  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1102 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```

Looking at the bridge device, we can see that both virtual interfaces are plugged into the docker0 bridge. Even without knowing how this works internally, it tells us that the containers talk to each other through the docker0 bridge.

```
[root@localhost ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242d21d22a5	no		veth7c9ae7d
							vethd5dab10
```

Inspecting the Docker network (some output removed for readability), we can see that the two newly created containers already appear in the default bridge configuration and were assigned the subnet IPs 172.17.0.2 and 172.17.0.3, respectively.

```
[root@localhost ~]# docker network inspect 0f2e3e50513e
[
    {
        "Name": "bridge",
        .........
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16"    # the configured subnet range
                }
            ]
        },
        "Containers": {
            "15e045ae83f863c0c74588f8a1cea51e49f11128e46c37df7707c91fafb5a996": {
                "Name": "busybox",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16"
            },
            "c7d208c90e23137846c6fbdba0af8f3e28c79503dbd37cbc41ec5677343a3ae5": {
                "Name": "busybox1",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16"
            }
        }
    }
]
```
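The pattern in the output above, where the gateway takes 172.17.0.1 and containers get .2, .3, and so on, is what an IPAM driver does: hand out the next free address from the configured subnet. A minimal sketch of that behavior (a hypothetical `SimpleIPAM` class, not Docker's actual allocator) using Python's `ipaddress` module:

```python
import ipaddress

class SimpleIPAM:
    """Toy allocator mimicking how addresses .2, .3, ... get handed out.
    Docker's real IPAM driver is far more elaborate; this is only a sketch."""

    def __init__(self, subnet, gateway):
        self.subnet = ipaddress.ip_network(subnet)
        # the gateway address (docker0 itself) is already taken
        self.used = {ipaddress.ip_address(gateway)}

    def allocate(self):
        # walk the subnet's host addresses and hand out the first free one
        for host in self.subnet.hosts():
            if host not in self.used:
                self.used.add(host)
                return str(host)
        raise RuntimeError("subnet exhausted")

ipam = SimpleIPAM("172.17.0.0/16", "172.17.0.1")
print(ipam.allocate())  # 172.17.0.2 -> first container (busybox)
print(ipam.allocate())  # 172.17.0.3 -> second container (busybox1)
```

This matches the addresses the two busybox containers received in the inspect output.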

Let's try pinging one container's IP from inside the other to see whether it works.

The container IPs:

```
busybox  172.17.0.2
busybox1 172.17.0.3
```

```
[root@localhost ~]# docker exec -it busybox sh -c 'ping 172.17.0.3'
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.419 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.830 ms

# and the reverse direction
[root@localhost ~]# docker exec -it busybox1 sh -c 'ping 172.17.0.2'
PING 172.17.0.2 (172.17.0.2): 56 data bytes
64 bytes from 172.17.0.2: seq=0 ttl=64 time=0.419 ms
64 bytes from 172.17.0.2: seq=1 ttl=64 time=0.942 ms
```

How do they communicate, given that they are isolated by network namespaces? Inside the container, all traffic to 172.17.0.0/16 is routed through eth0. What is eth0, and how does a request reach eth0 in the other container?

```
[root@localhost ~]# docker exec -it busybox1 sh -c 'ifconfig'
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:03
          inet addr:172.17.0.3  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1496 (1.4 KiB)  TX bytes:840 (840.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

[root@localhost ~]# docker exec -it busybox1 sh -c 'route -n'
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
```

All traffic to 172.17.0.0/16 is routed to the docker0 bridge. So we now know that a request from 172.17.0.3 (busybox1) leaves through eth0 and arrives at docker0 via the veth pair on the host. But how does docker0 then find 172.17.0.2?

```
[root@localhost ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.3.1      0.0.0.0         UG    100    0        0 ens33
172.16.0.0      0.0.0.0         255.255.0.0     U     100    0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
```

According to the routing table in the container, traffic destined for 172.17.0.0/16 goes out through eth0 with Flags=U, indicating a direct-connect rule: delivery happens purely over the Layer 2 network. But Layer 2 delivery requires the MAC address of the destination IP 172.17.0.2, so docker0 sends an ARP broadcast to find the MAC address corresponding to 172.17.0.2 and then delivers the frame directly.
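The lookup the kernel performs here can be imitated with Python's `ipaddress` module: pick the matching route with the longest prefix, where an empty gateway means a direct-connect (Flags=U) Layer 2 delivery. This is a rough sketch of the decision, not the kernel's actual implementation:

```python
import ipaddress

# busybox1's routing table as (destination, gateway, interface);
# gateway None stands for the 0.0.0.0 direct-connect case
routes = [
    ("0.0.0.0/0",     "172.17.0.1", "eth0"),  # default route via docker0
    ("172.17.0.0/16", None,         "eth0"),  # direct-connect (Flags=U) rule
]

def lookup(dst):
    """Longest-prefix match over the toy routing table."""
    addr = ipaddress.ip_address(dst)
    return max(
        (r for r in routes if addr in ipaddress.ip_network(r[0])),
        key=lambda r: ipaddress.ip_network(r[0]).prefixlen,
    )

dest, gw, dev = lookup("172.17.0.2")
print(dev, "direct" if gw is None else f"via {gw}")  # eth0 direct
```

For 172.17.0.2 both rules match, but the /16 rule wins on prefix length, so the packet is delivered directly at Layer 2 instead of being sent to the gateway.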


Turn on the iptables TRACE target and watch how the packet travels:

```
iptables -t raw -A OUTPUT -p icmp -j TRACE
iptables -t raw -A PREROUTING -p icmp -j TRACE
```

Now ping busybox1 (172.17.0.3) from busybox (172.17.0.2), i.e. send an ICMP request from 172.17.0.2 to 172.17.0.3.

```
tail -f /var/log/messages
Jun 23 08:17:17 localhost kernel: TRACE: raw:PREROUTING:rule:4
  IN=docker0              # inbound device; empty would mean locally generated
  PHYSIN=vethd5dab10      # physical inbound port on the bridge
  MAC=02:42:ac:11:00:03:02:42:ac:11:00:02:08:00
                          # destination MAC 02:42:ac:11:00:03,
                          # source MAC 02:42:ac:11:00:02,
                          # 08:00 is the upper-layer protocol code (IPv4)
  SRC=172.17.0.2          # source IP
  DST=172.17.0.3          # destination IP
  LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=41094 DF PROTO=ICMP TYPE=8 CODE=0 ID=20 SEQ=0
```
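The `MAC=` field in the trace packs three things into one string: the 6-byte destination MAC, the 6-byte source MAC, and the 2-byte EtherType (08:00 means IPv4). A small helper for reading such logs (a hypothetical function written just for this illustration):

```python
def split_mac_field(mac_field):
    """Split an iptables TRACE MAC= field into (dst MAC, src MAC, EtherType)."""
    parts = mac_field.split(":")          # 14 colon-separated bytes
    dst = ":".join(parts[0:6])            # first 6 bytes: destination MAC
    src = ":".join(parts[6:12])           # next 6 bytes: source MAC
    ethertype = "".join(parts[12:14])     # last 2 bytes: EtherType
    return dst, src, ethertype

dst, src, etype = split_mac_field("02:42:ac:11:00:03:02:42:ac:11:00:02:08:00")
print(dst)    # 02:42:ac:11:00:03  (busybox1, the destination)
print(src)    # 02:42:ac:11:00:02  (busybox, the sender)
print(etype)  # 0800 -> IPv4
```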

The destination 172.17.0.3 resolves to MAC address 02:42:ac:11:00:03, and the bridge finds the veth port behind that MAC in its CAM table (the forwarding table a bridge builds through MAC address learning).

```
[root@localhost ~]# cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
172.17.0.3       0x1         0x2         02:42:ac:11:00:03     *        docker0
172.16.3.1       0x1         0x2         3e:22:fb:a1:32:64     *        ens33
172.17.0.2       0x1         0x2         02:42:ac:11:00:02     *        docker0
```

Now that we know the flow of container-to-container traffic, what if we ping a container's IP directly from the host? The ping also succeeds; where does the data flow this time?

```
[root@localhost ~]# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=1.68 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.445 ms
```

According to the host routing table, a packet destined for 172.17.0.2 goes through the docker0 bridge device. This is also a direct-connect rule, so the same ARP process as before takes over:

```
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
```

What about an external request reaching a container? For example, suppose an nginx container listens on port 80, but the request's destination IP is the host's IP, such as 172.16.3.140. How does the request get forwarded into the container? It's easy to guess: DNAT.

```
[root@localhost ~]# iptables -t nat -L -n
Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.4:80
```
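Conceptually, the DNAT rule above rewrites a packet's destination before routing happens: anything arriving for TCP port 80 has its destination changed to the container's address. A toy illustration of that rewrite (a hypothetical `dnat` function, not iptables itself):

```python
def dnat(packet, rules):
    """Rewrite a packet's destination per a DNAT rule table.
    rules maps (proto, dport) -> (container_ip, container_port)."""
    key = (packet["proto"], packet["dst_port"])
    if key in rules:
        packet = dict(packet)  # work on a copy
        packet["dst_ip"], packet["dst_port"] = rules[key]
    return packet

# from the iptables output above: tcp dpt:80 -> 172.17.0.4:80
rules = {("tcp", 80): ("172.17.0.4", 80)}

pkt = {"proto": "tcp", "dst_ip": "172.16.3.140", "dst_port": 80}
print(dnat(pkt, rules)["dst_ip"])  # 172.17.0.4
```

After this rewrite the packet is routed like any other packet for 172.17.0.4, which is why the flow simply rejoins the routing-table and ARP path described earlier.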

After the DNAT rewrite, the flow goes back through the routing table and ARP as before.

Refer to the official documentation

2. Host network

This one is simpler: the container uses the host's network namespace directly (the other namespaces remain independent and isolated).

```
docker run -d -it --name busybox2 --network host busybox
```

Checking ifconfig at this point, no new veth pair device was created, and inside the container the ifconfig output is exactly the same as on the host.

```
"Containers": {
    "45e964d3ccf99c044f0bae5aef145e8036ecda5da7bb6dd8495a4c0ca7588a84": {
        "Name": "busybox2",
        "EndpointID": "fecfcdbf4a8324362b6e7fdda835c9ad110bd31b72254beba8e3e04ef42d33d7",
        "MacAddress": "",
        "IPv4Address": "",
        "IPv6Address": ""
    }
}
```

3. Container network

In container mode, one or more containers share the network namespace of another container.

We start two containers, busybox and busybox1, where busybox1 uses busybox's network namespace:

```
docker run -d -it --name busybox busybox
docker run -d -it --name busybox1 --network container:busybox busybox
```

If you look at the network namespaces of both containers, they are exactly the same.

```
[root@localhost ~]# docker exec -it busybox sh -c 'ifconfig'
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1102 (1.0 KiB)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

[root@localhost ~]# docker exec -it busybox1 sh -c 'ifconfig'
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1102 (1.0 KiB)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
```

4. Macvlan (across hosts)

Your network devices need to support promiscuous mode, in which a physical interface can carry multiple MAC addresses. Macvlan requires promiscuous mode on the NIC, which is not enabled by default, so we turn it on (this is required on both hosts).

```
# the VM NIC is ens33; the current flags show promiscuous mode is off
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

# enable promiscuous mode
ifconfig ens33 promisc

# PROMISC in the flags means it is now enabled
ens33: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
```

Now create a network whose driver is macvlan (run this on both hosts):

```
# --subnet: the same network segment as the host
# -o parent: the name of the physical NIC
docker network create -d macvlan \
  --subnet=172.16.3.0/24 \
  --gateway=172.16.3.1 \
  -o parent=ens33 sup-net
```
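Because macvlan puts containers directly onto the host's Layer 2 segment, the static IPs assigned below must fall inside the declared subnet and must not collide with the gateway. A quick sanity check of this addressing plan with `ipaddress` (purely illustrative):

```python
import ipaddress

# values from the `docker network create` command above
subnet = ipaddress.ip_network("172.16.3.0/24")
gateway = ipaddress.ip_address("172.16.3.1")

# the static IPs given to containers aaa and bbb below
container_ips = ["172.16.3.100", "172.16.3.101"]

for ip in container_ips:
    addr = ipaddress.ip_address(ip)
    assert addr in subnet, f"{ip} is outside {subnet}"
    assert addr != gateway, f"{ip} collides with the gateway"
print("macvlan addressing plan looks consistent")
```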

Create a container on each host

```
# on 172.16.3.140
docker run -itd --name aaa --ip=172.16.3.100 --network sup-net busybox

# on 172.16.3.141
docker run -itd --name bbb --ip=172.16.3.101 --network sup-net busybox
```

Both containers have been started

```
[root@localhost ~]# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED          STATUS          PORTS     NAMES
dbd7da305fcf   busybox   "sh"      22 seconds ago   Up 20 seconds             aaa

[root@localhost ~]# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED          STATUS          PORTS     NAMES
70a6f2d0a2aa   busybox   "sh"      38 seconds ago   Up 37 seconds             bbb
```

Let's ping across the hosts; we can see the containers are already connected.

```
[root@localhost ~]# docker exec -it aaa sh -c 'ping 172.16.3.101'
PING 172.16.3.101 (172.16.3.101): 56 data bytes
64 bytes from 172.16.3.101: seq=0 ttl=64 time=1.822 ms

# and the reverse direction
[root@localhost ~]# docker exec -it bbb sh -c 'ping 172.16.3.100'
PING 172.16.3.100 (172.16.3.100): 56 data bytes
64 bytes from 172.16.3.100: seq=0 ttl=64 time=0.679 ms
64 bytes from 172.16.3.100: seq=1 ttl=64 time=0.621 ms
```

Refer to the official documentation