Kubernetes network foundation

The Kubernetes network model is built on one basic principle: one Pod, one IP. Each Pod is assigned an IP address from the cluster's private network address range, and through this IP the Pod can communicate across the network with other Pods, physical machines, containers, and so on. All containers within a Pod share a single network stack (in effect one network namespace, so the IP address, network devices, and configuration are shared) and communicate with each other via localhost as if they were on a single machine. You can therefore think of a Pod, roughly, as a small virtual machine.

Kubernetes assigns each Pod an IP address in a flat address space without NAT (network address translation). This matters because NAT introduces extra complexity by segmenting the address space. Of course, a Pod's IP is not fixed, so a Service resource is usually placed in front of Pods to access them (as explained in this article).

Now that we have the network model, how do we implement it? To keep network solutions standardized, scalable, and flexible, Kubernetes adopts the Container Network Interface (CNI) specification. There are already many network solutions, such as Flannel, Calico, Canal, and Weave Net; Weave Net is used as the example in this article.

Weave Net principle

Weave creates a virtual network that connects containers deployed across multiple machines. To the containers, Weave looks like a giant Ethernet switch: every container plugs into it and can communicate directly, without NAT or port mapping.

Install Weave Net

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

Environment: two worker (slave) nodes, each with one Pod resource, and two containers created in each Pod. We'll look at this in more detail later; this section only covers the basic principles of the Weave network.

The Weave Pods are installed in the kube-system namespace, and each runs two containers (see the sketch after this list):

  • weave: the main program; it builds the Weave network, sends and receives data, and provides DNS services.

  • weave-npc: the network policy controller; it uses iptables to enforce network policies and control access to Pods.
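A quick way to check this, as a sketch (it assumes Weave Net was installed from the manifest above; the DaemonSet label name=weave-net is the upstream default and may differ in your cluster):

# List the Weave Pods in kube-system; READY should show 2/2 (weave + weave-npc)
kubectl get pods -n kube-system -l name=weave-net -o wide

# Print the container names inside the first Weave Pod
kubectl get pod -n kube-system -l name=weave-net \
  -o jsonpath='{.items[0].spec.containers[*].name}'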

After the Weave network plug-in is installed, the devices weave, vethwe-datapath@vethwe-bridge, vethwe-bridge@vethwe-datapath, and vxlan-6784 are created automatically on each node.

The Weave network consists of two virtual switches (see figure 1): weave and datapath, connected to each other by the veth pair vethwe-bridge / vethwe-datapath. The weave bridge attaches containers to the Weave network; datapath sends and receives data between hosts through VXLAN tunnels and lets the Weave Net router tell the kernel how to process packets.
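To confirm these devices on a node, something like the following can be used (a sketch; it assumes iproute2 is available on the host, and the exact set of interfaces depends on whether fast datapath is active):

# List the Weave-related interfaces created on the host
ip -br link show | grep -E 'weave|vethwe|vxlan-6784|datapath'

# Inspect the VXLAN device in detail (its vxlan attributes appear in the output)
ip -d link show vxlan-6784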

Going back to the first figure, there are quite a few network interfaces; let's go through them one by one.

  • eth0 inside a container: the default interface a container uses to reach services on the external network. It follows Docker's default network architecture, except that it attaches to docker_gwbridge instead of docker0.
  • docker_gwbridge: the bridge created for container traffic that takes the place of docker0.
  • ethwe inside a container: a virtual interface used to communicate with other containers on the Weave network.
  • vethwe-bridge: the other end of the ethwe veth pair, attached to the weave bridge created on the host; the IP address and gateway are assigned from the bridge's subnet.
  • weave: the Weave bridge; it uses its routing table to find the destination and forwards packets out of the corresponding port to the peer node.
  • eth0 on the host: the machine's physical NIC, connected to the external network; it forwards VXLAN-encapsulated and NAT traffic to the specified peer node.
  • Weave learns about neighboring nodes, exchanges routes with them through its route table, and sends data out the corresponding port, much like static routing.

On node1 there is a Pod (service-test-958CCB545-kg5wn) running two containers; in addition, every Pod runs an extra pause container. In Kubernetes, the pause container acts as the parent of all containers in a Pod. It has two core functions: first, it provides the basis for the Pod's namespaces, such as the network namespace; second, it acts as PID 1 in the Pod's PID namespace and reaps zombie processes. The Pod's network therefore depends on the configuration of its pause container.

1. Check Pod IP address (10.44.0.5)
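For example (a sketch; the Pod name comes from the environment above, written in lowercase as Kubernetes requires, and the IP will differ in your cluster):

# Show the Pod's IP (10.44.0.5 here) and the node it is scheduled on
kubectl get pod service-test-958ccb545-kg5wn -o wide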

2. Check the NIC of the Pod, that is, the NIC of the Pause container

Enter the pause container's network namespace (21276 is the container's PID on the host) and check the veth pairs; eth0 is inside the pause container.
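On a Docker-based node this can be done roughly as follows (a sketch; the container ID placeholder and the PID 21276 are specific to this environment):

# Find the Pod's pause container and its PID on the host
docker ps | grep pause | grep service-test
docker inspect --format '{{.State.Pid}}' <pause-container-id>   # 21276 in this example

# Enter only that container's network namespace and inspect eth0
nsenter -t 21276 -n ip addr show eth0
nsenter -t 21276 -n ip -d link show eth0   # the eth0@ifNN suffix points at the host-side peer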

On the host, check the IP addresses of all interfaces (ip a).

eth0 and vethwepl6811277 form a veth pair; vethwepl6811277 is the host-side interface that connects the Pod's end of the pair to the weave bridge.

The weave bridge also has vethwe-bridge attached to it. What is that?

The IP address range of the Weave bridge
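Both the bridge's address and its attached ports can be checked on the host (a sketch; the actual subnet is decided by Weave's IP allocation and will differ between clusters):

# IP range assigned to the weave bridge on this node
ip addr show weave

# Interfaces attached to the weave bridge: vethwe-bridge plus each Pod's vethwepl* end
ip link show master weave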

ip -d link shows detailed information about each interface.

You can see that vethwe-bridge and vethwe-datapath form a veth pair.

vethwe-datapath is attached to datapath as its master (datapath is a switch, implemented by the Open vSwitch kernel datapath).

vxlan-6784 is the VXLAN interface; Weave hosts communicate with each other through this VXLAN tunnel. The following figure is a simple diagram of the Weave architecture.

So the network of containers is:

  • All containers are connected to the weave bridge
  • The weave bridge connects to the kernel's Open vSwitch module through a veth pair
  • Containers on different hosts communicate over VXLAN through Open vSwitch

Kubernetes network communication

1. Direct communication between containers

Containers within a Pod (which never span hosts) share the same network namespace and Linux network stack. For anything network-related they can therefore reach each other on localhost, exactly as if they were processes on the same machine. For example, if container 2 runs MySQL, container 1 can reach it directly at localhost:3306.
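A minimal sketch of this sharing (the Pod and container names below are made up for illustration; any images that provide a web server and a shell would do):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web            # container 1: listens on port 80
    image: nginx
  - name: client         # container 2: shares the same network namespace
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF

# From the client container, the nginx container is reachable on localhost
kubectl exec localhost-demo -c client -- wget -qO- http://localhost:80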

2. Direct Pod-to-Pod communication

1) Pod communication on the same node (the figure below is from The Definitive Guide to Kubernetes)

Pod1 and Pod2 are both connected to the same docker0 bridge through veth pairs. Their IP addresses, IP1 and IP2, are allocated dynamically from docker0's subnet, which is the same subnet as the bridge's own address IP3. In addition, in the Linux network stacks of Pod1 and Pod2 the default route is docker0's address, so all traffic to non-local addresses is sent to the docker0 bridge by default and forwarded directly by it. In short, because both Pods are attached to the same docker0 bridge and share the same address range, they can communicate with each other directly.
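The same behaviour can be seen from inside a Pod in the Weave environment above (a sketch; it assumes the container image ships the ip tool, and the addresses shown are illustrative):

# The Pod's default route points at the bridge address on its node
kubectl exec service-test-958ccb545-kg5wn -- ip route
# Typical shape of the output (addresses are illustrative):
#   default via <weave bridge address> dev eth0
#   <pod subnet> dev eth0 proto kernel scope link src 10.44.0.5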

2) Pod communication on different machines

The red line in the figure shows the data flow.

The Pod on node1 communicates with the Pod on node2 as follows (see the sketch after this list):

  • eth0 sends the packet to vethwe-bridge.
  • vethwe-bridge receives the data and hands it to Weave for processing; according to Weave's routing table, the data port (UDP 6784) forwards the data to the next node.
  • If the receiving node is the destination, the local Weave passes the packet to the kernel's TCP/IP stack, which then delivers it to the destination Pod.

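To observe this in practice, traffic on the tunnel port can be captured (a sketch; it assumes tcpdump is installed on the node and that the VXLAN data path on UDP port 6784 described above is in use):

# On node2, watch VXLAN-encapsulated traffic arriving on the host NIC
tcpdump -ni eth0 udp port 6784

# Meanwhile, from the Pod on node1, ping a Pod IP that lives on node2
kubectl exec service-test-958ccb545-kg5wn -- ping -c 3 <pod-ip-on-node2>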
3. Communication between Pod and Service

This is implemented with iptables rules programmed by kube-proxy; the details are explained in this article.
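A glimpse of those rules, as a sketch (it assumes kube-proxy runs in its default iptables mode and that a Service named service-test fronts the Pods from the example above):

# The Service's cluster IP
kubectl get svc service-test

# On a node: the NAT rules kube-proxy programs; each Service gets its own KUBE-SVC-* chain
iptables -t nat -L KUBE-SERVICES -n | grep service-test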

4. Communication between external and internal components of the cluster

It has been explained in this article

References

  • Containers and Container Cloud

  • The Definitive Guide to Kubernetes

  • 5 Minutes a Day to Play with Kubernetes

  • Introduction to Weave