Linux provides namespaces, cgroups, and the overlay layered file system. Docker calls these kernel interfaces to achieve its isolation effects.

Namespaces

Namespaces are a Linux kernel feature for isolating resources, and they are one of the foundations of container virtualization. A program running in a namespace can only see the processes, file systems, and other resources of that namespace. The kernel provides the following kinds of namespaces:

Namespace   clone() flag     Isolated content
UTS         CLONE_NEWUTS     Host name and domain name (UTS is short for Unix Time-sharing System)
IPC         CLONE_NEWIPC     Semaphores, message queues, and shared memory
PID         CLONE_NEWPID     Process IDs; processes in different namespaces can have the same PID
Network     CLONE_NEWNET     Network devices (network cards), IP addresses, and routing tables
Mount       CLONE_NEWNS      Mount points
User        CLONE_NEWUSER    Users and user groups

The namespaces of a process are exposed as files in /proc/<process ID>/ns. Run ll (a common alias for ls -l) on that directory for any process, for example:

ll /proc/1123/ns
lrwxrwxrwx 1 root root 0 Jun 13 14:38 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 net -> 'net:[4026531993]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jun 13 14:38 uts -> 'uts:[4026531838]'

A number such as 4026531835 identifies the namespace. If you look at the ns directory of other processes that have not been moved into a new namespace, you will see the same numbers. Two processes that show the same number for a given entry are in the same namespace of that type.
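
The comparison described above can be scripted. The following is a minimal sketch, assuming a Linux system with /proc mounted and permission to read the ns entries of both processes; same_ns is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# same_ns PID1 PID2 TYPE: succeed if both processes are in the same namespace
# of the given type (uts, ipc, pid, net, mnt, user, ...).
# same_ns is a hypothetical helper name used only for this illustration.
same_ns() {
    ns1=$(readlink "/proc/$1/ns/$3") || return 2
    ns2=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$ns1" = "$ns2" ]
}

# Compare the current shell with its parent process.
if same_ns "$$" "$PPID" uts; then
    echo "same UTS namespace"
else
    echo "different UTS namespaces (or the entries are not readable)"
fi
```

The helper compares the link targets (for example uts:[4026531838]) rather than parsing the number, since equal targets already imply the same namespace.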

The Linux clone() system call creates child processes. Its flags parameter specifies which namespaces are newly created for the child, using the flags listed above. The documentation for clone can be viewed with the man command:

man clone

Network namespace

A network namespace isolates the network. The basic commands are:

# Create a network namespace named test
ip netns add test
# Enter the test network namespace
ip netns exec test bash
# Exit the network namespace
exit
# Run a command in the namespace without entering it
ip netns exec test netstat -anp
# Delete the namespace when it is no longer needed
ip netns delete test

Operations within a namespace

Run the following commands after entering the network namespace:

# List the network interfaces in the current namespace
ip link
# The following information is displayed:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

A newly created network namespace contains only the loopback interface lo. Its state is DOWN, which means it has not been brought up, so ping 127.0.0.1 does not work yet.

# Bring lo up
ip link set lo up
# Bring lo down
ip link set lo down

Run the following command to view the IP addresses:

# List the IP addresses in the namespace
ip addr

Communication between namespaces

Create a pair of connected virtual network interfaces (a veth pair) in the host namespace:

ip link add interface1 type veth peer name interface2
# Check the interfaces: interface2@interface1 and interface1@interface2 are listed, both in the DOWN state
ip link | grep interface

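Move interface1 and interface2 into the test1 and test2 namespaces, respectively. Both namespaces must already exist; create them with ip netns add test1 and ip netns add test2 if necessary.

ip link set interface1 netns test1
ip link set interface2 netns test2
# Check the interfaces inside each namespace
ip netns exec test1 ip link
ip netns exec test2 ip link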

Set the IP addresses of the interfaces in the two namespaces. The two addresses must be in the same network segment; otherwise, the ping will fail.

ip netns exec test1 ip addr add 192.168.10.10/24 dev interface1
ip netns exec test2 ip addr add 192.168.10.20/24 dev interface2
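
Whether two addresses are in the same network segment can be checked by comparing their network parts. The following is a minimal sketch using only shell arithmetic; ip_to_int and same_subnet are hypothetical helper names, and the sketch assumes plain dotted-decimal IPv4 addresses without leading zeros.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
# (hypothetical helper, for illustration only)
ip_to_int() {
    echo "$1" | {
        IFS=. read -r o1 o2 o3 o4
        echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
    }
}

# same_subnet ADDR1 ADDR2 PREFIX: succeed if both addresses share the
# same network part for the given prefix length.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# The two addresses assigned above share the 192.168.10.0/24 segment.
same_subnet 192.168.10.10 192.168.10.20 24 && echo "same segment"
# An address in 192.168.20.0/24 does not.
same_subnet 192.168.10.10 192.168.20.20 24 || echo "different segments"
```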

Bring up the interfaces in both namespaces:

ip netns exec test1 ip link set interface1 up
ip netns exec test2 ip link set interface2 up
# Check the state; UP means the interface is running normally
ip netns exec test1 ip link
ip netns exec test2 ip link

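Test the connectivity:

# Ping the peer in the other namespace
ip netns exec test1 ping 192.168.10.20
ip netns exec test2 ping 192.168.10.10
# Ping each namespace's own address (this only works once lo is up in that namespace)
ip netns exec test1 ping 192.168.10.10
ip netns exec test2 ping 192.168.10.20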

Connect by bridge

The idea is that each namespace connects to a bridge and the namespaces communicate with each other through it.

# Create a bridge
ip link add bridge1 type bridge
# Check it
ip link | grep bridge1
# Create two network namespaces (skip this if test1 and test2 already exist)
ip netns add test1
ip netns add test2

First connect test1 to bridge1:

# Create a veth pair: test1ToBridge1 is the test1 end, bridge1ToTest1 is the bridge1 end
ip link add test1ToBridge1 type veth peer name bridge1ToTest1
# Move the test1 end into the namespace
ip link set test1ToBridge1 netns test1
# Set the IP address
ip netns exec test1 ip addr add 192.168.10.10/24 dev test1ToBridge1
# Bring up both interfaces and attach the host end to bridge1
ip netns exec test1 ip link set test1ToBridge1 up
ip link set bridge1ToTest1 master bridge1
ip link set bridge1ToTest1 up
# Check the interface state
ip netns exec test1 ip link
ip link

Then connect test2 to bridge1:

# Create a veth pair: test2ToBridge1 is the test2 end, bridge1ToTest2 is the bridge1 end
ip link add test2ToBridge1 type veth peer name bridge1ToTest2
# Move the test2 end into the namespace
ip link set test2ToBridge1 netns test2
# Set the IP address
ip netns exec test2 ip addr add 192.168.20.20/24 dev test2ToBridge1
# Bring up both interfaces and attach the host end to bridge1
ip netns exec test2 ip link set test2ToBridge1 up
ip link set bridge1ToTest2 master bridge1
ip link set bridge1ToTest2 up

Finally, bring up the bridge:

ip link set bridge1 up

Test the communication:

ip netns exec test1 ping 192.168.20.20

At this point the ping fails, because the two addresses are in different network segments, so routes must be specified. First, add two IP addresses to bridge1:

ip addr add 192.168.10.1/24 dev bridge1
ip addr add 192.168.20.1/24 dev bridge1

Set the routes in the host namespace:

# Traffic going to 192.168.10.0/24 goes through bridge1
route add -net 192.168.10.0/24 dev bridge1
# Traffic going to 192.168.20.0/24 goes through bridge1
route add -net 192.168.20.0/24 dev bridge1

In the test1 and test2 namespaces, set a route to the other segment:

ip netns exec test1 route add -net 192.168.20.0/24 gw 192.168.10.1
ip netns exec test2 route add -net 192.168.10.0/24 gw 192.168.20.1

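Then add a DNAT rule. Note that DNAT rewrites the destination address of a packet (rewriting the source address would be SNAT). Because 192.168.10.1 is the host's own address on bridge1, the host itself most likely answers the ping on behalf of 192.168.20.20 once this rule is in place; to have the host actually route packets between the two segments, IP forwarding (sysctl -w net.ipv4.ip_forward=1) would normally be enabled instead.

# Packets sent from 192.168.10.0/24 to 192.168.20.0/24 get their destination address changed to 192.168.10.1
iptables -t nat -I PREROUTING -s 192.168.10.0/24 -d 192.168.20.0/24 -j DNAT --to-destination 192.168.10.1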

Now ping again; this time it succeeds:

ip netns exec test1 ping 192.168.20.20
PING 192.168.20.20 (192.168.20.20) 56(84) bytes of data.
64 bytes from 192.168.20.20: icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 192.168.20.20: icmp_seq=2 ttl=64 time=0.083 ms
64 bytes from 192.168.20.20: icmp_seq=3 ttl=64 time=0.089 ms
64 bytes from 192.168.20.20: icmp_seq=4 ttl=64 time=0.084 ms
64 bytes from 192.168.20.20: icmp_seq=5 ttl=64 time=0.080 ms

cgroup

cgroup is short for control group. A cgroup limits the system resources, such as CPU and memory, that a group of processes can use.

The corresponding files are in the /sys/fs/cgroup directory (the paths and file names used below follow the cgroup v1 layout). View these files with cat and modify them with echo; do not edit them with vi.

Here’s how to create a cgroup:

apt install cgroup-tools
cgcreate -g cpu:test1

This creates the test1 directory under /sys/fs/cgroup/cpu. The files in this directory are the configuration files used for resource limiting, and a subdirectory inherits the configuration of its parent directory.

The tasks file in this directory lists the IDs of all processes managed by the current cgroup; view it with cat tasks.

Limit the CPU

The /sys/fs/cgroup/cpu/test1 directory of the test1 cgroup contains two files that control the CPU limit:

  1. cpu.cfs_quota_us is the CPU time the cgroup may use in each period, in microseconds. The default value is -1, which means CPU usage is unlimited
  2. cpu.cfs_period_us is the length of the CPU allocation period, in microseconds. The default is 100000; it is generally left unchanged, and adjusting the quota above is enough

Setting the quota to 30000 with echo 30000 > cpu.cfs_quota_us means that all processes in the cgroup together can use at most 30% of one CPU (30000 / 100000).
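
The relation between quota and period is a simple ratio. The following is a small sketch of that arithmetic; cpu_limit_percent is a hypothetical helper name, and the sketch only computes the percentage without touching any cgroup files.

```shell
#!/bin/sh
# cpu_limit_percent QUOTA_US PERIOD_US: print the CPU limit as a percentage
# of one core, or "unlimited" for a negative quota.
# (hypothetical helper, for illustration only)
cpu_limit_percent() {
    if [ "$1" -lt 0 ]; then
        echo "unlimited"
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

cpu_limit_percent 30000 100000     # the quota from the text: prints 30
cpu_limit_percent 200000 100000    # a quota of two periods, i.e. two full cores: prints 200
cpu_limit_percent -1 100000        # the default quota: prints unlimited
```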

Now add a process to the current cgroup:

cgclassify -g cpu:test1 <process ID>

Alternatively, add a process to the cgroup by appending its ID to the tasks file with echo <process ID> >> tasks; each process ID occupies one line in the file.
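
The echo approach can be wrapped in a small function. The following is a sketch only: add_to_cgroup is a hypothetical helper name, and the demonstration writes to a scratch directory so that it runs without root privileges. Against a real cgroup the directory would be something like /sys/fs/cgroup/cpu/test1.

```shell
#!/bin/sh
# add_to_cgroup DIR PID...: append each PID to DIR/tasks, one write per PID.
# (hypothetical helper, for illustration only)
add_to_cgroup() {
    dir=$1
    shift
    for pid in "$@"; do
        echo "$pid" >> "$dir/tasks" || return 1
    done
}

# Demonstration against a scratch directory instead of /sys/fs/cgroup.
# The process IDs are placeholders.
demo_dir=$(mktemp -d)
add_to_cgroup "$demo_dir" 1123 1124
cat "$demo_dir/tasks"
rm -r "$demo_dir"
```

Writing one ID per echo keeps the sketch close to the manual command in the text.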

Binding the CPU

In the /sys/fs/cgroup/cpuset directory, modify the cpuset.cpus file as follows:

echo '1-2' > cpuset.cpus

This restricts the processes in the cgroup to CPU cores 1 and 2 (the value 1-2 is a range of core numbers).
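
The cpuset.cpus value is a comma-separated list of core numbers and ranges. The following is a minimal sketch that expands such a list into individual core numbers; expand_cpu_list is a hypothetical helper name, and the sketch assumes well-formed input.

```shell
#!/bin/sh
# expand_cpu_list LIST: print the core numbers selected by a list such as "0,2-4".
# (hypothetical helper, for illustration only)
expand_cpu_list() {
    result=""
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*)
                # A range: walk from the first to the last core number.
                first=${part%-*}
                last=${part#*-}
                n=$first
                while [ "$n" -le "$last" ]; do
                    result="$result $n"
                    n=$((n + 1))
                done
                ;;
            *)
                # A single core number.
                result="$result $part"
                ;;
        esac
    done
    # Unquoted on purpose, to drop the leading space.
    echo $result
}

expand_cpu_list '1-2'      # prints: 1 2
expand_cpu_list '0,2-4'    # prints: 0 2 3 4
```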

Limiting memory

In /sys/fs/cgroup/memory, modify the memory.limit_in_bytes file as follows:

echo $((256 * 1024 * 1024)) > memory.limit_in_bytes
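
The value written above is a byte count (256 MiB). The following is a small sketch of the conversion and the write; mib_to_bytes and set_memory_limit are hypothetical helper names, and the demonstration targets a scratch directory rather than /sys/fs/cgroup/memory so that it runs without root privileges.

```shell
#!/bin/sh
# mib_to_bytes N: print N MiB as a byte count.
# (hypothetical helper, for illustration only)
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# set_memory_limit DIR MIB: write the limit into DIR/memory.limit_in_bytes.
set_memory_limit() {
    mib_to_bytes "$2" > "$1/memory.limit_in_bytes"
}

mib_to_bytes 256    # prints 268435456

# Demonstration against a scratch directory.
demo_dir=$(mktemp -d)
set_memory_limit "$demo_dir" 256
cat "$demo_dir/memory.limit_in_bytes"
rm -r "$demo_dir"
```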

overlay

An overlay file system is divided, from bottom to top, into the lower dir, the upper dir, and the merged layer. The lower dir is read-only, the upper dir is readable and writable, and merged is the combined view of lower and upper. The lower layer can consist of multiple directories. The overlay follows three rules: merging, same-name override, and copy-on-write.

  • Merging means that the files and directories in lower and upper are combined in the merged layer

  • Same-name override means that if lower and upper contain a file with the same name, the file in upper hides the file in lower

  • Copy-on-write means that if the file to be modified is in the upper layer, it is modified directly. If the file to be modified is in the lower layer, it is first copied to the upper layer and then modified there

Files created in the merged layer are stored in the upper layer.

Deleting a file depends on where it is. If the file is in the upper layer, it is deleted directly. If the file is in the lower layer, a whiteout with the same name is created in the upper layer: a character device with device number 0/0 and no permissions for any user, which hides the lower file.
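
The same-name override rule can be imitated with ordinary directories. The following is a simplified simulation of the lookup order only, not how the kernel implements overlay: merged_lookup is a hypothetical helper name, and whiteouts and directory merging are not modeled.

```shell
#!/bin/sh
# merged_lookup NAME UPPER LOWER...: print the path that would provide NAME
# in the merged view. The upper directory is checked first, then the lower
# directories in the order given.
# (hypothetical helper, for illustration only)
merged_lookup() {
    name=$1
    shift
    for layer in "$@"; do
        if [ -e "$layer/$name" ]; then
            echo "$layer/$name"
            return 0
        fi
    done
    return 1
}

# Scratch layers: a.txt exists in both layers, b.txt only in lower.
work=$(mktemp -d)
mkdir "$work/lower" "$work/upper"
echo 'a-lower' > "$work/lower/a.txt"
echo 'a-upper' > "$work/upper/a.txt"
touch "$work/lower/b.txt"

cat "$(merged_lookup a.txt "$work/upper" "$work/lower")"    # prints a-upper
merged_lookup b.txt "$work/upper" "$work/lower"             # prints a path ending in lower/b.txt
rm -r "$work"
```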

A Docker image is made of overlay2 layers. Each step of the image build produces one layer, and when a step is unchanged Docker can reuse the cached layer. By default Docker stores local image layers in /var/lib/docker/overlay2; that directory contains many subdirectories, one per layer.

To test how files are created and modified at the different layers, build a layered directory by hand:

# Create 4 directories
mkdir lower upper merged worker
# Create the test files
echo 'a-lower' > lower/a.txt
echo 'a-upper' > upper/a.txt
touch lower/b.txt

Mount the layered file system. If the lower layer consists of multiple directories, separate them with a colon (:).

mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=worker merged

This single command creates the layered file system. If an error occurs during execution, check the kernel error messages:

journalctl -xe

Now look at the merged directory

ll merged

Now test creating a file in merged:

echo 'c-merged' > merged/c.txt
# Check the merged directory
ll merged
# Check the upper directory
ll upper