This is day 21 of my participation in the Gwen Challenge.

Zero, preamble (a container is a special process)

Why use containers?

Container virtualization makes building applications more efficient and easier to manage and maintain.

In the figure, the left side shows how a virtual machine works and the right side shows how Docker works:

The core technologies behind containers are cgroups and namespaces; several other tools are layered on top of them to make up container technology as a whole.

A container is a process on the host:

  1. Namespaces provide resource isolation.
  2. Cgroups provide resource control.
  3. rootfs provides file system isolation.
  4. The container engine itself manages the container's lifecycle.

Supplement: in its early days Docker was a management engine similar to (and built on) LXC. LXC is a user-space toolkit for managing the kernel's cgroup and namespace features; cgroups and namespaces themselves are kernel mechanisms. Namespaces are a basic mechanism the Linux kernel uses to manage processes: each process's task_struct records which namespaces it belongs to.




One, Namespace: resource isolation

One of the main goals behind the development of namespaces was to support lightweight virtualization.

When it comes to resource isolation, the chroot command comes to mind first: it changes a process's root directory and thereby isolates its view of the file system.

As shown in the figure:

(1) Six kinds of namespace isolation

A container needs six basic kinds of isolation, as shown in the figure:

  1. IPC: processes communicate through IPC mechanisms such as shared memory. If processes in two containers could reach each other directly over IPC, there would be no isolation. On a plain Linux host, any two processes can communicate directly through IPC.
  2. PID: processes form a tree, and the tree can have two kinds of root. One is the host's user-space init; the other is a process subordinate to another process, which acts as the init (PID 1) of a container, with its PIDs mapped to ordinary PIDs on the host. Child processes are created, terminated, and reaped by their parent, and when a namespace's init exits, all processes in that namespace are terminated.
  3. User: every process runs as some user. Each container needs its own root user. This root only pretends to be root: it cannot touch content in other user namespaces (for example, delete their files), and on the host it is an ordinary user.
  4. Mount: the file system mount points. Run cd /usr/bin/ inside a container, compare it with the host, and you will find it is an independent file system.
  5. UTS: the hostname.
  6. Network: network devices, stacks, and ports. For example, netstat -an | grep 22 inside a container shows only that container's sockets, not the host's.
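Each of the six kinds corresponds to a CLONE_NEW* flag accepted by clone and unshare, and to an entry under /proc/<pid>/ns. The C sketch below records that mapping; the table and the helpers ns_flag_for and all_ns_flags are hypothetical, and only the CLONE_NEW* constants come from <sched.h>.

```c
#define _GNU_SOURCE
#include <sched.h>    /* CLONE_NEW* flags */
#include <stddef.h>   /* size_t */
#include <string.h>   /* strcmp */

/* Hypothetical lookup table: one row per namespace kind, giving its entry
   name under /proc/<pid>/ns/, the flag clone()/unshare() accept to create
   a new one, and what it isolates. The mount namespace flag is called
   CLONE_NEWNS for historical reasons (it was the first namespace). */
struct ns_kind {
    const char *proc_name;
    int clone_flag;
    const char *isolates;
};

static const struct ns_kind NS_KINDS[] = {
    { "ipc",  CLONE_NEWIPC,  "System V IPC objects, POSIX message queues" },
    { "pid",  CLONE_NEWPID,  "process ID number space" },
    { "user", CLONE_NEWUSER, "user and group IDs" },
    { "mnt",  CLONE_NEWNS,   "mount points" },
    { "uts",  CLONE_NEWUTS,  "hostname and NIS domain name" },
    { "net",  CLONE_NEWNET,  "network devices, IP stacks, ports" },
};

#define NS_COUNT (sizeof NS_KINDS / sizeof NS_KINDS[0])

/* Return the clone flag for a /proc/<pid>/ns name, or 0 if it is not one
   of the six kinds above. */
int ns_flag_for(const char *proc_name) {
    for (size_t i = 0; i < NS_COUNT; i++)
        if (strcmp(NS_KINDS[i].proc_name, proc_name) == 0)
            return NS_KINDS[i].clone_flag;
    return 0;
}

/* OR together all six flags: what a runtime would pass to clone() to give
   a child a fresh instance of every namespace kind in the table. */
int all_ns_flags(void) {
    int flags = 0;
    for (size_t i = 0; i < NS_COUNT; i++)
        flags |= NS_KINDS[i].clone_flag;
    return flags;
}
```

The names in the first column are exactly the ones that appear in the /proc/$$/ns listing, so the table can be used to go from what you see in /proc to the flag you would pass to clone.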


(2) Namespace operations

Namespaces are manipulated through three system calls: clone, setns, and unshare.

  1. clone creates a new process and can put it into new namespaces.
  2. unshare moves the calling process into new namespaces.
  3. setns moves the calling process into an existing namespace.
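To see unshare in action, here is a minimal C sketch that forks a child, moves the child into a new UTS namespace, and renames its host. try_private_hostname is a hypothetical helper. Creating a UTS namespace normally requires CAP_SYS_ADMIN (for example, running as root), so an unprivileged run takes the "refused" path; either way the parent's hostname is left alone.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare, CLONE_NEWUTS */
#include <string.h>     /* strlen */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, sethostname, _exit */

/* Hypothetical helper: fork a child that calls unshare(CLONE_NEWUTS) and
   then renames its host. Returns 1 if the child renamed itself inside its
   own UTS namespace, 0 if the kernel refused to create the namespace
   (typically EPERM without CAP_SYS_ADMIN), and -1 on any other failure.
   The parent's hostname never changes: the rename happens in a namespace
   the parent is not in. */
int try_private_hostname(const char *new_name) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                       /* child */
        if (unshare(CLONE_NEWUTS) == -1)
            _exit(2);                     /* namespace creation refused */
        if (sethostname(new_name, strlen(new_name)) == -1)
            _exit(3);
        _exit(0);                         /* renamed in the new namespace */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 2:  return 0;
    default: return -1;
    }
}
```

Comparing gethostname() in the parent before and after the call shows the isolation: the child's new name is visible only inside its own UTS namespace.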

Example: query the namespaces of the current process

donald@donald-pro:~$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 net -> 'net:[4026532009]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 user -> 'user:[4026531837]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 uts -> 'uts:[4026531838]'
donald@donald-pro:~$
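Each link target encodes the namespace type and an inode number; two processes are in the same namespace when these match (strictly, when both the device and inode of the links match). The following C sketch reads and parses such a link; parse_ns_inode and ns_inode_of are hypothetical helper names.

```c
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* strtoul */
#include <string.h>   /* strchr */
#include <unistd.h>   /* readlink, ssize_t */

/* Parse a link target such as "ipc:[4026531839]" and return the number in
   brackets (the namespace's inode number), or 0 if the text is malformed. */
unsigned long parse_ns_inode(const char *target) {
    const char *bracket = strchr(target, '[');
    if (bracket == NULL)
        return 0;
    char *end;
    unsigned long ino = strtoul(bracket + 1, &end, 10);
    return (*end == ']') ? ino : 0;
}

/* Hypothetical helper: read /proc/<pid>/ns/<kind> ("self" for the calling
   process) and return the namespace's inode number, or 0 if the link
   cannot be read. */
unsigned long ns_inode_of(const char *pid, const char *kind) {
    char path[128], target[64];
    snprintf(path, sizeof path, "/proc/%s/ns/%s", pid, kind);
    ssize_t n = readlink(path, target, sizeof target - 1);
    if (n < 0)
        return 0;
    target[n] = '\0';   /* readlink() does not NUL-terminate */
    return parse_ns_inode(target);
}
```

For example, comparing ns_inode_of("self", "net") with the value for a container's main process (its host PID, as reported by docker inspect -f '{{.State.Pid}}') shows whether the two share a network namespace.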




Two, Cgroup

Cgroup (control group) is a Linux kernel mechanism for limiting, accounting for, and isolating the physical resources (CPU, memory, I/O, and so on) used by groups of processes.

Cgroup provides a framework for grouping processes, and each kind of resource is controlled by its own subsystem. A subsystem is a resource controller; for example, the cpu subsystem controls how CPU time is allocated.

Linux namespaces isolate a newly created process from the host's processes in terms of file system, network, and so on, but namespaces cannot isolate physical resources such as CPU or memory. If several containers run on the same machine, each one knows nothing about the others or about the host, yet together they all consume the host's physical resources. Cgroups fill this gap.

After Docker is installed on Linux, you will find a directory named docker in each subsystem's directory (for example /sys/fs/cgroup/cpu/docker).
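A process's place in each hierarchy is listed in /proc/<pid>/cgroup. The C sketch below assumes the cgroup v1 line format "hierarchy-ID:controller-list:cgroup-path" (as on the host shown here; cgroup v2 uses a single "0::path" line) and turns one line into the matching directory under the subsystem's mount point. cgroup_dir_for is a hypothetical helper, and the container path in the example is taken from the listings below.

```c
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strchr, memchr, strlen, strncmp, strcspn */

/* Hypothetical helper. Given one cgroup v1 line of /proc/<pid>/cgroup,
   "hierarchy-ID:controller-list:cgroup-path", check whether `controller`
   is one of the comma-separated controllers. If so, write
   mount_root + cgroup-path into `out` and return 1; otherwise return 0. */
int cgroup_dir_for(const char *line, const char *controller,
                   const char *mount_root, char *out, size_t out_len) {
    const char *c1 = strchr(line, ':');
    const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
    if (c2 == NULL)
        return 0;
    size_t want = strlen(controller);
    /* Walk the controller list token by token (exact match only, so
       "cpu" does not match "cpuset"). */
    for (const char *p = c1 + 1; p < c2; ) {
        const char *comma = memchr(p, ',', (size_t)(c2 - p));
        const char *tok_end = comma ? comma : c2;
        if ((size_t)(tok_end - p) == want && strncmp(p, controller, want) == 0) {
            /* The path runs from after the second ':' to end of line. */
            size_t path_len = strcspn(c2 + 1, "\n");
            int n = snprintf(out, out_len, "%s%.*s",
                             mount_root, (int)path_len, c2 + 1);
            return n >= 0 && (size_t)n < out_len;
        }
        p = tok_end + 1;
    }
    return 0;
}
```

For a line such as "4:cpu,cpuacct:/docker/c988e6a0567c" and the controller "cpu", the sketch yields /sys/fs/cgroup/cpu/docker/c988e6a0567c, which is where the container's cpu.cfs_quota_us lives.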

Writing to the cpu.cfs_quota_us file limits CPU usage: it sets how many microseconds of CPU time the group may use in each period defined by cpu.cfs_period_us.

donald@donald-pro:/sys/fs/cgroup$ ll
total 0
drwxr-xr-x 15 root root 380 Apr 22 18:05 ./
drwxr-xr-x  9 root root   0 Apr 22 18:05 ../
dr-xr-xr-x  4 root root   0 Apr 22 18:05 blkio/
lrwxrwxrwx  1 root root  11 Apr 22 18:05 cpu -> cpu,cpuacct/
lrwxrwxrwx  1 root root  11 Apr 22 18:05 cpuacct -> cpu,cpuacct/
dr-xr-xr-x  4 root root   0 Apr 22 18:05 cpu,cpuacct/
dr-xr-xr-x  2 root root   0 Apr 22 18:05 cpuset/
dr-xr-xr-x  5 root root   0 Apr 22 18:05 devices/
dr-xr-xr-x  3 root root   0 Apr 22 18:05 freezer/
dr-xr-xr-x  2 root root   0 Apr 22 18:05 hugetlb/
dr-xr-xr-x  4 root root   0 Apr 22 18:05 memory/
lrwxrwxrwx  1 root root  16 Apr 22 18:05 net_cls -> net_cls,net_prio/
dr-xr-xr-x  2 root root   0 Apr 22 18:05 net_cls,net_prio/
lrwxrwxrwx  1 root root  16 Apr 22 18:05 net_prio -> net_cls,net_prio/
dr-xr-xr-x  2 root root   0 Apr 22 18:05 perf_event/
dr-xr-xr-x  4 root root   0 Apr 22 18:05 pids/
dr-xr-xr-x  2 root root   0 Apr 22 18:05 rdma/
dr-xr-xr-x  5 root root   0 Apr 22 18:05 systemd/
dr-xr-xr-x  5 root root   0 Apr 22 18:05 unified/
donald@donald-pro:/sys/fs/cgroup/cpu/docker$ ll
total 0
drwxr-xr-x 3 root root 0 Apr 25 14:28 ./
dr-xr-xr-x 5 root root 0 Apr 25 14:28 ../
drwxr-xr-x 2 root root 0 Apr 25 14:28 c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904/
-rw-r--r-- 1 root root 0 Apr 25 14:28 cgroup.clone_children
-rw-r--r-- 1 root root 0 Apr 25 14:28 cgroup.procs
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.stat
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_all
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_user
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.shares
-r--r--r-- 1 root root 0 Apr 25 14:28 cpu.stat
-rw-r--r-- 1 root root 0 Apr 25 14:28 notify_on_release
-rw-r--r-- 1 root root 0 Apr 25 14:28 tasks
donald@donald-pro:/sys/fs/cgroup/cpu/docker/c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904$ sudo docker ps
CONTAINER ID   IMAGE                       COMMAND                    CREATED        STATUS         PORTS                    NAMES
c988e6a0567c   mobz/elasticsearch-head:5   "/bin/sh -c 'grunt s..."   5 months ago   Up 3 minutes   0.0.0.0:9100->9100/tcp   loving_albattani
donald@donald-pro:/sys/fs/cgroup/cpu/docker/c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904$ cat cpu.cfs_quota_us
-1

The container ID shown by docker ps, c988e6a0567c, is the prefix of the directory name under /sys/fs/cgroup/cpu/docker/. The quota value -1 means the container's CPU usage is not limited.
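The quota works together with cpu.cfs_period_us: the group may use quota microseconds of CPU time in every period, so quota divided by period is the number of CPUs' worth of time it gets, and -1 (the value printed by cat cpu.cfs_quota_us above) means no limit. A minimal C sketch of that arithmetic follows; cfs_cpu_limit and cfs_quota_for are hypothetical helper names.

```c
/* Hypothetical helpers for CFS bandwidth control (cgroup v1 cpu subsystem).
   A group may use quota_us microseconds of CPU time in every period_us
   microseconds of wall-clock time, so quota/period is the number of CPUs'
   worth of time it gets. A negative quota (-1) means "no limit"; these
   helpers report that as -1.0. */
double cfs_cpu_limit(long quota_us, long period_us) {
    if (quota_us < 0 || period_us <= 0)
        return -1.0;                       /* unlimited (or invalid period) */
    return (double)quota_us / (double)period_us;
}

/* Inverse: the cpu.cfs_quota_us value that grants `cpus` CPUs per period,
   rounded to the nearest microsecond. */
long cfs_quota_for(double cpus, long period_us) {
    return (long)(cpus * (double)period_us + 0.5);
}
```

With the default period of 100000 us, writing 50000 into the container's cpu.cfs_quota_us (as root) would cap it at half a CPU. Docker's --cpus=1.5 option corresponds to a period of 100000 and a quota of 150000, which is what cfs_quota_for(1.5, 100000) computes.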




Three, the container creation process

(1) The clone system call creates a new process with its own namespaces

This process has its own PID, mount, user, network, IPC, and UTS namespaces.

pid = clone(fun, stack, flags, clone_arg);
/* flags includes CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUSER, CLONE_NEWNET, CLONE_NEWIPC and CLONE_NEWUTS */
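The clone call above can be fleshed out into a small runnable C sketch. It uses glibc's clone() wrapper, which takes the child's entry function and the top of a caller-allocated stack; run_in_namespaces and child_main are hypothetical names. Without CAP_SYS_ADMIN the kernel rejects namespace flags such as CLONE_NEWPID with EPERM, and the sketch reports that as -1.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone, CLONE_NEW* */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc, free */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* getpid */

#define CHILD_STACK_SIZE (1024 * 1024)

/* Hypothetical child entry point: exit with 1 if this process is PID 1,
   which is what the first process of a fresh PID namespace sees. */
static int child_main(void *arg) {
    (void)arg;
    return getpid() == 1 ? 1 : 0;
}

/* Hypothetical helper: start child_main via clone() with the given
   namespace flags, wait for it, and return its exit status, or -1 if
   clone() or waitpid() failed. */
int run_in_namespaces(int ns_flags) {
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows downward on common architectures, so pass its top.
       SIGCHLD makes the child reportable through an ordinary waitpid(). */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      ns_flags | SIGCHLD, NULL);
    int result = -1, status;
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    free(stack);
    return result;
}
```

run_in_namespaces(0) behaves much like fork() and returns 0. Run as root, run_in_namespaces(CLONE_NEWPID) returns 1, because the child sees itself as PID 1 in its new PID namespace.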

(2) Write the pid into each cgroup subsystem so the process is subject to its control

root@docker:~# echo $pid > /sys/fs/cgroup/cpu/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/cpuset/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/blkio/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/memory/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/devices/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/freezer/tasks
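The echo commands above can be expressed in C as a single write to a cgroup's tasks file. This is a sketch assuming the cgroup v1 layout used throughout this article (on cgroup v2 the equivalent file is cgroup.procs); cgroup_attach is a hypothetical helper. A real runtime writes into a per-container directory such as /sys/fs/cgroup/cpu/docker/<id>/ rather than the hierarchy root, and writing to a real cgroup requires root.

```c
#include <stdio.h>       /* snprintf, fopen, fprintf, fclose */
#include <sys/types.h>   /* pid_t */

/* Hypothetical helper: write `pid` into <cgroup_dir>/tasks, which moves
   that task into the cgroup so the kernel applies the group's limits to it.
   Returns 0 on success, -1 on error. */
int cgroup_attach(const char *cgroup_dir, pid_t pid) {
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/tasks", cgroup_dir);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;                       /* path too long */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    /* fclose() flushes the buffer; a write the kernel rejects (for example
       an invalid pid) is reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling it once per subsystem directory (cpu, cpuset, blkio, memory, devices, freezer) mirrors the shell commands. Pointing cgroup_dir at an ordinary scratch directory is a convenient way to try the helper without root.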

(3) Switch the root file system with the pivot_root system call

The pivot_root system call moves the process into a new rootfs; the process then uses the exec system call to run /bin/bash inside its new namespaces, cgroups, and rootfs.

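int fun(void *arg) {
  const char *path = arg;  /* put_old: a directory inside the new rootfs */
  /* glibc has no pivot_root() wrapper; "path_of_rootfs/" must be a mount point */
  syscall(SYS_pivot_root, "path_of_rootfs/", path);
  chdir("/");
  execl("/bin/bash", "bash", (char *)NULL);
  return 1;  /* reached only if exec fails */
}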