How does Docker achieve isolation?

Overview

Containerization is very popular in cloud computing, microservices and similar systems, and Docker is a typical containerization technology, so it is worth understanding how it works. In this article I will analyze how Docker achieves isolation and how Docker differs from a virtual machine. Let's start uncovering it.

Start by running a container

Let's start by running a simple container, using the busybox image as an example. The busybox image is a common Linux toolkit that can be used to execute many Linux commands. Execute the command:

docker run -it --name demo_docker busybox /bin/sh

This command starts a Docker container from the busybox image. The -it flags give the container an interactive input/output environment (a TTY), and /bin/sh is the program that the container runs interactively.

Process isolation

After the command succeeds, we are inside the Docker container. Run ps -ef to check the processes:

/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
    8 root      0:00 ps -ef

Run the top command to view process resource usage:

Mem: 1757172K used, 106080K free, 190676K shrd, 129872K buff, 998704K cached
CPU: 0.0% usr sys 0.0% NIC 99.6% IDLE 0.0% IO 0.0% IRQ 0.0% SIRQ
Load Average: 0.00 0.01 0.05 2/497 9
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1 0 root S 1300 0.01 0.0 /bin/sh
9 1 root R 1292 0.0 0.0 top 3

On the host, we check the process that is running the container with ps -ef | grep busybox:

root       5866   5642  0 01:19 pts/4    00:00:00 /usr/bin/docker-current run -it --name demo_docker busybox /bin/sh
root       5952   5759  0 01:20 pts/11   00:00:00 grep --color=auto busybox

From the host's point of view, the docker run command is just one ordinary process, with PID 5866. The container itself is isolated, and only its own processes are visible from inside it. How does Docker do that? It relies on the Linux kernel's namespace feature. The following C program simulates this kind of process isolation.

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mount.h>

/* Stack used by the cloned child; stack size is 1 MB */
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

char* const container_args[] = {
    "/bin/bash",
    NULL
};

int container_main(void* arg)
{
    printf("Container process [%5d] ---- enter container!\n", getpid());
    /* Keep mount changes from propagating back to the host
       (needed where / is a shared mount, as is common under systemd) */
    mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
    /* Remount /proc so that ps and top only see the new PID namespace */
    mount("proc", "/proc", "proc", 0, NULL);
    /* Run /bin/bash; execv only returns if it fails */
    execv(container_args[0], container_args);
    printf("Wrong!\n");
    return 1;
}

int main()
{
    printf("Host process [%5d] - Start a container!\n", getpid());
    /* Call clone(); the stack grows downward, so pass the top of the buffer */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
                              CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");   /* creating namespaces requires root privileges */
        return 1;
    }
    /* Wait for the child process to end */
    waitpid(container_pid, NULL, 0);
    printf("Host machine - End of container!\n");
    return 0;
}

The clone() function creates a new process that runs container_main. The second argument is the stack space for that process. CLONE_NEWPID and CLONE_NEWNS are Linux namespace flags: they create a new PID namespace and a new mount namespace, respectively.

CLONE_NEWPID makes the kernel number processes afresh inside the new namespace, so the cloned process sees itself as PID 1.
CLONE_NEWNS gives the child its own copy of the mount environment; remounting /proc inside the child then hides the parent's process information.

Let’s run this program to see what happens.

Compile:

gcc container.c -o container

Run:

[root@host1 luozhou]# ./container
Host process [6061] - Start a container!
Container process [1] ---- enter container!

From the host's point of view, the PID of this program is 6061. From the cloned child's point of view, its PID is 1. Inside the child, we run ps -ef to see the list of processes:

[root@host1 luozhou]# ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
root          1      0  0 01:46 pts/2    00:00:00 /bin/bash
root         10      1  0 01:48 pts/2    00:00:00 ps -ef

Only the processes inside the container are listed. Next, run the top command:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 115576 2112 1628 S 0.0 0.1 05:00.00 bash
11 root 20 0 161902124 1544 R 0.0 0.1 0:00.00 top

Again, the output shows only two processes.

This is the basic principle behind container process isolation: Docker relies on Linux kernel namespaces. The file isolation discussed later works in a similar way, by mounting inside a new namespace, and the resource limits are likewise exposed through mounted file systems.
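To make the namespace idea more concrete, here is a minimal sketch that reports which PID and mount namespaces the calling process belongs to. It assumes a Linux system with /proc mounted, where each entry under /proc/self/ns is a symbolic link such as pid:[4026531836]. The helper names read_ns_id and print_namespaces are hypothetical and are not part of the program above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the namespace identifier of the calling
 * process into buf. ns_name is an entry under /proc/self/ns, such as
 * "pid" or "mnt". Returns the identifier length, or -1 if the link
 * cannot be read.
 */
long read_ns_id(const char *ns_name, char *buf, size_t buf_len)
{
    char path[64];
    assert(ns_name != NULL && buf != NULL && buf_len > 1);

    int n = snprintf(path, sizeof path, "/proc/self/ns/%s", ns_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    ssize_t len = readlink(path, buf, buf_len - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';   /* readlink does not NUL-terminate the result */
    return (long)len;
}

/* Print the PID and mount namespace identifiers of the calling process. */
void print_namespaces(void)
{
    const char *names[] = { "pid", "mnt" };
    char id[64];

    for (int i = 0; i < 2; i++) {
        if (read_ns_id(names[i], id, sizeof id) >= 0)
            printf("%s namespace: %s\n", names[i], id);
        else
            printf("%s namespace: unavailable\n", names[i]);
    }
}
```

If print_namespaces() were called once from main and once from container_main, the two calls would be expected to print different identifiers, because CLONE_NEWPID and CLONE_NEWNS place the child in new namespaces.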

File isolation

Now that process isolation is clear, you should have a general picture of how Docker container isolation works. Next, let's look at file system isolation inside Docker, that is, where the folders and files displayed by ls inside a container come from.

Using the container we started earlier as an example, run ls inside it:

bin   dev   etc   home  proc  root  run   sys   tmp   usr   var

The container already contains these folders, so where did they come from? Let's run docker info to see which file system our Docker is using.

Server Version: 1.13.1
Storage Driver: overlay2

My version is 1.13.1, and the storage driver is overlay2. Different storage drivers behave differently in Docker, but the principle is similar. Let's see how Docker uses overlay2 to present all these folders. As mentioned earlier, Docker builds the container's view through mounts, so we first find our container instance ID.

Run docker ps -a | grep demo_docker:

c0afd574aea7        busybox                         "/bin/sh"                42 minutes ago      Up 42 minutes 

We then look up the mount information by container ID. Run cat /proc/mounts | grep c0afd574aea7:

shm /var/lib/docker/containers/c0afd574aea716593ceb4466943bbd13e3a081bf84da0779ee43600de0df384b/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c740,c923",nosuid,nodev,noexec,relatime,size=65536k 0 0

There is one mount record here, but it is not the one we care about; we need the mount information for overlay2. For that we run one more command: cat /proc/mounts | grep system_u:object_r:container_file_t:s0:c740,c923

overlay /var/lib/docker/overlay2/9c9318031bc53dfca45b6872b73dab82afcd69f55066440425c073fe681109d3/merged overlay rw,context="system_u:object_r:container_file_t:s0:c740,c923",relatime,lowerdir=/var/lib/docker/overlay2/l/FWESUOVO6DYTXBBJIQBPUWLN6K:/var/lib/docker/overlay2/l/XPKQU6AMUX3AKLAX2BR6V4JQ3R,upperdir=/var/lib/docker/overlay2/9c9318031bc53dfca45b6872b73dab82afcd69f55066440425c073fe681109d3/diff,workdir=/var/lib/docker/overlay2/9c9318031bc53dfca45b6872b73dab82afcd69f55066440425c073fe681109d3/work 0 0
shm /var/lib/docker/containers/c0afd574aea716593ceb4466943bbd13e3a081bf84da0779ee43600de0df384b/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c740,c923",nosuid,nodev,noexec,relatime,size=65536k 0 0

The overlay mount is not associated with the container ID, so searching for the container ID alone cannot find it. The two mount records are linked through the SELinux context instead, so searching for the context string leads us to the mount point. Let's list that directory and see the result:

[root@host1 l]# ls /var/lib/docker/overlay2/9c9318031bc53dfca45b6872b73dab82afcd69f55066440425c073fe681109d3/merged
bin  dev  etc  home  proc  root  run  sys  tmp  usr  var

This matches the directory listing inside our container. We create a new directory under this directory and check whether it also appears inside the container.

The picture above verifies that the file contents inside the container are consistent with what is mounted under /var/lib/docker/overlay2/ID/merged. This is the basic principle of Docker file system isolation.
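The overlay record found in /proc/mounts earlier carries lowerdir, upperdir and workdir options next to the merged mount point. As a rough illustration, the sketch below pulls one such option out of a mount option string. The function name find_mount_option is hypothetical, and the only assumption about the format is what the record itself shows: options are comma-separated, and a quoted value such as the SELinux context may contain a comma of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key=value" in a comma-separated mount
 * option string (the fourth field of a /proc/mounts line) and copy the
 * value into out. Commas inside double quotes do not split options.
 * Returns 0 on success, -1 if the key is missing or the value does not
 * fit into out.
 */
int find_mount_option(const char *opts, const char *key,
                      char *out, size_t out_len)
{
    assert(opts != NULL && key != NULL && out != NULL && out_len > 0);
    size_t key_len = strlen(key);
    const char *p = opts;

    while (*p != '\0') {
        /* Find the end of the current option, skipping quoted commas. */
        const char *end = p;
        int in_quotes = 0;
        while (*end != '\0' && (in_quotes || *end != ',')) {
            if (*end == '"')
                in_quotes = !in_quotes;
            end++;
        }

        /* Does this option start with "key="? */
        if ((size_t)(end - p) > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            const char *val = p + key_len + 1;
            size_t val_len = (size_t)(end - val);
            if (val_len >= out_len)
                return -1;
            memcpy(out, val, val_len);
            out[val_len] = '\0';
            return 0;
        }

        /* Move on to the next option, if there is one. */
        p = (*end == ',') ? end + 1 : end;
    }
    return -1;
}
```

Called with the option string of the overlay record and the key "upperdir", this would return the container's diff directory, which is where overlay keeps the container's own changes on top of the read-only lowerdir layers.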

Resource constraints

Those of you who have used Docker will know that it can also limit resource usage, such as CPU and memory. How is this part implemented? Docker uses the Linux cgroups feature for this. In Linux, everything is a file, so cgroups are also represented as files. We can run mount -t cgroup to see how cgroups are mounted:

cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)


The mounted directories above include cpu and memory, so we can guess that the limits are configured under these folders. To verify this, run the following command:

docker run -d --name='cpu_set_demo' --cpu-period=100000 --cpu-quota=20000 busybox md5sum /dev/urandom 

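--cpu-period=100000 --cpu-quota=20000 limits CPU usage to 20%: within each period of 100000 microseconds, the container may use at most 20000 microseconds of CPU time. Details on these two parameters can be found in the Docker documentation (see the references).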
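The 20% figure is simply the quota divided by the period. A minimal sketch of that arithmetic follows; the function name cpu_limit_percent is hypothetical, and treating a negative quota as "no limit" follows the cgroup v1 convention that cpu.cfs_quota_us is -1 when no limit is set.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a CFS quota/period pair (both in
 * microseconds, as passed to --cpu-quota and --cpu-period) into a
 * percentage of one CPU. A negative quota means "no limit" and is
 * reported as -1.0.
 */
double cpu_limit_percent(long quota_us, long period_us)
{
    assert(period_us > 0);
    if (quota_us < 0)
        return -1.0;
    return 100.0 * (double)quota_us / (double)period_us;
}
```

With the values from the command above, cpu_limit_percent(20000, 100000) evaluates to 20.0. A quota larger than the period gives more than 100, which corresponds to more than one full CPU.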

The CPU limit of the Docker container is now set to 20%. Again, the configuration must be tied to the container instance ID. On my machine the path is /sys/fs/cgroup/cpu/system.slice/docker-5bbf589ae223b347c0d10b7e97cd1461ef82149a6d7fb144e8b01fcafecad036.scope, where 5bbf589ae223b347c0d10b7e97cd1461ef82149a6d7fb144e8b01fcafecad036 is the ID of the container we started.

Switch to the folder above and view the parameters we set:

[root@host1]# cat cpu.cfs_period_us
100000
[root@host1]# cat cpu.cfs_quota_us 
20000

These values match the parameters we passed when starting the container; in other words, the values in these files are what limit the container's CPU usage. Note that the location of Docker's cgroup files may differ between Linux versions; on some systems they are under /sys/fs/cgroup/cpu/docker/ID/.
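The same check can be done from a program instead of with cat. The sketch below reads a single integer from a cgroup control file such as cpu.cfs_quota_us. The helper name read_cgroup_value is hypothetical, and the caller has to supply the full path, since that depends on the system and the container ID as noted above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a cgroup control file
 * such as cpu.cfs_quota_us or cpu.cfs_period_us. Returns 0 and stores
 * the number in *value on success, -1 if the file cannot be opened or
 * does not start with an integer.
 */
int read_cgroup_value(const char *path, long *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Reading cpu.cfs_quota_us and cpu.cfs_period_us from the container's cgroup directory this way would be expected to yield the 20000 and 100000 shown above.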

Differences from traditional virtual machine technology

After the analysis of process isolation, file system isolation and resource limits, you should have a basic understanding of how Docker isolation works. So how does it differ from traditional virtual machine technology? Here is a picture from the Internet showing the difference between Docker and virtual machines.

This figure clearly shows the difference between virtual machine technology and Docker. A virtual machine virtualizes a complete, separate system, and that guest system handles the application's requests, which costs performance. Docker relies entirely on the Linux kernel's namespace and cgroup features. In essence, an application running in a container is still a normal process on the host and is scheduled directly by the host. There is comparatively little performance loss, which is an important advantage of Docker.

Because a Docker container is still an ordinary process, it is not completely isolated. It shares the kernel of the host, so its isolation level and security are not as high as those of a virtual machine, which is its disadvantage.
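One simple way to observe the shared kernel is to ask the kernel for its release string, as uname -r does. The following sketch uses the POSIX uname() call; the helper name kernel_release is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: copy the running kernel's release string (the
 * value printed by "uname -r") into buf. Returns 0 on success, -1 if
 * uname() fails.
 */
int kernel_release(char *buf, size_t buf_len)
{
    struct utsname info;
    assert(buf != NULL && buf_len > 0);

    if (uname(&info) != 0)
        return -1;
    snprintf(buf, buf_len, "%s", info.release);
    return 0;
}
```

Since a container runs on the host's kernel, this function would be expected to report the same release inside a container as on the host, whereas a virtual machine reports the release of its own guest kernel.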

Conclusion

In this article, I verified in practice how Docker containers implement isolation of processes, the file system and resource limits, and then compared virtual machines with Docker. In general, Docker has a performance advantage because a container is an ordinary host process, while a virtual machine is a fully virtualized system and therefore offers stronger isolation and security; each has its own advantages and disadvantages. Still, containerization is the current trend, and I believe that as the technology matures, the problem of incomplete isolation can be solved and a fully containerized world will no longer be a dream.

References

people.redhat.com/vgoyal/pape…
docs.docker.com/v17.09/engi…
lwn.net/Articles/25…
