Learning Golang had become a bit boring, so when I found a book called Diy Docker, the title caught my eye and I decided to work through it right away.

"From scratch" here assumes a certain understanding of containers and a little Golang.

  1. Use GoLang to write a Docker from scratch — Diy Docker Reading Notes
  2. Use GoLang to write a Docker from scratch – Diy Docker Reading Notes
  3. Write a Docker from scratch using GoLang — Diy Docker Reading Notes
  4. Write a Docker from scratch using GoLang. — “Write Your Own Docker” book notes

1. Basic concepts

I won't explain containers or Golang here; I covered both in my previous article. Instead, this section highlights a few Linux concepts.

1.1 Linux Namespace

First, take a look at namespaces

1.1.1 The Linux Kernel

Before you learn about namespaces, learn about the kernel. In Linux, the kernel has four main jobs:

  1. Memory management
  2. Process management
  3. Device driver management
  4. System call handling and security protection

The kernel is the layer between hardware and software that manages and schedules the resources listed above.

1.1.2 Namespace

A Namespace is used to isolate a particular kind of resource. This article covers six types of Namespace in Linux, each of which isolates a different resource. (Newer kernels add more, such as the cgroup and time namespaces.)

1.1.2.1 UTS Namespace

The UTS Namespace is used to isolate the nodename and the domainname. Each UTS Namespace can have its own hostname, and changing the hostname inside one UTS Namespace does not affect other namespaces.

1.1.2.2 IPC Namespace

The IPC Namespace is used to isolate System V IPC and POSIX Message Queues.

1.1.2.3 PID Namespace

As the name suggests, the PID Namespace isolates process IDs. The same process has different PIDs in different namespaces. The first process in a newly created PID Namespace has PID 1, while on the host that same process is seen under an ordinary, usually much larger, PID.
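One way to observe this mapping is the NSpid line in /proc/<pid>/status, which lists the PID of a process at each namespace level it belongs to. The sketch below is only an illustration, assuming a kernel new enough to expose that field; the helper name parseNSpid is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid (hypothetical helper) extracts the PIDs on the "NSpid:" line
// of a /proc/<pid>/status file. The leftmost value is the PID seen from the
// outermost visible namespace; the rightmost is the PID inside the
// process's own namespace.
func parseNSpid(status string) ([]int, error) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "NSpid:"))
		pids := make([]int, 0, len(fields))
		for _, f := range fields {
			pid, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("bad pid %q: %w", f, err)
			}
			pids = append(pids, pid)
		}
		return pids, nil
	}
	return nil, fmt.Errorf("no NSpid line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	pids, err := parseNSpid(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("PID at each namespace level:", pids)
}
```

Run on the host this prints a single PID; run inside a nested PID Namespace it should print one value per level, ending with the PID the process sees for itself.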

1.1.2.4 Mount Namespace

The Mount Namespace isolates the set of mount points that a process sees. Because it was the first namespace to be implemented, and nobody expected others to follow, it is simply called NS in many places, especially in flag names such as CLONE_NEWNS. Processes in different Mount Namespaces see different file system trees, and a mount performed inside one namespace does not affect the outside (as long as mount propagation is private). Combined with a change of root directory, this lets a container treat a mounted directory as the root of its own file system tree. Volumes in Docker also rely on this feature.

1.1.2.5 User Namespace

The User Namespace isolates user IDs and group IDs.

1.1.2.6 Network Namespace

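Each Network Namespace has its own network devices, IP addresses, routing tables, and port space. Because of this, processes in different namespaces can listen on the same port number, and each of those ports can be mapped to a different port on the host.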

1.2 Linux Cgroups

Namespaces give each container its own space. But how do you make sure these spaces don't compete with each other for resources? This is where Cgroups come in.

1.2.1 What are Cgroups

Cgroups can restrict and monitor the resources of a group of processes and their children.

1.2.1.1 Three components in Cgroups

  1. Cgroup: as the name suggests, a group that contains a set of processes. Subsystem parameters can be configured on a cgroup, which associates the group with those subsystems.
  2. Subsystem: a module that controls one kind of resource.
  3. Hierarchy: strings a set of cgroups into a tree structure, which provides inheritance.

1.2.1.2 Association of the three components

Linux imposes some rules:

  1. A newly created hierarchy has a root cgroup node that contains all processes. Every node created later in the hierarchy is a descendant of that root node.
  2. A subsystem can only be attached to one hierarchy.
  3. But a single subsystem can apply to multiple cgroups in the same hierarchy.
  4. A hierarchy can have more than one subsystem attached.
  5. A process can reside in multiple cgroups, but those cgroups must be in different hierarchies.
  6. When a process forks a child process, the child starts in the same cgroup as its parent.

1.2.1.3 The relationship between Cgroup, Subsystem and Hierarchy that even a monkey can understand

It’s actually a little bit easier to understand:

  1. A hierarchy is a tree made up of multiple cgroups. Every hierarchy, once established, contains all Linux processes, and "all" really means all: each hierarchy holds the same set of processes. Different hierarchies are just different ways of grouping them, which is why a process can exist in multiple hierarchies. More precisely, a process resides in every hierarchy at the same time; what differs is which cgroup it sits in within each one.

  2. In Linux there is only one instance of each subsystem, not one per hierarchy. This means that if the memory subsystem is attached to one hierarchy, no other hierarchy can use it.

  3. A subsystem is a type of resource controller. There are many subsystems, each controlling a different resource, and subsystems are associated with cgroups. When you create a cgroup directory, a set of subsystem configuration files is generated in it automatically. Those files are only configuration, not the subsystem itself. The relationship is like that between a .git folder and Git: you can download someone else's .git folder without having Git installed, but it does nothing on its own. Likewise, creating a new cgroup generates its configuration files, but that alone does not restrict anything. Only when you edit a cgroup's configuration file for a given resource does the corresponding subsystem start limiting that resource for the cgroup.

  4. Assuming Linux has 12 subsystems, you can create at most 12 hierarchies with a subsystem attached, one subsystem per hierarchy. (You can of course create more hierarchies without any subsystem, in which case the cgroups are pure groupings.) If you attach more than one subsystem to a hierarchy, fewer hierarchies can be created.

  5. A subsystem takes effect on cgroups, not on the hierarchy as such, but you will often see people talk about attaching a subsystem to a hierarchy. What they usually mean is associating it with one or more cgroups in that hierarchy.
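The points above can be checked against /proc/self/cgroup, which lists one line per hierarchy in the form hierarchy-ID:subsystem-list:cgroup-path. The parser below is a small sketch for reading those lines; the type cgroupEntry and the function parseCgroupLine are names invented for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cgroupEntry (hypothetical type) is one line of /proc/<pid>/cgroup:
// the process's cgroup within a single hierarchy.
type cgroupEntry struct {
	HierarchyID int
	Subsystems  []string // empty for a hierarchy with no subsystem attached
	Path        string
}

// parseCgroupLine splits "hierarchy-ID:subsystem-list:cgroup-path".
func parseCgroupLine(line string) (cgroupEntry, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return cgroupEntry{}, fmt.Errorf("malformed line %q", line)
	}
	id, err := strconv.Atoi(parts[0])
	if err != nil {
		return cgroupEntry{}, fmt.Errorf("bad hierarchy id in %q: %w", line, err)
	}
	var subs []string
	if parts[1] != "" {
		subs = strings.Split(parts[1], ",")
	}
	return cgroupEntry{HierarchyID: id, Subsystems: subs, Path: parts[2]}, nil
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read cgroup file:", err)
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		e, err := parseCgroupLine(line)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("hierarchy %d, subsystems %v, cgroup %s\n",
			e.HierarchyID, e.Subsystems, e.Path)
	}
}
```

Each printed line is one hierarchy, so the output shows the same process sitting in exactly one cgroup per hierarchy.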

1.2.2 Kernel Interface of Cgroups

The kernel interface is the API that Linux exposes for controlling cgroups. It takes the form of a virtual file system.

  1. A hierarchy is created by mounting it on a directory. First create a new directory:
    mkdir hierarchy-test
  2. Then mount:
    sudo mount -t cgroup -o none,name=hierarchy-test hierarchy-test ./hierarchy-test
  3. Then look at this directory and you'll find a set of files in it. These files are the configuration of the root cgroup.
  4. Then create a new empty directory under this directory using mkdir. The new directory automatically contains its own cgroup configuration files: it has become a child cgroup of the root cgroup node.
  5. Add and move processes between cgroups: all processes in the system start out in the root node. You can move a process as needed:
    • Simply write the process ID to the tasks file of the corresponding Cgroup.
      sudo sh -c "echo $$ >> tasks"

      This command adds the current terminal process to the tasks file in the current cgroup directory.

  6. Use a subsystem to restrict the resources of the processes in the cgroup:
    • The problem with the hierarchy created above is that it is not attached to any subsystem, so it cannot limit resources.
    • However, the system already creates a default hierarchy for each subsystem, so a process can be limited by editing the configuration files in that hierarchy.

1.2.3 How does Docker use Cgroups

Now for the important part. Docker creates a cgroup for each container and then limits the resources of that cgroup, which in turn limits the resources of the container.

1.3 Demo

Here is a Demo:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"path"
	"strconv"
	"syscall"
)

// Mount point of the hierarchy that has the memory subsystem attached (cgroup v1).
const cgroupMemoryHierarchyMount = "/sys/fs/cgroup/memory"

func main() {
	// This branch runs the second time, after the program has re-executed itself.
	// It can be thought of as a simple container: the process is isolated,
	// and the PID printed here is 1 because of the PID namespace.
	if os.Args[0] == "/proc/self/exe" {
		fmt.Printf("current pid %d\n", syscall.Getpid())
		cmd := exec.Command("sh", "-c", `stress --vm-bytes 200m --vm-keep -m 1`)
		cmd.SysProcAttr = &syscall.SysProcAttr{}
		cmd.Stdin = os.Stdin
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Println(err)
			os.Exit(1)
		}
		// Do not fall through into the parent logic below.
		return
	}

	// This part runs the first time.
	// The command is the current program itself, so cmd.Start() runs it again.
	cmd := exec.Command("/proc/self/exe")
	// Configure cmd before starting it; the settings apply to the second run.
	// Create new namespaces for the child.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// Start is used because the child's PID is printed afterwards.
	// With Run, the else branch would never execute because stress never ends.
	if err := cmd.Start(); err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	} else {
		// Print the PID of the new process as seen from the host.
		fmt.Printf("%v\n", cmd.Process.Pid)

		// Linux has already created a hierarchy for the memory subsystem,
		// so it is enough to create a child cgroup in it.
		os.Mkdir(path.Join(
			cgroupMemoryHierarchyMount,
			"testmemorylimit"), 0755)

		// Place the container process in this cgroup.
		os.WriteFile(path.Join(
			cgroupMemoryHierarchyMount,
			"testmemorylimit", "tasks"), []byte(strconv.Itoa(cmd.Process.Pid)), 0644)

		// Limit the memory of the stress process through this cgroup.
		os.WriteFile(path.Join(
			cgroupMemoryHierarchyMount,
			"testmemorylimit", "memory.limit_in_bytes"), []byte("100m"), 0644)

		// cmd.Start() does not wait for the process to end, so wait manually.
		// Otherwise the child would be terminated when the main process exits.
		cmd.Process.Wait()
	}
}
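The demo writes the limit as "100m". The kernel accepts the suffixes k, m and g (upper or lower case) for this file and converts them to bytes itself. The helper below only mirrors that conversion in a simplified form, for example to compare a configured limit with a byte count read back from the cgroup; parseMemoryLimit is a hypothetical name and not part of the book's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMemoryLimit (hypothetical helper) converts a value such as "100m"
// into bytes. It handles only the k, m and g suffixes, using powers of 1024.
func parseMemoryLimit(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("empty limit")
	}
	mult := int64(1)
	num := s
	switch s[len(s)-1] {
	case 'k', 'K':
		mult, num = 1<<10, s[:len(s)-1]
	case 'm', 'M':
		mult, num = 1<<20, s[:len(s)-1]
	case 'g', 'G':
		mult, num = 1<<30, s[:len(s)-1]
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("bad limit %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	n, err := parseMemoryLimit("100m")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(n) // prints 104857600
}
```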

1.4 UFS

UFS (Union File System) was already introduced in the previous article, so this is a review. This section also answers the question of how UFS handles modifying and deleting files; see 1.4.5 of this article.

1.4.1 UFS concept

The idea behind UFS is much like Git, and UFS itself is used on Linux, FreeBSD, and NetBSD. A change to a file is written to a new file instead of modifying the old one.

1.4.2 AUFS

AUFS is a rewritten and improved version of UFS (UnionFS). In fact, some of the features AUFS introduced were later incorporated into UnionFS 2.x.

1.4.3 Docker and AUFS

Docker used AUFS in its early days, and as I write this it is still available as an optional storage driver.

1.4.4 image layer

An image consists of multiple read-only layers. As I write this, the default storage driver has become overlay2, and you can find the overlay2 folder in /var/lib/docker. Each subdirectory in this folder is a layer. When a container is started, an init layer, also read-only, is added on top of the image to store the container's environment configuration. On top of that, Docker creates a read-write layer that receives all write operations.

This Read-write layer remains when the container is stopped, and is deleted only when the container is deleted.
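The layering can be pictured with a toy in-memory model: reads look through the layers from top to bottom, and writes only ever land in the read-write layer. This is a conceptual sketch, not how overlay2 or AUFS is implemented; the types layer and unionFS are invented for the illustration.

```go
package main

import "fmt"

// layer is a simplified stand-in for a directory: file name -> content.
type layer map[string]string

// unionFS (hypothetical type) stacks read-only layers under one
// read-write layer.
type unionFS struct {
	readOnly  []layer // ordered bottom to top
	readWrite layer
}

func newUnionFS(readOnly ...layer) *unionFS {
	return &unionFS{readOnly: readOnly, readWrite: layer{}}
}

// read returns the topmost version of a file.
func (u *unionFS) read(name string) (string, bool) {
	if v, ok := u.readWrite[name]; ok {
		return v, true
	}
	for i := len(u.readOnly) - 1; i >= 0; i-- {
		if v, ok := u.readOnly[i][name]; ok {
			return v, true
		}
	}
	return "", false
}

// write never touches the read-only layers.
func (u *unionFS) write(name, content string) {
	u.readWrite[name] = content
}

func main() {
	base := layer{"os-release": "base image"}
	app := layer{"app.conf": "debug=false"}
	fs := newUnionFS(base, app)

	fs.write("app.conf", "debug=true")
	v, _ := fs.read("app.conf")
	fmt.Println("container sees:", v)
	fmt.Println("image layer still has:", app["app.conf"])
}
```

Dropping the readWrite map corresponds to deleting the container: the image layers are left exactly as they were.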

1.4.5 How does AUFS delete old files if it never modifies them?

After reading the above, this question comes to mind naturally, and this time there is an answer: Docker creates a .wh.<filename> file (a whiteout file) in the read-write layer to hide the file that is to be deleted.

1.4.6 Implementing an AUFS (Full Linux Operation)

  1. First create a folder structure like this:

    aufs
    |
    | -- mnt
    |
    | -- container-layer
    |                  | -- container-layer.txt << "I am container layer"
    |
    | -- image-layer1
    |               | -- image-layer1.txt << "I am image layer1"
    |
    | -- image-layer2
    |               | -- image-layer2.txt << "I am image layer2"
    |
    | -- image-layer3
    |               | -- image-layer3.txt << "I am image layer3"
    |
    | -- image-layer4
    |               | -- image-layer4.txt << "I am image layer4"
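  2. Then mount the layers onto the mnt folder with mount -t aufs, listing the layer directories in the dirs option:

    sudo mount -t aufs -o dirs=./container-layer:./image-layer4:./image-layer3:./image-layer2:./image-layer1 none ./mnt

    By default, the first directory on the left of the dirs list is mounted read-write, and all the other directories are read-only.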

  3. You can run cat /sys/fs/aufs/si_XXXXXXXX/* to view the permissions of the directories in the aufs mount.

  4. Then modify the image-layer1.txt file in the mnt folder by adding a line of text at the end.

  5. If you look at image-layer1/image-layer1.txt, it has not changed.

    • However, the container-layer folder now contains an extra image-layer1.txt file.
    • This file holds both the original content and the newly added line.

  6. In other words, when you modify a file that belongs to a read-only layer, the layer itself is not changed. The file is copied to container-layer first, and the copy is then modified.