The Linux kernel | process management

1. Processes and threads

1.1 Definitions

A process is a program in execution together with its associated resources. It is the smallest unit of resource allocation.

A thread is a sequence of execution within a process and is the smallest unit of CPU scheduling.

In Linux, a process is characterized by four elements:

It has a piece of executable code.
It has a dedicated kernel-space stack.
It has a process descriptor, which records information about the process.
It has its own storage space, that is, a dedicated user address space, which in turn contains a user-space stack.

Linux implements threads in an unusual way. It does not distinguish between a thread and a process; a thread is just a special kind of process. A task with no user space at all is a kernel thread. A task that shares its user space with other tasks is a user thread.

1.2 Main Differences

The process is the basic unit of resource allocation, while the thread is the basic unit of independent execution and scheduling. Because a thread is smaller than a process and owns essentially no system resources of its own, scheduling it costs much less, so concurrency within the system can be increased more cheaply.

Processes and threads are two different ways for the operating system to manage resources. Each process has a separate address space, so in protected mode a crash in one process does not affect other processes. Threads are just different execution paths within one process. Each thread has its own stack and local variables, but threads do not have separate address spaces, so a crash in one thread usually brings down the whole process. A multi-process program is therefore more robust than a multi-threaded one, but switching between processes costs more resources and is less efficient. For concurrent operations that must run simultaneously and share variables, threads are the natural choice, because processes need explicit shared memory to do the same.

Summary: In Linux, the only difference between a process and a thread is whether it has a separate address space.

2. Process descriptor and task structure

The process descriptor is about 1.7 KB in size on a 32-bit machine and holds all the information the kernel needs about an executing process.

The kernel keeps process descriptors in the task list, a circular doubly linked list.

Process descriptor struct task_struct (source | linux/sched.h | v5.4)

struct task_struct {
    volatile long state;         // -1 is unrunnable, 0 is runnable, >0 is stopped
    int lock_depth;              // Lock depth (big kernel lock; present only in older 2.6 kernels)
    unsigned int policy;         // Scheduling policy: FIFO, RR, CFS
    pid_t pid;                   // Process identifier
    struct task_struct *parent;  // Parent process
    struct list_head children;   // List of child processes
    struct list_head sibling;    // Link in the parent's list of children (sibling processes)
};

2.1 Allocating process descriptors

2.1.1 The slab allocator

Linux uses the slab allocator to allocate task_struct structures.

Purpose: Object reuse and cache coloring.

Because the slab allocator creates the task_struct dynamically, only a small struct thread_info is stored on the kernel stack itself: at the bottom of the stack for stacks that grow down, or at the top of the stack for stacks that grow up. In this layout, thread_info holds a pointer to the task_struct.

2.1.2 Process Descriptor Storage

The default maximum PID value is 32768 (the range of a short int; see <linux/threads.h>). The limit can be raised by writing a larger value to /proc/sys/kernel/pid_max.
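The current limit can be read back from the same procfs file. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name read_pid_max is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value stored in
 * /proc/sys/kernel/pid_max, or -1 if the file cannot be read
 * (for example on a system without procfs). */
long read_pid_max(void)
{
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;             /* unexpected file contents */
    fclose(f);
    return value;
}
```

Calling read_pid_max() from main() and printing the result shows the limit currently in effect on the machine.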

The current macro finds the process descriptor of the currently running process.

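On x86 kernels that keep thread_info on the kernel stack, current masks out the 13 least significant bits of the stack pointer to obtain the address of the thread_info structure, and then reads the task_struct pointer stored there.

The current_thread_info() function (assuming an 8 KB kernel stack):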

movl $-8192,%eax
andl %esp,%eax
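The same masking can be written in C. This is only a user-space sketch of the arithmetic, assuming the 8 KB stack size implied by the $-8192 constant above; THREAD_SIZE and thread_info_base are names chosen for this example, and the function works on a plain integer rather than a real stack pointer.

```c
#include <stdint.h>

/* Assumed kernel stack size: 8 KB, matching the $-8192 mask above. */
#define THREAD_SIZE 8192u

/* Round a stack pointer value down to the start of its
 * THREAD_SIZE-aligned stack area, which is where thread_info
 * is stored in this layout. */
uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}
```

For example, a stack pointer value of 0xC0003F40 maps to 0xC0002000, because clearing the low 13 bits removes the offset within the 8 KB stack area.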

2.1.3 Process state

TASK_RUNNING: The process is either executing or waiting in a run queue to execute
TASK_INTERRUPTIBLE: Blocked (sleeping); can be woken early by a signal
TASK_UNINTERRUPTIBLE: Blocked (sleeping); is not woken by signals
__TASK_TRACED: The process is being traced by another process, such as a debugger
__TASK_STOPPED: The process has stopped executing
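The states are stored as bit values in the state member. The sketch below maps them to the one-letter codes that tools such as ps display. The numeric values are local copies of what include/linux/sched.h defines in v5.4 as far as I can tell, so treat them as an assumption; the DEMO_ names and task_state_letter are invented for this example.

```c
/* Local copies of the kernel's state bits (assumed v5.4 values). */
enum {
    DEMO_TASK_RUNNING         = 0x0000,
    DEMO_TASK_INTERRUPTIBLE   = 0x0001,
    DEMO_TASK_UNINTERRUPTIBLE = 0x0002,
    DEMO_TASK_STOPPED         = 0x0004,  /* __TASK_STOPPED */
    DEMO_TASK_TRACED          = 0x0008   /* __TASK_TRACED */
};

/* Translate a state value into the letter shown by ps:
 * R running/runnable, S interruptible sleep, D uninterruptible sleep,
 * T stopped, t traced, ? anything this sketch does not know. */
char task_state_letter(unsigned int state)
{
    if (state == DEMO_TASK_RUNNING)
        return 'R';
    if (state & DEMO_TASK_INTERRUPTIBLE)
        return 'S';
    if (state & DEMO_TASK_UNINTERRUPTIBLE)
        return 'D';
    if (state & DEMO_TASK_STOPPED)
        return 'T';
    if (state & DEMO_TASK_TRACED)
        return 't';
    return '?';
}
```

Note that TASK_RUNNING is zero, so it has to be tested with an equality check rather than a bit mask.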

A process enters kernel execution in one of two ways:

System calls
Exception handlers

2.1.4 Process family tree

The init process

All processes are descendants of the init process, whose PID is 1.
The kernel starts the init process at the end of system boot.

The init process reads the system's initialization scripts and executes other programs, eventually completing the system startup.

The task_struct records parent and child processes

parent: a pointer to the parent process
children: a linked list of the child processes
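The parent pointer and children list can be modeled in user space. The following is a simplified sketch, not kernel code: struct task, struct list_node and all function names are invented here, and the root task has a NULL parent purely to keep the example short.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node, in the spirit of list_head. */
struct list_node { struct list_node *next, *prev; };

struct task {
    int pid;
    struct task *parent;        /* NULL for the root in this sketch */
    struct list_node children;  /* head of this task's list of children */
    struct list_node sibling;   /* link in the parent's children list */
};

/* Recover the enclosing task from its embedded sibling node. */
#define task_of(node) \
    ((struct task *)((char *)(node) - offsetof(struct task, sibling)))

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Initialize a task and link it into its parent's children list. */
void task_init(struct task *t, int pid, struct task *parent)
{
    t->pid = pid;
    t->parent = parent;
    list_init(&t->children);
    list_init(&t->sibling);
    if (parent != NULL)
        list_add_tail(&t->sibling, &parent->children);
}

/* Walk the circular children list and count its entries. */
int count_children(const struct task *t)
{
    int n = 0;
    const struct list_node *pos;

    for (pos = t->children.next; pos != &t->children; pos = pos->next)
        n++;
    return n;
}

/* PID of the first child, or -1 if the task has no children. */
int first_child_pid(struct task *t)
{
    if (t->children.next == &t->children)
        return -1;
    return task_of(t->children.next)->pid;
}

/* Follow parent pointers up to the root of the family tree. */
int root_pid(const struct task *t)
{
    while (t->parent != NULL)
        t = t->parent;
    return t->pid;
}
```

Embedding the list node inside the task, rather than storing task pointers in separate list cells, is the same design the kernel uses with struct list_head.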

3. Process creation

Many other operating systems provide a spawn mechanism that creates a new process in a new address space, reads in an executable, and then begins executing it.

UNIX splits this mechanism into two steps: fork() and exec().

fork() creates a child process by copying the current process.
exec() loads an executable into the address space and begins executing it.
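The two steps can be seen together in a small user-space sketch. It assumes a POSIX system; run_program is a name made up for this example, and the value 127 for a failed exec is a shell convention rather than something the text above specifies.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with arguments `argv` in a child process.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();          /* step 1: copy the current process */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);       /* step 2: load the new executable */
        _exit(127);              /* reached only if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing "/bin/sh" with the arguments {"sh", "-c", "exit 7", NULL} should make run_program() return 7 on a system where /bin/sh exists.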

3.1 Copy-on-write

Copy-on-write (COW) defers the copying of a page in the address space until a write actually occurs.

How it works: After fork(), the parent and child share the same pages, which are marked read-only. If either process tries to modify a shared page, a page fault is generated. The kernel handles the page fault by transparently making a copy of the page for the writing process. The page’s COW attribute is then cleared, indicating that it is no longer being shared.
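Copy-on-write itself is invisible to user space, but its effect can be observed: a write in the child never changes the parent's copy. The sketch below, assuming a POSIX system, demonstrates only that isolation; it cannot show when the page is actually copied. The names demo_value and parent_value_after_child_write are invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* volatile so the compiler really reads and writes memory */
static volatile int demo_value = 1;

/* Fork, let the child overwrite demo_value, and return the value the
 * parent still sees afterwards (or -1 on error). */
int parent_value_after_child_write(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        demo_value = 2;          /* the write lands in the child's private copy */
        _exit(demo_value == 2 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return demo_value;           /* the parent's page is unchanged */
}
```

The parent gets back 1 even though the child stored 2, which is the behavior copy-on-write has to preserve while avoiding an up-front copy of every page.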

3.2 The fork() function

The actual cost of fork() is copying the parent’s page table and creating a unique process descriptor for the child process.

In the current Linux kernel, fork() is actually implemented through the same code path as the clone() system call.

3.2.1 The copy_process() function

dup_task_struct() creates a new kernel stack, thread_info structure, and task_struct for the new process, with values identical to those of the current process. At this point the parent and child process descriptors are identical. (Allocate space)
Check that creating the new process does not push the number of processes owned by the current user beyond that user's resource limit. (Check limits)
The child is differentiated from its parent. Many members of the process descriptor are cleared or set to initial values. The members that are not inherited are mainly statistics; most of the data in the task_struct remains unchanged. (Initialize the child)
The state of the child is set to TASK_UNINTERRUPTIBLE to ensure that it does not run yet. (Set the child's state)
copy_process() calls copy_flags() to update the flags member of the task_struct. (Set flag bits)
- The PF_SUPERPRIV flag, which indicates whether the process has used superuser privileges, is cleared.
- The PF_FORKNOEXEC flag, which indicates that the process has not called exec(), is set.
alloc_pid() is called to assign a valid PID to the new process. (Assign a PID)
Depending on the flags passed to clone(), copy_process() either copies or shares open files, file system information, signal handlers, the process address space, namespaces, and so on. These resources are typically shared between the threads of a given process; otherwise they are unique to each process and are therefore copied here. (Copy or share resources)
copy_process() cleans up and returns a pointer to the new child to its caller, do_fork(). If copy_process() succeeded, the newly created child is woken up and run. (Return and wake the child)

Note: The kernel deliberately tries to run the child first, although this is not guaranteed. The child usually calls exec() right away, so running it first avoids the copy-on-write overhead that would occur if the parent ran first and began writing to the address space.

3.3 The vfork() function

The difference between vfork() and fork(): vfork() does not copy the page table entries of the parent process.

With vfork(), the child runs as the sole thread in the parent's address space. The parent is blocked until the child exits or calls exec(), and the child must not write to the address space.
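A typical use of vfork() is to call exec() immediately in the child. The sketch below assumes a Linux or BSD-like system where vfork() and /bin/sh are available; run_shell_with_vfork is a name chosen for this example. The child touches no variables and calls only execl() or _exit(), in line with the restriction described above.

```c
#define _DEFAULT_SOURCE          /* expose vfork() in <unistd.h> on glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `command` through /bin/sh using vfork() and return its exit
 * status, or -1 on error or abnormal termination. */
int run_shell_with_vfork(const char *command)
{
    int status;
    pid_t pid = vfork();         /* parent is suspended until exec or exit */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: borrows the parent's address space, so it only
         * execs or exits and modifies nothing. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);              /* reached only if execl() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that it does not flush or alter the stdio state it shares with the suspended parent.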

4. Thread creation

Thread creation is essentially the same as process creation, except that the flags passed to clone() specify which resources are shared.

Create a thread

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
// CLONE_VM: share the address space
// CLONE_FS: share file system information
// CLONE_FILES: share open files
// CLONE_SIGHAND: share signal handlers and blocked signals

Create a process (equivalent to fork())

clone(SIGCHLD,0);

Create a process (equivalent to vfork())

clone(CLONE_VFORK | CLONE_VM | SIGCHLD, 0);
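The calls above use the kernel-style form clone(flags, 0). From user space, glibc exposes a wrapper with a different signature that takes a function and a stack for the child. The following sketch, assuming Linux with glibc, uses that wrapper to show the effect of CLONE_VM; value_after_clone, child_fn and CHILD_STACK_SIZE are names invented for this example.

```c
#define _GNU_SOURCE              /* expose clone() and CLONE_* in <sched.h> */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Entry point of the child: store 42 through the pointer it was given. */
static int child_fn(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Create a child with `extra_flags | SIGCHLD`, wait for it, and return
 * the value the parent sees afterwards (or -1 on error).
 * With CLONE_VM the child shares the parent's memory, so its store is
 * visible; without it the child writes to its own copy. */
int value_after_clone(int extra_flags)
{
    volatile int value = 1;
    int status;
    pid_t pid;
    char *stack = malloc(CHILD_STACK_SIZE);

    if (stack == NULL)
        return -1;
    /* The stack grows down on common architectures, so pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                extra_flags | SIGCHLD, (void *)&value);
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    return value;
}
```

value_after_clone(CLONE_VM) should return 42, because the child behaves like a thread and writes into shared memory, while value_after_clone(0) should return 1, because the child behaves like a forked process with its own address space.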

4.1 Kernel Threads

Kernel threads execute only in kernel space and never switch to user space.

The difference between a kernel thread and a normal process is that a kernel thread does not have an address space of its own. (The mm pointer in its task_struct is NULL.)

Kernel threads can be created only by other kernel threads. The kernel spawns all new kernel threads from the kthreadd kernel thread, so kthreadd is the ancestor of all kernel threads.

4.1.1 The kthreadd kernel thread

The kthreadd kernel thread is created when the kernel is initialized. The kthreadd function runs in a loop and is used to create and manage other kernel threads.

The kthreadd function processes the creation requests queued on the global linked list kthread_create_list. Calling kthread_create adds a request to kthread_create_list and wakes up kthreadd_task. For each request, kthreadd calls the older interface kernel_thread, which starts a kernel thread running the function named “kthread”; that function in turn runs the requested thread function. Handled requests are removed from kthread_create_list, and when the list is empty kthreadd calls the scheduler to give up the CPU. The kthreadd thread itself never exits.

Create a kernel thread without running it

The kthread_create function (source | linux/kthread.h | v5.4) creates a kernel thread through the kernel's clone mechanism, but the newly created thread is not yet running.

kthread_create(threadfn, data, namefmt, arg...)

Create a kernel thread and run it

The kthread_run function (source | linux/kthread.h | v5.4) creates a kernel thread by calling kthread_create and then calls wake_up_process() to wake it up.

#define kthread_run(threadfn, data, namefmt, ...)                  \
({                                                                 \
    struct task_struct *__k                                        \
        = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
    if (!IS_ERR(__k))                                              \
        wake_up_process(__k);                                      \
    __k;                                                           \
})

Stop a kernel thread

int kthread_stop(struct task_struct *k);

5. Process termination

When a process terminates, the kernel releases the resources it holds and notifies its parent process.

In general, process destruction is self-induced: it occurs when the process invokes the exit() system call.

A process can call exit() explicitly, or implicitly by returning from the program’s main function. (The C compiler places a call to exit() after the point where main() returns.)

Most of the work of terminating a task is handled by do_exit() (defined in kernel/exit.c).

5.1 The do_exit() function

Set the PF_EXITING flag in the flags member of the task_struct.
Call del_timer_sync() to remove any kernel timers. When it returns, no timer is queued and no timer handler is running.
If BSD process accounting is enabled, do_exit() calls acct_update_integrals() to write out accounting information.
Call exit_mm() to release the mm_struct held by the process. If no other process is using this address space (that is, the address space is not shared), the kernel frees it completely.
Call exit_sem(). If the process is queued waiting for an IPC semaphore, it is removed from the queue.
Call exit_files() and exit_fs() to decrement the reference counts of the file descriptors and the file system data, respectively. If either reference count drops to zero, no process is using that resource any longer, and it is freed at this point.
Set the exit code of the task, stored in the exit_code member of the task_struct, to the code provided by exit() or by whatever kernel mechanism forced the termination. The exit code is stored here so that the parent process can retrieve it later.
Call exit_notify() to signal the parent process and set the exit_state stored in the task_struct to EXIT_ZOMBIE.
do_exit() calls schedule() to switch to a new process. Because a process in the EXIT_ZOMBIE state is never scheduled again, this is the last code that the process executes, and do_exit() never returns.

5.2 The wait() family of functions

The wait family of functions is implemented through a single but complicated system call, wait4(). By default it suspends the calling process until one of its children exits, at which point the function returns the PID of that child. In addition, on return the status pointer supplied by the caller holds the exit code of the terminated child process.
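A short user-space sketch of this behavior follows, assuming Linux or a BSD-like system where wait4() is available; reap_child_exit_code is a name made up for this example. The child exits at once, which leaves it in the zombie state until the parent collects its exit code.

```c
#define _DEFAULT_SOURCE          /* expose wait4() in <sys/wait.h> on glibc */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, reap it with wait4(), and
 * return the exit code the parent retrieved (or -1 on error). */
int reap_child_exit_code(int code)
{
    int status;
    struct rusage usage;         /* resource usage of the reaped child */
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);             /* child becomes a zombie until reaped */

    /* Block until this particular child has terminated. */
    if (wait4(child, &status, 0, &usage) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The status word packs more than the exit code, which is why the WIFEXITED and WEXITSTATUS macros are used to decode it instead of reading the integer directly.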

Author: The world is the most beautiful
