preface

Hi, dear friends, we meet again. There have been some changes in my work and life this month. Now I am making positive adjustments to adapt to the changes. I will make more efforts to maintain the previous pace of study and Posting. Without further ado, today we are going to talk about processes and process scheduling in the Linux kernel. When I learned the operating system before, ALTHOUGH I knew some basic design ideas of the operating system, I still didn’t have a good understanding of the practical application of Windows and Linux, especially in the crucial aspect of process scheduling, and I didn’t understand the specific differences between the two operating systems. Android is based on the Linux kernel. It is always good to know a little kernel knowledge. This is the first article in a series on the Linux kernel, and more will follow.

Of course, the Linux kernel is extensive and profound, and I only get a glimpse of it by reading source code and books. If there is anything wrong, welcome friends to correct. This article is based on Linux 2.6.25, in this version of Linux has been added to the CFS scheduling policy, while the amount of code is not very large, more suitable for reading. The source code download address: mirrors.edge.kernel.org/pub/linux/k…

For those of you who may not find it useful to read this, you can read this article first. Crash capture mechanism and implementation of Android Native code

It will take about 30 minutes to read this article and you will learn:

  • Process basics in Linux
  • Process descriptors and assignments
  • Process status and changes
  • Process family tree
  • Kernel mode and user mode
  • Process creation
  • Thread and process connections and differences

Weak weak ask for a point of praise and attention, give small stupid bird a little writing motivation. Android’s Journey to the Dumb Bird. Stay tuned for more technical advisory articles.

1. Processes and threads

1.1 Basic Elements

It may not be easy to give a clear definition of a process, but it is generally accepted that a process is a general term for a running program and related resources, with some elements:

  • Have a piece of executable program code. It’s like a play needs a script. Code snippets can be shared by multiple processes, just as a script can be used for many shows.
  • Has a system stack space dedicated to processes. Can be considered the “private property” of this process and, accordingly, must have a system space stack
  • In the system, there are process control blocks (or process descriptors, in both cases) that describe information about the process. Think of it as the “account” of the process. The system uses this control block to control the related behavior of the process
  • There is a separate storage space, that is, a dedicated user space, and a corresponding user space stack.

All four are necessary for progress. A process usually contains one or more threads of execution. Threads are objects that are active in the process and are the basic object of kernel scheduling. You may have a question, the process can have independent running procedures and resources, has been able to complete the relevant tasks ah, why the introduction of the concept of threads, and the thread as the kernel scheduling of the basic object?

The reason for the introduction of a process is that there may be multiple different tasks in the same process. These tasks need to share the data of the process, but the operation data of these tasks are independent to a certain extent. Therefore, multiple tasks do not need to be executed according to the time sequence, so the concept of thread comes into being. Take Office Word to write articles as an example, the operation of Office Word program is a process, but the process can only run it, and Word also detects the movement of your cursor, error correction and other related functions, so the cursor movement and error correction is different threads, need CPU according to different strategies to schedule. As a result, threads were introduced as basic objects for kernel scheduling

Linux is very special about thread implementation, it does not distinguish between thread and process, thread is just a special kind of process. From the point of view of the above four elements, have the first three and lack of the fourth element is the thread, if there is no fourth user space, that is the system thread, if it is shared user space, that is the user thread.

Since threads are just special processes, we’ll focus on processes and finally explain the difference between threads and processes under Linux.

1.2 Process descriptor

The kernel stores the list of processes in a two-way circular linked list called a task queue. Each item in the list is of type task_struct, called a process descriptor, and is defined in < Linux /sched.h>. Process descriptors contain all information about a process, including quite a lot of data, such as process status, open files, pending signals, parent and child processes, etc., so they are relatively large.

Some of the more important attributes are defined below.

struct task_struct {
	volatile long state;	// -1 indicates that it cannot run, 0 indicates that it can run, and >0 indicates that it is interrupted
	int lock_depth;		// Lock depth
    unsigned int policy; // Scheduling policy: FIFO, RR, CFS
	pid_t pid;   // Process identifier, used to represent a process
	struct task_struct *parent;	/ / the parent process
	struct list_head children;	/ / the child process
	struct list_head sibling;   // Sibling process
}
Copy the code

Since version 2.6, Linux has used the slab allocator to dynamically generate task_structs by creating a new structure, struct thread_info(the end of the stack object), either at the bottom of the stack (the downward growing stack) or at the top (the upward growing stack). You can think of a slab allocator as an optimization strategy for allocating and releasing data structures. By pre-allocating and reusing task_struct, you can avoid the resource consumption associated with dynamic allocation and release. Therefore, rapid process creation is an important feature of Linux systems.

The slab allocator divides different object types into different cache groups, such as one cache for the process descriptor task_struct and another for the inode. These caches are in turn divided into slabs, which consist of one or more physically contiguous pages. When applying for a data structure, such as a task_struct, we apply for it from a half-full slab(slabs_partial). If there is none, we apply for it from an empty slab(slabs_empty). Until all slabs are slabs_full. If there’s no empty slab left, then apply for a new empty slab. This strategy reduces the amount of memory fragmentation frequently requested and freed by data structures, and is quickly allocated and freed thanks to caching.

Let’s continue with the thread_info structure. Defined in < the asm/thread_info. H >

struct thread_info {
	struct task_struct	*task;		/* main task structure */
	struct exec_domain	*exec_domain;	/* execution domain */
	unsigned long		flags;		/* low level flags */
	__u32			cpu;
	int			preempt_count; /* 0 => preemptable, <0 => BUG */
	mm_segment_t		addr_limit;	/* thread address space */
	struct restart_block	restart_block;
};
Copy the code

In the kernel, all processes need to obtain the pointer to the task_struct process descriptor, so the speed of obtaining the task_struct is particularly important. Some hardware systems will provide special registers to store the pointer to the current task_struct process. Some systems that don’t have enough registers have to create thread_info structures at the end of the stack, which can be evaluated indirectly.

1.3 Process Status

As we said earlier, the process descriptor describes the current state of the process and contains all information about the process. The state field in the process descriptor describes the current state of the process. We can in the kernel/include/Linux/sched. H > found in the definition of process state values. Different system versions may vary. The following states are commonly used:

#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define __TASK_STOPPED		4
#define __TASK_TRACED		8
/* in tsk->exit_state */
#define EXIT_ZOMBIE		16
#define TASK_DEAD		64
Copy the code

Every process in the system must be in one of seven process states:

  • TASK_RUNNING: The process is executable. It may be executing or in a run queue waiting to be executed. That is, if it is executable, it is in TASK_RUNNING state regardless of whether it is executed or not. Multiple processes may be in an executable state at the same time, and they are placed in a run queue waiting for the process scheduler to schedule them.

  • TASK_INTERRUPTIBLE: The process is blocking, waiting for some condition to be reached. When these conditions are met, the kernel sets the process state to run, and the process in this state wakes up early and is ready to run when it receives a signal. Using the ps command, you can see that most of the system is asleep.

  • TASK_UNINTERRUPTIBLE: Does not wake up or get ready to run even if a signal is received. Less used than interruptible states. Uninterruptible does not mean that the CPU does not respond to interrupts from external hardware, but that the process does not respond to asynchronous signals. The TASK_UNINTERRUPTIBLE state exists because certain kernel processes cannot be interrupted. For example, when the kernel interacts with hardware, this state may be required to prevent the interaction between the process and the device from being interrupted, causing the device to fall into an uncontrollable state.

  • __TASK_TRACED: A process that is traced by other processes, such as debuggers, through ptrace. When we use breakpoints in our daily development, we will find that the process stays at the same location as our breakpoint. This process is __TASK_TRACED. A process in this state cannot continue to return to TASK_RUNNING until the debugger performs PTRACE_CONT, PTRACE_DETACH, and so on through the ptrace system call.

  • __TASK_STOPPED: The process has stopped executing. It’s not up and running and it can’t be up and running. It usually occurs when receiving signals such as SIGSTOP, SIGTSTP, SIGTTIN, and SIGTTOU. This is similar to the __TASK_TRACED state. Sending a SIGCONT signal to a TASK_STOPPED process can restore it to the TASK_RUNNING state.

  • TASK_DEAD: Indicates that the process exits and is about to be destroyed.

  • EXIT_ZOMBIE/TASK_ZOMBIE: The process has ended but the task_struct process control block has not been logged out. This state needs to be seen in conjunction with the TASK_DEAD state above, where the process is in the TASK_DEAD state during exit. During this exit, all resources held by the process are reclaimed, but the parent process is likely to care about some information about the process, so the task_struct structure carrying that information is not destroyed.

The switching relationship between states is shown as follows:

It can be seen that although there are many states, the direction of change is always two:

  • TASK_RUNNING -> Non-task_running state
  • Non-task_running -> TASK_RUNNING state

That is, even if a process is killed in TASK_INTERRUPTIBLE state, it needs to wake up and enter TASK_RUNNING state before responding to the kill signal to enter TASK_DEAD state.

1.4 Process family tree

Processes in Linux have an obvious inheritance relationship. All processes are descendants of the init process with PID 1. The kernel starts the init process at the end of the boot process, which reads the initialization script and completes the boot process.

Each process in the system must have a parent process, and each process can have zero or more child processes, so each process can also have more than one sibling process. The relationship between processes is stored in the task_struct process descriptor. We saw in Section 1.2 that task_struct has three attributes: parent, child, and sibling lists.

	struct task_struct *parent;	/ / the parent process
	struct list_head children;	/ / the child process
	struct list_head sibling;   // Sibling process
Copy the code

Because of this inheritance system, we can look up any given process by pointing to any other process.

1.5 Kernel mode and user mode

As we all know, the Linux kernel is actually a special software program. What’s special about it? Controls the hardware resources of a computer, such as coordinating CPU resources, allocating memory resources, and providing a stable environment for programs to run. The application completes its tasks under the scheduling of the kernel.

From the perspective of system design, the kernel should have the ability to control all the resources and operations of the system, and the application program access to resources and various operations should be allowed by the system, so as to ensure the smooth and safe operation of the system. Thus, one of the philosophies of Linux is to assign different levels of execution to different operations, and some particularly critical operations related to the system must be performed by the most privileged program. The corresponding is the kernel mode and the user mode. Processes running in user mode can perform very limited operations and access resources, using only as many resources and functions as they allow, whereas processes running in kernel mode can perform any operation and have no restrictions on resource usage.

From a memory usage perspective, there are two states: kernel space and user space. Each process has 4 gigabytes of virtual addressing space, of which up to 1 gigabyte is the same, known as kernel space. Only the remaining 3G is for the process’s own use. In other words, one gigabyte of space is shared by all processes. A process is kernel-mode when it is running in kernel space and user-mode when it is running in user-space. At the same time, there is a system stack and a user stack respectively in these two Spaces, and processes run in different states using different stacks. When running in kernel space, the CPU can execute any instruction. There are no restrictions on the code that can be run. When a process is running in the user address space, it should behave like a child controlled by an adult. See if you understand the basic elements of section 1.1 better.

Why does a process need a kernel state to run in user state? In fact, this is not possible because the process function is kernel dependent (no way, too limited). For example, the common printf function is initiated by the application, but it needs to enter kernel state to write data to the console. We also say that applications run in kernel space, or that the kernel runs in a process context, or is trapped in kernel space. This interaction is the basic behavior of the application to do its job.

So what are the ways to go from user to kernel? There are generally three kinds

  • throughThe system callsTo enter, for example printf in our example above calls the write function
  • throughsoftirqsEnter, common is that the process suddenly abnormal. For example, when an Application crash occurs in Android, the process will enter the kernel state and invoke interrupt service.
  • throughHardware interruptEntry, usually an interruption of an external device. When the peripheral device to complete the user’s request operation, will like THE CPU to send an interrupt signal, at this time, the CPU will suspend the execution of the next instruction to be executed, to execute the interrupt signal corresponding to the processing program, if the previous execution of the instruction is in the user state, it will naturally occur from the user state to the kernel state conversion. Such as network card, keyboard, etc., as soon as you type, the process will fall into the kernel state, is not very wonderful.

After the above mentioned applications enter the kernel state through soft interrupts and hard interrupts, they will find and invoke the corresponding interrupt service routines. Linux interrupt service routines do not execute in the process context, but run in a process-independent interrupt context to ensure that the interrupt service routines can respond and process immediately, and then exit quickly.

So the process, or CPU, must be doing one of three things at any given point in time:

  • Runs in user space and executes user processes
  • Runs in kernel space, in process context, and executes on behalf of a particular process
  • Runs in kernel space in an interrupt context, independent of any process, and handles a particular interrupt.

1.6 Section Summary

This section first introduces the basic elements of a process, the knowledge of process descriptors and the switch between different states of a process, and then introduces the process family tree and the kernel state and user state of a process. After reading the first section, you should have a preliminary understanding of how processes work. We will take a step closer in the second section, introducing process creation and introducing the connection and difference between processes and threads under Linux. Give yourself a pep talk and let’s keep going!

2. Process creation and threads

2.1 Process Creation

Linux process creation is unique, of course, because it inherits from Unix. Many other operating systems, such as Windows, provide a mechanism for creating a process by first creating it in a new address space, reading in an executable, and then executing it, perhaps through a single method. But Unix splits the above steps into two separate functions, which are used together as single functions on other systems.

  • The fork function: Create a child process by copying the current process. The difference between the child process and the parent process is PID(process ID), PPID(parent process ID) and a few resources
  • The exec function: Is responsible for reading the executable and loading it into the address space to start running. Usually refers to the exec family of functions.

This design reflects the Unix design philosophy and is now more consistent with the single responsibility principle, since it is common for parent and child processes to do different things (executables).

As mentioned earlier, one of the features of Unix is that processes are created and released fairly quickly. This is also shown in the fork function. The responsibility of the fork function is to create a child that has all of the parent’s resources. The traditional fork function copies all the resources of the parent class to the new resource. This implementation is too simple and inefficient, because in many cases the copy is meaningless, such as the data is not shared, or the child process does not use the data. The Linux fork function uses copy-on-write optimization. When Linux creates a child process, the kernel does not copy the parent process’s address space. Instead, the child process shares the parent process’s address space data in read-only mode, which you can imagine as a pointer to the original address. When the child process needs to write the data, the data will be copied to the child process. This can delay or even eliminate copying data.

Let’s look at the process creation process at the code level.

2.2 the fork and vfork

At the pure code level, Linux has two different functions to create processes: fork and vfork. Both functions copy a new process from the parent process, but there are differences. Here are definitions of fork and vfork. Defined in <kernel/fork.c>. This code is from kernel 4.4.

//fork the system call
SYSCALL_DEFINE0(fork)
{
  return do_fork(SIGCHLD, 0.0.NULL.NULL);
}

SYSCALL_DEFINE0(vfork)
{
  return _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0.0.NULL.NULL.0);
}

long do_fork(unsigned long clone_flags,
        unsigned long stack_start,
        unsigned long stack_size,
        int __user *parent_tidptr,
        int __user *child_tidptr)
{
  return _do_fork(clone_flags, stack_start, stack_size,
      parent_tidptr, child_tidptr, 0);
}

long _do_fork(unsigned long clone_flags,
        unsigned long stack_start,
        unsigned long stack_size,
        int __user *parent_tidptr,
        int __user *child_tidptr,
        unsigned long tls)
{
	/ /... Leave out a lot of code
	p = copy_process(clone_flags, stack_start, stack_size,
       child_tidptr, NULL, trace, tls);
	/ /... Leave out a lot of code
}
Copy the code

We can see that both fork and vfork are ultimately implemented through calls to do_fork, but the arguments passed in differ in value. Some flags are passed in the first parameter, and this flag is eventually used by copy_process. The copy_process method is where the actual copy is executed, and you can explore it if you’re interested. So what do these flags mean? Here is the definition of flag in Linux (defined in <include/ Linux /sched.h>).

With reference to the difference in Flags passed in here, we can further conclude:

  1. Fork makes a copy of the parent’s page table, whereas Vfork never makes a copy of the parent’s page table. (The page table implements address mapping from page numbers to physical block numbers.) This is where vfork is superior to the current fork
  2. After vfork creates the child process, the child process acts as the parent processthreadRunning in the address space whileBlocking the parent processUntil the child exits or exec() is executed. And the child process cannot write to the address space. This would have made sense before fork did not support copy-on-write, but since fork later introduced copy-on-write pages and made it clear that child processes execute first, the benefits of Vfork are limited to not copying page entries.

2.3 the thread

Creating a thread is similar to creating a normal process, except that a different flag is passed when calling do_fork to indicate which resources need to be shared. This is done using the pthread_create method, which eventually invokes the do_fork method. The flag passed in is

CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
Copy the code

As can be seen from the above flag, the created thread is actually a special process from the kernel level. It shares the open file and file system information, address space and signal processing functions with the parent process. This is also consistent with our traditional impression of the relationship between threads and processes. The Linux implementation is quite different from the implementation of operations like Windows. Assume that a process has four threads. In other systems that provide dedicated thread support, a process descriptor is added to the process that contains Pointers to all the threads of the process. But Linux makes it easier to just create four processes and assign four common process descriptors, specifying that they share certain resources.

Threads are divided into kernel threads and user threads. Kernel threads are standard processes that run independently in kernel space. They have no independent address space and never switch to user space. The user thread is what we know as a normal thread.

2.4 Section Summary

This section describes the difference between threads and processes in process creation. Next we move on to the topic of CPU scheduling, also known as CPU elapsed time allocation.

The full text summary

This article introduces the process and thread related knowledge, originally wanted to include the process scheduling related content, but limited to the length of the split into two articles, in this to tell the secret, the Linux kernel process scheduling is very different from our previous operating system class, I feel very elegant and interesting. For those interested, stay tuned for my next article, “A Primer on the Linux Kernel: Process Scheduling.”

reference

Linux Kernel Design and Implementation

Man manual

Linux user state and kernel state

Gityuan

I am Android stupid bird journey, stupid birds also want to have the heart to fly up, I am here to accompany you slowly become stronger. Looking forward to your attention