———–

Speaking of processes, perhaps the most common interview question is about the difference between threads and processes, so here’s the answer up front: on Linux, processes and threads are almost indistinguishable.

In Linux, a process is a data structure. Looking at that structure explains how file descriptors, redirection, and pipes work underneath. Finally, from the operating system’s perspective, we’ll see why threads and processes are basically the same.

PS: I have carefully written more than 100 original articles and walked through 200 LeetCode problems step by step, all published in labuladong’s algorithm cheat sheet and continuously updated. I suggest bookmarking it and working through the problems in the order of my articles; once you have mastered the common algorithm patterns, you will take to the sea of problems like a duck to water.

What is a process

First, in the abstract, our computer looks like this:

The large rectangle represents the computer’s memory space, with the smaller rectangles representing processes, the circles in the lower left corner representing disks, and the shapes in the lower right corner representing input and output devices such as the mouse, keyboard, and monitor. Also notice that the memory space is divided into two parts, with the top half representing user space and the bottom half representing kernel space.

User space holds the resources that user processes need. For example, if you declare an array in your program, that array lives in user space. Kernel space holds the system resources that the kernel needs to load, and user processes are generally not allowed to access them. Note, however, that some resources are shared among user processes, such as dynamically linked libraries, whose code the kernel can map into several processes at once.

Suppose we write a hello program in C, compile it into an executable file, and run it from the command line: it prints hello world and then exits. At the operating system level, a new process is created, our compiled executable is read into memory and executed, and finally the process exits.

The executable you compiled is just a file, not a process. It must be loaded into memory and wrapped in a process before it can actually run. Processes are created by the operating system, and each process has its own attributes, such as a process ID (PID), a process state, a set of open files, and so on. Once the process has been created, your program is read in and executed by the system.
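The create, load, run, exit sequence described above can be sketched from user space with fork(), exec, and waitpid(). This is only an illustration under assumptions: the helper name run_in_new_process is hypothetical, and it assumes a POSIX system where sh is on the PATH.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c command` in a new process and
 * return its exit status, or -1 if something went wrong. */
int run_in_new_process(const char *command)
{
    pid_t pid = fork();              /* ask the kernel for a new process */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with a program from disk. */
        execlp("sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                  /* only reached if exec failed */
    }

    /* Parent: the child has its own PID; wait for it to exit. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    printf("child %d finished\n", (int)pid);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_in_new_process("exit 7") from main should return 7, the exit status that the child process handed back to its parent.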

So how does the operating system create processes? To the operating system, a process is just a data structure. Let’s look directly at the Linux source code:

struct task_struct {
	// Process state
	long			  state;
	// Virtual memory descriptor
	struct mm_struct  *mm;
	// Process ID
	pid_t			  pid;
	// Pointer to the parent process
	struct task_struct __rcu  *parent;
	// List of child processes
	struct list_head		children;
	// Pointer to file system information
	struct fs_struct		*fs;
	// Pointer to the open-file table, which holds pointers to the files this process has opened
	struct files_struct		*files;
};

task_struct is the Linux kernel’s description of a process, also known as the “process descriptor.” The real source code is more complicated; I’ve excerpted only a few of the more common fields.

Of particular interest are the mm and files pointers. mm points to the process’s virtual memory, which is where resources and the executable are loaded; files points to an array that holds pointers to all the files the process has opened.


What is a file descriptor

Let’s start with files, which is an array of file Pointers. In general, a process reads input from files[0], writes output to files[1], and writes error messages to files[2].

For example, from our point of view the C printf function prints characters to the command line, but from the process’s point of view it writes data to files[1]; similarly, scanf is the process trying to read data from files[0].

When a process is created, the first three slots of files are filled with default values that point to the standard input stream, standard output stream, and standard error stream respectively. A “file descriptor” is simply an index into this files array, which is why a program’s file descriptors are, by default, 0 for input, 1 for output, and 2 for error.
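To make the three default slots concrete, here is a minimal sketch that bypasses stdio and writes straight to descriptors 1 and 2 with the write() system call. The helper names write_text and demo_standard_fds are hypothetical, chosen only for this illustration.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string directly to a file descriptor.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_text(int fd, const char *text)
{
    return write(fd, text, strlen(text));
}

/* Roughly where the output of printf and fprintf(stderr, ...) ends up. */
int demo_standard_fds(void)
{
    ssize_t out = write_text(STDOUT_FILENO, "to files[1]\n");  /* fd 1 */
    ssize_t err = write_text(STDERR_FILENO, "to files[2]\n");  /* fd 2 */
    return (out > 0 && err > 0) ? 0 : -1;
}
```

STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO are the standard names from unistd.h for the numbers 0, 1, and 2.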

We can draw a new picture:

On a typical computer, the input stream is the keyboard, the output stream is the monitor, and the error stream is also the monitor, so the process now has three lines connecting it to the kernel. Because hardware is managed by the kernel, our process has to make “system calls” to ask the kernel to access hardware resources on its behalf.

PS: Don’t forget that in Linux everything is abstracted into files, and devices are files that can be read and written.

If our program needs other resources, such as opening a file for reading or writing, that is just as easy: make a system call asking the kernel to open the file, and the file is placed in the fourth position of files, that is, files[3]:
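A small sketch of this behavior, assuming a Linux system where /dev/null exists; the function name demo_open_slot is hypothetical. The open() system call returns the lowest unused index in files, so the exact number depends on what the process already has open.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: open a file and report which slot of files it landed in.
 * Returns the descriptor number that was used, or -1 on error. */
int demo_open_slot(void)
{
    /* /dev/null is just a convenient file that exists on Linux systems. */
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* The kernel hands out the lowest unused descriptor, which is 3
     * when only stdin, stdout, and stderr are open. */
    printf("opened /dev/null as fd %d\n", fd);
    close(fd);                       /* frees the slot again */
    return fd;
}
```

Because the slot is freed by close(), calling the function twice in a row should report the same descriptor number both times.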

Input redirection now becomes easy to understand: if we point files[0] to a file, the program reads its data from that file instead of from the keyboard:

$ command < file.txt

Similarly, output redirection points files[1] to a file, so the program’s output is written to that file instead of to the monitor:

$ command > file.txt

The same is true for error redirection, which I won’t go over again.
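As a rough sketch of what pointing files[1] at a file means in code, the function below uses dup2() to swap the target of descriptor 1, prints with an ordinary printf, and then restores the original target. The name print_redirected is hypothetical, and a real shell performs this swap in the child process before exec rather than inside the running program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print `text` with stdout temporarily pointed at `path`.
 * Returns 0 on success, -1 on error. */
int print_redirected(const char *path, const char *text)
{
    int file_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (file_fd < 0)
        return -1;

    fflush(stdout);                       /* drain output meant for the old target */
    int saved = dup(STDOUT_FILENO);       /* keep a handle on the old files[1] */
    if (saved < 0 || dup2(file_fd, STDOUT_FILENO) < 0) {
        if (saved >= 0)
            close(saved);
        close(file_fd);
        return -1;
    }
    close(file_fd);                       /* files[1] now refers to the file */

    printf("%s", text);                   /* printf is unaware of the switch */
    fflush(stdout);

    int restored = dup2(saved, STDOUT_FILENO);   /* point files[1] back */
    close(saved);
    return restored < 0 ? -1 : 0;
}
```

The program’s own printing code does not change at all; only the entry in slot 1 does, which is the decoupling the redirection operators rely on.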

Pipes work in much the same way: the output stream of one process and the input stream of another are connected to the same “pipe”, and data flows through it. I have to say, this design is really beautiful:

$ cmd1 | cmd2 | cmd3

By now you can see how clever the Linux idea that “everything is a file” is. Whether it is a device, another process, a socket, or a real file, it can be read and written, and everything is unified into one simple files array. A process accesses each resource through a simple file descriptor and leaves the details to the operating system. The decoupling is effective, elegant, and efficient.
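The pipe idea can be sketched with pipe() and fork(): the child plays the role of cmd1 and writes into the pipe, while the parent plays the role of cmd2 and reads from it. The function name pipe_from_child is hypothetical. A shell would additionally use dup2() to place the pipe ends into files[1] of cmd1 and files[0] of cmd2, a step this sketch leaves out.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child process writes `message` into a pipe and the
 * parent reads it back. Stores at most cap-1 bytes plus a terminator in
 * `out`; returns the number of bytes read, or -1 on error. */
ssize_t pipe_from_child(const char *message, char *out, size_t cap)
{
    int fds[2];                      /* fds[0]: read end, fds[1]: write end */
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child (like cmd1): its output goes into the pipe. */
        close(fds[0]);
        if (write(fds[1], message, strlen(message)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* Parent (like cmd2): its input comes out of the pipe. */
    close(fds[1]);                   /* so read() sees EOF when the child is done */
    ssize_t total = 0;
    ssize_t n;
    while ((size_t)total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - (size_t)total)) > 0)
        total += n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

Both ends of the pipe are ordinary entries in files, so the reading and writing code is the same as for any other file.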

What is a thread

First of all, it should be clear that multiprocessing and multithreading are both forms of concurrency, and both can improve processor utilization. So the key question is: what is the difference between multithreading and multiprocessing?

The reason there is almost no difference between threads and processes in Linux is that the Linux kernel does not treat them differently.

We know that the fork() system call creates a child process and the pthread_create() function creates a thread. However, both threads and processes are represented by task_struct; the only difference is which data areas are shared.

In other words, a thread looks no different from a process, except that some of a thread’s data areas are shared with its parent process, whereas a child process gets a copy rather than sharing. For example, the mm structure and the files structure are shared between threads. Here are two pictures to make this clear:
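The shared-versus-copied distinction can be observed from user space. In this sketch, a thread and a forked child each increment the same global variable, and the parent then checks what it sees. The function names are hypothetical, and the code assumes a POSIX system (on older toolchains it must be compiled with -pthread).

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;              /* lives in the process's memory (mm) */

static void *bump(void *arg)
{
    (void)arg;
    counter += 1;                    /* same memory as the creating thread */
    return NULL;
}

/* Returns the caller's view of counter after a thread increments it. */
int counter_after_thread(void)
{
    pthread_t tid;
    counter = 0;
    if (pthread_create(&tid, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);         /* wait, so the update is visible */
    return counter;
}

/* Returns the caller's view of counter after a child process increments it. */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter += 1;                /* changes only the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}
```

counter_after_thread() should return 1 because the thread shares the caller’s memory, while counter_after_fork() should return 0 because the child modified its own copy.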

Therefore, a multithreaded program must use locking to keep multiple threads from writing to the same area at the same time; otherwise the data may become corrupted.

You might then ask: since processes and threads are so similar, and data is not shared between processes so there is no data-corruption problem, why is multithreading so much more common than multiprocessing?

Because in practice, concurrency over shared data is the more common case. For example, when 10 people each withdraw 10 yuan from the same account at the same time, we want the balance of the shared account to drop by exactly 100 yuan, rather than each person getting a copy of the account and each copy dropping by 10 yuan.
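The account example can be sketched with ten threads that share one balance and protect it with a mutex. The names balance, withdraw_ten, and balance_after_withdrawals are hypothetical; this is an illustration of the locking idea, not a model of a real banking system (on older toolchains it must be compiled with -pthread).

```c
#include <pthread.h>

#define WITHDRAWERS 10

static long balance;                 /* the single shared account */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

static void *withdraw_ten(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&balance_lock);    /* only one writer at a time */
    balance -= 10;
    pthread_mutex_unlock(&balance_lock);
    return NULL;
}

/* Starts ten threads that each withdraw 10 and returns the final balance. */
long balance_after_withdrawals(long starting_balance)
{
    pthread_t threads[WITHDRAWERS];
    int started = 0;

    balance = starting_balance;
    for (int i = 0; i < WITHDRAWERS; i++) {
        if (pthread_create(&threads[i], NULL, withdraw_ten, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return balance;
}
```

Starting from 1000, the function should return 900: every thread updates the same account, and the lock keeps the updates from overwriting one another.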

Of course, it must be noted that only Linux treats threads as processes that share data, without any special handling. Many other operating systems treat threads and processes differently and give threads their own dedicated data structures, which I personally think is less simple than the Linux design and adds complexity to the system.

Creating threads and processes in Linux is very efficient. To avoid copying the memory area when a process is created, Linux uses a copy-on-write optimization: it does not actually copy the parent process’s memory space up front, and copies a page only when it needs to be written. So creating new processes and threads in Linux is very fast.

_____________

My online e-book has 100 original articles that walk you through 200 LeetCode problems step by step; I suggest bookmarking it! The corresponding GitHub algorithm repository has earned 70K stars, and stars are welcome!