We use the pipe operator all the time. It feeds the output of one command into the input of the next, head to tail, and even beginners know this principle. Far fewer people can explain how the output of one command actually reaches the input of the other and what role the pipe plays. Let's take a closer look at how the pipe is implemented. What does the shell do with the following command?

$ cmd1 | cmd2

First, I'll show the final state in the picture below, and then break down step by step how that state is reached.

In the figure above, we see the file descriptor tables of the processes, the pipe, and the parent-child relationships between the processes.

fork and exec

Each time the shell executes a command, it forks a child process and then uses exec to replace the child's process image with the target program. Take this simple command:

$ cmd

The exec function changes neither the process ID of the current process nor the parent-child relationships between processes. A process can be thought of as a sphere with an outer shell: after exec, the outer shell stays the same while the contents of the sphere are completely replaced. By default, the I/O file descriptors are attached to the outer shell (descriptors marked close-on-exec are the exception), which means that the cmd process inherits the I/O of the shell process.
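The fork-then-exec step can be tried out with a minimal sketch. It uses Python's `os` module, which wraps the same system calls, as a stand-in for whatever the shell does internally; the function name `run_command` and the exit code 127 for a failed exec are illustrative choices, not something taken from a real shell's source.

```python
import os


def run_command(argv):
    """Fork a child, exec argv in it, and return (child_pid, exit_code)."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        # The pid and the inherited descriptors 0, 1 and 2 stay the same.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed (for example, command not found).
            os._exit(127)
    # Parent (playing the shell): wait for the child to finish.
    waited_pid, status = os.waitpid(pid, 0)
    # The pid returned by fork is still the pid of the exec'd program.
    assert waited_pid == pid
    if os.WIFEXITED(status):
        return pid, os.WEXITSTATUS(status)
    return pid, 128 + os.WTERMSIG(status)  # shell-style code for a signal death
```

Calling `run_command(["ls", "-l"])` prints the listing on the caller's own terminal, because the child inherited the caller's stdout.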

$ cmd1 | cmd2

When a command line contains a pipe character, two commands need to run in parallel, so the shell forks two child processes and then execs the target program in each of them.

We notice that there is also a pipe in the diagram. The shell creates it, and it carries the communication between the two child processes.

pipe

Pipes are used for communication between related processes. The pipe is created before fork, so the child processes inherit its descriptors, and after fork it becomes the link between the processes. The pipe function returns two descriptors (pipe_in, pipe_out), the first for reading and the second for writing; whatever is written to the write end can be read from the read end.
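A small sketch of this, again using Python's `os` wrappers around the system calls. The helper name `send_through_pipe` is made up for illustration; the child plays the writer and the parent plays the reader.

```python
import os


def send_through_pipe(message: bytes) -> bytes:
    """Create a pipe, fork, let the child write and the parent read."""
    # Created before fork, so both processes end up holding both ends.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: writer only, so it closes the read end it does not need.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)  # exiting closes the child's write end
    # Parent: reader only. It must close its own copy of the write end,
    # otherwise the read loop below would never see EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # an empty read means EOF
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The sketch is meant for short messages; it does not handle partial writes.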

dup2

Now we need to redirect the descriptors so that the stdout descriptor of the cmd1 process writes to the pipe and the stdin descriptor of the cmd2 process reads from the pipe. This calls for the magic dup2(fd1, fd2) function. It makes descriptor fd2 refer to the kernel object that fd1 points to. If fd2 was already open, it is closed first, so the reference count of the kernel object it previously pointed to is reduced by one; if the count drops to zero, that object is destroyed. Note that calling close normally just decrements the reference count as well: the same kernel object can be shared by multiple descriptors and multiple processes, and it is only really closed when the reference count drops to zero.
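To see the redirection in isolation, here is a hedged sketch in which a child points its stdout (descriptor 1) at the write end of a pipe before exec, and the parent reads whatever the program prints. `capture_stdout` is a hypothetical helper name; `os.dup2(fd, fd2)` is Python's wrapper for the dup2 call described above.

```python
import os


def capture_stdout(argv):
    """Run argv with its stdout redirected into a pipe; return the bytes."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Descriptor 1 now refers to the pipe's write end; the object
            # it referred to before (the terminal, say) loses one reference.
            os.dup2(write_fd, 1)
            os.close(read_fd)
            # Closing write_fd only drops one reference: descriptor 1
            # still keeps the write end open.
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    os.close(write_fd)  # the parent keeps only the read end
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()  # reads until EOF
    os.waitpid(pid, 0)
    return output
```

The exec'd program is unaware of any of this: it simply writes to descriptor 1 as usual.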

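Now apply this rule in both child processes: cmd1 calls dup2 to point its stdout at the write end of the pipe, and cmd2 calls dup2 to point its stdin at the read end.

Then close the descriptors that are no longer needed, including the shell's own copies of both pipe ends, and you get the final picture below. Perfect!

With three commands and two pipe characters, as follows, two pipes are created: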
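Putting the steps together for `cmd1 | cmd2`, the whole sequence might look like the sketch below. It is an illustration under stated assumptions, not a real shell's implementation: the names `spawn` and `run_pipeline` are hypothetical, and returning the exit status of the last command follows common shell convention.

```python
import os


def spawn(argv, stdin_fd, stdout_fd, pipe_fds):
    """Fork a child, redirect its stdin/stdout if asked, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            if stdin_fd is not None:
                os.dup2(stdin_fd, 0)
            if stdout_fd is not None:
                os.dup2(stdout_fd, 1)
            # After dup2 the original pipe descriptors are no longer needed.
            for fd in pipe_fds:
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    return pid


def run_pipeline(cmd1, cmd2):
    """Run cmd1 | cmd2 and return the exit status of cmd2."""
    read_fd, write_fd = os.pipe()  # created before either fork
    pid1 = spawn(cmd1, None, write_fd, (read_fd, write_fd))
    pid2 = spawn(cmd2, read_fd, None, (read_fd, write_fd))
    # The parent must close both ends too, or cmd2 would never see EOF.
    os.close(read_fd)
    os.close(write_fd)
    os.waitpid(pid1, 0)
    _, status = os.waitpid(pid2, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

For example, `run_pipeline(["ls"], ["wc", "-l"])` prints the number of directory entries on the caller's stdout, since only the descriptors in the middle of the chain were redirected.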

$ cmd1 | cmd2 | cmd3
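The same idea extends to any number of commands: N commands need N - 1 pipes, one between each neighbouring pair. The loop below is one possible way to arrange this, with the hypothetical name `run_chain`; a real shell may organize the work differently.

```python
import os


def run_chain(commands):
    """Run commands as cmd1 | cmd2 | ... and return the last exit status."""
    pids = []
    prev_read = None  # read end of the pipe coming from the previous command
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        # Every command except the last gets a new pipe for its output.
        next_read, next_write = (None, None) if is_last else os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)   # stdin from the previous pipe
                if next_write is not None:
                    os.dup2(next_write, 1)  # stdout into the next pipe
                for fd in (prev_read, next_read, next_write):
                    if fd is not None:
                        os.close(fd)
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # reached only if exec failed
        pids.append(pid)
        # Parent: drop the ends it no longer needs, keep the next read end.
        if prev_read is not None:
            os.close(prev_read)
        if next_write is not None:
            os.close(next_write)
        prev_read = next_read
    status = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # the last status belongs to the last command
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

With three commands the loop calls `os.pipe()` twice, matching the two pipes described above.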

What happens if a process on either side suddenly dies?

Suppose cmd1 dies first: its write end of the pipe is closed automatically, cmd2 encounters EOF once it has read the remaining contents of the pipe (provided no other process still holds the write end open), and then terminates normally. If cmd2 dies first and cmd1 continues to write to the pipe, cmd1 receives a SIGPIPE signal, whose default action is to terminate the process.
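Both cases can be reproduced with a short experiment. The function names below are hypothetical. One Python-specific detail is an assumption worth stating: the Python interpreter ignores SIGPIPE by default, so the child restores the default action first in order to behave like an ordinary command.

```python
import os
import signal


def reader_sees_eof() -> bytes:
    """The writer dies first: the reader gets the data, then EOF."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.write(write_fd, b"partial")
        finally:
            os._exit(1)  # the writer dies; the kernel closes its write end
    os.close(write_fd)
    data = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty read: every write end is closed, so EOF
            break
        data += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data


def writer_gets_sigpipe() -> int:
    """The reader is gone: return the signal number that killed the writer."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the reader having exited: no read end is left
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # restore the default
            os.write(write_fd, b"data")  # SIGPIPE is delivered here
        finally:
            os._exit(0)  # reached only if the signal did not kill the child
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
```

If a program ignores SIGPIPE instead, the write fails with an EPIPE error rather than killing the process.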

In the next article, we’ll use cool code to implement the whole process, not only to understand how it works, but also to learn more details by experimenting with it.
