
Preface

In the single-core era, everyone wrote single-process, single-threaded programs. As hardware moved into the multi-core era, multi-process programming became a widely adopted way to reduce response time and make full use of multi-core CPU resources. However, because creating a process is expensive, multithreaded programming has become more and more popular.

I remember wondering, when I first learned about processes and threads, why people so rarely use multi-process and multi-thread together; wouldn’t combining them be even better? Looking back, I was too young, too simple. This article mainly discusses that question.

Process and thread models

The classic definition of a process is an instance of an executing program. Every program in the system runs in the context of some process. The context consists of the state needed for the program to run correctly, including the program’s code and data in memory, its stack, the contents of the general-purpose registers, the program counter (PC), environment variables, and the set of open file descriptors.

Processes provide two main abstractions to upper-layer applications:

  • A separate logical control flow that provides the illusion that our program exclusively uses the processor.
  • A private virtual address space that provides the illusion that our program exclusively uses the memory system.

A thread is a logical flow that runs in the context of a process. Threads are scheduled automatically by the kernel. Each thread has its own thread context, including a unique integer thread ID, a stack, a stack pointer, a program counter (PC), general-purpose registers, and condition codes. Each thread shares the rest of the process context with the other threads running in the same process, including the entire user virtual address space, which is made up of read-only text (code), read/write data, the heap, and all shared library code and data areas. Threads also share the same set of open file descriptors.

That is, a process is the smallest unit of resource management and a thread is the smallest unit of program execution.

On Linux, POSIX threads can be “considered” lightweight processes. Both pthread_create and fork are implemented on top of the clone system call (glibc’s __clone wrapper), just with different flags controlling whether the virtual address space, file descriptors, and so on are shared.
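To make the flag difference concrete, here is a minimal sketch using the glibc clone() wrapper. It is not how glibc actually implements pthread_create; it only illustrates that passing CLONE_VM, CLONE_FS, and CLONE_FILES makes the child share the parent’s address space and file descriptors (thread-like), while keeping only SIGCHLD gives fork-like behaviour:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared = 0;

/* Runs in the clone()d child. */
static int child_fn(void *arg)
{
    shared = 42;   /* visible to the parent only because CLONE_VM shares the address space */
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);

    /* Thread-like clone: share the virtual address space, filesystem info and
     * file descriptors. With only SIGCHLD, the call would behave like fork(). */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
    if (pid < 0) {
        perror("clone");
        return 1;
    }

    waitpid(pid, NULL, 0);
    printf("shared = %d\n", shared);  /* prints 42 */
    free(stack);
    return 0;
}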

Fork and multithreading

We know that a child process created through fork is almost, but not exactly, identical to the parent process. The child gets the same (but separate) copy of the parent’s user-level virtual address space, including the text, data, and bss segments, the heap, and the user stack. The child also gets copies of all of the parent’s open file descriptors, which means the child can read and write any file the parent has open. The biggest difference between the parent and the child is that they have different PIDs.

The fork(2)-Linux Man Page has the following description:

The child process is created with a single thread–the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.

This means that all threads except the one that called fork “evaporate” in the child process.

This is where all the problems with forks in multithreading come from.

The mutex

Mutexes are at the heart of most problems with fork in multithreaded programs.

On most operating systems, locks are implemented mostly in user mode rather than kernel mode for performance reasons (a user-mode implementation is convenient and is built mainly on atomic operations and the memory barriers mentioned in previous articles). As a result, when fork is called, all of the parent’s locks are copied into the child.

That’s where the problem lies. From the operating system’s perspective, each lock has an owner: the thread that locked it. Suppose that before the fork, one thread locks a mutex, i.e., holds the lock, and then another thread calls fork to create a child process. In the child, the thread holding the lock has “disappeared”. From the child process’s point of view, the lock is “permanently” locked because its owner has “evaporated”.

A deadlock occurs if the child process then tries to lock that already held lock.
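Here is a minimal sketch that reproduces the deadlock (the worker function and the sleeps are just illustrative timing assumptions; compile with -pthread). The worker thread holds the mutex while the main thread forks, so the child blocks forever when it tries to lock it:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical worker: holds the lock while "working". */
static void *worker(void *arg)
{
    pthread_mutex_lock(&lock);
    sleep(5);                    /* lock is held while the main thread forks */
    pthread_mutex_unlock(&lock); /* unlocks only the parent's copy */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    sleep(1);                    /* make sure the worker already holds the lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the worker thread does not exist here, but its copy of the
         * mutex is still marked as locked, so this call never returns. */
        pthread_mutex_lock(&lock);
        printf("child got the lock\n");  /* never reached */
        _exit(0);
    }

    pthread_join(tid, NULL);
    return 0;
}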

Of course, one could argue that the thread calling fork could acquire all the locks first and then release each of them in the child process. Regardless of whether business logic and other factors allow this, the problem with this approach is that it implies a fixed locking order, and if that order differs from the one used elsewhere, a deadlock can occur.

Even if you can always lock in the right order without making a mistake, there is another, implicit problem that you cannot control: library functions.

You cannot be sure that every library function you use avoids shared data, i.e., is trivially thread-safe. Many thread-safe library functions, such as malloc and printf from the C/C++ standard library, which almost every program uses, are made thread-safe internally by holding a mutex.

For example, suppose a multithreaded program calls malloc to allocate dynamic memory before fork. If the child process then calls malloc as well, it is unsafe: the lock inside malloc may have been held by some thread at the moment of the fork, and that thread no longer exists in the child process.

Exec and file descriptors

In light of the above analysis, it seems the only sensible choice in a multithreaded program is to call exec immediately in the child process produced by fork, but even this has a drawback. The child inherits all of the parent’s open file descriptors, and those descriptors stay open across exec by default, so the new program can still read and write the parent’s open files. What if you do not want the child to be able to touch a file the parent has open?

Setting the close-on-exec flag with fcntl is one way to do this:

#include <fcntl.h>
#include <stdio.h>

int fd = open("file", O_RDWR | O_CREAT, 0644);  /* O_CREAT requires a mode argument */
if (fd < 0)
{
    perror("open");
}
/* Mark the descriptor close-on-exec so it is closed when exec runs. */
fcntl(fd, F_SETFD, FD_CLOEXEC);

However, if another thread forks a child after the file is opened but before fcntl sets FD_CLOEXEC, the child will still be able to read and write the file. And if we try to close that window with locks, we are back to the situation discussed above.

Starting with Linux kernel 2.6.23, we can pass the O_CLOEXEC flag to open, making “open the file and set close-on-exec” a single atomic operation. This closes the race window, so a child forked by another thread in between can no longer keep the descriptor open across exec.
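The earlier snippet then becomes a one-step version (a sketch; the file name and mode are placeholders):

#include <fcntl.h>
#include <stdio.h>

/* Open and mark close-on-exec atomically (Linux 2.6.23+). */
int fd = open("file", O_RDWR | O_CREAT | O_CLOEXEC, 0644);
if (fd < 0)
{
    perror("open");
}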

pthread_atfork

If you are unlucky enough to have to deal with fork in a multithreaded program, try pthread_atfork:

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
  • The prepare handler is called by the parent process before the fork creates the child process. Its job is to obtain all locks defined by the parent process.
  • The parent handler is called in the parent’s context after fork creates the child process, but before fork returns. Its job is to unlock all locks acquired by prepare.
  • The child handler is called in the child environment before the fork returns. Like the parent handler, it must unlock all locks acquired in prepare.

Because the child inherits copies of the parent’s locks, the unlocks above are not performed twice on one lock but once on each copy. pthread_atfork can be called multiple times to register multiple sets of fork handlers, and when there are multiple sets, the handlers are not all called in the same order: the parent and child handlers are called in the order they were registered, while the prepare handlers are called in reverse registration order. This allows multiple modules to register their own handlers while maintaining the lock hierarchy (similar to the constructor/destructor ordering of nested RAII objects).
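Putting the three handlers together, here is a minimal sketch that protects a single hypothetical module lock, g_lock, across fork (compile with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical module-level lock we want to keep consistent across fork(). */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void) { pthread_mutex_lock(&g_lock); }    /* parent, just before fork */
static void parent(void)  { pthread_mutex_unlock(&g_lock); }  /* parent, after fork */
static void child(void)   { pthread_mutex_unlock(&g_lock); }  /* child, after fork */

int main(void)
{
    pthread_atfork(prepare, parent, child);

    /* ... threads that lock/unlock g_lock would be created here ... */

    pid_t pid = fork();   /* both parent and child now see g_lock unlocked */
    if (pid == 0) {
        pthread_mutex_lock(&g_lock);   /* safe: no deadlock in the child */
        printf("child locked g_lock safely\n");
        pthread_mutex_unlock(&g_lock);
        _exit(0);
    }
    return 0;
}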

Note that pthread_atfork can only clean up locks, not condition variables. In some implementations, condition variables need no cleanup; in others, the condition variable implementation contains a lock that would need to be cleaned up, yet there is no interface or method for doing so.

Conclusion

  • In a multithreaded program, it is best to use fork only to exec a new program, and to perform no other operations in the forked child.
  • If you do fork+exec in a multithreaded program, open file descriptors with O_CLOEXEC (or set FD_CLOEXEC) before the fork so the child cannot access them after exec.