preface

In the era of multi-core CPUs, multi-threading or multi-processing can exploit all of a CPU's cores to improve a program's execution efficiency. Why, then, does Python multi-threading sometimes take longer than a single thread, and why is Python multi-processing recommended over multi-threading in most cases? To answer these questions, this article examines the differences between Python multi-processing and multi-threading and the scenarios in which each should be chosen.


Introduction to processes and threads

A program is an executable file stored on disk. When the program is loaded into memory and scheduled by the operating system, it acquires a life cycle: a process is a running program. A process can run multiple threads, each performing a different task; that is, threads are part of a process. Starting a process executes at least one task, so every process has at least one main thread, which in turn can create other child threads. Multithreading executes much like multiprocessing: the operating system switches rapidly among the threads, letting each run briefly in turn so that they appear to execute simultaneously. On multi-core CPUs, of course, multiple threads or processes can genuinely execute at the same time.
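The relationship described above can be sketched in a few lines. This is a minimal Python 3 example (the article's benchmarks use Python 2.7; the names `worker` and `run_in_threads` are illustrative, not from the article): one process is started explicitly, and its main thread spawns two child threads.

```python
import multiprocessing
import threading

def worker(results, name):
    # Each child thread performs its own task; the shared list works
    # because all of these threads live in the same process.
    results.append(name)

def run_in_threads(n):
    # The main thread creates the child threads, then waits for them.
    results = []
    threads = [threading.Thread(target=worker, args=(results, "task-%d" % i))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # A process is a running program; here one is started explicitly,
    # and its main thread spawns two child threads.
    p = multiprocessing.Process(target=run_in_threads, args=(2,))
    p.start()
    p.join()
```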

Processes and threads have different characteristics. Each process has its own address space, memory, and data stack. The operating system manages all processes and allocates execution time to each. Because resources are independent between processes, processes must share information through IPC (inter-process communication); the upside is that a crash in one process does not bring down the others. Threads, by contrast, execute within the same process and share its data space, so sharing information between threads is easier than between processes, but a crashing thread can take down the whole process.
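The contrast between IPC and shared memory can be made concrete. Below is a small Python 3 sketch (the helper names are illustrative): a child process must send its result back through an IPC channel, while a thread can simply write into a shared object.

```python
import multiprocessing
import threading

def put_square(q, n):
    # Runs in a separate process with its own address space: the result
    # must travel back through an IPC channel (a multiprocessing.Queue).
    q.put(n * n)

def square_in_process(n):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=put_square, args=(q, n))
    p.start()
    result = q.get()  # blocks until the child process sends its answer
    p.join()
    return result

def square_in_thread(n):
    # Threads share the process's data space, so an ordinary list is
    # enough to hand the result back -- no IPC needed.
    box = []
    t = threading.Thread(target=lambda: box.append(n * n))
    t.start()
    t.join()
    return box[0]
```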


Python GIL

Execution of Python code is controlled by the Python interpreter. There are many Python interpreters, such as CPython, PyPy, and Jython; among them, CPython is the most widely used. In theory, a multi-core CPU can execute multiple threads simultaneously, but CPython uses a Global Interpreter Lock (GIL) to manage access to the interpreter: Python threads must compete for the GIL before they can execute. As a result, on both single-core and multi-core CPUs, only one thread executes Python bytecode at any given time, which is the root reason why Python multithreading is sometimes inefficient on multi-core CPUs.
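The thread-switching behaviour is observable from the standard library. In Python 3, CPython periodically asks the GIL holder to release it (every 5 ms by default); note that Python 2, used for the article's benchmarks, exposed a bytecode-count knob (`sys.setcheckinterval()`) instead:

```python
import sys

# CPython runs one thread's bytecode at a time and periodically forces
# the holder of the GIL to release it so another thread can run.
# The default switch interval in Python 3 is 0.005 s (5 ms):
print(sys.getswitchinterval())

# The interval is tunable -- threads then alternate less often --
# but two threads still never run Python bytecode at once.
sys.setswitchinterval(0.01)
```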

Note: Simply put, any programming language needs to be implemented in another language; C, for example, is implemented in machine language. Python, as a language specification, therefore has several implementations, including CPython, PyPy, and Jython.


Performance comparison

There are three main approaches to multitasking in Python:

  • Start multiple processes, each containing a single thread, and let the processes carry out the tasks;
  • Start a single process, start multiple threads inside it, and let the threads carry out the tasks;
  • Start multiple processes, each containing multiple threads, performing even more tasks at once.

Because the third model is complex, it is rarely used in practice. The tests below were run with Python 2.7 on Linux, on a dual-core CPU (Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz) hardware platform, measuring the execution efficiency of computation-intensive and I/O-intensive tasks.

(1) Computation-intensive test. Computation-intensive tasks are characterized by heavy calculation that consumes the CPU for the whole time slice. Because of the GIL, multi-threading cannot use multiple cores for the computation, yet the overhead of switching between threads remains, so multi-threading needs more execution time than a single thread. With multiple processes, however, each process has its own independent GIL, so the processes do not interfere with each other and can use all the cores, accelerating execution.

The test code is as follows (Python 2.7):

#!/usr/bin/python
from threading import Thread
from multiprocessing import Process
from timeit import timeit

def count(n):
    while n > 0:
        n -= 1

def test_normal():
    count(1000000)
    count(1000000)

def test_Thread():
    t1 = Thread(target=count, args=(1000000,))
    t2 = Thread(target=count, args=(1000000,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def test_Process():
    t1 = Process(target=count, args=(1000000,))
    t2 = Process(target=count, args=(1000000,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    print "test_normal", timeit('test_normal()', 'from __main__ import test_normal', number=10)
    print "test_Thread", timeit('test_Thread()', 'from __main__ import test_Thread', number=10)
    print "test_Process", timeit('test_Process()', 'from __main__ import test_Process', number=10)

The result is as follows:

test_normal  1.19412684441
test_Thread  1.88164782524
test_Process 0.687992095947

Note: join() blocks the calling thread until the child process/thread terminates.

(2) I/O-intensive test. I/O-intensive tasks consume little CPU; they spend most of their time waiting for I/O operations to complete (I/O is far slower than the CPU and memory). Here the body of count() is replaced with time.sleep(0.5) to simulate I/O blocking. While a task is suspended it releases the GIL, allowing other threads to execute concurrently, which improves the program's throughput. Because creating and destroying processes costs more than creating threads, threads are slightly more efficient when two workers are created, and the gap widens as the number of workers grows to 100.
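The substitution can be sketched as follows. This is a Python 3 version (the article ran Python 2.7; `timed_threads` is an illustrative helper, not from the article) showing that two sleeping threads overlap instead of running back to back:

```python
import threading
import time

def count(n):
    # I/O-bound stand-in: the signature is kept, but the body now sleeps.
    # time.sleep() releases the GIL, so other threads run while we "wait".
    time.sleep(0.5)

def timed_threads(k):
    # Start k threads and measure how long it takes them all to finish.
    start = time.perf_counter()
    threads = [threading.Thread(target=count, args=(1000000,)) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With two threads the total is roughly 0.5 s, not 1 s, because both sleeps proceed at the same time.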

The results with two threads and two processes are as follows:

test_normal  10.0101339817
test_Thread  5.00919413567
test_Process 5.03114795685

Test code scaled up to 100 threads and 100 processes:

def test_Thread():
    l = []
    for i in range(100):
        p = Thread(target=count,args=(1000000,))
        l.append(p)
        p.start()
    for j in l:
        j.join()
        
def test_Process():
    l = []
    for i in range(100):
        p = Process(target=count,args=(1000000,))
        l.append(p)
        p.start()
    for j in l:
        j.join()

The result is as follows:

test_Thread  5.15325403214
test_Process 5.84798789024


conclusion

Due to Python's GIL limitations, multithreading is better suited to I/O-intensive applications (a web crawler is the typical example). For computation-intensive applications, use multiple processes so that the other CPU cores can participate and true parallelism is achieved.
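In Python 3 this choice is packaged neatly by the standard-library concurrent.futures module, where swapping the executor class switches between the two models. A minimal sketch (the names `cpu_task` and `run` are illustrative):

```python
import concurrent.futures as cf

def cpu_task(n):
    # Pure-Python computation: the GIL serialises it across threads,
    # so a process pool is the right tool for this kind of work.
    while n > 0:
        n -= 1
    return "done"

def run(executor_cls, jobs=2):
    # Use ProcessPoolExecutor for computation-bound tasks and
    # ThreadPoolExecutor for I/O-bound ones (e.g. a crawler).
    with executor_cls(max_workers=jobs) as ex:
        return list(ex.map(cpu_task, [100000] * jobs))
```

The same task-submission code then serves both cases; only the executor changes.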