Why do people say Python multithreading is a "chicken rib" (a Chinese idiom for something of little practical value)? The question was raised on Zhihu. In our common understanding, multi-process and multi-threaded programs make full use of hardware resources by running work concurrently to improve efficiency, so how did multithreading end up nearly useless in Python?

Some of you may already know the answer: the notorious Python GIL. What is the GIL? Why does it exist? Is multithreading really of so little value? Can the GIL be removed? With these questions in mind, let's look into it with a little patience.

Let's start with an experiment. It is very simple: we decrement the number 100,000,000 down to zero, and then the program terminates. If we use a single thread to perform this task, how long does it take? How long with two threads? Show me the code.

# task
def decrement(n):
    while n > 0:
        n -= 1

Single thread

import time

start = time.time()
decrement(100000000)
cost = time.time() - start
>>> 6.541690826416016

On my 4-core CPU machine, the single-threaded run took 6.5 seconds. One might ask: where is the thread? In fact, any running program has a main thread executing by default. (I won't expand on threads and processes here; that deserves a separate article.)

Multithreading

import threading

start = time.time()

t1 = threading.Thread(target=decrement, args=[50000000])
t2 = threading.Thread(target=decrement, args=[50000000])

t1.start() # start the thread and begin executing the task
t2.start() # same as above

t1.join() # the main thread blocks until t1 completes, then continues executing
t2.join() # same as above

cost = time.time() - start

>>>6.85541033744812

We create two child threads, t1 and t2, each performing 50 million decrements; when both threads finish, the main thread ends the program. The result: the two threads working together were slower, taking 6.8 seconds. If the two threads really ran in parallel on two CPUs at the same time, the time should have been roughly halved.

Why does multithreading make the program slower?

The CPython interpreter has a Global Interpreter Lock (GIL), and a thread must acquire it before the interpreter will interpret and execute that thread's Python code. Only one thread may be executing Python code at any one time. Before the CPU can execute another thread's code instructions, that thread must first acquire the lock; if the lock is held by a different thread, it has to wait until the holder releases the lock before it can run.

That is why the two threads executed more slowly rather than faster: at any given moment only one thread is running while the others wait, so even on a multi-core CPU there is no way for multiple threads to execute Python code truly in parallel; they can only interleave. And because multithreading adds thread context switching and lock operations (acquiring the lock, releasing the lock, and so on), the multithreaded version ends up slower than the single-threaded one.

When will GIL be released?

A thread releases the GIL when it encounters an I/O operation. A CPU-bound thread also releases the GIL after the interpreter has executed 100 "ticks" (a tick can be roughly thought of as an instruction in the Python virtual machine). In Python 2 you can set this interval with sys.setcheckinterval() and read it with sys.getcheckinterval(). These lock hand-offs are part of the extra overhead that multithreading carries compared with a single thread.
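Note that sys.setcheckinterval()/sys.getcheckinterval() are Python 2 APIs: in Python 3 the tick-counting scheme was replaced by a time-based interval. A minimal sketch of the Python 3 equivalents:

```python
import sys

# In CPython 3, a CPU-bound thread is asked to drop the GIL every
# "switch interval" seconds rather than every 100 ticks.
default = sys.getswitchinterval()   # 0.005 seconds by default in CPython
sys.setswitchinterval(0.01)         # request a longer slice between switches
assert abs(sys.getswitchinterval() - 0.01) < 1e-9
sys.setswitchinterval(default)      # restore the default
```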

Why is the CPython interpreter designed this way?

Multithreading is a product of rapidly developing computer hardware: it exists to make full use of multi-core processors, letting programs use CPU resources efficiently. But Python was born in 1991, when hardware was nowhere near as luxurious as it is today, now that an ordinary server with 32 cores and 64 GB of memory is commonplace. Multithreading also brings a hard problem: how to keep shared data synchronized and consistent. When multiple threads access shared data, two threads may modify the same value at the same time; without a suitable mechanism to guarantee consistency, the program will eventually misbehave. So the father of Python added a single global lock: regardless of whether your code actually has data-synchronization problems, one global lock guarantees the interpreter's data safety. This is the crux of Python multithreading's limitations: instead of fine-grained control over data safety, the problem is solved in a simple, crude way.
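It is worth stressing that the GIL only guarantees the interpreter's own internal state; a compound operation such as counter += 1 is still a read-modify-write that can interleave between threads. A minimal sketch (not from the original article) of the fine-grained locking that application code still needs, using threading.Lock:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # guard the read-modify-write of the shared counter
            counter += 1

threads = [threading.Thread(target=add, args=[100000]) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # exactly 400000 with the lock; without it, updates could be lost
```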

In the 1990s this solution was fine. The hardware of that era was very humble, single-core CPUs were mainstream, and multithreaded application scenarios were rare; most programs ran in a single thread, and a single thread avoids thread context switching, so it was often more efficient than multithreading (in a multi-core environment this rule no longer applies). Using the GIL to ensure data consistency and safety was therefore not necessarily a bad idea; at the time, it was a very low-cost way to implement the interpreter.

So is it feasible to remove the GIL?

In 1999, Greg Stein and Mark Hammond created a branch of Python that removed the GIL and replaced it with finer-grained locks on all mutable data structures. After benchmarking, however, the GIL-free Python ran nearly twice as slowly in single-threaded code.

Given that result, removing the GIL offered too little value for the considerable effort it would require, so the idea was shelved.

Summary

The CPython interpreter uses the GIL (global interpreter lock) to keep the interpreter's internal data safe across threads. So with the GIL in place, do we still need thread synchronization in our own code? And how does multithreading perform on I/O-intensive tasks? Welcome to leave your comments.

Public account: Zen of Python