Python has been around for a long time. To understand its multithreading, multiprocessing, and coroutines, and to know when each is the right choice, we first have to talk about Python's GIL (global interpreter lock). In this article, we will look at what the GIL is and how it behaves.

1. What is GIL?

First, let's see what the GIL is. It is important to be clear that the GIL is not a feature of the Python language, but a concept introduced in the implementation of the reference Python interpreter, CPython. An analogy: C++ is a set of language (syntax) standards that can be compiled into executable code by different compilers, such as GCC, Intel C++, and Visual C++. In the same way, the same Python code can be executed by CPython, PyPy, Jython, and other implementations, and Jython, for example, has no GIL. However, because CPython is the default Python execution environment in most settings, many people equate CPython with Python and automatically attribute the GIL to the Python language itself. So let's be clear here: the GIL is not a Python language feature, and Python can exist without a GIL.

So what about the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid misleading anyone, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)


At first glance this reads like a bug: a lock that prevents multiple threads from executing Python bytecode simultaneously. Why does it exist?

2. Why does GIL exist?

The GIL problem is really a product of how applications and operating systems evolved over recent decades from single-core multitasking to multi-core multitasking. On an old single-core CPU, multiple threads are scheduled onto the one core and all share a single global lock: whichever thread is executing on the CPU holds the lock, until it gives up the CPU because of an I/O operation or because its timer tick quota expires. The threads that are not executing simply wait for the lock (they have nothing else to do anyway). The following diagram illustrates thread scheduling on a single-core CPU:

Obviously, on a modern multi-core processor this model leaves a lot of room for optimization: threads that previously could only wait can now be scheduled to run concurrently on idle cores. But under the old GIL mechanism, if thread 2 wants to execute on CPU 2, it must wait for thread 1, running on CPU 1, to release the GIL (remember: the GIL is global). If thread 1 gives up the GIL because of I/O blocking, thread 2 is sure to get it. But if thread 1 loses the GIL because its tick count reached 100, thread 1 and thread 2 compete on equal terms. In Python 2.x, however, thread priorities are not adjusted dynamically, so there is a high probability that thread 1 will be selected to run again. Over many such scheduling rounds, thread 2 can only watch thread 1 happily executing on CPU 1 with the GIL.

In slightly more extreme cases, such as thread 1 running a while True loop on CPU 1, you get the classic "one core works while the other cores watch" situation, as shown below:

Next, let's use real code to see how the presence of the GIL affects multithreaded execution (using a loop to simulate a time-consuming computation):

# coding:utf-8
import threading, time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main1():
    start_time = time.time()
    for tid in range(2):
        t = threading.Thread(target=my_counter)
        t.start()
        t.join()  # join() inside the loop blocks before the next thread starts, so the two threads run sequentially

    print("Single-thread sequential execution total_time: {}".format(time.time() - start_time))

def main2():
    thread_ary = {}
    start_time = time.time()
    for tid in range(2):
        t = threading.Thread(target=my_counter)
        t.start()
        thread_ary[tid] = t

    for i in range(2):
        thread_ary[i].join()  # Both threads are already started, so they run concurrently

    print("Concurrent execution total_time: {}".format(time.time() - start_time))

if __name__ == "__main__":
    main1()
    main2()

Running the above code on Python 3, the concurrent version takes about the same time as the sequential one, which clearly shows that for this CPU-bound workload the GIL prevents multithreading from improving efficiency. If you run it on Python 2, the gap is even more pronounced.

Results of running on Python 3.6 (the GIL implementation has been significantly improved since Python 3.2):

Results of running on Python 2.7:

3. Does the GIL mean thread safety?

Having a GIL does not necessarily mean Python is thread-safe, so we have to figure out when code is safe and when it is not. As mentioned earlier, a thread releases the global interpreter lock in two cases: it releases the GIL voluntarily before entering an I/O operation, and it is forced to release the GIL after the interpreter has run without interruption for a certain number of bytecode instructions (100 by default in Python 2) or for a certain time slice (5 ms by default in Python 3). Since a thread can lose the GIL at almost any point, thread-safety issues are unavoidable.

The GIL was designed with thread safety in mind, but the safety it provides is coarse-grained: the interpreter itself maintains a global locking mechanism, so the programmer does not have to lock individual interpreter-level operations. Fine-grained thread safety, by contrast, means the programmer must acquire and release locks explicitly to protect shared state; Java is the typical representative of that approach. So whether you need to lock in Python has to be analyzed case by case. Let's analyze and summarize each of these scenarios.
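In Python 3 the time-based switch interval mentioned above can be inspected and tuned through the standard sys module; a minimal sketch:

```python
import sys

# Python 3 forces a GIL handover based on elapsed time rather than a
# bytecode count; the default switch interval is 0.005 s (5 ms).
print(sys.getswitchinterval())

# The interval is tunable: a larger value means fewer forced switches
# (less switching overhead, but worse latency for waiting threads).
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())
```

Raising the interval can speed up CPU-bound multithreaded code slightly, at the cost of responsiveness; it does not remove the GIL's one-thread-at-a-time restriction.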

Let's start with the first case, where the thread releases the GIL voluntarily. Suppose thread A releases the GIL because it is about to enter an I/O operation. Since the time thread A will spend waiting on I/O is uncertain, the waiting thread B simply takes the GIL. This comparatively "polite" handover is generally called cooperative multitasking: everyone follows the agreed rules and the switch happens at a well-defined point, so this case is considered thread-safe and no extra locking is required.
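This cooperative release is easy to observe: time.sleep, like real blocking I/O, drops the GIL, so two sleeping threads overlap almost completely. A small sketch (the timings are illustrative):

```python
import threading
import time

def io_task():
    time.sleep(1)  # sleep releases the GIL, just like blocking I/O

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads slept concurrently, so the total is about 1 s, not 2 s.
print("elapsed: {:.2f}s".format(time.time() - start))
```

This is exactly why multithreading still pays off for I/O-bound workloads despite the GIL.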

Next, let's look at the other case, where thread A is forced to give up the GIL because the interpreter has executed 100 bytecode instructions (Python 2) or run for 5 ms (Python 3) without interruption. In this case thread A and thread B actually compete for the GIL at the same time, and which one wins is nondeterministic, which is why this is commonly called preemptive multitasking. In Python 3, thread B may be given higher priority because the GIL implementation was improved and thread priorities are adjusted dynamically, but there is still no guarantee that thread B will get the GIL. In this situation threads can be in an unsafe state. Let's use a purely computational operation to demonstrate this thread-unsafe state. The code is as follows:

import threading

n = 0

def add():
    global n
    for i in range(1000000):
        n = n + 1

def sub():
    global n
    for i in range(1000000):
        n = n - 1

if __name__ == "__main__":
    t1 = threading.Thread(target=add)
    t2 = threading.Thread(target=sub)

    t1.start()
    t2.start()

    t1.join()
    t2.join()

    print("The value of n is:", n)

The code above is simple: thread 1 increments the global variable n 1,000,000 times while thread 2 decrements it 1,000,000 times. If the program were thread-safe, the final result would always be zero. In fact, when we run it, n is sometimes positive and sometimes negative, completely indeterminate. This is the typical thread-safety problem of multiple threads operating on the same global variable. Here we will only discuss the cause of the problem; how to solve it with locks will be covered in a later article.

Next, let's examine the cause of this problem at the bytecode level. Each thread basically performs an addition or a subtraction. For the sake of illustration, we simplify the functions to a bare addition and a bare subtraction and analyze their bytecode to see how releasing the GIL causes the problem. The demo code is as follows:

import dis

n = 0

def add():
    global n
    n = n + 1

print(dis.dis(add))

def sub():
    global n
    n = n - 1

print(dis.dis(sub))

The dis.dis function in the dis module prints the bytecode of a function, which makes it very convenient for analysis. The output is as follows:

(Screenshot: bytecode of add and sub. Understanding this sequence is the key to the thread-safety problem.)
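The screenshot of the output is not reproduced here, but the same information can be recovered programmatically with dis.get_instructions. A sketch (exact opcode names vary across CPython versions; for example, 3.11 replaced BINARY_ADD with BINARY_OP):

```python
import dis

n = 0

def add():
    global n
    n = n + 1

# List just the opcode names instead of the full dis.dis listing.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
# On CPython 3.8 this gives roughly:
# ['LOAD_GLOBAL', 'LOAD_CONST', 'BINARY_ADD', 'STORE_GLOBAL',
#  'LOAD_CONST', 'RETURN_VALUE']
```

The four instructions up to STORE_GLOBAL are the four steps discussed below.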

Both the addition and the subtraction take four bytecode steps. Take addition: step one is LOAD_GLOBAL (load the global variable n), step two is LOAD_CONST (load the constant 1), step three is the binary add, and step four stores the result back into the global variable n, completing the calculation. If these four instructions were guaranteed to run as one complete, indivisible unit, there would be no problem. But given the GIL release rules described earlier, a thread may lose the GIL after any of the four steps and enter the waiting state, and that is where things get interesting.

To help you understand, consider an extreme case. Suppose the addition thread is about to execute step four when it unfortunately loses the GIL and enters the waiting state (note that n is still 0 at this point). The subtraction thread starts executing: it loads the global variable n (value 0) and performs the subtraction, but it in turn loses the GIL at step three, so it enters the waiting state and the addition resumes. The addition finishes step four and assigns its result to n, which now has the value 1. When the subtraction gets the GIL back, it has already loaded n with the value 0, so it continues to its own final assignment and sets n to 0 - 1 = -1. In other words, after this interleaving n ends up as either 1 or -1, while we expected 0. That is a thread-unsafe outcome. After millions of such nondeterministic additions and subtractions, the final result is bound to be indeterminate. This is the process and cause of the problem.

Next we need to address another question: given that the GIL's coarse-grained guarantee leaves room for unsafety, does every GIL release caused by non-I/O preemption require locking? It depends, because compared with languages such as Java, Python has many operations that are atomic. For an atomic operation, the whole call executes under a single bytecode instruction, so there is no point at which a thread can give up the GIL in the middle of it. A typical example is the sort method. We can look at what this atomic operation looks like in Python bytecode, as shown below:

import dis

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

print(dis.dis(foo))

The results after running are as follows:
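Again the screenshot is omitted, so here is a sketch that lists the opcode names directly (the call opcode differs by version: CALL_METHOD on CPython 3.7 through 3.10, CALL on 3.11+):

```python
import dis

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

ops = [ins.opname for ins in dis.get_instructions(foo)]
print(ops)
# On CPython 3.8 this gives roughly:
# ['LOAD_GLOBAL', 'LOAD_METHOD', 'CALL_METHOD',
#  'POP_TOP', 'LOAD_CONST', 'RETURN_VALUE']
# The entire sort runs in C inside the single call instruction,
# so the interpreter holds the GIL for the whole sort.
```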

From the bytecode's point of view, the call to sort is a single, indivisible instruction, so the thread does not release the GIL while the sort itself is executing, which means we can consider the sort operation thread-safe without locking. The addition and subtraction operations we demonstrated above are not atomic, so we must lock them to make them thread-safe.

So, to summarize: if a multithreaded operation is not I/O-bound and its computation is not an atomic operation, then we need to consider thread safety; otherwise we do not need to worry about it. Of course, to avoid having to reason about which operations are atomic, we can follow a simple rule: always lock around reads and writes that share mutable state. After all, getting a threading.Lock in Python is just a line of code.
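As a concrete illustration of that rule, here is the earlier add/sub example made safe with a single threading.Lock (a minimal sketch; the loop count is reduced for brevity, and the guarantee holds for any count):

```python
import threading

n = 0
lock = threading.Lock()

def add():
    global n
    for _ in range(100000):
        with lock:       # load, add, and store now run as one critical section
            n = n + 1

def sub():
    global n
    for _ in range(100000):
        with lock:
            n = n - 1

t1 = threading.Thread(target=add)
t2 = threading.Thread(target=sub)
t1.start()
t2.start()
t1.join()
t2.join()
print("The value of n is:", n)  # always 0 now
```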

4. How to Avoid the Impact of the GIL

Two suggestions:

  1. In I/O-intensive applications, where I/O operations dominate, the performance difference between multithreading and multiprocessing is not significant. The reason is that even with the GIL, an I/O operation makes the thread release the GIL immediately and lets other, non-I/O threads continue, which improves overall efficiency. Compared with processes, threads are more lightweight and inter-thread communication is simpler, so multithreading is recommended.

  2. For computationally intensive (CPU-bound) applications, use multiprocessing instead of multithreading, so that each process has its own interpreter and its own GIL and can fully use a separate core. (Coroutines, like threads, run within a single process under one GIL, so they help with I/O-bound concurrency rather than CPU-bound work.)
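As a sketch of suggestion 2, the my_counter example from section 2 can be rewritten with the standard multiprocessing module; each worker process gets its own interpreter and its own GIL, so on a multi-core machine the two counters genuinely run in parallel:

```python
import multiprocessing
import time

def my_counter():
    i = 0
    for _ in range(10000000):
        i = i + 1
    return True

if __name__ == "__main__":
    start_time = time.time()
    procs = [multiprocessing.Process(target=my_counter) for _ in range(2)]
    for p in procs:
        p.start()   # each process runs in its own interpreter, GIL and all
    for p in procs:
        p.join()
    print("Multi-process total_time: {}".format(time.time() - start_time))
```

The trade-off is that processes are heavier than threads and cannot share Python objects directly; data must be passed through pickling, pipes, or shared memory.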

References: "Understanding the GIL in Depth: How to Write High-Performance and Thread-Safe Python Code" and "What Python Interviews Need to Know: The GIL".