@Author: Runsen

Another important topic in Python multithreading is the Global Interpreter Lock (GIL).

Multiple threads are not necessarily faster than single threads

In Python, multitasking can be achieved with multiple processes, multiple threads, or coroutines. But is multithreading necessarily faster than single threading?

Let me prove my point with a piece of code.

"' @ Author: Runsen @ WeChat public number: the king of the Python @ blog: https://blog.csdn.net/weixin_44510615 @ Date: 2020/6/4 ' ' '

import threading
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i+1
    return True

def main1():
    start_time = time.time()
    for tid in range(2):
        t = threading.Thread(target=my_counter)
        t.start()
        t.join()  # join() here blocks the main thread until the first thread finishes, before the second thread is even created, so the two threads actually run one after another

    print("Single-thread sequential execution total_time: {}".format(time.time() - start_time))

def main2():
    thread_ary = {}
    start_time = time.time()
    for tid in range(2):
        t = threading.Thread(target=my_counter)
        t.start()
        thread_ary[tid] = t

    for i in range(2):
        thread_ary[i].join()  # both threads have already been started, so they run concurrently

    print("Multi-threaded execution total_time: {}".format(time.time() - start_time))

if __name__ == "__main__":
    main1()
    main2()

The results

Single-thread sequential execution total_time: 17.754502773284912
Multi-threaded execution total_time: 20.01178550720215

In case you suspect that I tampered with the result, it is worth running the code yourself: the multi-threaded version really does come out slower.

At this point, I wondered: is there something wrong with my machine? That’s not the case. Essentially, Python threads are not truly parallel: they cannot perform parallel computation.

Python threads really are wrappers around native operating system threads: pthreads (POSIX threads) on Linux and Windows threads on Windows. Moreover, Python threads are scheduled entirely by the operating system, which decides when they run, manages their memory resources, handles interrupts, and so on.
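As a small illustration of this (my own sketch, not from the original post), you can confirm that each threading.Thread maps to a real operating system thread by printing its native thread id, which is available since Python 3.8:

import threading

def show_native_id():
    # Each Python thread is backed by an OS thread with its own native id
    print(threading.current_thread().name, "-> OS thread id:", threading.get_native_id())

threads = [threading.Thread(target=show_native_id) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()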

GIL is not a Python feature

The concept of the GIL can be explained in one sentence: within a single CPython interpreter process, only one thread can be executing Python bytecode at any given moment, no matter how many threads exist. A few points to note about this definition:

The first thing to make clear is that the GIL is not a feature of the Python language; it is a concept introduced by the implementation of the Python interpreter (CPython).

By analogy, C++ is a language (syntax) standard that can be compiled into executable code by different compilers, such as the well-known GCC, Intel C++ and Visual C++.

In Python, the same code can be executed using CPython, PyPy, Psyco, etc.

Other Python interpreters do not necessarily have a GIL. For example, Jython (on the JVM) and IronPython (on the CLR) have no GIL, while CPython and PyPy do.

Because CPython is the default Python runtime in most environments, many people equate CPython with Python and automatically attribute the GIL to a defect of the Python language itself. So let’s be clear: the GIL is not a feature of the Python language, and Python can exist without a GIL.
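If you are not sure which implementation you are running, a minimal check (my addition, not from the original post) uses the standard platform module:

import platform
import sys

# Prints 'CPython', 'PyPy', 'Jython' or 'IronPython', depending on the interpreter
print(platform.python_implementation())
print(sys.version)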

The GIL is essentially a mutex

The GIL is, in essence, a mutex. And like every mutex, its purpose is to turn concurrent operations into serial ones, so that shared data can be modified by only one task at a time, which keeps the data safe.

One thing is for sure: different pieces of shared data should be protected by different locks.
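To make the idea of a mutex concrete, here is a minimal sketch of my own (an ordinary threading.Lock, not the GIL itself) that dedicates one lock to one shared counter, so only one thread can modify it at a time:

import threading

counter = 0
counter_lock = threading.Lock()  # one lock dedicated to this one piece of shared data

def add_one(n):
    global counter
    for _ in range(n):
        with counter_lock:  # only one thread may modify counter at a time
            counter += 1

threads = [threading.Thread(target=add_one, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000: the mutex serializes the updates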

How the GIL works: picture threads 1, 2 and 3 taking turns to execute. When a thread starts executing, it acquires the GIL, which prevents the other threads from running; after it has run for a while, it releases the GIL so that another thread can take its turn on the CPU.
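You can see a trace of this behaviour in CPython itself: the interpreter periodically asks the running thread to give up the GIL, and the switch interval can be inspected or tuned (a small sketch of my own, not from the original post):

import sys

# CPython asks the running thread to release the GIL every few milliseconds
print(sys.getswitchinterval())   # 0.005 seconds by default
sys.setswitchinterval(0.01)      # make thread switching less frequent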

Computation-intensive

Computation-intensive tasks are characterized by a large amount of computation and heavy consumption of CPU resources.

Let’s start with a simple computationally intensive example:

"' @ Author: Runsen @ WeChat public number: the king of the Python @ blog: https://blog.csdn.net/weixin_44510615 @ Date: 2020/6/4 ' ' '
import time
COUNT = 50 _000_000

def count_down() :
   global COUNT
   while COUNT > 0:
       COUNT -= 1

s = time.perf_counter()
count_down()
c = time.perf_counter() - s
print('time taken in seconds - >:', c)

time taken in seconds - >: 9.2957003


That was a single thread, and it took about 9 seconds. Now let’s use two threads and see what happens:

"' @ Author: Runsen @ WeChat public number: the king of the Python @ blog: https://blog.csdn.net/weixin_44510615 @ Date: 2020/6/4 ' ' '
import time
from threading import Thread

COUNT = 50 _000_000

def count_down() :
   global COUNT
   while COUNT > 0:
       COUNT -= 1

s = time.perf_counter()
t1 = Thread(target=count_down)
t2 = Thread(target=count_down)
t1.start()
t2.start()
t1.join()
t2.join()
c = time.perf_counter() - s
print('time taken in seconds - >:', c)

time taken in seconds - >: 17.110625


The main work of this program is computation; the CPU never sits idle waiting. After switching to multiple threads, the frequent switching between threads adds extra overhead, so of course the total time increases.

The other type is IO-intensive. Tasks involving network or disk I/O are IO-intensive: they consume little CPU and spend most of their time waiting for I/O operations to complete (because I/O is far slower than the CPU and memory). For IO-intensive work, more tasks generally means higher CPU utilization, up to a limit. Most everyday tasks, such as web applications, are IO-intensive.
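As a rough sketch of why threads help here (my own example, with the IO wait simulated by time.sleep, which releases the GIL), five 1-second "IO" tasks finish in about one second when given five threads:

import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_task(_):
    # time.sleep stands in for a network or disk wait and releases the GIL
    time.sleep(1)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(fake_io_task, range(5)))
print('5 simulated IO tasks with 5 threads:', time.perf_counter() - start)  # roughly 1 second, not 5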

Conclusion: for IO-intensive work (such as Python crawlers), multithreading can greatly improve efficiency. For CPU-intensive work (Python data analysis, machine learning, deep learning), multithreading may even be slightly slower than a single thread. That is why, in the data field, efficiency is not gained by adding threads, but by moving computation from the CPU to GPUs and TPUs.

Reference: Geek Time Python column ~
