1. GIL

If you are familiar with Python, you know that CPython, the reference interpreter written in C, has a global interpreter lock (GIL). Because of the GIL, the interpreter can run the Python code of only one thread at a time, which greatly limits the performance of Python multithreading. For historical reasons, this lock is very hard to remove.

The GIL limits multithreaded performance because a thread can run its code only while it holds the global lock, and there is only one such lock. As a result, only one thread executes Python code at any given moment, so even on a multi-core machine a multithreaded Python program gets roughly single-core performance.
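The "one lock, one running thread" rule can be illustrated with a toy model. This is only a sketch: an ordinary threading.Lock stands in for the GIL, and the names gil, worker, running and max_running are made up for the illustration. It is not how CPython implements the lock internally.

```python
import threading

gil = threading.Lock()  # stand-in for the single global lock (toy model)
running = 0             # workers currently inside the locked region
max_running = 0         # highest value `running` ever reached

def worker(steps):
    global running, max_running
    for _ in range(steps):
        with gil:  # a worker may only run a step while it holds the lock
            running += 1
            max_running = max(max_running, running)
            # ... one step of work would happen here ...
            running -= 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_running)  # 1: never more than one worker inside the locked region
```

However many workers are started, at most one of them is ever inside the locked region, which is why adding threads does not add computing throughput in this model.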

2. Multithreading for I/O-intensive tasks

An I/O-intensive task is one where the CPU is much faster than the device it is waiting on, such as a disk or the network. The CPU spends most of its time waiting for reads and writes to finish, so CPU load stays low. Tasks dominated by network or disk I/O fall into this category. While a thread waits on I/O, it releases the GIL so that other threads can run, which reduces the overall wait time.

from concurrent.futures import ThreadPoolExecutor
from time import sleep, time

# Worker number
N = 4
pool = ThreadPoolExecutor(max_workers=N)

2.1 Define an I/O-intensive function

This function “sleeps” for x seconds.

def io_bound_func(x):
    sleep(x)
    print("Sleep for %d seconds." % x)

2.2 Use serial processing

Iterate over all the elements of the list and call io_bound_func on each one.

def process_array(arr):
    for x in arr:
        io_bound_func(x)

2.3 Use multi-threading

Using the thread pool’s map method, you can apply the same function to all elements in the list.

def fast_process_array(arr):
    for x in pool.map(io_bound_func, arr):
        pass
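One detail of map worth knowing: it yields results in the order of the inputs, not in the order the tasks finish. The sketch below uses a hypothetical slow_square task (not part of the original example) in which larger inputs finish sooner, and creates its own pool in a with block so the worker threads are shut down automatically.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def slow_square(x):
    # Hypothetical task: larger inputs sleep for less time, so they finish first
    sleep(0.05 / x)
    return x * x

# The with block waits for all tasks and then shuts the pool down
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_square, [1, 2, 3]))

print(results)  # [1, 4, 9]: input order, even though 3 finished first
```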

2.4 Calculate the running time of the function

  • Serial version running time = 1 + 2 + 3 = 6 seconds
  • Multithreaded version running time = max(1, 2, 3) = 3 seconds

def time_it(fn, *args):
    start = time()
    fn(*args)
    print("The %s version runs in %.5f seconds!" % (fn.__name__, time() - start))
time_it(process_array, [1, 2, 3])

Sleep for 1 seconds.
Sleep for 2 seconds.
Sleep for 3 seconds.
The process_array version runs in 6.00883 seconds!

time_it(fast_process_array, [1, 2, 3])

Sleep for 1 seconds.
Sleep for 2 seconds.
Sleep for 3 seconds.
The fast_process_array version runs in 3.00300 seconds!
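The two figures in 2.4 (sum for the serial version, maximum for the threaded one) hold because there are at least as many workers as tasks. A rough way to reason about other cases is to simulate handing tasks, in order, to whichever worker frees up first. The estimate_pool_time helper below is a hypothetical name for this illustration; it ignores thread start-up and switching costs, so treat it as a back-of-the-envelope estimate for sleep-like tasks only.

```python
import heapq

def estimate_pool_time(durations, workers):
    """Estimate wall time when tasks are handed out in order to `workers` threads."""
    # Each heap entry is the time at which one worker becomes free
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker takes the task
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

print(estimate_pool_time([1, 2, 3], workers=4))  # 3.0, the max, as in 2.4
print(estimate_pool_time([1, 2, 3], workers=1))  # 6.0, the sum, same as serial
print(estimate_pool_time([1, 2, 3], workers=2))  # 4.0, somewhere in between
```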

3. Multithreading for CPU-intensive tasks

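CPU-intensive tasks, such as calculating pi or decoding high-definition video, are characterized by a large amount of computation that keeps the CPU busy. When a thread runs a CPU-intensive task, the interpreter periodically forces it to hand the GIL to other threads: in CPython 3.2 and later this happens after a time-based switch interval (5 milliseconds by default), while older versions switched after a fixed number of bytecode instructions. Because only one thread computes at a time and each switch has a cost, the multithreaded version may even be slower than the serial code.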
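On CPython 3.2 or later, how often a busy thread is asked to give up the GIL can be inspected with sys.getswitchinterval(). The snippet below is only a quick check; the printed value depends on how the interpreter is configured.

```python
import sys

# Seconds a running thread may hold the GIL before a switch is requested
interval = sys.getswitchinterval()
print("Switch interval: %.3f seconds" % interval)
```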

3.1 Define a CPU-intensive function

This function sums the integers from 1 to x.

def cpu_bound_func(x):
    tot = 0
    a = 1
    while a <= x:
        tot += a
        a += 1
    print("Finish sum from 1 to %d!" % x)
    return tot
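A loop that sums 1 to x can be checked against the closed form x(x + 1)/2. The sketch below assumes the loop adds the counter a on each pass; sum_to and sum_to_closed_form are hypothetical helper names used only for this check, and the print call is left out to keep the output short.

```python
def sum_to(x):
    """Sum the integers 1..x with an explicit loop."""
    tot = 0
    a = 1
    while a <= x:
        tot += a  # add the counter, not the upper bound
        a += 1
    return tot

def sum_to_closed_form(x):
    """Same sum via the formula x * (x + 1) / 2."""
    return x * (x + 1) // 2

for n in (1, 10, 1000):
    assert sum_to(n) == sum_to_closed_form(n)

print(sum_to(10))  # 55
```

The closed form is of course far faster; the loop is kept in this article because the goal is to keep the CPU busy.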

3.2 Use serial processing

Iterate over all the elements of the list and call cpu_bound_func on each one.

def process_array(arr):
    for x in arr:
        cpu_bound_func(x)

3.3 Use multi-threading

Using the thread pool’s map method, you can apply the same function to all elements in the list.

def fast_process_array(arr):
    for x in pool.map(cpu_bound_func, arr):
        pass

3.4 Calculate the running time of the function

  • The serial version runs in about 2.1 seconds
  • The multithreaded version runs in about 2.2 seconds

def time_it(fn, *args):
    start = time()
    fn(*args)
    print("The %s version runs in %.5f seconds!" % (fn.__name__, time() - start))
time_it(process_array, [10**7, 10**7, 10**7])

Finish sum from 1 to 10000000!
Finish sum from 1 to 10000000!
Finish sum from 1 to 10000000!
The process_array version runs in 2.10489 seconds!

time_it(fast_process_array, [10**7, 10**7, 10**7])

Finish sum from 1 to 10000000!
Finish sum from 1 to 10000000!
Finish sum from 1 to 10000000!
The fast_process_array version runs in 2.20897 seconds!
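Timings like these vary from machine to machine, so it can help to rerun the comparison yourself. The sketch below is one way to do that under a few assumptions: measure, serial, threaded and loop_sum are hypothetical names, perf_counter is used in place of time because it is designed for measuring short intervals, and the inputs are smaller than in the article so the script finishes quickly. It makes no claim about which version wins on your hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

def loop_sum(x):
    """CPU-bound work: sum the integers 1..x in a Python loop."""
    tot = 0
    for a in range(1, x + 1):
        tot += a
    return tot

def measure(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = perf_counter()
    result = fn(*args)
    return result, perf_counter() - start

def serial(arr):
    return [loop_sum(x) for x in arr]

def threaded(arr):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(loop_sum, arr))

inputs = [10**5] * 3
serial_result, serial_secs = measure(serial, inputs)
threaded_result, threaded_secs = measure(threaded, inputs)
print("serial:   %.5f seconds" % serial_secs)
print("threaded: %.5f seconds" % threaded_secs)
```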

References

1. The GIL lock in Python: https://www.jianshu.com/p/c75ed8a6e9af

2. What is CPU intensive and IO intensive?: https://www.cnblogs.com/tusheng/articles/10630662.html

Author: Li Xiaowen has worked in data analysis and data mining, develops mainly in Python, and now works as an algorithm engineer at a small Internet company.

Github: github.com/tushushu
