0/ Outline of this article

<1> Why use multiprocessing when we already have threading
<2> An overview of multiprocessing
<3> Hands-on code: comparing single-thread, multi-thread, and multi-process speed on a CPU-intensive task
<4> Finally, an explanation of the pooling technique

1/ Why use multiprocessing when we already have threading

GIL: CPython releases the GIL periodically, based on the number of bytecode instructions executed in older versions and on a time slice (the switch interval) in Python 3.2 and later. The multiprocessing module was introduced to work around this GIL limitation: it runs multiple processes, each with its own interpreter and its own GIL, so work can truly execute in parallel on multiple CPUs (cores).
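
A minimal sketch of inspecting and tuning that switch interval (the values here are just the CPython defaults):

import sys

# In CPython 3.2+ the forced GIL hand-off is time based; the default
# interval is 0.005 seconds and it can be inspected or changed at runtime.
print(sys.getswitchinterval())   # usually 0.005
sys.setswitchinterval(0.01)      # let each thread hold the GIL a bit longer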

For IO-intensive tasks, Python multithreading still pays off: even though the GIL means the threads run concurrently rather than in parallel, time spent waiting on IO lets the other threads make progress. For example, thread1 runs first; when it hits an IO operation it releases the GIL, and thread2 acquires the GIL and runs while thread1 is waiting on the IO. For CPU-intensive tasks with little or no IO, however, multithreading hurts: the interpreter still forces a GIL switch after a short interval (historically every 100 bytecode ticks, roughly a 5 ms time slice in modern CPython), so the threads merely take turns on one core and the switching itself becomes extra overhead that slows execution down. CPU-intensive tasks are therefore a poor fit for multithreading.
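
To make the IO-intensive case concrete, here is a minimal sketch (the sleep call stands in for real IO and the timings are only indicative): each sleeping thread releases the GIL, so the ten waits overlap instead of running back to back.

import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_task(n):
    time.sleep(0.5)        # simulated IO wait; the GIL is released while sleeping
    return n

start = time.time()
for i in range(10):
    fake_io_task(i)
print('sequential:', time.time() - start)    # roughly 5 seconds

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(fake_io_task, range(10)))
print('threaded  :', time.time() - start)    # roughly 0.5 seconds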

2/ An overview of multiprocessing

The multiprocessing module provided by Python deliberately mirrors the threading module: the syntax is almost identical. The Python developers did this on purpose so that code can be migrated between threads and processes almost seamlessly, as the sketch below shows.
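
A minimal sketch of that parity (the work function here is purely illustrative): moving from a thread to a process only means swapping the class.

import threading
import multiprocessing

def work(name):
    print('hello from', name)

if __name__ == '__main__':
    # Same constructor arguments, same start()/join() lifecycle.
    t = threading.Thread(target=work, args=('a thread',))
    p = multiprocessing.Process(target=work, args=('a process',))
    t.start(); p.start()
    t.join(); p.join()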

3/ Hands-on code: comparing single-thread, multi-thread, and multi-process speed on a CPU-intensive task

import time 
import math
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

PRIMES = [112272535095293] * 100  # a list of 100 large numbers; this value is prime, so the trial division below does real work

# check whether a number is prime
def func_is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False

    return True

# single thread
def single_thread():
    for i in PRIMES:
        func_is_prime(i)

# multithreaded
def multi_thread():
    with ThreadPoolExecutor() as pool:
        pool.map(func_is_prime, PRIMES)

# multiple processes
def multi_process():
    with ProcessPoolExecutor() as pool:
        pool.map(func_is_prime, PRIMES)
      
      
if __name__ == "__main__":
    start_time = time.time()
    single_thread()
    end_time = time.time()
    print('Single thread time is:', end_time - start_time)

    start_time = time.time()
    multi_thread()
    end_time = time.time()
    print('Multithreading takes:', end_time - start_time)

    start_time = time.time()
    multi_process()
    end_time = time.time()
    print('Multi-process time is:', end_time - start_time)
    
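On a typical multi-core machine the multi-process version should come out clearly fastest, while the multi-threaded version takes about as long as (or slightly longer than) the single-threaded one, since the GIL keeps the threads from computing in parallel and the forced switching adds overhead. The exact numbers depend on your CPU and core count.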

4/ Finally, an explanation of the pooling technique

<1> Base class Executor

The Executor class is the base class of ThreadPoolExecutor and ProcessPoolExecutor. It provides the following methods:
(1) submit(fn, *args, **kwargs): submits a task. It is executed as fn(*args, **kwargs) and a Future object is returned immediately. fn is the callable, *args are its positional arguments, and **kwargs are its keyword arguments.
(2) map(func, *iterables, timeout=None, chunksize=1):
   1) func: the callable to run.
   2) iterables: one or more iterables whose elements are passed to func as its arguments.
   3) timeout: the maximum number of seconds to wait for each result when iterating over the returned iterator; if None (the default), there is no time limit.
   4) chunksize: with ProcessPoolExecutor, map() splits the iterables into chunks and submits each chunk to the pool as a separate task. For very long iterables, raising chunksize above the default of 1 can significantly improve performance. chunksize has no effect with ThreadPoolExecutor.
(3) shutdown(wait=True): if wait is True, this waits until the pool has finished executing and the resources in use have been freed. If wait is False, it returns immediately, and the resources are freed once all pending futures have finished executing. Regardless of the value of wait, the whole Python program will not exit until all pending tasks have finished executing.
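
A minimal sketch of submit() versus map(), assuming the func_is_prime function and PRIMES list from section 3 are in scope:

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        # submit() schedules a single call and returns a Future right away
        future = pool.submit(func_is_prime, PRIMES[0])
        print(future.result())                  # blocks until this one task is done

        # map() schedules one call per element and yields results in order;
        # chunksize only matters for a process pool
        results = pool.map(func_is_prime, PRIMES, chunksize=10)
        print(list(results)[:5])
    # leaving the with-block is equivalent to calling shutdown(wait=True)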

<2> Thread pool object

ThreadPoolExecutor is a subclass of Executor. Its constructor parameters are described below:
class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=())
(1) max_workers: the maximum number of threads in the pool.
(2) thread_name_prefix: the prefix used for worker thread names; by default threads are named like ThreadPoolExecutor-N_M.
(3) initializer: a callable that is invoked at the start of each worker thread (for example to set up per-thread state).
(4) initargs: a tuple of arguments passed to initializer.
Note that apart from max_workers, the other three parameters are rarely used; max_workers simply caps how many threads the pool may run.
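
A minimal sketch of max_workers and thread_name_prefix in action (the task here is purely illustrative):

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def show(n):
    time.sleep(0.1)                            # simulated IO
    print(threading.current_thread().name, 'handled', n)

with ThreadPoolExecutor(max_workers=4, thread_name_prefix='worker') as pool:
    pool.map(show, range(8))                   # thread names come out as worker_0 ... worker_3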

<3> Process pool object

ProcessPoolExecutor is a subclass of Executor.
class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=())
(1) max_workers: the maximum number of worker processes in the pool, i.e. how many processes we intend to start. If max_workers is None or not given, it defaults to the number of processors (CPU cores) on the machine. If max_workers is less than or equal to 0, a ValueError is raised. On Windows, max_workers must be less than or equal to 61, otherwise ValueError is raised; there, even when max_workers is None, the default is capped at 61 no matter how many processors are present.
(2) mp_context: a multiprocessing context, or None. It is used to start the worker processes. If mp_context is None or not given, the default multiprocessing context is used.
(3) initializer: a callable that is invoked at the start of each worker process.
(4) initargs: a tuple of arguments passed to initializer.
initializer and initargs work just as they do for ThreadPoolExecutor, so there is nothing more to add about them.
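
A minimal sketch of sizing the pool explicitly and passing an mp_context, again assuming func_is_prime and PRIMES from section 3 are in scope:

import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')     # choose the start method explicitly via mp_context
    n = os.cpu_count()                             # what max_workers=None falls back to
    with ProcessPoolExecutor(max_workers=n, mp_context=ctx) as pool:
        results = list(pool.map(func_is_prime, PRIMES, chunksize=10))
    print(results.count(True), 'of', len(PRIMES), 'numbers are prime')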