• Intro to Threads and Processes in Python
  • Brendan Fortuner
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: lsvih
  • Proofreader: Yqian1991

Beginner’s guide to parallel programming

In Kaggle’s Understanding the Amazon from Space contest, I tried to speed up various parts of my code. Speed is of the essence in Kaggle. High rankings often require trying hundreds of model architectures and hyperparameter combinations, and saving 10 seconds in an epoch that lasts a minute is a huge win.

To my surprise, data processing was the biggest bottleneck. I was using Numpy’s rotate, flip, scale and crop operations on the CPU. Numpy and Pytorch’s DataLoader apply parallel processing in some cases. I would run three to five experiments at a time, each with its own data processing. But this seemed inefficient, and I wanted to see whether parallel processing could speed up all my experiments.

What is parallel processing?

Parallel processing can be as simple as doing two things at the same time: running code on different CPUs, or making use of otherwise “wasted” CPU cycles while the program waits on external resources (file loads, API calls, and so on).

The following example is a “normal” program. It uses a single thread to download a list of URLs one at a time.

Here is the same program with two threads. It splits the list of URLs between the threads, almost doubling the processing speed.

If you are curious about how to draw the diagram above, you can refer to the source code, and here is a brief introduction:

  1. Add a timer to your function and return the function’s start and end times:

URLS = [url1, url2, url3, ...]

def download(url, base):
    start = time.time() - base
    resp = urlopen(url)
    stop = time.time() - base
    return start, stop
  2. Execute your function multiple times and store each pair of start and end times; a single-threaded program can then be visualized from them:

results = [download(url, 1) for url in URLS]
  3. Transpose the array of [start, stop] results to draw a bar chart:

def visualize_runtimes(results):
    start, stop = np.array(results).T
    plt.barh(range(len(start)), stop - start, left=start)
    plt.grid(axis='x')
    plt.ylabel("Tasks")
    plt.xlabel("Seconds")

Multithreaded diagrams are generated similarly. Python’s concurrent libraries can also return result arrays.
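To tie the three steps together, here is a runnable sketch of the same idea with a `time.sleep` standing in for the download (the names `task` and `BASE` are mine, not from the original notebook, and it assumes Numpy is installed):

```python
import time

import numpy as np

BASE = time.time()

def task(duration):
    # stand-in for download(): record start/stop offsets around some work
    start = time.time() - BASE
    time.sleep(duration)
    stop = time.time() - BASE
    return start, stop

# step 2: run the function several times, collecting (start, stop) pairs
results = [task(0.05) for _ in range(3)]

# step 3: transpose [(s1, e1), (s2, e2), ...] into a row of starts and a row of stops
start, stop = np.array(results).T
print(stop - start)  # each entry is roughly 0.05
```

Feeding `start` and `stop - start` to `plt.barh` as in `visualize_runtimes` above then produces one horizontal bar per task.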

Process vs. Thread

A process is an instance of a program (such as a Jupyter notebook or the Python interpreter). A process spins off threads to handle subtasks (capturing keystrokes, loading HTML pages, saving files, and so on). Threads live inside a process and share the same memory space.

Example: Microsoft Word. When you open Word, you create a process. When you start typing, the process spins up several threads: one to capture keyboard input, one to display text, one to auto-save the file, and one to spell-check. By running these threads, Word makes better use of idle CPU time (time spent waiting for keyboard input or for files to load) and makes you more productive.

Processes

  • Created by the operating system to run the program
  • A process can contain multiple threads
  • Two processes can execute code simultaneously in a Python program
  • Starting and terminating a process takes more time, so processes are more expensive to use than threads
  • Because processes do not share memory, exchanging information between them is much slower than between threads. In Python, passing serialized data structures (such as Numpy arrays) between processes takes time on the order of disk IO.
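A minimal sketch of that memory isolation (the names `worker`, `plain` and `counter` are made up for this example; it uses the POSIX-only `"fork"` start method so it can run without a `__main__` guard):

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # POSIX-only start method, keeps the sketch short
plain = []                    # an ordinary object: each process gets its own copy
counter = ctx.Value("i", 0)   # explicitly shared memory

def worker(shared, items):
    items.append(1)           # modifies only the child's copy
    with shared.get_lock():
        shared.value += 1     # shared memory: visible to the parent

p = ctx.Process(target=worker, args=(counter, plain))
p.start()
p.join()
print(plain, counter.value)   # [] 1 -- the parent's list is untouched
```

Ordinary objects have to be serialized and sent between processes explicitly; only structures like `multiprocessing.Value` are genuinely shared.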

Threads

  • Threads are like mini-processes that live inside a process
  • Different threads share the same memory space and can read and write the same variables efficiently
  • Two threads cannot execute code simultaneously in the same Python program (there is a workaround for this*)
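The shared memory space is easy to see in a short sketch (the names `record` and `results` are mine): every thread appends to the same list, no serialization needed.

```python
import threading

results = []  # one list, visible to every thread in the process

def record(n):
    results.append(n * n)  # all threads write into the same object

threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9]
```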

CPU vs. Core

The CPU, or processor, does the computer’s most basic computational work. A CPU has one or more cores, which allow it to run code in parallel.

If there is only one core, there will be no speedup for CPU-intensive tasks such as loops and calculations: the operating system has to switch back and forth between tasks in tiny time slices. Multitasking can even degrade performance for trivial tasks, such as downloading images, because starting and maintaining multiple tasks carries its own overhead.

Python GIL lock problem

CPython (the standard implementation of Python) has a thing called the GIL (Global Interpreter Lock) that prevents two threads from executing simultaneously in the same program. Some people dislike it very much, while others like it. There are workarounds, and libraries like Numpy mostly get around the limitation by executing external C code.
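You can see the GIL in action with a quick timing sketch (a hypothetical `count` function; on a standard GIL build of CPython the threaded run is not faster, since only one thread executes Python bytecode at a time):

```python
import threading
import time

def count(n):
    # a pure-Python loop holds the GIL while it runs
    total = 0
    for _ in range(n):
        total += 1
    return total

N = 2_000_000

t0 = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
# on a GIL build the two timings come out roughly the same
```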

When to use threads and when to use processes?

  • Because each process has its own interpreter (and therefore its own GIL) and can run on a separate core, multiprocessing can speed up CPU-intensive Python programs.
  • Multithreading works well for IO tasks or tasks involving external systems, because threads can interleave their work efficiently. Processes need to serialize their results to combine them, which takes extra time.
  • Because of the GIL, multithreading does not help CPU-intensive Python programs.

* For certain operations, such as dot products, Numpy bypasses Python’s GIL lock and can execute code in parallel.

Parallel processing example

Python’s concurrent.futures library is a breeze to use. You simply pass it functions, a list of objects to be processed, and the number of concurrent requests. In the sections that follow, I’ll demonstrate when to use threads and when to use processes with several experiments.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def multithreading(func, args, workers):
    with ThreadPoolExecutor(workers) as ex:
        res = ex.map(func, args)
    return list(res)

def multiprocessing(func, args, workers):
    with ProcessPoolExecutor(workers) as ex:
        res = ex.map(func, args)
    return list(res)
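As a quick usage sketch, the thread-pool helper can be exercised with any function and argument list (`square` is a made-up stand-in task; `ex.map` returns results in input order):

```python
from concurrent.futures import ThreadPoolExecutor

def multithreading(func, args, workers):
    with ThreadPoolExecutor(workers) as ex:
        return list(ex.map(func, args))

def square(x):  # a made-up stand-in task
    return x * x

print(multithreading(square, range(5), workers=2))  # [0, 1, 4, 9, 16]
```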

API call

For API calls, multithreading is significantly faster than serial and multi-process processing.

from urllib.request import urlopen

def download(url):
    try:
        resp = urlopen(url)
    except Exception as e:
        print('ERROR: %s' % e)
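The reason threads shine here is that waiting on a network socket releases the GIL, so the waits overlap. A self-contained sketch with `time.sleep` standing in for the network call (`fake_download` is a made-up name, so no network access is needed):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(i):
    # time.sleep releases the GIL, just like waiting on a socket would
    time.sleep(0.2)
    return i

t0 = time.perf_counter()
with ThreadPoolExecutor(4) as ex:
    results = list(ex.map(fake_download, range(4)))
elapsed = time.perf_counter() - t0
print(results, f"{elapsed:.2f}s")  # four 0.2 s waits overlap into roughly 0.2 s
```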

Two threads

Four threads

Two processes

Four processes

IO intensive task

I passed in a huge chunk of text to measure the write performance of threads and processes. Threads fared better here, but multiprocessing also sped things up.

def io_heavy(text):
    f = open('output.txt', 'wt', encoding='utf-8')
    f.write(text)
    f.close()
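To run the same experiment concurrently it helps to give each task its own file, so writers don’t clobber one another; this variant (the per-task `i` parameter and temp-file path are my additions, not part of the original notebook) is safe to map over a pool:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

TEXT = "some text\n" * 100_000

def io_heavy(text, i):
    # write to a per-task file so concurrent writers don't clobber each other
    path = os.path.join(tempfile.gettempdir(), f"output_{i}.txt")
    with open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    return path

with ThreadPoolExecutor(4) as ex:
    paths = list(ex.map(lambda i: io_heavy(TEXT, i), range(4)))
print(len(paths))  # 4
```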

Serial

%timeit -n 1 [io_heavy(TEXT) for i in range(N)]
>> 1 loop, best of 3: 1.37 s per loop

Four threads

Four processes

CPU intensive task

With no GIL in the way and the ability to run code on multiple cores simultaneously, multiprocessing naturally wins out.

def cpu_heavy(n):
    count = 0
    for i in range(n):
        count += i

Serial: 4.2 seconds. 4 threads: 6.5 seconds. 4 processes: 1.9 seconds.
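A runnable process-pool sketch of this experiment (it uses the POSIX-only `"fork"` start method so the snippet can run without a `__main__` guard; on Windows you would guard it and let the default start method pickle `cpu_heavy` for the workers):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    count = 0
    for i in range(n):
        count += i
    return count

# "fork" (POSIX-only) lets this run at top level without a __main__ guard
ctx = mp.get_context("fork")
with ProcessPoolExecutor(2, mp_context=ctx) as ex:
    results = list(ex.map(cpu_heavy, [100_000, 100_000]))
print(results)  # [4999950000, 4999950000]
```

Each worker process runs its loop on its own core with its own GIL, which is where the speedup over threads comes from.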

Numpy dot product

Not surprisingly, neither multithreading nor multiprocessing helps this code: Numpy already executes external C code behind the scenes, bypassing the GIL.

def dot_product(i, base):
    # a and b are large Numpy arrays defined elsewhere in the notebook
    start = time.time() - base
    res = np.dot(a, b)
    stop = time.time() - base
    return start, stop

Serial: 2.8 seconds. 2 threads: 3.4 seconds. 2 processes: 3.3 seconds.

The Notebook with these experiments is available here if you want to reproduce them yourself.

The related resources

Here are some of the references I used while exploring this topic. This blog post by Nathan Grigg comes especially recommended and inspired the visualizations.

  • Multiprocessing vs Threading Python: I am trying to understand the advantages of multiprocessing over threading. I know that multiprocessing gets around the…

  • multithreaded blas in python/numpy: I re-run the benchmark on our new HPC. Both the hardware as well as the software stack changed from the setup in…

  • Amdahl’s law – Wikipedia: In Computer architecture, Amdahl’s law (or Amdahl’s argument) is a formula which gives the theoretical speedup In…

  • How Linux handles threads and process scheduling: The Linux kernel scheduler is actually scheduling tasks, and these are either threads or (single-threaded) processes…

  • Optimal number of threads per core: Let’s say I have a 4-core CPU, and I want to run some process in the minimum amount of time. The process is cited…

  • How many threads is too many?: I am writing a server, and I branch each action off into a thread when a request comes in. I do this because almost…
