Concurrency and parallelism | Implementing multithreading (threading) and multiprocessing (multiprocessing) in Python

Last night it was my turn to present technical content at our group meeting. I have been working with Ray and Spark lately, so I decided to talk about concurrency and parallelism. We are all in the school of management and rarely touch this kind of thing, so the topic was unlikely to get laughed at for being too basic.

This article covers concurrency and parallelism, with some very simple Python code using the built-in threading and multiprocessing libraries.

Concurrency and parallelism

To put it simply: we use multiple threads for concurrency and multiple processes for parallelism. Multithreaded concurrency emphasizes making full use of the performance you already have; multi-process parallelism emphasizes raising the performance ceiling.

Let me illustrate this with a very simple and lax metaphor.

Multithreading

One CPU is equal to one student.

A student attends a group meeting once a week; in other words, they report to their advisor once a week.

Advisors usually assign students several tasks at the same time, such as entering a competition, working on a project, and reading a paper. A student may work on the competition on Monday, read the paper on Tuesday, and work on the project on Wednesday... At the group meeting, he reports on all three. The advisor is very pleased, because from the advisor's perspective the student has been doing all three things at the same time.

Multithreading is the same idea. Say your phone has only a single CPU. The CPU spends the first 0.01 seconds playing music, the next 0.01 seconds parsing a web page... From your point of view, playing music and loading the page happen simultaneously: you can surf the Internet while listening to music.

What does "taking full advantage of performance" mean? If the student had only one task, he or she might spend just two days of the week on it and sit idle the rest of the time. So we use "multithreading" to let the student achieve "concurrency" and make full use of the student's capacity.

In practice, the words multithreading and high concurrency come up more often around server-side programs. For example, with one thread per network connection, a single CPU can handle many requests asynchronously, greatly improving CPU utilization.
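To make the "full utilization" point concrete, here is a minimal sketch (my addition, not from the original post) where each task mostly waits, simulated with time.sleep standing in for I/O. Three threads overlap their waiting, so the total wall time is close to one task's wait rather than three:

```python
import threading
import time

def fake_request(i: int) -> None:
    # stand-in for an I/O-bound job (network call, disk read, ...)
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=fake_request, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"3 overlapping waits took about {elapsed:.2f} s")  # ~0.2 s, not 0.6 s
```

While one thread is blocked waiting, the CPU is free to run the others, which is exactly why thread-per-connection servers keep a single CPU busy.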

Multiple processes

Multiple CPUs (multiple cores of CPUs) are equivalent to multiple students.

If a task can be broken down into several cooperating subtasks that run simultaneously, that is multi-processing.

For example, in a graduate course the teacher assigns a term paper as a big group assignment. (I'm in grad school, why are there still big assignments...)

Let's write it in parallel, "multi-process" style: once the general outline is settled, everyone just writes their own part. There are four students on our team, so:

  • Student A is responsible for the Introduction
  • Student B is responsible for Background
  • Student C is responsible for Related Works
  • Student D is responsible for Methodology

At this point student B raises an objection: shouldn't we finish the Introduction first and then write the Background, doing the parts one by one?

Come on, we're all grad students; the assignment just needs to get done, more or less. Write whatever you're told to write.

Predictably, working on all four parts at the same time is faster than one person writing all four alone.

So multi-process parallelism raises the performance ceiling.

In practice, multi-processing is more associated with high-performance computing and distributed computing.

Python implementation

Let’s first state our experimental environment.

> python --version
Python 3.8.5

Let's set up a task: computing Euler's totient function φ(n) of a number.

def euler_func(n: int) -> int:
    # Euler's totient via trial-division factorization:
    # phi(n) = n * prod(1 - 1/p) over distinct prime factors p
    res = n
    i = 2
    while i <= n // i:  # i.e. i * i <= n
        if n % i == 0:  # i is a prime factor
            res = res // i * (i - 1)
            while n % i == 0:
                n = n // i  # strip this factor completely
        i += 1
    if n > 1:  # leftover prime factor larger than sqrt of the original n
        res = res // n * (n - 1)
    return res
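As a quick sanity check (my addition, repeating the definition so the snippet runs standalone), the function matches known totient values:

```python
def euler_func(n: int) -> int:
    # Euler's totient via trial division (same function as above)
    res = n
    i = 2
    while i <= n // i:
        if n % i == 0:
            res = res // i * (i - 1)
            while n % i == 0:
                n = n // i
        i += 1
    if n > 1:
        res = res // n * (n - 1)
    return res

# phi(7) = 6 (7 is prime), phi(10) = 4 ({1,3,7,9}), phi(12) = 4 ({1,5,7,11})
print(euler_func(7), euler_func(10), euler_func(12))  # 6 4 4
```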

Computing the totient of one number might be quick, but what about a whole batch of numbers?

So I wanted to do it in parallel.

Let’s divide the task into three parts.

from typing import List

task1 = list(range(2, 50000, 3))  # 2, 5, 8, ...
task2 = list(range(3, 50000, 3))  # 3, 6, 9, ...
task3 = list(range(4, 50000, 3))  # 4, 7, 10, ...

def job(task: List[int]):
    for t in task:
        euler_func(t)

Let’s take a look at a normal serial.

@timer
def normal():
    job(task1)
    job(task2)
    job(task3)

Finish task1, then task2... Yes, that's all there is to it.

What about multithreading?

import threading as th

@timer
def mutlthread():
    th1 = th.Thread(target=job, args=(task1, ))
    th2 = th.Thread(target=job, args=(task2, ))
    th3 = th.Thread(target=job, args=(task3, ))

    th1.start()
    th2.start()
    th3.start()

    th1.join()
    th2.join()
    th3.join()

How about multiple processes?

import multiprocessing as mp

@timer
def multcore():
    p1 = mp.Process(target=job, args=(task1, ))
    p2 = mp.Process(target=job, args=(task2, ))
    p3 = mp.Process(target=job, args=(task3, ))

    p1.start()
    p2.start()
    p3.start()

    p1.join()
    p2.join()
    p3.join()

The logic of the above code is:

  • We create threads/processes whose job is to run job(task1), job(task2), and job(task3). Notice that the function and its arguments are passed separately: target=job, args=(task1, ).
  • Calling start() tells the thread/process to begin working.
  • They then do their own work, while the main logic of our program keeps going.
  • Calling join() means the thread/process blocks our main logic. For example, p1.join() means: until p1 finishes its work, the main logic will not proceed (it "blocks").
  • So by the time our function multcore returns, all the thread/process tasks must have completed.

Let’s look at the results:

if __name__ == '__main__':
    print("normal")
    normal()
    print("mutlthread")
    mutlthread()
    print("multcore")
    multcore()

Each call prints a line such as timer: using 0.246 s.

Do the results make sense? It stands to reason that multi-process parallelism should take about one third as long as the synchronous serial version, since three tasks of the same size run simultaneously.

Multithreaded concurrency should be a bit slower than the synchronous serial version: it has the same total compute available (in CPython, the GIL lets only one thread execute Python bytecode at a time), yet it also has to switch back and forth between tasks.

You ask what @timer means? That's a decorator I wrote, shown below.

import time
from functools import wraps

def timer(func):
    @wraps(func)
    def inner_func(*args, **kwargs):
        t = time.time()
        rts = func(*args, **kwargs)
        print(f"timer: using {time.time() - t :.5f} s")
        return rts
    return inner_func
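A minimal usage sketch (my addition; slow_add is a made-up example function, and the sleep just makes the timing visible):

```python
import time
from functools import wraps

def timer(func):
    # decorator: print how long each call to func takes
    @wraps(func)
    def inner_func(*args, **kwargs):
        t = time.time()
        rts = func(*args, **kwargs)
        print(f"timer: using {time.time() - t :.5f} s")
        return rts
    return inner_func

@timer
def slow_add(a: int, b: int) -> int:
    time.sleep(0.1)  # pretend this is expensive
    return a + b

result = slow_add(2, 3)  # prints something like: timer: using 0.10012 s
print(result)  # 5
```

Thanks to @wraps, the decorated function keeps its original name and docstring.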

If you don't understand Python decorators yet, just tap "Looking" and follow me; we'll go over them in detail another time.

I’m Pai, WeChat PiperLHJ, thanks for watching.