Multi-process programming in Python is very similar to multi-threading; in most cases only a few function names change. Since multi-threading was covered earlier, in many places I will only show code and emphasize the differences.

This article is divided into the following parts

  • Instructions in advance
  • The simplest multi-process
  • The class form
  • Process pools
  • Memory independence between processes
  • The queue
  • pipe
  • value
  • Process locks

Instructions in advance

There are two things to keep in mind when writing code

  • When using multiple processes, it is best to put the code in a .py file and run it from the command line; in Jupyter you often do not get the expected results
  • The code that creates processes must be placed inside if __name__ == '__main__'

The simplest multi-process

import multiprocessing
import time

def myfun(num):
    time.sleep(1)
    print(num + 1)

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=myfun, args=(i, ))
        p.start()

In addition, join, is_alive, daemon, name, and current_process work the same as in multi-threading.
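A minimal sketch of those attributes and methods (the function and process names here are just placeholders):

```python
import multiprocessing
import time

def work():
    time.sleep(0.5)

if __name__ == '__main__':
    p = multiprocessing.Process(target=work, name='worker-1')
    p.daemon = True               # daemon processes are killed when the main process exits
    p.start()
    print(p.name, p.is_alive())   # still alive: the child is sleeping
    p.join()                      # block until the child finishes
    print(p.name, p.is_alive())   # no longer alive after join
```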

The class form

import multiprocessing
import requests
from bs4 import BeautifulSoup

class MyProcess(multiprocessing.Process):

    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        url = 'https://movie.douban.com/top250?start={}&filter='.format(self.i*25)
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        lis = soup.find('ol', class_='grid_view').find_all('li')
        for li in lis:
            title = li.find('span', class_="title").text
            print(title)

if __name__ == '__main__':
    for i in range(10):
        p = MyProcess(i)
        p.start()

Process pools

import requests
from bs4 import BeautifulSoup
from multiprocessing import Pool, current_process

def get_title(i):
    print('start', current_process().name)
    title_list = []
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        title_list.append(title)
        print(title)
    return title_list

if __name__ == '__main__':
    pool = Pool()
    for i in range(10):
        pool.apply_async(get_title, (i, ))
    pool.close()
    pool.join()
    print('finish')

A few clarifications

  • If Pool is created without specifying the number of processes, the number of CPU cores is used by default
  • That number corresponds to the number of logical processors shown in Task Manager → Performance, not the number of physical cores (my computer has 2 cores and 4 logical processors, so 4 processes are used by default)
  • This does not mean the maximum number of processes is 4; the number of processes can be hundreds or thousands, and Pool(10) starts 10 worker processes at the same time for fetching
  • However, whether multi-threaded or multi-process, opening too many adds switching overhead and reduces efficiency, so be careful not to create too many threads or processes
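The points above can be illustrated with a small sketch (square is just a placeholder function): cpu_count() reports the number of logical processors, and Pool(4) starts four workers regardless of that count.

```python
from multiprocessing import Pool, cpu_count

def square(x):
    return x * x

if __name__ == '__main__':
    print('logical processors:', cpu_count())  # the default pool size
    # explicitly start 4 worker processes, independent of the core count
    with Pool(4) as pool:
        print(pool.map(square, range(10)))
```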

Memory independence between processes

The biggest difference between multi-process and multi-thread is that in multi-processing each process gets its own copy of the variables, and operations in one process do not affect the others. Let’s take a look at the following example

import multiprocessing

zero = 0

def change_zero():
    global zero
    for i in range(3):
        zero = zero + 1
        print(multiprocessing.current_process().name, zero)

if __name__ == '__main__':
    p1 = multiprocessing.Process(target = change_zero)
    p2 = multiprocessing.Process(target = change_zero)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(zero)

The result is as follows

Process-1 1
Process-1 2
Process-1 3
Process-2 1
Process-2 2
Process-2 3
0

The above results show that each of the two child processes incremented its own copy of the value to 3, rather than jointly reaching 6; meanwhile, the value in the main process remains 0. Each process copies the data and works on it independently, without sharing the results with other processes.

But writing files is different

import multiprocessing

def write_file():
    for i in range(30):
        with open('try.txt', 'a') as f:
            f.write(str(i) + ' ')

if __name__ == '__main__':
    p1 = multiprocessing.Process(target = write_file)
    p2 = multiprocessing.Process(target = write_file)
    p1.start()
    p2.start()
    p1.join()
    p2.join()

The resulting try.txt file reads as follows

0 12 3 4 5 6 7 8 9 10 11 12 13 14 0 15 2 16 17 3 4 18 19 5 20 6 21 22 8 9 23 10 11 25 26 12 13 27 28 14 29 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29


You can see that both processes write data to the same file.

Now back to the first case: what if you really do want to share variables between two processes?

The queue

This introduces the first form of communication between processes: queues. The multiprocessing module provides multiprocessing.Queue, which differs from queue.Queue in that it encapsulates communication between processes, so different processes can operate on the same multiprocessing.Queue.

from multiprocessing import Process, Queue

def addone(q):
    q.put(1)

def addtwo(q):
    q.put(2)

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=addone, args = (q, ))
    p2 = Process(target=addtwo, args = (q, ))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(q.get())
    print(q.get())

The result is as follows

1
2

This queue is thread- and process-safe, meaning that operations on the queue cannot be interrupted partway and produce incorrect results.
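To illustrate that safety, here is a small sketch (put_range and the counts are made up for the demo): four producer processes put into the same Queue at once, and every item arrives exactly once, with nothing lost or duplicated.

```python
from multiprocessing import Process, Queue

def put_range(q, start):
    # each producer pushes 100 distinct numbers
    for i in range(start, start + 100):
        q.put(i)

if __name__ == '__main__':
    q = Queue()
    ps = [Process(target=put_range, args=(q, k * 100)) for k in range(4)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    items = [q.get() for _ in range(400)]
    print(len(items), len(set(items)))  # 400 items, all distinct
```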

pipe

Pipe is similar in function to Queue and can be interpreted as a simplified Queue. Let’s start with an example

import random
import time
from multiprocessing import Process, Pipe, current_process

def produce(conn):
    while True:
        new = random.randint(0, 100)
        print('{} produce {}'.format(current_process().name, new))
        conn.send(new)
        time.sleep(random.random())

def consume(conn):
    while True:
        print('{} consume {}'.format(current_process().name, conn.recv()))
        time.sleep(random.random())

if __name__ == '__main__':
    pipe = Pipe()
    p1 = Process(target=produce, args=(pipe[0],))
    p2 = Process(target=consume, args=(pipe[1],))
    p1.start()
    p2.start()

The results are as follows

Process-1 produce 24
Process-2 consume 24
Process-1 produce 95
Process-2 consume 95
Process-1 produce 100
Process-2 consume 100
Process-1 produce 28
Process-2 consume 28
Process-1 produce 62
Process-2 consume 62
Process-1 produce 92
Process-2 consume 92
...

Pipe was used above to implement the producer-consumer pattern.

The differences between Queue and pipe are summarized as follows

  • Queueuseput getTo maintain the queue,pipeusesend recvTo maintain the queue
  • pipeOnly two endpoints are provided, whileQueueThere is no limit. That means usepipeOnly two processes can be started at the same time, as above, one producer and one consumer, respectively.Pipe()The two endpoints jointly maintain a queue. If multiple process pairspipeOperating on the same endpoint at the same time, an error occurs (similar to thread unsafe because it is not locked). So the two endpoints provide only two safe places for processes to operate, limiting the number of processes to two
  • QueueThe package is better,QueueProvides only one result, which can be called by many processes simultaneously; whilePipe()Returns two results, to be called separately by two processes
  • QueueImplementation based onpipe, sopipeRunning speed ratioQueueA lot faster
  • Used when only two processes are requiredpipeFaster, used when multiple processes need to operate on the queue at the same timeQueue
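One detail worth adding: the connection returned by Pipe() is duplex by default, so each endpoint can both send and recv. A minimal sketch (echo is a hypothetical helper, not from the example above):

```python
from multiprocessing import Process, Pipe

def echo(conn):
    msg = conn.recv()
    conn.send(msg.upper())  # reply over the same connection
    conn.close()

if __name__ == '__main__':
    parent_end, child_end = Pipe()  # duplex: both ends can send and recv
    p = Process(target=echo, args=(child_end,))
    p.start()
    parent_end.send('hello')
    print(parent_end.recv())  # the child's reply comes back on the same endpoint
    p.join()
```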

value

When we do not want to maintain a whole queue, but just need multiple processes to operate on a single number at the same time, the module provides Value, which can be shared between processes

from multiprocessing import Process, Value

def f1(n):
    n.value += 1

def f2(n):
    n.value -= 2

if __name__ == '__main__':
    num = Value('d', 0.0)
    p1 = Process(target=f1, args=(num, ))
    p2 = Process(target=f2, args=(num, ))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(num.value)

The run result is

-1.0

The 'd' in Value('d', 0.0) is a type code meaning a double-precision floating point number; the full list of type codes is in the official documentation.

In addition to Value, the module also provides a similar Array. Interested readers can check out its usage in the official documentation
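As a taste of Array (a sketch; double_all is a made-up helper): it works like Value but holds a fixed-length sequence of a single type code, and changes made in a child process are visible in the parent.

```python
from multiprocessing import Process, Array

def double_all(arr):
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    arr = Array('i', [1, 2, 3])  # 'i' is the type code for a signed int
    p = Process(target=double_all, args=(arr,))
    p.start()
    p.join()
    print(list(arr))  # the child's changes are visible in the parent
```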

Process locks

Now that variables can be shared between processes, the problem of unsafe simultaneous operations on a variable also arises. As with multi-threading, this is handled with locks, and they are used in the same way as in multi-threading.

lock = multiprocessing.Lock()

lock.acquire()
# ... operate on the shared resource ...
lock.release()

# or equivalently
with lock:
    ...

Their usage and behavior are the same as in multi-threading.

In addition, multiprocessing also provides Semaphore, Condition, Event, and RLock, which likewise work the same as their multi-threading counterparts.
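Putting the lock together with the Value example from above, a sketch (add_many and the counts are illustrative): n.value += 1 is a read-modify-write, so without the lock concurrent updates can be lost.

```python
from multiprocessing import Process, Value, Lock

def add_many(n, lock):
    for _ in range(1000):
        with lock:       # serialize the read-modify-write on the shared number
            n.value += 1

if __name__ == '__main__':
    total = Value('i', 0)
    lock = Lock()
    ps = [Process(target=add_many, args=(total, lock)) for _ in range(4)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    print(total.value)   # 4000 every time; without the lock, some increments may vanish
```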

Welcome to my zhihu column

Column home: Programming in Python
