
Multithreading in Detail

How do I create multiple threads in Python?

  1. Create threads with threading.Thread directly
  2. Create threads by subclassing threading.Thread

Python’s threading module provides a higher-level wrapper around threads to make them easier to use; it builds on the lower-level _thread module.

Example 1: Basic usage of threads

#coding=utf-8
import threading
import time

def xianyu():
    print("Plath the Salted Fish.")
    time.sleep(1)

if __name__ == "__main__":
    for i in range(5):
        t = threading.Thread(target=xianyu)
        t.start()  # Start the thread, i.e. let it begin executing

Output:

Plath the Salted Fish.
Plath the Salted Fish.
Plath the Salted Fish.
Plath the Salted Fish.
Plath the Salted Fish.
[Finished in 1.1s]

Example 2: Use a Thread subclass to create multiple threads

#coding=utf-8
import threading
import time

class MyThread(threading.Thread):
    def run(self):
        for i in range(3):
            time.sleep(1)
            msg = "I'm " + self.name + ' @ ' + str(i)
            print(msg)

def test():
    for i in range(5):
        t = MyThread()
        t.start()

if __name__ == '__main__':
    test()

Output:

I'm Thread-1 @ 0
I'm Thread-2 @ 0
I'm Thread-3 @ 0
I'm Thread-4 @ 0
I'm Thread-5 @ 0
I'm Thread-1 @ 1
I'm Thread-2 @ 1
I'm Thread-4 @ 1
I'm Thread-3 @ 1
I'm Thread-5 @ 1
I'm Thread-1 @ 2
I'm Thread-2 @ 2
I'm Thread-5 @ 2
I'm Thread-4 @ 2
I'm Thread-3 @ 2
[Finished in 3.2s]

What is the difference between multi-threaded and multi-process execution?

  1. A multi-process program runs multiple programs at the same time
  2. A multi-threaded program runs multiple execution paths within a single program at the same time
  3. Threads in the same process share global variables, so they can exchange data without explicit inter-process communication; processes do not share memory
  4. A process is the independent unit of system resource allocation and scheduling; a thread is an entity within a process and the basic unit of CPU scheduling and dispatch, a unit smaller than a process that can run independently. A thread owns almost no system resources of its own, only what is essential to run (a program counter, a set of registers, and a stack), but it shares all the resources owned by its process with the other threads in that process.
  5. Threads are a finer scheduling granularity than processes (they hold fewer resources), so multithreaded programs can be more concurrent.
  6. Each process has its own independent memory space, while the threads of one process share memory, which greatly improves efficiency
  7. A thread cannot execute on its own; it must live inside a process
  8. Threads are cheap to create and switch, but weak for resource management and protection; processes are the opposite

Several states of a thread

During execution, if a thread runs a sleep statement, it enters the blocked state. When the sleep ends, the thread enters the ready state and waits for the scheduler, which selects a ready thread to run.

Refer to the following figure for the specific thread state transitions:

Global variables are shared between threads

Here’s an example:

from threading import Thread
import time

num = 100

def work1():
    global num
    for i in range(3):
        num += 1
    print("----in work1, num is %d---" % num)

def work2():
    global num
    print("----in work2, num is %d---" % num)

print("---num before the threads are created is %d---" % num)
t1 = Thread(target=work1)
t1.start()
# Wait a moment to make sure the work in thread t1 is done
time.sleep(1)
t2 = Thread(target=work2)
t2.start()

Output:

---num before the threads are created is 100---
----in work1, num is 103---
----in work2, num is 103---
[Finished in 1.1s]

Conclusion:

  1. Within a process, global variables are shared by all threads, so multiple threads can share data without extra machinery (an advantage over multiple processes)
  2. The disadvantage is that any thread can modify a global variable at will, and uncoordinated modification by multiple threads corrupts the shared data (i.e. thread unsafety)

What is thread unsafe?

Here’s an example:

from threading import Thread
import time

num = 0

def test1():
    global num
    for i in range(1000000):
        num += 1
    print("---test1---num=%d" % num)

def test2():
    global num
    for i in range(1000000):
        num += 1
    print("---test2---num=%d" % num)

p1 = Thread(target=test1)
p1.start()
# time.sleep(3)
p2 = Thread(target=test2)
p2.start()
print("---num=%d---" % num)

Output with time.sleep(3) commented out:

---num=235159---
---test1---num=1172632
---test2---num=1334237
[Finished in 0.3s]

Output with time.sleep(3) uncommented:

---test1---num=1000000
---num=1014670---
---test2---num=2000000
[Finished in 3.3s]
Copy the code

Suppose num=0. Thread 1 reads num and computes num+1=1, but before it can store the result, the system switches it into the "sleeping" state and schedules thread 2. Thread 2 also reads num=0, computes num+1=1, and stores 1 back into num. When thread 1 wakes up and finishes its increment, it stores 1 as well, based on its stale read. Two increments produce 1 instead of 2, which is not the result we expected.

Failing to control access by multiple threads to the same resource corrupts data and makes the result of running the threads unpredictable. This is the phenomenon of thread unsafety.
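One way to see why num += 1 is not atomic (a sketch assuming CPython; the exact opcode names vary between interpreter versions) is to disassemble the function: the load, add, and store are separate bytecode steps, and a thread switch can happen between any two of them.

```python
import dis

num = 0

def bump():
    global num
    num += 1  # compiles to separate load / add / store bytecodes

# The disassembly shows distinct LOAD_GLOBAL, add (BINARY_ADD or BINARY_OP,
# depending on the Python version), and STORE_GLOBAL instructions.
dis.dis(bump)
```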

How do I avoid thread insecurity?

When multiple threads modify shared data at almost the same time, synchronization control is needed. Thread synchronization ensures that multiple threads access a contended resource safely. The simplest synchronization mechanism is the mutex, which gives a resource a state: locked or unlocked.

When a thread wants to change shared data, it locks the resource first. The resource's state is then "locked" and no other thread can change it. Only after the thread releases the resource, making its state "unlocked", can another thread lock it again. The mutex guarantees that only one thread writes at a time, which ensures data correctness in multithreaded situations.

Here’s an example:

from threading import Thread, Lock
import time

num = 0

def test1():
    global num
    for i in range(1000000):
        # True means block: if the lock is already held, wait here until it is released
        # False means do not block: whether or not the lock was acquired, continue with the code below
        mutexFlag = mutex.acquire(True)
        if mutexFlag:
            num += 1
            mutex.release()
    print("---test1---num=%d" % num)

def test2():
    global num
    for i in range(1000000):
        mutexFlag = mutex.acquire(True)  # True means block
        if mutexFlag:
            num += 1
            mutex.release()
    print("---test2---num=%d" % num)

# Create a mutex; it is unlocked by default
mutex = Lock()

p1 = Thread(target=test1)
p1.start()
p2 = Thread(target=test2)
p2.start()
print("---num=%d---" % num)

Output:

---num=61866---
---test1---num=1861180
---test2---num=2000000

When a thread calls acquire() on a lock, the lock enters the "locked" state and only that thread holds it. If another thread then tries to acquire the lock, it enters the "blocked" state until the owning thread calls release() to unlock it. The thread scheduler then picks one of the blocked threads, grants it the lock, and lets it run.

The advantage of a lock is that it guarantees a critical section of code is executed completely, from beginning to end, by a single thread. The disadvantages are that it prevents concurrent execution of that section, so the locked code effectively runs in a single-threaded model and efficiency drops sharply; and since there can be multiple locks, with different threads holding different locks while trying to acquire the locks held by others, locks can also lead to deadlock.
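The acquire()/release() pairs in the example above can also be written with a with statement, which releases the lock even if the locked code raises an exception. A minimal sketch (the names worker and mutex are just illustrative):

```python
from threading import Thread, Lock

num = 0
mutex = Lock()

def worker():
    global num
    for _ in range(100000):
        with mutex:  # acquire on entry, release on exit, even on exceptions
            num += 1

threads = [Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(num)  # 200000: no increments are lost
```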

What is a deadlock?

When threads share multiple resources, a deadlock can occur if two threads each hold part of the resources while waiting for the resources held by the other.

Here’s an example:

#coding=utf-8
import threading
import time

class MyThread1(threading.Thread):
    def run(self):
        if mutexA.acquire():
            print(self.name+'----do1---up----')
            time.sleep(1)

            if mutexB.acquire():
                print(self.name+'----do1---down----')
                mutexB.release()
            mutexA.release()

class MyThread2(threading.Thread):
    def run(self):
        if mutexB.acquire():
            print(self.name+'----do2---up----')
            time.sleep(1)
            if mutexA.acquire():
                print(self.name+'----do2---down----')
                mutexA.release()
            mutexB.release()

mutexA = threading.Lock()
mutexB = threading.Lock()

if __name__ == '__main__':
    t1 = MyThread1()
    t2 = MyThread2()
    t1.start()
    t2.start()
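Two common ways to avoid a deadlock like the one above (a sketch; lock_a and lock_b mirror mutexA and mutexB) are to have every thread acquire the locks in one global order, or to back off and release what is held when a lock cannot be acquired within a timeout:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def ordered():
    # Every thread acquires A before B, so no circular wait can form.
    with lock_a:
        with lock_b:
            return True

def with_timeout(timeout=0.1):
    # Give up (and release what we already hold) instead of waiting forever.
    if lock_a.acquire(timeout=timeout):
        try:
            if lock_b.acquire(timeout=timeout):
                try:
                    return True
                finally:
                    lock_b.release()
        finally:
            lock_a.release()
    return False
```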

The producer and consumer model

We can address thread synchronization and thread safety through the producer and consumer model.

Python’s queue module provides synchronized, thread-safe queue classes: the FIFO (first in, first out) Queue, the LIFO (last in, first out) LifoQueue, and the PriorityQueue. All of them implement locking primitives (each operation is atomic: it either completes entirely or does not happen) and can be used directly from multiple threads, so a queue can be used to synchronize threads.
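A quick sketch of the three queue classes' ordering (single-threaded here, just to show the order in which each one hands items back):

```python
from queue import Queue, LifoQueue, PriorityQueue

q = Queue()            # FIFO: first in, first out
for x in (1, 2, 3):
    q.put(x)
fifo = [q.get() for _ in range(3)]
print(fifo)            # [1, 2, 3]

lq = LifoQueue()       # LIFO: last in, first out (a stack)
for x in (1, 2, 3):
    lq.put(x)
lifo = [lq.get() for _ in range(3)]
print(lifo)            # [3, 2, 1]

pq = PriorityQueue()   # lowest-valued entry comes out first
for x in (3, 1, 2):
    pq.put(x)
prio = [pq.get() for _ in range(3)]
print(prio)            # [1, 2, 3]
```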

Here’s an example:

#encoding=utf-8
import threading
import time

# Python 2:
# from Queue import Queue
# Python 3:
from queue import Queue

class Producer(threading.Thread):
    def run(self):
        global queue
        count = 0
        while True:
            if queue.qsize() < 1000:
                for i in range(100):
                    count = count +1
                    msg = 'Build product'+str(count)
                    queue.put(msg)
                    print(msg)
            time.sleep(0.5)
class Consumer(threading.Thread):
    def run(self):
        global queue
        while True:
            if queue.qsize() > 100:
                for i in range(3):
                    msg = self.name + 'consumed'+queue.get()
                    print(msg)
            time.sleep(1)
if __name__ == '__main__':
    queue = Queue()
    for i in range(500):
        queue.put('Initial product'+str(i))
    for i in range(2):
        p = Producer()
        p.start()
    for i in range(5):
        c = Consumer()
        c.start()

What is the producer and consumer model?

The producer-consumer model uses a container to break the tight coupling between producers and consumers. Producers and consumers do not communicate with each other directly; they communicate through a blocking queue. After producing data, a producer does not wait for a consumer to process it but simply puts it on the queue; a consumer does not ask a producer for data but takes it straight from the queue. The blocking queue acts as a buffer that balances the processing speeds of producers and consumers, decoupling them from each other. In most design patterns, such a third party is what provides the decoupling.
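The buffering behavior can be shown with a much smaller sketch than the example above (names are illustrative): put() blocks when the bounded queue is full and get() blocks when it is empty, so neither side needs to wait for, or even know about, the other.

```python
import threading
from queue import Queue

buf = Queue(maxsize=2)  # a small bounded blocking queue as the buffer

def producer():
    for i in range(5):
        buf.put(i)             # blocks while the buffer is full

def consumer(out):
    for _ in range(5):
        out.append(buf.get())  # blocks while the buffer is empty

out = []
p = threading.Thread(target=producer)
c = threading.Thread(target=consumer, args=(out,))
p.start(); c.start()
p.join(); c.join()
print(out)  # [0, 1, 2, 3, 4]: FIFO order is preserved end to end
```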

Want to use global variables do not want to lock how to do?

In a multithreaded environment, each thread has its own data. It is better for a thread to use local variables than global ones, because local variables are visible only to the thread itself and cannot affect other threads, while changes to global variables must be protected by a lock.

Here’s an example:

import threading

# Create a global ThreadLocal object:
local_school = threading.local()

def process_student():
    # Get the student bound to the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind student to the ThreadLocal object:
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('salted fish',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Plath',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

Output:

Hello, salted fish (in Thread-A)
Hello, Plath (in Thread-B)

The global variable local_school is a ThreadLocal object; every thread can read and write its student attribute without affecting the others. You can think of local_school as a global variable whose attributes, such as local_school.student, are local to each thread: every thread reads and writes its own copy without interfering with the others.

You can think of local_school as holding a separate dict for each thread. Besides local_school.student, you can bind other attributes, such as local_school.teacher.

The most common use of ThreadLocal is to bind each thread to a database connection, HTTP request, user identity, etc., so that all handlers called by a thread can easily access these resources.
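That per-thread binding can be sketched as follows (a hedged illustration: FakeConn is a hypothetical stand-in for a real database connection, not an API from any library):

```python
import threading

local = threading.local()

def get_conn():
    # Lazily create one (hypothetical) connection per thread; every function
    # the thread calls can reach it via get_conn() without parameter passing.
    if not hasattr(local, 'conn'):
        local.conn = 'FakeConn-' + threading.current_thread().name
    return local.conn

conns = []

def worker():
    conns.append(get_conn())

threads = [threading.Thread(target=worker, name='W%d' % i) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(conns)  # three distinct per-thread values
```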

A ThreadLocal variable is a global variable, but each thread reads and writes its own independent copy without interfering with the others. ThreadLocal also solves the problem of passing parameters between functions within one thread.

Synchronous versus asynchronous?

A synchronous call is like inviting a friend to dinner: your friend is busy, so you stand there and wait, and only when your friend finishes do you go together.

An asynchronous call is like inviting a friend to dinner, and your friend says he knows and will join you later when he is done, so you go off and do something else in the meantime.

Here’s an example:

from multiprocessing import Pool
import time
import os

def test():
    print("---process in the pool---pid=%d,ppid=%d--" % (os.getpid(), os.getppid()))
    for i in range(3):
        print("----%d---" % i)
        time.sleep(1)
    return "hahah"

def test2(args):
    print("---callback func--pid=%d" % os.getpid())
    print("---callback func--args=%s" % args)

pool = Pool(3)
pool.apply_async(func=test, callback=test2)
time.sleep(5)
print("---- main process -pid=%d----" % os.getpid())

Output:

---process in the pool---pid=9401,ppid=9400--
----0---
----1---
----2---
---callback func--pid=9400
---callback func--args=hahah
---- main process -pid=9400----

Note: the callback is executed by the main process. When the child process's task finishes, the main process calls the callback function with the task's return value.

What is GIL lock?

The Python Global Interpreter Lock (GIL) is, simply put, a mutex that allows only one thread to control the Python interpreter at a time. This means that only one thread can be executing Python bytecode at any given moment.

So Python multithreading does not give true parallelism: at any instant, only one thread is executing on the CPU.

For CPU-bound tasks, multiple processes are therefore more efficient than multiple threads.

The Python GIL is often considered an arcane and difficult topic, but remember: as a Python developer, you are really only affected by the GIL if you write C extensions or have computationally intensive multithreaded tasks in your programs.
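A rough way to see the GIL's effect on CPU-bound work (a sketch; the exact timings depend on the machine, so treat the numbers as illustrative): counting down in two threads is usually no faster than doing it twice serially on CPython.

```python
import time
from threading import Thread

def countdown(n):
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
serial = time.perf_counter() - start

start = time.perf_counter()
t1 = Thread(target=countdown, args=(N,))
t2 = Thread(target=countdown, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# On CPython the threaded run is typically about as slow (or slower),
# because the GIL lets only one thread execute bytecode at a time.
print("serial: %.2fs  threaded: %.2fs" % (serial, threaded))
```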