1 / introduction

The CPU does all of the computing in a computer. Think of it as a factory that is always running. Suppose the factory has a limited power supply and can power only one workshop at a time: when one workshop starts, all the others must stop. The implication is that a single CPU core can run only one task (application, process) at a time, which is one reason multi-core CPUs are now popular in laptops. Processes are like the workshops: each represents an individual task that the CPU can handle. At any given moment, a single-core CPU is running exactly one process, and the other processes are paused. The CPU switches between processes so quickly that they appear to run simultaneously. But if there are too many processes, the cost of switching can make the computer sluggish. Inside a workshop, many workers can cooperate to complete a task. (Of course, a workshop can also have just one worker.)

Threads are like the workers in a workshop. A process can contain multiple threads (or just one), in the same way that a task can be completed by one person or by several people cooperating. The workshop's space is shared by its workers, and many rooms are open to every worker. This corresponds to the memory space of a process being shared: every thread can use the shared memory. However, rooms differ in size, and some can hold only one person at a time. A toilet, for example, cannot be entered while someone is inside. This means that while one thread is using a piece of shared memory, other threads must wait for it to finish before they can use it. A simple way to keep people out is to put a lock on the door. Whoever arrives first locks the door; those who arrive later see the lock, line up at the door, and wait for it to open before they enter. This is called a mutex (mutual exclusion lock), and it prevents multiple threads from reading or writing the same memory area at the same time. There are also rooms that can hold up to n people, such as the kitchen. If more than n people want in, the extra people have to wait outside. This is like a memory area that only a fixed number of threads may use at once. The solution here is to hang n keys by the doorway. Each person who goes in takes a key and hangs it back on the way out. Anyone who arrives and finds no keys left on the rack knows they must wait in line at the door. This mechanism is called a semaphore, and it ensures that the threads do not conflict with each other. It is not hard to see that a mutex is a special case of a semaphore (n=1), so a semaphore can always stand in for a mutex. However, because a mutex is simpler and more efficient, it is the usual choice when exclusive access to a resource must be guaranteed.
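The door lock and the key rack map fairly directly onto Python's `threading.Lock` and `threading.Semaphore`. The following is a minimal sketch, not taken from the original article: the names `add_many`, `cook`, `run_all`, and the limit of 3 are made up for illustration.

```python
import threading
import time

counter = 0
counter_lock = threading.Lock()        # the "toilet door": one thread at a time

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:             # lock on entry, unlock on exit
            counter += 1

kitchen = threading.Semaphore(3)       # the "kitchen": 3 keys on the rack
state_lock = threading.Lock()          # protects the two bookkeeping counters
inside = 0
max_inside = 0

def cook():
    global inside, max_inside
    with kitchen:                      # take a key; blocks if none are left
        with state_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.01)               # pretend to work in the kitchen
        with state_lock:
            inside -= 1

def run_all(target, args, count):
    # Start `count` threads running `target` and wait for all of them
    threads = [threading.Thread(target=target, args=args) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_all(add_many, (10_000,), 4)
run_all(cook, (), 10)
print(counter)      # 40000: no increment was lost
print(max_inside)   # at most 3, however the threads were scheduled
```

Replacing `threading.Semaphore(3)` with `threading.Semaphore(1)` would make the kitchen behave like the locked room, which is the "mutex as a special case" point made above.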

2 / summary

Operating system design, therefore, can be boiled down to three points: <1> multiple processes allow several tasks to run at the same time; <2> multiple threads allow a single task to be split into parts that run separately; <3> a coordination mechanism prevents conflicts between processes and between threads on the one hand, and lets them share resources on the other.

Two ways Python gets a multithreaded return value

Create a custom get_result() method by subclassing Thread and overriding run()
  from threading import Thread

  def cal_sum(a, b):
      return a + b

  class MyThread(Thread):
      """Thread subclass that keeps the return value of func."""

      def __init__(self, func, args):
          super().__init__()
          self.func = func
          self.args = args
          self.result = None

      def run(self):
          # Runs in the child thread; store the return value for later
          self.result = self.func(*self.args)

      def get_result(self):
          # Call this after join(); before run() has finished it returns None
          return self.result

  if __name__ == "__main__":
      # Instantiate 2 child threads
      subt1 = MyThread(cal_sum, args=(1, 5))
      subt2 = MyThread(cal_sum, args=(6, 10))
      subt1.start()
      subt2.start()

      # Wait for both child threads to finish
      subt1.join()
      subt2.join()

      res1 = subt1.get_result()
      res2 = subt2.get_result()

      print("Main thread start......")
      print(res1 + res2)
      print("Main thread end......")
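The Thread subclass above is one way to get a return value out of a thread. Another common option in the standard library, which may or may not be the second way the heading refers to, is `concurrent.futures.ThreadPoolExecutor`: `submit()` returns a `Future`, and `Future.result()` blocks until the value is ready. A minimal sketch reusing `cal_sum` from the example above:

```python
from concurrent.futures import ThreadPoolExecutor

def cal_sum(a, b):
    return a + b

# The pool starts the worker threads and joins them when the block exits
with ThreadPoolExecutor(max_workers=2) as executor:
    future1 = executor.submit(cal_sum, 1, 5)
    future2 = executor.submit(cal_sum, 6, 10)
    res1 = future1.result()   # blocks until the first call has returned
    res2 = future2.result()

total = res1 + res2
print(total)  # 22
```

With this approach there is no custom class to maintain, and an exception raised inside `cal_sum` is re-raised by `result()` instead of being lost in the child thread.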

Why multithreading

Threads are concurrent flows of execution. Compared with separate processes, the threads within one process are less isolated from each other and cheaper to switch between. They share memory, file handles, and the other state that belongs to the process. Because a thread is lighter than a process, a multithreaded program can achieve more concurrency by switching between threads. Each process has its own independent memory space while it runs, whereas the threads of a process share that memory, which can make the program considerably more efficient.
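To make the "threads share memory" point concrete, here is a small sketch in which three threads write into the same list object. The names `shared` and `worker` are illustrative only.

```python
import threading

shared = []                      # one list, living in the process's memory

def worker(name):
    shared.append(name)          # every thread sees the same list object

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The arrival order depends on scheduling, so sort before printing
print(sorted(shared))            # ['t0', 't1', 't2']
```

Separate processes would each get their own copy of `shared`, so the parent would need an explicit channel such as a queue or pipe to see what the children wrote.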

Main thread, child thread, daemon thread

<1> The two examples below use setDaemon(True) to turn the child thread into a daemon thread of the main thread, so that when the main thread ends, the child thread ends with it. (Since Python 3.10, setDaemon() is deprecated in favor of assigning the attribute directly: t.daemon = True.) When the main thread ends, the entire program exits. In the first example, the output shows that the child thread is terminated as soon as the main thread finishes.
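    import threading
    import time

    def f(n):
        print("task", n)
        time.sleep(1)

        print(3)
        time.sleep(1)

        print(2)
        time.sleep(1)
        print(1)

    if __name__ == "__main__":
        subt = threading.Thread(target=f, args=("t1",))
        # Must be set before start(); same effect as the older subt.setDaemon(True)
        subt.daemon = True
        subt.start()

        # The main thread does not wait, so the program exits right after this line
        print("end")
The output of the above code is typically "task t1" followed by "end". The countdown 3, 2, 1 is never printed, because the daemon thread is killed when the main thread exits.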

<2> The main thread waits for the child thread to finish. To keep the main thread from finishing before the daemon thread does, call join() on the child thread; this blocks the main thread until the child thread has finished.
    import threading
    import time

    def main(n):
        print("task", n)
        time.sleep(1)
        print(3)
        time.sleep(1)
        print(2)
        time.sleep(1)
        print(1)

    if __name__ == "__main__":
        t = threading.Thread(target=main, args=("t1",))
        # Must be set before start(); same effect as the older t.setDaemon(True)
        t.daemon = True
        t.start()
        # Block the main thread until the child thread has finished
        t.join()
        print("end")
The output of the above code is "task t1", then 3, 2, 1, and finally "end".
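join() also accepts an optional timeout, and is_alive() reports whether the thread is still running. The sketch below shows both. The `release` event is a hypothetical stand-in for real work, used only so the timing does not depend on sleep().

```python
import threading

release = threading.Event()      # hypothetical stand-in for "the work is done"

def work(log):
    release.wait()               # the child blocks here until main sets the event
    log.append("child done")

log = []
t = threading.Thread(target=work, args=(log,), daemon=True)
t.start()

t.join(timeout=0.05)             # wait at most 50 ms, then give up
still_running = t.is_alive()     # True: the child is still blocked on the event

release.set()                    # let the child finish
t.join()                         # no timeout: block until the child has ended
log.append("main done")

print(still_running)             # True
print(log)                       # ['child done', 'main done']
```

A join() with a timeout returns None whether or not the thread finished, so is_alive() is the way to tell the two cases apart.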

Multithreaded code example

<1> Collecting the results of multiple threads
      from threading import Thread

      import pandas as pd

      class MyThread(Thread):
          def __init__(self, func, args=()):
              super().__init__()
              self.func = func
              self.args = args
              self.result = None

          def run(self):
              self.result = self.func(*self.args)

          def get_result(self):
              # If join() was not called first, run() may not have finished
              # and the result is still None
              return self.result

      def f(raw_list, n):
          # Split raw_list into consecutive slices of at most n rows.
          # A list (not a generator) is returned so the slicing runs in the thread.
          return [raw_list[i:i + n] for i in range(0, len(raw_list), n)]

      # Hypothetical stand-ins: the original does not show how these are defined
      final_data_df = pd.DataFrame({"value": range(10)})
      temp = [2, 5]  # assumed to be slice sizes, one per thread

      threads_pool = []  # the thread pool (a plain list of thread objects)
      for i in temp:
          subt = MyThread(f, args=(final_data_df, i))  # create a thread object
          threads_pool.append(subt)  # add it to the thread pool

      for t in threads_pool:
          # Make each child a daemon thread so that it ends when the main thread ends.
          # Otherwise a child that is still running would keep the program alive.
          t.daemon = True  # same effect as the older t.setDaemon(True)
          t.start()  # run the child thread

      # join() blocks the main thread until each child thread has finished.
      # Without it, the main thread could get ahead of the children and finish
      # without ever receiving their results.
      iterative_df_list = []
      for t in threads_pool:
          t.join()
          iterative_df_list.extend(t.get_result())  # collect this thread's slices

      final_data_df = pd.concat(iterative_df_list, sort=False).reset_index(drop=True)