1 Threads and coroutines

1.1 About Threads

Threading model

There are three thread implementation models: the kernel-level thread model, the user-level thread model, and the hybrid thread model. The biggest difference between them is the mapping between threads and Kernel Scheduling Entities (KSEs). A KSE is the entity that the operating system kernel can schedule, also called a kernel-level thread; it is the kernel's minimum scheduling unit.

  • Kernel-level thread model: user threads and KSEs have a 1:1 relationship. Most programming language thread libraries are wrappers around the operating system's KSEs; scheduling is handled entirely by the OS scheduler, which makes the implementation simple.
  • User-level thread model: user threads and KSEs have an M:1 relationship. The creation, destruction, and coordination of multiple threads are handled entirely by the user's own thread library, which is how coroutines are implemented. The cost of creating threads and of context switching is minimal. The disadvantage: when a user thread makes a blocking system call (for example a blocking network read), the KSE blocks; the CPU suspends that KSE and all of its user threads become blocked. A common workaround is to intervene at the user level: just before a blocking operation occurs, the current user thread actively switches to another one so that the KSE never blocks. Python's gevent uses monkey patching to wrap blocking IO operations in exactly this way (see the sketch after this list).
  • Hybrid thread model: user threads and KSEs have an M:N relationship. A process creates multiple KSEs, and user threads can be associated with different KSEs; when the KSE running a user thread blocks, the other user threads bound to that KSE are moved to other KSEs. This dynamic association is implemented in user space. Go implements this model: the user-space scheduler maps user threads onto KSEs, and the kernel scheduler maps KSEs onto CPUs.
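
As a concrete illustration of the user-level model's blocking workaround, here is a minimal sketch, assuming gevent is installed; the worker function and greenlet names are illustrative. monkey.patch_all() replaces blocking standard-library calls with cooperative versions, so a "blocking" call only parks the current user thread instead of blocking the whole KSE:

from gevent import monkey
monkey.patch_all()          # wrap blocking stdlib IO (socket, time.sleep, ...) with cooperative versions

import time                 # time.sleep is now gevent-aware after patching
import gevent

def worker(name):
    print(name, "start")
    time.sleep(1)           # parks only this greenlet; the kernel thread keeps running others
    print(name, "done")

gevent.joinall([gevent.spawn(worker, "g1"), gevent.spawn(worker, "g2")])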

1.2 coroutines

A coroutine is commonly described as a user-mode thread. Unlike threads, which are scheduled preemptively, coroutines use cooperative scheduling: a single flow of execution switches between multiple coroutines, as if several coroutines were cooperating to complete one task.

2 Python coroutines

2.1 From generators to coroutines

For generators, we already know that yield turns a function into an iterator:

def generator():
    for i in range(10):
        yield i
        print(123)

a = generator()
next(a)
When we create the generator a, none of its body has run yet. Calling next(a) runs the body up to the yield, where the program pauses and returns i; calling next(a) again resumes execution after the yield. Because we control its execution with next(a), we can alternate execution between different subroutines.
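
To make this "alternate execution of subroutines" concrete, here is a minimal hand-rolled sketch (the task and scheduling-loop names are illustrative): each generator yields to give up control, and a plain loop decides who runs next, which is cooperative scheduling in miniature.

def task(name, steps):
    for i in range(steps):
        print(name, "step", i)
        yield                  # hand control back to the scheduling loop

tasks = [task("A", 3), task("B", 3)]
while tasks:
    t = tasks.pop(0)
    try:
        next(t)                # run the task until its next yield
        tasks.append(t)        # re-queue it for another turn
    except StopIteration:
        pass                   # the task finished; drop it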

Python's PEP 342 enhanced yield: values can be sent into a generator, yield can be used inside try-finally, and so on, which makes generators much more usable as coroutines.

def generator():
    for i in range(10):
        num = yield 1          # send() delivers a value that becomes num

a = generator()
a.send(None)                   # prime the generator: run to the first yield
a.send(2)                      # resume the generator, with num = 2 inside it
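Building on that, a slightly more practical sketch (a running-average coroutine; purely illustrative) shows why PEP 342 makes generators usable as coroutines: the caller pushes data in with send() and reads a result back at each yield.

def averager():
    total, count, avg = 0.0, 0, None
    while True:
        num = yield avg        # receive the value passed to send(), hand back the current average
        total += num
        count += 1
        avg = total / count

a = averager()
a.send(None)                   # prime the coroutine: run to the first yield
print(a.send(10))              # 10.0
print(a.send(20))              # 15.0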

2.2 greenlet

greenlet is a C extension for Python that provides coroutines with explicit, self-managed scheduling. Its advantage over yield is that values can be passed directly from one coroutine to another through the switch function; yield can also pass values into a coroutine, but everything has to flow through the main function that acts as the scheduler.

from greenlet import greenlet

def test1():
    print("test1 into")
    gr2.switch()
    print("test1 out")


def test2():
    print("test2 into")
    gr1.switch()
    print("test2 out")


gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
gr1.switch() enters test1; test1 calls gr2.switch() to enter test2; test2 calls gr1.switch() to switch back into test1, which then runs to completion. When test1 exits, control returns directly to the main greenlet, so test2 never prints "test2 out" and we see the output:

test1 into
test2 into
test1 out
Multiple greenlets can co-exist in one thread, but only one greenlet is running in one thread at any time.

In order for another greenlet to run, the currently running greenlet must relinquish control, which is called switching. The switch must explicitly choose which Greenlet will take over execution, and the current greenlet calls the destination greenlet’s switch method to make the switch. When the switch occurs, the call stack of the current Greenlet is saved, the call stack of the destination Greenlet is put in the correct place, and the execution flow continues where the destination Greenlet was last switched.
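
The value passing mentioned at the start of 2.2 can be seen in a short sketch (assuming the greenlet package is installed; the ping/pong names are illustrative): the argument given to switch() becomes the return value of the switch() call in which the destination greenlet was suspended, or the arguments of its run function if it has not started yet.

from greenlet import greenlet

def ping(msg):
    print("ping got:", msg)                # msg comes from the first gr_ping.switch("hello")
    reply = gr_pong.switch("to pong")      # start pong, passing it a value; wait to be resumed
    print("ping got:", reply)              # value passed back by pong's gr_ping.switch(...)

def pong(msg):
    print("pong got:", msg)
    gr_ping.switch("to ping")              # resume ping's pending switch() call with a value

gr_ping = greenlet(ping)
gr_pong = greenlet(pong)
gr_ping.switch("hello")                    # output: ping got: hello / pong got: to pong / ping got: to ping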

2.3 gevent

2.3.1 gevent concept

gevent is a third-party coroutine library. Built on top of greenlet, it implements its own coroutine scheduling mechanism: cooperative scheduling, as opposed to the CPU's preemptive scheduling of threads.
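
A minimal sketch of this cooperative behaviour (no monkey patching, just gevent's own primitives; the work function is illustrative): every gevent.sleep(0) call yields to the hub, so the two greenlets interleave instead of one running to completion before the other.

import gevent

def work(name):
    for i in range(3):
        print(name, i)
        gevent.sleep(0)        # cooperatively yield to the hub, letting other greenlets run

gevent.joinall([gevent.spawn(work, "A"), gevent.spawn(work, "B")])
# prints A 0, B 0, A 1, B 1, ... because each sleep(0) switches to the hub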

Event loop

Instead of blocking and polling until a socket operation completes, gevent lets the operating system raise events, so the event loop knows when something has happened, for example that data has arrived and the socket is ready to read. gevent can then go on running the next greenlet whose event may already have fired. This cycle of registering events and handling them as they occur is called an event loop.
The Hub is the most important part of gevent.

When a function A in the gevent API blocks, it gets the gevent.hub.Hub instance (the hub is a very special greenlet that runs the event loop) and switches from function A's greenlet to the hub, giving control back to it. If no gevent.hub.Hub instance exists at that point, one is created automatically.

2.3.2 gevent process

The hub itself is also a greenlet. Inside the hub runs loop, one of the cores of gevent: the event loop. loop wraps the interfaces of the underlying libev. Every greenlet can obtain the current hub object via get_hub(), and a new one is created if none exists yet. After a switch, execution enters the hub greenlet; if event polling has not started yet, the hub starts it and then waits for the callback event that will wake the next greenlet.

# _socket3.py
def __init_common(self):
    _socket.socket.setblocking(self._sock, False)   # put the socket into non-blocking mode
    fileno = _socket.socket.fileno(self._sock)
    self.hub = get_hub()
    io_class = self.hub.loop.io
    self._read_event = io_class(fileno, 1)           # IO watcher for socket readable events
    self._write_event = io_class(fileno, 2)          # IO watcher for socket writable events
    self.timeout = _socket.getdefaulttimeout()
get_hub gets the hub object:

def get_hub_noargs():
    hub = _threadlocal.hub
    if hub is None:
        hubtype = get_hub_class()        # hubtype defaults to gevent.hub.Hub
        hub = _threadlocal.hub = hubtype()
    return hub
Python's standard library socket defaults to blocking mode, in which many of its functions block, including recv(), which receives data sent by the peer.

def recv(self, *args):
    while True:
        try:
            return self._sock.recv(*args)
        except error as ex:
            # no data available yet: the non-blocking recv raises EWOULDBLOCK
            if ex.args[0] != EWOULDBLOCK:
                raise
        # block the current greenlet on the IO watcher bound to this socket's read event
        self._wait(self._read_event)
_wait() creates a Waiter object and passes waiter.switch as the callback to the IO watcher bound to the current socket's read event; when the event fires, the loop executes the callback waiter.switch.

    def wait(self, watcher):
        waiter = Waiter(self)  # pylint:disable=undefined-variable
        watcher.start(waiter.switch, waiter)   # register waiter.switch as the watcher's callback
        try:
            result = waiter.get()              # switch to the hub and wait to be woken up
            if result is not waiter:
                raise xxxx                     # unexpected switch; gevent raises an error here
        finally:
            watcher.stop()
We still don't know what gevent does between the moment the greenlet switches out to the hub and the moment the socket finally receives data from the peer and the watcher callback switches back into the current greenlet. Let's leave that for now and explore another question.

How does the hub manage greenlets

We can take a look at a gevent usage example:

import gevent

def do_something():
    pass

g1 = gevent.spawn(do_something)
g1.join()
"Spawn" means to produce something, so let's see how gevent produces a coroutine:

@classmethod
def spawn(cls, *args, **kwargs):
    g = cls(*args, **kwargs)   # create a Greenlet instance
    g.start()                  # register it with the hub's event loop
    return g
Looking at the code, we see that spawn initializes a Greenlet instance (Greenlet inherits from the greenlet class, which wraps the C library's greenlet interface), calls start() on it, and finally returns it. Let's first look at the class's initialization method, which contains this line:

# inside gevent.Greenlet.__init__: the base greenlet is created with the hub as its parent
_greenlet__init__(self, None, get_hub())

# signature of the underlying greenlet.greenlet.__init__
def __init__(self, run=None, parent=None):
    pass
When a coroutine is created in a thread, its parent is set to that thread's hub, so when the coroutine exits it does not return to the main thread but jumps to our hub coroutine.
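
The effect of the parent can be illustrated with the raw greenlet API (an illustrative sketch, not gevent's actual code): when a greenlet finishes, execution continues in its parent, not in whichever greenlet last switched to it; gevent sets that parent to the hub.

from greenlet import greenlet

def child_run():
    print("child runs")
    # when child_run returns, control goes to the parent greenlet,
    # not back to the greenlet that called child.switch()

def parent_run():
    print("parent resumed after child exited")

parent = greenlet(parent_run)
child = greenlet(child_run, parent=parent)   # explicit parent, the way gevent uses the hub
child.switch()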

Let’s look at the start method

def start(self):
    """Schedule the greenlet to run in this loop iteration"""
    if self._start_event is None:
        _call_spawn_callbacks(self)
        hub = get_my_hub(self)   # get the parent coroutine, i.e. the hub
        # register self.switch as a callback in the hub's event loop
        self._start_event = hub.loop.run_callback(self.switch)
At this point we have only registered the coroutine's switch callback with the hub; nothing is actually running yet. Now let's look at join(), simplified here.


def join(self):
    result = get_my_hub(self).switch()
This simply switches to the hub greenlet; as the switch method's documentation puts it:

If this greenlet has never been run, then this greenlet
will be switched to using the body of self.run(*args, **kwargs).
Let's look at what the hub's self.run does, simplified:

def run(self):
    while 1:
        loop = self.loop
        try:
            loop.run()
        finally:
            loop.error_handler = None  # break the refcount cycle
The logic simply runs our event loop. Starting from the Hub and following the loop code, we eventually land in gevent.libev.corecffi.loop (in a Linux environment), which calls into the C libev library; loop.run() performs the event listening in the underlying libev. When an event fires, the callback registered earlier is executed, switching back into the specified coroutine.

2.4 asyncio

asyncio is Python's standard library for asynchronous programming, designed by Python's creator himself. Besides the event loop and coroutines, its core components include Task and Future objects.
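
A minimal asyncio sketch (requires Python 3.7+; the fetch/main names are illustrative) showing those components working together: asyncio.run() creates the event loop, async def defines coroutines, create_task() wraps them in Tasks, and awaiting a Task behaves like awaiting a Future.

import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)            # yield to the event loop while "waiting" for IO
    return name + " done"

async def main():
    # wrap the coroutines in Tasks so the event loop runs them concurrently
    t1 = asyncio.create_task(fetch("a", 0.1))
    t2 = asyncio.create_task(fetch("b", 0.2))
    print(await t1, await t2)             # Tasks are Futures: await retrieves their results

asyncio.run(main())                       # create the event loop and run main() to completion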