Hello, I’m Yue Chuang.

Coroutines, which entered the Python standard library with asyncio in Python 3.4, achieve concurrency inside a single thread of a single process. When concurrency comes up, most people think first of multiprocessing and multithreading, both of which consume more system resources. Today we set threads and processes aside and talk about coroutines, which have become increasingly popular.

In crawler work, coroutines have advantages over multithreading: they run in a single thread, and that single thread can still achieve high concurrency.

What is a coroutine?

A coroutine, also known as a micro-thread, is a lightweight thread that runs in user mode. Threads and processes are switched by the operating system kernel, whereas switching between coroutines is decided by the program itself. In Python, a coroutine is a function that can be paused and resumed, which sounds a lot like a generator.
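To make the "function that can be paused" idea concrete, here is a minimal sketch using a plain generator, which pauses at each yield and resumes where it left off. The name countdown is made up for illustration, and this is only an analogy for how a coroutine suspends, not a description of how asyncio is implemented.

```python
def countdown(n):
    """Hypothetical generator: pauses at every yield and resumes on next()."""
    while n > 0:
        yield n  # execution pauses here until the caller asks for the next value
        n -= 1


gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes after the yield, loops once, and pauses again
rest = list(gen)    # drains whatever is left
print(first, second, rest)
```

Between the two next() calls the generator keeps its local state (the value of n), which is the same property that lets a coroutine pick up where it stopped.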

Development of coroutines

Coroutines have been part of the standard library since Python 3.4, when they were implemented with @asyncio.coroutine and yield from, which made them look like generators. To distinguish coroutines from generators more clearly, Python 3.5 introduced the async/await syntax, which only became widely adopted with Python 3.6. At the time of writing, Python 3.6 is the most stable Python 3 release and has a large user base. Python 3.7 then made async and await reserved keywords, and calling coroutines became much simpler. Python 3.6 is still recommended here, because many libraries that use async as a function or parameter name raise errors on Python 3.7, Scrapy being a typical example.

Advantages of coroutines over multithreading

Multithreaded programming is difficult because the scheduler can interrupt a thread at any time, so you must remember to hold locks to protect the critical sections of your program from being interrupted midway by another thread.

Coroutines, by default, are fully protected against interruption. A coroutine must explicitly yield control before the rest of the program can run. With coroutines there is no need to hold locks to synchronize operations across threads: coroutines are synchronized by nature, because only one of them is running at any given time. To summarize:

  • No context switches in the system kernel, which reduces overhead;
  • No locking or synchronization overhead for atomic operations, and no need to worry about contention for shared resources;
  • A single thread can achieve high concurrency. Even on a single-core CPU, tens of thousands of coroutines are not a problem, which makes coroutines well suited to high-concurrency work, web crawlers in particular.
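The "no locks needed" point can be sketched as follows. This is an illustrative example, not code from the original article: add_many and run_workers are hypothetical names, and asyncio.run assumes Python 3.7 or later. Many coroutines update the same dictionary without a lock, and the total is still exact because a coroutine can only be switched out at an await.

```python
import asyncio


async def add_many(state, times):
    """Hypothetical worker: increments the shared counter `times` times."""
    for _ in range(times):
        state["count"] += 1     # no await inside the update, so it cannot be interrupted
        await asyncio.sleep(0)  # explicit yield point: other coroutines run here


async def run_workers(workers=100, times=100):
    state = {"count": 0}  # shared by every worker, deliberately without a lock
    await asyncio.gather(*(add_many(state, times) for _ in range(workers)))
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(run_workers()))  # 100 workers x 100 increments
```

The guarantee only holds as long as the read-modify-write does not span an await; if it did, another coroutine could run in between and a lock such as asyncio.Lock would be needed again.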

Definition of coroutines

Using coroutines means writing asynchronous functions throughout. In Python, coroutines are implemented with the asyncio module. If we call ordinary Python functions synchronous functions (methods), then functions defined as coroutines are asynchronous functions (methods).

Note that all of the following code examples require Python 3.6 or later.

Definition of synchronous and asynchronous functions

Synchronous function definition

def regular_double(x):
    return 2 * x

Asynchronous function definition

async def async_double(x):
    return 2 * x

Calls to synchronous and asynchronous functions

For the synchronous function we know that it is called like this:

 regular_double(2)

How do asynchronous functions get called? With this question in mind, let's look at a simple example.

import asyncio


async def foo():
    print("This is a coroutine")


if __name__ == '__main__':
    loop = asyncio.get_event_loop()  # Get an event loop
    try:
        print("Start running the coroutine")
        coro = foo()  # Calling foo() only creates a coroutine object
        print("Enter the event loop.")
        loop.run_until_complete(coro)  # Run the coroutine until it finishes
    finally:
        print("Close event loop")
        loop.close()  # Close the event loop after running

This is the simplest example of a coroutine. Let’s look at the code above.

First, import the required package, asyncio. Next, create an event loop, because coroutines run on top of an event loop. Then call the asynchronous function to obtain a coroutine object and pass that object to run_until_complete, which runs it. Finally, close the event loop by calling its close method. That is how a coroutine is called from ordinary code. Calling one coroutine from another works a little differently.
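As a small aside, the sketch below shows two things about calling asynchronous functions: calling one does not run its body but only returns a coroutine object, and on Python 3.7 or later asyncio.run can take over the create/run/close steps described above. It reuses async_double from earlier; relying on Python 3.7+ is an assumption that goes beyond the 3.6 baseline of this article.

```python
import asyncio
import inspect


async def async_double(x):
    return 2 * x


coro = async_double(2)               # nothing runs yet: this only creates a coroutine object
is_coro = inspect.iscoroutine(coro)  # confirm what we got back
result = asyncio.run(coro)           # creates a loop, runs the coroutine, closes the loop
print(is_coro, result)
```

On Python 3.6 the explicit get_event_loop / run_until_complete / close sequence is still the way to go.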

Chain calls between coroutines

Inside a coroutine, we can call another coroutine with the await keyword. Because one coroutine can start another, tasks can be split into separate coroutines according to what they do, and chaining them with await forms a coroutine task flow, as in the following example:

import asyncio


async def main():
    print("Main coroutine")
    print("Wait for the Result1 coroutine to run")
    res1 = await result1()
    print("Wait for the Result2 coroutine to run")
    res2 = await result2(res1)
    return (res1, res2)


async def result1():
    print("This is a result1 coroutine.")
    return "result1"


async def result2(arg):
    print("This is a result2 coroutine.")
    return f"result2 receives a parameter,{arg}"


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(main())
        print(f"gets the return value:{result}")
    finally:
        print("Close event loop")
        loop.close()

Output:

Main coroutine
Wait for the Result1 coroutine to run
This is a result1 coroutine.
Wait for the Result2 coroutine to run
This is a result2 coroutine.
gets the return value:('result1', 'result2 receives a parameter,result1')
Close event loop

From the above, we know that calling a coroutine from ordinary code means creating an event loop and running the coroutine in it. Calling a coroutine from inside another coroutine requires the await keyword, as main does with result1 and result2 in the example above. So the question is: what does await actually do?

The role of await

await waits for the awaited coroutine to finish before the code after it continues. Because result1 runs for such a short time, it looks as though result1 and result2 are executed together, but result2 only starts once result1 has returned. That is what await is for: it waits for a coroutine to complete and, if the coroutine returns a result, receives that result. Inside a coroutine, return hands back a result just as it does in a synchronous function.
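The waiting behaviour of await can be made visible with a small sketch. The names step and pipeline are hypothetical and the delays are arbitrary; asyncio.run assumes Python 3.7 or later. Each step records when it starts and ends, so the log shows that the second step does not begin until the first has returned.

```python
import asyncio


async def step(name, delay, log):
    """Hypothetical coroutine: records when it starts and when it ends."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)
    log.append(f"{name} end")
    return name.upper()


async def pipeline():
    log = []
    first = await step("a", 0.02, log)   # pipeline pauses here until step "a" returns
    second = await step("b", 0.01, log)  # only then does step "b" begin
    return log, [first, second]


if __name__ == "__main__":
    log, results = asyncio.run(pipeline())
    print(log)      # ['a start', 'a end', 'b start', 'b end']
    print(results)  # ['A', 'B']
```

Even though step "b" has the shorter delay, it cannot overtake step "a", because chained awaits run strictly one after another.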

Execute tasks concurrently

A series of coroutines can be chained with await, but sometimes a coroutine needs to wait for many coroutines at once. For example, when a coroutine is waiting on 1000 asynchronous network requests and the order in which they complete does not matter, we can use the asyncio.wait function. It suspends the calling coroutine until the background operations complete.

Using Task

import asyncio


async def num(n):
    print(f"The current numbers are:{n}")
    await asyncio.sleep(n)
    print(f"Waiting time:{n}")


async def main():
    tasks = [num(i) for i in range(10)]  # list of coroutines
    # await asyncio.gather(*tasks)  # alternative to asyncio.wait
    # Run the coroutines in the list concurrently. Passing bare coroutines to
    # asyncio.wait works on Python 3.6-3.10; newer versions require Task objects.
    await asyncio.wait(tasks)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

Output:

The current numbers are:0
The current numbers are:4
The current numbers are:8
The current numbers are:1
The current numbers are:5
The current numbers are:7
The current numbers are:2
The current numbers are:6
The current numbers are:9
The current numbers are:3
Waiting time:0
Waiting time:1
Waiting time:2
Waiting time:3
Waiting time:4
Waiting time:5
Waiting time:6
Waiting time:7
Waiting time:8
Waiting time:9

asyncio.wait runs the coroutines concurrently: the number is printed ten times, but not in the order of the list, which shows that asyncio.wait makes no ordering guarantee. If order matters, use asyncio.gather and unpack the list into it, as in the commented-out line above. gather also runs the coroutines concurrently, but it returns their results in the order in which they were passed in.
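The ordering behaviour of gather can be sketched like this. The extra parameters delay and finished and the function collect are additions made for illustration, and asyncio.run assumes Python 3.7 or later. The delays are chosen so that the last argument finishes first, yet the results still come back in argument order.

```python
import asyncio


async def num(n, delay, finished):
    await asyncio.sleep(delay)
    finished.append(n)  # record the order in which the coroutines actually finish
    return n


async def collect():
    finished = []
    # Later arguments get shorter delays, so they complete earlier
    results = await asyncio.gather(
        num(0, 0.15, finished),
        num(1, 0.10, finished),
        num(2, 0.05, finished),
    )
    return results, finished


if __name__ == "__main__":
    results, finished = asyncio.run(collect())
    print(results)   # [0, 1, 2]  (argument order)
    print(finished)  # [2, 1, 0]  (completion order)
```

So gather orders the results, not the execution: the coroutines still overlap in time.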

How do you use ordinary functions in a coroutine?

We know that one ordinary function calls another simply by writing the function name followed by parentheses, as follows:

def foo():
    print("This is a normal function.")
    return "test"


def main():
    print("Call foo")
    res = foo()
    print(f"received value from foo function: {res}")


if __name__ == "__main__":
    main()

So how do you use an ordinary function in a coroutine? The event loop provides methods for calling ordinary functions from coroutines, such as call_soon.

call_soon

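Taken literally, call_soon means "call as soon as possible": it schedules the callback on the event loop and returns immediately, without waiting for the callback to run. Let's look at a concrete example: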

import asyncio
import functools


def callback(args, *, kwargs="default"):
    print(f"ordinary function as callback function, get argument:{args}.{kwargs}")


async def main(loop):
    print("Registered callback")
    loop.call_soon(callback, 1)
    # call_soon only forwards positional arguments, so bind the keyword with partial
    wrapped = functools.partial(callback, kwargs="not default")
    loop.call_soon(wrapped, 2)
    await asyncio.sleep(0.2)  # yield to the event loop so the callbacks can run


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main(loop))
    finally:
        loop.close()

Output result:

Registered callback
ordinary function as callback function, get argument:1.default
ordinary function as callback function, get argument:2.not default

From the output we can see that we have successfully called a normal function in the coroutine, printing 1 and 2 sequentially.
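To show what "returns immediately" means in practice, the following sketch records the order of events around a call_soon call. The function demo is a hypothetical name; asyncio.get_running_loop and asyncio.run assume Python 3.7 or later. The callback is only queued at first and does not run until the coroutine gives control back to the event loop.

```python
import asyncio


async def demo():
    events = []
    loop = asyncio.get_running_loop()
    loop.call_soon(events.append, "callback ran")  # scheduled, not executed yet
    events.append("after call_soon")               # this line runs first
    await asyncio.sleep(0)                         # yield to the loop so the callback can run
    events.append("after await")
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # ['after call_soon', 'callback ran', 'after await']
```

This is also why the earlier example ends with await asyncio.sleep(0.2): without a yield point, the coroutine would finish before the registered callbacks had a chance to run inside it.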

Now, after looking at these examples, you might wonder, is there no downside to coroutines?

Disadvantages of coroutines

Coroutines do have drawbacks. The following two are the main ones.

Cannot use multiple CPU cores

A coroutine is single-threaded by nature, so it cannot use multiple CPU cores at the same time. To run on multiple cores, coroutines have to be combined with processes. Most of the applications we write day to day do not need this. For a web crawler, for example, speed is limited by other factors, such as how much concurrency the target website tolerates and the network speed. Only CPU-intensive applications are likely to need multiple processes combined with coroutines.

Use non-blocking code everywhere

Writing coroutines means writing non-blocking code everywhere and using asynchronous versions of libraries, such as aiohttp, an asynchronous counterpart to the requests library that we will use later in the asynchronous crawler tutorial. Still, these disadvantages do not outweigh the advantages of using coroutines.
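Why blocking calls are a problem can be sketched with a rough timing comparison. The worker names and the 0.1 second delay are made up for illustration, time.sleep stands in for any blocking library call, and asyncio.run assumes Python 3.7 or later. Three workers are started concurrently in both cases; only the non-blocking version actually overlaps the waits.

```python
import asyncio
import time


async def blocking_worker():
    time.sleep(0.1)           # blocks the whole event loop; no other coroutine can run


async def non_blocking_worker():
    await asyncio.sleep(0.1)  # suspends only this coroutine


async def timed(worker, count=3):
    """Run `count` workers concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:     {asyncio.run(timed(blocking_worker)):.2f}s")
    print(f"non-blocking: {asyncio.run(timed(non_blocking_worker)):.2f}s")
```

Exact numbers depend on the machine, but the blocking version should take roughly three times as long, since its waits happen one after another.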

Summary

This is enough to get you started with coroutines. Coroutines involve much more than this, of course; the aim here is only to give you a basic understanding in advance. All of this groundwork prepares for the asynchronous crawler tutorial to come, where familiarity with coroutines will help you get started quickly. Later articles will cover more, including event loops, Task, Future, Awaitable, and the high-level coroutine APIs. Stay tuned!