Code has published a number of articles on asynchronous crawlers and asynchronous programming, and a reader recently asked me to take a closer look at how asyncio achieves concurrency with a single thread in a single process, and whether asynchronous code can replace synchronous code in every situation.

Some examples

The first example

Let’s say you need to cook rice in a rice cooker, wash clothes in a washing machine, and call a friend to come over for dinner. The rice cooker takes 30 minutes to finish cooking, the washing machine takes 40 minutes to finish washing, and your friend needs 50 minutes to get to your house. So, do you need to spend 30 + 40 + 50 = 120 minutes on these three things?

In reality, it only takes you about 50 minutes:

  1. Call your friend first and tell him to set off now
  2. Put the clothes in the washing machine and turn on the power
  3. Wash the rice, put it in the rice cooker and turn on the power

Then, all you have to do is wait.

The second example

Now, you need to finish the Chinese paper, the maths paper and the English paper. Each paper takes one hour, so you need 1 + 1 + 1 = 3 hours to complete them all. No one can help you, so there is no way to finish the three papers in less than three hours.

The third example

Now you need the rice cooker to cook, the washing machine to wash clothes, and a maths paper to finish. It takes 30 minutes for the rice cooker to finish cooking, 40 minutes for the washing machine to finish washing, and an hour to complete the paper.

But you don’t need 30 + 40 + 60 = 130 minutes. You only need about 70 minutes:

  1. Put the clothes in the washing machine and turn on the power
  2. Wash the rice, put it in the rice cooker and turn on the power
  3. Start working on the maths paper

Asynchronous and synchronous

In the first example, cooking, doing the laundry and calling a friend have one thing in common: each task seems to take a long time, but only a small slice of that time actually requires you to do anything. Washing the rice and starting the rice cooker takes 5 minutes; putting the clothes in the washing machine and turning it on takes 2 minutes; calling a friend takes 1 minute. For most of the remaining time, nothing needs to be done by you. You just wait.

Look at the second example: each test paper occupies you completely, and there is no waiting time at all, so the papers must be completed one at a time.

These two examples actually correspond to two types of programs: I/O-intensive programs and computationally intensive programs.

When we use Requests to request URLs, query a remote database, or read and write local files, those are I/O operations. The common feature of these operations is waiting.

Take an HTTP request made with Requests as an example: the library may take only 0.01 seconds to initiate the request. The program then stalls, waiting for the site to respond. The request data travels over the network to the web server, the server queries its database, the server sends the data back, and the data travels back over the wire to your computer. Only once Requests has received the data does the program proceed.

A lot of time is wasted waiting for websites to return data. If we can make the most of this wait time, we can make more requests. This is why asynchronous requests are useful.
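To make this concrete, here is a minimal sketch using the synchronous Requests library (the URL is only a placeholder); nearly all of the measured time is spent waiting for the network rather than executing Python code:

import time
import requests

start = time.perf_counter()
resp = requests.get('https://example.com')  # the program blocks here until the server responds
end = time.perf_counter()
# During this interval the CPU is almost idle; it is pure I/O waiting
print(f'Status: {resp.status_code}, elapsed: {end - start:.3f} seconds')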

But for code that requires a lot of computation, the CPU is always running at full speed and there is no waiting, so there is no spare wait time to spend on anything else.

So: asynchrony applies only to I/O-related code, not to non-I/O operations.

Asynchronous code for Python

We used real-life examples above to illustrate asynchronous requests, which can leave you with a wrong impression: that I can control my code so that it is asynchronous where I want it to be and synchronous where I don’t. For example, one might wish to write code in the way described by this pseudocode:

request https://baidu.com
while the site has not responded yet:
    a = 1 + 1
    b = 2 + 2
    c = 3 + 3
once the data comes back, do something else with it

Just like when we plug in the rice cooker: while waiting for the meal to be ready, I can read, make phone calls, watch TV, or do whatever I want.

This pseudocode is intuitive, but you can’t do it in Python.

Let’s use some real code to show what’s wrong with this.

First of all, we set up a test site: when we request http://127.0.0.1:8000/sleep/<num>, the site waits num seconds before responding. For example, http://127.0.0.1:8000/sleep/3 means that after you initiate the request, the site will wait 3 seconds and then respond. The running effect is shown in the figure below.
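The article only shows this site in a screenshot. For reference, here is a minimal sketch of an equivalent server written with aiohttp’s web module (the original may well use a different framework); it listens on port 8000 and returns the same {'success': True, 'time': ...} JSON used in the examples below:

import asyncio

from aiohttp import web


async def sleep_handler(request):
    # Read <num> from the URL, wait that many seconds, then return JSON
    sleep_time = int(request.match_info['num'])
    await asyncio.sleep(sleep_time)
    return web.json_response({'success': True, 'time': sleep_time})


app = web.Application()
app.add_routes([web.get('/sleep/{num}', sleep_handler)])

if __name__ == '__main__':
    web.run_app(app, port=8000)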

Now we use aiohttp to send 3 requests that take 1 second, 2 seconds and 3 seconds respectively to return:

import aiohttp
import asyncio
import time


async def request(sleep_time):
    async with aiohttp.ClientSession() as client:
        resp = await client.get(f'http://127.0.0.1:8000/sleep/{sleep_time}')
        resp_json = await resp.json()
        print(resp_json)


async def main():
    start = time.perf_counter()
    await request(1)
    a = 1 + 1
    b = 2 + 2
    print('Can I run here while waiting for the first request? ')
    await request(2)
    print('Can I run here while waiting for the second request? ')
    await request(3)

    end = time.perf_counter()
    print(f'Total time: {end - start}')


asyncio.run(main())

The operating effect is shown in the figure below:

In line 15 of the figure, a one-second request is made, so line 15 should take one second before the data comes back. Lines 16, 17 and 18 are just simple assignments and print calls, which obviously take far less than one second to run, so in theory we should see output like this:

Can I run here while waiting for the first request?
Can I run here while waiting for the second request?
{'success': True, 'time': 1}
{'success': True, 'time': 2}
{'success': True, 'time': 3}
Total time: 3.018130547

Instead, what we actually see is that the program runs to line 15, waits for that request to come back, then runs lines 16, 17 and 18, then runs line 19 and waits 2 seconds for the second request to finish, then runs line 20, and finally runs line 21. The three requests are made one after another, taking about six seconds in total.

The logic of the program is not what we expected. Instead of using the I/O wait time to start a new request, the program waits for the previous request to finish completely before sending the next one.

What’s the problem?

The problem is that, in Python’s asynchronous code, the developer does not directly control when the program switches between requests.

With an await statement, the developer tells asyncio that the expression after it can be waited on asynchronously. Note that it can be waited on; whether the event loop actually switches away at that point is up to Python to decide.

Python knows what an I/O operation looks like, whether it is a network request or reading and writing a disk, and it only exploits the wait time when it recognises that the operation you are awaiting really is an I/O operation.

So, in Python asynchronous programming, all the developer can do is tell Python, in a batch, which operations can be run asynchronously. It is then up to Python itself to coordinate and schedule them and make the most of the wait time. The developer has no direct say in how these I/O operations are interleaved.
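A stripped-down illustration of this point, using asyncio.sleep to stand in for the network wait (my own sketch, not from the original article): awaiting two coroutines one after another takes the sum of their wait times, while submitting them as tasks lets asyncio overlap the waits.

import asyncio
import time


async def wait(seconds):
    await asyncio.sleep(seconds)  # simulated I/O wait


async def sequential():
    # Awaiting coroutines one by one: each must finish before the next starts
    start = time.perf_counter()
    await wait(1)
    await wait(2)
    print(f'sequential: {time.perf_counter() - start:.1f} seconds')  # about 3 seconds


async def concurrent():
    # Submitting both as tasks first lets asyncio overlap their wait time
    start = time.perf_counter()
    tasks = [asyncio.create_task(wait(1)), asyncio.create_task(wait(2))]
    await asyncio.gather(*tasks)
    print(f'concurrent: {time.perf_counter() - start:.1f} seconds')  # about 2 seconds


asyncio.run(sequential())
asyncio.run(concurrent())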

So, coming back to the aiohttp example, we need to make some changes to that code:

import aiohttp
import asyncio
import time


async def request(sleep_time):
    async with aiohttp.ClientSession() as client:
        resp = await client.get(f'http://127.0.0.1:8000/sleep/{sleep_time}')
        resp_json = await resp.json()
        print(resp_json)


async def main():
    start = time.perf_counter()
    tasks_list = [
        asyncio.create_task(request(1)),
        asyncio.create_task(request(2)),
        asyncio.create_task(request(3)),
    ]
    await asyncio.gather(*tasks_list)
    end = time.perf_counter()
    print(f'Total time: {end - start}')


asyncio.run(main())

The operating effect is shown in the figure below:

As you can see, it now takes about 3 seconds, which shows that the three requests really did make use of each other’s waiting time.

We use asyncio.create_task() to wrap each coroutine as an asynchronous task and collect these tasks in a list. When the whole batch is ready, we hand it to asyncio.gather() in one go. Python then schedules this batch of tasks itself, using each request’s wait time to start the others.
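As a side note, asyncio.gather() also accepts bare coroutine objects and wraps them into tasks itself, so main() above could be written slightly more compactly (an equivalent sketch):

async def main():
    start = time.perf_counter()
    # gather() turns the bare coroutines into tasks automatically
    await asyncio.gather(request(1), request(2), request(3))
    end = time.perf_counter()
    print(f'Total time: {end - start}')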

When we write a Scrapy crawler, we have code like this:

...
yield scrapy.Request(url, callback=self.parse)
next_url = url + '&page=2'
yield scrapy.Request(next_url, callback=self.parse)

It looks as if you “request” url, then use that request’s waiting time to run next_url = url + '&page=2', and then make another request.

In fact, inside Scrapy, when we yield scrapy.Request, we merely put a Request object into Scrapy’s request queue and then immediately continue with next_url = url + '&page=2'.

Putting the Request object into the queue does not actually send an HTTP request yet. Only when Scrapy’s downloader has accumulated a certain number of requests, or has waited a certain amount of time, does it schedule them and send the HTTP requests together. When a response comes back, Scrapy wraps the returned HTML in a Response object and passes it to the callback function for further processing.

To do asynchronous programming in Python, you need to assemble a batch of asynchronous tasks and hand them to asyncio to schedule for you. You cannot directly and manually control which code runs while an asynchronous request is waiting, the way you can in JavaScript.

Calling synchronous functions in asynchronous code

It is possible to call synchronous functions from asynchronous functions. But if the synchronous function being called takes a long time, the other asynchronous tasks will be stuck waiting for it. The print function, for example, is a synchronous function, but because it takes so little time, it does not block the asynchronous tasks.

Let’s now write a recursive function that computes the nth Fibonacci number and call it from an asynchronous function:

def sync_calc_fib(n):
    if n in [1, 2]:
        return 1
    return sync_calc_fib(n - 1) + sync_calc_fib(n - 2)


async def calc_fib(n):
    result = sync_calc_fib(n)
    print(f'Term {n} has been calculated; the result is: {result}')
    return result

As we all know, computing the nth Fibonacci number recursively like this is very slow. Let’s compute the 36th term, which takes about 5 seconds:

What if we combine computing the Fibonacci number (CPU-intensive) with requesting the website (I/O-intensive) in the same batch of tasks?
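The combined code is only shown as a screenshot in the original; here is a sketch of what it presumably looks like, reusing request() and calc_fib() from above:

async def main():
    start = time.perf_counter()
    tasks_list = [
        # calc_fib is scheduled first, so the event loop is blocked for about
        # 5 seconds before the three requests are even sent
        asyncio.create_task(calc_fib(36)),
        asyncio.create_task(request(1)),
        asyncio.create_task(request(2)),
        asyncio.create_task(request(3)),
    ]
    await asyncio.gather(*tasks_list)
    end = time.perf_counter()
    print(f'Total time: {end - start}')


asyncio.run(main())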

Let’s look at the results:

It can be seen that the total time is about 8 seconds: roughly 5 seconds to compute the 36th Fibonacci number, plus about 3 seconds for the three network requests afterwards.

This code shows that when an asynchronous function (calc_fib) calls a synchronous function (sync_calc_fib) that takes a long time, the whole batch of asynchronous tasks gets stuck; the other asynchronous functions can only be scheduled again after the synchronous function finishes. This is also why it is not recommended to use time.sleep in asynchronous programming.
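A quick way to see this for yourself is to compare time.sleep with its asynchronous counterpart asyncio.sleep (this demonstration is my own addition, not from the original article):

import asyncio
import time


async def ticker():
    # Prints once per second so we can see whether the event loop is still responsive
    for i in range(3):
        print('tick', i)
        await asyncio.sleep(1)


async def blocking_wait():
    time.sleep(3)  # a synchronous call: the whole event loop freezes for 3 seconds
    print('blocking wait done')


async def friendly_wait():
    await asyncio.sleep(3)  # suspends only this task; other tasks keep running
    print('friendly wait done')


async def main():
    print('--- with time.sleep ---')
    await asyncio.gather(ticker(), blocking_wait())  # the ticks stall while time.sleep runs
    print('--- with asyncio.sleep ---')
    await asyncio.gather(ticker(), friendly_wait())  # the ticks keep appearing during the wait


asyncio.run(main())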