In JavaScript, most I/O operations are asynchronous by default. Python, by contrast, runs code sequentially, so a single slow function can easily block the entire program, which is far less efficient than JavaScript's native asynchronous mechanism. Python can of course be made asynchronous too, but doing so is cumbersome, especially for small crawlers.

In my previous crawler article, I mentioned many benefits of asynchrony, but it also has drawbacks: often the output is not what we want. For example, we want to finish crawling all the data before running the processing function, but JavaScript starts the processing function while the data is still being crawled, which causes the crawler to fail.


const getWeb = function () {
    requestSomething()
    setData()
    return console.log('3.End of program')
}

// Simulates a slow page request: the callback fires after 2 seconds
const requestSomething = function () {
    setTimeout(() => console.log('1.Page data crawl completed!'), 2000)
}

const setData = function () {
    console.log('2.Start processing the data!')
}

getWeb()

Output result:

2.Start processing the data!
3.End of program
1.Page data crawl completed!

requestSomething is called first, but setData prints its message first. The crawl message appears only after the entire getWeb function has returned.
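
The ordering is not caused by the 2-second delay itself. As a minimal sketch (the `order` array is added here purely for illustration), the same thing happens with a delay of 0, because a `setTimeout` callback is queued and runs only after the currently executing synchronous code has finished:

```javascript
// Record the order in which each step actually runs.
const order = []

const requestSomething = function () {
    // Even with a 0 ms delay, the callback is queued and runs
    // only after the current synchronous code has finished.
    setTimeout(() => order.push('1.crawl'), 0)
}

const setData = function () {
    order.push('2.process')
}

const getWeb = function () {
    requestSomething()
    setData()
    order.push('3.end')
}

getWeb()
console.log(order) // '1.crawl' is not in the list yet

// A little later the queued callback has run and '1.crawl' comes last
setTimeout(() => console.log(order), 10)
```

So waiting longer does not help; the function has to be told explicitly to wait for the request.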

To solve this problem and rein in JavaScript, we need async functions, which make a JavaScript program run in the order we want.

1. How to use async

First declare the function as async, then put await in front of each asynchronous call whose result we want to wait for, such as:

const getWeb = async function () {
    // Pause here until the promise returned by requestSomething settles
    await requestSomething()
    setData()
    return console.log('3.End of program')
}

For await to work, requestSomething must return a Promise object. The function passed to the Promise constructor receives resolve and reject as arguments. Calling resolve() signals that the operation has finished; calling reject() signals a failure and passes back an error message. Let's look at resolve first:

const requestSomething = function () {
    return new Promise((resolve, reject) => {
        setTimeout(() => {
            console.log('1.Page data crawl completed!')
            // Tell the awaiting caller that the request has finished
            resolve()
        }, 2000)
    })
}

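Running the modified code now prints the messages in the expected order:

1.Page data crawl completed!
2.Start processing the data!
3.End of program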
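
In a crawler, the processing function usually needs the crawled data itself, not just the right ordering. The following is a minimal sketch of how that could look: the value passed to `resolve()` becomes the result of the `await` expression. The URL, the fake page object, and the length calculation are placeholders invented for illustration, not part of the original project.

```javascript
// Hypothetical stand-in for a real HTTP request: resolves with fake page data.
const requestSomething = function (url) {
    return new Promise((resolve) => {
        setTimeout(() => resolve({ url, html: '<h1>demo</h1>' }), 100)
    })
}

// Hypothetical processing step: here it only measures the length of the HTML.
const setData = function (page) {
    return { url: page.url, length: page.html.length }
}

const getWeb = async function () {
    // await unwraps the value that was passed to resolve()
    const page = await requestSomething('https://example.com')
    return setData(page)
}

getWeb().then((result) => console.log(result))
```

Because an async function always returns a Promise, the caller reads the final result with `then()` or with another `await`.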

2. How to handle errors

Remember reject? When something goes wrong, call reject() with an error message. The caller can then handle it by chaining catch() onto the awaited call:

const requestSomething = function () {
    return new Promise((resolve, reject) => {
        // Hard-coded here to simulate a failed request
        let err = '1.Crawl failed!'
        if (typeof err != 'undefined') {
            // Return an error message
            reject(err)
        } else {
            setTimeout(() => {
                console.log('1.Page data crawl completed!')
                resolve()
            }, 2000)
        }
    })
}

const getWeb = async function () {
    // Catch the returned error message
    await requestSomething().catch(e => console.log(e))
    setData()
    return console.log('3.End of program')
}

Output result:

1.Crawl failed!
2.Start processing the data!
3.End of program
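
Note that with catch() chained onto the call, setData() still runs after a failed crawl. If the processing step should be skipped when the request fails, one alternative is a try/catch block around the await. This is a sketch under assumptions: the `shouldFail` parameter and the `log` array are added only to make both paths easy to try, and the delay is shortened to 100 ms.

```javascript
// shouldFail is a hypothetical switch used to simulate success or failure.
const requestSomething = function (shouldFail) {
    return new Promise((resolve, reject) => {
        setTimeout(() => {
            if (shouldFail) {
                reject(new Error('1.Crawl failed!'))
            } else {
                resolve('1.Page data crawl completed!')
            }
        }, 100)
    })
}

const getWeb = async function (shouldFail) {
    const log = []
    try {
        log.push(await requestSomething(shouldFail))
        // Reached only when the request succeeded
        log.push('2.Start processing the data!')
    } catch (e) {
        // A rejected promise makes await throw, so processing is skipped
        log.push(e.message)
    }
    log.push('3.End of program')
    return log
}

getWeb(true).then((log) => console.log(log))
getWeb(false).then((log) => console.log(log))
```

Which form to use depends on whether the later steps can still do something useful without the crawled data.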

With that, we have brought the asynchronous program under control, and the same pattern can be applied directly in a crawler!
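
A real crawler usually has several pages to fetch before processing starts. The same pattern extends to that case by awaiting inside a loop. The sketch below assumes a simulated request and placeholder page names rather than real URLs:

```javascript
// Hypothetical stand-in for a page request; a real crawler would
// issue an HTTP request here instead of using a timer.
const requestSomething = function (url) {
    return new Promise((resolve) => {
        setTimeout(() => resolve(`data from ${url}`), 50)
    })
}

const getWeb = async function (urls) {
    const results = []
    for (const url of urls) {
        // Each request finishes before the next one starts
        results.push(await requestSomething(url))
    }
    // Reached only after every page has been crawled
    return results
}

getWeb(['page1', 'page2', 'page3']).then((results) => console.log(results))
```

The pages are fetched one after another here, which keeps the order predictable at the cost of some speed.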

Full code: github.com/Card007/Nod…

Also welcome to my home page: Nothlu’s Blog