Recently I had to learn how to write a crawler in Node for work. This post is a simple record of my learning process.

Let’s start with the Puppeteer library. If you look up the word "puppeteer", you’ll find it means a person who controls puppets, which is a fitting name. The library was designed mainly for automated testing. It provides APIs that control Chrome (or Chromium) directly, so it can be used for UI tests or as a crawler that retrieves page data.
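To illustrate the crawler use case, here is a minimal sketch that opens a URL and collects the page title and every link. It assumes Puppeteer is installed as described below. The file name scrape.js and the helper scrapeLinks are hypothetical names of my own. page.title() and page.$$eval() are real Puppeteer page methods. The browser is passed in so that one launched instance can be reused for many URLs.

```javascript
// scrape.js: a minimal crawling sketch. scrapeLinks is a hypothetical
// helper, not part of Puppeteer; it only uses standard Page methods.

/**
 * Open `url` in a new tab of an already-launched browser and return
 * the page title plus the text and href of every <a> element.
 */
async function scrapeLinks(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    const title = await page.title();
    // $$eval runs the callback inside the page, passing it every element
    // that matches the selector; only serializable data comes back to Node.
    const links = await page.$$eval("a", (anchors) =>
      anchors.map((a) => ({ text: a.textContent.trim(), href: a.href }))
    );
    return { title, links };
  } finally {
    // Close the tab whether or not the steps above succeeded.
    await page.close();
  }
}

module.exports = { scrapeLinks };

// Usage (assumes puppeteer is installed):
//   const browser = await require("puppeteer").launch();
//   const { title, links } = await scrapeLinks(browser, "http://www.baidu.com");
//   await browser.close();
```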

We said we would start from the very beginning, so let’s get started:

First, Puppeteer is an npm package, so it is easy to install.

After creating the project directory, execute:

$ yarn add puppeteer

or

$ npm install puppeteer

Installing it also automatically downloads a recent build of Chromium that is known to work with that Puppeteer version, and all subsequent operations run in this Chromium.
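Because the browser is bundled, puppeteer.launch() can start it without being told where Chrome is installed. By default it runs headless, so no window appears. The sketch below switches to a visible, slowed-down window while debugging by using the real launch options headless and slowMo. The helper names and the WATCH environment variable are hypothetical choices of my own.

```javascript
// launch-options.js: a sketch, not Puppeteer's own API. buildLaunchOptions,
// launchBrowser and the WATCH environment variable are hypothetical names.

/** Return the options object to pass to puppeteer.launch(). */
function buildLaunchOptions(watch) {
  if (watch) {
    // Open a visible Chromium window and slow each operation down by
    // 250 ms, which makes it easier to follow what the script is doing.
    return { headless: false, slowMo: 250 };
  }
  // No options: Puppeteer's defaults apply, i.e. headless Chromium.
  return {};
}

/** Launch the bundled Chromium; the puppeteer module is passed in. */
function launchBrowser(puppeteer, watch = process.env.WATCH === "1") {
  return puppeteer.launch(buildLaunchOptions(watch));
}

module.exports = { buildLaunchOptions, launchBrowser };

// Usage: in index.js, replace puppeteer.launch() with
//   const browser = await launchBrowser(puppeteer);
// then run `WATCH=1 node index.js` (macOS/Linux shells) to watch the window.
```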

Create a new index.js file in the project directory.

Puppeteer: a look at its features and basic usage

Let’s try Puppeteer first.

Add the following code to index.js:


const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.baidu.com");
  await page.screenshot({ path: "baidu.png" });
  await browser.close();
})();

Then run:

node index.js

If it runs successfully, a screenshot named baidu.png is written to the directory you ran the command from (here, the project root).
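The example above has one gap. If page.goto() or page.screenshot() throws, for instance because the site is unreachable, execution never reaches browser.close(). Here is a minimal sketch of the same steps wrapped in try/finally so that the browser is closed either way. takeScreenshot is a hypothetical helper name. The puppeteer module is passed in as a parameter.

```javascript
// screenshot.js: a variant of index.js that always closes the browser.
// takeScreenshot is a hypothetical helper, not part of Puppeteer.

async function takeScreenshot(puppeteer, url, path) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    // Runs whether the steps above succeeded or threw; the error, if any,
    // still propagates to the caller after the browser is closed.
    await browser.close();
  }
}

module.exports = { takeScreenshot };

// Usage:
//   takeScreenshot(require("puppeteer"), "http://www.baidu.com", "baidu.png")
//     .catch((err) => { console.error(err); process.exit(1); });
```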

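Reading through the code, the most obvious feature is how often async/await appears. async/await was standardized in ES2017 (it is often loosely called an ES7 feature). Puppeteer’s API is promise-based, so it works very well with async/await, which is why Node 7.6 or later, the first release with native async/await, was officially recommended at the time. The code is simple, semantic, and easy to understand: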


// Import the puppeteer library
const puppeteer = require("puppeteer");

// Wrap the steps in an async IIFE (immediately invoked function expression)
// so that await can be used in this top-level script
(async () => {
  // Launch a browser instance
  const browser = await puppeteer.launch();
  // Open a new page (tab)
  const page = await browser.newPage();
  // Navigate to the Baidu home page
  await page.goto("http://www.baidu.com");
  // Take a screenshot and set where the image is saved
  await page.screenshot({ path: "baidu.png" });
  // Close the browser
  await browser.close();
})();
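To see what async/await saves here, the same steps can be written with plain promise chains, as you would have to without it. This sketch is for comparison only. screenshotWithThen is a hypothetical name, and the puppeteer module is passed in as a parameter. Note that browser has to be declared outside the chain so that the final step can still reach it.

```javascript
// then-chain.js: the index.js example rewritten without async/await.
// screenshotWithThen is a hypothetical name used only for comparison.

function screenshotWithThen(puppeteer, url, path) {
  let browser; // declared outside the chain so the last step can close it
  return puppeteer
    .launch()
    .then((b) => {
      browser = b;
      return browser.newPage();
    })
    // goto and screenshot both need `page`, so they are nested together
    .then((page) => page.goto(url).then(() => page.screenshot({ path })))
    .then(() => browser.close());
}

module.exports = { screenshotWithThen };
```

Each intermediate value has to be threaded through the chain by hand. With await, each value is just an ordinary local variable.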

That is all it takes to implement a simple example with Puppeteer.
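A natural next step is to control the screenshot more precisely. The sketch below uses three real Puppeteer options: page.setViewport(), the waitUntil option of page.goto(), and the fullPage option of page.screenshot(). captureFullPage is a hypothetical helper name, and the 1280x800 viewport is an arbitrary choice.

```javascript
// fullpage.js: a sketch that sets the viewport size and captures the whole
// scrollable page instead of only the visible area. captureFullPage and the
// 1280x800 size are illustrative choices, not values from this post.

async function captureFullPage(puppeteer, url, path) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Size of the visible area, in CSS pixels
    await page.setViewport({ width: 1280, height: 800 });
    // Treat navigation as finished once there have been no more than
    // 2 network connections for at least 500 ms
    await page.goto(url, { waitUntil: "networkidle2" });
    // fullPage: true captures the full scrollable height, not just 800 px
    await page.screenshot({ path, fullPage: true });
  } finally {
    await browser.close();
  }
}

module.exports = { captureFullPage };

// Usage:
//   captureFullPage(require("puppeteer"), "http://www.baidu.com", "baidu-full.png");
```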

Overall, it is not complicated to use. The official documentation is good, and there is even a Chinese version, which deserves praise! While learning, I will mainly refer to the documentation.

Official online API documentation: zhaoqize.github.io/puppeteer-a…