What is Puppeteer

Puppeteer is a Node library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can also be configured to run Chrome or Chromium with a full GUI.

Those familiar with crawlers or UI automation may think of PhantomJS, CasperJS, or Selenium. Puppeteer is produced and maintained by the Chrome DevTools team, so it has a clear advantage over those tools in terms of completeness, stability, compatibility, security, and performance.

What Puppeteer can do

In theory, almost anything we can do manually in Chrome can also be done with Puppeteer. For example:

  • Take screenshots of pages and elements
  • Save the page as a PDF
  • Crawl the content of SPA (Single-Page Application) websites and generate pre-rendered content, i.e. SSR (Server-Side Rendering)
  • UI automation testing, automatic filling/submission of forms, simulated UI input
  • Test the latest JavaScript and Chrome features
  • Performance testing: capture a timeline trace to locate website performance problems
  • Test Chrome extensions

Of course, Puppeteer is not a universal solution. For example, it lacks cross-browser coverage: Firefox support is currently only experimental, so to test your site’s browser compatibility you still need a tool such as Selenium/WebDriver. Puppeteer focuses on integrating tightly with Chromium to provide richer and more reliable functionality.

Install Puppeteer

npm i puppeteer

or

yarn add puppeteer

When Puppeteer is installed, it downloads a recent version of Chromium (~170MB on Mac, ~282MB on Linux, ~280MB on Windows) that is guaranteed to work with that Puppeteer release. We can also skip the Chromium download, or download a different version of Chromium to a specific path; both are configured through environment variables (see Environment Variables).

puppeteer-core

puppeteer-core is a lightweight version of Puppeteer that does not download Chromium by default. You can use it to drive a locally installed Chrome or to connect to a remote one.

npm i puppeteer-core

or

yarn add puppeteer-core

When using puppeteer-core, make sure its version is compatible with the version of Chrome it connects to.

puppeteer-core ignores all PUPPETEER_* environment variables.

For a detailed comparison of puppeteer and puppeteer-core, see: puppeteer vs puppeteer-core.
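
As a rough illustration of "locally or remotely", the sketch below chooses between starting a locally installed Chrome and attaching to a browser that is already running. The helper name openBrowser, the Chrome path, and the idea of passing the module in as a parameter are assumptions made for this example; it relies only on the launch({ executablePath }) and connect({ browserWSEndpoint }) calls.

```javascript
// Hypothetical helper: pick between a local Chrome binary and a remote browser.
// `puppeteer` is expected to be the module returned by require("puppeteer-core").
function openBrowser(puppeteer, { wsEndpoint, executablePath } = {}) {
  if (wsEndpoint) {
    // Attach to a browser that is already running, possibly on another machine.
    return puppeteer.connect({ browserWSEndpoint: wsEndpoint });
  }
  if (executablePath) {
    // Start a locally installed Chrome; puppeteer-core downloads nothing itself.
    return puppeteer.launch({ executablePath });
  }
  throw new Error("Provide either wsEndpoint or executablePath");
}

// Usage inside an async function (the path is a placeholder, adjust to your setup):
// const puppeteer = require("puppeteer-core");
// const browser = await openBrowser(puppeteer, {
//   executablePath: "/usr/bin/google-chrome",
// });
```

Both branches return a promise that resolves to a browser object, so the rest of a script can stay the same regardless of where Chrome actually runs.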

Usage examples

Example 1: visit example.com and take a screenshot of the page

Create screenshot.js:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await page.screenshot({ path: "example.png" });

  await browser.close();
})();

Run screenshot.js:

node screenshot.js


Puppeteer’s default viewport is 800x600px, which determines the size of the page screenshot. We can use page.setViewport() to change it, for example to 1080p:

await page.setViewport({
  width: 1920,
  height: 1080,
});

To capture the full scrollable page rather than only the visible viewport, use:

await page.screenshot({ fullPage: true });

Example 2: visit github.com/puppeteer/p… and save the page as a PDF file

Create savePDF.js:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  await page.goto("https://github.com/puppeteer/puppeteer", {
    waitUntil: "networkidle2",
  });
  await page.pdf({
    path: "puppeteer.pdf",
    format: "a2",
  });

  await browser.close();
})();

Run savePDF.js:

node savePDF.js


See page.pdf() for more PDF generation options.

Example 3: execute JavaScript in the context of the browser

Create get-dimensions.js:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // Get the "viewport" of the page, as reported by the page.
  const dimensions = await page.evaluate(() => {
    return {
      width: document.documentElement.clientWidth,
      height: document.documentElement.clientHeight,
      deviceScaleFactor: window.devicePixelRatio,
    };
  });

  console.log("Dimensions:", dimensions);

  await browser.close();
})();

Run get-dimensions.js:

node get-dimensions.js


For more uses of evaluate, see page.evaluate().
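
Because the function passed to page.evaluate() runs in the browser rather than in Node.js, it cannot see Node-side variables through a closure; values have to be handed over as extra arguments. The sketch below imitates that boundary in plain Node.js with no browser involved. simulateEvaluate is a hypothetical stand-in written for this article, and rebuilding the function from its source text plus JSON-copying the arguments is only an approximation of what Puppeteer does.

```javascript
// Hypothetical stand-in for page.evaluate(), for illustration only.
// It rebuilds the function from its source text (which drops closures) and
// JSON-copies the arguments, roughly like crossing the Node/browser boundary.
function simulateEvaluate(fn, ...args) {
  const rebuilt = new Function(`return (${fn.toString()});`)();
  const copiedArgs = JSON.parse(JSON.stringify(args));
  return rebuilt(...copiedArgs);
}

function demo() {
  const selector = "h1"; // exists only in the Node.js scope

  // Works: the value is handed over explicitly as an argument.
  const passed = simulateEvaluate((sel) => `looking for ${sel}`, selector);

  // Fails: the rebuilt function no longer closes over `selector`.
  let closureError = null;
  try {
    simulateEvaluate(() => `looking for ${selector}`);
  } catch (err) {
    closureError = err.name;
  }
  return { passed, closureError };
}

console.log(demo());
```

With real Puppeteer the same rule applies: write page.evaluate((sel) => ..., selector) instead of referring to selector directly inside the callback.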

Example 4: automatically fill in a form and submit it (on developers.google.com, type the keyword "Headless Chrome" into the page's search box and search)

Create search.js:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // GUI mode
  });
  const page = await browser.newPage();
  await page.goto("https://developers.google.com/web/");
  // Enter the keyword in the search box
  await page.type(".devsite-search-field", "Headless Chrome");
  // Press Enter
  await page.keyboard.press("Enter");
  // Wait for the results to load
  const resultsSelector = ".gsc-result .gs-title";
  await page.waitForSelector(resultsSelector);
  // Extract the results from the page
  const links = await page.evaluate((resultsSelector) => {
    const anchors = Array.from(document.querySelectorAll(resultsSelector));
    return anchors.map((anchor) => {
      const title = anchor.textContent.split("|")[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);
  // Print the results
  console.log(links.join("\n"));

  await browser.close();
})();

Run search.js:

node search.js


Debugging tips

Puppeteer offers strong support for debugging. Some commonly used techniques are listed below.

1. Turn off “headless” mode – seeing what is displayed in the browser is very helpful for debugging

const browser = await puppeteer.launch({ headless: false });

2. Turn on “slow motion” mode – see more clearly what is happening in the browser

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // Slow down Puppeteer operations by 250ms
});

3. Listen to the output in the browser console

page.on("console", (msg) => console.log("PAGE LOG:", msg.text()));

await page.evaluate(() => console.log(`url is ${location.href}`));

4. Use the debugger in code executed in the browser

There are two execution contexts: the Node.js context, which runs the test code, and the browser context, which runs the code under test. We can use page.evaluate() to insert a debugger statement into the browser context for debugging:

  • First set {devtools: true} when launching Puppeteer:

    const browser = await puppeteer.launch({ devtools: true });
  • Then insert a debugger statement in the code passed to evaluate() so that Chromium pauses at that point:

    await page.evaluate(() => {
      debugger;
    });

5. Enable verbose logging – internal DevTools protocol traffic is logged via the debug module under the puppeteer namespace

Basic usage:

DEBUG=puppeteer:* node screenshot.js

On Windows, cross-env can be used:

npx cross-env DEBUG=puppeteer:* node screenshot.js

Protocol traffic can be quite noisy, so we can filter out all Network domain messages:

env DEBUG=puppeteer:\* env DEBUG_COLORS=true node ./examples/screenshot.js 2>&1 | grep -v '"Network'
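
Where grep is not available (for example in a plain Windows shell), the same filtering idea can be sketched in Node.js. The helper name dropNetworkLines and the sample lines are made up for illustration, and real debug output will look different; the function simply mirrors grep -v '"Network' by dropping every line that contains that text.

```javascript
// Hypothetical helper mirroring `grep -v '"Network'`: keep only the lines
// that do not contain the text `"Network`.
function dropNetworkLines(text) {
  return text
    .split("\n")
    .filter((line) => !line.includes('"Network'))
    .join("\n");
}

// Made-up sample lines for illustration; real debug output looks different.
const sample = [
  'SEND {"id":1,"method":"Page.enable"}',
  'RECV {"method":"Network.requestWillBeSent"}',
  'SEND {"id":2,"method":"Page.navigate"}',
].join("\n");

console.log(dropNetworkLines(sample));
```

Only the two Page lines remain after filtering. To use this on real output, the captured log text would be passed to the function in place of the sample.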

6. Use the ndb tool for debugging. For details, see ndb.

Resources

  1. Puppeteer’s official website
  2. API documentation
  3. Examples
  4. GitHub – Awesome Puppeteer
  5. Troubleshooting

Demo code for this article: github.com/MudOnTire/p…