puppeteer

Official Chinese document

The Demo project

The structure

Puppeteer launches a headless Chrome browser process and lets you control that browser from code

yarn add puppeteer
yarn add puppeteer-core

Puppeteer's API mirrors the structure of a browser (in the diagram, the faded parts are browser structures that Puppeteer does not implement yet):


Browser

Creating an instance

Use puppeteer.launch(options?) to create a browser instance

Once the instance is created, you must close it manually when the page operations are finished. Otherwise instances pile up and cause memory leaks

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch()
  // operations...
  await browser.close()
}

init()
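The advice about always closing the instance can be made a little more robust with try/finally, so the browser is closed even when an operation throws. This is only a sketch and not part of the original demo: `withBrowser` is a hypothetical helper name, and `launcher` is assumed to be the puppeteer module (any object with a `launch()` method works).

```javascript
// Hypothetical helper: run `task` with a fresh browser and always close it.
// `launcher` is assumed to be the puppeteer module (any object with launch()).
const withBrowser = async (launcher, task, options = {}) => {
  const browser = await launcher.launch(options)
  try {
    return await task(browser)
  } finally {
    // Runs on success and on error, so no browser process is left behind
    await browser.close()
  }
}

// Usage (assumes puppeteer is installed):
// const puppeteer = require('puppeteer')
// withBrowser(puppeteer, async (browser) => {
//   const page = await browser.newPage()
//   await page.goto('https://www.bilibili.com/')
// })
```

Passing the launcher in as an argument keeps the helper independent of how the browser is created.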

For visual debugging, you can turn off headless mode and add a delay to each operation

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 100,
})

Multiple instances can be created

const init = async () => {
  const browserA = await puppeteer.launch()
  const browserB = await puppeteer.launch()
}

init()

Connecting to an instance

When a browser instance already exists there is no need to create a new one; use puppeteer.connect(options?) to connect to it

The connection target is the browser's WebSocket endpoint, which browser.wsEndpoint() returns

// Effect: the browser opens once, then new tabs keep opening and closing in that same browser
const puppeteer = require('puppeteer')

let browserWSEndpoint = ''

const init = async () => {
  const browser = await puppeteer.launch()

  // Do not close the browser here: puppeteer.connect needs it to keep running
  browserWSEndpoint = browser.wsEndpoint()
  // e.g. ws://127.0.0.1:62989/devtools/browser/<browser-id>
  console.log(browserWSEndpoint)
}

init()

setInterval(async () => {
  const browser = await puppeteer.connect({ browserWSEndpoint })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com/')
  await page.close()
}, 5000)
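When a client connects to a shared browser, it can detach with browser.disconnect() instead of closing the browser, so the process stays available for the next connection. The sketch below is my own illustration, not the article's code: `runOnSharedBrowser` is a hypothetical helper, and `connector` is assumed to be the puppeteer module (any object with a `connect()` method).

```javascript
// Hypothetical helper: borrow an already running browser for one task.
// `connector` is assumed to be the puppeteer module (any object with connect()).
const runOnSharedBrowser = async (connector, browserWSEndpoint, task) => {
  const browser = await connector.connect({ browserWSEndpoint })
  const page = await browser.newPage()
  try {
    return await task(page)
  } finally {
    await page.close()
    // disconnect() detaches this client only; the browser process keeps running
    await browser.disconnect()
  }
}

// Usage (assumes puppeteer is installed and an endpoint was obtained from browser.wsEndpoint()):
// runOnSharedBrowser(require('puppeteer'), browserWSEndpoint, (page) => page.goto('https://www.bilibili.com/'))
```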

BrowserContext

When creating a browser instance, a default browser context is automatically created, which is the most basic execution environment

Cookies and caches are not shared between different browser contexts

Use browser.createIncognitoBrowserContext() to create a new browser context

Each context is identified by an _id; the _id of the default context is null

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch()
  const defaultContext = browser.defaultBrowserContext()
  const newContextA = await browser.createIncognitoBrowserContext()
  const newContextB = await browser.createIncognitoBrowserContext()
  const contexts = await browser.browserContexts()

  // Verify the contexts
  console.log(defaultContext._id)               // null
  console.log(newContextA._id)                  // B071B622CF995C82950F273E59F70B34
  console.log(newContextB._id)                  // FBC321F5FEF60814255BEF7CF23E6C37

  // Verify the context array
  console.log(contexts[0] === defaultContext)   // true
  console.log(contexts[1] === newContextA)      // true
  console.log(contexts[2] === newContextB)      // true
  console.log(contexts[3])                      // undefined

  await browser.close()
}

init()

browser.browserContexts(): returns an array of all open browser contexts

browser.defaultBrowserContext(): returns the default browser context

The browser instance holds references to all of its browser contexts

console.log(browser._defaultContext === defaultContext)                 // true
console.log(browser._contexts.get(newContextA._id) === newContextA)     // true
console.log(browser._contexts.get(newContextB._id) === newContextB)     // true

A browser context corresponds to a browser window: in visual debugging, n contexts open n windows. A window only opens once its context has a page, just as a browser needs an initial page when it starts

A browser instance does not itself represent a browser window. A window appears when the instance is created only because the instance comes with a default context and a default page

// The default window has two tabs (one comes by default) and the newContextA window has one
const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const newContextA = await browser.createIncognitoBrowserContext()
  const newContextB = await browser.createIncognitoBrowserContext()

  const defaultPage = await browser.newPage()
  const pageA = await newContextA.newPage()
}

A browser instance actually represents a browser process. The two snippets below show the difference between multiple browser instances and multiple contexts in a single browser instance

// Two browser instances
const init = async () => {
  const browserA = await puppeteer.launch({ headless: false })
  const browserB = await puppeteer.launch({ headless: false })
}

// One browser instance with an additional context
const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const newContext = await browser.createIncognitoBrowserContext()
}


Page

A page runs in a browser context and is created as a tab with browserContext.newPage(). Just as a browser window can hold multiple tabs, a context can create multiple pages

A page created from the browser instance belongs to the default context: browser.newPage() is shorthand for browser._defaultContext.newPage()

const init = async () => {
  const browser = await puppeteer.launch()
  const newContextA = await browser.createIncognitoBrowserContext()
  const newContextB = await browser.createIncognitoBrowserContext()

  const defaultPage = await browser.newPage()
  const pageA = await newContextA.newPage()
  const pageB = await newContextB.newPage()

  console.log(defaultPage.browser() === browser)      // true
  console.log(pageA.browser() === browser)            // true
  console.log(pageB.browser() === browser)            // true

  await browser.close()
}

page.browser(): returns the browser instance that the current page belongs to


Frame

A Frame represents the actual content of a Page, and every Page has a default mainFrame. Generally speaking, browser page-level operations live in the Page class and in-page operations live in the Frame class

For example, setting page cookies with page.setCookie(...cookies) and finding an element with page.mainFrame().$(selector) belong to different carriers, but Puppeteer lets you shorten page.mainFrame().$(selector) to page.$(selector). So there is no need to think too much about the difference; both can simply be treated as operating on the page


ExecutionContext

It represents a JS execution context, which comes into play when local code is injected into the page for execution

Using injection

Use page.mainFrame().evaluate(pageFunction, ...args?) to inject local code; it is likewise abbreviated to page.evaluate

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com/')

  const data = 'data'

  await page.evaluate(() => {
    alert(data)   // ReferenceError: data is not defined
  })
}

The code above throws an error: the function runs in the execution context of the web page rather than locally, so the local declaration of data cannot be found
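One way around this is to pass the local value as an extra argument to page.evaluate, which serializes it and hands it to the page function. A minimal sketch, assuming `page` is a Puppeteer Page; `greetInPage` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: hand a local value to code that runs inside the page.
// Extra arguments after the function are serialized and passed to it in the page.
const greetInPage = (page, data) =>
  page.evaluate((value) => {
    // This body runs in the web page; `value` is a copy of the local `data`
    return `page received: ${value}`
  }, data)

// Usage (inside an async function, with a real Puppeteer page):
// const message = await greetInPage(page, 'data')
// console.log(message)
```

The page function only sees its own parameters, so anything it needs has to arrive this way or be declared inside it.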


Script features

Recording scripts

If you want to get a feel for writing scripts, the Chrome extension Headless Recorder can record your operations and turn them into a Puppeteer script


Page navigation

The API:

  • page.goto(url, options?): navigates to the specified address
  • page.goForward(options?): navigates to the next page in history
  • page.goBack(options?): navigates to the previous page in history
  • page.url(): returns the URL of the current page

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com')
}

init()

For easier debugging, you can load a local HTML file instead

const puppeteer = require('puppeteer')
const path = require('path')

const filePrefix = 'file://'
const templatePath = `${filePrefix}${path.resolve(__dirname, './index.html')}`

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto(templatePath)
  // operations...
  await page.close()
  await browser.close()
}

init()

Page setup

The API:

  • page.setViewport(viewport): sets the page size
  • page.cookies(...urls?): returns the cookies for the given URLs; by default, the cookies of the current page
  • page.setCookie(...cookies): sets cookies
  • page.deleteCookie(...cookies): deletes cookies
  • page.setExtraHTTPHeaders(headers): sets custom headers for all requests made by the page
  • page.setUserAgent(userAgent): sets the user agent
  • page.setGeolocation(options): sets the geolocation of the page
  • page.setJavaScriptEnabled(enabled): sets whether JavaScript is enabled
  • page.setRequestInterception(value): enables or disables request interception
  • page.setCacheEnabled(enabled?): sets whether request caching is enabled; it is enabled by default
  • page.setOfflineMode(enabled): sets whether offline mode is enabled

Note: page settings are generally configured before the page navigates
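The ordering in the note can be sketched as a small helper that applies the settings first and navigates last. This is an illustration only: `openConfigured` and the shape of its `settings` object are hypothetical, while the page methods it calls are the ones listed above.

```javascript
// Hypothetical helper: apply page settings first, navigate last.
const openConfigured = async (page, url, settings = {}) => {
  const { viewport, userAgent, headers } = settings
  if (viewport) await page.setViewport(viewport)
  if (userAgent) await page.setUserAgent(userAgent)
  if (headers) await page.setExtraHTTPHeaders(headers)
  // Navigation comes after every setting, so the first request already uses them
  await page.goto(url)
  return page
}

// Usage (inside an async function, with a real Puppeteer page):
// await openConfigured(page, 'https://www.bilibili.com', {
//   viewport: { width: 1024, height: 768 },
//   headers: { 'x-demo': '1' },
// })
```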

Device emulation

You can open a page with the emulator of a given device to get the matching responsive styles and behavior

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  // Open the page with the iPhone 6 emulator
  await page.emulate(puppeteer.devices['iPhone 6'])
  await page.goto('https://www.bilibili.com')
}

Request interception

If you only want to test part of a page's functionality, there is no need to load all of its static resources. In that case you can use request interception to block some of them

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  await page.setRequestInterception(true)     // Enable the interceptor

  page.on('request', (interceptedRequest) => {
    // Block PNG and JPG images
    if (interceptedRequest.url().endsWith('.png') || interceptedRequest.url().endsWith('.jpg')) {
      console.log(interceptedRequest.url())
      interceptedRequest.abort()              // Abort the request
    } else {
      interceptedRequest.continue()
    }
  })

  await page.goto('https://www.bilibili.com')
}
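Matching on the URL suffix misses images served without a file extension. As an alternative sketch (not the article's code), a request can be classified by request.resourceType() instead; `BLOCKED_TYPES`, `shouldBlock` and `blockStaticResources` are hypothetical names, and the set of blocked types is just an example choice.

```javascript
// Hypothetical choice of resource types to skip while testing
const BLOCKED_TYPES = new Set(['image', 'media', 'font'])

// Pure decision function, easy to check on its own
const shouldBlock = (resourceType) => BLOCKED_TYPES.has(resourceType)

// Hypothetical helper: block the selected resource types on a Puppeteer page
const blockStaticResources = async (page) => {
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort()
    } else {
      request.continue()
    }
  })
}

// Usage (inside an async function, before navigating):
// await blockStaticResources(page)
// await page.goto('https://www.bilibili.com')
```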

Script injection

The API:

  • page.evaluate(pageFunction, ...args?): executes the specified function in the page

  • page.evaluateHandle(pageFunction, ...args?): executes the specified function in the page

    Unlike page.evaluate, which returns only serializable data, this method returns a JSHandle

  • page.evaluateOnNewDocument(pageFunction, ...args?): executes the specified function in the page

    The function is called before any of the page's own scripts run, so it can be used to modify global properties

  • page.exposeFunction(name, puppeteerFunction): mounts the specified function on the page's window object

    The advantage of this approach is that the exposed function stays available even after the page has navigated multiple times

  • page.addScriptTag(options): creates a <script> tag in the page that loads a remote or local script

  • page.addStyleTag(options): creates a <link> or <style> tag in the page that loads remote or local style code

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com')
  await page.evaluate(() => {
    alert('Code executed on a web page')
  })
}

init()

If the injected script performs an asynchronous operation, return a promise so that the local code waits for it, because await only waits on a promise. Otherwise the local code carries on before the operation has finished

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com')
  await page.evaluate(() => {
    return new Promise((resolve) => {
      setTimeout(() => {
        console.log('Blocked for 10 seconds')
        resolve()
      }, 10000)
    })
  })

  console.log('Executed after the blocking operation in the page')
}

Because the script is executed in the web page, the injected code should be treated as a completely separate piece of code. Keep in mind:

  • Local variables cannot be accessed from the injected code; all the data the injected function needs must be passed in as arguments or declared inside it
  • The browser does not recognize require or import, so a local npm package cannot be used directly in the page. Instead, create a <script> tag in the page that points at the library's CDN link; the CDN address of a package can be looked up on BootCDN

So using third-party libraries in the page requires an extra import step

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com')

  // Load dayjs into the web page
  await page.addScriptTag({
    url: 'https://cdn.bootcdn.net/ajax/libs/dayjs/1.10.6/dayjs.min.js'
  })
  await page.evaluate(() => {
    const result = dayjs().format('YYYY-MM-DD')
    alert(result)
  })
}

If the CDN cannot be loaded properly, check the browser proxy

Alternatively, expose a library method on the window object

Note that a function added with page.exposeFunction returns a Promise when called in the page, so use await to get the final value

const puppeteer = require('puppeteer')
const dayjs = require('dayjs')

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto('https://www.bilibili.com')

  await page.exposeFunction('format', (rules) => {
    return dayjs().format(rules)
  })

  await page.evaluate(async () => {
    const result = await window.format('YYYY-MM-DD')
    alert(result)
  })
}

Page elements

The API:

  • page.content(): returns the complete HTML of the page

  • page.$(selector): finds the specified element with document.querySelector

  • page.$$(selector): finds the specified elements with document.querySelectorAll

  • page.$x(expression): finds the specified elements with XPath

  • page.$eval(selector, pageFunction, ...args?): injects a function into the page, runs document.querySelector, and passes the result to the function as its first argument

  • page.$$eval(selector, pageFunction, ...args?): injects a function into the page, runs document.querySelectorAll, and passes the result to the function as its first argument

  • page.click(selector, options?): clicks the element matched by the selector. If more than one element matches, only the first is used

  • page.tap(selector): taps the element matched by the selector. If more than one element matches, only the first is used

  • page.focus(selector): focuses the element matched by the selector. If more than one element matches, only the first is used

  • page.hover(selector): hovers over the element matched by the selector

  • page.type(selector, text, options?): types text into the element matched by the selector. If more than one element matches, only the first is used

Click navigation

If a click triggers a page navigation, wait for the navigation to complete so that the following steps see the new page

const [result] = await Promise.all([
  page.waitForNavigation(),
  page.click('.jump-btn')
])

Manipulating elements

Puppeteer does not provide many methods for manipulating elements, and since native JS can be injected into the page to manipulate them directly, you do not strictly need those methods anyway

The following code fills in and submits a web form

<!-- index.html -->
<div class="container">
  <form class="from">
    <input class="input-box" type="text" />
    <button class="btn" onclick="alert('button clicked')">submit</button>
  </form>
</div>

const puppeteer = require('puppeteer')
const path = require('path')

const filePrefix = 'file://'
const templatePath = `${filePrefix}${path.resolve(__dirname, './index.html')}`

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto(templatePath)

  await page.waitForSelector('.container')
  await page.evaluate(() => {
    const inputBox = document.querySelector('.input-box')
    const btn = document.querySelector('.btn')
    inputBox.value = 'Input content'
    btn.click()
  })
}

Waiting

The API:

  • page.waitForNavigation(options?): waits until the page navigation meets the given condition. By default that is when the load event fires

  • page.waitForXPath(xpath, options?): waits for the element matched by the XPath to appear in the page

  • page.waitForSelector(selector, options?): waits for the element matched by the selector to appear in the page

    Note that this method and the one above resolve as soon as the element is present in the DOM. You can configure them not to resolve while the element has display: none or visibility: hidden, but you cannot make them depend on whether the element is inside the viewport

  • page.waitForFunction(pageFunction, options?, ...args?): waits until the function, executed in the page context, returns a truthy value

  • page.waitForRequest(urlOrPredicate, options?): waits for a request made by the page that matches the URL or makes the predicate return a truthy value

  • page.waitForResponse(urlOrPredicate, options?): waits for a response received by the page that matches the URL or makes the predicate return a truthy value

  • page.waitFor(selectorOrFunctionOrTimeout, options?, ...args?): can act as page.waitForXPath, page.waitForSelector, page.waitForFunction, or a plain delay

    To avoid confusion, it is recommended to use it only for delays

The APIs that take a function execute it repeatedly until it returns a truthy value; until the condition is met, subsequent code is blocked

Do not explicitly return a falsy value from the function, as this may cause bugs. Simply decide whether to return a truthy value

All wait methods wait for 30s by default. If the condition is not met within this period, an error is thrown. You can configure a different timeout

Note: with the exception of page.waitForFunction, the function bodies run in the local context, so you can keep using the methods Puppeteer provides directly inside them
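The timeout behavior can be wrapped so that a missing element becomes a null result rather than an exception. This is a sketch under assumptions: `waitOrNull` is a hypothetical helper, the 5000 ms default is arbitrary, and it relies on the rejection from waitForSelector being an error whose name is TimeoutError.

```javascript
// Hypothetical helper: wait for a selector, but return null instead of throwing on timeout.
const waitOrNull = async (page, selector, timeout = 5000) => {
  try {
    // The timeout option (in milliseconds) overrides the 30s default
    return await page.waitForSelector(selector, { timeout })
  } catch (err) {
    // Assumption: the timeout rejection is an error named TimeoutError
    if (err.name === 'TimeoutError') return null
    throw err
  }
}

// Usage (inside an async function, with a real Puppeteer page):
// const el = await waitOrNull(page, '.container', 2000)
// if (!el) console.log('The container did not appear in time')
```

Errors other than the timeout are rethrown, so real failures are not silently swallowed.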

Waiting for elements

Continue only after the element's style makes it visible

<!-- index.html -->
<body>
  <div class="container"></div>

  <script defer>
    setTimeout(() => {
      const ele = document.querySelector('.container')
      ele.classList.add('show')
    }, 10000)
  </script>

  <style>
    .container {
      display: none;
    }

    .show {
      display: block;
    }
  </style>
</body>

const puppeteer = require('puppeteer')
const path = require('path')

const filePrefix = 'file://'
const templatePath = `${filePrefix}${path.resolve(__dirname, './index.html')}`

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  await page.goto(templatePath)

  await page.waitForSelector('.container', {
    visible: true
  }).then(() => {
    console.log('Container has appeared')
  })

  console.log('Wait over')
}

init()

Waiting for requests

Continue only after a specific request has received its response

The arguments passed to the waitForRequest and waitForResponse predicates are Request and Response instances respectively, so you can use their instance methods

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  // Start waiting before navigating so the response cannot be missed;
  // the predicate must return a truthy value for the wait to finish
  const [res] = await Promise.all([
    page.waitForResponse((res) => res.url() === 'https://api.bilibili.com/x/web-interface/nav'),
    page.goto('https://www.bilibili.com')
  ])

  const result = await res.json()
  console.log(result)     // e.g. { code: -101, message: ..., ttl: 1, data: { isLogin: false } }
}

Usage scenarios

Screenshots

Screenshots are generally taken with page.screenshot(options?)

The default screenshot size is 800 x 600. You can change the captured area by changing the page size

const puppeteer = require('puppeteer')

const init = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  await page.setViewport({
    width: 1024,
    height: 768
  })
  await page.goto('https://www.bilibili.com')

  await page.screenshot({
    path: `./imgs/${new Date().valueOf()}.png`
  })
}

If you want to capture the whole page, including the part that needs scrolling (a long screenshot), simply turn on the fullPage option

await page.screenshot({
  path: `./imgs/${new Date().valueOf()}.png`,
  fullPage: true
})

This works for pages that do not lazy-load their content. Pages that do lazy-load defeat it: to speed up the first screen, such a page only requests content once it detects that the content has entered the viewport. A Puppeteer screenshot does not trigger that check, so everything beyond the first screen is placeholder elements with no real content

Screenshot flow adjusted for lazy loading: open the site → inject a script that controls the page's scroll bar → wait for network requests → scroll down → repeat until the end of the page → take the screenshot

There is a problem, though: while operating the page, Puppeteer does not seem to offer a good way to tell whether new requests are still in flight, short of checking each specific request, which is clearly not realistic. So here I can only add a manual delay to wait for the page to load. It is not reliable, but there is no better way, so it will have to do

Here is a custom screenshot method. The design is as follows:

  1. You can connect to an existing browser instance or create a new one, and you can pass in the browser instance configuration
  2. Screenshot parameters can be passed in, such as the size of the screenshot window and the maximum height of the screenshot, or an option to capture all the way to the end of the page
  3. Infinite scrolling basically works by loading new data once the scroll bar reaches a certain distance, so a script is injected into the page to control the scroll bar directly
  4. Check whether the current scroll height is still less than the container height. If so, keep scrolling. If not, exit the loop and take the screenshot

Note that every step involving requests needs a manual delay to wait for the network requests to come back. For example, you have to wait after changing the scroll height; otherwise the new container height after new data loads cannot be read in time, which leads to a wrong decision. How long to wait depends on the network conditions

// screenshot.js
const puppeteer = require('puppeteer')
const path = require('path')

const screenshot = async (url, config = {}) => {
  let {
    viewPort = { width: 1920, height: 1080 },
    maxHeight = 1080,
    browserWSEndpoint = null,
    browserConfig = {},
    fullPage = false,
  } = config

  let browser = null

  if (fullPage) {
    maxHeight = Infinity   // Capture to the end of the page, however long it is; overrides maxHeight
  }

  if (browserWSEndpoint) {
    browser = await puppeteer.connect({ browserWSEndpoint })   // If a browser instance already exists, connect to it directly
  } else {
    browser = await puppeteer.launch(browserConfig)
  }

  const page = await browser.newPage()
  await page.setViewport(viewPort)

  // Navigate and wait until all three conditions are met
  await page.goto(url, {
    waitUntil: ['load', 'domcontentloaded', 'networkidle0']
  })

  const maxTime = Math.ceil(maxHeight / viewPort.height)

  const documentHeight = await page.evaluate(async (maxTime, height) => {
    for (let i = 1; i < maxTime; i++) {
      const curHeight = i * height

      window.scrollTo({
        left: 0,
        top: curHeight - height,
        behavior: 'smooth'
      })

      // Delay manually so requests triggered by scrolling can finish; otherwise the container height is not updated in time
      await new Promise((resolve) => {
        setTimeout(() => {
          resolve()
        }, 500)
      })

      // Read the container height every time, because it may change when new elements are loaded
      const documentHeight = document.documentElement.scrollHeight

      // If the current height reaches the total container height, the end of the page has been reached
      if (curHeight >= documentHeight) {
        return documentHeight
      }
    }
  }, maxTime, viewPort.height)

  const imgPath = path.resolve(__dirname, `./imgs/${new Date().valueOf()}.png`)

  await page.screenshot({
    path: imgPath,
    fullPage: fullPage,
    clip: fullPage ? undefined : {
      x: 0,
      y: 0,
      width: viewPort.width,
      height: documentHeight || maxHeight   // If the page is shorter than the maximum height, clip to the page height
    }
  })

  await page.close()
  await browser.close()

  return imgPath
}

module.exports = screenshot

The startup code is as follows

// app.js
const puppeteer = require('puppeteer')
const screenshot = require('./screenshot')

const initBrowser = async (browserConfig) => {
  const browser = await puppeteer.launch(browserConfig)
  return browser.wsEndpoint()
}

const app = async () => {
  const browserWSEndpoint = await initBrowser({ headless: false })
  const imgPath = await screenshot('https://bilibili.com', {
    browserWSEndpoint,
    maxHeight: 6000,
  })
  console.log(imgPath)
}

app()

The execution demo is as follows


Permissions

The browser that Puppeteer opens has no stored cookies, so sites that restrict access to logged-in users require a login

In theory an automatic login can be done by opening the site and typing in the account and password, but it will get stuck at the CAPTCHA step, because that is exactly what a CAPTCHA is there to prevent. Bringing in an image recognition algorithm to solve the CAPTCHA is of course out of scope here, so the simplest approach is:

Open the website for the first time → log in manually → get the cookies after login and store them → open the website later → read the stored cookies → set the cookies → refresh the page to be logged in

So we can wrap reading and setting cookies in a couple of methods, designed as follows:

  • Since the first login has to be done manually, a flag is needed to signal that the login is complete; until the flag is set, subsequent code is blocked. This can be implemented with page.waitForFunction. The code below waits for the user to log in until x is set manually in the DevTools console, that is, after logging in manually you have to set window.x in the console yourself. The same principle could use a callback on any URL requested after a successful login, but such a callback is not very universal, so the manual flag is used
  • After login the cookies are saved. Here writing to a local file simulates what could instead be a database
  • If the file does not exist when reading cookies, create it first

// login.js
const { writeFile, readFile } = require('fs/promises')

const saveCookie = async (page, website) => {
  // Block until window.x is set manually in the DevTools console after login
  await page.waitForFunction(() => {
    if (window.x) return true
  }, { timeout: 0 })

  const cookie = await page.cookies()

  // The cookies of all websites are written into the cookies folder
  await writeFile(`./cookies/${website}.json`, JSON.stringify(cookie))
}

const getCookie = async (website) => {
  const path = `./cookies/${website}.json`
  // The a+ flag creates the file if it does not exist yet
  const data = await readFile(path, {
    flag: 'a+'
  })

  // If no cookie has been stored yet, return nothing
  if (!data.toString()) return

  return JSON.parse(data.toString())
}

module.exports = { saveCookie, getCookie }
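A stored cookie file may contain entries that have expired by the time it is read again. As an optional extra step (my own sketch, not part of login.js), they can be filtered out before calling page.setCookie. `dropExpiredCookies` is a hypothetical helper; it assumes the cookie shape returned by page.cookies(), where `expires` is a Unix time in seconds and -1 marks a session cookie.

```javascript
// Hypothetical helper: keep only cookies that are still valid at `nowSeconds`.
// Assumes the shape returned by page.cookies(): `expires` is a Unix time in
// seconds, and -1 (or a missing field) means a session cookie with no fixed expiry.
const dropExpiredCookies = (cookies, nowSeconds = Date.now() / 1000) =>
  cookies.filter((cookie) =>
    cookie.expires === undefined ||
    cookie.expires === -1 ||
    cookie.expires > nowSeconds
  )

// Usage (inside an async function, with cookies read by getCookie):
// const valid = dropExpiredCookies(cookie)
// if (valid.length) await page.setCookie(...valid)
```

If every cookie has expired, the result is an empty array, which can be treated the same way as a missing file: log in manually again.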

The startup code is as follows:

const puppeteer = require('puppeteer')
const { saveCookie, getCookie } = require('./login')

const websites = {
  juejin: 'juejin'
}

const app = async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  await page.setViewport({
    width: 1920,
    height: 1080
  })

  await page.goto('https://juejin.cn/', {
    waitUntil: ['load', 'domcontentloaded', 'networkidle0']
  })

  // Get the cookies of the specified website; if none are stored locally, log in and store them
  let cookie = await getCookie(websites.juejin)

  if (!cookie) {
    console.log('Need to login...')
    await saveCookie(page, websites.juejin)
    cookie = await getCookie(websites.juejin)
  }

  await page.setCookie(...cookie)
  await page.reload()
}

app()

Because a manual login is required, fully headless operation is not possible from the start. You can run a normal login once and then switch to headless mode after the cookie data is available


Other

There are many other things Puppeteer can do, such as web crawling, page testing, and PDF generation, but most of it comes down to injecting JS and operating on the page freely, with little that is specific to Puppeteer. Those who want to get creative will find a way

For example, I want to hook Puppeteer up to a QQ robot: when I enter a command, it goes to the designated website and captures a picture of what I want, such as the trending topics on Weibo. It actually visits the website, saves the screenshot to a local file, and then sends the picture to QQ through the robot, which makes it very convenient to follow Weibo's real-time trends. One major advantage of a screenshot is that there is no need to analyze specific interfaces and process the data in detail; a picture shows everything directly

Afterword

That's all. In short, it is a fairly interesting library; give it a try if you are interested