When Puppeteer drives a headless browser to crawl data, native JS selectors are sufficient if only a few DOM nodes need to be retrieved.
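For the few-nodes case, a minimal sketch using Puppeteer's `page.$$eval` might look like the following. The helper name `collectImageSources` and the default selector `ul.user-list li img` are illustrative assumptions, not taken from any particular site.

```javascript
// Hypothetical helper: collect the src of every matching <img> using only
// native selectors. `page` is a Puppeteer Page; the selector is an assumption.
async function collectImageSources(page, selector = 'ul.user-list li img') {
  // page.$$eval runs querySelectorAll(selector) in the page and passes the
  // matched elements, as an array, to the callback.
  return page.$$eval(selector, (imgs) =>
    imgs.map((img) => img.getAttribute('src'))
  );
}
```

Calling `await collectImageSources(page)` after `page.goto(...)` would return a plain array of strings, since only serializable values can be returned from the page to Node.js.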

When there are more elements to fetch, I prefer to use jQuery, which I find the most efficient way to manipulate the DOM.

If the site you visit already references jQuery

If the target page already includes jQuery, you can declare a variable that points to window.$ and then manipulate the DOM directly with $.

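const userEl = await page.evaluate(() => {
  const arr = [];
  // Reuse the jQuery instance the page has already loaded
  const $ = window.$;
  $('ul[class=user-list]').find('li').each((index, item) => {
    const src = $(item).find('img').attr('src');
    arr.push(src);
    // Logged in the browser console, not in the Node.js terminal
    console.log(src);
  });
  return arr;
});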

In addition, I have found that on some sites jQuery cannot be obtained through window.$ even though the site already references jQuery.
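One way to see what a page actually exposes is to check both `window.jQuery` and `window.$` from inside `page.evaluate`. This is only a sketch, assuming that jQuery, when loaded, is still reachable as `window.jQuery` (for example after `jQuery.noConflict()` has released `$`); the helper name `detectJQuery` is hypothetical.

```javascript
// Hypothetical helper: report which globals on the page really are jQuery.
async function detectJQuery(page) {
  return page.evaluate(() => {
    // A real jQuery function carries its version string at fn.jquery
    const isJQuery = (f) =>
      typeof f === 'function' && !!f.fn && typeof f.fn.jquery === 'string';
    return {
      dollarIsJQuery: isJQuery(window.$),
      jQueryGlobal: isJQuery(window.jQuery),
      version: isJQuery(window.jQuery) ? window.jQuery.fn.jquery : null,
    };
  });
}
```

If `jQueryGlobal` is true while `dollarIsJQuery` is false, using `window.jQuery` instead of `window.$` inside `page.evaluate` may be enough.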

If the site does not reference jQuery, you can inject it yourself and use it for subsequent DOM manipulation.

// Inject jQuery into the page from a CDN
await page.mainFrame().addScriptTag({
  url: 'https://cdn.bootcss.com/jquery/3.2.0/jquery.min.js'
});
const userEl = await page.evaluate(() => {
  const arr = [];
  $('ul[class=user-list]').find('li').each((index, item) => {
    const src = $(item).find('img').attr('src');
    arr.push(src);
    console.log(src);
  });
  return arr;
});
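The two cases can also be combined so that jQuery is injected only when the page lacks it. The sketch below assumes the same CDN URL as the snippet above; the helper name `ensureJQuery` is hypothetical.

```javascript
// Hypothetical helper: inject jQuery only if the page does not already have it.
// The default URL is the CDN address used earlier in this post.
async function ensureJQuery(
  page,
  url = 'https://cdn.bootcss.com/jquery/3.2.0/jquery.min.js'
) {
  const present = await page.evaluate(
    () => typeof window.jQuery === 'function'
  );
  if (!present) {
    // addScriptTag resolves once the injected script has loaded
    await page.addScriptTag({ url });
  }
  return !present; // true when a script tag was injected
}
```

Calling `await ensureJQuery(page)` right after `page.goto(...)` leaves the rest of the scraping code the same for both kinds of site.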

Or use a third-party tool

  • puppeteer-jquery
  • www.npmjs.com/package/pup…

If this is your first time installing Puppeteer, change the download mirror address first

  • npm config set puppeteer_download_host=npm.taobao.org/mirrors

  • Install the packages

  • yarn add puppeteer puppeteer-jquery

const puppeteer = require('puppeteer');
const $jquery = require('puppeteer-jquery');
const { pageExtend, PageEx } = $jquery;

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const pageOrg = await browser.newPage();
  // The key step: wrap the original page so it gains the jQuery() method
  const page = pageExtend(pageOrg);

  await page.jQuery('body').append(`<h1>Title</h1>`);

  const title = await page.jQuery('h1').text();

  const text = await page.jQuery('body button:last')
    .closest('div')
    .find('h3')
    .css('color', 'yellow')
    .parent()
    .find(':last')
    .text();
})();