When you come across a favorite online document, do you ever wish you could save it and study it at your own pace? If you have no ready-made tool for that, this article introduces two JavaScript libraries that, with a little wrapping, let you capture whatever page you want.

Use PhantomJS

1. Easy to use

    const phantom = require('phantom');

    (async function() {
        const instance = await phantom.create();
        const page = await instance.createPage();
        // Log every resource the page requests
        await page.on('onResourceRequested', function(requestData) {
            console.info('Requesting', requestData.url);
        });
        const status = await page.open('http://jartto.wang');
        await page.render('jarttoTest.pdf');
        await instance.exit();
    })();

That is a complete example. Zooming in on the core step, written here with the callback-style API of a PhantomJS script:

    page.open(address, function (status) {
        if (status !== 'success') {
            // --- Error opening the webpage ---
            console.log('Unable to load the address!');
        } else {
            // --- Wait briefly so rendering can complete ---
            window.setTimeout(function () {
                page.render(output);
                phantom.exit();
            }, 200);
        }
    });

Putting the pieces together gives a reasonable pattern:

    var page = require('webpage').create();
    page.open('http://jartto.wang', function(status) {
        setTimeout(function() {
            if (status === 'success') {
                page.render('test.png');
                phantom.exit();
            } else {
                console.log('Page failed to load.');
                // Exit here too, otherwise the PhantomJS process keeps running
                phantom.exit(1);
            }
        }, 200);
    });

Because the page may still be rendering when the open callback fires, the render call is delayed with setTimeout to give it time to finish.
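The same idea carries over to the promise-based phantom package used in the first example. A minimal sketch, assuming two hypothetical helpers named delay and renderAfterDelay and a 200 ms default that mirrors the snippet above. The delay is a heuristic, not a guarantee that rendering has finished:

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper around a page object from the `phantom` npm package.
// It relies only on page.open(url) and page.render(file), as used earlier.
async function renderAfterDelay(page, url, output, waitMs = 200) {
  const status = await page.open(url);
  if (status !== 'success') {
    throw new Error('Unable to load ' + url);
  }
  await delay(waitMs); // give the page time to finish rendering
  await page.render(output);
  return output;
}
```

Inside the async function of the first example, this could be called as: await renderAfterDelay(page, 'http://jartto.wang', 'jarttoTest.pdf');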

As a bonus, if the target site checks the user agent (UA), set it explicitly:

    var page = require('webpage').create();
    // PhantomJS reads the user agent from page.settings
    page.settings.userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36';
    page.open('https://securelearning.in', function() {
        page.render('image.png');
        phantom.exit();
    });

I will not go further here; see the Demo for more details.

Use Puppeteer

If PhantomJS is so good, why learn about Puppeteer?

The answer is its backing: Puppeteer is the official headless Chrome tool from the Google Chrome team. That makes Puppeteer the first choice, both for future development and for completeness.

So what can Puppeteer do?

  • Generate screenshots and PDFs
  • Crawl site content
  • Simulate login, automatic form submission, UI testing, keyboard input, and so on
  • Run tests directly in the latest version of Chrome, using the latest JavaScript and browser features
  • Capture a timeline trace of a website to help diagnose performance problems
  • Test Chrome extensions

With that said, let’s get started quickly and get a taste of it.

1. Install

    npm i puppeteer

2. Precautions

Installation may not go smoothly, because Puppeteer depends on a Chromium download, so you may run into errors like this:


The right approach:

    # Skip the Chromium download (Windows cmd syntax; use export on macOS/Linux)
    set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
    npm i puppeteer
    # or install without running the install scripts
    npm i --save puppeteer --ignore-scripts

If that still fails, download Chromium manually from the official website. There are more pitfalls than this one; interested readers can see the article on Puppeteer exception handling.

3. Download the core package

    npm i puppeteer-core

The puppeteer-core package, available since version 1.7, does not download Chromium by default. Instead, it drives an installed browser or a remote browser. For details, see puppeteer vs puppeteer-core.
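The two options, an installed browser or a remote one, could be wrapped as below. This is a minimal sketch under stated assumptions: openBrowser and its wsEndpoint option are hypothetical names, and the puppeteer argument is the module object from require('puppeteer-core'), passed in by the caller. It relies only on puppeteer.launch with executablePath and puppeteer.connect with browserWSEndpoint.

```javascript
// Hypothetical helper: `puppeteer` is the module object from
// require('puppeteer-core'), passed in by the caller.
async function openBrowser(puppeteer, { wsEndpoint, executablePath } = {}) {
  if (wsEndpoint) {
    // Attach to a browser that is already running elsewhere.
    return puppeteer.connect({ browserWSEndpoint: wsEndpoint });
  }
  if (!executablePath) {
    throw new Error('puppeteer-core needs executablePath or wsEndpoint');
  }
  // Start a locally installed Chrome or Chromium.
  return puppeteer.launch({ executablePath, headless: true });
}
```

For a local Chromium this might be called as: const browser = await openBrowser(require('puppeteer-core'), { executablePath: 'chromium/Chromium.app/Contents/MacOS/Chromium' });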

4. Examples

Generate a snapshot of a site:

    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('http://jartto.wang');
        await page.screenshot({path: 'blog.png'});
        await browser.close();
    })();

Generate a PDF from a site URL:

    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        // networkidle2: wait until the network is nearly idle
        await page.goto('https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pdf', {waitUntil: 'networkidle2'});
        await page.pdf({path: 'jartto.pdf', format: 'A4'});
        await browser.close();
    })();

Set viewport:

    await page.setViewport({width: 1024, height: 880});

Set cookies:

    const COOKS = [
        {
            'domain': 'jartto.wang',
            'name': 'user',
            'value': 'jartto',
        }
    ];
    await page.setCookie(...COOKS);
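The viewport, cookie, and PDF steps above can be combined into the kind of simple wrapper the introduction mentions. A minimal sketch, assuming a hypothetical function name savePdf and a hypothetical options object; the browser argument is whatever puppeteer.launch() returned. Cookies are set before navigation, so each one needs a domain (or url) field, as in the COOKS example.

```javascript
// Hypothetical wrapper: `browser` comes from puppeteer.launch().
async function savePdf(browser, { url, path, viewport, cookies = [] }) {
  const page = await browser.newPage();
  try {
    if (viewport) await page.setViewport(viewport);
    // Cookies set before navigation must carry a domain or url field.
    if (cookies.length > 0) await page.setCookie(...cookies);
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.pdf({ path, format: 'A4' });
    return path;
  } finally {
    await page.close(); // release the tab even if a step fails
  }
}
```

A call might look like: await savePdf(browser, { url: 'http://jartto.wang', path: 'jartto.pdf', viewport: { width: 1024, height: 880 }, cookies: COOKS });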

Emulate an iPhone X and take a screenshot:

    const devices = require('puppeteer/DeviceDescriptors');
    const puppeteer = require('puppeteer');

    (async () => {
        // Create a browser instance
        const browser = await puppeteer.launch({
            executablePath: 'chromium/Chromium.app/Contents/MacOS/Chromium',
            headless: true
        });
        const page = await browser.newPage();
        await page.emulate(devices['iPhone X']);
        await page.goto('http://jartto.wang');
        await page.screenshot({path: 'temp/iphonex.png'});
        await browser.close();
    })().catch(error => console.log('error: ', error.message));

If you use a local Chromium, remember to set executablePath: 'chromium/Chromium.app/Contents/MacOS/Chromium'.

Even more interesting, we can search for a keyword and take a snapshot of the result, for example:

    const devices = require('puppeteer/DeviceDescriptors');
    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch({
            executablePath: 'chromium/Chromium.app/Contents/MacOS/Chromium',
            headless: true
        });
        const page = await browser.newPage();
        await page.emulate(devices['iPhone X']);
        await page.goto('https://www.baidu.com/');
        // ID of the input box, and the search keyword jartto
        await page.type('#index-kw', 'jartto');
        // Click the search button
        await page.click('#index-bn');
        // Wait for the navigation to finish
        await page.waitForNavigation({timeout: 3000});
        await page.screenshot({path: 'temp/search.png'});
        await browser.close();
    })().catch(error => console.log('error: ', error.message));

Running the code above opens the Baidu search engine and searches for the keyword jartto. Once the navigation completes, a screenshot is taken and saved as an image. The core code:

    // ID of the input box, and the search keyword jartto
    await page.type('#index-kw', 'jartto');
    // Click the search button
    await page.click('#index-bn');
    // Wait for the navigation to finish
    await page.waitForNavigation({timeout: 3000});

Be sure to find the correct id of each element. In Chrome Developer Tools, switch to mobile mode to inspect the elements as the emulated device sees them.
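One caveat with clicking first and waiting afterwards: if the navigation finishes before waitForNavigation is called, the wait can time out. The Puppeteer documentation suggests starting the wait together with the click. A minimal sketch, assuming a hypothetical helper named searchAndCapture with hypothetical option names; the page argument is a Puppeteer page:

```javascript
// Hypothetical helper: `page` is a Puppeteer Page.
async function searchAndCapture(page, options) {
  const { inputSelector, buttonSelector, keyword, path, timeout = 3000 } = options;
  await page.type(inputSelector, keyword);
  // Start waiting before clicking so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ timeout }),
    page.click(buttonSelector),
  ]);
  await page.screenshot({ path });
  return path;
}
```

With the selectors from the example above, a call might look like: await searchAndCapture(page, { inputSelector: '#index-kw', buttonSelector: '#index-bn', keyword: 'jartto', path: 'temp/search.png' });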

The blog snapshot on the left and the search jump snapshot on the right:


If these examples are not enough, there is an online Demo, as well as one I wrote myself.

5. Related resources

There is no shortage of learning resources, and working from the basics toward the advanced material gets better results for less effort, so I have sorted them into stages.

  • The first stage
    • Online demo
    • Understanding Puppeteer
    • Introduction and practice of Puppeteer
    • Chromium download pitfalls
    • Generate a PDF file using Puppeteer
  • The second stage
    • In-depth Puppeteer
    • Puppeteer exception handling
  • The third stage
    • Chinese documentation
    • Puppeteer API

Start your learning journey step by step.

Summary

This article covers two ways to capture a URL and generate a PDF from it. These libraries can do far more than that, so there are plenty of other interesting things to try. Set one up now; trust me, you will need it some day.