1. What is puppeteer?

Puppeteer is Google's official Node library for headless Chrome. See the puppeteer API documentation and the puppeteer GitHub repository.

Official Introduction:

Most of the things you can do manually in your browser can be done using Puppeteer!

Generate screenshots and PDFs of pages.

Crawl an SPA and generate pre-rendered content (i.e. "SSR").

Automate form submission, UI testing, keyboard input, etc.

Create an up-to-date, automated testing environment. Run tests directly in the latest version of Chrome using the latest JavaScript and browser features.

Capture a timeline trace of your site to help diagnose performance issues.

Test Chrome extensions.

2. Crawl a site and generate PDFs

2.1 Install puppeteer

    # Installing puppeteer may fail for network reasons. The taobao mirror can be used instead:
    # npm install -g cnpm --registry=https://registry.npm.taobao.org
    npm i puppeteer
    # or "yarn add puppeteer"

2.2 About the React.js Little Book

Introduction to the React.js Little Book

About the author: @Beard Big Ha

This is a little book about React.js. Because I use React.js all the time at work, I had long wanted to summarize my own knowledge of and experience with React.js. So I gradually organized those ideas, wrote them down, and turned them into an open-source, free, professional, and simple introductory book for the community. I hope it helps more people who are just getting started with React.js.

Below are some screenshots from the React.js Little Book:

2.3 Some puppeteer APIs that may be used

    // Create reactMiniBook.js and run `node reactMiniBook.js` to generate the PDF
    const puppeteer = require('puppeteer');

    (async () => {
      // Launch the browser
      const browser = await puppeteer.launch({
        // Headless mode, default true. Set to false to watch the browser operate.
        // PDF generation is currently only supported in headless mode.
        // headless: false,
        // Developer debug mode (the usual F12 DevTools panel), default false
        // devtools: true,
      });
      // Open a new page
      const page = await browser.newPage();
      // Go to the page http://huziketang.mangojuice.top/books/react/
      await page.goto('http://huziketang.com/books/react/', { waitUntil: 'networkidle2' });
      // Save the page as a PDF; path is the output file
      await page.pdf({ path: 'react.pdf', format: 'A4' });
      // Close the browser
      await browser.close();
    })();

That covers the main flow: launch the browser, open a page, and close the browser. Let's look at a few more APIs.

    const args = 1;
    let wh = await page.evaluate((args) => {
      // args can be passed into the function this way,
      // similar to setTimeout(() => { console.log(args); }, 3000, args);
      console.log('args', args); // 1
      // DOM operations and other JS can run here
      // Return data obtained through DOM operations
      return {
        width: 1920,
        height: document.body.clientHeight,
      };
    }, args);
    // Set the viewport size
    await page.setViewport(wh);
    // Wait 2 s (page.waitFor was deprecated in later puppeteer releases)
    await page.waitFor(2000);

    // Emulate an iPhone X
    const devices = require('puppeteer/DeviceDescriptors');
    const iPhone = devices['iPhone X'];
    await page.emulate(iPhone);

2.4 With these APIs in hand, start writing the main program

Briefly, here are the functionality and the main flow to implement (compare with the screenshots of the React.js Little Book above):

1, Launch the browser and open the React.js Little Book page.
2, Jump to the "1. React.js introduction" page and get the href and title of every navigation link on the left.
3, Loop over the resulting array of links with a for loop. The loop mainly does the following:

3.1 Hide the left navigation so the PDF is easier to generate
3.2 Add ordinals to titles such as "React.js introduction" so they are easier to browse
3.3 Set document.title. A page module is also hidden here, because the PDF header and footer already carry the book's link information.
3.6 Append a statement at the end noting that the PDF is for learning and exchange only and that commercial use is strictly prohibited
3.7 Return the width and height, used to set the viewport size
3.8 Set the viewport size, then create the PDF file

4, Close the browser
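The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual code: the selectors `.nav` and `.nav a`, the helper names `numberedTitle`, `pdfFileName` and `generateSectionPdfs`, and numbering the sections from 1 are all made up for the example. Only steps 2, 3.1, 3.2, 3.3, 3.7, 3.8 and 4 are shown.

```javascript
// Sketch of the per-section loop. Selectors and helper names are
// hypothetical; adjust them to the real page structure.
const path = require('path');

const NAV_SELECTOR = '.nav';        // assumed selector of the left navigation
const NAV_LINK_SELECTOR = '.nav a'; // assumed selector of its links

// Step 3.2: prefix a title with its ordinal.
function numberedTitle(index, title) {
  return `${index}. ${title.trim()}`;
}

// Build a PDF file name, replacing characters that are unsafe in file names.
function pdfFileName(index, title) {
  const safe = title.trim().replace(/[\\/:*?"<>|]/g, '_');
  return `${index}. ${safe}.pdf`;
}

// `puppeteer` is passed in, so the helpers above work without it installed.
async function generateSectionPdfs(puppeteer, startUrl, outDir) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(startUrl, { waitUntil: 'networkidle2' });

    // Step 2: collect href and title of every navigation link.
    const links = await page.evaluate((selector) => {
      return Array.from(document.querySelectorAll(selector)).map((a) => ({
        href: a.href,
        title: a.textContent,
      }));
    }, NAV_LINK_SELECTOR);

    // Step 3: visit each link in turn.
    for (let i = 0; i < links.length; i += 1) {
      const index = i + 1;
      const title = numberedTitle(index, links[i].title);
      await page.goto(links[i].href, { waitUntil: 'networkidle2' });

      // Steps 3.1, 3.3 and 3.7 run inside the page.
      const wh = await page.evaluate((navSelector, newTitle) => {
        const nav = document.querySelector(navSelector);
        if (nav) nav.style.display = 'none';
        document.title = newTitle;
        return { width: 1920, height: document.body.clientHeight };
      }, NAV_SELECTOR, title);

      // Step 3.8: set the viewport size, then write the PDF.
      await page.setViewport(wh);
      await page.pdf({
        path: path.join(outDir, pdfFileName(index, links[i].title)),
        format: 'A4',
      });
    }
  } finally {
    // Step 4: close the browser even if one of the pages fails.
    await browser.close();
  }
}

module.exports = { numberedTitle, pdfFileName, generateSectionPdfs };
```

It would be called as `generateSectionPdfs(require('puppeteer'), 'http://huziketang.com/books/react/', './pdfs')`, provided the output folder already exists. Wrapping the loop in try/finally is a design choice of this sketch, so that a failing page does not leave a headless Chrome process behind.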

The code: see here for the crawler that generates a PDF for each section of the React.js Little Book.

    # Run the file with node
    # In the author's case: node src/puppeteer/reactMiniBook.js

This generates a PDF for each section (sections 0-46), as shown below.

Once these are generated, one problem remains: having to open a separate file for every section is very inconvenient. So the next step is to merge these PDFs into a single PDF file.

3. Merge into one PDF file with pdf-merge

At first, I used the online site Smallpdf to merge the PDFs. The merged result was quite good, and the site has other features too, such as Word to PDF. Later I found pdf-merge, an npm package provided by the community. (After all, I write programs, so I did it in code.)

pdf-merge depends on PDFtk.

Windows: download and install PDFtk. After installing it, I had to restart the computer before it could be used.

Debian, Ubuntu: it was usable right after I installed it on Ubuntu.

    apt-get install pdftk
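Since pdf-merge cannot work without PDFtk, it may help to check up front that the executable can be started. The following is a minimal sketch using only Node's built-in `child_process` module. The function name `hasPdftk` and its `command` parameter are made up for this example, and it assumes the executable is called `pdftk` and is on the PATH.

```javascript
const { spawnSync } = require('child_process');

// Returns true when the given executable can be started from PATH.
// `command` defaults to 'pdftk' and is a parameter only to ease testing.
function hasPdftk(command = 'pdftk') {
  const result = spawnSync(command, ['--version'], { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the process cannot be spawned.
  return !result.error;
}

if (!hasPdftk()) {
  console.warn('pdftk could not be started; install it before using pdf-merge.');
}
```

On Windows this is also a quick way to tell whether the restart mentioned above is still needed.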

Usage example

    const PDFMerge = require('pdf-merge');

    const files = [
      `${__dirname}/1.pdf`,
      `${__dirname}/2.pdf`,
    ];

    // Buffer (default)
    PDFMerge(files)
      .then((buffer) => { /* ... */ });

    // Stream
    PDFMerge(files, { output: 'Stream' })
      .then((stream) => { /* ... */ });

    // Save as new file (note the backticks, so that __dirname is interpolated)
    PDFMerge(files, { output: `${__dirname}/3.pdf` })
      .then((buffer) => { /* ... */ });

With that in mind, you can start writing the main program.

1, Read the paths of all generated PDF files and sort them (0-46)
2, Check whether the output folder exists; if it does not, create it
3, Merge the section PDFs and save them to a new file: React little book (full version) - author: Beard Big Ha - timestamp.pdf
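These three steps might look roughly like this. It is a sketch under assumptions rather than the author's program: the names `sortSectionFiles` and `mergeSections` are made up, and it assumes every section file name starts with its section number (for example `12. something.pdf`). The only pdf-merge call used is the `PDFMerge(files, { output })` form from the usage example.

```javascript
const fs = require('fs');
const path = require('path');

// Step 1: keep only .pdf files and sort them by their leading number,
// so that "10. x.pdf" comes after "2. y.pdf" (a plain string sort would not).
function sortSectionFiles(names) {
  return names
    .filter((name) => name.toLowerCase().endsWith('.pdf'))
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10));
}

async function mergeSections(inputDir, outputDir, outputName) {
  const files = sortSectionFiles(fs.readdirSync(inputDir))
    .map((name) => path.join(inputDir, name));

  // Step 2: create the output folder if it does not exist yet.
  fs.mkdirSync(outputDir, { recursive: true });

  // Step 3: merge and save. pdf-merge is required here so that the sorting
  // helper can be used without it; it needs pdftk installed.
  const PDFMerge = require('pdf-merge');
  const output = path.join(outputDir, outputName);
  await PDFMerge(files, { output });
  return output;
}

module.exports = { sortSectionFiles, mergeSections };
```

A call such as `mergeSections('./pdfs', './output', 'merged-' + Date.now() + '.pdf')` would then write the timestamped file. Sorting numerically matters because a default string sort would put section 10 before section 2.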

See here for the code that merges the PDFs generated by the React.js Little Book crawler.

The final merged PDF file is available for download. GitHub download link: React little book (full version) - author: Beard Big Ha.

I had hoped to add bookmarks and page numbers as well, but did not find a suitable way to generate them, so they are left out for now. If any reader has a good approach, please get in touch with the author.

Summary

1, Puppeteer is Google's official headless Chrome Node library. Most of the operations you can perform manually in the browser can be performed using Puppeteer, and it can be used to do a lot of interesting things.
2, This article uses puppeteer to generate a PDF for each section and merges them into a new PDF file with the pdf-merge npm package, which depends on PDFtk. Alternatively, a site such as Smallpdf can do the merge.
3, I recommend the React.js Little Book to everyone. Generating a PDF of it with a crawler should do the author @Beard Big Ha no harm. Writing a book for the community is not easy, so please support the author as much as you can.

Finally, here are a few recommended links for learning puppeteer: an ES6-standard introduction to puppeteer, and a Chinese version of the puppeteer API.

About

Author: goes by the name Ruochuan. A front-end enthusiast who knows little and so keeps studying. Personal blog; segmentfault front view column; gold column; github front view column; github blog (stars are welcome).

WeChat public account: Ruochuan Vision

You can also add WeChat ID Ruochuan12 and mention where you came from to be invited to the front view communication group.