Converting HTML to PDF: 4 solutions and their implementation

Crazy Geek

Original article:
https://blog.risingstack.com/…

This article was first published on the WeChat public account jingchengyideng.

In this article, I’ll show you how to generate PDF documents from a complex React page using Node.js, Puppeteer, Headless Chrome, and Docker.

Background: A few months ago, a client asked us to develop a feature that lets users get the content of a React page in PDF format. The page is basically a report on a patient's case with data-visualization results, and it contains a lot of SVG. There were also special requests to manipulate the layout and reorder some HTML elements, so the PDF needed different styling and extra content compared to the original React page.

Since this task was much more complex than something simple CSS rules could solve, we first explored the possible implementations. We found three main solutions. This blog post walks you through their possibilities and the final implementation.

Contents:

Is it generated on the client side or the server side?
Scenario 1: Make screenshots from the DOM
Scenario 2: Use the PDF library only
Final Solution 3: Node.js, Puppeteer, and Headless Chrome
- Style control
- Send the file to the client and save it
Use Puppeteer in Docker
Plan 3 + 1: CSS printing rules
Conclusion

Is it generated on the client side or the server side?

PDF files can be generated on both the client and server sides. But it might make more sense to let the back end handle it, because you don’t want to exhaust all the resources that the user’s browser has to offer.

Even so, I'll show you solutions for both approaches.

Scenario 1: Make screenshots from the DOM

At first glance this solution seems to be the simplest, and it turns out to be true, but it has its limitations. If you don't have special requirements, such as selectable or searchable text in the PDF, it is a good and simple approach.

The method is straightforward: create a screenshot of the page and put it in a PDF file. We can do this with two packages:

html2canvas, which generates a screenshot from the DOM
jsPDF, a library for generating PDFs

Start coding:

npm install html2canvas jspdf

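import html2canvas from 'html2canvas'
import jsPdf from 'jspdf'

function printPDF () {
    const domElement = document.getElementById('your-id')
    html2canvas(domElement, {
        // Hide the print button in the cloned document before the capture
        onclone: (clonedDocument) => {
            clonedDocument.getElementById('print-button').style.visibility = 'hidden'
        }
    })
    .then((canvas) => {
        const imgData = canvas.toDataURL('image/png')
        const pdf = new jsPdf()
        // Assumption: scale the image to the page width, keeping its aspect ratio
        const width = pdf.internal.pageSize.getWidth()
        const height = (canvas.height * width) / canvas.width
        pdf.addImage(imgData, 'PNG', 0, 0, width, height)
        pdf.save('your-filename.pdf')
    })
}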

That's it!

Note the onclone option of html2canvas. It comes in handy when you need to manipulate the DOM (for example, to hide the print button) before taking the screenshot. I have seen many projects that use this package. Unfortunately, it is not what we wanted, because we needed to create the PDF on the back end.
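
The width and height passed to addImage decide how large the screenshot appears on the page. Below is a minimal sketch of one way to compute them so that the image fits on a single page without distortion. The helper name fitToPage and the sample numbers are assumptions made for illustration; they are not part of html2canvas or jsPDF.

```javascript
// Hypothetical helper (not part of html2canvas or jsPDF): scale a captured
// canvas so it fits inside a PDF page while keeping its aspect ratio.
function fitToPage (canvasWidth, canvasHeight, pageWidth, pageHeight) {
  if (canvasWidth <= 0 || canvasHeight <= 0) {
    throw new RangeError('canvas dimensions must be positive')
  }
  // Use the smaller of the two ratios so that neither side overflows the page
  const scale = Math.min(pageWidth / canvasWidth, pageHeight / canvasHeight)
  return { width: canvasWidth * scale, height: canvasHeight * scale }
}

// Example: a tall 1200 x 3000 px capture placed on an A4 page (210 x 297 mm)
const size = fitToPage(1200, 3000, 210, 297)

// Usage with jsPDF (inside the html2canvas callback):
//   pdf.addImage(imgData, 'PNG', 0, 0, size.width, size.height)
```

Fitting a very long page onto one sheet makes the content small. Splitting the capture across several pages is another option, which this sketch does not cover.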

Scenario 2: Use the PDF library only

There are several libraries on npm, such as jsPDF (mentioned above) or PDFKit. The problem with them is that I would have to recreate the page structure if I wanted to use them. That definitely hurts maintainability, because I would need to apply every subsequent change to both the PDF template and the React page.

Look at the code below. You need to manually create the PDF document yourself. You need to traverse the DOM and find each element and convert it to PDF format, which is a lot of work. A simpler way must be found.

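const fs = require('fs')
const PDFDocument = require('pdfkit')

const doc = new PDFDocument()
doc.pipe(fs.createWriteStream('output.pdf'))

// Embed a font, set the font size, and render some text
doc.font('fonts/PalatinoBold.ttf')
   .fontSize(25)
   .text('Some text with an embedded font!', 100, 100)

// Add an image, constrained to a given size
doc.image('path/to/image.png', {
   fit: [250, 300],
   align: 'center',
   valign: 'center'
})

// Add another page
doc.addPage()
   .fontSize(25)
   .text('Here is some vector graphics...', 100, 100)

doc.end()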

This code snippet comes from the PDFKit documentation. It can be useful if your goal is to generate a PDF file directly, rather than to convert an existing (and constantly changing) HTML page.

Final Solution 3: Node.js based Puppeteer and Headless Chrome

What is Puppeteer? Its documentation says:

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs Chrome or Chromium in headless mode by default, but it can also be configured to run in full (non-headless) mode.

It is essentially a browser that runs from Node.js. If you read the documentation, one of the first things it says is that you can use Puppeteer to generate screenshots and PDFs of pages. Excellent! That’s exactly what we want.

Start by installing Puppeteer with npm i puppeteer, then implement our feature.

const puppeteer = require('puppeteer')
 
async function printPDF() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://blog.risingstack.com', {waitUntil: 'networkidle0'});
  const pdf = await page.pdf({ format: 'A4' });
 
  await browser.close();
  return pdf
}

This is a simple function that navigates to a URL and generates a PDF file of the site.

First, we launch the browser (PDF generation is supported in headless mode only), then we open a new page and navigate to the given URL.

Setting the waitUntil: 'networkidle0' option means that Puppeteer considers navigation finished when there have been no network connections for at least 500 milliseconds. (See the API docs for more information.)

After that, we save the PDF as a variable, close the browser, and return the PDF.

Note: The page.pdf method accepts an options object, and you can use the 'path' option to save the file to disk. If no path is provided, the PDF is not saved to disk; you get a buffer instead. (I'll discuss how to handle it later.)
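
To make the two modes concrete, here is a small sketch of a helper that builds the options object for page.pdf. The path, format, and printBackground keys are documented page.pdf options; the helper name buildPdfOptions and its defaults are assumptions chosen for this example.

```javascript
// Hypothetical helper: build the options object passed to page.pdf().
// The function name and the default values are assumptions for this sketch.
function buildPdfOptions (outputPath) {
  const options = { format: 'A4', printBackground: true }
  // Only set `path` when the caller wants the file written to disk
  if (outputPath) {
    options.path = outputPath
  }
  return options
}

// Usage inside printPDF():
//   const pdf = await page.pdf(buildPdfOptions())   // kept in memory only
//   await page.pdf(buildPdfOptions('report.pdf'))   // also written to report.pdf
```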

If you need to log in to generate a PDF from a protected page, first you need to navigate to the login page, check the ID or name of the form element, fill them out, and submit the form:

await page.type('#email', process.env.PDF_USER)
await page.type('#password', process.env.PDF_PASSWORD)
await page.click('#submit')

Always keep your login credentials in your environment variables and do not hard code them!
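
One way to follow this advice is to read both values in a single place and fail early when one of them is missing. The following is only a sketch. The helper name getPdfCredentials is hypothetical, and PDF_USER and PDF_PASSWORD are the variable names used in the snippet above.

```javascript
// Hypothetical helper: read the login credentials from the environment and
// fail early with a clear message if one of them is missing.
function getPdfCredentials (env = process.env) {
  const user = env.PDF_USER
  const password = env.PDF_PASSWORD
  if (!user || !password) {
    throw new Error('PDF_USER and PDF_PASSWORD must be set')
  }
  return { user, password }
}

// Usage before filling in the login form:
//   const { user, password } = getPdfCredentials()
//   await page.type('#email', user)
//   await page.type('#password', password)
```

Failing at startup is usually easier to debug than a login form that is silently submitted with undefined values.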

Style control

Puppeteer also has a solution for this kind of style manipulation. You can insert style tags before generating the PDF, and Puppeteer will generate a file with the modified styles.

await page.addStyleTag({ content: '.nav { display: none} .navbar { border: 0px} #print-button {display: none}' })
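
When the list of elements to hide grows, the CSS string can be generated instead of written by hand. Here is a minimal sketch, assuming a hypothetical helper called buildHideCss. The selectors are the ones from the example above, and the !important flag is an addition of this sketch to win over page styles.

```javascript
// Hypothetical helper: turn a list of CSS selectors into rules that hide them.
function buildHideCss (selectors) {
  return selectors
    .map((selector) => `${selector} { display: none !important; }`)
    .join('\n')
}

const printCss = buildHideCss(['.nav', '#print-button'])

// Usage before calling page.pdf():
//   await page.addStyleTag({ content: printCss })
```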

Send the file to the client and save it

Okay, now you’ve generated a PDF file on the back end. What’s next?

As mentioned above, if you don’t save the file to disk, you will get a buffer. You just need to send a buffer with the appropriate content type to the front end.

printPDF().then(pdf => {
    res.set({ 'Content-Type': 'application/pdf', 'Content-Length': pdf.length })
    res.send(pdf)
})
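
The header handling can also be pulled into a small function. The sketch below is one possible shape, not the original implementation: toPdfResponse is a hypothetical name, the Content-Disposition header is an optional addition, and the conversion with Buffer.from is there because, depending on the Puppeteer version, page.pdf may resolve with a plain Uint8Array rather than a Node.js Buffer.

```javascript
// Hypothetical helper: normalise the page.pdf() result and build the headers.
function toPdfResponse (pdf, filename = 'report.pdf') {
  // Buffer.from also accepts a plain Uint8Array and copies its bytes
  const body = Buffer.isBuffer(pdf) ? pdf : Buffer.from(pdf)
  return {
    body,
    headers: {
      'Content-Type': 'application/pdf',
      'Content-Length': body.length,
      // Optional: suggests a file name when the browser saves the response
      'Content-Disposition': `attachment; filename="${filename}"`
    }
  }
}

// Usage in an Express route handler:
//   const { headers, body } = toPdfResponse(await printPDF())
//   res.set(headers)
//   res.send(body)
```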

Now, you can get the generated PDF by simply sending a request to the server in your browser.

function getPDF() {
 return axios.get(`${API_URL}/your-pdf-endpoint`, {
   responseType: 'arraybuffer',
   headers: {
     'Accept': 'application/pdf'
   }
 })
}

Once the request has been sent, the contents of the buffer should start downloading. The final step is to convert the buffer data to a PDF file.

savePDF = () => {
  this.openModal('Loading...') // open modal
  return getPDF() // API call
    .then((response) => {
      const blob = new Blob([response.data], { type: 'application/pdf' })
      const link = document.createElement('a')
      link.href = window.URL.createObjectURL(blob)
      link.download = 'your-file-name.pdf'
      link.click()
      this.closeModal() // close modal
    })
    .catch((err) => { /* error handling */ })
}

<button onClick={this.savePDF}>Save as PDF</button>

That's it! If you click the Save button, the browser will save the PDF.

Use Puppeteer in Docker

I think this is the trickiest part of the implementation, so let me save you a few hours of searching.

The official documentation states that “using Headless Chrome in Docker and getting it running can be very tricky”. The documentation has a troubleshooting section where you can find all the necessary information about installing Puppeteer with Docker.

If you install Puppeteer on an Alpine image, be sure to scroll down a bit when you reach that part of the page. Otherwise you might miss the fact that you cannot run the latest Puppeteer version and that you need to disable /dev/shm usage with a flag:

const browser = await puppeteer.launch({
  headless: true,
  args: ['--disable-dev-shm-usage']
});

Otherwise, the Puppeteer child process may run out of memory before it starts properly.
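
If the same code runs both locally and in a container, the launch options can be derived from the environment. This is only a sketch under stated assumptions: buildLaunchOptions, IN_DOCKER, and CHROMIUM_PATH are hypothetical names invented for this example, while headless, args, and executablePath are regular puppeteer.launch options.

```javascript
// Hypothetical helper: choose Puppeteer launch options for the environment.
function buildLaunchOptions (env = process.env) {
  const options = { headless: true, args: [] }
  // IN_DOCKER is an assumed variable that you would set in your Dockerfile
  if (env.IN_DOCKER === 'true') {
    // Makes Chrome write shared memory files to /tmp instead of /dev/shm
    options.args.push('--disable-dev-shm-usage')
  }
  // CHROMIUM_PATH is an assumed variable pointing at a system-installed
  // Chromium, for example one installed through the image's package manager
  if (env.CHROMIUM_PATH) {
    options.executablePath = env.CHROMIUM_PATH
  }
  return options
}

// Usage:
//   const browser = await puppeteer.launch(buildLaunchOptions())
```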

Plan 3 + 1: CSS printing rules

One might think that simply using CSS print rules is the easy route from a developer's point of view. No npm modules, just plain CSS. But how well do they fare in terms of cross-browser compatibility?

When you choose CSS print rules, you have to test the result in every browser to make sure it produces the same layout, and it is not 100% consistent.

For example, inserting a break-after after a given element is hardly an advanced technique, but you might be surprised to find that you need workarounds to make it work in Firefox.

Unless you are a seasoned CSS expert with plenty of experience creating printable pages, this can be very time-consuming.

Print rules are useful if you can keep the print stylesheet simple.

Let’s look at an example.

@media print {
  .print-button {
    display: none;
  }

  .content div {
    break-after: always;
  }
}

The CSS above hides the print button and inserts a page break after every div inside an element with the content class. There's a great article summarizing what you can do with print rules and what the difficulties are, including browser compatibility.

All things considered, CSS printing rules are very effective if you want to generate PDFs from less complex pages.

Conclusion

Let's quickly review the options covered above for generating a PDF file from an HTML page:

Generating screenshots from the DOM: This can be useful when you need to create snapshots of a page (for example, to create thumbnails), but it falls short when you have a lot of data to handle.
PDF library only: This is the perfect solution if you want to create PDF files programmatically from scratch. Otherwise, you need to maintain both the HTML and the PDF templates, which is definitely a no-go.
Puppeteer: Although it was relatively difficult to get working in Docker, it gave the best result for our use case and was also the easiest to code.
CSS printing rules: If your users are educated enough to know how to print page content to a file, and your pages are relatively simple, this is probably the most painless solution. As you saw, that was not the case for us.

Happy printing!
