How to use Puppeteer to automate Chrome in a Netlify serverless API

  • By Colby Fayock
  • August 26, 2021
    • automation
    • Hosting and deployment
    • Node.js
    • Tutorial

Automation often involves purely code-based tasks that never touch a browser, but some tasks require interacting with one, much like a human performing a search on a website. How can we leverage tools that automate browsers and package them into serverless API endpoints for easy access?

What’s inside 🧐

  • What is Puppeteer?
  • How can we use Puppeteer in Netlify serverless functions?
  • What are we going to build?
  • Step 0: Create a new Node.js project
  • Step 1: Install and configure the Netlify CLI
  • Step 2: Create a new serverless function
  • Step 3: Install Chrome and Puppeteer for serverless functions
  • Step 4: Set up a new browser using Puppeteer to retrieve page titles and SEO metadata
  • Step 5: Use site search to find content
  • Step 6: Deploy the function to Netlify
  • What can you do next?

What is Puppeteer?

Puppeteer is a JavaScript library from Google that allows developers to control Chrome through an API.

A common use case is testing with Puppeteer, where we can ensure that the operations we perform in the browser work as expected.

But we can also use it to automate tasks we want to perform programmatically in the browser. For example, we can use Puppeteer to open a website in Chrome and get its Lighthouse score (see the example from the Puppeteer team).

How can we use Puppeteer in Netlify serverless functions?

Puppeteer can open a visible browser UI or work “headless,” meaning it will run as a process without actually launching the UI.

This makes it ideal for running in places where you might not have a browser UI, such as a CI environment and, as you might have guessed, serverless functions.

Therefore, we can leverage this capability to build API endpoints that can perform operations using Puppeteer.

What are we going to build?

We will use Netlify to create a new serverless function to which we can send requests through API endpoints.

In this article, we’ll learn how to package all this into a serverless function that can be run on demand, using packages such as chrome-aws-lambda and Puppeteer.

We’ll use Netlify CLI to handle our function, but it works very similarly no matter how you run and package the function.

Step 0: Create a new Node.js project

For this project, we’re going to start from scratch, because we don’t need much boilerplate.

The good news is that because there isn’t much boilerplate, everything here should transfer to any project, so you should have no problem following along in an existing one.

First, let’s create a new directory for our project and navigate to it:

mkdir my-puppeteer-function 
cd my-puppeteer-function 

Note: Feel free to use a different name for your project!

We will then initialize a new Node.js project so that we can install the packages we need.

To create a new node project, run:

npm init

This will walk you through a series of questions about how you want to set up your project. Feel free to press Enter through all of them and accept the default values, as they are not important for this walkthrough.

Tip: You can always update these values later in package.json!

At this point, we have a new Node.js project that we can start working in.

I also recommend setting the project up as a GitHub repository. When doing so, make sure to add a .gitignore file in the root of the project that includes node_modules, to avoid committing those files.

To do this, create a file named .gitignore in the root directory and simply add:

node_modules

Now we should be ready to start digging!

Follow along with the commit!

Step 1: Install and configure the Netlify CLI

As I mentioned earlier, we will use the Netlify CLI to manage our functions. This involves installing the CLI as a global package through npm or yarn. If you’d rather avoid this route, you can also look at netlify-lambda, which you can install as a local package, but it may work differently.

You can find the full instructions and documentation on Netlify, but to get started we need to install the CLI package:

npm install netlify-cli -g 

After installation, you should be able to run the following command and see a list of available options:

netlify

While this alone will get you started with the CLI, I also recommend that you log in using your existing Netlify account.

This will let you link your project more easily later, when you want to deploy your functions.

You can do this by running:

netlify login

Netlify makes this process super easy: it opens a new browser window where you can authorize the CLI with your account, after which the CLI is logged in.

You can also try running the following command:

netlify dev

This should start a local server, but you’ll notice that it won’t do anything because we don’t have anything in the project, and that’s where we’ll start next!

Step 2: Create a new serverless function

Now that we’re digging into the code, we want to set up a new serverless function.

This breaks down into two parts:

  • The function itself, which consists of a file and a function handler
  • The Netlify configuration file (netlify.toml), which simply lets us point to the directory where we want to create our functions

To start with the function file itself, let’s create a new folder named functions in the root of our project and, inside it, add a file named meta.js (our first example will get some metadata from a web page).

Note: Feel free to use a directory name other than “functions” if you prefer, just be sure to use the same name for the rest of the walkthrough.

Inside functions/meta.js add:

exports.handler = async function(event, context) {
  return {
    statusCode: 200,
    body: JSON.stringify({
      status: 'Ok'
    })
  };
}

This will create a new asynchronous function that will act as our “handler,” which will basically run when we reach the endpoint.

Inside, we return a 200 status code, which means the request was successful, and a body with a simple status of “Ok.”
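Because the handler is just an async function, one quick way to sanity-check its response before starting the dev server is to call it directly in Node. The following is a minimal sketch that assumes the same handler shape as above; the empty event and context objects are placeholders, not what Netlify actually passes in.

```javascript
// A handler with the same shape as the one in functions/meta.js.
// In the real file it would be assigned to exports.handler.
const handler = async function (event, context) {
  return {
    statusCode: 200,
    body: JSON.stringify({ status: 'Ok' }),
  };
};

// Call the handler with placeholder arguments and print the parsed response.
handler({}, {}).then((response) => {
  console.log(response.statusCode, JSON.parse(response.body));
});
```

Note that the body is a string, so anything consuming the response has to parse it back into JSON.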

Now we need to create our configuration file before we can use it.

Create a new file named netlify.toml in the root directory of the project.

Inside netlify.toml add:

[build]
  functions = "functions"

This tells Netlify that we want to create our functions in a folder named “functions”!

And now, the moment we’ve all been waiting for.

We can launch our development server and see this work!

Run the following command:

netlify dev

You should see a few lines in the terminal stating that the CLI found your function and started the server on the specified port (8888 by default).

Start the local development server using the Netlify CLI

Netlify will even try to open it in a browser, but it won’t find anything because we don’t have any pages to display.

However, if we try to visit http://localhost:8888/.netlify/functions/meta, we should be able to see the JSON response in the browser!

Successful request to the serverless function endpoint

This may not seem like much, but we just created a new API endpoint where we can start writing custom code!
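Every function is served under the same /.netlify/functions/ path, whether on the local dev server or on a deployed site. As a small sketch of that convention, the hypothetical helper below (functionUrl is not part of the Netlify CLI) builds the endpoint URL for a given origin and function name.

```javascript
// Netlify serves each function under this path prefix.
const FUNCTIONS_PATH = '/.netlify/functions/';

// Build the full endpoint URL for a function on a given origin.
// `functionUrl` is a hypothetical helper used only for illustration.
function functionUrl(origin, name) {
  return new URL(FUNCTIONS_PATH + name, origin).href;
}

console.log(functionUrl('http://localhost:8888', 'meta'));
// → http://localhost:8888/.netlify/functions/meta
```

The same helper would work with a deploy URL as the origin once the function is live.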

Follow along with the commit!

Step 3: Install Chrome and Puppeteer for serverless functions

Now that we have a new serverless function and can see it working in the browser, we need to install the tools required to run Chrome and Puppeteer.

We will use two dependencies for this:

  • chrome-aws-lambda
  • puppeteer-core

Psst: Technically, we’ll use the third one, but we’ll see why later!

Our serverless function doesn’t have Chrome available by default, and we don’t really have a mechanism to “install” it. chrome-aws-lambda packages the Chromium binary so that we can use it as a node package alongside the other dependencies of our project.

puppeteer-core is the core functionality of Puppeteer; the biggest difference from the full puppeteer package is that it does not come with a browser. Since we need to provide our own browser via chrome-aws-lambda, we don’t want to add another browser to our bundle, because serverless functions are limited in file size.

Now that we know why we want to use these packages, let’s install them.

yarn add chrome-aws-lambda puppeteer-core
# or
npm install chrome-aws-lambda puppeteer-core

Once that’s done, we can delve into the actual code!

Follow along with the commit!

Step 4: Set up a new browser using Puppeteer to retrieve page titles and SEO metadata

First, we need to import our dependencies.

At the top of functions/meta.js add:

const chromium = require('chrome-aws-lambda');
const puppeteer = require('puppeteer-core');

Next, the way Puppeteer works is that we create the browser instance by associating it with the installed browser copy and launching it.

Add the following at the top of the handler function:

const browser = await puppeteer.launch({
  args: chromium.args,
  executablePath: await chromium.executablePath,
  headless: true,
});

await browser.close();

We’re using Puppeteer’s launch method and passing in the arguments (args) from our chromium package, an executable path (where the browser application is launched from) that the chromium package finds and determines for us, and a headless flag set to true because we don’t want to try to launch the UI.

Notice that at the end, we also call the close method. We want to make sure that we always clean up the browser to avoid hanging requests and wasted resources.
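If anything between launching and closing throws, the close call is never reached. One way to guarantee cleanup is a try/finally wrapper. This is a sketch, not part of the original walkthrough: withBrowser is a hypothetical helper, and launch stands in for any function that returns a browser, such as one wrapping puppeteer.launch with the options above.

```javascript
// Run `task` with a freshly launched browser and always close it afterwards.
// `launch` is any function returning a promise for a browser, for example
// () => puppeteer.launch({ args, executablePath, headless: true }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs whether `task` resolved or threw.
    await browser.close();
  }
}
```

Inside the handler, the page work would then go in the task callback, and whatever it returns becomes the result of withBrowser.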

Now, before we go any further, let’s make sure everything is working. When we run it, we won’t see anything actually “happening” because it’s running headless and we’re not doing anything with it yet, but we don’t want to see any errors either.

In your terminal, run:

netlify dev

Then try to open the function in your browser at http://localhost:8888/.netlify/functions/meta.

Uh oh, you’ll notice that we actually get an error!

Unfortunately, chrome-aws-lambda doesn’t “just work” when you try to run it locally. It should work if you deploy it as-is to Netlify, but that doesn’t do us much good if we can’t test it locally while we develop.

The good news is that we can override our executable path at local runtime by using environment variables to use our existing Chrome installation!

Note: chrome-aws-lambda has a workaround for running the project locally, which involves installing puppeteer as a development dependency, but I haven’t had much luck getting it to work on its own.

First, we’ll use the popular dotenv package, which is easy to set up. In your terminal, run:

yarn add dotenv
# or
npm install dotenv

Next, update the executablePath property in our launch options to:

executablePath: process.env.CHROME_EXECUTABLE_PATH || await chromium.executablePath,

This tells Puppeteer to first check whether the environment variable is set (locally) and, if not (in production), to fall back to the Chromium path.
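The fallback logic can be sketched on its own to make the order of precedence explicit. In this sketch, resolveExecutablePath is a hypothetical helper and getBundledPath stands in for reading chromium.executablePath; neither name comes from the packages themselves.

```javascript
// Pick the Chrome binary: the local override if set, otherwise the bundled one.
// `getBundledPath` stands in for `() => chromium.executablePath`.
async function resolveExecutablePath(env, getBundledPath) {
  if (env.CHROME_EXECUTABLE_PATH) {
    // Local development: use the Chrome installation we pointed to.
    return env.CHROME_EXECUTABLE_PATH;
  }
  // Production: no override, so ask the bundled Chromium package.
  return getBundledPath();
}
```

As with the || operator in the one-liner, an empty string counts as unset here, so a blank entry in the environment falls through to the bundled path.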

Now we need to set the environment variable.

In the root directory of the project, create a file named .env and add:

CHROME_EXECUTABLE_PATH="/path/to/chrome" 

Now that might be the tricky part, finding the path.

Fortunately, Chrome makes this a little easier. If we go to chrome://version/ in the browser, we should find a field named Executable Path, which is exactly what we need!

Here’s what mine looks like on a Mac:

Executable path of Chrome

So now we can insert this value into our environment variable:

CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" 

If we restart our development server so that the variable is picked up, we should now be able to refresh the endpoint in the browser and see our “Ok” status again!

Note: Before we continue, make sure to also add .env to your .gitignore file, as we don’t want to push it to the repository.

Now that our browser is working, we can start interacting with Puppeteer by creating a new page and navigating to the site of our choice.

Add the following under the browser constant:

const page = await browser.newPage(); 

await page.goto('https://spacejelly.dev/'); 

Note: Feel free to customize the URL to whatever you want!
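If you would rather choose the URL per request than hard-code it, one option is to read it from the query string. This is an optional sketch that goes beyond the walkthrough: getTargetUrl is a hypothetical helper, and it assumes the event object exposes queryStringParameters the way Lambda-style events do.

```javascript
// Read an optional ?url= query parameter, falling back to a default.
// Only http(s) URLs are accepted; anything else returns the fallback.
function getTargetUrl(event, fallback) {
  const params = event.queryStringParameters || {};
  if (!params.url) return fallback;
  try {
    const parsed = new URL(params.url);
    const allowed = parsed.protocol === 'http:' || parsed.protocol === 'https:';
    return allowed ? parsed.href : fallback;
  } catch (error) {
    // new URL() throws on strings it cannot parse.
    return fallback;
  }
}
```

The result could then be passed to page.goto in place of the fixed address. For a public endpoint, you would likely want to restrict this further to a list of sites you trust.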

If we try to run it, we still don’t see anything happening. Let’s solve this problem by looking for the page title and returning it with our data.

After we navigate to the site of our choice, add:

const title = await page.title(); 

In our return statement, add the following as a new property next to status:

body: JSON.stringify({
  status: 'Ok',
  page: {
    title
  }
})

This tells Puppeteer to get the page title, which we then return in our response.

Now, if we refresh the page in the browser, we should see our page title!

A meta-function that returns the page title using Puppeteer

We can extend this as far as we like using the Puppeteer API. For example, if we also wanted to get the meta description, we could add the following under the title:

const description = await page.$eval('meta[name="description"]', element => element.content); 

Note: There is no built-in method for the description like there is for the title, so we need to find the tag and evaluate it manually.
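One thing to keep in mind is that page.$eval rejects when no element matches the selector, so a page without a meta description would make the whole function fail. A minimal sketch of a more forgiving version, using a hypothetical getDescription helper, could look like this:

```javascript
// Return the meta description, or null when it cannot be read.
// page.$eval rejects if nothing matches the selector, so we catch that case.
async function getDescription(page) {
  try {
    return await page.$eval('meta[name="description"]', (element) => element.content);
  } catch (error) {
    // Note: this also swallows unrelated errors, which keeps the sketch short.
    return null;
  }
}
```

Returning null keeps the response shape stable, since JSON.stringify will still include the description key.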

As before, return it in our data:

page: {
  title,
  description
}

If we refresh the browser, we should now see the title and description!

Meta description in endpoint response

Follow along with the commit!

Step 5: Use site search to find content

The cool thing about Puppeteer is that we have a lot of functionality. We can interact with pages and really do a lot of things that real people would do on a web page.

To test this, let’s try an example of searching on spacejelly.dev and getting a list of results.

We’ll start by copying the current endpoint to create a new one.

In your project, copy functions/meta.js to a new file functions/results.js.

Most of the shell of the file will be the same, because we’ll create a new browser, just like we did with the metadata, only this time, we’ll search the page instead of getting the title and description!

In functions/results.js replace the title and description lines with:

await page.focus('#search-query')
await page.keyboard.type('api');

const results = await page.$$eval('#search-query + div a', (links) => {
  return links.map(link => {
    return {
      text: link.innerText,
      href: link.href
    }
  });
});

This makes the browser focus on the search input and type the query “api”, which brings up the search results on the client side.

Once available, we can find these results and evaluate them, grab the text and location within the link, and store it in a results variable.
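Since the results are rendered on the client, they may not exist yet at the moment we try to read them. The sketch below wraps the same steps in a hypothetical searchSite helper and adds a page.waitForSelector call before evaluating. That wait is an addition of this sketch, not part of the walkthrough, and the selectors are the ones used above.

```javascript
// Map anchor elements to plain objects; this callback runs inside the page,
// so it must not rely on variables from the surrounding Node code.
const toResults = (links) =>
  links.map((link) => ({ text: link.innerText, href: link.href }));

// `searchSite` is a hypothetical helper; `page` is a Puppeteer page.
async function searchSite(page, query) {
  const resultsSelector = '#search-query + div a';
  await page.focus('#search-query');
  await page.keyboard.type(query);
  // Not in the walkthrough: wait until at least one result link is rendered.
  await page.waitForSelector(resultsSelector);
  return page.$$eval(resultsSelector, toResults);
}
```

In the handler this would be called as searchSite(page, 'api') after navigating to the site. If a search has no matches, waitForSelector will eventually time out and reject, so you may want to catch that and return an empty list instead.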

Finally, let’s return it with our data. Update the return statement to:

return {
  statusCode: 200,
  body: JSON.stringify({
    status: 'Ok',
    results
  })
};

Now with our development server running, if we reach the endpoint, we should see our results!

Search results in the endpoint response

Follow along with the commit!

Step 6: Deploy the function to Netlify

Finally, we want to see this work in production, so let’s deploy it to Netlify.

Since we are using the Netlify CLI, this is actually quite easy to do from our terminal!

First, run:

netlify deploy

It first asks whether you want to link to an existing project or create a new one. If you are following along, you’ll probably want to create a new one. If you are working in an existing project, you may want to link to that project instead.

You will then select the team and site name for your Netlify account. You will also be asked for a publish directory; if you are following along, you can use the default directory, so just press Enter.

Preview the logs when deployed to Netlify

At this point, Netlify has deployed only a preview version, which you can view at the draft URL of the website.

To see this, we can get the URL and append the path to the function. In the example shown in the screen capture above, it looks like:

https://6127137f71ef564eb08211ac--my-puppeteer-function.netlify.app/.netlify/functions/meta 

This should work just like it did locally!

Note: I removed my deployment, so the link above doesn’t actually work!

If we are ready, we can deploy it into production using the following command:

netlify deploy --prod

Once that completes, we can see our new serverless functions deployed to Netlify, using Puppeteer and Chrome!

What can you do next?

More Puppeteer

There is a lot more to try in the Puppeteer library. If you can do it yourself in a browser, chances are you can find a way to do it using Puppeteer.

This is useful for things like testing, where you might want to make sure that a particular part of the site is working and want to check it through an endpoint. It’s also handy if you want to do some web scraping to get real-time data from a website. (Be ethical! 🧐)

pptr.dev/