This article mainly introduces a common Puppeteer-based SEO solution for Vue, React, and other single-page applications.

I recently finished the first version of my blog system and accidentally learned that some search engines, such as Baidu, do not index single-page applications. The reason is simple: Baidu's crawler does not execute JavaScript, so data fetched via Ajax requests is never captured, and all the crawler sees is an empty shell. There are plenty of SEO solutions for single-page applications and plenty of tutorials online, such as SSR and pre-rendering, but none of them fits into an already developed project without more or less modification. Hence, Puppeteer.



Puppeteer is a Node library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. Most things you can do manually in the browser can be done with Puppeteer, for example (a minimal usage sketch follows the list):

  • Generate screenshots and PDFs of pages.
  • Crawl an SPA (single-page application) and generate pre-rendered content (i.e. "SSR", server-side rendering).
  • Automate form submission, UI testing, keyboard input, etc.
  • Create an up-to-date automated testing environment.
  • Run tests directly in the latest version of Chrome, using the latest JavaScript and browser features.
  • Capture a timeline trace of a site to help diagnose performance problems.
  • Test Chrome extensions.
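
To give a feel for the API, here is a minimal sketch of the second use case above: visiting a page headlessly and grabbing its fully rendered HTML. It is not from the original article; it assumes puppeteer is installed locally (npm install puppeteer) and uses https://example.com as a placeholder URL.

const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless Chromium instance
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate and wait until the network is mostly idle, so Ajax data has time to load
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  // page.content() returns the serialized HTML after JavaScript has run
  const html = await page.content();
  console.log(html.length, 'characters of rendered HTML');

  await browser.close();
})();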

In the spirit of learning and sharing that drives web development, here is a summary of the 4 or 5 days I spent on this, shared with anyone who needs to do SEO for an SPA. Without further ado, straight to the good stuff.



1 How Puppeteer implements SEO for single-page applications

Puppeteer can drive headless Chrome to visit our site and return the fully rendered HTML. Before getting into Puppeteer, a few words about the User-Agent. Skip this if you already know it and go straight to the next part.



# This is the User-Agent in my site's log when I visit from my local Mac
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36

# This is the User-Agent when the Google crawler visits my site
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

# Sogou crawler User-Agent
Sogou web spider/4.0 (+http://www.sogou.com/docs/help/webmasters.htm#07)

# Baidu crawler User-Agent
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)


Ordinary users and crawlers visit our site with different User-Agents. I pulled the log from the server and grabbed a few for comparison. You can see that each search engine's crawler UA is different, so we can use Nginx as a reverse proxy: requests from crawlers are proxied to a server dedicated to rendering for them. That rendering server, running Puppeteer, visits the requested URL and returns the rendered page to the crawler, as shown in the image below.


As shown in the figure, we need to do two things: 1. configure the proxy server; 2. set up the rendering server.


2 Proxy server configuration

Since I use Nginx, here is the proxy configuration taken directly from my nginx config:

location / {
  # The if statement checks whether the UA belongs to a crawler
  if ($http_user_agent ~* (BaiduSpider|Googlebot|360Spider|Bingbot|Sosospider|Yahoo!\ Slurp\ China|YisouSpider)) {
       # If it is a crawler, proxy the request to the rendering server; xxx stands for the rendering server's domain name or IP
       proxy_pass http://xxx;
  }
  # Otherwise return index.html directly
  try_files $uri $uri/ /index.html;
}



3 Rendering server; I use Koa as the rendering service



const Koa = require('koa');
const fs = require('fs');
const puppeteer = require('puppeteer');

// Read from disk but not used below; kept from the original setup
const browserUrl = fs.readFileSync('chrome.txt', 'utf8');

const app = new Koa();
const baseUrl = 'http://www.9cka.cn';

app.use(async (ctx, next) => {
  // Launch a headless browser for each request
  const browser = await puppeteer.launch({
    dumpio: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
    timeout: 10000
  });

  const page = await browser.newPage();
  try {
    // Build the full URL from the crawler's request path and render it
    const myUrl = baseUrl + ctx.url;
    await page.goto(myUrl);
    // Wait a few seconds so Ajax requests have time to finish
    await page.waitFor(5000);
  } catch (err) {
    // Errors are caught here
    console.log('Error: ' + err);
    await page.close();
    await browser.close();
    return;
  }

  const html = await page.content();

  /*
  // Optionally dump the rendered HTML to disk for debugging
  fs.writeFile('myhtml22.html', html, function (err) {
    if (err) {
      console.log('File write error: ' + err);
      throw err;
    }
    console.log('myhtml22.html saved successfully'); // File saved
  });
  */

  ctx.type = 'text/html; charset=utf-8';
  ctx.body = html;
  await page.close();
  await browser.close();
});

app.listen('3388');
console.log('Crawler rendering service started on port 3388');
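
One performance note of my own, not part of the original setup: the service above launches a new Chromium instance for every request, which is slow and memory-hungry. A common alternative, sketched below under the assumption that a single shared instance is acceptable for your traffic, is to launch the browser once at startup and only open and close a page per request.

const Koa = require('koa');
const puppeteer = require('puppeteer');

const app = new Koa();
const baseUrl = 'http://www.9cka.cn';

// Hypothetical variant: launch Chromium once and reuse it for every request
const browserPromise = puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });

app.use(async (ctx) => {
  const browser = await browserPromise;
  const page = await browser.newPage();
  try {
    await page.goto(baseUrl + ctx.url, { waitUntil: 'networkidle2' });
    ctx.type = 'text/html; charset=utf-8';
    ctx.body = await page.content();
  } catch (err) {
    console.log('Error: ' + err);
    ctx.status = 500;
  } finally {
    // Only the page is closed; the shared browser keeps running
    await page.close();
  }
});

app.listen('3388');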


4 Testing after deployment

I tested it with Postman, sending requests with a crawler UA. The result is shown in the figure below: I get the full HTML, including the data from the Ajax requests, so the crawler can pick it up.
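
Besides Postman, the same check can be scripted. Below is a minimal Node sketch of my own (using only the built-in http module) that requests the homepage with Baidu's crawler UA, assuming the site is served over plain HTTP at www.9cka.cn.

const http = require('http');

// Request the homepage while pretending to be the Baidu crawler
const options = {
  host: 'www.9cka.cn',
  path: '/',
  headers: {
    'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
  }
};

http.get(options, (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    // A successful setup returns fully rendered HTML including Ajax-loaded data
    console.log(res.statusCode, body.slice(0, 200));
  });
});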





My personal blog: http://www.9cka.cn