Case 1: Crawl a web page

Create index.js in the project, open the terminal, and initialize package.json:

npm init -y

Install the express and requests packages:

npm i express requests

The page is saved with fs.writeFile(file, data[, options], callback):

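let requests = require('requests')
let fs = require('fs')
let html = ''
// Collect every chunk first, then write the whole page once the response ends
requests('https://www.jsdaima.com/js/demo/1358.html')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('index.html', html, function (e) { console.log(e || 'save successfully') })
  })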
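A page can arrive in more than one 'data' chunk, and calling fs.writeFile once per chunk overwrites the file each time, so it may end up holding only one chunk. The sketch below shows an accumulate-then-write pattern offline: Node's built-in stream.Readable.from simulates a chunked response (an assumption standing in for the real request), and the write uses the optional options argument of fs.writeFile(file, data[, options], callback). The helper saveStream and the sample chunks are hypothetical.

```javascript
const { Readable } = require('stream')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: gather every chunk from a readable stream, then write
// the complete text to `file` with a single fs.writeFile call.
function saveStream(stream, file, callback) {
  let body = ''
  stream.on('data', function (chunk) { body += chunk })
  stream.on('error', callback)
  stream.on('end', function () {
    // The options argument asks for the text to be written as UTF-8
    fs.writeFile(file, body, { encoding: 'utf8' }, callback)
  })
}

// Simulated response that arrives in three chunks (stands in for the real request)
const fakeResponse = Readable.from(['<html>', '<body>hi</body>', '</html>'])
const target = path.join(os.tmpdir(), 'crawl-demo.html')

saveStream(fakeResponse, target, function (err) {
  if (err) return console.log(err)
  console.log(fs.readFileSync(target, 'utf8')) // <html><body>hi</body></html>
})
```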

Run node index.js in the terminal; the crawled index.html looks like this:

<html xmlns="http://www.w3.org/1999/xhtml">

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, minimum-scale=1.0, maximum-scale=1.0">
    <meta name="keywords" content="Up, down, left, right, seamless scrolling, jQuery plugin" />
    <meta name="description" content="Seamless up, down, left and right scrolling jQuery plugin download. Implements automatic, seamless scrolling up, down, left and right." />
    <meta name="author" content="Js code" />
    <meta name="copyright" content="Js code" />
    <style>
        * {
            margin: 0px;
            padding: 0px;
            font-family: Microsoft Yahei;
        }

        html,
        iframe,
        body {
            height: 100%
        }

        .none {
            display: none !important
        }

        @media screen and (max-width: 640px) {
            #mobileFrame {
                display: none !important;
            }
        }

        #hidemobile {
            font-size: 14px;
            font-weight: bold;
            border: 1px solid silver;
            position: absolute;
            right: 20px;
            top: 8px;
            width: 15px;
            height: 15px;
            text-align: center;
            padding: 0;
            line-height: 15px;
            border-radius: 15px;
            cursor: pointer;
        }
    </style>
    <script type="text/javascript" src="/static/js/protect.js"></script>
</head>

<body><iframe src="https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html" frameborder="0" width="100%"
        height="100%"></iframe></body>

</html>

As you can see, the page simply embeds another page through an iframe, so we crawl again, this time from https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html

let requests = require('requests')
let fs = require('fs')
let html = ''
// Request the page that the iframe embeds
requests('https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('index.html', html, function (e) { console.log(e || 'save successfully') })
  })
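Here the iframe URL was copied from the first page by hand. As a hedged alternative, it can be read out of the crawled HTML. The sketch below uses a regular expression rather than an HTML parser and assumes the src attribute is quoted with " or '; iframeSources is a hypothetical helper, and the input is the body line of the page shown earlier.

```javascript
// Hypothetical helper: return the src of every <iframe> in an HTML string.
// Assumes quoted src attributes; a real HTML parser (such as cheerio,
// used in Case 2) copes with more variations than this regex.
function iframeSources(html) {
  const pattern = /<iframe\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi
  const sources = []
  let match
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[1]) // the captured attribute value
  }
  return sources
}

const page = '<body><iframe src="https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html" frameborder="0" width="100%"\n        height="100%"></iframe></body>'

// Logs an array containing the embedded page's URL
console.log(iframeSources(page))
```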

This time index.html is crawled successfully. Its code also shows that the page needs the following files; jQuery can be obtained from BootCDN:

<script type="text/javascript" src="/static/js/jquery-1.10.2.min.js"></script>
<link rel="stylesheet" href="css/demo.css"/>
<script src="js/rollslide.js"></script>
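The dependency list can also be collected from the crawled HTML instead of by eye. Below is a minimal regex sketch, assuming each external file is referenced through a quoted src attribute on a script tag or a quoted href on a link tag; assetPaths is a hypothetical helper and the input repeats the three tags above.

```javascript
// Hypothetical helper: collect the paths referenced by <script src="..."> and
// <link href="..."> tags. Inline scripts without src are skipped.
function assetPaths(html) {
  const pattern = /<(?:script|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/gi
  const paths = []
  let match
  while ((match = pattern.exec(html)) !== null) {
    paths.push(match[1])
  }
  return paths
}

const snippet = [
  '<script type="text/javascript" src="/static/js/jquery-1.10.2.min.js"></script>',
  '<link rel="stylesheet" href="css/demo.css"/>',
  '<script src="js/rollslide.js"></script>'
].join('\n')

// Logs the three paths in document order
console.log(assetPaths(snippet))
```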

Change the request URL according to each file's path. For example, css/demo.css is relative to the iframe page's folder:

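let requests = require('requests')
let fs = require('fs')
let css = ''
// css/demo.css is relative to the iframe page, so request it from that folder
requests('https://www.jsdaima.com/Uploads/js/201803/1522376449/css/demo.css')
  .on('data', function (chunk) { css += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('demo.css', css, function (e) { console.log(e || 'save successfully') })
  })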

The JS files are downloaded the same way.
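Each path is relative to the page that references it: a path starting with / resolves from the site root, anything else from the page's folder. Node's built-in WHATWG URL class performs this resolution, so the download URLs need not be assembled by hand. A small sketch follows; planDownloads is a hypothetical helper, and for jQuery the BootCDN copy mentioned earlier may still be the simpler choice.

```javascript
const path = require('path')

// The iframe page that references the assets
const pageUrl = 'https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html'

// Hypothetical helper: resolve each src/href against the page URL and use the
// last path segment as the local file name.
function planDownloads(base, srcs) {
  return srcs.map(function (src) {
    const url = new URL(src, base) // "/..." -> site root, "css/..." -> page folder
    return { url: url.href, file: path.posix.basename(url.pathname) }
  })
}

console.log(planDownloads(pageUrl, ['css/demo.css', 'js/rollslide.js', '/static/js/jquery-1.10.2.min.js']))
```

If two assets share a file name in different folders, basename would make them collide; using url.pathname as the local path keeps the site's folder layout instead.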

Case 2: Crawl data from a script tag in a web page's HTML

Objective: crawl the epidemic data from Lilac Garden (DXY)

The Lilac Garden page shows that its data is embedded in script tags in the HTML rather than loaded through Ajax requests.

Each script tag has its own id, so the script we need can be selected with the cheerio package from npm:

npm i cheerio

let cheerio = require('cheerio')
const $ = cheerio.load(chunk)

As you can see, the script assigns the epidemic data to the getAreaStat property of the window object. Node has no window object, so declare one first; otherwise the assignment to window.getAreaStat fails because window is not defined.

let window={}
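To see why the stand-in matters, here is a small offline sketch. The script text is a made-up sample shaped like an assignment to window.getAreaStat, and its field names are hypothetical; the two helper functions exist only for this demonstration.

```javascript
// Made-up script text; field names are illustrative, not the page's real ones
const scriptText = 'window.getAreaStat = [{ "provinceName": "ProvinceA", "confirmedCount": 1 }]'

// Without any `window` binding, Node throws when the script touches window
function runWithoutWindow(code) {
  try {
    eval(code)
    return 'ok'
  } catch (e) {
    return e.name
  }
}

// With a local `window` object, direct eval sees it and the assignment succeeds
function runWithWindow(code) {
  let window = {}
  eval(code)
  return window.getAreaStat
}

console.log(runWithoutWindow(scriptText)) // ReferenceError
console.log(runWithWindow(scriptText))
```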

cheerio returns the script's contents as a string, so use eval to execute it as JavaScript:

eval($('#getAreaStat').html())
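eval runs the crawled script with access to the surrounding scope. One alternative, swapped in here rather than taken from the original approach, is Node's built-in vm module: vm.runInNewContext runs the code in a separate context whose only global is the supplied window object. vm is not a security boundary for hostile code, so this limits accidents, not attacks. The helper extractAreaStat and the sample script are hypothetical.

```javascript
const vm = require('vm')

// Hypothetical helper: run the script text in a fresh context that provides
// only a `window` object, then return what the script stored on it.
function extractAreaStat(scriptText) {
  const sandbox = { window: {} }
  vm.runInNewContext(scriptText, sandbox, { timeout: 1000 }) // stop runaway scripts after 1 s
  return sandbox.window.getAreaStat
}

// Made-up sample; the field names are illustrative only
const sample = 'window.getAreaStat = [{ "provinceName": "ProvinceA", "confirmedCount": 1 }]'
console.log(JSON.stringify(extractAreaStat(sample)))
```

The returned array is created inside the other context, so compare it by value (for example via JSON.stringify) rather than relying on instanceof Array.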

Finally, convert window.getAreaStat to a JSON string and save it in data.json. The complete code:

let requests = require('requests')
let fs = require('fs')
let cheerio = require('cheerio')
let html = ''
requests('https://ncov.dxy.cn/ncovh5/view/pneumonia_peopleapp?from=timeline&isappinstalled=0')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    let window = {} // stand-in for the browser's window object
    const $ = cheerio.load(html)
    eval($('#getAreaStat').html()) // the script assigns window.getAreaStat
    // Convert window.getAreaStat to a JSON string and save it in data.json
    fs.writeFile('data.json', JSON.stringify(window.getAreaStat), function (e) {
      console.log(e || 'save successfully')
    })
  })

Success: the data is crawled and saved to data.json.
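To check what was saved, the file can be read back with JSON.parse. The hedged sketch below writes to the OS temp folder instead of the project folder, uses made-up sample data in place of window.getAreaStat, and passes JSON.stringify's optional indentation argument so the file is easier to read; saveJson is a hypothetical name.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Made-up sample standing in for window.getAreaStat; the field names are illustrative
const areaStat = [
  { provinceName: 'ProvinceA', confirmedCount: 3 },
  { provinceName: 'ProvinceB', confirmedCount: 5 }
]

// Hypothetical helper: save a value as JSON indented by two spaces per level.
// JSON.stringify(value) without the extra arguments produces one compact line.
function saveJson(file, value, callback) {
  fs.writeFile(file, JSON.stringify(value, null, 2), callback)
}

const file = path.join(os.tmpdir(), 'data.json')
saveJson(file, areaStat, function (err) {
  if (err) return console.log(err)
  // Reading the file back and parsing it gives an equal array
  const restored = JSON.parse(fs.readFileSync(file, 'utf8'))
  console.log(restored.length, restored[0].provinceName) // 2 ProvinceA
})
```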