Recently, I have been studying some features of Node.js, and I put together a simple Node.js crawler that scrapes rental-housing listings. I'd like to share it in this post.

A web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules. To build a Node.js crawler, you should know:

1. Basic npm commands and common Node.js modules.

2. jQuery selectors.

3. Basic ES6 and JavaScript knowledge.

4. Basics of HTML5.

5. Basic knowledge of HTTP and HTTPS.

Now let's prepare the project.

1. Create an empty folder.

2. Run npm init to initialize the project; just press Enter through the prompts.

3. Install the third-party package we need: npm install cheerio (https and fs ship with Node.js).

4. Create a new file called app.js (name it whatever you like).

5. Require the Node modules we need in app.js:

const https = require('https'); // built-in HTTPS client
const fs = require('fs'); // built-in file-system module
const cheerio = require('cheerio'); // jQuery-like HTML parser

Then pick a URL as the crawler's target site:

let url = 'https://sh.5i5j.com/zufang/pudongxinqu/'; // rental listings from 5i5j (I Love My Family)

Use the https module for HTTPS sites and the http module for plain HTTP sites.
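For example, a minimal sketch of fetching the page with https.get (the response arrives as a stream, so we concatenate the chunks):

https.get(url, function (res) {
  res.setEncoding('utf8');
  let html = '';
  res.on('data', function (chunk) { html += chunk; });
  res.on('end', function () {
    // html now holds the full page source, ready for cheerio
  });
}).on('error', function (err) {
  console.log(err);
});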

To parse that page I use the cheerio package, which lets us traverse the fetched HTML with a jQuery-like API.
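Once the HTML is in hand, cheerio loads it and answers jQuery-style queries. The selectors below are hypothetical placeholders; the real class names have to be read off the 5i5j page markup:

const $ = cheerio.load(html);
const houses = [];
// '.listCon li', 'h3' and '.redText' are made-up selectors for illustration;
// inspect the live page to find the real ones.
$('.listCon li').each(function () {
  houses.push({
    title: $(this).find('h3').text().trim(),
    price: $(this).find('.redText').text().trim()
  });
});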

fs is Node.js's built-in file-system module. We use its writeFile method to write the scraped data to a file.

fs.writeFile(path, file, function (err) {
  if (err) console.log(err);
});

Here path is the relative path of the output file, file is the JSON data we extracted with cheerio, and the last argument is a callback that reports any error.
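Since writeFile expects a string or a Buffer, the scraped array is serialized first. A short sketch, reusing the hypothetical houses array from above:

const file = JSON.stringify(houses, null, 2); // pretty-printed JSON
fs.writeFile('./houses.json', file, function (err) {
  if (err) console.log(err);
});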

Finally, let's put all the pieces together.
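Below is a minimal end-to-end sketch that stitches the steps above together; the selectors are still illustrative, and the full working version is in the GitHub repo linked at the end:

const https = require('https');
const fs = require('fs');
const cheerio = require('cheerio');

let url = 'https://sh.5i5j.com/zufang/pudongxinqu/';

https.get(url, function (res) {
  res.setEncoding('utf8');
  let html = '';
  res.on('data', function (chunk) { html += chunk; });
  res.on('end', function () {
    const $ = cheerio.load(html);
    const houses = [];
    // Illustrative selectors; adjust them to the live page markup.
    $('.listCon li').each(function () {
      houses.push({
        title: $(this).find('h3').text().trim(),
        price: $(this).find('.redText').text().trim()
      });
    });
    fs.writeFile('./houses.json', JSON.stringify(houses, null, 2), function (err) {
      if (err) console.log(err);
    });
  });
}).on('error', function (err) {
  console.log(err);
});

Run it with node app.js and the output file is written next to the script.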

The result: the scraped rental listings are saved to a local file. A simple crawler for housing information, successfully completed.

Project address:

https://github.com/angleneo/nodejs-spider