Build a simple image crawler with Node.js built-in modules and a shell script

The https module makes the request, and child_process runs the shell script.

Import the built-in http/https module

const https = require('https');

Import spawn from child_process to run the shell script

const { spawn } = require('child_process');

Define the URL of the page to fetch images from

const url = 'https://www.duitang.com/article/?id=870171';

Define the regular expression that matches image URLs

const reg = /http(s)?:\/\/([\S]*)\.(jpg|jpeg|png|gif|webp)/g;
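
To see what this pattern picks up, here is a minimal sketch that runs it against a small HTML fragment. The fragment and the example.com URLs are made up for illustration; only the pattern comes from this article.

```javascript
// Same pattern as in the article.
const reg = /http(s)?:\/\/([\S]*)\.(jpg|jpeg|png|gif|webp)/g;

// Hypothetical HTML fragment, used only for illustration.
const sample = [
  '<img src="https://example.com/a/cat.jpg">',
  '<img src="http://example.com/b/dog.png">',
  '<a href="https://example.com/page.html">link</a>',
].join('\n');

// With the g flag, each exec() call resumes from reg.lastIndex
// and returns null once no further match exists.
const found = [];
let m;
while ((m = reg.exec(sample)) !== null) {
  found.push(m[0]); // m[0] is the full matched URL
}

console.log(found);
// [ 'https://example.com/a/cat.jpg', 'http://example.com/b/dog.png' ]
```

Because `[\S]*` is greedy, two image URLs with no whitespace between them (for example inside a JSON string) are captured as a single match, so pages that pack URLs tightly may need a stricter pattern.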

Make the request

// Initiate the request
https.get(url, res => {

	// Accumulate the response body here
	let str = '';

	// Append each chunk as it arrives
	res.on('data', data => {
		str += data;
	});

	// The response has been fully received
	res.on('end', () => {

		let i;

		// Walk through the matches one at a time
		while ((i = reg.exec(str)) !== null) {

			// The full matched image URL
			const item = i[0];

			// Run the shell script, passing the image URL as its argument
			spawn('sh', ['index.sh', item]);

		}
		console.log('execution');
	});
});
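
Note that spawn returns immediately, so the 'execution' message is printed before the downloads finish. If you want to know when each script run ends and whether it succeeded, one option is to wrap spawn in a promise. This is only a sketch; the helper name `run` is hypothetical and not part of the original script.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: runs a command and resolves with its exit code.
function run(command, args) {
  return new Promise((resolve, reject) => {
    // 'inherit' lets the child write straight to this terminal.
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', reject);  // e.g. the command could not be started
    child.on('close', resolve); // exit code, or null if killed by a signal
  });
}

// Possible usage inside the loop (assumes index.sh is in the current directory):
// run('sh', ['index.sh', item]).then(code => console.log(`${item}: exit ${code}`));
```

An exit code of 0 means the script reached its `exit 0` line; it does not by itself prove that curl downloaded the file successfully.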

The shell script (index.sh)

#!/bin/sh

# Build the file name from a timestamp plus a random number
# ($RANDOM is a bash/ksh/zsh feature and expands to nothing in shells such as dash)
time=$(date "+%Y%m%d%H%M%S")-$RANDOM

# Take the image format from the text after the last dot in the URL
type="${1##*.}"

# Create the img directory if it does not exist
if [ ! -d "img" ]; then
    mkdir img
fi

# Download the image into that directory under the generated file name
curl "$1" -o "./img/$time.$type"

# Exit the script
exit 0
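
The expansion `${1##*.}` removes the longest prefix ending in a dot, which leaves the text after the last dot. For readers more at home in JavaScript, the same idea can be sketched as follows; the helper name `extensionOf` is hypothetical.

```javascript
// Hypothetical helper mirroring the shell expansion ${1##*.}:
// drop everything up to and including the last dot.
function extensionOf(imageUrl) {
  return imageUrl.slice(imageUrl.lastIndexOf('.') + 1);
}

console.log(extensionOf('https://example.com/a/cat.jpg')); // jpg
```

Like the shell expansion, this returns the whole string when it contains no dot. That case does not arise here, because the regular expression only matches URLs that end in one of the listed extensions.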

Run the index.js script

node index.js 

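Run the script from the directory that contains index.js and index.sh: spawn looks for index.sh relative to the current working directory, and the img folder is created there as well.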

If the URL starts with https://, use the https module to make the request; for http:// URLs, use the http module instead.
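
If you would rather not switch modules by hand, the choice can be made from the URL itself. The following is a minimal sketch; the helper name `clientFor` is hypothetical and not part of the original script.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: returns the built-in module that matches the URL's scheme.
function clientFor(pageUrl) {
  const { protocol } = new URL(pageUrl); // e.g. 'https:' (includes the colon)
  if (protocol === 'https:') return https;
  if (protocol === 'http:') return http;
  throw new Error(`Unsupported protocol: ${protocol}`);
}

console.log(clientFor('https://www.duitang.com/article/?id=870171') === https); // true
```

With this in place, the request could be started as `clientFor(url).get(url, res => { ... })`, since both modules expose a `get` function with the same shape.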

After a successful run, a new img folder appears in the current directory, containing the downloaded images.

index.js complete code

index.sh complete code