Motivation: Dad asked me to download thousands of songs for him to play in the car. Downloading them by hand felt tedious, and even a batch download takes time, so I wrote a crawler to download them automatically.

For this small crawler project I chose Node + koa2. Initialize the project with koa2 projectName (koa-generator must be installed globally), then enter the project directory and run npm install && npm start. The dependencies used are superagent, cheerio, async, fs and path.

Open the NetEase Cloud Music web page and go to the playlist page. I selected the Chinese-language category, right-clicked to view the source of the frame, obtained the real URL, and found the HTML element with the ID m-pl-container. This is the list of playlists to crawl this time. The async library is used to control the concurrency of the crawl.

static getPlayList(){
	const pageUrlList = this.getPageUrl();

	return new Promise((resolve, reject) => {
		asy.mapLimit(pageUrlList, 1, (url, callback) => {
			this.requestPlayList(url, callback);
		}, (err, result) => {
			if(err){
				reject(err);
				return;
			}
			resolve(result);
		});
	});
}
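
getPageUrl is not shown in this post. As a minimal sketch, assuming the listing is paginated through limit and offset query parameters, it could be built from a helper like the one below. The helper name buildPageUrls, the example.com URL and the page size are all illustrative assumptions, not values taken from the project.

```javascript
// Hypothetical helper: build one URL per listing page.
// base, pageCount and pageSize are assumptions, not values from the original project.
function buildPageUrls(base, pageCount, pageSize) {
	const urls = [];
	for (let page = 0; page < pageCount; page++) {
		const url = new URL(base);
		// Assumed pagination scheme: limit = page size, offset = items to skip
		url.searchParams.set('limit', String(pageSize));
		url.searchParams.set('offset', String(page * pageSize));
		urls.push(url.toString());
	}
	return urls;
}

console.log(buildPageUrls('https://example.com/discover/playlist/?cat=pop', 3, 35));
```

The resulting array can be handed to mapLimit as is, one URL per request.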

Here asy is the async library, loaded with const asy = require('async').

static requestPlayList(url, callback){
	superagent.get(url).set({
		'Connection': 'keep-alive'
	}).end((err, res) => {
		if(err){
			console.info(err);
			callback(null, null);
			return;
		}

		const $ = cheerio.load(res.text);
		let curList = this.getCurPalyList($);
		callback(null, curList);
	});
}

getCurPalyList extracts the playlist information from the page; $ is passed in for DOM manipulation.

static getCurPalyList($){
	let list = [];

	$('#m-pl-container li').each(function(i, elem){
		let _this = $(elem);
		list.push({
			name: _this.find('.dec a').text(),
			href: _this.find('.dec a').attr('href'),
			number: _this.find('.nb').text()
		});
	});

	return list;
}

Now that the list of playlists has been crawled, it's time to crawl the song list of each playlist.

static async getSongList(){
	const urlCollection = await playList.getPlayList();

	let urlList = [];
	for(let item of urlCollection){
		for(let subItem of item){
			urlList.push(baseUrl + subItem.href);
		}
	}

	return new Promise((resolve, reject) => {
		asy.mapLimit(urlList, 1, (url, callback) => {
			this.requestSongList(url, callback);
		}, (err, result) => {
			if(err){
				reject(err);
				return;
			}
			resolve(result);
		});
	});
}
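
One detail worth noting: requestPlayList reports a failed page as null, so the nested loop in getSongList would throw as soon as it tries to iterate over that entry. A minimal sketch of a more defensive flattening step follows. The helper name collectPlaylistUrls and the example.com base URL are illustrative assumptions.

```javascript
// Hypothetical helper: flatten per-page results into absolute playlist URLs.
function collectPlaylistUrls(pages, baseUrl) {
	const urls = [];
	for (const page of pages) {
		// Skip pages whose request failed (reported as null, not as an array)
		if (!Array.isArray(page)) continue;
		for (const entry of page) {
			// Skip list items where no link was found
			if (entry && entry.href) urls.push(baseUrl + entry.href);
		}
	}
	return urls;
}

const pages = [[{ href: '/playlist?id=1' }], null, [{ href: '/playlist?id=2' }]];
console.log(collectPlaylistUrls(pages, 'https://example.com'));
```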

requestSongList works in much the same way as requestPlayList above, so it is not repeated here. Once the code above has the song list, the songs need to be downloaded to the local machine.

static async downloadSongList(){
	const songList = await this.getSongList();

	let songUrlList = [];
	for(let item of songList){
		for(let subItem of item){
			let id = subItem.url.split('=')[1];
			songUrlList.push({
				name: subItem.name,
				downloadUrl: downloadUrl + '?id=' + id + '.mp3'
			});
		}
	}

	if(!fs.existsSync(dirname)){
		fs.mkdirSync(dirname);
	}

	return new Promise((resolve, reject) => {
		asy.mapSeries(songUrlList, (item, callback) => {
			setTimeout(() => {
				this.requestDownload(item, callback);
				callback(null, item);
			}, 5e3);
		}, (err, result) => {
			if(err){
				reject(err);
				return;
			}
			resolve(result);
		});
	});
}
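
Splitting the song URL on '=' works as long as id is the only query parameter. As a hedged alternative, the id can be read with the standard URL API, which also copes with additional parameters. The helper name buildDownloadItem and the example.com addresses below are illustrative assumptions; the downloadUrl + '?id=' + id + '.mp3' pattern is the one used in the code above.

```javascript
// Hypothetical helper: turn a crawled song entry into a download item.
function buildDownloadItem(song, downloadUrl) {
	// The second argument is only a base so that relative links can be parsed
	const id = new URL(song.url, 'https://example.com').searchParams.get('id');
	if (!id) return null; // no id in the link, nothing to download
	return {
		name: song.name,
		downloadUrl: downloadUrl + '?id=' + id + '.mp3'
	};
}

console.log(buildDownloadItem({ name: 'Song', url: '/song?id=123&from=list' }, 'https://example.com/download'));
```

With the split('=') approach, the same input would yield the id '123&from=list'.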

requestDownload requests the downloadUrl and saves the downloaded file to the local machine.

static requestDownload(item, callback){
	let stream = fs.createWriteStream(path.join(dirname, item.name + '.mp3'));

	superagent.get(item.downloadUrl).set({
		'Connection': 'keep-alive'
	}).pipe(stream).on('error', (err) => {
		// Error handling: when a download fails, print the error and continue
		console.info(err);
	});
}
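
Because item.name goes straight into the file path, a song title containing a character such as / would point createWriteStream at a directory that does not exist. A minimal sketch of a sanitizing step follows, assuming it is acceptable to replace such characters with an underscore. The helper name safeFileName and the songs directory are illustrative assumptions.

```javascript
const path = require('path');

// Hypothetical helper: make a song title safe to use as a file name.
function safeFileName(name) {
	// Replace characters that are invalid or act as path separators on common file systems
	const cleaned = String(name).replace(/[\\/:*?"<>|]/g, '_').trim();
	// Fall back to a placeholder when nothing usable is left
	return (cleaned || 'untitled') + '.mp3';
}

console.log(path.join('songs', safeFileName('AC/DC: Back in Black')));
```

In requestDownload, the result would take the place of item.name + '.mp3' in the path.join call.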

At this point, the small crawler is complete. The project crawls the list of playlists -> the song list of each playlist -> downloads the songs to the local machine. Of course, you can also go straight to the home page of a particular artist, change the URL passed to the song-list step, and download that artist's popular songs directly.