1. Introduction

Last time, I wrote an example that scraped novels from the Biquge novel site. This time, I will grab some songs from the web.

2. Page analysis

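First, let's open our target website: music.163.com/#/artist?id…

In the page source, find our target list, song-list-pre-cache. The data we need sits inside it (the code below selects it as #song-list-pre-data), and the most important field is the song id.

The page requires a login, so we also have to copy our own cookie from the browser.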
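To make this lookup concrete, here is a minimal, dependency-free sketch that pulls the embedded JSON text out of the page source. It uses a regular expression as a stand-in for the jsoup selector used in the real code below; the class name SongListExtractor is made up for illustration. It assumes the element holds plain JSON text, with no nested tag of the same name and no HTML entities.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the text of the element with id "song-list-pre-data". */
final class SongListExtractor {
    // Matches <tag ... id="song-list-pre-data" ...>TEXT</tag> and captures TEXT.
    private static final Pattern SONG_LIST = Pattern.compile(
            "<(\\w+)[^>]*\\sid=\"song-list-pre-data\"[^>]*>(.*?)</\\1>",
            Pattern.DOTALL);

    /** Returns the raw JSON text, or an empty Optional if the element is missing. */
    static Optional<String> extractSongListJson(String html) {
        Matcher matcher = SONG_LIST.matcher(html);
        if (matcher.find()) {
            return Optional.of(matcher.group(2).trim());
        }
        return Optional.empty();
    }
}
```

An empty result is a useful signal that the page did not load as expected, for example because the cookie is missing or expired.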

The original song links are encrypted, and I did not try to crack them. Instead, through a search engine I found a link of the form music.163.com/song/media/… (the full prefix is the downloadUrl field in the code below). With this link and a song id we can get the MP3 address of the song directly. Next comes the implementation.
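As a small sketch of how the link is put together, the helper below appends a song id to the same prefix that MusicSpider stores in downloadUrl. The class name SongLinks and the rule that ids are positive numbers are assumptions made for this example; the original code treats the id as an opaque string.

```java
import java.net.URI;

/** Hypothetical helper: turns a song id into the direct link described above. */
final class SongLinks {
    // Same prefix as the downloadUrl field in MusicSpider.
    static final String OUTER_URL_PREFIX = "http://music.163.com/song/media/outer/url?id=";

    /** Builds the link for one song id. Assumes ids are positive numbers. */
    static URI outerUrl(long songId) {
        if (songId <= 0) {
            throw new IllegalArgumentException("song id must be positive: " + songId);
        }
        return URI.create(OUTER_URL_PREFIX + songId);
    }
}
```

Validating the id up front keeps a malformed value from turning into a request for a meaningless URL.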

3. Code implementation

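First, add the dependencies. Besides the two listed here, the code below also uses fastjson (JSONArray, JSONObject) and Lombok (@Data), so those need to be on the classpath as well.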

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.6</version>
    </dependency>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>

The main code

    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.http.Header;
    import org.apache.http.message.BasicHeader;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    public class MusicSpider {

        private String downloadUrl = "http://music.163.com/song/media/outer/url?id=";
        private String path = "D:/music/";

        /** Reads the song list embedded in an artist page and downloads every song. */
        public void getSongList(String url) {
            Document document = getHtml(url);
            if (document != null) {
                Elements elements = document.select("#song-list-pre-data");
                String resJson = elements.text();
                System.out.println("JSON data: " + resJson);
                JSONArray jsonArray = JSONObject.parseArray(resJson);
                for (int i = 0; i < jsonArray.size(); i++) {
                    JSONObject item = jsonArray.getJSONObject(i);
                    Music music = new Music();
                    music.setSinger(item.getJSONArray("artists").getJSONObject(0).get("name").toString());
                    music.setSongUrl(downloadUrl + item.get("id").toString());
                    music.setSong(item.get("name").toString());
                    try {
                        // Pause between downloads so we do not hammer the server
                        System.out.println("Resting for 10 seconds before continuing to crawl");
                        Thread.sleep(10000);
                        downloadFile(music.getSongUrl(), path + music.getSong() + "-" + music.getSinger() + ".mp3");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * Downloads a file.
         *
         * @param fileUrl   file address
         * @param fileLocal local path to store the file at
         * @throws Exception if the server does not answer with HTTP 200 or the copy fails
         */
        public void downloadFile(String fileUrl, String fileLocal) throws Exception {
            URL url = new URL(fileUrl);
            HttpURLConnection urlCon = (HttpURLConnection) url.openConnection();
            urlCon.setConnectTimeout(6000);
            urlCon.setReadTimeout(6000);
            int code = urlCon.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new Exception("File read failed");
            }
            // Copy the response stream to disk; try-with-resources closes both streams
            try (InputStream in = urlCon.getInputStream();
                 OutputStream out = new FileOutputStream(fileLocal)) {
                byte[] buffer = new byte[2048];
                int count;
                while ((count = in.read(buffer)) != -1) {
                    out.write(buffer, 0, count);
                }
            }
        }

        /** Fetches the page with browser-like headers; returns null on failure or a 404 page. */
        private Document getHtml(String url) {
            List<Header> headerList = new ArrayList<>();
            headerList.add(new BasicHeader("origin", "music.163.com"));
            headerList.add(new BasicHeader("referer", "https://music.163.com/"));
            headerList.add(new BasicHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"));
            headerList.add(new BasicHeader("cookie", "here is your login cookie"));
            // HttpClientUtil is a helper class that is not shown in this post
            String result = HttpClientUtil.doGet(url, headerList, "utf-8");
            if (result != null && !result.contains("n-for404")) {
                return Jsoup.parse(result);
            }
            return null;
        }
    }
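One thing to watch in the code above: the local file name is built directly from the song and singer names. Windows does not allow characters such as / : * ? " < > | in file names, so a title containing one of them makes FileOutputStream fail. The following is a minimal sketch of a clean-up step; the class and method names (FileNames, sanitize, mp3Name) and the "untitled" fallback are made up for this example, and it does not handle every Windows rule (reserved names such as CON, or trailing dots).

```java
/** Hypothetical helper: makes song and singer names safe to use as a file name. */
final class FileNames {
    /** Replaces characters that Windows forbids in file names with an underscore. */
    static String sanitize(String name) {
        String cleaned = name.replaceAll("[\\\\/:*?\"<>|\\p{Cntrl}]", "_").trim();
        return cleaned.isEmpty() ? "untitled" : cleaned;
    }

    /** Builds "song-singer.mp3" in the same shape MusicSpider uses. */
    static String mp3Name(String song, String singer) {
        return sanitize(song + "-" + singer) + ".mp3";
    }
}
```

In getSongList, the last argument of downloadFile could then become path + FileNames.mp3Name(music.getSong(), music.getSinger()).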

An entity class that stores the song information

    import lombok.Data;

    @Data // Lombok generates the getters and setters
    public class Music {
        private String singer;
        private String song;
        private String songUrl;

        public Music() {
        }

        public Music(String singer, String song, String songUrl) {
            this.singer = singer;
            this.song = song;
            this.songUrl = songUrl;
        }
    }

Finally, the entry point

    public static void main(String[] args) {
        MusicSpider spider = new MusicSpider();
        // The artist id has to be changed by hand for now: for some reason I can't get the
        // ranking list data. I will revise this once that is solved.
        spider.getSongList("https://music.163.com/artist?id=3684");
    }
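MusicSpider calls HttpClientUtil.doGet, which is not shown in this post, so the code does not compile as listed. The sketch below is one possible stand-in built only on the JDK. It is not the original helper: the class name SimpleHttpGet is made up, and it takes the headers as a Map<String, String> instead of the List<Header> used in getHtml, so the call site would need a small adjustment. Note also that HttpURLConnection by default ignores a few restricted request headers, origin among them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

/** Hypothetical stand-in for the HttpClientUtil helper that the post does not show. */
final class SimpleHttpGet {
    /** Sends a GET request and returns the body as text, or null on any failure. */
    static String doGet(String url, Map<String, String> headers, String charset) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            con.setConnectTimeout(6000);
            con.setReadTimeout(6000);
            for (Map.Entry<String, String> header : headers.entrySet()) {
                con.setRequestProperty(header.getKey(), header.getValue());
            }
            if (con.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return null;
            }
            try (InputStream in = con.getInputStream()) {
                return readAll(in, charset);
            }
        } catch (IOException e) {
            return null;
        }
    }

    /** Reads the whole stream into a string using the given charset name. */
    static String readAll(InputStream in, String charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
        return out.toString(charset);
    }
}
```

Returning null on failure matches the null check that getHtml already performs on the result.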

4. Results

5. Summary

The basic functions work, but because of a strange problem I can't fetch the content of the ranking list, so for now the songs can only be downloaded semi-automatically, one artist id at a time. If you have a solution, please contact me. For some songs the actual data cannot be obtained because of copyright and other restrictions, so quite a few of the downloaded files are broken.
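One way to avoid saving broken files might be to look at the response headers before writing anything to disk. The sketch below assumes, without having verified it against the site, that an unavailable song is answered with something other than audio (for example an HTML page) or with an empty body. The class and method names are made up for this example.

```java
import java.util.Locale;

/** Hypothetical check: decides from the response headers whether a download is worth saving. */
final class DownloadCheck {
    /**
     * @param contentType   value of the Content-Type header, may be null
     * @param contentLength value of the Content-Length header, or -1 if unknown
     */
    static boolean looksLikeAudio(String contentType, long contentLength) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.toLowerCase(Locale.ROOT).trim();
        boolean audioType = type.startsWith("audio/")
                || type.startsWith("application/octet-stream");
        // An unknown length (-1) is accepted; only an explicitly empty body is rejected.
        return audioType && contentLength != 0;
    }
}
```

In downloadFile, this could be called as DownloadCheck.looksLikeAudio(urlCon.getContentType(), urlCon.getContentLengthLong()) right after the status code check, throwing an exception when it returns false.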