I. Project Objectives

Get QQ music designated singer single ranking specified page number of songs song name, album name, play link.

From shallow to deep, layer by layer, very suitable for beginners to practice.

【 II. Required Libraries 】

The main libraries involved are requests, JSON, and OpenPyXL

Iii. Project Realization

1. Understand the robots protocol of QQ Music website

Disable only playlists.

2. Access the QQ Music homepage y.qq.com/

3. Enter any singer, such as G.E.M.

4. Open the Review element (Ctrl+Shift+I)

5. After analyzing the source code Elements of the web page, it was found that BeautifulSoup could not be used because there was no song information, as shown in the figure below, and the result was empty.

6. Click Network to see if the data is in XHR

My experience is to look at the largest Size first, then analyze the Name,

Check Preview, sure enough, inside!

7. Click Headers to get the relevant parameters. See the picture below

The relationship between url and Query String Parameters is found

The W in the URL represents the name of the artist and the P represents the number of pages.

8. Through JSON code, first try the first page, climb

The url is copied directly from the data. Success!

9. Params parameter is introduced to realize the query of specified singer and specified page number.

Note that the code URL is “? “in the previous url. In the preceding section, we need to add ‘,requests. Get ‘to both params and headers.

  1. Add storage function, save to local (Excel). It can also be saved to a CSV file or a database. The operation is similar.

【 IV. Summary 】

1. It is slightly more difficult to climb QQ music than douban and other websites. The required information is not in the source code of the web page.

2. Json is generally used for data crawling through XHR, in the following format:

* * * res = requests.get(url)

json = res.json()

List = json [‘] [‘]…

3. It is only for practice. It is not recommended to crawl too much data to increase the load on the server.

We will introduce how to crawl specific song lyrics and reviews (Selenium), and generate a wordcloud.

5. If you need the source code of this article, please reply “QQ music” four words in the background of the public number for access.

Have you learned anything after reading this article? Please share it with more people

Home of IT Sharing

Please reply in wechat background [Join group]

To learn more about Web crawlers and data mining in Python, visit:pdcfighting.com/