Python Web Crawler - Crawling NetEase Cloud Music Comments (3)

“This is the third day of my participation in the November Writing Challenge. See the event details: the final Writing Challenge of 2021.”

After two days of learning crawler basics, today it's time for a first real test: crawling the comments on NetEase Cloud Music.

1. Locate the target

First I looked for my favorite song, “Golden Age”, but the original is not there. NetEase Cloud Music really does not have the original, only a lot of covers!!

You can see that all the comments are wrapped in a tag with id="auto-id-0FLvTEg8ZLVKFZST". Never mind that for now; let's download the page first and take a look.

2. Download the web page

Just download the web page and use BeautifulSoup to extract the comments.

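    import requests


    def get_url(url):
        # Send a browser-like User-Agent so the request looks like a normal visit.
        # The string in the post was cut off after "(KHTML,"; it is closed here
        # with the standard "like Gecko)" as an assumption.
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko)'
        }
        res = requests.get(url, headers=headers)
        return res


    def main():
        url = input("please input the url: ")
        res = get_url(url)
        # Save the raw HTML so it can be searched afterwards.
        with open("res.txt", "w", encoding='utf-8') as file:
            file.write(res.text)


    if __name__ == "__main__":
        main()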

Run the script and enter the URL of the song's page when prompted. The output is as follows:

Search the output for the comments and nothing turns up! The comments are not in this file, which means they are loaded from another file!
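A quick way to run this check is to search the saved text for a comment you can see in the browser. The sketch below is only an illustration: the helper name `page_contains` and the sample strings are made up, and in practice you would pass in the contents of res.txt instead of the stand-in string.

```python
def page_contains(page_text, snippet):
    """Return True if snippet occurs in page_text, ignoring case."""
    return snippet.lower() in page_text.lower()


# Hypothetical stand-in for the downloaded HTML: the page shell is there,
# but the comment text itself is not part of it.
sample_page = "<html><body><div id='comment-box'></div></body></html>"

found = page_contains(sample_page, "what a great song")
shell_found = page_contains(sample_page, "comment-box")
print(found, shell_found)
```

If the shell of the page is found but the comment text is not, the comments are being fetched by a separate request.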

3. Throttle the network speed and locate the target file

The network is so fast that the whole page loads in a flash.

If we open the Network panel in the browser's developer tools and refresh, we can see the many source files that make up the page:

We need to dig through this pile of files to find the one that contains the comments. We could go through them one by one, but that is tedious. Instead, we can make the browser load the page slowly and pause as soon as the target shows up.

In addition, the nesting order of the tags can be written as a selector, as follows:

    data = soup.select('#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li > a')
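To see how such a selector walks down the tag hierarchy, here is a minimal sketch using BeautifulSoup. The HTML is hypothetical and was written only to mirror the nesting in the selector above; it is not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mirrors the tag nesting in the selector.
html = """
<div id="main">
  <div>
    <div class="mtop firstMod clearfix">
      <div class="centerBox">
        <ul class="newsList">
          <li><a href="/news/1">First headline</a></li>
          <li><a href="/news/2">Second headline</a></li>
        </ul>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ">" means "direct child", so each step must match exactly one nesting level.
data = soup.select(
    "#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li > a"
)

titles = [a.get_text(strip=True) for a in data]
links = [a["href"] for a in data]
print(titles)
print(links)
```

Each `>` step has to match a direct child, so if the page inserts an extra wrapper tag at any level the selector returns an empty list.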

That didn't go as planned: my browser can't do this! I'll update Chrome when I get home tonight.

The comments are data, so we can go straight to the XHR and Doc entries. Also, when we open the target file, we find that it is fetched with a POST request. Remember what we said about POST requests? We have to submit certain data to the server to get what we want. I will continue with this tomorrow!
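As a reminder of what a POST request looks like, here is a minimal sketch that builds one with the standard library's urllib without sending it. The URL and the form fields are hypothetical placeholders; the real values have to be copied from what the browser's Network panel shows for the comment request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, for illustration only.
url = "https://example.com/api/comments"
form = {"song_id": "12345", "limit": "20"}

# POST data travels in the request body as URL-encoded bytes.
body = urlencode(form).encode("utf-8")

req = Request(
    url,
    data=body,
    headers={"User-Agent": "Mozilla/5.0"},
    method="POST",
)

print(req.get_method())
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would actually send it. With the requests library used earlier, the equivalent call is `requests.post(url, data=form, headers=headers)`.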
