First, capture the data

To build a word cloud you first need data, which takes a little web-scraping skill.

The basic approach has three steps: analyze the captured network traffic, handle the encrypted request parameters, and fetch the hot comments.

1. Packet capture analysis

First, open the web version of NetEase Cloud Music in a browser and go to the song page for Xue Zhiqian's "Skyscraper"; the comments appear below the song. Then press F12 to open the developer tools (Inspect Element).

The next step is to find the URL that serves the song's comments and check that the data it returns matches what the page actually displays.

With that confirmed, analyze the API, then simulate the request in code and parse the JSON it returns.
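As a minimal sketch of the parsing half of that step, the snippet below pulls the fields of interest out of a response body. The sample payload is made up for illustration; only the key names (hotComments, user, nickname, content, likedCount) come from the response used later in this article, and the helper name extract_hot_comments is hypothetical.

```python
import json

# Made-up sample, shaped like the JSON the comment API returns.
SAMPLE_BODY = """
{
  "hotComments": [
    {"user": {"nickname": "listener_a"}, "content": "Great song", "likedCount": 120},
    {"user": {"nickname": "listener_b"}, "content": "On repeat", "likedCount": 45}
  ]
}
"""


def extract_hot_comments(body):
    """Return one dict per hot comment with nickname, content and like count."""
    data = json.loads(body)
    comments = []
    # .get() with a default avoids a KeyError when the key is absent.
    for entry in data.get("hotComments", []):
        comments.append({
            "nickname": entry["user"]["nickname"],
            "content": entry["content"],
            "likedCount": entry["likedCount"],
        })
    return comments


if __name__ == "__main__":
    for comment in extract_hot_comments(SAMPLE_BODY):
        print(comment["nickname"], comment["likedCount"])
```

Keeping the parsing in its own function makes it possible to test against a saved response before sending any real requests.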

2. Handling the encrypted parameters

Testing shows that the two encrypted form fields, params and encSecKey, can simply be copied from the browser and reused as they are. Reproducing the encryption itself would require some background in encryption and decryption.

3. Fetch the hot comments
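Values copied out of the developer tools can pick up stray spaces or line breaks. The sketch below strips whitespace and does a rough shape check before the values are reused. It assumes, based only on the values shown in this article, that params is Base64 text and encSecKey is a 256-character hexadecimal string; the helper names are hypothetical.

```python
import base64
import binascii
import re


def clean_field(value):
    """Remove any whitespace that crept in while copying the value."""
    return re.sub(r"\s+", "", value)


def looks_like_base64(value):
    """True if the value decodes as strict Base64."""
    try:
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def looks_like_hex_key(value, length=256):
    """True if the value is exactly `length` hex characters (assumed size)."""
    return len(value) == length and re.fullmatch(r"[0-9a-fA-F]+", value) is not None


if __name__ == "__main__":
    copied = "aGVs bG8="  # stand-in value with a stray space, not real form data
    cleaned = clean_field(copied)
    print(cleaned, looks_like_base64(cleaned))
```

A failed check only signals a copy-and-paste problem; passing it does not mean the server will accept the values.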





Second, visualize the data

Once we have the comment data, we turn it into a bar chart and a word cloud, which make the data easier to take in at a glance.

Next, install the required packages: pyecharts, matplotlib, and wordcloud. The script also uses requests to fetch the data.
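Before running the script, it can help to confirm that the packages are importable. The following is a small sketch using only the standard library; the function name missing_packages is hypothetical. Note that the script's `from pyecharts import Bar` import matches the pre-1.0 pyecharts API, so a newer pyecharts release would pass this check and still fail at that import.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["requests", "pyecharts", "matplotlib", "wordcloud"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```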

import requests
import json
from pyecharts import Bar  # pyecharts 0.5.x API; from 1.0 on, Bar lives in pyecharts.charts
from wordcloud import WordCloud
import matplotlib.pyplot as plt

url = 'http://music.163.com/weapi/v1/resource/comments/R_SO_4_551816010?csrf_token=568cec564ccadb5f1b29311ece2288f1'

# Request headers copied from the browser so the request looks like a normal page visit
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36',
    'Referer': 'http://music.163.com/#/album?id=38388012',
    'Origin': 'http://music.163.com',
    'Host': 'music.163.com'
}

# Encrypted form data, copied from the browser and used as-is
user_data = {
    'params': 'vRlMDmFsdQgApSPW3Fuh93jGTi/ZN2hZ2MhdqMB503TZaIWYWujKWM4hAJnKoPdV7vMXi5GZX6iOa1aljfQwxnKsNT+5/uJKuxosmdhdBQxvX/uwXSOVdT+0RFcnSPtv',
    'encSecKey': '46fddcef9ca665289ff5a8888aa2d3b0490e94ccffe48332eca2d2a775ee932624afea7e95f321d8565fd9101a8fbc5a9cadbe07daa61a27d18e4eb214ff83ad301255722b154f3c1dd1364570c60e3f003e15515de7c6ede0ca6ca255e8e39788c2f72877f64bc68d29fac51d33103c181cad6b0a297fe13cd55aa67333e3e5'
}

# Send the POST request and parse the JSON response
response = requests.post(url, headers=headers, data=user_data)
data = json.loads(response.text)

# Collect the nickname, text, and like count of each hot comment
hotcomments = []
for hotcomment in data['hotComments']:
    item = {
        'nickname': hotcomment['user']['nickname'],
        'content': hotcomment['content'],
        'likedCount': hotcomment['likedCount']
    }
    hotcomments.append(item)

# Split the comments into parallel lists: text, usernames, and like counts
content_list = [content['content'] for content in hotcomments]
nickname = [content['nickname'] for content in hotcomments]
liked_count = [content['likedCount'] for content in hotcomments]

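# Bar chart of like counts per commenter (pyecharts 0.5.x API)
bar = Bar("Hot comment like counts (sample chart)")
bar.add("Likes", nickname, liked_count, is_stack=True, mark_line=["min", "max"], mark_point=["average"])
bar.render()  # writes render.html to the current directory by default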

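# Join all comment text and build the word cloud
content_text = " ".join(content_list)
# font_path must point to a font with Chinese glyphs, such as SimHei
wordcloud = WordCloud(font_path=r"C:\simhei.ttf", max_words=200).generate(content_text)

# Display the word cloud without axes
plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()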
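One optional refinement, not part of the original script: ordering the comments by like count before plotting can make the bar chart easier to read. This is a minimal sketch, assuming the list-of-dicts shape built in the script (nickname, content, likedCount); the helper name top_by_likes and the sample data are hypothetical.

```python
def top_by_likes(hotcomments, limit=10):
    """Return up to `limit` comments, most-liked first."""
    ranked = sorted(hotcomments, key=lambda c: c["likedCount"], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Made-up sample in the same shape as the script's hotcomments list.
    sample = [
        {"nickname": "listener_a", "content": "Great song", "likedCount": 45},
        {"nickname": "listener_b", "content": "On repeat", "likedCount": 120},
        {"nickname": "listener_c", "content": "Classic", "likedCount": 80},
    ]
    ranked = top_by_likes(sample, limit=2)
    nickname = [c["nickname"] for c in ranked]
    liked_count = [c["likedCount"] for c in ranked]
    print(nickname, liked_count)
```

The resulting nickname and liked_count lists can be passed to the bar chart in place of the unsorted ones.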