Beautiful word cloud



Hot reviews of “The Age of Awakening”

Click the positive reviews filter



Multi-page crawl explanation
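Here is a minimal sketch of how the page URLs can be built. It assumes, based on the URL used in the complete code, that `start` is the offset of the first review on a page and `limit=20` is the page size. The names `BASE_URL`, `PAGE_SIZE` and `page_urls` are made up for this illustration.

```python
# URL template taken from this post; "{}" is filled with the review offset.
BASE_URL = ("https://movie.douban.com/subject/30228394/comments"
            "?percent_type=h&start={}&limit=20&status=P&sort=new_score")
PAGE_SIZE = 20  # assumed to match limit=20 in the URL


def page_urls(first_page, last_page_exclusive):
    """Return one URL per page index; page n starts at offset n * PAGE_SIZE."""
    return [BASE_URL.format(page * PAGE_SIZE)
            for page in range(first_page, last_page_exclusive)]


urls = page_urls(0, 3)
for u in urls:
    print(u)
```

Under that assumption, page 0 starts at offset 0, page 1 at offset 20, and so on, so only the `start` value changes from one request to the next.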



Hot review crawl analysis

Each hot review corresponds to a div tag whose class value is comment-item.

So getting every div tag with the class value comment-item on the current page gives us all of that page's hot reviews.

However, the review text itself is stored in a span tag with the class value short, nested inside the comment-item div.

So we only need to get every span tag with the class value short on the current page to collect all the hot review text.

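spans = data.find_all(class_="short")
for i in spans:
    # The trailing comma keeps the last word of one review from fusing with the first word of the next
    global_text += ",".join(jieba.cut(str(i.text).strip())) + ","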
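To make the tag structure concrete, here is a small sketch that runs on a made-up HTML fragment instead of the live page. `SAMPLE_HTML` and `extract_comments` are hypothetical names, and the real Douban markup has many more elements and attributes. Unlike the snippet above, it looks for the short span inside each comment-item div, which is a slightly stricter variant of the same idea.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the structure described in this post.
SAMPLE_HTML = """
<div class="comment-item">
  <span class="votes">120</span>
  <span class="short">First comment text</span>
</div>
<div class="comment-item">
  <span class="short">  Second comment text  </span>
</div>
<span class="short">Outside any comment-item</span>
"""


def extract_comments(html):
    """Collect the text of span.short found inside each div.comment-item."""
    soup = BeautifulSoup(html, "html.parser")
    comments = []
    for item in soup.find_all("div", class_="comment-item"):
        span = item.find("span", class_="short")
        if span is not None:  # skip items that have no review text
            comments.append(span.text.strip())
    return comments


comments = extract_comments(SAMPLE_HTML)
print(comments)
```

The span that sits outside any comment-item div is ignored here, while `find_all(class_="short")` on the whole page would pick it up as well.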



Complete code

import matplotlib.pyplot as plt
import wordcloud
import jieba
from imageio import imread
import requests
from bs4 import BeautifulSoup

# Accumulates the segmented text of every review that is crawled
global_text = ""


def getDetail(data):
    """Parse one page of HTML and append its segmented review text to global_text."""
    global global_text
    data = BeautifulSoup(data, "html.parser")
    spans = data.find_all(class_="short")
    for i in spans:
        # The trailing comma keeps adjacent reviews from fusing into one word
        global_text += ",".join(jieba.cut(str(i.text).strip())) + ","


def toWordCloud():
    """Render global_text as a word cloud, save it and display it."""
    global global_text
    mask = imread("./9.png")  # image that defines the shape of the cloud
    wcd = wordcloud.WordCloud(
        font_path=r"C:\Windows\Fonts\msyh.ttc",  # font with Chinese glyphs
        background_color='white',  # set background color
        random_state=80,
        mask=mask)
    wcd.generate(global_text)
    wcd.to_file("res.jpg")  # save as image
    plt.imshow(wcd)
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.64"
    }
    url = 'https://movie.douban.com/subject/30228394/comments?percent_type=h&start={}&limit=20&status=P&sort=new_score'
    for i in range(6, 10):
        new_url = url.format(i * 20)
        # Request the formatted page URL, not the template
        response = requests.get(url=new_url, headers=headers)
        response.encoding = 'utf-8'
        getDetail(response.text)
    toWordCloud()
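Before rendering, it can help to check which words will dominate the cloud, since the cloud sizes words by how often they occur. The sketch below counts tokens with the standard library only. It assumes the text is comma-joined like `global_text`; the `top_tokens` helper, the `min_length` filter and the sample string are made up for illustration and are not part of the original code.

```python
from collections import Counter


def top_tokens(joined_text, n=5, min_length=2):
    """Count comma-separated tokens and return the n most frequent ones."""
    tokens = [t.strip() for t in joined_text.split(",")]
    # Drop empty strings and single-character tokens, which carry little meaning
    tokens = [t for t in tokens if len(t) >= min_length]
    return Counter(tokens).most_common(n)


# Made-up sample in the same comma-joined shape as global_text
sample = "历史,很,好看,历史,演员,演技,好,历史,演员,"
top = top_tokens(sample, n=2)
print(top)
```

If the top entries turn out to be filler words, that is a hint to filter them out before generating the image.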

Finally

I'm Code Pipi Shrimp, a mantis shrimp fan who loves sharing knowledge. I'll keep posting useful articles, and I look forward to your follow!

Writing isn't easy. If this post helped you, I hope you'll like, favorite, and follow! Thanks for your support. See you next time ~~~


The crawler source code for this article is on GitHub at https://github.com/2335119327/PythonSpiderIncluded (the repository also contains more crawlers that are not covered in this post; take a look if you're interested) and will keep being updated. Stars are welcome.