In this article

Stand By Me Doraemon 2, one of the works to commemorate the 50th anniversary of Doraemon, will be released on the Chinese mainland on May 28, 2021.

The Doraemon series is a set of stories that has accompanied me, and indeed whole generations, while growing up. Over the past 50 years, Mr. Fujiko F. Fujio created the bamboo copter, the Anywhere Door, the time machine and countless other gadgets, letting Nobita and his friends experience adventures such as space wars and the age of dinosaurs, as well as many dramatic and funny moments of daily life.

Around Children's Day, I heard that Nobita and Shizuka finally got married, so I wrote this article to teach you, step by step, how to draw a word cloud.

This article walks through building a word cloud from Douban movie reviews. From it, you will learn:

  • ① How to crawl Douban movie review data;
  • ② How to draw a word cloud, step by step.

Steps of the Douban crawler

Of course, Douban has plenty of other data worth crawling and analyzing, but in this article we will only crawl the comment text.

Since there is only one field to extract, a regular expression with the re module is enough to solve the problem.
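As a quick illustration, here is how that regular expression extracts the comment text. The HTML snippet below is made up for the example, not real Douban markup:

```python
import re

# Hypothetical fragment of a comments page (not real Douban markup)
html = """
<span class="short">A tearjerker for everyone who grew up with Doraemon.</span>
<span class="short">Nobita and Shizuka finally got married!</span>
"""

# The non-greedy group (.*?) captures just the comment text;
# re.S lets "." also match newlines inside a comment
comments = re.findall('<span class="short">(.*?)</span>', html, re.S)
print(comments[0])  # → A tearjerker for everyone who grew up with Doraemon.
```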

Crawlers like this one are another good chance to practice.

Here are the steps of crawler:

# 1. Import the libraries
import requests
import re

# 2. Construct the request headers. This is an anti-crawling countermeasure;
#    learn early to note which sites need which headers, it pays off.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
    # Some servers use the Referer to decide whether the request
    # comes from a browser or from a crawler
    "Referer": "https://www.douban.com/",
}

# 3. Loop over the comment pages, 20 comments at a time
for i in range(0, 1001, 20):
    url = f"https://movie.douban.com/subject/34913671/comments?start={i}&limit=20&status=P&sort=new_score"
    # 4. Send the request
    response = requests.get(url, headers=headers)
    # 5. Set the encoding
    response.encoding = "utf-8"
    # 6. Get the returned result via the text attribute.
    #    Note the difference between text and content: text is the decoded string.
    text = response.text
    # 7. Extract the comments with a regular expression
    comments_list = re.findall('<span class="short">(.*?)</span>', text, re.S)
    # 8. Append each comment to a txt file
    with open(r"Stand by Me Doraemon 2.txt", "a", encoding="utf-8") as f:
        for comment in comments_list:
            f.write(comment + "\n")

The final effect is as follows:

Word cloud map making process

Many students do not know how to make a word cloud, so I will take this opportunity to write down the whole process in detail; just follow along.

The detailed steps of word cloud mapping are as follows:

  • ① Import the related libraries;
  • ② Read the text file and use the jieba library to dynamically modify the dictionary;
  • ③ Use the lcut() method in the jieba library for word segmentation;
  • ④ Read the stop words, add extra stop words, and remove the stop words;
  • ⑤ Count word frequencies;
  • ⑥ Draw the word cloud.

① Import related libraries

Here, import whatever libraries you need.

import jieba
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from imageio import imread

import warnings
warnings.filterwarnings("ignore")

② Read text files and modify the dictionary dynamically using jieba library

Reading the text file with open() needs no explanation; dynamically modifying the dictionary does, so let's look at that.

sentence = "There is a dog on the 3rd street, Guangshui, Hubei"
list(jieba.cut(sentence))

The results are as follows:



Given the segmentation result above, what if we want "Guangshui, Hubei" and "the 3rd street" to be treated as complete words rather than split apart? This is where the add_word() method comes in: it modifies the dictionary dynamically.

sentence = "There is a dog on the 3rd street, Guangshui, Hubei"
jieba.add_word("Guangshui, Hubei")
jieba.add_word("the 3rd street")
list(jieba.cut(sentence))

The results are as follows:



Conclusion:

  • jieba.add_word() can only add one word at a time.
  • To add many words at once, use jieba.load_userdict(): put all your custom words into one text file, then load that file to modify the dictionary in a single call.
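As a sketch (the file name and the entries are made up for illustration), the file that jieba.load_userdict() expects is plain text with one entry per line: the word, then an optional frequency and part-of-speech tag:

```python
# Build a user dictionary file for jieba.load_userdict().
# Each line: word [frequency] [POS tag]; the last two fields are optional.
entries = [
    "Doraemon",
    "Nobita 3 n",
]
with open("userdict.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(entries) + "\n")

# Afterwards a single call loads every custom word at once:
# jieba.load_userdict("userdict.txt")
```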

With the above foundation, we can modify the dictionary dynamically after reading the text directly.

with open(r"Stand by Me Doraemon 2.txt", encoding="utf-8") as f:
    txt = f.read()
txt = txt.split()
jieba.add_word("Doraemon")
jieba.add_word("Nobita")

③ Use the lcut() method in jieba library for word segmentation

One line of code, very simple.

data_cut = [jieba.lcut(x) for x in txt]

④ Read stop words, add extra stop words, and remove stop words

Read the stop-word file and split it with the split() function to get a list of stop words. Then use the + operator to add any extra stop words to that list.

with open(r"stoplist.txt", encoding="utf-8") as f:
    stop = f.read()
stop = stop.split()
# Add extra stop words (here, the space character; adjust as needed)
stop = [" "] + stop
# Remove the stop words from the segmented text
s_data_cut = pd.Series(data_cut)
all_words_after = s_data_cut.apply(lambda x: [i for i in x if i not in stop])

⑤ Word frequency statistics

Note the use of value_counts() on a pandas Series here.

all_words = []
for i in all_words_after:
    all_words.extend(i)
    
word_count = pd.Series(all_words).value_counts()
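To see what value_counts() produces here, a tiny self-contained example with made-up words (named demo_* so it does not clash with the variables above):

```python
import pandas as pd

demo_words = ["Doraemon", "Nobita", "Doraemon", "Shizuka", "Doraemon", "Nobita"]
# value_counts() returns a Series of frequencies, sorted in descending order,
# indexed by the unique words
demo_count = pd.Series(demo_words).value_counts()
print(demo_count.index[0], int(demo_count.iloc[0]))  # → Doraemon 3
```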

⑥ Draw word cloud map

# Read the mask picture that shapes the word cloud
# (file name assumed; use your own mask image)
back_picture = imread("back_picture.jpg")

wc = WordCloud(
    font_path="simhei.ttf",
    background_color="white",
    max_words=2000,
    mask=back_picture,
    max_font_size=200,
    random_state=42,
)
# Generate the word cloud from the word-frequency Series
wc2 = wc.fit_words(word_count)
plt.figure(figsize=(16, 8))
plt.imshow(wc2)
plt.axis("off")
plt.show()
wc.to_file("ciyun.png")

The results are as follows:



From the word cloud, we can roughly tell that this is another sentimental tearjerker. Nobita, who grew up alongside us, is getting married; what about us? When we were little, we only hoped Nobita and Shizuka would remain good friends, and in this movie they actually get married. How does the story unfold? Go to the cinema and find out.

Honestly, seeing Gian stand up for Nobita brought tears to this editor's eyes. Our childhood is back!
