Original link: tecdat.cn/?p=6852

Background concepts

 

Word frequency: the number of times a word occurs in a document.

Stop word: a word that is filtered out during data processing because it carries little meaning for the analysis, such as "website".

Corpus: the collection of all documents being analyzed.

Chinese word segmentation: splitting a sequence of Chinese characters into individual words.
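To make these terms concrete, here is a minimal sketch that uses only the Python standard library. The token list and the stop-word set are made-up illustrations (assumptions, not data from the original article), and the tokens are assumed to be segmented already.

```python
from collections import Counter

# Hypothetical, already-segmented tokens standing in for a corpus.
tokens = ["word", "cloud", "the", "word", "of", "cloud", "word"]

# Hypothetical stop-word set: words we choose to ignore.
stop_words = {"the", "of"}

# Word frequency: occurrences of each token, with stop words filtered out.
word_freq = Counter(t for t in tokens if t not in stop_words)

print(word_freq.most_common())
```

For Chinese text the tokens would first have to come from a segmenter such as jieba, since words are not separated by spaces.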

Using third-party libraries

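pandas.DataFrame(): creates a DataFrame object.

DataFrame.groupby(): grouped statistics. The pattern used in this article is groupby(by=[column names]).agg({statistic name: function}). Note that the dictionary form of agg only renames the result column in older pandas releases; pandas 1.0 and later reject it on a single column, where groupby(...).size() or named aggregation does the same job.

wordcloud: a Python library for drawing word clouds.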
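The grouped count can be sketched on a tiny example. The token list and the variable names below are hypothetical, and groupby(...).size() is used here as one way to count rows per group in current pandas; it is not necessarily the exact call the original article ran.

```python
import pandas as pd

# Hypothetical tokens; in the article these come from jieba.
segments = ["word", "cloud", "word", "text", "word", "cloud"]
segment_df = pd.DataFrame({"segment": segments})

# Group identical tokens, count the rows in each group,
# and sort so the most frequent token comes first.
seg_stat = (
    segment_df.groupby("segment")
    .size()
    .reset_index(name="count")
    .sort_values(by="count", ascending=False)
    .reset_index(drop=True)
)

print(seg_stat)
```

The result is a two-column table (segment, count), which is the shape the later stop-word filtering and word cloud steps work on.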

Word cloud implementation

    #!/usr/bin/env python
    # coding=utf-8
    import codecs

    import jieba
    import matplotlib.pyplot as plt
    import pandas as pd
    from wordcloud import WordCloud, ImageColorGenerator

    # Placeholder: set this to the data storage path
    basefile = 'data storage path/'

    # Load the corpus
    f_in = codecs.open(basefile + 'an.txt', 'r', 'utf-8')
    content = f_in.read()
    f_in.close()

    # Segment the text, keeping tokens longer than one character
    segments = []
    segs = jieba.cut(content)
    for seg in segs:
        if len(seg) > 1:
            segments.append(seg)
    segmentDF = pd.DataFrame({'segment': segments})

    # Word frequency statistics, most frequent first
    # (size() replaces agg({'count': np.size}), which pandas 1.0+ rejects)
    segStat = (segmentDF.groupby(by='segment').size()
               .reset_index(name='count')
               .sort_values(by=['count'], ascending=False))

    # Remove stop words (the file needs a 'stopword' column header)
    stopwords = pd.read_csv('./stopWordscn.txt', encoding='utf8')
    fSegStat = segStat[~segStat.segment.isin(stopwords.stopword)]

    # Build the word cloud
    wordcloud = WordCloud(
        font_path='./simhei.ttf',   # font used to render the words
        background_color="black",   # background color
    )
    words = fSegStat.set_index('segment').to_dict()
    wordcloud.fit_words(words['count'])
    plt.imshow(wordcloud)
    plt.show()
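Two steps in the script are easy to misread: the stop-word filter and the conversion to the dictionary that fit_words receives. The sketch below isolates them on made-up data; the two small tables are hypothetical stand-ins for the real frequency table and stop-word file.

```python
import pandas as pd

# Hypothetical frequency table and stop-word table (illustrative data only).
seg_stat = pd.DataFrame({"segment": ["word", "the", "cloud"], "count": [3, 5, 2]})
stopwords = pd.DataFrame({"stopword": ["the", "of"]})

# Keep only rows whose token is NOT in the stop-word column.
f_seg_stat = seg_stat[~seg_stat["segment"].isin(stopwords["stopword"])]

# set_index + to_dict yields {"count": {token: frequency}},
# so words["count"] is the plain token-to-frequency mapping.
words = f_seg_stat.set_index("segment").to_dict()

print(words["count"])
```

This is why the script indexes the dictionary with 'count' before handing it to the word cloud: to_dict() nests the frequencies under the column name.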

 

Results

[Figure: Anna Karenina word cloud]

Word cloud beautification

    import numpy as np
    from PIL import Image

    # Load the mask image as an array
    # (imread was not imported in the original; Pillow is assumed here)
    bimg = np.array(Image.open(basefile + 'an.png'))
    wordcloud = WordCloud(background_color="white", mask=bimg,
                          font_path='./simhei.ttf')
    wordcloud = wordcloud.fit_words(words['count'])

    plt.figure(figsize=(8, 6), dpi=80, facecolor='w')
    bimgColors = ImageColorGenerator(bimg)
    plt.axis("off")
    # Reset the word cloud colors to those of the mask image
    plt.imshow(wordcloud.recolor(color_func=bimgColors))
    plt.show()
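In the wordcloud library, a mask is an array in which pure white (255) pixels are left empty and all other pixels are available for words. As a rough illustration of that convention, the sketch below builds a circular mask with NumPy alone; the canvas size and radius are arbitrary assumptions, and this array is a stand-in for an.png, not the image used in the article.

```python
import numpy as np

# Hypothetical 200x200 canvas with a circular drawing area.
size, radius = 200, 90
y, x = np.ogrid[:size, :size]
outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2

# 255 (white) marks pixels to leave empty; 0 marks pixels words may occupy.
mask = (255 * outside).astype(np.uint8)

print(mask.shape, mask[0, 0], mask[size // 2, size // 2])
```

An array like this could be passed as mask= to WordCloud in place of the image-derived array. The recoloring step would not carry over, though, since a two-dimensional array has no colors for ImageColorGenerator to sample.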
