Preface

This post is my entry in the "Spring Festival Creative Submission Contest".

Yesterday I saw Juejin's Spring Festival essay event. I have been reading about web crawlers recently, so I crawled some Spring Festival greetings and generated a word cloud from them for fun. If you are interested, give it a try: the source code is included and it is very simple. The result looks like this:

The environment

  1. Operating system: Windows

  2. Language: Python 3.7

  3. Dependent third-party packages:

    Selenium: crawls the site and collects the greetings. This library is more commonly used for automated UI testing. I did not use the Requests library for crawling; the advantage of Selenium is that the page is visible in real time while it is being crawled.

    Wordcloud: generates the word cloud.

    PIL: loads the image whose outline the word cloud should fill. Note that on Python 3.7 it is installed with pip install pillow.

    NumPy: also needed to generate a word cloud with a contour; it represents the image of the given shape as a large matrix.

    Jieba: word cloud handles English text by default. Because the greetings we crawl are in Chinese, this library is needed to segment the Chinese text into words so that they are recognized correctly.

If you are interested, you can read more about these libraries to deepen your understanding.
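As a small aid for the list above, here is a minimal sketch that checks which of these packages can be imported in the current interpreter. The helper name missing_packages and the hand-written mapping are my own assumptions, not part of any of the libraries; the one non-obvious entry is pillow, which is imported as PIL.

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "selenium": "selenium",
    "wordcloud": "wordcloud",
    "pillow": "PIL",  # installed as "pillow", imported as "PIL"
    "numpy": "numpy",
    "jieba": "jieba",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```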

Approach

(1) I searched Baidu for Spring Festival greetings, crawled the result, and saved the greetings to a file. Details are as follows:

Selenium WebDriver is used here with the Firefox browser. The first step is to create a Firefox browser object.

On the search results page, I simulated a manual click on the first search result to jump to the target web page, as shown in the picture.
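The two steps above might look roughly like the sketch below. It is not the original source code: the function names are hypothetical, the "h3 a" selector for the first result link is an assumption about Baidu's current markup that may need adjusting, and the sketch opens the results URL directly instead of typing into the search box. Running open_first_result requires Firefox and geckodriver to be installed.

```python
from urllib.parse import urlencode


def build_search_url(keyword):
    """Build a Baidu search URL for the given keyword."""
    return "https://www.baidu.com/s?" + urlencode({"wd": keyword})


def open_first_result(keyword):
    """Open Firefox, search Baidu and follow the first result.

    Returns the driver so the caller can read the page and call quit() later.
    """
    # Imported here so build_search_url() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()   # create the Firefox browser object
    driver.implicitly_wait(10)     # wait up to 10 s for elements to appear
    driver.get(build_search_url(keyword))

    # Assumed selector: the title link of the first search result.
    first_link = driver.find_element(By.CSS_SELECTOR, "h3 a")
    first_link.click()

    # If the result opened in a new tab, switch to the newest window.
    driver.switch_to.window(driver.window_handles[-1])
    return driver
```

A call such as driver = open_first_result("春节祝福语") leaves the browser on the greetings page; call driver.quit() once the text has been collected.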

Get all the greetings from this page and save them in wishes.txt.
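Saving the collected greetings could be sketched as follows. The helper name save_wishes is hypothetical, and how the greetings are extracted from the page depends on its markup, so the sketch simply takes a list of strings. Writing the file explicitly as UTF-8 matters on Windows, where the default encoding of open() depends on the system locale.

```python
def save_wishes(wishes, path="wishes.txt"):
    """Write one greeting per line, skipping blank entries.

    Returns the number of greetings written.
    """
    cleaned = [w.strip() for w in wishes if w and w.strip()]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned))
    return len(cleaned)
```

With Selenium, the input list might come from something like [p.text for p in driver.find_elements(By.TAG_NAME, "p")], assuming the greetings sit in paragraph elements on that page.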

(2) Then parse the file with the relevant libraries and generate the word cloud. Note that a Chinese font is required to render the word cloud. Here font_path points to a font in the Windows font library; you can change the font.

word_cloud = WordCloud(mask=mask, font_path=r'C:\Windows\Fonts\STXINGKA.TTF').generate(text)
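For context, the whole of step (2) might be put together as in the sketch below. This is an assumption about how the pieces fit, not the original source: the function names, the mask image path and the output file name are hypothetical. As far as I know, wordcloud treats pure white pixels of the mask as areas where no words are drawn.

```python
def segment_text(raw_text, cut=None):
    """Return the text as space-separated tokens, dropping blank tokens."""
    if cut is None:
        import jieba  # third-party; only needed when no tokenizer is passed in
        cut = jieba.cut
    return " ".join(token for token in cut(raw_text) if token.strip())


def build_word_cloud(text_path, mask_path, font_path, out_path="wordcloud.png"):
    """Segment the greetings with jieba and render them inside the mask shape."""
    # Third-party packages from the dependency list.
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    with open(text_path, encoding="utf-8") as f:
        text = segment_text(f.read())

    # The mask image as a matrix of pixel values.
    mask = np.array(Image.open(mask_path))

    word_cloud = WordCloud(mask=mask, font_path=font_path).generate(text)
    word_cloud.to_file(out_path)
    return word_cloud
```

A call could look like build_word_cloud("wishes.txt", "mask.png", r'C:\Windows\Fonts\STXINGKA.TTF'), where mask.png stands for whatever shape image you choose.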


Windows font library

The source code

Note

You can modify the background color, the contour color, and the font colors, for example:


word_cloud = WordCloud(mask=mask, background_color='white',  contour_color='red', colormap='brg',
                       max_words=600,
                       font_path=r'C:\Windows\Fonts\STXINGKA.TTF').generate(text)
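One detail that may be worth checking: as far as I know from the wordcloud documentation, contour_color is only drawn when contour_width is greater than 0, and contour_width defaults to 0. The sketch below collects the style options from the example above in one place and adds a contour width; the helper name style_options and the value 2 are assumptions for illustration.

```python
def style_options(font_path, contour_width=2):
    """Keyword arguments for WordCloud: the colors above plus a visible contour."""
    return {
        "background_color": "white",
        "contour_color": "red",
        "contour_width": contour_width,  # 0 (the default) draws no contour
        "colormap": "brg",
        "max_words": 600,
        "font_path": font_path,
    }


# Usage, with `mask` and `text` prepared as before:
#   from wordcloud import WordCloud
#   options = style_options(r'C:\Windows\Fonts\STXINGKA.TTF')
#   word_cloud = WordCloud(mask=mask, **options).generate(text)
```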


After re-running, the result is shown in the figure.

For the font color sets supported by colormap, refer to the following link: matplotlib.org/2.0.2/examp…