This is the 17th day of my participation in the First Wen Challenge 2022.

1. Install the module

There are usually two ways to install the module. The first is to install it directly with pip: open CMD and run the following command:

pip install wordcloud

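The second way is to download the .whl file that matches your Python version and install it locally. I'm using Python 3.7, 64-bit, so I download the file marked with the red box: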

After downloading, we change to the directory that contains the file:

cd C:\Users\zaxwz\Downloads

Then execute the following statement:

pip install wordcloud-1.6.0-cp37-cp37m-win_amd64.whl

The part after install is the full name of the file. We can also type wordc and press the Tab key to auto-complete the file name. After running the command above, the installation of the wordcloud module begins.
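
Whichever installation method is used, it can be handy to confirm that Python can actually find the module. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "os" ships with Python; "wordcloud" is only found after a successful install
for name in ("os", "wordcloud"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If wordcloud is reported as missing, the pip command was probably run for a different Python installation than the one executing the script.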

2. Introduction to the wordcloud module

Word clouds are mainly produced with the WordCloud class in the wordcloud module. Let's look at the WordCloud class first.

2.1 Generate a simple word cloud

The steps to implement a simple word cloud are as follows:

  1. Import the WordCloud class
  2. Prepare the text data
  3. Create a WordCloud object
  4. Generate the word cloud from the text data
  5. Save the word cloud to a file

We implement the simplest word cloud by following the steps above:

# Import the module
from wordcloud import WordCloud
# Prepare the text data
sentence = "Do not go gentle into that good night!"
# Create a word cloud object
wc = WordCloud()
# Generate the word cloud from the text data
wc.generate(sentence)
# Save the word cloud to a file
wc.to_file("wc.png")

Following the steps above, the code for a simple word cloud is exactly five statements. The result is as follows:

As can be seen, this word cloud is still very plain, and the background is entirely black. We will improve it step by step in the following sections.

2.2 Some constructor parameters of WordCloud

Let’s take a look at some of the parameters in WordCloud, as described in the table below.

| Parameter | Type | Description |
| --- | --- | --- |
| width | int (default=400) | Width of the word cloud |
| height | int (default=200) | Height of the word cloud |
| background_color | color value (default="black") | Background color of the word cloud |
| font_path | string | Path to the font file |
| mask | nd-array (default=None) | Image array that defines the shape of the word cloud |
| stopwords | set | Words to exclude |
| max_font_size | int (default=None) | Maximum font size |
| min_font_size | int (default=4) | Minimum font size |
| max_words | number (default=200) | Maximum number of words to display |
| contour_width | float (default=0) | Thickness of the outline |
| contour_color | color value (default="black") | Color of the outline |
| scale | float (default=1) | Scaling factor applied to the output image |

Let’s test the above parameters:

# Import the module
from wordcloud import WordCloud
# Prepare the text data
sentence = "He had already passed sixty of the stages leading to it, and he had brought from his journey nothing but errors and remorse. Now his health was poor, his mind vacant, his heart sorrowful, and his old age short of comforts."
# Prepare the stop words, which must be a set
stopwords = set(['go', 'to'])
# Set the parameters when creating the WordCloud object
wc = WordCloud(
    width=200,                   # Set the width to 200px
    height=150,                  # Set the height to 150px
    background_color='white',    # Set the background color to white
    stopwords=stopwords,         # Stop words; words in the set will not appear in the generated word cloud
    max_font_size=100,           # Maximum font size; no word is drawn larger than 100
    min_font_size=10,            # Minimum font size; no word is drawn smaller than 10
    max_words=10,                # Maximum number of words to display
    scale=2                      # Render the image at twice the canvas size
)
# Generate the word cloud from the text data
wc.generate(sentence)
# Save the word cloud to a file
wc.to_file('wc.png')

Execute the code above to generate the following word cloud:

We use the following code to get the file size of the image in bytes:

import os
# Get the file size in bytes
size = os.path.getsize('./wc.png')
# Print the size
print(size)

The following output is displayed:

24966

If we instead set scale to 1 when constructing the WordCloud object and save the resulting word cloud as wc2.png, the same code (with the path changed) prints the following size:

11988

The file size is roughly halved. Before I understood this parameter, I once set scale to 15 and ended up with an image of about 100 MB.
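
To get a feeling for why a large scale produces such large files, it helps to look at the pixel dimensions. The sketch below assumes that the saved image measures width * scale by height * scale pixels; the helper output_size is hypothetical and only does the arithmetic.

```python
def output_size(width, height, scale=1):
    """Pixel dimensions of the saved image for a given canvas size and scale."""
    return int(width * scale), int(height * scale)


for scale in (1, 2, 15):
    w, h = output_size(200, 150, scale)
    print(f"scale={scale}: {w} x {h} = {w * h} pixels")
```

The pixel count grows with the square of scale, so scale=15 yields 225 times as many pixels as scale=1. The file on disk grows more slowly than that because PNG compresses the image.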

Some of the above parameters will be explained later.
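
As a rough illustration of what stopwords and max_words do, the following sketch counts words with the standard library only. It is a simplified stand-in, not the library's actual implementation: the helper top_words and its lower-casing are assumptions made for this example.

```python
import re
from collections import Counter


def top_words(text, stopwords=frozenset(), max_words=10):
    """Count words, drop the stop words, and keep the max_words most frequent."""
    words = re.findall(r"\w[\w']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(max_words)


text = "the cat and the hat and the bat"
# "and" is excluded; only the single most frequent remaining word is kept
print(top_words(text, stopwords={"and"}, max_words=1))  # [('the', 3)]
```

In a word cloud, words with higher counts are drawn larger, while words in the stop word set are left out entirely.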

2.3 Some methods of WordCloud

Here are three methods we will use, as follows:

| Method | Parameters | Description |
| --- | --- | --- |
| generate | text | Generate the word cloud from text |
| recolor | [random_state, color_func, colormap] | Recolor the existing output |
| to_file | filename | Write the word cloud to a file |

We have already used to_file, which takes the path of the output file, and generate, which takes text and has the WordCloud object build the word cloud from it. The recolor method is demonstrated later.
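
The color_func argument of recolor is a callable that returns a color for each word. Below is a minimal sketch of such a function that picks random shades of grey; the name grey_color_func and the lightness range are choices made for this example.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Return a random shade of grey as an HSL color string."""
    # Use the random generator passed in by the caller when there is one
    rng = random_state if random_state is not None else random
    lightness = rng.randint(30, 70)
    return f"hsl(0, 0%, {lightness}%)"


# Calling it by hand shows the kind of value it produces
print(grey_color_func("night", 20, (0, 0), None))

# With an already generated WordCloud object wc, it would be passed like this:
# wc.recolor(color_func=grey_color_func)
```

Because the function ignores the word itself, every word gets an independent random grey. A function could just as well choose the color based on word or font_size.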

2.4 Generate a word cloud with a shape

In the examples above, the word clouds were plain rectangles. Now we use the mask parameter of WordCloud to generate a word cloud with a shape. Let's prepare the text data first. I chose Steve Jobs' speech at Stanford and saved it as jobs.txt; the text is in English. Let's read the file first:

# Open the file jobs.txt in read-only mode with utf8 encoding
f = open('jobs.txt', 'r', encoding='utf8')
# Read the data in the file
article = f.read()
# Close the file
f.close()

Once we have the text data, we also need an ndarray object to use as the mask. There are many ways to get one. Here we use OpenCV:

import cv2
# Read the image; imread returns an ndarray object
im = cv2.imread('1.png')

OpenCV can be installed as follows:

pip install opencv-python

Then we can start generating the word cloud. The complete code is as follows:

from wordcloud import WordCloud
import cv2
# Open the file
f = open('jobs.txt', 'r', encoding='utf8')
# Read the file
article = f.read()
# Close the file
f.close()
# Read the image
im = cv2.imread('1.png')
# Create the word cloud object
wc = WordCloud(
    width=400,      # width and height are ignored when a mask is given; the mask's shape is used
    height=500,
    background_color='white',
    max_font_size=50,
    mask=im     # Words are drawn inside the outline of im (the image background must be pure white)
)
# Generate the word cloud from the text
wc.generate(article)
# Save the word cloud to a file
wc.to_file('w2.png')

The result is as follows:

The mask image 1.png is on the left, and the generated word cloud is on the right.
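
A mask does not have to come from an image file; any array in which the excluded area is pure white (255) should work. The sketch below builds a circular mask with NumPy. The helper circle_mask and its default sizes are made up for this example, and NumPy is assumed to be installed.

```python
import numpy as np


def circle_mask(size=300, radius=130):
    """Return a (size, size) uint8 array: 255 outside the circle, 0 inside."""
    y, x = np.ogrid[:size, :size]
    center = size / 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


mask = circle_mask()
print(mask.shape)                 # (300, 300)
print(mask[0, 0], mask[150, 150]) # 255 0

# The array could then be passed as WordCloud(mask=mask, background_color='white')
```

Words are only placed where the mask is not white, so here they would fill the circle and leave the corners empty.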

2.5 Generate a Chinese word cloud

In the example above, we are using English as the text data. We are now trying to generate a simple word cloud using Chinese as the text data. The code is as follows:

# Import the module
from wordcloud import WordCloud
# Prepare the text data (Chinese for "Do not go gentle into that good night")
sentence = "不要温和地走进那个良夜"
# Create a word cloud object
wc = WordCloud()
# Generate the word cloud from the text data
wc.generate(sentence)
# Save the word cloud to a file
wc.to_file("wc.png")

The result is as follows:

The Chinese characters are not displayed correctly. This is because the default font of WordCloud does not support Chinese, so we need to set a font that does. We modify the code as follows:

# Import the module
from wordcloud import WordCloud
# Prepare the text data (Chinese for "Do not go gentle into that good night")
sentence = "不要温和地走进那个良夜"
# Create a word cloud object with a font that supports Chinese
wc = WordCloud(font_path='msyh.ttc')
# Generate the word cloud from the text data
wc.generate(sentence)
# Save the word cloud to a file
wc.to_file("wc.png")

Above we set font_path; the font file can be found in C:\Windows\Fonts. But there is another problem: this time the text is not split into words, and the whole sentence is displayed as one unit. If we test with other articles, the word cloud is likewise made up of whole sentences, so we need the jieba word segmentation module to solve this problem.
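
Since font_path must point to a font that exists on the machine, a small check before creating the WordCloud object can avoid a confusing error. This is a sketch only: the helper find_font is hypothetical, and the second candidate path (simhei.ttf) is an assumption about which fonts may be installed.

```python
import os


def find_font(candidates):
    """Return the first path in candidates that exists on disk, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


candidates = [
    r"C:\Windows\Fonts\msyh.ttc",    # the font used in this article
    r"C:\Windows\Fonts\simhei.ttf",  # assumed alternative; may not exist on every system
]
font_path = find_font(candidates)
print(font_path or "No candidate font found; set font_path manually")
```

The returned path can then be passed as WordCloud(font_path=font_path).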

3. The jieba word segmentation module

3.1 Using jieba for word segmentation

Let's install this module first:

pip install jieba

jieba splits a sentence into the parts that are likely to be words. It has three segmentation modes: full mode, accurate mode, and search engine mode. Here we look at accurate mode:

import jieba
# Prepare the sentence to segment (Chinese for "Do not go gentle into that good night")
sentence = "不要温和地走进那个良夜"
# jieba.cut uses accurate mode by default
words = jieba.cut(sentence)
# Join the segmented words with " " (a space)
text = " ".join(words)
# Print the result
print(text)

The output is the original sentence with a space between each pair of adjacent words.
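
For comparison, the other two modes can be tried in the same way. The following is a small sketch; the wrapper segment and its mode names are invented for this example, while jieba.lcut and jieba.lcut_for_search are the list-returning variants of jieba.cut and jieba.cut_for_search.

```python
def segment(sentence, mode="accurate"):
    """Segment a sentence with jieba in one of its three modes and return a list."""
    if mode not in ("accurate", "full", "search"):
        raise ValueError(f"unknown mode: {mode!r}")
    import jieba  # imported here so the mode check works even before jieba is installed
    if mode == "full":
        # Full mode: list every possible word, including overlapping ones
        return jieba.lcut(sentence, cut_all=True)
    if mode == "search":
        # Search engine mode: accurate mode plus a further split of long words
        return jieba.lcut_for_search(sentence)
    # Accurate mode (the default)
    return jieba.lcut(sentence)


# Example usage (requires jieba):
# for mode in ("accurate", "full", "search"):
#     print(mode, segment("不要温和地走进那个良夜", mode))
```

For word clouds, accurate mode is usually the natural choice, because full mode and search engine mode can emit overlapping words and so inflate the counts.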

3.2 Using jieba with WordCloud

We can use jieba to divide sentences into words first and then generate word clouds as text data. The code is as follows:

import jieba
from wordcloud import WordCloud
# Prepare the sentence (Chinese for "Do not go gentle into that good night")
sentence = "不要温和地走进那个良夜"
# Create a WordCloud object and set the font
wc = WordCloud(font_path='msyh.ttc')
# Segment the sentence with jieba
words = jieba.cut(sentence)
# Join the words with " " (a space)
text = " ".join(words)
# Generate the word cloud from the segmented text
wc.generate(text)
# Save the word cloud to a file
wc.to_file('wc.png')

The generated word cloud is as follows:

With jieba we can generate more complex word clouds.
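
The reason segmentation matters can be shown without either library. The sketch below assumes that WordCloud splits text with a regular expression similar to the one shown; the pattern is an approximation for illustration, and the segmented list is written by hand rather than produced by jieba.

```python
import re

# Assumption: an approximation of the pattern WordCloud uses to find words
WORD_PATTERN = r"\w[\w']+"

unsegmented = "不要温和地走进那个良夜"
# Hand-written segmentation for illustration, not actual jieba output
segmented = " ".join(["不要", "温和", "地", "走进", "那个", "良夜"])

print(re.findall(WORD_PATTERN, unsegmented))  # ['不要温和地走进那个良夜']
print(re.findall(WORD_PATTERN, segmented))    # ['不要', '温和', '走进', '那个', '良夜']
```

Without spaces, the whole sentence counts as one word, which is why it appeared as a single block in the word cloud. With this pattern, single-character words such as 地 are dropped.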

4. ImageColorGenerator object

In the examples above, we did not control the colors of the word cloud. ImageColorGenerator is a class that derives word colors from an image; we can think of it as a color picker. We mentioned the recolor method above but have not used it yet, so we'll use it here.

4.1 Obtaining image colors

Getting the colors of an image is very simple; we just need to create an ImageColorGenerator object:

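import cv2
from wordcloud import ImageColorGenerator
# Read the image
im = cv2.imread('1.png')
# OpenCV loads images in BGR order; convert to RGB so the colors are not swapped
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
# Create the image color generator
im_color = ImageColorGenerator(im_rgb)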

Once we have the color generator, we can recolor the word cloud.
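
Conceptually, ImageColorGenerator colors a word with the mean color of the image region the word covers. The sketch below reproduces that idea on a tiny synthetic image with NumPy; the helper patch_mean_color is hypothetical and is not the library's own code.

```python
import numpy as np


def patch_mean_color(image, top, left, height, width):
    """Average RGB color of a rectangular patch, as a tuple of ints."""
    patch = image[top:top + height, left:left + width, :3]
    mean = patch.reshape(-1, 3).mean(axis=0)
    return tuple(int(channel) for channel in mean)


# A 4x4 test image: left half green, right half white
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (0, 128, 0)
image[:, 2:] = (255, 255, 255)

print(patch_mean_color(image, 0, 0, 4, 2))  # (0, 128, 0)
print(patch_mean_color(image, 0, 0, 4, 4))  # (127, 191, 127)
```

A word that lies entirely on the green half comes out green, while a word spanning both halves gets a washed-out mix. This is why images with large areas of uniform color tend to give the clearest results.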

4.2 Recoloring the word cloud

We use recolor to color the word cloud as follows:

import cv2
import jieba
from wordcloud import WordCloud, ImageColorGenerator
# Read the image
im = cv2.imread('1.jpg')
# OpenCV loads images in BGR order; convert to RGB so the colors are not swapped
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
# Create the color generator
im_color = ImageColorGenerator(im_rgb)
# Open the file
f = open('csb.txt', 'r', encoding='utf8')
# Read the file
article = f.read()
# Close the file
f.close()
# Segment the text into words
words = jieba.cut(article)
# Join the words with " " (a space)
text = " ".join(words)
# Create a word cloud object
wc = WordCloud(
    background_color='white',
    font_path='msyh.ttc',
    mask=im,
)
# Generate the word cloud from the text
wc.generate(text)
# Color the words using the image colors
wc.recolor(color_func=im_color)
# Save the word cloud to a file
wc.to_file('wc.png')

Here csb.txt contains the text of the Chu Shi Biao (出师表). The result of running the code above is as follows:

The green-haired head from the source image is clearly recognizable. You can try it out with other images.

All the code for this article has been uploaded to GitHub:

Github.com/IronSpiderM…