Preface

Let’s build a crawler this weekend: we’ll use Python to crawl Dangdang’s book data and do a simple visual analysis of it.

Let’s have a good time.

Development tools

Python version: 3.6.4
Related modules:

requests module;

bs4 module;

wordcloud module;

jieba module;

Pillow module;

pyecharts module;

And some modules that come with Python.

Environment setup

Install Python and add it to your environment variables, then use pip to install the required third-party modules listed above.
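
For example, the third-party modules can be installed in one go from a command prompt (assuming pip is on your PATH; bs4 is published on PyPI as beautifulsoup4):

    pip install requests beautifulsoup4 wordcloud jieba pillow pyecharts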

Data crawl

Task:

Given a keyword, crawl all of the book data associated with it.

Implementation:

Using the keyword “python” as an example, the search results page we want to crawl looks like this:

The link format of the webpage is:

search.dangdang.com/?key={keyword}&act=input&page_index={page_index}

So we request every results page for the keyword:
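
A minimal sketch of that request loop; the page count of 10, the headers, and the encoding are assumptions rather than values from the original code:

    import requests

    headers = {'User-Agent': 'Mozilla/5.0'}  # pretend to be a browser

    def get_pages(keyword, max_pages=10):
        """Request every results page for the given keyword and return the HTML of each."""
        pages = []
        for page_index in range(1, max_pages + 1):
            url = 'http://search.dangdang.com/?key={}&act=input&page_index={}'.format(keyword, page_index)
            res = requests.get(url, headers=headers)
            res.encoding = 'gb2312'  # Dangdang pages are Chinese-encoded; this value is an assumption
            pages.append(res.text)
        return pages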

We then use BeautifulSoup to parse each returned page and extract the data we need:
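
A sketch of the parsing step; the selectors and attribute names below are assumptions about Dangdang’s page structure, so check them against the real HTML before relying on them:

    from bs4 import BeautifulSoup

    def parse_page(html):
        """Extract title, price and comment count from one results page."""
        soup = BeautifulSoup(html, 'html.parser')
        books = []
        for item in soup.select('ul.bigimg li'):   # one <li> per book (assumed selector)
            title = item.find('a', attrs={'name': 'itemlist-title'})
            price = item.find('span', class_='search_now_price')
            comments = item.find('a', attrs={'name': 'itemlist-review'})
            books.append({
                'title': title.get('title', '').strip() if title else '',
                'price': price.get_text(strip=True) if price else '',
                'comments': comments.get_text(strip=True) if comments else '',
            })
        return books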

How it runs:

Run the “ddspider.py” file in a CMD window.

The effect is as follows:

All the source code for this section is available in the ddspider.py file in the related files section of the main page introduction.

Data analysis

OK, now let’s do some simple visualizations of the Python book data we just crawled.

Let’s first look at the distribution of book prices:
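
As an illustration, a bar chart of the price distribution could be drawn with pyecharts roughly like this; the price buckets and counts are made-up placeholders, and the code assumes pyecharts v1 or later:

    from pyecharts.charts import Bar

    # bucketed prices -> number of books in each bucket (placeholder data)
    buckets = ['0-50', '50-100', '100-200', '200+']
    counts = [120, 80, 30, 10]

    bar = Bar()
    bar.add_xaxis(buckets)
    bar.add_yaxis('number of books', counts)
    bar.render('price_distribution.html')   # writes an interactive chart to an HTML file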

Does anyone want to know the price of the most expensive Python book? The answer: 28,390 RMB.

The title is:

Python in Computers Programming

Take a look at the distribution of books’ ratings:
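
The rating distribution can be counted with collections.Counter and drawn, for example, as a pie chart; this is a sketch that assumes each crawled record carries a 'rating' field, which is not guaranteed by the parsing sketch above:

    from collections import Counter
    from pyecharts.charts import Pie

    ratings = [book['rating'] for book in books]   # 'books' is the crawled data (assumed field)
    rating_counts = Counter(ratings)               # e.g. {'100%': 35, '0%': 200, ...}

    pie = Pie()
    pie.add('ratings', list(rating_counts.items()))
    pie.render('rating_distribution.html')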

It seems that most Python books go unbought.

How about the number of comments?

So what are the top six books with the most reviews?
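
Finding them is just a sort on the comment count, something like the following, assuming each record stores the count as an integer under 'comments':

    # sort the crawled records by comment count, descending, and take the first six
    top_six = sorted(books, key=lambda b: b['comments'], reverse=True)[:6]
    for book in top_six:
        print(book['title'], book['comments'])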

As usual, let’s finish by drawing a couple of word clouds. What do the introductions of all these Python-related books talk about?
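
The word clouds come from cutting the book introductions with jieba and feeding the result to wordcloud. A minimal sketch; the 'introduction' field and the font path are assumptions:

    import jieba
    from wordcloud import WordCloud

    # join all book introductions into one string, then cut it into words with jieba
    text = ' '.join(jieba.cut(' '.join(book['introduction'] for book in books)))

    wc = WordCloud(
        font_path='simhei.ttf',      # a Chinese font is needed to render Chinese words (assumed path)
        background_color='white',
        width=800,
        height=600,
    )
    wc.generate(text)
    wc.to_file('introductions_wordcloud.png')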

All the source code for this section is available in the profile.
