Preface

This post uses Python to crawl A-share company data and run a simple analysis on it. Let's get started.

Development tools

Python version: 3.6.4
Related modules:

requests module;

bs4 module;

lxml module;

pyecharts module;

wordcloud module;

jieba module;

and some modules that ship with Python.

Environment setup

Install Python and add it to your PATH environment variable, then use pip to install the required modules.

Data crawl

Target Website:

http://www.askci.com/reports/

The data to be crawled is as follows:

No need to overthink this: BeautifulSoup can extract the data directly. The source code is as follows:

For the complete source code, see the spider.py file, available from the author's profile page or by private message.
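As a rough illustration of the parsing step (not the contents of spider.py), here is a minimal sketch. It assumes the data sits in an ordinary HTML table with a header row. The sample markup, the column names and the `parse_table` helper are all hypothetical, and the real page layout on askci.com may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Name</th><th>Province</th></tr>
  <tr><td>000001</td><td>Company A</td><td>Guangdong</td></tr>
  <tr><td>000002</td><td>Company B</td><td>Beijing</td></tr>
</table>
"""


def parse_table(html):
    """Return each data row of the first <table> as a dict keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Treat the first row as the header row.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Skip rows whose cell count does not match the header.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


records = parse_table(SAMPLE_HTML)
print(records)
```

In a real crawl the HTML string would come from an HTTP response, for example one fetched with the requests module, rather than from a constant.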

A screenshot of the crawler running:

All done~

Data analysis

In the crawling step we collected data on 3,573 A-share companies in total. Let's run a quick visual analysis.

First, let's look at the regional distribution of A-share companies:

The provincial-level regions with more than 300 A-share companies are:

  • Guangdong

  • Beijing

  • Zhejiang

  • Jiangsu
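A list like the one above can be produced by counting companies per region and keeping those above a threshold. The following is only a sketch: the `province` field name, the `provinces_above` helper and the sample records are hypothetical, and a tiny threshold stands in for the 300 used in the article.

```python
from collections import Counter


def provinces_above(records, threshold):
    """Return regions with more than `threshold` companies, most companies first."""
    counts = Counter(record["province"] for record in records)
    return [name for name, count in counts.most_common() if count > threshold]


# Made-up sample records; the real data set has 3,573 companies.
sample_records = [
    {"name": "Company A", "province": "Guangdong"},
    {"name": "Company B", "province": "Guangdong"},
    {"name": "Company C", "province": "Guangdong"},
    {"name": "Company D", "province": "Beijing"},
    {"name": "Company E", "province": "Beijing"},
    {"name": "Company F", "province": "Hainan"},
]

large_regions = provinces_above(sample_records, threshold=1)
print(large_regions)
```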

Next, let's look at the revenue of A-share companies:

The top 10 companies by main business revenue are:
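One way to get such a ranking is `heapq.nlargest` from the standard library, which avoids sorting the whole list. This is a sketch only: the `revenue` field, the `top_n_by_revenue` helper and the figures are invented for illustration and are not taken from the crawled data.

```python
import heapq


def top_n_by_revenue(records, n=10):
    """Return the n records with the highest revenue, highest first."""
    return heapq.nlargest(n, records, key=lambda record: record["revenue"])


# Invented figures, for illustration only.
sample_companies = [
    {"name": "Company A", "revenue": 120.0},
    {"name": "Company B", "revenue": 950.5},
    {"name": "Company C", "revenue": 30.2},
    {"name": "Company D", "revenue": 410.0},
]

top_two = top_n_by_revenue(sample_companies, n=2)
print([company["name"] for company in top_two])
```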

Now the number of employees at A-share companies:

Next, the distribution of listing years for A-share companies:

2013 saw the fewest listings (2), and 2017 saw the most (438).
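The yearly counts behind a chart like this can be derived by grouping listing dates by year and then taking the minimum and maximum. The sketch below assumes dates are strings beginning with a four-digit year. That format, the `listings_per_year` helper and the sample dates are assumptions, not details from the original data.

```python
from collections import Counter


def listings_per_year(listing_dates):
    """Count listings per year from date strings assumed to start with YYYY."""
    return Counter(int(date[:4]) for date in listing_dates if date[:4].isdigit())


# Made-up dates, for illustration only.
sample_dates = [
    "2017-03-01", "2017-11-20", "2017-01-05",
    "2015-06-15", "2015-07-01",
    "2013-12-30",
]

per_year = listings_per_year(sample_dates)
busiest_year, busiest_count = max(per_year.items(), key=lambda item: item[1])
quietest_year, quietest_count = min(per_year.items(), key=lambda item: item[1])
print(busiest_year, busiest_count, quietest_year, quietest_count)
```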

OK, let's look at the top 10 industries among A-share companies:

Emmmm, very real.

Finally, we draw the main businesses of the A-share companies as a word cloud.
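A word cloud is driven by word frequencies, so the core of this step is a frequency count over the tokenized business descriptions. The sketch below shows only that counting step, using the standard library. The `word_frequencies` helper and the English sample text are hypothetical, and a whitespace tokenizer stands in for a Chinese segmenter such as `jieba.lcut`.

```python
from collections import Counter


def word_frequencies(texts, tokenize, stopwords=frozenset()):
    """Count tokens across all texts, skipping stopwords and single characters."""
    counts = Counter()
    for text in texts:
        for token in tokenize(text):
            token = token.strip()
            if len(token) > 1 and token not in stopwords:
                counts[token] += 1
    return counts


# Hypothetical sample descriptions; the real data is Chinese text.
sample_texts = ["real estate development", "real estate leasing"]

# str.split is a stand-in tokenizer for this English sample.
freqs = word_frequencies(sample_texts, str.split, stopwords={"leasing"})
print(freqs.most_common(3))
```

A frequency mapping like `freqs` is the kind of input that the wordcloud module's `WordCloud.generate_from_frequencies` accepts. Rendering Chinese text would also require a font that covers Chinese characters.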

I share a Python crawler case study every day. The next post will be a simple Python analysis of college entrance examination data.

All done! All of the source code used in this section is in the analysis.py file, available from the author's profile page or by private message.