Preface

Using Python to crawl and run a simple analysis of A-share company data. Let's get off to a happy start!

Development tools

Python version: 3.6.4
Related modules:

requests module;

bs4 module;

lxml module;

pyecharts module;

wordcloud module;

jieba module;

And some of the modules that come with Python.

Environment setup

Install Python, add it to your PATH environment variable, and then use pip to install the required modules.
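As a sketch, assuming pip is already on your PATH, all the third-party modules above can be installed in one line (note that the bs4 module is published on PyPI as `beautifulsoup4`):

```shell
pip install requests beautifulsoup4 lxml pyecharts wordcloud jieba
```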

Data crawling

Target website:

http://www.askci.com/reports/

The data to be crawled is shown in the figure below:

Without overthinking it, we just use BeautifulSoup to extract the data straight from the page. The source code is as follows:

For the complete source code, see the spider.py file, available on my profile page or by private message.
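The extraction step can be sketched as follows. The actual markup on askci.com may differ, so this example parses a small inline HTML sample that mimics an A-share listing table; the column names and `parse_companies` helper are illustrative, not taken from the original spider.py:

```python
from bs4 import BeautifulSoup

# Inline sample standing in for one page of the crawled listing table.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Province</th><th>Revenue</th></tr>
  <tr><td>Company A</td><td>Guangdong</td><td>100</td></tr>
  <tr><td>Company B</td><td>Beijing</td><td>200</td></tr>
</table>
"""

def parse_companies(html):
    """Extract one dict per table row, skipping the header row."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = soup.find('table').find_all('tr')[1:]
    companies = []
    for row in rows:
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        companies.append({'name': cells[0],
                          'province': cells[1],
                          'revenue': cells[2]})
    return companies

if __name__ == '__main__':
    for company in parse_companies(SAMPLE_HTML):
        print(company)
```

In the real crawler the HTML would come from `requests.get(...)` on each results page rather than an inline string.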

Screenshots of the running effect are as follows:

All done~

Data analysis

In the data-crawling part, we obtained a total of 3,573 A-share company records. Now let's run a quick visual analysis on them.

First, let's take a look at the regional distribution of A-share companies:

The provinces with more than 300 A-share companies are:

  • Guangdong
  • Beijing
  • Zhejiang
  • Jiangsu

Here's a look at the earnings of A-share companies:

Among them, the top 10 companies by main business revenue are:

Next, let's look at the number of employees in A-share companies:

Now take a look at the listing time distribution of A-share companies:

Among them, 2013 saw the fewest companies go public (just 2), while 2017 saw the most (438).
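The listing-year distribution can be derived from the listing dates with another `Counter`, grouping by the year prefix of each date string. The dates below are placeholders, assuming the crawled data stores dates in `YYYY-MM-DD` form:

```python
from collections import Counter

# Illustrative listing dates; the real values come from the crawled data.
listing_dates = ['2017-03-01', '2013-05-20', '2017-11-11', '2017-06-30']

# Count companies per listing year
year_counts = Counter(date[:4] for date in listing_dates)
print(year_counts.most_common())
```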

OK, now let's take a look at the top 10 industry types among A-share companies:

Emmmm, very real.

Finally, let's draw the A-share companies' main business descriptions as a word cloud.
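The word cloud starts from word frequencies. In the article's workflow the Chinese business descriptions would be segmented with jieba first; the sketch below substitutes English placeholder strings and a simple regex tokenizer just to show the frequency-counting step:

```python
import re
from collections import Counter

# Illustrative "main business" descriptions; the real ones are Chinese
# strings from the crawl, segmented with jieba instead of a regex.
businesses = [
    'manufacture and sale of electronic components',
    'manufacture of automotive parts',
    'wholesale of pharmaceutical products',
]

# Count how often each word appears across all descriptions
word_freq = Counter(
    word for text in businesses
    for word in re.findall(r'[a-z]+', text.lower())
)
print(word_freq.most_common(2))
```

A frequency mapping like `word_freq` can then be passed to `wordcloud.WordCloud().generate_from_frequencies(word_freq)` to render the cloud image.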

If you enjoyed this article, please click "like" to support it, and follow me for daily Python data-crawler cases. The next article will share a simple Python analysis of college entrance exam data.

All done! All of the source code covered in this section is in the analysis.py file, available on my profile page or by private message.