Preface

Using Python to crawl and briefly analyze recruitment data from Lagou, a Chinese tech job site. Let's take a look.

Development tools

Python version: 3.6.4

Related modules:

Requests module;

Pyecharts module;

And some of the modules that come with Python.

Environment setup

Install Python, add it to the PATH environment variable, and use pip to install the required modules.

Data crawling

We crawl job postings from several big cities:

The data can be retrieved with a POST request.

The required parameters are:

pn, which stands for the page number, and kd, the search keyword.

I wrote the simplest possible version of the crawler, with no proxy pool or cookie pool, so I set a long delay between requests to avoid getting blocked, since Lagou's anti-crawling measures seem fairly strong:

I retrieved job listings for Shanghai, Beijing, Guangzhou, Nanjing, Shenzhen, Hangzhou, Chengdu, Wuhan, and Tianjin, with the search keyword set to Python.
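Since the original code appears only as a screenshot, here is a minimal sketch of what such a crawler might look like. The endpoint URL, headers, and JSON layout are assumptions based on Lagou's public search page and may well have changed; only the pn and kd parameters come from the text above.

```python
import random
import time

import requests

# Assumed endpoint and headers; Lagou's real API and anti-crawling
# requirements change over time, so treat these as placeholders.
URL = "https://www.lagou.com/jobs/positionAjax.json"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://www.lagou.com/jobs/list_Python",
}
CITIES = ["上海", "北京", "广州", "南京", "深圳", "杭州", "成都", "武汉", "天津"]


def build_payload(page, keyword="Python"):
    # pn is the page number, kd the search keyword.
    return {"first": "true" if page == 1 else "false", "pn": page, "kd": keyword}


def crawl(pages_per_city=5):
    postings = []
    for city in CITIES:
        for page in range(1, pages_per_city + 1):
            resp = requests.post(
                URL,
                params={"city": city, "needAddtionalResult": "false"},
                data=build_payload(page),
                headers=HEADERS,
            )
            resp.raise_for_status()
            # Assumed JSON layout; adjust to whatever the response contains.
            postings.extend(resp.json()["content"]["positionResult"]["result"])
            # No proxy or cookie pool, so sleep generously between requests.
            time.sleep(random.uniform(10, 20))
    return postings
```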

A screenshot of this part of the code in use:

Data analysis

First, let’s take a look at the typical salary for a Python-related position:

On average, the lower end of the advertised salary range is 9.4K and the upper end is 17.6K.
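For reference, here is a small sketch of how such averages can be computed from Lagou-style salary strings; the sample values are made up for illustration, not the real scraped data.

```python
import re
import statistics


def bounds(text):
    """Parse a salary string like '10k-20k' into (low, high) in thousands."""
    nums = [int(n) for n in re.findall(r"(\d+)[kK]", text)]
    return nums[0], nums[-1]


# Illustrative sample, not the real scraped data.
salaries = ["8k-12k", "10k-18k", "15k-25k"]
lows, highs = zip(*(bounds(s) for s in salaries))
print(round(statistics.mean(lows), 1), round(statistics.mean(highs), 1))
```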

Let's take a look at the educational requirements of the positions (here and below, the statistics cover all of the cities I crawled):

So ordinary bachelor's degree holders have plenty of opportunities ~

Then let’s look at the nature of the job:

OK, let’s look at the typical size of a company looking for a Python-related position:

So, not so bad

Now, which industries employ Python? The statistical results are shown in the figure below:

Python seems to be in demand in every domain.

The amount of data crawled isn't large, so the analysis ends here ~

To help those who are learning Python, a rich set of learning materials is provided here.

For the complete source code and results, see my profile to obtain the relevant files.