Let’s take a look at using Python to crawl and run a simple analysis on job-posting data from Lagou (拉勾网).

Development tools

Python version: 3.6.4

Related modules:

requests module;

pyecharts module;

And some modules that come with Python.

Environment setup

Install Python, add it to your environment variables, and use pip to install the required modules.
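For reference, the third-party modules listed above can be installed in one go (package names assumed from the module list):

```shell
pip install requests pyecharts
```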

Data crawling

First, crawl job listings for several major cities.

Inspecting the site’s network traffic, we find that the listing data can be retrieved with a POST request:

The required parameters are:

pn stands for the page number and kd for the search keyword.
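As a sketch, the form data for one such POST request might look like this (pn and kd are from the text above; the "first" flag is an assumption based on Lagou’s typical request format):

```python
# Form data for one page of search results.
# pn = page number, kd = search keyword (per the text above);
# the "first" flag is an assumption about Lagou's request format.
payload = {
    "first": "true",  # whether this is the first results page
    "pn": 1,          # page number
    "kd": "Python",   # search keyword
}
print(payload)
```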

We write the simplest possible crawler. There is no proxy pool or cookie pool, so we simply set a fairly long delay between requests to avoid being blocked, since Lagou’s anti-crawler mechanism seems fairly strict:

It crawls job listings for Shanghai, Beijing, Guangzhou, Nanjing, Shenzhen, Hangzhou, Chengdu, Wuhan and Tianjin, with the search keyword set to Python.
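A minimal sketch of such a crawler might look like the following. The endpoint URL, parameters, and headers are assumptions based on Lagou’s public job-search page; the real site may also require valid cookies, which is why the delay is kept long:

```python
import time

# The nine cities crawled in this post.
CITIES = ["上海", "北京", "广州", "南京", "深圳", "杭州", "成都", "武汉", "天津"]

def parse_positions(page_json):
    """Pull the list of job postings out of one page of API JSON."""
    return page_json.get("content", {}).get("positionResult", {}).get("result", [])

def crawl(keyword="Python", pages=5, delay=10):
    # requests is imported here so the offline helper above stays dependency-free.
    import requests

    # Endpoint and headers are assumptions about Lagou's job-search API;
    # the live site may additionally require cookies.
    url = "https://www.lagou.com/jobs/positionAjax.json"
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Referer": "https://www.lagou.com/jobs/list_Python",
    }
    results = []
    for city in CITIES:
        for pn in range(1, pages + 1):
            data = {"first": "true", "pn": pn, "kd": keyword}
            resp = requests.post(url, params={"city": city}, data=data, headers=headers)
            results.extend(parse_positions(resp.json()))
            time.sleep(delay)  # long delay: no proxy/cookie pool, so go slow
    return results

# Example (performs live network requests, so it is left commented out):
# jobs = crawl(pages=2)
```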

A screenshot of this part of the code in use:

Data analysis

First, let’s take a look at the typical salary for a Python-related job:

The minimum salaries average around 9.4K and the maximums around 17.6K; oddly, Beijing and Tianjin come out quite close to each other.
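Lagou reports salary as a range string such as "10k-20k", so the minimum/maximum means above come from splitting each string into numeric bounds first. A small helper (the function name is my own) might look like this:

```python
import re

def parse_salary(text):
    """Parse a salary string like '10k-20k' into (low, high) in K RMB."""
    nums = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not nums:
        return None  # e.g. "面议" (negotiable) has no numbers
    low = nums[0]
    high = nums[1] if len(nums) > 1 else nums[0]
    return low, high

print(parse_salary("10k-20k"))   # -> (10.0, 20.0)
print(parse_salary("15k以上"))   # -> (15.0, 15.0)
```

Averaging the low values per city gives the minimum means, and the high values give the maximum means.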

Take a look at the degree requirements for these jobs (aggregated across the recruitment data from all the cities crawled; the same applies below):

So regular bachelor’s degree holders have fairly good prospects ~
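The degree breakdown can be tallied with a `collections.Counter` over one field of each posting. The sample data and the field name `education` are assumptions about the API’s JSON layout:

```python
from collections import Counter

# Sample postings; in practice these come from the crawled JSON, where the
# degree field is assumed to be named "education".
postings = [
    {"education": "本科"},
    {"education": "本科"},
    {"education": "大专"},
    {"education": "硕士"},
]

degree_counts = Counter(p["education"] for p in postings)
for degree, count in degree_counts.most_common():
    print(degree, f"{count / len(postings):.0%}")
```

The resulting counts can then be fed straight into a pyecharts chart for plotting.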

Then let’s look at the nature of the job:

OK, let’s take a look at the typical size of the companies hiring for Python-related roles:

So, that looks reasonable ~

Next, which industries are hiring Python? The statistical results are as follows:

Every domain seems to need Python
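The industry statistics need one extra step, because a single posting can list several industries at once. Here is a sketch, assuming the field is named "industryField" and uses comma-separated values (both assumptions about Lagou’s JSON):

```python
from collections import Counter

# Each posting's industry field (assumed name "industryField") may list
# several industries separated by commas, e.g. "移动互联网,金融".
postings = [
    {"industryField": "移动互联网,金融"},
    {"industryField": "移动互联网"},
    {"industryField": "电商"},
]

industry_counts = Counter(
    industry.strip()
    for p in postings
    for industry in p["industryField"].replace("，", ",").split(",")
)
print(industry_counts.most_common())
```

Note the full-width comma "，" is normalized first, since Chinese text often mixes the two.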

We didn’t crawl much data, so the analysis ends here ~


See the author’s profile for the complete source code and result files.