Foreword:

With the Internet big data industry booming, more and more people are working in it, and many readers are keen to join. Starting with this issue, we will introduce the industry in four installments, each covering a different big-data role: data mining & machine learning, data analysis, algorithms & deep learning, and data product manager.

* * * *

Data Source:

Our data for the next four installments comes mainly from Lagou. Of the popular recruitment websites, Liepin, Boss Zhipin and Lagou all carry many Internet job postings. We chose Lagou for the following reasons: 1. Salaries are mostly stated directly and are rarely listed as negotiable. 2. Its coverage of employers is fairly complete, basically covering Internet-related companies. 3. Its URLs are fairly regular, which makes batch crawling convenient. The data display page is as follows:

This section uses Selenium in Python to crawl, with the following code:

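    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By

    # Assumes `driver` is a Selenium WebDriver already showing a results page.
    job_desc, job_code = [], []
    while True:
        try:
            # Read the 15 listing slots on the current page.
            for j in range(15):
                xpath = '//*[@id="s_position_list"]/ul/li[' + str(j + 1) + ']'
                a = driver.find_element(By.XPATH, xpath)
                job_desc.append(a.text)
                job_code.append(a.find_element(By.CLASS_NAME, 'position_link')
                                .get_attribute('data-lg-tj-cid'))
            # Scroll down so the "next page" button can be clicked.
            driver.execute_script("document.documentElement.scrollTop = 10000")
            driver.find_element(By.CLASS_NAME, 'pager_next').click()
        except WebDriverException:
            # A missing element or failed click ends the crawl.
            break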
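
Before any averaging, the scraped listing text has to be turned into numbers. The following is only a minimal sketch of that step, not the code we actually ran: it assumes salaries appear in the text as ranges such as `15k-25k` and takes the midpoint as the monthly salary in K. The function name and the sample text are hypothetical.

```python
import re

def parse_monthly_salary(text):
    """Return the midpoint of a salary range such as '15k-25k', in K, or None."""
    match = re.search(r'(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]', text)
    if match is None:
        return None  # no recognisable range, e.g. a "negotiable" listing
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2

# Hypothetical listing text, not real scraped data.
sample = "Data Mining Engineer\n15k-25k\n3-5 years / Bachelor"
print(parse_monthly_salary(sample))
```

Taking the midpoint is a simplification; a posting's range says nothing about where actual offers fall inside it.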

* * * *

Salary:

We’ll look at the number of openings and the average monthly salary in each city from several angles, starting with the following chart (bubble size indicates the number of jobs; bar height indicates the average monthly salary):

It can be seen that Wuhan, the eighth city in the list, already has only one fortieth as many jobs as Beijing, and the lowest-ranked city has fewer than 20. To some extent, this reflects how concentrated data mining & machine learning positions are in Beijing, Shanghai, Guangzhou, Shenzhen and Hangzhou. Beyond these five major cities, Chengdu, Nanjing and Wuhan also show strong potential for the future.
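
The two quantities in the chart, job count and average monthly salary per city, come from a simple group-by. Here is a rough sketch in plain Python; the records are made-up illustrative values, not our Lagou data, and `summarize_by_city` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (city, monthly salary in K) records, for illustration only.
records = [
    ("Beijing", 25.0), ("Beijing", 30.0), ("Beijing", 20.0),
    ("Shanghai", 22.0), ("Shanghai", 26.0),
    ("Wuhan", 15.0),
]

def summarize_by_city(rows):
    """Return {city: (job_count, average_salary)}."""
    salaries = defaultdict(list)
    for city, salary in rows:
        salaries[city].append(salary)
    return {city: (len(vals), sum(vals) / len(vals))
            for city, vals in salaries.items()}

summary = summarize_by_city(records)
# List cities from most to fewest postings.
for city, (count, avg) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{city}: {count} jobs, average {avg:.1f}K")
```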

Let’s take a look at the number of jobs and salaries for different levels of work experience:

You can see that most of the job openings on Lagou are aimed at experienced candidates. Three and five years of work experience are the two most important thresholds, with a clear step up in salary at each, which shows that employers value experience.

Let’s look at employers’ requirements for academic qualifications:

It should be noted that the education requirement listed on Lagou is the minimum requirement; the average education level of people actually in these jobs will be much higher than the picture shows.

Let’s take a look at how much experience increases pay in different cities by combining cities with experience:

Beijing leads the country in salary across all categories of work experience, demonstrating the capital’s status as an Internet hub. For 5-10 years of work experience, Guangzhou’s rate of increase lags behind that of the other big cities. Readers who work in Guangzhou are welcome to tell us whether this matches reality.
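
Combining city with experience amounts to a two-way table of mean salaries, from which a rate of increase between experience bands can be read off. Below is a small sketch under stated assumptions: the rows and the band labels `3-5 years` and `5-10 years` are hypothetical, and the numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (city, experience band, monthly salary in K) rows.
rows = [
    ("Beijing", "3-5 years", 24.0), ("Beijing", "3-5 years", 26.0),
    ("Beijing", "5-10 years", 35.0),
    ("Guangzhou", "3-5 years", 18.0), ("Guangzhou", "3-5 years", 22.0),
    ("Guangzhou", "5-10 years", 24.0),
]

def mean_by_city_and_experience(data):
    """Return {city: {experience_band: mean_salary}}."""
    cells = defaultdict(lambda: defaultdict(list))
    for city, band, salary in data:
        cells[city][band].append(salary)
    return {city: {band: sum(v) / len(v) for band, v in bands.items()}
            for city, bands in cells.items()}

table = mean_by_city_and_experience(rows)

def growth(city, lower="3-5 years", upper="5-10 years"):
    """Relative salary increase from the lower band to the upper band."""
    return table[city][upper] / table[city][lower] - 1

for city in table:
    print(f"{city}: {growth(city):+.0%}")
```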

Average monthly salary offered by major companies:

BAT and TMD are among the 15 companies posting the most jobs on the site, along with Sogou, Weibo and NetEase. Surprisingly, the highest-paying postings come from Sina Weibo. A company’s actual average salary is far more complicated than its postings suggest, so the figures are for reference only.

We plotted the charts above with ggplot2, using code like the following (taking the company salary chart as an example):

    library(ggplot2)
    library(ggthemes)  # provides theme_wsj() and the scale_*_wsj() palettes

    # company_com: one row per company, with columns `company` and `salary`
    ggplot(company_com, aes(x = reorder(company, -salary), y = salary,
                            fill = as.character(rep(1:5, each = 3)))) +
      geom_bar(stat = 'identity') +
      geom_text(aes(label = round(salary, 2), y = salary + 1), size = 5) +
      theme_wsj() +
      scale_fill_wsj() +
      scale_color_wsj() +
      ggtitle('Average monthly salary (K)') +
      theme(axis.text.x = element_text(size = 12),
            axis.text.y = element_blank(),
            plot.title = element_text(hjust = 0.5, size = 25),
            legend.position = 'none',
            panel.grid = element_blank(),
            axis.title = element_blank(),
            axis.text = element_text(face = 'bold', hjust = 0.8, size = 10, angle = 15)
      )

* * * *

Expected monthly salary:

We use a linear regression model to make a simple estimate of expected salary (the data is monthly salary, in units of K). We include only three factors, namely experience, city and education, and ignore interaction terms and higher-order terms. The results are for reference only; the real situation is much more complicated.
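
To make the model concrete, here is a minimal sketch of an additive regression on three categorical factors, fitted by ordinary least squares with dummy variables and no interaction terms. It is not our original code: the factor levels, the synthetic noise-free salaries, and the helper names (`encode`, `solve`, `fit`, `expected_salary`) are all assumptions made for illustration.

```python
from itertools import product

# Baseline (first) and alternative level per factor; hypothetical labels.
FACTORS = {
    "city": ["Beijing", "Wuhan"],
    "experience": ["1-3 years", "3-5 years"],
    "education": ["Bachelor", "Master"],
}

def encode(row):
    """Intercept plus one 0/1 dummy per non-baseline level (no interactions)."""
    features = [1.0]
    for name, levels in FACTORS.items():
        features += [1.0 if row[name] == level else 0.0 for level in levels[1:]]
    return features

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(rows, salaries):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    X = [encode(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, salaries)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic, noise-free data: 15K base, -5K Wuhan, +8K for 3-5 years, +3K Master.
rows = [dict(zip(FACTORS, combo)) for combo in product(*FACTORS.values())]
effects = {"Wuhan": -5.0, "3-5 years": 8.0, "Master": 3.0}
salaries = [15.0 + sum(effects.get(v, 0.0) for v in r.values()) for r in rows]

beta = fit(rows, salaries)

def expected_salary(row):
    """Predicted monthly salary (K) for one combination of factor levels."""
    return sum(b * f for b, f in zip(beta, encode(row)))

print(round(expected_salary(
    {"city": "Wuhan", "experience": "3-5 years", "education": "Master"}), 2))
```

Because each factor only shifts the prediction by a fixed amount, the model cannot express, say, experience paying off more in one city than another; that is exactly what leaving out interaction terms means.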


Required skills & benefits:

In addition to the hard requirements above, a candidate’s practical skills matter even more for landing a good salary. Let’s take a look at the skills required for data mining & machine learning positions.
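
One simple way to tally required skills is to count how many job descriptions mention each keyword. This is only a sketch of that idea, not necessarily how our figure was produced; the keyword list and the descriptions are hypothetical.

```python
import re
from collections import Counter

# Hypothetical skill keywords and job descriptions, for illustration only.
SKILLS = ["python", "sql", "spark", "hadoop", "tensorflow"]
descriptions = [
    "Proficient in Python and SQL; Spark experience preferred.",
    "Familiar with Hadoop, Spark and Python.",
    "Experience with TensorFlow; solid SQL skills.",
]

def count_skills(texts, skills=SKILLS):
    """Count how many descriptions mention each skill (at most once per text)."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z+#]+", text.lower()))
        counts.update(skill for skill in skills if skill in words)
    return counts

skill_counts = count_skills(descriptions)
for skill, n in skill_counts.most_common():
    print(skill, n)
```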

What benefits can we expect after successfully joining a company? See the following figure:


