Preface

The text and images in this article come from the internet and are for learning and communication only, not for any commercial purpose. If you have any questions, please contact us.

Development tools

  • Python 3.6.5

  • Pycharm

```python
import requests
import parsel
import csv
import time
```

The related modules can be installed with pip, e.g. `pip install requests parsel`

1. Web data analysis

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/d896f38e77264c54910029fe3947823d)
![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/123228a2a23944468bde0ecf0c5c1bfc)

As shown in the figures, this is all the data we want to scrape today.

Open the developer tools

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/8fcca11b679a46059678457eaac49e1e)

We can inspect the data the web page returns: copy a piece of the data you need and search for it in the Response tab, to check whether it appears in the response of the request.

```python
import requests  # pip install requests

url = 'https://www.zhipin.com/c100010000/?query=python&page=1&ka=page-1'
# A browser User-Agent header, so the site does not reject the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}
response = requests.get(url=url, headers=headers)
print(response.text)
```

2. Analyze the web page's data structure

Use the element-picker arrow in the developer tools to click on a piece of data on the page; the Elements panel will automatically jump to and highlight where that data lives in the HTML.

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/e3367a4a474249ca88958a481b0861ec)

As shown in the figure above, each company's job posting is contained in an `li` tag. We only need to parse out the required data.

```python
import parsel  # pip install parsel

response.encoding = response.apparent_encoding
selector = parsel.Selector(response.text)
# CSS selector for all job listings; the original selector was partly garbled,
# so this may need adjusting if the page layout differs
lis = selector.css('#main .job-list ul li')
for li in lis:
    dit = {}  # dictionary to hold one job's data
    title = li.css('.job-name a::attr(title)').get()              # job title
    dit['title'] = title
    area = li.css('.job-area::text').get()                        # region
    dit['region'] = area
    xz_info = li.css('.red::text').get()                          # salary
    dit['salary'] = xz_info
    xl_list = li.css('.job-limit p::text').getall()               # education & experience
    xl_str = '|'.join(xl_list)
    dit['education/experience'] = xl_str
    js_list = li.css('.tags span::text').getall()                 # required skills
    js_str = '|'.join(js_list)
    dit['skills'] = js_str
    company = li.css('.company-text .name a::attr(title)').get()  # company name
    dit['company'] = company
    gz_info = li.css('.company-text p::text').getall()            # job/company type
    gz_str = '|'.join(gz_info)
    dit['job type'] = gz_str
    fl_info = li.css('.info-desc::text').get()                    # benefits
    dit['benefits'] = fl_info
```

3. Save the data

```python
import csv

# Open the output file before the parsing loop
# (the original filename was garbled; 'boss.csv' is an assumed name)
f = open('boss.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['title', 'region', 'salary',
                                           'education/experience', 'skills',
                                           'company', 'job type', 'benefits'])
csv_writer.writeheader()

# then, inside the parsing loop, after dit is filled:
# csv_writer.writerow(dit)  # write one row per job
```
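A self-contained sketch of the same `csv.DictWriter` pattern, writing to an in-memory buffer instead of a file so it can be run anywhere:

```python
import csv
import io

# In-memory buffer stands in for the CSV file
buffer = io.StringIO()
fieldnames = ['title', 'region', 'salary']
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()  # writes the column names as the first row
writer.writerow({'title': 'Python Developer', 'region': 'Beijing', 'salary': '15-25K'})
print(buffer.getvalue())
```

`DictWriter` maps dictionary keys onto the `fieldnames` columns, which is why the keys used in the parsing loop must match the `fieldnames` list exactly; a key missing from `fieldnames` raises `ValueError`.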

4. Multi-page crawling

```python
'''
https://www.zhipin.com/c100010000/?query=python&page=1&ka=page-1
https://www.zhipin.com/c100010000/?query=python&page=2&ka=page-2
https://www.zhipin.com/c100010000/?query=python&page=3&ka=page-3
'''
```

Each page's URL differs only in the page parameters.

```python
for page in range(1, 10):
    # both {} placeholders must be filled, hence format(page, page)
    url = 'https://www.zhipin.com/c100010000/?query=python&page={}&ka=page-{}'.format(page, page)
```
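The loop above can be sketched as a small, runnable snippet that just builds the paginated URLs (three pages here, to match the list shown earlier):

```python
# Build the paginated URLs with str.format; there are two {} placeholders
# (page= and ka=page-), so format() needs the page number twice
base = 'https://www.zhipin.com/c100010000/?query=python&page={}&ka=page-{}'
urls = [base.format(page, page) for page in range(1, 4)]
for u in urls:
    print(u)
```

An f-string (`f'...page={page}&ka=page-{page}'`) would work equally well here and avoids the duplicated argument.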

This achieves multi-page crawling.

Final result

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/a6811d69e30b4a85b0a82b9390373c51)
![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/5a0cb8a8cec14780b7e2bd6403655962)

That's all for this article.

If you found it helpful, feel free to give it a like.

If anything is unclear, you can message me privately or leave a comment.