Most people who write crawlers have tried scraping recruitment websites at some point, and I have previously shared posts on scraping Lagou and Zhipin. Those posts were written at a beginner level, though, and fell a little short in several respects. So today we take a more mature approach and scrape Liepin with Scrapy. Liepin is a well-established site, so its anti-crawling measures are likely to be stricter, but we can draw on the earlier experience to get past the obstacles.

Today we will share how to fetch Liepin data using a different proxy mode: dynamic forwarding. In this mode each HTTP request is forwarded automatically by default, without having to bind an IP whitelist. It is convenient and well suited to getting a project started quickly, and for some beginners it is easier than the traditional mode of fetching proxy IPs through an API. If you are interested, have a look. Here is a brief example of the proxy access module:

#! -*- encoding:utf-8 -*-
import base64
import random
import sys

PY3 = sys.version_info[0] >= 3


def base64ify(bytes_or_str):
    # Base64-encode a str or bytes value and return the result as text
    if PY3 and isinstance(bytes_or_str, str):
        input_bytes = bytes_or_str.encode('utf8')
    else:
        input_bytes = bytes_or_str

    output_bytes = base64.urlsafe_b64encode(input_bytes)
    if PY3:
        return output_bytes.decode('ascii')
    else:
        return output_bytes


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # proxy server
        proxyHost = "t.16yun.cn"
        proxyPort = "31111"

        # proxy credentials
        proxyUser = "16GJUVVQ"
        proxyPass = ""  # password not shown here; fill in your own

        request.meta['proxy'] = "http://{0}:{1}".format(proxyHost, proxyPort)

        # add validation header
        encoded_user_pass = base64ify(proxyUser + ":" + proxyPass)
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

        # set the Proxy-Tunnel header to a random value
        tunnel = random.randint(1, 10000)
        request.headers['Proxy-Tunnel'] = str(tunnel)
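A downloader middleware only takes effect once it is enabled in the project's settings. The following is a minimal sketch, not part of the original example: it assumes ProxyMiddleware is saved in a hypothetical module myproject/middlewares.py, and the priority of 100 is an illustrative choice. Scrapy calls process_request in increasing priority order, and its built-in HttpProxyMiddleware sits at 750 by default, so any value below 750 runs this middleware first.

```python
# settings.py (sketch)
# "myproject.middlewares" is a hypothetical module path; point it at
# wherever ProxyMiddleware actually lives in your project.
DOWNLOADER_MIDDLEWARES = {
    # 100 is an assumed priority; lower numbers run earlier on requests,
    # so this runs before the built-in HttpProxyMiddleware (750).
    "myproject.middlewares.ProxyMiddleware": 100,
}
```

With this in place, every request the spider issues should pass through process_request and pick up the proxy address and headers before it is downloaded.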

Today's post mainly covers how to hook up the new proxy mode; the actual data-scraping process will be covered in detail next time. If you are also learning about crawler proxies, feel free to get in touch and exchange ideas.