In the era of big data, more and more web crawlers rely on proxy IPs. Many developers have heard of Yiniuyun's enhanced tunnel forwarding crawler proxy, but do not know much about what a tunnel forwarding proxy is, how to use it, or what benefits it brings.

Tunnel forwarding proxy: "Yiniuyun Crawler IP" establishes a dedicated network link through a fixed cloud proxy service address, and the proxy platform automatically switches the proxy IP with millisecond-level speed. This ensures network stability and speed while sparing crawler clients the effort of optimizing their own proxy IP rotation policies.

The process of sending requests with an API proxy IP is familiar to most: the web crawler accesses the Yiniuyun API URL from its own program and periodically obtains proxy IP information through that URL. The crawler then sends requests to target websites through these proxy IPs, obtains the data, and stores it, completing the crawl. This is the common way web crawler users consume proxy IPs, but the API is only one of several ways to use a proxy network. Tunnel forwarding is another, and compared with the API approach it is more convenient, easier to use, and faster.
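The API-based flow described above can be sketched roughly as follows. This is a minimal illustration, not Yiniuyun's actual API: the endpoint URL is a placeholder, and the sketch assumes the API returns one `ip:port` per line.

```python
import random
import requests

# Hypothetical API endpoint -- replace with the URL issued by the proxy provider
API_URL = "http://api.example.com/get_ip"


def parse_ip_list(text):
    """Parse an API response of one 'ip:port' per line into a list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_proxies(ip_port):
    """Build the proxies mapping that the requests library expects."""
    return {"http": "http://" + ip_port, "https": "http://" + ip_port}


def fetch_via_api_proxy(target_url):
    # Step 1: periodically pull fresh proxy IPs from the API
    ip_list = parse_ip_list(requests.get(API_URL, timeout=10).text)
    # Step 2: pick one proxy and send the request to the target site
    proxy = random.choice(ip_list)
    resp = requests.get(target_url, proxies=build_proxies(proxy), timeout=10)
    # Step 3: store the data if the request succeeded
    return resp.text if resp.status_code == 200 else None
```

Note that with this approach the crawler itself must refresh the IP list, handle expired proxies, and retry failures, which is exactly the maintenance burden the tunnel approach removes.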

Yiniuyun tunnel forwarding proxy: the web crawler only needs to modify its interface, extracting the four proxy parameters (domain name, port, user name, and password) and configuring them in code. At runtime, observe the HTTP status code returned: a 200 means the configuration is correct. The crawler's program simply issues requests for data, and the proxy server forwards them. You do not need to manage or maintain an IP address pool; you only need to send requests, obtain data, and track the success rate.
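Configuring the four tunnel parameters in code can look like the following sketch with the `requests` library. The host and port follow the sample used later in this article; the user name and password are placeholders for your own credentials.

```python
import requests

# The four parameters the tunnel proxy provides (placeholder values)
proxyHost = "t.16yun.cn"
proxyPort = "31111"
proxyUser = "username"
proxyPass = "password"


def build_tunnel_proxy(host, port, user, password):
    """Embed the credentials in the proxy URL. The same fixed address is
    used for every request -- the tunnel switches the exit IP server-side."""
    proxy_url = "http://{0}:{1}@{2}:{3}".format(user, password, host, port)
    return {"http": proxy_url, "https": proxy_url}


def fetch(target_url):
    proxies = build_tunnel_proxy(proxyHost, proxyPort, proxyUser, proxyPass)
    resp = requests.get(target_url, proxies=proxies, timeout=10)
    # A 200 status code indicates the tunnel is configured correctly
    return resp.status_code, resp.text
```

Because the proxy address never changes, there is no IP list to refresh: every request goes to the same endpoint, and the platform handles the rotation.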

Because it is easy to use and fast, the tunnel forwarding proxy has become popular among web crawler developers.

For example, a Scrapy downloader middleware that configures the tunnel proxy:

#! -*- encoding:utf-8 -*-
import base64
import sys
import random

PY3 = sys.version_info[0] >= 3

def base64ify(bytes_or_str):
    if PY3 and isinstance(bytes_or_str, str):
        input_bytes = bytes_or_str.encode('utf8')
    else:
        input_bytes = bytes_or_str
    output_bytes = base64.urlsafe_b64encode(input_bytes)
    if PY3:
        return output_bytes.decode('ascii')
    else:
        return output_bytes

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # The four tunnel proxy parameters (fill in your own credentials)
        proxyHost = "t.16yun.cn"
        proxyPort = "31111"
        proxyUser = "username"
        proxyPass = "password"
        request.meta['proxy'] = "http://{0}:{1}".format(proxyHost, proxyPort)
        # Set the Proxy-Authorization header from the user name and password
        encoded_user_pass = base64ify(proxyUser + ":" + proxyPass)
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
        # A random Proxy-Tunnel value asks the server to switch tunnels
        tunnel = random.randint(1, 10000)
        request.headers['Proxy-Tunnel'] = str(tunnel)