Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

When crawler is used, it is often necessary to construct request Headers to disguise crawler as browser to bypass anti-crawler mechanism. According to the anti-crawler mechanism of target website server, we often need to construct request header parameters such as Accept, user-agent, Cookie, etc. If you need to manually copy and paste every time you crawl, it’s too tedious.

This article provides a simple Python script that formats the HEADERS section of the browser in one click, making it much easier to write crawlers.

The code is as follows:

Import re headerStr = "" request headers from browser copied here "" ret = "" for I in headerStr: if I == '\n': i = "',\n'" ret += i ret = re.sub(": ", "': '", ret) print(ret[3: -3])Copy the code

When used, copy and paste the Request Headers selection of the package caught by the developer tool into the code at headerStr.

As shown below.

import re headerStr = ''' Accept: text/html,application/xhtml+xml,application/xml; Q = 0.9, image/avif, image/webp image/apng, * / *; Q = 0.8, application/signed - exchange; v=b3; Q =0.9 Accept-encoding: gzip, deflate Accept-language: zh-cn,zh; Q =0.9 Connection: keep-alive Cookie: ******* Host: paper.people.com.cn Referer: http://paper.people.com.cn/rmrb/html/2021-10/08/nbs.D110000renmrb_01.htm Upgrade-Insecure-Requests: 1 User-Agent: Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 "" ret = "" for I in headerStr: if i == '\n': i = "',\n'" ret += i ret = re.sub(": ", "': '", ret) print(ret[3: -3])Copy the code

After running the code, the formatted Headers string is printed and can be used directly in the code, which is very convenient.