Installing third-party modules

  • pip install

  • Download the source code, go to the decompressed directory, and run python setup.py install

  • pip install ***.whl


Determining whether the request was successful

assert response.status_code == 200

  • response.headers holds the response headers
  • response.request.headers holds the headers of the request that was sent
  • Cookies can be set not only by the response, but also by JS
  • In the dictionary returned by response.request.headers, User-Agent defaults to python-requests/2.**.* (the version number)
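The checks above can be sketched roughly as follows. The helper name `fetch_ok` and its 10-second timeout are made up for illustration; `requests.utils.default_headers()` is used only to show the default User-Agent without sending anything.

```python
import requests


def fetch_ok(url):
    """Hypothetical helper: fetch url and fail loudly unless the status is 200."""
    response = requests.get(url, timeout=10)
    assert response.status_code == 200, f"unexpected status {response.status_code}"
    return response


# Headers that requests attaches when the caller supplies none.
default_headers = requests.utils.default_headers()
default_user_agent = default_headers["User-Agent"]
print(default_user_agent)  # starts with "python-requests/"
```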

Sending a request with headers

why?

Impersonate the browser, trick the server, and get the same content as the browser

how?

  • Header form: dictionary

  • # How to set
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"}

    requests.get(url, headers=headers)
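One way to confirm which headers would go out, without touching the network, is to prepare the request and inspect it. This is only a sketch: the URL is a placeholder, not one used in these notes.

```python
import requests

# Hypothetical target URL, used only for illustration.
url = "https://www.example.com/"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
    )
}

# Build the request without sending it, to inspect what would be transmitted.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.headers["User-Agent"])

# To actually send it: response = requests.get(url, headers=headers)
```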

Sending a request with parameters

  • Parameters are passed as a dictionary
  • kw = {'wd': 'Great Wall'}
  • Usage: requests.get(url, params=kw)

URL encoding

  • https://www.baidu.com/s?User-Agent=Mozilla%2F5.0+%28Windows+NT+10.0%3B+Win64%3B+x64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F84.0.4147.105+Safari%2F537.36&wd=%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9
  • Use an online tool to decode
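As an alternative to an online tool, the encoding and decoding can be tried locally. The sketch below prepares a request (without sending it) to show how the params dictionary ends up in the URL, then decodes a percent-encoded value with the standard library. The sample encoded value is an assumption chosen for illustration.

```python
from urllib.parse import unquote_plus

import requests

kw = {"wd": "Great Wall"}

# Prepare (but do not send) the request to see the encoded URL.
prepared = requests.Request("GET", "https://www.baidu.com/s", params=kw).prepare()
print(prepared.url)

# Decode a percent-encoded query value locally.
decoded = unquote_plus("%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9")
print(decoded)
```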

Fixing garbled characters in the returned page

1. Use the chardet library to detect the page encoding

import chardet
chardet.detect(r.content)  # inspect the detected page encoding
r.content.decode(chardet.detect(r.content)["encoding"])

# Pages containing Chinese text are sometimes misdetected, which produces garbled output.
# Character coverage: gb2312 < GBK < GB18030

2. Use the cchardet library

import cchardet
# cchardet is more accurate than chardet
cchardet.detect(r.content)  # inspect the detected page encoding
r.content.decode(cchardet.detect(r.content)["encoding"])
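Because gb2312 < GBK < GB18030 in character coverage, one possible workaround for a misdetected Chinese page is to decode with the widest of the three. The helper below is a sketch using only the standard library; the name `decode_page` and the fallback to UTF-8 are assumptions, not part of chardet or cchardet.

```python
def decode_page(raw, detected):
    """Decode raw bytes, widening GB2312/GBK guesses to GB18030.

    `detected` is the encoding name reported by a detector such as chardet.
    """
    name = (detected or "utf-8").lower()
    if name in ("gb2312", "gbk"):
        # GB18030 covers every character of GBK, which in turn covers GB2312.
        name = "gb18030"
    return raw.decode(name, errors="replace")


# "镕" exists in GBK but not in GB2312, so a GB2312 guess would not be enough here.
raw = "镕".encode("gbk")
text = decode_page(raw, "GB2312")
print(text)
```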

String formatting

"{} bilibili".format("search") -> "search bilibili"

"Beep {0} beep {1}".format("beep", "li") -> "Beep beep beep li"

"%(name)s playing %(gamename)s have %(time)d min" % {'name': 'Elvis', 'gamename': 'OW', 'time': 60}


List comprehensions

[i for i in range(3)] -> [0, 1, 2]
[i+3 for i in range(3)] -> [3, 4, 5]
["a" for i in range(3)] -> ["a", "a", "a"]
["a" for i in range(3) if i % 2 == 0] -> ["a", "a"]

Using proxy IPs

  • Prepare a batch of IP addresses to form an IP pool, and pick one at random for each request
  • Check IP availability
    • Send a request through the proxy with requests and a timeout parameter to judge the quality of the IP address
    • Use an online proxy IP quality-checking site
  • Pass the proxy to get() as key-value pairs (a dictionary given as the proxies argument)
  • How to randomly select proxy IP addresses
    • {"IP": IP, "times": 0}
    • [{}, {}, {}]: sort the IP list by the number of uses
    • Take the 10 least-used IP addresses and pick one of them at random
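The selection steps above might look like the following sketch. The proxy addresses are placeholders from a documentation IP range, and the helper names `pick_proxy` and `as_proxies` are made up for illustration.

```python
import random

# Hypothetical proxy addresses, each paired with a usage counter.
proxy_pool = [
    {"IP": f"203.0.113.{n}:8080", "times": 0} for n in range(1, 21)
]


def pick_proxy(pool, candidates=10):
    """Pick one of the least-used proxies at random and count the use."""
    least_used = sorted(pool, key=lambda item: item["times"])[:candidates]
    chosen = random.choice(least_used)
    chosen["times"] += 1
    return chosen


def as_proxies(entry):
    """Build the dictionary that requests accepts as its proxies argument."""
    address = "http://" + entry["IP"]
    return {"http": address, "https": address}


chosen = pick_proxy(proxy_pool)
proxies = as_proxies(chosen)
print(proxies)
# To send through the proxy and judge its quality with a timeout:
# requests.get(url, proxies=proxies, timeout=5)
```

Sorting by the counter before sampling keeps usage spread across the pool, while the random choice among the ten least-used entries avoids always hitting the same address first.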

Sending requests with cookies

  • Send requests with a batch of different cookies, collected into a cookie pool

Three ways to get the page after login

① Use the requests Session class to request pages that require login

  • Instantiate a session
  • First use the session to send the login request; the cookies set by the website are saved in the session
  • Then use the session to request pages that are only accessible after login; the session automatically sends the cookies saved from the successful login
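A minimal sketch of the session approach, assuming placeholder URLs and form fields supplied by the caller. The second half runs offline and only illustrates that a cookie stored on the session is attached to later requests; the cookie name and value are made up.

```python
import requests


def fetch_after_login(login_url, form_data, page_url):
    """Log in with a session, then request a page that needs the login cookies.

    All three arguments are placeholders supplied by the caller.
    """
    session = requests.Session()
    # Cookies set by the login response are stored in session.cookies.
    session.post(login_url, data=form_data, timeout=10)
    # The stored cookies are sent automatically with this request.
    return session.get(page_url, timeout=10)


# Offline illustration: a cookie held by the session is added to a prepared request.
session = requests.Session()
session.cookies.set("sessionid", "abc123")  # made-up cookie
prepared = session.prepare_request(
    requests.Request("GET", "https://www.example.com/profile")
)
print(prepared.headers.get("Cookie"))
```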

② Skip the POST request and use an existing cookie to get the page after login

  • Suitable for sites whose cookies take a long time to expire
  • Suitable when all the data can be fetched before the cookie expires
  • Can be paired with another program dedicated to obtaining cookies, so the current program is only responsible for requesting pages

③ Convert the cookie string from the request into a dictionary

  • Start with a dictionary comprehension

  • Split the string inside the comprehension

Cookie = "td_cookie=18446744073253590511; anonymid=ke7y6ilm-ahnt2f; depovince=GW; r01=1; taihe_bi_sdk_uid=77c5697f32fb227f7b7ae5e3d07cd4a8; __utma=151146938.1942804965.1598238975.1598238975.1598238975.1; __utmz=151146938.1598238975.1.1.utmcsr=renren.com|utmccn=(referral)|utmcmd=referral|utmcct=/; JSESSIONID=abc8yDsem3PpRa82_nFqx; ick_login=7ec23d19-f0e2-412c-a11f-62a04ae519d9; taihe_bi_sdk_session=8b040723910994a20200d9ed284e9d59; jebecookies=bfa7caf7-0e49-436a-9025-d6dfc7bafafe|||||; _de=17F2A61E5D774F08027123AF20EEBA9A; p=da2e70a380c5b80b935a29d95602b2e34; first_login_flag=1; ln_uact=13610052334; ln_hurl=head.xiaonei.com/photos/0/0/…; t=53a151335819c405da02c6bde4bff7ce4; societyguester=53a151335819c405da02c6bde4bff7ce4; id=974982934; xnsid=e0a46487; ver=7.0; loginfrom=null; wp_fold=0"

# Split on the first "=" only, because some values (such as __utmz) contain "=" themselves
cookie = {i.split("=", 1)[0]: i.split("=", 1)[1] for i in Cookie.split("; ")}
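The same conversion can be wrapped in a small reusable helper. This is a sketch: the function name and the sample cookie string are made up, and splitting only on the first "=" is a choice made here so that values containing "=" survive intact.

```python
def cookie_string_to_dict(cookie_string):
    """Turn 'a=1; b=2' (the Cookie request header format) into {'a': '1', 'b': '2'}."""
    pairs = (
        item.split("=", 1)
        for item in cookie_string.split("; ")
        if "=" in item  # skip fragments that are not name=value pairs
    )
    return {name: value for name, value in pairs}


# Made-up cookie string; the second value itself contains "=".
sample = "id=974; track=src=renren.com|cmd=referral; wp_fold=0"
cookies = cookie_string_to_dict(sample)
print(cookies)
# The dictionary can then be passed along: requests.get(url, cookies=cookies)
```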

------------

### Page form submission

* The value of each tag's "name" attribute in the page is used as the key, and the user's real data is the value. Send them with a POST request.

### Finding the login POST address

* Find the action URL in the form
* The POST data is a dictionary with the name attributes of the input tags as keys and the real username and password as values
* Tick the "Preserve log" option so the request URL is not lost when the page redirects
* Search for the POST data and confirm the parameters
  * Parameters that do not change can be used directly, for example when the password is not dynamically encrypted
  * Parameters that change
    * may come from the current response
    * may be generated by JS

---------

### Locating the JS

Reference: https://www.bilibili.com/video/BV1Lx411d7Cj?p=16

* Select the button that triggers the JS event and open "Event Listeners" in the debugger to find the JS location
* Use Chrome's search across all files to look for keywords from the URL
* Add a breakpoint to see what the JS does, then do the same thing in Python

------

### Requests tips

* After requests.get, the returned cookies are available through the "cookies" attribute, which is an object. It can be converted to a dictionary with "requests.utils.dict_from_cookiejar".
* "requests.utils.cookiejar_from_dict" converts a dictionary back into that object
* "requests.utils.unquote" decodes a URL
* "requests.utils.quote" encodes a URL

------

* Skipping SSL certificate verification
  * `response = requests.get("https://www.12306.cn/mormhweb/", verify=False)`

------

* Setting a timeout
  * `response = requests.get("https://www.12306.cn/mormhweb/", timeout=20)`
* The "retrying" third-party library retries after a timeout
  * `@retry(stop_max_attempt_number=3)  # stop after 3 attempts`

------

* Use the status code to determine whether the request succeeded
  * `assert response.status_code == 200`
* Use try-except to handle the failure case
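Several of the requests tips above can be tried together in one short sketch. The cookie values are made up, and the `fetch` helper is hypothetical: it swaps in a plain loop for the retrying library's decorator so that the example depends only on requests.

```python
import requests
from requests.utils import cookiejar_from_dict, dict_from_cookiejar, quote, unquote

# Dictionary -> cookie jar -> dictionary round trip (made-up cookie values).
jar = cookiejar_from_dict({"id": "974", "wp_fold": "0"})
as_dict = dict_from_cookiejar(jar)

# URL encoding and decoding of a query value.
encoded = quote("长城")
decoded = unquote(encoded)


def fetch(url, attempts=3, timeout=20):
    """Hypothetical helper: try up to `attempts` times (at least 1), with a timeout,
    treating request errors and non-200 status codes as failures."""
    last_error = None
    for _ in range(attempts):
        try:
            response = requests.get(url, timeout=timeout)
            assert response.status_code == 200
            return response
        except (requests.RequestException, AssertionError) as error:
            last_error = error
    raise last_error
```

With the retrying library installed, decorating a single-attempt function with `@retry(stop_max_attempt_number=3)` would play the role of the loop here.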