Preface: I have finally started writing a blog. I hope my skills grow along with the word count.

The basic syntax of the major programming languages feels much the same: variable types, loops, branches, and so on. So after a quick look at basic Python syntax, I went straight to crawlers. Learning is input and blogging is output. Between the two, I want to learn things solidly rather than be like the bear in the proverb who picks one ear of corn and drops the last. My study of requests is based on the open course from Beijing Institute of Technology on the Chinese University MOOC platform. Mr. Song Tian explains it really well.

1. Installing requests

Run the command pip install requests from the CMD command line.

2. Common requests methods

requests has seven main methods: get(), head(), post(), put(), patch(), and delete(), which correspond to the HTTP methods of the same names, plus the general request() method. I mainly note the two that matter most for crawlers.
Method | Description | Example
requests.request(method, url, **kwargs) | method is the HTTP method name; **kwargs stands for the 13 access control parameters | requests.request("GET", "http://www.baidu.com", data=data)
requests.get(url, params=None, **kwargs) | params is a dictionary or byte sequence that is appended to the URL as the query string | requests.get("http://www.baidu.com")

Note: **kwargs is covered in detail in Section 5 below.
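To see what the params argument does to the URL, here is a minimal sketch that builds a request locally without sending it. The /s path and the query value {"wd": "python"} are assumptions made up for illustration; Request and prepare() are part of requests itself.

```python
import requests

# Build the request locally; nothing is sent, so no network access is needed.
# The path and the query dictionary are sample values for illustration only.
req = requests.Request("GET", "http://www.baidu.com/s", params={"wd": "python"})
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://www.baidu.com/s?wd=python
```

Calling requests.get(url, params=...) would prepare the same kind of request and then actually send it.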

3. requests.get()

The requests.get() method involves two main objects: the Request object and the Response object.

The Request object is the data sent to the server when get() is called, and the Response object is the data the server sends back. The Response object is the more important of the two, so I'll focus on it.

Main attributes of the Response object

r = requests.get("http://www.baidu.com", timeout=10)

Attribute | Description | Example
r.status_code | HTTP status of the request. 200 means the request succeeded; other codes such as 404 mean it failed | 200
r.encoding | Encoding of the response content, guessed from the HTTP headers returned | ISO-8859-1 (the default when the headers specify no charset)
r.apparent_encoding | Encoding inferred by analyzing the response content itself | utf-8
r.text | The response content as a string, decoded using r.encoding | none
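Why the difference between the two encodings matters can be shown with plain bytes, no network needed. This is only a sketch: the sample text is made up, and it imitates what happens when r.text is decoded with the ISO-8859-1 fallback versus the real encoding.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample data)
content = "百度一下".encode("utf-8")

# Decoding with ISO-8859-1 never raises an error, but the text comes out garbled
garbled = content.decode("ISO-8859-1")

# Decoding with the encoding the content really uses recovers the text
correct = content.decode("utf-8")

print(garbled == "百度一下")  # False
print(correct == "百度一下")  # True
```

This is the reason for setting r.encoding = r.apparent_encoding before reading r.text when the headers give no charset.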

4. Common code framework for crawling web pages

Crawling web pages is risky, so handling exceptions is important. My skills are still limited for now, so I followed my teacher Song Tian and wrote a simple function that handles exceptions, as follows:

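import requests

def getHTML(url):
    try:
        # timeout keeps a stalled connection from hanging the crawler
        r = requests.get(url, timeout=30)
        # Raise requests.HTTPError if the status code is 4xx or 5xx
        r.raise_for_status()
        # Decode with the encoding detected from the content itself
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return 'An exception has occurred!! '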
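The key line in the framework is r.raise_for_status(). The sketch below shows its behavior without touching the network by building Response objects by hand, which is an illustration trick rather than something a crawler would normally do. The helper name check is hypothetical.

```python
import requests

def check(status_code):
    """Return True if raise_for_status() accepts the code, False if it raises."""
    r = requests.Response()      # built by hand, no network involved
    r.status_code = status_code
    try:
        r.raise_for_status()
        return True
    except requests.HTTPError:
        return False

print(check(200))  # True
print(check(404))  # False
print(check(500))  # False
```

Since requests.HTTPError is a subclass of requests.RequestException, catching RequestException in the framework also covers connection errors and timeouts.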

5. Access control parameters for crawling web pages

**kwargs stands for the 13 access control parameters: timeout, params, data, json, proxies, headers, auth, files, cookies, allow_redirects, cert, stream, and verify. The first six are interesting and often used, so I've made notes on them. The rest are rarely needed, and I'll ask Baidu when the time comes.
Parameter | Description | Example
params | Appended to the URL as the query string and sent to the server as part of the Request | See Code 1 below
data | Dictionary, byte sequence, or file object sent as the body of the Request | none
json | Data in JSON format sent as the body of the Request | none
timeout | Timeout in seconds | timeout=10
headers | Dictionary of custom HTTP headers | See Code 2 below
proxies | Dictionary of proxy servers, which may include login credentials. Going through a proxy hides the crawler's own address and makes it harder to trace back | See Code 3 below
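The table gives no example for data and json, so here is a minimal sketch of how the two differ. The requests are only prepared locally, not sent, and the payload is made-up sample data.

```python
import json
import requests

url = "http://www.python123.io/ws"
payload = {"key1": "value1"}  # made-up sample data

# data= with a dictionary is sent as a form-encoded body
form_req = requests.Request("POST", url, data=payload).prepare()
# json= serializes the same dictionary as a JSON body
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # key1=value1
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'key1': 'value1'}
```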
# Code 1
import requests

body = '123'
# params is appended to the URL as the query string
r = requests.request("GET", "http://www.python123.io/ws", params=body)
print(r.url)

The output is "http://www.python123.io/ws?123".

# Code 2
import requests

# Custom header that replaces the default User-Agent
hd = {"user-agent": "Chrome/10"}
r = requests.request("POST", "http://www.python123.io/ws", headers=hd)

This sends the request to the server while presenting itself as the Chrome 10 browser.
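To confirm what header would actually go out, the request can be prepared locally and inspected without sending anything. A small sketch, reusing the same URL and header as Code 2:

```python
import requests

hd = {"user-agent": "Chrome/10"}

# Prepare the request without sending it, then inspect its headers
prepared = requests.Request("POST", "http://www.python123.io/ws", headers=hd).prepare()

# Header lookup is case-insensitive, so "User-Agent" finds "user-agent"
print(prepared.headers["User-Agent"])  # Chrome/10
```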

# Code 3
import requests

# One proxy per URL scheme; the http proxy includes login credentials
pxs = {"http": "http://user:[email protected]:1234",
       "https": "https://10.10.10.1:4321"}
r = requests.request("GET", "http://www.python123.io/ws", proxies=pxs)

Well, that's where I'll stop today's summary. My mind still isn't calm enough! As the saying goes: counted by the day it seems too little, but counted by the year it is more than enough.