Crawlers can be written with urllib plus re (regular expressions), but today I studied the Requests library, which makes sending requests much more convenient than urllib. These notes are based on the course "Python Web Crawler and Information Extraction" taught by Song Tian of Beijing Institute of Technology on the Chinese University MOOC platform.

Official website of the Requests library

Several main methods of the Requests library

| Method | Description |
| --- | --- |
| requests.request() | Constructs a request; the base method that supports all of the methods below |
| requests.get() | The main method for fetching an HTML page; corresponds to HTTP GET |
| requests.head() | Gets the header information of an HTML page; corresponds to HTTP HEAD |
| requests.post() | Submits a POST request to an HTML page; corresponds to HTTP POST |
| requests.put() | Submits a PUT request to an HTML page; corresponds to HTTP PUT |
| requests.patch() | Submits a partial-modification request to an HTML page; corresponds to HTTP PATCH |
| requests.delete() | Submits a DELETE request to an HTML page; corresponds to HTTP DELETE |

These HTTP methods are described below

| Method | Description |
| --- | --- |
| GET | Request the resource at the URL location |
| HEAD | Request a response report for the resource at the URL location, i.e. get that resource's header information |
| POST | Request that new data be appended to the resource at the URL location |
| PUT | Request that a resource be stored at the URL location, overwriting the resource that was there |
| PATCH | Request a partial update of the resource at the URL location, i.e. change part of that resource's content |
| DELETE | Request deletion of the resource stored at the URL location |

The get() method

requests.get(url, params=None, **kwargs) takes the URL, optional parameters to send with the request, and other keyword arguments that control access, and returns a Response object. Here are some properties of the Response object.

| Attribute | Description |
| --- | --- |
| r.status_code | Status code of the HTTP request; 200 means the connection succeeded, 404 means it failed |
| r.text | The HTTP response body as a string, i.e. the page content of the URL |
| r.encoding | The encoding of the response content guessed from the HTTP headers |
| r.apparent_encoding | The encoding of the response content inferred from the content itself (an alternative encoding) |
| r.content | The HTTP response body in binary form |

import requests
r = requests.get("http://www.baidu.com")
type(r)

requests.models.Response

r.status_code

200

r.text

As you can see from the r.text property, the page content comes back, but the Chinese characters are not displayed correctly.

r.encoding is ISO-8859-1, because the value of r.encoding comes from the charset field of the response headers; if charset is not present in the headers, the encoding is assumed to be ISO-8859-1. See r.headers:

r.headers

{'Content-Type': 'text/html', 'Content-Encoding': 'gzip'}

r.text renders the page content according to r.encoding, which is why the garbled characters appear. Now look at r.apparent_encoding, which guesses the encoding from the content of the response itself.

r.encoding = 'utf-8'
r.text

Now the Chinese characters display correctly.
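Rather than hard-coding 'utf-8', the encoding guessed from the content itself can be used directly; a minimal sketch of that approach (the slice length of 200 is just an arbitrary check):

```python
import requests

r = requests.get("http://www.baidu.com")
# Use the encoding inferred from the page content so the Chinese text renders correctly
r.encoding = r.apparent_encoding
print(r.text[:200])  # print the first 200 characters as a quick check
```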

The head() method

>>> r = requests.head('http://httpbin.org/get')
>>> r.headers
{'Content-Length': '238', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Content-Type': 'application/json', 'Server': 'nginx', 'Connection': 'keep-alive', 'Date': 'Sat, 18 Feb 2017 12:07:44 GMT'}
>>> r.text
''

The post() method

The data argument carries the data submitted by the POST method.

POST a dictionary to a URL and it is automatically encoded as a form:

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{ ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

POST a string to a URL and it is automatically encoded as data:

>>> r = requests.post('http://httpbin.org/post', data='ABC')
>>> print(r.text)
{ ...
  "data": "ABC",
  "form": {},
  ...
}

The other methods are similar

The requests.request() method

requests.request(method, url, **kwargs)

  • method: the request method, corresponding to the seven HTTP methods (GET, PUT, POST, etc.)
  • url: the URL of the page to fetch
  • **kwargs: 13 optional parameters that control access

The dedicated request methods are all implemented by calling requests.request() with the corresponding HTTP method:

r = requests.request('GET', url, **kwargs)
r = requests.request('HEAD', url, **kwargs)
r = requests.request('POST', url, **kwargs)
r = requests.request('PUT', url, **kwargs)
r = requests.request('PATCH', url, **kwargs)
r = requests.request('DELETE', url, **kwargs)
r = requests.request('OPTIONS', url, **kwargs)

The **kwargs parameters that control access are as follows:

| Parameter | Description |
| --- | --- |
| params | Dictionary or byte sequence, added to the URL as query parameters |
| data | Dictionary, byte sequence, or file object, used as the body of the Request |
| json | Data in JSON format, used as the body of the Request |
| headers | Dictionary, custom HTTP headers |
| cookies | Dictionary or CookieJar, cookies to send with the Request |
| auth | Tuple, supports HTTP authentication |
| files | Dictionary, for transferring files |
| timeout | Timeout in seconds |
| proxies | Dictionary, sets proxy servers for access; login authentication can be included |
| allow_redirects | True/False, default True; switch for following redirects |
| stream | True/False; switch for whether the response body is downloaded immediately (in the Requests library itself the default is False, i.e. download immediately) |
| verify | True/False, default True; switch for verifying the SSL certificate |
| cert | Path to a local SSL client certificate |

Params example

>>> kv = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.request('GET', 'http://python123.io/ws', params=kv)
>>> print(r.url)
http://python123.io/ws?key1=value1&key2=value2

Data example

>>> kv = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.request('POST', 'http://python123.io/ws', data=kv)
>>> body = 'Body content'
>>> r = requests.request('POST', 'http://python123.io/ws', data=body)

Json example

>>> kv = {'key1': 'value1'}
>>> r = requests.request('POST', 'http://python123.io/ws', json=kv)

Headers example

>>> hd = {'user-agent': 'Chrome/10'}
>>> r = requests.request('POST', 'http://python123.io/ws', headers=hd)

Files example

>>> fs = {'file': open('data.xls', 'rb')}
>>> r = requests.request('POST', 'http://python123.io/ws', files=fs)

Timeout example

>>> r = requests.request('GET', 'http://www.baidu.com', timeout=10)

Proxies example

>>> pxs = {'http': 'http://user:pass@10.10.10.1:1234',
...        'https': 'https://10.10.10.1:4321'}
>>> r = requests.request('GET', 'http://www.baidu.com', proxies=pxs)
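The remaining control parameters in the table are passed the same way. For example, auth takes a (user, password) tuple for HTTP basic authentication and verify switches SSL certificate verification on or off; the URLs and credentials below are only placeholders:

```python
import requests

# Placeholder URL and credentials -- replace with real values
r = requests.request('GET', 'http://python123.io/ws', auth=('user', 'pass'))  # HTTP basic authentication
r = requests.request('GET', 'https://python123.io/ws', verify=False)          # skip SSL certificate verification
```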

Exception handling

| Exception | Description |
| --- | --- |
| requests.ConnectionError | Network connection error, such as a DNS lookup failure or a refused connection |
| requests.HTTPError | HTTP error |
| requests.URLRequired | The URL is missing |
| requests.TooManyRedirects | The maximum number of redirects was exceeded |
| requests.ConnectTimeout | Connecting to the remote server timed out |
| requests.Timeout | The request to the URL timed out |

ConnectTimeout refers to a timeout while connecting to the server, whereas Timeout covers the entire request (including the time needed before and after connecting to the server).
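Beyond what the course covers, the timeout parameter of Requests also accepts a (connect, read) tuple, which makes this distinction concrete; a small sketch with placeholder values:

```python
import requests

try:
    # allow 3 seconds to establish the connection and 10 seconds to read the response
    r = requests.get('http://www.baidu.com', timeout=(3, 10))
except requests.ConnectTimeout:
    print('timed out while connecting to the server')
except requests.Timeout:
    print('the request timed out')
```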

The r.raise_for_status() method

If a status code other than 200 was returned, it raises a requests.HTTPError exception; this is used in the code framework below.

A code framework for crawling web pages
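A minimal sketch of such a framework, combining a timeout, r.raise_for_status(), and r.apparent_encoding as described above (the function name get_html_text and the test URL are just illustrative choices):

```python
import requests

def get_html_text(url):
    """Fetch a web page and return its text, or an error message on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()               # raise requests.HTTPError on an unsuccessful status code
        r.encoding = r.apparent_encoding   # decode with the encoding inferred from the content
        return r.text
    except requests.RequestException:
        return "request failed"

if __name__ == "__main__":
    print(get_html_text("http://www.baidu.com")[:500])
```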

A case: crawling an image from a web page and saving it
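A minimal sketch of such a case, reading the binary image data from r.content and writing it to a local file (the image URL and save directory are placeholders to be replaced with real values):

```python
import os
import requests

url = "http://example.com/picture.jpg"   # placeholder image URL
root = "./pics/"                         # placeholder save directory
path = root + url.split('/')[-1]         # name the local file after the last segment of the URL

try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()
        with open(path, 'wb') as f:
            f.write(r.content)           # r.content is the binary form of the response
        print("file saved successfully")
    else:
        print("file already exists")
except requests.RequestException:
    print("failed to crawl the image")
```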