The requests module

  • Simulates browsing the web through a browser

Installation

pip install requests

The process

  1. Specify the URL
  2. Initiate the request
  3. Get the response data
  4. Persist the data
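Steps 1 and 2 can be sketched without any network access: the `params=` argument used in the cases below is just URL-encoded key/value pairs appended to the URL. The URL and keyword here are illustrative:

```python
from urllib.parse import urlencode

# Step 1: specify the URL (illustrative)
base_url = "https://www.sogou.com/web"
# Step 2: requests builds the query string from params= like this
params = {"query": "python"}
full_url = base_url + "?" + urlencode(params)
print(full_url)  # https://www.sogou.com/web?query=python
```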

Case 1: Crawl the Sogou home page

import requests

url = "https://www.sogou.com/"
response = requests.get(url=url)
# text returns the response body as a string
page_text = response.text
print(page_text)

with open("./sougou2021.html", "w", encoding="utf-8") as fp:
    fp.write(page_text)


Case 2: Make a simple web collector

UA spoofing (disguising the crawler as a browser)

import requests

key = input('enter a key word:')
# Dynamic query parameters
params = {'query': key}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
url = 'https://www.sogou.com/web'
response = requests.get(url=url, params=params, headers=headers)
page_text = response.text
fileName = key + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)

Case 3: Get Douban movie data

The movie list is loaded dynamically via Ajax, so capture the XHR request in the browser's developer tools.

From it, get: the URL, the request method, the request parameters, and the request headers.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
url = 'https://movie.douban.com/j/chart/top_list'
for i in range(1, 30):
    params = {
        'type': i,
        'interval_id': '100:90',
        'action': '',
        'start': '0',
        'limit': '20',
    }
    json_data = requests.get(url=url, headers=headers, params=params).json()
    print(json_data)


Case 4: POST request

A POST request passes its payload through `data` instead of `params` (this page submits form data).
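What `data=` does can also be shown offline: requests form-encodes the dictionary into the request body (Content-Type `application/x-www-form-urlencoded`). A stdlib sketch using form fields like those in the case below:

```python
from urllib.parse import urlencode

# Form fields as in the KFC case; data= form-encodes them into the body
data = {"cname": "", "pid": "", "keyword": "Shanghai", "pageIndex": 1, "pageSize": "10"}
body = urlencode(data)
print(body)  # cname=&pid=&keyword=Shanghai&pageIndex=1&pageSize=10
```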

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}

url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
for pageNum in range(1, 6):
    data = {
        'cname': '',
        'pid': '',
        'keyword': 'Shanghai',
        'pageIndex': pageNum,
        'pageSize': '10',
    }
    json_data = requests.post(url=url, headers=headers, data=data).json()['Table1']
    for dic in json_data:
        print(dic['addressDetail'])

Case 5: Crawl Honor (Huawei Vmall) store data

The request body here is JSON rather than form data, so pass the dictionary through `json` instead of `data`.
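The difference between `json=` and `data=` can be shown offline: `json=` serializes the dictionary with `json.dumps` and sends it as a JSON body, while `data=` would form-encode it. The payload below is a trimmed, illustrative subset of the real one:

```python
import json
from urllib.parse import urlencode

payload = {"portal": 2, "pageNo": 1}
json_body = json.dumps(payload)   # what json= sends
form_body = urlencode(payload)    # what data= would send instead
print(json_body)  # {"portal": 2, "pageNo": 1}
print(form_body)  # portal=2&pageNo=1
```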

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
# Obtain store IDs in batch
main_url = 'https://openapi.vmall.com/mcp/offlineshop/getShopList'
data = {"portal": 2, "lang": "zh-CN", "country": "CN", "brand": 1, "province": "Beijing", "city": "Beijing", "pageNo": 1, "pageSize": 20}
# Extract the innermost dictionaries from the response; they carry the shop id we need
json_data = requests.post(main_url, headers=headers, json=data).json()['shopInfos']



url = 'https://openapi.vmall.com/mcp/offlineshop/getShopById'

for dic in json_data:
    shop_id = dic['id']
    params = {
        'portal': '2',
        'version': '10',
        'country': 'CN',
        'shopId': shop_id,
        'lang': 'zh-CN',
    }
    # Use a new name so we don't overwrite the list we are iterating over
    shop_detail = requests.get(url, headers=headers, params=params).json()
    print(shop_detail)


The response is a dictionary whose shopInfos key holds a list, and each element of that list is itself a dictionary: a dictionary nesting a list nesting dictionaries.
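That nesting can be walked with the stdlib `json` module alone; the payload below is a made-up miniature of the real response:

```python
import json

# Made-up miniature of the getShopList reply: dict -> list -> dicts
raw = '{"shopInfos": [{"id": 1001, "name": "Store A"}, {"id": 1002, "name": "Store B"}]}'
shop_infos = json.loads(raw)["shopInfos"]      # the list under the shopInfos key
shop_ids = [dic["id"] for dic in shop_infos]   # unpack each inner dictionary
print(shop_ids)  # [1001, 1002]
```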