[SDK community] Python emoji crawler

Contents

1. Modules needed

Start writing code

1. Create a folder to save your images

2. Create a request header

3. Code body

4. Create a loop

The complete code

1. Modules needed

import requests 
import os
from bs4 import BeautifulSoup 


You also need the lxml library installed, but you do not need to import it. BeautifulSoup, which is provided by the beautifulsoup4 package, loads lxml itself when you ask for the 'lxml' parser.
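Since lxml is never imported directly, a missing install only shows up when the parser is first requested. A minimal sketch for checking the three packages up front, using only the standard library; the helper name is_installed is made up for this illustration:

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# bs4 is the import name of the beautifulsoup4 package.
for name in ("requests", "bs4", "lxml"):
    print(name, "installed" if is_installed(name) else "missing")
```

Anything reported as missing can be installed with pip before running the crawler.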

Start writing code

1. Create a folder to save your images

if not os.path.isdir("./img/"):  # create the img folder if it does not exist yet
    os.mkdir("./img/")
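The same step can also be written with os.makedirs, which accepts exist_ok=True so that no separate isdir check is needed. This is an alternative sketch, not the article's own code; the helper name ensure_dir is hypothetical:

```python
import os


def ensure_dir(path):
    """Create the directory if it is missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


if __name__ == "__main__":
    # Safe to run repeatedly: the second call is a no-op.
    print(ensure_dir("./img/"))
    print(ensure_dir("./img/"))
```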

2. Create a request header

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0'}

A request header is needed because most websites check each incoming request to decide whether it is valid, and a request that does not appear to come from a browser may be treated as invalid. Without the header, the site may refuse to serve the page.

Tip: the contents of a request header can be captured with Burp Suite.
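To see what the header looks like on an actual request object, here is a small sketch that builds a request without sending it. It uses the standard library's urllib.request instead of requests so that it runs offline; with requests, the same dictionary is passed as the headers= keyword argument. The helper name build_request is made up for this illustration.

```python
from urllib.request import Request

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"
}


def build_request(url, headers=HEADERS):
    """Build (but do not send) a request that carries the given headers."""
    return Request(url, headers=headers)


req = build_request("https://fabiaoqing.com/biaoqing/lists/page/1.html")
# urllib stores header names in capitalised form, so the key is 'User-agent'.
print(req.get_header("User-agent"))
```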

3. Code body

def pa(num):
    url = 'https://fabiaoqing.com/biaoqing/lists/page/' + str(num) + '.html'  # build the page URL
    rp = requests.get(url, headers=headers).text  # request the URL and take the response text
    soup = BeautifulSoup(rp, 'lxml')  # build the soup
    img_list = soup.find_all('img', class_='ui image lazy')  # keep only img tags with this class
    for img in img_list:
        img_url = img['data-original']  # image address; assumes the lazy-loaded address is stored in data-original
        img_title = img['title']  # image title
        print(img_url, img_title)
        try:
            with open('img/' + img_title + os.path.splitext(img_url)[-1], 'wb') as f:
                image = requests.get(img_url).content  # request img_url and take the raw bytes
                f.write(image)
        except Exception:  # skip any image that cannot be downloaded or saved
            pass
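The file name in the code body is the image title plus the extension taken from the image URL with os.path.splitext. Titles can contain characters that are not allowed in file names, which is one reason a save can fail and fall into the except branch. A minimal sketch of building a safer name; the helper build_filename and the character list are assumptions for illustration, not part of the original crawler:

```python
import os
import re


def build_filename(img_url, img_title, folder="img"):
    """Combine folder, cleaned title and the URL's extension into a file path."""
    ext = os.path.splitext(img_url)[-1]  # e.g. '.jpg'
    # Replace characters that are commonly invalid in file names with '_'.
    safe_title = re.sub(r'[\\/:*?"<>|\r\n]', "_", img_title).strip()
    return folder + "/" + safe_title + ext


print(build_filename("https://example.com/a/b.jpg", "hello"))
print(build_filename("http://x/y.gif", "what?/now"))
```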

4. Create a loop

for i in range(1, 201):  # pages 1 to 200
    pa(i)  # grab the images from each page
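Note that range(1, 201) stops before 201, so the loop visits pages 1 through 200. The sketch below lists the page URLs the loop would request, assuming each page address ends in ".html"; the helper name page_urls is hypothetical:

```python
BASE_URL = "https://fabiaoqing.com/biaoqing/lists/page/"


def page_urls(first=1, last=200):
    """Return the list of page URLs from first to last, both included."""
    return [BASE_URL + str(num) + ".html" for num in range(first, last + 1)]


urls = page_urls()
print(len(urls))
print(urls[0])
print(urls[-1])
```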

The complete code

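import os
import requests
from bs4 import BeautifulSoup

if not os.path.isdir("./img/"):  # create the img folder if it does not exist yet
    os.mkdir("./img/")

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0'}  # create the request header

def pa(num):
    url = 'https://fabiaoqing.com/biaoqing/lists/page/' + str(num) + '.html'  # build the page URL
    rp = requests.get(url, headers=headers).text  # request the URL and take the response text
    soup = BeautifulSoup(rp, 'lxml')  # build the soup
    img_list = soup.find_all('img', class_='ui image lazy')  # keep only img tags with this class
    for img in img_list:
        img_url = img['data-original']  # image address; assumes the lazy-loaded address is stored in data-original
        img_title = img['title']  # image title
        print(img_url, img_title)
        try:
            with open('img/' + img_title + os.path.splitext(img_url)[-1], 'wb') as f:
                image = requests.get(img_url).content  # request img_url and take the raw bytes
                f.write(image)
        except Exception:  # skip any image that cannot be downloaded or saved
            pass

for i in range(1, 201):  # pages 1 to 200
    pa(i)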

The SDK community: www.sdk.cn/details/q5r…
