In daily chat we use all kinds of fun emoticons to liven up the atmosphere; these days, chatting without them feels out of step with the times. But faced with the sheer volume of emoticons out there, downloading them one by one by hand is clearly impractical.

Meme battles in group chats are already standard practice, and winning one takes a large stock of emojis. Manual downloading just can't keep up.

So how can we download all of these emojis automatically?

Well, without further ado: a little crawler technique can do the job for us.

1. Create a request header (masquerading as a browser)

Some of you might ask what a request header is. In this context, the request header means the User-Agent string, which tells the website the visitor's operating system version and browser version.

We create a request header because most websites validate incoming requests to decide whether they are legitimate (a request that doesn't look like it came from a browser may be treated as invalid). Without a request header, you may be denied access to the site.

How to create one:

1. Open the browser

2. Press F12

3. Select the Network tab

4. Click any request in the captured list (which one you pick doesn't matter)

5. Find the User-Agent field under the request headers and copy its value, e.g.:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}
```

2. Request the page data with the Requests library

Once we have created the request header, we need to visit the website and retrieve its page data. For that we use the Requests library.

  • Before using Requests, you need to download and install it.

Installation command: pip install requests -i pypi.douban.com/simple (the -i flag points pip at the Douban mirror)

  • Once the installation is complete, we can use Requests.

Usage: requests.get('fabiaoqing.com/biaoqing/li…', headers=headers).text

To fetch a web page, we issue an HTTP GET request. The call takes two parameters, the site's URL and our headers, and returns a response object: its built-in text attribute gives the HTML page source as text, while its content attribute gives the raw binary data (which is what we'll use for the images themselves).
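To make this concrete, here is a minimal sketch of the request step. The listing URL in the text above is truncated, so the address below is a hypothetical stand-in; everything else follows the calls just described.

```python
import requests

# The User-Agent copied from the browser's developer tools (step 1).
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}

# Hypothetical listing URL; substitute a real page address from the site.
response = requests.get('https://fabiaoqing.com/biaoqing/lists/page/1.html',
                        headers=headers)

html = response.text     # page source as text, for parsing
raw = response.content   # the same body as raw bytes; .content is what
                         # we will use later when saving image files
```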

3. Use BS4 to extract page data

A page contains far more data than we need: the search box, ads, the ICP filing number, and so on. What if we only want the memes on the page? This is where a very handy third-party package comes in: BS4 (Beautiful Soup, installed with pip install beautifulsoup4).

Usage:

1. In your browser, press F12 to open the developer tools, click Elements, then click the element selector (the small arrow on the left), move the cursor over an image on the page, and left-click. The browser will highlight the HTML code in which the image lives

2. Look at the highlighted element to see which tag the current image uses; here it is an img tag

3. Parse the page source with BeautifulSoup, then call find_all('img', class_='ui image lazy'); it returns a list of image tags

4. Loop over that list and read each tag's data-original attribute to get the real image URL: for i in img_list: image_url = i['data-original']

5. Once every image tag on the page has been collected, download each image and save it in binary mode: with open('./' + image_name + file_suffix, 'wb') as f: f.write(requests.get(image_url, headers=headers).content). (A sketch of steps 3 and 4 follows this list; the download step appears in the full listing at the end of the article.)
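A minimal sketch of the extraction steps, assuming the ui image lazy class and data-original attribute identified above (the listing URL is again a hypothetical stand-in):

```python
import requests
from bs4 import BeautifulSoup   # pip install beautifulsoup4

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}

# Hypothetical listing URL; substitute a real page address from the site.
html = requests.get('https://fabiaoqing.com/biaoqing/lists/page/1.html',
                    headers=headers).text

# Step 3: parse the source and collect every image tag with the lazy-load class.
soup = BeautifulSoup(html, 'html.parser')
img_list = soup.find_all('img', class_='ui image lazy')

# Step 4: the real image address sits in each tag's data-original attribute.
for img in img_list:
    print(img['data-original'])
```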

At this point, a simple WeChat sticker-pack crawler is complete.

The source code

In the original article the full code was shown as a picture, so that you could retype it yourself once you understood it.
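Since that picture cannot be reproduced here, below is a minimal end-to-end sketch assembling steps 1 to 5. The listing URL and the file-naming scheme are assumptions for illustration, not the author's exact code.

```python
import os
import requests
from bs4 import BeautifulSoup

# Step 1: the request header copied from the browser's developer tools.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}

# Hypothetical listing URL; substitute a real page address from the site.
url = 'https://fabiaoqing.com/biaoqing/lists/page/1.html'

# Step 2: fetch the page source.
html = requests.get(url, headers=headers).text

# Step 3: extract every lazy-loaded image tag.
soup = BeautifulSoup(html, 'html.parser')
img_list = soup.find_all('img', class_='ui image lazy')

for img in img_list:
    # Step 4: the real image URL is in the data-original attribute.
    image_url = img['data-original']

    # Build a local file name from the tag's title and the URL's extension
    # (this naming scheme is an assumption, not the author's exact code).
    image_name = img.get('title', 'meme')
    suffix = os.path.splitext(image_url)[-1] or '.jpg'

    # Step 5: download the binary image data and write it to disk.
    image_data = requests.get(image_url, headers=headers).content
    with open('./' + image_name + suffix, 'wb') as f:
        f.write(image_data)
    print('saved', image_name + suffix)
```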