PySpider, the Python crawler framework, has been installed, though getting it installed took some effort.

1. Start the PySpider service
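pyspider all

This starts all of PySpider's components (scheduler, fetcher, processor, result worker and web UI). By default the web UI is then available at http://localhost:5000.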
2. Create a PySpider project

3. Overview of the project area

4. Start crawling from the Baidu home page

Enter the Baidu home page address and click Run to start crawling, then click one of the crawled links to go on to the next step.

Clicking any link crawls that page in turn.

The content extracted from the detail page is then returned.
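The click-through flow above (index page, then its links, then a detail page that yields a URL and a title) can be imitated offline to see what each step does. The sketch below is a standard-library stand-in, not PySpider's API: html.parser replaces PyQuery, an in-memory dict mapping URL to HTML replaces real fetching, and the names LinkAndTitleParser, parse and simulate_crawl are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect absolute http(s) links and the <title> text of one page.

    A stdlib stand-in for PyQuery's a[href^="http"] and 'title' selectors.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            href = dict(attrs).get("href") or ""
            # Same test as the CSS selector a[href^="http"]
            if href.startswith("http"):
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(html):
    """Parse one HTML string and return the filled-in parser."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser


def simulate_crawl(start_url, pages):
    """Follow the index page -> detail page flow over in-memory pages.

    `pages` maps URL -> HTML and replaces real HTTP fetching (hypothetical).
    """
    results = []
    for link in parse(pages[start_url]).links:   # index page step
        if link in pages:                         # detail page step
            detail = parse(pages[link])
            results.append({"url": link, "title": detail.title.strip()})
    return results


if __name__ == "__main__":
    pages = {
        "http://example.com/": '<a href="http://example.com/a">A</a> <a href="/rel">rel</a>',
        "http://example.com/a": "<html><head><title> Page A </title></head></html>",
    }
    print(simulate_crawl("http://example.com/", pages))
    # [{'url': 'http://example.com/a', 'title': 'Page A'}]
```

The relative link "/rel" is skipped because it does not start with "http", which is exactly what the a[href^="http"] selector does in the handler.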

5. What the code in the editing area does
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2021-04-10 11:24:26
# Project: test

from pyspider.libs.base_handler import *


# Handler class: pyspider calls its methods to crawl and process pages
class Handler(BaseHandler):
    # Global crawler configuration (a dict): default parameters for every self.crawl() call
    crawl_config = {
        'url': 'http://www.baidu.com'
    }

    # Program entry point; @every re-runs it every 24 * 60 minutes (once a day)
    @every(minutes=24 * 60)
    def on_start(self):
        # Seed URL; the fetched page is passed to index_page
        self.crawl('http://www.baidu.com', callback=self.index_page)

    # Callback that parses the index page.
    # age is in seconds: a crawled page is treated as fresh for 10 days
    # and is not crawled again within that period
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # response.doc is a PyQuery object, so CSS selectors can be used;
        # a[href^="http"] matches links whose href starts with "http"
        for each in response.doc('a[href^="http"]').items():
            # Crawl every matching link and hand each page to detail_page
            self.crawl(each.attr.href, callback=self.detail_page)

    # Callback that returns the result; priority=2 schedules these tasks
    # ahead of tasks with a lower priority (the default is 0)
    @config(priority=2)
    def detail_page(self, response):
        # The returned dict becomes the result for this page
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
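The two time-based decorators in the handler come down to simple arithmetic: 24 * 60 = 1440 minutes for @every, and 10 * 24 * 60 * 60 = 864000 seconds for age. The sketch below shows that arithmetic plus a hypothetical should_recrawl helper that applies the documented meaning of age (a page is treated as unchanged until age seconds have passed since its last crawl). It is an illustration, not PySpider's scheduler code; the >= boundary and the "negative age means never re-crawl" rule are assumptions modelled on PySpider's documented default of -1.

```python
import time

# Values used by the decorators in the handler above
EVERY_MINUTES = 24 * 60            # @every(minutes=24 * 60): once a day
AGE_SECONDS = 10 * 24 * 60 * 60    # @config(age=...): 10 days, in seconds


def should_recrawl(last_crawl_time, age, now=None):
    """Hypothetical helper: is a previously crawled page due again?

    last_crawl_time and now are Unix timestamps in seconds.
    Within `age` seconds of the last crawl the page is treated as unchanged.
    A negative age is treated here as "never re-crawl" (assumption).
    """
    if now is None:
        now = time.time()
    if age < 0:
        return False
    return now - last_crawl_time >= age


if __name__ == "__main__":
    print(EVERY_MINUTES, AGE_SECONDS)  # 1440 864000
    print(should_recrawl(0, AGE_SECONDS, now=AGE_SECONDS - 1))  # False
    print(should_recrawl(0, AGE_SECONDS, now=AGE_SECONDS))      # True
```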
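The effect of @config(priority=2) can be pictured with a priority queue: tasks with a higher priority are taken first. The toy PriorityTaskQueue below (a hypothetical name, built on heapq) pops higher-priority tasks first and keeps insertion order among equal priorities. PySpider's real scheduler is more involved, so treat this only as a model of the ordering, with 0 as the default priority.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Toy queue: higher `priority` is popped first, FIFO among equals.

    Models the documented meaning of @config(priority=...) only;
    it is not pyspider's scheduler.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, url, priority=0):
        # heapq is a min-heap, so store the negated priority
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = PriorityTaskQueue()
    queue.push("http://example.com/index", priority=0)   # like an index_page task
    queue.push("http://example.com/detail", priority=2)  # like a detail_page task
    print(queue.pop())  # http://example.com/detail
```

The counter in each heap entry means two tasks with the same priority never fall back to comparing URLs, so equal-priority tasks come out in the order they were added.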
