How to create a new crawler from a Scrapy template

The previous post covered how to create your first crawler project with the Scrapy framework. Today we take a closer look at the crawler project, using all the article pages of the Jobbole website (blog.jobbole.com) as the example.

After the Scrapy project has been created, Scrapy prompts us to create a spider directly from a template. Following the prompt, we first run the “cd article” command to enter the article folder, and then run the “scrapy genspider jobbole blog.jobbole.com” command, which creates a spider from the basic template that ships with Scrapy, as shown below.

The jobbole spider is created from the template as the module article.spiders.jobbole.

Of course, you don’t have to use the templates that come with Scrapy; you can also define custom spider templates. The built-in templates are generally sufficient, though.

Next, import the entire crawler project into PyCharm: click “File” → “Open” in the upper left, find the folder created for the crawler project, and click OK.

If you cannot see jobbole.py under the spiders folder in PyCharm, select the spiders folder, right-click it, and choose the “Synchronize” entry for that folder to refresh its contents. After that, jobbole.py shows up.

Click the jobbole.py file to view its contents, as shown below. The file already contains some Python code by default; it was created by copying the source template.

The file contains the name of the current spider (name), the domains the spider is allowed to crawl (allowed_domains), and the URLs the spider starts from (start_urls).
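To make the relationship between these three fields concrete, here is a minimal sketch in plain Python. It does not use Scrapy itself: the is_allowed helper is hypothetical and only approximates how Scrapy's offsite filtering treats allowed_domains (a listed domain and its subdomains pass, everything else is dropped). The exact start URL written by the template (scheme, trailing slash) depends on the Scrapy version, so the value below is an assumption.

```python
from urllib.parse import urlparse

# Values as described for the generated jobbole.py. The exact form of the
# start URL is an assumption; it varies between Scrapy versions.
name = "jobbole"
allowed_domains = ["blog.jobbole.com"]
start_urls = ["http://blog.jobbole.com/"]


def is_allowed(url, domains):
    """Hypothetical helper: True if the URL's host is one of `domains`
    or a subdomain of one of them (a simplified view of offsite filtering)."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


if __name__ == "__main__":
    # The start URL must fall inside allowed_domains, otherwise follow-up
    # requests to the same site would be filtered out.
    for url in start_urls:
        print(url, "allowed:", is_allowed(url, allowed_domains))
```

In a real spider these three values are class attributes of a scrapy.Spider subclass, and Scrapy applies the domain check to follow-up requests automatically.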

Finally, check the project’s Python interpreter: open Settings in PyCharm and type “Interpreter” in the search box to find the interpreter setting, as shown below.

If “Project Interpreter” shows an interpreter that is not the virtual environment of the current project, click the settings button to the right of “Project Interpreter”, as shown below.

Then click “Add Local” as shown below.

Find the Python interpreter that belongs to the project’s virtual environment and add it, as shown in the figure below.

Now that the virtual environment for the Scrapy crawler is created, the project is imported into PyCharm, and the interpreter is configured, we are ready to write the crawler logic and extract data.
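As a quick sanity check before writing any crawler logic, a small script like the following can be run from PyCharm to confirm which interpreter the project actually uses. This is only a sketch: the in_virtualenv function is a hypothetical helper, and it relies on the standard sys.prefix / sys.base_prefix comparison, with a fallback for older virtualenv releases that set sys.real_prefix.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

If the printed interpreter path is not inside the project's virtual environment, the interpreter setting in PyCharm still needs to be changed as described above.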

If you are interested in crawlers, you are welcome to visit my GitHub: https://github.com/cassieeric. If you like it, remember to give it a star.
