How to create a web crawler virtual environment on Windows, how to install Scrapy, and a summary of the common problems encountered while installing Scrapy (with their solutions) can all be viewed by clicking the link. The steps for creating the first Scrapy crawler project are as follows:



1. First, enter the virtual environment created earlier on Windows. Once in the environment, you can use the “pip list” command to check whether Scrapy was installed successfully, as shown in the following figure.
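For reference, a successful check looks roughly like the console sketch below; the prompt, the virtual environment name, and the package versions are only illustrative and will differ on your machine.

    rem illustrative session; package versions depend on your installation
    (scrapy_demo) D:\demo>pip list
    Package    Version
    ---------- -------
    Scrapy     1.5.1
    Twisted    18.7.0
    ...

    (scrapy_demo) D:\demo>scrapy version
    Scrapy 1.5.1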



You can see that Scrapy has been installed successfully.

2. Here I want to put the Scrapy project in the demo folder, so first go back up to the parent directory, as shown in the figure below.



3. Now create a new Scrapy project by entering the command “scrapy startproject article”, where article is the name of the crawler project and can be changed as you like. After entering the command, wait a moment while the project is created from the template in the scrapy\templates\project folder under your environment’s site-packages directory (for example, D:\pythonDemo\8SeptemberDemo\scrapy_demo\lib\site-packages\scrapy\templates\project on my machine; the exact path depends on your crawler environment), as shown below. Of course, we can also customize crawler templates, but for now the templates provided by the framework are more than enough to get our Scrapy project working.
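For reference, the command and the kind of confirmation message Scrapy prints look roughly like the sketch below; the drive letter, paths, and exact wording depend on your Scrapy version and environment.

    rem illustrative session; paths depend on your environment
    (scrapy_demo) D:\demo>scrapy startproject article
    New Scrapy project 'article', using template directory
        '...\scrapy_demo\lib\site-packages\scrapy\templates\project', created in:
        D:\demo\article

    You can start your first spider with:
        cd article
        scrapy genspider example example.com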



4. Enter “cd article” to move into the project directory, then “dir” to view its contents. You can also use “tree /f” to display the tree structure of the file directory. As shown in the following figure, you can see the files generated by the Scrapy creation command.
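Based on the file descriptions that follow, the “tree /f” output for a freshly generated project should look roughly like this (the demo path is illustrative, and any compiled .pyc files are omitted):

    (scrapy_demo) D:\demo>tree /f article
    D:\DEMO\ARTICLE
    │  scrapy.cfg
    │
    └─article
        │  items.py
        │  middlewares.py
        │  pipelines.py
        │  settings.py
        │  __init__.py
        │
        └─spiders
                __init__.py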



The article folder at the top level is named after the project.

The second layer contains an article folder with the same name as the project and a scrapy.cfg file. The article folder with the same name is the module to which all of the project’s code is added, and scrapy.cfg is the configuration file for the whole project.

The third layer has five files and a folder. __init__.py is an empty file that turns its parent directory into an importable Python module; items.py defines the objects to be stored and decides which items to crawl; middlewares.py contains the middleware, which is mainly responsible for handling requests and responses between related components and is generally not modified; pipelines.py defines how the crawled items are subsequently processed and stored; settings.py is the settings file of the project, which configures things such as how the item pipelines process data, the crawl frequency, and table names. The spiders folder holds the crawler body files (which implement the crawler logic) and another empty __init__.py file.
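To make items.py more concrete, here is a minimal sketch of how the stored objects could be defined; the ArticleItem class name follows Scrapy’s convention of deriving it from the project name, while the title and url fields are purely hypothetical examples of things one might crawl.

    # items.py - define the containers for the data you plan to crawl
    import scrapy


    class ArticleItem(scrapy.Item):
        # hypothetical example fields; replace them with the items you
        # actually decide to crawl in your own project
        title = scrapy.Field()
        url = scrapy.Field()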

5. You can also see the newly created Scrapy project in the corresponding folder under Windows, as shown below.



6. Of course, you can also use PyCharm to import the project files, which makes the structure clearer, as shown below.



7. Click on each project file to view its contents. The settings.py file is shown below; the other files are not described here.
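For reference, the top of a freshly generated settings.py usually contains entries along these lines; the commented-out values are illustrative examples of the pipeline and crawl-frequency settings mentioned above.

    # settings.py - project-wide configuration for the article project
    BOT_NAME = 'article'

    SPIDER_MODULES = ['article.spiders']
    NEWSPIDER_MODULE = 'article.spiders'

    # obey robots.txt rules (enabled by default in the generated file)
    ROBOTSTXT_OBEY = True

    # illustrative optional settings: crawl frequency and item pipelines
    # DOWNLOAD_DELAY = 3
    # ITEM_PIPELINES = {
    #     'article.pipelines.ArticlePipeline': 300,
    # }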



That wraps up the creation of the first Scrapy crawler project and the walkthrough of its files. Next comes the more advanced content of the Scrapy crawler project, so stay tuned~~