What does Gerapy framework do?

Gerapy integrates Scrapy projects written by crawler engineers into a Django web environment for unified back-end management. Put simply, it gives you an Admin dashboard that controls the crawler scripts you have written, runs targeted network data collection (at fixed times, at fixed intervals, or as one-off jobs), and provides simple project management. If you understand Django web development, it is easy to add Admin module features on top of this framework, for example reporting. The framework is beginner-friendly, easy to use, and efficient.

Watch out for these pits!

  1. Gerapy depends on Django 1.x. It is not compatible with Django 2.x, 3.x, or later environments. [Workaround](https://blog.csdn.net/qq_20288327/article/details/107971227)
  2. If you deploy on a server, you need to open the port for remote access; don't forget port 6800 (scrapyd's default).
  3. The Scrapy version on the remote server should match the version the project was built with.
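
Regarding pit 2: scrapyd only listens on localhost by default, so a Gerapy instance on another machine cannot reach it even if the firewall port is open. A minimal sketch of a `scrapyd.conf` that opens it up, using the option names from scrapyd's default configuration:

```ini
[scrapyd]
# Listen on all interfaces instead of the default 127.0.0.1,
# so Gerapy running on another machine can reach this scrapyd instance.
bind_address = 0.0.0.0
# scrapyd's default port; make sure your firewall/security group allows it.
http_port    = 6800
```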

Results

![](https://pic4.zhimg.com/v2-460ad6f90781b9e6a7efb7c8c67964bf_b.png)

![](https://pic1.zhimg.com/v2-12c0fc7e05b66b7a8946c03dc43df504_b.png)

![](https://pic1.zhimg.com/v2-4ac8f97c7e1a13122071307c007c6748_b.png)

![](https://pic1.zhimg.com/v2-aa7786acf9cefb490dc44780c8e755ac_b.png)

Deployment process

Project installation

Install Gerapy (mind your own Python version):

```
pip install gerapy
```
Install Scrapyd:

```
pip install scrapyd
```
Create a working directory (any name you like) and run the initialization command there on the command line:

```
gerapy init
```
Initialize the database:

```
cd gerapy
gerapy migrate
```
![](https://pic3.zhimg.com/v2-f1a4a3180f500a7377d899c36838bd3a_b.png)

Create a superuser (remember the username and password):

```
gerapy initadmin
```
Then you can start the service (you can specify the address and port):

```
gerapy runserver 0.0.0.0:8000
```

Also start scrapyd, either with `scrapyd` on your PATH or via its full path, e.g. `/usr/local/python3/bin/scrapyd`.
Open the admin platform in your browser; for example, locally:

```
http://127.0.0.1:8000
```
There is a small pit here: many online tutorials skip the step of creating a username and password, but you cannot log in without creating a user. Create a superuser (both the username and password default to admin), then log in to the management platform and change the password.

```
gerapy initadmin
```

Host management

  1. Host name (machine name: anything that distinguishes it)
  2. Host IP address
  3. Host port (6800 by default)
  4. Authentication: for now the host connects whether the username/password is correct or not
![](https://pic1.zhimg.com/v2-b8849480c6a52e794a7b4cac0594bc88_b.png)
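
To sanity-check a host entry before adding it, you can query scrapyd's `daemonstatus.json` health-check endpoint yourself. A minimal sketch (the host IP is whatever you configured above):

```python
import json
from urllib.request import urlopen

def scrapyd_status_url(host: str, port: int = 6800) -> str:
    """Build the URL of scrapyd's daemonstatus.json endpoint."""
    return f"http://{host}:{port}/daemonstatus.json"

def scrapyd_is_up(host: str, port: int = 6800) -> bool:
    """Return True if the scrapyd instance answers with status 'ok'."""
    try:
        with urlopen(scrapyd_status_url(host, port), timeout=5) as resp:
            return json.loads(resp.read()).get("status") == "ok"
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, or non-JSON response.
        return False
```

If this returns False for a remote machine, recheck pit 2 above: port 6800 must be open and scrapyd must be bound to 0.0.0.0 rather than 127.0.0.1.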

Project management

  1. Copy your Scrapy project into gerapy's projects directory, and it will show up directly on the Projects page.
  2. Deploy
![](https://pic2.zhimg.com/v2-4d36d9a99f37e06f101a078f08701769_b.png)

![](https://pic2.zhimg.com/v2-3e8077dea99c0c7f3653fde8bfc470b9_b.png)

Task management

Creating a task involves a name (user-defined), a project (matching one under project management), and a crawler (an individual spider file). Task execution is configured as: host + scheduling mode + run time + time zone. In mainland China, select Asia/Hong_Kong.

![](https://pic3.zhimg.com/v2-446e7ea8948fde6ee360a386d2084a26_b.png)
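
The time-zone choice matters because the scheduled run time is interpreted in the zone you pick. A quick sketch of what a 09:00 Asia/Hong_Kong run time means in UTC (the date is just an example):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# A hypothetical daily run configured for 09:00 in the Asia/Hong_Kong zone.
local_run = datetime(2023, 5, 1, 9, 0, tzinfo=ZoneInfo("Asia/Hong_Kong"))

# Hong Kong is UTC+8 year-round (no daylight saving), so this is 01:00 UTC.
utc_run = local_run.astimezone(ZoneInfo("UTC"))
print(utc_run.isoformat())  # 2023-05-01T01:00:00+00:00
```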

You can also write an automated script based on requests captured with a crawler simulator. Creating around 100 tasks a week by hand is simply too tedious; I tried inserting tasks directly into the SQLite database, but those tasks would not execute, and I don't know why.