What does Gerapy framework do?

Gerapy integrates Scrapy projects written by crawler engineers into a Django web environment for unified back-end management. Put simply, it gives you an Admin dashboard that controls the crawler scripts you have written, runs targeted network data collection (at fixed times, at fixed intervals, or as one-off jobs), and provides simple project management. If you understand Django web development, it is easy to add Admin module features on top of this framework, for example reporting. The framework is beginner-friendly, easy to use, and efficient.

Watch out for these pits!

  1. Gerapy depends on Django 1.x. It is not compatible with Django 2.x, 3.x, or later environments. [Workaround](https://blog.csdn.net/qq_20288327/article/details/107971227)
  2. If you deploy on a server, you need to open the port for remote access; don't forget port 6800 (scrapyd's default).
  3. The Scrapy version on the remote server should match the version the project was built with.
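
Regarding pit 2: scrapyd only listens on localhost by default, so a Gerapy instance on another machine cannot reach it even if the firewall port is open. A minimal sketch of a `scrapyd.conf` that opens it up, using the option names from scrapyd's default configuration:

```ini
[scrapyd]
# Listen on all interfaces instead of the default 127.0.0.1,
# so Gerapy running on another machine can reach this scrapyd instance.
bind_address = 0.0.0.0
# scrapyd's default port; make sure your firewall/security group allows it.
http_port    = 6800
```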

Results

![](https://pic4.zhimg.com/v2-460ad6f90781b9e6a7efb7c8c67964bf_b.png)

![](https://pic1.zhimg.com/v2-12c0fc7e05b66b7a8946c03dc43df504_b.png)

![](https://pic1.zhimg.com/v2-4ac8f97c7e1a13122071307c007c6748_b.png)

![](https://pic1.zhimg.com/v2-aa7786acf9cefb490dc44780c8e755ac_b.png)

Deployment process

Project installation

Install Gerapy (mind your own Python version):

```
pip install gerapy
```
Install Scrapyd:

```
pip install scrapyd
```
Create a working directory (any name you like) and run the initialization command there on the command line:

```
gerapy init
```
Initialize the database:

```
cd gerapy
gerapy migrate
```
![](https://pic3.zhimg.com/v2-f1a4a3180f500a7377d899c36838bd3a_b.png)

Create a superuser (remember the username and password):

```
gerapy initadmin
```
Then you can start the service (you can specify the address and port):

```
gerapy runserver 0.0.0.0:8000
```

Also start scrapyd, either with `scrapyd` on your PATH or via its full path, e.g. `/usr/local/python3/bin/scrapyd`.
Open the admin platform in your browser; for example, locally:

```
http://127.0.0.1:8000
```
There is a small pit here: many online tutorials skip the step of creating a username and password, but you cannot log in without creating a user. Create a superuser (both the username and password default to admin), then log in to the management platform and change the password.

```
gerapy initadmin
```

Host management

  1. Host name (machine name: anything that distinguishes it)
  2. Host IP address
  3. Host port (6800 by default)
  4. Authentication: for now the host connects whether the username/password is correct or not
![](https://pic1.zhimg.com/v2-b8849480c6a52e794a7b4cac0594bc88_b.png)
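
To sanity-check a host entry before adding it, you can query scrapyd's `daemonstatus.json` health-check endpoint yourself. A minimal sketch (the host IP is whatever you configured above):

```python
import json
from urllib.request import urlopen

def scrapyd_status_url(host: str, port: int = 6800) -> str:
    """Build the URL of scrapyd's daemonstatus.json endpoint."""
    return f"http://{host}:{port}/daemonstatus.json"

def scrapyd_is_up(host: str, port: int = 6800) -> bool:
    """Return True if the scrapyd instance answers with status 'ok'."""
    try:
        with urlopen(scrapyd_status_url(host, port), timeout=5) as resp:
            return json.loads(resp.read()).get("status") == "ok"
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, or non-JSON response.
        return False
```

If this returns False for a remote machine, recheck pit 2 above: port 6800 must be open and scrapyd must be bound to 0.0.0.0 rather than 127.0.0.1.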

Project management

  1. Copy your Scrapy project into gerapy's projects directory, and it will show up directly on the Projects page.
  2. Deploy
![](https://pic2.zhimg.com/v2-4d36d9a99f37e06f101a078f08701769_b.png)

![](https://pic2.zhimg.com/v2-3e8077dea99c0c7f3653fde8bfc470b9_b.png)

Task management

Creating a task involves a name (user-defined), a project (matching one under project management), and a crawler (an individual spider file). Task execution is configured as: host + scheduling mode + run time + time zone. In mainland China, select Asia/Hong_Kong.

![](https://pic3.zhimg.com/v2-446e7ea8948fde6ee360a386d2084a26_b.png)
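
The time-zone choice matters because the scheduled run time is interpreted in the zone you pick. A quick sketch of what a 09:00 Asia/Hong_Kong run time means in UTC (the date is just an example):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# A hypothetical daily run configured for 09:00 in the Asia/Hong_Kong zone.
local_run = datetime(2023, 5, 1, 9, 0, tzinfo=ZoneInfo("Asia/Hong_Kong"))

# Hong Kong is UTC+8 year-round (no daylight saving), so this is 01:00 UTC.
utc_run = local_run.astimezone(ZoneInfo("UTC"))
print(utc_run.isoformat())  # 2023-05-01T01:00:00+00:00
```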

You can also write an automated script based on requests captured with a crawler simulator. Creating around 100 tasks a week by hand is simply too tedious; I tried inserting tasks directly into the SQLite database, but those tasks would not execute, and I don't know why.