Gerapy

Prerequisite: the Scrapyd service is running and the project has been deployed to Scrapyd.
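
A quick way to confirm Scrapyd is up is to query its status endpoint (adjust the host and port to your environment; 6800 is Scrapyd's default):

curl http://127.0.0.1:6800/daemonstatus.json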

1. Install

Install Gerapy from the command line with a single command.

pip install gerapy


2. Initialize the service

Run the following commands to install Gerapy, initialize the working directory, and create a user.

Install

pip install gerapy

Initialize

Execute the following command to create a new directory that will serve as Gerapy's working directory.

gerapy init

A Gerapy folder is generated in the current directory.
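
At this point the working directory looks roughly like this; projects is where Scrapy projects will live, and the SQLite database file will be added by the migrate step below:

gerapy
└── projects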

Create the database

Go into the gerapy folder and run the following command to generate an SQLite database in the directory and create the data tables.

gerapy migrate

SQLite here is an embedded database; another embedded database, BerkeleyDB, will also come up later. On Linux, if the installed SQLite version is too old, an error message is displayed, and you will need to install a newer version of SQLite.
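
One quick way to check which SQLite version your Python is linked against before migrating:

python -c "import sqlite3; print(sqlite3.sqlite_version)"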

The detailed steps for that upgrade are not written here, leaving you a little room to explore on your own.

Create a user

Execute the following command to create an administrative user.

gerapy createsuperuser

After this command is executed, you are prompted to enter a username, email address, and password.
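
Gerapy is built on Django, so the prompts follow Django's standard createsuperuser flow; a typical session looks roughly like this (admin and the email address are just examples):

gerapy createsuperuser
Username (leave blank to use 'root'): admin
Email address: admin@example.com
Password:
Password (again):
Superuser created successfully.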

3. Start Gerapy

In the gerapy working directory, run the following command to start the service:

Foreground mode

gerapy runserver 0.0.0.0:8888

Background mode

gerapy runserver 0.0.0.0:8888 > /dev/null 2>&1 &

Three points are emphasized here:

  1. If you want access from the public network, bind to 0.0.0.0 at startup.
  2. The default port is 8000, but I changed it to 8888.
  3. Use foreground mode for testing and background mode for production.

Visit port 8888 in a browser to reach the login page.

Enter the username and password to log in, and you land on the host management menu.

4. Menu introduction

Host management

Host management provides a web interface for operating the crawlers deployed on the Scrapyd service.

Click the Create button in the upper right corner, fill in the IP address and port of the Scrapyd service, and click Save.

When the status shows normal, click the Schedule button to see the list of crawlers in the Scrapy project.

Click the Run button next to a crawler to run it.
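
Behind the interface, Gerapy is essentially driving Scrapyd's HTTP JSON API. For reference, scheduling a spider by hand looks like this (host, project, and spider names are placeholders):

curl http://127.0.0.1:6800/schedule.json -d project=myproject -d spider=myspider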

Project management

Gerapy's working directory contains an empty projects folder, which is where Scrapy projects live. As long as a project's files are in the projects folder, you can deploy it without running the scrapyd-deploy tool yourself.

There are three main ways to get a project there (a shell sketch follows the list):

  1. Move or copy the local Scrapy project directly into the projects folder.
  2. Clone or download a remote project into the projects folder, for example with git clone.
  3. Link the project into the projects folder with a symbolic link (the ln command on Linux and Mac, the mklink command on Windows).
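
A minimal sketch of the three options, assuming the working directory is ~/gerapy and the project is called myspider (both placeholders):

# 1. Move or copy a local project
cp -r ~/code/myspider ~/gerapy/projects/

# 2. Clone a remote project
git clone https://github.com/example/myspider.git ~/gerapy/projects/myspider

# 3. Symlink an existing project (Linux/Mac)
ln -s ~/code/myspider ~/gerapy/projects/myspider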

Here I compressed the local Scrapy project into a zip archive and uploaded it through the interface.

Once the upload is complete, click the Deploy button to package the project and deploy it to a remote host.

Once deployed, you can see the new Scrapy project on the remote host.
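
You can also verify the deployment from the command line by asking Scrapyd directly (host and port are placeholders):

curl http://127.0.0.1:6800/listprojects.json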

You can also edit the project online by clicking the Edit button.

If you look at the Gerapy working directory after the deployment is complete, you can see the newly deployed Scrapy project under projects.

Task management

Scheduled tasks are configured in the Task Management menu. For example, you can create a task in crontab mode that runs every minute.
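
If crontab notation is new to you, the five fields are minute, hour, day of month, month, and day of week; a few example schedules:

* * * * *      # every minute
*/10 * * * *   # every 10 minutes
0 2 * * *      # every day at 02:00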

Click Edit to modify the scheduled task.

Conclusion

This article mainly describes how a Scrapy project, with the support of Scrapyd and Gerapy, can ultimately be operated entirely through a web interface. This is one of the reasons I think the Scrapy ecosystem is superior to hand-written crawlers.

So, crawlers aren't just about crawling data. When you're bored, dig deep and you'll find something interesting in your own field.