There is a tool called scrapyd-client that completes the deployment process for us. This section introduces how to deploy a Scrapy project using scrapyd-client.

1. Preparation

Make sure scrapyd-client is installed correctly.

2. Scrapyd-client's Features

Scrapyd-client provides the following features to facilitate the deployment of Scrapy projects:

  • Package the project as an Egg file.

  • Deploy the packaged Egg file to Scrapyd via the addversion.json interface.

Scrapyd-client does all of this for us, so we don't have to worry about how the Egg file is generated, or about reading the Egg file and uploading it through the interface. We just need to run a single command to deploy with one click.
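Under the hood, the upload is just a multipart POST to Scrapyd's addversion.json interface. A minimal sketch of the request scrapyd-deploy assembles (the function name is hypothetical; the project, version, and egg fields are the ones addversion.json expects):

```python
# Hypothetical helper sketching the request scrapyd-deploy builds:
# a POST to <server>/addversion.json carrying the project name,
# version, and the packaged Egg file as a multipart upload.
def build_addversion_request(server_url, project, version):
    return {
        "url": server_url.rstrip("/") + "/addversion.json",
        # Form fields sent alongside the packaged Egg file
        "data": {"project": project, "version": str(version)},
    }

req = build_addversion_request("http://120.27.34.25:6800/", "weibo", 1501682277)
print(req["url"])  # http://120.27.34.25:6800/addversion.json
```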

3. Deploying with scrapyd-client

To deploy the Scrapy project, we first need to modify the project's configuration file. At the top level of a Scrapy project there is a scrapy.cfg file, which reads as follows:

[settings]
default = weibo.settings

[deploy]
#url = http://localhost:6800/
project = weibo

We need to configure the deploy section here. For example, to deploy the project to the Scrapyd running on 120.27.34.25, we would make the following changes:

[deploy]
url = http://120.27.34.25:6800/
project = weibo

Then switch to the directory containing scrapy.cfg and run the scrapyd-deploy command:

scrapyd-deploy

The running results are as follows:

Packing version 1501682277
Deploying to project "weibo" in http://120.27.34.25:6800/addversion.json
Server response (200):
{"status": "ok", "spiders": 1, "node_name": "datacrawl-vm", "project": "weibo", "version": "1501682277"}

The returned result indicates that the deployment was successful.
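Since the response body is plain JSON, a deployment script can check success programmatically rather than by eye. A small sketch, using the response shown above:

```python
import json

# The JSON body returned by Scrapyd's addversion.json (from the run above)
body = ('{"status": "ok", "spiders": 1, "node_name": "datacrawl-vm", '
        '"project": "weibo", "version": "1501682277"}')

result = json.loads(body)
# Any status other than "ok" means the deployment failed
assert result["status"] == "ok"
print(result["project"], result["version"])  # weibo 1501682277
```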

The project version defaults to the current timestamp. We can also specify the version explicitly by passing the version parameter. For example:

scrapyd-deploy --version 201707131455

Note that in Scrapyd 1.2.0 running under Python 3, version numbers cannot be strings containing letters; they must be pure numbers, or an error will occur.
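The default version is simply the Unix timestamp at deploy time, which is why it is always a pure number. A sketch of that default, together with a numeric-only check matching the Scrapyd 1.2.0 restriction (function names are illustrative, not scrapyd-client's API):

```python
import time

def default_version():
    # scrapyd-deploy's default: the current Unix timestamp, e.g. 1501682277
    return str(int(time.time()))

def is_valid_version(version):
    # Under Python 3, Scrapyd 1.2.0 rejects versions containing letters
    return version.isdigit()

assert is_valid_version(default_version())
assert is_valid_version("201707131455")   # pure number: accepted
assert not is_valid_version("v1.0")       # contains letters: rejected
```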

If there are multiple hosts, we can configure an alias for each host by modifying the configuration file as follows:

[deploy:vm1]
url = http://120.27.34.24:6800/
project = weibo

[deploy:vm2]
url = http://139.217.26.30:6800/
project = weibo

This configures multiple hosts in one place: each host corresponds to one group of settings, with the host's alias appended after deploy:. To deploy the project to the vm2 host at 139.217.26.30, we only need to run the following command:

scrapyd-deploy vm2

This way, to deploy to any number of hosts we only need to configure them in the scrapy.cfg file and then call scrapyd-deploy with the host alias.
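Since scrapy.cfg is a standard INI file, the deploy targets can be read with Python's built-in configparser, which is roughly how scrapyd-deploy resolves an alias to a URL. A sketch, with the multi-host configuration above inlined as a string for illustration:

```python
import configparser

# Inline copy of the multi-host scrapy.cfg shown above
cfg_text = """
[deploy:vm1]
url = http://120.27.34.24:6800/
project = weibo

[deploy:vm2]
url = http://139.217.26.30:6800/
project = weibo
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)

# Collect every [deploy:<alias>] section into an alias -> url map
targets = {
    section.split(":", 1)[1]: parser[section]["url"]
    for section in parser.sections()
    if section.startswith("deploy:")
}
print(targets["vm2"])  # http://139.217.26.30:6800/
```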

If Scrapyd has access restrictions set up, we can add the username and password to the configuration file and change the port to the Nginx proxy port. For example, in Chapter 1 we used port 6801, so we change the port here to 6801, as follows:

[deploy:vm1]
url = http://120.27.34.24:6801/
project = weibo
username = admin
password = admin

[deploy:vm2]
url = http://139.217.26.30:6801/
project = weibo
username = germey
password = germey

With the username and password fields added, Basic auth is handled automatically at deployment time, and the deployment succeeds.
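With a username and password configured, the deployment request carries a standard HTTP Basic auth header, i.e. "Basic " followed by base64 of "username:password". A sketch of what that header looks like (the helper function is illustrative, not part of scrapyd-client):

```python
import base64

def basic_auth_header(username, password):
    # HTTP Basic auth: "Basic " + base64("username:password")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("admin", "admin"))
# {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```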

4. Conclusion

This section described how to deploy projects to Scrapyd easily using scrapyd-client, which makes deployment much less of a hassle.


This article was first published on Cui Qingcai's personal blog as part of the Python 3 web crawler development tutorial.
