Disclaimer: This article is for study and research only and must not be used for illegal purposes. If anything here infringes on your rights, please let me know and it will be removed. Thank you!

Project scenario:

When we first start working with crawlers, we may have only a few, so deploying and scheduling them by hand is manageable. Over time, however, they accumulate from ten to a hundred, and once those hundred crawlers finish we have to restart them manually, which is very tedious; checking their output logs one by one is just as bad. A tool for quick deployment, task scheduling, and log viewing becomes a must. Here we use the Scrapyd deployment service together with the SpiderKeeper visual crawler management UI to do this.

Module Overview:

Scrapy: an open-source web crawler framework written in Python, designed to crawl websites and extract structured data.

pip install scrapy

Scrapyd: a service for running Scrapy crawlers, allowing you to deploy projects and control your spiders through an HTTP JSON API.

pip install scrapyd

Scrapyd-client: a client for Scrapyd that lets you deploy projects to a Scrapyd server; it can also build an egg file of your project.

pip install scrapyd-client

SpiderKeeper: a visual crawler management UI that lets you schedule periodic runs and view run statistics.

pip install SpiderKeeper
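
For convenience, the four packages above can also be installed in one command; this sketch assumes they all go into the same Python environment (a virtualenv is a reasonable choice):

pip install scrapy scrapyd scrapyd-client SpiderKeeper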


Solution:


1. Create a new crawler project (scrapy startproject mySpider), then enter the mySpider directory and create a spider for www.baidu.com.
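
A rough sketch of this step; the spider name baidu is only an assumed example, since the original text gives just the domain www.baidu.com:

scrapy startproject mySpider
cd mySpider
scrapy genspider baidu www.baidu.com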



2. Modify scrapy.cfg and add a deployment name, my, after deploy.
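
A minimal scrapy.cfg sketch for this step; the url and project values are assumptions based on the defaults used later in this article (Scrapyd on localhost:6800, project name myspider):

[settings]
default = mySpider.settings

[deploy:my]
url = http://localhost:6800/
project = myspider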

3. Start scrapyd
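
Running scrapyd in the project's environment is a single command, and it listens on port 6800 by default. If you want it to keep running after the terminal is closed, something like the following works (the log file name is arbitrary):

nohup scrapyd > scrapyd.log 2>&1 &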


4. Under the mySpider directory, upload our crawler project: scrapyd-deploy my -p myspider
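
To double-check that the project arrived, you can query Scrapyd's listprojects.json endpoint (an optional verification step, not part of the original walkthrough):

curl http://127.0.0.1:6800/listprojects.json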



5. The status returned after the upload should be "ok". Then run the spider: curl http://127.0.0.1:6800/schedule.json -d project=myspider -d spider=<spider name created in step 1>
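
Once a run is scheduled, its state (pending/running/finished) can be inspected through Scrapyd's listjobs.json endpoint, and a running job can be stopped with cancel.json; the job id below is a placeholder:

curl http://127.0.0.1:6800/listjobs.json?project=myspider
curl http://127.0.0.1:6800/cancel.json -d project=myspider -d job=<job id>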



6. A status of "ok" indicates the run was scheduled successfully. Next we want to see it in the SpiderKeeper UI, so start SpiderKeeper and point it at http://localhost:6800: spiderkeeper --server=http://localhost:6800
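
As with scrapyd, SpiderKeeper can be kept alive in the background; a minimal sketch using only the flag shown above (the log file name is arbitrary):

nohup spiderkeeper --server=http://localhost:6800 > spiderkeeper.log 2>&1 &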



7. After startup, open http://<server IP>:5000 to reach the SpiderKeeper management page. The default username and password are both admin.



8. Click Create Project to create the project, then generate the egg file: scrapyd-deploy --build-egg output.egg. If the build output ends without errors, the egg was generated successfully.



9. Then upload the egg file on the SpiderKeeper page.





10. Click Submit, then click Projects and select the project we just created.

Conclusion:

At this point, our Scrapy project is successfully deployed. If your crawler code is updated later, you only need to re-upload the project to Scrapyd: scrapyd-deploy <deployment name> -p <project name>
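
For example, with the names used in this article (the spider name is a placeholder), a redeploy plus a fresh run looks like:

scrapyd-deploy my -p myspider
curl http://127.0.0.1:6800/schedule.json -d project=myspider -d spider=<spider name>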
Reference link: zhuanlan.zhihu.com/p/63302475