After three months of development and testing, the Digger crawler platform is now officially open source on GitHub.

What is Digger?

Digger is a configurable, distributed, cross-platform crawler written in pure Go, and you can write JavaScript plug-ins to extend it however you need. Digger and its related components run with minimal resource overhead on a variety of inexpensive servers and development boards, such as the Raspberry Pi. Digger has no complex dependencies and is very simple to deploy. It runs on Linux and Windows, and currently supports the AMD64, ARM, and ARM64 CPU architectures.

Project repository: Github.com/hetianyi/di…
Documentation: docs.diggerit.me/
Online demo: demo.diggerit.me/

You can quickly try out the features in the demo environment.

Because resources are limited, please use the demo environment considerately. A scheduled task clears its data at 00:00 every day.

Features

  • CSS and XPath selectors
  • Multiple result types: plain text, HTML, and array
  • Web-based crawler configuration editor
  • Online debugging of crawler configurations to pinpoint problems precisely
  • Plug-in support
  • Real-time crawler log viewing
  • Online browsing and export of results, plus one-click database schema generation (Postgres and MySQL)
  • Scheduled tasks
  • Pausing tasks
  • Distributed worker instances, which help keep crawlers from being blocked
  • Label-based scheduling that matches tasks to workers
  • Configuration import and export
  • Email notifications
  • DingTalk notifications (TODO)
  • DiggerHub for sharing crawler configurations (TODO)
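
To give a feel for how matching tasks to worker labels might work, here is a minimal Go sketch. It is not taken from Digger's source. The `Worker` and `Task` types, the `matches` and `eligibleWorkers` functions, and the label keys `arch` and `region` are all hypothetical names made up for illustration. The assumption is simply that a task may run on any worker that carries every label the task asks for.

```go
package main

import "fmt"

// Worker is a hypothetical worker instance described by key/value labels.
// (Illustrative only; not Digger's actual data model.)
type Worker struct {
	Name   string
	Labels map[string]string
}

// Task is a hypothetical crawl task that lists the labels it requires.
type Task struct {
	Name     string
	Required map[string]string
}

// matches reports whether the worker carries every label the task requires.
// A task with no required labels matches every worker.
func matches(t Task, w Worker) bool {
	for key, want := range t.Required {
		if got, ok := w.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// eligibleWorkers returns the names of the workers that may run the task,
// preserving the order of the input slice.
func eligibleWorkers(t Task, workers []Worker) []string {
	var names []string
	for _, w := range workers {
		if matches(t, w) {
			names = append(names, w.Name)
		}
	}
	return names
}

func main() {
	workers := []Worker{
		{Name: "pi-1", Labels: map[string]string{"arch": "arm64", "region": "home"}},
		{Name: "vps-1", Labels: map[string]string{"arch": "amd64", "region": "eu"}},
	}
	task := Task{Name: "news", Required: map[string]string{"region": "eu"}}

	fmt.Println(eligibleWorkers(task, workers)) // prints [vps-1]
}
```

With a rule like this, a task that must leave from a particular network or run on particular hardware can be pinned to the right machines, while unlabeled tasks remain free to run anywhere.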

Project screenshots

Enjoy your crawling!