We have previously used Python + Scrapy to build crawlers. This article introduces Colly, a crawler framework for Go.

Introduction to the framework

Colly is a fast and elegant crawler framework for Gophers. It provides a clean interface for writing any kind of crawler, making it easy to extract structured data from websites for applications such as data mining, data processing, or archiving.

Framework features

Let’s look at some of the benefits of this framework:

  • Clean API
  • Fast (a single core can handle over 1K requests per second)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-Unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensible

Installation

There are two ways to install Colly.

Method 1

Add the dependency to your go.mod file:

module github.com/k8scat/spider

go 1.14

require (
    github.com/gocolly/colly/v2 latest
)

Method 2

Use the standard way of installing a Go module:

go get -u github.com/gocolly/colly/v2

A simple case

Now let’s try this framework out:

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}

Conclusion

Overall, writing crawlers in Go feels great. Colly is well worth a try!