We have previously used Python + Scrapy to build crawlers. This article introduces Colly, a crawler framework for Go.

Introduction to the framework

Colly is a fast and elegant crawler framework for Gophers. It provides a clean interface for writing any kind of crawler, making it easy to extract structured data from websites for applications such as data mining, data processing, or archiving.

Framework features

Let’s look at some of the benefits of this framework:

  • Clean API
  • Fast (a single core can handle over 1K requests per second)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-Unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensible

Installation

There are two ways to install Colly.

Method 1

Add the dependency to your go.mod file:

module github.com/k8scat/spider

go 1.14

require (
    github.com/gocolly/colly/v2 latest
)

Method 2

Use the standard way of installing a Go module:

go get -u github.com/gocolly/colly/v2

A simple case

Now let’s try this framework out:

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}

Conclusion

Overall, writing crawlers in Go feels great. Colly is well worth a try!