Web scraping with Golang – Go and Colly


Web scraping is a technique for extracting data from websites. It automates the process of visiting a page, parsing its HTML or XML markup, and pulling out the desired information. Go is a popular language for this task, and in this tutorial we will explore how to use the Go library Colly to perform web scraping.


Installation

To use Colly in your Golang project, you need to install it using the following command:

```shell
go get -u github.com/gocolly/colly/...
```

Creating a basic Colly scraper

To create a basic Colly scraper, first import the Colly library in your Go file:

```go
import (
    "fmt"

    "github.com/gocolly/colly"
)
```

After importing the library, you can create a new Colly collector and register callback functions that run when certain events occur. In the example below, we register a callback that runs just before each request is sent:


```go
package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    c.Visit("http://go-colly.org/")
}
```

The NewCollector() function creates a new Colly collector. The OnRequest() function registers a callback that runs before each request is made; here we simply print the URL being visited. Finally, we call Visit() to start the scraping process.

Extracting data from a website

To extract data from a website, register callback functions that run when certain HTML elements are encountered. In the example below, we register a callback that runs for every div element with the class post:

```go
package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    c.OnHTML("div.post", func(e *colly.HTMLElement) {
        fmt.Println(e.ChildText("h2"))
        fmt.Println(e.ChildText("p"))
    })

    c.Visit("http://go-colly.org/")
}
```

The OnHTML() function registers a callback that runs for each HTML element matching the given CSS selector. In this example, we match div elements with the class post, and the callback extracts the text content of the h2 and p elements inside each match.

Following links

In some cases, you may want to follow links on a page to scrape additional data. To do this, register a callback that runs when a link is encountered, as shown below:

```go
package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        // Resolve relative links against the current page before visiting.
        c.Visit(e.Request.AbsoluteURL(link))
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    c.Visit("http://go-colly.org/")
}
```

In this example, we register a callback that runs for every a element with an href attribute. The callback extracts the value of the href attribute, resolves it to an absolute URL with AbsoluteURL(), and calls Visit() to follow the link. We also register an OnRequest() callback so each visited URL is printed. Colly skips URLs it has already visited by default, so the scraper will not loop over the same pages forever.
