Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.


Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.

  • JS Rendering
  • 5,000+ Requests/Sec
  • Caching (Memory/Disk/LevelDB)
  • Automatic Data Exporting (JSON, CSV, or custom)
  • Metrics (Prometheus, Expvar, or custom)
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Cookies, Middlewares, robots.txt
  • Automatic response decoding to UTF-8

See scraper Options for all custom settings.


We highly recommend using Geziyor with Go modules.


This example extracts all quotes from the start URL and exports them to a JSON file.

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{""},
        ParseFunc: quotesParse,
        Exporters: []export.Exporter{&export.JSON{}},
    }).Start()
}

// quotesParse exports every quote on the page, then follows the matched link, if any.
func quotesParse(g *geziyor.Geziyor, r *client.Response) {
    r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
        g.Exports <- map[string]interface{}{
            "text":   s.Find("span.text").Text(),
            "author": s.Find("").Text(),
        }
    })
    if href, ok := r.HTMLDoc.Find(" > a").Attr("href"); ok {
        g.Get(r.JoinURL(href), quotesParse)
    }
}

See tests for more usage examples.
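
The pagination step in the example relies on r.JoinURL to turn a possibly relative href into an absolute URL before requesting it. The sketch below shows that kind of resolution with the standard library only, assuming the usual URL reference rules apply. The function name resolveHref and the example.com URLs are illustrative and not part of Geziyor.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveHref resolves href against the URL of the page it was found on.
// It is a hypothetical stand-in for illustration, not Geziyor's JoinURL.
func resolveHref(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"/page/2/", "next.html", "https://other.example/x"} {
		abs, err := resolveHref("http://example.com/page/1/", href)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

Absolute hrefs pass through unchanged, so the same call works for both relative and absolute links.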



go get -u

If you want to make JS rendered requests, make sure you have Chrome installed.

NOTE: macOS limits the maximum number of open file descriptors. If you want to make more than 256 concurrent requests, you need to raise this limit. Read this for more.
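
To see what limit a process actually runs with, the descriptor limits can be read from Go itself. This is a minimal sketch for Unix-like systems using the standard syscall package. The helper name openFileLimits and the 1024 threshold are choices made for this example only.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimits reports the soft and hard limits on open file descriptors
// for the current process (Unix-like systems only).
func openFileLimits() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimits()
	if err != nil {
		fmt.Println("could not read limit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
	if soft < 1024 { // 1024 is an arbitrary threshold chosen for this sketch
		fmt.Println("consider raising the limit before running many concurrent requests")
	}
}
```

The soft limit is the one that applies in practice. It can usually be raised up to the hard limit, for example with `ulimit -n` in the shell that starts the crawler.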

Making Normal Requests

Initial requests start from the StartURLs []string field in Options. Geziyor makes concurrent requests to those URLs. After reading each response, it calls ParseFunc func(g *Geziyor, r *Response).

    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{""},
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) { /* ... */ },
    }).Start()
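
The flow described above (request every start URL concurrently, then hand each response to a parse callback) can be pictured with the standard library alone. This is a simplified sketch of the idea, not Geziyor's implementation. fetchAll, demo and the local test server are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// fetchAll requests every URL concurrently and calls parse once per
// successfully read response. parse may run on several goroutines at once.
func fetchAll(urls []string, parse func(url string, body []byte)) {
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := http.Get(u)
			if err != nil {
				return // a real crawler would log or retry here
			}
			defer resp.Body.Close()
			body, err := io.ReadAll(resp.Body)
			if err != nil {
				return
			}
			parse(u, body)
		}(u)
	}
	wg.Wait()
}

// demo fetches two paths from a local test server and returns path -> body.
func demo() map[string]string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "hello from %s", r.URL.Path)
	}))
	defer srv.Close()

	var mu sync.Mutex // the callback runs concurrently, so guard the map
	got := map[string]string{}
	fetchAll([]string{srv.URL + "/a", srv.URL + "/b"}, func(u string, body []byte) {
		mu.Lock()
		defer mu.Unlock()
		got[strings.TrimPrefix(u, srv.URL)] = string(body)
	})
	return got
}

func main() {
	fmt.Println(demo())
}
```

Because the callback can run on several goroutines at once, any state it shares needs its own synchronization, as the mutex in demo shows.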

If you want to create the first requests manually, set StartRequestsFunc. StartURLs is not used in that case.
You can make requests using Geziyor methods:

    geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            g.Get("", g.Opt.ParseFunc)
            g.Head("", g.Opt.ParseFunc)
        },
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) { /* ... */ },
    }).Start()

Making JS Rendered Requests

JS rendered requests can be made using the GetRendered method. By default, Geziyor uses the local Chrome application's CLI to start a Chrome browser. Set the BrowserEndpoint option to use a different Chrome instance, such as "ws://localhost:3000".

    geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            g.GetRendered("", g.Opt.ParseFunc)
        },
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) { /* ... */ },
        //BrowserEndpoint: "ws://localhost:3000",
    }).Start()

Extracting Data

We can extract HTML elements using response.HTMLDoc. HTMLDoc is Goquery's Document.

HTMLDoc is available on the Response if the response is HTML and can be parsed with Go's built-in HTML parser. If the response isn't HTML, response.HTMLDoc is nil.

    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{""},
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            r.HTMLDoc.Find("div.quote").Each(func(_ int, s *goquery.Selection) {
                log.Println(s.Find("span.text").Text(), s.Find("").Text())
            })
        },
    }).Start()
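
Since HTMLDoc is nil for non-HTML responses, a parse function that may receive other content types should check it before calling Find. The sketch below shows only that guard. The document and response types are simplified stand-ins defined for this example, not goquery's or Geziyor's real types.

```go
package main

import "fmt"

// document and response are simplified stand-ins for goquery's Document and
// Geziyor's client.Response; only the nil-check pattern matters here.
type document struct{ title string }

type response struct {
	Body    []byte
	HTMLDoc *document // nil when the body could not be parsed as HTML
}

// pageTitle returns the document title, or ok=false for non-HTML responses.
func pageTitle(r *response) (title string, ok bool) {
	if r.HTMLDoc == nil {
		return "", false
	}
	return r.HTMLDoc.title, true
}

func main() {
	htmlResp := &response{Body: []byte("<title>Quotes</title>"), HTMLDoc: &document{title: "Quotes"}}
	jsonResp := &response{Body: []byte(`{"ok":true}`)}
	for _, r := range []*response{htmlResp, jsonResp} {
		if t, ok := pageTitle(r); ok {
			fmt.Println("HTML page:", t)
		} else {
			fmt.Println("not HTML, using raw body:", string(r.Body))
		}
	}
}
```

Skipping the check and calling a method on a nil HTMLDoc is one way to end up with a nil pointer dereference inside a parse function.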

Exporting Data

You can export data automatically using exporters. Just send data to the Geziyor.Exports channel. See the available exporters.

    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{""},
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            r.HTMLDoc.Find("div.quote").Each(func(_ int, s *goquery.Selection) {
                g.Exports <- map[string]interface{}{
                    "text":   s.Find("span.text").Text(),
                    "author": s.Find("").Text(),
                }
            })
        },
        Exporters: []export.Exporter{&export.JSON{}},
    }).Start()
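
The channel-based design means parse functions only produce items, while something else drains the channel and writes them out. The sketch below illustrates that split with a JSON-lines writer built on the standard library. It does not use Geziyor's exporter interface. exportJSONLines and demo are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// exportJSONLines drains items and writes each one to w as a JSON line.
// It returns when the channel is closed or an encoding error occurs.
func exportJSONLines(w io.Writer, items <-chan interface{}) error {
	enc := json.NewEncoder(w)
	for item := range items {
		if err := enc.Encode(item); err != nil {
			return err
		}
	}
	return nil
}

// demo plays the role of a parse function feeding the exporter.
func demo() string {
	items := make(chan interface{})
	var buf bytes.Buffer
	done := make(chan error, 1)
	go func() { done <- exportJSONLines(&buf, items) }()

	items <- map[string]interface{}{"text": "hello", "author": "A"}
	items <- map[string]interface{}{"text": "world", "author": "B"}
	close(items) // signals the exporter that no more items will arrive
	if err := <-done; err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```

Keeping the writer on its own goroutine means output I/O does not hold up the code that produces the items.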


8,748 requests per second on a MacBook Pro 15" (2016)

See tests for this benchmark function:

>> go test -run none -bench Requests -benchtime 10s
goos: darwin
goarch: amd64
BenchmarkRequests-8   	  200000	    108710 ns/op
ok	22.861s
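
The headline figure appears to be the iteration count divided by the total run time (200000 / 22.861 s ≈ 8748.5), which presumably includes setup. The per-operation time alone gives a slightly higher rate (10^9 / 108710 ns ≈ 9198.8). A small sketch of both calculations, with function names chosen for this example:

```go
package main

import "fmt"

// perSecondFromTotal derives throughput from iterations and total wall time.
func perSecondFromTotal(iterations int, totalSeconds float64) float64 {
	return float64(iterations) / totalSeconds
}

// perSecondFromNsOp derives throughput from the per-operation time.
func perSecondFromNsOp(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	fmt.Printf("from total time: %.1f req/s\n", perSecondFromTotal(200000, 22.861))
	fmt.Printf("from ns/op:      %.1f req/s\n", perSecondFromNsOp(108710))
}
```
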
  • google-chrome: executable file not found in $PATH

    I get an error when I start my service on the server. Locally on my machine everything works so far.

    request getting rendered: exec: "google-chrome": executable file not found in $PATH



    // ...
    crawler := geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            g.GetRendered("", g.Opt.ParseFunc)
        },
        Exporters: []export.Exporter{&export.JSON{}},
    })
    // ...


    # -- Stage 1 -- #
    FROM golang:1.16-alpine as builder
    WORKDIR /app
    COPY . .
    RUN go build -mod=readonly -o bin/service
    # -- Stage 2 -- #
    FROM alpine
    # Install any required dependencies.
    RUN apk --no-cache add ca-certificates
    WORKDIR /root/
    COPY --from=builder /app/bin/service /usr/local/bin/
    CMD ["service"]


    I assume I need additional dependencies on my server for geziyor to run smoothly? For example Headless Chrome?
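
The error text comes from Go's os/exec failing to find an executable named google-chrome on $PATH, and the second stage of the Dockerfile above installs only ca-certificates. A service can check for a browser at startup with a sketch like the following. The candidate binary names are assumptions and would need to match whatever the deployed image provides.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBrowser returns the path of the first candidate found on $PATH.
func findBrowser(candidates []string) (string, bool) {
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			return path, true
		}
	}
	return "", false
}

func main() {
	// Candidate names are assumptions; adjust them to the image you deploy.
	names := []string{"google-chrome", "chromium", "chromium-browser"}
	if path, ok := findBrowser(names); ok {
		fmt.Println("browser found at", path)
	} else {
		fmt.Println("no Chrome/Chromium binary on $PATH; locally rendered requests are likely to fail")
	}
}
```

Another route is the BrowserEndpoint option from the README, which points Geziyor at a Chrome instance running elsewhere instead of a local binary.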

  • Cookie cutters and Declarative scraping

    Many websites can be scraped with standard CSS selectors, without writing custom Go code. For this, I still like goscrape's "structured scraper" approach.

    Here is how its scraping is defined declaratively:

    config := &scrape.ScrapeConfig{
        DividePage: scrape.DividePageBySelector("tr:nth-child(3) tr:nth-child(3n-2):not([style='height:10px'])"),
        Pieces: []scrape.Piece{
            {Name: "title", Selector: "td.title > a", Extractor: extract.Text{}},
            {Name: "link", Selector: "td.title > a", Extractor: extract.Attr{Attr: "href"}},
            {Name: "rank", Selector: "td.title[align='right']",
                Extractor: extract.Regex{Regex: regexp.MustCompile(`(\d+)`)}},
        },
        Paginator: paginate.BySelector("a[rel='nofollow']:last-child", "href"),
    }

    I hope geziyor can support declarative scraping with predefined cookie cutters like the above as well.

  • context deadline exceeded

    I'm trying to scrape 3,242 web pages, but for a lot of URLs I get the response: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers).

    Any advice?
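
That message is produced by Go's net/http when the client's overall Timeout elapses before the response headers arrive, which is common with slow servers or when many requests compete for bandwidth. The sketch below reproduces the error against a deliberately slow local server and shows one way to detect it. All names here are illustrative and none of this is Geziyor code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// isTimeout reports whether err (or an error it wraps) is a timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// getWithTimeout issues a GET with the given overall client timeout.
func getWithTimeout(url string, timeout time.Duration) error {
	c := &http.Client{Timeout: timeout}
	resp, err := c.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// demo hits a slow local server with a short and a long timeout.
func demo() (shortTimedOut, longTimedOut bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(150 * time.Millisecond) // simulate a slow site
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()
	shortTimedOut = isTimeout(getWithTimeout(srv.URL, 20*time.Millisecond))
	longTimedOut = isTimeout(getWithTimeout(srv.URL, 2*time.Second))
	return shortTimedOut, longTimedOut
}

func main() {
	short, long := demo()
	fmt.Println("short timeout hit:", short, "long timeout hit:", long)
}
```

If the same pattern holds in a crawl, a longer timeout or a lower concurrency limit may reduce these failures. The scraper Options are the place to check which settings are available.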

  • Queue performance enhancements + delay middleware fix

    Enhances the queue logic to improve memory management and handle deadlock situations.

    • Fixes the delay middleware so the constant delay always applies when a randomised delay is added (combined, not instead of).
    • Moves request middleware ahead of the core func. This lets middleware cancel requests without triggering the semaphore locks (and the corresponding rate limits), and avoids queuing items that will only be cancelled, which saves memory.
    • Avoids deadlocks when the queue exceeds the max queue size: new records are discarded and a message is printed to the log.
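
The last point, dropping new records instead of blocking when the queue is full, is commonly done in Go with a buffered channel and a non-blocking send. This is a minimal sketch of that technique, not the code from this pull request. boundedQueue and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"log"
)

// boundedQueue drops new items instead of blocking when it is full.
type boundedQueue struct {
	items chan string
}

func newBoundedQueue(max int) *boundedQueue {
	return &boundedQueue{items: make(chan string, max)}
}

// enqueue reports whether the item was accepted; it never blocks.
func (q *boundedQueue) enqueue(item string) bool {
	select {
	case q.items <- item:
		return true
	default:
		log.Printf("queue full (%d items), discarding %q", cap(q.items), item)
		return false
	}
}

func main() {
	q := newBoundedQueue(2)
	for _, u := range []string{"/a", "/b", "/c"} {
		fmt.Println(u, "accepted:", q.enqueue(u))
	}
}
```

Because a producer never waits on a full queue, a worker that enqueues follow-up items cannot block on the queue it is supposed to drain. The price is that discarded items are lost unless they are recorded elsewhere.
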
  • Are there any plans to add support for POST requests?

    Hi there, I was using the project for a personal crawler. After reading the source code, I realized that the only way to send a POST request might be to implement a StartRequestsFunc (let me know if I'm wrong lol) that uses the HTTP client directly, e.g.,

    func postToUrl(url string, body io.Reader) {
        geziyor.NewGeziyor(&geziyor.Options{
            StartRequestsFunc: func(g *geziyor.Geziyor) {
                req, _ := client.NewRequest("POST", url, body)
                g.Do(req, nil)
            },
        }).Start()
    }

    I haven't tried this approach yet but I'd like to know if that's the proper way to send requests other than a GET? Or is there any plan to add other implementations or an official example about a POST request?

  • out of control RAM usage

    I've got a script that clicks every link and then clicks every link, and it quickly gets out of hand in terms of memory usage (40+GB) before crashing. Any suggestions as to where it's getting out of control? Storing millions of requests shouldn't take that much RAM in my mind.

  • Proxy Management Not supported

    Geziyor does not provide any interface for integrating proxies. It does provide request middlewares, but the object that can be manipulated in a middleware has no proxy-related configuration. It would be great if this could be supported as well.
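
At the net/http level, proxies are chosen per request through the Transport.Proxy function. The sketch below rotates through a list of proxies in round-robin order. How such a function would be wired into Geziyor is not covered here. roundRobinProxy, pick and the 127.0.0.1 addresses are placeholders for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinProxy returns a function usable as http.Transport.Proxy that
// cycles through the given proxy URLs, one per request.
func roundRobinProxy(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
	if len(rawURLs) == 0 {
		return nil, errors.New("no proxy URLs given")
	}
	proxies := make([]*url.URL, 0, len(rawURLs))
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		proxies = append(proxies, u)
	}
	var next uint64
	return func(*http.Request) (*url.URL, error) {
		i := atomic.AddUint64(&next, 1) - 1 // safe for concurrent requests
		return proxies[i%uint64(len(proxies))], nil
	}, nil
}

// pick calls the proxy function n times and returns the chosen hosts.
func pick(proxy func(*http.Request) (*url.URL, error), n int) []string {
	hosts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		u, _ := proxy(nil) // this proxy function ignores the request
		hosts = append(hosts, u.Host)
	}
	return hosts
}

func main() {
	proxy, err := roundRobinProxy("http://127.0.0.1:8080", "http://127.0.0.1:8081")
	if err != nil {
		fmt.Println(err)
		return
	}
	client := &http.Client{Transport: &http.Transport{Proxy: proxy}}
	_ = client // requests sent with this client would alternate between the proxies
	fmt.Println(pick(proxy, 3))
}
```
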

  • How to get response error other than HTTP errors

    How can I get response errors other than HTTP errors (StatusCode), such as a timeout, address not found, or the website being unreachable? For example:

    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{""},
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) { /* ... */ },
    }).Start()

    Log output :

    2019/12/10 15:00:21 Scraping Started
    2019/12/10 15:00:21 Retrying:
    2019/12/10 15:00:21 Retrying:
    2019/12/10 15:00:21 Response error: Get dial tcp: lookup no such host
    2019/12/10 15:00:21 Scraping Finished

    I want to store the site URL and the error in a database, e.g. ("", "dial tcp: lookup no such host").
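
Once a transport-level error value is in hand, Go's errors.As can pull out the failing URL and distinguish DNS failures from timeouts. The sketch below builds a storable record from such an error. Where Geziyor surfaces the error to user code is not covered here, and the failure type, describe and sampleDNSError are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// failure is a hypothetical record shape for storing a failed request.
type failure struct {
	URL    string
	Reason string
}

// describe turns a transport-level error into a storable record.
func describe(err error) failure {
	var f failure
	if err == nil {
		return f
	}
	var ue *url.Error
	if errors.As(err, &ue) {
		f.URL = ue.URL
	}
	var dnsErr *net.DNSError
	var netErr net.Error
	switch {
	case errors.As(err, &dnsErr):
		f.Reason = "dns: " + dnsErr.Err
	case errors.As(err, &netErr) && netErr.Timeout():
		f.Reason = "timeout"
	default:
		f.Reason = err.Error()
	}
	return f
}

// sampleDNSError builds the kind of error net/http returns for an unknown host.
func sampleDNSError() error {
	return &url.Error{
		Op:  "Get",
		URL: "http://nonexistent.invalid/",
		Err: &net.OpError{Op: "dial", Net: "tcp", Err: &net.DNSError{Err: "no such host", Name: "nonexistent.invalid"}},
	}
}

func main() {
	fmt.Printf("%+v\n", describe(sampleDNSError()))
}
```
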

  • Recursive Exports / Native return channels

    I found it quite common to have recursive / nested scraping.

    ├── a
    │   ├── itemA
    │   └── foldA
    │       └── itemB
    └── b
        ├── itemC
        └── foldA
            └── itemA

    Total result being something like:

    {
      "a": [
        {"title": "itemA", "author": "Foo Bar", "contents": "asdjnasknd"},
        {"title": "foldA", "children": [
          {"title": "itemB", "author": "Foo Baz", "contents": "afgdgasknd"}
        ]}
      ],
      "b": [
        {"title": "itemC", "author": "Foo Bar", "contents": "odjfoij"},
        {"title": "foldA", "children": [
          {"title": "itemA", "author": "Foo Baz", "contents": "alsd"}
        ]}
      ]
    }

    Problem is, as soon as you pass something to g.Do(), you have no way of hearing back from the function.
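
One way around the missing return value is to let each callback close over a pointer to its parent result, so a nested item attaches itself to the right place when its response arrives. The sketch below simulates asynchronous requests with goroutines over an in-memory tree. Using the same closure as the callback passed to g.Do or g.Get is an assumption this sketch does not verify, and all type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"sync"
)

// node is a toy "site": a title plus child pages.
type node struct {
	Title    string
	Children []*node
}

// item is the nested result being assembled.
type item struct {
	Title    string  `json:"title"`
	Children []*item `json:"children,omitempty"`
}

// crawler tracks outstanding asynchronous visits and guards the result tree.
type crawler struct {
	wg sync.WaitGroup
	mu sync.Mutex
}

// visit handles n on its own goroutine. The goroutine closes over parent,
// which is how the result finds its place in the tree without a return value.
func (c *crawler) visit(n *node, parent *item) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		it := &item{Title: n.Title}
		c.mu.Lock()
		parent.Children = append(parent.Children, it)
		c.mu.Unlock()
		for _, child := range n.Children {
			c.visit(child, it)
		}
	}()
}

// sortItems orders children by title, since goroutine order is not deterministic.
func sortItems(it *item) {
	sort.Slice(it.Children, func(i, j int) bool { return it.Children[i].Title < it.Children[j].Title })
	for _, child := range it.Children {
		sortItems(child)
	}
}

// crawl builds the nested result for site and returns it as JSON.
func crawl(site *node) string {
	root := &item{}
	c := &crawler{}
	c.visit(site, root)
	c.wg.Wait() // every visit has attached its item by now
	sortItems(root)
	out, _ := json.Marshal(root.Children[0])
	return string(out)
}

// sampleSite mirrors the "a" branch of the tree in the issue.
func sampleSite() *node {
	return &node{Title: "a", Children: []*node{
		{Title: "itemA"},
		{Title: "foldA", Children: []*node{{Title: "itemB"}}},
	}}
}

func main() {
	fmt.Println(crawl(sampleSite()))
}
```

The complete tree is only safe to read after all outstanding visits have finished, which is what the WaitGroup tracks in this sketch.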

  • runtime error: invalid memory address or nil pointer dereference

    I just ran the basic example and got this error


    package main
    import (
        // ...
    )
    func main() {
        geziyor.NewGeziyor(&geziyor.Options{
            StartRequestsFunc: func(g *geziyor.Geziyor) {
                g.GetRendered("", g.Opt.ParseFunc)
            },
            ParseFunc: func(g *geziyor.Geziyor, r *client.Response) { /* ... */ },
            //BrowserEndpoint: "ws://localhost:3000",
        }).Start()
    }


    Scraping Started
    Crawled: (200) <GET>
    runtime error: invalid memory address or nil pointer dereference goroutine 40 [running]:
            C:/Program Files/Go/src/runtime/debug/stack.go:24 +0x65
    *Geziyor).recoverMe(0xc00016cdc0)
            C:/Users/Marshall/go/pkg/mod/[email protected]/geziyor.go:307 +0x45
    panic({0x111dc60, 0x17f7d60})
            C:/Program Files/Go/src/runtime/panic.go:838 +0x207
    main.main.func2(0xc00014a1c8?, 0xc000409d10?)
            C:/Users/Marshall/Desktop/gezi/main.go:16 +0x18
    *Geziyor).do(0xc00016cdc0, 0xc0001524b0, 0x12350c8)
            C:/Users/Marshall/go/pkg/mod/[email protected]/geziyor.go:262 +0x235
    created by *Geziyor).Do
            C:/Users/Marshall/go/pkg/mod/[email protected]/geziyor.go:228 +0xd2
    Scraping Finished

    Any advice?

  • Add a generic in-memory counter and expose Metrics

    Added a generic in-memory metrics counter and exposed the Metrics variable so that it can be used outside of the external metrics counters.

    (This is more of a suggestion, I'm otherwise counting manually but it doesn't make sense when it's built in!)

  • problem installing geziyor

    go get -u
    go: go.mod file not found in current directory or any parent directory.
    	'go get' is no longer supported outside a module.
    	To build and install a command, use 'go install' with a version,
    	like 'go install'
    	For more information, see
    	or run 'go help get' or 'go help install'.
    ubuntu2204@ubuntu2204:~/goscrape$ go get go.mod
    go: go.mod file not found in current directory or any parent directory.
    	'go get' is no longer supported outside a module.
    	To build and install a command, use 'go install' with a version,
    	like 'go install'
    	For more information, see
    	or run 'go help get' or 'go help install'.
    ubuntu2204@ubuntu2204:~/goscrape$ go install
    go: 'go install' requires a version when current directory is not in a module
    	Try 'go install' to install the latest version
    ubuntu2204@ubuntu2204:~/goscrape$ go install
    go: downloading v0.0.0-20220429000531-738852f9321d
    go: downloading v0.0.0-20220411224347-583f2d630306
    go: downloading v0.8.0
    go: downloading v1.8.0
    go: downloading v0.0.0-20220428002153-285dfb42699c
    go: downloading v0.0.0-20220425223048-2871e0cb64e4
    go: downloading v0.3.7
    go: downloading v0.12.0
    go: downloading v1.12.1
    go: downloading v1.1.2
    go: downloading v1.3.1
    go: downloading v1.0.1
    go: downloading v2.1.2
    go: downloading v1.5.2
    go: downloading v0.2.0
    go: downloading v1.1.0
    go: downloading v0.34.0
    go: downloading v0.7.3
    go: downloading v1.28.0
    go: downloading v1.0.0
    go: downloading v1.0.1
    go: downloading v0.0.0-20220422013727-9388b58f7150
    package is not a main package
  • Scrape URLs, then visit them.

    I'm looking for a way to:

    • Parse URLs
    • Visit each parsed URL
    • Parse data from each visited page

    For example,

    • Get book URLs from Goodreads
    • Visit those pages
    • Get the books' data from the visited pages

    This is possible with colly. I wonder if it's possible with geziyor.

  • Is scraping shadow DOM an option?

    Hi, I'm trying to scrape YouTube charts, so far unsuccessfully, because they use Polymer / shadow DOM. Could I do that with Geziyor? I'm currently using colly, which doesn't support it.

