ant (alpha) is a web crawler for Go.

Declarative

The package includes functions that can scan data from a page into your structs or slices of structs, which reduces the noise and complexity in your source code.

You can also use a jQuery-like API that allows you to scrape complex HTML pages if needed.

var data struct { Title string `css:"title"` }
page, _ := ant.Fetch(ctx, "https://apple.com")
page.Scan(&data)
data.Title // => Apple
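
Scan also works on slice fields, so a single call can collect every matching element. A minimal sketch (the URL and selector below are hypothetical):

var posts struct { Titles []string `css:".post .title"` }
page, _ := ant.Fetch(ctx, "https://example.com/blog")
page.Scan(&posts)
posts.Titles // => all titles matching the selector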

Headless

By default the crawler uses http.Client; however, if you're crawling SPAs you can use the antcdp.Client implementation, which drives a headless Chrome browser to crawl pages.

eng, err := ant.NewEngine(ant.EngineConfig{
  Fetcher: &ant.Fetcher{
    Client: antcdp.Client{},
  },
})

Polite

The crawler automatically fetches and caches robots.txt, making sure it never causes issues for small website owners. Of course, you can disable this behavior.

eng, err := ant.NewEngine(ant.EngineConfig{
  Impolite: true,
})
eng.Run(ctx)

Concurrent

The crawler maintains a configurable number of "worker" goroutines that read URLs off the queue and spawn a goroutine for each URL.

Depending on your configuration, you may want to increase the number of workers to read URLs faster; of course, if you don't have enough resources you can reduce the number of workers as well.

eng, err := ant.NewEngine(ant.EngineConfig{
  // Spawn 5 worker goroutines that dequeue
  // URLs and spawn a new goroutine for each URL.
  Workers: 5,
})
eng.Run(ctx)

Rate limits

The package includes a powerful ant.Limiter interface that allows you to define rate limits per URL. There are some built-in limiters as well.

ant.Limit(1) // 1 rps on all URLs.
ant.LimitHostname(5, "amazon.com") // 5 rps on amazon.com hostname.
ant.LimitPattern(5, "amazon.com.*") // 5 rps on URLs matching the `amazon.com.*` pattern.
ant.LimitRegexp(5, "^apple.com/iphone/.*") // 5 rps on URLs that match the regex.

Note that LimitPattern and LimitRegexp only match on the host and path of the URL.
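
Limiters are presumably wired into the engine the same way scrapers and matchers are; the Limiter field below is an assumption based on the other EngineConfig examples in this document, not a confirmed API:

eng, err := ant.NewEngine(ant.EngineConfig{
  // Assumed field: throttle amazon.com to 5 requests per second.
  Limiter: ant.LimitHostname(5, "amazon.com"),
})
eng.Run(ctx)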


Matchers

Another powerful interface is ant.Matcher, which allows you to define URL matchers; matchers are called before URLs are queued.

ant.MatchHostname("amazon.com") // scrape amazon.com URLs only.
ant.MatchPattern("amazon.com/help/*") // match amazon.com help pages by pattern.
ant.MatchRegexp(`amazon\.com/help/.+`) // match amazon.com help pages by regexp.

Robust

The crawl engine automatically retries any error that implements a Temporary() bool method returning true.

Because the standard library returns errors that implement that interface, the engine will retry most temporary network and HTTP errors.

eng, err := ant.NewEngine(ant.EngineConfig{
  Scraper: myscraper{},
  MaxAttempts: 5,
})

// Blocks until one of the following is true:
//
// 1. No more URLs to crawl (the scraper stops returning URLs)
// 2. A non-temporary error occurred.
// 3. MaxAttempts was reached.
//
err = eng.Run(ctx)
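
As a rough sketch, any error value shaped like the hypothetical type below would be considered retryable, because it exposes a Temporary() bool method that returns true:

// Hypothetical error type, not part of the package; the engine
// retries errors like this one up to MaxAttempts.
type throttledError struct{ msg string }

func (e throttledError) Error() string   { return e.msg }
func (e throttledError) Temporary() bool { return true }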

Built-in Scrapers

The whole point of scraping is to extract data from websites into a machine-readable format such as CSV or JSON. ant comes with built-in scrapers to make this ridiculously easy; here's a full crawler that extracts quotes to stdout.

package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/yields/ant"
)

func main() {
	var url = "http://quotes.toscrape.com"
	var ctx = context.Background()
	var start = time.Now()

	type quote struct {
		Text string   `css:".text"   json:"text"`
		By   string   `css:".author" json:"by"`
		Tags []string `css:".tag"    json:"tags"`
	}

	type page struct {
		Quotes []quote `css:".quote" json:"quotes"`
	}

	eng, err := ant.NewEngine(ant.EngineConfig{
		Scraper: ant.JSON(os.Stdout, page{}, `li.next > a`),
		Matcher: ant.MatchHostname("quotes.toscrape.com"),
	})
	if err != nil {
		log.Fatalf("new engine: %s", err)
	}

	if err := eng.Run(ctx, url); err != nil {
		log.Fatal(err)
	}

	log.Printf("scraped in %s :)", time.Since(start))
}

Testing

The anttest package makes it easy to test your scraper implementation; it fetches a page by URL, caches it in the OS's temporary directory, and re-uses it.

The function depends on the file's modtime; cached files expire daily. You can adjust the TTL by setting anttest.FetchTTL.

// Fetch calls `t.Fatal` on errors.
page := anttest.Fetch(t, "https://apple.com")
_, err := myscraper.Scrape(ctx, page)
assert.NoError(err)
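
For example, to keep cached pages around longer (assuming anttest.FetchTTL is a package-level time.Duration, as the paragraph above suggests):

// Assumed package variable; extend the cache TTL to one week.
anttest.FetchTTL = 7 * 24 * time.Hour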


Comments
  • build(deps): bump github.com/andybalholm/cascadia from 1.1.0 to 1.2.0

    Bumps github.com/andybalholm/cascadia from 1.1.0 to 1.2.0.

  • build(deps): bump github.com/mafredri/cdp from 0.31.0 to 0.32.0

    Bumps github.com/mafredri/cdp from 0.31.0 to 0.32.0.

    Release notes

    Sourced from github.com/mafredri/cdp's releases.

    v0.32.0

    • Update to latest protocol definitions (39a25b4)
    • cmd/cdpgen: Add initialisms (a69e549)
    • cmd/cdpgen: Handle project outside GOPATH (a7ff101)

    v0.31.1

    • devtool: Close http client request body (#126) (fd8f409)
  • MatchRegexp incorrect behavior

    The MatchRegexp function tries to match r.Host+r.Path, but r.Path is normalized so that it does not contain the leading slash. So if url := "https://google.com/search/car" and one tries to match with exp := regexp.QuoteMeta("google.com/search/car"), it returns false, because MatchRegexp compares "google.comsearch/car" with exp.
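
    A minimal illustration of the mismatch described above, using only the standard library:

    u, _ := url.Parse("https://google.com/search/car")
    got := u.Host + strings.TrimPrefix(u.Path, "/") // "google.comsearch/car"
    exp := regexp.QuoteMeta("google.com/search/car")
    regexp.MustCompile(exp).MatchString(got) // => false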

  • Errors in the example for Built-in Scrapers

    The example under https://github.com/yields/ant#built-in-scrapers has a few minor errors in it.

    A corrected version would be:

    package main
    
    import (
    	"context"
    	"os"
    
    	"github.com/yields/ant"
    )
    
    func main() {
    	// Describe how a quote should be extracted.
    	type Quote struct {
    		Text string   `css:".text"`
    		By   string   `css:".author"`
    		Tags []string `css:".tag"`
    	}
    
    	// A page may have many quotes.
    	type Page struct {
    		Quotes []Quote `css:".quote"`
    	}
    
    	// Where we want to fetch quotes from.
    	const host = "quotes.toscrape.com"
    
    	// Initialize the engine with a built-in scraper
    	// that receives a type and extracts data into an io.Writer.
    	eng, err := ant.NewEngine(ant.EngineConfig{
    		Scraper: ant.JSON(os.Stdout, Page{}),
    		Matcher: ant.MatchHostname(host),
    	})
    	if err != nil {
    		panic(err)
    	}
    
    	// Block until there are no more URLs to scrape.
    	if err := eng.Run(context.Background(), "http://"+host); err != nil {
    		panic(err)
    	}
    }
    

    One suggestion I have is to include examples in your README using something like https://github.com/campoy/embedmd which would make it easier to spot when examples are out of date for some reason.

  • Fix import in example for antcdp

    The import in the example for antcdp was incorrect (probably outdated), and gave the following error:

    go get: module github.com/yields/ant@upgrade found (v0.0.0-20210325225620-7fef9fdab5a8), but does not contain package github.com/yields/ant/exp/antcdp
    
  • build(deps): bump github.com/hashicorp/go-multierror from 1.1.0 to 1.1.1

    Bumps github.com/hashicorp/go-multierror from 1.1.0 to 1.1.1.

    Commits
    • 9974e9e Merge pull request #50 from sethvargo/sethvargo/npe
    • 0023bb0 Check if multierror is nil in WrappedErrors
    • ab6846a we require go 1.13
    • cb869d9 Merge pull request #49 from hashicorp/fix-ci-config
    • af59c66 Upgrade gotestsum version
    • eabd672 Fix CI config
    • 7795f06 Merge pull request #47 from hashicorp/mwhooker-patch-1
    • e27d231 update badges to use circle
    • 78708db Carry over some changes from #26
    • c7dd669 Merge pull request #27 from mdeggies/add-circleci
    • Additional commits viewable in compare view

  • build(deps): bump github.com/stretchr/testify from 1.6.1 to 1.7.0

    Bumps github.com/stretchr/testify from 1.6.1 to 1.7.0.

    Release notes

    Sourced from github.com/stretchr/testify's releases.

    Minor improvements and bug fixes

    Minor feature improvements and bug fixes

    Commits
    • acba37e Only use repeatability if no repeatability left
    • eb8c41e Add more tests to mock package
    • a5830c5 Extract method to evaluate closest match
    • 1962448 Use Repeatability as tie-breaker for closest match
    • 92707c0 Fixed the link to not point to assert only
    • 05dd0b2 Updated the readme to point to pkg.dev
    • c26b7f3 Update assertions.go
    • 8fb4b24 [Fix] The most recent changes to golang/protobuf breaks the spew Circular dat...
    • dc8af72 add generated code for positive/negative assertion
    • 1544508 add assert positive/negative
    • Additional commits viewable in compare view

  • build(deps): bump github.com/mafredri/cdp from 0.29.2 to 0.30.0

    Bumps github.com/mafredri/cdp from 0.29.2 to 0.30.0.

    Release notes

    Sourced from github.com/mafredri/cdp's releases.

    0.30.0

    • Refactor errors and add support for Go 1.13 errors (#124) 0ea84f3f55ce89090f9ecb993590f6920ebdea38
      • all: Implement Is, As and Unwrap for errors
      • Deprecate cdp.ErrorCause in favor of Is, As, Unwrap
      • Use errors.Is and errors.As where applicable
      • travis: Drop Go 1.12, add Go 1.15
  • build(deps): bump github.com/tidwall/match from 1.0.3 to 1.1.1

    Bumps github.com/tidwall/match from 1.0.3 to 1.1.1.

    Commits
    • 4c9fc61 Add MatchLimit function for limiting pattern complexity
    • 84b6aac Avoid expensive recursive operations
    • See full diff in compare view

  • build(deps): bump github.com/andybalholm/cascadia from 1.2.0 to 1.3.1

    Bumps github.com/andybalholm/cascadia from 1.2.0 to 1.3.1.

    Commits
    • a8d8085 Fix index out of bounds in parser.
    • b690f7e Fix spelling of test_resources.
    • 93fe7c5 Add ignore case feature for CSS attribute selectors
    • 939eba4 Support more pseudo classes.
    • 9bb392e Check in go.sum file.
    • See full diff in compare view

  • build(deps): bump github.com/golang/snappy from 0.0.3 to 0.0.4

    Bumps github.com/golang/snappy from 0.0.3 to 0.0.4.

    Commits
    • 544b418 Update AUTHORS and CONTRIBUTORS
    • 0eaccd4 Fix dangling golden_test filename link
    • 3ff355f Merge pull request #51 from topos-ai/bytereader
    • b9440b4 Merge pull request #40 from EdwardBetts/spelling
    • ef34881 Merge pull request #60 from alexlegg/master
    • 33fc3d5 Merge pull request #61 from cuonglm/cuonglm/fix-wrong-arm64-scaled-register-f...
    • b46926b Fix wrong arm64 scaled register format
    • e149cdd Use a more inclusive text for golden input.
    • 0a27eb7 Add ReadByte method, satisfies the io.ByteReader interface
    • da2bb33 correct spelling mistake
    • See full diff in compare view

  • build(deps): bump github.com/mafredri/cdp from 0.31.0 to 0.33.0

    Bumps github.com/mafredri/cdp from 0.31.0 to 0.33.0.

    Release notes

    Sourced from github.com/mafredri/cdp's releases.

    v0.33.0

    • Switch to GitHub Actions (#129) (159a656)
    • Update to latest protocol definitions (#130) (d0c159a)
    • rpcc: Fix potential deadlock in stream Sync (#134) (91594d7)
    • Format project with gofumpt (#135) (3c5eab7)
    • Add example using alternative websocket implementation (69932b8)
    • Update go.mod (9d6eeb0)
    • Update to latest protocol definitions (#136) (b079440)

    https://github.com/mafredri/cdp/compare/v0.32.0...v0.33.0

    v0.32.0

    • Update to latest protocol definitions (39a25b4)
    • cmd/cdpgen: Add initialisms (a69e549)
    • cmd/cdpgen: Handle project outside GOPATH (a7ff101)

    v0.31.1

    • devtool: Close http client request body (#126) (fd8f409)
    Commits
    • b079440 Update to latest protocol definitions (#136)
    • 9d6eeb0 Update go.mod
    • 69932b8 Add example using alternative websocket implementation
    • 3c5eab7 Format project with gofumpt (#135)
    • 91594d7 rpcc: Fix potential deadlock in stream Sync (#134)
    • d0c159a Update to latest protocol definitions (#130)
    • 159a656 Switch to GitHub Actions (#129)
    • 39a25b4 Update to latest protocol definitions
    • a69e549 cmd/cdpgen: Add initialisms
    • a7ff101 cmd/cdpgen: Handle project outside GOPATH
    • Additional commits viewable in compare view

Related projects

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

Crawlab Chinese | English Installation | Run | Screenshot | Architecture | Integration | Compare | Community & Sponsorship | CHANGELOG | Disclaimer Golang-

Jan 7, 2023
Apollo 💎 A Unix-style personal search engine and web crawler for your digital footprint.

Apollo 💎 A Unix-style personal search engine and web crawler for your digital footprint Demo apollodemo.mp4 Contents Background Thesis Design Archite

Dec 27, 2022
Fast, highly configurable, cloud native dark web crawler.

Bathyscaphe dark web crawler Bathyscaphe is a Go written, fast, highly configurable, cloud-native dark web crawler. How to start the crawler To start

Nov 22, 2022
Just a web crawler

gh-dependents gh command extension to see dependents of your repository. See The GitHub Blog: GitHub CLI 2.0 includes extensions! Install gh extension

Sep 27, 2022
A recursive, mirroring web crawler that retrieves child links.

Jan 29, 2022
Fast golang web crawler for gathering URLs and JavaScript file locations.

Fast golang web crawler for gathering URLs and JavaScript file locations. This is basically a simple implementation of the awesome Gocolly library.

Sep 24, 2022
Elegant Scraper and Crawler Framework for Golang

Colly Lightning Fast and Elegant Scraping Framework for Gophers Colly provides a clean interface to write any kind of crawler/scraper/spider. With Col

Jan 9, 2023
Pholcus is a distributed high-concurrency crawler software written in pure golang

Pholcus Pholcus (ghost spider) is a high-concurrency, distributed-capable crawler written in pure Go, intended only for programming study and research. It supports standalone, server, and client run modes, offers web, GUI, and command-line interfaces, has simple and flexible rules, runs batches of tasks concurrently, and supports rich output formats (mysql/mongodb/kafka/csv/excel, etc.

Dec 30, 2022
:paw_prints: Creeper - The Next Generation Crawler Framework (Go)

About Creeper is a next-generation crawler which fetches web page by creeper script. As a cross-platform embedded crawler, you can use it for your new

Dec 4, 2022
Go IMDb Crawler

Go IMDb Crawler Hit the ⭐ button to show some ❤️ INSPIRATION Want to know which celebrities have a common birthday with yours? Want to get th

Aug 1, 2022
High-performance crawler framework based on fasthttp

predator / Predator: a high-performance crawler framework built on fasthttp. Usage: below is an example covering essentially all features completed so far; see the comments for usage details.

May 2, 2022
A crawler/scraper based on golang + colly, configurable via JSON

Aug 21, 2022
crawlergo is a browser crawler that uses chrome headless mode for URL collection.

A powerful browser crawler for web vulnerability scanners

Dec 29, 2022
A crawler/scraper based on golang + colly, configurable via JSON

Super-Simple Scraper This is a very thin layer on top of Colly which allows configuration from a JSON file. The output is JSONL which is ready to be impo

Aug 21, 2022
New World Auction House Crawler In Golang

New-World-Auction-House-Crawler Goal of this library is to have a process which grabs New World auction house data in the background while playing the

Sep 7, 2022
Simple content crawler for joyreactor.cc

Reactor Crawler Simple CLI content crawler for Joyreactor. He'll find all media content on the page you've provided and save it. If there will be any

May 5, 2022
A PCPartPicker crawler for Golang.

gopartpicker A scraper for pcpartpicker.com for Go. It is implemented using Colly. Features Extract data from part list URLs Search for parts Extract

Nov 9, 2021
Multiplexer: HTTP-Server & URL Crawler

Multiplexer: HTTP-Server & URL Crawler. The application is an HTTP server with a single handler. The handler receives a POST request with a list of ur

Nov 3, 2021
A simple crawler sending Telegram notification when Refurbished Macbook Air / Pro in stock.

Jan 30, 2022