Declarative web scraping

Ferret

License: Apache-2.0

Try it! Docs CLI Test runner Web worker

What is it?

ferret is a web scraping system. It aims to simplify data extraction from the web for UI testing, machine learning, analytics and more.
ferret allows users to focus on the data. It abstracts away the technical details and complexity of underlying technologies using its own declarative language. It is extremely portable, extensible, and fast.

Read the introductory blog post about Ferret here!

Features

  • Declarative language
  • Support of both static and dynamic web pages
  • Embeddable
  • Extensible

Documentation is available at our website.

  • Website

    Create a website with nice and clean design. It should contain:

    • [ ] Introduction
    • [ ] Quick start
    • [ ] Guideline
    • [ ] API documentation
    • [ ] FAQ (?)
    • [ ] Contact information

    The website repo is here.

  • New to Go, Not working on Ubuntu 18

    I'm new to Go. I tried installing ferret on Ubuntu 18: first I installed Go, and then tried to make ferret... Would it be possible to post a complete beginner's guide with all the command-line steps? It would be much appreciated, thanks!

  • How to get rid of converted characters in URLs

    I am developing a crawler and so far, so very good: thank you for this outstanding crawler.

    The only issue is that the returned URLs contain an & character, which gets converted into \u0026, for example: "https://thedomain/alphabet=M\u0026borough=Bronx"

    So I tried to replace it, either by using SUBSTITUTE: RETURN SUBSTITUTE(prfx + letter.attributes.href, "\u0026", "&")

    or REGEX_REPLACE.

    In both cases, the \u0026 string is NOT replaced and remains embedded in the resulting URLs. However, when I try SUBSTITUTE on, say, a -> z, it works fine.

    Is this a limitation of JSON, which I use as the output format? How can I get rid of the converted string? It prevents me from crawling the lower levels of the website.
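A likely explanation, offered as an assumption rather than a confirmed diagnosis: the scraped URL contains a literal &, and \u0026 only appears when the result is serialized to JSON, because Go JSON encoders such as encoding/json escape <, > and & by default. If that is what happens here, SUBSTITUTE has nothing to replace, and any JSON parser reading the output turns \u0026 back into &. The sketch below (the helper names marshalDefault and marshalNoHTMLEscape are illustrative) shows the default escaping and how json.Encoder.SetEscapeHTML(false) avoids it; it does not show whether ferret's own output can be configured this way.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalDefault uses json.Marshal, which escapes <, > and & as \u003c, \u003e and \u0026.
func marshalDefault(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// marshalNoHTMLEscape uses a json.Encoder with HTML escaping disabled.
func marshalNoHTMLEscape(v any) string {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		panic(err)
	}
	// Encode terminates each value with a newline; drop it.
	return string(bytes.TrimRight(buf.Bytes(), "\n"))
}

func main() {
	url := "https://thedomain/alphabet=M&borough=Bronx"
	escaped := marshalDefault(url)
	fmt.Println(escaped)                  // "https://thedomain/alphabet=M\u0026borough=Bronx"
	fmt.Println(marshalNoHTMLEscape(url)) // "https://thedomain/alphabet=M&borough=Bronx"

	// Decoding the escaped JSON restores the literal "&".
	var decoded string
	if err := json.Unmarshal([]byte(escaped), &decoded); err != nil {
		panic(err)
	}
	fmt.Println(decoded == url) // true
}
```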

  • Create a Docker image with stripped down version of Chromium

    Chrome is awesome and all, but it is too heavy for scraping tasks. We need to investigate how to create a custom build that strips out features not relevant to web scraping, and publish it as a Docker image.

    Some links:

    • https://github.com/gcarq/inox-patchset
    • https://github.com/Eloston/ungoogled-chromium
  • Add object functions

    • [x] KEYS(object, sort) → strArray
    • [x] HAS(object, keyName) → isPresent
    • ~~LENGTH(object) → count~~ (Implemented here)
    • [x] MERGE(object1, object2, ... objectN) → newMergedObject
    • [x] MERGE_RECURSIVE(object1, object2, ... objectN) → newMergedObject
    • [x] VALUES(document, removeInternal) → anyArray
    • [x] ZIP(keys, values) → newObj
    • [x] KEEP(object, key1, key2, ... key) → newObj
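To make the checklist concrete, here is a minimal Go sketch of KEYS, HAS, MERGE, ZIP and KEEP over a plain map. It is not ferret's implementation: the Object type and the Go function names are hypothetical, and the semantics (MERGE lets later objects overwrite earlier keys, KEEP silently skips keys that are missing, ZIP rejects key and value lists of different lengths) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical stand-in for an FQL object value.
type Object map[string]any

// Keys returns the attribute names, sorted when sorted is true.
func Keys(obj Object, sorted bool) []string {
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	if sorted {
		sort.Strings(keys)
	}
	return keys
}

// Has reports whether obj contains keyName.
func Has(obj Object, keyName string) bool {
	_, ok := obj[keyName]
	return ok
}

// Merge combines objects left to right; later objects overwrite earlier keys.
func Merge(objs ...Object) Object {
	out := Object{}
	for _, o := range objs {
		for k, v := range o {
			out[k] = v
		}
	}
	return out
}

// Zip builds an object from parallel key and value slices.
func Zip(keys []string, values []any) (Object, error) {
	if len(keys) != len(values) {
		return nil, fmt.Errorf("zip: %d keys but %d values", len(keys), len(values))
	}
	out := make(Object, len(keys))
	for i, k := range keys {
		out[k] = values[i]
	}
	return out, nil
}

// Keep returns a new object with only the listed keys that exist in obj.
func Keep(obj Object, keys ...string) Object {
	out := Object{}
	for _, k := range keys {
		if v, ok := obj[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	a := Object{"b": 1, "a": 2}
	fmt.Println(Keys(a, true)) // [a b]
	fmt.Println(Has(a, "a"))   // true

	m := Merge(a, Object{"a": 3, "c": 4})
	fmt.Println(Keys(m, true), m["a"]) // [a b c] 3

	z, _ := Zip([]string{"x", "y"}, []any{1, 2})
	fmt.Println(Keep(z, "y", "missing")) // map[y:2]
}
```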
  • Many changes

    1. Added options for the http driver: WithTimeout, WithBodyLimit.
    2. Changed the default log level from Debug to Trace when setting HTTP headers.
    3. Added the ferret function to_binary to convert data to a byte array (needed for requests with a body).
    4. Added a mutex to the params map, because a race condition occurred in some cases.
    5. Big change: unified the implementation by using the DOCUMENT function without preparing a DOM tree. The current implementation (IO::NET::HTTP) does not use the proxy, does not return headers and cookies, and duplicates code in http.Driver. So I rewrote it to return an HTTPResponse object from ferret, with implementation and tests.
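Items 1 and 4 describe two common Go patterns, sketched below under stated assumptions. The functional options are named after the PR's WithTimeout and WithBodyLimit, but the Options struct, NewOptions and the default values are hypothetical and not ferret's http driver API. The Params type is likewise hypothetical; it shows a map guarded by a sync.RWMutex so that concurrent goroutines can write to it without a data race.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Options is a hypothetical settings struct for an HTTP driver.
type Options struct {
	Timeout   time.Duration
	BodyLimit int64
}

// Option is a functional option that adjusts Options.
type Option func(*Options)

// WithTimeout sets the request timeout.
func WithTimeout(d time.Duration) Option {
	return func(o *Options) { o.Timeout = d }
}

// WithBodyLimit sets the maximum number of body bytes (assumed meaning).
func WithBodyLimit(n int64) Option {
	return func(o *Options) { o.BodyLimit = n }
}

// NewOptions applies the given options on top of assumed defaults.
func NewOptions(opts ...Option) *Options {
	o := &Options{Timeout: 30 * time.Second, BodyLimit: 10 << 20}
	for _, opt := range opts {
		opt(o)
	}
	return o
}

// Params is a map of request parameters guarded by a RWMutex.
type Params struct {
	mu     sync.RWMutex
	values map[string]string
}

func NewParams() *Params {
	return &Params{values: make(map[string]string)}
}

// Set takes the write lock, so concurrent writers cannot race.
func (p *Params) Set(key, value string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.values[key] = value
}

// Get takes the read lock, so readers can proceed in parallel.
func (p *Params) Get(key string) (string, bool) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	v, ok := p.values[key]
	return v, ok
}

func (p *Params) Len() int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return len(p.values)
}

func main() {
	o := NewOptions(WithTimeout(5 * time.Second))
	fmt.Println(o.Timeout, o.BodyLimit) // 5s 10485760

	p := NewParams()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.Set(fmt.Sprintf("key%d", i), "value")
		}(i)
	}
	wg.Wait()
	fmt.Println(p.Len()) // 100
}
```

A sync.RWMutex lets readers proceed concurrently while writes stay exclusive; a plain sync.Mutex would also be correct.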
  • STYLE_GET seems broken

    Describe the bug

    I'm not sure, because this is my first test with it, but STYLE_GET seems broken.

    To Reproduce

    LET doc = DOCUMENT('https://news.ycombinator.com/', {
        driver: 'cdp',
        viewport: {
            width: 1920,
            height: 1080
        }
    })
    
    
    WAIT_ELEMENT(doc, '.storylink', 5000)
    
    FOR el IN ELEMENTS(doc, '.title')
        RETURN STYLE_GET(el, 'font-family')
    

    This returns [{},{},{}...].

    Expected behavior

    font-family: Verdana, Geneva, sans-serif

    Desktop:

    • Version: 0.11.1
  • Fix arithmetic operators

    The arithmetic operators must accept operands of any type. Passing non-numeric values to an arithmetic operator must cast the operands to numbers:

    • NONE will be converted to 0
    • false will be converted to 0, true will be converted to 1
    • a valid numeric value remains unchanged, but NaN and Infinity will be converted to 0
    • string values are converted to a number if they contain a valid string representation of a number. Any whitespace at the start or the end of the string is ignored. Strings with any other contents are converted to the number 0
    • an empty array is converted to 0, an array with one member is converted to the numeric representation of its sole member. Arrays with more members are converted to the number 0.
    • objects, binary and custom types are converted to the number 0.

    Here are a few examples (updated):

    1 + "a"                 // "1a"
    1 + "99"                // "199"
    1 + NONE                // 1
    NONE + 1                // 1
    3 + [ ]                 // 3
    24 + [ 2 ]              // 26
    24 + [ 2, 4 ]           // 30
    25 - NONE               // 25
    17 - true               // 16
    23 * { }                // 0
    5 * [ 7 ]               // 35
    5 * [ 7, 2 ]            // 45
    24 / "12"               // 2
    1 / 0                   // panic
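The rules and the updated examples can be sketched in Go as follows. This is a hypothetical illustration, not ferret's runtime: nil stands for NONE, []any for arrays and map[string]any for objects, and only int and float64 are treated as numbers. Two behaviours follow the updated examples rather than the bullet list, because the two disagree: + concatenates when either operand is a string, and an array is converted to the sum of its members (24 + [ 2, 4 ] gives 30). Division by zero returns an error here instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// ErrDivisionByZero is returned by Div instead of panicking.
var ErrDivisionByZero = errors.New("division by zero")

// toNumber casts an operand to float64 following the rules above.
func toNumber(v any) float64 {
	switch x := v.(type) {
	case nil: // NONE
		return 0
	case bool:
		if x {
			return 1
		}
		return 0
	case int:
		return float64(x)
	case float64:
		if math.IsNaN(x) || math.IsInf(x, 0) {
			return 0
		}
		return x
	case string:
		// Surrounding whitespace is ignored; anything unparsable becomes 0.
		f, err := strconv.ParseFloat(strings.TrimSpace(x), 64)
		if err != nil || math.IsNaN(f) || math.IsInf(f, 0) {
			return 0
		}
		return f
	case []any:
		// Taken from the updated examples: members are converted and summed.
		sum := 0.0
		for _, item := range x {
			sum += toNumber(item)
		}
		return sum
	default: // objects, binary and custom types
		return 0
	}
}

// toString renders an operand for concatenation; NONE becomes "" (assumption).
func toString(v any) string {
	switch x := v.(type) {
	case nil:
		return ""
	case string:
		return x
	default:
		return fmt.Sprint(x)
	}
}

// Add concatenates if either operand is a string, otherwise adds numerically.
func Add(a, b any) any {
	_, aIsString := a.(string)
	_, bIsString := b.(string)
	if aIsString || bIsString {
		return toString(a) + toString(b)
	}
	return toNumber(a) + toNumber(b)
}

func Sub(a, b any) float64 { return toNumber(a) - toNumber(b) }

func Mul(a, b any) float64 { return toNumber(a) * toNumber(b) }

// Div divides a by b and reports division by zero as an error.
func Div(a, b any) (float64, error) {
	d := toNumber(b)
	if d == 0 {
		return 0, ErrDivisionByZero
	}
	return toNumber(a) / d, nil
}

func main() {
	fmt.Println(Add(1, "a"))          // 1a
	fmt.Println(Add(24, []any{2, 4})) // 30
	fmt.Println(Mul(5, []any{7, 2}))  // 45
	fmt.Println(Sub(17, true))        // 16
	if _, err := Div(1, 0); err != nil {
		fmt.Println(err) // division by zero
	}
}
```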
    
  • Google example does not work with version 0.10

    Describe the bug

    The Google example no longer works with version 0.10.

    To Reproduce

    Steps to reproduce the behavior:

    1. Run with version 0.9 the script https://github.com/MontFerret/ferret/blob/master/examples/google-search.fql
    2. The script works as expected
    3. Run with version 0.10 the script https://github.com/MontFerret/ferret/blob/master/examples/google-search.fql
    4. The script prints the error:
    Failed to execute the query
    cdp.DOM: GetContentQuads: rpc error: Could not compute content quads. (code = -32000): CLICK(google,'input[name="btnK"]') at 8:0
    

    Expected behavior

    The script should behave the same way as it does with version 0.9.

    Desktop:

    • OS: OSX 10.14.5
    • Browser
    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version
    Google Chrome 80.0.3987.132
    
    Additional context

    Chrome is launched with the following command line:

    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
    
  • What should be the origin when NAVIGATE

    This is a question rather than an issue. I am trying to get the URLs of files inside the FOR loop (or, ideally, download the PDFs). The flow is Login page -> Saved content page -> Refcard page -> Download button. I'm struggling to understand what the origin should be and how many DOCUMENTs I need. Any help would be appreciated.

    // Login works fine.
    LET base_url = DOCUMENT("https://dzone.com/", true)
    LET login_doc = DOCUMENT("https://dzone.com/users/login.html", true)
    LET login_btn = ELEMENT(login_doc, "button[type=submit]")
    INPUT(login_doc, "form[role=form] input[name=j_username]", "[email protected]", 5)
    INPUT(login_doc, "form[role=form] input[name=j_password]", "XXXXXX", 5)
    CLICK(login_btn)
    WAIT_NAVIGATION(login_doc, 25000)
    
    // Loop in Refcardz on the 'Saved' content page of the user to get the links. Also working fine.
    LET origin_doc = DOCUMENT("https://dzone.com/users/3590306/dzone-refcardz.html?sort=saved", true)
    LET origin_url = "https://dzone.com/users/3590306/dzone-refcardz.html?sort=saved"
    NAVIGATE(login_doc, origin_url, 25000)
    WAIT_ELEMENT(origin_doc, 'p[class=comment-title]', 50000)
    LET titles = ELEMENTS(origin_doc, 'div[class="col-md-11 comment-description"] p[class="comment-title"]')
    LET links = (
      FOR el IN titles
        LET refcard_name = ELEMENT(el, "a")
        LET refcard_url = "https://dzone.com" + refcard_name.attributes.href
        RETURN refcard_url
    )
    
    // On each Refcard page, click on the 'Download' button, get the URL and then go back. Does not work.
    FOR link_url IN links
      LET link_origin_doc = DOCUMENT(origin_url, true)
      NAVIGATE(link_origin_doc, link_url, 50000)
      WAIT_ELEMENT(link_origin_doc, 'button[class="btn download btn-lg"]', 5000)
      LET download_btn = ELEMENT(link_origin_doc, 'button[class="btn download btn-lg"]')
      CLICK(download_btn)
      RETURN(link_origin_doc.url)
      NAVIGATE_BACK(link_origin_doc)
    
  • Added fuzzer

    This PR adds a fuzzer that targets Compile().

    I also managed to get the fuzzer running on oss-fuzz's platform, and I propose setting ferret up on oss-fuzz. This would allow oss-fuzz to run this fuzzer, as well as future fuzzers, continuously. If a bug is discovered, oss-fuzz sends a bug report by email to all maintainers on its contact list. It costs nothing to be part of oss-fuzz, but the bugs do need to get fixed.

    If you would like to set up ferret on oss-fuzz, I need a primary contact email address and the email addresses of all maintainers to add to the email list. This can be updated at any time.

    Google might ask about the userbase of ferret in order to accept the project, so if you have any information about any companies or other packages using ferret, it would help with getting ferret accepted.

  • Update cdp driver to work with latest cdp protocol

    Commit d0c159af4b8f9066cc6c90cce5970bb0289e0c01 in https://github.com/mafredri/cdp appears to introduce a couple of changes that are not backwards compatible with ferret's implementation; this PR should fix that. The commit appears to have:

    • removed the SetIgnoreInvalidPageRanges function
    • added arguments to dom.enable for including whitespace

    I got these errors immediately after cloning the repository and could not build or run ferret without fixing them.

  • Bump golang.org/x/text from 0.4.0 to 0.5.0

    Bumps golang.org/x/text from 0.4.0 to 0.5.0.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • How to access cdp.Driver.client

    Hi! I need to access the "cdp.Driver.client" object, because I would like to call "SetDownloadBehavior" on the cdp Chrome driver (specifically, I want to change the location where files get downloaded, for which I need "SetDownloadPath": https://pkg.go.dev/github.com/mafredri/cdp?utm_source=godoc#Page.SetDownloadBehavior )

    But this object (client) is private. Is there a way to manipulate the client, through the context or some other means, so I can set the download path without rebuilding your whole project?

    Thanks a lot in advance for any tips! And congratulations again, amazing project!

  • fatal error: unexpected signal during runtime execution, fresh install 🤷‍♂️

    Describe the bug

    On a fresh ferret install, I get "fatal error: unexpected signal during runtime execution" when I return JSON. Probably related to: https://github.com/MontFerret/worker/issues/27

    To Reproduce

    Steps to reproduce the behavior:

    1. go version go1.19 darwin/amd64
    2. go install github.com/MontFerret/cli/ferret@latest
    3. ferret exec --browser-open test.yml, with test.yml containing:

       RETURN 0

    4. ferret exec --browser-open test.yml, with test.yml containing:

       RETURN {'test': 0}

    The last command fails with:

    fatal error: unexpected signal during runtime execution
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1071705]
    
    goroutine 1 [running]:
    runtime.throw({0x1951176?, 0xc00031a0c0?})
    	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0xc000249190 sp=0xc000249160 pc=0x103649d
    runtime.sigpanic()
    	/usr/local/go/src/runtime/signal_unix.go:819 +0x369 fp=0xc0002491e0 sp=0xc000249190 pc=0x104c189
    sync.(*Pool).Get(0x2117ca0)
    	/usr/local/go/src/sync/pool.go:132 +0x25 fp=0xc000249218 sp=0xc0002491e0 pc=0x1071705
    github.com/wI2L/jettison.encodeSortedMap(0x2117c60, {0xc0000c4000, 0x1, 0x1000}, {{0x1a7f2b8, 0xc00019a008}, {0x194ad2c, 0x23}, 0x5, 0x80, ...}, ...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/encode.go:415 +0x7a fp=0xc000249358 sp=0xc000249218 pc=0x150c0ba
    github.com/wI2L/jettison.encodeMap(0x0?, {0xc0000c4000, 0x0, 0x1000}, {{0x1a7f2b8, 0xc00019a008}, {0x194ad2c, 0x23}, 0x5, 0x80, ...}, ...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/encode.go:364 +0x345 fp=0xc000249438 sp=0xc000249358 pc=0x150bc65
    github.com/wI2L/jettison.newMapInstr.func1(0x2117ba0?, {0xc0000c4000?, 0xc00031eb90?, 0xc0000c4000?}, {{0x1a7f2b8, 0xc00019a008}, {0x194ad2c, 0x23}, 0x5, 0x80, ...})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/instruction.go:400 +0x72 fp=0xc0002494c8 sp=0xc000249438 pc=0x1511e52
    github.com/wI2L/jettison.wrapInlineInstr.func1(0xc0003191d0, {0xc0000c4000?, 0xab389f8?, 0x40?}, {{0x1a7f2b8, 0xc00019a008}, {0x194ad2c, 0x23}, 0x5, 0x80, ...})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/instruction.go:406 +0x65 fp=0xc000249538 sp=0xc0002494c8 pc=0x1512065
    github.com/wI2L/jettison.marshalJSON({0x180ea20?, 0xc0003191d0?}, {{0x1a7f2b8, 0xc00019a008}, {0x194ad2c, 0x23}, 0x5, 0x80, 0x0, 0x0})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/json.go:167 +0xd9 fp=0xc000249600 sp=0xc000249538 pc=0x1513019
    github.com/wI2L/jettison.MarshalOpts({0x180ea20, 0xc0003191d0}, {0xc0002496e8, 0x1, 0x187e540?})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/w!i2!l/[email protected]/json.go:142 +0x1a9 fp=0xc0002496c0 sp=0xc000249600 pc=0x1512e09
    github.com/MontFerret/ferret/pkg/runtime/values.(*Object).MarshalJSON(0xc000318f30?)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/pkg/runtime/values/object.go:47 +0x45 fp=0xc000249700 sp=0xc0002496c0 pc=0x151f905
    github.com/MontFerret/ferret/pkg/runtime.(*Program).Run(0xc000318f90, {0x1a7f328, 0xc000318ff0}, {0xc000249910?, 0x0?, 0x0?})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/pkg/runtime/program.go:99 +0x366 fp=0xc0002498a8 sp=0xc000249700 pc=0x1526ea6
    github.com/MontFerret/cli/runtime.(*Builtin).Run(0xc00004a300, {0x1a7f328, 0xc0000b0000}, {0xc00003e498?, 0x1a77c00?}, 0x0?)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/runtime/builtin.go:55 +0x174 fp=0xc000249930 sp=0xc0002498a8 pc=0x17020f4
    github.com/MontFerret/cli/runtime.Run({0x1a7f328, 0xc0000b0000}, {{0x192b904, 0x7}, {0x1a77c00, 0x0}, {0x1a77c00, 0x0}, 0x0, 0x0, ...}, ...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/runtime/runtime.go:40 +0xaa fp=0xc0002499a8 sp=0xc000249930 pc=0x1703bca
    github.com/MontFerret/cli/cmd.execScript(0xc0001acec0?, {{0x192b904, 0x7}, {0x1a77c00, 0x0}, {0x1a77c00, 0x0}, 0x0, 0x0, 0x0, ...}, ...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/cmd/exec.go:131 +0x6e fp=0xc000249a60 sp=0xc0002499a8 pc=0x172c2ae
    github.com/MontFerret/cli/cmd.ExecCommand.func2(0xc0002c3900, {0xc00009a080, 0x1, 0x2?})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/cmd/exec.go:110 +0x4d1 fp=0xc000249cd0 sp=0xc000249a60 pc=0x172b8d1
    github.com/spf13/cobra.(*Command).execute(0xc0002c3900, {0xc0001ac020, 0x2, 0x2})
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x67c fp=0xc000249da8 sp=0xc000249cd0 pc=0x11d7e1c
    github.com/spf13/cobra.(*Command).ExecuteC(0xc0002c2a00)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bd fp=0xc000249e60 sp=0xc000249da8 pc=0x11d849d
    github.com/spf13/cobra.(*Command).Execute(...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
    github.com/spf13/cobra.(*Command).ExecuteContext(...)
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
    main.main()
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/ferret/main.go:70 +0x449 fp=0xc000249f80 sp=0xc000249e60 pc=0x172d409
    runtime.main()
    	/usr/local/go/src/runtime/proc.go:250 +0x212 fp=0xc000249fe0 sp=0xc000249f80 pc=0x1038cb2
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000249fe8 sp=0xc000249fe0 pc=0x10675c1
    
    goroutine 2 [force gc (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007afb0 sp=0xc00007af90 pc=0x1039076
    runtime.goparkunlock(...)
    	/usr/local/go/src/runtime/proc.go:369
    runtime.forcegchelper()
    	/usr/local/go/src/runtime/proc.go:302 +0xad fp=0xc00007afe0 sp=0xc00007afb0 pc=0x1038f0d
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007afe8 sp=0xc00007afe0 pc=0x10675c1
    created by runtime.init.6
    	/usr/local/go/src/runtime/proc.go:290 +0x25
    
    goroutine 3 [GC sweep wait]:
    runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007b790 sp=0xc00007b770 pc=0x1039076
    runtime.goparkunlock(...)
    	/usr/local/go/src/runtime/proc.go:369
    runtime.bgsweep(0x0?)
    	/usr/local/go/src/runtime/mgcsweep.go:297 +0xd7 fp=0xc00007b7c8 sp=0xc00007b790 pc=0x1026037
    runtime.gcenable.func1()
    	/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc00007b7e0 sp=0xc00007b7c8 pc=0x101ac86
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007b7e8 sp=0xc00007b7e0 pc=0x10675c1
    created by runtime.gcenable
    	/usr/local/go/src/runtime/mgc.go:178 +0x6b
    
    goroutine 4 [GC scavenge wait]:
    runtime.gopark(0xc0000a4000?, 0x1a77c38?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007bf70 sp=0xc00007bf50 pc=0x1039076
    runtime.goparkunlock(...)
    	/usr/local/go/src/runtime/proc.go:369
    runtime.(*scavengerState).park(0x2118720)
    	/usr/local/go/src/runtime/mgcscavenge.go:389 +0x53 fp=0xc00007bfa0 sp=0xc00007bf70 pc=0x1024093
    runtime.bgscavenge(0x0?)
    	/usr/local/go/src/runtime/mgcscavenge.go:622 +0x65 fp=0xc00007bfc8 sp=0xc00007bfa0 pc=0x1024685
    runtime.gcenable.func2()
    	/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00007bfe0 sp=0xc00007bfc8 pc=0x101ac26
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007bfe8 sp=0xc00007bfe0 pc=0x10675c1
    created by runtime.gcenable
    	/usr/local/go/src/runtime/mgc.go:179 +0xaa
    
    goroutine 18 [finalizer wait]:
    runtime.gopark(0x0?, 0x19b8750?, 0x0?, 0x60?, 0x2000000020?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007a628 sp=0xc00007a608 pc=0x1039076
    runtime.goparkunlock(...)
    	/usr/local/go/src/runtime/proc.go:369
    runtime.runfinq()
    	/usr/local/go/src/runtime/mfinal.go:180 +0x10f fp=0xc00007a7e0 sp=0xc00007a628 pc=0x1019d8f
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007a7e8 sp=0xc00007a7e0 pc=0x10675c1
    created by runtime.createfing
    	/usr/local/go/src/runtime/mfinal.go:157 +0x45
    
    goroutine 19 [select, locked to thread]:
    runtime.gopark(0xc0000767a8?, 0x2?, 0xf7?, 0x93?, 0xc0000767a4?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000114e20 sp=0xc000114e00 pc=0x1039076
    runtime.selectgo(0xc000114fa8, 0xc0000767a0, 0x0?, 0x0, 0x0?, 0x1)
    	/usr/local/go/src/runtime/select.go:328 +0x7bc fp=0xc000114f60 sp=0xc000114e20 pc=0x10483fc
    runtime.ensureSigM.func1()
    	/usr/local/go/src/runtime/signal_unix.go:991 +0x187 fp=0xc000114fe0 sp=0xc000114f60 pc=0x104c5e7
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000114fe8 sp=0xc000114fe0 pc=0x10675c1
    created by runtime.ensureSigM
    	/usr/local/go/src/runtime/signal_unix.go:974 +0xbd
    
    goroutine 5 [syscall]:
    runtime.sigNoteSleep(0x0)
    	/usr/local/go/src/runtime/os_darwin.go:123 +0x1e fp=0xc00007c7a0 sp=0xc00007c768 pc=0x103347e
    os/signal.signal_recv()
    	/usr/local/go/src/runtime/sigqueue.go:149 +0x28 fp=0xc00007c7c0 sp=0xc00007c7a0 pc=0x1063948
    os/signal.loop()
    	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19 fp=0xc00007c7e0 sp=0xc00007c7c0 pc=0x10f1259
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007c7e8 sp=0xc00007c7e0 pc=0x10675c1
    created by os/signal.Notify.func1.1
    	/usr/local/go/src/os/signal/signal.go:151 +0x2a
    
    goroutine 6 [chan receive]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007cef8 sp=0xc00007ced8 pc=0x1039076
    runtime.chanrecv(0xc0001a65a0, 0x0, 0x1)
    	/usr/local/go/src/runtime/chan.go:583 +0x49b fp=0xc00007cf88 sp=0xc00007cef8 pc=0x10081bb
    runtime.chanrecv1(0x0?, 0x0?)
    	/usr/local/go/src/runtime/chan.go:442 +0x18 fp=0xc00007cfb0 sp=0xc00007cf88 pc=0x1007cb8
    main.main.func3()
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/ferret/main.go:65 +0x2d fp=0xc00007cfe0 sp=0xc00007cfb0 pc=0x172d46d
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007cfe8 sp=0xc00007cfe0 pc=0x10675c1
    created by main.main
    	/Users/pbrisorgueil/Documents/Dev/go/pkg/mod/github.com/!mont!ferret/[email protected]/ferret/main.go:63 +0x3de
    
    goroutine 12 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032a750 sp=0xc00032a730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032a7e0 sp=0xc00032a750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032a7e8 sp=0xc00032a7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 11 [GC worker (idle)]:
    runtime.gopark(0x2121580?, 0xc00007df88?, 0xe5?, 0x31?, 0xc000007d40?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007df50 sp=0xc00007df30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00007dfe0 sp=0xc00007df50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007dfe8 sp=0xc00007dfe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 22 [GC worker (idle)]:
    runtime.gopark(0x6f188fe939b55?, 0x3?, 0x1e?, 0xb1?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00007d750 sp=0xc00007d730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00007d7e0 sp=0xc00007d750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00007d7e8 sp=0xc00007d7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 20 [IO wait]:
    runtime.gopark(0x5?, 0xc0002d8000?, 0x0?, 0x10?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000088af8 sp=0xc000088ad8 pc=0x1039076
    runtime.netpollblock(0x107a129?, 0x1087f97?, 0x0?)
    	/usr/local/go/src/runtime/netpoll.go:526 +0xf7 fp=0xc000088b30 sp=0xc000088af8 pc=0x10322f7
    internal/poll.runtime_pollWait(0xab11f08, 0x72)
    	/usr/local/go/src/runtime/netpoll.go:305 +0x89 fp=0xc000088b50 sp=0xc000088b30 pc=0x1061949
    internal/poll.(*pollDesc).wait(0xc000334000?, 0xc0002d8000?, 0x0)
    	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc000088b78 sp=0xc000088b50 pc=0x10cfdb2
    internal/poll.(*pollDesc).waitRead(...)
    	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
    internal/poll.(*FD).Read(0xc000334000, {0xc0002d8000, 0x1000, 0x1000})
    	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a fp=0xc000088bf8 sp=0xc000088b78 pc=0x10d111a
    net.(*netFD).Read(0xc000334000, {0xc0002d8000?, 0x0?, 0x4?})
    	/usr/local/go/src/net/fd_posix.go:55 +0x29 fp=0xc000088c40 sp=0xc000088bf8 pc=0x1112149
    net.(*conn).Read(0xc0001a4178, {0xc0002d8000?, 0xc00031c018?, 0x1?})
    	/usr/local/go/src/net/net.go:183 +0x45 fp=0xc000088c88 sp=0xc000088c40 pc=0x1120085
    net/http.(*persistConn).Read(0xc000332000, {0xc0002d8000?, 0x1049180?, 0xc000088ec8?})
    	/usr/local/go/src/net/http/transport.go:1929 +0x4e fp=0xc000088ce8 sp=0xc000088c88 pc=0x132af2e
    bufio.(*Reader).fill(0xc0001a66c0)
    	/usr/local/go/src/bufio/bufio.go:106 +0xff fp=0xc000088d20 sp=0xc000088ce8 pc=0x11828bf
    bufio.(*Reader).Peek(0xc0001a66c0, 0x1)
    	/usr/local/go/src/bufio/bufio.go:144 +0x5d fp=0xc000088d40 sp=0xc000088d20 pc=0x1182a1d
    net/http.(*persistConn).readLoop(0xc000332000)
    	/usr/local/go/src/net/http/transport.go:2093 +0x1ac fp=0xc000088fc8 sp=0xc000088d40 pc=0x132bd4c
    net/http.(*Transport).dialConn.func5()
    	/usr/local/go/src/net/http/transport.go:1751 +0x26 fp=0xc000088fe0 sp=0xc000088fc8 pc=0x132a526
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000088fe8 sp=0xc000088fe0 pc=0x10675c1
    created by net/http.(*Transport).dialConn
    	/usr/local/go/src/net/http/transport.go:1751 +0x173e
    
    goroutine 21 [select]:
    runtime.gopark(0xc000110f90?, 0x2?, 0xd8?, 0xd?, 0xc000110f24?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000110d90 sp=0xc000110d70 pc=0x1039076
    runtime.selectgo(0xc000110f90, 0xc000110f20, 0xc0001acf80?, 0x0, 0xc00027efc0?, 0x1)
    	/usr/local/go/src/runtime/select.go:328 +0x7bc fp=0xc000110ed0 sp=0xc000110d90 pc=0x10483fc
    net/http.(*persistConn).writeLoop(0xc000332000)
    	/usr/local/go/src/net/http/transport.go:2392 +0xf5 fp=0xc000110fc8 sp=0xc000110ed0 pc=0x132d9d5
    net/http.(*Transport).dialConn.func6()
    	/usr/local/go/src/net/http/transport.go:1752 +0x26 fp=0xc000110fe0 sp=0xc000110fc8 pc=0x132a4c6
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000110fe8 sp=0xc000110fe0 pc=0x10675c1
    created by net/http.(*Transport).dialConn
    	/usr/local/go/src/net/http/transport.go:1752 +0x1791
    
    goroutine 36 [GC worker (idle)]:
    runtime.gopark(0xc0003260c0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000076f50 sp=0xc000076f30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc000076fe0 sp=0xc000076f50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 37 [GC worker (idle)]:
    runtime.gopark(0x4?, 0xc000314020?, 0xe?, 0x0?, 0xc0001c3700?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032e750 sp=0xc00032e730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032e7e0 sp=0xc00032e750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032e7e8 sp=0xc00032e7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 13 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032af50 sp=0xc00032af30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032afe0 sp=0xc00032af50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032afe8 sp=0xc00032afe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 38 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032ef50 sp=0xc00032ef30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032efe0 sp=0xc00032ef50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032efe8 sp=0xc00032efe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 39 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032f750 sp=0xc00032f730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032f7e0 sp=0xc00032f750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032f7e8 sp=0xc00032f7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 14 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032b750 sp=0xc00032b730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032b7e0 sp=0xc00032b750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032b7e8 sp=0xc00032b7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 15 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032bf50 sp=0xc00032bf30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032bfe0 sp=0xc00032bf50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032bfe8 sp=0xc00032bfe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 40 [GC worker (idle)]:
    runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032ff50 sp=0xc00032ff30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032ffe0 sp=0xc00032ff50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032ffe8 sp=0xc00032ffe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 23 [GC worker (idle)]:
    runtime.gopark(0x6f188fe93942c?, 0x3?, 0x9f?, 0x67?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000077750 sp=0xc000077730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc0000777e0 sp=0xc000077750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 24 [GC worker (idle)]:
    runtime.gopark(0x2153dc0?, 0x1?, 0x7?, 0xcd?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000077f50 sp=0xc000077f30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc000077fe0 sp=0xc000077f50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 25 [GC worker (idle)]:
    runtime.gopark(0x6f188fe93cee5?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000078750 sp=0xc000078730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc0000787e0 sp=0xc000078750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000787e8 sp=0xc0000787e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 16 [GC worker (idle)]:
    runtime.gopark(0x6f188fe939a5a?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00032c750 sp=0xc00032c730 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc00032c7e0 sp=0xc00032c750 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00032c7e8 sp=0xc00032c7e0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    
    goroutine 26 [GC worker (idle)]:
    runtime.gopark(0x6f188fe939437?, 0x0?, 0x0?, 0x0?, 0x0?)
    	/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000078f50 sp=0xc000078f30 pc=0x1039076
    runtime.gcBgMarkWorker()
    	/usr/local/go/src/runtime/mgc.go:1235 +0xf1 fp=0xc000078fe0 sp=0xc000078f50 pc=0x101cdd1
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000078fe8 sp=0xc000078fe0 pc=0x10675c1
    created by runtime.gcBgMarkStartWorkers
    	/usr/local/go/src/runtime/mgc.go:1159 +0x25
    

    Expected behavior: No error.

    Desktop (please complete the following information): Mac (Intel), ferret browser, latest version

    Additional context: Strangely, it works fine with the prebuilt binaries: https://github.com/MontFerret/cli/releases/download/v1.8.0/cli_darwin_x86_64.tar.gz

  • Installation documentation is incorrect

    Describe the bug: The installation docs don't work.

    Running any of these commands does not produce a successful outcome.

    To Reproduce: Steps to reproduce the behavior:

    1. Run any of the commands in the screenshot above

    Expected behavior: Ferret installs successfully.

    Desktop (please complete the following information):

    • OS: macOS 12
    • Version: Latest

    Additional context: This command works instead: go install github.com/MontFerret/cli/ferret@latest

Implementing WEB Scraping with Go

WEB Scraping with Go: In this project I implement a web scraper that creates a CSV file with quotes and authors from the Pensador programming web page. R

Dec 10, 2021
🦙 acao(阿草), the tool man for data scraping of https://asoul.video/.

🦙 acao acao(阿草), the tool man for data scraping of https://asoul.video/. Deploy to Aliyun serverless function with Raika update_member Update A-SOUL

Jul 25, 2022
A neat wrapper around the 4chan API for content scraping.

moonarch A neat wrapper around the 4chan API for content scraping. How-To First, get the repository: go get github.com/lazdotdigital/fourscrape. fours

Nov 27, 2021
Distributed web crawler admin platform for managing spiders regardless of language or framework.

Crawlab (Chinese | English) Installation | Run | Screenshot | Architecture | Integration | Compare | Community & Sponsorship | CHANGELOG | Disclaimer Golang-

Jan 7, 2023
ant (alpha) is a web crawler for Go.

The package includes functions that can scan data from the page into your structs or slice of structs, this allows you to reduce the noise and complexity in your source-code.

Dec 30, 2022
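ant's own API is not reproduced here. To illustrate the general idea of scanning page data into structs, the following minimal sketch uses reflection and struct tags. The scrape tag, the scanInto function, and the map of already-extracted values are hypothetical stand-ins, not ant's interface.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// scanInto copies values from data into the exported string fields of the
// struct that dst points to. A field opts in with a `scrape:"key"` tag.
// The tag name and the map input are hypothetical stand-ins for values a
// crawler would extract from a page.
func scanInto(data map[string]string, dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return errors.New("dst must be a pointer to a struct")
	}
	s := v.Elem()
	t := s.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		key, ok := field.Tag.Lookup("scrape")
		if !ok || field.Type.Kind() != reflect.String || !s.Field(i).CanSet() {
			continue // skip untagged, non-string, or unexported fields
		}
		if val, found := data[key]; found {
			s.Field(i).SetString(val)
		}
	}
	return nil
}

// Article is a hypothetical target struct.
type Article struct {
	Title  string `scrape:"title"`
	Author string `scrape:"author"`
	Notes  string // untagged, so scanInto leaves it untouched
}

func main() {
	var a Article
	extracted := map[string]string{"title": "Hello", "author": "Jane"}
	if err := scanInto(extracted, &a); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", a)
}
```

Declaring the mapping in tags keeps selector details out of the business logic, which matches the noise-reduction goal described above.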
Fetch web pages using headless Chrome, storing all fetched resources including JavaScript files

Fetch web pages using headless Chrome, storing all fetched resources including JavaScript files. Run arbitrary JavaScript on many web pages and see the returned values

Dec 29, 2022
Web Scraper in Go, similar to BeautifulSoup

soup Web Scraper in Go, similar to BeautifulSoup soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSou

Jan 9, 2023
Gospider - Fast web spider written in Go

GoSpider GoSpider - Fast web spider written in Go. Painlessly integrate Gospider into your recon workflow? Enjoying this tool? Support its development a

Dec 31, 2022
Apollo 💎 A Unix-style personal search engine and web crawler for your digital footprint.

Apollo 💎 A Unix-style personal search engine and web crawler for your digital footprint. Contents Background Thesis Design Archite

Dec 27, 2022
DataHen Till is a standalone tool that instantly makes your existing web scraper scalable, maintainable, and more unblockable, with minimal code changes on your scraper.

Dec 14, 2022
Fast, highly configurable, cloud native dark web crawler.

Bathyscaphe dark web crawler Bathyscaphe is a fast, highly configurable, cloud-native dark web crawler written in Go. How to start the crawler To start

Nov 22, 2022
Just a web crawler

gh-dependents gh command extension to see dependents of your repository. See The GitHub Blog: GitHub CLI 2.0 includes extensions! Install gh extension

Sep 27, 2022
Golang based web site opengraph data scraper with caching

Snapper A Web microservice for capturing a website's OpenGraph data built in Golang Building Snapper building the binary git clone https://github.com/

Oct 5, 2022
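A rough sketch of how OpenGraph scraping with caching might fit together. It assumes the page HTML comes from an injected fetch function (hypothetical, for example a wrapper around net/http) and uses a regular expression where a real implementation would more likely use an HTML parser. None of these names come from Snapper.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// ogMeta matches <meta property="og:..." content="..."> when property comes
// before content. A regex is a simplification; an HTML parser handles
// attribute order and escaping properly.
var ogMeta = regexp.MustCompile(`<meta\s+property="og:([^"]+)"\s+content="([^"]*)"`)

// parseOpenGraph returns the og:* properties in html, keyed without "og:".
func parseOpenGraph(html string) map[string]string {
	props := make(map[string]string)
	for _, m := range ogMeta.FindAllStringSubmatch(html, -1) {
		props[m[1]] = m[2]
	}
	return props
}

// ogCache memoizes parsed OpenGraph data per URL. fetch is a hypothetical
// page loader supplied by the caller.
type ogCache struct {
	mu    sync.Mutex
	data  map[string]map[string]string
	fetch func(url string) (string, error)
}

func newOGCache(fetch func(string) (string, error)) *ogCache {
	return &ogCache{data: make(map[string]map[string]string), fetch: fetch}
}

// Get returns cached data for url, fetching and parsing it on first use.
func (c *ogCache) Get(url string) (map[string]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if props, ok := c.data[url]; ok {
		return props, nil
	}
	html, err := c.fetch(url)
	if err != nil {
		return nil, err // errors are not cached, so a later call retries
	}
	props := parseOpenGraph(html)
	c.data[url] = props
	return props, nil
}

func main() {
	calls := 0
	cache := newOGCache(func(url string) (string, error) {
		calls++
		return `<meta property="og:title" content="Example">`, nil
	})
	for i := 0; i < 2; i++ {
		props, _ := cache.Get("https://example.com/")
		fmt.Println(props["title"], calls)
	}
}
```

Holding the mutex during fetch keeps the sketch simple but serializes all lookups; a production cache would likely also want expiry.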
Crawls web pages and prints any link it can find.

crawley Crawls web pages and prints any link it can find. Scan depth (0 by default) can be configured. Features: fast SAX-parser (powered by golang.o

Jan 4, 2023
WebWalker - Fast Script To Walk the Web and Find URLs

WebWalker sends an HTTP request to a URL, collects all the URLs on that page, then sends HTTP requests to those URLs, and repeats. WebWalker can find 10,000 URLs in 10 seconds.

Nov 28, 2021
skweez spiders web pages and extracts words for wordlist generation.

skweez skweez (pronounced like "squeeze") spiders web pages and extracts words for wordlist generation. It is basically an attempt to make a more oper

Nov 27, 2022
Examples for chromedp for web scraping

About chromedp examples This folder contains a variety of code examples for working with chromedp. The godoc page contains a number of simple examples

Nov 30, 2021
Dumbass-news - A web service to report dumbass news

Dumbass News - a web service to report dumbass news Copyright (C) 2022 Mike Tayl

Jan 18, 2022
A recursive, mirroring web crawler that retrieves child links.

Jan 29, 2022