GoSpider - Fast web spider written in Go

Painlessly integrate Gospider into your recon workflow with HunterSuite.

Enjoying this tool? Support its development and take your game to the next level by using HunterSuite.io

Installation

go get -u github.com/jaeles-project/gospider
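
Note: on newer Go toolchains, installing executables with go get is deprecated (see the "Update README.md" note in the comments below), so the module-aware form may be needed instead:

go install github.com/jaeles-project/gospider@latest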

Features

  • Fast web crawling
  • Brute force and parse sitemap.xml
  • Parse robots.txt
  • Generate and verify links from JavaScript files
  • Link Finder
  • Find AWS S3 buckets in response source
  • Find subdomains in response source
  • Get URLs from Wayback Machine, Common Crawl, VirusTotal, AlienVault
  • Grep-friendly output format
  • Support Burp input
  • Crawl multiple sites in parallel
  • Random mobile/web User-Agent

Showcases

(asciinema demo recording)

Usage

Fast web spider written in Go - v1.1.5 by @thebl4ckturtle & @j3ssiejjj

Usage:
  gospider [flags]

Flags:
  -s, --site string               Site to crawl
  -S, --sites string              Site list to crawl
  -p, --proxy string              Proxy (Ex: http://127.0.0.1:8080)
  -o, --output string             Output folder
  -u, --user-agent string         User Agent to use
                                  	web: random web user-agent
                                  	mobi: random mobile user-agent
                                  	or you can set your special user-agent (default "web")
      --cookie string             Cookie to use (testA=a; testB=b)
  -H, --header stringArray        Header to use (Use multiple flag to set multiple header)
      --burp string               Load headers and cookie from burp raw http request
      --blacklist string          Blacklist URL Regex
      --whitelist string          Whitelist URL Regex
      --whitelist-domain string   Whitelist Domain
  -t, --threads int               Number of threads (Run sites in parallel) (default 1)
  -c, --concurrent int            The number of the maximum allowed concurrent requests of the matching domains (default 5)
  -d, --depth int                 MaxDepth limits the recursion depth of visited URLs. (Set it to 0 for infinite recursion) (default 1)
  -k, --delay int                 Delay is the duration to wait before creating a new request to the matching domains (second)
  -K, --random-delay int          RandomDelay is the extra randomized duration to wait added to Delay before creating a new request (second)
  -m, --timeout int               Request timeout (second) (default 10)
  -B, --base                      Disable all and only use HTML content
      --js                        Enable linkfinder in javascript file (default true)
      --subs                      Include subdomains
      --sitemap                   Try to crawl sitemap.xml
      --robots                    Try to crawl robots.txt (default true)
  -a, --other-source              Find URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
  -w, --include-subs              Include subdomains crawled from 3rd party. Default is main domain
  -r, --include-other-source      Also include other-source's urls (still crawl and request)
      --debug                     Turn on debug mode
      --json                      Enable JSON output
  -v, --verbose                   Turn on verbose
  -l, --length                    Turn on length
  -L, --filter-length             Turn on length filter
  -R, --raw                       Turn on raw
  -q, --quiet                     Suppress all the output and only show URL
      --no-redirect               Disable redirect
      --version                   Check version
  -h, --help                      help for gospider

Example commands

Quiet output

gospider -q -s "https://google.com/"

Run with single site

gospider -s "https://google.com/" -o output -c 10 -d 1

Run with site list

gospider -S sites.txt -o output -c 10 -d 1

Run 20 sites at the same time, with 10 bots per site

gospider -S sites.txt -o output -c 10 -d 1 -t 20

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs

Use custom header/cookies

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt

Blacklist URLs/file extensions.

P/s: gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico) by default

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
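
Whitelisting works the same way with the --whitelist (URL regex) and --whitelist-domain flags documented above; an illustrative sketch (the pattern is an assumption, not from the original README):

gospider -s "https://google.com/" -o output -c 10 -d 1 --whitelist "google"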

Show response length and filter by length.

gospider -s "https://google.com/" -o output -c 10 -d 1 --length --filter-length "6871,24432"   
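
Because the output is formatted to be grep-friendly, quiet mode can be piped into standard tools; an illustrative pipeline (not from the original README):

gospider -q -s "https://google.com/" | grep -E "\.js$"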

License

Gospider is made with ❤ by @j3ssiejjj & @thebl4ckturtle and is released under the MIT license.

Donation

PayPal

Owner

Jaeles Project - The Swiss Army knife for automated Web Application Testing

Comments
  • Help to install the script on Mac

    Hi

    Can you please explain how I can install this tool on a Mac?

    Thanks

    Pentest@tools ~ % sudo go get -u github.com/jaeles-project/gospider
    Password:
    Pentest@tools~ % gospider -s "https://google.com/" -o output -c 10 -d 1
    zsh: command not found: gospider
    
  • Too many open files

    Command: gospider -S "urls files" -o outuputfles -c 5 -t 100 -d 2 --other-source -v --robots --sitemap -u web

    Error: [0024] ERROR Failed to open file to write Output: open *********/target_folder : too many open files
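
    A common workaround for this class of error (a general note, not part of the original thread) is to raise the shell's open-file limit before running gospider, or to lower -t so fewer output files are open at once:

    ulimit -n 8192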

  • go get installation errors

    go get -u github.com/jaeles-project/gospider
    # github.com/jaeles-project/gospider/core
    go/src/github.com/jaeles-project/gospider/core/crawler.go:27:17: unknown field 'MaxConnsPerHost' in struct literal of type http.Transport
    go/src/github.com/jaeles-project/gospider/core/crawler.go:183:15: undefined: strings.ReplaceAll
    go/src/github.com/jaeles-project/gospider/core/crawler.go:296:20: undefined: strings.ReplaceAll
    go/src/github.com/jaeles-project/gospider/core/linkfinder.go:14:12: undefined: strings.ReplaceAll
    go/src/github.com/jaeles-project/gospider/core/linkfinder.go:15:12: undefined: strings.ReplaceAll
    

    This happens when downloading directly with go get -u github.com/jaeles-project/gospider

    Any advice? Wrong Go version?

    OS: Ubuntu 18.04.3 LTS x86_64 Kernel: 4.15.0-76-generic Go Version: go version go1.10.4 linux/amd64
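
    Note (not part of the original thread): strings.ReplaceAll was added in Go 1.12 and http.Transport.MaxConnsPerHost in Go 1.11, so go1.10.4 is too old to build this code; upgrading Go and rebuilding should resolve these errors.

    go version   # should report go1.12 or newer before rebuilding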

  • Add features and fixes

    Added features:

    • Show length and add length filter
    • Show raw source code
    • Crawl all subdomains

    Fixes:

    • Fix linkfinder (relative paths, JS in JS, output, ...)
    • Fix subdomains
    • Fix href output
    • Fix case sensitivity for duplicate detection
    • Delay all requests
  • GoSpider doesn't seem to honor the delay parameter

    Hi @j3ssie and team,

    From what I can tell, setting the -k/--delay parameter doesn't delay anything. GoSpider still requests URLs faster than expected.

    Can you replicate this issue?
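
    (For reference, the delay flags as documented in the Usage section above would be passed like this; the command is illustrative only, not from the thread:)

    gospider -s "https://google.com/" -k 5 -K 2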

  • Empty output specifying HTTP(S) port

    Description

    When I try to run gospider on a URL that also specifies the HTTP port, sometimes it doesn't crawl the target, and I don't know exactly why.

    Go version

    go version go1.16.2 linux/amd64

    Gospider Version

    1.1.5 (In the last commit of https://github.com/jaeles-project/gospider/blob/2e610b3fd79e1ac0945b694385edd88028f821ce/core/version.go the version is wrong btw)

    Test case 1 - Not specifying http or https port

    ./gospider -q -s https://shippingmanager.bpost.be/ --debug
    
    [0000]  INFO Start crawling: https://shippingmanager.bpost.be/
    [0000]  INFO Found robots.txt: https://shippingmanager.bpost.be//robots.txt
    https://shippingmanager.bpost.be/ShmFrontEnd/
    [0000]  INFO Done.
    

    Test case 2 - Specifying the port:

    ./gospider -q -s https://shippingmanager.bpost.be:443/ --debug
    

    (screenshot omitted)

  • Include response length in output

    It would be useful if the response length was included in the output, or even better, have a way to filter the output by response length (!=, <, >). This would allow the user to filter out expected responses during enumeration.
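
    (Note, not part of the original thread: the -l/--length and -L/--filter-length flags documented in the Usage section above appear to cover this; an illustrative command:)

    gospider -s "https://google.com/" -l -L "6871,24432"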

  • Subdomains are not shown in output

    Hey, since the last update the gospider tool is not showing subdomains in the output. I have checked this with multiple flags, but it's not working.

    (screenshot omitted)

    The line with the pointed arrow is now missing from gospider output.

    Can you please add it back?

    Best,

  • Missing License

    Hello, I didn't find license information. Can you add a LICENSE file or add the license information in the README.md?

    I would like to package it for Kali Linux: https://bugs.kali.org/view.php?id=6514

    Thanks.

  • Update README.md

    Installing executables with "go get" in module mode is deprecated. "go install pkg@version" should be used instead. For more information, see https://golang.org/doc/go-get-install-deprecation

  • removing lower case conversion of paths and parameters

    Gospider was converting case-sensitive paths and parameters to lowercase, which resulted in many valid case-sensitive paths and parameters returning 404 Not Found. For example, a path found in HTML or JavaScript source, /SearchLive.php?Param=1, was converted to /searchlive.php?param=1.

  • Output only URLs

    Hi,

    At first, congratulations on this project. I have an issue, maybe my mistake, but I want to send only URLs to stdout, without tags like [url] and [code-200]. Is that possible?

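
    (Note, not part of the original thread: the -q/--quiet flag documented above is described as "Suppress all the output and only show URL", which appears to do exactly this:)

    gospider -q -s "https://google.com/"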

  • RAM usage

    Gospider uses a lot of RAM. Its RAM usage keeps increasing over time, and it chokes my server if it keeps running for a few hours, even with low thread counts (2-3).

    Is there any solution to this? or can you kindly solve this issue if possible?

    Thanks

  • Add Dockerfile

    FROM golang:1.17.8-alpine3.14 AS build-env
    RUN apk add --no-cache build-base
    RUN go install github.com/jaeles-project/gospider@latest
    
    FROM alpine:3.15.0
    RUN apk add --no-cache bind-tools ca-certificates
    COPY --from=build-env /go/bin/gospider /usr/local/bin/gospider
    ENTRYPOINT ["gospider"]
    

    Run Docker

    docker build -t gospider .
    docker run --rm -t gospider -q -s "https://google.com/"
    
  • issue with URLs containing dashes

    URLs containing dashes in a list cannot be parsed: http://ec2-XXX-XX-XX-XXX.compute-1.amazonaws.com

    gospider -S test.txt -v
    [0000] ERROR Failed to parse domain
