A download manager package for Go

grab

Downloading the internet, one goroutine at a time!

$ go get github.com/cavaliercoder/grab

Grab is a Go package for downloading files from the internet with the following rad features:

  • Monitor download progress concurrently
  • Auto-resume incomplete downloads
  • Guess filename from content header or URL path
  • Safely cancel downloads using context.Context
  • Validate downloads using checksums
  • Download batches of files concurrently
  • Apply rate limiters

Requires Go v1.7+

Example

The following example downloads a PDF copy of the free eBook, "An Introduction to Programming in Go" into the current working directory.

resp, err := grab.Get(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
if err != nil {
	log.Fatal(err)
}

fmt.Println("Download saved to", resp.Filename)

The following, more complete example allows for more granular control and periodically prints the download progress until it is complete.

The second time you run the example, it will auto-resume the previous download and exit sooner.

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/cavaliercoder/grab"
)

func main() {
	// create client
	client := grab.NewClient()
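	req, err := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
	if err != nil {
		// a malformed URL would otherwise leave req nil and panic below
		fmt.Fprintf(os.Stderr, "Invalid request: %v\n", err)
		os.Exit(1)
	}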

	// start download
	fmt.Printf("Downloading %v...\n", req.URL())
	resp := client.Do(req)
	fmt.Printf("  %v\n", resp.HTTPResponse.Status)

	// start UI loop
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()

Loop:
	for {
		select {
		case <-t.C:
			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
				resp.BytesComplete(),
				resp.Size(),
				100*resp.Progress())

		case <-resp.Done:
			// download is complete
			break Loop
		}
	}

	// check for errors
	if err := resp.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Download saved to ./%v \n", resp.Filename)

	// Output:
	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
	//   200 OK
	//   transferred 42970 / 2893557 bytes (1.49%)
	//   transferred 1207474 / 2893557 bytes (41.73%)
	//   transferred 2758210 / 2893557 bytes (95.32%)
	// Download saved to ./gobook.pdf
}

Design trade-offs

The primary use case for Grab is concurrently downloading thousands of large files from remote file repositories where the remote files are immutable. Examples include operating system package repositories or ISO libraries.

Grab aims to provide robust, sane defaults. These are usually determined using the HTTP specifications, or by mimicking the behavior of common web clients like cURL, wget and common web browsers.

Grab aims to be stateless. The only state that exists is the remote files you wish to download and the local copy, which may be completed, partially completed or not yet created. The advantage of this is that the local file system is not cluttered unnecessarily with additional state files (like a .crdownload file). The disadvantage of this approach is that grab must make assumptions about the local and remote state; specifically, that they have not been modified by another program.

If the local or remote file is modified outside of grab, and you download the file again with resuming enabled, the local file will likely become corrupted. In this case, you might consider making remote files immutable, or disabling resume.
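To illustrate why a stateless resume corrupts a modified file, here is a minimal sketch using only the standard library. It does not use grab itself; the `resume` and `demo` helpers and the 10-byte payload are made up for the illustration. The client asks for every byte past the length of its local copy and appends the result, whether or not the local bytes match the remote file.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// resume requests the bytes after len(local) from url and appends them to
// local, the way a resume without any state file would. It returns the HTTP
// status line and the merged content.
func resume(url, local string) (string, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", "", err
	}
	// Ask only for the bytes the local copy is presumed to be missing.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", len(local)))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	rest, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return resp.Status, local + string(rest), nil
}

// demo serves a 10-byte "remote file" and resumes on top of unrelated
// local bytes (hypothetical data, chosen to make the damage visible).
func demo() (status, merged string, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent honours Range headers.
		http.ServeContent(w, r, "file.bin", time.Time{}, strings.NewReader("0123456789"))
	}))
	defer srv.Close()
	return resume(srv.URL, "XXXX")
}

func main() {
	status, merged, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(status, merged)
}
```

The merged result has the right length but the wrong first four bytes, which is why a checksum is the only reliable way to detect this case.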

Grab aims to enable best-in-class functionality for more complex features through extensible interfaces, rather than reimplementation. For example, you can provide your own Hash algorithm to compute file checksums, or your own rate limiter implementation (with all the associated trade-offs) to rate limit downloads.

Owner
Ryan Armstrong
Husband, Father, Gopher, PE @ Facebook
Comments
  • Download is not resumed after killing the application with ctrl + c

    I made an application that uses your lib and when I hit ctrl + c and then execute it again it starts downloading from 0. I'm downloading the same URL on the same path.

  • nil pointer dereference @ response.go:81

    After update:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x1 addr=0x4 pc=0xc6974]

    goroutine 34 [running]:
    panic(0x346eb0, 0x1070a038)
            /root/.gvm/gos/go1.6/src/runtime/panic.go:464 +0x330
    sync/atomic.loadUint64(0x1080a1d4, 0x0, 0x0)
            /root/.gvm/gos/go1.6/src/sync/atomic/64bit_arm.go:10 +0x54
    github.com/cavaliercoder/grab.(*Response).BytesTransferred(0x1080a180, 0x4, 0x386878)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/response.go:81 +0x40
    github.com/cavaliercoder/grab.(*Client).do(0x107b79c0, 0x107f80f0, 0x0, 0x0, 0x0)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:222 +0x3c0
    github.com/cavaliercoder/grab.(*Client).DoAsync.func1(0x107b79c0, 0x107f80f0, 0x10802380)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:94 +0x24
    created by github.com/cavaliercoder/grab.(*Client).DoAsync
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:102 +0x60

  • readme: fix syntax error in example

    grab.Response.Size is an int64 value and not a function

    I noticed this problem when testing the example in the readme.

    Hopefully this could help others...

  • DeleteOnError does not seem to work (at least on Windows)

    // create download request
    req, err := NewRequest("", "http://example.com/example.zip")
    if err != nil {
        panic(err)
    }
    
    // set request checksum
    sum, err := hex.DecodeString("33daf4c03f86120fdfdc66bddf6bfff4661c7ca11c5da473e537f4d69b470e57")
    if err != nil {
        panic(err)
    }
    req.SetChecksum(sha256.New(), sum, true)
    
    // download and validate file
    resp := DefaultClient.Do(req)
    if err := resp.Err(); err != nil {
        panic(err)
    }
    

    Although there is a checksum mismatch, the file will not be removed:

    panic: checksum mismatch
    
    goroutine 1 [running]:
    main.downloadOpenjdk()
            C:/Users/path/to/main.go:88 +0x1b6
    main.main()
            C:/Users/path/to/main.go:94 +0x2c
    exit status 2
    
    

    Apart from this, how can I prevent the file from being downloaded anyway when there is a checksum mismatch?

    I think one of the first improvements could be adding error handling to the os.Remove call at https://github.com/cavaliercoder/grab/blob/925bcfe56bc16868f1a398af4231cd4ffa07276f/client.go#L294 so that at least an error message is returned. In my opinion, the error should not be discarded.

  • Response from github is 403 when downloading a release file.

    I am not able to download a release artifact from GitHub.

    package main
    
    import (
    	"fmt"
    	"log"
    
    	"github.com/cavaliercoder/grab"
    )
    
    func main() {
    	client := grab.NewClient()
    	req, err := grab.NewRequest("", "https://github.com/minishift/minishift-centos-iso/releases/download/v1.12.0/minishift-centos7.iso")
    	if err != nil {
    		log.Fatal(err)
    	}
    	resp := client.Do(req)
    	fmt.Printf("Response is: %v\n", resp.HTTPResponse.Status)
    }
    

    The response is unexpected:

    ==== Output ====
    $ go run test.go
    Response is: 403 Forbidden
    
  • Synchronized access to Response.transfer and Response.bytesResumed

    I've been getting "race condition detected" errors during tests. I display a progress bar during the download (from another goroutine), so those fields need to be protected for concurrent access.
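One common way to make such a counter safe to poll from a UI goroutine is `sync/atomic`. The sketch below is a generic illustration, not grab's actual implementation; `progress` and `simulate` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// progress tracks transferred bytes. The counter is only touched through
// sync/atomic, so one goroutine can read it while others update it.
// n is the first field so it stays 64-bit aligned on 32-bit platforms.
type progress struct {
	n int64
}

func (p *progress) add(delta int64) { atomic.AddInt64(&p.n, delta) }
func (p *progress) bytes() int64    { return atomic.LoadInt64(&p.n) }

// simulate runs `workers` goroutines that each record `chunks` chunks of
// `size` bytes; the caller may poll p.bytes() concurrently.
func simulate(p *progress, workers, chunks int, size int64) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < chunks; i++ {
				p.add(size)
			}
		}()
	}
	wg.Wait()
}

func main() {
	p := &progress{}
	simulate(p, 4, 1000, 4096)
	fmt.Println("bytes transferred:", p.bytes())
}
```

Running this with `go run -race` should report no races, because every access to the counter goes through an atomic operation.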

  • Corrupted contents when downloading on top of wrong file

    Grab looked like a good fit for a project I'm working on so I gave it a spin. I found that it downloaded a file perfectly and when asked to download the same file again managed to avoid downloading all the bytes again, which was just what I was looking for.

    I then overwrote the downloaded file with completely different contents and then downloaded again using grab. The message:

      206 Partial Content
    

    was emitted and the download was apparently successful. The downloaded file even had the same number of bytes as the original, but unfortunately the contents were corrupted.

    Fortunately, this problem is easily reproduced using the example program in the README:

    $ go run main.go
    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      200 OK
    Download saved to ./gobook.pdf
    $ mv gobook.pdf gobook.pdf.good
    $ cp main.go gobook.pdf
    $ go run main.go
    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      206 Partial Content
    Download saved to ./gobook.pdf
    $ diff gobook.pdf gobook.pdf.good
    Binary files gobook.pdf and gobook.pdf.good differ
    $ ls -l
    total 11320
    -rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf
    -rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf.good
    -rw-r--r--  1 gnormington  staff     1139  3 Nov 11:10 main.go
    

    The environment is go version go1.9.2 darwin/amd64 on macOS 10.13.1.

    In case the README changes, the contents of main.go above are:

    package main
    
    import (
    	"fmt"
    	"os"
    	"time"
    
    	"github.com/cavaliercoder/grab"
    )
    
    func main() {
    	// create client
    	client := grab.NewClient()
    	req, _ := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
    
    	// start download
    	fmt.Printf("Downloading %v...\n", req.URL())
    	resp := client.Do(req)
    	fmt.Printf("  %v\n", resp.HTTPResponse.Status)
    
    	// start UI loop
    	t := time.NewTicker(500 * time.Millisecond)
    	defer t.Stop()
    
    Loop:
    	for {
    		select {
    		case <-t.C:
    			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
    				resp.BytesComplete(),
    				resp.Size,
    				100*resp.Progress())
    
    		case <-resp.Done:
    			// download is complete
    			break Loop
    		}
    	}
    
    	// check for errors
    	if err := resp.Err(); err != nil {
    		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
    		os.Exit(1)
    	}
    
    	fmt.Printf("Download saved to ./%v \n", resp.Filename)
    
    	// Output:
    	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
    	//   200 OK
    	//   transferred 42970 / 2893557 bytes (1.49%)
    	//   transferred 1207474 / 2893557 bytes (41.73%)
    	//   transferred 2758210 / 2893557 bytes (95.32%)
    	// Download saved to ./gobook.pdf
    }
    
  • Make buffer size configurable

    Hey, I'm downloading to an external hard drive, and with the default buffer of 4096 bytes I'm getting really poor performance (<1M/s). I manually increased the buffer size to 4096*1024 bytes and now I'm getting 6.2M/s (which is the maximum my internet connection offers).

  • Calling CancelFunc in context doesn't cancel download

    @oliverpool As discussed in PR #73, creating a request with req.WithContext(ctx), where ctx comes from context.WithCancel, and then calling the returned CancelFunc doesn't stop the file from downloading.

    The file being downloaded is around 160 MB, which takes ~18s to download on a 100 Mbps connection.

    Network activity can still be observed after calling CancelFunc, and the code doesn't return context.Canceled.

  • Data race

    I just ran the example on the start page with the race detector.

    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      200 OK
    ==================
    WARNING: DATA RACE
    Read at 0x00c420136298 by main goroutine:
      github.com/cavaliercoder/grab.(*Response).BytesComplete()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x43
      main.main()
          /home/rkaufmann/Downloads/grab.go:30 +0x442
    
    Previous write at 0x00c420136298 by goroutine 14:
      [failed to restore the stack]
    
    Goroutine 14 (running) created at:
      github.com/cavaliercoder/grab.(*Client).Do()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
      main.main()
          /home/rkaufmann/Downloads/grab.go:18 +0x325
    ==================
    ==================
    WARNING: DATA RACE
    Write at 0x00c420076180 by main goroutine:
      sync/atomic.CompareAndSwapInt32()
          /usr/local/go/src/runtime/race_amd64.s:293 +0xb
      sync.(*Mutex).Lock()
          /usr/local/go/src/sync/mutex.go:74 +0x4d
      github.com/cavaliercoder/grab.(*transfer).N()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/transfer.go:74 +0x4a
      github.com/cavaliercoder/grab.(*Response).BytesComplete()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x58
      main.main()
          /home/rkaufmann/Downloads/grab.go:30 +0x442
    
    Previous write at 0x00c420076180 by goroutine 14:
      [failed to restore the stack]
    
    Goroutine 14 (running) created at:
      github.com/cavaliercoder/grab.(*Client).Do()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
      main.main()
          /home/rkaufmann/Downloads/grab.go:18 +0x325
    ==================
      transferred 1156706 / 0 bytes 2893557 bps (39.98%)
      transferred 2617418 / 0 bytes 2893557 bps (90.46%)
    Download saved to ./gobook.pdf
    Found 2 data race(s)
    exit status 66
    
    
  • Downloading Text file gives an issue while in resume mode.

    I am trying to download a text file from a URL.

    Grab downloads it perfectly the first time.

    But if I make a small change to the source text file and try to download it again, Grab appends that change to the end of the local file. The downloaded text file is then of no use.

    Can you please help with this?

  • Fix head break connection

    Problem

    I tried grab and tested a download from the server https://speed.hetzner.de. When I retried the download onto a partially saved file, grab failed with a nil response and the error Head EOF.

    I've researched the source of the problem. The remote server breaks the connection on a HEAD request, but works fine on a GET request. So I think that in this case grab should not fail after HEAD, but should go on and try GET.

    This PR fixes this.

    Solution

    Test

    I've added a test, WithHeadRequestBreak, that emulates a HEAD request breaking the connection.

    Fix

    In the headRequest state function, in the branch that handles a response error, grab no longer closes the response and instead proceeds to the GET request.

    client.Do(req) returns a nil response along with the error, so there is nothing to close.

    Other fixes

    I've added one more commit to the PR with linter fixes and an update of the Go version.
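The fallback described in this PR can be sketched with the standard library alone. This is an illustration of the idea, not the PR's code; `fetch` and `newFlakyServer` are hypothetical helpers. The test server drops the connection on HEAD, and the client treats the failed probe as non-fatal and continues with GET.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch probes url with HEAD. If the probe fails at the transport level it
// carries on with GET instead of giving up.
func fetch(client *http.Client, url string) (body string, headFailed bool, err error) {
	head, herr := client.Head(url)
	if herr != nil {
		headFailed = true // nothing to close: the response is nil on error
	} else {
		head.Body.Close()
	}
	resp, err := client.Get(url)
	if err != nil {
		return "", headFailed, err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), headFailed, err
}

// newFlakyServer returns a test server that drops the connection on HEAD
// and serves a small payload for any other method.
func newFlakyServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodHead {
			if hj, ok := w.(http.Hijacker); ok {
				if conn, _, err := hj.Hijack(); err == nil {
					conn.Close() // close without writing a response
				}
			}
			return
		}
		io.WriteString(w, "payload")
	}))
}

func main() {
	srv := newFlakyServer()
	defer srv.Close()
	body, headFailed, err := fetch(srv.Client(), srv.URL)
	fmt.Println(body, headFailed, err)
}
```

The trade-off is that a failed HEAD leaves the client without the size and range information it would normally use to plan a resume, so the GET has to work without it.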

  • Update project on golang.org

    Hi!

    I can't update my grab module from v2 to v3 because https://pkg.go.dev/github.com/cavaliercoder/grab lists only Version: v2.0.0+incompatible as Latest.

    Could you update it please?

  •  http: ContentLength=111 with Body length 0

    I'm receiving an error while trying to download a file using POST. I've done this successfully with plain net/http, but net/http has issues with larger files, which is why I'm trying to move to this package. The error occurs on a file I was able to download successfully with net/http. I saw a Stack Overflow discussion that seems similar to my issue, but it concerns another package and I'm not sure how to solve it in this one. Any help is appreciated. Link to the other issue: https://stackoverflow.com/questions/52429036/getting-error-on-put-body-length-0-using-net-http

    The error: http: ContentLength=111 with Body length 0

    My code:

    	transport := &http.Transport{
    		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    	}
    
    	client := &http.Client{
    		Transport: transport,
    	}
    
    
    	site := fmt.Sprintf("https://%s/seg/api/v3/matrix/data/0/services-export", FSApplianceFQDN)
    	method := "POST"
    
    	//payload := strings.NewReader(`{"srcZoneId":"g_8973766297000773843","dstZoneId":"g_3554460426726078343","shouldOnlyShowPolicyViolation":false}`)
    	payloadFormat := fmt.Sprintf(`{"srcZoneId":"%s","dstZoneId":"%s","shouldOnlyShowPolicyViolation":false}`, SRCZone, DSTZone)
    	payload := strings.NewReader(payloadFormat)
    	req, err := http.NewRequest(method, site, payload)
    
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	req.Header.Add("Host", FSApplianceFQDN)
    	req.Header.Add("Accept", "application/json, text/plain, */*")
    	req.Header.Add("Sec-Ch-Ua-Mobile", "?0")
    	req.Header.Add("Content-Type", "application/json;charset=UTF-8")
    	req.Header.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36")
    	req.Header.Add("Sec-Fetch-Site", "same-origin")
    	req.Header.Add("Sec-Fetch-Mode", "cors")
    	req.Header.Add("Sec-Fetch-Dest", "empty")
    	req.Header.Set("referer", fmt.Sprintf("https://%s/forescout-client/", FSApplianceFQDN))
    	req.Header.Add("Accept-Encoding", "gzip, deflate")
    	req.Header.Add("Accept-Language", "en-US,en;q=0.9")
    	req.Header.Add("Connection", "close")
    	user := fmt.Sprintf("%%22%s%%22", FSusername)
    	Cookies := fmt.Sprintf("JSESSIONID=%v; user=%v; XSRF-TOKEN=%v", JSESSIONID, user, XSRFTOKEN)
    	req.Header.Set("Cookie", Cookies)
    	req.Header.Set("X-Xsrf-Token", XSRFTOKEN)
    
    
    	grabclient := &grab.Client{HTTPClient: client}
    	grabreq := &grab.Request{HTTPRequest: req}
    	filedownload := grabclient.Do(grabreq)
    	if err := filedownload.Err(); err != nil {
    		log.Fatal(err)
    	}
    
  • cannot call non-function resp.Size (type int64)

    Apologies if this is some rookie mistake - I'm fairly new to Go :-)

    When trying the basic grab example (straight from the homepage), I'm getting an error: cannot call non-function resp.Size (type int64)

    This is with Go 1.16.3 (reproducible on both macOS and Raspbian; go env output attached). go mod reports a +incompatible version of github.com/cavaliercoder/grab.

    Any help would be greatly appreciated.

    goenv.txt main.go.txt

  • rate limiter is not accurate

    The time to wait should be the expected chunk download time minus the actual chunk download time, plus an adjustment carried over from the previous sleep, because the actual sleep time is sometimes not exactly equal to the desired sleep time.
