A download manager package for Go

grab

Downloading the internet, one goroutine at a time!

$ go get github.com/cavaliercoder/grab

Grab is a Go package for downloading files from the internet with the following rad features:

  • Monitor download progress concurrently
  • Auto-resume incomplete downloads
  • Guess filename from content header or URL path
  • Safely cancel downloads using context.Context
  • Validate downloads using checksums
  • Download batches of files concurrently
  • Apply rate limiters

Requires Go v1.7+
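
As an illustration of the filename-guessing feature listed above, here is a minimal sketch that uses only the standard library. `guessFilename` is a hypothetical helper, not grab's implementation: it assumes the "content header" is `Content-Disposition`, prefers its `filename` parameter, and falls back to the last element of the URL path.

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
	"path"
)

// guessFilename is a hypothetical helper, not part of grab's API. It takes the
// raw Content-Disposition header value (may be empty) and the request URL.
func guessFilename(contentDisposition, rawURL string) (string, error) {
	if contentDisposition != "" {
		if _, params, err := mime.ParseMediaType(contentDisposition); err == nil {
			// path.Base drops any directory part a server might send
			if name := path.Base(params["filename"]); name != "." && name != "/" {
				return name, nil
			}
		}
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		return "", fmt.Errorf("cannot guess filename from %q", rawURL)
	}
	return name, nil
}

func main() {
	name, err := guessFilename(`attachment; filename="report.pdf"`, "http://example.com/download?id=1")
	fmt.Println(name, err)

	name, err = guessFilename("", "http://www.golang-book.com/public/pdf/gobook.pdf")
	fmt.Println(name, err)
}
```

When neither source yields a usable name, the sketch returns an error rather than inventing one.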

Example

The following example downloads a PDF copy of the free eBook, "An Introduction to Programming in Go" into the current working directory.

resp, err := grab.Get(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
if err != nil {
	log.Fatal(err)
}

fmt.Println("Download saved to", resp.Filename)

The following, more complete example allows for more granular control and periodically prints the download progress until it is complete.

The second time you run the example, it will auto-resume the previous download and exit sooner.

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/cavaliercoder/grab"
)

func main() {
	// create client
	client := grab.NewClient()
	req, err := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Invalid request: %v\n", err)
		os.Exit(1)
	}

	// start download
	fmt.Printf("Downloading %v...\n", req.URL())
	resp := client.Do(req)
	fmt.Printf("  %v\n", resp.HTTPResponse.Status)

	// start UI loop
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()

Loop:
	for {
		select {
		case <-t.C:
			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
				resp.BytesComplete(),
				resp.Size,
				100*resp.Progress())

		case <-resp.Done:
			// download is complete
			break Loop
		}
	}

	// check for errors
	if err := resp.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Download saved to ./%v \n", resp.Filename)

	// Output:
	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
	//   200 OK
	//   transferred 42970 / 2893557 bytes (1.49%)
	//   transferred 1207474 / 2893557 bytes (41.73%)
	//   transferred 2758210 / 2893557 bytes (95.32%)
	// Download saved to ./gobook.pdf
}

Design trade-offs

The primary use case for Grab is concurrently downloading thousands of large files from remote file repositories where the remote files are immutable. Examples include operating system package repositories or ISO libraries.

Grab aims to provide robust, sane defaults. These are usually determined using the HTTP specifications, or by mimicking the behavior of common web clients like cURL, wget and common web browsers.

Grab aims to be stateless. The only state that exists is the remote files you wish to download and the local copy, which may be completed, partially completed or not yet created. The advantage is that the local file system is not cluttered unnecessarily with additional state files (like a .crdownload file). The disadvantage of this approach is that grab must make assumptions about the local and remote state; specifically, that they have not been modified by another program.

If the local or remote file is modified outside of grab, and you download the file again with resuming enabled, the local file will likely become corrupted. In this case, you might consider making remote files immutable, or disabling resume.
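
The trade-off can be seen in a small sketch of stateless resuming. This is not grab's code: `resume` and `fileServer` are hypothetical helpers, the "local file" is a byte slice, and the remote is an in-process handler built on `http.ServeContent`. The only state is the number of bytes already stored locally, which becomes the start of an HTTP `Range` request.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fileServer serves content and honors Range requests via http.ServeContent.
func fileServer(content string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file", time.Time{}, strings.NewReader(content))
	})
}

// resume is a hypothetical helper. It asks the remote for everything after
// len(local) and appends the reply, trusting that local is a valid prefix.
func resume(local []byte, remote http.Handler) ([]byte, int) {
	req := httptest.NewRequest(http.MethodGet, "/file", nil)
	if len(local) > 0 {
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", len(local)))
	}
	rec := httptest.NewRecorder()
	remote.ServeHTTP(rec, req)

	switch rec.Code {
	case http.StatusPartialContent:
		return append(local, rec.Body.Bytes()...), rec.Code
	case http.StatusRequestedRangeNotSatisfiable:
		return local, rec.Code // nothing left to fetch
	default:
		// a full response replaces whatever was stored locally
		return rec.Body.Bytes(), rec.Code
	}
}

func main() {
	remote := fileServer("hello world")

	// an honest partial download is completed correctly
	got, code := resume([]byte("hello "), remote)
	fmt.Println(code, string(got))

	// a local file changed by another program is "completed" into garbage
	got, code = resume([]byte("XXXXXX"), remote)
	fmt.Println(code, string(got))
}
```

The second call shows why a modified local file ends up with the right size but the wrong contents: nothing in the exchange lets the client notice that its prefix no longer matches.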

Grab aims to enable best-in-class functionality for more complex features through extensible interfaces, rather than reimplementation. For example, you can provide your own Hash algorithm to compute file checksums, or your own rate limiter implementation (with all the associated trade-offs) to rate limit downloads.
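
As a rough sketch of what a user-supplied rate limiter might look like, the type below blocks before each chunk so that throughput averages out to a target rate. It assumes the limiter only needs a `WaitN(ctx, n)` method, the shape used by golang.org/x/time/rate; `fixedRate` and `sendAll` are hypothetical names, and the exact interface grab expects should be checked against its documentation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fixedRate is a hypothetical limiter, not part of grab.
type fixedRate struct {
	bytesPerSec int
}

// WaitN blocks for the time n bytes should take at the target rate, or
// returns early with the context's error if ctx is done first.
func (l fixedRate) WaitN(ctx context.Context, n int) error {
	if l.bytesPerSec <= 0 {
		return nil // treat a non-positive rate as "unlimited"
	}
	delay := time.Duration(n) * time.Second / time.Duration(l.bytesPerSec)
	timer := time.NewTimer(delay)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// sendAll is a demo driver: it pretends to transfer chunks of the given
// sizes, waiting on the limiter first, and gives up after timeoutMs.
func sendAll(l fixedRate, chunks []int, timeoutMs int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	sent := 0
	for _, n := range chunks {
		if err := l.WaitN(ctx, n); err != nil {
			return sent, err
		}
		sent += n
	}
	return sent, nil
}

func main() {
	sent, err := sendAll(fixedRate{bytesPerSec: 1 << 20}, []int{4096, 4096}, 1000)
	fmt.Println(sent, err)
}
```

A fixed sleep per chunk is the simplest possible policy; it ignores how long the transfer itself took, which is one of the trade-offs a custom limiter has to weigh.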

Owner
The Cavalier Gopher
A collection of Go modules originally authored by Ryan Armstrong
Comments
  • Download is not resumed after killing the application with ctrl + c

    I made an application that uses your lib and when I hit ctrl + c and then execute it again it starts downloading from 0. I'm downloading the same URL on the same path.

  • nil pointer dereference @ response.go:81

    After update:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x1 addr=0x4 pc=0xc6974]

    goroutine 34 [running]:
    panic(0x346eb0, 0x1070a038)
            /root/.gvm/gos/go1.6/src/runtime/panic.go:464 +0x330
    sync/atomic.loadUint64(0x1080a1d4, 0x0, 0x0)
            /root/.gvm/gos/go1.6/src/sync/atomic/64bit_arm.go:10 +0x54
    github.com/cavaliercoder/grab.(*Response).BytesTransferred(0x1080a180, 0x4, 0x386878)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/response.go:81 +0x40
    github.com/cavaliercoder/grab.(*Client).do(0x107b79c0, 0x107f80f0, 0x0, 0x0, 0x0)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:222 +0x3c0
    github.com/cavaliercoder/grab.(*Client).DoAsync.func1(0x107b79c0, 0x107f80f0, 0x10802380)
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:94 +0x24
    created by github.com/cavaliercoder/grab.(*Client).DoAsync
            /root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:102 +0x60

  • readme: fix syntax error in example

    grab.Response.Size is an int64 value, not a function.

    I noticed this problem when testing the example in the readme.

    Hopefully this could help others...

  • DeleteOnError does not seem to work (at least on Windows)

    // create download request
    req, err := NewRequest("", "http://example.com/example.zip")
    if err != nil {
        panic(err)
    }
    
    // set request checksum
    sum, err := hex.DecodeString("33daf4c03f86120fdfdc66bddf6bfff4661c7ca11c5da473e537f4d69b470e57")
    if err != nil {
        panic(err)
    }
    req.SetChecksum(sha256.New(), sum, true)
    
    // download and validate file
    resp := DefaultClient.Do(req)
    if err := resp.Err(); err != nil {
        panic(err)
    }
    

    Although there is a checksum mismatch, the file will not be removed:

    panic: checksum mismatch
    
    goroutine 1 [running]:
    main.downloadOpenjdk()
            C:/Users/path/to/main.go:88 +0x1b6
    main.main()
            C:/Users/path/to/main.go:94 +0x2c
    exit status 2
    
    

    Apart from this, how can I prevent the file from being downloaded anyway when there is a checksum mismatch?

    I think that one of the first improvements could be adding error handling to the os.Remove call at https://github.com/cavaliercoder/grab/blob/925bcfe56bc16868f1a398af4231cd4ffa07276f/client.go#L294 so that at least an error message is returned. In my opinion, the error should not be silently dropped.
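
The suggestion above can be sketched with the standard library alone. `verifyOrRemove` is a hypothetical helper, not grab's code: it hashes a file, removes it on mismatch, and reports a failed removal instead of discarding it. Closing the file before removing it is included on the assumption that an open handle can make `os.Remove` fail on Windows; the source does not confirm that this is the cause of the reported behavior.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
	"os"
)

var errChecksumMismatch = errors.New("checksum mismatch")

// verifyOrRemove is a hypothetical helper. It returns nil when the SHA-256 of
// the file equals want; otherwise it deletes the file and returns an error.
func verifyOrRemove(path string, want []byte) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	h := sha256.New()
	_, err = io.Copy(h, f)
	// close before any removal so no handle is left open on the file
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}
	if bytes.Equal(h.Sum(nil), want) {
		return nil
	}
	if rerr := os.Remove(path); rerr != nil {
		return fmt.Errorf("%w; also failed to remove %s: %v", errChecksumMismatch, path, rerr)
	}
	return errChecksumMismatch
}

// The helpers below exist only to make the demo self-contained.
func sumOf(s string) []byte {
	sum := sha256.Sum256([]byte(s))
	return sum[:]
}

func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "download-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	p, err := writeTemp("good data")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(verifyOrRemove(p, sumOf("good data")), exists(p))
	fmt.Println(verifyOrRemove(p, sumOf("something else")), exists(p))
}
```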

  • Response from github is 403 when downloading a release file.

    Not able to download release artifact from GitHub.

    package main
    
    import (
    	"fmt"
    	"log"
    
    	"github.com/cavaliercoder/grab"
    )
    
    func main() {
    	client := grab.NewClient()
    	req, err := grab.NewRequest("", "https://github.com/minishift/minishift-centos-iso/releases/download/v1.12.0/minishift-centos7.iso")
    	if err != nil {
    		log.Fatal(err)
    	}
    	resp := client.Do(req)
    	fmt.Printf("Response is: %v\n", resp.HTTPResponse.Status)
    }
    

    Unexpected one.

    ==== Output ====
    $ go run test.go
    Response is: 403 Forbidden
    
  • Synchronized access to Response.transfer and Response.bytesResumed

    I've been experiencing "race condition detected" errors during tests. I'm displaying a progress bar during the download (using another goroutine), which is why those fields need to be protected for concurrent access.
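
One common way to make such a counter safe to read from a progress-bar goroutine is `sync/atomic`. The sketch below is an illustration under that assumption, not the fix that was applied to grab; `progress` and `simulate` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// progress is a hypothetical stand-in for a download response. The counter is
// the first field so it stays 64-bit aligned on 32-bit platforms such as ARM,
// which sync/atomic requires for 64-bit operations.
type progress struct {
	bytesComplete int64
}

// Write records len(b) transferred bytes; the download goroutine calls it.
func (p *progress) Write(b []byte) (int, error) {
	atomic.AddInt64(&p.bytesComplete, int64(len(b)))
	return len(b), nil
}

// BytesComplete may be called from any goroutine, such as a progress bar.
func (p *progress) BytesComplete() int64 {
	return atomic.LoadInt64(&p.bytesComplete)
}

// simulate writes chunks in one goroutine while polling from another.
func simulate(chunks, chunkSize int) int64 {
	p := &progress{}
	done := make(chan struct{})
	go func() {
		defer close(done)
		buf := make([]byte, chunkSize)
		for i := 0; i < chunks; i++ {
			p.Write(buf)
		}
	}()
	for {
		select {
		case <-done:
			return p.BytesComplete()
		default:
			_ = p.BytesComplete() // concurrent read without a data race
		}
	}
}

func main() {
	fmt.Println(simulate(100, 1024))
}
```

Running this with `go run -race` should report no races, because every access to the counter goes through an atomic operation.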

  • Corrupted contents when downloading on top of wrong file

    Grab looked like a good fit for a project I'm working on so I gave it a spin. I found that it downloaded a file perfectly and when asked to download the same file again managed to avoid downloading all the bytes again, which was just what I was looking for.

    I then overwrote the downloaded file with completely different contents and then downloaded again using grab. The message:

      206 Partial Content
    

    was emitted and the download was apparently successful. The downloaded file even had the same number of bytes as the original, but unfortunately the contents were corrupted.

    Fortunately, this problem is easily reproduced using the example program in the README:

    $ go run main.go
    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      200 OK
    Download saved to ./gobook.pdf
    $ mv gobook.pdf gobook.pdf.good
    $ cp main.go gobook.pdf
    $ go run main.go
    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      206 Partial Content
    Download saved to ./gobook.pdf
    $ diff gobook.pdf gobook.pdf.good
    Binary files gobook.pdf and gobook.pdf.good differ
    $ ls -l
    total 11320
    -rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf
    -rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf.good
    -rw-r--r--  1 gnormington  staff     1139  3 Nov 11:10 main.go
    

    The environment is go version go1.9.2 darwin/amd64 on macOS 10.13.1.

    In case the README changes, the contents of main.go above is:

    package main
    
    import (
    	"fmt"
    	"os"
    	"time"
    
    	"github.com/cavaliercoder/grab"
    )
    
    func main() {
    	// create client
    	client := grab.NewClient()
    	req, _ := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
    
    	// start download
    	fmt.Printf("Downloading %v...\n", req.URL())
    	resp := client.Do(req)
    	fmt.Printf("  %v\n", resp.HTTPResponse.Status)
    
    	// start UI loop
    	t := time.NewTicker(500 * time.Millisecond)
    	defer t.Stop()
    
    Loop:
    	for {
    		select {
    		case <-t.C:
    			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
    				resp.BytesComplete(),
    				resp.Size,
    				100*resp.Progress())
    
    		case <-resp.Done:
    			// download is complete
    			break Loop
    		}
    	}
    
    	// check for errors
    	if err := resp.Err(); err != nil {
    		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
    		os.Exit(1)
    	}
    
    	fmt.Printf("Download saved to ./%v \n", resp.Filename)
    
    	// Output:
    	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
    	//   200 OK
    	//   transferred 42970 / 2893557 bytes (1.49%)
    	//   transferred 1207474 / 2893557 bytes (41.73%)
    	//   transferred 2758210 / 2893557 bytes (95.32%)
    	// Download saved to ./gobook.pdf
    }
    
  • Make buffer size configurable

    Hey, I'm downloading to an external hard drive, and with the default buffer of 4096 bytes I'm getting really poor performance (<1M/s). I manually increased the buffer size to 4096*1024 bytes and now I'm getting 6.2M/s (which is the maximum my internet connection offers).
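
To show why the buffer size matters, the sketch below copies the same data with two buffer sizes and counts how many writes reach the destination. It is a standalone illustration: `copyWithBuffer`, `countingWriter` and `countWrites` are hypothetical, and the source does not say how grab exposes this setting.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingWriter counts Write calls; it stands in for a slow disk where every
// call carries a fixed cost.
type countingWriter struct{ calls int }

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return len(p), nil
}

// copyWithBuffer copies src to dst using a buffer of the given size.
func copyWithBuffer(dst io.Writer, src io.Reader, size int) (int64, error) {
	if size <= 0 {
		size = 4096 // fall back to the default mentioned in the report
	}
	buf := make([]byte, size)
	var written int64
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			m, werr := dst.Write(buf[:n])
			written += int64(m)
			if werr != nil {
				return written, werr
			}
			if m < n {
				return written, io.ErrShortWrite
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}

// countWrites reports how many writes are needed to copy total bytes.
func countWrites(total, bufSize int) int {
	w := &countingWriter{}
	if _, err := copyWithBuffer(w, bytes.NewReader(make([]byte, total)), bufSize); err != nil {
		return -1
	}
	return w.calls
}

func main() {
	fmt.Println(countWrites(1<<20, 4096))      // many small writes
	fmt.Println(countWrites(1<<20, 4096*1024)) // a single large write
}
```

With an in-memory source the difference is only in call counts; on a device with high per-write latency, fewer and larger writes are what would account for the kind of speed-up described above.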

  • Calling CancelFunc in context doesn't cancel download

    @oliverpool As discussed in PR #73, creating a request with req.WithContext(ctx) that uses a context created from context.WithCancel and calling the returned CancelFunc doesn't stop the file from downloading.

    The file being downloaded is around 160 MB, which takes ~18s to download with a 100 Mbps connection.

    Network activity can still be observed after calling CancelFunc, and the code doesn't return context.Canceled.
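
For comparison, here is a minimal sketch of the behavior the report expects: once the context is canceled, the copy loop stops and surfaces `context.Canceled`. It is not grab's implementation; `ctxReader`, `endless`, `cancelAfter` and `copyUntilCancelled` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// ctxReader wraps a reader and refuses to read once ctx is done.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err
	}
	return c.r.Read(p)
}

// endless stands in for a large response body that never runs out.
type endless struct{}

func (endless) Read(p []byte) (int, error) { return len(p), nil }

// cancelAfter calls cancel once limit bytes have been written, simulating a
// user who aborts the download part-way through.
type cancelAfter struct {
	written, limit int64
	cancel         context.CancelFunc
}

func (w *cancelAfter) Write(p []byte) (int, error) {
	w.written += int64(len(p))
	if w.written >= w.limit {
		w.cancel()
	}
	return len(p), nil
}

// copyUntilCancelled returns the bytes copied before cancellation and whether
// the copy ended with context.Canceled.
func copyUntilCancelled(limit int64) (int64, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	dst := &cancelAfter{limit: limit, cancel: cancel}
	n, err := io.Copy(dst, ctxReader{ctx: ctx, r: endless{}})
	return n, errors.Is(err, context.Canceled)
}

func main() {
	n, canceled := copyUntilCancelled(100000)
	fmt.Println(n >= 100000, canceled)
}
```

A wrapper like this only checks between reads, so it cannot interrupt a read that is already blocked on the network. With net/http, building the request with `http.NewRequestWithContext` additionally lets the transport abort such a read.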

  • Data race

    I just ran the example on the start page with the race detector.

    Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
      200 OK
    ==================
    WARNING: DATA RACE
    Read at 0x00c420136298 by main goroutine:
      github.com/cavaliercoder/grab.(*Response).BytesComplete()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x43
      main.main()
          /home/rkaufmann/Downloads/grab.go:30 +0x442
    
    Previous write at 0x00c420136298 by goroutine 14:
      [failed to restore the stack]
    
    Goroutine 14 (running) created at:
      github.com/cavaliercoder/grab.(*Client).Do()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
      main.main()
          /home/rkaufmann/Downloads/grab.go:18 +0x325
    ==================
    ==================
    WARNING: DATA RACE
    Write at 0x00c420076180 by main goroutine:
      sync/atomic.CompareAndSwapInt32()
          /usr/local/go/src/runtime/race_amd64.s:293 +0xb
      sync.(*Mutex).Lock()
          /usr/local/go/src/sync/mutex.go:74 +0x4d
      github.com/cavaliercoder/grab.(*transfer).N()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/transfer.go:74 +0x4a
      github.com/cavaliercoder/grab.(*Response).BytesComplete()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x58
      main.main()
          /home/rkaufmann/Downloads/grab.go:30 +0x442
    
    Previous write at 0x00c420076180 by goroutine 14:
      [failed to restore the stack]
    
    Goroutine 14 (running) created at:
      github.com/cavaliercoder/grab.(*Client).Do()
          /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
      main.main()
          /home/rkaufmann/Downloads/grab.go:18 +0x325
    ==================
      transferred 1156706 / 0 bytes 2893557 bps (39.98%)
      transferred 2617418 / 0 bytes 2893557 bps (90.46%)
    Download saved to ./gobook.pdf
    Found 2 data race(s)
    exit status 66
    
    
  • Downloading Text file gives an issue while in resume mode.

    I am trying to download a text file from a URL.

    Grab downloads it perfectly the first time.

    But if I make a small change to the source text file and download it again, grab appends that change to the end of the local file. The downloaded text file is then of no use.

    Can you please help with this?

  • Fix head break connection

    Problem

    I tried grab with a download from the server https://speed.hetzner.de. When I retried a download to a partially saved file, grab failed with a nil response and the error Head EOF.

    I researched the source of the problem: the remote server breaks the connection on a HEAD request but handles a GET request fine. So I think that in this case grab should not fail after the HEAD request, but carry on and try the GET request.

    This PR fixes this.

    Solution

    Test

    I've added a test, WithHeadRequestBreak, that emulates a HEAD request breaking the connection.

    Fix

    In the headRequest state function, in the branch that handles a response error, grab no longer closes the response; it proceeds to the GET request instead.

    client.Do(req) returns only a nil response along with the error, so there is no need to close the response.
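
The idea of the fix can be sketched outside grab with a fake transport. Nothing below is taken from the PR: `brokenHead` simulates a server that drops HEAD requests with `io.EOF`, and `fetch` is a hypothetical helper that treats a failed HEAD as "size unknown" and still attempts the GET.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// brokenHead is a fake transport for this demo only: every HEAD request fails
// with io.EOF, while GET requests succeed with a fixed body.
type brokenHead struct{ body string }

func (t brokenHead) RoundTrip(r *http.Request) (*http.Response, error) {
	if r.Method == http.MethodHead {
		return nil, io.EOF
	}
	return &http.Response{
		Status:        "200 OK",
		StatusCode:    http.StatusOK,
		Proto:         "HTTP/1.1",
		ProtoMajor:    1,
		ProtoMinor:    1,
		Header:        make(http.Header),
		Body:          io.NopCloser(strings.NewReader(t.body)),
		ContentLength: int64(len(t.body)),
		Request:       r,
	}, nil
}

// fetch probes with HEAD first. On error there is no response to close, so it
// simply moves on to the GET request.
func fetch(c *http.Client, target string) (int64, string, error) {
	size := int64(-1)
	if resp, err := c.Head(target); err == nil {
		size = resp.ContentLength
		resp.Body.Close()
	}
	resp, err := c.Get(target)
	if err != nil {
		return size, "", err
	}
	defer resp.Body.Close()
	if size < 0 {
		size = resp.ContentLength
	}
	data, err := io.ReadAll(resp.Body)
	return size, string(data), err
}

// fetchWithBrokenHead runs fetch against the fake transport.
func fetchWithBrokenHead(body string) (string, error) {
	c := &http.Client{Transport: brokenHead{body: body}}
	_, got, err := fetch(c, "http://example.invalid/file")
	return got, err
}

func main() {
	fmt.Println(fetchWithBrokenHead("payload"))
}
```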

    Other fixes

    I've added one more commit to the PR with linter fixes and an update of the Go version.

  • Update project on golang.org

    Hi!

    I can't update my grab module from v2 to v3 because https://pkg.go.dev/github.com/cavaliercoder/grab lists only Version: v2.0.0+incompatible as the latest.

    Could you update it please?

  •  http: ContentLength=111 with Body length 0

    I'm receiving an error while trying to download a file using POST. I've done this successfully with regular net/http, but net/http has issues with larger files, which is why I'm trying to move to this package. The error occurs on a file I was able to download successfully with net/http. I saw a Stack Overflow discussion that seems similar to my issue, but it concerns another package and I'm not sure how to apply it here. Any help is appreciated. Link to the other issue: https://stackoverflow.com/questions/52429036/getting-error-on-put-body-length-0-using-net-http

    The error: http: ContentLength=111 with Body length 0

    My code:

    	transport := &http.Transport{
    		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    	}
    
    	client := &http.Client{
    		Transport: transport,
    	}
    
    
    	site := fmt.Sprintf("https://%s/seg/api/v3/matrix/data/0/services-export", FSApplianceFQDN)
    	method := "POST"
    
    	//payload := strings.NewReader(`{"srcZoneId":"g_8973766297000773843","dstZoneId":"g_3554460426726078343","shouldOnlyShowPolicyViolation":false}`)
    	payloadFormat := fmt.Sprintf(`{"srcZoneId":"%s","dstZoneId":"%s","shouldOnlyShowPolicyViolation":false}`, SRCZone, DSTZone)
    	payload := strings.NewReader(payloadFormat)
    	req, err := http.NewRequest(method, site, payload)
    
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	req.Header.Add("Host", FSApplianceFQDN)
    	req.Header.Add("Accept", "application/json, text/plain, */*")
    	req.Header.Add("Sec-Ch-Ua-Mobile", "?0")
    	req.Header.Add("Content-Type", "application/json;charset=UTF-8")
    	req.Header.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36")
    	req.Header.Add("Sec-Fetch-Site", "same-origin")
    	req.Header.Add("Sec-Fetch-Mode", "cors")
    	req.Header.Add("Sec-Fetch-Dest", "empty")
    	req.Header.Set("referer", fmt.Sprintf("https://%s/forescout-client/", FSApplianceFQDN))
    	req.Header.Add("Accept-Encoding", "gzip, deflate")
    	req.Header.Add("Accept-Language", "en-US,en;q=0.9")
    	req.Header.Add("Connection", "close")
    	user := fmt.Sprintf("%%22%s%%22", FSusername)
    	Cookies := fmt.Sprintf("JSESSIONID=%v; user=%v; XSRF-TOKEN=%v", JSESSIONID, user, XSRFTOKEN)
    	req.Header.Set("Cookie", Cookies)
    	req.Header.Set("X-Xsrf-Token", XSRFTOKEN)
    
    
    	grabclient := &grab.Client{HTTPClient: client}
    	grabreq := &grab.Request{HTTPRequest: req}
    	filedownload := grabclient.Do(grabreq)
    	if err := filedownload.Err(); err != nil {
    		log.Fatal(err)
    	}
    
  • cannot call non-function resp.Size (type int64)

    Apologies if this is some rookie mistake - I'm fairly new to Go :-)

    When trying the basic grab example (straight from the homepage), I'm getting an error: cannot call non-function resp.Size (type int64)

    This is with Go 1.16.3 (reproducible both on MacOS and Raspbian; go env output attached). go mod reports github.com/cavaliercoder/[email protected]+incompatible.

    Any help would be greatly appreciated.

    goenv.txt main.go.txt

  • rate limiter is not accurate

    The time to wait should be the expected chunk download time minus the actual chunk download time, plus an adjustment carried over from the last sleep as compensation. The actual sleep time is not always exactly equal to the desired sleep time.
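
The arithmetic proposed above can be written down as a small sketch. This is one reading of the comment, not grab's limiter: `chunkTime` and `nextSleep` are hypothetical helpers, and `adjust` is taken to be the desired duration of the previous sleep minus its measured duration.

```go
package main

import (
	"fmt"
	"time"
)

// chunkTime is how long n bytes should take at bytesPerSec.
func chunkTime(n, bytesPerSec int) time.Duration {
	return time.Duration(n) * time.Second / time.Duration(bytesPerSec)
}

// nextSleep returns how long to pause after a chunk:
// expected time, minus the time the transfer really took, plus the
// carry-over from the previous sleep (negative if that sleep overshot).
func nextSleep(expected, actual, adjust time.Duration) time.Duration {
	d := expected - actual + adjust
	if d < 0 {
		return 0 // already behind schedule, do not sleep
	}
	return d
}

func main() {
	expected := chunkTime(64<<10, 1<<20) // 64 KiB chunks at 1 MiB/s
	var adjust time.Duration
	for i := 0; i < 3; i++ {
		actual := 10 * time.Millisecond // pretend each chunk took 10ms to transfer
		want := nextSleep(expected, actual, adjust)
		start := time.Now()
		time.Sleep(want)
		adjust = want - time.Since(start) // negative when the sleep overshot
		fmt.Printf("wanted %v, carry-over %v\n", want, adjust)
	}
}
```

Because `time.Sleep` guarantees only a minimum duration, the carry-over is usually slightly negative, and feeding it into the next wait keeps the error from accumulating across chunks.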
