fastzip

Fastzip is an opinionated Zip archiver and extractor with a focus on speed.

  • Archiving and extraction of files and directories can only occur within a specified directory.
  • Permissions, ownership (uid, gid on linux/unix) and modification times are preserved.
  • Buffers used for copying files are recycled to reduce allocations.
  • Files are archived and extracted concurrently.
  • By default, the excellent github.com/klauspost/compress/flate library is used for compression and decompression.

Example

Archiver

// Create archive file
w, err := os.Create("archive.zip")
if err != nil {
  panic(err)
}
defer w.Close()

// Create new Archiver
a, err := fastzip.NewArchiver(w, "~/fastzip-archiving")
if err != nil {
  panic(err)
}
defer a.Close()

// Register a non-default level compressor if required
// a.RegisterCompressor(zip.Deflate, fastzip.FlateCompressor(1))

// Walk directory, adding the files we want to add
files := make(map[string]os.FileInfo)
err = filepath.Walk("~/fastzip-archiving", func(pathname string, info os.FileInfo, err error) error {
	if err != nil {
		return err
	}
	files[pathname] = info
	return nil
})
if err != nil {
	panic(err)
}

// Archive
if err = a.Archive(context.Background(), files); err != nil {
  panic(err)
}

Extractor

// Create new extractor
e, err := fastzip.NewExtractor("archive.zip", "~/fastzip-extraction")
if err != nil {
  panic(err)
}
defer e.Close()

// Extract archive files
if err = e.Extract(context.Background()); err != nil {
  panic(err)
}

Benchmarks

Archiving and extracting a Go 1.13 GOROOT directory, 342M, 10308 files.

StandardFlate is using compress/flate, NonStandardFlate is klauspost/compress/flate, both on level 5. This was performed on a server with an SSD and 24-cores. Each test was conducted using the WithArchiverConcurrency and WithExtractorConcurrency options of 1, 2, 4, 8 and 16.

$ go test -bench "Benchmark*" -archivedir go1.13 -benchtime=30s -timeout=20m

goos: linux
goarch: amd64
pkg: github.com/saracen/fastzip
BenchmarkArchiveStore_1-24                            39         788604969 ns/op         421.66 MB/s     9395405 B/op     266271 allocs/op
BenchmarkArchiveStandardFlate_1-24                     2        16154127468 ns/op         20.58 MB/s    12075824 B/op     257251 allocs/op
BenchmarkArchiveStandardFlate_2-24                     4        8686391074 ns/op          38.28 MB/s    15898644 B/op     260757 allocs/op
BenchmarkArchiveStandardFlate_4-24                     7        4391603068 ns/op          75.72 MB/s    19295604 B/op     260871 allocs/op
BenchmarkArchiveStandardFlate_8-24                    14        2291624196 ns/op         145.10 MB/s    21999205 B/op     260970 allocs/op
BenchmarkArchiveStandardFlate_16-24                   16        2105056696 ns/op         157.96 MB/s    29237232 B/op     261225 allocs/op
BenchmarkArchiveNonStandardFlate_1-24                  6        6011250439 ns/op          55.32 MB/s    11070960 B/op     257204 allocs/op
BenchmarkArchiveNonStandardFlate_2-24                  9        3629347294 ns/op          91.62 MB/s    18870130 B/op     262279 allocs/op
BenchmarkArchiveNonStandardFlate_4-24                 18        1766182097 ns/op         188.27 MB/s    22976928 B/op     262349 allocs/op
BenchmarkArchiveNonStandardFlate_8-24                 34        1002516188 ns/op         331.69 MB/s    29860872 B/op     262473 allocs/op
BenchmarkArchiveNonStandardFlate_16-24                46         757112363 ns/op         439.20 MB/s    42036132 B/op     262714 allocs/op
BenchmarkExtractStore_1-24                            20        1625582744 ns/op         202.66 MB/s    22900375 B/op     330528 allocs/op
BenchmarkExtractStore_2-24                            42         786644031 ns/op         418.80 MB/s    22307976 B/op     329272 allocs/op
BenchmarkExtractStore_4-24                            92         384075767 ns/op         857.76 MB/s    22247288 B/op     328667 allocs/op
BenchmarkExtractStore_8-24                           165         215884636 ns/op        1526.02 MB/s    22354996 B/op     328459 allocs/op
BenchmarkExtractStore_16-24                          226         157087517 ns/op        2097.20 MB/s    22258691 B/op     328393 allocs/op
BenchmarkExtractStandardFlate_1-24                     6        5501808448 ns/op          23.47 MB/s    86148462 B/op     495586 allocs/op
BenchmarkExtractStandardFlate_2-24                    13        2748387174 ns/op          46.99 MB/s    84232141 B/op     491343 allocs/op
BenchmarkExtractStandardFlate_4-24                    21        1511063035 ns/op          85.47 MB/s    84998750 B/op     490124 allocs/op
BenchmarkExtractStandardFlate_8-24                    32         995911009 ns/op         129.67 MB/s    86188957 B/op     489574 allocs/op
BenchmarkExtractStandardFlate_16-24                   46         652641882 ns/op         197.88 MB/s    88256113 B/op     489575 allocs/op
BenchmarkExtractNonStandardFlate_1-24                  7        4989810851 ns/op          25.88 MB/s    64552948 B/op     373541 allocs/op
BenchmarkExtractNonStandardFlate_2-24                 13        2478287953 ns/op          52.11 MB/s    63413947 B/op     373183 allocs/op
BenchmarkExtractNonStandardFlate_4-24                 26        1333552250 ns/op          96.84 MB/s    63546389 B/op     373925 allocs/op
BenchmarkExtractNonStandardFlate_8-24                 37         817039739 ns/op         158.06 MB/s    64354655 B/op     375357 allocs/op
BenchmarkExtractNonStandardFlate_16-24                63         566984549 ns/op         227.77 MB/s    65444227 B/op     379664 allocs/op
Comments
  • Decompressed file content is messed up

    I'm archiving a folder containing .git/config file with fastzip on Linux, following the example shown in README.

    When extracting said file, I'm expecting to see:

    [core]
            repositoryformatversion = 0
            filemode = true
            bare = false
    [remote "origin"]
            url = ...
    [receive]
            denyNonFastForwards = false
            denyCurrentBranch = ignore
            denyDeleteCurrent = ignore
    

    However, I get:

    <..J.@^PE.._1.^CL..Z^H..>.(.%."..$;..Sfg+.{..>^....w,x0...3).yb.VO(.8A^KkSM^T0....JAS^MV.0....^E#+...fJ....^Dh..^^....N.....g.v]7........9....4w^O..5...^X.fR_.^[..^?I\.^Xy.qH..v.t~...Y.,?V.|...]^Q../b...^B....
    

    This happens both with the Go SDK (using NewExtractor) and with the traditional unix tar command. Worth mentioning is that this is only the case when using the default Deflate compression method. Using the Store method (0) works fine (no compression though, which is expected).

    I'd appreciate any help with this issue.

    EDIT: Compressing only the file itself (as opposed to the folder containing it) seems to work fine.

  • Zips created in windows are not valid in linux due to the invalid separator

    Even if Archive is provided with a files map whose keys use the "/" separator, they are changed to "\" in the system calls made by ArchiveWithContext. When "\" is used as the separator, Linux sees an invalid folder structure.

    Fix: in ArchiveWithContext change

    fileInfoHeader(rel, fi, hdr)
    

    to

    fileInfoHeader(strings.ReplaceAll(rel, "\\", "/"), fi, hdr)
    

    or some similar solution using os.PathSeparator

  • Prevent unnecessary directory in zip?

    When using fastzip, by default, there will be a single directory at the top level of the zip, which contains everything.

    Is there a way to not have this top level directory?

  • Fix atomic access to unaligned fields causing panics on 32bit systems

    We access the "written" and "entries" fields in "Archiver" and "Extractor" using atomic operations. These operations require the fields to be 8-byte aligned; otherwise they panic at runtime.

    On 32-bit systems, int64 fields are not necessarily aligned to 8 bytes. They may only be aligned to 4 bytes, depending on the fields before them.

    This currently causes fastzip to reproducibly fail archiving and extraction on 32-bit systems.

    Move the fields which are accessed via atomic operations to the start of the struct to properly align them and avoid the problem.

    See https://go101.org/article/memory-layout.html, "The Alignment Requirement for 64-bit Word Atomic Operations": "However, on 32-bit architectures, the alignment guarantee made by the standard Go compiler for 64-bit words is only 4 bytes. 64-bit atomic operations on a 64-bit word which is not 8-byte aligned will panic at runtime. ... The first (64-bit) word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned."

    See https://pkg.go.dev/sync/atomic#pkg-note-BUG On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

    Related https://github.com/golang/go/issues/11891

    The panic behavior can easily be reproduced by running "go test" in fastzip on Windows with go 1.17.6 i386 installed.

    panic({0xca2aa0, 0xd0fa90})
            C:/Program Files (x86)/Go/src/runtime/panic.go:1038 +0x1bc
    runtime/internal/atomic.panicUnaligned()
            C:/Program Files (x86)/Go/src/runtime/internal/atomic/unaligned.go:8 +0x2d
    runtime/internal/atomic.Load64(0x11ce0084)
            C:/Program Files (x86)/Go/src/runtime/internal/atomic/atomic_386.s:225 +0x10
    github.com/saracen/fastzip.(*Archiver).Written(0x11ce0050)
            C:/go/fastzip/archiver.go:94 +0x26
    github.com/saracen/fastzip.TestArchiveCancelContext(0x11c84960)
            C:/go/fastzip/archiver_test.go:165 +0x506
    
  • add custom root folder during archival?

    👋 I'm trying to compress a bunch of files and have everything inside a root folder (its name differs from the source directory).

    So given

    /tmp/source-dir
       /A.txt
       /B.txt
    

    would create a ZIP file with a structure like this:

    /myroot
       /A.txt
       /B.txt
    

    I can't get the myroot root directory working without renaming the original source dir (not an option, unfortunately). I tried using symlinks (it seems to compress the symlink itself rather than follow it?) and changing the file path in the filepath.Walk function (results in "file not found").

    Is there a way to do this?

    PS: I can't do this on the extract, since I have no control over that part

  • zip: not a valid zip file

    I copied the README code almost verbatim as far as I can tell, but I'm seeing this error

    panic: zip: not a valid zip file
    

    My code:

    package main
    
    import (
    	"context"
    	"os"
    	"path/filepath"
    
    	"github.com/saracen/fastzip"
    )
    
    func main() {
    	sourceDir := "util/"
    	zipFile := "test.zip"
    	w, err := os.Create(zipFile)
    	if err != nil {
    		panic(err)
    	}
    	defer w.Close()
    
    	a, err := fastzip.NewArchiver(w, sourceDir)
    	if err != nil {
    		panic(err)
    	}
    	defer a.Close()
    
    	files := make(map[string]os.FileInfo)
    	err = filepath.Walk(sourceDir, func(pathname string, info os.FileInfo, err error) error {
    		files[pathname] = info
    		return nil
    	})
    
    	if err = a.Archive(context.Background(), files); err != nil {
    		panic(err)
    	}
    
    	e, err := fastzip.NewExtractor(zipFile, "./")
    	if err != nil {
    		panic(err) // PANICS
    	}
    	defer e.Close()
    
    	if err = e.Extract(context.Background()); err != nil {
    		panic(err)
    	}
    }
    

    When I unzip the file in OSX Finder, it works just fine :-?

  • A buffer size of zero uses no buffer, rather than default size

    Previously, a buffer size of zero would use the default buffer size and -1 would disable the buffer. This is probably confusing/unexpected.

    This change remains backwards compatible, as <= 0 will disable the buffer.

  • Fix CreateRaw() flags and timestamps

    When the standard Go library's version of CreateRaw was added, rather than solely focusing on custom compression in "raw" mode, it also removed the convenience of setting up common zip flags and timestamp logic.

    This change duplicates some functions from archive/zip to add the flags and timestamp logic to CreateRaw().

    This fixes an issue raised by a user of GitLab-Runner https://gitlab.com/gitlab-org/gitlab-runner/-/issues/28928 where timestamps were missing whenever CreateRaw was used.

  • Remove patched archive/zip, replace with klauspost's

    klauspost kindly implemented the CreateHeaderRaw proposal (https://github.com/golang/go/issues/34974) in https://github.com/klauspost/compress/pull/214. This repository was vendoring the patched version (https://go-review.googlesource.com/c/go/+/202217/) but having it supported in a well maintained library already used by fastzip is a plus.

  • Rate-limit active go routines

    The filepool's Get()/Put() was designed to rate-limit up to a certain concurrency, but whilst the unit of work was rate-limited, we first launched as many goroutines as there were files to be archived. By moving the Get() call outside of the goroutine, we also limit the number of goroutines created at once.

    This likely has memory savings, but also allows the -race detector to work when archiving many thousands of files.

  • Updates klauspost/compress to v1.14.3

    This also now calls the newer CreateRaw() method, as CreateHeaderRaw() is now deprecated.

    This fixes #32, as when CreateHeaderRaw() was deprecated, it started calling the incorrect function that replaced it: https://github.com/klauspost/compress/pull/502

    A test with a larger content body has been added as it was able to detect this regression.

    Fixes #32

  • Preserve timestamps of symlinks

    Does this library support preserving timestamps of symlinks?

    See the problem report at gitlab.com:

    • https://gitlab.com/gitlab-org/gitlab/-/issues/359272
  • Archiver modifies the last modified date of the folder to archive

    Hi,

    I'm on MS Windows and when I use your "Archiver example" from the README.md, the last modified date of the folder to archive is changed to now().

    Instead of ~/fastzip-archiving I'm using "R:/tst_zip/zip me", which contains about 2000 files and 300 folders.

    Is there a way to avoid this behavior? I know that the last modified date is not known on Linux but it is on Windows, and it shouldn't be reset just because the folder gets zipped...
