LZ4 compression and decompression in pure Go

Overview

This package provides a streaming interface to LZ4 data streams as well as low-level compress and uncompress functions for LZ4 data blocks. The implementation is based on the reference C implementation.

Install

Assuming you have the Go toolchain installed:

go get github.com/pierrec/lz4

There is a command-line tool to compress and decompress LZ4 files.

go install github.com/pierrec/lz4/cmd/lz4c

Usage

Usage of lz4c:
  -version
        print the program version

Subcommands:
Compress the given files or from stdin to stdout.
compress [arguments] [<file name> ...]
  -bc
        enable block checksum
  -l int
        compression level (0=fastest)
  -sc
        disable stream checksum
  -size string
        block max size [64K,256K,1M,4M] (default "4M")

Uncompress the given files or from stdin to stdout.
uncompress [arguments] [<file name> ...]
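
For example, both subcommands accept input on stdin and write to stdout, so a round trip could look like this (file names are only illustrative):

lz4c compress -size 64K < file.txt > file.txt.lz4
lz4c uncompress < file.txt.lz4 > file.copy.txt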

Example

package lz4_test

import (
	"io"
	"os"
	"strings"

	"github.com/pierrec/lz4"
)

// Example compresses and uncompresses an input string through an io.Pipe.
func Example() {
	s := "hello world"
	r := strings.NewReader(s)

	// The pipe will uncompress the data from the writer.
	pr, pw := io.Pipe()
	zw := lz4.NewWriter(pw)
	zr := lz4.NewReader(pr)

	go func() {
		// Compress the input string.
		_, _ = io.Copy(zw, r)
		_ = zw.Close() // Make sure the writer is closed
		_ = pw.Close() // Terminate the pipe
	}()

	_, _ = io.Copy(os.Stdout, zr)

	// Output:
	// hello world
}

Contributing

Contributions are very welcome, whether for bug fixes, performance improvements, or anything else!

  • Open an issue with a proper description
  • Send a pull request with appropriate test case(s)

Contributors

Thanks to all contributors so far!

Special thanks to @Zariel for his asm implementation of the decoder.

Special thanks to @klauspost for his work on optimizing the code.

Comments
  • The latest commit (move stuff to v2) breaks package with go 1.10 and earlier

    Everything is in the name.

    go get github.com/pierrec/lz4
    
    package github.com/pierrec/lz4/v2/internal/xxh32: cannot find package "github.com/pierrec/lz4/v2/internal/xxh32" in any of:
            /usr/lib/go-1.9/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOROOT)
            /***/gopath/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOPATH)
    
  • Fix previous commit that needs to convert pool returns.

    Can't go get v4 because it doesn't compile.

    Also, I noted 2 staticcheck issues:

    • One is fixed in this PR. (int32 can never be negative).
    • The second one is more complex.

    It seems that you want to reuse hash tables ([]int) with a sync.Pool:

    		chainTable = HashTablePool.Get().([]int)
    		defer HashTablePool.Put(chainTable)
    

    This triggers https://staticcheck.io/docs/checks#SA6002. The problem is that you need to store a pointer; otherwise the slice header is copied into the interface value, which is not a big deal but still an allocation of 3 ints (that could be avoided). I think you could create your own type or do a pointer dance here.
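
    For illustration, here is a minimal standalone sketch of the pointer-based pool that SA6002 suggests (hypothetical names, not this repository's actual code):

    package main

    import "sync"

    // hashTablePool stores *[]int so that Get/Put only move a pointer through the
    // interface value instead of copying the 3-word slice header (see SA6002).
    var hashTablePool = sync.Pool{
    	New: func() interface{} {
    		t := make([]int, 1<<16)
    		return &t
    	},
    }

    func compressWithPooledTable(src, dst []byte) {
    	tp := hashTablePool.Get().(*[]int)
    	defer hashTablePool.Put(tp)
    	hashTable := *tp
    	_ = hashTable // the compression code would use hashTable here
    	_, _ = src, dst
    }

    func main() {
    	compressWithPooledTable([]byte("example"), make([]byte, 64))
    }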

    I wanted to help you fix this one, but I realized you might want to remove those parameters from the function since you are always passing nil? e.g. CompressBlockHC(src, dst []byte, depth CompressionLevel, hashTable, chainTable []int) => CompressBlockHC(src, dst []byte, depth CompressionLevel)

    WDYT ?

    Signed-off-by: Cyril Tovena [email protected]

  • Multithreading support

    I noticed that this library uses only one CPU core. That may be a problem when processing large data. Do you have any ideas about how to add multi-threading support?

  • module name in go.mod for v2

    It should have v2 in go.mod

    module github.com/pierrec/lz4/v2
    

    and then this example works,

    import "github.com/pierrec/lz4/v2"
    
    func test() {
        // ...
        text := lz4.NewReader(compressed)
        // ...
    }
    
  • [v4] Adds arm64 acceleration to decoder.

    Solves #142. Adapted from @greatroar's work on arm32.

      daisy  lizf  …  lz4  internal  lz4block  uname -a
    Linux daisy 5.10.0-1008-oem #9+lx2k1 SMP Sat Dec 26 01:51:36 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
      daisy  lizf  …  lz4  internal  lz4block  go test ./...
    ok  	github.com/pierrec/lz4/v4/internal/lz4block	(cached)
    
  • LZ4 Legacy support?

    I noticed that the package doesn't have legacy format support. There is this legacy magic value here: https://github.com/lz4/lz4/blob/dev/programs/lz4io.c#L80

    Is there any plan to support this?

  • Non-deterministic output observed

    Hi,

    We are observing a weird issue where different servers running the same version of the LZ4 package (github.com/pierrec/lz4 v2.3.0+incompatible) produce different compressed output from the same input, and I'm trying to understand what is going on.

    In our codebase we reuse *lz4.Writer instances and call (*lz4.Writer).Reset when starting a new output. The writer receives a single Write call with the entire input all at once. Inputs are ~2 MB; we use 64 KB block buffers in LZ4, mostly to reduce memory usage during decompression.
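
    For context, a minimal sketch of the reuse pattern described above, assuming the v2 API ((*lz4.Writer).Reset plus the public Header field; illustrative only):

    package main

    import (
    	"bytes"

    	"github.com/pierrec/lz4"
    )

    // compressOne reuses a single *lz4.Writer across outputs.
    func compressOne(zw *lz4.Writer, input []byte) []byte {
    	var out bytes.Buffer
    	zw.Reset(&out)                                 // start a new output, keep the writer
    	zw.Header = lz4.Header{BlockMaxSize: 64 << 10} // 64 KB blocks
    	_, _ = zw.Write(input)                         // single Write with the whole input
    	_ = zw.Close()
    	return out.Bytes()
    }

    func main() {
    	zw := lz4.NewWriter(nil)
    	_ = compressOne(zw, bytes.Repeat([]byte("example data "), 1000))
    	_ = compressOne(zw, bytes.Repeat([]byte("more data "), 1000))
    }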

    When reading two files with lz4 debug enabled, I see these outputs:

    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:103 header block max size id=4 size=65536
    LZ4: reader.go:132 header read: lz4.Header{BlockMaxSize: 65536 }
    LZ4: reader.go:152 header read OK compressed buffer 65536 / 131072 uncompressed buffer 65536 : 65536 index=65536
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15501
    LZ4: reader.go:238 compressed block size 15501
    LZ4: reader.go:274 current frame checksum 30916118
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15400
    LZ4: reader.go:238 compressed block size 15400
    LZ4: reader.go:274 current frame checksum fd477d82
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15396
    LZ4: reader.go:238 compressed block size 15396
    LZ4: reader.go:274 current frame checksum 564efa4
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15388
    LZ4: reader.go:238 compressed block size 15388
    LZ4: reader.go:274 current frame checksum 20424901
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 13886
    LZ4: reader.go:238 compressed block size 13886
    LZ4: reader.go:274 current frame checksum bba29999
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 25551 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:185 frame checksum got=bba29999 / want=bba29999
    

    Second:

    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:103 header block max size id=4 size=65536
    LZ4: reader.go:132 header read: lz4.Header{BlockMaxSize: 65536 }
    LZ4: reader.go:152 header read OK compressed buffer 65536 / 131072 uncompressed buffer 65536 : 65536 index=65536
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15501
    LZ4: reader.go:238 compressed block size 15501
    LZ4: reader.go:274 current frame checksum 30916118
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15400
    LZ4: reader.go:238 compressed block size 15400
    LZ4: reader.go:274 current frame checksum fd477d82
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15399
    LZ4: reader.go:238 compressed block size 15399
    LZ4: reader.go:274 current frame checksum 564efa4
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 15386
    LZ4: reader.go:238 compressed block size 15386
    LZ4: reader.go:274 current frame checksum 20424901
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:203 raw block size 13887
    LZ4: reader.go:238 compressed block size 13887
    LZ4: reader.go:274 current frame checksum bba29999
    LZ4: reader.go:295 copied 32768 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:295 copied 25551 bytes to input
    LZ4: reader.go:145 Read buf len=32768
    LZ4: reader.go:164 reading block from writer
    LZ4: reader.go:185 frame checksum got=bba29999 / want=bba29999
    

    The only difference is that the raw block sizes for the last two blocks are 15388, 13886 and 15386, 13887 respectively; otherwise both files decompress back to the same data.

    Is there anything that can make LZ4 writers generate slightly different output? The Reset call seems to reset everything except the hash table – could that be the issue?

    Thank you.

    ps: So far I've been unable to reproduce the issue on my machine. :-(

  • CompressBlock error returns are confusing

    CompressBlock and CompressBlockHC docs currently state:

    // If the destination buffer size is lower than CompressBlockBound and
    // the compressed size is 0 and no error, then the data is incompressible.
    //
    // An error is returned if the destination buffer is too small.
    

    This was introduced to fix #71, but I find it confusing. What is the difference between incompressible and the destination buffer being too small?
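
    For reference, a minimal sketch of how a caller can interpret these returns under the documented contract (v2-style CompressBlock signature assumed; names illustrative):

    package main

    import (
    	"bytes"
    	"fmt"

    	"github.com/pierrec/lz4"
    )

    func compress(src []byte) []byte {
    	// dst is smaller than CompressBlockBound(len(src)), so n == 0 with a nil
    	// error means "incompressible" per the quoted docs rather than a failure.
    	dst := make([]byte, len(src))
    	hashTable := make([]int, 1<<16)
    	n, err := lz4.CompressBlock(src, dst, hashTable)
    	if err != nil {
    		panic(err) // destination buffer too small for the compressed data
    	}
    	if n == 0 {
    		// Incompressible: store the data uncompressed instead.
    		return append([]byte(nil), src...)
    	}
    	return dst[:n]
    }

    func main() {
    	fmt.Println(len(compress(bytes.Repeat([]byte("abcd"), 64))))
    }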

  • go test results in error: invalid frame checksum

    To reproduce I created a larger file:

    for i in {1..100};do cat Mark.Twain-Tom.Sawyer.txt >> longer.txt; done

    And added the "longer.txt" into the _test files.

    Result:

    testdata/longer.txt : 24394635 / 18475893 / 38785100
    --- FAIL: TestWriter (0.00s)
        --- FAIL: TestWriter/testdata/longer.txt/lz4.Header{CompressionLevel:10} (3.72s)
            writer_test.go:65: lz4: invalid frame checksum: got cdbda0bd; expected f1d62b24
    FAIL
    exit status 1
    FAIL	github.com/pierrec/lz4	18.054s

    I believe it's got something to do with io.Reader and its quirk of not returning everything in a single Read call.
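
    If that's the cause, the quirk in question is just the io.Reader contract: a single Read may return fewer bytes than requested, so code that expects a full buffer per call needs io.ReadFull or a loop. A tiny illustration (hypothetical, unrelated to the actual test code):

    package main

    import (
    	"fmt"
    	"io"
    	"strings"
    )

    func main() {
    	r := strings.NewReader(strings.Repeat("x", 100))
    	// src hands out at most 10 bytes on the first Read, then the rest.
    	src := io.MultiReader(io.LimitReader(r, 10), r)

    	buf := make([]byte, 64)
    	n, _ := src.Read(buf)
    	fmt.Println("single Read:", n) // 10, even though buf has room for 64

    	n, _ = io.ReadFull(src, buf)
    	fmt.Println("io.ReadFull:", n) // 64, keeps reading until buf is full
    }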

  • Increase general compression efficiency

    tl;dr increased compression speed for small messages 95%, for large ones ~60%, but parallelism has died.


    I like optimizing things. For one project I'm working on we wanted to support lz4 in a network compression scenario, where we are encoding small standalone sequences in a single data stream. I wrote a couple of reference implementations; flate/gzip performed quite well initially, but lz4 didn't do so well. I set about fixing that.

    The first adjustment I did was to remove the parallelism in the Writer. I expect this is the most controversial change in this PR and totally understand if you don't want to accept it as a result. But, I think it makes sense... Go's built-in gzip implementation is single threaded and in most applications where lz4 support would be embedded, threading compression itself is unnecessary because the application is already serving other requests on all available threads; concurrency comes at the request level rather than the operation level. Also this let me do some other optimizations further below...

    This change directly allowed me to make tweaks in the writer so that no memory allocations are necessary during compression. Previously Go needed to make some heap allocations for memory that was shared between threads. These changes cut about 50% off the time of compressing the lorem string, and are what is included in the first commit in this PR.

    After that I noticed that each write was suspiciously slow given that we weren't allocating anything. It turns out most of the time was spent clearing the hashTable, as it was allocated on the stack. I toyed around with a few solutions to avoid this but ultimately ended up with a "generational" table, where each call to CompressBlock is given its own unique generation ID and operations only affect values written during the current generation. This, combined with the single-threaded approach, lets us cache the hashTable. This change cut 75% off of the time that was remaining.
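
    To make the "generational" idea concrete, here is a rough standalone sketch of the trick (illustrative only, not the code in this PR):

    package main

    import "fmt"

    // genTable avoids clearing the whole table between blocks: each slot records
    // the generation it was written in, and reads from an older generation are
    // treated as empty.
    type genTable struct {
    	gen     uint32
    	gens    [1 << 16]uint32
    	entries [1 << 16]int
    }

    // nextBlock "clears" the table in O(1) by bumping the generation counter.
    func (t *genTable) nextBlock() { t.gen++ }

    func (t *genTable) put(h uint16, v int) {
    	t.gens[h] = t.gen
    	t.entries[h] = v
    }

    func (t *genTable) get(h uint16) (int, bool) {
    	if t.gens[h] != t.gen {
    		return 0, false // written during a previous block, ignore
    	}
    	return t.entries[h], true
    }

    func main() {
    	var t genTable
    	t.nextBlock() // start at generation 1 so zeroed slots read as empty
    	t.put(42, 1234)
    	fmt.Println(t.get(42)) // 1234 true

    	t.nextBlock() // next block: previous entries become invisible
    	fmt.Println(t.get(42)) // 0 false
    }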

    Finally, pprof showed me that quite a bit of time was being spent in binary.LittleEndian.Uint32. I used a little bit of scary unsafe that avoids this call on most common computers (x86/amd64/some ARM) which natively have little endian byte order. This cut 50% off of the remaining time.
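
    Roughly, on little-endian machines the unsafe trick amounts to something like this (a sketch, not the exact code in this PR):

    package main

    import (
    	"encoding/binary"
    	"fmt"
    	"unsafe"
    )

    // loadUint32 reads 4 bytes at b[i] as a uint32 with a single unaligned load.
    // On natively little-endian CPUs (x86/amd64, most ARM) this yields the same
    // value as binary.LittleEndian.Uint32.
    func loadUint32(b []byte, i int) uint32 {
    	return *(*uint32)(unsafe.Pointer(&b[i]))
    }

    func main() {
    	b := []byte{0x01, 0x02, 0x03, 0x04, 0x05}
    	fmt.Println(loadUint32(b, 1), binary.LittleEndian.Uint32(b[1:5]))
    }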

    All in all, compressing lorem in the included benchmark now runs about 95% faster on my machine. I was concerned that removing the parallelism would hurt compression speed on large sequences but it seems that, at least on the VM I was testing on, the optimizations which become possible as a result were a good tradeoff.

    benchmark                          old ns/op     new ns/op     delta
    BenchmarkCompressEndToEnd          44258         2421          -94.53%
    BenchmarkCompressEndToEnd-10MB     24691717      10113045      -59.04%
    

    Look ma, no mallocs!

    BenchmarkCompressEndToEnd	  500000	      2434 ns/op	       8 B/op	       0 allocs/op
    

    I think I got most of the low-hanging fruit in this PR. The majority of the remaining time is spent in branch mispredictions, some memmoves, hashing, bitwise operators, and assignments.

  • lz4 reader read block until buf is full?

    For an io.Reader, with n, err := lz4Reader.Read(buf), I expect Read to return whatever data is currently available together with the number of bytes read, but it blocks until buf is full. This causes a problem: https://go.dev/play/p/uFyyFbCL7Yp
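
    As a point of comparison, a caller following the io.Reader contract would loop on Read and accept short reads (a hypothetical sketch, not the playground code above):

    package main

    import (
    	"bytes"
    	"fmt"
    	"io"

    	"github.com/pierrec/lz4"
    )

    func main() {
    	// Build a small compressed stream in memory.
    	var compressed bytes.Buffer
    	zw := lz4.NewWriter(&compressed)
    	_, _ = zw.Write(bytes.Repeat([]byte("hello "), 1000))
    	_ = zw.Close()

    	zr := lz4.NewReader(&compressed)
    	buf := make([]byte, 32<<10)
    	for {
    		// Per the io.Reader contract, n may be less than len(buf); the report
    		// above is that this Read instead blocks until buf is completely full.
    		n, err := zr.Read(buf)
    		if n > 0 {
    			fmt.Println("read", n, "bytes")
    		}
    		if err == io.EOF {
    			break
    		}
    		if err != nil {
    			panic(err)
    		}
    	}
    }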

  • avoid uncompressed duplicates in testdata to make module smaller

    I was investigating what the largest dependencies were in a project by measuring how large the module zips were, and yours stood out at about thirty megabytes. This isn't terrible per se, but it's still surprisingly large, and still slows down some builds depending on the internet speed.

    I see that the testdata directory contains some files in both the compressed and uncompressed forms. Have you thought about only keeping the compressed forms?

  • Add lz4.AppendOption for NewWriter

    Add lz4.AppendOption: if NewWriter is given lz4.AppendOption(true), then it will not write the frame header.

    In this way, you can continue to append new data to an existing lz4 file.

    Like this:

    	// Create the file and write the first chunk.
    	fw, _ := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0644)
    	w := lz4.NewWriter(fw)
    	w.Apply(lz4.ChecksumOption(false))
    	w.Write(d)
    	w.Flush()

    	// Do some other things, or a long time later...

    	// Reopen the file and append more data without writing a new frame header.
    	fw, _ = os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0644)
    	w = lz4.NewWriter(fw)
    	w.Apply(lz4.ChecksumOption(false), lz4.AppendOption(true))
    	w.Write(d)
    	w.Flush()
    

    Thanks!

  • jsonlz4: lz4: invalid source or destination buffer too short

    I tried using pierrec/lz4 to read Mozilla's jsonlz4 files using the approach outlined here: https://github.com/pierrec/lz4/issues/28, but it always shows this error:

    lz4: invalid source or destination buffer too short

    mozlz4a.py can decompress them without problems.

    Demo: https://go.dev/play/p/A32NWBYkchg

    package main
    
    import (
    	"fmt"
    	"crypto/md5"
    	"github.com/pierrec/lz4"
    )
    
    // Payload created with this Python script: https://gist.github.com/Tblue/62ff47bef7f894e92ed5
    //
    // $ printf 'mozLz40\x00!\x00\x00\x00\xF0\x12{\"version\":[\"sessionrestore\",1]}\n'  | md5sum
    // 12c5a86eaafe57bbb0345f52505610bf  -
    // printf 'mozLz40\x00!\x00\x00\x00\xF0\x12{\"version\":[\"sessionrestore\",1]}\n'  | python3.7 mozlz4a.py  -d -
    // {"version":["sessionrestore",1]}
    
    func md5sum(s string) (r string) {
    	digest := md5.New()
    	digest.Write([]byte(s))
    	return fmt.Sprintf("%x", digest.Sum(nil))
    }
    
    var payload string = "mozLz40\x00!\x00\x00\x00\xF0\x12{\"version\":[\"sessionrestore\",1]}\n"
    
    func main() {
    	fmt.Println(md5sum(payload))
    	out := make([]byte, len(payload)*1000)
    	_, e := lz4.UncompressBlock([]byte(payload), out)
    	if e != nil {
    		panic(e)
    	}
    	fmt.Print(string(out))
    }
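
    For what it's worth, a minimal sketch of one way to get this payload to decompress, assuming the mozLz4 layout of an 8-byte magic "mozLz40\0" followed by a 4-byte little-endian uncompressed size and then a raw LZ4 block (that layout is an assumption, not something this package documents):

    package main

    import (
    	"encoding/binary"
    	"fmt"

    	"github.com/pierrec/lz4"
    )

    var payload = "mozLz40\x00!\x00\x00\x00\xF0\x12{\"version\":[\"sessionrestore\",1]}\n"

    func main() {
    	data := []byte(payload)
    	// Skip the 8-byte magic; read the 4-byte little-endian uncompressed size.
    	size := binary.LittleEndian.Uint32(data[8:12])
    	out := make([]byte, size)
    	// UncompressBlock expects only the raw LZ4 block, not the mozLz4 header.
    	n, err := lz4.UncompressBlock(data[12:], out)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Print(string(out[:n]))
    }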
    
  • lz4: invalid source or destination buffer too short

    I have a compressed lz4 file (produced by a C++ program using the C lz4 library). I can decode it normally using the lz4 command-line tool, but with the golang lz4 cmd tool it fails with this error message: lz4: invalid source or destination buffer too short

    What could be the possible cause of this issue?

  • [RSVP] What is the source of xxh32zero.go?

    Hi @pierrec,

    What is the source of xxh32zero.go? I'm guessing you ported the reference implementation, and that the unreachable URL in xxh32zero.go is a form of attribution. Or is this someone else's work that you then built off of?

    I was able to circumvent the other issues by excluding files mentioned in #178 and disabling associated tests. Debian ftpmasters are treating #194, this issue, as a hard blocker. Finally this issue is also blocking work on reverse dependencies, so please reply asap :)

    Thank you, Nicholas

  • Data race when using concurrency > 1

    In the v4 branch, the lz4 library may cause data corruption if you have set concurrency > 1. The race happens at

    https://github.com/pierrec/lz4/blob/v4/writer.go#L100 https://github.com/pierrec/lz4/blob/v4/writer.go#L159

    Suppose this scenario: the calling sequence is Write, Flush, Write, Flush repeatedly, which is a very common pattern in a web server. When w.data is filled with data but doesn't reach the buffer size, calling Flush sends w.data[:w.idx] to a channel and resets w.idx to zero. Here comes the problem: w.data[:w.idx] is a shallow copy of the slice header, which means a new call to Write and the compression goroutine write and read the same underlying byte array with an overlapping index.
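
    A minimal standalone sketch of the aliasing pattern being described (hypothetical code, not the library's actual writer):

    package main

    import "fmt"

    // writer mimics the buffering scheme above: Flush hands the buffered prefix
    // to a worker goroutine and resets the index, while the next Write keeps
    // appending into the very same backing array.
    type writer struct {
    	data []byte
    	idx  int
    	work chan []byte
    }

    func (w *writer) Write(p []byte) {
    	w.idx += copy(w.data[w.idx:], p)
    }

    func (w *writer) Flush() {
    	w.work <- w.data[:w.idx] // slice header copy; backing array is shared
    	w.idx = 0                // the next Write reuses the same bytes
    }

    func main() {
    	w := &writer{data: make([]byte, 8), work: make(chan []byte, 1)}

    	w.Write([]byte("AAAA"))
    	w.Flush()
    	block := <-w.work // a compression goroutine would read this asynchronously

    	w.Write([]byte("BBBB")) // overwrites the bytes block still refers to

    	fmt.Printf("%s\n", block) // prints "BBBB", not the flushed "AAAA"
    }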
