Port of LZ4 lossless compression algorithm to Go

go-lz4

go-lz4 is a port of the LZ4 lossless compression algorithm to Go. The original C code is located at:

https://github.com/Cyan4973/lz4


Usage

go get github.com/bkaradzic/go-lz4

import "github.com/bkaradzic/go-lz4"

The package name is lz4.

Notes

  • go-lz4 saves a uint32 with the original uncompressed length at the beginning of the encoded buffer. This may get in the way of interoperability with other implementations.

Alternative

https://github.com/pierrec/lz4

Contributors

Damian Gryski (@dgryski)
Dustin Sallings (@dustin)

Contact

@bkaradzic
http://www.stuckingeometry.com

Project page
https://github.com/bkaradzic/go-lz4

License

Copyright 2011-2012 Branimir Karadzic. All rights reserved.
Copyright 2013 Damian Gryski. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY COPYRIGHT HOLDER ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments
  • Incompatibility with official LZ4

    Setup:

    • a backend which uses go-lz4
    • a client which uses the official LZ4 C library (there's no way to use Go there, unfortunately)

    The problem I have is that sometimes data from the backend can't be decoded on the client. The key word is "sometimes". I've got one sample (55K compressed, 114K uncompressed).

    I'm not sure yet which side contains the bug, but let me know if you want to check it out.

  • optimized cp(), 2x performance increase for large files

    AFAICT, cp() needs to append byte by byte because initially there may not be enough data in dst to cover the requested length. But once there's enough in dst, you can just append a whole slice.
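
The approach can be sketched in isolation (copyMatch here is an illustrative stand-in, not the actual cp() code): append byte by byte only while the source run still overlaps the end of dst, then finish with a single bulk append.

```go
package main

import "fmt"

// copyMatch appends length bytes starting at dst[pos] to dst. While the
// source range still extends past the current end of dst, bytes must be
// appended one at a time; once enough output exists, the remainder can
// be appended as a single slice.
func copyMatch(dst []byte, pos, length int) []byte {
	for length > 0 && pos+length > len(dst) {
		dst = append(dst, dst[pos])
		pos++
		length--
	}
	return append(dst, dst[pos:pos+length]...)
}

func main() {
	// A back-reference of length 6 into a 2-byte buffer: the classic
	// overlapping-copy case that forces the byte-by-byte path.
	out := copyMatch([]byte("ab"), 0, 6)
	fmt.Println(string(out)) // abababab
}
```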

  • Can't compress files larger than 2^17 bytes

    This is the size of the internal buffer, but there appear to be no checks to make sure we don't overrun it when we load data in writer.go's encoder.cache(). I haven't checked the reader, but it seems likely the same problem occurs in flush() if we try to write out a larger section than what we've read.

  • Incompatible NewWriter signature

    The typical signature for NewWriter is func NewWriter(wr io.Writer) *Writer, whereas the LZ4 code has NewWriter(r io.Reader) io.ReadCloser. This makes it unusable within normal Go programs.

  • Optimize decoder

    By preallocating our destination buffer, we can eliminate all the calls to append(). This, plus inlining two hot routines, gives a considerable speedup to Decode().

    Below are benchmarks ported from snappy-go:

    benchmark                  old ns/op    new ns/op    delta
    BenchmarkLZ4Decode           4480128      3150442  -29.68%
    BenchmarkWordsDecode1e3         6071         3506  -42.25%
    BenchmarkWordsDecode1e4        69195        45798  -33.81%
    BenchmarkWordsDecode1e5       744347       539174  -27.56%
    BenchmarkWordsDecode1e6      6616125      4841891  -26.82%
    
    benchmark                   old MB/s     new MB/s  speedup
    BenchmarkWordsDecode1e3       164.71       285.18    1.73x
    BenchmarkWordsDecode1e4       144.52       218.35    1.51x
    BenchmarkWordsDecode1e5       134.35       185.47    1.38x
    BenchmarkWordsDecode1e6       151.15       206.53    1.37x
    
  • Add some basic tests so Travis does more than just build.

    These are some sanity tests to make sure we actually encode/decode things.

    I'm using /usr/share/dict/words as a source of input. If you'd prefer (the line endings suggest to me you're on Windows), we can import a novel from Project Gutenberg and use that instead, so the data is actually committed to the repo.

    The snappy tests actually download their larger test data at benchmark time, if requested, so that's an option too. I wouldn't want to do that for the basic tests though.

  • Allow more seamless integration to other projects.

    This allows people to more easily use the library from within their code, without having to pull it down to a different location and manipulate their GOPATH.

  • hashTable allocating tons of memory

    $ go tool pprof populate /tmp/profile764028996/mem.pprof
    Entering interactive mode (type "help" for commands)
    (pprof) list lz4.Encode
    Total: 3.90GB
    ROUTINE ======================== github.com/bkaradzic/go-lz4.Encode in /home/ubuntu/go/src/github.com/bkaradzic/go-lz4/writer.go
        3.70GB     3.70GB (flat, cum) 94.89% of Total
             .          .    107:   if len(src) >= MaxInputSize {
             .          .    108:           return nil, ErrTooLarge
             .          .    109:   }
             .          .    110:
             .          .    111:   if n := CompressBound(len(src)); len(dst) < n {
        8.52MB     8.52MB    112:           dst = make([]byte, n)
             .          .    113:   }
             .          .    114:
        3.69GB     3.69GB    115:   e := encoder{src: src, dst: dst, hashTable: make([]uint32, hashTableSize)}
             .          .    116:
             .          .    117:   binary.LittleEndian.PutUint32(dst, uint32(len(src)))
             .          .    118:   e.dpos = 4
             .          .    119:
             .          .    120:   var (
    

    This line in the code is causing Badger to OOM when loading data really fast. Ideally, you want to reuse the same hashTable; this can be done via sync.Pool. Happy to send a PR if that'd help.
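
A rough sketch of the suggested fix, assuming a package-level pool (hashTableSize and the reset step here are illustrative, not the library's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

const hashTableSize = 1 << 16 // stand-in for the package's constant

// hashTablePool hands out reusable hash tables so Encode no longer
// allocates a fresh one per call.
var hashTablePool = sync.Pool{
	New: func() interface{} {
		return make([]uint32, hashTableSize)
	},
}

func encodeWithPooledTable() int {
	ht := hashTablePool.Get().([]uint32)
	defer func() {
		// A recycled table holds stale entries; zero it (or make the
		// encoder tolerant of them) before the next use.
		for i := range ht {
			ht[i] = 0
		}
		hashTablePool.Put(ht)
	}()
	// ... run the encoder with ht instead of allocating ...
	return len(ht)
}

func main() {
	fmt.Println(encodeWithPooledTable()) // 65536
}
```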

  • Use capacity to determine whether dst can be reused for encoding and decoding

    Also, modify the benchmarks to reuse the allocated slice. This improves the ns/op for both encoding and decoding by 0.4s/op (2.4s -> 2.0s for Decode, and similar for Encode).
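
The capacity check amounts to the following pattern (a sketch of the idea, not the library's exact code):

```go
package main

import "fmt"

// growBuf returns a slice of length n, reusing buf's backing array when
// its capacity allows and allocating a new one only when it does not.
func growBuf(buf []byte, n int) []byte {
	if cap(buf) >= n {
		return buf[:n]
	}
	return make([]byte, n)
}

func main() {
	buf := make([]byte, 0, 128)
	a := growBuf(buf, 64)  // capacity suffices: no allocation
	b := growBuf(buf, 256) // capacity too small: fresh allocation
	fmt.Println(len(a), cap(a), len(b), cap(b)) // 64 128 256 256
}
```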

  • Interoperability issue with other lz4 libraries

    I'm using two GitHub versions of LZ4: yours for Go, and https://github.com/jpountz/lz4-java for Java.

    I used your library in the code below:

    package main

    import (
    	"encoding/base64"
    	"fmt"
    	"os"

    	lz4 "github.com/bkaradzic/go-lz4"
    )

    func main() {
    	var data []byte
    	var err error

    	to_compress := "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    	if data, err = lz4.Encode(nil, []byte(to_compress)); err != nil {
    		fmt.Fprintf(os.Stderr, "Failed to compress: '%s'", err)
    		return
    	}

    	fmt.Fprintf(os.Stderr, "Success! Length=%d\n", len(to_compress))

    	fmt.Fprintf(os.Stdout, "%s\n", base64.StdEncoding.EncodeToString(data))
    }
    

    and obtain the following output:

    raffi@iot-micro-raffi ~/tmp > go run ./test_lz4.go 
    Success! Length=100
    ZAAAAB94AQBLUHh4eHh4
    raffi@iot-micro-raffi ~/tmp > 
    

    Then I inject this output into a Java version to decode the result:

    import java.util.Base64;
    import net.jpountz.lz4.LZ4Factory;
    import net.jpountz.lz4.LZ4FastDecompressor;
    
    public class test_lz4_decomp {
    
        public static void main(String[] args) {
            String compressed_base64= "ZAAAAB94AQBLUHh4eHh4";
            int original_size = 100;
            byte [] compressed = Base64.getDecoder().decode(compressed_base64);
    
            LZ4Factory _factory= LZ4Factory.fastestInstance();
            LZ4FastDecompressor decompressor = _factory.fastDecompressor();
            byte []restored = new byte[original_size];
            decompressor.decompress(compressed, 0, restored, 0,original_size);
    
            String decompressed_str=new String(restored);
    
            System.out.println(decompressed_str); 
        }
    
    }
    
    

    but obtain the following output:

    raffi@iot-micro-raffi ~/tmp > java -cp ".:lz4-1.3.0.jar" test_lz4_decomp
    Exception in thread "main" net.jpountz.lz4.LZ4Exception: Error decoding offset 58 of input buffer
    	at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:39)
    	at test_lz4_decomp.main(test_lz4_decomp.java:16)
    raffi@iot-micro-raffi ~/tmp > 
    
    

    The expected result would be that the Java version should be able to decompress the output of the Golang version.

  • LZ4 Framing

    I'd like to see LZ4 framing supported.

    The frame descriptor includes an optional content-size field, which may help to eliminate go-lz4's unique means of storing it:

    go-lz4 saves a uint32 with the original uncompressed length at the beginning of the encoded buffer
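
For reference, the LZ4 frame format opens with the magic number 0x184D2204 (stored little-endian), so framed data can be recognized up front. A minimal check:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lz4FrameMagic opens every LZ4 frame (stored little-endian),
// per the LZ4 frame format specification.
const lz4FrameMagic = 0x184D2204

func isLZ4Frame(b []byte) bool {
	return len(b) >= 4 && binary.LittleEndian.Uint32(b) == lz4FrameMagic
}

func main() {
	frame := []byte{0x04, 0x22, 0x4D, 0x18} // frame header start
	raw := []byte{0x64, 0x00, 0x00, 0x00}   // go-lz4 length prefix instead
	fmt.Println(isLZ4Frame(frame), isLZ4Frame(raw)) // true false
}
```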

  • encode with io.Writer / decode with io.Reader

    I'd love to use this code, but my project involves compressing files up to gigabytes in size.

    For that use case I really need a streaming version of the algorithm (i.e. using the io.Writer/io.Reader interfaces). I see there's one as part of the reference C implementation, but I really have no idea where to start with porting it to Go.
