kanzi

Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Go.

  • modern: state-of-the-art algorithms are implemented and multi-core CPUs can take advantage of the built-in multi-tasking.
  • modular: an entropy codec and a combination of transforms can be selected at runtime to best match the kind of data to compress (see the sketch after this list).
  • expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
  • efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).
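
As a taste of the modular, expandable design, here is a minimal decompression sketch built on the library's compressed stream API. The kio.NewCompressedInputStreamWithCtx constructor and the ctx map mirror the usage shown in the comments at the bottom of this page; the exact set of recognized ctx keys is version-dependent, so treat it as an assumption rather than a reference:

    package main

    import (
        "io"
        "os"

        kio "github.com/flanglet/kanzi-go/io"
    )

    func main() {
        // Open a compressed input (hypothetical file name).
        zfile, err := os.Open("input.knz")
        if err != nil {
            panic(err)
        }
        defer zfile.Close()

        // The ctx map configures the stream; "jobs" engages the
        // built-in multi-tasking mentioned above.
        ctx := map[string]interface{}{
            "inputName": "input.knz",
            "jobs":      uint(4),
            "verbosity": 1,
        }

        zr, err := kio.NewCompressedInputStreamWithCtx(zfile, ctx)
        if err != nil {
            panic(err)
        }
        defer zr.Close()

        // Stream the decoded bytes to stdout.
        if _, err := io.Copy(os.Stdout, zr); err != nil {
            panic(err)
        }
    }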

Kanzi supports a wide range of compression ratios and can compress many files smaller than most common compressors can, at the cost of decompression speed. It is not compatible with standard compression formats.

For more details, check https://github.com/flanglet/kanzi-go/wiki.

Credits

Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov

Disclaimer

Use at your own risk. Always keep a backup of your files.


Silesia corpus benchmark

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 20.04

go1.16rc1

Kanzi version 1.9 Go implementation. Block size is 100 MB.

Compressor                        | Encoding (sec) | Decoding (sec) | Size
----------------------------------|----------------|----------------|------------
Original                          |                |                | 211,938,580
Zstd 1.4.8 -2 --long=30           | 1.2            | 0.3            | 68,761,465
Zstd 1.4.8 -2 -T6 --long=30       | 0.7            | 0.3            | 68,761,465
Kanzi -l 1                        | 2.8            | 1.3            | 68,471,355
Kanzi -l 1 -j 6                   | 0.9            | 0.4            | 68,471,355
Pigz 1.6 -6 -p6                   | 1.4            | 1.4            | 68,237,849
Gzip 1.6 -6                       | 6.1            | 1.1            | 68,227,965
Brotli 1.0.9 -2 --large_window=30 | 1.5            | 0.8            | 68,033,377
Pigz 1.6 -9 -p6                   | 3.0            | 1.6            | 67,656,836
Gzip 1.6 -9                       | 14.0           | 1.0            | 67,631,990
Kanzi -l 2                        | 3.6            | 1.3            | 64,522,501
Kanzi -l 2 -j 6                   | 1.3            | 0.4            | 64,522,501
Brotli 1.0.9 -4 --large_window=30 | 4.1            | 0.7            | 64,267,169
Zstd 1.4.8 -9 --long=30           | 5.3            | 0.3            | 59,937,600
Zstd 1.4.8 -9 -T6 --long=30       | 2.8            | 0.3            | 59,937,600
Kanzi -l 3                        | 4.8            | 2.3            | 59,647,212
Kanzi -l 3 -j 6                   | 1.7            | 0.8            | 59,647,212
Zstd 1.4.8 -13 --long=30          | 16.0           | 0.3            | 58,065,257
Zstd 1.4.8 -13 -T6 --long=30      | 9.2            | 0.3            | 58,065,257
Orz 1.5.0                         | 7.7            | 2.0            | 57,564,831
Brotli 1.0.9 -9 --large_window=30 | 36.7           | 0.7            | 56,232,817
Lzma 5.2.2 -3                     | 24.1           | 2.6            | 55,743,540
Kanzi -l 4                        | 10.6           | 6.9            | 54,996,858
Kanzi -l 4 -j 6                   | 3.8            | 2.3            | 54,996,858
Bzip2 1.0.6 -9                    | 14.9           | 5.2            | 54,506,769
Zstd 1.4.8 -19 --long=30          | 59.9           | 0.3            | 53,039,786
Zstd 1.4.8 -19 -T6 --long=30      | 59.7           | 0.4            | 53,039,786
Kanzi -l 5                        | 12.4           | 6.5            | 51,745,795
Kanzi -l 5 -j 6                   | 4.2            | 2.1            | 51,745,795
Brotli 1.0.9 --large_window=30    | 356.2          | 0.9            | 49,383,136
Lzma 5.2.2 -9                     | 65.6           | 2.5            | 48,780,457
Kanzi -l 6                        | 15.6           | 10.8           | 48,067,846
Kanzi -l 6 -j 6                   | 5.3            | 3.7            | 48,067,846
BCM 1.6.0 -7                      | 18.0           | 22.1           | 46,506,716
Kanzi -l 7                        | 22.2           | 17.3           | 46,446,991
Kanzi -l 7 -j 6                   | 8.0            | 6.2            | 46,446,991
Tangelo 2.4                       | 83.2           | 85.9           | 44,862,127
zpaq v7.14 m4 t1                  | 107.3          | 112.2          | 42,628,166
zpaq v7.14 m4 t12                 | 108.1          | 111.5          | 42,628,166
Kanzi -l 8                        | 63.4           | 64.6           | 41,830,871
Kanzi -l 8 -j 6                   | 22.5           | 21.8           | 41,830,871
Tangelo 2.0                       | 302.0          | 310.9          | 41,267,068
Kanzi -l 9                        | 84.8           | 86.5           | 40,369,883
Kanzi -l 9 -j 6                   | 33.8           | 33.5           | 40,369,883
zpaq v7.14 m5 t1                  | 343.1          | 352.0          | 39,112,924
zpaq v7.14 m5 t12                 | 344.3          | 350.4          | 39,112,924

enwik8

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 20.04

go1.16rc1

Kanzi version 1.9 Go implementation. Block size is 100 MB, single thread.

Compressor  | Encoding (sec) | Decoding (sec) | Size
------------|----------------|----------------|------------
Original    |                |                | 100,000,000
Kanzi -l 1  | 1.49           | 0.75           | 32,650,127
Kanzi -l 2  | 2.03           | 0.74           | 31,018,886
Kanzi -l 3  | 2.41           | 1.19           | 27,328,809
Kanzi -l 4  | 5.10           | 3.40           | 25,670,935
Kanzi -l 5  | 5.02           | 2.60           | 22,484,700
Kanzi -l 6  | 7.15           | 4.45           | 21,232,218
Kanzi -l 7  | 10.84          | 7.97           | 20,935,522
Kanzi -l 8  | 23.86          | 23.90          | 19,671,830
Kanzi -l 9  | 31.84          | 32.55          | 19,097,962

Build

There are no dependencies, making the project easy to build.

Option 1: go get

    cd $GOPATH
    go get github.com/flanglet/kanzi-go
    cd src/github.com/flanglet/kanzi-go/app
    go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go

Option 2: git clone

    cd $GOPATH/src
    mkdir -p github.com/flanglet
    cd github.com/flanglet
    git clone https://github.com/flanglet/kanzi-go.git
    cd kanzi-go/app
    go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go
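
Once built, the Kanzi binary is driven by command-line flags; -l (compression level) and -j (number of jobs) are the ones exercised in the benchmarks above. A usage sketch, with the -c, -d and -i switches assumed from the CLI's conventional usage (run the binary without arguments for the authoritative flag list):

    ./Kanzi -c -i silesia.tar -l 5 -j 6
    ./Kanzi -d -i silesia.tar.knz
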
Comments
  • When using io.Copy, CompressedStream never returns EOF when decompressing - infinite loop

    Hi,

    With most codecs (zstandard, lz4, gzip, etc.), when decompressing through io.Copy, the stream's Read function returns io.EOF once decompression is complete, but that is not happening here. Is this on purpose?

    Here is my code sample:

    package main

    import (
        "fmt"
        "io"
        "log"
        "os"
        "strings"

        kio "github.com/flanglet/kanzi-go/io"
    )

    // decompress restores the .knz file named by 'in' into a sibling
    // .bak file, streaming through the kanzi compressed input stream.
    func decompress(in string) (err error) {
        var (
            file     *os.File
            filename = fmt.Sprintf("%v.bak", strings.TrimSuffix(in, ".knz"))
            zfile    *os.File
            zinfo    os.FileInfo
            ctx      = map[string]interface{}{
                "inputName":  in,
                "outputName": filename,
                "jobs":       uint(1),
                "overwrite":  true,
                "verbosity":  1,
            }
        )

        zfile, err = os.Open(in)
        if err != nil {
            return
        }

        zinfo, err = zfile.Stat()
        if err != nil {
            return
        }

        ctx["fileSize"] = zinfo.Size()

        mode := zinfo.Mode() // use the same mode for the output file

        // Output file.
        file, err = os.OpenFile(filename, os.O_CREATE|os.O_WRONLY, mode)
        if err != nil {
            return
        }

        zr, err := kio.NewCompressedInputStreamWithCtx(zfile, ctx)
        if err != nil {
            return
        }

        // Uncompress.
        _, err = io.Copy(file, zr)
        if err != nil {
            return
        }

        for _, c := range []io.Closer{zr, file, zfile} {
            err = c.Close()
            if err != nil {
                return
            }
        }

        msg := fmt.Sprintf("Decoding %v: %v => %v bytes", in, zinfo.Size(), zr.GetRead())
        log.Println(msg)

        return
    }
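
    For reference, io.Copy terminates only when the source's Read returns io.EOF (or another error); that is the io.Reader contract, so a stream that never reports io.EOF will make io.Copy loop forever. The expectation can be made explicit with a manual read loop (a sketch reusing zr and file from the sample above):

        buf := make([]byte, 32*1024)
        for {
            n, rerr := zr.Read(buf)
            if n > 0 {
                // Forward whatever was decoded so far.
                if _, werr := file.Write(buf[:n]); werr != nil {
                    return werr
                }
            }
            if rerr == io.EOF {
                break // stream fully decoded
            }
            if rerr != nil {
                return rerr
            }
        }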
    
  • Hash benchmark test failed

    Hi, an error occurred when I was running the benchmark tests. I’d be grateful if someone could help me.

        C:\Users\xxxxx\AppData\Local\Temp\___gobench_Hash_test_go.exe -test.v -test.bench "^BenchmarkXXHash32b|BenchmarkXXHash64$" -test.run ^$ #gosetup
        goos: windows
        goarch: amd64
        pkg: github.com/flanglet/kanzi-go/benchmark
        cpu: Intel(R) Core(TM) i5-8600 CPU @ 3.10GHz
        BenchmarkXXHash32b
            Hash_test.go:48: Incorrect result for XXHash32
        --- FAIL: BenchmarkXXHash32b
        BenchmarkXXHash64
            Hash_test.go:75: Incorrect result for XXHash64
        --- FAIL: BenchmarkXXHash64
        FAIL

    Process finished with exit code 1

  • Compatibility issue between master branch and tag 1.8.0

    Hi,

    I compressed a file with tag 1.8.0, then attempted to decompress it with commit d1d768f6499a (github.com/flanglet/kanzi-go v1.8.1-0.20210210012806-d1d768f6499a) and got the following error:

    [screenshot of the error message not included]

    There appears to be an incompatibility between the two versions.
