Interfaces for LZ77-based data compression

Pack

Interfaces for LZ77-based data compression.

Introduction

Many compression libraries have two main parts:

  • Something that looks for repeated sequences of bytes
  • An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. Pack defines interfaces and an intermediate representation to allow mixing and matching compression components.

Interfaces

Pack defines two interfaces, MatchFinder and Encoder, and a Writer type to tie them together and implement io.Writer. Both of the interfaces use an append-like API, where a destination slice is be passed as the first parameter, and the output is appended to it. If you pass nil as the destination, you will get a newly-allocated slice.

// A MatchFinder performs the LZ77 stage of compression, looking for matches.
type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

// An Encoder encodes the data in its final format.
type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

// A Writer uses MatchFinder and Encoder to write compressed data to Dest.
type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
}

Example

Here is an example program that finds repititions in the Go Proverbs, and prints them with human-readable backreferences.

package main

import (
	"fmt"

	"github.com/andybalholm/pack"
	"github.com/andybalholm/pack/flate"
)

const proverbs = `Don't communicate by sharing memory, share memory by communicating.
Concurrency is not parallelism.
Channels orchestrate; mutexes serialize.
The bigger the interface, the weaker the abstraction.
Make the zero value useful.
interface{} says nothing.
Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.
A little copying is better than a little dependency.
Syscall must always be guarded with build tags.
Cgo must always be guarded with build tags.
Cgo is not Go.
With the unsafe package there are no guarantees.
Clear is better than clever.
Reflection is never clear.
Errors are values.
Don't just check errors, handle them gracefully.
Design the architecture, name the components, document the details.
Documentation is for users.
Don't panic.`

func main() {
	var mf flate.DualHash
	var e pack.TextEncoder

	b := []byte(proverbs)
	matches := mf.FindMatches(nil, b)
	out := e.Encode(nil, b, matches, true)

	fmt.Printf("%s\n", out)
}

Implementations

The brotli, flate, and snappy directories contain implementations of Encoder and MatchFinder based on github.com/andybalholm/brotli, compress/flate, and github.com/golang/snappy.

The flate implementation, with BestSpeed, is faster than compress/flate. This demonstrates that although there is some overhead due to using the Pack interfaces, it isn’t so much that you can’t get reasonable compression performance.

Similar Resources

mtail - extract internal monitoring data from application logs for collection into a timeseries database

 mtail - extract internal monitoring data from application logs for collection into a timeseries database

mtail - extract internal monitoring data from application logs for collection into a timeseries database mtail is a tool for extracting metrics from a

Dec 29, 2022

Go Huobi Market Price Data Monitor

Go Huobi Market Price Data Monitor

火币(Huobi)价格监控 由于部分交易对火币官方未提供价格监控,因此写了个小程序,长期屯币党可以用它来提醒各种现货价格。 该工具只需要提前安装Go环境和Redis即可。 消息推送使用的「钉钉」,需要提前配置好钉钉机器人(企业群类型、带webhook的机器人)。 使用方法 下载本项目 拷贝根目录下

Oct 13, 2022

Beta tool to normalize Orbit member data

Orbit Normalize Member Data Thanks for checking out my handy tool to work with Orbit's api. Everything is written in go and will continue to be update

Sep 16, 2021

Secure logger in Go to avoid output sensitive data in log

Secure logger in Go to avoid output sensitive data in log

zlog A main distinct feature of zlog is secure logging that avoid to output secret/sensitive values to log. The feature reduce risk to store secret va

Dec 6, 2022

Go-bqstreamer - Stream data into Google BigQuery concurrently using InsertAll()

Kik and me (@oryband) are no longer maintaining this repository. Thanks for all the contributions. You are welcome to fork and continue development. B

Dec 7, 2022

Implements a deep pretty printer for Go data structures to aid in debugging

spew Spew implements a deep pretty printer for Go data structures to aid in debugging. A comprehensive suite of tests with 100% test coverage is provi

Dec 25, 2022

Time based rotating file writer

cronowriter This is a simple file writer that it writes message to the specified format path. The file path is constructed based on current date and t

Dec 29, 2022

CoLog is a prefix-based leveled execution log for Go

CoLog is a prefix-based leveled execution log for Go

What's CoLog? CoLog is a prefix-based leveled execution log for Go. It's heavily inspired by Logrus and aims to offer similar features by parsing the

Dec 14, 2022

rtop is an interactive, remote system monitoring tool based on SSH

rtop rtop is a remote system monitor. It connects over SSH to a remote system and displays vital system metrics (CPU, disk, memory, network). No speci

Dec 30, 2022
Implements a deep pretty printer for Go data structures to aid in debugging

go-spew Go-spew implements a deep pretty printer for Go data structures to aid in debugging. A comprehensive suite of tests with 100% test coverage is

Jan 9, 2023
A flexible process data collection, metrics, monitoring, instrumentation, and tracing client library for Go
A flexible process data collection, metrics, monitoring, instrumentation, and tracing client library for Go

Package monkit is a flexible code instrumenting and data collection library. See documentation at https://godoc.org/gopkg.in/spacemonkeygo/monkit.v3 S

Dec 14, 2022
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

The open-source platform for monitoring and observability. Grafana allows you to query, visualize, alert on and understand your metrics no matter wher

Jan 3, 2023
Open source framework for processing, monitoring, and alerting on time series data

Kapacitor Open source framework for processing, monitoring, and alerting on time series data Installation Kapacitor has two binaries: kapacitor – a CL

Dec 26, 2022
Golang beautify data display for Humans

Golang beautify data display for Humans English 简体中文 Usage Examples package main import ( ffmt "gopkg.in/ffmt.v1" ) func main() { example() } typ

Dec 22, 2022
pprof is a tool for visualization and analysis of profiling data

Introduction pprof is a tool for visualization and analysis of profiling data. pprof reads a collection of profiling samples in profile.proto format a

Jan 8, 2023
Litter is a pretty printer library for Go data structures to aid in debugging and testing.

Litter Litter is a pretty printer library for Go data structures to aid in debugging and testing. Litter is provided by Sanity: The Headless CMS Const

Dec 28, 2022
Very simple charts with some debug data for Go programs
Very simple charts with some debug data for Go programs

debugcharts Go memory debug charts. This package uses Plotly chart library. It is open source and free for use. Installation go get -v -u github.com/m

Dec 14, 2022
Visualise Go program GC trace data in real time

This project is no longer maintained I'm sorry but I do not have the bandwidth to maintain this tool. Please do not send issues or PRs. Thank you. gcv

Dec 14, 2022
Parametrized JSON logging library in Golang which lets you obfuscate sensitive data and marshal any kind of content.
Parametrized JSON logging library in Golang which lets you obfuscate sensitive data and marshal any kind of content.

Noodlog Summary Noodlog is a Golang JSON parametrized and highly configurable logging library. It allows you to: print go structs as JSON messages; pr

Oct 27, 2022