A k-mer serialization package for Golang

.uniq v5

Go Reference

This package provides k-mer serialization methods for the package kmers, TaxIds of k-mers are optionally saved, while there's no frequency information.

This package is used in project unikmer and kmcp.

Details

K-mers (represented in uint64 in RAM ) are serialized in 8-Byte (or less Bytes for shorter k-mers in compact format, or much less Bytes for sorted k-mers) arrays and optionally compressed in gzip format with extension of .unik.

TaxIds are optionally stored next to k-mers with 4 or less bytes.

Compression rate comparison

No TaxIds stored in this test.

cr.jpg

label encoded-kmera gzip-compressedb compact-formatc sortedd comment
plain plain text
gzip gzipped plain text
unik.default gzipped encoded k-mers in fixed-length byte array
unik.compat gzipped encoded k-mers in shorter fixed-length byte array
unik.sorted gzipped sorted encoded k-mers
  • a One k-mer is encoded as uint64 and serialized in 8 Bytes.
  • b K-mers file is compressed in gzip format by default, users can switch on global option -C/--no-compress to output non-compressed file.
  • c One k-mer is encoded as uint64 and serialized in 8 Bytes by default. However few Bytes are needed for short k-mers, e.g., 4 Bytes are enough for 15-mers (30 bits). This makes the file more compact with smaller file size, controled by global option -c/--compact .
  • d One k-mer is encoded as uint64, all k-mers are sorted and compressed using varint-GB algorithm.
  • In all test, flag --canonical is ON when running unikmer count.

License

MIT License

History

This package was originally maintained in unikmer.

The magic number of binary format is still .unikmer for keeping compatibility.

Owner
Wei Shen
Bioinformatician, postdoc
Wei Shen
Similar Resources

golang struct 或其他对象向 []byte 的序列化或反序列化

bytecodec 字节流编解码 这个库实现 struct 或其他对象向 []byte 的序列化或反序列化 可以帮助你在编写 tcp 服务,或者需要操作字节流时,简化数据的组包、解包 这个库的组织逻辑 copy 借鉴了标准库 encoding/json 🙏 安装 使用 go get 安装最新版本

Jun 30, 2022

Dynamically Generates Ysoserial's Payload by Golang

Dynamically Generates Ysoserial's Payload by Golang

Gososerial 介绍 ysoserial是java反序列化安全方面著名的工具 无需java环境,无需下载ysoserial.jar文件 输入命令直接获得payload,方便编写安全工具 目前已支持CC1-CC7,K1-K4和CB1链 Introduce Ysoserial is a well-

Jul 10, 2022

Some Golang types based on builtin. Implements interfaces Value / Scan and MarshalJSON / UnmarshalJSON for simple working with database NULL-values and Base64 encoding / decoding.

gotypes Some simple types based on builtin Golang types that implement interfaces for working with DB (Scan / Value) and JSON (Marshal / Unmarshal). N

Feb 12, 2022

gogoprotobuf is a fork of golang/protobuf with extra code generation features.

GoGo Protobuf looking for new ownership Protocol Buffers for Go with Gadgets gogoprotobuf is a fork of golang/protobuf with extra code generation feat

Nov 26, 2021

generic sort for slices in golang

slices generic sort for slices in golang basic API func BinarySearch[E constraints.Ordered](list []E, x E) int func IsSorted[E constraints.Ordered](li

Nov 3, 2022

Bitbank-trezor - Bitbank trezor with golang

Bitbank - Trezor (c) 2022 Bernd Fix [email protected] Y bitbank-trezor is fre

Jan 27, 2022

binary serialization format

Colfer Colfer is a binary serialization format optimized for speed and size. The project's compiler colf(1) generates source code from schema definiti

Dec 25, 2022

binary serialization format

Colfer Colfer is a binary serialization format optimized for speed and size. The project's compiler colf(1) generates source code from schema definiti

Dec 25, 2022

Benchmarks of Go serialization methods

Benchmarks of Go serialization methods This is a test suite for benchmarking various Go serialization methods. Tested serialization methods encoding/g

Jan 1, 2023

faster JSON serialization for Go

ffjson: faster JSON for Go ffjson generates static MarshalJSON and UnmarshalJSON functions for structures in Go. The generated functions reduce the re

Dec 25, 2022

Implementation of Ethernet 2, 802.1Q, 802.1P, 802.11 (Wireless Ethernet) frame serialization/deserialization

Implementation of Ethernet 2, 802.1Q, 802.1P, 802.11 (Wireless Ethernet) frame serialization/deserialization

Feb 5, 2022

A Go (golang) package that enhances the standard database/sql package by providing powerful data retrieval methods as well as DB-agnostic query building capabilities.

ozzo-dbx Summary Description Requirements Installation Supported Databases Getting Started Connecting to Database Executing Queries Binding Parameters

Dec 31, 2022

Go Package Manager (gopm) is a package manager and build tool for Go.

🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 🚨 In favor of Go Modules Proxy since Go 1.11, this pr

Dec 14, 2022

Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types.

Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types. Read th

Dec 27, 2022

Go package providing simple database and server interfaces for the CSV files produced by the sfomuseum/go-libraryofcongress package

Go package providing simple database and server interfaces for the CSV files produced by the sfomuseum/go-libraryofcongress package

go-libraryofcongress-database Go package providing simple database and server interfaces for the CSV files produced by the sfomuseum/go-libraryofcongr

Oct 29, 2021

A package for running subprocesses in Go, similar to Python's subprocesses package.

A package for running subprocesses in Go, similar to Python's subprocesses package.

Jul 28, 2022

Utility to restrict which package is allowed to import another package.

go-import-rules Utility to restrict which package is allowed to import another package. This tool will read import-rules.yaml or import-rules.yml in t

Jan 7, 2022

A package for running subprocesses in Go, similar to Python's subprocesses package.

Subprocesses Spawn subprocesses in Go. Sanitized mode package main import ( "log" "github.com/estebangarcia21/subprocess" ) func main() { s := s

Jul 28, 2022
Go package for dealing with maps, slices, JSON and other data.

Objx Objx - Go package for dealing with maps, slices, JSON and other data. Get started: Install Objx with one line of code, or update it with another

Dec 27, 2022
Asn.1 BER and DER encoding library for golang.

WARNING This repo has been archived! NO further developement will be made in the foreseen future. asn1 -- import "github.com/PromonLogicalis/asn1" Pac

Nov 14, 2022
auto-generate capnproto schema from your golang source files. Depends on go-capnproto-1.0 at https://github.com/glycerine/go-capnproto

bambam: auto-generate capnproto schema from your golang source files. Adding capnproto serialization to an existing Go project used to mean writing a

Sep 27, 2022
Golang binary decoder for mapping data into the structure

binstruct Golang binary decoder to structure Install go get -u github.com/ghostiam/binstruct Examples ZIP decoder PNG decoder Use For struct From file

Dec 17, 2022
CBOR RFC 7049 (Go/Golang) - safe & fast with standard API + toarray & keyasint, CBOR tags, float64/32/16, fuzz tested.
CBOR RFC 7049 (Go/Golang) - safe & fast with standard API + toarray & keyasint, CBOR tags, float64/32/16, fuzz tested.

CBOR library in Go fxamacker/cbor is a CBOR encoder & decoder in Go. It has a standard API, CBOR tags, options for duplicate map keys, float64→32→16,

Jan 6, 2023
csvutil provides fast and idiomatic mapping between CSV and Go (golang) values.
csvutil provides fast and idiomatic mapping between CSV and Go (golang) values.

csvutil Package csvutil provides fast and idiomatic mapping between CSV and Go (golang) values. This package does not provide a CSV parser itself, it

Jan 6, 2023
Fixed width file parser (encoder/decoder) in GO (golang)

Fixed width file parser (encoder/decoder) for GO (golang) This library is using to parse fixed-width table data like: Name Address

Sep 27, 2022
Fast implementation of base58 encoding on golang.

Fast Implementation of Base58 encoding Fast implementation of base58 encoding in Go. Base algorithm is adapted from https://github.com/trezor/trezor-c

Dec 9, 2022
msgpack.org[Go] MessagePack encoding for Golang

MessagePack encoding for Golang ❤️ Uptrace.dev - All-in-one tool to optimize performance and monitor errors & logs Join Discord to ask questions. Docu

Dec 28, 2022
Encode and decode Go (golang) struct types via protocol buffers.

protostructure protostructure is a Go library for encoding and decoding a struct type over the wire. This library is useful when you want to send arbi

Nov 15, 2022