succinct provides several static succinct data types

succinct

Travis test

Report card Coverage Status

GoDoc PkgGoDev Sourcegraph

succinct provides several static succinct data types

Succinct Set

中文介绍: 100行代码的压缩前缀树: 50% smaller

Set is a succinct, sorted and static string set impl with compacted trie as storage. The space cost is about half lower than the original data.

Synopsis

package succinct

import "fmt"

func ExampleNewSet() {
	keys := []string{
		"A", "Aani", "Aaron", "Aaronic", "Aaronical", "Aaronite",
		"Aaronitic", "Aaru", "Ab", "Ababdeh", "Ababua", "Abadite",
	}
	s := NewSet(keys)
	for _, k := range []string{"Aani", "Foo", "Ababdeh"} {
		found := s.Has(k)
		fmt.Printf("lookup %10s, found: %v\n", k, found)
	}

	// Output:
	//
	// lookup       Aani, found: true
	// lookup        Foo, found: false
	// lookup    Ababdeh, found: true
}

Performance

  • 200 kilo real-world words collected from web:

    • the space costs is 57% of original data size.
    • And a Has() costs about 350 ns with a zip-f workload.

    Original size: 2204 KB

    With comparison with string array bsearch and google btree :

    Data Engine Size(KB) Size/original ns/op
    200kweb2 bsearch 5890 267% 229
    200kweb2 succinct.Set 1258 57% 356
    200kweb2 btree 12191 553% 483

    A string in go has two fields: a pointer to the text content and a length. Thus the space overhead is quite high with small strings. btree internally has more pointers and indirections(interface).

  • 870 kilo real-world ipv4:

    • the space costs is 67% of original data size.
    • And a Has() costs about 500 ns with a zip-f workload.

    Original size: 6823 KB

    Data Engine Size(KB) Size/original ns/op
    870k_ip4_hex bsearch 17057 500% 276
    870k_ip4_hex succinct.Set 2316 67% 496
    870k_ip4_hex btree 40388 1183% 577

Implementation

It stores sorted strings in a compacted trie(AKA prefix tree). A trie node has at most 256 outgoing labels. A label is just a single byte. E.g., [ab, abc, abcd, axy, buv] is represented with a trie like the following: (Numbers are node id)

^ -a-> 1 -b-> 3 $
  |      |      `c-> 6 $
  |      |             `d-> 9 $
  |      `x-> 4 -y-> 7 $
  `b-> 2 -u-> 5 -v-> 8 $

Internally it uses a packed []byte and a bitmap with len([]byte) bits to describe the outgoing labels of a node,:

^: ab  00
1: bx  00
2: u   0
3: c   0
4: y   0
5: v   0
6: d   0
7: ø
8: ø
9: ø

In storage it packs labels together and bitmaps joined with separator 1:

labels(ignore space): "ab bx u c y v d"
label bitmap:          0010010101010101111

In this way every node has a 0 pointing to it(except the root node) and has a corresponding 1 for it:

                               .-----.
                       .--.    | .---|-.
                       |.-|--. | | .-|-|.
                       || ↓  ↓ | | | ↓ ↓↓
labels(ignore space):  ab bx u c y v d øøø
label bitmap:          0010010101010101111
node-id:               0  1  2 3 4 5 6 789
                          || | ↑ ↑ ↑ |   ↑
                          || `-|-|-' `---'
                          |`---|-'
                          `----'

To walk from a parent node along a label to a child node, count the number of 0 upto the bit the label position, then find where the the corresponding 1 is:

childNodeId = select1(rank0(i))

In our impl, it is:

nodeId = countZeros(ss.labelBitmap, ss.ranks, bmIdx+1)
bmIdx = selectIthOne(ss.labelBitmap, ss.ranks, ss.selects, nodeId-1) + 1

Finally leaf nodes are indicated by another bitmap leaves, in which a 1 at i-th bit indicates the i-th node is a leaf:

leaves: 0001001111

License

This project is licensed under the MIT License - see the LICENSE file for details.

Similar Resources

go.fifo provides a simple fifo thread-safe queue for the Go programming language

go.fifo Description go.fifo provides a simple FIFO thread-safe queue. *fifo.Queue supports pushing an item at the end with Add(), and popping an item

Aug 29, 2022

SliceX provides functional operations on Go slices using Go 1.18 type parameters.

SliceX provides functional operations on Go slices using Go 1.18 type parameters.

Nov 6, 2021

A tree like tool help you to explore data structures in your redis server

 A tree like tool help you to explore data structures in your redis server

Redis-view is a tree like tool help you explore data structures in your redis server

Mar 17, 2022

Bitset data structure

Your basic bit Set data structure for positive numbers A bit array, or bit set, is an efficient set data structure. It consists of an array that compa

Dec 29, 2022

Probabilistic set data structure

Probabilistic set data structure

Your basic Bloom filter Golang probabilistic set data structure A Bloom filter is a fast and space-efficient probabilistic data structure used to test

Dec 15, 2022

Probabilistic data structures for processing continuous, unbounded streams.

Boom Filters Boom Filters are probabilistic data structures for processing continuous, unbounded streams. This includes Stable Bloom Filters, Scalable

Dec 30, 2022

Data structure and algorithm library for go, designed to provide functions similar to C++ STL

GoSTL English | 简体中文 Introduction GoSTL is a data structure and algorithm library for go, designed to provide functions similar to C++ STL, but more p

Dec 26, 2022

Gota: DataFrames and data wrangling in Go (Golang)

Gota: DataFrames, Series and Data Wrangling for Go This is an implementation of DataFrames, Series and data wrangling methods for the Go programming l

Jan 6, 2023
Nullable Go types that can be marshalled/unmarshalled to/from JSON.

Nullable Go types Description This package provides nullable Go types for bool, float64, int64, int32, string and time.Time replacing sql.NullString,

Dec 12, 2022
Null Types, Safe primitive type conversion and fetching value from complex structures.

Typ Typ is a library providing a powerful interface to impressive user experience with conversion and fetching data from built-in types in Golang Feat

Sep 26, 2022
Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types.

Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types. Read th

Dec 27, 2022
Go strcut based enum, support all types.

go-ultra-enum go-ultra-enum is an enum generator for Go. It is inspired by the powerful enum types found in Java. go-ultra-enum has the following capa

Dec 21, 2021
Custom generic HTTP handler providing automatic JSON decoding/encoding of HTTP request/response to your concrete types

gap Custom generic HTTP handler providing automatic JSON decoding/encoding of HTTP request/response to your concrete types. gap.Wrap allows to use the

Aug 28, 2022
Generic types that are missing from Go, including sets, trees, sorted lists, etc.

go-typ Generic types that are missing from Go, including sets, trees, sorted lists, etc. All code is implemented with 0 dependencies and in pure Go co

Dec 4, 2022
BTree provides a simple, ordered, in-memory data structure for Go programs.

BTree implementation for Go This package provides an in-memory B-Tree implementation for Go, useful as an ordered, mutable data structure. The API is

Dec 30, 2022
Provides conversion from athena outputs to strongly-typed data models.
Provides conversion from athena outputs to strongly-typed data models.

Provides conversion from athena outputs to strongly defined data models. Getting started Given the following data struct you define: type MyModel stru

Feb 7, 2022
Graphoscope: a solution to access multiple independent data sources from a common UI and show data relations as a graph
Graphoscope: a solution to access multiple independent data sources from a common UI and show data relations as a graph

Graphoscope A solution to access multiple independent data sources from a common UI and show data relations as a graph: Contains a list of by default

May 26, 2022
Package ring provides a high performance and thread safe Go implementation of a bloom filter.

ring - high performance bloom filter Package ring provides a high performance and thread safe Go implementation of a bloom filter. Usage Please see th

Nov 20, 2022