Adaptive Radix Trees implemented in Go

An Adaptive Radix Tree Implementation in Go

Build Status Coverage Status Go Report Card GoDoc

This library provides a Go implementation of the Adaptive Radix Tree (ART).

Features:

  • Lookup performance surpasses highly tuned alternatives
  • Support for highly efficient insertions and deletions
  • Space efficient
  • Performance is comparable to hash tables
  • Maintains the data in sorted order, which enables additional operations like range scan and prefix lookup
  • O(k) search/insert/delete operations, where k is the length of the key
  • Minimum / Maximum value lookups
  • Ordered iteration
  • Prefix-based iteration
  • Support for keys with null bytes, any byte array could be a key

Usage

package main

import (
    "fmt"
    "github.com/plar/go-adaptive-radix-tree"
)

func main() {

    tree := art.New()

    tree.Insert(art.Key("Hi, I'm Key"), "Nice to meet you, I'm Value")
    value, found := tree.Search(art.Key("Hi, I'm Key"))
    if found {
        fmt.Printf("Search value=%v\n", value)
    }

    tree.ForEach(func(node art.Node) bool {
        fmt.Printf("Callback value=%v\n", node.Value())
        return true
    })

    for it := tree.Iterator(); it.HasNext(); {
        value, _ := it.Next()
        fmt.Printf("Iterator value=%v\n", value.Value())
    }
}

// Output:
// Search value=Nice to meet you, I'm Value
// Callback value=Nice to meet you, I'm Value
// Iterator value=Nice to meet you, I'm Value

Documentation

Check out the documentation on godoc.org

Performance

plar/go-adaptive-radix-tree outperforms kellydunn/go-art by avoiding memory allocations during search operations. It also provides prefix based iteration over the tree.

Benchmarks were performed on datasets extracted from different projects:

  • The "Words" dataset contains a list of 235,886 english words. [2]
  • The "UUIDs" dataset contains 100,000 uuids. [2]
  • The "HSK Words" dataset contains 4,995 words. [4]
go-adaptive-radix-tree # Average time Bytes per operation Allocs per operation
Tree Insert Words 9 117,888,698 ns/op 37,942,744 B/op 1,214,541 allocs/op
Tree Search Words 26 44,555,608 ns/op 0 B/op 0 allocs/op
Tree Insert UUIDs 18 59,360,135 ns/op 18,375,723 B/op 485,057 allocs/op
Tree Search UUIDs 54 21,265,931 ns/op 0 B/op 0 allocs/op
go-art
Tree Insert Words 5 272,047,975 ns/op 81,628,987 B/op 2,547,316 allocs/op
Tree Search Words 10 129,011,177 ns/op 13,272,278 B/op 1,659,033 allocs/op
Tree Insert UUIDs 10 140,309,246 ns/op 33,678,160 B/op 874,561 allocs/op
Tree Search UUIDs 20 82,120,943 ns/op 3,883,131 B/op 485,391 allocs/op

To see more benchmarks just run

$ make benchmark

References

[1] The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases (Specification)

[2] C99 implementation of the Adaptive Radix Tree

[3] Another Adaptive Radix Tree implementation in Go

[4] HSK Words. HSK(Hanyu Shuiping Kaoshi) - Standardized test of Standard Mandarin Chinese proficiency.

Comments
  • Range Scan?

    Range Scan?

    is there support for doing range scans?

    For example:

    start := []byte("foo_10")
    end := []byte("foo_20")
    trie.ForEachRange(start, end, func(node art.Node) bool {
        // do something with node
        return true
    })
    

    I think only prefix iteration is support currently? Would it be hard to add range iteration?

  • Optimization: proposal for reducing memory usage

    Optimization: proposal for reducing memory usage

    • as the base part of node, numChildren and prefixLen could be modified to uint8 to save 6 bytes each node except leaf;

    type node struct { numChildren int prefixLen int prefix prefix }

    • as the Kind type, int is also unnecessary, uint8 is enough too, this optimization will save 3 bytes for each art-node type Kind int
  • Support for deletion of node inside of ForEach

    Support for deletion of node inside of ForEach

    Hello,

    I've been working with @prologic to try to bring bitcask up to version 1.0.

    One of the things we've been struggling with is allowing a user to iterate over all keys in the tree and allow deletion by value, rather than by key (i.e. delete a key from within the ForEach callback function). The current implementation causes the ForEach to skip a node that would have been iterated over at some point after the deleted node (sometimes the next node, sometimes the one after that, etc.)

    Would you consider letting us modify the iterator to pull the "next" value before the callback so that no nodes get skipped? Or would this be better done in a fork?

    Sorry if my ask here is not clear, let me know if you need any further clarification.

  • possible to limit this to

    possible to limit this to

    1. possible to make this into an lru cache with fixed size? so that it can automatically remove items based on fifo

    2. how does this compare with patricia trie? https://github.com/tchap/go-patricia any benchmarks on speed and mem storage size used against patricia?

    great work by the way

  • Feature: Storage persistence

    Feature: Storage persistence

    Would it be possible to use something like flatbuffers or cap'n'proto as backing for the tree which would allow it to be persisted into disk, hence it would be loadable from disk to memory without the need to rebuild it?

  • Keys don't support null bytes

    Keys don't support null bytes

    Great job on building this library!

    I was looking at the example in dump_tree.go and puzzled by the use of null byte "keys" for leaf nodes (since there is no extra byte in their key to pivot on). I'm not sure how that can result in a correct outcome.

    I may still be missing something but this example shows why I think it's wrong unless it's a conscious decision not to allow null bytes in keys which isn't documented.

    package main
    
    import (
    	"fmt"
    
    	art "github.com/plar/go-adaptive-radix-tree"
    )
    
    func main() {
    	tree := art.New()
    	terms := []string{"A", "a", "aa", "aa\x00"}
    	for _, term := range terms {
    		tree.Insert(art.Key(term), term)
    	}
    	//fmt.Println(tree)
    
    	// Should find "aa"
    	fmt.Println(tree.Search(art.Key("aa")))
    	// Should find "aa\x00"
    	fmt.Println(tree.Search(art.Key("aa\x00")))
    
    	// Expected Output (note the null byte doesn't print):
    	// aa true
    	// aa true
    	//
    	// Actual Output:
    	// aa true
    	// <nil> false
    
    	// Dump output shows both the leaf "aa" stored with "key" of 0 as well as the
    	// child with key 0:
    	// ...
    	//    │   ├── Node4 (0xc00000e2c0)
    	//    │   │   prefix(0): [0 0 0 0 0 0 0 0 0 0] [··········]
    	//    │   │   keys: [0 0 97 0] [··a·]
    	//    │   │   children(3): [0xc00000e280 0xc00000e2b0 0xc00000e2e0 <nil>]
    	//    │   │   ├── Leaf (0xc00000e280)
    	//    │   │   │   key: [97 97] [aa]
    	//    │   │   │   val: aa
    	//    │   │   │
    	//    │   │   ├── Leaf (0xc00000e2b0)
    	//    │   │   │   key: [97 97 0] [aa·]
    	//    │   │   │   val: aa
    	// ...
    }
    

    Perhaps I'm missing something about the design that makes this expected behaviour? If leaves are going to be stored in this way (with an implicit null "key" indicating end of key/leaf). Then it seems to rule out null bytes in keys entirely? That seems like a limitation that's at least worth documenting.

    It seems to stem from charAt returning 0 if the index is out of range of the key: https://github.com/plar/go-adaptive-radix-tree/blob/ae4b22ebc44e591df19fbc197c60b1405d52b526/node.go#L68-L70

    I wonder if this is a relic of the fact this code is based on libart - in C strings are generally null-terminated. I've not tested but it would seem that libart has a bug here too: https://github.com/armon/libart/blob/73c46356e6752cc8b0b774a5d0f4d2a6832a5ed3/src/art.c#L616

    In the case where the key being inserted is a prefix of the existing internal node, this will attempt to create a new leaf and insert it at the key[offset+prefix_diff] index in the internal node. In the case described that it's reading off the end of the array since the leaf key is a prefix of the current node which means either a random char is used from some other bit of memory, a segfault occurs, or (possibly why it's not been noticed) if the string was allocated with a null terminator then the off-by-one read just always returns a null byte which is equivalent behaviour to this Go implementation.

    I don't know how actively this library is maintained and I don't intend to use it directly but was referring to the implementation as I'm writing an immutable ART in Go and was confused by this.

    If the answer is "don't use null byte" that's cool, just wanted to make sure I understood the intention!

    Thanks!

  • multiple inserts of `nil` panics

    multiple inserts of `nil` panics

    Great library!

    Saw the following panic in my usage:

    package main
    
    import (
    	art "github.com/plar/go-adaptive-radix-tree"
    )
    
    func main() {
    	tree := art.New()
    
    	tree.Insert(art.Key("Hi, I'm Key"), "Nice to meet you, I'm Value")
    	tree.Insert(art.Key(nil), "Meet you again")
    	tree.Insert(art.Key(nil), "Meet you again") // panic
    }
    

    https://go.dev/play/p/G6tYWq0QzpJ

    And the resulting trace:

    panic: runtime error: slice bounds out of range [1:0]
    
    goroutine 1 [running]:
    github.com/plar/go-adaptive-radix-tree.(*tree).recursiveInsert(0xc00009ae60?, 0xc000106050, {0x0, 0x0, 0x0}, {0x46e6e0?, 0x496e98}, 0x1)
    	/tmp/gopath331505154/pkg/mod/github.com/plar/[email protected]/tree.go:120 +0x5d7
    github.com/plar/go-adaptive-radix-tree.(*tree).recursiveInsert(0xc00010a000?, 0xc00009af60, {0x0, 0x0, 0x0}, {0x46e6e0?, 0x496e98}, 0x0)
    	/tmp/gopath331505154/pkg/mod/github.com/plar/[email protected]/tree.go:175 +0x293
    github.com/plar/go-adaptive-radix-tree.(*tree).Insert(0xc00009af58, {0x0?, 0x404bd9?, 0x60?}, {0x46e6e0?, 0x496e98?})
    	/tmp/gopath331505154/pkg/mod/github.com/plar/[email protected]/tree.go:15 +0x48
    main.main()
    	/tmp/sandbox249177140/prog.go:12 +0xc5
    

    Looking to investigate further, the documentation hints that nil should be supported as a key.

  • what does the benchmark really mean?

    what does the benchmark really mean?

    1. insert words are all words? how many total?
    2. search words (26) what does that mean? just search for 1 word or many many words?

    | Tree Insert Words | 9 | 117,888,698 ns/op | 37,942,744 B/op | 1,214,541 allocs/op | | Tree Search Words | 26 | 44,555,608 ns/op | 0 B/op | 0 allocs/op | | Tree Insert UUIDs | 18 | 59,360,135 ns/op | 18,375,723 B/op | 485,057 allocs/op | | Tree Search UUIDs | 54 | 21,265,931 ns/op | 0 B/op | 0 allocs/op |

  • Wildcard support

    Wildcard support

    ART trees are useful for using with HTTP Routers, but this implementation lacks wildcards.

    http.Router.Register("foo/{bar}/quz", "custom value")
    

    That above should match a path like foo/good-day/quz. I could have made a fork which implements wildcards adding a node exclusively for the wildcards, and adding an member to identify which branch is a wildcard to fallback if none matches, but there is too much verbose code I don't know what the fuck what does what and why.

  • Possible mistake in

    Possible mistake in "type node48 struct "

    In node.go, will be keys 48 bytes long?

    type node48 struct {
    	node
    	children [node48Max]*artNode
    	// keys     [node256Max]byte
    	keys    [node48Max]byte
    	present [4]uint64 // need 256 bits for keys
    }
    
  • SIMD support?

    SIMD support?

    Does this library use SIMD instructions as per the original ART paper? It seems Go does in fact support SIMD instructions; See for example this package

A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

radixs A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees. This implemen

Feb 14, 2022
A generic Go library for implementations of tries (radix trees), state commitments and proofs of inclusion

trie.go Go library for implementations of tries (radix trees), state commitments and proof of inclusion for large data sets. It implements a generic 2

Aug 3, 2022
A radix sorting library for Go (golang)

zermelo A radix sorting library for Go. Trade memory for speed! import "github.com/shawnsmithdev/zermelo" func foo(large []uint64) zermelo.Sort(l

Jul 30, 2022
An immutable radix tree implementation in Golang

go-immutable-radix Provides the iradix package that implements an immutable radix tree. The package only provides a single Tree implementation, optimi

Dec 29, 2022
A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Dec 23, 2022
A go implementation of Verkle trees

go-verkle A very experimental implementation of Verkle trees. When production-ready, the code is to be split between go-kzg and go-ethereum. Notes Nod

Jan 8, 2023
Generic types that are missing from Go, including sets, trees, sorted lists, etc.

go-typ Generic types that are missing from Go, including sets, trees, sorted lists, etc. All code is implemented with 0 dependencies and in pure Go co

Dec 4, 2022
High-performance minimalist queue implemented using a stripped-down lock-free ringbuffer, written in Go (golang.org)

This project is no longer maintained - feel free to fork the project! gringo A high-performance minimalist queue implemented using a stripped-down loc

Oct 24, 2022
fim is a collection of some popular frequent itemset mining algorithms implemented in Go.

fim fim is a collection of some popular frequent itemset mining algorithms implemented in Go. fim contains the implementations of the following algori

Jul 14, 2022
Data structure,Algorithms implemented in Go (for education)
 Data structure,Algorithms implemented in Go (for education)

Data structure,Algorithms implemented in Go (for education) List of Content : 1. Math - 2. String - 3. Conversions - 4. Sort - 5. Search - 6. Data str

Dec 13, 2022
radix: a go radix tree with nearest matching

radix implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Feb 6, 2022
Provides the radix package that implements a radix tree.

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Oct 26, 2021
Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Dec 30, 2022
Golang in-memory database built on immutable radix trees

go-memdb Provides the memdb package that implements a simple in-memory database built on immutable radix trees. The database provides Atomicity, Consi

Jan 7, 2023
A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

radixs A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees. This implemen

Feb 14, 2022
A generic Go library for implementations of tries (radix trees), state commitments and proofs of inclusion

trie.go Go library for implementations of tries (radix trees), state commitments and proof of inclusion for large data sets. It implements a generic 2

Aug 3, 2022
A fast string sorting algorithm (MSD radix sort)
A fast string sorting algorithm (MSD radix sort)

Your basic radix sort A fast string sorting algorithm This is an optimized sorting algorithm equivalent to sort.Strings in the Go standard library. Fo

Dec 18, 2022
Pure is a fast radix-tree based HTTP router
Pure is a fast radix-tree based HTTP router

package pure Pure is a fast radix-tree based HTTP router that sticks to the native implementations of Go's "net/http" package; in essence, keeping the

Dec 1, 2022