A package for Go that can be used for range queries on large number of intervals

go-stree

go-stree is a package for Go that can be used to process a large number of intervals. The main purpose of this module is to solve the following problem: given a set of intervals, how to find all overlapping intervals at a certain point or within a certain range.

It offers three different algorithms:

  • stree: implemented as segment tree
  • serial: simple sequential algorithm, mainly for testing purposes
  • mtree: implemented as segment tree with parallel processing

All three algorithms implement the following interface:

// Main interface to access tree
type Tree interface {
  // Push new interval to stack
  Push(from, to int)
  // Push array of intervals to stack
  PushArray(from, to []int)
  // Clear the interval stack
  Clear()
  // Build segment tree out of interval stack
  BuildTree()
  // Print tree recursively to stdout
  Print()
  // Transform tree to array
  Tree2Array() []SegmentOverlap
  // Query interval
  Query(from, to int) []Interval
  // Query interval array
  QueryArray(from, to []int) []Interval
}

Installation

go get github.com/toberndo/go-stree/stree

Example

import (
  "fmt"
  "github.com/toberndo/go-stree/stree"
)

func main() {
    tree := stree.NewTree()
    tree.Push(1, 1)
    tree.Push(2, 3)
    tree.Push(5, 7)
    tree.Push(4, 6)
    tree.Push(6, 9)
    tree.BuildTree()
    fmt.Println(tree.Query(3, 5))
}

The serial algorithm resides in the same package:

import (
  "github.com/toberndo/go-stree/stree"
)

func main() {
    serial := stree.NewSerial()
}

A parallel version of the segment tree is in the sub package multi:

import (
  "github.com/toberndo/go-stree/stree/multi"
)

func main() {
    mtree := multi.NewMTree()
}

Segment tree

A segment tree is a data structure that can be used to run range queries on large sets of intervals. This is for example required to analyze data of gene sequences. The usage is as in the example above: we build a new tree object, push intervals to the data structure, build the tree and can then run certain queries on the tree. The segment tree is a static structure which means we cannot add further intervals once the tree is built. Rebuilding the tree is then required.

segment tree example

Serial

The sequential algorithm simply traverses the array of intervals to search for overlaps. It builds up a dynamic structure where intervals can be added at any time. The interface is equal to the segment tree, but tree specific methods like BuildTree(), Print() and Tree2Array() are not supported.

API

See http://go.pkgdoc.org/github.com/toberndo/go-stree

Performance

To test performance execute the following command in directories stree and multi:

go test -test.bench "." -test.cpu 4

As a short summary: the performance depends highly on the quality of the test data. Parallelism does not always improve performance, in some scenarios the stree algorithm is faster. In the optimal case mtree version with parallel support performs 20% better on a dual core machine than single threaded stree version. A detailed analysis can be found in this article: http://www.chasinclouds.com/2012/05/building-segment-trees-with-go.html

Licence

Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

About

written by Thomas Oberndörfer [email protected]
Blog: http://www.chasinclouds.com/
follow me on Twitter

Owner
Comments
  • tree node insertion fix

    tree node insertion fix

    The stree insertNode method has some bug. For example if we push only one inteval (1, 1) into the tree stack, the insertNodes will runs into a dead loop in the else case (len(endpoint) is always 1, center is 0)

    func (t *stree) insertNodes(endpoint []int) *node {
        var n *node
        if len(endpoint) == 2 {
            n = &node{segment: Segment{endpoint[0], endpoint[1]}}
            if endpoint[1] != t.max {
                n.left = &node{segment: Segment{endpoint[0], endpoint[1]}}
                n.right = &node{segment: Segment{endpoint[1], endpoint[1]}}
            }
        } else {
            n = &node{segment: Segment{endpoint[0], endpoint[len(endpoint)-1]}}
            center := len(endpoint) / 2
            n.left = t.insertNodes(endpoint[:center+1])
            n.right = t.insertNodes(endpoint[center:])
        }
        return n
    }
    

    Besides, the covers of left child and right child of a certain parent with no intersection may be better. which means a node with segment of (1,3) will have a left child with segment (1,2) and right child with segment (3,3), the node (1,2) has a left child and a right child too.

          (1,3)
          /  \
       (1,2)  (3,3)
       /  \
    (1,1) (2,2)
    
  • Made stree behave like a proper segment tree

    Made stree behave like a proper segment tree

    I adjusted the implementation of the segment tree to follow the algorithm as described in chapter 10.3 of Computational Geometry: Algorithms and Applications rev. 3.

    This makes it possible to successfully perform stabbing queries on the tree when there are overlapping segments.

    Note that I did not change anything in the paralel implementation (mtree).

  • Fixed unbalenced tree creation

    Fixed unbalenced tree creation

    • Removed offset of node insertion to prevent unbalanced trees
    • Removed special case for 2 remaining endpoints in insertNodes to ensure leafs get created properly

    This seems to fix corner cases where some intervals were not found. The algorithm is now more similar to the description in the Wikipedia article.

    I have little experience with segment trees as I do not have a computer science background, so it would be great to have the logic checked by someone before accepting this PR.

  • replace map with slice in Query

    replace map with slice in Query

    benchmark                            old ns/op     new ns/op     delta
    BenchmarkSimple-4                    6728          6782          +0.80%
    BenchmarkBuildTree1000-4             1108448       1131862       +2.11%
    BenchmarkBuildTree100000-4           265806935     263615709     -0.82%
    BenchmarkQueryTree-4                 80.2          7.33          -90.86%
    BenchmarkQuerySerial-4               240040        207910        -13.39%
    BenchmarkQueryTreeMax-4              73.0          7.28          -90.03%
    BenchmarkQuerySerialMax-4            241232        207559        -13.96%
    BenchmarkQueryTreeArray-4            106           103           -2.83%
    BenchmarkQuerySerialArray-4          2343672       2104598       -10.20%
    BenchmarkEndpoints100000-4           43077896      40116570      -6.87%
    BenchmarkInsertNodes100000-4         35941380      35876511      -0.18%
    BenchmarkInsertIntervals100000-4     202871438     193046612     -4.84%
    

    When most queries return empty or few results, this has a huge benefit as shown by the benchmarks (10X faster). When queries return many results we should nearly identical performance as before.

  • Inserting intervals does not work correctly

    Inserting intervals does not work correctly

    Hi

    I finally found out what the real problem is with this library. CompareTo does not report INTERSECT_OR_SUPERSET when s is a subset of other, even though this implies that they also intersect. This means the algorithm for inserting the intervals in the tree can not be implemented correctly:

    // CompareTo compares two Segments and returns: DISJOINT, SUBSET or INTERSECT_OR_SUPERSET
    func (s *Segment) CompareTo(other *Segment) int {
        if other.From > s.To || other.To < s.From {
            return DISJOINT
        }
        if other.From <= s.From && other.To >= s.To {
            return SUBSET // this means there is also an intersection
        }
        return INTERSECT_OR_SUPERSET
    }
    
    // Inserts interval into given tree structure
    func insertInterval(node *node, intrvl *Interval) {
        switch node.segment.CompareTo(&intrvl.Segment) {
        case SUBSET:
            // interval of node is a subset of the specified interval or equal
            if node.overlap == nil {
                node.overlap = make([]*Interval, 0, 10)
            }
            node.overlap = append(node.overlap, intrvl)
        case INTERSECT_OR_SUPERSET:
            // In the original algorithm we should directly check if `node.left` intersects with `intrvl`.
            // In this algorithm we check this after calling `insertInterval`. However, we never land
            // here when `s.left` is a subset of `intrvl`.
            if node.left != nil {
                insertInterval(node.left, intrvl)
            }
            // In the original algorithm we should directly check if `node.right` intersects with `intrvl`.
            // In this algorithm we check this after calling `insertInterval`. However, we never land
            // here when `s.right` is a subset of `intrvl`.
            if node.right != nil {
                insertInterval(node.right, intrvl)
            }
        case DISJOINT:
            // nothing to do
        }
    }
    

    My suggestion is to use separate function to check if an interval is a subset of another interval, and a function to check if intervals intersect. This makes it possible to implement the correct algorithm.

    On page 43 of http://www.cs.uu.nl/geobook/pseudo.pdf there is some pretty good pseudo code. This is the pseudo code used in Computational Geometry: Algorithms and Applications, that is referenced (and pretty much copy-pasted) a lot in the segment tree wiki page.

    I already tested this in my code, and it should be fairly easy to fix this in your library. Shall I create a pull request for this? If yes, could you first please merge #2?

estruct traverses javascript projects and maps all the dependencies and relationships to a JSON. the output can be used to build network visualizations of the project and document the architecture.
estruct traverses javascript projects and maps all the dependencies and relationships to a JSON. the output can be used to build network visualizations of the project and document the architecture.

EStruct traverses javascript projects and maps all the dependencies and relationships to a JSON. The output can be used to build network visualizations of the project and document the architecture.

Jan 27, 2022
Go native library for fast point tracking and K-Nearest queries

Geo Index Geo Index library Overview Splits the earth surface in a grid. At each cell we can store data, such as list of points, count of points, etc.

Dec 3, 2022
Cache Slow Database Queries
Cache Slow Database Queries

Cache Slow Database Queries This package is used to cache the results of slow database queries in memory or Redis. It can be used to cache any form of

Dec 19, 2022
Structscanner is a simple library to make going from database queries to structs easier

structscanner is a simple library to make going from database queries to structs easier, while retaining the flexibility of joins and mapping using struct tags.

Oct 17, 2022
Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types.

Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types. Read th

Dec 27, 2022
A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set

A Go implementation of a bloom filter, with support for boltdb and badgerdb as optional in-memory persistent storage.

Jul 4, 2022
Nullable Go types that can be marshalled/unmarshalled to/from JSON.

Nullable Go types Description This package provides nullable Go types for bool, float64, int64, int32, string and time.Time replacing sql.NullString,

Dec 12, 2022
A skip list of arbitrary elements that can be filtered using roaring bitmaps stored in an LRU cache

Skipfilter This package provides a data structure that combines a skiplist with a roaring bitmap cache. This library was created to efficiently filter

Aug 4, 2022
Snackbox - Snackbox can make it easier for customers to order snacks and rice boxes and do tracking
Snackbox - Snackbox can make it easier for customers to order snacks and rice boxes and do tracking

Catering Ecommerce Platform API Docs · Wireflow · Use Case Diagram · Entity Rela

Dec 5, 2022
grep utility that searches through zip,jar,ear,tgz,bz2 in any form of nesting; it can also decompile class files

rzgrep - grep for stuff in archives that are embedded within archives This is a small utility, it greps through the contents of an archive file, it al

May 10, 2022
Go package implementing bitsets

bitset Go language library to map between non-negative integers and boolean values Description Package bitset implements bitsets, a mapping between no

Jan 1, 2023
Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves.
Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves.

Hilbert Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves. Documentation available here This is not an

Dec 23, 2022
Package ring provides a high performance and thread safe Go implementation of a bloom filter.

ring - high performance bloom filter Package ring provides a high performance and thread safe Go implementation of a bloom filter. Usage Please see th

Nov 20, 2022
Go package implementing Bloom filters

Bloom filters A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item

Dec 30, 2022
Go package implementing an indexable ordered multimap

PACKAGE package skiplist import "github.com/glenn-brown/skiplist" Package skiplist implements fast indexable ordered multimaps. This sk

Jul 2, 2022
parody of some of the basic python core features (collections package)

collections import "github.com/marcsantiago/collections" Overview Index Subdirectories Overview Index func StringEncoder(encoder *bytes.Buffer, data D

Jan 14, 2022
Package mafsa implements Minimal Acyclic Finite State Automata in Go, essentially a high-speed, memory-efficient, Unicode-friendly set of strings.

MA-FSA for Go Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) with Minimal Perfect Hashing (MPH). Basically, it's a set of str

Oct 27, 2022
Package for indexing zip files and storing a compressed index

zipindex zipindex provides a size optimized representation of a zip file to allow decompressing the file without reading the zip file index. It will o

Nov 30, 2022
peanut is a Go package to write tagged data structs to disk in a variety of formats.

peanut peanut is a Go package to write tagged data structs to disk in a variety of formats. Its primary purpose is to provide a single consistent inte

Dec 16, 2022