indexing library for Go

Bluge

modern text indexing in Go - blugelabs.com

Features

  • Supported field types:
    • Text, Numeric, Date, Geo Point
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
  • BM25 Similarity/Scoring with pluggable interfaces
  • Search result match highlighting
  • Extendable Aggregations:
    • Bucketing
      • Terms
      • Numeric Range
      • Date Range
    • Metrics
      • Min/Max/Count/Sum
      • Avg/Weighted Avg
      • Cardinality Estimation (HyperLogLog++)
      • Quantile Approximation (T-Digest)

Indexing

    // path is the directory where the index files will be stored
    config := bluge.DefaultConfig(path)
    writer, err := bluge.OpenWriter(config)
    if err != nil {
        log.Fatalf("error opening writer: %v", err)
    }
    defer writer.Close()

    doc := bluge.NewDocument("example").
        AddField(bluge.NewTextField("name", "bluge"))

    err = writer.Update(doc.ID(), doc)
    if err != nil {
        log.Fatalf("error updating document: %v", err)
    }

Querying

    reader, err := writer.Reader()
    if err != nil {
        log.Fatalf("error getting index reader: %v", err)
    }
    defer reader.Close()

    query := bluge.NewMatchQuery("bluge").SetField("name")
    request := bluge.NewTopNSearch(10, query).
        WithStandardAggregations()
    documentMatchIterator, err := reader.Search(context.Background(), request)
    if err != nil {
        log.Fatalf("error executing search: %v", err)
    }
    match, err := documentMatchIterator.Next()
    for err == nil && match != nil {
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })
        if err != nil {
            log.Fatalf("error loading stored fields: %v", err)
        }
        match, err = documentMatchIterator.Next()
    }
    if err != nil {
        log.Fatalf("error iterating document matches: %v", err)
    }

License

Apache License Version 2.0

Comments
  • Install fails due to willf/bitset

    $ go get -u github.com/blugelabs/bluge
    go get: github.com/willf/bitset@v… updating to
    	github.com/willf/bitset@v…: parsing go.mod:
    	module declares its path as: github.com/bits-and-blooms/bitset
    	        but was required as: github.com/willf/bitset
    
  • Fix possible runtime panic on DecRef call

    The following error was hit when porting from bleve to bluge:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x17000ce]
    
    goroutine 16 [running]:
    github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0004b59e0, 0x10ca11a, 0x8000000000000000)
    	/gocode/pkg/mod/github.com/blugelabs/bluge@v…/index/segment_plugin.go:113 +0xae
    github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc0005a0780, 0x3, 0x8ffd3)
    	/gocode/pkg/mod/github.com/blugelabs/bluge@v…/index/snapshot.go:77 +0xb3
    github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
    	/gocode/pkg/mod/github.com/blugelabs/bluge@v…/index/snapshot.go:89
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc000069200, 0xc000606660, 0xc0000402a0)
    	/gocode/pkg/mod/github.com/blugelabs/bluge@v…/index/merge.go:75 +0x4c5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	/gocode/pkg/mod/github.com/blugelabs/bluge@v…/index/writer.go:131 +0x8cd
    

    It seems possible that https://github.com/blugelabs/bluge/blob/fe1f453e701a72cb3ee01b13b8a5e5f6e2b0f6cc/index/writer.go#L523 could also cause a panic, since I don't see any other place where segmentWrapper is initialized. It looks like there is a path where err is nil and the returned closer is also nil.

    I am using the bluge.InMemoryOnlyConfig.
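    The failure mode described here, a reference counter whose closer may be nil, can be guarded defensively. The sketch below uses a hypothetical `refCounter` type loosely modeled on the `closeOnLastRefCounter` named in the stack trace; it is not Bluge's code or its actual fix, only an illustration of checking for a nil `io.Closer` before calling `Close` when the count reaches zero.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"
)

// refCounter is a hypothetical stand-in for a segment reference counter that
// closes its underlying resource when the last reference is released.
type refCounter struct {
	refs   int64     // first field, so 64-bit atomics stay aligned on 32-bit platforms
	closer io.Closer // may be nil, e.g. for purely in-memory segments
}

func (r *refCounter) AddRef() { atomic.AddInt64(&r.refs, 1) }

// DecRef releases one reference. When the count reaches zero it closes the
// resource, but only if there is one, so a nil closer no longer panics.
func (r *refCounter) DecRef() error {
	if atomic.AddInt64(&r.refs, -1) == 0 && r.closer != nil {
		return r.closer.Close()
	}
	return nil
}

// countingCloser records how many times Close was called.
type countingCloser struct{ closed int }

func (c *countingCloser) Close() error { c.closed++; return nil }

func main() {
	// In-memory case: no closer at all; DecRef must not dereference nil.
	mem := &refCounter{}
	mem.AddRef()
	fmt.Println(mem.DecRef())

	// File-backed case: closed exactly once, when the last reference goes away.
	cc := &countingCloser{}
	file := &refCounter{closer: cc}
	file.AddRef()
	file.AddRef()
	_ = file.DecRef()
	_ = file.DecRef()
	fmt.Println(cc.closed)
}
```

    Note that `r.closer != nil` does not catch a nil pointer of a concrete type stored in the interface; such a value compares non-nil, so the guard only helps when the closer is truly unset.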

  • bleve vs bluge question

    Hello,

    I hope you don't mind me asking these questions. :-)

    My understanding is that bluge is the replacement for bleve. Could you let me know why you chose to stop development of bleve and start bluge (sorry if this explanation exists elsewhere, I haven't been able to find it). How is the design or implementation of bluge an improvement over bleve? I am asking as a long time user of Lucene, and wanting a performant Go replacement for certain projects. I would appreciate you sharing some of the design direction regarding bluge, and perhaps the use cases where bluge is/will be an improvement over bleve.

    constructively, :-) Glen

  • panic while merging in unit test

    panic: runtime error: slice bounds out of range [:1153] with capacity 1152
    
    goroutine 95 [running]:
    github.com/blugelabs/ice/v2.(*Segment).copyStoredDocs(0xc000b40780, 0x200, 0xc000fc0000, 0x7d0, 0x7d0, 0xc000fad3f0, 0x0, 0x1)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:787 +0x745
    github.com/blugelabs/ice/v2.mergeStoredAndRemap(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0xc0006b88d0, 0xc0006b88a0, 0x3, 0x3, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:659 +0x853
    github.com/blugelabs/ice/v2.mergeToWriter(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x401, 0xc000621f60, 0xc000798fc0, 0xc000621f40, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:130 +0x21c
    github.com/blugelabs/ice/v2.mergeSegmentBasesWriter(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x9c2580, 0xc0009fe300, 0x401, 0xc000798fc0, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:96 +0x15a
    github.com/blugelabs/ice/v2.merge(0xc0006b8870, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x9c2580, 0xc0009fe300, 0xc000798fc0, 0x0, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:85 +0x1d4
    github.com/blugelabs/ice/v2.(*Merger).WriteTo(0xc000cd4910, 0x9c2cc0, 0xc0009d8110, 0xc000798fc0, 0x9c61e0, 0xc000044950, 0x0)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/v2@v…/merge.go:48 +0x191
    github.com/blugelabs/bluge/index.(*FileSystemDirectory).Persist(0xc0000ba780, 0x943751, 0x4, 0xd, 0x1d124eabf58, 0xc000cd4910, 0xc000798fc0, 0x9c4e60, 0xc000cd4910)
    	D:/a/bluge/bluge/index/directory_fs.go:125 +0x2d5
    github.com/blugelabs/bluge/index.(*Writer).merge(0xc0002f4480, 0xc0006b8870, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0xd, 0x3, 0xc0009fe100, ...)
    	D:/a/bluge/bluge/index/merge.go:368 +0x22b
    github.com/blugelabs/bluge/index.(*Writer).executeMergeTask(0xc0002f4480, 0xc0007990e0, 0xc000621f00, 0xc000cd48c0, 0xc000621ea0)
    	D:/a/bluge/bluge/index/merge.go:144 +0x88b
    github.com/blugelabs/bluge/index.(*Writer).planMergeAtSnapshot(0xc0002f4480, 0xc0007990e0, 0xc0000fa980, 0xa, 0x4c4b40, 0x4024000000000000, 0xa, 0x7d0, 0x4000000000000000, 0x0, ...)
    	D:/a/bluge/bluge/index/merge.go:118 +0x47c
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc0002f4480, 0xc0007990e0, 0xc000895380)
    	D:/a/bluge/bluge/index/merge.go:56 +0x4e5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	D:/a/bluge/bluge/index/writer.go:131 +0xf8e
    FAIL	github.com/blugelabs/bluge	1.570s
    
  • Exposed sloppiness parameter in Query & Searcher

    Changes

    • Exposed the slop parameter in (Multi/Match)PhraseQuery
    • Added a factory method, MultiPhraseSearcher, that takes slop as a parameter
    • Added tests for the above
    • In integration tests, added a utility function that creates match results, to reduce verbosity
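    Phrase slop loosens the adjacency requirement of a phrase match. As a hedged illustration only, the sketch below implements a simplified, in-order definition: the phrase terms must appear in order, and the total number of extra positions between consecutive terms must be at most `slop`. Lucene-style slop is more permissive (it also tolerates reordering, at a cost), and Bluge's searcher may differ; the function name `matchesWithSlop` and its signature are hypothetical.

```go
package main

import "fmt"

// matchesWithSlop reports whether the phrase terms, whose occurrence
// positions in a document are given by positions[i] for the i-th phrase
// term, can be matched in order with at most slop extra positions in total.
func matchesWithSlop(positions [][]int, slop int) bool {
	if len(positions) == 0 {
		return false
	}
	// search tries to place term number `term` after position prev,
	// spending at most budget extra positions on the remaining gaps.
	var search func(term, prev, budget int) bool
	search = func(term, prev, budget int) bool {
		if term == len(positions) {
			return true
		}
		for _, p := range positions[term] {
			gap := p - prev - 1 // 0 when the term directly follows the previous one
			if gap >= 0 && gap <= budget && search(term+1, p, budget-gap) {
				return true
			}
		}
		return false
	}
	for _, start := range positions[0] {
		if search(1, start, slop) {
			return true
		}
	}
	return false
}

func main() {
	// Document "quick brown lazy fox": quick=0, brown=1, lazy=2, fox=3.
	// For the phrase "quick fox", fox sits 2 positions past an exact match.
	phrase := [][]int{{0}, {3}}
	fmt.Println(matchesWithSlop(phrase, 0)) // false: exact phrase required
	fmt.Println(matchesWithSlop(phrase, 2)) // true: two extra positions allowed
}
```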
  • Made in-memory directory abstraction thread-safe, fixes #41

    Added a RW lock to avoid concurrent writes to segment map.

    I couldn't create a reliable test for this. I was hitting this race condition roughly every third run and haven't hit it since the fix.

    (I am already in the AUTHORS file)
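    Guarding a shared map with a read/write lock, as described above, is a standard Go pattern. A minimal sketch, assuming a hypothetical `memDirectory` type that maps segment IDs to byte slices (the type and method names are illustrative, not Bluge's actual ones):

```go
package main

import (
	"fmt"
	"sync"
)

// memDirectory is a hypothetical in-memory store of segments keyed by ID.
type memDirectory struct {
	mu       sync.RWMutex
	segments map[uint64][]byte
}

func newMemDirectory() *memDirectory {
	return &memDirectory{segments: make(map[uint64][]byte)}
}

// Persist stores a segment. Writers take the exclusive lock, because
// concurrent map writes are a fatal error in Go.
func (d *memDirectory) Persist(id uint64, data []byte) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.segments[id] = data
}

// Load reads a segment. Readers share the lock, so lookups can run in
// parallel with each other but never concurrently with a write.
func (d *memDirectory) Load(id uint64) ([]byte, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	data, ok := d.segments[id]
	return data, ok
}

func main() {
	d := newMemDirectory()
	var wg sync.WaitGroup
	for i := uint64(0); i < 100; i++ {
		wg.Add(1)
		go func(id uint64) {
			defer wg.Done()
			d.Persist(id, []byte{byte(id)})
			_, _ = d.Load(id)
		}(i)
	}
	wg.Wait() // all writes happen-before this point, so reading len is safe
	fmt.Println(len(d.segments))
}
```

    Since the race was hard to reproduce reliably, running such code under Go's race detector (`go test -race` or `go run -race`) is a useful complement: it reports unsynchronized map access observed during a run even when the program does not crash.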

  • Question: is it possible to add additional values like recency to the score?

    I would like to configure how search results are scored based on other additional parameters that are not direct matches. For example I would want the recency of a document to be weighted into the score so they appear in front of other documents that may satisfy the query but are older and thus not that relevant. Is that possible or even the right thing to do to achieve the intended result?

    Another use case: I have a like-to-dislike ratio, and I would like that ratio to affect the score/ranking while still maintaining searchability.

    I hope this is the right place to ask; I wasn't sure.

  • quick fix to sort issue

    It has been observed that in some cases the computed sort value for a DocumentMatch becomes corrupted. The problem has been traced back to the doc values uncompressed slices, but for now a quick fix is proposed: copy the bytes associated with the sort key, ensuring that no other doc values operations can corrupt them.
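    The underlying hazard is Go slice aliasing: a sort key that merely references a reused decode buffer changes when that buffer is overwritten for the next document. The sketch below, with hypothetical helpers `aliasKey` and `copyKey`, shows why copying the bytes protects the key; it illustrates the idea rather than reproducing the patch.

```go
package main

import "fmt"

// aliasKey returns a sort key that shares memory with the decode buffer.
func aliasKey(buf []byte) []byte { return buf }

// copyKey returns a sort key with its own backing array, as in the quick fix.
func copyKey(buf []byte) []byte { return append([]byte(nil), buf...) }

func main() {
	// A hypothetical decode buffer that is reused for every document.
	buf := []byte("apple")
	aliased, copied := aliasKey(buf), copyKey(buf)

	// The next document's value is decoded into the same buffer.
	copy(buf, "zebra")

	fmt.Println(string(aliased)) // zebra: the first document's key was corrupted
	fmt.Println(string(copied))  // apple: the copied key is unaffected
}
```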

  • In memory directory causes nil pointer dereference

    The in-memory directory implementation causes a nil pointer dereference because the Load method returns a nil closer.

    Stack trace:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x114c2ee]
    
    goroutine 12 [running]:
    github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0002e3ec0, 0x109145a, 0x8000000000000000)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/bluge@v…/index/segment_plugin.go:113 +0xae
    github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc00049e200, 0x3, 0x1bd995)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/bluge@v…/index/snapshot.go:77 +0xb3
    github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/bluge@v…/index/snapshot.go:89
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc00003cd80, 0xc0002ec9c0, 0xc0000503c0)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/bluge@v…/index/merge.go:75 +0x4c5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/bluge@v…/index/writer.go:131 +0x8cd
    exit status 2
    
  • Why does a numeric field store so many tokens?

    I created a document that includes a field Year with the value 1896. Then I checked the doc values of the field, and the result looks like this:

    ice docvalues 000000000008.seg 2 Year
    Year 0x2001404e68000000000000  | 1896.000000 <nil>
    Year 0x240c0476400000000000  | 1896.000000 <nil>
    Year 0x286027340000000000  | 1896.000000 <nil>
    Year 0x2c06023b2000000000  | 1896.000000 <nil>
    Year 0x3030135a00000000  | 1896.000000 <nil>
    Year 0x3403011d50000000  | 1896.000000 <nil>
    Year 0x3818096d000000  | 1896.000000 <nil>
    Year 0x3c01404e680000  | 1896.000000 <nil>
    Year 0x400c04764000  | 1896.000000 <nil>
    Year 0x4460273400  | 1896.000000 <nil>
    Year 0x4806023b20  | 1896.000000 <nil>
    Year 0x4c30135a  | 1896.000000 <nil>
    Year 0x5003011d  | 1856.000000 <nil>
    Year 0x541809  | 1024.000000 <nil>
    Year 0x580140  | 2.000000 <nil>
    Year 0x5c0c  | 2.000000 <nil>
    

    (The "| 1896.000000" part is something I added for debugging.)

    There are 16 values for this field, and I don't understand why numeric fields are designed like this.

    It also uses a lot of space to store this field, even though it's just a single number.

  • How to convert []byte to float?

    As I understand it, the way to get document field values is VisitStoredFields. The callback receives each value as []byte, so text fields are simply converted via string(value). But how do I convert numeric (float64) fields?

    I tried

    func Float64frombytes(bytes []byte) float64 {
    	bits := binary.BigEndian.Uint32(bytes)
    	float := math.Float32frombits(bits)
    	return float64(float)
    }
    

    and

    func Float64frombytes(bytes []byte) float64 {
    	bits := binary.BigEndian.Uint64(bytes)
    	float := math.Float64frombits(bits)
    	return float
    }
    

    Neither of them works correctly. I also tried LittleEndian, still with no result.
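    Assuming the stored value of a numeric field uses the same prefix-coded, full-precision form as the first (shift 0) term in the doc-values dump in the question about numeric tokens, neither a raw IEEE 754 read nor a byte-order swap can work: the 7-bit chunks must be reassembled and the sign and ordering transforms undone first. Below is a standalone sketch of that decoding with a hypothetical function name. Bluge appears to export its own helper for this (`bluge.DecodeNumericFloat64`); check the package documentation for your version before reimplementing it.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// decodeNumericFloat64 reverses the prefix coding: read the shift from the
// header byte, reassemble the 7-bit chunks, undo the shift and sign flip,
// then map the sortable int64 back to a float64.
func decodeNumericFloat64(b []byte) (float64, error) {
	if len(b) < 2 || b[0] < 0x20 || b[0] > 0x20+63 {
		return 0, errors.New("not a prefix-coded int64")
	}
	shift := uint(b[0] - 0x20)
	var bits uint64
	for _, c := range b[1:] {
		bits = bits<<7 | uint64(c&0x7f)
	}
	i := int64((bits << shift) ^ 0x8000000000000000)
	if i < 0 {
		i ^= 0x7fffffffffffffff // undo the ordering transform for negatives
	}
	return math.Float64frombits(uint64(i)), nil
}

func main() {
	// Full-precision term for 1896, taken from the doc-values dump.
	stored := []byte{0x20, 0x01, 0x40, 0x4e, 0x68, 0, 0, 0, 0, 0, 0}
	f, err := decodeNumericFloat64(stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(f) // 1896
}
```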

  • Support illumos and Solaris

    This fixes a build failure in recent versions of Grafana, e.g.:

    # github.com/blugelabs/bluge/index/lock
    ../.gopath/pkg/mod/github.com/blugelabs/bluge@v…/index/lock/lock.go:33:9: undefined: open
    ../.gopath/pkg/mod/github.com/blugelabs/bluge@v…/index/lock/lock.go:37:9: undefined: open
    ../.gopath/pkg/mod/github.com/blugelabs/bluge@v…/index/lock/lock.go:49:11: e.unlock undefined (type *DefaultLockedFile has no field or method unlock)
    

    I've tested this patch against grafana 9.2.4 and it now builds correctly.

  • Is there a way to use this library more as a caching layer?

    Is there a way to use this library more as a caching layer (preferably using mmap, so it is not limited by RAM)? For example, I already store data in an external database, and I would like to use this library as a cache with custom handlers for both eviction and restoration.

  • Indexing/Analyzing URLs, Email Addresses, etc?

    I've been trying to figure out how to index/analyze things like:

    I don't think any of the built-in analyzers will index these strings properly. Is that right?

  • Sorting by ascending order of _score

    I think I've found a bug.

    Calling SortBy([]string{"_score"}) doesn't seem to actually give me the results in ascending order of score.

    Is this supposed to work (according to the docs)? I'm happy to write a reproducer to confirm it and file a bug report.

  • Difference between a NewTextField() and NewKeywordField()

    Can you give some explanation (and add some doc strings) as to the differences between the different field types?

    Near as I can tell:

    • Keyword Field: single word, case sensitive
    • Text Field: multiple words split by whitespace, case insensitive

    But I'm not completely sure, and "read the source" just doesn't cut it, sorry 😅

    Thanks! 🙏
