A modern text indexing library for go

bleve bleve

Tests Coverage Status GoDoc Join the chat at https://gitter.im/blevesearch/bleve codebeat Go Report Card Sourcegraph License

modern text indexing in go - blevesearch.com

Features

  • Index any go data structure (including JSON)
  • Intelligent defaults backed up by powerful configuration
  • Supported field types:
    • Text, Numeric, Date
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
    • Simple query syntax for human entry
  • tf-idf Scoring
  • Search result match highlighting
  • Supports Aggregating Facets:
    • Terms Facet
    • Numeric Range Facet
    • Date Range Facet

Discussion

Discuss usage and development of bleve in the google group.

Indexing

message := struct{
	Id   string
	From string
	Body string
}{
	Id:   "example",
	From: "[email protected]",
	Body: "bleve indexing is easy",
}

mapping := bleve.NewIndexMapping()
index, err := bleve.New("example.bleve", mapping)
if err != nil {
	panic(err)
}
index.Index(message.Id, message)

Querying

index, _ := bleve.Open("example.bleve")
query := bleve.NewQueryStringQuery("bleve")
searchRequest := bleve.NewSearchRequest(query)
searchResult, _ := index.Search(searchRequest)

License

Apache License Version 2.0

Owner
bleve
modern text indexing for go - supported and sponsored by Couchbase
bleve
Comments
  • please support len、has、not operators

    please support len、has、not operators

    please support len、has operators

    eg: search name length > 10 len(name)>10

    // Files.Path is array len(Files.Path)>1

    // Identify that the current doc contains the addr field has(addr)

    not has(addr)

  • Improving rollback effectiveness

    Improving rollback effectiveness

    • Essentially, this PR aims to improve the existing rollback effectiveness by collecting the possible rollback points which are spread across a configurable time interval.
    • WIP
  • AsciiFoldingFilter should support vulgar fractions

    AsciiFoldingFilter should support vulgar fractions

    Unicode range 0x00BC to 0x00BE (1/4, 1/2, 3/4) and Unicode range 0x2150 to 0x215E (various other fractions), as well as 0x2189 (zero thirds) show up in actual text with some frequency.

    It would be helpful if AsciiFoldingFilter supported these, probably with an expansion to a pair of simple ASCII integers (0-9) separated by slash/solidus ('/', ASCII 2F).

    See https://github.com/unicode-org/icu/blob/2dce62892b6b00df02195a35ca09ebd8e082e80f/icu4c/source/data/translit/Latin_ASCII.txt for reference.

  • slow query

    slow query

    query slow, If you use him as a search engine, there is still a lot of room for optimization image

    Suggest optimization

    I don't know if there is a way to improve performance Has nothing to do with the number of hits, or increase the cluster, distributed design

  • please support Inclusion and exclusion of fields

    please support Inclusion and exclusion of fields

    Inclusion and exclusion of fields E.g I want to check tags:leak but no field: passwd doc, do not know how to write

    Two docs are now indexed 1

    tags: leak
    xxx
    ...
    }
    
    1. There is no tags (field)
    {
    name: leak
    xxx
    ...
    }
    

    Now I want to be able to query the doc without tags, probably like this

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS

Jan 4, 2023
:book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech

Jan 4, 2023
👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike
👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

?? The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

Dec 25, 2022
Fonetic is a library to assess pronounceablility of a given text

fonetic-go assess pronounciblity of text Introduction Fonetic is a library to assess pronounceablility of a given text. For more information, check ou

Oct 21, 2022
Parse placeholder and wildcard text commands

allot allot is a small Golang library to match and parse commands with pre-defined strings. For example use allot to define a list of commands your CL

Nov 24, 2022
Guess the natural language of a text in Go

guesslanguage This is a Go version of python guess-language. guesslanguage provides a simple way to detect the natural language of unicode string and

Dec 26, 2022
Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.
Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.

Tagify Gets STDIN, file or HTTP address as an input and returns a list of most popular words ordered by popularity as an output. More info about what

Dec 19, 2022
Extract urls from text

xurls Extract urls from text using regular expressions. Requires Go 1.13 or later. import "mvdan.cc/xurls/v2" func main() { rxRelaxed := xurls.Relax

Jan 7, 2023
Easy AWK-style text processing in Go

awk Description awk is a package for the Go programming language that provides an AWK-style text processing capability. The package facilitates splitt

Jul 25, 2022
Change the color of console text.

go-colortext package This is a package to change the color of the text and background in the console, working both under Windows and other systems. Un

Oct 26, 2022
Templating system for HTML and other text documents - go implementation

FAQ What is Kasia.go? Kasia.go is a Go implementation of the Kasia templating system. Kasia is primarily designed for HTML, but you can use it for any

Mar 15, 2022
Package sanitize provides functions for sanitizing text in golang strings.

sanitize Package sanitize provides functions to sanitize html and paths with go (golang). FUNCTIONS sanitize.Accents(s string) string Accents replaces

Dec 5, 2022
Small and fast FTS (full text search)

Microfts A small full text indexing and search tool focusing on speed and space. Initial tests seem to indicate that the database takes about twice as

Jul 30, 2022
text to speech bot for discord
text to speech bot for discord

text to speech bot for discord

Oct 1, 2022
A diff3 text merge implementation in Go

Diff3 A diff3 text merge implementation in Go based on the awesome paper below. "A Formal Investigation of Diff3" by Sanjeev Khanna, Keshav Kunal, and

Nov 5, 2022
gomtch - find text even if it doesn't want to be found

gomtch - find text even if it doesn't want to be found Do your users have clever ways to hide some terms from you? Sometimes it is hard to find forbid

Sep 28, 2022
Unified text diffing in Go (copy of the internal diffing packages the officlal Go language server uses)

gotextdiff - unified text diffing in Go This is a copy of the Go text diffing packages that the official Go language server gopls uses internally to g

Dec 26, 2022
Convert scanned image PDF file to text annotated PDF file
Convert scanned image PDF file to text annotated PDF file

Jisui (自炊) This tool is PoC (Proof of Concept). Jisui is a helper tool to create e-book. Ordinary the scanned book have not text information, so you c

Dec 11, 2022
Paranoid text spacing in Go (Golang)

pangu.go Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Oct 15, 2022