Bloomfilter - Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

Important: Zeroth, consider if a Cuckoo filter could be right for your use-case.

GoDoc travis

Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

Copyright © 2014-2016,2018 Barry Allard

MIT license

WTF is a bloom filter

**TL;DR: **Probabilistic, extra lookup table to track a set of elements kept elsewhere to reduce expensive, unnecessary set element retrieval and/or iterator operations when an element is not present in the set. It's a classic time-storage tradeoff algoritm.

Properties

See wikipedia for algorithm details

Impact What Description
Good No false negatives know for certain if a given element is definitely NOT in the set
Bad False positives uncertain if a given element is in the set
Bad Theoretical potential for hash collisions in very large systems and/or badly hash.Hash64-conforming implementations
Bad Add only Cannot remove an element, it would destroy information about other elements
Good Constant storage uses only a fixed amount of memory

Naming conventions

(Similar to algorithm)

Variable/function Description Range
m/M() number of bits in the bloom filter (memory representation is about m/8 bytes in size) >=2
n/N() number of elements present >=0
k/K() number of keys to use (keys are kept private to user code but are de/serialized to Marshal and file I/O) >=0
maxN maximum capacity of intended structure >0
p maximum allowed probability of collision (for computing m and k for optimal sizing) >0..<1
  • Memory representation should be exactly 24 + 8*(k + (m+63)/64) + unsafe.Sizeof(RWMutex) bytes.
  • Serialized (BinaryMarshaler) representation should be exactly 72 + 8*(k + (m+63)/64) bytes. (Disk format is less due to compression.)

Binary serialization format

All values in Little-endian format

Offset Offset (Hex) Length (bytes) Name Type
0 00 8 k uint64
8 08 8 n uint64
16 10 8 m uint64
24 18 k (keys) [k]uint64
24+8*k ... (m+63)/64 (bloom filter) [(m+63)/64]uint64
24+8*k+8*((m+63)/64) ... 48 (SHA384 of all previous fields, hashed in order) [48]byte
  • bloomfilter.Filter conforms to encoding.BinaryMarshaler and `encoding.BinaryUnmarshaler'

Usage

import "github.com/steakknife/bloomfilter"

const (
  maxElements = 100000
  probCollide = 0.0000001
)

bf, err := bloomfilter.NewOptimal(maxElements, probCollide)
if err != nil {
  panic(err)
}

someValue := ... // must conform to hash.Hash64

bf.Add(someValue)
if bf.Contains(someValue) { // probably true, could be false
  // whatever
}

anotherValue := ... // must also conform to hash.Hash64

if bf.Contains(anotherValue) {
  panic("This should never happen")
}

err := bf.WriteFile("1.bf.gz")  // saves this BF to a file
if err != nil {
  panic(err)
}

bf2, err := bloomfilter.ReadFile("1.bf.gz") // read the BF to another var
if err != nil {
  panic(err)
}

Design

Where possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch-misses.

Get

go get -u github.com/steakknife/bloomfilter  # master is always stable

Source

Contact

License

MIT license

Copyright © 2014-2016 Barry Allard

Owner
Barry Allard
Card shark. Con job. Boot cut. SRE/PE GPG Key Fingerprint: 10E3 3BC9 999F 6752 3EF2 24ED 14CA C196 A122 026C
Barry Allard
Similar Resources

A Golang lock-free thread-safe HashMap optimized for fastest read access.

hashmap Overview A Golang lock-free thread-safe HashMap optimized for fastest read access. Usage Set a value for a key in the map: m := &HashMap{} m.S

Dec 30, 2022

Go package implementing Bloom filters

Bloom filters A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item

Dec 30, 2022

Go package implementing Bloom filters

Bloom filters A Bloom filter is a concise/compressed representation of a set, where the main requirement is to make membership queries; i.e., whether

Dec 28, 2022

Sync distributed sets using bloom filters

goSetReconciliation An implementation to sync distributed sets using bloom filters. Based on the paper "Low complexity set reconciliation using Bloom

Jan 4, 2022

A Connected Graph Generator tool that construct graphs of some given size

graph graph is a Connected Graph Generator tool that construct graphs of some given size. Notice that it generates all possible connected, undirected

Nov 5, 2021

dagger is a fast, concurrency safe, mutable, in-memory directed graph library with zero dependencies

dagger is a fast, concurrency safe, mutable, in-memory directed graph library with zero dependencies

dagger is a blazing fast, concurrency safe, mutable, in-memory directed graph implementation with zero dependencies

Dec 19, 2022

Null Types, Safe primitive type conversion and fetching value from complex structures.

Typ Typ is a library providing a powerful interface to impressive user experience with conversion and fetching data from built-in types in Golang Feat

Sep 26, 2022

Type-safe, zero-allocation sets for Go

Set Package set is a type-safe, zero-allocation port of the excellent package fatih/set. It contains sets for most of the basic types and you can gene

Jan 5, 2023

Go native library for fast point tracking and K-Nearest queries

Geo Index Geo Index library Overview Splits the earth surface in a grid. At each cell we can store data, such as list of points, count of points, etc.

Dec 3, 2022
Comments
  • Fix condition in `Filter.Contains`

    Fix condition in `Filter.Contains`

    By construction of a Bloom filter, all the bit positions of an item must be 1 if the item is in the set. The current code assumes that an item might be in the set if any of its bit positions are 1, ending up in a filter that is still correct but vastly less specific.

  • Fix formula for OptimalM

    Fix formula for OptimalM

    As I mentioned in #6, the formula for OptimalM is wrong by a set of parenthesis. This understated the value of K and M, resulting in filters that were significantly less specific than expected.

  • Fix binary marshal/unmarshal

    Fix binary marshal/unmarshal

    Thanks for a complete library. I fixed some issues I ran into using the marshalling to files feature:

    • Using correct raw stream in ReadFrom
    • Store length of bits and keys in serialized format
    • Add basic test coverage
go/golang: fast bit set Bloom filter

package implements a fast bloom filter with real 'bitset' and JSONMarshal/JSONUnmarshal to store/reload the Bloom filter.

Nov 3, 2022
A Go implementation of an in-memory bloom filter, with support for boltdb and badgerdb as optional data persistent storage.

Sprout A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set. Bloom filters are fast and space

Jul 4, 2022
Cuckoo Filter: Practically Better Than Bloom

Cuckoo Filter Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient

Jan 4, 2023
A simple Bloom Filter implementation in Go

This is a simple Bloom filter implementation written in Go. For the theory behind Bloom filters, read http://en.wikipedia.org/wiki/Bloom_filter This

Apr 26, 2018
Leaked password check library with bloom filter

Leaked password check With this library you can check the password is probably leaked or not. Pre generated bitset DB includes 6 Million leaked passwo

Aug 24, 2022
A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set

A Go implementation of a bloom filter, with support for boltdb and badgerdb as optional in-memory persistent storage.

Jul 4, 2022
Bloomfilter written in Golang.

go-bloom-filter BloomFilter written in Golang. Implementations This repository contains several bloomfilter implementations that you can use to solve

Nov 8, 2021
A simple and efficient thread-safe sharded hashmap for Go

shardmap A simple and efficient thread-safe sharded hashmap for Go. This is an alternative to the standard Go map and sync.Map, and is optimized for w

Dec 17, 2022
A thread safe map which has expiring key-value pairs

~ timedmap ~ A map which has expiring key-value pairs. go get github.com/zekroTJA/timedmap Intro This package allows to set values to a map which will

Dec 29, 2022
go.fifo provides a simple fifo thread-safe queue for the Go programming language

go.fifo Description go.fifo provides a simple FIFO thread-safe queue. *fifo.Queue supports pushing an item at the end with Add(), and popping an item

Aug 29, 2022