A Go implementation of the core algorithm in paper

Boolean Expression Indexer Go library

A Go implementation of the core algorithm in paper <Indexing Boolean Expression>, which already supports the following features mentioned in paper:

  • DNF algorithm
  • Simple match
  • TopN match (ranking based)

usage

import 
	"github.com/csimplestring/bool-expr-indexer/api/dnf/expr"
	"github.com/csimplestring/bool-expr-indexer/api/dnf/indexer"
	"github.com/stretchr/testify/assert"
)

    	k := indexer.NewMemIndexer()

	k.Add(expr.NewConjunction(
		1,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{1}},
			{Name: "state", Values: []string{"NY"}, Contains: true, Weights: []uint32{40}},
		},
	))

	k.Add(expr.NewConjunction(
		2,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{1}},
			{Name: "gender", Values: []string{"F"}, Contains: true, Weights: []uint32{3}},
		},
	))

	k.Add(expr.NewConjunction(
		3,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{2}},
			{Name: "gender", Values: []string{"M"}, Contains: true, Weights: []uint32{5}},
			{Name: "state", Values: []string{"CA"}, Contains: false, Weights: []uint32{0}},
		},
	))

	k.Add(expr.NewConjunction(
		4,
		[]*expr.Attribute{
			{Name: "state", Values: []string{"CA"}, Contains: true, Weights: []uint32{15}},
			{Name: "gender", Values: []string{"M"}, Contains: true, Weights: []uint32{9}},
		},
	))

	k.Add(expr.NewConjunction(
		5,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3", "4"}, Contains: true, Weights: []uint32{1, 5}},
		},
	))

	k.Add(expr.NewConjunction(
		6,
		[]*expr.Attribute{
			{Name: "state", Values: []string{"CA", "NY"}, Contains: false, Weights: []uint32{0, 0}},
		},
	))

	k.Build()

    	matcher := &allMatcher{}

	matched := matcher.Match(k, expr.Assignment{
		expr.Label{Name: "age", Value: "3"},
		expr.Label{Name: "state", Value: "CA"},
		expr.Label{Name: "gender", Value: "M"},
	})

	assert.ElementsMatch(t, []int{4, 5}, matched)

	matched = matcher.Match(k, expr.Assignment{
		expr.Label{Name: "age", Value: "3"},
		expr.Label{Name: "state", Value: "NY"},
		expr.Label{Name: "gender", Value: "F"},
	})

	assert.ElementsMatch(t, []int{1, 2, 5}, matched)

see matcher_test.go

Use Scenario

It is an important component in online computational advertising system. As mentioned in the paper, the original motivation of this library is to provide an efficient way to select ads based on pre-defined targeting condition. Given a user with some labels, it finds the best-matched ads.

In a typical advertising management system, the following hierarchy models are used:

Advertisers
|__ LineItem 
    |__ InsertionOrder 
        |__ Campaign
            |__ Creative Banners With Targeting Condition

Ad selector in RTB

In the RTB or similar environment, when a bidding request comes, usually it comes with some user-profile context(in the paper, it is called assignment). The Ad server then needs to quickly find the matched creative banners that match the targeting condition(in the paper, it is expressed as conjunctions). A naive way is to iterate all the banners and compare the user-profile with targeting condition one-by-one, which is very slow when there are millions of banners in the system, however the whole RTB workflow must be done with 100 ms.

User segmentation in DMP

In a DMP environment, all the collected user data shall be processed, enriched and aggregated in nearly real-time. One key problem is to quickly identify a user belongs to which group, based on pre-defined group condition.

This library is developed for the above scenarios. The following features are supported

  • DNF algorithm

    For example: a targeting condition can be expressed as: (age = 30 AND city = A) OR (gender = male AND age = 20 AND region = B) etc

  • Simple match

    It returns all the matched expression ids

  • TopN match (ranking based)

    It returns the TopN matched expression ids, based on an adopted WAND algorithm. Note that the caller has to provide the upper bound and score of each conjunction, usually can be calculated in a Spark job.

Benchmark

Total number of expressions - size of key operations within 1s op per ns
Benchmark_Match_10000_20-12 106482 10142 ns/op
Benchmark_Match_100000_20-12 70419 14749 ns/op
Benchmark_Match_1000000_20-12 14438 87884 ns/op
Benchmark_Match_10000_30-12 80452 15832 ns/op
Benchmark_Match_100000_30-12 61867 20770 ns/op
Benchmark_Match_1000000_30-12 10000 103594 ns/op
Benchmark_Match_10000_40-12 61221 20778 ns/op
Benchmark_Match_100000_40-12 49161 25129 ns/op
Benchmark_Match_1000000_40-12 10000 110132 ns/op

Memory usage: 1 million of expressions are indexed and it takes 100 MB on average.

Roadmap

This library only implements the core DNF algorithm in that paper. However, it is more useful to use it to build up a production-ready Ad Server. Currently, the indexer does not support online update(all the expressions shall be indexed once in the beginning). Also, the speed will be slow if the expressions number increases. Therefore I plan to support more advanced features soon

  • online index update
  • metrics monitoring
  • index partitioning
  • canary rollout deployment
  • HA support
  • http/gRPC transport
  • web UI

PR issues are welcome

Similar Resources

记录算法学习和LeetCode、LintCode、codewars的学习路程。A record of algorithm learning.

Problem List Leetcode、LintCode、Codewars Algorithm problem solution written by golang. LeetCode id Name(Github) Name(Gitee) 00001 TwoSum TwoSum 00003 L

Nov 3, 2021

The simplest sorting algorithm that sorts in quadratic time

The simplest sorting algorithm that sorts in quadratic time

Simplest sort Showcases the simplest sorting algorithm that works in quadratic time. From here. The pseudocode for this algo can be seen below (sorts

Oct 14, 2022

Is this the simplest (and most surprising) sorting algorithm ever?

Is this the simplest sorting algorithm

May 15, 2022

Smartsort - A smart sorting algorithm for Go to sort filename containing digits that is not zero padded

smartsort A smart sorting algorithm for Go to sort filename containing digits th

Mar 26, 2022

Randomly generated tile maps using Oskar Stålberg's wave function collapse algorithm

Randomly generated tile maps using Oskar Stålberg's wave function collapse algorithm

go-wfc Procedurally-generated tile maps using wave function collapse. Demos Live demo (wasm): https://zfedoran.github.io/go-wfc-example/ Live algorith

Dec 26, 2022

Go implementation of Count-Min-Log

Count-Min-Log Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier TL;DR: Count-Min-Log Sketch

Jan 4, 2023

A Go implementation of the Elias-Fano encoding

go-ef A Go implementation of the Elias-Fano encoding Example package main import ( "fmt" "github.com/amallia/go-ef" "os" ) func main() {

Nov 23, 2022

Set is a useful collection but there is no built-in implementation in Go lang.

goset Set is a useful collection but there is no built-in implementation in Go lang. Why? The only one pkg which provides set operations now is golang

Sep 26, 2022

A skip list implementation in Go

About This is a library implementing skip lists for the Go programming language (http://golang.org/). Skip lists are a data structure that can be used

Sep 21, 2022
Comments
  • 这里是否有问题?

    这里是否有问题?

    https://github.com/csimplestring/bool-expr-indexer/blob/95e021f67204f13a0a77f440a18cbc31125518cb/api/dnf/tools/bench.go#L67

    您好,在这里生成assignment的时候,if flase,value = ""才对吧,按照1%flase,其他均为true来看,如果这里没问题,那最后生成的assignment会有99%的attributes中value = “”。后续match的时候,取出来的postingLists中可能不会有任何元素,因为hashkey为name + “:”+value。

Document Indexing and Searching Library in Go

Fehrist Fehrist is a pure Go library for indexing different types of documents. Currently it supports only CSV and JSON but flexible architecture give

May 22, 2022
Package for indexing zip files and storing a compressed index

zipindex zipindex provides a size optimized representation of a zip file to allow decompressing the file without reading the zip file index. It will o

Nov 30, 2022
A Go implementation of the 64-bit xxHash algorithm (XXH64)

xxhash xxhash is a Go implementation of the 64-bit xxHash algorithm, XXH64. This is a high-quality hashing algorithm that is much faster than anything

Dec 22, 2022
Implementation of Boyer-Moore fast string search algorithm in Go

boyermoore Implementation of Boyer-Moore fast string search algorithm in Go

Oct 7, 2022
An interesting go struct tag expression syntax for field validation, etc.

An interesting go struct tag expression syntax for field validation, etc.

Jan 8, 2023
Exp-tree: go library for parsing expression tree

Vinshop expression tree Exp-tree is go library for parsing expression tree Installation go get -u github.com/vinshop/exp-tree Quick start Format Expre

May 11, 2022
parody of some of the basic python core features (collections package)

collections import "github.com/marcsantiago/collections" Overview Index Subdirectories Overview Index func StringEncoder(encoder *bytes.Buffer, data D

Jan 14, 2022
Data structure and algorithm library for go, designed to provide functions similar to C++ STL

GoSTL English | 简体中文 Introduction GoSTL is a data structure and algorithm library for go, designed to provide functions similar to C++ STL, but more p

Dec 26, 2022
Multi-String Pattern Matching Algorithm Using TrieHashNode

Multi-String Pattern Matching algorithm. This implementation is inspired from Aho-Corasick algorithm Getting Started modelA = mspm.NewModel("mspm_mode

Dec 9, 2022
golang sorting algorithm and data construction.

data-structures-questions 算法和程序结构是我们学习编程的基础,但是很多的时候,我们很多都是只是在应用,而没有深入的去研究这些,所以自己也在不断的思考和探索,然后分析,学习,总结自己学习的过程,希望可以和大家一起学习和交流下算法! 目录 网络协议 数据结构 算法 数据库 Go

Dec 17, 2022