A fast (5x) string keyed read-only map for Go - particularly good for keys using a small set of nearby runes.

faststringmap

faststringmap is a fast read-only string keyed map for Go (golang). For our use case it is approximately 5 times faster than using Go's built-in map type with a string key. It also has the following advantages:

  • look up strings and byte slices without use of the unsafe package
  • minimal impact on GC due to lack of pointers in the data structure
  • data structure can be trivially serialized to disk or network

The code provided implements a map from string to uint32 which fits our use case, but you can easily substitute other value types.

faststringmap is a variant of a data structure called a Trie. At each level we use a slice to hold the next possible byte values. This slice is of length one plus the difference between the lowest and highest possible next bytes of strings in the map. Not all the entries in the slice are valid next bytes. faststringmap is thus more space efficient for keys using a small set of nearby runes, for example those using a lot of digits.

Example

Example usage can be found in uint32_store_example_test.go.

Motivation

I created faststringmap in order to improve the speed of parsing CSV where the fields were category codes from survey data. The majority of these were numeric ("1", "2", "3"...) plus a distinct code for "not applicable". I was struck that in the simplest possible cases (e.g. "1" ... "5") the map should be a single slice lookup.

Our fast CSV parser provides fields as byte slices into the read buffer to avoid creating string objects. So I also wanted to facilitate key lookup from a []byte rather than a string. This is not possible using a built-in Go map without use of the unsafe package.

Benchmarks

Below are example benchmarks from my laptop which are for looking up every element in a map of size 1000. So approximate times are 25ns per lookup for the Go native map and 5ns per lookup for the faststringmap.

cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkUint32Store
BenchmarkUint32Store-8        	  218463	      4959 ns/op
BenchmarkGoStringToUint32
BenchmarkGoStringToUint32-8   	   49279	     24483 ns/op

Improvements

You can improve the performance further by using a slice for the next fields. This avoids a bounds check when looking up the entry for a byte. However, it comes at the cost of easy serialization and introduces a lot of pointers which will have impact on GC. It is not possible to directly construct the slice version in the same way so that the whole store is one block of memory. Either create as in this code and then derive the slice version or create distinct slice objects at each level.

Owner
The Sensible Code Company
We design and sell products that turn messy information into valuable data. Also see https://github.com/cantabular Previously: https://github.com/scraperwiki
The Sensible Code Company
Similar Resources

A typed implementation of the Go sync.Map using code generation

syncmap A typed implementation of the Go sync.Map using code generation. Install go get -u github.com/a8m/syncmap@master Examples: Using CLI $ syncma

Dec 26, 2022

Multi-String Pattern Matching Algorithm Using TrieHashNode

Multi-String Pattern Matching algorithm. This implementation is inspired from Aho-Corasick algorithm Getting Started modelA = mspm.NewModel("mspm_mode

Dec 9, 2022

A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Set Set is a simple Set data structure implementation in Go (Golang) using LinkedHashMap. This library allow you to get a set of int64 or string witho

Sep 26, 2022

Collections for Golang using generics. Currently containing Hash Set.

Collections for Golang using generics. Currently containing Hash Set.

Implementation of missing colections for Golang using Generics. Free of dependencies. Curently in early development phase. Requirements Go 1.18+ Insta

Dec 30, 2021

A thread safe map which has expiring key-value pairs

~ timedmap ~ A map which has expiring key-value pairs. go get github.com/zekroTJA/timedmap Intro This package allows to set values to a map which will

Dec 29, 2022

A prefix-enhanced map in Go

PrefixMap PrefixMap is a prefix-enhanced map that eases the retrieval of values based on key prefixes. Quick Start Creating a PrefixMap // creates the

Jun 13, 2022

💯 Go Struct and Field validation, including Cross Field, Cross Struct, Map, Slice and Array diving

Package validator implements value validations for structs and individual fields based on tags.

Nov 9, 2022

Go library for encoding native Go structures into generic map values.

wstructs origin: github.com/things-go/structs Go library for encoding native Go structures into generic map values. Installation Use go get. go ge

Jan 10, 2022

A Golang lock-free thread-safe HashMap optimized for fastest read access.

hashmap Overview A Golang lock-free thread-safe HashMap optimized for fastest read access. Usage Set a value for a key in the map: m := &HashMap{} m.S

Dec 30, 2022
Comments
  • Improve memory usage during map build

    Improve memory usage during map build

    Speed up build for large maps by allocating in chunks and then copying over to one slice at the end. This generates less garbage and involves less copying than the previous strategy of using append.

  • faststringmap is only faster when keys are shorter 7 symbols

    faststringmap is only faster when keys are shorter 7 symbols

    I was evaluating faststringmap for my use case and wrote a benchmark that compares lookup times of standard map and faststringmap. In the beginning the key length was 36 symbols(length of a UUID with dashes), the result of faststringmap was about 10 times worse than the result of standard map. So I limited the length of the keys in my benchmark to 6 and only then faststringmap became faster:

    package trie_test
    
    import (
    	"testing"
    
    	"github.com/google/uuid"
    	"github.com/sensiblecodeio/faststringmap"
    )
    
    const (
    	keysCount = 1000
    )
    
    type exampleSource map[string]uint32
    
    func (s exampleSource) AppendKeys(a []string) []string {
    	for k := range s {
    		a = append(a, k)
    	}
    	return a
    }
    
    func (s exampleSource) Get(k string) uint32 {
    	return s[k]
    }
    
    var (
    	keys    []string
    	hashMap exampleSource
    	fastMap faststringmap.Uint32Store
    )
    
    func init() {
    	keys = make([]string, keysCount)
    
    	hashMap = exampleSource{}
    
    	for i := 0; i < keysCount; i++ {
    		keys[i] = uuid.NewString()[0:6]
    		hashMap[keys[i]] = 1
    	}
    
    	fastMap = faststringmap.NewUint32Store(hashMap)
    }
    
    func BenchmarkMap(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		for _, key := range keys {
    			if v, ok := hashMap[key]; v != 1 || !ok {
    				b.Fail()
    			}
    		}
    	}
    }
    
    func BenchmarkFastStringMap(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		for _, key := range keys {
    			if v, ok := fastMap.LookupString(key); v != 1 || !ok {
    				b.Fail()
    			}
    		}
    	}
    }
    

    Results for key length of 36:

    goos: linux
    goarch: amd64
    pkg: github.com/zhulik/go-eventbus/trie
    cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics     
    BenchmarkMap-16                   134596              8464 ns/op
    BenchmarkFastStringMap-16          14326             84213 ns/op
    PASS
    ok      github.com/zhulik/go-eventbus/trie      3.311s
    

    Results for key length of 6:

    goos: linux
    goarch: amd64
    pkg: github.com/zhulik/go-eventbus/trie
    cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics     
    BenchmarkMap-16                   132898              8484 ns/op
    BenchmarkFastStringMap-16         140976              8254 ns/op
    PASS
    ok      github.com/zhulik/go-eventbus/trie      2.495s
    

    Am I cooking the data structure wrong, or it is by design?

A Go library to iterate over potentially nested map keys using the visitor pattern

A Go library to iterate over potentially nested map keys using the visitor pattern

Mar 15, 2022
skipmap is a high-performance concurrent sorted map based on skip list. Up to 3x ~ 10x faster than sync.Map in the typical pattern.
skipmap is a high-performance concurrent sorted map based on skip list. Up to 3x ~ 10x faster than sync.Map in the typical pattern.

Introduction skipmap is a high-performance concurrent map based on skip list. In typical pattern(one million operations, 90%LOAD 9%STORE 1%DELETE), th

Jan 8, 2023
An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

Jan 8, 2023
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-

Dec 22, 2022
An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

Dec 28, 2022
A memory-efficient trie for testing the existence/prefixes of string only(for now).

Succinct Trie A memory-efficient trie for testing the existence/prefixes of string only(for now). Install go get -u github.com/nobekanai/sutrie Docume

Mar 10, 2022
Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types.

Package set is a small wrapper around the official reflect package that facilitates loose type conversion and assignment into native Go types. Read th

Dec 27, 2022
Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

Trie Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching. Usage Create a Trie with: t := trie.New() Add Keys with:

Dec 27, 2022
Implementation of Boyer-Moore fast string search algorithm in Go

boyermoore Implementation of Boyer-Moore fast string search algorithm in Go

Oct 7, 2022
go/golang: fast bit set Bloom filter

package implements a fast bloom filter with real 'bitset' and JSONMarshal/JSONUnmarshal to store/reload the Bloom filter.

Nov 3, 2022