Deduplicated and GC-friendly string store

Symbols

Go Report Card GoDoc codecov MIT License

GC-friendly string store

This library helps to store big number of strings in structure with small number of pointers to make it friendly to Go garbage collector.

Motivation

As described in article Avoiding high GC overhead with large heaps, a large number of pointers make Go Garbage Collector life harder. So what if you need to deal with strings, a lot of strings? Every string is a pointer. You need to hide this pointers from Go runtime.

It's helpful for persistent caches and text resources.

Philosophy

As small as possible, as fast as possible.

Usage

package main

import (
	"fmt"

	"github.com/vporoshok/symbols"
)

// User is a domain model
type User struct {
	ID     int
	Name   string
	Labels []string
}

// UserCache is a persistent cache of users
type UserCache struct {
	data map[int]cachedUser
	dict symbols.Dictionary // using Dictionary to deduplicate labels
}

// cachedUser is a special struct to glue symbols with domain model
type cachedUser struct {
	name, labels symbols.Symbol
}

// Add user to cache
func (cache *UserCache) Add(user User) {
	if _, ok := cache.data[user.ID]; !ok {
		if cache.data == nil {
			cache.data = make(map[int]cachedUser)
		}
		cache.data[user.ID] = cachedUser{
			name:   cache.dict.AddString(user.Name),
			labels: symbols.AddStrings(&cache.dict, user.Labels),
		}
	}
}

// Get user from cache by id
func (cache UserCache) Get(id int) (User, bool) {
	if user, ok := cache.data[id]; ok {
		return User{
			ID:     id,
			Name:   cache.dict.GetString(user.name),
			Labels: symbols.GetStrings(cache.dict, user.labels),
		}, true
	}
	return User{}, false
}

func main() {
	cache := new(UserCache)
	cache.Add(User{
		ID:     1,
		Name:   "John",
		Labels: []string{"developer", "golang", "bicycle"},
	})
	cache.Add(User{
		ID:     2,
		Name:   "Mary",
		Labels: []string{"developer", "golang", "running"},
	})
	cache.Add(User{
		ID:     3,
		Name:   "Albert",
		Labels: []string{"manager", "bicycle", "running"},
	})
	fmt.Println(cache.Get(1))
	fmt.Println(cache.Get(2))
	fmt.Println(cache.Get(3))
	// Dictionary now contain strings
	// John, Mary, Albert
	// developer, golang, bicycle, running, manager
	// and composition of John's, Mary's and Albert's labels as 3-byte strings
	fmt.Println(cache.dict.Len()) // 11
}

How it works

Store

Store has two parts:

  1. slice of pages for small strings;
  2. slice of long strings as is;

Short strings (less or equal 255 bytes) stored in pages (1 MB byte arrays) with prefix byte with it length, for example:

[4, J, o, h, n, 9, d, e, v, e, l, o, p, e, r, 6, g, o, l, a, n, g, 7, b, c, y, c, l, e, 4, M, a, r, y, ...]

And Symbol for it represents two uin32 numbers (page number and index of byte with length) stored as an uint64.

If string to be add is longer then rest of current page, a new page is being created, so pages may be incomplete (less than 256 bytes per page, less than 0.01%).

Long strings stored as is in separate slice and Symbol for it represents index in this slice with 1 at first bit.

Dictionary

Key features

List of features that may help to decide use this project.

See also

List of inspiration projects, alternatives, and analogs. Maybe some comparisons if it's important.

Roadmap

  • Configure Github Actions for project;
  • Store string length in Symbol instead of page;
  • Optimize long string store;
  • May be add methods to save/restore Store to/from file or writer/reader (how to use it?);
Owner
Bastrykov Evgeniy
Web developer. Go, JS, TS, Python, PHP etc.
Bastrykov Evgeniy
Similar Resources

a sharded store to hold large IPLD graphs efficiently, packaged as location-transparent attachable CAR files, with mechanical sympathy

DAG store This README will be populated soon. In the meantime, please refer to the design document.

Oct 31, 2022

A full-featured license tool to check and fix license headers and resolve dependencies' licenses.

A full-featured license tool to check and fix license headers and resolve dependencies' licenses.

SkyWalking Eyes A full-featured license tool to check and fix license headers and resolve dependencies' licenses. Usage You can use License-Eye in Git

Dec 26, 2022

A Go (golang) library for parsing and verifying versions and version constraints.

go-version is a library for parsing versions and version constraints, and verifying versions against a set of constraints. go-version can sort a collection of versions properly, handles prerelease/beta versions, can increment versions, etc.

Jan 9, 2023

🤖🤝A tool to test and analyze storage and retrieval deal capability on the Filecoin network.

Dealbot A tool to test and analyze storage and retrieval deal capability on the Filecoin network. Getting Started Clone the repo and build: git clone

Sep 10, 2022

Continuous profiling for analysis of CPU, memory usage over time, and down to the line number. Saving infrastructure cost, improving performance, and increasing reliability.

Continuous profiling for analysis of CPU, memory usage over time, and down to the line number. Saving infrastructure cost, improving performance, and increasing reliability.

Continuous profiling for analysis of CPU, memory usage over time, and down to the line number. Saving infrastructure cost, improving performance, and increasing reliability.

Jan 2, 2023

CUE utilities and helpers for working with tree based objects in any combination of CUE, Yaml, and JSON.

Cuetils CUE utilities and helpers for working with tree based objects in any combination of CUE, Yaml, and JSON. Using As a command line binary The cu

Dec 24, 2022

Hex dump and read values of files quickly and swiftly with Go-Hex a program designed to dump any file in a hexadecimal format

Go-Hex Hex dump and read values of files quickly and swiftly with Go-Hex a program designed to dump any file in a hexadecimal format Dump Hashes ----

Oct 10, 2021

A process that receives probe information and stores it in a database for reporting and analysis

probed is a process that receives probe information and stores it in a database for reporting and analysis.

Nov 2, 2022

Golang: unify nil and empty slices and maps

unifynil, unify nil and empty slices and maps in Golang Empty slices and maps can be nil or not nil in Go. It may become a nightmare in tests and JSON

Jan 16, 2022
Related tags
A library that provides Go Generics friendly "optional" features.

go-optional A library that provides Go Generics friendly "optional" features. Synopsis some := optional.Some[int](123) fmt.Printf("%v\n", some.IsSome(

Dec 20, 2022
Go library that provides fuzzy string matching optimized for filenames and code symbols in the style of Sublime Text, VSCode, IntelliJ IDEA et al.
Go library that provides fuzzy string matching optimized for filenames and code symbols in the style of Sublime Text, VSCode, IntelliJ IDEA et al.

Go library that provides fuzzy string matching optimized for filenames and code symbols in the style of Sublime Text, VSCode, IntelliJ IDEA et al. This library is external dependency-free. It only depends on the Go standard library.

Dec 27, 2022
💪 Helper Utils For The Go: string, array/slice, map, format, cli, env, filesystem, test and more.
💪 Helper Utils For The Go: string, array/slice, map, format, cli, env, filesystem, test and more.

?? Helper Utils For The Go: string, array/slice, map, format, cli, env, filesystem, test and more. Go 的一些工具函数,格式化,特殊处理,常用信息获取等等

Jan 6, 2023
encLib is a simple golang package for quickly encoding and decoding string data in hex

encLib is a simple golang package for quickly encoding and decoding string data in hex

Nov 1, 2021
Package strnaming is used to Convert string to camelCase, snake_case, kebab-case.

strnaming Package strnaming is used to Convert string to camelCase, snake_case, kebab-case. Contents strnaming Contents API Examples Install Quick sta

Oct 24, 2021
Cell is a Go package that creates new instances by string in running time.

Cell Cell is a Go package that creates new instances by string in running time. Getting Started Installing To start using CELL, install Go and run go

Dec 20, 2021
ms - 'my story' creates a secure password string which can be memorized with a technique shared by Max.

On 23.12.21 20:22, Stefan Claas wrote: [...] > > Yes, I am aware of that, but how can one memorize a key when traveling > and not taking any devices

Dec 24, 2021
Q2entities - Parse the entities string from a Quake 2 .bsp map file. Written in Go

Q2Entities A simple command-line utility to extract the entities string from a Quake 2 map file. Entities? Binary Space Partitioning maps (.bsp) conta

Apr 9, 2022
Generates a random alphanumeric string of a given length.

randstring randstring.Create () is fast and has minimal memory allocation. It returns a random alphanumeric string of a given length. Install go get g

Jan 7, 2022
With Pasteback you can store common terms and paste them back to clipboard

Pasteback With Pasteback you can store common terms and paste them back to clipboard when needed. There is one central spot to put every string into y

Nov 22, 2021