Set data structure for Go

Archived project. No maintenance.

This project is not maintained anymore and is archived.. Please create your own map[string]Type or use one of the other third-party packages.

Thanks all for their work on this project.

Set GoDoc Build Status

Set is a basic and simple, hash-based, Set data structure implementation in Go (Golang).

Set provides both threadsafe and non-threadsafe implementations of a generic set data structure. The thread safety encompasses all operations on one set. Operations on multiple sets are consistent in that the elements of each set used was valid at exactly one point in time between the start and the end of the operation. Because it's thread safe, you can use it concurrently with your goroutines.

For usage see examples below or click on the godoc badge.

Install and Usage

Install the package with:

go get github.com/fatih/set

Import it with:

import "githug.com/fatih/set"

and use set as the package name inside the code.

Examples

Initialization of a new Set

// create a set with zero items
s := set.New(set.ThreadSafe) // thread safe version
s := set.New(set.NonThreadSafe) // non thread-safe version

Basic Operations

// add items
s.Add("istanbul")
s.Add("istanbul") // nothing happens if you add duplicate item

// add multiple items
s.Add("ankara", "san francisco", 3.14)

// remove item
s.Remove("frankfurt")
s.Remove("frankfurt") // nothing happens if you remove a nonexisting item

// remove multiple items
s.Remove("barcelona", 3.14, "ankara")

// removes an arbitary item and return it
item := s.Pop()

// create a new copy
other := s.Copy()

// remove all items
s.Clear()

// number of items in the set
len := s.Size()

// return a list of items
items := s.List()

// string representation of set
fmt.Printf("set is %s", s.String())

Check Operations

// check for set emptiness, returns true if set is empty
s.IsEmpty()

// check for a single item exist
s.Has("istanbul")

// ... or for multiple items. This will return true if all of the items exist.
s.Has("istanbul", "san francisco", 3.14)

// create two sets for the following checks...
s := s.New("1", "2", "3", "4", "5")
t := s.New("1", "2", "3")

// check if they are the same
if !s.IsEqual(t) {
    fmt.Println("s is not equal to t")
}

// if s contains all elements of t
if s.IsSubset(t) {
	fmt.Println("t is a subset of s")
}

// ... or if s is a superset of t
if t.IsSuperset(s) {
	fmt.Println("s is a superset of t")
}

Set Operations

// let us initialize two sets with some values
a := set.New(set.ThreadSafe)
a := set.Add("ankara", "berlin", "san francisco")
b := set.New(set.NonThreadSafe)
b := set.Add("frankfurt", "berlin")

// creates a new set with the items in a and b combined.
// [frankfurt, berlin, ankara, san francisco]
c := set.Union(a, b)

// contains items which is in both a and b
// [berlin]
c := set.Intersection(a, b)

// contains items which are in a but not in b
// [ankara, san francisco]
c := set.Difference(a, b)

// contains items which are in one of either, but not in both.
// [frankfurt, ankara, san francisco]
c := set.SymmetricDifference(a, b)
// like Union but saves the result back into a.
a.Merge(b)

// removes the set items which are in b from a and saves the result back into a.
a.Separate(b)

Multiple Set Operations

a := set.New(set.ThreadSafe)
a := set.Add("1", "3", "4", "5")
b := set.New(set.ThreadSafe)
b := set.Add("2", "3", "4", "5")
c := set.New(set.ThreadSafe)
c := set.Add("4", "5", "6", "7")

// creates a new set with items in a, b and c
// [1 2 3 4 5 6 7]
u := set.Union(a, b, c)

// creates a new set with items in a but not in b and c
// [1]
u := set.Difference(a, b, c)

// creates a new set with items that are common to a, b and c
// [5]
u := set.Intersection(a, b, c)

Helper methods

The Slice functions below are a convenient way to extract or convert your Set data into basic data types.

// create a set of mixed types
s := set.New(set.ThreadSafe)
s := set.Add("ankara", "5", "8", "san francisco", 13, 21)

// convert s into a slice of strings (type is []string)
// [ankara 5 8 san francisco]
t := set.StringSlice(s)

// u contains a slice of ints (type is []int)
// [13, 21]
u := set.IntSlice(s)

Concurrent safe usage

Below is an example of a concurrent way that uses set. We call ten functions concurrently and wait until they are finished. It basically creates a new string for each goroutine and adds it to our set.

package main

import (
	"fmt"
	"strconv"
	"sync"

	"github.com/fatih/set"
)

func main() {
	var wg sync.WaitGroup // this is just for waiting until all goroutines finish

	// Initialize our thread safe Set
	s := set.New(set.ThreadSafe)

	// Add items concurrently (item1, item2, and so on)
	for i := 0; i < 10; i++ {
		wg.Add(1)

		go func(i int) {
			defer wg.Done()

			item := "item" + strconv.Itoa(i)
			fmt.Println("adding", item)
			s.Add(item)
		}(i)
	}

	// Wait until all concurrent calls finished and print our set
	wg.Wait()
	fmt.Println(s)
}

Credits

License

The MIT License (MIT) - see LICENSE.md for more details

Owner
Fatih Arslan
Software Engineer. Gopher and Coffee geek. Creator of vim-go. Tool maker.
Fatih Arslan
Comments
  • Semver?

    Semver?

    though it's maybe not so hot in golang circles, semver is still pretty much the de facto norm for release versioning. i noticed the one tag that's present so far is 0.1, which isn't quite there (0.1.0 is probably the equivalent). for future releases, would you be OK with adopting semver?

    like, once we finish this non-ts set implementation, a 0.2.0 would probably be in order. (or maybe a 1.0.0, if you feel like the API is stable.)

  • Implement sort.Interface

    Implement sort.Interface

    Check out and investigate how to implement it in a way that it doesn't break existing API, if any, then only in a minor fashion. Renaming Size() to Len() is one of those cases.

  • Shift non-essential methods into functions.

    Shift non-essential methods into functions.

    Re: #12, this narrows the interface a bit, moving methods that only call other methods (as opposed to accessing the internal map datastructure) into standalone functions.

    I'm less confident about this one, now having done it, but let's discuss.

  • Issue14 - revised New() method to factory method

    Issue14 - revised New() method to factory method

    Please review the new New() function. It now takes a parameter that identifies ThreadSafe or NonThreadSafe, also please review the README.md to see if it needs to be revised.

  • Provide non threadsafe implementation

    Provide non threadsafe implementation

    hi,

    might seem like an odd request, but it'd be nice if this lib provided an implementation without thread-safety in addition to the one with. that way i can pick which one makes sense depending on my use case. no reason they can't share an interface, of course.

    if you're amenable, i'll code one up.

  • Introduce Set interface & Each method

    Introduce Set interface & Each method

    so, this is a little prep work, leading up to the goal of #7.

    i've created a Set interface, and refactored the methods in the package to specify that, instead of *Set, as the type for provided parameters, and the return value in most cases. the side effect of doing this is that it's no longer OK to reach in and lock the mutexes in the provided object, as there's no guarantee that it's there. it seems this is the desired behavior if we're going to have a non-threadsafe set; one ought to be able to mix and match the two, no?

    the bigger issue, though, is that it's not really possible any longer to directly reach in and for...range across the map storage of the passed set. to that end, i've introduced an Each() method, which is really just an iterator achieved via injecting a closure. (i think that's a useful addition, regardless). that's now used to do these traversals; the performance hit, based on the benchmarks introduced here, looks to be around 5%:

    Through Each() (new):
    BenchmarkSetEquality       10000        767576 ns/op
    BenchmarkSubset    10000        786049 ns/op
    
    Direct map iteration (old):
    BenchmarkSetEquality       10000        723936 ns/op
    BenchmarkSubset    10000        728652 ns/op
    

    i also profiled this a bit, mostly just out of curiousity - results are here: https://gist.github.com/sdboyer/8447255 . that's profiling the BenchmarkSetEquality test. from that...well, i'm not even sure what to draw from it. but hey, there it is :)

    bottom line - i thought we'd discuss whether or not you're OK with these changes before i proceed on to implementing the non-ts set. if so, the next step from here is fairly obvious, but i wanted to check in first.

    also, side question - is there a reason you bookend the set methods with the lock/unlock calls instead of doing defer s.l.RUnlock() immediately after s.l.RLock()?

  • Change behaviour of set.New()

    Change behaviour of set.New()

    I think set.New() should not accept any items as inputs. Instead it should should just accept a type that defines the underlying set. This will simplify the usage of this current package. An example code would be

    a := set.New(set.ThreadSafe)
    b := set.New(set.NonThreadSafe)
    c := set.New(set.SliceBased)
    

    Each struct would then imply the Interface method.

    This is a proposal and is open to any comments.

  • Needs Big-O time complexity in documentation.

    Needs Big-O time complexity in documentation.

    Without time complexity information for each Set method, it is difficult to determine whether or not to use this library. I thought about using this library, until I read the code for Intersect, which calls Union twice, which in turn calls Copy. The whole operation is hardly optimized. I DO understand that this library is for convenience purposes and I am NOT intending to bash your code, but if people are to rely on such libraries, they at least need to know the complexity of the operations.

    To further make my point, I added a benchmark for a piece of code I wrote for the intersection of two sets, which I make no claim as being the fastest implementation, but the order is easily worked out as: O( (len(set1) + len(set2)) * O(map) )

    (I took the liberty of interleaving the results for easier comparison.)

    // Returns the intersection of two slices
    func SimpleIntersection(l1 []int, l2 []int) []int { // O((len(l1) + len(l2)) * O(map))
        ret := []int{}
        m := make(map[int]bool, len(l1))
    
        for _, item := range l1 {
            m[item] = true
        }
    
        for _, item := range l2 {
            if m[item] == true {
                ret = append(ret, item)
            }
        }
        return ret
    }
    
    PASS
    BenchmarkSetEquality-4                     10000        780659 ns/op
    BenchmarkSubset-4                          10000        801358 ns/op
    BenchmarkIntersection10-4                 100000         13075 ns/op
    BenchmarkSimpleIntersection10-4          2000000           601 ns/op
    BenchmarkIntersection100-4                 10000        118579 ns/op
    BenchmarkSimpleIntersection100-4          200000          6183 ns/op
    BenchmarkIntersection1000-4                 1000       1302503 ns/op
    BenchmarkSimpleIntersection1000-4          30000         53364 ns/op
    BenchmarkIntersection10000-4                 100      13888576 ns/op
    BenchmarkSimpleIntersection10000-4          3000        573658 ns/op
    BenchmarkIntersection100000-4                 10     175577891 ns/op
    BenchmarkSimpleIntersection100000-4          200       7036305 ns/op
    BenchmarkIntersection1000000-4                 1    2828214630 ns/op
    BenchmarkSimpleIntersection1000000-4          10     144838095 ns/op
    ok      github.com/fatih/set    41.404s
    
  • Introduce non-threadsafe Set implementation

    Introduce non-threadsafe Set implementation

    OK, here's a stab at the non-ts implementation.

    i made it so that Set uses SetNonTS as a promoted field, which isn't really ideal. for one, it'd be really nice to be able to reuse the actual logic from SetNonTS's methods, but i don't know if that's possible. also, that means it exposes the SetNonTS as an exported field, which is just wrong and dumb.

    this is definitely not done, but i wanted to get this up so we could discuss while it's in progress.

    re: #7

  • IsEqual Bug

    IsEqual Bug

    IsEqual returns true as long as the two sets are of the same size-regardless of their content. The tests don't cover such a case.

    equal := true
    if equal = len(s.m) == t.Size(); equal {
        t.Each(func(item interface{}) (equal bool) {
            _, equal = s.m[item]
            return
        })
    }
    
    return equal
    

    .

  • Panic if different types of Set implementations are passed to interface functions

    Panic if different types of Set implementations are passed to interface functions

    One can easily mix up Set interfaces and pass them to functions like Difference, Union, etc... This can be fix easily with a simple trick. We should have types of Set implementations and each Interface should have a new Type() method that returns a simple type declaration (this can be a const enum).

    And then we can simple check:

    func Intersection(s Interface, t Interface) Interface {
        if s.Type() != t.Type() {
            panic("Set implementations mixed up: %v , %v", s.Type(), v.Type())
        }
    
        u := s.Copy()
        u.Separate(Difference(u, t))
        return u
    }
    

    However one can easily fake this up with a custom Type(), but for our set package this shouldn't be a concern.

Related tags
Set data structure for Go

Archived project. No maintenance. This project is not maintained anymore and is archived.. Please create your own map[string]Type or use one of the ot

Nov 21, 2022
Set data structure for Go

Archived project. No maintenance. This project is not maintained anymore and is archived.. Please create your own map[string]Type or use one of the ot

Nov 21, 2022
Disjoint Set data structure implementation in Go

dsu Implementation of the Disjoint-Set data structure. The Disjoint-Set, Also called a Union-Find or Merge-Find set, is a data structure that stores a

Dec 9, 2022
A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set

A Go implementation of a bloom filter, with support for boltdb and badgerdb as optional in-memory persistent storage.

Jul 4, 2022
Recursively searches a map[string]interface{} structure for another map[string]interface{} structure

msirecurse Recursively searches a map[string]interface{} structure for existence of a map[string]interface{} structure Motivation I wrote this package

Mar 3, 2022
Bitset data structure

Your basic bit Set data structure for positive numbers A bit array, or bit set, is an efficient set data structure. It consists of an array that compa

Dec 29, 2022
Data structure and algorithm library for go, designed to provide functions similar to C++ STL

GoSTL English | 简体中文 Introduction GoSTL is a data structure and algorithm library for go, designed to provide functions similar to C++ STL, but more p

Dec 26, 2022
Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

Trie Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching. Usage Create a Trie with: t := trie.New() Add Keys with:

Dec 27, 2022
BTree provides a simple, ordered, in-memory data structure for Go programs.

BTree implementation for Go This package provides an in-memory B-Tree implementation for Go, useful as an ordered, mutable data structure. The API is

Dec 30, 2022
Generates data structure definitions from JSON files for any kind of programming language

Overview Archivist generates data structure definitions from JSON files for any kind of programming language. It also provides a library for golang to

Jun 28, 2022
A data structure for storing points.
A data structure for storing points.

ptree This package provides an in-memory data structure for storing points. Under the hood it stores points in a tree structure where nodes are spatia

Apr 18, 2022
Data Structure Libraries and Algorithms implementation

Algorithms Data Structure Libraries and Algorithms implementation in C++ Disclaimer This repository is meant to be used as a reference to learn data s

Dec 8, 2022
Trie data structure implementation in Golang 🌳

Gotri Gotri is an Unicode character based Trie/prefix tree implementation in Go, with the suggestion/auto-complete feature for character searching. Si

Jun 17, 2022
Data structure,Algorithms implemented in Go (for education)
 Data structure,Algorithms implemented in Go (for education)

Data structure,Algorithms implemented in Go (for education) List of Content : 1. Math - 2. String - 3. Conversions - 4. Sort - 5. Search - 6. Data str

Dec 13, 2022
Go implementation of the van Emde Boas tree data structure: Priority queue for positive whole numbers in O(log log u) time.

vEB Go implementation of the van Emde Boas tree data structure: Priority queue for positive whole numbers in O(log log u) time. Supports the following

Mar 7, 2022
Implementation of the MaxStack Data Structure in Go

MaxStack-Golang Implementation of the MaxStack Data Structure in Go This repository contains the design of a max stack data structure that supports th

Nov 10, 2021
publish a tree-like data structure from a backend to a front-end

tree-publish publish a tree-like data structure from a backend to a front-end. It needs to be a tree in order to publish the data as JSON document. If

Dec 20, 2021
Data Structure Series: Heaps and heap applications

heaps Data Structure Series: Heaps and heap applications Current Updates: GO: heaps.go: max-heap implementation standard add, remove, and restore func

Mar 17, 2022
simple golang event bus structure

super simple and small event bus structure for golang that allows emissions as go routines.

Dec 11, 2021