K-Sortable Globally Unique IDs

ksuid Go Report Card GoDoc Circle CI

ksuid is an efficient, comprehensive, battle-tested Go library for generating and parsing a specific kind of globally unique identifier called a KSUID. This library serves as its reference implementation.

Install

go get -u github.com/segmentio/ksuid

What is a KSUID?

KSUID is for K-Sortable Unique IDentifier. It is a kind of globally unique identifier similar to a RFC 4122 UUID, built from the ground-up to be "naturally" sorted by generation timestamp without any special type-aware logic.

In short, running a set of KSUIDs through the UNIX sort command will result in a list ordered by generation time.

Why use KSUIDs?

There are numerous methods for generating unique identifiers, so why KSUID?

  1. Naturally ordered by generation time
  2. Collision-free, coordination-free, dependency-free
  3. Highly portable representations

Even if only one of these properties are important to you, KSUID is a great choice! :) Many projects chose to use KSUIDs just because the text representation is copy-and-paste friendly.

1. Naturally Ordered By Generation Time

Unlike the more ubiquitous UUIDv4, a KSUID contains a timestamp component that allows them to be loosely sorted by generation time. This is not a strong guarantee (an invariant) as it depends on wall clocks, but is still incredibly useful in practice. Both the binary and text representations will sort by creation time without any special sorting logic.

2. Collision-free, Coordination-free, Dependency-free

While RFC 4122 UUIDv1s do include a time component, there aren't enough bytes of randomness to provide strong protection against collisions (duplicates). With such a low amount of entropy, it is feasible for a malicious party to guess generated IDs, creating a problem for systems whose security is, implicitly or explicitly, sensitive to an adversary guessing identifiers.

To fit into a 64-bit number space, Snowflake IDs and its derivatives require coordination to avoid collisions, which significantly increases the deployment complexity and operational burden.

A KSUID includes 128 bits of pseudorandom data ("entropy"). This number space is 64 times larger than the 122 bits used by the well-accepted RFC 4122 UUIDv4 standard. The additional timestamp component can be considered "bonus entropy" which further decreases the probability of collisions, to the point of physical infeasibility in any practical implementation.

Highly Portable Representations

The text and binary representations are lexicographically sortable, which allows them to be dropped into systems which do not natively support KSUIDs and retain their time-ordered property.

The text representation is an alphanumeric base62 encoding, so it "fits" anywhere alphanumeric strings are accepted. No delimiters are used, so stringified KSUIDs won't be inadvertently truncated or tokenized when interpreted by software that is designed for human-readable text, a common problem for the text representation of RFC 4122 UUIDs.

How do KSUIDs work?

Binary KSUIDs are 20-bytes: a 32-bit unsigned integer UTC timestamp and a 128-bit randomly generated payload. The timestamp uses big-endian encoding, to support lexicographic sorting. The timestamp epoch is adjusted to May 13th, 2014, providing over 100 years of life. The payload is generated by a cryptographically-strong pseudorandom number generator.

The text representation is always 27 characters, encoded in alphanumeric base62 that will lexicographically sort by timestamp.

High Performance

This library is designed to be used in code paths that are performance critical. Its code has been tuned to eliminate all non-essential overhead. The KSUID type is derived from a fixed-size array, which eliminates the additional reference chasing and allocation involved in a variable-width type.

The API provides an interface for use in code paths which are sensitive to allocation. For example, the Append method can be used to parse the text representation and replace the contents of a KSUID value without additional heap allocation.

All public package level "pure" functions are concurrency-safe, protected by a global mutex. For hot loops that generate a large amount of KSUIDs from a single Goroutine, the Sequence type is provided to elide the potential contention.

By default, out of an abundance of caution, the cryptographically-secure PRNG is used to generate the random bits of a KSUID. This can be relaxed in extremely performance-critical code using the included FastRander type. FastRander uses the standard PRNG with a seed generated by the cryptographically-secure PRNG.

NOTE: While there is no evidence that FastRander will increase the probability of a collision, it shouldn't be used in scenarios where uniqueness is important to security, as there is an increased chance the generated IDs can be predicted by an adversary.

Battle Tested

This code has been used in production at Segment for several years, across a diverse array of projects. Trillions upon trillions of KSUIDs have been generated in some of Segment's most performance-critical, large-scale distributed systems.

Plays Well With Others

Designed to be integrated with other libraries, the KSUID type implements many standard library interfaces, including:

  • Stringer
  • database/sql.Scanner and database/sql/driver.Valuer
  • encoding.BinaryMarshal and encoding.BinaryUnmarshal
  • encoding.TextMarshal and encoding.TextUnmarshal (encoding/json friendly!)

Command Line Tool

This package comes with a command-line tool ksuid, useful for generating KSUIDs as well as inspecting the internal components of existing KSUIDs. Machine-friendly output is provided for scripting use cases.

Given a Go build environment, it can be installed with the command:

$ go install github.com/segmentio/ksuid/cmd/ksuid

CLI Usage Examples

Generate a KSUID

$ ksuid
0ujsswThIGTUYm2K8FjOOfXtY1K

Generate 4 KSUIDs

$ ksuid -n 4
0ujsszwN8NRY24YaXiTIE2VWDTS
0ujsswThIGTUYm2K8FjOOfXtY1K
0ujssxh0cECutqzMgbtXSGnjorm
0ujsszgFvbiEr7CDgE3z8MAUPFt

Inspect the components of a KSUID

$ ksuid -f inspect 0ujtsYcgvSTl8PAuAdqWYSMnLOv

REPRESENTATION:

  String: 0ujtsYcgvSTl8PAuAdqWYSMnLOv
     Raw: 0669F7EFB5A1CD34B5F99D1154FB6853345C9735

COMPONENTS:

       Time: 2017-10-09 21:00:47 -0700 PDT
  Timestamp: 107608047
    Payload: B5A1CD34B5F99D1154FB6853345C9735

Generate a KSUID and inspect its components

$ ksuid -f inspect

REPRESENTATION:

  String: 0ujzPyRiIAffKhBux4PvQdDqMHY
     Raw: 066A029C73FC1AA3B2446246D6E89FCD909E8FE8

COMPONENTS:

       Time: 2017-10-09 21:46:20 -0700 PDT
  Timestamp: 107610780
    Payload: 73FC1AA3B2446246D6E89FCD909E8FE8

Inspect a KSUID with template formatted inspection output

$ ksuid -f template -t '{{ .Time }}: {{ .Payload }}' 0ujtsYcgvSTl8PAuAdqWYSMnLOv
2017-10-09 21:00:47 -0700 PDT: B5A1CD34B5F99D1154FB6853345C9735

Inspect multiple KSUIDs with template formatted output

$ ksuid -f template -t '{{ .Time }}: {{ .Payload }}' $(ksuid -n 4)
2017-10-09 21:05:37 -0700 PDT: 304102BC687E087CC3A811F21D113CCF
2017-10-09 21:05:37 -0700 PDT: EAF0B240A9BFA55E079D887120D962F0
2017-10-09 21:05:37 -0700 PDT: DF0761769909ABB0C7BB9D66F79FC041
2017-10-09 21:05:37 -0700 PDT: 1A8F0E3D0BDEB84A5FAD702876F46543

Generate KSUIDs and output JSON using template formatting

$ ksuid -f template -t '{ "timestamp": "{{ .Timestamp }}", "payload": "{{ .Payload }}", "ksuid": "{{.String}}"}' -n 4
{ "timestamp": "107611700", "payload": "9850EEEC191BF4FF26F99315CE43B0C8", "ksuid": "0uk1Hbc9dQ9pxyTqJ93IUrfhdGq"}
{ "timestamp": "107611700", "payload": "CC55072555316F45B8CA2D2979D3ED0A", "ksuid": "0uk1HdCJ6hUZKDgcxhpJwUl5ZEI"}
{ "timestamp": "107611700", "payload": "BA1C205D6177F0992D15EE606AE32238", "ksuid": "0uk1HcdvF0p8C20KtTfdRSB9XIm"}
{ "timestamp": "107611700", "payload": "67517BA309EA62AE7991B27BB6F2FCAC", "ksuid": "0uk1Ha7hGJ1Q9Xbnkt0yZgNwg3g"}

Implementations for other languages

License

ksuid source code is available under an MIT License.

Comments
  • Add links to implementations in other lanugages

    Add links to implementations in other lanugages

    The Go library is the reference implementation, though not everyone uses Go. That's why it's useful to link to other implementations so people can use KSUID in the language of their choice.

  • Analysis of hash distribution of ksuids.

    Analysis of hash distribution of ksuids.

    Has any analysis of common hash distribution of ksuid values been done? There exists use cases where you want the time ordering for client side uses of ids but on the server side use the ids for distributed storage in a cluster data store. In this case a typical strategy is to hash the ids and use that to determine a shard within the datastore for that entity.

    Does the timestamp prefixing cause hashes of ksuids to cluster under common hash functions, e.g. MD5, various FNV or SHA algorithms etc?

  • Optimize new

    Optimize new

    The changes in the PR optimize the implementation of NewRandom, here's a highlight of the changes:

    • Use a non-cryptographic random source to avoid having to make a syscall to generate KSUIDs, use Go's math/rand.Rand seeded with a crypto-random source
    • Get rid of of the payload memory pool, use a global buffer instead. Since we have to pay the cost of using a mutex to access the random generator, we load the random bytes in a global buffer synchronized on the same mutex and then copy the results to a local variable.
    • Load the payload directly into the generated KSUID variable and inline the time serialization from FromParts.
    • Change ReadFull to ReadAtLeast, for some reason the compiled failed to inline this function call and resulted in ~3% of the remaining cost of NewRandom.

    Here are the numbers before and after:

    $ go test -run _ -v -bench New -cpuprofile /tmp/cpu -benchmem -memprofile /tmp/mem
    BenchmarkNew-4   	 1000000	      1234 ns/op	       0 B/op	       0 allocs/op
    PASS
    ok  	github.com/segmentio/ksuid	1.258s
    
    $ go test -run _ -v -bench New -cpuprofile /tmp/cpu -benchmem -memprofile /tmp/mem
    BenchmarkNew-4   	20000000	       105 ns/op	       0 B/op	       0 allocs/op
    PASS
    ok  	github.com/segmentio/ksuid	2.228s
    

    Please take a look and let me know if something should be changed.

  • Does ksuid support for Postgres

    Does ksuid support for Postgres

    I'm very exciting about ksuid. But when I want create a new table on postgres with primary key is ksuid. So what kind of data type should I set for ksuid ? How about performance of it in big query ? Beause with UUID, it's supported by data type is uuid. Thank you so much

  • Data structures

    Data structures

    This PR adds data structures to work with KSUIDs.

    The first one is the Sequence type, which can be used to generate an ordered sequence of KSUIDs.

    The second is the CompressedSet type, which is an ordered set of KSUIDs that uses techniques inspired from bitmap compression to efficiently store KSUIDs, especially when IDs are packed in sequences or close to each other.

    Here are the test and benchmarks:

    $ go test -v -run CompressedSet -bench CompressedSet -benchmem
    === RUN   TestCompressedSet
    === RUN   TestCompressedSet/sparse
    === RUN   TestCompressedSet/packed
    === RUN   TestCompressedSet/mixed
    --- PASS: TestCompressedSet (0.00s)
        --- PASS: TestCompressedSet/sparse (0.00s)
        	set_test.go:126: original 20000 B, compressed 17102 B (14.49%)
        --- PASS: TestCompressedSet/packed (0.00s)
        	set_test.go:126: original 20000 B, compressed 2153 B (89.23%)
        --- PASS: TestCompressedSet/mixed (0.00s)
        	set_test.go:126: original 20000 B, compressed 5022 B (74.89%)
    goos: darwin
    goarch: amd64
    pkg: github.com/segmentio/ksuid
    BenchmarkCompressedSet/write-4         	    3000	    465194 ns/op	      32 B/op	       1 allocs/op
    BenchmarkCompressedSet/read-4          	   20000	     62202 ns/op	       0 B/op	       0 allocs/op
    PASS
    ok  	github.com/segmentio/ksuid	3.380s
    
    • 465us to compress 1000 KSUIDs the profile shows that this is largely dominated by the sorting operation, I'll see what I can do about this
    • 62us to decompress 1000 KSUIDs I think this is pretty good perf, especially considering the kind of compression ratios that this data structure can achieve when storing lots of KSUIDs that were produced around the same time or using sequences.

    screen shot 2017-09-27 at 4 28 57 pm

    Please take a look and let me know if anything should be changed.

  • What happens after 100 years?

    What happens after 100 years?

    Readme says that the timestamp provides over 100 years of life. What happens after 100 years? Does it wrap and use the same timestamp around 100 years ago? Will it break index logicality or other things? Moving 4 bits from the random part to the timestamp would make it 2000 years and wouldn't change much the library guarantees so I wonder why not do it?

  • Wrong epoch date in README?

    Wrong epoch date in README?

    Maybe I'm misunderstanding something, but, isn't the date in README wrong?

    I mean, where it says

    The timestamp epoch is adjusted to March 5th, 2014 [...]

    , shouldn't it be May 13th, 2014 ? Given that

    go run ./ksuid -f inspect 000000000000000000000000000
    

    outputs

    REPRESENTATION:
    
      String: 000000000000000000000000000
         Raw: 0000000000000000000000000000000000000000
    
    COMPONENTS:
    
           Time: 2014-05-13 18:53:20 +0200 CEST
      Timestamp: 0
        Payload: 00000000000000000000000000000000
    
    
  • SemVer releases

    SemVer releases

    Now that the Go ecosystem started to lean towards using semantic versioning, using git tags for this package might help people utilizing the new way in their applications. Also, gives the impression that this package is production ready.

    What do you think?

  • README: replace the Rust implementation

    README: replace the Rust implementation

    When I created #49 I used the rust implementation that I thought was best based on looking at github/crates.io

    Though having actually used it now, it's very much lacking, so I ended up implementing a new one that has a saner internal representation, integrates with the rest of the Rust ecosystem, and tests a good chunk of the KSUID value space for correctness, as can be seen in: https://github.com/svix/rust-ksuid/blob/main/tests/test_kuids.txt

  • Is it possible to generate KSUID from SQL?

    Is it possible to generate KSUID from SQL?

    Hi, To set DEFAULT value for an PK of table, I wonder if has anyone tried to generate a valid ksuid value by using SQL.

    For example for UUIDv4, we can generate default value if uuid-ossp extension is not installed for PostrgreSQL as:

    `CREATE TABLE IF NOT EXISTS table (
       id VARCHAR(36) DEFAULT md5(random()::text || clock_timestamp()::text)::uuid  NOT NULL,
    ....
    

    Of course it'll be better to generate from application but, I it makes simpler to add default value for manual interactions with tables.

    Has anyone tried to develop an SQL function to geneate ksuid?

  • Go's default random number generators are not safe for concurrent use?

    Go's default random number generators are not safe for concurrent use?

    Hi, this is an observation, not a bug.

    ksuid.go says

    Go's default random number generators are not safe for concurrent use by multiple goroutines, the use of the rander and randBuffer are explicitly synchronized here.

    Thus it proceeds to lock its own randMutex.

    I have the intuition that crypto/rand.Reader would be broken if it wasn't safe for concurrent use, and that actually it is safe. This seems confirmed by the implementations in rand_unix.go and rand_windows.go which already use their own mutex.

    I was curious if removing randMutex would improve the benchmark numbers, but it doesn't.

    Still, randMutex might be unnecessary.

  • GUID comparison

    GUID comparison

    Greetings,

    I am compiling a GUID summary sheet which tries to aggregate different dimensions with the intent to help engineers pick the right generator.

    Would you be kind enough to help me validate the KSUID column and point to any issues?

    Thank you

    image
  • [Question]: I know that is not possible to use more than 16 bit payload, but what's the reason?

    [Question]: I know that is not possible to use more than 16 bit payload, but what's the reason?

    If there's no technical problem could make it configurable? Like:

    // It will return a configurable instance.
    instance := ksuid.NewInstance(ksuid.Config{
       MaxPayloadLength: 32,
    })
    

    And if possible a method to replace the global instance like golang zap:

    instance := ksuid.NewInstance(ksuid.Config{
       MaxPayloadLength: 32,
    })
    
    // instance will create a 32 bit payload while ksuid.New() will create a 16 bit payload.
    // to avoid that we could do something like:
    
    undo := ksuid.ReplaceGlobals(instance)
    defer undo()
    
    // here the ksuid and instance are both configurable in the same way so I could call 
    // ksuid.New() or instance.New() and both will create a 32 bit payload.
    
  • Postgres extension written in rust to generate ksuid

    Postgres extension written in rust to generate ksuid

    Hey, I created a postgres extension that was written in rust, and can be used in postgres to generate ksuid values for various stuff.

    check it here - https://github.com/spa5k/uids-postgres

    mainly created it for experiment and lack of a good postgres extension written in ksuid, ulid and nanoid.

    One extension for all 3 of em.

  • What is the differences between Sync and Async?

    What is the differences between Sync and Async?

    Creation You can create a new instance synchronously:

    const ksuidFromSync = KSUID.randomSync() Or asynchronously:

    const ksuidFromAsync = await KSUID.random()

  • Passing a timestamp before the epoch to FromParts does not return an error

    Passing a timestamp before the epoch to FromParts does not return an error

    It seems that it wraps around rather than returning an error.

    https://go.dev/play/p/s2aA3-ERd-E

    Uses timestamp 2000-01-01T00:00:00.00Z and zeros for the payload. Returns: WfeXVK2UN8Qh86bSI6R4zNvC7ge

    $ ksuid -f inspect WfeXVK2UN8Qh86bSI6R4zNvC7ge
    
    REPRESENTATION:
    
      String: WfeXVK2UN8Qh86bSI6R4zNvC7ge
         Raw: E4FAF58000000000000000000000000000000000
    
    COMPONENTS:
    
           Time: 2136-02-07 01:28:16 -0500 EST
      Timestamp: 3841652096
        Payload: 00000000000000000000000000000000
    
Universally Unique Lexicographically Sortable Identifier (ULID) in Go

Universally Unique Lexicographically Sortable Identifier A Go port of alizain/ulid with binary format implemented. Background A GUID/UUID can be subop

Jan 9, 2023
✨ Generate unique IDs (Port of Node package "generate-snowflake" to Golang)

✨ Generate Snowflake Generate unique IDs. Inspired by Twitter's Snowflake system. ?? Installation Initialize your project (go mod init example.com/exa

Feb 11, 2022
A simple to use Go (golang) package to generate or parse Twitter snowflake IDs

snowflake snowflake is a Go package that provides A very simple Twitter snowflake generator. Methods to parse existing snowflake IDs. Methods to conve

Dec 30, 2022
Snowflake - Simple twitter's snowflake uniquely identifiable descriptors (IDs) format generator for Go

Snowflake Dead simple and fast Twitter's snowflake id generator in Go. Installation go get github.com/HotPotatoC/snowflake Usage Generating a snowflak

Oct 6, 2022
Snowflake - A simple to use Go (golang) package to generate or parse Twitter snowflake IDs
Snowflake - A simple to use Go (golang) package to generate or parse Twitter snowflake IDs

❄️ Go-Snowflake A Snowflake Generator for Go A simple to use Go (golang) package

Oct 20, 2022
A tiny and fast Go unique string generator

Nano ID A tiny and fast Go unique string generator Safe. It uses cryptographically strong random APIs and tests distribution of symbols. Compact. It u

Nov 11, 2022
High performance unique number generator powered by Go

SEQSVR High performance unique number generator powered by Go 中文 README Features Distributed: Can be scaled horizontally High performance: Allocation

Nov 16, 2022
A network service for generating unique ID numbers inspired by Twitter's Snowflake.

Hanabira Hanabira is a network service for generating unique ID numbers inspired by Twitter's Snowflake. How to run hanabira-cluster and etcd-cluster

Jan 13, 2022
K-Sortable Globally Unique IDs

ksuid ksuid is an efficient, comprehensive, battle-tested Go library for generating and parsing a specific kind of globally unique identifier called a

Jan 9, 2023
Compact, sortable and fast unique IDs with embedded metadata.
Compact, sortable and fast unique IDs with embedded metadata.

A spec for unique IDs in distributed systems based on the Snowflake design, i.e. a coordination-based ID variant. It aims to be friendly to both machi

Dec 22, 2022
Universally Unique Lexicographically Sortable Identifier (ULID) in Go

Universally Unique Lexicographically Sortable Identifier A Go port of alizain/ulid with binary format implemented. Background A GUID/UUID can be subop

Jan 9, 2023
Super short, fully unique, non-sequential and URL friendly Ids

Generator of unique non-sequential short Ids The package shortidenables the generation of short, fully unique, non-sequential and by default URL frien

Dec 30, 2022
✨ Generate unique IDs (Port of Node package "generate-snowflake" to Golang)

✨ Generate Snowflake Generate unique IDs. Inspired by Twitter's Snowflake system. ?? Installation Initialize your project (go mod init example.com/exa

Feb 11, 2022
Simple git hooks written in go that installs globally to your machine

Go-hooks Simple git hooks written in go that installs globally to your machine Install curl -fsSL

Oct 19, 2022
Koyeb is a developer-friendly serverless platform to deploy apps globally.
Koyeb is a developer-friendly serverless platform to deploy apps globally.

Koyeb Serverless Platform Deploy a Go Gin application on Koyeb Learn more about Koyeb · Explore the documentation · Discover our tutorials About Koyeb

Nov 14, 2022
Sep 26, 2022
Vex is a variable-length, lexicographically-sortable hex format for uint64 values

Vex is a variable-length, lexicographically-sortable hex format for uint64 values. It can be used instead of fmt.Sprintf("%016x", v) for shorter s

Mar 3, 2022
Forms is a fast, powerful, flexible, sortable web form rendering library written in golang.

forms Description forms makes form creation and handling easy. It allows the creation of form without having to write HTML code or bother to make the

Oct 2, 2022
Singlestore event analytics - Evaluation of sortable ID generation schemes

Singlestore event analytics - Evaluation of sortable ID generation schemes

Jan 25, 2022
ID type with marshalling to/from hash to prevent sending IDs to clients.
ID type with marshalling to/from hash to prevent sending IDs to clients.

Hide IDs Hide is a simple package to provide an ID type that is marshalled to/from a hash string. This prevents sending technical IDs to clients and c

Dec 10, 2022