encoding

Go package containing implementations of encoders and decoders for various data formats.

Motivation

At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs with can have a large impact on the efficiency of our systems. It is important to explore alternative approaches when we reach the limits of the code we use.

This repository includes experiments for Go packages for marshaling and unmarshaling data in various formats. While the focus is on providing a high performance library, we also aim for very low development and maintenance overhead by implementing APIs that can be used as drop-in replacements for the default solutions.

Requirements and Maintenance Schedule

This package has no dependencies outside of the core runtime of Go. It requires a recent version of Go.

This package follows the same maintenance schedule as the Go project, meaning that issues relating to versions of Go which aren't supported by the Go team, or versions of this package which are older than 1 year, are unlikely to be considered.

Additionally, we have fuzz tests which aren't a required runtime dependency but will be pulled into go.mod when running go mod tidy. Please don't include these go.mod updates in change requests.

encoding/json

More details about the implementation of this package can be found here.

The json sub-package provides a re-implementation of the functionalities offered by the standard library's encoding/json package, with a focus on lowering the CPU and memory footprint of the code.

The exported API of this package mirrors the standard library's encoding/json package; the only change needed to take advantage of the performance improvements is the import path of the json package, from:

import (
    "encoding/json"
)

to

import (
    "github.com/segmentio/encoding/json"
)

The improvement can be significant for code that heavily relies on serializing and deserializing JSON payloads. The CI pipeline runs benchmarks to compare the performance of the package with the standard library and other popular alternatives; here's an overview of the results:

Comparing to encoding/json (v1.16.2)

name                           old time/op    new time/op     delta
Marshal/*json.codeResponse2      6.40ms ± 2%     3.82ms ± 1%   -40.29%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    28.1ms ± 3%      5.6ms ± 3%   -80.21%  (p=0.008 n=5+5)

name                           old speed      new speed       delta
Marshal/*json.codeResponse2     303MB/s ± 2%    507MB/s ± 1%   +67.47%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2  69.2MB/s ± 3%  349.6MB/s ± 3%  +405.42%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op    delta
Marshal/*json.codeResponse2       0.00B           0.00B           ~     (all equal)
Unmarshal/*json.codeResponse2    1.80MB ± 1%     0.02MB ± 0%   -99.14%  (p=0.016 n=5+4)

name                           old allocs/op  new allocs/op   delta
Marshal/*json.codeResponse2        0.00            0.00           ~     (all equal)
Unmarshal/*json.codeResponse2     76.6k ± 0%       0.1k ± 3%   -99.92%  (p=0.008 n=5+5)

Benchmarks were run on a Core i9-8950HK CPU @ 2.90GHz.

Comparing to github.com/json-iterator/go (v1.1.10)

name                           old time/op    new time/op    delta
Marshal/*json.codeResponse2      6.19ms ± 3%    3.82ms ± 1%   -38.26%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    8.52ms ± 3%    5.55ms ± 3%   -34.84%  (p=0.008 n=5+5)

name                           old speed      new speed      delta
Marshal/*json.codeResponse2     313MB/s ± 3%   507MB/s ± 1%   +61.91%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2   228MB/s ± 3%   350MB/s ± 3%   +53.50%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
Marshal/*json.codeResponse2       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    1.05MB ± 0%    0.02MB ± 0%   -98.53%  (p=0.000 n=5+4)

name                           old allocs/op  new allocs/op  delta
Marshal/*json.codeResponse2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2     37.2k ± 0%      0.1k ± 3%   -99.83%  (p=0.008 n=5+5)

Although this package aims to be a drop-in replacement of encoding/json, it does not guarantee the same error messages. It will error in the same cases as the standard library, but the exact error message may be different.

encoding/iso8601

The iso8601 sub-package exposes APIs to efficiently deal with string representations of iso8601 dates.

Data formats like JSON have no syntax to represent dates; they are usually serialized and represented as string values. In our experience, we often have to check whether a string value looks like a date, and either construct a time.Time by parsing it or simply treat it as a string. This check can be done by attempting to parse the value and, if that fails, falling back to using the raw string. Unfortunately, while the happy path of time.Parse is fairly efficient, constructing errors is much slower and has a much bigger memory footprint.

To remediate this problem, we developed fast iso8601 validation functions that cause no heap allocations. We added a validation step to determine whether the value is a date representation or a simple string, which reduced CPU and memory usage by 5% in some programs that were calling time.Parse on very hot code paths.
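The check-then-parse pattern described above can be sketched with the standard library alone. The iso8601 package's own validation function names aren't listed here, so this stand-in simply uses time.Parse and treats a parse failure as the fall-back-to-string case (without the allocation savings a dedicated validator provides):

```go
package main

import (
	"fmt"
	"time"
)

// parseOrString attempts to interpret s as an RFC 3339 timestamp
// (a common iso8601 profile). On success it returns the parsed time
// and true; otherwise the caller keeps using the raw string.
func parseOrString(s string) (time.Time, bool) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		// Not a date: treat the value as a plain string. This error
		// path is the expensive part a dedicated validator avoids.
		return time.Time{}, false
	}
	return t, true
}

func main() {
	for _, v := range []string{"2021-03-01T12:00:00Z", "hello world"} {
		if t, ok := parseOrString(v); ok {
			fmt.Println("date:", t.UTC())
		} else {
			fmt.Println("string:", v)
		}
	}
}
```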

Comments
  • Can't reproduce speed/memory benefits in benchmarks

    I tried this library as a drop-in stdlib replacement and found that with our data, it was slightly worse than stdlib in both memory and speed.

    So I thought OK, benchmarks are highly dependent on the specific data used, I'll try with the sample data used by this project. To my surprise the results were even worse -- this library seems to use more memory than stdlib and perform more slowly.

    Then I noticed your README benchmarks were with Go 1.16.2, so I tried with that. Same outcome.

    I feel like I must be doing something really wrong, so I've put together a repo with the code and some of the benchmark stats I got at https://github.com/lpar/segmentio

  • json.Unescape has issues with UNC paths

    I have a test string that has a value with an escaped UNC path in it. It validates ok according to: https://jsonlint.com/

    "{\"dataPath\":\"\\\\bnk11977fs\\bnk11977\\xyz11146682\\xyzdata\\\"}"

    When you pass it through json.Unescape the result is invalid json

    See: playground

    Highlights from the stdout are:

    raw sample0 "\"{\\\"dataPath\\\":\\\"\\\\\\\\bnk11977fs\\\\bnk11977\\\\be11146682\\\\encompassdata\\\\\\\"}\""
    
    unescape for sample0 was invalidated "{\"dataPath\":\"\\\\bnk11977fs\\bnk11977\\be11146682\\encompassdata\\\"}" ; error=json: invalid character 'e' in string escape code: "\\bnk11977fs\bnk11977\be1114668...
    

    You may ask, "If the original is valid json, then why do you need to unescape it anyway?"

    Answer: The raw escaped form comes from reading a logging entry. A log processor service (lambda) needs to create output without all the escapes, else another downstream reader (logstash ingest) sees the document event as one big string and not a nice nested object.

    So the goal is for the lambda processor to write a line something like the following. Yes, the UNC paths and escapes are a bitch.

    And this is a simple example. Imagine a raw json with 100 deeply nested fields. You certainly can't just blindly replace backslash-quote with quote!

    {"dataPath":"\\bnk11977fs\bnk11977\be11146682\encompassdata\"}

  • Proposal: ability to marshal with flags

    Hi! Me again with another proposal!

    json.Parse offers a readily available alternative to json.Unmarshal with the simple addition of flags.

    In comparison, json.Append takes two extra parameters compared to json.Marshal: a []byte to fill and the flags. This means that unfortunately, you can't benefit from the encoderBufferPool if you just want to add some AppendFlags.

    This would give 3 functions, from most to least flexible:

    • Append(b []byte, x interface{}, flags AppendFlags) - can use flags, no extra copy, but no buffer pool
    • MarshalWithFlags(x interface{}, flags AppendFlags) - can use flags, buffer pool & one extra copy
    • Marshal(x interface{}) - cannot use flags, buffer pool & one extra copy

    Having to pre-allocate a buffer for Append might be quite a challenge, because you often don't know how long the resulting buffer will be. For instance, allocating a full page like the buffer pool does is not necessarily a great option for small payloads. Reimplementing a similar pool in our code is definitely an option, but it'd be great to simply be able to reuse the logic that's already there in the lib.

    Would this be something you'd consider adding in? Again, happy to propose a PR.
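To make the trade-offs in the proposal concrete, here is a rough, hypothetical sketch of what a MarshalWithFlags built on an Append-style API plus a buffer pool could look like. Everything here is illustrative, not the library's actual implementation: the toy Append is backed by the stdlib encoder and ignores its flags, the pool size is arbitrary, and MarshalWithFlags is the proposed (not existing) name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// AppendFlags stands in for the library's flag type (hypothetical here).
type AppendFlags uint32

// Append is a toy stand-in for an append-style encoder: it encodes x and
// appends the result to b. This sketch ignores the flags.
func Append(b []byte, x interface{}, flags AppendFlags) ([]byte, error) {
	data, err := json.Marshal(x)
	if err != nil {
		return b, err
	}
	return append(b, data...), nil
}

var bufferPool = sync.Pool{
	// Roughly one page per buffer, echoing the concern about small payloads.
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// MarshalWithFlags is the proposed helper: borrow a pooled buffer, encode
// into it, copy the result out so the buffer can be reused, return the copy.
func MarshalWithFlags(x interface{}, flags AppendFlags) ([]byte, error) {
	buf := bufferPool.Get().([]byte)
	b, err := Append(buf[:0], x, flags)
	if err != nil {
		bufferPool.Put(buf[:0])
		return nil, err
	}
	out := make([]byte, len(b)) // the "one extra copy" mentioned above
	copy(out, b)
	bufferPool.Put(b[:0]) // return the (possibly grown) buffer to the pool
	return out, nil
}

func main() {
	out, err := MarshalWithFlags(map[string]int{"a": 1}, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"a":1}
}
```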

  • proto: unexpected fault address when deref struct field

    func TestIssue110(t *testing.T) {
        type message struct {
            A *uint32 `protobuf:"fixed32,1,opt"`
        }

        var a uint32 = 0x41c06db4
        data, _ := Marshal(message{
            A: &a,
        })

        var m message
        err := Unmarshal(data, &m)
        if err != nil {
            t.Fatal(err)
        }
        if *m.A != 0x41c06db4 {
            t.Errorf("m.A mismatch, want 0x41c06db4 but got %x", *m.A)
        }
    }


    expected pass test but got

    unexpected fault address 0x41c06db4
    fatal error: fault
    [signal 0xc0000005 code=0x0 addr=0x41c06db4 pc=0x86ec3c]
    
    goroutine 6 [running]:
    runtime.throw({0x8be436, 0x89ce00})
    	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/panic.go:1198 +0x76 fp=0xc00004deb8 sp=0xc00004de88 pc=0x738896
    runtime.sigpanic()
    	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/signal_windows.go:260 +0x10c fp=0xc00004df00 sp=0xc00004deb8 pc=0x74b52c
    github.com/segmentio/encoding/proto.TestFix32Decode(0xc000037d40)
    	C:/Users/wdvxdr/Documents/Project/encoding/proto/decode_test.go:164 +0xbc fp=0xc00004df70 sp=0xc00004df00 pc=0x86ec3c
    testing.tRunner(0xc000037d40, 0x8cd970)
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1259 +0x102 fp=0xc00004dfc0 sp=0xc00004df70 pc=0x7f9dc2
    testing.(*T).Run·dwrap·21()
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1306 +0x2a fp=0xc00004dfe0 sp=0xc00004dfc0 pc=0x7faaca
    runtime.goexit()
    	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x767801
    created by testing.(*T).Run
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1306 +0x35a
    
    goroutine 1 [chan receive]:
    testing.(*T).Run(0xc000037ba0, {0x8c098e, 0x76a0d3}, 0x8cd970)
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1307 +0x375
    testing.runTests.func1(0xc0000706c0)
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1598 +0x6e
    testing.tRunner(0xc000037ba0, 0xc000079d18)
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1259 +0x102
    testing.runTests(0xc000102080, {0xa1a4e0, 0x17, 0x17}, {0x78a74d, 0x8c0705, 0x0})
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1596 +0x43f
    testing.(*M).Run(0xc000102080)
    	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1504 +0x51d
    main.main()
    	_testmain.go:119 +0x14b
    
    
    Process finished with the exit code 1
    
    
  • handling of time.Duration deviates from stdlib

    This library currently encodes and decodes time.Duration values as strings. This deviates from the stdlib, as the type time.Duration does not implement json.Marshaler and is treated as an int64 instead.

    Since this library implements this custom logic for both encode and decode, using it on both sides of an exchange of data is no problem. However, when interacting with other decoders (including the stdlib, but also extends to other languages/tools) the difference can be breaking.

    My understanding is that this particular behavior is relied on heavily within Segment, so simply changing it to adhere strictly to the stdlib will be breaking for us.

    There is currently no way to turn this functionality off, so in the short-term we should expose a config flag to support this. Over the long-term though, matching the stdlib should be the default and the flag should instead enable this additional functionality.

  • fix(json): improve handling of omitempty with embedded pointers

    This PR fixes #63 by improving the handling of embedded pointers. Not surprisingly, these are a tricky bunch to contend with, so I've included a few test cases to demonstrate the efficacy of this change, but would appreciate some feedback on additional test cases to include.

    In short, when passing a value to the "is empty" func, previously we weren't following pointers properly when they were embedded, since we failed to track the pointer address's own offset. This fix still feels kinda janky to me, especially as someone not especially comfortable with dealing with pointers like this, but it seems to get the job done.
