Molecule is a Go library for parsing protobufs in an efficient and zero-allocation manner


Molecule

Molecule is a Go library for parsing protobufs in an efficient and zero-allocation manner. The API is loosely based on jsonparser, an excellent Go JSON parsing library.

This library is in alpha and the API could change. The current APIs are fairly low level, but additional helpers may be added in the future to make certain operations more ergonomic.

Rationale

The standard Unmarshal protobuf interface in Go makes it difficult to manually control allocations when parsing protobufs. In addition, it's common to need only a subset of an individual protobuf's fields. These issues make it hard to use protobuf in performance-critical paths.

This library attempts to solve those problems by introducing a streaming, zero-allocation interface that allows users to have complete control over which fields are parsed, and how/when objects are allocated.

The downside, of course, is that molecule is more difficult to use (and easier to misuse) than the standard protobuf libraries, so it's recommended only for situations where performance is important. It is not a general-purpose replacement for proto.Unmarshal(). Users should familiarize themselves with the proto3 encoding before attempting to use this library.
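For readers new to the proto3 encoding, the sketch below decodes one field by hand. Every field starts with a varint tag equal to fieldNumber<<3 | wireType, followed by a value whose layout depends on the wire type (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit). The helpers decodeVarint and splitTag are hypothetical illustrations, not part of molecule's API.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadVarint is returned when the input ends mid-varint or the varint
// is longer than the 10 bytes a uint64 can need.
var errBadVarint = errors.New("truncated or overlong varint")

// decodeVarint reads a base-128 varint from b: each byte contributes its low
// 7 bits, and a set high bit means "more bytes follow". It returns the value
// and the number of bytes consumed.
func decodeVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		c := b[i]
		v |= uint64(c&0x7f) << (7 * uint(i))
		if c < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errBadVarint
}

// splitTag separates a decoded tag into its field number and wire type.
func splitTag(tag uint64) (fieldNum int32, wireType int8) {
	return int32(tag >> 3), int8(tag & 7)
}

func main() {
	// Field 1, wire type 0 (tag byte 0x08), value 150 encoded as 0x96 0x01.
	msg := []byte{0x08, 0x96, 0x01}
	tag, n, err := decodeVarint(msg)
	if err != nil {
		panic(err)
	}
	fieldNum, wireType := splitTag(tag)
	val, _, err := decodeVarint(msg[n:])
	if err != nil {
		panic(err)
	}
	fmt.Println(fieldNum, wireType, val) // 1 0 150
}
```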

Features

  1. Unmarshal all protobuf primitive types with a streaming, zero-allocation API.
  2. Support for iterating through protobuf messages in a streaming fashion.
  3. Support for iterating through packed protobuf repeated fields (arrays) in a streaming fashion.
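To illustrate what streaming iteration over a packed repeated field means at the byte level, here is a minimal sketch assuming a packed field of varints. The field's payload is just the varints concatenated, so they can be visited one at a time with a callback and no intermediate slice. The function forEachPackedVarint is hypothetical; molecule's own helpers operate on its codec.Buffer type instead.

```go
package main

import (
	"errors"
	"fmt"
)

var errBadVarint = errors.New("truncated or overlong varint")

// decodeVarint reads a base-128 varint, returning its value and length in bytes.
func decodeVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		c := b[i]
		v |= uint64(c&0x7f) << (7 * uint(i))
		if c < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errBadVarint
}

// forEachPackedVarint calls fn for every varint in payload (the bytes of a
// packed repeated field after its tag and length prefix). Nothing is
// allocated; returning false from fn stops the iteration early.
func forEachPackedVarint(payload []byte, fn func(v uint64) bool) error {
	for len(payload) > 0 {
		v, n, err := decodeVarint(payload)
		if err != nil {
			return err
		}
		payload = payload[n:]
		if !fn(v) {
			return nil
		}
	}
	return nil
}

func main() {
	// Field 4, wire type 2 (tag 0x22), length 6, values 3, 270 and 86942.
	field := []byte{0x22, 0x06, 0x03, 0x8E, 0x02, 0x9E, 0xA7, 0x05}
	payload := field[2:] // skip the one-byte tag and the one-byte length
	var sum uint64
	err := forEachPackedVarint(payload, func(v uint64) bool {
		sum += v
		return true
	})
	fmt.Println(sum, err) // 87215 <nil>
}
```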

Not Supported

  1. Proto2 syntax (some things will probably work, but nothing is tested).
  2. Repeated fields that do not use the "packed" encoding (in theory they can be parsed with this library, but there are no special helpers for them).
  3. Map fields. It should be possible to parse maps using this library's API, but it would be a bit tedious. I plan on adding better support for this once I settle on a reasonable API.
  4. Probably lots of other things.
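As an illustration of why maps are tedious but possible: on the wire, a map<K, V> field is encoded as a repeated embedded message in which each entry carries the key in field 1 and the value in field 2. The sketch below decodes a single entry of a hypothetical map<string, uint64> field by hand. parseMapEntry is an illustrative name, not a molecule function, and for brevity it rejects unknown fields instead of skipping them.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errBadVarint  = errors.New("truncated or overlong varint")
	errTruncated  = errors.New("length-delimited value runs past end of entry")
	errUnexpected = errors.New("unexpected field in map entry")
)

// decodeVarint reads a base-128 varint, returning its value and length in bytes.
func decodeVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		c := b[i]
		v |= uint64(c&0x7f) << (7 * uint(i))
		if c < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errBadVarint
}

// parseMapEntry decodes one map<string, uint64> entry: key in field 1
// (wire type 2), value in field 2 (wire type 0). A missing key or value keeps
// its zero value, as in proto3. The returned key aliases entry (no copy).
func parseMapEntry(entry []byte) ([]byte, uint64, error) {
	var key []byte
	var val uint64
	for len(entry) > 0 {
		tag, n, err := decodeVarint(entry)
		if err != nil {
			return nil, 0, err
		}
		entry = entry[n:]
		fieldNum, wireType := tag>>3, tag&7
		switch {
		case fieldNum == 1 && wireType == 2:
			l, m, err := decodeVarint(entry)
			if err != nil {
				return nil, 0, err
			}
			entry = entry[m:]
			if l > uint64(len(entry)) {
				return nil, 0, errTruncated
			}
			key, entry = entry[:l:l], entry[l:]
		case fieldNum == 2 && wireType == 0:
			v, m, err := decodeVarint(entry)
			if err != nil {
				return nil, 0, err
			}
			val, entry = v, entry[m:]
		default:
			return nil, 0, errUnexpected
		}
	}
	return key, val, nil
}

func main() {
	// Entry {"a": 1}: 0x0A 0x01 'a' is the key, 0x10 0x01 is the value.
	key, val, err := parseMapEntry([]byte{0x0A, 0x01, 'a', 0x10, 0x01})
	fmt.Println(string(key), val, err) // a 1 <nil>
}
```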

Examples

The godocs have numerous runnable examples.

Attributions

This library is mostly a thin wrapper around other people's work:

  1. The interface was inspired by the jsonparser library.
  2. The codec for interacting with protobuf streams was lifted from a protobuf reflection library. The code was manually vendored instead of imported to reduce dependencies.

Dependencies

The core molecule library has zero external dependencies. The go.sum file does contain some dependencies introduced by the tests package; however, those should not be included transitively when using this library.

Comments
  • DecodeVarint: Fix panic on truncated input; Add tests

    Replace references to cb.len with cb.Len() so that bounds checks use the number of remaining bytes (cb.len - cb.index). When some values have already been parsed, DecodeVarint is called with cb.index > 0, which means the existing bounds checks were wrong.

    Commit 775411c3fa added faster decoding of varints but omitted some of the bounds checks. As a result, parsing truncated input with this library now panics instead of returning an error.

    Add a fuzz test to ensure this continues to work. Add a deterministic unit test extracted from the fuzz test as an easy-to-understand example.

  • Incomplete error handling

    func MessageEach(buffer *codec.Buffer, fn MessageEachFn) error {
    	for !buffer.EOF() {
    		fieldNum, wireType, err := buffer.DecodeTagAndWireType()
    		if err == io.EOF {
    			return nil
    		}
    		// Any other non-nil err falls through unchecked...
    
    		value, err := readValueFromBuffer(wireType, buffer)
    		// ...and is overwritten here before anything inspects it.
    

    Non-nil, non-io.EOF errors from the decode method are ignored. Oops! 😅

  • Implement ProtoStream.Write()

    There are two use cases:

    1. Writing pre-encoded buffers into the stream.
    2. Writing string fields that can sometimes be "" without an if/else.

    A real-world example for the second use case is writing entries into pprof's Profile.string_table [1] where the first entry must be an empty string:

    func writeStringTable(ps *molecule.ProtoStream, str string) {
      ps.Embedded(6, func(ps *molecule.ProtoStream) error {
        _, err := ps.Write([]byte(str))
        return err
      })
    }
    
    writeStringTable(ps, "") // must be written (!)
    writeStringTable(ps, "foo")
    

    Without this patch, one would have to write something like this:

    func writeStringTable(ps *molecule.ProtoStream, str string) {
      if len(str) == 0 {
        ps.Embedded(6, func(ps *molecule.ProtoStream) error {
          return nil
        })
      } else {
        ps.String(6, str)
      }
    }
    
    writeStringTable(ps, "") // must be written (!)
    writeStringTable(ps, "foo")
    

    [1] https://github.com/google/pprof/blob/d260c55eee4ceee9be31ddc2517f54a0d92b85ec/proto/profile.proto#L65-L67
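At the byte level, writing the empty entry unconditionally just means emitting the tag and a zero length. The workaround above suggests that ps.String omits empty strings, which is why the empty entry needs special handling; that reading is an assumption. The sketch below uses hypothetical appendVarint and appendStringField helpers (not molecule's ProtoStream API) to show the bytes for field 6: the tag is 6<<3 | 2 = 0x32, followed by the length and the string bytes.

```go
package main

import "fmt"

// appendVarint appends v to dst as a base-128 varint.
func appendVarint(dst []byte, v uint64) []byte {
	for v >= 0x80 {
		dst = append(dst, byte(v)|0x80)
		v >>= 7
	}
	return append(dst, byte(v))
}

// appendStringField appends a length-delimited (wire type 2) field, even
// when s is empty.
func appendStringField(dst []byte, fieldNum uint64, s string) []byte {
	dst = appendVarint(dst, fieldNum<<3|2)
	dst = appendVarint(dst, uint64(len(s)))
	return append(dst, s...)
}

func main() {
	var buf []byte
	buf = appendStringField(buf, 6, "") // string_table[0] must be present
	buf = appendStringField(buf, 6, "foo")
	fmt.Printf("% x\n", buf) // 32 00 32 03 66 6f 6f
}
```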

  • Set a cap on slices returned from AsBytesUnsafe

    This will prevent callers from reading past the end of the slice returned from AsBytesUnsafe.

    The benchmark below shows no meaningful change (count=10):

    name                                         old time/op    new time/op    delta
    Molecule/standard_unmarshal-8                  1.45µs ± 1%    1.44µs ± 1%  -0.68%  (p=0.025 n=9+8)
    Molecule/unmarshal_all-8                        169ns ± 1%     170ns ± 3%    ~     (p=0.642 n=10+10)
    Molecule/unmarshal_loop-8                       154ns ± 1%     154ns ± 2%    ~     (p=0.561 n=9+10)
    Molecule/unmarshal_single_with_molecule-8       124ns ± 1%     124ns ± 2%    ~     (p=0.725 n=10+10)
    Molecule/unmarshal_multiple_with_molecule-8     746ns ± 1%     748ns ± 1%    ~     (p=0.565 n=10+10)
    Simple-8                                        270ns ± 2%     268ns ± 0%    ~     (p=0.617 n=10+8)
    Packing-8                                       179µs ±10%     167µs ±12%    ~     (p=0.156 n=10+9)
    
    name                                         old alloc/op   new alloc/op   delta
    Simple-8                                        1.00B ± 0%     1.00B ± 0%    ~     (all equal)
    Packing-8                                        835B ± 5%      832B ± 2%    ~     (p=0.880 n=10+8)
    
    name                                         old allocs/op  new allocs/op  delta
    Simple-8                                         1.00 ± 0%      1.00 ± 0%    ~     (all equal)
    Packing-8                                        0.00           0.00         ~     (all equal)
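The change relies on Go's full slice expression, a[low:high:max], which sets the result's capacity to max - low. A minimal sketch (capSlice is a hypothetical helper, not molecule's code) showing that append on a capped slice reallocates instead of writing into the bytes that follow it in the original buffer:

```go
package main

import "fmt"

// capSlice returns buf[low:high] with its capacity clipped to its length, so
// a later append must allocate a new array instead of overwriting buf[high:].
func capSlice(buf []byte, low, high int) []byte {
	return buf[low:high:high]
}

func main() {
	buf := []byte("abcdef")
	loose := buf[0:3]          // capacity extends to the end of buf
	loose = append(loose, 'X') // writes into buf[3]
	fmt.Println(string(buf), string(loose)) // abcXef abcX

	buf2 := []byte("abcdef")
	capped := capSlice(buf2, 0, 3) // len 3, cap 3
	capped = append(capped, 'X')   // reallocates; buf2 is untouched
	fmt.Println(string(buf2), string(capped)) // abcdef abcX
}
```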
    
  • unsafeBytesToString: Fix Go 1.16 SliceHeader vet warnings

    Go 1.16 added new vet warnings to check for unsafe uses of reflect.SliceHeader. To fix them, create an empty slice (or string), then alter it through a *reflect.SliceHeader/*reflect.StringHeader pointer. This is apparently safer because it works with escape analysis and garbage collection [1]. The pattern used here is also used in golang.org/x/sys, so I think this should be safe.

    [1] https://github.com/golang/go/issues/40701

    Fixes:

    value.go:149:35: possible misuse of reflect.StringHeader
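A sketch of the pattern described above, assuming the goal is a zero-copy []byte-to-string conversion: start from a real (empty) string variable and modify its header through a pointer, rather than constructing a reflect.StringHeader value. This is an illustration, not necessarily molecule's exact code. On Go 1.20 and later, unsafe.String(unsafe.SliceData(b), len(b)) performs the same conversion without reflect headers.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// unsafeBytesToString returns a string that shares b's memory without
// copying. It modifies the header of a genuine string variable in place, so
// the garbage collector still sees the data word as a pointer. The caller
// must not modify b while the returned string is in use.
func unsafeBytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	var s string
	hdr := (*reflect.StringHeader)(unsafe.Pointer(&s))
	hdr.Data = uintptr(unsafe.Pointer(&b[0]))
	hdr.Len = len(b)
	return s
}

func main() {
	fmt.Println(unsafeBytesToString([]byte("molecule"))) // molecule
}
```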

  • Improve decoding throughput

    This commit combines a few changes that improve the decoding throughput.

    name                                         old time/op  new time/op  delta
    Molecule/standard_unmarshal-8                1.28µs ± 8%  1.32µs ± 4%   +2.64%  (p=0.000 n=23+23)
    Molecule/unmarshal_all-8                      330ns ± 2%   211ns ± 6%  -36.09%  (p=0.000 n=25+21)
    Molecule/unmarshal_loop-8                     322ns ± 1%   169ns ± 3%  -47.62%  (p=0.000 n=22+22)
    Molecule/unmarshal_single_with_molecule-8     257ns ± 2%   165ns ± 2%  -35.72%  (p=0.000 n=21+21)
    Molecule/unmarshal_multiple_with_molecule-8  1.49µs ± 7%  0.99µs ± 4%  -33.36%  (p=0.000 n=25+24)
    

    I attempted to switch one of our apps to use molecule instead of the gogo-generated code, but I found that CPU usage with molecule was significantly worse.

    The most obvious thing in the profiler was DecodeVarint, so I initially focused my efforts there. I originally replaced DecodeVarint with the body of decodeVarintSlow, as this allowed the function to be inlined. However, the implementation from https://github.com/dennwc/varint was faster still, even though it is too complex to be inlined.

    The improvements here mainly focus on either inlining methods directly or modifying them so that they are eligible to be inlined. My intuition is that call-site overhead was contributing to the CPU usage, so inlining where possible seemed to help.

    I also added a Next() method that provides a different way of iterating over the values. You can see in the benchmark that this is significantly faster than using MessageEach; again, I think this is due to call-site overhead / stack management. I'm not sure how you feel about having competing APIs like this, but the improvement seems worthwhile to me.

    Open to suggestions on how to improve this further.
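To show the difference between a callback API like MessageEach and a pull-style Next() loop, here is a minimal sketch of a field iterator. The fieldIter type, its fields, and its Err method are hypothetical; the comment above does not show the signature of molecule's Next(). Each call to Next decodes one tag and slices out the raw value without allocating, and the caller's loop body takes the place of the callback.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errBadVarint   = errors.New("truncated or overlong varint")
	errTruncated   = errors.New("value runs past end of buffer")
	errUnsupported = errors.New("unsupported wire type")
)

// decodeVarint reads a base-128 varint, returning its value and length in bytes.
func decodeVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		c := b[i]
		v |= uint64(c&0x7f) << (7 * uint(i))
		if c < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errBadVarint
}

// fieldIter is a hypothetical pull-style iterator over one message's fields.
type fieldIter struct {
	buf      []byte // unread input
	FieldNum uint64
	WireType uint64
	Value    []byte // raw value bytes, aliasing the input (no copy)
	err      error
}

// Next advances to the next field. It returns false at the end of the input
// or on a decoding error; call Err to tell the two apart.
func (it *fieldIter) Next() bool {
	if it.err != nil || len(it.buf) == 0 {
		return false
	}
	tag, n, err := decodeVarint(it.buf)
	if err != nil {
		it.err = err
		return false
	}
	it.FieldNum, it.WireType = tag>>3, tag&7
	rest := it.buf[n:]
	size := 0
	switch it.WireType {
	case 0: // varint: keep its encoded bytes
		_, size, err = decodeVarint(rest)
	case 1: // fixed 64-bit
		size = 8
	case 5: // fixed 32-bit
		size = 4
	case 2: // length-delimited: skip the length prefix
		var l uint64
		var m int
		if l, m, err = decodeVarint(rest); err == nil {
			rest = rest[m:]
			if l > uint64(len(rest)) {
				err = errTruncated
			} else {
				size = int(l)
			}
		}
	default:
		err = errUnsupported
	}
	if err == nil && size > len(rest) {
		err = errTruncated
	}
	if err != nil {
		it.err = err
		return false
	}
	it.Value = rest[:size:size]
	it.buf = rest[size:]
	return true
}

// Err returns the first decoding error, or nil if the input ended cleanly.
func (it *fieldIter) Err() error { return it.err }

func main() {
	// Field 1 = varint 150, field 2 = string "hi".
	it := &fieldIter{buf: []byte{0x08, 0x96, 0x01, 0x12, 0x02, 'h', 'i'}}
	for it.Next() {
		fmt.Println(it.FieldNum, it.WireType, it.Value)
	}
	if err := it.Err(); err != nil {
		fmt.Println("error:", err)
	}
}
```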
