binary serialization format

Colfer Build Status

Colfer is a binary serialization format optimized for speed and size.

The project's compiler colf(1) generates source code from schema definitions to marshal and unmarshall data structures.

This is free and unencumbered software released into the public domain. The format is inspired by Protocol Buffers.

Language Support

  • C, ISO/IEC 9899:2011 compliant a.k.a. C11, C++ compatible
  • Go, a.k.a. golang
  • Java, Android compatible
  • JavaScript, a.k.a. ECMAScript, NodeJS compatible

Features

  • Simple and straightforward in use
  • No dependencies other than the core library
  • Both faster and smaller than the competition
  • Robust against malicious input
  • Maximum of 127 fields per data structure
  • No support for enumerations
  • Framed; suitable for concatenation/streaming

TODO's

  • Rust and Python support
  • Protocol revision

Use

Download a prebuilt compiler or run go get -u github.com/pascaldekloe/colfer/cmd/colf to make one yourself. Homebrew users can also brew install colfer.

The command prints its own manual when invoked without arguments.

NAME
	colf — compile Colfer schemas

SYNOPSIS
	colf [-h]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] C [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-s expression] [-l expression] Go [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-x class] [-i interfaces] [-c file] \
		[-s expression] [-l expression] Java [file ...]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] JavaScript [file ...]

DESCRIPTION
	The output is source code for either C, Go, Java or JavaScript.

	For each operand that names a file of a type other than
	directory, colf reads the content as schema input. For each
	named directory, colf reads all files with a .colf extension
	within that directory. If no operands are given, the contents of
	the current directory are used.

	A package definition may be spread over several schema files.
	The directory hierarchy of the input is not relevant to the
	generated code.

OPTIONS
  -b directory
    	Use a base directory for the generated code. (default ".")
  -c file
    	Insert a code snippet from a file.
  -f	Normalize the format of all schema input on the fly.
  -h	Prints the manual to standard error.
  -i interfaces
    	Make all generated classes implement one or more interfaces.
    	Use commas as a list separator.
  -l expression
    	Set the default upper limit for the number of elements in a
    	list. The expression is applied to the target language under
    	the name ColferListMax. (default "64 * 1024")
  -p package
    	Compile to a package prefix.
  -s expression
    	Set the default upper limit for serial byte sizes. The
    	expression is applied to the target language under the name
    	ColferSizeMax. (default "16 * 1024 * 1024")
  -t files
    	Supply custom tags with one or more files. Use commas as a list
    	separator. See the TAGS section for details.
  -v	Enable verbose reporting to standard error.
  -x class
    	Make all generated classes extend a super class.

TAGS
	Tags, a.k.a. annotations, are source code additions for structs
	and/or fields. Input for the compiler can be specified with the
	-f option. The data format is line-oriented.

		<line> :≡ <qual> <space> <code> ;
		<qual> :≡ <package> '.' <dest> ;
		<dest> :≡ <struct> | <struct> '.' <field> ;

	Lines starting with a '#' are ignored (as comments). Java output
	can take multiple tag lines for the same struct or field. Each
	code line is applied in order of appearance.

EXIT STATUS
	The command exits 0 on success, 1 on error and 2 when invoked
	without arguments.

EXAMPLES
	Compile ./io.colf with compact limits as C:

		colf -b src -s 2048 -l 96 C io.colf

	Compile ./*.colf with a common parent as Java:

		colf -p com.example.model -x com.example.io.IOBean Java

BUGS
	Report bugs at <https://github.com/pascaldekloe/colfer/issues>.

	Text validation is not part of the marshalling and unmarshalling
	process. C and Go just pass any malformed UTF-8 characters. Java
	and JavaScript replace unmappable content with the '?' character
	(ASCII 63).

SEE ALSO
	protoc(1), flatc(1)

It is recommended to commit the generated source code into the respective version control to preserve build consistency and minimise the need for compiler installations. Alternatively, you may use the Maven plugin.

<plugin>
	<groupId>net.quies.colfer</groupId>
	<artifactId>colfer-maven-plugin</artifactId>
	<version>1.11.2</version>
	<configuration>
		<packagePrefix>com/example</packagePrefix>
	</configuration>
</plugin>

Schema

Data structures are defined in .colf files. The format is quite self-explanatory.

// Package demo offers a demonstration.
// These comment lines will end up in the generated code.
package demo

// Course is the grounds where the game of golf is played.
type course struct {
	ID    uint64
	name  text
	holes []hole
	image binary
	tags  []text
}

type hole struct {
	// Lat is the latitude of the cup.
	lat float64
	// Lon is the longitude of the cup.
	lon float64
	// Par is the difficulty index.
	par uint8
	// Water marks the presence of water.
	water bool
	// Sand marks the presence of sand.
	sand bool
}

See what the generated code looks like in C, Go, Java or JavaScript.

The following table shows how Colfer data types are applied per language.

Colfer C Go Java JavaScript
bool char bool boolean Boolean
uint8 uint8_t uint8 byte † Number
uint16 uint16_t uint16 short † Number
uint32 uint32_t uint32 int † Number
uint64 uint64_t uint64 long † Number ‡
int32 int32_t int32 int Number
int64 int64_t int64 long Number ‡
float32 float float32 float Number
float64 double float64 double Number
timestamp timespec time.Time †† time.Instant Date + Number
text const char* + size_t string String String
binary uint8_t* + size_t []byte byte[] Uint8Array
list * + size_t slice array Array
  • † signed representation of unsigned data, i.e. may overflow to negative.
  • ‡ range limited to [1 - 2⁵³, 2⁵³ - 1]
  • †† timezone not preserved

Lists may contain floating points, text, binaries or data structures.

Security

Colfer is suited for untrusted data sources such as network I/O or bulk streams. Marshalling and unmarshalling comes with built-in size protection to ensure predictable memory consumption. The format prevents memory bombs by design.

The marshaller may not produce malformed output, regardless of the data input. In no event may the unmarshaller read outside the boundaries of a serial. Fuzz testing did not reveal any volnurabilities yet. Computing power is welcome.

Compatibility

Name changes do not affect the serialization format. Deprecated fields should be renamed to clearly discourage their use. For backwards compatibility new fields must be added to the end of colfer structs. Thus the number of fields can be seen as the schema version.

Performance

Colfer aims to be the fastest and the smallest format without compromising on reliability. See the benchmark wiki for a comparison. Suboptimal performance is treated like a bug.

Owner
Pascal S. de Kloe
freelance engineer
Pascal S. de Kloe
Comments
  • No JavaScript Benchmarks?

    No JavaScript Benchmarks?

    Would love to see how the built-in JSON and Protobuf fare against colfer. Its kinda unfair to see C, Go, and Java make it to the benchmark games and JS stays at home. Could it be that the JS version couldn't live up to the Colfer performance promise?

  • Add superclass to generated Java classes

    Add superclass to generated Java classes

    Adds a common interface to the generated Java - I'd like to write code that interacts with any Colfer object, and this makes it quite a bit simpler.

    I'm not much of a go programmer, so please let me know if something needs correction.

  • Segmentation fault in a unique case

    Segmentation fault in a unique case

    Hi, I hope all are great.

    Scenario: I have the following mapping in my chappee.colf file.

    package chappee type mappee { maptype uint32 msgtype uint32 callmsg text debugleve text minseed uint64 maxseed uint64 }

    During packing, if I set the last two values, the packing is successful. Now, if during packing I pack first 4 (or any of them) and don't pack the last 2, I get a segmentation fault. I think it would be great that Colfer itself should take care of the values which are not set.

    It can be taken as a bug, or it can be taken as a feature request.

    Cheers, infoginx.com

  • Document UTF-8 Conversion in Spec + Simplify marshall/unmarshall Implementations.

    Document UTF-8 Conversion in Spec + Simplify marshall/unmarshall Implementations.

    After looking at this I thought we were doing something special, but it turns out some characters weren't encoded right. Single character string: '𐍈' is an example

    We should switch to utf8.js since it makes the code easier to debug. If we wanted to save memory allocations, we could make it re-use a fixed buffer and return the buffer, length pair. And then we can rewrite the code for subsequent for adjusting the size in.

    The Java version can be simplified with new String(bytes, utf8CharSet); and s.getBytes(utf8CharSet); As of Java 8, they decided to move away from UTF-16 as a default encoding, and previously were into UTF16. So we should just use the native implementation as much as possible than invent ours.

    @pascaldekloe would like to hear your thoughts. :)

  • Comments in generated code

    Comments in generated code

    Hi,

    colfer place the name of schema file to the comment it generates via Maven Plugin, I am using a Windows machine and colfer does not escape the "" in file path and java compiler then complains about 'illegal escape character', is there a way to disable this comment or let it the colfer escape it before putting in the comment?

    Thx

  • BufferOverflowException in writeObject

    BufferOverflowException in writeObject

    Maybe I missed something, but it looks like a bug: In Java, in generated class there is writeObject(ObjectOutputStream out) method which uses buf array of size 1024 by default. In case of java.nio.BufferUnderflowException size of it is enlarged 4 times. But in case when buf array is too small to handle marshalled object, java.nio.BufferOverflowException is thrown by marshal(byte[] buf, int offset) method, not java.nio.BufferUnderflowException, so buf is not enlarged and writeObject ends with exception.

  • why not making colf java based on annotation processing

    why not making colf java based on annotation processing

    @Colfer could be used to identify pojo classes. Then the annotation processor would scan the class and generate the code for serialization / deserialization.

    This is the most popular java way to do it. Annotation processors are a standard part of the java compiler.

    Colf has a great potential on Android. And annotation processors are very common in Android builds.

  • Document limitation: Only 126 fields, ever, no removing

    Document limitation: Only 126 fields, ever, no removing

    I wish it had been documented that a colf type cannot have more than 126 fields, ever, and this includes "deleted" fields.

    (Why 126? "Each definition starts with an 8-bit header. The 7 least significant bits identify the field by its (0-based position) index in the schema." and 127 is reserved for the termination byte.)

    Related, I wish https://github.com/pascaldekloe/colfer/#compatibility had explicitly said "Fields can never be removed."

  • Timezone offset for Go

    Timezone offset for Go

    It would be great if this could be supported. Right now I have to serialise extra fields to rematerialize it correctly.

    I think, unless it is very hard, that I can give it a stab if you point me in the right direction.

  • Is this used in production anywhere?

    Is this used in production anywhere?

    Hi, I work for a large mobile app company that is interested in replacing JSON with something faster. The one disappointing thing is that colfer is written in Go and JS it looks like, which we can't use for iOS or Android (without a fair amount of finagling at least), so are there any plans for it to get ported to C/C++? Also, are there any companies that use it? It would be great to see a list of some companies that use or even their experiences, as that would really sell the format. How much security auditing has occurred?

    The numbers on https://github.com/eishay/jvm-serializers/wiki are tantalizing, so I'm curious to hear more :)

  • Document uint16 better.

    Document uint16 better.

    Why is uint16's compressed and uncompressed flipping the behavior of (index | 0x80)? This flag sort of is the anti-pattern of what uint32 and uint64 do. It's a bit confusing.

  • Big Decimal Support

    Big Decimal Support

    Hi! Thanks for this excelent serializer!

    Please add suport for big.Int and decimal.Big !

    My main need is the following golang types (biginteger,native) math/big.Int and (bigdecimal) github.com/ericlagergren/decimal.Big

    for others langs I don't care, but I would suggest: java => (biginteger, native) https://docs.oracle.com/javase/10/docs/api/java/math/BigInteger.html (bigdecimal, native) https://docs.oracle.com/javase/10/docs/api/java/math/BigDecimal.html JavaScrip => (biginteger, native) https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt (or, biginteger) https://github.com/MikeMcl/decimal.js (bigdecimal) https://github.com/MikeMcl/decimal.js C => (biginteger) https://github.com/libtom/libtommath (bigdecimal) https://github.com/libtom/libtomfloat

    Thanks Very Very Very Much!

    Best Whishes, Dani.

  • Adding Dart

    Adding Dart

    For https://github.com/pascaldekloe/colfer/issues/71

    • Added Dart module, with tests ** MinInt64 as uint64 is wired directly because of the overflow to negative, because that's the only branch that would have needed counting bytes. I added another test with MinInt64+1 to prove that the rest works as expected. ** I didn't add EOF as it's in the Go code, mainly because using the library is very similar in these two cases. Dart error handling is through try-catch with errors and exceptions. By default, errors are used for programmatic errors, exceptions for user input errors, so it's not totally idiomatic, looks a little hacky. But in the end catching RangeError instead of a defined EOF exception are very similar to me, tested it as well
    • updated Readme ** updated footnote characters, because in theory it's possible to have multiple of them in any field. I took the rest from here: https://en.wikipedia.org/wiki/Note_(typography)

    what's missing:

    • benchmark
    • I did my best, and I'm pretty sure that it's working as-is, but a review from a more experienced Dart dev would be good
  • Dart support

    Dart support

    Hi,

    Awesome lib, thanks! I'm planning to add a dart implementation. Writing it here, so if anyone else thinks about to add it, we don't do it multiple times.

  • Rust support

    Rust support

    Hello,

    I wanted to ask about rust support.

    Considering how such code is usually generated with macros, should we just create a crate that just implements the protocol/ser/deser logic? Also, conforming with Serde to get automatic ser/deser of rust struct would be a very nice addition as well.

    Cheers,

    Mathieu

Musgo is a Go code generator for binary MUS format with validation support.

Musgo is a Go code generator for binary MUS format with validation support. Generated code converts data to and from MUS format.

Dec 29, 2022
Golang binary decoder for mapping data into the structure

binstruct Golang binary decoder to structure Install go get -u github.com/ghostiam/binstruct Examples ZIP decoder PNG decoder Use For struct From file

Dec 17, 2022
Simple, specialised, and efficient binary marshaling

?? surge Documentation A library for fast binary (un)marshaling. Designed to be used in Byzantine networks, ?? surge never explicitly panics, protects

Oct 4, 2022
Encode and decode binary message and file formats in Go

Encode and Decode Binary Formats in Go This module wraps the package encoding/binary of the Go standard library and provides the missing Marshal() and

Dec 22, 2022
The Snappy compression format in the Go programming language.

The Snappy compression format in the Go programming language. To download and install from source: $ go get github.com/golang/snappy Unless otherwis

Jan 4, 2023
binary serialization format

Colfer Colfer is a binary serialization format optimized for speed and size. The project's compiler colf(1) generates source code from schema definiti

Dec 25, 2022
Benchmarks of Go serialization methods

Benchmarks of Go serialization methods This is a test suite for benchmarking various Go serialization methods. Tested serialization methods encoding/g

Jan 1, 2023
faster JSON serialization for Go

ffjson: faster JSON for Go ffjson generates static MarshalJSON and UnmarshalJSON functions for structures in Go. The generated functions reduce the re

Dec 25, 2022
A k-mer serialization package for Golang
A k-mer serialization package for Golang

.uniq v5 This package provides k-mer serialization methods for the package kmers, TaxIds of k-mers are optionally saved, while there's no frequency in

Aug 19, 2022
Implementation of Ethernet 2, 802.1Q, 802.1P, 802.11 (Wireless Ethernet) frame serialization/deserialization

Implementation of Ethernet 2, 802.1Q, 802.1P, 802.11 (Wireless Ethernet) frame serialization/deserialization

Feb 5, 2022
searchHIBP is a golang tool that implements binary search over a hash ordered binary file.

searchHIBP is a golang tool that implements binary search over a hash ordered binary file.

Nov 9, 2021
Musgo is a Go code generator for binary MUS format with validation support.

Musgo is a Go code generator for binary MUS format with validation support. Generated code converts data to and from MUS format.

Dec 29, 2022
archy is an static binary to determine current kernel and machine architecture, with backwards compatible flags to uname, and offers alternative output format of Go runtime (i.e. GOOS, GOARCH).

archy archy is an simple binary to determine current kernel and machine architecture, which wraps uname and alternatively can read from Go runtime std

Mar 18, 2022
Read metrics from a Message Queue in Json format and expose them in a Prometheus compatible format

mq2prom Read metrics from a Message Queue in Json format and expose them in a Prometheus compatible format. Currently only works for MQTT compatible M

Jan 24, 2022
Using NFP (Number Format Parser) you can get an Abstract Syntax Tree (AST) from Excel number format expression

NFP (Number Format Parser) Using NFP (Number Format Parser) you can get an Abstract Syntax Tree (AST) from Excel number format expression. Installatio

Feb 4, 2022
A memcached binary protocol toolkit for go.

gomemcached This is a memcached binary protocol toolkit in go. It provides client and server functionality as well as a little sample server showing h

Nov 9, 2022
A binary stream packer and unpacker

binpacker A binary packer and unpacker. Install go get github.com/zhuangsirui/binpacker Examples Packer buffer := new(bytes.Buffer) packer := binpacke

Dec 1, 2022
A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Dec 23, 2022
Golang MySql binary log replication listener

Go MySql binary log replication listener Pure Go Implementation of MySQL replication protocol. This allow you to receive event like insert, update, de

Oct 25, 2022
Sending line notifications using a binary, docker or Drone CI.
Sending line notifications using a binary, docker or Drone CI.

drone-line Sending line notifications using a binary, docker or Drone CI. Register Line BOT API Trial Please refer to LINE Business Center. Feature Se

Sep 27, 2022