Experimental wire-format protobuf canonicalizer

wirepb

This repository implements an experimental canonical form for the wire-format encoding of protocol buffer messages.

Specifically, the wirepb.Canonical function implements Algorithm 2 described below.

Background

The Protocol Buffer encoding rules do not guarantee a consistent binary encoding for any given message. The encoder is allowed to store field encodings in any order, regardless of their tag values or their order of declaration within the protobuf schema.

Even within a single process, the encoder is allowed to produce different output for repeated encodings of the same message. Although most encoders do not encode differently by design, messages that use maps, or that are encoded with different options, often produce different outputs. This can be true even if there are no unknown fields.

While a correct encoding must decode to an equivalent message, these rules mean that hashes, checksums, and fingerprints of wire-format protobuf messages are not guaranteed to be stable over time.

Applications that need to compute reliable hashes, checksums, or fingerprints of message structures should generally define them in terms of the schematic structure of the type.

This is relatively easy to do for a single language, but can be tricky when the same value needs to be computed across multiple languages that share the protocol buffer as an interchange format. The in-memory organization of data types like strings, integers, floating-point values, arrays, and maps differs between languages, so a rule that is easy to implement in one language may be inefficient or impractical in another.

Some projects work around this by avoiding "problematic" constructs like maps, enabling the "deterministic" option in encoders that support it, and relying on the encoder to respect language-specific properties like array and tag order when laying out messages in memory. These tactics help, but are insufficient. Moreover, they often do not generalize across languages.

Canonical Layout

One possible solution to this dilemma is to define a rule for canonical layout in the wire format. We can take advantage of the rule that the encoder is allowed to emit, and the decoder is required to accept, arbitrary ordering of fields within a message.

In summary, the rule for canonical layout uses the fact that a wire encoding is a concatenated sequence of key|value strings, where the key comprises a field tag number and a "wire type" sufficient to describe the number and layout of the value bytes.
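
As a rough illustration of the key layout, the sketch below splits a key into its field tag number and wire type. It relies on the wire format's rule that the key is a varint whose low three bits hold the wire type and whose remaining bits hold the tag. The function name splitKey is hypothetical and is not part of this repository.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitKey (a hypothetical helper) decodes the varint key at the start of
// buf. It returns the field tag number (upper bits), the wire type (low
// three bits), and the number of bytes the key occupied.
func splitKey(buf []byte) (uint64, uint8, int, error) {
	key, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, 0, errors.New("invalid or truncated key")
	}
	return key >> 3, uint8(key & 7), n, nil
}

func main() {
	// 0x08 is tag 1 with wire type 0 (varint);
	// 0x12 is tag 2 with wire type 2 (length-delimited).
	for _, b := range [][]byte{{0x08}, {0x12}} {
		tag, wireType, _, err := splitKey(b)
		fmt.Println(tag, wireType, err)
	}
}
```

The wire type alone tells a parser how many value bytes follow the key, which is why fields can be split apart and reordered without consulting a schema.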

To impose a consistent ordering, consider the following algorithm:

Algorithm 1

  1. Parse the message into a sequence of (type, tag, value) tuples.
  2. Sort the tuples by tag (numerically), then value (lexicographically).
  3. Rearrange the fields into the resulting order.
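
The sort in step 2 might look like the following sketch, which assumes the tuples have already been parsed. The field type and the sortFields function are illustrative names, and comparing the raw value bytes (without the key) is an assumption about what "value" means here.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// field is one parsed (type, tag, value) tuple. The names are illustrative.
type field struct {
	WireType uint8
	Tag      uint64
	Value    []byte // value bytes as they appear on the wire, without the key
}

// sortFields orders tuples by tag (numerically), then by value bytes
// (lexicographically). The sort is stable, so fields that compare equal
// keep their original relative order.
func sortFields(fs []field) {
	sort.SliceStable(fs, func(i, j int) bool {
		if fs[i].Tag != fs[j].Tag {
			return fs[i].Tag < fs[j].Tag
		}
		return bytes.Compare(fs[i].Value, fs[j].Value) < 0
	})
}

func main() {
	fs := []field{
		{WireType: 0, Tag: 2, Value: []byte{0x01}},
		{WireType: 2, Tag: 1, Value: []byte("b")},
		{WireType: 2, Tag: 1, Value: []byte("a")},
	}
	sortFields(fs)
	for _, f := range fs {
		fmt.Printf("tag=%d value=%q\n", f.Tag, f.Value)
	}
}
```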

Algorithm 1 is not quite sufficient, however: A message field may contain another message encoded as a string, and Algorithm 1 leaves the fields inside that string in their original order. Furthermore, the wire encoding does not distinguish between the encoding of a message and the encoding of an arbitrary byte string.

The only way to know for sure whether a field contains a message or an opaque string is to consult the schema. However, we can work around this by modifying the algorithm as follows:

Algorithm 2

  1. Parse the input into a sequence of (type, tag, value) tuples. If parsing fails, return the input unmodified. Otherwise:
  2. For each tuple whose wire type is "string" (that is, length-delimited), apply Algorithm 2 to its value.
  3. Sort the resulting tuples by tag (numerically), then value (lexicographically).
  4. Rearrange the fields into the resulting order.

Algorithm 2 fixes most of the problems with Algorithm 1. There are three main issues remaining:

  • The algorithm cannot distinguish between a field of opaque string type that contains the wire encoding of a message, and a field of message type. Compare:

    message M1 {
      bytes content = 1;
    }
    message M2 {
      Foo content = 1;
    }

    where the content field of M1 contains the wire encoding of a Foo. Algorithm 2 would canonicalize both of these messages identically.

    This is semantically safe: Decoding the result as M1 and then decoding its content field would produce the same (valid) Foo as decoding the result as M2 and observing its content field.

    However, encoding an M1, running Algorithm 2, and then decoding the result as an M1 would not necessarily yield an equivalent message, because the bytes of its content field may have been rearranged.

  • The algorithm does not unify default-valued fields with unset (omitted) fields. In proto3, setting a field to its default value is equivalent to omitting the field from the message, and most encoders do omit such fields. However, if an encoder does include default-valued fields in its output, Algorithm 2 will generate a different string than if the field were omitted.

    This problem could be worked around by recognizing and filtering out fields that encode the default (zero) value. This is slightly tricky, though, as it changes the length and content of fields rather than only their order.

  • The algorithm does not handle "packed" repeated fields of scalar type. Packed repeated fields are encoded as a single wire-type string containing the concatenated encodings of the values (varints or fixed-width values, depending on the field type). The values within the string do not have the structure of a message, so the algorithm does not "fix" the order of the packed values.

    This problem is not easy to work around, since many opaque strings are not distinguishable from a concatenation of varint values.
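
To illustrate the last point, the sketch below checks whether a byte string can be read in full as back-to-back varints. The function name isVarintSequence is hypothetical. Any byte below 0x80 is a complete one-byte varint, so ordinary ASCII text passes the check just as a genuine packed field does.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isVarintSequence (a hypothetical helper) reports whether buf can be read
// in full as zero or more concatenated varints.
func isVarintSequence(buf []byte) bool {
	for len(buf) > 0 {
		_, n := binary.Uvarint(buf)
		if n <= 0 {
			return false // truncated or overlong varint
		}
		buf = buf[n:]
	}
	return true
}

func main() {
	packed := []byte{0x03, 0x8e, 0x02, 0x9e, 0xa7, 0x05} // varints 3, 270, 86942
	text := []byte("hello")                              // an opaque string
	fmt.Println(isVarintSequence(packed), isVarintSequence(text)) // prints "true true"
}
```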
