ByNom is a Go package for parsing byte sequences, suitable for parsing text and binary data


bynom

ByNom is a Go package for parsing byte sequences. Its goal is to provide tools for building safe byte parsers without compromising speed or memory consumption.

The package is inspired by the Rust nom library. I was looking for a byte parser library that could meet my requirements and, not finding one, decided to write this package.

Status

The package is in early development. The interface may change at any time without notice.

Installation

To install the package, use go get github.com/workanator/bynom

Features

  • byte-oriented: The basic type is byte, and parsers work with bytes and byte slices.
  • zero-copy: If a parser returns a subset of its input data, it returns a slice of that input, without copying [1].
  • conditional parsing: Parsing can be conditional, with switches and optional parts.
  • rich error details: The package tries to provide as much information as possible about a parsing error.

[1] This depends on the implementation of bynom.Plate.
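
The zero-copy property follows from how Go slices work: a sub-slice shares the backing array of its input. The following is a minimal sketch in plain Go, independent of bynom. The helper takeDigits is hypothetical and is not part of the package.

```go
package main

import "fmt"

// takeDigits is a hypothetical helper, not part of bynom. It returns the
// leading run of ASCII digits as a sub-slice of input, without copying.
func takeDigits(input []byte) []byte {
	n := 0
	for n < len(input) && input[n] >= '0' && input[n] <= '9' {
		n++
	}
	return input[:n]
}

func main() {
	input := []byte("12:30")
	hour := takeDigits(input)
	fmt.Printf("%s\n", hour) // 12

	// The result aliases the input: changing the input changes the result.
	input[0] = '0'
	fmt.Printf("%s\n", hour) // 02
}
```

Because the result aliases the input, a caller that reuses the input buffer would need to copy the parsed fields first.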

Example

Here is a simplified example of how a time parser can be constructed. The expected time format is HH:MM[:SS][ ][AM|PM].

var hour, minute, second, amPm []byte               // Parsing result will be here

digits := WhileAcceptable(span.Range('0', '9'))     // Allow only bytes in the range '0'..'9'
twoDigits := RequireLen(2, digits)                  // Require the sequence to be 2 bytes in length
time24 := Sequence(
  Take(into.Bytes(&hour), twoDigits),               // Parse hour and write the result in `hour`
  Expect(':'),                                      // Expect ':' after the hour
  Take(into.Bytes(&minute), twoDigits),             // Parse minute and write the result in `minute`
  Optional(                                         // Parse optional second
    Expect(':'),                                    // Expect ':' after the minute
    Take(into.Bytes(&second), twoDigits),           // Parse second and write the result in `second`
  ),
)
time12 := Sequence(
  time24,                                           // Extend 24-hour time parser
  Optional(While(' ')),                             // Skip optional whitespace
  Take(                                             // Parse AM/PM part
    into.Bytes(&amPm),                              // On success write the result in `amPm`
    ExpectAcceptable(span.Set('a', 'A', 'p', 'P')), // Expect one byte from the set aApP
    ExpectAcceptable(span.Set('m', 'M')),           // Expect one byte from the set mM
  ),
)

timeBite := NewBite(Switch(time12, time24))         // Wrap parsers into the bynom.Eater.
if err := dish.NewString(inputData); err != nil {
  panic(err)
} else {
  println(string(hour), ":", string(minute), ":", string(second), " ", string(amPm))
}

See the examples for more.
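
To make the format HH:MM[:SS][ ][AM|PM] concrete, here is a hand-written approximation in plain Go that does not use bynom. The functions twoDigits and parseTime are hypothetical names for this sketch. It ignores trailing bytes and may differ from the combinator version in edge cases, for example in how a failed optional part is handled.

```go
package main

import (
	"errors"
	"fmt"
)

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// lower folds an ASCII letter to lower case.
func lower(b byte) byte { return b | 0x20 }

// twoDigits takes the leading run of digits and requires it to be exactly
// two bytes long, similar in spirit to RequireLen(2, digits).
func twoDigits(in []byte) (field, rest []byte, err error) {
	n := 0
	for n < len(in) && isDigit(in[n]) {
		n++
	}
	if n != 2 {
		return nil, in, fmt.Errorf("invalid length: expected 2, have %d", n)
	}
	return in[:2], in[2:], nil
}

// parseTime is a hypothetical hand-written parser for HH:MM[:SS][ ][AM|PM].
// All returned fields are sub-slices of in.
func parseTime(in []byte) (hour, minute, second, amPm []byte, err error) {
	if hour, in, err = twoDigits(in); err != nil {
		return
	}
	if len(in) == 0 || in[0] != ':' {
		err = errors.New("expected ':' after hour")
		return
	}
	if minute, in, err = twoDigits(in[1:]); err != nil {
		return
	}
	// Optional ":SS".
	if len(in) > 0 && in[0] == ':' {
		if second, in, err = twoDigits(in[1:]); err != nil {
			return
		}
	}
	// Optional whitespace.
	for len(in) > 0 && in[0] == ' ' {
		in = in[1:]
	}
	// Optional AM/PM in either case.
	if len(in) >= 2 && (lower(in[0]) == 'a' || lower(in[0]) == 'p') && lower(in[1]) == 'm' {
		amPm = in[:2]
	}
	return
}

func main() {
	h, m, s, ap, err := parseTime([]byte("09:41:07 PM"))
	fmt.Printf("%s %s %s %s %v\n", h, m, s, ap, err) // 09 41 07 PM <nil>
}
```

The hand-written version is longer and harder to extend than the combinator version, which is the motivation for composing small parsers instead.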

Parsers

Name              Description
Any               Reads bytes from the plate until io.EOF is encountered.
Expect            Expects the next byte to equal the input.
ExpectNot         Expects the next byte not to equal the input.
ExpectAcceptable  Expects the next set of bytes to be accepted by the input.
ExpectIneligible  Expects the next set of bytes to be declined by the input.
Optional          Groups multiple parsers into one optional parser.
Repeat            Repeats the set of parsers N times.
RequireLen        Requires the length of the parsed byte sequence to equal the input.
Sequence          Groups multiple parsers into one parser.
Switch            Converts multiple parsers into options. The first parser which returns success finishes the switch.
Take              Takes the parsed byte sequence into a variable.
When              Runs the set of parsers when the first parser finishes with success.
WhenNot           Runs the set of parsers when the first parser finishes with an error.
While             Parses while the next byte equals the input.
WhileNot          Parses while the next byte does not equal the input.
WhileAcceptable   Parses while the next set of bytes is accepted by the input.
WhileIneligible   Parses while the next set of bytes is declined by the input.
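
The grouping parsers (Sequence, Optional, Switch) follow the usual parser-combinator pattern. The sketch below illustrates that pattern with a simplified, hypothetical Parser type over byte slices. It reuses the names from the table for readability, but it is not bynom's implementation, and bynom's real signatures differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Parser is a hypothetical signature used only in this sketch. It consumes
// a prefix of in and returns the remainder.
type Parser func(in []byte) (rest []byte, err error)

var errNoMatch = errors.New("no match")

// Expect succeeds if the next byte equals b.
func Expect(b byte) Parser {
	return func(in []byte) ([]byte, error) {
		if len(in) == 0 || in[0] != b {
			return in, errNoMatch
		}
		return in[1:], nil
	}
}

// Sequence runs parsers one after another and fails on the first error,
// returning the original input unchanged.
func Sequence(ps ...Parser) Parser {
	return func(in []byte) ([]byte, error) {
		rest := in
		for _, p := range ps {
			var err error
			if rest, err = p(rest); err != nil {
				return in, err
			}
		}
		return rest, nil
	}
}

// Optional runs the parsers as a sequence but treats failure as success
// with nothing consumed.
func Optional(ps ...Parser) Parser {
	seq := Sequence(ps...)
	return func(in []byte) ([]byte, error) {
		if rest, err := seq(in); err == nil {
			return rest, nil
		}
		return in, nil
	}
}

// Switch tries each option in order; the first success wins.
func Switch(ps ...Parser) Parser {
	return func(in []byte) ([]byte, error) {
		for _, p := range ps {
			if rest, err := p(in); err == nil {
				return rest, nil
			}
		}
		return in, errNoMatch
	}
}

func main() {
	// Matches "a:" or "b", optionally followed by "!".
	p := Sequence(
		Switch(Sequence(Expect('a'), Expect(':')), Expect('b')),
		Optional(Expect('!')),
	)
	rest, err := p([]byte("a:!x"))
	fmt.Printf("%q %v\n", rest, err) // "x" <nil>
}
```

Each combinator returns a new Parser, so larger parsers are built by nesting calls, as in the time parser example.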

Error Formatting

Parsing errors can be formatted using formatters from the prettierr package. Here is an example of a parsing error formatted with prettierr.HexFormatter.

Error:
  requirement not met: invalid length: expected 2, have 11
Range:
  start=0, end=11
Context:
  31 33 32 34 33 32 34 33  32 30 35                | 13243243 205     
Stack:
  5: RequireLen, start=0, end=11
  4: Take[0], start=0, end=11
  3: Sequence[0], start=0, end=11
  2: Switch, start=0
  1: When[0], start=0, end=11
  0: Switch, start=0
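
The Range and Context parts of such a report can be approximated with the standard library alone. The sketch below uses a hypothetical rangeError type and hex.Dump from encoding/hex. It does not use prettierr, its layout differs from the output above, and it assumes the range is half-open, which matches start=0, end=11 for an 11-byte input.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// rangeError is a hypothetical error type for this sketch; it is not the
// type used by bynom or prettierr.
type rangeError struct {
	Msg        string
	Start, End int // assumed half-open byte range [Start, End) in the input
}

func (e rangeError) Error() string { return e.Msg }

// describe renders the message, the range, and a hex dump of the bytes
// the error refers to.
func describe(e rangeError, input []byte) string {
	return fmt.Sprintf("Error:\n  %s\nRange:\n  start=%d, end=%d\nContext:\n%s",
		e.Msg, e.Start, e.End, hex.Dump(input[e.Start:e.End]))
}

func main() {
	input := []byte("13243243205")
	err := rangeError{Msg: "invalid length: expected 2, have 11", Start: 0, End: 11}
	fmt.Print(describe(err, input))
}
```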

To-Do

  • Add support for "words".
  • Provide more useful information about parsing errors.
  • Add Plate implementation for io.ReadSeeker.
  • Add more tests.
  • Add benchmarks.
  • Add more examples.
  • Extend the documentation.

Contribution

Any contributions and feedback are welcome.
