Encoding and decoding for fixed-width formatted data

fixedwidth

Package fixedwidth provides encoding and decoding for fixed-width formatted data.

go get github.com/ianlopshire/go-fixedwidth

Usage

Struct Tags

The struct tag schema used by fixedwidth is: fixed:"{startPos},{endPos},[{alignment},[{padChar}]]"1.

The startPos and endPos arguments control the position within a line. startPos and endPos must both be positive integers. Positions start at 1. The interval is inclusive.

The alignment argument controls the alignment of the value within its interval. The valid options are default2, right, and left. The alignment is optional and can be omitted.

The padChar argument controls the character that will be used to pad any empty characters in the interval after writing the value. The default padding character is a space. The padChar is optional and can be omitted.

Fields without tags are ignored.
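
For example, a single struct can mix the optional arguments. This is a minimal sketch based on the rules above; the field names are illustrative:

type record struct {
    ID     int    `fixed:"1,5"`          // positions 1-5, default alignment, space padded
    Amount int    `fixed:"6,15,right"`   // positions 6-15, right aligned, space padded
    Note   string `fixed:"16,25,left,#"` // positions 16-25, left aligned, padded with '#'
}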

Encode

// define some data to encode
people := []struct {
    ID        int     `fixed:"1,5"`
    FirstName string  `fixed:"6,15"`
    LastName  string  `fixed:"16,25"`
    Grade     float64 `fixed:"26,30"`
}{
    {1, "Ian", "Lopshire", 99.5},
}

data, err := fixedwidth.Marshal(people)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%s", data)
// Output:
// 1    Ian       Lopshire  99.50

Decode

// define the format
var people []struct {
    ID        int     `fixed:"1,5"`
    FirstName string  `fixed:"6,15"`
    LastName  string  `fixed:"16,25"`
    Grade     float64 `fixed:"26,30"`
}

// define some fixed-width data to parse
data := []byte("" +
    "1    Ian       Lopshire  99.50" + "\n" +
    "2    John      Doe       89.50" + "\n" +
    "3    Jane      Doe       79.50" + "\n")


err := fixedwidth.Unmarshal(data, &people)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("%+v\n", people[0])
fmt.Printf("%+v\n", people[1])
fmt.Printf("%+v\n", people[2])
// Output:
//{ID:1 FirstName:Ian LastName:Lopshire Grade:99.5}
//{ID:2 FirstName:John LastName:Doe Grade:89.5}
//{ID:3 FirstName:Jane LastName:Doe Grade:79.5}

It is also possible to read data incrementally:

decoder := fixedwidth.NewDecoder(bytes.NewReader(data))
for {
    var element myStruct // myStruct: any struct with `fixed` tags, such as the element type above
    err := decoder.Decode(&element)
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    handle(element)
}

If your input expresses field positions in Unicode code points rather than raw bytes, fixedwidth supports this as well. The data must be UTF-8 encoded:

decoder := fixedwidth.NewDecoder(strings.NewReader(data))
decoder.SetUseCodepointIndices(true)
// Decode as usual now
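
For instance, here is a minimal, self-contained sketch; the record layout and the non-ASCII sample data are illustrative. With code point indices enabled, "Zoë" occupies three positions even though it is four bytes in UTF-8:

var person struct {
    ID        int     `fixed:"1,5"`
    FirstName string  `fixed:"6,15"`
    LastName  string  `fixed:"16,25"`
    Grade     float64 `fixed:"26,30"`
}

// "Zoë" and "Müller" are 3 and 6 code points respectively, so the columns
// line up when positions are counted in code points rather than bytes.
data := "1    Zoë       Müller    99.50\n"

decoder := fixedwidth.NewDecoder(strings.NewReader(data))
decoder.SetUseCodepointIndices(true)
if err := decoder.Decode(&person); err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v\n", person)
// Expected: {ID:1 FirstName:Zoë LastName:Müller Grade:99.5}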

Alignment Behavior

| Alignment | Encoding | Decoding |
| --------- | -------- | -------- |
| default | Field is left aligned | The padding character is trimmed from both right and left of the value |
| left | Field is left aligned | The padding character is trimmed from the right of the value |
| right | Field is right aligned | The padding character is trimmed from the left of the value |
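
For example, decoding a line where each five-character field holds " ab  " should, per the table above, produce different results depending on the alignment. A sketch only; the field names are illustrative:

var v struct {
    Default string `fixed:"1,5"`         // " ab  " -> "ab"   (trimmed on both sides)
    Left    string `fixed:"6,10,left"`   // " ab  " -> " ab"  (trimmed on the right only)
    Right   string `fixed:"11,15,right"` // " ab  " -> "ab  " (trimmed on the left only)
}
err := fixedwidth.NewDecoder(strings.NewReader(" ab   ab   ab  ")).Decode(&v)
if err != nil {
    log.Fatal(err)
}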

Notes

  1. {} indicates an argument. [] indicates an optional segment.
  2. The default alignment is similar to left but has slightly different behavior required to maintain backwards compatibility.

Licence

MIT

Owner
Ian Lopshire
Comments
  • Support marshal unicode

    Hi @ianlopshire, I've made a pull request to support marshaling Unicode. Here is the benchmark:

    benchmark                               old ns/op     new ns/op     delta
    BenchmarkMarshal_MixedData_1-4          4135          3843          -7.06%
    BenchmarkMarshal_MixedData_1000-4       3827841       3563933       -6.89%
    BenchmarkMarshal_MixedData_100000-4     336806933     373349191     +10.85%
    BenchmarkMarshal_String-4               1308          1069          -18.27%
    BenchmarkMarshal_StringPtr-4            1266          1154          -8.85%
    BenchmarkMarshal_Int64-4                1038          1059          +2.02%
    BenchmarkMarshal_Float64-4              1381          1406          +1.81%
    
    benchmark                               old allocs     new allocs     delta
    BenchmarkMarshal_MixedData_1-4          48             48             +0.00%
    BenchmarkMarshal_MixedData_1000-4       44009          44009          +0.00%
    BenchmarkMarshal_MixedData_100000-4     4400015        4400015        +0.00%
    BenchmarkMarshal_String-4               9              9              +0.00%
    BenchmarkMarshal_StringPtr-4            9              9              +0.00%
    BenchmarkMarshal_Int64-4                9              9              +0.00%
    BenchmarkMarshal_Float64-4              12             12             +0.00%
    
    benchmark                               old bytes     new bytes     delta
    BenchmarkMarshal_MixedData_1-4          5248          5680          +8.23%
    BenchmarkMarshal_MixedData_1000-4       1300042       1732044       +33.23%
    BenchmarkMarshal_MixedData_100000-4     112738222     155938227     +38.32%
    BenchmarkMarshal_String-4               4344          4376          +0.74%
    BenchmarkMarshal_StringPtr-4            4344          4376          +0.74%
    BenchmarkMarshal_Int64-4                4336          4368          +0.74%
    BenchmarkMarshal_Float64-4              4400          4440          +0.91%
    

    Please let me know if I missed any case or something that can be improved.

    Regards, Huy Dang

  • Cache descriptions of structs to avoid repeated reparsing of tags

    In a benchmark I have of repeatedly calling Decode(&myStruct), nearly 20% of the time is being spent parsing tags.

    The stdlib json package caches a computed description of the structs it receives in order to avoid this sort of overhead.
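
    For illustration, a minimal sketch of that kind of cache, keyed by reflect.Type via sync.Map; fieldSpec and parseTags are hypothetical names, not go-fixedwidth internals:

    // fieldSpec would describe one tagged field (start/end positions, etc.); illustrative only.
    type fieldSpec struct {
    	index    int
    	startPos int
    	endPos   int
    }
    
    var specCache sync.Map // reflect.Type -> []fieldSpec
    
    func cachedSpecs(t reflect.Type) []fieldSpec {
    	if v, ok := specCache.Load(t); ok {
    		return v.([]fieldSpec)
    	}
    	specs := parseTags(t) // hypothetical: walk t's fields and parse their `fixed` tags
    	specCache.Store(t, specs)
    	return specs
    }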

  • Client-Configurable Padding Character

    As a consumer of this package, I need to generate fixedwidth-encoded output that's padded with 0s, so that I can integrate with a system that treats spaces as significant characters.
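
    For reference, the padChar tag argument described in the Struct Tags section can express this per field. A sketch; the amount type is illustrative and the output assumes the padding rules above:

    type amount struct {
    	Cents int `fixed:"1,10,right,0"` // right aligned, padded with '0'
    }
    
    out, err := fixedwidth.Marshal([]amount{{Cents: 1250}})
    if err != nil {
    	log.Fatal(err)
    }
    fmt.Printf("%s", out) // expected: "0000001250"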

  • Support for codepoint-based fixedwidth files

    I have an extremely bonkers fixed-width file format where the file is utf-8 encoded, but the fixed offsets are expressed in decoded codepoints. Naturally this doesn't play super well with go-fixedwidth's (ENTIRELY REASONABLE) byte-based approach :-)

    I'd like to find some way to shoe-horn this into go-fixedwidth (and obviously contribute this upstream).

    I think the most efficient implementation is probably, if you're doing this codepoint mode, to convert the line to []rune in readLine and in rawValueFromLine convert it back to a []byte with the utf-8 value.

    And I think that'd all work fine. But I don't see an obvious way to do this without throwing a bunch of if statements in here and changing a bunch of types (e.g. I could replace all the []byte with struct { bytes []byte; runes []rune } and then stick ifs in readLine and rawValueFromLine).

    Do you have an opinion on if there's a better implementation strategy? Would I be better off just forking decode.go?

  • WithRightAlignedZeroPaddedNumbers and WithOverflowErrors Options

    This is partly solving #28, but in a less flexible way than the field/tag-level approach. (Though also less prone to field-level mistakes since it'll consistently marshal ints without field-level configuration).

    I add an opts ...EncoderOption variadic to the NewEncoder func, and add a WithRightAlignedZeroPaddedInts() option to enable a change in encoding behavior: right-aligned and zero-padded integers.
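
    Roughly, the functional-option shape being described looks like this; a sketch only, and the Encoder fields shown are illustrative, not the package's actual internals:

    // EncoderOption configures an Encoder at construction time.
    type EncoderOption func(*Encoder)
    
    // WithRightAlignedZeroPaddedInts enables right-aligned, zero-padded integer encoding.
    func WithRightAlignedZeroPaddedInts() EncoderOption {
    	return func(e *Encoder) { e.rightAlignZeroPadInts = true } // hypothetical field
    }
    
    func NewEncoder(w io.Writer, opts ...EncoderOption) *Encoder {
    	e := &Encoder{w: w} // hypothetical field layout
    	for _, opt := range opts {
    		opt(e)
    	}
    	return e
    }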

  • Incorrectly getting io.EOF before the end of file when using Decoder.Decode

    When there is a very long line (specifically, 64 * 1024 = 65536 characters or longer) in a file processed by Decoder.Decode, the Decode method returns io.EOF when it reaches the very long line, even when it is not the end of the file.

    To reproduce this bug, see the sample code and test file in this small repo. These are based on the sample code in the fixedwidth repo's README here. In the small repo I provided, the test file contains 4 data lines, similar to the sample data in the README, except the 2nd line has over 65536 characters. The sample code uses the Decoder to read each line and print out the struct version of the line. It prints out the first line correctly, then prints a message indicating io.EOF was returned from the second line, e.g.:

    go run main.go
    {ID:1 FirstName:Ian LastName:Lopshire Grade:99.5 Age:20 Alive:false Github:f}
    Got EOF%
    

    I believe this may be happening because lines 108-113 of decode.go return io.EOF if, after calling readLine, the Decoder’s done flag is true and the returned values of readLine are err == nil && !ok. Looking into readLine (specifically, lines 162-166), if the result of the Decoder’s underlying Scanner object’s call to Scan() returns false, these conditions will be met. Looking into the Scan function in bufio/scan.go here, if the buffer length is greater than the maxTokenSize (which is 64 * 1024 = 65536), the method returns false. This false is then being used by Decoder to return io.EOF, even though this is not the end of the file.
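
    For reference, the underlying bufio.Scanner behavior can be reproduced in isolation: its default limit is bufio.MaxScanTokenSize (64 * 1024 bytes), and it can be raised with Scanner.Buffer. A sketch, independent of go-fixedwidth:

    input := strings.Repeat("a", 70*1024) + "\nshort line\n"
    
    // Default buffer: Scan returns false at the long line.
    s := bufio.NewScanner(strings.NewReader(input))
    for s.Scan() {
    	// drain
    }
    fmt.Println(s.Err()) // bufio.Scanner: token too long
    
    // Enlarged buffer: both lines are read.
    s = bufio.NewScanner(strings.NewReader(input))
    s.Buffer(make([]byte, 0, 128*1024), 128*1024)
    for s.Scan() {
    	// drain
    }
    fmt.Println(s.Err()) // <nil>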

  • bool type support (resolves #39)

    This is a change for requested feature here #39

    When contributing to this repository, please first discuss the change you wish to make via issue.

    An issue for this was already created and nobody mentioned they had started on it, so I decided to fix this one, since being able to parse bools seems important to me.

    • [x] Ensure your PR is up-to-date with the master branch.
    • [x] Ensure all unit tests and integration tests are passing. Covered all the scenarios I could think of for bool, but open to suggestions :)
    • [x] Ensure changes are covered by unit tests where possible
    • [x] Ensure changes to the package's public API are covered in the documentation. Added Example_* tests and updated the README with a bool example.
    • [x] Remove superfluous changes to keep the diff as small as possible
  • Fixes #6 -- expose if an EOF happens in Decoder.Decode

    You'll see that there were two tests I needed to change. The first change I think is definitely right -- you can't read a struct with fields from empty data. The second one I'm less confident in (see the TODO comment I left), I'd appreciate your feedback on that.

  • Enable `leftpad` option for encoding

    In some formats, values are expected to be padded on the left.

    e.g.:

    type DifferentFixedTags struct {
    	RegularStr string `fixed:"1,5"`
    	RegularInt int    `fixed:"6,10"`
    
    	PadStr string `fixed:"11,15,leftpad"`
    	PadInt int    `fixed:"16,20,leftpad"`
    }
    d := DifferentFixedTags{"one", 1, "two", 2}
    fixedwidth.Marshal(d)
    // Result: "one  1      two00002"
    
  • Optional TrimSpace

    The decoder automatically trims spaces which I guess works for most people. I have a case where I do not want to trim spaces. Would it be possible to allow this to be configured via an option on the decoder?

  • Add formatting options (e.g. left pad)

    The encoder should support common formatting needs such as left padding.

    Proposed Spec

    Formatting Options

    • default - No padding is applied to the value.
    • rightpad - The value is padded on the right to fill available space.
    • leftpad - The value is padded on the left to fill available space.

    In all cases the value is written to the available space in a left-to-right manner. If the value length is greater than the available space, the rightmost characters will be omitted.

    Struct Tags

    The struct tag schema will be updated to support an optional third option to specify formatting – fixed:"{startPos},{endPos},{format}".

    Padding Characters

    | Types | Padding Character |
    | ----- | ----------------- |
    | int, int8, int16, int32, int64 | 0 |
    | uint, uint8, uint16, uint32, uint64 | 0 |
    | float32, float64 | 0 |
    | string, []byte | \u0020 (space) |

    Any type not listed will default to being padded with \u0020 (space).

  • Allow default formatting to be configured at the struct level

    As of v0.7.0 formatting is configurable at the struct field level.

    type Record struct {
    	Field1 string `fixed:"1,5,left,#"`
    	Field2 string `fixed:"6,10,left,#"`
    	Field3 string `fixed:"11,15,left,#"`
    	Field4 string `fixed:"16,20,left,#"`
    	...
    }
    

    Adding all of the required tags can be tedious when all of the fields require a specific format. To alleviate this, there should be a mechanism to set the default format for all the fields in a struct.

    My current thought is to implement something similar to xml.Name.

    type Record struct {
    	// Format is a special struct that can be embedded into a struct to control
    	// the default formatting of its fields.
    	fixedwidth.Format `fixed:"left,#"`
    
    	Field1 string `fixed:"1,5"`
    	Field2 string `fixed:"6,10"`
    	Field3 string `fixed:"11,15"`
    	Field4 string `fixed:"16,20,right,0"` // Override the default formatting.
    	...
    }
    
  • Encoder Strict Mode

    There should be an opt-in strict mode on the encoder that triggers an error when a value does not fit in the available space. It should also throw an error when the intervals defined for a struct overlap.
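
    A minimal sketch of the overlap half of that check; illustrative only, not the package's internals. Sort the tag intervals by start position and flag any start that falls at or before the previous end:

    // interval is one field's inclusive [start, end] range as parsed from its tag.
    type interval struct{ start, end int }
    
    // overlaps reports whether any two intervals share a position.
    func overlaps(ivs []interval) bool {
    	sort.Slice(ivs, func(i, j int) bool { return ivs[i].start < ivs[j].start })
    	for i := 1; i < len(ivs); i++ {
    		if ivs[i].start <= ivs[i-1].end {
    			return true
    		}
    	}
    	return false
    }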

  • Behavior of complex types

    The behavior of more complex types need to be defined/implemented.

    • [ ] nested structs with tag
    • [ ] nested structs without tag
    • [ ] embedded struct with tag
    • [ ] embedded struct without tag
    type Nested struct {
    	F1 string `fixed:"1,10"`
    	F2 struct {
    		E1 string `fixed:"11,20"`
    		E2 string `fixed:"21,30"`
    	}
    }
    
    type NestedWithTag struct {
    	F1 string `fixed:"1,10"`
    	F2 struct {
    		E1 string `fixed:"1,10"`
    		E2 string `fixed:"11,20"`
    	} `fixed:"11,30"`
    }
    
    type S1 struct {
    	F1 string `fixed:"1,10"`
    	F4 string `fixed:"31,40"`
    }
    
    type Embedded struct {
    	S1
    	F2 string `fixed:"11,20"`
    	F3 string `fixed:"21,30"`
    }
    
    type S2 struct {
    	F3 string `fixed:"1,10"`
    	F4 string `fixed:"11,20"`
    }
    
    type EmbeddedWithTag struct {
    	F2 string `fixed:"1,10"`
    	F3 string `fixed:"11,20"`
    	S2 `fixed:"21,40"`
    }
    