csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.

csvplus

Package csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream processing operations, indices and joins.

The library is primarily designed for ETL-like processes. It is mostly useful where the more advanced searching and joining capabilities of a fully featured SQL database are not required, but the data transformations still include SQL-like operations.

License: BSD 3-Clause

Examples

Simple sequential processing:

people := csvplus.FromFile("people.csv").SelectColumns("name", "surname", "id")

err := csvplus.Take(people).
	Filter(csvplus.Like(csvplus.Row{"name": "Amelia"})).
	Map(func(row csvplus.Row) csvplus.Row { row["name"] = "Julia"; return row }).
	ToCsvFile("out.csv", "name", "surname")

if err != nil {
	return err
}

More involved example:

customers := csvplus.FromFile("people.csv").SelectColumns("id", "name", "surname")
custIndex, err := csvplus.Take(customers).UniqueIndexOn("id")

if err != nil {
	return err
}

products := csvplus.FromFile("stock.csv").SelectColumns("prod_id", "product", "price")
prodIndex, err := csvplus.Take(products).UniqueIndexOn("prod_id")

if err != nil {
	return err
}

orders := csvplus.FromFile("orders.csv").SelectColumns("cust_id", "prod_id", "qty", "ts")
iter := csvplus.Take(orders).Join(custIndex, "cust_id").Join(prodIndex)

return iter(func(row csvplus.Row) error {
	// prints lines like:
	//	John Doe bought 38 oranges for £0.03 each on 2016-09-14T08:48:22+01:00
	_, e := fmt.Printf("%s %s bought %s %ss for £%s each on %s\n",
		row["name"], row["surname"], row["qty"], row["product"], row["price"], row["ts"])
	return e
})
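The Join calls above merge each order row with the matching customer and product rows. Below is a minimal, self-contained sketch of that idea, not the csvplus implementation: it swaps the package's sorted Index for a plain Go map, and the names Row, uniqueIndex, newUniqueIndex and join are hypothetical. It also assumes inner-join semantics, i.e. an order row with no matching customer is dropped.

```go
package main

import "fmt"

// Row follows the README's description: column name -> string value.
type Row map[string]string

// uniqueIndex is a hypothetical stand-in for a unique index on one column.
// It uses a hash map, not the sorted structure csvplus describes.
type uniqueIndex struct {
	key  string
	rows map[string]Row
}

// newUniqueIndex fails if the key column contains a duplicate value.
func newUniqueIndex(key string, rows []Row) (*uniqueIndex, error) {
	idx := &uniqueIndex{key: key, rows: make(map[string]Row, len(rows))}
	for _, r := range rows {
		k := r[key]
		if _, dup := idx.rows[k]; dup {
			return nil, fmt.Errorf("duplicate value %q in column %q", k, key)
		}
		idx.rows[k] = r
	}
	return idx, nil
}

// join merges every input row with the index row whose key equals row[col].
// Unmatched rows are dropped; on a column-name clash the input row wins.
func join(input []Row, idx *uniqueIndex, col string) []Row {
	var out []Row
	for _, r := range input {
		match, ok := idx.rows[r[col]]
		if !ok {
			continue
		}
		merged := make(Row, len(r)+len(match))
		for k, v := range match {
			merged[k] = v
		}
		for k, v := range r {
			merged[k] = v
		}
		out = append(out, merged)
	}
	return out
}

func main() {
	customers := []Row{{"id": "1", "name": "John", "surname": "Doe"}}
	orders := []Row{{"cust_id": "1", "qty": "38"}, {"cust_id": "2", "qty": "5"}}
	idx, err := newUniqueIndex("id", customers)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range join(orders, idx, "cust_id") {
		fmt.Println(r["name"], r["surname"], r["qty"]) // John Doe 38
	}
}
```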

Design principles

The package functionality is built around operations on the following entities:

  • type Row
  • type DataSource
  • type Index

Type Row

Row represents one row from a DataSource. It is a map from column names to the string values under those columns on the current row. The package expects a unique name to be assigned to every column at the source. Compared to integer indices, this is more convenient when complex transformations are applied to each row during processing.
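To illustrate the point about named columns, here is a minimal sketch that models Row as a plain map[string]string, following the description above rather than the package source; fromRecord and fullName are hypothetical helpers. The transformation keeps working when the column order in the source changes, which would not hold with integer indices.

```go
package main

import (
	"fmt"
	"strings"
)

// Row models the description above: column name -> value on the current row.
type Row map[string]string

// fromRecord builds a Row from a header and one record, roughly as a reader
// would after consuming the header line (hypothetical helper).
func fromRecord(header, record []string) Row {
	row := make(Row, len(header))
	for i, name := range header {
		if i < len(record) {
			row[name] = record[i]
		}
	}
	return row
}

// fullName is a hypothetical transformation that addresses columns by name,
// so it does not care where "name" and "surname" sit in the file.
func fullName(row Row) Row {
	row["full_name"] = strings.TrimSpace(row["name"] + " " + row["surname"])
	return row
}

func main() {
	header := []string{"id", "name", "surname"}
	row := fullName(fromRecord(header, []string{"7", "Amelia", "Pond"}))
	fmt.Println(row["full_name"]) // Amelia Pond
}
```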

Type DataSource

Type DataSource represents any source of zero or more rows, like a .csv file. It is a function that, when invoked, feeds the given callback with the data from its source, one Row at a time. The type also has a number of operations defined on it that allow easy composition of operations on the DataSource, forming a so-called fluent interface. All these operations are 'lazy', i.e. they are not performed immediately; instead, each of them returns a new DataSource.
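The lazy, callback-driven design described above can be sketched in a few lines of plain Go. This is an illustration under stated assumptions, not the csvplus code: the DataSource type mirrors the description, while Filter, Map and the fromSlice helper are simplified, hypothetical versions. The call counter shows that building the pipeline reads nothing; rows flow only when the final DataSource is invoked.

```go
package main

import "fmt"

type Row map[string]string

// DataSource: a function that, when invoked, feeds fn one Row at a time.
type DataSource func(fn func(Row) error) error

// Filter returns a new DataSource; nothing is read until it is invoked.
func (src DataSource) Filter(pred func(Row) bool) DataSource {
	return func(fn func(Row) error) error {
		return src(func(row Row) error {
			if pred(row) {
				return fn(row)
			}
			return nil
		})
	}
}

// Map returns a new DataSource that applies mf to every row, also lazily.
func (src DataSource) Map(mf func(Row) Row) DataSource {
	return func(fn func(Row) error) error {
		return src(func(row Row) error { return fn(mf(row)) })
	}
}

// fromSlice is a hypothetical in-memory source that counts its invocations
// and hands out copies, so a mutating Map cannot alter the stored rows.
func fromSlice(rows []Row, calls *int) DataSource {
	return func(fn func(Row) error) error {
		*calls++
		for _, r := range rows {
			row := make(Row, len(r))
			for k, v := range r {
				row[k] = v
			}
			if err := fn(row); err != nil {
				return err // a callback error stops the iteration
			}
		}
		return nil
	}
}

func main() {
	calls := 0
	src := fromSlice([]Row{{"name": "Amelia"}, {"name": "Rory"}}, &calls)
	pipeline := src.
		Filter(func(r Row) bool { return r["name"] == "Amelia" }).
		Map(func(r Row) Row { r["name"] = "Julia"; return r })
	fmt.Println("invocations before running:", calls) // 0
	err := pipeline(func(r Row) error {
		fmt.Println(r["name"]) // Julia
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("invocations after running:", calls) // 1
}
```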

There are also a number of convenience operations that actually invoke the DataSource function to produce a specific type of output:

  • IndexOn to build an index on the specified column(s);
  • UniqueIndexOn to build a unique index on the specified column(s);
  • ToCsv to serialise the DataSource to the given io.Writer in .csv format;
  • ToCsvFile to store the DataSource in the specified file in .csv format;
  • ToJSON to serialise the DataSource to the given io.Writer in JSON format;
  • ToJSONFile to store the DataSource in the specified file in JSON format;
  • ToRows to convert the DataSource to a slice of Rows.

Type Index

Index is a sorted collection of rows. The sorting is performed on the columns specified when the index is created. Iteration over an index yields a sorted sequence of rows. An Index can be joined with a DataSource. The type has operations for finding rows and creating sub-indices in O(log(n)) time. Another useful operation is resolving duplicates. Building an index takes O(n*log(n)) time. Note that building an Index requires the entire dataset to be read into memory, so care should be taken when indexing huge datasets. An index can also be stored to or loaded from a disk file.
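The complexity figures above follow from keeping the rows in a sorted slice: sorting costs O(n*log(n)) and each lookup is a binary search. Here is a minimal sketch with the standard sort package, indexing a single column for simplicity; index, buildIndex and find are hypothetical names, not the package API. find uses two binary searches to locate the range of equal keys, and that range is itself sorted, which is one way a sub-index can be carved out in O(log(n)) time.

```go
package main

import (
	"fmt"
	"sort"
)

type Row map[string]string

// index is a hypothetical sorted collection of rows keyed on one column.
type index struct {
	col  string
	rows []Row
}

// buildIndex holds every row in memory and sorts them: O(n*log(n)).
// Keys compare as strings, so "10" sorts before "9".
func buildIndex(rows []Row, col string) *index {
	sorted := append([]Row(nil), rows...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i][col] < sorted[j][col] })
	return &index{col: col, rows: sorted}
}

// find returns the (possibly empty) run of rows whose key equals value,
// located with two binary searches: O(log(n)).
func (idx *index) find(value string) []Row {
	lo := sort.Search(len(idx.rows), func(i int) bool { return idx.rows[i][idx.col] >= value })
	hi := sort.Search(len(idx.rows), func(i int) bool { return idx.rows[i][idx.col] > value })
	return idx.rows[lo:hi]
}

func main() {
	idx := buildIndex([]Row{
		{"id": "3", "name": "Rory"},
		{"id": "1", "name": "Amelia"},
		{"id": "3", "name": "Clara"},
	}, "id")
	for _, r := range idx.find("3") {
		fmt.Println(r["name"]) // Rory, then Clara (stable sort keeps input order)
	}
}
```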

For more details see the documentation.

Project status

The project is in a usable state, usually called "beta". It has been tested on Linux Mint 18.3 with Go 1.10.2.

Comments
  • Reading from an io.Reader/io.ReadCloser

    Enhancement: it would be useful (and idiomatic) to be able to read & parse data coming from an io.Reader (and related interfaces like io.ReadCloser), not just .FromFile.

  • .SelectColumns might not be optional

    I'm loading a CSV file with 38 fields. Until I selected a few columns of interest with .SelectColumns(), all the rows were read, but every one of them was empty. The file does have field names in its first row.

    See https://gist.github.com/keltia/f1dcb745fdef27d5fb2da76b17c9c124 for a code extract. The repository is not public because it contains personal test data that I cannot share.

    The first line of the CSV file is:

    EmailAddress;PSComputerName;RunspaceId;FirstSyncTime;LastPolicyUpdateTime;LastSyncAttemptTime;LastSuccessSync;DeviceType;DeviceID;DeviceUserAgent;DeviceWipeSentTime;DeviceWipeRequestTime;DeviceWipeAckTime;LastPingHeartbeat;RecoveryPassword;DeviceModel;DeviceImei;DeviceFriendlyName;DeviceOS;DeviceOSLanguage;DevicePhoneNumber;MailboxLogReport;DeviceEnableOutboundSMS;DeviceMobileOperator;Identity;Guid;IsRemoteWipeSupported;Status;StatusNote;DeviceAccessState;DeviceAccessStateReason;DeviceAccessControlRule;DevicePolicyApplied;DevicePolicyApplicationStatus;LastDeviceWipeRequestor;DeviceActiveSyncVersion;NumberOfFoldersSynced;SyncStateUpgradeTime
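The header line above uses semicolons as separators, while the standard encoding/csv reader splits on commas unless its Comma field is changed. If the file really is semicolon-delimited, a reader left at the default sees each line as a single column, which may be related to the symptom described. The sketch below uses only the standard library (countFields is a hypothetical helper) to show the difference; it says nothing about how csvplus configures its own reader.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// countFields reports how many fields encoding/csv finds in the first
// record of data when splitting on delim.
func countFields(data string, delim rune) (int, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = delim // default is ','
	rec, err := r.Read()
	if err != nil {
		return 0, err
	}
	return len(rec), nil
}

func main() {
	header := "EmailAddress;PSComputerName;RunspaceId\n"
	withComma, _ := countFields(header, ',')
	withSemicolon, _ := countFields(header, ';')
	fmt.Println(withComma, withSemicolon) // 1 3
}
```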

  • How would I write output to csv for all headers irrespective of header names?

    Currently, I have to list the column names to be able to write the output to a file. Is there an option to include all columns instead?

    ToCsvFile("out.csv", "phone", "country_code", "phone_type", "carrier", "region")
    
  • UTF-8 + BOM: do you handle this?

    I have a few CSV files coming from a Windows-based system, which generates them as UTF-8 with a BOM. It seems that either encoding/csv or csvplus fails to handle this, and the first field is seen as \uFEFF<the field>.

    "\ufeffEmailAddress":"[email protected]"
    

    It used to work before I switched to csvplus.

    Thanks.

  • tags are not properly formatted.

    Hi, I'm adding semantic versioning and vgo support to my modules. Because the last two tags lack the "v" prefix, vgo assumes v0.2.4 is the latest version, which breaks my code that relies on FromFile().

    Could you please re-tag them as v0.3.0 and v0.3.1?
