A fast diff tool for comparing csv files

csvdiff

Build Status Go Doc Go Report Card codecov Downloads Latest release

A fast diff tool for comparing csv files.

What is csvdiff?

Csvdiff is a difftool to compute changes between two csv files.

  • It is not a traditional diff tool. It is most suitable for comparing csv files dumped from database tables. GNU diff tool is orders of magnitude faster on comparing line by line.
  • Supports selective comparison of fields in a row.
  • Supports specifying group of columns as primary-key i.e uniquely identify a row.
  • Support ignoring columns e.g ignore columns like created_at timestamps.
  • Compares csvs of million records csv in under 2 seconds.
  • Supports lot of output formats e.g colored git style output or JSON for post-processing.

Why?

I wanted to compare if the rows of a table before and after a given time and see what is the new changes that came in. Also, I wanted to selectively compare columns ignoring columns like created_at and updated_at. All I had was just the dumped csv files.

Demo

asciicast

Usage

$ csvdiff base.csv delta.csv
# Additions (1)
+ 24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822
# Modifications (1)
- 69,48,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
+ 69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
# Deletions (1)
- 1618,907,deleted-website.com,com,19827,32902,deleted-website.com,com,1621,909,19787,32822
Differentiates two csv files and finds out the additions and modifications.
Most suitable for csv files created from database tables

Usage:
  csvdiff <base-csv> <delta-csv> [flags]

Flags:
      --columns ints          Selectively compare positions in CSV Eg: 1,2. Default is entire row
  -o, --format string         Available (rowmark|json|legacy-json|diff|word-diff|color-words) (default "diff")
  -h, --help                  help for csvdiff
      --ignore-columns ints   Inverse of --columns flag. This cannot be used if --columns are specified
      --include ints          Include positions in CSV to display Eg: 1,2. Default is entire row
  -p, --primary-key ints      Primary key positions of the Input CSV as comma separated values Eg: 1,2 (default [0])
  -s, --separator string      use specific separator (\t, or any one character string) (default ",")
      --time                  Measure time
  -t, --toggle                Help message for toggle
      --version               version for csvdiff

Installation

Homebrew

brew tap thecasualcoder/stable
brew install csvdiff

Using binaries

# binary will be $GOPATH/bin/csvdiff
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s -- -b $GOPATH/bin

# or install it into ./bin/
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s

# In alpine linux (as it does not come with curl by default)
wget -O - -q https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s

Using source code

go get -u github.com/aswinkarthik/csvdiff

Use case

  • Cases where you have a base database dump as csv. If you receive the changes as another database dump as csv, this tool can be used to figure out what are the additions and modifications to the original database dump. The additions.csv can be used to create an insert.sql and with the modifications.csv an update.sql data migration.
  • The delta file can either contain just the changes or the entire table dump along with the changes.

Supported

  • Additions
  • Modifications
  • Deletions
  • Non comma separators

Not Supported

  • Cannot be used as a generic difftool. Requires a column to be used as a primary key from the csv.

Formats

There are a number of formats supported

  • diff: Git's diff style
  • word-diff: Git's --word-diff style
  • color-words: Git's --color-words style
  • json: JSON serialization of result
  • legacy-json: JSON serialization of result in old format
  • rowmark: Marks each row with ADDED or MODIFIED status.

Miscellaneous features

  • The --primary-key in an integer array. Specify comma separated positions if the table has a compound key. Using this primary key, it can figure out modifications. If the primary key changes, it is an addition.
% csvdiff base.csv delta.csv --primary-key 0,1
  • If you want to compare only few columns in the csv when computing hash,
% csvdiff base.csv delta.csv --primary-key 0,1 --columns 2
  • Supports JSON format for post processing
% csvdiff examples/base-small.csv examples/delta-small.csv --format json | jq '.'
{
  "Additions": [
    "24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822"
  ],
  "Modifications": [{
    "Original": "69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491",
    "Current":  "69,1049,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491"
  }],
  "Deletions": [
    "1615,905,deleted-website.com,com,19833,33110,deleted-website.com,com,1613,902,19835,33135"
  ]
}

Build locally

$ git clone https://github.com/aswinkarthik/csvdiff
$ go get ./...
$ go build

# To run tests
$ go get github.com/stretchr/testify/assert
$ go test -v ./...

Algorithm

  • Creates a map of <uint64, uint64> for both base and delta file
    • key is a hash of the primary key values as csv
    • value is a hash of the entire row
  • Two maps as initial processing output
    • base-map
    • delta-map
  • The delta map is compared with the base map. As long as primary key is unchanged, they row will have same key. An entry in delta map is a
    • Addition, if the base-map's does not have a value.
    • Modification, if the base-map's value is different.
    • Deletions, if the base-map has no match on the delta map.

Credits

  • Uses 64 bit xxHash algorithm, an extremely fast non-cryptographic hash algorithm, for creating the hash. Implementations from cespare
  • Used Majestic million data for demo.
Owner
Aswin Karthik
Full stack/Polyglot developer, Devops practitioner
Aswin Karthik
Comments
  • Option to ignore only some columns

    Option to ignore only some columns

    I would like to compare a 50-column CSV file and I want to ignore the differences in a few columns only. Instead of specifying --columns 0,1,2,3,4..32,34..41,43..50 it would be nice to say --ignore-columns 33,42.

    This is not the same as #20 which only controls the display of the columns.

  • unknown flag: --separator / unknown shorthand flag: 's' in -s

    unknown flag: --separator / unknown shorthand flag: 's' in -s

    Tested with latest csvdiff_1.3.0_windows_amd64.tar.gz release.

    > csvdiff.exe --version
    csvdiff version 1.3.0
    
    > csvdiff.exe old.csv new.csv -p 0,1,2 --separator ";"
    csvdiff: command failed - unknown flag: --separator
    
    > csvdiff.exe old.csv new.csv -p 0,1,2 -s ";"
    csvdiff: command failed - unknown shorthand flag: 's' in -s
    
  • Support non-comma separator

    Support non-comma separator

    Currently, csvdiff supports only comma. csv.Reader allows overriding comma to any rune. We can extend this to make the rune configurable as a character so that TSVs or other separators can be supported

  • Error when compare

    Error when compare

    Hi, i tried csvdiff but i have an error comparing files : csvdiff failed: error processing base file: record on line 7: wrong number of fields on line 7, i have the same number of fields but in a field there seem to be a tabulation... The csv is generated by a program and it is separated by ; so why does it complain about the number of fields ? because of the tab in the field ?

    Thanks in advance

  • CSV columns not quoted properly in the output

    CSV columns not quoted properly in the output

    The following files, when diffed, don't produce the proper output:

    1.csv:

    1,"one,two"
    

    2.csv

    1,"one,four"
    
    csvdiff 1.csv 2.csv
    # Additions (0)
    # Modifications (1)
    - 1,one,two
    + 1,one,four
    

    The output looks like it has 3 columns, while both files only have two.

  • Add printing only added row mode

    Add printing only added row mode

    • add printing only added row mode(--format only-added)
    csvdiff hoge.csv huga.csv
    

    + 1,apple
    - 2,orrange
    + 2,cherry
    - 3,banana
    
    csvdiff hoge.csv huga.csv --format only-added
    

    1,apple
    2,cherry
    
  • panic: runtime error: index out of range [1] with length 0

    panic: runtime error: index out of range [1] with length 0

    Hey, I get a runtime issue when using this command:

    $ csvdiff --columns '1,2,3,4' -p 1 csv/bno_1582942483.csv csv/bno_1582992337.csv

    # Additions (2)
    + Qatar,1,0,
    + Ecuador,1,0,
    # Modifications (32)
    - Guangdong province,1348,7,"21 serious, 22 critical"
    + Guangdong province,1349,7,"18 serious, 20 critical"
    - Henan province,1272,20,"9 serious, 9 critical"
    + Henan province,1272,21,"7 serious, 6 critical"
    - Zhejiang province,1205,1,"10 serious, 11 critical"
    + Zhejiang province,1205,1,"10 serious, 9 critical"
    - Hunan province,1017,4,18 serious
    + Hunan province,1018,4,19 serious
    - Beijing,410,7,
    + Beijing,411,8,
    - Shanghai,337,3,"9 serious, 3 critical"
    + Shanghai,337,3,10 serious
    - Other regions,7321,64,
    + Other regions,7322,64,
    - TOTAL,5931,88,151 serious/critical
    + TOTAL,79251,2835,"7,664 serious39,002 recovered1,418 suspected"
    - South Korea,2931,16,"7 critical, 24 recovered"
    + South Korea,3150,17,"10 critical, 24 recovered"
    - Iran,388,34,73 recovered
    + Iran,593,43,123 recovered
    - Japan,234*,5,"20 serious, 32 recovered"
    + Japan,241*,5,"20 serious, 32 recovered"
    - Singapore,96,0,"8 critical, 66 recovered"
    + Singapore,102,0,"7 critical, 72 recovered"
    - Germany,48,0,"2 critical, 16 recovered"
    + Germany,79,0,"2 critical, 16 recovered"
    - United States,64,0,"1 serious, 6 recovered"
    + United States,66,0,"1 serious, 7 recovered"
    - France,57,2,"1 serious, 11 recovered"
    + France,73,2,"1 serious, 12 recovered"
    - Spain,32,0,"1 serious, 2 recovered"
    + Spain,45,0,"1 serious, 2 recovered"
    - Thailand,41,0,"2 serious, 15 recovered"
    + Thailand,42,0,"2 critical, 28 recovered"
    - Taiwan,34,1,5 recovered
    + Taiwan,39,1,5 recovered
    - Bahrain,38,0,
    + Bahrain,41,0,
    - United Kingdom,20,0,8 recovered
    + United Kingdom,23,0,8 recovered
    - Canada,15,0,3 recovered
    + Canada,16,0,3 recovered
    - Sweden,11,0,
    + Sweden,12,0,
    - Oman,6,0,
    + Oman,6,0,1 recovered
    - Austria,6,0,
    + Austria,9,0,
    - Croatia,5,0,
    + Croatia,6,0,
    - Pakistan,2,0,
    + Pakistan,4,0,
    - Georgia,1,0,
    + Georgia,3,0,
    - Norway,6,0,
    + Norway,7,0,
    - Denmark,2,0,
    + Denmark,3,0,
    - Netherlands,2,0,
    + Netherlands,6,0,
    - Mexico,2,0,
    + Mexico,3,0,
    panic: runtime error: index out of range [1] with length 0
    
    goroutine 1 [running]:
    github.com/aswinkarthik/csvdiff/pkg/digest.Positions.String(0xc0001360e0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x2c, 0xc000135840, 0xb)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/pkg/digest/positions.go:35 +0x320
    github.com/aswinkarthik/csvdiff/cmd.(*Formatter).lineDiff(0xc000197b28, 0xc00011cf00, 0x2, 0x2, 0xc000200000, 0x20, 0x20, 0xc00012a480, 0x1, 0x1, ...)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/cmd/formatter.go:203 +0x650
    github.com/aswinkarthik/csvdiff/cmd.(*Formatter).Format(0xc000197b28, 0xc00011cf00, 0x2, 0x2, 0xc000200000, 0x20, 0x20, 0xc00012a480, 0x1, 0x1, ...)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/cmd/formatter.go:49 +0x17f
    github.com/aswinkarthik/csvdiff/cmd.runContext(0xc000190000, 0x14759c0, 0xc000130008, 0x14759c0, 0xc000130010, 0x0, 0x0)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/cmd/root.go:108 +0x4fe
    github.com/aswinkarthik/csvdiff/cmd.glob..func2(0x16f5380, 0xc000132300, 0x2, 0x6, 0x0, 0x0)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/cmd/root.go:87 +0x392
    github.com/spf13/cobra.(*Command).execute(0x16f5380, 0xc000138010, 0x6, 0x6, 0x16f5380, 0xc000138010)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/.brew_home/go/pkg/mod/github.com/spf13/[email protected]/command.go:826 +0x453
    github.com/spf13/cobra.(*Command).ExecuteC(0x16f5380, 0xc00011ae90, 0xc00003ef48, 0x1042b3a)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/.brew_home/go/pkg/mod/github.com/spf13/[email protected]/command.go:914 +0x2fb
    github.com/spf13/cobra.(*Command).Execute(...)
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/.brew_home/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
    github.com/aswinkarthik/csvdiff/cmd.Execute()
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/cmd/root.go:115 +0x76
    main.main()
    	/private/tmp/csvdiff-20200229-14031-ibocos/csvdiff-1.4.0/main.go:29 +0x59
    

    CSV files:

    china,location,Cases,Deaths,Notes,date2,unixtime
    inside,Hubei province (includes Wuhan),66337,2727,"7,370 serious",Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Guangdong province,1348,7,"21 serious, 22 critical",Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Henan province,1272,20,"9 serious, 9 critical",Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Zhejiang province,1205,1,"10 serious, 11 critical",Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Hunan province,1017,4,18 serious,Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Beijing,410,7,,Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Shanghai,337,3,"9 serious, 3 critical",Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Other regions,7321,64,,Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,Undisclosed,4,2,,Fri Feb 28 18:14:43 PST 2020,1582942483
    inside,TOTAL,79251,2835,"7,664 serious39,002 recovered1,418 suspected",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Diamond Princess,705,6,"36 serious, 10 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,South Korea,2931,16,"7 critical, 24 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Italy,889,21,"64 serious, 46 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Japan,234*,5,"20 serious, 32 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Iran,388,34,73 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Singapore,96,0,"8 critical, 66 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Hong Kong,93,2,"4 critical, 2 serious, 18 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,United States,64,0,"1 serious, 6 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Thailand,41,0,"2 serious, 15 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Taiwan,34,1,5 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Malaysia,25,0,17 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Australia,25,0,15 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Germany,48,0,"2 critical, 16 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Vietnam,16,0,16 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,UAE,19,0,"2 serious, 5 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,United Kingdom,20,0,8 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,France,57,2,"1 serious, 11 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Canada,15,0,3 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Macau,10,0,6 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Bahrain,38,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Kuwait,45,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Iraq,8,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Philippines,3,1,2 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,India,3,0,3 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Russia,2,0,2 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Spain,32,0,"1 serious, 2 recovered",Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Oman,6,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Nepal,1,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Cambodia,1,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Sri Lanka,1,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Finland,3,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Sweden,11,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Belgium,1,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Egypt,1,0,1 recovered,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Israel,7,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Lebanon,4,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Afghanistan,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Austria,6,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Croatia,5,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Switzerland,8,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Algeria,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Brazil,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Greece,4,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Pakistan,2,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,North Macedonia,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Georgia,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Norway,6,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Romania,3,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Denmark,2,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Estonia,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Netherlands,2,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,San Marino,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Nigeria,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Lithuania,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,New Zealand,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Belarus,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Mexico,2,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Azerbaijan,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Iceland,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,Monaco,1,0,,Fri Feb 28 18:14:43 PST 2020,1582942483
    outside,TOTAL,5931,88,151 serious/critical,Fri Feb 28 18:14:43 PST 2020,1582942483
    
    china,location,Cases,Deaths,Notes,date2,unixtime
    inside,Hubei province (includes Wuhan),66337,2727,"7,370 serious",Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Guangdong province,1349,7,"18 serious, 20 critical",Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Henan province,1272,21,"7 serious, 6 critical",Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Zhejiang province,1205,1,"10 serious, 9 critical",Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Hunan province,1018,4,19 serious,Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Beijing,411,8,,Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Shanghai,337,3,10 serious,Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,Other regions,7322,64,,Sat Feb 29 08:05:37 PST 2020,1582992337
    inside,TOTAL,79251,2835,"7,664 serious39,002 recovered1,418 suspected",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,South Korea,3150,17,"10 critical, 24 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Italy,889,21,"64 serious, 46 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Diamond Princess,705,6,"36 serious, 10 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Iran,593,43,123 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Japan,241*,5,"20 serious, 32 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Singapore,102,0,"7 critical, 72 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Hong Kong,93,2,"4 critical, 2 serious, 18 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Germany,79,0,"2 critical, 16 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,United States,66,0,"1 serious, 7 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,France,73,2,"1 serious, 12 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Spain,45,0,"1 serious, 2 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Kuwait,45,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Thailand,42,0,"2 critical, 28 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Taiwan,39,1,5 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Bahrain,41,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Malaysia,25,0,17 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Australia,25,0,15 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,United Kingdom,23,0,8 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,UAE,19,0,"2 serious, 5 recovered",Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Vietnam,16,0,16 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Canada,16,0,3 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Sweden,12,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Macau,10,0,6 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Iraq,8,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Israel,7,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Oman,6,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Philippines,3,1,2 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,India,3,0,3 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Finland,3,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Russia,2,0,2 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Nepal,1,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Cambodia,1,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Sri Lanka,1,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Belgium,1,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Egypt,1,0,1 recovered,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Lebanon,4,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Afghanistan,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Austria,9,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Croatia,6,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Switzerland,8,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Algeria,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Brazil,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Greece,4,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Pakistan,4,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,North Macedonia,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Georgia,3,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Norway,7,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Romania,3,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Denmark,3,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Estonia,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Netherlands,6,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,San Marino,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Nigeria,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Lithuania,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,New Zealand,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Belarus,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Mexico,3,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Azerbaijan,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Iceland,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Monaco,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Qatar,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,Ecuador,1,0,,Sat Feb 29 08:05:37 PST 2020,1582992337
    outside,TOTAL,6461,98,154 serious/critical,Sat Feb 29 08:05:37 PST 2020,1582992337
    
  • Add support for unescaped quotes

    Add support for unescaped quotes

    Thank you for this very useful tool!

    These changes expose the CSV reader lazyQuote parameter, which allows support for unescaped quotes in data.

    e.g.

    fieldOneValue,column2Data,5" from wall,false
    
  • Detect also deleted rows when used with primary keys

    Detect also deleted rows when used with primary keys

    Hello there, I really love this tool, however when I do this:

    csvdiff old.txt new.txt --primary-key 0,1 --format json > diff.json

    Where for example old is:

    key,data a,1 b,2 c,3

    And new is:

    key,data b,5 c,3 d,6

    It will only detect that b has changed and d has been added but not that a has been deleted.

  • Add different formats

    Add different formats

    Fixes #14

    Formats added:

    1. git diff style (default now)
    2. git diff --color-words
    3. git diff --word-diff
    4. Updated JSON with original row for modifications
    5. Old formats are retained
  • Fixes #20: Include columns to display

    Fixes #20: Include columns to display

    As part of this PR, 3 changes are introduced

    • --include option to selectively display certain columns
    • Better error handling instead of blind panicking
    • Managing version is moved out
  • Any Way to Handle Unescaped Quotation Marks in Source Files

    Any Way to Handle Unescaped Quotation Marks in Source Files

    When processing pipe-separated files with unescaped quotation marks in fields, an error is generated, so source files have to be pre-processed to escape or remove quotation marks.

    "bare " in non-quoted-field"

  • Panic while word-diffing

    Panic while word-diffing

    I get an exception, while comparing two simple files using csvdiff -o word-diff:

    x,[-C-]{+A+}
    panic: runtime error: index out of range [0] with length 0
    
    goroutine 1 [running]:
    github.com/aswinkarthik/csvdiff/cmd.(*Formatter).wordLevelDiffs(0xc0000f7ac0, 0xb1a638, 0x0, 0x0, 0xc000182000, 0x2, 0x2, 0xb1a638, 0x0, 0x0, ...)
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/formatter.go:242 +0x1338
    github.com/aswinkarthik/csvdiff/cmd.(*Formatter).wordDiff(...)
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/formatter.go:216
    github.com/aswinkarthik/csvdiff/cmd.(*Formatter).Format(0xc0000f7ac0, 0xb1a638, 0x0, 0x0, 0xc000182000, 0x2, 0x2, 0xb1a638, 0x0, 0x0, ...)
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/formatter.go:51 +0x504
    github.com/aswinkarthik/csvdiff/cmd.runContext(0xc0000f0000, 0x884060, 0xc000010018, 0x884060, 0xc000010020, 0x0, 0x0)
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/root.go:108 +0x4c5
    github.com/aswinkarthik/csvdiff/cmd.glob..func2(0xaf89c0, 0xc0000243c0, 0x2, 0x4, 0x0, 0x0)
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/root.go:87 +0x283
    github.com/spf13/cobra.(*Command).execute(0xaf89c0, 0xc000012060, 0x4, 0x4, 0xaf89c0, 0xc000012060)
            /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:826 +0x460
    github.com/spf13/cobra.(*Command).ExecuteC(0xaf89c0, 0xc00005cda0, 0xc000059f20, 0x43cb8a)
            /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:914 +0x2fb
    github.com/spf13/cobra.(*Command).Execute(...)
            /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:864
    github.com/aswinkarthik/csvdiff/cmd.Execute()
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/cmd/root.go:115 +0x76
    main.main()
            /home/travis/gopath/src/github.com/aswinkarthik/csvdiff/main.go:29 +0x59
    

    Interestingly enough, this even happens with two equal files, including the difference found ([-C-]{+A+}). My example data, for both files:

    x,A
    x,B
    x,C
    

    I tested both DOS (CRLF) and UNIX (LF) line endings. I'm running version 1.4.0 in Ubuntu via WSL2.

  • Wrong number of fields

    Wrong number of fields

    "01dd672d-e078-46c5-ae4a-6cc284125664", "zoe", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36", "TtYuBV9fFy6HTlVl7AOVhuoZmH9hhxRa", "20be9326b2a53b98773c227fe1f745d8", NULL, "2022-06-20 08:54:14.751721+00", "2022-09-20 08:54:14.751721+00", "2022-06-20 08:54:14.751721+00", "2022-06-20 08:54:14.751721+00"
    "02c79b4a-d753-4397-9074-124b54fa7a47", "znhrbvqnxx", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0", "wu0IV4VwCtkJJ07TMEMp5bH6XdIq09MS", "4f01ab7686f5d680e256721cd53d2d45", NULL, "2022-09-05 10:03:25.568272+00", "2022-12-05 10:03:25.568272+00", "2022-09-05 10:03:25.568272+00", "2022-09-05 10:03:25.568272+00"
    

    Taking the 2 rows above, and writing them both to demo1.csv and demo2.csv, the output of:

    csvdiff dist/demo1.csv dist/demo2.csv --lazyquotes
    

    is csvdiff: command failed - error processing base file: record on line 2: wrong number of fields.

    (Note I'm writing the same data to both files.)

    Without --lazyquotes the issue is csvdiff: command failed - error in base-file: parse error on line 1, column 41: bare " in non-quoted-field.

    Any tips on what I'm doing wrong? This seems like valid, quoted CSV data.

    Thanks a lot!

  • new format

    new format "diff-file" as a delta between the base and the delta file

    I needed a delta between two gigantic CSV files for a project, your tool helped me a lot. As a thank you, I would like to play back the resulting customization and the new format, if you are interested.

    This PR provides a a new delta formatter diff-file.

    The following changes are included in the PR:

    • New format diff-file that creates a delta/diff CSV file between the base and delta files. The CSV file contains the headers of the delta file, as well as all additions and current modifications.
    • A formatter test to ensure the functionality of the new format
    • An update of the README containing the new format information

    Could also be related to Issue #19

  • Can we include more output at the end?

    Can we include more output at the end?

    Hello:

    I was wondering if we can include a line that says, how many lines/records were read from the first file. And what percentage of file is different from the other file. So basically, more stats at the end of the comparison.

Command-line tool to load csv and excel (xlsx) files and run sql commands
Command-line tool to load csv and excel (xlsx) files and run sql commands

csv-sql supports loading and saving results as CSV and XLSX files with data processing with SQLite compatible sql commands including joins.

Nov 2, 2022
A JSON diff utility

JayDiff A JSON diff utility. Install Downloading the compiled binary Download the latest version of the binary: releases extract the archive and place

Dec 11, 2022
Easy to use library and CLI utility to generate Go struct from CSV files.

csv2struct Easy to use library and CLI utility to generate Go struct from CSV files. As a benefit, it's fully compatible with csvutil. So, structs gen

Nov 7, 2022
A piece of software that shouldn't need to exist. Processes badly formatted outputs from Morgan Stanley Shareworks into usable CSV files.
A piece of software that shouldn't need to exist.  Processes badly formatted outputs from Morgan Stanley Shareworks into usable CSV files.

Shareworks-munger: A program to process Shareworks reports into Something Usable "Shareworks" is a product of Morgan Stanley which tracks certain kind

Jun 6, 2022
A simple script to run speedtest(offical) CLI tool and store the results in CSV

PeriodicBW A script made to run official speedtest.net binary periodically and store the results in a CSV file Installation Get the official speedtest

Aug 10, 2021
CLI tool to rank proposals according to Majority Judgment, from an input CSV
CLI tool to rank proposals according to Majority Judgment, from an input CSV

Majority Judgment CLI tool WORK IN PROGRESS Although the core mechanics are here and ranking does work, the features described in this README are not

Sep 10, 2022
Command line tool for processing client transaction data in CSV format

command line tool for processing client transaction data in CSV format. thank you for looking! build instructions: $ git clone https://github.com/lnit

Oct 29, 2021
Simple tool to convert a Parquet file to a CSV written in Go/ Golang

Parquet2CSV Parquet2CSV is a simple tool to convert a Parquet file to a CSV written in Go/ Golang Usage: parquet2csv <PATH_TO_PARQUET_FILE> This will

Nov 3, 2022
tinygo-used-files is a CLI tool that lists only the files to be built as specified by buildtag.

tinygo-used-files is a CLI tool that lists only the files to be built as specified by buildtag.

Feb 6, 2022
Related is a simple cli utility tool to create files or a group of files.

Related - Create files based on individual definitions or groups Related helps with common file-creation-based tasks. You can predefine single types a

Apr 16, 2022
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.

dasel Dasel (short for data-selector) allows you to query and modify data structures using selector strings. Comparable to jq / yq, but supports JSON,

Jan 2, 2023
A simple command line for convert CSV in JSON
A simple command line for convert CSV in JSON

C2J A simple command line for convert CSV in JSON list of objects based on header. Install With Go 1.17 or higher: go install github.com/edermanoel94/

Dec 14, 2022
Smart and simple CSV processing on the command line

csvquote smart and simple CSV processing on the command line Are you looking for a way to process CSV data with standard UNIX shell commands? Are you

Dec 30, 2022
TIled map editor CSV export conversion to C array

tiled2c This tool is very simplistic and is a variation of the one written to convert between Tiled map editor CSV exports and a format for the sega s

Nov 28, 2021
Named csv reader for go

Named csv reader Package named csv reader provides methods to easly read csv files and parse columsn to basic types. This package is not a CSV parser,

Nov 28, 2021
CLI filters the contents of the csv file according to the filters from the another file.

filtercsv CLI filters the contents of the csv file according to the filters from the another file. Made to process big files by a lots of filters. By

Dec 2, 2021
From the command line, quickly explore data from a CSV file.
From the command line, quickly explore data from a CSV file.

shallow-explore From the command line, quickly explore data from a CSV file. shallow-explore is a Golang backed command-line tool for iterating over c

Nov 10, 2022
Visualize CSV data in Golang

Golang CSV Viewer Visualize CSV data in Golang Run official example cd official go run . # in another terminal window open http://localhost:8089 bars

Jan 18, 2022
A simple command line functionality to convert your Kaspersky Password Manager exported file to CSV format

A simple command line functionality to convert your Kaspersky Password Manager exported file to CSV format

Apr 20, 2022