DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

dataframe-go

DataFrames are used for statistics, machine-learning, and data manipulation/exploration. You can think of a DataFrame as an Excel spreadsheet. This package is designed to be light-weight and intuitive.

⚠️ The package is production ready but the API is not stable yet. Once stability is reached, version 1.0.0 will be tagged. It is recommended your package manager locks to a commit id instead of the master branch directly. ⚠️

Star the project to show your appreciation.

Features

  1. Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
  2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
  3. Developer Friendly
  4. Flexible - Create custom Series (custom data types)
  5. Performant
  6. Interoperability with gonum package.
  7. pandas sub-package (help required)
  8. Fake data generation
  9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
  10. Time-series Forecasting (SES, Holt-Winters)
  11. Math functions
  12. Plotting (cross-platform)

See Tutorial here.

Installation

go get -u github.com/rocketlaunchr/dataframe-go
import dataframe "github.com/rocketlaunchr/dataframe-go"

DataFrames

Creating a DataFrame

s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())
  
OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Insert and Remove Row

df.Append(nil, 9, 123.6)

df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Update Row

df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})

Sorting

sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground
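
To make the ordering in the table above concrete, here is a minimal standard-library sketch of a two-key descending sort that places missing values last. It does not use dataframe-go and is not the library's implementation; the row type, the sortRows helper, and the "nil sorts last" rule are assumptions read off the output shown above.

```go
package main

import (
	"fmt"
	"sort"
)

// row mirrors one line of the table above; a nil Sales stands in for NaN.
type row struct {
	Day   int64
	Sales *float64
}

// sortRows orders rows by sales descending, then by day descending.
// Rows with missing sales go last (an assumption based on the output above).
func sortRows(rows []row) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		switch {
		case a.Sales == nil && b.Sales == nil:
			return a.Day > b.Day
		case a.Sales == nil:
			return false
		case b.Sales == nil:
			return true
		case *a.Sales != *b.Sales:
			return *a.Sales > *b.Sales
		default:
			return a.Day > b.Day
		}
	})
}

// f is a small helper that returns a pointer to a float64 literal.
func f(v float64) *float64 { return &v }

func main() {
	rows := []row{
		{2, f(23.4)}, {3, f(56.2)}, {4, nil}, {5, nil}, {6, f(84.2)},
		{7, f(72)}, {8, f(89)}, {9, f(123.6)}, {10, nil},
	}
	sortRows(rows)
	for _, r := range rows {
		fmt.Print(r.Day, " ")
	}
	fmt.Println() // prints "9 8 6 7 3 2 10 5 4", the DAY column above
}
```

The second key only matters when the first key ties, which here happens among the rows with missing sales.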

Iterating

You can change the step and starting row. It may be wise to lock the DataFrame before iterating.

The returned value is a map containing the name of the series (string) and the index of the series (int) as keys.

iterator := df.ValuesIterator(dataframe.ValuesOptions{InitialRow: 0, Step: 1, DontReadLock: true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1:<nil> day:4 0:4 sales:<nil>]
4 map[day:5 0:5 sales:<nil> 1:<nil>]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]

Go Playground
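
The shape of each returned map can be sketched with the standard library alone. The rowValues helper below is hypothetical and only illustrates why every value appears twice in the output above: once under the series name and once under the series index.

```go
package main

import "fmt"

// rowValues builds a map keyed by both series name (string) and series
// index (int), mimicking the shape printed above. It is a hypothetical
// helper, not part of dataframe-go.
func rowValues(names []string, vals []interface{}) map[interface{}]interface{} {
	out := make(map[interface{}]interface{}, 2*len(names))
	for i, name := range names {
		out[name] = vals[i] // lookup by series name
		out[i] = vals[i]    // lookup by series index
	}
	return out
}

func main() {
	vals := rowValues([]string{"day", "sales"}, []interface{}{int64(1), 50.3})
	fmt.Println(vals["day"], vals[0])   // same value under both keys
	fmt.Println(vals["sales"], vals[1]) // same value under both keys
}
```

Either key style can be used when reading a row; a missing value shows up as nil under both keys.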

Statistics

You can easily calculate statistics for a Series using the gonum or montanaflynn/stats package.

SeriesFloat64 and SeriesTime provide access to the exported Values field to seamlessly interoperate with external math-based packages.

Example

Some series provide easy conversion using the ToSeriesFloat64 method.

import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx)

Mean

mean := stat.Mean(sf.Values, nil)

Median

import "github.com/montanaflynn/stats"
median, _ := stats.Median(sf.Values)

Standard Deviation

std := stat.StdDev(sf.Values, nil)

Plotting (cross-platform)

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed

Output:

plot

Math Functions

import "github.com/rocketlaunchr/dataframe-go/math/funcs"
import "github.com/rocketlaunchr/dataframe-go/utils"

res := 24
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1)

Go Playground

Output:

sine wave

Importing Data

The imports sub-package supports importing from CSV, JSONL, and Parquet, as well as directly from a SQL database. Set the DictateDataType option to specify the true underlying data type of each field. Alternatively, set the InferDataTypes option.

CSV

csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+

Go Playground
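
The Id column above loses its leading zeros (01234 becomes 1234) because it ends up as INT64. The standard-library sketch below shows why any integer parse behaves this way. The parseCell helper is hypothetical and is not the library's inference logic; it only illustrates the effect. If such identifiers must keep their zeros, one option is to dictate a string type for that field (for example, an empty string as the DictateDataType value).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseCell is a hypothetical helper: it treats "NA" as missing, then
// tries int64, then float64, and finally falls back to the raw string.
func parseCell(s string) interface{} {
	if s == "NA" {
		return nil
	}
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	return s
}

func main() {
	data := "Country,Age,Amount,Id\n" +
		"Spain,66,555.42,00241\n" +
		"\"United Kingdom\",NA,18.2,12345\n"

	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records[1:] { // skip the header row
		for _, cell := range rec {
			v := parseCell(cell)
			fmt.Printf("%v (%T)  ", v, v)
		}
		fmt.Println()
	}
}
```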

Exporting Data

The exports sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.

Optimizations

  • If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using SeriesInit{}. This will preallocate memory and provide speed improvements.
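
The benefit of preallocating can be seen with a plain Go slice. This is only a sketch of the underlying effect, not a benchmark of dataframe-go, and the fill helper is hypothetical. The exact field to set on SeriesInit is not shown in this README, so check the package documentation.

```go
package main

import "fmt"

// fill appends n values to a slice created with the given initial capacity
// and counts how many times the backing array had to be reallocated.
func fill(n, initialCap int) (values []float64, grows int) {
	values = make([]float64, 0, initialCap)
	for i := 0; i < n; i++ {
		before := cap(values)
		values = append(values, float64(i))
		if cap(values) != before {
			grows++ // append had to allocate a larger backing array
		}
	}
	return values, grows
}

func main() {
	_, withoutHint := fill(100000, 0)
	_, withHint := fill(100000, 100000)
	fmt.Println("reallocations without capacity hint:", withoutHint)
	fmt.Println("reallocations with capacity hint:", withHint) // 0
}
```

Each reallocation copies the existing elements, so avoiding them is where the speed improvement comes from.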

Generic Series

Out of the box, there is support for string, time.Time, float64 and int64. Automatic support exists for float32 and all types of integers. There is a convenience function provided for dealing with bool. There is also support for complex128 inside the xseries subpackage.

There may be times that you want to use your own custom data types. You can either implement your own Series type (more performant) or use the Generic Series (more convenient).

civil.Date

import "time"
import "cloud.google.com/go/civil"

sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{Year: 2018, Month: time.May, Day: 1}, civil.Date{Year: 2018, Month: time.May, Day: 2}, civil.Date{Year: 2018, Month: time.May, Day: 3})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+

Tutorial

Create some fake data

Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.

import "time"
import "golang.org/x/exp/rand"
import "github.com/rocketlaunchr/dataframe-go/utils/faker"

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Apply Function

Let's give everyone a raise by doubling their base rate.

s := df.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Create a Time series

Let's inform all employees separately on sequential days.

import "github.com/rocketlaunchr/dataframe-go/utils/utime"

mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Nickolas Emard |      NaN       |    44     |   2020-02-03 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  | Stacy Dietrich |      NaN       |    86     |   2020-02-05 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 4:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 5:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 6:  | Sunny Gerlach  |      NaN       |    56     |   2020-02-08 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 7:  | Austin Hackett |      NaN       |    78     |   2020-02-09 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Filtering

Let's keep only our senior employees (those with a title), for no particular reason.

filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Other useful packages

  • awesome-svelte - Resources for killing react
  • dbq - Zero boilerplate database operations for Go
  • electron-alert - SweetAlert2 for Electron Applications
  • google-search - Scrape google search results
  • igo - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
  • mysql-go - Properly cancel slow MySQL queries
  • react - Build front end applications using Go
  • remember-go - Cache slow database queries
  • testing-go - Testing framework for unit testing

Legal Information

The license is a modified MIT license. Refer to LICENSE file for more details.

© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.

Comments
  • How to use CSVLoadOptions?

    Hi, I have a CSV file with four fields (USERID, MOVIEID, RATING, TIMESTAMP). By default, LoadFromCSV loads every field as a string. I want to load them as float64 instead, so I created a CSVLoadOptions:

    var csvOp imports.CSVLoadOptions
    csvOp.DictateDataType = make(map[string]interface{})
    csvOp.DictateDataType["USERID"] = float64(0)
    csvOp.DictateDataType["MOVIEID"] = float64(0)
    csvOp.DictateDataType["RATING"] = float64(0)
    csvOp.DictateDataType["TIMESTAMP"] = float64(0)

    ratingDf, err := imports.LoadFromCSV(ctx, file, csvOp)

    But loading fails with an error and I don't know why. Am I using CSVLoadOptions incorrectly?

  • Getting dataframe.ApplySeriesFn undefined error

    Thanks for creating this library!

    I can get this code to work:

    ctx := context.TODO()
    
    // step 1: open the csv
    csvfile, err := os.Open("data/example.csv")
    if err != nil {
    	log.Fatal(err)
    }
    
    dataframe, err := imports.LoadFromCSV(ctx, csvfile)
    

    Here's the data that's printed:

    fmt.Print(dataframe.Table())
    
    +-----+------------+-----------------+
    |     | FIRST NAME | FAVORITE NUMBER |
    +-----+------------+-----------------+
    | 0:  |  matthew   |       23        |
    | 1:  |   daniel   |        8        |
    | 2:  |  allison   |       42        |
    | 3:  |   david    |       18        |
    +-----+------------+-----------------+
    | 4X2 |   STRING   |     STRING      |
    +-----+------------+-----------------+
    

    I cannot get this code working:

    s := dataframe.Series[2]
    
    applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
    	return 2 * val.(int64)
    })
    
    dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
    
    fmt.Print(dataframe.Table())
    

    Here's the error message:

    ./dataframe_go.go:36:22: dataframe.ApplySeriesFn undefined (type *dataframe.DataFrame has no field or method ApplySeriesFn)
    ./dataframe_go.go:40:11: dataframe.Apply undefined (type *dataframe.DataFrame has no field or method Apply)
    ./dataframe_go.go:40:44: dataframe.FilterOptions undefined (type *dataframe.DataFrame has no field or method FilterOptions)
    

    Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go

    Sorry if this is a basic question. I am a Go newbie!

    Thanks again for making this library!

  • Reading from Parquet

    Hello,

    Are there any plans to support reading a Parquet file into a dataframe? I have a need for this and am evaluating this library to use in an application.

    Thanks!

  • Expand docs to include other common dataframe operations, etc.

    Greetings!

    Just a minor suggestion, but if you have the time, it could be useful to expand the docs a bit more to cover some additional common operations applied to dataframe-like structures, where supported.

    For example:

    • retrieving a single row
    • retrieving a single column
    • selecting row/column subsets by indices or ranges
    • selecting a single value by <row, column> indices

    Further, one other thing I noticed when employing the package for the first time, is that many of the dataframe.xx() function calls include a nil as the first argument.

    From looking at the code for dataframe.go, these appear to be relating to an optional Options struct, so it makes sense that this would be set to nil in many instances. It may just be worth mentioning this explicitly in the examples for .Append() in the docs.

    Finally two other things that could be useful to consider including in the docs:

    • Limitations compared with R/pandas
    • Cheatsheet of commands comparing dataframe-go with R/pandas (more effort, and probably better suited for a separate wiki page, etc., but would be really useful for people coming from these worlds..)

    Thanks for taking the time to put together and share this really useful package!

  • How to convert all the data in a dataframe to a gonum dense matrix?

    I want to use this package, but I ran into a problem: the values of SeriesInt64 are private. Why?

    Could you explain how to convert a dataframe to a gonum dense matrix?

    Also, how do I use LoadFromCSV(ctx, strings.NewReader(csvStr))? Which ctx should I pass, and how do I define the context.Context?

  • Draw graphs from columns of dataframe

    Hi! At the moment I have managed to plot a separate dataframe column by this strange method:

    func main() {
    	// all values of df are strings representing floating point numbers
    	df, err := imports.LoadFromCSV(ctx, r, imports.CSVLoadOptions{Comma: ';'})
    	if err != nil {
    		panic(err)
    	}
    	s := df.Series[2] // trying to plot column 2
    	series := dataframe.NewSeriesFloat64("test_name", nil, nil)
    
    	i := s.ValuesIterator(dataframe.ValuesOptions{InitialRow: 0, Step: 1, DontReadLock: false})
    	for {
    		row, vals, _ := i()
    		if row == nil {
    			break
    		}
    		val, err := strconv.ParseFloat(vals.(string), 64)
    		if err != nil {
    			continue
    		}
    		series.Append(val)
    	}
    	Plot(series)
    }
    
    func Plot(ser *dataframe.SeriesFloat64) {
    	ctx := context.TODO()
    	cs, _ := wcharczuk_chart.S(ctx, ser, nil, nil)
    	graph := chart.Chart{
    		Title:  "test_graph",
    		Width:  640,
    		Height: 480,
    		Series: []chart.Series{cs},
    	}
    	f, err := os.Create("graph.svg")
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close()
    
    	plt := bufio.NewWriter(f)
    	defer plt.Flush() // flush buffered SVG data before the file is closed
    	_ = graph.Render(chart.SVG, plt)
    }
    

    Is there a simpler or more elegant way to do this? Also, can I plot several columns on one plot, and if so, how? Thanks in advance.

  • LoadFromJSON Not Working

    files, err := ioutil.ReadFile("device.json")
    if err != nil {
    	fmt.Println(err)
    }

    var ctx = context.Background()
    df2, _ := imports.LoadFromJSON(ctx, strings.NewReader(string(files)))
    
    fmt.Println(df2.Table())
    
  • Add support for CSV without headers row

    This adds support for importing CSV files without a header row.

    If the ColumnNames option is specified, it is used to set the series names instead of reading them from the first row.

    It also moves the if row == 0 { check outside the for loop so that it is not evaluated for every row read.

  • Inconsistent behavior for Apply when using with ApplyDataFrameFn

    I'm trying to concatenate two columns in a dataframe and put it into a new column. The behavior is very inconsistent. Sometimes the strings are concatenated into the new column. Sometimes the value is just set to NaN.

    In this run, the value for concat_contact_number in the resulting dataframe was correctly set to 97312345678. The map value for concat_contact_number also reflects the concatenated value.

    Expected output:

    $ go run main.go 
    INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
    INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
    INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
    INFO[0000] In prepareDataframe:                         
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |      97312345678      |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    INFO[0000] In main:                                     
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |      97312345678      |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    

    In this run, the value for concat_contact_number in the resulting dataframe was incorrectly set to NaN. Same as with the correct run, the map value for concat_contact_number is also set to the expected concatenated value.

    Erroneous output:

    $ go run main.go 
    INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
    INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
    INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
    INFO[0000] In prepareDataframe:                         
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |          NaN          |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    INFO[0000] In main:                                     
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |          NaN          |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    

    It can be observed that in both cases the map value for 2 is always <nil>. Is this expected?

    Run this code several times to see the output vary. The issue may not show up immediately: sometimes it takes 10 runs, sometimes only 2. Again, the behavior is inconsistent.

    Working code:

    package main
    
    import (
    	"context"
    	"fmt"
    	"strings"
    
    	dataframe "github.com/rocketlaunchr/dataframe-go"
    	"github.com/rocketlaunchr/dataframe-go/imports"
    	log "github.com/sirupsen/logrus"
    )
    
    // applyConcatDf returns an ApplyDataFrameFn that concatenates the given column names into another column
    func applyConcatDf(dest_column string, columns []string) dataframe.ApplyDataFrameFn {
    	return func(vals map[interface{}]interface{}, row, nRows int) map[interface{}]interface{} {
    		vals[dest_column] = ""
    		for _, key := range columns {
    			log.Infof("vals[%s]: %s", key, vals[key].(string))
    			vals[dest_column] = vals[dest_column].(string) + vals[key].(string)
    			log.Infof("vals[%s]: %s", dest_column, vals[dest_column].(string))
    		}
    
    		log.Infof("vals: %v", vals)
    		return vals
    	}
    }
    
    // setupDataframe initializes the dataframe from a CSV string
    func setupDataframe() *dataframe.DataFrame {
    	ctx := context.Background()
    
    	csvStr := `contact_number_country_code,contact_number
    "973","12345678"`
    
    	df, _ := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
    		DictateDataType: map[string]interface{}{
    			"contact_number_country_code": "",
    			"contact_number":              "",
    		},
    	})
    
    	return df
    }
    
    // prepareDataframe applies the concatenation on the loaded dataframe
    func prepareDataframe(df *dataframe.DataFrame) {
    	ctx := context.Background()
    
    	sConcatContactNumber := dataframe.NewSeriesString("concat_contact_number", &dataframe.SeriesInit{Size: df.NRows()})
    	df.AddSeries(sConcatContactNumber, nil)
    
    	_, err := dataframe.Apply(ctx, df, applyConcatDf("concat_contact_number", []string{"contact_number_country_code", "contact_number"}), dataframe.FilterOptions{InPlace: true})
    
    	if err != nil {
    		log.WithError(err).Error("concatenation cannot be applied")
    	}
    
    	fmt.Println(df)
    }
    
    func main() {
    	df := setupDataframe()
    	prepareDataframe(df)
    	fmt.Println(df)
    }
    
  • Getting back Float64/Int64/Mixed series from dataframe

    I wanted to know if there is a way to convert a Series interface back to the original series type (Float64/Int64/Mixed) underneath it. I will describe my use case.

    After creating a dataframe, I am trying to use gonum to do some analysis, e.g. linear regression of two series from the dataframe. But for this I have to iterate over the whole series (using ValuesIterator) to copy each element into a []float64, which is what gonum requires. ToSeriesFloat64 does not help since it is not implemented by Series.

    Is there an easier way to access the whole underlying series as the corresponding concrete slice?

  • Potential collision and risk from indirect dependence on "github.com/gotestyourself/gotestyourself"

    Background

    The rocketlaunchr/dataframe-go repo imports gotestyourself indirectly through its old path. As a result, github.com/gotestyourself/gotestyourself and gotest.tools coexist in this repo: https://github.com/rocketlaunchr/dataframe-go/blob/master/go.mod (lines 20 and 40)

    github.com/gotestyourself/gotestyourself v2.2.0+incompatible // indirect
    gotest.tools v2.2.0+incompatible // indirect 
    

    That's because gotestyourself has already renamed its import path from "github.com/gotestyourself/gotestyourself" to "gotest.tools". When you import gotestyourself through the old path "github.com/gotestyourself/gotestyourself", gotest.tools is reintroduced through the "gotest.tools/..." import statements in gotestyourself's own Go source files.

    https://github.com/gotestyourself/gotest.tools/blob/v2.2.0/fs/example_test.go#L8

    package fs_test
    import (
    	…
    	"gotest.tools/assert"
    	"gotest.tools/assert/cmp"
    	"gotest.tools/fs"
    	"gotest.tools/golden"
    )
    

    "github.com/gotestyourself/gotestyourself" and "gotest.tools" are the same repo. This works in isolation, but it brings potential risks and problems.

    Solution

    Add replace statement in the go.mod file:

    replace github.com/gotestyourself/gotestyourself => gotest.tools v2.2.0
    

    Then clean the go.mod.

  • Progress for re-write of dataframe-go?

    It's written in the README file that "Once Go 1.18 (Generics) is introduced, the ENTIRE package will be rewritten." As Go 1.18 has been out for a while, I'm wondering whether work has started on rewriting the entire package. If so, how is it progressing?

  • Indirect dependency `github.com/blend/go-sdk v1.1.1` does not exist

    I suspect that the library maintainers prepended "legacy-" to versions before changing the versioning scheme. At the least, this dependency should be updated to legacy-v1.1.1.

  • Error to read parquet with latest parquet-go

    1. Create a file with Python pandas
    dataframe = pandas.DataFrame({
            "A": ["a", "b", "c", "d"],
            "B": [2, 3, 4, 1],
            "C": [10, 20, None, None]
        })
    
    dataframe.to_parquet("1.parquet")
    

    2. Read this file
    func main() {
        ctx := context.Background()
        fr, _ := local.NewLocalFileReader("1.parquet")
        df, err := imports.LoadFromParquet(ctx, fr)
        if err != nil {
            panic(err)
        }
        fmt.Println(df)
    }
    
    3. Got a unique name error
    panic: names of series must be unique: 
    
    goroutine 1 [running]:
    github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
            .../rocketlaunchr/[email protected]/dataframe.go:41 +0x33c
    github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
            .../go/pkg/mod/github.com/rocketlaunchr/[email protected]/imports/parquet.go:110 +0x8ae
    main.main()
            .../main.go:13 +0x78
    
    4. Following the stack, I found some useful information:
    • All series in imports.LoadFromParquet have empty names.
    • Every key in the goFieldNameToActual map has the prefix "Scheme", but goName does not, which may be why no name can be found in this map.

    This is the first time I have used Go to read parquet files. Is this error caused by breaking changes in parquet-go, or by something else?
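
    The suspected failure mode can be sketched in isolation. This is not the dataframe-go or parquet-go source: resolveNames, checkUnique, and the "Scheme"-prefixed key format are hypothetical stand-ins based only on the observations in this report. The sketch shows how a lookup with unprefixed names against prefixed keys returns an empty string for every column, which a uniqueness check then rejects.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveNames is a hypothetical helper: it looks up each Go field name in a
// map from Go field names to the column names stored in the file.
// A missing key yields the zero value "".
func resolveNames(goNames []string, goFieldNameToActual map[string]string) []string {
	names := make([]string, len(goNames))
	for i, goName := range goNames {
		names[i] = goFieldNameToActual[goName]
	}
	return names
}

// checkUnique is a hypothetical stand-in for a "series names must be unique"
// validation. It returns an error on the first repeated name.
func checkUnique(names []string) error {
	seen := make(map[string]struct{}, len(names))
	for _, n := range names {
		if _, dup := seen[n]; dup {
			return errors.New("names of series must be unique: " + n)
		}
		seen[n] = struct{}{}
	}
	return nil
}

func main() {
	// Assumption: keys carry a "Scheme" prefix while the looked-up names do not.
	// The exact key format here is made up for illustration.
	lookup := map[string]string{"SchemeA": "A", "SchemeB": "B", "SchemeC": "C"}
	goNames := []string{"A", "B", "C"}

	names := resolveNames(goNames, lookup)
	fmt.Printf("%q\n", names) // every lookup misses, so all names are empty

	if err := checkUnique(names); err != nil {
		fmt.Println(err) // the repeated empty name is reported as a duplicate
	}
}
```

    If the lookup keys and the looked-up names used the same format, each column would resolve to a distinct name and the uniqueness check would pass.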

  • Bad import, was an upstream dependency deleted?

    go: github.com/sjwhitworth/[email protected] requires
            github.com/rocketlaunchr/[email protected] requires
            github.com/blend/go-sdk@v1.1.1: reading github.com/blend/go-sdk/go.mod at revision v1.1.1: unknown revision v1.1.1
    

    It looks like v1.1.1 of github.com/blend/go-sdk is missing. Are you seeing the same or am I taking crazy pills today?
