DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

dataframe-go

Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet. This package is designed to be lightweight and intuitive.

⚠️ The package is production ready but the API is not yet stable. Once stability is reached, version 1.0.0 will be tagged. It is recommended that you lock your package manager to a specific commit id rather than tracking the master branch directly. ⚠️
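
For example, with Go modules you can pin the dependency to a specific commit (the commit hash below is just a placeholder):

go get github.com/rocketlaunchr/dataframe-go@<commit-hash>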

Star the project to show your appreciation.

Features

  1. Importing from CSV, JSONL, MySQL & PostgreSQL
  2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
  3. Developer Friendly
  4. Flexible - Create custom Series (custom data types)
  5. Performant
  6. Interoperability with the gonum package
  7. pandas sub-package (help required)
  8. Fake data generation
  9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
  10. Time-series Forecasting (SES, Holt-Winters)
  11. Math functions
  12. Plotting (cross-platform)

See the Tutorial section below.

Installation

go get -u github.com/rocketlaunchr/dataframe-go
import dataframe "github.com/rocketlaunchr/dataframe-go"

DataFrames

Creating a DataFrame

s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())
  
OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Insert and Remove Row

df.Append(nil, 9, 123.6)

df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Update Row

df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})

Sorting

sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Iterating

You can change the step and starting row. It may be wise to lock the DataFrame before iterating.

The returned value is a map containing the name of the series (string) and the index of the series (int) as keys.

iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1:<nil> day:4 0:4 sales:<nil>]
4 map[day:5 0:5 sales:<nil> 1:<nil>]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]

Go Playground

Statistics

You can easily calculate statistics for a Series using the gonum or montanaflynn/stats package.

SeriesFloat64 and SeriesTime provide access to the exported Values field to seamlessly interoperate with external math-based packages.

Example

Some series provide easy conversion using the ToSeriesFloat64 method.

import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx)

Mean

mean := stat.Mean(sf.Values, nil)

Median

import "github.com/montanaflynn/stats"
median, _ := stats.Median(sf.Values)

Standard Deviation

std := stat.StdDev(sf.Values, nil)

Plotting (cross-platform)

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed

Output:

plot

Math Functions

import "github.com/rocketlaunchr/dataframe-go/math/funcs"

res := 24
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1)

Go Playground

Output:

sine wave

Importing Data

The imports sub-package has support for importing csv, jsonl and directly from a SQL database. The DictateDataType option can be set to specify the true underlying data type. Alternatively, InferDataTypes option can be set.

CSV

csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+

Go Playground
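
To dictate the underlying data types of specific columns, you can pass a CSVLoadOptions value to LoadFromCSV. A minimal sketch for the CSV above (how "NA" values are handled depends on the other options you combine it with):

opts := imports.CSVLoadOptions{
	DictateDataType: map[string]interface{}{
		"Country": "",         // load as string
		"Age":     int64(0),   // load as int64
		"Amount":  float64(0), // load as float64
		"Id":      int64(0),   // load as int64
	},
}

df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), opts)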

Exporting Data

The exports sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.
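
A minimal sketch of writing a DataFrame to a CSV file, assuming the exports sub-package exposes an ExportToCSV(ctx, w, df) function that accepts any io.Writer:

import (
	"context"
	"log"
	"os"

	"github.com/rocketlaunchr/dataframe-go/exports"
)

f, err := os.Create("sales.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

// Write df (created earlier) to the file as CSV.
if err := exports.ExportToCSV(context.Background(), f, df); err != nil {
	log.Fatal(err)
}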

Optimizations

  • If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using SeriesInit{}. This will preallocate memory and provide speed improvements (see the sketch below).
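
For example, a minimal sketch that preallocates room for 10,000 rows, assuming SeriesInit exposes Size and Capacity fields:

// Size creates that many (nil-filled) rows up front; Capacity preallocates
// extra room so subsequent appends avoid reallocations.
init := &dataframe.SeriesInit{Size: 0, Capacity: 10000}

s1 := dataframe.NewSeriesInt64("day", init)
s2 := dataframe.NewSeriesFloat64("sales", init)
df := dataframe.NewDataFrame(s1, s2)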

Generic Series

Out of the box, there is support for string, time.Time, float64 and int64. Automatic support exists for float32 and all types of integers. There is a convenience function provided for dealing with bool. There is also support for complex128 inside the xseries subpackage.

There may be times that you want to use your own custom data types. You can either implement your own Series type (more performant) or use the Generic Series (more convenient).

civil.Date

import "time"
import "cloud.google.com/go/civil"

sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{2018, time.May, 01}, civil.Date{2018, time.May, 02}, civil.Date{2018, time.May, 03})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+

Tutorial

Create some fake data

Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.

import "golang.org/x/exp/rand"
import "rocketlaunchr/dataframe-go/utils/faker"

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Apply Function

Let's give a promotion to everyone by doubling their salary.

s := df.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Create a Time series

Let's inform all employees separately on sequential days.

import "rocketlaunchr/dataframe-go/utils/utime"

mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Nickolas Emard |      NaN       |    44     |   2020-02-03 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  | Stacy Dietrich |      NaN       |    86     |   2020-02-05 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 4:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 5:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 6:  | Sunny Gerlach  |      NaN       |    56     |   2020-02-08 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 7:  | Austin Hackett |      NaN       |    78     |   2020-02-09 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Filtering

Let's filter out our senior employees (they have titles) for no reason.

filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Other useful packages

  • awesome-svelte - Resources for killing react
  • dbq - Zero boilerplate database operations for Go
  • electron-alert - SweetAlert2 for Electron Applications
  • google-search - Scrape google search results
  • igo - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
  • mysql-go - Properly cancel slow MySQL queries
  • react - Build front end applications using Go
  • remember-go - Cache slow database queries
  • testing-go - Testing framework for unit testing

Legal Information

The license is a modified MIT license. Refer to the LICENSE file for more details.

© 2018-20 PJ Engineering and Business Solutions Pty. Ltd.

Comments
  • How to use CSVLoadOptions?

    Hi, I have a CSV file with four fields (USERID, MOVIEID, RATING, TIMESTAMP). LoadFromCSV loads all fields as string by default, but I want them loaded as float64, so I created a CSVLoadOptions:

    var csvOp imports.CSVLoadOptions
    csvOp.DictateDataType = make(map[string]interface{})
    csvOp.DictateDataType["USERID"] = float64(0)
    csvOp.DictateDataType["MOVIEID"] = float64(0)
    csvOp.DictateDataType["RATING"] = float64(0)
    csvOp.DictateDataType["TIMESTAMP"] = float64(0)

    ratingDf, err := imports.LoadFromCSV(ctx, file, csvOp)

    But the load fails with an error and I don't know why. Am I using CSVLoadOptions incorrectly?

  • Getting dataframe.ApplySeriesFn undefined error

    Thanks for creating this library!

    I can get this code to work:

    ctx := context.TODO()
    
    // step 1: open the csv
    csvfile, err := os.Open("data/example.csv")
    if err != nil {
    	log.Fatal(err)
    }
    
    dataframe, err := imports.LoadFromCSV(ctx, csvfile)
    

    Here's the data that's printed:

    fmt.Print(dataframe.Table())
    
    +-----+------------+-----------------+
    |     | FIRST NAME | FAVORITE NUMBER |
    +-----+------------+-----------------+
    | 0:  |  matthew   |       23        |
    | 1:  |   daniel   |        8        |
    | 2:  |  allison   |       42        |
    | 3:  |   david    |       18        |
    +-----+------------+-----------------+
    | 4X2 |   STRING   |     STRING      |
    +-----+------------+-----------------+
    

    I cannot get this code working:

    s := dataframe.Series[2]
    
    applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
    	return 2 * val.(int64)
    })
    
    dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
    
    fmt.Print(dataframe.Table())
    

    Here's the error message:

    ./dataframe_go.go:36:22: dataframe.ApplySeriesFn undefined (type *dataframe.DataFrame has no field or method ApplySeriesFn)
    ./dataframe_go.go:40:11: dataframe.Apply undefined (type *dataframe.DataFrame has no field or method Apply)
    ./dataframe_go.go:40:44: dataframe.FilterOptions undefined (type *dataframe.DataFrame has no field or method FilterOptions)
    

    Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go

    Sorry if this is a basic question. I am a Go newbie!

    Thanks again for making this library!

  • Reading from Parquet

    Hello,

    Are there any plans to support reading a Parquet file into a dataframe? I have a need for this and am evaluating this library to use in an application.

    Thanks!

  • Expand docs to include other common dataframe operations, etc.

    Greetings!

    Just a minor suggestion, but if you have the time, it could be useful to expand the docs a bit more to cover some additional common operations applied to dataframe-like structures, where supported.

    For example:

    • retrieving a single row
    • retrieving a single column
    • selecting row/column subsets by indices or ranges
    • selecting a single value by <row, column> indices

    Further, one other thing I noticed when using the package for the first time is that many of the dataframe.xx() function calls include a nil as the first argument.

    From looking at the code for dataframe.go, these appear to be relating to an optional Options struct, so it makes sense that this would be set to nil in many instances. It may just be worth mentioning this explicitly in the examples for .Append() in the docs.

    Finally two other things that could be useful to consider including in the docs:

    • Limitations compared with R/pandas
    • Cheatsheet of commands comparing dataframe-go with R/pandas (more effort, and probably better suited for a separate wiki page, etc., but would be really useful for people coming from these worlds..)

    Thanks for taking the time to put together and share this really useful package!

  • How to convert all DataFrame data to a gonum dense matrix?

    I want to use this package, but I found a problem: why did you make the values property of SeriesInt64 private?

    Could you explain how to convert a dataframe to a gonum dense matrix?

    Also, how should LoadFromCSV(ctx, strings.NewReader(csvStr)) be used? Which ctx should I pass, and how do I define the context.Context?

  • Draw graphs from columns of a dataframe

    Hi! At the moment I have managed to plot a separate dataframe column by this strange method:

    func main() {
            // all values of df are strings representing floating point numbers
    	df, err := imports.LoadFromCSV(ctx, r, imports.CSVLoadOptions{Comma: ';'})
    	s := df.Series[2] // trying to plot column 2
    	series := dataframe.NewSeriesFloat64("test_name", nil, nil)
    
    	i := s.ValuesIterator(dataframe.ValuesOptions{InitialRow: 0, Step: 1, DontReadLock: false})
    	for {
    		row, vals, _ := i()
    		if row == nil {
    			break
    		}
    		val, err := strconv.ParseFloat(vals.(string), 64)
    		if err != nil {
    			continue
    		}
    		series.Append(val)
    	}
    	Plot(series)
    }
    
    func Plot(ser *dataframe.SeriesFloat64) {
    	ctx := context.TODO()
    	cs, _ := wcharczuk_chart.S(ctx, ser, nil, nil)
    	graph := chart.Chart{
    		Title:  "test_graph",
    		Width:  640,
    		Height: 480,
    		Series: []chart.Series{cs},
    	}
    	f, err := os.Create("graph.svg")
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close()
    
    	plt := bufio.NewWriter(f)
    	_ = graph.Render(chart.SVG, plt)
    }
    

    Is there any simpler or more elegant way to do this? Another question: can I plot several columns on one plot? If so, how can I do it? Thanks in advance.

  • LoadFromJSON Not Working

    files, err := ioutil.ReadFile("device.json")
    if err != nil {
    	fmt.Println(err)
    }
    
    var ctx = context.Background()
    df2, _ := imports.LoadFromJSON(ctx, strings.NewReader(string(files)))
    
    fmt.Println(df2.Table())
    
  • Add support for CSV without a header row

    This simply adds support for importing CSV files without a header row.

    If the ColumnNames option is specified, it is used to set the series names instead of reading them from the first row.

    It also moves the if row == 0 { check outside the for loop to avoid performing the check for every row read.

  • Inconsistent behavior for Apply when used with ApplyDataFrameFn

    I'm trying to concatenate two columns in a dataframe and put the result into a new column. The behavior is very inconsistent. Sometimes the strings are concatenated into the new column; sometimes the value is just set to NaN.

    In this run, the value for concat_contact_number in the resulting dataframe was correctly set to 97312345678. The map value for concat_contact_number also reflects the concatenated value.

    Expected output:

    $ go run main.go 
    INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
    INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
    INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
    INFO[0000] In prepareDataframe:                         
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |      97312345678      |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    INFO[0000] In main:                                     
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |      97312345678      |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    

    In this run, the value for concat_contact_number in the resulting dataframe was incorrectly set to NaN. Same as with the correct run, the map value for concat_contact_number is also set to the expected concatenated value.

    Erroneous output:

    $ go run main.go 
    INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
    INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
    INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
    INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
    INFO[0000] In prepareDataframe:                         
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |          NaN          |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    INFO[0000] In main:                                     
    INFO[0000] +-----+-----------------------------+----------------+-----------------------+
    |     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
    +-----+-----------------------------+----------------+-----------------------+
    | 0:  |             973             |    12345678    |          NaN          |
    +-----+-----------------------------+----------------+-----------------------+
    | 1X3 |           STRING            |     STRING     |        STRING         |
    +-----+-----------------------------+----------------+-----------------------+ 
    

    It can be observed that in both cases the map value for 2 is always <nil>. Is this expected?

    Run this code several times to see the deviations in the output. The issue may not show up immediately: sometimes it takes 10 runs, sometimes only 2. Again, the behavior is inconsistent.

    Working code:

    package main
    
    import (
    	"context"
    	"fmt"
    	"strings"
    
    	dataframe "github.com/rocketlaunchr/dataframe-go"
    	"github.com/rocketlaunchr/dataframe-go/imports"
    	log "github.com/sirupsen/logrus"
    )
    
    // applyConcatDf returns an ApplyDataFrameFn that concatenates the given column names into another column
    func applyConcatDf(dest_column string, columns []string) dataframe.ApplyDataFrameFn {
    	return func(vals map[interface{}]interface{}, row, nRows int) map[interface{}]interface{} {
    		vals[dest_column] = ""
    		for _, key := range columns {
    			log.Infof("vals[%s]: %s", key, vals[key].(string))
    			vals[dest_column] = vals[dest_column].(string) + vals[key].(string)
    			log.Infof("vals[%s]: %s", dest_column, vals[dest_column].(string))
    		}
    
    		log.Infof("vals: %v", vals)
    		return vals
    	}
    }
    
    // setupDataframe initializes the dataframe from a CSV string
    func setupDataframe() *dataframe.DataFrame {
    	ctx := context.Background()
    
    	csvStr := `contact_number_country_code,contact_number
    "973","12345678"`
    
    	df, _ := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
    		DictateDataType: map[string]interface{}{
    			"contact_number_country_code": "",
    			"contact_number":              "",
    		},
    	})
    
    	return df
    }
    
    // prepareDataframe applies the concatenation on the loaded dataframe
    func prepareDataframe(df *dataframe.DataFrame) {
    	ctx := context.Background()
    
    	sConcatContactNumber := dataframe.NewSeriesString("concat_contact_number", &dataframe.SeriesInit{Size: df.NRows()})
    	df.AddSeries(sConcatContactNumber, nil)
    
    	_, err := dataframe.Apply(ctx, df, applyConcatDf("concat_contact_number", []string{"contact_number_country_code", "contact_number"}), dataframe.FilterOptions{InPlace: true})
    
    	if err != nil {
    		log.WithError(err).Error("concatenation cannot be applied")
    	}
    
    	fmt.Println(df)
    }
    
    func main() {
    	df := setupDataframe()
    	prepareDataframe(df)
    	fmt.Println(df)
    }
    
  • Getting back Float64/Int64/Mixed series from dataframe

    I wanted to know if there is a way to convert a series interface back to the original type of series (Float64/Int64/Mixed) underneath it. I will describe my use case.

    After creating a dataframe, I am trying to use gonum to do some analysis, for example a linear regression of two series from the dataframe. For this I have to iterate over the whole series (using ValuesIterator) to copy each element into a []float64, which is what gonum requires. ToSeriesFloat64 does not help since it is not part of the Series interface.

    Is there an easier way to access the whole underlying series as the corresponding concrete slice?

  • Potential collision and risk from the indirect dependency "github.com/gotestyourself/gotestyourself"

    Background

    The rocketlaunchr/dataframe-go repo uses the old path to import gotestyourself indirectly. This causes github.com/gotestyourself/gotestyourself and gotest.tools to coexist in this repo: https://github.com/rocketlaunchr/dataframe-go/blob/master/go.mod (lines 20 & 40)

    github.com/gotestyourself/gotestyourself v2.2.0+incompatible // indirect
    gotest.tools v2.2.0+incompatible // indirect 
    

    That’s because gotestyourself has renamed its import path from "github.com/gotestyourself/gotestyourself" to "gotest.tools". When you use the old path "github.com/gotestyourself/gotestyourself" to import gotestyourself, it is reintroduced through the "gotest.tools" import statements in gotestyourself's own source files.

    https://github.com/gotestyourself/gotest.tools/blob/v2.2.0/fs/example_test.go#L8

    package fs_test
    import (
    	…
    	"gotest.tools/assert"
    	"gotest.tools/assert/cmp"
    	"gotest.tools/fs"
    	"gotest.tools/golden"
    )
    

    "github.com/gotestyourself/gotestyourself" and "gotest.tools" are the same repos. This will work in isolation, bring about potential risks and problems.

    Solution

    Add replace statement in the go.mod file:

    replace github.com/gotestyourself/gotestyourself => gotest.tools v2.2.0
    

    Then clean the go.mod.

  • Progress for re-write of dataframe-go?

    The README states that "Once Go 1.18 (Generics) is introduced, the ENTIRE package will be rewritten." As Go 1.18 has been released for a while, I'm wondering whether work has started on rewriting the entire package. If so, how is the progress?

  • Indirect dependency `github.com/blend/go-sdk v1.1.1` does not exist

    I suspect that the library maintainers prepended "legacy-" to versions before changing the versioning scheme. At the least, this dependency should be updated to legacy-v1.1.1.

  • Error reading parquet with the latest parquet-go

    1. Create a file with python pandas
    dataframe = pandas.DataFrame({
            "A": ["a", "b", "c", "d"],
            "B": [2, 3, 4, 1],
            "C": [10, 20, None, None]
        })
    
    dataframe.to_parquet("1.parquet")
    

    This file looks like: (image omitted)

    2. Read the file
    func main() {
        ctx := context.Background()
        fr, _ := local.NewLocalFileReader("1.parquet")
        df, err := imports.LoadFromParquet(ctx, fr)
        if err != nil {
            panic(err)
        }
        fmt.Println(df)
    }
    
    3. Got a unique name error
    panic: names of series must be unique: 
    
    goroutine 1 [running]:
    github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
            .../rocketlaunchr/[email protected]/dataframe.go:41 +0x33c
    github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
            .../go/pkg/mod/github.com/rocketlaunchr/[email protected]/imports/parquet.go:110 +0x8ae
    main.main()
            .../main.go:13 +0x78
    
    4. Following the stack, I found some useful information (screenshots omitted):
    • All series in imports.LoadFromParquet have empty names.
    • Each key in the goFieldNameToActual map has the prefix "Scheme", but goName does not; maybe that is why a name cannot be found in this map.

    This is the first time I have used Go to read parquet files. Is this error caused by breaking changes in parquet-go, or by something else?

  • Bad import, was an upstream dependency deleted?

    go: github.com/sjwhitworth/[email protected] requires
            github.com/rocketlaunchr/[email protected] requires
            github.com/blend/[email protected]: reading github.com/blend/go-sdk/go.mod at revision v1.1.1: unknown revision v1.1.1
    

    It looks like v1.1.1 of github.com/blend/go-sdk is missing. Are you seeing the same or am I taking crazy pills today?
