Sparse matrix formats for linear algebra supporting scientific and machine learning applications

Sparse matrix formats

License: MIT GoDoc Build Status Go Report Card codecov Mentioned in Awesome Go Sourcegraph

Implementations of selected sparse matrix formats for linear algebra supporting scientific and machine learning applications. Compatible with the APIs in the Gonum package and interoperable with Gonum dense matrix types.

Overview

Machine learning applications typically model entities as vectors of numerical features so that they may be compared and analysed quantitively. Typically the majority of the elements in these vectors are zeros. In the case of text mining applications, each document within a corpus is represented as a vector and its features represent the vocabulary of unique words. A corpus of several thousand documents might utilise a vocabulary of hundreds of thousands (or perhaps even millions) of unique words but each document will typically only contain a couple of hundred unique words. This means the number of non-zero values in the matrix might only be around 1%.

Sparse matrix formats capitalise on this premise by only storing the non-zero values thereby reducing both storage/memory requirements and processing effort for manipulating the data.

Features

Usage

The sparse matrices in this package implement the Gonum Matrix interface and so are fully interoperable and mutually compatible with the Gonum APIs and dense matrix types.

// Construct a new 3x2 DOK (Dictionary Of Keys) matrix
dokMatrix := sparse.NewDOK(3, 2)

// Populate it with some non-zero values
dokMatrix.Set(0, 0, 5)
dokMatrix.Set(2, 1, 7)

// Demonstrate accessing values (could use Gonum's mat.Formatted()
// function to pretty print but this demonstrates element access)
m, n := dokMatrix.Dims()
for i := 0; i < m; i++ {
    for j := 0; j < n; j++ {
        fmt.Printf("%.0f,", dokMatrix.At(i, j))
    }
    fmt.Printf("\n")
}

// Convert DOK matrix to CSR (Compressed Sparse Row) matrix
// just for fun (not required for upcoming multiplication operation)
csrMatrix := dokMatrix.ToCSR()

// Create a random 2x3 COO (COOrdinate) matrix with
// density of 0.5 (half the elements will be non-zero)
cooMatrix := sparse.Random(sparse.COOFormat, 2, 3, 0.5)

// Convert CSR matrix to Gonum mat.Dense matrix just for fun
// (not required for upcoming multiplication operation)
// then transpose so it is the right shape/dimensions for
// multiplication with the original CSR matrix
denseMatrix := csrMatrix.ToDense().T()

// Multiply the 2 matrices together and store the result in the
// sparse receiver (multiplication with sparse product)
var csrProduct sparse.CSR
csrProduct.Mul(csrMatrix, cooMatrix)

// As an alternative, use the sparse BLAS routines for efficient
// sparse matrix multiplication with a Gonum mat.Dense product
// (multiplication with dense product)
denseProduct := sparse.MulMatMat(false, 1, csrMatrix, denseMatrix, nil)

Installation

With Go installed, package installation is performed using go get.

go get -u github.com/james-bowman/sparse/...

Acknowledgements

See Also

License

MIT

Owner
James Bowman
CTO @Fresh8 Gaming. Go | Microservices | Machine Learning | NLP
James Bowman
Comments
  • GMRES

    GMRES

    Could you clarify your point of view about adding implementation for iterative methods? For example GMRES: https://en.wikipedia.org/wiki/Generalized_minimal_residual_method

  • implement mat.tracer for dia, csc and csr types

    implement mat.tracer for dia, csc and csr types

    add tests for tracer add benchmarks for trace (which is now much faster)

    also added a colnnz() to csc which was part of the trace implementation at one point. left in as feels like a gap. open to removing if someone has strong views. not clear why rownnz() would be on csr without colnnz() on csc though.

    /tmp/sparse_old.txt/tmp/sparse_new.txt
    time/opdelta
    TraceCS/BenchTraceCS3/0.010000-100-41.16µs ± 3%0.72µs ± 3%−37.76%(p=0.000 n=10+8)
    TraceCS/BenchTraceCS4/0.010000-100-41.15µs ± 5%0.70µs ± 3%−38.94%(p=0.000 n=10+10)
    TraceCS/BenchTraceCS3/0.400000-100-45.28µs ± 1%5.35µs ± 3%~(p=0.189 n=9+10)
    TraceCS/BenchTraceCS4/0.400000-100-45.06µs ± 1%4.85µs ± 5%−4.14%(p=0.005 n=8+9)
    TraceCS/BenchTraceCS3/0.010000-200-42.45µs ± 2%1.80µs ± 2%−26.69%(p=0.000 n=10+10)
    TraceCS/BenchTraceCS4/0.010000-200-42.50µs ± 3%1.83µs ± 2%−26.81%(p=0.000 n=9+10)
    TraceCS/BenchTraceCS3/0.400000-200-416.5µs ± 4%16.2µs ± 3%~(p=0.108 n=9+9)
    TraceCS/BenchTraceCS4/0.400000-200-416.3µs ± 2%16.0µs ± 2%−2.17%(p=0.001 n=8+10)
    TraceCS/BenchTraceCS3/0.010000-400-46.35µs ± 4%5.57µs ± 7%−12.34%(p=0.000 n=10+10)
    TraceCS/BenchTraceCS4/0.010000-400-46.71µs ± 6%4.94µs ± 3%−26.35%(p=0.000 n=9+10)
    TraceCS/BenchTraceCS3/0.400000-400-455.2µs ± 2%56.6µs ± 6%~(p=0.278 n=9+10)
    TraceCS/BenchTraceCS4/0.400000-400-456.0µs ± 2%55.5µs ± 6%~(p=0.053 n=10+9)
    TraceCS/BenchTraceCS3/0.010000-800-422.3µs ± 1%20.9µs ± 7%−5.97%(p=0.003 n=8+10)
    TraceCS/BenchTraceCS4/0.010000-800-422.7µs ± 2%20.5µs ± 4%−9.86%(p=0.000 n=10+10)
    TraceCS/BenchTraceCS3/0.400000-800-4229µs ± 7%212µs ± 2%−7.03%(p=0.000 n=10+9)
    TraceCS/BenchTraceCS4/0.400000-800-4221µs ± 1%216µs ± 3%−2.34%(p=0.006 n=9+9)
    TraceCS/BenchTraceCS3/0.010000-1600-463.7µs ± 6%59.2µs ± 4%−6.99%(p=0.000 n=9+10)
    TraceCS/BenchTraceCS4/0.010000-1600-463.8µs ± 6%57.6µs ± 3%−9.76%(p=0.000 n=10+10)
    TraceCS/BenchTraceCS3/0.400000-1600-41.00ms ± 9%0.97ms ±10%~(p=0.156 n=9+10)
    TraceCS/BenchTraceCS4/0.400000-1600-4952µs ± 3%957µs ± 5%~(p=0.529 n=10+10)
    TraceDIA/BenchTraceDIA-100-4657ns ± 3%142ns ± 5%−78.43%(p=0.000 n=8+10)
    TraceDIA/BenchTraceDIA-200-41.29µs ±14%0.17µs ±18%−86.80%(p=0.000 n=10+10)
    TraceDIA/BenchTraceDIA-400-42.30µs ± 3%0.20µs ± 3%−91.31%(p=0.000 n=10+10)
    TraceDIA/BenchTraceDIA-800-44.70µs ± 7%0.28µs ± 3%−94.12%(p=0.000 n=10+10)
    TraceDIA/BenchTraceDIA-1600-49.06µs ± 8%0.45µs ± 9%−95.00%(p=0.000 n=10+10)
     
  • go get fails

    go get fails

    Running the go get command yields following message:

    $ go version
    go version go1.16.2 linux/amd64
    $ go get -u github.com/james-bowman/sparse/...
    # github.com/james-bowman/sparse/blas
    ../../go/pkg/mod/github.com/james-bowman/[email protected]/blas/matrix.go:83:7: undefined: floats.EqualWithinAbs
    ../../go/pkg/mod/github.com/james-bowman/[email protected]/blas/matrix.go:105:8: undefined: floats.EqualWithinAbs
    

    Expected

    For go to get module. I did not have this problem go geting gonum.

  • sparse cholesky solver

    sparse cholesky solver

    this is an attempt at a sparse cholesky implementation. i've taken parts of the interface from gonum.mat.Cholesky but not all. curious your thoughts on how to organize this.

    many of the methods on mat.Cholesky are not associated with interfaces. so it feels safe to just implement what's needed as-needed. i expect designing this well will lead to a proposal to add a mat.Solver/Factorizer interface into gonum. Or something like that.

    some of mat.Cholesky's methods don't feel useful here, at least not on their own. the symmetric stuff might make sense if associated with sparse.SymCSR etc. somewhat similar for inverse.

    i'm stopping well short of implementing cholmod in the same manner this package implements sparseblas because a) it's a v large piece of work i don't really need it and b) memory savings >> improved performance here.

  • add MulMatSparseVec function and test

    add MulMatSparseVec function and test

    This is an attempt at dense matrix * sparse vector multiplication. AFAIK there is no standard Sparse BLAS routine for this operation so I've tried to do it in a consistent style. As for naming: MulMatVec and MulMatMat are taken for the sparse matrix * dense something case.

    Maybe it makes more sense to make MulMatVec fully generic and have it delegate to MulMatSparseVec, MulSparseMatVec? The latter would be a renaming of today's MulMatVec; the new MulMatVec would change BlasCompatibleSparser to mat.Matrix in the signature and contain type assertion logic with no math. That's a larger change though.

  • Clarify NNZ behavior on COO format sparse matrices

    Clarify NNZ behavior on COO format sparse matrices

    PR clarifies behavior of NNZ on a coordinate matrix. The current documentation for this method states that COO.NNZ() returns the number of non-zero elements in the sparse matrix. Rather, the length of the underlying data is returned, which could differ from the number of non-zero elements. For example, NNZ() could return a number greater than 0 on a zero matrix if duplicate entries summed to zero or if zeroes are explicitly stored. NNZ() could also return a number greater than the size of the matrix with enough duplicates.

  • Add Vector.Set and Vector.SetVec methods

    Add Vector.Set and Vector.SetVec methods

    I needed a version of a sparse vector that that I could update similar to the gonum mat.Dense so I have added Set and SetVec methods to Vector and their accompanying tests. By adding the Set method, Vector also now satisfies the mat.Mutable interface from gonum. Also updated the import in vector.go from "github.com/gonum/floats" to "gonum.org/v1/gonum/floats".

    Hopefully this is helpful to project.

  • Sparse vectors in blas package

    Sparse vectors in blas package

    Hi. I'm just wondering if you have any plan for adding the sparse vector representation into blas package the same way as the sparse matrix (referring to RawMatrix() method)?. having access to the actual elements of a sparse vector (data and ind) would be quite useful. Thanks

  • add TODO view in test log

    add TODO view in test log

    Now, result:

    --- PASS: TestTodo (0.03s)
        --- PASS: TestTodo/benchmarks_test.go (0.01s)
        --- PASS: TestTodo/binary.go (0.00s)
        --- PASS: TestTodo/binary_test.go (0.00s)
        --- PASS: TestTodo/compressed.go (0.00s)
        --- PASS: TestTodo/compressed_arith.go (0.00s)
        	sparse_test.go:42: compressed_arith.go:113  // TODO: handle cases where both matrices are DIA
        	sparse_test.go:42: compressed_arith.go:189  // TODO Consider converting all Sparser args to CSR
        	sparse_test.go:42: compressed_arith.go:300  // TODO optimisation for DIA matrices
        --- PASS: TestTodo/compressed_arith_test.go (0.00s)
        --- PASS: TestTodo/compressed_test.go (0.00s)
        --- PASS: TestTodo/coordinate.go (0.00s)
        --- PASS: TestTodo/coordinate_test.go (0.00s)
        --- PASS: TestTodo/diagonal.go (0.00s)
        --- PASS: TestTodo/diagonal_test.go (0.00s)
        --- PASS: TestTodo/dictionaryofkeys.go (0.00s)
        --- PASS: TestTodo/dictionaryofkeys_test.go (0.00s)
        --- PASS: TestTodo/doc.go (0.00s)
        --- PASS: TestTodo/example_test.go (0.00s)
        --- PASS: TestTodo/matrix.go (0.00s)
        --- PASS: TestTodo/matrix_test.go (0.00s)
        --- PASS: TestTodo/persistence.go (0.00s)
        --- PASS: TestTodo/persistence_test.go (0.00s)
        --- PASS: TestTodo/pool.go (0.00s)
        --- PASS: TestTodo/sparse_test.go (0.00s)
        --- PASS: TestTodo/vector.go (0.00s)
        --- PASS: TestTodo/vector_test.go (0.00s)
    
  • Updating gonum/matrix/mat64 dependency to gonum/gonum/mat

    Updating gonum/matrix/mat64 dependency to gonum/gonum/mat

    Interested in using this implementation for sparse linear algebra! Noticed that the current underlying matrix type is deprecated, so I went ahead and updated to the current gonum standard.

  • Optimal way to resize matrix

    Optimal way to resize matrix

    Hi, is there any recommended way to resize sparse matrixes? Thinking about backing storage i would think coo matrixes should be easier to resize right?

  • Scalar multiplication

    Scalar multiplication

    Do you have an example of how to do an efficient scalar multiplication of a COO matrix somewhere?

    And one for vertically stacking two matrices similarly to Dense.Stack in gonum?

  • Leading duplicates breaks coo to csr conversion

    Leading duplicates breaks coo to csr conversion

    When leading duplicates are present in a COO matrix, the conversion towards CSR becomes invalid and not identical to the original COO representation.

    An example and solution are proposed in pull request #19

  • Fix leading duplicates in COO.

    Fix leading duplicates in COO.

    When leading duplicates are present in a COO matrix, the conversion towards CSR becomes invalid and not identical to the original COO representation. This changes the dedupe function, such that leading duplicates are treated correctly by starting the counter at -1 compared to 0.

    The following illustrates the problem:

    // create identity matrix with a duplicate at (0,0) that should drop out
    coo := NewCOO(2, 2, make([]int, 0), make([]int, 0), make([]float64, 0))
    coo.Set(0, 0, 100)
    coo.Set(0, 0, -100)
    coo.Set(0, 0, 1)	
    coo.Set(1, 1, 1)
    
    csr := coo.ToCSR()
    
    // compare matrices after conversion to CSR, expect equal matrices. 
    if !mat.Equal(coo, csr) {
        fmt.Printf("Expected:\n%v\n but created:\n%v\n", mat.Formatted(coo), mat.Formatted(csr))
    }
    

    When evaluated, we observe the output, showing the influence of the leading duplicates.

    ⎡1  0⎤
    ⎣0  1⎦
    but created:
    ⎡100    0⎤
    ⎣  0    1⎦
    

    The committed change should resolve this.

  • How to we set/remove a value to is suppose to be set to 0.0

    How to we set/remove a value to is suppose to be set to 0.0

    When assigning a value to a CSR (might be others too, but I know that one), how do we set the value back to 0.0?

    From https://github.com/james-bowman/sparse/blob/master/blas/matrix.go#L39, we can see that the value of 0.0 will not be set.

Go language bindings for the COIN-OR Linear Programming library

clp Description The clp package provides a Go interface to the COIN-OR Linear Programming (CLP) library, part of the COIN-OR (COmputational INfrastruc

Nov 9, 2022
This is a go implementation of Segmented Sieve and non Segmented sieve to produce prime numbers concurrently.

Prime This is a Go library to produce prime numbers using all available cpu cores. Installation $ go get github.com/kavehmz/prime Usage package main

Dec 30, 2022
A well tested and comprehensive Golang statistics library package with no dependencies.

Stats - Golang Statistics Package A well tested and comprehensive Golang statistics library / package / module with no dependencies. If you have any s

Dec 27, 2022
Go library that makes it easy to work with physical types, e.g. distances, velocities and angles.

Units Use at your own risk Note that this library is NOT production ready. If you want to use it anyway, contributions and bug reports are welcome! Br

Dec 27, 2022
Sparse matrix formats for linear algebra supporting scientific and machine learning applications

Sparse matrix formats Implementations of selected sparse matrix formats for linear algebra supporting scientific and machine learning applications. Co

Jan 8, 2023
Go implementation of BLAS (Basic Linear Algebra Subprograms)

Go implementation of BLAS (Basic Linear Algebra Subprograms) Any function is implemented in generic Go and if it is justified, it is optimized for AMD

Dec 11, 2022
Dec 28, 2022
Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries ###import "github.com/alonsovidales/go_ml" Package ml provides some implementations of usefull machine learnin

Nov 10, 2022
Open-in-linear - A tool provides a shortcut to opening a linear issue in the desktop app or browser

This tool provides a shortcut to opening a linear issue in the desktop app or browser.

Jan 25, 2022
HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction)

An improved version of HyperLogLog for the count-distinct problem, approximating the number of distinct elements in a multiset using 33-50% less space

Dec 31, 2022
:triangular_ruler:gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt.
:triangular_ruler:gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt.

gofmtmd gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt. Installation $ go get github.com/po3rin/gofm

Oct 31, 2022
Formats discord tokens to different formats.
Formats discord tokens to different formats.

token_formatter Formats discord tokens to different formats. Features Format your current tokens to a new format! Every tool uses a different format f

Nov 3, 2022
Package shapes provides an algebra for handling shapes

About Package shapes provides the algebra and machinery for dealing with the metainformation of shapes of a tensor. Why a shape package? The shape pac

Jan 14, 2022
This repository is where I'm learning to write a CLI using Go, while learning Go, and experimenting with Docker containers and APIs.

CLI Project This repository contains a CLI project that I've been working on for a while. It's a simple project that I've been utilizing to learn Go,

Dec 12, 2021
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

Dec 25, 2022
On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Jan 5, 2023
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

Dec 25, 2022
Self-contained Machine Learning and Natural Language Processing library in Go

If you like the project, please ★ star this repository to show your support! ?? A Machine Learning library written in pure Go designed to support rele

Dec 30, 2022
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an excel spreadsheet. This pa

Dec 31, 2022
Deploy, manage, and scale machine learning models in production
Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

Dec 30, 2022