Naive Bayesian Classification for Golang.

Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).

Copyright (c) 2011-2017. Jake Brukhman. All rights reserved. See the LICENSE file for BSD-style license.


Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as ignoring them may result in inaccurate classifications.


Installation

Using the go command:

go get github.com/navossoc/bayesian
go install !$

Documentation

See the GoPkgDoc documentation here.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.
  • TF-IDF support.

Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must call ConvertTermsFreqToTfIdf() after training and before calling classification methods such as LogScores, SafeProbScores, and ProbScores:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad bayesian.Class = "Bad"
)

// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)

goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}

classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

// Required
classifier.ConvertTermsFreqToTfIdf()

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Use wisely.

Comments
  • remove extra float64 slice

    using simple benchmark over ProbScores:

    func BenchmarkProbScores(b *testing.B) {
        c := NewClassifier(Good, Bad)
        c.Learn([]string{"tall", "handsome", "rich"}, Good)
    
        for n := 0; n < b.N; n++ {
            c.ProbScores([]string{"the", "tall", "man"})
        }
    }
    

    old code: BenchmarkProbScores-4 5000000 271 ns/op 32 B/op 2 allocs/op

    new code: BenchmarkProbScores-4 10000000 199 ns/op 16 B/op 1 allocs/op

    of course this will be more obvious with more classes in the classifier

    PS: because it makes the code less obvious, I am not sure it is worth it to be merged, I just needed the 1 less allocation.

  • Add classes after classifier creation

    Apologies in advance as my knowledge of Go is still somewhat limited, so this may be a naive question.

    I want to expose the naive Bayes classifier as an HTTP web service, with both train and classify endpoints. I have no trouble with that, but I want the train endpoint to be able to accept new labels (labels that aren't currently in the classifier). Right now the labels are simply specified as consts and passed into the constructor. Can you think of the best way to add the ability to add labels at run-time?

  • Release 1.0 is really old - make a new release

    1.0 is still importing things like "gob" instead of "encoding/gob" etc. Can you make a new release? I can also help co-maintain the project if that helps.

    Tools like dep will pick up release versions for most people and they will get code that won't work for newer versions of go.

    Thanks!

  • Panic if underflow is detected in `SafeProbScores`

    SafeProbScores ... If an underflow is detected, this method panics

    Source

    I am a bit confused by the comment on the method above. According to the doc, this method is supposed to panic, but the code returns an error instead.

    Am I missing something ?

  • Changed SafeProbScores to return an error instead of a panic

    I know this is not a backwards compatible change, but this is a mathematical error and not really a runtime error, so: a) a panic causes functions to unwind outside of this package, which is not good for long running applications and b) there's little need to fill the log with stack traces given this is a known and reasonably common outcome.

    PS. Great package, really useful, thanks!

  • Fix function comments based on best practices from Effective Go

    Every exported function in a program should have a doc comment. The first sentence should be a summary that starts with the name being declared. From effective go.

    I generated this with CodeLingo and I'm keen to get some feedback, but this is automated so feel free to close it and just say opt out to opt out of future CodeLingo outreach PRs.

  • fix data race issue for Classifier.seen

    When running the LogScores method in a highly concurrent situation, I noticed that Go's data race detector would complain about a data race regarding Classifier.seen. So that's why I changed any read and write operation to that particular member of the struct to only use atomic load and increment functions.

  • add Observe method to support externally learned word frequencies

    External methods to learn word frequencies might be things like distributed word-count in spark. For online classification, however, it might still be desirable to use go.

  • Use CodeLingo to Address Further Issues

    Hi @jbrukh!

    Thanks for merging the fixes from our earlier pull request. They were generated by CodeLingo which we've used to find a further 30 issues in the repo. This PR adds a set of CodeLingo Tenets which catch any new cases of the found issues in PRs to your repo.

    CodeLingo will also send follow-up PRs to fix the existing repos in the codebase. Install CodeLingo GitHub app after merging this PR. It will always be free for open source.

    We're most interested to see if we can help with project specific bugs. Tell us about more interesting issues and we'll see if our tech can help - free of charge.

    Thanks, Blake and the CodeLingo Team

  • Modernized go fmt, lint etc fixes. Simple code cleanup

    @jbrukh hey, I did some simple code cleanup here

    • Moved the package doc to doc.go
    • Fixed some range syntax
    • Added some function docs
    • Other minor changes recommended according to golint
  • added WordsByClass to help with debugging learned classifiers

    In order to assess the quality of a training set I found it useful to know which words are most prominent in a given class. Of course this could be done by a separate wordcount as well but the classifier did that already - so why not use it.

  • JSON serialization

    Hi. My edits:

    • JSON serialization as an option (gob remains the default, so it is fully backward compatible). JSON output is also around 25% smaller
    • Minor codestyle fixes - error check in defer, spread operator in test
    • Go mod file
    2020/07/24 20:32:04 gob [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 gob size 816
    2020/07/24 20:32:04 json [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 json size 611
    
    package main
    
    import (
    	"github.com/jbrukh/bayesian"
    	"log"
    	"os"
    	"path"
    )
    
    func write(ser bayesian.Serializer) {
    	const (
    		One   bayesian.Class = "One"
    		Two   bayesian.Class = "Two"
    		Three bayesian.Class = "Three"
    		Four  bayesian.Class = "Four"
    		Five  bayesian.Class = "Five"
    		Six   bayesian.Class = "Six"
    		Seven bayesian.Class = "Seven"
    		Eight bayesian.Class = "Eight"
    		Nine  bayesian.Class = "Nine"
    		Ten   bayesian.Class = "Ten"
    	)
    
    	classifier := bayesian.NewClassifier(One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten)
    	oneStuff := []string{"lorem", "ipsum", "dolor"}
    	twoStuff := []string{"sit", "amet", "consectetur"}
    	threeStuff := []string{"adipiscing", "elit", "sed"}
    	fourStuff := []string{"do", "eiusmod", "tempor"}
    	fiveStuff := []string{"incididunt", "ut", "labore"}
    	sixStuff := []string{"et", "dolore", "magna"}
    	sevenStuff := []string{"aliqua", "ut", "enim"}
    	eightStuff := []string{"ad", "minim", "veniam"}
    	nineStuff := []string{"quis", "nostrud", "exercitation"}
    	tenStuff := []string{"ullamco", "laboris", "nisi"}
    
    	classifier.Learn(oneStuff, One)
    	classifier.Learn(twoStuff, Two)
    	classifier.Learn(threeStuff, Three)
    	classifier.Learn(fourStuff, Four)
    	classifier.Learn(fiveStuff, Five)
    	classifier.Learn(sixStuff, Six)
    	classifier.Learn(sevenStuff, Seven)
    	classifier.Learn(eightStuff, Eight)
    	classifier.Learn(nineStuff, Nine)
    	classifier.Learn(tenStuff, Ten)
    
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	err = classifier.WriteToFile(path.Join(wd, "out_"+string(ser)), ser)
    	if err != nil {
    		panic(err)
    	}
    }
    
    func read(ser bayesian.Serializer) {
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	file := path.Join(wd, "out_"+string(ser))
    
    	classifier, err := bayesian.NewClassifierFromFile(file, ser)
    	if err != nil {
    		panic(err)
    	}
    
    	// os.Stat reads the file size without leaving an open file handle behind.
    	info, err := os.Stat(file)
    	if err != nil {
    		panic(err)
    	}
    
    	log.Println(ser, classifier.Classes)
    	log.Println(ser, "size", info.Size())
    }
    
    func main() {
    	write(bayesian.Gob)
    	read(bayesian.Gob)
    
    	write(bayesian.JSON)
    	read(bayesian.JSON)
    }
    
  • Request for a new function that will enable adding of new class to an existing classifier

    Hi,

    I found this library very useful. Since it uses a supervised learning mechanism, it would be good to be able to add a new class for learned items that don't fit any of the existing classes.

    Thanks

  • request for a tag of an older commit

    git tag -a 1.1 35eb93528ee -m "tag a specific older version that was built against"
    git push --tags
    

    In addition, it would be nice if current versions were tagged as well...

  • Allow classifier to initialise with only one class

    Current code panics in case the classifier is initialised with just 1 class. However I have an edge case where there might only be a single class. So I was thinking maybe classifier could be allowed to initialise with just 1 class. I made changes in the code and tested for my use case. It worked. However, the unit tests fail in this case. Is there any specific reason to keep this limitation? What would be a good way to solve this, if any?

  • Seen() is always 0?

    package main
    
    import (
    	"log"
    
    	"github.com/jbrukh/bayesian"
    )
    
    const (
    	Arabic  bayesian.Class = "Arabic"
    	Malay   bayesian.Class = "Malay"
    	Yiddish bayesian.Class = "Yiddish"
    )
    
    func main() {
    
    	nbClassifier := bayesian.NewClassifier(Arabic, Malay, Yiddish)
    	arabicStuff := []string{"algeria", "bahrain", "comoros"}
    	malaysianStuff := []string{"malaysians", "bahasa"}
    	yiddishStuff := []string{"jewish", "jews", "israel"}
    	nbClassifier.Learn(arabicStuff, Arabic)
    	nbClassifier.Learn(malaysianStuff, Malay)
    	nbClassifier.Learn(yiddishStuff, Yiddish)
    
    	log.Println(nbClassifier.Learned()) // 3
    	log.Printf(`SEEN: %d`, nbClassifier.Seen()) // 0
    }
    