Naive Bayesian Classification for Golang.

Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).

Copyright (c) 2011-2017 Jake Brukhman. All rights reserved. See the LICENSE file for the BSD-style license.


Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as they may otherwise result in inaccurate classifications.


Installation

Using the go command:

go get github.com/navossoc/bayesian
go install !$

Documentation

See the GoPkgDoc documentation here.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.
  • TF-IDF support.

Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

The class with the highest score is the most likely one. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)
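
The underflow risk can be illustrated without the library. The following is a minimal sketch using only the standard library; the per-word probability of 1e-5 and the document length of 100 words are made-up values, and `productOf`, `logSumOf` and `sample` are hypothetical helper names that are not part of bayesian.

```go
package main

import (
	"fmt"
	"math"
)

// productOf multiplies the per-word probabilities directly.
// With many small factors the result can underflow to 0.
func productOf(probs []float64) float64 {
	p := 1.0
	for _, x := range probs {
		p *= x
	}
	return p
}

// logSumOf adds the logarithms of the per-word probabilities,
// which stays representable even for long documents.
func logSumOf(probs []float64) float64 {
	s := 0.0
	for _, x := range probs {
		s += math.Log(x)
	}
	return s
}

// sample returns n copies of p (a made-up per-word probability).
func sample(n int, p float64) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = p
	}
	return probs
}

func main() {
	probs := sample(100, 1e-5)
	// 1e-500 is far below the smallest float64, so the product underflows to 0.
	fmt.Println("product:", productOf(probs))
	// The sum of logarithms is a finite negative number.
	fmt.Println("log sum:", logSumOf(probs))
}
```

This is why log-based scores are usually the safer default for long documents, while plain probabilities are easier to read for short ones.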

Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() after training and before calling classification methods such as LogScores, SafeProbScores, and ProbScores:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)

goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}

classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

// Required
classifier.ConvertTermsFreqToTfIdf()

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

The class with the highest score is the most likely one. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)
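
For readers unfamiliar with the weighting that Example 2 relies on, here is a minimal sketch of the textbook TF-IDF formula using only the standard library. It is not necessarily the exact formula bayesian applies internally; `termFrequency`, `inverseDocFrequency` and `tfIdf` are hypothetical names, and the third document is made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// termFrequency returns how often term occurs in doc, relative to doc length.
func termFrequency(term string, doc []string) float64 {
	if len(doc) == 0 {
		return 0
	}
	n := 0
	for _, w := range doc {
		if w == term {
			n++
		}
	}
	return float64(n) / float64(len(doc))
}

// inverseDocFrequency returns log(N / df), where df is the number of
// documents containing term. Terms found in no document get 0.
func inverseDocFrequency(term string, docs [][]string) float64 {
	df := 0
	for _, doc := range docs {
		for _, w := range doc {
			if w == term {
				df++
				break
			}
		}
	}
	if df == 0 {
		return 0
	}
	return math.Log(float64(len(docs)) / float64(df))
}

// tfIdf combines both weights for one term in one document.
func tfIdf(term string, doc []string, docs [][]string) float64 {
	return termFrequency(term, doc) * inverseDocFrequency(term, docs)
}

func main() {
	docs := [][]string{
		{"tall", "rich", "handsome"},
		{"poor", "smelly", "ugly"},
		{"tall", "poor"}, // made-up extra document
	}
	fmt.Println(tfIdf("rich", docs[0], docs)) // rare term: higher weight
	fmt.Println(tfIdf("tall", docs[0], docs)) // more common term: lower weight
}
```

The weighting favours terms that are frequent in one document but rare across the corpus, which is the intuition behind training first and converting the frequencies afterwards.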

Use wisely.

Comments
  • remove extra float64 slice

    using simple benchmark over ProbScores:

    func BenchmarkProbScores(b *testing.B) {
        c := NewClassifier(Good, Bad)
        c.Learn([]string{"tall", "handsome", "rich"}, Good)
    
        for n := 0; n < b.N; n++ {
            c.ProbScores([]string{"the", "tall", "man"})
        }
    }
    

    old code: BenchmarkProbScores-4 5000000 271 ns/op 32 B/op 2 allocs/op

    new code: BenchmarkProbScores-4 10000000 199 ns/op 16 B/op 1 allocs/op

    of course this will be more obvious with more classes in the classifier

    PS: because it makes the code less obvious, I am not sure it is worth merging; I just needed one fewer allocation.

  • Add classes after classifier creation

    Apologies in advance as my knowledge of Go is still somewhat limited, so this may be a naive question.

    I want to expose the naive bayes classifying as a HTTP Web service, with both train and classify endpoints. I have no trouble with that, but I want the train endpoint to be able to accept new labels (labels that aren't currently in the classifier). Right now the labels are simply specified as consts and passed into the constructor. Can you think of the best way to add the ability to add labels at run-time?
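
    One possible direction, sketched here with the standard library only: key the training data by label and create a label's entry the first time it is seen. This is a hypothetical design, not the bayesian library's internal representation; `Class`, `counts`, `newCounts`, `Learn` and `Classes` are stand-in names defined for this sketch.

    ```go
    package main

    import "fmt"

    // Class is a stand-in label type; here it is a plain string type.
    type Class string

    // counts is a hypothetical store of word counts per class.
    type counts struct {
    	words map[Class]map[string]int
    }

    // newCounts returns an empty store with no classes.
    func newCounts() *counts {
    	return &counts{words: make(map[Class]map[string]int)}
    }

    // Learn records the words under class, creating the class on first use.
    func (c *counts) Learn(doc []string, class Class) {
    	if _, ok := c.words[class]; !ok {
    		c.words[class] = make(map[string]int)
    	}
    	for _, w := range doc {
    		c.words[class][w]++
    	}
    }

    // Classes returns the number of labels seen so far.
    func (c *counts) Classes() int {
    	return len(c.words)
    }

    func main() {
    	c := newCounts()
    	c.Learn([]string{"tall", "rich"}, Class("Good"))
    	c.Learn([]string{"poor"}, Class("Bad"))
    	// A label that arrives later, for example from an HTTP request body.
    	c.Learn([]string{"average"}, Class("Neutral"))
    	fmt.Println(c.Classes())
    }
    ```

    Because a label is just a string value, a train endpoint could convert the incoming label with `Class(label)` instead of relying on constants fixed at compile time.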

  • Release 1.0 is really old - make a new release

    1.0 is still importing things like "gob" instead of "encoding/gob" etc. Can you make a new release? I can also help co-maintain the project if that helps.

    Tools like dep will pick up release versions for most people, and they will get code that won't work with newer versions of Go.

    Thanks!

  • Panic if underflow is detected in `SafeProbScores`

    SafeProbScores ... If an underflow is detected, this method panics

    Source

    I am a bit confused by the comment on the method above. According to the doc, this method is supposed to panic, but the code returns an error instead.

    Am I missing something?

  • Changed SafeProbScores to return an error instead of a panic

    I know this is not a backwards compatible change, but this is a mathematical error and not really a runtime error, so: a) a panic causes functions to unwind outside of this package, which is not good for long running applications and b) there's little need to fill the log with stack traces given this is a known and reasonably common outcome.

    PS. Great package, really useful, thanks!
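
    The pattern argued for above can be sketched independently of the library. This is a minimal illustration, not the actual SafeProbScores implementation; `ErrUnderflow` and `safeProduct` are hypothetical names, and the probabilities are made up.

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    )

    // ErrUnderflow is a hypothetical sentinel error for this sketch.
    var ErrUnderflow = errors.New("possible underflow detected")

    // safeProduct multiplies probs and reports an error instead of
    // panicking when the running product collapses to zero.
    // Note that a genuine zero probability is reported the same way.
    func safeProduct(probs []float64) (float64, error) {
    	p := 1.0
    	for _, x := range probs {
    		p *= x
    		if p == 0 {
    			return 0, ErrUnderflow
    		}
    	}
    	return p, nil
    }

    func main() {
    	tiny := make([]float64, 100)
    	for i := range tiny {
    		tiny[i] = 1e-5 // made-up per-word probability
    	}
    	if _, err := safeProduct(tiny); errors.Is(err, ErrUnderflow) {
    		// The caller decides what to do; no stack unwinding is involved.
    		fmt.Println("classification skipped:", err)
    	}
    }
    ```

    A long-running service can then log or skip the affected document and keep serving requests.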

  • Fix function comments based on best practices from Effective Go

    Every exported function in a program should have a doc comment. The first sentence should be a summary that starts with the name being declared. (From Effective Go.)

    I generated this with CodeLingo and I'm keen to get some feedback, but this is automated so feel free to close it and just say opt out to opt out of future CodeLingo outreach PRs.

  • fix data race issue for Classifier.seen

    When running the LogScores method in a highly concurrent situation, I noticed that Go's data race detector would complain about a data race regarding Classifier.seen. So that's why I changed any read and write operation to that particular member of the struct to only use atomic load and increment functions.
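
    The approach described above can be sketched with the standard library alone. This is an illustration of the technique, not the patch itself; `scorer`, `score` and `runConcurrently` are hypothetical names, and the `seen` field only mimics the counter mentioned in the report.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    	"sync/atomic"
    )

    // scorer is a hypothetical type with a counter like the seen field.
    type scorer struct {
    	seen int32
    }

    // score pretends to classify a document and counts the call atomically.
    func (s *scorer) score() {
    	atomic.AddInt32(&s.seen, 1)
    }

    // Seen reads the counter with an atomic load.
    func (s *scorer) Seen() int {
    	return int(atomic.LoadInt32(&s.seen))
    }

    // runConcurrently calls score from n goroutines and waits for all of them.
    func runConcurrently(s *scorer, n int) {
    	var wg sync.WaitGroup
    	for i := 0; i < n; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			s.score()
    		}()
    	}
    	wg.Wait()
    }

    func main() {
    	s := &scorer{}
    	runConcurrently(s, 100)
    	fmt.Println(s.Seen()) // prints 100
    }
    ```

    With a plain `s.seen++` in place of the atomic add, running this under `go run -race` would be expected to report a data race.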

  • add Observe method to support externally learned word frequencies

    External methods for learning word frequencies might include a distributed word count in Spark. For online classification, however, it might still be desirable to use Go.

  • what is good or bad?

    Sorry, I didn't understand. After getting the result, how do I know whether it belongs to good or bad?

    scores, likely, _ := classifier.LogScores(
        []string{"tall", "girl"},
    )

    probs, likely, _ := classifier.ProbScores(
        []string{"tall", "girl"},
    )

    likely == 1 is bad? @jbrukh
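
    A minimal sketch of how such a result could be read, assuming that `likely` is the index of the highest-scoring class in the order the classes were passed to the constructor (an assumption worth checking against the package documentation). `argmax` is a hypothetical helper and the scores are made up.

    ```go
    package main

    import "fmt"

    // argmax returns the index of the largest score (0 for an empty slice).
    func argmax(scores []float64) int {
    	best := 0
    	for i, s := range scores {
    		if s > scores[best] {
    			best = i
    		}
    	}
    	return best
    }

    func main() {
    	classes := []string{"Good", "Bad"} // same order as given to the constructor
    	scores := []float64{-27.1, -27.8}  // made-up log scores
    	likely := argmax(scores)
    	fmt.Println(likely, classes[likely]) // prints "0 Good"
    }
    ```

    Under that assumption, `likely == 1` would correspond to the second class passed in, which is Bad in the README example.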

  • Use CodeLingo to Address Further Issues

    Hi @jbrukh!

    Thanks for merging the fixes from our earlier pull request. They were generated by CodeLingo which we've used to find a further 30 issues in the repo. This PR adds a set of CodeLingo Tenets which catch any new cases of the found issues in PRs to your repo.

    CodeLingo will also send follow-up PRs to fix the existing repos in the codebase. Install CodeLingo GitHub app after merging this PR. It will always be free for open source.

    We're most interested to see if we can help with project specific bugs. Tell us about more interesting issues and we'll see if our tech can help - free of charge.

    Thanks, Blake and the CodeLingo Team

  • Modernized go fmt, lint etc fixes. Simple code cleanup

    @jbrukh hey, I did some simple code cleanup here

    • Moved the package doc to doc.go
    • Fixed some range syntax
    • Added some function docs
    • Other minor changes recommended according to golint
  • JSON serialization

    Hi. My edits:

    • JSON serialization as an option (gob remains the default, so it is fully backward compatible). JSON also produces files that are around 25% smaller
    • Minor codestyle fixes - error check in defer, spread operator in test
    • Go mod file

    Output of the test program below:

    2020/07/24 20:32:04 gob [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 gob size 816
    2020/07/24 20:32:04 json [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 json size 611
    
    package main
    
    import (
    	"github.com/jbrukh/bayesian"
    	"log"
    	"os"
    	"path"
    )
    
    func write(ser bayesian.Serializer) {
    	const (
    		One   bayesian.Class = "One"
    		Two   bayesian.Class = "Two"
    		Three bayesian.Class = "Three"
    		Four  bayesian.Class = "Four"
    		Five  bayesian.Class = "Five"
    		Six   bayesian.Class = "Six"
    		Seven bayesian.Class = "Seven"
    		Eight bayesian.Class = "Eight"
    		Nine  bayesian.Class = "Nine"
    		Ten   bayesian.Class = "Ten"
    	)
    
    	classifier := bayesian.NewClassifier(One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten)
    	oneStuff := []string{"lorem", "ipsum", "dolor"}
    	twoStuff := []string{"sit", "amet", "consectetur"}
    	threeStuff := []string{"adipiscing", "elit", "sed"}
    	fourStuff := []string{"do", "eiusmod", "tempor"}
    	fiveStuff := []string{"incididunt", "ut", "labore"}
    	sixStuff := []string{"et", "dolore", "magna"}
    	sevenStuff := []string{"aliqua", "ut", "enim"}
    	eightStuff := []string{"ad", "minim", "veniam"}
    	nineStuff := []string{"quis", "nostrud", "exercitation"}
    	tenStuff := []string{"ullamco", "laboris", "nisi"}
    
    	classifier.Learn(oneStuff, One)
    	classifier.Learn(twoStuff, Two)
    	classifier.Learn(threeStuff, Three)
    	classifier.Learn(fourStuff, Four)
    	classifier.Learn(fiveStuff, Five)
    	classifier.Learn(sixStuff, Six)
    	classifier.Learn(sevenStuff, Seven)
    	classifier.Learn(eightStuff, Eight)
    	classifier.Learn(nineStuff, Nine)
    	classifier.Learn(tenStuff, Ten)
    
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	err = classifier.WriteToFile(path.Join(wd, "out_"+string(ser)), ser)
    	if err != nil {
    		panic(err)
    	}
    }
    
    func read(ser bayesian.Serializer) {
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	file := path.Join(wd, "out_"+string(ser))
    
    	classifier, err := bayesian.NewClassifierFromFile(file, ser)
    	if err != nil {
    		panic(err)
    	}
    
    	f, err := os.Open(file)
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close() // release the file handle once the size has been read
    	info, err := f.Stat()
    	if err != nil {
    		panic(err)
    	}
    
    	log.Println(ser, classifier.Classes)
    	log.Println(ser, "size", info.Size())
    }
    
    func main() {
    	write(bayesian.Gob)
    	read(bayesian.Gob)
    
    	write(bayesian.JSON)
    	read(bayesian.JSON)
    }
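
    The size comparison can also be tried without the library. The sketch below encodes a small made-up struct with encoding/gob and encoding/json and prints both sizes; `model`, `encodeGob` and `encodeJSON` are hypothetical names, and the struct is only a stand-in for classifier state, so the resulting sizes will differ from the figures above.

    ```go
    package main

    import (
    	"bytes"
    	"encoding/gob"
    	"encoding/json"
    	"fmt"
    )

    // model is a hypothetical stand-in for classifier state.
    type model struct {
    	Classes []string
    	Counts  map[string]int
    }

    // encodeGob serializes m with encoding/gob.
    func encodeGob(m model) ([]byte, error) {
    	var buf bytes.Buffer
    	if err := gob.NewEncoder(&buf).Encode(m); err != nil {
    		return nil, err
    	}
    	return buf.Bytes(), nil
    }

    // encodeJSON serializes m with encoding/json.
    func encodeJSON(m model) ([]byte, error) {
    	return json.Marshal(m)
    }

    func main() {
    	m := model{
    		Classes: []string{"One", "Two"},
    		Counts:  map[string]int{"lorem": 1, "ipsum": 2},
    	}
    	g, err := encodeGob(m)
    	if err != nil {
    		panic(err)
    	}
    	j, err := encodeJSON(m)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println("gob size", len(g))
    	fmt.Println("json size", len(j))
    }
    ```

    Which format comes out smaller depends on the data, since gob also writes a type description, so it is worth measuring with a realistic classifier.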
    
  • Request for a new function that will enable adding of new class to an existing classifier

    Hi,

    I found this library very useful. Since it uses a supervised learning mechanism, it would be good to be able to add a new class for items that cannot be categorized under the existing classes.

    Thanks

  • request for a tag of an older commit

    git tag -a 1.1 35eb93528ee -m "tag a specific older version that was built against"
    git push --tags
    

    In addition, it would be nice if current versions were tagged as well...

  • Allow classifier to initialise with only one class

    Current code panics in case the classifier is initialised with just 1 class. However I have an edge case where there might only be a single class. So I was thinking maybe classifier could be allowed to initialise with just 1 class. I made changes in the code and tested for my use case. It worked. However, the unit tests fail in this case. Is there any specific reason to keep this limitation? What would be a good way to solve this, if any?

  • Seen() is always 0?

    package main
    
    import (
    	"log"
    
    	"github.com/jbrukh/bayesian"
    )
    
    const (
    	Arabic  bayesian.Class = "Arabic"
    	Malay   bayesian.Class = "Malay"
    	Yiddish bayesian.Class = "Yiddish"
    )
    
    func main() {
    
    	nbClassifier := bayesian.NewClassifier(Arabic, Malay, Yiddish)
    	arabicStuff := []string{"algeria", "bahrain", "comoros"}
    	malaysianStuff := []string{"malaysians", "bahasa"}
    	yiddishStuff := []string{"jewish", "jews", "israel"}
    	nbClassifier.Learn(arabicStuff, Arabic)
    	nbClassifier.Learn(malaysianStuff, Malay)
    	nbClassifier.Learn(yiddishStuff, Yiddish)
    
    	log.Println(nbClassifier.Learned()) // 3
    	log.Printf(`SEEN: %d`, nbClassifier.Seen()) // 0
    }
    