Naive Bayesian Classification for Golang.

Last update: Dec 20, 2022

Comments: 16

Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).

Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as ignoring them may result in inaccurate classifications.
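To make the underflow edge case concrete, here is a minimal standalone sketch using only the standard library. It is not the package's own code, and the function names productOfProbs and sumOfLogs are hypothetical. It shows why multiplying many small probabilities collapses to zero while summing their logarithms stays finite.

```go
package main

import (
	"fmt"
	"math"
)

// productOfProbs multiplies n copies of the probability p directly.
// With enough small factors the float64 result underflows to exactly 0.
func productOfProbs(p float64, n int) float64 {
	result := 1.0
	for i := 0; i < n; i++ {
		result *= p
	}
	return result
}

// sumOfLogs accumulates the same quantity in log space, which stays finite.
func sumOfLogs(p float64, n int) float64 {
	total := 0.0
	for i := 0; i < n; i++ {
		total += math.Log(p)
	}
	return total
}

func main() {
	// 400 words, each with probability 0.001: the true product is 1e-1200,
	// far below the smallest positive float64 (about 4.9e-324).
	fmt.Println(productOfProbs(0.001, 400)) // underflows to 0
	fmt.Println(sumOfLogs(0.001, 400))      // a finite negative number
}
```

Once every class underflows to zero, the classes can no longer be ranked, which is why log-based scores are the safer choice for long documents.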

Installation

Using the go command:

go get github.com/navossoc/bayesian
go install !$

Documentation

See the GoPkgDoc documentation here.

Features

Conditional probability and "log-likelihood"-like scoring.
Underflow detection.
Simple persistence of classifiers.
Statistics.
TF-IDF support.
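As a refresher on what "log-likelihood"-like scoring means, the following is a minimal sketch of naive Bayes scoring from word counts. It is an illustration under stated assumptions, not this package's implementation: the model type and its methods are hypothetical, add-one (Laplace) smoothing is swapped in for whatever smoothing the package uses, and class priors are assumed uniform.

```go
package main

import (
	"fmt"
	"math"
)

// model is a hypothetical, minimal word-count store: class -> word -> count.
type model struct {
	counts map[string]map[string]int
	totals map[string]int  // total words learned per class
	vocab  map[string]bool // every distinct word seen in training
}

func newModel() *model {
	return &model{
		counts: map[string]map[string]int{},
		totals: map[string]int{},
		vocab:  map[string]bool{},
	}
}

// learn records each word of a document under the given class.
func (m *model) learn(words []string, class string) {
	if m.counts[class] == nil {
		m.counts[class] = map[string]int{}
	}
	for _, w := range words {
		m.counts[class][w]++
		m.totals[class]++
		m.vocab[w] = true
	}
}

// logScore sums log P(word|class) with add-one smoothing.
// Class priors are left out, i.e. assumed to be uniform.
func (m *model) logScore(words []string, class string) float64 {
	score := 0.0
	denom := float64(m.totals[class] + len(m.vocab))
	for _, w := range words {
		score += math.Log(float64(m.counts[class][w]+1) / denom)
	}
	return score
}

func main() {
	m := newModel()
	m.learn([]string{"tall", "rich", "handsome"}, "Good")
	m.learn([]string{"poor", "smelly", "ugly"}, "Bad")

	doc := []string{"tall", "girl"}
	fmt.Println(m.logScore(doc, "Good") > m.logScore(doc, "Bad")) // true
}
```

The scores are sums of logarithms of probabilities, so they are negative; the class with the highest (least negative) score is the most likely one.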

Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

A higher score indicates a more likely class. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)
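The relationship between log scores and probabilities can be sketched without the library. The example below is a standalone illustration, not the package's code: normalize and argmax are hypothetical helpers, and the score values are made up. It assumes that likely is an index into the classes in the order they were passed to the constructor, which is worth checking against the package documentation.

```go
package main

import (
	"fmt"
	"math"
)

// normalize converts log scores into probabilities that sum to 1.
// Subtracting the largest score first keeps math.Exp from underflowing
// for the best class (the log-sum-exp trick).
func normalize(logScores []float64) []float64 {
	maxScore := math.Inf(-1)
	for _, s := range logScores {
		if s > maxScore {
			maxScore = s
		}
	}
	probs := make([]float64, len(logScores))
	sum := 0.0
	for i, s := range logScores {
		probs[i] = math.Exp(s - maxScore)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

// argmax returns the index of the largest score.
func argmax(scores []float64) int {
	best := 0
	for i, s := range scores {
		if s > scores[best] {
			best = i
		}
	}
	return best
}

func main() {
	classes := []string{"Good", "Bad"}       // assumed constructor order
	logScores := []float64{-2000.0, -2001.0} // hypothetical values

	likely := argmax(logScores)
	fmt.Println("most likely class:", classes[likely])
	fmt.Println("probabilities:", normalize(logScores))
}
```

Exponentiating -2000 directly would give 0 for both classes; shifting by the maximum first keeps the ratio between them intact.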

Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() after training and before calling classification methods such as LogScores, SafeProbScores, and ProbScores.

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)

goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}

classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

// Required
classifier.ConvertTermsFreqToTfIdf()

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

A higher score indicates a more likely class. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)
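For readers unfamiliar with the weighting that Example 2 relies on, here is a minimal sketch of a textbook TF-IDF calculation. It is an assumption-laden illustration: the tfidf function is hypothetical, and the exact formula used by ConvertTermsFreqToTfIdf() may differ (for example in how it smooths the document frequency).

```go
package main

import (
	"fmt"
	"math"
)

// tfidf computes a textbook TF-IDF weight for every term of every document:
//   tf  = occurrences of the term in the document / document length
//   idf = ln(number of documents / documents containing the term)
func tfidf(docs [][]string) []map[string]float64 {
	// Count in how many documents each term appears.
	docFreq := map[string]int{}
	for _, doc := range docs {
		seen := map[string]bool{}
		for _, term := range doc {
			if !seen[term] {
				seen[term] = true
				docFreq[term]++
			}
		}
	}

	weights := make([]map[string]float64, len(docs))
	for i, doc := range docs {
		counts := map[string]int{}
		for _, term := range doc {
			counts[term]++
		}
		weights[i] = map[string]float64{}
		for term, n := range counts {
			tf := float64(n) / float64(len(doc))
			idf := math.Log(float64(len(docs)) / float64(docFreq[term]))
			weights[i][term] = tf * idf
		}
	}
	return weights
}

func main() {
	docs := [][]string{
		{"tall", "rich", "tall"},
		{"tall", "poor"},
	}
	w := tfidf(docs)
	// "tall" occurs in every document, so its weight is 0;
	// "rich" occurs in only one, so it carries a positive weight.
	fmt.Println(w[0]["tall"], w[0]["rich"])
}
```

The effect is that words common to every class stop dominating the score, while words specific to one class are emphasized.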

Use wisely.

Owner

Jake Brukhman

https://github.com/jbrukh/bayesian

Comments

remove extra float64 slice

Using a simple benchmark over ProbScores:

func BenchmarkProbScores(b *testing.B) {
	c := NewClassifier(Good, Bad)
	c.Learn([]string{"tall", "handsome", "rich"}, Good)
	for n := 0; n < b.N; n++ {
		c.ProbScores([]string{"the", "tall", "man"})
	}
}

old code: BenchmarkProbScores-4 5000000 271 ns/op 32 B/op 2 allocs/op

new code: BenchmarkProbScores-4 10000000 199 ns/op 16 B/op 1 allocs/op

of course this will be more obvious with more classes in the classifier

PS: because it makes the code less obvious, I am not sure it is worth it to be merged, I just needed the 1 less allocation.

Add classes after classifier creation

Apologies in advance as my knowledge of Go is still somewhat limited, so this may be a naive question.

I want to expose the naive Bayes classifier as an HTTP web service, with both train and classify endpoints. I have no trouble with that, but I want the train endpoint to be able to accept new labels (labels that aren't currently in the classifier). Right now the labels are simply specified as consts and passed into the constructor. Can you think of the best way to add the ability to add labels at run-time?

Release 1.0 is really old - make a new release

1.0 is still importing things like "gob" instead of "encoding/gob" etc. Can you make a new release? I can also help co-maintain the project if that helps.

Tools like dep will pick up release versions for most people and they will get code that won't work for newer versions of go.

Thanks!

Panic if underflow is detected in `SafeProbScores`

SafeProbScores ... If an underflow is detected, this method panics

Source

I am getting a bit confused by the comment in the method above: according to the doc this method is supposed to panic, but the code instead returns an error.

Am I missing something?

Changed SafeProbScores to return an error instead of a panic

I know this is not a backwards compatible change, but this is a mathematical error and not really a runtime error, so: a) a panic causes functions to unwind outside of this package, which is not good for long running applications and b) there's little need to fill the log with stack traces given this is a known and reasonably common outcome.
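The error-instead-of-panic approach described above can be sketched as follows. This is a standalone illustration, not the code from the pull request: safeProbs and errUnderflow are hypothetical names, and the underflow check used here (all exponentiated scores collapsing to zero) is an assumption about what such a check might look like.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errUnderflow is a hypothetical sentinel error for this sketch.
var errUnderflow = errors.New("possible underflow detected")

// safeProbs turns log scores into normalized probabilities. It reports an
// error, rather than panicking, when every exponentiated score is zero.
func safeProbs(logScores []float64) ([]float64, error) {
	probs := make([]float64, len(logScores))
	sum := 0.0
	for i, s := range logScores {
		probs[i] = math.Exp(s)
		sum += probs[i]
	}
	if sum == 0 {
		return nil, errUnderflow
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs, nil
}

func main() {
	// math.Exp(-2000) is 0 in float64, so this call reports an underflow.
	if _, err := safeProbs([]float64{-2000, -2001}); err != nil {
		fmt.Println("classification skipped:", err)
	}

	probs, err := safeProbs([]float64{-1, -2})
	fmt.Println(probs, err)
}
```

A caller in a long-running service can then log the error and move on to the next document instead of recovering from a panic.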

PS. Great package, really useful, thanks!

Fix function comments based on best practices from Effective Go

Every exported function in a program should have a doc comment. The first sentence should be a summary that starts with the name being declared. From effective go.

I generated this with CodeLingo and I'm keen to get some feedback, but this is automated so feel free to close it and just say opt out to opt out of future CodeLingo outreach PRs.

fix data race issue for Classifier.seen

When running the LogScores method in a highly concurrent situation, I noticed that Go's data race detector would complain about a data race regarding Classifier.seen. So that's why I changed any read and write operation to that particular member of the struct to only use atomic load and increment functions.
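The atomic load and increment pattern mentioned above can be sketched with the standard sync/atomic package. This is an illustration only: the counter type, its observe method, and the hammer helper are hypothetical stand-ins, not the library's Classifier.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter is a hypothetical stand-in for a struct with a "seen" field
// that is updated from many goroutines.
type counter struct {
	seen int32
}

// observe increments the counter without a data race.
func (c *counter) observe() {
	atomic.AddInt32(&c.seen, 1)
}

// Seen reads the counter atomically.
func (c *counter) Seen() int {
	return int(atomic.LoadInt32(&c.seen))
}

// hammer calls observe from n goroutines and waits for all of them.
func hammer(c *counter, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.observe()
		}()
	}
	wg.Wait()
}

func main() {
	var c counter
	hammer(&c, 100)
	fmt.Println(c.Seen()) // 100
}
```

Running this with go run -race reports no races, whereas a plain c.seen++ in observe would be flagged.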

add Observe method to support externally learned word frequencies

External methods to learn word frequencies might be things like distributed word-count in spark. For online classification, however, it might still be desirable to use go.

what is good or bad?

Sorry, I didn't understand. After getting the result, how to know whether the result belongs to good or bad?

scores, likely, _ := classifier.LogScores(
                        []string{"tall", "girl"},
                     )

probs, likely, _ := classifier.ProbScores(
                        []string{"tall", "girl"},
                     )

likely == 1 is bad? @jbrukh

Use CodeLingo to Address Further Issues

Hi @jbrukh!

Thanks for merging the fixes from our earlier pull request. They were generated by CodeLingo which we've used to find a further 30 issues in the repo. This PR adds a set of CodeLingo Tenets which catch any new cases of the found issues in PRs to your repo.

CodeLingo will also send follow-up PRs to fix the existing repos in the codebase. Install CodeLingo GitHub app after merging this PR. It will always be free for open source.

We're most interested to see if we can help with project specific bugs. Tell us about more interesting issues and we'll see if our tech can help - free of charge.

Thanks, Blake and the CodeLingo Team

Modernized go fmt, lint etc fixes. Simple code cleanup

@jbrukh hey, I did some simple code cleanup here

Moved the package doc to doc.go

Fixed some range syntax

Added some function docs

Other minor changes recommended according to golint

JSON serialization

Hi. My edits:

JSON serialization as an option (gob remains the default, so it is fully backward compatible). JSON files are also around 25% smaller.
Minor code style fixes: error check in defer, spread operator in test.
Go mod file.

2020/07/24 20:32:04 gob [One Two Three Four Five Six Seven Eight Nine Ten]
2020/07/24 20:32:04 gob size 816
2020/07/24 20:32:04 json [One Two Three Four Five Six Seven Eight Nine Ten]
2020/07/24 20:32:04 json size 611

package main

import (
	"github.com/jbrukh/bayesian"
	"log"
	"os"
	"path"
)

func write(ser bayesian.Serializer) {
	const (
		One   bayesian.Class = "One"
		Two   bayesian.Class = "Two"
		Three bayesian.Class = "Three"
		Four  bayesian.Class = "Four"
		Five  bayesian.Class = "Five"
		Six   bayesian.Class = "Six"
		Seven bayesian.Class = "Seven"
		Eight bayesian.Class = "Eight"
		Nine  bayesian.Class = "Nine"
		Ten   bayesian.Class = "Ten"
	)

	classifier := bayesian.NewClassifier(One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten)
	oneStuff := []string{"lorem", "ipsum", "dolor"}
	twoStuff := []string{"sit", "amet", "consectetur"}
	threeStuff := []string{"adipiscing", "elit", "sed"}
	fourStuff := []string{"do", "eiusmod", "tempor"}
	fiveStuff := []string{"incididunt", "ut", "labore"}
	sixStuff := []string{"et", "dolore", "magna"}
	sevenStuff := []string{"aliqua", "ut", "enim"}
	eightStuff := []string{"ad", "minim", "veniam"}
	nineStuff := []string{"quis", "nostrud", "exercitation"}
	tenStuff := []string{"ullamco", "laboris", "nisi"}

	classifier.Learn(oneStuff, One)
	classifier.Learn(twoStuff, Two)
	classifier.Learn(threeStuff, Three)
	classifier.Learn(fourStuff, Four)
	classifier.Learn(fiveStuff, Five)
	classifier.Learn(sixStuff, Six)
	classifier.Learn(sevenStuff, Seven)
	classifier.Learn(eightStuff, Eight)
	classifier.Learn(nineStuff, Nine)
	classifier.Learn(tenStuff, Ten)

	wd, err := os.Getwd()
	if err != nil {
		panic(err)
	}

	err = classifier.WriteToFile(path.Join(wd, "out_"+string(ser)), ser)
	if err != nil {
		panic(err)
	}
}

func read(ser bayesian.Serializer) {
	wd, err := os.Getwd()
	if err != nil {
		panic(err)
	}

	file := path.Join(wd, "out_"+string(ser))

	classifier, err := bayesian.NewClassifierFromFile(file, ser)
	if err != nil {
		panic(err)
	}

	f, err := os.Open(file)
	if err != nil {
		panic(err)
	}
	defer f.Close() // release the file handle once its size has been read
	info, err := f.Stat()
	if err != nil {
		panic(err)
	}

	log.Println(ser, classifier.Classes)
	log.Println(ser, "size", info.Size())
}

func main() {
	write(bayesian.Gob)
	read(bayesian.Gob)

	write(bayesian.JSON)
	read(bayesian.JSON)
}

Request for a new function that will enable adding of new class to an existing classifier

Hi,

I found this library very useful. I think since this has a supervised learning mechanism, it would be good if we can add a new class for stuffs that can be learned that can't be categorized from the existing classes.

Thanks

request for a tag of an older commit

git tag -a 1.1 35eb93528ee -m "tag a specific older version that was built against"
git push --tags

In addition, it would be nice if current versions were tagged as well...

Allow classifier to initialise with only one class

Current code panics in case the classifier is initialised with just 1 class. However I have an edge case where there might only be a single class. So I was thinking maybe classifier could be allowed to initialise with just 1 class. I made changes in the code and tested for my use case. It worked. However, the unit tests fail in this case. Is there any specific reason to keep this limitation? What would be a good way to solve this, if any?

Seen() is always 0?

package main

import (
	"log"

	"github.com/jbrukh/bayesian"
)

const (
	Arabic  bayesian.Class = "Arabic"
	Malay   bayesian.Class = "Malay"
	Yiddish bayesian.Class = "Yiddish"
)

func main() {

	nbClassifier := bayesian.NewClassifier(Arabic, Malay, Yiddish)
	arabicStuff := []string{"algeria", "bahrain", "comoros"}
	malaysianStuff := []string{"malaysians", "bahasa"}
	yiddishStuff := []string{"jewish", "jews", "israel"}
	nbClassifier.Learn(arabicStuff, Arabic)
	nbClassifier.Learn(malaysianStuff, Malay)
	nbClassifier.Learn(yiddishStuff, Yiddish)

	log.Println(nbClassifier.Learned()) // 3
	log.Printf(`SEEN: %d`, nbClassifier.Seen()) // 0
}

