Natural language detection library for Go

Whatlanggo

Build Status Go Report Card GoDoc Coverage Status

Natural language detection for Go.

Features

  • Supports 84 languages
  • 100% written in Go
  • No external dependencies
  • Fast
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc)

Getting started

Installation:

    go get -u github.com/abadojack/whatlanggo

Simple usage example:

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script], " Confidence: ", info.Confidence)
}

Blacklisting and whitelisting

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	//Blacklist
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}

	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)

	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	//Whitelist
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}

	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script])
}

For more details, please check the documentation.

Requirements

Go 1.8 or higher

How does it work?

How does the language recognition work?

The algorithm is based on the trigram language models, which is a particular case of n-grams. To understand the idea, please check the original whitepaper Cavnar and Trenkle '94: N-Gram-Based Text Categorization'.

How IsReliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How big is the difference between the first and the second(not returned) detected languages? This metric is called rate in the code base.

Therefore, it can be presented as 2d space with threshold functions, that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and it looks like the following one:

Language recognition whatlang rust

For more details, please check a blog article Introduction to Rust Whatlang Library and Natural Language Identification Algorithms.

License

MIT

Derivation

whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.

Acknowledgements

Thanks to greyblake (Potapov Sergey) for creating whatlang-rs from where I got the idea and algorithms.

Owner
Abado Jack Mtulla
Thunderbolt and lightning very very frightening
Abado Jack Mtulla
Similar Resources

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29

segment A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29 Features Currently only segmentation at Word

Dec 19, 2022

Cgo binding for Snowball C library

Description Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality. For more detailed info see http://snowball.tartar

Nov 28, 2022

A go library for reading and creating ISO9660 images

iso9660 A package for reading and creating ISO9660 Joliet and Rock Ridge extensions are not supported. Examples Extracting an ISO package main import

Jan 2, 2023

Natural language detection library for Go

Natural language detection library for Go

Whatlanggo Natural language detection for Go. Features Supports 84 languages 100% written in Go No external dependencies Fast Recognizes not only a la

Dec 28, 2022

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

Dec 25, 2022

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.

Dec 29, 2022

Natural language detection package in pure Go

getlang getlang provides fast natural language detection in Go. Features Offline -- no internet connection required Supports 29 languages Provides ISO

Dec 26, 2022

Natural-deploy - A natural and simple way to deploy workloads or anything on other machines.

Natural Deploy Its Go way of doing Ansibles: Motivation: Have you ever felt when using ansible or any declarative type of program that is used for dep

Jan 3, 2022

Fast face detection, pupil/eyes localization and facial landmark points detection library in pure Go.

Fast face detection, pupil/eyes localization and facial landmark points detection library in pure Go.

Pigo is a pure Go face detection, pupil/eyes localization and facial landmark points detection library based on Pixel Intensity Comparison-based Objec

Dec 24, 2022

Elkeid is a Cloud-Native Host-Based Intrusion Detection solution project to provide next-generation Threat Detection and Behavior Audition with modern architecture.

Elkeid is a Cloud-Native Host-Based Intrusion Detection solution project to provide next-generation Threat Detection and Behavior Audition with modern architecture.

Elkeid is a Cloud-Native Host-Based Intrusion Detection solution project to provide next-generation Threat Detection and Behavior Audition with modern architecture.

Dec 30, 2022

Self-contained Machine Learning and Natural Language Processing library in Go

If you like the project, please ★ star this repository to show your support! 🤩 A Machine Learning library written in pure Go designed to support rele

Dec 30, 2022

Self-contained Machine Learning and Natural Language Processing library in Go

Self-contained Machine Learning and Natural Language Processing library in Go

Self-contained Machine Learning and Natural Language Processing library in Go

Jan 8, 2023

Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

Dec 25, 2022

Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

Dec 25, 2022

A natural language date/time parser with pluggable rules

when when is a natural language date/time parser with pluggable rules and merge strategies Examples tonight at 11:10 pm at Friday afternoon the deadli

Dec 26, 2022

Guess the natural language of a text in Go

guesslanguage This is a Go version of python guess-language. guesslanguage provides a simple way to detect the natural language of unicode string and

Dec 26, 2022

A natural language date/time parser with pluggable rules

when when is a natural language date/time parser with pluggable rules and merge strategies Examples tonight at 11:10 pm at Friday afternoon the deadli

Dec 26, 2022

AI-powered code snippet generator using ChatGPT. Generate code in various languages based on natural language descriptions!

SnipForge - AI-Powered Code Snippet Generator SnipForge is a powerful command-line interface (CLI) tool that utilizes OpenAI's GPT technology to gener

May 5, 2023

A simple, efficient spring animation library for smooth, natural motion🎼

A simple, efficient spring animation library for smooth, natural motion🎼

Harmonica A simple, efficient spring animation library for smooth, natural motion. It even works well on the command line.

Jan 1, 2023
Self-contained Machine Learning and Natural Language Processing library in Go

If you like the project, please ★ star this repository to show your support! ?? A Machine Learning library written in pure Go designed to support rele

Dec 30, 2022
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

Dec 25, 2022
A natural language date/time parser with pluggable rules

when when is a natural language date/time parser with pluggable rules and merge strategies Examples tonight at 11:10 pm at Friday afternoon the deadli

Dec 26, 2022
Cross platform locale detection for Golang

go-locale go-locale is a Golang lib for cross platform locale detection. OS Support Support all OS that Golang supported, except android: aix: IBM AIX

Aug 20, 2022
Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.

Stemmer package for Go Stemmer package provides an interface for stemmers and includes English, German and Dutch stemmers as sub-packages: porter2 sub

Dec 14, 2022
Gopher-translator - A HTTP API that accepts english word or sentences and translates them to Gopher language

Gopher Translator Service An interview assignment project. To see the full assig

Jan 25, 2022
Complete Translation - translate a document to another language
 Complete Translation - translate a document to another language

Complete Translation This project is to translate a document to another language. The initial target is English to Korean. Consider this project is no

Feb 25, 2022
Go bindings for the snowball libstemmer library including porter 2

Go (golang) bindings for libstemmer This simple library provides Go (golang) bindings for the snowball libstemmer library including the popular porter

Sep 27, 2022
Cgo binding for icu4c library

About Cgo binding for icu4c C library detection and conversion functions. Guaranteed compatibility with version 50.1. Installation Installation consis

Sep 27, 2022
A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

Jan 4, 2023