Self-contained Machine Learning and Natural Language Processing library in Go

Last update: Dec 30, 2022

Comments: 16

If you like the project, please ★ star this repository to show your support! 🤩

A Machine Learning library written in pure Go designed to support relevant neural architectures in Natural Language Processing.

spaGO is self-contained, in that it uses its own lightweight computational graph framework for both training and inference, easy to understand from start to finish.

Usage

Requirements:

Clone this repo or get the library:

go get -u github.com/nlpodyssey/spago

spaGO supports two main use cases, which are explained in more detail below.

CLI mode

Several programs can be leveraged to tour the current NLP capabilities in spaGO. A list of the demos now follows.

The Docker image can be built like this.

docker build -t spago:main . -f Dockerfile

Library mode

You can access the core functionality of spaGO, i.e. optimizing mathematical expressions by back-propagating gradients through a computational graph, in your own code by using spaGO in library mode.

At a high level, it comprises four main modules:

Matrix
Graph
Model
Optimizer

To get started, look at the implementation of built-in neural models, such as the LSTM. Don't be afraid, it is straightforward Go code. The idea is that you could have written spaGO :)

You may find a Feature Source Tree useful for a quick overview of the library's package organization.

There is also a repo with handy examples, such as MNIST classification.

Features

Automatic differentiation

You write the forward(), it does all backward() derivatives for you:
- Define-by-Run (default, just like PyTorch does)
- Define-and-Run (similar to the static graph of TensorFlow)
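
To make the define-by-run idea concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. It does not use spaGO's API: the `Node` type and the `Var`, `Add`, `Mul` and `Backward` functions are hypothetical names chosen for illustration. The graph is recorded while the forward expression runs, and the backward pass walks it in reverse topological order.

```go
package main

import "fmt"

// Node is one value in a graph that is recorded as operations execute.
type Node struct {
	Value    float64
	Grad     float64
	parents  []*Node
	backward func() // pushes this node's Grad to its operands
}

// Var creates a leaf node (an input or a parameter).
func Var(v float64) *Node { return &Node{Value: v} }

// Add records a + b and how to route gradients back to a and b.
func Add(a, b *Node) *Node {
	out := &Node{Value: a.Value + b.Value, parents: []*Node{a, b}}
	out.backward = func() {
		a.Grad += out.Grad
		b.Grad += out.Grad
	}
	return out
}

// Mul records a * b and how to route gradients back to a and b.
func Mul(a, b *Node) *Node {
	out := &Node{Value: a.Value * b.Value, parents: []*Node{a, b}}
	out.backward = func() {
		a.Grad += b.Value * out.Grad
		b.Grad += a.Value * out.Grad
	}
	return out
}

// Backward runs reverse-mode differentiation starting from out.
func Backward(out *Node) {
	var order []*Node
	seen := map[*Node]bool{}
	var visit func(n *Node)
	visit = func(n *Node) {
		if seen[n] {
			return
		}
		seen[n] = true
		for _, p := range n.parents {
			visit(p)
		}
		order = append(order, n) // post-order gives a topological order
	}
	visit(out)
	out.Grad = 1
	for i := len(order) - 1; i >= 0; i-- {
		if order[i].backward != nil {
			order[i].backward()
		}
	}
}

func main() {
	x, y := Var(3), Var(4)
	z := Add(Mul(x, y), x) // forward: z = x*y + x
	Backward(z)            // backward: dz/dx = y + 1, dz/dy = x
	fmt.Println(z.Value, x.Grad, y.Grad) // prints: 15 5 3
}
```

A define-and-run variant would build the same graph once up front and then evaluate it repeatedly with different inputs.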

Optimization methods

Gradient descent:
- Adam, RAdam, RMS-Prop, AdaGrad, SGD
Differential Evolution
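
As an illustration of what a gradient-descent optimizer does with the gradients, the following sketch implements the standard Adam update rule in plain Go and uses it to minimize f(x) = (x - 2)^2. The `Adam` type, `NewAdam` and `minimize` are hypothetical names for this example and are not spaGO's optimizer API; the hyperparameter defaults are the commonly used ones.

```go
package main

import (
	"fmt"
	"math"
)

// Adam keeps the running moment estimates for one parameter vector.
type Adam struct {
	LR, Beta1, Beta2, Eps float64
	m, v                  []float64
	t                     int
}

// NewAdam creates an optimizer for n parameters with common defaults.
func NewAdam(n int, lr float64) *Adam {
	return &Adam{
		LR: lr, Beta1: 0.9, Beta2: 0.999, Eps: 1e-8,
		m: make([]float64, n), v: make([]float64, n),
	}
}

// Step updates params in place using the gradients in grads.
func (a *Adam) Step(params, grads []float64) {
	a.t++
	c1 := 1 - math.Pow(a.Beta1, float64(a.t)) // bias correction, first moment
	c2 := 1 - math.Pow(a.Beta2, float64(a.t)) // bias correction, second moment
	for i, g := range grads {
		a.m[i] = a.Beta1*a.m[i] + (1-a.Beta1)*g
		a.v[i] = a.Beta2*a.v[i] + (1-a.Beta2)*g*g
		mHat := a.m[i] / c1
		vHat := a.v[i] / c2
		params[i] -= a.LR * mHat / (math.Sqrt(vHat) + a.Eps)
	}
}

// minimize runs Adam on f(x) = (x-2)^2 starting from x = 0.
func minimize(steps int) float64 {
	x := []float64{0}
	opt := NewAdam(1, 0.1)
	for i := 0; i < steps; i++ {
		grad := []float64{2 * (x[0] - 2)} // df/dx
		opt.Step(x, grad)
	}
	return x[0]
}

func main() {
	fmt.Printf("x after 500 steps: %.3f\n", minimize(500))
}
```

The first step moves the parameter by almost exactly the learning rate regardless of the gradient's magnitude, which is a characteristic property of Adam's normalized update.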

Neural networks

Feed-forward models (Linear, Highway, Convolution, ...)
Recurrent models (LSTM, GRU, BiLSTM...)
Attention mechanisms (Self-Attention, Multi-Head Attention, ...)
Recursive auto-encoders

Natural Language Processing

Memory-efficient Word Embeddings (with badger key–value store)
Character Language Models
Recurrent Sequence Labeler with CRF on top (e.g. Named Entity Recognition)
Transformer models (BERT-like)
- Masked language model
- Next sentence prediction
- Tokens Classification
- Text Classification (e.g. Sentiment Analysis)
- Question Answering
- Textual Entailment
- Text Similarity

Compatible with pre-trained state-of-the-art neural models:

Hugging Face Transformers
Flair sequence labeler architecture

Current Status

We're not at a v1.0.0 yet, so spaGO is currently work-in-progress.

However, it has been running smoothly for quite a few months now in a system that analyzes thousands of news items a day!

Besides, it's pretty easy to get hands-on with, so you might want to use it in your real applications.

Early adopters may make use of it for production use today as long as they understand and accept that spaGO is not fully tested and that APIs might change.

Known Limits

Sadly, at the moment, spaGO is not GPU friendly by design.

Contributing

We're glad you're thinking about contributing to spaGO! If you think something is missing or could be improved, please open issues and pull requests. If you'd like to help this project grow, we'd love to have you!

To start contributing, check the Contributing Guidelines.

Contact

We encourage you to write an issue. This would help the community grow.

If you really want to write to us privately, please email Matteo Grella with your questions or comments.

Acknowledgments

spaGO is a personal project that is part of the open-source NLP Odyssey initiative started by members of the EXOP team. I would therefore like to thank EXOP GmbH here, which is providing full support for development by promoting the project and giving it increasing importance.

Sponsors

We appreciate contributions of all kinds. We especially want to thank spaGO fiscal sponsors who contribute to ongoing project maintenance.

Our aim is simplifying people's life by making lending easy, fast and accessible, leveraging Open Banking and Artificial Intelligence. https://www.faire.ai/

We work on Artificial Intelligence based hardware and software systems, applying them in areas such as Energy Management, Personal Safety, E-Health and Sports equipment. https://hype-design.it/

Professional services in the IT sector for Local Administrations, Enterprises and Local Authorities. https://www.boxxapps.com/

See our Open Collective page if you too are interested in becoming a sponsor.

Owner

NLP Odyssey

https://github.com/nlpodyssey/spago

Comments

Help with German zero shot

I would like to run Sahajtomar/German_Zeroshot (https://huggingface.co/Sahajtomar/German_Zeroshot) model in spago.

The import was successful: ./huggingface-importer --model=Sahajtomar/German_Zeroshot --repo=./models -> BERT has been converted successfully!

Can I now run the model with the BART server (as I believe it supports zero-shot, unlike the BERT server)?

I receive:

bassea@AP15557 spago % ./bart-server server --repo=./models --model=Sahajtomar/German_Zeroshot --tls-disable
Start loading pre-trained model from "models/Sahajtomar/German_Zeroshot"
[1/2] Loading configuration... ok
panic: bart: unsupported architecture BertForSequenceClassification

goroutine 1 [running]:
github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/loader.Load(0xc000038660, 0x21, 0x2, 0xc000038660, 0x21, 0x492f960)
	/Users/bassea/go/src/spago/pkg/nlp/transformers/bart/loader/loader.go:43 +0x819
github.com/nlpodyssey/spago/cmd/bart/app.newServerCommandActionFor.func1(0xc00022b740, 0x0, 0x0)
	/Users/bassea/go/src/spago/cmd/bart/app/server.go:106 +0x105
github.com/urfave/cli/v2.(*Command).Run(0xc000222ea0, 0xc00022b440, 0x0, 0x0)
	/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:163 +0x4e0
github.com/urfave/cli/v2.(*App).RunContext(0xc0000351e0, 0x4ae0aa0, 0xc000036068, 0xc0000320a0, 0x5, 0x5, 0x0, 0x0)
	/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:313 +0x814
github.com/urfave/cli/v2.(*App).Run(...)
	/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main()
	/Users/bassea/go/src/spago/cmd/bart/main.go:15 +0x72

Is it possible to pre-load passages from csv?

Is it currently possible to preload, let's say, the Go FAQ and run semantic search on the passages by only providing a question to spago serving squad2? I would like to create behavior like that shown in this video: semantic-search

If not yet could you give me some pointers to dig into it?

Thank you
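
One way to approach the idea above is to load the passages once, rank them against the incoming question, and only hand the best candidate to the question-answering model. The sketch below shows the retrieval half in plain Go. It is not a spaGO feature: `loadPassages`, `bestPassage` and the inline `faqCSV` data are hypothetical, and the bag-of-words cosine similarity is a simple stand-in for a real sentence encoder.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math"
	"strings"
)

// bow builds a lowercase bag-of-words vector for a text.
func bow(text string) map[string]float64 {
	v := map[string]float64{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		v[strings.Trim(w, ".,?!;:\"'()")]++
	}
	return v
}

// cosine returns the cosine similarity of two sparse vectors.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, x := range a {
		dot += x * b[k]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// loadPassages reads one passage per CSV record (first column only).
func loadPassages(data string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	passages := make([]string, 0, len(records))
	for _, rec := range records {
		if len(rec) > 0 {
			passages = append(passages, rec[0])
		}
	}
	return passages, nil
}

// bestPassage returns the index of the passage closest to the question.
func bestPassage(question string, passages []string) int {
	q := bow(question)
	best, bestScore := -1, -1.0
	for i, p := range passages {
		if s := cosine(q, bow(p)); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

// faqCSV is made-up sample data standing in for a real CSV file.
const faqCSV = `"Go is an open source programming language created at Google."
"Goroutines are lightweight threads managed by the Go runtime."
"The gopher is the mascot of the Go project."
`

func main() {
	passages, err := loadPassages(faqCSV)
	if err != nil {
		panic(err)
	}
	i := bestPassage("What are goroutines?", passages)
	fmt.Println(passages[i]) // this passage would then be sent to the QA model
}
```

In a real setup the CSV would come from a file via `os.Open`, and the selected passage would be passed together with the question to the server's answer endpoint.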

float32 data type

This is just a question out of curiosity: do you have any plans to support the float32 data type (or any other types, such as integers)?

It is very common to train a neural network with a float32 precision, as it reduces the computation cost without having any significant impact on the accuracy, and I was wondering what would be the speed gain for spago?

I was thinking about something like an enum type passed to the matrix creation function, and maybe the ability to convert an existing matrix from one type to another.

Supporting types like uint8 could make it easier to implement some quantization schemes, which seems like a good fit for Go since it shines in distributed applications (e.g. sending quantized weights over the network is definitely more bandwidth friendly).

For now I just find myself "fighting the matrix" by extracting the underlying data, converting it to the desired type, doing some work with it, and finally reloading it into a matrix later to run some computation.
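
To illustrate the kind of quantization scheme mentioned above, here is a minimal sketch of affine (min/scale) quantization of float64 weights to uint8 and back. It works on plain slices rather than spaGO matrices, and the `Quantized` type and its functions are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"math"
)

// Quantized stores uint8 codes plus the parameters needed to recover floats.
type Quantized struct {
	Data  []uint8
	Scale float64 // width of one quantization step
	Min   float64 // value represented by code 0
}

// Quantize maps the value range [min, max] linearly onto codes 0..255.
func Quantize(xs []float64) Quantized {
	if len(xs) == 0 {
		return Quantized{Scale: 1}
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		lo = math.Min(lo, x)
		hi = math.Max(hi, x)
	}
	scale := (hi - lo) / 255
	if scale == 0 {
		scale = 1 // all values equal: every code is 0
	}
	q := Quantized{Data: make([]uint8, len(xs)), Scale: scale, Min: lo}
	for i, x := range xs {
		q.Data[i] = uint8(math.Round((x - lo) / scale))
	}
	return q
}

// Dequantize reconstructs approximate float64 values from the codes.
func (q Quantized) Dequantize() []float64 {
	out := make([]float64, len(q.Data))
	for i, c := range q.Data {
		out[i] = q.Min + float64(c)*q.Scale
	}
	return out
}

func main() {
	weights := []float64{-1, 0, 0.5, 1}
	q := Quantize(weights)
	fmt.Println("codes:", q.Data)
	fmt.Println("restored:", q.Dequantize())
}
```

Each weight then travels as one byte instead of eight, at the cost of a reconstruction error of at most half a quantization step.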

Other integration / Telegram Bot

Hi guys,

Hope you are all well !

I was looking for some implementation of AI chatbot golang and found your project spago.

And, I am developing a multibot for telegram based on go plugins: https://github.com/paper2code/telegram-multibot

So I was wondering how can I integrate spago as a QA bot plugin.

Do you have any insight or advice for such an integration?

Thanks in advance.

Cheers, X

Running hugging_face_importer from docker container causes strange behavior

I was following instructions to test the question answering demo and noticed that the container never completed by outputting the spago model and left a zombie python process running on my machine (a mid-2013 MacBook Pro 2.3 ghz quad core i7, 16 gb ddr3). Here are steps to reproduce:

# after cloning the repo in its latest form
git rev-parse --short HEAD
f91d1b8
docker build -t spago:main . -f Dockerfile
# that completes successfully
mkdir ~/.spago
# then i run the hugging face import step via the container
docker run --rm -it -v ~/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago

Running command: './hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago'
Downloading dataset...
Start downloading 🤗 `deepset/bert-base-cased-squad2`
2020/06/27 18:55:30 Fetch the model configuration from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/config.json`
Downloading... 508 B complete
2020/06/27 18:55:30 Fetch the model vocabulary from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/vocab.txt`
Downloading... 214 kB complete
2020/06/27 18:55:30 Fetch the model weights from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/pytorch_model.bin` (it might take a while...)

# this process runs for a _really_ long time - ive actually never seen it finish successfully (have let it run for over 60 minutes)
...

In another shell session I was inspecting what Python processes are operating because I noted some cpu hogging after the pytorch model was fully downloaded ...

$ ls -hal ~/.spago/deepset/bert-base-cased-squad2/
total 417M
drwxr-xr-x 6 anthcor staff  192 Jun 27 14:56 ./
drwxr-xr-x 3 anthcor staff   96 Jun 27 14:55 ../
-rw-r--r-- 1 anthcor staff  508 Jun 27 14:55 config.json
drwx------ 6 anthcor staff  192 Jun 27 14:57 embeddings_storage/
-rw-r--r-- 1 anthcor staff 414M Jun 27 14:56 pytorch_model.bin
-rw-r--r-- 1 anthcor staff 209K Jun 27 14:55 vocab.txt
$ ps aux | grep spago
anthcor          29694   6.1  0.1  4444408  22784 s003  S+    2:55PM   0:03.44 docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago
anthcor          29705   0.0  0.0  4268300    700 s004  S+    2:56PM   0:00.00 grep spago
anthcor          29693   0.0  0.0  4280612   6932 s003  S+    2:55PM   0:00.08 /usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python /usr/local/bin/grc -es --colour=auto docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago

When running this workflow not with docker everything works as expected.

In order to stop everything I just kill the docker container and the zombie local python process.

ps aux | grep -i spago | awk '{print $2}' | xargs kill -9 $1

Not really sure why this happens – looks like the container is using a local system binary and that causes hangups in the flow of things but I could totally be wrong as I haven't really spent too much time diving in. Hope this gives enough insight into my issue – lmk if you would like any more details. Cheers 🍻

BERT Server Prediction Doesn't Seem To Work

I'm attempting to use the BERT server, and have successfully gotten the /answer API call to work. I can't seem to find much information on BERT prediction in general, but I'm guessing it's used to predict what the next sentence will be?

Based on this I tried sending a JSON request to the /predict route like so, but it gives an empty response:

$> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/predict
{"tokens":[]}

Contrast this with discriminate which works:

$> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/discriminate
{"tokens":[{"text":"BERT","start":0,"end":4,"label":"FAKE"},{"text":"is","start":5,"end":7,"label":"FAKE"},{"text":"a","start":8,"end":9,"label":"FAKE"},{"text":"technique","start":10,"end":19,"label":"FAKE"},{"text":"for","start":20,"end":23,"label":"FAKE"},{"text":"NLP","start":24,"end":27,"label":"FAKE"},{"text":"developed","start":28,"end":37,"label":"FAKE"},{"text":"by","start":38,"end":40,"label":"FAKE"},{"text":"Google","start":41,"end":47,"label":"FAKE"},{"text":".","start":47,"end":48,"label":"FAKE"},{"text":"BERT","start":49,"end":53,"label":"FAKE"},{"text":"was","start":54,"end":57,"label":"FAKE"},{"text":"created","start":58,"end":65,"label":"FAKE"},{"text":"and","start":66,"end":69,"label":"FAKE"},{"text":"published","start":70,"end":79,"label":"FAKE"},{"text":"in","start":80,"end":82,"label":"FAKE"},{"text":"2018","start":83,"end":87,"label":"FAKE"},{"text":"by","start":88,"end":90,"label":"FAKE"},{"text":"Jacob","start":91,"end":96,"label":"FAKE"},{"text":"Devlin","start":97,"end":103,"label":"FAKE"},{"text":"and","start":104,"end":107,"label":"FAKE"},{"text":"his","start":108,"end":111,"label":"FAKE"},{"text":"colleagues","start":112,"end":122,"label":"FAKE"},{"text":"from","start":123,"end":127,"label":"FAKE"},{"text":"Google","start":128,"end":134,"label":"FAKE"},{"text":".","start":134,"end":135,"label":"FAKE"}]}

Also, what does discriminate do exactly? I found an article explaining all the details of BERT, but I can't seem to find out what discriminate does.

Cannot import project as library

Hello,

I am trying to import this project as a library by running the command go get -u github.com/nlpodyssey/spago (as instructed in the readme)

I am getting the following error: cannot find package "github.com/dgraph-io/badger/v2"

Anyone else got this error?

Better BERT Server Capabilities

(apologies if the many issues are annoying, really liking the library so far)

Overview

Right now when using the BERT server, a set of defaults is used without any control from the user:

HTTP server lacks customizability

HTTP server listens on 0.0.0.0

No ability to enable TLS

Additionally it's not possible to build "external servers" as the functions used by the BERT http router (discriminateHandler, predictionHandler, qaHandler are private functions). If this was changed to instead export the handler functions (DiscriminateHandler, PredictionHandler, QaHandler) it would allow people to have more control over the BERT server, better middleware capabilities, TLS, etc...

By using public handler functions, users would be able to define their own routers, say using chi, and overall have more control of the BERT server.

I'd be more than happy to open a PR that implements these suggestions
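
The proposal above can be sketched with the standard library alone. In this example `QaHandler` is written in the exported style being suggested, `Answerer` is a hypothetical interface standing in for a loaded model, and `echoModel` is a dummy implementation; none of these are spaGO's actual types. The caller owns the router, the bind address, and the TLS decision.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Answerer is a hypothetical stand-in for a loaded QA model.
type Answerer interface {
	Answer(question, passage string) string
}

// echoModel is a dummy model that returns the passage unchanged.
type echoModel struct{}

func (echoModel) Answer(question, passage string) string { return passage }

// QaHandler is exported so callers can mount it on any router they like.
func QaHandler(m Answerer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var body struct {
			Question string `json:"question"`
			Passage  string `json:"passage"`
		}
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{
			"answer": m.Answer(body.Question, body.Passage),
		})
	}
}

// NewRouter is user code: middleware or another router could be used here.
func NewRouter(m Answerer) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/answer", QaHandler(m))
	return mux
}

// call exercises the router in-process and returns status code and body.
func call(payload string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/answer", strings.NewReader(payload))
	NewRouter(echoModel{}).ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := call(`{"question":"q","passage":"p"}`)
	fmt.Println(code, body) // prints: 200 {"answer":"p"}

	// To serve for real, the caller picks the address and TLS files, e.g.:
	// http.ListenAndServeTLS("127.0.0.1:1987", "cert.pem", "key.pem", NewRouter(model))
}
```

Because the handler is an ordinary `http.HandlerFunc`, it would also plug into third-party routers such as chi without changes.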

Multi-label BERT classifier from PyTorch

So I can convert and then load my BERT model, but I am having trouble working out how to operate it from Spago.

It is a multi-label model and to use it in Python I do this:

    text_enc = bert_tokenizer.encode_plus(
            texttoclassify,
            None,
            add_special_tokens=True,
            max_length=MAX_LEN,
            padding='max_length',
            return_token_type_ids=False,
            return_attention_mask=True,
            truncation=True,
            return_tensors='pt'
    )

    # mymodel implements pl.LightningModule
    #
    outputs = mymodel(text_enc['input_ids'], text_enc['attention_mask'])
    pred_out = outputs[0].detach().numpy()

And then process the pred_out array. This model has 5 outputs and all works as you expect in Python.

So, how do I perform the equivalent in Spago? Borrowing code from the classifier server, I have got this far, but it just isn't obvious what I need to modify to cater for a 5-label output layer.


func getTokenized(vocab *vocabulary.Vocabulary, text string) []string {
	cls := wordpiecetokenizer.DefaultClassToken
	sep := wordpiecetokenizer.DefaultSequenceSeparator
	tokenizer := wordpiecetokenizer.New(vocab)
	tokenized := append([]string{cls}, append(tokenizers.GetStrings(tokenizer.Tokenize(text)), sep)...)
	return tokenized
}

// ....
	model, err := bert.LoadModel(dir)
	if err != nil {
		log.Fatalf("error during model loading (%v)\n", err)
	}
	defer model.Close()
	
	// We need a classifier that matches the output layer of our model.
	//
	var bc = bert.ClassifierConfig{
		InputSize: 768,
		Labels:    []string{"A", "B", "C", "D", "E"},
	}
	model.Classifier = bert.NewTokenClassifier(bc)

	tokenized := getTokenized(model.Vocabulary, s)

	g := ag.NewGraph(ag.ConcurrentComputations(runtime.NumCPU()))
	proc := nn.ReifyForInference(model, g).(*bert.Model)
	encoded := proc.Encode(tokenized)

	logits := proc.SequenceClassification(encoded)
	probs := floatutils.SoftMax(logits.Value().Data())

However, this just gives me 0.2 for each, so I seem to be miles off. Is there an example, or can a short code sequence be provided? Is the wordpiecetokenizer even the correct thing to use?
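
Independently of spaGO's API, it may help to see why a value of exactly 0.2 for each of five labels can appear: a softmax over five equal logits always yields 0.2 each, and a multi-label head is usually scored with an independent sigmoid per label rather than a softmax. The sketch below contrasts the two on plain slices; the `trained` logits are made-up numbers, and this is not a diagnosis of the code above.

```go
package main

import (
	"fmt"
	"math"
)

// softmax turns logits into one distribution that sums to 1 (single-label).
func softmax(logits []float64) []float64 {
	peak := math.Inf(-1)
	for _, x := range logits {
		peak = math.Max(peak, x)
	}
	out := make([]float64, len(logits))
	var sum float64
	for i, x := range logits {
		out[i] = math.Exp(x - peak) // subtract the peak for numerical stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// sigmoid scores every logit independently (multi-label).
func sigmoid(logits []float64) []float64 {
	out := make([]float64, len(logits))
	for i, x := range logits {
		out[i] = 1 / (1 + math.Exp(-x))
	}
	return out
}

func main() {
	equal := []float64{0, 0, 0, 0, 0}
	fmt.Println(softmax(equal)) // prints: [0.2 0.2 0.2 0.2 0.2]

	trained := []float64{2.0, -1.0, 0.5, -3.0, 1.5} // hypothetical logits
	fmt.Printf("softmax: %.2f\n", softmax(trained))
	fmt.Printf("sigmoid: %.2f\n", sigmoid(trained))
}
```

With the sigmoid, several labels can be above 0.5 at once, which is the behavior a multi-label classifier normally needs.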

Accelerators

There is a ton of work being done with risc-v and machine learning accelerators

I feel TinyGo makes it possible to leverage spago on these accelerators.

Just wanted to point this out as I saw in your Q&A that you felt it was not possible to accelerate spago

This is the best thing since sliced bread

Hello I do not have any issue so feel free to close this but I just wanted to say that spago is fantastic. Love what has been done here in native go!!! Thank you everybody involved. FORZA ITALIA

QA Chinese model result does not match python version

I am using this Chinese model. It runs locally in Python and the output is correct, but the output from spaGO is not.

This is similar to #101, but I cannot find the bool parameter for QA. How can I stop the output from being forced into a distribution (sum must be 1), whereas with Python the output is free?

server := bert.NewServer(model)
answers := s.model.Answer(body.Question, body.Passage)

Translated QA: Context: My name is Clara, I live in Berkeley Q: what is my name? A: Clara

Output is supposed to be 克拉拉 but got

{
    "answers": [
        {
            "text": "我叫克拉拉，我住在伯克利。",
            "start": 0,
            "end": 13,
            "confidence": 0.2547743
        },
        {
            "text": "住在伯克利。",
            "start": 7,
            "end": 13,
            "confidence": 0.22960596
        },
        {
            "text": "我叫克拉拉，我住",
            "start": 0,
            "end": 8,
            "confidence": 0.1548344
        }
    ],
    "took": 1075
}

./bert-server server --repo=~/.spago --model=luhua/chinese_pretrain_mrc_roberta_wwm_ext_large --tls-disable

PASSAGE="我叫克拉拉，我住在伯克利。"
QUESTION1="我的名字是什么？"
curl -k -d '{"question": "'"$QUESTION1"'", "passage": "'"$PASSAGE"'"}' -H "Content-Type: application/json" "http://127.0.0.1:1987/answer?pretty"

Gorgonia tensors

Hi Matteo, in your GopherCon deck you mentioned having GPU-friendly Gorgonia tensors on spago's roadmap. I am curious about how this might work. Could you give any pointers? I suppose it would work because the tensors are more or less just a slice of floats? I read on their Git repository that the CUDA API is expected to change quite a bit before hitting v1.0. They are currently on 0.9.17, so I guess not before Gorgonia's CUDA interface gets a stable first release?

Nearly 4 times the memory usage when compared to python for the same model

I ran memory profiling for the code in https://github.com/nlpodyssey/spago/issues/103, and the spago version uses 3.9 GB compared to 1.2 GB for Python. The model sizes are similar: valhalla/distilbart-mnli-12-3 is 2.5 GB after conversion to spago (hf-importer), whereas the upstream Python version is 2.1 GB.

Memory profiling in spago

memory_prof

Memory profiling in Python

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     7    217.3 MiB    217.3 MiB           1   @profile
     8                                         def classify():
     9   1227.3 MiB   1010.0 MiB           1       classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")
    10                                         
    11   1227.3 MiB      0.0 MiB           1       sequence = "PalmOS on Raspberry Pi"
    12   1227.3 MiB      0.0 MiB           1       candidate_labels = ["startup", "business", "legal", "tech"]
    13                                         
    14                                         
    15   1235.1 MiB      7.8 MiB           1       res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)
    16                                         
    17   1235.1 MiB      0.0 MiB           5       for i, label in enumerate(candidate_labels):
    18   1235.1 MiB      0.0 MiB           4           print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))

Is this expected? Spago can be very useful for CPU-bound inference in low-memory environments such as ARM SBCs, but the memory usage needs to be optimized.

The Python version also seems to be faster overall, because loading the configuration and model weights takes a variable amount of time in spago.
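
For readers who want to reproduce this kind of measurement on the Go side without an external profiler, the standard `runtime` package can report live heap usage before and after a load step. The sketch below is a minimal example under that assumption; `measure` is a hypothetical helper, and the one-million-element slice merely stands in for model weights.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAlloc forces a collection and reports the bytes of live heap objects.
func heapAlloc() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// measure reports how much live heap a loader adds. It returns the loaded
// data so that it stays reachable while the second reading is taken.
func measure(load func() []float64) (uint64, []float64) {
	before := heapAlloc()
	data := load()
	after := heapAlloc()
	if after < before {
		return 0, data
	}
	return after - before, data
}

func main() {
	// Hypothetical stand-in for loading model weights: 1M float64 values.
	delta, weights := measure(func() []float64 { return make([]float64, 1<<20) })
	fmt.Printf("%d weights, about %.1f MiB of live heap\n",
		len(weights), float64(delta)/(1<<20))
}
```

A slice of float64 costs eight bytes per element, so element type alone can account for a noticeable share of a model's in-memory footprint.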

Differences in the output of zero shot classification between python & spago for the same model

I appreciate everyone involved with the spago project for developing a proper Machine Learning framework for Go.

I'm in the process of exploring spago and found that the output for valhalla/distilbart-mnli-12-3 differs for zero-shot classification when using Python vs. spago.

       //main.go

	model, err := zsc.LoadModel("spago/valhalla/distilbart-mnli-12-3")
	if err != nil {
		log.Fatal(err)
	}
	defer model.Close()

	//Sequence
	sequence := "PalmOS on Raspberry Pi"

	// arbitrary list of topics
	candidateLables := []string{"startup", "business", "legal", "tech"}

	result, err := model.Classify(sequence, "", candidateLables, true)

	if err != nil {
		log.Fatal(err)
	}
	for i, item := range result.Distribution {
		fmt.Printf("%d. %s [%.2f]\n", i, item.Class, item.Confidence)
	}


0. tech [0.89]
1. startup [0.02]
2. legal [0.01]
3. business [0.00]

   #main.py
    classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")

    sequence = "PalmOS on Raspberry Pi"
    candidate_labels = ["startup", "business", "legal", "tech"]

    res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)

    for i, label in enumerate(candidate_labels):
        print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))

0. tech [0.99]
1. legal [0.77]
2. startup [0.05]
3. business [0.00]

Is this expected behavior? If so, why?
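
When comparing the two outputs, it may be worth checking how each side turns NLI logits into label scores. As far as I understand the Hugging Face zero-shot pipeline, with multi_label=True each label is scored on its own by a softmax over only its contradiction and entailment logits, while in single-label mode the entailment logits are normalized across all labels. The sketch below shows both conventions on made-up logits; it makes no claim about which one spaGO applies, and the position of the entailment class depends on the model's configuration.

```go
package main

import (
	"fmt"
	"math"
)

// nliLogits holds hypothetical NLI outputs for one candidate label.
type nliLogits struct{ Contradiction, Neutral, Entailment float64 }

// multiLabel scores each label independently: a two-way softmax over
// contradiction vs. entailment, which equals a logistic of their difference.
func multiLabel(all []nliLogits) []float64 {
	out := make([]float64, len(all))
	for i, l := range all {
		out[i] = 1 / (1 + math.Exp(l.Contradiction-l.Entailment))
	}
	return out
}

// singleLabel normalizes the entailment logits across all labels.
func singleLabel(all []nliLogits) []float64 {
	out := make([]float64, len(all))
	peak := math.Inf(-1)
	for _, l := range all {
		peak = math.Max(peak, l.Entailment)
	}
	var sum float64
	for i, l := range all {
		out[i] = math.Exp(l.Entailment - peak)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	logits := []nliLogits{ // made-up values for two candidate labels
		{Contradiction: -2, Neutral: 0, Entailment: 3},
		{Contradiction: -1, Neutral: 0, Entailment: 2},
	}
	fmt.Printf("multi-label:  %.2f\n", multiLabel(logits))
	fmt.Printf("single-label: %.2f\n", singleLabel(logits))
}
```

In the multi-label convention two labels can both score close to 1, whereas the single-label scores always sum to 1, so the same logits can produce very different-looking numbers.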

bart-large-mnli multi_class does not agree with Python version

If you convert facebook/bart-large-mnli and use it to evaluate the demo text at huggingface and compare against a local Python setup for verification, we find that:

the online demo card and the local Python agree on the label score
the label probabilities given back are vastly different
the Python version takes roughly 16 seconds on my local machine, but the Spago version takes 37 seconds (this is a Mac and there is no GPU available)

Python code is

    text = "A new model offers an explanation for how the Galilean satellites formed around the solar system’s " \
           "largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling " \
           "mysteries when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, " \
           "best known for his contributions to the search for the solar system’s missing “Planet Nine,” spotted a " \
           "beer bottle. At a steep, 20 degree grade, he wondered why it wasn’t rolling down the hill. He realized " \
           "there was a breeze at his back holding the bottle in place. Then he had a thought that would only pop " \
           "into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.” Europa is one of " \
           "Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal, " \
           "Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory " \
           "in France, present a theory explaining how some moons form around gas giants like Jupiter and Saturn, " \
           "suggesting that millimeter-sized grains of hail produced during the solar system’s formation became " \
           "trapped around these massive worlds, taking shape one at a time into the potentially habitable moons we " \
           "know today. "
    cc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    labels = ['space & cosmos', 'scientific discovery', 'microbiology', 'robots', 'archeology']
    r = cc(text, labels, multi_class=True)

Go code, with same text and classes, is:

bartModel, err = zsc.LoadModel(bartDir)
// ... check err
result, err := bartModel.Classify(c.InputText, "", classes, true)

Similarly using the model valhalla/distilbart-mnli-12-3 also gives wildly different results to the online huggingface demo, using the same text and label set as above.

So, is there something else I need to do, or is the zsc code not working? My go code is essentially just like the zsc demo code.

Related tags

Self-contained Japanese Morphological Analyzer written in pure Go

Natural language detection library for Go

Natural language detection package in pure Go

A natural language date/time parser with pluggable rules

A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.

Gopher-translator - A HTTP API that accepts english word or sentences and translates them to Gopher language

Complete Translation - translate a document to another language

A go library for reading and creating ISO9660 images

Go bindings for the snowball libstemmer library including porter 2

Cgo binding for icu4c library

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29

Cgo binding for Snowball C library

Go efficient text segmentation and NLP; support english, chinese, japanese and other. Go 语言高性能分词

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.

Utilities for working with discrete probability distributions and other tools useful for doing NLP work

Read and use word2vec vectors in Go

[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

A Go package for n-gram based text categorization, with support for utf-8 and raw text