[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

nlp

nlp is a general-purpose, any-language Natural Language Processor that parses the data inside a text and returns a filled model.

Supported types

int  int8  int16  int32  int64
uint uint8 uint16 uint32 uint64
float32 float64
string
time.Time
time.Duration

Installation

// go1.8+ is required
go get -u github.com/shixzie/nlp

Feel free to create PRs and open issues :)

How it works

You always begin by calling nlp.New() to create an NL value. NL is a Natural Language Processor with three methods: RegisterModel(), Learn() and P().

RegisterModel(i interface{}, samples []string, ops ...ModelOption) error

RegisterModel takes three parameters: an empty struct, a set of samples, and optional settings for the model.

The empty struct tells nlp which values may appear inside the text. For example:

type Song struct {
	Name        string // fields must be exported
	Artist      string
	ReleasedAt  time.Time
}
err := nl.RegisterModel(Song{}, someSamples, nlp.WithTimeFormat("2006"))
if err != nil {
	panic(err)
}
// ...

tells nlp that the text may contain a Song.Name, a Song.Artist and a Song.ReleasedAt.

The samples are the key part of nlp. They set the limits between keywords, and they are also used to choose which model should handle an expression.

Samples must have a special syntax to set those limits and keywords.

songSamples := []string{
	"play {Name} by {Artist}",
	"play {Name} from {Artist}",
	"play {Name}",
	"from {Artist} play {Name}",
	"play something from {ReleasedAt}",
}

In the first sample, "play {Name} by {Artist}", we're referring to the Name and Artist fields of the Song type declared above. Both {Name} and {Artist} are our keywords and, yes, you guessed it! Everything between play and by will be treated as a {Name}, and everything after by will be treated as an {Artist}. This means that play and by are our limits.

     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords
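To make the split between limits and keywords concrete, here is a minimal sketch that breaks a sample into its parts. It treats every whitespace-separated {Field} token as a keyword and everything else as a limit. parseSample and token are hypothetical names used for illustration; nlp's own parser may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// token is one whitespace-separated piece of a sample: either a
// {Keyword} placeholder or a literal limit word.
type token struct {
	text      string
	isKeyword bool
}

// parseSample splits a sample such as "play {Name} by {Artist}" into
// limits and keywords. Hypothetical helper, not part of the nlp package.
func parseSample(sample string) []token {
	var toks []token
	for _, f := range strings.Fields(sample) {
		if len(f) > 2 && strings.HasPrefix(f, "{") && strings.HasSuffix(f, "}") {
			// strip the braces and keep the field name
			toks = append(toks, token{text: f[1 : len(f)-1], isKeyword: true})
		} else {
			toks = append(toks, token{text: f})
		}
	}
	return toks
}

func main() {
	for _, t := range parseSample("play {Name} by {Artist}") {
		kind := "limit"
		if t.isKeyword {
			kind = "keyword"
		}
		fmt.Printf("%-7s %s\n", kind, t.text)
	}
}
```

Run on the first sample, this labels play and by as limits and Name and Artist as keywords, which matches the diagram.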

Any character can be a limit. A comma (,), for example, can be used as a limit.

Keywords as well as limits are case-sensitive, so be sure to type them correctly.

Note that putting two keywords next to each other, with no limit between them, will cause only one of them, or neither, to be detected.
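The last note can be checked mechanically. The hypothetical helper below is not part of nlp. It flags samples in which two keywords sit next to each other with no limit between them, because in such a sample nothing marks where one value ends and the next begins.

```go
package main

import (
	"fmt"
	"strings"
)

// isKeyword reports whether a sample token is a {Field} placeholder.
func isKeyword(tok string) bool {
	return len(tok) > 2 && strings.HasPrefix(tok, "{") && strings.HasSuffix(tok, "}")
}

// adjacentKeywords returns every pair of keywords that appear next to
// each other with no limit in between. Hypothetical lint-style check.
func adjacentKeywords(sample string) [][2]string {
	var pairs [][2]string
	toks := strings.Fields(sample)
	for i := 1; i < len(toks); i++ {
		if isKeyword(toks[i-1]) && isKeyword(toks[i]) {
			pairs = append(pairs, [2]string{toks[i-1], toks[i]})
		}
	}
	return pairs
}

func main() {
	fmt.Println(adjacentKeywords("play {Name} {Artist}"))    // [[{Name} {Artist}]]
	fmt.Println(adjacentKeywords("play {Name} by {Artist}")) // []
}
```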

limits are important - Me :3

Learn() error

Learn uses the Naive Bayes algorithm to map each model's samples to that model, so the right model can later be picked for an expression. Learn() also trains every registered model so it can fit expressions later.

// must be called after all models are registered and before calling nl.P()
err := nl.Learn()
if err != nil {
	panic(err)
}
// ...

Once learning has finished, we're ready to start processing text.

Note that you must call NL.Learn() after all models are registered and before calling NL.P().
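The model-selection step can be illustrated with a tiny multinomial Naive Bayes classifier. This is a sketch of the general technique, not nlp's implementation. The classifier type, the decision to count only limit words (skipping {Field} placeholders), and the add-one smoothing are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier is a tiny multinomial Naive Bayes model used only to
// illustrate how samples can select a model; it is not nlp's code.
type classifier struct {
	docs       map[string]int            // number of samples per model
	wordCounts map[string]map[string]int // model -> limit word -> count
	totals     map[string]int            // total limit words per model
	vocab      map[string]bool           // every limit word seen
	nDocs      int                       // total number of samples
}

func newClassifier() *classifier {
	return &classifier{
		docs:       map[string]int{},
		wordCounts: map[string]map[string]int{},
		totals:     map[string]int{},
		vocab:      map[string]bool{},
	}
}

// learn records one sample for a model. {Field} placeholders are skipped,
// so only the limit words are counted (an assumption of this sketch).
func (c *classifier) learn(model, sample string) {
	c.docs[model]++
	c.nDocs++
	if c.wordCounts[model] == nil {
		c.wordCounts[model] = map[string]int{}
	}
	for _, w := range strings.Fields(sample) {
		if strings.HasPrefix(w, "{") && strings.HasSuffix(w, "}") {
			continue
		}
		c.wordCounts[model][w]++
		c.totals[model]++
		c.vocab[w] = true
	}
}

// classify returns the model with the highest log-posterior for expr.
// Add-one smoothing avoids log(0); words never seen while learning
// (trash) are ignored. Ties go to the alphabetically first model.
func (c *classifier) classify(expr string) string {
	best, bestScore := "", math.Inf(-1)
	v := float64(len(c.vocab))
	for model, n := range c.docs {
		score := math.Log(float64(n) / float64(c.nDocs))
		for _, w := range strings.Fields(expr) {
			if !c.vocab[w] {
				continue
			}
			p := float64(c.wordCounts[model][w]+1) / (float64(c.totals[model]) + v)
			score += math.Log(p)
		}
		if score > bestScore || (score == bestScore && model < best) {
			best, bestScore = model, score
		}
	}
	return best
}

func main() {
	c := newClassifier()
	c.learn("Song", "play {Name} by {Artist}")
	c.learn("Song", "play {Name}")
	c.learn("Person", "my name is {Name}")
	c.learn("Person", "call me {Name}")

	fmt.Println(c.classify("hello sir can you pleeeeeease play King by Lauren Aquilina")) // Song
	fmt.Println(c.classify("Hi, my name is Patrice"))                                      // Person
}
```

The expression containing play and by scores highest for Song. The one containing my name is scores highest for Person. The extra words in both expressions never appeared in a sample, so they do not affect the choice.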

P(expr string) interface{}

P first asks the trained classifier which model should be used. Once it has the right (already trained) model, it makes that model fit the expression.

Note that all words in the expression must be separated by spaces or tabs.

When processing an expression, nlp searches for the limits inside it and evaluates which sample fits it best, so it doesn't matter if the text contains trash. In this example:

     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords

we have two limits, play and by. It doesn't matter if the expression is hello sir can you pleeeeeease play King by Lauren Aquilina, since:

                                  limits
            trash              ┌────┴────┐
┌─────────────┴─────────────┐ ┌┴─┐      ┌┴┐
hello sir can you pleeeeeease play King by  Lauren Aquilina
                                   └┬─┘     └─────┬───────┘
                                 {Name}       {Artist}
                                 └─┬──┘       └───┬──┘
                                   └──────┬───────┘
                                       keywords

{Name} would be replaced with King and {Artist} with Lauren Aquilina. The trash would be ignored, as would the limits play and by. P then returns a pointer to a filled struct of the type used to register the model (Song), with Song.Name set to {Name} and Song.Artist set to {Artist}.
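The fitting step just described can be sketched as follows. fit is a hypothetical function, not nlp's code, and it returns a map instead of a filled struct. It looks for each limit of the sample in order, using case-sensitive comparison, and ignores anything before the first limit. The words between limits are assigned to the keywords.

```go
package main

import (
	"fmt"
	"strings"
)

// isKeyword reports whether a sample token is a {Field} placeholder.
func isKeyword(tok string) bool {
	return len(tok) > 2 && strings.HasPrefix(tok, "{") && strings.HasSuffix(tok, "}")
}

// indexFrom returns the index of the first token equal to want at or
// after position from, or -1. The comparison is case-sensitive.
func indexFrom(toks []string, want string, from int) int {
	for i := from; i < len(toks); i++ {
		if toks[i] == want {
			return i
		}
	}
	return -1
}

// fit is a hypothetical sketch of fitting expr to one sample. It reports
// false if a limit is missing or a keyword would be left empty.
func fit(sample, expr string) (map[string]string, bool) {
	st, et := strings.Fields(sample), strings.Fields(expr)
	vals := map[string]string{}
	pos := 0      // next unread token in expr
	pending := "" // keyword still waiting for its value
	start := 0    // index where pending's value begins
	for _, tok := range st {
		if isKeyword(tok) {
			pending, start = tok[1:len(tok)-1], pos
			continue
		}
		i := indexFrom(et, tok, pos) // locate the limit; earlier words are trash
		if i < 0 {
			return nil, false
		}
		if pending != "" {
			if i == start {
				return nil, false
			}
			vals[pending] = strings.Join(et[start:i], " ")
			pending = ""
		}
		pos = i + 1
	}
	if pending != "" { // the last keyword takes the rest of expr
		if start >= len(et) {
			return nil, false
		}
		vals[pending] = strings.Join(et[start:], " ")
	}
	return vals, true
}

func main() {
	vals, ok := fit("play {Name} by {Artist}",
		"hello sir can you pleeeeeease play King by Lauren Aquilina")
	fmt.Println(vals, ok) // map[Artist:Lauren Aquilina Name:King] true
}
```

This sketch has three limitations. First, two adjacent keywords collapse into one, because the second overwrites the first. Second, everything after the last limit, including any trailing trash, ends up in the last keyword. Third, a value that itself contains a limit word (a song named Stand by Me, say) is cut short.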

Usage

package main

import (
	"fmt"
	"time"

	"github.com/shixzie/nlp"
)

type Song struct {
	Name       string
	Artist     string
	ReleasedAt time.Time
}

func main() {
	songSamples := []string{
		"play {Name} by {Artist}",
		"play {Name} from {Artist}",
		"play {Name}",
		"from {Artist} play {Name}",
		"play something from {ReleasedAt}",
	}

	nl := nlp.New()
	err := nl.RegisterModel(Song{}, songSamples, nlp.WithTimeFormat("2006"))
	if err != nil {
		panic(err)
	}

	err = nl.Learn() // you must call Learn after all models are registered and before calling P
	if err != nil {
		panic(err)
	}

	// after learning you can call P as many times as you want
	s := nl.P("hello sir can you pleeeeeease play King by Lauren Aquilina")
	if song, ok := s.(*Song); ok {
		fmt.Println("Success")
		fmt.Printf("%#v\n", song)
	} else {
		fmt.Println("Failed")
	}
}

// Prints
//
// Success
// &main.Song{Name: "King", Artist: "Lauren Aquilina"}
// (abbreviated: %#v also prints the zero-valued ReleasedAt field)

Owner
Juan Alvarez
Comments
  • Multiple models ?

    Hello, I'm now facing another problem: I wanted to register several models to extract several pieces of information. Here is the problem:

    package main
    
    import (
    	"fmt"
    	"time"
    
    	"github.com/shixzie/nlp"
    )
    
    type Person struct {
    	Name string
    }
    
    type Job struct {
    	Name  string
    	Since time.Time
    }
    
    func main() {
    
    	nl := nlp.New()
    	jobSample := []string{
    		"I'm {Name}",
    		"I'm {Name} since {Since}",
    	}
    	persSample := []string{
    		"my name is {Name}",
    		"call me {Name}",
    	}
    
    	err := nl.RegisterModel(Job{}, jobSample, nlp.WithTimeFormat("2006"))
    	if err != nil {
    		panic(err)
    	}
    	err = nl.RegisterModel(Person{}, persSample, nlp.WithTimeFormat("2006"))
    	if err != nil {
    		panic(err)
    	}
    
    	err = nl.Learn()
    	if err != nil {
    		panic(err)
    	}
    
    	p := nl.P("Hi, my name is Patrice")
    	fmt.Printf("%T, %+v\n", p, p)
    
    	p = nl.P("Hi, I'm engeneer")
    	fmt.Printf("%T, %+v\n", p, p)
    
    	p = nl.P("Hi, my name is Patrice and I'm engeneer since 2001")
    	fmt.Printf("%T, %+v\n", p, p)
    }
    

    That sounds OK because nl.P() doesn't return a slice, but it can be problematic if we want to extract several pieces of information and bind them together.

    So I get these responses:

    *main.Person, &{Name:Patrice}
    *main.Job, &{Name:engeneer Since:0001-01-01 00:00:00 +0000 UTC}
    *main.Job, &{Name:engeneer Since:2001-01-01 00:00:00 +0100 CET}
    

    Once again, I'll try to find some time to help if I can, but is there any plan to change this behavior?

  • extension to allow learning feedback loop at runtime.

    If I understand correctly, the current Learn option learns from a fixed set of data, and it does so at design time, which makes sense.

    What about runtime, though, where we have HITL (human in the loop)? With any learning algorithm, it can be helpful for the NLP to get feedback from the user on how well it is learning; this is more like runtime learning. There are lots of subtle ways of getting that positive or negative feedback:

    • the user immediately pressing the back button after the result is shown
    • occasionally asking the user for a thumbs up or down
    • maybe some others

    This is just an idea I have seen in other machine learning systems. Curious what you think.

  • Bad match ?

    Something goes wrong when I use this example:

    package main
    
    import (
    	"fmt"
    	"time"
    
    	"github.com/shixzie/nlp"
    )
    
    type Need struct {
    	Name  string
    	Since time.Time
    }
    
    func main() {
    
    	nl := nlp.New()
    	needSample := []string{
    		"I need {Name} since {Since}",
    	}
    
    	err := nl.RegisterModel(Need{}, needSample, nlp.WithTimeFormat("2006"))
    	if err != nil {
    		panic(err)
    	}
    
    	err = nl.Learn()
    	if err != nil {
    		panic(err)
    	}
    
    	p := nl.P("Hi, I am Patrice, I need water since 2001")
    	fmt.Println(p)
    }
    

    It only gets "since" for a "Need" but doesn't match "water". If I remove "I am", or just the "I" from "I am", then I get "water" and "2001".

    I don't have time to check what goes wrong myself, so if you can...

  • Can you register multiple models for one parser?

    I've tried and all I seem to get out of the parser is the type and model that was registered with the parser first. For example:

    nl = nlp.New()
    checkErr(nl.RegisterModel(ShutdownAction{}, rundata.Samples["shutdown"]))
    checkErr(nl.RegisterModel(MuteAction{}, rundata.Samples["mute"]))
    nl.Learn()
    

    nl.P() will always return a ShutdownAction{}, regardless of whether or not the text fed to nl.P() matched any part of the samples provided in rundata.Samples["shutdown"].

    It's not that big of a hardship if I can't. I would just prefer to do it this way and use a type switch rather than use keywords to determine which model should evaluate the input. (I'm working on a chatbot for a Discord server, and I'd prefer interactions to be more conversational than a strict command syntax allows.)

  • error in examples_test.go

    Running the function ExampleClassifier_Classify from examples_test.go fails with the error panic: register at least one class before learning, raised at this point:

    	err = cls.Learn()
    	if err != nil {
    		panic(err)
    	}
    
  • How can I save/load the learned things?

    Assuming I have many models, each with a lot of samples, how can I store what has been learned in a file so that I can load it from that file later on?

  • Question around ignoring trailing text

    Hi!

    Thank you for pulling together this awesome library.

    I have a question related to trailing text.

    Let's say that I'm using the model from the README, and I have the following input:

    s := nl.P("hello sir can you pleeeeeease play King by Lauren Aquilina ????") 
    

    I would like ???? to be considered trash so that the artist remains Lauren Aquilina. Is there a way to train the model to do that? I tried playing with the limits, but had no luck.
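One possible workaround for this question, sketched under the assumption that it is acceptable to clean the input before calling nl.P, is to drop trailing tokens that contain no letters or digits. trimTrailingTrash is a hypothetical helper, not an nlp feature. It only handles trash separated by whitespace, so a token like Aquilina???? would be kept as-is.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasAlnum reports whether s contains at least one letter or digit.
func hasAlnum(s string) bool {
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

// trimTrailingTrash drops trailing whitespace-separated tokens that contain
// no letters or digits (e.g. "????" or "!!"). Hypothetical preprocessing
// step applied before nl.P; it is not part of nlp.
func trimTrailingTrash(expr string) string {
	toks := strings.Fields(expr)
	n := len(toks)
	for n > 0 && !hasAlnum(toks[n-1]) {
		n--
	}
	return strings.Join(toks[:n], " ")
}

func main() {
	fmt.Println(trimTrailingTrash("play King by Lauren Aquilina ????"))
	// play King by Lauren Aquilina
}
```

Usage would then be along the lines of s := nl.P(trimTrailingTrash(input)). The helper rejoins tokens with single spaces. According to the README, nlp splits expressions on spaces and tabs, so this normalization should not change how the expression is matched.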
