A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)

A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.

Original Python implementation available at: https://github.com/aneesha/RAKE

The source code is released under the MIT License.

Example Usage

package main

import (
	"fmt"

	rake "github.com/afjoseph/RAKE.Go"
)

func main() {
	text := `The growing doubt of human autonomy and reason has created a state of moral confusion where man is left without the guidance of either revelation or reason. The result is the acceptance of a relativistic position which proposes that value judgements and ethical norms are exclusively matters of arbitrary preference and that no objectively valid statement can be made in this realm... But since man cannot live without values and norms, this relativism makes him an easy prey for irrational value systems.`

	candidates := rake.RunRake(text)

	for _, candidate := range candidates {
		fmt.Printf("%s --> %f\n", candidate.Key, candidate.Value)
	}

	fmt.Printf("\nsize: %d\n", len(candidates))
}

Output:

objectively valid statement --> 9.000000
exclusively matters --> 4.000000
arbitrary preference --> 4.000000
easy prey --> 4.000000
relativistic position --> 4.000000
human autonomy --> 4.000000
relativism makes --> 4.000000
growing doubt --> 4.000000
moral confusion --> 4.000000
ethical norms --> 3.500000
norms --> 1.500000
made --> 1.000000
guidance --> 1.000000
man --> 1.000000
result --> 1.000000
systems --> 1.000000
values --> 1.000000
realm --> 1.000000
live --> 1.000000
judgements --> 1.000000
reason --> 1.000000
left --> 1.000000
proposes --> 1.000000
irrational --> 1.000000
created --> 1.000000
acceptance --> 1.000000
revelation --> 1.000000
state --> 1.000000

size: 28
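
The scores in the example output follow from the RAKE scoring rule in the cited paper: each word scores degree/frequency, and a phrase scores the sum of its words. The sketch below is a minimal illustration of that rule, not the library's actual code; `scorePhrases` is a hypothetical helper, and the input is assumed to be candidate phrases that were already split at stopwords.

```go
package main

import (
	"fmt"
	"strings"
)

// scorePhrases is a hypothetical helper, not part of the RAKE.Go API.
// Each phrase is a slice of content words already split at stopwords.
func scorePhrases(phrases [][]string) map[string]float64 {
	freq := map[string]float64{}   // number of occurrences of each word
	degree := map[string]float64{} // summed length of every phrase containing the word

	for _, phrase := range phrases {
		for _, word := range phrase {
			freq[word]++
			degree[word] += float64(len(phrase))
		}
	}

	// A phrase score is the sum of degree/frequency over its words.
	scores := map[string]float64{}
	for _, phrase := range phrases {
		total := 0.0
		for _, word := range phrase {
			total += degree[word] / freq[word]
		}
		scores[strings.Join(phrase, " ")] = total
	}
	return scores
}

func main() {
	phrases := [][]string{
		{"objectively", "valid", "statement"},
		{"ethical", "norms"},
		{"norms"},
	}
	scores := scorePhrases(phrases)
	for _, key := range []string{"objectively valid statement", "ethical norms", "norms"} {
		fmt.Printf("%s --> %f\n", key, scores[key])
	}
}
```

With these three phrases, "norms" occurs twice with a total degree of 3, so it scores 1.5 on its own and lifts "ethical norms" to 3.5, while the three-word phrase scores 9.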
Owner
Abdullah Joseph
Mobile Security Team Lead @ Adjust https://www.linkedin.com/in/afjoseph/
Comments
  • Performance degradation caused by commit b66ca2f2b6bf9f4d84f82946e670799b8d2b2e46

    This commit fixes #1 but adds a fairly heavy loop. The result should be cached or stored as a []string, but it isn't: https://github.com/Obaied/RAKE.Go/blob/b66ca2f2b6bf9f4d84f82946e670799b8d2b2e46/stopwords.go#L584
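
One way to avoid rebuilding the stop list on every call is to compute it once and reuse it. The sketch below is an assumption about how that could look, not the fix the repository adopted: `rawStopWords`, `stopWords`, `isStopWord`, and `buildCount` are hypothetical names, and a set is used here instead of the []string the issue mentions because it makes lookups cheap.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// rawStopWords is a tiny stand-in for the library's much longer stop list.
const rawStopWords = "a\nan\nthe\nof\nand\nis"

var (
	stopOnce   sync.Once
	stopSet    map[string]struct{}
	buildCount int // counts how often the set is built, for demonstration only
)

// stopWords builds the lookup set on first use and returns the cached set afterwards.
func stopWords() map[string]struct{} {
	stopOnce.Do(func() {
		buildCount++
		stopSet = make(map[string]struct{})
		for _, w := range strings.Split(rawStopWords, "\n") {
			stopSet[w] = struct{}{}
		}
	})
	return stopSet
}

// isStopWord reports whether word is in the cached set, ignoring case.
func isStopWord(word string) bool {
	_, ok := stopWords()[strings.ToLower(word)]
	return ok
}

func main() {
	for _, w := range []string{"The", "keyword", "of"} {
		fmt.Printf("%s: %t\n", w, isStopWord(w))
	}
	fmt.Println("set built", buildCount, "time(s)")
}
```

sync.Once also keeps the lazy initialization safe if keyword extraction runs from several goroutines.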

  • Updated rake.go.

    Fixes: https://github.com/afjoseph/RAKE.Go/issues/7

    After this fix, RAKE removes all consecutive stopwords instead of only the first one.
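
The behavior described here can be sketched as follows. This is an illustration only, not the code from the pull request: `candidatePhrases` is a hypothetical function that treats any run of stopwords, however long, as a single phrase boundary.

```go
package main

import (
	"fmt"
	"strings"
)

// candidatePhrases is a hypothetical helper, not the library's implementation.
// It splits words into phrases at stopwords; a run of consecutive stopwords
// closes the current phrase once and is otherwise skipped entirely.
func candidatePhrases(words []string, stop map[string]bool) [][]string {
	var phrases [][]string
	var current []string
	for _, w := range words {
		if stop[strings.ToLower(w)] {
			if len(current) > 0 {
				phrases = append(phrases, current)
				current = nil
			}
			continue
		}
		current = append(current, w)
	}
	if len(current) > 0 {
		phrases = append(phrases, current)
	}
	return phrases
}

func main() {
	stop := map[string]bool{"without": true, "the": true, "of": true, "either": true, "or": true}
	words := strings.Fields("without the guidance of either revelation or reason")
	fmt.Println(candidatePhrases(words, stop))
}
```

Because stopwords are never appended to the current phrase, "of either" leaves no leftover word such as "either revelation".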

  • Hardcoded stop-list path

    Hi.

    You sent us a pull request here: https://github.com/avelino/awesome-go/pull/1229.

    I've started my review, but I see that the hardcoded stopPath should cause a failure here if the file is not found, and that path will not exist on Linux/Windows.
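
A common way to avoid depending on a machine-specific path is to fall back to a built-in list when the file cannot be read. The sketch below only illustrates that idea and is not how the repository resolved the issue: `loadStopWords`, `defaultStopWords`, and the path in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultStopWords is a tiny stand-in for a stop list compiled into the package.
var defaultStopWords = []string{"a", "an", "the", "of", "and"}

// loadStopWords is a hypothetical helper. It reads a whitespace-separated
// stop list from path and falls back to the built-in list on any read error,
// so a missing file does not make keyword extraction fail.
func loadStopWords(path string) []string {
	data, err := os.ReadFile(path)
	if err != nil {
		return defaultStopWords
	}
	return strings.Fields(string(data))
}

func main() {
	// This path is assumed not to exist, so the built-in list is used.
	words := loadStopWords("/nonexistent/dir/stoplist.txt")
	fmt.Println(len(words), "stop words loaded")
}
```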

  • README.md Example: Stopwords are not Ignored

    I believe your commit 6bf7f9f5e21bfa2097164aa0958b4d6dacfe2570, 'Load stop-words as a string slice instead of splitting a large string', broke stopword removal. If I roll back to the version prior to that commit, everything works fine (git checkout 7df06d19b2795d3b3101a8da3b79efad4c2ce7be).

    When I run the README.md example (with the import fixed), stopwords are not removed.

    package main
    
    import (
    	"fmt"
    
    	rake "github.com/afjoseph/RAKE.Go"
    )
    
    func main() {
    	text := `The growing doubt of human autonomy and reason has created a state of moral confusion where man is left without the guidance of either revelation or reason. The result is the acceptance of a relativistic position which proposes that value judgements and ethical norms are exclusively matters of arbitrary preference and that no objectively valid statement can be made in this realm... But since man cannot live without values and norms, this relativism makes him an easy prey for irrational value systems.`
    
    	candidates := rake.RunRake(text)
    
    	for _, candidate := range candidates {
    		fmt.Printf("%s --> %f\n", candidate.Key, candidate.Value)
    	}
    
    	fmt.Printf("\nsize: %d\n", len(candidates))
    }
    
    a relativistic position --> 9.000000
    objectively valid statement --> 9.000000
    an easy prey --> 9.000000
    value judgements --> 4.000000
    human autonomy --> 4.000000
    the acceptance --> 4.000000
    growing doubt --> 4.000000
    relativism makes --> 4.000000
    the guidance --> 4.000000
    either revelation --> 4.000000
    be made --> 4.000000
    arbitrary preference --> 4.000000
    exclusively matters --> 4.000000
    this realm --> 4.000000
    moral confusion --> 4.000000
    since man --> 3.500000
    ethical norms --> 3.500000
    norms --> 1.500000
    man --> 1.500000
    proposes --> 1.000000
    irrational --> 1.000000
    left --> 1.000000
    created --> 1.000000
    reason --> 1.000000
    state --> 1.000000
    values --> 1.000000
    result --> 1.000000
    systems --> 1.000000
    live --> 1.000000
    that --> 1.000000
    
    size: 30
    

    I am running:

    go version go1.13.1 darwin/amd64
    
  • Updated import statement

    Changed the import to "github.com/afjoseph/RAKE.go" from "github.com/afjoseph/goRAKE", which was broken. Fixes https://github.com/afjoseph/RAKE.Go/issues/6

Chinese word splitting algorithm MMSEG in GO

MMSEGO This is a Go implementation of MMSEG, which is a Chinese word splitting algorithm. TO DO list Documentation/comments Benchmark Usage #Input Diction

Sep 27, 2022
Golang implementation of the Paice/Husk Stemming Algorithm

Golang Implementation of the Paice/Husk stemming algorithm. This project was created for the QUT course INB344. Details on the algorithm can be found

Sep 27, 2022
Golang port of Petrovich - an inflector for Russian anthroponyms.

Petrovich is the library which inflects Russian names to given grammatical case. This is the Go port of https://github.com/petrovich. Installation go

Dec 25, 2022
a Make/rake-like dev tool using Go

About Mage is a make-like build tool using Go. You write plain-old go functions, and Mage automatically uses them as Makefile-like runnable targets. I

Jan 7, 2023
Eunomia is a distributed application framework that supports the Gossip protocol, the QuorumNWR algorithm, the PBFT algorithm, the PoW algorithm, the ZAB protocol, and so on.

Introduction Eunomia is a distributed application framework that facilitates developers to quickly develop distributed applications and supports distr

Sep 28, 2021
repin is a tool to replace strings between keyword pair.

repin repin is a tool to replace strings between keyword pair. tl;dr repin is a tool that makes it easy to write operations that can be written in GNU

Nov 4, 2022
Gopkg - Search go.dev packages by keyword

gopkg Search go.dev packages by keyword Usage Install go install github.com/luck

Apr 6, 2022
go-fastdfs is a simple distributed file system (private cloud storage) that is decentralized, high-performance, highly reliable, and maintenance-free. It supports resumable transfers, chunked uploads, small-file merging, automatic synchronization, and automatic repair (similar to fastdfs).

Vision: to provide users with the simplest, most reliable, and most efficient distributed file system. go-fastdfs is a distributed file system based on the HTTP protocol. It follows the design philosophy that the greatest truths are the simplest and keeps everything simple, which makes operation, maintenance, and scaling easier. It offers high performance, high reliability, decentralization, and freedom from maintenance. What people worry about is whether such a simple file system is reliable, and whether

Jan 8, 2023
Lima launches Linux virtual machines on macOS, with automatic file sharing, port forwarding, and containerd.

Lima: Linux-on-Mac ("macOS subsystem for Linux", "containerd for Mac")

Jan 8, 2023
Go-enum-algorithm - Implement an enumeration algorithm in GO

go-enum-algorithm implement an enumeration algorithm in GO run the code go run m

Feb 15, 2022
A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

Jan 4, 2023
:wink: :cyclone: :strawberry: TextRank implementation in Golang with extendable features (summarization, phrase extraction) and multithreading (goroutine) support (Go 1.8, 1.9, 1.10)

TextRank on Go This source code is an implementation of the TextRank algorithm, under the MIT licence. The minimum required Go version is 1.8. MOTIVATION If th

Dec 18, 2022
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

grate A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats. Why? Grate focuses on speed and stability first

Dec 26, 2022
Pi-hole data right from your terminal. Live updating view, query history extraction and more!

Pi-CLI Pi-CLI is a command line program used to view data from a Pi-Hole instance directly in your terminal.

Dec 12, 2022
:book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech

Jan 4, 2023
PipeIt is a text transformation, conversion, cleansing and extraction tool.

PipeIt PipeIt is a text transformation, conversion, cleansing and extraction tool. Features Split - split text to text array by given separator. Regex

Aug 15, 2022
Fast, realtime regex-extraction, and aggregation into common formats such as histograms, numerical summaries, tables, and more!

rare A file scanner/regex extractor and realtime summarizer. Supports various CLI-based graphing and metric formats (histogram, table, etc). Features

Dec 29, 2022
Compliance policy extraction: xlsx (tracking file) -> xml (AlgoSec format)

go_policyExtractor Compliance policy extraction: xlsx (tracking file) -> xml (AlgoSec format). The following program relies on the headings

Nov 4, 2021
A block parser tool that allows extraction of various data types on DAS

das-database A block parser tool that allows extraction of various data types on DAS (register, edit, sell, transfer, ...) from CKB Prerequisites Ubun

Nov 11, 2022
go-fasttld is a high performance top level domains (TLD) extraction module.

go-fasttld go-fasttld is a high performance top level domains (TLD) extraction module implemented with compressed tries. This module is a port of the

Dec 21, 2022