character-set conversion library implemented in Go

mahonia

Mahonia is a character-set conversion library implemented in Go. All data is compiled into the executable; it doesn't need any external data files.

Based on http://code.google.com/p/mahonia/

install

go get github.com/axgle/mahonia

example

package main

import (
	"fmt"

	"github.com/axgle/mahonia"
)

func main() {
	enc := mahonia.NewEncoder("gbk")
	// Convert a string from UTF-8 to GBK encoding.
	fmt.Println(enc.ConvertString("hello,世界"))
}
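Because the result of enc.ConvertString holds GBK bytes, printing it on a UTF-8 terminal usually shows garbled characters even when the conversion worked. Dumping the bytes in hex is a more reliable check. The helper below is a stdlib-only sketch; hexBytes is a hypothetical name, not part of mahonia. For reference, "你好" is E4 BD A0 E5 A5 BD in UTF-8 and C4 E3 BA C3 in GBK.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper (not part of mahonia) that renders the
// raw bytes of s as space-separated uppercase hex, so encoded output can be
// checked without depending on how the terminal interprets it.
func hexBytes(s string) string {
	return fmt.Sprintf("% X", []byte(s))
}

func main() {
	// UTF-8 input; with mahonia you would pass enc.ConvertString(...) here.
	fmt.Println(hexBytes("你好")) // E4 BD A0 E5 A5 BD
	// The GBK bytes for the same text, written as a Go string literal.
	fmt.Println(hexBytes("\xc4\xe3\xba\xc3")) // C4 E3 BA C3
}
```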

donate

https://github.com/axgle/mahonia/wiki/Donate

Comments
  • Import of code.google.com/p/mahonia

    The file "mahoniconv/mahoniconv.go" imports "code.google.com/p/mahonia". As code.google.com is shutting down, this should be changed to github.com/axgle/mahonia.

  • If the example does not convert successfully, you can try this approach

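    package main

    import (
        "fmt"

        "github.com/axgle/mahonia"
    )

    func main() {
        // otherEncodingString is a placeholder for the GBK-encoded input.
        var otherEncodingString string
        // Translate converts GBK bytes to UTF-8 and is called on a Decoder.
        dec := mahonia.NewDecoder("gbk")
        _, data, _ := dec.Translate([]byte(otherEncodingString), true)
        fmt.Println(string(data))
    }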

  • please add LICENSE file

    I am packaging mahonia for Fedora. Adding the license file is not necessary, but the ticket should be opened according to the guidelines [https://fedoraproject.org/wiki/Packaging:LicensingGuidelines#License_Text]. @axgle

    https://bugzilla.redhat.com/show_bug.cgi?id=1480957#c1

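  • Strings containing English punctuation are not converted

    e.DOM.Find("p").Each(func(i int, s *goquery.Selection) {
        text := s.Text()
        result := mahonia.NewDecoder("gbk").ConvertString(text)
        fmt.Println(result)
    })

    This is scraping code, and text holds a GBK-encoded string. I found that whenever text contains English double quotes “”, the content inside the quotes is not converted. The output looks like this:

    我是正常的中文鈥満焐氖谴竺ā⒙躺氖切

    The leading 我是正常的中文 ("I am normal Chinese") is fine; the garbled part after it is the text inside the English double quotes. However, if I print the whole HTML page, including the div, li and other tags, it converts correctly. The code is roughly:

    c.OnHTML("#ArtContent", func(e *colly.HTMLElement) {
        result := mahonia.NewDecoder("gbk").ConvertString(string(e.Response.Body))
        fmt.Println(result)
    })

    Here result is fully converted to Chinese, with no garbled text.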
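One thing worth checking in a case like this is whether the text handed to the decoder is still raw GBK. "鈥" is the typical result of reading a curly quote's UTF-8 bytes (E2 80 9C) as GBK, which may indicate that part of the string was already UTF-8 before it reached the decoder. A minimal stdlib-only guard, assuming the caller only wants to decode strings that are not already valid UTF-8, is sketched below; needsGBKDecode is a hypothetical name. It is a heuristic: GBK-encoded Chinese is usually, but not always, invalid UTF-8, and it does not repair a string that mixes both encodings. It only avoids decoding text twice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsGBKDecode is a hypothetical heuristic: GBK-encoded Chinese text is
// usually not valid UTF-8, so a string that already validates as UTF-8 is
// probably decoded and should not be passed through a GBK decoder again.
// ASCII-only text returns false, which is harmless because ASCII bytes are
// identical in GBK and UTF-8.
func needsGBKDecode(s string) bool {
	return !utf8.ValidString(s)
}

func main() {
	fmt.Println(needsGBKDecode("hello, “世界”"))    // false: already UTF-8
	fmt.Println(needsGBKDecode("\xc4\xe3\xba\xc3")) // true: GBK bytes for 你好
}
```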

  • Converting from UTF-8 to GBK and back again does not round-trip

    package main

    import (
        "fmt"

        "github.com/axgle/mahonia"
    )

    func main() {
        str := "你好"
        fmt.Println("UTF-8 to GBK: ", ConvertToString(str, "utf8", "gbk"))
        // data := ConvertToString(str, "utf8", "gbk")
        fmt.Println("GBK to UTF-8: ", ConvertToString(ConvertToString(str, "utf8", "gbk"), "gbk", "utf8"))
    }

    func ConvertToString(src string, srcCode string, tagCode string) string {
        srcCoder := mahonia.NewDecoder(srcCode)
        srcResult := srcCoder.ConvertString(src)
        tagCoder := mahonia.NewDecoder(tagCode)
        _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
        result := string(cdata)
        return result
    }

  • Initializing a decoder first and using it later causes different behavior

    If I first do enc := ma.NewEncoder("gb18030") and then print enc.ConvertString(file1.Name), the Chinese text comes out garbled.

    But if I just do ma.NewDecoder("gb18030").ConvertString(file1.Name), it works.

  • A problem when converting jis-string to utf-8

    In the call mahonia.NewDecoder("shift-jis").ConvertString(string(s)), if string(s) contains the character "\" (byte 0x5C), it is converted to the yen sign "¥", which may cause an error when creating a file.

    The following code avoids this problem without changing the original package: strings.Replace(mahonia.NewDecoder("shift-jis").ConvertString(string(s)), "¥", "\\", -1)
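A minimal sketch of this kind of post-processing, assuming the decoder has already produced a string in which each 0x5C byte came out as U+00A5 YEN SIGN; restoreBackslashes is a hypothetical name, and the input below is written by hand rather than produced by mahonia. It only makes sense when the text is known to be a path or similar, because a genuine yen sign would be rewritten too.

```go
package main

import (
	"fmt"
	"strings"
)

// restoreBackslashes is a hypothetical helper: after Shift-JIS decoding has
// mapped byte 0x5C to U+00A5 (¥), it turns those yen signs back into
// backslashes. Any real yen sign in the text is replaced as well.
func restoreBackslashes(decoded string) string {
	return strings.ReplaceAll(decoded, "\u00a5", `\`)
}

func main() {
	// Simulated decoder output for the Windows path C:\Users\taro.
	fmt.Println(restoreBackslashes("C:\u00a5Users\u00a5taro")) // C:\Users\taro
}
```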
