go-runewidth
Provides functions to get the fixed display width of a character or string.
Usage
runewidth.StringWidth("つのだ☆HIRO") == 12
Author
Yasuhiro Matsumoto
License
under the MIT License: http://mattn.mit-license.org/2013
This introduces an implementation of StringWidth() using Unicode grapheme clusters which should be the correct way to split a string into its individual characters. The built-in assumption is that if we have combined runes (emojis, flags etc.), their width is the width of the first non-zero-width rune. Many of these combined runes were previously not handled correctly by this package.
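The rule described above, taking a cluster's width from its first non-zero-width rune, can be sketched as follows. This is only an illustration: it assumes the string has already been split into grapheme clusters, and `toyRuneWidth` and `clusterWidth` are hypothetical names with deliberately simplified width rules, not the package's code.

```go
package main

import (
	"fmt"
	"unicode"
)

// toyRuneWidth is a hypothetical, heavily simplified per-rune width function.
// The ranges below are assumptions for illustration, not real width tables.
func toyRuneWidth(r rune) int {
	switch {
	case r == 0x200D: // zero-width joiner
		return 0
	case unicode.Is(unicode.Mn, r) || unicode.Is(unicode.Me, r): // combining marks
		return 0
	case r >= 0x1F300 && r <= 0x1FAFF: // assumed "wide emoji" range
		return 2
	default:
		return 1
	}
}

// clusterWidth returns the width of the first rune in the cluster whose
// width is non-zero, or 0 if every rune in the cluster is zero-width.
func clusterWidth(cluster []rune) int {
	for _, r := range cluster {
		if w := toyRuneWidth(r); w > 0 {
			return w
		}
	}
	return 0
}

func main() {
	rainbowFlag := []rune{0x1F3F3, 0xFE0F, 0x200D, 0x1F308} // one grapheme cluster
	eAcute := []rune{'e', 0x0301}                           // 'e' + combining acute accent
	fmt.Println(clusterWidth(rainbowFlag), clusterWidth(eAcute))
}
```

Summing the per-rune widths of the rainbow flag cluster would count both emoji, which is the over-counting the cluster-based approach avoids.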
Please note:
rivo/uniseg
over.) TestStringWidth test case but only the part where EastAsianWidth = true. I'm not very familiar with this flag so I don't know how to fix that. You may want to review this.
Update runewidth to use Unicode 9 character width tables. This is the default in vim and neovim now, so should be safe to use in any terminal.
The Condition is no longer calculated using IsEastAsian(), as terminals do not use the locale to determine how wide to draw ambiguous-width characters. Rather, they simply default to 1, and may offer an option to set it to 2 (which is discouraged).
All tests still pass. The API is very close to the way it was, but not identical due to the change in Condition.
I recognize this is a fairly large diff, so I'd be happy to work with you in any way you feel best to get this merged.
Check this for the definition of box-drawing (BD below) characters.
I found that these characters are defined to be of ambiguous width, so passing them to RuneWidth returns 2 in my environment. This is somewhat inconvenient since, AFAIK, terminal fonts tend to render BD characters as half-width.
Is it possible to remove these characters from the ambiguous table? I can make the PR if you think this sounds sane.
Thanks.
Here's a short example that illustrates an issue with flags (or "regional indicators"):
fmt.Println(runewidth.StringWidth("🇩🇪")) // Should be "2", outputs "4".
The flag consists of two code points which are processed separately by runewidth. But most modern systems will combine them into one flag emoji.
This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).
I don't know exactly how you calculate the widths of characters. I'm also not sure how you would solve flags as well as some of the other rules described in the Unicode specification, but it would sure be nice, as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis, so it seems that there are quite a few people who use the terminal with these characters.
(Maybe my new package uniseg can help you here.)
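One way to picture the flag problem is a width function that pairs up adjacent regional indicator symbols (U+1F1E6..U+1F1FF) before counting. The sketch below is an assumption-laden illustration, not how runewidth or uniseg work: `naiveWidth` is a hypothetical stand-in for a per-rune width function, and `flagAwareWidth` is a hypothetical helper that counts each indicator pair as one flag of width 2.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional indicator
// symbols (U+1F1E6..U+1F1FF) that pair up to form flag emoji.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// naiveWidth is a hypothetical stand-in for a per-rune width function:
// regional indicators count as 2, everything else as 1.
func naiveWidth(r rune) int {
	if isRegionalIndicator(r) {
		return 2
	}
	return 1
}

// flagAwareWidth sums rune widths but counts each pair of adjacent regional
// indicators as a single flag of width 2.
func flagAwareWidth(s string) int {
	runes := []rune(s)
	width := 0
	for i := 0; i < len(runes); i++ {
		if isRegionalIndicator(runes[i]) && i+1 < len(runes) && isRegionalIndicator(runes[i+1]) {
			width += 2
			i++ // skip the second half of the flag
			continue
		}
		width += naiveWidth(runes[i])
	}
	return width
}

func main() {
	de := "\U0001F1E9\U0001F1EA" // regional indicators D and E
	naive := 0
	for _, r := range de {
		naive += naiveWidth(r)
	}
	fmt.Println("per-rune sum:", naive, "pair-aware:", flagAwareWidth(de))
}
```

Pairing only covers flags; the other clustering rules mentioned above (Arabic, Korean) need full grapheme cluster segmentation.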
It would be great if you could add support for zero-width joiners (ZWJ). I have the following code example which doesn't work as expected:
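package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	// Family emoji: man, ZWJ, man, ZWJ, girl. The joiners are written as
	// escapes so that they stay visible in the source.
	e := "\U0001F468\u200D\U0001F468\u200D\U0001F467"
	r := []rune(e)
	// Collect the width RuneWidth reports for each individual rune.
	var widths []int
	for _, c := range r {
		widths = append(widths, runewidth.RuneWidth(c))
	}
	fmt.Printf("%s : len=%d numrunes=%d width=%d widths=%v runes=%X\n", e, len(e), len(r), runewidth.StringWidth(e), widths, r)
}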
The output is:
👨👨👧 : len=18 numrunes=5 width=6 widths=[2 0 2 0 2] runes=[1F468 200D 1F468 200D 1F467]
Specifically, width should be 2 instead of 6. I found this article which explains how they work. It does not only affect emojis but also characters in some languages.
This came up in rivo/tview#161. It would be great if support for ZWJ could be added so I can implement support for these Unicode characters in tview. I understand that not all kinds of combinations are supported and it's probably difficult to figure out which ones are. But assuming these characters are supported will help a lot. I don't expect users to try to print ZWJ combinations which are not supported anyway.
Thanks!
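The requested behaviour, assuming every ZWJ sequence is rendered as a single glyph, could look roughly like the sketch below. `roughWidth` is a hypothetical stand-in for a per-rune width function with an assumed emoji range, and `zwjAwareWidth` is a hypothetical helper that ignores any rune joined to the previous one by U+200D.

```go
package main

import "fmt"

const zwj = '\u200D' // zero-width joiner

// roughWidth is a hypothetical per-rune width: 0 for the joiner, 2 for an
// assumed emoji range, 1 for everything else.
func roughWidth(r rune) int {
	switch {
	case r == zwj:
		return 0
	case r >= 0x1F300 && r <= 0x1FAFF:
		return 2
	default:
		return 1
	}
}

// zwjAwareWidth sums rune widths, but a rune that follows a joiner is
// treated as part of the preceding glyph and adds no width.
func zwjAwareWidth(s string) int {
	width := 0
	seenBase := false // a visible rune has already been counted
	afterZWJ := false // the previous rune was a joiner attached to a base
	for _, r := range s {
		if r == zwj {
			afterZWJ = seenBase
			continue
		}
		if afterZWJ {
			afterZWJ = false
			continue
		}
		width += roughWidth(r)
		seenBase = true
	}
	return width
}

func main() {
	family := "\U0001F468\u200D\U0001F468\u200D\U0001F467"
	fmt.Println(zwjAwareWidth(family), zwjAwareWidth("ab"))
}
```

This deliberately assumes the terminal supports every ZWJ combination, which matches the simplification suggested in the issue.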
Hi,
Consider the following three similar unicode characters:
'-' - Unicode Character 'HYPHEN-MINUS' (U+002D)
'–' - Unicode Character 'EN DASH' (U+2013)
'—' - Unicode Character 'EM DASH' (U+2014)
From https://github.com/shurcooL/markdownfmt/issues/7#issuecomment-46792756, I've learned that go-runewidth considers the width of the first character to be 1, and the width of the second and third characters to be 2.
Is that intended?
I'm not sure how to test this reliably, but in most environments it seems that EN DASH has width that's closer to 1 than 2.
Any thoughts on this?
Provides nearly an order of magnitude speedup depending on how quickly the checks are done.
Data is packed at 4 bits/rune, since the max output value is 2.
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkRuneWidthAll/regular-32 51 25539433 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidthAll/lut-32 442 2711694 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidth768/regular-32 617528 2109 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidth768/lut-32 605570 2038 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidthAllEastAsian/regular-32 31 36469868 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidthAllEastAsian/lut-32 442 2710229 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidth768EastAsian/regular-32 73273 16028 ns/op 0 B/op 0 allocs/op
BenchmarkRuneWidth768EastAsian/lut-32 634987 1871 ns/op 0 B/op 0 allocs/op
PASS
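The idea behind the lookup table in the benchmark above can be sketched as a precomputed array indexed by code point. In this illustration two widths share each byte (4 bits each), which is an assumption about the layout; `slowWidth`, `buildLUT`, and `lutWidth` are hypothetical names, and `slowWidth` uses toy rules rather than the real width tables.

```go
package main

import "fmt"

const maxRune = 0x110000 // one past the last Unicode code point

// slowWidth is a hypothetical stand-in for the table-driven width function.
func slowWidth(r rune) int {
	switch {
	case r < 0x20: // C0 control characters
		return 0
	case r >= 0x4E00 && r <= 0x9FFF: // CJK unified ideographs
		return 2
	default:
		return 1
	}
}

// buildLUT precomputes every width, storing two 4-bit values per byte.
func buildLUT() []byte {
	lut := make([]byte, maxRune/2)
	for i := range lut {
		lo := slowWidth(rune(i * 2))
		hi := slowWidth(rune(i*2 + 1))
		lut[i] = byte(lo) | byte(hi)<<4
	}
	return lut
}

// lutWidth reads a width back: the low nibble holds even code points,
// the high nibble holds odd ones.
func lutWidth(lut []byte, r rune) int {
	if r < 0 || r >= maxRune {
		return 0
	}
	return int(lut[r>>1]>>(uint(r&1)*4)) & 0x0F
}

func main() {
	lut := buildLUT()
	fmt.Println(len(lut), lutWidth(lut, 'a'), lutWidth(lut, 0x4E2D))
}
```

The table trades about half a megabyte of memory for a constant-time lookup, which is why the gain shows up mostly when scanning all code points.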
I'm trying to port an old DOS program using tcell (which uses RuneWidth). My program has a table mapping CP437 char code to rune, and then I print that rune to the screen. I'm in the terminal with fixed width fonts, so I expect all chars to be the same width.
The issue is that RuneWidth('\u2666') and some other characters return width 2 instead of 1, which makes tcell allocate 2 cells for them and causes "gaps" in the rendering. Here's playground code showing which chars do this: https://play.golang.org/p/Hjq3GOC0Pcd -- output is:
RuneWidth('☺') = 2
RuneWidth('☻') = 2
RuneWidth('♥') = 2
RuneWidth('♦') = 2
RuneWidth('♣') = 2
RuneWidth('♠') = 2
RuneWidth('♂') = 2
RuneWidth('♀') = 2
RuneWidth('♪') = 2
RuneWidth('♫') = 2
RuneWidth('☼') = 2
RuneWidth('↕') = 2
RuneWidth('‼') = 2
RuneWidth('↔') = 2
I believe it's happening because these are treated as Emoji characters. Is this behavior expected? If so, how do I work around this in tcell?
When using an East Asian encoding, the following runes are given a width of 2 but they should be 1: ─┌└┐┘│.
To reproduce:
export LC_CTYPE="ja_JP.UTF-8"
(in go program)
runewidth.RuneWidth('─') // returns 2
Looking at the runewidth_table.go file, the culprit is {0x24EB, 0x254B} in the ambiguous table. I'm not sure how to update this; the file is auto-generated.
In terminal apps which render box characters this can lead to broken rendering.
Let me know if there's anything else I can add. Thanks :)
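To see why a single table entry affects every box-drawing rune, here is a minimal sketch of an interval-table lookup. Only the {0x24EB, 0x254B} range comes from the report above; the other entry, the `interval` type, and the `inTable` and `width` helpers are hypothetical and only illustrate the mechanism.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is an inclusive range of code points.
type interval struct{ first, last rune }

// ambiguous is an illustrative excerpt, sorted by code point. Only the
// second entry is taken from the issue text; the first is an assumption.
var ambiguous = []interval{
	{0x00A1, 0x00A1},
	{0x24EB, 0x254B},
}

// inTable binary-searches a sorted, non-overlapping interval table.
func inTable(r rune, t []interval) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// width returns 2 for ambiguous runes when East Asian width is enabled and
// 1 otherwise. Real width logic has more cases than this sketch.
func width(r rune, eastAsian bool) int {
	if eastAsian && inTable(r, ambiguous) {
		return 2
	}
	return 1
}

func main() {
	for _, r := range "─┌└┐┘│" {
		fmt.Printf("%c U+%04X eastAsian=%d default=%d\n", r, r, width(r, true), width(r, false))
	}
}
```

All six runes from the report fall inside the quoted range, so they pick up width 2 as soon as the East Asian condition is active.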
According to tr11, the "na" designation stands for "East Asian Narrow". This includes both ASCII characters as well as some non-ASCII characters like double angle brackets. Previous versions of go-runewidth (v0.0.4) assigned these a width of one, but more recent versions assign it a width of zero. I noticed this because tcell versions after this commit refused to display the double angle bracket characters.
The fix proposed in this PR is to assign characters with the "na" designation a width of one.
This addresses the issue "possible regression ?" (Issue #32, mattn/go-runewidth).
When script/generate.go reads https://unicode.org/Public/12.1.0/ucd/EastAsianWidth.txt, records with the "F" (Fullwidth) field are not treated the same as "W" records.
So I tried to fix this to support them and updated the checksums (are they there as a reminder to run go generate?).
Would you like to merge this if there are no problems?
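For readers unfamiliar with the data file, the sketch below parses a few lines in the EastAsianWidth.txt layout (`code;class # comment` or `first..last;class # comment`) and treats "F" the same as "W". It is not the project's script/generate.go; `parseLine` and `isDoubleWidth` are hypothetical helpers, and the sample lines are written in the file's format for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine extracts the code point range and width class from one line.
// It returns ok=false for blank lines, comments, and malformed input.
func parseLine(line string) (first, last rune, class string, ok bool) {
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i] // drop the trailing comment
	}
	fields := strings.SplitN(strings.TrimSpace(line), ";", 2)
	if len(fields) != 2 {
		return 0, 0, "", false
	}
	bounds := strings.SplitN(strings.TrimSpace(fields[0]), "..", 2)
	lo, err := strconv.ParseUint(bounds[0], 16, 32)
	if err != nil {
		return 0, 0, "", false
	}
	hi := lo
	if len(bounds) == 2 {
		if hi, err = strconv.ParseUint(bounds[1], 16, 32); err != nil {
			return 0, 0, "", false
		}
	}
	return rune(lo), rune(hi), strings.TrimSpace(fields[1]), true
}

// isDoubleWidth treats Fullwidth ("F") records like Wide ("W") records.
func isDoubleWidth(class string) bool {
	return class == "W" || class == "F"
}

func main() {
	samples := []string{
		"0020;Na           # Zs         SPACE",
		"3000;F            # Zs         IDEOGRAPHIC SPACE",
		"FF01..FF03;F      # Po     [3] FULLWIDTH EXCLAMATION MARK..FULLWIDTH NUMBER SIGN",
	}
	for _, s := range samples {
		if first, last, class, ok := parseLine(s); ok {
			fmt.Printf("%04X..%04X %-2s double=%v\n", first, last, class, isDoubleWidth(class))
		}
	}
}
```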
The rivo/uniseg package has received a major update which also includes methods for grapheme cluster parsing that are much faster than the previously used Graphemes class.
I've upgraded your package accordingly and updated the relevant code to use these faster methods. It would be great if you could merge these changes.
Thank you!
ps. I noticed that some automatic checks did not complete successfully because they are still running on Go 1.15. Would you like me to look into upgrading them to the current version (1.18)?
I stumbled over a character that, when output to the console directly, takes up two cells. But StringWidth() gives me 1. This is because the first rune of this character has a width of 1
and that's what's being used, see here. I know I wrote this code and I'm sure that you cannot simply add up the widths of individual runes ("🏳️🌈" would then have a width of 4 which is obviously wrong) and using the first rune's width worked fine so far. But it turns out that it fails in some cases.
I'm not familiar with Indian characters but it seems to me that the second rune is a modifier that turns the character from a width of 1 into a width of 2. Are you aware of any logic that we could add to go-runewidth that makes this right?
Here's example code that illustrates the issue:
package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	s := "खा"
	fmt.Println("0123456789")
	fmt.Println(s + "<")
	fmt.Printf("String width: %d\n", runewidth.StringWidth(s))
	var i int
	for _, r := range s {
		fmt.Printf("Rune %s (%d) width: %d\n", string(r), i, runewidth.RuneWidth(r))
		i++
	}
}
Output (on macOS with iTerm2):
ZeroWidthJoiner was removed after v0.0.9: https://github.com/mattn/go-runewidth/blob/v0.0.9/runewidth.go#L14
The next version was v0.0.10, but this introduced a breaking API change.
While being v0 means you can introduce breaking API changes, would it be possible to get a v1 release that can ensure API stability?
It's fine to just keep cutting new versions when API changes happen, but right now it makes managing Go Module dependencies rather painful, since it just assumes patch versions don't introduce breaking changes.
I stumbled over this while working on #47.
It seems that RuneWidth is not always equal to the StringWidth of a single rune.
This is quite unexpected, TBH.
Please see https://github.com/markus-oberhumer/mattn--go-runewidth/commit/5da511d36b1ea1ad913590b7b27357e5fffd3512 for a test case.
Added power support for the travis.yml file with ppc64le. This is part of the Ubuntu distribution for ppc64le. This helps us simplify testing later when distributions are re-building and re-releasing. For more info tag @gerrith3.