Correct commonly misspelled English words in source files

Correct commonly misspelled English words... quickly.

Install

If you just want a binary and to start using misspell:

curl -L -o ./install-misspell.sh https://git.io/misspell
sh ./install-misspell.sh

The script installs misspell as ./bin/misspell. You can adjust the download location using the -b flag. File a ticket if you want another platform supported.

If you use Go, the best way to run misspell is by using gometalinter. Otherwise, install misspell the old-fashioned way:

go get -u github.com/client9/misspell/cmd/misspell

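and misspell will be in your GOPATH.

With Go 1.16 or later, installing executables with go get is deprecated; use go install github.com/client9/misspell/cmd/misspell@latest instead.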

Also, if you like to live dangerously, you could run:

curl -L https://git.io/misspell | bash

Usage

$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"

# ^ file, line, column
$ misspell -help
Usage of misspell:
  -debug
    	Debug matching, very slow
  -error
    	Exit with 2 if misspelling found
  -f string
    	'csv', 'sqlite3' or custom Golang template for output
  -i string
    	ignore the following corrections, comma separated
  -j int
    	Number of workers, 0 = number of CPUs
  -legal
    	Show legal information and exit
  -locale string
    	Correct spellings using locale perferances for US or UK.  Default is to use a neutral variety of English.  Setting locale to US will correct the British spelling of 'colour' to 'color'
  -o string
    	output file or [stderr|stdout|] (default "stdout")
  -q	Do not emit misspelling output
  -source string
    	Source mode: auto=guess, go=golang source, text=plain or markdown-like text (default "auto")
  -w	Overwrite file with corrections (default is just to display)

FAQ

How can I make the corrections automatically?

Just add the -w flag!

$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"

# ^ File is rewritten only if a misspelling is found

How do I convert British spellings to American (or vice-versa)?

Add the -locale US flag!

$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"

To convert American spellings to British, add the -locale UK flag!

$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"

Help is appreciated as I'm neither British nor an expert in the English language.

How do you check an entire folder recursively?

Just list the directories and files you'd like to check:

misspell .
misspell aDirectory anotherDirectory aFile

You can also run misspell recursively using the following shell tricks:

misspell directory/**/*

or

find . -type f | xargs misspell

You can select a type of file as well. The following example selects all .txt files that are not in the vendor directory:

find . -type f -name '*.txt' | grep -v vendor/ | xargs misspell -error

Can I use pipes or stdin for input?

Yes!

Print messages to stderr only:

$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"

Print messages to stderr, and corrected text to stdout:

$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra

Only print the corrected text to stdout:

$ echo "zeebra" | misspell -w -q
zebra

Are there special rules for golang source files?

Yes! If the file ends in .go, then misspell will only check spelling in comments.

If you want to force a file to be checked as a golang source, use -source=go on the command line. Conversely, you can check a golang source as if it were pure text by using -source=text. You might want to do this since many variable names have misspellings in them!

Can I check only comments in other programming languages?

I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.

It doesn't work well for Python and Bash.

Does this work with gometalinter?

gometalinter runs multiple golang linters. Since 2016-06-12, gometalinter supports misspell natively, but it is disabled by default.

# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter

# install updates and misspell
gometalinter --install --update

To use it, just enable misspell:

gometalinter --enable misspell ./...

Note that gometalinter only checks golang files and uses misspell's default options.

You may wish to run this on your plaintext (.txt) and/or markdown files too.

How can I get CSV output?

Using -f csv, the output is standard comma-separated values with headers in the first row.

misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language

How can I export to SQLite3?

Using -f sqlite, the output is a sqlite3 dump-file.

$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;
$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1

With some tricks you can directly pipe output to sqlite3 by using -init /dev/stdin:

misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
    'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'

How can I ignore rules?

Using the -i "comma,separated,rules" flag you can specify corrections to ignore.

For example, if you were to run misspell -w -error -source=text against a document that contains the string Guy Finkelshteyn Braswell, misspell would change the text to Guy Finkelstheyn Bras well. You can then determine which rules to ignore by reverting the change and running misspell again with the -debug flag. The debug output shows that the corrections were htey -> they and aswell -> as well. To ignore these two rules, add -i "htey,aswell" to your command. With debug mode on, misspell still prints these corrections, but it no longer makes them.
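Conceptually, -i removes entries from the list of corrections before any matching happens. The sketch below shows that idea with the standard library's strings.NewReplacer; the helper buildReplacer and its tiny rule list are hypothetical, though htey -> they and aswell -> as well are the two rules from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// buildReplacer turns flat typo/correction pairs into a strings.Replacer,
// leaving out every typo named in the comma-separated ignore string
// (the behaviour the -i flag is described as having).
func buildReplacer(rules []string, ignore string) *strings.Replacer {
	skip := make(map[string]bool)
	for _, typo := range strings.Split(ignore, ",") {
		if typo = strings.TrimSpace(typo); typo != "" {
			skip[typo] = true
		}
	}
	var pairs []string
	for i := 0; i+1 < len(rules); i += 2 {
		if !skip[rules[i]] {
			pairs = append(pairs, rules[i], rules[i+1])
		}
	}
	return strings.NewReplacer(pairs...)
}

func main() {
	rules := []string{"htey", "they", "aswell", "as well", "langauge", "language"}
	text := "Guy Finkelshteyn Braswell wrote about langauge"

	fmt.Println(buildReplacer(rules, "").Replace(text))
	fmt.Println(buildReplacer(rules, "htey,aswell").Replace(text))
}
```

The first line printed is "Guy Finkelstheyn Bras well wrote about language"; with the two rules ignored, the name survives and only langauge is corrected.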

How can I change the output format?

Using the -f template flag you can pass in a golang text template to format the output.

One can use printf "%q" VALUE to safely quote a value.

The default template is compatible with gometalinter:

{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}

To just print probable misspellings:

-f '{{ .Original }}'
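Since -f takes a Go text/template, you can try out a template outside misspell. The sketch below assumes a record type with the fields the templates above refer to (Filename, Line, Column, Original, Corrected); the type name diff, its field types and the render helper are assumptions for illustration, not misspell's exported API.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// diff is a hypothetical stand-in for the value misspell hands to the
// template; only the field names come from the templates shown above.
type diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// defaultTmpl quotes each word exactly once via printf "%q".
const defaultTmpl = `{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}`

// render executes tmpl against d and returns the resulting text.
func render(tmpl string, d diff) (string, error) {
	t, err := template.New("misspell").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	d := diff{Filename: "your.txt", Line: 9, Column: 21, Original: "langauge", Corrected: "language"}
	for _, tmpl := range []string{defaultTmpl, `{{ .Original }}`} {
		out, err := render(tmpl, d)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

This prints your.txt:9:21:corrected "langauge" to "language" followed by langauge, matching the two templates above.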

What problem does this solve?

This corrects commonly misspelled English words in computer source code, and other text-based formats (.txt, .md, etc).

It is designed to run quickly so it can be used as a pre-commit hook with minimal burden on the developer.

It does not work with binary formats (e.g. Word, etc).

It is not a complete spell-checking program nor a grammar checker.

What are other misspelling correctors and what's wrong with them?

Some other misspelling correctors:

They all work but had problems that prevented me from using them at scale:

  • slow, all of the above check one misspelling at a time (i.e. linear) using regexps
  • not MIT/Apache2 licensed (or equivalent)
  • have dependencies that don't work for me (python3, bash, linux sed, etc)
  • don't understand American vs. British English and sometimes make unwelcome "corrections"

That said, they might be perfect for you and many have more features than this project!

How fast is it?

Misspell is easily 100x to 1000x faster than other spelling correctors. You should be able to check and correct 1000 files in under 250ms.

This uses the mighty power of golang's strings.Replacer, which is an implementation or variation of the Aho–Corasick algorithm. This lets it match many substrings in a single pass over the text.
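To make the single-pass idea concrete, here is a minimal sketch using the standard strings.Replacer directly (misspell ships a modified copy, so its behaviour may differ in details). The three rules are illustrative only; all of them are considered while the input is scanned once from left to right, instead of running one regular expression per misspelling.

```go
package main

import (
	"fmt"
	"strings"
)

// corrections holds illustrative typo/correction pairs. The input is
// scanned once; at each position the pairs are tried in argument order.
var corrections = strings.NewReplacer(
	"langauge", "language",
	"teh", "the",
	"zeebra", "zebra",
)

// correct applies every rule in a single left-to-right pass.
func correct(s string) string {
	return corrections.Replace(s)
}

func main() {
	fmt.Println(correct("teh zeebra speaks no langauge"))
}
```

Running it prints "the zebra speaks no language".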

In addition, it uses multiple CPU cores to check multiple files in parallel.

What problems does it have?

Unlike the other projects, this doesn't know what a "word" is. There may be more false positives and false negatives due to this. On the other hand, it sometimes catches things others don't.

Either way, please file bugs and we'll fix them!

Since it makes many corrections in parallel, it can be hard to tell exactly which word was corrected.

It's making mistakes. How can I debug?

Run misspell with the -debug flag on the file in question. It should then print which word it is trying to correct. Then file a bug describing the problem. Thanks!

Why is it making mistakes or missing items in golang files?

The matching function is case-sensitive, so variable names made of multiple words written entirely in upper or lower case can sometimes cause false positives. For instance, a variable named bodyreader could trigger a false positive, since the substring yrea in the middle could be corrected to year. Other problems happen if a variable name uses an English contraction that would normally need an apostrophe. The best way to fix this is to follow the Effective Go naming conventions and use camelCase for variable names. You can check your code using golint.
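A quick way to see the problem: a case-sensitive substring replacer with the single rule yrea -> year (the example rule from the paragraph above, used here as an illustrative assumption) rewrites the middle of the all-lowercase identifier, while the camelCase spelling no longer contains the lowercase substring at all.

```go
package main

import (
	"fmt"
	"strings"
)

// A single illustrative rule; misspell's real word list is far larger.
var fix = strings.NewReplacer("yrea", "year")

func main() {
	fmt.Println(fix.Replace("bodyreader")) // bodyearder: a false positive
	fmt.Println(fix.Replace("bodyReader")) // bodyReader: "yRea" does not match "yrea"
}
```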

What license is this?

The main code is MIT.

Misspell also makes use of the Golang standard library and contains a modified version of Golang's strings.Replacer, both of which are covered by a BSD license. Type misspell -legal for more details or see legal.go.

Where do the word lists come from?

It started with a word list from Wikipedia. Unfortunately, this list had to be heavily edited, as many of the words are obsolete or based on mistakes from mechanical typewriters (I'm guessing).

Additional words were added based on actual mistakes seen in the wild (meaning self-generated).

Variations of UK and US spellings are based on many sources including:

American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.

What are some other enhancements that could be done?

Here are some ideas for enhancements:

Capitalization of proper nouns: this could be done for, e.g., weekday and month names, country names, and language names.

Opinionated US spellings: US English has a number of words with alternate spellings. Think adviser vs. advisor. While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".

Versioning: some type of versioning is needed so that reporting mistakes and errors is easier.

Feedback: mistakes would be sent to some server for aggregation and feedback review.

Contractions and apostrophes: this would optionally correct "isnt" to "isn't", etc.

Comments
  • Ignore binary files based on content, not just filenames

    When deciding whether to ignore a file as binary, the decision could (or should) also be based on the file's content, not just its filename.

    Maybe https://golang.org/pkg/net/http/#DetectContentType could be used? If its return value starts with text/, treat as non-binary and proceed with check.

    Or, if that doesn't work for some reason, check for \0 (NUL) in the first 512 bytes (at most) of a file and treat it as binary if found.

  • -source=go has no effect

    Commit f637716 removed the special handling to only check Go comments (commit 1aaafc0 is the next commit that passed CI).

    With commit 49cf02c715,

    ./misspell -source=go
    void mispell()
    {
    // mispell
    }
    stdin:3:3:found "mispell" a misspelling of "misspell"
    

    This appears to be the correct behaviour. The misspelling in a comment is identified and the misspelling in the function name is ignored.

    With commit 1aaafc0,

    ./misspell -source=go
    void mispell()
    {
    // mispell
    }
    /home/joel/sample.go:1:5:found "mispell" a misspelling of "misspell"
    /home/joel/sample.go:3:4:found "mispell" a misspelling of "misspell"
    

    Now the misspelling in the function name is also identified.

  • false positives

    Apologies, everyone. The unit tests are failing, but travis-ci is not picking up on it.

    The failures here are due to a substring of a word being corrected, which ends up producing a correct but different word.

    e.g. words-aspell-1.txt:22411:0:found "adventurers" a misspelling of "adventures"

    There is no rule to do this.

    Unfortunately, the rule

    venturers-> ventures
    

    is converting adventurers into adventures

    Here are some others (some of which are not correct either):

    /go/src/github.com/client9/misspell # misspell words-aspell-1.txt 
    words-aspell-1.txt:19322:0:found "Treasuries" a misspelling of "Treasures"
    words-aspell-1.txt:22411:0:found "adventurers" a misspelling of "adventures"
    words-aspell-1.txt:38674:0:found "compliant" a misspelling of "complaint"
    words-aspell-1.txt:40283:0:found "convertors" a misspelling of "converts"
    words-aspell-1.txt:44490:0:found "deliberative" a misspelling of "deliberate"
    words-aspell-1.txt:45277:0:found "descried" a misspelling of "described"
    words-aspell-1.txt:45278:0:found "descries" a misspelling of "describes"
    words-aspell-1.txt:55709:0:found "florescent" a misspelling of "fluorescent"
    words-aspell-1.txt:78960:0:found "multiplies" a misspelling of "multiples"
    words-aspell-1.txt:85494:0:found "payed" a misspelling of "paid"
    words-aspell-1.txt:87889:0:found "planed" a misspelling of "planned"
    words-aspell-1.txt:89799:0:found "predicating" a misspelling of "predicting"
    words-aspell-1.txt:89800:0:found "predication" a misspelling of "prediction"
    words-aspell-1.txt:89802:0:found "predicative" a misspelling of "predictive"
    words-aspell-1.txt:91196:0:found "proportionals" a misspelling of "proportions"
    words-aspell-1.txt:94939:0:found "refection" a misspelling of "reflection"
    words-aspell-1.txt:97389:0:found "revolvers" a misspelling of "revolves"
    words-aspell-1.txt:104588:0:found "smoothy" a misspelling of "smoothly"
    words-aspell-1.txt:106007:0:found "specif" a misspelling of "specific"
    words-aspell-1.txt:110623:0:found "symmetricly" a misspelling of "symmetrical"
    words-aspell-1.txt:114800:0:found "treasuries" a misspelling of "treasures"
    words-aspell-1.txt:122258:0:found "withing" a misspelling of "within"
    words-aspell-1.txt:122714:0:found "worshiping" a misspelling of "worshipping"
    
  • Ignore files in SCM dirs

    As misspell now tries to ignore binary files, maybe it could try to ignore files in SCM dirs as well? For example, it could skip any file path containing a directory component such as .git, .svn, .hg, .bzr, CVS, ...

  • vendor folder support?

    Thanks for this tool; I use misspell in Caddy's CI tests. We're vendoring our dependencies though and this is proving to be a problem for our checks because misspell does not work well with vendor folders, where I don't care if there are misspellings.

    Old CI command:

    $ misspell -error .
    

    Works great, except it checks the vendor/ folder, which I need to avoid.

    $ misspell -error $(go list ./... | grep -v vendor)
    

    Doesn't work because misspell doesn't take package paths.

    $ find . -type f | grep -v vendor/ | xargs misspell
    

    Checks more than just .go files.

    find . -type d | grep -v vendor/ | xargs misspell
    

    Same problem as first, since it still traverses into the vendor subfolder.

    Basically, I'm torn. For now, I'm removing this from our CI checks, but if the first command could skip vendor/ folders, that would be best for me I think and I could add this back to our checks.

  • Build error

    $ go get -v -u "github.com/client9/misspell/cmd/misspell"
    github.com/client9/misspell (download)
    github.com/client9/misspell
    github.com/client9/misspell/cmd/misspell
    # github.com/client9/misspell/cmd/misspell
    ../../../github.com/client9/misspell/cmd/misspell/main.go:190: undefined: updated
    
  • British spelling is treated as "common misspelling"

    Being British, I do not find the following example amusing:

    your.txt:42:10 found "initialised" a misspelling of "initialized"

    in particular when I suddenly see it in the goreportcard.com status of my project (the only one of the originally 100% score):

    Line 109: warning: 55:found "initialised" a misspelling of "initialized" (misspell)

    Is there any good reason to treat British English spellings as misspellings?

  • misspell tries to modify images

    Against https://github.com/cockroachdb/cockroach/commit/ac79461eef6223544ee2d5ee49e908c94524767c

    misspell -w $(git ls-files) -debug
    docs/RFCS/distributed_sql_processor.png:55:1296:corrected "adn" to "and"
    ui/apple-touch-icon.png:11:192:corrected "TEH" to "THE"
    ui/fonts/Lato-Bold.woff:823:297:corrected "adn" to "and"
    

    Needless to say, this corrupts the images.

  • Add retunred->returned.

    Originally encountered in https://github.com/golang/gddo/blob/eb44e37ce297392abc76d1ac1ca8343a77f43662/database/database_test.go#L124. It was a misspelling inside a Go string, though, not a comment, so this tool wouldn't have fixed that particular instance. But maybe it'll help elsewhere?

    Since #4 is not resolved, there's a chance I did something wrong. It's my first time adding a missing word here, so I'm not familiar with the process. Please review.

  • False negative: package imports

    Hi,

    I got a few errors about my import paths:

    misspell 50%
    
    Misspell Finds commonly misspelled English words
    
            qrget/main.go
            Line 22: warning: "nucular" is a misspelling of "nuclear" (misspell)
            Line 22: warning: "nucular" is a misspelling of "nuclear" (misspell)
            Line 23: warning: "nucular" is a misspelling of "nuclear" (misspell)
    

    I'd say that it should ignore "misspellings" in the import section, because those are not words but identifiers.

  • Switch to xurls for the url regexp

    Avoids having to roll a separate regexp. It's also more aggressive, matching anything that looks like a URL instead of trying to only match correct URLs.

    All the removed test cases now match as urls and are replaced with blank space, which is good since they wouldn't be potential misspellings anyway.

  • README build instructions update

    Hello :wave:

    I attempted to install misspell following instructions provided in the README and encountered the following error:

    go get -u github.com/client9/misspell/cmd/misspell
    go get: installing executables with 'go get' in module mode is deprecated.
            Use 'go install pkg@version' instead.
            For more information, see https://golang.org/doc/go-get-install-deprecation
            or run 'go help get' or 'go help install'.
    

    It may be worth updating the README install section to follow the recommended workflow for installing Go executables, which changed in Go 1.16. The install instruction would now be go install github.com/client9/misspell/cmd/misspell@latest.

  • Add support for MacOS Monterey (?)

    Hi there. It seems like there is an issue with misspell on macOS Monterey. After updating, running misspell produces this error:

    fatal error: runtime: bsdthread_register error
    
    runtime stack:
    runtime.throw(0x13b111d, 0x21)
    	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/panic.go:619 +0x81 fp=0x7ff7bfeff708 sp=0x7ff7bfeff6e8 pc=0x1029051
    runtime.goenvs()
    	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/os_darwin.go:129 +0x83 fp=0x7ff7bfeff738 sp=0x7ff7bfeff708 pc=0x1026bd3
    runtime.schedinit()
    	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/proc.go:496 +0xa4 fp=0x7ff7bfeff790 sp=0x7ff7bfeff738 pc=0x102b914
    runtime.rt0_go(0x7ff7bfeff7c8, 0x1, 0x7ff7bfeff7c8, 0x0, 0x1000000, 0x1, 0x7ff7bfeff990, 0x0, 0x7ff7bfeff999, 0x7ff7bfeff9d5, ...)
    	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:252 +0x1f4 fp=0x7ff7bfeff798 sp=0x7ff7bfeff790 pc=0x10515e4
    

    It seems like this could be remedied by building with go 1.11 instead of 1.10, per this golang wiki entry.

    I assume this is something that should be fixed here and not on my system because I am downloading the installer script and using the executable it builds rather than going through go itself, but please let me know if that is incorrect. I am not very familiar with go or the inner workings of this codebase.

    Thank you for making an excellent tool! ❤️

  • Add support for M1 Macs

    It would be great to have support for Apple silicon macOS now that they are more common.

    Looks like install-misspell.sh would need to be updated in is_supported_platform() and a release tarball with the appropriate name would need to be created.

  • ability to add additional custom wordlist

    First of all, thank you for your hard work. I also tried to find what I was looking for in the existing open and closed issues but couldn't. If this is a duplicate issue, please close it immediately and link to the original. My apologies if so.

    It would be very useful if I could add a wordlist to the root of a project with additional words that should be accepted. These are often technical words, or words kept for historic reasons. In HTTP headers, for example, it is quite common to have words that look like misspellings but are actually correct. Some examples:

    • Ect, not a misspelling for etc, it's an abbreviation short for "effective connection type";
    • Referer instead of Referrer: this is a misspelling, but a deliberate one, as the authors of the HTTP spec made that mistake a long time ago, and now we all have to live with it;

    There are a lot more examples. I use this project (misspell) as part of golang-ci, by the way. Currently I have two options:

    • ignore misspell for an entire file, project: not desired as I do like the linting it gives me in general;
    • add nolint:misspell for each line where such a word occurs: this is what I currently do, but it adds a lot of noise, while really it is always about the same small set of words for any given project;

    I imagine I am not the only person suffering from this. As such, I have a feeling that this is probably already possible, but the documentation seemed to indicate the opposite (and that this is by design). Any help here?
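The "Ignore binary files based on content" comment above suggests two checks: trust net/http.DetectContentType (which considers at most the first 512 bytes) when it reports a text/ type, or fall back to looking for a NUL byte in the first 512 bytes. A minimal sketch of both follows; the function names are hypothetical and this is not code from misspell.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// binaryBySniffing treats anything http.DetectContentType does not
// classify as text/* (text/plain, text/html, ...) as binary.
func binaryBySniffing(head []byte) bool {
	return !strings.HasPrefix(http.DetectContentType(head), "text/")
}

// binaryByNUL treats data as binary if its first 512 bytes contain a NUL.
func binaryByNUL(head []byte) bool {
	if len(head) > 512 {
		head = head[:512]
	}
	return bytes.IndexByte(head, 0) >= 0
}

// readHead returns up to the first 512 bytes of the named file.
func readHead(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, 512)
	n, err := io.ReadFull(f, buf)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		err = nil // files shorter than 512 bytes are fine
	}
	return buf[:n], err
}

func main() {
	for _, path := range os.Args[1:] {
		head, err := readHead(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s\tsniff=%v\tnul=%v\n", path, binaryBySniffing(head), binaryByNUL(head))
	}
}
```

Either check would have flagged the .png and .woff files from the "misspell tries to modify images" comment before any corrections were attempted, since both formats contain NUL bytes near the start of the file.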
