Unified text diffing in Go (copy of the internal diffing packages the official Go language server uses)

gotextdiff - unified text diffing in Go

This is a copy of the Go text diffing packages that the official Go language server gopls uses internally to generate unified diffs.

If you've previously tried to generate unified text diffs in Go (like the ones you see in Git and on GitHub), you may have found github.com/sergi/go-diff, which is a Go port of Neil Fraser's google-diff-match-patch code. However, it does not support unified diffs.

This is arguably one of the best (and most maintained) unified text diffing packages in Go as of at least 2020.

(All credit goes to the Go authors, I am merely re-publishing their work so others can use it.)

Example usage

Import the packages:

import (
    "fmt"
    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

Assuming you want to diff a.txt and b.txt, whose contents are stored in aString and bString:

edits := myers.ComputeEdits(span.URIFromPath("a.txt"), aString, bString)
diff := fmt.Sprint(gotextdiff.ToUnified("a.txt", "b.txt", aString, edits))

diff will be a string like:

--- a.txt
+++ b.txt
@@ -1,13 +1,28 @@
-foo
+bar

API compatibility

We will publish a new major version any time the API changes in a backwards-incompatible way. Because the upstream is not developed with use as a public package in mind, API breakages may occur more often than in other Go packages (but you can always continue using an old version thanks to Go modules).

Alternatives

Contributing

We will only accept changes made upstream; please send any contributions to the upstream instead. Compared to the upstream, only import paths are modified (to be non-internal so they are importable). The only thing we add here is this README.

License

See https://github.com/golang/tools/blob/master/LICENSE

Owner
Hexops
Similar Resources

Easy AWK-style text processing in Go

awk Description awk is a package for the Go programming language that provides an AWK-style text processing capability. The package facilitates splitt

Jul 25, 2022

Change the color of console text.

go-colortext package This is a package to change the color of the text and background in the console, working both under Windows and other systems. Un

Oct 26, 2022

Templating system for HTML and other text documents - go implementation

FAQ What is Kasia.go? Kasia.go is a Go implementation of the Kasia templating system. Kasia is primarily designed for HTML, but you can use it for any

Mar 15, 2022

Package sanitize provides functions for sanitizing text in golang strings.

sanitize Package sanitize provides functions to sanitize html and paths with go (golang). FUNCTIONS sanitize.Accents(s string) string Accents replaces

Dec 5, 2022

Small and fast FTS (full text search)

Microfts A small full text indexing and search tool focusing on speed and space. Initial tests seem to indicate that the database takes about twice as

Jul 30, 2022

text to speech bot for discord

Oct 1, 2022

A diff3 text merge implementation in Go

Diff3 A diff3 text merge implementation in Go based on the awesome paper below. "A Formal Investigation of Diff3" by Sanjeev Khanna, Keshav Kunal, and

Nov 5, 2022

gomtch - find text even if it doesn't want to be found

gomtch - find text even if it doesn't want to be found Do your users have clever ways to hide some terms from you? Sometimes it is hard to find forbid

Sep 28, 2022

Convert scanned image PDF file to text annotated PDF file

Jisui (自炊) This tool is PoC (Proof of Concept). Jisui is a helper tool to create e-book. Ordinary the scanned book have not text information, so you c

Dec 11, 2022
Comments
  • used extremely large amount of memory

    Just comparing two Solaris pkginfo outputs used about 1.5 GB of memory (attached).

    	edits := myers.ComputeEdits(span.URIFromPath(""), PrevContent, Content)
    	unifiedPatch := gotextdiff.ToUnified("src", "dst", PrevContent, edits)
    	var DiffContent = fmt.Sprint(unifiedPatch)
    
    	fmt.Printf("!!!! len(DiffContent):%+v\n", len(DiffContent))
    
    /usr/bin/time -v go run scratch.go
    !!!! len(DiffContent):1152378
    	Command being timed: "go run scratch.go"
    	User time (seconds): 2.07
    	System time (seconds): 2.51
    	Percent of CPU this job got: 102%
    	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.46
    	Average shared text size (kbytes): 0
    	Average unshared data size (kbytes): 0
    	Average stack size (kbytes): 0
    	Average total size (kbytes): 0
    	Maximum resident set size (kbytes): 15095320
    	Average resident set size (kbytes): 0
    	Major (requiring I/O) page faults: 5
    	Minor (reclaiming a frame) page faults: 3793204
    	Voluntary context switches: 3058
    	Involuntary context switches: 434
    	Swaps: 0
    	File system inputs: 0
    	File system outputs: 5640
    	Socket messages sent: 0
    	Socket messages received: 0
    	Signals delivered: 0
    	Page size (bytes): 4096
    	Exit status: 0
    

    solaris-sw-pkginfo.txt prev-pkginfo.txt

  • Merge algorithm is inadequate

    Hello,

    I'm curious if a better merge algorithm would be considered in-scope. Right now the patch always applies to the same line position regardless of context. When I found this package I was looking for something with more similar behaviour to Git, but I wasn't able to find such a thing and will need to implement it myself.

Guess the natural language of a text in Go

guesslanguage This is a Go version of python guess-language. guesslanguage provides a simple way to detect the natural language of unicode string and

Dec 26, 2022
👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

Dec 25, 2022
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-

Dec 29, 2022
Similar to Anki but uses the actual frequency of words

wordGame A program that uses a frequency-annotated vocabulary list to learn as efficiently as possible. Usage go run wordGame.go -freqTableFname=itali

Sep 21, 2021
Converts a number to its English counterpart. Uses arbitrary precision; so a number of any size can be converted.

Dec 14, 2021
A general purpose application and library for aligning text.

align A general purpose application that aligns text The focus of this application is to provide a fast, efficient, and useful tool for aligning text.

Sep 27, 2022
Parse placeholder and wildcard text commands

allot allot is a small Golang library to match and parse commands with pre-defined strings. For example use allot to define a list of commands your CL

Nov 24, 2022
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS

Jan 4, 2023
Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.

Tagify Gets STDIN, file or HTTP address as an input and returns a list of most popular words ordered by popularity as an output. More info about what

Dec 19, 2022
Extract urls from text

xurls Extract urls from text using regular expressions. Requires Go 1.13 or later. import "mvdan.cc/xurls/v2" func main() { rxRelaxed := xurls.Relax

Jan 7, 2023