Unified diff parser and printer for Go

go-diff

Diff parser and printer for Go.

Installing

go get -u github.com/sourcegraph/go-diff/diff

Usage

This package doesn't compute diffs. It only reads in (and, given a Go struct representation, prints out) unified diff output such as the following. The corresponding Go data structure is the diff.FileDiff struct.

--- oldname	2009-10-11 15:12:20.000000000 -0700
+++ newname	2009-10-11 15:12:30.000000000 -0700
@@ -1,3 +1,9 @@ Section Header
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -5,16 +11,10 @@
 be shown if it doesn't
 change.  Otherwise, that
 would not be helping to
-compress the size of the
-changes.
-
-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
+compress anything.
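
To make the hunk header lines in the sample above concrete, here is a minimal sketch that pulls the numbers out of a line such as @@ -1,3 +1,9 @@ Section Header. It uses only the standard library and is not the package's own parser; the hunkHeader type and the parseHunkHeader function are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// hunkHeader is a hypothetical struct holding the fields of a hunk header line.
type hunkHeader struct {
	OrigStart, OrigLines int
	NewStart, NewLines   int
	Section              string
}

// hunkRE matches "@@ -a,b +c,d @@ section". The line counts b and d are
// optional in unified diffs and default to 1 when omitted.
var hunkRE = regexp.MustCompile(`^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$`)

// parseHunkHeader extracts the start lines, line counts, and section text.
func parseHunkHeader(line string) (hunkHeader, error) {
	m := hunkRE.FindStringSubmatch(line)
	if m == nil {
		return hunkHeader{}, fmt.Errorf("not a hunk header: %q", line)
	}
	vals := [4]int{0, 1, 0, 1} // defaults; only the two counts can be absent
	for i := range vals {
		if m[i+1] == "" {
			continue
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return hunkHeader{}, err
		}
		vals[i] = n
	}
	return hunkHeader{
		OrigStart: vals[0],
		OrigLines: vals[1],
		NewStart:  vals[2],
		NewLines:  vals[3],
		Section:   m[5],
	}, nil
}

func main() {
	h, err := parseHunkHeader("@@ -1,3 +1,9 @@ Section Header")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

The first pair of numbers describes the range in the original file and the second pair the range in the new file; any text after the closing @@ is the optional section header.
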
Owner
Sourcegraph
Code search and navigation for teams (self-hosted, OSS)
Comments
  • diff: latest release version still has old module path

    The import path of this package has changed in PR #30. /cc @sqs

    However, the latest released version (v0.5.0) still has the old module path in its go.mod file:

    https://github.com/sourcegraph/go-diff/blob/v0.5.0/go.mod#L1

    As a result, trying to install the latest version of this package in module mode fails:

    $ cd $(mktemp -d)
    
    $ go mod init m
    go: creating new go.mod: module m
    
    $ go get github.com/sourcegraph/go-diff/diff
    go: finding github.com/sourcegraph/go-diff/diff latest
    go: github.com/sourcegraph/go-diff@v0.5.0: parsing go.mod: unexpected module path "sourcegraph.com/sourcegraph/go-diff"
    go: error loading module requirements
    
    $ echo $?
    1
    

    This can be fixed by making a new release version (perhaps v0.5.1) that contains the new module path in the go.mod file.

  • Go mod tidy sourcegraph

    This is a campaign run to fulfill the delivery plan https://github.com/sourcegraph/customer/issues/13. Over the coming days, I will update more repositories using go mod tidy.

  • diff: fix time format

    It seems that the fractional seconds part is optional, so change the format to accept whole seconds too.

    Here's a sample diff output by the bzr command which fails to parse currently.

    I started adding a test, but it involves more time than I have to spend right now, as the tests rely on reflect.DeepEqual working on time.Time instances, which won't work in general due to monotonic time. I'd suggest using https://godoc.org/github.com/google/go-cmp/cmp#Diff instead.

    === modified file 'encode.go'
    --- encode.go	2011-11-24 19:47:20 +0000
    +++ encode.go	2011-11-28 13:24:31 +0000
    @@ -250,8 +250,7 @@
     	e.emitScalar("null", "", "", C.YAML_PLAIN_SCALAR_STYLE)
     }
    
    -func (e *encoder) emitScalar(value, anchor, tag string,
    -	style C.yaml_scalar_style_t) {
    +func (e *encoder) emitScalar(value, anchor, tag string, style C.yaml_scalar_style_t) {
     	var canchor, ctag, cvalue *C.yaml_char_t
     	var cimplicit C.int
     	var free func()
    
    === modified file 'encode_test.go'
    --- encode_test.go	2011-11-24 19:47:20 +0000
    +++ encode_test.go	2011-11-28 13:24:39 +0000
    @@ -76,6 +76,10 @@
     		A int "a,omitempty"
     		B int "b,omitempty"
     	}{0, 0}},
    +	{"{}\n", &struct {
    +		A *struct{ X int } "a,omitempty"
    +		B int              "b,omitempty"
    +	}{nil, 0}},
    
     	// Flow flag
     	{"a: [1, 2]\n", &struct {
    
    === modified file 'goyaml.go'
    --- goyaml.go	2011-11-24 19:47:20 +0000
    +++ goyaml.go	2011-11-28 13:24:39 +0000
    @@ -256,7 +256,7 @@
     	switch v.Kind() {
     	case reflect.String:
     		return len(v.String()) == 0
    -	case reflect.Interface:
    +	case reflect.Interface, reflect.Ptr:
     		return v.IsNil()
     	case reflect.Slice:
     		return v.Len() == 0
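
    The optional fractional seconds described in this pull request can be illustrated with the standard library alone. This is a sketch under assumptions, not the change made in the PR: the layout constant and the parseHeaderTime helper are hypothetical names. It relies on the documented behaviour of time.Parse, which accepts a fractional second field right after the seconds on input even when the layout does not contain one.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // headerTimeLayout (hypothetical name) has whole seconds only. When parsing,
    // time.Parse still accepts an optional fractional part after the seconds.
    const headerTimeLayout = "2006-01-02 15:04:05 -0700"

    // parseHeaderTime parses a timestamp taken from a "---" or "+++" header line.
    func parseHeaderTime(s string) (time.Time, error) {
    	return time.Parse(headerTimeLayout, s)
    }

    func main() {
    	for _, s := range []string{
    		"2011-11-24 19:47:20 +0000",           // bzr style, whole seconds
    		"2009-10-11 15:12:20.000000000 -0700", // GNU diff style, with fraction
    	} {
    		t, err := parseHeaderTime(s)
    		if err != nil {
    			fmt.Println("error:", err)
    			continue
    		}
    		fmt.Println(t.UTC())
    	}
    }
    ```

    When testing such values, comparing them with the Equal method of time.Time checks the instant itself rather than the struct's internal representation.
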
    
  • MultiFileDiffReader returns trailing content

    Along with EOF. This is useful for handling mixed diff and non-diff output. Note that "stray" content between diff files was already included in the extended headers for the next diff. All this commit does is return a partial FileDiff, with only extended headers and no actual diff content, along with an EOF.


    Fixes https://github.com/sourcegraph/go-diff/issues/59

  • MultiFileDiffReader doesn’t handle messages in diff that certain files are only available in specific version.

    In some unified diff files there are lines such as Only in {path}: {filename}, meaning that a file exists in only one of the compared versions. These messages aren't handled in any way, other than being captured in FileDiff.Extended of the next FileDiff in the file. Are there any plans to handle them?

    For example, those unique files could be picked up by the following functions and represented as a FileDiff with only OrigName set.

    func (r *MultiFileDiffReader) ReadAllFiles() ([]*FileDiff, error)
    func (r *MultiFileDiffReader) ReadFile() (*FileDiff, error)
    

    Then, for this example file (my_diff.txt), the output below would be expected from ReadAllFiles().

    diff -u source_a/file_1.txt  source_b/file_1.txt
    --- source_a/file_1.txt   2020-07-28 12:54:18.000000000 +0000
    +++ source_b/file_1.txt  2020-07-28 12:54:18.000000000 +0000
    @@ -2,3 +3,4 @@
     To be, or not to be, that is the question:
    -Whether 'tis nobler in the mind to suffer
    +The slings and arrows of outrageous fortune,
    +Or to take arms against a sea of troubles
     And by opposing end them. To die—to sleep,
    Only in source_a: file_2.txt
    Only in source_b: file_3.txt
    
    ReadAllFiles(myMultiFileDiffReader) -> {
        FileDiff{
            OrigName: "source_a/file_1.txt",
            OrigTime: ...,
            NewName: "source_b/file_1.txt",
            NewTime: ...,
            Extended: ...,
            Hunks: ...
        },

        FileDiff{
            OrigName: "source_a/file_2.txt",
            OrigTime: nil,
            NewName: nil,
            NewTime: nil,
            ...
        },

        FileDiff{
            OrigName: "source_b/file_3.txt",
            OrigTime: nil,
            NewName: nil,
            ...
        }

    }, nil
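
    As a rough illustration of the first step this proposal would need, the sketch below recognises an "Only in" line and joins its two parts into the path used for OrigName in the expected output above. It is standard-library code written for this example, not part of go-diff; onlyInRE and parseOnlyIn are hypothetical names.

    ```go
    package main

    import (
    	"fmt"
    	"path"
    	"regexp"
    )

    // onlyInRE matches lines such as "Only in source_a: file_2.txt".
    // The match is greedy, so a directory containing ": " is kept whole and
    // the text after the last ": " is taken as the file name.
    var onlyInRE = regexp.MustCompile(`^Only in (.+): (.+)$`)

    // parseOnlyIn returns the joined path for an "Only in" line, and false
    // for any other line.
    func parseOnlyIn(line string) (string, bool) {
    	m := onlyInRE.FindStringSubmatch(line)
    	if m == nil {
    		return "", false
    	}
    	return path.Join(m[1], m[2]), true
    }

    func main() {
    	for _, line := range []string{
    		"Only in source_a: file_2.txt",
    		"Only in source_b: file_3.txt",
    		"--- source_a/file_1.txt",
    	} {
    		if p, ok := parseOnlyIn(line); ok {
    			fmt.Println("unique file:", p)
    		}
    	}
    }
    ```
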
    
  • Remove sourcegraph.com vanity import path

    This package is currently accessible via sourcegraph.com/sourcegraph/go-diff but there isn't really a good reason for this, and we no longer serve vanity import paths at e.g. https://sourcegraph.com/sourcegraph/go-diff (the real location is now https://sourcegraph.com/github.com/sourcegraph/go-diff)

    This package is in use by the Go build bots, it looks like: https://github.com/golang/build/blob/master/go.mod#L74

    Some firewalls block Cloudflare, which Sourcegraph.com is hosted through, making this package needlessly hard to fetch. Reported on Slack here: https://gophers.slack.com/archives/C9BMAAFFB/p1542362482359900

    A PR for this would be very much appreciated :)

  • sourcegraph.com/sourcegraph/go-diff@v0.5.1: parsing go.mod: unexpected module path

    When running "go get -u" for github.com/johnwyles/vrddt-reboot:

    sourcegraph.com/sourcegraph/go-diff@v0.5.1: parsing go.mod: unexpected module path "github.com/sourcegraph/go-diff"

  • Fix support for empty diffs (new, deleted, renamed, binary files).

    ~~DO NOT MERGE, this is a WIP PR and there are known TODOs to resolve. Early review (comments, suggestions) is welcome.~~ Edit: Ready for review and merge.

    There are situations where a diff is empty: for example, when a blank file is added or removed, or when a file is renamed without any changes. This change detects and handles those situations properly. Edit: the same applies when the modified file is binary.

    Improve test coverage significantly.

    TODO

    • [x] Update .proto and regenerate .pb.go.

    Fixes #10.

  • Passthrough for non-diff output

    I've recently started using this library, to redact diffs that pertain to secret files from logs. It would be great if (Multi)FileDiffReader could yield content that is not part of a valid diff, so that the caller has the choice of whether or not to print it.

    For example, the program colordiff will successfully print non-diff output, allowing it to be used to process input that is mixed diff and non-diff content.

    What do you think of this feature, and would you be interested in a PR that implements it?
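
    One way to picture the requested behaviour is a helper that separates leading non-diff text from the diff that follows it. The sketch below is a simple heuristic for illustration only and does not reflect how go-diff implements or would implement this; splitAtFirstDiff is a hypothetical name, and treating any line that starts with "diff " or "--- " as the start of a diff is an assumption that can misfire on ordinary text.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // splitAtFirstDiff returns the text before the first line that looks like
    // the start of a diff, followed by everything from that line onwards.
    // If no such line exists, the whole input is returned as non-diff text.
    func splitAtFirstDiff(input string) (nonDiff, rest string) {
    	lines := strings.SplitAfter(input, "\n")
    	for i, l := range lines {
    		if strings.HasPrefix(l, "diff ") || strings.HasPrefix(l, "--- ") {
    			return strings.Join(lines[:i], ""), strings.Join(lines[i:], "")
    		}
    	}
    	return input, ""
    }

    func main() {
    	input := "running checks\nall good\n" +
    		"--- a.txt\n+++ b.txt\n@@ -1 +1 @@\n-old\n+new\n"
    	nonDiff, rest := splitAtFirstDiff(input)
    	fmt.Printf("non-diff:\n%s", nonDiff)
    	fmt.Printf("diff:\n%s", rest)
    }
    ```

    A caller could then print or drop the non-diff part as it sees fit and hand only the remainder to a diff parser.
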

  • Add support for parsing multi-file diffs affecting binary files.

    Unless I'm missing something, it seems the multi-file diff parser skips over changes to binary files (/cc @gbbr). Consider the following multi-file diff:

    diff --git a/README.md b/README.md
    index 7b73e04..36cde13 100644
    --- a/README.md
    +++ b/README.md
    @@ -1,6 +1,8 @@
     Conception-go [![Build Status](https://travis-ci.org/shurcooL/Conception-go.svg?branch=master)](https://travis-ci.org/shurcooL/Conception-go)
     =============
    
    +This is a change.
    +
     This is a work in progress Go implementation of [Conception](https://github.com/shurcooL/Conception#demonstration).
    
     Conception is an experimental project. It's a platform for researching software development tools and techniques. It is driven by a set of guiding principles. Conception-go targets Go development.
    diff --git a/data/Font.png b/data/Font.png
    index 17a971d..599f8dd 100644
    Binary files a/data/Font.png and b/data/Font.png differ
    diff --git a/main.go b/main.go
    index 1aced1e..98a982e 100644
    --- a/main.go
    +++ b/main.go
    @@ -6710,6 +6710,8 @@ func init() {
     }
    
     func main() {
    +   // Another plain text change.
    +
        //defer profile.Start(profile.CPUProfile).Stop()
        //defer profile.Start(profile.MemProfile).Stop()
    
    

    Three files are changed: two text files (README.md and main.go) and one binary file (data/Font.png).

    However, if I execute diff.ParseMultiFileDiff on that diff, the output is a slice with just 2 *diff.FileDiff entries, for the two text files. The binary file change is not there.

    Reproduce sample

    package main
    
    import (
        "fmt"
        "strings"
    
        "github.com/sourcegraph/go-diff/diff" // current import path; the issue originally used sourcegraph.com/sourcegraph/go-diff/diff
    )
    
    // fileDiffName returns the name of a FileDiff.
    func fileDiffName(fileDiff *diff.FileDiff) string {
        var origName, newName string
        if strings.HasPrefix(fileDiff.OrigName, "a/") {
            origName = fileDiff.OrigName[2:]
        }
        if strings.HasPrefix(fileDiff.NewName, "b/") {
            newName = fileDiff.NewName[2:]
        }
        switch {
        case origName != "" && newName != "" && origName == newName: // Modified.
            return newName
        case origName != "" && newName != "" && origName != newName: // Renamed.
            return origName + " -> " + newName
        case origName == "" && newName != "": // Added.
            return newName
        case origName != "" && newName == "": // Removed.
            return "~~" + origName + "~~"
        default:
            panic("unexpected, no names")
        }
    }
    
    func main() {
        b := []byte(input)
        var o string
        if fileDiffs, err := diff.ParseMultiFileDiff(b); err == nil {
            for _, fileDiff := range fileDiffs {
                o += "\n" + "## " + fileDiffName(fileDiff) + "\n"
                /*o += "\n```diff\n"
                if hunks, err := diff.PrintHunks(fileDiff.Hunks); err == nil {
                    o += string(hunks)
                }
                o += "```\n"*/
            }
        } else {
            o += "\n```\n" + err.Error() + "\n```\n"
        }
        fmt.Println(o)
    }
    
    const input = `diff --git a/README.md b/README.md
    index 7b73e04..36cde13 100644
    --- a/README.md
    +++ b/README.md
    @@ -1,6 +1,8 @@
     Conception-go [![Build Status](https://travis-ci.org/shurcooL/Conception-go.svg?branch=master)](https://travis-ci.org/shurcooL/Conception-go)
     =============
    
    +This is a change.
    +
     This is a work in progress Go implementation of [Conception](https://github.com/shurcooL/Conception#demonstration).
    
     Conception is an experimental project. It's a platform for researching software development tools and techniques. It is driven by a set of guiding principles. Conception-go targets Go development.
    diff --git a/data/Font.png b/data/Font.png
    index 17a971d..599f8dd 100644
    Binary files a/data/Font.png and b/data/Font.png differ
    diff --git a/main.go b/main.go
    index 1aced1e..98a982e 100644
    --- a/main.go
    +++ b/main.go
    @@ -6710,6 +6710,8 @@ func init() {
     }
    
     func main() {
    +   // Another plain text change.
    +
        //defer profile.Start(profile.CPUProfile).Stop()
        //defer profile.Start(profile.MemProfile).Stop()
    
    `
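
    To show what the report expects, the following sketch counts the "diff --git" sections in a git-style multi-file diff and marks the ones that carry a "Binary files ... differ" line. It is independent of go-diff and only uses the standard library; the section type and the scanSections function are hypothetical names for this illustration.

    ```go
    package main

    import (
    	"fmt"
    	"regexp"
    	"strings"
    )

    // binaryRE matches the marker git prints instead of hunks for binary files.
    var binaryRE = regexp.MustCompile(`^Binary files (.+) and (.+) differ$`)

    // section is a hypothetical record for one "diff --git" block.
    type section struct {
    	Header string
    	Binary bool
    }

    // scanSections walks the input line by line. Hunk body lines start with
    // ' ', '+' or '-', so they cannot be mistaken for either marker.
    func scanSections(input string) []section {
    	var out []section
    	for _, line := range strings.Split(input, "\n") {
    		switch {
    		case strings.HasPrefix(line, "diff --git "):
    			out = append(out, section{Header: line})
    		case len(out) > 0 && binaryRE.MatchString(line):
    			out[len(out)-1].Binary = true
    		}
    	}
    	return out
    }

    func main() {
    	input := "diff --git a/README.md b/README.md\n" +
    		"--- a/README.md\n+++ b/README.md\n@@ -1 +1,2 @@\n title\n+This is a change.\n" +
    		"diff --git a/data/Font.png b/data/Font.png\n" +
    		"Binary files a/data/Font.png and b/data/Font.png differ\n" +
    		"diff --git a/main.go b/main.go\n" +
    		"--- a/main.go\n+++ b/main.go\n@@ -1 +1,2 @@\n func main() {\n+// change\n"
    	for _, s := range scanSections(input) {
    		fmt.Printf("%-45s binary=%v\n", s.Header, s.Binary)
    	}
    }
    ```

    For the input in this issue, such a scan yields three sections with the middle one marked binary, which is the count the parsed result is being compared against.
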
    
  • Support Apple Diff timestamps

    Apple's diff (at least the one shipping with macOS Ventura) does not include fractional seconds or a time zone in its output.

    This adds support for that so diffutils does not need to be installed.

    I first encountered this problem using golangci-lint on macOS after upgrading to macOS Ventura.
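
    A common way to accept several timestamp shapes is to try a list of layouts in order. The sketch below illustrates that idea with the standard library and is not the code from this pull request; the layouts slice and the parseDiffTime helper are hypothetical, and interpreting zone-less timestamps as UTC is an assumption made here for simplicity.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // layouts lists the accepted timestamp shapes, most specific first.
    // time.Parse accepts optional fractional seconds on input for both.
    var layouts = []string{
    	"2006-01-02 15:04:05 -0700", // with a numeric time zone
    	"2006-01-02 15:04:05",       // no zone; time.Parse then yields UTC
    }

    // parseDiffTime returns the result of the first layout that matches,
    // or the error from the last layout tried.
    func parseDiffTime(s string) (time.Time, error) {
    	var lastErr error
    	for _, layout := range layouts {
    		t, err := time.Parse(layout, s)
    		if err == nil {
    			return t, nil
    		}
    		lastErr = err
    	}
    	return time.Time{}, lastErr
    }

    func main() {
    	for _, s := range []string{
    		"2009-10-11 15:12:20.000000000 -0700",
    		"2022-11-01 10:00:00",
    	} {
    		t, err := parseDiffTime(s)
    		if err != nil {
    			fmt.Println("error:", err)
    			continue
    		}
    		fmt.Println(t)
    	}
    }
    ```
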

  • Binary support v1

    Binary diffs are parsed correctly after #46, but we should still add support for getting the decoded body, plus a boolean flag that indicates whether a file is binary.

  • Produce LSIF data for each commit for fast/precise code nav

  • Rename of repository not handled well

    This might actually be a bug in go, but I'm reporting it here anyway, since this breaks things here for now.

    Recently, this library moved from sourcegraph.com to github.com. However, using the old sourcegraph.com name is now broken:

    $ go get -d sourcegraph.com/sourcegraph/go-diff
    go: sourcegraph.com/sourcegraph/go-diff@v0.5.1: parsing go.mod: unexpected module path "github.com/sourcegraph/go-diff"
    go: error loading module requirements
    

    The obvious fix is to just use the new github.com import path, but this is not always trivial. In particular, when go-diff is pulled in by a dependency that has the old url, running go get -u to update dependencies fails with the same error.

    I guess that the problem is that go does not gracefully support updating to a new version in this case and updating across a rename (so 0.5.0 -> 0.5.1) needs changing of the import path. I can't see an easy way to fix this on the go side of things, so perhaps the fix here is to have sourcegraph.com not publish the 0.5.1 release (only 0.5.0), though that would also mean that people might never realize they're using an outdated version...

    (also, I couldn't actually find the code on sourcegraph.com anymore, http://sourcegraph.com/sourcegraph/go-diff gives a 404, but apparently go get still manages to access the files)

  • support sorting multi-file diff before printing

    NOTE: This is just a demo PR. It is not intended to be merged.

    Makes it easier to canonicalize the output of PrintMultiFileDiff by adding a new PrintOptions arg that lets the caller specify that the diffs should be sorted before printing.
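
    The canonicalization idea can be sketched without the library: sort the per-file entries by name before printing so that the output order no longer depends on the input order. The fileDiff struct below is a hypothetical stand-in with just the two name fields, and sortFileDiffs is an illustrative helper, not the PrintOptions argument this PR proposes.

    ```go
    package main

    import (
    	"fmt"
    	"sort"
    )

    // fileDiff is a hypothetical stand-in carrying only the names of a file diff.
    type fileDiff struct {
    	OrigName string
    	NewName  string
    }

    // sortFileDiffs orders entries by original name, then by new name.
    // The sort is stable, so entries with identical names keep their order.
    func sortFileDiffs(fds []fileDiff) {
    	sort.SliceStable(fds, func(i, j int) bool {
    		if fds[i].OrigName != fds[j].OrigName {
    			return fds[i].OrigName < fds[j].OrigName
    		}
    		return fds[i].NewName < fds[j].NewName
    	})
    }

    func main() {
    	fds := []fileDiff{
    		{OrigName: "a/main.go", NewName: "b/main.go"},
    		{OrigName: "a/README.md", NewName: "b/README.md"},
    		{OrigName: "a/data/Font.png", NewName: "b/data/Font.png"},
    	}
    	sortFileDiffs(fds)
    	for _, fd := range fds {
    		fmt.Println(fd.OrigName, "->", fd.NewName)
    	}
    }
    ```
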

Unified text diffing in Go (copy of the internal diffing packages the official Go language server uses)

gotextdiff - unified text diffing in Go This is a copy of the Go text diffing packages that the official Go language server gopls uses internally to g

Dec 26, 2022
Diff, match and patch text in Go

go-diff go-diff offers algorithms to perform operations required for synchronizing plain text: Compare two texts and return their differences. Perform

Dec 25, 2022
OAS 3.1 Validation and Diff CLI Tool

oas-diff OAS 3.1 Validation and Diff Tool Requisits Go 1.17+ Run Build make build Run ./build/oasdiff --help Examples Validate ./build/oasdiff valid

May 12, 2022
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS

Jan 4, 2023
A shell parser, formatter, and interpreter with bash support; includes shfmt

sh A shell parser, formatter, and interpreter. Supports POSIX Shell, Bash, and mksh. Requires Go 1.14 or later. Quick start To parse shell scripts, in

Dec 29, 2022
A simple CSS parser and inliner in Go

douceur A simple CSS parser and inliner in Golang. Parser is vaguely inspired by CSS Syntax Module Level 3 and corresponding JS parser. Inliner only p

Dec 12, 2022
Quick and simple parser for PFSense XML configuration files, good for auditing firewall rules

pfcfg-parser version 0.0.1 : 13 January 2022 A quick and simple parser for PFSense XML configuration files to generate a plain text file of the main c

Jan 13, 2022
A NMEA parser library in pure Go

go-nmea This is a NMEA library for the Go programming language (Golang). Features Parse individual NMEA 0183 sentences Support for sentences with NMEA

Dec 20, 2022
TOML parser for Golang with reflection.

THIS PROJECT IS UNMAINTAINED The last commit to this repo before writing this message occurred over two years ago. While it was never my intention to

Dec 30, 2022
User agent string parser in golang

User agent parsing useragent is a library written in golang to parse user agent strings. Usage First install the library with: go get xojoc.pw/userage

Aug 2, 2021
Simple HCL (HashiCorp Configuration Language) parser for your vars.

HCL to Markdown About To write a good documentation for terraform module, quite often we just need to print all our input variables as a fancy table.

Dec 14, 2021
A markdown parser written in Go. Easy to extend, standard(CommonMark) compliant, well structured.

goldmark A Markdown parser written in Go. Easy to extend, standards-compliant, well-structured. goldmark is compliant with CommonMark 0.29. Motivation

Dec 29, 2022
A PDF renderer for the goldmark markdown parser.

goldmark-pdf goldmark-pdf is a renderer for goldmark that allows rendering to PDF. Reference See https://pkg.go.dev/github.com/stephenafamo/goldmark-p

Jan 7, 2023
Experimental parser Angular template

Experimental parser Angular template This repository only shows what a parser on the Go might look like Benchmark 100k line of template Parser ms @ang

Dec 15, 2021
A dead simple parser package for Go

A dead simple parser package for Go V2 Introduction Tutorial Tag syntax Overview Grammar syntax Capturing Capturing boolean value Streaming Lexing Sta

Dec 30, 2022
Freestyle xml parser with golang

fxml - FreeStyle XML Parser This package provides a simple parser which reads a XML document and output a tree structure, which does not need a pre-de

Jul 1, 2022
An extension to the Goldmark Markdown Parser

Goldmark-Highlight An extension to the Goldmark Markdown Parser which adds parsing / rendering capabilities for rendering highlighted text. Highlighte

May 25, 2022
A parser combinator library for Go.

Takenoco A parser combinator library for Go. Examples CSV parser Dust - toy scripting language Usage Define the parser: package csv import ( "err

Oct 30, 2022
A simple json parser built using golang

jsonparser A simple json parser built using golang Installation: go get -u githu

Dec 29, 2021