a tool for code clone detection

dupl

dupl is a tool written in Go for finding code clones. So far it can find clones only in Go source files. The method uses a suffix tree built over serialized ASTs. It ignores the values of AST nodes and operates only on their types (e.g. if a == 13 {} and if x == 100 {} are considered the same, provided the matched sequence reaches the minimal token sequence size).

Because of this method, dupl can report so-called "false positives": matches we would not consider clones, either because they are too small or because the values of the matched tokens are completely different.

Installation

go get -u github.com/mibk/dupl

Usage

Usage of dupl:
  dupl [flags] [paths]

Paths:
  If the given path is a file, dupl will use it regardless of
  the file extension. If it is a directory it will recursively
  search for *.go files in that directory.

  If no path is given dupl will recursively search for *.go
  files in the current directory.

Flags:
  -files
        read file names from stdin one at each line
  -html
        output the results as HTML, including duplicate code fragments
  -plumbing
        plumbing (easy-to-parse) output for consumption by scripts or tools
  -t, -threshold size
        minimum token sequence size as a clone (default 15)
  -vendor
        check files in vendor directory
  -v, -verbose
        explain what is being done

Examples:
  dupl -t 100
        Search clones in the current directory of size at least
        100 tokens.
  dupl $(find app/ -name '*_test.go')
        Search for clones in tests in the app directory.
  find app/ -name '*_test.go' |dupl -files
        The same as above.

Example

For example, the following command searches the Docker source code for clones of at least 200 tokens and writes an HTML report, including the duplicate code fragments, to docker.html:

$ dupl -t 200 -html >docker.html
Owner: Michal Bohuslávek
pronounced /mɪk/
Comments
  • Output in a format more easily digested by editors

    Hello,

    Interesting tool. Would it be possible to have an alternate output format that is more easily integrated with editors/IDEs?

    Instead of:

    found 3 clones:
      loc 1: main.go, line 139-139,
      loc 2: main.go, line 140-140,
      loc 3: main.go, line 150-150,
    

    Something like:

    main.go:139-139: duplicate of main.go:140-140, main.go:150-150
    main.go:140-140: duplicate of main.go:139-139, main.go:150-150
    main.go:150-150: duplicate of main.go:139-139, main.go:140-140
    
  • dupl thinks two switch/case clauses are the same when they differ only in the type of a declared variable. Mention the actual behaviour in README.

    Consider this go code:

    switch raw.VersionField {
    // v1.Version and v2.Version are just string aliases.
    case v1.Version:
        dec := json.NewDecoder(bytes.NewReader(b))
        dec.UseNumber()
        var m v1.QueueMessage
        // some more code
    case v2.Version:
        dec := json.NewDecoder(bytes.NewReader(b))
        dec.UseNumber()
        var m v2.QueueMessage
        // some more code, similar to the previous clause.
    }
    

    Running dupl on this returns:

    duplicate of ./api.go:119-150 (dupl) // Lines belong to the second clause
    duplicate of ./api.go:87-118 (dupl)  // Lines belong to the first clause
    

    There are two issues here:

    1. These two cases aren't the same: they differ in the var m v1.QueueMessage line.
    2. Even presuming that they are the same, each one links to the other in the message. I think the link should be one-way, from the duplicate (the second clause, the one with v2) to the first.
  • Serious memory leak

    There are two places that have a serious memory leak:

    • https://github.com/mibk/dupl/blob/master/suffixtree/dupl.go#L34
    • https://github.com/mibk/dupl/blob/master/main.go#L166
  • Please mention that this tool requires Go 1.4 or greater

    The code uses the for range x form (a range loop without iteration variables), which was introduced in Go 1.4.

    With Go 1.3, you get:

    Installing dupl -> go get -u github.com/mibk/dupl
    # github.com/mibk/dupl/suffixtree
    ../../mibk/dupl/suffixtree/suffixtree.go:44: syntax error: unexpected range, expecting {
    ../../mibk/dupl/suffixtree/suffixtree.go:49: syntax error: unexpected }
    gometalinter: error: failed to install dupl: exit status 2: exit status 2
    
  • dupl is not the same as dupl *.go

    Running dupl finds duplicates across files, while running dupl *.go only finds duplicates within each file. Given the feedback in https://github.com/mibk/dupl/issues/6, is it possible for multiple files specified on the command line to work the same way as running dupl on a directory?

  • Consider all command line arguments as paths.

    Currently, dupl searches for clones in the current directory, or in another directory given as the first argument.

    This CL makes dupl use the remaining arguments as well. If an argument is a directory, dupl recursively searches it for *.go files; if it is a file, dupl uses it regardless of its extension.

    The following commands are now equivalent:

        > dupl
        > dupl .
        > dupl $(find -name '*.go')
        > find -name '*.go' |dupl -files
    

    Also, update README and help texts.

  • output: add plumbing output. Closes #2

    An example of the plumbing output as suggested in #2:

    $ dupl -t 30 -plumbing
    syntax/golang/golang.go:97-105: duplicate of syntax/golang/golang.go:133-140
    syntax/golang/golang.go:133-140: duplicate of syntax/golang/golang.go:97-105
    syntax/golang/golang.go:146-153: duplicate of syntax/golang/golang.go:155-162
    syntax/golang/golang.go:155-162: duplicate of syntax/golang/golang.go:146-153
    syntax/golang/golang.go:221-229: duplicate of syntax/golang/golang.go:255-263
    syntax/golang/golang.go:255-263: duplicate of syntax/golang/golang.go:221-229
    syntax/golang/golang.go:235-240: duplicate of syntax/golang/golang.go:352-357
    syntax/golang/golang.go:352-357: duplicate of syntax/golang/golang.go:235-240
    output/output_test.go:15-20: duplicate of output/output_test.go:33-38
    output/output_test.go:33-38: duplicate of output/output_test.go:15-20
    

    Note that groups are not separated from each other. Should they be (for example, by a blank line)?

  • Why is the output non-deterministic?

    I've found that running dupl on the same codebase multiple times produces different results. I'm not sure why; any ideas?

    Here are some examples to demonstrate what I mean:

    >for i in $(seq 1 100);do dupl ~/go/src/github.com/spf13/cobra/ |grep Found;done  |sort |uniq -c
          6 Found total 82 clone groups.
         27 Found total 83 clone groups.
         52 Found total 84 clone groups.
         13 Found total 85 clone groups.
          2 Found total 86 clone groups.
    > for i in $(seq 1 100);do dupl ~/go/src/golang.org/x/net/webdav |grep Found;done  |sort |uniq -c
          2 Found total 228 clone groups.
         30 Found total 229 clone groups.
         30 Found total 230 clone groups.
         23 Found total 231 clone groups.
         12 Found total 232 clone groups.
          3 Found total 233 clone groups.
    
    

    I don't see a way to check which version of dupl I have installed, but I had just updated to the latest commit at the time of posting this ticket (415e882).

  • Remove -connect and -serve

    It would be nice to clean up the code a little by removing these flags and the code that uses them. They all relate to running one instance of dupl as a server and other instances as clients to speed up the search.

    I know of no repository where search speed was actually a problem, and sadly this mode is not faster but much, much slower, so I propose removing it. More likely, people are confused about what it is doing there.

    I'm pretty sure nobody is using it. In any case, I'm filing this issue; if nobody argues against it, I'm going to remove it in a week or so. Thank you.

  • README: Add Installation section with import path.

    Even though one can figure out the import path from the URL, it's still helpful to show it in the README, especially because many packages have vanity import paths, so it's not always immediately obvious.

  • Ability to use dupl as a library

    For historical reasons, golangci/golangci-lint uses a fork of mibk/dupl. I am in the process of moving golangci-lint away from forks towards upstream versions of all used linters. To achieve this, we must be able to use dupl as a library, and this PR introduces that ability.

  • Allow disabling the check with in-code comments

    Great tool, by the way! It has helped me write better code by pointing out places where things can be DRYed up.

    It would be nice to disable output for certain places in code where I know there is duplication, but I've chosen to keep the duplication in place. Sometimes there are reasons for this.

    I'm thinking something akin to a linting comment to disable duplication checking:

    // dupl-disable
                Context("when specifying parameter set 1", func() {
                    BeforeEach(func() {
                        request, _ = http.NewRequest("GET", "/?paramA=20&paramB=40", nil)
                    })
                    It("should reflect the desired parameters", func() {
                        server.GET("/", func(c *gin.Context) {
                            paramSet1, _ := getParamSet1(c)
                            Expect(paramSet1.A).To(Equal(20))
                            Expect(paramSet1.B).To(Equal(40))
                        })
                        server.ServeHTTP(recorder, request)
                    })
                })
    // dupl-enable
    

    ...

    // dupl-disable
                    Context("when specifying parameter set 2", func() {
                        BeforeEach(func() {
                            request, _ = http.NewRequest("GET", "/?paramX=5000&paramY=10000", nil)
                        })
                        It("will reflect the desired parameters", func() {
                            server.GET("/", func(c *gin.Context) {
                                paramSet2, _ := getParamSet2(c)
                                Expect(paramSet2.X).To(Equal(5000))
                                Expect(paramSet2.Y).To(Equal(10000))
                            })
                            server.ServeHTTP(recorder, request)
                        })
                    })
    // dupl-enable
    
Similar projects

🐶 Automated code review tool integrated with any code analysis tools regardless of programming language

reviewdog - A code review dog who keeps your codebase healthy. reviewdog provides a way to post review comments to code hosting service, such as GitHu

Jan 2, 2023
A Golang tool that does static analysis, unit testing, code review and generate code quality report.

goreporter A Golang tool that does static analysis, unit testing, code review and generate code quality report. This is a tool that concurrently runs

Jan 8, 2023
🐶 Automated code review tool integrated with any code analysis tools regardless of programming language

reviewdog - A code review dog who keeps your codebase healthy. reviewdog provides a way to post review comments to code hosting service, such as GitHu

Jan 7, 2023
The most opinionated Go source code linter for code audit.

go-critic Highly extensible Go source code linter providing checks currently missing from other linters. There is never too much static code analysis.

Jan 6, 2023
Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

Sloc Cloc and Code (scc) A tool similar to cloc, sloccount and tokei. For counting physical the lines of code, blank lines, comment lines, and physica

Jan 4, 2023
depth is tool to retrieve and visualize Go source code dependency trees.

depth is tool to retrieve and visualize Go source code dependency trees. Install Download the appropriate binary for your platform from the Rele

Dec 30, 2022
Tool to populate your code with traceable and secure error codes

Essential part of any project, especially customer facing is proper and secure error handling. When error happens and customer reports it, it would be nice to know the context of the error and where it exactly occurred.

Sep 28, 2022
a simple golang SSA viewer tool use for code analysis or make a linter

ssaviewer A simple golang SSA viewer tool use for code analysis or make a linter ssa.html generate code modify from src/cmd/compile/internal/ssa/html.

May 17, 2022
Refactoring and code transformation tool for Go.

gopatch is a tool to match and transform Go code. It is meant to aid in refactoring and restyling.

Dec 30, 2022
[mirror] This is a linter for Go source code.

Golint is a linter for Go source code. Installation Golint requires a supported release of Go. go get -u golang.org/x/lint/golint To find out where g

Dec 23, 2022
Run linters from Go code -

Lint - run linters from Go Lint makes it easy to run linters from Go code. This allows lint checks to be part of a regular go build + go test workflow

Sep 27, 2022
A reference for the Go community that covers the fundamentals of writing clean code and discusses concrete refactoring examples specific to Go.

Jan 1, 2023
A static code analyzer for annotated TODO comments

todocheck todocheck is a static code analyzer for annotated TODO comments. It lets you create actionable TODOs by annotating them with issues from an

Dec 7, 2022
A little fast cloc(Count Lines Of Code)

gocloc A little fast cloc(Count Lines Of Code), written in Go. Inspired by tokei. Installation $ go get -u github.com/hhatto/gocloc/cmd/gocloc Usage

Jan 6, 2023
🔒🌍 Security scanner for your Terraform code

tfsec uses static analysis of your terraform templates to spot potential security issues.

Dec 30, 2022
Know when GC runs from inside your golang code

gcnotifier gcnotifier provides a way to receive notifications after every run of the garbage collector (GC). Knowing when GC runs is useful to instruc

Dec 26, 2022
a Go code to detect leaks in JS files via regex patterns

Nov 13, 2022
go-linters How to grow Go code as a bonsai: the style, the rules, the linters

How to grow Go code as a bonsai: the style, the rules, the linters (Definition 2021 Hackaton) Build go build -buildmode=plugin plugin/plugin.go Run go

Nov 9, 2022
Print all source code for a given go package or module.

gosrcs gosrcs is a tool to print all the source code a given go package depends on. The original motivation of this tool is to integrate go builds int

Oct 25, 2021