Gaining advanced insights from Git repository history.


Fast, insightful and highly customizable Git history analysis.


Hercules is an amazingly fast and highly customizable Git repository analysis engine written in Go. Batteries are included. Powered by go-git.

Notice (November 2020): the main author is back from limbo and is gradually resuming development. See the roadmap.

There are two command-line tools: hercules and labours. The first is a program written in Go which takes a Git repository and executes a Directed Acyclic Graph (DAG) of analysis tasks over the full commit history. The second is a Python script which shows some predefined plots over the collected data. These two tools are normally used together through a pipe. It is possible to write custom analyses using the plugin system. It is also possible to merge several analysis results together - relevant for organizations. The analyzed commit history includes branches, merges, etc.

Hercules has been successfully used for several internal projects at source{d}. There are blog posts: 1, 2 and a presentation. Please contribute by testing, fixing bugs, adding new analyses, or coding swagger!

Hercules DAG of Burndown analysis

The DAG of burndown and couples analyses with UAST diff refining. Generated with hercules --burndown --burndown-people --couples --feature=uast --dry-run --dump-dag doc/

git/git image

torvalds/linux line burndown (granularity 30, sampling 30, resampled by year). Generated with hercules --burndown --first-parent --pb | labours -f pb -m burndown-project in 1h 40min.


Installation

Grab the hercules binary from the Releases page. labours is installable from PyPI:

pip3 install labours

pip3 is the Python package manager.

Numpy and Scipy can be installed on Windows using

Build from source

You are going to need Go (>= v1.11) and protoc.

git clone && cd hercules
pip3 install -e ./python

GitHub Action

It is possible to run Hercules as a GitHub Action: Hercules on GitHub Marketplace. Please refer to the sample workflow, which demonstrates how to set it up.


Contributions

...are welcome! See CONTRIBUTING and code of conduct.


License

Apache 2.0


Usage

The most useful and reliably up-to-date command line reference:

hercules --help

Some examples:

# Use "memory" go-git backend and display the burndown plot. "memory" is the fastest but the repository's git data must fit into RAM.
hercules --burndown | labours -m burndown-project --resample month
# Use "file system" go-git backend and print some basic information about the repository.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the burndown plot without resampling.
hercules --burndown --pb /tmp/repo-cache | labours -m burndown-project -f pb --resample raw

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce burndown snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.yaml, so that later it is possible to run labours -i cache.yaml
# Pipe the raw data to labours, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules --commits - --burndown | tee cache.yaml | labours -m burndown-project --font-size 16 --backend Agg --output git.png

labours -i /path/to/yaml reads the hercules output that was saved to disk.


Caching

It is possible to store the cloned repository on disk. The subsequent analysis can run on the corresponding directory instead of cloning from scratch:

# First time - cache
hercules /tmp/repo-cache

# Second time - use the cache
hercules --some-analysis /tmp/repo-cache

GitHub Action

The action produces the artifact named hercules_charts. Since it is currently impossible to pack several files in one artifact, all the charts and Tensorflow Projector files are packed in the inner tar archive. In order to view the embeddings, go to, click "Load" and choose the two TSVs. Then use UMAP or T-SNE.

Docker image

docker run --rm srcd/hercules hercules --burndown --pb | docker run --rm -i -v $(pwd):/io srcd/hercules labours -f pb -m burndown-project -o /io/git_git.png

Built-in analyses

Project burndown

hercules --burndown
labours -m burndown-project

Line burndown statistics for the whole repository. Exactly the same as what git-of-theseus does, but much faster. Blaming is performed efficiently and incrementally using a custom RB tree tracking algorithm, and only the last modification date is recorded while running the analysis.

All burndown analyses depend on the values of granularity and sampling. Granularity is the number of days each band in the stack consists of. Sampling is the interval, in days, at which the burndown state is snapshotted. The smaller the value, the smoother the plot, but the more work is done.
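To make the two parameters concrete, here is a rough Python sketch of how granularity and sampling determine the size of the burndown matrix. The function name burndown_shape, the default values and the ceiling-based rounding are assumptions made for illustration, not hercules' actual implementation; real output may differ by one row or column.

```python
import math


def burndown_shape(lifetime_days, granularity=30, sampling=30):
    """Estimate (snapshots, bands) for a project that is lifetime_days old.

    bands     - how many stacked age bands the plot has (one per `granularity` days)
    snapshots - how many times the state is recorded (once per `sampling` days)
    Both roundings are assumptions of this sketch.
    """
    bands = math.ceil(lifetime_days / granularity)
    snapshots = math.ceil(lifetime_days / sampling)
    return snapshots, bands


# One year of history with 30-day bands, sampled monthly and then weekly.
print(burndown_shape(365, granularity=30, sampling=30))
print(burndown_shape(365, granularity=30, sampling=7))
```

Lowering sampling increases only the number of snapshots, which is why the plot gets smoother while the analysis takes longer.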

There is an option to resample the bands inside labours, so that you can define a very precise distribution and visualize it in different ways. Besides, resampling aligns the bands across periodic boundaries, e.g. months or years. Unresampled bands are not aligned and start from the project's birth date.


Files

hercules --burndown --burndown-files
labours -m burndown-file

Burndown statistics for every file in the repository which is alive in the latest revision.

Note: it will generate a separate graph for every file. You don't want to run it on a repository with many files.


People

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m burndown-person

Burndown statistics for the repository's contributors. If --people-dict is not specified, the identities are discovered by the following algorithm:

  1. We start from the root commit towards the HEAD. Emails and names are converted to lower case.
  2. If we process an unknown email and name, record them as a new developer.
  3. If we process a known email but unknown name, match to the developer with the matching email, and add the unknown name to the list of that developer's names.
  4. If we process an unknown email but known name, match to the developer with the matching name, and add the unknown email to the list of that developer's emails.
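The matching rules above can be sketched in Python as follows. This is an illustrative sketch, not hercules' Go implementation: the function name merge_identities and the (name, email) input tuples are hypothetical, and letting a known email take precedence over a known name when the two point to different developers is an assumption that the list above does not settle.

```python
def merge_identities(signatures):
    """Group (name, email) signatures, ordered from the root commit to HEAD.

    Returns a list of developers, each a (names, emails) pair of sets.
    """
    by_email = {}    # lower-cased email -> developer index
    by_name = {}     # lower-cased name  -> developer index
    developers = []

    for name, email in signatures:
        name, email = name.lower(), email.lower()
        index = by_email.get(email)
        if index is None:
            index = by_name.get(name)
        if index is None:
            # Unknown email and unknown name: record a new developer.
            index = len(developers)
            developers.append((set(), set()))
        names, emails = developers[index]
        names.add(name)
        emails.add(email)
        by_email.setdefault(email, index)
        by_name.setdefault(name, index)
    return developers


history = [
    ("Alice", "alice@example.com"),
    ("Alice Smith", "ALICE@example.com"),   # known email, new name
    ("alice", "alice@work.example"),        # known name, new email
    ("Bob", "bob@example.com"),
]
for names, emails in merge_identities(history):
    print(sorted(names), sorted(emails))
```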

If --people-dict is specified, it should point to a text file with the custom identities. Each line describes a single developer and contains all the matching emails and names separated by |. Case is ignored.
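As an illustration of that file format, the following Python sketch parses identity lines into lists of lower-cased aliases. The function name parse_people_dict and the sample names and addresses are made up for the example; skipping blank lines and trimming whitespace around each alias are assumptions, since the description above only states the | separator and the case insensitivity.

```python
def parse_people_dict(text):
    """Parse identities: one developer per line, aliases separated by '|'."""
    developers = []
    for line in text.splitlines():
        aliases = [alias.strip().lower() for alias in line.split("|")]
        aliases = [alias for alias in aliases if alias]
        if aliases:  # assumption: blank lines are ignored
            developers.append(aliases)
    return developers


sample = """\
Alice Smith|alice|alice@example.com|asmith@work.example
Bob|bob@example.com
"""
for aliases in parse_people_dict(sample):
    print(aliases)
```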

Overwrites matrix

Wireshark top 20 devs - overwrites matrix

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m overwrites-matrix

Besides the burndown information, --burndown-people collects the added and deleted line statistics per developer. This makes it possible to visualize how many lines written by developer A were removed by developer B. This indicates collaboration between people and defines expertise teams.

The format is a matrix with N rows and (N+2) columns, where N is the number of developers.

  1. First column is the number of lines the developer wrote.
  2. Second column is how many lines were written by the developer and deleted by unidentified developers (if --people-dict is not specified, it is always 0).
  3. The rest of the columns show how many lines were written by the developer and deleted by identified developers.

The sequence of developers is stored in the people_sequence YAML node.
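A small Python sketch may help when reading such a matrix by hand. It follows only the column layout described above; the function name summarize_overwrites and the sample numbers are invented, and the sign convention of the values in real hercules output is not covered here.

```python
def summarize_overwrites(people, matrix):
    """Split each N x (N+2) row into its three documented parts.

    people - developer names in the same order as the matrix rows
             (the role played by the people_sequence node)
    matrix - list of rows, each with len(people) + 2 numbers
    """
    summary = {}
    for developer, row in zip(people, matrix):
        if len(row) != len(people) + 2:
            raise ValueError("expected N + 2 columns in every row")
        summary[developer] = {
            "written": row[0],
            "deleted_by_unidentified": row[1],
            "deleted_by": dict(zip(people, row[2:])),
        }
    return summary


people = ["alice", "bob"]
matrix = [
    [100, 0, 10, 25],  # alice: wrote 100 lines, bob deleted 25 of them
    [50, 0, 5, 8],     # bob: wrote 50 lines, alice deleted 5 of them
]
print(summarize_overwrites(people, matrix)["alice"]["deleted_by"]["bob"])
```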

Code ownership

Ember.js top 20 devs - code ownership

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m ownership

--burndown-people also allows drawing the code share through time as a stacked area plot. That is, how many lines are alive at the sampled moments in time for each identified developer.
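To show what "code share" means numerically, this Python sketch turns per-developer alive line counts into fractions for each sampled moment. The function name ownership_shares and the input layout (one dictionary per sample) are assumptions for illustration; labours works on its own data structures.

```python
def ownership_shares(samples):
    """Convert alive-line counts per developer into shares of the total.

    samples - list of dicts, one per sampled moment: developer -> alive lines
    """
    shares = []
    for sample in samples:
        total = sum(sample.values())
        shares.append({
            developer: (lines / total if total else 0.0)
            for developer, lines in sample.items()
        })
    return shares


samples = [
    {"alice": 300, "bob": 100},
    {"alice": 300, "bob": 300},
]
for moment in ownership_shares(samples):
    print(moment)
```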


Couples

Linux kernel file couples

torvalds/linux files' coupling in Tensorflow Projector

hercules --couples [--people-dict=/path/to/identities]
labours -m couples -o <name> [--couples-tmp-dir=/tmp]

Important: this requires Tensorflow to be installed; please follow the official instructions.

The files are coupled if they are changed in the same commit. The developers are coupled if they change the same file. hercules records the number of couples throughout the whole commit history and outputs the two corresponding co-occurrence matrices. labours then trains Swivel embeddings - dense vectors which reflect the co-occurrence probability through the Euclidean distance. The training requires a working Tensorflow installation. The intermediate files are stored in the system temporary directory or --couples-tmp-dir if it is specified. The trained embeddings are written to the current working directory with the name depending on -o. The output format is TSV and matches Tensorflow Projector so that the files and people can be visualized with t-SNE implemented in TF Projector.
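The file coupling rule ("changed in the same commit") can be sketched in a few lines of Python. This only illustrates the counting idea: the function name file_cooccurrences and the input layout are hypothetical, and the sketch covers neither hercules' matrix format nor the Swivel training step.

```python
from collections import Counter
from itertools import combinations


def file_cooccurrences(commits):
    """Count how often each pair of files changes in the same commit.

    commits - iterable of lists of file paths touched by each commit
    Returns a Counter keyed by alphabetically ordered (file_a, file_b) pairs.
    """
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


commits = [
    ["core/pipeline.go", "core/pipeline_test.go"],
    ["core/pipeline.go", "core/pipeline_test.go", "README.md"],
    ["README.md"],
]
for pair, count in file_cooccurrences(commits).most_common():
    print(count, pair)
```

Developer coupling works the same way with the roles swapped: collect the set of developers per file and count the pairs.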

Structural hotness

      46  jinja2/ [FunctionDef]
      42  jinja2/ [FunctionDef]
      34  jinja2/ [FunctionDef]
      29  jinja2/ [FunctionDef]
      27  jinja2/ [FunctionDef]
      22  jinja2/ [FunctionDef]
      22  jinja2/ [FunctionDef]
      21  jinja2/ [FunctionDef]
      21  jinja2/ [FunctionDef]
      20  jinja2/ [FunctionDef]

Thanks to Babelfish, hercules is able to measure how many times each structural unit has been modified. By default, it looks at functions; refer to Semantic UAST XPath manual to switch to something else.

hercules --shotness [--shotness-xpath-*]
labours -m shotness

Couples analysis automatically loads "shotness" data if available.

Jinja2 functions grouped by structural hotness

hercules --shotness --pb | labours -m couples -f pb

Aligned commit series


tensorflow/tensorflow aligned commit series of top 50 developers by commit number.

hercules --devs [--people-dict=/path/to/identities]
labours -m devs -o <name>

We record how many commits were made, as well as lines added, removed and changed per day for each developer. We plot the resulting commit time series using a few tricks to show the temporal grouping. In other words, two adjacent commit series should look similar after normalization.

  1. We compute the distance matrix of the commit series. Our distance metric is Dynamic Time Warping. We use the FastDTW algorithm, whose complexity is linear in the length of the time series. Thus the overall complexity of computing the matrix is quadratic in the number of developers.
  2. We compile the linear list of commit series with the Seriation technique. In particular, we solve the Travelling Salesman Problem, which is NP-hard. However, given that the typical number of developers is less than 1,000, there is a good chance that the solution does not take much time. We use the Google or-tools solver.
  3. We find 1-dimensional clusters in the resulting path with the HDBSCAN algorithm and assign colors accordingly.
  4. Time series are smoothed by convolving with the Slepian window.
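Step 1 can be illustrated with a minimal Python sketch. Note the substitution: it uses the classic dynamic-programming DTW, which is quadratic in the series length, instead of the FastDTW approximation that labours relies on, so it demonstrates the metric rather than the performance. The function names and the sample series are made up for the example.

```python
def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances between commit series."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


commits_per_day = [
    [1, 2, 3, 0, 0],
    [0, 1, 2, 3, 0],  # same shape, shifted by one day
    [5, 5, 5, 5, 5],
]
for row in distance_matrix(commits_per_day):
    print(row)
```

The shifted series ends up much closer to the first one than the constant series does, which is the property the seriation step exploits when it places similar developers next to each other.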

This plot shows how the development team evolved through time. It also reveals "commit flashmobs" such as Hacktoberfest. For example, here are the insights revealed by the tensorflow/tensorflow plot above:

  1. "Tensorflow Gardener" is classified as the only outlier.
  2. The "blue" group of developers covers the global maintainers and a few people who left (at the top).
  3. The "red" group shows how core developers join the project or become less active.

Added vs changed lines through time


tensorflow/tensorflow added and changed lines through time.

hercules --devs [--people-dict=/path/to/identities]
labours -m old-vs-new -o <name>

--devs from the previous section also makes it possible to plot how many lines were added and how many existing lines were changed (deleted or replaced) through time. This plot is smoothed.

Efforts through time


kubernetes/kubernetes efforts through time.

hercules --devs [--people-dict=/path/to/identities]
labours -m devs-efforts -o <name>

Besides, --devs makes it possible to plot how many lines have been changed (added or removed) by each developer. The upper part of the plot is the accumulated (integrated) lower part. It is impossible to have the same scale for both parts, so the lower values are scaled, and hence there are no lower Y axis ticks. There is a difference between the efforts plot and the ownership plot, although changing lines correlates with owning lines.
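The relation between the two parts of the plot is a running sum, as this short Python sketch shows. The function name accumulated_efforts and the sample numbers are hypothetical; the sketch ignores the smoothing and scaling that labours applies.

```python
from itertools import accumulate


def accumulated_efforts(changed_lines):
    """Running total of changed lines: the 'upper part' for one developer."""
    return list(accumulate(changed_lines))


changed_per_day = [5, 0, 3, 2]  # lower part: lines changed on each day
print(accumulated_efforts(changed_per_day))
```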

Sentiment (positive and negative comments)

Django sentiment

It can be clearly seen that Django comments were positive/optimistic in the beginning, but later became negative/pessimistic.
hercules --sentiment --pb | labours -m sentiment -f pb

We extract new and changed comments from source code on every commit, apply BiDiSentiment general purpose sentiment recurrent neural network and plot the results. Requires libtensorflow. E.g. "sadly, we need to hide the rect from the documentation finder for now" is negative and "Theano has a built-in optimization for logsumexp (...) so we can just write the expression directly" is positive. Don't expect too much though - as was written, the sentiment model is general purpose and the code comments have different nature, so there is no magic (for now).

Hercules must be built with the "tensorflow" tag, which is not enabled by default:

make TAGS=tensorflow

Such a build requires libtensorflow.

Everything in a single pass

hercules --burndown --burndown-files --burndown-people --couples --shotness --devs [--people-dict=/path/to/identities]
labours -m all


Plugins

Hercules has a plugin system and allows running custom analyses. See


Merging

hercules combine is the command which joins several analysis results in Protocol Buffers format together.

hercules --burndown --pb > go-git.pb
hercules --burndown --pb > hercules.pb
hercules combine go-git.pb hercules.pb | labours -f pb -m burndown-project --resample M

Bad unicode errors

YAML does not support the whole range of Unicode characters and the parser on labours side may raise exceptions. Filter the output from hercules through to discard such offending characters.

hercules --burndown --burndown-people | python3 | labours -m people
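For illustration only, a filter of this kind could look like the Python sketch below. It is not the script that this README refers to; the names strip_non_yaml_chars and filter_stream are hypothetical. The character class is the complement of the printable set defined by the YAML 1.1 specification (tab, line feed, carriage return, printable ASCII, NEL, and the non-surrogate ranges above U+00A0).

```python
import io
import re

# Everything outside the YAML 1.1 "printable" character set.
_NON_PRINTABLE = re.compile(
    "[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_non_yaml_chars(text):
    """Drop the characters that a strict YAML reader would reject."""
    return _NON_PRINTABLE.sub("", text)


def filter_stream(source, sink):
    """Copy text from source to sink line by line, discarding bad characters."""
    for line in source:
        sink.write(strip_non_yaml_chars(line))


# Demonstration on an in-memory stream; in a pipe, pass sys.stdin and sys.stdout.
dirty = io.StringIO("name: \"J\x00ohn\x07\"\nlines: 42\n")
clean = io.StringIO()
filter_stream(dirty, clean)
print(clean.getvalue(), end="")
```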


Plotting

These options affect all plots:

labours [--style=white|black] [--backend=] [--size=Y,X]

--style sets the general style of the plot (see labours --help). --background changes the plot background to be either white or black. --backend chooses the Matplotlib backend. --size sets the size of the figure in inches. The default is 12,9.

(required in macOS) you can pin the default Matplotlib backend with

echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc

These options are effective in burndown charts only:

labours [--text-size] [--relative]

--text-size changes the font size, --relative activates the stretched burndown layout.

Custom plotting backend

It is possible to output all the information needed to draw the plots in JSON format. Simply append .json to the output (-o) and you are done. The data format is not fully specified and depends on the Python code which generates it. Each JSON file should contain "type" which reflects the plot kind.
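A custom backend would typically start by loading the file and dispatching on "type". The Python sketch below shows that first step only; the function name load_plot_data is hypothetical, and the assumption that "type" is a key of a top-level JSON object is ours, because the data format is not fully specified.

```python
import json


def load_plot_data(path):
    """Load a labours JSON dump and return (plot_type, data)."""
    with open(path, encoding="utf-8") as stream:
        data = json.load(stream)
    # Assumption: the document is a JSON object with a top-level "type" key.
    plot_type = data.get("type") if isinstance(data, dict) else None
    if plot_type is None:
        raise ValueError(f"{path}: no \"type\" field, cannot pick a renderer")
    return plot_type, data
```

A caller could then run something like plot_type, data = load_plot_data("out.json"), where out.json is whatever name was passed to -o, and choose a drawing routine based on plot_type.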


Caveats

  1. Processing all the commits may fail in some rare cases. If you get an error similar to please report there and specify --first-parent as a workaround.
  2. Burndown collection may fail with an Out-Of-Memory error. See the next section for the workarounds.
  3. Parsing YAML in Python is slow when the number of internal objects is big. hercules' output for the Linux kernel in "couples" mode is 1.5 GB and takes more than an hour / 180GB RAM to be parsed. However, most of the repositories are parsed within a minute. Try using Protocol Buffers instead (hercules --pb and labours -f pb).
  4. To speed up YAML parsing:
    # Debian, Ubuntu
    apt install libyaml-dev
    # macOS
    brew install yaml-cpp libyaml
    # you might need to re-install pyyaml for the changes to take effect
    pip uninstall pyyaml
    pip --no-cache-dir install pyyaml

Burndown Out-Of-Memory

If the analyzed repository is big and extensively uses branching, the burndown stats collection may fail with an OOM. You should try the following:

  1. Read the repo from disk instead of cloning into memory.
  2. Use --skip-blacklist to avoid analyzing the unwanted files. It is also possible to constrain the --language.
  3. Use the hibernation feature: --hibernation-distance 10 --burndown-hibernation-threshold=1000. Play with those two numbers to start hibernating right before the OOM.
  4. Hibernate on disk: --burndown-hibernation-disk --burndown-hibernation-dir /path.
  5. --first-parent, you win.


Roadmap

  • Switch from src-d/go-git to go-git/go-git. Upgrade the codebase to be compatible with the latest Go version.
  • Update the docs regarding the copyrights and such.
  • Fix the reported bugs.
  • Remove the dependency on Babelfish for parsing the code. It is abandoned and a better alternative should be found.
  • Remove the ad-hoc analyses added while source{d} was agonizing.
  • Add pluggable Logger for pipeline

    closes #244

    This PR allows something like the following: (edit no longer possible as of - a wrapper needs to be written around zap)

    package hercules_test
    import (
    	gogit ""
    func main() {
    	repo, _ := gogit.PlainOpen(".")
    	pipe := hercules.NewPipeline(repo)
                    // custom logger
    		hercules.ConfigLogger: zap.NewExample().Sugar(),

    I've added a Logger and Config variable for facts in package internal/core, and also added the logger to all pipeline items and replaced the log calls that i could find.

    There are still a few calls to panic that I think we should try to remove in favour of error, but I noticed some tests actually assert that panics occur so I've tried not to change too much for now (notably BurndownAnalysis)

  • Getting error while running sentiment analysis

    I have attached a screenshot of the error.

    hercules --sentiment --pb | labours -m sentiment -f pb
    [ERROR] 2019/07/31 17:19:58 Unsatisfied dependency: [uasts] -> UASTChanges
    [ERROR] 2019/07/31 17:19:58 stacktrace:*Pipeline).resolve(0xc0ec423cb8, 0x0, 0x0, 0x10, 0xc0001133d8)
    	/home/batman/go/src/ +0x1a2c*Pipeline).Initialize(0xc0ec423cb8, 0xc000a2d320, 0x0, 0x0)
    	/home/batman/go/src/ +0x27c
    main.glob..func3(0x33582a0, 0xc000a2d590, 0x1, 0x3)
    	/home/batman/go/src/ +0x9bd*Command).execute(0x33582a0, 0xc000032110, 0x3, 0x3, 0x33582a0, 0xc000032110)
    	/home/batman/go/src/ +0x2ae*Command).ExecuteC(0x33582a0, 0xc000e9a600, 0xc000eb7f88, 0x8b197f)
    	/home/batman/go/src/ +0x2ec*Command).Execute(...)
    	/home/batman/go/src/ +0x32
    [ERROR] 2019/07/31 17:19:58 Failed to initialize the pipeline on []
    2019/07/31 17:19:58 unsatisfied dependency
    Traceback (most recent call last):
      File "/home/batman/.local/bin/labours", line 11, in <module>
        load_entry_point('labours', 'console_scripts', 'labours')()
      File "/home/batman/go/src/", line 1730, in main
        reader = read_input(args)
      File "/home/batman/go/src/", line 409, in read_input
      File "/home/batman/go/src/", line 278, in read
        raise ValueError("empty input")
    ValueError: empty input

  • slice bounds out of range error

    What I get after running hercules --devs <repo> | python -m devs

    2417 / 29722 1h58m45spanic: runtime error: slice bounds out of range

    goroutine 4816 [running]:*RenameAnalysis).blobsAreClose(0xc000a49580, 0xc04aa1e370, 0xc00c8b60a0, 0xc00b80ee48, 0x9, 0xc0933c3f90) c:/gopath/src/ +0xb30*RenameAnalysis).Consume.func1(0x0, 0x0) c:/gopath/src/ +0x41b*RenameAnalysis).Consume.func3(0xc00b313950, 0xc00650cd30) c:/gopath/src/ +0x2e created by*RenameAnalysis).Consume c:/gopath/src/ +0x1a7b

    No data has been read - has Hercules crashed?

  • Panic: file x already exists

    I'm trying to run hercules against my repo:

    $ ./hercules --burndown .
    2018/09/19 15:47:54 Burndown failed on commit #2201 (3024) 8fd831a7cb4f2d6012bb984be66dc7fd9e3a8f99
    panic: file app/core/modules/****.js already exists
    goroutine 1 [running]:
    main.glob..func3(0xd36140, 0xc00025cce0, 0x1, 0x2)
            c:/gopath/src/ +0xb26*Command).execute(0xd36140, 0xc00008e0d0, 0x2, 0x3, 0xd36140, 0xc00008e0d0)
            c:/gopath/src/ +0x2d3*Command).ExecuteC(0xd36140, 0x104ab40, 0x26, 0xc000a52089)
            c:/gopath/src/ +0x304*Command).Execute(0xd36140, 0x406f37, 0xc00007c058)
            c:/gopath/src/ +0x32
            c:/gopath/src/ +0x38

    Using (v4 on Windows):

  • Add more granular intervals (replace 'days' with generic 'ticks')

    closes #238 , more context is in the issue

    Seeking feedback! 🙏 probably still needs a bit of work

    this PR changes all mentions of "days" to more flexible "ticks", currently configured by ConfigTicksSinceStartTicksSize (or --tick-size) that accepts an integer value as the number of hours, though tick size is actually tracked as time.Duration so even more granular options could be exposed (though likely doesn't need to)

    internal/plumbing/day.go was renamed to ticks.go which unfortunately makes the diff a bit hard to review, sorry!

    default tick size

    sets to 24 hours as before, so hopefully it's not significantly breaking

    ❯ hercules --granularity=5 --sampling=5 --burndown
      version: 9
      hash: 6c50214b59f32b8714469380e1ffcd0cb044517d
      begin_unix_time: 1548537944
      end_unix_time: 1548645154
      commits: 83
      run_time: 126
      granularity: 5
      sampling: 5
      "project": |-

    custom tick size

    for example, with 1 hour:

    ❯ hercules --granularity=5 --sampling=5 --tick-size=1 --burndown --first-parent 
      version: 9
      hash: 6c50214b59f32b8714469380e1ffcd0cb044517d
      begin_unix_time: 1548537944
      end_unix_time: 1548645154
      commits: 10
      run_time: 47
      granularity: 5
      sampling: 5
      "project": |-
        1531        0      0      0      0      0
          1531      0      0      0      0      0
          1258      0    960      0      0      0
          1258      0    960      0      0      0
          1255      0    945      0 468480      0
          1255      0    942      0 468450   2076


    any pointers on these would be appreciated!

    • [x] a few failing tests I don't really understand:
    • [x] can't seem to get labours to work, but I havent investigated it yet:

    edit works with hercules --burndown --first-parent --tick-size=1 --pb | python3 -f pb -m burndown-project, I think it's an issue with small repos. that said, the timescales dont work properly with anything other than --tick-size=24 for now.

    ~/go/src/ master* ⇡
    ❯ hercules --granularity=3 --sampling=3 --tick-size=1 --burndown --pb | python3 -f pb -m burndown-projectdoneing...
    project lifetime index: 0.20172258097493653
    resampling to year, please wait...
    Traceback (most recent call last):
      File "", line 1942, in <module>
      File "", line 1913, in main
      File "", line 1759, in project_burndown
      File "", line 627, in load_burndown
        daily[istart:ifinish, (sdt - start).days:].sum(axis=0)
    UnboundLocalError: local variable 'sdt' referenced before assignment
    ~/go/src/ master ⇡
    ❯ hercules --granularity=3 --sampling=3  --burndown --pb | python3 -f pb -m burndown-project 
    Traceback (most recent call last):
      File "", line 1942, in <module>
      File "", line 1913, in main
      File "", line 1759, in project_burndown
      File "", line 589, in load_burndown
        print(name, "lifetime index:", calculate_average_lifetime(matrix))
      File "", line 439, in calculate_average_lifetime
        lifetimes[i - start] = band[i - 1]
    IndexError: index -1 is out of bounds for axis 0 with size 0
  • Reconsider bold statement

    In the README it says:

    what git-of-theseus does but much faster.

    This is a claim I cannot support from my experience using both git-of-theseus and hercules with the same repo (which, granted, is huge).

    If you need numbers, I’ll provide them. Just let me know which ones you are interested in.

  • Hercules crashes with language filter


    I tried hercules on a quite big private repository of our company. It crashes when I use the language filter. In the CLI output below, the first variant without --languages csharp runs and generates valid (and interesting, thanks!) data. When I add the language filter, it crashes after some time.

    $ ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb --commits develop_stretching_hashes repo-cache > baukasten_burndown_stretching.pb
    ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb --commits  8785,81s user 60,45s system 103% cpu 2:22:28,83 total
    $ ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb --commits develop_stretching_hashes repo-cache --languages csharp > baukasten_burndown_stretching_csharp_only.pb
    finalizing...2019/04/07 20:23:49 Failed to run the pipeline on [[email protected]:ORG/REPO.git]
    panic: empty history
    goroutine 1 [running]:*BurndownAnalysis).groupSparseHistory(0xc002248400, 0xc002eddf50, 0xffffffffffffffff, 0xc02947b458, 0x88043c, 0xc000000180, 0x300000002)
    	/home/travis/gopath/src/ +0x703*BurndownAnalysis).Finalize(0xc002248400, 0x15d2540, 0xc0000dc100)
    	/home/travis/gopath/src/ +0x55*Pipeline).Run(0xc02947bcc8, 0xc003736000, 0x12b8, 0x1400, 0x0, 0x0, 0x0)
    	/home/travis/gopath/src/ +0x714
    main.glob..func3(0x21d3ec0, 0xc000a6a0b0, 0x1, 0xb)
    	/home/travis/gopath/src/ +0x85b*Command).execute(0x21d3ec0, 0xc0000cc010, 0xb, 0xb, 0x21d3ec0, 0xc0000cc010)
    	/home/travis/gopath/src/ +0x2ae*Command).ExecuteC(0x21d3ec0, 0xc000a225e0, 0xc00016ff88, 0x84f65f)
    	/home/travis/gopath/src/ +0x2ec*Command).Execute(...)
    	/home/travis/gopath/src/ +0x32
    ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb --commits  854,87s user 46,41s system 108% cpu 13:51,67 total

    As I cannot give anyone access to the repository to reproduce this error, feel free to close this issue if you want. In the meantime I will try to reduce the command-line flags needed to produce the crash. This can take some time because the crash happened after a quarter of an hour. And yes, there are C# files :D.

    Edit: More crashes

    ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb repo-cache --languages csharp > baukasten_burndown_csharp_only.pb                                               
    finalizing...2019/04/07 20:38:33 Failed to run the pipeline on [[email protected]:ORG/REPO.git]
    panic: empty history
    goroutine 1 [running]:*BurndownAnalysis).groupSparseHistory(0xc0087c0f00, 0xc002542cf0, 0xffffffffffffffff, 0xc01b679458, 0x88043c, 0xc000000180, 0x300000002)
    	/home/travis/gopath/src/ +0x703*BurndownAnalysis).Finalize(0xc0087c0f00, 0x15d2540, 0xc0000da300)
    	/home/travis/gopath/src/ +0x55*Pipeline).Run(0xc01b679cc8, 0xc002bb4000, 0x8b9, 0x900, 0x0, 0x0, 0x0)
    	/home/travis/gopath/src/ +0x714
    main.glob..func3(0x21d3ec0, 0xc000a623f0, 0x1, 0x9)
    	/home/travis/gopath/src/ +0x85b*Command).execute(0x21d3ec0, 0xc0000cc010, 0x9, 0x9, 0x21d3ec0, 0xc0000cc010)
    	/home/travis/gopath/src/ +0x2ae*Command).ExecuteC(0x21d3ec0, 0xc000a26620, 0xc00016ff88, 0x84f65f)
    	/home/travis/gopath/src/ +0x2ec*Command).Execute(...)
    	/home/travis/gopath/src/ +0x32
    ./hercules.linux_amd64 --burndown --granularity 2 --sampling 2 --pb repo-cach  475,91s user 19,12s system 105% cpu 7:49,19 total
    $ ./hercules.linux_amd64 --burndown --pb repo-cache --languages csharp > baukasten_burndown_csharp_only.pb                             
    finalizing...2019/04/07 20:46:45 Failed to run the pipeline on [[email protected]:ORG/REPO.git]
    panic: empty history
    goroutine 1 [running]:*BurndownAnalysis).groupSparseHistory(0xc00de68e00, 0xc00329cba0, 0xffffffffffffffff, 0xc007ee7458, 0x88043c, 0xc000000180, 0x300000002)
    	/home/travis/gopath/src/ +0x703*BurndownAnalysis).Finalize(0xc00de68e00, 0x15d2540, 0xc0000b2300)
    	/home/travis/gopath/src/ +0x55*Pipeline).Run(0xc007ee7cc8, 0xc002144000, 0x8b9, 0x900, 0x0, 0x0, 0x0)
    	/home/travis/gopath/src/ +0x714
    main.glob..func3(0x21d3ec0, 0xc000a3a550, 0x1, 0x5)
    	/home/travis/gopath/src/ +0x85b*Command).execute(0x21d3ec0, 0xc0000321f0, 0x5, 0x5, 0x21d3ec0, 0xc0000321f0)
    	/home/travis/gopath/src/ +0x2ae*Command).ExecuteC(0x21d3ec0, 0xc000a1e600, 0xc00013ff88, 0x84f65f)
    	/home/travis/gopath/src/ +0x2ec*Command).Execute(...)
    	/home/travis/gopath/src/ +0x32
    $ ./hercules.linux_amd64 --burndown --languages csharp repo-cache
    finalizing...2019/04/07 20:50:36 Failed to run the pipeline on [[email protected]:ORG/REPO.git]
    panic: empty history
    goroutine 1 [running]:*BurndownAnalysis).groupSparseHistory(0xc0000c1a00, 0xc00245ba40, 0xffffffffffffffff, 0xc0131e7458, 0x88043c, 0xc000000180, 0x300000002)
    	/home/travis/gopath/src/ +0x703*BurndownAnalysis).Finalize(0xc0000c1a00, 0x15d2540, 0xc0000c0100)
    	/home/travis/gopath/src/ +0x55*Pipeline).Run(0xc0131e7cc8, 0xc0027f6000, 0x8b9, 0x900, 0x0, 0x0, 0x0)
    	/home/travis/gopath/src/ +0x714
    main.glob..func3(0x21d3ec0, 0xc000a3e200, 0x1, 0x4)
    	/home/travis/gopath/src/ +0x85b*Command).execute(0x21d3ec0, 0xc000032060, 0x4, 0x4, 0x21d3ec0, 0xc000032060)
    	/home/travis/gopath/src/ +0x2ae*Command).ExecuteC(0x21d3ec0, 0xc000233690, 0xc00014df88, 0x84f65f)
    	/home/travis/gopath/src/ +0x2ec*Command).Execute(...)
    	/home/travis/gopath/src/ +0x32


     $ pip freeze | rg labours
     $ ./hercules.linux_amd64 version
    Version: 10
    Git:     b856b666909194669e93d2c8dd5f86a96d9f60dc
  • Unable to "pip install labours" on Mac OSX Mojave

    I'm trying to install labours as suggested by the README:

    My env is

    manuelkoch [~/tmp]
    $ python -c "import platform; print(platform.architecture())"
    ('64bit', '')
    manuelkoch [~/tmp]
    $ python --version
    Python 3.6.4
    manuelkoch [~/tmp]
    $ python -m pip --version
    pip 19.0.3 from /Users/manuelkoch/.pyenv/versions/hercules-3.6.4/lib/python3.6/site-packages/pip (python 3.6)
    $ pip install labours
    Collecting labours
      Using cached
    Collecting clint<1.0,>=0.5.1 (from labours)
      Using cached
    Collecting protobuf<4.0,>=3.5.0 (from labours)
      Using cached
    Collecting hdbscan<2.0,>=0.8.0 (from labours)
      Using cached
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
        Preparing wheel metadata ... done
    Collecting PyYAML<5.0,>=3.0 (from labours)
    Collecting pandas<1.0,>=0.20.0 (from labours)
      Using cached
    Collecting fastdtw<2.0,>=0.3.2 (from labours)
      Using cached
    Collecting numpy<2.0,>=1.12.0 (from labours)
      Using cached
    Collecting munch<3.0,>=2.0 (from labours)
      Using cached
    Collecting scipy<2.0,>=0.19.0 (from labours)
      Using cached
    Collecting python-dateutil<3.0,>=2.6.0 (from labours)
      Using cached
    Collecting seriate<2.0,>=1.0 (from labours)
      Using cached
    Collecting matplotlib<4.0,>=2.0 (from labours)
      Using cached
    Collecting args (from clint<1.0,>=0.5.1->labours)
      Using cached
    Requirement already satisfied: setuptools in /Users/manuelkoch/.pyenv/versions/3.6.4/envs/hercules-3.6.4/lib/python3.6/site-packages (from protobuf<4.0,>=3.5.0->labours) (28.8.0)
    Collecting six>=1.9 (from protobuf<4.0,>=3.5.0->labours)
      Using cached
    Collecting scikit-learn>=0.17 (from hdbscan<2.0,>=0.8.0->labours)
      Downloading (8.0MB)
        100% |████████████████████████████████| 8.0MB 2.0MB/s
    Collecting cython>=0.27 (from hdbscan<2.0,>=0.8.0->labours)
      Using cached
    Collecting pytz>=2011k (from pandas<1.0,>=0.20.0->labours)
      Downloading (510kB)
        100% |████████████████████████████████| 512kB 26.4MB/s
    Collecting ortools>=6.9.5824 (from seriate<2.0,>=1.0->labours)
      Could not find a version that satisfies the requirement ortools>=6.9.5824 (from seriate<2.0,>=1.0->labours) (from versions: 6.5.4527, 6.6.4656, 6.6.4659, 6.7.4957, 6.7.4973)
    No matching distribution found for ortools>=6.9.5824 (from seriate<2.0,>=1.0->labours)

    Any help appreciated !

  • Github action fails with commits not found

    The following error was reported in the github actions.

    Run src-d/hercules@master
    /usr/bin/docker run --name srcdherculeslatest_947185 --label 488dfb --workdir /github/workspace --rm -e INPUT_ARGS -e HOME -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e GITHUB_ACTIONS=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/root-monorepo/root-monorepo":"/github/workspace" srcd/hercules:latest  "/bin/bash" "-c" "hercules --burndown --burndown-people --devs --couples --pb . | labours -m all -f pb --disable-projector -o hercules_charts && cd hercules_charts && tar -cf ../hercules_charts.tar * ../hercules_charts_* && cd .. && rm -r hercules_charts"
    git log...
    2020/01/25 15:03:38 failed to list the commits: object not found
    Traceback (most recent call last):
      File "/usr/local/bin/labours", line 11, in <module>
        load_entry_point('labours==10.7.2', 'console_scripts', 'labours')()
      File "/usr/local/lib/python3.6/dist-packages/labours/", line 154, in main
        reader = read_input(args)
      File "/usr/local/lib/python3.6/dist-packages/labours/", line 439, in read_input
      File "/usr/local/lib/python3.6/dist-packages/labours/", line 230, in read
        raise ValueError("empty input")
    ValueError: empty input
    Reading the input... 

    From the output it really does look like the volumes in the docker container are properly being set up and it should be running the command in the proper directory.

    The action was copied/pasted from your example. This tool looks awesome, and lots of great work. Thank you.

  • Commits analysis

    Replacement for git log --stat. It uses hercules pipeline so it:

    • skips vendor files
    • skips binary files
    • adds language to each file
    • uses provided author identities or merges authors automatically

    I didn't implement Merge because I'm not sure it makes much sense to merge the results of this analysis. Please let me know what you think about this feature.

    P.S. I needed it as a part of PoC task

  • Add parameter to skip vendor directories

    In golang it's common practice to commit the vendor directory. People sometimes do it in other languages too.

    It messes up the analysis results a lot. This option allows skipping files whose paths start with the most popular prefixes: vendor/, vendors/, node_modules/.

