EarlyBird is a sensitive data detection tool capable of scanning source code repositories for clear text password violations, PII, outdated cryptography methods, key files and more. It can scan remote git repositories and local files or directories, or run as a pre-commit step.

Installation

Linux & Mac

Running the build.sh script produces a binary for each OS, and the install.sh script installs EarlyBird on your system: it creates a .go-earlybird directory in your home directory containing all the configuration files, and then installs go-earlybird as an executable in /usr/local/bin/.

./build.sh && ./install.sh

Windows

Running build.bat produces the binaries, and the install.bat script creates a 'go-earlybird' directory in C:\Users\[my user]\AppData\ and copies the required configuration files there. The script also installs go-earlybird.exe as an executable in the AppData directory, which should be on your PATH.

build.bat && install.bat

Usage

To launch a basic EarlyBird scan against a directory:

$ go-earlybird --path=/path/to/directory
$ go-earlybird.exe --path=C:\path\to\directory

or to scan a remote git repo:

$ go-earlybird --git=https://github.com/americanexpress/earlybird

See USAGE.md for detailed usage instructions.

Documentation

Why Are We Doing This?

The MITRE Corporation provides a catalog of Common Weakness Enumerations (CWE), documenting issues that should be avoided. Some of the relevant CWEs that EarlyBird addresses include:


Contributing

We welcome your interest in the American Express Open Source Community on GitHub. Any Contributor to any Open Source Project managed by the American Express Open Source Community must accept and sign an Agreement indicating agreement to the terms below. Except for the rights granted in this Agreement to American Express and to recipients of software distributed by American Express, You reserve all right, title, and interest, if any, in and to your contributions. Please fill out the Agreement.

License

Any contributions made under this project will be governed by the Apache License 2.0.

Code of Conduct

This project adheres to the American Express Community Guidelines. By participating, you are expected to honor these guidelines.

Comments
  • Error: invalid memory address or nil pointer dereference

    I have just built the binaries from source (linux/amd64 and windows/amd64 behave the same way). When I run the binary, I get:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xa12032]

    goroutine 1 [running]:
    github.com/americanexpress/earlybird/pkg/core.(*EarlybirdCfg).GetRuleModulesMap.func1(0xc000026f60, 0x19, 0x0, 0x0, 0xc47f00, 0xc00010ed80, 0xc000107be8, 0x4108ad)
            /var/lib/jenkins/workspace/Earlybird-build-binaries/earlybird/pkg/core/core.go:148 +0x32
    path/filepath.Walk(0xc000026f60, 0x19, 0xc000107c30, 0xc000026f60, 0x19)
            /usr/lib/golang/src/path/filepath/path.go:404 +0x6b
    github.com/americanexpress/earlybird/pkg/core.(*EarlybirdCfg).GetRuleModulesMap(0x1044e80, 0x10445e0, 0xc00002c810)
            /var/lib/jenkins/workspace/Earlybird-build-binaries/earlybird/pkg/core/core.go:147 +0xef
    github.com/americanexpress/earlybird/pkg/core.(*EarlybirdCfg).ConfigInit(0x1044e80)
            /var/lib/jenkins/workspace/Earlybird-build-binaries/earlybird/pkg/core/core.go:176 +0x2ef
    main.main()
            /var/lib/jenkins/workspace/Earlybird-build-binaries/earlybird/go-earlybird.go:47 +0x35d

  • Build fails on macOS

    When running build.sh, the build fails with the following log:

    Running Unit Tests
    go: downloading golang.org/x/net v0.0.0-20200324143707-d3edc9973b7e
    go: downloading github.com/gorilla/mux v1.7.4
    go: downloading github.com/gocarina/gocsv v0.0.0-20200330101823-46266ca37bd3
    go: downloading gopkg.in/src-d/go-git.v4 v4.13.1
    go: downloading github.com/dghubble/sling v1.3.0
    go: downloading github.com/howeyc/gopass v0.0.0-20190910152052-7cb4b85ec19c
    go: downloading github.com/google/go-github v17.0.0+incompatible
    go: downloading golang.org/x/text v0.3.2
    go: downloading golang.org/x/crypto v0.0.0-20190701094942-4def268fd1a4
    go: downloading github.com/google/go-querystring v1.0.0
    go: downloading github.com/sergi/go-diff v1.0.0
    go: downloading gopkg.in/src-d/go-billy.v4 v4.3.2
    go: downloading github.com/kevinburke/ssh_config v0.0.0-20190725054713-01f96b0aa0cd
    go: downloading github.com/xanzy/ssh-agent v0.2.1
    go: downloading github.com/mitchellh/go-homedir v1.1.0
    go: downloading github.com/emirpasic/gods v1.12.0
    go: downloading golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd
    go: downloading github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99
    go: downloading github.com/src-d/gcfg v1.4.0
    go: downloading gopkg.in/warnings.v0 v0.1.2
    # github.com/americanexpress/earlybird/pkg/api
    pkg/api/api.go:72:19: conversion from Duration (int64) to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
    pkg/api/api.go:141:19: conversion from Duration (int64) to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
    # github.com/americanexpress/earlybird/pkg/core
    pkg/core/core.go:261:19: conversion from Duration (int64) to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
    # github.com/americanexpress/earlybird/pkg/writers
    pkg/writers/jsonout_test.go:55:17: conversion from Duration (int64) to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
    Unit Tests FAILED!
    FAIL	github.com/americanexpress/earlybird/pkg/api [build failed]
    ok  	github.com/americanexpress/earlybird/pkg/config	0.325s
    FAIL	github.com/americanexpress/earlybird/pkg/core [build failed]
    Failed to open ignore file open .ge_ignore: no such file or directory
    Failed to open ignore file open /Users/phil/.ge_ignore: no such file or directory
    Failed to open ignore file open .ge_ignore: no such file or directory
    --- FAIL: Test_isIgnoredFile (0.00s)
        --- FAIL: Test_isIgnoredFile/Check_if_file_is_ignored (0.00s)
            fileUtil_test.go:148: isIgnoredFile() = false, want true
    FAIL
    FAIL	github.com/americanexpress/earlybird/pkg/file	0.222s
    ok  	github.com/americanexpress/earlybird/pkg/git	0.487s
    ok  	github.com/americanexpress/earlybird/pkg/postprocess	0.209s
    ok  	github.com/americanexpress/earlybird/pkg/scan	0.254s
    ok  	github.com/americanexpress/earlybird/pkg/update	0.468s
    ok  	github.com/americanexpress/earlybird/pkg/utils	0.252s
    ok  	github.com/americanexpress/earlybird/pkg/wildcard	0.184s
    FAIL	github.com/americanexpress/earlybird/pkg/writers [build failed]
    FAIL
    

    This occurs on macOS version 10.15.5, and Go version go1.15.2 darwin/amd64.

  • Cloning of git repository leaves files in temporary directory

    After data is downloaded to a temporary directory using the -git flag, it is not removed after the scan. I think the expected behavior is that the data is removed from the temporary location once the check for sensitive data is finished.

    The problem is that when I run checks against many code repositories, the temp directory grows significantly and I need an additional monitoring activity to erase the temporary data.

    Version: 1.24.6

  • Configuration in default path is mandatory to launch the tool

    Even if the configuration parameter is set on the command line, EarlyBird still tries to load the default configuration path. If this path is not present, EarlyBird fails to launch with the error: "Failed to load Earlybird configopen /var/lib/jenkins/.go-earlybird/earlybird.json: no such file or directory"

    Expected: if a configuration path is provided on the command line, the default configuration path should not be checked.

    Version: 1.24.6

  • meta: Comparison to gitleaks?

    👋 Cool project! I was curious how EarlyBird compares to other projects like gitleaks. I see EarlyBird can scan a few more types of targets, but do the two tools recognize the same patterns? I've used gitleaks for a while and am considering adopting both tools.

    Project: https://github.com/zricethezav/gitleaks

  • Ignorefile case sensitivity is broken

    Case sensitivity in the .ge_ignore file works in a weird way, at least on a Windows machine: entries in the ignore file have to be written in lower case to match anything.

    When the ignore file contains the entry '*.txt', it matches any .txt file (e.g. Readme.txt, readme.TXT), which is good behavior. But the entry '*.TXT' matches nothing, not even readme.TXT as I would expect. This showed up when I tried to exclude a path containing the word "Libs" (e.g. "Libs/sweet.lib"): none of "*/Libs/*", "*Libs*" or "Libs*" worked. Only "*/libs/*" matched.

    This is not a huge problem, but it is very counterintuitive behavior.

    Version: 1.24.6
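
    One reading of this report is that the scanned path is lower-cased before matching while the pattern is not, so a pattern containing any upper-case letter can never match. A minimal sketch of consistent case-insensitive matching folds both sides. wildcardMatch is a simplified, hypothetical matcher ('*' matches any run of characters including '/', '?' matches one character) and is not EarlyBird's pkg/wildcard implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardMatch reports whether name matches pattern. '*' matches any
// (possibly empty) run of characters, including '/'; '?' matches one
// character. Simplified stand-in, not EarlyBird's patternMatch.go.
func wildcardMatch(pattern, name string) bool {
	p, n := []rune(pattern), []rune(name)
	pi, ni := 0, 0
	star, mark := -1, 0 // index of the last '*' in p, and the n index it was tried at
	for ni < len(n) {
		switch {
		case pi < len(p) && p[pi] == '*':
			// Tentatively let '*' match the empty string; remember where.
			star, mark = pi, ni
			pi++
		case pi < len(p) && (p[pi] == '?' || p[pi] == n[ni]):
			pi++
			ni++
		case star != -1:
			// Mismatch: backtrack and let the last '*' absorb one more character.
			mark++
			pi, ni = star+1, mark
		default:
			return false
		}
	}
	// Any trailing '*' can match the empty remainder.
	for pi < len(p) && p[pi] == '*' {
		pi++
	}
	return pi == len(p)
}

// isIgnored folds BOTH the pattern and the path to lower case when matching
// case-insensitively, so "*.TXT" and "*.txt" behave the same.
func isIgnored(pattern, path string, caseInsensitive bool) bool {
	if caseInsensitive {
		pattern, path = strings.ToLower(pattern), strings.ToLower(path)
	}
	return wildcardMatch(pattern, path)
}

func main() {
	fmt.Println(isIgnored("*.TXT", "docs/readme.TXT", true))          // true
	fmt.Println(isIgnored("*/Libs/*", "project/Libs/sweet.lib", true)) // true
	fmt.Println(isIgnored("*.TXT", "docs/readme.txt", false))         // false
}
```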

  • Fix/version issue

    • Introduced version injection via ldflags during the build, so the version no longer has to be maintained by hand
    • Switched to using semantic-release
    • semantic-release dynamically generates the next release version and generates CHANGELOG.md based on the commit messages
    • Updated the Go version to 1.18
    • Fixed a few failing unit tests
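
    Version injection via ldflags works by declaring a package-level string variable and overriding it at link time with the linker's -X flag. A minimal sketch, assuming the variable lives in package main; the variable name is hypothetical, and how semantic-release passes the computed version into the build is not shown in this PR description.

```go
package main

import "fmt"

// version holds a placeholder that the build overrides with the linker's -X
// flag, for example:
//
//	go build -ldflags "-X main.version=1.2.3" -o go-earlybird .
//
// The name main.version is an assumption for this sketch. -X only takes
// effect on package-level string variables (not constants).
var version = "dev"

func main() {
	fmt.Println("Go-EarlyBird version:", version)
}
```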
  • feat: add keepAlive flag and fix worker flag read

    • Added a disableKeepAlive flag to address the large number of sockets opened when running as an HTTP service under very high load on Kubernetes clusters
    • Fixed an issue where workerSize was not read from the provided flags (default: 100)
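
    A sketch of the two settings in this PR using Go's standard library. The flag names follow the PR text; parseFlags, newServer and the example address are hypothetical, and EarlyBird's real CLI may differ. On net/http's Server, keep-alives are disabled with SetKeepAlivesEnabled(false).

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// parseFlags is a hypothetical sketch of reading the two settings named in
// the PR; workerSize defaults to 100 as described there.
func parseFlags(args []string) (disableKeepAlive bool, workerSize int, err error) {
	fs := flag.NewFlagSet("earlybird-api", flag.ContinueOnError)
	fs.BoolVar(&disableKeepAlive, "disableKeepAlive", false, "close each HTTP connection after one request")
	fs.IntVar(&workerSize, "workerSize", 100, "number of scan workers")
	err = fs.Parse(args)
	return disableKeepAlive, workerSize, err
}

// newServer builds an http.Server; with keep-alives disabled it closes each
// connection after responding instead of keeping idle sockets open.
func newServer(addr string, disableKeepAlive bool) *http.Server {
	srv := &http.Server{Addr: addr, Handler: http.NewServeMux()}
	srv.SetKeepAlivesEnabled(!disableKeepAlive)
	return srv
}

func main() {
	disable, workers, err := parseFlags([]string{"-disableKeepAlive", "-workerSize=8"})
	if err != nil {
		fmt.Println(err)
		return
	}
	srv := newServer(":3000", disable) // ":3000" is an arbitrary example address
	fmt.Println(srv.Addr, disable, workers) // :3000 true 8
	// srv.ListenAndServe() would start serving; omitted here.
}
```

    Disabling keep-alives trades idle sockets for a new TCP connection per request, so it mainly pays off under the very high load described above.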
  • docs: fix typo in readme

    I found this typo in the README. It did get me thinking, though: should the docs consistently use EarlyBird or Earlybird? I see both spellings in the README.

  • Wrong verbose information about scanned files

    When I scan a folder containing 4 files, I see this in the log:

    Reading file
    Reading file /mnt/c/git-repo/earlybird/verify/.ge_ignore
    Reading file /mnt/c/git-repo/earlybird/verify/checkfile.properties
    Reading file /mnt/c/git-repo/earlybird/verify/checkfile2.properties
    ***** Total issues found *****
    0 TOTAL ISSUES

    4 files scanned in 18.7711ms

    The first entry is "Reading file" with no file name after it.

  • Base directory should be ignored during ignore matching

    This is more of an enhancement than a bug, but it would make things clearer.

    Currently the whole path is checked against the ignore file. Imagine the following situation:

    1. The ignore file contains the entry "/test/".
    2. The user downloads their projects into the /var/lib/test/projects/ directory.

    When EarlyBird is then run with -path "/var/lib/test/projects/", it ignores all project files and nothing is scanned.

    Proposed remediation: strip the base directory path before matching paths against the ignore patterns.
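
    A minimal sketch of the proposed remediation: compute each path relative to the base directory with filepath.Rel before matching, so directories above the scan root never take part. relativeForMatching and containsDirPattern are hypothetical helpers, the substring check is only a stand-in for EarlyBird's real pattern matcher, and Unix-style paths are assumed.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relativeForMatching returns path relative to baseDir with forward slashes,
// so components of baseDir itself (such as "test" in /var/lib/test/projects)
// can never match an ignore pattern.
func relativeForMatching(baseDir, path string) (string, error) {
	rel, err := filepath.Rel(baseDir, path)
	if err != nil {
		return "", err
	}
	return filepath.ToSlash(rel), nil
}

// containsDirPattern is a simplified stand-in for ignore matching: it reports
// whether a directory pattern such as "/test/" occurs in the relative path.
// It is NOT EarlyBird's wildcard matcher.
func containsDirPattern(rel, pattern string) bool {
	return strings.Contains("/"+rel, pattern)
}

func main() {
	base := "/var/lib/test/projects/"
	for _, p := range []string{
		"/var/lib/test/projects/app/main.go",
		"/var/lib/test/projects/app/test/fixtures.json",
	} {
		rel, err := relativeForMatching(base, p)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(rel, containsDirPattern(rel, "/test/"))
	}
	// app/main.go false
	// app/test/fixtures.json true
}
```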

  • Feature/installation revamp

    feature/installation_revamp

    • feat: Added Dockerfile
    • feat: Refactored build.bat and build.sh to perform similar tasks and provide uniform stdout
    • feat: Refactored install.bat and install.sh to perform similar tasks and provide uniform stdout
    • feat: Created uninstall.bat and uninstall.sh to clean up an EarlyBird installation and provide uniform stdout
    • feat: EarlyBird now evaluates files that are 0 bytes in size. This is important for counts and remediation reporting. Test condition also updated.
    • fix: Updated the pre-commit.sample script so it works with both Linux and Windows Git (removed pre-commit_windows.sample)
    • fix: File path refactor to ensure cross-environment compatibility (path to filepath)
    • fix: Refactored the isDirectory function to correctly detect whether a provided path is a directory. Test condition also updated
    • chore: README.md, HOOKs.md, USAGE.md documentation updates reflecting the updated installation processes
    • chore: CLI argument description updates while updating filepaths
    • chore: Date reference updates to 2023
    • chore: go.mod update
    • chore: Small refactors to code, logging and comments along the way.
  • Ignore false positive string

    I see that we have a way of ignoring a file. Can we introduce a way to ignore a string as well?

    For example, my .env.example contains placeholders:

    my_secret=ThisIsASecretToReplace

    I want developers to see this finding on the first run and then add "ThisIsASecretToReplace" to an exception list. This still forces them to think about the data they put in .env.example and always requires an initial review of the finding. Currently I have to ignore .env.example altogether, which means that if someone actually puts a valid secret in there, no one will notice (outside of the PR review).
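
    A minimal sketch of the requested value-level exception list: suppress only findings whose exact matched value has been reviewed, instead of ignoring the whole file. The Finding type, its fields, and filterAllowlisted are hypothetical and do not reflect EarlyBird's actual result structure.

```go
package main

import "fmt"

// Finding is a hypothetical, simplified scan result; EarlyBird's real result
// type has different fields.
type Finding struct {
	File  string
	Line  int
	Value string // the text that matched a rule
}

// filterAllowlisted drops findings whose exact matched value has been reviewed
// and added to the allowlist. The file itself is still scanned, so a real
// secret added later to the same file is still reported.
func filterAllowlisted(findings []Finding, allow map[string]bool) []Finding {
	var kept []Finding
	for _, f := range findings {
		if !allow[f.Value] {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	findings := []Finding{
		{File: ".env.example", Line: 1, Value: "ThisIsASecretToReplace"},
		{File: ".env.example", Line: 2, Value: "hypothetical-real-secret"},
	}
	allow := map[string]bool{"ThisIsASecretToReplace": true}
	for _, f := range filterAllowlisted(findings, allow) {
		fmt.Printf("%s:%d %s\n", f.File, f.Line, f.Value)
	}
	// .env.example:2 hypothetical-real-secret
}
```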

  • Feature Request: GitHub Action for EarlyBird

    I couldn't find a Discord/Slack/etc. to post this to, so I'm posting it here. It would be great if this repo provided a GitHub Action so that we could easily bake EarlyBird into our CI/CD.

  • ignore files issue

    I'm testing EarlyBird against an Ansible role repository, normally an easy one. I still get many false-positive matches that I'm trying to ignore with ~/.ge_ignore, but either the patterns are not working or the format is not what I expect. I tried plain file names, shell wildcard patterns, and also double wildcards like https://github.com/aschaef19/earlybird/blob/main/.ge_ignore

    $ go-earlybird -path=. -verbose
    2022/04/30 10:08:10 Go-EarlyBird version:  2.0.0
    Severity Fail threshold (at or above):  low
    Confidence Fail threshold (at or above):  low
    Severity Display threshold (at or above):  low
    Confidence Display threshold (at or above):  low
    Max file size to scan:  10240000  bytes
    2022/04/30 10:08:10 loading module:  ccnumber
    2022/04/30 10:08:10 loading module:  content
    2022/04/30 10:08:10 loading module:  filename
    2022/04/30 10:08:10 loading module:  inclusivity-rules
    2022/04/30 10:08:10 loading module:  password-secret
    2022/04/30 10:08:10 Scanning directory:  .
    2022/04/30 10:08:10 Ignore pattern:  *.git/*, .vagrant/, *.retry, .kitchen, inspec.lock, [._]*.s[a-v][a-z], [._]*.sw[a-p], [._]s[a-v][a-z], [._]sw[a-p], Session.vim, Sessionx.vim, .netrwhist, *~, tags, [._]*.un~, __pycache__/, *.py[cod], *$py.class, *.so, .Python, build/, develop-eggs/, dist/, downloads/, eggs/, .eggs/, lib/, lib64/, parts/, sdist/, var/, wheels/, *.egg-info/, .installed.cfg, *.egg, MANIFEST, *.manifest, *.spec, pip-log.txt, pip-delete-this-directory.txt, htmlcov/, .tox/, .coverage, .coverage.*, .cache, nosetests.xml, coverage.xml, *.cover, .hypothesis/, *.mo, *.pot, *.log, .static_storage/, .media/, local_settings.py, instance/, .webassets-cache, .scrapy, docs/_build/, target/, .ipynb_checkpoints, .python-version, celerybeat-schedule, *.sage.py, .env, .venv, env/, venv/, ENV/, env.bak/, venv.bak/, .spyderproject, .spyproject, .ropeproject, /site, .mypy_cache/, .DS_Store, .AppleDouble, .LSOverride, Icon, ._*, .DocumentRevisions-V100, .fseventsd, .Spotlight-V100, .TemporaryItems, .Trashes, .VolumeIcon.icns, .com.apple.timemachine.donotpresent, .AppleDB, .AppleDesktop, Network Trash Folder, Temporary Items, .apdisk, *~, .fuse_hidden*, .directory, .Trash-*, .nfs*, Thumbs.db, ehthumbs.db, ehthumbs_vista.db, *.stackdump, [Dd]esktop.ini, $RECYCLE.BIN/, *.cab, *.msi, *.msm, *.msp, *.lnk, secring.*, *.ca, *.crt, *.csr, *.der, *.kdb, *.org, *.p12, *.pem, *.rnd, *.ssleay, *.smime, **/.git, **/.gitignore, **/.github/workflows/galaxy.yml, **/.secrets.baseline, **/.pre-commit-config.yaml, **/test/earlybird/falsepositives-ansible.yaml, .git, .gitignore, .github/workflows/galaxy.yml, galaxy.yml, .secrets.baseline, .pre-commit-config.yaml, test/earlybird/falsepositives-ansible.yaml, falsepositives-ansible.yaml, */.git, */.gitignore, /.git, /.gitignore, */.git*, */.gitignore*, ./.git, ./.gitignore
    2022/04/30 10:08:10 Reading file  .ansible-lint
    2022/04/30 10:08:10 Reading file  .codespellignore
    2022/04/30 10:08:10 Reading file  .git
    2022/04/30 10:08:10 Reading file  .github/stale.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/codespell.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/default.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/dryrun-bare.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/earlybird.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/galaxy.yml
    2022/04/30 10:08:10 Reading file  .github/workflows/lint.yml
    2022/04/30 10:08:10 Reading file  .gitignore
    [...]
    

    From my reading of the code, "Reading file" should not appear if a file is correctly ignored (https://github.com/americanexpress/earlybird/blob/main/pkg/file/fileUtil.go#L196). The matching appears to be a custom character-by-character implementation, per https://github.com/americanexpress/earlybird/blob/main/pkg/wildcard/patternMatch.go

    Note that although it reports "Go-EarlyBird version: 2.0.0", this binary is from the latest download, i.e. https://github.com/americanexpress/earlybird/releases/download/v3.12.0/go-earlybird-linux

    Example run: https://github.com/juju4/ansible-adduser/runs/6142207431?check_suite_focus=true#step:6:1

    Thanks for sharing your work

  • Fix install.sh if clause

    Using == with a wildcard only works in bash, and [[ is also bash-only; this script runs under the Bourne shell, not bash. To compare against a wildcard pattern in the Bourne shell, we can use a case statement.
