Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

Lightweight static analysis for many languages.
Find bugs and enforce code standards.

Semgrep is a fast, open-source, static analysis tool that finds bugs and enforces code standards at editor, commit, and CI time. Precise rules look like the code you’re searching; no more traversing abstract syntax trees, wrestling with regexes, or using a painful DSL. Code analysis is performed locally (code is not uploaded) and Semgrep runs on uncompiled code.

The Semgrep Registry has 1,000+ rules written by the Semgrep community covering security, correctness, and performance bugs. No need to DIY unless you want to.

Semgrep is used in production everywhere from one-person startups to multi-billion dollar companies; it’s the engine inside tools like NodeJsScan. See tools powered by Semgrep.

Semgrep is developed and commercially supported by r2c, a software security company. r2c’s hosted service, Semgrep App, lets organizations easily deploy in CI, manage rules across many projects, monitor the efficacy of code policy, and integrate with 3rd-party services. r2c offers free and paid hosted tiers (see pricing).

Language support

General availability

Go · Java · JavaScript · JSX · JSON · Python · Ruby · TypeScript · TSX

Beta & experimental

See supported languages for the complete list.

Getting started

To install Semgrep use Homebrew or pip, or run without installation via Docker:

# For macOS
$ brew install semgrep

# For Ubuntu/WSL/Linux/macOS
$ python3 -m pip install semgrep

# To try Semgrep without installation run via Docker
$ docker run --rm -v "${PWD}:/src" returntocorp/semgrep --help

Once installed, Semgrep can run with single rules or entire rulesets. Visit Running rules to learn more or try the following:

# Check for Python == where the left and right hand sides are the same (often a bug)
$ semgrep -e '$X == $X' --lang=py path/to/src

# Run the r2c-ci ruleset (with rules for many languages) on your own code!
$ semgrep --config=p/r2c-ci path/to/src

Visit Getting started to learn more.
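
As a rough illustration of what the first command above looks for, the hypothetical Python snippet below contains the kind of self-comparison that `$X == $X` is meant to match, because the metavariable `$X` has to bind to the same expression on both sides of `==`. The class and its names are invented for this example.

```python
class Point:
    """Hypothetical class, used only to illustrate the `$X == $X` pattern."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Bug: `self.x == self.x` compares a value with itself, so it is
        # always True here. This is the shape `$X == $X` is meant to flag.
        # The intended check was `self.x == other.x`.
        return self.x == self.x and self.y == other.y


# Because of the bug, points with different x coordinates compare equal.
print(Point(1, 2) == Point(99, 2))  # prints True
```

Saving this as a file under path/to/src and running the `-e '$X == $X' --lang=py` command should report the `self.x == self.x` line.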

Rule examples

Visit Rule examples for use cases and ideas. There is also an excellent interactive tutorial.

Use case | Semgrep rule
Ban dangerous APIs | Prevent use of exec
Search routes and authentication | Extract Spring routes
Enforce the use of secure defaults | Securely set Flask cookies
Enforce project best-practices | Use assertEqual for == checks, Always check subprocess calls
Codify project-specific knowledge | Verify transactions before making them
Audit security hotspots | Finding XSS in Apache Airflow, Hardcoded credentials
Audit configuration files | Find S3 ARN uses
Migrate from deprecated APIs | DES is deprecated, Deprecated Flask APIs, Deprecated Bokeh APIs
Apply automatic fixes | Use listenAndServeTLS

Integrations

Visit Integrations to learn about Semgrep editor, commit, and CI integrations. When integrated into CI and configured to scan pull requests, Semgrep will only report issues introduced by that pull request; this lets you start using Semgrep without fixing or ignoring pre-existing issues!
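
The idea of reporting only issues introduced by a pull request can be sketched as a set difference between the findings on the base branch and the findings on the pull request. This is a minimal illustration, not Semgrep's actual implementation; the `Finding` record and `new_findings` helper are hypothetical names.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical, simplified finding record (not Semgrep's real schema)."""
    rule_id: str
    path: str
    code: str  # text of the matched line, so findings survive line shifts


def new_findings(baseline, current):
    """Return findings present on the pull request but not on the base branch."""
    seen = set(baseline)
    return [f for f in current if f not in seen]


baseline = [Finding("no-exec", "app.py", "exec(cmd)")]
current = [
    Finding("no-exec", "app.py", "exec(cmd)"),          # pre-existing
    Finding("no-exec", "util.py", "exec(user_input)"),  # introduced by the PR
]
print(new_findings(baseline, current))
```

Keying on the matched text rather than the line number is one possible design choice: it keeps a pre-existing finding from reappearing as "new" when unrelated edits shift it up or down.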

More

Upgrading

To upgrade, run the command below associated with how you installed Semgrep:

# Using Homebrew
$ brew upgrade semgrep

# Using pip
$ python3 -m pip install --upgrade semgrep

# Using Docker
$ docker pull returntocorp/semgrep:latest
Comments
  • C# support

    I'm opening a ticket to indicate interest in support for the C# Language. I can help with developing it if you can point me how to start and where to look.

  • Running semgrep on Apple M1 ARM system

  • Ability to exclude rules from rulesets on command line

    Is your feature request related to a problem? Please describe. I want to explore additional rulesets to add to my semgrep scans in CI. Before enabling a new ruleset in CI, however, I want to see the results on the whole codebase. So I just came across p/r2c-bug-scan that I had not noticed before. I have a convenience entry in my makefile that sort of approximates the .semgrepignore list that CI honors, like so:

    SEMGREP_IGNORE := --exclude third_party --exclude *_test.go --exclude *.pb.go --exclude *.pb.*.go --exclude *.bindata.go --exclude *.spec.ts --exclude coverage
    semgrep-test/%:
    	semgrep --config "p/$(@F)" $(SEMGREP_IGNORE)
    

    Thus, from the command-line I can run this:

    make semgrep-test/r2c-bug-scan
    

    That gave me 38 findings, but most of them were a single rule that does not really seem useful:

    components/compliance-service/inspec/cli.go
    severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for ioutil.WriteFile()
    61:	err = ioutil.WriteFile(filename, content, 0644)
    
    components/compliance-service/reporting/util/zip.go
    severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for os.Chmod()
    121:			err = os.Chmod(name, 0777)
    
    components/config-mgmt-service/cmd/chef-automate-collect/commands/automate_config.go
    severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for ioutil.WriteFile()
    396:		err := os.Mkdir(configDir, 0700)
    
    etc...
    

    As I said, that is not useful to me, so I know that when I add the ruleset to my policy on the semgrep dashboard, I will exclude that rule.

    Describe the solution you'd like Which brings us to my point here--would be great if I could do the same thing on the command-line, so I could filter out that rule and run my "evaluation scan" again, something like this:

    SEMGREP_IGNORE_FILES := --exclude third_party --exclude *_test.go --exclude *.pb.go --exclude *.pb.*.go --exclude *.bindata.go --exclude *.spec.ts --exclude coverage
    SEMGREP_IGNORE_RULES := --exclude-rule go.lang.correctness.permissions.file_permission.incorrect-default-permission
    semgrep-test/%:
    	semgrep --config "p/$(@F)" $(SEMGREP_IGNORE_FILES) $(SEMGREP_IGNORE_RULES)
    

    (No need to be fluent in make -- the request here is simply to provide an --exclude-rule command-line option.)
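
Until such an option exists, one possible workaround is to post-process Semgrep's JSON output and drop the unwanted rule there. The sketch below assumes a report with a top-level `results` list whose entries carry a `check_id` key; verify this against the `--json` output of your Semgrep version. `exclude_rules` and the second rule ID in the sample are hypothetical.

```python
import json


def exclude_rules(report, excluded_ids):
    """Return a copy of a Semgrep JSON report without the excluded rule IDs."""
    excluded = set(excluded_ids)
    kept = [r for r in report.get("results", []) if r.get("check_id") not in excluded]
    return {**report, "results": kept}


NOISY_RULE = "go.lang.correctness.permissions.file_permission.incorrect-default-permission"

# Made-up sample containing only the keys this sketch relies on.
sample = json.loads("""
{"results": [
  {"check_id": "go.lang.correctness.permissions.file_permission.incorrect-default-permission",
   "path": "components/compliance-service/inspec/cli.go"},
  {"check_id": "some.other.rule", "path": "main.go"}
]}
""")

filtered = exclude_rules(sample, [NOISY_RULE])
print(len(filtered["results"]))  # prints 1
```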

  • Semgrep 0.15.0 performance issues

    Describe the bug

    $  time semgrep -f ~/Code/njsscan/njsscan/rules/semantic_grep/ ~/Downloads/juice-shop-master/
    running 105 rules...
     10%|███████████████▍                                                                                                                                   |11/105
    

    More than 15 minutes and still waiting.

    Rules: https://github.com/ajinabraham/njsscan/tree/master/njsscan/rules/semantic_grep Source: https://github.com/bkimminich/juice-shop

    Enhancement

    It would be nice to add an end to end test suite with semgrep that runs before release updates on real apps to catch performance bugs.
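
One way such a pre-release check could look is a small harness that runs a command under a time budget and fails when the budget is exceeded. This is only a sketch: `run_within_budget` is a hypothetical helper, and a Python no-op stands in for the real Semgrep invocation so the example runs anywhere.

```python
import subprocess
import sys
import time


def run_within_budget(cmd, budget_seconds):
    """Run cmd; return elapsed seconds, or None if it exceeded the budget."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


# Stand-in command. In a real suite this would be the scan itself, for
# example ["semgrep", "-f", "path/to/rules", "path/to/app"].
elapsed = run_within_budget([sys.executable, "-c", "pass"], budget_seconds=60)
print("within budget" if elapsed is not None else "too slow")
```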

    Environment

    $ semgrep --version
    0.15.0
    
  • Rules with "too many matches" generate confusing "Timeout" error

    Describe the bug I am using semgrep in an Azure Pipeline for PR analysis. Semgrep times out on a small JS file (~6K lines) after 5 seconds. I have seen multiple threads related to timeouts in JS but did not find any conclusion or plan of action. I have tried the command locally with only this file, but no luck. Is there a workaround or any other option that I could use to get a scan no matter how long it takes?

    I have seen the use of -max_match_per_file flag being used for semgrep-core but not part of the CLI. Is it possible to use this with CLI? Will this help with time outs?

    To Reproduce Unfortunately, I can not share the JS file. Here is the command. The failing rule is javascript.browser.security.raw-html-concat.raw-html-concat

    semgrep --config "p/clientside-js" --timeout 0 --error --verbose --disable-version-check --include "********/application.js" --debug --strict

    Expected behaviour I expect the scan to complete, since I've set the timeout to 0.

    What is the priority of the bug to you? P0 - we are using semgrep to provide a report of secure coding practices to a client, but since this single file is not getting scanned we are unable to use semgrep.

    Environment Python package - 0.37.0

  • Very high core time (Windows)

    I am trying to troubleshoot why the execution on my codebase is so slow.

    I'm running the returntocorp/semgrep Docker image on Windows.

    ============================[ summary ]============================
    Total time: 251.3368s Config time: 2.9130s Core time: 248.4213s
    
    Semgrep-core time:
    Total CPU time: 8.5493s  File parse time: 4.5128s  Rule parse time: 0.3529s  Match time: 3.0912s
    Slowest 5/192 files
    ...abc.ts ( 58KB): 0.014s (0.204s to parse)
    ...qwe.ts ( 51KB): 0.013s (0.169s to parse)
    ...123.ts   ( 30KB): 1.082s (0.156s to parse)
    ...dummy.ts ( 37KB): 0.009s (0.146s to parse)
    ...other.ts ( 30KB): 0.009s (0.122s to parse)
    Slowest 5 rules to match
    ...ort.js-node.using-http-server.using-http-server:         0.000s
    ...transport.js-node.telnet-request.telnet-request:         0.000s
    ...st-http-client-support.rest-http-client-support:         0.002s
    ...ure-transport.js-node.http-request.http-request:         0.000s
    ...ecure-transport.js-node.ftp-request.ftp-request:         0.000s
    

    The reported times do not add up to the total time of about 250 seconds.

    Any tips?
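
The gap described above can be made explicit with a little arithmetic on the figures from the summary. The variable names are invented for this sketch; the numbers are copied from the report.

```python
# Figures copied from the summary above (seconds).
total_time = 251.3368
config_time = 2.9130
core_time = 248.4213    # wall-clock time attributed to semgrep-core
core_cpu_time = 8.5493  # "Total CPU time" reported by semgrep-core
parse_files, parse_rules, match = 4.5128, 0.3529, 3.0912

itemized_cpu = parse_files + parse_rules + match
unaccounted = core_time - core_cpu_time

print(f"config + core = {config_time + core_time:.4f}s of {total_time:.4f}s total")
print(f"itemized CPU  = {itemized_cpu:.4f}s of {core_cpu_time:.4f}s CPU")
print(f"unaccounted   = {unaccounted:.4f}s ({unaccounted / core_time:.1%} of core time)")
```

Roughly 240 of the 248 seconds of core time do not show up as CPU time. The summary alone does not say where that time goes (waiting on I/O, process startup, or something else), so this only narrows down where to look.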

  • semgrep throws exception on python setup.py install

    Describe the bug A clear and concise description of what the bug is.

    Processing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl
    Installing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl to /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages
    Adding semgrep 0.31.1 to easy-install.pth file
    Traceback (most recent call last):
      File "setup.py", line 56, in <module>
        install_requires=requirements,
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
        return distutils.core.setup(**attrs)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
        self.do_egg_install()
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 117, in do_egg_install
        cmd.run(show_deprecation=False)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 424, in run
        self.easy_install(spec, not self.no_deps)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 673, in easy_install
        return self.install_item(None, spec, tmpdir, deps, True)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
        self.process_distribution(spec, dist, deps)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 765, in process_distribution
        [requirement], self.local_index, self.easy_install
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 783, in resolve
        replace_conflicting=replace_conflicting
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1066, in best_match
        return self.obtain(req, installer)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1078, in obtain
        return installer(requirement)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 692, in easy_install
        return self.install_item(spec, dist.location, tmpdir, deps)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
        self.process_distribution(spec, dist, deps)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 745, in process_distribution
        self.install_egg_scripts(dist)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 619, in install_egg_scripts
        dist.get_metadata('scripts/' + script_name)
      File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1426, in get_metadata
        return value.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte in scripts/spacegrep file at path: /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/semgrep-0.31.1-py3.7-macosx-10.9-x86_64.egg/EGG-INFO/scripts/spacegrep
    

    To Reproduce Steps to reproduce the behavior, ideally a link to https://semgrep.dev:

    Add semgrep to a setup.py script and run python setup.py install

    Expected behavior install gracefully

    Screenshots If applicable, add screenshots to help explain your problem.

    What is the priority of the bug to you? NA

    Environment pypi, 0.31.1
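
The error message in the traceback is consistent with setuptools trying to read a compiled binary as a UTF-8 text script. As an assumption for illustration, suppose scripts/spacegrep is a 64-bit Mach-O executable: such files start with the bytes cf fa ed fe, and decoding them reproduces the same error text.

```python
# First four bytes of a 64-bit Mach-O executable (magic 0xfeedfacf, stored
# little-endian). Assumption: the bundled `spacegrep` script is such a binary.
macho_magic = bytes([0xCF, 0xFA, 0xED, 0xFE])

decode_error = None
try:
    macho_magic.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xcf announces a two-byte UTF-8 sequence, but 0xfa is not a valid
    # continuation byte, so decoding fails at position 0.
    decode_error = err

print(decode_error)
```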

  • Can't suppress with // nosemgrep because of autoformatter

    I can’t use //nosemgrep because my autoformatter moves it to its own line, where it is ignored by semgrep.

    To reproduce: https://semgrep.dev/s/340G Desired behavior: I'd like all of the examples in the above snippet to be suppressed by the nosemgrep annotation, including:

    • where nosemgrep is on its own line before the target line
    • where nosemgrep is inside the matched range.

    This is currently blocking me from merging my PR.

    cc @mschwager @minusworld curious if you have thoughts
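
The two requested behaviors can be sketched as a small predicate over the file's lines and the matched range. This is a hypothetical illustration of the desired semantics, not Semgrep's suppression code; `is_suppressed` and its parameters are invented names.

```python
def is_suppressed(lines, start, end, marker="nosemgrep"):
    """Decide whether a match spanning lines start..end (1-based, inclusive)
    should be suppressed by a marker comment."""
    # Case 1: the marker appears anywhere inside the matched range.
    if any(marker in line for line in lines[start - 1:end]):
        return True
    # Case 2: the marker sits on a comment-only line directly before the match.
    if start >= 2:
        previous = lines[start - 2].strip()
        return previous.startswith(("//", "#")) and marker in previous
    return False


source = [
    "// nosemgrep",     # 1: moved to its own line by the autoformatter
    "eval(input);",     # 2: suppressed by line 1
    "eval(other);",     # 3: not suppressed
    "foo(",             # 4: start of a multi-line match
    "  // nosemgrep",   # 5: marker inside the matched range
    "  eval(x)",        # 6
    ");",               # 7: end of the multi-line match
]
print(is_suppressed(source, 2, 2), is_suppressed(source, 3, 3), is_suppressed(source, 4, 7))
```

Requiring the preceding line to be comment-only keeps a trailing marker on one statement from silently suppressing the next statement as well.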

  • update to ocamlformat 0.16.0

    This is needed if we want to use ppxlib 0.20 (itself needed if we want to use OCaml 4.12.0, itself needed if we don't want errors on macos HomeBrew)

    test plan: pre-commit run --all lint-ocaml

    PR checklist:

    • [x] changelog is up to date
  • [call for ideas] Eliminate stack overflows or troubleshoot them more easily

    In OCaml (native code), stack overflows often result in a segfault rather than a nice Stack_overflow exception. Any practical solution that mitigates this problem is welcome.

    Stack overflows may occur in the normal use of semgrep if the maximum stack size is too small and the input is a bit extreme. Things we can do include:

    • Don't use the stack unless necessary (e.g. don't use Stdlib.List.map).
    • Use ulimit -s to lower the maximum stack size during testing and development; raise the maximum stack size in production.
    • Count the recursive calls to a few functions that are known for occasionally causing stack overflows (and that can't be avoided), and print a warning when passing a threshold.
    • Modify the ocaml runtime so it always raises Stack_overflow on stack overflows.
    • Find some other trick to detect when the stack size approaches the limit.
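
The idea of counting recursive calls and warning past a threshold can be sketched as follows. semgrep-core is written in OCaml, so this Python version is only an analogy; `tree_depth` and the threshold value are hypothetical.

```python
import warnings

DEPTH_WARNING_THRESHOLD = 50  # arbitrary value chosen for this sketch


def tree_depth(node, _depth=1):
    """Depth of a nested-list tree, warning once the recursion gets deep."""
    if _depth == DEPTH_WARNING_THRESHOLD:
        warnings.warn(f"recursion depth reached {_depth}; input may overflow the stack")
    children = [child for child in node if isinstance(child, list)]
    return 1 + max((tree_depth(child, _depth + 1) for child in children), default=0)


# Build a list nested 60 levels deep, which crosses the threshold once.
nested = []
for _ in range(59):
    nested = [nested]
print(tree_depth(nested))  # prints 60
```

The warning fires before any crash, so an extreme input leaves a trace that is easier to diagnose than a segfault.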
  • feat!: new pattern operators

    What: You know what they say, the most important thing about a language is its concrete syntax.

    Alright, so this is the big one. This PR concerns the changing of the rule syntax from its previous form to a more standardized, disciplined form, which hopes to be less confusing to users, more intuitive to use, and enable a few features along the way.


    Why: Why should we do this, I hear you asking?

    The old syntax is somewhat verbose. Users need to remember concrete syntax such as patterns and pattern-either, as well as very long key names such as metavariable-comparison and metavariable-analysis. Additionally, because every pattern must fall under a pattern key, we often end up with heavy duplication of pattern, for example underneath a patterns. Moreover, pattern and patterns are hard to tell apart visually.

    There are a few issues already discussing this change: https://github.com/returntocorp/semgrep/issues/1489 https://github.com/returntocorp/semgrep/issues/5840 https://github.com/returntocorp/semgrep/issues/4647 https://github.com/returntocorp/semgrep/issues/4484 https://github.com/returntocorp/semgrep/issues/5854

    The first is a proposal to switch to this new syntax, the rest are problems that are related to the fact that we do not have it.

    There are two main claims to be had here: one, this syntax will make adoption of Semgrep easier for users. Two, it will add genuine new features which make Semgrep easier/more intuitive to use.


    How:

    A New Syntax

    We have changed to a new concrete syntax. Notably, there are only a few constructs. An artist's rendition of the new rule syntax's EBNF is pictured below:

    [image: whiteboard sketch of the new rule syntax's EBNF]

    It fits on a whiteboard!

    Base Patterns

    Notably, we have inside, not, and, or, and regex as "base" patterns. not can also now take in entire formulas, rather than being constrained to "base patterns". pattern is around too, but now it's pretty much optional.

    This means we can now write rules like the following:

    id: ey-look-an-id
    match:
      and:
      - "a"
      - "b"
    message: ey look a message
    languages:
      - python
    severity: WARNING
    

    The pattern appears underneath a match:, which signals that we are using the new syntax.

    Backwards Compatibility

    We remain fully backwards compatible, so a top-level patterns, pattern, or whatever else was allowed before will trigger a parse via the old syntax, into a common formula language. Otherwise, we use the new one.

    pattern Is Optional

    The nice thing is that standalone patterns may directly be given into the list of a combinator pattern, such as and. Previously, you would have to write

    patterns:
    - pattern: "a"
    - pattern: "b"
    - pattern: "c"
    - pattern: "d"
    

    and now you can write

    and:
    - "a"
    - "b"
    - "c"
    - "d"
    

    Where-clauses

    Additionally, we support more explicit modifiers on patterns. Previously, things like focus-metavariable had to be combined with other patterns underneath a patterns. This was effectively the same as just enforcing some constraint on the intersection of the other patterns given to the patterns, but this meant that it had to be explicitly checked for being under a patterns, as well as just generally being harder to intuitively think about.

    Now, we support syntax like the following:

    match:
      and:
      - "a"
      - "b"
      where:
      - comparison: $A > 5
      - focus: $A
      - metavariable: "A"
        pattern: "A"
    

    The where (which must exist in a two-dictionary with some other pattern) now directly modifies the other pattern it is shared with. Additionally, we use terser keys such as comparison and focus. metavariable-comparison also used to demand that you explicitly name what the metavariable is, despite the fact that it is in the comparison, so we eliminated that as well.

    not as a Parent Pattern

    not is now allowed to take in a pattern, instead of a string to match. This is huge for the expressivity of the rules language, since before we had to deal with pattern-not, pattern-not-inside, and others. Not only is this cumbersome syntactically, but there were some things we could not express, cf. the above issues.

    Migration

    There is also a translation command you can run via sc -translate_rules, which can help users migrate their rulebases to the new syntax automatically. We could also use it to migrate the Registry.
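
To give a feel for what such a migration involves, here is a deliberately simplified translator for a handful of operators. It is not the translation command from this PR. The key mapping is an assumption based on the base patterns listed above (in particular, rendering pattern-not-inside as not around inside), and where-clauses and metavariable-* keys are not handled.

```python
# Assumed mapping from old operator names to new ones (illustrative only).
OLD_TO_NEW = {
    "patterns": "and",
    "pattern-either": "or",
    "pattern-inside": "inside",
    "pattern-regex": "regex",
}


def translate(formula):
    """Translate one old-style formula (a single-key dict) to the new style."""
    (key, value), = formula.items()
    if key == "pattern":
        return value  # bare strings replace `pattern:`
    if key == "pattern-not":
        return {"not": value}
    if key == "pattern-not-inside":
        return {"not": {"inside": value}}
    if key in ("patterns", "pattern-either"):
        return {OLD_TO_NEW[key]: [translate(child) for child in value]}
    if key in OLD_TO_NEW:
        return {OLD_TO_NEW[key]: value}
    raise ValueError(f"unsupported operator in this sketch: {key}")


old = {"patterns": [{"pattern": "a"}, {"pattern": "b"}, {"pattern-not": "c"}]}
print(translate(old))  # prints {'and': ['a', 'b', {'not': 'c'}]}
```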


    Concerns:

    • Documentation does not exist yet
    • Translation from old-style rules to new-style rules is strictly programmatic and may result in rules which are more complicated than a human may write.

    Closes #1489

    PR checklist:

    • [ ] Tests included or PR comment includes a reproducible test plan
    • [ ] Documentation is up-to-date (TODO)
    • [ ] changelog.d/<issue>.<type> is a file with the what, why, and how of the change.
      • <issue> is pa-312 (Linear ticket), gh-1234 (GitHub issue), or new-gizmo (unique semantic name)
      • <type> is added, changed, fixed, or infra.
    • [X] Change has no security implications (otherwise, ping security team)
  • How to use metavariable-regex for variable in pattern-not-include to filter some cases ?

    Describe the bug How can metavariable-regex be used on a variable in pattern-not-inside to filter out some cases, for example uninteresting sources from a config file (which I would like to describe with a regex pattern)? I tried the rule below, but it does not work and produces no results at all.

    To Reproduce A rule that uses metavariable-regex on a variable in pattern-not-inside to check for bash injection in Go:

    rules:
    - id: bash_injection
      languages: [go]
      severity: ERROR
      patterns:
      - pattern-either:
          - pattern: |
              exec.Command("$SHELL", "-c", fmt.Sprintf($F, $ARG, ...), ...)
      - pattern-inside: |
          import "os/exec"
          ...
      - pattern-not-inside: |
          $ARG = "$TEST"
          ...
      - metavariable-regex:
          metavariable: $SHELL
          regex: ^(sh|bash|zsh|dash|cmd|powershell|csh|ksh|vsh)$
      - metavariable-regex:
          metavariable: $TEST
          regex: .*
      message: $SHELL injection
    

    A test file: in ShouldMatch, the path is a variable; in ShouldNotMatch, the path comes from an uninteresting source such as a constant:

    package test
    
    import (
    	"fmt"
    	"os/exec"
    )
    
    func ShouldMatch(path string) bool {
    	cmd := "ls %s"
    	out, err := exec.Command("sh", "-c", fmt.Sprintf(cmd, path)).Output()
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(out)
    	return true
    }
    
    func ShouldNotMatch(path string) bool {
    	path = "whatever"
    	cmd := "ls %s"
    	out, err := exec.Command("sh", "-c", fmt.Sprintf(cmd, path)).Output()
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(out)
    	return true
    }
    

    Expected behavior Command Injection in ShouldMatch is matched, ShouldNotMatch is not matched

    Screenshots [image: not output]

    Use case Using metavariable-regex with pattern-not-inside seems to lead to this problem. I want to know how to use metavariable-regex on a variable in pattern-not-inside to filter out some cases, not only constant values.
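
The two regular expressions from the rule can be checked in isolation with Python's re module. This only tests the regexes themselves; it makes no claim about how Semgrep treats a metavariable such as $TEST that is bound only inside a negated pattern.

```python
import re

# Regexes copied from the rule above.
shell_regex = re.compile(r"^(sh|bash|zsh|dash|cmd|powershell|csh|ksh|vsh)$")
test_regex = re.compile(r".*")

print(bool(shell_regex.search("sh")))      # prints True
print(bool(shell_regex.search("python")))  # prints False

# `.*` matches every string, including the empty one, so as far as the regex
# goes, the constraint on $TEST cannot tell one constant from another.
print(all(test_regex.search(s) is not None for s in ["whatever", "", "ls %s"]))  # prints True
```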

  • ocamlformat on pfff and other things

  • feat(julia): support the julia language

    Commentary:

    It's probably easier to read the file directly than it is to try and look at the diff.

    The grammar changes included in semgrep-julia were so drastic from the last time I looked at it, that I had to take the previous Parse_julia_tree_sitter.ml I wrote and basically scrap it, because so much of it was now different. As such, perhaps it's best to look at this file with a clear eye.

    I eliminated pretty much every single TODO, in favor of making sacrifices and tradeoffs that would allow us to parse for now, if at the cost of some semantic information. In particular, I've made liberal use of Other, as Julia's implementation means that quite often, things can appear in places that we do not expect.

    In particular, statements and expressions have really no meaning, macro-expressions can appear everywhere an identifier can (making things like macro definitions, imports, etc more difficult), and it's annoying that things which should be patterns are represented in the grammar as just plain expressions. For the latter, I've just injected them into an Other for now. Comments have been left, hopefully clearing up the places that I have had to make these tradeoffs, other than just everywhere an Other appears.

    Test plan:

    ./run-lang julia

    PR checklist:

    • [X] Purpose of the code is evident to future readers
    • [X] Tests included or PR comment includes a reproducible test plan
    • [ ] Documentation is up-to-date
    • [X] A changelog entry was added to changelog.d for any user-facing change
    • [X] Change has no security implications (otherwise, ping security team)

  • discussion: pattern-inside-taint and reviving taint patterns

    This is not a contribution yet, just a demo implementation of #3801 pattern inside a control flow, starting from the stale #5993.

    Tests successfully in three languages, but it breaks many other tests, so it would need a flag (or a where?)

    Instead of returning the range of the sink, it returns the range from the start of the source to the end of the sink.
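
The change in the reported range can be pictured with a tiny helper. `Span` and `source_to_sink` are hypothetical names, positions are simplified to plain offsets, and the source is assumed to come before the sink.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical half-open range of character offsets."""
    start: int
    end: int


def source_to_sink(source, sink):
    """Range from the start of the taint source to the end of the sink."""
    return Span(source.start, sink.end)


source = Span(10, 25)   # e.g. where tainted data enters
sink = Span(120, 140)   # e.g. where it reaches a dangerous call
print(source_to_sink(source, sink))  # prints Span(start=10, end=140)
```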

    I implemented pattern-inside-taint a year ago, but only found time now to catch up with all the exciting changes and hopefully contribute back.

    Is there any interest in reviving taint patterns #5993? As soon as I saw the whiteboard in new pattern operators #5916, I realized that others had implemented this feature in a much better way than I had. 😉

  • Unescaped `$` in PySpark file causes Parse Error while Creating report

    Source issue

    https://gitlab.com/gitlab-org/gitlab/-/issues/373113

    Describe the bug

    A GitLab customer reported that their Semgrep SAST job fails while "Creating report". They discovered this was reproducible if the .py file contained PySpark syntax using a $ character that is not escaped (\$) and is not inside a string. They isolated the part of the file that was causing the issue.

    The fatal error seems to be caused upstream in semgrep.

    Update

    For what it's worth, newer versions of the Semgrep analyzer won't fatally crash when encountering syntax errors like these -- they'll fail with a partial scan and produce an empty report instead. Below are the logs from running Semgrep locally against the file in Step 2:

    [INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/command.go:76] ▶ GitLab Semgrep analyzer v3.10.0
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_TARGET_DIR,CI_PROJECT_DIR=/tmp/app
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_ARTIFACT_DIR,CI_PROJECT_DIR=/tmp/app
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_INDENT_REPORT=false
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_OPTIMIZE_REPORT=true
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ADDITIONAL_CA_CERT_BUNDLE=
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_IGNORED_DIRS=bundle,node_modules,vendor,tmp,test,tests
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_IGNORE_HIDDEN_DIRS=true
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_MAX_DEPTH=2
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_SEMGREP_METRICS=true
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXPERIMENTAL_FEATURES=false
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXCLUDED_PATHS=
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_SCANNER_ALLOWED_CLI_OPTS=
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXCLUDED_PATHS=
    [INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:131] ▶ Detecting project
    [INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:153] ▶ Analyzer will attempt to analyze all projects in the repository
    [INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:165] ▶ Running analyzer
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/src/buildapp/analyze.go:90] ▶ custom rulesets not enabled
    [DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/src/buildapp/analyze.go:128] ▶ /usr/local/bin/semgrep -f /rules -o /tmp/app/semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --enable-metrics
    [DEBU] [Semgrep] [2022-12-16T05:44:25Z] [/go/src/buildapp/analyze.go:137] ▶ METRICS: Using configs from the Registry (like --config=p/ci) reports pseudonymous rule metrics to semgrep.dev.
    To disable Registry rule metrics, use "--metrics=off".
    Using configs only from local files (like --config=xyz.yml) does not enable metrics.
    
    More information: https://semgrep.dev/docs/metrics
    
    Scanning 1 file with 91 python rules.
    
    Some files were skipped or only partially analyzed.
      Partially scanned: 1 files only partially analyzed due to a parsing or internal Semgrep error
    
    Ran 311 rules on 1 file: 0 findings.
    
    [INFO] [Semgrep] [2022-12-16T05:44:25Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:179] ▶ Creating report
    [DEBU] [Semgrep] [2022-12-16T05:44:25Z] [/go/src/buildapp/convert.go:40] ▶ Converting report with the root path: /tmp/app
    [WARN] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/sarif.go:344] ▶ tool notification warning: Syntax error Syntax error at line test.py:2:
     `$` was unexpected
    [DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:212] ▶ custom rulesets not enabled
    [DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:254] ▶ Applying report overrides
    [DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:260] ▶ custom rulesets not enabled
    [DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/jsonout.go:54] ▶ Optimizing JSON Output
    

    This updated behavior is similar to https://gitlab.com/gitlab-org/gitlab/-/issues/374551

    To Reproduce

    1. Create a new GitLab project https://gitlab.com/projects/new

    2. Create a test.py file with the following content:

    savetable \
        .withColumn('ttlinminutes',lit($ttl$)) \
        .write.mode('append').format('orc').insertInto('$customSchema$.all_refreshes')
    
    3. Create a .gitlab-ci.yml with the following content:
    include:
      - template: Security/SAST.gitlab-ci.yml
    
    4. Change test.py so that line 2 has escaped $ characters (\$), as in the example below:
    savetable \
        .withColumn('ttlinminutes',lit(\$ttl\$)) \
        .write.mode('append').format('orc').insertInto('$customSchema$.all_refreshes')
    

    Expected behavior

    Step 3: The Semgrep job runs, creates a report, and uploads the report as an artifact

    Step 4: The Semgrep job runs, creates a report, and uploads the report as an artifact

    What is the priority of the bug to you?

    • [x] P0: blocking your adoption of Semgrep or workflow
    • [ ] P1: important to fix or quite annoying
    • [ ] P2: regular bug that should get fixed

    Environment

    Docker

    Use case

    Unescaped $ in PySpark file will not cause error while Creating report

  • cli: don't retry non-HTTP exceptions + provide a more descriptive error for SSL cert hostname mismatches

    This PR does two things to make things easier for folks behind intercepting proxies:

    1. Add more context to the CertificateError raised in the case of an SSL host mismatch
    2. Don't retry non-HTTP exceptions

    Before:

    semgrep % SEMGREP_URL=https://wrong.host.badssl.com/ semgrep --config auto
    Semgrep rule registry URL is https://wrong.host.badssl.com/.
    [ERROR] Failed to download config from https://wrong.host.badssl.com/c/auto: HTTPSConnectionPool(host='wrong.host.badssl.com',
    port=443): Max retries exceeded with url: /c/auto (Caused by SSLError(CertificateError("hostname 'wrong.host.badssl.com'
    doesn't match either of '*.badssl.com', 'badssl.com'")))
    [ERROR] invalid configuration file found (1 configs were invalid)
    

    After:

    cli % SEMGREP_URL=https://wrong.host.badssl.com/ pipenv run python src/semgrep/__main__.py --config auto
    Semgrep rule registry URL is https://wrong.host.badssl.com/.
    [ERROR] Failed to download config from https://wrong.host.badssl.com/c/auto: SSL certificate error: hostname
    'wrong.host.badssl.com' doesn't match either of '*.badssl.com', 'badssl.com'. This error typically occurs when your
    internet traffic is being routed through a proxy. If this is the case, try setting the REQUESTS_CA_BUNDLE environment
    variable to the location of your proxy's CA certificate.
    [ERROR] invalid configuration file found (1 configs were invalid)
    

    Open to feedback re: the error text!

    PR checklist:

    • [ ] Purpose of the code is evident to future readers
    • [ ] Tests included or PR comment includes a reproducible test plan
    • [ ] Documentation is up-to-date
    • [ ] A changelog entry was added to changelog.d for any user-facing change
    • [ ] Change has no security implications (otherwise, ping security team)

    If you're unsure on any of this, please see:

The dynamic infrastructure framework for everybody! Distribute the workload of many different scanning tools with ease, including nmap, ffuf, masscan, nuclei, meg and many more!

Axiom is a dynamic infrastructure framework to efficiently work with multi-cloud environments, build and deploy repeatable infrastructure focussed on

Dec 30, 2022
🔎 Help find Trojan Source vulnerability in code 👀 . Useful for code review in project with multiple collaborators

TrojanSourceFinder TrojanSourceFinder helps developers detect "Trojan Source" vulnerability in source code. Trojan Source vulnerability allows an atta

Nov 9, 2022
Static binary analysis tool to compute shared strings references between binaries and output in JSON, YAML and YARA

StrTwins StrTwins is a binary analysis tool, powered by radare, that is capable to find shared code string references between executables and output i

May 3, 2022
Proto-find is a tool for researchers that lets you find client side prototype pollution vulnerability.

proto-find proto-find is a tool for researchers that lets you find client side prototype pollution vulnerability. How it works proto-find open URL in

Dec 6, 2022
Look for JAR files that vulnerable to Log4j RCE (CVE‐2021‐44228)

Look4jar Look for JAR files that vulnerable to Log4j RCE (CVE‐2021‐44228) Objectives It differs from some other tools that scan for vulnerable remote

Dec 25, 2022
DockerSlim (docker-slim): Don't change anything in your Docker container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source)

Minify and Secure Docker containers (free and open source!) Don't change anything in your Docker container image and minify it by up to 30x making it

Dec 27, 2022
AI-Powered Code Reviews for Best Practices & Security Issues Across Languages

AI-CodeWise AI-Powered Code Reviews for Best Practices & Security Issues Across Languages AI-CodeWise GitHub Action: Your AI-powered Code Reviewer!

May 11, 2023
A collection of cool tools used by Mobile hackers. Happy hacking , Happy bug-hunting

A collection of cool tools used by Mobile hackers. Happy hacking , Happy bug-hunting Family project Table of Contents Weapons Contribute Thanks to con

Jan 3, 2023
A fast port scanner written in go with a focus on reliability and simplicity. Designed to be used in combination with other tools for attack surface discovery in bug bounties and pentests

Naabu is a port scanning tool written in Go that allows you to enumerate valid ports for hosts in a fast and reliable manner. It is a really simple to

Dec 31, 2022
log4jshell vulnerability scanner for bug bounty

log4shell-looker a log4jshell vulnerability scanner for bug bounty (Written in G

Dec 10, 2022
Auto scan log4j bug with excel of server list

Log4JCheck Auto scan log4j bug with excel of server list. Please read https://ww

Dec 24, 2021
Ah shhgit! Find secrets in your code. Secrets detection for your GitHub, GitLab and Bitbucket repositories: www.shhgit.com

shhgit helps secure forward-thinking development, operations, and security teams by finding secrets across their code before it leads to a security br

Dec 23, 2022
A Go Library For Generating Random, Rule Based Passwords. Many Random, Much Secure.

Can Haz Password? A Go library for generating random, rule based passwords. Many random, much secure. Features Randomized password length (bounded). T

Dec 6, 2021
Gofrette is a reverse shell payload developed in Golang that bypasses Windows defender and many others anti-virus.

Gofrette Gofrette is a reverse shell payload developed in Golang that bypasses Windows defender and many others anti-virus.

Dec 14, 2022
PHP functions implementation to Golang. This package is for the Go beginners who have developed PHP code before. You can use PHP like functions in your app, module etc. when you add this module to your project.

PHP Functions for Golang - phpfuncs PHP functions implementation to Golang. This package is for the Go beginners who have developed PHP code before. Y

Dec 30, 2022
🌘🦊 DalFox(Finder Of XSS) / Parameter Analysis and XSS Scanning tool based on golang

Finder Of XSS, and Dal(달) is the Korean pronunciation of moon. What is DalFox DalFox is a fast, powerful parameter analysis and XSS scanner, bas

Jan 5, 2023
EarlyBird is a sensitive data detection tool capable of scanning source code repositories for clear text password violations, PII, outdated cryptography methods, key files and more.

EarlyBird is a sensitive data detection tool capable of scanning source code repositories for clear text password violations, PII, outdated cryptograp

Dec 10, 2022
Find secrets and passwords in container images and file systems

Find secrets and passwords in container images and file systems

Jan 1, 2023
SSRFuzz is a tool to find Server Side Request Forgery vulnerabilities, with CRLF chaining capabilities

SSRFuzz is a tool to find Server Side Request Forgery vulnerabilities, with CRLF chaining capabilities Why?

Dec 8, 2022