Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

Last update: Jan 9, 2023

Comments: 17

Find bugs and enforce code standards.

Semgrep is a fast, open-source, static analysis tool that finds bugs and enforces code standards at editor, commit, and CI time. Precise rules look like the code you’re searching; no more traversing abstract syntax trees, wrestling with regexes, or using a painful DSL. Code analysis is performed locally (code is not uploaded) and Semgrep runs on uncompiled code.

The Semgrep Registry has 1,000+ rules written by the Semgrep community covering security, correctness, and performance bugs. No need to DIY unless you want to.

Semgrep is used in production everywhere from one-person startups to multi-billion dollar companies; it’s the engine inside tools like NodeJsScan. See tools powered by Semgrep.

Semgrep is developed and commercially supported by r2c, a software security company. r2c’s hosted service, Semgrep App, lets organizations easily deploy in CI, manage rules across many projects, monitor the efficacy of code policy, and integrate with 3rd-party services. r2c offers free and paid hosted tiers (see pricing).

Language support

General availability

Go · Java · JavaScript · JSX · JSON · Python · Ruby · TypeScript · TSX

Beta & experimental

See supported languages for the complete list.

Getting started

To install Semgrep use Homebrew or pip, or run without installation via Docker:

# For macOS
$ brew install semgrep

# For Ubuntu/WSL/Linux/macOS
$ python3 -m pip install semgrep

# To try Semgrep without installation run via Docker
$ docker run --rm -v "${PWD}:/src" returntocorp/semgrep --help

Once installed, Semgrep can run with single rules or entire rulesets. Visit Running rules to learn more or try the following:

# Check for Python == where the left and right hand sides are the same (often a bug)
$ semgrep -e '$X == $X' --lang=py path/to/src

# Run the r2c-ci ruleset (with rules for many languages) on your own code!
$ semgrep --config=p/r2c-ci path/to/src

Visit Getting started to learn more.
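To make the first command above concrete, here is a rough Python sketch of the kind of code a `$X == $X` pattern is meant to flag. It uses Python's standard `ast` module as a stand-in and is not how Semgrep works internally; the function name `find_self_comparisons` and the sample code are made up for illustration.

```python
import ast
from typing import List


def find_self_comparisons(source: str) -> List[int]:
    """Return line numbers of `X == X` comparisons whose two sides are structurally identical."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump omits line/column info by default, so equal dumps mean equal structure
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample input: line 2 compares user.id with itself.
sample = """\
def check(user, other):
    if user.id == user.id:
        return True
    return user.name == other.name
"""

print(find_self_comparisons(sample))
```

With Semgrep the same check is the one-line pattern shown above, with no tree traversal to write by hand.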

Rule examples

Visit Rule examples for use cases and ideas. There is also an excellent interactive tutorial.

Use case	Semgrep rule
Ban dangerous APIs	Prevent use of exec
Search routes and authentication	Extract Spring routes
Enforce the use of secure defaults	Securely set Flask cookies
Enforce project best-practices	Use assertEqual for == checks, Always check subprocess calls
Codify project-specific knowledge	Verify transactions before making them
Audit security hotspots	Finding XSS in Apache Airflow, Hardcoded credentials
Audit configuration files	Find S3 ARN uses
Migrate from deprecated APIs	DES is deprecated, Deprecated Flask APIs, Deprecated Bokeh APIs
Apply automatic fixes	Use listenAndServeTLS

Integrations

Visit Integrations to learn about Semgrep editor, commit, and CI integrations. When integrated into CI and configured to scan pull requests, Semgrep will only report issues introduced by that pull request; this lets you start using Semgrep without fixing or ignoring pre-existing issues!

Upgrading

To upgrade, run the command below associated with how you installed Semgrep:

# Using Homebrew
$ brew upgrade semgrep

# Using pip
$ python3 -m pip install --upgrade semgrep

# Using Docker
$ docker pull returntocorp/semgrep:latest

Owner

r2c

https://github.com/returntocorp/semgrep https://semgrep.dev

Comments

C# support

I'm opening a ticket to indicate interest in support for the C# language. I can help develop it if you can point me to where to start and where to look.

Running semgrep on Apple M1 ARM system

The new Apple M1 chips are ARM-based. Currently, our OCaml binaries are compiled for x86-64. It seems the new M1 chips have x86 emulation under something called Rosetta 2, so things may Just Work. Regardless, we should test things out to ensure they do.

Ability to exclude rules from rulesets on command line

Is your feature request related to a problem? Please describe. I want to explore additional rulesets to add to my semgrep scans in CI. Before enabling a new ruleset in CI, however, I want to see the results on the whole codebase. So I just came across p/r2c-bug-scan that I had not noticed before. I have a convenience entry in my makefile that sort of approximates the .semgrepignore list that CI honors, like so:

SEMGREP_IGNORE := --exclude third_party --exclude *_test.go --exclude *.pb.go --exclude *.pb.*.go --exclude *.bindata.go --exclude *.spec.ts --exclude coverage
semgrep-test/%:
	semgrep --config "p/$(@F)" $(SEMGREP_IGNORE)

Thus, from the command-line I can run this:

make semgrep-test/r2c-bug-scan

That gave me 38 findings, but most of them were a single rule that does not really seem useful:

components/compliance-service/inspec/cli.go
severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for ioutil.WriteFile()
61:	err = ioutil.WriteFile(filename, content, 0644)

components/compliance-service/reporting/util/zip.go
severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for os.Chmod()
121:			err = os.Chmod(name, 0777)

components/config-mgmt-service/cmd/chef-automate-collect/commands/automate_config.go
severity:warning rule:go.lang.correctness.permissions.file_permission.incorrect-default-permission: Expect permissions to be `0600` or less for ioutil.WriteFile()
396:		err := os.Mkdir(configDir, 0700)

etc...

As I said, that is not useful to me, so I know that when I add the ruleset to my policy on the semgrep dashboard, I will exclude that rule.

Describe the solution you'd like Which brings us to my point here--would be great if I could do the same thing on the command-line, so I could filter out that rule and run my "evaluation scan" again, something like this:

SEMGREP_IGNORE_FILES := --exclude third_party --exclude *_test.go --exclude *.pb.go --exclude *.pb.*.go --exclude *.bindata.go --exclude *.spec.ts --exclude coverage
SEMGREP_IGNORE_RULES := --exclude-rule go.lang.correctness.permissions.file_permission.incorrect-default-permission
semgrep-test/%:
	semgrep --config "p/$(@F)" $(SEMGREP_IGNORE_FILES) $(SEMGREP_IGNORE_RULES)

(No need to be fluent in make -- the request here is simply to provide an --exclude-rule command-line option.)
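Until such an option exists, one possible workaround is to post-filter the findings. The sketch below assumes the scan was run with `--json` and that each entry in the report's `results` list carries a `check_id` field; the exact report shape may differ between Semgrep versions. The helper name `drop_rules`, the sample report, and the second rule ID are hypothetical.

```python
import json
from typing import Iterable


def drop_rules(report: dict, excluded_rule_ids: Iterable[str]) -> dict:
    """Return a copy of a Semgrep JSON report without findings from the excluded rules."""
    excluded = set(excluded_rule_ids)
    kept = [f for f in report.get("results", []) if f.get("check_id") not in excluded]
    return {**report, "results": kept}


NOISY_RULE = "go.lang.correctness.permissions.file_permission.incorrect-default-permission"

# Hand-written stand-in for the output of `semgrep --json`; real reports carry more fields.
report = json.loads("""
{"results": [
  {"check_id": "go.lang.correctness.permissions.file_permission.incorrect-default-permission",
   "path": "components/compliance-service/inspec/cli.go"},
  {"check_id": "hypothetical.other-rule", "path": "main.go"}
]}
""")

filtered = drop_rules(report, [NOISY_RULE])
print([f["check_id"] for f in filtered["results"]])
```

This only hides findings after the fact; the excluded rule still runs, so it does not save any scan time.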

Semgrep 0.15.0 performance issues

Describe the bug

$ time semgrep -f ~/Code/njsscan/njsscan/rules/semantic_grep/ ~/Downloads/juice-shop-master/
running 105 rules...
10%|███████████████▍ |11/105

More than 15 minutes and still waiting.

Rules: https://github.com/ajinabraham/njsscan/tree/master/njsscan/rules/semantic_grep
Source: https://github.com/bkimminich/juice-shop

Enhancement

It would be nice to add an end to end test suite with semgrep that runs before release updates on real apps to catch performance bugs.

Environment

$ semgrep --version
0.15.0

Rules with "too many matches" generate confusing "Timeout" error

Describe the bug I am using semgrep in an Azure Pipeline for PR analysis. Semgrep times out after 5 seconds on a small JS file (~6K lines). I have seen multiple threads related to timeouts in JS but did not find any conclusion or plan of action. I tried the command locally with only this file, but no luck. Is there a workaround or another option that I could use to get a scan no matter how long it takes?

I have seen the -max_match_per_file flag used with semgrep-core, but it is not part of the CLI. Is it possible to use it with the CLI? Will it help with the timeouts?

To Reproduce Unfortunately, I can not share the JS file. Here is the command. The failing rule is javascript.browser.security.raw-html-concat.raw-html-concat

semgrep --config "p/clientside-js" --timeout 0 --error --verbose --disable-version-check --include "********/application.js" --debug --strict

Expected behaviour I expect the scan to complete, since I set the timeout to 0.

What is the priority of the bug to you? P0 - we are using semgrep to provide a report of secure coding practices to a client, but since this single file is not getting scanned we are unable to use semgrep.

Environment Python package - 0.37.0

Very high core time (Windows)

I am trying to troubleshoot why the execution on my codebase is so slow.

I'm running the returntocorp/semgrep Docker image on Windows.

============================[ summary ]============================
Total time: 251.3368s Config time: 2.9130s Core time: 248.4213s

Semgrep-core time:
Total CPU time: 8.5493s  File parse time: 4.5128s  Rule parse time: 0.3529s  Match time: 3.0912s
Slowest 5/192 files
...abc.ts ( 58KB): 0.014s (0.204s to parse)
...qwe.ts ( 51KB): 0.013s (0.169s to parse)
...123.ts   ( 30KB): 1.082s (0.156s to parse)
...dummy.ts ( 37KB): 0.009s (0.146s to parse)
...other.ts ( 30KB): 0.009s (0.122s to parse)
Slowest 5 rules to match
...ort.js-node.using-http-server.using-http-server:         0.000s
...transport.js-node.telnet-request.telnet-request:         0.000s
...st-http-client-support.rest-http-client-support:         0.002s
...ure-transport.js-node.http-request.http-request:         0.000s
...ecure-transport.js-node.ftp-request.ftp-request:         0.000s

The reported times do not really add up to the total time of 250 seconds.

Any tips?
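To quantify the gap, the arithmetic can be done directly on the two summary lines quoted above. This is only a sketch: the `parse_times` helper is hypothetical, and the numbers are copied from the report in this issue.

```python
import re

# Lines copied from the timing summary quoted in this issue.
SUMMARY = "Total time: 251.3368s Config time: 2.9130s Core time: 248.4213s"
CORE = "Total CPU time: 8.5493s  File parse time: 4.5128s  Rule parse time: 0.3529s  Match time: 3.0912s"


def parse_times(line: str) -> dict:
    """Turn 'Name: 1.23s Other name: 4.56s' into {'Name': 1.23, 'Other name': 4.56}."""
    pairs = re.findall(r"([A-Za-z ]+?):\s*([\d.]+)s", line)
    return {name.strip(): float(value) for name, value in pairs}


wall = parse_times(SUMMARY)
core = parse_times(CORE)

# Wall-clock time spent in the core step that is not accounted for as CPU time.
gap = wall["Core time"] - core["Total CPU time"]
print(f"unaccounted core time: {gap:.4f}s of {wall['Total time']:.4f}s total")
```

Roughly 240 of the 251 seconds are therefore wall-clock time in the core step that does not show up as CPU time. The summary alone does not say where that time goes.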

semgrep throws exception on python setup.py install

Describe the bug

Processing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl
Installing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl to /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages
Adding semgrep 0.31.1 to easy-install.pth file
Traceback (most recent call last):
  File "setup.py", line 56, in <module>
    install_requires=requirements,
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 117, in do_egg_install
    cmd.run(show_deprecation=False)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 424, in run
    self.easy_install(spec, not self.no_deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 673, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
    self.process_distribution(spec, dist, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 765, in process_distribution
    [requirement], self.local_index, self.easy_install
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 783, in resolve
    replace_conflicting=replace_conflicting
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1066, in best_match
    return self.obtain(req, installer)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1078, in obtain
    return installer(requirement)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 692, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
    self.process_distribution(spec, dist, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 745, in process_distribution
    self.install_egg_scripts(dist)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 619, in install_egg_scripts
    dist.get_metadata('scripts/' + script_name)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1426, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte in scripts/spacegrep file at path: /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/semgrep-0.31.1-py3.7-macosx-10.9-x86_64.egg/EGG-INFO/scripts/spacegrep

To Reproduce

Add semgrep to a setup.py script and run python setup.py install

Expected behavior install gracefully

What is the priority of the bug to you? NA

Environment pypi, 0.31.1

Can't suppress with // nosemgrep because of autoformatter

I can’t use // nosemgrep because my autoformatter moves it to its own line, where it is ignored by semgrep.

To reproduce: https://semgrep.dev/s/340G

Desired behavior: I'd like all of the examples in the above snippet to be suppressed by the nosemgrep annotation, including:

where nosemgrep is on its own line before the target line

where nosemgrep is inside the matched range.

This is currently blocking me from merging my PR.
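The requested behavior could be sketched as follows. This is an illustration of the two cases in the list above, not Semgrep's actual suppression logic; the function name `is_suppressed`, the 1-based inclusive line range, and the comment prefixes checked are all assumptions.

```python
from typing import List

MARKER = "nosemgrep"


def is_suppressed(lines: List[str], start: int, end: int) -> bool:
    """Decide whether a match spanning lines start..end (1-based, inclusive) is suppressed."""
    # Case 1: the marker appears anywhere inside the matched range,
    # which also covers a trailing comment on the match's own line.
    if any(MARKER in lines[i - 1] for i in range(start, end + 1)):
        return True
    # Case 2: the marker sits on its own comment line directly before the match.
    if start >= 2:
        previous = lines[start - 2].strip()
        return previous.startswith(("//", "#")) and MARKER in previous
    return False


# Hypothetical file contents after an autoformatter moved the first annotation.
source_lines = [
    "// nosemgrep",
    "eval(userInput);",
    "eval(other);",
    "eval(third); // nosemgrep",
]

print([is_suppressed(source_lines, n, n) for n in (2, 3, 4)])
```

Only a comment line immediately above the match counts in this sketch, so an annotation does not silently suppress findings further down the file.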

cc @mschwager @minusworld curious if you have thoughts

update to ocamlformat 0.16.0

This is needed if we want to use ppxlib 0.20 (itself needed if we want to use OCaml 4.12.0, itself needed if we don't want errors on macOS Homebrew)

test plan: pre-commit run --all lint-ocaml

PR checklist:

[x] changelog is up to date

[call for ideas] Eliminate stack overflows or troubleshoot them more easily

In OCaml (native code), stack overflows often result in a segfault rather than a nice Stack_overflow exception. Any practical solution that mitigates this problem is welcome.

Stack overflows may occur in the normal use of semgrep if the maximum stack size is too small and the input is a bit extreme. Things we can do include:

Don't use the stack unless necessary (e.g. don't use Stdlib.List.map).

Use ulimit -s to lower the maximum stack size during testing and development; raise the maximum stack size in production.

Count the recursive calls to a few functions that are known for occasionally causing stack overflows (and that can't be avoided), and print a warning when passing a threshold.

Modify the ocaml runtime so it always raises Stack_overflow on stack overflows.

Find some other trick to detect when the stack size approaches the limit.
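The idea of counting recursive calls and warning past a threshold can be sketched as below. The sketch is in Python rather than OCaml, and Python raises RecursionError instead of segfaulting, so it only illustrates the counting technique; the decorator name, the threshold, and the `count_down` function are hypothetical.

```python
import functools
import warnings


def warn_on_deep_recursion(threshold: int):
    """Decorator that warns once per top-level call when recursion depth passes `threshold`."""
    def decorator(fn):
        depth = 0
        warned = False

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal depth, warned
            depth += 1
            try:
                if depth > threshold and not warned:
                    warned = True
                    warnings.warn(
                        f"{fn.__name__}: recursion depth exceeded {threshold}",
                        RuntimeWarning,
                    )
                return fn(*args, **kwargs)
            finally:
                depth -= 1
                if depth == 0:
                    warned = False  # re-arm the warning for the next top-level call

        return wrapper
    return decorator


@warn_on_deep_recursion(threshold=50)
def count_down(n: int) -> int:
    # Deliberately non-tail-recursive: one stack frame per step.
    return 0 if n == 0 else 1 + count_down(n - 1)


print(count_down(10))
```

The counter costs one increment and one comparison per call, which is why the suggestion above limits it to a few functions known to recurse deeply.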

feat!: new pattern operators

What: You know what they say, the most important thing about a language is its concrete syntax.

Alright, so this is the big one. This PR concerns the changing of the rule syntax from its previous form to a more standardized, disciplined form, which hopes to be less confusing to users, more intuitive to use, and enable a few features along the way.

Why: Why should we do this, I hear you asking?

So, the old syntax is somewhat verbose. Users need to remember concrete syntax such as patterns and pattern-either, along with very long key names such as metavariable-comparison, metavariable-analysis, etc. Additionally, because every pattern must fall under a pattern key, we often end up with heavy duplication of pattern, for instance underneath a patterns. Moreover, pattern and patterns are barely distinct visually.

There are a few issues already discussing this change: https://github.com/returntocorp/semgrep/issues/1489 https://github.com/returntocorp/semgrep/issues/5840 https://github.com/returntocorp/semgrep/issues/4647 https://github.com/returntocorp/semgrep/issues/4484 https://github.com/returntocorp/semgrep/issues/5854

The first is a proposal to switch to this new syntax, the rest are problems that are related to the fact that we do not have it.

There are two main claims to be had here: one, this syntax will make adoption of Semgrep easier for users. Two, it will add genuine new features which make Semgrep easier/more intuitive to use.

How:

A New Syntax We have changed to a new concrete syntax. Notably, there are only a few constructs. An artist's rendition of the new rule syntax's EBNF is pictured below: It fits on a whiteboard!

Base Patterns

Notably, we have inside, not, and, or, and regex as "base" patterns. not can also now take in entire formulas, rather than being constrained to "base patterns". pattern is around too, but now it's pretty much optional.

This means we can now write rules like the following:

id: ey-look-an-id
match:
  and:
    - "a"
    - "b"
message: ey look a message
languages:
  - python
severity: WARNING

The pattern appears underneath a match:, which signals that we are using the new syntax.

Backwards Compatibility

We remain fully backwards compatible, so a top-level patterns, pattern, or whatever else was allowed before will trigger a parse via the old syntax, into a common formula language. Otherwise, we use the new one.

pattern Is Optional

The nice thing is that standalone patterns may directly be given into the list of a combinator pattern, such as and. Previously, you would have to write

patterns:
  - pattern: "a"
  - pattern: "b"
  - pattern: "c"
  - pattern: "d"

and now you can write

and:
  - "a"
  - "b"
  - "c"
  - "d"
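As a rough illustration of how the old nesting maps onto the new combinators, here is a small Python sketch over already-parsed rule dictionaries. It is not the translator shipped with this PR: the `translate` function is hypothetical, it covers only a handful of operators, and the mapping of pattern-either to or is inferred from the operator names listed in this description.

```python
def translate(formula):
    """Translate a tiny subset of old-style formulas into the new match syntax."""
    if isinstance(formula, str):
        return formula  # bare pattern strings are allowed as-is in the new syntax
    if "pattern" in formula:
        return formula["pattern"]  # the pattern key becomes optional
    if "patterns" in formula:
        return {"and": [translate(f) for f in formula["patterns"]]}
    if "pattern-either" in formula:
        return {"or": [translate(f) for f in formula["pattern-either"]]}
    if "pattern-not" in formula:
        return {"not": translate(formula["pattern-not"])}
    if "pattern-inside" in formula:
        return {"inside": translate(formula["pattern-inside"])}
    raise ValueError(f"unsupported formula: {formula!r}")


old_style = {
    "patterns": [
        {"pattern": "a"},
        {"pattern": "b"},
        {"pattern": "c"},
        {"pattern": "d"},
    ]
}

print(translate(old_style))
```

Applied to the four-pattern example above, the sketch yields the and list shown in the new syntax.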

Where-clauses

Additionally, we support more explicit modifiers on patterns. Previously, things like focus-metavariable had to be combined with other patterns underneath a patterns. This was effectively the same as just enforcing some constraint on the intersection of the other patterns given to the patterns, but this meant that it had to be explicitly checked for being under a patterns, as well as just generally being harder to intuitively think about.

Now, we support syntax like the following:

match:
  and:
    - "a"
    - "b"
  where:
    - comparison: $A > 5
    - focus: $A
    - metavariable: "A"
      pattern: "A"

The where (which must exist in a two-key dictionary alongside some other pattern) now directly modifies the pattern it shares that dictionary with. Additionally, we use terser keys such as comparison and focus. metavariable-comparison also used to demand that you explicitly name the metavariable, despite the fact that it already appears in the comparison, so we eliminated that as well.

not as a Parent Pattern

not is now allowed to take in a pattern, instead of a string to match. This is huge for the expressivity of the rules language, since before we had to deal with pattern-not, pattern-not-inside, and others. Not only is this cumbersome syntactically, but there were some things we could not express, cf. the above issues.

Migration

There is also a translation command you can run via sc -translate_rules, which can help users migrate their rulebases to the new syntax automatically. We could also use it to migrate the Registry.

Concerns:

Documentation does not exist yet

Translation from old-style rules to new-style rules is strictly programmatic and may result in rules which are more complicated than a human may write.

Closes #1489

PR checklist:

[ ] Tests included or PR comment includes a reproducible test plan

[ ] Documentation is up-to-date (TODO)

[ ] changelog.d/<issue>.<type> is a file with the what, why, and how of the change.

<issue> is pa-312 (Linear ticket), gh-1234 (GitHub issue), or new-gizmo (unique semantic name)

<type> is added, changed, fixed, or infra.

[X] Change has no security implications (otherwise, ping security team)

How to use metavariable-regex for variable in pattern-not-include to filter some cases ?

Describe the bug How can I use metavariable-regex on a variable in pattern-not-include to filter out some cases, for example useless sources from a config file (which I would like to match with a regex pattern)? I tried the rule below, but it does not work and produces no results at all.

To Reproduce A rule that uses metavariable-regex on a variable in pattern-not-include to check for bash injection in Go:

rules:
- id: bash_injection
  languages: [go]
  severity: ERROR
  patterns:
  - pattern-either:
      - pattern: |
          exec.Command("$SHELL", "-c", fmt.Sprintf($F, $ARG, ...), ...)
  - pattern-inside: |
      import "os/exec"
      ...
  - pattern-not-inside: |
      $ARG = "$TEST"
      ...
  - metavariable-regex:
      metavariable: $SHELL
      regex: ^(sh|bash|zsh|dash|cmd|powershell|csh|ksh|vsh)$
  - metavariable-regex:
      metavariable: $TEST
      regex: .*
  message: $SHELL injection

A test file. In ShouldMatch, the path is a variable. In ShouldNotMatch, the path comes from a useless source such as a constant:

package test

import (
	"fmt"
	"os/exec"
)

func ShouldMatch(path string) bool {
	cmd := "ls %s"
	out, err := exec.Command("sh", "-c", fmt.Sprintf(cmd, path)).Output()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	return true
}

func ShouldNotMatch(path string) bool {
	path = "whatever"
	cmd := "ls %s"
	out, err := exec.Command("sh", "-c", fmt.Sprintf(cmd, path)).Output()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	return true
}

Expected behavior Command Injection in ShouldMatch is matched, ShouldNotMatch is not matched

Screenshots not output

Use case It seems that using metavariable-regex with pattern-not-include leads to this problem. I want to know how to use metavariable-regex on a variable in pattern-not-include to filter out some cases, not only constant values.

ocamlformat on pfff and other things

PR checklist:

[ ] Purpose of the code is evident to future readers

[ ] Tests included or PR comment includes a reproducible test plan

[ ] Documentation is up-to-date

[ ] A changelog entry was added to changelog.d for any user-facing change

[ ] Change has no security implications (otherwise, ping security team)

If you're unsure on any of this, please see:

Contribution guidelines!

One of the more specific guides located here

feat(julia): support the julia language

Commentary:

It's probably easier to read the file directly than it is to try and look at the diff.

The grammar changes included in semgrep-julia were so drastic since the last time I looked at it that I had to take the previous Parse_julia_tree_sitter.ml I wrote and basically scrap it, because so much of it was now different. As such, perhaps it's best to look at this file with fresh eyes.

I eliminated pretty much every single TODO, in favor of making sacrifices and tradeoffs that would allow us to parse for now, albeit at the cost of some semantic information. In particular, I've made liberal use of Other, as Julia's implementation means that quite often, things can appear in places that we do not expect.

In particular, the distinction between statements and expressions carries essentially no meaning, macro-expressions can appear anywhere an identifier can (making things like macro definitions, imports, etc. more difficult), and, annoyingly, things which should be patterns are represented in the grammar as plain expressions. For the latter, I've just injected them into an Other for now. I have left comments, which hopefully clarify where I had to make these tradeoffs beyond simply everywhere an Other appears.

Test plan:

./run-lang julia

PR checklist:

[X] Purpose of the code is evident to future readers

[X] Tests included or PR comment includes a reproducible test plan

[ ] Documentation is up-to-date

[X] A changelog entry was added to changelog.d for any user-facing change

[X] Change has no security implications (otherwise, ping security team)

discussion: pattern-inside-taint and reviving taint patterns

This is not a contribution yet, just a demo implementation of #3801 pattern inside a control flow, starting from the stale #5993.

It tests successfully in three languages, but it breaks many other tests, so it would need a flag (or a where?).

Instead of returning the range of the sink, it returns the range from the start of the source to the end of the sink.

I implemented pattern-inside-taint a year ago, but only found time now to catch up with all the exciting changes and hopefully contribute back.

Is there any interest in reviving taint patterns #5993? As soon as I saw the whiteboard in new pattern operators #5916, I realized that others had implemented this feature in a much better way than I had. 😉

Unescaped `$` in PySpark file causes Parse Error while Creating report

Source issue

https://gitlab.com/gitlab-org/gitlab/-/issues/373113

Describe the bug

A GitLab customer reported that their Semgrep SAST job fails while "Creating report". They discovered this was reproducible if the .py file contained PySpark syntax using a $ character that is not escaped (\$) or is not inside a string. They isolated the part of the file that was causing the issue.

The fatal error seems to be caused upstream in semgrep.

Update

For what it's worth, newer versions of the Semgrep analyzer won't fatally crash when encountering syntax errors like these -- they'll fail with a partial scan and produce an empty report instead. Below are the logs from running Semgrep locally against the file in Step 2:

[INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/command.go:76] ▶ GitLab Semgrep analyzer v3.10.0
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_TARGET_DIR,CI_PROJECT_DIR=/tmp/app
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_ARTIFACT_DIR,CI_PROJECT_DIR=/tmp/app
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_INDENT_REPORT=false
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ANALYZER_OPTIMIZE_REPORT=true
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ ADDITIONAL_CA_CERT_BUNDLE=
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_IGNORED_DIRS=bundle,node_modules,vendor,tmp,test,tests
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_IGNORE_HIDDEN_DIRS=true
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SEARCH_MAX_DEPTH=2
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_SEMGREP_METRICS=true
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXPERIMENTAL_FEATURES=false
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXCLUDED_PATHS=
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_SCANNER_ALLOWED_CLI_OPTS=
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:256] ▶ SAST_EXCLUDED_PATHS=
[INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:131] ▶ Detecting project
[INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:153] ▶ Analyzer will attempt to analyze all projects in the repository
[INFO] [Semgrep] [2022-12-16T05:43:48Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:165] ▶ Running analyzer
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/src/buildapp/analyze.go:90] ▶ custom rulesets not enabled
[DEBU] [Semgrep] [2022-12-16T05:43:48Z] [/go/src/buildapp/analyze.go:128] ▶ /usr/local/bin/semgrep -f /rules -o /tmp/app/semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --enable-metrics
[DEBU] [Semgrep] [2022-12-16T05:44:25Z] [/go/src/buildapp/analyze.go:137] ▶ METRICS: Using configs from the Registry (like --config=p/ci) reports pseudonymous rule metrics to semgrep.dev.
To disable Registry rule metrics, use "--metrics=off".
Using configs only from local files (like --config=xyz.yml) does not enable metrics.

More information: https://semgrep.dev/docs/metrics

Scanning 1 file with 91 python rules.

Some files were skipped or only partially analyzed.
  Partially scanned: 1 files only partially analyzed due to a parsing or internal Semgrep error

Ran 311 rules on 1 file: 0 findings.

[INFO] [Semgrep] [2022-12-16T05:44:25Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/run.go:179] ▶ Creating report
[DEBU] [Semgrep] [2022-12-16T05:44:25Z] [/go/src/buildapp/convert.go:40] ▶ Converting report with the root path: /tmp/app
[WARN] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/sarif.go:344] ▶ tool notification warning: Syntax error Syntax error at line test.py:2:
 `$` was unexpected
[DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:212] ▶ custom rulesets not enabled
[DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:254] ▶ Applying report overrides
[DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/report/[email protected]/report.go:260] ▶ custom rulesets not enabled
[DEBU] [Semgrep] [2022-12-16T05:44:26Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/[email protected]/jsonout.go:54] ▶ Optimizing JSON Output

This updated behavior is similar to https://gitlab.com/gitlab-org/gitlab/-/issues/374551

To Reproduce

1. Create a new GitLab project https://gitlab.com/projects/new
2. Create a test.py file with the following content:

savetable \
    .withColumn('ttlinminutes',lit($ttl$)) \
    .write.mode('append').format('orc').insertInto('$customSchema$.all_refreshes')

3. Create a .gitlab-ci.yml with the following content:

include:
  - template: Security/SAST.gitlab-ci.yml

4. Change the test.py so that line 2 has escaped $ characters (\$), example below:

savetable \
    .withColumn('ttlinminutes',lit(\$ttl\$)) \
    .write.mode('append').format('orc').insertInto('$customSchema$.all_refreshes')

Expected behavior

Step 3: Semgrep job runs, creates a report, uploading report as artifact

Step 4: Semgrep job runs, creates a report, uploading report as artifact

What is the priority of the bug to you?

[x] P0: blocking your adoption of Semgrep or workflow
[ ] P1: important to fix or quite annoying
[ ] P2: regular bug that should get fixed

Environment

Docker

Use case

Unescaped $ in PySpark file will not cause error while Creating report

cli: don't retry non-HTTP exceptions + provide a more descriptive error for SSL cert hostname mismatches

This PR does two things to make things easier for folks behind intercepting proxies:

Add more context to the CertificateError raised in the case of an SSL host mismatch
Don't retry non-HTTP exceptions
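
The two changes can be sketched as follows. This is a minimal illustration under stated assumptions, not Semgrep's actual code: the exception classes, `DownloadError`, and `download_config` are hypothetical stand-ins, and the `fetch` callable represents whatever performs the real request. The hint text is taken from the "After" output in this description.

```python
from typing import Callable, Optional


class HTTPError(Exception):
    """Stand-in for an HTTP-level failure (for example a 5xx response)."""


class CertificateError(ValueError):
    """Stand-in for the hostname-mismatch error raised by the TLS layer."""


class DownloadError(Exception):
    """Hypothetical error type that is surfaced to the user."""


PROXY_HINT = (
    "This error typically occurs when your internet traffic is being routed "
    "through a proxy. If this is the case, try setting the REQUESTS_CA_BUNDLE "
    "environment variable to the location of your proxy's CA certificate."
)


def download_config(fetch: Callable[[], str], max_attempts: int = 3) -> str:
    """Call fetch(), retrying only HTTP errors; explain certificate errors."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return fetch()
        except HTTPError as exc:
            # HTTP-level failures may be transient, so try again.
            last_error = exc
        except CertificateError as exc:
            # Retrying cannot fix a certificate mismatch: fail immediately
            # with added context instead of exhausting the retries.
            raise DownloadError(f"SSL certificate error: {exc}. {PROXY_HINT}") from exc
    raise DownloadError(f"giving up after {max_attempts} attempts: {last_error}")
```

Any other non-HTTP exception raised by `fetch` propagates on the first attempt, since only `HTTPError` is caught and retried.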

Before:

semgrep % SEMGREP_URL=https://wrong.host.badssl.com/ semgrep --config auto
Semgrep rule registry URL is https://wrong.host.badssl.com/.
[ERROR] Failed to download config from https://wrong.host.badssl.com/c/auto: HTTPSConnectionPool(host='wrong.host.badssl.com',
port=443): Max retries exceeded with url: /c/auto (Caused by SSLError(CertificateError("hostname 'wrong.host.badssl.com'
doesn't match either of '*.badssl.com', 'badssl.com'")))
[ERROR] invalid configuration file found (1 configs were invalid)

After:

cli % SEMGREP_URL=https://wrong.host.badssl.com/ pipenv run python src/semgrep/__main__.py --config auto
Semgrep rule registry URL is https://wrong.host.badssl.com/.
[ERROR] Failed to download config from https://wrong.host.badssl.com/c/auto: SSL certificate error: hostname
'wrong.host.badssl.com' doesn't match either of '*.badssl.com', 'badssl.com'. This error typically occurs when your
internet traffic is being routed through a proxy. If this is the case, try setting the REQUESTS_CA_BUNDLE environment
variable to the location of your proxy's CA certificate.
[ERROR] invalid configuration file found (1 configs were invalid)

Open to feedback re: the error text!

PR checklist:

[ ] Purpose of the code is evident to future readers
[ ] Tests included or PR comment includes a reproducible test plan
[ ] Documentation is up-to-date
[ ] A changelog entry was added to changelog.d for any user-facing change
[ ] Change has no security implications (otherwise, ping security team)



Owner

r2c


Related tags

The dynamic infrastructure framework for everybody! Distribute the workload of many different scanning tools with ease, including nmap, ffuf, masscan, nuclei, meg and many more!

🔎 Help find Trojan Source vulnerability in code 👀 . Useful for code review in project with multiple collaborators

Static binary analysis tool to compute shared strings references between binaries and output in JSON, YAML and YARA

Proto-find is a tool for researchers that lets you find client-side prototype pollution vulnerabilities.

Look for JAR files that are vulnerable to Log4j RCE (CVE‐2021‐44228)

DockerSlim (docker-slim): Don't change anything in your Docker container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source)

AI-Powered Code Reviews for Best Practices & Security Issues Across Languages

A collection of cool tools used by Mobile hackers. Happy hacking , Happy bug-hunting

A fast port scanner written in go with a focus on reliability and simplicity. Designed to be used in combination with other tools for attack surface discovery in bug bounties and pentests

log4jshell vulnerability scanner for bug bounty

Auto scan log4j bug with excel of server list

Ah shhgit! Find secrets in your code. Secrets detection for your GitHub, GitLab and Bitbucket repositories: www.shhgit.com

A Go Library For Generating Random, Rule Based Passwords. Many Random, Much Secure.

Gofrette is a reverse shell payload developed in Golang that bypasses Windows Defender and many other antivirus products.

PHP functions implementation to Golang. This package is for the Go beginners who have developed PHP code before. You can use PHP like functions in your app, module etc. when you add this module to your project.

🌘🦊 DalFox(Finder Of XSS) / Parameter Analysis and XSS Scanning tool based on golang

EarlyBird is a sensitive data detection tool capable of scanning source code repositories for clear text password violations, PII, outdated cryptography methods, key files and more.

Find secrets and passwords in container images and file systems

SSRFuzz is a tool to find Server Side Request Forgery vulnerabilities, with CRLF chaining capabilities
