semgrep v1.59 releases: Fast and syntax-aware semantic code pattern search

by do son · Published June 17, 2020 · Updated January 31, 2024

Semgrep

Semgrep is a command-line tool for offline static analysis. Use pre-built or custom rules to enforce code and security standards in your codebase. You can try it now with the interactive live editor.

Semgrep combines the convenient and iterative style of grep with the powerful features of an Abstract Syntax Tree (AST) matcher and limited dataflow. Easily find function calls, class or method definitions, and more without having to understand ASTs or wrestle with regexes.

Motivation

Semgrep exists because:

Insecure code is easy to write
The future of security involves automatically guiding developers towards a “paved road” made of default-safe frameworks (e.g. React or object-relational mappers)
grep isn’t expressive enough and traditional static analysis tools (SAST) are too complicated/slow for paved road automation

The AppSec, Developer, and DevOps communities deserve a static analysis tool that is fast, easy to use, code-aware, multi-lingual and open source!

Overview

Semgrep is optimized for:

Speed: Fast enough to run on every build, commit, or file save
Finding bugs that matter: Run your own specialized rules or choose OWASP Top 10 checks from the Semgrep Registry. Rules match source code at the Abstract Syntax Tree (AST) level, unlike regexes that match strings and aren’t semantically aware.
Ease of customization: Rules look like the code you’re searching for, no static analysis Ph.D. required. They don’t require compiled code, only source, reducing iteration time.
Ease of integration: Highly portable, with many CI and git-hook integrations already available. Output --json and pipe results into your existing systems.
Polyglot environments: Don’t learn and maintain multiple tools for your polyglot environment (e.g. ESLint, find-sec-bugs, RuboCop, Gosec). Use the same syntax and concepts independent of language.
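
As a rough illustration of piping --json results into another system, the Python sketch below parses a report and turns it into one-line summaries. The sample report is hypothetical (rule IDs, paths, and messages are made up), and the field names used (results, check_id, path, start.line, extra.message, extra.severity) are assumed to follow Semgrep's JSON output; check them against the output of your installed version.

```python
import json
from collections import Counter
from typing import List

# Hypothetical sample shaped like Semgrep's JSON output; the rule IDs,
# paths, and messages are made up for illustration.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "check_id": "rules.ban-exec",
      "path": "app/run.py",
      "start": {"line": 12, "col": 5},
      "end": {"line": 12, "col": 20},
      "extra": {"message": "Avoid exec", "severity": "ERROR"}
    },
    {
      "check_id": "rules.ban-exec",
      "path": "app/tasks.py",
      "start": {"line": 40, "col": 9},
      "end": {"line": 42, "col": 6},
      "extra": {"message": "Avoid exec", "severity": "ERROR"}
    },
    {
      "check_id": "rules.flask-secure-cookie",
      "path": "app/views.py",
      "start": {"line": 7, "col": 1},
      "end": {"line": 7, "col": 33},
      "extra": {"message": "Set secure=True", "severity": "WARNING"}
    }
  ],
  "errors": []
}
"""


def count_by_rule(raw: str) -> Counter:
    """Count findings per rule ID in a JSON report string."""
    report = json.loads(raw)
    return Counter(finding["check_id"] for finding in report.get("results", []))


def format_findings(raw: str) -> List[str]:
    """Render each finding as 'path:line: [severity] rule: message'."""
    report = json.loads(raw)
    lines = []
    for finding in report.get("results", []):
        extra = finding.get("extra", {})
        lines.append(
            f'{finding["path"]}:{finding["start"]["line"]}: '
            f'[{extra.get("severity", "UNKNOWN")}] '
            f'{finding["check_id"]}: {extra.get("message", "")}'
        )
    return lines


if __name__ == "__main__":
    for line in format_findings(SAMPLE_OUTPUT):
        print(line)
    print(dict(count_by_rule(SAMPLE_OUTPUT)))
```

In a real pipeline the string would come from the scanner's standard output rather than a constant, for example by reading sys.stdin after running Semgrep with --json.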

Language Support

Go · Java · JavaScript · JSX · JSON · Python · Ruby · TypeScript · TSX

Pattern Syntax Teaser

One of the most useful things about Semgrep is how easy it is to write and iterate on queries.

The goal is to make it as easy as possible to go from an idea in your head to finding the code patterns you have in mind.

Example: Say you want to find all calls to a function named exec, and you don’t care about the arguments. With Semgrep, you could simply supply the pattern exec(...) and you’d match:

# Simple cases grep finds
exec("ls")
exec(some_var)

# But you don't have to worry about whitespace
exec (foo)

# Or calls across multiple lines
exec (
    bar
)
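
To make the contrast with plain text search concrete, here is a minimal Python sketch that finds the same calls by walking Python's syntax tree with the standard-library ast module. It only illustrates AST-level matching and is not how Semgrep is implemented; the find_calls helper and the sample source are assumptions made up for this example.

```python
import ast
from typing import List


def find_calls(source: str, func_name: str) -> List[int]:
    """Return the starting line numbers of calls to a bare function name."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # A call such as exec(...) is a Call node whose callee is a Name.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical sample: the last line mentions exec only inside a string,
# which a naive text search would flag but the syntax tree does not.
SAMPLE = '''exec("ls")
exec(some_var)
exec (foo)
exec (
    bar
)
print("exec(not a call)")
'''

if __name__ == "__main__":
    print(find_calls(SAMPLE, "exec"))  # [1, 2, 3, 4]
```

Whitespace and line breaks never reach the comparison because the parser has already normalized them away, which is the property the exec(...) pattern relies on.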

Use case	Semgrep rule
Ban dangerous APIs	Prevent use of exec
Search routes and authentication	Extract Spring routes
Enforce the use of secure defaults	Securely set Flask cookies
Tainted data flowing into sinks	ExpressJS dataflow into sandbox.run
Enforce project best-practices	Use assertEqual for == checks, Always check subprocess calls
Codify project-specific knowledge	Verify transactions before making them
Audit security hotspots	Finding XSS in Apache Airflow, Hardcoded credentials
Audit configuration files	Find S3 ARN uses
Migrate from deprecated APIs	DES is deprecated, Deprecated Flask APIs, Deprecated Bokeh APIs
Apply automatic fixes	Use listenAndServeTLS

Changelog v1.59

Added

Swift: Now supports typed metavariables, such as ($X : ty). (pa-3370)

Changed

Add Elixir to Pro languages list in help information. (gh-9609)
Removed the sg alias to avoid naming conflicts with the shadow-utils sg command on Linux systems. (gh-9642)
Prevent unnecessary computation when running scans without verbose logging enabled. (gh-9661)
Deprecated the option taint_match_on, introduced in 1.51.0; it is being renamed
to taint_focus_on. Note that both taint_match_on and taint_focus_on are
experimental. taint_match_on will continue to work, but it will be removed
completely at some point after 1.63.0. (pa-3272)
Added information on product-related flags to help output, especially for Semgrep Secrets. (pa-3383)
taint-mode: Improved inference of best matches for exact-sources, exact-sanitizers,
and sinks. Semgrep now also avoids false positives in cases such as:
```
dangerouslySetInnerHTML = {
  // ok:
  {__html: props ? DOMPurify.sanitize(props.text) : ''} // no more FPs!
}
```
where props is tainted and the sink specification is:
```
patterns:
  - pattern: |
     dangerouslySetInnerHTML={{__html: $X}}
  - focus-metavariable: $X
```
Previously Semgrep wrongly considered the individual subexpressions of the
conditional as sinks, including the props in props ? ..., thus producing a
false positive. Now it will only consider the conditional expression as a whole
as the sink. (rules-6457)
Removed an internal legacy syntax for secrets rules (mode: semgrep_internal_postprocessor). (scrt-320)
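
For rule sets that still use the deprecated taint_match_on option mentioned above, the rename can be scripted. The following Python sketch operates on a rule file that has already been loaded into plain dictionaries (for example with a YAML parser). The migrate_rule_options helper, the rule ID, and the option value "source" are assumptions made for illustration, as is the layout with the option under each rule's options key.

```python
import copy

OLD_OPTION = "taint_match_on"
NEW_OPTION = "taint_focus_on"


def migrate_rule_options(config: dict) -> dict:
    """Return a copy of a parsed rule file with the old option renamed."""
    migrated = copy.deepcopy(config)
    for rule in migrated.get("rules", []):
        options = rule.get("options")
        if isinstance(options, dict) and OLD_OPTION in options:
            old_value = options.pop(OLD_OPTION)
            # If the new option is already set explicitly, keep that value.
            options.setdefault(NEW_OPTION, old_value)
    return migrated


# Hypothetical rule file content after parsing; the ID and value are made up.
EXAMPLE_CONFIG = {
    "rules": [
        {
            "id": "example-taint-rule",
            "mode": "taint",
            "options": {"taint_match_on": "source"},
        }
    ]
}

if __name__ == "__main__":
    print(migrate_rule_options(EXAMPLE_CONFIG)["rules"][0]["options"])
    # {'taint_focus_on': 'source'}
```

The helper works on a deep copy so the original structure stays untouched, which makes it easy to diff before writing anything back to disk.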

Fixed

Autofix: Fixes that span multiple lines will now try to align
inserted fixed lines with each other. (gh-3070)
Matching: Try blocks with catch clauses can now match try blocks that have
extraneous catch clauses, as long as the pattern’s catch clauses are a subset
of those in the target. For instance,
the pattern
```
try:
  ...
catch A:
  ...
```
can now match
```
try:
  ...
catch A:
  ...
catch B:
  ...
```
(gh-3362)
Previously, some people got the error:
```
Encountered error when running rules: Other syntax error at line NO FILE INFO YET:-1:
Invalid_argument: String.sub / Bytes.sub
```
Semgrep should now report this error properly with a file name and line number and
handle it gracefully. (gh-9628)
Fixed Dockerfile parsing bug where multiline comments were parsed incorrectly. (gh-9628-2)
The language server will now properly respect findings that have been ignored via the app. (lsp-fingerprints)

taint-mode: Pro: Semgrep will now propagate taint via instance variables when
calling methods within the same class, making this example work:
```
class Test {

  private String str;

  public void setStr() {
    this.str = "tainted";
  }

  public void useStr() {
    //ruleid: test
    sink(this.str);
  }

  public void test() {
    setStr();
    useStr();
  }

}
```
(pa-3372)

taint-mode: Pro: Taint traces will now reflect when taint is propagated via
class fields, such as in this example:
```
class Test {

  private String str;

  public void setStr() {
    this.str = "tainted";
  }

  public void useStr() {
    //ruleid: test
    sink(this.str);
  }

  public void test() {
    setStr();
    useStr();
  }

}
```
Previously, Semgrep would report that taint originated at this.str = "tainted",
but it would not tell you how the control flow got there. Now the taint trace
indicates that we get there by calling setStr() inside test(). (pa-3373)
Addressed an issue related to matching top-level identifiers with meta-variable
qualified patterns in C++, such as matching ::foo with ::$A::$B. This problem
was specific to Pro Engine-enabled scans. (pa-3375)
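
The subset rule for catch clauses described in the Matching fix (gh-3362) can be sketched in a few lines of Python. This is a simplified model that compares exception names only; the catch_clauses_match helper is hypothetical and does not reflect Semgrep's actual matcher, which also has to handle metavariables and clause bodies.

```python
from typing import List


def catch_clauses_match(pattern_catches: List[str], target_catches: List[str]) -> bool:
    """True if every catch clause in the pattern also appears in the target."""
    return set(pattern_catches) <= set(target_catches)


if __name__ == "__main__":
    # Pattern "catch A" against a target with "catch A" and "catch B".
    print(catch_clauses_match(["A"], ["A", "B"]))  # True
    # Pattern asks for "catch C", which the target does not have.
    print(catch_clauses_match(["A", "C"], ["A", "B"]))  # False
```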
