
Go Flow Levee

This static analysis tool works to ensure your program's data flow does not spill beyond its banks.

An input program's data flow is explored using a combination of pointer analysis, static single assignment analysis, and taint analysis. "Sources" must not reach "sinks" without first passing through a "sanitizer." Additionally, source data can "taint" neighboring variables during a "propagation" function call, such as writing a source to a string. Such tainted variables also must not reach any sink.

Such analysis can be used to prevent the accidental logging of credentials or personally identifying information, defend against maliciously constructed user input, and enforce data communication restrictions between processes.

User Guide

See guides/ for guided introductions.

Motivation

Much data should not be freely shared. For instance, secrets (e.g., OAuth tokens, passwords), personally identifiable information (e.g., name, email, or mailing address), and other sensitive information (e.g., user payment info, information regulated by law) should typically be serialized only when necessary and should almost never be logged. However, as a program's type hierarchy becomes more complex or as program logic grows to warrant increasingly detailed logging, it is easy to overlook when a type might contain such sensitive data and which log statements might accidentally expose it.

Technical design

See design/.

Configuration

See configuration/ for configuration details.

Reporting bugs

Static taint propagation analysis is a hard problem. In fact, it is undecidable. Concretely, this means two things:

  • False negatives: the analyzer may fail to recognize that a piece of code is unsafe.
  • False positives: the analyzer may incorrectly claim that a safe piece of code is unsafe.

Since taint propagation is often used as a security safeguard, we care more deeply about false negatives. If you discover unsafe code that the analyzer is not recognizing as unsafe, please open an issue here. Conversely, false positives waste developer time and should also be addressed. If the analyzer produces a report for code that you consider to be safe, please open an issue here.

For general bug reports (e.g. crashes), please open an issue here.

Contributing

See CONTRIBUTING.md for details.

Developing

See DEVELOPING.md for details.

Disclaimer

This is not an officially supported Google product.

Comments
  • Fix sanitization bug, add tests

    Fix sanitization bug, add tests

    This PR fixes a bug in how sanitization status is evaluated. It also introduces tests covering the previously buggy behavior.

    This PR also adds tests documenting another, more complicated bug also involving sanitization. That bug is described in #155.

    • [x] Tests pass
    • [x] Running against a large codebase such as Kubernetes does not error out.
    • [x] Appropriate changes to README are included in PR
  • Improvement: Identify sources in extracts

    Improvement: Identify sources in extracts

    This PR is a successor to #80. PR #80 addresses false positives caused by traversing from a tainted Extract to the other Extracts of a Call.

    This PR addresses Extracts not being identified as Sources due to being returned by a call that returns a *Source instead of a Source.

    See #80, as well as my comment here: https://github.com/google/go-flow-levee/pull/80#issuecomment-686506245, for additional information.

    • [x] Tests pass
    • [x] Appropriate changes to README are included in PR
  • Rework configuration to allow YAML and future configuration feature-add.

    Rework configuration to allow YAML and future configuration feature-add.

    This refactor is intended to enable future resolution of #87, #88, #89, #91, #103, i.e. our label:configuration issues.

    • [x] Tests pass
    • [ ] Running against a large codebase such as Kubernetes does not error out.
    • [ ] Appropriate changes to README are included in PR

    ===

    Discussion:

    Where should the heavy lifting occur?

    Pros and cons of iterating towards package config being a simple data container, with actual matching logic being owned by the relevant packages.

    • Pro:
      • Keep ssa logic where ssa is being used, rather than config requiring global implementation knowledge
      • Keep config "low level," avoiding need to expose excessive number of methods as we add additional configuration features.
    • Con:
      • Each package will need to "unpack" configuration, e.g. source will need to `for i, s := range cfg.Source { s.Package.Match(mySsa.pkg) /* ... */ }` etc.

    I could go either way on it, personally. A lot of ssa logic currently lives in config.

    I think maybe the ideal option would be to split the difference and rewrite the current method-set to only take basic types, e.g. `func (c Config) IsSourceField(packagePath, typeName, fieldName string) bool`. Or maybe to return the set of matchers that satisfy the given constraint. I'm not sure, and I think this is something that will only become clear as we try to migrate to the new struct.

    What do you think of the example YAML?

    @mlevesquedion and I have had some discussion offline about what a decent implementation might look like. Now having iterated on it, I'd appreciate your and/or others' input on this spec style. Some key questions:

    • Do you want to maintain the name packageRE, or have package implicitly be a regexp? Or accept either, using packageRE as a regexp match and package as an explicit string match?
    • Does the fieldtags format of explicitly listing key-value pairs read well? We could accept a list of tag strings, e.g., foo:"bar" as the input, but I worry how that would read / parse in YAML.
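For reference, a hypothetical example of the YAML shape under discussion. The packageRE name and the fieldtags key/value layout come from the questions above; everything else is invented and not a confirmed schema:

```yaml
# Illustrative only; not a confirmed configuration schema.
sources:
  - packageRE: example\.com/core
    typeRE: Source
    fieldRE: secret
fieldtags:
  - key: levee
    value: source
sinks:
  - packageRE: ^log$
    methodRE: Print
```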
  • Fix: Take instruction order into account when visiting calls

    Fix: Take instruction order into account when visiting calls

    The bug

    Currently, the analyzer does not differentiate between the following two (abstract) scenarios:

    1. Bad
    taint(value)
    sink(value)
    
    2. Ok
    sink(value)
    taint(value)
    

    Specifically, in this second case, a diagnostic is produced at the sink call, even though the value has not yet been tainted. This is because we rely mostly on the SSA value graph to determine if a source reaches a sink, and that graph has no concept of the "order" of instructions: it only contains Referrers/Operands relationships.

    The fix

    While traversing from a Source, for each ssa.BasicBlock, we keep track of the max index of an instruction visited within it. When we visit an instruction, we find its index in its block and update the max seen so far if necessary. When we encounter a Call instruction, we check that it is higher than the current max for its block, i.e. that it occurs after the value that led us to the call was tainted.

    Future PRs

    The solution proposed here only works for single-block functions. If a function has multiple blocks, in most cases this approach will not work as is, so I will need to extend it in a future PR.

    • [x] Tests pass
    • [x] Appropriate changes to README are included in PR
  • Explicitly handle each node type

    Explicitly handle each node type

    This PR implements explicit handling of each node type. The idea is to make taint propagation a deliberate choice, instead of the current "just traverse to everything, except if X, Y or Z". Indeed, in most cases, propagating everywhere doesn't make much sense.

    I think the resulting code is easier to understand. The epic switch looks intimidating, but for every node type, where to traverse is specified explicitly, in a single place. Also, visitReferrers actually means "visit the referrers" now (same for visitOperands).

    Performing this work has made me discover some issues/potentially missing test cases:

    • [x] BinOp needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand : #179
    • [x] Phi needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand : #180
    • [x] MapUpdate needs tests to make sure we are propagating only to the Map, not to the Key or the Val. #181
    • [x] Tests for Index/IndexAddr : #187

    The above issues have all been demonstrated via test cases, which are fixed by this PR.

    For cases where the correct propagation is ambiguous, I have opened an issue here: #188. These will need further investigation. In the meantime, this PR maintains the current behavior of "traverse everywhere" for these cases.

    • [x] Tests pass
    • [x] Running against a large codebase such as Kubernetes does not error out.
    • (N/A) [ ] Appropriate changes to README are included in PR
  • Replace traversal with interpretation

    Replace traversal with interpretation

    Why?

    The current approach to taint propagation is based on traversing the implicit graph of Referrer/Operand relationships. Because it is not always desirable to taint all of the Referrers/Operands of a given Source, while traversing it is necessary to perform additional checks to avoid propagating to things we shouldn't, e.g. propagating from a tainting instruction to a sink call that occurred before it. Most of these checks have been introduced to fix bugs discovered by running the analyzer on kubernetes: #118, #116, #80, #79. Debugging these cases was fairly complicated, even using the debugging aids (visualizations of the ssa graph and dumps of the ssa code). I believe debugging was difficult because the code itself is complicated, and it has gotten more complicated as a result of fixing the bugs.

    What

    This PR replaces the current approach to taint propagation with a new approach that is based on "interpreting" the SSA code. Information about ssa Values is maintained in a map that contains the state of each Value. Propagation is performed by stepping through the instructions, introducing taint when a Source value is encountered and propagating it when reading from a tainted Value. This approach intrinsically takes instruction order into account, is more explicit, and is (I believe) easier to understand.

    I also believe that this approach will be easier to extend, and it handles some test cases that the current approach does not. In all of the following cases, the current approach fails to produce a report:

    func TestIncorrectSanitizationByValue(s core.Source) {
    	core.Sanitize(s)
    	core.Sink(s) // TODO want "a source has reached a sink"
    }
    
    func TestDoublePointer(s **core.Source) {
    	core.Sink(s)   // TODO want "a source has reached a sink"
    	core.Sink(*s)  // TODO want "a source has reached a sink"
    	core.Sink(**s) // TODO want "a source has reached a sink"
    }
    
    func TestTaintIsPropagatedToColocatedPointerArguments(s core.Source, ip *core.Innocuous) {
    	i := core.Innocuous{}
    	taintColocated(s, &i, ip)
    	core.Sink(s)  // want "a source has reached a sink"
    	core.Sink(i)  // TODO want "a source has reached a sink"
    	core.Sink(ip) // want "a source has reached a sink"
    }
    
    func TestStructHoldingSourceAndInnocIsTainted(s core.Source, i core.Innocuous) {
    	h := Holder{
    		s,
    		i,
    	}
    	core.Sink(h) // TODO want "a source has reached a sink"
    }
    

    Related PRs

    This PR builds on #125, which introduces the above-mentioned tests.

    Additional information

    I have run both "versions" of the analyzer on kubernetes and they produce the same reports. This provides some evidence that we would not be missing cases by moving to the new approach. The run time is not significantly different.

    Future changes

    • Introduce further debugging aids. I have not included it in this PR, but I have written a debugger that allows a developer to step through each instruction in a function to see how the state evolves during interpretation. I think it would also be useful to render the graph of blocks (the control flow graph).
    • Cover ssa Instructions that are currently not covered, and add tests for them.
    • [x] Tests pass
    • [x] Appropriate changes to README are included in PR
  • Introduce pointer analysis

    Introduce pointer analysis

    Context

    This PR introduces pointer analysis into levee in order to handle some tricky cases that we have been having trouble with. In particular, the following case, and some other cases similar to it, are now handled correctly:

    Wrapper through empty interface

    func TestSinkWrapper(s core.Source) {
        SinkWrapper(s) // want "a source has reached a sink"
    }
    
    func SinkWrapper(arg interface{}) {
        core.Sink(arg) 
    }
    

    Note that the diagnostic is produced at the call to SinkWrapper.

    This other case is still not handled properly, however, because pointer analysis can't work with ints.

    Untainted value identified as tainted

    func TestReturnsFive(s core.Source) {
        five := ReturnsFive(s)
        core.Sink(five) // an unexpected diagnostic is produced here
    }
    
    func ReturnsFive(arg interface{}) int {
        return 5
    }
    

    Implementation

    In order to make the pointer analysis play well with the existing analysis, pointer analysis is executed first. If pointer analysis reports something at a call, there is no need to analyze it with our existing analysis. Running both analyses also allows us to handle some cases which pointer analysis does not handle, such as this one:

    Source value written to a string

    func TestStringify(s core.Source) {
        str := Stringify(s)
        core.Sink(str) // pointer analysis cannot handle this case, because a string is not a reference type
    }
    
    func Stringify(arg interface{}) string {
        return fmt.Sprintf("%v", arg)
    }
    

    Performing the pointer analysis itself is fairly straightforward. For each sink call, we collect the ssa.Values that are arguments to the call. If the sink is variadic, this requires a bit of additional work. For each value, if it is queryable, we add it to the list of queries. After performing the analysis, we go over the queries' PointsToSets and report our findings.

    Outstanding issues

    At present, pointer analysis does not allow us to handle every case we might be interested in. For example, the following test case fails:

    func TestSinkWrapperSlice(s core.Source) {
    	SinkWrapperSlice("not a source", s, 0) // TODO want "a source has reached a sink"
    }
    
    func SinkWrapperSlice(args ...interface{}) {
    	core.Sink(args)
    }
    

    My understanding of this failure is that within SinkWrapperSlice, the args parameter is represented as an ssa.Parameter, which represents the whole slice. Since the only Referrer for this ssa.Parameter is the call to core.Sink, and since it has no Operands (because it is an ssa.Value), using the current approach I see no way to formulate a relevant query for pointer analysis. We are not interested in the whole slice, merely in its elements (more specifically, in its 2nd element, but we can't know that in advance), and we have no way to get at these elements (at least, not within the body of SinkWrapperSlice).

    • [x] Tests pass
    • [x] Appropriate changes to README are included in PR
  • Proposal for testdata convention - spoof source root with go.mod to assist IDEs

    Proposal for testdata convention - spoof source root with go.mod to assist IDEs

    Problem statement

    analysistest expects testdata to be a fake ~~GOROOT~~GOPATH. We currently have various tests at .../testdata/src/example.com/... to test the various analyzers.

    IntelliJ expects our project to be configured with Go modules. This fake root is opaque to IntelliJ. As a consequence, the IDE can't resolve the linkage for packages within testdata. E.g.,

    [screenshot: IntelliJ failing to resolve imports within testdata]

    This conflict makes development of tests stickier than it needs to be. Additionally, while I believe I'm the only one who does this, it also prevents general execution of an analyzer from the command line. I am partial to using ssadump when investigating issues, but this is prevented in any file with an unresolved import.

    Proposal

    I propose adding a trivial go.mod to each testdata/src/<some shared root> so that our IDEs can resolve these paths. See #273 and associated branch for an example within config/testdata. This allows proper linking within IntelliJ while retaining test compatibility.

    For compatibility with the builder within analysistest, we would need each testdata/src/... to be self-contained. Cross-package imports are permitted, but not cross-module imports. This requires a shared root within testdata/src/. #273 demonstrates this as testdata/src/github.com/. In practice, we should avoid collision of go modules and adopt testdata/src/<identifier>, e.g. testdata/src/levee_configtest/....
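For concreteness, the trivial go.mod in question could be as small as the following, using the hypothetical levee_configtest identifier mentioned above (the go directive version is illustrative):

```
module levee_configtest

go 1.15
```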

    I'd love to hear your thoughts about adding this as a convention going forward. At your convenience, please pull my pseudo-module-testing branch and see if this works with your own workflow. (I had to execute config_test to trigger compiling and linkage.)

  • Introduce infer analyzer

    Introduce infer analyzer

    This PR introduces an analyzer that identifies named types as sources from their underlying types and their fields, implementing the core logic needed for #96 and #97. In the future, the output of this analyzer could be used to identify sources by type. Therefore, it will integrate either with the source or the sourcetype analyzer.

    The core of the analysis consists of building a "type graph" that indicates which types use which other types in their definition. Traversing the graph in reverse topological order allows "sourceyness" to be propagated from configured sources to types that refer to them.

    When a type definition like the below is encountered, an edge is added from B to A. If B is a configured source, or an inferred source, A will also be identified as an inferred source when traversing the graph.

    type A B
    

    This information is obtained using the ast, since the fact that A's underlying type is B is lost in the ssa.

    When a struct definition like the below is encountered, an edge is added from B to A.

    type A struct {
        b B
    }
    

    This information is obtained using the ssa, since the ssa provides a convenient API for this purpose.

    In practice, things are a little bit more complicated: a type may contain multiple types, some of them named, some of them not named, e.g.:

    map[string][]map[Source][]string
    

    Cycles can occur due to corecursive type definitions, e.g.:

    type A struct {
        b *B
    }
    type B struct {
        a *A
    } 
    

    In the topological sorting, nodes within a cycle may be in any order. This does not pose any particular issues to the analysis, and it does not impact the order for nodes outside of the cycle.

    • [x] Tests pass
    • [x] Running against a large codebase such as Kubernetes does not error out. (N/A, not exposed and not yet integrated with the main analyzer.)
    • [x] Appropriate changes to README are included in PR (N/A)
  • To discuss: Issues and prioritization for a '1.0'

    To discuss: Issues and prioritization for a '1.0'

    I wanted to discuss how we are prioritizing certain issues.

    While I don't necessarily think this document should be merged into master, I wanted to have a place where we could fork discussions by line item, since I foresee many sub-discussions occurring. After discussion, each pending item should have an Issue opened to track it, and those we deem higher priority should be tagged with a 1.0 target tag.

  • False negative when analyzing URL parameters with the gin framework

    False negative when analyzing URL parameters with the gin framework

    False negative report

    Use this issue template to describe a situation where the analyzer failed to recognize that a piece of unsafe code is unsafe. For example, if the analyzer did not produce a report on the following piece of code, then that would be a false negative:

    func ginhandler(c *gin.Context) {
    	name := c.Query("name") // url parameter name set to source
    	c.String(200, name)
    	log.Println(name) //sink
    }
    
    func test123() {
    	r := gin.Default()
    	r.GET("/hello", ginhandler)
    	r.Run()
    }
    

    (We are assuming that Source has been configured as a source and Sink has been configured as a sink.)

    Describe the issue Please include a clear and concise description of what happened and why you think it is a false negative. I have configured levee and run some tests. It works when net/http.Form is set as a source and log.Println as a sink. However, when I test the code shown above, levee does not recognize it as an unsafe case.

    To Reproduce Please make it as easy as possible for us to reproduce what you observed. If possible, provide the exact configuration and code on which the analyzer failed to produce a report. If the code cannot be shared, please provide a simplified example and confirm that it also contains the false negative.

    config file:

    {
        "Sources": [
            { "PackageRE": "github.com/gin-gonic/gin", "TypeRE": "Context", "FieldRE": "" }
        ],
        "Sinks": [
            { "PackageRE": "^log$", "MethodRE": "Print" }
        ]
    }

    command: levee -config=config.json example.go

    Additional context Add any other context about the problem here.

    The main point here is that I want to do taint analysis on URL parameters, especially with the gin framework. Thanks for your time and help.

  • Upgrade the Go version to v1.18, and x/tools to v0.1.11. Support generics better.

    Upgrade the Go version to v1.18, and x/tools to v0.1.11. Support generics better.

    Address the differences related to the new SSA format. Remove errors/warnings on generics types, and add unit tests.

    Fixes #323

    It's a good idea to open an issue first for discussion.

    • [x] Running against a large codebase such as Kubernetes does not error out. (See DEVELOPING.md for instructions on how to do that.)
    • [x] Appropriate changes to README are included in PR
  • Generics are not supported by analyzers

    Generics are not supported by analyzers

    Bug report

    Describe the bug Generics are not supported by analyzers or helper functions. Most notably the new *types.TypeParam type is not recognized.

    To Reproduce

    1. Upgrade x/tools to a recent version (which supports ssa + generics):
    go get golang.org/x/[email protected]
    go mod download github.com/yuin/goldmark 
    go get golang.org/x/tools/internal/[email protected]
    
    2. Add a generic type with a method to internal/pkg/sourcetype/testdata/test_stackoverflow.go

    E.g.

    type G[T any] struct{}
    
    func (G[T]) M(_ T) {
    }
    
    3. Run the sourcetype test in verbose mode:
    % go test -v ./internal/pkg/sourcetype
    === RUN   TestSourceTypeDoesNotStackOverflow
    unexpected type received: *types.TypeParam T; please report this issue
    --- PASS: TestSourceTypeDoesNotStackOverflow (0.02s)
    PASS
    ok  	github.com/google/go-flow-levee/internal/pkg/sourcetype	0.048s
    

    Note the unexpected type received: *types.TypeParam T; please report this issue in the output.

    Additional context n/a

  • `utils.Dereference` can get stuck in an infinite loop

    `utils.Dereference` can get stuck in an infinite loop

    When a type refers to itself through a pointer, utils.Dereference can get stuck in an infinite loop:

    package test
    
    type A *A
    
    func test(a A) {
    }
    

    Running levee on this code hangs indefinitely.

  • Separate the unit-tests for the two taint analyses

    Separate the unit-tests for the two taint analyses

    Bug report

    Currently the unit-tests of the two taint analyses are combined, and the EAR-based analysis fails some tests. It is preferable to separate the tests, e.g. by putting them in different directories:

       example/tests/shared/...
       example/tests/propagation/... 
       example/tests/ear/....
    

    This can simplify the "levee_ear_test.go" as well.

  • Use more advanced call graph in inter-procedural analysis

    Use more advanced call graph in inter-procedural analysis

    Bug report

    Describe the bug

    Currently the callee of a virtual method call is resolved only statically in SSA, i.e. the static call graph is used. Other call graphs could be used to increase the analysis's coverage and the number of findings.
