Gives criticality score for an open source project

Open Source Project Criticality Score (Beta)

This project is maintained by members of the Securing Critical Projects WG.

Goals

  1. Generate a criticality score for every open source project.

  2. Create a list of critical projects that the open source community depends on.

  3. Use this data to proactively improve the security posture of these critical projects.

Criticality Score

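A project's criticality score defines the influence and importance of a project. It is a number between 0 (least-critical) and 1 (most-critical). It is based on the following algorithm by Rob Pike:

$$C_{project} = \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i \frac{\log(1 + S_i)}{\log(1 + \max(S_i, T_i))}$$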

We use the following parameters to derive the criticality score for an open source project:

| Parameter (S_i) | Weight (α_i) | Max threshold (T_i) | Description | Reasoning |
|---|---|---|---|---|
| created_since | 1 | 120 | Time since the project was created (in months) | An older project has a higher chance of being widely used or depended upon. |
| updated_since | -1 | 120 | Time since the project was last updated (in months) | Unmaintained projects with no recent commits have a higher chance of being less relied upon. |
| contributor_count | 2 | 5000 | Count of project contributors (with commits) | Involvement of many different contributors indicates the project's importance. |
| org_count | 1 | 10 | Count of distinct organizations that contributors belong to | Indicates cross-organization dependency. |
| commit_frequency | 1 | 1000 | Average number of commits per week in the last year | Higher code churn is a slight indication of the project's importance. It also means higher susceptibility to vulnerabilities. |
| recent_releases_count | 0.5 | 26 | Number of releases in the last year | Frequent releases indicate user dependency. Lower weight since releases are not always used. |
| closed_issues_count | 0.5 | 5000 | Number of issues closed in the last 90 days | Indicates high contributor involvement and focus on closing user issues. Lower weight since it depends on project contributors. |
| updated_issues_count | 0.5 | 5000 | Number of issues updated in the last 90 days | Indicates high contributor involvement. Lower weight since it depends on project contributors. |
| comment_frequency | 1 | 15 | Average number of comments per issue in the last 90 days | Indicates high user activity and dependence. |
| dependents_count | 2 | 500000 | Number of project mentions in commit messages | Indicates repository use, usually in version rolls. This parameter works across all languages, including C/C++, which don't have package dependency graphs (though it is hack-ish). We plan to add package dependency trees in the near future. |

NOTE:

  • We are looking for community ideas to improve upon these parameters.
  • There will always be exceptions to the individual reasoning rules.
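
As a rough illustration of how the weights and thresholds in the table might combine, the sketch below assumes each signal contributes weight * log(1 + S) / log(1 + max(S, T)) and that the sum is divided by the total weight. The names `PARAMS`, `param_score` and `criticality_score` are chosen for this example and are not the tool's API; clamping the result to [0, 1] is likewise an assumption based on the stated score range.

```python
import math

# (weight, max threshold) per parameter, copied from the table above.
PARAMS = {
    "created_since": (1, 120),
    "updated_since": (-1, 120),
    "contributor_count": (2, 5000),
    "org_count": (1, 10),
    "commit_frequency": (1, 1000),
    "recent_releases_count": (0.5, 26),
    "closed_issues_count": (0.5, 5000),
    "updated_issues_count": (0.5, 5000),
    "comment_frequency": (1, 15),
    "dependents_count": (2, 500000),
}


def param_score(value, weight, max_threshold):
    """Weighted, log-scaled contribution of a single signal."""
    return weight * math.log(1 + value) / math.log(1 + max(value, max_threshold))


def criticality_score(signals):
    """Combine all signals; the clamp to [0, 1] is an assumption."""
    total_weight = sum(weight for weight, _ in PARAMS.values())
    weighted_sum = sum(
        param_score(signals[name], weight, threshold)
        for name, (weight, threshold) in PARAMS.items()
    )
    return max(0.0, min(1.0, weighted_sum / total_weight))


# Signal values taken from the Kubernetes run in the Usage section.
kubernetes = {
    "created_since": 79,
    "updated_since": 0,
    "contributor_count": 3664,
    "org_count": 5,
    "commit_frequency": 102.7,
    "recent_releases_count": 76,
    "closed_issues_count": 2906,
    "updated_issues_count": 5136,
    "comment_frequency": 5.7,
    "dependents_count": 407254,
}

print(round(criticality_score(kubernetes), 4))
```

With these inputs the sketch prints 0.9862, which agrees with the criticality_score reported for Kubernetes in the Usage section. Note that a signal above its threshold (for example 76 releases against a threshold of 26) simply saturates at its full weight.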

Usage

The program only requires one argument to run, the name of the repo:

$ pip3 install criticality-score

$ criticality_score --repo github.com/kubernetes/kubernetes
name: kubernetes
url: https://github.com/kubernetes/kubernetes
language: Go
created_since: 79
updated_since: 0
contributor_count: 3664
org_count: 5
commit_frequency: 102.7
recent_releases_count: 76
closed_issues_count: 2906
updated_issues_count: 5136
comment_frequency: 5.7
dependents_count: 407254
criticality_score: 0.9862

You can add your own parameters to the criticality score calculation. For example, you can add internal project usage data to re-adjust the project's criticality score for your prioritization needs. This can be done by adding the --params <param1_value>:<param1_weight>:<param1_max_threshold> ... argument on the command line.
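
To get a feel for how an extra parameter could shift a score, here is a minimal sketch. It assumes each extra parameter is written as value:weight:max_threshold and is folded into the same weighted, log-scaled average as the built-in signals. `parse_param` and `adjusted_score` are hypothetical helpers for this example, and the base total weight of 8.5 is simply the sum of the weights in the parameter table.

```python
import math


def parse_param(spec):
    """Split a 'value:weight:max_threshold' string into three floats."""
    value, weight, max_threshold = (float(part) for part in spec.split(":"))
    return value, weight, max_threshold


def adjusted_score(base_score, base_total_weight, extra_specs):
    """Fold extra parameters into an already computed score.

    base_score * base_total_weight approximately recovers the weighted
    sum of the built-in signals (ignoring rounding and clamping).
    """
    weighted_sum = base_score * base_total_weight
    total_weight = base_total_weight
    for spec in extra_specs:
        value, weight, max_threshold = parse_param(spec)
        weighted_sum += (
            weight * math.log(1 + value) / math.log(1 + max(value, max_threshold))
        )
        total_weight += weight
    return max(0.0, min(1.0, weighted_sum / total_weight))


# Hypothetical internal-usage signal: value 1000, weight 2, threshold 1000.
print(round(adjusted_score(0.66477, 8.5, ["1000:2:1000"]), 4))
```

A signal that sits at its threshold contributes its full weight, so a heavily weighted internal signal can noticeably raise a mid-range score.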

Authentication

Before running criticality_score, set an access token for the host your repos live on:

  • For GitHub repos, create a GitHub access token and set it in the environment variable GITHUB_AUTH_TOKEN. This helps to avoid GitHub's API rate limits for unauthenticated requests.

# For posix platforms, e.g. linux, mac:
export GITHUB_AUTH_TOKEN=<your access token>

# For windows:
set GITHUB_AUTH_TOKEN=<your access token>

  • For GitLab repos, create a GitLab access token and set it in the environment variable GITLAB_AUTH_TOKEN. This helps to avoid GitLab's API limitations for unauthenticated users.

# For posix platforms, e.g. linux, mac:
export GITLAB_AUTH_TOKEN=<your access token>

# For windows:
set GITLAB_AUTH_TOKEN=<your access token>
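
If you script around the tool, a small preflight check can catch a missing token before any API calls are made. This is only a sketch: `missing_tokens` is a hypothetical helper, and detecting the host by looking for "github.com" or "gitlab.com" in the URL is an assumption that does not cover self-hosted instances.

```python
import os


def missing_tokens(repo_urls, env=os.environ):
    """Return the token variable names that are needed but not set."""
    needed = set()
    for url in repo_urls:
        if "github.com" in url:
            needed.add("GITHUB_AUTH_TOKEN")
        elif "gitlab.com" in url:
            needed.add("GITLAB_AUTH_TOKEN")
    # An empty string counts as unset.
    return sorted(name for name in needed if not env.get(name))


missing = missing_tokens(["github.com/kubernetes/kubernetes"])
if missing:
    print("Please set: " + ", ".join(missing))
```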

Formatting Results

There are three formats currently: default, json, and csv. Others may be added in the future.

These may be specified with the --format flag.
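
The default format, as seen in the Kubernetes example under Usage, is a series of `key: value` lines. For quick scripting, a sketch like the following could turn that text into a dictionary. `parse_default_output` is a hypothetical helper, not part of the tool; for anything beyond ad hoc use, the json or csv formats are likely the safer choice.

```python
def parse_default_output(text):
    """Parse 'key: value' lines into a dict, converting numbers."""
    result = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so URLs keep their 'https://'.
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        for cast in (int, float):
            try:
                result[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            # Neither int nor float: keep the raw string.
            result[key] = value
    return result


sample = """name: kubernetes
url: https://github.com/kubernetes/kubernetes
created_since: 79
criticality_score: 0.9862"""

print(parse_default_output(sample))
```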

Public Data

If you're only interested in seeing a list of critical projects with their criticality score, we publish them in csv format.

This data is available on Google Cloud Storage and can be downloaded via the gsutil command-line tool or the web browser here.

NOTE: Currently, these lists are derived from projects hosted on GitHub ONLY. We plan to expand them in the near future to account for projects hosted on other source control systems.

$ gsutil ls gs://ossf-criticality-score/*.csv
gs://ossf-criticality-score/c_top_200.csv
gs://ossf-criticality-score/cplusplus_top_200.csv
gs://ossf-criticality-score/csharp_top_200.csv
gs://ossf-criticality-score/go_top_200.csv
gs://ossf-criticality-score/java_top_200.csv
gs://ossf-criticality-score/js_top_200.csv
gs://ossf-criticality-score/php_top_200.csv
gs://ossf-criticality-score/python_top_200.csv
gs://ossf-criticality-score/ruby_top_200.csv
gs://ossf-criticality-score/rust_top_200.csv
gs://ossf-criticality-score/shell_top_200.csv

This data is generated using this generator script. For example, to generate a list of top 200 C language projects, run:

$ pip3 install python-gitlab PyGithub
$ python3 -u -m criticality_score.generate \
    --language c --count 200 --sample-size 5000 --output-dir output

We have also aggregated the results for over 100K GitHub repositories (language-independent); they are available for download here.
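
Once a CSV file is downloaded, it can be filtered with the standard library. The sketch below assumes the file has a header row with at least `name` and `criticality_score` columns, mirroring the field names in the command-line output; check the actual header before relying on this. The inline sample reuses scores quoted elsewhere on this page rather than real file contents, and `projects_above` is a hypothetical helper.

```python
import csv
import io


def projects_above(csv_file, min_score):
    """Return (name, score) pairs with score >= min_score, highest first."""
    rows = csv.DictReader(csv_file)
    picked = [
        (row["name"], float(row["criticality_score"]))
        for row in rows
        if float(row["criticality_score"]) >= min_score
    ]
    return sorted(picked, key=lambda item: item[1], reverse=True)


# Stand-in for an opened CSV file; the column layout is an assumption.
sample = io.StringIO(
    "name,url,criticality_score\n"
    "beam,https://github.com/apache/beam,0.8319\n"
    "kubernetes,https://github.com/kubernetes/kubernetes,0.9862\n"
    "geotools,https://github.com/geotools/geotools,0.66477\n"
)

print(projects_above(sample, 0.8))
```

For a real file, replace the `io.StringIO` object with `open("go_top_200.csv", newline="")` or similar.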

Contributing

If you want to get involved or have ideas you'd like to chat about, we discuss this project in the Securing Critical Projects WG meetings.

See the Community Calendar for the schedule and meeting invitations.

See the Contributing documentation for guidance on how to contribute.

Owner
Open Source Security Foundation (OpenSSF)
Comments
  • GeoTools not showing in top 200 for java projects, run criticality score on larger sample set

    I looked at the top 200 Java projects, out of curiosity, to see whether any of the projects I'm working on, like GeoTools, is included in the list. It was not, which is not an issue per se, but then I computed the criticality score from the command line and got this:

    criticality_score --repo "https://github.com/geotools/geotools"
    name: geotools
    url: https://github.com/geotools/geotools
    language: Java
    created_since: 111
    updated_since: 0
    contributor_count: 315
    org_count: 6
    commit_frequency: 9.7
    recent_releases_count: 16
    closed_issues_count: 150
    updated_issues_count: 161
    comment_frequency: 1.0
    dependents_count: 337
    criticality_score: 0.66477
    

    The score alone would place the project at around position 100 of the top 200 projects. Since it's a no-show, I'm wondering whether any other criteria, besides the pure score, are used to include or exclude projects?

  • Use project first commit date for created_since, instead of github project creation date

    For many projects the github creation date might not match the project creation date.

    Would it be better to look at the date of the oldest commit in the repository?

    For example, for OpenSSL the computed created_since value is 95 months, counted from the creation date of the GitHub mirror (2013-01-15T22:34:48Z), but the project is almost 22 years old (the first commit in the master branch dates back to 1998-12-21T10:52:45+00:00)!

    The cap for the field is 10 years anyway, so it's not that bad, but it is still one parameter in the equation that might be adjusted.

    Edit: this also affects other fields (e.g. recent_releases_count) when they are estimated from the time since creation.

    Thoughts?

  • What is dependents_count parameter, looks suspect ?

    I asked for the criticality info on several projects in my industry's ecosystem, and the dependents_count really confuses me and makes me suspicious about how it's computed. Some of the projects I checked are hard dependencies of others, so if transitive dependencies are being properly tracked, the former should always have higher dependents_count than the latter, no? But this is not the case.

    One project that I run is very specialized and is of no use to casual small projects, only making sense as an embedded component of a large open source or commercial app. So while certainly very important in my industry and having a large number of end users touch those things in which it is embedded, I expect it to have a tiny number of directly downstream projects. Yet it has an absurdly, implausibly high dependents_count. Other projects I checked on that I know are directly used by orders of magnitude more projects, have implausibly low dependents_count.

    Is there some kind of verbose mode that prints details that would give us more information about how these scores are computed? Like, more insight into why it thinks a project has few or many dependent projects?

    I should mention that these are C++ projects, so perhaps the means by which dependencies are tracked is very flawed compared to, say, a Python project, which may have a requirements.txt. How is it computed for C++? Has anybody considered promoting a GitHub convention of having a particularly named file serve as a manifest of the other projects a code base depends on? (Informational only, since no C++ build system cares about such things.)

  • Maven and Gradle not in the Top 2000 java list

    Hi,

    I just saw that the Maven and Gradle projects rank as less important than 2000 Java projects that use them as a build tool. Maybe this is because they:

    • are not a declared dependency
    • https://github.com/ossf/criticality_score/issues/14
    • https://github.com/ossf/criticality_score/issues/23
    • external issue tracker
    • All the parts (pluggable, not a dependency!) are split into many repositories
    • Mostly downloaded via maven.org, sdkman, package systems, etc

    Probably the same for other languages and build-tools, but haven’t checked.

  • Installation does not work as described in README

    I get:

    $ pip3 install criticality-score
    Collecting criticality-score
      Could not find a version that satisfies the requirement criticality-score (from versions: )
    No matching distribution found for criticality-score
    
  • Add Watchers/Description Metrics

    I wanted to submit a suggestion to include GitHub Watchers (to help assess popularity) and the GitHub Description (to clarify the project's overall goal). I am currently helping contribute to OSSF's Security Metrics project, in which we are retrieving several of the GitHub metrics covered in this project (but also need to analyze the two mentioned above to help with our overall security assessment). If these can be included via the pull request I have submitted that would be extremely helpful. Thank you!

  • Handle empty repo case

    When running the script, I bumped into the repos below. They pass the filter because of their high star counts, but they are actually empty, so the script throws an exception:

    • https://github.com/fossasia/libregraphics.asia
    • https://github.com/libredesktop/libredesktop-events
    • https://github.com/libredesktop/libredesktop-project-list
    • https://github.com/libredesktop/LibreDesktop-Specs
    • https://github.com/meilix/arch-meilix
    • https://github.com/meilix/deb-meilix
    • https://github.com/meilix/meilix-addons
    • https://github.com/meilix/meilix-art
    • https://github.com/meilix/meilix-connect
    • https://github.com/meilix/meilix-web
    • https://github.com/susiai/susi_partners
    • https://github.com/susiai/susi_sdk
    • https://github.com/ascoders/blog
    • https://github.com/bigdongdongCLUB/newGCP
    • https://github.com/koush/support-wiki
    • https://github.com/mariobehling/ai-packages
    • https://github.com/mariobehling/mb-sandbox
    • https://github.com/meilix/meilix-docs
    • https://github.com/paulirish/devtools-addons
    • https://github.com/QingDaoIT/BlackList
    • https://github.com/zhengzhouqiuzhi/zhengzhouqiuzhi

    To handle it, for GitLab, checking the commits length was enough:

    if len(repo.commits.list()) == 0:
    

    For GitHub, I couldn't find any proper way to understand whether the repo is empty. When we call "get_commits().totalCount", it already throws an exception. What I did is to force it to throw the exception by assigning "totalCount" to an unused variable (I could do it by printing the value as well?). Not an ideal solution, so let me know what you think.

    try:
        repo = get_github_auth_token().get_repo(repo_url)
        # Validate whether the repo is empty: reading totalCount on an
        # empty repo raises a GithubException with status 409.
        total_commits = repo.get_commits().totalCount
    except github.GithubException as exp:
        # 404: repo not found; 409: repo exists but has no commits.
        if exp.status in (404, 409):
            return None
        # Re-raise anything else so `repo` is never used while unset.
        raise
    return GitHubRepository(repo)
    

    Another remark is that we spend one more request from our rate limit when calling "get_commits()" for this validation. I only tested this for GitHub, but I'm assuming it's the same for GitLab.

    Alternatively, we could make all these calls before initializing the repo, do the validations, and pass the results to the repo object as arguments. This would also help us reduce the number of calls to the API, but making these changes would take some time.

    To be able to test my changes, I created empty repos on both GitHub & GitLab btw: https://github.com/coni2k/empty-repo https://gitlab.com/coni2k/empty-repo

    Last, I also added this bit to "generate" script. Otherwise it fails when there are no processed repos:

    if len(stats) == 0:
        return
    
  • Adds repolist command line parameter

    The new --repolist parameter takes the name of a file containing a list of repositories to score.

    usage: run.py [-h] (--repo REPO | --repolist REPOLIST | --local-file L_FILE)
                  [--format {default,csv,json}] [--params PARAMS [PARAMS ...]]

    Gives criticality score for an open source project or a list of projects.

    optional arguments:
      -h, --help            show this help message and exit
      --repo REPO           repository url
      --repolist REPOLIST   listfile of repository urls
      --local-file L_FILE   path of a local csv file with repo stats
      --format {default,csv,json}
                            output format. allowed values are [default, csv, json]
      --params PARAMS [PARAMS ...]
                            Additional parameters in form <value>:<weight>:<max_threshold>

    This at least partially addresses Issue #97

    Signed-off-by: Arnaud J Le Hors

  • why apache/spark isn't in Java top 200 public data?

    Spark has a much higher score than Elasticsearch and Beam, yet Spark is missing from the list while the other two are included. Why?

    apache/spark:

    $ criticality_score --repo github.com/apache/spark
    name: spark
    url: https://github.com/apache/spark
    language: Scala
    created_since: 83
    updated_since: 0
    contributor_count: 2374
    org_count: 4
    commit_frequency: 53.8
    recent_releases_count: 20
    closed_issues_count: 1252
    updated_issues_count: 1456
    comment_frequency: 12.1
    dependents_count: 396346
    criticality_score: 0.96476
    

    elastic/elasticsearch:

    $ criticality_score --repo github.com/elastic/elasticsearch
    name: elasticsearch
    url: https://github.com/elastic/elasticsearch
    language: Java
    created_since: 132
    updated_since: 0
    contributor_count: 1709
    org_count: 3
    commit_frequency: 127.1
    recent_releases_count: 21
    closed_issues_count: 7966
    updated_issues_count: 9234
    comment_frequency: 1.0
    dependents_count: 95320
    criticality_score: 0.88175
    

    apache/beam:

    $ criticality_score --repo github.com/apache/beam
    name: beam
    url: https://github.com/apache/beam
    language: Java
    created_since: 59
    updated_since: 0
    contributor_count: 980
    org_count: 7
    commit_frequency: 67.1
    recent_releases_count: 7
    closed_issues_count: 725
    updated_issues_count: 826
    comment_frequency: 4.3
    dependents_count: 11397
    criticality_score: 0.8319
    
  • How are the top 200 lists computed?

    I am directly responsible for two open source projects. I was shocked to see that one is on your "top 200" list of C++ projects. The other project has been around longer, has more contributors, more PRs, surely has an order of magnitude more downstream users, and in fact has a much higher criticality score. But it's not on the list. I can't quite figure out what the top 200 would be measuring (I would think the 200 projects with the very highest criticality score itself? But apparently not?) for the first project to show up on the list but not the other.

    Can you give any insight about WHAT is being ranked in your "top" lists?

  • Language implementation is less critical than language project generator, create list for TypeScript projects inside JS list.

    tsdx, a TypeScript project generator, appears in the top 200 list for JavaScript packages; however, TypeScript itself does not. That seems somewhat counterintuitive.

  • Publish Docker Images to ghcr.io

    • Published the docker images to ghcr.io
    • Here is an example: https://github.com/nathannaveen?tab=packages&repo_name=criticality_score, https://github.com/nathannaveen/criticality_score/actions/runs/3810049085
    • After https://github.com/ossf/criticality_score/pull/293 gets merged in, I will include docker images for scorer.

    Signed-off-by: nathannaveen

  • Included Docker File for Scorer

    It is important to include a Dockerfile with a command-line interface (Scorer) project for the following reasons:

    • Reproducibility: A Dockerfile allows others to easily build the same environment in which the Scorer was developed and tested. This ensures that the Scorer will behave consistently across different systems.
    • Portability: With a Dockerfile, users can run the Scorer on any system that has Docker installed, regardless of the underlying operating system or dependencies.
    • Collaboration: A Dockerfile allows other developers to easily contribute to the Scorer project by providing a consistent and well-defined environment for development and testing.

    In summary, including a Dockerfile with a CLI project makes it easier to use, test, and collaborate on the project, and ensures that the CLI will behave consistently across different systems.

    Signed-off-by: nathannaveen

  • Included Build Targets for Binaries

    • Included Makefile targets for the binaries
    • Included the check in GitHub CI

    Signed-off-by: nathannaveen

  • CLI seems to require more setup than shown in the documentation

    After installing the CLI (from @main, see #288), I'm trying to run the example shown in the README. However, I'm getting an error:

    $ criticality_score github.com/kubernetes/kubernetes
    > 2022-12-21 11:40:20.631 INFO    Preparing default scorer
    > 2022-12-21 11:40:20.639 ERROR   Failed to create collector      {"error": "init deps.dev source: bigquery: constructing client: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."}
    > main.main
    >       /Users/pnacht/go/pkg/mod/github.com/ossf/[email protected]/cmd/criticality_score/main.go:160
    > runtime.main
    >         /usr/local/go/src/runtime/proc.go:250
    

    Seems to require additional credentials...

    I've also found cmd/criticality_score/README.md, which says we need to log into GCP before using the CLI. Maybe that's what it needs.

    I therefore ran gcloud auth login --update-adc (btw, the README has the command as gcloud login --update-adc, which isn't recognized) and repeated the criticality_score command:

    $ criticality_score github.com/kubernetes/kubernetes
    > 2022-12-21 14:47:20.748 INFO    Preparing default scorer
    > 2022-12-21 14:47:20.750 ERROR   Failed to create collector      {"error": "init deps.dev source: unable to detect projectID, please refer to docs for DetectProjectID"}
    > main.main
    >        /Users/pnacht/go/pkg/mod/github.com/ossf/[email protected]/cmd/criticality_score/main.go:160
    > runtime.main
    >         /usr/local/go/src/runtime/proc.go:250
    

    The error is different now, something about "DetectProjectID"? Looking through the criticality_score codebase, I only found one reference to it and honestly didn't know how to proceed from here.

    What else is required to run the CLI as a standalone?

  • Can only install the standalone CLI from @main

    The README suggests installing the CLI with

    go install github.com/ossf/criticality_score/cmd/criticality_score
    

    However, I get an error:

    go: 'go install' requires a version when current directory is not in a module
    Try 'go install github.com/ossf/criticality_score/cmd/criticality_score@latest' to install the latest version
    

    However, when I then try @latest (and @v1.0.7, the latest release tag), I get another error:

    go: github.com/ossf/criticality_score/cmd/criticality_score@latest: module github.com/ossf/criticality_score@latest found (v1.0.7), but does not contain package github.com/ossf/criticality_score/cmd/criticality_score
    

    The only method I've found that works is using @main, but that's not an optimal solution since it requires that the main branch be in a usable state.
