A virtual file system for small- to medium-sized datasets (MB or GB, not TB or PB). Like Docker, but for data.

AetherFS assists in the production, distribution, and replication of embedded databases and in-memory datasets. You can think of it like Docker, but for data.

AetherFS provides engineers with a platform to manage collections of files called datasets. It optimizes its use of the underlying blob store (AWS S3 or equivalent) to reduce cost to operators and improve performance for end users.

Why not use S3 directly or a file server?

While this is an option, it has several problems. For example, to produce two references to the same dataset, you must upload the same set of files twice; three references require three uploads, and so on. Each extra upload adds time to your pipeline and increases storage costs.

Instead, producers tag datasets in AetherFS. A tag can refer to a specific version (semantic or calendar) or a channel that consumers can subscribe to (latest, stable, etc.). Rather than storing an entire snapshot of a dataset for each version, AetherFS removes blocks that are duplicated between versions. This allows clients to re-use blocks of data and only download new or updated portions.
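
To make the tagging and block re-use idea concrete, here is a minimal Go sketch of a content-addressed block store. It is an illustration only, not AetherFS's actual implementation: the Store type, its methods, the fixed 4-byte block size, and the choice of SHA-256 are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockSize is a hypothetical, deliberately tiny block size so the example
// is easy to follow. A real system would use much larger blocks.
const blockSize = 4

// Store is a hypothetical in-memory stand-in for a blob store such as S3.
type Store struct {
	blocks map[string][]byte   // block contents keyed by SHA-256 digest
	tags   map[string][]string // tag name -> ordered list of block digests
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blocks: map[string][]byte{}, tags: map[string][]string{}}
}

// Put splits data into fixed-size blocks, stores only the blocks that are not
// already present, records the block list under tag, and returns the number
// of blocks that were newly stored.
func (s *Store) Put(tag string, data []byte) int {
	added := 0
	manifest := make([]string, 0, (len(data)+blockSize-1)/blockSize)
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[start:end])
		key := hex.EncodeToString(sum[:])
		if _, ok := s.blocks[key]; !ok {
			s.blocks[key] = append([]byte(nil), data[start:end]...)
			added++
		}
		manifest = append(manifest, key)
	}
	s.tags[tag] = manifest
	return added
}

// Tag points a new tag at the blocks of an existing tag without storing
// any additional data. It reports whether the existing tag was found.
func (s *Store) Tag(existing, name string) bool {
	manifest, ok := s.tags[existing]
	if !ok {
		return false
	}
	s.tags[name] = manifest
	return true
}

// Get reassembles the data referenced by tag from its blocks.
func (s *Store) Get(tag string) ([]byte, bool) {
	manifest, ok := s.tags[tag]
	if !ok {
		return nil, false
	}
	var out []byte
	for _, key := range manifest {
		out = append(out, s.blocks[key]...)
	}
	return out, true
}

func main() {
	s := NewStore()
	fmt.Println(s.Put("v21.10", []byte("aaaabbbbcccc"))) // 3: every block is new
	fmt.Println(s.Put("v22.02", []byte("aaaabbbbdddd"))) // 1: only "dddd" is new
	s.Tag("v22.02", "latest")                            // a second reference, no upload
	data, _ := s.Get("latest")
	fmt.Println(string(data)) // aaaabbbbdddd
}
```

In this sketch, publishing a second version costs one new block instead of three, and adding the latest channel costs no block storage at all, which is the saving the paragraphs above describe.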

Status

This project is under active development. The lists below detail aspirational features and documentation.

  • Documentation
  • Features
    • HTTP file server for ease of interaction
    • REST and gRPC APIs for programmatic interaction
    • Optional agent that can manage a shared or FUSE file system
    • Efficiently persist and query information stored in AWS S3
    • Authenticate using common schemes (such as OIDC)
    • Enforce access control around datasets
    • Encrypt data in transit and at rest
    • Built-in developer tools to help understand dataset performance and usage

Expectations & Roadmap

Since I'm mostly iterating on this project in my free time, I plan on using calendar versioning. Bug fixes and minor features can be introduced in any patch version, but any major feature should wait for the next release. Releases happen in October, February, and June (every 4 months). Any security issues will be addressed in a timely manner, regardless of release schedule.

v21.10

This will be the initial release of AetherFS. It includes the "essentials".

  • Single binary containing all components.
  • Command to run an AetherFS data hub.
  • Command to upload to and tag datasets in AetherFS.
  • Command to download tagged datasets from AetherFS.
  • Minimal web interface.

v22.02

As the second major release of the AetherFS system, this will include additional security measures and help simplify interaction for end users (provided there's interest in the system).

  • Command to run an agent process with a FUSE file system.
  • Block caching to improve performance and usage of S3.
  • Command to authenticate clients.
  • Enforce access controls around datasets.
  • Data encrypted in transit.

Owner

mya
Principal Software Engineer (Golang, NodeJS, Java, Python) | homesteader | she/her

Comments

  • feat(file-server): setup an HTTP file server to make it easy to work with files

    Go's out-of-the-box http.FileServer allows us to serve file content efficiently over HTTP. It supports range requests and If-Modified-Since semantics. This glues together the http.FileServer API and the underlying dataset and block APIs.

  • issues with NFS handle

    So I'm pretty sure this comes from the current implementation of a caching handler. Should the NFS client connection ever change servers, the caching handler returns a stale connection.

    Option one here would be to support service topology routing that favors host communication exclusively in the daemonset deployment and region/zone communication in the deployment. This is really just a mitigation though. To handle this properly, we'll need to replace the caching handler.

    There are a few things we can do to cut down on the number of handles that are created.

  • issues with bleve / bolt on read only NFS

    Alright, this was a rather aspirational first use case... but I figured I'd document it.

    When using the default storage engine with blevesearch (boltdb, specifically bbolt), mounting a read-only NFS volume interferes with the technology's ability to obtain file locks (even in read-only mode, which I find odd). I need to dig into this a bit more to determine whether it's an NFS limitation or a consequence of how boltdb manages locks...

    Ideally, this solution is a great fit for engines that do not need locks for read-only connections...

  • swap "dataset" with "artifact"

    I spent a bit more time thinking about the release asset distribution use case. I want to dig into that a bit more, but it seems like there would be a fair amount duplicated between releases.
