Kitten is a distributed file system optimized for small file storage, inspired by Facebook's Haystack.

kitten


What is kitten

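Kitten is a distributed file system built for storing small files at large scale. Its core architecture follows Facebook's Haystack, and it borrows many optimizations from bfs. (This is a learning project only and has not been validated in a production environment.)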

Features

Quick Start

Introduction

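Traditional file systems hit a metadata I/O bottleneck when storing large numbers of small files: each read first needs one I/O to locate the metadata and then another to fetch the actual file through that metadata. Much of what the metadata stores, such as permissions and access times, may also be of no use. When the number of small files is very large, the metadata for a file can be about as large as the file's data, which wastes a great deal of space.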

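Kitten addresses this from two directions:

  1. Sequential writes: because traditional mechanical hard disks involve mechanical actions such as seeking and rotation, sequential writes perform far better than random writes, so Kitten writes are designed as sequential appends.
  2. Metadata: Kitten appends all small files into one large file. This introduces two concepts, Superblock and Needle. A Superblock is a very large block that gathers the sequentially written small files, and a Needle is each small file inside it. A read only needs the offset and size of each Needle, which are maintained in memory, to locate the corresponding file.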
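
The Superblock and Needle idea can be illustrated with a minimal Go sketch. It is not Kitten's actual implementation: the names `Superblock`, `needleMeta`, `Append`, and `Read` are hypothetical, and a byte slice stands in for the large on-disk file so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// needleMeta is the in-memory index entry for one needle: where it starts
// inside the superblock and how many bytes it spans.
type needleMeta struct {
	offset int64
	size   int64
}

// Superblock is a hypothetical append-only container. The data slice is an
// assumption standing in for the large file on disk.
type Superblock struct {
	data  []byte
	index map[uint64]needleMeta
}

// ErrNotFound is returned when no needle is indexed under the given key.
var ErrNotFound = errors.New("needle not found")

// NewSuperblock returns an empty superblock with an empty index.
func NewSuperblock() *Superblock {
	return &Superblock{index: make(map[uint64]needleMeta)}
}

// Append writes the payload at the current end of the superblock (a purely
// sequential write) and records its offset and size in the in-memory index.
func (s *Superblock) Append(key uint64, payload []byte) {
	off := int64(len(s.data))
	s.data = append(s.data, payload...)
	s.index[key] = needleMeta{offset: off, size: int64(len(payload))}
}

// Read looks up offset and size in memory, then copies exactly that byte
// range out of the superblock. No separate metadata I/O is needed.
func (s *Superblock) Read(key uint64) ([]byte, error) {
	m, ok := s.index[key]
	if !ok {
		return nil, ErrNotFound
	}
	out := make([]byte, m.size)
	copy(out, s.data[m.offset:m.offset+m.size])
	return out, nil
}

func main() {
	sb := NewSuperblock()
	sb.Append(1, []byte("cat.jpg bytes"))
	sb.Append(2, []byte("dog.jpg bytes"))

	got, err := sb.Read(2)
	fmt.Println(string(got), err) // prints "dog.jpg bytes <nil>"
}
```

In a real store the slice would presumably be a file opened for appending and the read would be a positioned read at the indexed offset; the index would also have to be rebuilt or persisted across restarts, which this sketch leaves out.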

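Kitten suits files that are written once, never updated, read occasionally, and rarely deleted.

Kitten's design goals are high throughput with low latency, fault tolerance, low cost, and a simple architecture.

To meet these goals, Kitten consists of the following modules: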

Proxy

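Proxy is the user-facing module. It hides Kitten's internal operations and exposes three simple APIs: get, post, and delete, which stand for read, write, and delete respectively. Downstream of Proxy, all communication goes through gRPC.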

Directory

Cache

Store

Roadmap

| Name | Issue | Description |
| Kitten's basic components | #1 | Implement the basic components, including Store, Cache, and Directory. |
| Introduce Etcd | #2 | Introduce Etcd for distributed management. |
| Expose easy APIs | #3 | Find an elegant way to expose APIs. |
| Support S3 API | #4 | As S3 APIs are the de facto standard for OSS, support S3-style APIs. |
| Implement erasure code | #5 | Split data into two groups (hot/warm) and use erasure coding to store warm data. |

Comments
  • Introduce etcd

    As Kitten is a distributed file system, we need a component for distributed management. Here we chose etcd (https://github.com/etcd-io/etcd).

  • Implement basic components

    Implement the basic components from Haystack, including Store, Cache, and Directory.

    Paper reference: Finding a needle in Haystack (https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf)
