DonutDB: A SQL database implemented on DynamoDB and SQLite

DonutDB: A SQL database implemented on DynamoDB and SQLite

Store and query a sqlite database directly backed by DynamoDB.

Project Status: Alpha

What is this?

This is a SQL database backed only by DynamoDB. More specifically, this is a SQLite VFS (virtual file system) implemented on top of DynamoDB.

But why?

I believe this is the cheapest way to run a SQL database in AWS (at least for small databases with relatively low query volumes).

My main use case is to have a read/write SQL database that can be used by AWS Lambda functions. It could also be used by software running on EC2 instances.

How do I use this?

You need to create a DynamoDB table with the following properties:

resource "aws_dynamodb_table" "table" {
  name         = "some-dynamo-table-name"
  hash_key     = "hash_key"
  range_key    = "range_key"

  attribute {
    name = "hash_key"
    type = "S"
  }

  attribute {
    name = "range_key"
    type = "N"
  }
}

Then to use in a Go application:

package main

import (
	"database/sql"
	"fmt"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
	_ "github.com/mattn/go-sqlite3"
	"github.com/psanford/sqlite3vfs"
)

func main() {

	dynamoDBclient := dynamodb.New(session.New())

	tableName := "some-dynamo-table-name"

	vfs, err := donutdb.New(dynamoDBclient, tableName)
	if err != nil {
		panic(err)
	}

	// register the custom donutdb vfs with sqlite
	// the name specifed here must match the `vfs` param
	// passed to sql.Open in the dataSourceName:
	// e.g. `...?vfs=donutdb`
	err = sqlite3vfs.RegisterVFS("donutdb", vfs)
	if err != nil {
		panic(err)
	}

	// file0 is the name of the file stored in dynamodb
	// you can have multiple db files stored in a single dynamodb table
	// The `vfs=donutdb` instructs sqlite to use the custom vfs implementation.
	// The name must match the name passed to `sqlite3vfs.RegisterVFS`
	db, err := sql.Open("sqlite3", "file0.db?vfs=donutdb")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS foo (
id text NOT NULL PRIMARY KEY,
title text
)`)
	if err != nil {
		panic(err)
	}

	_, err = db.Exec(`INSERT INTO foo (id, title) values (?, ?)`, "developer-arbitration", "washroom-whitecap")
	if err != nil {
		panic(err)
	}

	var gotID, gotTitle string
	row := db.QueryRow("SELECT id, title FROM foo where id = ?", "developer-arbitration")
	err = row.Scan(&gotID, &gotTitle)
	if err != nil {
		panic(err)
	}

	fmt.Printf("got: id=%s title=%s", gotID, gotTitle)
}

SQLite3 CLI loadable module

DonutDB also has a SQLite3 module in donutdb-loadable. This allows you to interact with DonutDB databases interactively from the SQLite3 CLI.

-- load extension sqlite> .load ./donutdb sqlite3vfs register donutdb sqlite3vfs register donutdb done sqlite> -- open db using vfs=donutdb, note you must use the sqlite uri syntax which starts with file:// sqlite> .open file:///foo.db?vfs=donutdb sqlite> -- query from remote db sqlite> SELECT * from csv where period > '2010' limit 10; series_reference period data_value suppressed status units magntude subject grp series_title_1 -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------------- ---------------------------------------------------- -------------- BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.09 15419 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.12 17088 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.03 18516 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.06 18835 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.09 16390 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.12 18748 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2012.03 18477 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2012.06 18270 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual ">
$ cd donutdb-loadable
$ make
go build -tags SQLITE3VFS_LOADABLE_EXT -o donutloadable.a -buildmode=c-archive donut_loadable.go
rm donutloadable.h
gcc -g -fPIC -shared -o donutdb.so donutloadable.c donutloadable.a

# set the DynamoDB table name:
$ export DONUTDB_TABLE=my-donutdb-table

$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> -- load extension
sqlite> .load ./donutdb
sqlite3vfs register donutdb
sqlite3vfs register donutdb done
sqlite> -- open db using vfs=donutdb, note you must use the sqlite uri syntax which starts with file://
sqlite> .open file:///foo.db?vfs=donutdb
sqlite> -- query from remote db
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference      period      data_value  suppressed  status      units       magntude    subject                    grp                                                   series_title_1
--------------------  ----------  ----------  ----------  ----------  ----------  ----------  -------------------------  ----------------------------------------------------  --------------
BOPQ.S06AC000000000A  2010.03     17463                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.06     17260                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.09     15419                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.12     17088                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.03     18516                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.06     18835                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.09     16390                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.12     18748                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.03     18477                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.06     18270                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual

CLI tool

DonutDB also provides a CLI tool to make it easier to manage SQLite database files in DynamoDB. donutdb-cli allow you to push and pull db files to a DynamoDB table:

$ ./donutdb-cli
DonutDB CLI

Usage:
  donutdb-cli [command]

Available Commands:
  completion  generate the autocompletion script for the specified shell
  debug       Debug commands
  help        Help about any command
  ls          List files in table
  pull        Pull file from DynamoDB to local filesystem
  push        Push file from local filesystem to DynamoDB
  rm          Remove file from dynamodb table

Flags:
  -h, --help   help for donutdb-cli

Use "donutdb-cli [command] --help" for more information about a command.

Is it safe to use concurrently?

It should be. DonutDB currently implements a global lock using DynamoDB locks. This means access to the database is serialized to a single client at a time, which should make it safe for multiple clients without risk of corrupting the data. However this also means client may get "Database locked" errors if clients hold locks for too long.

In the future we may implement a multi-reader single-writer locking strategy.

Performance Considerations

Roundtrip latency to DynamoDB has a major impact on query performance. You probably want to run you application in the same region as your DynamoDB table.

If you are using DonutDB from a Lambda function, you may want to do some testing with how the Lambda function's allocated memory affects query latency (memory size for Lambda also affects cpu allocation). In my testing I've found that at very low memory (128mb) application latency is affected by GC and CPU overhead for zstd decompression. Performance gets significantly better as memory size is increased.

DynamoDB Schema

The basic idea is that all data and metadata will be stored in a single dynamodb table for the vfs. The goal of using a single dynamodb table is to make the setup as easy as possible.

The current plan for the schema is as follows:

Dynamo Table: HashKey string SortKey int

Data Types stored in the dynamo table:

  • File metadata This contains a mapping of filename to metadata. Each file gets a random_id that is part of the hash_key for the file data and lock row. The random_id allows for deleting a file atomically by simply removing the metadata record. The metadata also includes the sector size used for the file. This allows for changing the sector size default in the future without breaking existing file records. File metadata is stored in a single row with a hash_key of file-meta-v1 and a range_key of 0. The filename is the attribute name and the metadata is stored as JSON in the attribute value.

  • File data This is where the bytes for each file is stored. The primary key for a file will be file-v1-${rand_id}-${filename}. Each file will be split into 4k chunks (sectors). The Sort Key is the position in the file at the sector boundary. If a sector exists, all previous sectors must also exist in the table. The bytes for a sector are stored in the attribute named "bytes". That attribute must have exactly 4k bytes, unless it is the final sector. The final sector should stop where the file stops.

  • Lock data This is where looks are stored for coordination. The current implementation uses a single global lock, similar to the sqlite flock and dot-lock implementations. The primary key for the global lock is lock-global-v1-${rand_id}-${filename} with a sort key of 0.

It should be possible to implement multi-reader single writer locks on top of dynamodb in the future.

Comments
  • Make FullPathname always prefix paths with '/'

    Make FullPathname always prefix paths with '/'

    This mainly makes it easier to work with the loadable module from the sqlite3 cli and the sqlite URI syntax. The URI syntax is always an absolute path so will always begin with the path separator.

    Without this, you can accidentally create db files in dynamo that you would be unable to access from the loadable module.

  • Reduce memory allocations and zstd cpu time

    Reduce memory allocations and zstd cpu time

    Previously we were allocating a new zstd reader for every read. This has significant overhead and was completely unnecessary since we do a read of the entire sector.

    We now use the DecodeAll stateless zstd interface. We also pre-allocate the correct uncompressed sector size to avoid extra allocations.

    On my real application I'm testing against this dropped read latency by half. Read performance is now the same as using sqlite3vfshttp from s3.

  • Compress sectors

    Compress sectors

    From testing against production dynamodb tables, attribute size makes a large difference in query latency. The data that I'm working with is mostly text and compresses well. Also, at least with young sqlite database files, they start off fairly sparse.

    We store the compression algorithm in the file metadata so if we want to upgrade the algorithm at some point in the future we can do so easily.

  • Store metadata

    Store metadata

    Add back separate row for file metadata.

    This currently serves two purposes:

    It allows us to add a level of indirection between the filename and the file data. This makes things like rm file safer, as we can now delete the reference to the file and then we don't need to worry too much about errors occurring while deleting the data (we still care about deleting the file data so we don't leak rows).

    It also allows us to track the sector size on a per file basis. I'm not sure yet what the optimal sector size is. I'd like to be able to experiment with different sizes without making old files inaccessible. It is not currently possible to change the sector size of an existing file.

  • donutdb cli should allow you to specify the path to download a database

    donutdb cli should allow you to specify the path to download a database

    I've been having some issues with downloading the databases from dynamodb after making changes to them. The donutdb cli command wants you to specify a database name and will try to write to that path even if it is in root. The cli pull command should include 1 more optional argument as a path to write the database to. The vfs setup requires an absolute path it seems.

    It also by default marks any database file as executable. So the file permissions for anything written out should be 0644 instead of -rwxr-xr-x.

  • This is so cool!! Can I help?

    This is so cool!! Can I help?

    I don't know how this project hasn't gotten more attention. This is such a neat way to use sqlite! It's exactly what I've been looking for for random stuff I want to get set up on Lambda.

    I know Go pretty well and I'm reasonably familiar with DynamoDB, but I'm somewhat new to sqlite internals and such. Do you have a todo list for this project somewhere that I can help with somehow?

    In particular, I'm very interested in supporting lots of concurrent readers. One approach I thought about was to use DynamoDB replication to replicate the table into a separate read-only copy of the database and in Donut, somehow designate that second table as read-only. Sort of the same idea as having a read replica on MySQL or Postgres: I'm not sure on the DonutDB specifics, but I think you wouldn't have to do any locking if there was a guarantee that Donut would only be reading from the DB, right?

  • Add basic profiling info to readme

    Add basic profiling info to readme

    DonutDB is slower than a local SQLite file, but how much slower? It'd be nice to at least quantify this a little bit so people have a rough idea of the performance difference.

    Maybe start with the queries listed here: https://www.sqlite.org/speed.html.

    Since we use a global lock, its probably not worth profiling concurrent write workloads (donutdb those will all get serialized). I hope this is already clear in the readme, but its probably worth mentioning again alongside this data.

Related tags
Mantil-template-form-to-dynamodb - Receive form data and write it to a DynamoDB table
Mantil-template-form-to-dynamodb - Receive form data and write it to a DynamoDB table

This template is an example of serverless integration between Google Forms and DynamoDB

Jan 17, 2022
"Go SQL DB" is a relational database that supports SQL queries for research purposes

A pure golang SQL database for database theory research

Jan 6, 2023
Beerus-DB: a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic operations

Beerus-DB · Beerus-DB is a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic

Oct 29, 2022
The lightweight, distributed relational database built on SQLite.
The lightweight, distributed relational database built on SQLite.

rqlite is a lightweight, distributed relational database, which uses SQLite as its storage engine. Forming a cluster is very straightforward, it grace

Jan 5, 2023
Terraform provider for SQLite database engine

Terraform provider for SQLite database engine !!! WARNING !!! This is an educational project. Not intended for any production use! !!! WARNING !!! Her

Jun 11, 2022
A go library for testing Amazon DynamoDB.

minidyn Amazon DynamoDB testing library written in Go. Goals Make local testing for DynamoDB as accurate as possible. Run DynamoDB tests in a CI witho

Nov 9, 2022
A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

go-mysql-server is a SQL engine which parses standard SQL (based on MySQL syntax) and executes queries on data sources of your choice. A simple in-memory database and table implementation are provided, and you can query any data source you want by implementing a few interfaces.

Dec 27, 2022
☄ The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct.
☄ The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct.

Gormat - Cross platform gopher tool The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct. 中文说明 Features Data

Dec 20, 2022
Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository. Connect to Dolt just like any MySQL database to run queries or update the data using SQL commands. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate's changes.

Dec 31, 2022
A SQLite-based hierarchical key-value store written in Go

camellia ?? A lightweight hierarchical key-value store camellia is a Go library that implements a simple, hierarchical, persistent key-value store, ba

Nov 9, 2022
Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures.

Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures. capabilities which owl provides include Process approval、sql Audit、sql execute and execute as crontab、data backup and recover .

Nov 9, 2022
CockroachDB - the open source, cloud-native distributed SQL database.
CockroachDB - the open source, cloud-native distributed SQL database.

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. What is CockroachDB? Docs Quickstart C

Jan 2, 2023
A decentralized, trusted, high performance, SQL database with blockchain features
A decentralized, trusted, high performance, SQL database with blockchain features

中文简介 CovenantSQL(CQL) is a Byzantine Fault Tolerant relational database built on SQLite: ServerLess: Free, High Availabile, Auto Sync Database Service

Jan 3, 2023
LBADD: An experimental, distributed SQL database
LBADD: An experimental, distributed SQL database

LBADD Let's build a distributed database. LBADD is an experimental distributed SQL database, written in Go. The goal of this project is to build a dat

Nov 29, 2022
A course to build the SQL layer of a distributed database.

TinySQL TinySQL is a course designed to teach you how to implement a distributed relational database in Go. TinySQL is also the name of the simplifed

Jan 8, 2023
This is a simple Golang application that executes SQL commands to clean up a mirror node's database.

This is a simple Golang application that executes SQL commands to clean up a mirror node's database.

Jan 24, 2022
Hard Disk Database based on a former database

Hard Disk Database based on a former database

Nov 1, 2021
Simple key value database that use json files to store the database

KValDB Simple key value database that use json files to store the database, the key and the respective value. This simple database have two gRPC metho

Nov 13, 2021
Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on commands and key-regexes written by GO

Welcome to NIPO Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on command

Dec 28, 2022