DonutDB: A SQL database implemented on DynamoDB and SQLite

Last update: Dec 21, 2022

Comments: 7

DonutDB: A SQL database implemented on DynamoDB and SQLite

Store and query a sqlite database directly backed by DynamoDB.

Project Status: Alpha

What is this?

This is a SQL database backed only by DynamoDB. More specifically, this is a SQLite VFS (virtual file system) implemented on top of DynamoDB.

But why?

I believe this is the cheapest way to run a SQL database in AWS (at least for small databases with relatively low query volumes).

My main use case is to have a read/write SQL database that can be used by AWS Lambda functions. It could also be used by software running on EC2 instances.

How do I use this?

You need to create a DynamoDB table with the following properties:

resource "aws_dynamodb_table" "table" {
  name         = "some-dynamo-table-name"
  hash_key     = "hash_key"
  range_key    = "range_key"

  attribute {
    name = "hash_key"
    type = "S"
  }

  attribute {
    name = "range_key"
    type = "N"
  }
}

Then to use in a Go application:

package main

import (
	"database/sql"
	"fmt"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
	_ "github.com/mattn/go-sqlite3"
	"github.com/psanford/sqlite3vfs"
)

func main() {

	dynamoDBclient := dynamodb.New(session.New())

	tableName := "some-dynamo-table-name"

	vfs, err := donutdb.New(dynamoDBclient, tableName)
	if err != nil {
		panic(err)
	}

	// register the custom donutdb vfs with sqlite
	// the name specifed here must match the `vfs` param
	// passed to sql.Open in the dataSourceName:
	// e.g. `...?vfs=donutdb`
	err = sqlite3vfs.RegisterVFS("donutdb", vfs)
	if err != nil {
		panic(err)
	}

	// file0 is the name of the file stored in dynamodb
	// you can have multiple db files stored in a single dynamodb table
	// The `vfs=donutdb` instructs sqlite to use the custom vfs implementation.
	// The name must match the name passed to `sqlite3vfs.RegisterVFS`
	db, err := sql.Open("sqlite3", "file0.db?vfs=donutdb")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS foo (
id text NOT NULL PRIMARY KEY,
title text
)`)
	if err != nil {
		panic(err)
	}

	_, err = db.Exec(`INSERT INTO foo (id, title) values (?, ?)`, "developer-arbitration", "washroom-whitecap")
	if err != nil {
		panic(err)
	}

	var gotID, gotTitle string
	row := db.QueryRow("SELECT id, title FROM foo where id = ?", "developer-arbitration")
	err = row.Scan(&gotID, &gotTitle)
	if err != nil {
		panic(err)
	}

	fmt.Printf("got: id=%s title=%s", gotID, gotTitle)
}

SQLite3 CLI loadable module

DonutDB also has a SQLite3 module in donutdb-loadable. This allows you to interact with DonutDB databases interactively from the SQLite3 CLI.

-- load extension sqlite> .load ./donutdb sqlite3vfs register donutdb sqlite3vfs register donutdb done sqlite> -- open db using vfs=donutdb, note you must use the sqlite uri syntax which starts with file:// sqlite> .open file:///foo.db?vfs=donutdb sqlite> -- query from remote db sqlite> SELECT * from csv where period > '2010' limit 10; series_reference period data_value suppressed status units magntude subject grp series_title_1 -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------------- ---------------------------------------------------- -------------- BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.09 15419 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2010.12 17088 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.03 18516 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.06 18835 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.09 16390 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2011.12 18748 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2012.03 18477 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual BOPQ.S06AC000000000A 2012.06 18270 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual ">

$ cd donutdb-loadable
$ make
go build -tags SQLITE3VFS_LOADABLE_EXT -o donutloadable.a -buildmode=c-archive donut_loadable.go
rm donutloadable.h
gcc -g -fPIC -shared -o donutdb.so donutloadable.c donutloadable.a

# set the DynamoDB table name:
$ export DONUTDB_TABLE=my-donutdb-table

$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> -- load extension
sqlite> .load ./donutdb
sqlite3vfs register donutdb
sqlite3vfs register donutdb done
sqlite> -- open db using vfs=donutdb, note you must use the sqlite uri syntax which starts with file://
sqlite> .open file:///foo.db?vfs=donutdb
sqlite> -- query from remote db
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference      period      data_value  suppressed  status      units       magntude    subject                    grp                                                   series_title_1
--------------------  ----------  ----------  ----------  ----------  ----------  ----------  -------------------------  ----------------------------------------------------  --------------
BOPQ.S06AC000000000A  2010.03     17463                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.06     17260                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.09     15419                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.12     17088                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.03     18516                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.06     18835                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.09     16390                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.12     18748                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.03     18477                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.06     18270                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual

CLI tool

DonutDB also provides a CLI tool to make it easier to manage SQLite database files in DynamoDB. donutdb-cli allow you to push and pull db files to a DynamoDB table:

$ ./donutdb-cli
DonutDB CLI

Usage:
  donutdb-cli [command]

Available Commands:
  completion  generate the autocompletion script for the specified shell
  debug       Debug commands
  help        Help about any command
  ls          List files in table
  pull        Pull file from DynamoDB to local filesystem
  push        Push file from local filesystem to DynamoDB
  rm          Remove file from dynamodb table

Flags:
  -h, --help   help for donutdb-cli

Use "donutdb-cli [command] --help" for more information about a command.

Is it safe to use concurrently?

It should be. DonutDB currently implements a global lock using DynamoDB locks. This means access to the database is serialized to a single client at a time, which should make it safe for multiple clients without risk of corrupting the data. However this also means client may get "Database locked" errors if clients hold locks for too long.

In the future we may implement a multi-reader single-writer locking strategy.

Performance Considerations

Roundtrip latency to DynamoDB has a major impact on query performance. You probably want to run you application in the same region as your DynamoDB table.

If you are using DonutDB from a Lambda function, you may want to do some testing with how the Lambda function's allocated memory affects query latency (memory size for Lambda also affects cpu allocation). In my testing I've found that at very low memory (128mb) application latency is affected by GC and CPU overhead for zstd decompression. Performance gets significantly better as memory size is increased.

DynamoDB Schema

The basic idea is that all data and metadata will be stored in a single dynamodb table for the vfs. The goal of using a single dynamodb table is to make the setup as easy as possible.

The current plan for the schema is as follows:

Dynamo Table: HashKey string SortKey int

Data Types stored in the dynamo table:

File metadata This contains a mapping of filename to metadata. Each file gets a random_id that is part of the hash_key for the file data and lock row. The random_id allows for deleting a file atomically by simply removing the metadata record. The metadata also includes the sector size used for the file. This allows for changing the sector size default in the future without breaking existing file records. File metadata is stored in a single row with a hash_key of file-meta-v1 and a range_key of 0. The filename is the attribute name and the metadata is stored as JSON in the attribute value.
File data This is where the bytes for each file is stored. The primary key for a file will be file-v1-${rand_id}-${filename}. Each file will be split into 4k chunks (sectors). The Sort Key is the position in the file at the sector boundary. If a sector exists, all previous sectors must also exist in the table. The bytes for a sector are stored in the attribute named "bytes". That attribute must have exactly 4k bytes, unless it is the final sector. The final sector should stop where the file stops.
Lock data This is where looks are stored for coordination. The current implementation uses a single global lock, similar to the sqlite flock and dot-lock implementations. The primary key for the global lock is lock-global-v1-${rand_id}-${filename} with a sort key of 0.

It should be possible to implement multi-reader single writer locks on top of dynamodb in the future.

Comments

Make FullPathname always prefix paths with '/'

This mainly makes it easier to work with the loadable module from the sqlite3 cli and the sqlite URI syntax. The URI syntax is always an absolute path so will always begin with the path separator.

Without this, you can accidentally create db files in dynamo that you would be unable to access from the loadable module.
Reduce memory allocations and zstd cpu time

Previously we were allocating a new zstd reader for every read. This has significant overhead and was completely unnecessary since we do a read of the entire sector.

We now use the DecodeAll stateless zstd interface. We also pre-allocate the correct uncompressed sector size to avoid extra allocations.

On my real application I'm testing against this dropped read latency by half. Read performance is now the same as using sqlite3vfshttp from s3.
Compress sectors

From testing against production dynamodb tables, attribute size makes a large difference in query latency. The data that I'm working with is mostly text and compresses well. Also, at least with young sqlite database files, they start off fairly sparse.

We store the compression algorithm in the file metadata so if we want to upgrade the algorithm at some point in the future we can do so easily.
Store metadata

Add back separate row for file metadata.

This currently serves two purposes:

It allows us to add a level of indirection between the filename and the file data. This makes things like rm file safer, as we can now delete the reference to the file and then we don't need to worry too much about errors occurring while deleting the data (we still care about deleting the file data so we don't leak rows).

It also allows us to track the sector size on a per file basis. I'm not sure yet what the optimal sector size is. I'd like to be able to experiment with different sizes without making old files inaccessible. It is not currently possible to change the sector size of an existing file.
donutdb cli should allow you to specify the path to download a database

I've been having some issues with downloading the databases from dynamodb after making changes to them. The donutdb cli command wants you to specify a database name and will try to write to that path even if it is in root. The cli pull command should include 1 more optional argument as a path to write the database to. The vfs setup requires an absolute path it seems.

It also by default marks any database file as executable. So the file permissions for anything written out should be 0644 instead of -rwxr-xr-x.
This is so cool!! Can I help?

I don't know how this project hasn't gotten more attention. This is such a neat way to use sqlite! It's exactly what I've been looking for for random stuff I want to get set up on Lambda.

I know Go pretty well and I'm reasonably familiar with DynamoDB, but I'm somewhat new to sqlite internals and such. Do you have a todo list for this project somewhere that I can help with somehow?

In particular, I'm very interested in supporting lots of concurrent readers. One approach I thought about was to use DynamoDB replication to replicate the table into a separate read-only copy of the database and in Donut, somehow designate that second table as read-only. Sort of the same idea as having a read replica on MySQL or Postgres: I'm not sure on the DonutDB specifics, but I think you wouldn't have to do any locking if there was a guarantee that Donut would only be reading from the DB, right?
Add basic profiling info to readme

DonutDB is slower than a local SQLite file, but how much slower? It'd be nice to at least quantify this a little bit so people have a rough idea of the performance difference.

Maybe start with the queries listed here: https://www.sqlite.org/speed.html.

Since we use a global lock, its probably not worth profiling concurrent write workloads (donutdb those will all get serialized). I hope this is already clear in the readme, but its probably worth mentioning again alongside this data.

DonutDB: A SQL database implemented on DynamoDB and SQLite

DonutDB: A SQL database implemented on DynamoDB and SQLite

What is this?

But why?

How do I use this?

SQLite3 CLI loadable module

CLI tool

Is it safe to use concurrently?

Performance Considerations

DynamoDB Schema

Owner

Peter Sanford

Comments

Make FullPathname always prefix paths with '/'

Reduce memory allocations and zstd cpu time

Compress sectors

Store metadata

donutdb cli should allow you to specify the path to download a database

This is so cool!! Can I help?

Add basic profiling info to readme

Related tags

Mantil-template-form-to-dynamodb - Receive form data and write it to a DynamoDB table

"Go SQL DB" is a relational database that supports SQL queries for research purposes

Beerus-DB: a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic operations

The lightweight, distributed relational database built on SQLite.

Terraform provider for SQLite database engine

A go library for testing Amazon DynamoDB.

A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

☄ The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct.

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.

A SQLite-based hierarchical key-value store written in Go

Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures.

CockroachDB - the open source, cloud-native distributed SQL database.

A decentralized, trusted, high performance, SQL database with blockchain features

LBADD: An experimental, distributed SQL database

A course to build the SQL layer of a distributed database.

This is a simple Golang application that executes SQL commands to clean up a mirror node's database.

Hard Disk Database based on a former database

Simple key value database that use json files to store the database

Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on commands and key-regexes written by GO