Streaming replication for SQLite.

Litestream is a standalone streaming replication tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file or S3. Litestream communicates with SQLite only through the SQLite API, so it will not corrupt your database.
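As a quick orientation, the workflow is a single binary with replicate and restore subcommands; a minimal sketch, with placeholder paths and bucket name:

```shell
# Continuously replicate a local SQLite database to S3
# (paths and bucket name are placeholders).
litestream replicate /var/lib/app/db.sqlite3 s3://my-bucket/db

# Later, rebuild the database from the latest snapshot plus WAL segments.
litestream restore -o /var/lib/app/db-restored.sqlite3 s3://my-bucket/db
```

See the web site for the full set of flags and configuration options.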

If you need support or have ideas for improving Litestream, please join the Litestream Slack or visit the GitHub Discussions. Please visit the Litestream web site for installation instructions and documentation.

If you find this project interesting, please consider starring the project on GitHub.

Acknowledgements

While the Litestream project does not accept external code patches, many of the most valuable contributions come in the form of testing, feedback, and documentation. These help harden the software and streamline usage for other users.

I want to give special thanks to individuals who invest much of their time and energy into the project to help make it better:

Open-source, not open-contribution

Similar to SQLite, Litestream is open source but closed to code contributions. This keeps the code base free of proprietary or licensed code, and it also helps me continue to maintain and build Litestream.

As the author of BoltDB, I found that accepting and maintaining third-party patches contributed to my burnout, and I eventually archived the project. Writing databases and low-level replication tools involves nuance, and simple one-line changes can have profound and unexpected effects on correctness and performance. Even small contributions typically required hours of my time to properly test and validate.

I am grateful for community involvement, bug reports, and feature requests. I do not wish to come off as anything but welcoming; however, I've made the decision to keep this project closed to contributions for my own mental health and the long-term viability of the project.

The documentation repository is MIT licensed and pull requests are welcome there.

Comments
  • Live read replicas


    Currently, Litestream replicates a database to cold storage (file or S3). It would be nice to provide a way to replicate streams of WAL frames to other live nodes to serve as read replicas.

    This would require a network replication endpoint (probably HTTP), as well as figuring out how to apply WAL writes to a live database.

  • Would it be possible to create an OpenBSD binary with each new release of litestream?


    I am not too familiar with how cross-compilation works in this case, but I am wondering if it'd be too much work?

    If a build VM is needed then I am willing to provide one :)

  • Google Cloud Run: cannot apply wal: disk I/O error: invalid argument


    Hi!

    First of all, thank you for a great piece of software.

    It might be a terrible idea to try to run litestream and SQLite on Cloud Run, so feel free to close this issue πŸ‘

    Problem

    I'm trying to get the litestream+s6 example working on Google Cloud Run (from https://github.com/benbjohnson/litestream-s6-example). If I run the container on Cloud Run, I get the error below:

    s3: restoring snapshot 9e3296261a575b63/00000000 to /tmp/db.tmp
    s3: restoring wal files: generation=9e3296261a575b63 index=[00000000,00000001]
    s3: downloaded wal 9e3296261a575b63/00000001 elapsed=278.371883ms
    s3: downloaded wal 9e3296261a575b63/00000000 elapsed=355.99923ms
    cannot apply wal: disk I/O error: invalid argument
    [cont-init.d] 00-litestream: exited 1.
    

    If I run the container locally, it works fine:

    ...
    s3: downloaded wal 9e3296261a575b63/00000000 elapsed=832.108ms
    s3: applied wal 9e3296261a575b63/00000000 elapsed=11.0846ms
    s3: applied wal 9e3296261a575b63/00000001 elapsed=10.6193ms
    s3: renaming database from temporary location
    [cont-init.d] 00-litestream: exited 0.
    

    Any idea what might be the problem?

    I'm guessing this might be due to Cloud Run's in-memory file system, but I don't know how to fix it (https://cloud.google.com/appengine/docs/standard/go/using-temp-files).

  • cannot verify wal state: ...-wal: no such file or directory


    I'm testing out a litestream setup on k8s, where a python app container and litestream container share the db via a local volume. The db is created via an init container that just runs CREATE TABLE before either of them start.

    Here is what init looks like:

    litestream v0.3.2
    initialized db: /db/hello.sqlite3
    replicating to: name="s3" type="s3" bucket="..." path="hello.sqlite3" region=""
    litestream /db/hello.sqlite3: init: cannot determine last wal position, clearing generation (primary wal header: EOF)
    litestream /db/hello.sqlite3: sync: new generation "b7852795bc0c7d42", no generation exists
    

    Afterwards, I get no logs until I run a query through the app (read or write). Then I get a steady stream of litestream /db/hello.sqlite3: sync error: cannot verify wal state: stat /db/hello.sqlite3-wal: no such file or directory until I shut down Litestream. I've tried issuing the WAL pragma from the app's connection and enabling autocommit mode, but neither seems to make a difference.

    Similar to https://github.com/benbjohnson/litestream/issues/58, I'm wondering if I'm doing something wrong or whether this is a litestream bug. If you think it's the latter, I can try to work on a simpler repro (I haven't tried it locally yet).
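One thing worth ruling out first (my assumption, not a confirmed diagnosis): if the app's connection never actually switched the database to WAL mode, SQLite won't maintain a -wal file for Litestream to stat. The journal mode can be checked from the shell; the path is a placeholder for the shared volume mount:

```shell
# Print the database's journal mode; a WAL-mode database prints "wal".
sqlite3 /db/hello.sqlite3 "PRAGMA journal_mode;"
```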

  • 'no snapshot available'


    Howdy! I've used Litestream with great success for a personal project and was trying to work it into a new project. I have Litestream running in Docker, reading a WAL-mode DB in a volume shared with another container. However, the first time I tried turning it on, it began spamming the following error:

    /db/db.db(s3): monitor error: cannot determine replica position: no snapshot available: generation=60bedb457196f69d
    /db/db.db(s3): snapshot written 60bedb457196f69d/00000003
    /db/db.db(s3): monitor error: cannot determine replica position: no snapshot available: generation=60bedb457196f69d
    /db/db.db(s3): snapshot written 60bedb457196f69d/00000003
    /db/db.db(s3): monitor error: cannot determine replica position: no snapshot available: generation=60bedb457196f69d
    /db/db.db(s3): snapshot written 60bedb457196f69d/00000003
    

    The DB file it's replicating had already existed for some time before I tried adding Litestream to the mix. Restarting Litestream did not fix it, though it did create a new snapshot dir on S3.

    The weird thing is that it seemed to be writing successfully to the bucket, though it pretty quickly dumped ~9 MB of data despite the DB being ~250 KB, with only two or three small transactions taking place while I had it running. I had it hooked up to a free-tier Backblaze bucket, and it hit the daily transaction limit (2,500 S3 API calls) within ten minutes as well.

    I'm not really sure what's going on here: it can definitely read the DB file, and it can authenticate with Backblaze. Googling the error message didn't give me any leads either. Any chance someone can parse this issue better than I can?

  • Replicate only on WAL changes


    Thanks again for this software!

    I'm using litestream with a service that has infrequent database writes (like a handful of times per day).

    I noticed my AWS dashboard reporting many PUTs and revisited the documentation and realized that litestream replicates the WAL every 10s.

    A nice-to-have feature would be for Litestream to skip WAL replication when no local changes have occurred since the last sync.

    An even nicer feature for my scenario would be a "sync only on write" mode: instead of replicating every N seconds, it would replicate immediately after each change to the WAL, but otherwise not replicate.

    Low priority, since my understanding is that even 300k S3 PUTs cost only ~$1.50/month, but it would be cool if Litestream could run completely in the free tier for infrequent-write scenarios.
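Until such a mode exists, the per-replica sync interval is configurable, so infrequent-write workloads can at least reduce PUT frequency. A hedged sketch of the relevant config (bucket and paths are placeholders):

```yaml
dbs:
  - path: /var/lib/app/db.sqlite3
    replicas:
      - type: s3
        bucket: my-bucket
        path: db
        sync-interval: 60s   # replicate less often than the default to cut PUTs
```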

  • Monitor error - InvalidArgument, status code: 400


    Hi !

    First of all, thank you for your great project. I am really excited to start new projects with SQLite. I tried your latest commit (d6ece0b82644f8963fff1d20087bb07a6a2d99c4) on master in order to upload generations into Google Cloud Storage.

    Everything worked correctly until I tried to import a ~10MB CSV dataset into SQLite.

    yann@xps:~/Projects/Perso/poc-litestream$ sqlite3 test.db
    SQLite version 3.34.1 2021-01-20 14:10:07
    Enter ".help" for usage hints.
    sqlite> .mode csv
    sqlite> .import ../Shakespeare_data.csv shakespear
    _________________________________________
    yann@xps:~/Projects/Perso/poc-litestream$ litestream replicate -config ~/Golang/bin/litestream.yml
    litestream (development build)
    initialized db: /home/yann/Projects/Perso/poc-litestream/test.db
    replicating to: name="s3" type="s3" bucket="test-litestream-regional" path="db" region="europe-west1" endpoint="https://storage.googleapis.com" sync-interval=10s
    /home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
            status code: 400, request id: , host id:
    /home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
            status code: 400, request id: , host id:
    _________________________________________
    yann@xps:~/Projects/Perso/poc-litestream$ tree -ah
    .
    β”œβ”€β”€ [9.3M]  test.db
    β”œβ”€β”€ [4.0K]  .test.db-litestream
    β”‚Β Β  β”œβ”€β”€ [  17]  generation
    β”‚Β Β  └── [4.0K]  generations
    β”‚Β Β      └── [4.0K]  310573930e9594aa
    β”‚Β Β          └── [4.0K]  wal
    β”‚Β Β              β”œβ”€β”€ [197K]  00000006.wal
    β”‚Β Β              β”œβ”€β”€ [9.4M]  00000007.wal
    β”‚Β Β              └── [4.1K]  00000008.wal
    β”œβ”€β”€ [ 32K]  test.db-shm
    └── [9.4M]  test.db-wal
    
    4 directories, 7 files
    
    

    Here is my configuration:

    dbs:
     - path: /home/yann/Projects/Perso/poc-litestream/test.db
       replicas:
        - type: s3
          bucket: test-litestream-regional
          path: db
          endpoint: https://storage.googleapis.com
          forcePathStyle: true
          region: europe-west1
    
    yann@xps:~/Projects/Perso/poc-litestream$ env | grep AWS
    AWS_SECRET_ACCESS_KEY=<<REDACTED>>
    AWS_ACCESS_KEY_ID=<<REDACTED>>
    

    I wonder if this issue is GCP-related or if it is more general. I'm pretty sure the error is raised here: https://github.com/benbjohnson/litestream/blob/main/s3/s3.go#L818-L824

    If I can do anything to help, I would love to! Have a great day, and thank you!

  • ListBucket is 99% of our AWS cost


    Can we cache some queries and save some $?

    | Operation            | Sept. 2021 |
    |----------------------|------------|
    | Total cost ($)       | 6.37       |
    | ListBucket ($)       | 6.12       |
    | CreateVolume-Gp2 ($) | 0.21       |
    | StandardStorage ($)  | 0.02       |
    | PutObject ($)        | 0.01       |

    Let us know.
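I can't tell from the table alone what drives the List calls, but retention enforcement does list replica files, and its interval is configurable. A hedged sketch (values are placeholders, not recommendations):

```yaml
dbs:
  - path: /data/db.sqlite3
    replicas:
      - url: s3://my-bucket/db
        retention: 72h
        retention-check-interval: 12h   # run retention (and its listing) less often
```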

  • Sqleet Support?


    Does this work with https://github.com/resilar/sqleet and, if so, any recommendations on how to make it work best? I've found sqleet to be the best for encrypted sqlite.

  • Android/iOS support


    sqlite is the de-facto on-device database on Android [0], and it's also popular in iOS apps.

    litestream might be a really useful and simple library for synchronizing data from a mobile app to a server, without having to change the app's persistence layer.

    There's a couple of proprietary libraries [1] [2] in this space that I know of, but they do more than sync, and require using the library as data layer.

    Since litestream is already cross-compiled for aarch64, the binary should already work out of the box. Due to Android's sandboxing, it's not usually possible for one app to access the database of another, so Litestream would have to run as part of the app the database belongs to. Maybe a library could be created that wraps the existing executable and invokes it at appropriate times, e.g. using the SyncAdapter [3].

    [0] https://developer.android.com/training/data-storage/sqlite
    [1] https://objectbox.io
    [2] https://realm.io
    [3] https://developer.android.com/training/sync-adapters

    Just an idea.

  • Bump github.com/aws/aws-sdk-go from 1.42.40 to 1.42.44


    Bumps github.com/aws/aws-sdk-go from 1.42.40 to 1.42.44.

    Release notes

    Sourced from github.com/aws/aws-sdk-go's releases.

    Release v1.42.44 (2022-01-28)

    Service Client Updates

    • service/appconfig: Updates service API and documentation
    • service/appconfigdata: Updates service API and documentation
    • service/athena: Updates service API and documentation
      • This release adds a field, AthenaError, to the GetQueryExecution response object when a query fails.
    • service/cognito-idp: Updates service documentation
    • service/sagemaker: Updates service API
      • This release added a new NNA accelerator compilation support for Sagemaker Neo.
    • service/secretsmanager: Updates service API and documentation
      • Feature are ready to release on Jan 28th

    Release v1.42.43 (2022-01-27)

    Service Client Updates

    • service/amplify: Updates service documentation
    • service/connect: Updates service API and documentation
    • service/ec2: Updates service API
      • X2ezn instances are powered by Intel Cascade Lake CPUs that deliver turbo all core frequency of up to 4.5 GHz and up to 100 Gbps of networking bandwidth
    • service/kafka: Updates service API and documentation
    • service/opensearch: Updates service API and documentation

    SDK Bugs

    • aws/request: Update Request Send to always ensure Request.HTTPResponse is populated.

    Release v1.42.42 (2022-01-26)

    Service Client Updates

    • service/codeguru-reviewer: Updates service API, documentation, and waiters
    • service/ebs: Updates service documentation
    • service/frauddetector: Updates service API, documentation, and paginators
    • service/sagemaker: Updates service API and documentation
      • API changes relating to Fail steps in model building pipeline and add PipelineExecutionFailureReason in PipelineExecutionSummary.
    • service/securityhub: Updates service API and documentation

    Release v1.42.41 (2022-01-25)

    Service Client Updates

    • service/connect: Updates service API, documentation, and paginators
    • service/elasticfilesystem: Updates service API and documentation
      • Use Amazon EFS Replication to replicate your Amazon EFS file system in the AWS Region of your preference.
    • service/fsx: Updates service API and documentation
    • service/guardduty: Updates service API and documentation
      • Amazon GuardDuty expands threat detection coverage to protect Amazon Elastic Kubernetes Service (EKS) workloads.
    Changelog

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • litestream --> LiteFS


    Some may wonder why there is little traffic in the project lately.

    There is a fantastic new tool from benbjohnson: LiteFS.

    It replicates SQLite via a FUSE filesystem.

    Someone with commit permissions: maybe leave a small note about this in the README?

  • generation creation failed due to S3 upload multipart failed


    When starting litestream, I saw this message in the log:

    litestream v0.3.8
    initialized db: /data/db.sqlite3
    replicating to: name="s3" type="s3" bucket="xxx" path="lb-pipeline-prod/db.sqlite3" region="fra1" endpoint="https://fra1.digitaloceanspaces.com" sync-interval=1s
    /data/db.sqlite3: init: cannot determine last wal position, clearing generation; primary wal header: EOF
    /data/db.sqlite3: sync: new generation "c51b0ab65d5a9c1f", no generation exists
    
    
    /data/db.sqlite3(s3): monitor error: MultipartUpload: upload multipart failed
            upload id: 2~AVDf9oLvoUjwWcYb5So7CZmoZnpUguF
    caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit
    

    litestream snapshots / litestream generations do not reveal anything under the new generation. Apparently creating the new generation has failed.

    Is there any config I could set to tune the multipart upload?

    my config is:

    access-key-id: xxx
    secret-access-key: xxx
    
    dbs:
      - path: /data/db.sqlite3
        replicas:
          - url: s3://xxx.fra1.digitaloceanspaces.com/lb-pipeline-prod/db.sqlite3
            retention: 1h
            retention-check-interval: 20m
    
  • Unable to recover database from a local replica


    Hi there! I'm trying to recover a database using the command litestream restore -o /tmp/db /path/to/replica.db. However, I'm getting the error database not found in config: /path/to/replica.db. I also tried it without specifying the -o parameter, with the same error. I can only recover the database without specifying the replica parameter.

    Am I making a mistake in how I'm specifying the command?

    Thanks in advance!
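In case it helps: restore accepts either a database path registered in the config or a replica URL, so a local replica can be addressed with a file:// URL. A sketch with placeholder paths (the URL form is my assumption based on the replica-URL syntax; check it against your version):

```shell
# Restore from a local file replica by URL instead of a config lookup.
litestream restore -o /tmp/db file:///path/to/replica
```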

  • Need more examples


    "It runs as a background process and safely replicates changes incrementally to another file or S3."

    ...but there is no example of how to replicate from one file database to another.
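For reference, a hedged sketch of what a file-to-file setup looks like in the config (paths are placeholders):

```yaml
dbs:
  - path: /var/lib/app/db.sqlite3
    replicas:
      - type: file
        path: /mnt/backup/db   # destination directory for snapshots and WAL segments
```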

  • Give replicate command a -restore-if-db-not-exists flag


    To get a copy of the database before replication starts, you need to write a shell script like this: https://github.com/benbjohnson/litestream-docker-example/blob/8f4d71055049ccd3cc52f499539e70601bb10d48/scripts/run.sh#L4-L10

    Could replicate get a flag like -restore-if-db-not-exists that would restore the database if it doesn't exist prior to starting replication?

    This seems like a common use case and it would be nice to be able to cover it with a single command instead of adding an additional script to handle it.

  • Programmatic Usage


    I couldn't find an existing issue asking this - is there a way to consume this programmatically? On some app platforms, one can point to a code base and have it automatically built and deployed. In this case, you can't use litestream, but if you're writing a Go-based app then you could import it and do a restore on app startup, and run periodic backups in a goroutine.

    As a workaround, one could build a Docker image that includes litestream, but using it in-process, as in the example above, can be more attractive.

    Apologies in advance if this is already addressed elsewhere.
