Simple backup tool for PostgreSQL

pg_back dumps databases from PostgreSQL

Description

pg_back is a dump tool for PostgreSQL. The goal is to dump all or some databases with globals at once in the format you want, because a simple call to pg_dumpall only dumps databases in the plain SQL format.

Behind the scenes, pg_back uses pg_dumpall to dump role and tablespace definitions, and pg_dump to dump all or each selected database to a separate file in the custom format. It also extracts database-level ACLs and configuration settings that are not dumped by pg_dump versions older than 11. Finally, it dumps all configuration options of the PostgreSQL instance.

Features

  • Dump all or a list of databases
  • Dump all but a list of excluded databases
  • Include database templates
  • Choose the format of the dump for each database
  • Limit dumped schemas and tables
  • Dump databases concurrently
  • Compute a SHA checksum of each dump
  • Pre-backup and post-backup hooks
  • Purge based on age and number of dumps to keep
  • Dump from a hot standby by pausing replication replay
  • Encrypt and decrypt dumps and other files
  • Upload dumps to S3, GCS, Azure or a remote host with SFTP

Install

A compiled binary is available from the GitHub repository.

The binary only needs pg_dumpall and pg_dump.

Install from source

go get -u github.com/orgrim/pg_back

Use make to build and install from source (you need go 1.16 or above).

As an alternative, the following docker command downloads, compiles and puts pg_back in the current directory:

docker run --rm -v "$PWD":/go/bin golang:1.16 go get github.com/orgrim/pg_back

Minimum versions

The minimum version of pg_dump and pg_dumpall required to dump is 8.4. The oldest tested server version of PostgreSQL is 8.2.

Usage

Basic usage

Use --help or -? to print the list of available options. To dump all databases, you only need to give the proper connection options for the PostgreSQL instance and the path to a writable directory to store the dump files.
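
For example, a minimal run could look like the following sketch, where the connection options and the output directory are placeholder values:

pg_back -h localhost -p 5432 -U postgres -b /var/backups/postgresql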

If default and command line options are not enough, a configuration file may be provided with -c <configfilename> (see pg_back.conf). (Note: see below to convert configuration files from version 1.)

If the default output directory /var/backups/postgresql does not exist or has improper ownership for your user, use -b to give the path where the files should be stored. The path may contain the {dbname} keyword, which is replaced by the name of the database being dumped; this allows each database to be dumped in its own directory.
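
For instance, the following hypothetical invocation would write the dump of each database into its own subdirectory:

pg_back -b '/srv/backups/{dbname}'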

To connect to PostgreSQL, use the -h, -p, -U and -d options. If you need lesser-known connection options such as sslcert and sslkey, you can give a keyword=value libpq connection string, as pg_dump and pg_dumpall accept with their -d option. When using connection strings, backslashes must be escaped (doubled), as well as literal single quotes (used as string delimiters).
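
For instance, a connection using a client certificate could be passed as a connection string like this (the host name and certificate paths are assumptions for the example):

pg_back -d "host=db.example.com user=postgres sslmode=verify-full sslcert=/etc/ssl/pg/client.crt sslkey=/etc/ssl/pg/client.key"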

The other command line options let you tweak what is dumped, purged, and how it is done. These options can be put in a configuration file. The command line options override configuration options.

Per-database configuration

Per-database configuration can only be done with a configuration file. The configuration file uses the ini format: global options go in an unnamed section at the top of the file, and database-specific options go in a section named after the database. Per-database options override the global options of the configuration file.

In the database sections of the configuration file, a list of schemas or tables can be excluded from, or selected in, the dump. When using these options, the rules of the -t, -T, -n and -N options of pg_dump apply, including their pattern rules. See the documentation of pg_dump. A sketch of such a file follows.
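
The database name below is a placeholder and the exact option keys are assumptions for illustration, so refer to the sample pg_back.conf for the authoritative names:

# global options, in the unnamed section at the top
backup_directory = /var/backups/postgresql

# options applying only to the database "mydb"
[mydb]
format = plain
exclude_tables = tmp*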

When no database names are given on the command line, all databases except templates are dumped. To include templates, use --with-templates (-T); if templates are included from the configuration file, --without-templates forces their exclusion.

Databases can be excluded with --exclude-dbs (-D), which takes a comma-separated list of database names. If a database is both listed on the command line and part of the exclusion list, exclusion wins.

Multiple databases can be dumped at the same time by using a number of concurrent pg_dump jobs greater than 1 with the --jobs (-j) option. This is different from --parallel-backup-jobs (-J), which controls the number of sessions used by pg_dump with the directory format.
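
Putting these options together, a run that includes templates, skips two databases and dumps four databases concurrently could look like this (the database names are placeholders):

pg_back --with-templates --exclude-dbs staging,scratch -j 4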

Checksums

A checksum of all output files is computed in a separate file when --checksum-algo (-S) is different from none. The possible algorithms are: sha1, sha224, sha256, sha384 and sha512. The checksum file is in the format required by the shaXsum tools (sha1sum, sha256sum, etc.) for checking with their -c option.
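
For example, assuming dumps were taken with -S sha256, the resulting checksum file can be verified with sha256sum; the file names below are illustrative:

pg_back -S sha256 -b /var/backups/postgresql
cd /var/backups/postgresql && sha256sum -c mydb_2024-01-01_00-00-00.dump.sha256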

Purge

Older dumps can be removed based on their age with --purge-older-than (-P), expressed in days when no unit is given. The allowed units are the ones understood by Go's time.ParseDuration function: "s" (seconds), "m" (minutes), "h" (hours) and so on.

A number of dump files to keep when purging can also be specified with --purge-min-keep (-K), with the special value all to keep everything, thus avoiding file removal completely. When both --purge-older-than and --purge-min-keep are used, the minimum number of dumps to keep is enforced before old dumps are removed. This avoids removing all dumps when the time interval is too small.
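
For instance, a nightly cron job could remove dumps older than 30 days while always keeping at least the last 7; the values here are purely examples:

pg_back --purge-older-than 30 --purge-min-keep 7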

Hooks

A command can be run before taking dumps with --pre-backup-hook, and after with --post-backup-hook. The commands are executed directly, not by a shell, while still respecting single- and double-quoted values. The post-backup hook, when present, is executed even if some operation fails.
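
A sketch of how the hooks could be used, with a hypothetical notification script as the hook command:

pg_back --pre-backup-hook "/usr/local/bin/notify.sh backup started" --post-backup-hook "/usr/local/bin/notify.sh backup finished"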

Encryption

All the files produced by a run of pg_back can be encrypted using age (https://age-encryption.org/), an easy-to-use tool that does authenticated encryption of files. To keep things simple, encryption is done using a passphrase. To encrypt files, use the --encrypt option along with the --cipher-pass option or the PGBK_CIPHER_PASS environment variable to specify the passphrase. When encrypt is set to true in the configuration file, the --no-encrypt option allows encryption to be disabled on the command line. By default, unencrypted source files are removed when they are successfully encrypted. Use the --encrypt-keep-src option to keep them, or --no-encrypt-keep-src to force their removal and override the configuration file. If required, checksums of encrypted files are computed.
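
For example, an encrypted run could take the passphrase from the environment; the passphrase shown is obviously a placeholder:

PGBK_CIPHER_PASS='a long secret passphrase' pg_back --encrypt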

Encrypted files can be decrypted with the correct passphrase and the --decrypt option. When --decrypt is present on the command line, dumps are not performed; instead, files are decrypted. Files can also be decrypted independently with the age tool. Decryption of multiple files can be parallelized with the -j option. Arguments on the command line (database names when dumping) are used as shell globs to choose which files to decrypt.
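
Decrypting the files of a single database could then look like this, using the database name as a glob (mydb is a placeholder):

PGBK_CIPHER_PASS='a long secret passphrase' pg_back --decrypt -j 4 mydb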

Please note that files are written to disk unencrypted in the backup directory before encryption, and deleted after the encryption operation is complete. This means that the host running pg_back must be secured enough to ensure the privacy of the backup directory and of the connections to PostgreSQL.

Upload to remote locations

All files produced by a run can be uploaded to a remote location by setting the --upload option to a value different from none. The possible values are s3, sftp, gcs, azure or none.

When set to s3, files are uploaded to AWS S3. The --s3-* family of options can be used to tweak access to the bucket. The --s3-profile option only reads credentials and basic configuration; S3-specific options from the profile are not used.

When set to sftp, files are uploaded to a remote host using SFTP. The --sftp-* family of options can be used to set up access to the host. The PGBK_SSH_PASS environment variable sets the password or decrypts the private key (identity file); it is used only when --sftp-password is not set (either in the configuration file or on the command line). When an identity file is provided, the password is used to decrypt it and the password authentication method is not tried with the server. The only SSH authentication methods used are password and publickey. If an SSH agent is available, it is always used.

When set to gcs, files are uploaded to Google Cloud Storage. The --gcs-* family of options can be used to set up access to the bucket. When --gcs-keyfile is empty, the GOOGLE_APPLICATION_CREDENTIALS environment variable is used.

When set to azure, files are uploaded to Azure Blob Storage. The --azure-* family of options can be used to set up access to the container. The name of the container is mandatory. If the account name is left empty, an anonymous connection is used and the endpoint is used directly: this allows the use of a full URL to the container with a SAS token. When an account is provided, the URL is built by prepending the container name to the endpoint, and the scheme is always https. The default endpoint is blob.core.windows.net. The AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY environment variables are used when --azure-account and --azure-key are not set (on the command line or with the corresponding options in the configuration file).

The --purge-remote option can be set to yes to apply the same purge policy on the remote location as the local directory.
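
As an illustration, uploading to S3 with a named AWS profile and applying the purge policy remotely could look like this; the profile name is a placeholder and the bucket is assumed to be configured through the --s3-* options or the configuration file:

pg_back --upload s3 --s3-profile backups --purge-remote yes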

When files are encrypted and their unencrypted source is kept, only encrypted files are uploaded.

Restoring files

The following files are created:

  • pg_globals_{date}.sql: definition of roles and tablespaces, dumped with pg_dumpall -g. This file is restored with psql.
  • pg_settings_{date}.out: the list of server parameters found in the configuration files (9.5+) or in the pg_settings view. They shall be put back by hand.
  • ident_file_{date}.out: the full contents of the pg_ident.conf file, usually located in the data directory.
  • hba_file_{date}.out: the full contents of the pg_hba.conf file, usually located in the data directory.
  • {dbname}_{date}.createdb.sql: an SQL file containing the definition of the database and parameters set at the database or "role in database" level. It is mostly useful when using a version of pg_dump older than 11. It is restored with psql.
  • {dbname}_{date}.{d,sql,dump,tar}: the dump of the database, with a suffix depending on its format. If the format is plain, the dump is suffixed with sql and must be restored with psql. Otherwise, it must be restored with pg_restore.

When checksums are computed, a text file of the same name with a suffix naming the checksum algorithm is produced for each file described above.

When files are encrypted, they are suffixed with age and must be decrypted first; see the Encryption section above. When checksums are computed and encryption is required, checksum files are encrypted and encrypted files are checksummed.

To sum up, when restoring:

  1. Create the roles and tablespaces by executing pg_globals_{date}.sql with psql.
  2. Create the database with {dbname}_{date}.createdb.sql if necessary.
  3. Restore the database(s) with pg_restore (use -C to create the database) or psql, as in the sketch below.
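
A minimal restore session for a custom-format dump could therefore look like the following, where the file names, timestamps and database name are placeholders following the patterns above:

psql -f pg_globals_2024-01-01_00-00-00.sql postgres
psql -f mydb_2024-01-01_00-00-00.createdb.sql postgres
pg_restore -d mydb mydb_2024-01-01_00-00-00.dump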

Managing the configuration file

The previous v1 configuration files are not compatible with pg_back v2.

Give the path of the v1 configuration file to the --convert-legacy-config command line option, and pg_back will try its best to convert it to the v2 format. Redirect the output to the new configuration file:

pg_back --convert-legacy-config pg_back1.conf > pg_back2.conf

The default configuration file can be printed with the --print-default-config command line option.
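
For example, it can be used to bootstrap a fresh configuration file; the destination path is an assumption:

pg_back --print-default-config > pg_back.conf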

In some environments (especially Debian), you may have to add host = /var/run/postgresql to the configuration to override the default /tmp socket directory.

Testing

Use the Makefile or regular go test.

To run SQL tests requiring a PostgreSQL instance:

  1. run initdb in some directory
  2. start postgres
  3. load testdata/fixture.sql with psql
  4. use go test or make test with the PGBK_TEST_CONNINFO environment variable set to a libpq connection string pointing to the instance. For example:
PGBK_TEST_CONNINFO="host=/tmp port=14651" make test

Contributing

Please use the issues and pull requests features of GitHub.

License

PostgreSQL - See LICENSE file

Comments
  • encrypt files

    Encrypting a single file is complicated, but it would be nice if pg_back could encrypt the dumps itself (GPG?).

    In an ideal case, the dump would not touch the disk unencrypted (I have no idea how to deal with the directory format in this case).

  • Adding signature feature for dump file

    Tested dump signature feature. By default, pg_back works as usual.

    1. The -S option (or the SIGNATURE_ALGO variable) defines the algorithm used for signature generation.

    2. Help, command line option and configuration file option are supported.

    3. Example: pg_back -S sha256 ... will create an additional file named after the dump with a .sha256 extension, containing the sha256 sum. This feature is based on the XXXXsum programs available on Linux hosts.

    4. If the checksum program for the chosen algorithm doesn't exist, pg_back fails at startup.

    5. Purging signatures is supported:
    touch -d '3 months ago' postgres_2019-09-30_17-46-*

  • Quiet/Cron Mode

    Good to see version 2 having package/binary releases and overall getting better. There is just one feature from 1.x that's missing for me, and that is the -q flag to run pg_back in "quiet" mode. I normally run pg_back as a cron job and have mailing set up, so any output on STDOUT or STDERR will be sent to my inbox. However, I only want to be notified about critical errors and want to suppress info messages like INFO: dumping instance configuration.

    I tried sudo -u postgres pg_back > /dev/null but was still seeing terminal output, as the program only outputs to STDERR. Once I redirected STDERR to STDOUT with sudo -u postgres pg_back > /dev/null 2>&1, I got the desired output. However, this also discards any potential error and won't trigger an email (no output) if pg_back encounters a problem.

    To fix this I propose the following changes:

    1. Only output errors/warnings to STDERR and information, debug or verbose output to STDOUT
    2. A new CLI flag to omit any non-critical output, e.g. only errors/warnings to STDERR
  • v11 compatibility?

    In v11, if I understand correctly, some information from pg_db_role_setting that was dumped by pg_dumpall -g is now saved by pg_dump and can be restored only with pg_restore --create... but you need --create with pg_dump -Fp too, or you lose this information.

    Will pg_back be compatible with this?

  • Providing --bin-directory fails with a "not a directory" error

    $ which pg_dump
    /usr/bin/pg_dump
    $ /tmp/pg_back  -B /usr/bin/pg_dump -b /tmp
    2022/06/17 11:46:34 WARN: failed to retrieve version of pg_dump: fork/exec /usr/bin/pg_dump/pg_dump: not a directory
    2022/06/17 11:46:34 FATAL: provided pg_dump is older than 8.4, unable use it.

    Could this be related to https://github.com/orgrim/pg_back/issues/78 ?

    This error doesn't seem to happen with version 2.1.0 (installed via .deb package).

  • Older pg version support

    Hello, the latest version of pg_back (2.0.1) does not seem to be compatible with PostgreSQL 9.2 and lower, mainly due to the nonexistent "-d connstr" option of pg_dumpall.

    2021/10/29 02:00:02 INFO: dumping globals
    2021/10/29 02:00:02 ERROR: /usr/pgsql-9.1/bin/pg_dumpall: invalid option -- 'd'
    2021/10/29 02:00:02 ERROR: Try "pg_dumpall --help" for more information.
    2021/10/29 02:00:02 FATAL: pg_dumpall -g failed: exit status 1

    Could you please add a disclaimer for older versions? Or take away this option? Regards

  • v2 installation from source fails

    Note: I don't know anything about Go, just following the README.

    postgres@monolithe:~$ go get -u github.com/orgrim/pg_back
    package embed: unrecognized import path "embed" (import path does not begin with hostname)

    (Same problem on 2 Ubuntu 20.04 hosts and 1 Debian 10.8 host with gccgo-go)

  • Save config files

    pg_back should include configuration files.

    It could be as simple as CONFIG_DIRECTORIES_TO_BACKUP="/etc/postgresql/ /var/lib/postgres/scripts" and a tar.

    A copy from pg_settings is another option (@tilkow has done some work on it) but in my mind overkill, or the output of show all.

  • pre / post backup hook

    Hi,

    It often happens that we'd like to copy the generated dump to an external location. It would be nice to have pre/post backup hooks to do that, rather than having to build a complete script around pg_back.

    Would that be possible?

    Kind regards

  • single quotes in $PGBK_OPTS are not handled as expected

    If, for example, -T 'tmp*' is used in PGBK_OPTS, the table is not excluded; this only seems to work if eval is used. Instead of:

    if ! ${PGBK_BIN}pg_dump $OPTS $PGBK_OPTS -f "${dump}" $db; then

    this could work (not yet tested for all cases; spaces in $dump also need to be handled):

    if ! eval "${PGBK_BIN}pg_dump $OPTS $PGBK_OPTS -f ${dump} $db"; then

  • Upload does not work for pg_back_2.0.1_linux_amd64.deb

    Hi, I downloaded the pg_back_2.0.1_linux_amd64.deb file from the release page and installed it on Ubuntu 21.10 using:

    sudo dpkg -i ./pg_back_2.0.1_linux_amd64.deb

    It looks like pg_back does not recognize that the upload option is set and as a result does not upload backups to the cloud (gcs in my case). There are no errors about misconfiguration.

    But it does work if I compile the binary using:

    docker run --rm -v "$PWD":/go/bin golang:1.16 go get github.com/orgrim/pg_back

    In this case everything works with the exact same config file I used previously.

  • Support of FORMAT binary (bytea problem)

    • Big bytea fields (larger than a few hundred MB) cannot be exported (a famous old limitation).
    • pg_dumpbinary uses COPY ... TO ... (FORMAT binary) but has very few features.
    • Is it possible for pg_back to cover this case? A parameter could list tables to be dumped with FORMAT binary. I suppose that coherence may be a problem (or use --snapshot?), and it would need a separate restore script.
  • "pg_back" is an invalid Debian package name ("_" not allowed)

    "pg_back" is an invalid package name.

    It does not comply with the Debian package naming rules : https://www.debian.org/doc/debian-policy/ch-controlfields.html#source

    Package names (both source and binary, see Package) must consist only of lower case letters (a-z), digits (0-9), plus (+) and minus (-) signs, and periods (.).

    Looks like debian does not allow _ (underscore) in package names.

    Not much of an issue most of the time, but I stumbled upon this debsums tool which was reporting a warning about this:

    $ sudo debsums -ce
    debsums: invalid package name 'pg_back'
    

    There might be others out there who might be real picky...

  • Helm chart [question]

    It would be really nice if someone created a Helm chart for this, as the available charts are not nearly as good as this tool. This would help reduce the number of local Helm charts by relying on a chart that is maintained.
