Exploring Postgres Write Ahead Logs (WAL)

Exploring Postgres Write Ahead Logs (WAL)

I interact with Postgres on a daily basis, albeit typically managed by a service provider like AWS RDS. Hot take - here you pay a premium for RDS to accomodate the luxury of your ignorance - rightfully so. RDS abstracts the internals of Postgres allowing you to view it more and more like an intuitive store of data rather than dwelling on the implementation details.

Some features of personal interest these services typically offer are:

  • Point in time recovery (PITR)
  • Heated (or ready to go) backup replica's
  • Logical replication

I learnt however that these are offered via proxy of ease rather than PostgresQL component expansion.

With this study into Postgres WAL recently and it has given me the ability to clearly appreciate Postgres providing service's offerings.

WAL Introduction

Database systems intend to garantuee data validity. In turn we find that this directly assumes that database transactions (units of change done) behave correctly. Correctly here is defined by four properties known as ACID. WAL is a technique that directly targets atomocity and durability.

With such a significant influence on the data validity of a database system, you can understand my interest in it.

WAL in Postgres - all modifications are written to a log before they are applied. Both redo and undo information is stored in the log.

Where do they live:

 % docker-compose up --detach postgres
Creating network "postgres-wal_default" with the default driver
Creating postgres-wal_postgres_1 ... done
 % docker-compose exec postgres bash
root@5bfadf672170:/# cd $PGDATA/pg_wal
root@5bfadf672170:/var/lib/postgresql/data# ls -l
total 16388
-rw------- 1 postgres postgres 16777216 Oct 17 21:59 000000010000000000000001
drwx------ 2 postgres postgres     4096 Oct 17 21:59 archive_status

Notes:

  • LSN derived filename convention here, leading 8 hexadecimal values represent a time element (epoched to when the DB cluster first started). Remaining 16 values increment as needed.
  • WAL files are binary files with allocation of 16MB - this is changeable.

You can --follow these WAL files;

root@5bfadf672170:/var/lib/postgresql/data/pg_wal# pg_waldump 000000010000000000000001 -f

And in another terminal create DB changes

 % pgcli -h localhost -U postgres postgres
Password for postgres:
Server: PostgreSQL 14.0 (Debian 14.0-1.pgdg110+1)
Version: 3.2.0
Home: http://pgcli.com
postgres@localhost:postgres> CREATE TABLE tmp(val int);
CREATE TABLE
Time: 0.005s
postgres@localhost:postgres> INSERT INTO tmp(val) SELECT g.id FROM generate_series(1, 10) as g(id);
INSERT 0 10
Time: 0.005s

You can see how these WAL files are written before the query is returned in the earlier terminal session.

Why do we have WAL files?

tldr: faster when adhering to strict data integrity requirements.

If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.) - documentation

WAL-diagram

  • checkpointer hard limit (max 2min) adhering dirty buffer flusher at intervals. pauses everything, figures out what it can and can not flush.
  • background writer flushes based on LRU algo increasing clean pages to go around cheaply.

Streaming these WAL files (principle behind backups)

Postgres exports a utility pg_receivewal that acts as a read once, immutable message queue allowing you to stream these wall files to.. anywhere (for example archiving).

 % docker-compose exec postgres bash
root@03c55a788579:/# su postgres
postgres@03c55a788579:/$ cd $PGDATA/
postgres@03c55a788579:~/data$ cd ..
postgres@03c55a788579:~$ mkdir stream
postgres@03c55a788579:~$ pg_receivewal -D stream/

In another terminal you can view these files

root@03c55a788579:/# ls -l $PGDATA/../stream/
total 16384
-rw------- 1 postgres postgres 16777216 Oct 17 23:36 000000010000000000000001.partial

.partial files are actively streaming the current WAL file being written to. Once this WAL file is either filled up or switched for example;

postgres@localhost:postgres> select pg_switch_wal();
+-----------------+
| pg_switch_wal   |
|-----------------|
| 0/16FAC80       |
+-----------------+
SELECT 1
Time: 0.041s

You'll notice

root@03c55a788579:/# ls -l $PGDATA/../stream/
total 32768
-rw------- 1 postgres postgres 16777216 Oct 17 23:41 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Oct 17 23:41 000000010000000000000002.partial

Although this is just simple archiving, you will see how this is an important concept to know about when it comes to general archiving, backing up and interestingly, the fundamental tool behind: DB replication.

Replication Servers

Colloquially known as replication slots, are mechanisms that more formally wrap pg_receivewal that offers easy replication connections with the aim of providing a consistent interface among replication connectors.

You can create these replication slots via

postgres@localhost:postgres> select * from pg_create_physical_replication_slot('replica');
+-------------+--------+
| slot_name   | lsn    |
|-------------+--------|
| replica     | 
   
     |
+-------------+--------+
SELECT 1
Time: 0.018s
postgres@localhost:postgres> select slot_name, active from pg_replication_slots
+-------------+----------+
| slot_name   | active   |
|-------------+----------|
| replica     | False    |
+-------------+----------+
SELECT 1
Time: 0.009s

   

Interestingly, you can also use pg_receivewal to stream WAL files to somewhere too! Which technically chains pg_receivewal. -S, --slot-name to point to a replication slot.

postgres@993e83a382c4:~$ pg_receivewal -D stream/ -S replica

Logical Replication

By default WAL file provides just enough information for basic replica support, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. This is informed physically which uses exact block addresses and byte-by-byte replication. We can change the wal_level to allow logical decoding of the WAL files allowing a more generic consumer (difference between logical and physical replication).

edit postgres.conf to change wal_level

root@73d37a504686:/# cd $PGDATA/
root@73d37a504686:/var/lib/postgresql/data# vim postgresql.conf

inside postgresql.conf

  1. search "WRITE-AHEAD LOG"
  2. find option wal_level and change this to logical

Let postgres cluster react to this config change by cluster restart

root@73d37a504686:/var/lib/postgresql/data# su postgres
postgres@73d37a504686:~/data$ pg_ctl restart
waiting for server to shut down....
 % docker-compose up postgres

pg_receivewal works just the same, you will just notice each log line containing added logical language describing what has happened. We can extract and interpret these WAL files via pg_recvlogical although pg_recvlogical relies on replication slots directly, (ie we cannot directly wrap pg_receivewal).

We can use pg_recvlogical directly to create a logical replication slot;

su postgres
pg_recvlogical -d postgres --slot extract --create-slot
pg_recvlogical -d postgres --slot extract --start -f -

Notice: logical streams differentiate logs from different databases

inserting some stuff into postgres database

 % pgcli -h localhost -U postgres postgres
Password for postgres:
Server: PostgreSQL 14.0 (Debian 14.0-1.pgdg110+1)
Version: 3.2.0
Home: http://pgcli.com
postgres@localhost:postgres> create table tmp(val int);
CREATE TABLE
Time: 0.006s
postgres@localhost:postgres> INSERT INTO tmp(val) SELECT g.id FROM generate_series(1, 10) as g(id);
INSERT 0 10
Time: 0.006s

And in the previous terminal you will notice logical output of what is happening, far easier to understand than the physical level logging we got before.

BEGIN 736
table public.tmp: INSERT: val[integer]:1
table public.tmp: INSERT: val[integer]:2
table public.tmp: INSERT: val[integer]:3
table public.tmp: INSERT: val[integer]:4
table public.tmp: INSERT: val[integer]:5
table public.tmp: INSERT: val[integer]:6
table public.tmp: INSERT: val[integer]:7
table public.tmp: INSERT: val[integer]:8
table public.tmp: INSERT: val[integer]:9
table public.tmp: INSERT: val[integer]:10
COMMIT 736

Now ofcourse you can do this inside Postgres via pg_create_logical_replication_slot (see Replication Functions).

Since you've already created a replication slot above - you can utulise it right away (seek more complete example here);

postgres@localhost:postgres> -- \df pg_logical_slot_get_changes
postgres@localhost:postgres> SELECT * FROM pg_logical_slot_get_changes('extract', NULL, NULL);
+-------+-------+--------+
| lsn   | xid   | data   |
|-------+-------+--------|
+-------+-------+--------+
SELECT 0
Time: 0.018s
postgres@localhost:postgres> INSERT INTO tmp(val) SELECT g.id FROM generate_series(1, 10) as g(id);
INSERT 0 10
Time: 0.006s
postgres@localhost:postgres> SELECT * FROM pg_logical_slot_get_changes('extract', NULL, NULL);
+-----------+-------+-------------------------------------------+
| lsn       | xid   | data                                      |
|-----------+-------+-------------------------------------------|
| 0/17143F8 | 737   | BEGIN 737                                 |
| 0/17143F8 | 737   | table public.tmp: INSERT: val[integer]:1  |
| 0/1714748 | 737   | table public.tmp: INSERT: val[integer]:2  |
| 0/1714788 | 737   | table public.tmp: INSERT: val[integer]:3  |
| 0/17147C8 | 737   | table public.tmp: INSERT: val[integer]:4  |
| 0/1714808 | 737   | table public.tmp: INSERT: val[integer]:5  |
| 0/1714848 | 737   | table public.tmp: INSERT: val[integer]:6  |
| 0/1714888 | 737   | table public.tmp: INSERT: val[integer]:7  |
| 0/17148C8 | 737   | table public.tmp: INSERT: val[integer]:8  |
| 0/1714908 | 737   | table public.tmp: INSERT: val[integer]:9  |
| 0/1714948 | 737   | table public.tmp: INSERT: val[integer]:10 |
| 0/17149B8 | 737   | COMMIT 737                                |
+-----------+-------+-------------------------------------------+
SELECT 12
Time: 0.015s
postgres@localhost:postgres> SELECT * FROM pg_logical_slot_get_changes('extract', NULL, NULL);
+-------+-------+--------+
| lsn   | xid   | data   |
|-------+-------+--------|
+-------+-------+--------+
SELECT 0
Time: 0.016s

As JSON changesets

Via wal2json via -P, --plugin flag.

su postgres
pg_recvlogical -d postgres --slot test_slot --create-slot -P wal2json
pg_recvlogical -d postgres --slot test_slot --start -o pretty-print=1 -o add-msg-prefixes=wal2json -f -

And you will now receive more rich, json messages.

Demo

pg_recvlogical has

TODO

  • integration of tom arrells messaging queue?

Interesting utulisations of knowledge above

  • unlogged tables (why you would)
  • asynchronous commits
  • publications
Similar Resources

WAL-G is an archival restoration tool for PostgreSQL, MySQL/MariaDB, and MS SQL Server (beta for MongoDB and Redis).

WAL-G is an archival restoration tool for PostgreSQL, MySQL/MariaDB, and MS SQL Server (beta for MongoDB and Redis).

Jan 1, 2023

HLF smart-bft WAL files reader

WAL Reader Utility for reading Hyperledger Fabric SmartBFT WAL files. Install go install gitlab.n-t.io/atmz/walreader@latest Read WAL file walreader

Jan 14, 2022

Write controller-runtime based k8s controllers that read/write to git, not k8s

Git Backed Controller The basic idea is to write a k8s controller that runs against git and not k8s apiserver. So the controller is reading and writin

Dec 10, 2021

An experimental port of TinyRb to Google go, both as a means of learning go and exploring alternate approaches to implementing Ruby. Work is currently focused on the GoLightly VM.

tinyrb¶ ↑ A tiny subset of Ruby with a Lua'esc VM. Everything in TinyRb should run in the big Ruby. (except bugs and things that don't comply to the p

Sep 22, 2022

Apache Kafka Web UI for exploring messages, consumers, configurations and more with a focus on a good UI & UX.

Apache Kafka Web UI for exploring messages, consumers, configurations and more with a focus on a good UI & UX.

Kowl - Apache Kafka Web UI Kowl (previously known as Kafka Owl) is a web application that helps you to explore messages in your Apache Kafka cluster a

Jan 3, 2023

CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types, saving, price, and interruption frequency.

spotinfo The spotinfo is a command-line tool that helps you determine AWS Spot instance types with the least chance of interruption and provides the s

Dec 19, 2022

A tool for exploring each layer in a docker image

A tool for exploring each layer in a docker image

dive A tool for exploring a docker image, layer contents, and discovering ways to shrink the size of your Docker/OCI image. To analyze a Docker image

Jan 9, 2023

Microshift is a research project that is exploring how OpenShift1 Kubernetes can be optimized for small form factor and edge computing.

Microshift is a research project that is exploring how OpenShift1 Kubernetes can be optimized for small form factor and edge computing.

Nov 1, 2021

Frisbee is a Kubernetes-native platform for exploring, testing, and benchmarking distributed applications.

Frisbee is a Kubernetes-native platform for exploring, testing, and benchmarking distributed applications.

Why Frisbee ? Frisbee is a next generation platform designed to unify chaos testing and perfomance benchmarking. We address the key pain points develo

Dec 14, 2022

A repository for exploring google go

GoPractice A repository for exploring google go. There are tests in there. Q1) Write a function that sorts a bunch of words by the number of character

Nov 19, 2021

FlameScope is a visualization tool for exploring different time ranges as Flame Graphs.

FlameScope is a visualization tool for exploring different time ranges as Flame Graphs.

FlameScope FlameScope is a visualization tool for exploring different time ranges as Flame Graphs, allowing quick analysis of performance issues such

Dec 27, 2022

Exploring and comparing different IOT messaging protocols / transports.

IOT Messaging Protocols Blynk https://blynk.io/ A fully integrated suite of IoT software Device provisioning Sensor data visualization Remote control

Jan 2, 2022

Eventproc - A proof-of-concept for exploring event based architecture utilizing bi-directional gRPC streams

eventproc A proof-of-concept for exploring event based architecture utilizing bi

Jan 25, 2022

Blog - Exploring Domain Driven Design In Go

Exploring Domain Driven Design In Go

Jan 25, 2022

Go Coverage in Shell: a tool for exploring Go Coverage reports from the command line

Go Coverage in Shell: a tool for exploring Go Coverage reports from the command line

Go Coverage in Shell: a tool for exploring Go Coverage reports from the command line

Dec 31, 2022

Becca - A simple dynamic language for exploring language design

Becca A simple dynamic language for exploring language design What is Becca Becc

Aug 15, 2022

Tools for exploring OpenVG

Tools for exploring OpenVG

Testbed for exploring OpenVG on the Raspberry Pi. First program Here is the graphics equivalent of "hello, world" // first OpenVG program // Anthony S

Dec 30, 2022

Leveled execution logs for Go

glog ==== Leveled execution logs for Go. This is an efficient pure Go implementation of leveled logs in the manner of the open source C++ package h

Dec 24, 2022

Library and program to parse and forward HAProxy logs

haminer Library and program to parse and forward HAProxy logs. Supported forwarder, Influxdb Requirements Go for building from source code git for dow

Aug 17, 2022
Postgres uuid[] field for gorm.io - Golang

Postgres uuid[] field for gorm.io - Golang

Jan 17, 2022
F - Experimenting with Go 1.18 generics to write more functional Go code

f f is a simple library that leverages the new generics in Golang to create a tools for functional style of code. Pipe like '|>' in Elixir or Elm. inp

Apr 12, 2022
The AlfheimDB's high performance write-ahead log.

The AlfheimDB's high performance write-ahead log.

Jul 18, 2022
🔑A high performance Key/Value store written in Go with a predictable read/write performance and high throughput. Uses a Bitcask on-disk layout (LSM+WAL) similar to Riak.

bitcask A high performance Key/Value store written in Go with a predictable read/write performance and high throughput. Uses a Bitcask on-disk layout

Sep 26, 2022
Mogo: a lightweight browser-based logs analytics and logs search platform for some datasource(ClickHouse, MySQL, etc.)
Mogo: a lightweight browser-based logs analytics and logs search platform for some datasource(ClickHouse, MySQL, etc.)

mogo Mogo is a lightweight browser-based logs analytics and logs search platform

Dec 30, 2022
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

The open-source platform for monitoring and observability. Grafana allows you to query, visualize, alert on and understand your metrics no matter wher

Jan 3, 2023
Simple to do list API with Gin and Gorm (with Postgres)Simple to do list API with Gin and Gorm (with Postgres)

go-todo Simple to do list API with Gin and Gorm (with Postgres) Docker Clone this repository and run: docker-compose up You can then hit the followin

Aug 29, 2022
Go-postgres - go-postgres library provide NoSQL functionality which can execute queries with pool of connections

GO Postgres go-postgres library provide NoSQL functionality which can execute queries with pool of connections. What is this repository for? Establish

Dec 31, 2021
Sensu-go-postgres-metrics - The sensu-go-postgres-metrics is a sensu check that collects PostgreSQL metrics

sensu-go-postgres-metrics Table of Contents Overview Known issues Usage examples

Jan 12, 2022
rosedb is an embedded and fast k-v database based on LSM + WAL
rosedb is an embedded and fast k-v database based on LSM + WAL

A simple k-v database in pure Golang, supports string, list, hash, set, sorted set.

Dec 30, 2022