Kubegres is a Kubernetes operator that creates a cluster of PostgreSql instances and manages database replication, failover and backup.

Kubegres is a Kubernetes operator that deploys a cluster of PostgreSql pods with data replication enabled out of the box. It brings simplicity to running PostgreSql on Kubernetes, where managing a StatefulSet's life-cycle and data replication can otherwise be complex.

Features

  • It creates a cluster of PostgreSql servers with data replication enabled: it creates a Primary PostgreSql pod and a number of Replica PostgreSql pods, and replicates the Primary's database in real-time to the Replica pods.

  • It manages failover: if the Primary PostgreSql pod crashes, it automatically promotes a Replica PostgreSql pod to Primary.

  • It has a data backup option that dumps PostgreSql data regularly to a given volume.

  • It provides a very simple YAML with properties specialised for PostgreSql (a minimal example follows this list).

  • It is resilient, has over 55 automated test cases and has been running in production.
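
For example, a minimal manifest is along these lines (a sketch adapted from the examples quoted in the comments further down this page; the referenced Secret and its keys must exist beforehand):

    apiVersion: kubegres.reactive-tech.io/v1
    kind: Kubegres
    metadata:
      name: mypostgres
      namespace: default
    spec:
      replicas: 3                           # 1 Primary pod + 2 Replica pods
      image: postgres:14.1
      database:
        size: 8Gi
      env:
        - name: POSTGRES_PASSWORD           # super-user password, read from a Secret
          valueFrom:
            secretKeyRef:
              name: mypostgres-secret
              key: superUserPassword
        - name: POSTGRES_REPLICATION_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mypostgres-secret
              key: replicationUserPassword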

Please click here to get started.

More details

Kubegres was developed by Reactive Tech Limited, with Alex Arica as the lead developer. Reactive Tech offers support services for Kubegres, Kubernetes and PostgreSql.

It was developed with the framework Kubebuilder version 3, an SDK for building Kubernetes APIs using CRDs. Kubebuilder is maintained by the official Kubernetes API Machinery Special Interest Group (SIG).

Please click here to get started.

Contribute

If you would like to contribute to Kubegres' documentation, the Git repo is available here. Any changes to the documentation will update the website https://www.kubegres.io/

Comments
  • Configure SSL

    Configure SSL

    I mounted a TLS secret successfully, but I can't set up permissions for postgres to access the file. Is there any way to set fsGroup? Also, is there any way to set init containers?
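
    For reference, the permissions fix being asked about looks like this in a plain pod/StatefulSet template (a generic Kubernetes sketch, not a Kubegres field reference; 999 is the uid/gid of the postgres user in the official image, so adjust it for other images, and postgres-tls is a made-up secret name):

    # pod-level security context: fsGroup makes mounted volumes (including the TLS secret)
    # group-owned by 999 so the postgres process can read the key file
    securityContext:
      runAsUser: 999
      runAsNonRoot: true
      fsGroup: 999
    volumes:
      - name: tls
        secret:
          secretName: postgres-tls
          defaultMode: 0440   # key readable by owner and group only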

  • Backup failed

    Backup failed

    We enabled the backup function as documented here and get the following error.

    28/08/2021 00:00:01 - Starting DB backup of Kubegres resource postgres into file: /var/lib/backup/postgres-backup-28_08_2021_00_00_01.gz
    28/08/2021 00:00:01 - Running: pg_dumpall -h postgres-replica -U postgres -c | gzip > /var/lib/backup/postgres-backup-28_08_2021_00_00_01.gz
    pg_dump: error: Dumping the contents of table "table" failed: PQgetResult() failed.
    pg_dump: error: Error message from server: ERROR:  canceling statement due to conflict with recovery
    DETAIL:  User query might have needed to see row versions that must be removed.
    pg_dump: error: The command was: COPY public.table(column1, column2) TO stdout;
    pg_dumpall: error: pg_dump failed on database "db", exiting
    28/08/2021 00:00:01 - DB backup completed for Kubegres resource postgres into file: /var/lib/backup/postgres-backup-28_08_2021_00_00_01.gz
    

    In this Stack Overflow post they recommend activating hot_standby_feedback or increasing the max_standby settings. Is there a suggested solution from your side to overcome this issue?
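
    If the Stack Overflow advice is applied, the setting would go into the postgres.conf entry of the ConfigMap referenced by spec.customConfig (a sketch assuming the documented custom-config mechanism; mypostgres-conf is a made-up name, and the settings from the Kubegres base postgres.conf still need to be carried over):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mypostgres-conf
    data:
      postgres.conf: |
        # ... keep the base Kubegres postgres.conf settings here, then add:
        hot_standby_feedback = on              # replica reports its oldest snapshot, avoiding recovery conflicts
        max_standby_streaming_delay = 300s     # or: give conflicting queries on the replica more time

    The trade-off is some extra bloat on the Primary while long dumps run against the Replica.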

  • Allow kubegres cluster to run on secure Kubernetes environments (Pod security policies)

    Allow kubegres cluster to run on secure Kubernetes environments (Pod security policies)

    Hi,

    First thanks for your work on a Postgres cluster Kubernetes Operator.

    We are deploying Kubernetes clusters in a secure-by-design manner using Rancher's RKE2 (aka RKE Government, v1.20.11+rke2r2).

    This creates a cluster with a hardened Pod Security Policy which forbids, among other things, pods from running as root.

    This implies the workloads must have a security context defined with at least these two settings (1001 is the user the postgres image runs as):

    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
    

    Looking at the baseConfigMap I also see you do some chown postgres:postgres when copying data from primary to replicas. I guess these would fail with these settings.

    In "enterprise" setups this is actually not needed, as CSI-provisioned PVs belong to the pod's running user AFAIK.

    While I understand this level of security is not needed by everyone, and some users want to be able to run things in smaller clusters where security is not mandatory, I was wondering if it was possible to have some boolean flag in the CR YAML (e.g. hardened: true/false) that would allow the workload to run in hardened PSP clusters.

    This flag would basically use an alternate baseConfigMap with no chown commands, and add the securityContext to the StatefulSet.

    FYI, right now we run a single node not replicated postgres server with the above securityContext with no issues whatsoever.

    Let me know what you think about this.

    many thanks,

    Eric

  • Add support to manage volumes from Kubegres YAML

    Add support to manage volumes from Kubegres YAML

    By default, the official PostgreSQL Docker image only configures 64MB for /dev/shm (see the Caveats section on Docker Hub). This could cause problems for larger databases.

    # This is taken from the pod generated by Kubegres
    Filesystem                  Size  Used Avail Use% Mounted on
    shm                          64M   64K   64M   1% /dev/shm
    

    One possible method is to increase /dev/shm inside the container to the same size as on the host OS, which defaults to 50% of RAM. It is mentioned on Stack Overflow.

    But currently, the Kubegres Kind definition only allows "volumeMount:". It does not recognize "volumes:" and "volumeMounts:", so it seems there is no way to increase the size of the shared memory.
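
    For context, the generic workaround mentioned above mounts a memory-backed emptyDir over /dev/shm. In a raw pod template it is a sketch like the one below, which is exactly what cannot currently be expressed through the Kubegres Kind:

    # raw pod-template sketch of the /dev/shm workaround (not expressible via the Kubegres YAML today)
    spec:
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory    # tmpfs instead of the 64MB default shm
            sizeLimit: 1Gi    # optional cap; otherwise bounded by node memory
      containers:
        - name: postgres
          image: postgres:14.1
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm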

  • master pod failure

    master pod failure

    When the master pod (1) crashes, the second pod takes over as the master and a new pod (4) is created to replace the crashed first pod. While the second pod and the fourth pod work synchronously, the third pod works alone.

  • Improve the failover process for Replica

    Improve the failover process for Replica

    Hi Alex, many thanks for your amazing work with Kubegres. I tried to test a crash of a secondary Postgres. I conducted 2 separate tests. In the 1st test I scaled down the STS of the secondary Postgres to zero. In the 2nd test I stopped the k3d node on which that STS was running. Unfortunately, in both tests nothing happened. It would be great if Kubegres would run a new instance of the secondary Postgres in that case to achieve the desired state. Regards, Juliusz.

  • Disable expand Storage

    Disable expand Storage

    I just started using the operator and I think it's got really good potential. How do you expand the storage class for new pods in the same StatefulSet without affecting existing pods?

  • Is this project alive and actively maintained?

    Is this project alive and actively maintained?

    Just found it, and as I see it, it's a little outdated: the last commit was 4 months ago, and some of the issues we've seen are showstoppers for us.

    Is this project alive and worth pursuing and trying to help, or should we rather redirect our efforts to some other place?

    Sorry for being that rude, but as promising as this project is, it seems not very ready for production use...

  • The update of an existing Kubegres resource fails if the field 'resources' contains a value with a decimal point

    The update of an existing Kubegres resource fails if the field 'resources' contains a value with a decimal point

    Thank you for maintaining this repo.

    I am looking for steps/recommendations for upgrading between minor versions and major versions.

    I am guessing that upgrading between minor versions is as simple as changing the container image, i.e. postgres:13.2 -> postgres:13.4.

    Now that the official image for Postgres 14 is available, are there any steps that need to be followed to go from postgres:13.2 -> postgres:14.0 ?

    Cheers.
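
    As the comment guesses, a minor-version bump only touches the image field of the Kubegres YAML (sketch below, using the spec fields shown elsewhere on this page). A major-version jump such as 13.x -> 14.0 also changes PostgreSQL's on-disk format, so it normally needs a dump/restore (for example via pg_dumpall) or pg_upgrade rather than only a new image tag.

    spec:
      # minor-version bump: the data directory format is unchanged, so swapping the tag is enough
      image: postgres:13.4   # previously postgres:13.2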

  • When the field "spec.database.storageClassName" is omitted in Kubegres YAML, Kubegres operator should assign the default storageClass to the deployed PostgreSql cluster

    When the field "spec.database.storageClassName" is omitted in Kubegres YAML, Kubegres operator should assign the default storageClass to the deployed PostgreSql cluster

    Hi, I found that Kubegres's controller segfaults and stays down when I omit the storageClassName property. It recovers once the offending Kubegres object is removed.

    Expected behavior: Kubegres would request PVCs with the default storage class, i.e. simply omit the storageClassName field when generating its PVCs.

    Steps to reproduce: Apply the following yaml:

    apiVersion: v1
    kind: Secret
    metadata:
      name: postgres
    type: Opaque
    stringData:
      rootpasswd: foo
      replpasswd: bar
      userpasswd: baz
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: postgres-conf
    data:
      primary_init_script.sh: |
        #!/bin/bash
        set -e
        psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
        CREATE DATABASE $MY_POSTGRES_USER;
        CREATE USER $MY_POSTGRES_USER WITH PASSWORD '$MY_POSTGRES_PASSWORD';
        GRANT ALL PRIVILEGES ON DATABASE $MY_POSTGRES_USER to $MY_POSTGRES_USER;
        EOSQL
    ---
    apiVersion: kubegres.reactive-tech.io/v1
    kind: Kubegres
    metadata:
      name: postgres
    spec:
      image: postgres:13.2
      port: 5432
      replicas: 1
      database:
        size: 10Gi
      env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres
              key: rootpasswd
        - name: POSTGRES_REPLICATION_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres
              key: replpasswd
        - name: MY_POSTGRES_USER
          value: notisvc
        - name: MY_POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres
              key: userpasswd
      customConfig: postgres-conf
    

    And observe how the manager container of kubegres-controller-manager panics immediately:

    2021-06-07T17:14:53.192Z    INFO controllers.Kubegres =======================================================
    2021-06-07T17:14:53.192Z    INFO controllers.Kubegres =======================================================
    2021-06-07T17:14:54.192Z    INFO controllers.Kubegres KUBEGRES {"name": "postgres", "Status": {"blockingOperation":{"statefulSetOperation":{},"statefulSetSpecUpdateOperation":{}},"previousBlockingOperation":{"statefulSetOperation":{},"statefulSetSpecUpdateOperation":{}}}}
    2021-06-07T17:14:54.192Z    INFO controllers.Kubegres Corrected an undefined value in Spec. {"spec.database.volumeMount": "New value: /var/lib/postgresql/data"}
    2021-06-07T17:14:54.192Z    INFO controllers.Kubegres Updating Kubegres Spec {"name": "postgres"}
    2021-06-07T17:14:54.192Z    DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Kubegres","namespace":"kubegres-crash","name":"postgres","uid":"c4fcc9d9-e21c-4af3-9e33-18c1f980654e","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"62606471"}, "reason": "SpecCheckCorrection", "message": "Corrected an undefined value in Spec. 'spec.database.volumeMount': New value: /var/lib/postgresql/data"}
    E0607 17:14:54.207006 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
    goroutine 403 [running]:
    k8s.io/apimachinery/pkg/util/runtime.logPanic(0x15f20e0, 0x22f5700)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa6
    k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x89
    panic(0x15f20e0, 0x22f5700)
        /usr/local/go/src/runtime/panic.go:969 +0x1b9
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).getSpecStorageClassName(...)
        /workspace/controllers/states/DbStorageClassStates.go:80
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).GetStorageClass(0xc000ae8b50, 0x8, 0xc000ea7be0, 0x19)
        /workspace/controllers/states/DbStorageClassStates.go:62 +0x47
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).loadStates(0xc000ae8b50, 0x8, 0xc000ea7be0)
        /workspace/controllers/states/DbStorageClassStates.go:46 +0x2f
    reactive-tech.io/kubegres/controllers/states.loadDbStorageClass(...)
        /workspace/controllers/states/DbStorageClassStates.go:39
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadDbStorageClassStates(0xc000ae9370, 0x14d91f5, 0x8)
        /workspace/controllers/states/ResourcesStates.go:75 +0xe5
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadStates(0xc000ae9370, 0xc000843200, 0x0)
        /workspace/controllers/states/ResourcesStates.go:46 +0x45
    reactive-tech.io/kubegres/controllers/states.LoadResourcesStates(...)
        /workspace/controllers/states/ResourcesStates.go:40
    reactive-tech.io/kubegres/controllers/ctx/resources.CreateResourcesContext(0xc0005d22c0, 0x19a3880, 0xc000eaee70, 0x19ac260, 0xc0001ba2e0, 0x19b6f60, 0xc00044bae0, 0x19a06c0, 0xc00054a2c0, 0x0, ...)
        /workspace/controllers/ctx/resources/ResourcesContext.go:105 +0x6c5
    reactive-tech.io/kubegres/controllers.(*KubegresReconciler).Reconcile(0xc00054a300, 0x19a3880, 0xc000eaee70, 0xc0006bf2c0, 0xe, 0xc0006bf2a0, 0x8, 0xc000eaee70, 0x40a1ff, 0xc000030000, ...)
        /workspace/controllers/kubegres_controller.go:74 +0x17f
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005448c0, 0x19a37c0, 0xc00029a000, 0x16482e0, 0xc001000460)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263 +0x317
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005448c0, 0x19a37c0, 0xc00029a000, 0x203000)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235 +0x205
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x19a37c0, 0xc00029a000)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198 +0x4a
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0x37
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0004b0f50)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000afbf50, 0x196dee0, 0xc000991620, 0xc00029a001, 0xc0003621e0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xad
    k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004b0f50, 0x3b9aca00, 0x0, 0xc00029a001, 0xc0003621e0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x19a37c0, 0xc00029a000, 0xc0001547c0, 0x3b9aca00, 0x0, 0xc000704501)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0xa6
    k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x19a37c0, 0xc00029a000, 0xc0001547c0, 0x3b9aca00)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99 +0x57
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:195 +0x4e7
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14aafe7]
    goroutine 403 [running]:
    k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x10c
    panic(0x15f20e0, 0x22f5700)
        /usr/local/go/src/runtime/panic.go:969 +0x1b9
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).getSpecStorageClassName(...)
        /workspace/controllers/states/DbStorageClassStates.go:80
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).GetStorageClass(0xc000ae8b50, 0x8, 0xc000ea7be0, 0x19)
        /workspace/controllers/states/DbStorageClassStates.go:62 +0x47
    reactive-tech.io/kubegres/controllers/states.(*DbStorageClassStates).loadStates(0xc000ae8b50, 0x8, 0xc000ea7be0)
        /workspace/controllers/states/DbStorageClassStates.go:46 +0x2f
    reactive-tech.io/kubegres/controllers/states.loadDbStorageClass(...)
        /workspace/controllers/states/DbStorageClassStates.go:39
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadDbStorageClassStates(0xc000ae9370, 0x14d91f5, 0x8)
        /workspace/controllers/states/ResourcesStates.go:75 +0xe5
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadStates(0xc000ae9370, 0xc000843200, 0x0)
        /workspace/controllers/states/ResourcesStates.go:46 +0x45
    reactive-tech.io/kubegres/controllers/states.LoadResourcesStates(...)
        /workspace/controllers/states/ResourcesStates.go:40
    reactive-tech.io/kubegres/controllers/ctx/resources.CreateResourcesContext(0xc0005d22c0, 0x19a3880, 0xc000eaee70, 0x19ac260, 0xc0001ba2e0, 0x19b6f60, 0xc00044bae0, 0x19a06c0, 0xc00054a2c0, 0x0, ...)
        /workspace/controllers/ctx/resources/ResourcesContext.go:105 +0x6c5
    reactive-tech.io/kubegres/controllers.(*KubegresReconciler).Reconcile(0xc00054a300, 0x19a3880, 0xc000eaee70, 0xc0006bf2c0, 0xe, 0xc0006bf2a0, 0x8, 0xc000eaee70, 0x40a1ff, 0xc000030000, ...)
        /workspace/controllers/kubegres_controller.go:74 +0x17f
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005448c0, 0x19a37c0, 0xc00029a000, 0x16482e0, 0xc001000460)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263 +0x317
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005448c0, 0x19a37c0, 0xc00029a000, 0x203000)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235 +0x205
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x19a37c0, 0xc00029a000)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198 +0x4a
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0x37
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0004b0f50)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000afbf50, 0x196dee0, 0xc000991620, 0xc00029a001, 0xc0003621e0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xad
    k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004b0f50, 0x3b9aca00, 0x0, 0xc00029a001, 0xc0003621e0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x19a37c0, 0xc00029a000, 0xc0001547c0, 0x3b9aca00, 0x0, 0xc000704501)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0xa6
    k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x19a37c0, 0xc00029a000, 0xc0001547c0, 0x3b9aca00)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99 +0x57
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:195 +0x4e7
    
  • Node scheduling

    Node scheduling

    Is it possible to specify node scheduling parameters? For example, if I want to ensure the pods schedule on on-demand nodes (instead of spot nodes) or if I want to add a toleration so it's able to schedule on a node with a taint.
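
    For reference, the settings being asked about take the following form in a plain pod spec (a generic Kubernetes sketch, not a Kubegres field reference; the node-lifecycle label and dedicated-db taint below are made-up examples):

    # keep the pods on on-demand nodes (hypothetical node label)
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-lifecycle
                  operator: In
                  values: ["on-demand"]
    # allow scheduling onto tainted nodes (hypothetical taint)
    tolerations:
      - key: "dedicated-db"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"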

  • service clusterip: none  , is this normal?

    service clusterip: none , is this normal?

    Hi,

    after installing Postgres with Kubegres, I see there are no IPs on the services mypostgres/mypostgres-replica. How can this work?

    kubernetes: MicroK8s v1.25.4 revision 4221 kubegres: 1.16

    Steps:
    1. kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.16/kubegres.yaml
    2. create secret
    3. deploy this yaml:
    
    apiVersion: kubegres.reactive-tech.io/v1
    kind: Kubegres
    metadata:
      name: mypostgres
      namespace: default
    
    spec:
    
       replicas: 3 
       image: postgres:latest
    
       database:
          size: 1000Mi
    
       env:
          - name: POSTGRES_PASSWORD
            valueFrom:
               secretKeyRef:
                  name: mypostgres-secret
                  key: superUserPassword
    
          - name: POSTGRES_REPLICATION_PASSWORD
            valueFrom:
               secretKeyRef:
                  name: mypostgres-secret
                  key: replicationUserPassword
    
    
    
    5: results:
    
    k8sadm@microk8s-nas:~$ kubectl get all -l 'app=mypostgres' --show-managed-fields -A -o wide
    
    NAMESPACE   NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
    default     pod/mypostgres-1-0   1/1     Running   0          88m   10.1.91.141   microk8s-nas     <none>           <none>
    default     pod/mypostgres-2-0   1/1     Running   0          87m   10.1.69.16    microk8s-node2   <none>           <none>
    default     pod/mypostgres-3-0   1/1     Running   0          86m   10.1.91.143   microk8s-nas     <none>           <none>
    
    NAMESPACE   NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE   SELECTOR
    default     service/mypostgres           ClusterIP   None         <none>        5432/TCP   88m   app=mypostgres,replicationRole=primary
    default     service/mypostgres-replica   ClusterIP   None         <none>        5432/TCP   86m   app=mypostgres,replicationRole=replica
    
    NAMESPACE   NAME                            READY   AGE   CONTAINERS     IMAGES
    default     statefulset.apps/mypostgres-1   1/1     89m   mypostgres-1   postgres:latest
    default     statefulset.apps/mypostgres-2   1/1     88m   mypostgres-2   postgres:latest
    default     statefulset.apps/mypostgres-3   1/1     86m   mypostgres-3   postgres:latest
    
    
     dns tests:
    k8sadm@microk8s-nas:~$ kubectl get svc -A -l "k8s-app=kube-dns" -o wide
    NAMESPACE     NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
    kube-system   kube-dns   ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   4h54m   k8s-app=kube-dns
    
    k8sadm@microk8s-nas:~$ nslookup mypostgres.svc.cluster.local 10.152.183.10
    Server:		10.152.183.10
    Address:	10.152.183.10#53
    
    ** server can't find mypostgres.svc.cluster.local: NXDOMAIN
    
    
    

    regards
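
    One note on the output above: ClusterIP: None is the expected, headless form of these services for StatefulSet-backed pods, and the DNS name needs to include the namespace, which the nslookup in the test omits. With the default namespace the lookup would be:

    # headless service lookup with the namespace included ("default" in this setup)
    nslookup mypostgres.default.svc.cluster.local 10.152.183.10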

  • backup job doesn't start cause:  Not starting job because prior execution is running and concurrency policy is Forbid

    backup job doesn't start cause: Not starting job because prior execution is running and concurrency policy is Forbid

    Hi, I have a simple cluster: 1 master and 2 replicas. I have added the following backup configuration to my cluster's deployment:

    backup:
      schedule: "0 */1 * * *"
      pvcName: my-backup-pvc
      volumeMount: /var/lib/backup

    But it doesn't work well, because if I describe my cronjob I get this message:

    Normal  JobAlreadyActive  112s (x5 over 121m)  cronjob-controller  Not starting job because prior execution is running and concurrency policy is Forbid

    And inside the backup cronjob pod I have the following log:

    [root@worker-sgpd postgis]# kubectl logs -f backup-mypostgres-27851640-tlqcx
    15/12/2022 14:54:37 - Starting DB backup of Kubegres resource mypostgres into file: /var/lib/backup/mypostgres-backup-15_12_2022_14_54_37.gz
    15/12/2022 14:54:37 - Running: pg_dumpall -h mypostgres-replica -U postgres -c | gzip > /var/lib/backup/mypostgres-backup-15_12_2022_14_54_37.gz
    pg_dumpall: error: connection to server at "mypostgres-replica" (192.168.108.126), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
    connection to server at "mypostgres-replica" (192.168.108.100), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

    If I use a Postgres client like this: kubectl run postgresql-dev-client --rm --tty -i --restart='Never' --namespace default --image docker.io/bitnami/postgresql:14.1.0-debian-10-r80 --env="PGPASSWORD=admin" -- bash

    and inside the container I can connect to my postgres: psql --host mypostgres -U postgres -d postgres -p 5432

    postgres-# \conninfo
    You are connected to database "postgres" as user "postgres" on host "mypostgres" (address "192.168.108.123") at port "5432".

    I don't know why I can connect to my Postgres from inside a pod, but the connection from the cronjob fails. Another strange thing: if I delete the corresponding pod of my backup cronjob, the first backup works and the second fails due to a connection timeout. Can you help me? Regards, Antonio

  • Backup container not being upgraded when Postgres is being upgraded.

    Backup container not being upgraded when Postgres is being upgraded.

    Hi,

    Just documenting this so that it can be fixed when you guys have a chance.

    As an example, install Postgres 14.1. Backup container uses Postgres 14.1. Then upgrade to 14.2. DB containers are upgraded to 14.2, but backup container is still at Postgres 14.1.

    Thanks.

  • kubegres can't recover if all statefulsets are deleted

    kubegres can't recover if all statefulsets are deleted

    Starting with a healthy cluster with 3 replicas:

    $ kubectl describe kubegres postgres-uaa
    Status:
      Blocking Operation:
        Stateful Set Operation:
        Stateful Set Spec Update Operation:
      Enforced Replicas:            4
      Last Created Instance Index:  5
      Previous Blocking Operation:
        Operation Id:  Replica DB count spec enforcement
        Stateful Set Operation:
          Instance Index:  5
          Name:            postgres-uaa-5
        Stateful Set Spec Update Operation:
        Step Id:                   Replica DB is deploying
        Time Out Epoc In Seconds:  1669789919
    Events:                        <none>
    

    Delete the statefulsets:

    $ kubectl delete sts postgres-uaa-2 postgres-uaa-4 postgres-uaa-5
    

    The following error is seen:

    Events:
      Type    Reason                                   Age                From                 Message
      ----    ------                                   ----               ----                 -------
      Normal  FailoverCannotHappenAsNoReplicaDeployed  25s (x2 over 26s)  Kubegres-controller  A failover is required for a Primary Pod as it is not healthy. However, a failover cannot happen because there is not any Replica deployed.
    

    The error makes sense because no replica is available. However, it's unclear how to recover the cluster. Although the StatefulSets were deleted, the PVCs still exist, and the database is intact.

    Using promotePod is not possible because we cannot promote a pod that is not running.

    As a workaround, I was able to manually create a statefulset out of band, and then promote the pod. But this process was kind of error prone (editing index labels) and unclear. I'm not sure I did it right, but it seemed to work eventually.

    Feature idea: maybe a promotePVC option that can start the statefulset from an existing PVC.

  • Error when starting up postgres, using version 1.16

    Error when starting up postgres, using version 1.16

    I am not able to start up Postgres. I installed version 1.16 and am getting the following error. Any thoughts?

    1.667944382668021e+09 INFO controllers.Kubegres KUBEGRES {"name": "test-postgres", "Status": {"blockingOperation":{"statefulSetOperation":{},"statefulSetSpecUpdateOperation":{}},"previousBlockingOperation":{"statefulSetOperation":{},"statefulSetSpecUpdateOperation":{}}}}
    1.667944383520829e+09 ERROR controllers.Kubegres Unable to load any deployed BackUp CronJob. {"CronJob name": "backup-test-postgres", "error": "no matches for kind "CronJob" in version "batch/v1""}
    reactive-tech.io/kubegres/controllers/ctx/log.(*LogWrapper).ErrorEvent
        /workspace/controllers/ctx/log/LogWrapper.go:62
    reactive-tech.io/kubegres/controllers/states.(*BackUpStates).getDeployedCronJob
        /workspace/controllers/states/BackUpStates.go:91
    reactive-tech.io/kubegres/controllers/states.(*BackUpStates).loadStates
        /workspace/controllers/states/BackUpStates.go:49
    reactive-tech.io/kubegres/controllers/states.loadBackUpStates
        /workspace/controllers/states/BackUpStates.go:43
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadBackUpStates
        /workspace/controllers/states/ResourcesStates.go:95
    reactive-tech.io/kubegres/controllers/states.(*ResourcesStates).loadStates
        /workspace/controllers/states/ResourcesStates.go:66
    reactive-tech.io/kubegres/controllers/states.LoadResourcesStates
        /workspace/controllers/states/ResourcesStates.go:40
    reactive-tech.io/kubegres/controllers/ctx/resources.CreateResourcesContext
        /workspace/controllers/ctx/resources/ResourcesContext.go:110
    reactive-tech.io/kubegres/controllers.(*KubegresReconciler).Reconcile
        /workspace/controllers/kubegres_controller.go:76
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
    1.6679443835210457e+09 ERROR Reconciler error {"controller": "kubegres", "controllerGroup": "kubegres.reactive-tech.io", "controllerKind": "Kubegres", "kubegres": {"name":"test-postgres","namespace":"default"}, "namespace": "default", "name": "test-postgres", "reconcileID": "cb49cdb7-2464-4a48-8365-822d9e0af891", "error": "no matches for kind "CronJob" in version "batch/v1""}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
    1.6679443835211165e+09 DEBUG events Warning {"object": {"kind":"Kubegres","namespace":"default","name":"test-postgres","uid":"b380917d-6c13-4471-995e-8069960adc3b","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"4535076"}, "reason": "BackUpCronJobLoadingErr", "message": "Unable to load any deployed BackUp CronJob. 'CronJob name': backup-test-postgres - no matches for kind "CronJob" in version "batch/v1""}

    =====================================
    apiVersion: kubegres.reactive-tech.io/v1
    kind: Kubegres
    metadata:
      name: test-postgres
      namespace: default
    spec:
      replicas: 2
      image: postgres:14.1
      database:
        size: 8Gi
        storageClassName: local-storage
      env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: superUserPassword
        - name: POSTGRES_REPLICATION_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: replicationUserPassword

    ================ k8s version: Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:23:01Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

    ================ storage classes:
    NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  20d
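
    The "no matches for kind CronJob in version batch/v1" part of the log is consistent with the cluster version above: CronJob only graduated to batch/v1 in Kubernetes 1.21, and this server reports v1.20.15, which serves CronJobs under batch/v1beta1. A quick way to confirm what a given cluster offers:

    # list the kinds served by the batch API group and the versions they are served at
    kubectl api-resources --api-group=batch
    # on a 1.20 cluster, cronjobs appear under batch/v1beta1 rather than batch/v1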

  • References for PITR / Continuous Archiving implementation?

    References for PITR / Continuous Archiving implementation?

    Hi.

    I've had some issues with Zalando, and now I'm looking for a simpler operator. Kubegres seems to fit the bill, and my experience deploying a cluster was great. I have a custom image set up to run pg_dump and pg_restore scripts, CronJobs for the dump and an on-demand Job for the restoration process. This is really simple and works well, but with restrictions: it won't work for larger databases, it is slow, and the RPO is very high.

    I've been looking at strategies to implement PITR and continuous backup. Zalando had this baked in using pg_basebackup and WAL-G (I think). Outside the k8s world, I've read a lot about pgBackRest, Barman, WAL-G and a couple of other solutions. But those don't look all that simple to set up when the DB is running in containers (they might be, but I don't find much information on it except one or two repos). I know Timescale runs pgBackRest as a sidecar, Zalando runs a custom image with WAL-G/E + pg_basebackup, Percona also uses pgBackRest (not sure about the architecture), Crunchy PGO also uses pgBackRest, and Stackgres is, I think, a custom solution, but I'm not sure.

    I tried running a separate container for pgBackRest, so I changed the VolumeClaim access mode to ReadWriteMany (so that pgBackRest could connect directly to the data directory), but I had quite a few issues throughout the process and couldn't make it work (yet? I will keep trying).

    I understand Kubegres is not particularly going in this direction at the moment, but I wonder if this could be an option for the future. There has been a brief discussion about it here, but it stopped at pg_dump. Stackgres has an interesting approach with several CRDs. Although this looks complex at first, having multiple CRDs also allows for more flexibility. Zalando's approach tries to put everything into the cluster definition and/or configuration file, so things are not always trivial to grasp. (I'll follow up in a bit with potential implementations.)

    I imagine that this should be a common requirement for folks deploying PSQL to k8s, so even if this is not a plan for Kubegres in the future, the pain still exists. I was wondering if there were any examples, references or other material to implement this solution with Kubegres, or any experience people could share.

    Thanks a lot!
