At what point does PostgreSQL begin recovery?

I'm going to make backups from a standby server. I use the following commands to create a binary backup:
psql -c 'select pg_xlog_replay_pause()'
tar c data --exclude=pg_xlog/* | lzop --fast > /mnt/nfs/backup/xxxx.tar.lzop
psql -c 'select pg_xlog_replay_resume()'
All WAL logs from the master database are kept on external storage for several days, and recovery using those logs works great. However, the backup becomes invalid once the logs are cleaned up. The solution is to copy all the WAL logs the backup needs, starting from some point up to the last log written when the backup finishes.
The question is: what is the first file?
pg_controldata shows:
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6185091942558520564
Database cluster state: in archive recovery
pg_control last modified: Thu 08 Oct 2015 03:14:23 PM UTC
Latest checkpoint location: 1C41/F662E1F8
Prior checkpoint location: 1C41/B4435EE8
Latest checkpoint's REDO location: 1C41/DE003400
Latest checkpoint's REDO WAL file: 0000000200001C41000000DE
Latest checkpoint's TimeLineID: 2
Latest checkpoint's PrevTimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/3550951620
Latest checkpoint's NextOID: 83806
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 3152230057
Latest checkpoint's oldestXID's DB: 16385
Latest checkpoint's oldestActiveXID: 3550951620
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16385
Time of latest checkpoint: Thu 08 Oct 2015 03:10:44 PM UTC
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 1C42/4CC934E0
So what is the first file? AFAIK PostgreSQL always begins recovery from a checkpoint. I have tried to restore several backups and noticed that PostgreSQL starts recovery from the Prior checkpoint location. Is this always true? What's the difference between the Prior checkpoint location and the Latest checkpoint location?
According to pg_controldata:
First file: 1C41/B4
Minimum last file: 1C42/4C (must be greater than or equal to `Minimum recovery ending location`)
Am I right?

You need everything from the "Latest checkpoint's REDO location" (the first WAL segment needed is the one named by "Latest checkpoint's REDO WAL file") through to the WAL segment containing the "Minimum recovery ending location", on the timeline given by "Latest checkpoint's TimeLineID".
In your example that'd be from LSN 1C41/DE003400 through to 1C42/4CC934E0, both on TimeLineID 2.
That corresponds to WAL segments 0000000200001C41000000DE through 0000000200001C42????????.
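If you want to compute the segment name for an arbitrary LSN yourself, here is a minimal shell sketch assuming the default 16 MB segment size (pg_xlogfile_name() would do the same job, but it cannot be called on a server that is in recovery, such as this standby):
lsn="1C42/4CC934E0"; tli=2
hi=${lsn%/*}; lo=${lsn#*/}
seg=$(( 0x$lo / 0x1000000 ))                  # 16 MB (0x1000000 bytes) per segment
printf '%08X%08X%08X\n' "$tli" "0x$hi" "$seg" # timeline + high LSN word + segment number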

Related

Upgrading from postgres 14 beta to postgres 14 new release

I installed Postgres 14 beta and I want to upgrade to the new 14 release. I have just installed the new release (PG 14), and when I try to start the PG 14 beta cluster I get the error below. May I know the correct procedure for upgrading from beta to the new release?
-bash-4.2$ /usr/pgsql-14/bin/pg_ctl -D /var/lib/pgsql/14/data -l logfile start
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.
-bash-4.2$ cat logfile
2021-10-27 13:19:29.507 UTC [5112] FATAL: database files are incompatible with server
2021-10-27 13:19:29.507 UTC [5112] DETAIL: The database cluster was initialized with CATALOG_VERSION_NO 202106151, but the server was compiled with CATALOG_VERSION_NO 202107181.
2021-10-27 13:19:29.507 UTC [5112] HINT: It looks like you need to initdb.
2021-10-27 13:19:29.507 UTC [5112] LOG: database system is shut down
There has been a change in the catalog version since v14 beta 1, so you have to use dump/restore or pg_upgrade to upgrade.
Install 14.0, create a new cluster and use the method of your choice to upgrade.
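For example, a minimal sketch of the dump/restore route, assuming the beta binaries are still installed somewhere (the /usr/pgsql-14beta path below is hypothetical) and the new cluster lives in a fresh data directory:
# initialize a fresh cluster with the 14.0 binaries
/usr/pgsql-14/bin/initdb -D /var/lib/pgsql/14/data_new
# start the old cluster with the beta binaries and dump everything
/usr/pgsql-14beta/bin/pg_ctl -D /var/lib/pgsql/14/data -l beta.log start
/usr/pgsql-14beta/bin/pg_dumpall > all.sql
/usr/pgsql-14beta/bin/pg_ctl -D /var/lib/pgsql/14/data stop
# load the dump into the new cluster
/usr/pgsql-14/bin/pg_ctl -D /var/lib/pgsql/14/data_new -l logfile start
psql -f all.sql postgres
pg_upgrade works the same way in principle: it also needs both sets of binaries, pointed at the old and new data directories.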

Recover Postgresql pgBarman

I've set up a PostgreSQL DB and I want to back it up.
I have one server with my main DB and one with Barman.
All the setup is working, and I can back up my DB with Barman.
I just don't understand how I can recover my DB to an exact point in time between the backups that I do every day.
barman#ubuntu:~$ barman check main-db-server
WARNING: No backup strategy set for server 'main-db-server' (using default 'exclusive_backup').
WARNING: The default backup strategy will change to 'concurrent_backup' in the future. Explicitly set 'backup_options' to silence this warning.
Server main-db-server:
PostgreSQL: OK
is_superuser: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (interval provided: 1 day, latest backup age: 9 minutes, 59 seconds)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 6 backups, expected at least 0)
ssh: OK (PostgreSQL server)
not in recovery: OK
systemid coherence: OK (no system Id available)
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK
And when I back up my DB:
barman#ubuntu:~$ barman backup main-db-server
WARNING: No backup strategy set for server 'main-db-server' (using default 'exclusive_backup').
WARNING: The default backup strategy will change to 'concurrent_backup' in the future. Explicitly set 'backup_options' to silence this warning.
Starting backup using rsync-exclusive method for server main-db-server in /var/lib/barman/main-db-server/base/20210427T150505
Backup start at LSN: 0/1C000028 (00000005000000000000001C, 00000028)
Starting backup copy via rsync/SSH for 20210427T150505
Copy done (time: 2 seconds)
Asking PostgreSQL server to finalize the backup.
Backup size: 74.0 MiB. Actual size on disk: 34.9 KiB (-99.95% deduplication ratio).
Backup end at LSN: 0/1C0000C0 (00000005000000000000001C, 000000C0)
Backup completed (start time: 2021-04-27 15:05:05.289717, elapsed time: 11 seconds)
Processing xlog segments from file archival for main-db-server
00000005000000000000001B
00000005000000000000001C
00000005000000000000001C.00000028.backup
I don't know how to restore my DB to a point in time between two backups :/
Thanks
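For what it's worth, point-in-time recovery with Barman is normally done through barman recover with a recovery target. A minimal sketch, where the target time and destination data directory are hypothetical (the backup ID is the one from the output above):
# list the available backups and pick the newest one taken before the target time
barman list-backup main-db-server
# stop PostgreSQL on the destination host, then restore the backup and
# replay archived WALs up to the requested point in time
barman recover --target-time "2021-04-27 18:00:00" \
    --remote-ssh-command "ssh postgres@main-db-server" \
    main-db-server 20210427T150505 /var/lib/postgresql/data
Afterwards you start PostgreSQL on the destination and it replays WAL until it reaches the target time.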

Streaming replication is failing with "WAL segment has already been removed"

I am trying to implement Master/Slave streaming replication on Postgres 11.5. I ran the following steps:
On Master
select pg_start_backup('replication-setup',true);
On Slave
Stopped the postgres 11 database and ran
rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x" --exclude pg_wal --exclude postgresql.pid --exclude pg_log MASTER:/var/lib/postgresql/11/main/* /var/lib/postgresql/11/main
On Master
select pg_stop_backup();
On Slave
rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x" MASTER:/var/lib/postgresql/11/main/pg_wal/* /var/lib/postgresql/11/main/pg_wal
I created the recovery.conf file in the slave's ~/11/main folder:
standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name='my_repl_slot'
When I start Postgres on the SLAVE, I get the following error in both the MASTER and SLAVE logs:
2019-11-08 09:03:51.205 CST [27633] LOG: 00000: database system was interrupted; last known up at 2019-11-08 02:53:04 CST
2019-11-08 09:03:51.205 CST [27633] LOCATION: StartupXLOG, xlog.c:6388
2019-11-08 09:03:51.252 CST [27633] LOG: 00000: entering standby mode
2019-11-08 09:03:51.252 CST [27633] LOCATION: StartupXLOG, xlog.c:6443
2019-11-08 09:03:51.384 CST [27634] LOG: 00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.384 CST [27634] LOCATION: WalReceiverMain, walreceiver.c:383
2019-11-08 09:03:51.384 CST [27634] FATAL: XX000: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000012DB0000000C has already been removed
2019-11-08 09:03:51.384 CST [27634] LOCATION: libpqrcv_receive, libpqwalreceiver.c:772
2019-11-08 09:03:51.408 CST [27635] LOG: 00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.408 CST [27635] LOCATION: WalReceiverMain, walreceiver.c:383
The problem is that the START WAL segment, 00000001000012DB0000000C, is available right up until I run pg_stop_backup(); once pg_stop_backup() is executed it gets archived and is no longer available. So this is not an issue of the WAL being removed because of a low wal_keep_segments.
postgres#SLAVE:~/11/main/pg_wal$ cat 00000001000012DB0000000C.00000718.backup
START WAL LOCATION: 12DB/C000718 (file 00000001000012DB0000000C)
STOP WAL LOCATION: 12DB/F4C30720 (file 00000001000012DB000000F4)
CHECKPOINT LOCATION: 12DB/C000750
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2019-11-07 15:47:26 CST
LABEL: replication-setup-mdurbha
START TIMELINE: 1
STOP TIME: 2019-11-08 08:48:35 CST
STOP TIMELINE: 1
My MASTER has archive_command set, and I have the missing WALs available. I copied them into a restore directory on the SLAVE and tried the recovery.conf below, but it still fails with the MASTER reporting the same "WAL segment has already been removed" error.
Any idea how I can address this issue? I have used rsync to set up replication without any issues in the past on Postgres 9.6, but have been experiencing this issue on Postgres 11.
standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command='cp /var/lib/postgresql/restore/%f %p'
Put a restore_command into recovery.conf that can restore archived WAL files and you are fine.

What is the purpose of `pg_logical` directory inside PostgreSQL data?

I've just stumbled upon this error while testing failover of a PostgreSQL 9.4 cluster I've set up. Here I'm trying to promote a slave to be the new master:
$ repmgr -f /etc/repmgr/repmgr.conf --verbose standby promote
2014-09-22 10:46:37 UTC LOG: database system shutdown was interrupted; last known up at 2014-09-22 10:44:02 UTC
2014-09-22 10:46:37 UTC LOG: database system was not properly shut down; automatic recovery in progress
2014-09-22 10:46:37 UTC LOG: redo starts at 0/18000028
2014-09-22 10:46:37 UTC LOG: consistent recovery state reached at 0/19000600
2014-09-22 10:46:37 UTC LOG: record with zero length at 0/1A000090
2014-09-22 10:46:37 UTC LOG: redo done at 0/1A000028
2014-09-22 10:46:37 UTC LOG: last completed transaction was at log time 2014-09-22 10:36:22.679806+00
2014-09-22 10:46:37 UTC FATAL: could not open directory "pg_logical/snapshots": No such file or directory
2014-09-22 10:46:37 UTC LOG: startup process (PID 2595) exited with exit code 1
2014-09-22 10:46:37 UTC LOG: aborting startup due to startup process failure
The pg_logical/snapshots dir does in fact exist on the master node, and it is empty.
UPD: I've just manually created the empty directories pg_logical/snapshots and pg_logical/mappings, and the server started without complaining. repmgr standby clone seems to omit these dirs while syncing. But the question still remains, because I'm curious what this directory is for; maybe I'm missing something in my setup. Simply googling it did not yield any meaningful results.
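For reference, a minimal sketch of that manual workaround, assuming a Debian-style 9.4 data directory (the path is hypothetical; match ownership and permissions to the rest of the cluster):
# recreate the directories that repmgr standby clone skipped
sudo -u postgres mkdir -p /var/lib/postgresql/9.4/main/pg_logical/snapshots \
                          /var/lib/postgresql/9.4/main/pg_logical/mappings
sudo -u postgres chmod 700 /var/lib/postgresql/9.4/main/pg_logical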
It's for the new logical changeset extraction / logical replication feature in 9.4.
This shouldn't happen, though... it suggests a significant bug somewhere, probably repmgr. I'll wait for details (repmgr version, etc).
Update: Confirmed, it's a repmgr bug. It's fixed in git master already (and was before this report) and will be in the next release. Which had better be soon, given the significance of this issue.

Heroku postgres follower has more tables

Just created a follower Heroku postgres database. The follower seems to have more tables than the 'master'. Why?
$ heroku pg:info
=== HEROKU_POSTGRESQL_XXXX_URL (DATABASE_URL)
Plan: Ronin
Status: Available
Data Size: 3.12 GB
Tables: 56
PG Version: 9.3.4
Connections: 20
Fork/Follow: Available
Rollback: Unsupported
Created: 2014-07-12 21:35 UTC
Followers: HEROKU_POSTGRESQL_YYYY
Maintenance: not required
=== HEROKU_POSTGRESQL_YYYY_URL
Plan: Premium 2
Status: Available
Data Size: 5.05 GB
Tables: 70
PG Version: 9.3.5
Connections: 2
Fork/Follow: Unavailable on followers
Rollback: earliest from 2014-08-20 05:56 UTC
Created: 2014-08-27 05:47 UTC
Data Encryption: In Use
Following: HEROKU_POSTGRESQL_XXXX
Behind By: 72755 commits
Maintenance: not required
Note: My original db plan is now legacy, so I had to create my follower with a different, larger db plan.
My app's operation isn't unduly affected, but I'm curious about the table count discrepancy. Also, if I hot-swap this follower to become primary, will the table count go from 70 to 56?
What DrColossos said in the comments: your database is behind in commits; something is blocking it from applying the upstream changes. You can install the pg-extras plugin and examine your follower database:
$ heroku pg:locks HEROKU_POSTGRESQL_YYYY_URL -a app_name
That should show you some information on locks that could be preventing your database from catching up. If it's still 72k or more commits behind, I imagine you'll find a very old lock in place.
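If a stale lock does turn up, a sketch of the usual follow-up (the plugin install command varies between toolbelt versions, and the PID is a placeholder):
# install the pg-extras plugin if it is not already present
heroku plugins:install heroku-pg-extras
# terminate the backend holding the old lock on the follower
heroku pg:kill 12345 HEROKU_POSTGRESQL_YYYY_URL -a app_name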