Performing point-in-time recovery multiple times in PostgreSQL

Can we perform point-in-time recovery more than once using the same recovery.conf file? The recovery.conf file is renamed to recovery.done after a restore completes.
What if I want to run another WAL restore to a different point in time using the same recovery.conf file? Can't I do that? Or do I have to take a new pg_basebackup and create a new recovery file in my Postgres data directory each time I want to restore more WAL?

Once recovery is done, you cannot go back.
You have to restore the backup again and start from scratch.
The only alternative is using pg_rewind, but that can only reset a cluster to the state of another cluster (and you probably don't have that other cluster).
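For illustration, a minimal sketch of what a repeat recovery run looks like on a pre-v12 cluster that still uses recovery.conf; the paths, archive location and target timestamp are placeholders, not taken from your setup:

# stop the server and restore the base backup into an empty data directory
pg_ctl stop -D /var/lib/pgsql/data
rm -rf /var/lib/pgsql/data/*
tar -xzf /backups/base.tar.gz -C /var/lib/pgsql/data

# write a fresh recovery.conf with the new target for this run
cat > /var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'cp /backups/wal_archive/%f "%p"'
recovery_target_time = '2023-01-10 12:00:00'
EOF

# start the server; recovery.conf is renamed to recovery.done once recovery finishes
pg_ctl start -D /var/lib/pgsql/data

Every run starts from a freshly restored base backup; only the recovery target changes.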

Related

PostgreSQL point-in-time recovery: getting "no valid checkpoint record" error

I am trying to perform a point-in-time recovery using the WAL archiving process. The archive_command is set in postgresql.conf and I can see WAL being archived to the backup-archive directory. When I try to start the service I get: PANIC: could not locate a valid checkpoint record
I am using the below step-by-step process:
1. Start a low-level API base backup:
   SELECT pg_start_backup('label', true, false);
2. Copy the data directory of my cluster:
   tar -zcvpf basebkPostgres20230110New.tgz /PostgreSQL/13/data
3. Close the base backup:
   SELECT * FROM pg_stop_backup(false, true);
4. Stop the postgres service
5. Remove the current cluster's data directory
6. Restore the backed-up data directory
7. Remove the contents of the pg_wal directory
8. Set the restore_command in the postgresql.conf file
9. Start the postgres service
You forgot the backup_label file and recovery.signal. You have to capture the result of pg_stop_backup (or pg_backup_stop from v15 on) and create backup_label from the contents. That file has to be in the restored data directory. Also, you have to create recovery.signal in the data directory, so that PostgreSQL starts in archive recovery mode and reads your restore_command.
Without restore_command, PostgreSQL uses the WAL in pg_wal, which is empty. Without backup_label, PostgreSQL thinks that it can recover from the checkpoint indicated by the control file pg_control. Even if that worked, the result would be a corrupted database, since you have to recover from the start of the backup.
recovery.signal is documented in the PostgreSQL documentation on recovering using a continuous archive backup (step 7), and backup_label in the low-level base backup instructions (step 4).
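For illustration only, a rough sketch of the missing pieces on PostgreSQL 13, using the paths from your steps (the archive location in restore_command is an assumption, adjust it to your backup-archive directory):

-- in the same psql session that ran pg_start_backup (a non-exclusive backup
-- must be stopped from the same connection), save the returned label file:
\o /tmp/backup_label
\pset tuples_only on
\pset format unaligned
SELECT labelfile FROM pg_stop_backup(false, true);
\o

# after stopping the service, restoring the data directory and clearing pg_wal,
# put the label file and the recovery signal into the restored data directory:
cp /tmp/backup_label /PostgreSQL/13/data/backup_label
touch /PostgreSQL/13/data/recovery.signal

# and make sure postgresql.conf has a working restore_command, for example:
#   restore_command = 'cp /backup-archive/%f "%p"'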

How to take periodic pg_basebackup without losing any WAL files / how to pause WAL archiving

Environment: PostgreSQL 13.x in a Docker container.
I took a pg_basebackup and have configured PostgreSQL 13.x with WAL archiving enabled (archive_mode = on), and it is working as expected.
I see that it is recommended to take pg_basebackup periodically. How can I rotate the base backups weekly or daily?
Example: if a new pg_basebackup runs every Saturday night, should we consider stopping/pausing WAL archiving for that duration?
Locations:
pg_basebackup: /db-backup/basebackup
archive_command: /db-backup/wal_files
So I want to move the archive directory aside every Saturday:
mv /db-backup /db-backup-old
While performing this, should I pause the WAL archiving process? As per the docs (24.3.1. Setting up WAL archiving), we can stop/pause it by setting
archive_command = ''
Is this the right approach? If so, should we reload the configuration, or is there a way to update this setting on the fly?
Note: using the Postgres Docker container.
What I am trying to achieve is: if some data is written to the DB while the backups are being rotated, it should end up either in the new base backup or in the new WAL files directory.
Please correct me if these concerns are misplaced.
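As a hedged sketch of the on-the-fly variant (PostgreSQL 13, superuser required; the resumed archive_command below is an assumption, substitute your own):

-- pause archiving: completed WAL segments pile up in pg_wal until archiving is re-enabled
ALTER SYSTEM SET archive_command = '';
SELECT pg_reload_conf();

-- ... rotate /db-backup and take the new pg_basebackup ...

-- resume archiving with the original command
ALTER SYSTEM SET archive_command = 'cp %p /db-backup/wal_files/%f';
SELECT pg_reload_conf();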

How to recover the current WAL file that was being written on the master and not yet archived?

I am new to Postgres and was trying to simulate a PostgreSQL cluster, so:
I have two nodes running the latest Postgres version, acting as active / hot standby, with this master configuration:
archive_mode = on
archive_command = 'test ! -f /data/%f && cp %p /data/%f'
and this slave configuration:
primary_slot_name = 'standby_db2_slot'
hot_standby = on
and the other related settings at their defaults.
My question is: if the standby was off for some time and the master crashes, how do I recover the data from my archived WAL files, and how do I get the last WAL file that the master was writing to before crashing?
You could copy the files from the archive (if it is still available) into the replica's pg_wal folder. Or more typically, you would set restore_command to copy each of them from the archive upon request.
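For example, a restore_command matching the archive_command you show might look like this on the standby (sketch only; on current versions it goes into postgresql.conf, then restart the standby):

restore_command = 'cp /data/%f "%p"'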
how to get the last wal file that the master was writing to before crashing?
If it was a hard crash where the master's storage was irreparably destroyed, you likely can't get it. That is why streaming is great: it copies the data stream in near-real time to minimize loss. And if it was a soft crash, why are you trying to promote the replica anyway, rather than just turning the master back on? If the master's storage was only partially destroyed, then just copy this last file to the archive manually.

Which PostgreSQL WAL files can I safely remove from the WAL archive folder?

Current situation
So I have WAL archiving set up to an independent internal hard drive on a data-logging computer running Postgres. The hard drive containing the WAL archives is filling up, and I'd like to move all the WAL archive files, including the initial base backup, off to external backup drives.
The directory structure is like:
D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)
D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup
The question I have then is:
Can I safely move literally every single WAL file except the current WAL archive file (000000000001.CA0000.. and so on), including the base backup, to another HDD? (Note that the database is live and receiving data.)
cheers!
WAL archives
You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.
pg_xlog
Never remove WAL from pg_xlog manually. If you have too much WAL then:
your wal_keep_segments setting is keeping WAL around;
you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.
You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting, run a manual CHECKPOINT command.
If you have an offline server and need to remove WAL to start it, you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.
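A hedged example of cleaning an archive against a given base backup (the .backup file name here is made up; -n is a dry run, -d adds debug output when actually removing):

# dry run: list the archived WAL that would be removed
pg_archivecleanup -n /archive 000000010000003700000010.00000020.backup
# actually remove it
pg_archivecleanup -d /archive 000000010000003700000010.00000020.backup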
WAL files are incremental, so the simple answer is: you cannot throw any files out. The solution is to make a new base backup; after that, all previous WAL files can be deleted.
The WAL files contain the individual changes made to the tables, so if you throw some older WALs out, the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process, but then you would have to make all WAL files available again from a single location if you ever need to recover your database to some point in the past. If you are running out of disk space, that may mean recovering on some machine where you have enough space to hold the base backup and all WAL files. The main issue here is whether you can do that fast enough to restore a full database after an incident.
Another issue is that if you cannot identify where/when a problem occurred that needs to be corrected, your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, it simply takes a lot of time.
The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.
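For example, the periodic base backup itself can be a single command such as the following (the target path is hypothetical; -X stream includes the WAL needed to make the backup consistent on its own):

pg_basebackup -D /backups/base_$(date +%Y%m%d) -Ft -z -X stream -P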
A solution that we went for is executing pg_basebackup every night. This creates a base backup, and later on we can use pg_archivecleanup to clean up all the "old" WAL files from before that base backup, using something like
"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%
Fortunately, we have never had to recover yet, but it should work in theory.
In case someone found this by searching for how to safely clean up the WAL directory under a replication architecture: consider the scenario where there are leftovers from offline replicas, i.e. unused replication slots waiting for a replica to come back online and therefore keeping a lot of WAL archives on the master DB.
In our case, a replica went down due to hardware failure. We recreated it along with its replication slot on the master DB, but forgot to get rid of the previously used slot. Once we cleared that out, PostgreSQL got rid of the unused WAL and all was good.
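If you suspect a leftover slot, a quick check and cleanup looks like this (the slot name is a placeholder, use your own):

-- an inactive slot with an old restart_lsn is what pins the WAL
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
-- drop the slot that no longer has a replica behind it
SELECT pg_drop_replication_slot('old_replica_slot');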
You can add this script to clean up or remove pg_wal files automatically. It works on PostgreSQL 11; if you use another version, simply replace /usr/pgsql-11/bin/pg_archivecleanup with /usr/pgsql-12/bin/pg_archivecleanup, 13, and so on.
#!/bin/bash
# Dump the control data; it contains the "Latest checkpoint's REDO WAL file" line.
/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ > pgwalfile.txt
# Remove every segment in pg_wal that is older than the latest checkpoint's REDO WAL file.
/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal $(grep "Latest checkpoint's REDO WAL file" pgwalfile.txt | awk '{print $6}')

pg_archivecleanup and streaming replication

Using postgres 9.3.
I'm a bit confused on the proper usage of pg_archivecleanup.
I'm using both streaming replication and backup with continuous archiving for PITR recovery.
I don't think I can configure pg_archivecleanup in recovery.conf on the standby as it wouldn't achieve anything. The master is not archiving to a location accessible to the standby. The master is archiving to a location on its local disk, and then those archives and the associated backup are being rsync'd to a large backup disk.
So, it seems the solution would be to run pg_archivecleanup in "standalone" mode on the master, such as:
/usr/lib/postgresql/9.3/bin/pg_archivecleanup -d /archive 0000000100000010000000F0.00000028.backup
So, I'd do a cron job that would run the pg_archivecleanup command for any .backup files which are older than the latest one, and then delete those backup files, leaving only the latest one.
Is my understanding and plan correct?
If you want to retain only WAL segments after the latest base backup, you simply run pg_archivecleanup in standalone mode for the latest .backup file (not for those older than the latest).
But do you really want to have only one available backup? First of all, you won't be able to restore to a point before the last backup. Second, it makes sense to keep a few backups around just in case (corruption, etc.).
And it seems strange to archive segments to local disk and then rsync them elsewhere. Why not put your rsync (followed by sync, to flush OS buffers to disk) into archive_command? This ensures that a segment won't be removed from pg_xlog before it reaches its destination.
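A sketch of what such a combined archive_command could look like (host and paths are placeholders; the command only reports success if both the rsync and the remote sync succeed):

archive_command = 'rsync -a %p backuphost:/wal_archive/%f && ssh backuphost sync'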