Does the pg_archivecleanup command affect replication slots? - postgresql

By definition, a replication slot is a PostgreSQL feature that ensures the master server retains the WAL files needed by the replicas, even while they are disconnected from the server.
Is there any effect if I run pg_archivecleanup on the 15th of every month to free up storage? Does it affect the replication slot, given that the slot tracks the WAL files required by the standby server?
I run pg_archivecleanup to remove WAL files older than the last checkpoint, but I am not sure whether it removes WAL files that are still required by another replica.
If it does not remove them, how does it actually track them?
I am looking for an explanation from experts.

When you run pg_archivecleanup, it deletes all WAL segments that are older than the WAL segment you specify as its argument. It ignores replication slots, so you may end up removing WAL segments that standby servers still need to catch up (if they do that using restore_command).
Note that this is normally not a problem, because pg_archivecleanup deletes WAL segments from the archive, while replication slots deal with WAL segments on the primary server (in the pg_wal directory), which are not affected by pg_archivecleanup. Since the standby consumes WAL directly from the primary (as specified in primary_conninfo), it does not have to rely on the WAL archive.
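To make the "older than the segment you specify" rule concrete, here is a rough Python sketch of the selection logic. It is not pg_archivecleanup's actual code: the function name segments_older_than is made up for illustration, and the sketch assumes that only plain 24-hex-digit segment files are considered and that the 8-digit timeline prefix is ignored when comparing, which is my understanding of the tool's behaviour.

```python
import re

# A WAL segment file name is 24 hexadecimal digits:
# 8 for the timeline, then 16 for the segment position.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def segments_older_than(archive_files, oldest_kept):
    """Return the archive file names that sort before oldest_kept.

    Assumption: the 8-digit timeline prefix is ignored when comparing,
    and only plain segment files are considered.
    """
    cutoff = oldest_kept[8:24]
    return sorted(
        name for name in archive_files
        if WAL_SEGMENT.match(name) and name[8:] < cutoff
    )


archive = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000003.00000028.backup",
]
print(segments_older_than(archive, "000000010000000000000003"))
```

Nothing in this selection looks at replication slots; the only inputs are the file names in the archive directory and the cutoff name.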

Related

What is a restartpoint in postgresql?

In the postgresql.conf file for PostgreSQL version 13, the archive_cleanup_command comment explains the command in the following way:
#archive_cleanup_command = '' # command to execute at every restartpoint.
The documentation here and here makes no mention of a 'restartpoint'. This raises the following questions:
What is a restartpoint?
For example: is restartpoint just the same word for a checkpoint? Do the two mean the exact same thing?
When is a restartpoint created?
For example: if a restartpoint is just a checkpoint, then one will be created every 5 minutes, or whatever checkpoint_timeout is set to in the postgresql.conf file.
When is the archive cleanup command run?
For example: The archive cleanup command is run every time the archive_timeout (set in the postgresql.conf file) is reached. If the archive timeout is set to 1hr, then the archive_cleanup_command runs every 1hr.
A restartpoint is just a checkpoint during recovery, and it is triggered in the same fashion as a checkpoint: either by timeout or by the amount of WAL processed since the last restartpoint. Note also that:
"Restartpoints can't be performed more frequently than checkpoints in the master because restartpoints can only be performed at checkpoint records."
The reason for restartpoints is “restartable recovery”: if your recovery process is interrupted, the next restart won't start recovering from the beginning of the backup, but from the latest restartpoint.
archive_cleanup_command is run for all completely recovered WAL segments during a restartpoint. Its main use case is log-shipping standby servers: using archive_cleanup_command, they can remove all shipped WAL segments they no longer need, so that the directory containing them doesn't grow without bound.

Backup postgresql WAL logs

I am trying to configure database backups in PostgreSQL with pg_basebackup and WAL files.
For now I create a full backup once a week and want to back up the WAL files too. But, as I understand it, PostgreSQL writes to them all the time. So how can I copy them and be sure that they are not corrupted?
Thanks
You set archive_command to a shell command that copies the WAL file to a safe archive location, so that burden is mostly on you.
When PostgreSQL runs archive_command, it assumes that the WAL file is not corrupted. Only a PostgreSQL bug or a bug in the storage system could cause a corrupted WAL segment.
There is no better protection against PostgreSQL bugs than always running the latest bugfix release, and you can invest in storage hardware that will at least detect failure.
You can also write your archive_command with a certain amount of paranoia, e.g. by comparing the md5sum of the WAL segment and its archive copy.
Another idea is to write two copies of the WAL file to different storage systems.
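As an illustration of the "paranoid" variant, here is a minimal Python sketch of an archiving helper that refuses to overwrite an existing archive file and compares MD5 digests after copying. The function names md5_of and archive_wal are hypothetical, and the wiring into archive_command (for example a thin wrapper that passes %p and the archive directory, then exits with the return value) is an assumption, not something from the question. The sketch also does not force the copy to disk, which a careful production script would want to do.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_wal(source, archive_dir):
    """Copy one WAL segment into archive_dir.

    Returns 0 on success and 1 on failure, mirroring the exit status
    that an archive_command is expected to report.
    """
    source = Path(source)
    target = Path(archive_dir) / source.name
    if target.exists():
        # Never overwrite a file that is already in the archive.
        return 1
    shutil.copyfile(source, target)
    if md5_of(source) != md5_of(target):
        # The copy does not match the original: discard it and fail.
        target.unlink()
        return 1
    return 0
```

A nonzero result makes PostgreSQL keep the segment and retry later, so failing loudly is the safe default here.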

Recover Postgres Streaming Replication Slave from Archived Wal Logs

I have set up a Postgres hot standby server using streaming replication. But my standby server is asking for an old WAL file which is no longer in the master's pg_xlog directory. The file does exist in the WAL archive backup directory.
How can I configure the standby to read this file from the backup directory? Or is there a way to manually copy this file to the standby server?
Any help will be appreciated.
You would have to add a restore_command to recovery.conf that can restore files from the WAL archive.
Then restart the standby, and it should be able to recover.
When the standby cannot get the required WAL via streaming replication, it tries restore_command. When that fails, it tries streaming replication again, and so on in an endless loop.
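A restore_command receives the requested file name (%f) and the path to copy it to (%p). The following Python sketch shows the behaviour such a command needs: copy the file if the archive has it, and report failure if it does not, so that the standby can fall back to streaming. The function name restore_wal and the idea of calling it from a small wrapper script are assumptions for illustration only.

```python
import shutil
from pathlib import Path


def restore_wal(archive_dir, filename, target_path):
    """Copy filename from archive_dir to target_path.

    Returns 0 if the file was restored and 1 if the archive does not
    contain it, mirroring the exit status a restore_command should give.
    """
    source = Path(archive_dir) / filename
    if not source.is_file():
        # The server also asks for files that are not in the archive;
        # a nonzero status tells it to look elsewhere.
        return 1
    shutil.copyfile(source, target_path)
    return 0
```

The nonzero return for a missing file matters: the server relies on it to know when the archive has been exhausted.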

Which Postgresql WAL files can I safely remove from the WAL archive folder

Current situation
So I have WAL archiving set up to an independent internal hard drive on a data logging computer running Postgres. The hard drive containing the WAL archives is filling up, and I'd like to move all the WAL archive files, including the initial base backup, off it and onto external backup drives.
The directory structure is like:
D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)
D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup
The question I have then is:
Can I safely move literally every single WAL file except the current WAL archive file (000000000001.CA0000.. and so on), including the base backup, to another HDD? (Note that the database is live and receiving data.)
cheers!
WAL archives
You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.
pg_xlog
Never remove WAL from pg_xlog manually. If you have too much WAL then:
your wal_keep_segments setting is keeping WAL around;
you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.
You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting, run a manual CHECKPOINT command.
If you have an offline server and need to remove WAL to start it, you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.
WAL files are incremental, so the simple answer is: You cannot throw any files out. The solution is to make a new base backup and then all previous WALs can be deleted.
The WAL files contain a continuous sequence of low-level change records, so if you throw some older WAL files out, the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process, but then you'd have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past. If you are running out of disk space, that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is whether you can do that fast enough to restore a full database after an incident.
Another issue is that if you cannot identify where/when a problem occurred that needs to be corrected your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, this simply takes a lot of time.
The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.
The solution we went for is to run pg_basebackup every night. This creates a base backup, and afterwards we can use pg_archivecleanup to clean up all the "old" WAL files from before that base backup, using something like
"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%
Fortunately, we never had to recover yet, but it should work in theory.
In case someone found this by searching for how to safely clean up the WAL directory in a replication architecture, consider the scenario where there are leftovers from offline replicas: in this case, unused replication slots that wait for the replica to come back online and thus retain a lot of WAL files on the master DB.
In our case a replica went down due to hardware failure. We had to recreate it along with its replication slot on the master DB, but forgot to get rid of the previously used one. Once we cleared that out, PostgreSQL removed the unused WAL files and all was good.
You can use a script to clean up pg_wal files automatically. The following works with PostgreSQL 11. If you use another version, simply replace "/usr/pgsql-11/bin/pg_archivecleanup" with /usr/pgsql-12/bin/pg_archivecleanup, or 13, as needed.
#!/bin/bash
# Read the name of the WAL file that holds the latest checkpoint's REDO location.
redo_wal=$(/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ | awk '/REDO WAL file/ {print $6}')
# Remove all WAL segments older than that file (-d prints debug output).
/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal "$redo_wal"
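The script above depends on picking one field out of the pg_controldata output. A slightly more defensive way to do that extraction is sketched below in Python: it matches the full label instead of counting whitespace-separated fields. The function name redo_wal_file is hypothetical, and the sample text is a hand-written excerpt in the shape of pg_controldata output, not output captured from a real server.

```python
def redo_wal_file(controldata_output):
    """Return the value of the "Latest checkpoint's REDO WAL file" line, or None."""
    label = "Latest checkpoint's REDO WAL file:"
    for line in controldata_output.splitlines():
        if line.startswith(label):
            return line[len(label):].strip()
    return None


# Hand-written sample in the shape of pg_controldata output (illustrative only).
sample = (
    "Latest checkpoint location:           0/3000060\n"
    "Latest checkpoint's REDO location:    0/3000028\n"
    "Latest checkpoint's REDO WAL file:    000000010000000000000003\n"
)
print(redo_wal_file(sample))
```

Returning None when the label is missing lets the caller stop before passing an empty argument to a cleanup command.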

pg_archivecleanup and streaming replication

Using postgres 9.3.
I'm a bit confused on the proper usage of pg_archivecleanup.
I'm using both streaming replication and backup with continuous archiving for PITR recovery.
I don't think I can configure pg_archivecleanup in recovery.conf on the standby as it wouldn't achieve anything. The master is not archiving to a location accessible to the standby. The master is archiving to a location on its local disk, and then those archives and the associated backup are being rsync'd to a large backup disk.
So, it seems the solution would be to run pg_archivecleanup in "standalone" mode on the master, such as:
/usr/lib/postgresql/9.3/bin/pg_archivecleanup -d /archive 0000000100000010000000F0.00000028.backup
So, I'd do a cron job that would run the pg_archivecleanup command for any .backup files which are older than the latest one, and then delete those backup files, leaving only the latest one.
Is my understanding and plan correct?
If you want to retain only WAL segments after the latest base backup, you simply run pg_archivecleanup in standalone mode for the latest .backup file (not for those older than the latest).
But do you really want to have only one available backup? First of all, you won't be able to restore to the point before the last backup. Second, it makes sense to have some backups just in case (corruptions, etc).
It also seems strange to archive segments to local disk and then rsync them elsewhere. Why not put your rsync (followed by sync to flush OS buffers to disk) into archive_command? This ensures that a segment won't be removed from pg_xlog before it reaches the destination.
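For the cron job described in the question, the first step is to find the latest .backup file in the archive so that its name can be passed to pg_archivecleanup. A small Python sketch of that selection follows; the function name latest_backup_file is made up for illustration, and the sketch assumes all backups were taken on the same timeline, so that sorting by file name orders them by WAL position.

```python
import re

# Backup history file name: <24-hex segment>.<8-hex offset>.backup
BACKUP_HISTORY = re.compile(r"^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$")


def latest_backup_file(archive_files):
    """Return the newest .backup file name, or None if there is none.

    Assumption: all backups are on one timeline, so sorting by name
    orders them by WAL position.
    """
    candidates = [name for name in archive_files if BACKUP_HISTORY.match(name)]
    return max(candidates, default=None)


files = [
    "0000000100000010000000A2",
    "0000000100000010000000A2.00000028.backup",
    "0000000100000010000000F0",
    "0000000100000010000000F0.00000028.backup",
]
print(latest_backup_file(files))
```

To keep more than one backup restorable, as suggested above, the same list could be sorted and an older entry chosen as the cleanup cutoff instead of the newest one.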