Recover Postgres Streaming Replication Slave from Archived Wal Logs

I have set up a Postgres hot standby server using streaming replication, but the standby is asking for an old WAL segment that is no longer in the master's pg_xlog directory. The file does exist in the WAL archive backup directory.
How can I configure the standby to read this file from the backup directory? Or is there a way to manually copy the file to the standby server?
Any help will be appreciated.

You would have to add a restore_command to recovery.conf that can restore files from the WAL archive.
Then restart the standby, and it should be able to recover.
When the standby cannot get the required WAL via streaming replication, it tries restore_command. When that fails, it tries streaming replication again, and so on in an endless loop.
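As a minimal sketch of such a configuration, the snippet below writes a recovery.conf into a scratch directory so it can be reviewed before being copied into the standby's data directory. The archive path, the host name and the replication user are assumptions for illustration, not values from the question; the `cp` form of restore_command only works if the archive directory is readable from the standby.

```shell
# Assumed values: adjust ARCHIVE_DIR and the connection string to your setup.
WORKDIR="${WORKDIR:-$(mktemp -d)}"                     # scratch dir, not the real data directory
ARCHIVE_DIR="${ARCHIVE_DIR:-/mnt/server/archivedir}"   # WAL archive as seen from the standby

# %f is replaced by the WAL file name, %p by the path to copy it to.
cat > "$WORKDIR/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
restore_command = 'cp $ARCHIVE_DIR/%f "%p"'
EOF

cat "$WORKDIR/recovery.conf"
```

After copying the file into the standby's data directory, restart the standby so the setting takes effect.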

Related

PostgreSQL Point-In-Time Recovery Getting Error with No valid checkpoint record

I am trying to perform a point-in-time recovery using WAL archiving. The archive_command is set in postgresql.conf, and I can see WAL being archived in the backup-archive directory. When I try to start the service I get: PANIC: could not locate a valid checkpoint record
I am using the step-by-step process below.
Starting the base backup with the low-level API
SELECT pg_start_backup('label', true, false);
Copying the data directory of my cluster
tar -zcvpf basebkPostgres20230110New.tgz /PostgreSQL/13/data
Closing the base backup
SELECT * FROM pg_stop_backup(false, true);
Stopping the postgres service
Removing the current cluster's data directory
Restoring the backed up data directory
Removing the contents of the pg_wal directory
Setting the restore_command in the postgresql.conf file
Starting the postgres service

You forgot the backup_label file and recovery.signal. You have to capture the result of pg_stop_backup (or pg_backup_stop from v15 on) and create backup_label from the contents. That file has to be in the restored data directory. Also, you have to create recovery.signal in the data directory, so that PostgreSQL starts in archive recovery mode and reads your restore_command.
Without restore_command, PostgreSQL uses the WAL in pg_wal, which is empty. Without backup_label, PostgreSQL thinks that it can recover from the checkpoint indicated by the control file pg_control. Even if that worked, the result would be a corrupted database, since you have to recover from the start of the backup.
recovery.signal is documented here (step 7), and backup_label is documented here (step 4).
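The missing pieces could be added along these lines. This is only a sketch under assumptions: `PGDATA`, `ARCHIVE_DIR` and `SAVED_LABEL` are hypothetical names, and `SAVED_LABEL` is assumed to be a file in which you stored the labelfile column returned by pg_stop_backup. By default it works on a scratch directory, so it is safe to try out.

```shell
# Assumed names, not from the question: PGDATA, ARCHIVE_DIR, SAVED_LABEL.
PGDATA="${PGDATA:-$(mktemp -d)}"                        # restored data directory (scratch dir by default)
ARCHIVE_DIR="${ARCHIVE_DIR:-/path/to/backup-archive}"   # where archive_command stored the WAL
SAVED_LABEL="${SAVED_LABEL:-}"                          # file with the labelfile text from pg_stop_backup

# 1. backup_label must be present in the restored data directory.
if [ -n "$SAVED_LABEL" ] && [ -f "$SAVED_LABEL" ]; then
    cp "$SAVED_LABEL" "$PGDATA/backup_label"
else
    echo "warning: no saved labelfile given, backup_label not created" >&2
fi

# 2. An empty recovery.signal makes the server start in archive recovery mode.
touch "$PGDATA/recovery.signal"

# 3. restore_command tells the server how to fetch archived WAL segments.
echo "restore_command = 'cp $ARCHIVE_DIR/%f \"%p\"'" >> "$PGDATA/postgresql.conf"
```

Only start the server once backup_label is in place; the warning branch exists because recovery without it would not start from the beginning of the backup.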

postgresql streaming replication - Master Server keeps all the archives and this is filling up my HD

Is there a way to run cleanups on Master Server for these archive files that are older and are not needed for the slave server for streaming replication?

You can use the recovery parameter archive_cleanup_command together with the pg_archivecleanup command:
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/main %r'
That command assumes that the WAL archives are accessible in /var/lib/postgresql/pg_log_archive/main on the standby server.
Note that for PostgreSQL versions older than v12, this has to be specified in recovery.conf.
If you don't have an easy way to access the WAL archives from the standby, you could use an NFS mount.
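Before wiring pg_archivecleanup into archive_cleanup_command, it may be worth checking what it would delete. The sketch below uses the tool's `-n` (dry run) option, which only prints the names of the files that would be removed. The WAL file name in `OLDEST_KEPT` is a made-up placeholder; in archive_cleanup_command the server substitutes the real value for %r.

```shell
# Assumed values: the archive path is taken from the command above,
# OLDEST_KEPT is a hypothetical WAL file name used only for illustration.
ARCHIVE_DIR="${ARCHIVE_DIR:-/var/lib/postgresql/pg_log_archive/main}"
OLDEST_KEPT="${OLDEST_KEPT:-000000010000000000000010}"

# -n: dry run, list the files older than OLDEST_KEPT without deleting them.
CLEANUP_CMD="pg_archivecleanup -n $ARCHIVE_DIR $OLDEST_KEPT"

if command -v pg_archivecleanup >/dev/null 2>&1 && [ -d "$ARCHIVE_DIR" ]; then
    $CLEANUP_CMD
else
    echo "would run: $CLEANUP_CMD"
fi
```

Dropping `-n` performs the actual removal, which is what the archive_cleanup_command setting does at each restartpoint.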

Multiple times Point in time recovery in PostgreSQL

Can we perform point-in-time recovery more than once using the same recovery.conf file? The recovery.conf file is renamed to recovery.done after one WAL restoration.
What if I want to do another WAL restoration to a different point in time using the same recovery.conf file? Is that not possible? Or do I have to run pg_basebackup again and create a new recovery file in my Postgres data directory each time to restore the next WAL file?

Once recovery is done, you cannot go back.
You have to restore the backup again and start from scratch.
The only alternative is using pg_rewind, but that can only reset a cluster to the state of another cluster (and you probably don't have that other cluster).
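In practice this means keeping the base backup around and repeating the restore for every attempt, each time with a fresh recovery.conf. A rough sketch of that loop follows; the archive path and the target timestamp are placeholders, and a scratch directory stands in for the freshly restored data directory.

```shell
# Assumed values: ARCHIVE_DIR and the timestamp below are placeholders.
ARCHIVE_DIR="${ARCHIVE_DIR:-/mnt/server/archivedir}"

# Write a recovery.conf for one recovery attempt into the given data directory.
write_recovery_conf() {
    datadir="$1"
    target_time="$2"
    cat > "$datadir/recovery.conf" <<EOF
restore_command = 'cp $ARCHIVE_DIR/%f "%p"'
recovery_target_time = '$target_time'
EOF
}

# For each attempt: stop the server, empty the data directory, restore the
# same base backup again, then write a recovery.conf with the new target.
ATTEMPT_DIR="$(mktemp -d)"    # stands in for the freshly restored data directory
write_recovery_conf "$ATTEMPT_DIR" '2023-01-10 12:00:00'
cat "$ATTEMPT_DIR/recovery.conf"
```

The base backup itself does not need to be taken again, as long as the archive still holds all WAL from that backup onwards.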

WAL file is from different database system

I am using WAL-E and, when trying to restore a PostgreSQL database, I get the error: WAL file is from different database system: WAL file database system identifier is 6422218584094261886, pg_control database system identifier is 6338745400937582833
How can I force the PostgreSQL database to use the database system identifier of the WAL archive, or force the WAL archive to use the database system identifier of the database?

To restore the WAL files you will need your original base-backup and all the WAL files from that point forwards. The WAL files themselves list the changes to a base backup.

Did you move the Postgres data directory while the database was still running?
I had the same problem when migrating from one server node to another and moving the Postgres data directory to the other node with scp. When starting the Docker container on the new node, I got this error in docker logs -f. It turned out that I had copied the directories while the Postgres Docker container on the source node was still running.
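To see which cluster a restored data directory actually belongs to, the identifier stored in pg_control can be read with pg_controldata and compared with the one in the error message. The helper below is a small sketch; `system_identifier` is a made-up function name and the default data directory path is an assumption.

```shell
# Print the database system identifier recorded in pg_control of a data directory.
# system_identifier is a hypothetical helper name, not a PostgreSQL tool.
system_identifier() {
    pg_controldata "$1" | awk -F': *' '/^Database system identifier/ {print $2}'
}

PGDATA="${PGDATA:-/PostgreSQL/13/data}"   # assumed data directory path

if command -v pg_controldata >/dev/null 2>&1 && [ -d "$PGDATA" ]; then
    echo "pg_control identifier: $(system_identifier "$PGDATA")"
else
    echo "pg_controldata or $PGDATA not available, skipping check"
fi
```

If the printed value differs from the identifier reported for the WAL file, the base backup and the WAL archive come from different clusters and cannot be combined.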

How wal logs work in postgres?

I have to do a point-in-time recovery in Postgres. Do I need all the WAL files that were generated, or is it enough to have a few recent WAL files? I tried a recovery, but I'm not sure whether all the files were recovered.