Can't find wal backup files - postgresql

I have two servers, a master and a replica that work together in asynchronous replication mode; slave server seems to be working fine, since any change in the master is mirrored in the slave right away. Moreover, there is also an archive process that copy the wal files from the master to another filesystem to keep them safety. My doubt is, what wal files can I delete by means of pg_archivecleanup? I guess that I need look for wal files with .backup extension in the master and then I could delete wal files older than last backup.
For instance, if I have these files in the master server
000000010000000000000089
000000010000000000000088.00000028.backup
000000010000000000000088
000000010000000000000087
000000010000000000000086
...
I come to the conclusion that is safe deleting 000000010000000000000088 and older files, and keep the newest ones.
The problem is that I don't find .backup files anywhere: neither in the master, nor in the replica, nor in the archive location.

The *.backup files are created if you perform an online backup with pg_basebackup or similar and you are archiving WAL files using the archive_command.
In that case you can use pg_archivecleanup with such a file as argument to automatically remove all WAL archives that are older than that backup.
Your either use a different backup method (pg_dump?), or you are not using archive_command.
With pg_dump you cannot do archive recovery, so you wouldn't need to archive WALs at all. If you are using a different archive method like pg_receivewal, you won't get the *.backup files and you have to think of a different method to remove your old WAL archives.
One simple method to purge your old WAL archives is to simply remove all those that are older than your retention time.

The files are still being generated and archived (unless you turned that off) an the master. They are also passed by streaming to the replica where they are kept in pg_wal, but the replica automatically cleans them up every restartpoint. You can get the replica to keep them permanently by setting archive_mode=always on the replica, but it sounds like you don't want that.
If the only purpose of the archive (by the master) is to save files for use by the replica in case it falls to far behind for streaming (not for disaster recovery or PITR) than you can use "pg_archivecleanup" to automatically clean them up. This is invoked on the replica (not the master) but it must have write access to the archive directory. So you can mount it as a network file share, you can wrap pg_archivecleanup in ssh so it gets run on the master rather than the replica.

Related

How to recover PostgreSQL 13.5 database without backup file in Ubuntu 20.04 server?

I had 10-15 microservices databases running in production on Ubuntu server. Accidentally deleted everything in the /var/lib/postgresql/** folder with command sudo rm -r *. I think PGDATA is inside the /var/lib/postgresql/13/ folder.
I tried TestDisk to restore this folder but it showed everything deleted except the 13/ folder.
I only have backup files from a long time ago.
Is there any way to restore the last data?
If you don't have a backup of the deleted files and testdisk was not able to recover them, you may want to try using another data recovery tool such as extundelete or photorec. These tools work by scanning the partition and looking for data that is no longer referenced by the file system, which can include deleted files.
It's important to note that the chances of successfully recovering deleted files decrease as more time passes and more activity occurs on the partition, so it's best to try to recover the files as soon as possible after the deletion. In addition, the more you use the partition after the deletion, the more likely it is that the deleted data will be overwritten, making it impossible to recover.
If you are unable to recover the deleted files using these tools, you may want to consider seeking the assistance of a professional data recovery service. These services typically have specialized equipment and expertise that can be used to recover data from damaged or formatted disks. However, these services can be expensive, so it's important to weigh the value of the data against the cost of recovery.

Difference between incremental backup and WAL archiving with PgBackRest

As far as I understood
WAL archiving is pushing the WAL logs to a storage place as the WAL files are generated
Incremental backup is pushing all the WAL files created since the last backup
So, assuming my WAL archiving is setup correctly
Why would I need incremental backups?
Shouldn't the cost of incremental backups be almost zero?
Most of the documentation I found is focusing on a high level implementation (e.g. how to setup WAL archiving or incremental backups) vs the internal ( what happens when I trigger an incremental backup)
My question can probably be solved with a link to some documentation, but my google-fu has failed me so far
Backups are not copies of the WAL files, they're copies of the cluster's whole data directory. As it says in the docs, an incremental backup contains:
those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup)
WALs alone aren't enough to restore a database; they only record changes to the cluster files, so they require a backup as a starting point.
The need for periodic backups (incremental or otherwise) is primarily to do with recovery time. Technically, you could just hold on to your original full backup plus years worth of WAL files, but replaying them all in the event of a failure could take hours or days, and you likely can't tolerate that kind of downtime.
A new backup also means that you can safely discard any older WALs (assuming you don't still need them for point-in-time recovery), meaning less data to store, and less data whose integrity you're relying on in order to recover.
If you want to know more about what pgBackRest is actually doing under the hood, it's all covered pretty thoroughly in the Postgres docs.

How to fix this ERROR: could not open file "pg_wal/00000003.history", or what *not* to do when PostgreSQL runs out of disk space

If you ever run out of space in your pg_wal directory, and you can't grow the size of that directory or switch to standby in a case where your primary goes down as a result of the full pg_wal directory, when moving the WAL files to a different location, ensure not to move pg_wal/00000003.historyas it is a core file when starting pg_basebackup for streaming replication. If in case you had moved the file already to a different location, consider bringing this particular one pg_wal/00000003.history back to the pg_wal directory.
In a case where you had actually deleted the file then you may consider just creating another to see if will work...i have not tried it myself so you maybe the first or second to try :).
I faced this issue in my production environment and resolved it by copying back pg_wal/00000003.history from where i had copied it to and my streaming replication could run without errors.
Never, ever, manually mess with the contents of pg_wal. In v10, we went to some effort to rename what was formerly known as pg_xlog to pg_wal, precisely to discourage misguided people who thought that were "just log files" from deleting files there.
If you run out of space, there is one thing you can do: move the entire pg_wal to a different file system where you have space and put a symbolic link to the new location into the PostgreSQL data directory.

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (as I didn't even know about 'WAL' or their files, or that, presumably they're shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server.)
In the event that I rebuild one of the databases, I should delete the WAL files since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party app take care of that for you. There are several options, but pg_backrest is the most popular of the open-source offerings out there.
Each database instance writes its WAL stream, chopped in segments of 16MB.
Every other relational database does the same thing, even Microsoft SQL Server (the differences are in the name and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
With a base backup and this information you can restore any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: You want to keep around your backups for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
Where is the basebackup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have different databases for test and for dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable a wal archive. There will still be "live" WAL files in the pg_wal or pg_xlog directory, but these get removed/recycled automatically after each checkpoint--you should not need to manage these, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.

use of archive_command in PostgreSQL streaming replication

When using streaming replication can someone please explain the purpose of archive_command and restore_command in PostgreSQL?
As i studied in streaming replication secondary server read and apply the partially filled WAL files.suppose i have my wal segment location in pg_xlog and using archive_command i am copying this to my local archive directory say /arclogs.
So if secondary server is going to read the partially filled archive logs from pg_xlog over the network then what's the use of files kept in /arclogs.
and also the files will be sent to /arclogs only when they will be 16 mb?
I'm new to PostgreSQL & your help will be appericated.
The master will normally only retain a limited amount of WAL in pg_xlog, controlled by the master's wal_keep_segments setting. If the replica is too slow or disconnected for too long, the master will delete those transaction logs to ensure it can continue running without running out of disk space.
If that happens the replica has no way to catch up to the master, since it needs a continuous and gap-free stream of WAL.
So you can:
Enable WAL archiving (archive_command and archive_mode) as a fallback, so the replica can switch to replaying WAL from archives if the master deletes WAL it needs from its pg_xlog. The replica fetches the WAL with its restore_command. Importantly, the archived WAL does not need to be on the same machine as the master, and usually isn't.
or
Use a physical replication slot (primary_slot_name in recovery.conf) to connect the replica to the master. If a slot is used, the master knows what WAL the replica requires even when the replica is disconnected. So it won't remove WAL still needed by a replica from pg_xlog. But the downside is that pg_xlog can fill up if a replica is down for too long, causing the master to fail due to lack of disk space.
or
Do neither, and allow replicas to fail if they fall too far behind. Then re-create them from a new base backup if this happens.
The documentation really needs an overview piece to put all this together.
WAL archiving has an additional benefit: If you make a base backup of the server you can use it, plus WAL archives, to do a point-in-time restore of the master. This lets you recover data from things like accidental table drops. PgBarman is one of the tools that can help you with this.