Should WAL files be deleted after some period of time? - postgresql

I'm wondering if I understand WAL internals correctly.
I have a Postgres database and I'm writing new data to it. New files are created in the WAL directory and its size is constantly increasing.
When I stop my writing process, I expect no new files to be created in the WAL directory, and that after some time the WAL records will be checkpointed to storage and the old files deleted.
However, this is not what happens. Old files are deleted, but new files are constantly created, so no space is ever reclaimed and the directory keeps growing.
I can see no files older than 2h, but at the same time new files appear every minute.
Is this correct? How can I free some space in the WAL directory?
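For reference, the size of pg_wal is bounded by checkpoint and retention settings rather than shrinking on its own. A sketch of the relevant postgresql.conf parameters (the parameter names are real PostgreSQL 13+ settings; the values shown are the defaults, not taken from this post):

```
max_wal_size = 1GB          # a checkpoint is forced once roughly this much WAL accumulates
min_wal_size = 80MB         # old segments are recycled (renamed for reuse), not removed, down to this size
wal_keep_size = 0           # extra WAL retained for standbys
checkpoint_timeout = 5min   # upper bound on the time between automatic checkpoints
```

Because segments below min_wal_size are recycled rather than deleted, the directory typically settles at a steady size even after writes stop, which matches the behavior described above.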

Related

How to recover PostgreSQL 13.5 database without backup file in Ubuntu 20.04 server?

I had 10-15 microservice databases running in production on an Ubuntu server. I accidentally deleted everything in the /var/lib/postgresql/** folder with the command sudo rm -r *. I think PGDATA is inside the /var/lib/postgresql/13/ folder.
I tried TestDisk to restore this folder, but it showed everything as deleted except the 13/ folder.
I only have backup files from a long time ago.
Is there any way to restore the last data?
If you don't have a backup of the deleted files and testdisk was not able to recover them, you may want to try using another data recovery tool such as extundelete or photorec. These tools work by scanning the partition and looking for data that is no longer referenced by the file system, which can include deleted files.
It's important to note that the chances of successfully recovering deleted files decrease as more time passes and more activity occurs on the partition, so it's best to try to recover the files as soon as possible after the deletion. In addition, the more you use the partition after the deletion, the more likely it is that the deleted data will be overwritten, making it impossible to recover.
If you are unable to recover the deleted files using these tools, you may want to consider seeking the assistance of a professional data recovery service. These services typically have specialized equipment and expertise that can be used to recover data from damaged or formatted disks. However, these services can be expensive, so it's important to weigh the value of the data against the cost of recovery.

How to Delay Postgresql WAL Recycling?

I have 1 primary and 1 replica database (PostgreSQL 14.5). Most of the time, archives are transferred from the primary to the replica with no problem. But occasionally the transfer is interrupted by network issues. When that happens, some WAL files can't be sent, and since those files are then deleted on the primary (probably because of recycling), the replica can't find them and fails to recover.
How can I delay this deletion or recycling of WAL files in order to prevent desync between the primary database and the replica? I've checked out some parameters such as "wal_writer_delay", but they don't seem to be helpful.
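For reference, WAL retention on the primary is usually controlled with settings like the following (a hedged sketch: the parameter names are real PostgreSQL 13+ settings, but the values are purely illustrative). A physical replication slot is the other common fix, since the primary then keeps WAL until the standby has actually consumed it:

```
# postgresql.conf on the primary (illustrative values, not from the original post)
wal_keep_size = 1GB            # always keep at least this much WAL in pg_wal for standbys
max_slot_wal_keep_size = 10GB  # cap on WAL retained by replication slots (-1 = unlimited)
```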

Difference between incremental backup and WAL archiving with PgBackRest

As far as I understood
WAL archiving is pushing the WAL logs to a storage place as the WAL files are generated
Incremental backup is pushing all the WAL files created since the last backup
So, assuming my WAL archiving is set up correctly
Why would I need incremental backups?
Shouldn't the cost of incremental backups be almost zero?
Most of the documentation I found focuses on the high-level implementation (e.g. how to set up WAL archiving or incremental backups) rather than the internals (what happens when I trigger an incremental backup).
My question can probably be solved with a link to some documentation, but my google-fu has failed me so far
Backups are not copies of the WAL files, they're copies of the cluster's whole data directory. As it says in the docs, an incremental backup contains:
those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup)
WALs alone aren't enough to restore a database; they only record changes to the cluster files, so they require a backup as a starting point.
The need for periodic backups (incremental or otherwise) is primarily to do with recovery time. Technically, you could just hold on to your original full backup plus years worth of WAL files, but replaying them all in the event of a failure could take hours or days, and you likely can't tolerate that kind of downtime.
A new backup also means that you can safely discard any older WALs (assuming you don't still need them for point-in-time recovery), meaning less data to store, and less data whose integrity you're relying on in order to recover.
If you want to know more about what pgBackRest is actually doing under the hood, it's all covered pretty thoroughly in the Postgres docs.

How to fix this ERROR: could not open file "pg_wal/00000003.history", or what *not* to do when PostgreSQL runs out of disk space

If you ever run out of space in your pg_wal directory, can't grow that directory, and need to switch to a standby because the primary went down as a result of the full pg_wal directory, then when moving the WAL files to a different location, make sure not to move pg_wal/00000003.history, as it is a core file needed when starting pg_basebackup for streaming replication. If you have already moved that file elsewhere, consider bringing this particular one, pg_wal/00000003.history, back to the pg_wal directory.
If you have actually deleted the file, you might consider just creating another one to see if it works... I have not tried it myself, so you may be the first or second to try :).
I faced this issue in my production environment and resolved it by copying pg_wal/00000003.history back from where I had copied it to, and my streaming replication ran without errors.
Never, ever manually mess with the contents of pg_wal. In v10, we went to some effort to rename what was formerly known as pg_xlog to pg_wal, precisely to discourage misguided people who thought these were "just log files" from deleting files there.
If you run out of space, there is one thing you can do: move the entire pg_wal to a different file system where you have space and put a symbolic link to the new location into the PostgreSQL data directory.
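That move-and-symlink step can be sketched as follows, using throwaway temp directories in place of a real data directory and target filesystem (all paths here are made up for illustration; on a real server you must stop PostgreSQL before doing this):

```shell
# Simulate a data directory and a roomier filesystem with temp dirs.
datadir=$(mktemp -d)           # stands in for the PostgreSQL data directory
bigdisk=$(mktemp -d)           # stands in for the filesystem with free space
mkdir "$datadir/pg_wal"
touch "$datadir/pg_wal/000000010000000000000001"

# Move pg_wal wholesale, then leave a symlink behind in the data directory.
mv "$datadir/pg_wal" "$bigdisk/pg_wal"
ln -s "$bigdisk/pg_wal" "$datadir/pg_wal"

ls -l "$datadir/pg_wal/"       # the segment is still reachable via the symlink
```

The key point is that PostgreSQL keeps using the same pg_wal path inside the data directory; only the underlying storage changes.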

Can't find wal backup files

I have two servers, a master and a replica, that work together in asynchronous replication mode; the slave server seems to be working fine, since any change on the master is mirrored on the slave right away. Moreover, there is also an archive process that copies the WAL files from the master to another filesystem to keep them safe. My doubt is: which WAL files can I delete by means of pg_archivecleanup? I guess I need to look for WAL files with the .backup extension on the master, and then I could delete the WAL files older than the last backup.
For instance, if I have these files in the master server
000000010000000000000089
000000010000000000000088.00000028.backup
000000010000000000000088
000000010000000000000087
000000010000000000000086
...
I come to the conclusion that it is safe to delete 000000010000000000000088 and older files, and keep the newer ones.
The problem is that I don't find .backup files anywhere: neither in the master, nor in the replica, nor in the archive location.
The *.backup files are created if you perform an online backup with pg_basebackup or similar and you are archiving WAL files using the archive_command.
In that case you can use pg_archivecleanup with such a file as argument to automatically remove all WAL archives that are older than that backup.
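What pg_archivecleanup does with such a .backup argument can be mimicked in plain shell for illustration (a sketch on a throwaway directory, not a replacement for the real tool): everything that sorts lexically before the segment named in the .backup file is removable.

```shell
# Build a throwaway "archive" with the file names from the question.
archive=$(mktemp -d)
touch "$archive"/000000010000000000000086 \
      "$archive"/000000010000000000000087 \
      "$archive"/000000010000000000000088 \
      "$archive"/000000010000000000000088.00000028.backup \
      "$archive"/000000010000000000000089

# The cutoff is the segment name embedded in the .backup file name;
# pg_archivecleanup removes everything strictly older than it.
cutoff=000000010000000000000088
for f in "$archive"/*; do
  name=${f##*/}
  case "$name" in *.backup) continue ;; esac
  if [[ "$name" < "$cutoff" ]]; then rm "$f"; fi
done

ls "$archive"   # 86 and 87 are gone; 88, the .backup file, and 89 remain
```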
You either use a different backup method (pg_dump?), or you are not using archive_command.
With pg_dump you cannot do archive recovery, so you wouldn't need to archive WALs at all. If you are using a different archive method like pg_receivewal, you won't get the *.backup files and you have to think of a different method to remove your old WAL archives.
One simple method to purge your old WAL archives is to simply remove all those that are older than your retention time.
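That retention-based purge can be sketched with find on a throwaway directory (the 7-day retention window here is made up for illustration):

```shell
# Fake an archive directory with one old and one fresh segment.
archive=$(mktemp -d)
touch -d "10 days ago" "$archive/000000010000000000000086"
touch                  "$archive/000000010000000000000089"

# Remove archived segments whose modification time exceeds the retention window.
find "$archive" -type f -mtime +7 -delete

ls "$archive"   # only the fresh segment remains
```

Note this trusts file modification times, so it only works if nothing rewrites the archived files after they land.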
The files are still being generated and archived (unless you turned that off) on the master. They are also passed by streaming to the replica, where they are kept in pg_wal, but the replica automatically cleans them up at every restartpoint. You can get the replica to keep them permanently by setting archive_mode=always on the replica, but it sounds like you don't want that.
If the only purpose of the archive (on the master) is to save files for use by the replica in case it falls too far behind for streaming (not for disaster recovery or PITR), then you can use "pg_archivecleanup" to automatically clean them up. This is invoked on the replica (not the master), but it must have write access to the archive directory. So you can either mount it as a network file share, or wrap pg_archivecleanup in ssh so it gets run on the master rather than the replica.