I have 1 primary and 1 replica database (PostgreSQL 14.5). Most of the time, WAL archives are transferred from the primary to the replica with no problem, but occasionally the transfer is interrupted by network issues. When that happens, some WAL files are never sent, and by the time the replica looks for them they have already been deleted on the primary (probably by recycling), so the replica fails to recover.
How can I delay this deletion/recycling of WAL files in order to prevent the primary and the replica from getting out of sync? I've checked out parameters such as wal_writer_delay, but they don't seem to help.
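For example, would raising something like this on the primary be the right approach? (The value is just a guess on my part; wal_keep_size is the parameter I found in the v13+ docs for keeping extra WAL around.)

    -- primary: keep at least this much extra WAL in pg_wal, even after checkpoints
    -- would normally recycle it, so a briefly disconnected replica can still catch up
    ALTER SYSTEM SET wal_keep_size = '2GB';   -- example value, not tuned
    SELECT pg_reload_conf();

Or is a physical replication slot the more appropriate tool here?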
As far as I understand it:
WAL archiving is pushing the WAL files to a storage location as they are generated.
An incremental backup is pushing all the WAL files created since the last backup.
So, assuming my WAL archiving is set up correctly:
Why would I need incremental backups?
Shouldn't the cost of incremental backups be almost zero?
Most of the documentation I found focuses on the high-level setup (e.g. how to configure WAL archiving or incremental backups) rather than the internals (what happens when I trigger an incremental backup).
My question can probably be solved with a link to some documentation, but my google-fu has failed me so far
Backups are not copies of the WAL files, they're copies of the cluster's whole data directory. As it says in the docs, an incremental backup contains:
those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup)
WALs alone aren't enough to restore a database; they only record changes to the cluster files, so they require a backup as a starting point.
The need for periodic backups (incremental or otherwise) is primarily to do with recovery time. Technically, you could just hold on to your original full backup plus years' worth of WAL files, but replaying them all in the event of a failure could take hours or days, and you likely can't tolerate that kind of downtime.
A new backup also means that you can safely discard any older WALs (assuming you don't still need them for point-in-time recovery), meaning less data to store, and less data whose integrity you're relying on in order to recover.
If you want to know more about what pgBackRest is actually doing under the hood, it's all covered pretty thoroughly in the Postgres docs.
I'm running a master & replica on PG 13.3. I decided to use delayed replication (30 minutes, configured via the recovery_min_apply_delay parameter). On top of that, WAL archiving is configured and working well.
When the load on the master is very high for a long time, replication falls behind until max_slot_wal_keep_size is exceeded (see my other, related question: Replication lag - exceeding max_slot_wal_keep_size, WAL segments not removed). Once it falls too far behind, the slot is "lost" and the replica falls back to restoring WAL from the archive. So far so good. The problem is, it never tries replication again. Restarting the standby does not help.
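For completeness, the relevant replica settings look roughly like this (shown as ALTER SYSTEM statements, but the same lines sit in postgresql.conf; the archive path and slot name are just examples):

    -- replica (PG 13)
    ALTER SYSTEM SET recovery_min_apply_delay = '30min';
    ALTER SYSTEM SET primary_slot_name = 'replica_slot';
    ALTER SYSTEM SET restore_command = 'cp /wal_archive/%f %p';  -- changing this needs a restart on v13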
There are two ways I have managed to restore replication:
1. Restarts & config edits
Remove the delay config from the replica.
Restart Postgres. It then restores all the WAL from the archive and, once there's nothing left, starts replication again - but without any delay. Then I edit the config again to reintroduce the delay, and it sometimes works, sometimes doesn't. I think it depends on the load.
2. Removing a WAL segment from the archive
Look at the currently restored WAL segments in the PostgreSQL log and temporarily move the next one out of the WAL archive. When PG fails to restore it, it falls back to replication.
This doesn't seem like the right way to do it, does it?
Thanks,
-- Marcin
As far as I can see, this is a non-problem.
If you want replication delayed by 30 minutes, and you archive more than one 16MB WAL segment per half hour, there is no need to replicate. The information can just as well be read from the archive. If the latest entry in the latest archived WAL segment happens to be older than recovery_min_apply_delay, the standby will contact the primary and replicate.
If you insist on replication rather than archive recovery, remove restore_command and max_slot_wal_keep_size from the configuration. But I don't see the point.
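A minimal sketch of that second option, assuming the standby falls back to the archive via restore_command and the primary caps slot retention with max_slot_wal_keep_size:

    -- on the standby: stop falling back to archive recovery
    -- (on v13 this change requires a restart)
    ALTER SYSTEM RESET restore_command;

    -- on the primary: let the slot retain whatever WAL the standby still needs
    ALTER SYSTEM RESET max_slot_wal_keep_size;
    SELECT pg_reload_conf();

But again, archive recovery gets you the same data.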
If you are concerned about losing the active WAL segment in case of a catastrophe on the primary, use pg_receivewal rather than archive_command to populate the WAL archive.
Summary
We are using max_slot_wal_keep_size, introduced in PostgreSQL 13, to prevent the master from being killed by a lagging replication slot (i.e. pg_wal filling up). It seems that, in our case, WAL storage wasn't freed up after this parameter was exceeded, which resulted in a replication failure. The WAL which, as I believe, should have been freed up did not seem to be needed by any other transaction at the time. I wonder how this is supposed to work and why the WAL segments were not removed?
Please find the details below.
Configuration
master & one replica - streaming replication using a slot
~700GB available for pg_wal
max_slot_wal_keep_size = 600GB
min_wal_size = 20GB
max_wal_size = 40GB
default checkpoint_timeout = 5 minutes (no problem with checkpoints)
archiving is on and is catching up well
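For completeness, the non-default WAL settings expressed as ALTER SYSTEM statements (all three are reloadable without a restart):

    ALTER SYSTEM SET max_slot_wal_keep_size = '600GB';
    ALTER SYSTEM SET min_wal_size = '20GB';
    ALTER SYSTEM SET max_wal_size = '40GB';
    SELECT pg_reload_conf();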
What happened
Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of data), replication started falling behind. Available space in pg_wal was shrinking at the same rate as pg_replication_slots.safe_wal_size - as expected. At some point safe_wal_size went negative and streaming stopped working. That wasn't a problem in itself, because the replica started recovering from the WAL archive. I expected that once the slot was lost, WAL would be removed down to max_wal_size. This did not happen, though. It seems that Postgres tried to keep something close to max_slot_wal_keep_size (600GB) available, in case the replica started catching up again. Over that whole time there was no single transaction which would have required that much WAL to be kept, and archiving wasn't behind either.
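For reference, this is roughly how I was watching the slot (the wal_status and safe_wal_size columns exist from v13):

    SELECT slot_name, active, wal_status, safe_wal_size
    FROM pg_replication_slots;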
Q1: Is it the case that PG will try to keep max_slot_wal_keep_size of WAL available?
Q2: If not, why did PG not remove the excess WAL when it was needed neither by the archiver nor by any transaction running on the system?
The amount of free space in pg_wal was more or less 70GB for most of the time; however, at some point, during heavy autovacuuming, it dipped to 0 :( That is when PG crashed (and auto-recovered soon after). After getting back up there was 11GB left in pg_wal, with no transactions running and no data loading. This lasted for hours. During this time the replica finally caught up from the archive and resumed replication with no lag. None of the WAL was removed. I manually ran CHECKPOINT, but it did not clear any WAL. I finally restarted PostgreSQL, and during the restart pg_wal was finally cleared.
Q3: Again - why did PG not clear the WAL? This time it is even clearer that the WAL was not needed by any process.
Many thanks!
This was a PostgreSQL bug, and it's fixed. Thanks for reporting!
It should be available in 13.4 according to the release notes (look for "Advance oldest required WAL segment").
When using streaming replication, can someone please explain the purpose of archive_command and restore_command in PostgreSQL?
As I understand it, in streaming replication the secondary server reads and applies the partially filled WAL files. Suppose my WAL segments live in pg_xlog, and using archive_command I am copying them to a local archive directory, say /arclogs.
If the secondary server is going to read the partially filled WAL from pg_xlog over the network anyway, what is the use of the files kept in /arclogs?
Also, are files only sent to /arclogs once they reach 16 MB?
I'm new to PostgreSQL and your help will be appreciated.
The master will normally only retain a limited amount of WAL in pg_xlog, controlled by the master's wal_keep_segments setting. If the replica is too slow or disconnected for too long, the master will delete those transaction logs to ensure it can continue running without running out of disk space.
If that happens the replica has no way to catch up to the master, since it needs a continuous and gap-free stream of WAL.
So you can:
Enable WAL archiving (archive_command and archive_mode) as a fallback, so the replica can switch to replaying WAL from archives if the master deletes WAL it needs from its pg_xlog. The replica fetches the WAL with its restore_command. Importantly, the archived WAL does not need to be on the same machine as the master, and usually isn't. (A sketch of this setup follows the list below.)
or
Use a physical replication slot (primary_slot_name in recovery.conf) to connect the replica to the master. If a slot is used, the master knows what WAL the replica requires even when the replica is disconnected. So it won't remove WAL still needed by a replica from pg_xlog. But the downside is that pg_xlog can fill up if a replica is down for too long, causing the master to fail due to lack of disk space.
or
Do neither, and allow replicas to fail if they fall too far behind. Then re-create them from a new base backup if this happens.
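To make the first option concrete, a minimal sketch of the archiving side (the /arclogs path is just the example from the question; in practice the archive usually lives on another machine or shared storage):

    -- on the master: copy each completed 16MB segment out of pg_xlog
    -- as soon as it is finished (changing archive_mode requires a restart)
    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command = 'cp %p /arclogs/%f';

The replica's side is the mirror image, restore_command = 'cp /arclogs/%f %p', which in the pg_xlog era goes into recovery.conf.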
The documentation really needs an overview piece to put all this together.
WAL archiving has an additional benefit: If you make a base backup of the server you can use it, plus WAL archives, to do a point-in-time restore of the master. This lets you recover data from things like accidental table drops. PgBarman is one of the tools that can help you with this.
Postgres follows MVCC rules, so a query running on a table doesn't conflict with the writes happening on that table. The query returns its result based on a snapshot taken at the point the query was run.
Now I have a master and a slave. The slave is used by analysts to run queries and perform analysis. When the slave is replicating and analysts are running their queries at the same time, I can see the replication lagging for a long time. If the queries are long-running, replication lags for a long duration, and if the number of writes on the master happens to be pretty high, I end up losing the WAL files and replication can no longer proceed. I just have to spin up another slave. Why does this happen? How do I allow queries and replication to happen simultaneously on Postgres? Is there any parameter setting I can apply to make this happen?
The replica can't apply more WAL from the master because replaying it would overwrite row versions that are still needed by queries running on the replica - queries that are older than any still running on the master. The replica needs older row versions than the master does. It's exactly because of MVCC that this pause is necessary.
You probably set a high max_standby_streaming_delay to avoid "canceling statement due to conflict with recovery" errors.
If you turn hot_standby_feedback on, the replica can instead tell the master to keep those row versions. The downside is that the master can't clean up dead rows as promptly, so its tables can bloat if the standby holds cleanup back for a long time.
See PostgreSQL manual: Handling Query Conflicts.
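A minimal sketch of those two knobs on the standby (the delay value is just an example):

    -- give analytics queries more room before they are cancelled by recovery
    ALTER SYSTEM SET max_standby_streaming_delay = '15min';
    -- and/or ask the master to keep the row versions those queries still need
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();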
As for the WAL retention part: enable WAL archiving and a restore_command for your standbys. You should really be using it anyway, for point-in-time recovery. PgBarman now makes this easy with the barman get-wal command. If you don't want WAL archiving, you can instead set your replica servers up to use a replication slot to connect to the master, so the master knows to retain the WAL they need indefinitely. Of course, that can cause the master to run out of space in pg_xlog and stop running, so you need to monitor more closely if you do that.
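If you go the slot route instead, the setup is roughly this (the slot name is just an example):

    -- on the master: create a physical replication slot for this replica
    SELECT pg_create_physical_replication_slot('analytics_replica');
    -- on the replica, point primary_slot_name at it
    -- (in recovery.conf on older versions, postgresql.conf from v12 on):
    --   primary_slot_name = 'analytics_replica'

Remember to drop the slot with pg_drop_replication_slot() if you retire the replica, or the master will keep WAL for it forever.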