I am having trouble setting up a PostgreSQL hot standby. When attempting to start the database after running pg_basebackup, I see FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000006440000008D has already been removed in postgresql.log. After a brief discussion on IRC, I came to understand that the error likely originates from a wal_keep_segments setting that is too low for my write-intensive database.
How might I calculate, if possible, the proper setting for wal_keep_segments? What is an acceptable value for this setting?
What I am working with:
PostgreSQL 9.3
Debian 7.6
wal_keep_segments can be estimated as the average number of new WAL segments per minute appearing in the pg_xlog directory, multiplied by the number of minutes you want to be covered for. Bear in mind that the rate is expected to increase after wal_level is changed from its default of minimal to archive or hot_standby. The only cost is disk space, which by default is 16 MB per segment.
I typically use powers of 2 as values. At a rate of about 1 segment per minute, a value of 256 gives me about 4 hours in which to set up the standby.
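If you want to base the number on a measurement rather than eyeballing pg_xlog, something along these lines works as a rough sketch with the 9.3 function names; the example LSNs, the 10-minute sample interval and the 240-minute window are placeholders to replace with your own values:

SELECT pg_current_xlog_location();   -- note the value, e.g. 644/8D000000
-- wait a representative stretch of write activity, e.g. 10 minutes, then:
SELECT pg_current_xlog_location();   -- note the value again, e.g. 644/97000000

SELECT ceil(
         pg_xlog_location_diff('644/97000000', '644/8D000000')  -- second sample minus first
         / (16 * 1024 * 1024.0)                                 -- bytes per 16 MB segment
         / 10                                                    -- minutes between samples
         * 240                                                   -- minutes you want covered
       ) AS suggested_wal_keep_segments;

With these example values the result is 240, which rounds up to the 256 mentioned above.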
You could alternatively consider using WAL streaming with pg_basebackup, via its --xlog-method=stream option. Unfortunately, at least as of 2013, per a discussion on a PostgreSQL mailing list, setting wal_keep_segments to a nonzero value may still be recommended even then, to avoid the risk of the stream being unable to keep up. If you do use pg_basebackup, also don't forget --checkpoint=fast.
Requirement:
Avoid "terminating connection due to conflict with recovery" errors while also keeping replication lag acceptable.
Google Cloud PostgreSQL 9.6, replication turned on (uses streaming replication), PGPool-II set to only do load balancing, and the following properties on the slave:
work_mem 3276800
commit_delay 100
max_wal_size 940
max_standby_archive_delay -1
max_standby_streaming_delay -1
hot_standby_feedback on
Machine config:
vCPUs:8, Memory: 30 GB, SSD storage: 76 GB
Workload:
Master fully loaded with writes and reads, and slave also fully loaded with lots of reads.
The longest queries run for around 8-10 seconds.
What we tried before:
Set max_standby_archive_delay and max_standby_streaming_delay to 900000 (900 secs); however, we were seeing a lot of conflict errors.
Set max_standby_archive_delay and max_standby_streaming_delay to -1; this made the conflict errors go away, but the lag increased a lot (to somewhere around 23 minutes).
Set max_standby_archive_delay and max_standby_streaming_delay to -1 and hot_standby_feedback to on. This also made the conflict errors go away, but we are still seeing replication lag (around 500 secs).
Query used for lag:
SELECT
pg_last_xlog_receive_location() receive,
pg_last_xlog_replay_location() replay,
(
extract(epoch FROM now()) -
extract(epoch FROM pg_last_xact_replay_timestamp())
)::int lag;
Graph of lag measured every second over a period of 9 hours.
Questions:
Given our use case (the slave being actively used for read queries), how do we make sure we have no conflict errors and a reasonable lag (around a few seconds)?
What does the lag mean? Does it mean only one of the tables is behind the master, or that all the other WAL is also still waiting to be applied on the slave?
If 1. is not achievable using config properties, how do we solve it in code? (This is the least desirable option, since the code base is vast and would require lots of changes.)
Thanks!
You cannot totally avoid conflicts — every statement like TRUNCATE or ALTER TABLE that requires an ACCESS EXCLUSIVE lock will lead to a replication conflict.
But you can avoid replication conflicts caused by VACUUM:
Set hot_standby_feedback = on to keep PostgreSQL from removing tuples still needed on the standby.
Set old_snapshot_threshold to a (possibly high) value other than the default to avoid vacuum truncation.
This truncation requires an ACCESS EXCLUSIVE lock that can also lead to conflicts.
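A minimal sketch of those two settings, assuming you can run ALTER SYSTEM yourself (on a managed service such as Cloud SQL you would set the equivalent flags instead); the 12-hour threshold is just an example value:

ALTER SYSTEM SET hot_standby_feedback = on;        -- on the standby; a reload is enough
SELECT pg_reload_conf();

ALTER SYSTEM SET old_snapshot_threshold = '12h';   -- on the primary; requires a restart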
For the remaining conflicts, you have a choice between delayed application and query cancelation. Or you change the workload to avoid ACCESS EXCLUSIVE locks.
To find out what is blocking you, you'll have to use pg_xlogdump on the WAL files and search for ACCESS EXCLUSIVE locks. This will allow you to figure out which object is locked. To find out what kind of operation is performed, check the WAL entries immediately before (VACUUM?) or immediately afterwards (DDL?).
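Before digging into the WAL, it can also help to see which kind of conflict is actually cancelling queries; the standby keeps per-database counters for that (run this on the standby):

SELECT datname,
       confl_lock,        -- conflicts with ACCESS EXCLUSIVE locks (DDL, TRUNCATE, vacuum truncation)
       confl_snapshot,    -- conflicts with VACUUM removing rows the standby still needed
       confl_bufferpin,
       confl_deadlock,
       confl_tablespace
FROM pg_stat_database_conflicts;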
I am creating a replication slot and streaming changes from AWS PostgreSQL RDS to a Java process through the JDBC driver.
My replication slot creation code looks like this.
final ReplicationSlotInfo replicationSlotInfo = pgConnection.getReplicationAPI()
.createReplicationSlot()
.logical()
.withSlotName(replicationSlotName)
.withOutputPlugin("wal2json")
.make();
and I get the replication stream using the following code.
PGReplicationStream stream = pgConnection.getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName(replicationSlotName)
        .withSlotOption("include-xids", true)
        .withSlotOption("include-timestamp", true)
        .withSlotOption("pretty-print", false)
        .withSlotOption("add-tables", "public.users")
        .withStatusInterval(10, TimeUnit.SECONDS)
        .start();
When the replicator Java process is not running, the WAL size keeps increasing. Here is the query I use to find the replication lag.
SELECT
slot_name,
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS replicationSlotLag,
active
FROM
pg_replication_slots;
Output:

 slot_name        | replicationslotlag | active
------------------+--------------------+--------
 data_stream_slot | 100 GB             | f
This replication lag keeps growing until it exceeds the RDS disk capacity, which shuts RDS down.
I thought wal_keep_segments, which was set to 32, would take care of this, but it did not.
Is there any other property I have to set to avoid this situation, even when the Java replication process is not running?
There is a proposal to allow a logical replication slot's WAL retention to be limited. I think that is just what you need, but it is not clear when/if it will become available.
In the meantime, all you can do is monitor the situation and then drop the slot if it starts to fall too far behind. Of course this means you will have a problem re-establishing synchronization later, but there is no way around that (other than fixing whatever is causing the replication process to go away and/or fall behind).
Since you say the Java process is not running, dropping the slot is easy to do. If it were running but just not keeping up, you would have to do the sad little dance where you kill the WAL sender and then try to drop the slot before it gets restarted (and I don't know how you would do that on RDS).
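A sketch of the monitor-then-drop approach, using the slot name from your question (what counts as "too far behind" is up to you):

-- how much WAL the slot is currently holding back
SELECT slot_name,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS retained_wal,
       active
FROM pg_replication_slots
WHERE slot_name = 'data_stream_slot';

-- if the consumer is down and the retained WAL threatens to fill the disk,
-- drop the slot so the server can recycle that WAL (you will have to
-- re-create the slot and re-sync the consumer afterwards)
SELECT pg_drop_replication_slot('data_stream_slot');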
wal_keep_segments is only applicable to physical replication, not logical. And it is meant to be used instead of slots, not in addition to them: if you have both, WAL is retained until both criteria are met. Indeed, that is the problem you are facing; logical replication cannot be done without slots the way physical replication can.
wal_keep_segments is irrelevant for logical decoding.
With logical decoding, you always have to use a logical replication slot, which is a data structure that marks a position in the transaction log (WAL) so that the server never discards old WAL segments that logical decoding might still need.
That is why your WAL directory grows if you don't consume the changes.
wal_keep_segments specifies a minimum number of old WAL segments to retain. It is used for purposes like streaming replication, pg_receivewal or pg_rewind.
wal_keep_segments specifies the minimum number of old segments PostgreSQL should keep in the pg_xlog directory. There can be a few reasons why PostgreSQL doesn't remove segments:
There is a replication slot at a WAL location older than the WAL files; you can check it with this query:
SELECT slot_name,
       lpad(to_hex((pg_control_checkpoint()).timeline_id), 8, '0') ||
       lpad(split_part(restart_lsn::text, '/', 1), 8, '0') ||
       lpad(substr(lpad(split_part(restart_lsn::text, '/', 2), 8, '0'), 1, 2), 8, '0')
       AS wal_file
FROM pg_replication_slots;
WAL archiving is enabled and archive_command is failing; check the PostgreSQL logs in this case.
There has been no checkpoint for a long time.
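The last two causes can also be checked from SQL; a sketch (pg_stat_archiver needs 9.4+, pg_control_checkpoint() needs 9.6+, which the query above already assumes):

SELECT archived_count, last_archived_wal,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;                              -- a growing failed_count means archive_command is failing

SELECT (pg_control_checkpoint()).checkpoint_time;   -- time of the last completed checkpoint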
I am trying to understand the behaviour of WAL files. The WAL-related settings of the database are as follows:
"min_wal_size" "2GB"
"max_wal_size" "20GB"
"wal_segment_size" "16MB"
"wal_keep_segments" "0"
"checkpoint_completion_target" "0.8"
"checkpoint_timeout" "15min"
The number of WAL files is always 1281 or higher:
SELECT COUNT(*) FROM pg_ls_dir('pg_xlog') WHERE pg_ls_dir ~ '^[0-9A-F]{24}';
-- count 1281
As I understand it, this means the WAL files currently never fall below max_wal_size (1281 × 16 MB = 20,496 MB ≈ max_wal_size)?
I would expect the number of WAL files to drop below the maximum right after a checkpoint completes and data is synced to disk, but this is clearly not the case. What am I missing?
As per the documentation (emphasis added):
The number of WAL segment files in pg_xlog directory depends on min_wal_size, max_wal_size and the amount of WAL generated in previous checkpoint cycles. When old log segment files are no longer needed, they are removed or recycled (that is, renamed to become future segments in the numbered sequence). If, due to a short-term peak of log output rate, max_wal_size is exceeded, the unneeded segment files will be removed until the system gets back under this limit. Below that limit, the system recycles enough WAL files to cover the estimated need until the next checkpoint, and removes the rest
So, as per your observation, you are probably seeing the "recycle" effect: the old WAL files are being renamed instead of removed. This saves the disk some I/O, especially on busy systems.
Bear in mind that once a particular file has been recycled, it will not be reconsidered for removal/recycle again until it has been used (i.e., the relevant LSN is reached and checkpointed). That may take a long time if your system suddenly becomes less active.
If your server is very busy and then abruptly becomes mostly idle, you can get into a situation where the log files remain at max_wal_size for a very long time. At the time it was deciding whether to remove or recycle the files, it was using them up quickly and so decided to recycle up to max_wal_size for predicted future use, rather than remove them. Once recycled, they will never be removed until they have been used (you could argue that that is a bug), and if the server is now mostly idle it will take a very long time for them to be used and thus removed.
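If you want to see this on your own server, here is a rough sketch (run on the primary, pre-10 object names, single timeline assumed); file names sorting above the segment currently being written are the recycled "future" segments:

SELECT pg_xlogfile_name(pg_current_xlog_location()) AS current_segment,
       count(*) FILTER (WHERE pg_ls_dir >  pg_xlogfile_name(pg_current_xlog_location())) AS recycled_for_future_use,
       count(*) FILTER (WHERE pg_ls_dir <= pg_xlogfile_name(pg_current_xlog_location())) AS current_and_past
FROM pg_ls_dir('pg_xlog')
WHERE pg_ls_dir ~ '^[0-9A-F]{24}$';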
I have a large and fast-growing PostgreSQL table (166 GB of indexes and 72 GB of data), and I want to set up logical replication of this table. Version 11.4 on both sides.
I have been trying for 2 weeks, but the only thing I get is infinite syncing and a growing table size on the replica (already 293 GB of indexes and 88 GB of table data, more than the original), with no errors in the log.
I have also tried taking a dump, restoring it and then starting the sync, but got errors on the existing primary keys.
The backend_xmin value in the replication stats changes about once a week, but the sync state is still "startup". The network link between the servers is barely used (they are in the same datacenter); the actual transfer rate is around 300-400 KB/s, which looks like it is mostly the streaming part of the replication process.
So the question is: how do I properly set up logical replication of a large, fast-growing table? Is it even possible? Thank you.
I have been trying for 2 weeks, but the only thing I get is infinite syncing and a growing table size on the replica (already 293 GB of indexes and 88 GB of table data, more than the original), with no errors in the log.
Drop the non-identity indexes on the replica and re-create them after the sync is done.
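A sketch of that, with hypothetical table/index names standing in for your non-identity indexes (keep the primary key / REPLICA IDENTITY index, the apply worker needs it):

DROP INDEX big_table_col1_idx;                      -- hypothetical secondary index
DROP INDEX big_table_col2_col3_idx;                 -- hypothetical secondary index

-- wait for the initial copy to finish (pg_subscription_rel.srsubstate becomes 'r')

CREATE INDEX CONCURRENTLY big_table_col1_idx ON big_table (col1);
CREATE INDEX CONCURRENTLY big_table_col2_col3_idx ON big_table (col2, col3);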
I had exactly the same problem.
Checking the logs, I found the following error:
ERROR: could not receive data from WAL stream: ERROR: canceling statement due to statement timeout
Because of the large tables, the replication kept being cancelled by this timeout.
After increasing the timeouts, the problem went away.
PS: Ideally, it would be nicer to be able to set separate timeouts for replication and for the main database.
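One way to approximate separate timeouts, assuming the subscription connects to the publisher as a dedicated role (the role name below is made up), is to give just that role an unlimited statement_timeout on the publisher, so ordinary sessions keep theirs:

ALTER ROLE replication_user SET statement_timeout = 0;   -- 'replication_user' is a hypothetical role name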
I would like to keep at least 12 hours' worth of WAL segments around to keep replication going through extended network outages (namely long DR tests that my database is not a part of).
I've estimated that I will need to raise my wal_keep_segments from 64 to 1000+.
Are there any drawbacks to doing this other than the space it would require, e.g. performance?
I'm considering the archive option as a backup plan for now.
Apart from the disk space, there is no problem with a high wal_keep_segments setting; at 16 MB per segment, wal_keep_segments = 1000 reserves roughly 16 GB in pg_xlog.