Postgres 11 Standby never catches up - postgresql

Since upgrading to Postgres 11 I cannot get my production standby server to catch up. In the logs things look fine eventually:
2019-02-06 19:23:53.659 UTC [14021] LOG: consistent recovery state reached at 3C772/8912C508
2019-02-06 19:23:53.660 UTC [13820] LOG: database system is ready to accept read only connections
2019-02-06 19:23:53.680 UTC [24261] LOG: started streaming WAL from primary at 3C772/8A000000 on timeline 1
But the following queries show everything is not fine:
warehouse=# SELECT coalesce(abs(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())), -1) / 1024 / 1024 / 1024 AS replication_delay_gbytes;
replication_delay_gbytes
-------------------------
208.2317776754498486
(1 row)
warehouse=# select now() - pg_last_xact_replay_timestamp() AS replication_delay;
replication_delay
-------------------
01:54:19.150381
(1 row)
After a while (a couple of hours) replication_delay stays about the same, but replication_delay_gbytes keeps growing; note that replication_delay is behind from the beginning, while replication_delay_gbytes starts near 0. During startup there were a number of these messages:
2019-02-06 18:24:36.867 UTC [14036] WARNING: xlog min recovery request 3C734/FA802AA8 is past current point 3C700/371ED080
2019-02-06 18:24:36.867 UTC [14036] CONTEXT: writing block 0 of relation base/16436/2106308310_vm
but Googling suggests these are fine.
The replica was created with repmgr by running pg_basebackup to perform the clone, then starting the replica and letting it catch up. This previously worked with Postgres 10.
Any thoughts on why this replica comes up but is perpetually lagging?

I'm still not sure what the issue is/was, but I was able to get the standby caught up with these two changes:
set use_replication_slots=true in the repmgr config
set wal_compression=on in the postgres config
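For reference, the PostgreSQL half of that change can be applied via SQL without a restart; this is just a sketch of one way to do it, and note that use_replication_slots=true lives in repmgr.conf rather than in PostgreSQL itself:
-- wal_compression is a reloadable parameter, so a restart should not be needed:
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();
-- use_replication_slots=true goes in repmgr.conf; repmgr should then create the
-- slot for the standby when it is cloned/registered.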
Using replication slots didn't seem to change anything other than causing replication_delay_gbytes to stay roughly flat. Turning on WAL compression did help, somehow, although I'm not entirely sure how. In theory it made it possible to ship WAL to the standby faster, but reviewing network logs I see a drop in sent/received bytes that matches the effect of compression, so it seems to be shipping WAL at the same speed, just using less network.
It still seems like there is some underlying issue at play here, though. For example, when I run pg_basebackup to create the standby it generates roughly 500 MB/s of network traffic, but once it is streaming WAL after the standby finishes recovery, that drops to ~250 MB/s without WAL compression and ~100 MB/s with it. Yet there was no decrease in network traffic after it caught up with WAL compression enabled, so I'm not sure what's going on there that allowed it to catch up.

Related

pg_wal getting full in Master DB

I also have a similar setup with master-slave (primary-standby) streaming replication set up on 2 physical nodes. The replication is working correctly and walsender and walreceiver both work fine. But in my case, the pg_wal directory on the master is getting full and WAL files are not being cleaned up. My archive mode is disabled. Can anyone help?
Postgres Version 12. Running on RHEL-7.8
Here is my runtime configuration on the master and on the standby (the runtime configuration screenshots are omitted here). Can someone help?
Very likely you have a stale replication slot. Look at
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots;
If you find an inactive one with an old restart_lsn you have found the problem. Use the pg_drop_replication_slot function to get rid of it, then pg_wal will slowly shrink back to max_wal_size.
You should also check if you have high values in wal_keep_size (wal_keep_segments in older releases). That will also make PostgreSQL retain old WAL segments.
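For example, dropping a stale slot and checking the retention settings might look like this (the slot name is a placeholder for whatever the query above reports as inactive):
-- Replace 'stale_standby_slot' with the inactive slot_name found above.
SELECT pg_drop_replication_slot('stale_standby_slot');
-- How much WAL is retained unconditionally:
SHOW wal_keep_size;        -- PostgreSQL 13 and later
-- SHOW wal_keep_segments; -- PostgreSQL 12 and earlier, as in this question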

Postgresql fatal the database system is starting up - windows 10

I have installed PostgreSQL on Windows 10 on a USB disk.
Every day at work, when I wake my PC from sleep, plug the disk back in, and then try to start PostgreSQL, I get this error:
FATAL: the database system is starting up
The service starts with the following command:
E:\PostgresSql\pg96\pgservice.exe "//RS//PostgreSQL 9.6 Server"
It is the default one.
Logs from E:\PostgresSql\data\logs\pg96:
2019-02-28 10:30:36 CET [21788]: [1-1] user=postgres,db=postgres,app=[unknown],client=::1 FATAL: the database system is starting up
2019-02-28 10:31:08 CET [9796]: [1-1] user=postgres,db=postgres,app=[unknown],client=::1 FATAL: the database system is starting up
I want this startup to happen faster.
When you commit data to a Postgres database, the only thing which is immediately saved to disk is the write-ahead log. The actual table changes are only applied to the in-memory buffers, and won't be permanently saved to disk until the next checkpoint.
If the server is stopped abruptly, or if it suddenly loses access to the file system, then everything in memory is lost, and the next time you start it up, it needs to resort to replaying the log in order to get the tables back to the correct state (which can take quite a while, depending on how much has happened since the last checkpoint). And until it's finished, any attempt to use the server will result in FATAL: the database system is starting up.
If you make sure you shut the server down cleanly before unplugging the disk - giving it a chance to set a checkpoint and flush all of its buffers - then it should be able to start up again more or less immediately.
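If a clean shutdown sometimes gets skipped, one hedged workaround is to force a checkpoint manually just before stopping the service, so that the next startup has little or no WAL to replay; a minimal sketch:
-- Run as a superuser right before stopping the PostgreSQL service; it flushes
-- dirty buffers to disk so crash recovery at the next start has little to do.
CHECKPOINT;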

Postgres Streaming Replication Error: requested WAL segment has already been removed

I have set up streaming replication between a primary and secondary server. I have enabled archiving. In the Postgres log file I am seeing the error below.
< 2017-12-05 03:08:45.374 UTC > WARNING: archive_mode enabled, yet archive_command is not set
< 2017-12-05 03:08:46.668 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
< 2017-12-05 03:08:51.675 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
< 2017-12-05 03:08:56.682 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
Do we need to enable archive_mode = on for streaming replication? How can I avoid the above error?
max_wal_senders = 3
wal_keep_segments = 32
https://www.postgresql.org/docs/current/static/warm-standby.html
If you use streaming replication without file-based continuous
archiving, the server might recycle old WAL segments before the
standby has received them. If this occurs, the standby will need to be
reinitialized from a new base backup. You can avoid this by setting
wal_keep_segments to a value large enough to ensure that WAL segments
are not recycled too early, or by configuring a replication slot for
the standby. If you set up a WAL archive that's accessible from the
standby, these solutions are not required, since the standby can
always use the archive to catch up provided it retains enough
segments.
emphasis mine.
So either increase wal_keep_segments to a value large enough for your volume of WAL generation, or configure archive_command and set up some storage so that WAL removed from the master stays available to the standby, or configure a replication slot for the standby.
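If you go the replication slot route, the primary-side part is a single call (the slot name here is just an example):
-- On the primary: create a physical slot for the standby to stream from.
SELECT pg_create_physical_replication_slot('standby1_slot');
-- The standby then needs primary_slot_name = 'standby1_slot'
-- (in recovery.conf on pre-12 releases, in postgresql.conf from 12 onward).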
In my case I had to reinitialize the replica in maintenance mode using the commands below, and it fixed the issue. The error was due to lag between the leader and the replica.
patronictl list
patronictl pause
patronictl reinit patroni
(choose the replica pod when prompted)
patronictl resume

postgresql consumed 100% of storage

I'm running a Red Hat 4.8.3-9 server in the AWS cloud with PostgreSQL version 9.4. The database has consumed 100% of my disk space. I went into the database and truncated the table with the most data. After viewing the size of the tables with \d+, there were no tables over a couple of MBs. I ran du -h * --max-depth=1 and found that /var/lib/pgsql94/data/base/16384 held 472G of the 500G total storage. Even after truncating the tables, my disk usage is still 99%. I'm wondering if there is a way to release the space, because I believe deleting all the OID files in data/base/16384 would be bad. I have tried stopping and restarting the postgres service. I am not allowed to reboot the machine, unfortunately.
df -ih shows inode usage is 1%
sudo lsof +L1 does not show any large files at all
Thank you
Log files: 8K worth of this repeating sequence:
LOG: could not write temporary statistics file "pg_stat_tmp/db_16384.tmp": No space left on device
LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
LOG: using stale statistics instead of current ones because stats collector is not responding
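For what it's worth, queries along these lines can help map base/16384 to a database and show which relations are actually holding the space (just a diagnostic sketch; sizes include TOAST and indexes):
-- Which database does base/16384 belong to?
SELECT datname FROM pg_database WHERE oid = 16384;

-- While connected to that database, list its largest relations:
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind IN ('r', 'm')
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;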

Postgresql large table update slows down

I run an update on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problems running it under PostgreSQL 9.1; it would take 40-60 minutes, but it worked. I run the same query on a 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level of IO is 4-5 MB/s, and it just sits there. No locks, no other queries or connections, just this single update SQL on the server.
The SQL is below. The "lookup" table has 12 records. The lookup can return only one row; it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. The "src" and "dest" tables are ~60 million records each.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
I thought my disk had slowed down, but I can copy 1 GB files in parallel with the query execution and it runs fast at >40 MB/s, and I have only one disk (it is a VM with iSCSI media). All other disk operations are not impacted; there is plenty of IO bandwidth. At the same time PostgreSQL is just sitting there, doing very little and running very slowly.
I have 2 virtualized Linux servers, one running PostgreSQL 9.1 and another running 9.4. Both servers have close to identical PostgreSQL configurations.
Has anyone else had similar experience? I am running out of ideas. Help.
Edit
The query "ran" for 20 hours I had to kill the connections and restart the server. Surprisingly it didn't kill the connection via query:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();
and the server produced the following log:
2015-05-21 12:41:53.412 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT: UPDATE <... this is 60,000,000 record table update statement>
Also, the server restart took a long time, producing the following log:
2015-05-21 12:43:36.730 EDT LOG: received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG: aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG: autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG: received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING: terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?
Edit
I tested 9.1, 9.3 and 9.4. Both 9.1 and 9.3 don't experience the slowdown; 9.4 consistently slows down on large transactions. I noticed that when a transaction starts, the htop monitor indicates high CPU and the process status is "R" (running); then it gradually changes to low CPU usage and status "D" (waiting on disk) - see the screenshot. My biggest question is why 9.4 is different from 9.1 and 9.3. I have a dozen servers and this effect is observed across the board.
Thanks everyone for the help. No matter how much I tried to emphasize the performance difference between identically configured 9.4 and previous versions, no one seemed to pay attention to it.
The problem was solved by disabling transparent huge pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Here are some resources I found helpful in researching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
I'd suspect a lot of disk seeking - 5 MB/s is just about right for very random IO on an ordinary (spinning) hard drive.
As you constantly replace basically all your rows, I'd try setting the dest table's fillfactor to about 45% (alter table dest set (fillfactor=45);) and then running cluster dest using dest_pkey;. This allows updated row versions to be placed in the same disk sector as the originals.
Additionally, running cluster src using src_pkey; so that both tables have their data in the same physical order on disk can also help.
Also remember to run vacuum dest; after every update that large, so the space taken by old row versions can be reused in subsequent updates.
Your old server probably evolved its effective fillfactor naturally over multiple updates. On the new server the table is packed 100%, so updated rows have to be placed at the end.
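Putting those suggestions together, a rough sketch (assuming the primary key indexes are named dest_pkey and src_pkey):
ALTER TABLE dest SET (fillfactor = 45);  -- leave room on each page for updated row versions
CLUSTER dest USING dest_pkey;            -- rewrite dest in index order, honoring the new fillfactor
CLUSTER src USING src_pkey;              -- optional: give src the same physical order
-- and after each of these large updates:
VACUUM dest;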
If only a few of the target rows are actually changed, you can avoid generating new row versions by using IS DISTINCT FROM. This can prevent a lot of useless disk traffic.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
-- avoid unnecessary row versions to be generated
AND (dest.field1 IS DISTINCT FROM src.field1
OR dest.field2 IS DISTINCT FROM src.field2
OR dest.field3_id IS DISTINCT FROM lu.id
)
;