WAL files keep growing even though archiving is working - postgresql

I have archiving fully working with failed_count = 0, and I have successfully set up logical replication for 2 tables (23 million records and 200 columns in total).
FYI: the tables are not busy on weekends (0 additional records). Even on weekdays, the two tables only receive a maximum of 1-10 records per day.
But when I run SELECT COUNT(*) FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}', I can see that the WAL files keep growing at a rate of 1-2 files every 20 minutes.
I expected the WAL files to stop accumulating as soon as archiving takes place.
When I initially inherited this database, there were already 6,800 WAL files, even though no logical replication was taking place.
Here is my configuration:
name |setting |unit|
----------------------------+------------------------------------------+----+
archive_command |test ! -f /archive/%f && cp %p /archive/%f| |
archive_mode |on | |
archive_timeout |2400 |s |
checkpoint_completion_target|0.9 | |
checkpoint_flush_after |32 |8kB |
checkpoint_timeout |300 |s |
checkpoint_warning |30 |s |
hot_standby |on | |
log_checkpoints |off | |
max_replication_slots |10 | |
max_wal_senders |5 | |
max_wal_size |8192 |MB |
min_wal_size |2048 |MB |
synchronous_standby_names |* | |
wal_compression |off | |
wal_level |logical | |
wal_log_hints |off | |
wal_segment_size |16777216 |B |
wal_sender_timeout |60000 |ms |
Questions:
Why do they keep growing? Is the server trying to send all the existing WAL files to another server involved in the logical replication?
How do I stop them from growing?
How do I start from zero again (i.e., an empty pg_wal)?
PostgreSQL 12.
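Not from the original post, but a sketch of how to check whether a replication slot (rather than archiving) is what retains the files on PostgreSQL 12. A restart_lsn far behind the current WAL position means the slot is pinning old segments:

```sql
-- How much WAL is each replication slot holding back?
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

Note that archiving only makes a segment *eligible* for recycling; slots, wal_keep_segments, and checkpoint timing decide when it is actually removed.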

Related

PG_WAL is very big size

I have a Postgres cluster with 3 nodes: ETCD+Patroni+Postgres13.
There is a problem with a constantly growing pg_wal folder; it now contains 5,127 files. After searching the internet, I found an article advising me to pay attention to the following database parameters (their values at the time were):
archive_mode off;
wal_level replica;
max_wal_size 1G;
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+------------
slot_name | db2
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 2247228
xmin |
catalog_xmin |
restart_lsn | 2D/D0ADC308
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
-[ RECORD 2 ]-------+------------
slot_name | db1
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 2247227
xmin |
catalog_xmin |
restart_lsn | 2D/D0ADC308
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
All other functionality of the Patroni cluster works (switchover, reinit, replication).
root@srvdb3:~# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: mobile (7173650272103321745) --+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------------+---------+---------+----+-----------+
| db1 | 10.01.1.01 | Replica | running | 17 | 0 |
| db2 | 10.01.1.02 | Replica | running | 17 | 0 |
| db3 | 10.01.1.03 | Leader | running | 17 | |
+--------+------------+---------+---------+----+-----------+
Patroni patroni-edit:
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    checkpoint_timeout: 30
    hot_standby: 'on'
    max_connections: '1100'
    max_replication_slots: 5
    max_wal_senders: 5
    shared_buffers: 2048MB
    wal_keep_segments: 5120
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 100
Please help: what could be the matter?
This is what I see in pg_stat_archiver:
postgres=# select * from pg_stat_archiver;
-[ RECORD 1 ]------+------------------------------
archived_count | 0
last_archived_wal |
last_archived_time |
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2023-01-06 10:21:45.615312+00
If you have wal_keep_segments set to 5120, it is completely normal to have 5127 WAL segments in pg_wal, because PostgreSQL will always retain at least 5120 old WAL segments. If that is too many for you, reduce the parameter. Since you are using replication slots, the only disadvantage of reducing it is that you might only be able to pg_rewind soon after a failover.
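To illustrate: on PostgreSQL 13 the parameter is actually wal_keep_size rather than wal_keep_segments (Patroni translates between them), and 5120 segments at 16 MB each corresponds to roughly 80 GB of retained WAL. In this Patroni setup the value should be changed with `patronictl edit-config`; on a standalone server the equivalent would be a sketch like this (the 1GB is an arbitrary example value):

```sql
-- PostgreSQL 13 replaced wal_keep_segments with wal_keep_size (in MB units);
-- 5120 segments x 16 MB/segment would be about 80 GB of always-retained WAL.
ALTER SYSTEM SET wal_keep_size = '1GB';  -- example value only
SELECT pg_reload_conf();                 -- reload is enough, no restart needed
```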

Long Aurora autovacuum on postgres system tables

We have had an incredibly long-running autovacuum process on one of our smaller database machines that we believe has been using a lot of Aurora:StorageIOUsage.
We determined this by running SELECT * FROM pg_stat_activity WHERE wait_event_type = 'IO'; and seeing results like the ones below repeatedly.
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
--------+----------------------------+-------+----------+-----------+------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+--------------+--------+-------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------
398954 | postgres | 17582 | | | | | | | 2022-09-29 18:45:55.364654+00 | 2022-09-29 18:46:20.253107+00 | 2022-09-29 18:46:20.253107+00 | 2022-09-29 18:46:20.253108+00 | IO | DataFileRead | active | | 66020718 | autovacuum: VACUUM pg_catalog.pg_depend | autovacuum worker
398954 | postgres | 17846 | | | | | | | 2022-09-29 18:46:04.092536+00 | 2022-09-29 18:46:29.196309+00 | 2022-09-29 18:46:29.196309+00 | 2022-09-29 18:46:29.19631+00 | IO | DataFileRead | active | | 66020732 | autovacuum: VACUUM pg_toast.pg_toast_2618 | autovacuum worker
As you can see from the screenshot, it has been going on for well over a month, and it is mainly for the pg_depend, pg_attribute, and pg_toast_2618 tables, which are not all that large. I haven't been able to find any reason why these tables would need so much vacuuming, other than maybe a database restore from our production environment (this is one of our lower environments). Here are the pg_stat_sys_tables entries for those tables, plus pg_rewrite, which is the table that pg_toast_2618 belongs to:
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+---------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
1249 | pg_catalog | pg_attribute | 185251 | 12594432 | 31892996 | 119366792 | 1102817 | 3792 | 1065737 | 1281 | 543392 | 1069529 | 23584 | | 2022-09-29 18:53:25.227334+00 | | 2022-09-28 01:12:47.628499+00 | 0 | 1266763 | 0 | 36
2608 | pg_catalog | pg_depend | 2429 | 369003445 | 14152628 | 23494712 | 7226948 | 0 | 7176855 | 0 | 476267 | 7176855 | 0 | | 2022-09-29 18:52:34.523257+00 | | 2022-09-28 02:02:52.232822+00 | 0 | 950137 | 0 | 71
2618 | pg_catalog | pg_rewrite | 25 | 155083 | 1785288 | 1569100 | 64127 | 314543 | 62472 | 59970 | 7086 | 377015 | 13869 | | 2022-09-29 18:53:11.288732+00 | | 2022-09-23 18:54:50.771969+00 | 0 | 1280018 | 0 | 81
2838 | pg_toast | pg_toast_2618 | 0 | 0 | 1413436 | 3954640 | 828571 | 0 | 825143 | 0 | 15528 | 825143 | 1653714 | | 2022-09-29 18:52:47.242386+00 | | | 0 | 608881 | 0 | 0
I'm pretty new to Postgres, and I'm wondering what could possibly cause this many records to need cleaning up, and why it would take well over a month considering we always have autovacuum enabled. We are running Postgres version 10.17 on a single db.t3.medium, and the only thing I can think of at this point is to try increasing the instance size. Do we simply need to increase the database instance size on our Aurora cluster so that more of this can be done in memory? I'm at a bit of a loss for how to reduce this huge sustained spike in Storage IO costs.
Additional information on our autovacuum settings:
=> SELECT * FROM pg_catalog.pg_settings WHERE name LIKE '%autovacuum%';
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
-------------------------------------+-----------+------+-------------------------------------+-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------+---------+--------------------+---------+------------+-----------------------------------------------------------------------------------------+-----------+-----------+-----------------------------------+------------+-----------------
autovacuum | on | | Autovacuum | Starts the autovacuum subprocess. | | sighup | bool | configuration file | | | | on | on | /rdsdbdata/config/postgresql.conf | 78 | f
autovacuum_analyze_scale_factor | 0.05 | | Autovacuum | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. | | sighup | real | configuration file | 0 | 100 | | 0.1 | 0.05 | /rdsdbdata/config/postgresql.conf | 55 | f
autovacuum_analyze_threshold | 50 | | Autovacuum | Minimum number of tuple inserts, updates, or deletes prior to analyze. | | sighup | integer | default | 0 | 2147483647 | | 50 | 50 | | | f
autovacuum_freeze_max_age | 200000000 | | Autovacuum | Age at which to autovacuum a table to prevent transaction ID wraparound. | | postmaster | integer | default | 100000 | 2000000000 | | 200000000 | 200000000 | | | f
autovacuum_max_workers | 3 | | Autovacuum | Sets the maximum number of simultaneously running autovacuum worker processes. | | postmaster | integer | configuration file | 1 | 262143 | | 3 | 3 | /rdsdbdata/config/postgresql.conf | 45 | f
autovacuum_multixact_freeze_max_age | 400000000 | | Autovacuum | Multixact age at which to autovacuum a table to prevent multixact wraparound. | | postmaster | integer | default | 10000 | 2000000000 | | 400000000 | 400000000 | | | f
autovacuum_naptime | 5 | s | Autovacuum | Time to sleep between autovacuum runs. | | sighup | integer | configuration file | 1 | 2147483 | | 60 | 5 | /rdsdbdata/config/postgresql.conf | 9 | f
autovacuum_vacuum_cost_delay | 5 | ms | Autovacuum | Vacuum cost delay in milliseconds, for autovacuum. | | sighup | integer | configuration file | -1 | 100 | | 20 | 5 | /rdsdbdata/config/postgresql.conf | 73 | f
autovacuum_vacuum_cost_limit | -1 | | Autovacuum | Vacuum cost amount available before napping, for autovacuum. | | sighup | integer | default | -1 | 10000 | | -1 | -1 | | | f
autovacuum_vacuum_scale_factor | 0.1 | | Autovacuum | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. | | sighup | real | configuration file | 0 | 100 | | 0.2 | 0.1 | /rdsdbdata/config/postgresql.conf | 22 | f
autovacuum_vacuum_threshold | 50 | | Autovacuum | Minimum number of tuple updates or deletes prior to vacuum. | | sighup | integer | default | 0 | 2147483647 | | 50 | 50 | | | f
autovacuum_work_mem | -1 | kB | Resource Usage / Memory | Sets the maximum memory to be used by each autovacuum worker process. | | sighup | integer | default | -1 | 2147483647 | | -1 | -1 | | | f
log_autovacuum_min_duration | -1 | ms | Reporting and Logging / What to Log | Sets the minimum execution time above which autovacuum actions will be logged. | Zero prints all actions. -1 turns autovacuum logging off. | sighup | integer | default | -1 | 2147483647 | | -1 | -1 | | | f
rds.force_autovacuum_logging_level | disabled | | Customized Options | Emit autovacuum log messages irrespective of other logging configuration. | Each level includes all the levels that follow it.Set to disabled to disable this feature and fall back to using log_min_messages. | sighup | enum | default | | | {debug5,debug4,debug3,debug2,debug1,info,notice,warning,error,log,fatal,panic,disabled} | disabled | disabled | | | f
I would say you have some very long-lived snapshot being held. These tables need to be vacuumed, but the vacuum doesn't accomplish anything, because the dead tuples can't be removed while some old snapshot can still see them. So immediately after being vacuumed, they are still eligible to be vacuumed again, and autovacuum tries again every 5 seconds (autovacuum_naptime), because it has no way to say "don't bother until the snapshot that blocked me last time goes away".
Check pg_stat_activity for very old 'idle in transaction' sessions, and check for any prepared transactions.
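The two checks above can be sketched as queries (illustrative only; run them as a superuser or the table owner to see other users' sessions):

```sql
-- Long-open transactions whose snapshots block dead-tuple removal
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Forgotten prepared transactions hold back the xmin horizon too
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

Anything in pg_prepared_xacts that you don't recognize can be resolved with ROLLBACK PREPARED once you have confirmed it is abandoned.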

How can we track progress of initial load of logical replication in postgreSQL

Is there a way to track the progress of the initial load of logical replication in PostgreSQL, i.e. statistics about how much data has been copied, how much is left to copy, an estimated time to completion, or any errors while copying?
Thanks
Starting in the soon-to-be-released version 14, you can use the view pg_stat_progress_copy to see the progress of the initial data copy for each table. It reports the data and tuples already copied, but will not give you the other information (totals, time estimates, or errors).
select * from pg_stat_progress_copy ;
pid | datid | datname | relid | command | type | bytes_processed | bytes_total | tuples_processed | tuples_excluded
-------+--------+---------+--------+-----------+----------+-----------------+-------------+------------------+-----------------
5,884 | 16,384 | jjanes | 17,014 | COPY FROM | CALLBACK | 1,240,918,397 | 0 | 12,628,295 | 0
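Since the view only exposes a numeric relid, a small sketch of joining it to pg_class to see which table is being copied (PostgreSQL 14+; the query is illustrative):

```sql
-- Resolve each in-progress COPY to its table name
SELECT p.pid,
       c.relname,
       p.command,
       pg_size_pretty(p.bytes_processed) AS copied,
       p.tuples_processed
FROM pg_stat_progress_copy AS p
LEFT JOIN pg_class AS c ON c.oid = p.relid;
```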

PostgreSQL master server hangs on replication flow

First of all, I'm not a data engineer, so I'll do my best to give you everything needed to resolve my problem :/
Context:
I'm trying to create 2 PostgreSQL servers, 1 master and 1 slave.
psql (PostgreSQL) 10.9 (Ubuntu 10.9-0ubuntu0.18.04.1)
As far as I understand, it's not a good idea to do a synchronous replication when we only have 2 servers. But I have to understand what's going on here...
Problem:
The master server hangs when I try to execute CREATE SCHEMA test;.
But the schema exists on the master, and it exists on the slave too. The master hangs because it is waiting for the slave's commit status...
Configuration of Master:
/etc/postgresql/10/main/conf.d/master.conf
# Connection
listen_addresses = '127.0.0.1,slave-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
synchronous_commit = remote_apply #local works, remote_apply hangs
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_2/%f'
# Replication master
max_wal_senders = 2
wal_keep_segments = 100
synchronous_standby_names = 'ANY 1 ("lab-3")'
/etc/postgresql/10/main/pg_hba.conf
hostssl replication replicate slave-ip/32 scram-sha-256
Configuration of Slave:
/etc/postgresql/10/main/conf.d/standby.conf
# Connection
listen_addresses = '127.0.0.1,master-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_3/%f'
# Replication slave
max_wal_senders = 2
wal_keep_segments = 100
hot_standby = on
/var/lib/postgresql/10/main/recovery.conf
standby_mode = on
primary_conninfo = 'host=master-ip port=5432 user=replicate password=replicate_password sslmode=require application_name="lab-3"'
trigger_file = '/var/lib/postgresql/10/postgresql.trigger'
I get absolutely NOTHING in the log files when it hangs; I only see this warning when I Ctrl+C to abort on the master instance:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
Is there a way to check what happens, and why it stays stuck like this?
EDIT 1
The content of pg_stat_replication :
Before query
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/110000D0 | 0/110000D0 | 0/110000D0 | 0/110000D0 | | | | 0 | async
(1 row)
While it hangs / after
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+---------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/11000C10 | 0/11000C10 | 0/11000C10 | 0/11000C10 | 00:00:00.000521 | 00:00:00.004421 | 00:00:00.0045 | 0 | async
(1 row)
Thanks !
As Laurenz Albe said, the problem was the quoting of the synchronous standby name.
The documentation explains that the name must be double-quoted in the synchronous_standby_names entry on the master if it contains a dash, but it must not be quoted in the primary_conninfo value on the slave.
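A sketch of the corrected pair of settings (values taken from the question; on the master the change can be made with ALTER SYSTEM):

```sql
-- On the master: the dash in the name requires double quotes here
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 ("lab-3")';
SELECT pg_reload_conf();
```

On the slave, the corresponding recovery.conf entry would carry the name unquoted, i.e. `application_name=lab-3` inside primary_conninfo; with the quotes included, the name the standby reports never matches, so sync_state stays async and commits on the master wait forever.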

How to get database size accurately in postgresql?

We are running PostgreSQL version 9.1. We previously had over 1 billion rows in one table, which have since been deleted. However, the \l+ command still reports the database size inaccurately (it reports 568 GB, but in reality it's much, much less than that).
The proof that 568 GB is wrong is that the individual table sizes don't add up to that number: as you can see, the top 20 relations total 4,292 MB, and the remaining 985 relations are all well below 10 MB each. In fact, they all add up to less than about 6 GB.
Any idea why PostgreSQL has so much bloat? If that is confirmed, how can I remove it? I am not very familiar with VACUUM; is that what I need to do? If so, how?
Much appreciated.
pmlex=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
pmlex | pmlex | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 568 GB | pg_default |
pmlex_analytics | pmlex | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 433 MB | pg_default |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 5945 kB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 5841 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 5841 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(5 rows)
pmlex=# SELECT nspname || '.' || relname AS "relation",
pmlex-# pg_size_pretty(pg_relation_size(C.oid)) AS "size"
pmlex-# FROM pg_class C
pmlex-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
pmlex-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
pmlex-# ORDER BY pg_relation_size(C.oid) DESC;
relation | size
-------------------------------------+---------
public.page_page | 1289 MB
public.page_pageimagehistory | 570 MB
pg_toast.pg_toast_158103 | 273 MB
public.celery_taskmeta_task_id_key | 233 MB
public.page_page_unique_hash_uniq | 140 MB
public.page_page_ad_text_id | 136 MB
public.page_page_kn_result_id | 125 MB
public.page_page_seo_term_id | 124 MB
public.page_page_kn_search_id | 124 MB
public.page_page_direct_network_tag | 124 MB
public.page_page_traffic_source_id | 123 MB
public.page_page_active | 123 MB
public.page_page_is_referrer | 123 MB
public.page_page_category_id | 123 MB
public.page_page_host_id | 123 MB
public.page_page_serp_id | 121 MB
public.page_page_domain_id | 120 MB
public.celery_taskmeta_pkey | 106 MB
public.page_pagerenderhistory | 102 MB
public.page_page_campaign_id | 89 MB
...
...
...
pg_toast.pg_toast_4354379 | 0 bytes
(1005 rows)
Your options include:
1). Ensuring autovacuum is enabled and set aggressively.
2). Recreating the table as I mentioned in an earlier comment (create-table-as-select + truncate + reload the original table).
3). Running CLUSTER on the table if you can afford to be locked out of that table (exclusive lock).
4). VACUUM FULL, though CLUSTER is more efficient and recommended.
5). Running a plain VACUUM ANALYZE a few times and leaving the table as-is, to eventually fill the space back up as new data comes in.
6). Dump and reload the table via pg_dump
7). pg_repack (though I haven't used it in production)
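As a sketch of options 3) and 4) above (the table is taken from the question's output, but the index name is a placeholder, not from the original post):

```sql
-- CLUSTER rewrites the table in index order and returns freed space to the OS;
-- it holds an ACCESS EXCLUSIVE lock on the table for the duration.
CLUSTER public.page_page USING some_index_on_page_page;  -- placeholder index name

-- VACUUM FULL also rewrites the table compactly, without reordering rows,
-- under the same exclusive lock.
VACUUM FULL public.page_page;
```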
It will likely look different if you use pg_total_relation_size instead of pg_relation_size; pg_relation_size doesn't give the total size of the table (it excludes indexes and TOAST data). See
https://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
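A sketch of the same per-relation report using pg_total_relation_size, which includes indexes and TOAST data and so should come much closer to the \l+ figure:

```sql
-- Top tables by total on-disk footprint (heap + indexes + TOAST)
SELECT n.nspname || '.' || c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class AS c
LEFT JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.relkind = 'r'          -- ordinary tables only
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;
```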