Not much improvement in max transaction ID after VACUUM FULL - postgresql

We ran VACUUM FULL on our table and its TOAST table. The number of dead tuples dropped drastically, but the max transaction ID stayed pretty much the same. My question is: why did the max transaction ID not go down when the dead tuples went down so drastically?
Before
select relname,last_autovacuum ,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,n_mod_since_analyze,vacuum_count,autovacuum_count from pg_stat_all_tables where relname in ('examples','pg_toast_16450');
relname | last_autovacuum | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | vacuum_count | autovacuum_count
----------------+-------------------------------+-----------+------------+---------------+------------+------------+---------------------+--------------+------------------
examples | 2022-01-18 23:26:52.432808+00 | 57712813 | 9818 | 48386674 | 3601588 | 306558 | 42208 | 0 | 44
pg_toast_16450 | 2022-01-17 23:14:42.516933+00 | 0 | 5735566377 | 0 | 3763818 | 805501171 | 11472355929 | 0 | 51
SELECT max(age(datfrozenxid)) FROM pg_database;
max
-----------
199857797
After
select relname,last_autovacuum ,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,n_mod_since_analyze,vacuum_count,autovacuum_count from pg_stat_all_tables where relname in ('examples','pg_toast_16450');
relname | last_autovacuum | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | vacuum_count | autovacuum_count
----------------+-------------------------------+-----------+-------------+--------------+------------+------------+---------------------+--------------+------------------
examples | 2022-02-01 15:41:17.722575+00 | 120692014 | 9818 | 98148003 | 4172134 | 17666 | 150566 | 1 | 4064
pg_toast_16450 | 2022-02-01 20:49:30.552251+00 | 0 | 16169731895 | 0 | 5557218 | 33365 | 32342853690 | 0 | 15281
SELECT max(age(datfrozenxid)) FROM pg_database;
max
-----------
183888023

Yes, that is as expected. You need VACUUM to freeze tuples. VACUUM (FULL) doesn't.
Users tend to be confused, because both are triggered by the VACUUM statement, but VACUUM (FULL) is actually something entirely different from VACUUM. It is not just “a more thorough VACUUM”. The only thing they have in common is that they get rid of dead tuples. VACUUM (FULL) does not modify tuples, as freezing has to do, it just copies them around (or doesn't, if they are dead).
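The number queried in the question comes from datfrozenxid, which is the minimum of the relfrozenxid values of all tables in a database, so a single old table is enough to hold it back. A minimal sketch for finding those tables and freezing one of them, assuming you are connected to the database with the oldest datfrozenxid; the LIMIT of 10 is arbitrary and the table name is the one from the question:

```sql
-- Tables, TOAST tables and materialized views with the oldest relfrozenxid
-- in the current database (these determine datfrozenxid).
SELECT c.oid::regclass      AS table_name,
       age(c.relfrozenxid)  AS xid_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;

-- Plain VACUUM with the FREEZE option on one of those tables,
-- here the table from the question.
VACUUM (FREEZE, VERBOSE) examples;
```

Re-running the max(age(datfrozenxid)) query from the question afterwards shows whether the age actually dropped.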

Related

Long Aurora autovacuum on postgres system tables

We have had an incredibly long running autovacuum process running on one of our smaller database machines that we believe has been using a lot of Aurora:StorageIOUsage:
We determined this by running SELECT * FROM pg_stat_activity WHERE wait_event_type = 'IO';
and seeing the below results repeatedly.
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
--------+----------------------------+-------+----------+-----------+------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+--------------+--------+-------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------
398954 | postgres | 17582 | | | | | | | 2022-09-29 18:45:55.364654+00 | 2022-09-29 18:46:20.253107+00 | 2022-09-29 18:46:20.253107+00 | 2022-09-29 18:46:20.253108+00 | IO | DataFileRead | active | | 66020718 | autovacuum: VACUUM pg_catalog.pg_depend | autovacuum worker
398954 | postgres | 17846 | | | | | | | 2022-09-29 18:46:04.092536+00 | 2022-09-29 18:46:29.196309+00 | 2022-09-29 18:46:29.196309+00 | 2022-09-29 18:46:29.19631+00 | IO | DataFileRead | active | | 66020732 | autovacuum: VACUUM pg_toast.pg_toast_2618 | autovacuum worker
As you can see from the screenshot it has been going for well over a month, and is mainly for the pg_depend, pg_attribute, and pg_toast_2618 tables which are not all that large. I haven't been able to find any reason why these tables would need so much vacuuming other than maybe a database restore from our production environment (this is one of our lower environments). Here are the pg_stat_sys_tables entries for those tables and the pg_rewrite which is the table that pg_toast_2618 is associated with:
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+---------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
1249 | pg_catalog | pg_attribute | 185251 | 12594432 | 31892996 | 119366792 | 1102817 | 3792 | 1065737 | 1281 | 543392 | 1069529 | 23584 | | 2022-09-29 18:53:25.227334+00 | | 2022-09-28 01:12:47.628499+00 | 0 | 1266763 | 0 | 36
2608 | pg_catalog | pg_depend | 2429 | 369003445 | 14152628 | 23494712 | 7226948 | 0 | 7176855 | 0 | 476267 | 7176855 | 0 | | 2022-09-29 18:52:34.523257+00 | | 2022-09-28 02:02:52.232822+00 | 0 | 950137 | 0 | 71
2618 | pg_catalog | pg_rewrite | 25 | 155083 | 1785288 | 1569100 | 64127 | 314543 | 62472 | 59970 | 7086 | 377015 | 13869 | | 2022-09-29 18:53:11.288732+00 | | 2022-09-23 18:54:50.771969+00 | 0 | 1280018 | 0 | 81
2838 | pg_toast | pg_toast_2618 | 0 | 0 | 1413436 | 3954640 | 828571 | 0 | 825143 | 0 | 15528 | 825143 | 1653714 | | 2022-09-29 18:52:47.242386+00 | | | 0 | 608881 | 0 | 0
I'm pretty new to Postgres and I'm wondering what could possibly cause this level of records to need to be cleaned up, and why it would take well over a month to accomplish considering we always have autovacuum set to TRUE. We are running Postgres version 10.17 on a single db.t3.medium, and the only thing I can think of at this point is to try increasing the instance size. Do we simply need to increase our database instance size on our aurora cluster so that this can be done more in memory? I'm at a bit of a loss for how to reduce this huge sustained spike in Storage IO costs.
Additional information for our autovacuum settings:
=> SELECT * FROM pg_catalog.pg_settings WHERE name LIKE '%autovacuum%';
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
-------------------------------------+-----------+------+-------------------------------------+-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------+---------+--------------------+---------+------------+-----------------------------------------------------------------------------------------+-----------+-----------+-----------------------------------+------------+-----------------
autovacuum | on | | Autovacuum | Starts the autovacuum subprocess. | | sighup | bool | configuration file | | | | on | on | /rdsdbdata/config/postgresql.conf | 78 | f
autovacuum_analyze_scale_factor | 0.05 | | Autovacuum | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. | | sighup | real | configuration file | 0 | 100 | | 0.1 | 0.05 | /rdsdbdata/config/postgresql.conf | 55 | f
autovacuum_analyze_threshold | 50 | | Autovacuum | Minimum number of tuple inserts, updates, or deletes prior to analyze. | | sighup | integer | default | 0 | 2147483647 | | 50 | 50 | | | f
autovacuum_freeze_max_age | 200000000 | | Autovacuum | Age at which to autovacuum a table to prevent transaction ID wraparound. | | postmaster | integer | default | 100000 | 2000000000 | | 200000000 | 200000000 | | | f
autovacuum_max_workers | 3 | | Autovacuum | Sets the maximum number of simultaneously running autovacuum worker processes. | | postmaster | integer | configuration file | 1 | 262143 | | 3 | 3 | /rdsdbdata/config/postgresql.conf | 45 | f
autovacuum_multixact_freeze_max_age | 400000000 | | Autovacuum | Multixact age at which to autovacuum a table to prevent multixact wraparound. | | postmaster | integer | default | 10000 | 2000000000 | | 400000000 | 400000000 | | | f
autovacuum_naptime | 5 | s | Autovacuum | Time to sleep between autovacuum runs. | | sighup | integer | configuration file | 1 | 2147483 | | 60 | 5 | /rdsdbdata/config/postgresql.conf | 9 | f
autovacuum_vacuum_cost_delay | 5 | ms | Autovacuum | Vacuum cost delay in milliseconds, for autovacuum. | | sighup | integer | configuration file | -1 | 100 | | 20 | 5 | /rdsdbdata/config/postgresql.conf | 73 | f
autovacuum_vacuum_cost_limit | -1 | | Autovacuum | Vacuum cost amount available before napping, for autovacuum. | | sighup | integer | default | -1 | 10000 | | -1 | -1 | | | f
autovacuum_vacuum_scale_factor | 0.1 | | Autovacuum | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. | | sighup | real | configuration file | 0 | 100 | | 0.2 | 0.1 | /rdsdbdata/config/postgresql.conf | 22 | f
autovacuum_vacuum_threshold | 50 | | Autovacuum | Minimum number of tuple updates or deletes prior to vacuum. | | sighup | integer | default | 0 | 2147483647 | | 50 | 50 | | | f
autovacuum_work_mem | -1 | kB | Resource Usage / Memory | Sets the maximum memory to be used by each autovacuum worker process. | | sighup | integer | default | -1 | 2147483647 | | -1 | -1 | | | f
log_autovacuum_min_duration | -1 | ms | Reporting and Logging / What to Log | Sets the minimum execution time above which autovacuum actions will be logged. | Zero prints all actions. -1 turns autovacuum logging off. | sighup | integer | default | -1 | 2147483647 | | -1 | -1 | | | f
rds.force_autovacuum_logging_level | disabled | | Customized Options | Emit autovacuum log messages irrespective of other logging configuration. | Each level includes all the levels that follow it.Set to disabled to disable this feature and fall back to using log_min_messages. | sighup | enum | default | | | {debug5,debug4,debug3,debug2,debug1,info,notice,warning,error,log,fatal,panic,disabled} | disabled | disabled | | | f
I would say you have some very long-lived snapshot being held. These tables need to be vacuumed, but the vacuum doesn't accomplish anything, because the dead tuples can't be removed while some old snapshot can still see them. So immediately after being vacuumed, they are still eligible to be vacuumed again, and autovacuum tries again every 5 seconds (autovacuum_naptime), because it doesn't have a way to say "Don't bother until this snapshot which blocked me from accomplishing anything last time goes away".
Check pg_stat_activity for very old 'idle in transaction' and for any prepared transactions.
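A minimal sketch of those checks, using only the standard pg_stat_activity and pg_prepared_xacts views; which sessions count as "very old" is for you to judge from xact_start and the xmin age:

```sql
-- Sessions holding the oldest snapshots; the top rows are the ones
-- that can keep VACUUM from removing dead tuples.
SELECT pid, datname, usename, state, xact_start,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Sessions sitting idle inside an open transaction, oldest first.
SELECT pid, usename, state, xact_start, state_change
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
ORDER BY xact_start;

-- Prepared transactions that were never committed or rolled back.
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY prepared;
```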

PostgreSQL insert performance - why would it be so slow?

I've got a PostgreSQL database running inside a docker container on an AWS Linux instance. I've got some telemetry running, uploading records in batches of ten. A Python server inserts these records into the database. The table looks like this:
postgres=# \d raw_journey_data ;
Table "public.raw_journey_data"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
email | character varying | | |
t | timestamp without time zone | | |
lat | numeric(20,18) | | |
lng | numeric(21,18) | | |
speed | numeric(21,18) | | |
There aren't that many rows in the table; about 36,000 presently. But committing the transactions that insert the data is taking about a minute each time:
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | query
-----+-----------------+----------+--------
30 | | |
32 | | postgres |
28 | | |
27 | | |
29 | | |
37 | 00:00:11.439313 | postgres | COMMIT
36 | 00:00:11.439565 | postgres | COMMIT
39 | 00:00:36.454011 | postgres | COMMIT
56 | 00:00:36.457828 | postgres | COMMIT
61 | 00:00:56.474446 | postgres | COMMIT
35 | 00:00:56.474647 | postgres | COMMIT
(11 rows)
The load average on the system's CPUs is zero and about half of the 4GB system RAM is available (as shown by free). So what causes the super-slow commits here?
The insertion is being done with SqlAlchemy:
db.session.execute(import_table.insert([
    {
        "email": current_user.email,
        "t": row.t.ToDatetime(),
        "lat": row.lat,
        "lng": row.lng,
        "speed": row.speed
    }
    for row in data.data
]))
Edit: update with the state column:
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, state, query
FROM pg_stat_activity
WHERE query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | state | query
-----+-----------------+----------+-------+--------
32 | | postgres | |
30 | | | |
28 | | | |
27 | | | |
29 | | | |
46 | 00:00:08.390177 | postgres | idle | COMMIT
49 | 00:00:08.390348 | postgres | idle | COMMIT
45 | 00:00:23.35249 | postgres | idle | COMMIT
(8 rows)

Why doesn't VACUUM FULL shrink the size of my table?

I'm struggling with a bloated table that I'm unable to shrink. It has just 6 rows, but its size is 140MB+ and it's continuously updated/deleted by quick transactions. I tried using VACUUM and VACUUM FULL, but with no result.
These are the table structure and the related statistics:
\d bloated_table
COLUMN | TYPE | Collation | Nullable | DEFAULT
-------------------------+-----------------------------+-----------+----------+---------
col1 | BIGINT | | NOT NULL |
<omissis> | CHARACTER varying(100) | | |
<omissis> | CHARACTER varying(50) | | |
<omissis> | TIMESTAMP WITHOUT TIME ZONE | | |
<omissis> | BIGINT | | |
<omissis> | BIGINT | | |
<omissis> | BIGINT | | |
<omissis> | TEXT | | |
INDEXES:
"<omissis>" PRIMARY KEY, btree (col1)
Referenced BY:
TABLE "<omissis>" CONSTRAINT "<omissis>" FOREIGN KEY (col1) REFERENCES <omissis>(col1)
SELECT ROUND(n_dead_tup::NUMERIC/NULLIF(n_live_tup::NUMERIC,0),2), *
FROM pg_catalog.pg_stat_user_tables
WHERE n_dead_tup>0
ORDER BY 1 DESC NULLS LAST
FETCH FIRST ROW ONLY;
round | relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-----------+----------+----------------------+----------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
215689.17 | 59328017 | db_bloated_table | bloated_table | 27462255 | 279950918 | 679 | 691 | 20 | 25895488 | 14 | 25476514 | 6 | 1294135 | 7 | 0 | 2022-07-06 07:32:24.031073+00 | 2022-07-06 07:39:54.601903+00 | 2022-07-05 22:06:37.492046+00 | 2022-07-06 07:39:54.657717+00 | 30 | 39195 | 26 | 38875
The table size is 143MB:
SELECT pg_size_pretty(pg_total_relation_size('bloated_table'::regclass));
pg_size_pretty
----------------
143 MB
Update: sorry, I pasted the VACUUM output badly. Here is the VACUUM output:
> vacuum (verbose) bloated_table;
INFO:  vacuuming "argodb.bloated_table"
INFO:  "bloated_table": found 0 removable, 25570 nonremovable row versions in 343 out of 343 pages
DETAIL:  25564 dead row versions cannot be removed yet, oldest xmin: 87657915
There were 16 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_59328017"
INFO:  "pg_toast_59328017": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 87657915
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
> vacuum (full,verbose) bloated_table;
INFO:  vacuuming "argodb.bloated_table"
INFO:  "bloated_table": found 0 removable, 29068 nonremovable row versions in 389 pages
DETAIL:  29062 dead row versions cannot be removed yet.
CPU: user: 0.03 s, system: 0.01 s, elapsed: 0.06 s.
VACUUM
And finally, there aren't any long-running open transactions, abandoned replication slots, or orphaned prepared transactions:
--Abandoned replication slots
>SELECT slot_name, slot_type, DATABASE, xmin
FROM pg_replication_slots
ORDER BY AGE(xmin) DESC;
slot_name | slot_type | DATABASE | xmin
-----------+-----------+----------+------
(0 ROWS)
--Orphaned prepared transactions
> SELECT gid, PREPARED, OWNER, DATABASE, TRANSACTION AS xmin
FROM pg_prepared_xacts
ORDER BY AGE(TRANSACTION) DESC;
gid | PREPARED | OWNER | DATABASE | xmin
-----+----------+-------+----------+------
(0 ROWS)
My environment is the following: PG 13.6 on MS Azure Flexible Server
Thank you in advance for your help.

Timescaledb: retention policy isn't removing data from hypertable

(note: I've also posted this as a github issue https://github.com/timescale/timescaledb/issues/3653)
I have a hypertable request_logs configured with a 24 hour retention policy. The retention policy is being marked as running successfully, however no old data from the table is being removed. The table continues to grow day by day.
I checked and don't see any errors in the postgresql log files.
Could use additional guidance on where to look for information to troubleshoot this issue.
request_logs table structure
\d+ request_logs;
Table "public.request_logs"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+--------------------------+-----------+----------+---------+---------+--------------+-------------
time | timestamp with time zone | | not null | | plain | |
referer | bigint | | | | plain | |
useragent | bigint | | | | plain | |
Indexes:
"request_logs_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON request_logs FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_37_chunk,
_timescaledb_internal._hyper_1_38_chunk,
_timescaledb_internal._hyper_1_40_chunk
Access method: heap
This is the hypertable description retrieved by running select * from _timescaledb_catalog.hypertable;
id | schema_name | table_name | associated_schema_name | associated_table_prefix | num_dimensions | chunk_sizing_func_schema | chunk_sizing_func_name | chunk_target_size | compression_state | compressed_hypertable_id | replication_factor
----+-------------+--------------+------------------------+-------------------------+----------------+--------------------------+--------------------------+-------------------+-------------------+--------------------------+--------------------
1 | public | request_logs | _timescaledb_internal | _hyper_1 | 1 | _timescaledb_internal | calculate_chunk_interval | 0 | 0 | |
(1 row)
This is the retention_policy retrieved by running SELECT * FROM timescaledb_information.job_stats;.
hypertable_schema | hypertable_name | job_id | last_run_started_at | last_successful_finish | last_run_status | job_status | last_run_duration | next_start | total_runs | total_successes | total_failures
-------------------+-----------------+--------+-------------------------------+-------------------------------+-----------------+------------+-------------------+-------------------------------+------------+-----------------+----------------
public | request_logs | 1002 | 2021-10-05 23:59:01.601404+00 | 2021-10-05 23:59:01.638441+00 | Success | Scheduled | 00:00:00.037037 | 2021-10-06 23:59:01.638441+00 | 8 | 8 | 0
| | 1 | 2021-10-05 08:38:20.473945+00 | 2021-10-05 08:38:21.153468+00 | Success | Scheduled | 00:00:00.679523 | 2021-10-06 08:38:21.153468+00 | 45 | 45 | 0
(2 rows)
Relevant system information:
OS: Ubuntu 20.04.3 LTS
PostgreSQL version (output of postgres --version): 12
TimescaleDB version (output of \dx in psql): 2.4.1
Installation method: apt install process described https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/ubuntu/installation-apt-ubuntu/#installation-apt-ubuntu
It looks as though this might be related to a bug that has been fixed in version 2.4.2 of TimescaleDB. The GitHub report has been updated. If you find that the issue remains after the upgrade, please update the issue on GitHub with your example. Thanks for reporting!
Transparency: I work for Timescale
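Independently of the upgrade, it can help to see which chunks the policy should have dropped and how the job is configured. A hedged sketch, assuming the TimescaleDB 2.x functions show_chunks and drop_chunks and the timescaledb_information.jobs view; the 24 hour interval is taken from the question, and the manual drop_chunks call is only a possible stopgap, not the fix described above:

```sql
-- Chunks of the hypertable that are entirely older than 24 hours,
-- i.e. the ones a 24 hour retention policy is expected to remove.
SELECT show_chunks('request_logs', older_than => INTERVAL '24 hours');

-- The jobs registered for the hypertable and their configuration.
SELECT job_id, proc_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE hypertable_name = 'request_logs';

-- Possible stopgap: drop the old chunks by hand (this deletes data).
SELECT drop_chunks('request_logs', INTERVAL '24 hours');
```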

How is backend_start greater than xact_start?

How can backend_start be more than 2 days later than xact_start/query_start? The 3rd session looks good, but the first 2 look weird. Is this possible? Would this mean anything?
pg=> select * from pg_catalog.pg_stat_activity where usename = 'etl_user' and state = 'active' and backend_xmin = 65201266;
datid | datname | pid |usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event| state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+----------+------------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------
16408 | pg| 37908 | 229661 | etl_user | PostgreSQL JDBC Driver | | | | 2021-04-20 21:36:22.540271+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.541472+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
16408 | pg| 37909 | 229661 | etl_user | PostgreSQL JDBC Driver | | | | 2021-04-20 21:36:22.540909+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.542134+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
16408 | pg| 3601 | 229661 | etl_user | PostgreSQL JDBC Driver | 10.175.130.142 | | 49832 | 2021-04-17 22:31:32.232008+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-17 22:31:32.317578+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | client backend
(3 rows)
It looks to me like those are parallel workers started up to help the leader, and they inherit the leader's xact_start, but not backend_start. It would help to see the rest of the columns in pg_stat_activity, and to know the version.
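One way to check this is to list leaders and workers side by side and compare their timestamps. A minimal sketch, assuming PostgreSQL 10 or later (where pg_stat_activity has the backend_type column); the started_after_xact alias is just an illustrative name:

```sql
-- Rows sharing the same xact_start belong to the same query; for a
-- parallel worker, backend_start is when the worker process was launched,
-- which can be long after the leader's transaction began.
SELECT pid,
       backend_type,
       backend_start,
       xact_start,
       backend_start - xact_start AS started_after_xact
FROM pg_stat_activity
WHERE backend_type IN ('client backend', 'parallel worker')
  AND state = 'active'
ORDER BY xact_start, backend_start;
```

On PostgreSQL 13 and later, the leader_pid column can be added to the select list to tie each worker to its leader directly.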
Yes, that looks impossible.
The only explanation that I have is that someone changed the system time since the sessions started.