How to shrink pg_toast table? - postgresql

I am running Postgres 9.3 on Mac OS X and I have a database which grew out of control. I used to have a table with one column that stored large data. Then I noticed that the database size grew to around 19 GB just because of a pg_toast table. I then removed that column and ran VACUUM to get the database down to a smaller size again, but it stayed the same. So how can I shrink the database size?
SELECT nspname || '.' || relname AS "relation"
,pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
results in
pg_toast.pg_toast_700305 | 18 GB
pg_toast.pg_toast_700305_index | 206 MB
public.catalog_hotelde_images | 122 MB
public.routes | 120 MB
VACUUM VERBOSE ANALYZE pg_toast.pg_toast_700305;
INFO: vacuuming "pg_toast.pg_toast_700305"
INFO: index "pg_toast_700305_index" now contains 9601330 row versions in 26329 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.06s/0.02u sec elapsed 0.33 sec.
INFO: "pg_toast_700305": found 0 removable, 0 nonremovable row versions in 0 out of 2393157 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.06s/0.07u sec elapsed 0.37 sec.
VACUUM
structure of the routes table
id serial NOT NULL,
origin_id integer,
destination_id integer,
total_time integer,
total_distance integer,
speed_id integer,
uid bigint,
created_at timestamp without time zone,
updated_at timestamp without time zone,
CONSTRAINT routes_pkey PRIMARY KEY (id)

You can use one of the two types of vacuuming: standard or full.
standard:
VACUUM table_name;
full:
VACUUM FULL table_name;
Keep in mind that VACUUM FULL locks the table it is working on until it's finished.
You may want to perform a standard vacuum more frequently on tables with frequent update/delete activity. It may not reclaim as much space as a vacuum full does, but the table remains available for SELECT, INSERT, UPDATE and DELETE while it runs, and it takes less time to complete.
In my case, when pg_toast (along with other tables) got out of control, a standard VACUUM made a slight difference but was not enough. I used VACUUM FULL to reclaim more disk space, which was very slow on large relations. I then decided to tune autovacuum and run a standard VACUUM more often on my frequently updated tables.
If you need to use VACUUM FULL, you should do it when your users are less active.
Also, do not turn off autovacuum.
You can get some additional information by adding verbose to your commands:
VACUUM FULL VERBOSE table_name;
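Since the bloat here lives in a TOAST table, one option is to find the ordinary table that owns it and run VACUUM FULL on that table, which rewrites its TOAST table as well. A minimal sketch (assuming the usual naming scheme where the numeric suffix of pg_toast_700305 is the owning table's OID):
-- Which table owns the bloated TOAST table?
SELECT c.oid::regclass AS owning_table
FROM pg_class c
WHERE c.reltoastrelid = 'pg_toast.pg_toast_700305'::regclass;
-- Rewrite that table (and its TOAST table); this takes an ACCESS EXCLUSIVE lock until it finishes
VACUUM FULL VERBOSE owning_table;   -- substitute the name returned above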

Try the following:
VACUUM FULL;

Related

Postgres: Autovacuum and autovacuum wraparound. When does each start?

I've got autovacuum_freeze_max_age, which is 200,000,000 by default.
I found the rule that an anti-wraparound autovacuum starts when:
age(relfrozenxid) > autovacuum_freeze_max_age
But when is the usual autovacuum started? How can I work out:
When is a usual autovacuum on a table started?
When does an autovacuum become an anti-wraparound autovacuum? Really only after age(relfrozenxid) > autovacuum_freeze_max_age?
As the documentation states, a normal autovacuum is triggered if the number of tuples obsoleted since the last VACUUM exceeds the "vacuum threshold". The vacuum threshold is defined as:
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
where the vacuum base threshold is autovacuum_vacuum_threshold, the vacuum scale factor is autovacuum_vacuum_scale_factor, and the number of tuples is pg_class.reltuples.
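For example, with the default settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2), a table with 1,000,000 rows reaches its vacuum threshold once 50 + 0.2 * 1,000,000 = 200,050 tuples have been obsoleted since the last vacuum.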
The table is also vacuumed if the number of tuples inserted since the last vacuum has exceeded the defined insert threshold, which is defined as:
vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
where the vacuum insert base threshold is autovacuum_vacuum_insert_threshold, and vacuum insert scale factor is autovacuum_vacuum_insert_scale_factor.
The second part applies only to PostgreSQL v13 and later.
Furthermore,
If the relfrozenxid value of the table is more than vacuum_freeze_table_age transactions old, an aggressive vacuum is performed to freeze old tuples and advance relfrozenxid; otherwise, only pages that have been modified since the last vacuum are scanned.
So an autovacuum worker run that was triggered by the normal mechanism can run as an anti-wraparound VACUUM if there are old enough rows in the table.
Finally,
Tables whose relfrozenxid value is more than autovacuum_freeze_max_age transactions old are always vacuumed
So if a table with old live tuples is never autovacuumed during normal processing, a special anti-wraparound autovacuum run is triggered for it, even if autovacuum is disabled. Such an autovacuum run is also forced if there are multixacts that are older than vacuum_multixact_freeze_table_age, see here. From PostgreSQL v14 on, if an unfrozen row in a table is older than vacuum_failsafe_age, an anti-wraparound autovacuum will skip index cleanup for faster processing.
Yes, this is pretty complicated.
I made a query which shows the dead-tuple threshold (when a plain autovacuum is started) and how far each table is from an anti-wraparound vacuum:
WITH dead_tup AS (
    SELECT st.schemaname || '.' || st.relname AS tablename,
           st.n_dead_tup AS dead_tup,
           current_setting('autovacuum_vacuum_threshold')::int8
             + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
             AS max_dead_tup,
           (current_setting('autovacuum_vacuum_threshold')::int8
             + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
             - st.n_dead_tup) AS left_for_tr_vacuum,
           st.last_autovacuum,
           c.relnamespace,
           c.oid
    FROM pg_stat_all_tables st,
         pg_class c
    WHERE c.oid = st.relid
      AND c.relkind IN ('r', 'm', 't')
      AND st.schemaname NOT LIKE 'pg_temp%'
)
SELECT c.oid::regclass AS table,
       current_setting('autovacuum_freeze_max_age')::int8 - age(c.relfrozenxid) AS xid_left,
       pg_relation_size(c.oid) AS relsize,
       dt.dead_tup,
       dt.max_dead_tup,
       dt.left_for_tr_vacuum,
       dt.last_autovacuum
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
LEFT JOIN dead_tup dt ON c.relnamespace = dt.relnamespace AND c.oid = dt.oid
WHERE c.relkind IN ('r', 'm', 't')
  -- AND age(c.relfrozenxid)::int8 > current_setting('autovacuum_freeze_max_age')::int8 * 0.8
  AND n.nspname NOT LIKE 'pg_temp%'
ORDER BY 2;

Understanding auto-vacuum and when it is triggered

We've noticed one of our tables growing considerably on PG 12. This table is the target of very frequent updates, with a mix of column types, including a very large text column (often with over 50kb of data). We run a local cron job that looks for rows older than X time and sets the text column to a null value (as we no longer need the data for that particular column after X amount of time).
We understand this does not actually free up disk space due to the MVCC model, but we were hoping that auto-vacuum would take care of this. To our surprise, the table continues to grow (now over 40gb worth) without auto-vacuum running. Running a vacuum manually has addressed the issue and we no longer see growth.
This has led me to investigate other tables, and I'm realising that I don't understand how auto-vacuum is triggered at all.
Here is my understanding of how it works, which hopefully someone can pick apart:
I look for tables that have a large amount of dead tuples in them:
select * from pg_stat_all_tables ORDER BY n_dead_tup desc;
I identify tableX with 33169557 dead tuples (n_dead_tup column).
I run a select * from pg_class ORDER BY reltuples desc; to check how many estimated rows there are on table tableX
I identify 1725253 rows via the reltuples column.
I confirm my autovacuum settings: autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2
I apply the formula threshold + pg_class.reltuples * scale_factor, so, 50 + 1725253 * 0.2 which returns 345100.6
It is my understanding that auto-vacuum will start on this table once ~345100 dead tuples are found. But tableX is already at a whopping 33169557 dead tuples! The last_autovacuum on this table was back in February.
Any clarification would be welcome.
Your algorithm is absolutely correct.
Here are some reasons why things could go wrong:
autovacuum runs, but is so slow that it never gets done
If you see no running autovacuum, that is not your problem.
autovacuum runs, but a long running open transaction prevents it from removing dead tuples
other tables need to be vacuumed more urgently (to avoid transaction ID wraparound), so the three workers are busy with other things
autovacuum runs, but conflicts with high concurrent locks on the table (LOCK TABLE, ALTER TABLE, ...)
This makes autovacuum give up and try again later.
autovacuum is disabled, perhaps only for that table
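A few quick checks for these cases, as a sketch against the standard catalogs and statistics views ('tableX' stands in for your table name):
-- Is an autovacuum worker currently running, and on what?
SELECT pid, xact_start, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';
-- Is a long-running open transaction keeping dead tuples from being removed?
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;
-- Has autovacuum been disabled for this table via a storage parameter?
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'tableX';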

Postgres: failed INSERT results in doubled database size - VACUUM FULL does not reclaim space

So this has been baffling me for days.
I have a Postgres database with the Timescale extension active. I have a table table_name which is partitioned by week on the field created_at (timestamp with time zone) using the Postgres partitioning feature. It had about 244M rows (~300GB). I wanted to take advantage of the TimescaleDB extension and move all the data from the partitions to a hypertable, so I followed the same-db migration steps: https://docs.timescale.com/latest/getting-started/migrating-data#same-db :
I created a table, table_name_timescale, like table_name but without the primary key on id, as the hypertable creation would not work with it. I ran SELECT create_hypertable('table_name_timescale', 'created_at'); and created the hypertable successfully.
Then I ran INSERT INTO table_name_timescale SELECT * FROM table_name;, which failed after a few hours because the server ran out of disk space. The initial db was ~350GB; the server had about 1TB in total, about half of which was free; table_name occupied ~300GB.
After that the database size went up to ~750GB and I cannot recover that space. table_name_timescale did not contain any rows, and neither did any of the Timescale-specific tables.
Among the things I've tried:
A VACUUM FULL VERBOSE on the table
VACUUM FULL VERBOSE on each partition
Example output of VACUUM FULL ANALYZE
VACUUM FULL ANALYZE VERBOSE table_name;
INFO: vacuuming "public.table_name"
INFO: "request_sets": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.10 s.
INFO: analyzing "public.table_name"
INFO: "table_name": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.table_name" inheritance tree
INFO: "table_name_y2020_w23": scanned 19639 of 2653099 pages, containing 163525 live rows and 0 dead rows; 19639 rows in sample, 22091146 estimated total rows
INFO: "table_name_y2020_w24": scanned 24264 of 3277907 pages, containing 201591 live rows and 0 dead rows; 24264 rows in sample, 27233620 estimated total rows
INFO: "table_name_y2020_w25": scanned 24970 of 3373276 pages, containing 205052 live rows and 0 dead rows; 24970 rows in sample, 27701121 estimated total rows
INFO: "table_name_y2020_w26": scanned 21279 of 2874745 pages, containing 170232 live rows and 0 dead rows; 21279 rows in sample, 22997960 estimated total rows
INFO: "table_name_y2020_w27": scanned 21227 of 2867687 pages, containing 169816 live rows and 0 dead rows; 21227 rows in sample, 22941496 estimated total rows
INFO: "table_name_y2020_w28": scanned 19487 of 2632630 pages, containing 155896 live rows and 0 dead rows; 19487 rows in sample, 21061040 estimated total rows
INFO: "table_name_y2020_w29": scanned 19134 of 2584880 pages, containing 153072 live rows and 0 dead rows; 19134 rows in sample, 20679040 estimated total rows
VACUUM
In general, VACUUM FULL does not find any dead tuples or release any space, which is expected since table_name only receives inserts and never any deletes or updates.
Ran a manual CHECKPOINT to release space from the WAL - I understand that this is not the correct way, but...
Retried the insert, but per partition this time (INSERT INTO table_name_timescale SELECT * FROM table_name_y2020_w1;), in case this would force the server to rewrite/release space. Some of the partitions were inserted successfully, but when we reached e.g. week 28 it once more failed with a panic and out of space. I didn't want to delete the data in the partitions that were inserted successfully because I want to be sure Timescale works as it is supposed to before deleting anything.
Discarded some old data that wasn't in use, table size went down to 155GB, 159GB with index and toast.
Ran SELECT drop_chunks(interval '1 days', 'table_name_timescale'); to drop anything that could have been inserted into timescale, which of course did nothing.
Dropped table_name_timescale, which again did nothing as it did not contain any data, but it did take a lot of time to complete.
Checked all Timescale-related tables to see if there was anything hanging around; also checked with VACUUM ANALYZE; nothing.
Checked for locks
Checked for long running transactions
Stopped - restarted the server
I will eventually back up, drop the db and restore (I've tried restoring on a different server and the db size was back to normal), but before I do that I would really like to understand what happened here. What I cannot explain is that the total table size is 159GB* (checked with https://wiki.postgresql.org/wiki/Disk_Usage), while the database size is 541GB (SELECT pg_size_pretty(pg_database_size('database_name'));). And of course I can see that under /var/lib/postgresql/data/base/24834 the files occupy the 541GB*.
(* after I deleted some old rows as mentioned above)
The output of:
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10;
"public.table_name_y2020_w25" "26 GB"
"public.table_name_y2020_w24" "26 GB"
"public.table_name_y2020_w26" "22 GB"
"public.table_name_y2020_w27" "22 GB"
"public.table_name_y2020_w23" "21 GB"
"public.table_name_y2020_w28" "21 GB"
"public.table_name_y2020_w29" "20 GB"
"public.models" "1564 MB"
"public.table_name_y2020_w20" "567 MB"
"public.table_name_y2020_w18" "97 MB"
Digging into the directory as suggested:
select oid, * from pg_database where oid = 24834;
-- output:
oid           | 24834
datname       | database_name
datdba        | 10
encoding      | 6
datcollate    | en_US.utf8
datctype      | en_US.utf8
datistemplate | FALSE
datallowconn  | TRUE
datconnlimit  | -1
datlastsysoid | 13043
datfrozenxid  | 30663846
datminmxid    | 293
dattablespace | 1663
datacl        |
And
select relname, relfilenode from pg_class where oid='24834';
outputs nothing.
SELECT pg_relation_filepath('table_name');
pg_relation_filepath
base/24834/343621
select * from pg_class where relfilenode='343621';
relname             | table_name
relnamespace        | 2200
reltype             | 24940
reloftype           | 0
relowner            | 10
relam               | 0
relfilenode         | 343621
reltablespace       | 0
relpages            | 0
reltuples           | 0
relallvisible       | 0
reltoastrelid       | 24943
relhasindex         | TRUE
relisshared         | FALSE
relpersistence      | p
relkind             | r
relnatts            | 26
relchecks           | 0
relhasoids          | FALSE
relhaspkey          | TRUE
relhasrules         | FALSE
relhastriggers      | TRUE
relhassubclass      | TRUE
relrowsecurity      | FALSE
relforcerowsecurity | FALSE
relispopulated      | TRUE
relreplident        | d
relispartition      | FALSE
relfrozenxid        | 30664717
relminmxid          | 293
relacl              |
reloptions          |
relpartbound        |
Disclaimer: Postgres internals noob here, still reading up on how things work. Also, yes, space is tight on the server and ~400GB of free space to do this transition is barely enough, but it is what it is unfortunately.. :)
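Not an answer from the thread, but one way to take this investigation further is to compare the large files under base/24834 with the relfilenodes the catalogs account for; a file with no matching pg_class row is an orphan (which can be left behind when a data-loading transaction dies with an out-of-space PANIC) and will not show up in any of the table-size queries above. A sketch:
-- Every relfilenode the catalog knows about, largest relations first,
-- to compare against the file names under base/24834/
SELECT c.oid::regclass AS relation,
       c.relfilenode,
       pg_relation_size(c.oid) AS bytes
FROM pg_class c
WHERE c.relfilenode <> 0
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;
-- Or look up one specific file name taken from the directory listing:
SELECT oid::regclass FROM pg_class WHERE relfilenode = 343621;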

Why is PostgreSQL waiting while executing VACUUM FULL on a table? 4TB of table data

I have a bloated table, its name is "role_info".
There are about 20K insert operations and a lot of update operations per day, there are no delete operations.
The table is about 4063GB now.
We have migrated the table to another database using a dump, and the new table is about 62GB, so the table in the old database is very seriously bloated.
PostgreSQL version: 9.5.4
The table schema is below:
CREATE TABLE "role_info" (
"roleId" bigint NOT NULL,
"playerId" bigint NOT NULL,
"serverId" int NOT NULL,
"status" int NOT NULL,
"baseData" bytea NOT NULL,
"detailData" bytea NOT NULL,
PRIMARY KEY ("roleId")
);
CREATE INDEX "idx_role_info_serverId_playerId_roleId" ON "role_info" ("serverId", "playerId", "roleId");
The average size of the 'detailData' field is about 13KB per row.
There are some SQL execution results below:
1)
SELECT
relname AS name,
pg_stat_get_live_tuples(c.oid) AS lives,
pg_stat_get_dead_tuples(c.oid) AS deads
FROM pg_class c
ORDER BY deads DESC;
Execution Result:
2)
SELECT *,
Pg_size_pretty(total_bytes) AS total,
Pg_size_pretty(index_bytes) AS INDEX,
Pg_size_pretty(toast_bytes) AS toast,
Pg_size_pretty(table_bytes) AS TABLE
FROM (SELECT *,
total_bytes - index_bytes - Coalesce(toast_bytes, 0) AS
table_bytes
FROM (SELECT c.oid,
nspname AS table_schema,
relname AS TABLE_NAME,
c.reltuples AS row_estimate,
Pg_total_relation_size(c.oid) AS total_bytes,
Pg_indexes_size(c.oid) AS index_bytes,
Pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE relkind = 'r') a
WHERE table_schema = 'public'
ORDER BY total_bytes DESC) a;
Execution Result:
3)
I have tried to VACUUM FULL the table "role_info", but it seemed to be blocked by some other process and didn't execute at all.
select * from pg_stat_activity where query like '%VACUUM%' and query not like '%pg_stat_activity%';
Execution Result:
select * from pg_locks;
Execution Result:
Here are the vacuum-related parameters:
I have two questions:
How do I deal with the table bloat? autovacuum does not seem to be working.
Why is the VACUUM FULL blocked?
With your autovacuum settings, it will sleep for 20ms once for every 10 pages (200 cost_limit / 20 cost_dirty) it dirties. Even more, because there will also be cost_hit and cost_miss as well. At that rate it would take over 12 days to autovacuum a 4063GB table which mostly needs its pages dirtied. That is just the throttling time, not counting the actual work time, nor the repeated scanning of the indexes. So the actual run time could be months. The chances of autovacuum getting to run to completion in one sitting without being interrupted by something are pretty low. Does your database get restarted often? Do you build and drop indexes on this table a lot, or add and drop partitions, or run ALTER TABLE?
Note that in v12, the default setting of autovacuum_vacuum_cost_delay was lowered by a factor of 10. This is not just because of some change to the code in v12; it was because we realized the default setting was just not sensible on modern hardware. So it would probably make sense to backport this change to your existing database, if not go even further. Before v12, you can't lower it to less than 1 ms, but you could lower it to 1 ms and also either increase autovacuum_vacuum_cost_limit or lower the vacuum_cost_page_* settings.
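For illustration, relaxing that throttling could look roughly like this (the parameter names are the standard ones; the values are only examples, not a recommendation from this answer):
-- Globally (ALTER SYSTEM exists since 9.4; reload afterwards)
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;
SELECT pg_reload_conf();
-- Or only for the bloated table, via storage parameters
ALTER TABLE role_info SET (autovacuum_vacuum_cost_delay = 1,
                           autovacuum_vacuum_cost_limit = 1000);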
Now this analysis is based on the table already being extremely bloated. Why didn't autovacuum prevent it from getting this bloated in the first place, back when the table was small enough to be autovacuumed in a reasonable time? That is hard to say. We really have no evidence as to what happened back then. Maybe your settings were even more throttled than they are now (although unlikely as it looks like you just accepted the defaults), maybe it was constantly interrupted by something. What is the "autovacuum_count" from pg_stat_all_tables for the table and its toast table?
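That can be read from the statistics views along these lines (a sketch using the table name from the question):
SELECT relname, last_autovacuum, autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'role_info'
   OR relid = (SELECT reltoastrelid FROM pg_class WHERE relname = 'role_info');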
Why is the VACUUM FULL blocked?
Because that is how it works, as documented. That is why it is important to avoid getting into this situation in the first place. VACUUM FULL needs to swap around filenodes at the end, and needs an AccessExclusive lock to do that. It could take a weaker lock at first and then try to upgrade to AccessExclusive later, but lock upgrades have a strong deadlock risk, so it takes the strongest lock it needs up front.
You need a maintenance window where no one else is using the table. If you think you are already in such window, then you should look at the query text for the process doing the blocking. Because the lock already held is ShareUpdateExclusive, the thing holding it is not a normal query/DML, but some kind of DDL or maintenance operation.
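On 9.5 there is no pg_blocking_pids() yet (it arrived in 9.6), but a rough sketch of finding who holds a conflicting lock on the same relation is:
SELECT waiting.pid  AS waiting_pid,
       holder.pid   AS blocking_pid,
       holder.mode  AS held_lock_mode,
       a.query      AS blocking_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.relation = waiting.relation
 AND holder.granted
 AND holder.pid <> waiting.pid
JOIN pg_stat_activity a ON a.pid = holder.pid
WHERE NOT waiting.granted;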
If you can't take a maintenance window now, then you can at least do a manual VACUUM without the FULL. This takes a much weaker lock. It probably won't shrink the table dramatically, but should at least free up space for internal reuse so it stops getting even bigger while you figure out when you can schedule a maintenance window or what your other next steps are.
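For example, the non-FULL variant can be run right away:
-- Plain VACUUM only takes a SHARE UPDATE EXCLUSIVE lock, so reads and writes keep working;
-- it frees space for reuse inside the table rather than returning it to the operating system
VACUUM (VERBOSE, ANALYZE) role_info;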

Postgres slow distinct query for multiple columns

I have a very simple query that is taking way too long to run.
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
What indexes do I need to add to speed this up? I ran a simple vacuum; command and added the following index, but neither helped.
CREATE INDEX tbl_idx ON tbl1(col1,col2,col3,col4);
The table has 400k rows. In fact counting them is taking extremely long as well. Running a simple
SELECT count(*) from tbl1;
is taking 8 seconds. So it's possible my problem is with vacuuming or reindexing or something else - I'm not sure.
Here is the explain command
EXPLAIN SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
QUERY PLAN
---------------------------------------------------------------------------------
Unique (cost=3259846.80..3449267.51 rows=137830 width=25)
-> Sort (cost=3259846.80..3297730.94 rows=15153657 width=25)
Sort Key: col1, col2, col3, col4
-> Seq Scan on tbl1 (cost=0.00..727403.57 rows=15153657 width=25)
(4 rows)
Edit: I'm currently running vacuum full;, which will hopefully fix the issue, and then maybe someone can give me some pointers on where I went wrong. It is several hours in and still going as far as I can tell. I did run
select relname, last_autoanalyze, last_autovacuum, last_vacuum, n_dead_tup from pg_stat_all_tables where n_dead_tup >0;
and the table has nearly 16 million n_dead_tup rows.
My data doesn't change that frequently so I ended up creating a materialized view
CREATE MATERIALIZED VIEW tbl1_distinct_view AS SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
that I refresh with a cronjob once a day at 6am
0 6 * * * psql -U mydb mydb -c 'REFRESH MATERIALIZED VIEW tbl1_distinct_view;'
Try forcing the database to use your index:
set enable_seqscan=off ;
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
set enable_seqscan=on ;
VACUUM and VACUUM FULL are two commands that sound the same but have very different effects.
VACUUM scans a table for tuples that it no longer needs, so that the space can be overwritten during INSERT or UPDATE statements. This command only looks at dead rows and does not "defragment" the table - it leaves the space usage the same, but simply marks that space as reusable.
VACUUM FULL looks at every row, and reclaims the space left by deleted rows and dead tuples, essentially "defragmenting" the table. If this is done on a live table, it can take a very long time, and can result in heavy weight locks, increased IO, and index bloat.
I imagine what you need is a VACUUM followed by an ANALYZE, which will rebuild your statistics for each table, improving index performance. These should be performed reasonably regularly in low-usage times for a database. Only if you have a lot of space to reclaim (due to lots of DELETE statements) should you use VACUUM FULL.
Anyhow, since you've run a VACUUM FULL, once it is complete you should run an ANALYZE on the database, followed by a REINDEX (of the database), and then an EXPLAIN on your query again; you should notice an improvement.
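Putting that sequence into concrete commands (names taken from the question; REINDEX takes heavy locks, so run it in a quiet window):
-- Routine maintenance: reclaim dead tuples and refresh planner statistics
VACUUM ANALYZE tbl1;
-- After the VACUUM FULL that is already running has finished:
ANALYZE;
REINDEX DATABASE mydb;
EXPLAIN SELECT DISTINCT col1, col2, col3, col4 FROM tbl1;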