I have big tables that receive only inserts and selects, and when autovacuum runs on these tables the system becomes very slow. I have switched off autovacuum for the specific tables:
ALTER TABLE ag_event_20141004_20141009 SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
ALTER TABLE ag_event_20141014_20141019 SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
Some time after this, I see:
select pid, waiting, xact_start, query_start,query from pg_stat_activity order by query_start;
18092 | f | 2014-11-04 22:21:05.95512+03 | 2014-11-04 22:21:05.95512+03 | autovacuum: VACUUM public.ag_event_20141004_20141009 (to prevent wraparound)
19877 | f | 2014-11-04 22:22:05.889182+03 | 2014-11-04 22:22:05.889182+03 | autovacuum: VACUUM public.ag_event_20141014_20141019 (to prevent wraparound)
What should I do to switch off autovacuuming for these tables entirely?
The key here is:
(to prevent wraparound)
This means Postgres must autovacuum in order to free up transaction identifiers.
You cannot entirely disable this type of autovacuum, but you can reduce its frequency by tuning the autovacuum_freeze_max_age and vacuum_freeze_min_age parameters.
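To see how urgent the forced freeze vacuums are, you can check the transaction-ID age of each table. Note that a per-table autovacuum_freeze_max_age can only be set lower than the global value, so raising the threshold has to happen in postgresql.conf. A sketch (the suggested value is illustrative):

```sql
-- How close is each table to the forced anti-wraparound vacuum?
-- The default trigger is autovacuum_freeze_max_age = 200 million transactions.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;

-- In postgresql.conf, raising the global threshold postpones (but never
-- avoids) the forced vacuum:
-- autovacuum_freeze_max_age = 1000000000
```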
I found this query to see if autovacuum is disabled on a given table. I have autovacuum and the statistics collector enabled on the PostgreSQL server.
SELECT reloptions FROM pg_class WHERE relname='my_table';
reloptions
----------------------------
{autovacuum_enabled=false}
(1 row)
but what I get is just a NULL value for the given table.
Does that mean autovacuum is not enabled on any of the tables I query? Please advise.
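For what it's worth, a NULL reloptions simply means no per-table storage parameters were ever set, so the table follows the global autovacuum setting. A small sketch to make that explicit ('my_table' is a placeholder name):

```sql
-- NULL reloptions = no per-table overrides; the global settings apply.
SELECT relname,
       COALESCE(reloptions::text, 'no overrides; global settings apply') AS options
FROM pg_class
WHERE relname = 'my_table';  -- placeholder table name
```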
I found that the bloat ratio of feedback_entity is 48%:
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | feedback_entity | 5743878144 | 2785599488 | 48.4968416488746 | 100 | 2785599488 | 48.4968416488746 | f
but when I check the autovacuum settings, the scale factor is only 10%:
stackdb=> show autovacuum_vacuum_scale_factor;
autovacuum_vacuum_scale_factor
--------------------------------
0.1
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
autovacuum_vacuum_threshold
-----------------------------
50
(1 row)
Also:
The autovacuum setting is on.
Autovacuum for the mentioned table runs regularly at the defined threshold.
My question is: when autovacuum runs at 10% dead tuples, why would the bloat grow to 48%? I have seen similar behaviour in hundreds of databases/tables. Why does table bloat keep increasing instead of coming down after every vacuum?
The query that you used to calculate the table bloat is unreliable. To determine the actual bloat, use the pgstattuple extension and query like this:
SELECT * FROM pgstattuple('public.feedback_entity');
But the table may really be bloated. There are two major reasons for that:
autovacuum runs and finishes in a reasonable time, but it cannot clean up the dead tuples. That may be because there is a long-running open transaction, an abandoned replication slot or a prepared transaction. See this article for details.
autovacuum runs too slow, so that dead rows are generated faster than it can clean them up. The symptoms are lots of dead tuples in pg_stat_user_tables and autovacuum processes that keep running forever. The straightforward solution is to use ALTER TABLE to increase autovacuum_vacuum_cost_limit or reduce autovacuum_vacuum_cost_delay for the afflicted table. An alternative approach, if possible, is to use HOT updates.
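The per-table tuning mentioned above could look roughly like this; the table name comes from the question, but the values are illustrative and workload-dependent:

```sql
-- Let autovacuum do more work per cost cycle and pause less between cycles.
ALTER TABLE feedback_entity SET (
    autovacuum_vacuum_cost_limit = 2000,
    autovacuum_vacuum_cost_delay = 2    -- milliseconds
);
```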
Index bloat is reaching 57%, while table bloat is only 9% and autovacuum_vacuum_scale_factor is only 10%.
What is more surprising is that even the primary key has 57% bloat. My understanding is that since my primary key is an auto-incrementing, single-column key, after 10% of the table's tuples are dead, the primary key index should also have 10% dead tuples.
Now when autovacuum runs at 10% dead tuples, it cleans them up. The dead tuple space then becomes bloat that should be reused by new updates and inserts. But this isn't happening in my database; the bloat size keeps increasing.
FYI:
Index Bloat:
current_database | schemaname | tblname | idxname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | data_entity | data_entity_pkey | 2766848000 | 1704222720 | 61.5943745373797 | 90 | 1585192960 | 57.2923760177646
Table Bloat:
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | data_entity | 10106732544 | 1007288320 | 9.96650812332014 | 100 | 1007288320 | 9.96650812332014 | f
Autovacuum Settings:
stackdb=> show autovacuum_vacuum_scale_factor;
autovacuum_vacuum_scale_factor
--------------------------------
0.1
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
autovacuum_vacuum_threshold
-----------------------------
50
(1 row)
Note:
autovacuum is on
autovacuum is running successfully at defined intervals.
PostgreSQL is running version 10.6. The same issue has been found with version 12.x.
First: an index bloat of 57% is totally healthy. Don't worry.
Indexes become more bloated than tables because empty space cannot be reused as freely as it can be in a table. The table, also known as the “heap”, has no predetermined ordering: a new row written by an INSERT or UPDATE ends up in the first page with enough free space, so it is easy to keep bloat low as long as VACUUM does its job.
B-tree indexes are different: their entries have a certain ordering, so the database is not free to choose where to put a new entry. It may have to go into a page that is already full, causing a page split, while pages elsewhere in the index are almost empty.
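If the pgstattuple extension is installed, its pgstatindex function reports the actual leaf density of a B-tree, which is a more reliable measure than the bloat-estimation query (the index name is taken from the question's output):

```sql
-- Requires: CREATE EXTENSION pgstattuple;
-- An avg_leaf_density near the index's fillfactor of 90 would mean
-- there is no real bloat at all.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('data_entity_pkey');
```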
I have a large database with the largest tables having more than 30 million records. The database server is a dedicated machine with 64 cores and 128 GB RAM, running Ubuntu and PostgreSQL 12, so it is more powerful than we normally need. The server receives around 300-400 new records every second.
The problem is that after about a week or 10 days of use the database becomes extremely slow, so we have to perform VACUUM FULL ANALYZE, after which everything goes back to normal. But we have to put our server in maintenance mode and perform this operation every week, which is a pain.
I came up with the idea that we don't need a VACUUM FULL and can just run ANALYZE on the database, since it can run in parallel, but this didn't work: there was no performance gain after running it. Even when I run a plain VACUUM on the whole database and then ANALYZE, it still doesn't give the kind of performance boost we get from VACUUM FULL ANALYZE.
I know that VACUUM FULL copies the data from the old table to a new table and deletes the old one. But what else does it do?
Update:
So I have also reindexed the 15 largest tables to confirm whether this would speed up the database, but it didn't.
So I had to execute VACUUM FULL ANALYZE, as I didn't see any other way. Now I am trying to identify the slow queries.
Thanks to jjanes, I was able to enable track_io_timing and also identified a few queries where indexes can be added. I am using it like this:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
And I get this result:
userid | 10
dbid | 16401
queryid | -3264485807545194012
query | update events set field1 = $1, field2 = $2 , field3= $3, field4 = $4 , field5 =$5 where id = $6
calls | 104559
total_time | 106180828.60536088
min_time | 3.326082
max_time | 259055.09376800002
mean_time | 1015.5111334783633
stddev_time | 1665.0715182035976
rows | 104559
shared_blks_hit | 4456728574
shared_blks_read | 4838722113
shared_blks_dirtied | 879809
shared_blks_written | 326809
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 15074237.05887792
blk_write_time | 15691.634870000113
This query simply updates one record, and the table has around 30 million records.
Question: this query already uses an index; can you please advise what the next step should be and why it is slow? Also, what I/O information does this output show?
As you say, VACUUM FULL is an expensive command. PostgreSQL's secret weapon is autovacuum, which monitors database statistics and targets tables with dead tuples. Read about how to tune it for the database as a whole, and possibly for big tables.
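One way to apply that per-table tuning is to make autovacuum trigger earlier on the busiest tables, so bloat never accumulates to the point where VACUUM FULL looks necessary. A sketch (the events table name comes from the question's pg_stat_statements output; the thresholds are illustrative):

```sql
-- Trigger autovacuum much earlier than the default 20% of the table.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead tuples
    autovacuum_vacuum_threshold    = 1000   -- plus a fixed floor of 1000 rows
);
```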
In pgAdmin, whenever a table's statistics are out-of-date, it prompts:
Running VACUUM recommended
The estimated rowcount on the table schema.table deviates
significantly from the actual rowcount. You should run VACUUM ANALYZE
on this table.
I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.
Let's say I'm making a web-based system in Java, how do I detect if a table is out-of-date, so that I can show a prompt like the one in pgAdmin?
Because of the nature of my application, here are a few rules I have to follow:
I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.
I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)
I can't run vacuum/analyze every time to make it up-to-date.
When a user selects a table, I need to show the prompt that the table is outdated when there are any updates to this table (such as drop, insert, and update) that are not reflected in pg_stats and pg_statistic.
It seems that it's not feasible by analyzing timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table hasn't been analyzed before, I can check if it has a timestamp in last_analyze to find out whether the table is up-to-date. Using this method, however, I can't detect if the table is up-to-date when there's already a timestamp. In other words, no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables is always for the first analyze (assuming the autovacuum flag is off). Therefore, I can only show the "Running VACUUM recommended" prompt for the first time.
It's also not feasible by comparing the last_analyze timestamp to the current timestamp. There might not be any updates to the table for days. And there might be tons of updates in one hour.
Given this scenario, how can I always tell if the statistics of a table are up-to-date?
Check the system catalogs.
=> SELECT schemaname, relname, last_autoanalyze, last_analyze FROM pg_stat_all_tables WHERE relname = 'accounts';
schemaname | relname | last_autoanalyze | last_analyze
------------+----------+-------------------------------+--------------
public | accounts | 2022-11-22 07:49:16.215009+00 |
(1 row)
=>
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW
All kinds of useful information in there:
test=# \d pg_stat_all_tables
View "pg_catalog.pg_stat_all_tables"
Column | Type | Modifiers
-------------------+--------------------------+-----------
relid | oid |
schemaname | name |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | bigint |
idx_tup_fetch | bigint |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |
n_tup_hot_upd | bigint |
n_live_tup | bigint |
n_dead_tup | bigint |
last_vacuum | timestamp with time zone |
last_autovacuum | timestamp with time zone |
last_analyze | timestamp with time zone |
last_autoanalyze | timestamp with time zone |
vacuum_count | bigint |
autovacuum_count | bigint |
analyze_count | bigint |
autoanalyze_count | bigint |
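On PostgreSQL 9.4 and later this view also has an n_mod_since_analyze column, which counts rows modified since the last ANALYZE and is arguably the most direct staleness signal for the original question. A sketch (the 10% threshold is an arbitrary illustrative choice):

```sql
-- Flag tables whose modifications since the last ANALYZE exceed
-- 10% of the live row count.
SELECT relname,
       n_mod_since_analyze,
       n_live_tup,
       n_mod_since_analyze > greatest(n_live_tup, 1) * 0.1 AS stats_stale
FROM pg_stat_all_tables
WHERE schemaname = 'public';
```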
You should not have to worry about vacuuming in your application. Instead, you should have the autovacuum process configured on your server (in postgresql.conf); the server then takes care of VACUUM and ANALYZE based on its own internal statistics. You can configure how often it runs and the threshold variables that trigger it.