PostgreSQL 9.3 autovacuum not keeping up despite aggressive settings

I'm trying to get PostgreSQL to keep tables much cleaner, but even after tweaking resource limits autovacuum hardly seems to keep up at all.
Even after setting
ALTER TABLE veryactivetable SET (autovacuum_vacuum_threshold = 10000);
pg_stat_user_tables for veryactivetable still shows 63356 n_dead_tup, and both last_autoanalyze and last_autovacuum are more than 24 hours old.
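(For reference, those values can be read from pg_stat_user_tables with a query along these lines, shown here only as a sketch:)
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'veryactivetable';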
postgresql.conf settings:
shared_buffers = 7680MB
work_mem = 39321kB
maintenance_work_mem = 1920MB
vacuum_cost_delay = 0
vacuum_cost_page_hit = 1000
vacuum_cost_page_miss = 1000
vacuum_cost_page_dirty = 2000
vacuum_cost_limit = 7000
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 10
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = 50ms
autovacuum_vacuum_cost_limit = 7000

Set autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor to a higher value, probably best back to the default. No need to let autovacuum run more often than necessary, particularly in your situation. The idea is to make it finish fast once it starts.
Set autovacuum_naptime higher, closer to the original default of one minute.
Restore autovacuum_max_workers to 3 unless you have a lot of databases or a lot of tables.
What you should do to make autovacuum finish as fast as possible (which is the goal) is to set autovacuum_vacuum_cost_delay to 0.
If you have just a few very busy tables, it is best to set these parameters on those tables, as you already do in your question.
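For example, a per-table override along these lines (table name taken from the question) disables throttling for just that table:
ALTER TABLE veryactivetable SET (autovacuum_vacuum_cost_delay = 0);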

Related

PostgreSQL why table bloat ratio is higher than autovacuum_vacuum_scale_factor

I found that the bloat ratio of feedback_entity is 48%:
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | feedback_entity | 5743878144 | 2785599488 | 48.4968416488746 | 100 | 2785599488 | 48.4968416488746 | f
but when I check the autovacuum settings, autovacuum_vacuum_scale_factor is only 10%:
stackdb=> show autovacuum_vacuum_scale_factor;
autovacuum_vacuum_scale_factor
--------------------------------
0.1
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
autovacuum_vacuum_threshold
-----------------------------
50
(1 row)
Also:
Autovacuum is on.
Autovacuum for the mentioned table runs regularly at the defined threshold.
My question is: when autovacuum runs at 10% of dead tuples, why would the bloat grow to 48%? I have seen similar behaviour in hundreds of databases/tables. Why does table bloat keep increasing and never come down after a vacuum?
The query that you used to calculate the table bloat is unreliable. To determine the actual bloat, use the pgstattuple extension and a query like this:
SELECT * FROM pgstattuple('public.feedback_entity');
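If the extension is not installed in the database yet, it can be created first (requires appropriate privileges):
CREATE EXTENSION IF NOT EXISTS pgstattuple;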
But the table may really be bloated. There are two major reasons for that:
autovacuum runs and finishes in a reasonable time, but it cannot clean up the dead tuples. That may be because there is a long-running open transaction, an abandoned replication slot or a prepared transaction. See this article for details.
autovacuum runs too slow, so that dead rows are generated faster than it can clean them up. The symptoms are lots of dead tuples in pg_stat_user_tables and autovacuum processes that keep running forever. The straightforward solution is to use ALTER TABLE to increase autovacuum_vacuum_cost_limit or reduce autovacuum_vacuum_cost_delay for the afflicted table. An alternative approach, if possible, is to use HOT updates.
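Both cases can be sketched like this: the first queries look for open transactions, replication slots and prepared transactions that hold back cleanup, and the ALTER TABLE speeds up autovacuum on the afflicted table (the cost limit value is only illustrative):
SELECT pid, state, xact_start FROM pg_stat_activity ORDER BY xact_start NULLS LAST;
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
SELECT gid, prepared FROM pg_prepared_xacts;
ALTER TABLE feedback_entity SET (autovacuum_vacuum_cost_limit = 2000);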

Why does Postgres VACUUM FULL ANALYZE give a performance boost but VACUUM ANALYZE does not

I have a large database with the largest tables having more than 30 million records. The database server is a dedicated server with 64 cores and 128 GB RAM, running Ubuntu and Postgres 12, so the server is more powerful than we normally need. The server receives around 300-400 new records every second.
The problem is that after about a week or 10 days of use the database becomes extremely slow, so we have to perform VACUUM FULL ANALYZE, and after this everything goes back to normal. But we have to put our server into maintenance mode and then perform this operation every week, which is a pain.
I came up with the idea that we don't need a VACUUM FULL and can just run ANALYZE on the database, as it can run in parallel, but this didn't work. There were no performance gains after running it. Even when I run a plain VACUUM on the whole database and then run ANALYZE after it, it still doesn't give the kind of performance boost that we get from VACUUM FULL ANALYZE.
I know that VACUUM FULL copies the data from the old table to a new table and deletes the old table. But what else does it do?
Update:
So I have also reindexed the 15 largest tables, to check whether this would speed up the database. But this also didn't work.
So I had to execute VACUUM FULL ANALYZE, as I didn't see any other way. Now I am trying to identify the slow queries.
Thanks to jjanes, I was able to enable track_io_timing and also identified a few queries where indexes can be added. I am querying pg_stat_statements like this:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
And I get this result:
userid | 10
dbid | 16401
queryid | -3264485807545194012
query | update events set field1 = $1, field2 = $2 , field3= $3, field4 = $4 , field5 =$5 where id = $6
calls | 104559
total_time | 106180828.60536088
min_time | 3.326082
max_time | 259055.09376800002
mean_time | 1015.5111334783633
stddev_time | 1665.0715182035976
rows | 104559
shared_blks_hit | 4456728574
shared_blks_read | 4838722113
shared_blks_dirtied | 879809
shared_blks_written | 326809
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 15074237.05887792
blk_write_time | 15691.634870000113
This query simply updates one record, and the table size is around 30 million records.
Question: This query already uses an index; can you please advise what the next step should be, and why is it slow? Also, what I/O information does this output show?
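For orientation, the share of time spent reading blocks can be computed directly from the numbers above: blk_read_time / total_time = 15074237 / 106180829, roughly 14%, and shared_blks_read slightly exceeds shared_blks_hit, so about half of the block accesses missed shared_buffers. A query sketch for this arithmetic (the queryid is taken from the output above):
SELECT round((blk_read_time / total_time * 100)::numeric, 1) AS pct_time_reading,
       round(100.0 * shared_blks_hit / (shared_blks_hit + shared_blks_read), 1) AS cache_hit_pct
FROM pg_stat_statements
WHERE queryid = -3264485807545194012;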
As you say, VACUUM FULL is an expensive command. PostgreSQL's secret weapon is autovacuum, which monitors database statistics and attempts to target tables with dead tuples. Read about how to tune it for the database as a whole, and possibly per table for the big ones.
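A sketch of per-table tuning for a large, busy table (the events table from the question; the values are only illustrative starting points):
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead instead of the 20% default
    autovacuum_vacuum_cost_delay = 1        -- throttle autovacuum less on this table
);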

PostgreSQL 11 space reuse under high delete/update rate

We are evaluating PostgreSQL 11.1 for our production.
With a system doing 4251 updates per second, ~1000 deletes per second, ~3221 inserts per second and 1 billion transactions per day, we face a challenge: PostgreSQL does not reuse the space freed by deletes/updates, and tables constantly increase in size.
We configured aggressive autovacuum settings to avoid the wraparound situation, and also tried adding periodic execution of VACUUM ANALYZE and plain VACUUM,
and still there is no space reuse. (Only VACUUM FULL or pg_repack release space to the operating system, but that is not reuse.)
Following are our vacuum settings:
autovacuum | on
vacuum_cost_limit | 6000
autovacuum_analyze_threshold | 50
autovacuum_vacuum_threshold | 50
autovacuum_vacuum_cost_delay | 5
autovacuum_max_workers | 32
autovacuum_freeze_max_age | 2000000
autovacuum_multixact_freeze_max_age | 2000000
vacuum_freeze_table_age | 20000
vacuum_multixact_freeze_table_age | 20000
vacuum_cost_page_dirty | 20
vacuum_freeze_min_age | 10000
vacuum_multixact_freeze_min_age | 10000
log_autovacuum_min_duration | 1000
autovacuum_naptime | 10
autovacuum_analyze_scale_factor | 0
autovacuum_vacuum_scale_factor | 0
vacuum_cleanup_index_scale_factor | 0
vacuum_cost_delay | 0
vacuum_defer_cleanup_age | 0
autovacuum_vacuum_cost_limit | -1
autovacuum_work_mem | -1
Your requirements are particularly hard for PostgreSQL.
You should set autovacuum_vacuum_cost_delay to 0 for that table.
Reset autovacuum_max_workers and autovacuum_naptime back to their default values.
Reset autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor to their default values or slightly lower values.
Your problem is not that autovacuum does not run often enough, the problem is rather that it is too slow to keep up.
Even with that you might only be able to handle this workload with HOT updates:
Make sure that the attributes that are updated a lot are not part of any index.
Create the table with a fillfactor below 100, say 70.
HOT update often avoids the need for VACUUM and the need to update indexes.
Check the n_tup_hot_upd column of pg_stat_user_tables to see if it works.
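A minimal sketch of both steps, assuming a placeholder table name busy_table (note that changing fillfactor on an existing table only affects newly written pages; existing data is rearranged only by a table rewrite such as VACUUM FULL):
ALTER TABLE busy_table SET (fillfactor = 70);
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname = 'busy_table';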

Do I need to manually VACUUM temporary tables in PostgreSQL?

Consider I have an application server which:
uses connection pooling (with a relatively high number of allowed idle connections),
can run for months, and
makes heavy use of temporary tables (which are not DROP'ped on COMMIT).
The above means that I may have N "eternal" database sessions "holding" N temporary tables, which will only be dropped when the server is restarted.
I'm well aware that the autovacuum daemon can't access those temporary tables.
My question is: if I make frequent INSERTs to and DELETEs from temporary tables, and the tables are supposed to "live" for a long time, do I need to manually VACUUM those tables after a deletion, or would a single manual ANALYZE be enough?
Currently, if I execute
select
n_tup_del,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
vacuum_count,
analyze_count
from
pg_stat_user_tables
where
relname = '...'
order by
n_dead_tup desc;
I see that vacuum_count is always zero:
n_tup_del n_live_tup n_dead_tup n_mod_since_analyze vacuum_count analyze_count
64 3 64 0 0 16
50 1 50 26 0 3
28 1 28 2 0 5
7 1 7 4 0 4
3 1 3 2 0 4
1 6 1 8 0 2
0 0 0 0 0 0
which may mean that manual VACUUM is indeed required.
https://www.postgresql.org/docs/current/static/sql-commands.html
ANALYZE — collect statistics about a database
VACUUM — garbage-collect and optionally analyze a database
VACUUM can optionally also analyze. So if all you want is fresh statistics, just ANALYZE. If you want to "recover" unused rows, then VACUUM. If you want both, use VACUUM ANALYZE.
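For example (my_temp_table is a placeholder; VACUUM on a temporary table only works from the session that owns it):
VACUUM (ANALYZE) my_temp_table;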
We had an application which was running for 24+ hours, using a lot of long-lived, quite heavily updated temp tables, and we used ANALYZE on them. But there is a problem with VACUUM: if you try to use it in a function, you get an error:
ERROR: VACUUM cannot be executed from a function or multi-command string
CONTEXT: SQL statement "vacuum xxxxxx"
PL/pgSQL function inline_code_block line 4 at SQL statement
SQL state: 25001
But later we discovered that temp tables were actually not so advantageous, at least for our app. Technically they are normal tables existing as data files on disk in the temporary tablespace (either pg_default, or whatever you set in postgresql.conf). But they use only the so-called temp_buffers; they are not loaded into shared_buffers. So you have to set temp_buffers properly and rely more on the Linux page cache. And, as you already mentioned, the autovacuum daemon "does not see" them. Therefore we later switched to using normal tables.
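If you stay with temp tables, temp_buffers can be raised per session; note it only takes effect if set before the session first touches a temporary table (the value below is just illustrative):
SET temp_buffers = '256MB';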

Statement is 250 times slower than usual

I am running a series of very long statements, moving a lot of data. The statement in question looks like something along the lines of this:
CREATE TABLE a (...);
WITH cte_1 AS (...),
cte_2 AS (...)
INSERT INTO a (...)
SELECT ....
This creates the table and populates it with roughly 60 000 large rows. Usually this statement takes around 1 second to execute. "Usually" means in the exact same environment (all tables and data are created by a script with no manual interaction, so all instances of the environment are identical in data and data structure) but on a different machine: there it takes just 1 second.
But on a new machine that I have, this statement suddenly takes 4.5 minutes to complete. During that time PostgreSQL takes up 100% of a CPU core. During that time, if I open a new connection, say with DBeaver, and run the exact same query with a single change (creating and inserting into table b instead, from the exact same data sources), it takes 0.8 seconds to complete, while the first query is still running.
So it's definitely not the script, but rather something about the inner workings of Postgresql, or its config. Which is why I'm sharing it, instead of the code.
Oh, and this query:
SELECT
pid, datname, usename,
application_name, query, state,
to_char(current_timestamp - query_start, 'HH24:MI:SS') AS running_for
FROM pg_stat_activity;
outputs 2 DBeaver processes (SHOW search_path which is idle, and the query above), and the slow query:
9736 my_db my_user psql active 00:02:42
Out of hundreds of statements, in various schemas, with various complexity, this is the only one affected. The only thing that changed to make it slow is the new OS (Ubuntu 17.04), and probably this new config, since the old one was lost when my Mac died.
data_directory = '/var/lib/postgresql/9.6/main'
hba_file = '/etc/postgresql/9.6/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.6/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.6-main.pid'
listen_addresses = '*'
port = 5432
max_connections = 40
unix_socket_directories = '/var/run/postgresql'
shared_buffers = 4GB
temp_buffers = 2GB
work_mem = 512MB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
wal_level = minimal
fsync = off
synchronous_commit = off
full_page_writes = off
wal_buffers = 16MB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
seq_page_cost = 1.0
random_page_cost = 1.5
effective_cache_size = 12GB
default_statistics_target = 500
logging_collector = on
log_directory = 'pg_log'
log_filename = 'query.log'
log_min_duration_statement = 0
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '2s'
auto_explain.log_nested_statements = true
auto_explain.log_verbose = true
autovacuum = on
autovacuum_max_workers = 1
datestyle = 'iso, mdy'
timezone = 'UTC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 2048
shared_preload_libraries = 'cstore_fdw'
Per request, this is an old backup that I had of another config, where I manually adjusted just one item (shared_buffers) and the rest is pretty much default.
Update
Skipped old config
I replaced the config with the old one, and still got the same issue, except now everything was slower.
Notable update
Query became lightning fast again when I added
ANALYZE source_table1;
ANALYZE source_table2;
ANALYZE source_table3;
on the largest tables that were queried, before running the query. I didn't have to do this before and it worked perfectly fine.
Here is one scenario which could explain the behaviour you are seeing. It assumes that source_table{1,2,3} are rebuilt straight before the query runs (as would happen when it is part of an ETL):
Before:
the source tables for the query are created
autovacuum has time to run ANALYZE on them while some other process finishes
Postgres chooses the correct plan for the query
If now the data or the ETL changes a bit, so that Postgres has no time for autovacuum before the query, then the statistics are off and the query execution time explodes.
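One way to confirm this scenario (a query sketch; the table names are taken from the question) is to compare the analyze timestamps with the time the tables were rebuilt:
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('source_table1', 'source_table2', 'source_table3');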