Postgres - autovacuum questions

I have a couple of questions about autovacuum:
Is it safe to turn off vacuuming before a long process of loading/updating data or running big queries?
What are the recommended configurations for autovacuum?

Ad 1: Yes, it is safe. The autovacuum parameter can only be set in postgresql.conf (or via ALTER SYSTEM), and changing it takes effect on a configuration reload; no restart is needed.
Ad 2: It is recommended that you stick with the default values. If you notice that autovacuum cannot keep up, consider raising autovacuum_max_workers and autovacuum_vacuum_cost_limit.
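A minimal sketch of how that could look, assuming superuser rights and ALTER SYSTEM (available since PostgreSQL 9.4); the VACUUM ANALYZE at the end is just a suggestion to refresh statistics after the load:

    -- Disable autovacuum cluster-wide before the bulk load.
    ALTER SYSTEM SET autovacuum = off;
    SELECT pg_reload_conf();   -- autovacuum is a reload-time (sighup) parameter

    -- ... run the long data load / mass update here ...

    -- Re-enable autovacuum afterwards and refresh statistics.
    ALTER SYSTEM RESET autovacuum;
    SELECT pg_reload_conf();
    VACUUM ANALYZE;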

Related

postgres autovacuum retry mechanism

In Postgres, suppose autovacuum runs but for some reason isn't able to actually perform the vacuum - for example, when hot_standby_feedback is set and there are long-running queries on the standby. Say tab1 has been updated and that triggers autovacuum; meanwhile a long-running query on the standby sends its feedback to the primary, which keeps the vacuum on tab1 from removing anything.
Since the vacuum on tab1 accomplished nothing, when does autovacuum run on that table again? Or will it never retry, so that we would need to run VACUUM manually on that table? Basically: does autovacuum retry tables that could not be vacuumed effectively the first time?
Autovacuum does not get skipped due to hot_standby_feedback. It still runs; it just might not accomplish anything if no rows can be removed. If this is the case, then pg_stat_all_tables.n_dead_tup does not get decremented, which means the table will probably get autovacuumed again the next time the database is assessed for autovacuuming, because the stats that made it eligible have not changed. On an otherwise idle system, this will happen about once every however long it takes to scan the not-all-visible part of the table, rounded up to the next increment of autovacuum_naptime.
It might be a good idea (although the use case is narrow enough that I doubt it) to suppress repeat autovacuuming of a table until the horizon has advanced far enough to make it worthwhile, but currently there is no code to do this.
Note that this differs from INSERT-driven autovacuums. There, n_ins_since_vacuum does get reset, even if none of the tuples were marked all-visible, so that vacuum will not run again until the table crosses some other threshold that makes it eligible.
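If you want to observe this yourself, the relevant counters live in pg_stat_all_tables (n_ins_since_vacuum exists since PostgreSQL 13); a query along these lines, using the hypothetical tab1 from the question:

    SELECT relname,
           n_dead_tup,           -- stays high if vacuum could not remove rows
           n_ins_since_vacuum,   -- reset even by a vacuum that removed nothing
           last_autovacuum
    FROM pg_stat_all_tables
    WHERE relname = 'tab1';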

Can PostgreSQL be configured so that occasional mass updates can run super-fast?

I have set up a PostgreSQL test environment which is required to contain the same amount of data (number of rows) as the production database, and to be configured mostly production-like to simulate the same performance for normal transactions.
However, being a test environment, it occasionally has to have some unique, experimental, temporary, or ad-hoc changes applied. For instance: adding or removing indexes before a performance test, recalculating the value of a column to replicate test conditions, dumping and re-importing whole tables, etc.
Is there a way to temporarily suspend data integrity guarantees in order to perform such types of mass update as fast as possible?
For example, in MySQL you can set over-sized write buffers, disable transaction logging, and suspend disk flushes on transaction commit. Is there something similar in pgsql?
The deployment environment is AWS EC2.
The manual has a chapter dedicated to initial loading of a database.
There are some safe options you can change to make things faster (a combined sketch follows these lists):
increasing max_wal_size
increasing checkpoint_timeout
wal_level to minimal
wal_log_hints to off
synchronous_commit to off
Then there are some rather unsafe options to make things faster. Unsafe meaning: you can lose all your data if the server crashes - so use at your own risk!
full_page_writes to off
fsync to off
Again: by changing the two settings above, you risk losing all your data.
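As an illustrative sketch of both lists via ALTER SYSTEM (the values are examples only; wal_level can only be changed with a server restart, and wal_level = minimal additionally requires max_wal_senders = 0):

    -- Relatively safe bulk-load settings:
    ALTER SYSTEM SET max_wal_size = '16GB';        -- fewer, larger checkpoints
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET wal_level = minimal;          -- needs restart + no WAL senders
    ALTER SYSTEM SET max_wal_senders = 0;
    ALTER SYSTEM SET wal_log_hints = off;
    ALTER SYSTEM SET synchronous_commit = off;     -- may lose last commits, no corruption

    -- Unsafe: a crash can corrupt the whole cluster. Use at your own risk!
    -- ALTER SYSTEM SET full_page_writes = off;
    -- ALTER SYSTEM SET fsync = off;

    SELECT pg_reload_conf();  -- then restart the server for wal_level to apply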
To disable WAL you could also set all tables to unlogged
You can disable WAL logging with ALTER TABLE ... SET UNLOGGED, but be aware that the reverse operation will dump the whole table to WAL.
If that is not feasible, you can boost performance by setting max_wal_size huge so that you get fewer checkpoints.
WAL flushing is disabled by setting fsync = off.
Be aware that the first and third measures will wreck your database in the event of a crash.
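A minimal illustration of the unlogged-table route, with a hypothetical table name; remember that unlogged tables are truncated after a crash, and that switching back to LOGGED writes the entire table to WAL:

    ALTER TABLE big_staging SET UNLOGGED;  -- hypothetical table; writes skip WAL
    -- ... perform the mass update / reload here ...
    ALTER TABLE big_staging SET LOGGED;    -- rewrites the whole table into WAL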

Does PostgreSQL VACUUM execute when performing a JDBC transaction?

From the PostgreSQL documentation:
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
So, I had a performance optimization issue in a Java application using JDBC. My question is: does VACUUM get executed somewhere as part of a JDBC transaction, or does it need to be run explicitly?
While that quotation is telling the truth, it omits the fact that for a decade or so PostgreSQL has had the autovacuum daemon, which does this job for you automatically.
So normally you don't have to concern yourself with that. Only on tables with very high write activity do you have to tune autovacuum to be more aggressive, and you may need the occasional VACUUM (FULL) if you bulk-delete a large percentage of a table.
Performance issues are normally not connected with VACUUM (except that sequential scans take longer on bloated tables), so the connection is not clear to me.
You can control the running of VACUUM with the settings described in this section:
https://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE
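If you do want to run VACUUM yourself, note that it cannot run inside a transaction block, so from JDBC it must be issued on a connection in auto-commit mode; a sketch in plain SQL, with a hypothetical table name:

    -- Must run outside a transaction block (auto-commit in JDBC).
    VACUUM (VERBOSE, ANALYZE) orders;   -- hypothetical table

    -- Check when vacuum and autovacuum last touched the table:
    SELECT relname, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'orders';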

auto vacuum vs vacuum in postgresql

PostgreSQL has the VACUUM functionality for reclaiming the space occupied by dead tuples. Autovacuum is on by default and runs according to the configuration settings.
When I check the output of pg_stat_all_tables, i.e. last_vacuum and last_autovacuum, autovacuum has never run for most of the tables in the database, even those with a fair number of dead tuples (more than 1,000). We also get a time window of 2-3 hours when these tables are rarely used.
Below are the autovacuum settings for my database.
Below is the output of pg_stat_all_tables.
Is it a good idea to depend only on autovacuum?
Are there any special settings required for autovacuum to function properly?
Should we set up manual vacuuming? Should we use both in parallel, or just turn off autovacuum and use manual VACUUM only?
You should definitely use autovacuum.
Are there any autovacuum processes running currently?
Does a manual VACUUM on such a table succeed?
Set log_autovacuum_min_duration = 0 to get information about autovacuum processing in the logs.
If system activity is too high, autovacuum may not be able to keep up. In this case it is advisable to configure autovacuum to be more aggressive, e.g. by setting autovacuum_vacuum_cost_limit = 1000.
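Put together, the two suggestions above could be applied like this (a sketch; both parameters only need a configuration reload):

    ALTER SYSTEM SET log_autovacuum_min_duration = 0;      -- log every autovacuum run
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;  -- let workers do more per cycle
    SELECT pg_reload_conf();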
https://www.postgresql.org/docs/current/static/routine-vacuuming.html
PostgreSQL databases require periodic maintenance known as vacuuming. For many installations, it is sufficient to let vacuuming be performed by the autovacuum daemon, which is described in Section 24.1.6. You might need to adjust the autovacuuming parameters described there to obtain best results for your situation. Some database administrators will want to supplement or replace the daemon's activities with manually-managed VACUUM commands, which typically are executed according to a schedule by cron or Task Scheduler scripts.
VACUUM creates significant I/O; adjust https://www.postgresql.org/docs/current/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST to fit your needs.
You can also set autovacuum parameters per table, for finer control: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
The above will give you an idea why your 1K dead tuples might not be enough for autovacuum, and how to change that: by default a table is only vacuumed once its dead tuples exceed autovacuum_vacuum_threshold (50) plus autovacuum_vacuum_scale_factor (0.2) times the row count, so a large table needs far more than 1,000 dead tuples before autovacuum triggers.
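A sketch of such a per-table override, with a hypothetical table name and illustrative values:

    -- Trigger autovacuum after ~1,000 dead tuples regardless of table size.
    ALTER TABLE big_table SET (
        autovacuum_vacuum_threshold = 1000,
        autovacuum_vacuum_scale_factor = 0
    );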
Manual VACUUM is a perfect solution for a one-time run, but for running the system I'd definitely rely on the autovacuum daemon.

Postgres - Manual Locking Clashing with Autovacuum

I'm working on part of a system where tables need to be locked IN SHARE ROW EXCLUSIVE MODE NOWAIT.
The problem is that the autovacuum daemon (autovacuum: ANALYZE) kicks in on a table immediately after it has been worked on, stopping the next process from taking a lock IN SHARE ROW EXCLUSIVE MODE NOWAIT.
The tables are large, so the vacuum takes a while. I could check for this before the locking transaction starts and use pg_cancel_backend to stop the autovacuum worker. Are there any consequences to this? The other option is to schedule vacuuming manually, but everywhere you read that it's best to avoid this. Are there any parameters I have missed that could tweak autovacuum to stop it from behaving like this?
Thanks
If you keep autovacuum from working, you will incur table bloat and eventually risk downtime when PostgreSQL shuts down to avoid data loss due to transaction ID wraparound.
Disabling autovacuum for this table (ALTER TABLE ... SET (autovacuum_enabled = off)) and regularly scheduling a manual VACUUM is an option.
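A sketch of that option, with a hypothetical table name (note that anti-wraparound vacuums will still run even with autovacuum disabled for the table):

    -- Keep regular autovacuum away from this one table...
    ALTER TABLE locked_table SET (autovacuum_enabled = off);

    -- ...and vacuum it yourself on a schedule (e.g. from cron) instead:
    VACUUM (ANALYZE) locked_table;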
But it would be best to change your procedure so that you don't have to lock tables explicitly all the time.