We've been experiencing some connection spikes on pgbouncer connected to a postgres database. When we query pg_stat_activity during these spikes we see tons of active queries with wait_event WALWriteLock.
We changed some of our insert-heavy tables to unlogged, yet inserts into these tables still show up during the spikes with a wait_event of WALWriteLock. I thought that if a table is unlogged, inserts into it wouldn't get caught up waiting on WALWriteLock. What gives?
Further, any suggestions on how to stop these spikes?
WALWriteLock: acquired by PostgreSQL processes while WAL records are flushed to disk or during a WAL segment switch. synchronous_commit=off removes the wait for the disk flush, and full_page_writes=off reduces the amount of data to flush.
You can try the above, which is also described in this post:
https://www.percona.com/blog/2018/10/30/postgresql-locking-part-3-lightweight-locks/
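As a rough sketch, these are the two settings from the answer above. Both relax durability/crash-safety guarantees, so weigh the risk before changing them in production (ALTER SYSTEM assumes 9.4+; otherwise edit postgresql.conf directly):
-- Trade durability for less WAL flushing; evaluate the crash-safety trade-off first.
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();  -- both settings can be picked up without a restart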
One possible answer is that despite of not writing the table itself to WAL, COMMITs are still written to WAL, and if you make lots of tiny transactions this might show up. See if you can group inserts to smaller number of transactions.
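For example, a minimal sketch of that idea (table and column names are hypothetical): instead of committing each row separately, batch rows so only one commit record has to be flushed per batch.
-- One transaction per row: each implicit COMMIT may wait on a WAL flush.
INSERT INTO events (payload) VALUES ('a');
INSERT INTO events (payload) VALUES ('b');
-- Batched: many rows, a single commit record to flush.
BEGIN;
INSERT INTO events (payload) VALUES ('a'), ('b'), ('c');
COMMIT;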
Related
I have a scenario that repeats itself every few hours: there is a sudden increase in row exclusive locks in the PostgreSQL DB. At the same time, some queries are not answered in time, which causes connection exhaustion, so PostgreSQL no longer accepts new clients. After 2-3 minutes the lock and connection counts drop and the system returns to its normal state.
I wonder if autovacuum could be the root cause of this? I see ANALYZE and VACUUM (not VACUUM FULL) take about 20 seconds to complete on one of the tables. I have INSERT, SELECT, UPDATE and DELETE operations going on from my application and no DDL commands (ALTER TABLE, DROP TABLE, CREATE INDEX, ...). Can the autovacuum procedure conflict with queries from my application and cause them to wait until the vacuum has completed? Or is it all the fault of the application and my bad design? I should mention that one of my tables has a jsonb column that holds relatively large data for each row (roughly 10 MB).
I have attached an image from the monitoring application that shows the sudden increase in row exclusive locks.
ROW EXCLUSIVE locks are perfectly harmless; they are taken on tables against which DML statements run. Your graph reveals nothing. You should set log_lock_waits = on and log_min_duration_statement to a reasonable value. Perhaps you can spot something in the logs. Also, watch out for long running transactions.
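A minimal sketch of those logging settings (the threshold is only an example; pick a value that matches your latency expectations, and edit postgresql.conf instead if your version predates ALTER SYSTEM):
ALTER SYSTEM SET log_lock_waits = on;                   -- log lock waits longer than deadlock_timeout
ALTER SYSTEM SET log_min_duration_statement = '500ms';  -- log statements slower than this
SELECT pg_reload_conf();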
The Postgres documentation says that partitioned tables are not processed by autovacuum. Still, I see that the last_autovacuum column of pg_stat_user_tables is populated with recent timestamps for live partitions.
Does that mean these timestamps are set by the background worker that only prevents transaction ID wraparound, without actually performing ANALYZE and VACUUM? Or what else could populate them?
Besides, given that the partitions are large and active enough, should I run both ANALYZE and VACUUM manually on those partitions? If yes, does the order matter?
UPDATE
I'll try to elaborate, following the comments given.
Given that vacuum should work the same way on a partition as on a regular table, what could explain the much faster growth of occupied disk space after partitioning? Before partitioning it was nearly a linear function of the record count.
What is also confusing: when looking at running autovacuum processes, I see that those related to partitions are marked "to prevent wraparound", while others are not. Is that pure coincidence, or is there something to check?
The documentation describes a partitioned table as rather a virtual entity without its own storage. What is the point of noting that it is not vacuumed?
The statement from the documentation is true, but misleading. Autovacuum does not process the partitioned table itself, but it processes the partitions, which are regular PostgreSQL tables. So dead tuples get removed, the visibility map gets updated, and so on. In short, there is nothing to worry about as far as vacuuming is concerned. Remember that the partitioned table itself does not hold any data!
What the documentation warns you about is ANALYZE. Autovacuum also launches automatic ANALYZE jobs to collect accurate table statistics. This works fine on the partitions, but no table statistics are collected on the partitioned table itself, so you have to run ANALYZE manually on the partitioned table to get that data. In practice, I find that not to be a problem, since the optimizer generates plans for each individual partition anyway, and there it has accurate statistics.
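As an illustration (the table and partition names here are hypothetical), you can see this in pg_stat_user_tables and collect parent-level statistics yourself:
-- Partitions are autovacuumed/autoanalyzed like regular tables ...
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'measurements%';
-- ... but statistics on the partitioned (parent) table must be collected manually.
ANALYZE measurements;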
My long-running SELECT queries against a hot standby are failing, apparently because WAL replay on the standby vacuums away some of the rows matching my query.
Is there an option to ask the hot standby server not to bother about such changes to the rows (even rows that were updated/deleted) and to continue with the scan for my query?
Or is cancelling queries whose matching rows were cleaned up by a replayed vacuum something the server always does, with no other supported behaviour?
You can use hot_standby_feedback to tell the primary server not to vacuum rows that the standby server is still using. If you are concerned about affecting the primary in this way, you could instead use one of max_standby_streaming_delay or max_standby_archive_delay (depending on whether you are streaming or copying log files).
These are all detailed here: https://www.postgresql.org/docs/current/runtime-config-replication.html
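A minimal sketch of the two approaches (the delay value is only an example; both settings can be changed on the standby with a reload):
-- Ask the primary not to vacuum away rows the standby still needs.
ALTER SYSTEM SET hot_standby_feedback = on;
-- Alternatively: let replay lag so long-running standby queries can finish.
ALTER SYSTEM SET max_standby_streaming_delay = '10min';
SELECT pg_reload_conf();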
With slow query logging turned on, we see a lot of COMMITs taking upwards of multiple seconds to complete on our production database. On investigation, these are generally simple transactions: fetch a row, UPDATE the row, COMMIT. The SELECTs and UPDATEs in these particular transactions aren't being logged as slow. Is there anything we can do, or tools that we can use, to figure out the reason for these slow commits? We're running on an SSD, and are streaming to a slave, if that makes a difference.
By default, Postgres commits are synchronous: the COMMIT waits until the WAL has been flushed to disk before returning. You can adjust the relevant WAL settings in the config file to change this.
You can make commits asynchronous with synchronous_commit, either database-wide in the config file or at the session, user, or database level.
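A sketch of the different scopes (the role and database names are hypothetical); the trade-off is that a crash can lose the most recent commits, but it does not corrupt data:
SET synchronous_commit = off;                         -- current session only
ALTER ROLE app_user SET synchronous_commit = off;     -- a specific role
ALTER DATABASE app_db SET synchronous_commit = off;   -- a whole database
ALTER SYSTEM SET synchronous_commit = off;            -- instance-wide default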
On the database side:
Vacuum your tables and update the statistics. This will get rid of dead tuples; since you're performing updates, there will be many.
VACUUM ANALYZE
I have found a bug in my application code where I start a transaction but never commit or roll back. The connection is used periodically, just reading some data every 10 s or so. In pg_stat_activity, its state is reported as "idle in transaction", and its backend_start time is over a week ago.
What is the impact on the database of this? Does it cause additional CPU and RAM usage? Will it impact other connections? How long can it persist in this state?
I'm using postgresql 9.1 and 9.4.
Since you only SELECT, the impact is limited. It is more severe for any write operations, where the changes are not visible to any other transaction until committed - and lost if never committed.
It does cost some RAM and permanently occupies one of your allowed connections (which may or may not matter).
One of the more severe consequences of very long running transactions: it blocks VACUUM from doing its job, since there is still an old transaction that can see old rows. The system will start to bloat.
In particular, SELECT acquires an ACCESS SHARE lock (the least restrictive of all) on all referenced tables. This does not interfere with other DML commands like INSERT, UPDATE or DELETE, but it blocks commands that need an ACCESS EXCLUSIVE lock, such as most forms of ALTER TABLE, DROP TABLE or TRUNCATE. See "Table-level Locks" in the manual.
It can also interfere with various replication solutions and lead to transaction ID wraparound in the long run if it stays open long enough / you burn enough XIDs fast enough. More about that in the manual on "Routine Vacuuming".
Blocking effects can mushroom if other transactions are blocked from committing and those have acquired locks of their own. Etc.
You can keep transactions open (almost) indefinitely - until the connection is closed (which also happens when the server is restarted, obviously.)
But never leave transactions open longer than needed.
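As a rough sketch for spotting such sessions (the state and query columns exist from 9.2 on; 9.1 exposes procpid and current_query instead, and the pid used with pg_terminate_backend below is hypothetical):
-- Find sessions sitting "idle in transaction" and how long their transaction has been open.
SELECT pid, usename, xact_start, now() - xact_start AS open_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
-- If necessary, terminate a stuck backend (this rolls back its open transaction):
-- SELECT pg_terminate_backend(12345);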
There are two major impacts to the system.
The tables that have been used in those transactions:
are not vacuumed, which means they are not "cleaned up" and their statistics aren't updated, which might lead to bad (= slow) execution plans
cannot be changed using ALTER TABLE