Tuning autovacuum for update-heavy tables - PostgreSQL

I have a very update-heavy table.
The problem is that the table keeps growing, because autovacuum cannot catch up.
Autovacuum kicks in every 2 minutes, so it is running fine.
I have an application that makes about 50k updates (and a few inserts) every 60-70 seconds.
I am forced to do CLUSTER every week, which is not the best solution IMO.
My autovacuum settings are as follows (plus related):
maintenance_work_mem = 2GB
autovacuum_vacuum_cost_limit = 10000
autovacuum_naptime = 10
autovacuum_max_workers = 5
autovacuum_vacuum_cost_delay = 10ms
vacuum_cost_limit = 1000
autovacuum_vacuum_scale_factor=0.1 # this is set only for this table
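For reference, a per-table override like the scale factor above is normally applied through the table's storage parameters; a minimal sketch, using the sla table from the autovacuum log below (the threshold value is purely illustrative):
ALTER TABLE public.sla SET (autovacuum_vacuum_scale_factor = 0.1);
-- optionally also set a fixed threshold so vacuum triggers sooner on a hot table
ALTER TABLE public.sla SET (autovacuum_vacuum_threshold = 1000);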
Here is the autovacuum log:
2018-01-02 12:47:40.212 UTC [2289] LOG: automatic vacuum of table "mydb.public.sla": index scans: 1
    pages: 0 removed, 21853 remain, 1 skipped due to pins, 0 skipped frozen
    tuples: 28592 removed, 884848 remain, 38501 are dead but not yet removable
    buffer usage: 240395 hits, 15086 misses, 22314 dirtied
    avg read rate: 29.918 MB/s, avg write rate: 44.252 MB/s
    system usage: CPU 0.47s/2.60u sec elapsed 3.93 sec
It seems to me that 29 MB/s is way too low, considering that my hardware is pretty decent.
I have Intel NVMe devices, and my I/O doesn't seem to be saturated.
PostgreSQL version: 9.6
CPU: dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Any ideas how to improve this?

Related

Issue : Postgresql DB Latency issue during load testing and at production environment

Explanation: We are able to run 200 TPS on a 16-core CPU successfully, with around 40% CPU utilization and 50-60 concurrent locks. However, when we increase the load to 300 TPS, the DB response slows down after a 15-20 minute run. The system shows 2-4% dead tuples.
Observation: CPU and other resources remain stable, and if we perform a VACUUM while latency is high, performance improves. However, after 15-20 minutes the system starts slowing down again.
CPU: 16 cores, RAM: 128 GB, DB size: 650 GB
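To confirm whether dead tuples really pile up between autovacuum runs, a query along these lines against pg_stat_user_tables shows the worst-affected tables (standard statistics columns, nothing workload-specific assumed):
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;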

What is autovacuum_vacuum_cost_delay in autovacuum in PostgreSQL?

I am trying to tweak the PostgreSQL server with the following config parameters in the configuration file:
autovacuum_freeze_max_age = 500000000
autovacuum_max_workers = 6
autovacuum_naptime = '15s'
autovacuum_vacuum_cost_delay = 0
maintenance_work_mem = '10GB'
vacuum_freeze_min_age = 10000000
I want to understand what autovacuum_vacuum_cost_delay does. I searched a lot on Google but didn't find anything that explained it well enough.
The documentation refers to vacuum_cost_delay, which says:
The amount of time that the process will sleep when the cost limit has been exceeded.
This is a feature to slow down autovacuum. Whenever an autovacuum worker does work (deletes a tuple, reads a block, ...), it collects work points. As soon as these points exceed autovacuum_vacuum_cost_limit (which by default falls back to vacuum_cost_limit, i.e. 200), it pauses for autovacuum_vacuum_cost_delay.
The reason is that VACUUM uses a lot of resources, so it is slowed down by default in the hope of not disrupting normal operation. However, if autovacuum is too slow, you can end up with bloated tables.
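The work points come from the per-page cost settings. The sketch below lists them with their typical defaults; exact defaults vary between PostgreSQL releases, so treat the numbers as illustrative:
vacuum_cost_page_hit = 1            # cost for a page found in shared_buffers
vacuum_cost_page_miss = 10          # cost for a page that has to be read in
vacuum_cost_page_dirty = 20         # cost for a page the vacuum dirties
autovacuum_vacuum_cost_limit = -1   # -1 means: inherit vacuum_cost_limit (200)
autovacuum_vacuum_cost_delay = 20ms # pause once the limit is hit; 0 disables throttling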

PostgreSQL autovacuum causing significant performance degradation

Our Postgres DB (hosted on Google Cloud SQL with 1 CPU, 3.7 GB of RAM, see below) consists mostly of one big ~90GB table with about ~60 million rows. The usage pattern consists almost exclusively of appends and a few indexed reads near the end of the table. From time to time a few users get deleted, deleting a small percentage of rows scattered across the table.
This all works fine, but every few months an autovacuum gets triggered on that table, which significantly impacts our service's performance for ~8 hours:
Storage usage increases by ~1GB for the duration of the autovacuum (several hours), then slowly returns to the previous value (might eventually drop below it, due to the autovacuum freeing pages)
Database CPU utilization jumps from <10% to ~20%
Disk Read/Write Ops increases from near zero to ~50/second
Database Memory increases slightly, but stays below 2GB
Transaction/sec and ingress/egress bytes are also fairly unaffected, as would be expected
This has the effect of increasing our service's 95th latency percentile from ~100ms to ~0.5-1s during the autovacuum, which in turn triggers our monitoring. The service serves around ten requests per second, with each request consisting of a few simple DB reads/writes that normally have a latency of 2-3ms each.
The DB configuration is fairly vanilla.
The log entry documenting this autovacuum process reads as follows:
automatic vacuum of table "XXX": index scans: 1
    pages: 0 removed, 6482261 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 5959839 removed, 57732135 remain, 4574 are dead but not yet removable
    buffer usage: 8480213 hits, 12117505 misses, 10930449 dirtied
    avg read rate: 2.491 MB/s, avg write rate: 2.247 MB/s
    system usage: CPU 470.10s/358.74u sec elapsed 38004.58 sec
Any suggestions what we could tune to reduce the impact of future autovacuums on our service? Or are we doing something wrong?
If you increase autovacuum_vacuum_cost_delay, autovacuum will run slower and be less invasive.
However, the better solution is usually to make autovacuum faster, by setting autovacuum_vacuum_cost_limit to 2000 or so, so that it finishes sooner.
You could also try to schedule VACUUMs of the table yourself at times when it hurts least.
But frankly, if a single innocuous autovacuum is enough to disturb your operation, you need more I/O bandwidth.
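A sketch of both suggestions; the table name is purely illustrative, and ALTER SYSTEM requires PostgreSQL 9.4 or later:
-- let autovacuum do more work between throttling pauses (takes effect on reload)
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
-- or run a manual VACUUM from cron at a quiet time of day
VACUUM (VERBOSE, ANALYZE) big_append_table;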

Why would PostgreSQL connection establishment be CPU-bound?

I have a C# backend running on AWS Lambda. It connects to a PostgreSQL DB in the same region.
I observed extremely slow cold-startup execution time after adding the DB connection code. After allocating more memory to my Lambda function, the execution time has significantly reduced.
The execution time as measured by Lambda console:
128 MB (~15.5 sec)
256 MB (~9 sec)
384 MB (~6 sec)
512 MB (~4.5 sec)
640 MB (~3.5 sec)
768 MB (~3 sec)
In contrast, after commenting out the DB connection code:
128 MB (~5 sec)
256 MB (~2.5 sec)
So opening a DB connection has contributed a lot to the execution time.
According to AWS documentation:
In the AWS Lambda resource model, you choose the amount of memory you want for your function, and are allocated proportional CPU power and other resources.
Since the peak memory usage has consistently stayed at ~45 MB, this phenomenon seems to suggest that database connection establishment is a computationally intensive activity. If so, why?
Code in question (I'm using Npgsql):
using (var conn = new NpgsqlConnection(Db.connStr))
{
    conn.Open();
    // print something
}
Update 1
I set up a MariaDB instance with the same configuration and did some testing.
Using MySql.Data for synchronous operation:
128 MB (~12.5 sec)
256 MB (~6.5 sec)
Using MySqlConnector for asynchronous operation:
128 MB (~11 sec)
256 MB (~5 sec)
Interestingly, the execution time on my laptop has increased from ~4 sec (for Npgsql) to 12~15 sec (for MySql.Data and MySqlConnector).
At this point, I'll just allocate more memory to the Lambda function to solve this issue. But if anyone knows why the connection establishment took so long, I'd appreciate an answer. I might post some profiling results later.

Postgres: Checkpoints Are Occurring Too Frequently

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.
We're getting this warning about 300 times a day while rebuilding these datasets, with long stretches where the warnings average 2-5 seconds apart:
2015-01-15 12:32:53 EST [11403]: [10841-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
These are the related settings:
checkpoint_completion_target 0.7
checkpoint_segments 64
checkpoint_timeout 5min
checkpoint_warning 30s
wal_block_size 8192
wal_buffers 4MB
wal_keep_segments 5000
wal_level hot_standby
wal_receiver_status_interval 10s
wal_segment_size 16MB
wal_sync_method fdatasync
wal_writer_delay 200ms
work_mem 96MB
shared_buffers 24GB
effective_cache_size 128GB
So that means we're writing 64 segments × 16 MB = 1024 MB worth of WAL every 2-5 seconds, sometimes sustained for 15-30 minutes.
1) Do you see any settings we can improve on? Let me know if you need other settings documented.
2) Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?
The data we're rebuilding is stored elsewhere, so on the off chance that the power fails AND the RAID battery backup doesn't do its job, we're not out anything, because the dataset just gets rebuilt again.
Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?
Thanks!
PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.
Your server is generating so much WAL data because wal_level is set to hot_standby. I'm assuming you need this, so the best option to avoid the warnings is to increase checkpoint_segments. But they are just that - warnings - and it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be updating frequently.
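A sketch of what that might look like in postgresql.conf; the numbers are illustrative, not values tuned to this particular workload:
checkpoint_segments = 256            # ~4 GB of WAL between checkpoints instead of ~1 GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9   # spread checkpoint I/O over more of the interval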
Changing synchronous_commit does not change what is written to the WAL, only when the commit returns, which lets the OS buffer those writes.
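For reference, the pattern asked about would look like this; SET LOCAL affects only the current transaction and reverts automatically at COMMIT or ROLLBACK:
BEGIN;
SET LOCAL synchronous_commit TO OFF;
-- ... the write-heavy rebuild statements ...
COMMIT;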
It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.
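A minimal sketch of that rebuild pattern, with hypothetical table names; the unlogged table skips WAL while it is being populated, and only the final INSERT into the logged table is WAL-logged and replicated as usual:
CREATE UNLOGGED TABLE dataset_build (LIKE dataset INCLUDING ALL);
-- ... populate dataset_build with the expensive rebuild ...
BEGIN;
TRUNCATE dataset;
INSERT INTO dataset SELECT * FROM dataset_build;
COMMIT;
DROP TABLE dataset_build;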