I have a table of records that is populated sequentially once, but then every record is updated (the order in which they are updated and the timing of the updates are both random). The updates are not HOT updates. Is there any advantage to setting my fillfactor for this table to 50, or even less than 50, given these facts?
OK, as you mentioned in the comments to your question, you're making changes to your table in transactions that update 1-10k records each. This is the right approach, as it leaves autovacuum some chance to do its work. But the table's fillfactor is not the first thing I'd check or change. Fillfactor can help you speed up the process, but if autovacuum is not aggressive enough, you'll soon end up with a very bloated table and bad performance.
So, first, I'd suggest you keep an eye on your table's bloat level. There are a number of queries that can help you:
https://wiki.postgresql.org/wiki/Show_database_bloat
http://blog.ioguix.net/postgresql/2014/09/10/Bloat-estimation-for-tables.html
https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat-82-84.sql
https://github.com/dataegret/pg-utils/blob/master/sql/table_bloat.sql
(and for indexes:
https://github.com/dataegret/pg-utils/blob/master/sql/index_bloat.sql;
these queries require the pgstattuple extension)
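If you just want a quick, exact measurement for a single table and can afford a full table read, pgstattuple itself gives you dead-tuple statistics directly. A minimal sketch, with my_table as a placeholder name:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- exact but heavyweight: reads the whole table
SELECT table_len, dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('my_table');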
Next, I'd tune autovacuum to be much more aggressive than the default (this is usually a good idea even if you don't need to process the whole table in a short period of time), something like this:
log_autovacuum_min_duration = 0
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.05
autovacuum_naptime = 60
autovacuum_vacuum_cost_delay = 20
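These are cluster-wide settings in postgresql.conf. If only the one heavily updated table needs aggressive vacuuming, the same knobs can also be set per table as storage parameters; a sketch, with my_table as a placeholder:
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 2
);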
After a significant number of UPDATE transactions, check the bloat level again.
Finally, yes, I'd tune fillfactor, but probably to some higher (and more usual) value like 80 or 90. Here you need to make a prediction: what is the probability that 10% or more of the tuples inside a page will be updated by a single transaction? If the chances are very high, reduce fillfactor. But you've mentioned that the order of rows in UPDATEs is random, so I'd use 80-90%. Keep in mind that there is an obvious trade-off here: if you set fillfactor to 50, your table will need twice the disk space and all operations will naturally become slower. If you want to go deep into this question, I suggest creating a series of tables with fillfactors ranging from 50 to 100, loading the same data into each, and testing UPDATE TPS with pgbench.
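For the benchmarking idea, a minimal sketch (table, script and database names are made up): create one copy of the table per candidate fillfactor, then point a custom pgbench script containing your typical UPDATE at each copy.
-- one copy per candidate fillfactor, e.g. 80
CREATE TABLE my_table_ff80 (LIKE my_table INCLUDING ALL) WITH (fillfactor = 80);
INSERT INTO my_table_ff80 SELECT * FROM my_table;
ANALYZE my_table_ff80;
-- then, from the shell: pgbench -n -f update_test.sql -c 8 -T 60 mydb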
Summary
I am using Postgres UPSERTs in our ETLs and I'm experiencing issues with fragmentation and bloat on the tables I am writing to, which is slowing down all operations including reads.
Context
I have hourly batch ETLs upserting into tables (tables of tens of millions of rows, upserts of tens of thousands of rows), and we have autovacuum thresholds set on AWS.
I have had to run VACUUM FULL to get the space back and prevent processes from hanging. This has been exacerbated now that the frequency of one of our ETLs has increased; that ETL populates some core tables which are the source for a number of denormalised views.
It seems like what is happening is that tables don't have a chance to be vacuumed before the next ETL run, thus creating a spiral which eventually leads to a complete slow-down.
Question!
Does Upsert fundamentally have a negative impact on fragmentation and if so, what are other people using? I am keen to implement some materialised views and move most of our indexes to the new views while retaining only the PK index on the tables we are writing to, but I'm not confident that this will resolve the issue I'm seeing with bloat.
I've done a bit of reading on the issue but nothing conclusive, for example --> https://www.targeted.org/articles/databases/fragmentation.html
Thanks for your help
It depends. If there are no constraint violations, INSERT ... ON CONFLICT won't cause any bloat. If it performs an update, it will produce a dead row.
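One thing that helps in the update case is to skip rows whose values wouldn't actually change, since a no-op UPDATE still creates a dead row version. A sketch, with made-up table and column names:
INSERT INTO target AS t (id, payload)
VALUES (1, 'new value')
ON CONFLICT (id) DO UPDATE
    SET payload = EXCLUDED.payload
    WHERE t.payload IS DISTINCT FROM EXCLUDED.payload;  -- no dead row when nothing would change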
The measures you can take:
set autovacuum_vacuum_cost_delay = 0 for faster autovacuum
use a fillfactor somewhat less than 100 and have no index on the updated columns, so that you can get HOT updates, which greatly reduce the need for vacuuming
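To check whether you are actually getting HOT updates, pg_stat_user_tables tracks them; for example (my_table is a placeholder):
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname = 'my_table';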
It is not clear what you are actually seeing. Can you turn track_io_timing on, and then do an EXPLAIN (ANALYZE, BUFFERS) for the query that you think has been slowed down by bloat?
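For example (the query itself is just a placeholder for the statement you believe has slowed down):
SET track_io_timing = on;   -- changing this requires superuser (or equivalent) privileges
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE id = 42;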
Bloat and fragmentation aren't the same thing. Fragmentation is more an issue with indexes under some conditions, not the tables themselves.
It seems like what is happening is that tables don't have a chance to be vacuumed before the next ETL run
This one could be very easy to fix. Run a "manual" VACUUM (not VACUUM FULL) at the end or the beginning of each ETL run. Since you have a well-defined workflow, there is no need to try to get autovacuum to do the right thing, as it should be very easy to inject manual vacuums into your workflow. Or do you think that one VACUUM per ETL is overkill?
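A sketch of what that could look like at the end of each ETL batch (table names are placeholders for whichever tables the batch touched):
-- plain VACUUM does not block reads or writes, unlike VACUUM FULL
VACUUM (ANALYZE) core_table;
VACUUM (ANALYZE) other_upserted_table;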
THE PROBLEM
I'm working with PostgreSQL v10 + golang and have what I believe to be a very common SQL problem:
I have a table 'counters' that has current_value and max_value integer columns.
Strictly, once current_value >= max_value, I would like to drop the request.
I have several Kubernetes pods, and for each API call each of them might increment current_value of the same row (in the worst case) in the 'counters' table by 1 (this can be thought of as concurrent updates to the same DB from distributed hosts).
In my current and naive implementation, multiple UPDATEs to the same row naturally block each other (the isolation level is 'read committed', if that matters).
In the worst case, I have about 10+ requests per second that would update the same row. That creates a bottleneck and hurts performance, which I cannot afford.
POSSIBLE SOLUTION
I thought of several ideas to resolve this, but they all sacrifice either integrity or performance. The only one that keeps both doesn't sound very clean for this seemingly common problem:
As long as the counter's current_value is within a relatively safe distance from max_value (delta > 100), send the update request to a channel that is flushed every second or so by a worker that aggregates the updates and applies them at once. Otherwise (delta <= 100), do the update in the context of the transaction (and hit the bottleneck, but only for a minority of cases). This paces the update requests up until the point where the limit is almost reached, effectively resolving the bottleneck.
This would probably work for resolving my problem. However, I can't help but think that there are better ways to address this.
I didn't find a great solution online and even though my heuristic method would work, it feels unclean and it lacks integrity.
Creative solutions are very welcome!
Edit:
Thanks to @laurenz-albe's advice, I tried to shorten the span between the UPDATE, where the row gets locked, and the COMMIT of the transaction. Pushing all UPDATEs to the end of the transaction seems to have done the trick. Now I can process over 100 requests/second and maintain integrity!
10 concurrent updates per second is ridiculously few. Just make sure that the transactions are as short as possible, and it won't be a problem.
Your biggest problem will be VACUUM, as lots of updates are the worst possible workload for PostgreSQL. Make sure you create the table with a fillfactor of 70 or so and that current_value is not indexed, so that you get HOT updates.
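A minimal sketch of both suggestions, assuming the table looks roughly like this (the names are guesses based on the question). The single-statement UPDATE checks the limit and increments atomically, so the row is locked only for that statement plus the remainder of a short transaction; if no row comes back, the limit has been reached and the request can be dropped.
CREATE TABLE counters (
    id            bigint PRIMARY KEY,   -- the only index, so updates can be HOT
    current_value integer NOT NULL,
    max_value     integer NOT NULL
) WITH (fillfactor = 70);

-- increment and enforce the limit in one statement
UPDATE counters
SET current_value = current_value + 1
WHERE id = $1
  AND current_value < max_value
RETURNING current_value;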
I have Postgres 9.4.7 and a big table of ~100M rows and 20 columns. The load on the table is about 1.5k selects, 150 inserts and 300 updates per minute, with no deletes. Here is my autovacuum config:
autovacuum_analyze_scale_factor 0
autovacuum_analyze_threshold 5000
autovacuum_vacuum_scale_factor 0
autovacuum_vacuum_threshold 5000
autovacuum_max_workers 6
autovacuum_naptime 5s
In my case the database is almost always in a constant state of vacuuming. When one vacuuming session ends, another one begins.
So the main question:
Is there a common way to vacuum big tables?
Here are some other questions.
A standard vacuum does not scan the entire table, and 'analyze' only scans 30k rows. So under the same load I should have a constant execution time; is that true?
Do I really need to analyze the table? Can frequent 'analyze' make any useful changes to query plans for a large table?
vacuum
VACUUM reclaims storage occupied by dead tuples.
So it changes only the affected pages, but it will scan the entire table.
That is what you probably call "standard vacuum". Now, if you are on 9.6, then
VACUUM will skip pages based on the visibility map
analyze
The amount of data that ANALYZE scans depends on the table size and on default_statistics_target (set per instance or per column); it is not 30k rows per se:
For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row... change slightly each time ANALYZE is run, even if the actual table contents did not change. This might result in small changes in the planner's estimated costs shown by EXPLAIN.
So if you want more stable results from EXPLAIN, run something like
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 200;
or increase default_statistics_target; otherwise, with ANALYZE running so often, there are more chances for the plan to change.
One more thing: you have a 5k threshold. In a table with 100M rows that is 0.005%, i.e. an effective scale factor of 0.00005, while the defaults are 0.2 and 0.1... It makes me think that maybe your threshold is too low. Running vacuum more often is recommended indeed, but here it looks far too often, thousands of times more often than it would be by default...
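To see how often autovacuum actually runs against the table, pg_stat_user_tables keeps counters and timestamps; for example:
SELECT relname, last_autovacuum, autovacuum_count,
       last_autoanalyze, autoanalyze_count, n_dead_tup
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC;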
I realize that, per Pg docs (http://www.postgresql.org/about/), one can store an unlimited number of rows in a table. However, what is the "rule of thumb" for usable number of rows, if any?
Background: I want to store daily readings for a couple of decades for 13 million cells. That works out to 13 M * (366|365) * 20 ~ 9.5e10, or 95 B rows (in reality, around 120 B rows).
So, using table partitioning, I set up a master table, and then inherited tables by year. That divvies up the rows to ~ 5.2 B rows per table.
Each row is 9 SMALLINTs, and two INTs, so, 26 bytes. Add to that, the Pg overhead of 23 bytes per row, and we get 49 bytes per row. So, each table, without any PK or any other index, will weigh in at ~ 0.25 TB.
For starters, I have created only a subset of the above data, that is, only for about 250,000 cells. I have to do a bunch of tuning (create proper indexes, etc.), but the performance is really terrible right now. Besides, every time I need to add more data, I will have to drop the keys and then recreate them. The saving grace is that once everything is loaded, it will be a read-only database.
Any suggestions? Any other strategy for partitioning?
It's not just "a bunch of tuning (indexes etc.)". This is crucial and a must do.
You posted few details, but let's try.
The rule is: try to find the most common working set. See if it fits in RAM. Optimize hardware, PG/OS buffer settings and PG indexes/clustering for it. Otherwise look for aggregates, or, if that's not acceptable and you need fully random access, think about what hardware could scan the whole table for you in reasonable time.
How large is your table (in gigabytes)? How does it compare to total RAM? What are your PG settings, including shared_buffers and effective_cache_size? Is this a dedicated server? If you have a 250-gig table and about 10 GB of RAM, it means you can only fit 4% of the table.
Are there any columns which are commonly used for filtering, such as state or date? Can you identify the working set that is most commonly used (like only last month)? If so, consider partitioning or clustering on these columns, and definitely index them. Basically, you're trying to make sure that as much of the working set as possible fits in RAM.
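A sketch of what that could look like for one of the yearly child tables; the table and column names here are guesses, since the schema wasn't posted:
-- constraint exclusion only works if each child carries a CHECK on the partition key
ALTER TABLE readings_2001
    ADD CONSTRAINT readings_2001_day_check
    CHECK (day >= DATE '2001-01-01' AND day < DATE '2002-01-01');

CREATE INDEX readings_2001_cell_day_idx ON readings_2001 (cell_id, day);
-- physically order the table by the index once loading is finished (it is read-only afterwards)
CLUSTER readings_2001 USING readings_2001_cell_day_idx;
ANALYZE readings_2001;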
Avoid scanning the table at all costs if it does not fit in RAM. If you really need absolutely random access, the only way it could be usable is really sophisticated hardware. You would need a persistent storage/RAM configuration which can read 250 GB in reasonable time.
I have two types of queries I run often on two large datasets. They run much slower than I would expect them to.
The first type is a sequential scan updating all records:
Update rcra_sites Set street = regexp_replace(street,'/','','i')
rcra_sites has 700,000 records. It takes 22 minutes from pgAdmin! I wrote a vb.net function that loops through each record and sends an update query for each record (yes, 700,000 update queries!) and it runs in less than half the time. Hmmm....
The second type is a simple update with a relation and then a sequential scan:
Update rcra_sites as sites
Set violations='No'
From narcra_monitoring as v
Where sites.agencyid=v.agencyid and v.found_violation_flag='N'
narcra_monitoring has 1,700,000 records. This takes 8 minutes. The query planner refuses to use my indexes. The query runs much faster if I first run set enable_seqscan = false;. I would prefer the query planner to do its job.
I have appropriate indexes, and I have vacuumed and analyzed. I optimized my shared_buffers and effective_cache_size as best I know how to use more memory, since I have 4GB. My hardware is pretty darn good. I am running v8.4 on Windows 7.
Is PostgreSQL just this slow? Or am I still missing something?
Possibly try reducing your random_page_cost (default: 4) relative to seq_page_cost: this will reduce the planner's preference for sequential scans by making index-driven random access more attractive.
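For instance (the value and database name are illustrative, not a recommendation):
SET random_page_cost = 2;                       -- try it in the current session first
ALTER DATABASE mydb SET random_page_cost = 2;   -- persist it per database once you are happy with the plans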
Another thing to bear in mind is that MVCC means that updating a row is fairly expensive. In particular, updating every row in a table requires doubling the amount of storage for the table, until it can be vacuumed. So in your first query, you may want to qualify your update:
UPDATE rcra_sites Set street = regexp_replace(street,'/','','i')
where street ~ '/'
(AFAIK PostgreSQL doesn't automatically suppress the update if it looks like you're not actually changing anything. I seem to recall a standard trigger function was added in 8.4 (?) to let you do that, but it's perhaps better to address it on the client side.)
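The trigger function alluded to is presumably suppress_redundant_updates_trigger(), which ships with 8.4 and later; a sketch of how it would be attached:
-- skips UPDATEs that would not change the row at all
CREATE TRIGGER rcra_sites_suppress_noop_updates
    BEFORE UPDATE ON rcra_sites
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();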
When a row is updated, a new row version is written.
If the new row version does not fit in the same disk block, then a new entry pointing to it has to be added to every index on the table.
It is not just indexes on the updated columns that need new entries.
If you have a lot of indexes on rcra_sites, and only one or two frequently updated fields, then you might gain by separating the frequently updated fields into a table of their own.
You can also reduce the fillfactor percentage below its default of 100, so that more of the updates can write the new row version into the same block, in which case the indexes pointing to that row do not need to be touched.
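A sketch of doing that on an existing table; note that the new fillfactor only affects pages written from then on, so the table has to be rewritten for it to apply everywhere:
ALTER TABLE rcra_sites SET (fillfactor = 90);
-- only affects newly written pages; rewrite the table (e.g. with CLUSTER) to apply it to existing data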