Transaction is 20x slower on production server - PostgreSQL

On my development server the test transaction (a series of updates, etc.) runs in about 2 minutes. On the production server it takes about 25 minutes.
The server reads a file and inserts records. It starts out fast but gets slower and slower as it progresses. For each inserted record there is an update to an aggregate table, and it is that update that progressively slows down. The aggregate update queries the same table the inserts are writing to.
The configs differ only in max_worker_processes (dev 8, prod 16), shared_buffers (dev 128MB, prod 512MB), and wal_buffers (dev 4MB, prod 16MB).
I've tried tweaking a few config settings and also dumped the whole database and re-ran initdb, just in case it had not been upgraded (to 9.6) correctly. Nothing has worked.
I'm hoping that someone with experience in this could tell me what to look for.
Edit: After receiving some comments I was able to figure out what is going on and get a workaround going, but I think there has to be a better way. First, here is what is happening:
At first there is no data in the table for the relevant key values, and PostgreSQL works out this plan. Note that there is data in the table, just nothing with the relevant "businessIdentifier" or "transactionNumber".
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 0.915 ms
Execution time: 0.100 ms
Then, as data gets inserted, the plan becomes really bad: 474 ms in this example. The query needs to execute thousands of times depending on what is uploaded, so 474 ms is bad.
Aggregate (cost=16.44..16.45 rows=1 width=4) (actual time=474.222..474.222 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.44 rows=1 width=4) (actual time=474.218..474.218 rows=0 loops=1)
Join Filter: ((eh."transactionNumber")::text = (ed."transactionNumber")::text)
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.11 rows=1 width=9) (actual time=0.023..0.408 rows=507 loops=1)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
-> Index Scan using transactionlineprovdateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.934..0.934 rows=0 loops=507)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("reconciliationNumber" = 4519)
Rows Removed by Filter: 2520
Planning time: 0.848 ms
Execution time: 474.278 ms
VACUUM ANALYZE fixes it, but you cannot run VACUUM ANALYZE until after the transaction is committed. After VACUUM ANALYZE, PostgreSQL uses a different plan and the query is back down to 0.1 ms.
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.072..0.072 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.069..0.069 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.067..0.067 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 1.134 ms
Execution time: 0.141 ms
My workaround is to commit after about 100 inserts, run VACUUM ANALYZE, and then continue. The only problem is that if something in the remainder of the data fails and is rolled back, the batches that were already committed will still be inserted.
Is there a better way to handle this? Should I just upgrade to version 10 or 11 of PostgreSQL, and would that help?

There is an aggregate table update for each record that gets inserted and it is that update that progressively slows down.
Here is an idea: change the workflow to (1) import the external data into a staging table using the COPY interface, (2) index and ANALYZE that data, (3) run the final UPDATE with all the required joins/groupings to do the actual transformation and update the aggregate table; see the sketch below.
All of that could be done in one long transaction, if needed.
Only if the whole thing locks some vital database objects for too long should you consider splitting it into separate transactions/batches (processing the data partitioned in some generic way, by date/time or by ID).
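A rough sketch of that workflow. The staging table, the aggregate table, and the file path are made up here; the "transactionLine", "businessIdentifier" and "reconciliationNumber" identifiers are taken from the plans above:
BEGIN;

-- (1) bulk-load the uploaded file into a staging table (or use \copy from the client)
CREATE TEMP TABLE staging_lines (LIKE "transactionLine" INCLUDING DEFAULTS);
COPY staging_lines FROM '/path/to/upload.csv' WITH (FORMAT csv);

-- (2) index and ANALYZE the freshly loaded data so the planner has statistics
CREATE INDEX ON staging_lines ("businessIdentifier", "reconciliationNumber");
ANALYZE staging_lines;

-- (3) one set-based pass instead of one aggregate update per inserted row
INSERT INTO "transactionLine" SELECT * FROM staging_lines;

UPDATE aggregates a                            -- hypothetical aggregate table
SET    line_count = a.line_count + t.cnt
FROM  (SELECT "businessIdentifier", count(*) AS cnt
       FROM   staging_lines
       GROUP  BY "businessIdentifier") t
WHERE  a."businessIdentifier" = t."businessIdentifier";

COMMIT;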
But you cannot run Vacuum analyze until after the transaction is committed.
To get updated cost estimates for the query plan you only need ANALYZE, not VACUUM, and a plain ANALYZE can be run inside a transaction block.
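For example, a minimal sketch using the table name from the plans above; the loading transaction can refresh statistics itself after each batch of inserts, without committing:
BEGIN;
-- ... first few hundred inserts and aggregate updates ...
ANALYZE "transactionLine";   -- allowed inside a transaction block, unlike VACUUM
-- ... subsequent statements are planned against the fresh statistics ...
COMMIT;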

Related

Postgres Optimizing Free Text Search when many results are returned

We are building a lightweight text search on top of our data in Postgres with GIN indexes. When the matched data set is small, it works really fast. However, if we search for common terms, the performance degrades significantly due to the many matches.
Consider the following query:
EXPLAIN ANALYZE
SELECT count(id)
FROM data_change_records d
WHERE to_tsvector('english', d.content) @@ websearch_to_tsquery('english', 'mustafa');
The result is as follows:
Finalize Aggregate (cost=47207.99..47208.00 rows=1 width=8) (actual time=15.461..17.129 rows=1 loops=1)
-> Gather (cost=47207.78..47207.99 rows=2 width=8) (actual time=9.734..17.119 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=46207.78..46207.79 rows=1 width=8) (actual time=3.773..3.774 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on data_change_records d (cost=1759.41..46194.95 rows=5130 width=37) (actual time=1.765..3.673 rows=1143 loops=3)
Recheck Cond: (to_tsvector('english'::regconfig, content) @@ '''mustafa'''::tsquery)
Heap Blocks: exact=2300
-> Bitmap Index Scan on data_change_records_content_to_tsvector_idx (cost=0.00..1756.33 rows=12311 width=0) (actual time=4.219..4.219 rows=3738 loops=1)
Index Cond: (to_tsvector('english'::regconfig, content) @@ '''mustafa'''::tsquery)
Planning Time: 0.141 ms
Execution Time: 17.163 ms
If mustafa is replaced with a common term like aws, which the tokenizer reduces to aw, the analysis is as follows:
Finalize Aggregate (cost=723889.39..723889.40 rows=1 width=8) (actual time=1073.513..1086.414 rows=1 loops=1)
-> Gather (cost=723889.17..723889.38 rows=2 width=8) (actual time=1069.439..1086.401 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=722889.17..722889.18 rows=1 width=8) (actual time=1063.847..1063.848 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on data_change_records d (cost=17128.34..721138.59 rows=700233 width=37) (actual time=389.347..1014.440 rows=542724 loops=3)
Recheck Cond: (to_tsvector('english'::regconfig, content) @@ '''aw'''::tsquery)
Heap Blocks: exact=167605
-> Bitmap Index Scan on data_change_records_content_to_tsvector_idx (cost=0.00..16708.20 rows=1680560 width=0) (actual time=282.517..282.518 rows=1647916 loops=1)
Index Cond: (to_tsvector('english'::regconfig, content) @@ '''aw'''::tsquery)
Planning Time: 0.150 ms
Execution Time: 1086.455 ms
At this point, we are not sure how to proceed. Options include changing the tokenization to disallow 2-letter words. We have lots of aws indexed, and that is the cause. For instance, if we search for ok, which is also a 2-letter word but not that common, the query returns in 61.378 ms.
Searching for frequent words can never be as fast as searching for rare words.
One thing that strikes me is that you are using English stemming to search for names. If that is really your use case, you should use the simple dictionary that wouldn't stem aws to aw.
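A minimal sketch of that approach; the table and column names come from the plans above, and the index name is made up:
-- Index the unstemmed lexemes using the 'simple' configuration
CREATE INDEX data_change_records_content_simple_idx
    ON data_change_records
    USING gin (to_tsvector('simple', content));

-- Query with the same configuration so 'aws' is matched literally, not as 'aw'
SELECT count(*)
FROM data_change_records d
WHERE to_tsvector('simple', d.content) @@ websearch_to_tsquery('simple', 'aws');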
Alternatively, you could introduce an additional synonym dictionary to a custom text search configuration that contains aws and prevents stemming.
But, as I said, searching for frequent words cannot be fast if you want all result rows. A trick you could use is to set gin_fuzzy_search_limit to the limit of hits you want to find, then the index scan will stop early and may be faster (but you won't get all results).
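A sketch of that trick; the limit value here is arbitrary:
-- Stop the GIN index scan after roughly this many hits; the result becomes approximate
SET gin_fuzzy_search_limit = 10000;   -- the default of 0 means no limit

SELECT count(*)
FROM data_change_records d
WHERE to_tsvector('english', d.content) @@ websearch_to_tsquery('english', 'aws');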
If you have a new-enough version of PostgreSQL and your table is well-vacuumed, you can get a bitmap-only scan, which doesn't need to visit the table, just the index. But you would need to use count(*), not count(id), to get that. If "id" is never NULL, the two give identical answers.
The query plan does not make it easy to tell when the bitmap-only optimization kicks in or how effective it is. If you use EXPLAIN (ANALYZE, BUFFERS) you should get at least some clue based on the buffer counts.
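Something along these lines, as a sketch of the suggested variant:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)   -- count(*) instead of count(id), so heap rows need not be fetched
FROM data_change_records d
WHERE to_tsvector('english', d.content) @@ websearch_to_tsquery('english', 'aws');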

LockRows plan node taking long time

I have the following query in Postgres (emulating a work queue):
DELETE FROM work_queue
WHERE id IN (
    SELECT l.id
    FROM work_queue l
    WHERE l.delivered = 'f' AND l.error = 'f' AND l.archived = 'f'
    ORDER BY created_at
    LIMIT 5000
    FOR UPDATE SKIP LOCKED
);
While running the above concurrently (4 processes per second) alongside a concurrent ingest of 10K records/second into work_queue, the query effectively bottlenecks on the LockRows node.
Query plan output:
Delete on work_queue (cost=478.39..39609.09 rows=5000 width=67) (actual time=38734.995..38734.998 rows=0 loops=1)
-> Nested Loop (cost=478.39..39609.09 rows=5000 width=67) (actual time=36654.711..38507.393 rows=5000 loops=1)
-> HashAggregate (cost=477.96..527.96 rows=5000 width=98) (actual time=36654.690..36658.495 rows=5000 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.43..465.46 rows=5000 width=98) (actual time=36600.963..36638.250 rows=5000 loops=1)
-> Limit (cost=0.43..415.46 rows=5000 width=51) (actual time=36600.958..36635.886 rows=5000 loops=1)
-> LockRows (cost=0.43..111701.83 rows=1345680 width=51) (actual time=36600.956..36635.039 rows=5000 loops=1)
-> Index Scan using work_queue_created_at_idx on work_queue l (cost=0.43..98245.03 rows=1345680 width=51) (actual time=779.706..2690.340 rows=250692 loops=1)
Filter: ((NOT delivered) AND (NOT error) AND (NOT archived))
-> Index Scan using work_queue_pkey on work_queue (cost=0.43..7.84 rows=1 width=43) (actual time=0.364..0.364 rows=1 loops=5000)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 8.424 ms
Trigger for constraint work_queue_logs_work_queue_id_fkey: time=5490.925 calls=5000
Trigger work_queue_locked_trigger: time=2119.540 calls=1
Execution Time: 46346.471 ms
(corresponding visualization: https://explain.dalibo.com/plan/ZaZ)
Any ideas on improving this? Why should locking rows take so long in the presence of concurrent inserts? Note that if I do not have concurrent inserts into the work_queue table, the query is super fast.
We can see that the index scan returned 250692 rows in order to find 5000 to lock, so apparently it had to skip over 49 other queries' worth of locked rows. That is not going to be very efficient, although if the situation were static it shouldn't be as slow as what you see here. But it has to acquire a transient exclusive lock on a section of memory for each attempt. If it is fighting with many other processes for those locks, you can get a cascading collapse of performance.
If you are launching 4 such statements per second with no cap and without waiting for any previous ones to finish, then you have an unstable situation. The more you have running at one time, the more they fight each other and slow down. If the completion rate goes down but the launch interval does not, then you just get more processes fighting with more other processes and each getting slower. So once you get shoved over the edge, it might never recover on its own.
The role of the concurrent insertions is probably just to provide enough noisy load on the system to give the collapse a chance to take hold. And of course without concurrent insertion, your deletes are going to run out of things to delete pretty soon, at which point they will be very fast.

Postgres not using Covering Index with Aggregate

Engine version: 12.4
Postgres wasn't using an index-only scan, so I ran VACUUM ANALYZE VERBOSE table_name. After that it started using an index-only scan. Earlier, when I had run ANALYZE VERBOSE table_name without VACUUM, the index-only scan wasn't used.
So it seems there is a very heavy dependency on VACUUM for the index-only plan to be used. Is there any way to eliminate this dependency, or should we run VACUUM very regularly, e.g. daily?
Our objective is to reduce CPU usage. Overall machine CPU usage is 10%-15% throughout the day, but when this query runs the CPU goes very high (this query runs in multiple threads at the same time with different values).
EXPLAIN ANALYZE SELECT COALESCE(requested_debit, 0) AS requestedDebit, COALESCE(requested_credit, 0) AS requestedCredit
FROM (SELECT COALESCE(Sum(le.credit), 0) AS requested_credit, COALESCE(Sum(le.debit), 0) AS requested_debit
FROM ledger_entries le
WHERE le.accounting_entity_id = 1
AND le.general_ledger_id = 503
AND le.post_date BETWEEN '2020-09-10' AND '2020-11-30') AS requested_le;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on requested_le (cost=66602.65..66602.67 rows=1 width=64) (actual time=81.263..81.352 rows=1 loops=1)
-> Finalize Aggregate (cost=66602.65..66602.66 rows=1 width=64) (actual time=81.261..81.348 rows=1 loops=1)
-> Gather (cost=66602.41..66602.62 rows=2 width=64) (actual time=79.485..81.331 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=65602.41..65602.42 rows=1 width=64) (actual time=74.293..74.294 rows=1 loops=3)
-> Parallel Index Only Scan using post_date_gl_id_accounting_entity_id_include_idx on ledger_entries le (cost=0.56..65203.73 rows=79735 width=8) (actual time=47.874..74.212 rows=197 loops=3)
Index Cond: ((post_date >= '2020-09-10'::date) AND (post_date <= '2020-11-30'::date) AND (general_ledger_id = 503) AND (accounting_entity_id = 1))
Heap Fetches: 0
Planning Time: 0.211 ms
Execution Time: 81.395 ms
(11 rows)
There is a very strong connection between VACUUM and index-only scans in PostgreSQL: an index-only scan can only skip fetching the table row (to check for visibility) if the block containing the row is marked all-visible in the visibility map. And the visibility map is updated by VACUUM.
So yes, you have to VACUUM often to get efficient index-only scans.
Typically, there is no need to schedule a manual VACUUM, you can simply
ALTER TABLE mytab SET (autovacuum_vacuum_scale_factor = 0.01);
(or a similar value) and let autovacuum do the job.
The only case where this is problematic is insert-only tables, because for them autovacuum won't be triggered on PostgreSQL versions below v13. In v13 you can simply change autovacuum_vacuum_insert_scale_factor, while in older versions you will have to set autovacuum_freeze_max_age to a low value for that table; see the sketch below.
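For example, a sketch using the table name from the question; the values are only starting points to tune:
-- v13 and later: let inserts alone trigger autovacuum sooner
ALTER TABLE ledger_entries SET (autovacuum_vacuum_insert_scale_factor = 0.01);

-- Before v13: force periodic vacuums of an insert-only table via a low freeze age
ALTER TABLE ledger_entries SET (autovacuum_freeze_max_age = 10000000);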

How do I improve the performance of my delete query if the bottleneck appears to be I/O?

I have a (badly designed) database table (Postgres) that I'm trying to clean up. The table is about 270 GB in size and has 38K rows (roughly 70 MB per row; the columns contain file contents).
In parallel with changing the design to no longer store files, I want to remove 80% of the data to reduce disk usage. Hence, I've tried the following query:
DELETE FROM table_name
WHERE table_name.env = 'AE';
This should cover roughly 25% of the data. The query times out without any warning, or sometimes reports that the log files are full: PANIC: could not write to file "pg_xlog/xlogtemp.32455": No space left on device
I've also tried:
DELETE FROM table_name
WHERE ctid IN (
SELECT ctid
FROM table_name
WHERE table_name.env = 'AE'
LIMIT 1000)
This works, but it is incredibly slow (200-250 ms per row deleted) and times out if I go much above a LIMIT of 1000.
To find the bottleneck, I ran the query above with EXPLAIN (ANALYZE, BUFFERS, TIMING) on a smaller version (LIMIT 1 instead of LIMIT 1000), resulting in the following plan:
QUERY PLAN
Delete on dynamic_data (cost=0.38..4410.47 rows=1 width=36) (actual time=338.913..338.913 rows=0 loops=1)
Buffers: shared hit=7972 read=988 dirtied=975
I/O Timings: read=312.160
-> Nested Loop (cost=0.38..4410.47 rows=1 width=36) (actual time=3.919..13.700 rows=1 loops=1)
Join Filter: (dynamic_data.ctid = "ANY_subquery".ctid)
Rows Removed by Join Filter: 35938
Buffers: shared hit=4013
-> Unique (cost=0.38..0.39 rows=1 width=36) (actual time=2.786..2.788 rows=1 loops=1)
Buffers: shared hit=479
-> Sort (cost=0.38..0.39 rows=1 width=36) (actual time=2.786..2.787 rows=1 loops=1)
Sort Key: "ANY_subquery".ctid
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=479
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.37 rows=1 width=36) (actual time=2.753..2.753 rows=1 loops=1)
Buffers: shared hit=474
-> Limit (cost=0.00..0.36 rows=1 width=6) (actual time=2.735..2.735 rows=1 loops=1)
Buffers: shared hit=474
-> Seq Scan on dynamic_data dynamic_data_1 (cost=0.00..4020.71 rows=11093 width=6) (actual time=2.735..2.735 rows=1 loops=1)
Filter: (env = 'AE'::text)
Rows Removed by Filter: 5614
Buffers: shared hit=474
-> Seq Scan on dynamic_data (cost=0.00..3923.37 rows=38937 width=6) (actual time=0.005..8.130 rows=35939 loops=1)
Buffers: shared hit=3534
Planning time: 0.354 ms
Execution time: 338.969 ms
My main take-away from the query plan is that I/O timings take 312/338 = 92% of the time:
actual time=338.913..338.913 rows=0 loops=1)
Buffers: shared hit=7972 read=988 dirtied=975
I/O Timings: read=312.160
I can't find anything on how to improve the I/O performance of this query without changing the database configuration. Is this simply an unfortunate effect of the large table rows / database? Or am I missing something important? How do I speed up this delete operation?
Without any resolution I'm defaulting to having a script delete 1 row at a time with separate queries. Far from ideal, so I hope you have suggestions.
Note that I am not administering the database and not authorized to make any changes to its configuration, nor is it likely that the DBA will change the config to cope with my ill-designed setup.

Phrase frequency counter with FULL Text Search of PostgreSQL 9.6

I need to calculate the number of times a phrase appears, using a ts_query against an indexed text field (tsvector data type). It works, but it is very slow because the table is huge. For single words I pre-calculated all the frequencies, but I have no idea how to speed up a phrase search.
Edit: Thank you for your reply, @jjanes.
This is my query:
SELECT substring(date_input::text, 0, 5) AS myear,
       ts_headline('simple', text_input, q,
                   'StartSel=<b>, StopSel=</b>, MaxWords=2, MinWords=1, ShortWord=1, HighlightAll=FALSE, MaxFragments=9999, FragmentDelimiter=" ... "') AS headline
FROM db_test,
     to_tsquery('simple', 'united<->kingdom') AS q
WHERE date_input BETWEEN '2000-01-01'::DATE AND '2019-12-31'::DATE
  AND idxfti_simple @@ q
And this is the EXPLAIN (ANALYZE, BUFFERS) output:
Nested Loop (cost=25408.33..47901.67 rows=5509 width=64) (actual time=286.536..17133.343 rows=38127 loops=1)
Buffers: shared hit=224723
-> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.007 rows=1 loops=1)
-> Append (cost=25408.33..46428.00 rows=5510 width=625) (actual time=285.372..864.868 rows=38127 loops=1)
Buffers: shared hit=165713
-> Bitmap Heap Scan on db_test (cost=25408.33..46309.01 rows=5509 width=625) (actual time=285.368..791.111 rows=38127 loops=1)
Recheck Cond: ((idxfti_simple @@ q.q) AND (date_input >= '2000-01-01'::date) AND (date_input <= '2019-12-31'::date))
Rows Removed by Index Recheck: 136
Heap Blocks: exact=29643
Buffers: shared hit=165607
-> BitmapAnd (cost=25408.33..25408.33 rows=5509 width=0) (actual time=278.370..278.371 rows=0 loops=1)
Buffers: shared hit=3838
-> Bitmap Index Scan on idxftisimple_idx (cost=0.00..1989.01 rows=35869 width=0) (actual time=67.280..67.281 rows=176654 loops=1)
Index Cond: (idxfti_simple @@ q.q)
Buffers: shared hit=611
-> Bitmap Index Scan on db_test_date_input_idx (cost=0.00..23142.24 rows=1101781 width=0) (actual time=174.711..174.712 rows=1149456 loops=1)
Index Cond: ((date_input >= '2000-01-01'::date) AND (date_input <= '2019-12-31'::date))
Buffers: shared hit=3227
-> Seq Scan on test (cost=0.00..118.98 rows=1 width=451) (actual time=0.280..0.280 rows=0 loops=1)
Filter: ((date_input >= '2000-01-01'::date) AND (date_input <= '2019-12-31'::date) AND (idxfti_simple @@ q.q))
Rows Removed by Filter: 742
Buffers: shared hit=106
Planning time: 0.332 ms
Execution time: 17176.805 ms
Sorry, I can't turn track_io_timing on. I do know that ts_headline is not recommended, but I need it to calculate the number of times that a phrase appears in the same field.
Thank you in advance for your help.
Note that fetching the rows in Bitmap Heap Scan is quite fast, <0.8 seconds, and almost all the time is spent in the top-level node. That time is likely to be spent in ts_headline, reparsing the text_input document. As long as you keep using ts_headline, there isn't much you can do about this.
ts_headline doesn't directly give you what you want (frequency), so you must be doing some kind of post-processing of it. Maybe you could move to postprocessing the tsvector directly, so the document doesn't need to be reparsed.
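A rough sketch of that idea, counting adjacent 'united'/'kingdom' positions directly from the stored tsvector with unnest(tsvector) (available since 9.6). The table and column names come from the query above, and it assumes positions were not stripped from idxfti_simple:
SELECT d.date_input, count(*) AS phrase_hits
FROM db_test d
CROSS JOIN LATERAL unnest(d.idxfti_simple) AS u1(lexeme, positions, weights)
CROSS JOIN LATERAL unnest(u1.positions)    AS p1(pos)
CROSS JOIN LATERAL unnest(d.idxfti_simple) AS u2(lexeme, positions, weights)
CROSS JOIN LATERAL unnest(u2.positions)    AS p2(pos)
WHERE d.idxfti_simple @@ to_tsquery('simple', 'united <-> kingdom')  -- let the GIN index narrow the rows first
  AND u1.lexeme = 'united'
  AND u2.lexeme = 'kingdom'
  AND p2.pos = p1.pos + 1            -- 'kingdom' immediately follows 'united'
GROUP BY d.date_input;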
Another option is to upgrade further, which could allow the work of ts_headline to be spread over multiple CPUs. PostgreSQL 9.6 was the first version which supported parallel query, and it was not mature enough in that version to be able to parallelize this type of thing. v10 is probably enough to get parallel query for this, but you might as well jump all the way to v12.
Version 9.2 is old and out of support. It didn't have native support for phrase searching in the first place (introduced in 9.6).
Please upgrade.
And if it is still slow, show us the query, and the EXPLAIN (ANALYZE, BUFFERS) for it, preferably with track_io_timing turned on.