LockRows plan node taking long time - postgresql

I have the following query in Postgres (emulating a work queue):
DELETE FROM work_queue
WHERE id IN ( SELECT l.id
FROM work_queue l
WHERE l.delivered = 'f' and l.error = 'f' and l.archived = 'f'
ORDER BY created_at
LIMIT 5000
FOR UPDATE SKIP LOCKED );
While running the above concurrently (4 processes per second), along with a concurrent ingest into work_queue at a rate of 10K records/second, the query effectively bottlenecks on the LockRows node.
Query plan output:
Delete on work_queue (cost=478.39..39609.09 rows=5000 width=67) (actual time=38734.995..38734.998 rows=0 loops=1)
-> Nested Loop (cost=478.39..39609.09 rows=5000 width=67) (actual time=36654.711..38507.393 rows=5000 loops=1)
-> HashAggregate (cost=477.96..527.96 rows=5000 width=98) (actual time=36654.690..36658.495 rows=5000 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.43..465.46 rows=5000 width=98) (actual time=36600.963..36638.250 rows=5000 loops=1)
-> Limit (cost=0.43..415.46 rows=5000 width=51) (actual time=36600.958..36635.886 rows=5000 loops=1)
-> LockRows (cost=0.43..111701.83 rows=1345680 width=51) (actual time=36600.956..36635.039 rows=5000 loops=1)
-> Index Scan using work_queue_created_at_idx on work_queue l (cost=0.43..98245.03 rows=1345680 width=51) (actual time=779.706..2690.340 rows=250692 loops=1)
Filter: ((NOT delivered) AND (NOT error) AND (NOT archived))
-> Index Scan using work_queue_pkey on work_queue (cost=0.43..7.84 rows=1 width=43) (actual time=0.364..0.364 rows=1 loops=5000)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 8.424 ms
Trigger for constraint work_queue_logs_work_queue_id_fkey: time=5490.925 calls=5000
Trigger work_queue_locked_trigger: time=2119.540 calls=1
Execution Time: 46346.471 ms
(corresponding visualization: https://explain.dalibo.com/plan/ZaZ)
Any ideas on improving this? Why should locking rows take so long in the presence of concurrent inserts? Note that if I do not have concurrent inserts into the work_queue table, the query is super fast.

We can see that the index scan returned 250692 rows in order to find 5000 to lock. So apparently we had to skip over 49 other queries worth of locked rows. That is not going to be very efficient, although if static it shouldn't be as slow as you see here. But it has to acquire a transient exclusive lock on a section of memory for each attempt. If it is fighting with many other processes for those locks, you can get a cascading collapse of performance.
If you are launching 4 such statements per second with no cap and without waiting for any previous ones to finish, then you have an unstable situation. The more you have running at one time, the more they fight each other and slow down. If the completion rate goes down but the launch interval does not, then you just get more processes fighting with more other processes and each getting slower. So once you get shoved over the edge, it might never recover on its own.
The role of concurrent insertions is probably just to provide enough noisy load on the system to give the collapse a chance to take hold. And of course, without concurrent insertion, your deletes are going to run out of things to delete pretty soon, at which point they will be very fast.
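To make the "shoved over the edge" argument concrete, here is a toy simulation. It is only a sketch: the function name, the base work time, and the quadratic slowdown factor are assumptions chosen for illustration, not measurements of PostgreSQL. It launches one statement every 0.25 s with no cap and slows every running statement as more of them compete.

```python
def simulate(burst, seconds=60, base=0.1, c=0.05):
    """Return how many statements are still running after `seconds`.

    Toy model (all numbers are assumptions, not measurements):
    - one statement is launched every 0.25 s (4 per second, no cap);
    - a statement needs `base` seconds of work when it runs alone;
    - with n statements running, each one is slowed by 1 + c * (n - 1)**2,
      a stand-in for skipping rows locked by others while fighting for locks.
    `burst` extra statements are injected at time zero to mimic a hiccup.
    """
    dt = 0.05                 # simulation step in seconds
    steps_per_launch = 5      # 5 * 0.05 s = one launch every 0.25 s
    remaining = [base] * burst
    for step in range(seconds * 20):
        if step % steps_per_launch == 0:
            remaining.append(base)          # fixed launch interval
        n = len(remaining)
        progress = dt / (1 + c * (n - 1) ** 2)
        # Keep only the statements that still have work left.
        remaining = [r - progress for r in remaining if r - progress > 1e-12]
    return len(remaining)


for burst in (0, 10, 80):
    print(f"burst={burst:3d} -> still running after 60 s: {simulate(burst)}")
```

In this model a small hiccup drains away, while a large one pushes total throughput below the launch rate, so the backlog only grows. Capping the number of concurrent workers, or having each worker wait for its previous statement to finish, removes that feedback loop.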

Related

Why do seq/index scans take so long when running query after a while? How to make it fast?

Problem:
I have a query that joins three tables. Whenever I run this query after a while (say 24hrs), it would take a lot of time to execute. But from that time onwards, it would execute really fast (~ 70x faster). I wanted to know what's the problem that it takes so long to execute for the first time, and how to solve it.
Table conditions:
The tables are property_2, property_attribute_2, and property_address_2. Each is a partition of a bigger table (property, property_attribute, and property_address, respectively). Rows in property_attribute_2 and property_address_2 reference property_2 through a foreign key on the column property_id. These columns (property.id, property_attribute_2.property_id, and property_address_2.property_id) are all indexed.
The query is:
select * from public.property_2 a
inner join public.property_attribute_2 b on a.id = b.property_id
left join public.property_address_2 c on a.id=c.property_id
The query plan when I run it after a while is:
Hash Right Join (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=804159.499..1065892.338 rows=2477924 loops=1)
Hash Cond: (c.property_id = a.id)
-> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=289.781..247906.955 rows=4257973 loops=1)
-> Hash (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=803833.183..803833.185 rows=2477921 loops=1)
Buckets: 32768 Batches: 128 Memory Usage: 3165kB
-> Hash Join (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=98061.326..802753.642 rows=2477921 loops=1)
Hash Cond: (a.id = b.property_id)
-> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=1349.284..696922.438 rows=4272433 loops=1)
-> Hash (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=95497.307..95497.308 rows=2477921 loops=1)
Buckets: 65536 Batches: 64 Memory Usage: 2624kB
-> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=464.476..94126.890 rows=2477921 loops=1)
Planning time: 4.034 ms
Execution time: 1065995.827 ms
And the query plan after the first run is:
Hash Right Join (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=8828.873..13764.283 rows=2477924 loops=1)
Hash Cond: (c.property_id = a.id)
-> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=0.050..1411.877 rows=4257973 loops=1)
-> Hash (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=8826.620..8826.623 rows=2477921 loops=1)
Buckets: 32768 Batches: 128 Memory Usage: 3165kB
-> Hash Join (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=1356.224..7925.850 rows=2477921 loops=1)
Hash Cond: (a.id = b.property_id)
-> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=0.034..2652.013 rows=4272433 loops=1)
-> Hash (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=1354.828..1354.829 rows=2477921 loops=1)
Buckets: 65536 Batches: 64 Memory Usage: 2624kB
-> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=0.023..630.081 rows=2477921 loops=1)
Planning time: 1.181 ms
Execution time: 13872.977 ms
Also worth noting that I have a couple of other Postgres databases on this machine and different jobs use different tables on these databases on a regular basis.
If cold cache is the problem, as it seems to be the case, you can warm it up before running the query. Postgres ships with the additional module pg_prewarm providing a range of tools to populate the cache.
Instructions how to set it up here:
PostgreSQL: Force data into memory
Then you run something like:
SELECT pg_prewarm('public.property_2', 'prefetch');
SELECT pg_prewarm('public.property_attribute_2', 'prefetch');
SELECT pg_prewarm('public.property_address_2', 'prefetch');
Of course, if you always run the same SELECT query without filter predicates, you might as well just run the same query to populate the cache, without using the fancy module. Possibly scheduled with a cron job?
"... are all indexed."
As you can see in the EXPLAIN output, your indexes go unused. You fetch all rows without filter predicate, so indexes typically won't help. And you do it with SELECT *, i.e. get all columns from all joined tables, so index-only scans are out, too. You might improve performance by listing only the columns you actually need in the SELECT list.
Obviously, more RAM (and proper configuration for PostgreSQL buffer cache) can help, too.
Or you might be able to reduce RAM requirements with VACUUM (FULL) or, possibly, with an optimized table definition with proper column types and order. Not just for the tables at hand, also for other tables competing for the same resources (thereby evicting "your" blocks from the cache). See:
Calculating and saving space in PostgreSQL
The difference must be caching: the first time, the data are read from disk, in subsequent runs they are found in RAM. Run EXPLAIN (ANALYZE, BUFFERS) with track_io_timing = on to confirm that.
However, it seems that either your I/O system is really slow or your tables are quite bloated. EXPLAIN (ANALYZE, BUFFERS) would show how many blocks are read, so you would know.
If bloat is indeed your problem, VACUUM (FULL) would help.
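As a rough cross-check of the "slow I/O or bloat" suspicion, the plan lines in the question already let you estimate table size and read throughput. The sketch below assumes the planner ran with default cost parameters (seq_page_cost = 1, cpu_tuple_cost = 0.01) and the default 8 kB block size; the function name is made up for illustration.

```python
def seq_scan_io(total_cost, est_rows, actual_ms, actual_rows,
                seq_page_cost=1.0, cpu_tuple_cost=0.01, block_size=8192):
    """Back out table size and read throughput from one Seq Scan plan line.

    Assumes default planner cost parameters, so that for an unfiltered scan
    total_cost = pages * seq_page_cost + est_rows * cpu_tuple_cost.
    """
    pages = round((total_cost - est_rows * cpu_tuple_cost) / seq_page_cost)
    size_mib = pages * block_size / 2**20
    return {
        "pages": pages,
        "size_mib": size_mib,
        "mib_per_s": size_mib / (actual_ms / 1000),
        "bytes_per_row": pages * block_size / actual_rows,
    }


# Numbers copied from the cold-cache "Seq Scan on property_2 a" line.
cold = seq_scan_io(265463.84, 6176884, 696922.438, 4272433)
for key, value in cold.items():
    print(f"{key}: {value:.1f}")
```

Under these assumptions the cold scan reads on the order of 1.5 GiB at only a few MiB per second, and the table occupies several hundred bytes per returned row against a reported row width of 105. Both figures are consistent with the answer's suspicion, but EXPLAIN (ANALYZE, BUFFERS) remains the way to confirm it.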

Postgres not using Covering Index with Aggregate

Engine version: 12.4
Postgres wasn't using an index-only scan, so I ran vacuum analyze verbose table_name. After that it started using an index-only scan. Earlier, when I had run analyze verbose table_name without vacuum, the index-only scan wasn't used.
So there seems to be a very heavy dependency on vacuum for the index-only plan. Is there any way to eliminate this dependency, or should we schedule vacuum very regularly, for example daily?
Our objective is to reduce CPU usage. Overall machine CPU usage is 10%-15% throughout the day, but when this query runs the CPU goes very high (the query runs in multiple threads at the same time with different values).
EXPLAIN ANALYZE SELECT COALESCE(requested_debit, 0) AS requestedDebit, COALESCE(requested_credit, 0) AS requestedCredit
FROM (SELECT COALESCE(Sum(le.credit), 0) AS requested_credit, COALESCE(Sum(le.debit), 0) AS requested_debit
FROM ledger_entries le
WHERE le.accounting_entity_id = 1
AND le.general_ledger_id = 503
AND le.post_date BETWEEN '2020-09-10' AND '2020-11-30') AS requested_le;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on requested_le (cost=66602.65..66602.67 rows=1 width=64) (actual time=81.263..81.352 rows=1 loops=1)
-> Finalize Aggregate (cost=66602.65..66602.66 rows=1 width=64) (actual time=81.261..81.348 rows=1 loops=1)
-> Gather (cost=66602.41..66602.62 rows=2 width=64) (actual time=79.485..81.331 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=65602.41..65602.42 rows=1 width=64) (actual time=74.293..74.294 rows=1 loops=3)
-> Parallel Index Only Scan using post_date_gl_id_accounting_entity_id_include_idx on ledger_entries le (cost=0.56..65203.73 rows=79735 width=8) (actual time=47.874..74.212 rows=197 loops=3)
Index Cond: ((post_date >= '2020-09-10'::date) AND (post_date <= '2020-11-30'::date) AND (general_ledger_id = 503) AND (accounting_entity_id = 1))
Heap Fetches: 0
Planning Time: 0.211 ms
Execution Time: 81.395 ms
(11 rows)
There is a very strong connection between VACUUM and index-only scans in PostgreSQL: an index-only scan can only skip fetching the table row (to check for visibility) if the block containing the row is marked all-visible in the visibility map. And the visibility map is updated by VACUUM.
So yes, you have to VACUUM often to get efficient index-only scans.
Typically, there is no need to schedule a manual VACUUM, you can simply
ALTER TABLE mytab SET (autovacuum_vacuum_scale_factor = 0.01);
(or a similar value) and let autovacuum do the job.
The only case where this is problematic is insert-only tables, because for them autovacuum won't be triggered in PostgreSQL versions below v13. In v13, you can simply change autovacuum_vacuum_insert_scale_factor, while in older versions you will have to set autovacuum_freeze_max_age to a low value for that table.
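To see what lowering the scale factor buys you: autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of rows). The sketch below evaluates that formula with the default threshold (50) and default scale factor (0.2) against the 0.01 suggested above. The 10-million-row table size is a hypothetical figure, not taken from the question.

```python
def autovacuum_trigger(reltuples, scale_factor=0.2, threshold=50):
    """Dead-tuple count above which autovacuum vacuums the table.

    Defaults mirror autovacuum_vacuum_scale_factor = 0.2 and
    autovacuum_vacuum_threshold = 50.
    """
    return round(threshold + scale_factor * reltuples)


rows = 10_000_000  # hypothetical table size, for illustration only
print("default scale factor:", autovacuum_trigger(rows))
print("scale factor 0.01:   ", autovacuum_trigger(rows, scale_factor=0.01))
```

With the smaller factor, the visibility map is refreshed after far fewer row changes, which keeps "Heap Fetches" low for index-only scans.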

Inordinately slow Nested Loop with join on simple query

I'm running the query below against the primary key lt_id (no other index bar the pkey btree) and joining against 1000 ids.
It might just be my lack of experience with Postgres, but this seems maybe an order of magnitude too slow. There are 800k rows in the table in total.
This is a low spec machine(4G mem) but still thought it should be faster. CPU is idle.
EXPLAIN (ANALYZE,BUFFERS) SELECT lt_id FROM "mytable" d INNER JOIN ( VALUES (1839147),(...998 more rows here...),(1756908)) v(id) ON (d.lt_id = v.id);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..7743.00 rows=1000 width=4) (actual time=69.852..20743.393 rows=1000 loops=1)
Buffers: shared hit=2395 read=1607
-> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.004..4.770 rows=1000 loops=1)
-> Index Only Scan using lt_id_idx on mytable d (cost=0.42..7.73 rows=1 width=4) (actual time=20.732..20.732 rows=1 loops=1000)
Index Cond: (lt_id = "*VALUES*".column1)
Heap Fetches: 1000
Buffers: shared hit=2395 read=1607
Planning Time: 86.284 ms
Execution Time: 20744.223 ms
(9 rows)
psql 11.7. I was using 9 but upgraded to 11.7; no real difference in speed observed.
free
total used free shared buff/cache available
Mem: 3783732 158076 3400932 55420 224724 3366832
Swap: 0 0 0
Even though it's low spec should it really be taking 20 seconds? In fact many other queries are taking twice as long or more. 20 seconds seems to be the best case scenario. There are a couple of other text columns in the table with some small text articles which I doubt is the issue.
I was previously using IN operator but observed similar or worse speeds.
I also made a couple of small changes from the default config, but it doesn't seem to make much difference.
work_mem = 32MB
shared_buffers = 512MB
Any ideas if this is expected performance given the machine? Or is there something else I can try?
Edit: I guess what I'm curious about is the time in the actual loop:
actual time=20.732..20.732 rows=1 loops=1000
It seems like the actual time is less than or equal to 1 ms per loop, which in the worst case would be less than 1 second for 1000 iterations, and the other operations also seem negligible. Does this mean the issue is simply I/O, i.e. a slow disk? What would typically be the situation here?
I notice that if I run the query on my desktop, which only has 8G of RAM but uses an SSD, the query is massively faster.
Using an SSD is fine of course, but I'd like to know if something in my config or query/setup is not optimal.
As @pifor suggested, I set track_io_timing=on and can see that this is indeed almost entirely I/O slowness:
Nested Loop (cost=0.42..7743.00 rows=1000 width=69) (actual time=0.026..14901.004 rows=1000 loops=1)
Buffers: shared hit=2859 read=1145
I/O Timings: read=14861.578
-> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.002..5.497 rows=1000 loops=1)
-> Index Scan using mytable_pkey on mytable d (cost=0.42..7.73 rows=1 width=69) (actual time=14.888..14.888 rows=1 loops=1000)
Index Cond: (lt_id = "*VALUES*".column1)
Buffers: shared hit=2859 read=1145
I/O Timings: read=14861.578
Planning Time: 0.420 ms
Execution Time: 14901.734 ms
(10 rows)
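A note on reading these numbers: in EXPLAIN ANALYZE output, "actual time" is reported in milliseconds and is the average per loop, so the inner index scan's cost is its per-loop time multiplied by loops. The short calculation below uses only figures from the two plans above; the variable names are made up for illustration.

```python
# Inner Index Only Scan in the first plan: actual time=20.732 ms, loops=1000.
per_loop_ms = 20.732
loops = 1000
inner_total_ms = per_loop_ms * loops

# Second plan: I/O Timings read=14861.578 ms for "read=1145" buffers.
io_read_ms = 14861.578
blocks_read = 1145
ms_per_block_read = io_read_ms / blocks_read

print(f"inner scan total: {inner_total_ms:.0f} ms")
print(f"average time per block read: {ms_per_block_read:.2f} ms")
```

So the loop accounts for essentially the whole 20.7 s runtime (about 20 ms per iteration, not 1 ms), and each block read from disk takes on the order of ten milliseconds, which is in the range expected for random reads on a spinning disk.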

Transaction is 20x slower on production server

On my development server the test transaction (a series of updates etc.) runs in about 2 minutes. On the production server it takes about 25 minutes.
The server reads the file and inserts records. It starts out fast but then goes slower and slower as it progresses. There is an aggregate table update for each record that gets inserted and it is that update that progressively slows down. That aggregate update does query the table being written to with the inserts.
The config is only different in max_worker_processes (development 8, prod 16), shared_buffers (dev 128MB, prod 512MB), wal_buffers (Dev 4MB, prod 16MB).
I've tried tweaking a few configs and also dumped the whole database and re-did initdb just in case it was not upgraded (to 9.6) correctly. Nothing's worked.
I'm hoping that someone with experience in this could tell me what to look for.
Edit: After receiving some comments I was able to figure out what is going on and get a workaround going, but I think there has to be a better way. First, what is happening is this:
At first there is no data in the table for the relevant index, postgresql works out this plan. Note that there is data in the table just not anything with the relevant "businessIdentifier" index or "transactionNumber".
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 0.915 ms
Execution time: 0.100 ms
Then as data gets inserted it becomes a really bad plan. 474ms in this example. It needs to execute thousands of times depending on what is uploaded so 474ms is bad.
Aggregate (cost=16.44..16.45 rows=1 width=4) (actual time=474.222..474.222 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.44 rows=1 width=4) (actual time=474.218..474.218 rows=0 loops=1)
Join Filter: ((eh."transactionNumber")::text = (ed."transactionNumber")::text)
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.11 rows=1 width=9) (actual time=0.023..0.408 rows=507 loops=1)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
-> Index Scan using transactionlineprovdateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.934..0.934 rows=0 loops=507)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("reconciliationNumber" = 4519)
Rows Removed by Filter: 2520
Planning time: 0.848 ms
Execution time: 474.278 ms
Vacuum analyze fixes it. But you cannot run Vacuum analyze until after the transaction is committed. After Vacuum analyze postgresql uses a different plan and it's back down to 0.1 ms.
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.072..0.072 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.069..0.069 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.067..0.067 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 1.134 ms
Execution time: 0.141 ms
My workaround is to commit after about 100 inserts, run Vacuum analyze, and then continue. The only problem is that if something in the remainder of the data fails and is rolled back, there will still be 100 records inserted.
Is there a better way to handle this? Should I just upgrade to version 10 or 11 of PostgreSQL, and would that help?
"There is an aggregate table update for each record that gets inserted and it is that update that progressively slows down."
Here is an idea: Change the workflow to (1) import external data into a table, using COPY interface, (2) Index and ANALYZE that data, (3) run the final UPDATE with all required joins/groupings to do actual transformation and update the aggregate table.
All of that could be done in one, long transaction - if needed.
Only if the whole thing is locking some vital database objects for too long, you should consider splitting this into separate transactions / batches (processing data partitioned in some generic way, by date/time or by ID).
"But you cannot run Vacuum analyze until after the transaction is committed."
To get updated statistics for the query planner, you only need ANALYZE, not VACUUM. Unlike VACUUM, ANALYZE can be run inside a transaction block.

Slow on first query

I'm having trouble when I perform the first query on a table. Subsequent queries are much faster, even if I change the date range to look for. I assume that PostgreSQL implements a caching mechanism that makes the subsequent queries much faster. I can try to warm up the cache so the first user request hits the cache. However, I think I can somehow improve the following query:
SELECT
y.id, y.title, x.visits, x.score
FROM (
SELECT
article_id, visits,
COALESCE(ROUND((visits / NULLIF(hits ,0)::float)::numeric, 4), 0) score
FROM (
SELECT
article_id, SUM(visits) visits, SUM(hits) hits
FROM
article_reports a
WHERE
a.site_id = 'XYZ' AND a.date >= '2017-04-13' AND a.date <= '2017-06-28'
GROUP BY
article_id
) q ORDER BY score DESC, visits DESC LIMIT(20)
) x
INNER JOIN
articles y ON x.article_id = y.id
Any ideas on how I can improve this? The following is the result of EXPLAIN:
Nested Loop (cost=84859.76..85028.54 rows=20 width=272) (actual time=12612.596..12612.836 rows=20 loops=1)
-> Limit (cost=84859.34..84859.39 rows=20 width=52) (actual time=12612.502..12612.517 rows=20 loops=1)
-> Sort (cost=84859.34..84880.26 rows=8371 width=52) (actual time=12612.499..12612.503 rows=20 loops=1)
Sort Key: q.score DESC, q.visits DESC
Sort Method: top-N heapsort Memory: 27kB
-> Subquery Scan on q (cost=84218.04..84636.59 rows=8371 width=52) (actual time=12513.168..12602.649 rows=28965 loops=1)
-> HashAggregate (cost=84218.04..84301.75 rows=8371 width=36) (actual time=12513.093..12536.823 rows=28965 loops=1)
Group Key: a.article_id
-> Bitmap Heap Scan on article_reports a (cost=20122.78..77122.91 rows=405436 width=36) (actual time=135.588..11974.774 rows=398242 loops=1)
Recheck Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
Heap Blocks: exact=36911
-> Bitmap Index Scan on index_article_reports_on_site_id_and_article_id_and_date (cost=0.00..20021.42 rows=405436 width=0) (actual time=125.846..125.846 rows=398479 loops=1)
Index Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
-> Index Scan using articles_pkey on articles y (cost=0.42..8.44 rows=1 width=128) (actual time=0.014..0.014 rows=1 loops=20)
Index Cond: (id = q.article_id)
Planning time: 1.443 ms
Execution time: 12613.689 ms
Thanks in advance
There are two levels of "cache" that Postgres uses:
OS file cache
shared buffers.
Important: Postgres directly controls only the second one, and relies on the first one, which is under OS' control.
First thing I would check are these two settings in postgresql.conf:
effective_cache_size – usually I set it to ~3/4 of all RAM available. Notice that it's not a setting that tells Postgres how to allocate memory, it's just "an advice" to Postgres planner telling some estimate of OS file cache size
shared_buffers – usually I set it to 1/4 of RAM size. This is allocation setting.
Also, I'd check the other memory-related settings (work_mem, maintenance_work_mem) to understand how much RAM might be consumed, and therefore whether my effective_cache_size estimate will be correct most of the time.
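The two rules of thumb above can be written down as a tiny helper. This is only a sketch of the 1/4 and 3/4 starting points mentioned in this answer, not an official tuning formula; the function name and the 16 GB example are assumptions for illustration.

```python
def suggest_memory_settings(total_ram_mb):
    """Apply the rules of thumb above: 1/4 of RAM for shared_buffers,
    3/4 of RAM for effective_cache_size (a planner hint, not an allocation)."""
    return {
        "shared_buffers": f"{total_ram_mb // 4}MB",
        "effective_cache_size": f"{total_ram_mb * 3 // 4}MB",
    }


# Hypothetical machine with 16 GB of RAM.
for name, value in suggest_memory_settings(16 * 1024).items():
    print(f"{name} = {value}")
```

Treat the output as a starting point for postgresql.conf and adjust it if other services or other databases on the same machine compete for memory.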
But if you just turned your Postgres on, the first queries will most probably be long because there is no data in OS file cache and in shared buffers. You can check it with advanced EXPLAIN options:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...
-- you will see how many buffers were fetched from disk ("read") or from cache ("hit")
Here you can find good material on using EXPLAIN: http://www.dalibo.org/_media/understanding_explain.pdf
Additionally, there is an extension aiming to solve "cold cache" problem: pg_prewarm https://www.postgresql.org/docs/current/static/pgprewarm.html
Also, working with SSD disks instead of magnetic ones will mean that disk reads will be much faster.
Have fun and well working Postgres :-)
If it is the first query after inserting a large number of rows, you should run
ANALYZE
on the whole database or on the tables involved. Try running it at the database level first.