Postgres not using Covering Index with Aggregate

Engine version: 12.4
Postgres wasn't using an index only scan until I ran VACUUM ANALYZE VERBOSE table_name; after that it started using one. Earlier, when I had run ANALYZE VERBOSE table_name without VACUUM, the index only scan was still not used.
So it seems there is a very heavy dependency on VACUUM for getting an index-only plan. Is there any way to eliminate this dependency, or should we schedule VACUUM very regularly, e.g. daily?
Our objective is to reduce CPU usage. Overall machine CPU usage is 10%-15% throughout the day, but when this query runs the CPU goes very high (the query runs in multiple threads at the same time with different values).
EXPLAIN ANALYZE SELECT COALESCE(requested_debit, 0) AS requestedDebit, COALESCE(requested_credit, 0) AS requestedCredit
FROM (SELECT COALESCE(Sum(le.credit), 0) AS requested_credit, COALESCE(Sum(le.debit), 0) AS requested_debit
FROM ledger_entries le
WHERE le.accounting_entity_id = 1
AND le.general_ledger_id = 503
AND le.post_date BETWEEN '2020-09-10' AND '2020-11-30') AS requested_le;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on requested_le (cost=66602.65..66602.67 rows=1 width=64) (actual time=81.263..81.352 rows=1 loops=1)
-> Finalize Aggregate (cost=66602.65..66602.66 rows=1 width=64) (actual time=81.261..81.348 rows=1 loops=1)
-> Gather (cost=66602.41..66602.62 rows=2 width=64) (actual time=79.485..81.331 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=65602.41..65602.42 rows=1 width=64) (actual time=74.293..74.294 rows=1 loops=3)
-> Parallel Index Only Scan using post_date_gl_id_accounting_entity_id_include_idx on ledger_entries le (cost=0.56..65203.73 rows=79735 width=8) (actual time=47.874..74.212 rows=197 loops=3)
Index Cond: ((post_date >= '2020-09-10'::date) AND (post_date <= '2020-11-30'::date) AND (general_ledger_id = 503) AND (accounting_entity_id = 1))
Heap Fetches: 0
Planning Time: 0.211 ms
Execution Time: 81.395 ms
(11 rows)

There is a very strong connection between VACUUM and index-only scans in PostgreSQL: an index-only scan can only skip fetching the table row (to check for visibility) if the block containing the row is marked all-visible in the visibility map. And the visibility map is updated by VACUUM.
So yes, you have to VACUUM often to get efficient index-only scans.
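If you want to check how much of the table the planner currently considers all-visible, pg_class exposes the counters the visibility map feeds into (just a quick inspection query, not something from the question above):
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'ledger_entries';
A relallvisible value much lower than relpages means many heap fetches would still be needed, making an index-only scan correspondingly less attractive.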
Typically there is no need to schedule a manual VACUUM; you can simply
ALTER TABLE mytab SET (autovacuum_vacuum_scale_factor = 0.01);
(or a similar value) and let autovacuum do the job.
The only case where this is problematic is insert-only tables, because for them autovacuum won't be triggered on PostgreSQL versions below v13. In v13 you can simply adjust autovacuum_vacuum_insert_scale_factor, while in older versions you will have to set autovacuum_freeze_max_age to a low value for that table.
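For an insert-only table, the per-table overrides could look roughly like this (illustrative values only, not a recommendation for a specific workload):
-- v13 and later: let inserts alone trigger autovacuum after roughly 1% new rows
ALTER TABLE mytab SET (autovacuum_vacuum_insert_scale_factor = 0.01);
-- before v13: force more frequent anti-wraparound vacuums on this table instead
ALTER TABLE mytab SET (autovacuum_freeze_max_age = 100000);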

Related

LockRows plan node taking long time

I have the following query in Postgres (emulating a work queue):
DELETE FROM work_queue
WHERE id IN ( SELECT l.id
FROM work_queue l
WHERE l.delivered = 'f' and l.error = 'f' and l.archived = 'f'
ORDER BY created_at
LIMIT 5000
FOR UPDATE SKIP LOCKED );
While running the above concurrently (4 processes per second) along with a concurrent ingest of 10K records/second into work_queue, the query effectively bottlenecks on the LockRows node.
Query plan output:
Delete on work_queue (cost=478.39..39609.09 rows=5000 width=67) (actual time=38734.995..38734.998 rows=0 loops=1)
-> Nested Loop (cost=478.39..39609.09 rows=5000 width=67) (actual time=36654.711..38507.393 rows=5000 loops=1)
-> HashAggregate (cost=477.96..527.96 rows=5000 width=98) (actual time=36654.690..36658.495 rows=5000 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.43..465.46 rows=5000 width=98) (actual time=36600.963..36638.250 rows=5000 loops=1)
-> Limit (cost=0.43..415.46 rows=5000 width=51) (actual time=36600.958..36635.886 rows=5000 loops=1)
-> LockRows (cost=0.43..111701.83 rows=1345680 width=51) (actual time=36600.956..36635.039 rows=5000 loops=1)
-> Index Scan using work_queue_created_at_idx on work_queue l (cost=0.43..98245.03 rows=1345680 width=51) (actual time=779.706..2690.340 rows=250692 loops=1)
Filter: ((NOT delivered) AND (NOT error) AND (NOT archived))
-> Index Scan using work_queue_pkey on work_queue (cost=0.43..7.84 rows=1 width=43) (actual time=0.364..0.364 rows=1 loops=5000)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 8.424 ms
Trigger for constraint work_queue_logs_work_queue_id_fkey: time=5490.925 calls=5000
Trigger work_queue_locked_trigger: time=2119.540 calls=1
Execution Time: 46346.471 ms
(corresponding visualization: https://explain.dalibo.com/plan/ZaZ)
Any ideas on improving this? Why should locking rows take so long in the presence of concurrent inserts? Note that if I do not have concurrent inserts into the work_queue table, the query is super fast.
We can see that the index scan returned 250692 rows in order to find 5000 to lock. So apparently it had to skip over about 49 other queries' worth of locked rows. That is not going to be very efficient, although if the situation were static it shouldn't be as slow as you see here. But it has to acquire a transient exclusive lock on a section of memory for each attempt. If it is fighting with many other processes for those locks, you can get a cascading collapse of performance.
If you are launching 4 such statements per second with no cap and without waiting for any previous ones to finish, then you have an unstable situation. The more you have running at one time, the more they fight each other and slow down. If the completion rate goes down but the launch interval does not, then you just get more processes fighting with more other processes and each getting slower. So once you get shoved over the edge, it might never recover on its own.
The role of the concurrent insertions is probably just to provide enough noisy load on the system to give the collapse a chance to take a foothold. And of course without concurrent insertion, your deletes are going to run out of things to delete pretty soon, at which point they will be very fast.

Inordinately slow Nested Loop with join on simple query

I'm running the query below against the primary key lt_id (no other index besides the pkey btree) and joining against 1000 ids.
It might just be my lack of experience with Postgres, but it seems maybe an order of magnitude too slow. There are 800k rows in the table in total.
This is a low-spec machine (4 GB of memory), but I still thought it should be faster. The CPU is idle.
EXPLAIN (ANALYZE,BUFFERS) SELECT lt_id FROM "mytable" d INNER JOIN ( VALUES (1839147),(...998 more rows here...),(1756908)) v(id) ON (d.lt_id = v.id);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..7743.00 rows=1000 width=4) (actual time=69.852..20743.393 rows=1000 loops=1)
Buffers: shared hit=2395 read=1607
-> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.004..4.770 rows=1000 loops=1)
-> Index Only Scan using lt_id_idx on mytable d (cost=0.42..7.73 rows=1 width=4) (actual time=20.732..20.732 rows=1 loops=1000)
Index Cond: (lt_id = "*VALUES*".column1)
Heap Fetches: 1000
Buffers: shared hit=2395 read=1607
Planning Time: 86.284 ms
Execution Time: 20744.223 ms
(9 rows)
psql 11.7; I was using 9.x but upgraded to 11.7, with no real difference in speed observed.
free
total used free shared buff/cache available
Mem: 3783732 158076 3400932 55420 224724 3366832
Swap: 0 0 0
Even though it's low spec, should it really be taking 20 seconds? In fact many other queries are taking twice as long or more; 20 seconds seems to be the best-case scenario. There are a couple of other text columns in the table with some small text articles, which I doubt is the issue.
I was previously using the IN operator but observed similar or worse speeds.
I also made a couple of small changes from the default config, but it doesn't seem to make much difference.
work_mem = 32MB
shared_buffers = 512MB
Any ideas if this is expected performance given the machine? Or is there something else I can try?
edit: I guess what I'm curious about is the time in the actual loop
actual time=20.732..20.732 rows=1 loops=1000
It seems like the actual time is less than or equal to 1 ms per loop, which in the worst case would be less than 1 second for 1000 iterations, and the other operations also seem negligible. Does this mean the issue is simply I/O? A slow disk? What would typically be the situation here?
I notice that if I run the query on my desktop, which only has 8 GB of RAM but is using an SSD, the query is massively faster.
Using an SSD is fine of course, but I'd like to know if something in my config or query/setup is not optimal.
As @pifor suggested, after setting track_io_timing=on I can see that this is indeed almost entirely I/O slowness.
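For reference, output like the timed plan below can be obtained along these lines (changing track_io_timing for a session typically requires superuser; the value list is shortened here to the two ids quoted above):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT lt_id
FROM "mytable" d
INNER JOIN ( VALUES (1839147),(1756908) ) v(id) ON (d.lt_id = v.id);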
Nested Loop (cost=0.42..7743.00 rows=1000 width=69) (actual time=0.026..14901.004 rows=1000 loops=1)
Buffers: shared hit=2859 read=1145
I/O Timings: read=14861.578
-> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.002..5.497 rows=1000 loops=1)
-> Index Scan using mytable_pkey on mytable d (cost=0.42..7.73 rows=1 width=69) (actual time=14.888..14.888 rows=1 loops=1000)
Index Cond: (lt_id = "*VALUES*".column1)
Buffers: shared hit=2859 read=1145
I/O Timings: read=14861.578
Planning Time: 0.420 ms
Execution Time: 14901.734 ms
(10 rows)

Transaction is 20x slower on production server

On my development server the test transaction (a series of updates etc.) runs in about 2 minutes. On the production server it's about 25 minutes.
The server reads the file and inserts records. It starts out fast but then goes slower and slower as it progresses. There is an aggregate table update for each record that gets inserted and it is that update that progressively slows down. That aggregate update does query the table being written to with the inserts.
The config is only different in max_worker_processes (development 8, prod 16), shared_buffers (dev 128MB, prod 512MB), wal_buffers (Dev 4MB, prod 16MB).
I've tried tweaking a few configs and also dumped the whole database and re-did initdb just in case it was not upgraded (to 9.6) correctly. Nothing's worked.
I'm hoping that someone with experience in this could tell me what to look for.
Edit: After receiving some comments I was able to figure out what is going on and get a workaround going, but I think there has to be a better way. Firstly, what is happening is this:
At first there is no data in the table for the relevant index, so PostgreSQL works out this plan. Note that there is data in the table, just nothing matching the relevant "businessIdentifier" or "transactionNumber".
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '36'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 0.915 ms
Execution time: 0.100 ms
Then, as data gets inserted, it becomes a really bad plan: 474 ms in this example. It needs to execute thousands of times depending on what is uploaded, so 474 ms is bad.
Aggregate (cost=16.44..16.45 rows=1 width=4) (actual time=474.222..474.222 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.44 rows=1 width=4) (actual time=474.218..474.218 rows=0 loops=1)
Join Filter: ((eh."transactionNumber")::text = (ed."transactionNumber")::text)
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.11 rows=1 width=9) (actual time=0.023..0.408 rows=507 loops=1)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
-> Index Scan using transactionlineprovdateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.934..0.934 rows=0 loops=507)
Index Cond: (("businessIdentifier")::text = '37'::text)
Filter: ("reconciliationNumber" = 4519)
Rows Removed by Filter: 2520
Planning time: 0.848 ms
Execution time: 474.278 ms
VACUUM ANALYZE fixes it, but you cannot run VACUUM ANALYZE until after the transaction is committed. After VACUUM ANALYZE, PostgreSQL uses a different plan and it's back down to 0.1 ms.
Aggregate (cost=16.63..16.64 rows=1 width=4) (actual time=0.072..0.072 rows=1 loops=1)
-> Nested Loop (cost=0.57..16.63 rows=1 width=4) (actual time=0.069..0.069 rows=0 loops=1)
-> Index Scan using transactionlinedateindex on "transactionLine" ed (cost=0.29..8.31 rows=1 width=5) (actual time=0.067..0.067 rows=0 loops=1)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND ("reconciliationNumber" = 4519))
-> Index Scan using transaction_pkey on transaction eh (cost=0.29..8.31 rows=1 width=9) (never executed)
Index Cond: ((("businessIdentifier")::text = '37'::text) AND (("transactionNumber")::text = (ed."transactionNumber")::text))
Filter: ("transactionStatus" = 'posted'::"transactionStatusItemType")
Planning time: 1.134 ms
Execution time: 0.141 ms
My workaround is to commit after about 100 inserts, run VACUUM ANALYZE, and then continue. The only problem is that if something in the remainder of the data fails and is rolled back, there will still be 100 records inserted.
Is there a better way to handle this? Should I just upgrade to version 10 or 11 of PostgreSQL, and would that help?
There is an aggregate table update for each record that gets inserted and it is that update that progressively slows down.
Here is an idea: change the workflow to (1) import the external data into a table using the COPY interface, (2) index and ANALYZE that data, (3) run the final UPDATE with all required joins/groupings to do the actual transformation and update the aggregate table (a minimal sketch follows below).
All of that could be done in one long transaction, if needed.
Only if the whole thing locks some vital database objects for too long should you consider splitting it into separate transactions/batches (processing data partitioned in some generic way, by date/time or by ID).
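A minimal sketch of that workflow, with hypothetical staging/target table and column names (they are not taken from your schema):
BEGIN;
-- 1) bulk-load the raw file into a staging table (COPY FROM a server-side file
--    needs appropriate privileges; \copy from psql works from the client side)
CREATE TEMP TABLE staging_records (LIKE target_table INCLUDING DEFAULTS);
COPY staging_records FROM '/path/to/import_file.csv' WITH (FORMAT csv, HEADER true);
-- 2) index and analyze the freshly loaded data so later plans see real statistics
CREATE INDEX ON staging_records (business_identifier, transaction_number);
ANALYZE staging_records;
-- 3) move the rows into the main table and maintain the aggregate table in one
--    set-based pass instead of a per-row update
INSERT INTO target_table SELECT * FROM staging_records;
UPDATE aggregate_table agg
SET total = s.total
FROM (SELECT business_identifier, count(*) AS total
      FROM staging_records
      GROUP BY business_identifier) s
WHERE agg.business_identifier = s.business_identifier;
COMMIT;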
But you cannot run VACUUM ANALYZE until after the transaction is committed.
To get updated costs in the query plan you only need ANALYZE, not VACUUM.
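And ANALYZE, unlike VACUUM, is allowed inside a transaction block and samples rows inserted earlier in the same transaction, so something along these lines (sketch only; the table name is taken from the plans above) avoids the partial-commit problem of the 100-row workaround:
BEGIN;
-- ... bulk inserts into "transactionLine" ...
ANALYZE "transactionLine";  -- refresh statistics mid-transaction
-- ... statements planned after this point use the updated statistics ...
COMMIT;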

Slow on first query

I'm having trouble with the first query I perform on a table. Subsequent queries are much faster, even if I change the date range to look for. I assume that PostgreSQL implements a caching mechanism that makes subsequent queries much faster. I could try to warm up the cache so the first user request hits the cache. However, I think I can somehow improve the following query:
SELECT
y.id, y.title, x.visits, x.score
FROM (
SELECT
article_id, visits,
COALESCE(ROUND((visits / NULLIF(hits ,0)::float)::numeric, 4), 0) score
FROM (
SELECT
article_id, SUM(visits) visits, SUM(hits) hits
FROM
article_reports a
WHERE
a.site_id = 'XYZ' AND a.date >= '2017-04-13' AND a.date <= '2017-06-28'
GROUP BY
article_id
) q ORDER BY score DESC, visits DESC LIMIT(20)
) x
INNER JOIN
articles y ON x.article_id = y.id
Any ideas on how I can improve this? The following is the result of EXPLAIN:
Nested Loop (cost=84859.76..85028.54 rows=20 width=272) (actual time=12612.596..12612.836 rows=20 loops=1)
-> Limit (cost=84859.34..84859.39 rows=20 width=52) (actual time=12612.502..12612.517 rows=20 loops=1)
-> Sort (cost=84859.34..84880.26 rows=8371 width=52) (actual time=12612.499..12612.503 rows=20 loops=1)
Sort Key: q.score DESC, q.visits DESC
Sort Method: top-N heapsort Memory: 27kB
-> Subquery Scan on q (cost=84218.04..84636.59 rows=8371 width=52) (actual time=12513.168..12602.649 rows=28965 loops=1)
-> HashAggregate (cost=84218.04..84301.75 rows=8371 width=36) (actual time=12513.093..12536.823 rows=28965 loops=1)
Group Key: a.article_id
-> Bitmap Heap Scan on article_reports a (cost=20122.78..77122.91 rows=405436 width=36) (actual time=135.588..11974.774 rows=398242 loops=1)
Recheck Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
Heap Blocks: exact=36911
-> Bitmap Index Scan on index_article_reports_on_site_id_and_article_id_and_date (cost=0.00..20021.42 rows=405436 width=0) (actual time=125.846..125.846 rows=398479 loops=1)
Index Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
-> Index Scan using articles_pkey on articles y (cost=0.42..8.44 rows=1 width=128) (actual time=0.014..0.014 rows=1 loops=20)
Index Cond: (id = q.article_id)
Planning time: 1.443 ms
Execution time: 12613.689 ms
Thanks in advance
There are two levels of "cache" that Postgres uses:
OS file cache
shared buffers.
Important: Postgres directly controls only the second one, and relies on the first one, which is under OS' control.
First thing I would check are these two settings in postgresql.conf:
effective_cache_size – usually I set it to ~3/4 of all available RAM. Note that this is not a setting that tells Postgres how to allocate memory; it's just advice to the Postgres planner, an estimate of the OS file cache size.
shared_buffers – usually I set it to 1/4 of RAM. This is an allocation setting.
Also, I'd check the other memory-related settings (work_mem, maintenance_work_mem) to understand how much RAM might be consumed, so that my effective_cache_size estimate stays correct most of the time. A concrete sketch follows below.
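As an illustration only (assuming a dedicated machine with about 4 GB of RAM, which may not match your setup):
# postgresql.conf
shared_buffers = 1GB          # actually allocated by Postgres at startup (~1/4 of RAM)
effective_cache_size = 3GB    # planner hint: estimated OS cache + shared buffers (~3/4 of RAM)
work_mem = 16MB               # per sort/hash operation per backend, so keep it modest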
But if you have just started your Postgres instance, the first queries will most probably be slow because there is no data yet in the OS file cache or in shared buffers. You can check this with the extended EXPLAIN options:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...
-- you will see how many buffers were fetched from disk ("read") or from cache ("hit")
Here you can find good material on using EXPLAIN: http://www.dalibo.org/_media/understanding_explain.pdf
Additionally, there is an extension aimed at solving the "cold cache" problem: pg_prewarm https://www.postgresql.org/docs/current/static/pgprewarm.html
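Warming a single table with it is a one-liner (the extension must be available on the server; the table name here is the one from your query):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('article_reports');  -- read the table's blocks into shared buffers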
Also, working with SSD disks instead of magnetic ones will mean that disk reads will be much faster.
Have fun and a well-working Postgres :-)
If it is the first query after inserting several rows, you must run an
ANALYZE
on the whole database or on the involved tables. Try executing it at the database level.
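For example (the table name is the one from the question; ANALYZE with no argument covers the whole current database):
ANALYZE;                  -- whole database
ANALYZE article_reports;  -- or just the involved table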

Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries

We have queries of the form
select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)
What index can be built to speed up the WHERE clause?
A btree index created using
create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))
does not seem to be used at all.
EDIT
With enable_seqscan set to off, the query using xpath_exists is much faster (by an order of magnitude) and the plan clearly shows the corresponding index being used (the btree index built with xpath_exists).
Any clue why PostgreSQL would not use the index and instead attempts a much slower sequential scan?
Since I do not want to disable sequential scanning globally, I am back to square one, and I happily welcome suggestions.
EDIT 2 - Explain plans
See below. The cost of the first plan (seqscan off) is slightly higher, but the processing time is much faster.
b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
-> Aggregate (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
-> Bitmap Heap Scan on item (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
-> Bitmap Index Scan on item_counter_01 (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
Total runtime: 606.136 ms
(7 rows)
plan on explain.depesz.com
b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
-> Aggregate (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
-> Seq Scan on item (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
Rows Removed by Filter: 108945
Total runtime: 10864.242 ms
(6 rows)
plan on explain.depesz.com
Planner cost parameters
The cost of the first plan (seqscan off) is slightly higher but the processing time is much faster
This tells me that your random_page_cost and seq_page_cost are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.
Try:
SET random_page_cost = 1;
SET seq_page_cost = 1.1;
to greatly reduce the difference between the cost parameters, then re-run the query. If that does the job, consider changing those parameters in postgresql.conf.
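On 9.4 and later the same values can also be persisted without hand-editing postgresql.conf (shown only as a sketch):
ALTER SYSTEM SET random_page_cost = 1;
ALTER SYSTEM SET seq_page_cost = 1.1;
SELECT pg_reload_conf();  -- reload so the persisted settings take effect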
Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.
Incorrect query
Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.
You're usually much better off phrasing such queries as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.
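A schematic example of that rewrite, with made-up table and column names (not your schema):
-- unpredictable which row you get without an ORDER BY:
SELECT created_at FROM messages WHERE recipient = 'ABigBank' OFFSET 0 LIMIT 1;
-- deterministic, and can be answered from an index on (recipient, created_at):
SELECT min(created_at) FROM messages WHERE recipient = 'ABigBank';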
Tips
BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.