Explain not showing memory usage - postgresql

I am trying to see how much memory a query takes. In articles I can see in their explain output has a memory value but when I run explain I do not get a memory value.
Here is my query:
explain (analyze, verbose, buffers) select * from watched_url_queue
join rate_check rc on watched_url_queue.domain_key = rc.domain_key
where rc.locked = false
order by rc.domain_key
limit 1;
And this is my output:
Limit (cost=0.70..0.85 rows=1 width=336) (actual time=0.009..0.011 rows=1 loops=1)
Output: watched_url_queue.watched_url_record_id, watched_url_queue.url, watched_url_queue.domain_key, watched_url_queue.targets, watched_url_queue.create_date, watched_url_queue.tries, watched_url_queue.defer_until, watched_url_queue.duration, watched_url_queue.user_auth_custom_id, watched_url_queue.completed, rc.domain_key, rc.last_scan, rc.locked, rc.domain_key
Buffers: shared hit=7
-> Merge Join (cost=0.70..32514.53 rows=219864 width=336) (actual time=0.009..0.009 rows=1 loops=1)
Output: watched_url_queue.watched_url_record_id, watched_url_queue.url, watched_url_queue.domain_key, watched_url_queue.targets, watched_url_queue.create_date, watched_url_queue.tries, watched_url_queue.defer_until, watched_url_queue.duration, watched_url_queue.user_auth_custom_id, watched_url_queue.completed, rc.domain_key, rc.last_scan, rc.locked, rc.domain_key
Inner Unique: true
Merge Cond: (watched_url_queue.domain_key = rc.domain_key)
Buffers: shared hit=7
-> Index Scan using idx_watchedurlqueue_domainkey on public.watched_url_queue (cost=0.42..29069.88 rows=439728 width=289) (actual time=0.003..0.003 rows=1 loops=1)
Output: watched_url_queue.watched_url_record_id, watched_url_queue.url, watched_url_queue.domain_key, watched_url_queue.targets, watched_url_queue.create_date, watched_url_queue.tries, watched_url_queue.defer_until, watched_url_queue.duration, watched_url_queue.user_auth_custom_id, watched_url_queue.completed
Buffers: shared hit=4
-> Index Scan using rate_check_pkey on public.rate_check rc (cost=0.28..141.85 rows=2519 width=28) (actual time=0.003..0.003 rows=1 loops=1)
Output: rc.domain_key, rc.last_scan, rc.locked
Filter: (NOT rc.locked)
Buffers: shared hit=3
Planning time: 0.362 ms
Execution time: 0.060 ms
How do I see the memory usage?

A Merge Join doesn't report memory usage because it doesn't need any buffers (like e.g. the hash join). It simply goes through the sorted data from the two indexes as they are retrieved without the need to buffer that.
If you want to estimate the overall size of each step in the plan, you can multiply the width value and the rows value (from the actual rows).
But that is not necessarily the "memory" that is needed by the query, as the blocks are managed in the shared memory ("cache") rather then "by the query".

Related

Optimize Postgres Query with nested join

I have a query that takes several seconds in my staging environment, but takes a few minutes in my production environment. The query is the same, but the production environment has many more rows in each table (appx 10x).
The goal of the query is to find all distinct software records which are installed on at least one of the assets in the search results (NOTE: the search criteria can vary across dozens of fields).
The tables:
Assets: dozens of fields on which a user can search - production has several million records
InstalledSoftwares: asset_id (reference), software_id (reference) - each asset has 10-100 installed software records, so there are 10s of millions of records in production.
Softwares: the results. Production has less than 4000 unique software records.
I've removed duplicate WHERE clauses based on the suggestions from Laurenz Albe.
How can I make this more efficient?
The Query:
SELECT DISTINCT "softwares"."id" FROM "softwares"
INNER JOIN "installed_softwares" ON "installed_softwares"."software_id" = "softwares"."id"
WHERE "installed_softwares"."asset_id" IN
(SELECT "assets"."id" FROM "assets"
WHERE "assets"."assettype_id" = 3
AND "assets"."archive_number" IS NULL
AND "assets"."expired" = FALSE
AND "assets"."local_status_id" != 4
AND "assets"."scorable" = TRUE)
Here is the EXPLAIN (analyze, buffers) for the query:
Unique (cost=153558.09..153588.92 rows=5524 width=8) (actual time=4224.203..5872.293 rows=3525 loops=1)
Buffers: shared hit=112428 read=187, temp read=3145 written=3164
I/O Timings: read=2.916
-> Sort (cost=153558.09..153573.51 rows=6165 width=8) (actual time=4224.200..5065.158 rows=1087807 loops=1)
Sort Key: softwares.id
Sort Method: external merge Disk: 19240kB
Buffers: shared hit=112428 read=187, temp read=3145 written=3164
I/O Timings: read=2.916
-> Hash Join (cost=119348.05..153170.01 rows=6165 width=8) (actual time=342.860..3159.458 rows=1087807 loops=1)
Hash Cond: (installed_softwares.software_id = softwares.id)
Buffers: shared hit=112428 read=187
I/O Timings: read=2.916
-> Gather (cost=119119.76..152925.53 rows=6165 width=8) (actual time=333.981..1320.277 rows=1087807 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=112324 read=187
I/O Timings: read=2.916
-> Parallel Hash Join (cost=118119.76..151309.03 rows=2569 width=8) (actual time=331.836..2010.171 rows=362602 loops=3)
Hash Cond: (installed_softwares.asset_id = assets.id)
Buffers: shared hit=112324 read=187
I/O Timings: read=2.916
-> Parallel Seq Scan on installed_softwares (cost=0.00..30518.88 rows=1017288 width=16) (actual time=0.007..667.564 rows=813396 loops=3)
Buffers: shared hit=20159 read=187
I/O Timings: read=2.916
-> Parallel Hash (cost=118065.04..118065.04 rows=4378 width=4) (actual time=331.648..331.651 rows=23407 loops=3)
Buckets: 131072 (originally 16384) Batches: 1 (originally 1) Memory Usage: 4672kB
Buffers: shared hit=92058
-> Parallel Seq Scan on assets (cost=0.00..118065.04 rows=4378 width=4) (actual time=0.012..302.134 rows=23407 loops=3)
Filter: ((archive_number IS NULL) AND (NOT expired) AND scorable AND (local_status_id <> 4) AND (assettype_id = 3))
Rows Removed by Filter: 1363624
Buffers: shared hit=92058
-> Hash (cost=159.24..159.24 rows=5524 width=8) (actual time=8.859..8.862 rows=5546 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 281kB
Buffers: shared hit=104
-> Seq Scan on softwares (cost=0.00..159.24 rows=5524 width=8) (actual time=0.006..4.426 rows=5546 loops=1)
Buffers: shared hit=104
Planning Time: 0.534 ms
Execution Time: 5878.476 ms
The problem is the bad row count estimate for the scan on assets. The estimate is so bad (48 rows instead of 23407), because the condition is somewhat redundant:
archived_at IS NULL AND archive_number IS NULL AND
NOT expired AND
scorable AND
local_status_id <> 4 AND local_status_id <> 4 AND
assettype_id = 3
PostgreSQL treats all these conditions as statistically independent, which leads it astray. One of the conditions (local_status_id <> 4) is present twice; remove one copy. The first two conditions seem somewhat redundant too; perhaps one of them can be omitted.
Perhaps that is enough to improve the estimate so that PostgreSQL does not choose the slow nested loop join.

Nested Loop Left Join cost too much time?

This is the query:
EXPLAIN (analyze, BUFFERS, SETTINGS)
SELECT
operation.id
FROM
operation
RIGHT JOIN(
SELECT uid, did FROM (
SELECT uid, did FROM operation where id = 993754
) t
) parts ON (operation.uid = parts.uid AND operation.did = parts.did)
and EXPLAIN info:
Nested Loop Left Join (cost=0.85..29695.77 rows=100 width=8) (actual time=13.709..13.711 rows=1 loops=1)
Buffers: shared hit=4905
-> Unique (cost=0.42..8.45 rows=1 width=16) (actual time=0.011..0.013 rows=1 loops=1)
Buffers: shared hit=5
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.084 ms
Execution Time: 13.728 ms
Why does Nested Loop cost more and more time than sum of childs cost? What can I do for that? The Execution Time should less than 1 ms right?
update:
Nested Loop Left Join (cost=5.88..400.63 rows=101 width=8) (actual time=0.012..0.012 rows=1 loops=1)
Buffers: shared hit=8
-> Index Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (id = 993754)
Buffers: shared hit=4
-> Bitmap Heap Scan on operation (cost=5.45..391.19 rows=100 width=24) (actual time=0.004..0.005 rows=1 loops=1)
Recheck Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on ou (cost=0.00..5.42 rows=100 width=0) (actual time=0.003..0.003 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Buffers: shared hit=3
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.127 ms
Execution Time: 0.028 ms
Thanks all of you, when I split the index to btree(id) and btree(uid, did), everything's going perfect, but what caused those can not be used together? Any details or rules?
BTW, the sql is used for Real-Time Calculation, there are some Window Functions code didn't show here.
The Nested Loop does not take much time actually. The actual time of 13.709..13.711 means that it took 13.709 ms until the first row was ready to be emitted from this node and it took 0.002 ms until it was finished.
Note that the startup cost of 13.709 ms includes the cost of its two child nodes. Both of the child nodes need to emit at least one row before the nested loop can start.
The Unique child began emitting its first (and only) row after 0.011 ms. The Index Only Scan child however only started to emit its first (and only) row after 13.695 ms. This means that most of your actual time spent is in this Index Only Scan.
There is a great answer here which explains the costs and actual times in depth.
Also there is a nice tool at https://explain.depesz.com which calculates an inclusive and exclusive time for each node. Here it is used for your query plan which clearly shows that most of the time is spent in the Index Only Scan.
Since the query is spending almost all of the time in this index only scan, optimizations there will have the most benefit. Creating a separate index for the columns uid and did on the operation table should improve query time a lot.
CREATE INDEX operation_uid_did ON operation(uid, did);
The current execution plan contains 2 index only scans.
A slow one:
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
And a fast one:
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
Both of them use the index oi but have different index conditions. Note how the fast one, who uses the id as index condition only needs to load 5 pages of data (Buffers: shared hit=5). The slow one needs to load 4900 pages instead (Buffers: shared hit=4900). This indicates that the index is optimized to query for id but not so much for uid and did. Probably the index oi covers all 3 columns id, uid, did in this order.
A multi-column btree index can only be used efficently when there are constraints in the query on the leftmost columns. The official documentation about multi-column indexes explains this very well in depth.
Why does Nested Loop cost more and more time than sum of childs cost?
Based on your example, it doesn't. Can you elaborate on what makes you think it does?
Anyway, it seems extravagant to visit 4900 pages to fetch 1 tuple. I'm guessing your tables are not getting vacuumed enough.
Although now I prefer Florian's suggestion, that "uid" and "did" are not the leading columns of the index, and that is why it is slow. It is basically doing a full index scan, using the index as a skinny version of the table. It is a shame that EXPLAIN output doesn't make it clear when a index is being used in this fashion, rather than the traditional "jump to a specific part of the index"
So you have a missing index.

Reduce execution time of my query in postgresql

Below is my query. It takes 9387.430 ms to execute, which is certainly too long for such a request. I would like to be able to reduce this execution time. Can you please help me on this ? I also provided my analyze output.
EXPLAIN ANALYZE
SELECT a.artist, b.artist, COUNT(*)
FROM release_has_artist a, release_has_artist b
WHERE a.release = b.release AND a.artist <> b.artist
GROUP BY(a.artist,b.artist)
ORDER BY (a.artist,b.artist);;
Output of EXPLAIN ANALYZE :
Sort (cost=1696482.86..1707588.14 rows=4442112 width=48) (actual time=9253.474..9314.510 rows=461386 loops=1)
Sort Key: (ROW(a.artist, b.artist))
Sort Method: external sort Disk: 24832kB
-> Finalize GroupAggregate (cost=396240.32..932717.19 rows=4442112 width=48) (actual time=1928.058..2911.463 rows=461386 loops=1)
Group Key: a.artist, b.artist
-> Gather Merge (cost=396240.32..860532.87 rows=3701760 width=16) (actual time=1928.049..2494.638 rows=566468 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=395240.29..432257.89 rows=1850880 width=16) (actual time=1912.809..2156.951 rows=188823 loops=3)
Group Key: a.artist, b.artist
-> Sort (cost=395240.29..399867.49 rows=1850880 width=8) (actual time=1912.794..2003.776 rows=271327 loops=3)
Sort Key: a.artist, b.artist
Sort Method: external merge Disk: 4848kB
-> Merge Join (cost=0.85..177260.72 rows=1850880 width=8) (actual time=2.143..1623.628 rows=271327 loops=3)
Merge Cond: (a.release = b.release)
Join Filter: (a.artist <> b.artist)
Rows Removed by Join Filter: 687597
-> Parallel Index Only Scan using release_has_artist_pkey on release_has_artist a (cost=0.43..67329.73 rows=859497 width=8) (actual time=0.059..240.998 rows=687597 loops=3)
Heap Fetches: 711154
-> Index Only Scan using release_has_artist_pkey on release_has_artist b (cost=0.43..79362.68 rows=2062792 width=8) (actual time=0.072..798.402 rows=2329742 loops=3)
Heap Fetches: 2335683
Planning time: 2.101 ms
Execution time: 9387.430 ms
In your EXPLAIN ANALYZE output, there are two Sort Method: external merge Disk: ####kB, indicating that the sort spilled out to disk and not in memory, due to an insufficiently-sized work_mem. Try increasing your work_mem up to 32MB (30 might be ok, but I like multiples of 8), and try again
Note that you can set work_mem on a per-session basis, as a global change in work_mem could potentially have negative side-effects, such as running out of memory, because postgresql.conf-configured work_mem is allocated for each session (basically, it has a multiplicative effect).

LIMIT with ORDER BY makes query slow

I am having problems optimizing a query in PostgreSQL 9.5.14.
select *
from file as f
join product_collection pc on (f.product_collection_id = pc.id)
where pc.mission_id = 7
order by f.id asc
limit 100;
Takes about 100 seconds. If I drop the limit clause it takes about 0.5:
With limit:
explain (analyze,buffers) ... -- query exactly as above
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.84..859.32 rows=100 width=457) (actual time=102793.422..102856.884 rows=100 loops=1)
Buffers: shared hit=222430592
-> Nested Loop (cost=0.84..58412343.43 rows=6804163 width=457) (actual time=102793.417..102856.872 rows=100 loops=1)
Buffers: shared hit=222430592
-> Index Scan using file_pkey on file f (cost=0.57..23409008.61 rows=113831736 width=330) (actual time=0.048..28207.152 rows=55858772 loops=1)
Buffers: shared hit=55652672
-> Index Scan using product_collection_pkey on product_collection pc (cost=0.28..0.30 rows=1 width=127) (actual time=0.001..0.001 rows=0 loops=55858772)
Index Cond: (id = f.product_collection_id)
Filter: (mission_id = 7)
Rows Removed by Filter: 1
Buffers: shared hit=166777920
Planning time: 0.803 ms
Execution time: 102856.988 ms
Without limit:
=> explain (analyze,buffers) ... -- query as above, just without limit
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=20509671.01..20526681.42 rows=6804163 width=457) (actual time=456.175..510.596 rows=142055 loops=1)
Sort Key: f.id
Sort Method: quicksort Memory: 79392kB
Buffers: shared hit=37956
-> Nested Loop (cost=0.84..16494851.02 rows=6804163 width=457) (actual time=0.044..231.051 rows=142055 loops=1)
Buffers: shared hit=37956
-> Index Scan using product_collection_mission_id_index on product_collection pc (cost=0.28..46.13 rows=87 width=127) (actual time=0.017..0.101 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=10
-> Index Scan using file_product_collection_id_index on file f (cost=0.57..187900.11 rows=169535 width=330) (actual time=0.007..1.335 rows=1633 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=37946
Planning time: 0.807 ms
Execution time: 569.865 ms
I have copied the database to a backup server so that I may safely manipulate the database without something else changing it on me.
Cardinalities:
Table file: 113,831,736 rows.
Table product_collection: 1370 rows.
The query without LIMIT: 142,055 rows.
SELECT count(*) FROM product_collection WHERE mission_id = 7: 87 rows.
What I have tried:
searching stack overflow
vacuum full analyze
creating two column indexes on file.product_collection_id & file.id. (there already are single column indexes on every field touched.)
creating two column indexes on file.id & file.product_collection_id.
increasing the statistics on file.id & file.product_collection_id, then re-vacuum analyze.
changing various query planner settings.
creating non-materialized views.
walking up and down the hallway while muttering to myself.
None of them seem to change the performance in a significant way.
Thoughts?
UPDATE from OP:
Tested this on PostgreSQL 9.6 & 10.4, and found no significant changes in plans or performance.
However, setting random_page_cost low enough is the only way to get faster performance on the without limit search.
With a default random_page_cost = 4, the without limit:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9270013.01..9287875.64 rows=7145054 width=457) (actual time=47782.523..47843.812 rows=145697 loops=1)
Sort Key: f.id
Sort Method: external sort Disk: 59416kB
Buffers: shared hit=3997185 read=1295264, temp read=7427 written=7427
-> Hash Join (cost=24.19..6966882.72 rows=7145054 width=457) (actual time=1.323..47458.767 rows=145697 loops=1)
Hash Cond: (f.product_collection_id = pc.id)
Buffers: shared hit=3997182 read=1295264
-> Seq Scan on file f (cost=0.00..6458232.17 rows=116580217 width=330) (actual time=0.007..17097.581 rows=116729984 loops=1)
Buffers: shared hit=3997169 read=1295261
-> Hash (cost=23.08..23.08 rows=89 width=127) (actual time=0.840..0.840 rows=87 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=13 read=3
-> Bitmap Heap Scan on product_collection pc (cost=4.97..23.08 rows=89 width=127) (actual time=0.722..0.801 rows=87 loops=1)
Recheck Cond: (mission_id = 7)
Heap Blocks: exact=10
Buffers: shared hit=13 read=3
-> Bitmap Index Scan on product_collection_mission_id_index (cost=0.00..4.95 rows=89 width=0) (actual time=0.707..0.707 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=3 read=3
Planning time: 0.929 ms
Execution time: 47911.689 ms
User Erwin's answer below will take me some time to fully understand and generalize to all of the use cases needed. In the mean time we will probably use either a materialized view or just flatten our table structure.
This query is harder for the Postgres query planner than it might look. Depending on cardinalities, data distribution, value frequencies, sizes, ... completely different query plans can prevail and the planner has a hard time predicting which is best. Current versions of Postgres are better at this in several aspects, but it's still hard to optimize.
Since you retrieve only relatively few rows from product_collection, this equivalent query with LIMIT in a LATERAL subquery should avoid performance degradation:
SELECT *
FROM product_collection pc
CROSS JOIN LATERAL (
SELECT *
FROM file f -- big table
WHERE f.product_collection_id = pc.id
ORDER BY f.id
LIMIT 100
) f
WHERE pc.mission_id = 7
ORDER BY f.id
LIMIT 100;
Edit: This results in a query plan with explain (analyze,verbose) provided by the OP:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30524.34..30524.59 rows=100 width=457) (actual time=13.128..13.167 rows=100 loops=1)
Buffers: shared hit=3213
-> Sort (cost=30524.34..30546.09 rows=8700 width=457) (actual time=13.126..13.152 rows=100 loops=1)
Sort Key: file.id
Sort Method: top-N heapsort Memory: 76kB
Buffers: shared hit=3213
-> Nested Loop (cost=0.57..30191.83 rows=8700 width=457) (actual time=0.060..9.868 rows=2880 loops=1)
Buffers: shared hit=3213
-> Seq Scan on product_collection pc (cost=0.00..69.12 rows=87 width=127) (actual time=0.024..0.336 rows=87 loops=1)
Filter: (mission_id = 7)
Rows Removed by Filter: 1283
Buffers: shared hit=13
-> Limit (cost=0.57..344.24 rows=100 width=330) (actual time=0.008..0.071 rows=33 loops=87)
Buffers: shared hit=3200
-> Index Scan using file_pc_id_index on file (cost=0.57..582642.42 rows=169535 width=330) (actual time=0.007..0.065 rows=33 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=3200
Planning time: 0.595 ms
Execution time: 13.319 ms
You need these indexes (will help your original query, too):
CREATE INDEX idx1 ON file (product_collection_id, id); -- crucial
CREATE INDEX idx2 ON product_collection (mission_id, id); -- helpful
You mentioned:
two column indexes on file.id & file.product_collection_id.
Etc. But we need it the other way round: id last. The order of index expressions is crucial. See:
Is a composite index also good for queries on the first field?
Rationale: With only 87 rows from product_collection, we only fetch a maximum of 87 x 100 = 8700 rows (fewer if not every pc.id has 100 rows in table file), which are then sorted before picking the top 100. Performance degrades with the number of rows you get from product_collection and with bigger LIMIT.
With the multicolumn index idx1 above, that's 87 fast index scans. The rest is not very expensive.
More optimization is possible, depending on additional information. Related:
Can spatial index help a “range - order by - limit” query

Postgres 9.6 function performing poorly compared to straight sql

I have this function, and it works, it gives the most recent b record.
create or replace function most_recent_b(the_a a) returns b as $$
select distinct on (c.a_id) b.*
from c
join b on b.c_id = c.id
where c.a_id = the_a.id
order by c.a_id, b.date desc
$$ language sql stable;
This runs ~5000ms with real data. V.S. the following which runs in 500ms
create or replace function most_recent_b(the_a a) returns b as $$
select distinct on (c.a_id) b.*
from c
join b on b.c_id = c.id
where c.a_id = 1347
order by c.a_id, b.date desc
$$ language sql stable;
The only difference being that I've hard coded a.id with a value 1347 instead of using its param value.
Also running this query without a function also gives me speeds around 500ms
I'm running PostgreSQL 9.6, so the query planner failing in functions results I see suggested elsewhere shouldn't apply to me right?
I'm sure its not the query itself that is the issue, as this is my third iteration at it, different techniques to get this result all result in the same slow down when inside a function.
As requested by #laurenz-albe
Result of EXPLAIN (ANALYZE, BUFFERS) with constant
Unique (cost=60.88..60.89 rows=3 width=463) (actual time=520.117..520.122 rows=1 loops=1)
Buffers: shared hit=14555
-> Sort (cost=60.88..60.89 rows=3 width=463) (actual time=520.116..520.120 rows=9 loops=1)
Sort Key: b.date DESC
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=14555
-> Hash Join (cost=13.71..60.86 rows=3 width=463) (actual time=386.848..520.083 rows=9 loops=1)
Hash Cond: (b.c_id = c.id)
Buffers: shared hit=14555
-> Seq Scan on b (cost=0.00..46.38 rows=54 width=459) (actual time=25.362..519.140 rows=51 loops=1)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=14530
-> Hash (cost=13.67..13.67 rows=3 width=8) (actual time=0.880..0.880 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=25
-> Subquery Scan on c (cost=4.21..13.67 rows=3 width=8) (actual time=0.222..0.872 rows=10 loops=1)
Buffers: shared hit=25
-> Bitmap Heap Scan on c c_1 (cost=4.21..13.64 rows=3 width=2276) (actual time=0.221..0.863 rows=10 loops=1)
Recheck Cond: (a_id = 1347)
Filter: c_can_view(c_1.*)
Heap Blocks: exact=4
Buffers: shared hit=25
-> Bitmap Index Scan on c_a_id_c_number_idx (cost=0.00..4.20 rows=8 width=0) (actual time=0.007..0.007 rows=10 loops=1)
Index Cond: (a_id = 1347)
Buffers: shared hit=1
Execution time: 520.256 ms
And this is the result after running six times with the parameter being passed ( it was exactly six times as you predicted :) )
Slow query;
Unique (cost=57.07..57.07 rows=1 width=463) (actual time=5040.237..5040.243 rows=1 loops=1)
Buffers: shared hit=145325
-> Sort (cost=57.07..57.07 rows=1 width=463) (actual time=5040.237..5040.240 rows=9 loops=1)
Sort Key: b.date DESC
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=145325
-> Nested Loop (cost=0.14..57.06 rows=1 width=463) (actual time=912.354..5040.195 rows=9 loops=1)
Join Filter: (c.id = b.c_id)
Rows Removed by Join Filter: 501
Buffers: shared hit=145325
-> Index Scan using c_a_id_idx on c (cost=0.14..9.45 rows=1 width=2276) (actual time=0.378..1.171 rows=10 loops=1)
Index Cond: (a_id = $1)
Filter: c_can_view(c.*)
Buffers: shared hit=25
-> Seq Scan on b (cost=0.00..46.38 rows=54 width=459) (actual time=24.842..503.854 rows=51 loops=10)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=145300
Execution time: 5040.375 ms
Its worth noting that I have some strict row level security involved, and I suspect this is why these queries are both slow, however, one is 10 times slower than the other.
I've changed my original table names hopefully my search and replace was good here.
The expensive part of your query execution is the filter b_can_view(b.*), which must come from your row level security definition.
The fast execution:
Seq Scan on b (cost=0.00..46.38 rows=54 width=459)
(actual time=25.362..519.140 rows=51 loops=1)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=14530
The slow execution:
Seq Scan on b (cost=0.00..46.38 rows=54 width=459)
(actual time=24.842..503.854 rows=51 loops=10)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=145300
The difference is that the scan is executed 10 times in the slow case (loops=10) and touches 10 times as many data blocks.
When using the generic plan, PostgreSQL underestimates how many rows in c will satisfy the condition c.a_id = $1, because it doesn't know that the actual value is 1347, which is more frequent than average.
Since PostgreSQL thinks there will be at most one row from c, it chooses a nested loop join with a sequential scan of b on the inner side.
Now two problems combine:
Calling function b_can_view takes over 3 milliseconds per row (which PostgreSQL doesn't know), which accounts for the half second that a sequential scan of the 163 rows takes.
There are actually 10 rows found in c instead of the predicted 1, so table b is scanned 10 times, and you end up with a query duration of 5 seconds.
So what can you do?
Tell PostgreSQL how expensive the b_can_view is. Use ALTER TABLE to set the COST for that function to 1000 or 10000 to reflect reality. That alone will not be enough to get a faster plan, since PostgreSQL thinks that it has to execute a single sequential scan anyway, but it is a good thing to give the optimizer correct data.
Create an index on b(c_id). That will enable PostgreSQL to avoid a sequential scan of b, which it will try to do once it is aware how expensive the function is.
Also, try to make the function b_can_view cheaper. That will make your experience so much better.