Gap in the execution-time parts of the query plan - PostgreSQL

I have trouble understanding the following query plan (anonymized). There seems to be a gap between how much time the individual parts of the query took, as if a part were missing.
The relevant part:
-> Sort (cost=306952.44..307710.96 rows=303409 width=40) (actual time=4222.122..4400.534 rows=582081 loops=1)
Sort Key: table1.column3, table1.column2
Sort Method: quicksort Memory: 71708kB
Buffers: shared hit=38058
-> Index Scan using myindex on table1 (cost=0.56..279325.68 rows=303409 width=40) (actual time=0.056..339.565 rows=582081 loops=1)
Index Cond: (((column1)::text = 'xxx1'::text) AND ((column2)::text > 'xxx2'::text) AND ((column2)::text < 'xxx3'::text))
Buffers: shared hit=38058
What happened between the index scan (up until 339.565) and starting to sort the result (4222.122)?
And the full plan (just in case):
GroupAggregate (cost=344290.95..348368.01 rows=30341 width=65) (actual time=4933.739..4933.806 rows=11 loops=1)
Group Key: t.column1, t.current_value, t.previous_value
Buffers: shared hit=38058
-> Sort (cost=344290.95..345045.68 rows=301892 width=72) (actual time=4933.712..4933.719 rows=58 loops=1)
Sort Key: t.column1, t.current_value, t.previous_value
Sort Method: quicksort Memory: 32kB
Buffers: shared hit=38058
-> Subquery Scan on t (cost=306952.44..316813.23 rows=301892 width=72) (actual time=4573.523..4933.607 rows=58 loops=1)
Filter: ((t.current_value)::text <> (t.previous_value)::text)
Rows Removed by Filter: 582023
Buffers: shared hit=38058
-> WindowAgg (cost=306952.44..313020.62 rows=303409 width=72) (actual time=4222.144..4859.579 rows=582081 loops=1)
Buffers: shared hit=38058
-> Sort (cost=306952.44..307710.96 rows=303409 width=40) (actual time=4222.122..4400.534 rows=582081 loops=1)
Sort Key: table1.column3, table1.column2
Sort Method: quicksort Memory: 71708kB
Buffers: shared hit=38058
-> Index Scan using myindex on table1 (cost=0.56..279325.68 rows=303409 width=40) (actual time=0.056..339.565 rows=582081 loops=1)
Index Cond: (((column1)::text = 'xxx1'::text) AND ((column2)::text > 'xxx2'::text) AND ((column2)::text < 'xxx3'::text))
Buffers: shared hit=38058
Planning Time: 0.405 ms
Execution Time: 4941.003 ms

Both the cost and the actual time data are shown as two numbers in these plans. The first is the startup time; loosely speaking, it's the time the query step takes to deliver its first row. The second is the completion time, or time to last row.
actual time=startup...completion
The startup time of a Sort step includes the time required to retrieve the whole result set that needs sorting and to actually perform the sort: shuffling the rows around in RAM or, hopefully not, on disk. (Quicksort is O(n log n) on average, so that can take a while for big result sets. You knew that.)
In your plan, the inner sort handles almost 600K rows coming from the index scan, and that is where the time goes: the index scan finishes delivering rows at about 340 ms, but the Sort cannot emit its first row until it has read and sorted all of them, which is why its startup time is 4222 ms. The "missing" ~3.9 seconds is the sort itself. It used 71,708 kB of RAM.
As far as I can tell, there's nothing anomalous in your plan. How to speed up that sort? You might try shorter or fixed-length column2 and column3 data types, but that's all guesswork without seeing your query and table definitions.
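If shortening the data types isn't an option, another thing worth trying (again, pure speculation without your table definition; the index name and column order below just follow the anonymized plan) is an index that already returns the rows in the sort order, which may let the planner skip the explicit Sort:
CREATE INDEX table1_col1_col3_col2_idx ON table1 (column1, column3, column2);
With the equality on column1 as the leading column, an index scan can deliver rows already ordered by column3, column2; the range condition on column2 is then applied as a filter within the scan. Whether that beats the in-memory quicksort depends on your data.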

Related

Postgres query optimizer generates bad plan after adding another order by criterion

I'm using the Django ORM with select_related and it generates a query of the form:
SELECT *
FROM "coupons_coupon"
LEFT OUTER JOIN "coupons_merchant"
ON ("coupons_coupon"."merchant_id" = "coupons_merchant"."slug")
WHERE ("coupons_coupon"."end_date" > '2020-07-10T09:10:28.101980+00:00'::timestamptz AND "coupons_coupon"."published" = true)
ORDER BY "coupons_coupon"."end_date" ASC, "coupons_coupon"."id"
LIMIT 5;
Which is then executed using the following plan:
Limit (cost=4363.28..4363.30 rows=5 width=604) (actual time=21.864..21.865 rows=5 loops=1)
-> Sort (cost=4363.28..4373.34 rows=4022 width=604) (actual time=21.863..21.863 rows=5 loops=1)
Sort Key: coupons_coupon.end_date, coupons_coupon.id
Sort Method: top-N heapsort Memory: 32kB
-> Hash Left Join (cost=2613.51..4296.48 rows=4022 width=604) (actual time=13.918..20.209 rows=4022 loops=1)
Hash Cond: ((coupons_coupon.merchant_id)::text = (coupons_merchant.slug)::text)
-> Seq Scan on coupons_coupon (cost=0.00..291.41 rows=4022 width=261) (actual time=0.007..1.110 rows=4022 loops=1)
Filter: (published AND (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone))
Rows Removed by Filter: 1691
-> Hash (cost=1204.56..1204.56 rows=24956 width=331) (actual time=13.894..13.894 rows=23911 loops=1)
Buckets: 16384 Batches: 4 Memory Usage: 1948kB
-> Seq Scan on coupons_merchant (cost=0.00..1204.56 rows=24956 width=331) (actual time=0.003..4.681 rows=23911 loops=1)
This is a bad execution plan, since the join could be done after the left table has been filtered, ordered and limited. When I remove the id from the ORDER BY, it generates an efficient plan, which basically could have been used for the previous query as well.
Limit (cost=0.57..8.84 rows=5 width=600) (actual time=0.013..0.029 rows=5 loops=1)
-> Nested Loop Left Join (cost=0.57..6650.48 rows=4022 width=600) (actual time=0.012..0.028 rows=5 loops=1)
-> Index Scan using coupons_cou_end_dat_a8d5b7_btree on coupons_coupon (cost=0.28..1015.77 rows=4022 width=261) (actual time=0.007..0.010 rows=5 loops=1)
Index Cond: (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone)
Filter: published
-> Index Scan using coupons_merchant_pkey on coupons_merchant (cost=0.29..1.40 rows=1 width=331) (actual time=0.003..0.003 rows=1 loops=5)
Index Cond: ((slug)::text = (coupons_coupon.merchant_id)::text)
Why is this happening? Can the optimizer be nudged to use similar plan for the former query?
I'm using postgres 12.
v13 of PostgreSQL, which should be released in the next few months, implements incremental sorting: it can read rows in pre-sorted order based on prefix columns, then sort just the ties on those prefix column(s) by the remaining column(s), in order to get a complete sort based on more columns than the index provides. I think that will do more or less what you want.
Limit (cost=2.46..2.99 rows=5 width=21)
-> Incremental Sort (cost=2.46..405.58 rows=3850 width=21)
Sort Key: coupons_coupon.end_date, coupons_coupon.id
Presorted Key: coupons_coupon.end_date
-> Nested Loop Left Join (cost=0.31..253.48 rows=3850 width=21)
-> Index Scan using coupons_coupon_end_date_idx on coupons_coupon (cost=0.15..54.71 rows=302 width=17)
Index Cond: (end_date > '2020-07-10 05:10:28.10198-04'::timestamp with time zone)
Filter: published
-> Index Only Scan using coupons_merchant_slug_idx on coupons_merchant (cost=0.15..0.53 rows=13 width=4)
Index Cond: (slug = coupons_coupon.merchant_id)
Of course, just adding "id" to the current index will work under currently released versions, and even under version 13 it should be more efficient to have the index fully order the rows in the way you need them.
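A sketch of what that could look like (the index name here is invented; the existing index appears to cover only end_date):
CREATE INDEX coupons_coupon_end_date_id_idx ON coupons_coupon (end_date, id);
With (end_date, id) matching the full ORDER BY, the planner should be able to use the same nested-loop, stop-after-5-rows plan as in your second EXPLAIN output, even with id in the ORDER BY.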

Reduce execution time of my query in postgresql

Below is my query. It takes 9387.430 ms to execute, which is certainly too long for such a query. I would like to reduce this execution time. Can you please help me with this? I have also provided my EXPLAIN ANALYZE output.
EXPLAIN ANALYZE
SELECT a.artist, b.artist, COUNT(*)
FROM release_has_artist a, release_has_artist b
WHERE a.release = b.release AND a.artist <> b.artist
GROUP BY(a.artist,b.artist)
ORDER BY (a.artist,b.artist);
Output of EXPLAIN ANALYZE :
Sort (cost=1696482.86..1707588.14 rows=4442112 width=48) (actual time=9253.474..9314.510 rows=461386 loops=1)
Sort Key: (ROW(a.artist, b.artist))
Sort Method: external sort Disk: 24832kB
-> Finalize GroupAggregate (cost=396240.32..932717.19 rows=4442112 width=48) (actual time=1928.058..2911.463 rows=461386 loops=1)
Group Key: a.artist, b.artist
-> Gather Merge (cost=396240.32..860532.87 rows=3701760 width=16) (actual time=1928.049..2494.638 rows=566468 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=395240.29..432257.89 rows=1850880 width=16) (actual time=1912.809..2156.951 rows=188823 loops=3)
Group Key: a.artist, b.artist
-> Sort (cost=395240.29..399867.49 rows=1850880 width=8) (actual time=1912.794..2003.776 rows=271327 loops=3)
Sort Key: a.artist, b.artist
Sort Method: external merge Disk: 4848kB
-> Merge Join (cost=0.85..177260.72 rows=1850880 width=8) (actual time=2.143..1623.628 rows=271327 loops=3)
Merge Cond: (a.release = b.release)
Join Filter: (a.artist <> b.artist)
Rows Removed by Join Filter: 687597
-> Parallel Index Only Scan using release_has_artist_pkey on release_has_artist a (cost=0.43..67329.73 rows=859497 width=8) (actual time=0.059..240.998 rows=687597 loops=3)
Heap Fetches: 711154
-> Index Only Scan using release_has_artist_pkey on release_has_artist b (cost=0.43..79362.68 rows=2062792 width=8) (actual time=0.072..798.402 rows=2329742 loops=3)
Heap Fetches: 2335683
Planning time: 2.101 ms
Execution time: 9387.430 ms
In your EXPLAIN ANALYZE output there are two sorts that spilled to disk (Sort Method: external sort Disk: 24832kB and Sort Method: external merge Disk: 4848kB), indicating that the sorting was done on disk rather than in memory because work_mem is too small. Try increasing your work_mem up to 32MB (30 might be OK, but I like multiples of 8), and try again.
Note that you can set work_mem on a per-session basis, as a global change in work_mem could potentially have negative side-effects, such as running out of memory, because postgresql.conf-configured work_mem is allocated for each session (basically, it has a multiplicative effect).
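A minimal sketch of the per-session approach (the value is just the suggestion above; adjust as needed):
SET work_mem = '32MB';
-- re-run the EXPLAIN ANALYZE from above and check that the sorts no longer spill to disk
RESET work_mem;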

LIMIT with ORDER BY makes query slow

I am having problems optimizing a query in PostgreSQL 9.5.14.
select *
from file as f
join product_collection pc on (f.product_collection_id = pc.id)
where pc.mission_id = 7
order by f.id asc
limit 100;
Takes about 100 seconds. If I drop the LIMIT clause it takes about 0.5 seconds:
With limit:
explain (analyze,buffers) ... -- query exactly as above
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.84..859.32 rows=100 width=457) (actual time=102793.422..102856.884 rows=100 loops=1)
Buffers: shared hit=222430592
-> Nested Loop (cost=0.84..58412343.43 rows=6804163 width=457) (actual time=102793.417..102856.872 rows=100 loops=1)
Buffers: shared hit=222430592
-> Index Scan using file_pkey on file f (cost=0.57..23409008.61 rows=113831736 width=330) (actual time=0.048..28207.152 rows=55858772 loops=1)
Buffers: shared hit=55652672
-> Index Scan using product_collection_pkey on product_collection pc (cost=0.28..0.30 rows=1 width=127) (actual time=0.001..0.001 rows=0 loops=55858772)
Index Cond: (id = f.product_collection_id)
Filter: (mission_id = 7)
Rows Removed by Filter: 1
Buffers: shared hit=166777920
Planning time: 0.803 ms
Execution time: 102856.988 ms
Without limit:
=> explain (analyze,buffers) ... -- query as above, just without limit
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=20509671.01..20526681.42 rows=6804163 width=457) (actual time=456.175..510.596 rows=142055 loops=1)
Sort Key: f.id
Sort Method: quicksort Memory: 79392kB
Buffers: shared hit=37956
-> Nested Loop (cost=0.84..16494851.02 rows=6804163 width=457) (actual time=0.044..231.051 rows=142055 loops=1)
Buffers: shared hit=37956
-> Index Scan using product_collection_mission_id_index on product_collection pc (cost=0.28..46.13 rows=87 width=127) (actual time=0.017..0.101 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=10
-> Index Scan using file_product_collection_id_index on file f (cost=0.57..187900.11 rows=169535 width=330) (actual time=0.007..1.335 rows=1633 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=37946
Planning time: 0.807 ms
Execution time: 569.865 ms
I have copied the database to a backup server so that I may safely manipulate the database without something else changing it on me.
Cardinalities:
Table file: 113,831,736 rows.
Table product_collection: 1370 rows.
The query without LIMIT: 142,055 rows.
SELECT count(*) FROM product_collection WHERE mission_id = 7: 87 rows.
What I have tried:
searching stack overflow
vacuum full analyze
creating two column indexes on file.product_collection_id & file.id. (there already are single column indexes on every field touched.)
creating two column indexes on file.id & file.product_collection_id.
increasing the statistics on file.id & file.product_collection_id, then re-vacuum analyze.
changing various query planner settings.
creating non-materialized views.
walking up and down the hallway while muttering to myself.
None of them seem to change the performance in a significant way.
Thoughts?
UPDATE from OP:
Tested this on PostgreSQL 9.6 & 10.4, and found no significant changes in plans or performance.
However, setting random_page_cost low enough is the only way to get faster performance for the query without LIMIT.
With the default random_page_cost = 4, the query without LIMIT gives:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9270013.01..9287875.64 rows=7145054 width=457) (actual time=47782.523..47843.812 rows=145697 loops=1)
Sort Key: f.id
Sort Method: external sort Disk: 59416kB
Buffers: shared hit=3997185 read=1295264, temp read=7427 written=7427
-> Hash Join (cost=24.19..6966882.72 rows=7145054 width=457) (actual time=1.323..47458.767 rows=145697 loops=1)
Hash Cond: (f.product_collection_id = pc.id)
Buffers: shared hit=3997182 read=1295264
-> Seq Scan on file f (cost=0.00..6458232.17 rows=116580217 width=330) (actual time=0.007..17097.581 rows=116729984 loops=1)
Buffers: shared hit=3997169 read=1295261
-> Hash (cost=23.08..23.08 rows=89 width=127) (actual time=0.840..0.840 rows=87 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=13 read=3
-> Bitmap Heap Scan on product_collection pc (cost=4.97..23.08 rows=89 width=127) (actual time=0.722..0.801 rows=87 loops=1)
Recheck Cond: (mission_id = 7)
Heap Blocks: exact=10
Buffers: shared hit=13 read=3
-> Bitmap Index Scan on product_collection_mission_id_index (cost=0.00..4.95 rows=89 width=0) (actual time=0.707..0.707 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=3 read=3
Planning time: 0.929 ms
Execution time: 47911.689 ms
User Erwin's answer below will take me some time to fully understand and generalize to all of the use cases needed. In the meantime we will probably use either a materialized view or just flatten our table structure.
This query is harder for the Postgres query planner than it might look. Depending on cardinalities, data distribution, value frequencies, sizes, ... completely different query plans can prevail and the planner has a hard time predicting which is best. Current versions of Postgres are better at this in several aspects, but it's still hard to optimize.
Since you retrieve only relatively few rows from product_collection, this equivalent query with LIMIT in a LATERAL subquery should avoid performance degradation:
SELECT *
FROM product_collection pc
CROSS JOIN LATERAL (
    SELECT *
    FROM file f  -- big table
    WHERE f.product_collection_id = pc.id
    ORDER BY f.id
    LIMIT 100
) f
WHERE pc.mission_id = 7
ORDER BY f.id
LIMIT 100;
Edit: With explain (analyze, verbose), this results in the following query plan, provided by the OP:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30524.34..30524.59 rows=100 width=457) (actual time=13.128..13.167 rows=100 loops=1)
Buffers: shared hit=3213
-> Sort (cost=30524.34..30546.09 rows=8700 width=457) (actual time=13.126..13.152 rows=100 loops=1)
Sort Key: file.id
Sort Method: top-N heapsort Memory: 76kB
Buffers: shared hit=3213
-> Nested Loop (cost=0.57..30191.83 rows=8700 width=457) (actual time=0.060..9.868 rows=2880 loops=1)
Buffers: shared hit=3213
-> Seq Scan on product_collection pc (cost=0.00..69.12 rows=87 width=127) (actual time=0.024..0.336 rows=87 loops=1)
Filter: (mission_id = 7)
Rows Removed by Filter: 1283
Buffers: shared hit=13
-> Limit (cost=0.57..344.24 rows=100 width=330) (actual time=0.008..0.071 rows=33 loops=87)
Buffers: shared hit=3200
-> Index Scan using file_pc_id_index on file (cost=0.57..582642.42 rows=169535 width=330) (actual time=0.007..0.065 rows=33 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=3200
Planning time: 0.595 ms
Execution time: 13.319 ms
You need these indexes (will help your original query, too):
CREATE INDEX idx1 ON file (product_collection_id, id); -- crucial
CREATE INDEX idx2 ON product_collection (mission_id, id); -- helpful
You mentioned:
two column indexes on file.id & file.product_collection_id.
Etc. But we need it the other way round: id last. The order of index expressions is crucial. See:
Is a composite index also good for queries on the first field?
Rationale: With only 87 rows from product_collection, we only fetch a maximum of 87 x 100 = 8700 rows (fewer if not every pc.id has 100 rows in table file), which are then sorted before picking the top 100. Performance degrades with the number of rows you get from product_collection and with bigger LIMIT.
With the multicolumn index idx1 above, that's 87 fast index scans. The rest is not very expensive.
More optimization is possible, depending on additional information. Related:
Can spatial index help a “range - order by - limit” query

Postgres Explain Plans Are Different

I have 2 databases running on the same server.
Both are development databases with the same structure, but different amounts of data.
Running Postgres 9.2 on Linux (Centos/Redhat)
I'm running the following SQL on each database, but getting very different performance.
Note: the write_history table structure is identical in each DB and has the same indexes/constraints.
Here is the SQL:
explain analyze SELECT * FROM write_history
WHERE fk_device_rw_id = 'd2c969b8-2609-11e3-80ca-0750cff1e96c'
AND fk_write_history_status_id = 5
ORDER BY update_time DESC LIMIT 1;
And the Explain Plans for each DB:
DB1 - PreProd
Limit (cost=57902.39..57902.39 rows=1 width=103) (actual time=0.056..0.056 rows=0 loops=1)
-> Sort (cost=57902.39..57908.69 rows=2520 width=103) (actual time=0.053..0.053 rows=0 loops=1)
Sort Key: update_time
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on write_history (cost=554.04..57889.79 rows=2520 width=103) (actual time=0.033..0.033 rows=0 loops=1)
Recheck Cond: (fk_device_rw_id = 'd2c969b8-2609-11e3-80ca-0750cff1e96c'::uuid)
Filter: (fk_write_history_status_id = 5)
-> Bitmap Index Scan on idx_write_history_fk_device_rw_id (cost=0.00..553.41 rows=24034 width=0) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: (fk_device_rw_id = 'd2c969b8-2609-11e3-80ca-0750cff1e96c'::uuid)
Total runtime: 0.112 ms
DB2 - QA
Limit (cost=50865.41..50865.41 rows=1 width=108) (actual time=545.521..545.523 rows=1 loops=1)
-> Sort (cost=50865.41..50916.62 rows=20484 width=108) (actual time=545.518..545.518 rows=1 loops=1)
Sort Key: update_time
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on write_history (cost=1431.31..50762.99 rows=20484 width=108) (actual time=21.931..524.034 rows=22034 loops=1)
Recheck Cond: (fk_device_rw_id = 'd2cd81a6-2609-11e3-b574-47328bfa4c38'::uuid)
Rows Removed by Index Recheck: 1401986
Filter: (fk_write_history_status_id = 5)
Rows Removed by Filter: 40161
-> Bitmap Index Scan on idx_write_history_fk_device_rw_id (cost=0.00..1426.19 rows=62074 width=0) (actual time=19.167..19.167 rows=62195 loops=1)
Index Cond: (fk_device_rw_id = 'd2cd81a6-2609-11e3-b574-47328bfa4c38'::uuid)
Total runtime: 545.588 ms
So a few questions:
what is the difference between "Sort Method: quicksort Memory: 25kB" and "Sort Method: top-N heapsort Memory: 25kB" ?
What could be the cause for the total runtime to be so different?
Table Row Count:
DB1 : write_history row count: 5,863,565
DB2 : write_history row count: 2,670,888
Please let me know if further information is required.
Thanks for the help.
A top-N sort means that the sort is done in support of an ORDER BY ... LIMIT N, and that it will throw away a tuple as soon as it can show that that tuple cannot be in the top N. The decision to switch to a top-N sort is made dynamically during the sort process. Since the DB1 sort had zero input tuples, it never made the decision to switch. So the difference in the reported methods is an effect, not a cause, and is of no importance in this case.
I think the key for you is the bitmap heap scans:
(actual time=0.033..0.033 rows=0 loops=1)
(actual time=21.931..524.034 rows=22034 loops=1)
The smaller database has a lot more rows that match your criteria, so it has a lot more work to do.
Also, the amount of rechecking work that needs to be done:
Rows Removed by Index Recheck: 1401986
suggests you have work_mem set to a very small value, so your bitmap is overflowing it.
1) Two different sorting algorithms.
http://en.wikipedia.org/wiki/Quicksort
http://en.wikipedia.org/wiki/Heapsort
2) Volume of data being sorted will change the decision.
From my memory of database theory: quicksort performs very badly on large volumes of data due to the sequence in which it scans through the data, and it's particularly bad if the sort has to spill to disk. Heapsort has a much better worst case than quicksort. The query planner knows this and will change the plan accordingly.
The number of rows being sorted is 8x larger. Be careful to notice the number of rows expected to be selected as well as the number of rows in the table; the query planner uses histograms of the data to get a better picture of what will be selected, not simply the number of rows in the table. In DB1 it's 2.5K, in DB2 it's 20K.
You can check whether these estimates are correct by seeing whether the resulting set of rows matches the count in the estimate. If not, consider running ANALYZE.
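For example, a minimal way to refresh the statistics for the table in question and re-check the estimates:
ANALYZE write_history;
-- then re-run EXPLAIN ANALYZE and compare the estimated rows=... figures with the actual rows=... figures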

adding more indexes to postgres causes "out of shared memory" error

I have a rather complex query that I'm trying to optimize in Postgres 9.2. EXPLAIN ANALYZE gives this plan (explain.depesz.com):
Merge Right Join (cost=194965639.35..211592151.26 rows=420423258 width=616) (actual time=15898.283..15920.603 rows=17 loops=1)
Merge Cond: ((((p.context -> 'device'::text)) = ((s.context -> 'device'::text))) AND (((p.context -> 'physical_port'::text)) = ((s.context -> 'physical_port'::text))))
-> Sort (cost=68925.49..69073.41 rows=59168 width=393) (actual time=872.289..877.818 rows=39898 loops=1)
Sort Key: ((p.context -> 'device'::text)), ((p.context -> 'physical_port'::text))
Sort Method: quicksort Memory: 27372kB
-> Seq Scan on ports__status p (cost=0.00..64235.68 rows=59168 width=393) (actual time=0.018..60.931 rows=41395 loops=1)
-> Materialize (cost=194896713.86..199620346.93 rows=284223403 width=299) (actual time=15023.710..15024.779 rows=17 loops=1)
-> Merge Left Join (cost=194896713.86..198909788.42 rows=284223403 width=299) (actual time=15023.705..15024.765 rows=17 loops=1)
Merge Cond: ((((s.context -> 'device'::text)) = ((l1.context -> 'device'::text))) AND (((s.context -> 'physical_port'::text)) = ((l1.context -> 'physical_port'::text))))
-> Sort (cost=194894861.42..195605419.92 rows=284223403 width=224) (actual time=14997.225..14997.230 rows=17 loops=1)
Sort Key: ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text))
Sort Method: quicksort Memory: 33kB
-> GroupAggregate (cost=100001395.98..122028709.71 rows=284223403 width=389) (actual time=14997.120..14997.186 rows=17 loops=1)
-> Sort (cost=100001395.98..100711954.49 rows=284223403 width=389) (actual time=14997.080..14997.080 rows=17 loops=1)
Sort Key: ((d.context -> 'hostname'::text)), ((a.context -> 'ip_address'::text)), ((a.context -> 'mac_address'::text)), ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text)), s.created_at, s.updated_at, d.created_at, d.updated_at
Sort Method: quicksort Memory: 33kB
-> Merge Join (cost=339026.99..9576678.30 rows=284223403 width=389) (actual time=14996.710..14996.749 rows=17 loops=1)
Merge Cond: (((a.context -> 'mac_address'::text)) = ((s.context -> 'mac_address'::text)))
-> Sort (cost=15038.32..15136.00 rows=39072 width=255) (actual time=23.556..23.557 rows=1 loops=1)
Sort Key: ((a.context -> 'mac_address'::text))
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=471.88..12058.33 rows=39072 width=255) (actual time=13.482..23.548 rows=1 loops=1)
Hash Cond: ((a.context -> 'ip_address'::text) = (d.context -> 'ip_address'::text))
-> Seq Scan on arps__arps a (cost=0.00..8132.39 rows=46239 width=157) (actual time=0.007..11.191 rows=46259 loops=1)
-> Hash (cost=469.77..469.77 rows=169 width=98) (actual time=0.035..0.035 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on ipam__dns d (cost=9.57..469.77 rows=169 width=98) (actual time=0.023..0.023 rows=1 loops=1)
Recheck Cond: ((context -> 'hostname'::text) = 'zglast-oracle03.slac.stanford.edu'::text)
-> Bitmap Index Scan on ipam__dns_hostname_index (cost=0.00..9.53 rows=169 width=0) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: ((context -> 'hostname'::text) = 'blah'::text)
-> Sort (cost=323988.67..327625.84 rows=1454870 width=134) (actual time=14973.118..14973.120 rows=18 loops=1)
Sort Key: ((s.context -> 'mac_address'::text))
Sort Method: external sort Disk: 214176kB
-> Result (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.016..1107.604 rows=1265154 loops=1)
-> Append (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.013..796.578 rows=1265154 loops=1)
-> Seq Scan on spanning_tree__neighbour s (cost=0.00..0.00 rows=1 width=98) (actual time=0.000..0.000 rows=0 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
-> Seq Scan on spanning_tree__neighbour__vlan38 s (cost=0.00..469.32 rows=1220 width=129) (actual time=0.011..1.019 rows=823 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 403
-> Seq Scan on spanning_tree__neighbour__vlan3 s (cost=0.00..270.20 rows=1926 width=139) (actual time=0.017..0.971 rows=1882 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 54
-> Seq Scan on spanning_tree__neighbour__vlan466 s (cost=0.00..131.85 rows=306 width=141) (actual time=0.032..0.340 rows=276 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 32
-> Seq Scan on spanning_tree__neighbour__vlan465 s (cost=0.00..208.57 rows=842 width=142) (actual time=0.005..0.622 rows=768 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 78
-> Seq Scan on spanning_tree__neighbour__vlan499 s (cost=0.00..245.04 rows=481 width=142) (actual time=0.017..0.445 rows=483 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
-> Seq Scan on spanning_tree__neighbour__vlan176 s (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 538
I'm a bit of a novice at reading the plan, but I think it's all down to the fact that I have the table spanning_tree__neighbour (which I've partitioned into numerous 'vlan' tables). As you can see, it's performing a seq scan.
So I wrote a quick and dirty bash script to create indexes for the child tables:
create index spanning_tree__neighbour__vlan1_physical_port_index ON spanning_tree__neighbour__vlan1((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
create index spanning_tree__neighbour__vlan2_physical_port_index ON spanning_tree__neighbour__vlan2((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
create index spanning_tree__neighbour__vlan3_physical_port_index ON spanning_tree__neighbour__vlan3((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
...
But after I create a hundred or so of them, any query gives:
=> explain analyze select * from hosts where hostname='blah';
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Time: 34.757 ms
Will setting max_locks_per_transaction actually help? What value should I use, given that my partitioned table has up to 4096 child tables?
Or have I read the plan wrong?
will setting max_locks_per_transaction actually help?
No, it won't.
Not before fixing your schema and your query first anyway.
A few things pop out… Some already mentioned in comments. In no particular order:
Stats are off. ANALYZE your tables and, if you determine that autovacuum doesn't have enough memory to do its job properly, increase maintenance_work_mem.
Steps like Sort Method: external sort Disk: 214176kB indicate that you're sorting rows on disk. Increase work_mem accordingly.
Steps like Seq Scan on spanning_tree__neighbour__vlan176 s (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1) followed by append are dubious at best.
Look… Partition tables when you want to turn something unmanageable or impractical into something more manageable, e.g. pushing a few billion rows of old data out of the way of the mere couple of million that are used on a daily basis. Not to turn a couple of million rows into 4,096 puny tables with a pathetically small 1k rows in them on average.
The next offender is things like Filter: ((context -> 'physical_port'::text) IS NOT NULL). ARGH.
Never, ever store things in hstore, JSON, XML or any other kind of EAV (entity-attribute-value store), if you care about the data that lands in it; in particular if it appears in a where, join or sort (!) clause. No ifs, no buts: just change your schema.
Plus, a bunch of the fields that appear in your query could be conveniently stored using Postgres' network types instead of dumb text. Odds are they should all be indexed, too. (They wouldn't appear in the plan if they shouldn't.)
You've a step that does a GroupAggregate beneath a left join. Typically, this indicates a query like: … left join (select agg_fn(…) … group by …) foo …. That's a big no-no in my experience. Pull that out of your query if you can.
The plan is too long and unreadable to guess why it's doing that exactly, but if select * from hosts where hostname='blah'; is anything to go by, you seem to be selecting absolutely every possible thing you can access in one query.
It's a lot cheaper, and faster, to find the select few rows that you actually want, and then run a handful of other queries to select the related data. So do so.
If you still need to join with that aggregate subquery for some reason, be sure to look into window functions. More often than not, they'll spare you the need for gory joins, by allowing you to run the aggregate on the current set of rows directly.
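A generic sketch of that idea (table and column names here are invented for illustration, not taken from your query):
-- instead of joining to an aggregating subquery like
--   ... LEFT JOIN (SELECT device, count(*) AS cnt FROM links GROUP BY device) agg USING (device)
-- a window function computes the aggregate alongside the detail rows directly:
SELECT l.*, count(*) OVER (PARTITION BY l.device) AS cnt_per_device
FROM links l;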
Once you've done these steps, the default max_locks_per_transaction should be fine.