How to interpret PostgreSQL EXPLAIN results when a query hangs

I have no idea how to simplify this problem, so this is going to be a long question.
For openers, for reasons I won't get into, I normalized long paragraphs out to a table named shared.notes.
Next, I have a complicated view with a number of paragraph lookups. Each note_id field is (a) indexed and (b) has a foreign key constraint to the notes table. Pseudocode below:
CREATE VIEW shared.vw_get_the_whole_kit_and_kaboodle AS
SELECT
yada yada
, mi.electrical_note_id
, electrical_notes.note AS electrical_notes
, mi.hvac_note_id
, hvac_notes.note AS hvac_notes
, mi.network_note_id
, network_notes.note AS network_notes
, mi.plumbing_note_id
, plumbing_notes.note AS plumbing_notes
, mi.specification_note_id
, specification_notes.note AS specification_notes
, mi.structural_note_id
, structural_notes.note AS structural_notes
FROM shared.a_table AS mi
JOIN shared.generic_items AS gi
ON mi.generic_item_id = gi.generic_item_id
JOIN shared.manufacturers AS mft
ON mi.manufacturer_id = mft.manufacturer_id
JOIN shared.notes AS electrical_notes
ON mi.electrical_note_id = electrical_notes.note_id
JOIN shared.notes AS hvac_notes
ON mi.hvac_note_id = hvac_notes.note_id
JOIN shared.notes AS plumbing_notes
ON mi.plumbing_note_id = plumbing_notes.note_id
JOIN shared.notes AS specification_notes
ON mi.specification_note_id = specification_notes.note_id
JOIN shared.notes AS structural_notes
ON mi.structural_note_id = structural_notes.note_id
JOIN shared.notes AS network_notes
ON mi.network_note_id = network_notes.note_id
JOIN shared.connectivity AS nc
ON mi.connectivity_id = nc.connectivity_id
WHERE
mi.deletion_date IS NULL;
Then I select against this view:
SELECT
lots of columns...
FROM shared.vw_get_the_whole_kit_and_kaboodle
WHERE
is_active = TRUE
AND is_inventory = FALSE;
Strangely, I've not run into problems in the cloud GCP databases yet, even though a number of these tables hold thousands of rows.
Meanwhile, back at the ranch, I've got a test version of the database on my local PC. SAME EXACT SQL, down to the last letter. Trust me on that: table definitions, view definitions, indexes... everything.
The cloud will return queries nearly instantaneously.
The local PC will hang, despite the fact that the PC database has a mere handful of rows in each of the various tables. So if one should hang, it ought to be the cloud databases. But it's the other way around: the tiny-dataset database is the one that fails.
Add this plot twist: if I remove the filter on is_inventory, the query on the PC returns instantaneously. Also, if I remove the joins to the notes table one by one, the PC starts to finish instantaneously after about half of them are gone. It's almost like it's upset to be hitting the same table so many times in one query.
If I run EXPLAIN (without the ANALYZE option), here's the NO-hang version:
Hash Left Join (cost=31.55..40.09 rows=43 width=751)
Hash Cond: (mi.mounting_location_id = ml.mounting_location_id)
-> Hash Left Join (cost=30.34..38.76 rows=43 width=719)
Hash Cond: (mi.price_type_id = pt.price_type_id)
-> Hash Join (cost=29.25..37.53 rows=43 width=687)
Hash Cond: (mi.connectivity_id = nc.connectivity_id)
-> Nested Loop (cost=28.16..36.21 rows=43 width=655)
Join Filter: (mi.network_note_id = network_notes.note_id)
-> Seq Scan on notes network_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=28.16..34.66 rows=43 width=623)
Join Filter: (mi.plumbing_note_id = plumbing_notes.note_id)
-> Seq Scan on notes plumbing_notes (cost=0.00..1.01 rows=1 width=48)
-> Hash Join (cost=28.16..33.11 rows=43 width=591)
Hash Cond: (mi.generic_item_id = gi.generic_item_id)
-> Hash Join (cost=5.11..9.95 rows=43 width=559)
Hash Cond: (mi.structural_note_id = structural_notes.note_id)
-> Hash Join (cost=4.09..8.57 rows=43 width=527)
Hash Cond: (mi.specification_note_id = specification_notes.note_id)
-> Hash Join (cost=3.07..7.37 rows=43 width=495)
Hash Cond: (mi.hvac_note_id = hvac_notes.note_id)
-> Hash Join (cost=2.04..5.99 rows=43 width=463)
Hash Cond: (mi.electrical_note_id = electrical_notes.note_id)
-> Hash Join (cost=1.02..4.70 rows=43 width=431)
Hash Cond: (mi.manufacturer_id = mft.manufacturer_id)
-> Seq Scan on mft_items mi (cost=0.00..3.44 rows=43 width=399)
Filter: ((deletion_date IS NULL) AND is_active)
-> Hash (cost=1.01..1.01 rows=1 width=48)
-> Seq Scan on manufacturers mft (cost=0.00..1.01 rows=1 width=48)
-> Hash (cost=1.01..1.01 rows=1 width=48)
-> Seq Scan on notes electrical_notes (cost=0.00..1.01 rows=1 width=48)
-> Hash (cost=1.01..1.01 rows=1 width=48)
-> Seq Scan on notes hvac_notes (cost=0.00..1.01 rows=1 width=48)
-> Hash (cost=1.01..1.01 rows=1 width=48)
-> Seq Scan on notes specification_notes (cost=0.00..1.01 rows=1 width=48)
-> Hash (cost=1.01..1.01 rows=1 width=48)
-> Seq Scan on notes structural_notes (cost=0.00..1.01 rows=1 width=48)
-> Hash (cost=15.80..15.80 rows=580 width=48)
-> Seq Scan on generic_items gi (cost=0.00..15.80 rows=580 width=48)
-> Hash (cost=1.04..1.04 rows=4 width=36)
-> Seq Scan on connectivity nc (cost=0.00..1.04 rows=4 width=36)
-> Hash (cost=1.04..1.04 rows=4 width=36)
-> Seq Scan on price_types pt (cost=0.00..1.04 rows=4 width=36)
-> Hash (cost=1.09..1.09 rows=9 width=48)
-> Seq Scan on mounting_locations ml (cost=0.00..1.09 rows=9 width=48)
And this is the hang version:
Hash Left Join (cost=26.43..38.57 rows=16 width=751)
Hash Cond: (mi.mounting_location_id = ml.mounting_location_id)
-> Hash Left Join (cost=25.23..37.32 rows=16 width=719)
Hash Cond: (mi.price_type_id = pt.price_type_id)
-> Hash Join (cost=24.14..36.18 rows=16 width=687)
Hash Cond: (mi.connectivity_id = nc.connectivity_id)
-> Nested Loop (cost=23.05..35.00 rows=16 width=655)
Join Filter: (mi.network_note_id = network_notes.note_id)
-> Seq Scan on notes network_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..33.79 rows=16 width=623)
Join Filter: (mi.structural_note_id = structural_notes.note_id)
-> Seq Scan on notes structural_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..32.58 rows=16 width=591)
Join Filter: (mi.electrical_note_id = electrical_notes.note_id)
-> Seq Scan on notes electrical_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..31.37 rows=16 width=559)
Join Filter: (mi.specification_note_id = specification_notes.note_id)
-> Seq Scan on notes specification_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..30.16 rows=16 width=527)
Join Filter: (mi.plumbing_note_id = plumbing_notes.note_id)
-> Seq Scan on notes plumbing_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..28.95 rows=16 width=495)
Join Filter: (mi.hvac_note_id = hvac_notes.note_id)
-> Seq Scan on notes hvac_notes (cost=0.00..1.01 rows=1 width=48)
-> Nested Loop (cost=23.05..27.74 rows=16 width=463)
Join Filter: (mi.manufacturer_id = mft.manufacturer_id)
-> Seq Scan on manufacturers mft (cost=0.00..1.01 rows=1 width=48)
-> Hash Join (cost=23.05..26.53 rows=16 width=431)
Hash Cond: (mi.generic_item_id = gi.generic_item_id)
-> Seq Scan on mft_items mi (cost=0.00..3.44 rows=16 width=399)
Filter: ((deletion_date IS NULL) AND is_active AND (NOT is_inventory))
-> Hash (cost=15.80..15.80 rows=580 width=48)
-> Seq Scan on generic_items gi (cost=0.00..15.80 rows=580 width=48)
-> Hash (cost=1.04..1.04 rows=4 width=36)
-> Seq Scan on connectivity nc (cost=0.00..1.04 rows=4 width=36)
-> Hash (cost=1.04..1.04 rows=4 width=36)
-> Seq Scan on price_types pt (cost=0.00..1.04 rows=4 width=36)
-> Hash (cost=1.09..1.09 rows=9 width=48)
-> Seq Scan on mounting_locations ml (cost=0.00..1.09 rows=9 width=48)
I'd like to understand what I should be doing differently to escape this hang condition. Unfortunately, I'm not clear on what I'm doing wrong.
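One way to gather more evidence (not a guaranteed fix) is to refresh the planner statistics on the underlying tables and then capture the actual plan under a timeout, so the session cannot hang indefinitely. A sketch using the table names from the question; adjust to your schema:

```sql
-- Refresh planner statistics for the tables behind the view
-- (stale statistics are a frequent cause of bad join orders):
ANALYZE shared.notes;
ANALYZE shared.a_table;
ANALYZE shared.generic_items;

-- Bound the runtime, then capture the actual plan:
SET statement_timeout = '30s';
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM shared.vw_get_the_whole_kit_and_kaboodle
WHERE is_active = TRUE
  AND is_inventory = FALSE;
```

If the statement times out, the EXPLAIN output is lost, but the timeout itself confirms where execution stalls rather than leaving the session hung.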

Related

How can I speed up my PostgreSQL SELECT function that uses a list for its WHERE clause?

I have a SELECT function that takes in a list of symbols as a parameter.
CREATE OR REPLACE FUNCTION api.stats(p_stocks text[])
RETURNS TABLE(symbol character, industry text, adj_close money, week52high money, week52low money, marketcap money,
pe_ratio int, beta numeric, dividend_yield character)
as $$
SELECT DISTINCT ON (t1.symbol) t1.symbol,
t3.industry,
cast(t2.adj_close as money),
cast(t1.week52high as money),
cast(t1.week52low as money),
cast(t1.marketcap as money),
cast(t1.pe_ratio as int),
ROUND(t1.beta,2),
to_char(t1.dividend_yield * 100, '99D99%%')
FROM api.security_stats as t1
LEFT JOIN api.security_price as t2 USING (symbol)
LEFT JOIN api.security as t3 USING (symbol)
WHERE symbol = any($1) ORDER BY t1.symbol, t2.date DESC
$$ language sql
PARALLEL SAFE;
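For reference, a call to the function above would look like this (assuming the api schema as defined):

```sql
SELECT * FROM api.stats(ARRAY['AAPL', 'TSLA']);
```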
I'm trying to speed up the initial query by adding indexes and other methods. That cut my query time in half, but only when the list has ONE value; it's still pretty slow with more than one value.
For brevity, I've added the original SELECT statement below, with only one symbol as a parameter, AAPL:
SELECT DISTINCT ON (t1.symbol) t1.symbol,
t3.industry,
cast(t2.adj_close as money),
cast(t1.week52high as money),
cast(t1.week52low as money),
cast(t1.marketcap as money),
cast(t1.pe_ratio as int),
ROUND(t1.beta,2),
to_char(t1.dividend_yield * 100, '99D99%%')
FROM api.security_stats as t1
LEFT JOIN api.security_price as t2 USING (symbol)
LEFT JOIN api.security as t3 USING (symbol)
WHERE symbol = 'AAPL' ORDER BY t1.symbol, t2.date DESC;
Here are the details on performance:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=71365.86..72083.62 rows=52 width=130) (actual time=828.301..967.263 rows=1 loops=1)
-> Sort (cost=71365.86..72083.62 rows=287101 width=130) (actual time=828.299..946.342 rows=326894 loops=1)
Sort Key: t2.date DESC
Sort Method: external merge Disk: 33920kB
-> Hash Right Join (cost=304.09..25710.44 rows=287101 width=130) (actual time=0.638..627.083 rows=326894 loops=1)
Hash Cond: ((t2.symbol)::text = (t1.symbol)::text)
-> Bitmap Heap Scan on security_price t2 (cost=102.41..16523.31 rows=5417 width=14) (actual time=0.317..2.658 rows=4478 loops=1)
Recheck Cond: ((symbol)::text = 'AAPL'::text)
Heap Blocks: exact=153
-> Bitmap Index Scan on symbol_price_idx (cost=0.00..101.06 rows=5417 width=0) (actual time=0.292..0.293 rows=4478 loops=1)
Index Cond: ((symbol)::text = 'AAPL'::text)
-> Hash (cost=201.02..201.02 rows=53 width=79) (actual time=0.290..0.295 rows=73 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 17kB
-> Nested Loop Left Join (cost=4.98..201.02 rows=53 width=79) (actual time=0.062..0.252 rows=73 loops=1)
Join Filter: ((t1.symbol)::text = (t3.symbol)::text)
-> Bitmap Heap Scan on security_stats t1 (cost=4.70..191.93 rows=53 width=57) (actual time=0.046..0.195 rows=73 loops=1)
Recheck Cond: ((symbol)::text = 'AAPL'::text)
Heap Blocks: exact=73
-> Bitmap Index Scan on symbol_stats_idx (cost=0.00..4.69 rows=53 width=0) (actual time=0.029..0.029 rows=73 loops=1)
Index Cond: ((symbol)::text = 'AAPL'::text)
-> Materialize (cost=0.28..8.30 rows=1 width=26) (actual time=0.000..0.000 rows=1 loops=73)
-> Index Scan using symbol_security_idx on security t3 (cost=0.28..8.29 rows=1 width=26) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: ((symbol)::text = 'AAPL'::text)
Planning Time: 0.329 ms
Execution Time: 973.894 ms
Now I will take the same SELECT statement above and change the WHERE clause to WHERE symbol IN ('AAPL','TSLA') to replicate the original FUNCTION mentioned first.
EDIT: Here is the new test using multiple values, after I changed work_mem to 10MB:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=253542.02..255477.13 rows=101 width=130) (actual time=5239.415..5560.114 rows=2 loops=1)
-> Sort (cost=253542.02..254509.58 rows=387022 width=130) (actual time=5239.412..5507.122 rows=430439 loops=1)
Sort Key: t1.symbol, t2.date DESC
Sort Method: external merge Disk: 43056kB
-> Hash Left Join (cost=160938.84..191162.40 rows=387022 width=130) (actual time=2558.718..3509.201 rows=430439 loops=1)
Hash Cond: ((t1.symbol)::text = (t2.symbol)::text)
-> Hash Left Join (cost=50.29..400.99 rows=107 width=79) (actual time=0.617..0.864 rows=112 loops=1)
Hash Cond: ((t1.symbol)::text = (t3.symbol)::text)
-> Bitmap Heap Scan on security_stats t1 (cost=9.40..359.81 rows=107 width=57) (actual time=0.051..0.246 rows=112 loops=1)
Recheck Cond: ((symbol)::text = ANY ('{AAPL,TSLA}'::text[]))
Heap Blocks: exact=112
-> Bitmap Index Scan on symbol_stats_idx (cost=0.00..9.38 rows=107 width=0) (actual time=0.030..0.031 rows=112 loops=1)
Index Cond: ((symbol)::text = ANY ('{AAPL,TSLA}'::text[]))
-> Hash (cost=28.73..28.73 rows=973 width=26) (actual time=0.558..0.559 rows=973 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 64kB
-> Seq Scan on security t3 (cost=0.00..28.73 rows=973 width=26) (actual time=0.009..0.274 rows=973 loops=1)
-> Hash (cost=99479.91..99479.91 rows=3532691 width=14) (actual time=2537.403..2537.404 rows=3532691 loops=1)
Buckets: 262144 Batches: 32 Memory Usage: 6170kB
-> Seq Scan on security_price t2 (cost=0.00..99479.91 rows=3532691 width=14) (actual time=0.302..1347.778 rows=3532691 loops=1)
Planning Time: 1.409 ms
Execution Time: 5569.160 ms
I've managed to solve the problem by removing adj_close from my original query. My function is now fast. Thank you for helping me pinpoint the problem in my query plan.

Recursive query slow on strange conditions

The following query is part of a much bigger one that runs perfectly fast on a filled DB, but on a nearly empty one it is very slow.
In this simplified form, it takes ~400 ms to execute, but if you remove either line (1) or lines (2) and (3), it takes ~35 ms. Why? And how do I make it work normally?
Some background about the DB:
The DB is VACUUMed and ANALYZEd.
ctract is empty.
contrats contains only 2 rows, none of which has idtypecontrat IN (4,5),
so tmpctr1 is empty.
copyrightad contains 280 rows; only one matches the filters idoeu=13 and role IN ('E','CE').
In all cases, the query returns ONE row (the one returned by the first part of the recursive CTE).
Line (1) is not used at all in this version, but removing it hides the problem for some reason.
WITH RECURSIVE tmpctr1 AS (
SELECT ced.idad AS cedant, ced.idclient
FROM contrats c
JOIN CtrAct ced ON c.idcontrat=ced.idcontrat AND ced.isassignor
JOIN CtrAct ces ON c.idcontrat=ces.idcontrat AND NOT COALESCE(ces.isassignor,FALSE) --(1)
WHERE idtypecontrat IN (4,5)
)
,rec1 AS (
SELECT ca.idoeu,ca.idad AS chn,1 AS idclient, 1 AS level
FROM copyrightad ca
WHERE ca.role IN ('E','CE')
AND ca.idoeu = 13
UNION
SELECT r.idoeu,0, 0, r.level+1
FROM rec1 r
LEFT JOIN tmpctr1 c ON r.chn=c.cedant
LEFT JOIN tmpctr1 c2 ON r.idclient=c2.idclient -- (2)
WHERE r.level<20
AND (c.cedant is not null
OR c2.cedant is not null --(3)
)
)
select * from rec1
Query plan #1 : slow
QUERY PLAN
CTE Scan on rec1 (cost=1662106.61..2431078.65 rows=38448602 width=16) (actual time=384.975..398.182 rows=1 loops=1)
CTE tmpctr1
-> Hash Join (cost=36.06..116.37 rows=148225 width=8) (actual time=0.009..0.010 rows=0 loops=1)
Hash Cond: (c.idcontrat = ces.idcontrat)
-> Hash Join (cost=1.04..28.50 rows=385 width=16) (actual time=0.009..0.009 rows=0 loops=1)
Hash Cond: (ced.idcontrat = c.idcontrat)
-> Seq Scan on ctract ced (cost=0.00..25.40 rows=770 width=12) (actual time=0.008..0.008 rows=0 loops=1)
Filter: isassignor
-> Hash (cost=1.02..1.02 rows=1 width=4) (never executed)
-> Seq Scan on contrats c (cost=0.00..1.02 rows=1 width=4) (never executed)
Filter: (idtypecontrat = ANY ('{4,5}'::integer[]))
-> Hash (cost=25.40..25.40 rows=770 width=4) (never executed)
-> Seq Scan on ctract ces (cost=0.00..25.40 rows=770 width=4) (never executed)
Filter: (NOT COALESCE(isassignor, false))
CTE rec1
-> Recursive Union (cost=0.00..1661990.25 rows=38448602 width=16) (actual time=384.973..398.179 rows=1 loops=1)
-> Seq Scan on copyrightad ca (cost=0.00..8.20 rows=2 width=16) (actual time=384.970..384.981 rows=1 loops=1)
Filter: (((role)::text = ANY ('{E,CE}'::text[])) AND (idoeu = 13))
Rows Removed by Filter: 279
-> Merge Left Join (cost=21618.01..89301.00 rows=3844860 width=16) (actual time=13.193..13.193 rows=0 loops=1)
Merge Cond: (r.idclient = c2.idclient)
Filter: ((c_1.cedant IS NOT NULL) OR (c2.cedant IS NOT NULL))
Rows Removed by Filter: 1
-> Sort (cost=3892.89..3905.86 rows=5188 width=16) (actual time=13.179..13.180 rows=1 loops=1)
Sort Key: r.idclient
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=0.54..3572.76 rows=5188 width=16) (actual time=13.170..13.171 rows=1 loops=1)
Hash Cond: (c_1.cedant = r.chn)
-> CTE Scan on tmpctr1 c_1 (cost=0.00..2964.50 rows=148225 width=4) (actual time=0.011..0.011 rows=0 loops=1)
-> Hash (cost=0.45..0.45 rows=7 width=16) (actual time=13.150..13.150 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> WorkTable Scan on rec1 r (cost=0.00..0.45 rows=7 width=16) (actual time=13.138..13.140 rows=1 loops=1)
Filter: (level < 20)
-> Materialize (cost=17725.12..18466.25 rows=148225 width=8) (actual time=0.008..0.008 rows=0 loops=1)
-> Sort (cost=17725.12..18095.68 rows=148225 width=8) (actual time=0.007..0.007 rows=0 loops=1)
Sort Key: c2.idclient
Sort Method: quicksort Memory: 25kB
-> CTE Scan on tmpctr1 c2 (cost=0.00..2964.50 rows=148225 width=8) (actual time=0.000..0.000 rows=0 loops=1)
Planning Time: 0.270 ms
JIT:
Functions: 53
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 5.064 ms, Inlining 4.491 ms, Optimization 236.336 ms, Emission 155.206 ms, Total 401.097 ms
Execution Time: 403.549 ms
Query plan #2 : fast : line (1) is hidden
QUERY PLAN
CTE Scan on rec1 (cost=240.86..245.90 rows=252 width=16) (actual time=0.030..0.058 rows=1 loops=1)
CTE tmpctr1
-> Hash Join (cost=1.04..28.50 rows=385 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Hash Cond: (ced.idcontrat = c.idcontrat)
-> Seq Scan on ctract ced (cost=0.00..25.40 rows=770 width=12) (actual time=0.001..0.001 rows=0 loops=1)
Filter: isassignor
-> Hash (cost=1.02..1.02 rows=1 width=4) (never executed)
-> Seq Scan on contrats c (cost=0.00..1.02 rows=1 width=4) (never executed)
Filter: (idtypecontrat = ANY ('{4,5}'::integer[]))
CTE rec1
-> Recursive Union (cost=0.00..212.35 rows=252 width=16) (actual time=0.029..0.056 rows=1 loops=1)
-> Seq Scan on copyrightad ca (cost=0.00..8.20 rows=2 width=16) (actual time=0.027..0.041 rows=1 loops=1)
Filter: (((role)::text = ANY ('{E,CE}'::text[])) AND (idoeu = 13))
Rows Removed by Filter: 279
-> Hash Right Join (cost=9.97..19.91 rows=25 width=16) (actual time=0.013..0.013 rows=0 loops=1)
Hash Cond: (c2.idclient = r.idclient)
Filter: ((c_1.cedant IS NOT NULL) OR (c2.cedant IS NOT NULL))
Rows Removed by Filter: 1
-> CTE Scan on tmpctr1 c2 (cost=0.00..7.70 rows=385 width=8) (actual time=0.000..0.000 rows=0 loops=1)
-> Hash (cost=9.81..9.81 rows=13 width=16) (actual time=0.009..0.009 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=0.54..9.81 rows=13 width=16) (actual time=0.008..0.008 rows=1 loops=1)
Hash Cond: (c_1.cedant = r.chn)
-> CTE Scan on tmpctr1 c_1 (cost=0.00..7.70 rows=385 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Hash (cost=0.45..0.45 rows=7 width=16) (actual time=0.003..0.003 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> WorkTable Scan on rec1 r (cost=0.00..0.45 rows=7 width=16) (actual time=0.002..0.002 rows=1 loops=1)
Filter: (level < 20)
Planning Time: 0.330 ms
Execution Time: 0.094 ms
Query plan #3 : fast : lines (2) and (3) are hidden
QUERY PLAN
CTE Scan on rec1 (cost=1829.46..2907.50 rows=53902 width=16) (actual time=0.050..0.074 rows=1 loops=1)
CTE rec1
-> Recursive Union (cost=0.00..1829.46 rows=53902 width=16) (actual time=0.049..0.072 rows=1 loops=1)
-> Seq Scan on copyrightad ca (cost=0.00..8.20 rows=2 width=16) (actual time=0.046..0.067 rows=1 loops=1)
Filter: (((role)::text = ANY ('{E,CE}'::text[])) AND (idoeu = 13))
Rows Removed by Filter: 279
-> Hash Join (cost=30.45..74.32 rows=5390 width=16) (actual time=0.003..0.003 rows=0 loops=1)
Hash Cond: (c.idcontrat = ced.idcontrat)
-> Hash Join (cost=1.04..28.50 rows=385 width=8) (actual time=0.002..0.002 rows=0 loops=1)
Hash Cond: (ces.idcontrat = c.idcontrat)
-> Seq Scan on ctract ces (cost=0.00..25.40 rows=770 width=4) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (NOT COALESCE(isassignor, false))
-> Hash (cost=1.02..1.02 rows=1 width=4) (never executed)
-> Seq Scan on contrats c (cost=0.00..1.02 rows=1 width=4) (never executed)
Filter: (idtypecontrat = ANY ('{4,5}'::integer[]))
-> Hash (cost=29.08..29.08 rows=27 width=12) (never executed)
-> Hash Join (cost=0.54..29.08 rows=27 width=12) (never executed)
Hash Cond: (ced.idad = r.chn)
-> Seq Scan on ctract ced (cost=0.00..25.40 rows=766 width=8) (never executed)
Filter: (isassignor AND (idad IS NOT NULL))
-> Hash (cost=0.45..0.45 rows=7 width=12) (never executed)
-> WorkTable Scan on rec1 r (cost=0.00..0.45 rows=7 width=12) (never executed)
Filter: (level < 20)
Planning Time: 0.310 ms
Execution Time: 0.179 ms
PostgreSQL 12.2
Edit: the same query on the same DB on PostgreSQL 11.6 runs fast (still highly over-estimating rows in some parts), so I guess this is a regression.
Why?
The immediate reason for the big difference in query execution time is "Just-in-Time compilation", which is active by default in Postgres 12. Quoting the release notes:
Enable Just-in-Time (JIT) compilation by default, if the server
has been built with support for it (Andres Freund)
Note that this support is not built by default, but has to be selected
explicitly while configuring the build.
Turn it off in your session and test again:
SET jit = off
But JIT only amplifies the underlying problem: Estimates are way off in the query plan, which leads Postgres to assume a huge number of rows resulting from the joins in CTE tmpctr1, and assume that JIT would pay off.
Keep PostgreSQL from sometimes choosing a bad query plan
You asserted that ...
DB is VACUUMed and ANALYZEd
ctract is empty
But Postgres expects to find 770 rows in a sequential scan:
-> Seq Scan on ctract ced (cost=0.00..25.40 rows=770 width=12) (actual time=0.008..0.008 rows=0 loops=1)
Filter: isassignor
The number 770 comes directly from pg_class.reltuples, meaning that statistic is completely out of date. Maybe you relied on autovacuum, but something kept it from kicking in, or its settings are not aggressive enough. Run this manually and retry:
ANALYZE ctract;
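To see the stale estimate and the vacuum/analyze history for yourself, the system catalogs can be queried directly; a small sketch:

```sql
-- The row estimate the planner is working from:
SELECT relname, reltuples
FROM pg_class
WHERE relname = 'ctract';

-- When vacuum/analyze (manual or auto) last touched the table:
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'ctract';
```

If reltuples still reads 770 on an empty table and the autovacuum timestamps are NULL or old, that confirms the statistics were never refreshed.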
There is probably more potential to optimize, but I stopped processing here.
In a populated database, indexes will help a lot. Are you aware that partial or expression indexes can help with customized statistics? See:
Index that is not used, yet influences query
Get count estimates from pg_class.reltuples for given conditions
About (1):
JOIN CtrAct ces ON c.idcontrat=ces.idcontrat AND NOT COALESCE(ces.isassignor,FALSE) --(1)
Try replacing it with the equivalent:
JOIN CtrAct ces ON c.idcontrat=ces.idcontrat AND ces.isassignor IS NOT TRUE
It's clearer in any case. The convoluted expression may prevent index usage or better estimates (not the problem here).
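If in doubt, the equivalence of the two expressions across boolean three-valued logic is easy to verify directly:

```sql
SELECT x,
       NOT COALESCE(x, FALSE) AS coalesce_form,
       x IS NOT TRUE          AS is_not_true_form
FROM (VALUES (TRUE), (FALSE), (NULL::boolean)) AS t(x);
-- Both columns yield FALSE for TRUE, and TRUE for FALSE and NULL.
```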

Analyze: Why could a query take so long when the costs seem low?

I am getting these results from EXPLAIN ANALYZE for a simple query that returns no more than 150 records, from tables most of which hold fewer than 200 records, as I have a table that stores the latest value while the other fields are FKs to the data.
Update: see the new results from the same query some hours later. The site is not public and there should be no users right now, as it is in development.
explain analyze
SELECT lv.station_id,
s.name AS station_name,
s.latitude,
s.longitude,
s.elevation,
lv.element_id,
e.symbol AS element_symbol,
u.symbol,
e.name AS element_name,
lv.last_datetime AS datetime,
lv.last_value AS valor,
s.basin_id,
s.municipality_id
FROM (((element_station lv /*350 records*/
JOIN stations s ON ((lv.station_id = s.id))) /*40 records*/
JOIN elements e ON ((lv.element_id = e.id))) /*103 records*/
JOIN units u ON ((e.unit_id = u.id))) /* 32 records */
WHERE s.id = lv.station_id AND e.id = lv.element_id AND lv.interval_id = 6 and
lv.last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)
I have already tried VACUUM; that saves some time, but after a while the runtime goes up again. I have implemented an index on the fields.
Nested Loop (cost=0.29..2654.66 rows=1 width=92) (actual time=1219.390..35296.253 rows=157 loops=1)
Join Filter: (e.unit_id = u.id)
Rows Removed by Join Filter: 4867
-> Nested Loop (cost=0.29..2652.93 rows=1 width=92) (actual time=1219.383..35294.083 rows=157 loops=1)
Join Filter: (lv.element_id = e.id)
Rows Removed by Join Filter: 16014
-> Nested Loop (cost=0.29..2648.62 rows=1 width=61) (actual time=1219.301..35132.373 rows=157 loops=1)
-> Seq Scan on element_station lv (cost=0.00..2640.30 rows=1 width=20) (actual time=1219.248..1385.517 rows=157 loops=1)
Filter: ((interval_id = 6) AND (last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)))
Rows Removed by Filter: 168
-> Index Scan using stations_pkey on stations s (cost=0.29..8.31 rows=1 width=45) (actual time=3.471..214.941 rows=1 loops=157)
Index Cond: (id = lv.station_id)
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.003..0.999 rows=103 loops=157)
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.002..0.005 rows=32 loops=157)
Planning time: 8.312 ms
Execution time: 35296.427 ms
Update: the same query run tonight; no changes:
Sort (cost=601.74..601.88 rows=55 width=92) (actual time=1.822..1.841 rows=172 loops=1)
Sort Key: lv.last_datetime DESC
Sort Method: quicksort Memory: 52kB
-> Nested Loop (cost=11.60..600.15 rows=55 width=92) (actual time=0.287..1.680 rows=172 loops=1)
-> Hash Join (cost=11.31..248.15 rows=55 width=51) (actual time=0.263..0.616 rows=172 loops=1)
Hash Cond: (e.unit_id = u.id)
-> Hash Join (cost=9.59..245.60 rows=75 width=51) (actual time=0.225..0.528 rows=172 loops=1)
Hash Cond: (lv.element_id = e.id)
-> Bitmap Heap Scan on element_station lv (cost=5.27..240.25 rows=75 width=20) (actual time=0.150..0.359 rows=172 loops=1)
Recheck Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
Heap Blocks: exact=22
-> Bitmap Index Scan on element_station_latest (cost=0.00..5.25 rows=75 width=0) (actual time=0.136..0.136 rows=226 loops=1)
Index Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
-> Hash (cost=3.03..3.03 rows=103 width=35) (actual time=0.062..0.062 rows=103 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.006..0.031 rows=103 loops=1)
-> Hash (cost=1.32..1.32 rows=32 width=8) (actual time=0.019..0.019 rows=32 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.003..0.005 rows=32 loops=1)
-> Index Scan using stations_pkey on stations s (cost=0.29..6.39 rows=1 width=45) (actual time=0.005..0.006 rows=1 loops=172)
Index Cond: (id = lv.station_id)
Planning time: 2.390 ms
Execution time: 2.009 ms
The problem is the misestimate of the number of rows in the sequential scan on element_station. Either autoanalyze kicked in and calculated new statistics for the table, or the data changed.
The problem is probably that PostgreSQL doesn't know the result of
((now() - '06:00:00'::interval) - '01:00:00'::interval)
at query planning time.
If that is possible for you, do it in two steps: First, calculate the expression above (either in PostgreSQL or on the client side). Then run the query with the result as a constant. That will make it easier for PostgreSQL to estimate the result count.
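A sketch of that two-step approach; the timestamp literal below is only a placeholder for whatever value step 1 produces:

```sql
-- Step 1: evaluate the expression once, client-side or in a separate query:
SELECT now() - '06:00:00'::interval - '01:00:00'::interval AS cutoff;

-- Step 2: run the real query with the result inlined as a constant,
-- so the planner can use column statistics for the range condition:
SELECT lv.station_id, lv.element_id, lv.last_datetime, lv.last_value
FROM element_station lv
WHERE lv.interval_id = 6
  AND lv.last_datetime >= '2018-06-01 12:00:00+00';  -- value from step 1
```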

Postgres Table Slow Performance

We have a Product table in a Postgres DB hosted on Heroku, with 8 GB RAM, 250 GB disk space, and 1000 IOPS allowed.
We have proper indexes on the columns.
Platform
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
We are running a keyword-search query on this table, which holds 2.8 million records. Our search query is too slow: it returns results in about 50 seconds.
Query
SELECT
P .sfid AS prodsfid,
P .image_url__c image,
P .productcode sku,
P .Short_Description__c shortDesc,
P . NAME pname,
P .category__c,
P .price__c price,
P .description,
P .vendor_name__c vname,
P .vendor__c supSfid
FROM
staging.product2 P
JOIN (
SELECT
p1.sfid
FROM
staging.product2 p1
WHERE
p1. NAME ILIKE '%s%'
OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P .sfid = TEMP .sfid)
WHERE
P .status__c = 'Available'
AND LOWER (
P .vendor_shipping_country__c
) = ANY (
VALUES
('us'),
('usa'),
('united states'),
('united states of america')
)
AND P .vendor_catalog_tier__c = ANY (
VALUES
('a1c37000000oljnAAA'),
('a1c37000000oljQAAQ'),
('a1c37000000oljQAAQ'),
('a1c37000000pT7IAAU'),
('a1c37000000omDjAAI'),
('a1c37000000oljMAAQ'),
('a1c37000000oljaAAA'),
('a1c37000000pT7SAAU'),
('a1c0R000000AFcVQAW'),
('a1c0R000000A1HAQA0'),
('a1c0R0000000OpWQAU'),
('a1c0R0000005TZMQA2'),
('a1c37000000oljdAAA'),
('a1c37000000ooTqAAI'),
('a1c37000000omLBAAY'),
('a1c0R0000005N8GQAU')
)
Here is the explain plan:
Nested Loop (cost=31.85..33886.54 rows=3681 width=750)
-> Hash Join (cost=31.77..31433.07 rows=4415 width=750)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.73..31423.67 rows=8830 width=761)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32)
-> Bitmap Heap Scan on product2 p (cost=31.66..1962.32 rows=552 width=780)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.64 rows=1016 width=0)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32)
-> Unique (cost=0.02..0.03 rows=4 width=32)
-> Sort (cost=0.02..0.02 rows=4 width=32)
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.55 rows=1 width=19)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
It's returning around 140,576 records. By the way, we need only the top 5,000 records. Will putting a LIMIT on it help here?
Let me know how to make it fast and what is causing the slowness.
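For reference, the LIMIT variant the question asks about would look like the sketch below (column list shortened); note that without an ORDER BY, which 5,000 rows come back is arbitrary, and a LIMIT only helps if the plan can stop producing rows early:

```sql
SELECT P.sfid AS prodsfid,
       P.image_url__c AS image,
       P.productcode AS sku
FROM staging.product2 P
WHERE P.status__c = 'Available'
  AND (P.name ILIKE '%s%' OR P.productcode ILIKE '%s%')
LIMIT 5000;
```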
EXPLAIN ANALYZE
@RaymondNijland Here is the EXPLAIN ANALYZE:
Nested Loop (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
-> Hash Join (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
-> Bitmap Heap Scan on product2 p (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
Rows Removed by Filter: 645
Heap Blocks: exact=88436
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Unique (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
-> Sort (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms
Another version of the query, with an ORDER BY, but it seems very slow as well (140 seconds):
SELECT
    p.sfid AS prodsfid,
    p.image_url__c AS image,
    p.productcode AS sku,
    p.short_description__c AS shortdesc,
    p.name AS pname,
    p.category__c,
    p.price__c AS price,
    p.description,
    p.vendor_name__c AS vname,
    p.vendor__c AS supsfid
FROM
    staging.product2 p
WHERE
    p.status__c = 'Available'
AND p.vendor_shipping_country__c IN (
    'us',
    'usa',
    'united states',
    'united states of america'
)
AND p.vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'
)
ORDER BY p.productcode asc
LIMIT 5000
Here is the EXPLAIN ANALYZE for this:
Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms
Thanks
Aslam Bari
You might want to consider a GIN or GiST index on your staging.product2 table. Double-sided ILIKEs ('%...%') are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.
See this doc.
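As a sketch (index names are invented here, and this assumes the pg_trgm extension is available), a trigram GIN index that can serve double-sided ILIKE filters on name and productcode might look like:

```sql
-- pg_trgm ships with PostgreSQL contrib; names below are illustrative.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN indexes let the planner use an index for
-- name ILIKE '%s%' / productcode ILIKE '%s%' predicates.
CREATE INDEX product2_name_trgm_idx
    ON staging.product2 USING gin (name gin_trgm_ops);
CREATE INDEX product2_productcode_trgm_idx
    ON staging.product2 USING gin (productcode gin_trgm_ops);
```

After creating them, re-run EXPLAIN ANALYZE to confirm the ILIKE filter is served by a Bitmap Index Scan instead of a per-row filter.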

Optimizing my postgres query

Can I optimize this query, or modify the table structure in order to shorten the execution time? I don't really understand the output of EXPLAIN. Am I missing some index?
EXPLAIN SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19170.15..19188.80 rows=7460 width=12)
Sort Key: (count(*))
-> HashAggregate (cost=18597.03..18690.28 rows=7460 width=12)
-> Nested Loop (cost=10.20..18559.73 rows=7460 width=12)
-> Nested Loop (cost=10.20..14975.36 rows=2452 width=20)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4)
-> Sort (cost=1.03..1.04 rows=1 width=4)
Sort Key: interface.interface_id
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14943.65 rows=2452 width=24)
-> Hash Join (cost=9.16..14133.58 rows=2452 width=8)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11471.93 rows=700793 width=12)
-> Hash (cost=8.81..8.81 rows=28 width=4)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)
Updated with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19201.06..19220.52 rows=7784 width=12) (actual time=51017.162..51046.102 rows=17586 loops=1)
Sort Key: (count(*))
Sort Method: external merge Disk: 632kB
-> HashAggregate (cost=18600.67..18697.97 rows=7784 width=12) (actual time=50935.411..50968.678 rows=17586 loops=1)
-> Nested Loop (cost=10.20..18561.75 rows=7784 width=12) (actual time=42.079..43666.404 rows=3868592 loops=1)
-> Nested Loop (cost=10.20..14975.91 rows=2453 width=20) (actual time=23.678..14609.282 rows=700803 loops=1)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4) (actual time=0.104..0.110 rows=1 loops=1)
-> Sort (cost=1.03..1.04 rows=1 width=4) (actual time=0.100..0.102 rows=1 loops=1)
Sort Key: interface.interface_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4) (actual time=0.038..0.041 rows=1 loops=1)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14944.20 rows=2453 width=24) (actual time=23.550..12553.786 rows=700808 loops=1)
-> Hash Join (cost=9.16..14133.80 rows=2453 width=8) (actual time=18.283..3885.700 rows=700808 loops=1)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11472.08 rows=700808 width=12) (actual time=0.014..1587.106 rows=700808 loops=1)
-> Hash (cost=8.81..8.81 rows=28 width=4) (actual time=18.221..18.221 rows=31 loops=1)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4) (actual time=14.388..18.152 rows=31 loops=1)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=700808)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4) (actual time=0.005..0.022 rows=6 loops=700803)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)
You may try to eliminate the subquery (note that interface has to be added to the FROM list for this to work):
SELECT COUNT(*) AS count,
       q.query_str
FROM click_fact cf,
     query q,
     date_dim dd,
     queries_p_day_mv qpd,
     interface
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND cf.type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id = interface.interface_id
AND interface.lang = 'sv'
GROUP BY q.query_str
ORDER BY count DESC;
Also, if the interface table is big, creating an index on lang may help. An index on queries_p_day_mv (date_dim_id) may help too.
Generally, the first thing to try is to look for Seq Scans and turn them into index scans by creating the appropriate indexes.
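For instance (index names invented), the Seq Scan on queries_p_day_mv in the plan above could be targeted with:

```sql
-- Covers the hash-join key on queries_p_day_mv shown in the plan.
CREATE INDEX idx_queries_p_day_mv_date_dim_id
    ON queries_p_day_mv (date_dim_id);

-- Only worthwhile if interface grows beyond a few rows.
CREATE INDEX idx_interface_lang
    ON interface (lang);
```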
HTH
SELECT COUNT(*) AS count,
q.query_str
FROM date_dim dd
JOIN queries_p_day_mv qpd
ON qpd.date_dim_id = dd.date_dim_id
AND qpd.interface_id IN
(
SELECT interface_id
FROM interface
WHERE lang = 'sv'
)
JOIN query q
ON q.query_id = qpd.query_id
JOIN click_fact cf
ON cf.query_id = q.query_id
AND cf.type = 'S'
WHERE dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
GROUP BY
q.query_str
ORDER BY
count DESC
Create the following indexes (in addition to your existing ones):
queries_p_day_mv (interface_id, date_dim_id)
interface (lang)
click_fact (query_id, type)
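Spelled out as DDL (the index names are illustrative, not prescribed):

```sql
-- Composite index serving both the IN filter and the join key.
CREATE INDEX queries_p_day_mv_interface_date_idx
    ON queries_p_day_mv (interface_id, date_dim_id);

-- Serves the lang = 'sv' subquery.
CREATE INDEX interface_lang_idx
    ON interface (lang);

-- Lets the cf.type = 'S' filter be resolved in the index,
-- avoiding the per-row Filter seen in the plan.
CREATE INDEX click_fact_query_id_type_idx
    ON click_fact (query_id, type);
```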
Could you please post the definitions of your tables?