Optimizing my Postgres query

Can I optimize this query, or modify the table structure in order to shorten the execution time? I don't really understand the output of EXPLAIN. Am I missing some index?
EXPLAIN SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19170.15..19188.80 rows=7460 width=12)
Sort Key: (count(*))
-> HashAggregate (cost=18597.03..18690.28 rows=7460 width=12)
-> Nested Loop (cost=10.20..18559.73 rows=7460 width=12)
-> Nested Loop (cost=10.20..14975.36 rows=2452 width=20)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4)
-> Sort (cost=1.03..1.04 rows=1 width=4)
Sort Key: interface.interface_id
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14943.65 rows=2452 width=24)
-> Hash Join (cost=9.16..14133.58 rows=2452 width=8)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11471.93 rows=700793 width=12)
-> Hash (cost=8.81..8.81 rows=28 width=4)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)
Updated with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19201.06..19220.52 rows=7784 width=12) (actual time=51017.162..51046.102 rows=17586 loops=1)
Sort Key: (count(*))
Sort Method: external merge Disk: 632kB
-> HashAggregate (cost=18600.67..18697.97 rows=7784 width=12) (actual time=50935.411..50968.678 rows=17586 loops=1)
-> Nested Loop (cost=10.20..18561.75 rows=7784 width=12) (actual time=42.079..43666.404 rows=3868592 loops=1)
-> Nested Loop (cost=10.20..14975.91 rows=2453 width=20) (actual time=23.678..14609.282 rows=700803 loops=1)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4) (actual time=0.104..0.110 rows=1 loops=1)
-> Sort (cost=1.03..1.04 rows=1 width=4) (actual time=0.100..0.102 rows=1 loops=1)
Sort Key: interface.interface_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4) (actual time=0.038..0.041 rows=1 loops=1)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14944.20 rows=2453 width=24) (actual time=23.550..12553.786 rows=700808 loops=1)
-> Hash Join (cost=9.16..14133.80 rows=2453 width=8) (actual time=18.283..3885.700 rows=700808 loops=1)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11472.08 rows=700808 width=12) (actual time=0.014..1587.106 rows=700808 loops=1)
-> Hash (cost=8.81..8.81 rows=28 width=4) (actual time=18.221..18.221 rows=31 loops=1)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4) (actual time=14.388..18.152 rows=31 loops=1)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=700808)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4) (actual time=0.005..0.022 rows=6 loops=700803)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)

You may try to eliminate the subquery:
SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd,
interface
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id = interface.interface_id
AND interface.lang = 'sv'
GROUP BY q.query_str
ORDER BY count DESC;
Also, if the interface table is big, creating an index on lang may help. An index in queries_p_day_mv on date_dim_id may help too.
Generally, the first thing to try is to look for Seq Scans and try to make them index scans by creating indexes.
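For instance, based on the plan above, the big Seq Scan on queries_p_day_mv could be targeted like this (index names are illustrative, hedged on your actual schema; run ANALYZE afterwards so the planner picks up fresh statistics):
CREATE INDEX queries_p_day_mv_date_dim_idx ON queries_p_day_mv (date_dim_id);
CREATE INDEX interface_lang_idx ON interface (lang);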
HTH

SELECT COUNT(*) AS count,
q.query_str
FROM date_dim dd
JOIN queries_p_day_mv qpd
ON qpd.date_dim_id = dd.date_dim_id
AND qpd.interface_id IN
(
SELECT interface_id
FROM interface
WHERE lang = 'sv'
)
JOIN query q
ON q.query_id = qpd.query_id
JOIN click_fact cf
ON cf.query_id = q.query_id
AND cf.type = 'S'
WHERE dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
GROUP BY
q.query_str
ORDER BY
count DESC
Create the following indexes (in addition to your existing ones):
queries_p_day_mv (interface_id, date_dim_id)
interface (lang)
click_fact (query_id, type)
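Expressed as DDL, that is something like this (index names are illustrative):
CREATE INDEX queries_p_day_mv_iface_date_idx ON queries_p_day_mv (interface_id, date_dim_id);
CREATE INDEX interface_lang_idx ON interface (lang);
CREATE INDEX click_fact_query_type_idx ON click_fact (query_id, type);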
Could you please post the definitions of your tables?

Related

What effect does a string operation on a column in a filter condition of a PostgreSQL query have on the plan it chooses?

I was working on optimizing a query; by dumb luck I tried something that improved it, but I am unable to explain why.
Below is the query with poor performance:
with ctedata1 as(
select
sum(total_visit_count) as total_visit_count,
sum(sh_visit_count) as sh_visit_count,
sum(ec_visit_count) as ec_visit_count,
sum(total_like_count) as total_like_count,
sum(sh_like_count) as sh_like_count,
sum(ec_like_count) as ec_like_count,
sum(total_order_count) as total_order_count,
sum(sh_order_count) as sh_order_count,
sum(ec_order_count) as ec_order_count,
sum(total_sales_amount) as total_sales_amount,
sum(sh_sales_amount) as sh_sales_amount,
sum(ec_sales_amount) as ec_sales_amount,
sum(ec_order_online_count) as ec_order_online_count,
sum(ec_sales_online_amount) as ec_sales_online_amount,
sum(ec_order_in_store_count) as ec_order_in_store_count,
sum(ec_sales_in_store_amount) as ec_sales_in_store_amount,
table2.im_name,
table2.brand as kpibrand,
table2.id_region as kpiregion
from
table2
where
deleted_at is null
and id_region = any('{1}')
group by
im_name,
kpiregion,
kpibrand ),
ctedata2 as (
select
ctedata1.*,
rank() over (partition by (kpiregion,
kpibrand)
order by
coalesce(ctedata1.total_sales_amount, 0) desc) rank,
count(*) over (partition by (kpiregion,
kpibrand)) as total_count
from
ctedata1 )
select
table1.id_pf_item,
table1.product_id,
table1.color_code,
table1.l1_code,
table1.local_title as product_name,
table1.id_region,
table1.gender,
case
when table1.created_at is null then '1970/01/01 00:00:00'
else table1.created_at
end as created_at,
(
select
count(distinct id_outfit)
from
table3
left join table4 on
table3.id_item = table4.id_item
and table4.deleted_at is null
where
table3.deleted_at is null
and table3.id_pf_item = table1.id_pf_item) as outfit_count,
count(*) over() as total_matched,
case
when table1.v8_im_name = '' then table1.im_name
else table1.v8_im_name
end as im_name,
case
when table1.id_region != 1 then null
else
case
when table1.sales_start_at is null then '1970/01/01 00:00:00'
else table1.sales_start_at
end
end as sales_start_date,
table1.category_ids,
array_to_string(table1.intermediate_category_ids, ','),
table1.image_url,
table1.brand,
table1.pdp_url,
coalesce(ctedata2.total_visit_count, 0) as total_visit_count,
coalesce(ctedata2.sh_visit_count, 0) as sh_visit_count,
coalesce(ctedata2.ec_visit_count, 0) as ec_visit_count,
coalesce(ctedata2.total_like_count, 0) as total_like_count,
coalesce(ctedata2.sh_like_count, 0) as sh_like_count,
coalesce(ctedata2.ec_like_count, 0) as ec_like_count,
coalesce(ctedata2.total_order_count, 0) as total_order_count,
coalesce(ctedata2.sh_order_count, 0) as sh_order_count,
coalesce(ctedata2.ec_order_count, 0) as ec_order_count,
coalesce(ctedata2.total_sales_amount, 0) as total_sales_amount,
coalesce(ctedata2.sh_sales_amount, 0) as sh_sales_amount,
coalesce(ctedata2.ec_sales_amount, 0) as ec_sales_amount,
coalesce(ctedata2.ec_order_online_count, 0) as ec_order_online_count,
coalesce(ctedata2.ec_sales_online_amount, 0) as ec_sales_online_amount,
coalesce(ctedata2.ec_order_in_store_count, 0) as ec_order_in_store_count,
coalesce(ctedata2.ec_sales_in_store_amount, 0) as ec_sales_in_store_amount,
ctedata2.rank,
ctedata2.total_count,
table1.department,
table1.seasons
from
table1
left join ctedata2 on
table1.im_name = ctedata2.im_name
and table1.brand = ctedata2.kpibrand
where
table1.deleted_at is null
and table1.id_region = any('{1}')
and lower(table1.brand) = any('{"brand1","brand2"}')
and 'season1' = any(lower(seasons::text)::text[])
and table1.department = 'Department1'
order by
total_sales_amount desc offset 0
limit 100
The explain output for above query is
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=172326.55..173435.38 rows=1 width=952) (actual time=85664.201..85665.970 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.063..708.069 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.029..308.582 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1012.994..1082.057 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=945.755..1014.656 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=945.747..963.254 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.069..824.841 rows=73121 loops=1)
-> Result (cost=74005.05..75113.88 rows=1 width=952) (actual time=85664.199..85665.950 rows=100 loops=1)
-> Sort (cost=74005.05..74005.05 rows=1 width=944) (actual time=85664.072..85664.089 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=10960.95..74005.04 rows=1 width=944) (actual time=85658.049..85661.393 rows=3151 loops=1)
-> Nested Loop Left Join (cost=10960.95..74005.02 rows=1 width=927) (actual time=1075.219..85643.595 rows=3151 loops=1)
Join Filter: (((table1.im_name)::text = ctedata2.im_name) AND ((table1.brand)::text = ctedata2.kpibrand))
Rows Removed by Join Filter: 230402986
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND (lower((brand)::text) = ANY ('{brand1, brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=38.307..38.307 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=0.325..21.721 rows=73121 loops=3151)
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.007..0.016 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.005..0.008 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 1.023 ms
Execution time: 85669.512 ms
I changed
and lower(table1.brand) = any('{"brand1","brand2"}')
in the query to
and table1.brand = any('{"Brand1","Brand2"}')
and the plan changed to
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=173137.44..188661.06 rows=14 width=952) (actual time=1444.123..1445.653 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.040..769.982 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_item_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.021..350.774 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1088.905..1153.749 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=1020.017..1089.117 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=1020.011..1037.170 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.044..891.653 rows=73121 loops=1)
-> Result (cost=74815.94..90339.56 rows=14 width=952) (actual time=1444.121..1445.635 rows=100 loops=1)
-> Sort (cost=74815.94..74815.98 rows=14 width=944) (actual time=1444.053..1444.065 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=72207.31..74815.68 rows=14 width=944) (actual time=1439.128..1441.885 rows=3151 loops=1)
-> Hash Right Join (cost=72207.31..74815.40 rows=14 width=927) (actual time=1307.531..1437.246 rows=3151 loops=1)
Hash Cond: ((ctedata2.im_name = (table1.im_name)::text) AND (ctedata2.kpibrand = (table1.brand)::text))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=1088.911..1209.646 rows=73121 loops=1)
-> Hash (cost=72207.10..72207.10 rows=14 width=399) (actual time=216.850..216.850 rows=3151 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1249kB
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND ((brand)::text = ANY ('{Brand1, Brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=34.849..34.849 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.006..0.014 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.004..0.006 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 0.760 ms
Execution time: 1448.848 ms
My Observations
The join strategy for table1 left join ctedata2 changes after the lower() function is avoided. The strategy changes from nested loop left join to hash right join.
The CTE Scan node on ctedata2 is executed only once in the better performing query.
Postgres Version
9.6
Please help me to understand this behaviour. I will supply additional info if required.
It is almost not worthwhile taking a deep dive into the inner workings of a nearly-obsolete version. That time and energy is probably better spent jollying along an upgrade.
But the problem is pretty plain. Your scan on table1 is estimated dreadfully, although 14 times less dreadful in the better plan.
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Your use of lower(), apparently without reason, surely contributes to the poor estimation. And dynamically converting a string into an array certainly doesn't help either. If it were stored as a real array in the first place, the statistics system could get its hands on it and generate more reasonable estimates.
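If the lower() comparison has to stay, one hedged option is an expression index; besides making the filter indexable, it causes ANALYZE to gather statistics on the expression itself, which should improve the row estimate (index name illustrative):
CREATE INDEX table1_lower_brand_idx ON table1 (lower(brand));
ANALYZE table1;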

Query using WHERE on a fixed value performs badly. Why is this?

I use a simple query with a WHERE on a fixed value. This query does a left join on a temporary view. For some reason this query is performing very badly. I guess that the view is being executed for every row and not only for the selected rows. When I replace the fixed value with a value from a temporary table, the query performs MUCH better (about 15-20 times faster). Why is this?
I use PostgreSQL version 9.2.15.
I added a temporary table 'wrkovz91_selecties' with only one record to pass the selection value of the WHERE instruction to the query.
The view 'view_wrk_012_000001' that is joined in the query is pretty 'heavy', because it contains another nested view ('view_wrk_013_000006').
First create the temporary table and add one record:
CREATE TEMPORARY TABLE WRKOVZ91_SELECTIES
(
NR DECIMAL(006) NOT NULL,
KLANTNR DECIMAL(008),
CONSTRAINT WRKOVZ91_SELECTIES_KEY_001 PRIMARY KEY (NR)
);
INSERT INTO WRKOVZ91_SELECTIES
(NR, KLANTNR) VALUES (1, 1);
Then create the temporary view (with a nested view):
CREATE TEMPORARY VIEW view_wrk_013_000006 AS
SELECT vrr.klantnr AS klantnr,
UPPER(vrr.artnr) AS artnr,
vrr.partijnr AS partijnr,
SUM(vrr.kollo) AS colli_op_voorraad
FROM voorraad vrr
WHERE (vrr.status = 'A'::text)
GROUP BY 1,
2,
3;
CREATE TEMPORARY VIEW view_wrk_012_000001 AS
SELECT COALESCE(wrd.klantnr,0) AS klantnr,
COALESCE(wrd.wrkopdrnr,0) AS wrkopdrnr,
MIN(COALESCE(wrd.recept_benodigd_colli,0) *
COALESCE(wrk.te_produceren,0)) AS vrije_voorraad_ind,
MIN(COALESCE(v40.colli_op_voorraad,0)) AS hulpveld
FROM wrkopdr wrk
LEFT JOIN wrkopdrd wrd ON wrd.klantnr = wrk.klantnr
AND wrd.wrkopdrnr = wrk.wrkopdrnr
LEFT JOIN view_wrk_013_000006 v40 ON v40.klantnr =
COALESCE(wrd.klantnr_grondstof,0)
AND v40.artnr = COALESCE(wrd.artnr_grondstof,'')
AND v40.partijnr = COALESCE(wrd.partijnr_grondstof,0)
LEFT JOIN artikel art ON art.klantnr = COALESCE(wrd.klantnr_grondstof,0)
AND art.artnr = COALESCE(wrd.artnr_grondstof,'')
WHERE wrk.status = 'A'::text
AND art.voorraadhoudend_jn = 'J'::text
GROUP BY 1,
2;
The query that performs badly is simple:
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = 1
LIMIT 9999;
The query that performs MUCH better is (note the small difference):
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = S02.KLANTNR
LIMIT 9999;
I cannot understand why the slow query performs so badly. With my test data it takes about 219 seconds; the fast query takes only 12 seconds. The only difference between the two queries is the selection value in the WHERE.
Does anyone have an explanation for this behaviour?
In addition, the output of EXPLAIN ANALYZE for the slow query is:
Limit (cost=19460.18..19972.73 rows=9999 width=18) (actual time=221573.343..221585.953 rows=9999 loops=1)
-> Hash Left Join (cost=19460.18..20913.07 rows=28344 width=18) (actual time=221573.341..221583.701 rows=9999 loops=1)
Hash Cond: ((wrk.klantnr = v38.klantnr) AND (wrk.wrkopdrnr = v38.wrkopdrnr))
-> Seq Scan on wrkopdr wrk (cost=0.00..1240.30 rows=28344 width=18) (actual time=0.055..5.490 rows=9999 loops=1)
Filter: (klantnr = 1::numeric)
-> Hash (cost=19460.17..19460.17 rows=1 width=64) (actual time=221573.254..221573.254 rows=2621 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 113kB
-> Subquery Scan on v38 (cost=19460.15..19460.17 rows=1 width=64) (actual time=221570.429..221572.158 rows=2621 loops=1)
-> HashAggregate (cost=19460.15..19460.16 rows=1 width=52) (actual time=221570.429..221571.499 rows=2621 loops=1)
-> Nested Loop Left Join (cost=14049.90..19460.14 rows=1 width=52) (actual time=225.848..221495.813 rows=6801 loops=1)
Join Filter: ((vrr.klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND ((upper(vrr.artnr)) = COALESCE(wrd.artnr_grondstof, ''::text)) AND (vrr.partijnr = COALESCE(wrd.partijnr_grondstof, 0::numeric)))
Rows Removed by Join Filter: 308209258
-> Nested Loop (cost=1076.77..6011.60 rows=1 width=43) (actual time=9.506..587.824 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5828.70 rows=69 width=43) (actual time=9.428..204.601 rows=7861 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
Filter: (COALESCE(wrd.klantnr, 0::numeric) = 1::numeric)
Rows Removed by Filter: 1
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.013..65.472 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=9.324..9.324 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.850..7.843 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.763..0.763 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.037..0.043 rows=1 loops=7861)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=0.027..19.661 rows=45319 loops=6801)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.456..122.855 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 221587.386 ms
(33 rows)
The output of EXPLAIN ANALYZE for the fast query is:
Limit (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.371..12739.474 rows=9999 loops=1)
-> Nested Loop Left Join (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.368..12736.035 rows=9999 loops=1)
Join Filter: ((v38.klantnr = wrk.klantnr) AND (v38.wrkopdrnr = wrk.wrkopdrnr))
Rows Removed by Join Filter: 26206807
-> Nested Loop (cost=0.00..1532.01 rows=142 width=18) (actual time=0.055..18.652 rows=9999 loops=1)
Join Filter: (wrk.klantnr = s02.klantnr)
-> Index Scan using wrkovz91_selecties_key_001 on wrkovz91_selecties s02 (cost=0.00..8.27 rows=1 width=14) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (nr = 1::numeric)
-> Seq Scan on wrkopdr wrk (cost=0.00..1169.44 rows=28344 width=18) (actual time=0.026..11.905 rows=9999 loops=1)
-> Materialize (cost=56294.55..56296.25 rows=68 width=64) (actual time=0.044..0.380 rows=2621 loops=9999)
-> Subquery Scan on v38 (cost=56294.55..56295.91 rows=68 width=64) (actual time=441.797..443.503 rows=2621 loops=1)
-> HashAggregate (cost=56294.55..56295.23 rows=68 width=52) (actual time=441.795..442.848 rows=2621 loops=1)
-> Hash Left Join (cost=14525.30..56293.70 rows=68 width=52) (actual time=255.847..433.386 rows=6801 loops=1)
Hash Cond: ((COALESCE(wrd.klantnr_grondstof, 0::numeric) = v40.klantnr) AND (COALESCE(wrd.artnr_grondstof, ''::text) = v40.artnr) AND (COALESCE(wrd.partijnr_grondstof, 0::numeric) = v40.partijnr))
-> Nested Loop (cost=1076.77..42541.70 rows=68 width=43) (actual time=10.356..171.502 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5794.04 rows=13863 width=43) (actual time=10.286..91.471 rows=7862 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.014..38.276 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=10.179..10.179 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.835..8.667 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.748..0.748 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.009..0.009 rows=1 loops=7862)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> Hash (cost=13240.55..13240.55 rows=11885 width=74) (actual time=245.430..245.430 rows=45319 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 2645kB
-> Subquery Scan on v40 (cost=12973.14..13240.55 rows=11885 width=74) (actual time=186.156..222.042 rows=45319 loops=1)
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=186.154..209.633 rows=45319 loops=1)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.453..126.361 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 12742.125 ms
(36 rows)

Postgres Table Slow Performance

We have a Product table in a Postgres DB. This is hosted on Heroku. We have 8 GB RAM and 250 GB disk space, with 1,000 IOPS allowed.
We have proper indexes on the columns.
Platform
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
We are running a keyword search query on this table, which has 2.8 million records. Our search query is too slow: it gives us results in about 50 seconds.
Query
SELECT
P.sfid AS prodsfid,
P.image_url__c image,
P.productcode sku,
P.Short_Description__c shortDesc,
P.name pname,
P.category__c,
P.price__c price,
P.description,
P.vendor_name__c vname,
P.vendor__c supSfid
FROM
staging.product2 P
JOIN (
SELECT
p1.sfid
FROM
staging.product2 p1
WHERE
p1.name ILIKE '%s%'
OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P.sfid = TEMP.sfid)
WHERE
P.status__c = 'Available'
AND LOWER(P.vendor_shipping_country__c) = ANY (
VALUES
('us'),
('usa'),
('united states'),
('united states of america')
)
AND P.vendor_catalog_tier__c = ANY (
VALUES
('a1c37000000oljnAAA'),
('a1c37000000oljQAAQ'),
('a1c37000000oljQAAQ'),
('a1c37000000pT7IAAU'),
('a1c37000000omDjAAI'),
('a1c37000000oljMAAQ'),
('a1c37000000oljaAAA'),
('a1c37000000pT7SAAU'),
('a1c0R000000AFcVQAW'),
('a1c0R000000A1HAQA0'),
('a1c0R0000000OpWQAU'),
('a1c0R0000005TZMQA2'),
('a1c37000000oljdAAA'),
('a1c37000000ooTqAAI'),
('a1c37000000omLBAAY'),
('a1c0R0000005N8GQAU')
)
Here is the explain plan:
Nested Loop (cost=31.85..33886.54 rows=3681 width=750)
-> Hash Join (cost=31.77..31433.07 rows=4415 width=750)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.73..31423.67 rows=8830 width=761)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32)
-> Bitmap Heap Scan on product2 p (cost=31.66..1962.32 rows=552 width=780)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.64 rows=1016 width=0)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32)
-> Unique (cost=0.02..0.03 rows=4 width=32)
-> Sort (cost=0.02..0.02 rows=4 width=32)
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.55 rows=1 width=19)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
It returns around 140,576 records. By the way, we need only the top 5,000 records. Will putting a LIMIT help here?
Let me know how to make it fast and what is causing the slowness.
EXPLAIN ANALYZE
@RaymondNijland Here is the EXPLAIN ANALYZE:
Nested Loop (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
-> Hash Join (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
-> Bitmap Heap Scan on product2 p (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
Rows Removed by Filter: 645
Heap Blocks: exact=88436
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Unique (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
-> Sort (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms
Another version of the query, with an ORDER BY, but it seems very slow as well (140 seconds):
SELECT
P.sfid AS prodsfid,
P.image_url__c image,
P.productcode sku,
P.Short_Description__c shortDesc,
P.name pname,
P.category__c,
P.price__c price,
P.description,
P.vendor_name__c vname,
P.vendor__c supSfid
FROM
staging.product2 P
WHERE
P.status__c = 'Available'
AND P.vendor_shipping_country__c IN (
'us',
'usa',
'united states',
'united states of america'
)
AND P.vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'
)
ORDER BY p.productcode asc
LIMIT 5000
Here is the EXPLAIN ANALYZE for this:
Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms
Thanks
Aslam Bari
You might want to consider a GIN or GiST index on your staging.product2 table. Double-sided ILIKEs are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.
See this doc.
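A minimal sketch with the pg_trgm extension (assuming you can install extensions on your plan; index names are illustrative):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX product2_name_trgm_idx ON staging.product2 USING gin (name gin_trgm_ops);
CREATE INDEX product2_code_trgm_idx ON staging.product2 USING gin (productcode gin_trgm_ops);
Note that a single-character pattern like '%s%' contains no complete trigram, so such an index only pays off once the search term is at least a few characters long.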

query with IN expression running slower than query with OR

I have two queries: one uses an OR expression and runs very fast; the other is similar but uses an IN expression instead of OR and runs very slowly. I would appreciate it if you could let me know how to make the query using IN as fast as the one using OR. The table has 15 million records.
SELECT e.id
FROM events e,
resources r
WHERE e.resource_id = r.id
AND resource_type_id IN (19872817,
282)
ORDER BY occurrence_date DESC LIMIT 100
Limit (cost=0.85..228363.80 rows=100 width=12) (actual time=238.668..57470.017 rows=19 loops=1)
-> Nested Loop (cost=0.85..26211499.28 rows=11478 width=12) (actual time=238.667..57470.010 rows=19 loops=1)
Join Filter: (e.resource_id = r.id)
Rows Removed by Join Filter: 507548495
-> Index Scan using eventoccurrencedateindex on events e (cost=0.43..603333.83 rows=15380258 width=16) (actual time=0.023..2798.538 rows=15380258 loops=1)
-> Materialize (cost=0.42..36.16 rows=111 width=4) (actual time=0.000..0.001 rows=33 loops=15380258)
-> Index Scan using resources_type_fk_index on resources r (cost=0.42..35.60 rows=111 width=4) (actual time=0.014..0.107 rows=33 loops=1)
Index Cond: (resource_type_id = ANY ('{19872817,282}'::integer[]))
Total runtime: 57470.057 ms
SELECT e.id
FROM events e,
resources r
WHERE e.resource_id = r.id
AND (resource_type_id = '19872817' OR resource_type_id = '282')
ORDER BY occurrence_date DESC LIMIT 100
Limit (cost=10.17..14.22 rows=100 width=12) (actual time=0.060..0.181 rows=100 loops=1)
-> Nested Loop (cost=10.17..34747856.23 rows=858030913 width=12) (actual time=0.059..0.167 rows=100 loops=1)
Join Filter: (((e.resource_id = r.id) AND (r.resource_type_id = 19872817)) OR (r.resource_type_id = 282))
-> Index Scan using eventoccurrencedateindex on events e (cost=0.43..603333.83 rows=15380258 width=16) (actual time=0.018..0.019 rows=4 loops=1)
-> Materialize (cost=9.74..349.92 rows=111 width=8) (actual time=0.009..0.023 rows=25 loops=4)
-> Bitmap Heap Scan on resources r (cost=9.74..349.36 rows=111 width=8) (actual time=0.034..0.081 rows=33 loops=1)
Recheck Cond: ((resource_type_id = 19872817) OR (resource_type_id = 282))
-> BitmapOr (cost=9.74..9.74 rows=111 width=0) (actual time=0.023..0.023 rows=0 loops=1)
-> Bitmap Index Scan on resources_type_fk_index (cost=0.00..4.84 rows=56 width=0) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: (resource_type_id = 19872817)
-> Bitmap Index Scan on resources_type_fk_index (cost=0.00..4.84 rows=56 width=0) (actual time=0.014..0.014 rows=33 loops=1)
Index Cond: (resource_type_id = 282)" "Total runtime: 0.242 ms
This is strange in the OR version:
Join Filter: (
((e.resource_id = r.id) AND (r.resource_type_id = 19872817))
OR
(r.resource_type_id = 282)
)
It applies e.resource_id = r.id AND r.resource_type_id = 19872817 first and then OR r.resource_type_id = 282, which is wrong. Are you sure you issued the correct condition in that query? Notice that there must be parentheses wrapping the OR:
e.resource_id = r.id
AND
(r.resource_type_id = 19872817 OR r.resource_type_id = 282)
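For reference, the OR query with the correctly placed parentheses would read like this (a sketch in the same style as the original):
SELECT e.id
FROM events e,
resources r
WHERE e.resource_id = r.id
AND (r.resource_type_id = '19872817' OR r.resource_type_id = '282')
ORDER BY occurrence_date DESC LIMIT 100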

How to create a Postgres date index properly?

I'm using Django ORM and postgresql.
ORM creates a query:
SELECT
(date_part('month', stat_date)) AS "stat_date",
"direct_keywordstat"."banner_id",
SUM("direct_keywordstat"."total") AS "total",
SUM("direct_keywordstat"."clicks") AS "clicks",
SUM("direct_keywordstat"."shows") AS "shows"
FROM "direct_keywordstat"
LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id")
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id")
WHERE (
"direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
"direct_keywordstat"."keyword_id" IN (
SELECT U0."id"
FROM "direct_keyword" U0
INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id")
INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id")
INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
WHERE (
U0."deleted" = E'False' AND
U0."low_ctr" = E'False' AND
U4."status_active" = E'True' AND
U0."banner_id" IN (
SELECT U0."banner_ptr_id"
FROM "direct_banner" U0
INNER JOIN "platforms_banner" U1
ON (U0."banner_ptr_id" = U1."id")
WHERE (
U0."status_show" = E'True' AND
U1."campaign_id" = E'174' )
)
)
)
)
GROUP BY
"direct_keywordstat"."banner_id",
(date_part('month', stat_date)),
"platforms_banner"."title", date_trunc('month', stat_date)
ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC
The problem is that direct_keywordstat contains 3 million+ records, so the query executes in ~15 seconds.
I've tried creating indexes like
CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);
But EXPLAIN ANALYZE shows that the index is not used.
Table schema:
\d direct_keywordstat
Table "public.direct_keywordstat"
Column | Type | Modifiers
-------------+------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('direct_keywordstat_id_seq'::regclass)
keyword_id | integer | not null
banner_id | integer | not null
campaign_id | integer | not null
stat_date | date | not null
region_id | integer | not null
place_type | character varying(30) |
place_name | character varying(100) |
clicks | integer | not null default 0
shows | integer | not null default 0
total | numeric(19,6) | not null
How can I create a useful index?
Or maybe there's a chance to optimize this query some other way?
The thing is, if the WHERE looks like
"direct_keywordstat".clicks BETWEEN 10 AND 3000000
the query executes in 0.8 seconds.
Do you have indexes on these columns:
direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date
Both columns in direct_keywordstat could be combined in a single index; just test it.
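For example (the index name is illustrative):
CREATE INDEX direct_keywordstat_banner_date_idx ON direct_keywordstat (banner_id, stat_date);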
This is also a problem:
Sort Method: external merge Disk: 20600kB
Check your settings for work_mem; you need at least 20MB for this query.
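For example, you can raise it for the current session before running the query (32MB is an illustrative value, chosen to clear the ~20MB spill shown above):
SET work_mem = '32MB';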
Here it is:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
-> Sort (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
Sort Method: external merge Disk: 20600kB
-> Hash Join (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
Hash Cond: (direct_keywordstat.keyword_id = u0.id)
-> Hash Left Join (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
-> Seq Scan on direct_keywordstat (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
-> Hash (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
-> Hash Left Join (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
-> Seq Scan on direct_banner (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
-> Hash (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
-> Seq Scan on platforms_banner (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
-> Hash (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
-> HashAggregate (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
-> Hash Join (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
Hash Cond: (u0.banner_id = u1.banner_ptr_id)
-> Nested Loop (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
-> HashAggregate (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
-> Nested Loop (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
-> Index Scan using platforms_banner_campaign_id on platforms_banner u1 (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (campaign_id = 174)
-> Index Scan using direct_banner_pkey on direct_banner u0 (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
Index Cond: (u0.banner_ptr_id = u1.id)
Filter: u0.status_show
-> Index Scan using direct_keyword_banner_id on direct_keyword u0 (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
Index Cond: (u0.banner_id = u0.banner_ptr_id)
Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
-> Hash (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
-> Hash Join (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
Hash Cond: (u1.banner_ptr_id = u2.id)
-> Seq Scan on direct_banner u1 (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
-> Hash (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
-> Hash Join (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
Hash Cond: (u2.campaign_id = u3.id)
-> Seq Scan on platforms_banner u2 (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
-> Hash (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
-> Hash Join (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
Hash Cond: (u3.id = u4.campaign_ptr_id)
-> Seq Scan on platforms_campaign u3 (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
-> Hash (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
-> Seq Scan on direct_campaign u4 (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
Filter: status_active
Total runtime: 23436.715 ms
(47 rows)