Optimizing PostgreSQL Query - postgresql

I need help optimizing this PostgreSQL Query. I've already created some indexes but the only index being used is "cp_campaign_task_ap_index".
EXPLAIN ANALYZE SELECT * FROM campaign_phones
INNER JOIN campaigns ON campaigns.id = campaign_phones.campaign_id
INNER JOIN campaign_ports ON campaign_ports.campaign_id = campaigns.id
WHERE campaign_phones.task_id IS NULL
AND campaign_phones.assigned_port = 2
AND (campaigns.auto_start IS TRUE)
AND (campaigns.starts_at_date::date <= '2018-07-08'
AND campaigns.ends_at_date::date >= '2018-07-08')
AND (campaign_ports.gateway_port_id = 611)
AND (campaign_ports.op_mode != 1)
ORDER BY campaigns.last_sent_at ASC NULLS FIRST LIMIT 1;`
The output of the command is:
Limit (cost=26031.86..26031.87 rows=1 width=475) (actual time=2335.421..2335.421 rows=1 loops=1)
-> Sort (cost=26031.86..26047.26 rows=6158 width=475) (actual time=2335.419..2335.419 rows=1 loops=1)
Sort Key: campaigns.last_sent_at NULLS FIRST
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=136.10..26001.07 rows=6158 width=475) (actual time=1.176..1510.276 rows=36666 loops=1)
Join Filter: (campaigns.id = campaign_phones.campaign_id)
-> Nested Loop (cost=0.00..28.28 rows=2 width=218) (actual time=0.163..0.435 rows=4 loops=1)
Join Filter: (campaigns.id = campaign_ports.campaign_id)
Rows Removed by Join Filter: 113
-> Seq Scan on campaign_ports (cost=0.00..21.48 rows=9 width=55) (actual time=0.017..0.318 rows=9 loops=1)
Filter: ((op_mode <> 1) AND (gateway_port_id = 611))
Rows Removed by Filter: 823
-> Materialize (cost=0.00..5.74 rows=8 width=163) (actual time=0.001..0.008 rows=13 loops=9)
-> Seq Scan on campaigns (cost=0.00..5.70 rows=8 width=163) (actual time=0.011..0.050 rows=13 loops=1)
Filter: ((auto_start IS TRUE) AND ((starts_at_date)::date <= '2018-07-08'::date) AND ((ends_at_date)::date >= '2018-07-08'::date))
Rows Removed by Filter: 22
-> Bitmap Heap Scan on campaign_phones (cost=136.10..12931.82 rows=4366 width=249) (actual time=43.079..302.895 rows=9166 loops=4)
Recheck Cond: ((campaign_id = campaign_ports.campaign_id) AND (task_id IS NULL) AND (assigned_port = 2))
Heap Blocks: exact=6686
-> Bitmap Index Scan on cp_campaign_task_ap_index (cost=0.00..135.01 rows=4366 width=0) (actual time=8.884..8.884 rows=9167 loops=4)
Index Cond: ((campaign_id = campaign_ports.campaign_id) AND (task_id IS NULL) AND (assigned_port = 2))
Planning time: 1.115 ms
Execution time: 2335.563 ms
The "campaign_phones" relation could have many rows, perhaps a million.
I don't know where to start optimizing, perhaps creating indexes or changing query structure.
Thanks.

Related

How does a string operation on a column in a filter condition of a Postgresql query have on the plan it chooses

I was working on optimising a query, with dumb luck I tried something and it improved the query but I am unable to explain why.
Below is the query with poor performance
with ctedata1 as(
select
sum(total_visit_count) as total_visit_count,
sum(sh_visit_count) as sh_visit_count,
sum(ec_visit_count) as ec_visit_count,
sum(total_like_count) as total_like_count,
sum(sh_like_count) as sh_like_count,
sum(ec_like_count) as ec_like_count,
sum(total_order_count) as total_order_count,
sum(sh_order_count) as sh_order_count,
sum(ec_order_count) as ec_order_count,
sum(total_sales_amount) as total_sales_amount,
sum(sh_sales_amount) as sh_sales_amount,
sum(ec_sales_amount) as ec_sales_amount,
sum(ec_order_online_count) as ec_order_online_count,
sum(ec_sales_online_amount) as ec_sales_online_amount,
sum(ec_order_in_store_count) as ec_order_in_store_count,
sum(ec_sales_in_store_amount) as ec_sales_in_store_amount,
table2.im_name,
table2.brand as kpibrand,
table2.id_region as kpiregion
from
table2
where
deleted_at is null
and id_region = any('{1}')
group by
im_name,
kpiregion,
kpibrand ),
ctedata2 as (
select
ctedata1.*,
rank() over (partition by (kpiregion,
kpibrand)
order by
coalesce(ctedata1.total_sales_amount, 0) desc) rank,
count(*) over (partition by (kpiregion,
kpibrand)) as total_count
from
ctedata1 )
select
table1.id_pf_item,
table1.product_id,
table1.color_code,
table1.l1_code,
table1.local_title as product_name,
table1.id_region,
table1.gender,
case
when table1.created_at is null then '1970/01/01 00:00:00'
else table1.created_at
end as created_at,
(
select
count(distinct id_outfit)
from
table3
left join table4 on
table3.id_item = table4.id_item
and table4.deleted_at is null
where
table3.deleted_at is null
and table3.id_pf_item = table1.id_pf_item) as outfit_count,
count(*) over() as total_matched,
case
when table1.v8_im_name = '' then table1.im_name
else table1.v8_im_name
end as im_name,
case
when table1.id_region != 1 then null
else
case
when table1.sales_start_at is null then '1970/01/01 00:00:00'
else table1.sales_start_at
end
end as sales_start_date,
table1.category_ids,
array_to_string(table1.intermediate_category_ids, ','),
table1.image_url,
table1.brand,
table1.pdp_url,
coalesce(ctedata2.total_visit_count, 0) as total_visit_count,
coalesce(ctedata2.sh_visit_count, 0) as sh_visit_count,
coalesce(ctedata2.ec_visit_count, 0) as ec_visit_count,
coalesce(ctedata2.total_like_count, 0) as total_like_count,
coalesce(ctedata2.sh_like_count, 0) as sh_like_count,
coalesce(ctedata2.ec_like_count, 0) as ec_like_count,
coalesce(ctedata2.total_order_count, 0) as total_order_count,
coalesce(ctedata2.sh_order_count, 0) as sh_order_count,
coalesce(ctedata2.ec_order_count, 0) as ec_order_count,
coalesce(ctedata2.total_sales_amount, 0) as total_sales_amount,
coalesce(ctedata2.sh_sales_amount, 0) as sh_sales_amount,
coalesce(ctedata2.ec_sales_amount, 0) as ec_sales_amount,
coalesce(ctedata2.ec_order_online_count, 0) as ec_order_online_count,
coalesce(ctedata2.ec_sales_online_amount, 0) as ec_sales_online_amount,
coalesce(ctedata2.ec_order_in_store_count, 0) as ec_order_in_store_count,
coalesce(ctedata2.ec_sales_in_store_amount, 0) as ec_sales_in_store_amount,
ctedata2.rank,
ctedata2.total_count,
table1.department,
table1.seasons
from
table1
left join ctedata2 on
table1.im_name = ctedata2.im_name
and table1.brand = ctedata2.kpibrand
where
table1.deleted_at is null
and table1.id_region = any('{1}')
and lower(table1.brand) = any('{"brand1","brand2"}')
and 'season1' = any(lower(seasons::text)::text[])
and table1.department = 'Department1'
order by
total_sales_amount desc offset 0
limit 100
The explain output for above query is
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=172326.55..173435.38 rows=1 width=952) (actual time=85664.201..85665.970 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.063..708.069 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.029..308.582 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1012.994..1082.057 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=945.755..1014.656 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=945.747..963.254 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.069..824.841 rows=73121 loops=1)
-> Result (cost=74005.05..75113.88 rows=1 width=952) (actual time=85664.199..85665.950 rows=100 loops=1)
-> Sort (cost=74005.05..74005.05 rows=1 width=944) (actual time=85664.072..85664.089 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=10960.95..74005.04 rows=1 width=944) (actual time=85658.049..85661.393 rows=3151 loops=1)
-> Nested Loop Left Join (cost=10960.95..74005.02 rows=1 width=927) (actual time=1075.219..85643.595 rows=3151 loops=1)
Join Filter: (((table1.im_name)::text = ctedata2.im_name) AND ((table1.brand)::text = ctedata2.kpibrand))
Rows Removed by Join Filter: 230402986
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND (lower((brand)::text) = ANY ('{brand1, brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=38.307..38.307 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=0.325..21.721 rows=73121 loops=3151)
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.007..0.016 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.005..0.008 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 1.023 ms
Execution time: 85669.512 ms
I changed
and lower(table1.brand) = any('{"brand1","brand2"}')
in the query to
and table1.brand = any('{"Brand1","Brand2"}')
and the plan changed to
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=173137.44..188661.06 rows=14 width=952) (actual time=1444.123..1445.653 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.040..769.982 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_item_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.021..350.774 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1088.905..1153.749 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=1020.017..1089.117 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=1020.011..1037.170 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.044..891.653 rows=73121 loops=1)
-> Result (cost=74815.94..90339.56 rows=14 width=952) (actual time=1444.121..1445.635 rows=100 loops=1)
-> Sort (cost=74815.94..74815.98 rows=14 width=944) (actual time=1444.053..1444.065 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=72207.31..74815.68 rows=14 width=944) (actual time=1439.128..1441.885 rows=3151 loops=1)
-> Hash Right Join (cost=72207.31..74815.40 rows=14 width=927) (actual time=1307.531..1437.246 rows=3151 loops=1)
Hash Cond: ((ctedata2.im_name = (table1.im_name)::text) AND (ctedata2.kpibrand = (table1.brand)::text))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=1088.911..1209.646 rows=73121 loops=1)
-> Hash (cost=72207.10..72207.10 rows=14 width=399) (actual time=216.850..216.850 rows=3151 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1249kB
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND ((brand)::text = ANY ('{Brand1, Brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=34.849..34.849 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.006..0.014 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.004..0.006 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 0.760 ms
Execution time: 1448.848 ms
My Observation
The join strategy for table1 left join ctedata2 changes after the lower() function is avoided. The strategy changes from nested loop left join to hash right join.
The CTE Scan node on ctedata2 is executed only once in the better performing query.
Postgres Version
9.6
Please help me to understand this behaviour. I will supply additional info if required.
It is almost not worthwhile taking a deep dive into the inner workings of a nearly-obsolete version. That time and energy is probably better spent jollying along an upgrade.
But the problem is pretty plain. Your scan on table1 is estimated dreadfully, although 14 times less dreadful in the better plan.
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Your use of lower(), apparently without reason, surely contributes to the poor estimation. And dynamically converting a string into an array certainly doesn't help either. If it were stored as a real array in the first place, the statistics system could get its hands on it and generate more reasonable estimates.

Query using WHERE on a fixed value performs badly. Why is this?

I use a simple query with a WHERE on a fixed value. This query does a left join on a temporary view. For some reason this query is performing very badly. I guess that the view is being executed for every row and not only for the selected rows. When I replace the fixed value with a value from a temporary table, the query performs MUCH better (about 15-20 times faster). Why is this?
I use postgresql version 9.2.15.
I added a temporary table 'wrkovz91-selecties' with only 1 record to pass the selection-value of the WHERE instruction to the query.
The view 'view_wrk_012_000001' that is joined in the query is pretty 'heavy', because it contains a nested other view ('view_wrk_013_000006').
First create the temporary table and add one record:
CREATE TEMPORARY TABLE WRKOVZ91_SELECTIES
(
NR DECIMAL(006) NOT NULL,
KLANTNR DECIMAL(008),
CONSTRAINT WRKOVZ91_SELECTIES_KEY_001 PRIMARY KEY (NR)
);
INSERT INTO WRKOVZ91_SELECTIES
(NR, KLANTNR) VALUES (1, 1);
Then create the temporary view (with a nested view):
CREATE TEMPORARY VIEW view_wrk_013_000006 AS
SELECT vrr.klantnr AS klantnr,
UPPER(vrr.artnr) AS artnr,
vrr.partijnr AS partijnr,
SUM(vrr.kollo) AS colli_op_voorraad
FROM voorraad vrr
WHERE (vrr.status = 'A'::text)
GROUP BY 1,
2,
3;
CREATE TEMPORARY VIEW view_wrk_012_000001 AS
SELECT COALESCE(wrd.klantnr,0) AS klantnr,
COALESCE(wrd.wrkopdrnr,0) AS wrkopdrnr,
MIN(COALESCE(wrd.recept_benodigd_colli,0) *
COALESCE(wrk.te_produceren,0)) AS vrije_voorraad_ind,
MIN(COALESCE(v40.colli_op_voorraad,0)) AS hulpveld
FROM wrkopdr wrk
LEFT JOIN wrkopdrd wrd ON wrd.klantnr = wrk.klantnr
AND wrd.wrkopdrnr = wrk.wrkopdrnr
LEFT JOIN view_wrk_013_000006 v40 ON v40.klantnr =
COALESCE(wrd.klantnr_grondstof,0)
AND v40.artnr = COALESCE(wrd.artnr_grondstof,'')
AND v40.partijnr = COALESCE(wrd.partijnr_grondstof,0)
LEFT JOIN artikel art ON art.klantnr = COALESCE(wrd.klantnr_grondstof,0)
AND art.artnr = COALESCE(wrd.artnr_grondstof,'')
WHERE wrk.status = 'A'::text
AND art.voorraadhoudend_jn = 'J'::text
GROUP BY 1,
2;
The query that performs badly is simple:
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = 1
LIMIT 9999;
The query that performs MUCH better is (note the small difference):
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = S02.KLANTNR
LIMIT 9999;
I cannot understand why the slow query is performing so badly. It takes in my test-data about 219 secondes. The fast query is taking only 12 secondes. The only difference between the 2 queries is the selection-value in the WHERE.
Does anyone have an explanation for this behaviour?
In addition, the output of the explain analyse of the slow query is:
Limit (cost=19460.18..19972.73 rows=9999 width=18) (actual time=221573.343..221585.953 rows=9999 loops=1)
-> Hash Left Join (cost=19460.18..20913.07 rows=28344 width=18) (actual time=221573.341..221583.701 rows=9999 loops=1)
Hash Cond: ((wrk.klantnr = v38.klantnr) AND (wrk.wrkopdrnr = v38.wrkopdrnr))
-> Seq Scan on wrkopdr wrk (cost=0.00..1240.30 rows=28344 width=18) (actual time=0.055..5.490 rows=9999 loops=1)
Filter: (klantnr = 1::numeric)
-> Hash (cost=19460.17..19460.17 rows=1 width=64) (actual time=221573.254..221573.254 rows=2621 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 113kB
-> Subquery Scan on v38 (cost=19460.15..19460.17 rows=1 width=64) (actual time=221570.429..221572.158 rows=2621 loops=1)
-> HashAggregate (cost=19460.15..19460.16 rows=1 width=52) (actual time=221570.429..221571.499 rows=2621 loops=1)
-> Nested Loop Left Join (cost=14049.90..19460.14 rows=1 width=52) (actual time=225.848..221495.813 rows=6801 loops=1)
Join Filter: ((vrr.klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND ((upper(vrr.artnr)) = COALESCE(wrd.artnr_grondstof, ''::text)) AND (vrr.partijnr = COALESCE(wrd.partijnr_grondstof, 0::numeric)))
Rows Removed by Join Filter: 308209258
-> Nested Loop (cost=1076.77..6011.60 rows=1 width=43) (actual time=9.506..587.824 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5828.70 rows=69 width=43) (actual time=9.428..204.601 rows=7861 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
Filter: (COALESCE(wrd.klantnr, 0::numeric) = 1::numeric)
Rows Removed by Filter: 1
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.013..65.472 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=9.324..9.324 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.850..7.843 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.763..0.763 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.037..0.043 rows=1 loops=7861)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=0.027..19.661 rows=45319 loops=6801)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.456..122.855 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 221587.386 ms
(33 rows)
The output of the explain analyse of the fast query is:
Limit (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.371..12739.474 rows=9999 loops=1)
-> Nested Loop Left Join (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.368..12736.035 rows=9999 loops=1)
Join Filter: ((v38.klantnr = wrk.klantnr) AND (v38.wrkopdrnr = wrk.wrkopdrnr))
Rows Removed by Join Filter: 26206807
-> Nested Loop (cost=0.00..1532.01 rows=142 width=18) (actual time=0.055..18.652 rows=9999 loops=1)
Join Filter: (wrk.klantnr = s02.klantnr)
-> Index Scan using wrkovz91_selecties_key_001 on wrkovz91_selecties s02 (cost=0.00..8.27 rows=1 width=14) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (nr = 1::numeric)
-> Seq Scan on wrkopdr wrk (cost=0.00..1169.44 rows=28344 width=18) (actual time=0.026..11.905 rows=9999 loops=1)
-> Materialize (cost=56294.55..56296.25 rows=68 width=64) (actual time=0.044..0.380 rows=2621 loops=9999)
-> Subquery Scan on v38 (cost=56294.55..56295.91 rows=68 width=64) (actual time=441.797..443.503 rows=2621 loops=1)
-> HashAggregate (cost=56294.55..56295.23 rows=68 width=52) (actual time=441.795..442.848 rows=2621 loops=1)
-> Hash Left Join (cost=14525.30..56293.70 rows=68 width=52) (actual time=255.847..433.386 rows=6801 loops=1)
Hash Cond: ((COALESCE(wrd.klantnr_grondstof, 0::numeric) = v40.klantnr) AND (COALESCE(wrd.artnr_grondstof, ''::text) = v40.artnr) AND (COALESCE(wrd.partijnr_grondstof, 0::numeric) = v40.partijnr))
-> Nested Loop (cost=1076.77..42541.70 rows=68 width=43) (actual time=10.356..171.502 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5794.04 rows=13863 width=43) (actual time=10.286..91.471 rows=7862 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.014..38.276 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=10.179..10.179 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.835..8.667 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.748..0.748 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.009..0.009 rows=1 loops=7862)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> Hash (cost=13240.55..13240.55 rows=11885 width=74) (actual time=245.430..245.430 rows=45319 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 2645kB
-> Subquery Scan on v40 (cost=12973.14..13240.55 rows=11885 width=74) (actual time=186.156..222.042 rows=45319 loops=1)
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=186.154..209.633 rows=45319 loops=1)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.453..126.361 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 12742.125 ms
(36 rows)

Increase execution speed of PostgreSQL query

I am facing a serious PostgreSQL efficiency issue with the following query. The speed of execution is too low even with enough CPU and memory in the server.
SELECT "kvm_bills".*, "billcat"."cat_name", "contractor"."con_id", "contractor"."con_name", "contractor"."con_address", "contractor"."con_mobileno", "billbranch"."branch_name"
FROM "kvm_bills" LEFT JOIN
"kvm_bill_categories" AS "billcat"
ON billcat.cat_id =kvm_bills.bill_cat_id LEFT JOIN
"kvm_bill_contractors" AS "contractor"
ON contractor.con_id =kvm_bills.bill_con_id LEFT JOIN
"kvm_core_branches" AS "billbranch"
ON billbranch.branch_id =kvm_bills.bill_branch
WHERE (kvm_bills.deleted = 0) AND
(bill_branch IN (258, 259, 332, 66, 65, 63, 168, 169, 170, 309, 330, 418, 257)) AND
(kvm_bills.bill_id NOT IN (SELECT kvm_core_voucherdet.vchrdet_bill_id
FROM kvm_core_voucherdet
WHERE kvm_core_voucherdet.deleted=0
)
) AND
(contractor.con_mobileno LIKE '123456') AND
(bill_approve_stat = 2) AND
(billcat.deleted = 0) AND
(contractor.deleted = 0) AND
(billbranch.deleted = 0)
ORDER BY "bill_branch" DESC, "bill_ref_no" ASC
The query plan taken through EXPLAIN ANALYSE is as follows:
QUERY PLAN
Sort (cost=501356982.86..501356982.86 rows=2 width=346) (actual time=155806.015..155806.015 rows=8 loops=1)
Sort Key: kvm_bills.bill_branch, kvm_bills.bill_ref_no
Sort Method: quicksort Memory: 29kB
-> Nested Loop (cost=0.00..501356982.85 rows=2 width=346) (actual time=2909.407..155805.861 rows=8 loops=1)
-> Nested Loop (cost=0.00..501356982.26 rows=2 width=325) (actual time=2909.297..155805.599 rows=8 loops=1)
-> Nested Loop (cost=0.00..501356981.69 rows=2 width=310) (actual time=2909.073..155805.155 rows=8 loops=1)
Join Filter: (kvm_bills.bill_con_id = contractor.con_id)
Rows Removed by Join Filter: 7855
-> Seq Scan on kvm_bills (cost=0.00..501356587.87 rows=6446 width=228) (actual time=63.218..155799.854 rows=2621 loops=1)
Filter: ((deleted = 0) AND (bill_approve_stat = 2) AND (bill_branch = ANY ('{258,259,332,66,65,63,168,169,170,309,330,418,257}'::bigint[])) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 271730
SubPlan 1
-> Materialize (cost=0.00..3442.08 rows=85093 width=8) (actual time=0.003..6.998 rows=50182 loops=11956)
-> Seq Scan on kvm_core_voucherdet (cost=0.00..2683.61 rows=85093 width=8) (actual time=0.019..33.118 rows=84909 loops=1)
Filter: (deleted = 0)
Rows Removed by Filter: 6100
-> Materialize (cost=0.00..200.45 rows=2 width=82) (actual time=0.000..0.001 rows=3 loops=2621)
-> Seq Scan on kvm_bill_contractors contractor (cost=0.00..200.44 rows=2 width=82) (actual time=0.643..1.932 rows=3 loops=1)
Filter: ((con_mobileno ~~ '123456'::text) AND (deleted = 0))
Rows Removed by Filter: 5494
-> Index Scan using kvm_bill_categories_pkey on kvm_bill_categories billcat (cost=0.00..0.27 rows=1 width=23) (actual time=0.029..0.030 rows=1 loops=8)
Index Cond: (cat_id = kvm_bills.bill_cat_id)
Filter: (deleted = 0)
-> Index Scan using kvm_core_branches_pkey on kvm_core_branches billbranch (cost=0.00..0.28 rows=1 width=29) (actual time=0.018..0.019 rows=1 loops=8)
Index Cond: (branch_id = kvm_bills.bill_branch)
Filter: (deleted = 0)
Total runtime: 155807.130 ms
I believe that the NOT IN subquery is the culprit here which is giving the additional SubPlan 1.
Currently there is a btree index on kvm_core_voucherdet.vchrdet_bill_id. Is there any way the speed of this query can be improved either by adding additional indexes or by some other mechanism?
Like Nick.McDermald says, you should convert the NOT IN clause to
NOT EXISTS (SELECT 1
FROM kvm_core_voucherdet
WHERE kvm_core_voucherdet.vchrdet_bill_id = kvm_bills.bill_id
AND kvm_core_voucherdet.deleted = 0)
That will get you a join, which is probably faster.
In addition, you should create an index on kvm_bills:
CREATE INDEX ON kvm_bills (bill_branch)
WHERE deleted = 0 AND bill_approve_stat = 2;
If the constants are not always 0 and 2, use the following index instead:
CREATE INDEX ON kvm_bills (bill_approve_stat, deleted, bill_branch);

Analyze: Why a query taking could take so long, seems costs are low?

I am having these results for analyze for a simple query that does not return more than 150 records from tables less than 200 records most of them, as I have a table that stores latest value and the other fields are FK of the data.
Update: see the new results from same query some our later. The site is not public and/or there should be not users right now as it is in development.
explain analyze
SELECT lv.station_id,
s.name AS station_name,
s.latitude,
s.longitude,
s.elevation,
lv.element_id,
e.symbol AS element_symbol,
u.symbol,
e.name AS element_name,
lv.last_datetime AS datetime,
lv.last_value AS valor,
s.basin_id,
s.municipality_id
FROM (((element_station lv /*350 records*/
JOIN stations s ON ((lv.station_id = s.id))) /*40 records*/
JOIN elements e ON ((lv.element_id = e.id))) /*103 records*/
JOIN units u ON ((e.unit_id = u.id))) /* 32 records */
WHERE s.id = lv.station_id AND e.id = lv.element_id AND lv.interval_id = 6 and
lv.last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)
I have already tried VACUUM and after that some is saved, but again after some times it goes up. I have implemented an index on the fields.
Nested Loop (cost=0.29..2654.66 rows=1 width=92) (actual time=1219.390..35296.253 rows=157 loops=1)
Join Filter: (e.unit_id = u.id)
Rows Removed by Join Filter: 4867
-> Nested Loop (cost=0.29..2652.93 rows=1 width=92) (actual time=1219.383..35294.083 rows=157 loops=1)
Join Filter: (lv.element_id = e.id)
Rows Removed by Join Filter: 16014
-> Nested Loop (cost=0.29..2648.62 rows=1 width=61) (actual time=1219.301..35132.373 rows=157 loops=1)
-> Seq Scan on element_station lv (cost=0.00..2640.30 rows=1 width=20) (actual time=1219.248..1385.517 rows=157 loops=1)
Filter: ((interval_id = 6) AND (last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)))
Rows Removed by Filter: 168
-> Index Scan using stations_pkey on stations s (cost=0.29..8.31 rows=1 width=45) (actual time=3.471..214.941 rows=1 loops=157)
Index Cond: (id = lv.station_id)
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.003..0.999 rows=103 loops=157)
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.002..0.005 rows=32 loops=157)
Planning time: 8.312 ms
Execution time: 35296.427 ms
update, same query running it tonight; no changes:
Sort (cost=601.74..601.88 rows=55 width=92) (actual time=1.822..1.841 rows=172 loops=1)
Sort Key: lv.last_datetime DESC
Sort Method: quicksort Memory: 52kB
-> Nested Loop (cost=11.60..600.15 rows=55 width=92) (actual time=0.287..1.680 rows=172 loops=1)
-> Hash Join (cost=11.31..248.15 rows=55 width=51) (actual time=0.263..0.616 rows=172 loops=1)
Hash Cond: (e.unit_id = u.id)
-> Hash Join (cost=9.59..245.60 rows=75 width=51) (actual time=0.225..0.528 rows=172 loops=1)
Hash Cond: (lv.element_id = e.id)
-> Bitmap Heap Scan on element_station lv (cost=5.27..240.25 rows=75 width=20) (actual time=0.150..0.359 rows=172 loops=1)
Recheck Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
Heap Blocks: exact=22
-> Bitmap Index Scan on element_station_latest (cost=0.00..5.25 rows=75 width=0) (actual time=0.136..0.136 rows=226 loops=1)
Index Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
-> Hash (cost=3.03..3.03 rows=103 width=35) (actual time=0.062..0.062 rows=103 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.006..0.031 rows=103 loops=1)
-> Hash (cost=1.32..1.32 rows=32 width=8) (actual time=0.019..0.019 rows=32 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.003..0.005 rows=32 loops=1)
-> Index Scan using stations_pkey on stations s (cost=0.29..6.39 rows=1 width=45) (actual time=0.005..0.006 rows=1 loops=172)
Index Cond: (id = lv.station_id)
Planning time: 2.390 ms
Execution time: 2.009 ms
The problem is the misestimate of the number of rows in the sequential scan on element_station. Either autoanalyze has kicked in and calculated new statistics for the table or the data changed.
The problem is probably that PostgreSQL doesn't know the result of
((now() - '06:00:00'::interval) - '01:00:00'::interval)
at query planning time.
If that is possible for you, do it in two steps: First, calculate the expression above (either in PostgreSQL or on the client side). Then run the query with the result as a constant. That will make it easier for PostgreSQL to estimate the result count.

Postgres Table Slow Performance

We have a Product table in postgres DB. This is hosted on Heroku. We have 8 GB RAM and 250 GB disk space. 1000 IPOP allowed.
We are having proper indexes on columns.
Platform
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
We are running a keywords search query on this table. We are having 2.8 millions records in this table. Our search query is too slow. Its giving us result in about 50 seconds. Which is too slow.
Query
SELECT
P .sfid AS prodsfid,
P .image_url__c image,
P .productcode sku,
P .Short_Description__c shortDesc,
P . NAME pname,
P .category__c,
P .price__c price,
P .description,
P .vendor_name__c vname,
P .vendor__c supSfid
FROM
staging.product2 P
JOIN (
SELECT
p1.sfid
FROM
staging.product2 p1
WHERE
p1. NAME ILIKE '%s%'
OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P .sfid = TEMP .sfid)
WHERE
P .status__c = 'Available'
AND LOWER (
P .vendor_shipping_country__c
) = ANY (
VALUES
('us'),
('usa'),
('united states'),
('united states of america')
)
AND P .vendor_catalog_tier__c = ANY (
VALUES
('a1c37000000oljnAAA'),
('a1c37000000oljQAAQ'),
('a1c37000000oljQAAQ'),
('a1c37000000pT7IAAU'),
('a1c37000000omDjAAI'),
('a1c37000000oljMAAQ'),
('a1c37000000oljaAAA'),
('a1c37000000pT7SAAU'),
('a1c0R000000AFcVQAW'),
('a1c0R000000A1HAQA0'),
('a1c0R0000000OpWQAU'),
('a1c0R0000005TZMQA2'),
('a1c37000000oljdAAA'),
('a1c37000000ooTqAAI'),
('a1c37000000omLBAAY'),
('a1c0R0000005N8GQAU')
)
Here is the explain plan:
Nested Loop (cost=31.85..33886.54 rows=3681 width=750)
-> Hash Join (cost=31.77..31433.07 rows=4415 width=750)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.73..31423.67 rows=8830 width=761)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32)
-> Bitmap Heap Scan on product2 p (cost=31.66..1962.32 rows=552 width=780)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.64 rows=1016 width=0)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32)
-> Unique (cost=0.02..0.03 rows=4 width=32)
-> Sort (cost=0.02..0.02 rows=4 width=32)
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.55 rows=1 width=19)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Its returning around 140,576 records. By the way we need only top 5,000 records only. Will putting Limit help here?
Let me know how to make it fast and what is causing this slow.
EXPLAIN ANALYZE
#RaymondNijland Here is the explain analyze
Nested Loop (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
-> Hash Join (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
-> Bitmap Heap Scan on product2 p (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
Rows Removed by Filter: 645
Heap Blocks: exact=88436
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Unique (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
-> Sort (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms
Another query version, with order by , but it seems very slow as well (140 seconds)
SELECT
P .sfid AS prodsfid,
P .image_url__c image,
P .productcode sku,
P .Short_Description__c shortDesc,
P . NAME pname,
P .category__c,
P .price__c price,
P .description,
P .vendor_name__c vname,
P .vendor__c supSfid
FROM
staging.product2 P
WHERE
P .status__c = 'Available'
AND P .vendor_shipping_country__c IN (
'us',
'usa',
'united states',
'united states of america'
)
AND P .vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'
)
ORDER BY p.productcode asc
LIMIT 5000
Here is the explain analyse for this:
Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms
Thanks
Aslam Bari
You might want to consider a GIN or GIST index on your staging.product2 table. Double-sided ILIKEs are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.
See this doc.