How to create a Postgres date index properly?

I'm using the Django ORM with PostgreSQL.
The ORM generates this query:
SELECT
(date_part('month', stat_date)) AS "stat_date",
"direct_keywordstat"."banner_id",
SUM("direct_keywordstat"."total") AS "total",
SUM("direct_keywordstat"."clicks") AS "clicks",
SUM("direct_keywordstat"."shows") AS "shows"
FROM "direct_keywordstat"
LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id")
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id")
WHERE (
"direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
"direct_keywordstat"."keyword_id" IN (
SELECT U0."id"
FROM "direct_keyword" U0
INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id")
INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id")
INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
WHERE (
U0."deleted" = E'False' AND
U0."low_ctr" = E'False' AND
U4."status_active" = E'True' AND
U0."banner_id" IN (
SELECT U0."banner_ptr_id"
FROM "direct_banner" U0
INNER JOIN "platforms_banner" U1
ON (U0."banner_ptr_id" = U1."id")
WHERE (
U0."status_show" = E'True' AND
U1."campaign_id" = E'174' )
)
)
)
)
GROUP BY
"direct_keywordstat"."banner_id",
(date_part('month', stat_date)),
"platforms_banner"."title", date_trunc('month', stat_date)
ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC
The problem is that direct_keywordstat contains more than 3 million rows, so the query takes about 15 seconds.
I've tried creating indexes like
CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);
But EXPLAIN ANALYZE shows that the index is not used.
Table schema:
\d direct_keywordstat
Table "public.direct_keywordstat"
Column | Type | Modifiers
-------------+------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('direct_keywordstat_id_seq'::regclass)
keyword_id | integer | not null
banner_id | integer | not null
campaign_id | integer | not null
stat_date | date | not null
region_id | integer | not null
place_type | character varying(30) |
place_name | character varying(100) |
clicks | integer | not null default 0
shows | integer | not null default 0
total | numeric(19,6) | not null
How can I create a useful index?
Or is there perhaps another way to optimize this query?
The thing is, if the WHERE clause looks like
"direct_keywordstat".clicks BETWEEN 10 AND 3000000
the query executes in 0.8 seconds.

Do you have indexes on these columns:
direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date
The two direct_keywordstat columns could be combined into a single index; just check whether it helps.
This is also a problem:
Sort Method: external merge Disk: 20600kB
Check your work_mem setting; you need at least 20MB for this query.
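The suggestions above could be tried along these lines. This is only a sketch: the index name is made up for illustration, and the 32MB figure is an assumed starting point rather than a measured requirement (an in-memory sort generally needs more space than the same sort spilled to disk).

```sql
-- Hypothetical index name. Combines the join column and the date filter
-- column of direct_keywordstat into one composite index.
CREATE INDEX direct_keywordstat_banner_id_stat_date
    ON direct_keywordstat (banner_id, stat_date);

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE direct_keywordstat;

-- Raise work_mem for the current session only (assumed value, tune as needed).
SET work_mem = '32MB';
```

After this, re-running EXPLAIN ANALYZE on the original query shows whether the index is picked up and whether the "Sort Method" line changes from "external merge Disk" to an in-memory method.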

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
-> Sort (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
Sort Method: external merge Disk: 20600kB
-> Hash Join (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
Hash Cond: (direct_keywordstat.keyword_id = u0.id)
-> Hash Left Join (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
-> Seq Scan on direct_keywordstat (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
-> Hash (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
-> Hash Left Join (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
-> Seq Scan on direct_banner (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
-> Hash (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
-> Seq Scan on platforms_banner (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
-> Hash (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
-> HashAggregate (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
-> Hash Join (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
Hash Cond: (u0.banner_id = u1.banner_ptr_id)
-> Nested Loop (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
-> HashAggregate (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
-> Nested Loop (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
-> Index Scan using platforms_banner_campaign_id on platforms_banner u1 (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (campaign_id = 174)
-> Index Scan using direct_banner_pkey on direct_banner u0 (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
Index Cond: (u0.banner_ptr_id = u1.id)
Filter: u0.status_show
-> Index Scan using direct_keyword_banner_id on direct_keyword u0 (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
Index Cond: (u0.banner_id = u0.banner_ptr_id)
Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
-> Hash (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
-> Hash Join (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
Hash Cond: (u1.banner_ptr_id = u2.id)
-> Seq Scan on direct_banner u1 (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
-> Hash (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
-> Hash Join (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
Hash Cond: (u2.campaign_id = u3.id)
-> Seq Scan on platforms_banner u2 (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
-> Hash (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
-> Hash Join (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
Hash Cond: (u3.id = u4.campaign_ptr_id)
-> Seq Scan on platforms_campaign u3 (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
-> Hash (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
-> Seq Scan on direct_campaign u4 (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
Filter: status_active
Total runtime: 23436.715 ms
(47 rows)
Here it is
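One reading of this plan, offered as a hypothesis rather than a diagnosis: the date filter keeps 2,523,151 rows of a table with roughly 3 million, so an index on stat_date alone has little to offer and a sequential scan is the cheaper choice. The keyword_id condition is far more selective (the subquery yields 106 keywords and the join keeps 198,978 rows), so an index leading with keyword_id may be worth testing. The index name below is invented for illustration.

```sql
-- How selective is the date range? Compare matching rows with the table total.
SELECT count(*) AS total_rows,
       sum(CASE WHEN stat_date BETWEEN '2009-08-25' AND '2010-08-25'
                THEN 1 ELSE 0 END) AS rows_in_range
FROM direct_keywordstat;

-- Hypothetical index: the selective column first, the date column second.
CREATE INDEX direct_keywordstat_keyword_id_stat_date
    ON direct_keywordstat (keyword_id, stat_date);

-- Refresh statistics before re-checking the plan with EXPLAIN ANALYZE.
ANALYZE direct_keywordstat;
```

If rows_in_range is close to total_rows, the planner is right to ignore a stat_date index for this query.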

Related

What effect does a string operation on a column in a filter condition of a PostgreSQL query have on the plan it chooses?

I was working on optimising a query and, by dumb luck, tried something that improved it, but I am unable to explain why.
Below is the query with poor performance
with ctedata1 as(
select
sum(total_visit_count) as total_visit_count,
sum(sh_visit_count) as sh_visit_count,
sum(ec_visit_count) as ec_visit_count,
sum(total_like_count) as total_like_count,
sum(sh_like_count) as sh_like_count,
sum(ec_like_count) as ec_like_count,
sum(total_order_count) as total_order_count,
sum(sh_order_count) as sh_order_count,
sum(ec_order_count) as ec_order_count,
sum(total_sales_amount) as total_sales_amount,
sum(sh_sales_amount) as sh_sales_amount,
sum(ec_sales_amount) as ec_sales_amount,
sum(ec_order_online_count) as ec_order_online_count,
sum(ec_sales_online_amount) as ec_sales_online_amount,
sum(ec_order_in_store_count) as ec_order_in_store_count,
sum(ec_sales_in_store_amount) as ec_sales_in_store_amount,
table2.im_name,
table2.brand as kpibrand,
table2.id_region as kpiregion
from
table2
where
deleted_at is null
and id_region = any('{1}')
group by
im_name,
kpiregion,
kpibrand ),
ctedata2 as (
select
ctedata1.*,
rank() over (partition by (kpiregion,
kpibrand)
order by
coalesce(ctedata1.total_sales_amount, 0) desc) rank,
count(*) over (partition by (kpiregion,
kpibrand)) as total_count
from
ctedata1 )
select
table1.id_pf_item,
table1.product_id,
table1.color_code,
table1.l1_code,
table1.local_title as product_name,
table1.id_region,
table1.gender,
case
when table1.created_at is null then '1970/01/01 00:00:00'
else table1.created_at
end as created_at,
(
select
count(distinct id_outfit)
from
table3
left join table4 on
table3.id_item = table4.id_item
and table4.deleted_at is null
where
table3.deleted_at is null
and table3.id_pf_item = table1.id_pf_item) as outfit_count,
count(*) over() as total_matched,
case
when table1.v8_im_name = '' then table1.im_name
else table1.v8_im_name
end as im_name,
case
when table1.id_region != 1 then null
else
case
when table1.sales_start_at is null then '1970/01/01 00:00:00'
else table1.sales_start_at
end
end as sales_start_date,
table1.category_ids,
array_to_string(table1.intermediate_category_ids, ','),
table1.image_url,
table1.brand,
table1.pdp_url,
coalesce(ctedata2.total_visit_count, 0) as total_visit_count,
coalesce(ctedata2.sh_visit_count, 0) as sh_visit_count,
coalesce(ctedata2.ec_visit_count, 0) as ec_visit_count,
coalesce(ctedata2.total_like_count, 0) as total_like_count,
coalesce(ctedata2.sh_like_count, 0) as sh_like_count,
coalesce(ctedata2.ec_like_count, 0) as ec_like_count,
coalesce(ctedata2.total_order_count, 0) as total_order_count,
coalesce(ctedata2.sh_order_count, 0) as sh_order_count,
coalesce(ctedata2.ec_order_count, 0) as ec_order_count,
coalesce(ctedata2.total_sales_amount, 0) as total_sales_amount,
coalesce(ctedata2.sh_sales_amount, 0) as sh_sales_amount,
coalesce(ctedata2.ec_sales_amount, 0) as ec_sales_amount,
coalesce(ctedata2.ec_order_online_count, 0) as ec_order_online_count,
coalesce(ctedata2.ec_sales_online_amount, 0) as ec_sales_online_amount,
coalesce(ctedata2.ec_order_in_store_count, 0) as ec_order_in_store_count,
coalesce(ctedata2.ec_sales_in_store_amount, 0) as ec_sales_in_store_amount,
ctedata2.rank,
ctedata2.total_count,
table1.department,
table1.seasons
from
table1
left join ctedata2 on
table1.im_name = ctedata2.im_name
and table1.brand = ctedata2.kpibrand
where
table1.deleted_at is null
and table1.id_region = any('{1}')
and lower(table1.brand) = any('{"brand1","brand2"}')
and 'season1' = any(lower(seasons::text)::text[])
and table1.department = 'Department1'
order by
total_sales_amount desc offset 0
limit 100
The EXPLAIN output for the above query is
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=172326.55..173435.38 rows=1 width=952) (actual time=85664.201..85665.970 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.063..708.069 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.029..308.582 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1012.994..1082.057 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=945.755..1014.656 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=945.747..963.254 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.069..824.841 rows=73121 loops=1)
-> Result (cost=74005.05..75113.88 rows=1 width=952) (actual time=85664.199..85665.950 rows=100 loops=1)
-> Sort (cost=74005.05..74005.05 rows=1 width=944) (actual time=85664.072..85664.089 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=10960.95..74005.04 rows=1 width=944) (actual time=85658.049..85661.393 rows=3151 loops=1)
-> Nested Loop Left Join (cost=10960.95..74005.02 rows=1 width=927) (actual time=1075.219..85643.595 rows=3151 loops=1)
Join Filter: (((table1.im_name)::text = ctedata2.im_name) AND ((table1.brand)::text = ctedata2.kpibrand))
Rows Removed by Join Filter: 230402986
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND (lower((brand)::text) = ANY ('{brand1, brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=38.307..38.307 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=0.325..21.721 rows=73121 loops=3151)
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.007..0.016 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.005..0.008 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 1.023 ms
Execution time: 85669.512 ms
I changed
and lower(table1.brand) = any('{"brand1","brand2"}')
in the query to
and table1.brand = any('{"Brand1","Brand2"}')
and the plan changed to
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=173137.44..188661.06 rows=14 width=952) (actual time=1444.123..1445.653 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.040..769.982 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_item_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.021..350.774 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1088.905..1153.749 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=1020.017..1089.117 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=1020.011..1037.170 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.044..891.653 rows=73121 loops=1)
-> Result (cost=74815.94..90339.56 rows=14 width=952) (actual time=1444.121..1445.635 rows=100 loops=1)
-> Sort (cost=74815.94..74815.98 rows=14 width=944) (actual time=1444.053..1444.065 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=72207.31..74815.68 rows=14 width=944) (actual time=1439.128..1441.885 rows=3151 loops=1)
-> Hash Right Join (cost=72207.31..74815.40 rows=14 width=927) (actual time=1307.531..1437.246 rows=3151 loops=1)
Hash Cond: ((ctedata2.im_name = (table1.im_name)::text) AND (ctedata2.kpibrand = (table1.brand)::text))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=1088.911..1209.646 rows=73121 loops=1)
-> Hash (cost=72207.10..72207.10 rows=14 width=399) (actual time=216.850..216.850 rows=3151 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1249kB
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND ((brand)::text = ANY ('{Brand1, Brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=34.849..34.849 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.006..0.014 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.004..0.006 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 0.760 ms
Execution time: 1448.848 ms
My Observation
The join strategy for table1 left join ctedata2 changes after the lower() function is avoided. The strategy changes from nested loop left join to hash right join.
The CTE Scan node on ctedata2 is executed only once in the better performing query.
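The numbers in the slow plan are consistent with this observation. With loops=3151 on the CTE scan and 73121 rows per pass, the nested loop compares about 230 million row pairs, nearly all of which the join filter rejects. A quick check, using only figures copied from the plan:

```sql
-- 3151 rows from table1, each compared against all 73121 rows of ctedata2.
-- 230402986 is the "Rows Removed by Join Filter" value from the slow plan.
SELECT 3151 * 73121             AS pairs_compared,        -- 230404271
       3151 * 73121 - 230402986 AS pairs_passing_filter;  -- 1285
```

The hash join in the faster plan avoids this by reading ctedata2 once and probing a hash table built from the table1 rows.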
Postgres Version
9.6
Please help me to understand this behaviour. I will supply additional info if required.
It is almost not worthwhile taking a deep dive into the inner workings of a nearly-obsolete version. That time and energy is probably better spent jollying along an upgrade.
But the problem is pretty plain. Your scan on table1 is estimated dreadfully, although 14 times less dreadful in the better plan.
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Your use of lower(), apparently without reason, surely contributes to the poor estimation. And dynamically converting a string into an array certainly doesn't help either. If it were stored as a real array in the first place, the statistics system could get its hands on it and generate more reasonable estimates.
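If the case-insensitive comparison has to stay, one way to give the planner statistics for the lower(brand) expression is an expression index, because ANALYZE collects statistics on indexed expressions. This is a sketch with an invented index name, not something the answer above proposed, and whether it changes the join strategy on 9.6 would need to be confirmed with EXPLAIN ANALYZE.

```sql
-- Hypothetical index name. The index itself may or may not be used for the
-- scan; the point is that ANALYZE then gathers statistics on lower(brand).
CREATE INDEX table1_lower_brand_idx ON table1 (lower(brand));
ANALYZE table1;

-- Check the row estimate for the brand condition in isolation.
EXPLAIN
SELECT *
FROM table1
WHERE lower(brand) = any('{"brand1","brand2"}');
```

A row estimate closer to the actual count for table1 makes it more likely that the planner picks the hash join over the nested loop.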

Query using WHERE on a fixed value performs badly. Why is this?

I use a simple query with a WHERE on a fixed value. This query does a left join on a temporary view. For some reason this query is performing very badly. I guess that the view is being executed for every row and not only for the selected rows. When I replace the fixed value with a value from a temporary table, the query performs MUCH better (about 15-20 times faster). Why is this?
I use PostgreSQL version 9.2.15.
I added a temporary table 'wrkovz91_selecties' with only one record to pass the selection value of the WHERE clause to the query.
The view 'view_wrk_012_000001' that is joined in the query is pretty 'heavy', because it contains another nested view ('view_wrk_013_000006').
First create the temporary table and add one record:
CREATE TEMPORARY TABLE WRKOVZ91_SELECTIES
(
NR DECIMAL(006) NOT NULL,
KLANTNR DECIMAL(008),
CONSTRAINT WRKOVZ91_SELECTIES_KEY_001 PRIMARY KEY (NR)
);
INSERT INTO WRKOVZ91_SELECTIES
(NR, KLANTNR) VALUES (1, 1);
Then create the temporary view (with a nested view):
CREATE TEMPORARY VIEW view_wrk_013_000006 AS
SELECT vrr.klantnr AS klantnr,
UPPER(vrr.artnr) AS artnr,
vrr.partijnr AS partijnr,
SUM(vrr.kollo) AS colli_op_voorraad
FROM voorraad vrr
WHERE (vrr.status = 'A'::text)
GROUP BY 1,
2,
3;
CREATE TEMPORARY VIEW view_wrk_012_000001 AS
SELECT COALESCE(wrd.klantnr,0) AS klantnr,
COALESCE(wrd.wrkopdrnr,0) AS wrkopdrnr,
MIN(COALESCE(wrd.recept_benodigd_colli,0) *
COALESCE(wrk.te_produceren,0)) AS vrije_voorraad_ind,
MIN(COALESCE(v40.colli_op_voorraad,0)) AS hulpveld
FROM wrkopdr wrk
LEFT JOIN wrkopdrd wrd ON wrd.klantnr = wrk.klantnr
AND wrd.wrkopdrnr = wrk.wrkopdrnr
LEFT JOIN view_wrk_013_000006 v40 ON v40.klantnr =
COALESCE(wrd.klantnr_grondstof,0)
AND v40.artnr = COALESCE(wrd.artnr_grondstof,'')
AND v40.partijnr = COALESCE(wrd.partijnr_grondstof,0)
LEFT JOIN artikel art ON art.klantnr = COALESCE(wrd.klantnr_grondstof,0)
AND art.artnr = COALESCE(wrd.artnr_grondstof,'')
WHERE wrk.status = 'A'::text
AND art.voorraadhoudend_jn = 'J'::text
GROUP BY 1,
2;
The query that performs badly is simple:
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = 1
LIMIT 9999;
The query that performs MUCH better is (note the small difference):
SELECT WRK.KLANTNR,
WRK.WRKOPDRNR,
WRK.MIL_UITVOER_DATUM
FROM WRKOPDR WRK
LEFT JOIN VIEW_WRK_012_000001 V38 ON V38.KLANTNR = WRK.KLANTNR AND
V38.WRKOPDRNR = WRK.WRKOPDRNR
LEFT JOIN WRKOVZ91_SELECTIES S02 ON S02.NR = 1
WHERE WRK.KLANTNR = S02.KLANTNR
LIMIT 9999;
I cannot understand why the slow query performs so badly. On my test data it takes about 219 seconds, while the fast query takes only 12 seconds. The only difference between the two queries is the selection value in the WHERE clause.
Does anyone have an explanation for this behaviour?
In addition, the EXPLAIN ANALYZE output of the slow query is:
Limit (cost=19460.18..19972.73 rows=9999 width=18) (actual time=221573.343..221585.953 rows=9999 loops=1)
-> Hash Left Join (cost=19460.18..20913.07 rows=28344 width=18) (actual time=221573.341..221583.701 rows=9999 loops=1)
Hash Cond: ((wrk.klantnr = v38.klantnr) AND (wrk.wrkopdrnr = v38.wrkopdrnr))
-> Seq Scan on wrkopdr wrk (cost=0.00..1240.30 rows=28344 width=18) (actual time=0.055..5.490 rows=9999 loops=1)
Filter: (klantnr = 1::numeric)
-> Hash (cost=19460.17..19460.17 rows=1 width=64) (actual time=221573.254..221573.254 rows=2621 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 113kB
-> Subquery Scan on v38 (cost=19460.15..19460.17 rows=1 width=64) (actual time=221570.429..221572.158 rows=2621 loops=1)
-> HashAggregate (cost=19460.15..19460.16 rows=1 width=52) (actual time=221570.429..221571.499 rows=2621 loops=1)
-> Nested Loop Left Join (cost=14049.90..19460.14 rows=1 width=52) (actual time=225.848..221495.813 rows=6801 loops=1)
Join Filter: ((vrr.klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND ((upper(vrr.artnr)) = COALESCE(wrd.artnr_grondstof, ''::text)) AND (vrr.partijnr = COALESCE(wrd.partijnr_grondstof, 0::numeric)))
Rows Removed by Join Filter: 308209258
-> Nested Loop (cost=1076.77..6011.60 rows=1 width=43) (actual time=9.506..587.824 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5828.70 rows=69 width=43) (actual time=9.428..204.601 rows=7861 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
Filter: (COALESCE(wrd.klantnr, 0::numeric) = 1::numeric)
Rows Removed by Filter: 1
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.013..65.472 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=9.324..9.324 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.850..7.843 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.763..0.763 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.037..0.043 rows=1 loops=7861)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=0.027..19.661 rows=45319 loops=6801)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.456..122.855 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 221587.386 ms
(33 rows)
The EXPLAIN ANALYZE output of the fast query is:
Limit (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.371..12739.474 rows=9999 loops=1)
-> Nested Loop Left Join (cost=56294.55..57997.06 rows=142 width=18) (actual time=445.368..12736.035 rows=9999 loops=1)
Join Filter: ((v38.klantnr = wrk.klantnr) AND (v38.wrkopdrnr = wrk.wrkopdrnr))
Rows Removed by Join Filter: 26206807
-> Nested Loop (cost=0.00..1532.01 rows=142 width=18) (actual time=0.055..18.652 rows=9999 loops=1)
Join Filter: (wrk.klantnr = s02.klantnr)
-> Index Scan using wrkovz91_selecties_key_001 on wrkovz91_selecties s02 (cost=0.00..8.27 rows=1 width=14) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (nr = 1::numeric)
-> Seq Scan on wrkopdr wrk (cost=0.00..1169.44 rows=28344 width=18) (actual time=0.026..11.905 rows=9999 loops=1)
-> Materialize (cost=56294.55..56296.25 rows=68 width=64) (actual time=0.044..0.380 rows=2621 loops=9999)
-> Subquery Scan on v38 (cost=56294.55..56295.91 rows=68 width=64) (actual time=441.797..443.503 rows=2621 loops=1)
-> HashAggregate (cost=56294.55..56295.23 rows=68 width=52) (actual time=441.795..442.848 rows=2621 loops=1)
-> Hash Left Join (cost=14525.30..56293.70 rows=68 width=52) (actual time=255.847..433.386 rows=6801 loops=1)
Hash Cond: ((COALESCE(wrd.klantnr_grondstof, 0::numeric) = v40.klantnr) AND (COALESCE(wrd.artnr_grondstof, ''::text) = v40.artnr) AND (COALESCE(wrd.partijnr_grondstof, 0::numeric) = v40.partijnr))
-> Nested Loop (cost=1076.77..42541.70 rows=68 width=43) (actual time=10.356..171.502 rows=6801 loops=1)
-> Hash Right Join (cost=1076.77..5794.04 rows=13863 width=43) (actual time=10.286..91.471 rows=7862 loops=1)
Hash Cond: ((wrd.klantnr = wrk.klantnr) AND (wrd.wrkopdrnr = wrk.wrkopdrnr))
-> Seq Scan on wrkopdrd wrd (cost=0.00..3117.73 rows=116873 width=38) (actual time=0.014..38.276 rows=116873 loops=1)
-> Hash (cost=1026.34..1026.34 rows=3362 width=16) (actual time=10.179..10.179 rows=3362 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 161kB
-> Bitmap Heap Scan on wrkopdr wrk (cost=98.31..1026.34 rows=3362 width=16) (actual time=0.835..8.667 rows=3362 loops=1)
Recheck Cond: (status = 'A'::text)
-> Bitmap Index Scan on wrkopdr_key_002 (cost=0.00..97.47 rows=3362 width=0) (actual time=0.748..0.748 rows=3362 loops=1)
Index Cond: (status = 'A'::text)
-> Index Scan using artikel_key_001 on artikel art (cost=0.00..2.64 rows=1 width=17) (actual time=0.009..0.009 rows=1 loops=7862)
Index Cond: ((klantnr = COALESCE(wrd.klantnr_grondstof, 0::numeric)) AND (artnr = COALESCE(wrd.artnr_grondstof, ''::text)))
Filter: (voorraadhoudend_jn = 'J'::text)
Rows Removed by Filter: 0
-> Hash (cost=13240.55..13240.55 rows=11885 width=74) (actual time=245.430..245.430 rows=45319 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 2645kB
-> Subquery Scan on v40 (cost=12973.14..13240.55 rows=11885 width=74) (actual time=186.156..222.042 rows=45319 loops=1)
-> HashAggregate (cost=12973.14..13121.70 rows=11885 width=25) (actual time=186.154..209.633 rows=45319 loops=1)
-> Seq Scan on voorraad vrr (cost=0.00..12340.71 rows=63243 width=25) (actual time=0.453..126.361 rows=62655 loops=1)
Filter: (status = 'A'::text)
Rows Removed by Filter: 113953
Total runtime: 12742.125 ms
(36 rows)
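Comparing the two plans, the slow one estimates rows=1 for the join that feeds the Nested Loop Left Join, while 6801 rows actually arrive, and the aggregated voorraad view (45319 rows) is re-scanned for each of them. That matches the suspicion that the view is evaluated per row. The arithmetic below uses only numbers from the slow plan. The enable_nestloop lines are a session-level diagnostic to see whether the planner can reach the hash-join shape of the fast plan; that is an assumption about what might be worth trying, not a recommended permanent setting.

```sql
-- 6801 outer rows, each compared against 45319 aggregated voorraad rows.
-- 308209258 is the "Rows Removed by Join Filter" value from the slow plan.
SELECT 6801 * 45319             AS pairs_compared,        -- 308214519
       6801 * 45319 - 308209258 AS pairs_passing_filter;  -- 5261

-- Diagnostic only: discourage nested loops for this session, re-run
-- EXPLAIN ANALYZE on the slow query, then restore the default.
SET enable_nestloop = off;
-- (run EXPLAIN ANALYZE on the slow query here)
RESET enable_nestloop;
```

If the query becomes fast with nested loops discouraged, the row misestimate (1 estimated versus 6801 actual) is the likely cause of the bad plan.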

Postgres Table Slow Performance

We have a Product table in a Postgres database hosted on Heroku, with 8 GB RAM, 250 GB of disk space, and 1000 IPOP allowed.
We have proper indexes on the columns.
Platform
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
We are running a keyword search query on this table, which has 2.8 million records. The search query is too slow: it returns results in about 50 seconds.
Query
SELECT
P.sfid AS prodsfid,
P.image_url__c image,
P.productcode sku,
P.Short_Description__c shortDesc,
P.name pname,
P.category__c,
P.price__c price,
P.description,
P.vendor_name__c vname,
P.vendor__c supSfid
FROM
staging.product2 P
JOIN (
SELECT
p1.sfid
FROM
staging.product2 p1
WHERE
p1.name ILIKE '%s%'
OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P.sfid = TEMP.sfid)
WHERE
P.status__c = 'Available'
AND LOWER(P.vendor_shipping_country__c) = ANY (
VALUES
('us'),
('usa'),
('united states'),
('united states of america')
)
AND P.vendor_catalog_tier__c = ANY (
VALUES
('a1c37000000oljnAAA'),
('a1c37000000oljQAAQ'),
('a1c37000000oljQAAQ'),
('a1c37000000pT7IAAU'),
('a1c37000000omDjAAI'),
('a1c37000000oljMAAQ'),
('a1c37000000oljaAAA'),
('a1c37000000pT7SAAU'),
('a1c0R000000AFcVQAW'),
('a1c0R000000A1HAQA0'),
('a1c0R0000000OpWQAU'),
('a1c0R0000005TZMQA2'),
('a1c37000000oljdAAA'),
('a1c37000000ooTqAAI'),
('a1c37000000omLBAAY'),
('a1c0R0000005N8GQAU')
)
Here is the explain plan:
Nested Loop (cost=31.85..33886.54 rows=3681 width=750)
-> Hash Join (cost=31.77..31433.07 rows=4415 width=750)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.73..31423.67 rows=8830 width=761)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32)
-> Bitmap Heap Scan on product2 p (cost=31.66..1962.32 rows=552 width=780)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.64 rows=1016 width=0)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32)
-> Unique (cost=0.02..0.03 rows=4 width=32)
-> Sort (cost=0.02..0.02 rows=4 width=32)
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.55 rows=1 width=19)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
It returns around 140,576 records, but we only need the top 5,000. Would adding a LIMIT help here?
Please let me know how to make it faster and what is causing the slowness.
EXPLAIN ANALYZE
@RaymondNijland Here is the EXPLAIN ANALYZE output:
Nested Loop (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
-> Hash Join (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
-> Bitmap Heap Scan on product2 p (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
Rows Removed by Filter: 645
Heap Blocks: exact=88436
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Unique (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
-> Sort (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms
Another version of the query, with ORDER BY, is also very slow (140 seconds):
SELECT
p.sfid AS prodsfid,
p.image_url__c image,
p.productcode sku,
p.short_description__c shortdesc,
p.name pname,
p.category__c,
p.price__c price,
p.description,
p.vendor_name__c vname,
p.vendor__c supsfid
FROM
staging.product2 p
WHERE
p.status__c = 'Available'
AND p.vendor_shipping_country__c IN (
'us',
'usa',
'united states',
'united states of america'
)
AND p.vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'
)
ORDER BY p.productcode asc
LIMIT 5000
Here is the explain analyse for this:
Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms
Thanks
Aslam Bari
You might want to consider a GIN or GiST index on your staging.product2 table. ILIKE patterns with a wildcard on both sides cannot use an ordinary B-tree index, so they are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.
See this doc.
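The linked doc is not reproduced here. Assuming the suggestion refers to trigram indexes from the pg_trgm extension (the usual way to make GIN or GiST usable for ILIKE), a minimal sketch could look like this. The index names are hypothetical, and the columns are the two that appear in the ILIKE filter of the question.

```sql
-- Assumption: the pg_trgm contrib extension can be installed on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index names; gin_trgm_ops is the trigram operator class for GIN.
CREATE INDEX product2_name_trgm_idx
    ON staging.product2 USING gin (name gin_trgm_ops);

CREATE INDEX product2_productcode_trgm_idx
    ON staging.product2 USING gin (productcode gin_trgm_ops);
```

A trigram index only narrows the search when the pattern contains at least three consecutive characters. For a pattern as short as '%s%' no trigram can be extracted, so the index is unlikely to help with that exact search term.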

postgres two column sort low performance

I've got a query that performs multiple joins. I am trying to get only the latest position for each keyword.
Here is the query:
SELECT DISTINCT ON (p.keyword_id)
a.id AS account_id,
w.parent_id AS parent_id,
w.name AS name,
p.position AS position
FROM websites w
JOIN accounts a ON w.account_id = a.id
JOIN keywords k ON k.website_id = w.parent_id
JOIN positions p ON p.website_id = w.parent_id
WHERE a.amount > 0 AND w.parent_id NOTNULL AND (round((a.amount / a.payment_renewal_period), 2) BETWEEN 1 AND 19)
ORDER BY p.keyword_id, p.created_at DESC;
Plan with costs for that query is as follows:
Unique (cost=73673.65..76630.38 rows=264 width=40) (actual time=30777.117..49143.023 rows=259 loops=1)
-> Sort (cost=73673.65..75152.02 rows=591347 width=40) (actual time=30777.116..47352.373 rows=10891486 loops=1)
Sort Key: p.keyword_id, p.created_at DESC
Sort Method: external merge Disk: 512672kB
-> Merge Join (cost=219.59..812.26 rows=591347 width=40) (actual time=3.487..3827.028 rows=10891486 loops=1)
Merge Cond: (w.parent_id = k.website_id)
-> Nested Loop (cost=128.46..597.73 rows=1268 width=44) (actual time=3.378..108.915 rows=61582 loops=1)
-> Nested Loop (cost=2.28..39.86 rows=1 width=28) (actual time=0.026..0.216 rows=7 loops=1)
-> Index Scan using index_websites_on_parent_id on websites w (cost=0.14..15.08 rows=4 width=28) (actual time=0.004..0.023 rows=7 loops=1)
Index Cond: (parent_id IS NOT NULL)
-> Bitmap Heap Scan on accounts a (cost=2.15..6.18 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=7)
Recheck Cond: (id = w.account_id)
Filter: ((amount > '0'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) <= '19'::numeric))
Heap Blocks: exact=7
-> Bitmap Index Scan on accounts_pkey (cost=0.00..2.15 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=7)
Index Cond: (id = w.account_id)
-> Bitmap Heap Scan on positions p (cost=126.18..511.57 rows=4631 width=16) (actual time=0.994..8.226 rows=8797 loops=7)
Recheck Cond: (website_id = w.parent_id)
Heap Blocks: exact=1004
-> Bitmap Index Scan on index_positions_on_5_columns (cost=0.00..125.02 rows=4631 width=0) (actual time=0.965..0.965 rows=8797 loops=7)
Index Cond: (website_id = w.parent_id)
-> Sort (cost=18.26..18.92 rows=264 width=4) (actual time=0.106..1013.966 rows=10891487 loops=1)
Sort Key: k.website_id
Sort Method: quicksort Memory: 37kB
-> Seq Scan on keywords k (cost=0.00..7.64 rows=264 width=4) (actual time=0.005..0.039 rows=263 loops=1)
Planning time: 1.081 ms
Execution time: 49184.222 ms
The thing is, when I run the query with w.id instead of w.parent_id in the positions join, the total cost decreases to:
Unique (cost=3621.07..3804.99 rows=264 width=40) (actual time=128.430..139.550 rows=259 loops=1)
-> Sort (cost=3621.07..3713.03 rows=36784 width=40) (actual time=128.429..135.444 rows=40385 loops=1)
Sort Key: p.keyword_id, p.created_at DESC
Sort Method: external sort Disk: 2000kB
-> Merge Join (cost=128.73..831.59 rows=36784 width=40) (actual time=25.521..63.299 rows=40385 loops=1)
Merge Cond: (k.website_id = w.id)
-> Index Only Scan using index_keywords_on_website_id_deleted_at on keywords k (cost=0.27..24.23 rows=264 width=4) (actual time=0.137..0.274 rows=263 loops=1)
Heap Fetches: 156
-> Materialize (cost=128.46..606.85 rows=1268 width=44) (actual time=3.772..49.587 rows=72242 loops=1)
-> Nested Loop (cost=128.46..603.68 rows=1268 width=44) (actual time=3.769..30.530 rows=61582 loops=1)
-> Nested Loop (cost=2.28..45.80 rows=1 width=32) (actual time=0.047..0.204 rows=7 loops=1)
-> Index Scan using websites_pkey on websites w (cost=0.14..21.03 rows=4 width=32) (actual time=0.007..0.026 rows=7 loops=1)
Filter: (parent_id IS NOT NULL)
Rows Removed by Filter: 4
-> Bitmap Heap Scan on accounts a (cost=2.15..6.18 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=7)
Recheck Cond: (id = w.account_id)
Filter: ((amount > '0'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) <= '19'::numeric))
Heap Blocks: exact=7
-> Bitmap Index Scan on accounts_pkey (cost=0.00..2.15 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=7)
Index Cond: (id = w.account_id)
-> Bitmap Heap Scan on positions p (cost=126.18..511.57 rows=4631 width=16) (actual time=0.930..2.341 rows=8797 loops=7)
Recheck Cond: (website_id = w.parent_id)
Heap Blocks: exact=1004
-> Bitmap Index Scan on index_positions_on_5_columns (cost=0.00..125.02 rows=4631 width=0) (actual time=0.906..0.906 rows=8797 loops=7)
Index Cond: (website_id = w.parent_id)
Planning time: 1.124 ms
Execution time: 157.167 ms
Indexes on websites
Indexes:
"websites_pkey" PRIMARY KEY, btree (id)
"index_websites_on_account_id" btree (account_id)
"index_websites_on_deleted_at" btree (deleted_at)
"index_websites_on_domain_id" btree (domain_id)
"index_websites_on_parent_id" btree (parent_id)
"index_websites_on_processed_at" btree (processed_at)
Indexes on positions
Indexes:
"positions_pkey" PRIMARY KEY, btree (id)
"index_positions_on_5_columns" UNIQUE, btree (website_id, keyword_id, created_at, engine_id, region_id)
"overlap_index" btree (keyword_id, created_at)
The second EXPLAIN output shows more than 200 times fewer rows, so it is hardly surprising that sorting is much faster.
You will notice that the sort spills to disk in both cases (Sort Method: external merge Disk: ...kB). If you can keep the sort in memory by raising work_mem, it will be much faster.
But the first sort is so large that you won't be able to fit it in memory.
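As a rough sketch of the work_mem suggestion, the setting can be raised for the current session only and the plan checked again. The value '64MB' is purely illustrative, not a recommendation for this workload.

```sql
-- Inspect the current per-operation memory limit.
SHOW work_mem;

-- Raise it for this session only; '64MB' is an assumed, illustrative value.
SET work_mem = '64MB';

-- Re-run the query under EXPLAIN (ANALYZE) here. If the sort now fits in memory,
-- the Sort node reports "Sort Method: quicksort" instead of an external sort.

-- Return to the server default afterwards.
RESET work_mem;
```

Since work_mem applies per sort or hash operation and per session, setting it locally like this is usually safer than raising it globally.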
Ideas to speed up the query:
An index on (keyword_id, created_at) for positions. Not sure if that helps though.
Do the filtering first, like this:
SELECT
a.id AS account_id,
w.parent_id AS parent_id,
w.name AS name,
p.position AS position
FROM (SELECT DISTINCT ON (keyword_id)
position,
website_id,
keyword_id,
created_at
FROM positions
ORDER BY keyword_id, created_at DESC) p
JOIN ...
WHERE ...
ORDER BY p.keyword_id, p.created_at DESC;
Remark: The DISTINCT ON is somewhat strange, since you do not ORDER BY the values of the SELECT list, so the result values are not well defined.
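For the index idea above, note that the index listing in the question already shows overlap_index on (keyword_id, created_at), with both columns ascending. A B-tree can be read forwards or backwards, but a mixed ordering such as keyword_id ASC, created_at DESC needs an index declared with that ordering. A sketch, with a hypothetical index name:

```sql
-- Hypothetical name; column order and direction match
-- ORDER BY keyword_id, created_at DESC in the DISTINCT ON subquery.
CREATE INDEX index_positions_on_keyword_id_created_at_desc
    ON positions (keyword_id, created_at DESC);
```

Whether the planner actually uses it for the DISTINCT ON step depends on the data, so it is worth confirming with EXPLAIN (ANALYZE) before keeping the index.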

Optimize the query (maybe avoid nested loop)

How can we optimize the following query:
select *
from program_infos pi
join endeavour_organizations seller_organization on seller_organization.id = pi.supplier_id
join endeavour_organizations obligor_organization on obligor_organization.id = pi.buyer_id
join invoices i on pi.program_id = i.program_id
join assets fa on fa.invoice_id = i.id and fa.owner_id=pi.fi_id
join assets sa on sa.invoice_id = i.id and sa.owner_id=pi.supplier_id;
The corresponding EXPLAIN ANALYZE output is:
Nested Loop (cost=36.94..70919.65 rows=505 width=793) (actual time=0.263..1729.519 rows=267238 loops=1)
-> Nested Loop (cost=36.79..70806.58 rows=505 width=718) (actual time=0.261..1405.417 rows=267238 loops=1)
Join Filter: ((i.id = fa.invoice_id) AND (pi.fi_id = fa.owner_id))
Rows Removed by Join Filter: 400287
-> Hash Join (cost=36.37..69201.99 rows=2567 width=626) (actual time=0.255..772.895 rows=248735 loops=1)
Hash Cond: (pi.supplier_id = seller_organization.id)
-> Hash Join (cost=27.52..68973.45 rows=15977 width=551) (actual time=0.202..672.442 rows=248735 loops=1)
Hash Cond: ((sa.owner_id = pi.supplier_id) AND (i.program_id = pi.program_id))
-> Merge Join (cost=1.29..63781.02 rows=667525 width=288) (actual time=0.021..496.274 rows=667525 loops=1)
Merge Cond: (i.id = sa.invoice_id)
-> Index Scan using invoices_pkey on invoices i (cost=0.42..27363.52 rows=249447 width=196) (actual time=0.004..60.598 rows=249440 loops=1)
-> Index Scan using index_assets_invoice on assets sa (cost=0.42..27450.72 rows=667525 width=92) (actual time=0.014..147.276 rows=667525 loops=1)
-> Hash (cost=20.09..20.09 rows=409 width=263) (actual time=0.176..0.176 rows=409 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 133kB
-> Seq Scan on program_infos pi (cost=0.00..20.09 rows=409 width=263) (actual time=0.001..0.064 rows=409 loops=1)
-> Hash (cost=5.60..5.60 rows=260 width=75) (actual time=0.049..0.049 rows=260 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 27kB
-> Seq Scan on endeavour_organizations seller_organization (cost=0.00..5.60 rows=260 width=75) (actual time=0.006..0.019 rows=260 loops=1)
-> Index Scan using index_assets_owner_invoice on assets fa (cost=0.42..0.57 rows=4 width=92) (actual time=0.001..0.002 rows=3 loops=248735)
Index Cond: (invoice_id = sa.invoice_id)
-> Index Scan using endeavour_organizations_pkey on endeavour_organizations obligor_organization (cost=0.15..0.21 rows=1 width=75) (actual time=0.001..0.001 rows=1 loops=267238)
Index Cond: (id = pi.buyer_id)
Planning time: 3.194 ms
Execution time: 1740.875 ms
(24 rows)
Indexes are on pi.program_id, pi.fi_id, pi.supplier_id, asset.invoice_id, i.program_id
I am not able to understand why it's doing a nested loop. Please let me know if anything else is needed.