postgres query optimisation to avoid hash right join

I have a Postgres query that left joins a couple of tables. It runs for hours and causes issues. When I run EXPLAIN ANALYSE, I see that most of the time is spent in one of the left joins, for which the optimiser selects a Hash Right Join. When I use an inner join instead and run EXPLAIN ANALYSE, the optimiser selects a different plan and the query finishes in minutes.
I have to use a left join, because with an inner join some data would be excluded.
How should I rewrite the query to avoid this Hash Right Join?
Many thanks in advance!
Links to the query plans are attached above. I am using PostgreSQL 12.11 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit.
WITH memberships AS (
SELECT customer_sk
, membership_sk
, membership_state
, membership_b2b_type
, membership_sml_type
, membership_start_date
, membership_end_date
, membership_pause_from
, membership_pause_to
, covid_pause_start_date
, covid_pause_end_date
, city_sk AS membership_city_region_sk
, sport_persona_current
, membership_cancellation_reason
, membership_sequence_nr_reverse
, company_sk
, company_name
FROM dwh.fact_membership
WHERE membership_is_urban_sports IS TRUE
),
-- Data preparation
request_cancellation AS (
SELECT membership_sk,
requested_cancellation_last_date
FROM staging.request_cancellation
),
blacklisted_emails AS (
SELECT customer_sk, email, 'blacklisted' AS blacklisted
FROM dwh_userdata.blacklist_emails
),
nonanon_customer AS (
SELECT id
, first_name
, last_name
, email
FROM dwh_userdata.customer
),
nonanon_customer_address_prep AS (
SELECT customer_id
, city
, state
, country
, zip
, row_number() over (partition by customer_id order by created_at desc) as row_number
FROM dwh_userdata.customer_address
),
nonanon_customer_address AS (
SELECT *
FROM nonanon_customer_address_prep
WHERE row_number = 1
),
favorite_sport_category_prep_1 AS (
SELECT membership_sk
, service_top_category_name
, count(DISTINCT booking_sk) as cnt_booking
FROM dwh.report_venue_visitors
WHERE booking_is_valid
GROUP BY 1, 2
),
favorite_sport_category_prep_2 AS (
SELECT membership_sk
, service_top_category_name
, cnt_booking
, row_number()
over (partition by membership_sk order by cnt_booking DESC,service_top_category_name ) AS row_number
FROM favorite_sport_category_prep_1
),
favorite_sport_category AS (
SELECT membership_sk
, service_top_category_name AS favourite_sport_category
, cnt_booking
FROM favorite_sport_category_prep_2
WHERE row_number = 1
),
free_trial AS (
select distinct membership_sk
, customer_sk
, trial_status AS free_trial_status
, trial AS free_trial_length
, trial_start_date AS free_trial_start
, trial_end_date AS free_trial_end
FROM dwh.report_memberships
WHERE trial_status IS NOT NULL
and trial_start_date >= '2020-06-23'
)
-- #### OUTPUT TABLE
SELECT c.customer_sk AS named_user
, CASE WHEN c.gender IN ('M', 'F') THEN c.gender ELSE NULL END AS gender
, nc.first_name
, nc.last_name
, customer_language
, anss.state AS newsletter_status
, dl.city_name AS membership_city_region
, dl.country_code AS membership_country_code
, dl.country_name AS membership_country_name
, dl.admin1 AS membership_administrative_state
, m.membership_sk
, m.membership_state
, m.membership_b2b_type
, m.company_sk
, m.company_name
, m.membership_sml_type
, CASE
WHEN m.membership_start_date IS NOT NULL
THEN CONCAT(TO_CHAR(m.membership_start_date, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS membership_start_date
, CASE
WHEN m.membership_end_date IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_end_date, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS membership_end_date
, ft.free_trial_status
, ft.free_trial_length
, CASE
WHEN ft.free_trial_start IS NOT NULL THEN CONCAT(TO_CHAR(ft.free_trial_start, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS free_trial_start
, CASE
WHEN ft.free_trial_end IS NOT NULL THEN CONCAT(TO_CHAR(ft.free_trial_end, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS free_trial_end
, CASE
WHEN m.membership_pause_from IS NOT NULL
THEN CONCAT(TO_CHAR(m.membership_pause_from, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS membership_pause_from
, CASE
WHEN m.membership_pause_to IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_pause_to, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS membership_pause_to
, CASE
WHEN m.covid_pause_start_date IS NOT NULL THEN CONCAT(TO_CHAR(m.covid_pause_start_date, 'YYYY-MM-DD'),
'T00:00:00')
ELSE NULL END AS covid_pause_start_date
, CASE
WHEN m.covid_pause_end_date IS NOT NULL
THEN CONCAT(TO_CHAR(m.covid_pause_end_date, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS covid_pause_end_date
, CASE
WHEN rc.requested_cancellation_last_date IS NOT NULL THEN CONCAT(
TO_CHAR(rc.requested_cancellation_last_date, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS requested_cancellation_last_date
, membership_cancellation_reason
, be.blacklisted AS blacklist_email
, fsc.favourite_sport_category AS fav_sports_category
, m.sport_persona_current
, ambd.membership_months_active
, ambd.membership_months_total
, ambd.is_gm1_positive
, ambd.cnt_bookings_total
, ambd.cnt_bookings_last_30_days_total
, ambd.cnt_bookings_last_30_days_onsite
, ambd.cnt_bookings_onsite
, ambd.cnt_bookings_online
, ambd.cnt_bookings_last_30_days_online
, CASE
WHEN ambd.latest_booking_date IS NOT NULL
THEN CONCAT(TO_CHAR(ambd.latest_booking_date, 'YYYY-MM-DD'), 'T00:00:00')
ELSE NULL END AS latest_booking_date
, ambd.avg_bookings_active_month
, ambd.last_checkin_type
, ambd.fav_sports_category_onsite
, ambd.fav_sports_category_online
, ambd.fav_studio_last_30_days
, ambd.fav_studio_group_website
FROM dwh.dim_customer c
LEFT JOIN nonanon_customer nc
ON nc.id = c.customer_sk
LEFT JOIN nonanon_customer_address nca
ON nca.customer_id = customer_sk
LEFT JOIN memberships m
ON c.customer_sk = m.customer_sk
AND membership_sequence_nr_reverse = 1
LEFT JOIN request_cancellation rc
ON m.membership_sk = rc.membership_sk
LEFT JOIN dwh.dim_location dl
ON m.membership_city_region_sk = dl.city_sk
LEFT JOIN blacklisted_emails be
ON be.email = nc.email
LEFT JOIN favorite_sport_category fsc
ON fsc.membership_sk = m.membership_sk
LEFT JOIN staging.airship_newsletter_subscription_status anss
ON anss.customer_id = c.customer_sk
LEFT JOIN free_trial ft
ON ft.customer_sk = m.customer_sk
LEFT JOIN staging.airship_membership_booking_details ambd
ON ambd.membership_sk = m.membership_sk
AND membership_sequence_nr_reverse = 1
WHERE be.blacklisted IS NULL
AND nc.email NOT LIKE '%delete%'
AND nc.email IS NOT NULL
AND ((m.membership_sk IS NULL AND anss.state = 'subscribed') OR membership_state IS NOT NULL)
Results of EXPLAIN ANALYSE:
Hash Left Join (cost=6667580.77..6764370.56 rows=3256 width=692) (actual time=4319030.909..4328353.358 rows=518825 loops=1)
  Hash Cond: (fact_membership.customer_sk = ft.customer_sk)
  ->  Hash Left Join (cost=6663581.42..6759951.96 rows=3256 width=380) (actual time=4318059.369..4324841.032 rows=518825 loops=1)
        Hash Cond: (fact_membership.membership_sk = ambd.membership_sk)
        Join Filter: (fact_membership.membership_sequence_nr_reverse = 1)
        ->  Hash Left Join (cost=6655261.78..6748793.03 rows=3256 width=242) (actual time=4317733.942..4323056.862 rows=518825 loops=1)
              Hash Cond: (c.customer_sk = anss.customer_id)
              Filter: (((fact_membership.membership_sk IS NULL) AND (anss.state = 'subscribed'::text)) OR (fact_membership.membership_state IS NOT NULL))
              Rows Removed by Filter: 129098
              ->  Merge Left Join (cost=6642237.84..6733674.25 rows=3256 width=227) (actual time=4317378.943..4321020.832 rows=647923 loops=1)
                    Merge Cond: (fact_membership.membership_sk = favorite_sport_category_prep_2.membership_sk)
                    ->  Sort (cost=167496.47..167504.61 rows=3256 width=218) (actual time=4146517.144..4147134.144 rows=647923 loops=1)
                          Sort Key: fact_membership.membership_sk
                          Sort Method: external merge Disk: 82352kB
                          ->  Merge Left Join (cost=150681.68..167306.50 rows=3256 width=218) (actual time=4142397.925..4145027.017 rows=647923 loops=1)
                                Merge Cond: (c.customer_sk = nonanon_customer_address_prep.customer_id)
                                ->  Sort (cost=59476.20..59484.34 rows=3256 width=218) (actual time=4139725.733..4140241.833 rows=647923 loops=1)
                                      Sort Key: c.customer_sk
                                      Sort Method: external merge Disk: 82344kB
                                      ->  Hash Right Join (cost=52983.04..59286.23 rows=3256 width=218) (actual time=33403.336..4135281.108 rows=647923 loops=1)
                                            Hash Cond: (request_cancellation.membership_sk = fact_membership.membership_sk)
                                            ->  Seq Scan on request_cancellation (cost=0.00..5128.40 rows=308340 width=8) (actual time=1.160..228.691 rows=308340 loops=1)
                                            ->  Hash (cost=52942.34..52942.34 rows=3256 width=214) (actual time=30038.787..30048.670 rows=647923 loops=1)
                                                  Buckets: 65536 (originally 4096) Batches: 131072 (originally 1) Memory Usage: 10511kB
                                                  ->  Gather (cost=1064.24..52942.34 rows=3256 width=214) (actual time=11.564..12621.194 rows=647923 loops=1)
                                                        Workers Planned: 2
                                                        Workers Launched: 2
                                                        ->  Hash Left Join (cost=64.24..51616.74 rows=1357 width=214) (actual time=5.510..22450.906 rows=215974 loops=3)
                                                              Hash Cond: (fact_membership.city_sk = dl.city_sk)
                                                              ->  Nested Loop Left Join (cost=59.79..51608.59 rows=1357 width=191) (actual time=5.239..22013.464 rows=215974 loops=3)
                                                                    ->  Nested Loop (cost=59.37..50428.72 rows=1357 width=60) (actual time=4.923..6958.191 rows=215974 loops=3)
                                                                          ->  Hash Left Join (cost=58.94..49440.62 rows=1357 width=55) (actual time=3.419..2000.407 rows=215976 loops=3)
                                                                                Hash Cond: ((customer.email)::text = blacklist_emails.email)
                                                                                Filter: (('blacklisted'::text) IS NULL)
                                                                                Rows Removed by Filter: 122
                                                                                ->  Parallel Seq Scan on customer (cost=0.00..46660.28 rows=271334 width=46) (actual time=0.999..1668.668 rows=216091 loops=3)
                                                                                      Filter: ((email IS NOT NULL) AND ((email)::text !~~ '%delete%'::text))
                                                                                      Rows Removed by Filter: 3191
                                                                                ->  Hash (cost=34.53..34.53 rows=1953 width=54) (actual time=2.222..2.226 rows=1953 loops=3)
                                                                                      Buckets: 2048 Batches: 1 Memory Usage: 144kB
                                                                                      ->  Seq Scan on blacklist_emails (cost=0.00..34.53 rows=1953 width=54) (actual time=0.263..1.207 rows=1953 loops=3)
                                                                          ->  Index Scan using customer_pk on dim_customer c (cost=0.42..0.73 rows=1 width=13) (actual time=0.020..0.020 rows=1 loops=647929)
                                                                                Index Cond: (customer_sk = customer.id)
                                                                    ->  Index Scan using dwh_fact_membership_3b307128 on fact_membership (cost=0.42..0.86 rows=1 width=131) (actual time=0.066..0.067 rows=1 loops=647923)
                                                                          Index Cond: (customer_sk = c.customer_sk)
                                                                          Filter: ((membership_is_urban_sports IS TRUE) AND (membership_sequence_nr_reverse = 1))
                                                                          Rows Removed by Filter: 0
                                                              ->  Hash (cost=3.09..3.09 rows=109 width=35) (actual time=0.148..0.214 rows=109 loops=3)
                                                                    Buckets: 1024 Batches: 1 Memory Usage: 16kB
                                                                    ->  Seq Scan on dim_location dl (cost=0.00..3.09 rows=109 width=35) (actual time=0.031..0.098 rows=109 loops=3)
                                ->  Materialize (cost=91205.48..107807.50 rows=2553 width=4) (actual time=2668.900..3946.682 rows=470415 loops=1)
                                      ->  Subquery Scan on nonanon_customer_address_prep (cost=91205.48..107801.12 rows=2553 width=4) (actual time=2666.188..3647.463 rows=470415 loops=1)
                                            Filter: (nonanon_customer_address_prep.row_number = 1)
                                            Rows Removed by Filter: 40218
                                            ->  WindowAgg (cost=91205.48..101418.18 rows=510635 width=148) (actual time=2664.902..3526.361 rows=510633 loops=1)
                                                  ->  Sort (cost=91205.48..92482.07 rows=510635 width=12) (actual time=2664.083..2833.676 rows=510634 loops=1)
                                                        Sort Key: customer_address.customer_id, customer_address.created_at DESC
                                                        Sort Method: external merge Disk: 13032kB
                                                        ->  Seq Scan on customer_address (cost=0.00..34063.35 rows=510635 width=12) (actual time=4.596..1522.444 rows=510635 loops=1)
                    ->  Materialize (cost=6474741.37..6566128.10 rows=13051 width=13) (actual time=170857.053..173215.019 rows=465703 loops=1)
                          ->  Subquery Scan on favorite_sport_category_prep_2 (cost=6474741.37..6566095.47 rows=13051 width=13) (actual time=170855.731..173002.743 rows=465703 loops=1)
                                Filter: (favorite_sport_category_prep_2.row_number = 1)
                                Rows Removed by Filter: 1343535
                                ->  WindowAgg (cost=6474741.37..6533469.01 rows=2610117 width=29) (actual time=170854.901..172755.674 rows=1809238 loops=1)
                                      ->  Sort (cost=6474741.37..6481266.67 rows=2610117 width=21) (actual time=170853.124..171205.257 rows=1809238 loops=1)
                                            Sort Key: report_venue_visitors.membership_sk, (count(DISTINCT report_venue_visitors.booking_sk)) DESC, report_venue_visitors.service_top_category_name
                                            Sort Method: external merge Disk: 63696kB
                                            ->  GroupAggregate (cost=5839877.44..6063400.07 rows=2610117 width=21) (actual time=154838.978..169250.761 rows=1809238 loops=1)
                                                  Group Key: report_venue_visitors.membership_sk, report_venue_visitors.service_top_category_name
                                                  ->  Sort (cost=5839877.44..5889232.80 rows=19742146 width=21) (actual time=154835.761..158654.645 rows=19827987 loops=1)
                                                        Sort Key: report_venue_visitors.membership_sk, report_venue_visitors.service_top_category_name
                                                        Sort Method: external merge Disk: 694120kB
                                                        ->  Seq Scan on report_venue_visitors (cost=0.00..2233036.56 rows=19742146 width=21) (actual time=1.868..117392.591 rows=19827987 loops=1)
                                                              Filter: booking_is_valid
                                                              Rows Removed by Filter: 6441170
              ->  Hash (cost=7199.42..7199.42 rows=317242 width=19) (actual time=352.386..352.386 rows=317242 loops=1)
                    Buckets: 65536 Batches: 8 Memory Usage: 2606kB
                    ->  Seq Scan on airship_newsletter_subscription_status anss (cost=0.00..7199.42 rows=317242 width=19) (actual time=1.120..154.407 rows=317242 loops=1)
        ->  Hash (cost=4207.06..4207.06 rows=121006 width=150) (actual time=320.770..320.771 rows=121006 loops=1)
              Buckets: 32768 Batches: 8 Memory Usage: 3111kB
              ->  Seq Scan on airship_membership_booking_details ambd (cost=0.00..4207.06 rows=121006 width=150) (actual time=1.446..107.525 rows=121006 loops=1)
  ->  Hash (cost=3993.93..3993.93 rows=434 width=26) (actual time=951.259..951.264 rows=26392 loops=1)
        Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1760kB
        ->  Subquery Scan on ft (cost=3981.99..3993.93 rows=434 width=26) (actual time=857.944..888.163 rows=26392 loops=1)
              ->  Unique (cost=3981.99..3989.59 rows=434 width=30) (actual time=857.288..878.098 rows=26392 loops=1)
                    ->  Sort (cost=3981.99..3983.08 rows=434 width=30) (actual time=856.675..863.298 rows=26392 loops=1)
                          Sort Key: report_memberships.membership_sk, report_memberships.customer_sk, report_memberships.trial_status, report_memberships.trial, report_memberships.trial_start_date, report_memberships.trial_end_date
                          Sort Method: quicksort Memory: 2830kB
                          ->  Bitmap Heap Scan on report_memberships (cost=2256.96..3962.98 rows=434 width=30) (actual time=102.229..817.152 rows=26392 loops=1)
                                Recheck Cond: ((trial_start_date >= '2020-06-23'::date) AND (trial_status IS NOT NULL))
                                Heap Blocks: exact=1383
                                ->  BitmapAnd (cost=2256.96..2256.96 rows=434 width=0) (actual time=99.478..99.479 rows=0 loops=1)
                                      ->  Bitmap Index Scan on dwh_report_memberships_bc76fe51 (cost=0.00..578.02 rows=31145 width=0) (actual time=7.497..7.497 rows=26392 loops=1)
                                            Index Cond: (trial_start_date >= '2020-06-23'::date)
                                      ->  Bitmap Index Scan on dwh_report_memberships_35525e76 (cost=0.00..1678.48 rows=90406 width=0) (actual time=91.704..91.704 rows=92029 loops=1)
                                            Index Cond: (trial_status IS NOT NULL)
Planning Time: 7.850 ms
Execution Time: 4328700.854 ms
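A hedged reading of the plan above, not a verified diagnosis: nearly all of the runtime is in the Hash Right Join on request_cancellation, whose actual time runs from about 33 s to about 4135 s. The Hash node under it was estimated at 3256 rows but received 647923, and it grew from 1 to 131072 batches. The underestimate starts at the Hash Left Join against blacklist_emails: its Filter (('blacklisted'::text) IS NULL) cuts the per-worker estimate from 271334 to 1357 rows, which is 0.5% of it. Because 'blacklisted' is a constant coming from the blacklisted_emails CTE rather than a column with statistics, that figure looks like a fixed default guess, and the plan keeps an ordinary left join plus filter instead of an anti join. One rewrite worth trying expresses the blacklist check as NOT EXISTS, which PostgreSQL can plan as an anti join estimated from the statistics on the email columns. The sketch covers only the FROM/WHERE part of the query and reuses the names from the question; it has not been run against this data, so whether it actually removes the Hash Right Join is untested.

```sql
-- Hedged sketch: the FROM/WHERE of the query above, unchanged except that
-- "LEFT JOIN blacklisted_emails be ... WHERE be.blacklisted IS NULL"
-- is replaced by NOT EXISTS, which the planner can turn into an anti join.
-- In the SELECT list, also change "be.blacklisted AS blacklist_email" to
-- "NULL::text AS blacklist_email": after the old filter it was always NULL.
-- The blacklisted_emails CTE is then unused and can be deleted.
FROM dwh.dim_customer c
LEFT JOIN nonanon_customer nc
    ON nc.id = c.customer_sk
LEFT JOIN nonanon_customer_address nca
    ON nca.customer_id = c.customer_sk
LEFT JOIN memberships m
    ON c.customer_sk = m.customer_sk
   AND m.membership_sequence_nr_reverse = 1
LEFT JOIN request_cancellation rc
    ON m.membership_sk = rc.membership_sk
LEFT JOIN dwh.dim_location dl
    ON m.membership_city_region_sk = dl.city_sk
LEFT JOIN favorite_sport_category fsc
    ON fsc.membership_sk = m.membership_sk
LEFT JOIN staging.airship_newsletter_subscription_status anss
    ON anss.customer_id = c.customer_sk
LEFT JOIN free_trial ft
    ON ft.customer_sk = m.customer_sk
LEFT JOIN staging.airship_membership_booking_details ambd
    ON ambd.membership_sk = m.membership_sk
   AND m.membership_sequence_nr_reverse = 1
WHERE NOT EXISTS (              -- anti join: keep customers whose email is not blacklisted
          SELECT 1
          FROM dwh_userdata.blacklist_emails b
          WHERE b.email = nc.email
      )
  AND nc.email NOT LIKE '%delete%'
  AND nc.email IS NOT NULL
  AND ((m.membership_sk IS NULL AND anss.state = 'subscribed') OR m.membership_state IS NOT NULL)
```

Comparing EXPLAIN ANALYSE of this version with the original should show whether the estimate above the blacklist check moves closer to the real 647923 rows.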

Related

How to optimize this PostgreSQL query

I have a question about optimizing the speed of a PostgreSQL query.
I added indexes and that sped up some queries, but I don't understand why this one is still slow.
In this explain (https://explain.dalibo.com/plan/8ace3g496e6112f5), I can see that the problems are the Index Scans on User and Location, but those tables have indexes and the query is still very slow.
I don't know why my tables or indexes can't be kept fully in memory.
Query Text: SELECT "mygreatapp"."User"."id", "mygreatapp"."User"."createdAt", "mygreatapp"."User"."updatedAt", "mygreatapp"."User"."lastLogin", "mygreatapp"."User"."signupCompleted", "mygreatapp"."User"."visible", "mygreatapp"."User"."deleted", "mygreatapp"."User"."pushNotificationsToken", "mygreatapp"."User"."email", "mygreatapp"."User"."password", "mygreatapp"."User"."facebookUserId", "mygreatapp"."User"."name", "mygreatapp"."User"."birthday", "mygreatapp"."User"."birthdayString", "mygreatapp"."User"."bmi", "mygreatapp"."User"."height", "mygreatapp"."User"."weight", "mygreatapp"."User"."children", "mygreatapp"."User"."locale", "mygreatapp"."User"."timeZone", "mygreatapp"."User"."newMatchNotification", "mygreatapp"."User"."messageNotification", "mygreatapp"."User"."suggestionNotification", "mygreatapp"."User"."likeScore", "mygreatapp"."User"."likeNotificationFrequency", "mygreatapp"."User"."matchNotificationFrequency", "mygreatapp"."User"."messageNotificationFrequency", "mygreatapp"."User"."moderated", "mygreatapp"."User"."moderationDate", "mygreatapp"."User"."moderatedBy", "mygreatapp"."User"."isGrazer", "mygreatapp"."User"."blockedUntil", "mygreatapp"."User"."blockedBy", "mygreatapp"."User"."gender", "mygreatapp"."User"."emailStatus", "mygreatapp"."User"."emailOptIn", "mygreatapp"."User"."authProvider", "mygreatapp"."User"."role", "mygreatapp"."User"."drinking", "mygreatapp"."User"."eyesColor", "mygreatapp"."User"."hairColor", "mygreatapp"."User"."studyLevel", "mygreatapp"."User"."smoking", "mygreatapp"."User"."ethnia", "mygreatapp"."User"."religion", "mygreatapp"."User"."maritalSituation", "mygreatapp"."User"."conjugalSituation", "mygreatapp"."User"."blocked", "mygreatapp"."User"."lastLocationId", "mygreatapp"."User"."hiddenUniverses", "mygreatapp"."User"."jwtVersion", "mygreatapp"."User"."ipAddress", "mygreatapp"."User"."ipAddressCountry", "mygreatapp"."User"."ipAddressInEurope", "mygreatapp"."User"."ipAddressScore", "mygreatapp"."User"."phoneOperator" FROM 
"mygreatapp"."User" WHERE ("mygreatapp"."User"."signupCompleted" = $1 AND "mygreatapp"."User"."visible" = $2 AND "mygreatapp"."User"."blocked" IS NULL AND "mygreatapp"."User"."deleted" = $3 AND "mygreatapp"."User"."id" <> $4 AND ("mygreatapp"."User"."id") NOT IN (SELECT "t0"."id" FROM "mygreatapp"."User" AS "t0" INNER JOIN "mygreatapp"."Like" AS "j0" ON ("j0"."userId") = ("t0"."id") WHERE ("j0"."likedUserId" = $5 AND "j0"."status" = $6 AND "t0"."id" IS NOT NULL)) AND ("mygreatapp"."User"."id") NOT IN (SELECT "t0"."id" FROM "mygreatapp"."User" AS "t0" INNER JOIN "mygreatapp"."Like" AS "j0" ON ("j0"."likedUserId") = ("t0"."id") WHERE ((("j0"."userId" = $7 AND "j0"."status" = $8) OR ("j0"."universe" = $9 AND "j0"."userId" = $10)) AND "t0"."id" IS NOT NULL)) AND ("mygreatapp"."User"."id") IN (SELECT "t0"."id" FROM "mygreatapp"."User" AS "t0" INNER JOIN "mygreatapp"."UniverseOnUser" AS "j0" ON ("j0"."userId") = ("t0"."id") WHERE (("j0"."userId","j0"."universeId") IN (SELECT "t1"."userId", "t1"."universeId" FROM "mygreatapp"."UniverseOnUser" AS "t1" INNER JOIN "mygreatapp"."Universe" AS "j1" ON ("j1"."id") = ("t1"."universeId") WHERE ("j1"."id" = $11 AND "t1"."userId" IS NOT NULL AND "t1"."universeId" IS NOT NULL)) AND "j0"."relation" = $12 AND "t0"."id" IS NOT NULL)) AND "mygreatapp"."User"."gender" IN ($13,$14) AND "mygreatapp"."User"."religion" IN ($15,$16,$17,$18,$19,$20,$21,$22,$23) AND ("mygreatapp"."User"."id") NOT IN (SELECT "t0"."id" FROM "mygreatapp"."User" AS "t0" INNER JOIN "mygreatapp"."LocationOnUser" AS "j0" ON ("j0"."userId") = ("t0"."id") WHERE ("j0"."exclude" = $24 AND ("j0"."id") IN (SELECT "t1"."id" FROM "mygreatapp"."LocationOnUser" AS "t1" INNER JOIN "mygreatapp"."Bound" AS "j1" ON ("j1"."id") = ("t1"."boundId") WHERE ("j1"."maxLng" >= $25 AND "j1"."minLng" <= $26 AND "j1"."maxLat" >= $27 AND "j1"."minLat" <= $28 AND "t1"."id" IS NOT NULL)) AND "t0"."id" IS NOT NULL)) AND ("mygreatapp"."User"."id") IN (SELECT "t0"."id" FROM "mygreatapp"."User" AS 
"t0" INNER JOIN "mygreatapp"."Location" AS "j0" ON ("j0"."id") = ("t0"."lastLocationId") WHERE (((("j0"."latitude" <= $29 AND "j0"."latitude" >= $30) AND ("j0"."longitude" <= $31 AND "j0"."longitude" >= $32)) OR (("j0"."latitude" <= $33 AND "j0"."latitude" >= $34) AND ("j0"."longitude" <= $35 AND "j0"."longitude" >= $36)) OR (("j0"."latitude" <= $37 AND "j0"."latitude" >= $38) AND ("j0"."longitude" <= $39 AND "j0"."longitude" >= $40))) AND "t0"."id" IS NOT NULL))) ORDER BY "mygreatapp"."User"."id" ASC LIMIT $41 OFFSET $42
Limit (cost=6109.21..6109.24 rows=13 width=652) (actual time=482.978..482.993 rows=18 loops=1)
  ->  Sort (cost=6109.21..6109.24 rows=13 width=652) (actual time=482.976..482.990 rows=18 loops=1)
        Sort Key: "User".id
        Sort Method: quicksort Memory: 36kB
        ->  Nested Loop Semi Join (cost=1947.03..6108.97 rows=13 width=652) (actual time=80.857..482.921 rows=18 loops=1)
              Join Filter: ("User".id = j0."userId")
              ->  Nested Loop Semi Join (cost=1946.16..4945.48 rows=231 width=679) (actual time=1.442..386.430 rows=1704 loops=1)
                    ->  Bitmap Heap Scan on "User" (cost=1945.46..2700.02 rows=476 width=652) (actual time=1.353..7.950 rows=4281 loops=1)
                          Recheck Cond: ((blocked IS NULL) AND (gender = ANY ('{MALE,FEMALE}'::"Gender"[])) AND (religion = ANY ('{BUDDHIST,CATHOLIC,HINDU,JEW,MUSLIM,NONE,PROTESTANT,ORTHODOX,ORTHODOX}'::"Religion"[])))
                          Filter: ("signupCompleted" AND visible AND (NOT deleted) AND (id <> 'ckki8m4y81jew0792sw40q81w'::text) AND (NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (NOT (hashed SubPlan 3)))
                          Rows Removed by Filter: 56
                          Heap Blocks: exact=621
                          ->  Bitmap Index Scan on "User.signupCompleted_visible_blocked_deleted_gender_religion_in" (cost=0.00..129.45 rows=3809 width=0) (actual time=0.227..0.227 rows=4655 loops=1)
                                Index Cond: (("signupCompleted" = true) AND (visible = true) AND (blocked IS NULL) AND (deleted = false) AND (gender = ANY ('{MALE,FEMALE}'::"Gender"[])) AND (religion = ANY ('{BUDDHIST,CATHOLIC,HINDU,JEW,MUSLIM,NONE,PROTESTANT,ORTHODOX,ORTHODOX}'::"Religion"[])))
                          SubPlan 1
                            ->  Nested Loop (cost=25.28..37.31 rows=1 width=27) (actual time=0.178..0.610 rows=59 loops=1)
                                  ->  Bitmap Heap Scan on "Like" j0_2 (cost=24.99..29.01 rows=1 width=26) (actual time=0.163..0.239 rows=59 loops=1)
                                        Recheck Cond: (("likedUserId" = 'ckki8m4y81jew0792sw40q81w'::text) AND (status = 'LIKED'::"LikeStatus"))
                                        Heap Blocks: exact=58
                                        ->  BitmapAnd (cost=24.99..24.99 rows=1 width=0) (actual time=0.154..0.155 rows=0 loops=1)
                                              ->  Bitmap Index Scan on "Like.likedUserId_index" (cost=0.00..4.71 rows=38 width=0) (actual time=0.014..0.014 rows=141 loops=1)
                                                    Index Cond: ("likedUserId" = 'ckki8m4y81jew0792sw40q81w'::text)
                                              ->  Bitmap Index Scan on "Like.status_universe_index" (cost=0.00..20.04 rows=1549 width=0) (actual time=0.132..0.132 rows=1559 loops=1)
                                                    Index Cond: (status = 'LIKED'::"LikeStatus")
                                  ->  Index Only Scan using "User.id_lastLogin_blocked_isGrazer_index" on "User" t0_2 (cost=0.29..8.30 rows=1 width=27) (actual time=0.005..0.005 rows=1 loops=59)
                                        Index Cond: ((id = j0_2."userId") AND (id IS NOT NULL))
                                        Heap Fetches: 59
                          SubPlan 2
                            ->  Nested Loop (cost=102.16..1611.05 rows=168 width=27) (actual time=0.054..0.284 rows=33 loops=1)
                                  ->  Bitmap Heap Scan on "Like" j0_3 (cost=101.87..667.81 rows=168 width=26) (actual time=0.042..0.083 rows=33 loops=1)
                                        Recheck Cond: ((("userId" = 'ckki8m4y81jew0792sw40q81w'::text) AND (status = 'LIKED'::"LikeStatus")) OR (("userId" = 'ckki8m4y81jew0792sw40q81w'::text) AND (universe = 'vegan'::text)))
                                        Heap Blocks: exact=30
                                        ->  BitmapOr (cost=101.87..101.87 rows=169 width=0) (actual time=0.035..0.036 rows=0 loops=1)
                                              ->  Bitmap Index Scan on "Like.userId_status_universe_index" (cost=0.00..4.84 rows=42 width=0) (actual time=0.011..0.011 rows=24 loops=1)
                                                    Index Cond: (("userId" = 'ckki8m4y81jew0792sw40q81w'::text) AND (status = 'LIKED'::"LikeStatus"))
                                              ->  Bitmap Index Scan on "Like.userId_status_universe_index" (cost=0.00..96.95 rows=127 width=0) (actual time=0.023..0.024 rows=9 loops=1)
                                                    Index Cond: (("userId" = 'ckki8m4y81jew0792sw40q81w'::text) AND (universe = 'vegan'::text))
                                  ->  Index Only Scan using "User_pkey" on "User" t0_3 (cost=0.29..5.61 rows=1 width=27) (actual time=0.005..0.005 rows=1 loops=33)
                                        Index Cond: ((id = j0_3."likedUserId") AND (id IS NOT NULL))
                                        Heap Fetches: 33
                          SubPlan 3
                            ->  Nested Loop (cost=74.27..167.09 rows=7 width=27) (actual time=0.103..0.106 rows=0 loops=1)
                                  ->  Nested Loop (cost=73.99..143.13 rows=7 width=27) (actual time=0.103..0.105 rows=0 loops=1)
                                        ->  HashAggregate (cost=73.70..75.32 rows=162 width=26) (actual time=0.102..0.105 rows=0 loops=1)
                                              Group Key: t1_1.id
                                              Batches: 1 Memory Usage: 40kB
                                              ->  Merge Join (cost=28.97..73.30 rows=162 width=26) (actual time=0.100..0.101 rows=0 loops=1)
                                                    Merge Cond: (t1_1."boundId" = j1_1.id)
                                                    ->  Index Scan using "LocationOnUser.boundId_unique" on "LocationOnUser" t1_1 (cost=0.29..778.03 rows=15178 width=53) (actual time=0.025..0.057 rows=46 loops=1)
                                                          Filter: (id IS NOT NULL)
                                                    ->  Sort (cost=28.68..28.71 rows=13 width=27) (actual time=0.027..0.028 rows=2 loops=1)
                                                          Sort Key: j1_1.id
                                                          Sort Method: quicksort Memory: 25kB
                                                          ->  Bitmap Heap Scan on "Bound" j1_1 (cost=11.19..28.44 rows=13 width=27) (actual time=0.020..0.021 rows=2 loops=1)
                                                                Recheck Cond: (("maxLng" >= '5.531'::double precision) AND ("minLng" <= '5.531'::double precision) AND ("maxLat" >= '49.15'::double precision) AND ("minLat" <= '49.15'::double precision))
                                                                Heap Blocks: exact=2
                                                                ->  Bitmap Index Scan on "Bound.maxLng_minLng_maxLat_minLat_index" (cost=0.00..11.19 rows=13 width=0) (actual time=0.017..0.017 rows=2 loops=1)
                                                                      Index Cond: (("maxLng" >= '5.531'::double precision) AND ("minLng" <= '5.531'::double precision) AND ("maxLat" >= '49.15'::double precision) AND ("minLat" <= '49.15'::double precision))
                                        ->  Index Scan using "LocationOnUser_pkey" on "LocationOnUser" j0_4 (cost=0.29..0.42 rows=1 width=53) (never executed)
                                              Index Cond: (id = t1_1.id)
                                              Filter: exclude
                                  ->  Index Only Scan using "User_pkey" on "User" t0_4 (cost=0.29..3.42 rows=1 width=27) (never executed)
                                        Index Cond: ((id = j0_4."userId") AND (id IS NOT NULL))
                                        Heap Fetches: 0
                    ->  Nested Loop (cost=0.70..4.71 rows=1 width=27) (actual time=0.088..0.088 rows=0 loops=4281)
                          ->  Index Scan using "User_pkey" on "User" t0_1 (cost=0.29..3.83 rows=1 width=53) (actual time=0.018..0.036 rows=1 loops=4281)
                                Index Cond: ((id = "User".id) AND (id IS NOT NULL))
                          ->  Index Scan using "Location_pkey" on "Location" j0_1 (cost=0.41..0.88 rows=1 width=26) (actual time=0.043..0.043 rows=0 loops=4281)
                                Index Cond: (id = t0_1."lastLocationId")
                                Filter: (((latitude <= '48.91064321183747'::double precision) AND (latitude >= '45.31335678816254'::double precision) AND (longitude <= '3.257852869245691'::double precision) AND (longitude >= '-2.025852869245691'::double precision)) OR ((latitude <= '50.94464321183747'::double precision) AND (latitude >= '47.34735678816255'::double precision) AND (longitude <= '8.155450105467445'::double precision) AND (longitude >= '2.658549894532557'::double precision)) OR ((latitude <= '50.94864321183746'::double precision) AND (latitude >= '47.35135678816253'::double precision) AND (longitude <= '8.279671659522354'::double precision) AND (longitude >= '2.782328340477646'::double precision)))
                                Rows Removed by Filter: 1
              ->  Nested Loop (cost=0.86..5.02 rows=1 width=79) (actual time=0.056..0.056 rows=0 loops=1704)
                    Join Filter: (j0."userId" = t0.id)
                    ->  Nested Loop Semi Join (cost=0.58..4.47 rows=1 width=52) (actual time=0.056..0.056 rows=0 loops=1704)
                          Join Filter: (j0."userId" = t1."userId")
                          ->  Index Scan using "UniverseOnUser.userId_index" on "UniverseOnUser" j0 (cost=0.29..1.59 rows=1 width=52) (actual time=0.050..0.050 rows=0 loops=1704)
                                Index Cond: ("userId" = t0_1.id)
                                Filter: ((relation = 'FRIENDSHIP'::"Relation") AND ("universeId" = 'cjzx66jhlh8gx0974trtqhkb4'::text))
                                Rows Removed by Filter: 7
                          ->  Nested Loop (cost=0.29..2.87 rows=1 width=52) (actual time=0.018..0.018 rows=1 loops=18)
                                ->  Index Scan using "UniverseOnUser.userId_index" on "UniverseOnUser" t1 (cost=0.29..1.59 rows=1 width=52) (actual time=0.010..0.010 rows=1 loops=18)
                                      Index Cond: (("userId" = t0_1.id) AND ("userId" IS NOT NULL))
                                      Filter: (("universeId" IS NOT NULL) AND ("universeId" = 'cjzx66jhlh8gx0974trtqhkb4'::text))
                                      Rows Removed by Filter: 4
                                ->  Seq Scan on "Universe" j1 (cost=0.00..1.26 rows=1 width=26) (actual time=0.004..0.004 rows=1 loops=18)
                                      Filter: (id = 'cjzx66jhlh8gx0974trtqhkb4'::text)
                                      Rows Removed by Filter: 6
                    ->  Index Only Scan using "User_pkey" on "User" t0 (cost=0.29..0.54 rows=1 width=27) (actual time=0.006..0.006 rows=1 loops=18)
                          Index Cond: ((id = t0_1.id) AND (id IS NOT NULL))
                          Heap Fetches: 18
Thanks a lot.
Regards

Your query is very large and reviewing it in detail will take time, so I can only give you some general recommendations.
It is recommended to use CTEs (WITH ... AS) to make your query readable and simple.
If several of your subqueries duplicate each other, write the shared part once as a CTE with WITH ... AS MATERIALIZED.
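A minimal sketch of that advice, assuming PostgreSQL 12 or later (the release that added the MATERIALIZED keyword). It reuses the Like conditions from SubPlan 2 in the plan above, with the user id and values copied from the plan in place of the $n parameters; the CTE name my_likes is hypothetical. Note that PostgreSQL 12+ already materializes a side-effect-free CTE that is referenced more than once, so here the keyword mostly makes that choice explicit; it changes behaviour mainly for a CTE referenced only once, which would otherwise be inlined into the outer query.

```sql
-- Hypothetical example (PostgreSQL 12+): the current user's Like rows are
-- computed once and then probed by two separate conditions.
WITH my_likes AS MATERIALIZED (
    SELECT l."likedUserId", l.status, l.universe
    FROM "mygreatapp"."Like" AS l
    WHERE l."userId" = 'ckki8m4y81jew0792sw40q81w'
)
SELECT u.id
FROM "mygreatapp"."User" AS u
WHERE NOT EXISTS (SELECT 1 FROM my_likes AS ml
                  WHERE ml."likedUserId" = u.id AND ml.status = 'LIKED')
  AND NOT EXISTS (SELECT 1 FROM my_likes AS ml
                  WHERE ml."likedUserId" = u.id AND ml.universe = 'vegan')
ORDER BY u.id;
```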
If your queries contain OR conditions, use UNION instead of OR.
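A sketch of the OR-to-UNION rewrite on the Like lookup from the question, with placeholder values copied from the plan. UNION removes duplicate rows, so both versions yield the same distinct ids; UNION ALL would be cheaper but could return an id twice. For this particular OR, the plan above shows that PostgreSQL already combined both branches with a BitmapOr over "Like.userId_status_universe_index", so the rewrite may gain nothing here; it is usually suggested for cases where the OR keeps the planner from using an index at all.

```sql
-- OR version (as generated in the question, with the parameters filled in)
SELECT l."likedUserId"
FROM "mygreatapp"."Like" AS l
WHERE (l."userId" = 'ckki8m4y81jew0792sw40q81w' AND l.status = 'LIKED')
   OR (l."userId" = 'ckki8m4y81jew0792sw40q81w' AND l.universe = 'vegan');

-- UNION version: each branch has its own simple conditions that an index
-- can serve, and UNION removes ids returned by both branches.
SELECT l."likedUserId"
FROM "mygreatapp"."Like" AS l
WHERE l."userId" = 'ckki8m4y81jew0792sw40q81w' AND l.status = 'LIKED'
UNION
SELECT l."likedUserId"
FROM "mygreatapp"."Like" AS l
WHERE l."userId" = 'ckki8m4y81jew0792sw40q81w' AND l.universe = 'vegan';
```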
If the subqueries after IN return large results, use an INNER JOIN instead of IN. Apply the same logic to NOT IN where you can.
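When replacing NOT IN, an anti join written as NOT EXISTS is usually the safer target: NOT IN returns no rows at all as soon as the subquery yields a NULL, while NOT EXISTS does not have that trap (the generated SQL in the question guards against it with "t0"."id" IS NOT NULL). In the same way, turning IN into a plain INNER JOIN can duplicate outer rows when the subquery returns the same id more than once, which EXISTS avoids. The self-contained query below needs no tables and runs on any PostgreSQL database; it returns not_in_rows = 0 and not_exists_rows = 2.

```sql
-- Self-contained demo: candidate ids 1, 2, 3; the exclusion list holds 2 and a NULL.
WITH candidates(id) AS (VALUES (1), (2), (3)),
     excluded(id)   AS (VALUES (2), (NULL::int))
SELECT
    -- NOT IN: the comparison with NULL is unknown, so no candidate qualifies
    (SELECT count(*) FROM candidates AS c
      WHERE c.id NOT IN (SELECT e.id FROM excluded AS e)) AS not_in_rows,
    -- NOT EXISTS: only id 2 has a match, so ids 1 and 3 are kept
    (SELECT count(*) FROM candidates AS c
      WHERE NOT EXISTS (SELECT 1 FROM excluded AS e WHERE e.id = c.id)) AS not_exists_rows;
```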
And lastly, of course, pay attention to correctly indexing the columns you use in your conditions.
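One hedged illustration of matching an index to the conditions: in the plan above, the Index Scan using "UniverseOnUser.userId_index" puts only "userId" in its Index Cond and applies relation and "universeId" as a Filter (Rows Removed by Filter: 7 per loop, over 1704 loops). A composite index on all three columns would let the scan skip those rows inside the index. The index name below is hypothetical, and whether it pays off depends on the write load and data distribution. CREATE INDEX CONCURRENTLY avoids blocking writes to the table but cannot run inside a transaction block.

```sql
-- Hypothetical index name; columns taken from the Index Cond and Filter in the plan.
CREATE INDEX CONCURRENTLY "UniverseOnUser_userId_universeId_relation_idx"
    ON "mygreatapp"."UniverseOnUser" ("userId", "universeId", relation);
```

Re-running EXPLAIN ANALYZE afterwards shows whether the planner picks the new index and whether the "Rows Removed by Filter" line disappears.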

What effect does a string operation on a column in a filter condition of a PostgreSQL query have on the plan it chooses?

I was working on optimising a query and, by dumb luck, tried something that improved it, but I am unable to explain why.
Below is the query with poor performance:
with ctedata1 as(
select
sum(total_visit_count) as total_visit_count,
sum(sh_visit_count) as sh_visit_count,
sum(ec_visit_count) as ec_visit_count,
sum(total_like_count) as total_like_count,
sum(sh_like_count) as sh_like_count,
sum(ec_like_count) as ec_like_count,
sum(total_order_count) as total_order_count,
sum(sh_order_count) as sh_order_count,
sum(ec_order_count) as ec_order_count,
sum(total_sales_amount) as total_sales_amount,
sum(sh_sales_amount) as sh_sales_amount,
sum(ec_sales_amount) as ec_sales_amount,
sum(ec_order_online_count) as ec_order_online_count,
sum(ec_sales_online_amount) as ec_sales_online_amount,
sum(ec_order_in_store_count) as ec_order_in_store_count,
sum(ec_sales_in_store_amount) as ec_sales_in_store_amount,
table2.im_name,
table2.brand as kpibrand,
table2.id_region as kpiregion
from
table2
where
deleted_at is null
and id_region = any('{1}')
group by
im_name,
kpiregion,
kpibrand ),
ctedata2 as (
select
ctedata1.*,
rank() over (partition by (kpiregion,
kpibrand)
order by
coalesce(ctedata1.total_sales_amount, 0) desc) rank,
count(*) over (partition by (kpiregion,
kpibrand)) as total_count
from
ctedata1 )
select
table1.id_pf_item,
table1.product_id,
table1.color_code,
table1.l1_code,
table1.local_title as product_name,
table1.id_region,
table1.gender,
case
when table1.created_at is null then '1970/01/01 00:00:00'
else table1.created_at
end as created_at,
(
select
count(distinct id_outfit)
from
table3
left join table4 on
table3.id_item = table4.id_item
and table4.deleted_at is null
where
table3.deleted_at is null
and table3.id_pf_item = table1.id_pf_item) as outfit_count,
count(*) over() as total_matched,
case
when table1.v8_im_name = '' then table1.im_name
else table1.v8_im_name
end as im_name,
case
when table1.id_region != 1 then null
else
case
when table1.sales_start_at is null then '1970/01/01 00:00:00'
else table1.sales_start_at
end
end as sales_start_date,
table1.category_ids,
array_to_string(table1.intermediate_category_ids, ','),
table1.image_url,
table1.brand,
table1.pdp_url,
coalesce(ctedata2.total_visit_count, 0) as total_visit_count,
coalesce(ctedata2.sh_visit_count, 0) as sh_visit_count,
coalesce(ctedata2.ec_visit_count, 0) as ec_visit_count,
coalesce(ctedata2.total_like_count, 0) as total_like_count,
coalesce(ctedata2.sh_like_count, 0) as sh_like_count,
coalesce(ctedata2.ec_like_count, 0) as ec_like_count,
coalesce(ctedata2.total_order_count, 0) as total_order_count,
coalesce(ctedata2.sh_order_count, 0) as sh_order_count,
coalesce(ctedata2.ec_order_count, 0) as ec_order_count,
coalesce(ctedata2.total_sales_amount, 0) as total_sales_amount,
coalesce(ctedata2.sh_sales_amount, 0) as sh_sales_amount,
coalesce(ctedata2.ec_sales_amount, 0) as ec_sales_amount,
coalesce(ctedata2.ec_order_online_count, 0) as ec_order_online_count,
coalesce(ctedata2.ec_sales_online_amount, 0) as ec_sales_online_amount,
coalesce(ctedata2.ec_order_in_store_count, 0) as ec_order_in_store_count,
coalesce(ctedata2.ec_sales_in_store_amount, 0) as ec_sales_in_store_amount,
ctedata2.rank,
ctedata2.total_count,
table1.department,
table1.seasons
from
table1
left join ctedata2 on
table1.im_name = ctedata2.im_name
and table1.brand = ctedata2.kpibrand
where
table1.deleted_at is null
and table1.id_region = any('{1}')
and lower(table1.brand) = any('{"brand1","brand2"}')
and 'season1' = any(lower(seasons::text)::text[])
and table1.department = 'Department1'
order by
total_sales_amount desc offset 0
limit 100
The EXPLAIN ANALYZE output for the above query is:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=172326.55..173435.38 rows=1 width=952) (actual time=85664.201..85665.970 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.063..708.069 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.029..308.582 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1012.994..1082.057 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=945.755..1014.656 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=945.747..963.254 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.069..824.841 rows=73121 loops=1)
-> Result (cost=74005.05..75113.88 rows=1 width=952) (actual time=85664.199..85665.950 rows=100 loops=1)
-> Sort (cost=74005.05..74005.05 rows=1 width=944) (actual time=85664.072..85664.089 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=10960.95..74005.04 rows=1 width=944) (actual time=85658.049..85661.393 rows=3151 loops=1)
-> Nested Loop Left Join (cost=10960.95..74005.02 rows=1 width=927) (actual time=1075.219..85643.595 rows=3151 loops=1)
Join Filter: (((table1.im_name)::text = ctedata2.im_name) AND ((table1.brand)::text = ctedata2.kpibrand))
Rows Removed by Join Filter: 230402986
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND (lower((brand)::text) = ANY ('{brand1, brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=38.307..38.307 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=0.325..21.721 rows=73121 loops=3151)
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.007..0.016 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.005..0.008 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 1.023 ms
Execution time: 85669.512 ms
I changed
and lower(table1.brand) = any('{"brand1","brand2"}')
in the query to
and table1.brand = any('{"Brand1","Brand2"}')
and the plan changed to
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=173137.44..188661.06 rows=14 width=952) (actual time=1444.123..1445.653 rows=100 loops=1)
CTE ctedata1
-> GroupAggregate (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.040..769.982 rows=73121 loops=1)
Group Key: table2.im_name, table2.id_region, table2.brand
-> Index Scan using udx_table2_item_im_name_id_region_brand_target_date_key on table2 (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.021..350.774 rows=391779 loops=1)
Filter: ((deleted_at IS NULL) AND (id_region = ANY ('{1}'::integer[])))
Rows Removed by Filter: 20415
CTE ctedata2
-> WindowAgg (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1088.905..1153.749 rows=73121 loops=1)
-> WindowAgg (cost=16104.06..17082.09 rows=43468 width=620) (actual time=1020.017..1089.117 rows=73121 loops=1)
-> Sort (cost=16104.06..16212.73 rows=43468 width=612) (actual time=1020.011..1037.170 rows=73121 loops=1)
Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC
Sort Method: external merge Disk: 6536kB
-> CTE Scan on ctedata1 (cost=0.00..869.36 rows=43468 width=612) (actual time=0.044..891.653 rows=73121 loops=1)
-> Result (cost=74815.94..90339.56 rows=14 width=952) (actual time=1444.121..1445.635 rows=100 loops=1)
-> Sort (cost=74815.94..74815.98 rows=14 width=944) (actual time=1444.053..1444.065 rows=100 loops=1)
Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 76kB
-> WindowAgg (cost=72207.31..74815.68 rows=14 width=944) (actual time=1439.128..1441.885 rows=3151 loops=1)
-> Hash Right Join (cost=72207.31..74815.40 rows=14 width=927) (actual time=1307.531..1437.246 rows=3151 loops=1)
Hash Cond: ((ctedata2.im_name = (table1.im_name)::text) AND (ctedata2.kpibrand = (table1.brand)::text))
-> CTE Scan on ctedata2 (cost=0.00..869.36 rows=43468 width=592) (actual time=1088.911..1209.646 rows=73121 loops=1)
-> Hash (cost=72207.10..72207.10 rows=14 width=399) (actual time=216.850..216.850 rows=3151 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1249kB
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Recheck Cond: (id_region = ANY ('{1}'::integer[]))
Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND ((brand)::text = ANY ('{Brand1, Brand2}'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
Rows Removed by Filter: 106335
Heap Blocks: exact=42899
-> Bitmap Index Scan on table1_im_name_id_region_key (cost=0.00..10960.94 rows=110619 width=0) (actual time=34.849..34.849 rows=109486 loops=1)
Index Cond: (id_region = ANY ('{1}'::integer[]))
SubPlan 3
-> Aggregate (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100)
-> Nested Loop Left Join (cost=5.57..1108.57 rows=93 width=4) (actual time=0.006..0.014 rows=3 loops=100)
-> Bitmap Heap Scan on table3 (cost=5.15..350.95 rows=93 width=4) (actual time=0.004..0.006 rows=3 loops=100)
Recheck Cond: (id_pf_item = table1.id_pf_item)
Filter: (deleted_at IS NULL)
Heap Blocks: exact=107
-> Bitmap Index Scan on idx_id_pf_item (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)
Index Cond: (id_pf_item = table1.id_pf_item)
-> Index Scan using index_table4_id_item on table4 (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)
Index Cond: (table3.id_item = id_item)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 0
Planning time: 0.760 ms
Execution time: 1448.848 ms
My Observation
After lower() is removed, the join strategy for table1 LEFT JOIN ctedata2 changes from a nested loop left join to a hash right join.
In the better-performing query, the CTE Scan node on ctedata2 is executed only once.
Postgres Version
9.6
Please help me understand this behaviour. I will supply additional information if required.

It is almost not worthwhile taking a deep dive into the inner workings of a nearly-obsolete version. That time and energy is probably better spent jollying along an upgrade.
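But the problem is pretty plain. Your scan on table1 is estimated dreadfully: 1 row in the slow plan and 14 in the better one, against 3,151 actual rows. With an outer estimate of 1 row, a nested loop that rescans ctedata2 once per table1 row looks cheap. In reality that CTE scan ran 3,151 times.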
-> Bitmap Heap Scan on table1 (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)
-> Bitmap Heap Scan on table1 (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)
Your use of lower(), apparently without reason, surely contributes to the poor estimation. And dynamically converting a string into an array certainly doesn't help either. If it were stored as a real array in the first place, the statistics system could get its hands on it and generate more reasonable estimates.
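A few multiplications show why the rows=1 estimate is so costly here. The sketch below only redoes arithmetic on numbers copied by hand from the two plans in the question. EXPLAIN ANALYZE per-loop times are rounded averages, so the time figure is approximate.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs in the question.
CTE_ROWS_ESTIMATED = 43_468      # CTE Scan on ctedata2: rows=43468 (estimate)
CTE_ROWS_ACTUAL = 73_121         # CTE Scan on ctedata2: actual rows per loop
TABLE1_ROWS_ESTIMATED = 1        # Bitmap Heap Scan on table1 (slow plan): rows=1
TABLE1_ROWS_ACTUAL = 3_151       # Bitmap Heap Scan on table1: actual rows=3151
ROWS_REMOVED_BY_JOIN_FILTER = 230_402_986
CTE_RESCAN_US = 21_721           # 21.721 ms per CTE rescan, in microseconds

# Nested loop: the inner CTE scan is repeated once per outer (table1) row,
# and every (table1, ctedata2) pair is tested against the join filter.
pairs_planner_expected = TABLE1_ROWS_ESTIMATED * CTE_ROWS_ESTIMATED
pairs_compared = TABLE1_ROWS_ACTUAL * CTE_ROWS_ACTUAL
pairs_matched = pairs_compared - ROWS_REMOVED_BY_JOIN_FILTER
rescan_time_us = TABLE1_ROWS_ACTUAL * CTE_RESCAN_US

# Hash right join: ctedata2 is read once and each of its rows probes a hash
# table built from the table1 rows, so each input is read roughly once.
hash_join_rows_read = CTE_ROWS_ACTUAL + TABLE1_ROWS_ACTUAL

print(f"planner expected {pairs_planner_expected:,} join-filter checks")
print(f"nested loop checked {pairs_compared:,} pairs; {pairs_matched:,} matched")
print(f"time spent rescanning the CTE: ~{rescan_time_us / 1_000_000:.1f} s")
print(f"hash right join reads ~{hash_join_rows_read:,} rows in total")
```

The planner expected roughly 43 thousand join-filter checks, but the nested loop actually performed about 230 million, and that accounts for most of the 85 seconds. The hash plan reads each input roughly once.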

index only scan taking longer to run

In the execution plan below, the index scan on five_lima (the table has 900M rows) is where most of the time is spent. I want to bring the runtime down to a few seconds; how do I optimize it? I tried forcing a sequential scan and running VACUUM/ANALYZE, but neither helped.
According to depesz's explain analysis, the scans on five_lima account for 86.6% of the time. Per-table breakdown (node count, exclusive time, share):
five_lima: 2 nodes, 43,600.875 ms, 86.6 % of total
Index Only Scan Backward: 1 node, 21,936.780 ms, 50.3 % of five_lima time
Index Scan: 1 node, 21,664.095 ms, 49.7 % of five_lima time
https://explain.depesz.com/s/7lXg
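The depesz figures come from multiplying each node's per-loop actual time by its loop count. A small sketch of that arithmetic follows, using values copied by hand from the plan below. It shows that each five_lima lookup is cheap on its own (0.012 ms and 0.180 ms per call). The cost comes from running them 1,828,065 and 252,383 times. Per-loop times in EXPLAIN ANALYZE are rounded averages, so treat the result as an approximation.

```python
# Per-loop "actual time" upper bounds (converted to integer microseconds)
# and loop counts, copied by hand from the plan below.
LOOPS_SUBPLAN = 1_828_065
LOOPS_INDEX_SCAN = 252_383
EXECUTION_MS = 50_356.651

index_only_scan_us = 12 * LOOPS_SUBPLAN      # Index Only Scan Backward: 0.012 ms/loop
subplan_us = 13 * LOOPS_SUBPLAN              # SubPlan Result node: 0.013 ms/loop
index_scan_incl_us = 180 * LOOPS_INDEX_SCAN  # Index Scan on five_lima: 0.180 ms/loop

# A node's actual time includes its children; the SubPlan is evaluated inside
# the Index Scan's filter, so subtract it to get the scan's own time.
index_scan_excl_us = index_scan_incl_us - subplan_us
five_lima_us = index_only_scan_us + index_scan_excl_us
share = five_lima_us / 1000 / EXECUTION_MS

print(f"Index Only Scan Backward: {index_only_scan_us / 1000:,.3f} ms")
print(f"Index Scan (own time):    {index_scan_excl_us / 1000:,.3f} ms")
print(f"five_lima total:          {five_lima_us / 1000:,.3f} ms ({share:.1%})")
```

The printed values line up with the depesz breakdown: 21,936.780 ms, 21,664.095 ms, and 43,600.875 ms (86.6%).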
GroupAggregate (cost=5236122.79..5238409.75 rows=19058 width=392) (actual time=50337.968..50338.284 rows=76 loops=1)
Group Key: ((((((three.papa)::text || 'sierra_tango'::text) || (quebec_three.mike_india)::text) || bravo_five((quebec_three.sierra_uniform)::text, 3, 'november_golf'::text)) || 'lima_charlie'::text)), quebec_three.mike_india, quebec_three.sierra_uniform
-> Sort (cost=5236122.79..5236170.44 rows=19058 width=120) (actual time=50337.880..50337.903 rows=773 loops=1)
Sort Key: ((((((three.papa)::text || 'sierra_tango'::text) || (quebec_three.mike_india)::text) || bravo_five((quebec_three.sierra_uniform)::text, 3, 'november_golf'::text)) || 'lima_charlie'::text)), quebec_three.mike_india, quebec_three.sierra_uniform
Sort Method: quicksort Memory: 142kB
-> Hash Left Join (cost=5221327.29..5234767.95 rows=19058 width=120) (actual time=49423.721..50337.319 rows=773 loops=1)
Hash Cond: (((quebec_three.mike_india)::bpchar = three.mike_india) AND (quebec_three.sierra_uniform = three.sierra_uniform))
-> GroupAggregate (cost=5221204.51..5233639.85 rows=19058 width=292) (actual time=49422.982..50336.121 rows=773 loops=1)
Group Key: quebec_three.mike_india, quebec_three.sierra_uniform, quebec_three.whiskey, quebec_three.tango, quebec_three.juliet_charlie, quebec_three.victor_papa, quebec_three.yankee, quebec_three.india_papa, quebec_three.victor_charlie, quebec_three.november_hotel, quebec_three.hotel_november
-> Sort (cost=5221204.51..5221680.96 rows=190580 width=228) (actual time=49408.728..49416.532 rows=250551 loops=1)
Sort Key: quebec_three.mike_india, quebec_three.sierra_uniform, quebec_three.whiskey, quebec_three.tango, quebec_three.juliet_charlie, quebec_three.victor_papa, quebec_three.yankee, quebec_three.india_papa, quebec_three.victor_charlie, quebec_three.november_hotel, quebec_three.hotel_november
Sort Method: quicksort Memory: 27472kB
-> Subquery Scan on quebec_three (cost=5191626.46..5204490.61 rows=190580 width=228) (actual time=49045.224..49167.610 rows=250551 loops=1)
-> Unique (cost=5191626.46..5198773.21 rows=190580 width=286) (actual time=49045.204..49136.969 rows=250551 loops=1)
-> Sort (cost=5191626.46..5192102.91 rows=190580 width=286) (actual time=49045.190..49071.536 rows=252496 loops=1)
Sort Key: mike_november1.sierra_uniform, mike_november1.charlie_six, mike_november1.foxtrot_india, (xray(mike_november1.delta_xray, 'zulu'::text)), mike_november1.whiskey, quebec_sierra.tango, quebec_sierra.juliet_charlie, golf.oscar_lima, (CASE WHEN ((("six_four"((five_hotel.tango)::text, 2))::integer = 5) AND (five_hotel.juliet_charlie <> november_november ('charlie_tango'::bpchar[]))) THEN 'oscar_romeo'::text ELSE NULL::text END), (CASE WHEN ((("six_four"((five_hotel.tango)::text, 2))::integer = 5) AND (five_hotel.juliet_charlie = ANY ('charlie_tango'::bpchar[]))) THEN 'romeo'::text ELSE NULL::text END), (CASE WHEN ((("six_four"((five_hotel.tango)::text, 2))::integer = 6) AND (five_hotel.juliet_charlie <> november_november ('charlie_tango'::bpchar[]))) THEN 'oscar_romeo'::text ELSE NULL::text END), (CASE WHEN ((("six_four"((five_hotel.tango)::text, 2))::integer = 6) AND (five_hotel.juliet_charlie = ANY ('charlie_tango'::bpchar[]))) THEN 'romeo'::text ELSE NULL::text END), (CASE WHEN (golf.oscar_lima five_romeo NOT NULL) THEN 'delta_foxtrot'::text ELSE 'oscar_romeo'::text END), (CASE WHEN (("six_four"((five_hotel.tango)::text, 2))::integer = 15) THEN 'oscar_romeo'::text ELSE NULL::text END)
Sort Method: quicksort Memory: 41652kB
-> Nested Loop Left Join (cost=661986.99..5174912.56 rows=190580 width=286) (actual time=1737.304..47625.922 rows=252496 loops=1)
-> Gather (cost=661986.29..3041816.11 rows=190580 width=79) (actual time=1733.755..1827.448 rows=252383 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Left Join (cost=660986.29..3021758.11 rows=79408 width=79) (actual time=1723.881..8094.375 rows=84128 loops=3)
Hash Cond: (((mike_november1.mike_india)::text = (seven_quebec.mike_india)::text) AND ((mike_november1.foxtrot_india)::text = (seven_quebec.foxtrot_india)::text) AND (mike_november1.sierra_uniform = seven_quebec.sierra_uniform))
-> Nested Loop Left Join (cost=420698.30..2780844.78 rows=79408 width=74) (actual time=1263.213..7579.180 rows=84128 loops=3)
-> Nested Loop Left Join (cost=420697.72..2359299.65 rows=79408 width=69) (actual time=1262.995..6114.930 rows=84128 loops=3)
-> Nested Loop Left Join (cost=420697.15..1943062.02 rows=79408 width=53) (actual time=1262.617..4846.387 rows=83834 loops=3)
-> Parallel Hash Left Join (cost=420691.40..1289489.61 rows=79408 width=53) (actual time=1262.432..3244.712 rows=83191 loops=3)
Hash Cond: (((mike_november1.mike_india)::text = (five_hotel.mike_india)::text) AND ((mike_november1.foxtrot_india)::text = (five_hotel.foxtrot_india)::text) AND (mike_november1.sierra_uniform = five_hotel.sierra_uniform))
-> Parallel Index Scan using lima_papa on six_echo juliet_xray_xray (cost=0.57..867904.42 rows=79408 width=45) (actual time=1.304..1922.996 rows=83190 loops=3)
Index Cond: ((delta_xray >= 'four_kilo'::timestamp without time zone) AND (delta_xray <= 'uniform'::timestamp without time zone) AND ((mike_india)::text = 'five_papa'::text))
Filter: (oscar_quebec = 'quebec_golf'::numeric)
Rows Removed by Filter: 115955
-> Parallel Hash (cost=404875.54..404875.54 rows=421741 width=32) (actual time=1259.567..1259.567 rows=93164 loops=3)
Buckets: 1048576 Batches: 1 Memory Usage: 27936kB
-> Parallel Bitmap Heap Scan on delta_echo five_hotel (cost=87668.95..404875.54 rows=421741 width=32) (actual time=947.771..1217.127 rows=93164 loops=3)
Recheck Cond: ((mike_india)::text = 'five_papa'::text)
Heap Blocks: exact=24664
-> Bitmap Index Scan on india_three (cost=0.00..87415.90 rows=1012179 width=0) (actual time=935.805..935.805 rows=466562 loops=1)
Index Cond: ((mike_india)::text = 'five_papa'::text)
-> Bitmap Heap Scan on two_bravo juliet_xray_delta (cost=5.76..8.20 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=249572)
Recheck Cond: (((charlie_six)::text = (mike_november1.charlie_six)::text) AND ((foxtrot_india)::text = (mike_november1.foxtrot_india)::text))
Filter: (((mike_india)::text = 'five_papa'::text) AND ((mike_india)::text = (mike_november1.mike_india)::text) AND (sierra_uniform = mike_november1.sierra_uniform))
Heap Blocks: exact=16
-> BitmapAnd (cost=5.76..5.76 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=249572)
-> Bitmap Index Scan on victor_three (cost=0.00..2.74 rows=5 width=0) (actual time=0.010..0.010 rows=1 loops=249572)
Index Cond: ((charlie_six)::text = (mike_november1.charlie_six)::text)
-> Bitmap Index Scan on two_delta (cost=0.00..2.77 rows=16 width=0) (actual time=0.010..0.010 rows=1 loops=128367)
Index Cond: ((foxtrot_india)::text = (mike_november1.foxtrot_india)::text)
-> Index Scan using hotel_oscar on charlie_yankee golf (cost=0.57..5.21 rows=1 width=40) (actual time=0.015..0.015 rows=1 loops=251501)
Index Cond: ((foxtrot_india)::text = (mike_november1.foxtrot_india)::text)
Filter: (((mike_india)::text = 'five_papa'::text) AND ((mike_india)::text = (mike_november1.mike_india)::text) AND (sierra_uniform = mike_november1.sierra_uniform))
Rows Removed by Filter: 0
-> Index Scan using seven_victor on five_charlie bravo_oscar (cost=0.57..5.28 rows=1 width=23) (actual time=0.017..0.017 rows=1 loops=252383)
Index Cond: ((charlie_six)::text = (mike_november1.charlie_six)::text)
Filter: (((mike_india)::text = 'five_papa'::text) AND ((mike_india)::text = (mike_november1.mike_india)::text) AND (sierra_uniform = mike_november1.sierra_uniform))
Rows Removed by Filter: 0
-> Parallel Hash (cost=232855.56..232855.56 rows=198198 width=29) (actual time=459.697..459.697 rows=66317 loops=3)
Buckets: 524288 Batches: 1 Memory Usage: 16608kB
-> Parallel Bitmap Heap Scan on victor_four seven_quebec (cost=20722.12..232855.56 rows=198198 width=29) (actual time=111.386..434.496 rows=66317 loops=3)
Recheck Cond: ((mike_india)::text = 'five_papa'::text)
Heap Blocks: exact=27484
-> Bitmap Index Scan on four_charlie (cost=0.00..20603.20 rows=475676 width=0) (actual time=107.013..107.013 rows=227154 loops=1)
Index Cond: ((mike_india)::text = 'five_papa'::text)
-> Index Scan using hotel_whiskey on five_lima quebec_sierra (cost=0.70..11.08 rows=1 width=33) (actual time=0.179..0.180 rows=1 loops=252383)
Index Cond: (((foxtrot_india)::text = (mike_november1.foxtrot_india)::text) AND ((mike_india)::text = (mike_november1.mike_india)::text) AND ((mike_india)::text = 'five_papa'::text) AND (sierra_uniform = mike_november1.sierra_uniform))
Filter: (bravo_lima = (delta_four 2))
Rows Removed by Filter: 6
SubPlan
-> Result (cost=5.55..5.58 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=1828065)
InitPlan
-> Limit (cost=0.70..5.55 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1828065)
-> Index Only Scan Backward using hotel_whiskey on five_lima foxtrot_four (cost=0.70..5.55 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1828065)
Index Cond: ((foxtrot_india = (quebec_sierra.foxtrot_india)::text) AND (mike_india = (quebec_sierra.mike_india)::text) AND (sierra_uniform = quebec_sierra.sierra_uniform) AND (oscar_quebec = quebec_sierra.oscar_quebec) AND (bravo_lima five_romeo NOT NULL))
Heap Fetches: 18062
-> Hash (cost=70.67..70.67 rows=1489 width=20) (actual time=0.644..0.644 rows=1489 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 95kB
-> Seq Scan on bravo_zulu three (cost=0.00..70.67 rows=1489 width=20) (actual time=0.048..0.416 rows=1489 loops=1)
Planning time: 24.541 ms
Execution time: 50356.651 ms
Here is the query:
explain analyze select quebec_three.mike_india,quebec_three.sierra_uniform,papa ||'('||quebec_three.mike_india||bravo_five(quebec_three.sierra_uniform::text,3,'0')||')' as papa,
sum( dms_appl_pending + dms_appl_done + no_of_fee_pending + veri_appl_pending )no_of_appl_done,
sum(veri_appl_done)veri_appl_done,sum(veri_appl_rejected)veri_appl_rejected,
sum(veri_appl_pending)veri_appl_pending,sum(app_appl_done)appr_appl_done,sum(app_appl_rejected)appr_appl_rejected,
sum(app_appl_pending)appr_appl_pending
,sum(no_of_fee_pending)no_of_fee_pending,sum(no_of_fee_done)no_of_fee_done
,sum(dms_appl_pending)dms_appl_pending,sum(dms_appl_done)dms_appl_done
from(
select quebec_three.mike_india,quebec_three.sierra_uniform,
case when right(quebec_three.tango::text,2)::int=05 and quebec_three.juliet_charlie ='C' then count(distinct quebec_three.foxtrot_india) else 0 end veri_appl_done,
case when (victor_papa='R' ) then count(quebec_three.foxtrot_india) else 0 end as veri_appl_rejected,
case when (yankee='P' ) then count(distinct quebec_three.foxtrot_india) else 0 end as veri_appl_pending,
case when whiskey='A' and (right(quebec_three.tango::text,2)::int=06 and quebec_three.juliet_charlie ='C') then count(distinct quebec_three.foxtrot_india) else 0 end app_appl_done,
case when (india_papa='R') then count(distinct quebec_three.foxtrot_india) else 0 end as app_appl_rejected,
case when (victor_charlie='P') then count(distinct quebec_three.foxtrot_india) else 0 end as app_appl_pending,
case when november_hotel='P' then count(distinct quebec_three.foxtrot_india) else 0 end as no_of_fee_pending,
case when november_hotel='A' then count(distinct quebec_three.foxtrot_india) else 0 end as no_of_fee_done,
case when (hotel_november='P' ) then count(distinct quebec_three.foxtrot_india) else 0 end as dms_appl_pending,
case when right(quebec_three.tango::text,2)::int=15 and quebec_three.juliet_charlie ='C' then count(distinct quebec_three.foxtrot_india) else 0 end dms_appl_done
from(
select distinct quebec_three.mike_india,quebec_three.sierra_uniform ,quebec_three.charlie_six,quebec_three.foxtrot_india,xray(quebec_three.delta_xray,'dd-Mon-yyyy')delta_xray,quebec_three.whiskey,quebec_sierra.tango,quebec_sierra.juliet_charlie,golf.oscar_lima,
case when right(five_hotel.tango::text,2)::int=05 and five_hotel.juliet_charlie not in ('M','I') then oscar_romeo else null end yankee,
case when right(five_hotel.tango::text,2)::int=05 and five_hotel.juliet_charlie in ('M','I') then romeo else null end victor_papa,
case when right(five_hotel.tango::text,2)::int=06 and five_hotel.juliet_charlie not in ('M','I') then oscar_romeo else null end victor_charlie,
case when right(five_hotel.tango::text,2)::int=06 and five_hotel.juliet_charlie in ('M','I') then romeo else null end india_papa,
case when golf.oscar_lima is not null then delta_foxtrot else oscar_romeo end november_hotel,
case when right(five_hotel.tango::text,2)::int=15 then oscar_romeo else null end hotel_november
from six_echo quebec_three
left join delta_echo five_hotel on five_hotel.foxtrot_india=quebec_three.foxtrot_india and five_hotel.mike_india=quebec_three.mike_india and five_hotel.sierra_uniform=quebec_three.sierra_uniform
left join five_lima quebec_sierra on quebec_sierra.foxtrot_india=quebec_three.foxtrot_india and quebec_sierra.mike_india=quebec_three.mike_india and quebec_sierra.sierra_uniform=quebec_three.sierra_uniform
and quebec_sierra.bravo_lima =(select max(bravo_lima) from five_lima foxtrot_four where foxtrot_four.foxtrot_india=quebec_sierra.foxtrot_india
and foxtrot_four.mike_india=quebec_sierra.mike_india and foxtrot_four.sierra_uniform=quebec_sierra.sierra_uniform and foxtrot_four.oscar_quebec=quebec_sierra.oscar_quebec)
left join hsrp.vt_hsrp h on h.foxtrot_india=quebec_three.foxtrot_india and h.charlie_six=quebec_three.charlie_six and h.mike_india=quebec_three.mike_india and h.sierra_uniform=quebec_three.sierra_uniform
left join two_bravo juliet_xray_delta on juliet_xray_delta.foxtrot_india=quebec_three.foxtrot_india and juliet_xray_delta.charlie_six=quebec_three.charlie_six and juliet_xray_delta.mike_india=quebec_three.mike_india and juliet_xray_delta.sierra_uniform=quebec_three.sierra_uniform
left join charlie_yankee golf on golf.foxtrot_india=quebec_three.foxtrot_india and golf.mike_india=quebec_three.mike_india and golf.sierra_uniform=quebec_three.sierra_uniform
left join five_charlie bravo_oscar on bravo_oscar.charlie_six=quebec_three.charlie_six and bravo_oscar.mike_india=quebec_three.mike_india and bravo_oscar.sierra_uniform=quebec_three.sierra_uniform
left join victor_four seven_quebec on seven_quebec.foxtrot_india=quebec_three.foxtrot_india and seven_quebec.mike_india=quebec_three.mike_india and seven_quebec.sierra_uniform=quebec_three.sierra_uniform
left join vm_vh_class vh on vh.vh_class=COALESCE(bravo_oscar.vh_class,seven_quebec.vh_class)
where quebec_three.mike_india='UP' and case when 0=0 then true else quebec_three.sierra_uniform=0 end and quebec_three.delta_xray between '2021-03-01 00:00:00.000000 +05:30' and ('2021-04-02 23:59:59.999000 +05:30'::date + interval '1 day' - interval '1 sec')
and quebec_three.oscar_quebec in (123)
)quebec_three
group by 1,2,whiskey,quebec_three.tango,quebec_three.juliet_charlie,victor_papa,yankee,india_papa,victor_charlie,november_hotel,hotel_november
)quebec_three
left join bravo_zulu three on three.mike_india=quebec_three.mike_india and three.sierra_uniform=quebec_three.sierra_uniform
group by 1,2,3 order by 3;
Here is the original (non-anonymized) partial query, its indexes, and the partial plan:
Partial query:
.....
left join vha_status c on c.appl_no=a.appl_no and c.state_cd=a.state_cd and c.off_cd=a.off_cd
and c.moved_on =(select max(moved_on) from vha_status c1 where c1.appl_no=c.appl_no
and c1.state_cd=c.state_cd and c1.off_cd=c.off_cd and c1.pur_cd=c.pur_cd)
.....
where a.state_cd='UP' and case when 0=0 then true else a.off_cd=0 end and a.appl_dt between '2021-03-01 00:00:00.000000 +05:30' and ('2021-04-02 23:59:59.999000 +05:30'::date + interval '1 day' - interval '1 sec')
and a.pur_cd in (123)
Indexes:
"vha_status_pkey" PRIMARY KEY, btree (appl_no, pur_cd, file_movement_slno)
"idx_state_cd_vha_status" btree (state_cd)
"va_status_moved_on_indx" btree (moved_on)
"vha_status_appl_no_state_cd_off_cd_pur_cd_moved_on_idx" btree (appl_no, state_cd, off_cd, pur_cd, moved_on)
"vha_status_movedon_state_cd_off_cd_idx" btree (moved_on, state_cd, off_cd)
Partial Plan:
-> Index Scan using vha_status_appl_no_state_cd_off_cd_pur_cd_moved_on_idx on vha_status c (cost=0.70..11.08 rows=1 width=33) (actual time=0.179..0.180 rows=1 loops=252383)
Index Cond: (((appl_no)::text = (a_1.appl_no)::text) AND ((state_cd)::text = (a_1.state_cd)::text) AND ((state_cd)::text = 'UP'::text) AND (off_cd = a_1.off_cd))
Filter: (moved_on = (SubPlan 2))
Rows Removed by Filter: 6
SubPlan 2
-> Result (cost=5.55..5.58 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=1828065)
InitPlan 1 (returns $4)
-> Limit (cost=0.70..5.55 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1828065)
-> Index Only Scan Backward using vha_status_appl_no_state_cd_off_cd_pur_cd_moved_on_idx on vha_status c1 (cost=0.70..5.55 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1828065)
Index Cond: ((appl_no = (c.appl_no)::text) AND (state_cd = (c.state_cd)::text) AND (off_cd = c.off_cd) AND (pur_cd = c.pur_cd) AND (moved_on IS NOT NULL))
Heap Fetches: 18062

The 900 million rows in five_lima are not primarily the problem. The number of times you query the table is. The index scan on five_lima sits inside a nested loop and runs 252,383 times, and the correlated max() subquery inside it runs 1,828,065 times.
The most likely fix is to avoid issuing that many lookups if possible (we can discuss the exact solution). Otherwise, first reduce the set of records by filtering, and only then run the lookup.
In general, querying that many times is best avoided.
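One possible way to "first limit the records and then query" is sketched below. It is an assumption, not a confirmed fix. The idea is to compute each group's latest moved_on once with a window function, instead of running the correlated max() subquery once per candidate row. The sketch uses Python's built-in sqlite3 (window functions need SQLite 3.25 or later) purely to check that the rewrite returns the same rows as the original join shape, including ties and NULL moved_on values. It says nothing about PostgreSQL's plan or speed. The table `appl` is a hypothetical stand-in for the driving table aliased `a`, and all data is made up.

```python
import sqlite3

# Hypothetical, cut-down stand-ins for the tables in the partial query:
# "appl" plays the driving table (alias a), "vha_status" plays alias c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appl (appl_no TEXT, state_cd TEXT, off_cd INTEGER);
CREATE TABLE vha_status (appl_no TEXT, state_cd TEXT, off_cd INTEGER,
                         pur_cd INTEGER, moved_on TEXT);
INSERT INTO appl VALUES ('A1', 'UP', 1), ('A2', 'UP', 1), ('A3', 'UP', 2);
INSERT INTO vha_status VALUES
  ('A1', 'UP', 1, 123, '2021-03-01'), ('A1', 'UP', 1, 123, '2021-03-05'),
  ('A1', 'UP', 1, 456, '2021-03-02'), ('A2', 'UP', 1, 123, NULL),
  ('A2', 'UP', 1, 123, '2021-03-03'), ('A2', 'UP', 1, 123, '2021-03-03');
""")

# Original shape: the correlated max() runs once per candidate c row.
CORRELATED = """
SELECT a.appl_no, c.pur_cd, c.moved_on
FROM appl a
LEFT JOIN vha_status c
  ON c.appl_no = a.appl_no AND c.state_cd = a.state_cd AND c.off_cd = a.off_cd
 AND c.moved_on = (SELECT max(moved_on) FROM vha_status c1
                   WHERE c1.appl_no = c.appl_no AND c1.state_cd = c.state_cd
                     AND c1.off_cd = c.off_cd AND c1.pur_cd = c.pur_cd)
WHERE a.state_cd = 'UP'
"""

# Rewrite: filter first, compute each group's max once with a window
# function, and keep the comparison in ON so the LEFT JOIN still
# returns unmatched a rows.
WINDOWED = """
SELECT a.appl_no, c.pur_cd, c.moved_on
FROM appl a
LEFT JOIN (
  SELECT s.*,
         max(s.moved_on) OVER (PARTITION BY s.appl_no, s.state_cd,
                                            s.off_cd, s.pur_cd) AS max_moved_on
  FROM vha_status s
  WHERE s.state_cd = 'UP'
) c
  ON c.appl_no = a.appl_no AND c.state_cd = a.state_cd AND c.off_cd = a.off_cd
 AND c.moved_on = c.max_moved_on
WHERE a.state_cd = 'UP'
"""

def rows(sql):
    # repr() as sort key so rows containing None (SQL NULL) still sort.
    return sorted(conn.execute(sql).fetchall(), key=repr)

old_rows = rows(CORRELATED)
new_rows = rows(WINDOWED)
print(old_rows == new_rows, len(new_rows))
```

In PostgreSQL, the window version reads every matching 'UP' row of vha_status once. Whether that beats 252,383 index probes depends on how many such rows exist, and only EXPLAIN ANALYZE on the real data can tell.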

Postgres Table Slow Performance

We have a Product table in a Postgres DB hosted on Heroku, with 8 GB of RAM, 250 GB of disk space, and 1,000 IOPS allowed.
We have proper indexes on the columns.
Platform
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
We are running a keyword search query on this table, which has 2.8 million records. The query is too slow: it takes about 50 seconds to return results.
Query
SELECT
P.sfid AS prodsfid,
P.image_url__c image,
P.productcode sku,
P.Short_Description__c shortDesc,
P.NAME pname,
P.category__c,
P.price__c price,
P.description,
P.vendor_name__c vname,
P.vendor__c supSfid
FROM
staging.product2 P
JOIN (
SELECT
p1.sfid
FROM
staging.product2 p1
WHERE
p1.NAME ILIKE '%s%'
OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P.sfid = TEMP.sfid)
WHERE
P.status__c = 'Available'
AND LOWER (
P.vendor_shipping_country__c
) = ANY (
VALUES
('us'),
('usa'),
('united states'),
('united states of america')
)
AND P.vendor_catalog_tier__c = ANY (
VALUES
('a1c37000000oljnAAA'),
('a1c37000000oljQAAQ'),
('a1c37000000oljQAAQ'),
('a1c37000000pT7IAAU'),
('a1c37000000omDjAAI'),
('a1c37000000oljMAAQ'),
('a1c37000000oljaAAA'),
('a1c37000000pT7SAAU'),
('a1c0R000000AFcVQAW'),
('a1c0R000000A1HAQA0'),
('a1c0R0000000OpWQAU'),
('a1c0R0000005TZMQA2'),
('a1c37000000oljdAAA'),
('a1c37000000ooTqAAI'),
('a1c37000000omLBAAY'),
('a1c0R0000005N8GQAU')
)
Here is the explain plan:
Nested Loop (cost=31.85..33886.54 rows=3681 width=750)
-> Hash Join (cost=31.77..31433.07 rows=4415 width=750)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.73..31423.67 rows=8830 width=761)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32)
-> Bitmap Heap Scan on product2 p (cost=31.66..1962.32 rows=552 width=780)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.64 rows=1016 width=0)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32)
-> Unique (cost=0.02..0.03 rows=4 width=32)
-> Sort (cost=0.02..0.02 rows=4 width=32)
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.55 rows=1 width=19)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
It returns 140,576 records, but we only need the top 5,000. Will adding a LIMIT help here?
Please let me know what is making it slow and how to make it faster.
EXPLAIN ANALYZE
@RaymondNijland Here is the EXPLAIN ANALYZE output:
Nested Loop (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
-> Hash Join (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
-> Nested Loop (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
-> HashAggregate (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
Group Key: "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1" (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
-> Bitmap Heap Scan on product2 p (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
Filter: ((status__c)::text = 'Available'::text)
Rows Removed by Filter: 645
Heap Blocks: exact=88436
-> Bitmap Index Scan on vendor_catalog_tier_prd_idx (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
-> Hash (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Unique (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
-> Sort (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
-> Index Scan using sfid_prd_idx on product2 p1 (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
Index Cond: ((sfid)::text = (p.sfid)::text)
Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms
Another version of the query, with ORDER BY, is also very slow (140 seconds):
SELECT
p.sfid AS prodsfid,
p.image_url__c AS image,
p.productcode AS sku,
p.Short_Description__c AS shortDesc,
p.name AS pname,
p.category__c,
p.price__c AS price,
p.description,
p.vendor_name__c AS vname,
p.vendor__c AS supSfid
FROM
staging.product2 p
WHERE
p.status__c = 'Available'
AND p.vendor_shipping_country__c IN (
'us',
'usa',
'united states',
'united states of america'
)
AND p.vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'
)
ORDER BY p.productcode asc
LIMIT 5000
Here is the EXPLAIN ANALYZE output for this query:
Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms
Thanks
Aslam Bari

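You might want to consider a trigram GIN or GiST index (from the pg_trgm extension) on your staging.product2 table. ILIKE patterns with wildcards on both sides, such as '%s%', cannot use an ordinary B-tree index, so they are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.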
See this doc.
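A minimal sketch of that suggestion, assuming a trigram index from the pg_trgm contrib extension is what is meant (installing the extension needs the appropriate privileges). The table `product2_demo` and its rows are hypothetical stand-ins holding only the two searched columns; on the real table the same index definitions would target `staging.product2 (name)` and `(productcode)`. One caveat from the pg_trgm documentation: a pattern with no extractable trigrams degenerates to a full-index scan, so a one-character search term like the `'%s%'` in the plan above gains little; terms of three or more characters are where the index helps.

```sql
-- pg_trgm provides trigram operator classes that GIN/GiST can use for
-- LIKE/ILIKE patterns with a leading wildcard.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-in for staging.product2 with only the searched columns.
CREATE TEMP TABLE product2_demo (
    sfid        text PRIMARY KEY,
    name        text,
    productcode text
);

INSERT INTO product2_demo (sfid, name, productcode) VALUES
    ('p1', 'Stainless Steel Bottle', 'SSB-100'),
    ('p2', 'Ceramic Mug',            'CM-200'),
    ('p3', 'Glass Tumbler',          'GT-300');

-- One trigram index per searched column, so a
-- "name ILIKE ... OR productcode ILIKE ..." filter can be served by a BitmapOr.
CREATE INDEX product2_demo_name_trgm_idx
    ON product2_demo USING gin (name gin_trgm_ops);
CREATE INDEX product2_demo_productcode_trgm_idx
    ON product2_demo USING gin (productcode gin_trgm_ops);

-- On a table this small the planner will usually still choose a sequential
-- scan; run EXPLAIN ANALYZE on the real table to see whether the indexes are used.
SELECT sfid, name, productcode
FROM product2_demo
WHERE name ILIKE '%steel%'
   OR productcode ILIKE '%steel%';
```

On this sample data the query returns only the `p1` row.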

How can I get performance using PostgreSQL CTE recursive?

I built a tree structure using id and parent_id columns in the same table. For queries I'm using PostgreSQL's recursive CTEs, but the joins against the recursive results take a very long time. For example, with only 100 records in the sadt_lot table, this query takes 8 seconds to return. Does anyone have a better way to do this?
WITH RECURSIVE downlots as (
SELECT s1.sadt_lot_id, 0 AS level, s1.sadt_lot_id as root_id
FROM sadt_lot s1
WHERE s1.parent_lot_id IS NULL
UNION
SELECT s2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
FROM sadt_lot s2
INNER JOIN downlots d ON d.sadt_lot_id = s2.parent_lot_id
)
SELECT
"s"."sadt_lot_id",
"s"."name", concat(lpad(s.sadt_lot_id::TEXT, 3, '0'), '-', to_char(to_timestamp(s.created_at), 'DDMMYY')) sadt_lot_code,
"s"."created_at" AS "created_at",
"s"."version" AS "version", "s"."sadt_lot_status_id",
SUM(procedure_performed.amount_requested) procedures_total,
SUM(procedure_performed.total_value) procedures_total_value
FROM "sadt_lot" "s"
LEFT JOIN "sadt" ON sadt.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = s.sadt_lot_id)
LEFT JOIN "procedure_auth" ON sadt.procedure_auth_id = procedure_auth.procedure_auth_id
LEFT JOIN "procedure_performed" ON procedure_auth.procedure_auth_id = procedure_performed.procedure_auth_id
WHERE "s"."parent_lot_id" IS NULL
GROUP BY "s"."sadt_lot_id"
ORDER BY "created_at" DESC
Another example, listing all sadts grouped by their root sadt_lot:
EXPLAIN ANALYZE WITH RECURSIVE downlots as (
SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
FROM sadt_lot sl1
WHERE sl1.parent_lot_id IS NULL
UNION
SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
FROM sadt_lot sl2
INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN sadt s ON s.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = sl.sadt_lot_id)
WHERE sl.parent_lot_id IS NULL
GROUP BY sl.sadt_lot_id
ORDER BY sl.sadt_lot_id
Query Plan
GroupAggregate (cost=42.53..15077.74 rows=1 width=36) (actual time=104.090..8436.505 rows=90 loops=1)
Group Key: sl.sadt_lot_id
CTE downlots
-> Recursive Union (cost=0.00..42.39 rows=101 width=12) (actual time=0.006..0.104 rows=95 loops=1)
-> Seq Scan on sadt_lot sl1 (cost=0.00..2.94 rows=1 width=12) (actual time=0.005..0.019 rows=90 loops=1)
Filter: (parent_lot_id IS NULL)
Rows Removed by Filter: 5
-> Hash Join (cost=0.33..3.74 rows=10 width=12) (actual time=0.027..0.028 rows=2 loops=2)
Hash Cond: (sl2.parent_lot_id = d.sadt_lot_id)
-> Seq Scan on sadt_lot sl2 (cost=0.00..2.94 rows=94 width=8) (actual time=0.002..0.008 rows=95 loops=2)
-> Hash (cost=0.20..0.20 rows=10 width=8) (actual time=0.010..0.010 rows=48 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> WorkTable Scan on downlots d (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.004 rows=48 loops=2)
-> Nested Loop Left Join (cost=0.14..15004.14 rows=6242 width=8) (actual time=8.234..8434.229 rows=11345 loops=1)
Join Filter: (SubPlan 2)
Rows Removed by Join Filter: 1112125
-> Index Only Scan using sadt_lot_sadt_lot_id_parent_lot_id_idx on sadt_lot sl (cost=0.14..12.86 rows=1 width=4) (actual time=0.011..0.252 rows=90 loops=1)
Index Cond: (parent_lot_id IS NULL)
Heap Fetches: 90
-> Seq Scan on sadt s (cost=0.00..635.83 rows=12483 width=8) (actual time=0.002..1.785 rows=12483 loops=90)
SubPlan 2
-> CTE Scan on downlots (cost=0.00..2.27 rows=1 width=4) (actual time=0.003..0.007 rows=1 loops=1123470)
Filter: (root_id = sl.sadt_lot_id)
Rows Removed by Filter: 94
Planning time: 0.203 ms
Execution time: 8436.598 ms

Try joining the recursive CTE directly to sadt:
EXPLAIN ANALYZE
WITH RECURSIVE downlots as (
SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
FROM sadt_lot sl1
WHERE sl1.parent_lot_id IS NULL
UNION
SELECT sl2.sadt_lot_id, d.level + 1, d.root_id AS root_id -- carry the top-level root down to grandchildren as well
FROM sadt_lot sl2
INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT downlots.root_id AS sadt_lot_id, array_agg(s.sadt_id)
FROM downlots
LEFT JOIN sadt s ON s.sadt_lot_id = downlots.sadt_lot_id
GROUP BY downlots.root_id
ORDER BY downlots.root_id
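To try this set-based rewrite on a toy tree, here is a self-contained sketch with hypothetical sample data (the temp tables keep only the id columns these queries touch, and the view name `lot_sadts` is made up). Three choices are assumptions rather than anything from the thread: compared with the CTE in the question, the recursive term carries `d.root_id` forward, so a grandchild lot is attributed to its top-level root (`d.sadt_lot_id AS root_id` yields the immediate parent, which is only the root in two-level trees); it uses UNION ALL, which skips duplicate elimination and is safe for an acyclic tree; and `FILTER (WHERE s.sadt_id IS NOT NULL)` keeps lots without any sadt from adding NULLs to the arrays.

```sql
-- Hypothetical minimal versions of the question's tables (ids only).
CREATE TEMP TABLE sadt_lot (
    sadt_lot_id   int PRIMARY KEY,
    parent_lot_id int REFERENCES sadt_lot (sadt_lot_id)
);
CREATE TEMP TABLE sadt (
    sadt_id     int PRIMARY KEY,
    sadt_lot_id int NOT NULL REFERENCES sadt_lot (sadt_lot_id)
);

-- Root 1 -> child 2 -> grandchild 3; root 4 -> child 5 (lot 5 has no sadts).
INSERT INTO sadt_lot (sadt_lot_id, parent_lot_id)
VALUES (1, NULL), (2, 1), (3, 2), (4, NULL), (5, 4);
INSERT INTO sadt (sadt_id, sadt_lot_id)
VALUES (10, 1), (11, 2), (12, 3), (13, 4);

-- Expand the tree once, then join every lot in it to sadt in a single pass,
-- instead of evaluating "= any(SELECT ... FROM downlots WHERE root_id = ...)"
-- once per (root lot, sadt) pair.
CREATE TEMP VIEW lot_sadts AS
WITH RECURSIVE downlots AS (
    -- Anchor: each top-level lot is its own root.
    SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id AS root_id
    FROM sadt_lot sl1
    WHERE sl1.parent_lot_id IS NULL
    UNION ALL
    -- Recursive step: a child inherits the root of its parent.
    SELECT sl2.sadt_lot_id, d.level + 1, d.root_id
    FROM sadt_lot sl2
    JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT d.root_id AS sadt_lot_id,
       -- FILTER drops the NULL produced by the LEFT JOIN for lots without sadts.
       array_agg(s.sadt_id ORDER BY s.sadt_id)
           FILTER (WHERE s.sadt_id IS NOT NULL) AS sadt_ids
FROM downlots d
LEFT JOIN sadt s ON s.sadt_lot_id = d.sadt_lot_id
GROUP BY d.root_id;

SELECT sadt_lot_id, sadt_ids
FROM lot_sadts
ORDER BY sadt_lot_id;
```

On this data the final SELECT returns `1 | {10,11,12}` and `4 | {13}`: the grandchild's sadt 12 lands under root 1, and the empty lot 5 adds nothing to root 4.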

I found the solution. I was using the recursive CTE as a parameter in the join condition, which made PostgreSQL loop over the joined table (sadt) many times. A better approach is to first join with the recursive CTE (the downlots "table") and only then join the result with sadt. With that change, the query went from 8 s to 8 ms.
Here is the solution:
EXPLAIN ANALYZE SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN (WITH RECURSIVE downlots as (
SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
FROM sadt_lot sl1
WHERE sl1.parent_lot_id IS NULL
UNION
SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
FROM sadt_lot sl2
INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)SELECT * FROM downlots) d ON d.sadt_lot_id = sl.sadt_lot_id
LEFT JOIN sadt s ON s.sadt_lot_id = d.root_id
WHERE sl.parent_lot_id IS NULL
GROUP BY sl.sadt_lot_id
ORDER BY sl.sadt_lot_id
Query Plan
Sort (cost=1935.35..1935.56 rows=82 width=36) (actual time=8.230..8.234 rows=82 loops=1)
Sort Key: sl.sadt_lot_id
Sort Method: quicksort Memory: 75kB
-> HashAggregate (cost=1931.72..1932.74 rows=82 width=36) (actual time=8.085..8.197 rows=82 loops=1)
Group Key: sl.sadt_lot_id
-> Hash Right Join (cost=469.73..1839.25 rows=18493 width=8) (actual time=0.328..6.273 rows=10742 loops=1)
Hash Cond: (s.sadt_lot_id = downlots.root_id)
-> Seq Scan on sadt s (cost=0.00..645.78 rows=12678 width=8) (actual time=0.007..1.406 rows=12493 loops=1)
-> Hash (cost=465.72..465.72 rows=321 width=8) (actual time=0.242..0.242 rows=82 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Hash Right Join (cost=432.42..465.72 rows=321 width=8) (actual time=0.049..0.232 rows=82 loops=1)
Hash Cond: (downlots.sadt_lot_id = sl.sadt_lot_id)
-> CTE Scan on downlots (cost=428.41..444.05 rows=782 width=12) (actual time=0.007..0.167 rows=96 loops=1)
CTE downlots
-> Recursive Union (cost=0.00..428.41 rows=782 width=12) (actual time=0.006..0.143 rows=96 loops=1)
-> Seq Scan on sadt_lot sl1 (cost=0.00..2.99 rows=82 width=12) (actual time=0.004..0.018 rows=82 loops=1)
Filter: (parent_lot_id IS NULL)
Rows Removed by Filter: 14
-> Hash Join (cost=4.23..40.98 rows=70 width=12) (actual time=0.030..0.031 rows=5 loops=3)
Hash Cond: (d.sadt_lot_id = sl2.parent_lot_id)
-> WorkTable Scan on downlots d (cost=0.00..16.40 rows=820 width=8) (actual time=0.000..0.002 rows=32 loops=3)
-> Hash (cost=2.99..2.99 rows=99 width=8) (actual time=0.069..0.069 rows=14 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on sadt_lot sl2 (cost=0.00..2.99 rows=99 width=8) (actual time=0.004..0.061 rows=96 loops=1)
-> Hash (cost=2.99..2.99 rows=82 width=4) (actual time=0.039..0.039 rows=82 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on sadt_lot sl (cost=0.00..2.99 rows=82 width=4) (actual time=0.014..0.028 rows=82 loops=1)
Filter: (parent_lot_id IS NULL)
Rows Removed by Filter: 14
Planning time: 0.225 ms
Execution time: 8.300 ms
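The same restructuring can be applied to the question's first query, which sums procedure_performed columns per root lot. The sketch below uses hypothetical minimal temp tables with made-up sample data and keeps the question's output columns except `version` and `sadt_lot_status_id`; the view name `lot_totals` and the epoch-seconds type of `created_at` are assumptions. Also as assumptions not taken from the thread, the recursive term carries `d.root_id` forward and uses UNION ALL, and each root is joined to every lot in its subtree (`d.root_id = s.sadt_lot_id`) before sadt is joined on the descendant's id (`d.sadt_lot_id`), so sadts attached to child lots count toward the root's totals.

```sql
-- Hypothetical minimal versions of the four tables in the question's first query.
CREATE TEMP TABLE sadt_lot (
    sadt_lot_id   int PRIMARY KEY,
    parent_lot_id int REFERENCES sadt_lot (sadt_lot_id),
    name          text,
    created_at    bigint  -- assumed Unix epoch seconds, since the query calls to_timestamp()
);
CREATE TEMP TABLE procedure_auth (
    procedure_auth_id int PRIMARY KEY
);
CREATE TEMP TABLE sadt (
    sadt_id           int PRIMARY KEY,
    sadt_lot_id       int NOT NULL REFERENCES sadt_lot (sadt_lot_id),
    procedure_auth_id int REFERENCES procedure_auth (procedure_auth_id)
);
CREATE TEMP TABLE procedure_performed (
    procedure_performed_id int PRIMARY KEY,
    procedure_auth_id      int NOT NULL REFERENCES procedure_auth (procedure_auth_id),
    amount_requested       int,
    total_value            numeric(12, 2)
);

-- Root 1 has child lot 2; root 3 has no children.
INSERT INTO sadt_lot VALUES
    (1, NULL, 'Lot A',   1609459200),
    (2, 1,    'Lot A.1', 1609545600),
    (3, NULL, 'Lot B',   1612137600);
INSERT INTO procedure_auth VALUES (100), (101), (102);
INSERT INTO sadt VALUES (10, 1, 100), (11, 2, 101), (12, 3, 102);
INSERT INTO procedure_performed VALUES
    (1000, 100, 2, 50.00),
    (1001, 101, 1, 30.00),
    (1002, 101, 3, 20.00),
    (1003, 102, 1, 10.00);

CREATE TEMP VIEW lot_totals AS
WITH RECURSIVE downlots AS (
    SELECT sadt_lot_id, sadt_lot_id AS root_id
    FROM sadt_lot
    WHERE parent_lot_id IS NULL
    UNION ALL
    SELECT child.sadt_lot_id, d.root_id
    FROM sadt_lot child
    JOIN downlots d ON d.sadt_lot_id = child.parent_lot_id
)
SELECT s.sadt_lot_id,
       s.name,
       concat(lpad(s.sadt_lot_id::text, 3, '0'), '-',
              to_char(to_timestamp(s.created_at), 'DDMMYY')) AS sadt_lot_code,
       s.created_at,
       SUM(pp.amount_requested) AS procedures_total,
       SUM(pp.total_value)      AS procedures_total_value
FROM sadt_lot s
-- One row per lot in each root's subtree (the root itself included).
JOIN downlots d ON d.root_id = s.sadt_lot_id
LEFT JOIN sadt ON sadt.sadt_lot_id = d.sadt_lot_id
LEFT JOIN procedure_auth pa ON pa.procedure_auth_id = sadt.procedure_auth_id
LEFT JOIN procedure_performed pp ON pp.procedure_auth_id = pa.procedure_auth_id
WHERE s.parent_lot_id IS NULL
-- Grouping by the primary key lets the other sadt_lot columns be selected.
GROUP BY s.sadt_lot_id;

SELECT * FROM lot_totals ORDER BY created_at DESC;
```

On this data lot 1 totals 6 procedures worth 100.00 (its own sadt plus lot 2's), and lot 3 totals 1 procedure worth 10.00.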
