Postgres query performance improvement - postgresql

I am a newbie to database optimisations,
The table data I have is around 29 million rows,
I am running on Pgadmin to do select * on the rows and it takes 9 seconds.
What can I do to optimize performance?
SELECT
F."Id",
F."Name",
F."Url",
F."CountryModel",
F."RegionModel",
F."CityModel",
F."Street",
F."Phone",
F."PostCode",
F."Images",
F."Rank",
F."CommentCount",
F."PageRank",
F."Description",
F."Properties",
F."IsVerify",
count(*) AS Counter
FROM
public."Firms" F,
LATERAL unnest(F."Properties") AS P
WHERE
F."CountryId" = 1
AND F."RegionId" = 7
AND F."CityId" = 4365
AND P = ANY (ARRAY[126, 128])
AND F."Deleted" = FALSE
GROUP BY
F."Id"
ORDER BY
Counter DESC,
F."IsVerify" DESC,
F."PageRank" DESC OFFSET 10 ROWS FETCH FIRST 20 ROW ONLY
Thats my query plan
" -> Sort (cost=11945.20..11948.15 rows=1178 width=369) (actual time=8981.514..8981.515 rows=30 loops=1)"
" Sort Key: (count(*)) DESC, f.""IsVerify"" DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 58kB"
" -> HashAggregate (cost=11898.63..11910.41 rows=1178 width=369) (actual time=8981.234..8981.305 rows=309 loops=1)"
" Group Key: f.""Id"""
" Batches: 1 Memory Usage: 577kB"
" -> Nested Loop (cost=7050.07..11886.85 rows=2356 width=360) (actual time=79.408..8980.167 rows=322 loops=1)"
" -> Bitmap Heap Scan on ""Firms"" f (cost=7050.06..11716.04 rows=1178 width=360) (actual time=78.414..8909.649 rows=56071 loops=1)"
" Recheck Cond: ((""CityId"" = 4365) AND (""RegionId"" = 7))"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=55330"
" -> BitmapAnd (cost=7050.06..7050.06 rows=1178 width=0) (actual time=70.947..70.947 rows=0 loops=1)"
" -> Bitmap Index Scan on ""IX_Firms_CityId"" (cost=0.00..635.62 rows=58025 width=0) (actual time=11.563..11.563 rows=56072 loops=1)"
" Index Cond: (""CityId"" = 4365)"
" -> Bitmap Index Scan on ""IX_Firms_RegionId"" (cost=0.00..6413.60 rows=588955 width=0) (actual time=57.795..57.795 rows=598278 loops=1)"
" Index Cond: (""RegionId"" = 7)"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=56071)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 2"
"Planning Time: 0.351 ms"
"Execution Time: 8981.725 ms"```

Create a GIN index on F."Properties",
create index on "Firms" using gin ("Properties");
then add a clause to your WHERE
...
AND P = ANY (ARRAY[126, 128])
AND "Properties" && ARRAY[126, 128]
....
That added clause is redundant to the one preceding it, but the planner is not smart enough to reason through that so you need to make it explicit.

Related

query the jsonb array data by specific field condition with OR and ORDER BY constraint really slow

I have a postgresql table named events
Name
Data Type
id
character varying
idx
integer
module
character varying
method
character varying
idx
integer
block_height
integer
data
jsonb
where data is like
["hex_address", 101, 1001, 51637660000, 51324528436, 6003, 235458709417729, 234683610487930] //trade method
["hex_address", 200060013, 1, 250000000000, 3176359357709794, 6006, "0x00000000000000001431f1724a499fcd", 114440794831460] //add method
["hex_address", 200060013, 1, 42658905229340, 407285893749, "0x000000000000000000110204f76c06e2", 6006, 121017475390243, "0x000000000000000013bd9463821aedee"] //remove method
And the table is about 3 million items.The indexes have been created
CREATE INDEX IDX_event_multicolumn_index ON public.events ("module", "method", "block_height") INCLUDE ("id", "idx", "data");
CREATE INDEX event_data_gin_index ON public.events USING gin (data)
When I run the SQL SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved')) OR ((data::jsonb->5 IN ('6002', '6003') AND method IN ('Traded', 'LiquidityAdded')))) ORDER BY block_height DESC, idx DESC LIMIT 500;
it'll take about 1 minute. the explain analyze result(this is another table with less items):
"Limit (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.484..14019.524 rows=100 loops=1)"
" -> Sort (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.477..14019.504 rows=100 loops=1)"
" Sort Key: block_height DESC, idx DESC"
" Sort Method: top-N heapsort Memory: 128kB"
" -> Bitmap Heap Scan on events (cost=114.28..1909.15 rows=5 width=334) (actual time=703.038..13957.503 rows=25625 loops=1)"
" Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Rows Removed by Filter: 9435"
" Heap Blocks: exact=28532"
" -> BitmapOr (cost=114.28..114.28 rows=462 width=0) (actual time=696.569..696.580 rows=0 loops=1)"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..4.59 rows=3 width=0) (actual time=24.375..24.382 rows=896 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..109.69 rows=459 width=0) (actual time=672.191..672.191 rows=34164 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
If I delete the multicolumn index, it does be faster and take about 20 seconds.
"Gather Merge (cost=477713.00..477720.00 rows=60 width=130) (actual time=22151.357..22210.826 rows=79864 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=476712.97..476713.05 rows=30 width=130) (actual time=22090.960..22097.308 rows=26621 loops=3)"
" Sort Key: block_height"
" Sort Method: external merge Disk: 5400kB"
" Worker 0: Sort Method: external merge Disk: 5264kB"
" Worker 1: Sort Method: external merge Disk: 5416kB"
" -> Parallel Seq Scan on events (cost=0.00..476712.24 rows=30 width=130) (actual time=5.151..21985.878 rows=26621 loops=3)"
" Filter: (((module)::text = 'amm'::text) AND ((((data -> 6) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))))"
" Rows Removed by Filter: 2858160"
"Planning Time: 0.559 ms"
"Execution Time: 22217.351 ms"
Then I try to create the specific field multicolumn index:
create EXTENSION btree_gin;
CREATE INDEX extcondindex ON public.events USING gin (((data -> 5)), ((data -> 6)), module, method);
The result is just same with the origin multi column index.
If I remove one of the OR constraint SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved'))) ORDER BY block_height DESC, idx DESC LIMIT 500; It's fast enough and takes about 3 seconds.
I want to know why the multi column index slow down the query, and how should I add the index for the specific field to optimize my query.
The io timing and buffer analyze for the multi-column index table
"Sort (cost=1916.98..1916.99 rows=5 width=142) (actual time=14204.316..14210.109 rows=25683 loops=1)"
" Sort Key: block_height"
" Sort Method: external merge Disk: 5264kB"
" Buffers: shared hit=93 read=31211, temp read=658 written=659"
" I/O Timings: read=13533.823"
" -> Bitmap Heap Scan on events (cost=114.30..1916.92 rows=5 width=142) (actual time=926.714..14156.662 rows=25683 loops=1)"
" Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Rows Removed by Filter: 9503"
" Heap Blocks: exact=28626"
" Buffers: shared hit=90 read=31211"
" I/O Timings: read=13533.823"
" -> BitmapOr (cost=114.30..114.30 rows=464 width=0) (actual time=919.777..919.779 rows=0 loops=1)"
" Buffers: shared hit=27 read=2648"
" I/O Timings: read=892.499"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..4.59 rows=3 width=0) (actual time=32.255..32.255 rows=898 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
" Buffers: shared hit=7 read=72"
" I/O Timings: read=30.095"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..109.71 rows=461 width=0) (actual time=887.519..887.520 rows=34288 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
" Buffers: shared hit=20 read=2576"
" I/O Timings: read=862.404"
"Planning:"
" Buffers: shared hit=230"
"Planning Time: 0.663 ms"
"Execution Time: 14214.038 ms"

Query optimization in Postgres

I have is over 25 million rows, I'm trying to learn what needs to be done to speed up the query time.
"Properties" column is integer[]. I created a GIN index on F."Properties", what is slowing my query and how do I solve it?
SELECT
F. "Id",
F. "Name",
F. "Url",
F. "CountryModel",
F. "IsVerify",
count(*) AS Counter
FROM
public. "Firms" F,
LATERAL unnest(F."Properties") AS P
WHERE
F. "CountryId" = 1
AND P = ANY (ARRAY[126,128])
AND "Properties" && ARRAY[126,128]
AND F. "Deleted" = FALSE
GROUP BY
F. "Id"
ORDER BY
F. "IsVerify" DESC,
Counter DESC,
F. "PageRank" DESC OFFSET 0 ROWS FETCH FIRST 20 ROW ONLY
**Query Explain Analyze**
"Limit (cost=793826.15..793826.20 rows=20 width=100) (actual time=12255.433..12257.874 rows=20 loops=1)"
" -> Sort (cost=793826.15..794287.87 rows=184689 width=100) (actual time=12255.433..12257.872 rows=20 loops=1)"
" Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 29kB"
" -> GroupAggregate (cost=755368.13..788911.64 rows=184689 width=100) (actual time=12062.457..12224.136 rows=201352 loops=1)"
" Group Key: f.""Id"""
" -> Nested Loop (cost=755368.13..785217.86 rows=369378 width=92) (actual time=12062.450..12176.968 rows=205124 loops=1)"
" -> Gather Merge (cost=755368.12..776878.19 rows=184689 width=120) (actual time=12062.435..12090.924 rows=201352 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=754368.09..754560.48 rows=76954 width=120) (actual time=12050.371..12060.077 rows=67117 loops=3)"
" Sort Key: f.""Id"""
" Sort Method: external merge Disk: 9848kB"
" Worker 0: Sort Method: external merge Disk: 9840kB"
" Worker 1: Sort Method: external merge Disk: 9784kB"
" -> Parallel Bitmap Heap Scan on ""Firms"" f (cost=1731.34..743387.12 rows=76954 width=120) (actual time=44.825..12010.247 rows=67117 loops=3)"
" Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
" Rows Removed by Index Recheck: 356198"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=17368 lossy=47419"
" -> Bitmap Index Scan on ix_properties_gin (cost=0.00..1685.17 rows=184689 width=0) (actual time=47.787..47.787 rows=201354 loops=1)"
" Index Cond: (""Properties"" && '{126,128}'::integer[])"
" -> Memoize (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
" Cache Key: f.""Properties"""
" Hits: 179814 Misses: 21538 Evictions: 0 Overflows: 0 Memory Usage: 3076kB"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 6"
"Planning Time: 2.143 ms"
"Execution Time: 12259.445 ms"

PostgreSQL Query performance slower with Text column

We are using PostgreSQL 9.5.2
We have 11 tables with around average of 10K records in each table
One of the table contains text column for which maximum content size is 12K characters.
When we exclude text column from select statement, it comes in around 5 seconds, and when we include text column, it take around 55 seconds. if we select any other column from same table, it works fine, but as soon as we take text column, performance goes on toss.
All tables are inner joined.
Can you please suggest on how to solve this?
Explain output shows 378ms but in real, it take around 1 minute to get these data.
so when we exclude text column from "ic" table, get result in 4-5 seconds.
"Nested Loop Left Join (cost=4.04..156.40 rows=10 width=616) (actual time=3.092..377.128 rows=24118 loops=1)"
" -> Nested Loop Left Join (cost=3.90..59.92 rows=7 width=603) (actual time=2.834..110.842 rows=14325 loops=1)"
" -> Nested Loop Left Join (cost=3.76..58.56 rows=7 width=604) (actual time=2.832..101.481 rows=12340 loops=1)"
" -> Nested Loop (cost=3.62..57.19 rows=7 width=590) (actual time=2.830..90.614 rows=8436 loops=1)"
" Join Filter: (i."Id" = ic."ImId")"
" -> Nested Loop (cost=3.33..51.42 rows=7 width=210) (actual time=2.807..65.782 rows=8436 loops=1)"
" -> Nested Loop (cost=3.19..50.21 rows=7 width=187) (actual time=2.424..54.596 rows=8436 loops=1)"
" -> Nested Loop (cost=2.77..46.16 rows=7 width=175) (actual time=1.944..32.056 rows=8436 loops=1)"
" -> Nested Loop (cost=2.35..23.66 rows=5 width=87) (actual time=1.750..1.877 rows=4 loops=1)"
" -> Hash Join (cost=2.22..22.84 rows=5 width=55) (actual time=1.492..1.605 rows=4 loops=1)"
" Hash Cond: (i."ImtypId" = it."Id")"
" -> Nested Loop (cost=0.84..21.29 rows=34 width=51) (actual time=1.408..1.507 rows=30 loops=1)"
" -> Nested Loop (cost=0.56..9.68 rows=34 width=35) (actual time=1.038..1.053 rows=30 loops=1)"
" -> Index Only Scan using ev_query on "table_Ev" e (cost=0.28..4.29 rows=1 width=31) (actual time=0.523..0.523 rows=1 loops=1)"
" Index Cond: ("Id" = 1301)"
" Heap Fetches: 0"
" -> Index Only Scan using asmitm_query on "table_AsmItm" ai (cost=0.28..5.07 rows=31 width=8) (actual time=0.499..0.508 rows=30 loops=1)"
" Index Cond: (("AsmId" = e."AsmId") AND ("IsActive" = true))"
" Filter: "IsActive""
" Heap Fetches: 0"
" -> Index Only Scan using itm_query on "table_Itm" i (cost=0.28..0.33 rows=1 width=16) (actual time=0.014..0.014 rows=1 loops=30)"
" Index Cond: ("Id" = ai."ImId")"
" Heap Fetches: 0"
" -> Hash (cost=1.33..1.33 rows=4 width=12) (actual time=0.026..0.026 rows=4 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> Seq Scan on "ItmTyp" it (cost=0.00..1.33 rows=4 width=12) (actual time=0.013..0.018 rows=4 loops=1)"
" Filter: ("ParentId" = 12)"
" Rows Removed by Filter: 22"
" -> Index Only Scan using jur_query on "table_Jur" j (cost=0.14..0.15 rows=1 width=36) (actual time=0.065..0.066 rows=1 loops=4)"
" Index Cond: ("Id" = i."JurId")"
" Heap Fetches: 4"
" -> Index Scan using pwsres_evid_ImId_canid_query on "table_PwsRes" p (cost=0.42..3.78 rows=72 width=92) (actual time=0.056..6.562 rows=2109 loops=4)"
" Index Cond: (("EvId" = 1301) AND ("ImId" = i."Id"))"
" -> Index Only Scan using user_query on "table_User" u (cost=0.42..0.57 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=8436)"
" Index Cond: ("Id" = p."CanId")"
" Heap Fetches: 0"
" -> Index Only Scan using ins_query on "table_Ins" ins (cost=0.14..0.16 rows=1 width=31) (actual time=0.001..0.001 rows=1 loops=8436)"
" Index Cond: ("Id" = u."InsId")"
" Heap Fetches: 0"
" -> Index Scan using "IX_ItmCont_ImId" on "table_ItmCont" ic (cost=0.29..0.81 rows=1 width=392) (actual time=0.002..0.002 rows=1 loops=8436)"
" Index Cond: ("ImId" = p."ImId")"
" Filter: ("ContTyp" = 'CP'::text)"
" Rows Removed by Filter: 1"
" -> Index Scan using "IX_FreDetail_FreId" on "table_FreDetail" f (cost=0.14..0.18 rows=2 width=22) (actual time=0.000..0.001 rows=1 loops=8436)"
" Index Cond: ("FreId" = p."FreId")"
" -> Index Scan using "IX_DurDetail_DurId" on "table_DurDetail" d (cost=0.14..0.17 rows=2 width=7) (actual time=0.000..0.000 rows=0 loops=12340)"
" Index Cond: ("DurId" = p."DurId")"
" -> Index Scan using "IX_DruConsRouteDetail_DruConsRouId" on "table_DruConsRouDetail" dr (cost=0.14..0.18 rows=2 width=21) (actual time=0.001..0.001 rows=1 loops=14325)"
" Index Cond: ("DruConsRouteId" = p."RouteId")"
" SubPlan 1"
" -> Index Only Scan using asm_query on "table_Asm" (cost=0.14..8.16 rows=1 width=26) (actual time=0.001..0.001 rows=1 loops=24118)"
" Index Cond: ("Id" = e."AsmId")"
" Heap Fetches: 24118"
" SubPlan 2"
" -> Seq Scan on "ItmTyp" ity (cost=0.00..1.33 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=24118)"
" Filter: ("Id" = it."ParentId")"
" Rows Removed by Filter: 25"
"Planning time: 47.056 ms"
"Execution time: 378.229 ms"
If the explain analyze output is taking 378ms, that is how long the query is taking and there's probably not a lot of room for improvement there. If it's taking 1 minute to transfer and load the data, you need to work on that end.
If you're trying to view very wide rows in psql or pgadmin, it can take some time to calculate the row widths or render the html, but that has nothing to do with query performance.

Optimisation on postgres query

I am looking for optimization suggestions for the below query on postgres. Not a DBA so looking for some expert advice in here.
Devices table holds device_id which are hexadecimal.
To achieve high throughput we run 6 instances of this query in parallel with pattern matching for device_id
beginning with [0-2], [3-5], [6-9], [a-c], [d-f]
When we run just one instance of the query it works fine, but with 6 instances we get error -
[6669]:FATAL: connection to client lost
explain analyze select notifications.id, notifications.status, events.alert_type,
events.id as event_id, events.payload, notifications.device_id as device_id,
device_endpoints.region, device_endpoints.device_endpoint as endpoint
from notifications
inner join events
on notifications.event_id = events.id
inner join devices
on notifications.device_id = devices.id
inner join device_endpoints
on devices.id = device_endpoints.device_id
where notifications.status = 'pending' AND notifications.region = 'ap-southeast-2'
AND devices.device_id ~ '[0-9a-f].*'
limit 10000;
Output of explain analyse
"Limit (cost=25.62..1349.23 rows=206 width=202) (actual time=0.359..0.359 rows=0 loops=1)"
" -> Nested Loop (cost=25.62..1349.23 rows=206 width=202) (actual time=0.357..0.357 rows=0 loops=1)"
" Join Filter: (notifications.device_id = devices.id)"
" -> Nested Loop (cost=25.33..1258.73 rows=206 width=206) (actual time=0.357..0.357 rows=0 loops=1)"
" -> Hash Join (cost=25.04..61.32 rows=206 width=52) (actual time=0.043..0.172 rows=193 loops=1)"
" Hash Cond: (notifications.event_id = events.id)"
" -> Index Scan using idx_notifications_status on notifications (cost=0.42..33.87 rows=206 width=16) (actual time=0.013..0.100 rows=193 loops=1)"
" Index Cond: (status = 'pending'::notification_status)"
" Filter: (region = 'ap-southeast-2'::text)"
" -> Hash (cost=16.50..16.50 rows=650 width=40) (actual time=0.022..0.022 rows=34 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 14kB"
" -> Seq Scan on events (cost=0.00..16.50 rows=650 width=40) (actual time=0.005..0.014 rows=34 loops=1)"
" -> Index Scan using idx_device_endpoints_device_id on device_endpoints (cost=0.29..5.80 rows=1 width=154) (actual time=0.001..0.001 rows=0 loops=193)"
" Index Cond: (device_id = notifications.device_id)"
" -> Index Scan using devices_pkey on devices (cost=0.29..0.43 rows=1 width=4) (never executed)"
" Index Cond: (id = device_endpoints.device_id)"
" Filter: (device_id ~ '[0-9a-f].*'::text)"
"Planning time: 0.693 ms"
"Execution time: 0.404 ms"

Postgres Explain Plans Are DIfferent for same query with different value

I have databases running Postgres 9.56 on heroku.
I'm running the following SQL with different parameter value, but getting very different results in the performance.
Query 1
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '278'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
Query 2
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '150'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
Only Difference is t.route_id value.
So, I tried running explain and give me very different result.
For Query 1
"GroupAggregate (cost=279335.17..279335.19 rows=1 width=298)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=279335.17..279335.17 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> Nested Loop (cost=0.00..279335.16 rows=1 width=298)"
" Join Filter: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=1 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '278'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
For Query 2
"Sort (cost=278183.94..278183.95 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> HashAggregate (cost=278183.92..278183.93 rows=1 width=298)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=5951.97..278183.88 rows=7 width=298)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" -> Hash (cost=5951.88..5951.88 rows=7 width=12)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=7 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '150'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
My question is why and how can i make it same? Because first Query give me very bad performance
Query 1 Analyze
"GroupAggregate (cost=274051.28..274051.31 rows=1 width=8) (actual time=904682.606..904684.283 rows=7 loops=1)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=274051.28..274051.29 rows=1 width=8) (actual time=904682.432..904682.917 rows=13520 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 1018kB"
" -> Nested Loop (cost=0.42..274051.27 rows=1 width=8) (actual time=1133.925..904676.254 rows=13520 loops=1)"
" Join Filter: (s.trip_id = t.trip_id)"
" Rows Removed by Join Filter: 42505528"
" -> Index Scan using tk_trip_route_id_idx on tk_trip t (cost=0.42..651.34 rows=1 width=12) (actual time=0.020..2.720 rows=338 loops=1)"
" Index Cond: (route_id = '278'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 28"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.071..2662.102 rows=125796 loops=338)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
"Planning time: 1.172 ms"
"Execution time: 904684.570 ms"
Query 2 Analyze
"Sort (cost=275018.88..275018.89 rows=1 width=8) (actual time=2153.843..2153.843 rows=9 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 25kB"
" -> HashAggregate (cost=275018.86..275018.87 rows=1 width=8) (actual time=2153.833..2153.834 rows=9 loops=1)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=2797.67..275018.82 rows=7 width=8) (actual time=2.472..2147.093 rows=36565 loops=1)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.127..2116.153 rows=125796 loops=1)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
" -> Hash (cost=2797.58..2797.58 rows=7 width=12) (actual time=1.853..1.853 rows=1430 loops=1)"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 78kB"
" -> Bitmap Heap Scan on tk_trip t (cost=32.21..2797.58 rows=7 width=12) (actual time=0.176..1.559 rows=1430 loops=1)"
" Recheck Cond: (route_id = '150'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 33"
" Heap Blocks: exact=333"
" -> Bitmap Index Scan on tk_trip_route_id_idx (cost=0.00..32.21 rows=1572 width=0) (actual time=0.131..0.131 rows=1463 loops=1)"
" Index Cond: (route_id = '150'::bigint)"
"Planning time: 0.211 ms"
"Execution time: 2153.972 ms"
You can - possibly - make them the same if you hint postgres to not use Nested Loops:
SET enable_nestloop = 'off';
You can make it permanent by setting it to either server, role, inside function definition or server configuration:
ALTER DATABASE postgres
SET enable_nestloop = 'off';
ALTER ROLE lkaminski
SET enable_nestloop = 'off';
CREATE FUNCTION add(integer, integer) RETURNS integer
AS 'select $1 + $2;'
LANGUAGE SQL
SET enable_nestloop = 'off'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
As for why - you change search condition and planner estimates that from tk_trip he will get 1 row instead of 7, so it changes plan because it seems like Nested Loop will be better. Sometimes it is wrong about those and you might get slower execution time. But if you "force" it to not use Nested Loops then for different parameter it could be slower to use second plan instead of first one (with Nested Loop).
You can make planner estimates more accurate by increasing how much statistics it gathers per column. It might help.
ALTER TABLE tk_trip ALTER COLUMN route_id SET STATISTICS 1000;
As a side note - your LEFT JOIN is actually INNER JOIN, because you have put conditions for that table inside WHERE instead of ON. You should get different plan (and result) if you move them over to ON - assuming you wanted LEFT JOIN.