PostgreSQL LEFT OUTER JOIN and performance

I have two tables: wells(id, name, extra_name) and geodesies(id, well_id, plot).
For these two tables I run this query:
EXPLAIN ANALYZE
SELECT
wells.name, geodesies.plot
FROM "geodesies" LEFT OUTER JOIN "wells" ON "wells"."id" = "geodesies"."well_id"
ORDER BY LOWER("wells"."name_nso"), "wells"."extra_name"
LIMIT 10;
Output:
"Limit (cost=1146.27..1146.29 rows=10 width=58) (actual time=64.482..64.488 rows=10 loops=1)"
" -> Sort (cost=1146.27..1176.83 rows=12225 width=58) (actual time=64.480..64.484 rows=10 loops=1)"
" Sort Key: (lower(wells.name_nso)), wells.extra_name"
" Sort Method: top-N heapsort Memory: 27kB"
" -> Hash Left Join (cost=568.17..882.09 rows=12225 width=58) (actual time=11.214..56.280 rows=12225 loops=1)"
" Hash Cond: (geodesies.well_id = wells.id)"
" -> Seq Scan on geodesies (cost=0.00..251.25 rows=12225 width=23) (actual time=0.017..5.533 rows=12225 loops=1)"
" -> Hash (cost=415.30..415.30 rows=12230 width=118) (actual time=11.126..11.127 rows=12230 loops=1)"
" Buckets: 16384 Batches: 1 Memory Usage: 1848kB"
" -> Seq Scan on wells (cost=0.00..415.30 rows=12230 width=118) (actual time=0.009..5.611 rows=12230 loops=1)"
"Planning Time: 0.804 ms"
"Execution Time: 64.544 ms"
This query does not use any index.
If I remove the ORDER BY from the query:
EXPLAIN ANALYZE
SELECT
wells.name, geodesies.plot
FROM "geodesies" LEFT OUTER JOIN "wells" ON "wells"."id" = "geodesies"."well_id"
LIMIT 10;
it uses the indexes, and the output looks like:
"Limit (cost=0.57..2.86 rows=10 width=19) (actual time=0.042..0.146 rows=10 loops=1)"
" -> Merge Left Join (cost=0.57..2794.76 rows=12225 width=19) (actual time=0.040..0.142 rows=10 loops=1)"
" Merge Cond: (geodesies.well_id = wells.id)"
" -> Index Scan using index_geodesies_on_well_id on geodesies (cost=0.29..979.64 rows=12225 width=23) (actual time=0.023..0.056 rows=10 loops=1)"
" -> Index Scan using wells_pkey on wells (cost=0.29..1631.73 rows=12230 width=28) (actual time=0.013..0.069 rows=10 loops=1)"
"Planning Time: 0.654 ms"
"Execution Time: 0.293 ms"
How can I speed up the query with the ORDER BY clause?
Regards
PostgreSQL 13
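One option worth testing, sketched under assumptions rather than verified against this schema: the sort key is LOWER(wells.name_nso), wells.extra_name, so an expression index on exactly those expressions lets PostgreSQL read wells already in the requested order. Because the query is geodesies LEFT JOIN wells, the planner has to keep every geodesies row and cannot drive the join from wells. If every geodesies.well_id is known to reference an existing well (an assumption about the data, not something stated in the question), an inner join returns the same rows and allows a nested loop that can stop after 10 rows. The index name below is made up.

```sql
-- Hypothetical index name; the indexed expressions must match the ORDER BY exactly.
CREATE INDEX index_wells_on_lower_name_nso_and_extra_name
    ON wells (LOWER(name_nso), extra_name);

-- Refresh statistics so the planner knows about the indexed expression.
ANALYZE wells;

-- ASSUMPTION: every geodesies.well_id matches a row in wells,
-- so INNER JOIN returns the same rows as the original LEFT OUTER JOIN.
EXPLAIN ANALYZE
SELECT wells.name, geodesies.plot
FROM geodesies
INNER JOIN wells ON wells.id = geodesies.well_id
ORDER BY LOWER(wells.name_nso), wells.extra_name
LIMIT 10;
```

Whether the planner actually picks the index scan depends on its cost estimates, so compare the EXPLAIN ANALYZE output before and after creating the index.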

Related

PostgreSQL Calls All Data For Group By Limit Operation

I have a query like below:
SELECT
MAX(m.org_id) as orgId,
MAX(m.org_name) as orgName,
MAX(m.app_id) as appId,
MAX(r.country_or_region) as country,
MAX(r.local_spend_currency) as currency,
SUM(r.local_spend_amount) as spend,
SUM(r.impressions) as impressions
...
FROM report r
LEFT JOIN metadata m
ON m.org_id = r.org_id
AND m.campaign_id = r.campaign_id
AND m.ad_group_id = r.ad_group_id
WHERE (r.report_date BETWEEN '2019-01-01' AND '2019-10-10')
AND r.org_id = 1
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text
OFFSET 0
LIMIT 20
Explain Analyze:
"Limit (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.538..267487.067 rows=20 loops=1)"
" -> GroupAggregate (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.537..267487.061 rows=20 loops=1)"
" Group Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
" -> Sort (cost=1308.04..1308.05 rows=1 width=221) (actual time=267486.429..267486.536 rows=567 loops=1)"
" Sort Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
" Sort Method: external merge Disk: 667552kB"
" -> Nested Loop (cost=1.13..1308.03 rows=1 width=221) (actual time=0.029..235158.692 rows=2742789 loops=1)"
" -> Nested Loop Semi Join (cost=0.44..89.76 rows=1 width=127) (actual time=0.016..8.967 rows=1506 loops=1)"
" Join Filter: (m.org_id = (479360))"
" -> Nested Loop (cost=0.44..89.05 rows=46 width=123) (actual time=0.013..4.491 rows=1506 loops=1)"
" -> HashAggregate (cost=0.02..0.03 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)"
" Group Key: 479360"
" -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)"
" -> Index Scan using pmx_org_cmp_adg on metadata m (cost=0.41..88.55 rows=46 width=119) (actual time=0.008..1.947 rows=1506 loops=1)"
" Index Cond: (org_id = (479360))"
" -> Materialize (cost=0.00..0.03 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1506)"
" -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)"
" -> Index Scan using report_unx on search_term_report r (cost=0.69..1218.26 rows=1 width=118) (actual time=51.983..155.421 rows=1821 loops=1506)"
" Index Cond: ((org_id = m.org_id) AND (report_date >= '2019-07-01'::date) AND (report_date <= '2019-10-10'::date) AND (campaign_id = m.campaign_id) AND (ad_group_id = m.ad_group_id))"
"Planning Time: 0.988 ms"
"Execution Time: 267937.889 ms"
I have indexes on the metadata and report tables: metadata(org_id, campaign_id, ad_group_id); report(org_id, report_date, campaign_id, ad_group_id)
I just want to fetch 20 arbitrary items using LIMIT, but PostgreSQL takes a very long time to return them. How can I improve it?
You want 20 groups. But to build these groups (and be sure nothing is missing from any of them), PostgreSQL has to fetch all the raw data first.
When you say "random items", I assume you mean "random reports", as you have no item table.
with r as (select * from report WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10' AND org_id = 1 order by random() limit 20)
select <whatever> from r left join <whatever>
You might need to tweak your aggregates a bit. Does every record in "metadata" belong to exactly one record in "report"?
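A fuller version of the sketch above, under the same assumption that 20 random report rows (not 20 groups) are what is wanted. The CTE name picked and the choice of output columns are illustrative only. Note that ORDER BY random() still reads every report row matching the filter; what it avoids is joining and sorting all of them.

```sql
-- Pick 20 random report rows first, then join only those rows to metadata.
WITH picked AS (
    SELECT *
    FROM report
    WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10'
      AND org_id = 1
    ORDER BY random()
    LIMIT 20
)
SELECT m.org_id               AS orgId,
       m.org_name             AS orgName,
       m.app_id               AS appId,
       p.country_or_region    AS country,
       p.local_spend_currency AS currency,
       p.local_spend_amount   AS spend,
       p.impressions          AS impressions
FROM picked p
LEFT JOIN metadata m
       ON m.org_id      = p.org_id
      AND m.campaign_id = p.campaign_id
      AND m.ad_group_id = p.ad_group_id;
```

If metadata can hold more than one row per (org_id, campaign_id, ad_group_id), this returns more than 20 rows, which is why the question about the relationship between the two tables matters.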

Optimisation on postgres query

I am looking for optimization suggestions for the query below on Postgres. I am not a DBA, so I am looking for some expert advice here.
The devices table holds device_id values, which are hexadecimal.
To achieve high throughput we run 6 instances of this query in parallel with pattern matching for device_id
beginning with [0-2], [3-5], [6-9], [a-c], [d-f]
When we run just one instance of the query it works fine, but with 6 instances we get the error:
[6669]:FATAL: connection to client lost
explain analyze select notifications.id, notifications.status, events.alert_type,
events.id as event_id, events.payload, notifications.device_id as device_id,
device_endpoints.region, device_endpoints.device_endpoint as endpoint
from notifications
inner join events
on notifications.event_id = events.id
inner join devices
on notifications.device_id = devices.id
inner join device_endpoints
on devices.id = device_endpoints.device_id
where notifications.status = 'pending' AND notifications.region = 'ap-southeast-2'
AND devices.device_id ~ '[0-9a-f].*'
limit 10000;
Output of EXPLAIN ANALYZE:
"Limit (cost=25.62..1349.23 rows=206 width=202) (actual time=0.359..0.359 rows=0 loops=1)"
" -> Nested Loop (cost=25.62..1349.23 rows=206 width=202) (actual time=0.357..0.357 rows=0 loops=1)"
" Join Filter: (notifications.device_id = devices.id)"
" -> Nested Loop (cost=25.33..1258.73 rows=206 width=206) (actual time=0.357..0.357 rows=0 loops=1)"
" -> Hash Join (cost=25.04..61.32 rows=206 width=52) (actual time=0.043..0.172 rows=193 loops=1)"
" Hash Cond: (notifications.event_id = events.id)"
" -> Index Scan using idx_notifications_status on notifications (cost=0.42..33.87 rows=206 width=16) (actual time=0.013..0.100 rows=193 loops=1)"
" Index Cond: (status = 'pending'::notification_status)"
" Filter: (region = 'ap-southeast-2'::text)"
" -> Hash (cost=16.50..16.50 rows=650 width=40) (actual time=0.022..0.022 rows=34 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 14kB"
" -> Seq Scan on events (cost=0.00..16.50 rows=650 width=40) (actual time=0.005..0.014 rows=34 loops=1)"
" -> Index Scan using idx_device_endpoints_device_id on device_endpoints (cost=0.29..5.80 rows=1 width=154) (actual time=0.001..0.001 rows=0 loops=193)"
" Index Cond: (device_id = notifications.device_id)"
" -> Index Scan using devices_pkey on devices (cost=0.29..0.43 rows=1 width=4) (never executed)"
" Index Cond: (id = device_endpoints.device_id)"
" Filter: (device_id ~ '[0-9a-f].*'::text)"
"Planning time: 0.693 ms"
"Execution time: 0.404 ms"

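A side note on the pattern, sketched under assumptions: ~ '[0-9a-f].*' is unanchored, so it matches any device_id that contains a hex digit anywhere, and therefore does not split the work between instances. If each instance is meant to handle device_ids beginning with one range, the pattern needs a ^ anchor. This assumes device_id is stored as lowercase text, and it addresses only the partitioning, not necessarily the "connection to client lost" error.

```sql
-- Instance handling device_ids that begin with 0, 1 or 2.
-- The other instances would use '^[3-5]', '^[6-9]', '^[a-c]' and '^[d-f]'.
SELECT notifications.id, notifications.status, events.alert_type,
       events.id AS event_id, events.payload,
       notifications.device_id AS device_id,
       device_endpoints.region, device_endpoints.device_endpoint AS endpoint
FROM notifications
INNER JOIN events           ON notifications.event_id = events.id
INNER JOIN devices          ON notifications.device_id = devices.id
INNER JOIN device_endpoints ON devices.id = device_endpoints.device_id
WHERE notifications.status = 'pending'
  AND notifications.region = 'ap-southeast-2'
  AND devices.device_id ~ '^[0-2]'   -- anchored: first character only
LIMIT 10000;
```
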
Slow query ordering by a column in a joined table in postgresql

How can I optimize this query?
I tried increasing work_mem and creating an index on name, but it did not help.
EXPLAIN ANALYZE
SELECT b.* FROM book b
JOIN category c ON c.id = b.categoryid
ORDER BY c.name, b.name
LIMIT 20 OFFSET 1
"Limit (cost=328.82..328.87 rows=20 width=207) (actual time=11.942..11.955 rows=20 loops=1)"
" -> Sort (cost=328.81..341.64 rows=5132 width=207) (actual time=11.940..11.944 rows=21 loops=1)"
" Sort Key: c.name, b.name"
" Sort Method: top-N heapsort Memory: 34kB"
" -> Hash Join (cost=10.37..190.45 rows=5132 width=207) (actual time=0.143..4.963 rows=5132 loops=1)"
" Hash Cond: (b.categoryid = c.id)"
" -> Seq Scan on book b (cost=0.00..166.32 rows=5132 width=196) (actual time=0.007..2.070 rows=5132 loops=1)"
" -> Hash (cost=7.94..7.94 rows=194 width=27) (actual time=0.129..0.129 rows=194 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 20kB"
" -> Seq Scan on category c (cost=0.00..7.94 rows=194 width=27) (actual time=0.002..0.061 rows=194 loops=1)"
"Planning time: 0.283 ms"
"Execution time: 11.999 ms"

Performance degrade while fetching it from views PostgreSQL

I am running this query and getting poor performance. We fetch the data from a view, but somehow it performs poorly.
This is the EXPLAIN ANALYZE output:
"Aggregate (cost=387.95..387.96 rows=1 width=0) (actual time=0.561..0.561 rows=1 loops=1)"
" -> Unique (cost=387.95..387.95 rows=1 width=36) (actual time=0.558..0.558 rows=0 loops=1)"
" -> Sort (cost=387.95..387.95 rows=1 width=36) (actual time=0.558..0.558 rows=0 loops=1)"
" Sort Key: at.id, at.cid, at.created_at, ps.channel"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=15.89..387.94 rows=1 width=36) (actual time=0.525..0.525 rows=0 loops=1)"
" -> Hash Join (cost=15.78..269.20 rows=56 width=108) (actual time=0.212..0.347 rows=11 loops=1)"
" Hash Cond: (at."LV" = br.id)"
" -> Nested Loop (cost=8.47..261.68 rows=56 width=105) (actual time=0.078..0.209 rows=11 loops=1)"
" Join Filter: (at."aRR" = ar.id)"
" Rows Removed by Join Filter: 11"
" -> Hash Join (cost=8.47..260.00 rows=56 width=89) (actual time=0.071..0.196 rows=11 loops=1)"
" Hash Cond: (at."Type" = at.id)"
" -> Nested Loop (cost=6.28..257.60 rows=56 width=90) (actual time=0.043..0.161 rows=11 loops=1)"
" Join Filter: (at."Src" = sa.id)"
" Rows Removed by Join Filter: 231"
" -> Bitmap Heap Scan on at (cost=6.28..252.88 rows=67 width=94) (actual time=0.026..0.109 rows=11 loops=1)"
" Recheck Cond: (created_at > '2018-01-05 11:33:28'::timestamp without time zone)"
" Filter: (status = 't'::text)"
" Heap Blocks: exact=11"
" -> Bitmap Index Scan on created_date_ids (cost=0.00..6.28 rows=128 width=0) (actual time=0.011..0.011 rows=12 loops=1)"
" Index Cond: (created_at > '2018-01-05 11:33:28'::timestamp without time zone)"
" -> Materialize (cost=0.00..2.04 rows=10 width=28) (actual time=0.001..0.002 rows=22 loops=11)"
" -> Seq Scan on sa (cost=0.00..2.03 rows=10 width=28) (actual time=0.002..0.006 rows=22 loops=1)"
" -> Hash (cost=2.09..2.09 rows=29 width=31) (actual time=0.018..0.018 rows=30 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 10kB"
" -> Seq Scan on at (cost=0.00..2.09 rows=29 width=31) (actual time=0.005..0.010 rows=30 loops=1)"
" -> Materialize (cost=0.00..1.01 rows=3 width=48) (actual time=0.000..0.000 rows=2 loops=11)"
" -> Seq Scan on ar (cost=0.00..1.01 rows=3 width=48) (actual time=0.002..0.002 rows=2 loops=1)"
" -> Hash (cost=6.06..6.06 rows=355 width=35) (actual time=0.122..0.122 rows=370 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 33kB"
" -> Seq Scan on br (cost=0.00..6.06 rows=355 width=35) (actual time=0.006..0.048 rows=370 loops=1)"
" -> Index Only Scan using prs_Src_application_at_activit_key on prs ps (cost=0.11..2.12 rows=1 width=63) (actual time=0.015..0.015 rows=0 loops=11)"
" Index Cond: ((Src_application = (sa."Name")::text) AND (at = (at."Name")::text) AND (aRR = (ar."Name")::text) AND (LV = (br."Name")::text))"
" Filter: (btrim((channel)::text) = 'V'::text)"
" Rows Removed by Filter: 1"
" Heap Fetches: 0"
"Planning time: 7.735 ms"
"Execution time: 0.721 ms"
Our view looks like:
SELECT DISTINCT at.id,
at.cid,
at.created_at,
at.status,
ps.channel
FROM at
JOIN sa ON sa.id = at."Src"
JOIN at ON at.id = at."Type"
JOIN ar ON ar.id = at."aRR"
JOIN br ON br.id = at."LV"
JOIN prs ps ON ps.aRR::text = ar."Name"::text AND ps.at::text = at."Name"::text AND ps.LV::text = br."Name"::text AND ps.Src_application::text = sa."Name"::text
WHERE at.status = 't'::text and
trim(ps.channel)= 'V' and at.created_at > '2018-01-05 11:33:28'
This query is taking too much time. How can I improve its performance?

Query without LIMIT works faster than query with LIMIT

Why does the same query run slower with LIMIT 100 than without it? The two queries run against the same database, and the result set has fewer than 100 rows.
The original query was generated by Hibernate and had some extra joins. Based on the feedback I got, I simplified the query and ran
VACUUM FULL ANALYZE events
VACUUM FULL ANALYZE resources
But the problem still exists.
Thanks!
explain ANALYZE
SELECT e.id
FROM events e,
resources r
WHERE e.resource_id = r.id
AND (resource_type_id = '19872817' OR resource_type_id = '282')
ORDER BY occurrence_date DESC LIMIT 100
outputs...
"Limit (cost=0.98..86362.46 rows=100 width=12) (actual time=61958.090..185854.425 rows=22 loops=1)"
" -> Nested Loop (cost=0.98..16791263.94 rows=19443 width=12) (actual time=61958.087..185854.392 rows=22 loops=1)"
" -> Index Scan using eventoccurrencedateindex on events e (cost=0.56..2295556.29 rows=31819630 width=16) (actual time=0.028..31770.948 rows=31819491 loops=1)"
" -> Index Scan using resources_pkey on resources r (cost=0.42..0.45 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=31819491)"
" Index Cond: (id = e.resource_id)"
" Filter: ((resource_type_id = 19872817) OR (resource_type_id = 282))"
" Rows Removed by Filter: 1"
"Total runtime: 185854.569 ms"
and
explain ANALYZE
SELECT e.id
FROM events e,
resources r
WHERE e.resource_id = r.id
AND (resource_type_id = '19872817' OR resource_type_id = '282')
ORDER BY occurrence_date DESC
outputs...
"Sort (cost=455353.69..455402.30 rows=19443 width=12) (actual time=1.942..1.947 rows=22 loops=1)"
" Sort Key: e.occurrence_date"
" Sort Method: quicksort Memory: 26kB"
" -> Nested Loop (cost=42.30..453968.67 rows=19443 width=12) (actual time=0.720..1.900 rows=22 loops=1)"
" -> Bitmap Heap Scan on resources r (cost=9.53..309.53 rows=86 width=4) (actual time=0.120..0.306 rows=34 loops=1)"
" Recheck Cond: ((resource_type_id = 19872817) OR (resource_type_id = 282))"
" -> BitmapOr (cost=9.53..9.53 rows=86 width=0) (actual time=0.109..0.109 rows=0 loops=1)"
" -> Bitmap Index Scan on resources_type_fk_index (cost=0.00..4.74 rows=43 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
" Index Cond: (resource_type_id = 19872817)"
" -> Bitmap Index Scan on resources_type_fk_index (cost=0.00..4.74 rows=43 width=0) (actual time=0.092..0.092 rows=34 loops=1)"
" Index Cond: (resource_type_id = 282)"
" -> Bitmap Heap Scan on events e (cost=32.78..5259.29 rows=1582 width=16) (actual time=0.041..0.043 rows=1 loops=34)"
" Recheck Cond: (resource_id = r.id)"
" -> Bitmap Index Scan on events_resource_fk_index (cost=0.00..32.38 rows=1582 width=0) (actual time=0.037..0.037 rows=1 loops=34)"
" Index Cond: (resource_id = r.id)"
"Total runtime: 2.054 ms"
Increasing the limit size to 1000 caused Postgres to use a different plan which worked much faster.
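A commonly used workaround, sketched here rather than tested against this schema: with LIMIT 100 the planner expects to find 100 matching rows quickly by walking eventoccurrencedateindex, but its estimate (19443 rows) is far above the 22 rows that actually match, so it ends up scanning all 31.8 million events. Putting the join in a subquery with OFFSET 0 keeps the planner from flattening it, so the join is planned for full retrieval as in the fast query, and only the small result is sorted and limited. The subquery alias matched is made up.

```sql
-- OFFSET 0 changes no results; it only stops the subquery from being
-- merged into the outer query, so ORDER BY ... LIMIT is applied afterwards.
EXPLAIN ANALYZE
SELECT id
FROM (
    SELECT e.id, e.occurrence_date
    FROM events e
    JOIN resources r ON e.resource_id = r.id
    WHERE r.resource_type_id IN (19872817, 282)
    OFFSET 0
) AS matched
ORDER BY occurrence_date DESC
LIMIT 100;
```

This trades planner freedom for predictability: if the filter ever matches a very large number of rows, the subquery will produce all of them before the limit is applied.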