I have a query that I cannot optimize:
SELECT evse_label AS evse_label,
evc_model AS evc_model,
address AS address,
city AS city,
message AS message,
reason AS reason,
min(timestamp) AS oldest_timestamp,
max(timestamp) AS latest_timestamp,
count(message) AS "# messages"
FROM public.om_logbook_mart
GROUP BY evse_label,
evc_model,
address,
city,
message,
reason
ORDER BY oldest_timestamp DESC
LIMIT 10000
I tried to create the following index:
CREATE INDEX evselabel_evc_model_address_city_reason_message_timestamp_idx
ON om_logbook_mart (evse_label,evc_model,address,city,message,reason,timestamp);
Unfortunately nothing changes and the query still takes around 40s.
Running EXPLAIN ANALYZE, this is the output:
Limit (cost=1184359.89..1184384.89 rows=10000 width=95) (actual time=39395.145..39451.451 rows=10000 loops=1)
-> Sort (cost=1184359.89..1186097.37 rows=694989 width=95) (actual time=39394.488..39449.924 rows=10000 loops=1)
Sort Key: (min("timestamp")) DESC
Sort Method: top-N heapsort Memory: 3375kB
-> Finalize GroupAggregate (cost=856703.95..1134710.88 rows=694989 width=95) (actual time=30343.111..39339.675 rows=97445 loops=1)
Group Key: evse_label, evc_model, address, city, message, reason
-> Gather Merge (cost=856703.95..1096486.49 rows=1389978 width=95) (actual time=30343.062..39141.342 rows=231382 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=855703.92..935048.51 rows=694989 width=95) (actual time=29867.817..37735.438 rows=77127 loops=3)
Group Key: evse_label, evc_model, address, city, message, reason
-> Sort (cost=855703.92..862943.39 rows=2895788 width=79) (actual time=29867.788..35909.736 rows=2317226 loops=3)
Sort Key: evse_label, evc_model, address, city, message, reason
Sort Method: external merge Disk: 215024kB
Worker 0: Sort Method: external merge Disk: 203704kB
Worker 1: Sort Method: external merge Disk: 219048kB
-> Parallel Seq Scan on om_logbook_mart (cost=0.00..287564.88 rows=2895788 width=79) (actual time=0.033..1852.787 rows=2317226 loops=3)
Planning Time: 1.285 ms
Execution Time: 39478.630 ms
Can you help me understand how to optimize it?
UPDATE: After setting work_mem to 500MB:
Limit (cost=543440.12..543465.12 rows=10000 width=95) (actual time=6466.267..6468.723 rows=10000 loops=1)
-> Sort (cost=543440.12..545184.77 rows=697860 width=95) (actual time=6466.265..6467.809 rows=10000 loops=1)
Sort Key: (min("timestamp")) DESC
Sort Method: top-N heapsort Memory: 3646kB
-> HashAggregate (cost=486607.40..493586.00 rows=697860 width=95) (actual time=6384.249..6425.960 rows=97403 loops=1)
Group Key: evse_label, evc_model, address, city, message, reason
Batches: 1 Memory Usage: 49169kB
-> Seq Scan on om_logbook_mart (cost=0.00..329588.97 rows=6978597 width=79) (actual time=0.040..2362.520 rows=6978597 loops=1)
Planning Time: 0.135 ms
Execution Time: 6489.381 ms
Can I do anything more?
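For what it's worth, if the figures only need to be refreshed periodically rather than computed live, pre-aggregating once and reading the result back is another route. This is only a sketch, assuming the table and columns from the query above; the materialized view name om_logbook_mart_agg is made up:
-- Hypothetical pre-aggregation of the ~7M-row scan; refresh it on whatever
-- schedule the report can tolerate.
CREATE MATERIALIZED VIEW om_logbook_mart_agg AS
SELECT evse_label,
       evc_model,
       address,
       city,
       message,
       reason,
       min(timestamp) AS oldest_timestamp,
       max(timestamp) AS latest_timestamp,
       count(message) AS "# messages"
FROM public.om_logbook_mart
GROUP BY evse_label, evc_model, address, city, message, reason;

-- An index on the sort key lets the ORDER BY ... LIMIT 10000 read the view top-down.
CREATE INDEX ON om_logbook_mart_agg (oldest_timestamp DESC);

-- Re-run after each load (or on a schedule) to pick up new rows.
REFRESH MATERIALIZED VIEW om_logbook_mart_agg;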
PostgreSQL 14.6 on x86_64-pc-linux-gnu, compiled by gcc, a 12d889e8a6 p ce8d8b4729, 64-bit
I have an organizations table and a (much smaller) partner_members table that associates some organizations with a partner_member_id.
There is also a convenience view to list organizations with their (potential) partner IDs, defined like this:
select
o.id,
o.name,
o.email,
o.created,
p.member_id AS partner_member_id
from organizations o
left join partner_members p on o.id = p.organization_id
However, this leads to an admin query that queries this view ending up like this:
select count(*) OVER (),"id","name","email","created"
from (
select
o.id,
o.name,
o.email,
o.created,
p.member_id AS partner_member_id
from organizations o
left join partner_members p on o.id = p.organization_id
) _
where ("name" ilike '%example#example.com%')
or ("email" ilike '%example#example.com%')
or ("partner_member_id" ilike '%example#example.com%')
or ("id" ilike '%example#example.com%')
order by "created" desc
offset 0 limit 50;
… which is super slow, since the partner_member_id condition isn't “pushed down” into the subquery, which means that the filtering happens far too late.
Is there a way to make a query such as this efficient, or is this convenience view a no-go here?
Here is the plan:
Limit (cost=12842.32..12848.77 rows=50 width=74) (actual time=2344.828..2385.234 rows=0 loops=1)
Buffers: shared hit=5246, temp read=3088 written=3120
-> WindowAgg (cost=12842.32..12853.80 rows=89 width=74) (actual time=2344.826..2385.232 rows=0 loops=1)
Buffers: shared hit=5246, temp read=3088 written=3120
-> Gather Merge (cost=12842.32..12852.69 rows=89 width=66) (actual time=2344.822..2385.226 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5246, temp read=3088 written=3120
-> Sort (cost=11842.30..11842.39 rows=37 width=66) (actual time=2322.988..2323.050 rows=0 loops=3)
Sort Key: o.created DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5246, temp read=3088 written=3120
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Hash Left Join (cost=3368.61..11841.33 rows=37 width=66) (actual time=2322.857..2322.917 rows=0 loops=3)
Hash Cond: ((o.id)::text = p.organization_id)
Filter: (((o.name)::text ~~* '%example#example.com%'::text) OR ((o.email)::text ~~* '%example#example.com%'::text) OR (p.member_id ~~* '%example#example.com%'::text) OR ((o.id)::text ~~* '%example#example.com%'::text))
Rows Removed by Filter: 73800
Buffers: shared hit=5172, temp read=3088 written=3120
-> Parallel Seq Scan on organizations o (cost=0.00..4813.65 rows=92365 width=66) (actual time=0.020..200.111 rows=73800 loops=3)
Buffers: shared hit=3890
-> Parallel Hash (cost=1926.05..1926.05 rows=71005 width=34) (actual time=108.608..108.610 rows=40150 loops=3)
Buckets: 32768 Batches: 4 Memory Usage: 2432kB
Buffers: shared hit=1216, temp written=620
-> Parallel Seq Scan on partner_members p (cost=0.00..1926.05 rows=71005 width=34) (actual time=0.028..43.757 rows=40150 loops=3)
Buffers: shared hit=1216
Planning:
Buffers: shared hit=24
Planning Time: 1.837 ms
Execution Time: 2385.319 ms
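For reference, leading-wildcard ILIKE '%…%' predicates cannot be served by plain btree indexes at all; trigram GIN indexes are the usual way to make them indexable. A minimal sketch of the index side only, assuming the column names from the view above (the index names are made up, and whether the planner can use them across an OR spanning both sides of the left join is a separate question):
-- pg_trgm ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One trigram index per column searched with ILIKE '%...%'.
CREATE INDEX organizations_name_trgm_idx ON organizations USING gin (name gin_trgm_ops);
CREATE INDEX organizations_email_trgm_idx ON organizations USING gin (email gin_trgm_ops);
-- id is compared as text in the plan, so it would need an expression index.
CREATE INDEX organizations_id_trgm_idx ON organizations USING gin ((id::text) gin_trgm_ops);
CREATE INDEX partner_members_member_id_trgm_idx ON partner_members USING gin (member_id gin_trgm_ops);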
Below is my query. It takes 9387.430 ms to execute, which is certainly too long for such a query. I would like to reduce this execution time. Can you please help me with this? I have also provided the EXPLAIN ANALYZE output below.
EXPLAIN ANALYZE
SELECT a.artist, b.artist, COUNT(*)
FROM release_has_artist a, release_has_artist b
WHERE a.release = b.release AND a.artist <> b.artist
GROUP BY (a.artist, b.artist)
ORDER BY (a.artist, b.artist);
Output of EXPLAIN ANALYZE :
Sort (cost=1696482.86..1707588.14 rows=4442112 width=48) (actual time=9253.474..9314.510 rows=461386 loops=1)
Sort Key: (ROW(a.artist, b.artist))
Sort Method: external sort Disk: 24832kB
-> Finalize GroupAggregate (cost=396240.32..932717.19 rows=4442112 width=48) (actual time=1928.058..2911.463 rows=461386 loops=1)
Group Key: a.artist, b.artist
-> Gather Merge (cost=396240.32..860532.87 rows=3701760 width=16) (actual time=1928.049..2494.638 rows=566468 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=395240.29..432257.89 rows=1850880 width=16) (actual time=1912.809..2156.951 rows=188823 loops=3)
Group Key: a.artist, b.artist
-> Sort (cost=395240.29..399867.49 rows=1850880 width=8) (actual time=1912.794..2003.776 rows=271327 loops=3)
Sort Key: a.artist, b.artist
Sort Method: external merge Disk: 4848kB
-> Merge Join (cost=0.85..177260.72 rows=1850880 width=8) (actual time=2.143..1623.628 rows=271327 loops=3)
Merge Cond: (a.release = b.release)
Join Filter: (a.artist <> b.artist)
Rows Removed by Join Filter: 687597
-> Parallel Index Only Scan using release_has_artist_pkey on release_has_artist a (cost=0.43..67329.73 rows=859497 width=8) (actual time=0.059..240.998 rows=687597 loops=3)
Heap Fetches: 711154
-> Index Only Scan using release_has_artist_pkey on release_has_artist b (cost=0.43..79362.68 rows=2062792 width=8) (actual time=0.072..798.402 rows=2329742 loops=3)
Heap Fetches: 2335683
Planning time: 2.101 ms
Execution time: 9387.430 ms
In your EXPLAIN ANALYZE output, there are two Sort Method: external merge Disk: ####kB lines, indicating that the sorts spilled to disk instead of completing in memory, due to an insufficiently sized work_mem. Try increasing your work_mem to 32MB (30 might be OK, but I like multiples of 8) and try again.
Note that you can set work_mem on a per-session basis; a global change to work_mem could have negative side effects, such as running out of memory, because the postgresql.conf-configured work_mem can be allocated by every session, and more than once per query (basically, it has a multiplicative effect).
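For example, scoped to a single session (or to one transaction with SET LOCAL), so the server-wide default is left untouched:
-- Applies only to the current session; other connections keep the configured default.
SET work_mem = '32MB';
-- ... run the EXPLAIN ANALYZE / query here ...
RESET work_mem;

-- Or limited to a single transaction:
BEGIN;
SET LOCAL work_mem = '32MB';
-- ... query ...
COMMIT;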
I have a Postgres query that works but performs slower than expected.
SELECT
"post"."id",
COUNT(DISTINCT l.id) AS num_likes,
COUNT(DISTINCT ul.id) AS num_user_likes,
COUNT(DISTINCT c.id) AS num_comments
FROM "post"
LEFT JOIN "like" AS "l" ON "l"."post_id" = "post"."id"
LEFT JOIN "like" AS "ul" ON "ul"."post_id" = "post"."id" AND "ul"."user_id" = 1
LEFT JOIN "comment" AS "c" ON "c"."post_id" = "post"."id"
GROUP BY "post"."id"
The query is quite fast if I omit one of the LEFT JOIN statements, but becomes 5-10x slower once I add the third. From my basic understanding of joins, shouldn't Postgres be joining these tables to post separately? Why the spike after the third join?
How may I rewrite this query to be more performant?
Running EXPLAIN (ANALYZE, BUFFERS) on the query locally yields:
GroupAggregate (cost=26.31..114.09 rows=12 width=28) (actual time=11.466..11.580 rows=15 loops=1)
Group Key: post.id
Buffers: shared hit=13
-> Merge Left Join (cost=26.31..80.52 rows=3345 width=16) (actual time=0.171..6.298 rows=20443 loops=1)
Merge Cond: (post.id = l.post_id)
Buffers: shared hit=13
-> Merge Left Join (cost=8.49..11.98 rows=217 width=12) (actual time=0.085..0.682 rows=2042 loops=1)
Merge Cond: (post.id = ul.post_id)
Buffers: shared hit=4
-> Sort (cost=5.40..5.53 rows=51 width=8) (actual time=0.061..0.067 rows=60 loops=1)
Sort Key: post.id
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=3
-> Hash Right Join (cost=2.27..3.96 rows=51 width=8) (actual time=0.027..0.048 rows=60 loops=1)
Hash Cond: (l.post_id = post.id)
Buffers: shared hit=3
-> Seq Scan on like l (cost=0.00..1.51 rows=51 width=8) (actual time=0.005..0.009 rows=49 loops=1)
Buffers: shared hit=1
-> Hash (cost=2.12..2.12 rows=12 width=4) (actual time=0.017..0.017 rows=15 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=2
-> Seq Scan on post (cost=0.00..2.12 rows=12 width=4) (actual time=0.009..0.012 rows=15 loops=1)
Buffers: shared hit=2
-> Sort (cost=3.08..3.21 rows=51 width=8) (actual time=0.021..0.212 rows=2030 loops=1)
Sort Key: ul.post_id
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=1
-> Seq Scan on like ul (cost=0.00..1.64 rows=51 width=8) (actual time=0.004..0.012 rows=49 loops=1)
Filter: (user_id = 1)
Buffers: shared hit=1
-> Sort (cost=17.82..18.28 rows=185 width=8) (actual time=0.084..1.506 rows=20438 loops=1)
Sort Key: c.post_id
Sort Method: quicksort Memory: 34kB
Buffers: shared hit=9
-> Seq Scan on comment c (cost=0.00..10.85 rows=185 width=8) (actual time=0.004..0.045 rows=192 loops=1)
Buffers: shared hit=9
Planning Time: 0.319 ms
Execution Time: 11.624 ms
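For what it's worth, the plan above shows the blow-up: the Merge Left Joins produce 20443 intermediate rows for just 15 posts, because each additional one-to-many join multiplies the rows the others produced, and the DISTINCT counts then have to collapse them again. A common rewrite is to aggregate each table on its own and join the aggregated results; a minimal sketch against the tables above, assuming like.id and comment.id are unique per row:
SELECT p.id,
       COALESCE(l.num_likes, 0)       AS num_likes,
       COALESCE(ul.num_user_likes, 0) AS num_user_likes,
       COALESCE(c.num_comments, 0)    AS num_comments
FROM "post" p
LEFT JOIN (SELECT post_id, COUNT(*) AS num_likes
           FROM "like" GROUP BY post_id) l ON l.post_id = p.id
LEFT JOIN (SELECT post_id, COUNT(*) AS num_user_likes
           FROM "like" WHERE user_id = 1 GROUP BY post_id) ul ON ul.post_id = p.id
LEFT JOIN (SELECT post_id, COUNT(*) AS num_comments
           FROM "comment" GROUP BY post_id) c ON c.post_id = p.id;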
I am running the query below and it's taking 5 minutes:
SELECT "DID"
FROM location_signals
GROUP BY "DID";
I have an index on DID, which is varchar(100); the table has about 150 million records.
How can I improve and optimize this further?
Are there any additional indexes that could be added, or other recommendations? Thanks.
Edit: below are the results of EXPLAIN ANALYZE:
Finalize GroupAggregate (cost=23803276.36..24466411.92 rows=179625 width=44) (actual time=285577.900..321360.237 rows=4833061 loops=1)
Group Key: DID
-> Gather Merge (cost=23803276.36..24462819.42 rows=359250 width=44) (actual time=285577.874..320018.354 rows=10825153 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=23802276.33..24420353.03 rows=179625 width=44) (actual time=281580.548..310818.137 rows=3608384 loops=3)
Group Key: DID
-> Sort (cost=23802276.33..24007703.15 rows=82170727 width=36) (actual time=281580.535..303887.638 rows=65736579 loops=3)
Sort Key: DID
Sort Method: external merge Disk: 2987656kB
Worker 0: Sort Method: external merge Disk: 3099408kB
Worker 1: Sort Method: external merge Disk: 2987648kB
-> Parallel Seq Scan on location_signals (cost=0.00..6259493.27 rows=82170727 width=36) (actual time=0.043..13460.990 rows=65736579 loops=3)
Planning Time: 1.332 ms
Execution Time: 322686.767 ms
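For context, GROUP BY "DID" with no aggregates simply returns the distinct values, and the plan above shows roughly 4.8 million of them. PostgreSQL has no native index skip scan here, but a loose index scan can be emulated with a recursive CTE so that only one index probe is needed per distinct value. A minimal sketch, assuming the existing index on "DID"; with this many distinct values the gain may be modest, so it is worth measuring:
WITH RECURSIVE distinct_dids AS (
    (SELECT "DID" FROM location_signals ORDER BY "DID" LIMIT 1)
    UNION ALL
    SELECT (SELECT "DID" FROM location_signals
            WHERE "DID" > d."DID"
            ORDER BY "DID" LIMIT 1)
    FROM distinct_dids d
    WHERE d."DID" IS NOT NULL
)
SELECT "DID" FROM distinct_dids
WHERE "DID" IS NOT NULL;  -- note: a NULL group, if any exists, is not returned by this form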
As part of a query (one side of a join actually)
SELECT DISTINCT ON (shop_id) shop_id, id, color
FROM products
ORDER BY shop_id, id
There are btree indexes on shop_id and id, but for some reason they are not used.
EXPLAIN ANALYZE says:
-> Unique (cost=724198.71..742348.37 rows=360949 width=71) (actual time=179157.101..195998.646 rows=1673170 loops=1)
-> Sort (cost=724198.71..733273.54 rows=3629931 width=71) (actual time=179157.095..191853.377 rows=3599644 loops=1)
Sort Key: products.shop_id, products.id
Sort Method: external merge Disk: 285064kB
-> Seq Scan on products (cost=0.00..328690.31 rows=3629931 width=71) (actual time=0.025..7575.905 rows=3629713 loops=1)
I also tried to create a multicolumn btree index on both shop_id and id, but it wasn't used. (Maybe I did it wrong and would have had to restart PostgreSQL and everything?)
How can I accelerate this query (with an index or something)?
Adding a multicolumn index on all three ("CREATE INDEX products_idx ON products USING btree (shop_id, id, color);") doesn't help either; it isn't used.
I experimented, and if I set a LIMIT of 10000, it will be used:
Limit (cost=0.00..161337.91 rows=10000 width=14) (actual time=0.043..15.973 rows=10000 loops=1)
-> Unique (cost=0.00..2925620.98 rows=181335 width=14) (actual time=0.042..15.249 rows=10000 loops=1)
-> Index Scan using products_idx on products (cost=0.00..2922753.69 rows=1146917 width=14) (actual time=0.041..12.927 rows=14004 loops=1)
Total runtime: 16.293 ms
There are around 3*10^6 entries (3 million)
For a larger LIMIT, it uses a sequential scan again :(
Limit (cost=213533.52..215114.73 rows=50000 width=14) (actual time=816.580..835.075 rows=50000 loops=1)
-> Unique (cost=213533.52..219268.11 rows=181335 width=14) (actual time=816.578..831.963 rows=50000 loops=1)
-> Sort (cost=213533.52..216400.81 rows=1146917 width=14) (actual time=816.576..823.034 rows=80830 loops=1)
Sort Key: shop_id, id
Sort Method: quicksort Memory: 107455kB
-> Seq Scan on products (cost=0.00..98100.17 rows=1146917 width=14) (actual time=0.019..296.867 rows=1146917 loops=1)
Total runtime: 840.788 ms
(I also had to raise work_mem to 128MB here, otherwise there would be an external merge sort, which takes even longer.)
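One way to see why the planner keeps preferring the sequential scan over products_idx is to disable sequential scans for the current session only and compare the estimated and actual costs of the resulting index plan; a minimal diagnostic sketch:
-- Session-scoped only; don't change this setting globally.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT DISTINCT ON (shop_id) shop_id, id, color
FROM products
ORDER BY shop_id, id;
RESET enable_seqscan;
If the index plan really is faster, it may be the cost settings (for example random_page_cost) rather than the index that need adjusting.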