postgres optimizing "is null" condition on left join - postgresql

I run the query below, with an 'is null' condition, on PostgreSQL 11.9 and it takes almost 1500-2000 ms per run. I raised the default statistics target to 1000 for the database and ran REINDEX, VACUUM and ANALYZE on the tables involved in the query. The query still takes ~1100-1300 ms.
If I change the condition to 'is not null', it executes within 150 ms. And if I disable the merge join on the instance and run the same query with the 'is null' clause, it executes within 50 ms and the plan switches from a merge join to a hash join. I rewrote the query to avoid the 'is null' key, but it still produces the same ~1100 ms execution plan.
How can I get the hash join into the plan without disabling the merge join? Any suggestions to make it better?
Actual Query:
select ev.id from logevent ev
left join flowtoken FT on ev.uri = 'xxx://xxx/WorkflowToken?id=''' || FT.id || ''' &xx=''Token'''
where
ev.uri like 'xxx://xxx/WorkflowToken?id=%'
and FT.id is null
Table definitions:
table flowtoken:
  id character varying(100) not null, primary key, btree index on (id)
table logevent:
  id character varying(100) not null, primary key, btree index on (id)
  uri character varying(1024) not null, "idx_logeventuri" btree (uri)
Actual Execution Plan:
Gather (cost=23655.83..33636.20 rows=72532 width=33) (actual time=783.987..1117.367 rows=703 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Anti Join (cost=22644.83..25383.00 rows=30222 width=33) (actual time=706.542..972.300 rows=234 loops=3)
Merge Cond: ((ev.uri)::text = ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text)))
-> Sort (cost=17003.08..17154.19 rows=60443 width=140) (actual time=626.520..739.990 rows=48237 loops=3)
Sort Key: ev.uri
Sort Method: external merge  Disk: 8136kB
Worker 0: Sort Method: external merge  Disk: 6184kB
Worker 1: Sort Method: external merge  Disk: 6184kB
-> Parallel Seq Scan on logevent ev (cost=0.00..7862.91 rows=60443 width=140) (actual time=0.022..19.463 rows=48237 loops=3)
Filter: ((uri)::text ~~ 'xxx://xxx/WorkflowToken?id=%'::text)
Rows Removed by Filter: 142
-> Sort (cost=5641.08..5757.19 rows=46369 width=37) (actual time=77.520..164.990 rows=46368 loops=3)
Sort Key: ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text))
Sort Method: external merge  Disk: 6992kB
Worker 0: Sort Method: external merge  Disk: 6992kB
Worker 1: Sort Method: external merge  Disk: 6992kB
-> Index Only Scan using pk_flowtoken on flowtoken FT (cost=0.41..2047.91 rows=46369 width=37) (actual time=0.049..7.539 rows=46369 loops=3)
Heap Fetches: 0
Planning Time: 1.381 ms
Execution Time: 1120.433 ms
Execution Plan for 'is not null' condition:
Hash Join (cost=13709.89..39695.69 rows=400812 width=33) (actual time=86.005..149.488 rows=144007 loops=1)
Hash Cond: ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text) = (ev.uri)::text)
-> Index Only Scan using pk_flowtoken on flowtoken FT (cost=0.41..2163.87 rows=46369 width=37) (actual time=0.020..4.256 rows=46369 loops=1)
Index Cond: (id is not null)
Heap Fetches: 0
-> Hash (cost=8921.19..8921.19 rows=145063 width=140) (actual time=85.828..85.829 rows=144710 loops=1)
Buckets: 32768  Batches: 8  Memory Usage: 3228kB
-> Seq Scan on logevent ev (cost=0.00..8921.19 rows=145063 width=140) (actual time=0.013..46.118 rows=144710 loops=1)
Filter: ((uri)::text ~~ 'xxx://xxx/WorkflowToken?id=%'::text)
Rows Removed by Filter: 425
Planning Time: 0.417 ms
Execution Time: 153.211 ms
Execution plan after disabling the merge join:
Gather (cost=3018.96..85692.57 rows=72532 width=33) (actual time=15.420..50.290 rows=703 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Anti Join (cost=2018.96..77439.37 rows=30222 width=33) (actual time=8.375..40.668 rows=234 loops=3)
Hash Cond: ((ev.uri)::text = ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text)))
-> Parallel Seq Scan on logevent ev (cost=0.00..7862.91 rows=60443 width=140) (actual time=0.012..20.100 rows=48237 loops=3)
Filter: ((uri)::text ~~ 'xxx://xxx/WorkflowToken?id=%'::text)
Rows Removed by Filter: 142
-> Parallel Index Only Scan using pk_flowtoken on flowtoken FT (cost=0.41..1777.46 rows=19320 width=37) (actual time=0.031..1.829 rows=15456 loops=3)
Heap Fetches: 0
Planning Time: 0.240 ms
Execution Time: 50.363 ms
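For reference, a minimal sketch of how the merge join could be disabled for just one transaction while testing, instead of instance-wide (this is only the test setup, not the fix I am looking for):
BEGIN;
SET LOCAL enable_mergejoin = off;  -- reverts automatically at COMMIT/ROLLBACK
select ev.id from logevent ev
left join flowtoken FT on ev.uri = 'xxx://xxx/WorkflowToken?id=''' || FT.id || ''' &xx=''Token'''
where ev.uri like 'xxx://xxx/WorkflowToken?id=%'
and FT.id is null;
COMMIT;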
Rewritten query:
select ev.id
from logevent ev
where not exists ( select *
from flowtoken FT
where ev.uri = 'xxx://xxx/WorkflowToken?id=''' || FT.id || ''' &xx=''Token''')
and ev.uri like 'xxx://xxx/WorkflowToken?id=%'
Execution plan after increasing work_mem to 64MB:
Gather (cost=19304.83..29296.20 rows=72532 width=33) (actual time=1029.283..1114.902 rows=703 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Anti Join (cost=18304.83..21043.00 rows=30222 width=33) (actual time=821.942..901.624 rows=234 loops=3)
Merge Cond: ((ev.uri)::text = ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text)))
-> Sort (cost=12663.08..12814.19 rows=60443 width=140) (actual time=746.900..753.702 rows=48237 loops=3)
Sort Key: ev.uri
Sort Method: quicksort  Memory: 17703kB
Worker 0: Sort Method: quicksort  Memory: 12731kB
Worker 1: Sort Method: quicksort  Memory: 12614kB
-> Parallel Seq Scan on logevent ev (cost=0.00..7862.91 rows=60443 width=140) (actual time=0.011..23.60 rows=48237 loops=3)
Filter: ((uri)::text ~~ 'xxx://xxx/WorkflowToken?id=%'::text)
Rows Removed by Filter: 142
-> Sort (cost=5641.08..5757.19 rows=46369 width=37) (actual time=74.520..77.532 rows=46368 loops=3)
Sort Key: ((('xxx://xxx/WorkflowToken?id='''::text || (FT.id)::text) || ''' &xx=''Token'''::text))
Sort Method: quicksort  Memory: 13853kB
Worker 0: Sort Method: quicksort  Memory: 13853kB
Worker 1: Sort Method: quicksort  Memory: 13853kB
-> Index Only Scan using pk_flowtoken on flowtoken FT (cost=0.41..2047.91 rows=46369 width=37) (actual time=0.049..7.539 rows=46369 loops=3)
Heap Fetches: 0
Planning Time: 0.445 ms
Execution Time: 1118.389 ms

Stick with the rewritten query using the NOT EXISTS clause. Extended statistics on the expression (available since PostgreSQL v14) may improve the estimate and get you a better plan:
CREATE STATISTICS url_stats
ON ('xxx://xxx/WorkflowToken?id=''' || id || ''' &xx=''Token''')
FROM flowtoken;
ANALYZE flowtoken;
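As a hedged follow-up (assuming PostgreSQL 14 or later, where expression statistics are exposed), you can check that the statistics were collected via the pg_stats_ext_exprs view:
SELECT statistics_name, expr, n_distinct
FROM pg_stats_ext_exprs
WHERE statistics_name = 'url_stats';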

To my surprise, I can reproduce the horrible sort performance just by using en_US.UTF-8. Maybe that is because all of the data has the same prefix, which must compare equal before the comparison gets to the part that matters. (I don't think en_US.UTF-8 is normally all that much slower than C.)
Hash anti-joins are severely (and in my opinion, unwisely) punished by the planner when it can't estimate the number of distinct values on the second side, which it generally can't when the join is done on an expression rather than a simple column. Laurenz's answer provides one way to address that, but it only works on newer versions.
But there is another way on older versions: create an expression index.
create index on flowtoken (('xxx://xxx/WorkflowToken?id=''' || id || ''' &xx=''Token'''));
analyze flowtoken;
That index will probably not be used in your query, but it will trigger the collection of statistics and those statistics will get used in planning the query.
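A minimal sketch of how to confirm that statistics were gathered for the expression; the index name below is the auto-generated one and only an assumption, so check the actual name with \d flowtoken first:
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'flowtoken_expr_idx';  -- hypothetical auto-generated index name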

Related

Postgres not using index with ORDER BY and LIMIT when LIMIT is above X

I have been trying to debug an issue with Postgres where it decides not to use an index when LIMIT is above a specific value.
For example, I have a table of 150k rows: when searching with a LIMIT of 286 it uses the index, while with a LIMIT above 286 it does not.
LIMIT 286 uses index
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 286;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..5024.12 rows=286 width=810) (actual time=0.030..0.992 rows=286 loops=1)
Buffers: shared hit=921
-> Nested Loop (cost=0.56..16968.23 rows=966 width=810) (actual time=0.030..0.977 rows=286 loops=1)
Join Filter: (r.id_tempy = z.id)
Rows Removed by Join Filter: 624
Buffers: shared hit=921
-> Index Scan using tempz_tempx_name_type_id_idx on tempx r (cost=0.42..14357.69 rows=173878 width=373) (actual time=0.016..0.742 rows=910 loops=1)
Buffers: shared hit=919
-> Materialize (cost=0.14..2.37 rows=1 width=409) (actual time=0.000..0.000 rows=1 loops=910)
Buffers: shared hit=2
-> Index Scan using tempy_string_col_idx on tempy z (cost=0.14..2.37 rows=1 width=409) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (string_col = 'temp_string'::text)
Filter: (int_col = 2000)
Buffers: shared hit=2
Planning Time: 0.161 ms
Execution Time: 1.032 ms
(16 rows)
vs.
LIMIT 287 doing sort
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 287;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4976.86..4977.58 rows=287 width=810) (actual time=49.802..49.828 rows=287 loops=1)
Buffers: shared hit=37154
-> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
Sort Key: r.name, r.type, r.id
Sort Method: top-N heapsort Memory: 506kB
Buffers: shared hit=37154
-> Nested Loop (cost=0.42..4932.59 rows=966 width=810) (actual time=0.020..27.973 rows=51914 loops=1)
Buffers: shared hit=37154
-> Seq Scan on tempy z (cost=0.00..12.70 rows=1 width=409) (actual time=0.006..0.008 rows=1 loops=1)
Filter: ((int_col = 2000) AND (string_col = 'temp_string'::text))
Rows Removed by Filter: 2
Buffers: shared hit=1
-> Index Scan using tempx_id_tempy_idx on tempx r (cost=0.42..4340.30 rows=57959 width=373) (actual time=0.012..17.075 rows=51914 loops=1)
Index Cond: (id_tempy = z.id)
Buffers: shared hit=37153
Planning Time: 0.258 ms
Execution Time: 49.907 ms
(17 rows)
Update:
This is Postgres 11 and VACUUM ANALYZE is run daily. Also, I have already tried using a CTE to remove the filter, but the problem is specifically the sorting:
-> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
Sort Key: r.name, r.type, r.id
Sort Method: top-N heapsort Memory: 506kB
Buffers: shared hit=37154
Update 2:
After running VACUUM ANALYZE the database starts using the index for some hours and then it goes back to not using it.
It turns out that I can force Postgres to avoid doing any sort if I run SET enable_sort TO OFF;. This raises the cost of sorting very high, which causes the Postgres planner to do an index scan instead.
I am not really sure why Postgres thinks that the index scan is so costly (cost=0.42..14357.69) and that sorting is cheaper, and ends up choosing the latter. It is also very weird that immediately after a VACUUM ANALYZE it estimates the costs correctly, but after some hours it goes back to sorting.
With sorting off, the plan is still not optimal, as it materializes and loads data into memory, but it is still faster than sorting.
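If enable_sort = off reliably produces the better plan, a minimal sketch (offered only as a workaround, using the query from above) is to scope the setting to a single transaction rather than the whole session:
BEGIN;
SET LOCAL enable_sort = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy = z.id)
WHERE z.int_col = 2000 AND z.string_col = 'temp_string'
ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 287;
COMMIT;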

PostgreSQL inefficient index is selected in sub-query

I have a query that, for each row from campaigns, gets the most recent row from effects:
SELECT
id,
(
SELECT
created
FROM
effects
WHERE
effects.campaignid = campaigns.id
ORDER BY
effects.created DESC
LIMIT 1
) AS last_activity
FROM
campaigns
WHERE
deleted_at IS NULL
AND id in(53, 54);
To optimize the performance for this query I'm using an index effects_campaign_created_idx over (campaignid,created).
Additionally, for another use case, there's the index effects_created_idx on (created).
For some reason, at one moment the subquery stopped using the correct index effects_campaign_created_idx and started using effects_created_idx instead, which is highly inefficient and takes ~5 minutes to run instead of ~40 ms.
Whenever I execute the internal query alone (using the same campaignid) the correct index is used.
What could be the reason for such behavior on part of the query planner? Should I structure my query differently, so that the right index is chosen?
What are more advanced ways to debug the query planner behavior?
Here's explain analyze results from executing the offending query:
explain analyze SELECT
id,
(
SELECT
created
FROM
effects
WHERE
effects.campaignid = campaigns.id
ORDER BY
effects.created DESC
LIMIT 1) AS last_activity
FROM
campaigns
WHERE
deleted_at IS NULL
AND id in(53, 54);
-----------------------------------------
Seq Scan on campaigns (cost=0.00..4.56 rows=2 width=12) (actual time=330176.476..677186.438 rows=2 loops=1)
" Filter: ((deleted_at IS NULL) AND (id = ANY ('{53,54}'::integer[])))"
Rows Removed by Filter: 45
SubPlan 1
-> Limit (cost=0.43..0.98 rows=1 width=8) (actual time=338593.165..338593.166 rows=1 loops=2)
-> Index Scan Backward using effects_created_idx on effects (cost=0.43..858859.67 rows=1562954 width=8) (actual time=338593.160..338593.160 rows=1 loops=2)
Filter: (campaignid = campaigns.id)
Rows Removed by Filter: 14026092
Planning Time: 0.245 ms
Execution Time: 677195.239 ms
Following the advice here, I tried moving to MAX(created) instead of using the subquery with ORDER BY created DESC LIMIT 1. Unfortunately the results are still poor:
EXPLAIN ANALYZE SELECT
campaigns.id,
subquery.created
FROM
campaigns
LEFT JOIN (
SELECT
campaignid,
MAX(created) created
FROM
effects
GROUP BY
campaignid) subquery ON campaigns.id = subquery.campaignid
WHERE
campaigns.deleted_at IS NULL
AND campaigns.id in(53, 54);
Hash Right Join (cost=667460.06..667462.46 rows=2 width=12) (actual time=30516.620..30573.091 rows=2 loops=1)
Hash Cond: (effects.campaignid = campaigns.id)
-> Finalize GroupAggregate (cost=667457.45..667459.73 rows=9 width=16) (actual time=30251.920..30308.379 rows=23 loops=1)
Group Key: effects.campaignid
-> Gather Merge (cost=667457.45..667459.55 rows=18 width=16) (actual time=30251.832..30308.271 rows=49 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=666457.43..666457.45 rows=9 width=16) (actual time=30156.539..30156.544 rows=16 loops=3)
Sort Key: effects.campaignid
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Partial HashAggregate (cost=666457.19..666457.28 rows=9 width=16) (actual time=30155.951..30155.957 rows=16 loops=3)
Group Key: effects.campaignid
Batches: 1 Memory Usage: 24kB
Worker 0: Batches: 1 Memory Usage: 24kB
Worker 1: Batches: 1 Memory Usage: 24kB
-> Parallel Seq Scan on effects (cost=0.00..637166.13 rows=5858213 width=16) (actual time=220.784..28693.182 rows=4684157 loops=3)
-> Hash (cost=2.59..2.59 rows=2 width=4) (actual time=264.653..264.656 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on campaigns (cost=0.00..2.59 rows=2 width=4) (actual time=264.612..264.640 rows=2 loops=1)
" Filter: ((deleted_at IS NULL) AND (id = ANY ('{53,54}'::integer[])))"
Rows Removed by Filter: 45
Planning Time: 0.354 ms
JIT:
Functions: 34
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 9.958 ms, Inlining 409.293 ms, Optimization 308.279 ms, Emission 206.936 ms, Total 934.465 ms
Execution Time: 30578.920 ms
Notes
It is not the case that the majority of effects rows have campaignid in (53, 54).
I've reindexed and analyzed tables already.
[edit: the index was created with USING btree]
Try refactoring your query to eliminate the correlated subquery. Subqueries like that with LIMIT clauses in them can baffle query planners. What you want is the latest created date for each campaignid. So your subquery will be this:
SELECT campaignid, MAX(created) created
FROM effects
GROUP BY campaignid
You can then build it into your main query like this.
SELECT campaigns.id, subquery.created
FROM campaigns
LEFT JOIN (
SELECT campaignid, MAX(created) created
FROM effects
GROUP BY campaignid
) subquery ON campaigns.id = subquery.campaignid
WHERE campaigns.deleted_at IS NULL
AND campaigns.id in(53, 54);
This allows the query planner to run that subquery efficiently and just once. It will use your (campaignid,created) index.
Asking "why" about query planner output is a tricky business. Query planners are very complex beasts. And correlated subqueries present planning complexities. It's possible a growing table changed some sort of internal index-selectiveness metric.
Pro tip: Avoid correlated subqueries whenever possible, especially in long-lived code in growing systems. It isn't always possible, though.
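Another common rewrite for this "latest row per group" pattern, offered here only as a hedged sketch and not as part of the answer above, is a LATERAL join; it keeps the per-campaign lookup but hands the planner the literal campaign ids, so it can use the (campaignid, created) index once per outer row:
SELECT c.id, e.created AS last_activity
FROM campaigns c
LEFT JOIN LATERAL (
    SELECT created
    FROM effects
    WHERE effects.campaignid = c.id
    ORDER BY created DESC
    LIMIT 1
) e ON true
WHERE c.deleted_at IS NULL
AND c.id IN (53, 54);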

Search results using ts_vectors and ST_Distance

We have a page where we show a list of results, and the results must be relevant given 2 factors:
keyword similarity
location
We are using PostgreSQL with PostGIS and ts_vector columns; however, we don't know how to combine the scores coming from the ts_vector ranking and ST_Distance in order to get the "best" search results. The queries seem to take between 30 seconds and 1 minute.
SELECT
ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank, ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),
-- a bunch of fields omitted...
org.logo
FROM jobs.job as job
LEFT OUTER JOIN jobs.organization as org
ON job.organization_id = org.id
WHERE job.is_expired = 0 and deleted_at is NULL and doc_vectors @@ plainto_tsquery('Uber ') order by rank desc offset 80 limit 20;
Do you guys have suggestions for us?
EXPLAIN (ANALYZE, BUFFERS) for same Query:
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
Buffers: shared hit=1584 read=825114
-> Sort (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
Sort Key: job.created_at DESC
Sort Method: top-N heapsort Memory: 75kB
Buffers: shared hit=1584 read=825114
-> Hash Left Join (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
Hash Cond: (job.organization_id = org.id)
Buffers: shared hit=1581 read=825114
-> Seq Scan on job (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
Rows Removed by Filter: 196298
Buffers: shared hit=1564 read=824989
-> Hash (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1037kB
Buffers: shared hit=14 read=125
-> Seq Scan on organization org (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
Buffers: shared hit=14 read=125
Planning time: 2.223 ms
Execution time: 20684.682 ms

Postgres 9.6 function performing poorly compared to straight sql

I have this function, and it works; it gives the most recent b record.
create or replace function most_recent_b(the_a a) returns b as $$
select distinct on (c.a_id) b.*
from c
join b on b.c_id = c.id
where c.a_id = the_a.id
order by c.a_id, b.date desc
$$ language sql stable;
This runs in ~5000 ms with real data, vs. the following, which runs in ~500 ms:
create or replace function most_recent_b(the_a a) returns b as $$
select distinct on (c.a_id) b.*
from c
join b on b.c_id = c.id
where c.a_id = 1347
order by c.a_id, b.date desc
$$ language sql stable;
The only difference is that I've hard-coded the a.id value 1347 instead of using the parameter value.
Running this query outside a function also gives me times of around 500 ms.
I'm running PostgreSQL 9.6, so the query-planner problems inside functions that I see suggested elsewhere shouldn't apply to me, right?
I'm sure it's not the query itself that is the issue; this is my third iteration at it, and different techniques to get this result all show the same slowdown when run inside a function.
As requested by @laurenz-albe:
Result of EXPLAIN (ANALYZE, BUFFERS) with constant
Unique (cost=60.88..60.89 rows=3 width=463) (actual time=520.117..520.122 rows=1 loops=1)
Buffers: shared hit=14555
-> Sort (cost=60.88..60.89 rows=3 width=463) (actual time=520.116..520.120 rows=9 loops=1)
Sort Key: b.date DESC
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=14555
-> Hash Join (cost=13.71..60.86 rows=3 width=463) (actual time=386.848..520.083 rows=9 loops=1)
Hash Cond: (b.c_id = c.id)
Buffers: shared hit=14555
-> Seq Scan on b (cost=0.00..46.38 rows=54 width=459) (actual time=25.362..519.140 rows=51 loops=1)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=14530
-> Hash (cost=13.67..13.67 rows=3 width=8) (actual time=0.880..0.880 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=25
-> Subquery Scan on c (cost=4.21..13.67 rows=3 width=8) (actual time=0.222..0.872 rows=10 loops=1)
Buffers: shared hit=25
-> Bitmap Heap Scan on c c_1 (cost=4.21..13.64 rows=3 width=2276) (actual time=0.221..0.863 rows=10 loops=1)
Recheck Cond: (a_id = 1347)
Filter: c_can_view(c_1.*)
Heap Blocks: exact=4
Buffers: shared hit=25
-> Bitmap Index Scan on c_a_id_c_number_idx (cost=0.00..4.20 rows=8 width=0) (actual time=0.007..0.007 rows=10 loops=1)
Index Cond: (a_id = 1347)
Buffers: shared hit=1
Execution time: 520.256 ms
And this is the result after running it six times with the parameter being passed (it was exactly six times, as you predicted :) ):
Slow query:
Unique (cost=57.07..57.07 rows=1 width=463) (actual time=5040.237..5040.243 rows=1 loops=1)
Buffers: shared hit=145325
-> Sort (cost=57.07..57.07 rows=1 width=463) (actual time=5040.237..5040.240 rows=9 loops=1)
Sort Key: b.date DESC
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=145325
-> Nested Loop (cost=0.14..57.06 rows=1 width=463) (actual time=912.354..5040.195 rows=9 loops=1)
Join Filter: (c.id = b.c_id)
Rows Removed by Join Filter: 501
Buffers: shared hit=145325
-> Index Scan using c_a_id_idx on c (cost=0.14..9.45 rows=1 width=2276) (actual time=0.378..1.171 rows=10 loops=1)
Index Cond: (a_id = $1)
Filter: c_can_view(c.*)
Buffers: shared hit=25
-> Seq Scan on b (cost=0.00..46.38 rows=54 width=459) (actual time=24.842..503.854 rows=51 loops=10)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=145300
Execution time: 5040.375 ms
It's worth noting that I have some strict row-level security involved, and I suspect this is why both queries are slow; however, one is 10 times slower than the other.
I've changed my original table names hopefully my search and replace was good here.
The expensive part of your query execution is the filter b_can_view(b.*), which must come from your row level security definition.
The fast execution:
Seq Scan on b (cost=0.00..46.38 rows=54 width=459)
(actual time=25.362..519.140 rows=51 loops=1)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=14530
The slow execution:
Seq Scan on b (cost=0.00..46.38 rows=54 width=459)
(actual time=24.842..503.854 rows=51 loops=10)
Filter: b_can_view(b.*)
Rows Removed by Filter: 112
Buffers: shared hit=145300
The difference is that the scan is executed 10 times in the slow case (loops=10) and touches 10 times as many data blocks.
When using the generic plan, PostgreSQL underestimates how many rows in c will satisfy the condition c.a_id = $1, because it doesn't know that the actual value is 1347, which is more frequent than average.
Since PostgreSQL thinks there will be at most one row from c, it chooses a nested loop join with a sequential scan of b on the inner side.
Now two problems combine:
Calling function b_can_view takes over 3 milliseconds per row (which PostgreSQL doesn't know), which accounts for the half second that a sequential scan of the 163 rows takes.
There are actually 10 rows found in c instead of the predicted 1, so table b is scanned 10 times, and you end up with a query duration of 5 seconds.
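A minimal sketch of how the generic-plan behavior described above can be observed outside the function (table and column names are taken from the question, and the int parameter type is an assumption; on 9.6 the planner typically switches to a generic plan after about five executions of a prepared statement):
PREPARE mrb(int) AS
SELECT DISTINCT ON (c.a_id) b.*
FROM c
JOIN b ON b.c_id = c.id
WHERE c.a_id = $1
ORDER BY c.a_id, b.date DESC;
EXPLAIN ANALYZE EXECUTE mrb(1347);  -- repeat this; around the sixth run the generic plan usually appears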
So what can you do?
Tell PostgreSQL how expensive b_can_view is. Use ALTER FUNCTION to set the COST for that function to 1000 or 10000 to reflect reality. That alone will not be enough to get a faster plan, since PostgreSQL thinks that it has to execute a single sequential scan anyway, but it is a good thing to give the optimizer correct data.
Create an index on b(c_id). That will enable PostgreSQL to avoid a sequential scan of b, which it will try to do once it is aware how expensive the function is.
Also, try to make the function b_can_view cheaper. That will make your experience so much better.
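A minimal sketch of the first two suggestions (the function signature is an assumption based on the plans above, which call b_can_view with a whole row of b):
ALTER FUNCTION b_can_view(b) COST 10000;  -- tell the planner how expensive the filter really is
CREATE INDEX ON b (c_id);                 -- lets the planner avoid repeated sequential scans of b
ANALYZE b;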

Configuration parameter work_mem in PostgreSQL on Linux

I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. I then checked how changing this parameter would influence the performance of my query (which uses a sort). I measured query execution time with various work_mem settings and was very disappointed.
The table on which I perform my query contains 10,000,000 rows and there are 430 MB of data to sort. (Sort Method: external merge Disk: 430112kB).
With work_mem = 1MB, EXPLAIN output is:
Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort (cost=4032588.78..4082588.66 rows=19999954 width=8)
(actual time=22577.149..26424.951 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 5MB:
Total runtime: 36282.729 ms (sort: 25400 ms).
Sort (cost=3485713.78..3535713.66 rows=19999954 width=8)
(actual time=25062.383..33246.561 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 64MB:
Total runtime: 42566.538 ms (sort: 31000 ms).
Sort (cost=3212276.28..3262276.16 rows=19999954 width=8)
(actual time=28599.611..39454.279 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
Can anyone explain why performance gets worse? Or suggest any other methods to makes queries execution faster by changing server parameters?
My query (I know it's not optimal, but I have to benchmark this kind of query):
SELECT n
FROM (
SELECT n + 1 AS n FROM table_name
EXCEPT
SELECT n FROM table_name) AS q1
ORDER BY n DESC;
Full execution plan:
Sort (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 25kB
-> Subquery Scan q1 (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
-> SetOp Except (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
-> Sort (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
-> Append (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
I posted your query plan on explain.depesz.com, have a look.
The query planner's estimates are terribly wrong in some places.
Have you run ANALYZE recently?
Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants. Pay special attention to the chapters on random_page_cost and default_statistics_target.
You might try:
ALTER TABLE diplomas ALTER COLUMN number SET STATISTICS 1000;
ANALYZE diplomas;
Or go even higher for a table with 10M rows. It depends on the data distribution and the actual queries. Experiment. The default is 100, the maximum is 10000.
For a database of that size, 1 or 5 MB of work_mem is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to.
As your query needs 430104kB of memory on disk according to EXPLAIN output, you have to set work_mem to something like 500MB or more to allow in-memory sorting. In-memory representation of data needs some more space than on-disk representation. You may be interested in what Tom Lane posted on that matter recently.
Increasing work_mem by just a little, like you tried, won't help much and can even slow things down. Setting it too high globally can even hurt, especially with concurrent access: multiple sessions might starve one another for resources, and allocating more for one purpose takes memory away from another if the resource is limited. The best setup depends on the complete situation.
To avoid side effects, only set it high enough locally in your session, and temporarily for the query:
SET work_mem = '500MB';
Reset it to your default afterwards:
RESET work_mem;
Or use SET LOCAL to set it just for the current transaction to begin with.
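A minimal sketch of that transaction-local variant, using the query from the question:
BEGIN;
SET LOCAL work_mem = '500MB';  -- applies only until the end of this transaction
SELECT n
FROM (
SELECT n + 1 AS n FROM table_name
EXCEPT
SELECT n FROM table_name) AS q1
ORDER BY n DESC;
COMMIT;  -- work_mem reverts automatically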
SET search_path='tmp';
-- Generate some data ...
-- DROP table tmp.table_name ;
-- CREATE table tmp.table_name ( n INTEGER NOT NULL PRIMARY KEY);
-- INSERT INTO tmp.table_name(n) SELECT generate_series(1,1000);
-- DELETE FROM tmp.table_name WHERE random() < 0.05 ;
The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan (but the same results) here (on 9.0.1beta something):
-- EXPLAIN ANALYZE
WITH q1 AS (
SELECT 1+tn.n AS n
FROM table_name tn
WHERE NOT EXISTS (
SELECT * FROM table_name nx
WHERE nx.n = tn.n+1
)
)
SELECT q1.n
FROM q1
ORDER BY q1.n DESC;
(a version with a recursive CTE might also be possible :-)
EDIT: the query plans, all for 100K records with 0.2% deleted.
Original query:
------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=36461.76..36711.20 rows=99778 width=4) (actual time=2682.600..2682.917 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
-> Subquery Scan q1 (cost=24984.41..26979.97 rows=99778 width=4) (actual time=2003.047..2682.036 rows=222 loops=1)
-> SetOp Except (cost=24984.41..25982.19 rows=99778 width=4) (actual time=2003.042..2681.389 rows=222 loops=1)
-> Sort (cost=24984.41..25483.30 rows=199556 width=4) (actual time=2002.584..2368.963 rows=199556 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 3512kB
-> Append (cost=0.00..5026.57 rows=199556 width=4) (actual time=0.071..1452.838 rows=199556 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..2638.01 rows=99778 width=4) (actual time=0.067..470.652 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1640.22 rows=99778 width=4) (actual time=0.063..178.365 rows=99778 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..2388.56 rows=99778 width=4) (actual time=0.014..429.224 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.011..143.320 rows=99778 loops=1)
Total runtime: 2684.840 ms
(14 rows)
NOT EXISTS-version with CTE:
----------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.60..6394.60 rows=1 width=4) (actual time=699.190..699.498 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
CTE q1
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=312.262..697.985 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.013..143.210 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=309.923..309.923 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..144.102 rows=99778 loops=1)
-> CTE Scan on q1 (cost=0.00..0.02 rows=1 width=4) (actual time=312.270..698.742 rows=222 loops=1)
Total runtime: 700.040 ms
(11 rows)
NOT EXISTS-version without CTE
--------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.58..6394.58 rows=1 width=4) (actual time=692.313..692.625 rows=222 loops=1)
Sort Key: ((1 + tn.n))
Sort Method: quicksort Memory: 22kB
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=308.046..691.849 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.014..142.781 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=305.732..305.732 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..143.783 rows=99778 loops=1)
Total runtime: 693.139 ms
(9 rows)
My conclusion is that the "NOT EXISTS" versions cause postgres to produce better plans.