Can this self-join be optimized further? - postgresql

I'm trying to understand whether a query containing a self-join can be optimized and, if so, how.
I'm working on a bigger real-life task, but I've extracted a simple sub-task from it here to keep the focus on one particular issue: optimizing a self-join query.
I have a table called parties. It contains over 85k records and looks like this:
# \d test.parties
Table "test.parties"
Column | Type | Collation | Nullable | Default
-------------+------+-----------+----------+---------
id | uuid | | |
contract_id | uuid | | |
Doing a self-join on contract_id I get this plan:
# explain analyse select p1.id from test.parties p1 join test.parties p2 on p1.contract_id = p2.contract_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=20207.87..628157.87 rows=40500000 width=16) (actual time=109.709..184.523 rows=197632 loops=1)
Merge Cond: (p1.contract_id = p2.contract_id)
-> Sort (cost=11181.94..11406.94 rows=90000 width=32) (actual time=55.560..66.173 rows=86332 loops=1)
Sort Key: p1.contract_id
Sort Method: external merge Disk: 3560kB
-> Seq Scan on parties p1 (cost=0.00..1620.00 rows=90000 width=32) (actual time=0.018..14.518 rows=86332 loops=1)
-> Sort (cost=9025.94..9250.94 rows=90000 width=16) (actual time=54.135..74.973 rows=197631 loops=1)
Sort Key: p2.contract_id
Sort Method: external sort Disk: 2544kB
-> Seq Scan on parties p2 (cost=0.00..1620.00 rows=90000 width=16) (actual time=0.009..10.462 rows=86332 loops=1)
Planning Time: 0.167 ms
Execution Time: 199.677 ms
(12 rows)
Adding an index on contract_id I get this plan:
# create index on test.parties(contract_id);
CREATE INDEX
# explain analyse select p1.id from test.parties p1 join test.parties p2 on p1.contract_id = p2.contract_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3084.47..10570.76 rows=192484 width=16) (actual time=32.457..97.662 rows=197632 loops=1)
Hash Cond: (p1.contract_id = p2.contract_id)
-> Seq Scan on parties p1 (cost=0.00..1583.32 rows=86332 width=32) (actual time=0.013..11.293 rows=86332 loops=1)
-> Hash (cost=1583.32..1583.32 rows=86332 width=16) (actual time=32.133..32.133 rows=86332 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 3048kB
-> Seq Scan on parties p2 (cost=0.00..1583.32 rows=86332 width=16) (actual time=0.007..12.815 rows=86332 loops=1)
Planning Time: 0.444 ms
Execution Time: 110.692 ms
(8 rows)
Is there a way I could get rid of those Seq Scans?

I don't see any index in your execution plan, so assuming that you have not yet looked into using indexes, here is one suggestion:
CREATE INDEX idx ON parties (contract_id, id);
This should speed up the join, and it also covers the id value, which is required in the SELECT clause.
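Note that this particular query has to read every row of the table anyway, so the planner may legitimately prefer sequential scans. A rough way to check whether the covering index can be used at all (a sketch, assuming the table has been vacuumed recently so its visibility map is mostly set):
VACUUM ANALYZE test.parties;   -- refresh the visibility map and planner statistics

EXPLAIN ANALYZE
SELECT p1.id
FROM test.parties p1
JOIN test.parties p2 ON p1.contract_id = p2.contract_id;
-- With (contract_id, id) indexed, one or both sides may switch to an Index Only Scan.
-- If the seq scans remain, they are simply the cheaper option, because the join reads
-- every row of the table regardless.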

Related

Unusually slow query across a join table

I have a query that is strangely slow in Postgres 13 for a database containing only small amounts of data. I have even seen the problem in my test suite where I fabricate some fake data.
SELECT sales.* FROM sales
INNER JOIN members ON members.id = sales.member_id
INNER JOIN members_teams ON members_teams.member_id = members.id
INNER JOIN teams ON teams.id = members_teams.team_id
WHERE teams.id IN (1, 2)
In my test suite I have the following counts of data in the different tables:
| Table | Count |
| -------- | -------------- |
| members | 501 |
| teams | 3 |
| members_teams | 501 |
| sales | 502 |
Here is an example of when it is slow:
Nested Loop (cost=0.75..25.83 rows=1 width=631) (actual time=38226.620..38226.622 rows=0 loops=1)
Join Filter: (members_teams.team_id = teams.id)
-> Nested Loop (cost=0.75..24.82 rows=1 width=635) (actual time=0.082..38220.385 rows=502 loops=1)
Join Filter: (members.id = members_teams.member_id)
Rows Removed by Join Filter: 251000
-> Index Scan using index_members_teams_on_team_id on members_teams (cost=0.25..8.26 rows=1 width=8) (actual time=0.031..0.544 rows=501 loops=1)
-> Nested Loop (cost=0.50..16.54 rows=1 width=635) (actual time=0.014..76.217 rows=502 loops=501)
Join Filter: (sales.member_id = members.id)
Rows Removed by Join Filter: 125250
-> Index Scan using index_sales_on_member_id on sales (cost=0.25..8.26 rows=1 width=631) (actual time=0.005..0.262 rows=502 loops=501)
-> Index Only Scan using members_pkey on members (cost=0.25..8.26 rows=1 width=4) (actual time=0.008..0.124 rows=251 loops=251502)
Heap Fetches: 63001752
-> Seq Scan on teams (cost=0.00..1.00 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=502)
Filter: (id = ANY ('{1,2}'::integer[]))
Rows Removed by Filter: 3
Planning Time: 0.690 ms
Execution Time: 38226.701 ms
Here is an example of when it is a more normal speed:
Nested Loop (cost=0.75..24.82 rows=1 width=631) (actual time=224.746..224.747 rows=0 loops=1)
Join Filter: (members.id = members_teams.member_id)
-> Nested Loop (cost=0.50..16.54 rows=1 width=635) (actual time=0.047..80.953 rows=502 loops=1)
Join Filter: (sales.member_id = members.id)
Rows Removed by Join Filter: 125250
-> Index Scan using index_sales_on_member_id on sales (cost=0.25..8.26 rows=1 width=631) (actual time=0.015..0.367 rows=502 loops=1)
-> Index Only Scan using members_pkey on members (cost=0.25..8.26 rows=1 width=4) (actual time=0.009..0.131 rows=251 loops=502)
Heap Fetches: 125752
-> Index Only Scan using index_members_teams_on_member_id_and_team_id on members_teams (cost=0.25..8.27 rows=1 width=4) (actual time=0.286..0.286 rows=0 loops=502)
Filter: (team_id = ANY ('{1,2}'::integer[]))
Rows Removed by Filter: 501
Heap Fetches: 251502
Planning Time: 0.481 ms
Execution Time: 224.798 ms
Summary
A key difference seems to be which index it uses for the join table members_teams. Do you have any suggestions for how I can make this consistently performant? I thought about removing the join to teams and filtering on the team_id on the join table, but I'm worried that in the future we may need to use this query with additional constraints from the teams table.
Your estimates seem completely off. Do you have autovacuum disabled, or is your statistics collector malfunctioning? You should get better plans by explicitly collecting statistics:
ANALYZE sales;
ANALYZE members;
ANALYZE members_teams;
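If the estimates stay off even after that, it is worth checking whether the tables have ever been analyzed at all. A quick, read-only check against the standard statistics view (nothing here is specific to your schema):
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('sales', 'members', 'members_teams', 'teams');
-- NULL timestamps mean the table has never been analyzed, which would explain
-- the rows=1 estimates in the plans above.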

Search results using ts_vectors and ST_Distance

We have a page where we show a list of results, and the results must be relevant given 2 factors:
keyword similarity
location
We are using PostgreSQL with PostGIS and ts_vectors; however, we don't know how to combine the scores coming from the ts_vectors and ST_Distance in order to get the "best" search results. The queries seem to take between 30 seconds and 1 minute.
SELECT
ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank, ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),
-- a bunch of fields omitted...
org.logo
FROM jobs.job as job
LEFT OUTER JOIN jobs.organization as org
ON job.organization_id = org.id
WHERE job.is_expired = 0 and deleted_at is NULL and doc_vectors @@ plainto_tsquery('Uber ') order by rank desc offset 80 limit 20;
Do you guys have suggestions for us?
EXPLAIN (ANALYZE, BUFFERS) for same Query:
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
Buffers: shared hit=1584 read=825114
-> Sort (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
Sort Key: job.created_at DESC
Sort Method: top-N heapsort Memory: 75kB
Buffers: shared hit=1584 read=825114
-> Hash Left Join (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
Hash Cond: (job.organization_id = org.id)
Buffers: shared hit=1581 read=825114
-> Seq Scan on job (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
Rows Removed by Filter: 196298
Buffers: shared hit=1564 read=824989
-> Hash (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1037kB
Buffers: shared hit=14 read=125
-> Seq Scan on organization org (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
Buffers: shared hit=14 read=125
Planning time: 2.223 ms
Execution time: 20684.682 ms
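One common way to attack both issues, sketched here with assumed names (the geometry column job.geom, the reference point, and the 5 km scale factor are illustrations only, not taken from the schema above): index the tsvector column with GIN so the full-text predicate can use an index instead of a sequential scan, and fold the two relevance factors into a single ORDER BY expression.
CREATE INDEX IF NOT EXISTS job_doc_vectors_gin ON jobs.job USING GIN (doc_vectors);

SELECT ts_rank_cd(doc_vectors, plainto_tsquery('Uber'), 1 | 4 | 32) AS text_rank,
       ST_Distance(job.geom, ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326)::geography) AS dist_m
       -- other fields omitted...
FROM jobs.job AS job
WHERE job.is_expired = 0
  AND deleted_at IS NULL
  AND doc_vectors @@ plainto_tsquery('Uber')
ORDER BY
  -- blend the two scores: a higher text rank is better, a larger distance (in metres) is worse;
  -- job.geom, the reference point and the 5000 m scale are placeholders
  ts_rank_cd(doc_vectors, plainto_tsquery('Uber'), 1 | 4 | 32)
    / (1 + ST_Distance(job.geom, ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326)::geography) / 5000.0) DESC
OFFSET 80 LIMIT 20;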

PostgreSQL recursive CTE performance issue

I'm trying to understand a huge difference in performance between two queries.
Let's assume I have two tables.
First one contains A records for some set of domains:
Table "public.dns_a"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(125) | | extended | |
a | inet | | main | |
Indexes:
"dns_a_a_idx" btree (a)
"dns_a_name_idx" btree (name varchar_pattern_ops)
Second table handles CNAME records:
Table "public.dns_cname"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(256) | | extended | |
cname | character varying(256) | | extended | |
Indexes:
"dns_cname_cname_idx" btree (cname varchar_pattern_ops)
"dns_cname_name_idx" btree (name varchar_pattern_ops)
Now I'm trying to solve a "simple" problem: getting all the domains pointing to the same IP address, including via CNAME records.
The first attempt, using a recursive CTE, works kind of fine:
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=3051757.20..4337044.86 rows=64264383 width=1064) (actual time=0.037..1697.444 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..3051757.20 rows=64264383 width=45) (actual time=0.036..1697.395 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.035..0.064 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Merge Join (cost=4377.00..176448.06 rows=6426243 width=45) (actual time=498.101..848.648 rows=92 loops=2)
Merge Cond: ((c.cname)::text = (nt.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2)
-> Materialize (cost=4376.44..4474.09 rows=19530 width=516) (actual time=0.039..0.084 rows=187 loops=2)
-> Sort (cost=4376.44..4425.27 rows=19530 width=516) (actual time=0.037..0.053 rows=100 loops=2)
Sort Key: nt.name USING ~<~
Sort Method: quicksort Memory: 33kB
-> WorkTable Scan on names_traverse nt (cost=0.00..390.60 rows=19530 width=516) (actual time=0.001..0.007 rows=100 loops=2)
Planning time: 0.130 ms
Execution time: 1697.477 ms
(15 rows)
There are two loops in the example above, so if I make a simple outer join query, I get much better results:
EXPLAIN ANALYZE
SELECT *
FROM dns_a a
LEFT JOIN dns_cname c1 ON (c1.cname=a.name)
LEFT JOIN dns_cname c2 ON (c2.cname=c1.name)
WHERE a.a='118.145.5.20';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.68..65674.19 rows=1953 width=114) (actual time=1.086..12.992 rows=189 loops=1)
-> Nested Loop Left Join (cost=1.12..46889.57 rows=1953 width=69) (actual time=1.085..2.154 rows=189 loops=1)
-> Index Scan using dns_a_a_idx on dns_a a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.022..0.055 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Index Scan using dns_cname_cname_idx on dns_cname c1 (cost=0.56..19.70 rows=329 width=45) (actual time=0.137..0.148 rows=13 loops=14)
Index Cond: ((cname)::text = (a.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c2 (cost=0.56..6.33 rows=329 width=45) (actual time=0.057..0.057 rows=0 loops=189)
Index Cond: ((cname)::text = (c1.name)::text)
Planning time: 0.452 ms
Execution time: 13.012 ms
(10 rows)
Time: 13.787 ms
So, the performance difference is about 100 times and that's the thing that worries me.
I like the convenience of recursive CTE and prefer to use it instead of doing dirty tricks on application side, but I don't get why the cost of Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2) is so high.
Am I missing something important regarding CTE or the issue is with something else?
Thanks!
Update: A friend of mine spotted the row count I had missed in Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2): it equals the total number of rows in the table. If I understand this correctly, it performs a full index scan without any condition, and I don't see where the condition got lost.
Result: after applying SET LOCAL enable_mergejoin TO false; the execution time is much, much better.
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=4746432.42..6527720.02 rows=89064380 width=1064) (actual time=0.718..45.656 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..4746432.42 rows=89064380 width=45) (actual time=0.717..45.597 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..74.82 rows=2700 width=24) (actual time=0.716..0.717 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Nested Loop (cost=0.56..296507.00 rows=8906168 width=45) (actual time=11.276..22.418 rows=92 loops=2)
-> WorkTable Scan on names_traverse nt (cost=0.00..540.00 rows=27000 width=516) (actual time=0.000..0.013 rows=100 loops=2)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..7.66 rows=330 width=45) (actual time=0.125..0.225 rows=1 loops=199)
Index Cond: ((cname)::text = (nt.name)::text)
Planning time: 0.253 ms
Execution time: 45.697 ms
(11 rows)
The first query is slow because of the index scan, as you noted.
The plan has to scan the complete index in order to get dns_cname sorted by cname, which is needed for the merge join. A merge join requires that both input tables are sorted by the join key, which can either be done with an index scan over the complete table (as in this case) or by a sequential scan followed by an explicit sort.
You will notice that the planner grossly overestimates all row counts for the CTE evaluation, which is probably the root of the problem. For fewer rows, PostgreSQL might choose a nested loop join which would not have to scan the whole table dns_cname.
That may be fixable or not. One thing that I can see immediately is that the estimate for the initial value '118.145.5.20' is too high by a factor of 139.5, which is pretty bad. You might fix that by running ANALYZE on dns_a, perhaps after increasing the statistics target for the column:
ALTER TABLE dns_a ALTER a SET STATISTICS 1000;
See if that makes a difference.
If that doesn't do the trick, you can manually set enable_mergejoin and enable_hashjoin to off and see if a plan with a nested loop join is really better or not. If you can get away with changing these parameters for this one statement only (probably with SET LOCAL) and get a better result that way, that is another option you have.
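A sketch of how that might look for this one statement, so that the global settings stay untouched:
BEGIN;
SET LOCAL enable_mergejoin TO false;
SET LOCAL enable_hashjoin TO false;

EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
    (
    SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
    )
    UNION ALL
    SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;

COMMIT;  -- both settings revert automatically at the end of the transaction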

Select records with IDs containing in another table

vit=# select count(*) from evtags;
count
---------
4496914
vit=# explain select tag from evtags where evid in (1002, 1023);
QUERY PLAN
---------------------------------------------------------------------------------
Index Only Scan using evtags_pkey on evtags (cost=0.00..15.64 rows=12 width=7)
Index Cond: (evid = ANY ('{1002,1023}'::integer[]))
This seems completely ok so far. Next, I want to use IDs from another table instead of specifying them in the query.
vit=# select count(*) from zzz;
count
-------
49738
Here we go...
vit=# explain select tag from evtags where evid in (select evid from zzz);
QUERY PLAN
-----------------------------------------------------------------------
Hash Semi Join (cost=1535.11..142452.47 rows=291712 width=7)
Hash Cond: (evtags.evid = zzz.evid)
-> Seq Scan on evtags (cost=0.00..69283.14 rows=4496914 width=11)
-> Hash (cost=718.38..718.38 rows=49738 width=4)
-> Seq Scan on zzz (cost=0.00..718.38 rows=49738 width=4)
Why a sequential scan on the much larger table, and what's the correct way to do this?
EDIT
I recreated my zzz table and now it is better for some reason:
vit=# explain analyze select tag from evtags where evid in (select evid from zzz);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=708.00..2699.17 rows=2248457 width=7) (actual time=28.935..805.923 rows=244353 loops=1)
-> HashAggregate (cost=708.00..710.00 rows=200 width=4) (actual time=28.893..54.461 rows=38822 loops=1)
-> Seq Scan on zzz (cost=0.00..601.80 rows=42480 width=4) (actual time=0.032..10.985 rows=40000 loops=1)
-> Index Only Scan using evtags_pkey on evtags (cost=0.00..9.89 rows=6 width=11) (actual time=0.015..0.017 rows=6 loops=38822)
Index Cond: (evid = zzz.evid)
Heap Fetches: 0
Total runtime: 825.651 ms
But after several executions it changes to
vit=# explain analyze select tag from evtags where evid in (select evid from zzz);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Merge Semi Join (cost=4184.11..127258.48 rows=235512 width=7) (actual time=38.269..1461.755 rows=244353 loops=1)
Merge Cond: (evtags.evid = zzz.evid)
-> Index Only Scan using evtags_pkey on evtags (cost=0.00..136736.89 rows=4496914 width=11) (actual time=0.038..899.647 rows=3630070 loops=1)
Heap Fetches: 0
-> Materialize (cost=4184.04..4384.04 rows=40000 width=4) (actual time=38.212..61.038 rows=40000 loops=1)
-> Sort (cost=4184.04..4284.04 rows=40000 width=4) (actual time=38.208..51.104 rows=40000 loops=1)
Sort Key: zzz.evid
Sort Method: external sort Disk: 552kB
-> Seq Scan on zzz (cost=0.00..577.00 rows=40000 width=4) (actual time=0.018..8.833 rows=40000 loops=1)
Total runtime: 1484.293 ms
...which is actually slower. Is there any way to hint it towards the 'correct' execution plan?
The point of all this is that I want to perform a number of queries on a subset of my data, and I wanted to use a separate temporary table to hold the IDs of the records I want to process.
An inner join has a better chance of a good plan:
select e.tag
from
evtags e
inner join
zzz z using (evid)
Or this:
select e.tag
from evtags e
where exists (
select 1
from zzz
where evid = e.evid
)
As pointed out in the comments, run ANALYZE evtags; ANALYZE zzz;
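A hedged sketch of how this could be put together, assuming zzz is the temporary ID table described in the question (the index name is made up). Since autovacuum never processes temporary tables, their statistics have to be collected by hand:
ANALYZE evtags;
ANALYZE zzz;
CREATE INDEX zzz_evid_idx ON zzz (evid);  -- optional; gives the planner a nested-loop option

EXPLAIN ANALYZE
SELECT e.tag
FROM evtags e
JOIN zzz z USING (evid);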

Configuration parameter work_mem in PostgreSQL on Linux

I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. I then checked how changing this parameter would influence the performance of my query (which uses a sort). I measured the query execution time with various work_mem settings and was very disappointed.
The table on which I perform my query contains 10,000,000 rows and there are 430 MB of data to sort. (Sort Method: external merge Disk: 430112kB).
With work_mem = 1MB, EXPLAIN output is:
Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort (cost=4032588.78..4082588.66 rows=19999954 width=8)
(actual time=22577.149..26424.951 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 5MB:
Total runtime: 36282.729 ms (sort: 25400 ms).
Sort (cost=3485713.78..3535713.66 rows=19999954 width=8)
(actual time=25062.383..33246.561 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 64MB:
Total runtime: 42566.538 ms (sort: 31000 ms).
Sort (cost=3212276.28..3262276.16 rows=19999954 width=8)
(actual time=28599.611..39454.279 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
Can anyone explain why performance gets worse? Or suggest any other methods to make query execution faster by changing server parameters?
My query (I know it's not optimal, but I have to benchmark this kind of query):
SELECT n
FROM (
SELECT n + 1 AS n FROM table_name
EXCEPT
SELECT n FROM table_name) AS q1
ORDER BY n DESC;
Full execution plan:
Sort (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 25kB
-> Subquery Scan q1 (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
-> SetOp Except (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
-> Sort (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
-> Append (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
I posted your query plan on explain.depesz.com, have a look.
The query planner's estimates are terribly wrong in some places.
Have you run ANALYZE recently?
Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants. Pay special attention to the chapters on random_page_cost and default_statistics_target.
You might try:
ALTER TABLE diplomas ALTER COLUMN number SET STATISTICS 1000;
ANALYZE diplomas;
Or go even higher for a table with 10M rows. It depends on the data distribution and the actual queries. Experiment. The default is 100, the maximum is 10000.
For a database of that size, 1 or 5 MB of work_mem is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to.
As your query needs 430104 kB on disk according to the EXPLAIN output, you would have to set work_mem to something like 500 MB or more to allow in-memory sorting. The in-memory representation of data needs somewhat more space than the on-disk representation. You may be interested in what Tom Lane posted on that matter recently.
Increasing work_mem by just a little, like you tried, won't help much and can even slow things down. Setting it too high globally can actually hurt, especially with concurrent access: multiple sessions might starve one another for resources, and allocating more for one purpose takes memory away from another if the total is limited. The best setup depends on the complete situation.
To avoid side effects, set it high enough only locally in your session, and only temporarily, for the duration of the query:
SET work_mem = '500MB';
Reset it to your default afterwards:
RESET work_mem;
Or use SET LOCAL to set it just for the current transaction to begin with.
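For example, a sketch of the transaction-scoped variant, using the 500MB figure from above only as a starting point:
BEGIN;
SET LOCAL work_mem = '500MB';  -- applies only to this transaction

SELECT n
FROM (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;

COMMIT;  -- work_mem falls back to the server default automatically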
SET search_path='tmp';
-- Generate some data ...
-- DROP table tmp.table_name ;
-- CREATE table tmp.table_name ( n INTEGER NOT NULL PRIMARY KEY);
-- INSERT INTO tmp.table_name(n) SELECT generate_series(1,1000);
-- DELETE FROM tmp.table_name WHERE random() < 0.05 ;
The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan (but the same results) here (on 9.0.1beta):
-- EXPLAIN ANALYZE
WITH q1 AS (
SELECT 1+tn.n AS n
FROM table_name tn
WHERE NOT EXISTS (
SELECT * FROM table_name nx
WHERE nx.n = tn.n+1
)
)
SELECT q1.n
FROM q1
ORDER BY q1.n DESC;
(a version with a recursive CTE might also be possible :-)
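For completeness, the plan labelled "NOT EXISTS-version without CTE" further down presumably corresponds to the same predicate written inline, roughly like this sketch:
-- EXPLAIN ANALYZE
SELECT 1 + tn.n AS n
FROM table_name tn
WHERE NOT EXISTS (
    SELECT * FROM table_name nx
    WHERE nx.n = tn.n + 1
)
ORDER BY 1 + tn.n DESC;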
EDIT: the query plans, all for 100K records with 0.2% deleted.
Original query:
------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=36461.76..36711.20 rows=99778 width=4) (actual time=2682.600..2682.917 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
-> Subquery Scan q1 (cost=24984.41..26979.97 rows=99778 width=4) (actual time=2003.047..2682.036 rows=222 loops=1)
-> SetOp Except (cost=24984.41..25982.19 rows=99778 width=4) (actual time=2003.042..2681.389 rows=222 loops=1)
-> Sort (cost=24984.41..25483.30 rows=199556 width=4) (actual time=2002.584..2368.963 rows=199556 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 3512kB
-> Append (cost=0.00..5026.57 rows=199556 width=4) (actual time=0.071..1452.838 rows=199556 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..2638.01 rows=99778 width=4) (actual time=0.067..470.652 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1640.22 rows=99778 width=4) (actual time=0.063..178.365 rows=99778 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..2388.56 rows=99778 width=4) (actual time=0.014..429.224 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.011..143.320 rows=99778 loops=1)
Total runtime: 2684.840 ms
(14 rows)
NOT EXISTS-version with CTE:
----------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.60..6394.60 rows=1 width=4) (actual time=699.190..699.498 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
CTE q1
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=312.262..697.985 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.013..143.210 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=309.923..309.923 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..144.102 rows=99778 loops=1)
-> CTE Scan on q1 (cost=0.00..0.02 rows=1 width=4) (actual time=312.270..698.742 rows=222 loops=1)
Total runtime: 700.040 ms
(11 rows)
NOT EXISTS-version without CTE
--------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.58..6394.58 rows=1 width=4) (actual time=692.313..692.625 rows=222 loops=1)
Sort Key: ((1 + tn.n))
Sort Method: quicksort Memory: 22kB
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=308.046..691.849 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.014..142.781 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=305.732..305.732 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..143.783 rows=99778 loops=1)
Total runtime: 693.139 ms
(9 rows)
My conclusion is that the "NOT EXISTS" versions cause postgres to produce better plans.