I have 2 tables, "transaksi" and "buku". "transaksi" has around ~250k rows, and buku has around ~170k rows. Both tables have column called "k999a", and both tables use no indexes. Now I check these 2 statements.
Statement 1:
explain select k999a from transaksi where k999a not in (select k999a from buku);
Statement 1 outputs:
Seq Scan on transaksi (cost=0.00..721109017.46 rows=125426 width=9)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..5321.60 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Statement 2:
explain select k999a from transaksi where k999a in (select k999a from buku);
Statement 2 outputs:
Hash Semi Join (cost=6604.40..22664.82 rows=250853 width=9)
Hash Cond: (transaksi.k999a = buku.k999a)
-> Seq Scan on transaksi (cost=0.00..6356.53 rows=250853 width=9)
-> Hash (cost=3797.40..3797.40 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Why in the NOT IN query, postgresql does loop join, making the query takes a long time?
PS: postgresql version 9.6.1 on windows 10
This is to be expected. You may get better performance using WHERE NOT EXISTS instead:
SELECT k999a
FROM transaksi
WHERE NOT EXISTS (
SELECT 1 FROM buku WHERE buku.k999a = transaksi.k999a LIMIT 1
);
Here is a good explanation as to why for each of the methods: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
Related
I'm trying to migrate from SQL Server to Postgresql.
Here is my Posgresql code:
Create View person_names As
SELECT lp."Code", n."Name", n."Type"
from "Persons" lp
Left Join LATERAL
(
Select *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
Limit 1
) n on true
limit 100;
Explain
Select "Code" From person_names;
It prints
"Subquery Scan on person_names (cost=0.42..448.85 rows=100 width=10)"
" -> Limit (cost=0.42..447.85 rows=100 width=56)"
" -> Nested Loop Left Join (cost=0.42..303946.91 rows=67931 width=56)"
" -> Seq Scan on ""Persons"" lp (cost=0.00..1314.31 rows=67931 width=10)"
" -> Limit (cost=0.42..4.44 rows=1 width=100)"
" -> Index Only Scan Backward using ""IX_Names_Person"" on ""Names"" n (cost=0.42..4.44 rows=1 width=100)"
" Index Cond: ("id" = (lp."id")::numeric)"
Why there is an "Index Only Scan" for the "Names" table? This table is not required to get a result. On SQL Server I get only a single scan over the "Persons" table.
How can I tune Postgres to get a better query plans? I'm trying the lastest version, which is the Postgresql 15 beta 3.
Here is SQL Server version:
Create View person_names As
SELECT top 100 lp."Code", n."Name", n."Type"
from "Persons" lp
Outer Apply
(
Select Top 1 *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
) n
GO
SET SHOWPLAN_TEXT ON;
GO
Select "Code" From person_names;
It gives correct execution plan:
|--Top(TOP EXPRESSION:((100)))
|--Index Scan(OBJECT:([Persons].[IX_Persons] AS [lp]))
Change the lateral join to a regular left join, then Postgres is able to remove the select on the Names table:
create View person_names
As
SELECT lp.Code, n.Name, n.Type
from Persons lp
Left Join (
Select distinct on (id) *
From Names n
Order By id, Date desc
) n on n.id = lp.id
limit 100;
The following index will support the distinct on () in case you do include columns from the Names table:
create index on "Names"(id, "Date" desc);
For select code from names this gives me this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=7) (actual time=0.009..1.348 rows=20000 loops=1)
Planning Time: 0.262 ms
Execution Time: 1.738 ms
For select Code, name, type From person_names; this gives me this plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=559.42..14465.93 rows=20000 width=25) (actual time=5.585..68.545 rows=20000 loops=1)
Hash Cond: (n.id = lp.id)
-> Unique (cost=0.42..13653.49 rows=20074 width=26) (actual time=0.053..57.323 rows=20000 loops=1)
-> Index Scan using names_id_date_idx on names n (cost=0.42..12903.49 rows=300000 width=26) (actual time=0.052..41.125 rows=300000 loops=1)
-> Hash (cost=309.00..309.00 rows=20000 width=11) (actual time=5.407..5.407 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1116kB
-> Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=11) (actual time=0.011..2.036 rows=20000 loops=1)
Planning Time: 0.460 ms
Execution Time: 69.180 ms
Of course I had to guess the table structures as you haven't provided any DDL.
Online example
Change your view definition like that
create view person_names as
select p."Code",
(select "Name"
from "Names" n
where n.id = p.id
order by "Date" desc
limit 1)
from "Persons" p
limit 100;
When joining on a table and then filtering (LIMIT 30 for instance), Postgres will apply a JOIN operation on all rows, even if the columns from those rows is only used in the returned column, and not as a filtering predicate.
This would be understandable for an INNER JOIN (PG has to know if the row will be returned or not) or for a LEFT JOIN without a unique constraint (PG has to know if more than one row will be returned or not), but for a LEFT JOIN on a UNIQUE column, this seems wasteful: if the query matches 10k rows, then 10k joins will be performed, and then only 30 will be returned.
It would seem more efficient to "delay", or defer, the join, as much as possible, and this is something that I've seen happen on some other queries.
Splitting this into a subquery (SELECT * FROM (SELECT * FROM main WHERE x LIMIT 30) LEFT JOIN secondary) works, by ensuring that only 30 items are returned from the main table before joining them, but it feels like I'm missing something, and the "standard" form of the query should also apply the same optimization.
Looking at the EXPLAIN plans, however, I can see that the number of rows joined is always the total number of rows, without "early bailing out" as you could see when, for instance, running a Seq Scan with a LIMIT 5.
Example schema, with a main table and a secondary one: secondary columns will only be returned, never filtered on.
drop table if exists secondary;
drop table if exists main;
create table main(id int primary key not null, main_column int);
create index main_column on main(main_column);
insert into main(id, main_column) SELECT i, i % 3000 from generate_series( 1, 1000000, 1) i;
create table secondary(id serial primary key not null, main_id int references main(id) not null, secondary_column int);
create unique index secondary_main_id on secondary(main_id);
insert into secondary(main_id, secondary_column) SELECT i, (i + 17) % 113 from generate_series( 1, 1000000, 1) i;
analyze main;
analyze secondary;
Example query:
explain analyze verbose select main.id, main_column, secondary_column
from main
left join secondary on main.id = secondary.main_id
where main_column = 5
order by main.id
limit 50;
This is the most "obvious" way of writing the query, takes on average around 5ms on my computer.
Explain:
Limit (cost=3742.93..3743.05 rows=50 width=12) (actual time=5.010..5.322 rows=50 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
-> Sort (cost=3742.93..3743.76 rows=332 width=12) (actual time=5.006..5.094 rows=50 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
Sort Key: main.id
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop Left Join (cost=11.42..3731.90 rows=332 width=12) (actual time=0.123..4.446 rows=334 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
Inner Unique: true
-> Bitmap Heap Scan on public.main (cost=11.00..1036.99 rows=332 width=8) (actual time=0.106..1.021 rows=334 loops=1)
Output: main.id, main.main_column
Recheck Cond: (main.main_column = 5)
Heap Blocks: exact=334
-> Bitmap Index Scan on main_column (cost=0.00..10.92 rows=332 width=0) (actual time=0.056..0.057 rows=334 loops=1)
Index Cond: (main.main_column = 5)
-> Index Scan using secondary_main_id on public.secondary (cost=0.42..8.12 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=334)
Output: secondary.id, secondary.main_id, secondary.secondary_column
Index Cond: (secondary.main_id = main.id)
Planning Time: 0.761 ms
Execution Time: 5.423 ms
explain analyze verbose select m.id, main_column, secondary_column
from (
select main.id, main_column
from main
where main_column = 5
order by main.id
limit 50
) m
left join secondary on m.id = secondary.main_id
where main_column = 5
order by m.id
limit 50
This returns the same results, in 2ms.
The total EXPLAIN cost is also three times higher, in line with the performance gain we're seeing.
Limit (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.219..2.027 rows=50 loops=1)
Output: m.id, m.main_column, secondary.secondary_column
-> Nested Loop Left Join (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.216..1.900 rows=50 loops=1)
Output: m.id, m.main_column, secondary.secondary_column
Inner Unique: true
-> Subquery Scan on m (cost=1048.02..1048.77 rows=1 width=8) (actual time=1.201..1.515 rows=50 loops=1)
Output: m.id, m.main_column
Filter: (m.main_column = 5)
-> Limit (cost=1048.02..1048.14 rows=50 width=8) (actual time=1.196..1.384 rows=50 loops=1)
Output: main.id, main.main_column
-> Sort (cost=1048.02..1048.85 rows=332 width=8) (actual time=1.194..1.260 rows=50 loops=1)
Output: main.id, main.main_column
Sort Key: main.id
Sort Method: top-N heapsort Memory: 27kB
-> Bitmap Heap Scan on public.main (cost=11.00..1036.99 rows=332 width=8) (actual time=0.054..0.753 rows=334 loops=1)
Output: main.id, main.main_column
Recheck Cond: (main.main_column = 5)
Heap Blocks: exact=334
-> Bitmap Index Scan on main_column (cost=0.00..10.92 rows=332 width=0) (actual time=0.029..0.030 rows=334 loops=1)
Index Cond: (main.main_column = 5)
-> Index Scan using secondary_main_id on public.secondary (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=50)
Output: secondary.id, secondary.main_id, secondary.secondary_column
Index Cond: (secondary.main_id = m.id)
Planning Time: 0.161 ms
Execution Time: 2.115 ms
This is a toy dataset here, but on a real DB, the IO difference is significant (no need to fetch 1000 rows when 30 are enough), and the timing difference also quickly adds up (up to an order of magnitude slower).
So my question: is there any way to get the planner to understand that the JOIN can be applied much later in the process?
It seems like something that could be applied automatically to gain a sizeable performance boost.
Deferred joins are good. It's usually helpful to run the limit operation on a subquery that yields only the id values. The order by....limit operation has to sort less data just to discard it.
select main.id, main.main_column, secondary.secondary_column
from main
join (
select id
from main
where main_column = 5
order by id
limit 50
) selection on main.id = selection.id
left join secondary on main.id = secondary.main_id
order by main.id
limit 50
It's also possible adding id to your main_column index will help. With a BTREE index the query planner knows it can get the id values in ascending order from the index, so it may be able to skip the sort step entirely and just scan the first 50 values.
create index main_column on main(main_column, id);
Edit In a large table, the heavy lifting of your query will be the selection of the 50 main.id values to process. To get those 50 id values as cheaply as possible you can use a scan of the covering index I proposed with the subquery I proposed. Once you've got your 50 id values, looking up 50 rows' worth of details from your various tables by main.id and secondary.main_id is trivial; you have the correct indexes in place and it's a limited number of rows. Because it's a limited number of rows it won't take much time.
It looks like your table sizes are too small for various optimizations to have much effect, though. Query plans change a lot when tables are larger.
Alternative query, using row_number() instead of LIMIT (I think you could even omit LIMIT here):
-- prepare q3 AS
select m.id, main_column, secondary_column
from (
select id, main_column
, row_number() OVER (ORDER BY id, main_column) AS rn
from main
where main_column = 5
) m
left join secondary on m.id = secondary.main_id
WHERE m.rn <= 50
ORDER BY m.id
LIMIT 50
;
Puttting the subsetting into a CTE can avoid it to be merged into the main query:
PREPARE q6 AS
WITH
-- MATERIALIZED -- not needed before version 12
xxx AS (
SELECT DISTINCT x.id
FROM main x
WHERE x.main_column = 5
ORDER BY x.id
LIMIT 50
)
select m.id, m.main_column, s.secondary_column
from main m
left join secondary s on m.id = s.main_id
WHERE EXISTS (
SELECT *
FROM xxx x WHERE x.id = m.id
)
order by m.id
-- limit 50
;
I have a table with 3 columns and composite primary key with all the 3 columns. All the individual columns have lot of duplicates and I have btree separately on all of them. The table has around 10 million records.
My query with just a condition with a hardcoded value for single column would always return more than a million records. It takes more than 40 secs whereas it takes very few seconds if I limit the query to 1 or 2 million rows without any condition.
Any help to optimize it as there is no bitmap index in Postgres? All 3 columns have lots of duplicates, would it help if I drop the btree index on them?
SELECT t1.filterid,
t1.filterby,
t1.filtertype
FROM echo_sm.usernotificationfilters t1
WHERE t1.filtertype = 9
UNION
SELECT t1.filterid, '-1' AS filterby, 9 AS filtertype
FROM echo_sm.usernotificationfilters t1
WHERE NOT EXISTS (SELECT 1
FROM echo_sm.usernotificationfilters t2
WHERE t2.filtertype = 9 AND t2.filterid = t1.filterid);
Filtertype column is integer and the rest 2 are varchar(50). All 3 columns have separate btree indexes on them.
Explain plan:
Unique (cost=2168171.15..2201747.47 rows=3357632 width=154) (actual time=32250.340..36371.928 rows=3447159 loops=1)
-> Sort (cost=2168171.15..2176565.23 rows=3357632 width=154) (actual time=32250.337..35544.050 rows=4066447 loops=1)
Sort Key: usernotificationfilters.filterid, usernotificationfilters.filterby, usernotificationfilters.filtertype
Sort Method: external merge Disk: 142696kB
-> Append (cost=62854.08..1276308.41 rows=3357632 width=154) (actual time=150.155..16025.874 rows=4066447 loops=1)
-> Bitmap Heap Scan on usernotificationfilters (cost=62854.08..172766.46 rows=3357631 width=25) (actual time=150.154..574.297 rows=3422522 loops=1)
Recheck Cond: (filtertype = 9)
Heap Blocks: exact=39987
-> Bitmap Index Scan on index_sm_usernotificationfilters_filtertype (cost=0.00..62014.67 rows=3357631 width=0) (actual time=143.585..143.585 rows=3422522 loops=1)
Index Cond: (filtertype = 9)
-> Gather (cost=232131.85..1069965.63 rows=1 width=50) (actual time=3968.492..15133.812 rows=643925 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Anti Join (cost=231131.85..1068965.53 rows=1 width=50) (actual time=4135.235..12945.029 rows=214642 loops=3)
Hash Cond: ((usernotificationfilters_1.filterid)::text = (usernotificationfilters_1_1.filterid)::text)
-> Parallel Seq Scan on usernotificationfilters usernotificationfilters_1 (cost=0.00..106879.18 rows=3893718 width=14) (actual time=0.158..646.432 rows=3114974 loops=3)
-> Hash (cost=172766.46..172766.46 rows=3357631 width=14) (actual time=4133.991..4133.991 rows=3422522 loops=3)
Buckets: 131072 Batches: 64 Memory Usage: 3512kB
-> Bitmap Heap Scan on usernotificationfilters usernotificationfilters_1_1 (cost=62854.08..172766.46 rows=3357631 width=14) (actual time=394.775..1891.931 rows=3422522 loops=3)
Recheck Cond: (filtertype = 9)
Heap Blocks: exact=39987
-> Bitmap Index Scan on index_sm_usernotificationfilters_filtertype (cost=0.00..62014.67 rows=3357631 width=0) (actual time=383.635..383.635 rows=3422522 loops=3)
Index Cond: (filtertype = 9)
Planning time: 0.467 ms
Execution time: 36531.763 ms
The second subquery in your UNION takes about 15 seconds all by itself, and that could possibly be optimized separately from the rest of the query.
The sort to implement the duplicate removal implied by UNION takes about 20 seconds all by itself. It spills to disk. You could increase "work_mem" until it either stops spilling to disk, or starts using a hash rather than a sort. Of course you do need to have the RAM to backup your setting of "work_mem".
A third possibility would be not to treat these steps in isolation. If you had an index which would allow the data to be read from the 2nd branch of the union already in order, than it might not have to re-sort the whole thing. That would probably be an index on (filterid, filterby, filtertype).
This is a separate independent way to approach it.
I think your
WHERE NOT EXISTS (SELECT 1...
could be correctly changed to
WHERE t1.filtertype <> 9 NOT EXISTS AND (SELECT 1...
because the case where t1.filtertype=9 would filter itself out. Is that correct? If so, you could try writing it that way, as the planner is probably not smart enough to make that transformation on its own. Once you have done that, than maybe a filtered index, something like the below, would come in useful.
create index on echo_sm.usernotificationfilters (filterid, filterby, filtertype)
where filtertype <> 9
But, unless you get rid of or speed up that sort, there is only so much improvement you can get with other things.
It appears you only want to retrieve one record per filterid: a record with filtertype = 9 if available, or just another, with dummy values for the other columns. This can be done by ordering BY (filtertype<>9), filtertype ) and picking only the first row via row_number() = 1:
-- EXPLAIN ANALYZE
SELECT xx.filterid
, case(xx.filtertype) when 9 then xx.filterby ELSE '-1' END AS filterby
, 9 AS filtertype -- xx.filtertype
-- , xx.rn
FROM (
SELECT t1.filterid , t1.filterby , t1.filtertype
, row_number() OVER (PARTITION BY t1.filterid ORDER BY (filtertype<>9), filtertype ) AS rn
FROM userfilters t1
) xx
WHERE xx.rn = 1
-- ORDER BY xx.filterid, xx.rn
;
This query can be supported by a an index on the same expression:
CREATE INDEX ON userfilters ( filterid , (filtertype<>9), filtertype ) ;
But, on my machine the UNION ALL version is faster (using the same index):
EXPLAIN ANALYZE
SELECT t1.filterid
, t1.filterby
, t1.filtertype
FROM userfilters t1
WHERE t1.filtertype = 9
UNION ALL
SELECT DISTINCT t1.filterid , '-1' AS filterby ,9 AS filtertype
FROM userfilters t1
WHERE NOT EXISTS (
SELECT *
FROM userfilters t2
WHERE t2.filtertype = 9 AND t2.filterid = t1.filterid
)
;
Even simpler (and faster!) is to use DISTINCT ON() , supported by the same conditional index:
-- EXPLAIN ANALYZE
SELECT DISTINCT ON (t1.filterid)
t1.filterid
, case(t1.filtertype) when 9 then t1.filterby ELSE '-1' END AS filterby
, 9 AS filtertype -- t1.filtertype
FROM userfilters t1
ORDER BY t1.filterid , (t1.filtertype<>9), t1.filtertype
;
I'm trying to understand such a huge difference in performance of two queries.
Let's assume I have two tables.
First one contains A records for some set of domains:
Table "public.dns_a"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(125) | | extended | |
a | inet | | main | |
Indexes:
"dns_a_a_idx" btree (a)
"dns_a_name_idx" btree (name varchar_pattern_ops)
Second table handles CNAME records:
Table "public.dns_cname"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(256) | | extended | |
cname | character varying(256) | | extended | |
Indexes:
"dns_cname_cname_idx" btree (cname varchar_pattern_ops)
"dns_cname_name_idx" btree (name varchar_pattern_ops)
Now I'm trying to solve "simple" problem with getting all the domains pointing to the same IP address, including CNAME.
The first attempt to use CTE works kind of fine:
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=3051757.20..4337044.86 rows=64264383 width=1064) (actual time=0.037..1697.444 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..3051757.20 rows=64264383 width=45) (actual time=0.036..1697.395 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.035..0.064 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Merge Join (cost=4377.00..176448.06 rows=6426243 width=45) (actual time=498.101..848.648 rows=92 loops=2)
Merge Cond: ((c.cname)::text = (nt.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2)
-> Materialize (cost=4376.44..4474.09 rows=19530 width=516) (actual time=0.039..0.084 rows=187 loops=2)
-> Sort (cost=4376.44..4425.27 rows=19530 width=516) (actual time=0.037..0.053 rows=100 loops=2)
Sort Key: nt.name USING ~<~
Sort Method: quicksort Memory: 33kB
-> WorkTable Scan on names_traverse nt (cost=0.00..390.60 rows=19530 width=516) (actual time=0.001..0.007 rows=100 loops=2)
Planning time: 0.130 ms
Execution time: 1697.477 ms
(15 rows)
There are two loops in the example above, so if I make a simple outer join query, I get much better results:
EXPLAIN ANALYZE
SELECT *
FROM dns_a a
LEFT JOIN dns_cname c1 ON (c1.cname=a.name)
LEFT JOIN dns_cname c2 ON (c2.cname=c1.name)
WHERE a.a='118.145.5.20';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.68..65674.19 rows=1953 width=114) (actual time=1.086..12.992 rows=189 loops=1)
-> Nested Loop Left Join (cost=1.12..46889.57 rows=1953 width=69) (actual time=1.085..2.154 rows=189 loops=1)
-> Index Scan using dns_a_a_idx on dns_a a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.022..0.055 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Index Scan using dns_cname_cname_idx on dns_cname c1 (cost=0.56..19.70 rows=329 width=45) (actual time=0.137..0.148 rows=13 loops=14)
Index Cond: ((cname)::text = (a.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c2 (cost=0.56..6.33 rows=329 width=45) (actual time=0.057..0.057 rows=0 loops=189)
Index Cond: ((cname)::text = (c1.name)::text)
Planning time: 0.452 ms
Execution time: 13.012 ms
(10 rows)
Time: 13.787 ms
So, the performance difference is about 100 times and that's the thing that worries me.
I like the convenience of recursive CTE and prefer to use it instead of doing dirty tricks on application side, but I don't get why the cost of Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2) is so high.
Am I missing something important regarding CTE or the issue is with something else?
Thanks!
Update: A friend of mine spotted the number of affected rows I missed Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2), it equals total number of rows in the table and, if I understand correctly, it performs full index scan without condition and I don't get where condition is missed.
Result: After applying SET LOCAL enable_mergejoin TO false; execution time is much, much better.
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=4746432.42..6527720.02 rows=89064380 width=1064) (actual time=0.718..45.656 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..4746432.42 rows=89064380 width=45) (actual time=0.717..45.597 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..74.82 rows=2700 width=24) (actual time=0.716..0.717 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Nested Loop (cost=0.56..296507.00 rows=8906168 width=45) (actual time=11.276..22.418 rows=92 loops=2)
-> WorkTable Scan on names_traverse nt (cost=0.00..540.00 rows=27000 width=516) (actual time=0.000..0.013 rows=100 loops=2)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..7.66 rows=330 width=45) (actual time=0.125..0.225 rows=1 loops=199)
Index Cond: ((cname)::text = (nt.name)::text)
Planning time: 0.253 ms
Execution time: 45.697 ms
(11 rows)
The first query is slow because of the index scan, as you noted.
The plan has to scan the complete index in order to get dns_cname sorted by cname, which is needed for the merge join. A merge join requires that both input tables are sorted by the join key, which can either be done with an index scan over the complete table (as in this case) or by a sequential scan followed by an explicit sort.
You will notice that the planner grossly overestimates all row counts for the CTE evaluation, which is probably the root of the problem. For fewer rows, PostgreSQL might choose a nested loop join which would not have to scan the whole table dns_cname.
That may be fixable or not. One thing that I can see immediately is that the estimate for the initial value '118.145.5.20' is too high by a factor 139.5, which is pretty bad. You might fix that by running ANALYZE on dns_cname, perhaps after increasing the statistics target for the column:
ALTER TABLE dns_a ALTER a SET STATISTICS 1000;
See if that makes a difference.
If that doesn't do the trick, you can manually set enable_mergejoin and enable_hashjoin to off and see if a plan with a nested loop join is really better or not. If you can get away with changing these parameters for this one statement only (probably with SET LOCAL) and get a better result that way, that is another option you have.
I'm having a very wierd dataset, where several records from large table has any data at all, but when they do it's hundredths of thousands of records.
I'm trying to select only records that have data but I'm having some problems with index usage. I know you cannot usually "force" postgresql to use certain index but in this case it works.
SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) > 0 ORDER BY count(frames.id) DESC;
id | count
----+--------
31 | 123363
28 | 121475
24 | 110155
21 | 108258
22 | 106837
25 | 89182
26 | 87104
27 | 86152
(8 rows)
SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) = 0 ORDER BY count(frames.id) DESC;
....
(568 rows)
Two solutions I've found would be:
SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 11697,645 ms
or
SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"
Time: 879,325 ms
Neither query seems to use index on match_id in frames table. It's understendable since normally it's not very selective, unfortunately here it would be really helpful. As:
SET enable_seqscan = OFF;
SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 1,239 ms
EXPLAIN for queries:
EXPLAIN for: SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"
QUERY PLAN
-----------------------------------------------------------------------------
HashAggregate (cost=59253.47..59256.38 rows=290 width=155)
-> Hash Join (cost=6.26..33716.73 rows=785746 width=155)
Hash Cond: (frames.match_id = matches.id)
-> Seq Scan on frames (cost=0.00..22906.46 rows=785746 width=4)
-> Hash (cost=4.45..4.45 rows=145 width=155)
-> Seq Scan on matches (cost=0.00..4.45 rows=145 width=155)
(6 rows)
EXPLAIN for: SELECT "matches".* FROM "matches" WHERE (EXISTS (SELECT id FROM frames WHERE frames.match_id = matches.id LIMIT 1))
QUERY PLAN
Seq Scan on matches (cost=0.00..41.17 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.25 rows=1 width=4)
-> Seq Scan on frames (cost=0.00..24870.83 rows=98218 width=4)
Filter: (match_id = matches.id)
(6 rows)
SET enable_seqscan = OFF;
EXPLAIN SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
QUERY PLAN
Seq Scan on matches (cost=10000000000.00..10000000118.37 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.79 rows=1 width=0)
-> Index Scan using index_frames_on_match_id on frames (cost=0.00..81762.68 rows=104066 width=0)
Index Cond: (match_id = matches.id)
(6 rows)
Any suggestions how to tweek the query to use index here? Maybe other ways to chec for existance of recrs that would execute closer to 1ms I get out of index then 11s ?
PS. I did run ANALYZE, VACUM ANALYZE, all the steps normally suggested to improve index usage.
EDIT Thanks for David Aldridge pointing out that LIMIT 1 might be actually hindering query planner I've gotten now to:
SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
Time: 163,803 ms
With the plan:
EXPLAIN SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=25455.58..25457.90 rows=8 width=155)
-> HashAggregate (cost=25455.58..25455.66 rows=8 width=4)
-> Seq Scan on frames (cost=0.00..23374.26 rows=832526 width=4)
-> Index Scan using matches_pkey on matches (cost=0.00..0.27 rows=1 width=155)
Index Cond: (id = frames.match_id)
(5 rows)
Still 100 times slower with index only version (probably because of Seq Scan + Hash Aggregate on frames that's still performed)
In the EXISTS-based alternative, the LIMIT clause is redundant but might not be helping the optimiser.
Try:
SELECT "matches".*
FROM "matches"
WHERE EXISTS (SELECT 1
FROM frames
WHERE frames.match_id = matches.id);