Postgres chooses wrong query plan - postgresql

I have problems with a query that uses a wrong query plan. Because of the non-optimal query plan the query takes almost 20s.
The problem occurs only for a small number of owner_ids. The distribution of the owner_ids is not uniform. The owner_id in the example has 7948 routes. The total number of routes is 2903096.
The database is hosted on Amazon RDS on a server with 34.2 GiB memory, 4vCPU and provisioned IOPS (instance type db.m2.2xlarge). The Postgres version is 9.3.5.
EXPLAIN ANALYZE SELECT
route.id, route_meta.name
FROM
route
INNER JOIN
route_meta
USING (id)
WHERE
route.owner_id = 128905
ORDER BY
route_meta.name
LIMIT
61
Query plan:
"Limit (cost=0.86..58637.88 rows=61 width=24) (actual time=49.731..18828.052 rows=61 loops=1)"
" -> Nested Loop (cost=0.86..7934263.10 rows=8254 width=24) (actual time=49.728..18827.887 rows=61 loops=1)"
" -> Index Scan using route_meta_i_name on route_meta (cost=0.43..289911.22 rows=2902910 width=24) (actual time=0.016..2825.932 rows=1411126 loops=1)"
" -> Index Scan using route_pkey on route (cost=0.43..2.62 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1411126)"
" Index Cond: (id = route_meta.id)"
" Filter: (owner_id = 128905)"
" Rows Removed by Filter: 1"
"Total runtime: 18828.214 ms"
If I increase the limit to 100, a better query plan is used. It takes now less then 100ms.
EXPLAIN ANALYZE SELECT
route.id, route_meta.name
FROM
route
INNER JOIN
route_meta
USING (id)
WHERE
route.owner_id = 128905
ORDER BY
route_meta.name
LIMIT
100
Query plan:
"Limit (cost=79964.98..79965.23 rows=100 width=24) (actual time=93.037..93.294 rows=100 loops=1)"
" -> Sort (cost=79964.98..79985.61 rows=8254 width=24) (actual time=93.033..93.120 rows=100 loops=1)"
" Sort Key: route_meta.name"
" Sort Method: top-N heapsort Memory: 31kB"
" -> Nested Loop (cost=0.86..79649.52 rows=8254 width=24) (actual time=0.039..77.955 rows=7948 loops=1)"
" -> Index Scan using route_i_owner_id on route (cost=0.43..22765.84 rows=8408 width=4) (actual time=0.023..13.839 rows=7948 loops=1)"
" Index Cond: (owner_id = 128905)"
" -> Index Scan using route_meta_pkey on route_meta (cost=0.43..6.76 rows=1 width=24) (actual time=0.003..0.004 rows=1 loops=7948)"
" Index Cond: (id = route.id)"
"Total runtime: 93.444 ms"
I already tried following things:
increasing statistics for owner_id (The owner_id in the example is included in the pg_stats)
ALTER TABLE route ALTER COLUMN owner_id SET STATISTICS 1000;
reindex owner_id and name
vacuum analyse
increased work_mem from 1MB to 16MB
when I rewrite the query to row_number() OVER (ORDER BY xxx) AS rn
... WHERE rn <= yyy in a subquery, the specific case is solved. However it
introduces performance problems with other ownerids.
A similar problem was solved with a combined index, but that seems impossible here because of the different tables.
Postgres uses wrong index in query plan

Related

Postgres is using UNIQUE index instead of FTS index

I have a table with > 10.000.000 rows. The table has a column OfficialEnterprise_vatNumber that should be unique and can be part of a full text search.
Here are the indexes:
"uq_officialenterprise_vatnumber" "CREATE UNIQUE INDEX uq_officialenterprise_vatnumber ON commonservices.""OfficialEnterprise"" USING btree (""OfficialEnterprise_vatNumber"")"
"ix_officialenterprise_vatnumber" "CREATE INDEX ix_officialenterprise_vatnumber ON commonservices.""OfficialEnterprise"" USING gin (to_tsvector('commonservices.unaccent_dictionary'::regconfig, (""OfficialEnterprise_vatNumber"")::text))"
But if I EXPLAIN a query that should be using the FTS index like this
SELECT * FROM commonservices."OfficialEnterprise"
WHERE
to_tsvector('commonservices.unaccent_dictionary', "OfficialEnterprise_vatNumber") ## to_tsquery('FR:* | IE:*')
ORDER BY "OfficialEnterprise_vatNumber" ASC
LIMIT 100
It shows that the used index is uq_officialenterprise_vatnumber and not ix_officialenterprise_vatnumber.
Is their something I'm missing ?
EDIT:
Here is the EXPLAIN ANALYZE statement of the original query.
"Limit (cost=0.43..1460.27 rows=100 width=238) (actual time=6996.976..6997.057 rows=15 loops=1)"
" -> Index Scan using uq_officialenterprise_vatnumber on ""OfficialEnterprise"" (cost=0.43..1067861.32 rows=73149 width=238) (actual time=6996.975..6997.054 rows=15 loops=1)"
" Filter: (to_tsvector('commonservices.unaccent_dictionary'::regconfig, (""OfficialEnterprise_vatNumber"")::text) ## to_tsquery('FR:* | IE:*'::text))"
" Rows Removed by Filter: 1847197"
"Planning Time: 0.185 ms"
"Execution Time: 6997.081 ms"
Here is the EXPLAIN ANALYZE of the query if I add || '0' to the order by.
"Limit (cost=55558.82..55570.49 rows=100 width=270) (actual time=7.069..9.827 rows=15 loops=1)"
" -> Gather Merge (cost=55558.82..62671.09 rows=60958 width=270) (actual time=7.068..9.823 rows=15 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=54558.80..54635.00 rows=30479 width=270) (actual time=0.235..0.238 rows=5 loops=3)"
" Sort Key: (((""OfficialEnterprise_vatNumber"")::text || '0'::text))"
" Sort Method: quicksort Memory: 28kB"
" Worker 0: Sort Method: quicksort Memory: 25kB"
" Worker 1: Sort Method: quicksort Memory: 25kB"
" -> Parallel Bitmap Heap Scan on ""OfficialEnterprise"" (cost=719.16..53393.91 rows=30479 width=270) (actual time=0.157..0.166 rows=5 loops=3)"
" Recheck Cond: (to_tsvector('commonservices.unaccent_dictionary'::regconfig, (""OfficialEnterprise_vatNumber"")::text) ## to_tsquery('FR:* | IE:*'::text))"
" Heap Blocks: exact=6"
" -> Bitmap Index Scan on ix_officialenterprise_vatnumber (cost=0.00..700.87 rows=73149 width=0) (actual time=0.356..0.358 rows=15 loops=1)"
" Index Cond: (to_tsvector('commonservices.unaccent_dictionary'::regconfig, (""OfficialEnterprise_vatNumber"")::text) ## to_tsquery('FR:* | IE:*'::text))"
"Planning Time: 0.108 ms"
"Execution Time: 9.886 ms"
rows=73149...(actual...rows=15)
So it vastly misestimates the number of rows it will find. If it actually found 73149 rows, using the ordering index probably would be faster.
Have you ANALYZED your table since creating the functional index on it? Doing that might fix the estimation problem. Or at least improve it enough to fix the planning decision.
Yes, doing dummy operations like +0 or ||'' is hackish. They work by forcing the planner to think it can't use the index to fulfill the ORDER BY.
Doing full text search on a VAT number seems rather misguided in the first place. Maybe this would be best addressed with a LIKE query, or by creating an explicit column to hold the country of origin flag.

PostgreSQL slow order

I have table (over 100 millions records) on PostgreSQL 13.1
CREATE TABLE report
(
id serial primary key,
license_plate_id integer,
datetime timestamp
);
Indexes (for test I create both of them):
create index report_lp_datetime_index on report (license_plate_id, datetime);
create index report_lp_datetime_desc_index on report (license_plate_id desc, datetime desc);
So, my question is why query like
select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75)
order by datetime desc
limit 100
Is very slow (~10sec). But query without order statement is fast (milliseconds).
Explain:
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34, 75,374,57123)
limit 100
Limit (cost=0.57..400.38 rows=100 width=316) (actual time=0.037..0.216 rows=100 loops=1)
Buffers: shared hit=103
-> Index Scan using report_lp_id_idx on report r (cost=0.57..44986.97 rows=11252 width=316) (actual time=0.035..0.202 rows=100 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=103
Planning Time: 0.228 ms
Execution Time: 0.251 ms
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75,374,57123)
order by datetime desc
limit 100
Limit (cost=44193.63..44193.88 rows=100 width=316) (actual time=4921.030..4921.047 rows=100 loops=1)
Buffers: shared hit=11455 read=671
-> Sort (cost=44193.63..44221.76 rows=11252 width=316) (actual time=4921.028..4921.035 rows=100 loops=1)
Sort Key: datetime DESC
Sort Method: top-N heapsort Memory: 128kB
Buffers: shared hit=11455 read=671
-> Bitmap Heap Scan on report r (cost=151.18..43763.59 rows=11252 width=316) (actual time=54.422..4911.927 rows=12148 loops=1)
Recheck Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Heap Blocks: exact=12063
Buffers: shared hit=11455 read=671
-> Bitmap Index Scan on report_lp_id_idx (cost=0.00..148.37 rows=11252 width=0) (actual time=52.631..52.632 rows=12148 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=59 read=4
Planning Time: 0.427 ms
Execution Time: 4921.128 ms
You seem to have rather slow storage, if reading 671 8kB-blocks from disk takes a couple of seconds.
The way to speed this up is to reorder the table in the same way as the index, so that you can find the required rows in the same or adjacent table blocks:
CLUSTER report_lp_id_idx USING report_lp_id_idx;
Be warned that rewriting the table in this way causes downtime – the table will not be available while it is being rewritten. Moreover, PostgreSQL does not maintain the table order, so subsequent data modifications will cause performance to gradually deteriorate, so that after a while you will have to run CLUSTER again.
But if you need this query to be fast no matter what, CLUSTER is the way to go.
Your two indices do exactly the same thing, so you can remove the second one, it's useless.
To optimize your query, the order of the fields inside the index must be reversed:
create index report_lp_datetime_index on report (datetime,license_plate_id);
BEGIN;
CREATE TABLE foo (d INTEGER, i INTEGER);
INSERT INTO foo SELECT random()*100000, random()*1000 FROM generate_series(1,1000000) s;
CREATE INDEX foo_d_i ON foo(d DESC,i);
COMMIT;
VACUUM ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100;
Limit (cost=0.42..343.92 rows=100 width=8) (actual time=0.076..9.359 rows=100 loops=1)
-> Index Only Scan Backward using foo_d_i on foo (cost=0.42..40976.43 rows=11929 width=8) (actual time=0.075..9.339 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9016
Heap Fetches: 0
Planning Time: 0.339 ms
Execution Time: 9.387 ms
Note the index is not used to optimize the WHERE clause. It is used here as a compact and fast way to store references to the rows ordered by date DESC, so the ORDER BY can do an index-only scan and avoid sorting. By adding column id to the index, an index-only scan can be performed to test the condition on id, without hitting the table for every row. Since there is a low LIMIT value it does not need to scan the whole index, it only scans it in date DESC order until it finds enough rows satisfying the WHERE condition to return the result.
It will be faster if you create the index in date DESC order, this could be useful if you use ORDER BY date DESC + LIMIT in other queries too.
You forget that OP's table has a third column, and he is using SELECT *. So that wouldn't be an index-only scan.
Easy to work around. The optimum way to do this query would be an index-only scan to filter on WHERE conditions, then LIMIT, then hit the table to get the rows. For some reason if "select *" is used postgres takes the id column from the table instead of taking it from the index, which results in lots of unnecessary heap fetches for rows whose id is rejected by the WHERE condition.
Easy to work around, by doing it manually. I've also added another bogus column to make sure the SELECT * hits the table.
EXPLAIN (ANALYZE,buffers) SELECT * FROM foo
JOIN (SELECT d,i FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (d,i)
ORDER BY d DESC LIMIT 100;
Limit (cost=0.85..1281.94 rows=1 width=17) (actual time=0.052..3.618 rows=100 loops=1)
Buffers: shared hit=453
-> Nested Loop (cost=0.85..1281.94 rows=1 width=17) (actual time=0.050..3.594 rows=100 loops=1)
Buffers: shared hit=453
-> Limit (cost=0.42..435.44 rows=100 width=8) (actual time=0.037..2.953 rows=100 loops=1)
Buffers: shared hit=53
-> Index Only Scan using foo_d_i on foo foo_1 (cost=0.42..51936.43 rows=11939 width=8) (actual time=0.037..2.935 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=53
-> Index Scan using foo_d_i on foo (cost=0.42..8.45 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=100)
Index Cond: ((d = foo_1.d) AND (i = foo_1.i))
Buffers: shared hit=400
Execution Time: 3.663 ms
Another option is to just add the primary key to the date,license_plate index.
SELECT * FROM foo JOIN (SELECT id FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (id) ORDER BY d DESC LIMIT 100;
Limit (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.920..3.947 rows=100 loops=1)
Buffers: shared hit=473
-> Sort (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.919..3.931 rows=100 loops=1)
Sort Key: foo.d DESC
Sort Method: quicksort Memory: 32kB
Buffers: shared hit=473
-> Nested Loop (cost=0.85..1354.66 rows=100 width=17) (actual time=0.055..3.858 rows=100 loops=1)
Buffers: shared hit=473
-> Limit (cost=0.42..509.41 rows=100 width=8) (actual time=0.039..3.116 rows=100 loops=1)
Buffers: shared hit=73
-> Index Only Scan using foo_d_i_id on foo foo_1 (cost=0.42..60768.43 rows=11939 width=8) (actual time=0.039..3.093 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=73
-> Index Scan using foo_pkey on foo (cost=0.42..8.44 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=100)
Index Cond: (id = foo_1.id)
Buffers: shared hit=400
Execution Time: 3.972 ms
Edit
After thinking about it... since the LIMIT restricts the output to 100 rows ordered by date desc, wouldn't it be nice if we could get the 100 most recent rows for each license_plate_id, put all that into a top-n sort, and only keep the best 100 for all license_plate_ids? That would avoid reading and throwing away a lot of rows from the index. Even if that's much faster than hitting the table, it will still load up these index pages in RAM and clog up your buffers with stuff you don't actually need to keep in cache. Let's use LATERAL JOIN:
EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM foo
JOIN (SELECT d,i FROM
(VALUES (1),(2),(4),(5),(6),(7),(8),(10),(15),(22),(34),(75)) idlist
CROSS JOIN LATERAL
(SELECT d,i FROM foo WHERE i=idlist.column1 ORDER BY d DESC LIMIT 100) f2
ORDER BY d DESC LIMIT 100
) f3 USING (d,i)
ORDER BY d DESC LIMIT 100;
It's even faster: 2ms, and it uses the index on (license_plate_id,date) instead of the other way around. Also, and this is important, since each subquery in the lateral hits only the index pages that contain rows that will actually be selected, while the previous queries hit much more index pages. So you save on RAM buffers.
If you don't need the index on (date,license_plate_id) and don't want to keep a useless index, that could be interesting since this query doesn't use it. On the other hand, if you need the index on (date,license_plate_id) for something else and want to keep it, then... maybe not.
Please post results for the winning query 🔥

Postgres query optimizer generates bad plan after adding another order by criterion

I'm using django orm with select related and it generates the query of the form:
SELECT *
FROM "coupons_coupon"
LEFT OUTER JOIN "coupons_merchant"
ON ("coupons_coupon"."merchant_id" = "coupons_merchant"."slug")
WHERE ("coupons_coupon"."end_date" > '2020-07-10T09:10:28.101980+00:00'::timestamptz AND "coupons_coupon"."published" = true)
ORDER BY "coupons_coupon"."end_date" ASC, "coupons_coupon"."id"
LIMIT 5;
Which is then executed using the following plan:
Limit (cost=4363.28..4363.30 rows=5 width=604) (actual time=21.864..21.865 rows=5 loops=1)
-> Sort (cost=4363.28..4373.34 rows=4022 width=604) (actual time=21.863..21.863 rows=5 loops=1)
Sort Key: coupons_coupon.end_date, coupons_coupon.id"
Sort Method: top-N heapsort Memory: 32kB
-> Hash Left Join (cost=2613.51..4296.48 rows=4022 width=604) (actual time=13.918..20.209 rows=4022 loops=1)
Hash Cond: ((coupons_coupon.merchant_id)::text = (coupons_merchant.slug)::text)
-> Seq Scan on coupons_coupon (cost=0.00..291.41 rows=4022 width=261) (actual time=0.007..1.110 rows=4022 loops=1)
Filter: (published AND (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone))
Rows Removed by Filter: 1691
-> Hash (cost=1204.56..1204.56 rows=24956 width=331) (actual time=13.894..13.894 rows=23911 loops=1)
Buckets: 16384 Batches: 4 Memory Usage: 1948kB
-> Seq Scan on coupons_merchant (cost=0.00..1204.56 rows=24956 width=331) (actual time=0.003..4.681 rows=23911 loops=1)
Which is a bad execution plan as join can be done after the left table has been filtered, ordered and limited. When I remove the id from order by it generates an efficient plan, which basically could have been used in the previous query as well.
Limit (cost=0.57..8.84 rows=5 width=600) (actual time=0.013..0.029 rows=5 loops=1)
-> Nested Loop Left Join (cost=0.57..6650.48 rows=4022 width=600) (actual time=0.012..0.028 rows=5 loops=1)
-> Index Scan using coupons_cou_end_dat_a8d5b7_btree on coupons_coupon (cost=0.28..1015.77 rows=4022 width=261) (actual time=0.007..0.010 rows=5 loops=1)
Index Cond: (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone)
Filter: published
-> Index Scan using coupons_merchant_pkey on coupons_merchant (cost=0.29..1.40 rows=1 width=331) (actual time=0.003..0.003 rows=1 loops=5)
Index Cond: ((slug)::text = (coupons_coupon.merchant_id)::text)
Why is this happening? Can the optimizer be nudged to use similar plan for the former query?
I'm using postgres 12.
v13 of PostgreSQL, which should be released in the next few months, implements incremental sorting, in which it can read rows in an pre-sorted order based on prefix columns, then sorts just the ties on those prefix column(s) by the remaining column(s) in order to get a complete sort based on more columns than an index provides. I think that will do more or less what you want.
Limit (cost=2.46..2.99 rows=5 width=21)
-> Incremental Sort (cost=2.46..405.58 rows=3850 width=21)
Sort Key: coupons_coupon.end_date, coupons_coupon.id
Presorted Key: coupons_coupon.end_date
-> Nested Loop Left Join (cost=0.31..253.48 rows=3850 width=21)
-> Index Scan using coupons_coupon_end_date_idx on coupons_coupon (cost=0.15..54.71 rows=302 width=17)
Index Cond: (end_date > '2020-07-10 05:10:28.10198-04'::timestamp with time zone)
Filter: published
-> Index Only Scan using coupons_merchant_slug_idx on coupons_merchant (cost=0.15..0.53 rows=13 width=4)
Index Cond: (slug = coupons_coupon.merchant_id)
Of course just adding "id" into the current index will work under currently released versions, and even under version 13 is should be more efficient to have the index fully order the rows in the way you need them.

What is the difference between Index-Only and Bitmap Index Scan in PostgreSQL?

In my query, I just want to call data with exact where conditions. These where conditions were created in index. Bu the explain shows bit index-scan. I couldn't understand why.
My query looks like below:
Select
r.spend,
r.date,
...
from metadata m
inner join
report r
on m.org_id = r.org_id and m.country_or_region = r.country_or_region and m.campaign_id = r.campaign_id and m.keyword_id = r.keyword_id
where r.org_id = 1 and m.keyword_type = 'KEYWORD'
offset 0 limit 20
Indexes:
Metadata(org_id, keyword_type, country_or_region, campaign_id, keyword_id);
Report(org_id, country_or_region, campaign_id, keyword_id, date);
Explain Analyze:
"Limit (cost=811883.21..910327.87 rows=20 width=8) (actual time=18120.268..18235.831 rows=20 loops=1)"
" -> Gather (cost=811883.21..2702020.67 rows=384 width=8) (actual time=18120.267..18235.791 rows=20 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Parallel Hash Join (cost=810883.21..2700982.27 rows=160 width=8) (actual time=18103.440..18103.496 rows=14 loops=3)"
" Hash Cond: (((r.country_or_region)::text = (m.country_or_region)::text) AND (r.campaign_id = m.campaign_id) AND (r.keyword_id = m.keyword_id))"
" -> Parallel Bitmap Heap Scan on report r (cost=260773.11..2051875.83 rows=3939599 width=35) (actual time=552.601..8532.962 rows=3162553 loops=3)"
" Recheck Cond: (org_id = 479360)"
" Rows Removed by Index Recheck: 21"
" Heap Blocks: exact=20484 lossy=84350"
" -> Bitmap Index Scan on idx_kr_org_date_camp (cost=0.00..258409.35 rows=9455038 width=0) (actual time=539.329..539.329 rows=9487660 loops=1)"
" Index Cond: (org_id = 479360)"
" -> Parallel Hash (cost=527278.08..527278.08 rows=938173 width=26) (actual time=7425.062..7425.062 rows=727133 loops=3)"
" Buckets: 65536 Batches: 64 Memory Usage: 2656kB"
" -> Parallel Bitmap Heap Scan on metadata m (cost=88007.61..527278.08 rows=938173 width=26) (actual time=1007.028..7119.233 rows=727133 loops=3)"
" Recheck Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
" Rows Removed by Index Recheck: 3"
" Heap Blocks: exact=14585 lossy=11054"
" -> Bitmap Index Scan on idx_primary (cost=0.00..87444.71 rows=2251615 width=0) (actual time=1014.631..1014.631 rows=2181399 loops=1)"
" Index Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
"Planning Time: 0.492 ms"
"Execution Time: 18235.879 ms"
In here, I just want to call 20 items. It should be more effective?
The Bitmap Index Scan happens when the result set will have a high selectivity rate with respect to the search conditions (i.e., there is a high percentage of rows that satisfy the search criteria). In this case, the planner will plan to scan the entire index, forming a bitmap of which pages on disk to pull the data out from (which happens during the Bitmap Heap Scan step). This is better than a Sequential Scan because it only scans the relevant pages on disk, skipping the pages that it knows relevant data does not exist. Depending on the statistics available to the optimizer, it may not be advantageous to do an Index Scan or an Index-Only Scan, but it is still better than a Sequential Scan.
To complete the answer to the question, an Index-Only Scan is a scan of the index that will pull the relevant data without having to visit the actual table. This is because the relevant data is already in the index. Take, for example, this table:
postgres=# create table foo (id int primary key, name text);
CREATE TABLE
postgres=# insert into foo values (generate_series(1,1000000),'foo');
INSERT 0 1000000
There is an index on the id column of this table, and suppose we call the following query:
postgres=# EXPLAIN ANALYZE SELECT * FROM foo WHERE id < 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using foo_pkey on foo (cost=0.42..10.25 rows=104 width=8) (actual time=0.012..1.027 rows=99 loops=1)
Index Cond: (id < 100)
Planning Time: 0.190 ms
Execution Time: 2.067 ms
(4 rows)
This query results in an Index scan because it scans the index for the rows that have id < 100, and then visits the actual table on disk to pull the other columns included in the * portion of the SELECT query.
However, suppose we call the following query (notice SELECT id instead of SELECT *):
postgres=# EXPLAIN ANALYZE SELECT id FROM foo WHERE id < 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using foo_pkey on foo (cost=0.42..10.25 rows=104 width=4) (actual time=0.019..0.996 rows=99 loops=1)
Index Cond: (id < 100)
Heap Fetches: 99
Planning Time: 0.098 ms
Execution Time: 1.980 ms
(5 rows)
This results in an Index-only scan because only the id column is requested, and that is included (naturally) in the index, so there's no need to visit the actual table on disk to retrieve anything else. This saves time, but its occurrence is very limited.
To answer your question about limiting to 20 results, the limiting occurs after the Bitmap Index Scan has occurred, so the runtime will still be the same whether you limit to 20, 40, or some other value. In the case of an Index/Index-Only Scan, the executor will stop scanning after it has acquired enough rows as specified by the LIMIT clause. In your case, with the Bitmap Heap Scan, this isn’t possible

Postgres using unperformant plan when rerun

I'm importing a non circular graph and flattening the ancestors to an array per code. This works fine (for a bit): ~45s for 400k codes over ~900k edges.
However, after the first successful execution Postgres decides to stop using the Nested Loop and the update query performance drops drastically: ~2s per code.
I can force the issue by putting a vacuum right before the update but I am curious why the unoptimization is happening.
DROP TABLE IF EXISTS tmp_anc;
DROP TABLE IF EXISTS tmp_rel;
DROP TABLE IF EXISTS tmp_edges;
DROP TABLE IF EXISTS tmp_codes;
CREATE TABLE tmp_rel (
from_id BIGINT,
to_id BIGINT,
);
COPY tmp_rel FROM 'rel.txt' WITH DELIMITER E'\t' CSV HEADER;
CREATE TABLE tmp_edges(
start_node BIGINT,
end_node BIGINT
);
INSERT INTO tmp_edges(start_node, end_node)
SELECT from_id AS start_node, to_id AS end_node
FROM tmp_rel;
CREATE INDEX tmp_edges_end ON tmp_edges (end_node);
CREATE TABLE tmp_codes (
id BIGINT,
active SMALLINT,
);
COPY tmp_codes FROM 'codes.txt' WITH DELIMITER E'\t' CSV HEADER;
CREATE TABLE tmp_anc(
code BIGINT,
ancestors BIGINT[]
);
INSERT INTO tmp_anc
SELECT DISTINCT(id)
FROM tmp_codes
WHERE active = 1;
CREATE INDEX tmp_anc_codes ON tmp_anc_codes (code);
VACUUM; -- Need this for the update to execute optimally
UPDATE tmp_anc sa SET ancestors = (
WITH RECURSIVE ancestors(code) AS (
SELECT start_node FROM tmp_edges WHERE end_node = sa.code
UNION
SELECT se.start_node
FROM tmp_edges se, ancestors a
WHERE se.end_node = a.code
)
SELECT array_agg(code) FROM ancestors
);
Table stats:
tmp_rel 507 MB 0 bytes
tmp_edges 74 MB 37 MB
tmp_codes 32 MB 0 bytes
tmp_anc 22 MB 8544 kB
Explains:
Without VACUUM before UPDATE:
Update on tmp_anc sa (cost=10000000000.00..11081583053.74 rows=10 width=46) (actual time=38294.005..38294.005 rows=0 loops=1)
-> Seq Scan on tmp_anc sa (cost=10000000000.00..11081583053.74 rows=10 width=46) (actual time=3300.974..38292.613 rows=10 loops=1)
SubPlan 2
-> Aggregate (cost=108158305.25..108158305.26 rows=1 width=32) (actual time=3829.253..3829.253 rows=1 loops=10)
CTE ancestors
-> Recursive Union (cost=81.97..66015893.05 rows=1872996098 width=8) (actual time=0.037..3827.917 rows=45 loops=10)
-> Bitmap Heap Scan on tmp_edges (cost=81.97..4913.18 rows=4328 width=8) (actual time=0.022..0.022 rows=2 loops=10)
Recheck Cond: (end_node = sa.code)
Heap Blocks: exact=12
-> Bitmap Index Scan on tmp_edges_end (cost=0.00..80.89 rows=4328 width=0) (actual time=0.014..0.014 rows=2 loops=10)
Index Cond: (end_node = sa.code)
-> Merge Join (cost=4198.89..2855105.79 rows=187299177 width=8) (actual time=163.746..425.295 rows=10 loops=90)
Merge Cond: (a.code = se.end_node)
-> Sort (cost=4198.47..4306.67 rows=43280 width=8) (actual time=0.012..0.016 rows=5 loops=90)
Sort Key: a.code
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on ancestors a (cost=0.00..865.60 rows=43280 width=8) (actual time=0.000..0.001 rows=5 loops=90)
-> Materialize (cost=0.42..43367.08 rows=865523 width=16) (actual time=0.010..337.592 rows=537171 loops=90)
-> Index Scan using tmp_edges_end on edges se (cost=0.42..41203.27 rows=865523 width=16) (actual time=0.009..247.547 rows=537171 loops=90)
-> CTE Scan on ancestors (cost=0.00..37459921.96 rows=1872996098 width=8) (actual time=1.227..3829.159 rows=45 loops=10)
With VACUUM before UPDATE:
Update on tmp_anc sa (cost=0.00..2949980136.43 rows=387059 width=14) (actual time=74701.329..74701.329 rows=0 loops=1)
-> Seq Scan on tmp_anc sa (cost=0.00..2949980136.43 rows=387059 width=14) (actual time=0.336..70324.848 rows=387059 loops=1)
SubPlan 2
-> Aggregate (cost=7621.50..7621.51 rows=1 width=8) (actual time=0.180..0.180 rows=1 loops=387059)
CTE ancestors
-> Recursive Union (cost=0.42..7583.83 rows=1674 width=8) (actual time=0.005..0.162 rows=32 loops=387059)
-> Index Scan using tmp_edges_end on tmp_edges (cost=0.42..18.93 rows=4 width=8) (actual time=0.004..0.005 rows=2 loops=387059)
Index Cond: (end_node = sa.code)
-> Nested Loop (cost=0.42..753.14 rows=167 width=8) (actual time=0.003..0.019 rows=10 loops=2700448)
-> WorkTable Scan on ancestors a (cost=0.00..0.80 rows=40 width=8) (actual time=0.000..0.001 rows=5 loops=2700448)
-> Index Scan using tmp_edges_end on tmp_edges se (cost=0.42..18.77 rows=4 width=16) (actual time=0.003..0.003 rows=2 loops=12559395)
Index Cond: (end_node = a.code)
-> CTE Scan on ancestors (cost=0.00..33.48 rows=1674 width=8) (actual time=0.007..0.173 rows=32 loops=387059)
The first execution plan has really bad estimates (Bitmap Index Scan on tmp_edges_end estimates 4328 instead of 2 rows), while the second execution has good estimates and thus chooses a good plan.
So something between the two executions you quote above must have changed the estimates.
Moreover, you say that the first execution of the UPDATE (for which we have no EXPLAIN (ANALYZE) output) was fast.
The only good explanation for the initial performance drop is that it takes the autovacuum daemon some time to collect statistics for the new tables. This normally improves query performance, but of course it can also work the other way around.
Also, a VACUUM usually doesn't fix performance issues. Could it be that you used VACUUM (ANALYZE)?
It would be interesting to know how things are when you collect statistics before your initial UPDATE:
ANALYZE tmp_edges;
When I read your query more closely, however, I wonder why you use a correlated subquery for that. Maybe it would be faster to do something like:
UPDATE tmp_anc sa
SET ancestors = a.codes
FROM (WITH RECURSIVE ancestors(code, start_node) AS
(SELECT tmp_anc.code, tmp_edges.start_node
FROM tmp_edges
JOIN tmp_anc ON tmp_edges.end_node = tmp_anc.code
UNION
SELECT a.code, se.start_node
FROM tmp_edges se
JOIN ancestors a ON se.end_node = a.code
)
SELECT code,
array_agg(start_node) AS codes
FROM ancestors
GROUP BY (code)
) a
WHERE sa.code = a.code;
(This is untested, so there may be mistakes.)