PostgreSQL multi-column GROUP BY not using index when selecting minimum
When selecting MIN on a column in PostgreSQL (11, 12, 13) after a GROUP BY operation on multiple columns, any index created on the grouped columns is not used: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=30e0f341940f4c1fa6013677643a0baf
CREATE TABLE tags (id serial, series int, index int, page int);
CREATE INDEX ON tags (page, series, index);
INSERT INTO tags (series, index, page)
SELECT
ceil(random() * 10),
ceil(random() * 100),
ceil(random() * 1000)
FROM generate_series(1, 100000);
EXPLAIN ANALYZE
SELECT tags.page, tags.series, MIN(tags.index)
FROM tags GROUP BY tags.page, tags.series;
HashAggregate (cost=2291.00..2391.00 rows=10000 width=12) (actual time=108.968..133.153 rows=9999 loops=1)
Group Key: page, series
Batches: 1 Memory Usage: 1425kB
-> Seq Scan on tags (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.015..55.240 rows=100000 loops=1)
Planning Time: 0.257 ms
Execution Time: 133.771 ms
Theoretically, the index should allow the database to seek in steps of (tags.page, tags.series) instead of performing a full scan, which would mean roughly 10,000 processed rows for the above dataset instead of 100,000. This link describes the method for the case with no grouped columns.
This answer (as well as this one) suggests using DISTINCT ON with an ordering instead of GROUP BY, but that produces the query plan shown below.
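The linked answers are not quoted here, but the DISTINCT ON rewrite presumably looks something like this (a reconstruction, so the exact query may differ):

SELECT DISTINCT ON (page, series) page, series, index
FROM tags
ORDER BY page, series, index;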
Unique (cost=0.42..5680.42 rows=10000 width=12) (actual time=0.066..268.038 rows=9999 loops=1)
-> Index Only Scan using tags_page_series_index_idx on tags (cost=0.42..5180.42 rows=100000 width=12) (actual time=0.064..227.219 rows=100000 loops=1)
Heap Fetches: 100000
Planning Time: 0.426 ms
Execution Time: 268.712 ms
While the index is now being used, it still appears to be scanning the full set of rows. When using SET enable_seqscan=OFF, the GROUP BY query degrades to the same behaviour.
How can I encourage PostgreSQL to use the multi-column index?
If you can pull the set of distinct (page, series) pairs from another table, then you can hack it with a lateral join:
CREATE TABLE pageseries AS SELECT DISTINCT page,series FROM tags ORDER BY page,series;
EXPLAIN ANALYZE
SELECT p.*, minindex
FROM pageseries p
CROSS JOIN LATERAL (
  SELECT index AS minindex
  FROM tags t
  WHERE t.page = p.page AND t.series = p.series
  ORDER BY page, series, index
  LIMIT 1
) x;
Nested Loop (cost=0.42..8720.00 rows=10000 width=12) (actual time=0.039..56.013 rows=10000 loops=1)
-> Seq Scan on pageseries p (cost=0.00..145.00 rows=10000 width=8) (actual time=0.012..1.872 rows=10000 loops=1)
-> Limit (cost=0.42..0.84 rows=1 width=12) (actual time=0.005..0.005 rows=1 loops=10000)
-> Index Only Scan using tags_page_series_index_idx on tags t (cost=0.42..4.62 rows=10 width=12) (actual time=0.004..0.004 rows=1 loops=10000)
Index Cond: ((page = p.page) AND (series = p.series))
Heap Fetches: 0
Planning Time: 0.168 ms
Execution Time: 57.077 ms
...but it is not necessarily faster:
EXPLAIN ANALYZE SELECT tags.page, tags.series, MIN(tags.index)
FROM tags GROUP BY tags.page, tags.series;
HashAggregate (cost=2291.00..2391.00 rows=10000 width=12) (actual time=56.177..58.923 rows=10000 loops=1)
Group Key: page, series
Batches: 1 Memory Usage: 1425kB
-> Seq Scan on tags (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.010..12.845 rows=100000 loops=1)
Planning Time: 0.129 ms
Execution Time: 59.644 ms
It would be massively faster IF the number of iterations in the nested loop were small, in other words if there were only a few distinct (page, series) pairs. I'll try with series alone, since that has only 10 distinct values:
CREATE TABLE series AS SELECT DISTINCT series FROM tags;
EXPLAIN ANALYZE
SELECT p.*, minindex
FROM series p
CROSS JOIN LATERAL (
  SELECT index AS minindex
  FROM tags t
  WHERE t.series = p.series
  ORDER BY series, index
  LIMIT 1
) x;
Nested Loop (cost=0.29..886.18 rows=2550 width=8) (actual time=0.081..0.264 rows=10 loops=1)
-> Seq Scan on series p (cost=0.00..35.50 rows=2550 width=4) (actual time=0.007..0.010 rows=10 loops=1)
-> Limit (cost=0.29..0.31 rows=1 width=8) (actual time=0.024..0.024 rows=1 loops=10)
-> Index Only Scan using tags_series_index_idx on tags t (cost=0.29..211.29 rows=10000 width=8) (actual time=0.023..0.023 rows=1 loops=10)
Index Cond: (series = p.series)
Heap Fetches: 0
Planning Time: 0.198 ms
Execution Time: 0.292 ms
In this case, definitely worth it, because the query hits only 10/100000 rows. The other queries hit 10000/100000 rows, or 10% of the table, which is above the threshold where an index would really help.
Note that putting the column with the lower cardinality first results in a smaller index:
CREATE INDEX ON tags (series, page, index);
select pg_relation_size( 'tags_page_series_index_idx' );
4284416
select pg_relation_size( 'tags_series_page_index_idx' );
3104768
...but it doesn't make the query any faster.
If this type of query is really critical, perhaps try ClickHouse or DolphinDB.
To support that kind of query, PostgreSQL would need something like an index skip scan, and a skip scan is only efficient when there are few groups.
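For completeness, the usual way to emulate a skip scan is a recursive CTE that repeatedly jumps to the next (page, series) group through the existing (page, series, index) index. The following is only a sketch of that technique (it is not taken from the answers above), and it only pays off when the number of groups is small:

WITH RECURSIVE skip AS (
    (SELECT page, series, index
     FROM tags
     ORDER BY page, series, index
     LIMIT 1)                                      -- minimum of the first (page, series) group
    UNION ALL
    SELECT t.page, t.series, t.index
    FROM skip s
    CROSS JOIN LATERAL (
        SELECT page, series, index
        FROM tags
        WHERE (page, series) > (s.page, s.series)  -- jump past the current group
        ORDER BY page, series, index
        LIMIT 1                                    -- first row of the next group = its minimum
    ) t
)
SELECT page, series, index AS min_index FROM skip;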
If the speed of that query is essential, you could consider using a materialized view.
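A minimal sketch of that approach (the view name here is made up; the unique index is needed for REFRESH ... CONCURRENTLY):

CREATE MATERIALIZED VIEW tags_min_index AS
SELECT page, series, MIN(index) AS min_index
FROM tags
GROUP BY page, series;

CREATE UNIQUE INDEX ON tags_min_index (page, series);

-- re-run whenever tags has changed enough to matter
REFRESH MATERIALIZED VIEW CONCURRENTLY tags_min_index;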
Related
Postgres not using index when ORDER BY and LIMIT when LIMIT above X
I have been trying to debug an issue with Postgres where it decides not to use an index when LIMIT is above a specific value. For example, I have a table of 150k rows, and when searching with a LIMIT of 286 it uses the index, while with a LIMIT above 286 it does not.

LIMIT 286 uses the index:

db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 286;

Limit (cost=0.56..5024.12 rows=286 width=810) (actual time=0.030..0.992 rows=286 loops=1)
  Buffers: shared hit=921
  -> Nested Loop (cost=0.56..16968.23 rows=966 width=810) (actual time=0.030..0.977 rows=286 loops=1)
     Join Filter: (r.id_tempy = z.id)
     Rows Removed by Join Filter: 624
     Buffers: shared hit=921
     -> Index Scan using tempz_tempx_name_type_id_idx on tempx r (cost=0.42..14357.69 rows=173878 width=373) (actual time=0.016..0.742 rows=910 loops=1)
        Buffers: shared hit=919
     -> Materialize (cost=0.14..2.37 rows=1 width=409) (actual time=0.000..0.000 rows=1 loops=910)
        Buffers: shared hit=2
        -> Index Scan using tempy_string_col_idx on tempy z (cost=0.14..2.37 rows=1 width=409) (actual time=0.007..0.008 rows=1 loops=1)
           Index Cond: (string_col = 'temp_string'::text)
           Filter: (int_col = 2000)
           Buffers: shared hit=2
Planning Time: 0.161 ms
Execution Time: 1.032 ms

vs. LIMIT 287 doing a sort:

db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 287;

Limit (cost=4976.86..4977.58 rows=287 width=810) (actual time=49.802..49.828 rows=287 loops=1)
  Buffers: shared hit=37154
  -> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
     Sort Key: r.name, r.type, r.id
     Sort Method: top-N heapsort Memory: 506kB
     Buffers: shared hit=37154
     -> Nested Loop (cost=0.42..4932.59 rows=966 width=810) (actual time=0.020..27.973 rows=51914 loops=1)
        Buffers: shared hit=37154
        -> Seq Scan on tempy z (cost=0.00..12.70 rows=1 width=409) (actual time=0.006..0.008 rows=1 loops=1)
           Filter: ((int_col = 2000) AND (string_col = 'temp_string'::text))
           Rows Removed by Filter: 2
           Buffers: shared hit=1
        -> Index Scan using tempx_id_tempy_idx on tempx r (cost=0.42..4340.30 rows=57959 width=373) (actual time=0.012..17.075 rows=51914 loops=1)
           Index Cond: (id_tempy = z.id)
           Buffers: shared hit=37153
Planning Time: 0.258 ms
Execution Time: 49.907 ms

Update: This is Postgres 11 and VACUUM ANALYZE is run daily. I have also already tried using a CTE to remove the filter, but the problem is specifically the sorting:

-> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
   Sort Key: r.name, r.type, r.id
   Sort Method: top-N heapsort Memory: 506kB
   Buffers: shared hit=37154

Update 2: After running VACUUM ANALYZE, the database starts using the index for some hours and then goes back to not using it.
It turns out that I can force Postgres to avoid doing any sort if I run SET enable_sort TO OFF;. This makes the cost of sorting very high, which causes the Postgres planner to do an index scan instead. I am not really sure why Postgres thinks that the index scan is so costly (cost=0.42..14357.69) and that sorting is cheaper, and ends up choosing the sort. It is also very odd that immediately after a VACUUM ANALYZE it estimates the costs correctly, but after some hours it goes back to sorting. With sort off, the plan is still not optimal, as it materializes and loads things into memory, but it is still faster than sorting.
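If you rely on that workaround, one way to keep it from affecting other queries is to scope it to a single transaction with SET LOCAL (a sketch, reusing the query from the question):

BEGIN;
SET LOCAL enable_sort = off;  -- reverts automatically at COMMIT or ROLLBACK
SELECT *
FROM tempz.tempx AS r
INNER JOIN tempz.tempy AS z ON (r.id_tempy = z.id)
WHERE z.int_col = 2000 AND z.string_col = 'temp_string'
ORDER BY r.name ASC, r.type ASC, r.id ASC
LIMIT 287;
COMMIT;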
PostgreSQL slow order
I have a table (over 100 million records) on PostgreSQL 13.1:

CREATE TABLE report
(
  id serial primary key,
  license_plate_id integer,
  datetime timestamp
);

Indexes (for the test I created both of them):

create index report_lp_datetime_index on report (license_plate_id, datetime);
create index report_lp_datetime_desc_index on report (license_plate_id desc, datetime desc);

So, my question is why a query like

select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75)
order by datetime desc
limit 100

is very slow (~10 sec), while the same query without the ORDER BY is fast (milliseconds). Explain:

explain (analyze, buffers, format text)
select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75,374,57123)
limit 100

Limit (cost=0.57..400.38 rows=100 width=316) (actual time=0.037..0.216 rows=100 loops=1)
  Buffers: shared hit=103
  -> Index Scan using report_lp_id_idx on report r (cost=0.57..44986.97 rows=11252 width=316) (actual time=0.035..0.202 rows=100 loops=1)
     Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
     Buffers: shared hit=103
Planning Time: 0.228 ms
Execution Time: 0.251 ms

explain (analyze, buffers, format text)
select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75,374,57123)
order by datetime desc
limit 100

Limit (cost=44193.63..44193.88 rows=100 width=316) (actual time=4921.030..4921.047 rows=100 loops=1)
  Buffers: shared hit=11455 read=671
  -> Sort (cost=44193.63..44221.76 rows=11252 width=316) (actual time=4921.028..4921.035 rows=100 loops=1)
     Sort Key: datetime DESC
     Sort Method: top-N heapsort Memory: 128kB
     Buffers: shared hit=11455 read=671
     -> Bitmap Heap Scan on report r (cost=151.18..43763.59 rows=11252 width=316) (actual time=54.422..4911.927 rows=12148 loops=1)
        Recheck Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
        Heap Blocks: exact=12063
        Buffers: shared hit=11455 read=671
        -> Bitmap Index Scan on report_lp_id_idx (cost=0.00..148.37 rows=11252 width=0) (actual time=52.631..52.632 rows=12148 loops=1)
           Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
           Buffers: shared hit=59 read=4
Planning Time: 0.427 ms
Execution Time: 4921.128 ms
You seem to have rather slow storage if reading 671 8kB blocks from disk takes a couple of seconds. The way to speed this up is to reorder the table in the same way as the index, so that the required rows are found in the same or adjacent table blocks:

CLUSTER report USING report_lp_id_idx;

Be warned that rewriting the table in this way causes downtime: the table will not be available while it is being rewritten. Moreover, PostgreSQL does not maintain the table order, so subsequent data modifications will cause performance to gradually deteriorate, and after a while you will have to run CLUSTER again. But if you need this query to be fast no matter what, CLUSTER is the way to go.
Your two indices do exactly the same thing, so you can remove the second one; it's useless. To optimize your query, the order of the fields inside the index must be reversed:

create index report_lp_datetime_index on report (datetime, license_plate_id);

BEGIN;
CREATE TABLE foo (d INTEGER, i INTEGER);
INSERT INTO foo SELECT random()*100000, random()*1000 FROM generate_series(1,1000000) s;
CREATE INDEX foo_d_i ON foo(d DESC, i);
COMMIT;
VACUUM ANALYZE foo;

EXPLAIN ANALYZE SELECT * FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100;

Limit (cost=0.42..343.92 rows=100 width=8) (actual time=0.076..9.359 rows=100 loops=1)
  -> Index Only Scan Backward using foo_d_i on foo (cost=0.42..40976.43 rows=11929 width=8) (actual time=0.075..9.339 rows=100 loops=1)
     Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
     Rows Removed by Filter: 9016
     Heap Fetches: 0
Planning Time: 0.339 ms
Execution Time: 9.387 ms

Note the index is not used to optimize the WHERE clause. It is used here as a compact and fast way to store references to the rows ordered by date DESC, so the ORDER BY can do an index-only scan and avoid sorting. By adding column id to the index, an index-only scan can be performed to test the condition on id without hitting the table for every row. Since there is a low LIMIT value, it does not need to scan the whole index; it only scans it in date DESC order until it finds enough rows satisfying the WHERE condition to return the result. It will be faster if you create the index in date DESC order, which could be useful if you use ORDER BY date DESC + LIMIT in other queries too.

You forget that OP's table has a third column, and he is using SELECT *. So that wouldn't be an index-only scan.

Easy to work around. The optimum way to do this query would be an index-only scan to filter on the WHERE conditions, then LIMIT, then hit the table to get the rows. For some reason, if "select *" is used, Postgres takes the id column from the table instead of taking it from the index, which results in lots of unnecessary heap fetches for rows whose id is rejected by the WHERE condition. Easy to work around, by doing it manually. I've also added another bogus column to make sure the SELECT * hits the table.

EXPLAIN (ANALYZE,buffers) SELECT * FROM foo
JOIN (SELECT d,i FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (d,i)
ORDER BY d DESC LIMIT 100;

Limit (cost=0.85..1281.94 rows=1 width=17) (actual time=0.052..3.618 rows=100 loops=1)
  Buffers: shared hit=453
  -> Nested Loop (cost=0.85..1281.94 rows=1 width=17) (actual time=0.050..3.594 rows=100 loops=1)
     Buffers: shared hit=453
     -> Limit (cost=0.42..435.44 rows=100 width=8) (actual time=0.037..2.953 rows=100 loops=1)
        Buffers: shared hit=53
        -> Index Only Scan using foo_d_i on foo foo_1 (cost=0.42..51936.43 rows=11939 width=8) (actual time=0.037..2.935 rows=100 loops=1)
           Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
           Rows Removed by Filter: 9010
           Heap Fetches: 0
           Buffers: shared hit=53
     -> Index Scan using foo_d_i on foo (cost=0.42..8.45 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=100)
        Index Cond: ((d = foo_1.d) AND (i = foo_1.i))
        Buffers: shared hit=400
Execution Time: 3.663 ms

Another option is to just add the primary key to the (date, license_plate) index.
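(Note: the foo test table as created above has only the d and i columns and no primary key, while the plans reference an id column, a foo_pkey index, and a foo_d_i_id index. Presumably the author recreated it along these lines at some point; this is a guess reconstructed from the plans, not the author's actual DDL:)

CREATE TABLE foo (id serial PRIMARY KEY, d INTEGER, i INTEGER, bogus TEXT);
CREATE INDEX foo_d_i_id ON foo (d DESC, i, id);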
SELECT * FROM foo
JOIN (SELECT id FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (id)
ORDER BY d DESC LIMIT 100;

Limit (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.920..3.947 rows=100 loops=1)
  Buffers: shared hit=473
  -> Sort (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.919..3.931 rows=100 loops=1)
     Sort Key: foo.d DESC
     Sort Method: quicksort Memory: 32kB
     Buffers: shared hit=473
     -> Nested Loop (cost=0.85..1354.66 rows=100 width=17) (actual time=0.055..3.858 rows=100 loops=1)
        Buffers: shared hit=473
        -> Limit (cost=0.42..509.41 rows=100 width=8) (actual time=0.039..3.116 rows=100 loops=1)
           Buffers: shared hit=73
           -> Index Only Scan using foo_d_i_id on foo foo_1 (cost=0.42..60768.43 rows=11939 width=8) (actual time=0.039..3.093 rows=100 loops=1)
              Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
              Rows Removed by Filter: 9010
              Heap Fetches: 0
              Buffers: shared hit=73
        -> Index Scan using foo_pkey on foo (cost=0.42..8.44 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=100)
           Index Cond: (id = foo_1.id)
           Buffers: shared hit=400
Execution Time: 3.972 ms

Edit: after thinking about it... since the LIMIT restricts the output to 100 rows ordered by date desc, wouldn't it be nice if we could get the 100 most recent rows for each license_plate_id, put all of that into a top-N sort, and keep only the best 100 across all license_plate_ids? That would avoid reading and throwing away a lot of rows from the index. Even if that's much faster than hitting the table, it will still load those index pages into RAM and clog up your buffers with stuff you don't actually need to keep in cache. Let's use a LATERAL join:

EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM foo
JOIN (
  SELECT d,i FROM
    (VALUES (1),(2),(4),(5),(6),(7),(8),(10),(15),(22),(34),(75)) idlist
    CROSS JOIN LATERAL
    (SELECT d,i FROM foo WHERE i=idlist.column1 ORDER BY d DESC LIMIT 100) f2
  ORDER BY d DESC LIMIT 100
) f3 USING (d,i)
ORDER BY d DESC LIMIT 100;

It's even faster: 2 ms, and it uses the index on (license_plate_id, date) instead of the other way around. Also, and this is important, each subquery in the lateral join hits only the index pages that contain rows that will actually be selected, while the previous queries hit many more index pages, so you save on RAM buffers. If you don't need the index on (date, license_plate_id) and don't want to keep a useless index, that could be interesting, since this query doesn't use it. On the other hand, if you need the index on (date, license_plate_id) for something else and want to keep it, then... maybe not.

Please post results for the winning query 🔥
GIN index not working with `SELECT 1` but it works if I do `SELECT COUNT(*)` on PostgreSQL
I have the following query:

> explain analyze SELECT 1 AS one FROM "orders" WHERE "orders"."email" ILIKE '%email#gmail.com%' LIMIT 1 OFFSET 0

Limit (cost=0.00..470.44 rows=1 width=4) (actual time=2303.032..2303.033 rows=1 loops=1)
  Output: 1
  -> Seq Scan on public.orders (cost=0.00..108200.10 rows=230 width=4) (actual time=2303.031..2303.031 rows=1 loops=1)
     Output: 1
     Filter: ((orders.email)::text ~~* '%email#gmail.com%'::text)
     Rows Removed by Filter: 2309367
Planning Time: 0.195 ms
Execution Time: 2303.047 ms

If I run the same query but with SELECT COUNT(*) instead of SELECT 1, the GIN index (gin_trgm_ops) starts to be used:

> explain analyze SELECT COUNT(*) FROM "orders" WHERE "orders"."email" ILIKE '%email#gmail.com%' LIMIT 1 OFFSET 0

Limit (cost=1263.98..1263.99 rows=1 width=8) (actual time=18.074..18.075 rows=1 loops=1)
  -> Aggregate (cost=1263.98..1263.99 rows=1 width=8) (actual time=18.073..18.073 rows=1 loops=1)
     -> Bitmap Heap Scan on orders (cost=377.78..1263.40 rows=230 width=0) (actual time=18.062..18.067 rows=3 loops=1)
        Recheck Cond: ((email)::text ~~* '%email#gmail.com%'::text)
        Heap Blocks: exact=2
        -> Bitmap Index Scan on index_orders_on_email_gin (cost=0.00..377.72 rows=230 width=0) (actual time=18.043..18.044 rows=3 loops=1)
           Index Cond: ((email)::text ~~* '%email#gmail.com%'::text)
Planning Time: 0.575 ms
Execution Time: 18.120 ms

Any idea why? Thanks
With SELECT 1 ... LIMIT 1, it can stop early once it finds one qualifying row. Since PostgreSQL misestimates how many qualifying rows there are, it misestimates how useful this stopping early will be. The LIMIT doesn't do anything when used with COUNT(*) but without a GROUP BY, since only one row is returned anyway. There is no stopping-early that can be done, as every qualifying row needs to be found in order to count them. The crux of the matter is not SELECT 1 versus SELECT COUNT(*), it is a LIMIT that does something versus one that does not.
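If the SELECT 1 ... LIMIT 1 form has to stay fast despite the misestimate, one possible workaround is to keep the LIMIT from influencing the inner plan. This is only a sketch and assumes PostgreSQL 12 or later, where a CTE marked AS MATERIALIZED acts as an optimization fence:

WITH hits AS MATERIALIZED (
    SELECT 1 AS one
    FROM "orders"
    WHERE "orders"."email" ILIKE '%email#gmail.com%'
)
SELECT one FROM hits LIMIT 1;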
Postgres uses Hash Join with Seq Scan when Inner Select Index Cond is faster
Postgres is using a much heavier Seq Scan on table tracking when an index is available. The first query was the original attempt, which uses a Seq Scan and is therefore slow. I attempted to force an Index Scan with an inner select, but Postgres converted it back to effectively the same query with nearly the same runtime. I finally copied the list from the inner select of query two to make the third query, and Postgres finally used the Index Scan, which dramatically decreased the runtime. The third query is not viable in a production environment. What will cause Postgres to use the last query plan? (VACUUM was used on both tables.)

Tables:

tracking (worker_id, localdatetime) total records: 118664105
project_worker (id, project_id) total records: 12935

Index:

CREATE INDEX tracking_worker_id_localdatetime_idx ON public.tracking USING btree (worker_id, localdatetime)

Queries:

SELECT worker_id, localdatetime FROM tracking t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018

Hash Join (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
  Hash Cond: (t.worker_id = pw.id)
  -> Seq Scan on tracking t (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
  -> Hash (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
     Buckets: 4096 Batches: 1 Memory Usage: 115kB
     -> Seq Scan on project_worker pw (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
        Filter: (project_id = 68475018)
        Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms

SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500)

Hash Semi Join (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
  Hash Cond: (t.worker_id = project_worker.id)
  -> Seq Scan on tracking t (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
  -> Hash (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
     Buckets: 1024 Batches: 1 Memory Usage: 28kB
     -> Limit (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
        -> Seq Scan on project_worker (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
           Filter: (project_id = 68475018)
           Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms

SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (322016383,316007840,...,285702579)

Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
  Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))
Planning Time: 1.162 ms
Execution Time: 30.884 ms

(... is in place of the 500 id entries used in the query.)

The same query run on another set of 500 ids:

Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
  Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))
Planning Time: 2.105 ms
Execution Time: 5534.948 ms
The distribution of worker_id within tracking seems very skewed. For one thing, one of your instances of query 3 returns over 5 times as many rows as the other instance. For another, the estimated number of rows is 100 to 1000 times higher than the actual number. This can certainly lead to bad plans (although it is unlikely to be the complete picture).

What is the actual number of distinct values for worker_id within tracking?

select count(distinct worker_id) from tracking;

What does the planner think this value is?

select n_distinct from pg_stats where tablename='tracking' and attname='worker_id';

If those values are far apart and you force the planner to use a more reasonable value with

alter table tracking alter column worker_id set (n_distinct = <real value>);
analyze tracking;

does that change the plans?
If you want to nudge PostgreSQL towards a nested loop join, try the following:

Create an index on tracking that can be used for an index-only scan:

CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);

Make sure that tracking is VACUUMed often, so that an index-only scan is effective.

Reduce random_page_cost and increase effective_cache_size so that the optimizer prices index scans lower (but don't use insane values).

Make sure that you have good estimates on project_worker:

ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
ANALYZE project_worker;
Why doesn't this query use an index-only scan in PostgreSQL?
I have a table with 28 columns and 7M records, without a primary key.

CREATE TABLE records
(
  direction smallint,
  exporters_id integer,
  time_stamp integer
  ...
)

I created an index on this table and vacuumed the table afterwards (autovacuum is on):

CREATE INDEX exporter_dir_time_only_index ON sacopre_records USING btree (exporters_id, direction, time_stamp);

and I want to execute this query:

SELECT count(exporters_id) FROM records WHERE exporters_id = 50

The table has 6982224 records with exporters_id = 50. I expected this query to use an index-only scan to get the results, but it used a sequential scan. This is the EXPLAIN ANALYZE output:

Aggregate (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
  -> Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
     Filter: (exporters_id = 50)
     Rows Removed by Filter: 2663
Total runtime: 1521.886 ms

but when I change the exporters_id to another id, the query uses an index-only scan:

Aggregate (cost=46.05..46.06 rows=1 width=4) (actual time=0.321..0.321 rows=1 loops=1)
  -> Index Only Scan using exporter_dir_time_only_index on sacopre_records (cost=0.43..42.85 rows=1281 width=4) (actual time=0.313..0.315 rows=4 loops=1)
     Index Cond: (exporters_id = 47)
     Heap Fetches: 0
Total runtime: 0.358 ms

Where is the problem?
The EXPLAIN output is telling you the reason; look closer:

Aggregate (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
  -> Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
     Filter: (exporters_id = 50)
     Rows Removed by Filter: 2663
Total runtime: 1521.886 ms

Your filter removes only 2663 rows out of the roughly 7 million rows in the table, so a sequential scan really should be faster than using the index: almost every row (all but about 0.04 %) matches your criteria, so nearly 7 million rows have to be visited no matter what. The sequential scan reads the entire table in order and discards the tiny non-matching fraction along the way, while an index scan would have to jump back and forth between the index and the heap for each of those roughly 6,980,000 matching rows, which would certainly be slower than the 1.5 seconds you are getting now.