Force GIN index scan in postgresql 9.4 - postgresql

I have a table of locations (approximately 29 million rows):
Table "public.locations"
Column | Type | Modifiers
------------------------------------+-------------------+------------------------------------------------------------
id | integer | not null default nextval('locations_id_seq'::regclass)
dl | text |
Indexes:
"locations_pkey" PRIMARY KEY, btree (id)
"locations_test_idx" gin (to_tsvector('english'::regconfig, dl))
I want the following query to perform well.
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM locations WHERE to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps') LIMIT 10;
But the query plan shows that a sequential scan is used.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..65.18 rows=10 width=4) (actual time=62217.569..62217.569 rows=0 loops=1)
Buffers: shared hit=262 read=447808
I/O Timings: read=861.370
-> Seq Scan on locations (cost=0.00..967615.99 rows=148442 width=2) (actual time=62217.567..62217.567 rows=0 loops=1)
Filter: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
Rows Removed by Filter: 29688342
Buffers: shared hit=262 read=447808
I/O Timings: read=861.370
Planning time: 0.109 ms
Execution time: 62217.584 ms
After forcibly turning off sequential scans with
set enable_seqscan to off;
the query plan uses the GIN index:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1382.43..1403.20 rows=10 width=2) (actual time=0.043..0.043 rows=0 loops=1)
Buffers: shared hit=1 read=3
-> Bitmap Heap Scan on locations (cost=1382.43..309697.73 rows=148442 width=2) (actual time=0.043..0.043 rows=0 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
Buffers: shared hit=1 read=3
-> Bitmap Index Scan on locations_test_idx (cost=0.00..1345.32 rows=148442 width=0) (actual time=0.041..0.041 rows=0 loops=1)
Index Cond: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
Buffers: shared hit=1 read=3
Planning time: 0.089 ms
Execution time: 0.069 ms
(10 rows)
The cost settings have been pasted below.
select name,setting from pg_settings where name like '%cost';
name | setting
----------------------+---------
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
random_page_cost | 4
seq_page_cost | 1
(5 rows)
I'm looking for a solution that avoids the sequential scan for this query without resorting to tricks like turning enable_seqscan off.
I tried to update the value of seq_page_cost to 20 but the query plan remained the same.

The problem here is that PostgreSQL thinks there are enough rows satisfying the condition that it will be cheaper to fetch rows sequentially until it has found 10 that match.
But not a single row satisfies the condition, so the query ends up scanning the whole table, where an index scan would have determined that much faster.
You can improve the quality of statistics collected for that column like this:
ALTER TABLE locations_test_idx
ALTER to_tsvector SET STATISTICS 10000;
Then run ANALYZE, and PostgreSQL will collect better statistics for that column, hopefully improving the query plan.
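The ANALYZE step targets the table itself; afterwards the plan can be re-checked to see whether the row estimate has come down. A minimal sketch:
ANALYZE locations;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM locations
WHERE to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps')
LIMIT 10;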

Related

Very slow postgres ORDER BY performance despite indices

I have the following tables:
Collections (around 100 records)
Events (related to collections, around 6000 per collection)
Sales (related to events, in total around 2m records)
I need to get all sales for a certain collection, sorted by either sales.timestamp DESC (a datetime field) or sales.id DESC (since id is already in insertion order). To filter by collection_id I first need to join in the events table and then filter by e.collection_id.
To help with this I created a separate index on timestamp: idx_sales_timestamp_desc (timestamp DESC NULLS LAST), alongside the usual pkey index on sales.id
EXPLAIN (analyze,buffers) SELECT * from sales s
LEFT JOIN events e ON s.sales_event = e.sales_event
WHERE e.collection_id = 9
ORDER BY s.id DESC -- identical results with s.timestamp
LIMIT 10;
Without the ORDER BY:
Limit (cost=0.85..196.61 rows=10 width=619) (actual time=0.069..2.416 rows=10 loops=1)
Buffers: shared hit=172 read=3
I/O Timings: read=1.810
-> Nested Loop (cost=0.85..122231.34 rows=6244 width=619) (actual time=0.068..2.413 rows=10 loops=1)
Buffers: shared hit=172 read=3
I/O Timings: read=1.810
-> Index Scan using idx_events_collection_id on events e (cost=0.43..32359.71 rows=9551 width=206) (actual time=0.027..0.074 rows=47 loops=1)
Index Cond: (collection_id = 9)
Buffers: shared hit=24
-> Index Scan using idx_sales_sales_event on sales s (cost=0.42..9.39 rows=2 width=413) (actual time=0.049..0.049 rows=0 loops=47)
Index Cond: (sales_event = e.sales_event)
Buffers: shared hit=148 read=3
I/O Timings: read=1.810
Planning:
Buffers: shared hit=20
Planning Time: 0.418 ms
Execution Time: 2.444 ms
With the ORDER BY:
Limit (cost=1001.00..3353.78 rows=10 width=619) (actual time=1084.650..2353.191 rows=10 loops=1)
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Gather Merge (cost=1001.00..1470076.56 rows=6244 width=619) (actual time=1084.649..2352.683 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Nested Loop (cost=0.98..1468355.82 rows=2602 width=619) (actual time=622.297..1768.414 rows=6 loops=3)
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Parallel Index Scan Backward using sales_pkey on sales s (cost=0.42..58907.93 rows=303693 width=413) (actual time=0.237..301.251 rows=6008 loops=3)
Buffers: shared hit=9958 read=1094 dirtied=1479
I/O Timings: read=513.609
-> Index Scan using events_pkey on events e (cost=0.55..4.64 rows=1 width=206) (actual time=0.243..0.243 rows=0 loops=18024)
Index Cond: (sales_event = s.sales_event)
Filter: (collection_id = 9)
Rows Removed by Filter: 0
Buffers: shared hit=71950 read=5873 dirtied=451
I/O Timings: read=3219.161
Planning:
Buffers: shared hit=20
Planning Time: 0.268 ms
Execution Time: 2354.905 ms
sales DDL:
-- Table Definition ----------------------------------------------
CREATE TABLE sales (
    id          BIGSERIAL PRIMARY KEY,
    transaction text,
    sales_event text,
    price       bigint,
    name        text,
    timestamp   timestamp with time zone
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX sales_pkey ON sales(id int8_ops);
CREATE INDEX idx_sales_sales_event ON sales(sales_event text_ops);
CREATE INDEX idx_sales_timestamp_desc ON sales(timestamp timestamptz_ops DESC);
Events DDL:
-- Table Definition ----------------------------------------------
CREATE TABLE events (
    created_at    timestamp with time zone,
    updated_at    timestamp with time zone,
    sales_event   text PRIMARY KEY,
    collection_id bigint REFERENCES collections(id)
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX events_pkey ON events(sales_event text_ops);
Without the ORDER BY, I'm at around 500 ms. With the ORDER BY, it easily ends up taking from 2 s to 3 min or longer, depending on DB load, despite all indices being used according to the EXPLAIN.
The default order when omitting ORDER BY altogether is not the one I want. I have kept ANALYZE up to date as well.
How do I solve this in a good way?

Postgres uses Hash Join with Seq Scan when Inner Select Index Cond is faster

Postgres is using a much heavier Seq Scan on table tracking when an index is available. The first query is the original attempt, which uses a Seq Scan and is therefore slow. I attempted to force an Index Scan with an inner SELECT, but Postgres converted it back to effectively the same query with nearly the same runtime. I finally copied the id list from the inner SELECT of the second query to build the third query; Postgres then used the Index Scan, which dramatically decreased the runtime. The third query is not viable in a production environment. What will cause Postgres to use the last query plan?
(VACUUM was run on both tables)
Tables
tracking (worker_id, localdatetime) total records: 118664105
project_worker (id, project_id) total records: 12935
INDEX
CREATE INDEX tracking_worker_id_localdatetime_idx ON public.tracking USING btree (worker_id, localdatetime)
Queries
SELECT worker_id, localdatetime FROM tracking t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018
Hash Join (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
Hash Cond: (t.worker_id = pw.id)
-> Seq Scan on tracking t (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
-> Hash (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 115kB
-> Seq Scan on project_worker pw (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500)
Hash Semi Join (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
Hash Cond: (t.worker_id = project_worker.id)
-> Seq Scan on tracking t (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
-> Hash (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 28kB
-> Limit (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
-> Seq Scan on project_worker (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (322016383,316007840,...,285702579)
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
" Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))"
Planning Time: 1.162 ms
Execution Time: 30.884 ms
... is in place of the 500 id entries used in the query
The same query run on another set of 500 ids:
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
" Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))"
Planning Time: 2.105 ms
Execution Time: 5534.948 ms
The distribution of "worker_id" within "tracking" seems very skewed. For one thing, the number of rows in one of your instances of query 3 returns over 5 times as many rows as the other instance of it. For another, the estimated number of rows is 100 to 1000 times higher than the actual number. This can certainly lead to bad plans (although it is unlikely to be the complete picture).
What is the actual number of distinct values for worker_id within tracking: select count(distinct worker_id) from tracking? What does the planner think this value is: select n_distinct from pg_stats where tablename='tracking' and attname='worker_id'? If those values are far apart and you force the planner to use a more reasonable value with alter table tracking alter column worker_id set (n_distinct = <real value>); analyze tracking; does that change the plans?
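Spelled out, those diagnostics and the override look like this (the n_distinct value of 100000 is only a placeholder for whatever the real count turns out to be):
SELECT count(DISTINCT worker_id) FROM tracking;
SELECT n_distinct FROM pg_stats
WHERE tablename = 'tracking' AND attname = 'worker_id';
-- placeholder value: substitute the real distinct count found above
ALTER TABLE tracking ALTER COLUMN worker_id SET (n_distinct = 100000);
ANALYZE tracking;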
If you want to nudge PostgreSQL towards a nested loop join, try the following:
Create an index on tracking that can be used for an index-only scan:
CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);
Make sure that tracking is VACUUMed often, so that an index-only scan is effective.
Reduce random_page_cost and increase effective_cache_size so that the optimizer prices index scans lower, but don't use insane values (see the sketch after this list).
Make sure that you have good estimates on project_worker:
ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
ANALYZE project_worker;
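For the cost settings mentioned in the third suggestion, a possible starting point could look like the sketch below; the concrete values are assumptions and have to be tuned for the actual hardware and memory:
-- example values only, not recommendations for this particular system
ALTER SYSTEM SET random_page_cost = 1.1;         -- lower is reasonable on SSD-backed storage
ALTER SYSTEM SET effective_cache_size = '16GB';  -- roughly the RAM available for caching
SELECT pg_reload_conf();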

Nested Loop Left Join cost too much time?

This is the query:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT
    operation.id
FROM
    operation
    RIGHT JOIN (
        SELECT uid, did FROM (
            SELECT uid, did FROM operation WHERE id = 993754
        ) t
    ) parts ON (operation.uid = parts.uid AND operation.did = parts.did)
and EXPLAIN info:
Nested Loop Left Join (cost=0.85..29695.77 rows=100 width=8) (actual time=13.709..13.711 rows=1 loops=1)
Buffers: shared hit=4905
-> Unique (cost=0.42..8.45 rows=1 width=16) (actual time=0.011..0.013 rows=1 loops=1)
Buffers: shared hit=5
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.084 ms
Execution Time: 13.728 ms
Why does the Nested Loop take so much more time than the sum of its child nodes? What can I do about that? The execution time should be less than 1 ms, right?
Update:
Nested Loop Left Join (cost=5.88..400.63 rows=101 width=8) (actual time=0.012..0.012 rows=1 loops=1)
Buffers: shared hit=8
-> Index Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (id = 993754)
Buffers: shared hit=4
-> Bitmap Heap Scan on operation (cost=5.45..391.19 rows=100 width=24) (actual time=0.004..0.005 rows=1 loops=1)
Recheck Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on ou (cost=0.00..5.42 rows=100 width=0) (actual time=0.003..0.003 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Buffers: shared hit=3
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.127 ms
Execution Time: 0.028 ms
Thanks, all of you. When I split the index into btree(id) and btree(uid, did), everything works perfectly. But why can't those columns be used together in one index? Any details or rules?
BTW, the SQL is used for real-time calculation; there is some window function code not shown here.
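For reference, the split described in the update would look roughly like the following; the index names are made up, since the actual DDL is not shown in the question:
CREATE INDEX operation_id_idx ON operation (id);             -- hypothetical name
CREATE INDEX operation_uid_did_idx ON operation (uid, did);  -- hypothetical name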
The Nested Loop does not take much time actually. The actual time of 13.709..13.711 means that it took 13.709 ms until the first row was ready to be emitted from this node and it took 0.002 ms until it was finished.
Note that the startup cost of 13.709 ms includes the cost of its two child nodes. Both of the child nodes need to emit at least one row before the nested loop can start.
The Unique child began emitting its first (and only) row after 0.011 ms. The Index Only Scan child, however, only started to emit its first (and only) row after 13.695 ms. This means that most of the actual time is spent in this Index Only Scan.
There is a great answer here which explains the costs and actual times in depth.
Also there is a nice tool at https://explain.depesz.com which calculates an inclusive and exclusive time for each node. Here it is used for your query plan which clearly shows that most of the time is spent in the Index Only Scan.
Since the query is spending almost all of the time in this index only scan, optimizations there will have the most benefit. Creating a separate index for the columns uid and did on the operation table should improve query time a lot.
CREATE INDEX operation_uid_did ON operation(uid, did);
The current execution plan contains 2 index only scans.
A slow one:
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
And a fast one:
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
Both of them use the index oi but have different index conditions. Note how the fast one, which uses id as the index condition, only needs to load 5 pages of data (Buffers: shared hit=5). The slow one needs to load 4900 pages instead (Buffers: shared hit=4900). This indicates that the index is optimized for queries on id but not so much for uid and did. Probably the index oi covers all 3 columns id, uid, did in this order.
A multi-column btree index can only be used efficiently when there are constraints in the query on the leftmost columns. The official documentation about multi-column indexes explains this very well in depth.
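To illustrate the leftmost-prefix rule with this table, assuming oi is indeed defined on (id, uid, did) as suspected above (the column order is an assumption, and the literal values in the second query are made up):
-- hypothetical definition matching the suspicion above
CREATE INDEX oi ON operation (id, uid, did);
-- can descend the btree: the leading column id is constrained
SELECT uid, did FROM operation WHERE id = 993754;
-- cannot descend the btree: id is not constrained, so the whole index
-- has to be scanned to find matching (uid, did) pairs
SELECT id FROM operation WHERE uid = 123 AND did = 456;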
Why does Nested Loop cost more and more time than sum of childs cost?
Based on your example, it doesn't. Can you elaborate on what makes you think it does?
Anyway, it seems extravagant to visit 4900 pages to fetch 1 tuple. I'm guessing your tables are not getting vacuumed enough.
Although now I prefer Florian's suggestion, that "uid" and "did" are not the leading columns of the index, and that is why it is slow. It is basically doing a full index scan, using the index as a skinny version of the table. It is a shame that the EXPLAIN output doesn't make it clear when an index is being used in this fashion, rather than in the traditional "jump to a specific part of the index" way.
So you have a missing index.

limiting the results of a query slows the query

I have a PostgreSQL 9.6 database used to record debug logs of an application. It contains 130 million records. The main field is a jsonb column with a GIN index.
If I perform a query like the following it executes quickly:
select id, logentry from inettklog where
logentry #> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb;
Here is the explain analyze:
Bitmap Heap Scan on inettklog (cost=2938.03..491856.81 rows=137552 width=300) (actual time=10.610..12.644 rows=128 loops=1)
Recheck Cond: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Heap Blocks: exact=128
-> Bitmap Index Scan on inettklog_ix_logentry (cost=0.00..2903.64 rows=137552 width=0) (actual time=10.564..10.564 rows=128 loops=1)
Index Cond: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Planning time: 68.522 ms
Execution time: 12.720 ms
(7 rows)
But if I simply add a limit, it suddenly becomes very slow:
select id, logentry from inettklog where
logentry #> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
limit 20;
It now takes over 20 seconds!
Limit (cost=0.00..1247.91 rows=20 width=300) (actual time=0.142..37791.319 rows=20 loops=1)
-> Seq Scan on inettklog (cost=0.00..8582696.05 rows=137553 width=300) (actual time=0.141..37791.308 rows=20 loops=1)
Filter: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Rows Removed by Filter: 30825572
Planning time: 0.174 ms
Execution time: 37791.351 ms
(6 rows)
Here are the results when ORDER BY is included, even after setting enable_seqscan=off:
With no limit:
set enable_seqscan = off;
set enable_indexscan = on;
select id, date, logentry from inettklog where
logentry #> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
order by date;
The explain analyze:
Sort (cost=523244.24..523588.24 rows=137600 width=308) (actual time=48.196..48.219 rows=128 loops=1)
Sort Key: date
Sort Method: quicksort Memory: 283kB
-> Bitmap Heap Scan on inettklog (cost=2658.40..491746.00 rows=137600 width=308) (actual time=31.773..47.865 rows=128 loops=1)
Recheck Cond: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Heap Blocks: exact=128
-> Bitmap Index Scan on inettklog_ix_logentry (cost=0.00..2624.00 rows=137600 width=0) (actual time=31.550..31.550 rows=128 loops=1)
Index Cond: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Planning time: 0.181 ms
Execution time: 48.254 ms
(10 rows)
And now when we add the limit:
set enable_seqscan = off;
set enable_indexscan = on;
select id, date, logentry from inettklog where
logentry #> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
order by date
limit 20;
It now takes 90 seconds!!!
Limit (cost=0.57..4088.36 rows=20 width=308) (actual time=32017.438..98544.017 rows=20 loops=1)
-> Index Scan using inettklog_ix_logdate on inettklog (cost=0.57..28123416.21 rows=137597 width=308) (actual time=32017.437..98544.008 rows=20 loops=1)
Filter: (logentry #> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Rows Removed by Filter: 27829853
Planning time: 0.249 ms
Execution time: 98544.043 ms
(6 rows)
This is all very confusing! I want to be able to provide a utility to quickly query this database but it is all counter-intuitive.
Can anyone explain what is going on?
Can anyone explain the rules?
The estimates are way off. Try to run ANALYZE, possibly with an increased default_statistics_target.
Since PostgreSQL thinks there are so many results, it thinks that it is best off performing a sequential scan and stopping as soon as it has got enough results.
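A minimal sketch of that ANALYZE suggestion (the statistics target of 1000 is just an example value; the default is 100):
SET default_statistics_target = 1000;  -- session-level, affects the ANALYZE below
ANALYZE inettklog;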
Using LIMIT without indexing the column will slow it down, as it will scan the whole table and then give you the result. So instead of doing that, create an index on logentry and then run the query with LIMIT. It will give you much faster results.
You can check this answer for reference: PostgreSQL query very slow with limit 1

LIMIT with ORDER BY makes query slow

I am having problems optimizing a query in PostgreSQL 9.5.14.
select *
from file as f
join product_collection pc on (f.product_collection_id = pc.id)
where pc.mission_id = 7
order by f.id asc
limit 100;
Takes about 100 seconds. If I drop the LIMIT clause it takes about 0.5 seconds:
With limit:
explain (analyze,buffers) ... -- query exactly as above
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.84..859.32 rows=100 width=457) (actual time=102793.422..102856.884 rows=100 loops=1)
Buffers: shared hit=222430592
-> Nested Loop (cost=0.84..58412343.43 rows=6804163 width=457) (actual time=102793.417..102856.872 rows=100 loops=1)
Buffers: shared hit=222430592
-> Index Scan using file_pkey on file f (cost=0.57..23409008.61 rows=113831736 width=330) (actual time=0.048..28207.152 rows=55858772 loops=1)
Buffers: shared hit=55652672
-> Index Scan using product_collection_pkey on product_collection pc (cost=0.28..0.30 rows=1 width=127) (actual time=0.001..0.001 rows=0 loops=55858772)
Index Cond: (id = f.product_collection_id)
Filter: (mission_id = 7)
Rows Removed by Filter: 1
Buffers: shared hit=166777920
Planning time: 0.803 ms
Execution time: 102856.988 ms
Without limit:
=> explain (analyze,buffers) ... -- query as above, just without limit
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=20509671.01..20526681.42 rows=6804163 width=457) (actual time=456.175..510.596 rows=142055 loops=1)
Sort Key: f.id
Sort Method: quicksort Memory: 79392kB
Buffers: shared hit=37956
-> Nested Loop (cost=0.84..16494851.02 rows=6804163 width=457) (actual time=0.044..231.051 rows=142055 loops=1)
Buffers: shared hit=37956
-> Index Scan using product_collection_mission_id_index on product_collection pc (cost=0.28..46.13 rows=87 width=127) (actual time=0.017..0.101 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=10
-> Index Scan using file_product_collection_id_index on file f (cost=0.57..187900.11 rows=169535 width=330) (actual time=0.007..1.335 rows=1633 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=37946
Planning time: 0.807 ms
Execution time: 569.865 ms
I have copied the database to a backup server so that I may safely manipulate the database without something else changing it on me.
Cardinalities:
Table file: 113,831,736 rows.
Table product_collection: 1370 rows.
The query without LIMIT: 142,055 rows.
SELECT count(*) FROM product_collection WHERE mission_id = 7: 87 rows.
What I have tried:
searching stack overflow
vacuum full analyze
creating two column indexes on file.product_collection_id & file.id. (there already are single column indexes on every field touched.)
creating two column indexes on file.id & file.product_collection_id.
increasing the statistics on file.id & file.product_collection_id, then re-vacuum analyze.
changing various query planner settings.
creating non-materialized views.
walking up and down the hallway while muttering to myself.
None of them seem to change the performance in a significant way.
Thoughts?
UPDATE from OP:
Tested this on PostgreSQL 9.6 & 10.4, and found no significant changes in plans or performance.
However, setting random_page_cost low enough is the only way to get faster performance for the search without LIMIT.
With the default random_page_cost = 4, the plan without LIMIT is:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9270013.01..9287875.64 rows=7145054 width=457) (actual time=47782.523..47843.812 rows=145697 loops=1)
Sort Key: f.id
Sort Method: external sort Disk: 59416kB
Buffers: shared hit=3997185 read=1295264, temp read=7427 written=7427
-> Hash Join (cost=24.19..6966882.72 rows=7145054 width=457) (actual time=1.323..47458.767 rows=145697 loops=1)
Hash Cond: (f.product_collection_id = pc.id)
Buffers: shared hit=3997182 read=1295264
-> Seq Scan on file f (cost=0.00..6458232.17 rows=116580217 width=330) (actual time=0.007..17097.581 rows=116729984 loops=1)
Buffers: shared hit=3997169 read=1295261
-> Hash (cost=23.08..23.08 rows=89 width=127) (actual time=0.840..0.840 rows=87 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=13 read=3
-> Bitmap Heap Scan on product_collection pc (cost=4.97..23.08 rows=89 width=127) (actual time=0.722..0.801 rows=87 loops=1)
Recheck Cond: (mission_id = 7)
Heap Blocks: exact=10
Buffers: shared hit=13 read=3
-> Bitmap Index Scan on product_collection_mission_id_index (cost=0.00..4.95 rows=89 width=0) (actual time=0.707..0.707 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=3 read=3
Planning time: 0.929 ms
Execution time: 47911.689 ms
User Erwin's answer below will take me some time to fully understand and generalize to all of the use cases needed. In the mean time we will probably use either a materialized view or just flatten our table structure.
This query is harder for the Postgres query planner than it might look. Depending on cardinalities, data distribution, value frequencies, sizes, ... completely different query plans can prevail and the planner has a hard time predicting which is best. Current versions of Postgres are better at this in several aspects, but it's still hard to optimize.
Since you retrieve only relatively few rows from product_collection, this equivalent query with LIMIT in a LATERAL subquery should avoid performance degradation:
SELECT *
FROM   product_collection pc
CROSS  JOIN LATERAL (
   SELECT *
   FROM   file f  -- big table
   WHERE  f.product_collection_id = pc.id
   ORDER  BY f.id
   LIMIT  100
   ) f
WHERE  pc.mission_id = 7
ORDER  BY f.id
LIMIT  100;
Edit: This results in the following query plan, from explain (analyze, verbose) output provided by the OP:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30524.34..30524.59 rows=100 width=457) (actual time=13.128..13.167 rows=100 loops=1)
Buffers: shared hit=3213
-> Sort (cost=30524.34..30546.09 rows=8700 width=457) (actual time=13.126..13.152 rows=100 loops=1)
Sort Key: file.id
Sort Method: top-N heapsort Memory: 76kB
Buffers: shared hit=3213
-> Nested Loop (cost=0.57..30191.83 rows=8700 width=457) (actual time=0.060..9.868 rows=2880 loops=1)
Buffers: shared hit=3213
-> Seq Scan on product_collection pc (cost=0.00..69.12 rows=87 width=127) (actual time=0.024..0.336 rows=87 loops=1)
Filter: (mission_id = 7)
Rows Removed by Filter: 1283
Buffers: shared hit=13
-> Limit (cost=0.57..344.24 rows=100 width=330) (actual time=0.008..0.071 rows=33 loops=87)
Buffers: shared hit=3200
-> Index Scan using file_pc_id_index on file (cost=0.57..582642.42 rows=169535 width=330) (actual time=0.007..0.065 rows=33 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=3200
Planning time: 0.595 ms
Execution time: 13.319 ms
You need these indexes (will help your original query, too):
CREATE INDEX idx1 ON file (product_collection_id, id); -- crucial
CREATE INDEX idx2 ON product_collection (mission_id, id); -- helpful
You mentioned:
two column indexes on file.id & file.product_collection_id.
Etc. But we need it the other way round: id last. The order of index expressions is crucial. See:
Is a composite index also good for queries on the first field?
Rationale: With only 87 rows from product_collection, we only fetch a maximum of 87 x 100 = 8700 rows (fewer if not every pc.id has 100 rows in table file), which are then sorted before picking the top 100. Performance degrades with the number of rows you get from product_collection and with bigger LIMIT.
With the multicolumn index idx1 above, that's 87 fast index scans. The rest is not very expensive.
More optimization is possible, depending on additional information. Related:
Can spatial index help a “range - order by - limit” query