Query plan for a query with limit clause - postgresql

I used the EXPLAIN command in a PostgreSQL (version 11.12) DB to see the query plan of the query select col1, col2 from some_table limit 10 and I got the following:
some_db=> EXPLAIN select col1, col2 from some_table limit 10;
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=0.00..0.32 rows=10 width=33)
-> Seq Scan on some_table (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)
As per my understanding, the lower a step appears in a query plan, the earlier it is executed. But this query plan seems to say that the table is first sequentially scanned in full and the first ten rows are then selected. I was surprised to see this, as I had expected the LIMIT clause to prevent a full sequential scan.
I tried finding an answer to this in PostgreSQL documentation and found this:
"There are cases in which the actual and estimated values won't match up well, but nothing is really wrong. One such case occurs when plan node execution is stopped short by a LIMIT or similar effect. For example, in the LIMIT query we used before,
EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
Index Cond: (unique2 > 9000)
Filter: (unique1 < 100)
Rows Removed by Filter: 287
Planning time: 0.096 ms
Execution time: 0.336 ms
the estimated cost and row count for the Index Scan node are shown as though it were run to completion. But in reality, the Limit node stopped requesting rows after it got two, so the actual row count is only 2 and the run time is less than the cost estimate would suggest. This is not an estimation error, only a discrepancy in the way the estimates and true values are displayed."
What I understand from this is that it is just a display issue: the plan shows each node's estimates as though it ran to completion, but at execution time only the number of rows specified in the LIMIT clause is actually fetched. Is my understanding correct, or am I missing something here?
Thank you for reading this.
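The displayed-versus-actual distinction can be checked directly: EXPLAIN ANALYZE runs the query and prints actual row counts next to the estimates, so if the Limit node really stops pulling early, the Seq Scan node should report an actual row count of 10 rather than the full table. A sketch (a suggestion not in the original post, assuming the same table):

```sql
-- EXPLAIN ANALYZE executes the query and shows "actual ... rows=N"
-- per node; a small actual row count on the Seq Scan node confirms
-- the scan stopped as soon as the Limit was satisfied.
EXPLAIN ANALYZE SELECT col1, col2 FROM some_table LIMIT 10;
```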

Related

Postgres: how do you optimize queries on date column with low selectivity?

I have a table with 143 million rows (and growing); its current size is 107GB. One of the columns in the table is of type date and has low selectivity. For any given date, it's reasonable to assume there are somewhere between 0.5 and 4 million records with the same date value.
Now, if someone tries to do something like this:
select * from large_table where date_column > '2020-01-01' limit 100
It will execute "forever", and if you EXPLAIN ANALYZE it, you can see that it's doing a table scan. So the first (and so far only) idea is to turn this into an index scan. If Postgres can scan a subsection of an index and return the "limit" number of records, that sounds fast to me:
create index our_index_on_the_date_column ON large_table (date_column DESC);
VACUUM ANALYZE large_table;
EXPLAIN ANALYZE select * from large_table where date_column > '2020-01-01' limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..37.88 rows=100 width=893) (actual time=0.034..13.520 rows=100 loops=1)
-> Seq Scan on large_table (cost=0.00..13649986.80 rows=36034774 width=893) (actual time=0.033..13.506 rows=100 loops=1)
Filter: (date_column > '2020-01-01'::date)
Rows Removed by Filter: 7542
Planning Time: 0.168 ms
Execution Time: 18.412 ms
(6 rows)
It still reverts to a sequential scan. Please disregard the execution time, as this took 11 minutes before caching came into play. We can force it to use the index by reducing the returned columns to those covered by the index:
select date_column from large_table where date_column > '2019-01-15' limit 100
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..3.42 rows=100 width=4) (actual time=0.051..0.064 rows=100 loops=1)
-> Index Only Scan using our_index_on_the_date_column on large_table (cost=0.57..907355.11 rows=31874888 width=4) (actual time=0.050..0.056 rows=100 loops=1)
Index Cond: (date_column > '2019-01-15'::date)
Heap Fetches: 0
Planning Time: 0.082 ms
Execution Time: 0.083 ms
(6 rows)
But this is of course a contrived example, since the table is very wide and covering every part of the table with the index is not feasible.
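As a sketch of a middle ground (an addition, not from the thread): if only a handful of extra columns are needed by the common queries, PostgreSQL 11 and later can append them to the index as non-key payload columns with INCLUDE, keeping index-only scans possible without making every column part of the key. The column names here are hypothetical:

```sql
-- Hypothetical covering index: date_column is the only key column,
-- col_a and col_b are payload-only (INCLUDE requires PostgreSQL 11+).
CREATE INDEX large_table_date_covering_idx
    ON large_table (date_column DESC)
    INCLUDE (col_a, col_b);
```

This only helps queries that select nothing beyond the included columns; a SELECT * still has to visit the heap.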
So, anyone who can share some guidance on how to get some performance when using columns with low selectivity as predicates?

Using LIMIT is very slow if no matching data exists

The SQL is very simple. There is a btree index "orders_express_idx" on the express column.
This works well, because rows with express = 'a' exist:
select * from orders where express = 'a' order by id desc limit 1;
Limit (cost=0.43..1.29 rows=1 width=119)
-> Index Scan Backward using orders_pkey on orders (cost=0.43..4085057.23 rows=4793692 width=119)
Filter: ((express)::text = 'a'::text)
This works slowly: the data is nonexistent, and I use LIMIT:
select * from orders where express = 'b' order by id desc limit 1;
Limit (cost=0.43..648.86 rows=1 width=119)
-> Index Scan Backward using orders_pkey on orders (cost=0.43..4085061.83 rows=6300 width=119)
Filter: ((express)::text = 'b'::text)
This works well: the data is nonexistent, but I didn't use LIMIT:
select * from orders where express = 'b' order by id desc;
Sort (cost=24230.91..24246.66 rows=6300 width=119)
Sort Key: id
-> Index Scan using orders_express_idx on orders (cost=0.56..23833.35 rows=6300 width=119)
Index Cond: ((express)::text = 'b'::text)
https://www.postgresql.org/docs/9.6/static/using-explain.html
Go to the section with
"Here is an example showing the effects of LIMIT:"
and read further:
This is the same query as above, but we added a LIMIT so that not all
the rows need be retrieved, and the planner changed its mind about
what to do. Notice that the total cost and row count of the Index Scan
node are shown as if it were run to completion. However, the Limit
node is expected to stop after retrieving only a fifth of those rows,
so its total cost is only a fifth as much, and that's the actual
estimated cost of the query. This plan is preferred over adding a
Limit node to the previous plan because the Limit could not avoid
paying the startup cost of the bitmap scan, so the total cost would be
something over 25 units with that approach.
So basically, yes: adding LIMIT changes the plan, which can make it more efficient for a smaller data set (as expected), but the impact can also be negative, depending on statistics and settings (scan costs, effective_cache_size, and so on).
If you post the execution plans for the queries above, we can explain WHAT happens. But basically this is documented behaviour: LIMIT changes the plan, and thus the execution time. So, yes.
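A common remedy for this pattern (a suggestion beyond what the answer states) is a composite index matching both the filter and the ORDER BY, so that for express = 'b' the planner can walk the few (or zero) matching rows directly instead of scanning the primary key backwards and filtering:

```sql
-- Hypothetical composite index: the equality column first, then the
-- sort column, matching "WHERE express = ? ORDER BY id DESC LIMIT 1".
CREATE INDEX orders_express_id_idx ON orders (express, id DESC);
```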

Why is Postgres not using index on a simple GROUP BY?

I have created a 36M-row table with an index on the type column:
CREATE TABLE items AS
SELECT
(random()*36000000)::integer AS id,
(random()*10000)::integer AS type,
md5(random()::text) AS s
FROM
generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");
I run this simple query and expect PostgreSQL to use my index:
explain select count(*) from "items" group by "type";
But the query planner decides to use Seq Scan instead:
HashAggregate (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
Group Key: type
-> Seq Scan on items (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms
Time without EXPLAIN: 5s 979ms
I have tried several solutions from here and here:
Run VACUUM ANALYZE
Configure default_statistics_target, random_page_cost, work_mem
but nothing helps apart from setting enable_seqscan = OFF:
SET enable_seqscan = OFF;
explain select count(*) from "items" group by "type";
GroupAggregate (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
Group Key: type
-> Index Only Scan using items_type_idx on items (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms
Time without EXPLAIN: 659ms
Query with index scan is about 10x faster on my machine.
Is there a better solution than setting enable_seqscan?
UPD1
My postgresql version is 9.6.3, work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).
UPD2
I have tried filling the type column not with random numbers but with i / 10000, to make pg_stats.correlation = 1. Still a seq scan.
UPD3
#jgh is 100% right:
This typically only happens when the table's row width is much wider than some indexes
I've made the column data large, and now Postgres uses the index. Thanks, everyone!
The Index-only scans wiki says
It is important to realise that the planner is concerned with
minimising the total cost of the query. With databases, the cost of
I/O typically dominates. For that reason, "count(*) without any
predicate" queries will only use an index-only scan if the index is
significantly smaller than its table. This typically only happens when
the table's row width is much wider than some indexes'.
and
Index-only scans are only used when the planner surmises that that
will reduce the total amount of I/O required, according to its
imperfect cost-based modelling. This all heavily depends on visibility
of tuples, if an index would be used anyway (i.e. how selective a
predicate is, etc), and if there is actually an index available that
could be used by an index-only scan in principle
Accordingly, your index is not considered "significantly smaller", so the entire dataset has to be read, which leads the planner to use a seq scan.
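The "significantly smaller" comparison can be made concrete with the standard size functions; a sketch:

```sql
-- Compare on-disk sizes of the heap and the index; the planner's
-- I/O-based reasoning only favours an index-only scan when the index
-- is much smaller than the table.
SELECT pg_size_pretty(pg_relation_size('items'))          AS table_size,
       pg_size_pretty(pg_relation_size('items_type_idx')) AS index_size;
```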

How can I force Postgres 9.4 to hit a GIN full-text index a little more predictably? See my query-plan bug

Postgres 9.4 has been generating a pretty poor query plan for a full-text query with LIMIT 10 at the end:
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 10
this generates:
"Limit (cost=0.00..306.48 rows=10 width=702) (actual time=5470.323..7215.486 rows=3 loops=1)"
" -> Seq Scan on tbl (cost=0.00..24610.69 rows=803 width=702) (actual time=5470.321..7215.483 rows=3 loops=1)"
" Filter: (to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword'::text))"
" Rows Removed by Filter: 609661"
"Planning time: 0.436 ms"
"Execution time: 7215.573 ms"
using an index defined by:
CREATE INDEX fulltext_idx
ON Tbl
USING gin
(to_tsvector('english'::regconfig, ginIndexedColumn));
and it takes 5 or 6 seconds to execute. Even LIMIT 12 is slow.
However, the same query with LIMIT 13 (the lowest limit that hits the index)
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 13
hits the index just fine and takes a few thousandths of a second. See output below:
"Limit (cost=350.23..392.05 rows=13 width=702) (actual time=2.058..2.062 rows=3 loops=1)"
" -> Bitmap Heap Scan on tbl (cost=350.23..2933.68 rows=803 width=702) (actual time=2.057..2.060 rows=3 loops=1)"
" Recheck Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword'::text))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on fulltext_idx (cost=0.00..350.03 rows=803 width=0) (actual time=2.047..2.047 rows=3 loops=1)"
" Index Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword'::text))"
"Planning time: 0.324 ms"
"Execution time: 2.145 ms"
The reason the query plan is poor is that the word is rare: only 2 or 3 records in the whole 610K-row table satisfy the query, which means the sequential scan the query optimizer picks has to scan the whole table before the limit is ever filled. The sequential scan would obviously be quite fast if the word were common, because the limit would be filled in no time.
Obviously, this little bug is no big deal. I'll simply use LIMIT 13 instead of 10; what's three more items? But it took me a long time to realize that the LIMIT clause might affect whether the index is hit, and I'm worried there may be other little surprises in store with other SQL features that prevent the index from being used. What I'm looking for is help tweaking Postgres so that it hits the GIN index all the time for this particular table, instead of only sometimes.
I'm quite willing to forgo possibly cheaper queries if I could be satisfied that the index is always being hit. It's incredibly fast. I don't care to save any more microseconds.
Well, it's obviously an incorrect selectivity estimate. The planner thinks the to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword') predicate will match 803 rows, but actually there are only 3.
To tweak PostgreSQL to use the index you can:
Rewrite the query, for example using CTE, to postpone application of LIMIT:
WITH t as (
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
)
SELECT * FROM t
LIMIT 10
Of course, this makes LIMIT absolutely inefficient. (With a GIN index it is not as efficient as it might be anyway, because GIN cannot fetch results tuple-by-tuple; instead it returns all the matching TIDs at once as a bitmap. See also gin_fuzzy_search_limit.)
Set enable_seqscan=off or increase seq_page_cost to discourage the planner from using sequential scans (doc).
This can, however, be undesirable if your query should use seq scans on other tables.
Use pg_hint_plan extension.
Increasing the cost estimate of the to_tsvector function as described here will probably solve the problem. This cost will automatically be increased in the next release (9.5), so adopting that change early should be considered a rather safe tweak.
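The cost tweak in that last option can be sketched as follows (the value 100 is illustrative, not a recommendation from the answer):

```sql
-- Raising the planner's per-call cost estimate for to_tsvector makes
-- evaluating it row-by-row in a filtering seq scan look expensive,
-- nudging the planner toward the precomputed GIN index instead.
ALTER FUNCTION to_tsvector(regconfig, text) COST 100;
```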

How to optimize BETWEEN condition on big table in PostgreSQL

I have a big table (about ten million rows) and I need to perform a query of the form ? BETWEEN columnA AND columnB.
Script to create database with table and sample data:
CREATE DATABASE test;
\c test
-- Create test table
CREATE TABLE test (id INT PRIMARY KEY, range_start NUMERIC(12, 0), range_end NUMERIC(12, 0));
-- Fill the table with sample data
INSERT INTO test (SELECT value, value, value FROM (SELECT generate_series(1, 10000000) AS value) source);
-- Query I want to be optimized
SELECT * FROM test WHERE 5000000 BETWEEN range_start AND range_end;
I want to create an INDEX so that PostgreSQL can do a fast INDEX SCAN instead of a SEQ SCAN. However, my initial (and obvious) attempts failed:
CREATE INDEX test1 ON test (range_start, range_end);
CREATE INDEX test2 ON test (range_start DESC, range_end);
CREATE INDEX test3 ON test (range_end, range_start);
Also note that the number in the query is deliberately chosen to be in the middle of the generated values (otherwise PostgreSQL is able to recognize that the value is near a range boundary and perform some optimizations).
Any ideas or thoughts would be appreciated.
UPDATE 1: Based on the official documentation, it seems that PostgreSQL is not able to properly use indexes for multicolumn inequality conditions. I am not sure why there is such a limitation, or whether there is anything I can do to significantly speed up the query.
UPDATE 2: One possible approach would be to limit the INDEX SCAN by knowing the largest range I have; let's say it is 100000:
SELECT * FROM test WHERE range_start BETWEEN 4900000 AND 5000000 AND range_end > 5000000;
Why don't you try a range type with a GiST index?
alter table test add numr numrange;
update test set numr = numrange(range_start,range_end,'[]');
CREATE INDEX test_idx ON test USING gist (numr);
EXPLAIN ANALYZE SELECT * FROM test WHERE 5000000.0 <@ numr;
Bitmap Heap Scan on public.test (cost=2367.92..130112.36 rows=50000 width=48) (actual time=0.150..0.151 rows=1 loops=1)
Output: id, range_start, range_end, numr
Recheck Cond: (5000000.0 <@ test.numr)
-> Bitmap Index Scan on test_idx (cost=0.00..2355.42 rows=50000 width=0) (actual time=0.142..0.142 rows=1 loops=1)
Index Cond: (5000000.0 <@ test.numr)
Total runtime: 0.189 ms
On second thought, it is quite obvious why PostgreSQL cannot use a multicolumn index for a two-column inequality condition. However, what I did not understand was why there is a SEQ SCAN even with a LIMIT clause (sorry for not expressing that in my question):
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE 5000000 BETWEEN range_start AND range_end LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.09 rows=1 width=16) (actual time=4743.035..4743.037 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..213685.51 rows=2499795 width=16) (actual time=4743.032..4743.032 rows=1 loops=1)
Filter: ((5000000::numeric >= range_start) AND (5000000::numeric <= range_end))
Total runtime: 4743.064 ms
Then it hit me: PostgreSQL cannot know that a match is less probable at range_start = 1 than at range_start = 4999999. That is why it starts scanning from the first row and continues until it finds matching row(s).
The solution might be to convince PostgreSQL that there is some benefit to using the index:
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE 5000000 BETWEEN range_start AND range_end ORDER BY range_start DESC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.53 rows=1 width=16) (actual time=0.102..0.103 rows=1 loops=1)
-> Index Scan Backward using test1 on test (cost=0.00..3667714.71 rows=2403325 width=16) (actual time=0.099..0.099 rows=1 loops=1)
Index Cond: ((5000000::numeric >= range_start) AND (5000000::numeric <= range_end))
Total runtime: 0.125 ms
Quite a performance boost, I would say :). Still, this boost only works if such a range exists; otherwise it is as slow as a SEQ SCAN. So it might be good to combine this approach with what I outlined in the second update to my question.
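Combining the two ideas, a sketch (using the assumed maximum range width of 100000 from UPDATE 2):

```sql
-- Bound range_start from below using the known maximum range width,
-- and keep ORDER BY ... LIMIT so the backward index scan can stop at
-- the first match; the lower bound prevents a walk across most of the
-- index when no matching range exists.
SELECT * FROM test
WHERE range_start BETWEEN 5000000 - 100000 AND 5000000
  AND range_end >= 5000000
ORDER BY range_start DESC
LIMIT 1;
```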