Why is Postgres not using index on a simple GROUP BY? - postgresql

I have created a 36M-row table with an index on the type column:
CREATE TABLE items AS
SELECT
(random()*36000000)::integer AS id,
(random()*10000)::integer AS type,
md5(random()::text) AS s
FROM
generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");
I run this simple query and expect postgresql to use my index:
explain select count(*) from "items" group by "type";
But the query planner decides to use Seq Scan instead:
HashAggregate (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
Group Key: type
-> Seq Scan on items (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms
Time without EXPLAIN: 5s 979ms
I have tried several solutions from here and here:
Run VACUUM ANALYZE or VACUUM FULL ANALYZE
Configure default_statistics_target, random_page_cost, work_mem
but nothing helps apart from setting enable_seqscan = OFF:
SET enable_seqscan = OFF;
explain select count(*) from "items" group by "type";
GroupAggregate (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
Group Key: type
-> Index Only Scan using items_type_idx on items (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms
Time without EXPLAIN: 659ms
Query with index scan is about 10x faster on my machine.
Is there a better solution than setting enable_seqscan?
UPD1
My postgresql version is 9.6.3, work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).
UPD2
I have tried filling the type column not with random numbers but with i / 10000, so that pg_stats.correlation = 1. The planner still chooses a seq scan.
UPD3
@jgh is 100% right:
This typically only happens when the table's row width is much wider than some indexes
I've added a large column to make the rows much wider, and now Postgres uses the index. Thanks everyone!
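Something along these lines, as a sketch (the padding column name and the amount of data added are just illustrative):
ALTER TABLE items ADD COLUMN padding text;
UPDATE items SET padding = repeat(md5(random()::text), 20); -- make each row much wider than its index entry
VACUUM ANALYZE items;                                       -- refresh statistics and the visibility map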

The Index-only scans wiki says
It is important to realise that the planner is concerned with
minimising the total cost of the query. With databases, the cost of
I/O typically dominates. For that reason, "count(*) without any
predicate" queries will only use an index-only scan if the index is
significantly smaller than its table. This typically only happens when
the table's row width is much wider than some indexes'.
and
Index-only scans are only used when the planner surmises that that
will reduce the total amount of I/O required, according to its
imperfect cost-based modelling. This all heavily depends on visibility
of tuples, if an index would be used anyway (i.e. how selective a
predicate is, etc), and if there is actually an index available that
could be used by an index-only scan in principle
Accordingly, your index is not considered "significantly smaller": the entire dataset has to be read either way, and reading the table sequentially is what the planner estimates to be cheapest, hence the seq scan.
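You can check how the sizes compare with a query like this (using the table and index names from the question):
SELECT pg_size_pretty(pg_relation_size('items'))          AS table_size,
       pg_size_pretty(pg_relation_size('items_type_idx')) AS index_size;
If the index were much smaller relative to the table, the index-only scan would start to look cheaper to the planner.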

Related

Slow postgres query even though it does bitmap index scan

I have a table with 4707838 rows. When I run the following query on this table it takes around 9 seconds to execute.
SELECT json_agg(
         json_build_object(
           'accessorId', p."accessorId",
           'mobile', json_build_object(
             'enabled', p.mobile,
             'settings', json_build_object(
               'proximityAccess', p."proximity",
               'tapToAccess', p."tapToAccess",
               'clickToAccessRange', p."clickToAccessRange",
               'remoteAccess', p."remote")),
           'card', json_build_object('enabled', p."card"),
           'fingerprint', json_build_object('enabled', p."fingerprint"))
       ) AS permissions
FROM permissions AS p
WHERE p."accessPointId" = 99
The output of explain analyze is as follows:
Aggregate (cost=49860.12..49860.13 rows=1 width=32) (actual time=9011.711..9011.712 rows=1 loops=1)
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Heap Scan on permissions p (cost=775.86..49350.25 rows=33991 width=14) (actual time=48.886..8704.470 rows=36556 loops=1)
Recheck Cond: ("accessPointId" = 99)
Heap Blocks: exact=29331
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Index Scan on composite_key_accessor_access_point (cost=0.00..767.37 rows=33991 width=0) (actual time=38.767..38.768 rows=37032 loops=1)
Index Cond: ("accessPointId" = 99)
Buffers: shared read=105
I/O Timings: read=32.592
Planning Time: 0.142 ms
Execution Time: 9012.719 ms
This table has a btree index on accessorId column and composite index on (accessorId,accessPointId).
Can anyone tell me what could be the reason for this query to be slow even though it uses an index?
Over 90% of the time is spent waiting to get data from disk. At about 3.6 reads per ms (roughly 0.28 ms per read), that is pretty fast for a harddrive (suggesting that much of the data was already in the filesystem cache, or that some of the reads brought in neighboring data that was also eventually required--that is, sequential reads, not just random reads) but slow for an SSD.
If you set enable_bitmapscan=off and clear the cache (or pick a not recently used "accessPointId" value) what performance do you get?
How big is the table? If you are reading a substantial fraction of the table and think you are not getting as much benefit from sequential reads as you should be, you can try making your OS's readahead settings more aggressive. On Linux that is something like sudo blockdev --setra ...
You could put all columns referred to by the query into the index, to enable index-only scans. But given the number of columns you are using, that might be impractical. You would want "accessPointId" to be the first column in the index. By the way, is the index currently being used really on (accessorId, accessPointId)? It looks to me like "accessPointId" is actually the first column in that index, not the second one.
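As a sketch, a covering index with "accessPointId" leading might look like this (the index name is made up; the column list is taken from the query above):
CREATE INDEX permissions_access_point_covering_idx
    ON permissions ("accessPointId", "accessorId", mobile, "proximity",
                    "tapToAccess", "clickToAccessRange", "remote", "card", "fingerprint");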
You could cluster the table by an index which has "accessPointId" as the first column. That would group the related records together for faster access. But note it is a slow operation and takes a strong lock on the table while it is running, and future data going into the table won't be clustered, only the current data.
You could try to increase effective_io_concurrency so that you can have multiple io requests outstanding at a time. How effective this is will depend on your hardware.
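Sketches of those last two ideas (the index name refers to the hypothetical covering index above; the effective_io_concurrency value is only an example to tune for your hardware):
CLUSTER permissions USING permissions_access_point_covering_idx; -- rewrites the table and takes a strong lock while running
SET effective_io_concurrency = 32;                               -- per-session; mainly helps bitmap heap scans prefetch pages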

Slow distinct PostgreSQL query on nested jsonb field won't use index

I'm trying to get distinct values from a nested field of a JSONB column, but it takes about 2 minutes on a 400K-row table.
The original query used DISTINCT, but then I read that GROUP BY works better, so I tried that too - no luck, it is still extremely slow.
Adding an index did not help either:
create index "orders_financial_status_index" on orders ((data ->'data'->> 'financial_status'));
EXPLAIN ANALYZE gave this result:
HashAggregate (cost=13431.16..13431.22 rows=4 width=32) (actual time=123074.941..123074.943 rows=4 loops=1)
Group Key: ((data -> 'data'::text) ->> 'financial_status'::text)
-> Seq Scan on orders (cost=0.00..12354.14 rows=430809 width=32) (actual time=2.993..122780.325 rows=434080 loops=1)
Planning time: 0.119 ms
Execution time: 123074.979 ms
It's worth mentioning that there are no null values on this column, and currently there are 4 unique values.
What should I do in order to query the distinct values faster?
No index will make this faster, because the query has to scan the whole table.
As you can see, the sequential scan uses almost all the time; the hash aggregate is fast.
Still I would not drop the index, because it allows PostgreSQL to estimate the number of groups accurately and decide on the more efficient hash aggregate rather than sorting the rows. You can try without the index to be sure.
However, two minutes for half a million rows is not very fast. Do you have slow storage? Is the table bloated? If the latter, VACUUM (FULL) should improve things.
You can speed up the query by reducing I/O. Load the table into RAM with pg_prewarm, then processing should be considerably faster.
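A minimal sketch of the pg_prewarm approach (assuming the extension is available on your installation):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('orders');                        -- load the table into shared buffers
SELECT pg_prewarm('orders_financial_status_index'); -- optionally prewarm the index as well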

How can I force postgres 9.4 to hit a gin full text index a little more predictably? See my query plan bug

POSTGRES 9.4 has been generating a pretty poor query plan for a full text query with LIMIT 10 at the end:
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 10
this generates:
"Limit (cost=0.00..306.48 rows=10 width=702) (actual time=5470.323..7215.486 rows=3 loops=1)"
" -> Seq Scan on tbl (cost=0.00..24610.69 rows=803 width=702) (actual time=5470.321..7215.483 rows=3 loops=1)"
" Filter: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
" Rows Removed by Filter: 609661"
"Planning time: 0.436 ms"
"Execution time: 7215.573 ms"
using an index defined by:
CREATE INDEX fulltext_idx
ON Tbl
USING gin
(to_tsvector('english'::regconfig, ginIndexedColumn));
and it takes 5 or 6 seconds to execute. Even LIMIT 12 is slow.
However, the same query with LIMIT 13 (the lowest limit that hits the index)
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 13
hits the index just fine and takes a few thousandths of a second. See output below:
"Limit (cost=350.23..392.05 rows=13 width=702) (actual time=2.058..2.062 rows=3 loops=1)"
" -> Bitmap Heap Scan on tbl (cost=350.23..2933.68 rows=803 width=702) (actual time=2.057..2.060 rows=3 loops=1)"
" Recheck Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on fulltext_idx (cost=0.00..350.03 rows=803 width=0) (actual time=2.047..2.047 rows=3 loops=1)"
" Index Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
"Planning time: 0.324 ms"
"Execution time: 2.145 ms"
The reason why the query plan is poor is that the word is rare and there's only 2 or 3 records in the whole 610K record table that satisfy the query, meaning the sequential scan the query optimizer picks will have to scan the whole table before the limit is ever filled. The sequential scan would obviously be quite fast if the word is common because the limit would be filled in no time.
Obviously, this little bug is no big deal. I'll simply use Limit 13 instead of 10. What's three more items. But it took me so long to realize the limit clause might affect whether it hits the index. I'm worried that there might be other little surprises in store with other SQL functions that prevent the index from being hit. What I'm looking for is assistance in tweaking Postgres to hit the GIN index all the time instead of sometimes for this particular table.
I'm quite willing to forgo possibly cheaper queries if I could be satisfied that the index is always being hit. It's incredibly fast. I don't care to save any more microseconds.
Well, it's obviously an incorrect selectivity estimation. The planner thinks that the to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword') predicate will return 803 rows, but actually there are only 3.
To tweak PostgreSQL to use the index you can:
Rewrite the query, for example using a CTE, to postpone the application of LIMIT:
WITH t as (
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
)
SELECT * FROM t
LIMIT 10
Of course, this makes the LIMIT absolutely inefficient. (But with a GIN index it is not as efficient as it could be anyway, because GIN cannot fetch results tuple-by-tuple; instead it returns all the TIDs at once as a bitmap. See also gin_fuzzy_search_limit.)
Set enable_seqscan=off or increase seq_page_cost to discourage the planner from using sequential scans (doc).
This can, however, be undesirable if your query should use seq scans on other tables.
Use pg_hint_plan extension.
Increasing the cost-estimate of the to_tsvector function as described here will probably solve the problem. This cost will automatically be increased in the next release (9.5) so adopting that change early should be considered a rather safe tweak to make.
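A sketch of that cost tweak, assuming the two-argument to_tsvector(regconfig, text) form used in the index and query (the cost value is illustrative):
ALTER FUNCTION to_tsvector(regconfig, text) COST 100; -- make the planner treat the function call as expensive
The idea is that a costlier function makes the seq scan's per-row filter look more expensive, tilting the planner toward the bitmap index scan even for small LIMIT values.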

Postgresql - Rows removed by Index

The table in question has a B-tree index on time
testdb=> explain analyze select avg(gl) from cdstest where time between 1407700790 and 1407711590;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1434716.75..1434716.76 rows=1 width=2) (actual time=20106.951..20106.952 rows=1 loops=1)
-> Bitmap Heap Scan on cdstest (cost=231261.49..1411280.42 rows=9374529 width=2) (actual time=811.495..10871.963 rows=9438824 loops=1)
Recheck Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
Rows Removed by Index Recheck: 204734
-> Bitmap Index Scan on timeindex (cost=0.00..228917.86 rows=9374529 width=0) (actual time=810.108..810.108 rows=9438824 loops=1)
Index Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
Total runtime: 20107.001 ms
(7 rows)
Rows Removed by Index Recheck: 204734 - What does this mean? This seems like a fairly arbitrary number.
Number of rows between the given time range:
testdb=> select count(*) from cdstest where time between 1407700790 and 1407711590;
count
---------
9438824
(1 row)
The table contains ~60million rows.
The inner Bitmap Index Scan node produces a bitmap with a 1 everywhere a record matching your search key is found and a 0 otherwise. As your table is quite big, the bitmap grows larger than the memory available for this kind of operation, which is configured via work_mem and becomes too small to hold the whole bitmap.
When memory runs short, the inner node starts producing 1s not for individual records, but for whole blocks that are known to contain matching records. This means that the outer node, the Bitmap Heap Scan, has to read all records from such blocks and re-check them. Obviously, there will be some non-matching ones, and their number is what you see as Rows Removed by Index Recheck.
In the upcoming 9.4 a new feature is added that reports how many exact and/or lossy pages were returned by the Bitmap Index Scan node; lossy pages are the ones you'd like to avoid. You can check more about this here.
Finally, consult your work_mem setting and try increasing it, just for this particular session.
I assume that increasing it by some 40% should be enough.
EDIT
I have 9.4beta3 running here, so I prepared a small show case:
DROP TABLE IF EXISTS tab;
SELECT id, id%10 mod
INTO tab
FROM generate_series(1,(1e7)::int) id;
CREATE INDEX i_tab_mod ON tab(mod);
VACUUM ANALYZE tab;
Now, I set work_mem to the minimal possible value and check it:
SET work_mem TO '64kB';
EXPLAIN (analyze, buffers)
SELECT * FROM tab WHERE mod=5;
EXPLAIN provides the following 2 rows:
Rows Removed by Index Recheck: 8896308
Heap Blocks: exact=510 lossy=43738
...
Execution time: 1356.938 ms
Which means that 64kB can hold 510 exact blocks. So I calculate the total memory requirement here:
new_mem_in_kB = ((work_mem_in_bytes / exact) * lossy) / 1024
              = ((64.0 * 1024 / 510) * 43738) / 1024
              = 5488.7kB
This is not a precise way to calculate the needed memory, in fact, but I think it is good enough for our needs. So I tried with SET work_mem TO '5MB':
Heap Blocks: exact=44248
...
Execution time: 283.466 ms

Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries

We have queries of the form
select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)
What index can be built to speed up the WHERE clause?
A btree index created using
create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))
does not seem to be used at all.
EDIT
With enable_seqscan set to off, the query using xpath_exists is much faster (by an order of magnitude) and the plan clearly shows the corresponding index being used (the btree index built on xpath_exists).
Any clue why PostgreSQL would not use the index and instead attempts a much slower sequential scan?
Since I do not want to disable sequential scanning globally, I am back to square one and happily welcome suggestions.
EDIT 2 - Explain plans
See below - Cost of first plan (seqscan off) is slightly higher but processing time much faster
b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
-> Aggregate (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
-> Bitmap Heap Scan on item (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
-> Bitmap Index Scan on item_counter_01 (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
Total runtime: 606.136 ms
(7 rows)
plan on explain.depesz.com
b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
-> Aggregate (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
-> Seq Scan on item (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
Rows Removed by Filter: 108945
Total runtime: 10864.242 ms
(6 rows)
plan on explain.depesz.com
Planner cost parameters
Cost of first plan (seqscan off) is slightly higher but processing time much faster
This tells me that your random_page_cost and seq_page_cost are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.
Try:
SET random_page_cost = 1;
SET seq_page_cost = 1.1;
to greatly reduce the cost param differences and then re-run. If that does the job, consider changing those params in postgresql.conf.
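If the per-session SET above does the job, one way to persist the change (assuming 9.4 or later; on older releases edit postgresql.conf directly) is:
ALTER SYSTEM SET random_page_cost = 1;
ALTER SYSTEM SET seq_page_cost = 1.1;
SELECT pg_reload_conf(); -- apply without a restart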
Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.
Incorrect query
Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.
You're usually much better off phrasing such queries as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.
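A hypothetical illustration of that rewrite (table and column names are made up):
-- instead of: SELECT ts FROM events WHERE device_id = 7 ORDER BY ts DESC OFFSET 0 LIMIT 1;
SELECT max(ts) FROM events WHERE device_id = 7;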
Tips
BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.