PostgreSQL - Rows Removed by Index Recheck

The table in question has a B-tree index on time
testdb=> explain analyze select avg(gl) from cdstest where time between 1407700790 and 1407711590;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1434716.75..1434716.76 rows=1 width=2) (actual time=20106.951..20106.952 rows=1 loops=1)
-> Bitmap Heap Scan on cdstest (cost=231261.49..1411280.42 rows=9374529 width=2) (actual time=811.495..10871.963 rows=9438824 loops=1)
Recheck Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
Rows Removed by Index Recheck: 204734
-> Bitmap Index Scan on timeindex (cost=0.00..228917.86 rows=9374529 width=0) (actual time=810.108..810.108 rows=9438824 loops=1)
Index Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
Total runtime: 20107.001 ms
(7 rows)
Rows Removed by Index Recheck: 204734 - What does this mean? This seems like a fairly arbitrary number.
Number of rows between the given time range:
testdb=> select count(*) from cdstest where time between 1407700790 and 1407711590;
count
---------
9438824
(1 row)
The table contains ~60million rows.

The inner Bitmap Index Scan node produces a bitmap, setting 1 everywhere a record matching your search key is found and 0 otherwise. Because your table is quite big, the bitmap grows large, and the memory available for this kind of operation, configured via work_mem, becomes too small to hold the whole bitmap.
When memory runs short, the inner node starts setting 1 not for individual records, but for whole blocks that are known to contain matching records. The outer Bitmap Heap Scan node then has to read every record from each such block and re-check the condition. Obviously some of them will not match, and their number is what you see as Rows Removed by Index Recheck.
The upcoming 9.4 release adds a feature that reports how many exact and/or lossy pages were returned by the Bitmap Index Scan node; lossy pages are the ones you want to avoid. You can read more about this here.
Finally, check your work_mem setting and try increasing it, just for this particular session.
I assume that increasing it by some 40% should be enough.
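For reference, a minimal sketch of checking the setting and raising it for the current session only (the 256MB value below is just a placeholder, not a recommendation):
SHOW work_mem;                    -- current value
SET work_mem TO '256MB';          -- session-local change, reverts when the session ends
SET LOCAL work_mem TO '256MB';    -- alternative: applies to the current transaction only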
EDIT
I have 9.4beta3 running here, so I prepared a small showcase:
DROP TABLE IF EXISTS tab;
SELECT id, id%10 mod
INTO tab
FROM generate_series(1,(1e7)::int) id;
CREATE INDEX i_tab_mod ON tab(mod);
VACUUM ANALYZE tab;
Now, I set work_mem to the minimal possible value and check it:
SET work_mem TO '64kB';
EXPLAIN (analyze, buffers)
SELECT * FROM tab WHERE mod=5;
EXPLAIN provides the following 2 rows:
Rows Removed by Index Recheck: 8896308
Heap Blocks: exact=510 lossy=43738
...
Execution time: 1356.938 ms
Which means that 64kB can hold 510 exact blocks. So I estimate the total memory requirement as follows:
new_mem_in_kB = ((work_mem_in_bytes / exact) * lossy) / 1024
              = ((64.0 * 1024 / 510) * 43738) / 1024
              = 5488.7kB
This is not a precise way to calculate the needed memory, but I think it is good enough for our needs. So I tried with SET work_mem TO '5MB':
Heap Blocks: exact=44248
...
Execution time: 283.466 ms

Related

Efficient full text search in PostgreSQL, sorting on another column

In PostgreSQL, how can one efficiently do a full text search on one column, sorting on another column?
Say I have a table tbl with columns a, b, c, ... and many (> a million) rows. I want to do a full text search on column a and sort the results by some other column.
So I create a tsvector va from column a,
ALTER TABLE tbl
ADD COLUMN va tsvector GENERATED ALWAYS AS (to_tsvector('english', a)) STORED;
create an index iva for that,
CREATE INDEX iva ON tbl USING GIN (va);
and an index ib for column b,
CREATE INDEX ib ON tbl (b);
Then I query like
SELECT * FROM tbl WHERE va @@ to_tsquery('english', 'test') ORDER BY b LIMIT 100
Now the obvious execution strategy for Postgres would be:
for frequent words to do an Index Scan using ib, filtering for va @@ 'test'::tsquery, and stopping after 100 matches,
while for rare words to do a (Bitmap) Index Scan using iva with condition
va @@ 'test'::tsquery, and then to sort on b manually
However, Postgres' query planner seems not to take word frequency into account:
With a low LIMIT (e.g. 100) it always uses strategy 1 (as I checked with EXPLAIN), and in my case takes over a minute for rare (or not occurring) words. However, if I trick it into using strategy 2 by setting a large (or no) LIMIT, it returns in a millisecond!
The other way round, with a larger LIMIT (e.g. 200) it always uses strategy 2, which works well for rare words but is very slow for frequent words.
So how do I get Postgres to use a good query plan in every case?
Since there currently seems to be no way to let Postgres choose the right plan automatically,
how do I get the number of rows containing a specific lexeme so I can decide on the best strategy?
(SELECT COUNT(*) FROM tbl WHERE va @@ to_tsquery('english', 'test') is horribly slow (~1 second for lexemes occurring in 10000 rows), and ts_stat also seems not to help, apart from building my own word frequency list, as sketched below)
how do I then tell Postgres to use this strategy?
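Here is roughly what such a word frequency list could look like, built once with ts_stat (a sketch only; the word_freq table is something I would have to create and refresh myself, and the initial scan is slow):
CREATE TABLE word_freq AS
SELECT word, ndoc
FROM ts_stat('SELECT va FROM tbl');
CREATE INDEX ON word_freq (word);
-- cheap lookup to decide on a strategy before running the real query
SELECT ndoc FROM word_freq WHERE word = 'test';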
Here is a concrete example.
I have a table items with 1.5 million rows, with a tsvector column v3 on which I do the text search, and a column rating on which I sort. In this case I determined that the query planner always chooses strategy 1 if the LIMIT is 135 or less, else strategy 2.
Here is the EXPLAIN ANALYZE for the rare word 'aberdeen' (occurring in 132 rows) with LIMIT 135:
EXPLAIN (ANALYZE, BUFFERS) SELECT nm FROM items WHERE v3 @@ to_tsquery('english', 'aberdeen')
ORDER BY rating DESC NULLS LAST LIMIT 135
Limit (cost=0.43..26412.78 rows=135 width=28) (actual time=5915.455..499917.390 rows=132 loops=1)
Buffers: shared hit=4444267 read=2219412
I/O Timings: read=485517.381
-> Index Scan using ir on items (cost=0.43..1429202.13 rows=7305 width=28) (actual time=5915.453..499917.242 rows=132 loops=1)
Filter: (v3 @@ '''aberdeen'''::tsquery)
Rows Removed by Filter: 1460845
Buffers: shared hit=4444267 read=2219412
I/O Timings: read=485517.381
Planning:
Buffers: shared hit=253
Planning Time: 1.270 ms
Execution Time: 499919.196 ms
and with LIMIT 136:
EXPLAIN (ANALYZE, BUFFERS) SELECT nm FROM items WHERE v3 @@ to_tsquery('english', 'aberdeen')
ORDER BY rating DESC NULLS LAST LIMIT 136
Limit (cost=26245.53..26245.87 rows=136 width=28) (actual time=29.870..29.889 rows=132 loops=1)
Buffers: shared hit=57 read=83
I/O Timings: read=29.085
-> Sort (cost=26245.53..26263.79 rows=7305 width=28) (actual time=29.868..29.876 rows=132 loops=1)
Sort Key: rating DESC NULLS LAST
Sort Method: quicksort Memory: 34kB
Buffers: shared hit=57 read=83
I/O Timings: read=29.085
-> Bitmap Heap Scan on items (cost=88.61..25950.14 rows=7305 width=28) (actual time=1.361..29.792 rows=132 loops=1)
Recheck Cond: (v3 @@ '''aberdeen'''::tsquery)
Heap Blocks: exact=132
Buffers: shared hit=54 read=83
I/O Timings: read=29.085
-> Bitmap Index Scan on iv3 (cost=0.00..86.79 rows=7305 width=0) (actual time=1.345..1.345 rows=132 loops=1)
Index Cond: (v3 @@ '''aberdeen'''::tsquery)
Buffers: shared hit=3 read=2
I/O Timings: read=1.299
Planning:
Buffers: shared hit=253
Planning Time: 1.296 ms
Execution Time: 29.932 ms
and here for the frequent word 'game' (occurring in 240464 rows) with LIMIT 135:
EXPLAIN (ANALYZE, BUFFERS) SELECT nm FROM items WHERE v3 @@ to_tsquery('english', 'game')
ORDER BY rating DESC NULLS LAST LIMIT 135
Limit (cost=0.43..26412.78 rows=135 width=28) (actual time=3.240..542.252 rows=135 loops=1)
Buffers: shared hit=2876 read=1930
I/O Timings: read=529.523
-> Index Scan using ir on items (cost=0.43..1429202.13 rows=7305 width=28) (actual time=3.239..542.216 rows=135 loops=1)
Filter: (v3 @@ '''game'''::tsquery)
Rows Removed by Filter: 867
Buffers: shared hit=2876 read=1930
I/O Timings: read=529.523
Planning:
Buffers: shared hit=208 read=45
I/O Timings: read=15.626
Planning Time: 25.174 ms
Execution Time: 542.306 ms
and with LIMIT 136:
EXPLAIN (ANALYZE, BUFFERS) SELECT nm FROM items WHERE v3 @@ to_tsquery('english', 'game')
ORDER BY rating DESC NULLS LAST LIMIT 136
Limit (cost=26245.53..26245.87 rows=136 width=28) (actual time=69419.656..69419.675 rows=136 loops=1)
Buffers: shared hit=1757820 read=457619
I/O Timings: read=65246.893
-> Sort (cost=26245.53..26263.79 rows=7305 width=28) (actual time=69419.654..69419.662 rows=136 loops=1)
Sort Key: rating DESC NULLS LAST
Sort Method: top-N heapsort Memory: 41kB
Buffers: shared hit=1757820 read=457619
I/O Timings: read=65246.893
-> Bitmap Heap Scan on items (cost=88.61..25950.14 rows=7305 width=28) (actual time=110.959..69326.343 rows=240464 loops=1)
Recheck Cond: (v3 @@ '''game'''::tsquery)
Rows Removed by Index Recheck: 394527
Heap Blocks: exact=49894 lossy=132284
Buffers: shared hit=1757817 read=457619
I/O Timings: read=65246.893
-> Bitmap Index Scan on iv3 (cost=0.00..86.79 rows=7305 width=0) (actual time=100.537..100.538 rows=240464 loops=1)
Index Cond: (v3 @@ '''game'''::tsquery)
Buffers: shared hit=1 read=60
I/O Timings: read=26.870
Planning:
Buffers: shared hit=253
Planning Time: 1.195 ms
Execution Time: 69420.399 ms
This is not easy to solve: full text search requires a GIN index, but a GIN index cannot support ORDER BY. Also, if you have a B-tree index for ORDER BY and a GIN index for the full text search, these can be combined using a bitmap index scan, but a bitmap index scan cannot support ORDER BY either.
I see a certain possibility if you create your own “stop word” list that contains all the frequent words in your data (in addition to the normal English stop words). Then you can define a text search dictionary that uses that stop word file and a text search configuration english_rare using that dictionary.
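For illustration, a sketch of what that could look like, assuming a stop word file english_rare.stop has been placed in the server's tsearch_data directory (the dictionary and configuration names are made up):
CREATE TEXT SEARCH DICTIONARY english_rare_stem (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english_rare   -- reads $SHAREDIR/tsearch_data/english_rare.stop
);
CREATE TEXT SEARCH CONFIGURATION english_rare (COPY = english);
ALTER TEXT SEARCH CONFIGURATION english_rare
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH english_rare_stem;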
Then you could create your full text index using that configuration and query in two steps like this:
look for rare words:
SELECT *
FROM (SELECT *
FROM tbl
WHERE va @@ to_tsquery('english_rare', 'test')
OFFSET 0) AS q
ORDER BY b LIMIT 100;
The subquery with OFFSET 0 will keep the optimizer from scanning the index on b.
For rare words, this will return the correct result quickly. For frequent words, this will return nothing, since to_tsquery will return an empty result. To distinguish between a miss because the word does not occur and a miss because the word is frequent, watch for the following notice:
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
look for frequent words (if the first query gave you the notice):
SELECT *
FROM (SELECT *
FROM tbl
ORDER BY b) AS q
WHERE va @@ to_tsquery('english', 'test')
LIMIT 100;
Note that we use the normal English configuration here. This will always scan the index on b and will be reasonably fast for frequent search terms.
Solution for my scenario, which I think will work well for many real-world cases:
Let Postgres use the "rare-word strategy" (strategy 2 in the question) always or mostly. The reason is that there should always be the possibility for a user to sort by relevance (e.g. using ts_rank), in which case the other strategy cannot be used, so one has to make sure that the "rare-word strategy" is fast enough for all searches anyway.
To force Postgres to use this strategy one can use a subquery, as Laurenz Albe has pointed out:
SELECT * FROM
(SELECT * FROM tbl WHERE va @@ to_tsquery('english', 'test') OFFSET 0) AS q
ORDER BY b LIMIT 100;
Alternatively one can simply set LIMIT sufficiently high (while only fetching as many results as needed).
I could achieve sufficient performance (nearly all queries take < 1 second) by
doing each search first against a smaller tsvector containing the most relevant parts of each document (e.g. title and summary), and checking the full document only if this first query does not yield enough results.
treating very frequently occurring words specially, e.g. only allowing them in an AND-combination with other words (adding them to the stop words is problematic, since stop words are not handled sensibly when they occur in phrases, for example)
increasing RAM and increasing shared_buffers so the whole table can be cached (8.5 GB for me currently)
For cases where these optimizations are not enough, to achieve better performance for all queries (i.e. also those sorting by relevance, which are the hardest), I think one would have to use a more sophisticated text search index instead of GIN. There is the RUM index extension which looks promising, but I haven't tried it yet.
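For reference, a sketch of what that might look like, based on the RUM extension's documentation (I have not tried this; the rum_tsvector_ops operator class and the <=> distance operator come from the extension, not from core Postgres):
CREATE EXTENSION rum;
CREATE INDEX iva_rum ON tbl USING rum (va rum_tsvector_ops);
-- a RUM index can then return results ordered by rank directly:
SELECT * FROM tbl
WHERE va @@ to_tsquery('english', 'test')
ORDER BY va <=> to_tsquery('english', 'test')
LIMIT 100;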
PS: Contrary to my observation in the question I have now found that under certain circumstances the planner does take word frequency into account and makes decisions in the right direction:
For rare words the borderline LIMIT above which it chooses the "rare-word strategy" is lower than for frequent words, and in a certain range this choice seems very good. However, this is in no way reliable, and sometimes the choice is very wrong; e.g. for low LIMITs it chooses the "frequent-word strategy" even for very rare or non-occurring words, which leads to awful slowness.
It appears to depend on many factors and seems not predictable.

Optimizing recent event search and cache usage using `random_page_cost` on postgres

I have a table where I store information about a customer together with the timestamp and time range of an event.
The indices I use look as follows:
event_index (customer_id, time)
state_index (customer_id, end, start desc)
The vast majority of queries query the last few days about state and events.
This is a sample query text (events have an identical problem to the one I'll describe for states):
SELECT "states".*
FROM "states"
WHERE ("states"."customer_id" = $1 AND "states"."start" < $2)
AND ("states"."end" IS NULL OR "states"."end" > $3)
AND ("states"."obsolete" = $4)
ORDER BY "states"."start" DESC
I see that sometimes the query planner uses only the customer_id to filter, and then filters all of the customer's rows with a heap scan:
Sort (cost=103089.00..103096.17 rows=2869 width=78)
Sort Key: start DESC
-> Bitmap Heap Scan on states (cost=1222.56..102924.23 rows=2869 width=78)
Recheck Cond: (customer_id = '----'::bpchar)
Filter: ((NOT obsolete) AND ((start)::double precision < '1557711009'::double precision) AND ((end IS NULL) OR ((end)::double precision > '1557666000'::double precision)))
-> Bitmap Index Scan on states_index (cost=0.00..1221.85 rows=26820 width=0)
Index Cond: (customer_id = '----'::bpchar)
This is in contrast to what I see in a session manually:
Sort Key: start DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on states (cost=111.12..9338.04 rows=1 width=78) (actual time=141.674..141.674 rows=0 loops=1)
Recheck Cond: (((customer_id = '-----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric)) OR ((customer_id = '----'::bpchar) AND (end > '1557249200'::numeric) AND (start < '1557349200'::numeric)))
Filter: ((NOT obsolete) AND ((title)::text = '---'::text))
Rows Removed by Filter: 112
Heap Blocks: exact=101
-> BitmapOr (cost=111.12..111.12 rows=2333 width=0) (actual time=4.198..4.198 rows=0 loops=1)
-> Bitmap Index Scan on states_index (cost=0.00..4.57 rows=1 width=0) (actual time=0.086..0.086 rows=0 loops=1)
Index Cond: ((customer_id = '----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric))
-> Bitmap Index Scan on state_index (cost=0.00..106.55 rows=2332 width=0) (actual time=4.109..4.109 rows=112 loops=1)
Index Cond: ((customer_id = '---'::bpchar) AND (end > '1557262800'::numeric) AND (start < '1557349200'::numeric))
In other words - the query planner sometimes chooses to use only the first column of the index which slows the query significantly.
I can see why it makes sense to just bring in the entire customer's data when it's small enough and filter in memory, but the problem is that this data is very sparse and is probably not entirely cached (data from a year ago is probably not cached for the customer; the database is a few hundred GBs). If the index scan used the timestamps to their fullest extent (as in the second example), the result should be much faster, since recent data is cached.
I used a partial index on the last week to see if the query time drops but postgres only uses it sometimes. This solves the problem when the partial index is used since old rows do not exist in that index - but sadly postgres still selects the bigger index even when it doesn't have to. I ran vacuum analyze but to no visible effect.
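For context, a sketch of what such a partial index could look like (column names from the query above; the cutoff timestamp is an arbitrary example, and the index would have to be recreated periodically to keep covering "the last week"):
CREATE INDEX states_recent_idx
    ON states (customer_id, "end", start DESC)
    WHERE start > 1557000000;   -- example cutoff, roughly one week before the queried range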
I tried to see the cache hits using this:
Database Name | Temporary files | Size of temporary files | Block Hits | Block Reads
------------------+-----------------+-------------------------+---------------+-------------
customers | 1922 | 18784440622 | 69553504584 | 2401546773
And then I calculated (block_hits/(block_hits + block_reads)):
>>> 69553504584.0 / (69553504584.0 + 2401546773.0)
0.9666243477322406
So this shows me ~96.6% cache (I want it much closer to 100 because I know the nature of the queries)
I also tried increasing the statistics target (SET STATISTICS) on customer_id, start and end, since that seemed to be a common suggestion for people facing query planner issues. It didn't help either (and I ran ANALYZE afterwards...).
After reading further about this issue I saw that there is a way to make the query planner prefer index scans by using a lower random_page_cost than the default (4). I also saw a post backing that here:
https://amplitude.engineering/how-a-single-postgresql-config-change-improved-slow-query-performance-by-50x-85593b8991b0
Does this make sense for my use case? Will it make the query planner use the index to the fullest more often (preferably always)?
If not - is there something else I can do to lower the query time? I know partitioning can be very effective but seems to be an overkill and is not fully supported on my current postgres version (9.5.9) as far as I can tell from what I've read.
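For reference, this is how such a change could be tested, first per session and then persisted (1.1 is the value commonly suggested for SSD-backed or well-cached storage, not something measured here; the database name is taken from the stats output above):
SET random_page_cost = 1.1;                           -- session-only, for testing a plan
ALTER DATABASE customers SET random_page_cost = 1.1;  -- persist for new connections if it helps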
Update: After lowering random_page_cost I don't see a conclusive difference. There are still times when the query planner chooses to use only part of the index, with a much slower result.
Any suggestions are very welcome.
Thanks :)

PostgreSQL graph neighbors query slow

EDIT In my original question I noticed a difference between searching for neighbors using a JOIN and using a WHERE .. IN clause, which @LukaszSzozda rightfully pointed out is a semi-join. It turns out my node list had duplicates, which explains why the JOIN took longer to run. Thanks, @LukaszSzozda. The more important aspect of my question remains, though, and is presented below. UPDATE I added the relevant configuration options to the bottom, and updated statistics using ANALYZE (thanks to @joop). Also, I tested with three different indices (B-tree, hash, BRIN). Finally, I noticed that different queries returned a different number of rows into tmp_nodes, possibly because of different ordering, so I fixed it to a constant set of rather-random 8,000 nodes.
In PostgreSQL, my query to search for neighbors of 8,000 nodes among ~200 million nodes (within ~1.3 billion edges) is slow (~30 seconds using a hash index; see index benchmarking below).
Given the setup I describe below, are there further optimizations for my server software, database, table or query to make the neighbor search faster? I am particularly surprised at this speed considering how well PostgreSQL did on the ArangoDB NoSQL benchmark.
More specifically:
I am aware of AgensGraph, but do not wish yet to move to a graph-database solution, specifically since I cannot tell how well AgensGraph keeps up to date with PostgreSQL. Can someone explain the performance benefits with regard to how the query actually happens in AgensGraph vs PostgreSQL, so that I can decide whether to migrate?
Are there any configuration tweaks, whether in the server or the OS, that affect my query according to the plan, which cause it to run for longer than needed?
Set up
I have a large graph database (~1 billion edges, ~200 million nodes) in PostgreSQL (PostgreSQL 10.1, which I had to pull from the zesty PPA) stored in the cloud (DigitalOcean, 6-core, 16GB RAM machine, Ubuntu 17.10, Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz), and set up with parameters suggested by PGTune (see bottom). I am querying on-server.
I have created forward- and backward-edges tables (see this question)
CREATE TABLE edges_fwd (src BIGINT, dest BIGINT, PRIMARY KEY (src, dest));
CREATE TABLE edges_back (src BIGINT, dest BIGINT, PRIMARY KEY (dest, src));
and clustered both by the respective keys (just in case):
CLUSTER edges_fwd USING edges_fwd_pkey;
CLUSTER edges_back USING edges_back_pkey;
I turned off enable_seqscan for the purpose of testing my queries (see the side note below).
I would like to load all the out-edges for 8,000 nodes (which 8,000 nodes these are can change depending on the user query), whose identifiers are listed in a table tmp_nodes (with a single column, nid). I initially wrote this version of the query (patting myself on the back for already following the lines of the graph talk from PGCon11):
SELECT e.*
FROM tmp_nodes
JOIN edges_fwd AS e
ON e.src = tmp_nodes.nid;
I also tried:
SELECT * FROM edges_fwd AS e
WHERE e.src IN (SELECT nid FROM tmp_nodes);
They are both slow, and take about 30 seconds to run at best (using hash indices). The EXPLAIN ANALYZE outputs are shown below.
I expected things to generally run much faster. When looking up 8,000 keys in a clustered table (yes, I know it's not really a clustered index), since the server knows that the rows are ordered, I should expect fewer page reads than the total number of rows returned. So while 243,708 rows are fetched, which isn't a small number, they are associated with 8,000 distinct keys, and the number of reads should not be much larger than that: it's an average of 30 rows per key, which is about 1,400 bytes per read (the table is 56GB and has 1.3B rows, so it's about 46 bytes per row, which by the way is quite a bloat for 16 bytes of data). This is far below the page size (4K) for the system. I didn't think reading 8,000 pages, even with random access, should take this long.
This brings me back to my questions (above).
Forcing index usage
I took advice from answers to another question and at least for testing (though, since my database is read-only, I might be tempted to use it in production), set enable_seqscan to off, in order to force index usage.
I ran each query 5 times; the times varied by a few seconds here and there.
EXPLAIN ANALYZE outputs
Taking care to flush the OS disk cache and restart the server to reflect correct random-seek timings, I used EXPLAIN ANALYZE on both queries. I used two types of indexes, B-tree and hash. I also tried BRIN with different values for the pages_per_range option (2, 8, 32 and 128), but they were all slower (by orders of magnitude) than those mentioned above. I am giving the results below for reference.
B-Tree index, JOIN query:
Nested Loop (cost=10000000000.58..10025160709.50 rows=15783833 width=16) (actual time=4.546..39152.408 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.712..15.721 rows=8000 loops=1)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.565..4.879 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 20.962 ms
Execution time: 39175.454 ms
B-Tree index, WHERE .. IN query (semi-join):
Nested Loop (cost=10000000136.58..10025160809.50 rows=15783833 width=16) (actual time=9.578..42605.783 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=5.903..35.750 rows=8000 loops=1)
Group Key: tmp_nodes.nid
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.722..2.695 rows=8000 loops=1
)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.924..5.309 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 19.126 ms
Execution time: 42629.084 ms
Hash index, JOIN query:
Nested Loop (cost=10000000051.08..10056052287.01 rows=15783833 width=16) (actual time=3.710..34131.371 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.960..13.338 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=4.086..4.250 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.563..2.563 rows=31
loops=8000)
Execution time: 34155.511 ms
Hash index, WHERE .. IN query (semi-join):
Nested Loop (cost=10000000187.08..10056052387.01 rows=15783833 width=16) (actual time=12.766..31834.767 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=6.297..30.760 rows=8000 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.883..3.108 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=3.768..3.958 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.340..2.340 rows=31
loops=8000)
Execution time: 31857.692 ms
postgresql.conf settings
I set the following configuration options as suggested by PGTune:
max_connections = 10
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 69905kB
min_wal_size = 4GB
max_wal_size = 8GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3
max_parallel_workers = 6
You also need indexes to search in the reversed direction:
CREATE TABLE edges_fwd
(src BIGINT
, dest BIGINT
, PRIMARY KEY (src, dest)
);
CREATE UNIQUE INDEX ON edges_fwd(dest, src);
CREATE TABLE edges_back
(src BIGINT
, dest BIGINT
, PRIMARY KEY (dest, src)
);
CREATE UNIQUE INDEX ON edges_back(src, dest);
SELECT fwd.*
FROM edges_back AS bck
JOIN edges_fwd AS fwd
ON fwd.src = bck.src -- bck.src does not have a usable index
WHERE bck.dest = root_id;
The absence of this index causes the hash join (or: table scan).
Also, you could maybe combine the two tables.
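A minimal sketch of what combining them could look like (one table, indexed in both directions, replacing edges_fwd/edges_back):
CREATE TABLE edges
( src BIGINT
, dest BIGINT
, PRIMARY KEY (src, dest)
);
CREATE UNIQUE INDEX ON edges (dest, src);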
Also, you can force the src and dest columns to be NOT NULL
(a null would make no sense in an edges table)
, and make them FOREIGN KEYs to your nodes table:
CREATE TABLE nodes
(nid BIGINT NOT NULL PRIMARY KEY
-- ... more stuff...
);
CREATE TABLE edges_fwd
(src BIGINT NOT NULL REFERENCES nodes(nid)
, dest BIGINT NOT NULL REFERENCES nodes(nid)
, PRIMARY KEY (src, dest)
);
CREATE TABLE edges_back
(src BIGINT NOT NULL REFERENCES nodes(nid)
, dest BIGINT NOT NULL REFERENCES nodes(nid)
, PRIMARY KEY (dest, src)
);
INSERT INTO nodes(nid)
SELECT a
FROM generate_series(1,1000) a -- 1000 rows
;
INSERT INTO edges_fwd(src, dest)
SELECT a.nid, b.nid
FROM nodes a
JOIN nodes b ON random()< 0.1 --100K rows
;
INSERT INTO edges_back(src, dest)
SELECT a.nid, b.nid
FROM nodes a
JOIN nodes b ON random()< 0.1 --100K rows
;
This will result in this plan:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1000
INSERT 0 99298
INSERT 0 99671
ANALYZE
ANALYZE
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.50..677.62 rows=9620 width=16) (actual time=0.086..5.299 rows=9630 loops=1)
-> Index Only Scan using edges_back_pkey on edges_back bck (cost=0.25..100.07 rows=97 width=8) (actual time=0.053..0.194 rows=96 loops=1)
Index Cond: (dest = 11)
Heap Fetches: 96
-> Index Only Scan using edges_fwd_pkey on edges_fwd fwd (cost=0.25..5.46 rows=99 width=16) (actual time=0.008..0.037 rows=100 loops=96)
Index Cond: (src = bck.src)
Heap Fetches: 9630
Planning time: 0.480 ms
Execution time: 5.836 ms
(9 rows)
It seems that random access for this kind of setup is just this slow. Running a script to check random access of 8,000 different, random 4K blocks within a large file takes nearly 30 seconds. Using Linux time and the linked script, I get on average something like 24 seconds:
File size: 8586524825 Read size: 4096
32768000 bytes read
real 0m24.076s
So it seems the assumption that random access should be quicker is wrong. Together with the time taken to read the actual index, it means performance is at its peak without a hardware change. To improve performance, I will likely need to use a RAID set-up or a cluster. If a RAID set-up improves performance in a close-to-linear fashion, I will accept my own answer.

Why is Postgres not using index on a simple GROUP BY?

I have created a 36M-row table with an index on the type column:
CREATE TABLE items AS
SELECT
(random()*36000000)::integer AS id,
(random()*10000)::integer AS type,
md5(random()::text) AS s
FROM
generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");
I run this simple query and expect postgresql to use my index:
explain select count(*) from "items" group by "type";
But the query planner decides to use Seq Scan instead:
HashAggregate (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
Group Key: type
-> Seq Scan on items (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms
Time without EXPLAIN: 5s 979ms
I have tried several solutions from here and here:
Run VACUUM ANALYZE
Configure default_statistics_target, random_page_cost, work_mem
but nothing helps apart from setting enable_seqscan = OFF:
SET enable_seqscan = OFF;
explain select count(*) from "items" group by "type";
GroupAggregate (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
Group Key: type
-> Index Only Scan using items_type_idx on items (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms
Time without EXPLAIN: 659ms
Query with index scan is about 10x faster on my machine.
Is there a better solution than setting enable_seqscan?
UPD1
My postgresql version is 9.6.3, work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).
UPD2
I have tried to fill the type column not with random numbers, but with i / 10000 to make pg_stats.correlation = 1; still a seq scan.
UPD3
@jgh is 100% right:
This typically only happens when the table's row width is much wider than some indexes
I've made the column data larger and now Postgres uses the index. Thanks everyone!
The Index-only scans wiki says
It is important to realise that the planner is concerned with
minimising the total cost of the query. With databases, the cost of
I/O typically dominates. For that reason, "count(*) without any
predicate" queries will only use an index-only scan if the index is
significantly smaller than its table. This typically only happens when
the table's row width is much wider than some indexes'.
and
Index-only scans are only used when the planner surmises that that
will reduce the total amount of I/O required, according to its
imperfect cost-based modelling. This all heavily depends on visibility
of tuples, if an index would be used anyway (i.e. how selective a
predicate is, etc), and if there is actually an index available that
could be used by an index-only scan in principle
Accordingly, your index is not considered "significantly smaller" and the entire dataset is to be read, which leads the planner to use a seq scan.
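One quick way to verify this is to compare the sizes directly (a sketch using the built-in size functions):
SELECT pg_size_pretty(pg_relation_size('items'))          AS table_size,
       pg_size_pretty(pg_relation_size('items_type_idx')) AS index_size;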

How can I force postgres 9.4 to hit a gin full text index a little more predictably? See my query plan bug

POSTGRES 9.4 has been generating a pretty poor query plan for a full text query with LIMIT 10 at the end:
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 10
this generates:
"Limit (cost=0.00..306.48 rows=10 width=702) (actual time=5470.323..7215.486 rows=3 loops=1)"
" -> Seq Scan on tbl (cost=0.00..24610.69 rows=803 width=702) (actual time=5470.321..7215.483 rows=3 loops=1)"
" Filter: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
" Rows Removed by Filter: 609661"
"Planning time: 0.436 ms"
"Execution time: 7215.573 ms"
using an index defined by:
CREATE INDEX fulltext_idx
ON Tbl
USING gin
(to_tsvector('english'::regconfig, ginIndexedColumn));
and it takes 5 or 6 seconds to execute. Even LIMIT 12 is slow.
However, the same query with LIMIT 13 (the lowest limit that hits the index)
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 13
hits the index just fine and takes a few thousandths of a second. See output below:
"Limit (cost=350.23..392.05 rows=13 width=702) (actual time=2.058..2.062 rows=3 loops=1)"
" -> Bitmap Heap Scan on tbl (cost=350.23..2933.68 rows=803 width=702) (actual time=2.057..2.060 rows=3 loops=1)"
" Recheck Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on fulltext_idx (cost=0.00..350.03 rows=803 width=0) (actual time=2.047..2.047 rows=3 loops=1)"
" Index Cond: (to_tsvector('english'::regconfig, ginIndexedColumn) ## to_tsquery('rareword'::text))"
"Planning time: 0.324 ms"
"Execution time: 2.145 ms"
The reason the query plan is poor is that the word is rare and there are only 2 or 3 records in the whole 610K-record table that satisfy the query, meaning the sequential scan the query optimizer picks will have to scan the whole table before the limit is ever filled. The sequential scan would obviously be quite fast if the word were common, because the limit would be filled in no time.
Obviously, this little bug is no big deal. I'll simply use LIMIT 13 instead of 10. What's three more items? But it took me so long to realize the LIMIT clause might affect whether it hits the index. I'm worried that there might be other little surprises in store with other SQL functions that prevent the index from being hit. What I'm looking for is assistance in tweaking Postgres to hit the GIN index all the time instead of only sometimes for this particular table.
I'm quite willing to forgo possibly cheaper queries if I could be satisfied that the index is always being hit. It's incredibly fast. I don't care to save any more microseconds.
Well, it's obviously an incorrect selectivity estimation. The planner thinks that the to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword') predicate will match 803 rows, but actually there are only 3.
To tweak PostgreSQL to use the index you can:
Rewrite the query, for example using a CTE, to postpone the application of LIMIT:
WITH t as (
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
)
SELECT * FROM t
LIMIT 10
Of course, this makes LIMIT absolutely inefficient. (But in the case of a GIN index it's not as efficient as it could be anyway, because GIN cannot fetch results tuple-by-tuple; instead it returns all the TIDs at once as a bitmap. See also gin_fuzzy_search_limit.)
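For illustration, gin_fuzzy_search_limit is a per-session setting that caps the number of TIDs a GIN scan returns (a sketch with an arbitrary value; note that it makes results approximate and does not by itself change the chosen plan):
SET gin_fuzzy_search_limit = 1000;   -- default 0 means no limit
SELECT * FROM Tbl
WHERE to_tsvector('english'::regconfig, ginIndexedColumn) @@ to_tsquery('rareword')
LIMIT 10;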
Set enable_seqscan=off or increase seq_page_cost to discourage the planner from using sequential scans (doc).
This can however be undesirable if your query should use seq scans on other tables.
Use pg_hint_plan extension.
Increasing the cost-estimate of the to_tsvector function as described here will probably solve the problem. This cost will automatically be increased in the next release (9.5) so adopting that change early should be considered a rather safe tweak to make.
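A sketch of that tweak (ALTER FUNCTION ... COST is the mechanism; the value 100 is just an example chosen to be large enough to tip the planner, not a measured figure):
ALTER FUNCTION to_tsvector(regconfig, text) COST 100;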