Postgtres query is not using index for column which is indexed - postgresql

Today I faced a strange issue. I have a big table in postgres (table T) which has many columns (col1..col100) and I have index I1 on (col2,col3,col4).
Now
explain select col2,col3 from T;
shows seq scan for the table, not using the index. I think it should be like Index only scan as we are selecting columns which are in index.
What could be the reason ?

If the SELECT returns more than approximately 5-10%(depends on configuration settings and the storage of the data as well. It's not a hard number) of all rows in the table, a sequential scan is much faster than an index scan.
Index scan requires several IO operations for each row (look up the row in the index, then retrieve the row from the heap). Sequential scan only requires a single IO for each row - or even less because a block (page) on the disk contains more than one row, so more than one row can be fetched with a single IO operation.
This is true for other DBMS as well - some optimizations as "index only scans" taken aside (but for a SELECT * it's highly unlikely such a DBMS would go for an "index only scan")

First of all read this and analyze. I think this may be helpful:
https://thoughtbot.com/blog/why-postgres-wont-always-use-an-index

Related

Postgres query performance in write-heavy database

I have a Postgres 9.6 database hosted on Heroku (standard-4 with 15GB cache).
One of my tables has almost 100M rows and that table gets maybe 3k inserts/updates per minute on average, but that can spike to 10x.
I have a few queries that run against that table that are basically counting rows with different columns having various values.
I also have a copy of the database that is currently getting no updates (basically idle).
Queries against the live database are much slower than queries on the copy and I don't know if this is due to the load/contention from the updates or poor query optimization on my part.
The values that the queries are counting are very common values.
select count(*)
from my_table
where a = '123'
and b = 'abc'
There are a lot of rows with the combination of 123 and abc. I have created a partial index for that:
create index my_table_abc_123_index
on my_table (a, b)
where a = '123' and b = 'abc'
This seems to get me an index only scan that's fast on the copy (~0.5s) but much slower on the live DB (~26s). The costs also vary quite a lot (copy first, live second):
Aggregate (cost=29751.39..29751.39 rows=1 width=8)
Aggregate (cost=179603.59..179603.59 rows=1 width=8)
Those both return a count of about 1.4M and the tables are about the same size.
It also seems that when the writes go up (10x) the degradation is significant.
It's also interesting that Postgres doesn't always choose my partial indexes, opting for a full composite index on (a, b), depending on the values passed for those columns.
I've read the Postgres indexing documentation and realize that for common values it's not always best to use the index (though Postgres is choosing an index in all these cases).
I'm wondering if this is the sort of difference I should be expecting between a live and idle database, and if I'm approaching creating indexes correctly for my workload (which is definitely write-heavy) or in general.
Some other details:
individual (full) column indexes seem much more expensive, but I've not tried individual column partial indexes.
the autovacuum threshold has been set pretty low (0.01) to deal with the dead rows due to lots of updates

How does postgres decide whether to use index scan or seq scan?

explain analyze shows that postgres will use index scanning for my query that fetches rows and performs filtering by date (i.e., 2017-04-14 05:27:51.039):
explain analyze select * from tbl t where updated > '2017-04-14 05:27:51.039';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using updated on tbl t (cost=0.43..7317.12 rows=10418 width=93) (actual time=0.011..0.515 rows=1179 loops=1)
Index Cond: (updated > '2017-04-14 05:27:51.039'::timestamp without time zone)
Planning time: 0.102 ms
Execution time: 0.720 ms
however running the same query but with different date filter '2016-04-14 05:27:51.039' shows that postgres will run the query using seq scan instead:
explain analyze select * from tbl t where updated > '2016-04-14 05:27:51.039';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl t (cost=0.00..176103.94 rows=5936959 width=93) (actual time=0.008..2005.455 rows=5871963 loops=1)
Filter: (updated > '2016-04-14 05:27:51.039'::timestamp without time zone)
Rows Removed by Filter: 947
Planning time: 0.100 ms
Execution time: 2910.086 ms
How does postgres decide on what to use, specifically when performing filtering by date?
The Postgres query planner bases its decisions on cost estimates and column statistics, which are gathered by ANALYZE and opportunistically by some other utility commands. That all happens automatically when autovacuum is on (by default).
The manual:
Most queries retrieve only a fraction of the rows in a table, due to
WHERE clauses that restrict the rows to be examined. The planner thus
needs to make an estimate of the selectivity of WHERE clauses, that
is, the fraction of rows that match each condition in the WHERE
clause. The information used for this task is stored in the
pg_statistic system catalog. Entries in pg_statistic are updated by
the ANALYZE and VACUUM ANALYZE commands, and are always approximate
even when freshly updated.
There is a row count (in pg_class), a list of most common values, etc.
The more rows Postgres expects to find, the more likely it will switch to a sequential scan, which is cheaper to retrieve large portions of a table.
Generally, it's index scan -> bitmap index scan -> sequential scan, the more rows are expected to be retrieved.
For your particular example, the important statistic is histogram_bounds, which give Postgres a rough idea how many rows have a greater value than the given one. There is the more convenient view pg_stats for the human eye:
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = 'tbl'
AND attname = 'updated';
There is a dedicated chapter explaining row estimation in the manual.
Obviously, optimization of queries is tricky. This answer is not intended to dive into the specifics of the Postgres optimizer. Instead, it is intended to give you some background on how the decision to use an index is made.
Your first query is estimated to return 10,418 rows. When using an index, the following operations happen:
The engine uses the index to find the first value meeting the condition.
The engine then loops over the values, finishing when the condition is no longer true.
For each value in the index, the engine then looks up the data on the data page.
In other words, there is a little bit of overhead when using the index -- initializing the index and then looking up each data page individually.
When the engine does a full table scan it:
Starts with the first record on the first page
Does the comparison and accepts or rejects the record
Continues sequentially through all data pages
There is no additional overhead. Further, the engine can "pre-load" the next pages to be scanned while processing the current page. This overlap of I/O and processing is a big win.
The point I'm trying to make is that getting the balance between these two can be tricky. Somewhere between 10,418 and 5,936,959, Postgres decides that the index overhead (and fetching the pages randomly) costs more than just scanning the whole table.

Select query with offset limit is too much slow

I have read from internet resources that a query will be slow when the offset increases. But in my case I think its too much slow. I am using postgres 9.3
Here is the query (id is primary key):
select * from test_table offset 3900000 limit 100;
It returns me data in around 10 seconds. And I think its too much slow. I have around 4 million records in table. Overall size of the database is 23GB.
Machine configuration:
RAM: 12 GB
CPU: 2.30 GHz
Core: 10
Few values from postgresql.conf file which I have changed are as below. Others are default.
shared_buffers = 2048MB
temp_buffers = 512MB
work_mem = 1024MB
maintenance_work_mem = 256MB
dynamic_shared_memory_type = posix
default_statistics_target = 10000
autovacuum = on
enable_seqscan = off ## its not making any effect as I can see from Analyze doing seq-scan
Apart from these I have also tried by changing the values of random_page_cost = 2.0 and cpu_index_tuple_cost = 0.0005 and result is same.
Explain (analyze, buffers) result over the query is as below:
"Limit (cost=10000443876.02..10000443887.40 rows=100 width=1034) (actual time=12793.975..12794.292 rows=100 loops=1)"
" Buffers: shared hit=26820 read=378984"
" -> Seq Scan on test_table (cost=10000000000.00..10000467477.70 rows=4107370 width=1034) (actual time=0.008..9036.776 rows=3900100 loops=1)"
" Buffers: shared hit=26820 read=378984"
"Planning time: 0.136 ms"
"Execution time: 12794.461 ms"
How people around the world negotiates with this problem in postgres? Any alternate solution will be helpful for me as well.
UPDATE:: Adding order by id (tried with other indexed column as well) and here is the explain:
"Limit (cost=506165.06..506178.04 rows=100 width=1034) (actual time=15691.132..15691.494 rows=100 loops=1)"
" Buffers: shared hit=110813 read=415344"
" -> Index Scan using test_table_pkey on test_table (cost=0.43..533078.74 rows=4107370 width=1034) (actual time=38.264..11535.005 rows=3900100 loops=1)"
" Buffers: shared hit=110813 read=415344"
"Planning time: 0.219 ms"
"Execution time: 15691.660 ms"
It's slow because it needs to locate the top offset rows and scan the next 100. No amounts of optimization will change that when you're dealing with huge offsets.
This is because your query literally instruct the DB engine to visit lots of rows by using offset 3900000 -- that's 3.9M rows. Options to speed this up somewhat aren't many.
Super-fast RAM, SSDs, etc. will help. But you'll only gain by a constant factor in doing so, meaning it's merely kicking the can down the road until you reach a larger enough offset.
Ensuring the table fits in memory, with plenty more to spare will likewise help by a larger constant factor -- except the first time. But this may not be possible with a large enough table or index.
Ensuring you're doing index-only scans will work to an extent. (See velis' answer; it has a lot of merit.) The problem here is that, for all practical purposes, you can think of an index as a table storing a disk location and the indexed fields. (It's more optimized than that, but it's a reasonable first approximation.) With enough rows, you'll still be running into problems with a larger enough offset.
Trying to store and maintain the precise position of the rows is bound to be an expensive approach too.(This is suggested by e.g. benjist.) While technically feasible, it suffers from limitations similar to those that stem from using MPTT with a tree structure: you'll gain significantly on reads but will end up with excessive write times when a node is inserted, updated or removed in such a way that large chunks of the data needs to be updated alongside.
As is hopefully more clear, there isn't any real magic bullet when you're dealing with offsets this large. It's often better to look at alternative approaches.
If you're paginating based on the ID (or a date field, or any other indexable set of fields), a potential trick (used by blogspot, for instance) would be to make your query start at an arbitrary point in the index.
Put another way, instead of:
example.com?page_number=[huge]
Do something like:
example.com?page_following=[huge]
That way, you keep a trace of where you are in your index, and the query becomes very fast because it can head straight to the correct starting point without plowing through a gazillion rows:
select * from foo where ID > [huge] order by ID limit 100
Naturally, you lose the ability to jump to e.g. page 3000. But give this some honest thought: when was the last time you jumped to a huge page number on a site instead of going straight for its monthly archives or using its search box?
If you're paginating but want to keep the page offset by any means, yet another approach is to forbid the use of larger page number. It's not silly: it's what Google is doing with search results. When running a search query, Google gives you an estimate number of results (you can get a reasonable number using explain), and then will allow you to brows the top few thousand results -- nothing more. Among other things, they do so for performance reasons -- precisely the one you're running into.
I have upvoted Denis's answer, but will add a suggestion myself, perhaps it can be of some performance benefit for your specific use-case:
Assuming your actual table is not test_table, but some huge compound query, possibly with multiple joins. You could first determine the required starting id:
select id from test_table order by id offset 3900000 limit 1
This should be much faster than original query as it only requires to scan the index vs the entire table. Getting this id then opens up a fast index-search option for full fetch:
select * from test_table where id >= (what I got from previous query) order by id limit 100
You didn't say if your data is mainly read-only or updated often. If you can manage to create your table at one time, and only update it every now and then (say every few minutes) your problem will be easy to solve:
Add a new column "offset_id"
For your complete data set ordered by ID, create an offset_id simply by incrementing numbers: 1,2,3,4...
Instead of "offset ... limit 100" use "where offset_id >= 3900000 limit 100"
you can optimise in two steps
First get maximum id out of 3900000 records
select max(id) (select id from test_table order by id limit 3900000);
Then use this maximum id to get the next 100 records.
select * from test_table id > {max id from previous step) order by id limit 100 ;
It will be faster as both queries will do index scan by id.
This way you get the rows in semi-random order. You are not ordering the results in a query, so as a result, you get the data as it is stored in the files. The problem is that when you update the rows, the order of them can change.
To fix that you should add order by to the query. This way the query will return the rows in the same order. What's more then it will be able to use an index to speed the query up.
So two things: add an index, add order by to the query. Both to the same column. If you want to use the id column, then don't add index, just change the query to something like:
select * from test_table order by id offset 3900000 limit 100;
First, you have to define limit and offset with order by clause or you will get inconsistent result.
To speed up the query, you can have a computed index, but only for these condition :
Newly inserted data is strictly in id order
No delete nor update on column id
Here's how You can do it :
Create a row position function
create or replace function id_pos (id) returns bigint
as 'select count(id) from test_table where id <= $1;'
language sql immutable;
Create a computed index on id_pos function
create index table_by_pos on test_table using btree(id_pos(id));
Here's how You call it (offset 3900000 limit 100):
select * from test_table where id_pos(id) >= 3900000 and sales_pos(day) < 3900100;
This way, the query will not compute the 3900000 offset data, but only will compute the 100 data, making it much faster.
Please note the 2 conditions where this approach can take place, or the position will change.
I don't know all of the details of your data, but 4 million rows can be a little hefty. If there's a reasonable way to shard the table and essentially break it up into smaller tables it could be beneficial.
To explain this, let me use an example. let's say that I have a database where I have a table called survey_answer, and it's getting very large and very slow. Now let's say that these survey answers all come from a distinct group of clients (and I also have a client table keeping track of these clients). Then something I could do is I could make it so that I have a table called survey_answer that doesn't have any data in it, but is a parent table, and it has a bunch of child tables that actually contain the data the follow the naming format survey_answer_<clientid>, meaning that I'd have child tables survey_answer_1, survey_answer_2, etc., one for each client. Then when I needed to select data for that client, I'd use that table. If I needed to select data across all clients, I can select from the parent survey_answer table, but it will be slow. But for getting data for an individual client, which is what I mostly do, then it would be fast.
This is one example of how to break up data, and there are many others. Another example would be if my survey_answer table didn't break up easily by client, but instead I know that I'm typically only accessing data over a year period of time at once, then I could potentially make child tables based off of year, such as survey_answer_2014, survey_answer_2013, etc. Then if I know that I won't access more than a year at a time, I only really need to access maybe two of my child tables to get all the data I need.
In your case, all I've been given is perhaps the id. We can break it up by that as well (though perhaps not as ideal). Let's say that we break it up so that there's only about 1000000 rows per table. So our child tables would be test_table_0000001_1000000, test_table_1000001_2000000, test_table_2000001_3000000, test_table_3000001_4000000, etc. So instead of passing in an offset of 3900000, you'd do a little math first and determine that the table that you want is table test_table_3000001_4000000 with an offset of 900000 instead. So something like:
SELECT * FROM test_table_3000001_4000000 ORDER BY id OFFSET 900000 LIMIT 100;
Now if sharding the table is out of the question, you might be able to use partial indexes to do something similar, but again, I'd recommend sharding first. Learn more about partial indexes here.
I hope that helps. (Also, I agree with Szymon Guz that you want an ORDER BY).
Edit: Note that if you need to delete rows or selectively exclude rows before getting your result of 100, then sharding by id will become very hard to deal with (as pointed out by Denis; and sharding by id is not great to begin with). But if your 'just' paginating the data, and you only insert or edit (not a common thing, but it does happen; logs come to mind), then sharding by id can be done reasonably (though I'd still choose something else to shard on).
How about if paginate based on IDs instead of offset/limit?
The following query will give IDs which split all the records into chunks of size per_page. It doesn't depend on were records deleted or not
SELECT id AS from_id FROM (
SELECT id, (ROW_NUMBER() OVER(ORDER BY id DESC)) AS num FROM test_table
) AS rn
WHERE num % (per_page + 1) = 0;
With these from_IDs you can add links to the page. Iterate over :from_ids with index and add the following link to the page:
:from_id_index
When user visits the page retrieve records with ID which is greater than requested :from_id:
SELECT * FROM test_table WHERE ID >= :from_id ORDER BY id DESC LIMIT :per_page
For the first page link with from_id=0 will work
1
To avoid slow pagination with big tables always use auto-increment primary key then use the query below:
SELECT * FROM test_table WHERE id > (SELECT min(id) FROM test_table WHERE id > ((1 * 10) - 10)) ORDER BY id DESC LIMIT 10
1: is the page number
10: is the records per page
Tested and work well with 50 millions records.
There are two simple approaches to solve such a problem
Splitting the query into two subqueries that the first one do all the heavy job on index-only scan as described here
Create calculated index that holds the offset as described here, this can be enhanced using window functions.

Understanding postgres explain w/ bitmap heap/index scans

I have a table w/ 4.5 million rows. There is no primary key. The table has a column p_id, type integer. There's an index, idx_mytable_p_id on this column using the btree method. I do:
SELECT * FROM mytable WHERE p_id = 123456;
I run an explain on this and see the following output:
Bitmap Heap Scan on mytable (cost=12.04..1632.35 rows=425 width=321)
Recheck Cond: (p_id = 543094)
-> Bitmap Index Scan on idx_mytable_p_id (cost=0.00..11.93 rows=425 width=0)
Index Cond: (p_id = 543094)
Questions:
Why is that query doing a heap scan and then a bitmap index scan?
Why is it examining 425 rows? Why is the width of the operation 321?
What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?
For the record there are 773 rows with the p_id value of 123456. There are 38 columns on mytable.
Thanks!
Why is that query doing a heap scan and then a bitmap index scan?
It's not, exactly. EXPLAIN output shows the structure of the execution nodes, with the ones on the "higher" level (not indented as far) pulling rows from the nodes below them. So when the Bitmap Heap Scan node goes to pull its first row the Bitmap Index Scan runs to determine the set of rows to be used, and passes information on the first row to the heap scan. The index scan passes the index to determine which rows need to be read, and the heap scan actually reads them. The idea is that by reading the heap from beginning to end rather than in index order it will do less random access -- all matching rows from a given page will be read when that page is loaded, and enough pages may be read in order to use cheaper sequential access rather than seeking back and forth all over the disk.
Why is it examining 425 rows?
It's not. You ran EXPLAIN, which just shows you estimates and the chosen plan, it doesn't really examine the rows at all. That makes the value of EXPLAIN rather limited compared to running EXPLAIN ANALYZE, which actually runs the query and shows you the estimates and the actual numbers.
Why is the width of the operation 321?
Apparently that's the size, in bytes, of the tuples in mytable.
What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?
The first number is the cost to return the first row from that node; the second number is the cost to return all of the rows for that node. Remember, these are estimates. The unit is an abstract cost unit. The absolute number means nothing; what matters in planning is which plan has the lowest cost. If you are using a cursor the first number matters; otherwise it is usually the second number. (I think it interpolates for a LIMIT clause.)
It is often necessary to adjust configurable cost factors, such as random_page_cost and cpu_tuple_cost, to accurately model the costs within your environment. Without such adjustments the comparative costs are likely to not match the corresponding run times, so a less-than-optimal plan might be chosen.
re 1) execution plans have to be read from the inner most node to the outermost node. So it's first doing an index scan (to find the rows) and the accessing the actual table to return the rows the index scan found
re 2) the number of rows shown in the plan is just an estimation based on the statistics and as such 425 vs. 773 sounds fairly reasonable. If you want to see real figures, use explain analyze
re 3) the first number in the cost figure is the "startup" cost to intialize the step of the planner, the second cost is the total cost of that step.
This is all documented in the manual:
http://www.postgresql.org/docs/current/static/using-explain.html
You might want to go through these links in the PostgreSQL Wiki as well:
PostgreSQL EXPLAIN
Using Explain

Why does PostgreSQL perform sequential scan on indexed column?

Very simple example - one table, one index, one query:
CREATE TABLE book
(
id bigserial NOT NULL,
"year" integer,
-- other columns...
);
CREATE INDEX book_year_idx ON book (year)
EXPLAIN
SELECT *
FROM book b
WHERE b.year > 2009
gives me:
Seq Scan on book b (cost=0.00..25663.80 rows=105425 width=622)
Filter: (year > 2009)
Why it does NOT perform index scan instead?
What am I missing?
If the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan.
This is because an index scan requires several IO operations for each row (look up the row in the index, then retrieve the row from the heap). Whereas a sequential scan only requires a single IO for each row - or even less because a block (page) on the disk contains more than one row, so more than one row can be fetched with a single IO operation.
Btw: this is true for other DBMS as well - some optimizations as "index only scans" taken aside (but for a SELECT * it's highly unlikely such a DBMS would go for an "index only scan")
Did you ANALYZE the table/database? And what about the statistics? When there are many records where year > 2009, a sequential scan might be faster than an index scan.
#a_horse_with_no_name explained it quite well. Also if you really want to use an index scan, you should generally use bounded ranges in where clause. eg -
year > 2019 and year < 2020.
A lot of the times statistics are not updated on a table and it may not be possible to do so due to constraints. In this case, the optimizer will not know how many rows it should take in year > 2019. Thus it selects a sequential scan in lieu of full knowledge. Bounded partitions will solve the problem most of the time.
In index scan, read head jumps from one row to another which is 1000 times slower than reading the next physical block (in the sequential scan).
So, if the (number of records to be retrieved * 1000) is less than the total number of records, the index scan will perform better.