Query takes long time to run on postgreSQL database despite creating an index - postgresql

Using PostgreSQL 14.3.1, I have created a database instance that is now 1TB in size. The main userlogs table is 751GB, with 525GB used for data and 226GB used for the various indexes on this table. The userlogs table currently contains over 900 million rows. To assist with querying this table, a separate Logdates table holds all unique dates for the user logs, and userlogs has an integer foreign key column to it called logdateID. Amongst the various indexes on the userlogs table, one is on logdateID. There are 104 date entries in the Logdates table. When running the below query, I would expect the index to be used and the 104 records to be retrieved in a reasonable period of time.
select distinct logdateid from userlogs;
This query took a few hours to return the data. I ran EXPLAIN on the query and the output is shown below.
"HashAggregate (cost=80564410.60..80564412.60 rows=200 width=4)"
" Group Key: logdateid"
" -> Seq Scan on userlogs (cost=0.00..78220134.28 rows=937710528 width=4)"
I then issued the below command to ask the database to use the index.
set enable_seqscan=off
The revised explain plan is now as follows:
"Unique (cost=0.57..3705494150.82 rows=200 width=4)"
" -> Index Only Scan using ix_userlogs_logdateid on userlogs (cost=0.57..3703149874.49 rows=937710528 width=4)"
However, when running the same query, it still takes a few hours to retrieve the data. My question is, why should it take that long to retrieve the data if it is doing an index only scan?
The machine on which the database sits is highly specced: a 16-core Xeon processor that, with virtualisation enabled, gives 32 logical cores. There is 96GB of RAM, and data storage is a RAID 10 configured 2TB SSD volume with a separate 500GB system SSD.

There is no way to optimize such queries in PostgreSQL, due to the internal structure of the data storage as rows inside pages.
All queries involving an aggregate in PostgreSQL, such as COUNT, COUNT DISTINCT or DISTINCT, must read all rows inside the table pages to produce the result.
Take a look at the paper I wrote about this problem:
PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances

It seems like your table has none of its pages set as all visible (compare pg_class.relallvisible to the actual number of pages in the table), which is weird because even insert-only tables should get autovacuumed in v13 and up. This will severely punish the index-only scan. You can try to manually vacuum the table to see if that changes things.
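For example, a quick visibility check followed by a manual vacuum might look like this (table name taken from the question):
-- compare pages marked all-visible against the total page count
SELECT relpages, relallvisible FROM pg_class WHERE relname = 'userlogs';
-- rebuild the visibility map and refresh statistics
VACUUM (VERBOSE, ANALYZE) userlogs;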
It is also weird that it is not using parallelization. It certainly should be. What are your non-default configuration settings?
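For reference, a quick way to list the non-default settings is a query against the pg_settings view:
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override');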
Finally, I wouldn't expect even the poor plan you show to take a few hours. Maybe your hardware is not performing up to what it should. (Also, RAID 10 requires at least 4 disks, but your description makes it sound like that is not what you have)
Since you have the foreign key table, you could use that in your query instead, testing each of its rows for whether at least one matching row exists in the log table.
select logdateid from logdates where exists
(select 1 from userlogs where userlogs.logdateid=logdates.logdateid);

Related

Query on large, indexed table times out

I am relatively new to using Postgres, but am wondering what could be the workaround here.
I have a table with about 20 columns and 250 million rows, and an index created for the timestamp column time (but no partitions).
Queries sent to the table have been failing, running endlessly (although the 'view first/last 100 rows' function in pgAdmin works). Even simple select * queries.
For example, if I want to LIMIT a selection of the data to 10:
SELECT * from mytable
WHERE time::timestamp < '2019-01-01'
LIMIT 10;
Such a query hangs - what can be done to optimize queries in a table this large? When the table was of a smaller size (~ 100 million rows), queries would always complete. What should one do in this case?
If time is of data type timestamp, or the index is created on (time::timestamp), the query should be lightning fast.
Please show the CREATE TABLE and the CREATE INDEX statement, and the EXPLAIN output for the query for more details.
"Query that doesn't complete" usually means that it does disk swaps. Especially when you mention the fact that with 100M rows it manages to complete. That's because index for 100M rows still fits in your memory. But index twice this size doesn't.
Limit won't help you here, as database probably decides to read the index first, and that's what kills it.
You could try and increase available memory, but partitioning would actually be the best solution here.
Partitioning means smaller tables. Smaller tables means smaller indexes. Smaller indexes have better chances to fit into your memory.
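A minimal sketch of what declarative range partitioning on time could look like, assuming PostgreSQL 11 or later and that the table can be recreated; the table name, extra columns and partition boundaries are illustrative:
CREATE TABLE mytable_part (
    id bigint,
    payload text,
    "time" timestamp NOT NULL
) PARTITION BY RANGE ("time");
CREATE TABLE mytable_part_2018 PARTITION OF mytable_part
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE mytable_part_2019 PARTITION OF mytable_part
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
-- an index on the partition key; PostgreSQL 11+ creates a matching index on each partition
CREATE INDEX ON mytable_part ("time");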

Postgres query performance in write-heavy database

I have a Postgres 9.6 database hosted on Heroku (standard-4 with 15GB cache).
One of my tables has almost 100M rows and that table gets maybe 3k inserts/updates per minute on average, but that can spike to 10x.
I have a few queries that run against that table that are basically counting rows with different columns having various values.
I also have a copy of the database that is currently getting no updates (basically idle).
Queries against the live database are much slower than queries on the copy and I don't know if this is due to the load/contention from the updates or poor query optimization on my part.
The values that the queries are counting are very common values.
select count(*)
from my_table
where a = '123'
and b = 'abc'
There are a lot of rows with the combination of 123 and abc. I have created a partial index for that:
create index my_table_abc_123_index
on my_table (a, b)
where a = '123' and b = 'abc'
This seems to get me an index only scan that's fast on the copy (~0.5s) but much slower on the live DB (~26s). The costs also vary quite a lot (copy first, live second):
Aggregate (cost=29751.39..29751.39 rows=1 width=8)
Aggregate (cost=179603.59..179603.59 rows=1 width=8)
Those both return a count of about 1.4M and the tables are about the same size.
It also seems that when the writes go up (10x) the degradation is significant.
It's also interesting that Postgres doesn't always choose my partial indexes, opting for a full composite index on (a, b), depending on the values passed for those columns.
I've read the Postgres indexing documentation and realize that for common values it's not always best to use the index (though Postgres is choosing an index in all these cases).
I'm wondering if this is the sort of difference I should be expecting between a live and idle database, and if I'm approaching creating indexes correctly for my workload (which is definitely write-heavy) or in general.
Some other details:
individual (full) column indexes seem much more expensive, but I've not tried individual column partial indexes.
the autovacuum threshold has been set pretty low (0.01) to deal with the dead rows due to lots of updates (set per table as sketched below)
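A minimal sketch of setting that per-table threshold via storage parameters (table name taken from the question):
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.01
);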

Slow Postgres 9.3 Queries, again

This is a follow-up to the question at Slow Postgres 9.3 queries.
The new indexes definitely help. But what we're seeing is that sometimes queries are much slower in practice than when we run EXPLAIN ANALYZE. An example is the following, run on the production database:
explain analyze SELECT * FROM messages WHERE groupid=957 ORDER BY id DESC LIMIT 20 OFFSET 31980;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=127361.90..127441.55 rows=20 width=747) (actual time=152.036..152.143 rows=20 loops=1)
-> Index Scan Backward using idx_groupid_id on messages (cost=0.43..158780.12 rows=39869 width=747) (actual time=0.080..150.484 rows=32000 loops=1)
Index Cond: (groupid = 957)
Total runtime: 152.186 ms
(4 rows)
With slow query logging turned on, we see instances of this query taking over 2 seconds. We also have log_lock_waits=true, and no slow locks are reported around the same time. What could explain the vast difference in execution times?
LIMIT x OFFSET y generally performs not much faster than LIMIT x + y. A large OFFSET is always comparatively expensive. The suggested index in the linked question helps, but as long as you cannot get index-only scans out of it, Postgres still has to check visibility in the heap (the main relation) for at least x + y rows to determine the correct result.
SELECT *
FROM messages
WHERE groupid = 957
ORDER BY id DESC
LIMIT 20
OFFSET 31980;
CLUSTER on your index (groupid,id) would help to increase locality of data in the heap and reduce the number of data pages to be read per query. Definitely a win. But if all groupid are equally likely to be queried, that's not going to remove the bottleneck of too little RAM for cache. If you have concurrent access, consider pg_repack instead of CLUSTER:
Optimize Postgres timestamp query range
Do you actually need all columns returned? (SELECT *) A covering index enabling index-only scans might help if you only need a few small columns returned. (autovacuum must be strong enough to cope with writes to the table, though. Read-only table would be ideal.)
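For example, if hypothetically only id and a small created_at column were needed, a multicolumn index like the one below could enable an index-only scan (created_at is an assumed column; Postgres 9.3 has no INCLUDE clause, so the extra column simply becomes part of the key):
CREATE INDEX messages_covering_idx ON messages (groupid, id DESC, created_at);
SELECT id, created_at
FROM messages
WHERE groupid = 957
ORDER BY id DESC
LIMIT 20 OFFSET 31980;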
Also, according to your linked question, your table is 32 GB on disk. (Typically a bit more in RAM). The index on (groupid,id) adds another 308 MB at least (without any bloat):
SELECT pg_size_pretty(7337880.0 * 44); -- row count * tuple size
Making sense of Postgres row sizes
You have 8 GB RAM, of which you expect around 4.5 GB to be used for cache (effective_cache_size = 4608MB). That's enough to cache the index for repeated use, but not nearly enough to also cache the whole table.
If your query happens to find data pages in cache, it's fast. Else, not so much. Big difference, even with SSD storage (much more with HDD).
Not directly related to this query, but 8 MB of work_mem (work_mem = 7864kB) seems way too small for your setup. Depending on various other factors I would set this to at least 64MB (unless you have many concurrent queries with sort / hash operations). Like @Craig commented, EXPLAIN (BUFFERS, ANALYZE) might tell us more.
The best query plan also depends on value frequencies. If only few rows pass the filter, the result might be empty for certain groupid and the query is comparatively fast. If a large portion of the table has to be fetched, a plain sequential scan wins. You need valid table statistics (autovacuum again). And possibly a larger statistics target for groupid:
Keep PostgreSQL from sometimes choosing a bad query plan
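A quick sketch of raising the per-column statistics target (the value 1000 is just an example):
ALTER TABLE messages ALTER COLUMN groupid SET STATISTICS 1000;
ANALYZE messages;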
Since OFFSET is slow, an alternative is to simulate OFFSET using another column and some index preparation. We require a UNIQUE column (like a PRIMARY KEY) on the table. If there is none, one can be added with:
CREATE SEQUENCE messages_pkey_seq ;
ALTER TABLE messages
ADD COLUMN message_id integer DEFAULT nextval('messages_pkey_seq');
Next we create the position column for the OFFSET simulation:
ALTER TABLE messages ADD COLUMN position INTEGER;
UPDATE messages SET position = q.position FROM (SELECT message_id,
row_number() OVER (PARTITION BY groupid ORDER BY id DESC) AS position
FROM messages ) AS q WHERE q.message_id=messages.message_id ;
CREATE INDEX ON messages ( groupid, position ) ;
Now we are ready for the new version of the query in the OP:
SELECT * FROM messages WHERE groupid = 957 AND
position BETWEEN 31980 AND (31980+20-1) ;

Select query with offset limit is too much slow

I have read from internet resources that a query will be slow when the offset increases. But in my case I think it is far too slow. I am using Postgres 9.3.
Here is the query (id is primary key):
select * from test_table offset 3900000 limit 100;
It returns the data in around 10 seconds, which I think is far too slow. I have around 4 million records in the table. The overall size of the database is 23GB.
Machine configuration:
RAM: 12 GB
CPU: 2.30 GHz
Core: 10
A few values from the postgresql.conf file which I have changed are below. The others are defaults.
shared_buffers = 2048MB
temp_buffers = 512MB
work_mem = 1024MB
maintenance_work_mem = 256MB
dynamic_shared_memory_type = posix
default_statistics_target = 10000
autovacuum = on
enable_seqscan = off ## it's not having any effect, as I can see from EXPLAIN ANALYZE still doing a seq scan
Apart from these, I have also tried changing random_page_cost = 2.0 and cpu_index_tuple_cost = 0.0005, and the result is the same.
The EXPLAIN (ANALYZE, BUFFERS) output for the query is below:
"Limit (cost=10000443876.02..10000443887.40 rows=100 width=1034) (actual time=12793.975..12794.292 rows=100 loops=1)"
" Buffers: shared hit=26820 read=378984"
" -> Seq Scan on test_table (cost=10000000000.00..10000467477.70 rows=4107370 width=1034) (actual time=0.008..9036.776 rows=3900100 loops=1)"
" Buffers: shared hit=26820 read=378984"
"Planning time: 0.136 ms"
"Execution time: 12794.461 ms"
How do people deal with this problem in Postgres? Any alternative solution would be helpful for me as well.
UPDATE: Adding ORDER BY id (tried with another indexed column as well), here is the explain:
"Limit (cost=506165.06..506178.04 rows=100 width=1034) (actual time=15691.132..15691.494 rows=100 loops=1)"
" Buffers: shared hit=110813 read=415344"
" -> Index Scan using test_table_pkey on test_table (cost=0.43..533078.74 rows=4107370 width=1034) (actual time=38.264..11535.005 rows=3900100 loops=1)"
" Buffers: shared hit=110813 read=415344"
"Planning time: 0.219 ms"
"Execution time: 15691.660 ms"
It's slow because it needs to locate the top offset rows and scan the next 100. No amount of optimization will change that when you're dealing with huge offsets.
This is because your query literally instructs the DB engine to visit lots of rows by using offset 3900000 -- that's 3.9M rows. There aren't many options to speed this up.
Super-fast RAM, SSDs, etc. will help. But you'll only gain by a constant factor in doing so, meaning it's merely kicking the can down the road until you reach a large enough offset.
Ensuring the table fits in memory, with plenty more to spare, will likewise help by a larger constant factor -- except the first time. But this may not be possible with a large enough table or index.
Ensuring you're doing index-only scans will work to an extent. (See velis' answer; it has a lot of merit.) The problem here is that, for all practical purposes, you can think of an index as a table storing a disk location and the indexed fields. (It's more optimized than that, but it's a reasonable first approximation.) With enough rows, you'll still be running into problems with a large enough offset.
Trying to store and maintain the precise position of the rows is bound to be an expensive approach too. (This is suggested by e.g. benjist.) While technically feasible, it suffers from limitations similar to those that stem from using MPTT with a tree structure: you'll gain significantly on reads but will end up with excessive write times when a node is inserted, updated or removed in such a way that large chunks of the data need to be updated alongside.
As is hopefully more clear, there isn't any real magic bullet when you're dealing with offsets this large. It's often better to look at alternative approaches.
If you're paginating based on the ID (or a date field, or any other indexable set of fields), a potential trick (used by blogspot, for instance) would be to make your query start at an arbitrary point in the index.
Put another way, instead of:
example.com?page_number=[huge]
Do something like:
example.com?page_following=[huge]
That way, you keep a trace of where you are in your index, and the query becomes very fast because it can head straight to the correct starting point without plowing through a gazillion rows:
select * from foo where ID > [huge] order by ID limit 100
Naturally, you lose the ability to jump to e.g. page 3000. But give this some honest thought: when was the last time you jumped to a huge page number on a site instead of going straight for its monthly archives or using its search box?
If you're paginating but want to keep the page offset by any means, yet another approach is to forbid the use of larger page numbers. It's not silly: it's what Google is doing with search results. When running a search query, Google gives you an estimated number of results (you can get a reasonable number using EXPLAIN), and then allows you to browse the top few thousand results -- nothing more. Among other things, they do so for performance reasons -- precisely the one you're running into.
I have upvoted Denis's answer, but will add a suggestion myself; perhaps it can be of some performance benefit for your specific use case:
Assuming your actual table is not test_table but some huge compound query, possibly with multiple joins, you could first determine the required starting id:
select id from test_table order by id offset 3900000 limit 1
This should be much faster than the original query, as it only requires scanning the index rather than the entire table. Getting this id then opens up a fast index-search option for the full fetch:
select * from test_table where id >= (what I got from previous query) order by id limit 100
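The two steps can also be folded into a single statement; a sketch, assuming id is the primary key as stated in the question:
SELECT *
FROM test_table
WHERE id >= (SELECT id FROM test_table ORDER BY id OFFSET 3900000 LIMIT 1)
ORDER BY id
LIMIT 100;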
You didn't say if your data is mainly read-only or updated often. If you can manage to create your table in one go, and only update it every now and then (say every few minutes), your problem will be easy to solve:
Add a new column "offset_id"
For your complete data set ordered by ID, create an offset_id simply by incrementing numbers: 1,2,3,4...
Instead of "offset ... limit 100" use "where offset_id >= 3900000 limit 100"
You can optimise this in two steps.
First, get the maximum id out of the first 3,900,000 records:
select max(id) from (select id from test_table order by id limit 3900000) t;
Then use this maximum id to get the next 100 records:
select * from test_table where id > {max id from previous step} order by id limit 100;
It will be faster, as both queries will do an index scan by id.
This way you get the rows in a semi-random order. You are not ordering the results in the query, so as a result you get the data as it is stored in the files. The problem is that when you update rows, their order can change.
To fix that you should add ORDER BY to the query. This way the query will always return the rows in the same order, and it will then be able to use an index to speed the query up.
So two things: add an index and add ORDER BY to the query, both on the same column. If you want to use the id column, then don't add an index (it is already the primary key); just change the query to something like:
select * from test_table order by id offset 3900000 limit 100;
First, you have to define LIMIT and OFFSET with an ORDER BY clause, or you will get inconsistent results.
To speed up the query, you can have a computed index, but only under these conditions:
Newly inserted data is strictly in id order
No delete nor update on column id
Here's how you can do it:
Create a row position function
create or replace function id_pos (bigint) returns bigint
as 'select count(id) from test_table where id <= $1;'
language sql immutable;
Create a computed index on the id_pos function:
create index table_by_pos on test_table using btree(id_pos(id));
Here's how you call it (for offset 3900000 limit 100):
select * from test_table where id_pos(id) >= 3900000 and id_pos(id) < 3900100;
This way, the query will not compute the 3,900,000 offset rows, only the 100 rows you need, making it much faster.
Please note the two conditions above under which this approach works; otherwise, the positions will change.
I don't know all of the details of your data, but 4 million rows can be a little hefty. If there's a reasonable way to shard the table and essentially break it up into smaller tables it could be beneficial.
To explain this, let me use an example. Let's say that I have a database with a table called survey_answer, and it's getting very large and very slow. Now let's say that these survey answers all come from a distinct group of clients (and I also have a client table keeping track of these clients). Then something I could do is make survey_answer a parent table that doesn't hold any data itself, with a bunch of child tables that actually contain the data and follow the naming format survey_answer_<clientid>, meaning that I'd have child tables survey_answer_1, survey_answer_2, etc., one for each client. Then when I needed to select data for one client, I'd use that child table. If I needed to select data across all clients, I could select from the parent survey_answer table, but it would be slow. But getting data for an individual client, which is what I mostly do, would be fast.
This is one example of how to break up data, and there are many others. Another example would be if my survey_answer table didn't break up easily by client, but instead I know that I'm typically only accessing data over a year period of time at once, then I could potentially make child tables based off of year, such as survey_answer_2014, survey_answer_2013, etc. Then if I know that I won't access more than a year at a time, I only really need to access maybe two of my child tables to get all the data I need.
In your case, all I've been given is perhaps the id. We can break it up by that as well (though perhaps not as ideal). Let's say that we break it up so that there's only about 1000000 rows per table. So our child tables would be test_table_0000001_1000000, test_table_1000001_2000000, test_table_2000001_3000000, test_table_3000001_4000000, etc. So instead of passing in an offset of 3900000, you'd do a little math first and determine that the table that you want is table test_table_3000001_4000000 with an offset of 900000 instead. So something like:
SELECT * FROM test_table_3000001_4000000 ORDER BY id OFFSET 900000 LIMIT 100;
Now if sharding the table is out of the question, you might be able to use partial indexes to do something similar, but again, I'd recommend sharding first. Learn more about partial indexes here.
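As a rough, hypothetical sketch, one such partial index per id range could look like this (the name and boundaries are illustrative):
CREATE INDEX test_table_3m_4m_idx ON test_table (id)
WHERE id > 3000000 AND id <= 4000000;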
I hope that helps. (Also, I agree with Szymon Guz that you want an ORDER BY).
Edit: Note that if you need to delete rows or selectively exclude rows before getting your result of 100, then sharding by id will become very hard to deal with (as pointed out by Denis; and sharding by id is not great to begin with). But if you're 'just' paginating the data, and you only insert or edit (not a common thing, but it does happen; logs come to mind), then sharding by id can be done reasonably (though I'd still choose something else to shard on).
How about paginating based on IDs instead of OFFSET/LIMIT?
The following query will give the IDs which split all the records into chunks of size per_page. It doesn't depend on whether records were deleted or not.
SELECT id AS from_id FROM (
SELECT id, (ROW_NUMBER() OVER(ORDER BY id DESC)) AS num FROM test_table
) AS rn
WHERE num % (per_page + 1) = 0;
With these from_IDs you can add links to the page. Iterate over :from_ids with index and add the following link to the page:
:from_id_index
When the user visits the page, retrieve records with an ID greater than the requested :from_id:
SELECT * FROM test_table WHERE ID >= :from_id ORDER BY id DESC LIMIT :per_page
For the first page, a link with from_id=0 will work.
To avoid slow pagination with big tables, always use an auto-increment primary key, then use the query below:
SELECT * FROM test_table WHERE id > (SELECT min(id) FROM test_table WHERE id > ((1 * 10) - 10)) ORDER BY id DESC LIMIT 10
1: is the page number
10: is the records per page
Tested, and it works well with 50 million records.
There are two simple approaches to solving such a problem:
Splitting the query into two subqueries, where the first one does all the heavy work via an index-only scan, as described here.
Creating a calculated index that holds the offset, as described here; this can be enhanced using window functions.

Optimize Postgres query on timestamp range

I have the following table and indices defined:
CREATE TABLE ticket (
wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
eid bigint,
created timestamp with time zone NOT NULL DEFAULT now(),
status integer NOT NULL DEFAULT 0,
argsxml text,
moduleid character varying(255),
source_id bigint,
file_type_id bigint,
file_name character varying(255),
status_reason character varying(255),
...
)
I created an index on the created timestamp as follows:
CREATE INDEX ticket_1_idx
ON ticket
USING btree
(created );
Here's my query:
select * from ticket
where created between '2012-12-19 00:00:00' and '2012-12-20 00:00:00'
This was working fine until the number of records started to grow (about 5 million) and now it's taking forever to return.
Explain analyze reveals this:
Index Scan using ticket_1_idx on ticket (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)
Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))
Total runtime: 175.853 ms
So far I've tried setting:
random_page_cost = 1.75
effective_cache_size = 3
Also created:
create CLUSTER ticket USING ticket_1_idx;
Nothing works. What am I doing wrong? Why is it selecting sequential scan? The indexes are supposed to make the query fast. Anything that can be done to optimize it?
CLUSTER
If you intend to use CLUSTER, the displayed syntax is invalid.
create CLUSTER ticket USING ticket_1_idx;
Run once:
CLUSTER ticket USING ticket_1_idx;
This can help a lot with bigger result sets. Less for a single or few rows returned.
If your table isn't read-only the effect deteriorates over time. Re-run CLUSTER at reasonable intervals. Postgres remembers the index for subsequent calls, so this works, too:
CLUSTER ticket;
(But I would rather be explicit and use the first form.)
However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need for extending the underlying physical file (expensively) too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:
Fillfactor for a sequential index that is PK
pg_repack / pg_squeeze
CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:
When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired
on it. This prevents any other database operations (both reads and
writes) from operating on the table until the CLUSTER is finished.
Bold emphasis mine. Consider the alternatives!
pg_repack:
Unlike CLUSTER and VACUUM FULL it works online, without holding an
exclusive lock on the processed tables during processing. pg_repack is
efficient to boot, with performance comparable to using CLUSTER directly.
and:
pg_repack needs to take an exclusive lock at the end of the reorganization.
The current version 1.4.7 works with PostgreSQL 9.4 - 14.
pg_squeeze is a newer alternative that claims:
In fact we try to replace pg_repack extension.
The current version 1.4 works with Postgres 10 - 14.
Query
The query is simple enough not to cause any performance problems per se.
However: The BETWEEN construct includes boundaries. Your query selects all of Dec. 19, plus records from Dec. 20, 00:00. That's an extremely unlikely requirement. Chances are, you really want:
SELECT *
FROM ticket
WHERE created >= '2012-12-19 00:00'
AND created < '2012-12-20 00:00';
Performance
Why is it selecting sequential scan?
Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
You may be able to improve performance, but the necessary background information is not in the question. Possible options include:
Only query required columns instead of * to reduce transfer cost (and other performance benefits).
Look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
CREATE INDEX ticket_created_idx ON ticket(created)
WHERE created >= '2012-12-01 00:00:00'::timestamp;
CREATE a new index right before the start of a new month. You can easily automate the task with a cron job.
Optionally DROP partial indexes for old months later.
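A sketch of what that monthly rollover could look like (index names and month boundaries are illustrative; CONCURRENTLY avoids blocking writes while the index is built or dropped):
-- at the start of January 2013, e.g. from a cron job:
CREATE INDEX CONCURRENTLY ticket_created_2013_01_idx ON ticket (created)
WHERE created >= '2013-01-01 00:00:00'::timestamp;
-- later, once the previous month's partial index is no longer needed:
DROP INDEX CONCURRENTLY IF EXISTS ticket_created_idx;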
Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.
Then again if records never change at all, you probably don't need CLUSTER.
Performance Basics
You may be missing one of the basics. All the usual performance advice applies:
https://wiki.postgresql.org/wiki/Slow_Query_Questions
https://wiki.postgresql.org/wiki/Performance_Optimization