Does Postgres lock more rows than the limit provided? - postgresql

If the LIMIT is applied to the rows returned from a query, does this mean that more rows could be locked that are not returned?
Like this:
select * from myTable where status = 'READY' limit 10 FOR UPDATE
If there are 1000 rows in a status of READY, does it row lock them all but only return 10?
I am seeing quite a costly -> LockRows on my explain plan and trying to understand why.
Thanks

From the documentation, it seems pretty clear that only the actual records returned by your select query would be locked:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends.
That being said, one possible explanation for why the LockRows operation seems so costly is that, in order to isolate the 10 records you want for locking, it first has to do a sort to implement LIMIT. This is an operation which involves the entire table, so for a large table, and without an index to help, this could take some time.
Let's say this were your actual query:
select * from myTable where status = 'READY' order by some_col limit 10 FOR UPDATE
This query would benefit from the following index:
create index idx on myTable (status, some_col);
The first column in the index status would let Postgres discard records not matching the WHERE filter. After this, the index also covers some_col, which means Postgres could easily find the limit 10 records you want already in the correct order.

Related

How to index a table in Postgres to speed up ORDER BY

How do you create an index in PostgreSQL 11 to speed up a specific query containing an ORDER BY?
I have a query that needs to get the first 100 records from a table containing 2M records, along with a few common filters like:
SELECT id, first_name, last_name
FROM users
WHERE active = true AND region IN (1,2,3)
ORDER BY last_active_timestamp DESC;
Without the ORDER BY clause, it returns in ~1 sec, almost instantly. However, with the clause, it takes an excruciating ~5 minutes.
So I tried creating a partial index like:
CREATE INDEX CONCURRENTLY my_user_index ON users (active, region, last_active_timestamp DESC NULLS LAST)
WHERE region IN (1, 2, 3) AND active = True;
but that had virtually no effect. The above query still takes several minutes. Is that just a limitation of ORDER BY in Postgres, or is there a different type of index I could use to speed it up?
To try an index was correct but you used the wrong one. Try this here:
CREATE INDEX CONCURRENTLY my_user_index
ON users (last_active_timestamp DESC)
WHERE region IN (1, 2, 3)
AND active = true;
Your index was only sorted by last_active_timestamp after already being sorted by active and region, thus you could not just use the index to have your sorted output.
For some more speedup, you could also include the columns of your select-clause within the index using INCLUDE (id, first_name, last_name). Now your query can (if the planner chooses so and I think it will) run on the index only without touching the table data at all.
In order to use an index with the ORDER BY in your query, you need to index on all the relevant columns (last_active_timestamp, along with a condition to include only active==true and regions a,b,c). This will essentially pull the data out in order for you).
Also, if you share your EXPLAIN ANALYZE output, you may see a Sort Method: external merge Disk: ####kB, indicating that the sort spilled out to disk and not in memory, due to an insufficiently-sized work_mem. The solution would then be to increase work_mem to a value of at least ####kB, and try again.
Note that you can set work_mem on a per-session basis, as a global change in work_mem could potentially have negative side-effects, such as running out of memory, because postgresql.conf-configured work_mem is allocated for each session (basically, it has a multiplicative effect).
If the query is still slow after tuning up work_mem (i.e., it's all sorting in memory, and it's still slow), then your returned data set is simply too large to sort quickly.

How can you use 'For update skip locked' in postgres without locking rows in all tables used in the query?

When you want to use postgres's SELECT FOR UPDATE SKIP LOCKED functionality to ensure that two different users reading from a table and claiming tasks do not get blocked by each other and also do not get tasks already being read by another user:
A join is being used in the query to retrieve tasks. We do not want any other table to have row-level locking except the table that contains the main info. Sample query below - Lock only the rows in the table -'task' in the below query
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update skip locked;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: As per the comment below, the query will be changed to :
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo) ORDER BY v.id limit 200 for update of v skip locked;
How best to place order by such that rows are ordered? While the order would get effected if multiple users invoke this command, still some order sanctity should be maintained of the rows that are being returned.
Also, does this also ensure that multiple threads invoking the same select query would be retrieving a different set of rows or is the locking only done for update commands?
Just experimented with this a little bit - multiple select queries will end up retrieving different set of rows. Also, order by ensures the order of the final result obtained.
Yes,
FOR UPDATE OF "TABLE_NAME" SKIP LOCKED
will lock only TABLE_NAME

Count(*) filtering on indexed datetime field runs too long

I am trying to count all the records created yesterday. There is a created_at column and it is indexed.
If i run
explain
select count(*) from events where created_at::date = current_date - 1;
It says
Aggregate (cost=14365728.05..14365728.06 rows=1 width=0)
-> Index Only Scan using index_events_created_at on events (cost=0.57..14362310.20 rows=1367140 width=0)
Filter: ((created_at)::date = (('now'::cstring)::date - 1))
So it event kind of knows how many rows there are. But the
select count(*) from events where created_at::date = current_date - 1;
query itself keeps running forever. Why is that?
TRY this:
SELECT count(*)
FROM events
WHERE created_at >= current_date - 1
AND created_at < current_date;
So, to start: Why is the explain plan able to provide an estimated row count so much quicker than the query can run?
The optimizer is estimating the row count based on stored statistics and/or extrapolations from stored statistics. As you can see, this isn't necessary very accurate. (Based on comment discussion, the estimate was off by almost 20%.) So the query has to actually count, based on either data in the table or data in the index. So that's more work. But it's not obvious why it's 10 minutes worth of "more work".
One reasonable guess would be lock contention. Depending on your transaction isolation settings, it could be that your query keeps having to wait on inserts or updates to the table to finish. (The optimizer wouldn't have this problem in calculating its estimate, because it will just assume that the effects of concurrent queries are not a big deal for its purposes.) Even though none of the added data would affect your count, table-level locks could still conflict.
One way to test this theory would be to copy the table, so that you have a table with the same data (and same indexes, etc) that nobody's querying, and see if your count runs faster against it.
(As an aside: In general when the stats seem significantly off you could suspect that the optimizer had picked a poor execution plan; but it's hard to see how an index scan could be the wrong solution here.)

Select query with offset limit is too much slow

I have read from internet resources that a query will be slow when the offset increases. But in my case I think its too much slow. I am using postgres 9.3
Here is the query (id is primary key):
select * from test_table offset 3900000 limit 100;
It returns me data in around 10 seconds. And I think its too much slow. I have around 4 million records in table. Overall size of the database is 23GB.
Machine configuration:
RAM: 12 GB
CPU: 2.30 GHz
Core: 10
Few values from postgresql.conf file which I have changed are as below. Others are default.
shared_buffers = 2048MB
temp_buffers = 512MB
work_mem = 1024MB
maintenance_work_mem = 256MB
dynamic_shared_memory_type = posix
default_statistics_target = 10000
autovacuum = on
enable_seqscan = off ## its not making any effect as I can see from Analyze doing seq-scan
Apart from these I have also tried by changing the values of random_page_cost = 2.0 and cpu_index_tuple_cost = 0.0005 and result is same.
Explain (analyze, buffers) result over the query is as below:
"Limit (cost=10000443876.02..10000443887.40 rows=100 width=1034) (actual time=12793.975..12794.292 rows=100 loops=1)"
" Buffers: shared hit=26820 read=378984"
" -> Seq Scan on test_table (cost=10000000000.00..10000467477.70 rows=4107370 width=1034) (actual time=0.008..9036.776 rows=3900100 loops=1)"
" Buffers: shared hit=26820 read=378984"
"Planning time: 0.136 ms"
"Execution time: 12794.461 ms"
How people around the world negotiates with this problem in postgres? Any alternate solution will be helpful for me as well.
UPDATE:: Adding order by id (tried with other indexed column as well) and here is the explain:
"Limit (cost=506165.06..506178.04 rows=100 width=1034) (actual time=15691.132..15691.494 rows=100 loops=1)"
" Buffers: shared hit=110813 read=415344"
" -> Index Scan using test_table_pkey on test_table (cost=0.43..533078.74 rows=4107370 width=1034) (actual time=38.264..11535.005 rows=3900100 loops=1)"
" Buffers: shared hit=110813 read=415344"
"Planning time: 0.219 ms"
"Execution time: 15691.660 ms"
It's slow because it needs to locate the top offset rows and scan the next 100. No amounts of optimization will change that when you're dealing with huge offsets.
This is because your query literally instruct the DB engine to visit lots of rows by using offset 3900000 -- that's 3.9M rows. Options to speed this up somewhat aren't many.
Super-fast RAM, SSDs, etc. will help. But you'll only gain by a constant factor in doing so, meaning it's merely kicking the can down the road until you reach a larger enough offset.
Ensuring the table fits in memory, with plenty more to spare will likewise help by a larger constant factor -- except the first time. But this may not be possible with a large enough table or index.
Ensuring you're doing index-only scans will work to an extent. (See velis' answer; it has a lot of merit.) The problem here is that, for all practical purposes, you can think of an index as a table storing a disk location and the indexed fields. (It's more optimized than that, but it's a reasonable first approximation.) With enough rows, you'll still be running into problems with a larger enough offset.
Trying to store and maintain the precise position of the rows is bound to be an expensive approach too.(This is suggested by e.g. benjist.) While technically feasible, it suffers from limitations similar to those that stem from using MPTT with a tree structure: you'll gain significantly on reads but will end up with excessive write times when a node is inserted, updated or removed in such a way that large chunks of the data needs to be updated alongside.
As is hopefully more clear, there isn't any real magic bullet when you're dealing with offsets this large. It's often better to look at alternative approaches.
If you're paginating based on the ID (or a date field, or any other indexable set of fields), a potential trick (used by blogspot, for instance) would be to make your query start at an arbitrary point in the index.
Put another way, instead of:
example.com?page_number=[huge]
Do something like:
example.com?page_following=[huge]
That way, you keep a trace of where you are in your index, and the query becomes very fast because it can head straight to the correct starting point without plowing through a gazillion rows:
select * from foo where ID > [huge] order by ID limit 100
Naturally, you lose the ability to jump to e.g. page 3000. But give this some honest thought: when was the last time you jumped to a huge page number on a site instead of going straight for its monthly archives or using its search box?
If you're paginating but want to keep the page offset by any means, yet another approach is to forbid the use of larger page number. It's not silly: it's what Google is doing with search results. When running a search query, Google gives you an estimate number of results (you can get a reasonable number using explain), and then will allow you to brows the top few thousand results -- nothing more. Among other things, they do so for performance reasons -- precisely the one you're running into.
I have upvoted Denis's answer, but will add a suggestion myself, perhaps it can be of some performance benefit for your specific use-case:
Assuming your actual table is not test_table, but some huge compound query, possibly with multiple joins. You could first determine the required starting id:
select id from test_table order by id offset 3900000 limit 1
This should be much faster than original query as it only requires to scan the index vs the entire table. Getting this id then opens up a fast index-search option for full fetch:
select * from test_table where id >= (what I got from previous query) order by id limit 100
You didn't say if your data is mainly read-only or updated often. If you can manage to create your table at one time, and only update it every now and then (say every few minutes) your problem will be easy to solve:
Add a new column "offset_id"
For your complete data set ordered by ID, create an offset_id simply by incrementing numbers: 1,2,3,4...
Instead of "offset ... limit 100" use "where offset_id >= 3900000 limit 100"
you can optimise in two steps
First get maximum id out of 3900000 records
select max(id) (select id from test_table order by id limit 3900000);
Then use this maximum id to get the next 100 records.
select * from test_table id > {max id from previous step) order by id limit 100 ;
It will be faster as both queries will do index scan by id.
This way you get the rows in semi-random order. You are not ordering the results in a query, so as a result, you get the data as it is stored in the files. The problem is that when you update the rows, the order of them can change.
To fix that you should add order by to the query. This way the query will return the rows in the same order. What's more then it will be able to use an index to speed the query up.
So two things: add an index, add order by to the query. Both to the same column. If you want to use the id column, then don't add index, just change the query to something like:
select * from test_table order by id offset 3900000 limit 100;
First, you have to define limit and offset with order by clause or you will get inconsistent result.
To speed up the query, you can have a computed index, but only for these condition :
Newly inserted data is strictly in id order
No delete nor update on column id
Here's how You can do it :
Create a row position function
create or replace function id_pos (id) returns bigint
as 'select count(id) from test_table where id <= $1;'
language sql immutable;
Create a computed index on id_pos function
create index table_by_pos on test_table using btree(id_pos(id));
Here's how You call it (offset 3900000 limit 100):
select * from test_table where id_pos(id) >= 3900000 and sales_pos(day) < 3900100;
This way, the query will not compute the 3900000 offset data, but only will compute the 100 data, making it much faster.
Please note the 2 conditions where this approach can take place, or the position will change.
I don't know all of the details of your data, but 4 million rows can be a little hefty. If there's a reasonable way to shard the table and essentially break it up into smaller tables it could be beneficial.
To explain this, let me use an example. let's say that I have a database where I have a table called survey_answer, and it's getting very large and very slow. Now let's say that these survey answers all come from a distinct group of clients (and I also have a client table keeping track of these clients). Then something I could do is I could make it so that I have a table called survey_answer that doesn't have any data in it, but is a parent table, and it has a bunch of child tables that actually contain the data the follow the naming format survey_answer_<clientid>, meaning that I'd have child tables survey_answer_1, survey_answer_2, etc., one for each client. Then when I needed to select data for that client, I'd use that table. If I needed to select data across all clients, I can select from the parent survey_answer table, but it will be slow. But for getting data for an individual client, which is what I mostly do, then it would be fast.
This is one example of how to break up data, and there are many others. Another example would be if my survey_answer table didn't break up easily by client, but instead I know that I'm typically only accessing data over a year period of time at once, then I could potentially make child tables based off of year, such as survey_answer_2014, survey_answer_2013, etc. Then if I know that I won't access more than a year at a time, I only really need to access maybe two of my child tables to get all the data I need.
In your case, all I've been given is perhaps the id. We can break it up by that as well (though perhaps not as ideal). Let's say that we break it up so that there's only about 1000000 rows per table. So our child tables would be test_table_0000001_1000000, test_table_1000001_2000000, test_table_2000001_3000000, test_table_3000001_4000000, etc. So instead of passing in an offset of 3900000, you'd do a little math first and determine that the table that you want is table test_table_3000001_4000000 with an offset of 900000 instead. So something like:
SELECT * FROM test_table_3000001_4000000 ORDER BY id OFFSET 900000 LIMIT 100;
Now if sharding the table is out of the question, you might be able to use partial indexes to do something similar, but again, I'd recommend sharding first. Learn more about partial indexes here.
I hope that helps. (Also, I agree with Szymon Guz that you want an ORDER BY).
Edit: Note that if you need to delete rows or selectively exclude rows before getting your result of 100, then sharding by id will become very hard to deal with (as pointed out by Denis; and sharding by id is not great to begin with). But if your 'just' paginating the data, and you only insert or edit (not a common thing, but it does happen; logs come to mind), then sharding by id can be done reasonably (though I'd still choose something else to shard on).
How about if paginate based on IDs instead of offset/limit?
The following query will give IDs which split all the records into chunks of size per_page. It doesn't depend on were records deleted or not
SELECT id AS from_id FROM (
SELECT id, (ROW_NUMBER() OVER(ORDER BY id DESC)) AS num FROM test_table
) AS rn
WHERE num % (per_page + 1) = 0;
With these from_IDs you can add links to the page. Iterate over :from_ids with index and add the following link to the page:
:from_id_index
When user visits the page retrieve records with ID which is greater than requested :from_id:
SELECT * FROM test_table WHERE ID >= :from_id ORDER BY id DESC LIMIT :per_page
For the first page link with from_id=0 will work
1
To avoid slow pagination with big tables always use auto-increment primary key then use the query below:
SELECT * FROM test_table WHERE id > (SELECT min(id) FROM test_table WHERE id > ((1 * 10) - 10)) ORDER BY id DESC LIMIT 10
1: is the page number
10: is the records per page
Tested and work well with 50 millions records.
There are two simple approaches to solve such a problem
Splitting the query into two subqueries that the first one do all the heavy job on index-only scan as described here
Create calculated index that holds the offset as described here, this can be enhanced using window functions.

How does this PostgreSQL query slow down when the number of rows increases?

I have a table briefly structured like this:
tn( id integer NOT NULL primary key DEFAULT nextval('tn_sequence'),
create_dt TIMESTAMP NOT NULL DEFAULT NOW(),
...............
deleted boolean );
create_dt is the timestamp when the row is inserted into the database.
deleted indicates that the row is or no longer useful.
And I have the following queries:
select * from tn where create_dt > ( NOW() - interval '150 seconds ) and deleted = FALSE;
select * from tn where create_dt < ( NOW() - interval '150 seconds ) and deleted = FALSE;
My question is how these query will slow down when the number of rows increase? For instance, when the number of rows exceeds 10K, 20K, or 100K, will it make a big impact on the speed? Is there any way I can optimize these queries? Note that every 5 seconds I will turn the column 'deleted' of rows which are older than 150 seconds into 'TRUE'.
The effect of table growth on performance will depend on the query plan chosen, available indexes, the selectivity of the query, and lots of other factors. EXPLAIN ANALYZE on the query might help. In short, if your query only selects a few rows and can use a simple b-tree index then it won't usually slow down tons, only a little as the index grows. On the other hand queries using complex non-indexed conditions or returning lots of rows could perform very badly indeed.
Your issue appears to mirror that in the question How should we handle rows which won't be queried once they are old in PostgreSQL?
The advice given there should apply:
Use a partial index with the condition WHERE (not deleted); or
partition on 'deleted' with constraint exclusion enabled.
For example, you might:
CREATE INDEX create_dt_when_not_deleted_idx
ON tn (create_dt)
WHERE (NOT deleted);
This includes only rows where deleted = 'f' (assuming deleted is `not null) in the index. This isn't the same as having them gone from the table completely.
Nothing changes with full table sequential scans, the deleted='t' rows must still be scanned; and
There's more I/O than if the deleted = 't' rows weren't there because any given heap page is likely to contain a mix of deleted = 't' and deleted = 'f' rows.
You can reduce the impact of the latter by CLUSTERing on an index that includes deleted. Again, this will have no effect on sequential scans. To help with sequential scans you would have to partition the table on deleted.
Pg 9.2's index only scans should (I think, haven't tested) use the partial index. When an index only scan is possible the partial index should be as fast as an index on a table containing only the deleted = 'f' rows.
Note that you'll need to keep table and index bloat under control. Ensure autovaccum runs very frequently and use a current version of PostgreSQL that doesn't need things like manually-managed free space map and has the latest, best-behaved autovacuum. I'd recommend 9.0 or above, preferably 9.1 or 9.2. Tune autovacuum to run aggressively.
When tuning and testing performance - test your queries with EXPLAIN ANALYZE, don't just guess.