How can you use FOR UPDATE SKIP LOCKED in Postgres without locking rows in all tables used in the query?

We want to use Postgres's SELECT FOR UPDATE SKIP LOCKED functionality so that two different users reading from a table and claiming tasks neither block each other nor receive tasks already claimed by the other user.
The query uses a join to retrieve tasks, and we do not want row-level locks on any table except the one that contains the main info. In the sample query below, only the rows of the 'task' table should be locked:
SELECT v.someid, v.info, v.parentinfo_id, v.stage
FROM task v, parentinfo pi
WHERE v.stage = 'READY_TASK'
  AND v.parentinfo_id = pi.id
  AND pi.important_info_number = (SELECT MAX(important_info_number) FROM parentinfo)
ORDER BY v.id
LIMIT 200
FOR UPDATE SKIP LOCKED;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: As per the comment below, the query will be changed to:
SELECT v.someid, v.info, v.parentinfo_id, v.stage
FROM task v, parentinfo pi
WHERE v.stage = 'READY_TASK'
  AND v.parentinfo_id = pi.id
  AND pi.important_info_number = (SELECT MAX(important_info_number) FROM parentinfo)
ORDER BY v.id
LIMIT 200
FOR UPDATE OF v SKIP LOCKED;
How best to place the ORDER BY so that the rows come back ordered? The ordering will be affected when multiple users invoke this command concurrently, but some ordering of the returned rows should still be maintained.
Also, does this ensure that multiple threads invoking the same select query retrieve different sets of rows, or is the locking only done for update commands?

Just experimented with this a little bit - multiple select queries end up retrieving different sets of rows, and ORDER BY does determine the order of the final result.

Yes,
FOR UPDATE OF table_name SKIP LOCKED
will lock only the rows of table_name. In the query above, FOR UPDATE OF v locks only rows of the task table.
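
For illustration, a claim-and-mark sketch built on the query above. Everything comes from the question except the 'IN_PROGRESS' stage value and the assumption that id is the task table's key, which are hypothetical. Two sessions running this concurrently receive disjoint sets of up to 200 rows:

BEGIN;

WITH claimed AS (
    SELECT v.id
    FROM task v
    JOIN parentinfo pi ON v.parentinfo_id = pi.id
    WHERE v.stage = 'READY_TASK'
      AND pi.important_info_number = (SELECT MAX(important_info_number) FROM parentinfo)
    ORDER BY v.id
    LIMIT 200
    FOR UPDATE OF v SKIP LOCKED     -- locks rows of task only, not parentinfo
)
UPDATE task t
SET stage = 'IN_PROGRESS'           -- hypothetical "claimed" stage value
FROM claimed c
WHERE t.id = c.id
RETURNING t.id;

COMMIT;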

Related

Does Postgres lock more rows than the limit provided?

If the LIMIT is applied to the rows returned from a query, does this mean that more rows could be locked than are returned?
Like this:
select * from myTable where status = 'READY' limit 10 FOR UPDATE
If there are 1000 rows in a status of READY, does it row lock them all but only return 10?
I am seeing quite a costly LockRows node in my explain plan and am trying to understand why.
Thanks
From the documentation, it seems pretty clear that only the actual records returned by your select query would be locked:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends.
That being said, one possible explanation for why the LockRows operation seems so costly is that, in order to isolate the 10 records you want for locking, Postgres first has to find the matching rows (and sort them, if there is an ORDER BY). Without an index to help, this involves scanning the entire table, which for a large table could take some time.
Let's say this were your actual query:
select * from myTable where status = 'READY' order by some_col limit 10 FOR UPDATE
This query would benefit from the following index:
create index idx on myTable (status, some_col);
The first column in the index, status, lets Postgres discard records not matching the WHERE filter. After that, the index also covers some_col, which means Postgres can find the 10 records you want already in the correct order.
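
If you want to verify what the planner does, you could compare the plans with and without the index (a sketch using the hypothetical table above; note that EXPLAIN ANALYZE actually executes the query, so it will take the row locks):

explain (analyze, buffers)
select * from myTable where status = 'READY' order by some_col limit 10 for update;
-- with the index in place, expect an Index Scan feeding LockRows
-- instead of a Seq Scan plus Sort over the whole table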

lock the selected rows until update operation

Can I lock particular selected rows until the updates are done?
We have written a query in Java (CrudRepository) that selects the rows, processes them, and then updates them.
The issue: if two requests come in and I have 2 rows available for the requested criteria, then currently the same row is selected by the select query for both requests. Our expectation is to select different rows rather than the same one.
If you want the SELECT to lock the selected rows, and you don't want two concurrent queries to select the same rows, use this:
SELECT ... FROM atable
WHERE ...
FOR UPDATE SKIP LOCKED;
Then the first query will fetch and lock all rows that meet the WHERE condition, while the second query will see nothing.
Two comments:
You need to use a transaction that contains both the SELECT and the UPDATE, because locks are released at transaction end.
You can add a LIMIT clause to the SELECT to select and lock a certain maximum of rows.
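
Putting both comments together, a minimal sketch (the processed flag column and the id column are hypothetical):

BEGIN;

-- lock up to 10 unprocessed rows; concurrent sessions skip them
SELECT id
FROM atable
WHERE processed = false          -- hypothetical flag column
ORDER BY id
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- ... process the returned rows in the application ...

UPDATE atable
SET processed = true
WHERE id = 42;                   -- illustrative: one of the ids returned above

COMMIT;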

How can I get the total run time of a query in redshift, with a query?

I'm in the process of benchmarking some queries in redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query id, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summarization I need!
The GUI dashboard shows the metrics I'm interested in, but the format is difficult to store for later analysis/comparison (in other words, I want to avoid taking screenshots). Is there a good way to rebuild that view with SQL selects?
To add to Alex's answer: the stl_query table has the inconvenience that if the query sat in a queue before running, the queue time is included in the run time, so the run time is not a very good indicator of the query's performance.
To get the actual runtime of the query, check stl_wlm_query for total_exec_time (reported in microseconds).
select total_exec_time
from stl_wlm_query
where query = <query_id>;  -- the numeric id from stl_query
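
Since the times are in microseconds, a conversion is handy; a sketch with a placeholder query id:

select query,
       total_queue_time / 1000000.0 as queue_seconds,
       total_exec_time  / 1000000.0 as exec_seconds
from stl_wlm_query
where query = 12345;  -- placeholder query id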
There are some useful tools/scripts in https://github.com/awslabs/amazon-redshift-utils
Here is one of those scripts, stripped down to give you query run times in milliseconds. Play with the filters, ordering, etc. to show the results you are looking for:
select userid,
       label,
       stl_query.query,
       trim(database) as database,
       trim(querytxt) as qrytext,
       starttime,
       endtime,
       datediff(milliseconds, starttime, endtime)::numeric(12,2) as run_milliseconds,
       aborted,
       decode(alrt.event,
              'Very selective query filter', 'Filter',
              'Scanned a large number of deleted rows', 'Deleted',
              'Nested Loop Join in the query plan', 'Nested Loop',
              'Distributed a large number of rows across the network', 'Distributed',
              'Broadcasted a large number of rows across the network', 'Broadcast',
              'Missing query planner statistics', 'Stats',
              alrt.event) as event
from stl_query
left outer join (
    select query, trim(split_part(event, ':', 1)) as event
    from stl_alert_event_log
    group by query, trim(split_part(event, ':', 1))
) as alrt on alrt.query = stl_query.query
where userid <> 1
-- and (querytxt like 'SELECT%' or querytxt like 'select%')
-- and database = ''
order by starttime desc
limit 100;

returning rows from an implicit JOIN where results either exist in both tables OR only one table

I've got a PostgreSQL-9.0.x database that manages an automated testing environment. There are a bunch of tables that contain assorted static data (OS versions, test names, etc.) named 'buildlist' & 'osversmap'. However, there are also two tables that contain data that changes often. The first is a 'pending' table, which is effectively a test queue where pending tests are self-selected by the test systems and then deleted when the test run has completed. The second is a 'results' table, which contains the test results as they are produced (in progress and completed).
The records in the pending table have a one to many relationship with the records in the results table (each row in pending can have 0 or more rows in results). For example, if no test systems have self-assigned a pending row, then there will be zero associated rows in results, and then once a pending row is assigned, the number of rows in results will increase for each pending row. An added catch is that I always want only the newest results table row associated with each pending table row. What I need to do is query the 'pending' table for pending tests, and then also get a 'logurl' from the results table that corresponds to each pending table row.
All of this is rather similar to this problem, except that I have the added burden of the two additional tables with the static data (buildlist & osversmap):
PHP/SQL: Using only one query, SELECT rows from two tables if data is in both tables, or just SELECT from one table if not
I'm stumbling over how to integrate those two tables with static data into the query. The following query works fine as long as there's at least one row in the 'results' table that corresponds to each row in the pending table (however, it doesn't return anything for rows that exist in 'pending' but not yet in 'results'):
SELECT
pending.cl,
pending.id,
pending.buildid,
pending.build_type,
pending.active,
pending.submittracker,
pending.os,
pending.arch,
pending.osversion,
pending.branch,
pending.comment,
osversmap.osname,
buildlist.buildname,
results.logurl
FROM pending, osversmap, buildlist, results
WHERE
pending.buildid=buildlist.id
AND pending.os=osversmap.os
AND pending.osversion=osversmap.osversion
AND pending.owner='$owner'
AND pending.completed='f'
AND results.hostname=pending.active
AND results.submittracker=pending.submittracker
AND pending.cl=results.cl
AND results.current_status!='PASSED'
AND results.current_status NOT LIKE '%FAILED'
ORDER BY pending.submittracker,pending.branch,pending.os,pending.arch
thanks in advance!
Does the following work for you?
SELECT pending.cl,
pending.id,
pending.buildid,
pending.build_type,
pending.active,
pending.submittracker,
pending.os,
pending.arch,
pending.osversion,
pending.branch,
pending.comment,
osversmap.osname,
buildlist.buildname,
results.logurl
FROM pending
JOIN osversmap
ON ( pending.os = osversmap.os
AND pending.osversion = osversmap.osversion )
JOIN buildlist
ON ( pending.buildid = buildlist.id )
LEFT OUTER JOIN results
ON ( pending.active = results.hostname
AND pending.submittracker = results.submittracker
AND pending.cl = results.cl
AND results.current_status != 'PASSED'
AND results.current_status NOT LIKE '%FAILED'
)
WHERE pending.owner = '$owner'
AND pending.completed = 'f'
ORDER BY pending.submittracker,
pending.branch,
pending.os,
pending.arch
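
The reason this works where the original query did not: the conditions on results live in the LEFT JOIN's ON clause. If they sat in the WHERE clause instead, the NULLs produced for pending rows with no matching results row would fail the filter, turning the outer join back into an inner join. A stripped-down illustration using the question's columns:

SELECT p.id, r.logurl
FROM pending p
LEFT JOIN results r
       ON r.cl = p.cl
      AND r.current_status <> 'PASSED'  -- filters on the outer table go in ON
WHERE p.completed = 'f';                -- filters on pending can stay in WHERE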
All of this is rather similar to this problem, except that I have the added burden of the two additional tables with the static data (buildlist & osversmap):
The simplest approach would be to build a view that returns the right rows without referencing buildlist and osversmap, then join those two tables to the view.
CREATE VIEW your_meaningful_view_name AS
SELECT
pending.cl,
pending.id,
pending.buildid,
pending.build_type,
pending.active,
pending.submittracker,
pending.os,
pending.arch,
pending.osversion,
pending.branch,
pending.comment,
results.logurl
FROM pending
-- No DDL or sample INSERT statements. You might need an outer join.
INNER JOIN results
ON (results.hostname=pending.active AND
results.submittracker=pending.submittracker AND
results.cl=pending.cl)
WHERE pending.owner='$owner' AND
pending.completed='f' AND
-- Are *both* these really necessary?
results.current_status!='PASSED' AND
results.current_status NOT LIKE '%FAILED'
And then
SELECT osversmap.osname, buildlist.buildname, ymvn.*
FROM your_meaningful_view_name ymvn
INNER JOIN osversmap ON ymvn.os=osversmap.os
AND ymvn.osversion=osversmap.osversion
INNER JOIN buildlist ON ymvn.buildid=buildlist.id
ORDER BY ymvn.submittracker, ymvn.branch, ymvn.os, ymvn.arch

lock the rows until next select postgres

Is there a way in Postgres to lock rows until the next select query execution from the same system? One more thing: there will be no update process on the locked rows.
The scenario is something like this:
If the table1 contains data like
id | txt
-------------------
1 | World
2 | Text
3 | Crawler
4 | Solution
5 | Nation
6 | Under
7 | Padding
8 | Settle
9 | Begin
10 | Large
11 | Someone
12 | Dance
If sys1 executes
select * from table1 order by id limit 5;
then it should lock rows with id 1 to 5 against other systems that are executing select statements concurrently.
Later, if sys1 executes another select query such as
select * from table1 where id > 10 order by id limit 5;
then the previously locked rows should be released.
I don't think this is possible. You cannot block read-only access to a table (unless that select is done FOR UPDATE).
As far as I can tell, the only chance you have is to use the pg_advisory_lock() function.
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
But this requires a "manual" release of the locks obtained through it. You won't get an automatic unlocking with that.
To lock the rows you would need something like this:
select pg_advisory_lock(id), *
from
(
    select * from table1 order by id limit 5
) t
(Note the use of the derived table for the LIMIT part. See the manual link I posted for an explanation)
Then you need to store the retrieved IDs and later call pg_advisory_unlock() for each ID.
If each process is always releasing all IDs at once, you could simply use pg_advisory_unlock_all() instead. Then you will not need to store the retrieved IDs.
Note that this will not prevent others from reading the rows using "normal" selects. It will only work if every process that accesses that table uses the same pattern of obtaining the locks.
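
For completeness, releasing looks like this later in the session (42 stands in for one of the stored IDs):

-- release one lock, using the same key that was passed to pg_advisory_lock():
SELECT pg_advisory_unlock(42);

-- or drop every advisory lock this session holds:
SELECT pg_advisory_unlock_all();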
It looks like you really have a transaction that transcends the borders of your database, and all the changes happen in another system.
My idea is to SELECT ... FOR UPDATE NOWAIT to lock the relevant rows, then offload the data into the other system, then ROLLBACK to unlock the rows. No two SELECT ... FOR UPDATE queries will select the same row, and the second select will fail immediately rather than wait and proceed.
But you don't seem to mark offloaded records in any way, so I don't see why two non-consecutive selects wouldn't happily select overlapping ranges. So I'd still update the records with a flag and/or a target user name, and would only select records with the flag unset.
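
A sketch of that flag-based approach, combined with FOR UPDATE NOWAIT (the claimed_by column is hypothetical; table1, id, and txt are from the question):

WITH picked AS (
    SELECT id
    FROM table1
    WHERE claimed_by IS NULL        -- hypothetical column marking unclaimed rows
    ORDER BY id
    LIMIT 5
    FOR UPDATE NOWAIT
)
UPDATE table1
SET claimed_by = 'sys1'             -- record which system claimed the rows
FROM picked
WHERE table1.id = picked.id
RETURNING table1.id, table1.txt;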
I tried both select...for update and pg_try_advisory_lock and managed to get near my requirement.
/* rows are locked, but LIMIT is the problem */
select * from table1 where pg_try_advisory_lock(id) limit 5;
.
.
$_SESSION['rows'] = $rowcount; // number of rows to process
.
.
/* after each word is processed */
$_SESSION['rows'] -= 1;
.
.
/* and finally unlock the locked rows */
if ($_SESSION['rows'] === 0)
    select pg_advisory_unlock_all();
But there are two problems with this:
1. As the LIMIT is applied before the lock, each instance tries to lock the same rows every time.
2. Not sure whether pg_advisory_unlock_all will unlock only the rows locked by the current instance, or those locked by all instances.