lock the rows until next select postgres - postgresql

Is there a way in postgres to lock the rows until the next select query execution from the same system.And one more thing is there will be no update process on locked rows.
scenario is something like this
If the table1 contains data like
id | txt
-------------------
1 | World
2 | Text
3 | Crawler
4 | Solution
5 | Nation
6 | Under
7 | Padding
8 | Settle
9 | Begin
10 | Large
11 | Someone
12 | Dance
If sys1 executes
select * from table1 order by id limit 5;
then it should lock row from id 1 to 5 for other system which are executing select statement concurrently.
Later if sys1 again execute another select query like
select * from table1 where id>10 order by id limit 5;
then pereviously locked rows should be released.

I don't think this is possible. You cannot block a read only access to a table (unless that select is done FOR UPDATE)
As far as I can tell, the only chance you have is to use the pg_advisory_lock() function.
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
But this requires a "manual" release of the locks obtained through it. You won't get an automatic unlocking with that.
To lock the rows you would need something like this:
select pg_advisory_lock(id), *
from
(
select * table1 order by id limit 5
) t
(Note the use of the derived table for the LIMIT part. See the manual link I posted for an explanation)
Then you need to store the retrieved IDs and later call pg_advisory_unlock() for each ID.
If each process is always releasing all IDs at once, you could simply use pg_advisory_unlock_all() instead. Then you will not need to store the retrieved IDs.
Note that this will not prevent others from reading the rows using "normal" selects. It will only work if every process that accesses that table uses the same pattern of obtaining the locks.

It looks like you really have a transaction which transcends the borders of your database, and all the change happens in an another system.
My idea is select ... for update no wait to lock the relevant rows, then offload the data into another system, then rollback to unlock the rows. No two select ... for update queries will select the same row, and the second select will fail immediately rather than wait and proceed.
But you don't seem to mark offloaded records in any way; I don't see why two non-consecutive selects won't happily select overlapping range. So I'd still update the records with a flag and/or a target user name and would only select records with the flag unset.

I tried both select...for update and pg_try_advisory_lock and managed to get near my requirement.
/*rows are locking but limit is the problem*/
select * from table1 where pg_try_advisory_lock( id) limit 5;
.
.
$_SESSION['rows'] = $rowcount; // no of row to process
.
.
/*afer each process of word*/
$_SESSION['rows'] -=1;
.
.
/*and finally unlock locked rows*/
if ($_SESSION['rows']===0)
select pg_advisory_unlock_all() from table1
But there are two problem in this
1. As Limit will apply before lock, every time the same rows are trying to lock in different instance.
2. Not sure whether pg_advisory_unlock_all will unlock the rows locked by current instance or all the instance.

Related

How can you use 'For update skip locked' in postgres without locking rows in all tables used in the query?

When you want to use postgres's SELECT FOR UPDATE SKIP LOCKED functionality to ensure that two different users reading from a table and claiming tasks do not get blocked by each other and also do not get tasks already being read by another user:
A join is being used in the query to retrieve tasks. We do not want any other table to have row-level locking except the table that contains the main info. Sample query below - Lock only the rows in the table -'task' in the below query
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update skip locked;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: As per the comment below, the query will be changed to :
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo) ORDER BY v.id limit 200 for update of v skip locked;
How best to place order by such that rows are ordered? While the order would get effected if multiple users invoke this command, still some order sanctity should be maintained of the rows that are being returned.
Also, does this also ensure that multiple threads invoking the same select query would be retrieving a different set of rows or is the locking only done for update commands?
Just experimented with this a little bit - multiple select queries will end up retrieving different set of rows. Also, order by ensures the order of the final result obtained.
Yes,
FOR UPDATE OF "TABLE_NAME" SKIP LOCKED
will lock only TABLE_NAME

historical result of SELECT statement

I want to query a large number of rows and displayed to the user, however the user will see only for example 10 rows and I will use LIMIT and OFFSET, he will press 'next' button in the user interface and the next 10 rows will be fetched.
The database will be updated all the time, is there any way to guarantee that the user will see the data of the next 10 rows as they were in the first select, so any changes in the data will not be reflected if he choose to see the next 10 rows of result.
This is like using SELECT statement as a snapshot of the past, any subsequent updates after the first SELECT will not be visible for the subsequent SELECT LIMIT OFFSET.
You can use cursors, example:
drop table if exists test;
create table test(id int primary key);
insert into test
select i from generate_series(1, 50) i;
declare cur cursor with hold for
select * from test order by id;
move absolute 0 cur; -- get 1st page
fetch 5 cur;
id
----
1
2
3
4
5
(5 rows)
truncate test;
move absolute 5 cur; -- get 2nd page
fetch 5 cur; -- works even though the table is already empty
id
----
6
7
8
9
10
(5 rows)
close cur;
Note that it is rather expensive solution. A declared cursor creates a temporary table with a complete result set (a snapshot). This can cause significant server load. Personally I would rather search for alternative approaches (with dynamic results).
Read the documentation: DECLARE, FETCH.

Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

Question Summary
This is a question about serializability of queries within a SQL transaction.
Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".
To sum it up: If I have one primary table, and more-than-1 foreign-key-linked table connected to that primary table, how do I guarantee that, for a given key in the tables, and any number of SELECT statements using that key inside one transaction, each of which is SELECTing from any of the linked tables, I will get data as it existed at the time I started the transaction?
Other Questions
This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL:
Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005
Example
Let's say I have 3 tables:
bricks
brickworks (primary key)
completion_time (primary key)
has_been_sold
brick_colors
brickworks (primary key, foreign key pointing to "bricks")
completion_time (primary key, foreign key pointing to "bricks")
quadrant (primary key)
color
brick_weight
brickworks (primary key, foreign key pointing to "bricks")
completion_time (primary key, foreign key pointing to "bricks")
weight
A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.
Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.
Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.
At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.
An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).
The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.
If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.
The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).
This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).
Attempted Solution
Here's what I've come up with so far:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;
-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;
It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.
Is there any other way to do this?
References
PostgreSQL Concurrency Control
PostgreSQL Transcation Isolation
PostgreSQL SET TRANSACTION statement
This is the essence of your question:
how do I guarantee that, for ...... any number of SELECT statements
..... inside one transaction ....... I will get data as it existed at
the time I started the transaction?
This is exactly what Repeatable Read Isolation Level guarantees:
The Repeatable Read isolation level only sees data committed before
the transaction began; it never sees either uncommitted data or
changes committed during transaction execution by concurrent
transactions. (However, the query does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed.) This is a stronger guarantee than is required by the
SQL standard for this isolation level, and prevents all of the
phenomena described in Table 13-1. As mentioned above, this is
specifically allowed by the standard, which only describes the minimum
protections each isolation level must provide.
This level is different from Read Committed in that a query in a
repeatable read transaction sees a snapshot as of the start of the
transaction, not as of the start of the current query within the
transaction. Thus, successive SELECT commands within a single
transaction see the same data, i.e., they do not see changes made by
other transactions that committed after their own transaction started.
A practical example - let say we have 2 simple tables:
CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);
A number of tables, their structures, primary keys, foreign keys etc. are unimportant here.
Lets open a first session, start repeatable read isolation level, and run two simple and separate SELECT statements:
test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
x
---
1
2
3
(3 wiersze)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 wiersze)
Note that START TRANSACTION command automatically disables autocommit mode in the session.
Now in another session (with default autocommit mode enabled)insert a few records into t1:
test2=# INSERT INTO t1 VALUES(10),(11);
New values were inserded and automatically commited (because autocommit is on).
Now go back to the first session and run SELECT again:
test=# select * from t1;
x
---
1
2
3
(3 wiersze)
As you see, session1 (with active repeatable read transaction) doesn't see any changes commited after the start of the transation.
Lets do the same experiment whit table t2 - go to the second session and issue:
test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1
Now go back to the first session and run SELECT again:
test=# SELECT * FROM t2;
y
---
1
2
3
(3 wiersze)
As you see, again, session1 (with active repeatable read transaction) doesn't see any changes commited after the start of the transation.
And now, in session1, finish the transaction issuing COMMIT, and then SELECT:
test=# SELECT * FROM t1;
x
---
1
2
3
(3 wiersze)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 wiersze)
test=# COMMIT;
COMMIT
test=# select * from t1;
x
----
1
2
3
10
11
(5 wierszy)
test=# select * from t2;
y
---
1
3
(2 wiersze)
As you see, when the repeatable read transaction is started and active, you can run many separate select statement multiple times, and all of these select statements see the same stable snapshot of data as of the start of the transaction, regardles of any commited data in other sessions.

Counting entries depending on foreign key

I have 2 tables, let's call them T_FATHER and T_CHILD, where each father can have multiple childs, like so:
T_FATHER
--------------------------
ID - BIGINT, from Generator
T_CHILD
-------------------------------
ID - BIGINT, from Generator
FATHER_ID - BIGINT, Foreign Key
Now I want to add a counter to the T_CHILD table, that starts with 1 and adds 1 for every new child, but not globally, but per father, like:
ID | FATHER_ID | COUNTER |
--------------------------
1 | 1 | 1 |
--------------------------
2 | 1 | 2 |
--------------------------
3 | 2 | 1 |
--------------------------
My initial thought was creating a before-insert-trigger that counts how many childs are present for the given father and add 1 for the counter. This should work fine unless there are 2 inserts at the same time, which would end with the same counter. Chances are high that this never actually happens - but better save than sorry.
I don't know if it is possible to use a generator, but don't think so as there would have to be a generator per father.
My current approach is using the aforementioned trigger and add a unique index to FATHER_ID + COUNTER, so that only one of the simultaneous inserts goes through. I will have to handle the exception client-side (and reattempt the failed insert).
Is there a better way to handle this directly in Firebird?
PS: There won't be any deletes on any of the two tables, so this is not an issue.
Even with a generator per FATHER_ID you couldn't use them for this, because generators are not transaction safe. If your transaction is rolled back for whatever reason, the generator will have increased anyway, causing gaps.
If there are no deletes, I think your approach with a unique constraint is valid. I would consider an alternative however.
You could decide not to store the counter as such – storing counters in a database is often a bad idea. Instead, only track the insertion order. For that, a generator is usable, because every new record will have a higher value and gaps won't matter. In fact, you don't need anything but the ID you already have. Determine the numbering when selecting; for every child you want to know how many children there are with the same father but a lower ID. As a bonus, deletes would work normally.
Here's a proof of concept using a nested query:
SELECT ID, FATHER_ID,
(SELECT 1 + COUNT(*)
FROM T_CHILD AS OTHERS
WHERE OTHERS.FATHER_ID = C.FATHER_ID
AND OTHERS.ID < C.ID) AS COUNTER
FROM T_CHILD AS C
There's also the option of a window function. It has to have a derived table to also count any rows that are ultimately not being selected:
SELECT * FROM (
SELECT ID, FATHER_ID,
ROW_NUMBER() OVER(PARTITION BY FATHER_ID ORDER BY ID) AS COUNTER
FROM T_CHILD
-- Filtering that wouldn't affect COUNTER (e.g. WHERE FATHER_ID ... AND ID < ...)
)
-- Filtering that would affect COUNTER (e.g. WHERE ID > ...)
These two options have completely different performance characteristics. Which one, if either at all, is suitable for you depends on your data size and access patterns.
And when you try with a computed field and the Select solution of Thijs van Dien ?
CREATE TABLE T_CHILD(
ID INTEGER,
FATHER_ID INTEGER,
COUNTER COMPUTED BY (
(SELECT 1 + COUNT(*)
FROM T_CHILD AS OTHERS
WHERE OTHERS.FATHER_ID = T_CHILD.FATHER_ID
AND OTHERS.ID < T_CHILD.ID)
)
);
During the insert, you should just do a "Select...count + 1" directly inside that field.
But I would probably reconsider adding that field in the first place. It feels like redundant information that could easily be deduced at the moment you need it.(For example, by using DENSE_RANK http://www.firebirdfaq.org/faq343/)

SQLite - a smart way to remove and add new objects

I have a table in my database and I want for each row in my table to have an unique id and to have the rows named sequently.
For example: I have 10 rows, each has an id - starting from 0, ending at 9. When I remove a row from a table, lets say - row number 5, there occurs a "hole". And afterwards I add more data, but the "hole" is still there.
It is important for me to know exact number of rows and to have at every row data in order to access my table arbitrarily.
There is a way in sqlite to do it? Or do I have to manually manage removing and adding of data?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you re-think your design. In my opinion your asking yourself for troubles in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
Sqlite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
If you want to reclaim deleted row ids the VACUUM command or pragma may be what you seek,
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum