Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

Question Summary
This is a question about serializability of queries within a SQL transaction.
Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".
To sum it up: If I have one primary table, and more-than-1 foreign-key-linked table connected to that primary table, how do I guarantee that, for a given key in the tables, and any number of SELECT statements using that key inside one transaction, each of which is SELECTing from any of the linked tables, I will get data as it existed at the time I started the transaction?
Other Questions
This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL:
Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005
Example
Let's say I have 3 tables:
bricks
- brickworks (primary key)
- completion_time (primary key)
- has_been_sold

brick_colors
- brickworks (primary key, foreign key pointing to "bricks")
- completion_time (primary key, foreign key pointing to "bricks")
- quadrant (primary key)
- color

brick_weight
- brickworks (primary key, foreign key pointing to "bricks")
- completion_time (primary key, foreign key pointing to "bricks")
- weight
A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.
Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.
Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.
At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.
An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).
The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.
If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.
The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).
This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).
Attempted Solution
Here's what I've come up with so far:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;
-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;
It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.
Is there any other way to do this?
References
PostgreSQL Concurrency Control
PostgreSQL Transaction Isolation
PostgreSQL SET TRANSACTION statement

This is the essence of your question:
how do I guarantee that, for ...... any number of SELECT statements
..... inside one transaction ....... I will get data as it existed at
the time I started the transaction?
This is exactly what Repeatable Read Isolation Level guarantees:
The Repeatable Read isolation level only sees data committed before
the transaction began; it never sees either uncommitted data or
changes committed during transaction execution by concurrent
transactions. (However, the query does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed.) This is a stronger guarantee than is required by the
SQL standard for this isolation level, and prevents all of the
phenomena described in Table 13-1. As mentioned above, this is
specifically allowed by the standard, which only describes the minimum
protections each isolation level must provide.
This level is different from Read Committed in that a query in a
repeatable read transaction sees a snapshot as of the start of the
transaction, not as of the start of the current query within the
transaction. Thus, successive SELECT commands within a single
transaction see the same data, i.e., they do not see changes made by
other transactions that committed after their own transaction started.
A practical example - let's say we have 2 simple tables:
CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);
The number of tables, their structures, primary keys, foreign keys, etc. are unimportant here.
Let's open a first session, start a transaction at the REPEATABLE READ isolation level, and run two simple, separate SELECT statements:
test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
Note that the START TRANSACTION command automatically disables autocommit mode in the session.
Now, in another session (with default autocommit mode enabled), insert a few records into t1:
test2=# INSERT INTO t1 VALUES(10),(11);
The new values were inserted and automatically committed (because autocommit is on).
Now go back to the first session and run SELECT again:
test=# select * from t1;
x
---
1
2
3
(3 rows)
As you can see, session 1 (with an active REPEATABLE READ transaction) doesn't see any changes committed after the start of the transaction.
Let's do the same experiment with table t2 - go to the second session and issue:
test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1
Now go back to the first session and run SELECT again:
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
As you can see, again, session 1 (with an active REPEATABLE READ transaction) doesn't see any changes committed after the start of the transaction.
And now, in session 1, finish the transaction by issuing COMMIT, and then SELECT again:
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
test=# COMMIT;
COMMIT
test=# select * from t1;
x
----
1
2
3
10
11
(5 rows)
test=# select * from t2;
y
---
1
3
(2 rows)
As you can see, while a REPEATABLE READ transaction is started and active, you can run many separate SELECT statements multiple times, and all of these SELECT statements see the same stable snapshot of the data as of the start of the transaction, regardless of any data committed by other sessions.
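Applied to the brick example from the original question, this means the placeholder SELECT with the JOINs is unnecessary - a sketch, using the table and column names from the question (note that in REPEATABLE READ the snapshot is taken at the first query of the transaction, not at BEGIN itself):

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY ;

-- every SELECT below sees the same snapshot, taken at the first query
SELECT * FROM bricks
 WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors
 WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight
 WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;

COMMIT ;
```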

Related

REPEATABLE READ isolation level - successive SELECTs return different values

I'm trying to check the REPEATABLE READ isolation level in PostgreSQL, specifically the statement:
successive SELECT commands within a single transaction see the same
data, i.e., they do not see changes made by other transactions that
committed after their own transaction started.
To check that, I run two scripts in two Query Editor windows in pgAdmin.
The 1st one:
begin ISOLATION LEVEL REPEATABLE READ;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
select * into t1 from topic where id=1;
select pg_sleep(5);
select * into t2 from topic where id=1 ;
commit;
Here we have two successive SELECTs with a pause of 5 seconds between them, just to leave time to run the 2nd UPDATE script. t1 and t2 are tables used to save the results; thanks to them I'm able to check the selected data after the scripts have executed.
I run the 2nd script immediately after the first:
begin ISOLATION LEVEL REPEATABLE READ;
update topic set title='new value' where id=1;
commit;
It is timed so that it commits after the 1st SELECT but before the 2nd one.
The problem is that the two successive SELECTs return different values - the results in t1 and t2 differ. I expected them to be the same. Could you explain why this is happening?
Maybe pg_sleep starts a transaction implicitly?

Understanding Postgres SERIALIZABLE isolation level

I have two experiments that do not work as expected after reading the Postgres documentation. I use Postgres 12.
Experiment 1
Data preparation:
CREATE TABLE Test
(
id SERIAL primary key,
level int,
value int
);
INSERT INTO Test (
level,
value
)
SELECT 1, 10 UNION ALL
SELECT 1, 20 UNION all
SELECT 1, 30 UNION ALL
SELECT 2, 100;
Then I open two query windows
Window 1
BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
INSERT INTO Test(level, value)
SELECT 2, SUM(value)
FROM Test
WHERE level = 1
Window 2
BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
INSERT INTO Test(level, value)
SELECT 3, SUM(value)
FROM Test
WHERE level = 2
Now, if I commit Window 1 first and then Window 2, Window 2 fails because the data it has read is stale, which is expected. However, if I commit Window 2 first and then Window 1, Window 1 fails - but why? Window 2 has already committed; its result was not affected by Window 1, and the result of Window 1 was not affected either. Therefore, I don't understand why the Window 1 commit fails after the Window 2 commit.
Experiment 2
It is very similar to experiment 1, but now different levels are stored in different tables.
Data preparation
CREATE TABLE Level1 (
id SERIAL primary key,
value int
);
CREATE TABLE Level2 (
id SERIAL primary key,
value int
);
CREATE TABLE Level3 (
id SERIAL primary key,
value int
);
INSERT INTO Level1 (
value
)
SELECT 10 UNION ALL
SELECT 20 UNION all
SELECT 30;
INSERT INTO Level2 (
value
)
SELECT 100;
Window 1
BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
INSERT INTO Level2(value)
SELECT SUM(value)
FROM Level1
Window 2
BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
INSERT INTO Level3(value)
SELECT SUM(value)
FROM Level2
Now both windows commit successfully in whatever order I commit them! This totally confuses me. Logically, it is the same as experiment 1; I would understand if it worked as in experiment 1, but here serializable isolation does not seem to be working at all!
In experiment 1, both transactions read and write the same table, so there is potential for a conflict.
You are right that there is no actual conflict, but the SELECT statements you ran performed sequential scans, which read all rows and consequently place a predicate lock on the whole table. That is why you get a false-positive serialization error.
Compare what the documentation has to say:
Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query
While PostgreSQL's Serializable transaction isolation level only allows concurrent transactions to commit if it can prove there is a serial order of execution that would produce the same effect, it doesn't always prevent errors from being raised that would not occur in true serial execution.
This does not happen in your second experiment. The two transactions can be serialized (just as in your first experiment): first the one in window 2, then the one in window 1. This time there is no false-positive serialization error.
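You can observe the predicate locks yourself by querying pg_locks from a third session while the transactions in the windows are still open - a sketch:

```sql
-- SIReadLocks held by open serializable transactions;
-- a relation-level entry indicates a whole-table predicate lock,
-- typically the result of a sequential scan
SELECT locktype, relation::regclass AS relation, page, tuple, pid
FROM pg_locks
WHERE mode = 'SIReadLock';
```

In experiment 1 you should see relation-level SIReadLocks on Test from both sessions; in experiment 2 the locks are on different tables, so they cannot conflict.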

Are race conditions possible with PostgreSQL auto-increment

Are there any conditions under which records created in a table using a typical auto-increment field would be available for read out of sequence?
For instance, could a record with value 10 ever appear in the result of a select query when the record with value 9 is not yet visible to a select query?
The purpose for my question is… I want to know if it is reliable to use the maximum value retrieved from one query as the lower bound to identify previously unretrieved values in a later query, or could that potentially miss a row?
If that kind of race condition is possible under some circumstances, are there any isolation levels for the SELECT queries that are immune to that problem?
Yes, and good on you for thinking about it.
You can trivially demonstrate this with three concurrent psql sessions, given some table
CREATE TABLE x (
seq serial primary key,
n integer not null
);
then
SESSION 1                       SESSION 2                       SESSION 3
BEGIN;
                                BEGIN;
INSERT INTO x(n) VALUES (1);
                                INSERT INTO x(n) VALUES (2);
                                COMMIT;
                                                                SELECT * FROM x;
COMMIT;
                                                                SELECT * FROM x;
It is not safe to assume that for any generated value n, all generated values n-1 have been used by already-committed or already-aborted xacts. They might be in progress and commit after you see n.
I don't think isolation levels really help you here. There's no mutual dependency for SERIALIZABLE to detect.
This is partly why logical decoding was added, so you can get a consistent stream in commit order.
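As a sketch of that approach, using the built-in test_decoding output plugin (the slot name is arbitrary; the server must be running with wal_level = logical):

```sql
-- create a logical replication slot
SELECT pg_create_logical_replication_slot('commit_order_slot', 'test_decoding');

-- consume changes: they are returned in commit order,
-- regardless of the order in which the INSERTs were executed
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('commit_order_slot', NULL, NULL);
```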

In-order sequence generation

Is there a way to generate some kind of in-order identifier for a table's records?
Suppose that we have two threads doing queries:
Thread 1:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
Thread 2:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'world');
commit;
It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello').
That's fine, but I want a way to get all the records in the 'table1' that appeared since the last time the external observer checked it.
So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?
No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.
Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.
The tuple ID (ctid) contains the file block number and position in the block for the row. So this represents the current physical ordering on disk. Later additions will have a bigger ctid, normally. Your SELECT statement could look like this
SELECT *, ctid -- save ctid from last row in last_ctid
FROM tbl
WHERE ctid > last_ctid
ORDER BY ctid
ctid has the data type tid. Example: '(0,9)'::tid
However, it is not stable as a long-term identifier, since VACUUM, any concurrent UPDATE, or some other operations can change the physical location of a tuple at any time. For the duration of a transaction it is stable, though. And if you are just inserting and nothing else, it should work locally for your purpose.
I would add a timestamp column with default now() in addition to the serial column ...
I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id would be inserted at a later time. Detailed instructions:
Auto increment table column
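A sketch of a table definition following this advice (the column names are illustrative), letting column defaults populate both the id and a timestamp:

```sql
CREATE TABLE table1 (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now(),
  value      text
);

-- omit id and created_at so the defaults fetch them as late as possible
INSERT INTO table1 (value) VALUES ('hello'), ('world');
```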
What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.
This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily happen in the same order that the row itself was inserted. It's subject to operating-system CPU scheduler ordering decisions, etc.
Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.
There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.
You can use advisory locking, but the effect is the same.
Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.
You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.
I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.
Maybe you'll be better off doing small-block batched inserts from a single session?
If you mean that every query that sees the world row must also see the hello row, then you'd need to do:
begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.
Be aware that this will not make this sequence gap-less — this is a different issue.
We found another solution with recent PostgreSQL servers, similar to #erwin's answer but with txid.
When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.
Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).
txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction ID of the oldest still-open transaction. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 will still be open. When transaction 19 is committed, both rows 19 and 20 will become visible.
When no transaction is opened, the snapshot xmin will be the transaction id of the currently running SELECT statement.
The returned transaction IDs are 64-bits, the higher 32 bits are an epoch and the lower 32 bits are the actual ID.
Here is the documentation of these functions: https://www.postgresql.org/docs/9.6/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
Credits to tux3 for the idea.
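A sketch of this pattern (:last_seen_id is a placeholder for the value the observer saved from its previous poll):

```sql
-- writer: use the transaction ID as the row id instead of a sequence
INSERT INTO table1 (id, value) VALUES (txid_current(), 'hello');

-- reader: only rows whose writing transaction can no longer be in
-- flight, i.e. with an id below the oldest still-open transaction ID
SELECT *
FROM table1
WHERE id < txid_snapshot_xmin(txid_current_snapshot())
  AND id > :last_seen_id
ORDER BY id;
```

Note the id column must be bigint, since the returned transaction IDs are 64-bit values.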

lock the rows until next select postgres

Is there a way in Postgres to lock rows until the next SELECT query execution from the same system? Also note that there will be no update process on the locked rows.
The scenario is something like this:
If the table1 contains data like
id | txt
-------------------
1 | World
2 | Text
3 | Crawler
4 | Solution
5 | Nation
6 | Under
7 | Padding
8 | Settle
9 | Begin
10 | Large
11 | Someone
12 | Dance
If sys1 executes
select * from table1 order by id limit 5;
then it should lock the rows with id 1 to 5 against other systems that are executing SELECT statements concurrently.
Later, if sys1 executes another SELECT query like
select * from table1 where id>10 order by id limit 5;
then the previously locked rows should be released.
I don't think this is possible. You cannot block read-only access to a table (unless that SELECT is done FOR UPDATE).
As far as I can tell, the only chance you have is to use the pg_advisory_lock() function.
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
But this requires a "manual" release of the locks obtained through it. You won't get an automatic unlocking with that.
To lock the rows you would need something like this:
select pg_advisory_lock(id), *
from
(
select * from table1 order by id limit 5
) t
(Note the use of the derived table for the LIMIT part. See the manual link I posted for an explanation)
Then you need to store the retrieved IDs and later call pg_advisory_unlock() for each ID.
If each process is always releasing all IDs at once, you could simply use pg_advisory_unlock_all() instead. Then you will not need to store the retrieved IDs.
Note that this will not prevent others from reading the rows using "normal" selects. It will only work if every process that accesses that table uses the same pattern of obtaining the locks.
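For the release step, a sketch (assuming the application remembered the five locked IDs; the VALUES list is illustrative):

```sql
-- release one advisory lock per remembered id
SELECT pg_advisory_unlock(id)
FROM (VALUES (1), (2), (3), (4), (5)) AS t(id);

-- or, if this session should drop everything it holds:
SELECT pg_advisory_unlock_all();
```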
It looks like you really have a transaction which transcends the borders of your database, and all the change happens in another system.
My idea is SELECT ... FOR UPDATE NOWAIT to lock the relevant rows, then offload the data into another system, then ROLLBACK to unlock the rows. No two SELECT ... FOR UPDATE queries will select the same row, and the second SELECT will fail immediately rather than wait and proceed.
But you don't seem to mark offloaded records in any way, so I don't see why two non-consecutive SELECTs won't happily select overlapping ranges. So I'd still UPDATE the records with a flag and/or a target user name, and would only select records with the flag unset.
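A sketch combining both suggestions (the offloaded flag column is an assumption, not part of the original table, and the ids in the UPDATE stand in for whatever the SELECT returned):

```sql
BEGIN;

-- grab up to 5 unprocessed rows; fails immediately if they are locked
SELECT id, txt
FROM table1
WHERE offloaded = false
ORDER BY id
LIMIT 5
FOR UPDATE NOWAIT;

-- mark the rows fetched above so later polls skip them
UPDATE table1 SET offloaded = true WHERE id IN (1, 2, 3, 4, 5);

COMMIT;
```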
I tried both SELECT ... FOR UPDATE and pg_try_advisory_lock and managed to get close to my requirement.
/* rows are locked, but LIMIT is the problem */
select * from table1 where pg_try_advisory_lock(id) limit 5;
.
.
$_SESSION['rows'] = $rowcount; // number of rows to process
.
.
/* after each word is processed */
$_SESSION['rows'] -= 1;
.
.
/* and finally unlock the locked rows */
if ($_SESSION['rows'] === 0)
    select pg_advisory_unlock_all();
But there are two problems with this:
1. Since the locking in the WHERE clause happens before the LIMIT is applied, every instance tries to lock the same rows each time.
2. I'm not sure whether pg_advisory_unlock_all will unlock only the rows locked by the current instance, or those locked by all instances.