Understanding Postgres SERIALIZABLE isolation level

I have two experiments that do not work as I expect after reading the Postgres documentation. I am using Postgres 12.
Experiment 1
Data preparation:
CREATE TABLE Test
(
id SERIAL primary key,
level int,
value int
);
INSERT INTO Test (
level,
value
)
SELECT 1, 10 UNION ALL
SELECT 1, 20 UNION ALL
SELECT 1, 30 UNION ALL
SELECT 2, 100;
Then I open two query windows
Window 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO Test(level, value)
SELECT 2, SUM(value)
FROM Test
WHERE level = 1;
Window 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO Test(level, value)
SELECT 3, SUM(value)
FROM Test
WHERE level = 2;
Now, if I commit Window 1 first and then Window 2, Window 2 fails because the data it has read is stale, which is expected. However, if I commit Window 2 first and then Window 1, Window 1 fails. But why? Window 2 has already committed, and its result was not affected by Window 1; the result of Window 1 was not affected by Window 2 either. So I don't understand why the Window 1 commit fails after the Window 2 commit.
Experiment 2
It is very similar to experiment 1, but now different levels are stored in different tables.
Data preparation
CREATE TABLE Level1 (
id SERIAL primary key,
value int
);
CREATE TABLE Level2 (
id SERIAL primary key,
value int
);
CREATE TABLE Level3 (
id SERIAL primary key,
value int
);
INSERT INTO Level1 (
value
)
SELECT 10 UNION ALL
SELECT 20 UNION ALL
SELECT 30;
INSERT INTO Level2 (
value
)
SELECT 100;
Window 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO Level2(value)
SELECT SUM(value)
FROM Level1;
Window 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO Level3(value)
SELECT SUM(value)
FROM Level2;
Now both windows commit successfully in whatever order I commit them! That totally confuses me. Logically it is the same as experiment 1; I would understand if it behaved as experiment 1 did, but here serializable isolation does not seem to be doing anything at all!

In experiment 1, both transactions read and write the same table, so there is potential for conflict.
You are right that there is no actual conflict, but the SELECT statements you ran performed a sequential scan, which reads all rows and consequently places a predicate lock on the whole table. That is why you get a false-positive serialization error.
Compare what the documentation has to say:
Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query
While PostgreSQL's Serializable transaction isolation level only allows concurrent transactions to commit if it can prove there is a serial order of execution that would produce the same effect, it doesn't always prevent errors from being raised that would not occur in true serial execution.
This does not happen in your second experiment. The two transactions can be serialized: first the one in window 2, then the one in window 1 (the same serial order that would have worked in your first experiment). This time there is no false-positive serialization error.
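You can observe these predicate locks yourself in the pg_locks view mentioned in the quoted documentation. A minimal sketch, run from a third session while one of the experiment-1 transactions is still open (the exact granularity shown depends on the plan):

```sql
-- SIReadLock rows are SSI predicate locks; a relation-level entry
-- for the whole "test" table is what a sequential scan produces.
SELECT locktype, relation::regclass AS rel, page, tuple, pid, mode
FROM pg_locks
WHERE mode = 'SIReadLock';
```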

Related

How to ensure sum of amounts in one table is less than the amount of another?

Say I have a table of marbles
id | color  | total
---+--------+------
 1 | blue   |     5
 2 | red    |    10
 3 | swirly |     3
and I need to put them into bags with a unique constraint on (bag_id, marble_id):
bag_id | marble_id | quantity
-------+-----------+---------
     1 | 1 (blue)  |        2
     1 | 2 (red)   |        3
     2 | 1 (blue)  |        2
I have a query for bagging at most the number of remaining marbles
WITH unbagged AS (
SELECT
marble.total - COALESCE( SUM( bag.quantity ), 0 ) AS quantity
FROM marble
LEFT JOIN bag ON marble.id = bag.marble_id
WHERE marble.id = :marble_id
GROUP BY marble.id )
INSERT INTO bag (bag_id, marble_id, quantity)
SELECT
:bag_id,
:marble_id,
LEAST( :quantity, unbagged.quantity )
FROM unbagged
ON CONFLICT (bag_id, marble_id) DO UPDATE SET
quantity = bag.quantity
+ LEAST(
EXCLUDED.quantity,
(SELECT quantity FROM unbagged) )
which works great until, one day, it gets called twice at exactly the same time with the same item, and I end up with 6 swirly marbles in a bag (or maybe 3 each in 2 bags), even though there are only 3 in total.
I think I understand why, but I don't know how to prevent it from happening.
Your algorithm isn't exactly clear to me, but the core issue is concurrency.
Manual locking
Your query processes a single given row in table marble at a time. The cheapest solution is to take an exclusive lock on that row (assuming that's the only query writing to marble and bag). Then the next transaction trying to mess with the same kind of marble has to wait until the current one has committed (or rolled back).
BEGIN;
SELECT FROM marble WHERE id = :marble_id FOR UPDATE; -- row level lock
WITH unbagged AS ( ...
COMMIT;
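Written out in full, with the query from the question substituted for the `...`, the transaction would look roughly like this (same `:marble_id` / `:bag_id` / `:quantity` placeholders as above):

```sql
BEGIN;

-- Row-level lock: concurrent calls for the same marble_id now queue
-- up behind this transaction instead of racing it.
SELECT FROM marble WHERE id = :marble_id FOR UPDATE;

WITH unbagged AS (
   SELECT marble.total - COALESCE(SUM(bag.quantity), 0) AS quantity
   FROM   marble
   LEFT   JOIN bag ON marble.id = bag.marble_id
   WHERE  marble.id = :marble_id
   GROUP  BY marble.id
)
INSERT INTO bag (bag_id, marble_id, quantity)
SELECT :bag_id, :marble_id, LEAST(:quantity, unbagged.quantity)
FROM   unbagged
ON CONFLICT (bag_id, marble_id) DO UPDATE
SET    quantity = bag.quantity
                + LEAST(EXCLUDED.quantity, (SELECT quantity FROM unbagged));

COMMIT;
```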
SERIALIZABLE
Or use serializable transaction isolation; that's the more expensive "catch-all" solution. Be prepared to repeat the transaction in case of a serialization failure. Like:
BEGIN ISOLATION LEVEL SERIALIZABLE;
WITH unbagged AS ( ...
COMMIT;
Related:
How to atomically replace a subset of table data
Atomic UPDATE .. SELECT in Postgres

Postgres SERIAL column Autocommit gap

If I have table:
CREATE TABLE table_name(
id SERIAL
);
And I have the following ids inserted: ..., 68, 69.
Then I have two competing transactions (T1, T2) running in parallel. I understand that the transaction finishing first may end up with the higher number, because the id is assigned and written to the WAL before the transaction commits:
T1 (takes number 70), T2 (takes number 71), T2 (commit), T1 (commit)
What is the situation under AUTOCOMMIT (when inserting a row outside of an explicit transaction)?
With two inserts very close together, is it guaranteed that the first inserted row gets the lower number?
The use case is the following:
After inserting a row, I would execute SELECT id FROM table_name ORDER BY id. Could it happen that I execute this command twice, one right after the other, and get the following results?
Select 1 result: 68,69,71
Select 2 result: 68,69,70,71
Even if you don't use explicit transactions, it is not guaranteed that the statement that gets the lower sequence value will also commit first.
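A related sketch of why: sequence assignment is not transactional, so a value can be consumed without its row ever becoming visible, and visibility follows commit order, not assignment order (assuming a fresh table):

```sql
CREATE TABLE table_name (id SERIAL);

BEGIN;
INSERT INTO table_name DEFAULT VALUES;  -- draws id 1 from the sequence
ROLLBACK;                               -- the row is gone, but id 1 is
                                        -- consumed forever: a gap

INSERT INTO table_name DEFAULT VALUES;  -- autocommit; draws id 2

SELECT id FROM table_name ORDER BY id;  -- just 2: lower ids can be
                                        -- missing, or appear later
```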

Prevent two threads from selecting the same row (IBM Db2)

I have a situation with multiple (potentially hundreds of) threads repeating the same task (using a Java scheduled executor, if you are curious). The task selects rows of changes (from a table called change) that have not yet been processed (processed changes are tracked in an m:n join table called change_process_rel that holds the process id, record id and status), processes them, then updates the status back.
My question is: what is the best way to prevent two threads of the same process from selecting the same row? Will the solution below (using FOR UPDATE to lock rows) work? If not, please suggest a working one.
Create table change (
  -- id, autogenerated pk
  -- other fields
);
Create table change_process_rel (
  -- change id (pk of change table)
  -- process id (pk of process table)
  -- status
);
The query I would use is listed below:
Select *
from change c
where c.id not in (select changeid from change_process_rel with cs)
for update
Please let me know if this would work
You have to "lock" a row you are going to process somehow. Such locking should, of course, allow concurrency with a minimum of conflicts and errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values ('1'), ('2'), ('3');
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Every such statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you can proceed with processing the corresponding event in the same transaction.
If the transaction completes successfully, the row inserted into change_process_rel is kept, so the corresponding id from change can be considered processed. If the transaction fails, the "lock" row in change_process_rel disappears, and that row can be processed later by this or another application.
The problem with this method is that once both tables grow large enough, the sub-select may no longer be as quick as before.
Another method is to use "Evaluate uncommitted data through lock deferral".
It requires placing the status column in the change table itself.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which would help with this sort of algorithm.
If, say, status=0 means "not processed" and status<>0 is some processing/processed status, then after setting the DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restarting the instance, you can "catch" the next ID for processing with the following statement.
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you can do further processing of this ID in the same transaction, as before.
It's good to create an index for performance:
create index change1 on change(status);
and perhaps mark this table as volatile, or periodically collect distribution statistics on this column in addition to regular statistics on the table and its indexes.
Note that such registry variable settings have a global effect; keep that in mind.

Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

Question Summary
This is a question about serializability of queries within a SQL transaction.
Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".
To sum it up: if I have one primary table and more than one foreign-key-linked table connected to it, how do I guarantee that, for a given key and any number of SELECT statements using that key inside one transaction (each SELECTing from any of the linked tables), I will get the data as it existed at the time I started the transaction?
Other Questions
This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL:
Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005
Example
Let's say I have 3 tables:
bricks
brickworks (primary key)
completion_time (primary key)
has_been_sold
brick_colors
brickworks (primary key, foreign key pointing to "bricks")
completion_time (primary key, foreign key pointing to "bricks")
quadrant (primary key)
color
brick_weight
brickworks (primary key, foreign key pointing to "bricks")
completion_time (primary key, foreign key pointing to "bricks")
weight
A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.
Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.
Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.
At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.
An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).
The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.
If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.
The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).
This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).
Attempted Solution
Here's what I've come up with so far:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;
-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;
It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.
Is there any other way to do this?
References
PostgreSQL Concurrency Control
PostgreSQL Transaction Isolation
PostgreSQL SET TRANSACTION statement
This is the essence of your question:
how do I guarantee that, for ...... any number of SELECT statements
..... inside one transaction ....... I will get data as it existed at
the time I started the transaction?
This is exactly what Repeatable Read Isolation Level guarantees:
The Repeatable Read isolation level only sees data committed before
the transaction began; it never sees either uncommitted data or
changes committed during transaction execution by concurrent
transactions. (However, the query does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed.) This is a stronger guarantee than is required by the
SQL standard for this isolation level, and prevents all of the
phenomena described in Table 13-1. As mentioned above, this is
specifically allowed by the standard, which only describes the minimum
protections each isolation level must provide.
This level is different from Read Committed in that a query in a
repeatable read transaction sees a snapshot as of the start of the
transaction, not as of the start of the current query within the
transaction. Thus, successive SELECT commands within a single
transaction see the same data, i.e., they do not see changes made by
other transactions that committed after their own transaction started.
A practical example: let's say we have two simple tables:
CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);
The number of tables, their structures, primary keys, foreign keys etc. are unimportant here.
Let's open a first session, start a transaction at the repeatable-read isolation level, and run two simple, separate SELECT statements:
test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
Note that the START TRANSACTION command automatically disables autocommit mode for the duration of the transaction.
Now, in another session (with default autocommit mode enabled), insert a few records into t1:
test2=# INSERT INTO t1 VALUES(10),(11);
The new values were inserted and automatically committed (because autocommit is on).
Now go back to the first session and run SELECT again:
test=# select * from t1;
x
---
1
2
3
(3 rows)
As you can see, session 1 (with an active repeatable-read transaction) doesn't see any changes committed after the start of the transaction.
Let's do the same experiment with table t2. Go to the second session and issue:
test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1
Now go back to the first session and run SELECT again:
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
As you can see, again, session 1 (with an active repeatable-read transaction) doesn't see any changes committed after the start of the transaction.
And now, in session 1, finish the transaction by issuing COMMIT, and then SELECT again:
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
test=# COMMIT;
COMMIT
test=# select * from t1;
x
----
1
2
3
10
11
(5 rows)
test=# select * from t2;
y
---
1
3
(2 rows)
As you can see, while the repeatable-read transaction is active you can run many separate SELECT statements, multiple times, and all of them see the same stable snapshot of the data as of the start of the transaction, regardless of any data committed in other sessions.
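Applied to the brick example from the original question, this means the initial SELECT with the JOINs is unnecessary. A sketch using the question's tables (note one subtlety: the snapshot is taken at the first query of the transaction, not at BEGIN):

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;

-- Both SELECTs see one consistent snapshot, established by the
-- first query of the transaction.
SELECT * FROM brick_colors
WHERE brickworks = 'Brick-o-Matic'
  AND completion_time = '2017-02-01T07:35:00.000Z';

SELECT * FROM brick_weight
WHERE brickworks = 'Brick-o-Matic'
  AND completion_time = '2017-02-01T07:35:00.000Z';

COMMIT;
```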

Enforce Atomic Operations x Locks

I have a database model that works similarly to a banking account (one table for operations, and a trigger to update the balance). I'm currently using SQL Server 2008 R2.
TABLE OPERATIONS
----------------
VL_CREDIT decimal(10,2)
VL_DEBIT decimal(10,2)
TABLE BALANCE
-------------
DT_OPERATION datetime
VL_CURRENT decimal(10,2)
PROCEDURE INSERT_OPERATION
--------------------------
GET LAST BALANCE BY DATE
CHECK IF VALUE OF OPERATION > BALANCE
IF > RETURN ERROR
ELSE INSERT INTO OPERATION(...,....)
The issue I have is the following:
The procedure to insert the operation has to check the balance to see if there's money available before inserting the operation, so the balance never gets negative. If there's no balance, I return some code to tell the user the balance is not enough.
My concern is: If this procedure gets called multiple times in a row, how can I guarantee that it's atomic?
I have some ideas, but I am not sure which of them would guarantee it:
BEGIN TRANSACTION on the OPERATION PROCEDURE
Some sort of lock on selecting the BALANCE table, but it must hold until the end of procedure execution
Can you suggest some approach to guarantee that? Thanks in advance.
UPDATE
I read on MSDN (http://technet.microsoft.com/en-us/library/ms187373.aspx) that if my procedure has BEGIN/END TRANSACTION, and the SELECT on table BALANCE has WITH(TABLOCKX), it locks the table until the end of the transaction, so if a subsequent call to this procedure is made during the execution of the first, it will wait, and then guarantee that the value is always the last updated. Will it work? And if so, is it the best practice?
If you're amenable to changing your table structures, I'd build it this way:
create table Transactions (
SequenceNo int not null,
OpeningBalance decimal(38,4) not null,
Amount decimal(38,4) not null,
ClosingBalance as CONVERT(decimal(38,4),OpeningBalance + Amount) persisted,
PrevSequenceNo as CASE WHEN SequenceNo > 1 THEN SequenceNo - 1 END persisted,
constraint CK_Transaction_Sequence CHECK (SequenceNo > 0),
constraint PK_Transaction_Sequence PRIMARY KEY (SequenceNo),
constraint CK_Transaction_NotNegative CHECK (OpeningBalance + Amount >= 0),
constraint UQ_Transaction_BalanceCheck UNIQUE (SequenceNo, ClosingBalance),
constraint FK_Transaction_BalanceCheck FOREIGN KEY
(PrevSequenceNo, OpeningBalance)
references Transactions
(SequenceNo,ClosingBalance)
/* Optional - another check that Transaction 1 has 0 amount and
0 opening balance, if required */
)
Where you just apply credits and debits as +ve or -ve values for Amount. The above structure is enough to enforce the "not going negative" requirement (via CK_Transaction_NotNegative), and it also ensures that you know the current balance (by finding the row with the highest SequenceNo and taking its ClosingBalance value). Together, UQ_Transaction_BalanceCheck and FK_Transaction_BalanceCheck (and the computed columns) ensure that the entire sequence of transactions is valid, and PK_Transaction_Sequence keeps everything building in order.
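For example, a current-balance lookup under this scheme would be (a sketch; assumes the table is non-empty):

```sql
-- The latest row's ClosingBalance is the account balance.
SELECT TOP (1) ClosingBalance
FROM Transactions
ORDER BY SequenceNo DESC;
```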
So, if we populate it with some data:
insert into Transactions (SequenceNo,OpeningBalance,Amount) values
(1,0.0,10.0),
(2,10.0,-5.50),
(3,4.50,2.75)
And now we can attempt an insert (this could be INSERT_PROCEDURE with @NewAmount passed as a parameter):
declare @NewAmount decimal(38,4)
set @NewAmount = -15.50
;With LastTransaction as (
select SequenceNo,ClosingBalance,
ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
from Transactions
)
insert into Transactions (SequenceNo,OpeningBalance,Amount)
select SequenceNo + 1, ClosingBalance, @NewAmount
from LastTransaction
where rn = 1
This insert fails because it would have caused the balance to go negative. But if @NewAmount were small enough, it would have succeeded. And if two inserts are attempted at "the same time", then either a) they're just far enough apart in reality that both succeed and all balances are kept correct, or b) one of them receives a PK violation error.
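A caller can treat that PK violation as a signal to re-read the balance and retry. A rough sketch (error 2627 is SQL Server's PK/unique-key violation; THROW requires SQL Server 2012+, so on 2008 R2 substitute RAISERROR):

```sql
DECLARE @NewAmount decimal(38,4) = -2.50;
DECLARE @Tries int = 0;

WHILE @Tries < 3
BEGIN
    BEGIN TRY
        ;WITH LastTransaction AS (
            SELECT SequenceNo, ClosingBalance,
                   ROW_NUMBER() OVER (ORDER BY SequenceNo DESC) AS rn
            FROM Transactions
        )
        INSERT INTO Transactions (SequenceNo, OpeningBalance, Amount)
        SELECT SequenceNo + 1, ClosingBalance, @NewAmount
        FROM LastTransaction
        WHERE rn = 1;
        BREAK;  -- insert succeeded
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627
            THROW;            -- not a PK race: re-raise
        SET @Tries += 1;      -- lost the race: retry with a fresh balance
    END CATCH
END
```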