Why does PostgreSQL's serializable transaction see this as a conflict? - postgresql

In my understanding, PostgreSQL uses some kind of monitoring to detect conflicts at the serializable isolation level. Most examples are about modifying the same resource in concurrent transactions, and there serializable transactions work great. But I want to test a concurrency issue in another way.
I decided to test 2 users modifying their own account balances, hoping PostgreSQL would be smart enough not to flag this as a conflict, but the result is not what I expected.
Below is my table. There are 4 accounts belonging to 2 users; each user has a checking account and a saving account.
create table accounts (
    id serial primary key,
    user_id int,
    type varchar,
    balance numeric
);
insert into accounts (user_id, type, balance) values
    (1, 'checking', 1000),
    (1, 'saving', 1000),
    (2, 'checking', 1000),
    (2, 'saving', 1000);
The table data is like this:
 id | user_id |   type   | balance
----+---------+----------+---------
  1 |       1 | checking |    1000
  2 |       1 | saving   |    1000
  3 |       2 | checking |    1000
  4 |       2 | saving   |    1000
Now I run 2 concurrent transactions, one for each user. In each transaction, I reduce the checking account by some amount and then check that user's total balance. If it's greater than 1000, I commit; otherwise I roll back.
User 1's transaction:
begin;
-- Reduce checking account for user 1
update accounts set balance = balance - 200 where user_id = 1 and type = 'checking';
-- Make sure user 1's total balance > 1000, then commit
select sum(balance) from accounts where user_id = 1;
commit;
User 2's transaction is the same, except that the WHERE clause uses user_id = 2:
begin;
update accounts set balance = balance - 200 where user_id = 2 and type = 'checking';
select sum(balance) from accounts where user_id = 2;
commit;
I commit user 1's transaction first; it succeeds without any problem. When I commit user 2's transaction, it fails:
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
My questions are:
Why does PostgreSQL think these 2 transactions conflict? I added the user_id condition to every statement and never modify user_id, but that has no effect.
Does that mean serializable transactions don't allow concurrent transactions on the same table at all, even when their reads and writes don't conflict?
Doing something per user is very common. Should I avoid serializable transactions for operations that happen very frequently?

You can fix this problem with the following index:
CREATE INDEX accounts_user_idx ON accounts(user_id);
Since there is so little data in your example table, you will have to tell PostgreSQL to use an index scan:
SET enable_seqscan=off;
Now your example will work!
If that seems like black magic, take a look at the query execution plans of your SELECT and UPDATE statements.
Without the index, both statements use a sequential scan on the table, thereby reading all of its rows. So both transactions end up with an SIReadLock on the whole table.
That is what triggers the serialization failure.
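For example, a quick way to verify this, re-using the statements from the question (with the index in place, the plan should now show an Index Scan or Bitmap Index Scan on accounts_user_idx instead of a Seq Scan):
SET enable_seqscan = off;
-- Only the requested user's rows are predicate-locked now:
EXPLAIN UPDATE accounts SET balance = balance - 200 WHERE user_id = 1 AND type = 'checking';
EXPLAIN SELECT sum(balance) FROM accounts WHERE user_id = 1;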

To my knowledge, serializable is the highest isolation level and therefore offers the lowest concurrency: the transactions must behave as if they had run one after the other.

Related

PostgreSQL - Does row locking depend on the update syntax in a transaction?

I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference on the lock applied on the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then free both locks.
But what does transaction 2 do ?
Does it do the same thing or does it do: lock user 0 and user 1, then free both locks ?
There is a difference, because with two concurrent transactions, if I write my queries as in the first transaction I might run into deadlock issues. But if I write my transactions like the second one, can I still run into deadlocks?
If it does the same thing, is there a way to write this transaction so that, at the beginning, before doing anything else, it checks every row it needs to update, waits until none of those rows are locked, and then locks them all at the same time?
Links:
The syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL
Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a well-defined order.
If your objective is to avoid deadlocks, the first form is better: if you make a rule that every transaction must update the user with the lower id first, then no two such transactions can deadlock with each other (but they can still deadlock with other transactions that do not obey that rule).
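If you prefer to keep the single-statement form of transaction 2, one option (just a sketch, using the names from the question) is to take the row locks explicitly, in id order, before the bulk UPDATE:
BEGIN;
-- Acquire the row locks in a deterministic order first ...
SELECT id FROM user_table WHERE id IN (0, 1) ORDER BY id FOR UPDATE;
-- ... then the bulk UPDATE reuses the locks that are already held.
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;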

Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

Question Summary
This is a question about serializability of queries within a SQL transaction.
Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".
To sum it up: if I have one primary table and more than one foreign-key-linked table connected to it, how do I guarantee that, for a given key and any number of SELECT statements against any of the linked tables inside one transaction, I will get the data as it existed at the time I started the transaction?
Other Questions
This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL:
Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005
Example
Let's say I have 3 tables:
bricks
  brickworks (primary key)
  completion_time (primary key)
  has_been_sold
brick_colors
  brickworks (primary key, foreign key pointing to "bricks")
  completion_time (primary key, foreign key pointing to "bricks")
  quadrant (primary key)
  color
brick_weight
  brickworks (primary key, foreign key pointing to "bricks")
  completion_time (primary key, foreign key pointing to "bricks")
  weight
A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.
Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.
Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.
At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.
An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).
The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.
If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.
The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).
This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).
Attempted Solution
Here's what I've come up with so far:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;
-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;
It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.
Is there any other way to do this?
References
PostgreSQL Concurrency Control
PostgreSQL Transaction Isolation
PostgreSQL SET TRANSACTION statement
This is the essence of your question:
how do I guarantee that, for ...... any number of SELECT statements
..... inside one transaction ....... I will get data as it existed at
the time I started the transaction?
This is exactly what Repeatable Read Isolation Level guarantees:
The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the query does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is a stronger guarantee than is required by the SQL standard for this isolation level, and prevents all of the phenomena described in Table 13-1. As mentioned above, this is specifically allowed by the standard, which only describes the minimum protections each isolation level must provide.
This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. Thus, successive SELECT commands within a single transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started.
A practical example - let's say we have 2 simple tables:
CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);
The number of tables, their structures, primary keys, foreign keys etc. are unimportant here.
Let's open the first session, start a transaction at the repeatable read isolation level, and run two simple, separate SELECT statements:
test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
Note that the START TRANSACTION command automatically disables autocommit mode in the session.
Now, in another session (with the default autocommit mode enabled), insert a few records into t1:
test2=# INSERT INTO t1 VALUES(10),(11);
The new values were inserted and automatically committed (because autocommit is on).
Now go back to the first session and run SELECT again:
test=# select * from t1;
x
---
1
2
3
(3 rows)
As you can see, session 1 (with its repeatable read transaction still active) doesn't see any changes committed after the start of the transaction.
Let's do the same experiment with table t2 - go to the second session and issue:
test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1
Now go back to the first session and run SELECT again:
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
As you can see, again, session 1 (with its repeatable read transaction still active) doesn't see any changes committed after the start of the transaction.
And now, in session 1, finish the transaction by issuing COMMIT, and then SELECT again:
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
test=# COMMIT;
COMMIT
test=# select * from t1;
x
----
1
2
3
10
11
(5 rows)
test=# select * from t2;
y
---
1
3
(2 rows)
As you can see, while a repeatable read transaction is started and active, you can run many separate SELECT statements multiple times, and all of them see the same stable snapshot of the data as of the start of the transaction, regardless of any data committed in other sessions.
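Applied to the tables from the question, this means the priming SELECT with the JOINs can simply be dropped; the snapshot is established when the first query of the repeatable read transaction runs, and every later SELECT sees the same one (a sketch, using the key from the question):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY ;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;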

Concurrency scenarios with INSERTs

I'm designing a booking system in PHP + PostgreSQL.
I'm not able to find a clean solution to a concurrency problem based on INSERT operations.
The DB system is mainly made of these tables:
CREATE TABLE booking (
    booking_id INT,
    user_id INT,
    state SMALLINT,
    nb_coupons INT
);
CREATE TABLE booking_state_history (
    booking_state_history_id INT,
    timestamp TIMESTAMP,
    booking_id INT,
    state SMALLINT
);
CREATE TABLE coupon_purchase (
    coupon_purchase_id INT,
    user_id INT,
    nb INT,
    value MONEY
);
CREATE TABLE coupon_refund (
    coupon_refund_id INT,
    user_id INT,
    nb INT,
    value MONEY
);
CREATE TABLE booking_payment (
    booking_payment_id INT,
    user_id INT,
    booking_id INT,
    nb INT,
    value MONEY
);
A booking must be paid with coupons that have been previously purchased by the user. Some coupons may have been refunded. All these operations are stored in the corresponding tables to keep a history and to be able to compute the coupon balance.
Constraint: the coupon balance cannot be negative at any time.
A booking is finalized when it is paid with coupons.
Then the following operations happen:
BEGIN;
(1) Check there are enough coupons remaining to pay the booking. (SELECT)
(2) Decide which coupons (number and value) will be used to pay the booking
(mainly, higher cost coupon used first. But that is not the issue here.)
(3) Add records to booking_payment (INSERTs)
(4) Move the booking to state="PAID" (integer value representing "PAID") (UPDATE)
(5) Add a record to booking_state_history (INSERT)
COMMIT;
These operations need to be atomic to preserve DB information coherency.
Hence the use of transactions, which allow a COMMIT or ROLLBACK in case of failure, DB exception, PHP exception or any other issue in the middle of the operations.
Scenario 1
Since I'm in a concurrent access environment (web site) nothing prevents the user from (for instance) asking for a coupon refund while doing a booking payment at the same time.
Scenario 2
He can also trigger two concurrent booking payments at the same time in two different transactions.
So the following can happen:
Scenario 1
After (1) is done, a coupon refund is triggered by the user, and the resulting coupon balance is no longer enough to pay the booking.
When the payment transaction COMMITs, the balance becomes negative.
Note:
Even if I re-check the coupon balance in a new step (6), the coupon refund can still happen between (6) and COMMIT.
Scenario 2
Two concurrent booking payment transactions together use more coupons than the global balance allows, so only one of them should be able to happen.
Transaction 1 and transaction 2 both check the balance in step (1), and each sees enough coupons for its own payment.
They go on with their operations and COMMIT. The new balance is negative, violating the constraint.
Note:
Even if I re-check the coupon balance in a new step (6), neither transaction can see the operations not yet committed by the other one.
So they both blindly proceed to COMMIT.
I guess this is a common concurrency case, but I cannot find a pattern for solving it on the internet.
I thought of re-checking the balance after the COMMIT so I could manually undo all the operations, but that is not totally safe: if an exception happens after the commit, the undo won't be done.
Any idea to solve this concurrency problem?
Thanks.
Your problem boils down to the question "what should be the synchronization lock?". From your question it seems that the booking is not a booking of one specific item, but let's assume that a user is booking a specific hotel room, so you need to solve two problems:
prevent overbooking (e.g. booking the same thing for two people)
prevent parallel account state miscalculation
So when a user gets to the point where he/she is about to hit the confirm button, this is a possible scenario you can implement:
1. begin transaction
2. lock the user entry so that parallel processes are blocked:
   SELECT * FROM user WHERE id = :id FOR UPDATE
3. re-check the account balance and throw an exception / rollback if there are insufficient funds
4. lock the item to be booked to prevent overbooking:
   SELECT * FROM room WHERE id = :id FOR UPDATE
5. re-check booking availability and throw an exception / rollback if the item is already booked
6. create the booking entry and subtract the funds from the user's account
7. commit transaction (all locks will be released)
If, in your case, you don't need to check for overbooking, just skip / ignore steps 4 and 5.
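Translated into a runnable sketch against the tables from your question (a user table holding the balance is assumed, as in the steps above, the :... placeholders are bind parameters, and the id columns are assumed to be generated):
BEGIN;
-- step 2: serialize all coupon operations for this user
SELECT * FROM "user" WHERE id = :user_id FOR UPDATE;  -- "user" needs quoting, it is a reserved word
-- step 3: re-check the coupon balance in application code; ROLLBACK if it is insufficient
-- step 4 (optional): prevent overbooking of a specific item
SELECT * FROM room WHERE id = :room_id FOR UPDATE;
-- step 5: re-check availability in application code; ROLLBACK if already booked
-- step 6: record the payment and mark the booking as paid
INSERT INTO booking_payment (user_id, booking_id, nb, value) VALUES (:user_id, :booking_id, :nb, :value);
UPDATE booking SET state = :paid_state WHERE booking_id = :booking_id;
INSERT INTO booking_state_history (timestamp, booking_id, state) VALUES (current_timestamp, :booking_id, :paid_state);
COMMIT;  -- step 7: all locks released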
Below is the solution I've implemented.
Note: I just treated the coupon transfer part below but it is the same with booking state change and booking_state_history.
The main idea is to keep this part of the processing inside a critical section.
When an INSERT into booking_payment, coupon_purchase or coupon_refund is about to be done, I prevent other transactions from doing the same by taking a lock on a dedicated table through an UPDATE for the given user_id.
This way, only transactions affecting the same user_id for the same kind of operation are blocked.
Initialization
DROP TABLE coupon_purchase;
DROP TABLE coupon_refund;
DROP TABLE booking_payment;
DROP TABLE lock_coupon_transaction;
CREATE TABLE coupon_purchase(
coupon_purchase_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE coupon_refund(
coupon_refund_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE booking_payment(
booking_payment_id SERIAL PRIMARY KEY,
user_id INT,
booking_id INT,
nb INT);
CREATE TABLE lock_coupon_transaction (
user_id INT,
timestamp TIMESTAMP);
INSERT INTO coupon_purchase
(user_id, nb) VALUES
(1, 1),
(1, 5);
INSERT INTO coupon_refund
(user_id, nb) VALUES
(1, 3);
INSERT INTO lock_coupon_transaction
(user_id, timestamp) VALUES
(1, current_timestamp);
Transaction 1
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO booking_payment
(user_id, booking_id, nb)
SELECT 1::INT, 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 1
Transaction 2
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
-- Transaction 2 is blocked, waiting for a COMMIT or ROLLBACK from transaction 1.
Transaction 1
COMMIT;
COMMIT
Transaction 2
-- Transaction 1's lock has been released, so transaction 2 can go on
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO coupon_refund
(user_id, nb)
SELECT 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 0
COMMIT;
COMMIT
The INSERT couldn't be done since there was not enough money in the account. This is the expected behavior.
The previous transaction had been committed by the time the second one proceeded, so transaction 2 could see all the changes made by transaction 1.
This way there is no risk from concurrent access to the coupon handling.

Enforce Atomic Operations x Locks

I have a database model that works similarly to a banking account (one table for operations, and a trigger to update the balance). I'm currently using SQL Server 2008 R2.
TABLE OPERATIONS
----------------
VL_CREDIT decimal(10,2)
VL_DEBIT decimal(10,2)
TABLE BALANCE
-------------
DT_OPERATION datetime
VL_CURRENT decimal(10,2)
PROCEDURE INSERT_OPERATION
--------------------------
GET LAST BALANCE BY DATE
CHECK IF VALUE OF OPERATION > BALANCE
IF > RETURN ERROR
ELSE INSERT INTO OPERATION(...,....)
The issue I have is the following:
The procedure to insert the operation has to check the balance to see if there's money available before inserting the operation, so the balance never gets negative. If there's no balance, I return some code to tell the user the balance is not enough.
My concern is: If this procedure gets called multiple times in a row, how can I guarantee that it's atomic?
I have some ideas, but I am not sure which of them would guarantee it:
BEGIN TRANSACTION on the OPERATION PROCEDURE
Some sort of lock when selecting from the BALANCE table, which must be held until the end of the procedure's execution
Can you suggest some approach to guarantee that? Thanks in advance.
UPDATE
I read on MSDN (http://technet.microsoft.com/en-us/library/ms187373.aspx) that if my procedure has BEGIN/END TRANSACTION and the SELECT on table BALANCE uses WITH (TABLOCKX), the table stays locked until the end of the transaction. So if a subsequent call to this procedure is made while the first is still executing, it will wait, which guarantees that the value read is always the most recently updated one. Will it work? And if so, is it the best practice?
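To make that concrete, here is a rough sketch of what I have in mind (the debit/credit split and the return codes are only assumptions):
CREATE PROCEDURE INSERT_OPERATION @value decimal(10,2)
AS
BEGIN
    BEGIN TRANSACTION;
    DECLARE @balance decimal(10,2);
    -- The exclusive table lock is held until COMMIT/ROLLBACK, so concurrent calls queue up here.
    SELECT TOP (1) @balance = VL_CURRENT
    FROM BALANCE WITH (TABLOCKX)
    ORDER BY DT_OPERATION DESC;
    IF (@balance IS NULL OR @balance < @value)
    BEGIN
        ROLLBACK TRANSACTION;
        RETURN 1;   -- error code: balance is not enough
    END
    -- assuming @value is a debit here
    INSERT INTO OPERATIONS (VL_CREDIT, VL_DEBIT) VALUES (0, @value);
    COMMIT TRANSACTION;   -- the trigger updates BALANCE; the lock is released here
    RETURN 0;
END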
If you're amenable to changing your table structures, I'd build it this way:
create table Transactions (
SequenceNo int not null,
OpeningBalance decimal(38,4) not null,
Amount decimal(38,4) not null,
ClosingBalance as CONVERT(decimal(38,4),OpeningBalance + Amount) persisted,
PrevSequenceNo as CASE WHEN SequenceNo > 1 THEN SequenceNo - 1 END persisted,
constraint CK_Transaction_Sequence CHECK (SequenceNo > 0),
constraint PK_Transaction_Sequence PRIMARY KEY (SequenceNo),
constraint CK_Transaction_NotNegative CHECK (OpeningBalance + Amount >= 0),
constraint UQ_Transaction_BalanceCheck UNIQUE (SequenceNo, ClosingBalance),
constraint FK_Transaction_BalanceCheck FOREIGN KEY
(PrevSequenceNo, OpeningBalance)
references Transactions
(SequenceNo,ClosingBalance)
/* Optional - another check that Transaction 1 has 0 amount and
0 opening balance, if required */
)
Where you just apply credits and debits as +ve or -ve values for Amount. The above structure is enough to enforce the "not going negative" requirement (via CK_Transaction_NotNegative), and it also ensures that you always know the current balance (by finding the row with the highest SequenceNo and taking its ClosingBalance value). Together, UQ_Transaction_BalanceCheck and FK_Transaction_BalanceCheck (and the computed columns) ensure that the entire sequence of transactions is valid, and PK_Transaction_Sequence keeps everything building in order.
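For instance, the current balance can then be read with:
-- current balance = ClosingBalance of the latest transaction
select top (1) ClosingBalance from Transactions order by SequenceNo desc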
So, if we populate it with some data:
insert into Transactions (SequenceNo,OpeningBalance,Amount) values
(1,0.0,10.0),
(2,10.0,-5.50),
(3,4.50,2.75)
And now we can attempt an insert (this could be INSERT_PROCEDURE with @NewAmount passed as a parameter):
declare @NewAmount decimal(38,4)
set @NewAmount = -15.50
;With LastTransaction as (
select SequenceNo,ClosingBalance,
ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
from Transactions
)
insert into Transactions (SequenceNo,OpeningBalance,Amount)
select SequenceNo + 1, ClosingBalance, @NewAmount
from LastTransaction
where rn = 1
This insert fails because it would have caused the balance to go negative. But if @NewAmount were small enough, it would have succeeded. And if two inserts are attempted at "the same time", then either a) they're just far enough apart in reality that they both succeed and the balances are all kept correct, or b) one of them will receive a PK violation error.

How to disallow parallel INSERTs in PostgreSQL?

I have an ON INSERT trigger in PostgreSQL 9.2, which does some calculation and injects extra data into every row being inserted. The problem is that I don't want any two INSERT transactions to happen in parallel. I want them to go one by one, because my calculations have to be incremental and take into account previous calculation results. Is it possible to achieve this somehow?
I'm trying to create a rolling balance on a list of payments:
 id | amount | balance
-----------------------
  1 |     50 |      50
  2 |    130 |     180
  3 |    -75 |     105
  4 |     15 |     120
The balance has to be calculated on every INSERT as the previous balance plus the new payment amount. If INSERTs happen in parallel, I get duplicates in the balance column, which is logical. I need to find a way to enforce that they are executed in a strictly sequential order.
A SERIALIZABLE transaction didn't help, mostly because of the problem explained here. I simply got errors on most transactions: could not serialize access due to read/write dependencies among transactions.
What did help was an explicit LOCK before every INSERT:
LOCK TABLE receipt IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO receipt ...
All inserts happen sequentially now.
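For reference, a sketch of the whole pattern, assuming the receipt table has (id serial, amount numeric, balance numeric) and a trigger like the one described in the question (its actual body isn't shown, so this is only an approximation):
CREATE OR REPLACE FUNCTION receipt_set_balance() RETURNS trigger AS $$
BEGIN
    -- previous balance + new amount; yields NULL when the table is still empty
    SELECT balance + NEW.amount INTO NEW.balance
    FROM receipt ORDER BY id DESC LIMIT 1;
    IF NEW.balance IS NULL THEN
        NEW.balance := NEW.amount;   -- first row
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER receipt_balance BEFORE INSERT ON receipt
FOR EACH ROW EXECUTE PROCEDURE receipt_set_balance();
-- Every writer takes the table lock first, so the inserts (and therefore the running balance) are strictly sequential:
BEGIN;
LOCK TABLE receipt IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO receipt (amount) VALUES (15);
COMMIT;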