Concurrency scenarios with INSERTs - postgresql

I'm designing a booking system in PHP + PostgreSQL.
I'm not able to find a clean solution to a concurrency problem based on INSERTs operations.
The DB system is mainly made of these tables:
CREATE TABLE booking (
booking_id INT,
user_id INT,
state SMALLINT,
nb_coupons INT
);
CREATE booking_state_history (
booking_state_history_id INT,
timestamp TIMESTAMP,
booking_id INT,
state SMALLINT);
CREATE TABLE coupon_purchase(
coupon_purchase_id,
user_id INT,
nb INT,
value MONEY)
CREATE TABLE coupon_refund(
coupon_refund_id INT,
user_id,
nb INT,
value MONEY)
CREATE TABLE booking_payment(
booking_payment_id INT,
user_id,
booking_id,
nb INT,
value MONEY)
A booking must be paid with coupons that have been previously purchased by the user. Some coupons may have been refund. All these operations are stored in the two corresponding tables to keep an history and be able to compute the coupon balance.
Constraint: the coupon balance cannot be negative at any time.
A booking is finalized when it is paid with coupons.
Then the following operations happen:
BEGIN;
(1) Check there are enough coupons remaining to pay the booking. (SELECT)
(2) Decide which coupons (number and value) will be used to pay the booking
(mainly, higher cost coupon used first. But that is not the issue here.)
(3) Add records to booking_payment (INSERTs)
(4) Move the booking to state="PAID" (integer value representing "PAID") (UPDATE)
(5) Add a record to booking_state_history (INSERT)
COMMIT;
These operations need to be atomic to preserve DB information coherency.
Hence the usage of transactions that allow to COMMIT or ROLLBACK in case of failure, DB exception, PHP exception or any other issue in the middle of the operations.
Scenario 1
Since I'm in a concurrent access environment (web site) nothing prevents the user from (for instance) asking for a coupon refund while doing a booking payment at the same time.
Scenario 2
He can also trigger two concurrent booking payments at the same time in two different transactions.
So the following can happen:
Scenario 1
After (1) is done, the coupon refund is triggered by the user and the subsequent coupon balance is not enough to pay the booking any more.
When it COMMITs the balance becomes negative.
Note:
Even if I do a recheck of coupon balance in a new (6) step, there is a possibility for the coupon refund to happen in the meantime between (6) and COMMIT.
Scenario 2
Two concurrent booking payment transactions for which total number of coupons for payment is too much for the global balance to stay positive. Only one of them can happen.
Transaction 1 and transaction 2 are checking for balance and seeing enough coupons for their respective payment in step (1).
They go on with their operations and COMMIT. The new balance is negative and conflicting with the constraint.
Note:
Even if I do a coupon balance recheck in a new (6) step the transactions cannot see the operations not yet commited by the other one.
So they blindly proceed to COMMIT.
I guess this is an usual concurrency case but I cannot find a pattern to solve this on the internet.
I thought of rechecking the balance after the COMMIT so I can manually UNDO all the operations. But it is not totally safe since if an exception happen after the commit the UNDO won't be done.
Any idea to solve this concurrency problem?
Thanks.

Your problem boils down to the question of "what should be the synchronization lock". From your question it seems that the booking is not booking of a specific item. But lets assume, that a user is booking a specific hotel room so you need to solve two problems:
prevent overbooking (e.g. booking the same thing for two people)
prevent parallel account state miscalculation
So when a user gets to a point when he/she is about to hit confirm button, this is a possible scenario you can implement:
begin transaction
lock the user entry so that parallel processes are blocked
SELECT * FROM user FOR UPDATE WHERE id = :id
re-check account balance and throw exception / rollback if there are insufficient funds
lock the item to be booked to prevent overbooking
SELECT * FROM room FOR UPDATE WHERE id = :id
re-check booking availability and throw exception / rollback if the item is already booked
create booking entry and subtract funds from user's account
commit transaction (all locks will be released)
If, in your case, you don't need to check for overbooking, just skip / ignore steps 4 and 5.

Below is the solution I've implemented.
Note: I just treated the coupon transfer part below but it is the same with booking state change and booking_state_history.
The main idea is to preserve this part of the processing as a critical section.
When an INSERT into booking_payment, coupon_purchase or coupon_refund is to be done I prevent other transactions from doing the same by putting a lock on a dedicated table through an UPDATE for the given user_id.
This way, only transactions impacting this given user_id for the same kind of treatment will be locked.
Intialization
DROP TABLE coupon_purchase;
DROP TABLE coupon_refund;
DROP TABLE booking_payment;
DROP TABLE lock_coupon_transaction;
CREATE TABLE coupon_purchase(
coupon_purchase_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE coupon_refund(
coupon_refund_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE booking_payment(
booking_payment_id SERIAL PRIMARY KEY,
user_id INT,
booking_id INT,
nb INT);
CREATE TABLE lock_coupon_transaction (
user_id INT,
timestamp TIMESTAMP);
INSERT INTO coupon_purchase
(user_id, nb) VALUES
(1, 1),
(1, 5);
INSERT INTO coupon_refund
(user_id, nb) VALUES
(1, 3);
INSERT INTO lock_coupon_transaction
(user_id, timestamp) VALUES
(1, current_timestamp);
Transaction 1
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO booking_payment
(user_id, booking_id, nb)
SELECT 1::INT, 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 1
Transaction 2
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
// Transaction is locked waiting for a COMMIT or ROLLBACK from transaction 1.
Transaction 1
COMMIT;
COMMIT
Transaction 2
// Transaction 1 lock has been released so transaction 2 can go on
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO coupon_refund
(user_id, nb)
SELECT 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 0
COMMIT;
COMMIT
INSERT couldn't be done since not enough money on the account. This is the expected behavior.
The previous transaction was commited when the second one proceeded. So transaction 2 could see all changes made by transaction 1.
This way there is not risk to have concurrent access to coupons handling.

Related

PostgreSQL - Does row locking depends on update syntax in transaction?

I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference on the lock applied on the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then free both locks.
But what does transaction 2 do ?
Does it do the same thing or does it do: lock user 0 and user 1, then free both locks ?
There is a difference because if i have two concurent transactions, if i write my queries as the first transaction, i might encounter deadlocks issues. But if I write my transactions like the second one, can I run into deadlocks issues ?
If it does the same thing, is there a way to write this transaction so that at the beginning of a transaction, before doing anything, the transaction checks for each rows it needs to update, waits until all this rows are not locked, then lock all rows at the same time ?
links:
the syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL
Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a certain order.
If your objective is to avoid deadlocks the first transaction is better: if you make a rule that any transaction must update the user with the lower id first, then no such two transactions can deadlock with each other (but they can still deadlock with other transactions that do not obey that rule).

Prevent two threads from selecting same row ibm db2

I have a situation where I have multiple (potentially hundreds) threads repeating the same task (using a java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are kept track in a m:n join table called process_change_rel that keeps track of the process id, record id and status) processing them, then updating back the status.
My question is, how is the best way to prevent two threads from the same process from selecting the same row? Will the below solution (using for update to lock rows ) work? If not, please suggest a working solution
Create table change(
—id , autogenerated pk
—other fields
)
Create table change_process_rel(
—change id (pk of change table)
—process id (pk of process table)
—status)
Query I would use is listed below
Select * from
change c
where c.id not in(select changeid from change_process_rel with cs) for update
Please let me know if this would work
You have to "lock" a row which you are going to process somehow. Such a "locking" should be concurrent of course with minimum conflicts / errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values '1', '2', '3';
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Every such a statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing of the corresponding event in the same transaction.
If the transaction completes successfully, then the row inserted into the change_process_rel table is saved, so, the corresponding id from change may be considered as processed. If the transaction fails, the corresponding "lock" row from change_process_rel disappears, and this row may be processed later by this or another application.
The problem of this method is, that when both tables become large enough, such a sub-select may not work as quick as previously.
Another method is to use Evaluate uncommitted data through lock deferral.
It requires to place the status column into the change table.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which might help with such a sort of algorithms.
If, let's say, status=0 is "not processed", and status<>0 is some processing / processed status, then after setting these DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restart the instance, you may "catch" the next ID for processing with the following statement.
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance:
create index change1 on change(status);
and may be set this table as volatile or collect distribution statistics on this column in addition to regular statistics on table and its indexes periodically.
Note that such a registry variables setting has global effect, and you should keep it in mind...

In certain cases, can UPDATE transactions fail rather than block waiting for "FOR UPDATE lock"?

I'm using Postgres 9.6.5.
The docs say:
13.3. Explicit Locking
13.3.2. Row-level Locks" -> "Row-level Lock Modes" -> "FOR UPDATE":
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). ...
The mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
Regarding UPDATEs on rows that are locked via "SELECT FOR UPDATE" in another transaction, I read the above as follows: other transactions that attempt UPDATE of these rows will be blocked until the current transaction ( which did "SELECT FOR UPDATE" for those rows ) ends, unless the columns in these rows being UPDATE'ed are those that don't have a unique index on them that can be used in a foreign key.
Is this correct ? If so, if I have a table "program" with a text column "stage" ( this column doesn't fit "have a unique index on them that can be used in a foreign key" ), and I have a transaction that does "SELECT FOR UPDATE" for some rows followed by UPDATE'ing "stage" in these rows, is it correct that other concurrent transactions UPDATE'ing "stage" on these rows can fail, rather than block until the former transaction ends ?
Your transactions can fail if a deadlock is detected in one of the UPDATE or SELECT...FOR UPDATE, for example:
Transaction 1
BEGIN;
SELECT * FROM T1 FOR UPDATE;
SELECT * FROM T2 FOR UPDATE;
-- Intentionally, no rollback nor commit yet
Transaction 2
BEGIN;
SELECT * FROM T2 FOR UPDATE; -- blocks T2 first
SELECT * FROM T1 FOR UPDATE; -- exception will happen here
The moment the second transaction tries to lock T1 you'll get:
ERROR: 40P01: deadlock detected
DETAIL: Process 15981 waits for ShareLock on transaction 3538264; blocked by process 15942.
Process 15942 waits for ShareLock on transaction 3538265; blocked by process 15981.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,1) in relation "t1"
LOCATION: DeadLockReport, deadlock.c:956
Time: 1001.728 ms

Why does PostgreSQL serializable transaction think this as conflict?

In my understanding PostgreSQL use some kind of monitors to guess if there's a conflict in serializable isolation level. Many examples are about modifying same resource in concurrent transaction, and serializable transaction works great. But I want to test concurrent issue in another way.
I decide to test 2 users modifying their own account balance, and wish PostgreSQL is smart enough to not detect it as conflict, but the result is not what I want.
Below is my table, there're 4 accounts which belongs to 2 users, each user has a checking account and a saving account.
create table accounts (
id serial primary key,
user_id int,
type varchar,
balance numeric
);
insert into accounts (user_id, type, balance) values
(1, 'checking', 1000),
(1, 'saving', 1000),
(2, 'checking', 1000),
(2, 'saving', 1000);
The table data is like this:
id | user_id | type | balance
----+---------+----------+---------
1 | 1 | checking | 1000
2 | 1 | saving | 1000
3 | 2 | checking | 1000
4 | 2 | saving | 1000
Now I run 2 concurrent transaction for 2 users. In each transaction, I reduce the checking account with some money, and check that user's total balance. If it's greater than 1000, then commit, otherwise rollback.
The user 1's example:
begin;
-- Reduce checking account for user 1
update accounts set balance = balance - 200 where user_id = 1 and type = 'checking';
-- Make sure user 1's total balance > 1000, then commit
select sum(balance) from accounts where user_id = 1;
commit;
The user 2 is the same, except the user_id = 2 in where:
begin;
update accounts set balance = balance - 200 where user_id = 2 and type = 'checking';
select sum(balance) from accounts where user_id = 2;
commit;
I first commit user 1's transaction, it success with no doubt. When I commit user 2's transaction, it fails.
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
My questions are:
Why PostgreSQL thinks this 2 transactions are conflict? I added user_id condition for all SQL, and doesn't modify user_id, but all these have no effect.
Does that mean serializable transaction doesn't allow concurrent transactions happened on the same table, even if their read/write have no conflict?
Do something per user is very common, Should I avoid use serializable transaction for operations which happen very frequently?
You can fix this problem with the following index:
CREATE INDEX accounts_user_idx ON accounts(user_id);
Since there are so few data in your example table, you will have to tell PostgreSQL to use an index scan:
SET enable_seqscan=off;
Now your example will work!
If that seems like black magic, take a look at the query execution plans of your SELECT and UPDATE statements.
Without the index both will use a sequential scan on the table, thereby reading all rows in the table. So both transactions will end up with a SIReadLock on the whole table.
This triggers the serialization failure.
To my knowledge serializable has the highest level of isolation, therefore lowest level of concurrency. The transactions occur one after the other with zero concurrency.

Enforce Atomic Operations x Locks

I have a database model that works similar to a banking account (one table for operations, a nd a trigger to update the balance). I'm currently using SQL SERVER 2008 R2.
TABLE OPERATIONS
----------------
VL_CREDIT decimal(10,2)
VL_DEBIT decimal(10,2)
TABLE BALANCE
-------------
DT_OPERATION datetime
VL_CURRENT decimal(10,2)
PROCEDURE INSERT_OPERATION
--------------------------
GET LAST BALANCE BY DATE
CHECK IF VALUE OF OPERATION > BALANCE
IF > RETURN ERROR
ELSE INSERT INTO OPERATION(...,....)
The issue I have is the following:
The procedure to insert the operation has to check the balance to see if there's money available before inserting the operation, so the balance never gets negative. If there's no balance, I return some code to tell the user the balance is not enough.
My concern is: If this procedure gets called multiple times in a row, how can I guarantee that it's atomic?
I have some ideas, but as I am not sure which would guarantee it:
BEGIN TRANSACTION on the OPERATION PROCEDURE
Some sort of lock on selecting the BALANCE table, but it must hold until the end of procedure execution
Can you suggest some approach to guarantee that? Thanks in advance.
UPDATE
I read on MSDN (http://technet.microsoft.com/en-us/library/ms187373.aspx) that if my procedure has BEGIN/END TRANSACTION, and the SELECT on table BALANCE has WITH(TABLOCKX), it locks the table until the end of the transaction, so if a subsequent call to this procedure is made during the execution of the first, it will wait, and then guarantee that the value is always the last updated. Will it work? And if so, is it the best practice?
If you're amenable to changing your table structures, I'd build it this way:
create table Transactions (
SequenceNo int not null,
OpeningBalance decimal(38,4) not null,
Amount decimal(38,4) not null,
ClosingBalance as CONVERT(decimal(38,4),OpeningBalance + Amount) persisted,
PrevSequenceNo as CASE WHEN SequenceNo > 1 THEN SequenceNo - 1 END persisted,
constraint CK_Transaction_Sequence CHECK (SequenceNo > 0),
constraint PK_Transaction_Sequence PRIMARY KEY (SequenceNo),
constraint CK_Transaction_NotNegative CHECK (OpeningBalance + Amount >= 0),
constraint UQ_Transaction_BalanceCheck UNIQUE (SequenceNo, ClosingBalance),
constraint FK_Transaction_BalanceCheck FOREIGN KEY
(PrevSequenceNo, OpeningBalance)
references Transactions
(SequenceNo,ClosingBalance)
/* Optional - another check that Transaction 1 has 0 amount and
0 opening balance, if required */
)
Where you just apply credits and debits as +ve or -ve values for Amount. The above structure is enough to enforce the "not going negative" requirement (via CK_Transaction_NotNegative), and it also ensures that you know the current balance (by finding the row with the highest SequenceNo and taking the ClosingBalance value. Together, UQ_Transaction_BalanceCheck and FK_Transaction_BalanceCheck (and the computed columns) ensure that the entire sequence of transactions is valid, and PK_Transaction_Sequence keeps everything building in order
So, if we populate it with some data:
insert into Transactions (SequenceNo,OpeningBalance,Amount) values
(1,0.0,10.0),
(2,10.0,-5.50),
(3,4.50,2.75)
And now we can attempt an insert (this could be INSERT_PROCEDURE with #NewAmount passed as a parameter):
declare #NewAmount decimal(38,4)
set #NewAmount = -15.50
;With LastTransaction as (
select SequenceNo,ClosingBalance,
ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
from Transactions
)
insert into Transactions (SequenceNo,OpeningBalance,Amount)
select SequenceNo + 1, ClosingBalance, #NewAmount
from LastTransaction
where rn = 1
This insert fails because it would have caused the balance to go negative. But if #NewAmount was small enough, it would have succeeded. And if two inserts are attempted at "the same time" then either a) They're just far enough apart in reality that they both succeed, and the balances are all kept correct, or b) One of them will receive a PK violation error.