Two concurrent INSERT ON CONFLICT UPDATE in parallel transactions - postgresql

A table contains no record with id=1 initially.
What will the result be after these two transactions:
(1) -------------B+++++++++++++++++++++++C------------------------->
(2) --------------------B+++++++++++++++++++++++C------------------>
The first transaction contains UPSERT for a row with id=1
INSERT INTO some_table (id, amount) VALUES (1, 10)
ON CONFLICT (id) DO UPDATE
SET amount = some_table.amount + 10;
The second transaction contains a similar one:
INSERT INTO some_table (id, amount) VALUES (1, 20)
ON CONFLICT (id) DO UPDATE
SET amount = some_table.amount + 20;
What will the result be?
And as a follow-up question, how will it work in this case?
(1) -------------B+++++++++++++++++++++++++++++++++++C------------->
(2) --------------------B+++++++++++++++++++++++C------------------>

The result is always the same: the final amount is 30. There is no race condition.
The reason is that the row locks taken by the UPDATE serialize the two operations: the later UPDATE can only start once the transaction with the earlier UPDATE has committed, and the blocked transaction then sees the committed result of the other transaction.
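The serialized outcome can be checked with a small script. This is a sketch using SQLite as a stand-in for Postgres (SQLite's ON CONFLICT syntax since version 3.24 matches the form used here), running the two upserts one after the other, which is exactly the order the row locks enforce in Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, amount INTEGER)")

# Same upsert for both "transactions"; excluded.amount is the proposed value,
# unqualified amount is the existing row's value.
upsert = """INSERT INTO some_table (id, amount) VALUES (?, ?)
            ON CONFLICT (id) DO UPDATE SET amount = amount + excluded.amount"""

# The row locks serialize the two upserts; either order yields the same result.
conn.execute(upsert, (1, 10))   # no conflict: inserts (1, 10)
conn.execute(upsert, (1, 20))   # conflict: updates amount to 10 + 20
print(conn.execute("SELECT amount FROM some_table WHERE id = 1").fetchone()[0])  # 30
```

Running the statements in the opposite order gives the same final amount, which is why the overlapping-commit diagrams above do not change the answer.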

Related

Redshift insert a date value into a table

insert into table1 (ID,date)
select
ID,sysdate
from table2
Assume I insert a record into table2 with ID: 1, date: 2023-1-1.
The expected result is that the ID of table1 is set based on the ID from table2, and the date of table1 is set based on SYSDATE.
select *
from table1;
the expected result after running the insert statement will be:
ID | date
---+---------
 1 | 2023-1-6
but what I get is:
ID | date
---+---------
 1 | 2023-1-1
I see a few possibilities based on the information given:
You say "the expected result is update the ID of table1 based on the ID from table2", and this raises the question: did ID = 1 exist in table1 BEFORE you ran the INSERT statement? If so, are you expecting that the INSERT will update the value for ID #1? Redshift doesn't enforce or check uniqueness of primary keys, so you would get two rows in table1 in this case. Is this what is happening?
SYSDATE on Redshift provides the start timestamp of the current transaction, NOT the current statement. Have you had the current transaction open since the 1st?
You didn't COMMIT the results (or the statement failed) and are checking from a different session. It could also be that the transaction in the second session started before the COMMIT completed. Working with MVCC across multiple sessions can trip anyone up.
There are likely other possible explanations. If you could provide DDL, sample data, and a simple test case so that others can recreate what you are seeing it would greatly narrow down the possibilities.

PostgreSQL - Does row locking depend on update syntax in a transaction?

I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference in the locks applied to the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then free both locks.
But what does transaction 2 do?
Does it do the same thing, or does it lock user 0 and user 1 at once, then free both locks?
This matters because, with two concurrent transactions, if I write my queries like the first transaction I might encounter deadlock issues. But if I write my transactions like the second one, can I still run into deadlocks?
If it does the same thing, is there a way to write this transaction so that, at the beginning, before doing anything else, it checks every row it needs to update, waits until none of those rows are locked, and then locks them all at the same time?
links:
the syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL
Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a certain order.
If your objective is to avoid deadlocks, the first transaction is better: if you make a rule that any transaction must update the user with the lower id first, then no two such transactions can deadlock with each other (but they can still deadlock with other transactions that do not obey that rule).
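The lock-ordering rule can be sketched outside the database. In this simulation, Python mutexes stand in for the row locks, and both "transactions" sort the ids they touch before acquiring, which is the "lower id first" rule from the answer:

```python
import threading

# One mutex per row, standing in for PostgreSQL's row locks.
locks = {0: threading.Lock(), 1: threading.Lock()}
finished = []

def update_rows(ids):
    # Acquire the "row locks" in ascending id order -- the rule from the
    # answer: any transaction must lock the lower id first.
    for i in sorted(ids):
        locks[i].acquire()
    try:
        finished.append(ids)   # the UPDATEs would happen here
    finally:
        for i in sorted(ids, reverse=True):
            locks[i].release()

# Both "transactions" want the same two rows, requested in opposite order;
# sorting makes the acquisition order identical, so they cannot deadlock.
t1 = threading.Thread(target=update_rows, args=([0, 1],))
t2 = threading.Thread(target=update_rows, args=([1, 0],))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(finished))  # 2 -- both complete
```

Without the `sorted()` calls, the opposite request orders could each grab one lock and wait forever on the other, which is precisely the deadlock scenario.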

In certain cases, can UPDATE transactions fail rather than block waiting for "FOR UPDATE lock"?

I'm using Postgres 9.6.5.
The docs say:
13.3. Explicit Locking
"13.3.2. Row-level Locks" -> "Row-level Lock Modes" -> "FOR UPDATE":
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). ...
The mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
Regarding UPDATEs on rows that are locked via SELECT FOR UPDATE in another transaction, I read the above as follows: other transactions that attempt an UPDATE of these rows will be blocked until the current transaction (the one which did SELECT FOR UPDATE on those rows) ends, unless the columns being updated are ones that don't have a unique index on them that can be used in a foreign key.
Is this correct? If so: I have a table "program" with a text column "stage" (this column doesn't fit "have a unique index on them that can be used in a foreign key"), and a transaction that does SELECT FOR UPDATE on some rows, followed by an UPDATE of "stage" in those rows. Is it correct that other concurrent transactions UPDATE'ing "stage" on these rows can fail, rather than block until the former transaction ends?
Your transactions can fail if a deadlock is detected in one of the UPDATE or SELECT ... FOR UPDATE statements. For example, with this interleaving:
Transaction 1
BEGIN;
SELECT * FROM T1 FOR UPDATE; -- locks the rows of T1
Transaction 2
BEGIN;
SELECT * FROM T2 FOR UPDATE; -- locks the rows of T2
Transaction 1
SELECT * FROM T2 FOR UPDATE; -- blocks, waiting for transaction 2
Transaction 2
SELECT * FROM T1 FOR UPDATE; -- exception will happen here
The moment the second transaction tries to lock T1 you'll get:
ERROR: 40P01: deadlock detected
DETAIL: Process 15981 waits for ShareLock on transaction 3538264; blocked by process 15942.
Process 15942 waits for ShareLock on transaction 3538265; blocked by process 15981.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,1) in relation "t1"
LOCATION: DeadLockReport, deadlock.c:956
Time: 1001.728 ms

Concurrency scenarios with INSERTs

I'm designing a booking system in PHP + PostgreSQL.
I'm not able to find a clean solution to a concurrency problem based on INSERTs operations.
The DB system is mainly made of these tables:
CREATE TABLE booking (
booking_id INT,
user_id INT,
state SMALLINT,
nb_coupons INT
);
CREATE TABLE booking_state_history (
booking_state_history_id INT,
timestamp TIMESTAMP,
booking_id INT,
state SMALLINT);
CREATE TABLE coupon_purchase(
coupon_purchase_id INT,
user_id INT,
nb INT,
value MONEY);
CREATE TABLE coupon_refund(
coupon_refund_id INT,
user_id INT,
nb INT,
value MONEY);
CREATE TABLE booking_payment(
booking_payment_id INT,
user_id INT,
booking_id INT,
nb INT,
value MONEY);
A booking must be paid with coupons that have been previously purchased by the user. Some coupons may have been refunded. All these operations are stored in the two corresponding tables to keep a history and be able to compute the coupon balance.
Constraint: the coupon balance cannot be negative at any time.
A booking is finalized when it is paid with coupons.
Then the following operations happen:
BEGIN;
(1) Check there are enough coupons remaining to pay the booking. (SELECT)
(2) Decide which coupons (number and value) will be used to pay the booking
(mainly, higher cost coupon used first. But that is not the issue here.)
(3) Add records to booking_payment (INSERTs)
(4) Move the booking to state="PAID" (integer value representing "PAID") (UPDATE)
(5) Add a record to booking_state_history (INSERT)
COMMIT;
These operations need to be atomic to preserve the coherency of the DB information.
Hence the use of a transaction, which allows a COMMIT or ROLLBACK in case of failure, DB exception, PHP exception, or any other issue in the middle of the operations.
Scenario 1
Since I'm in a concurrent access environment (web site) nothing prevents the user from (for instance) asking for a coupon refund while doing a booking payment at the same time.
Scenario 2
He can also trigger two concurrent booking payments at the same time in two different transactions.
So the following can happen:
Scenario 1
After (1) is done, a coupon refund is triggered by the user, and the resulting coupon balance is no longer enough to pay the booking.
When the payment transaction COMMITs, the balance becomes negative.
Note:
Even if I re-check the coupon balance in a new step (6), the coupon refund can still happen between (6) and the COMMIT.
Scenario 2
Two concurrent booking payment transactions whose combined coupon payments would drive the global balance negative; only one of them should be allowed to happen.
Transactions 1 and 2 both check the balance in step (1) and each sees enough coupons for its own payment.
They go on with their operations and COMMIT. The new balance is negative, violating the constraint.
Note:
Even if I re-check the coupon balance in a new step (6), each transaction cannot see the operations not yet committed by the other one.
So they blindly proceed to COMMIT.
I guess this is a common concurrency case, but I cannot find a pattern to solve it on the internet.
I thought of re-checking the balance after the COMMIT so I could manually UNDO all the operations. But that is not totally safe either: if an exception happens after the commit, the UNDO won't be done.
Any idea to solve this concurrency problem?
Thanks.
Your problem boils down to the question "what should be the synchronization lock?". From your question it seems that the booking is not a booking of a specific item. But let's assume that a user is booking a specific hotel room, so you need to solve two problems:
prevent overbooking (e.g. booking the same thing for two people)
prevent parallel account state miscalculation
So when a user gets to a point when he/she is about to hit confirm button, this is a possible scenario you can implement:
1. begin transaction
2. lock the user entry so that parallel processes are blocked
SELECT * FROM "user" WHERE id = :id FOR UPDATE;
3. re-check the account balance and throw an exception / roll back if there are insufficient funds
4. lock the item to be booked to prevent overbooking
SELECT * FROM room WHERE id = :id FOR UPDATE;
5. re-check booking availability and throw an exception / roll back if the item is already booked
6. create the booking entry and subtract the funds from the user's account
7. commit transaction (all locks will be released)
If, in your case, you don't need to check for overbooking, just skip / ignore steps 4 and 5.
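A minimal sketch of this lock-then-recheck flow, with a Python mutex standing in for the SELECT ... FOR UPDATE row lock (SQLite, used here as a stand-in for Postgres, has no direct equivalent) and a hypothetical users table carrying the balance:

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO users VALUES (1, 5)")
conn.commit()

user_lock = threading.Lock()   # stands in for SELECT ... FOR UPDATE on the user row

def book(user_id, cost):
    # Lock first, re-check the balance second, debit third -- the order
    # from the answer's steps. The recheck happens *after* the lock, so it
    # always sees the committed result of any earlier booking.
    with user_lock:
        (balance,) = conn.execute(
            "SELECT balance FROM users WHERE id = ?", (user_id,)).fetchone()
        if balance < cost:
            return False        # insufficient funds: refuse / roll back
        conn.execute("UPDATE users SET balance = balance - ? WHERE id = ?",
                     (cost, user_id))
        conn.commit()
        return True

# Two concurrent bookings of cost 3 against a balance of 5: only one fits.
results = []
threads = [threading.Thread(target=lambda: results.append(book(1, 3)))
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(results))  # [False, True]
```

Without the mutex, both threads could read balance 5 before either debit, and both bookings would "succeed", which is exactly Scenario 2 from the question.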
Below is the solution I've implemented.
Note: I only treat the coupon transfer part below, but it is the same for the booking state change and booking_state_history.
The main idea is to make this part of the processing a critical section.
When an INSERT into booking_payment, coupon_purchase or coupon_refund is to be done, I prevent other transactions from doing the same by taking a lock on a dedicated table through an UPDATE for the given user_id.
This way, only transactions touching this same user_id for the same kind of treatment will be blocked.
Initialization
DROP TABLE coupon_purchase;
DROP TABLE coupon_refund;
DROP TABLE booking_payment;
DROP TABLE lock_coupon_transaction;
CREATE TABLE coupon_purchase(
coupon_purchase_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE coupon_refund(
coupon_refund_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE booking_payment(
booking_payment_id SERIAL PRIMARY KEY,
user_id INT,
booking_id INT,
nb INT);
CREATE TABLE lock_coupon_transaction (
user_id INT,
timestamp TIMESTAMP);
INSERT INTO coupon_purchase
(user_id, nb) VALUES
(1, 1),
(1, 5);
INSERT INTO coupon_refund
(user_id, nb) VALUES
(1, 3);
INSERT INTO lock_coupon_transaction
(user_id, timestamp) VALUES
(1, current_timestamp);
Transaction 1
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO booking_payment
(user_id, booking_id, nb)
SELECT 1::INT, 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 1
Transaction 2
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
-- Transaction 2 blocks here, waiting for a COMMIT or ROLLBACK from transaction 1.
Transaction 1
COMMIT;
COMMIT
Transaction 2
-- Transaction 1's lock has been released, so transaction 2 can go on.
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO coupon_refund
(user_id, nb)
SELECT 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 0
COMMIT;
COMMIT
The INSERT wasn't performed since there weren't enough coupons left on the account. This is the expected behavior.
The first transaction had committed by the time the second one proceeded, so transaction 2 could see all the changes made by transaction 1.
This way there is no risk from concurrent access to the coupon handling.
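The conditional-insert part of this solution can be exercised on its own with SQLite standing in for Postgres (the lock_coupon_transaction table is omitted here, since a single session needs no serialization); `pay` is a hypothetical helper wrapping the same balance computation as the WITH query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coupon_purchase (user_id INT, nb INT);
CREATE TABLE coupon_refund   (user_id INT, nb INT);
CREATE TABLE booking_payment (user_id INT, booking_id INT, nb INT);
INSERT INTO coupon_purchase VALUES (1, 1), (1, 5);
INSERT INTO coupon_refund   VALUES (1, 3);
""")

def pay(conn, user_id, booking_id, nb):
    # Conditional INSERT: the row is only written while the balance
    # (purchases - refunds - payments) still covers the amount.
    cur = conn.execute("""
        INSERT INTO booking_payment (user_id, booking_id, nb)
        SELECT ?, ?, ?
        WHERE (SELECT COALESCE(SUM(nb),0) FROM coupon_purchase WHERE user_id=?)
            - (SELECT COALESCE(SUM(nb),0) FROM coupon_refund   WHERE user_id=?)
            - (SELECT COALESCE(SUM(nb),0) FROM booking_payment WHERE user_id=?)
            >= ?
    """, (user_id, booking_id, nb, user_id, user_id, user_id, nb))
    return cur.rowcount  # 1 if the payment was recorded, 0 otherwise

print(pay(conn, 1, 1, 3))  # 1: balance 6 - 3 = 3 covers the payment
print(pay(conn, 1, 2, 3))  # 0: balance now 0, payment refused (INSERT 0 0)
```

The second call reproduces the `INSERT 0 0` outcome of transaction 2 above; in Postgres the lock on lock_coupon_transaction is what guarantees the second session computes its balance only after the first has committed.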

Why is this Postgres query throwing "duplicate key value violates unique constraint"? [duplicate]

This question already has answers here:
Insert, on duplicate update in PostgreSQL?
(18 answers)
Closed 8 years ago.
I've implemented simple update/insert query like this:
-- NOTE: :time is replaced in real code, ids are placed statically for example purposes
-- set status_id=1 to existing rows, update others
UPDATE account_addresses
SET status_id = 1, updated_at = :time
WHERE account_id = 1
AND address_id IN (1,2,3)
AND status_id IN (2);
-- filter values according to what that update query returns, i.e. construct query like this to insert remaining new records:
INSERT INTO account_addresses (account_id, address_id, status_id, created_at, updated_at)
SELECT account_id, address_id, status_id, created_at::timestamptz, updated_at::timestamptz
FROM (VALUES (1,1,1,:time,:time),(1,2,1,:time,:time)) AS sub(account_id, address_id, status_id, created_at, updated_at)
WHERE NOT EXISTS (
SELECT 1 FROM account_addresses AS aa2
WHERE aa2.account_id = sub.account_id AND aa2.address_id = sub.address_id
)
RETURNING id;
-- throws:
-- PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "..."
-- DETAIL: Key (account_id, address_id)=(1, 1) already exists.
The reason why I'm doing it this way is: the record MAY exist with status_id=2. If so, set status_id=1.
Then insert new records. If a record already exists but was not affected by the first UPDATE query, ignore it (i.e. rows with status_id=3).
This works nicely, but when run concurrently it crashes on a duplicate key: a race condition.
But why does the race condition occur, if I'm trying to do that "insert-where-not-exists" atomically?
Ah. I just searched a little more, and "insert where not exists" is not atomic.
Quote from http://www.postgresql.org/message-id/26970.1296761016@sss.pgh.pa.us :
Mage writes:
The main question is that isn't "insert into ... select ... where not exists" atomic?
No, it isn't: it will fail in the presence of other transactions
doing the same thing, because the EXISTS test will only see rows that
committed before the command started. You might care to read the
manual's chapter about concurrency:
http://www.postgresql.org/docs/9.0/static/mvcc.html
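On PostgreSQL 9.5+ the race-free alternative (per the duplicate linked above) is INSERT ... ON CONFLICT DO NOTHING, which resolves the conflict against the unique index itself rather than against a stale snapshot. A sketch using SQLite, whose upsert syntax matches here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account_addresses (
    account_id INT, address_id INT, status_id INT,
    PRIMARY KEY (account_id, address_id))""")

# Unlike INSERT ... WHERE NOT EXISTS, the duplicate check here is done by the
# unique index at insert time, so a concurrent insert cannot slip past it.
ins = """INSERT INTO account_addresses (account_id, address_id, status_id)
         VALUES (?, ?, ?)
         ON CONFLICT (account_id, address_id) DO NOTHING"""

conn.execute(ins, (1, 1, 1))         # inserted
cur = conn.execute(ins, (1, 1, 2))   # duplicate key: silently skipped
print(cur.rowcount)  # 0 -- no UniqueViolation raised
```

In the question's setup, replacing the NOT EXISTS subquery with this form removes the window between the EXISTS check and the INSERT in which another transaction can commit the same key.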