How to handle two parallel INSERT ... ON CONFLICT DO UPDATE Postgres queries - postgresql

My Android device calls an endpoint in Spring Boot. When two devices call in parallel, the insert from the first call has not completed by the time the second call reaches the database. The first call generates the primary key; the second call sees the conflict and therefore tries to update. The intent is that the first call inserts and subsequent calls increment the value by 1. But because the first insert is not yet complete, the second call's update writes NULL, and subsequent calls compute value + 1 on NULL, so they keep writing NULL. How do I make sure one call locks the row, or how do I otherwise solve this problem?
insert into table1 (primary_key, quantity)
values (:primary_key, :qty)
on conflict (primary_key) do update
set quantity = (
        select coalesce(quantity, 1) + :qty
        from table1
        where primary_key = :primary_key)
where primary_key = :primary_key;
Note: :qty will be 1, and :primary_key is the key passed in from the code.

Make your database insert a transaction (what is a database transaction?); in short, it either finishes everything or nothing.
You didn't post your back-end code, but for reference it can be done in Spring Boot like this: https://spring.io/guides/gs/managing-transactions/

The explanation is that your subquery doesn't find a row yet, because the concurrently inserted row is not yet visible to your transaction. Hence the result is NULL.
Avoid that race condition with the much simpler
INSERT INTO table1 (primary_key, quantity)
VALUES (:primary_key, :qty)
ON CONFLICT (primary_key)
DO UPDATE
SET quantity = coalesce(table1.quantity, 1) + EXCLUDED.quantity;
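Because EXCLUDED carries the values of the row proposed for insertion, no subquery is needed and there is no visibility window for a concurrent transaction to exploit. A quick demo run, assuming a minimal table1 with a nullable quantity column (schema invented for illustration):
-- hypothetical minimal schema for the demo
CREATE TABLE table1 (primary_key text PRIMARY KEY, quantity integer);

-- first call: no conflict, the row is inserted with quantity = 1
INSERT INTO table1 (primary_key, quantity) VALUES ('k1', 1)
ON CONFLICT (primary_key) DO UPDATE
SET quantity = coalesce(table1.quantity, 1) + EXCLUDED.quantity;

-- second call: the DO UPDATE path runs against the existing row,
-- so quantity becomes 2 instead of NULL
INSERT INTO table1 (primary_key, quantity) VALUES ('k1', 1)
ON CONFLICT (primary_key) DO UPDATE
SET quantity = coalesce(table1.quantity, 1) + EXCLUDED.quantity;

SELECT * FROM table1;  -- ('k1', 2)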

Related

Is INSTEAD OF UPDATE trigger the best option

I have to check, when a table is inserted to or updated, whether a column value already exists for the same HotelID and a different RoomNo in the same table. I'm thinking an INSTEAD OF trigger on the table would be a good option, but I have read that it's a bad idea to update/insert the table the trigger executes on from inside the trigger, and that you should create the trigger on a view instead (which raises more questions for me).
Is it ok to create a trigger like this? Is there a better option?
CREATE TRIGGER dbo.tgr_tblInterfaceRoomMappingUpsert
ON dbo.tblInterfaceRoomMapping
INSTEAD OF INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @txtRoomNo nvarchar(20);

    SELECT @txtRoomNo = Sonifi_RoomNo
    FROM dbo.tblInterfaceRoomMapping r
    INNER JOIN INSERTED i
        ON r.iHotelID = i.iHotelID
        AND r.Sonifi_RoomNo = i.Sonifi_RoomNo
        AND r.txtRoomNo <> i.txtRoomNo;

    IF @txtRoomNo IS NULL
    BEGIN
        -- Insert/update the record
    END
    ELSE
    BEGIN
        -- Raise error
    END
END
GO
So it sounds like you only want 1 row per combo of HotelID and Sonifi_RoomNo.
CREATE UNIQUE INDEX UQ_dbo_tblInterfaceRoomMapping
ON dbo.tblInterfaceRoomMapping (iHotelID, Sonifi_RoomNo);
Now if you try and put a second row with the same values, it will bark at you.
It's (usually) not okay to create a trigger like that.
Your trigger assumes only a single-row insert or update will ever occur - is that guaranteed?
What will the value of @txtRoomNo be if multiple rows are inserted or updated in the same batch?
E.g., if an update against the table results in 1 row with correct data and 1 row with incorrect data, how would your trigger cope? Remember, triggers fire once per insert/update statement, not once per row.
Depending on your requirements you could keep the INSTEAD OF trigger concept, however I would suggest a separate trigger for inserts and for updates.
In each you can then insert/update with a WHERE NOT EXISTS clause that only lets valid rows through, silently ignoring invalid inserts or updates (see the sketch below).
I would avoid raising an error in the trigger; if you need to handle bad data you could instead insert it into some logging table with the reverse WHERE EXISTS logic and handle it separately.
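A minimal sketch of the INSTEAD OF INSERT variant, assuming the column names from the question (the UPDATE counterpart would follow the same pattern):
CREATE TRIGGER dbo.tgr_tblInterfaceRoomMappingInsert
ON dbo.tblInterfaceRoomMapping
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- only insert rows that do not clash with an existing mapping;
    -- invalid rows are silently skipped (log them instead if needed)
    INSERT INTO dbo.tblInterfaceRoomMapping (iHotelID, Sonifi_RoomNo, txtRoomNo)
    SELECT i.iHotelID, i.Sonifi_RoomNo, i.txtRoomNo
    FROM INSERTED i
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.tblInterfaceRoomMapping r
        WHERE r.iHotelID = i.iHotelID
          AND r.Sonifi_RoomNo = i.Sonifi_RoomNo
          AND r.txtRoomNo <> i.txtRoomNo
    );
END
GO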
Ultimately though, it would be best for the application to check if the roomNo is already used.

Lock row, release later

I'm trying to understand how to lock a row, and only release that lock later.
I have a table like this:
create table testTable (Name varchar(100));
Some test data
insert into testTable (name) select 'Bob';
insert into testTable (name) select 'John';
insert into testTable (name) select 'Steve';
Now, I want to select one of those rows and prevent other queries from seeing it. I achieve that like this:
begin transaction;
select * from testTable where name = 'Bob' for update;
In another window, I do this:
select * from testTable for update skip locked;
Great, I don't see 'Bob' in that result set. Now, I want to do something with the retrieved row (Bob), and after I have done my work, I want to release that row again. The simple answer would be to do:
commit transaction
However, I am running multiple transactions on the same connection, so I can't just begin and commit transactions all over the show. Ideally I would like to have a "named" transaction, something like:
begin transaction 'myTransaction';
select * from testTable where name = 'Bob' for update;
-- do stuff with the data outside SQL, then later call ...
commit transaction 'myTransaction';
But Postgres doesn't support that. I have found "prepare transaction", but that seems to be a pear-shaped path I don't want to go down, especially as those transactions even persist through restarts.
Is there any way I can get a reference to commit/rollback a specific transaction?
You can have only one open transaction at a time in a database session, so the question as such is moot.
But I assume that you do not really want to run a transaction, you want to block access to a certain row for a while.
It is usually not a good idea to use regular database locks for such a purpose (the exception are advisory locks, which serve exactly that purpose, but are not tied to table rows). The problem is that long database transactions keep autovacuum from doing its job.
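For completeness, advisory locks look roughly like this (a sketch; mapping the row to a lock key via hashtext() is my own choice here, and hash collisions are possible):
-- take an exclusive advisory lock keyed on the name
SELECT pg_advisory_lock(hashtext('Bob'));

-- ... do the work, outside of any long-running transaction ...

-- release it explicitly (ending the session also releases it)
SELECT pg_advisory_unlock(hashtext('Bob'));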
I recommend that you add a status column to the table and change the status rather than locking the row. That would serve the same purpose in a more natural fashion and make your problem go away.
If you are concerned that the status flag might not get cleared due to application logic problems, replace it with a visible_from column of type timestamp with time zone that initially contains -infinity. Instead of locking the row, set the value to current_timestamp + INTERVAL '5 minutes'. Only select rows that fulfill WHERE visible_from < current_timestamp. That way the “lock” will automatically expire after 5 minutes.
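A sketch of that visible_from variant against the testTable from the question:
-- -infinity means "not locked"
ALTER TABLE testTable
    ADD COLUMN visible_from timestamp with time zone
    NOT NULL DEFAULT '-infinity';

-- instead of locking the row, hide it for five minutes
UPDATE testTable
SET visible_from = current_timestamp + INTERVAL '5 minutes'
WHERE name = 'Bob';

-- everybody selects only rows whose "lock" has expired
SELECT * FROM testTable WHERE visible_from < current_timestamp;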

Postgres: SELECT FOR UPDATE does not see new rows after lock release

Trying to support a PostgreSQL DB in my application, I found this strange behaviour.
Preparation:
CREATE TABLE test(id INTEGER, flag BOOLEAN);
INSERT INTO test(id, flag) VALUES (1, true);
Assume two concurrent transactions (Autocommit=false, READ_COMMITTED) TX1 and TX2:
TX1:
UPDATE test SET flag = FALSE WHERE id = 1;
INSERT INTO test(id, flag) VALUES (2, TRUE);
-- (wait, no COMMIT yet)
TX2:
SELECT id FROM test WHERE flag=true FOR UPDATE;
-- waits for TX1 to release lock
Now, if I COMMIT in TX1, the SELECT in TX2 returns empty cursor.
This is strange to me, because the same experiment in Oracle and MariaDB results in selecting the newly created row (id=2).
I could not find anything about this behaviour in the PG documentation.
Am I missing something?
Is there any way to force PG server to "refresh" statement visibility after acquiring lock?
PS: PostgreSQL version 11.1
TX2 scans the table and tries to lock the results.
The scan sees the snapshot of the database from the start of the query, so it cannot see any rows that were inserted (or made eligible in some other way) by concurrent modifications that started after that snapshot was taken.
That is why you cannot see the row with the id 2.
For id 1, that is also true, so the scan finds that row. But the query has to wait until the lock is released. When that finally happens, it fetches the latest committed version of the row and performs the check again, so that row is excluded as well.
This “EvalPlanQual” recheck (to use PostgreSQL jargon) is only performed for rows that were found during the scan, but were locked. The second row isn't even found during the scan, so no such processing happens there.
This is a bit odd, admittedly. But it is not a bug, it is just the way PostgreSQL works.
If you want to avoid such anomalies, use the REPEATABLE READ isolation level. Then you will get a serialization error in such a case and can retry the transaction, thus avoiding inconsistencies like that.
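For the example above, TX2 under REPEATABLE READ would look like this (sketch):
BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT id FROM test WHERE flag = true FOR UPDATE;
-- blocks until TX1 commits, then instead of silently returning an
-- empty result it fails with:
-- ERROR: could not serialize access due to concurrent update

ROLLBACK;  -- retry the whole transaction from the start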

postgresql on conflict-cannot affect row a second time

I have a table with auto-numbering/sequence on data_id:
tabledata
---------
data_id [PK]
data_code [Unique]
data_desc
Example code:
insert into tabledata (data_code, data_desc)
values ('Z01', 'red')
on conflict (data_code) do update set data_desc = excluded.data_desc;
This works fine, and then I insert again:
insert into tabledata (data_code, data_desc)
values ('Z01', 'blue')
on conflict (data_code) do update set data_desc = excluded.data_desc;
I got this error:
[Err] ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
This is my real code:
insert into psa_aso_branch (branch_code, branch_desc, regional_code, status, created_date, lastmodified_date)
select branch_code, branch, kode_regional,
       case when status_data = 'Y' then true else false end,
       current_date, current_date
from branch_history
on conflict (branch_code) do update
set branch_desc = excluded.branch_desc,
    regional_code = excluded.regional_code,
    status = (case when excluded.status = 'Y' then true else false end),
    created_date = current_date,
    lastmodified_date = current_date;
It works fine on the first run, but not on the next one (like the example I gave you before).
You can use update on the existing record/row, not on the row you are inserting.
Here the update in the on conflict clause applies to the row in the excluded table, which temporarily holds the row proposed for insertion.
In the first case the record is inserted, since there is no clash on data_code, and the update is not executed at all.
In the second insert you are inserting Z01, which already exists as a data_code, and data_code is unique.
The excluded table still holds the duplicate value of data_code after the update, so the record is not inserted. In the update, data_code would have to be changed in order to insert the record properly.
I have been stuck on this issue for about 24 hours.
It is weird: when I test the query on the CLI it works fine, and it works fine when I insert a single data row. The error only appears when I'm using insert-select.
It is not really an insert-select problem: it is because the selected rows are not unique, which triggers the CONFLICT more than once.
Thanks to @zivaricha's comment. I experimented from his notes. It is just hard to understand at first.
Solution:
Use DISTINCT to make sure the select returns unique rows.
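Applied to the query from the question, that could look like this (a sketch; DISTINCT ON keeps one arbitrary row per branch_code unless you add a tiebreaker to the ORDER BY, and excluded.status is already boolean here, so it can be used directly):
insert into psa_aso_branch (branch_code, branch_desc, regional_code, status, created_date, lastmodified_date)
select distinct on (branch_code)
       branch_code, branch, kode_regional,
       case when status_data = 'Y' then true else false end,
       current_date, current_date
from branch_history
order by branch_code
on conflict (branch_code) do update
set branch_desc = excluded.branch_desc,
    regional_code = excluded.regional_code,
    status = excluded.status,
    created_date = current_date,
    lastmodified_date = current_date;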
This error occurs when the duplication happens multiple times within a single insertion.
For example, suppose you have columns a, b, c, the combination of a and b is unique, and on conflict you update c.
Now suppose you already have a = 1, b = 2, c = 3 and you insert both a = 1, b = 2, c = 4 and a = 1, b = 2, c = 4.
The conflict then occurs twice, so the same row cannot be updated twice (see the runnable version below).
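That scenario as runnable SQL (table and column names invented for illustration):
CREATE TABLE demo (a int, b int, c int, UNIQUE (a, b));

INSERT INTO demo VALUES (1, 2, 3);

-- both proposed rows conflict on (a, b) = (1, 2), so the same target
-- row would have to be updated twice within one command
INSERT INTO demo VALUES (1, 2, 4), (1, 2, 4)
ON CONFLICT (a, b) DO UPDATE SET c = EXCLUDED.c;
-- ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time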
I think what is happening here is that when you do the update on conflict, the update conflicts again and then throws that error.
We can find the error message in the source code, which helps us understand why we get ON CONFLICT DO UPDATE command cannot affect row a second time.
In the PostgreSQL source, at src/backend/executor/nodeModifyTable.c in the function ExecOnConflictUpdate(), we find this comment:
This can occur when a just inserted tuple is updated again in the same command. E.g. because multiple rows with the same conflicting key values are inserted.
This is somewhat similar to the ExecUpdate() TM_SelfModified case. We do not want to proceed because it would lead to the same row being updated a second time in some unspecified order, and in contrast to plain UPDATEs there's no historical behavior to break.
As the comment says, we cannot update a row that we are inserting within the same INSERT ... ON CONFLICT, as in:
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'smart'), (1, 'keyerror')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Remember, the executor of PostgreSQL follows the volcano model, so it processes the rows we insert one by one. When it processes (1, 'smart'), the table is empty, so the insert proceeds normally. When it gets to (1, 'keyerror'), there is a conflict with the (1, 'smart') we just inserted, so the update logic is executed, which would update our own freshly inserted row, and that is something PostgreSQL doesn't allow.
Similarly, we cannot update the same row of data twice:
postgres=# DROP TABLE IF EXISTS t;
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'keyerror'), (1, 'buuuuz')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

Getting Affected Rows by UPDATE statement in RAW plpgsql

This has been asked multiple times here and here, but none of the answers are suitable in my case, because I do not want to execute my update statement in a PL/pgSQL function and use GET DIAGNOSTICS integer_var = ROW_COUNT.
I have to do this in raw SQL.
For instance, in MS SQL Server we have @@ROWCOUNT, which can be used like this:
UPDATE <target_table>
SET Property0 = Value0
WHERE <predicate>;

SELECT <computed_value_columns>
FROM <target_table>
WHERE @@ROWCOUNT > 0;
In one round trip to the database I know whether the update was successful, and I get the calculated values back.
What could be used instead of @@ROWCOUNT?
Can someone confirm that this is in fact impossible at this time ?
Thanks in advance.
EDIT 1: I confirm that I need to use raw SQL (I wrote "raw plpgsql" in the original description).
In an attempt to make my question clearer please consider that the update statement affects only one row and think about optimistic concurrency:
The client did a SELECT Statement at first.
He builds the UPDATE and knows which database-computed columns are to be included in the SELECT clause. Among other things, the predicate includes a timestamp that is recomputed each time the row is updated.
So, if 1 row is returned, everything is OK. If no row is returned, we know that there was a previous update and the client may need to refresh the data before trying the update again. This is why we need to know how many rows were affected by the update statement before returning computed columns. No row should be returned if the update fails.
What you want is not currently possible in the form that you describe, but I think you can do what you want with UPDATE ... RETURNING. See UPDATE ... RETURNING in the manual.
UPDATE <target_table>
SET Property0 = Value0
WHERE <predicate>
RETURNING Property0;
It's hard to be sure, since the example you've provided is so abstract as to be somewhat meaningless.
You can also use a wCTE, which allows more complex cases:
WITH updated_rows AS (
    UPDATE <target_table>
    SET Property0 = Value0
    WHERE <predicate>
    RETURNING row_id, Property0
)
SELECT row_id, some_computed_value_from_property
FROM updated_rows;
See common table expressions (WITH queries) and depesz's article on wCTEs.
UPDATE: based on some added detail in the question, here's a demo using UPDATE ... RETURNING:
CREATE TABLE upret_demo(
    id serial primary key,
    somecol text not null,
    last_updated timestamptz
);
INSERT INTO upret_demo (somecol, last_updated) VALUES ('blah',current_timestamp);
UPDATE upret_demo
SET
    somecol = 'newvalue',
    last_updated = current_timestamp
WHERE last_updated = '2012-12-03 19:36:15.045159+08' -- change to your timestamp
RETURNING
    somecol || '_computed' AS a,
    'totally_new_computed_column' AS b;
Output when run the 1st time:
a | b
-------------------+-----------------------------
newvalue_computed | totally_new_computed_column
(1 row)
When run again, it'll have no effect and return no rows.
If you have more complex calculations to do in the result set, you can use a wCTE so you can JOIN on the results of the update and do other complex things.
WITH upd_row AS (
    UPDATE upret_demo
    SET somecol = 'newvalue',
        last_updated = current_timestamp
    WHERE last_updated = '2012-12-03 19:36:15.045159+08'
    RETURNING id, somecol, last_updated
)
SELECT
    'row_'||id||'_'||somecol||', updated '||last_updated AS calc1,
    repeat('x', 4) AS calc2
FROM upd_row;
In other words: Use UPDATE ... RETURNING, either directly to produce the calculated rows, or in a writeable CTE for more complex cases.
Generally, the answer to this question depends on the type of driver used.
The PQcmdTuples() function does what is needed if the application uses libpq; libraries built on top of libpq need some wrapper around this function.
For JDBC, the Statement.executeUpdate() method seems to do the job.
ODBC provides the SQLRowCount() function for the same purpose.
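If the driver exposes no row count at all, a writable CTE can also return it in plain SQL (a sketch, reusing the upret_demo table from above):
WITH updated AS (
    UPDATE upret_demo
    SET somecol = 'newvalue',
        last_updated = current_timestamp
    WHERE last_updated = '2012-12-03 19:36:15.045159+08'
    RETURNING 1
)
SELECT count(*) AS rows_affected FROM updated;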