I would like to test deadlocks on PostgreSQL 13 using pgAdmin 4, with different lock types and isolation levels.
So far, I have tried opening two pgAdmin tabs and running different transaction blocks like these:
BEGIN;
-- Optional variants I tried (these must be run inside the transaction block to take effect):
--SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
--LOCK stats IN SHARE ROW EXCLUSIVE MODE;
--LOCK stats IN ROW SHARE MODE;
--LOCK stats IN ROW EXCLUSIVE MODE;
UPDATE stats SET clicks = 10 WHERE id = 59;
UPDATE stats SET leads = 10 WHERE id = 60;
UPDATE stats SET calls = 10 WHERE id = 59;
UPDATE stats SET reviews = 10 WHERE id = 60;
UPDATE stats SET saves = 10 WHERE id = 59;
UPDATE stats SET bookings = 10 WHERE id = 60;
COMMIT;
BEGIN;
--SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE stats SET clicks = 10 WHERE vendor_id = 60;
UPDATE stats SET leads = 10 WHERE vendor_id = 59;
UPDATE stats SET calls = 10 WHERE vendor_id = 60;
UPDATE stats SET reviews = 10 WHERE vendor_id = 59;
UPDATE stats SET saves = 10 WHERE vendor_id = 60;
UPDATE stats SET bookings = 10 WHERE vendor_id = 59;
COMMIT;
But to my surprise, the rows are updated perfectly fine regardless of the lock type and isolation level. Reading the documentation, I assume the default table-level lock taken by UPDATE is ROW EXCLUSIVE and the default transaction isolation level is READ COMMITTED.
I guess both transaction blocks are never executed concurrently when running on different pgAdmin tabs. Is this the expected behavior or am I doing something wrong? How could I run both transaction blocks in different threads?
Thanks in advance.
Anyway, to avoid deadlocks I have implemented the different transactions (insert, update) inside a procedure and released the locks with COMMIT after each transaction.
To test it, I have deployed several services with docker-compose that call the procedure. The procedure updates the same table rows and is called repeatedly.
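For reference, the two scripts above most likely never overlap simply because each one runs to completion in a few milliseconds, so the first transaction has already committed before the second tab starts. To actually observe a deadlock, step through the statements by hand, one tab at a time, without committing; a minimal sketch using the same stats table:
-- Tab 1:
BEGIN;
UPDATE stats SET clicks = 10 WHERE id = 59;
-- Tab 2:
BEGIN;
UPDATE stats SET clicks = 10 WHERE id = 60;
-- Tab 1 (blocks, waiting for the row locked by tab 2):
UPDATE stats SET leads = 10 WHERE id = 60;
-- Tab 2 (after the default deadlock_timeout of 1s, PostgreSQL detects the deadlock and aborts one of the two transactions):
UPDATE stats SET leads = 10 WHERE id = 59;
Each pgAdmin tab is its own session, so this manual interleaving is enough; no extra threading is needed.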
Related
I have a financial system where users have tokens and can add transactions. The system has to calculate the balance and mean acquisition price of each token. Data integrity is of utmost importance in the system and it should be impossible to have incorrect balances or mean prices in the system.
To comply with these requirements I've come up with the following tables:
token (to hold each token)
transaction (to hold each transaction of a token)
balance (to hold the token balances without having to calculate each time using all transactions)
The token and transaction tables are straightforward. The balance table is automatically updated by a PostgreSQL trigger to hold each change of balance in a token. This table exists so that every time we need to know something like "What was the balance/mean price of token A on 2023-01-05?" we don't need to sum all transactions and calculate from scratch.
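For example, a point-in-time question like that can then be answered from balance alone; a sketch, assuming the balance columns used in the trigger below and a placeholder :token_a_id:
SELECT amount, mean_price, local_mean_price
FROM balance
WHERE token_id = :token_a_id
  AND date <= '2023-01-05'
ORDER BY date DESC
LIMIT 1;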
Trigger
Enough of explanation, this is the trigger I've come up with. It fires AFTER every INSERT in the transaction table.
DECLARE
old_balance NUMERIC(17, 8);
old_mean_price NUMERIC(17, 8);
old_local_mean_price NUMERIC(17, 8);
new_balance NUMERIC(17, 8);
new_mean_price NUMERIC(17, 8);
new_local_mean_price NUMERIC(17, 8);
BEGIN
-- Prevent the creation of retroactive transactions, since they would mess up the balance table
IF EXISTS (
SELECT * FROM transaction
WHERE
token_id = NEW.token_id
AND date > NEW.date
) THEN
RAISE EXCEPTION 'There is already a newer transaction for token %', NEW.token_id;
END IF;
-- Fetch the latest balance of this token
SELECT
amount,
mean_price,
local_mean_price
INTO
old_balance, old_mean_price, old_local_mean_price
FROM balance
WHERE
token_id = NEW.token_id
AND date <= NEW.date
ORDER BY date DESC
LIMIT 1;
-- If there's no balance in the table then set everything to zero
old_balance := COALESCE(old_balance, 0);
old_mean_price := COALESCE(old_mean_price, 0);
old_local_mean_price := COALESCE(old_local_mean_price, 0);
-- Calculate the new values
IF NEW.side = 'buy' THEN
new_balance := old_balance + NEW.quantity;
new_mean_price := (old_balance * old_mean_price + NEW.quantity * NEW.unit_price) / new_balance;
new_local_mean_price := (old_balance * old_local_mean_price + NEW.quantity * NEW.local_unit_price) / new_balance;
ELSIF NEW.side = 'sell' THEN
new_balance := old_balance - NEW.quantity;
new_mean_price := old_mean_price;
new_local_mean_price := old_local_mean_price;
ELSE
RAISE EXCEPTION 'Side is invalid %', NEW.side;
END IF;
-- Update the balance table
IF NOT EXISTS (
SELECT * FROM balance
WHERE
date = NEW.date
AND token_id = NEW.token_id
) THEN
-- Create a row in the balance table
INSERT INTO balance
(date, token_id, amount, mean_price, local_mean_price)
VALUES
(
NEW.date,
NEW.token_id,
new_balance,
new_mean_price,
new_local_mean_price
);
ELSE
-- There's already a row for this token and date in the balance table. We should update it.
UPDATE balance
SET
amount = new_balance,
mean_price = new_mean_price,
local_mean_price = new_local_mean_price
WHERE
date = NEW.date
AND token_id = NEW.token_id;
END IF;
RETURN NULL;
END;
This trigger does some things:
Prevents the insertion of retroactive transactions, since this means we would have to update all the following balances
Adds a new row to the balance table with the updated balance and mean prices of the token
Or updates the existing row in balance if one already exists with the same datetime
Race condition
This works fine, but it has a race condition when executing 2 concurrent transactions. Imagine the following scenario:
Start T1 using BEGIN
Start T2 using BEGIN
T1 inserts a row in the transaction table
The trigger is fired inside T1 and it inserts a row in balance
T2 inserts a row in the transaction table
The trigger is fired inside T2, but it cannot see the changes made by the T1 trigger since T1 has not committed yet
The balance created by T2 is incorrect because it used stale data
Imperfect solution 1
Maybe I could change the SELECT statement in the trigger (the one that selects the previous balance) to use SELECT ... FOR UPDATE. This way the trigger blocks until the concurrent transaction has committed. This doesn't work because of three things:
If it's the first transaction for a token, then the balance table doesn't yet have a row to lock for that token (this could be solved by locking the token table instead)
Even if we lock and wait for the concurrent transaction to commit, because of the way transactions work in PostgreSQL we would still fetch stale data: inside a transaction we only see the data from our snapshot, and a balance row inserted by the other transaction is not part of it.
Even if we managed to get the most up-to-date information, there's still the issue that T1 can roll back, which means the balance generated in T2 would still be incorrect
Imperfect solution 2
Another solution would be to scrap the FOR UPDATE and just defer the trigger execution to the transaction commit. This solves the race condition since the trigger is executed at commit time and sees the most recent changes. The only issue is that it leaves me unable to use the balance table inside the transaction (since it will only be updated when the transaction commits).
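For reference, deferring the trigger to commit time means declaring it as a deferrable constraint trigger, roughly like this (a sketch, assuming the trigger function is named update_balance()):
CREATE CONSTRAINT TRIGGER update_balance_after_insert
AFTER INSERT ON transaction
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE FUNCTION update_balance();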
Question
I have two questions regarding this:
Does the Imperfect solution 2 really solve all the race condition problems, or am I missing something?
Is there a way to solve this problem and also update the balance table ASAP?
Your solution 2 only narrows the race condition, but does not fix it. Both transactions could commit at the same time.
There are only two ways to prevent such a race condition:
use the SERIALIZABLE transaction isolation level (you can set that as the default with the parameter default_transaction_isolation)
lock whatever is necessary to prevent concurrent operations (for example, the corresponding row in balance)
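A minimal sketch of the second option inside the trigger: since the corresponding balance row may not exist yet for a new token (as the question notes), lock the parent row in token before reading the previous balance, which serializes concurrent inserts for the same token. This assumes token has a primary key column named id:
-- at the top of the trigger body, before fetching the previous balance
PERFORM 1
FROM token
WHERE id = NEW.token_id
FOR UPDATE;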
Besides, your code can be improved: You should check for the existence of a balance only once, and you could use INSERT ... ON CONFLICT.
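A sketch of that upsert, assuming a unique constraint on (token_id, date) in balance:
INSERT INTO balance (date, token_id, amount, mean_price, local_mean_price)
VALUES (NEW.date, NEW.token_id, new_balance, new_mean_price, new_local_mean_price)
ON CONFLICT (token_id, date) DO UPDATE
SET amount           = EXCLUDED.amount,
    mean_price       = EXCLUDED.mean_price,
    local_mean_price = EXCLUDED.local_mean_price;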
You could read my article for a more detailed analysis.
You could also consider either creating an extra table containing the running transactions and throwing an error if an insert into it is not possible, or simply locking the relevant balance rows and forcing the transactions to run fully sequentially that way. Either way you force conflicting statements to run one at a time, thus resolving the race condition.
I have a requirement to read 50K records from one database and then insert or update those records into another database. The read takes a couple of seconds, but the inserts/updates for those 50K records take up to 23 minutes even with multithreading. I have been playing with page, fetch and batch sizes, but the performance isn't improving much.
Is there a way to implement this kind of statement in a JdbcItemWriter?
-- hv_nrows = 3
-- hv_activity(1) = 'D'; hv_description(1) = 'Dance'; hv_date(1) = '03/01/07'
-- hv_activity(2) = 'S'; hv_description(2) = 'Singing'; hv_date(2) = '03/17/07'
-- hv_activity(3) = 'T'; hv_description(3) = 'Tai-chi'; hv_date(3) = '05/01/07'
-- hv_group = 'A';
-- note that hv_group is not an array; all 3 rows use the same value
MERGE INTO RECORDS AR
USING (VALUES (:hv_activity, :hv_description, :hv_date, :hv_group)
FOR :hv_nrows ROWS)
AS AC (ACTIVITY, DESCRIPTION, DATE, GROUP)
ON AR.ACTIVITY = AC.ACTIVITY AND AR.GROUP = AC.GROUP
WHEN MATCHED
THEN UPDATE SET (DESCRIPTION, DATE, LAST_MODIFIED)
= (AC.DESCRIPTION, AC.DATE, CURRENT TIMESTAMP)
WHEN NOT MATCHED
THEN INSERT (GROUP, ACTIVITY, DESCRIPTION, DATE, LAST_MODIFIED)
VALUES (AC.GROUP, AC.ACTIVITY, AC.DESCRIPTION, AC.DATE, CURRENT TIMESTAMP)
NOT ATOMIC CONTINUE ON SQLEXCEPTION;
My idea is to send a bunch of rows to be merged at once and see if the performance improves.
I tried something like this:
MERGE INTO TEST.DEST_TABLE A
USING (
  VALUES
  ('00000031955190','0107737793'),
  ('00000118659978','0107828212'),
  ('00000118978436','0095878120'),
  ('00000122944473','0106845043')
) AS B(CDFILIAC, CDRFC)
ON A.CDFILIAC = B.CDFILIAC
WHEN MATCHED THEN
  UPDATE SET
    A.CDRFC = B.CDRFC
And while it works for DB2 LUW, it doesn't for DB2 ZOS.
Is there a way to implement this kind of statement in a JdbcItemWriter?
The JdbcBatchItemWriter delegates to org.springframework.jdbc.core.namedparam.NamedParameterJdbcOperations. So if NamedParameterJdbcOperations from Spring Framework supports this kind of query, Spring Batch does too. Otherwise, you would need to create a custom writer.
while it works for DB2 LUW, it doesn't for DB2 ZOS.
Spring Batch cannot help at this level, as this seems to be DB specific.
Below are the steps I followed to test SKIP LOCKED:
Open an SQL console in some Postgres UI client
Connect to the Postgres DB
Execute the queries
CREATE TABLE t_demo AS
SELECT *
FROM generate_series(1, 4) AS id;
Check that rows were created in that table:
TABLE t_demo
Select rows using the query below:
SELECT *
FROM t_demo
WHERE id = 2
FOR UPDATE SKIP LOCKED;
It returns the row with id 2.
Now execute the above query again:
SELECT *
FROM t_demo
WHERE id = 2
FOR UPDATE SKIP LOCKED;
This second query should not return any results, but it also returns the row with id 2.
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-FOR-UPDATE-SHARE
To prevent the operation from waiting for other transactions to
commit, use either the NOWAIT or SKIP LOCKED option
(emphasis mine)
If you run both queries in one window, you probably either run them in one transaction (then your next statement is not another transaction), or you autocommit after each statement (the default), in which case the first statement's transaction commits before the second one starts, so the lock is already released and you observe no effect.
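To see SKIP LOCKED take effect, you need two separate sessions, with the first transaction still open while the second query runs; a sketch using the t_demo table from above:
-- Session 1: take the lock and leave the transaction open
BEGIN;
SELECT * FROM t_demo WHERE id = 2 FOR UPDATE SKIP LOCKED;
-- Session 2: while session 1 is still open, the locked row is skipped
BEGIN;
SELECT * FROM t_demo WHERE id = 2 FOR UPDATE SKIP LOCKED;  -- returns no rows
COMMIT;
-- Session 1: release the lock
COMMIT;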
I am new to Postgres so this may be obvious (or very difficult, I am not sure).
I would like to force a table or row to be "locked" for at least a few seconds at a time, which will cause a second operation to "wait".
I am using golang with "github.com/lib/pq" to interact with the database.
The reason I need this is that I am working on a project that monitors PostgreSQL. Thanks for any help.
You can also use select ... for update to lock a row or rows for the length of the transaction.
Basically, it's like:
begin;
select * from foo where quatloos = 100 for update;
update foo set feens = feens + 1 where quatloos = 100;
commit;
This takes an exclusive row-level lock on the rows of foo where quatloos = 100. Once the SELECT ... FOR UPDATE has run, any other transaction attempting to update or lock those rows will block until COMMIT or ROLLBACK is issued.
Ideally, these locks should be held as briefly as possible.
See: https://www.postgresql.org/docs/current/static/explicit-locking.html
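If the goal is just to hold a row lock for a few seconds so that another session visibly waits, one sketch (reusing the foo example above) is to keep the transaction open around pg_sleep():
BEGIN;
SELECT * FROM foo WHERE quatloos = 100 FOR UPDATE;  -- take the row lock
SELECT pg_sleep(5);                                 -- hold it for about 5 seconds
COMMIT;                                             -- release the lock
Any concurrent UPDATE, DELETE or SELECT ... FOR UPDATE on those rows will block until the COMMIT; plain reads are not blocked.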
I have big stored procedures that handle user actions.
They consist of multiple SELECT statements. These are filtered, most of the time returning only one row. The SELECT results are copied into temp tables or otherwise evaluated.
Finally, a MERGE statement makes the needed changes in the database.
All of this is encapsulated in a transaction.
I have concurrent input from users, and the rows returned by the SELECT statements should be locked to preserve data integrity.
How can I lock the selected rows of all SELECT statements, so that they aren't updated by other transactions while the current transaction is in progress?
Does a table hint combination of ROWLOCK and HOLDLOCK work in a way that only the selected rows are locked, or are the whole tables locked because of the HOLDLOCK?
SELECT *
FROM dbo.Test
WITH (ROWLOCK, HOLDLOCK)
WHERE id = @testId
Can I instead use
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
right after the start of the transaction? Or does this lock the whole tables?
I am using SQL2008 R2, but would also be interested if things work differently in SQL2012.
PS: I just read about the table hints UPDLOCK and SERIALIZABLE. UPDLOCK seems to be a way to lock only the selected rows, and unlike ROWLOCK (which only specifies that any locks taken are row-level) it always takes a lock. I am still confused about the best way to solve this...
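For reference, a common pattern for reading rows and keeping just those rows locked until the end of the transaction is to combine UPDLOCK with HOLDLOCK; a sketch reusing the dbo.Test query above, with @testId as a placeholder variable:
BEGIN TRAN;
SELECT *
FROM dbo.Test WITH (UPDLOCK, HOLDLOCK)
WHERE id = @testId;
-- ... evaluate, then MERGE ...
COMMIT TRAN;
UPDLOCK keeps concurrent writers (and other UPDLOCK readers) away from the selected rows until the transaction ends, while HOLDLOCK adds serializable range semantics for the WHERE predicate.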
Changing the isolation level fixed the problem (and locked on row level):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Here is how I tested it.
I created a statement in a blank query window of SQL Server Management Studio:
begin tran
select
*
into #message
from dbo.MessageBody
where MessageBody.headerId = 28
WAITFOR DELAY '0:00:05'
update dbo.MessageBody set [message] = 'message1'
where headerId = (select headerId from #message)
select * from dbo.MessageBody where headerId = (select headerId from #message)
drop table #message
commit tran
While executing this statement (which takes at least 5 seconds due to the delay), I ran the second query in another window:
begin tran
select
*
into #message
from dbo.MessageBody
where MessageBody.headerId = 28
update dbo.MessageBody set [message] = 'message2'
where headerId = (select headerId from #message)
select * from dbo.MessageBody where headerId = (select headerId from #message)
drop table #message
commit tran
and I was rather surprised that it executed instantaneously. This was due to the default SQL Server transaction isolation level, READ COMMITTED: http://technet.microsoft.com/en-us/library/ms173763.aspx . Since the update of the first script is done after the delay, during the second script there are no uncommitted changes yet, so row 28 is read and updated.
Changing the isolation level to SERIALIZABLE prevented this, but it also prevented concurrency: both scripts were executed consecutively.
That was OK, since both scripts read and changed the same row (via headerId = 28). After changing headerId to another value in the second script, the statements were executed in parallel. So the lock taken under SERIALIZABLE seems to be at row level.
Adding the table hint WITH (SERIALIZABLE) to the first SELECT of the first statement also prevents further reads of the selected row.