PostgreSQL concurrent check if row exists in table

Suppose I have some simple logic:
If the user has had no balance accrual before (which is recorded in the accruals table), we must credit $100 to their balance:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
BEGIN
    --LOCK TABLE accruals; -- label A
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
The problem with this transaction is that it is not concurrency-safe.
Running this transaction in parallel results in user_id = 1 ending up with balance = 200 and 2 accruals recorded.
How do I test the concurrency?
1. I run in session 1: START TRANSACTION; LOCK TABLE accruals;
2. In session 2 and session 3 I run this transaction
3. In session 1: ROLLBACK
The question is: how do I make this safe under concurrency and make sure the user is credited the $100 only once?
The only way I see is to lock the table (label A in the code sample).
But is there another way?

The simplest way is probably to use the serializable isolation level (by changing default_transaction_isolation). Then one of the concurrent processes should fail with something like "ERROR: could not serialize access due to concurrent update".
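For example, a minimal sketch of the same block run at SERIALIZABLE isolation (the caller is expected to retry on a serialization failure, SQLSTATE 40001):
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- or set it globally: SET default_transaction_isolation = 'serializable';
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
BEGIN
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
-- When two of these run concurrently, one of them fails with
-- "could not serialize access ..." instead of crediting the user twice.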
If you want to keep the isolation level at 'read committed', then you can just count accruals at the end and throw an error:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
    _count int;
BEGIN
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
    SELECT count(*) INTO _count FROM accruals WHERE user_id = 1;
    IF _count > 1 THEN
        RAISE SQLSTATE '22023';
    END IF;
END
$$;
COMMIT;
This works because one process will block the other on the UPDATE (assuming a non-zero number of rows is updated), and by the time one process commits and releases the blocked process, its inserted row will be visible to the other one.
Formally there is then no need for the first check, but if you don't want a lot of churn due to rolled back INSERT and UPDATE, you might want to keep it.

Related

PostgreSQL read commit on different transactions

I was running some tests to better understand read committed in PostgreSQL.
I have two transactions running in parallel:
-- transaction 1
begin;
select id from item order by id asc FETCH FIRST 500 ROWS ONLY;
select pg_sleep(10);
commit;
--transaction 2
begin;
select id from item order by id asc FETCH FIRST 500 ROWS ONLY;
commit;
The first transaction selects the first 500 ids and then holds them by sleeping for 10 seconds.
The second transaction meanwhile queries for the first 500 rows in the table.
Based on my understanding of read committed, the first transaction should select records 1 to 500 and the second transaction should select records 501 to 1000.
But the actual result is that both transactions select records 1 to 500.
I would really appreciate it if someone could point out which part of my understanding is wrong. Thanks
You are misinterpreting the meaning of read committed. It means that a transaction cannot see (select) updates that are not committed. Try the following:
create table read_create_test( id integer generated always as identity
, cola text
) ;
insert into read_create_test(cola)
select 'item-' || to_char(n,'fm000')
from generate_series(1,50) gs(n);
-- transaction 1
do $$
declare
max_val integer;
begin
insert into read_create_test(cola)
select 'item2-' || to_char(n+100,'fm000')
from generate_series(1,50) gs(n);
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 1 Max id: %',max_val;
perform pg_sleep(30); -- make sure 2nd transaction has time to start
commit; -- transaction control inside a DO block requires PostgreSQL 11+
end;
$$;
-- transaction 2 (run after transaction 1 begins but before it ends)
do $$
declare
max_val integer;
begin
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 2 Max id: %',max_val;
end;
$$;
-- transaction 3 (run after transaction 1 ends)
do $$
declare
max_val integer;
begin
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 3 Max id: %',max_val;
end;
$$;
Analyze the results keeping in mind that a transaction cannot see uncommitted DML.
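For reference, a sketch of the notices you should see, assuming a freshly created table so the identity column has assigned ids 1 through 100 in order:
-- Transaction 1 Max id: 100   (a transaction does see its own uncommitted inserts)
-- Transaction 2 Max id: 50    (run during the sleep; transaction 1 has not committed yet)
-- Transaction 3 Max id: 100   (run after transaction 1 commits; the new rows are now visible)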

Optimistic Locking with PostgreSQL

Problem statement: I am using the Repository pattern to pull and update records from a PostgreSQL DB (version 11) into a Node.JS API via the pg npm module. If two users try to modify the same record at almost the same time, the changes made by the first user to submit will be overwritten by the second user to submit. I want to prevent the second user's changes from being submitted until they have updated their local copy of the record with the first user's changes.
I know that a DB like CouchDB has a "_rev" property that it uses in this case to detect an attempt to update from a stale snapshot. As I've researched more, I've found this is called optimistic locking (Optimistic locking queue). So I'll add a rev column to my table and use it in my SQL update statement.
UPDATE tableX
SET field1 = value1,
field2 = value2,
...,
rev_field = uuid_generate_v4()
WHERE id_field = id_value
AND rev_field = rev_value
However, if id_value is a match and rev_value isn't, that won't tell my repository code that the record was stale, only that 0 rows were affected by the query.
So I've got a script that I've written in pgAdmin that will detect cases where the update affected 0 rows and then checks the rev_field.
DO $$
DECLARE
i_id numeric;
i_uuid uuid;
v_count numeric;
v_rev uuid;
BEGIN
i_id := 1;
i_uuid := '20b2e135-42d0-4a49-94c0-5557dd09abd1';
UPDATE account_r
SET account_name = 'savings',
rev = uuid_generate_v4()
WHERE account_id = i_id
AND rev = i_uuid;
GET DIAGNOSTICS v_count = ROW_COUNT;
IF v_count < 1 THEN
SELECT rev INTO v_rev
FROM account_r
WHERE account_id = i_id;
IF v_rev <> i_uuid THEN
RAISE EXCEPTION 'revision mismatch';
END IF;
END IF;
RAISE NOTICE 'rows affected: %', v_count;
END $$;
While I'm perfectly comfortable adapting this code into a stored proc and calling that from Node, I'm hoping that there's a solution to this that's not nearly as complex. On the one hand, moving these functions to the DB will clean up my JS code, on the other hand, this is a lot of boilerplate SQL to write, since it will have to be done for UPDATE and DELETE for each table.
Is there an easier way to get this done? (Perhaps the code at Optimistic locking queue is the easier way?) Should I be looking at an ORM to help reduce the headache here?
There is no need to maintain a rev value. You can get the md5 hash of a table's row.
SQL Fiddle Here
create table mytable (
id int primary key,
some_text text,
some_int int,
some_date timestamptz
);
insert into mytable
values (1, 'First entry', 0, now() - interval '1 day'),
(2, 'Second entry', 1, now()),
(3, 'Third entry', 2, now() + interval '1 day');
select *, md5(mytable::text) from mytable order by id;
The fiddle includes other queries to demonstrate that the calculated md5() is based on the values of the row.
Using that hash for optimistic locking, the updates can take the form:
update mytable
set some_int = -1
where id = 1
and md5(mytable::text) = <md5 hash from select>
returning *
You will still need to check for no return rows, but that could be abstracted away on the Node side.
It looks like result.row_count contains the number of rows affected, so you will not need the returning * part.
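Putting it together, a rough sketch of the full round trip in plain SQL (using the mytable example above; 'hash-from-step-1' stands for whatever value the first query returned):
-- 1. Read the row along with its current hash.
select id, some_text, some_int, some_date, md5(mytable::text) as rev
from mytable
where id = 1;

-- 2. Update only if the row still hashes to the value read in step 1;
--    the WHERE clause is evaluated against the row as it currently exists.
update mytable
set some_int = -1
where id = 1
  and md5(mytable::text) = 'hash-from-step-1'
returning *;

-- An affected-row count of 0 means the row was changed (or deleted) by someone else,
-- so the caller should re-read and retry.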

How to delete many rows from frequently accessed table

I need to delete the majority (say, 90%) of a very large table (say, 5m rows). The other 10% of this table is frequently read, but not written to.
From "Best way to delete millions of rows by ID", I gather that I should remove any index on the 90% I'm deleting, to speed up the process (except an index I'm using to select the rows for deletion).
From "PostgreSQL locking mode", I see that this operation will acquire a ROW EXCLUSIVE lock on the entire table. But since I'm only reading the other 10%, this ought not matter.
So, is it safe to delete everything in one command (i.e. DELETE FROM table WHERE delete_flag='t')? I'm worried that if the deletion of one row fails, triggering an enormous rollback, then it will affect my ability to read from the table. Would it be wiser to delete in batches?
Indexes are typically useless for operations on 90% of all rows. Sequential scans will be faster either way. (Exotic exceptions apply.)
If you need to allow concurrent reads, you cannot take an exclusive lock on the table. So you also cannot drop any indexes in the same transaction.
You could drop indexes in separate transactions to keep the duration of the exclusive lock at a minimum. In Postgres 9.2 or later you can also use DROP INDEX CONCURRENTLY, which only needs minimal locks. Later use CREATE INDEX CONCURRENTLY to rebuild the index in the background - and only take a very brief exclusive lock.
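A rough sketch of that sequence (index, table, and column names here are placeholders):
-- Drop indexes that are useless for the DELETE, one per transaction,
-- without blocking concurrent readers (PostgreSQL 9.2+):
DROP INDEX CONCURRENTLY some_index_on_tbl;

-- ... run the big DELETE ...

-- Rebuild the index in the background afterwards:
CREATE INDEX CONCURRENTLY some_index_on_tbl ON tbl (some_col);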
If you have a stable condition to identify the 10 % (or less) of rows that stay, I would suggest a partial index on just those rows to get the best for both:
Reading queries can access the table quickly (using the partial index) at all times.
The big DELETE is not going to modify the partial index at all, since none of the rows it covers are involved in the DELETE.
CREATE INDEX foo ON tbl (some_id) WHERE delete_flag = FALSE;
Assuming delete_flag is boolean. You have to include the same predicate in your queries (even if it seems logically redundant) to make sure Postgres can use the partial index.
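For example, a query against the surviving rows would repeat the predicate so the planner can match it to the partial index (column names are illustrative):
SELECT *
FROM   tbl
WHERE  some_id = 123
AND    delete_flag = FALSE;  -- same predicate as in the partial index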
Delete in batches of a specific size and sleep between deletes:
create temp table t as
select id from tbl where ...;
create index on t(id);
do $$
declare sleep int = 5;
declare batch_size int = 10000;
declare c refcursor;
declare cur_id int = 0;
declare seq_id int = 0;
declare del_id int = 0;
declare ts timestamp;
begin
<<top>>
loop
raise notice 'sleep % sec', sleep;
perform pg_sleep(sleep);
raise notice 'continue..';
open c for select id from t order by id;
<<inn>>
loop
fetch from c into cur_id;
seq_id = seq_id + 1;
del_id = del_id + 1;
if cur_id is null then
raise notice 'goin to del end: %', del_id;
ts = current_timestamp;
close c;
delete from tbl tb using t where tb.id = t.id;
delete from t;
commit;
raise notice 'ok: %s', current_timestamp - ts;
exit top;
elsif seq_id >= batch_size then
raise notice 'goin to del: %', del_id;
ts = current_timestamp;
delete from tbl tb using t where t.id = tb.id and t.id <= cur_id;
delete from t where id <= cur_id;
close c;
commit;
raise notice 'ok: %s', current_timestamp - ts;
seq_id = 0;
exit inn;
end if;
end loop inn;
end loop top;
end;
$$;

Postgres concurrency issue

I wrote the following trigger to guarantee that the field 'filesequence' always receives the maximum value + 1 on insert, per stakeholder.
CREATE OR REPLACE FUNCTION update_filesequence()
RETURNS TRIGGER AS '
DECLARE
lastSequence file.filesequence%TYPE;
BEGIN
IF (NEW.filesequence IS NULL) THEN
PERFORM ''SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE'';
SELECT max(filesequence) INTO lastSequence FROM file WHERE stakeholder = NEW.stakeholder;
IF (lastSequence IS NULL) THEN
lastSequence = 0;
END IF;
lastSequence = lastSequence + 1;
NEW.filesequence = lastSequence;
END IF;
RETURN NEW;
END;
' LANGUAGE 'plpgsql';
CREATE TRIGGER file_update_filesequence BEFORE INSERT
ON file FOR EACH ROW EXECUTE PROCEDURE
update_filesequence();
But I see repeated 'filesequence' values in the database:
select id, filesequence, stakeholder from file where stakeholder=5273;
id filesequence stakeholder
6773 5 5273
6774 5 5273
From my understanding, the SELECT ... FOR UPDATE should block the second of two concurrent transactions on the same stakeholder, and the second one would then read the new 'filesequence'. But it is not working.
I made some tests on PgAdmin, executing the following:
BEGIN;
select id from stakeholder where id = 5273 FOR UPDATE;
And it really did block other rows being inserted for the same stakeholder, so the lock itself seems to be working.
But when I run the application with concurrent uploads, I still see the values repeating.
Could someone help me find what is wrong with my trigger?
Thanks,
Douglas.
Your idea is right. To get an autoincrement based on another field (say, one designating a group) you cannot use a sequence, so you have to lock the rows of that group before incrementing.
The logic of your trigger function does that, but you have misunderstood the PERFORM statement. It is meant to be used in place of the SELECT keyword, so it does not take a string as a parameter. That means that when you do:
PERFORM 'SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE';
The PL/pgSQL is actually executing:
SELECT 'SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE';
And ignoring the result.
What you have to do on this line is:
PERFORM id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE;
That is it: change only this line and you are done.
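For completeness, a sketch of the trigger function with just that line changed (and switched to dollar quoting; everything else is as in the question):
CREATE OR REPLACE FUNCTION update_filesequence()
RETURNS TRIGGER AS $$
DECLARE
    lastSequence file.filesequence%TYPE;
BEGIN
    IF (NEW.filesequence IS NULL) THEN
        -- Lock the parent row so concurrent inserts for the same stakeholder serialize here.
        PERFORM id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE;
        SELECT max(filesequence) INTO lastSequence FROM file WHERE stakeholder = NEW.stakeholder;
        IF (lastSequence IS NULL) THEN
            lastSequence = 0;
        END IF;
        lastSequence = lastSequence + 1;
        NEW.filesequence = lastSequence;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;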

count number of rows to be affected before update in trigger

I want to know the number of rows that will be affected by an UPDATE query in a BEFORE per-statement trigger. Is that possible?
The problem is that I want to allow only queries that will update up to 4 rows. If the affected row count is 5 or more, I want to raise an error.
I don't want to do this in application code because I need this check at the DB level.
Is this at all possible?
Thanks in advance for any clues.
Write a function that performs the update for you or rolls it back. Sorry for the poor formatting.
create function update_max(varchar, int)
RETURNS void AS
$BODY$
DECLARE
    sql ALIAS FOR $1;
    max ALIAS FOR $2;
    rcount INT;
BEGIN
    EXECUTE sql;
    GET DIAGNOSTICS rcount = ROW_COUNT;
    IF rcount > max THEN
        --ROLLBACK;
        RAISE EXCEPTION 'Too many rows affected (%).', rcount;
    END IF;
    --COMMIT;
END;
$BODY$ LANGUAGE plpgsql;
Then call it like
select update_max('update t1 set id=id+10 where id < 4', 3);
where the first parameter is your SQL statement and the second your maximum row count.
Simon had a good idea but his implementation is unnecessarily complicated. This is my proposition:
create or replace function trg_check_max_4()
returns trigger as $$
begin
perform true from pg_class
where relname='check_max_4' and relnamespace=pg_my_temp_schema();
if not FOUND then
create temporary table check_max_4
(value int check (value<=4))
on commit drop;
insert into check_max_4 values (0);
end if;
update check_max_4 set value=value+1;
return new;
end; $$ language plpgsql;
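The answer above only shows the trigger function; to use it you would attach it as a row-level trigger on the table you want to protect, e.g. (the table name is a placeholder):
create trigger trg_check_max_4 before update on your_table
for each row execute procedure trg_check_max_4();
The check constraint on the temporary table then aborts the transaction on the fifth updated row.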
I've created something like this:
begin;
create table test (
id integer
);
insert into test(id) select generate_series(1,100);
create or replace function trg_check_max_4_updated_records()
returns trigger as $$
declare
counter_ integer := 0;
tablename_ text := 'temptable';
begin
raise notice 'trigger fired';
select count(42) into counter_
from pg_catalog.pg_tables where tablename = tablename_;
if counter_ = 0 then
raise notice 'Creating table %', tablename_;
execute 'create temporary table ' || tablename_ || ' (counter integer) on commit drop';
execute 'insert into ' || tablename_ || ' (counter) values(1)';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
else
execute 'select counter from ' || tablename_ into counter_;
execute 'update ' || tablename_ || ' set counter = counter + 1';
raise notice 'updating';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
if counter_ > 4 then
raise exception 'Cannot change more than 4 rows in one transaction';
end if;
end if;
return new;
end; $$ language plpgsql;
create trigger trg_bu_test before
update on test
for each row
execute procedure trg_check_max_4_updated_records();
update test set id = 10 where id <= 1;
update test set id = 10 where id <= 2;
update test set id = 10 where id <= 3;
update test set id = 10 where id <= 4;
update test set id = 10 where id <= 5;
rollback;
The main idea is to have a trigger on 'before update for each row' that creates (if necessary) a temporary table (which is dropped at the end of the transaction). In this table there is just one row with one value: the number of rows updated in the current transaction. For each updated row the value is incremented. If the value is bigger than 4, the transaction is stopped.
But I think this is the wrong solution for your problem. What stops someone from running the bad query you describe twice, changing 8 rows in total? And what about deleting rows, or truncating the table?
PostgreSQL has two types of triggers: row and statement triggers. Row triggers only work within the context of a row so you can't use those. Unfortunately, "before" statement triggers don't see what kind of change is about to take place so I don't believe you can use those, either.
Based on that, I would say it's unlikely you'll be able to build that kind of protection into the database using triggers, not unless you don't mind using an "after" trigger and rolling back the transaction if the condition isn't satisfied. Wouldn't mind being proved wrong. :)
Have a look at using Serializable Isolation Level. I believe this will give you a consistent view of the database data within your transaction. Then you can use option #1 that MusiGenesis mentioned, without the timing vulnerability. Test it of course to validate.
I've never worked with postgresql, so my answer may not apply. In SQL Server, your trigger can call a stored procedure which would do one of two things:
1. Perform a SELECT COUNT(*) to determine the number of records that will be affected by the UPDATE, and then only execute the UPDATE if the count is 4 or less
2. Perform the UPDATE within a transaction, and only commit the transaction if the returned number of rows affected is 4 or less
No. 1 is timing-vulnerable (the number of records affected by the UPDATE may change between the COUNT(*) check and the actual UPDATE). No. 2 is pretty inefficient if there are many cases where the number of rows updated is greater than 4.