Locking in a Postgres function

Let's say I have a transactions table and a transaction_summary table. I have created the following trigger to update the transaction_summary table.
CREATE OR REPLACE FUNCTION doSomeThing() RETURNS TRIGGER AS
$BODY$
DECLARE
    rec_cnt bigint;
BEGIN
    -- lock rows which have to be updated
    SELECT count(1) INTO rec_cnt
    FROM (SELECT 1 FROM transaction_summary WHERE receiver = new.receiver FOR UPDATE) r;

    IF rec_cnt = 0 THEN
        -- if there are no rows then create a new entry in the summary table;
        -- lock the whole table first
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE receiver = new.receiver;
    END IF;

    SELECT count(1) INTO rec_cnt
    FROM (SELECT 1 FROM transaction_summary WHERE sender = new.sender FOR UPDATE) r;

    IF rec_cnt = 0 THEN
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE sender = new.sender;
    END IF;

    RETURN new;
END;
$BODY$
LANGUAGE plpgsql;
Question: will there be a deadlock? As I understand it, a deadlock might happen like this:
_________
|__table__| <- executor #1 waits on executor #2 to be able to lock the whole table AND
|_________| executor #2 waits on executor #1 to be able to lock the whole table
|_________|
|_________| <- row is locked by executor #1
|_________|
|_________| <- row is locked by executor #2
It seems that the only option is to lock the whole table at the beginning of every transaction.

Notice that those two SELECT ... FOR UPDATE queries can at least theoretically deadlock each other if two database transactions insert two transactions rows where the sender of one is the receiver of the other and vice versa (new.sender1 = new.receiver2 and new.receiver1 = new.sender2).
You can't, in general, guarantee that you won't get a deadlock from a database. Even if you try to prevent them by writing your queries carefully (e.g. by ordering updates), you can still get caught out because you can't control the order of INSERTs and UPDATEs, or of constraint checks. In any case, comparing every transaction against every other one to check for deadlocks doesn't scale as your application grows.
So your code should always be prepared to re-run a transaction when it gets a 'deadlock detected' error. If you do that, and you think conflicting transactions will be uncommon, you might as well let your deadlock-handling code deal with it.
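As a rough plpgsql sketch of trapping that error condition (the column name and receiver value are placeholders); in practice the retry usually happens client-side by re-running the whole transaction:
DO $$
BEGIN
    FOR attempt IN 1..3 LOOP
        BEGIN
            -- placeholder statement standing in for the real summary update
            UPDATE transaction_summary SET total = total + 1 WHERE receiver = 42;
            EXIT;  -- success, stop retrying
        EXCEPTION
            WHEN deadlock_detected THEN  -- SQLSTATE 40P01
                RAISE NOTICE 'deadlock on attempt %, retrying', attempt;
        END;
    END LOOP;
END;
$$;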
If you think deadlocks will be common, they might cause you a performance problem - although contending on one big table lock could, too. Here are some options:
If new.receiver and new.sender are, for example, the IDs of rows in a MyUsers table, you could require all code which inserts into transaction_summary to first do SELECT 1 FROM MyUsers WHERE id IN (user1, user2) FOR UPDATE. It'll break if someone forgets, but so will your table locking. By doing it that way you swap one big table lock for many separate row locks.
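A minimal sketch of that option, assuming MyUsers has an integer id column (the ids 42 and 99 stand in for sender and receiver):
BEGIN;
-- lock both parties' user rows in a consistent order, so two sessions touching
-- the same pair cannot grab them in opposite orders
SELECT 1 FROM MyUsers WHERE id IN (42, 99) ORDER BY id FOR UPDATE;
-- ... now insert into or update transaction_summary for those users ...
COMMIT;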
Add UNIQUE constraints to transaction_summary and look for the error when they're violated. You should probably add the constraints anyway, even if you handle this another way; they'll detect bugs.
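For example (the summary columns are assumptions, since the original statements are elided):
ALTER TABLE transaction_summary
    ADD CONSTRAINT transaction_summary_receiver_key UNIQUE (receiver);
-- a racing duplicate INSERT now fails with SQLSTATE 23505 (unique_violation),
-- which the trigger or the client can catch and turn into an UPDATE

-- on PostgreSQL 9.5 and later the same constraint also allows an atomic upsert,
-- which removes the count-then-INSERT/UPDATE race entirely:
INSERT INTO transaction_summary (receiver, total)
VALUES (42, 100)
ON CONFLICT (receiver) DO UPDATE
    SET total = transaction_summary.total + EXCLUDED.total;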
You could allow duplicate transaction_summary rows and require readers of that table to add them up. Messy, and easy for developers who don't know about it to introduce bugs (though you could add a view which does the adding up). But if you really can't take the performance hit of locking and deadlocks, you could do it.
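If you go that route, a view can hide the adding-up from readers (column names are again assumptions):
CREATE VIEW transaction_summary_totals AS
SELECT receiver, sum(total) AS total
FROM transaction_summary
GROUP BY receiver;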
You could try the SERIALIZABLE transaction isolation level and take out the table locks. By my reading, the SELECT ... FOR UPDATE should create a predicate lock (and so should a plain SELECT). That'd stop any other transaction that does a conflicting insert from committing successfully. However, using SERIALIZABLE throughout your application will cost you performance and give you a lot more transactions to retry.
Here's how the SERIALIZABLE transaction isolation level works:
create table test (id serial, x integer, total integer); ...
Transaction 1:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 100 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
 id | x | total
----+---+-------
  1 | 3 |   100
(1 row)
DB=# commit;
COMMIT
Transaction 2, interleaved line for line with the first:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 200 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
 id | x | total
----+---+-------
  2 | 3 |   200
(1 row)
DB=# commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.

Related

How to avoid deadlock when deleting/updating the same record in Postgres

I have a scenario while playing with Postgres.
We have one table with a primary key, and there are two concurrent processes: one can update a record, the other can delete a record.
Now we are facing a deadlock when the two processes update and delete the same record in the table.
I googled how to avoid deadlocks, and someone suggested using SELECT ... FOR UPDATE.
Suppose there are two statements as follows:
update table_A set name='aaaa' where cid=1;
delete from table_A where cid=1;
My questions are:
(1) Do I need to add SELECT ... FOR UPDATE to both statements, or just to one of them, in order to avoid the deadlock?
(2) Could you give a complete example of how to add SELECT ... FOR UPDATE? I mean, what do the statements look like after you add it? I have never used it before and want to learn how.
SELECT ... FOR UPDATE locks the selected rows so that any other transaction can neither perform an update nor a SELECT ... FOR UPDATE on these rows. These transactions must wait until the transaction with the first SELECT ... FOR UPDATE releases the lock on the rows again.
If SELECT ... FOR UPDATE is the first statement in every transaction, no deadlock can occur, because no transaction can already hold locks on other rows that the remaining transactions will need later on.
So your two transactions should look like this:
BEGIN;
SELECT * FROM table_A WHERE cid = 1 FOR UPDATE;
-- some other statements
UPDATE table_A SET name = 'aaaa' WHERE cid = 1;
END;
and:
BEGIN;
SELECT * FROM table_A WHERE cid = 1 FOR UPDATE;
-- some other statements
DELETE FROM table_A WHERE cid = 1;
END;

Why does PostgreSQL think there is a conflict between the two serializable transactions?

I'm trying to figure out how the serializable isolation level in PostgreSQL works. In theory, and according to PostgreSQL's own documentation, PostgreSQL should be smart enough to somehow detect serialization conflicts and automatically roll back offending transactions. Yet when I tried to play with the serializable isolation level myself, I stumbled upon a lot of false positives and started to doubt my own understanding of the concept of serializability, or of PostgreSQL's implementation of it. Below you can find one of the simplest examples of such false positives:
create table mytab(
    class integer,
    value integer not null
);
create index mytab_class_idx on mytab (class);
insert into mytab (class, value) values (1, 10);
insert into mytab (class, value) values (1, 20);
insert into mytab (class, value) values (2, 100);
insert into mytab (class, value) values (2, 200);
The table data is the following:
 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200
Then I run two concurrent transactions. Step n comments in code show an order in which I execute the statements. Following advice from https://stackoverflow.com/a/42303225/3249257 I explicitly disabled sequential scan to force PostgreSQL to use an index:
SET enable_seqscan=off;
Transaction A:
begin; -- step 1
select sum(value) from mytab where class = 1; -- step 2
insert into mytab(class, value) values (3, 30); -- step 5
commit; -- step 7
Transaction B:
begin; -- step 3
select sum(value) from mytab where class = 2; -- step 4
insert into mytab(class, value) values (4, 300); -- step 6
commit; -- step 8
As I understand it, there shouldn't be any conflict between those two transactions. They don't touch the same rows. However, when I commit the second transaction, it fails with this error:
[40001] ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during commit attempt.
Hint: The transaction might succeed if retried.
What's going on here? Is my understanding of serializable isolation level flawed? Is it a failure of PostgreSQL's heuristics mentioned in this answer https://stackoverflow.com/a/50809788/3249257?
I'm using PostgreSQL 11.5 on x86_64-apple-darwin18.6.0, compiled by Apple LLVM version 10.0.1 (clang-1001.0.46.4), 64-bit.
The problem here is with the predicate locks (SIReadLock) that PostgreSQL uses to figure out whether there is a conflict between concurrent transactions. If you run the query below while the transactions are executing, you will see these locks:
select relation::regclass, locktype, page, tuple, pid from pg_locks
where mode = 'SIReadLock';
In this case, the issue was with page locks on the mytab_class_idx index. If the concurrent transactions happen to acquire a lock on the same page of the mytab_class_idx relation, a serialization conflict occurs. If they acquire locks on different pages, they both commit successfully.
When there is as little data as in the question above, the index entries for all rows fall on the same page, and a serialization conflict is inevitable. For big enough tables, serialization conflicts happen only rarely, though still more often than strictly necessary.
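For example, padding the table so that the entries for class 1 and class 2 no longer share an index page should let both transactions commit; the row counts below are purely illustrative:
INSERT INTO mytab (class, value)
SELECT 1, g FROM generate_series(1, 10000) AS g;
INSERT INTO mytab (class, value)
SELECT 2, g FROM generate_series(1, 10000) AS g;
-- with this much data the two SELECT sum(value) queries lock different pages
-- of mytab_class_idx, so the read/write dependency that forced the rollback
-- is no longer detected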

Which explicit lock to use for a trigger?

I am trying to understand which type of lock to use for a trigger function.
Simplified function:
CREATE OR REPLACE FUNCTION max_count() RETURNS TRIGGER AS
$$
DECLARE
    max_row INTEGER := 6;
    association_count INTEGER := 0;
BEGIN
    LOCK TABLE my_table IN ROW EXCLUSIVE MODE;

    SELECT INTO association_count COUNT(*)
    FROM my_table
    WHERE user_id = NEW.user_id;

    IF association_count > max_row THEN
        RAISE EXCEPTION 'Too many rows';
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER my_max_count
    AFTER INSERT OR UPDATE ON my_table
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    EXECUTE PROCEDURE max_count();
I initially planned to use an EXCLUSIVE lock, but it feels too heavy. What I really want is to ensure that, while this function executes, no new rows with the user_id concerned are added to the table.
If you want to prevent concurrent transactions from modifying the table, a SHARE lock would be correct. But that could lead to a deadlock if two such transactions run at the same time — each has modified some rows and is blocked by the other one when it tries to escalate the table lock.
Moreover, all table locks that conflict with SHARE UPDATE EXCLUSIVE will lead to autovacuum cancelation, which will cause table bloat when it happens too often.
So stay away from table locks, they are usually the wrong thing.
The better way to go about this is to use no explicit locking at all, but to use the SERIALIZABLE isolation level for all transactions that access this table.
Then you can simply use your trigger (without lock), and no anomalies can occur. If you get a serialization error, repeat the transaction.
This comes with a certain performance penalty, but allows more concurrency than a table lock. It also avoids the problems described in the beginning.
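A sketch of what that looks like in practice: drop the LOCK TABLE line from max_count() and run the transactions that touch my_table at the SERIALIZABLE level (the database name is a placeholder, and my_table's remaining columns are assumed to have defaults):
-- per transaction:
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO my_table (user_id) VALUES (1);  -- the deferred constraint trigger fires at commit
COMMIT;
-- on ERROR 40001 ("could not serialize access ..."), simply re-run the transaction

-- or make SERIALIZABLE the default for the whole database:
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';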

How to get the last PostgreSQL serial ID such that all rows before it are committed?

I am using PostgreSQL 9.0.5 and I have a cron job that periodically reads newly created rows from a table and accumulates their values into a summary table that holds hourly data.
I need to get the latest (serial) ID such that it and all rows before it are committed.
currval will not give a correct value here, because a transaction that obtained a higher sequence value may commit before one that obtained a lower value. Looking at the table with a plain SELECT, I can see that the id column is not continuous, because some rows are not yet committed.
Here is some sample code I have used to test:
--test race condition
create table mydata(id serial, val int);

--run in thread 1
create or replace function insert_delay() returns void as $$
begin
    insert into mydata(val) values (1);
    perform pg_sleep(60);
end;
$$ language plpgsql;

--run in thread 2
create or replace function insert_ok() returns void as $$
begin
    insert into mydata(val) values (2);
end;
$$ language plpgsql;
--run in thread 3
mytest=# select * from mydata; --SHOULD HAVE SEEN id = 1 and 2;
 id | val
----+-----
  2 |   2
(1 row)
I even tried a statement like the one below:
select max(id) from mydata where age(xmin) >= age(txid_snapshot_xmin(txid_current_snapshot())::text::xid);
But in production (with high-volume transactions), the returned max(id) does not move forward even after all the busy transactions have finished. So this does not work either.
There isn't a really good way to do this directly. I think the best option really is to create a temporary table that is truncated on transaction commit, plus a trigger that inserts each new id into that table. Then you can look up the values from the temp table.
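One concrete variant of that idea, sketched with a plain (non-temporary) staging table so that the cron job in another session can see the rows; all names here are hypothetical:
CREATE TABLE mydata_staging (id integer NOT NULL);

CREATE OR REPLACE FUNCTION mydata_stage() RETURNS trigger AS $$
BEGIN
    INSERT INTO mydata_staging (id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mydata_stage_trg
AFTER INSERT ON mydata
FOR EACH ROW EXECUTE PROCEDURE mydata_stage();

-- the summary job sees only committed ids and removes what it has processed:
-- DELETE FROM mydata_staging RETURNING id;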

Incrementing on errors

How do I prevent the 'id' in this table from incrementing when an error occurs?
db=> CREATE TABLE test (id serial primary key, info text, UNIQUE(info));
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "test_info_key" for table "test"
CREATE TABLE
db=> INSERT INTO test (info) VALUES ('hello') ;
INSERT 0 1
db=> INSERT INTO test (info) VALUES ('hello') ;
ERROR: duplicate key violates unique constraint "test_info_key"
db=> INSERT INTO test (info) VALUES ('hello') ;
ERROR: duplicate key violates unique constraint "test_info_key"
db=> INSERT INTO test (info) VALUES ('goodbye') ;
INSERT 0 1
db=> SELECT * from test; SELECT last_value from test_id_seq;
 id |  info
----+---------
  1 | hello
  4 | goodbye
(2 rows)

 last_value
------------
          4
(1 row)
You cannot suppress this - and there is nothing wrong with having gaps in your ID values.
The primary key is a totally meaningless value that is only used to uniquely identify one row in a table.
You cannot rely on the ID to never have any gaps - just think what happens if you delete a row.
Simply ignore it - nothing is wrong.
Edit
Just wanted to mention that this behaviour is also clearly stated in the manual:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back
http://www.postgresql.org/docs/current/static/functions-sequence.html
(Scroll to the bottom)
Your question boils down to this: "Can I roll back the next value from a PostgreSQL sequence?"
And the answer is, "You can't." The PostgreSQL documentation says:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back . . .
Imagine two different transactions go to insert. Transaction A gets id=1, transaction B gets id=2. Transaction B commits; transaction A rolls back. Now what do we do? How could we roll back the sequence for A without affecting B or later transactions?
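That behaviour is easy to see directly (the values shown in the comments are only illustrative):
BEGIN;
SELECT nextval('test_id_seq');  -- consumes a value, say 5
ROLLBACK;
SELECT nextval('test_id_seq');  -- returns 6, not 5: the rollback did not give the value back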
I figured it out.
I needed to write a wrapper function around my INSERT statement.
The database will normally have only one user at a time, so the 'race to the next id' condition is rare. What I was concerned about was my (unmentioned) 'pull rows from the remote database table' function re-inserting the growing remote table into the master database table. I display the row ids, and I didn't want users to see the gaps in numbering as missing data.
Anyway, here is my solution:
CREATE FUNCTION testInsert(test.info%TYPE) RETURNS void AS $$
BEGIN
    -- only INSERT (and consume a sequence value) when the value is not already
    -- present; this check-then-insert is not safe against concurrent sessions,
    -- which is acceptable here because the table is effectively single-writer
    PERFORM info FROM test WHERE info = $1;
    IF NOT FOUND THEN
        INSERT INTO test (info) VALUES ($1);
    END IF;
END;
$$ LANGUAGE plpgsql;
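Usage looks like this; the check-then-insert is only safe because, as described above, concurrent inserts of the same value are not expected here:
SELECT testInsert('hello');  -- no-op (and no sequence value consumed) if 'hello' already exists
SELECT testInsert('world');  -- inserts a new row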