I am trying to get a better grasp of how transactions work in PostgreSQL. I did a lot of research, but I could not find any answers to the following questions.
Question 1
I have two transactions with the isolation level set to read committed (the default). I also have the following table:
create table test(a integer primary key);
Let's start the first transaction:
begin;
insert into test(a) values(1);
Now let's start the second transaction and do the same:
begin;
insert into test(a) values(1);
Now I notice that the second transaction blocks until the first transaction either commits or rolls back. Why is that? Why isn't it possible for the second transaction to simply continue after the insert and throw a unique-constraint exception when the transaction is asked to commit, instead of throwing the exception directly after the insert call?
Question 2
Now, a second scenario. Let's start from scratch with the first transaction:
begin;
insert into test(a) values(1);
delete from test where a = 1;
Now let's go to the second transaction:
begin;
insert into test(a) values(1);
Now I notice that the second transaction is also blocking. Why is it blocking on a row which does not exist anyway?
Why is it blocking on a row which does not exist anyway?
Because both transactions are inserting the same value for the primary key. The first transaction's insert created an entry in the unique index, and the delete does not remove that entry; it only marks the (still uncommitted) heap row as deleted. So the second transaction still has to wait for the outcome of the first one to know whether its own insert can succeed. If the first transaction rolls back, or commits with the row already deleted (the second scenario), the insert in the second transaction succeeds; only if the first transaction commits with the row still in place (the first scenario) does the second fail with a primary key violation. As for why the check happens at insert time rather than at commit: unique constraints are checked immediately by default; only a constraint declared DEFERRABLE INITIALLY DEFERRED is checked at commit.
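A quick sketch of both outcomes for the first scenario (the constraint name test_pkey is simply the default PostgreSQL generates for this table):
-- Session 1
begin;
insert into test(a) values(1);

-- Session 2
begin;
insert into test(a) values(1);   -- blocks, waiting for session 1

-- Session 1
commit;

-- Session 2 now fails immediately:
-- ERROR:  duplicate key value violates unique constraint "test_pkey"

-- If session 1 had issued ROLLBACK instead, session 2's insert would have
-- succeeded and session 2 could commit normally.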
Related
When the following transaction is run concurrently on different connections, it sometimes fails with
trigger "my_trigger" for relation "my_table" already exists
What am I doing wrong?
BEGIN;
DROP TRIGGER IF EXISTS my_trigger ON my_table;
CREATE TRIGGER my_trigger
AFTER INSERT ON my_table
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE PROCEDURE my_function();
COMMIT;
I am trying to set up a system where I can add triggers to notify about data changes in specific tables. If a table already has such a trigger, it should be skipped; otherwise all CRUD triggers should be CREATEd. This logic needs to run sequentially in case of concurrent requests.
After trying ISOLATION LEVEL SERIALIZABLE I noticed that any conflicting transactions fail and are dropped (I would need to manually check the SQL status and retry). But what I want is to queue up these transactions and run them one by one in the order they were sent.
At the moment I am trying to achieve this by having a my_triggers (table_name TEXT) table that has a BEFORE INSERT OR DELETE trigger. Within this trigger I do the actual table-trigger upsert logic. Inserts or deletes on my_triggers are made with LOCK TABLE my_triggers IN ACCESS EXCLUSIVE MODE ..., which should queue up conflicting CRUD transactions ?!
What happens is following:
BEGIN....DROP TRIGGER IF EXISTS....CREATE TRIGGER....COMMIT;
..BEGIN....DROP TRIGGER IF EXISTS....CREATE TRIGGER--------EXCEPTION.
Both transactions start while the trigger is not present.
Both succeed at DROP TRIGGER because of the IF EXISTS clause.
First transaction starts creating the trigger. For that, a SHARE ROW EXCLUSIVE lock is placed on table my_table. SHARE ROW EXCLUSIVE conflicts with itself, so no other transaction is allowed to create a trigger until the first one completes.
Second transaction blocks on CREATE TRIGGER.
First transaction completes.
Second transaction proceeds with CREATE TRIGGER, but the trigger now already exists, so an exception is raised.
What you need is to take that lock explicitly before the DROP TRIGGER statement. That way the second transaction waits before its DROP TRIGGER, and once it gets the lock it drops the trigger created by the first transaction and recreates it instead of failing:
BEGIN;
LOCK TABLE my_table IN SHARE ROW EXCLUSIVE MODE;
DROP TRIGGER IF EXISTS my_trigger ON my_table;
CREATE TRIGGER my_trigger
AFTER INSERT ON my_table
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE PROCEDURE my_function();
COMMIT;
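If you want to watch the original conflict happen, one rough way is to query the pg_locks view from a third session while the second transaction is blocked on CREATE TRIGGER; this is only an inspection sketch:
-- Run in a third session while the other one is waiting on CREATE TRIGGER.
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'my_table'::regclass;

-- Expected: one ShareRowExclusiveLock with granted = t (the session that is
-- creating the trigger) and one with granted = f (the session waiting for it).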
Please help with my understanding of how triggers and locks can interact
I bulk load records into a table with statements something like this:
BEGIN;
INSERT INTO table_a VALUES (record1), (record2), (record3), ...;
INSERT INTO table_a VALUES (record91), (record92), (record93), ...;
-- ... more INSERT statements ...
COMMIT;
There can be several hundred records in a single INSERT, and several dozen INSERT statements between COMMITs.
table_a has a trigger on it defined as:
AFTER INSERT ON table_a FOR EACH ROW EXECUTE PROCEDURE foo();
The procedure foo() parses each new row as it's added and will (amongst other things) update a record in a summary table, table_b (uniquely identified by primary key). So, for every record inserted into table_a, a corresponding record is updated in table_b.
I have a second process that also (occasionally) attempts to update records in table_b. On very rare occasions it may attempt to update the same row in table_b that the bulk process is updating.
Questions: should anything in the bulk insert statements affect my second process being able to update records in table_b? I understand that the bulk insert process will obtain a row lock each time it updates a row in table_b, but when will that row lock be released? When the individual record (record1, record2, record3, etc.) has been inserted? When the entire INSERT statement has completed? Or when the COMMIT is reached?
Some more info: my overall purpose for this question is to try to understand why my second process occasionally pauses for a minute or more when trying to update a row in table_b that is also being updated by the bulk-load process. What appears to be happening is that the lock on the target record in table_b isn't actually released until the COMMIT is reached, which is contrary to what I think ought to be happening. (I think a row lock should be released as soon as the UPDATE on that row is done.)
UPDATE after answer(s): yes, of course you're both right. In my mind I had somehow convinced myself that the individual updates performed within the trigger were somehow separate from the overall BEGIN and COMMIT of the whole transaction. Silly me.
The practice of adding multiple records with one INSERT, and multiple INSERTs between COMMITs, was introduced to improve the bulk-load speed (which it does); I had forgotten about the side effect of increasing the time before locks are released.
What should happen when the transaction is rolled back? It is rather obvious that all inserts on table_a, as well as all updates on table_b, should be rolled back. This is why all rows of table_b updated by the trigger will be locked until the transaction completes.
Committing after each insert (reducing the number of rows inserted in a single transaction) will reduce the chance of conflicts with concurrent processes.
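A minimal two-session sketch of the effect described above; the table definitions, the trigger name, and the body of foo() are invented purely for illustration:
create table table_b (id int primary key, n int default 0);
create table table_a (id int primary key, b_id int);
insert into table_b values (1, 0);

create function foo() returns trigger language plpgsql as $$
begin
  -- update the matching summary row; this takes a row lock in table_b
  update table_b set n = n + 1 where id = new.b_id;
  return new;
end;
$$;

create trigger table_a_summary
after insert on table_a
for each row execute procedure foo();

-- Session 1 (bulk load)
begin;
insert into table_a values (10, 1), (11, 1);  -- the trigger locks row 1 of table_b
-- ... more INSERT statements ...

-- Session 2
update table_b set n = n + 1 where id = 1;    -- blocks here

-- Session 1
commit;   -- only now does session 2's UPDATE proceed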
I use PostgreSQL 9.2, and I do not use explicit locking anywhere, neither the LOCK statement nor SELECT ... FOR UPDATE. However, recently I got ERROR: 40P01: deadlock detected. The query where the deadlock was detected is wrapped in a transaction block, though. Anyway, how can this happen?
You don't need any explicit LOCK to go into a deadlock. Here's a very simple demo from scratch with only INSERTs:
create table a(i int primary key);
create table b(i int primary key);
Session #1 does:
begin;
insert into a values(1);
Then session #2 does:
begin;
insert into b values(1);
insert into a values(1);
-- here it goes into waiting for session #1 to finish its transaction
Then session #1 does:
insert into b values(1);
And then the deadlock occurs:
ERROR: deadlock detected
DETAIL: Process 9571 waits for ShareLock on transaction 4150; blocked by process 9501.
        Process 9501 waits for ShareLock on transaction 4149; blocked by process 9571.
HINT: See server log for query details.
The same could happen with simple UPDATEs or a combination of UPDATEs and INSERTs.
These operations take implicit row locks, and if two sessions acquire them in different orders, they may deadlock.
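The UPDATE flavour of the same deadlock looks like this (table t and its two rows are invented; the point is that the sessions lock the rows in opposite orders):
create table t (i int primary key, v int);
insert into t values (1, 0), (2, 0);

-- Session 1
begin;
update t set v = v + 1 where i = 1;

-- Session 2
begin;
update t set v = v + 1 where i = 2;
update t set v = v + 1 where i = 1;   -- waits for session 1

-- Session 1
update t set v = v + 1 where i = 2;   -- deadlock detected; one transaction is aborted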
I would suspect hash indexes first.
Switch any hash indexes you have to B-tree (a query to spot them is sketched below).
Use the Serializable isolation level if it seems appropriate.
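One rough way to list any hash indexes in the database is to query the pg_indexes catalog view:
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE indexdef ILIKE '%USING hash%';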
I have a question. Transaction isolation level is set to serializable. When the one user opens a transaction and INSERTs or UPDATEs data in "table1" and then another user opens a transaction and tries to INSERT data to the same table, does the second user need to wait 'til the first user commits the transaction?
Generally, no. The second transaction is only inserting, so unless there is a unique-index check or some trigger that needs to take place, the data can be inserted unconditionally. In the case of a unique index (including primary keys), it will block if both transactions insert rows with the same key value, e.g.:
-- Session 1
CREATE TABLE t (x INT PRIMARY KEY);
BEGIN;
INSERT INTO t VALUES (1);

-- Session 2
BEGIN;
INSERT INTO t VALUES (1); -- blocks here

-- Session 1
COMMIT;

-- Session 2: the blocked INSERT now completes with a duplicate key error
Things are less obvious in the case of updates that may affect insertions by the other transaction. I understand PostgreSQL does not yet support "true" serialisability in this case. I do not know how commonly supported it is by other SQL systems.
See http://www.postgresql.org/docs/current/interactive/mvcc.html
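For illustration, a sketch of the kind of anomaly meant here (the slots table and the one-booking-per-room rule are invented for the example; the commented behaviour is for snapshot-based SERIALIZABLE as in pre-9.1 releases, whereas 9.1 and later abort one transaction instead):
create table slots (id int primary key, room int);
-- Intended rule: at most one booking per room.

-- Session 1
begin isolation level serializable;
select count(*) from slots where room = 101;  -- 0, so booking looks safe
insert into slots values (1, 101);

-- Session 2
begin isolation level serializable;
select count(*) from slots where room = 101;  -- also 0 in its own snapshot
insert into slots values (2, 101);
commit;

-- Session 1
commit;
-- On snapshot-based SERIALIZABLE (pre-9.1) both commits succeed and the rule
-- is silently violated. With true serializability (9.1+) one of the two
-- transactions fails with a serialization error and must be retried.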
The second user will be blocked until the first user commits or rolls back his/her changes.
I'm trying to insert or update data in a PostgreSQL db. The simplest case is a key-value pair (the actual data is more complicated, but this is the smallest clear example).
When you set a value, I'd like it to insert if the key is not there, otherwise update. Sadly Postgres does not have an insert-or-update statement, so I have to emulate it myself.
I've been working with the idea of basically SELECTing whether the key exists and then running the appropriate INSERT or UPDATE. Now clearly this needs to be in a transaction or all manner of bad things could happen.
However, this is not working exactly how I'd like it to - I understand that there are limitations to serializable transactions, but I'm not sure how to work around this one.
Here's the situation -
ab: => set transaction isolation level serializable;
a: => select count(1) from table where id=1; --> 0
b: => select count(1) from table where id=1; --> 0
a: => insert into table values(1); --> 1
b: => insert into table values(1); -->
ERROR: duplicate key value violates unique constraint "serial_test_pkey"
Now I would expect it to throw the usual "couldn't commit due to concurrent update" error, but I'm guessing that since the inserts are for different "rows", this does not happen.
Is there an easy way to work around this?
Prior to Postgres 9.1 there were issues with isolation:
asking for serializable isolation guaranteed only that a single MVCC snapshot would be used for the entire transaction, which allowed certain documented anomalies.
Perhaps you're running into these "anomalies".
You can try SELECT … FOR UPDATE when checking if the row exists.
Alternatively, LOCK TABLE yourself.
If you're trying to implement UPSERT, then a slightly more reliable (or rather, less unreliable) way is to first attempt the UPDATE, check the number of affected rows, and then try the INSERT if no rows were updated.
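Wrapped in a retry loop, that update-then-insert approach looks roughly like the merge_db example in the PostgreSQL documentation. Here is a sketch with invented names (kv, upsert_kv); on Postgres 9.5+ you would simply use INSERT ... ON CONFLICT instead:
create table kv (k integer primary key, v text);

create function upsert_kv(key integer, val text) returns void
language plpgsql as $$
begin
  loop
    -- first try to update the key
    update kv set v = val where k = key;
    if found then
      return;
    end if;
    -- not there, so try to insert; if another session inserts the same key
    -- concurrently, we get a unique_violation and retry the update
    begin
      insert into kv (k, v) values (key, val);
      return;
    exception when unique_violation then
      null;  -- lost the race; loop back and try the update again
    end;
  end loop;
end;
$$;

select upsert_kv(1, 'hello');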