Please help with my understanding of how triggers and locks can interact
I bulk load records to a table with statements something like this…..
BEGIN;
INSERT INTO table_a VALUES (record1) , (record2), (record3)………;
INSERT INTO table_a VALUES (record91) , (record92), (record93)………;
…..
….
COMMIT;
There can be several hundred records in a single insert, and there can be several dozen INSERT statements between COMMITs
Table_a has a trigger on it defined as….
AFTER INSERT ON table_a FOR EACH ROW EXECUTE PROCEDURE foo();
The procedure foo() parses each new row as it is added and (amongst other things) updates a record in a summary table, table_b (uniquely identified by primary key). So, for every record inserted into table_a, a corresponding record is updated in table_b.
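For illustration, a minimal sketch of what such a trigger might look like; the column names (key, amount, total) and the body of foo() are assumptions, not the actual procedure:
-- Sketch only: assumes table_b(key, total) and that each table_a row
-- carries a matching key and an amount to accumulate.
CREATE OR REPLACE FUNCTION foo() RETURNS trigger AS $$
BEGIN
    UPDATE table_b SET total = total + NEW.amount WHERE key = NEW.key;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_a_summary
AFTER INSERT ON table_a FOR EACH ROW EXECUTE PROCEDURE foo();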
I have a 2nd process that also attempts to (occasionally) update records in table_b. On very rare occasions it may attempt to update the same row in table_b that the bulk process is updating
Questions: should anything in the bulk insert statements affect my 2nd process being able to update records in table_b? I understand that the bulk insert process will obtain a row lock each time it updates a row in table_b, but when will that row lock be released? When the individual record (record1, record2, record3, etc.) has been inserted? When the entire INSERT statement has completed? Or when the COMMIT is reached?
Some more info - my overall purpose for this question is to try to understand why my 2nd process occasionally pauses for a minute or more when trying to update a row in table_b that is also being updated by the bulk-load process. What appears to be happening is that the lock on the target record in table_b isn't actually being released until the COMMIT has been reached - which is contrary to what I think ought to be happening. (I think a row-lock should be released as soon as the UPDATE on that row is done)
UPDATE after answer(s) - yes of course you're both right. In my mind I had somehow convinced myself that the individual updates performed within the trigger were somehow separate from the overall BEGIN and COMMIT of the whole transaction. Silly me.
The practice of adding multiple records with one INSERT, and multiple INSERTs between COMMITs, was introduced to improve the bulk-load speed (which it does). I had forgotten about the side effect of increasing the time before locks are released.
What should happen when the transaction is rolled back? It is rather obvious that all inserts on table_a, as well as all updates on table_b, should be rolled back. This is why all rows of table_b updated by the trigger will be locked until the transaction completes.
Committing after each insert (reducing the number of rows inserted in a single transaction) will reduce the chance of conflicts with concurrent processes.
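To make the timing concrete, a sketch of the interaction, reusing the hypothetical key/total columns from the sketch above and assuming record1 maps to the summary row with key = 42:
-- Bulk-load session:
BEGIN;
INSERT INTO table_a VALUES (record1), (record2), (record3);
-- the trigger has updated the matching rows in table_b; those row locks are now held

-- 2nd process, started before the bulk load commits:
UPDATE table_b SET total = total + 1 WHERE key = 42;   -- blocks here

-- Bulk-load session:
COMMIT;   -- only now is the row lock released, letting the 2nd process's UPDATE proceed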
I am trying to get a better grasp on how transactions work in PostgreSQL. I did a lot of research, but I could not find an answer to the following questions.
question 1
I have two transactions with isolation set to read committed, the default. I also have the following table:
create table test(a integer primary key);
Let's start the first transaction:
begin;
insert into test(a) values(1);
Now let's start the second transaction and do the same:
begin;
insert into test(a) values(1);
Now I notice that the second transaction blocks until the first transaction either commits or rolls back. Why is that? Why isn't it possible for the second transaction to simply continue after the insert and throw a unique-key-constraint exception when the transaction is asked to commit, instead of throwing the exception directly after the insert call?
question 2
Now, a second scenario. Let's start from scratch with the first transaction:
begin;
insert into test(a) values(1);
delete from test where a = 1;
Now let's go to the second transaction:
begin;
insert into test(a) values(1);
Now I notice that the second transaction is also blocking. Why is it blocking on a row which does not exist anyway?
Why is it blocking on a row which does not exist anyway?
Because both transactions are inserting the same value for the primary key. The second transaction needs to wait for the outcome of the first transaction to know whether it can succeed or not. If the first transaction commits, the second will fail with a primary key violation. If the first transaction rolls back, the insert in the second transaction succeeds.
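Continuing the example above, a sketch of the two possible outcomes (error text abbreviated):
-- First transaction:
begin;
insert into test(a) values(1);

-- Second transaction:
begin;
insert into test(a) values(1);   -- blocks, waiting on the first transaction

-- If the first transaction now commits, the second one immediately fails with
--   ERROR: duplicate key value violates unique constraint "test_pkey"
-- If the first transaction rolls back instead, the blocked insert completes normally.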
The problem is the following: remove all records from one table and insert them into another.
I have a table that is partitioned by date criteria. To avoid partitioning each record one by one, I collect the data in one table and periodically move it to another table. Copied records have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do to clean up the mess left in the original table.
I'm trying to achieve the same effect (copy and remove records), but without creating additional work for the vacuum mechanism.
As I'm removing all rows (a DELETE without a WHERE condition), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table so that deleted tuples are removed from the page immediately, without waiting for vacuum, but I could not find out whether that is possible.
Can you suggest something that I could use to solve my problem?
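For reference, the copy-and-remove described above can be written as a single statement with a data-modifying CTE; a sketch, reusing the table_a/table_b names from the answer below and assuming both tables have the same column list:
WITH moved AS (
    DELETE FROM table_a
    RETURNING *
)
INSERT INTO table_b
SELECT * FROM moved;
Every row removed this way still leaves a dead tuple behind for autovacuum to clean up, which is exactly the overhead the question is trying to avoid.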
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b (you could also use CREATE TABLE AS for this)
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;
In PostgreSQL: multiple sessions each want to get one record from the table, but we need to make sure they don't interfere with each other. I could do it using a message queue: put the data in a queue and then let each session get data from the queue. But is it doable in PostgreSQL itself, since it would be easier for the SQL guys to call a stored procedure? Is there any way to configure a stored procedure so that no concurrent calls can happen, or to use some special lock?
I would recommend making sure the stored procedure uses SELECT FOR UPDATE, which should prevent the same row in the table from being accessed by multiple transactions.
Per the Postgres doc:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends. The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values of certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
More SELECT info.
So that you don't end up locking all of the rows in the table at once (i.e. by SELECTing all of the records), I would recommend using ORDER BY to sort the table in a consistent manner, and then LIMIT 1 so that each call only gets the next row in the queue. Also add a WHERE clause that checks a status column (e.g. processed), and once the row has been processed, set that column to a value that prevents the WHERE clause from picking it up again, as in the sketch below.
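A minimal sketch of such a stored procedure; the table work_queue and its id/processed columns, and the function name, are placeholders rather than anything from the question:
-- Sketch: returns (and marks) the next unprocessed row; concurrent callers
-- are serialised by the row lock taken with FOR UPDATE.
CREATE OR REPLACE FUNCTION take_next_job() RETURNS work_queue AS $$
DECLARE
    job work_queue;
BEGIN
    SELECT * INTO job
      FROM work_queue
     WHERE processed = false
     ORDER BY id
     LIMIT 1
     FOR UPDATE;

    IF FOUND THEN
        UPDATE work_queue SET processed = true WHERE id = job.id;
    END IF;

    RETURN job;
END;
$$ LANGUAGE plpgsql;
Each caller just runs SELECT * FROM take_next_job(); a concurrent caller that reaches the same row blocks on the FOR UPDATE until the first caller's transaction finishes.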
I am in the unfortunate situation of needing to add triggers to a table to track changes to a legacy system. I have insert, update, and delete triggers on TABLE_A; each one of them writes the values of two columns to TABLE_B, plus a bit flag that is set to 1 when the row is written by the delete trigger.
Every entry in TABLE_B shows up twice. An insert creates two rows, an update creates two rows (we believe), and a delete creates an insert row and then a delete row.
Is the legacy application doing this, or is SQL doing it?
EDIT (adding more detail):
body of triggers:
.. after delete
INSERT INTO TableB(col1, isdelete) SELECT col1, 1 from DELETED
.. after insert
INSERT INTO TableB(col1, isdelete) SELECT col1, 0 from INSERTED
.. after update
INSERT INTO TableB(col1, isdelete) SELECT col1, 0 from DELETED
I have tried profiler, and do not see any duplicate statements being executed.
It may be that the application is changing the data again when it sees the operations on its data.
It's also possible that triggers exist elsewhere - is there any possibility that there is a trigger on TableB that is creating extra rows?
More detail would be needed to address the question more fully.
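Given the INSERTED/DELETED pseudo-tables, this looks like SQL Server, so one quick way to check that last possibility is to list every trigger attached to TableB (a sketch; adjust the schema name if it is not dbo):
-- Any rows here mean TableB has triggers of its own that could be adding rows
SELECT t.name, t.is_disabled, m.definition
FROM sys.triggers AS t
JOIN sys.sql_modules AS m ON m.object_id = t.object_id
WHERE t.parent_id = OBJECT_ID('dbo.TableB');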
I have a question. The transaction isolation level is set to serializable. When one user opens a transaction and INSERTs or UPDATEs data in "table1", and then another user opens a transaction and tries to INSERT data into the same table, does the second user need to wait until the first user commits the transaction?
Generally, no. The second transaction is inserting only, so unless there is a unique index check or other trigger that needs to take place, the data can be inserted unconditionally. In the case of a unique index (including a primary key), it will block if both transactions insert the same key value, e.g.:
CREATE TABLE t (x INT PRIMARY KEY);

-- Session 1:
BEGIN;
INSERT INTO t VALUES (1);

-- Session 2:
BEGIN;
INSERT INTO t VALUES (1);   -- blocks here

-- Session 1:
COMMIT;

-- Session 2:
-- finally completes with a duplicate key error
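By contrast, and in line with the "Generally, no" above, two inserts with different key values do not block each other even at SERIALIZABLE isolation; a sketch using the same table:
-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO t VALUES (2);

-- Session 2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO t VALUES (3);   -- completes immediately, no waiting

-- Both sessions can then COMMIT without error.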
Things are less obvious in the case of updates that may affect insertions by the other transaction. I understand PostgreSQL does not yet support "true" serialisability in this case. I do not know how commonly supported it is by other SQL systems.
See http://www.postgresql.org/docs/current/interactive/mvcc.html
The second user will be blocked until the first user commits or rolls back his/her changes.