Discover if another record is being inserted right now by another transaction in postgresql - postgresql

Imagine there is an open, ongoing transaction in Postgresql inserting a record and doing something else as well.
BEGIN
INSERT INTO films(id, name) VALUES(10, 'A comedy')
# We are at this moment in time
# The transaction is not yet committed
# ...
COMMIT
Is there any non-blocking way to discover from outside of this transaction that there is an ongoing transaction inserting record with ID=10 right now?
The only way I could think about was:
BEGIN
SET statement_timeout to 100
INSERT INTO films(id, name) VALUES(10, '') ON CONFLICT (id) DO NOTHING
ROLLBACK
if I get a timeout from INSERT than it means that there is another ongoing transaction
if I inserted nothing then there was a transaction which is now finished and conflict on unique ID occured
if the INSERT succeeded and thenI rolled-back than it means there is no transaction right now trying to insert a row with ID=10
However this is less than ideal as:
It is not non-blocking, I am waiting 100ms here
I am doing an INSERT operation, whereas I would prefer a read-only solution
As far as I understand I am actively triggering a conflict but I cannot in any easy way enforce that the 2nd transaction gets deadlock and that I won't ever interrupt the work of the 1st transaction
I am effectively trying to workaround the lack of READ UNCOMMITTED transaction isolation level in Postgresql.
I am in charge of both parts of the code so I can change them in any way necessary to allow it.

Related

Can not execute select queries while making a long lasting insert transaction

I'm pretty new to PostgreSQL and I'm sure I'm missing something here.
The scenario is with version 11, executing a big drop table and insert transaction on a given table with the nodejs driver, which may take 30 minutes.
While doing that, if I try to query with select on that table using the jdbc driver, the query execution waits for the transaction to finish. If I close the transaction (by finishing it or by forcing it to exit), the jdbc query becomes responsive.
I thought I can read a table with one connection while performing a transaction with another one.
What am I missing here?
Should I keep the table (without dropping it at the beginning of the transaction) ?
DROP TABLE takes an ACCESS EXCLUSIVE lock on the table, which is there precisely to prevent it from taking place concurrently with any other operation on the table. After all, DROP TABLE physically removes the table.
Since all locks are held until the end of the database transaction, all access to the dropped table is blocked until the transaction ends.
Of course the files are only removed when the transaction commits, so you might wonder why PostgreSQL doesn't let concurrent transactions read in the mean time. But that would mean that COMMIT may be blocked by a concurrent reader, or a SELECT might cause a system error in the middle of reading, both of which don't sound appealing.

Firebird 2.5 exception handling within autonomous transaction

I'm experiencing performance drop in one of our Firebird stored procedures and I have no clue why. I have found the following code in the mentioned SP:
declare v_dummy integer;
...
in autonomous transaction do
begin
-- insert may fail, but that is not a problem because it means the record is already there
insert into my_table(my_field) values (:input_param);
when ANY do
v_dummy = 1;
end
I see few dozens of records in RDB$TRANSACTIONS table with STATE 3, no relevant records in MON$TRANSACTIONS table.
The question is, if the insert fails will the autonomous transaction be rolled back or does the "when ANY do" prevent the rollback and there will be an opened transaction? Can I just remove the exception handling, so it will be rolled back automatically without raising an exception and blocking the rest of the code?
Using a when any do inside an autonomous transaction block will not rollback the transaction, instead it will commit once the block ends because the exception does not escape the block.
However, this is probably the desired result: committing transactions in Firebird is (relatively) cheaper than rolling back. In fact, if a transaction rolls back when nothing was changed, Firebird will convert a rollback into a commit anyway.
I don't think this is the cause of your performance problem, but without reproducible example, it is hard to reason about this.
As an aside, transactions with state 3 are rolled back, and rolled back transactions have ended. MON$TRANSACTIONS only shows active transactions, so rolled back transactions will not be shown in that virtual table.

How do transactions work in the context of reads to the database?

I am using transactions to make changes to a SQL database. As I understand it, this means that changes to the database will happen in an all-or-nothing fashion. What I want to know is, does this have any guarantees for reads? For example, suppose I have some (pseudo)-code like this:
1) start TRANSACTION
2) INSERT INTO users ... // insert some data
3) count = SELECT COUNT(*) FROM ... // count something in the database
4) if count > 10: // do something based on the read
5) INSERT INTO other_table ... // write based on the read
6) COMMMIT TRANSACTION
In this code, I'm doing an INSERT, followed by a SELECT, and then conditionally doing another INSERT based on the outcome of the SELECT.
So my question is, if another process modifies the database between steps (3) and (5), what happens to the count variable, and to my transaction?
If it makes a difference, I am using PostgreSQL.
As Xin pointed out, it depends on the isolation level.
At the default READ COMMITTED level, records from other sessions will become visible as they are committed; you would see the same records if you didn't start a transaction at all (though of course, other processes would see your inserts appear at different times).
With REPEATABLE READ, your queries will not see any records committed by other sessions after your transaction starts. But while you don't have to worry about the result of SELECT COUNT(*) changing during your transaction, you can't assume that this result will still be accurate by the time you commit.
Using SERIALIZABLE provides the strongest guarantee: if your script does the right thing when given exclusive access to the database, then it will do the right thing in the presence of other serialisable transactions (or it will fail outright). However, this means that all transactions which might interfere with yours must be using the same isolation level (which comes at a cost), and all must be prepared to retry their transaction in the event of a serialisation failure.
When serialisable transactions are not an option, you generally guard against race conditions by explicitly locking things against concurrent writes. It's often enough to lock a selection of records, but you can't exactly lock the result of a COUNT(*); in your case, you'd probably need to lock the whole table.
I am not working on postgreSQL, but I think I can answer your question. Think of every query is parallel. I am saying so, because there are 2 transactions: when you insert into a; others can insert into b; then when you check b; whether you can see the new data depends on your isolation setting (read committed or just dirty read).
Also please note that, in database, there is a technology called lock: you can lock a table so that prevent altering it from others before committing your transaction.
Hope

How to wait during SELECT that pending INSERT commit?

I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application, that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and vote for completion of the transaction (The transaction is PREPARED). The transaction will be COMMITED by the Transaction Coordinator.
Immediatly after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really commited, the SELECT clause may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application may be redesigned but until then, I'm looking for a lock solution. Advisory Lock ?
I already solved the problem while performing UPDATE on specific rows, then using SELECT...FOR SHARE, and it works well. The SELECT waits until the transaction commits and return old and new rows.
Now I'm trying to solve it for INSERT.
SELECT...FOR SHARE does not block and return immediatley.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the mean time any other queries will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because INSERT will take a lower lock then you'll attempt lock promotion in the trigger. If possible it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.

T-SQL "after insert trigger" invoke by store procedure

I have a store procedure that inserts records in table and at the same time delete all records older then 20 minutes.
I have try to optimized it and have seen that the delete operation cost to much. So, I have decided to create after insert trigger that do the delete operation.
It seems, that it works faster now, but the execution plan is showing the "delete" statement from the trigger - the only difference is that now the delete statement has "Query Cost (relative to the batch): 0%."
My question is, when the procedure inserts the records, will it immediately returns the results or it will wait for the after insert trigger to complete?
The trigger is part of the same transaction as the calling procedure, so the whole transaction will only complete once the trigger has fired, from MSDN:
The trigger and the statement that fires it are treated as a single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example, insufficient disk space), the entire transaction automatically rolls back.
MSDN - Understanding DML Triggers