I have a concurrent workflow that inserts a record into a table with a unique index on columns A and B and, if the insert succeeds, performs an async action that cannot be rolled back (an API request), all inside a single transaction.
The API request should only ever happen once, but currently it can be triggered multiple times if the record is being inserted in parallel.
If I'm not mistaken, the way to solve this is to take a lock on the offending row so that any parallel inserts wait until the initial transaction is complete.
Which lock would be the correct one for this use case?
No need for an explicit lock.
If a second transaction inserts the same values for the PK that another, uncommitted transaction has already inserted, the second will wait until the first transaction commits or rolls back.
If the first transaction rolls back, the second will succeed. If the first transaction commits, the second will get a "unique key violation" error.
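For illustration, here is a minimal sketch of how the application can rely on that behaviour (the table and column names are assumptions, not from the question): do the INSERT first, and trigger the API request only once the INSERT has succeeded, ideally after COMMIT, since the request itself cannot be rolled back.

    -- Assumed example schema, for illustration only
    CREATE TABLE workflow_record (
        a       text NOT NULL,
        b       text NOT NULL,
        payload jsonb,
        UNIQUE (a, b)
    );

    BEGIN;
    -- If a concurrent, uncommitted transaction has already inserted ('x', 'y'),
    -- this statement blocks until that transaction commits or rolls back:
    -- on commit it fails with a unique_violation, on rollback it proceeds.
    INSERT INTO workflow_record (a, b) VALUES ('x', 'y');
    COMMIT;
    -- In the application, fire the one-off API request only if the INSERT
    -- (and COMMIT) succeeded without a unique-key violation.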
I'm pretty new to PostgreSQL and I'm sure I'm missing something here.
The scenario: on version 11, I execute a big DROP TABLE and INSERT transaction on a given table with the Node.js driver, which may take 30 minutes.
While that is running, if I try to query that table with a SELECT using the JDBC driver, the query waits for the transaction to finish. If I close the transaction (by finishing it or by forcing it to exit), the JDBC query becomes responsive.
I thought I could read a table with one connection while performing a transaction on it with another one.
What am I missing here?
Should I keep the table (without dropping it at the beginning of the transaction)?
DROP TABLE takes an ACCESS EXCLUSIVE lock on the table, which is there precisely to prevent it from taking place concurrently with any other operation on the table. After all, DROP TABLE physically removes the table.
Since all locks are held until the end of the database transaction, all access to the dropped table is blocked until the transaction ends.
Of course the files are only removed when the transaction commits, so you might wonder why PostgreSQL doesn't let concurrent transactions read in the meantime. But that would mean that COMMIT could be blocked by a concurrent reader, or that a SELECT could cause a system error in the middle of reading, neither of which sounds appealing.
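To see this in action, you can watch the lock queue while the long transaction runs; a rough sketch, assuming the table is called big_table:

    -- Session 1 (the Node.js connection): long drop-and-rebuild transaction
    BEGIN;
    DROP TABLE big_table;              -- acquires ACCESS EXCLUSIVE, held until COMMIT/ROLLBACK
    CREATE TABLE big_table (id int);   -- simplified rebuild, for illustration only
    -- ... the long-running INSERTs go here ...

    -- Session 2 (the JDBC connection): blocks until session 1 ends
    SELECT count(*) FROM big_table;

    -- Session 3: inspect the lock queue while session 2 is waiting
    SELECT pid, mode, granted
    FROM pg_locks
    WHERE relation = 'big_table'::regclass;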
For illustration, say I'm maintaining a table ProductOffers and their prices. Mutations to this table are of the form: add a new ProductOffer, or change the price of an existing ProductOffer.
Based on the above changes, I'd like to update a Product-table which holds pricing info per product aggregated over all offers.
It seems logical to implement this using a row-based update/insert trigger, where the trigger runs a procedure creating/updating a Product row.
I'd like to properly handle concurrent updates (and thus concurrent triggers). That is, updating ProductOffers of the same Product concurrently could lead to wrong aggregate values, because multiple triggered procedures would concurrently attempt to insert/update the same Product row.
It seems I cannot use row-based locking on the Product table (i.e. SELECT ... FOR UPDATE), because it's not guaranteed that a particular Product row already exists: the first time a ProductOffer triggers the procedure, the Product row must be created rather than updated. AFAIK, row locking can't work with rows that are yet to be inserted, which totally makes sense.
So where does that leave me? Would I need to roll my own optimistic locking scheme? This would need to include:
check that the row does not exist => create a new row, failing if it already exists (which is possible if two triggers concurrently try to create the row). Try again afterwards, with an update.
check that the row exists and has version = x => update the row, but fail if row.version != x. Try again afterwards.
Would the above work, or any better / more out-of-the-box solutions?
EDIT:
For future reference: I found an official example that illustrates exactly what I want to accomplish: Example 39-6, A PL/pgSQL Trigger Procedure For Maintaining A Summary Table.
Things are much simpler than you think they are, thanks to the I in ACID.
The trigger you envision will run in the same transaction as the data modification that triggered it, and each modification to the aggregate table will first lock the row that it wants to update with an EXCLUSIVE lock.
So if two concurrent transactions cause an UPDATE on the same row in the aggregate table, the first transaction will get the lock and proceed, while the second transaction will have to wait until the first transaction commits (or rolls back) before it can get the lock on the row and modify it.
So data modifications that update the same row in the aggregate table will effectively be serialized, which may hurt performance, but guarantees exact results.
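A minimal sketch of such a trigger, using made-up table and column names and an additive aggregate (a count and a sum of prices); the official example referenced above is more complete and also covers UPDATE and DELETE, and note that ON CONFLICT requires PostgreSQL 9.5 or later:

    -- Assumed tables, for illustration only
    CREATE TABLE product (
        product_id  int PRIMARY KEY,
        offer_count bigint  NOT NULL,
        price_sum   numeric NOT NULL
    );

    CREATE TABLE product_offer (
        offer_id   serial PRIMARY KEY,
        product_id int     NOT NULL,
        price      numeric NOT NULL
    );

    -- Sketch of the INSERT path only
    CREATE FUNCTION product_offer_summary() RETURNS trigger AS $$
    BEGIN
        INSERT INTO product (product_id, offer_count, price_sum)
        VALUES (NEW.product_id, 1, NEW.price)
        ON CONFLICT (product_id) DO UPDATE
            -- the row lock taken here serializes concurrent updates of the
            -- same product row, as described in the answer above
            SET offer_count = product.offer_count + 1,
                price_sum   = product.price_sum + EXCLUDED.price_sum;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER product_offer_summary_trg
    AFTER INSERT ON product_offer
    FOR EACH ROW EXECUTE PROCEDURE product_offer_summary();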
I made a mistake with an update query on a very large database. I realised my mistake while the update/query was still running and clicked 'cancel query'.
I've checked the History panel (pgAdmin3) and it says 'Execution Cancelled'. It does not say anything about any rows being affected.
Does this mean that no rows were affected by the update? Is there a way to check a log of some sort to see if any rows were affected?
Postgres runs a query in a transaction. The transaction completes as a whole or is cancelled as a whole. This property is called atomicity (the A in ACID).
If your query was cancelled, no rows were affected.
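If you want to double-check, you can look for rows carrying the value the mistaken UPDATE would have written; a rough sketch with placeholder table, column and value names, assuming no rows had that value beforehand:

    -- Should return 0 if the cancelled UPDATE was rolled back as a whole
    SELECT count(*)
    FROM   my_table
    WHERE  some_column = 'the value the mistaken UPDATE would have set';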
If I have two READ COMMITTED PostgreSQL database transactions that both create a new row with the same primary key and then lock this row, is it possible to acquire both locks successfully at the same time?
My instinct is yes since these new rows both only exist in the individual transactions' scopes, but I was curious if new rows and locking is handled differently between transactions.
No.
Primary keys are implemented with a UNIQUE B-tree index (currently the only index type that supports uniqueness). This is what happens when a transaction tries to write to that index, per the documentation:
If a conflicting row has been inserted by an as-yet-uncommitted transaction, the would-be inserter must wait to see if that transaction commits. If it rolls back then there is no conflict. If it commits without deleting the conflicting row again, there is a uniqueness violation. (In practice we just wait for the other transaction to end and then redo the visibility check in toto.)
Bold emphasis mine.
You can just try it with two open transactions (two different sessions) in parallel.
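A rough sketch of such a test, with an assumed one-column table:

    CREATE TABLE t (id int PRIMARY KEY);

    -- Session 1
    BEGIN;
    INSERT INTO t (id) VALUES (1);   -- succeeds and holds the index entry

    -- Session 2
    BEGIN;
    INSERT INTO t (id) VALUES (1);   -- blocks, waiting for session 1 to end

    -- Session 1 again
    COMMIT;   -- session 2 now fails with a unique_violation
              -- (a ROLLBACK here would instead let session 2's INSERT succeed)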
I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data into a table for a specific foreign key value (there is an index on the column), and votes for completion of the transaction (the transaction is PREPARED). The transaction will be COMMITTED by the transaction coordinator.
Immediately after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really committed, the SELECT may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application could be redesigned, but until then I'm looking for a locking solution. An advisory lock?
I already solved the problem for the case where the transaction performs an UPDATE on specific rows, by using SELECT ... FOR SHARE, and it works well: the SELECT waits until the transaction commits and then returns the old and new rows.
Now I'm trying to solve it for INSERT.
SELECT ... FOR SHARE does not block and returns immediately.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the meantime any other queries will wait on the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because the INSERT will take a weaker lock first and you'll then attempt lock promotion in the trigger. If possible, it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT, as sketched below.
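A rough sketch of that last variant (the table, column and transaction names are made up):

    BEGIN;
    -- Take the strongest lock up front to avoid lock promotion and deadlocks;
    -- it is held until COMMIT PREPARED or ROLLBACK PREPARED.
    LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE;
    INSERT INTO my_table (fk_id, data) VALUES (42, 'payload');
    PREPARE TRANSACTION 'my_2pc_txn';

    -- later, issued by the transaction coordinator:
    COMMIT PREPARED 'my_2pc_txn';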
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.