Locking a newly created, uncommitted row in two concurrent READ COMMITTED database transactions - postgresql

If I have two READ COMMITTED PostgreSQL database transactions that both create a new row with the same primary key and then lock this row, is it possible to acquire both locks successfully at the same time?
My instinct is yes, since these new rows both only exist in the individual transactions' scopes, but I was curious whether new rows and locking are handled differently between transactions.

No.
Primary keys are implemented with a UNIQUE b-tree index (currently the only index type that supports uniqueness). This is what happens when a transaction tries to write to that index, per the documentation:
If a conflicting row has been inserted by an as-yet-uncommitted
transaction, the would-be inserter must wait to see if that
transaction commits. If it rolls back then there is no conflict. If it
commits without deleting the conflicting row again, there is a
uniqueness violation. (In practice we just wait for the other
transaction to end and then redo the visibility check in toto.)
Emphasis mine on "must wait to see if that transaction commits".
You can just try it with two open transactions (two different sessions) in parallel.
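For illustration, here is roughly what that experiment looks like, assuming a hypothetical table t(id int PRIMARY KEY):

-- Session 1
BEGIN;
INSERT INTO t (id) VALUES (1);              -- creates the index entry for id = 1
SELECT * FROM t WHERE id = 1 FOR UPDATE;    -- locks the new row (visible only here)

-- Session 2
BEGIN;
INSERT INTO t (id) VALUES (1);              -- blocks until session 1 commits or rolls back

Session 2 never gets as far as its own SELECT ... FOR UPDATE: the INSERT itself already waits on the conflicting index entry, and then either succeeds (if session 1 rolls back) or raises a unique violation (if session 1 commits).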

Related

How to update an aggregate table in a trigger procedure while taking care of proper concurrency?

For illustration, say I'm updating a table ProductOffers and their prices. Mutations to this table are of the form: add new ProductOffer, change price of existing ProductOffer.
Based on the above changes, I'd like to update a Product-table which holds pricing info per product aggregated over all offers.
It seems logical to implement this using a row-based update/insert trigger, where the trigger runs a procedure creating/updating a Product row.
I'd like to handle concurrent updates (and thus triggers) properly. I.e.: updating ProductOffers of the same Product concurrently could lead to wrong aggregate values (because multiple triggered procedures would concurrently attempt to insert/update the same Product row).
It seems I cannot use row-based locking on the Product table (i.e.: SELECT .. FOR UPDATE) because it's not guaranteed that a particular Product row already exists. Instead, the first time a ProductOffer triggers the procedure, the Product row must be created rather than updated. AFAIK, row locking can't work on rows that are yet to be inserted, which makes sense.
So where does that leave me? Would I need to roll my own optimistic locking scheme? This would need to include:
check that the row does not exist => create a new row, but fail if it already exists (which is possible if two triggers concurrently try to create the row); retry afterwards with an update.
check that the row exists and has version = x => update the row, but fail if row.version != x; retry afterwards.
Would the above work, or any better / more out-of-the-box solutions?
EDIT:
For future ref: found official example which exactly illustrates what I want to accomplish: Example 39-6. A PL/pgSQL Trigger Procedure For Maintaining A Summary Table
Things are much simpler than you think they are, thanks to the I in ACID.
The trigger you envision will run in the same transaction as the data modification that triggered it, and each modification to the aggregate table will first lock the row that it wants to update with an EXCLUSIVE lock.
So if two concurrent transactions cause an UPDATE on the same row in the aggregate table, the first transaction will get the lock and proceed, while the second transaction will have to wait until the first transaction commits (or rolls back) before it can get the lock on the row and modify it.
So data modifications that update the same row in the aggregate table will effectively be serialized, which may hurt performance, but guarantees exact results.
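As a rough sketch of such a trigger (the product and product_offer tables and columns here are assumptions, not the poster's schema, and INSERT ... ON CONFLICT requires PostgreSQL 9.5 or later; the documentation example cited above achieves the same with an UPDATE/INSERT loop):

CREATE FUNCTION product_offer_summary() RETURNS trigger AS $$
BEGIN
    -- Either creates the aggregate row or updates the existing one; the
    -- product row stays locked until the triggering transaction ends, so
    -- concurrent offer changes for the same product are serialized.
    INSERT INTO product (id, offer_count, min_price)
    VALUES (NEW.product_id, 1, NEW.price)
    ON CONFLICT (id) DO UPDATE
       SET offer_count = product.offer_count + 1,
           min_price   = LEAST(product.min_price, EXCLUDED.min_price);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_offer_summary
AFTER INSERT ON product_offer
FOR EACH ROW EXECUTE PROCEDURE product_offer_summary();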

Postgres locks within a transaction

I'm having trouble understanding how locks interact with transactions in Postgres.
When I run this (long) query, I am surprised by the high degree of locking that occurs:
BEGIN;
TRUNCATE foo;
\COPY foo FROM 'backup.txt';
COMMIT;
The documentation for \COPY doesn't mention what level of lock it requires, but this post indicates that it only gets a RowExclusiveLock. But when I run this query during the \COPY:
SELECT mode, granted FROM pg_locks
WHERE relation='foo'::regclass::oid;
I get this:
mode                  granted
RowExclusiveLock      true
ShareLock             true
AccessExclusiveLock   true
Where the heck is that AccessExclusiveLock coming from? I assume it's coming from the TRUNCATE, which requires an AccessExclusiveLock. But the TRUNCATE finishes quickly, so I'd expect the lock to release quickly as well. This leaves me with a few questions.
When a lock is acquired by a command within a transaction, is that lock released at the end of the command (before the end of the transaction)? If so, why do I observe the above behavior? If not, why not? In fact, since transactions don't touch the table until the COMMIT, why does a TRUNCATE in a transaction need to block the table at all?
I don't see any discussion of this in the documentation for transactions in PG.
There are a couple of misconceptions to be cleared up here.
First, a transaction does touch the table before it is committed. The comment you are quoting says that a ROLLBACK (and a COMMIT as well) don't touch the table, which is something different. They record the transaction state in the commit log (in pg_clog), and COMMIT flushes the transaction log to disk (one notable exception to this is TRUNCATE, which is relevant for your question: the old table is kept around until the end of the transaction and gets deleted during COMMIT).
If all changes were held back until COMMIT and no locks were taken, COMMIT would be quite expensive and would routinely fail because of concurrent modifications. The transaction would have to remember the state of the database as it was before and check whether the changes still apply. This way of handling concurrency is called optimistic concurrency control, and while it is a decent strategy for an application, it won't work well for a relational database, where COMMIT should be efficient and should not fail (unless there are major problems with the infrastructure).
So what relational databases use is pessimistic concurrency control or locking, i.e. they lock a database object before they access it to prevent concurrent activity from getting in their way.
Second, relational databases use two-phase locking, in which locks (at least the user-visible, so-called heavyweight locks) are always held until the end of the transaction.
This is necessary (but not sufficient) to keep transactions in a logical order (serializable) and consistent. What if you released a lock early, and somebody else removed the row that your inserted (but uncommitted) row references via a foreign key constraint?
Answer to the question
The upshot of all this is that your table will keep the ACCESS EXCLUSIVE lock from TRUNCATE until the end of the transaction. Isn't it evident why that is necessary? If other transactions were allowed to even read the table after the (as of yet uncommitted) TRUNCATE, they would find it empty, since TRUNCATE really empties the table and does not adhere to MVCC semantics. Such a dirty read (of uncommitted data that might yet be rolled back) cannot be allowed.
If you really need read access to the table during the refill, you could use DELETE instead of TRUNCATE. The downside is that this is a much more expensive operation that will leave the table with a lot of “dead tuples” that have to be removed by autovacuum, resulting in a lot of empty space (table bloat). But if you are willing to live with a table and indexes that are bloated such that table and index scans will take at least twice as long, it is an option.
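For example, the refill could look like this instead (same table as above); concurrent readers then keep seeing the old contents until the COMMIT:

BEGIN;
-- DELETE follows MVCC rules and only takes a ROW EXCLUSIVE lock on the
-- table, so concurrent SELECTs are not blocked and still see the old rows.
DELETE FROM foo;
\COPY foo FROM 'backup.txt';
COMMIT;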

Optimistic concurrency control across tables in Postgres

I'm looking for a way to manage optimistic concurrency control across more than one table in Postgres. I'm also trying to keep business logic out of the database. I have a table setup something like this:
CREATE TABLE master
(
    id          SERIAL PRIMARY KEY NOT NULL,
    status      VARCHAR NOT NULL,
    some_value  INT NOT NULL,
    row_version INT NOT NULL DEFAULT(1)
);

CREATE TABLE detail
(
    id        SERIAL PRIMARY KEY NOT NULL,
    master_id INT NOT NULL REFERENCES master ON DELETE CASCADE ON UPDATE CASCADE,
    some_data VARCHAR NOT NULL
);
master.row_version is automatically incremented by a trigger whenever the row is updated.
The client application does the following:
Reads a record from the master table.
Calculates some business logic based on the values of the record, this may include a delay of several minutes involving user interaction.
Inserts a record into the detail table based on logic in step 2.
I want step 3 to be rejected if the value of master.row_version has changed since the record was read at step 1. Optimistic concurrency control seems to me like the right answer (the only answer?), but I'm not sure how to manage it across two tables like this.
I'm thinking a function in Postgres with a row-level lock on the relevant record in the master table is probably the way to go. But I'm not sure if this is my best/only option, or what that would look like (I'm a bit green on Postgres syntax).
I'm using Npgsql, given that the client application is written in C#. I don't know if there's anything in it which can help me? I'd like to avoid a function if possible, but I'm struggling to find a way to do this with straight-up SQL, and anonymous code blocks (at least in Npgsql) don't support the I/O operations I'd need.
Locking is out if you want to use optimistic concurrency control; see the Wikipedia article on the topic:
OCC assumes that multiple transactions can frequently complete without
interfering with each other. While running, transactions use data
resources without acquiring locks on those resources.
You could use a more complicated INSERT statement.
If $1 is the original row_version and $2 and $3 are master_id and some_data to be inserted in detail, run
WITH m(id) AS
  (SELECT CASE WHEN master.row_version = $1
               THEN $2
               ELSE NULL
          END
   FROM master
   WHERE master.id = $2)
INSERT INTO detail (master_id, some_data)
SELECT m.id, $3 FROM m;
If row_version has changed, this will try to insert NULL as detail.master_id, which will cause an
ERROR: null value in column "master_id" violates not-null constraint
that you can translate into a more meaningful error message.
I've since come to the conclusion that a row lock can be employed in a "typical" pessimistic concurrency control approach, but when combined with a row version can produce a "hybrid" approach with some meaningful benefits.
Unsurprisingly, the choice of pessimistic, optimistic or "hybrid" concurrency control depends on the needs of the application.
Pessimistic Concurrency Control
A typical pessimistic concurrency control approach might look like this.
Begin database transaction.
Read (and lock) record from master table.
Perform business logic.
Insert a record into detail table.
Commit database transaction.
If the business logic at step 3 is long-running, this approach may be undesirable as it leads to a long-running transaction (generally unfavourable), and a long-running lock on the record in master which may be otherwise problematic for concurrency.
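A minimal sketch of this flow against the tables above ($1 and $2 are parameter placeholders):

BEGIN;
-- The row lock is held for the whole transaction, including the
-- (potentially long-running) business logic.
SELECT * FROM master WHERE id = $1 FOR UPDATE;
-- ... perform business logic ...
INSERT INTO detail (master_id, some_data) VALUES ($1, $2);
COMMIT;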
Optimistic Concurrency Control
An approach using only optimistic concurrency control might look more like this.
Read record (including row version) from master table.
Perform business logic.
Begin database transaction.
Increment row version on record in master table (an optimistic concurrency control check).
Insert a record into detail table.
Commit database transaction.
In this scenario, the database transaction is held for a shorter period of time, as are any (implicit) row locks. But the increment of the row version on the record in the master table may be a bit misleading to concurrent operations. Imagine several concurrent operations of this scenario: they'll start failing on the optimistic concurrency check because the row version has been incremented, even though the meaningful properties of the record haven't been changed.
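Sketched against the same tables, the optimistic check is the WHERE clause of the UPDATE; if it reports zero affected rows, the version has moved on and the whole operation must be retried:

-- Outside any transaction: read the record and remember its version ($2).
SELECT some_value, row_version FROM master WHERE id = $1;
-- ... perform (long-running) business logic ...
BEGIN;
-- Optimistic check: affects 0 rows if someone else bumped the version,
-- in which case the application rolls back and retries.
UPDATE master SET row_version = row_version + 1
 WHERE id = $1 AND row_version = $2;
INSERT INTO detail (master_id, some_data) VALUES ($1, $3);
COMMIT;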
Hybrid Concurrency Control
A "hybrid" approach uses both pessimistic locking and (sort of) optimistic locking, like this.
Read record (including row version) from master table.
Perform business logic.
Begin database transaction.
Re-read the record from the master table based on its ID and row version (an optimistic concurrency control check of sorts) AND lock the row.
Insert a record into detail table.
Commit database transaction.
If step 4 fails to obtain a record, this should be considered an optimistic concurrency control check failure. The record has been changed since step 1 so the business logic is no longer valid.
Like the typical pessimistic concurrency control scenario, this involves a transaction and an explicit row lock, but the duration of the transaction+lock no longer includes the time necessary to perform the business logic.
Like the optimistic concurrency control scenario, the record requires a version. But where it differs is that the version is not updated, which means other operations depending on that row version won't be impacted.
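A sketch of the hybrid flow against the same tables; the SELECT ... FOR UPDATE doubles as the version check and the lock:

-- Outside any transaction: read the record and remember its version ($2).
SELECT some_value, row_version FROM master WHERE id = $1;
-- ... perform (long-running) business logic ...
BEGIN;
-- Returns no row if the version has changed (treat that as a concurrency
-- failure and roll back); otherwise it locks the row without touching
-- row_version.
SELECT id FROM master WHERE id = $1 AND row_version = $2 FOR UPDATE;
INSERT INTO detail (master_id, some_data) VALUES ($1, $3);
COMMIT;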
Example of Hybrid Approach
An example of where the hybrid approach might be favourable:
A blog has a post table and comment table. Comments can be added to a post only if the post.comments_locked flag is false. The process for adding comments could use the hybrid approach, ensuring users can concurrently add comments without any concurrency exceptions.
The owner of the blog may edit their post, in which case the conventional optimistic concurrency control approach could be employed. The owner of the blog can have a long-running edit process which won't be affected by users adding comments. When the post is updated to the database, the version will be incremented, which means any in-progress comment-adding operations will fail, but they could be easily retried with a database-wins approach of re-fetching the post record from the database and retrying the process.

Deadlock occurs even when updating different fields

I have been trying to develop replication from one Firebird database to another.
I simply add a new field to tables named replication_flag.
My replication program starts a read committed transaction, selects rows, updates this replication_flag field of the rows, then commits or rolls back.
My production client(s) do not update this replication_flag field and use read committed isolation. My single replication client only updates this replication_flag field and does not update any other fields.
I still see deadlocks and do not understand why. How can I avoid deadlocks?
It seems that your replication app uses one large transaction, updating each record of each table. Probably, by the end, the whole database has been "locked".
You should consider using one transaction per table, or per batch of records. It's also possible to use a read-only transaction for reading and a separate transaction for writing, with frequent commits, which allows other transactions to update the records.
An interesting slideshow: http://slideplayer.us/slide/1651121/
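For illustration, a batched version of the flag update (the orders table and the key ranges are made up); each small transaction releases its row locks at COMMIT, so production clients are blocked only briefly:

-- first batch
UPDATE orders SET replication_flag = 1
 WHERE replication_flag = 0 AND id BETWEEN 1 AND 1000;
COMMIT;
-- next batch, and so on
UPDATE orders SET replication_flag = 1
 WHERE replication_flag = 0 AND id BETWEEN 1001 AND 2000;
COMMIT;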

How to make a SELECT wait for a pending INSERT to commit?

I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and votes for completion of the transaction (the transaction is PREPARED). The transaction will be COMMITTED by the Transaction Coordinator.
Immediately after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really committed, the SELECT may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application may be redesigned, but until then I'm looking for a lock solution. Advisory lock?
I already solved the problem while performing UPDATE on specific rows, then using SELECT...FOR SHARE, and it works well. The SELECT waits until the transaction commits and return old and new rows.
Now I'm trying to solve it for INSERT.
SELECT...FOR SHARE does not block and returns immediately.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the meantime any other queries will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because INSERT takes a weaker lock first and then you attempt lock promotion in the trigger. If possible it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
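A sketch of that ordering (table, column and transaction names are made up):

BEGIN;
-- Take the table lock up front so no lock promotion is needed after the
-- INSERT; concurrent SELECTs now wait until the prepared transaction is
-- committed or rolled back by the coordinator.
LOCK TABLE detail_rows IN ACCESS EXCLUSIVE MODE;
INSERT INTO detail_rows (parent_id, payload) VALUES (42, 'new data');
PREPARE TRANSACTION 'tx_42';
-- later, issued by the transaction coordinator:
COMMIT PREPARED 'tx_42';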
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.