I'm trying to understand MVCC and can't quite get it. For example:
Transaction 1 (T1) tries to read a row. At the same time, T2 updates the same row.
The order of events is:
T1 begin -> T2 begin -> T2 commit -> T1 commit
So the first transaction gets its snapshot of the database and returns a result to the user, who is going to base further calculations on it. But as I understand it, the user gets the old value? As I understand MVCC, once T1 has begun, it doesn't know that some other transaction has changed the data. So if the user now does some calculation (without the DB involved), is he doing it on wrong data? Or am I wrong, and the first transaction has some mechanism to learn that the row was changed?
Let's now change the transaction flow.
T2beg -> T1beg -> T2com -> T1com
With 2PL the user gets the newer version of the data because of locks (T1 must wait until the exclusive lock is released). But with MVCC it will still be the old data, as I understand the PostgreSQL MVCC model. So I can get stale data in exchange for speed. Am I right, or am I missing something?
Yes, it can happen that you read some old data from the database (that a concurrent transaction has modified), perform a calculation based on this and store “outdated” data in the database.
This is not a problem, because rather than the actual order in which transactions happen, the logical order is more relevant:
If T1 reads some data, then T2 modifies the data, then T1 modifies the database based on what it read, you can say that T1 logically took place before T2, and there is no inconsistency.
You only get an anomaly if T1 and T2 modify the same data: T1 reads data, T2 modifies the data, then T1 modifies the same data based on what it read. This anomaly is called a “lost update”.
Lost updates can only occur with the weakest (which is the default) isolation level, READ COMMITTED.
If you want better isolation from concurrent activity, you need to use at least REPEATABLE READ isolation. Then T1 would receive a serialization error when it tries to update the data, and you would have to repeat the transaction. On the second attempt, it would read the new data, and everything would be consistent.
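A minimal sketch of that sequence, written as an interleaved two-session transcript. The accounts table, its columns, and the values are hypothetical and exist only for illustration; the error text is what PostgreSQL reports when a REPEATABLE READ transaction tries to update a row that was changed after its snapshot was taken.

```sql
-- Setup (hypothetical table, not from the question)
CREATE TABLE accounts (id int PRIMARY KEY, balance int NOT NULL);
INSERT INTO accounts VALUES (1, 100);

-- Session T1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- sees 100

-- Session T2 (runs while T1 is still open)
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;

-- Session T1 again: writes a value computed from the stale read
UPDATE accounts SET balance = 100 + 50 WHERE id = 1;
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;

-- T1 retried from the start: the new snapshot sees T2's change
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- sees 90
UPDATE accounts SET balance = 90 + 50 WHERE id = 1;
COMMIT;
```

Under READ COMMITTED the first UPDATE in T1 would succeed and silently overwrite T2's change, which is the lost update described above.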

Unless T1 and T2 try to update the same row, this could usually be reordered to be the same as: T1 begin; T1 commit; T2 begin; T2 commit;
In other words, any undesirable business outcome that could be achieved by T2 changing the data while T1 was making a decision, could have also occurred by T2 changing the data immediately after T1 made the same decision.

Each transaction can only see data that is younger or equal to that transaction's ID.
When transaction 1 reads data, it marks the read time stamp of that data to transaction 1.
If transaction 2 tries to read the same data, it checks the read timestamp of the data. If that read timestamp is less than transaction 2's ID, then transaction 2 is aborted, because 1 < 2: transaction 1 got there before us and must finish before us.
At commit time, we also check if the read timestamp of the data is less than the committing transaction. If it is, we abort the transaction and restart with a new transaction ID.
We also check if write timestamps are less than our transaction. Younger transaction wins.
There is an edge case where a younger transaction can abort if an older transaction gets ahead of the younger transaction.
Take a table like this:
create table mytable(id SERIAL, largeImage bytea);
Imagine two processes (A and B) doing inserts simultaneously:
Process A arrives first but carries a very large file.
Process B arrives after it but carries a very small file.
I suppose the inserts will run in parallel, and that the order of id assignment is consistent because nextval is assigned at the moment each insert arrives. (Maybe I am wrong.)
Because Process B's file is smaller than Process A's, Process B finishes saving to disk before Process A. (My guess.)
My questions:
1. Process A will be assigned ID=1 and Process B will get ID=2. Is that correct?
2. Is it possible (rare, but possible) that "select * from mytable" returns only ID=2 because the disk write for Process A (ID=1) has not finished yet?
Thanks in advance.
Probably yes, but I have not tested it.
This is possible for a variety of reasons and not only for a short period of time. Imagine the process that got the ID=1 crashes before it finishes the transaction. You will then never see a row with ID=1 in your table.
If you catch yourself assuming that SERIAL means "continuous", take a step back and redesign your data model. SERIAL rather means "automatic value that is guaranteed to be unique". It is a good idea for an ID, but it is generally not appropriate for numbering.
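A small sketch of how such a gap appears, using a hypothetical gap_demo table. A rolled-back insert stands in for the crashed process: the row disappears, but the sequence value it consumed is not given back.

```sql
-- Hypothetical demo table
CREATE TABLE gap_demo (id serial PRIMARY KEY, note text);

BEGIN;
INSERT INTO gap_demo (note) VALUES ('never committed');  -- consumes id 1
ROLLBACK;                                                 -- the row vanishes, the sequence value is not reused

INSERT INTO gap_demo (note) VALUES ('committed');         -- gets id 2

SELECT id, note FROM gap_demo;
-- returns a single row: id = 2, note = 'committed'
```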

Ok so here's the schema, which is pretty self-explanatory:
STORE(storeID, name, city)
PRODUCT(productID, name, brand)
PRODUCT_FOR_SALE(productID, storeID, price)
I have 2 transactions: T1 and T2.
T1 raises by 5% the price for any product sold in any store in 'London'.
T2 lowers by 10% the price for any product whose cost is >= $1050
What I am asked is to tell what kind of concurrency anomaly they may result in, and what isolation level I should apply to which transaction to make it safe.
The code for the transactions is not given, but I suppose it would be something on the lines of:
-- T1:
UPDATE product_for_sale
SET price = price + ((price/100) *5)
WHERE storeID IN (SELECT storeID FROM store WHERE city='London');
-- T2:
UPDATE product_for_sale
SET price = price - (price/10)
WHERE price >= 1050;
My "guess" to what might happen with READ COMMITTED (default) is:
Considering a product P, sold in 'London' for $1049
both transactions begin
they both consider their row sets: T1 will consider all products sold in London (which includes P), T2 will consider products whose price is $1050 or more (which excludes P)
T1 commits and sets the price of P to $1101 but, since P wasn't in T2's row set to begin with, the change goes unnoticed, and T2 commits without considering it
Which, if I'm not messing up definitions, should be a case of phantom read, which would be fixed if I set T2 to ISOLATION LEVEL REPEATABLE READ
First, it is not quite clear what you mean by a concurrency issue. It could be:
1. something that could conceivably be a problem, but is handled by PostgreSQL automatically so that no problems arise.
2. something that can generate an unexpected error or an undesirable result.
For 1., this is handled by locks that serialize transactions that are trying to modify the same rows concurrently.
I assume you are more interested in 2.
What you describe can happen, but it is not a concurrency problem. It just means that T1 logically takes place before T2. This will work just fine on all isolation levels.
I might be missing something, but the only potential problem I see here is a deadlock between the two statements:
They both can update several rows, so it could happen that one of them updates row X first and then tries to update row Y, which has already been updated by the other statement. The first statement then is blocked. Now the second statement wants to update row Y and is blocked too.
Such a deadlock is broken by the deadlock detector after one second (the default deadlock_timeout), which aborts one of the statements with an error.
Note that deadlocks are not real problems either: all your code has to do is repeat the failed transaction.

My application uses pessimistic locking. When a user opens the form to update a record, the application executes this query (the table names are just examples):
select *
from master m
natural join detail d
where m.master_id = 123456
for update nowait;
The query locks one master row and several (up to several dozen) detail rows. The transaction stays open until the user confirms or cancels the updates.
I need to know which rows (at least which master rows) are locked. I have dug through the documentation and the Postgres wiki without success.
Is it possible to list all locked rows?
PostgreSQL 9.5 added a new option to FOR UPDATE, SKIP LOCKED, that provides a straightforward way to do this.
SELECT master_id
FROM master
WHERE master_id NOT IN (
SELECT master_id
FROM master
FOR UPDATE SKIP LOCKED);
This acquires locks on all the not-currently-locked rows, so think through whether that's a problem for you, especially if your table is large. If nothing else, you'll want to avoid doing this in an open transaction. If your table is huge you can apply additional WHERE conditions and step through it in chunks to avoid locking everything at once.
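The chunked variant could look roughly like this. It is only a sketch: it assumes the 9.5 option in question is SKIP LOCKED, that master_id is numeric (as in the question), and the range 1..10000 is an arbitrary example. Wrapping the probe in its own short transaction and rolling back immediately releases the locks the probe itself took.

```sql
-- Probe one id range at a time (the range is an arbitrary example)
BEGIN;
SELECT master_id
FROM master
WHERE master_id BETWEEN 1 AND 10000
  AND master_id NOT IN (
      SELECT master_id
      FROM master
      WHERE master_id BETWEEN 1 AND 10000
      FOR UPDATE SKIP LOCKED);   -- the subquery returns only rows nobody else holds locked
ROLLBACK;   -- releases the row locks taken by the probe itself
```

The outer query therefore returns the master_id values in that range that some other session currently has locked.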
Is it possible? Probably yes, but it is the Greatest Mystery of Postgres. I think you would need to write your own extension for it (*).
However, there is an easy way to work around the problem. You can use a very nice Postgres feature, advisory locks. You can interpret the two arguments of the function pg_try_advisory_lock(key1 int, key2 int) as the table oid (key1) and the row id (key2). Then
select pg_try_advisory_lock(('master'::regclass)::integer, 123456)
locks row 123456 of table master, if it was not locked earlier. The function returns a boolean.
After the update the lock has to be released:
select pg_advisory_unlock(('master'::regclass)::integer, 123456)
And the nicest thing, the list of locked rows:
select classid::regclass, objid
from pg_locks
where locktype = 'advisory'
Advisory locks may be complementary to regular locks, or you can use them independently. The second option is very tempting, as it can significantly simplify the code. But it should be applied with caution, because you have to make sure that all updates (and deletes) on the table in all applications are performed with this locking.
(*) Mr. Tatsuo Ishii did it (I did not know about it; I have only just found it).

Is there a way to generate some kind of in-order identifier for a table's records?
Suppose that we have two threads doing queries:
Thread 1:
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
Thread 2:
insert into table1(id, value) values (nextval('table1_seq'), 'world');
It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello').
That's fine, but I want a way to get all the records in the 'table1' that appeared since the last time the external observer checked it.
So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?
No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.
Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.
The tuple ID (ctid) contains the file block number and position in the block for the row. So this represents the current physical ordering on disk. Later additions will have a bigger ctid, normally. Your SELECT statement could look like this
SELECT *, ctid -- save ctid from last row in last_ctid
FROM tbl
WHERE ctid > last_ctid
ctid has the data type tid. Example: '(0,9)'::tid
However, it is not stable as a long-term identifier, since VACUUM or any concurrent UPDATE or some other operations can change the physical location of a tuple at any time. For the duration of a transaction it is stable, though. And if you are just inserting and nothing else, it should work locally for your purpose.
I would add a timestamp column with default now() in addition to the serial column ...
I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id would be inserted at a later time. Detailed instructions:
Auto increment table column
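A minimal sketch of such a table. The column names and the timestamp literal are placeholders chosen for illustration; only the serial and now() defaults come from the suggestion above.

```sql
-- Hypothetical table definition for the tbl example
CREATE TABLE tbl (
    id         serial PRIMARY KEY,                 -- filled from the sequence by the column default
    value      text,
    created_at timestamptz NOT NULL DEFAULT now()  -- transaction start time, not commit time
);

-- Neither id nor created_at is supplied explicitly
INSERT INTO tbl (value) VALUES ('hello');

-- Poll for rows newer than the last one seen (the literal is a placeholder)
SELECT id, value, created_at
FROM tbl
WHERE created_at > '2024-01-01 00:00:00+00'
ORDER BY created_at, id;
```

Because now() returns the start time of the inserting transaction, a long-running transaction can still commit a row whose timestamp lies before rows that are already visible, so the window is narrowed but not closed.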
What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.
This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily run in the same order that the row itself was inserted. It's subject to operating system CPU scheduler ordering decisions, etc.
Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.
There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.
You can use advisory locking, but the effect is the same.
Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.
You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.
I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.
Maybe you'll be better off doing small-block batched inserts from a single session?
If you mean that every query that sees the world row must also see the hello row, then you'd need to do:
begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.
Be aware that this will not make this sequence gap-less — this is a different issue.
We found another solution with recent PostgreSQL servers, similar to Erwin's answer but with txid.
When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.
Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).
txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction index of the oldest still-open transaction. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 will still be open. When the transaction 19 is committed, both rows 19 and 20 will become visible.
When no transaction is open, the snapshot xmin will be the transaction id of the currently running SELECT statement.
The returned transaction IDs are 64 bits wide: the higher 32 bits are an epoch and the lower 32 bits are the actual ID.
Here is the documentation of these functions:
Credits to tux3 for the idea.
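A rough sketch of this approach. The events table and the two bound values are made up for illustration; txid_current(), txid_current_snapshot() and txid_snapshot_xmin() are the functions named above.

```sql
-- Hypothetical table: the id defaults to the inserting transaction's txid
CREATE TABLE events (
    id    bigint NOT NULL DEFAULT txid_current(),
    value text
);

INSERT INTO events (value) VALUES ('hello');

-- Poller, step 1: every txid below this value belongs to a finished transaction
SELECT txid_snapshot_xmin(txid_current_snapshot()) AS upper_bound;

-- Poller, step 2: read the window [previous upper_bound, new upper_bound).
-- 1000 and 1020 are made-up values standing in for the two bounds.
SELECT id, value
FROM events
WHERE id >= 1000
  AND id <  1020
ORDER BY id;
```

Note that all rows inserted by one transaction share the same id, so this column works as an ordering watermark rather than as a unique key.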

We have noticed a rare occurrence of a deadlock on a PostgreSQL 9.2 server in the following situation:
T1 starts the batch operation:
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
For select:
SELECT * FROM (SELECT *, RANK() OVER (ORDER BY user_id) AS rno FROM BB WHERE status = 'PROCESSING' AND bulk_id = 1) ranked WHERE rno = $1
And Update:
UPDATE BB set datetime=$1, status='DONE', message_id=$2 WHERE bulk_id=1 AND user_id=$3
(user_id, bulk_id have a UNIQUE constraint).
Due to an unrelated external problem, another transaction T2 executes the same query as T1 almost immediately after T1 has committed (the initial batch operation where items are marked as 'PROCESSING').
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
However, although these items are already marked as 'PROCESSING', this query deadlocks with some of the updates (which are done in batches, as I said) of the worker threads. To my understanding this should not happen with the READ COMMITTED isolation level (the default) that we use. I am sure that T1 has committed because the worker threads only execute after it has done so.
Edit: One thing I should clear up is that T2 starts after T1 starts but before T1 commits. However, due to a write_exclusive tuple lock that we acquire with a SELECT FOR UPDATE on the same row (one that is not affected by any of the above queries), it waits for T1 to commit before it runs the batch update query.
When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
This strikes me as a concurrency problem. I think you are far better off having one transaction read the rows and hand them off to worker processes, and then update them in batches when they come back. Your fundamental problem is going to be that these workers are effectively operating on uncertain state, holding row locks for the duration of their transactions, and the like. You have to handle rollbacks and so forth separately, and consequently the locking is a real problem.
Now, if that solution is not possible, I would have a separate locking table. In this case, each thread spins up separately, locks the locking table, claims a bunch of rows, inserts records into the locking table, and commits. In this way each thread has claimed its own records. Then they can work on their record sets, update them, etc. You may want to have a process which periodically clears out stale locks.
In essence your problem is that rows go from state A -> processing -> state B and may be rolled back. Since the other threads have no way of knowing what rows are processing and by which threads, you can't safely allocate records. One option is to change the model to:
state A -> claimed state -> processing -> state B. However, you have to have some way of ensuring that rows are effectively allocated and that the threads know which rows have been allocated to each other.
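One way the locking-table idea might look is sketched below. Everything here is an assumption made for illustration: the bb_claims table, its columns, the integer types for bulk_id and user_id, the worker id 7, the batch size, and the cleanup interval.

```sql
-- Hypothetical claim table; BB's key columns are assumed to be integers
CREATE TABLE bb_claims (
    bulk_id    int NOT NULL,
    user_id    int NOT NULL,
    worker_id  int NOT NULL,
    claimed_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (bulk_id, user_id)
);

-- Each worker claims a batch in one short transaction
BEGIN;
LOCK TABLE bb_claims IN EXCLUSIVE MODE;   -- one claimer at a time; plain reads are not blocked
INSERT INTO bb_claims (bulk_id, user_id, worker_id)
SELECT b.bulk_id, b.user_id, 7            -- 7 is this worker's made-up id
FROM BB b
WHERE b.bulk_id = 1
  AND b.status = 'PROCESSING'
  AND NOT EXISTS (SELECT 1 FROM bb_claims c
                  WHERE c.bulk_id = b.bulk_id AND c.user_id = b.user_id)
ORDER BY b.user_id
LIMIT 50;
COMMIT;

-- The worker then reads only the rows it has claimed
SELECT b.*
FROM BB b
JOIN bb_claims c USING (bulk_id, user_id)
WHERE c.worker_id = 7;

-- Periodic cleanup of stale claims (the 10-minute cutoff is arbitrary)
DELETE FROM bb_claims WHERE claimed_at < now() - interval '10 minutes';
```

Since the claim is committed before any processing starts, other workers can see which rows are taken without waiting on row locks held by a long-running transaction.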