How does Postgres decide which transactions are visible to a given transaction according to the isolation level?
I know that Postgres uses xmin and xmax and compares it to xid, but I haven't found the articles with proper details.
Do you know how the process works under the hood?
This depends on the current snapshot.
READ COMMITTED transactions take a new snapshot for every query, while REPEATABLE READ and SERIALIZABLE transactions take a snapshot when the first query is run and keep it for the whole duration of the transaction.
The snapshot is defined as struct SnapshotData in include/utils/snapshot.h and essentially contains the following:
a minimal transaction ID xmin: all older transactions are visible to this snapshot.
a maximal transaction ID xmax: all later transactions are not visible to this snapshot.
an array of transaction IDs xid that contains all in-between transactions that are not visible to this snapshot.
To determine if a tuple is visible to a snapshot, its xmin must be a committed transaction ID that is visible and its xmax must not be a committed transaction ID that is visible.
To determine if a transaction is committed or not, the commit log has to be consulted unless the hint bits of the tuple (which cache that information) have already been set.
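To make the rule above concrete, here is a toy model in Python. It is only a sketch of the logic as described in this answer, not PostgreSQL's actual code: the class and function names are made up, the `committed` set stands in for the commit log and hint bits, and it ignores a transaction's own writes, subtransactions, and transaction ID wraparound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xmin: int                     # all older transactions had already finished
    xmax: int                     # this and all later transactions are invisible
    xip: frozenset = frozenset()  # in-between transactions still in progress


@dataclass(frozen=True)
class TupleVersion:
    xmin: int                     # transaction that created this row version
    xmax: Optional[int] = None    # transaction that deleted it, if any


def xid_visible(xid, snap, committed):
    """True if the transaction committed and the snapshot covers it."""
    if xid >= snap.xmax or xid in snap.xip:
        return False
    return xid in committed       # stand-in for the commit log lookup


def tuple_visible(tup, snap, committed):
    """Creator must be visible; a deleter, if any, must not be."""
    if not xid_visible(tup.xmin, snap, committed):
        return False
    if tup.xmax is not None and xid_visible(tup.xmax, snap, committed):
        return False
    return True


# Hypothetical state: transaction 102 was still running when the snapshot
# was taken and committed afterwards; 106 started after the snapshot.
snap = Snapshot(xmin=100, xmax=105, xip=frozenset({102}))
committed = {90, 101, 102, 106}

old_row = tuple_visible(TupleVersion(xmin=90), snap, committed)
concurrent_insert = tuple_visible(TupleVersion(xmin=102), snap, committed)
concurrent_delete = tuple_visible(TupleVersion(xmin=90, xmax=102), snap, committed)
earlier_delete = tuple_visible(TupleVersion(xmin=90, xmax=101), snap, committed)
```

In this model a row deleted by a transaction that the snapshot cannot see (102) is still visible, while a row deleted by a visible, committed transaction (101) is not.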
Related
As per my understanding, I can see that a transaction is holding a snapshot by either of the columns backend_xid or backend_xmin not being NULL in pg_stat_activity.
I am currently investigating cases where backend_xid is not null for sessions from DBeaver, and I don't understand why the transaction requires a snapshot. This is of interest because long-running transactions that hold a snapshot can cause problems, for autovacuum for instance.
My question is: Can I (serverside) find the reason why a transaction is holding a snapshot? Is there a table where I can see why the transaction is holding a snapshot?
backend_xid is the transaction ID of the session and does not mean that the session has an active snapshot. The documentation says:
Top-level transaction identifier of this backend, if any.
backend_xmin is described as
The current backend's xmin horizon.
“xmin horizon” is PostgreSQL jargon and refers to the lowest transaction ID that was active when the snapshot was taken. It is an upper limit of what VACUUM is allowed to remove.
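As a rough illustration of why one old backend_xmin matters to VACUUM, here is a toy model in Python. The function names and the sample transaction IDs are made up, and the model is deliberately simplified: in a real server other things can hold the horizon back as well (replication slots and prepared transactions, for example).

```python
from typing import Iterable, Optional


def xmin_horizon(backend_xmins: Iterable[Optional[int]], next_xid: int) -> int:
    """Oldest xmin among sessions holding a snapshot; next_xid if none do."""
    held = [x for x in backend_xmins if x is not None]
    return min(held, default=next_xid)


def vacuum_may_remove(deleter_xid: int, horizon: int) -> bool:
    """A dead row version can go only if its committed deleter is older
    than the horizon, so that no snapshot can still see the row."""
    return deleter_xid < horizon


# Hypothetical backend_xmin values: one session without a snapshot (NULL),
# one recent snapshot, and one long-running snapshot.
horizon = xmin_horizon([None, 5000, 1200], next_xid=5003)

removable = vacuum_may_remove(1100, horizon)   # deleted before the old snapshot
held_back = vacuum_may_remove(3000, horizon)   # deleted after it
```

With these numbers the single session at 1200 decides the horizon, so a row deleted by transaction 3000 has to stay even though every other session is long past it.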
Suppose I do the following list of operations in MongoDB
Start a session
Start a transaction for that session
run an insert command with a new document
run a find command on the collection of the inserted document
Commit the transaction
End the session
I understand that outside of the transaction the insert done in the third step will not be visible until the transaction is committed, but what about within the transaction, will the find run in the fourth step see this new document or will it not?
Yes, a transactional find sees a document inserted by a previous transactional insert. You can assume the read-your-own-writes property.
Every time a transaction is started, a new snapshot is created. Outside the transaction, the snapshot is invisible: this is accomplished by using (and abusing, if your transaction involves many updates) the WiredTiger cache. This cache is structured as a tree (see footnote 1),
where each transactional operation is represented as a new update block that can in turn be chained to another update block.
Outside operations only see non-transactional tree entries, while transactional operations see all the entries added before the snapshot is taken + the update blocks for the given transaction.
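The visibility rule in the previous sentence can be sketched with a toy model in Python. This is not how WiredTiger is implemented and it does not use the MongoDB driver; the classes and method names are hypothetical and only mirror the description above (a snapshot taken at transaction start, plus the transaction's own update blocks).

```python
class Collection:
    """Toy collection: only committed documents are globally visible."""

    def __init__(self):
        self.committed = []

    def find(self):
        # A read outside any transaction sees committed entries only.
        return list(self.committed)


class Transaction:
    def __init__(self, collection):
        self.collection = collection
        self.snapshot = list(collection.committed)  # taken at start
        self.updates = []                           # this transaction's update blocks

    def insert(self, doc):
        self.updates.append(doc)

    def find(self):
        # Snapshot entries plus this transaction's own updates.
        return self.snapshot + self.updates

    def commit(self):
        self.collection.committed.extend(self.updates)


coll = Collection()
txn = Transaction(coll)
txn.insert({"_id": 1, "name": "new document"})

inside = txn.find()          # the transaction reads its own write
outside_before = coll.find() # nothing visible before the commit
txn.commit()
outside_after = coll.find()  # visible once committed
```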
I am aware that this is a very brief explanation of how MongoDB manages transaction atomicity, but if you are interested in understanding more, I suggest reading a report I have written. In the same repository you can find some scenarios covering the most typical doubts.
1: image taken from Aly Cabral's presentation about How and when to use Multi-document transactions
If I have two READ COMMITTED PostgreSQL database transactions that both create a new row with the same primary key and then lock this row, is it possible to acquire both locks successfully at the same time?
My instinct is yes since these new rows both only exist in the individual transactions' scopes, but I was curious if new rows and locking is handled differently between transactions.
No.
Primary keys are implemented with a UNIQUE (currently only) b-tree index. This is what happens when trying to write to the index, per documentation:
If a conflicting row has been inserted by an as-yet-uncommitted
transaction, the would-be inserter must wait to see if that
transaction commits. If it rolls back then there is no conflict. If it
commits without deleting the conflicting row again, there is a
uniqueness violation. (In practice we just wait for the other
transaction to end and then redo the visibility check in toto.)
Bold emphasis mine.
You can just try it with two open transactions (two different sessions) in parallel.
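If you do not have two sessions at hand, the waiting behaviour quoted above can be imitated with a toy model in Python. This is only a sketch with made-up names (UniqueIndex, UniqueViolation, finish); it uses threads in place of sessions and has nothing to do with PostgreSQL's real b-tree code.

```python
import threading


class UniqueViolation(Exception):
    pass


class UniqueIndex:
    """Toy unique index: at most one pending or committed entry per key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [state, event set when the owner ends]

    def insert(self, key):
        while True:
            with self._lock:
                entry = self._entries.get(key)
                if entry is None:
                    self._entries[key] = ["in_progress", threading.Event()]
                    return
                state, done = entry
                if state == "committed":
                    raise UniqueViolation(key)
            done.wait()  # wait for the other transaction to end, then recheck

    def finish(self, key, commit):
        with self._lock:
            entry = self._entries[key]
            if commit:
                entry[0] = "committed"
            else:
                del self._entries[key]  # rollback: the entry never existed
            entry[1].set()


index = UniqueIndex()
index.insert(1)  # transaction A inserts key 1 and stays open

results = []

def transaction_b():
    try:
        index.insert(1)
        results.append("inserted")
    except UniqueViolation:
        results.append("unique violation")

b = threading.Thread(target=transaction_b)
b.start()
b.join(timeout=0.2)
blocked = b.is_alive()        # B is still waiting for A to end

index.finish(1, commit=True)  # A commits
b.join()
```

While A is open, B is blocked; once A commits, B fails with the uniqueness violation. Had A rolled back instead, B's insert would have gone through.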
Let's assume in SQL window 1 I do:
-- query 1
BEGIN TRANSACTION;
UPDATE post SET title = 'edited' WHERE id = 1;
-- note that there is no explicit commit
Then from another window (window 2) I do:
-- query 2
SELECT * FROM post WHERE id = 1;
I get:
1 | original title
Which is fine as the default isolation level is READ COMMITTED and because query 1 is never committed, the change it performs is not readable until I explicitly commit from window 1.
In fact if I, in window 1, do:
COMMIT TRANSACTION;
I can then see the change if I re-run query 2.
1 | edited
My question is:
Why is query 2 returning fine the first time I run it? I was expecting it to block as the transaction in window 1 was not committed yet and the lock placed on row with id = 1 was (should be) an unreleased exclusive one that should block a read like the one performed in window 2. All the rest makes sense to me but I was expecting the SELECT to get stuck until an explicit commit in window 1 was executed.
The behaviour you describe is normal and expected in any transactional relational database.
If PostgreSQL showed you the value edited for the first SELECT it'd be wrong to do so - that's called a "dirty read", and is bad news in databases.
PostgreSQL would be allowed to wait at the SELECT until you committed or rolled back, but it isn't required to by the SQL standard, you haven't told it you want to wait, and it doesn't have to wait for any technical reason, so it returns the data you asked for immediately. After all, until it's committed, that update only kind-of exists - it still might or might not happen.
If PostgreSQL always waited here, then you'd quickly end up in a situation where only one connection could be doing anything with the database at a time. Not pretty for performance, and totally unnecessary the vast majority of the time.
If you want to wait for a concurrent UPDATE (or DELETE), you'd use SELECT ... FOR SHARE. (But be aware that this won't work for INSERT).
Details:
SELECT without a FOR UPDATE or FOR SHARE clause does not take any row level locks. So it sees whatever is the current committed row, and is not affected by any in-flight transactions that might be modifying that row. The concepts are explained in the MVCC section of the docs. The general idea is that PostgreSQL is copy-on-write, with versioning that allows it to return the correct copy based on what the transaction or statement could "see" at the time it started - what PostgreSQL calls a "snapshot".
In the default READ COMMITTED isolation, snapshots are taken at the statement level, so if you SELECT a row, COMMIT a change to it from another transaction, and SELECT it again, you'll see different values even within one transaction. You can use REPEATABLE READ isolation (which PostgreSQL implements as snapshot isolation) if you don't want to see changes committed after the transaction takes its snapshot, or SERIALIZABLE isolation to add further protection against certain kinds of transaction inter-dependencies.
See the transaction isolation chapter in the documentation.
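The difference between a statement-level and a transaction-level snapshot can be shown with a small toy model in Python. It is a sketch under simplifying assumptions, not PostgreSQL's mechanism: the Database and Transaction classes are made up, and a snapshot is reduced to a commit counter.

```python
class Database:
    """Toy versioned store: every commit appends a new version."""

    def __init__(self):
        self.versions = []   # (commit_seq, key, value)
        self.commit_seq = 0

    def commit(self, key, value):
        self.commit_seq += 1
        self.versions.append((self.commit_seq, key, value))

    def read(self, key, as_of):
        # Latest version of key committed at or before the snapshot.
        value = None
        for seq, k, v in self.versions:
            if k == key and seq <= as_of:
                value = v
        return value


class Transaction:
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation
        self.snapshot = None

    def select(self, key):
        # READ COMMITTED: new snapshot for every statement.
        # Otherwise: snapshot taken by the first statement and then kept.
        if self.isolation == "READ COMMITTED" or self.snapshot is None:
            self.snapshot = self.db.commit_seq
        return self.db.read(key, self.snapshot)


db = Database()
db.commit("title", "original title")

rc = Transaction(db, "READ COMMITTED")
rr = Transaction(db, "REPEATABLE READ")

first = (rc.select("title"), rr.select("title"))
db.commit("title", "edited")   # another session commits an UPDATE
second = (rc.select("title"), rr.select("title"))
```

After the concurrent commit, the READ COMMITTED transaction sees the new title on its next statement, while the transaction with the retained snapshot keeps seeing the original one.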
If you want a SELECT to wait for in-progress transactions to commit or rollback changes to rows being selected, you must use SELECT ... FOR SHARE. This will block on the lock taken by an UPDATE or DELETE until the transaction that took the lock rolls back or commits.
INSERT is different, though - the tuples just don't exist to other transactions until commit. The only way to wait for concurrent INSERTs is to take an EXCLUSIVE table-level lock, so you know nobody else is changing the table while you read it. Usually the need to do that means you have a design problem in the application though - your app should not care if there are uncommitted inserts still in flight.
See the explicit locking chapter of the documentation.
In PostgreSQL's MVCC implementation, the principle is reading does not block writing and vice-versa. The manual:
The main advantage of using the MVCC model of concurrency control
rather than locking is that in MVCC locks acquired for querying
(reading) data do not conflict with locks acquired for writing data,
and so reading never blocks writing and writing never blocks reading.
PostgreSQL maintains this guarantee even when providing the strictest
level of transaction isolation through the use of an innovative
Serializable Snapshot Isolation (SSI) level.
Each transaction only sees (mostly) what has been committed before the transaction began.
That does not mean there'd be no locking. Not at all. For many operations various kinds of locks are acquired. And various strategies are applied to resolve possible conflicts.
I found the following description for the Serializable (IsolationLevel.Serializable) isolation level in the MSDN documentation:
Volatile data can be read but not modified, and no new data can be added during the transaction.
(Reference)
And on the same page volatile data is defined as:
The data affected by a transaction is called volatile.
My question is, how can I prevent other transactions from reading volatile data and also prevent them from adding any new data.
Thank you very much.
I think this is the highest isolation level you can get. According to this link, it should be enough for your needs.
SERIALIZABLE Specifies the following: Statements cannot read data that
has been modified but not yet committed by other transactions. No
other transactions can modify data that has been read by the current
transaction until the current transaction completes. Other
transactions cannot insert new rows with key values that would fall in
the range of keys read by any statements in the current transaction
until the current transaction completes.