Are PostgreSQL transaction isolation levels Repeatable Read and Serializable the same?

Quote from http://www.postgresql.org/docs/9.4/static/transaction-iso.html :
When you select the level Read Uncommitted you really get Read Committed, and phantom reads are not possible in the PostgreSQL implementation of Repeatable Read, so the actual isolation level might be stricter than what you select.
To clarify: does this mean PostgreSQL's Repeatable Read = Serializable?

No; the difference is described on the page you linked to:
In fact, this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions.
The documentation goes on to give an example where Repeatable Read and Serializable behave differently. A Serializable transaction can be aborted with a "serialization failure" error, but it does not block any other transactions from completing.
The section you quoted is explaining some anomalies because the standard SQL isolation levels are designed around locking data, but PostgreSQL is implemented with an "MVCC" design, where concurrent transactions can be given independent snapshots of the data. Thus some of the distinctions present in other systems don't apply, and Postgres interprets the isolation levels as "at least as strict as..."
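For instance, you can request Read Uncommitted and PostgreSQL will accept and report it, but dirty reads still never happen because the level is silently treated as Read Committed. A small sketch (some_table is just a placeholder):
BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SHOW transaction_isolation;   -- reports "read uncommitted" ...
-- ... but behaves as Read Committed: this query will never see rows
-- written by transactions that have not yet committed.
SELECT * FROM some_table;
COMMIT;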
As Mark Hildreth pointed out in comments, this distinction is only true from PostgreSQL 9.1 onwards. The documentation for 9.0 states:
But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable.
Whereas in newer versions this has been amended to:
But internally, there are only three distinct isolation levels, which correspond to the levels Read Committed, Repeatable Read, and Serializable.

No.
Repeatable read is snapshot isolation. It means your transaction sees a single consistent "snapshot" of the database. It is not full serializability, because some operations may produce results inconsistent with any serial ordering. For example, if one transaction inserts a row which matches another transaction's SELECT operation, and vice versa, this may cause serialization anomalies. Serializable uses a technology called "predicate locking" to detect these situations and reject any offending transactions. This "locking" does not block the transaction and cannot participate in a deadlock.
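To make the difference concrete, here is a classic write-skew sketch (the table, names and data are invented for illustration): two transactions each read a condition and then write a row that the other one read.
-- Setup:
CREATE TABLE doctors (name text PRIMARY KEY, on_call boolean NOT NULL);
INSERT INTO doctors VALUES ('alice', true), ('bob', true);
-- Session 1:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call;               -- sees 2, so going off call looks safe
UPDATE doctors SET on_call = false WHERE name = 'alice';
-- Session 2, concurrently:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call;               -- also sees 2
UPDATE doctors SET on_call = false WHERE name = 'bob';
-- Under Repeatable Read both sessions COMMIT successfully, leaving nobody
-- on call, a result no serial execution could produce. Under SERIALIZABLE,
-- one of the two transactions is aborted with SQLSTATE 40001
-- (serialization_failure) and simply needs to be retried; the other is never blocked.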

Related

Postgres: Opting in to weaker transaction isolation guarantees for a single table or a single read?

I'm writing a web app with Postgres 13 as the backend. Most requests are wrapped in a transaction, using the SERIALIZABLE isolation level.
For most things, this works great. However, there are some cases where I'd like some reads in the transaction to have less strict isolation.
For example, I'm introducing a global_flags table for infrequently-changed settings that any request might make use of:
await sqlAsync(`BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`);
const batchSize = await sqlAsync(
  `SELECT value FROM global_flags WHERE name = 'batch_size'`);
// ... a bunch more work ...
await sqlAsync('COMMIT');
I'm a bit worried that when we manually make changes to global_flags entries, we might cause an increase in "serialization failure" errors for in-flight transactions. Is there a way to tell Postgres that I don't need as strong of a consistency guarantee for reads of the global_flags table?
You needn't worry a bit.
If the one transaction does nothing except change that flag, and the other just reads the table (and doesn't try to write to it!), the two transactions will have a simple RW or WR dependency on each other, and there will be no cycles that cause a serialization error.
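So a manual settings change that is its own short, write-only transaction, for example a single statement like the following (column names are from the question; the new value is made up), should not by itself cause serialization failures in the readers:
UPDATE global_flags SET value = '500' WHERE name = 'batch_size';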

PostgreSQL Serialized Inserts Interleaving Sequence Numbers

I have multiple processes inserting into a Postgres (10.3) table using the SERIALIZABLE isolation level.
Another part of our system needs to read these records and be guaranteed that it receives all of them in sequence. For example, in the picture below, the consumer would need to
select * from table where sequanceNum > 2309 limit 5
and then receive sequence numbers 2310, 2311, 2312, 2313 and 2314.
The reading query is using the READ COMMITTED isolation level.
What I'm seeing though is that the reading query is only receiving the rows I've highlighted in yellow. Looking at the xmin, I'm guessing that transaction 334250 had begun but not finished, then transactions 334251, 334252 et al started and finished prior to my reading query starting.
My question is, how did they get sequence numbers interleaved with those of 334250? Why weren't those transactions blocked by merit of all of the writing transactions being serializable?
Any suggestions on how to achieve what I'm after, which is a guarantee that different transactions don't generate interleaving sequence numbers? (It's OK if there are gaps, but they can't interleave.)
Thanks very much for your help. I'm losing hair over this one!
PS: I just noticed that 334250 has a non-zero xmax. Is that a clue that I'm missing, perhaps?
The SQL standard in its usual brevity defines SERIALIZABLE as:
The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.
In the light of this definition, I understand that your wish is that the sequence numbers be in the same order as the “serial execution” that “produces the same effect”.
Unfortunately the equivalent serial ordering is not clear at the time the transactions begin, because statements later in the transaction can determine the “logical” order of the transactions.
Sequence numbers on the other hand are ordered according to the wall time when the number was requested.
In a way, you would need sequence numbers that are determined by something that is not certain until the transactions commit, and that is a contradiction in terms.
So I think that it is not possible to get what you want, unless you actually serialize the execution, e.g. by locking the table in SHARE ROW EXCLUSIVE mode before you insert the data.
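A sketch of that approach, with events and events_seq as invented names: SHARE ROW EXCLUSIVE conflicts with itself and with the ROW EXCLUSIVE lock that INSERT takes, so only one writing transaction proceeds at a time, while plain SELECTs are not blocked.
BEGIN;
-- Serialize the writers explicitly; concurrent inserters queue up here.
LOCK TABLE events IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO events (sequence_num, payload)
VALUES (nextval('events_seq'), '...');
COMMIT;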
My question is why you have that unusual demand. I cannot think of a good reason.

Can PostgreSQL Serializable be safely mixed with lower isolation levels?

I'm trying to figure out what the behaviour is when mixing Serialisable and lower levels of isolation, but not having much luck. Specifically, we have a reorder-queue-items transaction that is currently taking the table lock to ensure no new items will be added to it in the meantime. Will making it Serialised and reading the head of the queue (to ensure a dependency on the result) provide the same level of guarantee, even if other transactions operate at Repeatable Read?
Using PostgreSQL 9.5.
PS. I know there's another question with a similar title, but it's over 4 years old, much less specific in what it asks, and the only answer is essentially unsourced given the cited materials, and doesn't truly answer the question.
To guarantee serializability, all participating transactions should use the SERIALIZABLE isolation level.
But I am not certain that serializable isolation is the solution to your problem. It won't block anybody from reading from or writing to the queue, it will make some transactions fail, and the failing transaction might well be the one that is trying to reorder the queue.
I think that a lock is the way to go in such a case. Using serializable transactions will affect performance just as badly; the difference is that rather than waiting, you have to keep retrying transactions until the reordering succeeds.
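If you do decide to run everything at SERIALIZABLE, one convenient way to make sure every session participates is to change the default isolation level; a sketch, with mydb as a placeholder database name:
-- Make SERIALIZABLE the default for new transactions in this database.
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';
Either way, be prepared to catch serialization failures (SQLSTATE 40001) and retry the failed transaction.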

Making multiple users access to PSQL database

I'm a rookie in this topic; all I ever did was make a connection to a database for one user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program for recording when workers come and leave. The database will be on the main server, and all I made was one database user while I was programming/testing the program. My question is: can multiple remote locations connect with that one database user (there should be no collisions, because they are all writing different data, but to the same tables)? And if that's not the way to do it, what should I do?
Good relational databases handle this quite well; it is the “I” in the so-called ACID properties of transactions in relational databases, and it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
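As a small illustration of MVCC (workers is a made-up table), a reader is never blocked by a concurrent writer:
-- Session 1:
BEGIN;
UPDATE workers SET left_at = now() WHERE id = 1;   -- locks the row against other writers
-- Session 2, at the same time:
SELECT left_at FROM workers WHERE id = 1;          -- returns immediately with the old,
                                                   -- last-committed value; no waiting
-- Session 1:
COMMIT;                                            -- only now do later readers see the new value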
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on the results of earlier statements in indirect ways, and that reliance can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways. The discussion of when to use which one exceeds the scope of this answer, but a minimal sketch of the first two follows below.
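Here, accounts, balance and version are invented names:
-- Pessimistic: lock the row up front; nobody else can change it until COMMIT.
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;
-- Optimistic: no lock while reading, but verify at write time that nothing
-- changed. Assumes a version column that is incremented on every update.
BEGIN;
SELECT balance, version FROM accounts WHERE id = 42;   -- say this returns version = 7
UPDATE accounts SET balance = balance - 100, version = version + 1
  WHERE id = 42 AND version = 7;
-- If this reports UPDATE 0, another session changed the row in the meantime:
-- roll back and retry the whole transaction.
COMMIT;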
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a certain order (e.g., always update the account with the lower account number first; see the sketch below).
When you encounter a deadlock, your code has to retry the transaction.
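A sketch of the first approach, with accounts as an invented table: the classic money-transfer deadlock (session A updates account 1 then 2 while session B updates 2 then 1) disappears if every transaction touches rows in the same order.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;   -- lower account number first
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;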
Contrary to common belief, a deadlock is not necessarily a bug.
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

atomic operations and atomic transactions

Can someone explain to me what the difference is between atomic operations and atomic transactions? It seems to me that these two are the same thing. Is that correct?
The concept of Atomicity is common between atomic transactions and atomic operations, but they are usually related to different domains.
Atomic Transactions are associated with Database operations where a set of actions must ALL complete or else NONE of them complete. For example, if someone is booking a flight, you want to both get payment AND reserve the seat OR do neither. If either one were allowed to succeed without the other also succeeding, the database would be inconsistent.
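A sketch of what that looks like in SQL (the table and column names are invented):
BEGIN;
INSERT INTO payments (booking_id, amount) VALUES (123, 199.00);
UPDATE seats SET passenger_id = 42 WHERE flight_id = 7 AND seat_no = '12A';
COMMIT;   -- both changes become visible together; on error or ROLLBACK, neither does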
Atomic Operations on the other hand are usually associated with low-level programming with regards to multi-processing or multi-threading applications and are similar to Critical Sections.
For example, if two threads both access and modify the same variable, each thread goes through the following steps:
Read the variable from storage into local memory.
Modify the value in local memory.
Write the modified value back to the original storage location.
But in a multi-threaded system an interrupt or other context switch might happen after the first process has read the value but has not yet written it back. The second process (or interrupt) will then read and modify the OLD value and write its modified value back to storage. When the first process resumes, it doesn't know that anything has changed, so it writes back the value it computed from the stale original. The modification made by the second process to the variable is therefore lost.
If an operation is atomic, it is guaranteed to complete without being interrupted once it begins. This is usually accomplished using hardware-level primitives like Test-and-Set or Compare-and-Swap.
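The same race has a database-flavoured analogue (counters is a made-up table): a read-modify-write done as two statements can lose updates, whereas a single UPDATE that computes the new value from the old one is effectively atomic.
-- Lost update: two sessions both read 10 and both write back 11.
--   SELECT value FROM counters WHERE id = 1;      -- both sessions see 10
--   UPDATE counters SET value = 11 WHERE id = 1;  -- one increment is lost
-- Atomic in one statement: the second UPDATE waits for the first and then
-- re-evaluates the current value, so both increments survive.
UPDATE counters SET value = value + 1 WHERE id = 1;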
To get a wider picture, you can take a look at:
MySQL Transactions and Atomic Operations
Atomicity (database systems)
Atomicity (Programming)
Some quotes from the above-cited resources:
About databases:
In an atomic transaction, a series of database operations either all occur, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and irreducibility.
About programming:
In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition: they either successfully change the state of the system, or have no apparent effect.
I have seen the word "transaction" used more often for databases, and "operation" in programming, especially kernel-level programming.
In a statement:
An atomic transaction is the smallest set of operations needed to perform the required steps.
Either all of those required operations happen (successfully) or the atomic transaction fails as a whole.
An atomic operation usually has nothing in common with transactions. To my knowledge the term comes from hardware programming, where a set of operations (or a single one) appears to execute instantaneously.