Can PostgreSQL Serializable be safely mixed with lower isolation levels? - postgresql

I'm trying to figure out what the behaviour is when mixing Serializable and lower isolation levels, but I'm not having much luck. Specifically, we have a reorder-queue-items transaction that currently takes the table lock to ensure no new items will be added to the queue in the meantime. Will making that transaction Serializable and reading the head of the queue (to create a dependency on the result) provide the same level of guarantee, even if other transactions operate at Repeatable Read?
Using PostgreSQL 9.5.
PS. I know there's another question with a similar title, but it's over 4 years old, much less specific in what it asks, and the only answer is essentially unsourced given the cited materials, and doesn't truly answer the question.

To guarantee serializability, all participating transactions should use the SERIALIZABLE isolation level.
But I am not certain that serializable isolation is the solution to your problem. It won't block anybody from reading from or writing to the queue; instead it will make some transactions fail, and the failing transaction might well be the one that is trying to reorder the queue.
I think that a lock is the way to go in such a case. Using serializable transactions will hurt performance just as badly; the difference is that rather than waiting, you have to keep retrying the transaction until the reordering succeeds.
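For illustration, a minimal sketch of the lock-based reordering, assuming a hypothetical queue_items table with made-up columns; the lock mode shown blocks concurrent writers but still lets other sessions read the queue:

    BEGIN;

    -- Block concurrent INSERT/UPDATE/DELETE on the queue while we reorder it,
    -- but still allow plain SELECTs from other sessions.
    LOCK TABLE queue_items IN SHARE ROW EXCLUSIVE MODE;

    -- Renumber the queue, e.g. by priority and age (columns are hypothetical).
    UPDATE queue_items q
    SET sort_order = r.new_order
    FROM (
        SELECT id,
               row_number() OVER (ORDER BY priority DESC, created_at) AS new_order
        FROM queue_items
    ) r
    WHERE q.id = r.id;

    COMMIT;  -- the lock is released here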

Related

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are a viable tool to use in something like a ticketing system where users may be attempting to read/write to the same collection/document, and whoever made the request first is handled first, the second is handled second, and so on.
If not what would be a good structure for such a need with firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve. There is no simple database structure that will impose an order in which changes are made. I suggest you think carefully about your need for ordering and consider a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.

Making multiple users access to PSQL database

I'm a rookie in this topic; all I ever did was make a connection to the database for one user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program for recording when workers are coming and leaving. The database will be on the main server, and all I made was one user while I was programming/testing the program. My question is: can multiple remote locations connect to the database with that one database user (there should be no collisions, because they are all writing different data, but to the same tables), and if that's not the case, what should I do?
Good relational databases handle this quite well; it is the “I” in the so-called ACID properties of transactions in relational databases, and it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
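As a minimal sketch (the table and column names are invented for illustration), grouping the related modifications of one check-in into a single transaction looks like this:

    BEGIN;

    -- Record the arrival and update the worker's status as one atomic unit:
    -- either both statements take effect, or neither does.
    INSERT INTO attendance (worker_id, facility_id, arrived_at)
    VALUES (42, 7, now());

    UPDATE workers
    SET status = 'present'
    WHERE id = 42;

    COMMIT;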
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on the results of previous statements in indirect ways. That assumption can be broken by concurrent activity that modifies the data. There are three approaches to that (minimal sketches of the first two follow the list):
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways; a discussion of when to use which one exceeds the scope of this answer.
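As promised, rough sketches of the pessimistic and optimistic variants, assuming a hypothetical accounts table with a version column for the optimistic case:

    -- Pessimistic locking: lock the row up front so nobody can change it
    -- until this transaction commits or rolls back.
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
    -- ... decide on the new balance ...
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- Optimistic locking: no lock, but verify on write that nobody changed
    -- the row since we read it (we read version = 5 earlier).
    BEGIN;
    UPDATE accounts
    SET balance = balance - 100, version = version + 1
    WHERE id = 1 AND version = 5;
    -- "UPDATE 0" means somebody else modified the row in the meantime:
    -- roll back and retry from the start.
    COMMIT;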
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a consistent order (e.g., always update the account with the lower account number first; see the sketch below).
When you encounter a deadlock, your code has to retry the transaction.
Contrary to common belief, a deadlock is not necessarily a bug.
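A minimal sketch of the lock-ordering technique from the first bullet, again with a made-up accounts table: every transaction locks the lower-numbered account first, so two transfers can never wait on each other in a cycle.

    -- Transfer 100 from account 42 to account 17:
    -- always lock the lower-numbered account first.
    BEGIN;
    SELECT 1 FROM accounts WHERE id = 17 FOR UPDATE;
    SELECT 1 FROM accounts WHERE id = 42 FOR UPDATE;
    UPDATE accounts SET balance = balance + 100 WHERE id = 17;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    COMMIT;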
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

Are postgresql transaction levels repeatable read and serializable the same?

Quote from http://www.postgresql.org/docs/9.4/static/transaction-iso.html :
When you select the level Read Uncommitted you really get Read Committed, and phantom reads are not possible in the PostgreSQL implementation of Repeatable Read, so the actual isolation level might be stricter than what you select.
To clarify: does it mean pg's repeatable read = serializable ?
No; the difference is described on the page you linked to:
In fact, this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions.
The documentation goes on to give an example where Repeatable Read and Serializable behave differently. A Serializable transaction can abort with a "serialization failure", but the monitoring does not introduce any additional blocking beyond what Repeatable Read already does.
The section you quoted is explaining some anomalies because the standard SQL isolation levels are designed around locking data, but PostgreSQL is implemented with an "MVCC" design, where concurrent transactions can be given independent snapshots of the data. Thus some of the distinctions present in other systems don't apply, and Postgres interprets the isolation levels as "at least as strict as..."
As Mark Hildreth pointed out in comments, this distinction is only true from PostgreSQL 9.1 onwards. The documentation for 9.0 states:
But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable.
Whereas in newer versions this has been amended to:
But internally, there are only three distinct isolation levels, which correspond to the levels Read Committed, Repeatable Read, and Serializable.
No.
Repeatable read is snapshot isolation. It means your transaction sees a single consistent "snapshot" of the database. It is not full serializability, because some operations may produce results inconsistent with any serial ordering. For example, if one transaction inserts a row which matches another transaction's SELECT operation, and vice versa, this may cause serialization anomalies. Serializable uses a technology called "predicate locking" to detect these situations and reject any offending transactions. This "locking" does not block the transaction and cannot participate in a deadlock.
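One classic example of such an anomaly is write skew; here is a minimal sketch with a made-up doctors table, under the business rule that at least one doctor must stay on call:

    -- Session A:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM doctors WHERE on_call;              -- sees 2

    -- Session B, concurrently:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM doctors WHERE on_call;              -- also sees 2

    -- Session A:
    UPDATE doctors SET on_call = false WHERE name = 'alice';
    COMMIT;

    -- Session B:
    UPDATE doctors SET on_call = false WHERE name = 'bob';
    COMMIT;
    -- Under REPEATABLE READ both commits succeed and nobody is on call,
    -- a result no serial execution could produce. Under SERIALIZABLE one
    -- of the two transactions is aborted with a serialization failure.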

Concurrency, Atomicity, and Isolation in Entity Framework

Based on some periodically and concurrently incoming data, I'm performing an operation that will either insert a new row into a table or update an existing row in the same table. Whether it inserts or updates a row depends on the states of the existing rows. So, the result of this operation will be affected by previous runs of this operation, and will affect subsequent runs. I need to ensure atomicity/isolation using transactions, or locks, or something. There seem to be so many options and caveats with Entity Framework (and I'm a complete newbie with database stuff in general too) that I have no idea what direction to head in. TransactionScope, BeginTransaction, ambient transactions? Serializable or RepeatableRead? SaveChanges and AcceptAllChanges? Do I even need to do anything special? The fact that a new row can be added makes me worry especially about phantom rows, though I barely understand what that means. Any guidance on the subject would be greatly appreciated.
This tutorial may be helpful to you - http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
Quote:
Pessimistic Concurrency (Locking)
If your application does need to prevent accidental data loss in concurrency scenarios, one way to do that is to use database locks. This is called pessimistic concurrency. For example, before you read a row from a database, you request a lock for read-only or for update access. If you lock a row for update access, no other users are allowed to lock the row either for read-only or update access, because they would get a copy of data that's in the process of being changed. If you lock a row for read-only access, others can also lock it for read-only access but not for update. Managing locks has some disadvantages. It can be complex to program. It requires significant database management resources, and it can cause performance problems as the number of users of an application increases (that is, it doesn't scale well). For these reasons, not all database management systems support pessimistic concurrency. The Entity Framework provides no built-in support for it, and this tutorial doesn't show you how to implement it.
Optimistic Concurrency
The alternative to pessimistic concurrency is optimistic concurrency. Optimistic concurrency means allowing concurrency conflicts to happen, and then reacting appropriately if they do. For example, John runs the Departments Edit page, changes the Budget amount for the English department from $350,000.00 to $100,000.00. (John administers a competing department and wants to free up money for his own department.)
There are code examples for both models in the tutorial.
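Setting Entity Framework aside for a moment, the optimistic approach boils down to SQL along these lines; this is a rough sketch with an invented departments table carrying a row_version column, not the exact statements EF generates:

    -- Read the row together with its version.
    SELECT id, budget, row_version FROM departments WHERE id = 3;
    -- suppose this returned row_version = 7

    -- Write back only if nobody changed the row in the meantime.
    UPDATE departments
    SET budget = 100000, row_version = row_version + 1
    WHERE id = 3 AND row_version = 7;
    -- "UPDATE 0" signals a concurrency conflict: reload the row, decide how
    -- to merge the changes, and retry.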

Is LockModeType.PESSIMISTIC_WRITE sufficient for an UPSERT in JPA?

I've read this article on JPA concurrency, but either I am too thick or it is not explicit enough.
I am looking to do a database-controlled atomic update-if-found-else-insert operation (an UPSERT).
It looks to my poor slow brain that I can--within a transaction of course--run a named query with a lock mode of PESSIMISTIC_WRITE, see if it returns any results, and then do either a persist() or an update() afterwards.
What I am not clear on are the differences between doing this operation with a PESSIMISTIC_WRITE lock vs. a PESSIMISTIC_READ lock. I've read the sentences--I understand that PESSIMISTIC_READ is intended to prevent non-repeatable reads, and PESSIMISTIC_WRITE is...well, maybe I don't understand that one so well :-) --but underneath it's just a SQL SELECT FOR UPDATE, yeah? In both cases?
I am looking to do a database-controlled atomic update-if-found-else-insert operation (an UPSERT).
I'm maybe not answering the whole question exactly, but if you want to implement the above without any race condition, in my opinion you need a table-level lock (LOCK ... IN EXCLUSIVE MODE), not only row locks. I don't know if this can be done through JPA. Maybe you could clarify what would be acceptable for you.
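A minimal sketch of that lock-then-check approach in PostgreSQL syntax (table and column names invented for illustration); with the table locked in EXCLUSIVE mode, the check and the write cannot race with another writer:

    BEGIN;

    -- EXCLUSIVE mode blocks all concurrent writers (plain reads still work),
    -- so no other session can insert the same key between our check and write.
    LOCK TABLE widgets IN EXCLUSIVE MODE;

    UPDATE widgets SET quantity = quantity + 1 WHERE name = 'sprocket';
    -- if the UPDATE matched no row, the key does not exist yet
    INSERT INTO widgets (name, quantity)
    SELECT 'sprocket', 1
    WHERE NOT EXISTS (SELECT 1 FROM widgets WHERE name = 'sprocket');

    COMMIT;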
I have faced this kind of situation and found this:
Pessimistic locking, meaning that objects are locked at the beginning of the transaction and the lock is kept for the duration of the transaction, is done with these two pessimistic lock modes:
- LockModeType.PESSIMISTIC_READ --> the entity can be read by other transactions, but no changes can be made to it
- LockModeType.PESSIMISTIC_WRITE --> the entity can not be read or written by other transactions
link to the article
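For what it's worth, with PostgreSQL as the backing database, JPA providers such as Hibernate typically translate these lock modes into row-level locking clauses roughly as follows (a sketch, not provider documentation):

    -- PESSIMISTIC_READ: shared row lock; other transactions can still read
    -- the row (and take FOR SHARE locks), but UPDATE/DELETE must wait.
    SELECT * FROM item WHERE id = 1 FOR SHARE;

    -- PESSIMISTIC_WRITE: exclusive row lock; concurrent FOR SHARE / FOR UPDATE
    -- and UPDATE/DELETE must wait, though plain snapshot reads still succeed.
    SELECT * FROM item WHERE id = 1 FOR UPDATE;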
I am looking to do a database-controlled atomic update-if-found-else-insert operation (an UPSERT).
INSERT .. ON DUPLICATE KEY UPDATE does that.
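That is MySQL syntax; in PostgreSQL 9.5 and later the native equivalent is INSERT ... ON CONFLICT ... DO UPDATE. A minimal sketch with an invented widgets table:

    -- Requires a unique constraint or index on widgets(name).
    INSERT INTO widgets (name, quantity)
    VALUES ('sprocket', 1)
    ON CONFLICT (name)
    DO UPDATE SET quantity = widgets.quantity + EXCLUDED.quantity;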