Concurrency, Atomicity, and Isolation in Entity Framework - entity-framework

Based on some periodically and concurrently incoming data, I'm performing an operation that will either insert a new row into a table or update an existing row in the same table. Whether it inserts or updates a row depends on the state of the existing rows. So the result of this operation is affected by previous runs of the operation and affects subsequent runs. I need to ensure atomicity/isolation using transactions, or locks, or something. There seem to be so many options and caveats with Entity Framework (and I'm a complete newbie with database stuff in general too) that I have no idea what direction I should be headed. TransactionScope, BeginTransaction, ambient transactions? Serializable or RepeatableRead? SaveChanges and AcceptAllChanges? Do I even need to do anything special? The fact that a new row can be added makes me worry especially about phantom rows, though I barely understand what that means. Any guidance on the subject would be greatly appreciated.

This tutorial may be helpful to you - http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
Quote:
Pessimistic Concurrency (Locking)
If your application does need to prevent accidental data loss in
concurrency scenarios, one way to do that is to use database locks.
This is called pessimistic concurrency. For example, before you read a
row from a database, you request a lock for read-only or for update
access. If you lock a row for update access, no other users are
allowed to lock the row either for read-only or update access, because
they would get a copy of data that's in the process of being changed.
If you lock a row for read-only access, others can also lock it for
read-only access but not for update. Managing locks has some
disadvantages. It can be complex to program. It requires significant
database management resources, and it can cause performance problems
as the number of users of an application increases (that is, it
doesn't scale well). For these reasons, not all database management
systems support pessimistic concurrency. The Entity Framework provides
no built-in support for it, and this tutorial doesn't show you how to
implement it.
Optimistic Concurrency
The alternative to pessimistic concurrency is optimistic concurrency.
Optimistic concurrency means allowing concurrency conflicts to happen,
and then reacting appropriately if they do. For example, John runs the
Departments Edit page, changes the Budget amount for the English
department from $350,000.00 to $100,000.00. (John administers a
competing department and wants to free up money for his own
department.)
There are code examples for both models in the tutorial.
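
A minimal sketch of the optimistic approach described in the quote, written in Python against SQLite rather than Entity Framework so that it runs standalone. The department table, its version column, and the update_budget helper are illustrative assumptions, not part of the tutorial or of the Entity Framework API. The idea is that each save only succeeds if the row still carries the version number that was read earlier.

```python
import sqlite3

def make_db():
    """Create an in-memory database with one department row at version 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE department ("
        "id INTEGER PRIMARY KEY, budget INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO department (id, budget, version) VALUES (1, 350000, 1)")
    conn.commit()
    return conn

def update_budget(conn, dept_id, new_budget, expected_version):
    """Apply the update only if the row still has the version we read.

    Returns True on success, False if someone else changed the row first.
    """
    cur = conn.execute(
        "UPDATE department SET budget = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_budget, dept_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1

conn = make_db()
# Two users load the same row (version 1) before either of them saves.
version_seen_by_first = 1
version_seen_by_second = 1
first_ok = update_budget(conn, 1, 100000, version_seen_by_first)    # first save wins
second_ok = update_budget(conn, 1, 400000, version_seen_by_second)  # stale version, rejected
```

When the second save is rejected, the application decides what to do next: reload the row and show the conflict to the user, or retry automatically.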

Related

Disabling MVCC in Postgres

I've decades of experience with MSSQL but none with Postgres and its MVCC style of concurrency control.
In MSSQL if I had a very large dataset which was read-only, I would set the database to read-only (for safety) and use transaction isolation level read uncommitted, and that should avoid lock contention, which the dataset didn't need anyway.
In Postgres, is there some equivalent? Some way of setting a database to read-only and reassuring PG that it is completely safe not to use MVCC and to just read without making row copies? It seems that MVCC has considerable overhead, which looks potentially expensive for multiple readers of very large passive data sets.
Edit: comments say I misunderstand that copies are only made when writing occurs, not reading as I assumed.
"MVCC" stands for "Multiversion Concurrency Control". Multiple versions of the same table row are only spawned by write activity (mostly UPDATE).
If your database is read-only - enforced or voluntarily, all the same for the purpose of this question - then there cannot be multiple versions of a row, ever. And the question is moot.
No, there is no way to do that, and there is no reason for it either.
In PostgreSQL, writers never block readers and vice versa, precisely because of the MVCC implementation that you want to disable. So there is no need for the unsavory crutch of reading uncommitted data.

In databases, is row level locking an example of ACID, optimistic concurrency, or both?

simultaneous writes
Also what happens in a nosql database?
I'll ignore the NoSQL part, otherwise I would have to close the question as too unfocused.
Row level locking is a technique that relational databases use to provide isolation, which is the I of ACID. Isolation means that concurrent database sessions are isolated from each other – the database tries to keep them from being influenced by each other's activities.
Specifically, if two concurrent sessions try to modify the same data row, they have to “take turns”: the second one has to wait until the transaction of the first session is done. This wait is usually very short and does not hurt, but it prevents inconsistencies (consistency is the C of ACID).
Row level locking, and locking in general, are part of pessimistic locking: you lock a row to prevent other sessions from messing with the row while you are working on it. It is done with SELECT ... FOR UPDATE. It is called “pessimistic” because it reflects a mindset like “I expect someone will try to modify the row while I am working on it, so let's lock it to be sure”.
Optimistic locking is ill-named, because no locks are actually taken. You don't prevent concurrent transactions from modifying the row you are interested in. Instead you check afterwards if the row has been modified by a concurrent transaction or not, and if it has, you try the operation again.
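
The “take turns” behavior of pessimistic locking can be sketched in Python with SQLite. This is only a stand-in for SELECT ... FOR UPDATE: SQLite has no row-level locks, so BEGIN IMMEDIATE locks the whole database file for writing instead of a single row. The account table and the second_writer_is_blocked helper are assumptions made for the illustration.

```python
import os
import sqlite3
import tempfile

def second_writer_is_blocked():
    """Return (blocked, final_balance) for two sessions that both want to write."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    first.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    first.execute("INSERT INTO account VALUES (1, 100)")

    first.execute("BEGIN IMMEDIATE")  # take the write lock up front
    first.execute("UPDATE account SET balance = balance - 10 WHERE id = 1")
    try:
        # The second session would normally wait here; with timeout=0 it gives up at once.
        second.execute("BEGIN IMMEDIATE")
        second.execute("ROLLBACK")
        blocked = False
    except sqlite3.OperationalError:
        blocked = True
    first.execute("COMMIT")  # releases the lock

    second.execute("BEGIN IMMEDIATE")  # now it is the second session's turn
    second.execute("UPDATE account SET balance = balance - 10 WHERE id = 1")
    second.execute("COMMIT")
    balance = second.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
    first.close()
    second.close()
    return blocked, balance

blocked, balance = second_writer_is_blocked()
```

With a non-zero timeout the second session would simply wait for the first one to commit, which is the usual behavior in a database server.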

Making multiple users access to PSQL database

I'm a rookie in this topic. All I have ever done is connect to a database as a single user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program to record when workers arrive and leave. The database will be on the main server, and I only created one database user while I was programming and testing the program. My question is: can multiple remote locations connect to the database with the same user (there should be no collisions, because they all write different data, but to the same tables)? If not, what should I do?
Good relational databases handle this quite well. It is the “I” in the so-called ACID properties of transactions in relational databases; it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on results of previous statements in indirect ways. This assumption can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways. The discussion when to use which one exceeds the scope of this answer.
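
Both optimistic locking and the higher isolation levels end in the same pattern: roll back and run the whole transaction again. A minimal Python sketch of such a retry loop follows. SerializationFailure is a hypothetical stand-in for whatever exception your database driver raises (PostgreSQL reports serialization failures with SQLSTATE 40001), and run_with_retry and its parameters are assumptions for illustration.

```python
import random
import time

class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver's serialization error."""

def run_with_retry(transaction, max_attempts=5, base_delay=0.01):
    """Run transaction(); on a serialization failure, back off and retry from scratch."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Randomized exponential backoff so colliding sessions do not retry in lockstep.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

# Usage: a fake transaction that fails twice, then succeeds.
calls = {"count": 0}

def flaky_transaction():
    calls["count"] += 1
    if calls["count"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"

result = run_with_retry(flaky_transaction)
```

The function passed in has to redo all of its reads as well as its writes, because the data it saw in the failed attempt may no longer be current.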
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a certain order (e.g., always update the account with the lower account number first).
When you encounter a deadlock, your code has to retry the transaction.
Contrary to common belief, a deadlock is not necessarily a bug.
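
The lock-ordering rule can be illustrated outside the database with plain Python threads. The Account class and transfer function are assumptions for this sketch; in a real application the locks would be row locks taken by UPDATE or SELECT ... FOR UPDATE statements issued in a consistent order.

```python
import threading

class Account:
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance
        self.lock = threading.Lock()

def transfer(source, target, amount):
    """Move money while always locking the lower account number first.

    Assumes source and target are different accounts.
    """
    first, second = sorted((source, target), key=lambda acct: acct.number)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount

def run_transfers(source, target, times):
    for _ in range(times):
        transfer(source, target, 1)

a = Account(1, 1000)
b = Account(2, 1000)

# Opposite-direction transfers could deadlock if each thread
# locked its own source account first.
threads = [
    threading.Thread(target=run_transfers, args=(a, b, 500)),
    threading.Thread(target=run_transfers, args=(b, a, 500)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both threads agree on the order, neither can hold one lock while waiting for a lock the other already holds.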
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

Manually lock for MongoDB

I have these operations:
Find a doc from collection.
Manipulate doc.prop based on its current value ("prop" is a string).
Update doc back to collection.
So in this case I have to make sure these operations are atomic, because the update to doc.prop must be based on its current value.
Here are two approaches:
1. Add a "valueKey" (Number) property to the doc and make sure valueKey matches when updating the doc. Increment valueKey after each update. If valueKey does not match, treat the update as failed and retry.
2. Use "fsyncLock" provided by MongoDB to lock the whole mongod instance during the operations.
The first approach works well, but with a huge volume of these operations at the same time, failures and retries would be frequent.
I haven't tried the second approach; I think it is meant for backing up the database and is not a good fit here.
So I'm wondering: is there any other efficient approach?
The first approach is called an optimistic lock. Optimistic locks assume that the probability of collision is low, otherwise, as you already pointed out, there are a lot of retries. Those retries can also be destructive - if a text is edited, it might make sense to merge the edits, but it hardly ever makes sense for a phone number.
Locking the entire database is an extreme form of a pessimistic (offline) lock, where the concurrency of the system is deliberately reduced. However, that has problems because clients don't know what's going on: their edits will simply fail, which is the worst kind of user experience.
So pessimistic locks really only make sense if clients have a chance of actually knowing that something is locked. For instance, you'd somehow need to inform the user that it's not possible to edit the item she wants to edit, because someone else is already in edit-mode for that item. This also has problems, especially if another user left the screen and is blocking all other users.
If you wanted to go for a pessimistic lock, however, that should absolutely never be implemented by something like a global database lock: simply lock the item itself and implement the business logic for the locking in your code.
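
A minimal sketch of such an item-level lock, in Python, with an expiry so that a user who walks away does not block everyone else forever. Everything here (the ItemLocks class, the lease length, the injected clock) is a hypothetical illustration. It is an in-memory, single-process version; with a real database the check-and-set inside acquire would have to be one atomic operation on the item itself.

```python
import time

class ItemLocks:
    """In-memory, single-process sketch of per-item edit locks with an expiry."""

    def __init__(self, lease_seconds=300, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._locks = {}  # item_id -> (owner, expires_at)

    def acquire(self, item_id, user):
        """Return (True, user) if user now holds the lock, else (False, current_owner)."""
        now = self.clock()
        holder = self._locks.get(item_id)
        if holder is not None:
            owner, expires_at = holder
            if owner != user and expires_at > now:
                return False, owner  # the caller can tell the user who is editing
        # Free, expired, or already ours: take (or renew) the lease.
        self._locks[item_id] = (user, now + self.lease_seconds)
        return True, user

    def release(self, item_id, user):
        """Drop the lock, but only if user is the one holding it."""
        holder = self._locks.get(item_id)
        if holder is not None and holder[0] == user:
            del self._locks[item_id]

# Usage with a fake clock so that the expiry can be shown without waiting.
fake_now = [0.0]
locks = ItemLocks(lease_seconds=60, clock=lambda: fake_now[0])
first = locks.acquire("doc-1", "alice")   # alice starts editing
second = locks.acquire("doc-1", "bob")    # bob is told that alice holds the lock
fake_now[0] = 61.0                        # alice left the screen; her lease ran out
third = locks.acquire("doc-1", "bob")     # bob can take over
```

Returning the current owner is what lets the client tell the user who is already editing the item instead of just failing.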
Moral: This isn't a technology problem, it's a logical problem. Google Docs demonstrates one way to allow concurrent editing by multiple users, but it's hard to implement, has limited use in other types of applications, and is still deemed annoying by some users. Git and the like show another method, where the logic of branches, merging and conflicts is exposed to the user as well, but asynchronously (multi-version concurrency control).

Why using Locking in MongoDB?

MongoDB is from the NoSQL era, and locking is something related to RDBMS, isn't it? From Wikipedia:
Optimistic concurrency control (OCC) is a concurrency control method for relational database management systems...
So why do I find is_locked in PyMongo? Even in a driver that makes non-blocking calls, the lock still exists: Motor has is_locked.
NoSQL does not automatically mean no locks.
There are always some operations that require a lock.
For example, building an index.
And the official MongoDB documentation is a more reliable source than Wikipedia (no offense meant to Wikipedia :) )
http://docs.mongodb.org/manual/faq/concurrency/
Mongo does in-place updates, so it needs to lock in order to modify the database. There are other things that need locks, so read the link @Tigra provided for more info.
This is pretty standard as far as databases go, and it isn't an RDBMS-specific thing (Redis also does this, but on a per-key basis).
There are plans to implement collection-level (instead of database-level) locking: https://jira.mongodb.org/browse/SERVER-1240
Some databases, like CouchDB, get around the locking problem by only appending new documents. They create a new, unique revision id and once the document is finished writing, the database points to the new revision. I'm sure there's some kind of concurrency control when changing which revision is used, but it doesn't need to block the database to do that. There are certain downsides to this, such as compaction needing to be run regularly.
MongoDB implements a Database level locking system. This means that operations which are not atomic will lock on a per database level, unlike SQL whereby most techs lock on a table level for basic operations.
In-place updates only occur with certain operators, $set being one of them. The MongoDB documentation used to have a page that listed all of them, but I can't find it now.
MongoDB currently implements a read/write lock whereby each is separate but they can block each other.
Locks are utterly vital to any database, for example, how can you ensure a consistent read of a document if it is currently being written to? And if you write to the document how do you ensure that you only apply that single update at once and not multiple updates at the same time?
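
To make the read/write lock idea concrete, here is a minimal readers-writer lock sketched in Python. It is not MongoDB's implementation; the ReadWriteLock class and the two-field document are assumptions for illustration. Many readers may hold the lock at once, a writer needs it exclusively, and so no reader ever sees a half-written document. This simple version prefers readers, so a steady stream of readers could starve the writer.

```python
import threading

class ReadWriteLock:
    """Many concurrent readers or one writer, never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()

document = {"a": 0, "b": 0}  # invariant: a == b
lock = ReadWriteLock()
inconsistent_reads = []

def writer():
    for i in range(1, 201):
        lock.acquire_write()
        try:
            document["a"] = i  # between these two assignments the document
            document["b"] = i  # is inconsistent, but no reader can observe it
        finally:
            lock.release_write()

def reader():
    for _ in range(200):
        lock.acquire_read()
        try:
            if document["a"] != document["b"]:
                inconsistent_reads.append(dict(document))
        finally:
            lock.release_read()

threads = [threading.Thread(target=writer)]
threads += [threading.Thread(target=reader) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```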
I am unsure how version control can stop this in CouchDB, locks are really quite vital for a consistent read and are separate to version control, i.e. what if you wish to apply a read lock to the same version or read a document that is currently being written to a new revision? You will obviously see a lock queue appear. Even though version control might help a little with write lock saturation there will still be a write lock and it will still need to work on a level.
As for concurrency features: MongoDB has the ability (for one), if the data is not in RAM, to yield an operation in favor of other operations. This means that locks will not just sit there waiting for data to be paged in, and other operations will run in the meantime.
As a side note, MongoDB actually has more locks than this: it also has a JavaScript lock, which is global and blocking and does not have the normal concurrency features of regular locks.
and even in driver that makes non-blocking calls
Hmm I think you might be confused by what is meant as a "non-blocking" application or server: http://en.wikipedia.org/wiki/Non-blocking_algorithm