In databases, is row level locking an example of ACID, optimistic concurrency, or both?

For example, what happens with simultaneous writes?
Also, what happens in a NoSQL database?

I'll ignore the NoSQL part, otherwise I would have to close the question as too unfocused.
Row level locking is a technique that relational databases use to provide isolation, which is the I of ACID. Isolation means that concurrent database sessions are isolated from each other – the database tries to keep them from being influenced by each other's activities.
Specifically, if two concurrent sessions try to modify the same data row, they have to “take turns”: the second one has to wait until the transaction of the first session is done. This wait is usually very short and does not hurt, but it prevents inconsistencies (consistency is the C of ACID).
Row level locking, and locking in general, is part of pessimistic locking: you lock a row to prevent other sessions from messing with it while you are working on it. Explicitly, this is done with SELECT ... FOR UPDATE. It is called “pessimistic” because it reflects a mindset like “I expect someone will try to modify the row while I am working on it, so let's lock it to be sure”.
Optimistic locking is ill-named, because no locks are actually taken. You don't prevent concurrent transactions from modifying the row you are interested in. Instead you check afterwards if the row has been modified by a concurrent transaction or not, and if it has, you try the operation again.
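As a minimal sketch of the two styles (the accounts table, its columns, and the concrete values are hypothetical stand-ins):
-- Pessimistic: take the row lock up front; concurrent writers must wait.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Optimistic: take no lock; detect concurrent modification afterwards.
BEGIN;
SELECT balance, version FROM accounts WHERE id = 1;  -- say this returns version 7
UPDATE accounts SET balance = 900, version = version + 1
WHERE id = 1 AND version = 7;
-- if zero rows were updated, a concurrent transaction got there first: retry
COMMIT;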

Related

Lightest lock for exclusive inserts in PostgreSQL

I want to limit INSERTs to whichever transaction gets the lock (with the others waiting in line, not failing) while allowing concurrent reads, updates, and deletes (but obviously not of the data being inserted, which is impossible in PG anyway).
What is the lightest LOCK to achieve this?
Yes, if you want to lock a table against all concurrent modifications, SHARE ROW EXCLUSIVE is the cheapest lock.
I'm not going to ask why you want to restrict concurrency in that way...
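As a minimal sketch (mytable and its columns are placeholders), that looks like this:
BEGIN;
-- waits in line if another transaction holds a conflicting lock;
-- concurrent SELECTs are still allowed, all modifications are blocked
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO mytable (id, payload) VALUES (1, 'x');
COMMIT;  -- the lock is released at the end of the transaction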

Making multiple users access to PSQL database

I'm a rookie in this topic; all I've ever done is make a connection to a database for a single user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program to record when workers arrive and leave. The database will be on the main server, and I created only one database user while I was programming/testing the program. My question is: can multiple remote locations connect with that one database user (there should be no collision, because they are all writing different data, but to the same tables)? And if that's not the case, what should I do?
Good relational databases handle this quite well; it is the “I” in the so-called ACID properties of transactions in relational databases, and it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
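For example, a minimal sketch of one such transaction (the attendance and workers tables are hypothetical):
BEGIN;
INSERT INTO attendance (worker_id, facility_id, arrived_at)
VALUES (42, 7, current_timestamp);
UPDATE workers SET on_site = TRUE WHERE id = 42;
COMMIT;  -- both changes become visible together, or not at all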
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on results of previous statements in indirect ways. This assumption can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE, which give stronger guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways. The discussion when to use which one exceeds the scope of this answer.
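Still, here is a minimal sketch of the third approach (accounts is a hypothetical table); the retry loop itself lives in the application:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- if the database cannot keep the guarantees, this fails with
-- SQLSTATE 40001 (serialization_failure): roll back and run it again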
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a certain order (e.g., always update the account with the lower account number first; see the sketch below).
When you encounter a deadlock, your code has to retry the transaction.
Contrary to common belief, a deadlock is not necessarily a bug.
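A minimal sketch of the lock-ordering rule from above (accounts is again hypothetical): if every transfer touches the account with the lower number first, two concurrent transfers may block each other briefly, but never in a cycle:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- lower id first
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- higher id second
COMMIT;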
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

postgresql concurrent queries as stored procedures

I have 2 stored procedures that interact with the same data tables.
The first executes for several hours, and the second one is instant.
If I run the first one and then the second one (on a second connection), the second procedure waits for the first one to end.
It is harmless for my data if both run at the same time; how can I make that happen?
The fact that the shorter query is blocked while being on a second connection suggests that the longer query is getting an exclusive lock on the table during the query.
That suggests it is doing writes, since if both were only reading there shouldn't be any locking issues. pgAdmin can show which locks are active during the longer query, and also whether the shorter query is indeed blocked on the longer one.
If the longer query is indeed doing writes, it's possible that you may be able to reduce the lock contention -- by chunking it, for example, which could allow readers in between chunked updates/inserts -- but if it's an operation that requires an exclusive write lock, then it will block everybody until it's done.
It's also possible that you may be able to optimize the query so that it takes a weaker lock that isn't exclusive, but that all depends on the specifics of what the query is doing and your data.
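If you'd rather check from SQL than from pgAdmin, a minimal sketch (pg_blocking_pids is available from PostgreSQL 9.6 on):
-- list each blocked session together with the PIDs blocking it
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;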

PostgreSQL Lock Row on indefinitely time

I want a user to be able to lock one row for an indefinite time while they work with it, and to unlock it when done, so that no other user can lock that row for themselves in the meantime. Is this possible at the database level?
You can do it with a long-lived transaction, but there'll be performance issues with that. This sounds like more of a job for optimistic concurrency control.
You can just open a transaction and run SELECT 1 FROM mytable WHERE <clause to match the row> FOR UPDATE;, then keep the transaction open until you're done. The problem with this is that it can cause issues with vacuum that result in table and index bloat, where tables get filled with deleted data and indexes fill up with entries pointing to obsolete blocks.
It'd be much better to use an advisory lock. You still have to hold open the connection that holds the lock, but it doesn't have to keep an idle transaction open, so it has much lower impact. Transactions that wish to update the row must explicitly check for a conflicting advisory lock, though; otherwise they can just proceed as if it weren't locked. This approach also scales poorly to lots of tables (due to the limited advisory lock namespace) or lots of concurrent locks (due to the number of connections).
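A minimal sketch of that pattern, using the row's numeric primary key as the advisory lock key (12345 is a placeholder):
SELECT pg_try_advisory_lock(12345);  -- true if we got the lock, false if somebody holds it
-- ... let the user work with row 12345 on this connection ...
SELECT pg_advisory_unlock(12345);    -- release it when the user is done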
You can use a trigger to check for the advisory lock and wait for it if you can't make sure your client apps will always get the advisory lock explicitly. However, this can create deadlock issues.
For that reason, the best approach is probably to have a locked_by field that records a user ID and a locked_time field that records when it was locked, maintained at the application level and/or with triggers. To deal with concurrent attempts to obtain the lock, you can use optimistic concurrency control techniques: the WHERE clause of the UPDATE that sets locked_by and locked_time will not match if someone else got there first, so the row count will be zero and you'll know you lost the race for the lock and have to re-check. That WHERE clause usually tests locked_by and locked_time. So you'd write something like:
UPDATE t
SET locked_by = 'me', locked_time = current_timestamp
WHERE locked_by IS NULL AND locked_time IS NULL
AND id = [ID of row to update];
(This is a simplified optimistic locking mode for grabbing a lock, where you don't mind if someone else jumped in and did an entire transaction. If you want stricter ordering, you use a row-version column or you check that a last_modified column hasn't changed.)
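Releasing the application-level lock is the mirror image; testing locked_by ensures you only clear a lock you actually hold (a sketch in the same placeholder style as above):
UPDATE t
SET locked_by = NULL, locked_time = NULL
WHERE id = [ID of row to update]
AND locked_by = 'me';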

Concurrency, Atomicity, and Isolation in Entity Framework

Based on some periodically and concurrently incoming data, I'm performing an operation that will either insert a new row into a table, or update an existing row in the same table. Whether it inserts or updates a row is dependent on the states of the existing rows. So, the result of this operation will be affected by previous runs of this operation, and affect subsequent runs. I need to ensure atomicity/isolation using transactions, or locks, or something. There seems to be so many options and caveats with Entity Framework (and I'm a complete newbie with database stuff in general too) that I have no idea what direction I should be headed. TransactionScope, BeginTransaction, ambient transactions? Serializable or RepeatableRead? SaveChanges and AcceptAllChanges? Do I even need to do anything special? The fact that a new row can be added makes me worry especially about phantom rows, though I barely understand what that means. Any guidance on the subject would be greatly appreciated.
This tutorial may be helpful to you - http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
Quote:
Pessimistic Concurrency (Locking)
If your application does need to prevent accidental data loss in
concurrency scenarios, one way to do that is to use database locks.
This is called pessimistic concurrency. For example, before you read a
row from a database, you request a lock for read-only or for update
access. If you lock a row for update access, no other users are
allowed to lock the row either for read-only or update access, because
they would get a copy of data that's in the process of being changed.
If you lock a row for read-only access, others can also lock it for
read-only access but not for update. Managing locks has some
disadvantages. It can be complex to program. It requires significant
database management resources, and it can cause performance problems
as the number of users of an application increases (that is, it
doesn't scale well). For these reasons, not all database management
systems support pessimistic concurrency. The Entity Framework provides
no built-in support for it, and this tutorial doesn't show you how to
implement it.
Optimistic Concurrency
The alternative to pessimistic concurrency is optimistic concurrency.
Optimistic concurrency means allowing concurrency conflicts to happen,
and then reacting appropriately if they do. For example, John runs the
Departments Edit page, changes the Budget amount for the English
department from $350,000.00 to $100,000.00. (John administers a
competing department and wants to free up money for his own
department.)
There are code examples for both models in the tutorial.