DB2 Read committed without locking? - db2

We have a transaction that is modifying a record. The transaction must call a web service, rolling back the transaction if the service fails (so it can't commit beforehand). Because the record is modified, the client app has a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without seeking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to read_uncommitted isolation level because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting, and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not an isolation level, maybe some modifier I can tack onto my SQL? Anything?

I assume you are talking about JDBC isolation levels, not DB2's. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock acquired on rows that satisfied the predicates, whereas read_committed only keeps a lock until the next row matching the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are there any escalations occurring?
Have you tried CURRENTLY_COMMITTED (assuming you are on 9.7+)?
Before currently committed there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and DB2_SKIPDELETED.
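For what it's worth, here is a minimal JDBC sketch of the reader side (the data source, table and column names are invented, and the usual java.sql imports are assumed). With the connection at READ_COMMITTED and, on 9.7+, the CUR_COMMIT database configuration parameter enabled, a SELECT like this should return the last committed version of the row instead of waiting on the writer's lock:

```java
// Illustrative only: ORDERS, ORDER_ID and dataSource are made-up names.
void readOrder(DataSource dataSource, int orderId) throws SQLException {
    try (Connection con = dataSource.getConnection()) {
        con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED); // CS in DB2 terms
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT STATUS, AMOUNT FROM ORDERS WHERE ORDER_ID = ?")) {
            ps.setInt(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // only committed data is returned; with currently committed
                    // semantics the read does not block on uncommitted writers
                    System.out.println(rs.getString("STATUS") + " " + rs.getBigDecimal("AMOUNT"));
                }
            }
        }
    }
}
```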

The lowest isolation level that reads committed rows is read committed.
Usually, you process rows in a DB2 database like this (a JDBC sketch follows below):
1. Read the database row with no read locks (an ordinary SELECT under read committed).
2. Process the data so you have a row with changed values.
3. Read the database row again, this time with a read lock (SELECT ... FOR UPDATE).
4. Check whether the database row from step 1 matches the row from step 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to step 2.
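A rough JDBC sketch of that loop (the table ITEMS, column QTY and the data source are invented for illustration, and the usual java.sql imports are assumed):

```java
void processRow(DataSource ds, int id) throws SQLException {
    try (Connection con = ds.getConnection()) {
        con.setAutoCommit(false);
        con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

        // 1. read without a lock
        int original = queryQty(con, "SELECT QTY FROM ITEMS WHERE ID = ?", id);
        while (true) {
            // 2. process the data
            int newQty = original + 1;

            // 3. read again, this time holding an update lock
            int current = queryQty(con, "SELECT QTY FROM ITEMS WHERE ID = ? FOR UPDATE", id);

            // 4.-5. if nothing changed in between, apply the update and stop
            if (current == original) {
                try (PreparedStatement ps =
                         con.prepareStatement("UPDATE ITEMS SET QTY = ? WHERE ID = ?")) {
                    ps.setInt(1, newQty);
                    ps.setInt(2, id);
                    ps.executeUpdate();
                }
                con.commit();
                return;
            }
            // 6. someone else changed the row: release the lock and try again
            con.rollback();
            original = current;
        }
    }
}

int queryQty(Connection con, String sql, int id) throws SQLException {
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```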

Related

DB2 for z/OS: CURSOR FOR UPDATE locking behavior

I have a question concerning the FOR UPDATE Clause in CURSORs for IBM DB2 for z/OS.
Assume Isolation Level Cursor Stability (standard parameter in BIND command).
DB2 Version is 11.
My first question is: can a CURSOR that is coded with the FOR UPDATE clause prevent concurrent transactions from reading the row on which the CURSOR is currently positioned?
My second question is: does the UPDATE ... WHERE CURRENT OF ... statement detect when the updated row has been changed after the CURSOR has been opened and before it has been fetched from the CURSOR's result set?
I have read some contradictory statements on the web regarding these questions.
As of my (current) understanding, the FETCH operation only acquires an update lock on the fetched row, so concurrent transactions can at least read the same row. The U-lock is only promoted to an X-lock when the UPDATE WHERE CURRENT OF CURSOR is actually done (depending on application logic). But this confuses me, because it then would not prevent the lost update phenomenon (if the concurrent process is allowed to read the value before the first process's update is done, it continues its processing with the old value and overwrites the update that the first process made via CURRENT OF CURSOR).
Can a cursor that is coded with the FOR UPDATE clause prevent concurrent transactions from reading the row on which the cursor is currently positioned?
No - with isolation level CS, Db2 will hold a U lock on the current row, which is compatible with the S locks potentially required (see later comments about the CURRENTDATA bind parameter and its impact on avoidance of the S lock for readers).
Does the UPDATE ... WHERE CURRENT OF statement detect when the updated row has been changed after the cursor has been opened and before it has been fetched from the cursor's result set?
No - with isolation level CS, Db2 will not acquire a lock until the row is read. If you require the data to remain unchanged after OPEN CURSOR you need a different isolation level.
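To make the pattern under discussion concrete, here is a rough JDBC analogue (table and column names are invented; an updatable ResultSet stands in for the FOR UPDATE cursor and the positioned UPDATE ... WHERE CURRENT OF, and driver support for updatable cursors varies):

```java
// Sketch only: the driver runs this as a FOR UPDATE cursor and performs
// positioned updates. Under CS the fetched row is protected by a U lock,
// which updateRow() promotes to an X lock.
void creditBranch(Connection con) throws SQLException {
    con.setAutoCommit(false);
    try (Statement st = con.createStatement(
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
         ResultSet rs = st.executeQuery(
             "SELECT ID, BALANCE FROM ACCOUNTS WHERE BRANCH = 42")) {
        while (rs.next()) {                                  // FETCH
            rs.updateBigDecimal("BALANCE",
                rs.getBigDecimal("BALANCE").add(java.math.BigDecimal.ONE));
            rs.updateRow();                                  // UPDATE ... WHERE CURRENT OF
        }
    }
    con.commit();
}
```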
But this confuses me, because it then would not prevent the lost update phenomenon (if the concurrent process is allowed to read the value before the first process's update is done, it continues its processing with the old value and overwrites the update that the first process made via CURRENT OF CURSOR).
Assuming both transactions are using FOR UPDATE and UPDATE ... WHERE CURRENT OF this scenario cannot happen. Each read would attempt to acquire a U lock. Since U locks are incompatible with each other the second read would wait on the first U lock to be released. (https://www.ibm.com/docs/en/db2-for-zos/12?topic=locks-lock-modes-compatibility)
For the more complex case where one (or both) of the transactions are not using FOR UPDATE and UPDATE ... WHERE CURRENT OF there are opportunities for the lost update phenomenon to occur.
Long ago, Db2 introduced bind parameter CURRENTDATA to help control this behavior.
CURRENTDATA(NO) (default as of Db2 10) - Attempt lock avoidance where possible but with an increased risk of obtaining non-current data
CURRENTDATA(YES) - Acquire S locks to reduce the risk of obtaining non-current data. It's important to note that CURRENTDATA(YES) does not completely eliminate the risk of non-current data.
Db2 manual - Choosing CURRENTDATA Option
Gareth has some great articles on this with much more detail - Db2 for z/OS Locking for Application Developers Part 8
To completely guard against the risk of losing an update, a good approach is to add predicates to ensure the update only occurs against the expected data. Gareth provides three options for this in Part 9 of his blog on locking. The general idea is to have something like an update timestamp that is always updated when any part of the row is updated. Then include a predicate in the WHERE clause of the UPDATE statement to ensure that the update will only occur if the update timestamp is the same as when the row was originally read. The ROW CHANGE TIMESTAMP feature in Db2 9 makes this approach easier.
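As an illustration of that last idea, a hedged sketch with invented names, assuming the usual java.sql imports, an open connection con, and a table that carries a row change timestamp column (e.g. RCT TIMESTAMP NOT NULL GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP):

```java
// 1. read the row together with its row change timestamp
java.math.BigDecimal balance;
java.sql.Timestamp readTs;
try (PreparedStatement ps = con.prepareStatement(
        "SELECT BALANCE, RCT FROM ACCOUNTS WHERE ID = ?")) {
    ps.setInt(1, 42);
    try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        balance = rs.getBigDecimal(1);
        readTs  = rs.getTimestamp(2);
    }
}

// ... application processing that computes the new balance ...

// 2. update only if the row is still the one we read
try (PreparedStatement ps = con.prepareStatement(
        "UPDATE ACCOUNTS SET BALANCE = ? WHERE ID = ? AND RCT = ?")) {
    ps.setBigDecimal(1, balance.add(java.math.BigDecimal.TEN));
    ps.setInt(2, 42);
    ps.setTimestamp(3, readTs);
    if (ps.executeUpdate() == 0) {
        // somebody changed the row since we read it: re-read and retry
        // instead of silently losing their update
    }
}
```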

In databases, is row level locking an example of ACID, optimistic concurrency, or both?

What happens with simultaneous writes? Also, what happens in a NoSQL database?
I'll ignore the NoSQL part, otherwise I would have to close the question as too unfocused.
Row level locking is a technique that relational databases use to provide isolation, which is the I of ACID. Isolation means that concurrent database sessions are isolated from each other – the database tries to keep them from being influenced by each other's activities.
Specifically, if two concurrent sessions try to modify the same data row, they have to “take turns”: the second one has to wait until the transaction of the first session is done. This wait is usually very short and does not hurt, but it prevents inconsistencies (consistency is the C of ACID).
Row level locking, like locking in general, is a form of pessimistic locking: you lock a row to prevent other sessions from messing with the row while you are working on it. It is done explicitly with SELECT ... FOR UPDATE. It is called “pessimistic” because it reflects a mindset like “I expect someone will try to modify the row while I am working on it, so let's lock it to be sure”.
Optimistic locking is ill-named, because no locks are actually taken. You don't prevent concurrent transactions from modifying the row you are interested in. Instead you check afterwards if the row has been modified by a concurrent transaction or not, and if it has, you try the operation again.
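A minimal sketch of the optimistic variant with a version column (all names invented, assuming JDBC in autocommit mode): the UPDATE only succeeds if the version is still the one we read; otherwise we loop and retry.

```java
void withdrawOptimistic(Connection con, int id, java.math.BigDecimal amount) throws SQLException {
    while (true) {
        int version;
        try (PreparedStatement ps = con.prepareStatement(
                 "SELECT version FROM accounts WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) { rs.next(); version = rs.getInt(1); }
        }
        try (PreparedStatement ps = con.prepareStatement(
                 "UPDATE accounts SET balance = balance - ?, version = version + 1 " +
                 "WHERE id = ? AND version = ?")) {
            ps.setBigDecimal(1, amount);
            ps.setInt(2, id);
            ps.setInt(3, version);
            if (ps.executeUpdate() == 1) {
                return;                      // nobody modified the row in between
            }
        }
        // a concurrent transaction changed the row: try the whole operation again
    }
}
```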

In Postgres, what does pg_stat_database.xact_commit actually mean?

I'm trying to understand SELECT xact_commit FROM pg_stat_database; According to the docs, it is the "Number of transactions in this database that have been committed". But I turned on logging of all queries (log_min_duration_statement = 0) and it seems there are other things besides queries that can affect xact_commit. For example, connecting a psql client or typing BEGIN; will increase it by various values. There is a step in my application that runs a single query (as confirmed by the log), but consistently increases the counter by 15-20. Does anyone know anything more specific about what is counted in xact_commit, or if there is a way to count only actual queries?
pg_stat_database.xact_commit really is the number of commits in the database (remember that every statement that is not run in a transaction block actually runs in its own little transaction, so it will cause a commit).
The mystery that remains to be solved is why you see more commits than statements, which seems quite impossible (for example, BEGIN only starts a transaction, so by itself it cannot increase xact_commit).
The solution is probably that database activity statistics are collected asynchronously: they are sent to the statistics collector process via a UDP socket, and the statistics collector eventually updates the statistics.
So my guess is that the increased transaction count you see is actually from earlier activities.
Try keeping the database absolutely idle for a while and then check again; you should see a slower increase.
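If you want to watch that effect, here is a small sketch (connection URL and credentials are placeholders, JDBC assumed): read the counter, run one statement, give the collector a moment, and read it again. Note that the probe queries themselves are committed transactions and end up in the counter too.

```java
void watchCommitCounter() throws Exception {
    String probe = "SELECT xact_commit FROM pg_stat_database WHERE datname = current_database()";
    try (Connection con = DriverManager.getConnection(
             "jdbc:postgresql://localhost/mydb", "me", "secret");
         Statement st = con.createStatement()) {
        long before;
        try (ResultSet rs = st.executeQuery(probe)) { rs.next(); before = rs.getLong(1); }

        st.execute("SELECT 1");   // one autocommit transaction = one commit, eventually

        Thread.sleep(2000);       // give the asynchronously collected statistics time to arrive

        long after;
        try (ResultSet rs = st.executeQuery(probe)) { rs.next(); after = rs.getLong(1); }
        System.out.println("commits attributed in between: " + (after - before));
    }
}
```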

Making multiple users access to PSQL database

I'm a rookie in this topic; all I ever did was make a connection to a database for one user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program for recording when workers arrive and leave. The database will be on the main server, and all I made was one user while I was programming/testing the program. My question is: can multiple remote locations connect to the database as one user (there should be no collisions, because they are all writing different data, but to the same tables), and if that's not the case, what should I do?
Good relational databases handle this quite well: it is the “I” in the so-called ACID properties of transactions in relational databases, and it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on results of previous statements in indirect ways. This assumption can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways. The discussion when to use which one exceeds the scope of this answer.
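For the third approach, the retry loop typically looks something like the following sketch (invented names, usual java.sql imports assumed; 40001 is the standard SQLSTATE for serialization failures):

```java
void runWithRetry(DataSource ds) throws SQLException {
    for (;;) {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try {
                // ... the actual work of the transaction goes here ...
                con.commit();
                return;
            } catch (SQLException e) {
                con.rollback();
                if (!"40001".equals(e.getSQLState())) {
                    throw e;                 // not a serialization failure: re-throw
                }
                // serialization failure: loop and run the whole transaction again
            }
        }
    }
}
```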
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a certain order (e.g., always update the account with the lower account number first, as sketched below).
When you encounter a deadlock, your code has to retry the transaction.
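A sketch of the first option (invented names, JDBC assumed): both concurrent transfers lock the two account rows in the same order, so they can never hold one row each while waiting for the other.

```java
void transfer(Connection con, int from, int to, java.math.BigDecimal amount) throws SQLException {
    con.setAutoCommit(false);
    int first  = Math.min(from, to);
    int second = Math.max(from, to);
    try (PreparedStatement lock = con.prepareStatement(
             "SELECT id FROM accounts WHERE id = ? FOR UPDATE")) {
        for (int id : new int[] { first, second }) {   // fixed locking order
            lock.setInt(1, id);
            try (ResultSet rs = lock.executeQuery()) {
                rs.next();                             // row is now locked until commit
            }
        }
    }
    try (PreparedStatement upd = con.prepareStatement(
             "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
        upd.setBigDecimal(1, amount.negate());
        upd.setInt(2, from);
        upd.executeUpdate();
        upd.setBigDecimal(1, amount);
        upd.setInt(2, to);
        upd.executeUpdate();
    }
    con.commit();
}
```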
Contrary to common belief, a deadlock is not necessarily a bug.
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

mongo save documents in monotically increasing sequence

I know mongo docs provide a way to simulate auto_increment.
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
But it is not concurrency-proof as guaranteed by say MySQL.
Consider the sequence of events:
client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1
In this case, it is possible to save a doc with id less than the current max that is already saved. For MySQL, this can never happen since the auto increment id is assigned by the server.
How do I prevent this? One way is to do optimistic looping at each client, but for many clients, this will result in heavy contention. Any other better way?
The use case for this is to ensure the id is "forward-only". This is important for, say, a chat room where many messages are posted and paginated; I do not want new messages to be inserted into a previous page.
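For context, the counter pattern from the linked tutorial looks roughly like this with the MongoDB Java driver (collection and field names are illustrative; the Filters, Updates, FindOneAndUpdateOptions and ReturnDocument helpers from com.mongodb.client.model are assumed to be imported). Each call atomically increments the counter and returns the new value, but nothing stops a client from inserting its document only after a later value has already been used by someone else, which is exactly the race described above.

```java
long nextMessageId(MongoCollection<Document> counters) {
    Document counter = counters.findOneAndUpdate(
            Filters.eq("_id", "messages"),
            Updates.inc("seq", 1L),
            new FindOneAndUpdateOptions()
                    .upsert(true)
                    .returnDocument(ReturnDocument.AFTER));
    return counter.getLong("seq");      // the id this client should use
}
```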
But it is not concurrency-proof as guaranteed by say MySQL.
That depends on the definition of concurrency-proof, but let's see.
In this case, it is possible to save a doc with id less than the current max that is already saved.
That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?
For MySQL, this can never happen since the auto increment id is assigned by the server.
That sounds like folklore. Most relational dbs offer fine-grained control over these features, since strict guarantees severely impact concurrency.
MySQL neither guarantees that there are no gaps, nor that a transaction with a high AUTO_INCREMENT id isn't visible to other readers before a transaction that acquired a lower AUTO_INCREMENT value has committed, unless you keep a table-level lock, which severely impacts concurrency.
For gaplessness, consider a transaction rollback of the first of two concurrent inserts. Does the second insert now get a new id assigned while it's being committed? No - from the InnoDB documentation:
You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")
and
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”
Also, you're completely ignoring the problem of replication, where sequences lead to even more trouble:
Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")
The sheer length of the documentation of the InnoDB behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because we can't even be sure of the counter value in partition mode...