what is visibility check in index scan - postgresql

I am looking up query optimization in Postgres.
I don't understand this statement:
Index scans involve random disk access and still have to read the underlying data blocks for visibility checks.
what does "visibility check" mean here?

PostgreSQL uses a technique called Multi-Version Concurrency Control (MVCC) to manage concurrent access to data. A row is not visible until the transaction that inserted it commits. Otherwise, the row is silently ignored by other transactions so they don't see it (except in special cases such as explicit locks or higher isolation levels).
What this means is that PostgreSQL must check the transaction IDs stored on the actual rows to decide whether each row is visible to the querying transaction. Since 9.2 (iirc), PostgreSQL can skip this heap check when the visibility map says all tuples on a page are visible to every transaction; otherwise it has to check row by row.
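You can see this with EXPLAIN: an "Index Only Scan" that still reports heap fetches had to visit the table pages for visibility checks. A minimal sketch, assuming a made-up table (the actual plan depends on your data and version):

    -- hypothetical table; VACUUM populates the visibility map
    CREATE TABLE events (id bigint PRIMARY KEY, payload text);
    VACUUM events;

    EXPLAIN (ANALYZE)
    SELECT id FROM events WHERE id BETWEEN 100 AND 200;
    -- Look for "Index Only Scan" and "Heap Fetches: 0" in the output.
    -- A nonzero "Heap Fetches" count means the visibility check still had to
    -- read the underlying table blocks for those rows.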

Related

In databases, is row level locking an example of ACID, optimistic concurrency, or both?

What about simultaneous writes?
Also, what happens in a NoSQL database?
I'll ignore the NoSQL part, otherwise I would have to close the question as too unfocused.
Row level locking is a technique that relational databases use to provide isolation, which is the I of ACID. Isolation means that concurrent database sessions are isolated from each other – the database tries to keep them from being influenced by each other's activities.
Specifically, if two concurrent sessions try to modify the same data row, they have to “take turns”: the second one has to wait until the transaction of the first session is done. This wait is usually very short and does not hurt, but it prevents inconsistencies (consistency is the C of ACID).
Row level locking, and locking in general, is a form of pessimistic locking: you lock a row to prevent other sessions from messing with it while you are working on it. Explicitly, this is done with SELECT ... FOR UPDATE. It is called “pessimistic” because it reflects a mindset like “I expect someone will try to modify the row while I am working on it, so let's lock it to be sure”.
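For illustration, a minimal sketch of the pessimistic pattern (the accounts table and its columns are made up):

    BEGIN;
    -- takes a row lock; concurrent writers on this row now have to wait
    SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    COMMIT;  -- the row lock is released here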
Optimistic locking is ill-named, because no locks are actually taken. You don't prevent concurrent transactions from modifying the row you are interested in. Instead you check afterwards if the row has been modified by a concurrent transaction or not, and if it has, you try the operation again.
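And a minimal sketch of the optimistic pattern, assuming you maintain a version column yourself:

    -- 7 is the version number this session read earlier
    UPDATE accounts
       SET balance = 400, version = version + 1
     WHERE id = 42 AND version = 7;
    -- if this reports "0 rows updated", a concurrent transaction changed the row
    -- in the meantime: re-read the row and retry the whole operation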

mongo save documents in monotonically increasing sequence

I know the MongoDB docs provide a way to simulate auto_increment.
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
But it is not concurrency-proof as guaranteed by say MySQL.
Consider the sequence of events:
client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1
In this case, it is possible to save a doc with id less than the current max that is already saved. For MySql, this can never happen since auto increment id is assigned by the server.
How do I prevent this? One way is to do optimistic looping at each client, but for many clients, this will result in heavy contention. Any other better way?
The use case for this is to ensure the id is "forward-only". This is important for, say, a chat room where many messages are posted and paginated: I do not want new messages to be inserted into a previous page.
But it is not concurrency-proof as guaranteed by say MySQL.
That depends on the definition of concurrency-proof, but let's see
In this case, it is possible to save a doc with id less than the current max that is already saved.
That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?
For MySql, this can never happen since auto increment id is assigned by the server.
That sounds like folklore. Most relational databases offer fine-grained control over these features, since strict guarantees severely impact concurrency.
Unless you hold a table-level lock, which severely impacts concurrency, MySQL guarantees neither that there are no gaps, nor that a transaction with a higher AUTO_INCREMENT id becomes visible to other readers only after every transaction that acquired a lower AUTO_INCREMENT value has committed.
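To make the second point concrete, an illustrative two-session sketch against a hypothetical msgs table with an AUTO_INCREMENT primary key:

    -- session A
    START TRANSACTION;
    INSERT INTO msgs (txt) VALUES ('a');   -- allocates id 1, not yet committed

    -- session B (autocommit)
    INSERT INTO msgs (txt) VALUES ('b');   -- allocates id 2 and commits immediately

    -- a reader now sees id 2 but not id 1; when session A commits later,
    -- an id below the already-visible maximum appears "after" it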
For gaplessness, consider the first of two concurrent inserts being rolled back. Does the second insert get a new, lower id assigned while it is being committed? No, from the InnoDB documentation:
You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")
and
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”
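A minimal sketch of such a gap (hypothetical table, default InnoDB settings assumed):

    CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;

    START TRANSACTION;
    INSERT INTO t (v) VALUES (1);   -- consumes id 1
    ROLLBACK;                       -- id 1 is "lost"; it is not handed back to the counter

    INSERT INTO t (v) VALUES (2);   -- gets id 2: the stored sequence starts at 2, with a gap at 1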
Also, you're completely ignoring the problem of replication, where sequences lead to even more trouble:
Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")
The sheer length of the InnoDB documentation on this behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because during a network partition we can't even be sure of the counter's current value...

DB2 Read committed without locking?

We have a transaction that is modifying a record. The transaction must call a web service, rolling back the transaction if the service fails (so it can't commit beforehand). Because the record is modified, the client app has a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without taking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to read_uncommitted isolation level because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting, and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not an isolation level, maybe some modifier I can tack onto my SQL? Anything?
I assume you are talking about JDBC isolation levels, not DB2 ones. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock acquired on rows that satisfied the predicates, whereas read_committed only keeps a lock until the next row that matches the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are there any lock escalations occurring?
Have you tried currently committed semantics (assuming you are on DB2 9.7+)?
Before currently committed, there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and DB2_SKIPDELETED.
The lowest isolation level that reads committed rows is read committed.
Usually, you process rows in a DB2 database like this:
1. Read the database row with no read locks (an ordinary SELECT under read committed).
2. Process the data so you have a row with changed values.
3. Read the database row again, this time with an update lock (SELECT ... FOR UPDATE).
4. Check that the database row read in step 1 matches the database row read in step 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to step 2.
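A hedged SQL sketch of that sequence (table and column names are made up; the exact locking clause varies by platform):

    -- step 1: plain read; under read committed / cursor stability no lock is kept afterwards
    SELECT qty, price FROM orders WHERE id = 42;

    -- step 2: compute the new values in the application

    -- step 3: read the row again, this time holding an update lock
    SELECT qty, price FROM orders WHERE id = 42 FOR UPDATE;

    -- steps 4 and 5: if the values still match what step 1 returned, apply the change
    UPDATE orders SET qty = 10, price = 9.99 WHERE id = 42;
    COMMIT;

    -- step 6: if they no longer match, ROLLBACK (releasing the update lock)
    -- and start over from step 2 with the freshly read values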

postgresql concurrent queries as stored procedures

I have two stored procedures that interact with the same tables.
The first executes for several hours and the second one is instant.
So if I run the first one, and after that the second one (on a second connection), the second procedure will wait for the first one to end.
It is harmless for my data if both run at the same time; how can I make that happen?
The fact that the shorter query is blocked while being on a second connection suggests that the longer query is getting an exclusive lock on the table during the query.
That suggests it is doing writes; if they were both reads there shouldn't be any locking issues. PgAdmin can show what locks are active during the longer query, and also whether the shorter query is indeed blocked on the longer one.
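The same information can also be queried directly from the PostgreSQL system views (column names vary slightly by version):

    -- sessions and what they are currently doing or waiting on
    SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle';

    -- lock requests that have not been granted, i.e. sessions stuck behind someone else's lock
    SELECT locktype, relation::regclass AS relation, mode, granted, pid
    FROM pg_locks
    WHERE NOT granted;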
If the longer query is indeed doing writes, it's possible that you may be able to reduce the lock contention, for example by chunking it, which could allow readers in between chunked updates/inserts; but if it's an operation that requires an exclusive write lock, then it will block everybody until it's done.
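A rough sketch of the chunking idea, assuming a made-up table and flag column:

    -- run this repeatedly, committing between runs, until it updates 0 rows;
    -- readers and the short procedure can get in between the chunks
    UPDATE big_table
    SET processed = true
    WHERE id IN (
        SELECT id FROM big_table
        WHERE processed IS NOT true
        LIMIT 10000
    );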
It's also possible that you may be able to optimize the query such that it needs to be a lower-level lock that isn't exclusive, but that would all depend on the specifics of what the query is doing and your data.

Concurrency, Atomicity, and Isolation in Entity Framework

Based on some periodically and concurrently incoming data, I'm performing an operation that will either insert a new row into a table or update an existing row in the same table. Whether it inserts or updates a row depends on the state of the existing rows, so the result of this operation will be affected by previous runs and will affect subsequent runs. I need to ensure atomicity/isolation using transactions, or locks, or something.
There seem to be so many options and caveats with Entity Framework (and I'm a complete newbie with database stuff in general too) that I have no idea what direction I should be headed: TransactionScope, BeginTransaction, ambient transactions? Serializable or RepeatableRead? SaveChanges and AcceptAllChanges? Do I even need to do anything special? The fact that a new row can be added makes me worry especially about phantom rows, though I barely understand what that means. Any guidance on the subject would be greatly appreciated.
This tutorial may be helpful to you - http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
Quote:
Pessimistic Concurrency (Locking)
If your application does need to prevent accidental data loss in concurrency scenarios, one way to do that is to use database locks. This is called pessimistic concurrency. For example, before you read a row from a database, you request a lock for read-only or for update access. If you lock a row for update access, no other users are allowed to lock the row either for read-only or update access, because they would get a copy of data that's in the process of being changed. If you lock a row for read-only access, others can also lock it for read-only access but not for update.
Managing locks has some disadvantages. It can be complex to program. It requires significant database management resources, and it can cause performance problems as the number of users of an application increases (that is, it doesn't scale well). For these reasons, not all database management systems support pessimistic concurrency. The Entity Framework provides no built-in support for it, and this tutorial doesn't show you how to implement it.
Optimistic Concurrency
The alternative to pessimistic concurrency is optimistic concurrency. Optimistic concurrency means allowing concurrency conflicts to happen, and then reacting appropriately if they do. For example, John runs the Departments Edit page, changes the Budget amount for the English department from $350,000.00 to $100,000.00. (John administers a competing department and wants to free up money for his own department.)
There are code examples for both models in the tutorial.
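At the SQL level, the optimistic model the tutorial describes boils down to an UPDATE guarded by the concurrency token. A hedged sketch (table and column names are illustrative; EF generates the actual statement for you once a rowversion/timestamp property is configured as a concurrency token):

    UPDATE Departments
    SET Budget = 100000.00
    WHERE DepartmentID = 1 AND RowVersion = @originalRowVersion;
    -- 0 rows affected means another user changed the row first,
    -- which EF surfaces as a concurrency exception you can handle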