Do writes in SNAPSHOT isolation level block writes in another SNAPSHOT transaction in SQL Server 2008 R2

For the SNAPSHOT isolation level in SQL Server 2008 R2, the following is mentioned in the MSDN ADO.NET documentation:
Transactions that modify data do not block transactions that read data, and transactions that read data do not block transactions that write data, as they normally would under the default READ COMMITTED isolation level in SQL Server.
There is no mention of whether writes will block writes when both transactions are in SNAPSHOT isolation mode. So my question is as follows:
Will writes in SNAPSHOT transaction 1 block writes to the same tables in another SNAPSHOT transaction 2?
LATEST UPDATE
After doing a lot of thinking on my question, I have come to the conclusion outlined in the paragraph below. I hope others can throw more light on this.
There is no relational database in which writes do NOT block writes. In other words, writes will always block writes. Writes include statements like INSERT, UPDATE, and DELETE. This is true no matter which isolation level you use, since every relational database needs to maintain data consistency when multiple writes are happening in the database. Of course, the simultaneous writes need to be conflicting (as in updating the same row(s), or inserting conflicting rows into the same table) for this blocking to occur.

Ligos is actually incorrect - if two separate transactions are trying to update the same record with SNAPSHOT isolation on, transaction 2 WILL be blocked until transaction 1 releases the lock. Then, and ONLY then, will you get error 3960. I realize this thread is over 2 years old, but I wanted to avoid misinformation being out there.
Even the link Ligos references says the exact same thing I am mentioning above (check out the last non-red paragraph).
A write will not block another write only if the two records (i.e. rows) being updated are different.

No. They will not block. Instead, the UPDATE command in trans2 will fail with error number 3960.
Because of how SNAPSHOT isolation level works, any UPDATE command may fail. The only way you can tell is to catch and handle error 3960 (it is called optimistic concurrency because you don't expect this situation to happen very often).
I ended up testing this empirically, because it's not entirely obvious from the documentation. This blog post illustrates it nicely though.
Assumption: both trans1 and trans2 are updating the same row in the same table. Updating two different rows should work just fine.
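A minimal sketch of that conflict, assuming a hypothetical dbo.Accounts table and a database with ALLOW_SNAPSHOT_ISOLATION turned on:
-- Session 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountId = 1;
-- transaction left open

-- Session 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance + 20 WHERE AccountId = 1;
-- blocks until session 1 commits or rolls back; if session 1 commits,
-- this UPDATE then fails with error 3960 (update conflict), otherwise it proceeds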

Related

does sql statement ensure atomicity in postgres

I have a simple bug in my program that supports multiple users. I'm using knex to build SQL queries, and I have some pseudocode that depicts the scenario:
const value = queryBuilder().readDataFromTheDatabase();//executes this
//do some other work and get value
queryBuilder.writeValueToTheDatabase(updateValue(value));
This piece of code is being used in a sort of middleware function. As you can see, this is a possible race condition: when multiple users execute this at roughly the same time, one of them ends up working with a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So, I'll probably have to use a little bit of PL/pgSQL. I was wondering if this solution will be sufficient. Will the statement be executed atomically? That is, when one request has read but not yet finished its write, does another request wait for both the read and the write, or does it only wait to write while still reading the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
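As a rough sketch of that approach (the settings table, its id and value columns, and the new value 42 are all made up for illustration):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT value FROM settings WHERE id = 1;      -- read the current value
-- ... compute the new value in the application ...
UPDATE settings SET value = 42 WHERE id = 1;  -- write it back
COMMIT;  -- may fail with a serialization failure (SQLSTATE 40001); if so, retry the whole transaction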
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
Use the default READ COMMITTED isolation level and lock the rows when you read them:
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
Use the REPEATABLE READ isolation level and don't lock anything. Then your UPDATE will receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
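Here is a minimal sketch of the first option (the settings table and the value 42 are made up for illustration): the locked read and the write run in one transaction, so a concurrent writer has to wait until the COMMIT.
BEGIN;  -- default READ COMMITTED
SELECT value FROM settings WHERE id = 1 FOR NO KEY UPDATE;  -- row stays locked against concurrent writes
-- ... compute the new value in the application ...
UPDATE settings SET value = 42 WHERE id = 1;
COMMIT;  -- the row lock is released here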
Transactions in PostgreSQL use an optimistic locking model when accessing tables, while some other DBMSs do pessimistic locking (IBM Db2) or support both locking models (MS SQL Server).
Optimistic locking takes a snapshot of the data you are working on, and the modifications are made against that snapshot until the transaction ends. When the transaction finishes, the snapshot modifications are applied to the real database (table rows), but if some other user has made a change between the moment the snapshot was taken and the commit, then the commit cannot be applied and the COMMIT is rejected with a ROLLBACK.
You can raise the ISOLATION LEVEL (REPEATABLE READ or SERIALIZABLE) to avoid this problem.

How do transactions work in the context of reads to the database?

I am using transactions to make changes to a SQL database. As I understand it, this means that changes to the database will happen in an all-or-nothing fashion. What I want to know is, does this have any guarantees for reads? For example, suppose I have some (pseudo)-code like this:
1) start TRANSACTION
2) INSERT INTO users ... // insert some data
3) count = SELECT COUNT(*) FROM ... // count something in the database
4) if count > 10: // do something based on the read
5) INSERT INTO other_table ... // write based on the read
6) COMMIT TRANSACTION
In this code, I'm doing an INSERT, followed by a SELECT, and then conditionally doing another INSERT based on the outcome of the SELECT.
So my question is, if another process modifies the database between steps (3) and (5), what happens to the count variable, and to my transaction?
If it makes a difference, I am using PostgreSQL.
As Xin pointed out, it depends on the isolation level.
At the default READ COMMITTED level, records from other sessions will become visible as they are committed; you would see the same records if you didn't start a transaction at all (though of course, other processes would see your inserts appear at different times).
With REPEATABLE READ, your queries will not see any records committed by other sessions after your transaction starts. But while you don't have to worry about the result of SELECT COUNT(*) changing during your transaction, you can't assume that this result will still be accurate by the time you commit.
Using SERIALIZABLE provides the strongest guarantee: if your script does the right thing when given exclusive access to the database, then it will do the right thing in the presence of other serialisable transactions (or it will fail outright). However, this means that all transactions which might interfere with yours must be using the same isolation level (which comes at a cost), and all must be prepared to retry their transaction in the event of a serialisation failure.
When serialisable transactions are not an option, you generally guard against race conditions by explicitly locking things against concurrent writes. It's often enough to lock a selection of records, but you can't exactly lock the result of a COUNT(*); in your case, you'd probably need to lock the whole table.
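A rough sketch of that table-lock approach, using the table names from the pseudocode (the inserted column and value are made up); SHARE ROW EXCLUSIVE blocks concurrent writers but still allows plain readers:
BEGIN;
LOCK TABLE users IN SHARE ROW EXCLUSIVE MODE;  -- concurrent writers now wait until our COMMIT
INSERT INTO users (name) VALUES ('alice');     -- hypothetical column and value
SELECT COUNT(*) FROM users;                    -- the count can no longer be changed under us
-- if count > 10: INSERT INTO other_table ...
COMMIT;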
I am not working on PostgreSQL, but I think I can answer your question. Think of the queries as running in parallel. I say so because there are two transactions: while you insert into a, others can insert into b; then, when you check b, whether you can see the new data depends on your isolation setting (READ COMMITTED or dirty read).
Also please note that, in databases, there is a mechanism called locking: you can lock a table to prevent others from altering it before you commit your transaction.
Hope this helps.

What's the difference between issuing a query with or without a "begin" and "commit" command in PostgreSQL?

As the title says, it is possible to issue a query in psql with a "begin", the query, and a "commit".
What I want to know is what happens if I don't use a "begin" command?
Some database engines will allow you to execute modifications (INSERT, UPDATE, DELETE) without an open transaction. It's basically assumed that there is an implicit BEGIN / COMMIT around each of your statements, which is a bad practice in case something goes wrong in a batch of many statements.
You can still run a SELECT, but not an INSERT, UPDATE, or DELETE without a BEGIN, which enforces good practice. That way, if something goes wrong, a ROLLBACK is instantly executed, cancelling all your modifications as if they never existed.
Using a transaction around a batch of SELECTs will guarantee that the data you get for each SELECT matches the same version of the database as of the instant you opened the transaction, depending on your ISOLATION level.
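For example, a minimal sketch with a made-up orders table, where every read in the transaction sees the same snapshot:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders;     -- the snapshot is taken by the first query
SELECT SUM(amount) FROM orders;  -- sees exactly the same data as the COUNT above
COMMIT;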
Please read this for more information:
http://www.postgresql.org/docs/9.5/static/sql-start-transaction.html
and
http://www.postgresql.org/docs/9.5/static/tutorial-transactions.html
If you don't use BEGIN/COMMIT, it's the same as wrapping each individual query in a BEGIN/COMMIT block. You can use BEGIN/COMMIT to group multiple queries into a single transaction. A few reasons you might want to do so include
Updating multiple tables at the same time. For instance, usually when you delete a record you also want to delete other rows that reference it. If you do this in the same transaction, nothing will ever be able to reference a row that's already been deleted.
You want to be able to revert some changes if something goes wrong later. Suppose you're writing some user inputted data to multiple tables. At some point you realize that some of it isn't formatted properly. You probably wouldn't want to insert any of it, so you should wrap the entire operation in a transaction.
If you want to ensure the data you're updating hasn't been updated while you're writing to it. Suppose I'm adding $10 to a bank account from two separate connections. I want to add $20 in total - I don't want one of the UPDATEs to clobber the other.
Postgres gives you the first two of these by default. The last one would require a higher transaction isolation level, and makes your query run the risk of raising a serialization error. Transaction isolation levels are a fairly complicated topic, so if you want more info on them the best place to go is the documentation.
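As a small sketch of the first point in that list (posts and comments are hypothetical tables): both deletes succeed or neither does, and no reader ever sees a comment whose post is already gone.
BEGIN;
DELETE FROM comments WHERE post_id = 1;  -- remove the rows that reference the post
DELETE FROM posts WHERE id = 1;          -- then the post itself
COMMIT;  -- both deletes become visible to other sessions at the same instant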

Why does 'read uncommitted' isolation level allow locks?

I put a breakpoint in my code to pause the execution before the transaction is committed or rolled back. Then I'd like to see the current state of the database, but when I set the transaction isolation level to READ UNCOMMITTED in SSMS and run a query against the tables affected by the paused transaction, the query gets blocked and waits until the transaction is finished.
Why does this happen, and is it possible to disable locking?
My crystal ball told me that the transaction that you'd paused had made schema modifications.
Such modifications take out [schema modification locks (Sch-M)](https://technet.microsoft.com/en-us/library/ms175519%28v=sql.105%29.aspx?f=255&MSPPError=-2147217396):
This means the Sch-M lock blocks all outside operations until the lock is released.
And this includes even being able to compile your query that uses READ UNCOMMITTED, because:
The Database Engine uses schema stability (Sch-S) locks when compiling and executing queries
Which makes some sense - schema modifications could include adding or removing columns so other queries can't know what the current layout of data on disk/in memory actually means.
Even for your case where all you've done is disabled constraints, the optimizer would usually make use of constraint information when planning a query - e.g. whether a check constraint can actually be trusted or not.
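As an illustrative sketch (the dbo.Orders table is made up), this is the kind of pattern that reproduces the blocking, plus a way to inspect the Sch-M lock from a third session:
-- Session 1: paused mid-transaction after a schema modification
BEGIN TRANSACTION;
ALTER TABLE dbo.Orders NOCHECK CONSTRAINT ALL;  -- takes a Sch-M lock on the table
-- (breakpoint here: the Sch-M lock is held until COMMIT or ROLLBACK)

-- Session 2: even READ UNCOMMITTED needs a Sch-S lock to compile, so it waits
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.Orders;

-- Session 3: inspect the blocking lock
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT';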

Lock and transaction in postgres that should block a query

Let's assume in SQL window 1 I do:
-- query 1
BEGIN TRANSACTION;
UPDATE post SET title = 'edited' WHERE id = 1;
-- note that there is no explicit commit
Then from another window (window 2) I do:
-- query 2
SELECT * FROM post WHERE id = 1;
I get:
1 | original title
Which is fine, as the default isolation level is READ COMMITTED, and because query 1 has not been committed, the change it performs is not visible until I explicitly commit from window 1.
In fact if I, in window 1, do:
COMMIT TRANSACTION;
I can then see the change if I re-run query 2.
1 | edited
My question is:
Why does query 2 return fine the first time I run it? I was expecting it to block, since the transaction in window 1 was not committed yet and the lock placed on the row with id = 1 was (or should be) an unreleased exclusive lock that should block a read like the one performed in window 2. Everything else makes sense to me, but I was expecting the SELECT to get stuck until an explicit commit in window 1 was executed.
The behaviour you describe is normal and expected in any transactional relational database.
If PostgreSQL showed you the value edited for the first SELECT it'd be wrong to do so - that's called a "dirty read", and is bad news in databases.
PostgreSQL would be allowed to wait at the SELECT until you committed or rolled back, but it isn't required to by the SQL standard, you haven't told it you want to wait, and it doesn't have to wait for any technical reason, so it returns the data you asked for immediately. After all, until it's committed, that update only kind-of exists - it still might or might not happen.
If PostgreSQL always waited here, then you'd quickly land up with a situation where only one connection could be doing anything with the database at a time. Not pretty for performance, and totally unnecessary the vast majority of the time.
If you want to wait for a concurrent UPDATE (or DELETE), you'd use SELECT ... FOR SHARE. (But be aware that this won't work for INSERT).
Details:
SELECT without a FOR UPDATE or FOR SHARE clause does not take any row level locks. So it sees whatever is the current committed row, and is not affected by any in-flight transactions that might be modifying that row. The concepts are explained in the MVCC section of the docs. The general idea is that PostgreSQL is copy-on-write, with versioning that allows it to return the correct copy based on what the transaction or statement could "see" at the time it started - what PostgreSQL calls a "snapshot".
In the default READ COMMITTED isolation, snapshots are taken at the statement level, so if you SELECT a row, COMMIT a change to it from another transaction, and SELECT it again, you'll see different values even within one transaction. You can use REPEATABLE READ isolation (snapshot isolation) if you don't want to see changes committed after the transaction begins, or SERIALIZABLE isolation to add further protection against certain kinds of transaction interdependencies.
See the transaction isolation chapter in the documentation.
If you want a SELECT to wait for in-progress transactions to commit or rollback changes to rows being selected, you must use SELECT ... FOR SHARE. This will block on the lock taken by an UPDATE or DELETE until the transaction that took the lock rolls back or commits.
INSERT is different, though - the tuples just don't exist to other transactions until commit. The only way to wait for concurrent INSERTs is to take an EXCLUSIVE table-level lock, so you know nobody else is changing the table while you read it. Usually the need to do that means you have a design problem in the application though - your app should not care if there are uncommitted inserts still in flight.
See the explicit locking chapter of the documentation.
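To make the FOR SHARE behaviour concrete, here is a sketch using the post table from the question:
-- Window 1: uncommitted update, as in the question
BEGIN;
UPDATE post SET title = 'edited' WHERE id = 1;
-- no COMMIT yet

-- Window 2: unlike the plain SELECT, this one does wait
SELECT * FROM post WHERE id = 1 FOR SHARE;
-- blocks until window 1 commits or rolls back, then returns the committed row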
In PostgreSQL's MVCC implementation, the principle is that reading does not block writing and vice versa. Quoting the manual:
The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.
Each transaction only sees (mostly) what has been committed before the transaction began.
That does not mean there'd be no locking. Not at all. For many operations various kinds of locks are acquired. And various strategies are applied to resolve possible conflicts.
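If you want to see those locks for yourself, the standard pg_locks and pg_stat_activity system views can be joined; for example:
-- show relation-level locks and any ungranted (waiting) lock requests
SELECT a.pid, a.query, l.locktype, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted OR l.locktype = 'relation';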