I found the following description for the Serializable (IsolationLevel.Serializable) isolation level in the MSDN documentation:
Volatile data can be read but not modified, and no new data can be added during the transaction.
(Reference)
And on the same page volatile data is defined as:
The data affected by a transaction is called volatile.
My question is: how can I prevent other transactions from reading volatile data and also prevent them from adding any new data?
Thank you very much.
I think this is the highest isolation level you can get. According to this link, it should be enough for your needs.
SERIALIZABLE specifies the following: Statements cannot read data that has been modified but not yet committed by other transactions. No other transactions can modify data that has been read by the current transaction until the current transaction completes. Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
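For illustration only (the question is about .NET's TransactionScope, but the same isolation level can be requested by any client), here is a rough sketch of starting a SERIALIZABLE transaction against SQL Server with the Node.js mssql package; the table and column names are made up:

import * as sql from "mssql";

async function readRangeSerializably(pool: sql.ConnectionPool) {
  const tx = new sql.Transaction(pool);
  // SERIALIZABLE takes range locks, so no other transaction can insert rows
  // whose keys fall into the ranges this transaction reads, until it completes.
  await tx.begin(sql.ISOLATION_LEVEL.SERIALIZABLE);
  try {
    const result = await new sql.Request(tx)
      .query("SELECT * FROM Orders WHERE CustomerId BETWEEN 1 AND 100");
    // ... work with result.recordset; the key range stays protected until commit ...
    await tx.commit();
    return result.recordset;
  } catch (err) {
    await tx.rollback();
    throw err;
  }
}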
Suppose I do the following list of operations in MongoDB
Start a session
Start a transaction for that session
run an insert command with a new document
run a find command on the collection of the inserted document
Commit the transaction
End the session
I understand that outside the transaction the insert done in the third step will not be visible until the transaction is committed, but what about within the transaction: will the find run in the fourth step see this new document, or will it not?
Yes, a find inside the transaction sees a document inserted by an earlier insert in the same transaction. You can rely on the read-your-own-writes property.
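For illustration, a minimal sketch with the Node.js driver (the collection, the document, and the connection handling are made up; transactions require a replica set or sharded cluster):

import { MongoClient } from "mongodb";

async function insertThenFind(client: MongoClient) {
  const coll = client.db("test").collection("accounts");
  const session = client.startSession();        // 1) start a session
  try {
    session.startTransaction();                 // 2) start a transaction
    await coll.insertOne({ name: "alice", balance: 100 }, { session });  // 3) insert
    // 4) a find in the same session/transaction DOES see the uncommitted insert
    const doc = await coll.findOne({ name: "alice" }, { session });
    console.log(doc);                           // the document inserted above
    await session.commitTransaction();          // 5) only now do other sessions see it
  } catch (err) {
    await session.abortTransaction();
    throw err;
  } finally {
    await session.endSession();                 // 6) end the session
  }
}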
Every time a transaction is started, a new snapshot is created. Outside the transaction the snapshot is invisible; this is accomplished by using (and, if your transaction involves many updates, abusing) the WiredTiger cache. This cache is structured as a tree1 in which each transactional operation is represented as a new update block, which can in turn be chained to another update block.
Outside operations only see the non-transactional tree entries, while transactional operations see all the entries added before the snapshot was taken plus the update blocks for the given transaction.
I am aware that this is a very brief explanation of how MongoDB manages transaction atomicity, but if you are interested in understanding more, I suggest you read a report I have written. In the same repository you can find some scenarios covering the most typical doubts.
1: the tree structure described above is taken from Aly Cabral's presentation about How and when to use Multi-document transactions
I have a bug in my program related to multi-user support. I'm using knex to build SQL queries, and here is some pseudocode that depicts the scenario:
const value = await queryBuilder().readDataFromTheDatabase(); // executes first
// do some other work and derive the new value from `value`
await queryBuilder().writeValueToTheDatabase(updateValue(value));
This piece of code is used in a sort of middleware function. As you can see, there is a possible race condition: when multiple users execute this at roughly the same time, one of them gets a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So I'll probably have to use a little bit of PL/pgSQL. I was wondering whether this solution will be sufficient. Will the statement be executed atomically? That is, when one request has read but not yet finished its write, does another request wait before both reading and writing, or does it only wait before writing and still read the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
Use the default READ COMMITTED isolation level and lock the rows when you read them:
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
Use the REPEATABLE READ isolation level and don't lock anything. Then your UPDATE will receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
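Since the question uses knex, here is a rough sketch of both approaches (table, column, and function names are invented, the isolation level is set with a raw statement, and the error handling assumes the PostgreSQL SQLSTATE is exposed on err.code):

import { knex } from "knex";

const db = knex({ client: "pg", connection: process.env.DATABASE_URL });

// Approach 1: default READ COMMITTED + a row lock taken at read time.
// knex's forUpdate() issues FOR UPDATE, a slightly stronger lock than
// FOR NO KEY UPDATE; use a raw query if you want exactly that clause.
async function readModifyWriteWithLock(id: number) {
  return db.transaction(async (trx) => {
    const row = await trx("counters").where({ id }).forUpdate().first();
    const next = row.value + 1;                      // "do some other work"
    await trx("counters").where({ id }).update({ value: next });
    return next;
  });
}

// Approach 2: REPEATABLE READ, no explicit lock, retry on serialization failure (40001).
async function readModifyWriteWithRetry(id: number) {
  for (;;) {
    try {
      return await db.transaction(async (trx) => {
        await trx.raw("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ");
        const row = await trx("counters").where({ id }).first();
        await trx("counters").where({ id }).update({ value: row.value + 1 });
        return row.value + 1;
      });
    } catch (err: any) {
      if (err.code !== "40001") throw err;           // not a serialization failure
      // somebody modified the row concurrently; the transaction was rolled back, try again
    }
  }
}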
Transactions in PostgreSQL use an optimistic concurrency model when accessing tables, while some other DBMSs use pessimistic locking (IBM Db2) or both models (MS SQL Server).
The optimistic model takes a snapshot of the data you are working on, and your modifications are made against that snapshot until the transaction ends. When the transaction finishes, the snapshot's modifications are applied to the real database (table rows); but if some other user made a change between the moment the snapshot was taken and the commit, the commit cannot be applied and the COMMIT is rejected with a rollback.
You can try to raise the isolation level (REPEATABLE READ or SERIALIZABLE) to avoid the trouble.
How does Postgres decide which transactions are visible to a given transaction according to the isolation level?
I know that Postgres uses xmin and xmax and compares them to the transaction's xid, but I haven't found any articles with the proper details.
Do you know how this works under the hood?
This depends on the current snapshot.
READ COMMITTED transactions take a new snapshot for every query, while REPEATABLE READ and SERIALIZABLE transactions take a snapshot when the first query is run and keep it for the whole duration of the transaction.
The snapshot is defined as struct SnapshotData in include/utils/snapshot.h and essentially contains the following:
a minimal transaction ID xmin: all older transactions are visible to this snapshot.
a maximal transaction ID xmax: all later transactions are not visible to this snapshot.
an array of in-progress transaction IDs (xip) that contains all in-between transactions that are not visible to this snapshot.
To determine if a tuple is visible to a snapshot, its xmin (the transaction that created it) must be a committed transaction ID that is visible, and its xmax (the transaction that deleted it, if any) must not be a committed transaction ID that is visible.
To determine if a transaction is committed or not, the commit log has to be consulted unless the hint bits of the tuple (which cache that information) have already been set.
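As a rough illustration (ignoring subtransactions, command IDs, and the difference between the snapshot's xmin/xmax and the tuple columns of the same name), the per-transaction-ID test looks roughly like this:

interface Snapshot {
  xmin: number;   // every transaction ID below this is potentially visible
  xmax: number;   // this ID and everything above it is invisible
  xip: number[];  // IDs in [xmin, xmax) still in progress when the snapshot was taken
}

// Simplified sketch: is the work of transaction `xid` visible under `snapshot`?
// In PostgreSQL the `committed` check means consulting the commit log (pg_xact),
// unless the tuple's hint bits already cache that answer.
function xidVisible(xid: number, snapshot: Snapshot, committed: (xid: number) => boolean): boolean {
  if (xid >= snapshot.xmax) return false;            // started after the snapshot was taken
  if (snapshot.xip.includes(xid)) return false;      // still in progress at snapshot time
  return committed(xid);                             // otherwise visible if it committed
}

// A tuple is then visible if xidVisible(tuple.xmin, ...) is true (its inserter is visible)
// and xidVisible(tuple.xmax, ...) is false (no visible transaction has deleted it).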
I am using transactions to make changes to a SQL database. As I understand it, this means that changes to the database will happen in an all-or-nothing fashion. What I want to know is, does this have any guarantees for reads? For example, suppose I have some (pseudo)-code like this:
1) start TRANSACTION
2) INSERT INTO users ... // insert some data
3) count = SELECT COUNT(*) FROM ... // count something in the database
4) if count > 10: // do something based on the read
5) INSERT INTO other_table ... // write based on the read
6) COMMIT TRANSACTION
In this code, I'm doing an INSERT, followed by a SELECT, and then conditionally doing another INSERT based on the outcome of the SELECT.
So my question is, if another process modifies the database between steps (3) and (5), what happens to the count variable, and to my transaction?
If it makes a difference, I am using PostgreSQL.
As Xin pointed out, it depends on the isolation level.
At the default READ COMMITTED level, records from other sessions will become visible as they are committed; you would see the same records if you didn't start a transaction at all (though of course, other processes would see your inserts appear at different times).
With REPEATABLE READ, your queries will not see any records committed by other sessions after your transaction starts. But while you don't have to worry about the result of SELECT COUNT(*) changing during your transaction, you can't assume that this result will still be accurate by the time you commit.
Using SERIALIZABLE provides the strongest guarantee: if your script does the right thing when given exclusive access to the database, then it will do the right thing in the presence of other serialisable transactions (or it will fail outright). However, this means that all transactions which might interfere with yours must be using the same isolation level (which comes at a cost), and all must be prepared to retry their transaction in the event of a serialisation failure.
When serialisable transactions are not an option, you generally guard against race conditions by explicitly locking things against concurrent writes. It's often enough to lock a selection of records, but you can't exactly lock the result of a COUNT(*); in your case, you'd probably need to lock the whole table.
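If you go the SERIALIZABLE route, the retry logic could look roughly like this with node-postgres (the table names come from the pseudocode above; the data column and the retry limit are made up):

import { Pool } from "pg";

const pool = new Pool();  // connection settings from the usual PG* environment variables

async function insertIfCountExceeds10(userRow: unknown, otherRow: unknown) {
  for (let attempt = 0; attempt < 5; attempt++) {
    const client = await pool.connect();
    try {
      await client.query("BEGIN ISOLATION LEVEL SERIALIZABLE");
      await client.query("INSERT INTO users (data) VALUES ($1)", [userRow]);
      const { rows } = await client.query("SELECT count(*) AS n FROM users");
      if (Number(rows[0].n) > 10) {
        await client.query("INSERT INTO other_table (data) VALUES ($1)", [otherRow]);
      }
      await client.query("COMMIT");
      return;
    } catch (err: any) {
      await client.query("ROLLBACK");
      if (err.code !== "40001") throw err;  // only retry serialization failures
    } finally {
      client.release();
    }
  }
  throw new Error("could not commit after 5 serialization failures");
}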
I am not working on PostgreSQL, but I think I can answer your question. Think of every query as running in parallel. I say so because there are two transactions here: while you insert into a, others can insert into b; then, when you check b, whether you can see the new data depends on your isolation setting (read committed or dirty read).
Also please note that databases provide locks: you can lock a table to prevent others from altering it before you commit your transaction.
Hope this helps.
What IsolationLevel should I use in my TransactionScopes for:
Reading a single record that I may then update. This record is independent of all other data in the database, so I only need to lock that one record.
Trying to read a single record. If no record exists, then create a record with that value in that table. This is independent of all other tables, but it needs to lock this table so another thread doesn't also find no record and then add the same record.
In the 2nd case, I think I need to lock the table to stop an insert on the table and any access on the record read, but allow reads of other records in the table and any access on any other table.
thanks - dave
I am not sure about EF as I have not worked with it, but my answer is the following:
For the first case it is enough to use 'REPEATABLE READ', since it "specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes."
For the second case I would use 'SERIALIZABLE', since it additionally guarantees that "other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes."
You can read more here about isolation levels.