MongoDB: read documents within a transaction

Suppose I do the following list of operations in MongoDB
1. Start a session
2. Start a transaction for that session
3. Run an insert command with a new document
4. Run a find command on the collection of the inserted document
5. Commit the transaction
6. End the session
I understand that outside of the transaction the insert done in the third step will not be visible until the transaction is committed, but what about within the transaction: will the find run in the fourth step see this new document or not?

Yes, a transactional find sees a document inserted by a previous insert in the same transaction. You can rely on the "read your own writes" property.
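For example, here is a minimal sketch with the official Node.js driver; the connection string, database, and collection names are made up, and transactions require a replica set or sharded cluster:

const { MongoClient } = require("mongodb");

async function main() {
  // Assumed local replica set; adjust the connection string to your deployment.
  const client = new MongoClient("mongodb://localhost:27017/?replicaSet=rs0");
  await client.connect();
  const session = client.startSession();
  const docs = client.db("test").collection("docs");
  try {
    session.startTransaction();
    await docs.insertOne({ _id: 1, status: "pending" }, { session });
    // Same session, same transaction: this find sees the document inserted above.
    const inTxn = await docs.findOne({ _id: 1 }, { session });
    console.log(inTxn); // { _id: 1, status: 'pending' }
    await session.commitTransaction();
  } catch (err) {
    await session.abortTransaction();
    throw err;
  } finally {
    await session.endSession();
    await client.close();
  }
}

main();

A find issued outside the session (or in a different session) would not see the document until commitTransaction succeeds.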
Every time a transaction is started, a new snapshot is created. Outside the transaction the snapshot is invisible: this is accomplished by using (and abusing, if your transaction involves many updates) the WiredTiger cache. This cache is structured as a tree¹ in which each transactional operation is represented as a new update block that can in turn be chained to another update block.
Outside operations only see the non-transactional tree entries, while transactional operations see all the entries added before the snapshot was taken plus the update blocks of their own transaction.
I am aware that this is a very brief explanation of how MongoDB manages transaction atomicity, but if you are interested in understanding more, I suggest reading a report I have written; in the same repository you can find some scenarios covering the most typical doubts.
¹: the tree illustration is taken from Aly Cabral's presentation "How and when to use Multi-document transactions".

Related

Update and retrieve records from Postgres

I am new to Postgres. I have hit a situation where I need to retrieve a record from a table, update that record with some changes, and then retrieve the updated record again to return to the customer; these three operations are executed sequentially.
After I run the above three steps, it sometimes seems the record has not been updated. But in fact the record has been updated; I suspect that when I retrieve the record, Postgres returns cached data rather than fresh data. All the DB operations themselves are correct, so I am guessing Postgres just returns cached records.
I am wondering if there is any mechanism to flush my updated data immediately?
Another question: which of the following is good practice?
Update a record (actually write to the DB), then immediately retrieve the record from the DB; both operations use their own statement.
Update a record (actually write to the DB), but don't retrieve the record from the DB: since we already know the updated data, we just return it to the customer. However, the write to the DB might fail.
Any ideas for the above?
Some programming languages do db calls asynchronously, meaning that your code is moving on to the next db operation without waiting for the first to finish. So, it could be as simple as using your language's "await" keyword to make sure you are waiting for your db to finish the "update record" before trying to read it again.
If you are writing raw SQL, or you know that making several db calls asynchronously is not the issue, you could try writing your update and read calls as a single transaction.
See https://www.postgresql.org/docs/14/tutorial-transactions.html if you're unfamiliar with writing transactions.
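As a hedged sketch of that suggestion, the update-then-read sequence could look like this as one transaction with the node-postgres (pg) client; the table and column names are invented for illustration:

const { Client } = require("pg");

async function updateAndReturn(id, newName) {
  const client = new Client(); // connection settings taken from the usual PG* env vars
  await client.connect();
  try {
    await client.query("BEGIN");
    await client.query("UPDATE customers SET name = $1 WHERE id = $2", [newName, id]);
    // Awaiting the UPDATE means the SELECT only runs after the write has finished,
    // and because both run in the same transaction the SELECT sees the updated row.
    const { rows } = await client.query("SELECT * FROM customers WHERE id = $1", [id]);
    await client.query("COMMIT");
    return rows[0];
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    await client.end();
  }
}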

Does a SQL statement ensure atomicity in Postgres?

I have a simple bug in my program, which has multi-user support. I'm using knex to build SQL queries, and here is pseudocode that depicts the scenario:
const value = queryBuilder().readDataFromTheDatabase(); // executes this first
// do some other work and compute the updated value
queryBuilder().writeValueToTheDatabase(updateValue(value));
This piece of code is used in a sort of middleware function. As you can see, this is a possible race condition: when multiple users hit this at roughly the same time, one of them gets a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So I'll probably have to use a little bit of PL/pgSQL. I was wondering if this solution will be sufficient. Will the statement be executed atomically? I.e., when one request has read but not yet finished its write, does another request wait to both read and write, or does it only wait to write while reading the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
Use the default READ COMMITTED isolation level and lock the rows when you read them (a knex sketch of this approach follows this answer):
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
Use the REPEATABLE READ isolation level and don't lock anything. Then your UPDATE will receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
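Since the original question uses knex, here is a rough sketch of the first approach (lock the rows you read); the table and column names are invented, updateValue stands for whatever work you do between the read and the write, and recent knex versions also expose forNoKeyUpdate() if you want the exact FOR NO KEY UPDATE lock shown above:

async function readAndUpdateValueInTheDatabase(knex, id) {
  return knex.transaction(async (trx) => {
    // The row lock makes concurrent requests wait here instead of reading a stale value.
    const row = await trx("counters").where({ id }).forUpdate().first();
    const next = updateValue(row.value); // whatever work happens between read and write
    await trx("counters").where({ id }).update({ value: next });
    return next;
  });
}

Because the read and the write run inside knex.transaction, the lock is held until the transaction commits, so a concurrent request cannot slip in between them and read the old value.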
Transactions in PostgreSQL use an optimistic locking model when accessing tables, while some other DBMSs use pessimistic locking (IBM Db2) or both models (MS SQL Server).
Optimistic locking takes a snapshot of the data on which you are working, and the modifications are made on that snapshot until the transaction ends. When the transaction finishes, the modifications from the snapshot are applied to the real database (table rows); but if some other user made a change between the moment the snapshot was taken and the commit, the commit cannot be applied and the COMMIT is rejected as a ROLLBACK.
You can try raising the ISOLATION LEVEL (REPEATABLE READ or SERIALIZABLE) to avoid this trouble.
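A rough sketch of that optimistic variant with the pg client, using invented table names: raise the isolation level, catch serialization failures (SQLSTATE 40001), and retry:

async function readModifyWriteWithRetry(client, id, computeNext) {
  // client must be a dedicated pg Client (not a pool), so all statements share one session.
  for (;;) {
    try {
      await client.query("BEGIN ISOLATION LEVEL REPEATABLE READ");
      const { rows } = await client.query("SELECT value FROM counters WHERE id = $1", [id]);
      await client.query("UPDATE counters SET value = $1 WHERE id = $2",
                         [computeNext(rows[0].value), id]);
      await client.query("COMMIT");
      return;
    } catch (err) {
      await client.query("ROLLBACK");
      if (err.code !== "40001") throw err; // retry only on serialization failures
    }
  }
}

The loop simply re-runs the whole read-modify-write when the UPDATE loses the race, which matches the "roll the transaction back and try again" advice above.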

How MongoDB handles user requests when multiple insert commands execute

I am new to MongoDB and I want to know how MongoDB handles user requests.
1. What happens if multiple users fire multiple insert commands or read commands at the same time?
2. When or where does the snapshot come into the picture (in which phase)?
Multiple Inserts and Multiple Reads
MongoDB allows multiple clients to read and write the same data.
In order to ensure consistency, it uses locking and other concurrency control measures to prevent multiple clients from modifying the same piece of data simultaneously.
Read this documentation; it will give you complete information about concurrency:
concurrency reference
MongoDB allows very fast writes and updates by default. The tradeoff is that you are not explicitly notified of failures. By default most drivers do asynchronous, 'unsafe' writes, meaning that the driver does not return an error directly, similar to INSERT DELAYED with MySQL. If you want to know whether something succeeded, you have to check for errors manually using getLastError.
MongoDB doesn't offer durability if you use the default configuration; it only writes data to disk once every minute.
This can be configured using the j option and the write concern on the insert query.
write-concern reference
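For illustration, setting the write concern on an insert with the Node.js driver could look like the sketch below; the db handle, collection, and fields are assumptions, and note that current drivers acknowledge writes by default, so getLastError is no longer needed:

// Inside an async function, with `db` an already-connected Db handle (assumption).
const result = await db.collection("orders").insertOne(
  { item: "abc", qty: 1 },
  { writeConcern: { w: "majority", j: true } } // wait for majority acknowledgement and the journal
);
console.log(result.acknowledged, result.insertedId);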
Snapshot
The $snapshot operator prevents the cursor from returning a document more than once because an intervening write operation results in a move of the document.
Even in snapshot mode, objects inserted or deleted during the lifetime of the cursor may or may not be returned.
snapshot reference
References: here and here
Hope it Helps!!
I am asking this question in the context of journaling in MongoDB. As per the MongoDB documentation, a write operation first goes into the private view. So the question is: if multiple write operations are performed at the same time, will multiple private views be created?
2. Checkpoints and snapshots: at which point in the journaling process is the snapshot of the data available?

What's the difference between issuing a query with or without a "begin" and "commit" command in PostgreSQL?

As the title says, it is possible to issue a query in psql with a "begin", the query, and a "commit".
What I want to know is: what happens if I don't use a "begin" command?
Some database engines will allow you to execute modifications (INSERT, UPDATE, DELETE) without an open transaction. It's basically assumed that you have an implicit BEGIN / COMMIT around each of your statements, which is bad practice in case something goes wrong in a batch of many statements.
Other engines still let you run a SELECT, but no INSERT, UPDATE, or DELETE without a BEGIN, to enforce the good practice. That way, if something goes wrong, a ROLLBACK is executed immediately, cancelling all your modifications as if they never existed.
Using a transaction around a batch of several SELECTs will guarantee that the data you get for each SELECT matches the same version of the database at the instant you opened the transaction, depending on your ISOLATION LEVEL.
Please read these for more information:
http://www.postgresql.org/docs/9.5/static/sql-start-transaction.html
and
http://www.postgresql.org/docs/9.5/static/tutorial-transactions.html
If you don't use BEGIN/COMMIT, it's the same as wrapping each individual query in a BEGIN/COMMIT block. You can use BEGIN/COMMIT to group multiple queries into a single transaction. A few reasons you might want to do so include:
Updating multiple tables at the same time. For instance, usually when you delete a record you also want to delete other rows that reference it. If you do this in the same transaction, nothing will ever be able to reference a row that's already been deleted (a sketch of this case follows this answer).
You want to be able to revert some changes if something goes wrong later. Suppose you're writing some user inputted data to multiple tables. At some point you realize that some of it isn't formatted properly. You probably wouldn't want to insert any of it, so you should wrap the entire operation in a transaction.
If you want to ensure the data you're updating hasn't been updated while you're writing to it. Suppose I'm adding $10 to a bank account from two separate connections. I want to add $20 in total - I don't want one of the UPDATEs to clobber the other.
Postgres gives you the first two of these by default. The last one would require a higher transaction isolation level, and makes your query run the risk of raising a serialization error. Transaction isolation levels are a fairly complicated topic, so if you want more info on them the best place to go is the documentation.
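To illustrate the first case, here is a hedged sketch with the node-postgres client (table names are invented): deleting a record together with the rows that reference it either all happens or none of it does:

const { Client } = require("pg");

async function deleteAuthorWithBooks(authorId) {
  const client = new Client();
  await client.connect();
  try {
    await client.query("BEGIN");
    await client.query("DELETE FROM books WHERE author_id = $1", [authorId]);
    await client.query("DELETE FROM authors WHERE id = $1", [authorId]);
    await client.query("COMMIT"); // other sessions never see a half-finished delete
  } catch (err) {
    await client.query("ROLLBACK"); // something failed: both deletes are undone
    throw err;
  } finally {
    await client.end();
  }
}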

Do writes in SNAPSHOT isolation level block writes in another SNAPSHOT transaction in SQL Server 2008 R2

For the SNAPSHOT isolation level in SQL Server 2008 R2, the following is mentioned in the MSDN ADO.NET documentation:
Transactions that modify data do not block transactions that read data, and transactions that read data do not block transactions that write data, as they normally would under the default READ COMMITTED isolation level in SQL Server.
There is no mention of whether writes will block writes, when both transactions are in SNAPSHOT isolation mode. So my question is as follows:
Will writes in a SNAPSHOT transaction1 block writes to the same tables in another SNAPSHOT transaction2?
LATEST UPDATE
After doing a lot of thinking about my question, I have come to the conclusion mentioned in the paragraph below. I hope others can throw more light on this.
There is no relational database in which writes do NOT block writes. In other words, writes will always block writes. Writes include statements like INSERT, UPDATE, or DELETE. This is true no matter which isolation level you use, since all relational databases need to maintain data consistency when multiple writes are happening in the database. Of course, the simultaneous writes need to be conflicting (such as inserting into the same table or updating the same row(s)) for this blocking to occur.
Ligos is actually incorrect: if two separate transactions are trying to update the same record with SNAPSHOT isolation on, transaction 2 WILL be blocked until transaction 1 releases the lock. Then, and ONLY then, will you get error 3960. I realize this thread is over 2 years old, but I wanted to avoid misinformation being out there.
Even the link Ligos references says the exact same thing I am mentioning above (check out the last non-red paragraph).
Write vs. write will only avoid blocking if the two records (i.e. rows) being updated are different.
No. They will not block. Instead, the UPDATE command in trans2 will fail with error number 3960.
Because of how SNAPSHOT isolation level works, any UPDATE command may fail. The only way you can tell is to catch and handle error 3960 (it is called optimistic concurrency because you don't expect this situation to happen very often).
I ended up testing this empirically, because it's not entirely obvious from the documentation. This blog post illustrates it nicely though.
Assumption: both trans1 and trans2 are updating the same row in the same table. Updating two different rows should work just fine.
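For completeness, a rough sketch of the conflicting-update scenario from Node.js with the mssql package (connection pool, table, and error handling are assumptions, not taken from the question): if two calls like this run concurrently against the same row, the second UPDATE waits on the first transaction's lock and then fails with error 3960 once the first commits:

const sql = require("mssql");

async function addTen(pool, accountId) {
  const tx = new sql.Transaction(pool);
  await tx.begin(sql.ISOLATION_LEVEL.SNAPSHOT); // requires ALLOW_SNAPSHOT_ISOLATION on the database
  try {
    await new sql.Request(tx)
      .input("id", sql.Int, accountId)
      .query("UPDATE accounts SET balance = balance + 10 WHERE id = @id");
    await tx.commit();
  } catch (err) {
    try { await tx.rollback(); } catch (_) { /* the server may have rolled back already */ }
    if (err.number === 3960) {
      // Snapshot update conflict: another transaction changed the row first; retry or report it.
    }
    throw err;
  }
}

If the first transaction rolls back instead of committing, the blocked UPDATE simply proceeds, which matches the behaviour described above.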