MongoDB retryable writes in unordered bulk operation - mongodb

I am building a bulk operation for my application and it only consist of single-document write operations.
However, I need each operation to have mongodb "retryable writes" enabled correctly.
So I am wondering if an unordered bulk write works just fine for it or wether it only works with an ordered bulk operation (which would be less efficient) ?
Beside, I have correctly added the retryable write option in my connection string.
Thanks in advance,

Yes, the operations can be retried in an unordered operation. From the MongoDB docs:
Bulk write operations that only consist of the single-document write operations. A retryable bulk operation can include any combination of the specified write operations but cannot include any multi-document write operations, such as updateMany.
Note that the write is only retried a single time, so the application still needs to be able to deal with write failures.

Related

Spring Data MongoDB Concurrent Updates Behavior

Imagine theres a document containing a single field: {availableSpots: 100}
and there are millions of users, racing to get a spot by sending a request to an API server.
each time a request comes, the server reads the document and if the availableSpot is > 0, it then decrements it by 1 and creates a booking in another collection.
Now i read that mongodb locks the document whenever an update operation is performed.
What will happen if theres a million concurrent requests? will it take a long time because the same document keeps getting locked? Also, the server reads the value of document before it tries to update the document, and by the time it acquires the lock, the spot may not be available anymore.
It is also possible that the threads are getting "availableSpot > 0" is true at the same instant in time, but in reality the availableSpot may not be enough for all the requests. How to deal with this?
The most important thing here is atomicity and concurrency.
1. Atomicity
Your operation to update (decrement by one) if availableSpots > 0 :
db.collection.updateOne({"availableSpots" :{$gt : 0}}, { $inc: { availableSpots: -1 })
is atomic.
$inc is an atomic operation within a single document.
Refer : https://docs.mongodb.com/manual/reference/operator/update/inc/
2. Concurrency
Since MongoDB has document-level concurrency control for write operations. Each update will take a lock on the document.
Now your questions:
What will happen if theres a million concurrent requests?
Yes each update will be performed one by one (due to locking) hence will slow down.
the server reads the value of document before it tries to update the
document, and by the time it acquires the lock, the spot may not be
available anymore.
Since the operation is atomic, this will not happen. It will work as you want, only 100 updates will be executed with number of affected rows greater than 0 or equal to 1.
MongoDB uses Wired Tiger as a default storage engine starting version 3.2.
Wired Tiger provides document level concurrency:
From docs:
WiredTiger uses document-level concurrency control for write
operations. As a result, multiple clients can modify different
documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic
concurrency control. WiredTiger uses only intent locks at the global,
database and collection levels. When the storage engine detects
conflicts between two operations, one will incur a write conflict
causing MongoDB to transparently retry that operation.
When multiple clients are trying to update a value in a document, only that document will be locked, but not the entire collections.
My understanding is that you are concerned about the performance of many concurrent ACID-compliant transactions against two separate collections:
a collection (let us call it spots) with one document {availableSpots: 999..}
another collection (let us call it bookings) with multiple documents, one per booking.
Now i read that mongodb locks the document whenever an update operation is performed.
It is also possible that the threads are getting "availableSpot > 0"
is true at the same instant in time, but in reality the availableSpot
may not be enough for all the requests. How to deal with this?
With version 4.0, MongoDB provides the ability to perform multi-document transactions against replica sets. (The forthcoming MongoDB 4.2 will extend this multi-document ACID transaction capability to sharded clusters.)
This means that no write operations within a multi-document transaction (such as updates to both the spots and bookings collections, per your proposed approach) are visible outside the transaction until the transaction commits.
Nevertheless, as noted in the MongoDB documentation on transactions a denormalized approach will usually provide better performance than multi-document transactions:
In most cases, multi-document transaction incurs a greater performance
cost over single document writes, and the availability of
multi-document transaction should not be a replacement for effective
schema design. For many scenarios, the denormalized data model
(embedded documents and arrays) will continue to be optimal for your
data and use cases. That is, for many scenarios, modeling your data
appropriately will minimize the need for multi-document transactions.
In MongoDB, an operation on a single document is atomic. Because you can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases.
But do bear in mind that your use case, if implemented within one collection as a single denormalized document containing one availableSpots sub-document and many thousands of bookings sub-documents, may not be feasible as the maximum document size is 16MB.
So, in conclusion, a denormalized approach to write atomicity will usually perform better than a multi-document approach, but is constrained by the maximum document size of 16MB.
You can try using findAndModify() option while trying to update the document. Each time you will need to cherry pick whichever field you want to update in that particular document. Also, since mongo db replicates data to Primary and secondary nodes, you may also want to adjust your WriteConcern values as well. You can read more about this in official documentation. I have something similar coded that handles similar kind of concurrency issues in mongoDB using spring mongoTemplate. Let me know if you want any reference related to java with that.

MongoDB Chain queries, pseudo transactions

I understand you cannot do transactions in MongoDB and the thinking is that its not needed because everything locks the whole database or collection, I am not sure which. However how then do you perform the following?
How do I chain together multiple insert, update, delete or select queries in mongodb so that other queries that might operate on the same data wait until these queries finish? An analogy would be serialization transaction isolation in ms sql server.
more..
I want to insert/update record into collection A and update a record in collection B and then read Collection A and B but I don't want anyone (process or thread) to read or write to collection A or B until BOTH A and B have been updated or inserted by the first queries.
Yes, that's absolutely possible.
It is called ordered bulk operations on planet Mongo and works like this in the mongo shell:
bulk = db.emptyCollection.initializeOrderedBulkOp()
bulk.insert({name:"First document"})
bulk.find({name:"First document"})
.update({$set:{name:"First document, updated"}})
bulk.execute()
bulk.findOne()
> {_id: <someObjectId>, name:"First document, updated"}
Please read the manual regarding Bulk Write Operations for details.
Edit: Somehow is misread your question. It isn't possible for two collections. Remember though, that you can have different documents in one collection. Some ODMs even allow to have different models saved to the same collection. Exploiting this, you should be able to achieve what you want using the above bulk operations. You may want to combine this with locking to prevent writing. But preventing reading and writing would be the same as an transaction in terms of global and possibly distributed locks.

How to lock a Collection in MongoDB

I have a collection in my database
1.I want to lock my collection when the User Updating the Document
2.No operations are Done Expect Reads while Updating the collection for another Users
please give suggestions how to Lock the collection in MongoDB
Best Regards
GSY
MongoDB implements a writer greedy database level lock already.
This means that when a specific document is being written to:
The User collection would be locked
No reads will be available until the data is written
The reason that no reads are available is because MongoDB cannot do a consistent read while writing (darn you physics, you win again).
It is good to note that if you wish for a more complex lock, spanning multiple rows, then this will not be available in MongoDB and there is no real way of implementing such a thing.
MongoDB locking already does that for you. See what operations acquire which lock and what does each lock mean.
See the MongoDB documentation on write operations paying special attention to this section:
Isolation of Write Operations
The modification of a single document is always atomic, even if the write operation modifies >multiple sub-documents within that document. For write operations that modify multiple >documents, the operation as a whole is not atomic, and other operations may interleave.
No other operations are atomic. You can, however, attempt to isolate a write operation that >affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform >Two Phase Commits.

Performance gain by using bulk inserts vs regular inserts in MongoDB

What is the performance gain by using bulk inserts vs regular inserts in MongoDB and pymongo specifically. Are bulk inserts just a wrapper for regular inserts?
Bulk inserts are no wrappers for regular inserts. A bulk insert operation contains many documents sent as a whole. It saves as many database round trips. It is much more performant since you don't have to send each document over the network separately.
#dsmilkov Thats an advantage as you dont have to open a connection every single time.

Can I set the priority of a MongoDB query so that batch/archiving operations are yielded to user requests?

I have been using Mongo's map/reduce for some time and this blocks other operations since the JS engine in 2.0 took out a lock (or so I believe). I am just experimenting with the new aggregation framework in 2.2 and had hoped that since it's just reading it would not need to lock, but according to db.currentOps() it is locking.
So in relation to the new aggregation framework (and indeed with any MongoDB operation) I would like to know if it's possible to indicate a priority of a certain operation so that MongoDB can intelligently yield low priority operations (such as some background updates) to a time-sensitive high-priority operation?
In this doc you can see it says Map Reduce "Allows substantial concurrent operation but exclusive to other javascript execution."
So this means that it already yields operation on the database. Plus mostly this is bounded to the map/reduce being a single threaded JavaScript operation.
But if you want make sure there is no locking you can write the output of Map Reduce to another database and then move the collection to the original database.
>use admin
>db.runCommand( {renameCollection: "mapreducedb.mycol", to: "appdb.mycol"} )
Same for the Aggregation Framework
EDIT: can't be used for the Aggregation framework (as of 2.2), as it does not have an {{$out}} operator to write to another database. But Aggregation operations are still safe to execute on the production/main database as Yielding of those operations will still occur.