MongoDB multiple update isolation - mongodb

I'm confused about how MongoDB updates works.
In the following docs: https://docs.mongodb.com/manual/core/write-operations-atomicity/ says:
In MongoDB, a write operation is atomic on the level of a single
document, even if the operation modifies multiple embedded documents
within a single document.
When a single write operation modifies multiple documents, the
modification of each document is atomic, but the operation as a whole
is not atomic and other operations may interleave.
I guess it means: if I'm updating all fields of a document I will be unable to see a partial update:
If I get the document before the update I will see it without any change
If I get the document after the update I will see it with all the changes
For a multiple elements the same behavior happens for each document. I guess we could say there is a transaction for each document update instead of a big one for all of them.
But let's say there are a lots of documents on the multiple update, and it takes a while to update all of them. What happen with the queries by other threads during the update?
They will see the old version? Or they will be blocked until the update finishes?
Other updates to same documents are possible during this big update? If so, could this intermediate update exclude some document from the big update?

They will see the old version? Or they will be blocked until the update finishes?
I guess other threads may see the old version of a document or the new, depending on whether they query the document before or after the update is finished, but they will never see a partial update on a document (i.e. one field changed and another not changed).
Other updates to same documents are possible during this big update? If so, could this intermediate update exclude some document from the big update?
Instead of big or small updates, think of 2 threads doing an update on the same document. Thread 1 sets fields {a:1, b:2} and thread 2 sets {b:3, c:4}. If the original document is {a:0, b:0, c:0} then we can have two scenarios:
Update 1 is executed before update 2:
The document will finally be {a:1, b:3, c:4}.
Update 2 is executed before update 1:
The document will finally be {a:1, b:2, c:4}.

Related

SELECT FOR UPDATE SKIP LOCKED in MongoDB

I have 100+ worker threads, which are going to poll database, looking for a new job.
To take a job, a thread need to change status of the bunch of documents from NEW to IN_PROGRESS, so no other threads can peek the same job.
This can be solved perfectly fine in PostgreSQL with SELECT FOR UPDATE SKIP LOCKED WHERE status = "NEW" statement.
Is there a way to do such atomic update in MongoDB for a single document? For a batch?
There's a findAndModify method, which works exactly as you've described for a single document.
For a batch, it's not possible right now, as
In MongoDB, write operations, e.g. db.collection.update(), db.collection.findAndModify(), db.collection.remove(), are atomic on the level of a single document.
It will be possible in MongoDB 4.0 though, with transactions.

Rebuild collection in pymongo

I'd like to "rebuild" my collection atomically, which means delete all existing documents and populate it from scratch.
The thing is, since transactions are not supported there is a small time gap that the collection is empty, which is what I want to avoid.
Is there a way to perform such action in an atomically matter? so there will be no point where the collection is empty?
You can build a new collection with a different name and then use rename command to rename the new collection and drop the existing collection (using dropTarget=True option).
There are several caveats though:
The command will invalidate open cursors which interrupts queries that
are currently returning data.
renameCollection blocks all database activity for the duration of the operation.
renameCollection is not compatible with sharded collections.
If the renameCollection operation does not complete, the target collection and indexes will not be usable and will require manual intervention to clean up.
You can find more info in the official docs.

What is the preferred way to add many fields to all documents in a MongoDB collection?

I have have a Python application that is iteratively going through every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents), and adding new fields (probably doubling/tripling the number of fields in the document).
My initial thought was that I would use upsert the entire of the revised documents (using pyMongo) - now I'm questioning that:
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
this is actually a great question that can be solved a few different ways depending on how you are managing your data.
if you are upserting additional fields does this mean your data is appending additional fields at a later point in time with the only changes being the addition of the additional fields? if so you could set the ttl on your documents so that the old ones drop off over time. keep in mind that if you do this you will want to set an index that sorts your results by descending _id so that the most recent additions are selected before the older ones.
the benefit of this of doing it this way is that your are continually writing data as opposed to seeking and updating data so it is faster.
in regards to upserts vs bulk inserts. bulk inserts are always faster than upserts since bulk upserting requires you to find the original document first.
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
you really need to understand your data fully to determine what is best but if only change to the data is additional fields or changes that only need to be considered from that point forward then bulk inserting and setting a ttl on your older data is the better method from the stand point of write operations as opposed to seek, find and update. when using this method you will want to db.document.find_one() as opposed to db.document.find() so that only your current record is returned.
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
bulk inserts will be faster than inserting each one sequentially.

MongoDB: upsert multiple fields based on multiple criteria

I am new to Mongo. I wanted to atomically upsert a document based on multiple criteria. The document looks like the following:
{_id:..., hourOfTime:..., total:..., max:..., min:..., last:...}
This is basically hourly aggregation for number of clicks for an item identified by _id. The clicks for each item is is flushed from the application to MongoDB every five seconds. So, the document need to be updated every five seconds.
Here is the case. Lets say at t=t0, we have {_id:"nike", total:123, max:10, min:3, last:9} then at t=t1, I get message {_id:"nike", count: 12}. Now, for _id="nike", I need to do the following,
Increment total by 12
if max < 12, update max=12
if min > 12, update min=12
update last=12
I want all this operation to be atomic. I unable to convert this in one single query. Any help/hint is appreciated.
This cannot be done with a single query. Here is how I would do it:
Have a field on the document called locked. Run a findAndModify to grab the document if the locked field is false, and set the locked field to the Date() that it is locked at. This will stop other application instances from modifying the document, providing they also check for the locked field.
Modify the document application side, then atomically update it, setting locked to false.
As long as anything modifying the document runs a findAndModify on the locked field, this entire modification should be atomic.
Make sure to have a threshold at which a lock times out, so that documents do not become locked indefinitely if a machine blows up. This means that when updating the document the second time (and releasing the lock), the application should make sure that the date in the document is what it expects, to make sure it still has the lock.

Does findAndModify effectively lock the document to prevent update conflicts?

What type of locking does findAndModify() offer? Is is a write lock only, or read/write? Does it prevent simultaneous updates on the same record?
MongoDB has a global (per-instance) write lock, which serializes all updates across all data in the server (though different servers in a sharded cluster will each have their own independent locks). This means that at any given instant in time, only one update is taking place on any document, and therefore only one update for any given document.
findAndModify doesn't do anything different in this regard than an ordinary update -- it just returns the document to you.
According to the MongoDB docs for MongoDB: findAndModify() for under MongoDB: Atomic Operations it should be.