The Mongo documentation on atomicity and isolation is a tad vague and slightly confusing. I have this document structure, and I want to ensure Mongo will handle updates while isolating updates from different users.
Assume a collection of product discount offers. When a user redeems an offer, the offers_redeemed field is incremented. The max_allowed_redemptions field controls how many times an offer can be redeemed.
{
    offer_id: 99,
    description: "Save 25% on Overstock.com…",
    max_allowed_redemptions: 10,
    offers_redeemed: 3
}
I tested this using findAndModify, and it appears to work by updating the offer only if another redemption would stay at or below the max allowed. I want to know if this is the correct way to do it, and whether it would work in a multi-user, sharded environment. Is there a scenario where an update to offers_redeemed could force it to exceed max_allowed_redemptions? Obviously that would corrupt the record, so we need to avoid it.
db.offer.findAndModify({
    query: { offer_id: 99, $where: "this.offers_redeemed + 1 <= this.max_allowed_redemptions" },
    update: { $inc: { offers_redeemed: 1 } },
    new: true
})
First, as the documentation very clearly says:
If you don't need to return the document, you can use Update (which can affect multiple documents, as well).
Second, see the "Update if Current" strategy on the Atomic Operations page. It clearly shows that if the condition applies, the update happens and nothing can come between the two.
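Applied to the offer document above, the same check can be expressed without $where by putting the cap directly in the query. A minimal sketch (mongo shell syntax; it assumes the caller knows the cap value, here 10):

var res = db.offer.updateOne(
    { offer_id: 99, offers_redeemed: { $lt: 10 } },   // only matches while under the cap
    { $inc: { offers_redeemed: 1 } }
);
if (res.modifiedCount === 0) {
    // the offer is missing or already fully redeemed; nothing was changed
}

Because the condition and the increment are evaluated as a single operation, no other update can slip in between them.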
Related
I've got a document that needs to be read and updated. Meanwhile, it's quite likely that another process is doing the same, which would break the document update.
For example, if Process A reads document d, adds field 'a' to it, and writes the document, and Process B reads document d before Process A writes it, adds field 'b', and writes the document, then whichever process writes last wins, because it clobbers the change made by the one that wrote first.
I've read this article and some other very complicated transaction articles around Mongo. Can someone describe a simple solution to this? I have not come across anything that makes me comfortable with it yet.
https://www.mongodb.com/blog/post/how-to-select--for-update-inside-mongodb-transactions
[UPDATE] - In addition, I'm trying to augment a document that might not yet exist. I need to create the document if it doesn't exist, and I also need to read it in order to analyze it. One key is "relatedIds" (an array); I push to that array if the id is not found in it. Another method I have that needs to create the document if it doesn't exist adds to a separate collection of objects.
[ANOTHER UPDATE x2] --> From what I've been reading and getting from various sources, it seems the only way to properly create a transaction for this is to "findOneAndModify" the document, marking it as dirty with some field that will definitely update, such as "lock" with an ObjectId (since that will never result in a no-op, i.e. it definitely causes a change).
If another operation tries to write to it, Mongo can now detect that this record is already part of a transaction.
Thus anything that writes to it will cause a writeError on that other operation. My transaction can then slowly work on that record and have a lock on it. When it writes it out and commits, that record is definitely not touched by anything else. If there's no way to do this without a transaction for some reason, then am I creating the transaction in the easiest way here?
Using Mongo's transactions is the "proper" way to go, but I'll offer a simple solution that is sufficient (with some caveats).
The simplest solution would be to use findOneAndUpdate to read the document and set a new field (let's call it status) in one step; since the operation is atomic, this is possible.
The query would look like so:
const doc = await db.collection.findOneAndUpdate(
    {
        _id: docId,
        status: { $ne: 'processing' }
    },
    {
        $set: {
            status: 'processing'
        }
    }
);
So if doc.value is null, it means (assuming the document exists) that another process is processing it. When you finish processing, you just need to reset status to any other value.
Now, because you are inherently locking this document from being read until the process finishes, you have to make sure that you handle cases like an error thrown during processing, update failures, DB connection issues, etc.
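For example, a minimal sketch of releasing the pseudo-lock even when processing fails (processDocument and the 'idle' reset value are placeholders for illustration, not part of any driver API):

const doc = await db.collection.findOneAndUpdate(
    { _id: docId, status: { $ne: 'processing' } },
    { $set: { status: 'processing' } }
);

if (doc.value) {
    try {
        await processDocument(doc.value);   // placeholder for your own processing logic
    } finally {
        // always release the "lock", even if processing threw
        await db.collection.updateOne({ _id: docId }, { $set: { status: 'idle' } });
    }
}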
Overall I would be cautious about using this approach, as it only "locks" the document for cooperating queries (every single process needs to be updated to use the status field), which can be problematic depending on your use case.
I have a collection containing around 100k documents. I want to add an auto-incrementing "custom_id" field to my documents, and keep adding documents by incrementing that field from now on.
What's the best approach for this? I've seen some examples in the official documentation (http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/), but they only cover adding new documents, not updating an existing collection.
Example code I created based on the link above to increment my counter:
function incrementAndGetNext(counter, callback) {
    counters.findAndModify({
        name: counter
    }, [["_id", 1]], {
        $inc: {
            "count": 1
        }
    }, {
        "new": true
    }, function (err, doc) {
        if (err) return console.log(err);
        callback(doc.value);
    })
}
In the above code, counters is the db.counters collection, and I have this document there:
{ _id: "...", name: "post", count: 0 }
Would love to know.
Thank you.
P.S. I'm using native mongojs driver for js
Well, using the link you mentioned, I'd rather use the counters collection approach.
The counters collection approach has some drawbacks, including:
It always generates multiple requests (two): one to get the sequence number, another to do the insertion using the id you got from the sequence (see the sketch below),
If you are using MongoDB's sharding features, the document responsible for storing the counter state may be hit a lot, and every one of those requests will land on the same server.
However it should be appropriate for most uses.
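For example, using the helper from the question, inserting a new post with the counters approach looks roughly like this (the posts collection and the insertPost name are assumed for illustration, and the callback is assumed to receive the updated counter document):

function insertPost(post, callback) {
    // request 1: reserve the next id from the counters collection
    incrementAndGetNext("post", function (counter) {
        post.custom_id = counter.count;
        // request 2: insert the document carrying that id
        posts.insert(post, function (err, saved) {
            if (err) return callback(err);
            callback(null, saved);
        });
    });
}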
The approach you mentioned ("the optimistic loop") should not break, in my opinion, and I don't see why you have a problem with it. However, I'd not recommend it. What happens if you execute the code on multiple Mongo clients, and one has a lot of latency while the others keep taking IDs? I'd not like to run into that kind of problem... Furthermore, there are at least two requests per successful operation, and no upper bound on the number of retries before a success.
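To cover the other half of the question (the existing ~100k documents), a one-off backfill can walk the collection and assign ids from the same counter. A rough mongo shell sketch (the posts collection name is assumed, and it is best run while writes are paused):

db.posts.find({ custom_id: { $exists: false } }).sort({ _id: 1 }).forEach(function (doc) {
    var counter = db.counters.findAndModify({
        query: { name: "post" },
        update: { $inc: { count: 1 } },
        new: true
    });
    db.posts.update({ _id: doc._id }, { $set: { custom_id: counter.count } });
});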
TLDR: Is there a way I can conditionally update a Meteor Mongo record inside a collection, so that if I use the id as a selector, the update only happens when the incoming revision number is greater than what already exists, and an upsert is performed when there is no id match?
I am having an issue with updates to server side Meteor Mongo collections, whereby it seems the added() function callback in the Observers is being triggered on an upsert.
Here is what I am trying to do in a nutshell.
My meteor js app boots and then connects to an endpoint, fetching data and then upserting it into the collection.
collection.update({'sys.id': item.sys.id}, item, {upsert: true});
The 'sys.id' selector checks to see if the item exists, and then updates if it does or adds if it does not.
I have an observer monitoring the above collection, which then acts when an item has been added/updated to the collection.
collection.find({}).observeChanges({
added: this.itemAdded.bind(this),
changed: this.itemChanged.bind(this),
removed: this.itemRemoved.bind(this)
});
The first thing that puzzles me is that when the app is closed and then booted again, the 'added()' callback is fired when the collection is observed. What I would hope to happen is that the changed() callback is fired.
Going back to my original update - is it possible in Mongo to conditionally update something, so you have the selector, then the item, but only perform the update when another condition is met?
// Incoming item
var item = {
    sys: {
        id: 1,
        revision: 5
    }
};
collection.update({'sys.id': item.sys.id, 'sys.revision': {$gt: item.sys.revision}}, item, {upsert: true});
If you look at the above code, what it is going to do is try to match the sys.id, which is fine, but the revisions will of course be different, which means the update will treat it as a non-matching document and perform a new insert, thus creating duplicate data.
How do I fix this?
To your main question:
What you want is called findAndModify. First, look for the document meeting the specs, and then update accordingly. This is a really powerful idea because if you did it in 2 queries, the document you found could be deleted/updated before you got to update it. Luckily for you, someone made a package (I really wish this existed a year ago!) https://github.com/fongandrew/meteor-find-and-modify
If you were to do this without using findAndModify you'd have to use javascript to find the doc, see if it matches your criteria, and then update it. In your use case, this would probably work, but there will always be that "what if" in the back of your mind.
Regarding observeChanges, added is called each time the local minimongo receives a document (it's just reading what DDP is telling it). Since a refresh deletes your local collection, those docs have to be added again one by one. What you could do is wait until all the added callbacks have fired, and then run your server method. In doing so, you get a ton of adds, and then a couple more changes trickle in afterwards.
As Matt K said, you want findAndModify. There are some gotchas to be aware of:
findAndModify is about 100x slower than a find followed by an update. Find-then-update is, obviously, not atomic and so won't do what you need, but be aware of the speed hit. (This is based on experience with MongoDB v2.4, so run some benchmarks to confirm under your own version.)
If your query matches multiple items, findAndModify will only act on the first one. In this case, you're querying on a unique id, but be aware of the issue for future use.
findAndModify will return the document after doing its thing, but by default it returns the pre-modification version. If you want the modified one, you need to pass new: true in your query.
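Putting those notes together for the question above, a sketch of the revision-guarded update might look like this (generic Mongo syntax; it assumes the stored revision must be strictly lower than the incoming one, and deliberately omits upsert to avoid the duplicate-insert problem described in the question):

db.collection.findAndModify({
    query: { 'sys.id': item.sys.id, 'sys.revision': { $lt: item.sys.revision } },
    update: { $set: item },
    new: true    // return the post-modification document
});

If the call returns null, either the document does not exist or its stored revision is already equal or newer, and you can decide separately whether to insert.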
My MongoDB collection is used as a job queue, and there are 3 C++ machines that read from this collection. The problem is that those three must not perform the same job; every job should be done only once.
I fetch all not-yet-done jobs by searching the collection for records with isDone: false and then updating the document to isDone: true. But if 2 machines find the same document at the same time, they would both do the same job. How can I avoid this?
Edit: My question is - does findAndModify really solve that problem?
(After reading A way to ensure exclusive reads in MongoDb's findAndModify?)
Yes, findAndModify solves it.
Ref: MongoDB findAndModify from multiple clients
"...
Note: This command obtains a write lock on the affected database and will block other operations until it has completed; however, typically the write lock is short lived and equivalent to other similar update() operations.
..."
Ref: http://docs.mongodb.org/manual/reference/method/db.collection.update/#db.collection.update
"...
For unsharded collections, you can override this behavior with the $isolated isolation operator, which isolates the update operation and blocks other write operations during the update. See the isolation operator.
..."
Ref: http://docs.mongodb.org/manual/reference/operator/isolated/
Yes, find-and-modify will solve your problem:
db.collection.findAndModify({
    query: { isDone: false },
    update: { $set: { isDone: true } },
    new: true,
    upsert: false   // never create new docs
});
This will return a single document that it just updated from false to true.
But you have a serious problem if your C++ clients ever have a hiccup (the box dies, they are killed, the code has an error, etc.). Imagine your TCP connection drops just after the update on the server, but before the C++ code gets the job. It's generally better to have a multi-phase approach:
change "isDone" to "isInProgress", then when it's done, delete the document. (Now, you can see the stack of "todo" and "being done". If something is "being done" for a long time, the client probably died.
change "isDone" to "phase" and atomically set it from "new" to "started" (and later set it to "finished"). Now you can see if something is "started" for a long time, the client may have died.
If you're really sophisticated, you can make a partial index. For example, only index documents with phase: { $ne: "finished" }. Now you don't need to waste space indexing the millions of finished documents; the index only holds the handful of new/in-progress documents, so it's smaller and faster.
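To make the "phase" variant concrete, here is a rough mongo shell sketch of atomically claiming a job (the workerId and startedAt fields are illustrative additions, not required by MongoDB):

var job = db.collection.findAndModify({
    query: { phase: 'new' },
    update: { $set: { phase: 'started', workerId: myWorkerId, startedAt: new Date() } },
    new: true
});
if (job === null) {
    // nothing left to claim right now
} else {
    // ... do the work, then mark the job finished
    db.collection.update({ _id: job._id }, { $set: { phase: 'finished' } });
}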
I realise that MongoDB, by its very nature, doesn't and probably never will support these kinds of transactions. However, I have found that I do need to use them in a somewhat limited fashion, so I've come up with the following solution, and I'm wondering: is this the best way of doing it, and can it be improved upon? (Before I go and implement it in my app!)
Obviously the transaction is controlled via the application (in my case, a Python web app). For each document in this transaction (in any collection), the following fields are added:
'lock_status': bool (true = locked, false = unlocked),
'data_old': dict (of any old values - current values really - that are being changed),
'data_new': dict (of values replacing the old (current) values - should be an identical list to data_old),
'change_complete': bool (true = the update to this specific document has occurred and was successful),
'transaction_id': ObjectId of the parent transaction
In addition, there is a transaction collection which stores documents detailing each transaction in progress. They look like:
{
'_id': ObjectId,
'date_added': datetime,
'status': bool (true = all changes successful, false = in progress),
'collections': array of collection names involved in the transaction
}
And here's the logic of the process. Hopefully it works in such a way that if it's interrupted, or fails in some other way, it can be rolled back properly.
1: Set up a transaction document
2: For each document that is affected by this transaction:
Set lock_status to true (to 'lock' the document from being modified)
Set data_old and data_new to their old and new values
Set change_complete to false
Set transaction_id to the ObjectId of the transaction document we just made
3: Perform the update. For each document affected:
Replace any affected fields in that document with the data_new values
Set change_complete to true
4: Set the transaction document's status to true (as all data has been modified successfully)
5: For each document affected by the transaction, do some clean up:
remove the data_old and data_new, as they're no longer needed
set lock_status to false (to unlock the document)
6: Remove the transaction document set up in step 1 (or as suggested, mark it as complete)
I think that logically works in such a way that if it fails at any point, all data can be either rolled back or the transaction can be continued (depending on what you want to do). Obviously all rollback/recovery/etc. is performed by the application and not the database, by using the transaction documents and the documents in the other collections with that transaction_id.
Is there any glaring error in this logic that I've missed or overlooked? Is there a more efficient way of going about it (e.g. less writing/reading from the database)?
As a generic response, multi-document commits on MongoDB can be performed as two-phase commits, which are fairly extensively documented in the manual (see: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/).
The pattern suggested by the manual is, briefly, the following:
Set up a separate transactions collection that includes the target document, source document, value, and state (of the transaction)
Create a new transaction object with initial as the state
Start making the transaction and update its state to pending
Apply the transaction to both documents (target, source)
Update the transaction state to committed
Use find to determine whether the documents reflect the transaction state; if so, update the transaction state to done
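A compressed mongo shell sketch of those steps, loosely following the manual's account-transfer example (the accounts collection, balance field, and pendingTransactions array come from that tutorial, not from the question; failure handling is omitted here):

// Steps 1-3: create the transaction document, then move it from initial to pending
var txnId = ObjectId();
db.transactions.insert({ _id: txnId, source: 'A', target: 'B', value: 100, state: 'initial' });
db.transactions.update({ _id: txnId, state: 'initial' }, { $set: { state: 'pending' } });

// Step 4: apply the change to both documents, tagging each with the transaction id
db.accounts.update({ _id: 'A', pendingTransactions: { $ne: txnId } },
                   { $inc: { balance: -100 }, $push: { pendingTransactions: txnId } });
db.accounts.update({ _id: 'B', pendingTransactions: { $ne: txnId } },
                   { $inc: { balance: 100 }, $push: { pendingTransactions: txnId } });

// Steps 5-6: mark the transaction committed, clean up the documents, then mark it done
db.transactions.update({ _id: txnId, state: 'pending' }, { $set: { state: 'committed' } });
db.accounts.update({ _id: 'A' }, { $pull: { pendingTransactions: txnId } });
db.accounts.update({ _id: 'B' }, { $pull: { pendingTransactions: txnId } });
db.transactions.update({ _id: txnId, state: 'committed' }, { $set: { state: 'done' } });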
In addition:
You need to manually handle failure scenarios (when something doesn't happen as expected)
You need to manually implement a rollback, basically by introducing a new state value, canceling
Some specific notes for your implementation:
I would discourage you from adding fields like lock_status, data_old, data_new into source/target documents. These should be properties of the transactions, not the documents themselves.
To generalize the concept of target/source documents, I think you could use DBrefs: http://www.mongodb.org/display/DOCS/Database+References
I don't like the idea of deleting transaction documents when they are done. Setting state to done seems like a better idea since this allows you to later debug and find out what kind of transactions have been performed. I'm pretty sure you won't run out of disk space either (and for this there are solutions as well).
In your model how do you guarantee that everything has been changed as expected? Do you inspect the changes somehow?
MongoDB 4.0 adds support for multi-document ACID transactions.
Java Example:
try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    collection.insertOne(clientSession, docOne);
    collection.insertOne(clientSession, docTwo);
    clientSession.commitTransaction();
}
Note: it requires a replica set. You can still have a replica set with one node and run it on your local machine.
https://stackoverflow.com/a/51396785/4587961
https://docs.mongodb.com/manual/tutorial/deploy-replica-set-for-testing/
MongoDB 4.0 is adding (multi-collection) multi-document transactions: link