MongoDB/Mongoose atomic read & write on single Document

I need to update a document based on certain criteria with MongoDB/Mongoose.
So I search for the Document in the Collection based on ID.
Once I have the document I check if it satisfies the condition I'm looking for (the values of one of the nested properties).
Once I've confirmed the Document satisfies the criteria, I perform certain actions on the document and then Save the document.
The whole process of finding the Document, checking for criteria, making adjustments and updating, takes some time to complete.
Problem is I can trigger the api that runs this process multiple times.
While the entire process is running for the first API call, I can call the API multiple times (I can't control the API calls) and have the entire process running again for the same document.
So now I end up making the same updates twice on the document.
If the first call runs through successfully, the next call will not, because the first update ensures the criteria are no longer met. But since the later calls arrive while the first one hasn't finished updating, they end up going through successfully as well.
Any way I can perform all the steps as one atomic action?

Well, I'm a year or two late to this party, and this answer I'm about to write has some issues depending on your use case, but for my situation, it has worked pretty well.
Here's the solution with Mongoose / JavaScript:
const findCriteria1 = { _id: theId };
myModel.findOne(findCriteria1, function (error, dataSet) {
  // Check that the current value is the one we expect before updating.
  if (dataSet.fieldInQuestion === valueIWantFieldToBe) {
    // Repeat the value we just read in the filter, so the update only
    // matches if the document hasn't changed in the meantime.
    const findCriteria2 = { _id: theId, fieldInQuestion: dataSet.fieldInQuestion };
    const updateObject = { fieldInQuestion: updateValue };
    myModel.findOneAndUpdate(findCriteria2, updateObject, function (error, dataSet) {
      if (!error) {
        console.log('success');
      }
    });
  }
});
So basically, you find() the document with the value you want, and if it meets conditions, you do a findOneAndUpdate(), making sure that the document value did not change from what it was when you found it (as seen in findCriteria2).
One major issue with this solution is that it is possible for this operation to fail because the document value was updated by another user in between this operation's DB calls. This is unlikely, but may not be acceptable, especially if your DB is being pounded with frequent calls to the same document. A much better solution, if it exists, would be a document lock and update queue, much like most SQL databases provide.
One way to help with that issue would be to wrap the whole solution I gave in a loop, and if the findOneAndUpdate fails, to try the loop again until it succeeds. You could set how many times you try the loop; an infinite loop is also possible, but that could be really dangerous because it has the potential to totally disable the DB.
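As a rough illustration, here is a minimal sketch of that retry idea using Mongoose's promise API and a capped number of attempts (myModel, theId, fieldInQuestion, valueIWantFieldToBe and updateValue are the same placeholders as above):
async function updateWithRetry(maxAttempts) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const dataSet = await myModel.findOne({ _id: theId }).exec();
    if (!dataSet || dataSet.fieldInQuestion !== valueIWantFieldToBe) {
      return null; // criteria not met, nothing to do
    }
    // Only update if the field still holds the value we just read.
    const updated = await myModel.findOneAndUpdate(
      { _id: theId, fieldInQuestion: dataSet.fieldInQuestion },
      { fieldInQuestion: updateValue },
      { new: true }
    ).exec();
    if (updated) return updated; // success
    // Someone else changed the document between our read and write; retry.
  }
  throw new Error('gave up after ' + maxAttempts + ' attempts');
}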
Another issue that my loop idea doesn't solve is that if you need a "first come, first served" model, that might not always be the case, as a request that overtakes the one before it may end up being the "first served".
And, a better idea altogether might just be to change how you model your data. Of course, this depends on what checks you need to run on your data, but you mentioned "values of nested properties" in your question... what if those values were just in a separate document, and you could simply check what you needed to in the findOneAndUpdate() criteria parameter?
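For example, if the criterion lives in its own field, the read-check-write collapses into a single atomic server-side operation. A minimal sketch, with a hypothetical nested field nested.status standing in for whatever you need to check:
myModel.findOneAndUpdate(
  { _id: theId, 'nested.status': 'pending' },  // criterion checked atomically with the write
  { $set: { 'nested.status': 'processing' } },
  { new: true },
  function (error, doc) {
    if (!error && doc === null) {
      // Criterion not met, or a concurrent call already claimed the document.
    }
  }
);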

To operate on a consistent snapshot of the entire database, use a transaction with read concern snapshot.
Transactions do not magically prevent concurrency. The withTransaction helper handles the mechanics of many cases of concurrent modification transparently to the application, but you still need to understand concurrent operations on databases in general to write working, correct code.
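As a minimal sketch (not a drop-in implementation), the original read-check-update flow could look like this with Mongoose sessions, assuming a replica set or sharded cluster and the illustrative model/field names from above:
async function updateInTransaction() {
  const session = await mongoose.startSession();
  try {
    await session.withTransaction(async () => {
      const doc = await myModel.findById(theId).session(session);
      if (doc && doc.fieldInQuestion === valueIWantFieldToBe) {
        doc.fieldInQuestion = updateValue;
        await doc.save({ session });
      }
    }, { readConcern: { level: 'snapshot' }, writeConcern: { w: 'majority' } });
  } finally {
    session.endSession();
  }
}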

Related

Firestore Increment - Cloud Function Invoked Twice

With Firestore Increment, what happens if you're using it in a Cloud Function and the Cloud Function is accidentally invoked twice?
To make sure that your function behaves correctly on retried execution attempts, you should make it idempotent: implement it so that an event produces the desired results (and side effects) even if it is delivered multiple times.
E.g. the function is trying to increment a document field by 1
document("post/Post_ID_1").
updateData(["likes" : FieldValue.increment(1)])
So while Increment may be atomic, it's not idempotent? If we want to make our counters idempotent, do we still need to use a transaction and keep track of who was the last person to like the post?
It will increment once for each invocation of the function. If that's not acceptable, you will need to write some code to figure out if any subsequent invocations are valid for your case.
There are many strategies to implement this, and it's up to you to choose one that suits your needs. The usual strategy is to use the event ID in the context object passed to your function to determine if that event has been successfully processed in the past. Maybe this involves storing that record in another document, in Redis, or somewhere that persists long enough for duplicates to be prevented (an hour should be OK).
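As an illustration of the event-ID strategy, here is a minimal sketch for a Firestore-triggered function in JavaScript; the path posts/{postId}/likes/{likeId} and the processedEvents collection are made up for the example:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.countLike = functions.firestore
  .document('posts/{postId}/likes/{likeId}')
  .onCreate((snap, context) => {
    const db = admin.firestore();
    const eventRef = db.collection('processedEvents').doc(context.eventId);
    const postRef = db.doc('posts/' + context.params.postId);
    // Record the event ID and apply the increment in one transaction, so a
    // retried delivery of the same event is detected and skipped.
    return db.runTransaction(async (tx) => {
      const seen = await tx.get(eventRef);
      if (seen.exists) return; // duplicate delivery: do nothing
      tx.set(eventRef, { at: admin.firestore.FieldValue.serverTimestamp() });
      tx.update(postRef, { likes: admin.firestore.FieldValue.increment(1) });
    });
  });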

Detecting concurrent data modification of document between read and write

I'm interested in a scenario where a document is fetched from the database, some computations are run based on some external conditions, one of the fields of the document gets updated and then the document gets saved, all in a system that might have concurrent threads accessing the DB.
To make it easier to understand, here's a very simplistic example. Suppose I have the following document:
{
  ...
  items_average: 1234,
  last_10_items: [10, 2187, 2133, ...]
  ...
}
Suppose a new item (X) comes in; five things need to be done:
1. read the document from the DB
2. remove the first (oldest) item in last_10_items
3. add X to the end of the array
4. re-compute the average* and save it in items_average
5. write the document to the DB
* NOTE: the average computation was chosen as a very simple example, but the question should take into account more complex operations based on data existing in the document and on new data (i.e. not something solvable with the $inc operator)
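To make the race concrete, here is a minimal sketch of those five steps done naively with Mongoose (the model name ItemsDoc is made up); two concurrent runs of this function can silently overwrite each other's changes:
async function addItem(docId, x) {
  const doc = await ItemsDoc.findById(docId);       // 1. read
  doc.last_10_items.shift();                        // 2. drop the oldest item
  doc.last_10_items.push(x);                        // 3. append X
  doc.items_average =                               // 4. recompute the average
    doc.last_10_items.reduce((a, b) => a + b, 0) / doc.last_10_items.length;
  await doc.save();                                 // 5. write back
}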
This certainly is something easy to implement in a single-threaded system, but in a concurrent system, if two threads follow the above steps at the same time, inconsistencies can occur, since both will update the last_10_items and items_average values without seeing, and thereby overwriting, the other's changes.
So, my question is: how can such a scenario be handled? Is there a way to check, or react upon, the fact that the underlying document was changed between steps 1 and 5? Is there such a thing as WATCH from Redis, or a 'concurrent modification error' like relational DBs have?
Thanks
Database systems use a memory-inspection-and-rollback scheme similar to transactional memory.
Briefly speaking, the system monitors the shared memory locations you specify and performs something like compare-and-swap, load-link/store-conditional, or test-and-set.
Therefore, if any monitored memory content changes during the transaction, it aborts and tries again until there is no conflicting operation on that shared memory.
For example, GCC implements the following:
https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
type __sync_lock_test_and_set (type *ptr, type value, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
For more info about transactional memory,
http://en.wikipedia.org/wiki/Software_transactional_memory
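Translated back to the question's MongoDB scenario, the same abort-and-retry idea is usually implemented with a version field used as a compare-and-swap guard. A minimal sketch, assuming a hypothetical version field on the document and a Mongoose version where updateOne resolves with a modifiedCount:
async function addItemWithCas(docId, x) {
  for (;;) {
    const doc = await ItemsDoc.findById(docId);
    const items = doc.last_10_items.slice(1).concat([x]);
    const avg = items.reduce((a, b) => a + b, 0) / items.length;
    // Compare-and-swap: the write only matches if version is unchanged.
    const res = await ItemsDoc.updateOne(
      { _id: docId, version: doc.version },
      { $set: { last_10_items: items, items_average: avg }, $inc: { version: 1 } }
    );
    if (res.modifiedCount === 1) return; // swap succeeded
    // A conflicting write happened between our read and write: retry.
  }
}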

How cursor.observe works and how to avoid multiple instances running?

Observe
I was trying to figure out how cursor.observe runs inside Meteor, but found nothing about it.
The docs say:
Establishes a live query that notifies callbacks on any change to the query result.
I would like to understand better what live query means.
Where will my observer function be executed? By Meteor or by Mongo?
Multiple runs
When more than one user subscribes to an observer, one instance runs for each client, leading to performance and race-condition issues.
How can I implement my observe so it behaves like a singleton? Just one instance running for all clients.
Edit: There was a third question here, but now it is a separated question: How to avoid race conditions on cursor.observe?
Server side, as of right now, observe works as follows:
1. Construct the set of documents that match the query.
2. Regularly poll the database with the query and take a diff of the changes, emitting the relevant events to the callbacks.
3. When matching data is changed/inserted into Mongo by Meteor itself, emit the relevant events, short-circuiting step #2 above.
There are plans (possibly in the next release) to automatically ensure that calls to subscribe that have the same arguments are shared. So basically taking care of the singleton part for you automatically.
Certainly you could achieve something like this yourself, but I believe it's a high priority for the meteor team, so it's probably not worth the effort at this point.
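If you do want to roll it yourself in the meantime, a minimal sketch of a server-side singleton observer could look like this (the Items collection and its query are illustrative); the observer is started once, no matter how many subscriptions arrive:
let handle = null;

function ensureObserver() {
  if (handle) return handle; // already running: reuse the single instance
  handle = Items.find({ state: 'active' }).observe({
    added(doc)              { /* react to a new matching document */ },
    changed(newDoc, oldDoc) { /* react to a change */ },
    removed(doc)            { /* react to a removal */ },
  });
  return handle;
}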

Does MongoDB's update atomicity apply to both query and modification?

MongoDB has support for atomic updates. I.e. I can be sure that when a document is updated no other update will overwrite my previous change. My question relates to the combination of query and update statement, and is best illustrated by the example shown below.
db.foo.update(
  { state: 1, players: { $size: 2 } },
  { $push: { players: { /* new player document */ } } },
  false, true
);
In the above example, I only want to push a new player into the players array if the number of players equals 2. Given the above query and update statement, is it possible that two simultaneous updates both push a player onto the same document, because at the time of reading the document its players $size is 2? I.e., does the atomicity span both the query and update parts of the update statement, or not?
Edit: a more in-depth sequence of events.
Consider firing the same update twice (U1 and U2) at the same time. Is the following sequence of events possible or not?
1. U1 finds that document #1 matches the query portion of the update statement.
2. U2 finds that document #1 matches the query portion of the update statement.
3. U1 pushes a new player in document #1.
4. U2 pushes a new player in document #1.
The end result is that document #1 contains one more player than expected, because both U1 and U2 were under the impression that document #1 contains only two players.
I've asked this question on the mongodb-user group. http://groups.google.com/group/mongodb-user/browse_thread/thread/e61e220dc0f6f64c
According to the answer by Marc (who works at 10gen) the situation described by me cannot occur.
The situation that you described is not possible; there is no danger of both updates modifying the same document.
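In other words, because the filter and the modification are applied atomically per document, the check can live in the filter itself. The same guard in today's driver syntax might look like this (newPlayer is a placeholder for the new player document):
// Each concurrent call re-evaluates { $size: 2 } atomically with the push,
// so at most one of two simultaneous updates can match the same document.
await db.collection('foo').updateOne(
  { state: 1, players: { $size: 2 } },
  { $push: { players: newPlayer } }
);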
Update: not sure of my knowledge anymore... See "The ABA Nuance". Please don't accept this answer (or my comment below) as it is probably not correct. Would love to be corrected.
Your explanation of atomic ("I can be sure that when a document is updated no other update will overwrite my previous change") is incorrect. Other updates can (and will) overwrite your change. But they won't do it in a way that would interfere with the integrity of your query.
It is important to know that MongoDB updates are atomic on a single document. So when a document matches your query, it is "locked" and ready for an update. Note that your update ($push) works inside the same document that was locked. When the update is finished, the lock is released.
I am not sure I understand "does the atomicity span across the query and update part of the update statement or not", but: atomic means that other queries can't mess with our query, while our query can change data that it has "locked" itself.
Disclaimer: I am not privy to internal mechanisms MongoDB uses to ensure this atomicity, so this description might be lacking from technical viewpoint (especially in connection to locking) - but it is valid conceptually. This is how it works from external viewpoint.
With the sequence of events that you write down, you can indeed have one player too many. The update's "find" and "update" work very much like doing it yourself with a "find" and then an "update" on each of the documents that you're iterating over. You probably want to have a look at the "$atomic" operator: http://www.mongodb.org/display/DOCS/Atomic+Operations#AtomicOperations-ApplyingtoMultipleObjectsAtOnce

How can I execute a callback after a massive multiple insert with Mongoose?

I have an object results that's very large (maybe over 1,000 items). I'm iterating over it to save to the DB but this seems very inefficient:
for result in results
  item = new Item result
  item.save()
Is there a more optimal way to do this and THEN get a callback as opposed to a callback for EVERY save?
The async module will help a lot with this. You're probably looking for a queue.
https://github.com/caolan/async#queue
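For instance, a rough sketch with an async queue (assuming the async v2 API, where drain is assigned as a property, and callback-style Mongoose saves):
const async = require('async');

// Save each item through a queue with limited concurrency, and get a
// single callback once everything has drained.
const q = async.queue(function (result, done) {
  new Item(result).save(done);
}, 10); // up to 10 saves in flight at once

q.drain = function () {
  console.log('all items saved');
};

q.push(results);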
You may be getting near the edge of the normal Node.js use case.