Does MongoDB's update atomicity apply to both query and modification? - mongodb

MongoDB has support for atomic updates, i.e. I can be sure that when a document is updated, no other update will overwrite my previous change. My question concerns the combination of the query and the update parts of an update statement, and is best illustrated by the example below.
db.foo.update(
  { state: 1, players: { $size: 2 } },
  { $push: { players: { /* new player document */ } } },
  false, true );
In the above example, I only want to push a new player onto the players array if the number of players equals 2. Given the above query and update statement, is it possible that two simultaneous updates both push a player onto the same document, because at the time each read the document its players $size was 2? I.e. does the atomicity span both the query and the update parts of the update statement, or not?
Edit: a more in-depth sequence of events:
Consider firing the same update twice (U1 and U2) at the same time. Is the following sequence of events possible or not?
1. U1 finds that document #1 matches the query portion of the update statement.
2. U2 finds that document #1 matches the query portion of the update statement.
3. U1 pushes a new player into document #1.
4. U2 pushes a new player into document #1.
The end result is that document #1 contains one more player than expected, because both U1 and U2 were under the impression that document #1 contains only two players.

I've asked this question on the mongodb-user group. http://groups.google.com/group/mongodb-user/browse_thread/thread/e61e220dc0f6f64c
According to the answer by Marc (who works at 10gen), the situation I described cannot occur:
The situation that you described is not possible; there is no danger of both updates modifying the same document.

Update: not sure of my knowledge anymore... See "The ABA Nuance". Please don't accept this answer (or my comment below) as it is probably not correct. Would love to be corrected.
Your explanation of atomicity is incorrect ("I can be sure that when a document is updated no other update will overwrite my previous change"). Other updates can (and will) overwrite your change, but they won't do it in a way that interferes with the integrity of your query.
It is important to know that MongoDB updates are atomic on a single document. So when a document matches your query, it is "locked" and ready for an update. Note that your update ($push) works inside the same document that was locked. When the update is finished, the lock is released.
I am not sure I understand "does the atomicity span across the query and update part of the update statement or not", but: atomic means that other queries can't mess with our query, while our query can change the data that it has "locked" itself.
Disclaimer: I am not privy to the internal mechanisms MongoDB uses to ensure this atomicity, so this description might be lacking from a technical viewpoint (especially in connection with locking), but it is valid conceptually. This is how it works from an external viewpoint.

With the sequence of events that you describe, you can indeed end up with one player too many. The update's "find" and "update" phases work very much like doing it yourself with a find and then an update on each of the documents you are iterating over. You probably want to have a look at the "$atomic" operator: http://www.mongodb.org/display/DOCS/Atomic+Operations#AtomicOperations-ApplyingtoMultipleObjectsAtOnce
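For illustration only, a multi-document update using that flag might look like the sketch below. This is an assumption based on the old documentation linked above: the $atomic flag (later renamed $isolated, and removed in MongoDB 4.0) went into the query document and prevented other writes from interleaving with the multi-update; it did not turn the operation into a transaction.

// Sketch: isolate a multi-document update from concurrent writers
// (legacy $atomic/$isolated flag; not available on modern MongoDB).
db.foo.update(
  { state: 1, players: { $size: 2 }, $atomic: 1 },
  { $push: { players: { /* new player document */ } } },
  false,  // upsert
  true    // multi
);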

Related

Firestore, why use "update" instead of "set merge"?

set with merge will update fields in the document, or create it if it doesn't exist
update will update fields, but will fail if the document doesn't exist
Wouldn't it be much easier to always use set with merge?
Is the pricing slightly different?
Between set merge and update there is a difference in the use case.
You may find detailed information regarding this on this post.
Regarding the pricing, as stated here:
Each set or update operation counts as a single write and is being billed according to the region.
=========================================================================
EDIT:
The choice of which operation to use depends greatly on the use case: if you use "set merge" for a batch update, your request will successfully update all existing documents, but it will also create dummy documents for non-existent ids, which is sometimes not what you want.
After investigating a bit further, we could add another difference:
set merge will always overwrite the data with the data you pass, while
update is specifically designed to let you perform a partial update of a document without the possibility of creating incomplete documents that your code isn't otherwise prepared to handle. Please check this answer, as well as this scenario.
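To make the batch case above concrete, here is a minimal sketch using the Firebase Admin SDK for Node.js; the collection name and document ids are made up for the example.

// Sketch: batch "set merge" vs "update" (hypothetical collection and ids).
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

const batch = db.batch();
const ids = ['existing-id', 'missing-id']; // assume only the first one exists

ids.forEach((id) => {
  const ref = db.collection('products').doc(id);
  // set with merge: updates 'existing-id' and silently creates 'missing-id'
  batch.set(ref, { price: 10 }, { merge: true });
  // update: the whole batch would fail because 'missing-id' does not exist
  // batch.update(ref, { price: 10 });
});

batch.commit().then(() => console.log('batch committed'));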
The difference is that .set(data, {merge:true}) will update the document if it exists, or create the document if it doesn't.
.update() fails if the document doesn't exist.
But why does .update() still exist? Well, probably for backward compatibility. I believe .set() with merge:true was introduced at a later date than .update(). As you have pointed out, set/merge is more versatile. I use it instead of .update() and instead of .add().
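A minimal single-document sketch of that difference, again with the Admin SDK and a hypothetical users/alice document:

// Sketch: set-with-merge vs update on a single (possibly missing) document.
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function demo() {
  const ref = db.collection('users').doc('alice'); // hypothetical path

  // Updates the document if it exists, creates it if it doesn't.
  await ref.set({ lastLogin: Date.now() }, { merge: true });

  // Throws NOT_FOUND if the document does not exist.
  await ref.update({ lastLogin: Date.now() });
}

demo().catch(console.error);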

MongoDB/Mongoose atomic read & write on single Document

I need to update a Document based on certain criteria with mongoDB/mongoose.
So I search for the Document in the Collection based on ID.
Once I have the document I check if it satisfies the condition I'm looking for (values of one of the nested properties).
Once I've confirmed the Document satisfies the criteria, I perform certain actions on the document and then Save the document.
The whole process of finding the Document, checking for criteria, making adjustments and updating, takes some time to complete.
Problem is I can trigger the api that runs this process multiple times.
While the entire process is running for the first api call, I can call the api multiple times (can't control api call) and have the entire process running again for the same document.
So now I end up making the same updates twice on the Document.
If the first call runs through successfully, the next call will not, because the first update ensures the criteria are no longer met. But since the later calls arrive while the first one hasn't finished updating, they all end up going through successfully.
Any way I can perform all the steps as one atomic action?
Well, I'm a year or two late to this party, and this answer I'm about to write has some issues depending on your use case, but for my situation, it has worked pretty well.
Here's the solution with Mongoose / JavaScript:
const findCriteria1 = { _id: theId };
myModel.findOne(findCriteria1, function (error, dataSet) {
  if (dataSet.fieldInQuestion === valueIWantFieldToBe) {
    // Re-check the field in the update's query, so the write only applies
    // if the document is still in the state we just read.
    const findCriteria2 = { _id: theId, fieldInQuestion: dataSet.fieldInQuestion };
    const updateObject = { fieldInQuestion: updateValue };
    myModel.findOneAndUpdate(findCriteria2, updateObject, function (error, dataSet) {
      if (!error) {
        console.log('success');
      }
    });
  }
});
So basically, you find() the document with the value you want, and if it meets conditions, you do a findOneAndUpdate(), making sure that the document value did not change from what it was when you found it (as seen in findCriteria2).
One major issue with this solution is that it is possible for this operation to fail because the document value was updated by another user in between this operation's DB calls. This is unlikely, but may not be acceptable, especially if your DB is being pounded with frequent calls to the same document. A much better solution, if it exists, would be for a document lock and update queue, much like most SQL databases can do.
One way to help with that issue would be to wrap the whole solution I gave in a loop and, if the findOneAndUpdate fails, try again until it succeeds. You could cap how many times the loop retries. An unbounded loop is also possible, but that seems like a really bad idea: it could be dangerous, because it has the potential to hammer the DB indefinitely.
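A minimal sketch of that bounded-retry idea, reusing the hypothetical model and field names from the snippet above:

// Sketch: bounded retry around the read-check-update pattern above.
async function guardedUpdate(theId, valueIWantFieldToBe, updateValue, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const dataSet = await myModel.findOne({ _id: theId });
    if (!dataSet || dataSet.fieldInQuestion !== valueIWantFieldToBe) {
      return null; // criteria no longer met, nothing to do
    }
    // Only write if the field still has the value we just read.
    const updated = await myModel.findOneAndUpdate(
      { _id: theId, fieldInQuestion: dataSet.fieldInQuestion },
      { fieldInQuestion: updateValue },
      { new: true }
    );
    if (updated) {
      return updated; // success
    }
    // Someone changed the document between our read and our write; retry.
  }
  throw new Error('guardedUpdate: retry limit exceeded');
}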
Another issue that my loop idea doesn't solve is ordering: if you need a "first come, first served" model, that is not guaranteed, because a request that thwarts the request before it may end up being the one that is "served" first.
And, a better idea altogether might just be to change how you model your data. Of course, this depends on what checks you need to run on your data, but you mentioned "values of nested properties" in your question... what if those values were just in a separate document, and you could simply check what you needed in the findOneAndUpdate() criteria parameter?
To operate on a consistent snapshot of the entire database, use a transaction with read concern snapshot.
Transactions do not magically prevent concurrency. The withTransaction helper handles the mechanics of many cases of concurrent modification transparently to the application, but you still need to understand concurrent operations on databases in general to write working, correct code.
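For reference, a minimal sketch of withTransaction with snapshot read concern using the MongoDB Node.js driver; the URI, database, collection and field names are made up for the example, and transactions require a replica set or sharded cluster:

// Sketch: read-check-update inside a transaction with snapshot read concern.
const { MongoClient } = require('mongodb');

async function guardedUpdate(someId, expectedValue, newValue) {
  const client = new MongoClient('mongodb://localhost:27017'); // assumed URI
  await client.connect();
  const coll = client.db('test').collection('docs');

  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const doc = await coll.findOne({ _id: someId }, { session });
      if (doc && doc.nested.property === expectedValue) {
        await coll.updateOne(
          { _id: someId },
          { $set: { 'nested.property': newValue } },
          { session }
        );
      }
    }, {
      readConcern: { level: 'snapshot' },
      writeConcern: { w: 'majority' }
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}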

What do the oplog fields actually mean?

I've seen this question posed before, but the answers were very vague. I have been doing some research on oplog, and am trying to understand exactly how it works. In particular, I want to have a good understanding of the fields in an oplog document and what data they store.
These are the fields I have found through tests and what I think they mean as well as what I am still unsure of:
ts: timestamp of the write operation / oplog entry
h: a unique identifier for the oplog entry (but why is it sometimes positive and sometimes negative?)
op: type of operation performed (usually i/u/d for insert, update or delete)
ns: database & collection affected
o: the new state of the document after performing the change
o2: Seems to contain the _id field of the document during an update operation. Why is this needed when that same field is present as part of the o field, which also contains the rest of the document?
b: Seems to be a bool that appears for delete operations. What is the significance of this field?
I would like to confirm whether or not the points I made above are accurate, as well as clarifications for the bits that aren't clear. I am also interested to know if there any other fields that can appear in an oplog document.
h is a hash (signed Long)
ts is the internal timestamp format (the "\x11" type shown at bsonspec.org; search the API docs for your driver at api.mongodb.org for further information)
you are correct on op, ns, o, and o2
there's also a "v" field (I'm gonna speculate that this is version, which would allow them to update the schema for the oplog).
b is True for all the delete operations I could find, so I can't say much about it beyond that.
The best source of documentation I've found is this. It was a presentation by a company called Stripe at 2014's MongoDB World conference, and it includes some sample Ruby code.
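To tie the fields together, here is an illustrative update-operation entry in the shape used by older (pre-4.0) oplogs; all values are made up. Note that for a $set-style update, o holds only the modification rather than the whole document, which is why the target _id goes in o2.

{
  "ts": Timestamp(1404934941, 1),              // when the operation happened
  "h": NumberLong("-7293097155492963890"),     // signed 64-bit hash, hence +/- values
  "v": 2,                                      // oplog entry version
  "op": "u",                                   // update
  "ns": "mydb.players",                        // database.collection affected
  "o2": { "_id": ObjectId("53bd8d0e8e9d2b7a6c1e0000") },  // which document to update
  "o": { "$set": { "score": 10 } }             // the modification itself
}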

How to deal with many users update over Document driven nosql database

Problem
Starting with nosql document database I figured out lots of new possibilities, however, I see some pitfalls, and I would like to know how can I deal with them.
Suppose I have a product, and this product can be sold in many regions. There is one person responsible for each region (with access to the CMS). Each responsible person modifies the products according to regional laws and rules.
Since joins aren't supported as we know them from relational databases, the document should be designed so that it contains all the information needed to build our selection statements and selection results, avoiding round trips to the database.
So my first thought was to design a document that follows more or less this structure:
{
  type: "product",
  id: "product_id",
  title: "title",
  allowedAge: 12,
  regions: {
    'TX': {
      title: "overriden title",
      allowedAge: 13
    },
    'FL': {
      title: "still another title"
    }
  }
}
But I have the impression that this approach will generate conflicts while updating the document. Suppose we have a lot of users updating lots of documents through a CMS. When the same document is updated, the last update overwrites the updates done before it, even though the users only modify fragments of the document (in this case, each responsible person should only be able to modify their regional data).
How to deal with this situation?
One possible solution I can think of would be partial document updates. Positive: less data is overwritten by different operations. Negative: we lose the optimistic locking feature, since locking is done on a document, not on a fragment of it.
Is there another approach for the problem?
In this case you can use 3 solutions:
1. Leave the current document structure and always check the CAS value on update operations. If the CAS doesn't match, call the store function again. (But, as you say, if you have a lot of users this can be very slow.)
2. Split the doc into several parts that can be updated independently, and then combine them on the app side. This results in more view calls (one to get the main doc, another to get e.g. the regions, etc.). If you have many "parts" it will also reduce performance.
3. See this doc. It's about simulating joins in Couchbase. There is also a good example written by Tug Grall.
If you're not bound to Couchbase (it's not clear from your question whether it's general or Couchbase-specific), also look into MongoDB. It supports partial updates on documents as well as other atomic operations (like increments and array operations), so it might suit your use case better; a partial update is sketched below (check out the possible update operations in MongoDB: http://docs.mongodb.org/manual/core/update/).
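For instance, updating just one region's fragment of the product document from the question might look like this in the mongo shell (the collection name is assumed):

// Sketch: update only the Texas regional fragment of the product document.
// Other regions and the top-level fields are left untouched, so two
// responsible people editing different regions do not overwrite each other.
db.products.update(
  { type: "product", id: "product_id" },
  { $set: {
      "regions.TX.title": "new Texas title",
      "regions.TX.allowedAge": 14
  } }
);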

mongo aggregation framework get transactions in sequence

How do I do this in the mongo aggregation framework?
Given these records:
record 1. {id:1,action:'clicked',user:'id', time:'1'}
record 2. {id:2,action:'video play',user:'id',time:'2'}
record 3. {id:3,action:'page load',user:'id',time:'3'}
record 4. {id:4,action:'video play',user:'id',time:'4'}
record 5. {id:1,action:'clicked',user:'id', time:'5'}
record 6. {id:2,action:'video play',user:'id',time:'6'}
Now, how do I get all the "video play" actions that come right after a "clicked" action? Has anybody come across this kind of aggregation?
You will need to redesign your schema. I can think of two approaches. In your application you can track the click path of a session. When you insert an action into your collection, you will also need to track the previous interaction. Once you have this, you just need to do something like db.actions.find({prevAction:"clicked", action:"video play"}).count(). This will be very fast.
Alternatively, if you decide you would like to track session click-path information, you may have a document like:
{
  _id: sessionid,
  user: userid,
  actions: [
    { ...login },
    { ...click link },
    { ...play video }
  ]
}
You can create this collection by doing upserts, as sketched below. Make sure you keep the action subdocuments small so you don't exceed the 16MB limit for standard documents. Also set the collection's allocation strategy to usePowerOf2Sizes.
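A minimal sketch of such an upsert in the mongo shell (the collection name and action fields are illustrative):

// Sketch: append an action to a session document, creating the document
// on first use thanks to upsert: true.
db.sessions.update(
  { _id: sessionId },
  {
    $setOnInsert: { user: userId },
    $push: { actions: { action: "video play", time: 4 } }
  },
  { upsert: true }
);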
Once you have this collection, you can pull out these documents to get some interesting info. The specific aggregation that you want to do would be more complex on this collection than on my previous suggestion, though. You will need to create a map-reduce process that runs periodically behind the scenes to calculate what you want (emit a key/value pair only if the actions array contains the expected sequence of actions).