I am using a MongoDB cursor to find a large number of documents, which takes quite some time. What happens if, during this time, documents that match the cursor's search criteria are added to the database?
Will the cursor return the documents?
Or does the cursor take some kind of snapshot when it begins, and thus omit the later-added results?
Will the cursor return the documents?
Yes. This also happens when you update some documents which you received from the cursor, causing them to grow out of their current disk bounds and move to a bigger slot in the data files. In this case, you may see such documents twice (or more).
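The moved-document effect described above can be illustrated with a minimal simulation in plain Python (this is not MongoDB code; the slot layout and `scan_with_move` function are hypothetical stand-ins for a cursor walking storage in natural order):

```python
# Simulate a cursor walking storage slots in order while an update
# relocates a document to a bigger slot later in the file: the scan
# reads that document twice.

def scan_with_move(slots, move_at, doc_id):
    """Walk slots left to right; when the scan reaches index `move_at`,
    simulate an update that relocates `doc_id` to the end of the file."""
    seen = []
    i = 0
    moved = False
    while i < len(slots):
        if slots[i] is not None:
            seen.append(slots[i])
        i += 1
        if not moved and i == move_at:
            # the document outgrew its slot: free the old slot and
            # append the document in a bigger slot at the end
            j = slots.index(doc_id)
            slots[j] = None
            slots.append(doc_id)
            moved = True
    return seen

docs = ["a", "b", "c", "d"]
result = scan_with_move(docs, move_at=2, doc_id="a")
# "a" is read at its original slot, relocated past the scan point,
# and then read again at the end
```

The same mechanism explains why newly inserted documents can appear in the results: they land in slots the cursor has not reached yet.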
I'm in the learning phase of MongoDB.
I have a test website project where each step of a story is served at domain.com/step;
for instance, step 14 is accessed through domain.com/14.
In other words, for the above case, I need to access the 14th document in my collection to serve it.
So far I've been using the find().skip(n).limit(1) method to return the nth document, but it becomes extremely slow when there are many documents to skip. So I need a more efficient way to get the nth document in my collection.
Any ideas are appreciated.
Add a field to your documents which tells you which step it is, add an index to that field and query by it.
Document:
{
  step: 14,
  text: "text",
  date: date,
  imageurl: "imageurl"
}
Index:
db.collection.createIndex({step:1});
Query:
db.collection.find({step:14});
Relying on natural order in the collection is not just slow (as you found out), it is also unreliable. When you start a new collection and insert a bunch of documents, you will usually find them in the order you inserted them. But when you change documents after they were inserted, it can happen that the order gets messed up in unpredictable ways. So never rely on insertion order being consistent.
Exception: Capped Collections guarantee that insertion order stays consistent. But there are very few use-cases where these are useful, and I don't think you have such a case here.
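The difference between the two access patterns can be sketched in plain Python (a hypothetical stand-in, not MongoDB code: the list plays the collection, the dict plays the index on `step`):

```python
# Why an indexed equality query beats skip(n).limit(1) for fetching
# the nth step.

docs = [{"step": i, "text": f"step {i}"} for i in range(1, 1001)]

def nth_by_skip(docs, n):
    """skip(n-1).limit(1): the server still walks and discards
    every skipped document before returning the one you want."""
    it = iter(docs)
    for _ in range(n - 1):
        next(it)  # each skipped document is scanned anyway
    return next(it)

# An index on `step` maps the value directly to the document,
# so lookup cost does not grow with the document's position.
index = {d["step"]: d for d in docs}

def nth_by_index(n):
    return index[n]
```

Both return the same document for step 14, but the skip version does work proportional to n, which is exactly the slowdown described in the question.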
I'm giving MongoDB a try for a new project; I've never worked with it before.
The manual on cursors says:
Because the cursor is not isolated during its lifetime, intervening
write operations on a document may result in a cursor that returns a
document more than once if that document has changed. To handle this
situation, see the information on snapshot mode.
Does that mean I always have to use snapshot() on reads and/or $isolated on write operations to ensure consistent result sets, in other words, to get some kind of transactionality? Is this correct? Or, put differently: why would I ever not use snapshot(), given that not using it risks returning inconsistent data?
You should use snapshot() when you are modifying the results of the cursor itself: while iterating the cursor and modifying the documents on each iteration, or when you query a collection that you expect to be modified between issuing the query and iterating the cursor.
You should also consider snapshot() if your cursor's result set is larger than 1 MB; queries whose result sets are smaller than 1 megabyte are effectively snapshotted by default.
Note that you can't use snapshot() on a sharded collection.
In MongoDB, a read operation on a collection returns a cursor.
If the read operation accesses most of the documents in the collection, it may interleave with other update operations.
In that case, is it possible that the cursor will return duplicate documents?
How can I make sure that the cursor avoids duplicates?
The distinct method will not be of much help here. This is not a problem that function can solve; not only that, it also runs at a fraction of the speed of a normal cursor.
If the read operation accesses most of the documents in the collection, it may interleave with other update operations.
It is possible if the documents move in such a way that, given the cursor's sort order, they get read again.
Whether this is a problem depends on your sort key: if you are sorting by something that won't be updated, for example _id, then you don't really need to worry. However, if you are sorting by something that will be updated and could shift, then yes, you will have a problem.
One way to solve this is to sort by _id and read the cursor in batches of, say, 1000 documents, collected into an array. After each batch, take the last _id in that batch and issue a range query for everything with an _id greater than it.
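The last-_id range pattern can be sketched in plain Python (a hypothetical stand-in, not driver code; `fetch_batch` plays the role of `find({_id: {$gt: last_id}}).sort({_id: 1}).limit(n)`):

```python
# Batch through a collection by ranging on _id instead of skipping:
# each batch starts strictly after the last _id of the previous one,
# so documents behind that point are never revisited even if they move.

def fetch_batch(collection, last_id, batch_size):
    """Stand-in for db.coll.find({_id: {$gt: last_id}})
                            .sort({_id: 1}).limit(batch_size)."""
    matching = sorted(d["_id"] for d in collection if d["_id"] > last_id)
    return [{"_id": i} for i in matching[:batch_size]]

collection = [{"_id": i} for i in range(1, 26)]

seen, last_id = [], 0
while True:
    batch = fetch_batch(collection, last_id, batch_size=10)
    if not batch:
        break
    seen.extend(d["_id"] for d in batch)
    last_id = batch[-1]["_id"]  # the next query ranges past this point
```

Because _id is immutable and the range filter is strictly greater-than, no document can be returned twice across batches.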
Another method could be to do snapshot queries: http://docs.mongodb.org/manual/reference/operator/snapshot/ however this function has quite a few limitations, for example it cannot be used with sharded collections.
I have a very large capped collection in mongodb. Given that the capped collection structure is predictable (i.e. sort is predefined, memory footprint is predefined, etc), is there a better way to get a cursor on the LATEST item inserted instead of iterating?
In other words, what I'm doing right now is to get the size of my collection (n), and then create a cursor that sets skip=n-1 to put me at the end of the collection. Then I iterate on the cursor and handle all new additions to the collection.
The problem with this approach is that my collection is huge, let's say 11 million records, and skipping takes 20 minutes. That means that by the time my cursor starts emitting data, it's 20 minutes behind.
Try db.cappedCollection.find().limit(1).sort({$natural:-1}) .
Have you tried indexing the collection and querying with $gt? This should be faster, although the index will have some impact on the speed of writes to the collection.
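The combination of the two answers above can be sketched in plain Python (a hypothetical stand-in, not MongoDB code: `latest` plays `find().sort({$natural:-1}).limit(1)` and `newer_than` plays a `$gt` query on _id):

```python
# Jump straight to the newest document in a capped collection, then
# poll for anything inserted after it, instead of skipping millions
# of records to reach the end.

def latest(collection):
    """Stand-in for db.capped.find().limit(1).sort({$natural: -1})."""
    return collection[-1] if collection else None

def newer_than(collection, last_id):
    """Stand-in for db.capped.find({_id: {$gt: last_id}})."""
    return [d for d in collection if d["_id"] > last_id]

capped = [{"_id": i} for i in range(1, 12)]

tail = latest(capped)        # start at the end, no skipping
capped.append({"_id": 12})   # a new insert arrives
fresh = newer_than(capped, tail["_id"])  # picks up only the new document
```

Starting from the latest document is constant-time, so the 20-minute skip disappears entirely.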