MongoDB limit operation: retrieve newest docs

MongoDB's find() returns docs in ascending order of "_id", so when I apply limit(n) to find(), it always returns the oldest n docs (assume that doc1's _id > doc2's _id implies doc1 is newer than doc2, as is the case with ObjectId). I want it to return the newest n docs, so I do:
col.find().sort({"_id":-1}).limit(n)
Is this inefficient? Will mongodb sort all docs in 'col'?

The _id field is essentially the "primary key" and therefore always has an index, so there is no actual "sort" of the whole collection; in this case the query just traverses that primary index in reverse order.
Provided that you are happy enough that this does reflect your "newest" documents, and in normal circumstances there is no reason to believe otherwise, then this will return what you want in an efficient manner.
If you instead want to sort by something else, such as a timestamp or another field, just create an index on that field and sort as you have above. Such queries can use that index as well and return results in "descending order", or in whatever direction your sort specifies, because a single-field index can be traversed in either direction.
db.collection.createIndex({ "created": 1 })
Or with "descending" as the index's own direction:
db.collection.createIndex({ "created": -1 })
Then query:
db.collection.find().sort({ "created": -1 })
So basically it does not "sort" the whole collection when an index is present to use. The _id key is always indexed.
Also see .createIndex() in the documentation (older shells call it .ensureIndex(), which is deprecated).
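To make the "traverses the index in reverse" point concrete, here is a minimal sketch in plain JavaScript. It is only a toy model, assuming an index behaves like a list of keys kept in ascending order; newestN is a hypothetical helper, not a MongoDB API.

```javascript
// Toy model (assumption): an index is an array of keys kept in ascending order.
// newestN is a hypothetical helper, not part of MongoDB.
function newestN(ascendingIndex, n) {
  const result = [];
  // Walk the index backward and stop after n entries; nothing gets sorted here.
  for (let i = ascendingIndex.length - 1; i >= 0 && result.length < n; i--) {
    result.push(ascendingIndex[i]);
  }
  return result;
}

console.log(newestN([1, 2, 3, 4, 5], 2));
```

The work done is proportional to n, not to the size of the collection. To check the real thing, col.find().sort({ "_id": -1 }).limit(n).explain() should report an IXSCAN stage on the _id index and no separate SORT stage.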

Related

Using object as _id in MongoDb causes collscan on queries

I'm having some issues with using a custom object as my _id value in MongoDb.
The objects I'm storing in _id look like this:
"_id" : {
"EDIEL" : "1010101010101",
"StartDateTicks" : NumberLong(636081120000000000)
}
Now, when I'm performing the following query:
.find({
"_id.EDIEL": { $eq: "1010101010101" },
"_id.StartDateTicks": { $gte: 636082776000000000, $lt: 636108696000000000 }
}).explain()
It does a COLLSCAN. I can't figure out why exactly. Is it because I'm not querying against the _id object with an object?
Does anyone know what I'm doing wrong here? :-)
Edit:
Tried to create a compound index containing the EDIEL and StartDateTicks fields, ran the query again, and now it uses the index instead of a collection scan. While this works, it would still be nice to avoid having the extra index and just use the _id (since it's basically a "free" index). So, the question still stands: why can't I query against _id.EDIEL and _id.StartDateTicks and make use of the index?
An index is keyed on the value of the indexed field as a whole. So when _id holds an object, the _id index stores each object as a single key and cannot serve the query you are running on individual fields inside that object.
This is true not only for _id but for any subdocument.
{
"name":"awesome book",
"detail" :{
"pages":375,
"alias" : "AB"
}
}
Now when you have an index on detail and you query by detail.pages or detail.alias, the index on detail cannot be used, and certainly not for range queries. You need indexes on detail.pages and detail.alias.
When an index is created on an object, it indexes the object as a whole and not per field; that's why queries on the object's fields cannot use it.
Hope that helps
You will need to index the two fields explicitly, since an index on an embedded document only covers the document as a whole. Your options are therefore a compound index on both fields, or separate indexes on each field, which MongoDB may combine through index intersection.
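Since the index orders whole objects, one thing that may be worth trying is a range over whole _id objects instead of over their fields. The sketch below is a toy model in plain JavaScript, assuming a simplified version of the documented BSON ordering for embedded documents (field by field in stored order, key names first, then values, same-typed values only). compareDocs and the sample ids are hypothetical, and BigInt literals stand in for NumberLong.

```javascript
// Toy comparator (assumption: simplified BSON document ordering, same-typed values only).
function compareDocs(a, b) {
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  for (let i = 0; i < Math.min(ka.length, kb.length); i++) {
    if (ka[i] !== kb[i]) return ka[i] < kb[i] ? -1 : 1; // key names first
    const va = a[ka[i]];
    const vb = b[kb[i]];
    if (va < vb) return -1; // then values
    if (va > vb) return 1;
  }
  return ka.length - kb.length; // a document with fewer fields sorts first
}

// Hypothetical _id values, kept in the order a whole-object index would hold them.
const ids = [
  { EDIEL: "1010101010101", StartDateTicks: 636081120000000000n },
  { EDIEL: "1010101010101", StartDateTicks: 636090000000000000n },
  { EDIEL: "1010101010101", StartDateTicks: 636108696000000000n },
  { EDIEL: "2020202020202", StartDateTicks: 636090000000000000n },
].sort(compareDocs);

// Whole-object bounds select one contiguous slice of that ordering.
const lower = { EDIEL: "1010101010101", StartDateTicks: 636082776000000000n };
const upper = { EDIEL: "1010101010101", StartDateTicks: 636108696000000000n };
const inRange = ids.filter(
  (id) => compareDocs(id, lower) >= 0 && compareDocs(id, upper) < 0
);
console.log(inRange);
```

In the shell, the corresponding query would put the two bound objects under "_id" with $gte and $lt, with the fields in exactly the order in which they are stored. Whether the planner then uses the _id index is an assumption on my part, so check it with .explain() before relying on it.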

MongoDB: Update field with size of embedded array

I have a collection of documents with an array (set in this case): my_array
I'm adding things to this set periodically
collection.find({ "_id": target_id })
.upsert().update({ '$addToSet':{ 'my_array': new_entry }})
Many of the logical operations I perform on this DB are based on this sub-array's size. So I've created a field (indexed) called len_of_array. The index is quite critical to my use case.
In the case where this is a true array and not a set, $inc would work beautifully in the same update.
However, since the sub-collection is a set, the length of the collection, my_array, may or may not have changed.
My current solution:
Call this periodically for each target_id, but this requires performing a find in order to get the correct len_of_array
collection.find({ '_id': target_id})
.upsert().update({ '$set':{ 'len_of_array': new_length }})
My Question
Is there a way to set a field of a document to the indexed size of a sub-array in the same document in a single update?
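You don't need the field len_of_array in order to query by the array's size. There is the $size operator, which matches arrays of an exact length and would save you the periodic update, too. Note that $size does not accept range expressions such as $gt, so for "more than n elements" you test whether the element at index n exists instead.
Let's say you want to find all documents for which the length of my_array is greater than 2:
db.coll.find({ "my_array.2": { "$exists": true } })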
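If you do want to keep the indexed len_of_array field, the add and the recount can be expressed as one update by passing an aggregation pipeline as the update document. This is a minimal sketch, assuming MongoDB 4.2 or later (where updates accept a pipeline); buildAddAndCount and applyLocally are hypothetical helper names, and applyLocally only models the result in plain JavaScript so the logic can be checked without a server.

```javascript
// Builds the update pipeline (assumption: MongoDB 4.2+, pipeline-style updates).
function buildAddAndCount(newEntry) {
  return [
    // Stage 1: set semantics like $addToSet; $ifNull covers a missing array on upsert.
    { $set: { my_array: { $setUnion: [{ $ifNull: ["$my_array", []] }, [newEntry]] } } },
    // Stage 2: sees the array produced by stage 1.
    { $set: { len_of_array: { $size: "$my_array" } } },
  ];
}

// Plain-JavaScript model of what the two stages compute for one document.
function applyLocally(doc, newEntry) {
  const current = doc.my_array ?? [];
  const my_array = current.includes(newEntry) ? current : [...current, newEntry];
  return { ...doc, my_array, len_of_array: my_array.length };
}

console.log(applyLocally({ _id: 1, my_array: ["a"] }, "b"));
console.log(applyLocally({ _id: 1, my_array: ["a"] }, "a"));
```

Usage would look like db.coll.updateOne({ "_id": target_id }, buildAddAndCount(new_entry), { upsert: true }). One design caveat: $setUnion does not guarantee the order of the resulting elements, which should not matter for a set but would for an ordered array.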

Does MongoDB find() query return documents sorted by creation time?

I need documents sorted by creation time (from oldest to newest).
Since ObjectID saves timestamp by default, we can use it to get documents sorted by creation time with CollectionName.find().sort({_id: 1}).
Also, I noticed that regular CollectionName.find() query always returns the documents in same order as CollectionName.find().sort({_id: 1}).
My question is:
Is CollectionName.find() guaranteed to return documents in same order as CollectionName.find().sort({_id: 1}) so I could leave sorting out?
No. Well, not exactly.
A db.collection.find() will give you the documents in the order they appear in the data files most of the time, though this isn't guaranteed.
Result Ordering
Unless you specify the sort() method or use the $near operator, MongoDB does not guarantee the order of query results.
As long as your data files are relatively new and few updates happen, the documents might (and most of the time will) be returned in what appears to be _id order, since ObjectId is monotonically increasing.
Later in the lifecycle, old documents may have been moved from their old position (because they grew in size, and a document is never split across several locations) and new ones are written in the place formerly occupied by another document. In this case, a newer document may be returned in a position between two old documents.
There is nothing wrong with sorting documents by _id, since the index will be used for that, adding only some latency for document retrieval.
However, I would strongly recommend against using the ObjectId for date operations for several reasons:
ObjectIds cannot be compared against dates directly. To query for all documents created between date x and date y, you would either have to build boundary ObjectIds from the two dates by hand, or load all documents, extract the date from each ObjectId and compare it, which is extremely inefficient.
If the creation date matters, it should be explicitly addressable in the documents
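For completeness, a range on creation time can be expressed against _id if you build the boundary ObjectIds yourself. The following is a minimal sketch in plain JavaScript, assuming the standard 12-byte ObjectId layout in which the first 4 bytes hold the creation time in seconds since the epoch; both helper names are hypothetical.

```javascript
// Assumption: standard ObjectId layout, first 4 bytes = creation time in seconds.

// Smallest possible ObjectId hex string for a given instant (hypothetical helper).
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Creation time encoded in an ObjectId hex string (hypothetical helper).
function dateFromObjectIdHex(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

const from = objectIdHexFromDate(new Date("2015-06-17T00:00:00Z"));
const to = objectIdHexFromDate(new Date("2015-06-18T00:00:00Z"));
console.log(from, to);
console.log(dateFromObjectIdHex("55814b944e019172b7d358a0").toISOString());
```

In the shell the bounds would then be used as db.collection.find({ "_id": { $gte: ObjectId(from), $lt: ObjectId(to) } }). The resolution is one second, and the query is clearly less readable than a range on an explicit date field, which supports the point about making the creation date addressable.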
I see ObjectIds as a choice of last resort for the _id field and tend to use other values (compound on occasions) as _ids, since the field is indexed by default and it is very likely that one can save precious RAM by using a more meaningful value as id.
You could use the following for example which utilizes DBRefs
{
_id: {
creationDate: new ISODate(),
user: {
"$ref" : "creators",
"$id" : "mwmahlberg",
"$db" : "users"
}
}
}
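And sort by creation date using the following (note that this is only cheap if "_id.creationDate" is itself indexed, since the default _id index covers the object as a whole):
db.collection.find().sort({ "_id.creationDate": 1 })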
Is CollectionName.find() guaranteed to return documents in same order as CollectionName.find().sort({_id: 1})
No, it's not! If you didn't specify any order, then a so-called "natural" ordering is used. Meaning that documents will be returned in the order in which they physically appear in data files.
Now, if you only insert documents and never modify them, this natural order will coincide with ascending _id order. Imagine, however, that you update a document in such a way that it grows in size and has to be moved to a free slot inside of a data file (usually this means somewhere at the end of the file). If you were to query documents now, they wouldn't follow any sensible (to an external observer) order.
So, if you care about order, make it explicit.
Source: http://docs.mongodb.org/manual/reference/glossary/#term-natural-order
natural order
The order in which the database refers to documents on disk. This is the default sort order. See $natural and Return in Natural Order.
Testing script (for the confused)
> db.foo.insert({name: 'Joe'})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({name: 'Bob'})
WriteResult({ "nInserted" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe" }
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
> db.foo.update({_id: ObjectId("55814b944e019172b7d358a0")}, {$set: {answer: "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression."}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe", "answer" : "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression." }

In MongoDB, is db.collection.find() same as db.collection.find().sort({$natural:1})?

I'm sure this is an easy one, but I just wanted to make sure. Is find() with some search and projection criterion same as applying a sort({$natural:1}) on it?
Also, what is the default natural sort order? How is it different from a sort({_id:1}), say?
db.collection.find() returns the same result as db.collection.find().sort({$natural:1}).
{"$natural" : 1} forces the find query to do a table scan (the default order); when used in a sort, it specifies on-disk order.
When you update a document, mongo may move it to another place on disk.
For example, insert documents as below
{
_id : 0,
},
{
_id : 1,
}
then update:
db.collection.update({ _id : 0} , { $set : { blob : BIG DATA}})
And when you perform the find query you will get
{
"_id" : 1
},
{
"_id" : 0,
"blob" : BIG DATA
}
as you see the order of documents has changed => the default order is not by _id
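The same effect can be reproduced without a server. This is a toy model in plain JavaScript, assuming storage behaves like an array of slots and that a document which outgrows its slot is relocated to the end, as described above; growAndRelocate is a hypothetical helper.

```javascript
// Toy model (assumption): storage is an array of slots in on-disk order.
const storage = [{ _id: 0 }, { _id: 1 }];

// Applies an update that makes the document grow, then relocates it to the end.
function growAndRelocate(slots, id, patch) {
  const i = slots.findIndex((d) => d._id === id);
  const [doc] = slots.splice(i, 1);
  slots.push({ ...doc, ...patch });
}

growAndRelocate(storage, 0, { blob: "BIG DATA" });

// Natural order follows the slots; an explicit sort restores _id order.
const naturalOrder = storage.map((d) => d._id);
const sortedById = [...storage].sort((a, b) => a._id - b._id).map((d) => d._id);
console.log(naturalOrder, sortedById);
```

The natural order now starts with _id 1, while the explicit sort still yields ascending _id, which is why an explicit sort is the only ordering to rely on.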
If you don't specify a sort, then mongodb find() will return documents in the order they are stored on disk. Document storage on disk may coincide with insertion order, but that's not always going to be true. It is also worth noting that the location of a document on disk may change. For instance, in case of an update, mongodb may move a document from one place to another if needed.
If the query uses an index, the default order will be the order in which the index entries are traversed.
$natural is the order in which documents are found on disk.
It is recommended that you specify the sort explicitly to be sure of the ordering.

MongoDB multikeys on _id + some value

In MongoDB I have a query which looks like this to find out for which comments the user has already voted:
db.comments.find({
_id: { $in: [...some ids...] },
"votes.uid": "4fe1d64d85d4f4c00d000002"
});
As the documentation says you should have
One index per query
So what's better: creating a compound (multi-key) index on _id + votes.uid, or is it enough to just index votes.uid because Mongo handles _id automatically anyway?
There is automatically an index on _id.
Depending on your queries (how many ids you have in the $in array) and your data (how many votes you have on one object), you may want to create an index on votes.uid.
Check which index is actually used during query execution (with .explain()), and remember you can force Mongo to use the index you want by adding .hint({ field: 1 }) or .hint("indexname").