Does MongoDB find() query return documents sorted by creation time? - mongodb

I need documents sorted by creation time (from oldest to newest).
Since ObjectID saves timestamp by default, we can use it to get documents sorted by creation time with CollectionName.find().sort({_id: 1}).
Also, I noticed that a regular CollectionName.find() query always returns the documents in the same order as CollectionName.find().sort({_id: 1}).
My question is:
Is CollectionName.find() guaranteed to return documents in the same order as CollectionName.find().sort({_id: 1}), so that I could leave the sorting out?

No. Well, not exactly.
A db.collection.find() will usually give you the documents in the order they appear in the data files, though this isn't guaranteed.
Result Ordering
Unless you specify the sort() method or use the $near operator, MongoDB does not guarantee the order of query results.
As long as your data files are relatively new and few updates happen, the documents might (and most of the time will) be returned in what appears to be _id order, since ObjectId values generally increase over time.
Later in the lifecycle, old documents may have been moved from their old position (because they increased in size and documents are never partitioned) and new ones are written in the place formerly occupied by another document. In this case, a newer document may be returned in a position between two old documents.
There is nothing wrong with sorting documents by _id, since the index will be used for that, adding only some latency for document retrieval.
However, I would strongly recommend against using the ObjectId for date operations for several reasons:
ObjectIds cannot be used for date comparison queries. So you couldn't query for all documents created between date x and date y. To achieve that, you'd have to load all documents, extract the date from the ObjectId and compare it, which is extremely inefficient.
If the creation date matters, it should be explicitly addressable in the documents.
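For reference, the date embedded in an ObjectId is its first four bytes, read as seconds since the Unix epoch. Here is a minimal sketch of the "extract the date" step in plain JavaScript. The helper name objectIdToDate is made up for illustration, and the two hex strings are the ones from the testing script further down.

```javascript
// Hypothetical helper: decode the timestamp embedded in an ObjectId hex string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const joeCreated = objectIdToDate("55814b944e019172b7d358a0");
const bobCreated = objectIdToDate("55814ba44e019172b7d358a1");

console.log(joeCreated.toISOString());
console.log(bobCreated.toISOString());
// Difference between the two embedded timestamps, in seconds.
console.log((bobCreated - joeCreated) / 1000);
```

Having to decode the value like this is exactly why an explicit creation date field is easier to work with.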
I see ObjectIds as a choice of last resort for the _id field and tend to use other values (compound on occasions) as _ids, since the field is indexed by default and it is very likely that one can save precious RAM by using a more meaningful value as id.
You could, for example, use the following, which utilizes DBRefs:
{
_id: {
creationDate: new ISODate(),
user: {
"$ref" : "creators",
"$id" : "mwmahlberg",
"$db" : "users"
}
}
}
And do a quite cheap sort by using
db.collection.find().sort({"_id.creationDate": 1})

Is CollectionName.find() guaranteed to return documents in same order as CollectionName.find().sort({_id: 1})
No, it's not! If you didn't specify any order, then a so-called "natural" ordering is used. Meaning that documents will be returned in the order in which they physically appear in data files.
Now, if you only insert documents and never modify them, this natural order will coincide with ascending _id order. Imagine, however, that you update a document in such a way that it grows in size and has to be moved to a free slot inside of a data file (usually this means somewhere at the end of the file). If you were to query documents now, they wouldn't follow any sensible (to an external observer) order.
So, if you care about order, make it explicit.
Source: http://docs.mongodb.org/manual/reference/glossary/#term-natural-order
natural order
The order in which the database refers to documents on disk. This is the default sort order. See $natural and Return in Natural Order.
Testing script (for the confused)
> db.foo.insert({name: 'Joe'})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({name: 'Bob'})
WriteResult({ "nInserted" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe" }
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
> db.foo.update({_id: ObjectId("55814b944e019172b7d358a0")}, {$set: {answer: "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression."}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe", "answer" : "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression." }

Related

How to eliminate Query Targeting: Scanned Objects / Returned has gone above 1000 in MongoDB?

Some questions (1, 2) discuss the MongoDB warning Query Targeting: Scanned Objects / Returned has gone above 1000; however, my question is a different case.
The schema of our document is
{
"_id" : ObjectId("abc"),
"key" : "key_1",
"val" : "1",
"created_at" : ISODate("2021-09-25T07:38:04.985Z"),
"a_has_sent" : false,
"b_has_sent" : false,
"updated_at" : ISODate("2021-09-25T07:38:04.985Z")
}
The indexes of this collection are
{
"key" : {
"updated_at" : 1
},
"name" : "updated_at_1",
"expireAfterSeconds" : 5184000,
"background" : true
},
{
"key" : {
"updated_at" : 1,
"a_has_sent" : 1,
"b_has_sent" : 1
},
"name" : "updated_at_1_a_has_sent_1_b_has_sent_1",
"background" : true
}
The total number of documents after 2021-09-24 is over 600000, and key has 5 distinct values.
The above warning is caused by the query
db.collectionname.find({ "updated_at": { "$gte": ISODate("2021-09-24")}, "$or": [{ "a_has_sent": false }, {"b_has_sent": false}], "key": "key_1"})
Our server sends each document to a and b simultaneously, with a batch size of 2000. After a document is sent to a successfully, a_has_sent is set to true. The same logic applies to b. As the sending process goes on, the number of documents with a_has_sent: false decreases, and the above warning comes up.
After checking the explain result of this query, the index named updated_at_1 is used rather than updated_at_1_a_has_sent_1_b_has_sent_1.
What we have tried:
We added another index {"updated_at": 1, "key": 1}, expecting the query to use it and scan fewer documents. Unfortunately, that failed: the index named updated_at_1 is still used.
We tried to replace find with aggregate:
aggregate([{"$match": { "updated_at": { "$gte": ISODate("2021-09-24") }, "$or": [{ "a_has_sent": false }, { "b_has_sent": false}], "key": "key_1"}}]). Unfortunately, the index named updated_at_1 is still used.
We want to know how to eliminate the warning Scanned Objects / Returned has gone above 1000.
Mongo 4.0 is used in our case.
Follow the ESR rule
For compound indexes, this rule of thumb is helpful in deciding the order of fields in the index:
First, add those fields against which Equality queries are run.
The next fields to be indexed should reflect the Sort order of the query.
The last fields represent the Range of data to be accessed.
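As a rough illustration of the rule above, the field order of a compound index can be assembled mechanically from the three groups. This is only a sketch: the helper name esrIndexKeys is made up, and the field names are the ones from the question's schema.

```javascript
// Hypothetical helper: order index fields as Equality, then Sort, then Range.
function esrIndexKeys({ equality = [], sort = [], range = [] }) {
  const keys = {};
  // 1. Fields matched by equality come first.
  for (const field of equality) keys[field] = 1;
  // 2. Sort fields follow, keeping their requested direction (1 or -1).
  for (const [field, direction] of sort) {
    if (!(field in keys)) keys[field] = direction;
  }
  // 3. Range fields go last.
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// The query in the question matches "key" by equality and "updated_at" by range.
const esrKeys = esrIndexKeys({
  equality: ["key"],
  sort: [],
  range: ["updated_at"],
});

console.log(JSON.stringify(esrKeys)); // {"key":1,"updated_at":1}
```

The resulting object has the shape you would pass to createIndex in the shell.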
We created the index {"action_key" : 1,"adjust_sent" : 1,"facebook_sent" : 1,"updated_at" : 1}, and the query can now use this index.
Update 08/15/2022
Query Targeting alerts indicate inefficient queries.
Query Targeting: Scanned Objects / Returned occurs if the number of documents examined to fulfill a query relative to the actual number of returned documents meets or exceeds a user-defined threshold. The default is 1000, which means that a query must scan more than 1000 documents for each document returned to trigger the alert.
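To make the ratio concrete, here is a small sketch of the arithmetic using the figures from the log entry quoted further down (10,000 documents examined, 4 returned). The function name scannedPerReturned is made up for illustration; it is not an Atlas or MongoDB API.

```javascript
// Hypothetical helper mirroring the "scanned objects / returned" ratio.
function scannedPerReturned(docsExamined, nReturned) {
  // Avoid dividing by zero when a query returns nothing.
  if (nReturned === 0) return docsExamined === 0 ? 0 : Infinity;
  return docsExamined / nReturned;
}

const ALERT_THRESHOLD = 1000; // the default threshold mentioned above

const logRatio = scannedPerReturned(10000, 4);
console.log(logRatio); // 2500
console.log(logRatio > ALERT_THRESHOLD); // true
```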
Here are some steps to solve this issue
First, the Performance Advisor provides the easiest and quickest way to create an index. If it suggests an index, you can create the recommended index.
Then, if there is no recommended index in the Performance Advisor, you can check the Query Profiler. The Query Profiler contains several metrics you can use to pinpoint specific inefficient queries. It can show the Examined : Returned Ratio (index keys examined to documents returned) of logged queries, which might help you identify the queries that triggered a Query Targeting: Scanned / Returned alert. The chart shows the number of index keys examined to fulfill a query relative to the actual number of returned documents.
You can use the following resources to determine which query generated the alert:
The Real-Time Performance Panel monitors and displays current network traffic and database operations on machines hosting MongoDB in your Atlas clusters.
The MongoDB logs maintain an account of activity, including queries, for each mongod instance in your Atlas clusters.
The following mongod log entry shows statistics generated from an inefficient query:
<Timestamp> COMMAND <query>
planSummary: COLLSCAN keysExamined:0
docsExamined: 10000 cursorExhausted:1 numYields:234
nreturned:4 protocol:op_query 358ms
This query scanned 10,000 documents and returned only 4 for a ratio of 2500, which is highly inefficient. No index keys were examined, so MongoDB scanned all documents in the collection, known as a collection scan.
The cursor.explain() command for mongosh provides performance details for all queries.
The Data Profiler records operations that Atlas considers slow when compared to average execution time for all operations on your cluster.
Note - Enabling the Database Profiler incurs a performance overhead.
MongoDB cannot use a single index to process an $or that looks at different field values.
The index on
{
"updated_at" : 1,
"a_has_sent" : 1,
"b_has_sent" : 1
}
can be used with the $or expression to match either a_has_sent or b_has_sent.
To minimize the number of documents examined, create 2 indexes, one for each branch of the $or, combined with the enclosing $and (the filter implicitly combines the top-level query predicates with and). Such as:
{
"updated_at" : 1,
"a_has_sent" : 1
}
and
{
"updated_at" : 1,
"b_has_sent" : 1
}
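One way to see why each branch wants its own index is to write out the filter that each $or branch effectively has to satisfy once the shared top-level predicates are merged in. The sketch below does that in plain JavaScript. The helper name orBranches is made up, and this only illustrates the reasoning; it is not how the query planner is implemented.

```javascript
// Hypothetical helper: merge the shared top-level predicates into each $or branch.
function orBranches(filter) {
  const { $or: branches = [], ...shared } = filter;
  return branches.map((branch) => ({ ...shared, ...branch }));
}

// The filter from the question.
const questionFilter = {
  updated_at: { $gte: new Date("2021-09-24") },
  $or: [{ a_has_sent: false }, { b_has_sent: false }],
  key: "key_1",
};

const expanded = orBranches(questionFilter);
for (const branch of expanded) {
  // Fields that an index serving this branch would need to cover.
  console.log(Object.keys(branch));
}
```

Each printed list of fields corresponds to one candidate index, one per branch.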
Also note that the alert for Query Targeting: Scanned Objects / Returned has gone above 1000 does not refer to a single query.
The MongoDB server keeps a counter (64-bit?) that tracks the number of documents examined since the server was started, and another counter for the number of documents returned.
The scanned-per-returned ratio is derived by simply dividing the examined counter by the returned counter.
This means that if you have something like a count query that requires examining documents, you may have hundreds or thousands of documents examined, but only 1 returned. It won't take many of these kinds of queries to push the ratio over the 1000 alert limit.
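A quick sketch of that effect, with a made-up workload (the numbers are purely illustrative, not measurements): one hundred well-targeted queries plus just two count-like queries are enough to push the combined ratio over 1000.

```javascript
// Made-up workload: each entry is one operation's examined/returned counts.
const operations = [
  // 100 well-indexed finds: every examined document is returned.
  ...Array.from({ length: 100 }, () => ({ examined: 10, returned: 10 })),
  // Two count-like queries: many documents examined, a single result each.
  { examined: 600000, returned: 1 },
  { examined: 600000, returned: 1 },
];

// Accumulate the two counters the way a running total would.
const totals = operations.reduce(
  (acc, op) => ({
    examined: acc.examined + op.examined,
    returned: acc.returned + op.returned,
  }),
  { examined: 0, returned: 0 }
);

const overallRatio = totals.examined / totals.returned;
console.log(totals);
console.log(overallRatio > 1000); // true
```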

Order of Fields in Mongo Query vs Ordered Checked In

Say you're querying documents based on 2 data points. One is a simple bool parameter, and the other is a complicated $geoWithin calculation.
db.collection.find( {"geoField": { "$geoWithin" : ...}, "boolField" : true} )
Will mongo reorder these parameters, so that it checks the boolField 1st, before running the complicated check?
MongoDB uses indexes like any other database. So what matters to MongoDB is whether any of the query fields has an index, not the order of the query fields. At least, nothing in the documentation says that MongoDB tries to check primitive query fields first. So in your example, if boolField has an index, MongoDB first checks this field and eliminates documents whose boolField is false. But if geoField has an index, MongoDB first executes the query on this field.
So what happens if neither of them has an index, or both do? It should be the given order of fields in the query, because the query optimization page of the MongoDB documentation gives no suggestion or information besides indexes. Additionally, you can always test the performance of your queries by adding .explain("executionStats").
So check the performance of db.collection.find( {"geoField": { "$geoWithin" : ...}, "boolField" : true} ) and db.collection.find( { "boolField" : true, "geoField": { "$geoWithin" : ...} } ). And let us know :)
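When comparing the two runs, the interesting numbers sit in the executionStats section of the explain output. Below is a small sketch that pulls them out. The helper name summarize is made up, and the sample object is invented just to show the shape; it is not real measured output.

```javascript
// Hypothetical helper: pick the key figures out of an explain("executionStats") result.
function summarize(explainResult) {
  const stats = explainResult.executionStats;
  return {
    returned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// Invented sample with the same field names as real explain output.
const sampleExplain = {
  executionStats: {
    nReturned: 4,
    executionTimeMillis: 12,
    totalKeysExamined: 0,
    totalDocsExamined: 10000,
  },
};

const summary = summarize(sampleExplain);
console.log(summary);
```

In the shell you would pass in the result of .explain("executionStats") for each of the two field orderings and compare the summaries side by side.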
To add to the above response: if you want mongo to use a specific index, you can use cursor.hint. https://docs.mongodb.com/manual/core/query-plans/ explains how default index selection is done.

In MongoDB, is db.collection.find() same as db.collection.find().sort({$natural:1})?

I'm sure this is an easy one, but I just wanted to make sure. Is find() with some search and projection criterion the same as applying a sort({$natural:1}) on it?
Also, what is the default natural sort order? How is it different from a sort({_id:1}), say?
db.collection.find() returns the same result as db.collection.find().sort({$natural:1}).
{"$natural" : 1} forces the find query to do a collection scan (the default order); when used in a sort, it specifies on-disk order.
When you update a document, mongo may move it to another place on disk.
For example, insert the documents below:
{
_id : 0,
},
{
_id : 1,
}
then update:
db.collection.update({ _id : 0} , { $set : { blob : BIG DATA}})
And when you perform the find query you will get
{
"_id" : 1
},
{
"_id" : 0,
"blob" : BIG DATA
}
As you see, the order of documents has changed, so the default order is not by _id.
If you don't specify a sort, then mongodb find() will return documents in the order they are stored on disk. Document storage on disk may coincide with insertion order, but that's not always going to be true. It is also worth noting that the location of a document on disk may change. For instance, in case of an update, mongodb may move a document from one place to another if needed.
If the query uses an index, the default order will be the order in which the index entries are found.
The $natural order is the order in which documents are found on disk.
It is recommended that you specify the sort explicitly to be sure of the sorting order.

Mongodb set _id as decreasing index

I want to use mongodb's default _id but in decreasing order. I want to store posts and want to get the latest posts at the start when I use find(). I am using mongoose. I tried with
postSchema.index({_id:-1})
but it didn't work
> db.posts.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mean-dev.posts"
}]
I dropped the database and restarted mongod. No luck with that.
Is there any way to set _id as a decreasing index while still using mongodb's default index? I don't want to use sort() to sort the result by _id in decreasing order.
Short answer
You cannot create a descending index on the _id field. You also don't need it. Mongo will use the existing default index when doing a descending sort on the _id field.
Long answer
As stated in the documentation, the MongoDB index on the _id field is created automatically as an ascending unique index, and you can't remove it.
You also don't need to create an additional descending index on the _id field, because MongoDB can use the default index for sorting.
To verify that MongoDB is using the index for your sort, you can use the explain command:
db.coll.find().sort({_id : -1}).explain();
In the output of the explain command, the relevant part is
"cursor" : "BtreeCursor _id_ reverse"
which means that MongoDB is using the index in reverse order to sort your query.
Actually, you can use this index: just put .sort({"_id":-1}) at the end of your query.
ObjectId values do not represent a strict insertion order.
From documentation: http://docs.mongodb.org/manual/reference/object-id/
IMPORTANT The relationship between the order of ObjectId values and
generation time is not strict within a single second. If multiple
systems, or multiple processes or threads on a single system generate
values, within a single second; ObjectId values do not represent a
strict insertion order. Clock skew between clients can also result in
non-strict ordering even for values, because client drivers generate
ObjectId values, not the mongod process.
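A small sketch of what that note means in practice. The two ids below are made up: they share the same 4-byte timestamp (the same second) but carry different client-specific bytes, as they would if two clients generated them. ObjectIds compare byte by byte, which for equal-length lowercase hex strings matches plain string order.

```javascript
// Made-up ids: same timestamp prefix (55814b94), different client-specific bytes.
const insertedFirst = "55814b94ffffff0000000001"; // generated by hypothetical client A
const insertedSecond = "55814b9400000a0000000001"; // generated by hypothetical client B

// Sorting by _id ascending is equivalent to sorting these hex strings.
const byId = [insertedFirst, insertedSecond].sort();

// The document inserted second sorts first, so _id order != insertion order here.
console.log(byId[0] === insertedSecond); // true
```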

Sort collection permanently in Mongodb

Whenever we do db.Collection.find().sort(), only our output is sorted, not the collection itself,
i.e. if I do db.collection.find() then I see the original collection, not the sorted one.
Is there any way to sort the collection itself instead of just sorting the output?
Exporting the sorted result into an entirely new collection would also work.
I have a numbered _id field (like _id:1, _id:2, _id:3 and so on).
Although I do not see any reason for doing this (an index on the field you are going to sort on will make the sort fast), here is a solution for your problem:
You have your test collection this way
{ "_id" : ObjectId("5273f6987c6c502364ddfe94"), "n" : 5 }
{ "_id" : ObjectId("5273f6e57c6c502364ddfe95"), "n" : 14}
{ "_id" : ObjectId("5273f6ee7c6c502364ddfe96"), "n" : -5}
Then the following command will create a sorted collection for you
db.test.find().sort({n : 1}).forEach(function(e){
db.testSorted.insert(e);
})
You can achieve exactly the same with this (which I assume might perform faster, but I have not done any testing):
db.testSorted.insert(db.test.find().sort({n : 1}).toArray());
And just to make this answer complete, although I understand that this is overkill, you can do this with the aggregation framework's $out stage.
Just to highlight: with all this you can solve a bigger problem: saving some sort of modification/subset of a collection into another collection.
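The $out variant was not spelled out above, so here is a minimal sketch of what such a pipeline could look like, assuming the same test collection and testSorted target as in the earlier commands. The pipeline is built as a plain JavaScript array so it can be inspected on its own; the shell call is only shown in a comment.

```javascript
// Sort by n ascending, then write the result to a collection named testSorted.
// $out has to be the last stage of the pipeline.
const sortedCopyPipeline = [
  { $sort: { n: 1 } },
  { $out: "testSorted" },
];

// In the mongo shell this would be run as: db.test.aggregate(sortedCopyPipeline)
console.log(JSON.stringify(sortedCopyPipeline));
```

As with the other approaches, reading testSorted back without an explicit sort() still comes with no ordering guarantee.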
Documents in a collection are stored in natural order which is affected by document moves (when the document grows larger than the current record space allocated) and deletions (free space can be reused for inserted/moved documents). There is currently (as at MongoDB 2.4) no option to control the order of documents on disk aside from using a capped collection, which is a fixed-size collection that maintains insertion order but is subject to a number of restrictions.
An index is the appropriate way to efficiently return documents in an expected sort order. For more information see: Using Indexes to Sort Query Results in the MongoDB manual.
A related feature is a clustered index, which would store documents on disk to match an index ordering. This is not a current feature of MongoDB, although it has been requested (see SERVER-3294).