Why is the aggregation framework slower than a simple find query? - mongodb

I am new to MongoDB and came across some strange behaviour in the aggregation framework.
I have a collection named 'billingData' that holds approximately 2M documents.
I am comparing two queries that give me the same output but have very different execution times.
Query 1:
db.billingData.find().sort({"_id":-1}).skip(100000).limit(50)
Execution Plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 50,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 100000,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "ip-172-60-62-125",
"port" : 27017,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1.0
}
Query 2:
db.billingData.aggregate([
{$sort : {"_id":-1}},
{$skip:100000},
{$limit:50}
])
Execution Plan:
{
"stages" : [
{
"$cursor" : {
"query" : {},
"sort" : {
"_id" : -1
},
"limit" : NumberLong(100050),
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$skip" : NumberLong(100000)
}
],
"ok" : 1.0
}
I was expecting the same performance from the aggregation framework and the find query, but the find query returned results in 2 seconds while the aggregation took 16 seconds.
In both queries I am sorting the documents in descending order (by _id) and fetching 50 records after skipping 100,000 records.
Can someone explain why the aggregation framework behaves this way?
What can I do to make its performance similar to that of the find query?
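One detail worth noting in the second plan: the $limit of 50 no longer appears in the pipeline, and the $cursor stage carries a limit of 100050 instead, followed by $skip: 100000. That matches the documented $skip + $limit sequence optimization, where the limit is moved ahead of the skip and grows by the skip amount. A minimal sketch of that arithmetic follows; coalesceSkipLimit is a hypothetical helper written for illustration, not a MongoDB API.

```javascript
// Hypothetical helper (not part of MongoDB): models how a
// [$sort, $skip, $limit] pipeline is rewritten so that the limit
// can be applied before the skip.
function coalesceSkipLimit(skip, limit) {
  return {
    pushedDownLimit: skip + limit, // limit applied first, enlarged by the skip
    remainingSkip: skip            // skip still applied afterwards
  };
}

// Values from Query 2: {$skip: 100000}, {$limit: 50}
var rewritten = coalesceSkipLimit(100000, 50);
// rewritten.pushedDownLimit is 100050, the "limit" shown in the $cursor stage
// rewritten.remainingSkip is 100000, the $skip left in the "stages" array
```

So in the aggregation the cursor stage hands 100,050 fetched documents to the pipeline, and the skip is applied there, whereas in the find plan the SKIP and LIMIT stages sit directly on top of FETCH inside the query executor.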

Related

Explain why results from mongo are being returned in reverse ObjectId order?

I have a list of news article items which I am tagging with entities and topic tags.
My query:
db["fmetadata"].find({'$and': [{'$text': {'$search': 'apple trump'}}, {'$or':
[{'entities': {'$elemMatch': {'$regex': 'apple|trump'}}}, {'tags': {'$elemMatch': {'$regex': 'apple|trump'}}}]}]}).explain()
Query plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "dfabric.fmetadata",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
{
"$text" : {
"$search" : "apple trump",
"$language" : "english",
"$caseSensitive" : false,
"$diacriticSensitive" : false
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
"inputStage" : {
"stage" : "TEXT",
"indexPrefix" : {
},
"indexName" : "title_text_tags_text_entities_text",
"parsedTextQuery" : {
"terms" : [
"appl",
"trump"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"textIndexVersion" : 3,
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
}
]
}
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "fabric-dev",
"port" : 27017,
"version" : "4.0.2",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
I see that
["queryPlanner"]["winningPlan"]["inputStage"]["inputStage"]["inputStage"]["inputStage"]["inputStages"]
"stage": "IXSCAN"
"direction": "backward"
Can someone please explain why?
I was developing a pagination cursor using the > lastId and limit technique. But since results are being returned backwards, I have to use < lastId, which seems counterintuitive.
If I don't sort my results in the natural order, is it guaranteed that they will always be returned in reverse order?
Edit: as mentioned in the comment below
My objective here is to get the intuition as to why the index was scanned backwards: is it the way I formulated my query, or something else entirely? The ordering itself (forwards or backwards) doesn't matter as much as its consistency; it should always be one or the other.
I came across this question on Stack Overflow, and I believe the accepted answer, together with the comments below it, gives me the intuition I was looking for:
How does MongoDB sort records when no sort order is specified?
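Since MongoDB does not guarantee any particular order when no sort is specified, the > lastId technique mentioned above is normally paired with an explicit sort on _id so that the direction is always the same. A minimal sketch of that pairing follows; buildPageQuery is a hypothetical helper (not a driver or shell API), and the fmetadata collection name in the usage comment is taken from the question.

```javascript
// Hypothetical helper: builds the pieces of a keyset-pagination query.
// baseFilter is the user's own filter, lastId is the _id of the last
// document on the previous page (null for the first page).
function buildPageQuery(baseFilter, lastId, pageSize) {
  var filter = lastId === null
    ? baseFilter
    : { $and: [baseFilter, { _id: { $gt: lastId } }] };
  // The explicit ascending sort is what makes "> lastId" the right comparison.
  return { filter: filter, sort: { _id: 1 }, limit: pageSize };
}

// Usage in the mongo shell (assumed, shown as comments only):
// var q = buildPageQuery({ $text: { $search: "apple trump" } }, lastId, 20);
// db.fmetadata.find(q.filter).sort(q.sort).limit(q.limit);
```

With the sort made explicit, the scan direction reported by explain() becomes an implementation detail rather than something the pagination logic depends on.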

Why does Mongo FETCH on count() with $nin?

I am trying to understand why Mongo can't use a covered index with my query using $nin, and how to resolve it. My issue is with a compound index, but it happens with a simple index too.
Take a simple document:
{b: "text1"}
And a simple index:
{
"v" : 1,
"key" : {
"b" : 1
},
"name" : "b_1",
"ns" : "mytest"
}
And what I thought was a simple count() query:
db.mytest.count( {b: {$nin: [ "foo" ]}}, {b:1, _id:0} )
The winningPlan unexpectedly includes a FETCH:
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"b" : 1
},
"indexName" : "b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[MinKey, \"foo\")",
"(\"foo\", MaxKey]"
]
}
}
}
}
But with a simple equality condition it uses COUNT_SCAN (as expected):
> db.mytest.count( {b: "bar" }, {b:1, _id:0} )
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "COUNT_SCAN",
"keyPattern" : {
"b" : 1
},
"indexName" : "b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1
}
},
To make things more interesting, a find() instead of a count() doesn't look at any documents:
> db.mytest.find({b:{ $nin: [ 3 ] }}, {b:1, _id:0})
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"b" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"b" : 1
},
"indexName" : "b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[MinKey, 3.0)",
"(3.0, MaxKey]"
]
}
}
}
Why does Mongo need to FETCH with $nin? It should be able to fulfill this exclusively from the index.
So it appears that this is a bug that was fixed in 3.6. There was definitely an unnecessary FETCH in many COUNT situations.

explain shows a fast execution time, but running the query never returns

I have a query that seems to never return.
When I run explain on that query, it reports an executionStats.executionTimeMillis of 27ms and shows that the initial input stage is an IXSCAN that should return only 4 objects.
I've confirmed that running the input-stage query on its own returns only 4 results.
This is my query:
{"$or":[
{"field1.key":{"$in":["name1","name2",/^prefix.*suffix$/]},"field2.key":"foobar"},
{"field1.key":{"$in":["name1","name2",/^prefix.*suffix$/]},"field3.key":"foobar"}
]}
This is the explain({ verbose : "executionStats" }) output (sorry for the long paste):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE",
"shards" : [
{
"shardName" : "...",
"plannerVersion" : 1,
"indexFilterSet" : false,
"parsedQuery" : { ... },
"winningPlan" : {
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "FETCH",
"filter" : {"field1.key":{"$in":["name1","name2",/^prefix.*suffix$/]}},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : { "field3.key" : 1.0 },
"indexName" : "field3.key_1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"field3.key" : [ "[\"foobar\", \"foobar\"]" ]
}
}
},
{
"stage" : "FETCH",
"filter" : {"field1.key":{"$in":["name1","name2",/^prefix.*suffix$/]}},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : { "field2.key" : 1.0 },
"indexName" : "field2.key_1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"field2.key" : [ "[\"foobar\", \"foobar\"]" ]
}
}
}
]
}
},
"rejectedPlans" : []
},
...
// same plan for the 3 other shards
...
]
}
},
"executionStats" : {
"nReturned" : 0,
"executionTimeMillis" : 27,
"totalKeysExamined" : 4,
"totalDocsExamined" : 4,
"executionStages" : {
"stage" : "SHARD_MERGE",
"nReturned" : 0,
"executionTimeMillis" : 27,
"totalKeysExamined" : 4,
"totalDocsExamined" : 4,
"totalChildMillis" : NumberLong(63),
...
// execution times for each shard
...
},
"allPlansExecution" : []
},
"ok" : 1.0
}
UPDATE
It seems that although explain reports using the "field2.key" index for the first part of the $or and the "field3.key" index for the second part, db.currentOp().inprog shows the following while the query is actually running:
"planSummary": "IXSCAN { field1.key: 1.0 }, IXSCAN { field3.key: 1.0 }"
So it selected the wrong index for one of the $or parts, which makes the query scan a huge number of documents.
Any idea why explain gets the indexes right, but the query itself doesn't?
How can we hint Mongo to use the correct indexes when using $or?

MongoDB use index with $nin seems not to work in combination with $regex

It seems that an index in my MongoDB is not being used correctly.
I have created these 3 indexes:
{
_id: 1
}
{
isbn: 1
}
{
_id: 1,
isbn: 1
}
When querying by isbn or by _id, it works perfectly. The same goes for querying by isbn and _id together. For example:
db.getCollection('books').find({
isbn: {
$regex: '^978048627.*'
},
_id: 'vGXejKQH5kw8Kfutk'
})
needs around 3ms.
But let's say I now want to search for an ISBN and need to exclude some _ids. I do this:
db.getCollection('books').find({
isbn: {
$regex: '^97804862731.*'
},
_id: {
$nin:['vGXejKQH5kw8Kfutk']
}
})
Now it's not working as it should. The query took more than 10 seconds!
When I do an isbn search without $regex but with $nin, it works perfectly, again taking around 3ms. Example:
db.getCollection('books').find({
isbn: '9780486273136',
_id: {
$nin:['vGXejKQH5kw8Kfutk']
}
})
Am I doing something wrong? And why is the index not working as it should?
Here is the .explain() output for the query that takes 10 seconds:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "***.books",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"isbn" : /^97804862731.*/
},
{
"$not" : {
"_id" : {
"$in" : [
"vGXejKQH5kw8Kfutk"
]
}
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"isbn" : /^97804862731.*/
},
"keyPattern" : {
"isbn" : 1.0,
"_id" : 1.0
},
"indexName" : "isbn_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"[\"97804862731\", \"97804862732\")",
"[/^97804862731.*/, /^97804862731.*/]"
],
"_id" : [
"[MinKey, \"vGXejKQH5kw8Kfutk\")",
"(\"vGXejKQH5kw8Kfutk\", MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"isbn" : /^97804862731.*/
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, \"vGXejKQH5kw8Kfutk\")",
"(\"vGXejKQH5kw8Kfutk\", MaxKey]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$not" : {
"_id" : {
"$in" : [
"vGXejKQH5kw8Kfutk"
]
}
}
},
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"isbn" : /^97804862731.*/
},
"keyPattern" : {
"isbn" : 1
},
"indexName" : "isbn_1",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"[\"97804862731\", \"97804862732\")",
"[/^97804862731.*/, /^97804862731.*/]"
]
}
}
}
]
},
"serverInfo" : {
"host" : "Ubuntu-1604-xenial-64-minimal",
"port" : 27017,
"version" : "3.2.11",
"gitVersion" : "009580ad490190ba33d1c6253ebd8d91808923e4"
},
"ok" : 1.0
}
Solution
My solution, although I do not know why it works, is to use $and with $ne instead of $nin.
My query now looks like this:
db.getCollection('books').find({isbn:{$regex: '^97804862731.*'}, $and: [
{
_id: {
$ne: 'vGXejKQH5kw8Kfutk'
}
},
{
_id: {
$ne: 'another-id'
}
}
]})
and it takes only around 3ms.
Maybe someone can explain why this happens?
The explain() output of this query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "***.books",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"isbn" : /^97804862731.*/
},
{
"$not" : {
"_id" : {
"$eq" : "vGXejKQH5kw8Kfutk"
}
}
},
{
"$not" : {
"_id" : {
"$eq" : "another-id"
}
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"isbn" : /^97804862731.*/
},
"keyPattern" : {
"isbn" : 1.0,
"_id" : 1.0
},
"indexName" : "isbn_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"[\"97804862731\", \"97804862732\")",
"[/^97804862731.*/, /^97804862731.*/]"
],
"_id" : [
"[MinKey, \"another-id\")",
"(\"another-id\", \"vGXejKQH5kw8Kfutk\")",
"(\"vGXejKQH5kw8Kfutk\", MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"isbn" : /^97804862731.*/
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, \"another-id\")",
"(\"another-id\", \"vGXejKQH5kw8Kfutk\")",
"(\"vGXejKQH5kw8Kfutk\", MaxKey]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$not" : {
"_id" : {
"$eq" : "vGXejKQH5kw8Kfutk"
}
}
},
{
"$not" : {
"_id" : {
"$eq" : "another-id"
}
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"isbn" : /^97804862731.*/
},
"keyPattern" : {
"isbn" : 1
},
"indexName" : "isbn_1",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"[\"97804862731\", \"97804862732\")",
"[/^97804862731.*/, /^97804862731.*/]"
]
}
}
}
]
},
"serverInfo" : {
"host" : "Ubuntu-1604-xenial-64-minimal",
"port" : 27017,
"version" : "3.2.11",
"gitVersion" : "009580ad490190ba33d1c6253ebd8d91808923e4"
},
"ok" : 1.0
}
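The workaround above can be generated for any number of excluded ids rather than written out by hand. A minimal sketch follows; excludeIds and buildIsbnQuery are hypothetical helpers introduced for illustration, and whether the $ne form is actually faster than $nin on a given server version is something to confirm with explain().

```javascript
// Hypothetical helper: turns a list of ids into the list of
// {_id: {$ne: ...}} clauses used in the workaround.
function excludeIds(ids) {
  return ids.map(function (id) {
    return { _id: { $ne: id } };
  });
}

// Hypothetical helper: builds the full query document.
// Assumes isbnPrefix contains digits only, so no regex escaping is done.
function buildIsbnQuery(isbnPrefix, excludedIds) {
  var query = { isbn: { $regex: "^" + isbnPrefix } };
  if (excludedIds.length > 0) {
    // $and must not be given an empty array, so add it only when needed.
    query.$and = excludeIds(excludedIds);
  }
  return query;
}

// Usage in the mongo shell (assumed, shown as a comment only):
// db.getCollection('books').find(
//   buildIsbnQuery('97804862731', ['vGXejKQH5kw8Kfutk', 'another-id']))
```

The anchored prefix "^97804862731" matches the same documents as the '^97804862731.*' pattern used in the question, since the trailing ".*" does not restrict the match.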

Sorting with $in not returning all docs

I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents, which is the cursor's batchSize, and then the cursor closes (setting its cursorId to 0) as if all documents had been returned.
If I take out the sorting, then I get all 1847 documents.
So my question is, why does it silently fail when using sorting with the $in operator?
EDIT
Using explain gives the following output
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
},
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
],
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
},
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
],
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
]
}
}
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
]
},
"ok" : 1
}
What's happening is that this sorted query must be performed in-memory as it's not supported by an index, and this limits the results to 32 MB. This behavior is documented here, with a JIRA about addressing this here.
Furthermore, you can't define an index to support this query as you're sorting on a field that isn't part of the query, and neither of these cases applies:
If the sort keys correspond to the index keys or an index prefix,
MongoDB can use the index to sort the query results. A prefix of a
compound index is a subset that consists of one or more keys at the
start of the index key pattern.
...
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.
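The two quoted rules can be checked mechanically against the winning plan's index. The following is a simplified sketch only: indexCanSort is a hypothetical helper, it ignores key directions, and it does not model anything else the real query planner may do.

```javascript
// Hypothetical helper: simplified model of the two quoted rules.
// indexKeys      - field names of the index, in order
// sortKeys       - field names of the sort, in order
// equalityFields - fields the query constrains with an equality condition
function indexCanSort(indexKeys, sortKeys, equalityFields) {
  if (sortKeys.length === 0) return true;
  var start = indexKeys.indexOf(sortKeys[0]);
  if (start === -1) return false;
  // Rule 2: every index key before the sort keys needs an equality condition.
  // (When start is 0 this loop does nothing, which is the prefix case of rule 1.)
  for (var i = 0; i < start; i++) {
    if (equalityFields.indexOf(indexKeys[i]) === -1) return false;
  }
  // The sort keys must then follow the index keys in the same order.
  for (var j = 0; j < sortKeys.length; j++) {
    if (indexKeys[start + j] !== sortKeys[j]) return false;
  }
  return true;
}

// Index from the winning plan in the question.
var logsIndex = ["uid.$id", "levelno", "_id"];

// The query uses $in on uid.$id and $gte on levelno, so there are no
// equality conditions on the keys that precede _id.
var questionCase = indexCanSort(logsIndex, ["_id"], []);                          // false
var withEqualities = indexCanSort(logsIndex, ["_id"], ["uid.$id", "levelno"]);    // true
var prefixSort = indexCanSort(logsIndex, ["uid.$id", "levelno"], []);             // true
```

Under this model the question's query falls into the first case, which is consistent with the SORT stage that appears in its winning plan.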
You should be able to work around the limitation by using the aggregation framework which can be instructed to use temporary files for its pipeline stage outputs if required via the allowDiskUse: true option:
db.getCollection('logs').aggregate([
{$match: {'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}},
{$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the cursor's objsLeftInBatch() method to determine how many objects are left in the current batch and iterate over them.
You can also override the batch size and the result limit using cursor.batchSize(size) and cursor.limit(limit).