Index on objectid field - $in operator - strange keysExamined - mongodb

I have a MongoDB 3.4 replica set with a collection "page" where all documents have a "site" field (an ObjectId). The "site" field has only 100 possible values. I have created an index on this field via db.page.createIndex({site:1}). There are about 3.6 million documents in the "page" collection.
Now, I see logs like this in the mongod.log file
command db.page command: count { count: "page", query: { site: { $in:
[ ObjectId('A'), ObjectId('B'), ObjectId('C'), ObjectId('D'),
ObjectId('E'), ObjectId('F'), ObjectId('G'), ObjectId('H'),
ObjectId('I'), ObjectId('J'), ObjectId('K'), ObjectId('L') ] } } }
planSummary: IXSCAN { site: 1 } keysExamined:221888
docsExamined:221881 numYields:1786 reslen:44...
I don't understand keysExamined:221888: there are only 100 possible values, so I would expect keysExamined:100 at most, and for this query with 12 values in the $in I would actually expect keysExamined:12. What am I missing? For reference, here is the explain for the query:
PRIMARY> db.page.explain().count({ site: { $in: [ ObjectId('A'), ObjectId('F'), ObjectId('H'), ObjectId('G'), ObjectId('I'), ObjectId('B'), ObjectId('C'), ObjectId('J'), ObjectId('K'), ObjectId('D'), ObjectId('E'), ObjectId('L') ] } } )
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.page",
"indexFilterSet" : false,
"parsedQuery" : {
"site" : {
"$in" : [
ObjectId("B"),
ObjectId("C"),
ObjectId("D"),
ObjectId("E"),
ObjectId("F"),
ObjectId("A"),
ObjectId("G"),
ObjectId("H"),
ObjectId("I"),
ObjectId("J"),
ObjectId("K"),
ObjectId("L")
]
}
},
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"site" : 1
},
"indexName" : "site_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"site" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"site" : [
"[ObjectId('B'), ObjectId('B')]",
"[ObjectId('C'), ObjectId('C')]",
"[ObjectId('D'), ObjectId('D')]",
"[ObjectId('E'), ObjectId('E')]",
"[ObjectId('F'), ObjectId('F')]",
"[ObjectId('A'), ObjectId('A')]",
"[ObjectId('G'), ObjectId('G')]",
"[ObjectId('H'), ObjectId('H')]",
"[ObjectId('I'), ObjectId('I')]",
"[ObjectId('J'), ObjectId('J')]",
"[ObjectId('K'), ObjectId('K')]",
"[ObjectId('L'), ObjectId('L')]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "9a18351b5211",
"port" : 27017,
"version" : "3.4.18",
"gitVersion" : "4410706bef6463369ea2f42399e9843903b31923"
},
"ok" : 1
}
PRIMARY>
I know we are on a fairly old MongoDB version, and we are planning to upgrade soon to 5.0.X (via incremental upgrades through 3.6 / 4.0 / 4.2 / 4.4). Is there a fix for this in later versions, to your knowledge?

So after checking, I realized I was expecting MongoDB to use counted B-trees for its indexes, but that is not the case, so MongoDB does indeed have to walk every index entry in the scanned ranges. Details in jira.mongodb.org/plugins/servlet/mobile#issue/server-7745
Hence, at the moment, a count request will run in O(N) for N matching documents when indexes are used.
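The point about counted B-trees can be illustrated with a toy model (plain JavaScript, not MongoDB internals): without per-subtree counts, counting via an index means visiting every index entry in the matching ranges, so keysExamined grows with the number of matching documents, not the number of distinct values.

```javascript
// Toy model of a non-counted B-tree index: a sorted list of keys,
// one entry per document. Counting an $in means walking every entry
// whose key is in the set -- O(matching docs), not O(distinct values).
function countIn(indexKeys, values) {
  const wanted = new Set(values);
  let keysExamined = 0;
  let count = 0;
  for (const key of indexKeys) {
    if (wanted.has(key)) {
      keysExamined += 1; // each matching entry must be visited
      count += 1;
    }
  }
  return { count, keysExamined };
}

// 3 distinct sites, 6 documents: keysExamined tracks documents, not sites
const index = ["A", "A", "B", "B", "B", "C"].sort();
// countIn(index, ["A", "B"]) -> { count: 5, keysExamined: 5 }
```

A counted B-tree would store subtree sizes in each node and answer the same count from range boundaries alone, without touching every entry.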

Related

MongoDB uses filter before IXSCAN although compound index exists

this is my schema:
{
"_id" : ObjectId("5b726f066f8400317d55b9d7"),
"question" : ObjectId("5b726bf66f8400317d54ea79"),
"variableCollections" : [
{
"variableId" : ObjectId("5b726d746f8400317d553e9c"),
"variableCollectionId" : ObjectId("5b726d2e6f8400317d54feda")
}
]
}
this is the index of the schema
{
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
}
When I try the following query, with or without a hint, the winningPlan always applies a $eq filter after the IXSCAN, but it should be able to use the IXSCAN alone, without the filter.
db.getCollection('questionAnswers').find({
question: ObjectId('5b726bf66f8400317d54ea79'),
'variableCollections.variableId': ObjectId("5b726d746f8400317d553e9c"),
'variableCollections.variableCollectionId':ObjectId("5b726d2e6f8400317d54feda")
})
.hint("test1")
.explain({"verbosity":"allPlansExecution"})
The winningPlan is as follows
{
"stage" : "FETCH",
"filter" : {
"variableCollections.variableId" : {
"$eq" : ObjectId("5b726d746f8400317d553e9c")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
},
"indexName" : "test1",
"isMultiKey" : true,
"multiKeyPaths" : {
"question" : [],
"variableCollections.variableCollectionId" : [
"variableCollections"
],
"variableCollections.variableId" : [
"variableCollections"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"question" : [
"[ObjectId('5b726bf66f8400317d54ea79'), ObjectId('5b726bf66f8400317d54ea79')]"
],
"variableCollections.variableCollectionId" : [
"[ObjectId('5b726d2e6f8400317d54feda'), ObjectId('5b726d2e6f8400317d54feda')]"
],
"variableCollections.variableId" : [
"[MinKey, MaxKey]"
]
}
}
}
How can I force Mongo to use the IXSCAN without the $eq filter, to improve the performance of this query? Or is this already the best performance I can get?
As far as I know, MongoDB always uses an $eq operation internally for find operations, so I think you already have the best query plan. But don't create many indexes, as that can slow things down.
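One thing that may be worth testing: MongoDB can only compound index bounds over fields of the same array when the query uses $elemMatch on that array; without it, the planner leaves variableId at [MinKey, MaxKey] and filters after the scan, as seen in the winningPlan above. A sketch of the rewritten query (plain strings stand in for ObjectId(...) here, so the snippet is self-contained):

```javascript
// Hypothetical rewrite using $elemMatch, so the planner can compound
// bounds on both array subfields of the multikey index instead of
// applying a post-IXSCAN filter. In the real shell these values would
// be ObjectId(...) instances.
const query = {
  question: "5b726bf66f8400317d54ea79",
  variableCollections: {
    $elemMatch: {
      variableCollectionId: "5b726d2e6f8400317d54feda",
      variableId: "5b726d746f8400317d553e9c",
    },
  },
};
// db.getCollection('questionAnswers').find(query).explain()
```

Note that $elemMatch also changes the semantics slightly: both conditions must now match within the same array element, which is usually what is intended with this kind of schema.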

MongoDB Aggregation is slow on Indexed fields

I have a collection with ~2.5m documents; the collection size is 14.1 GB, storage size 4.2 GB, and the average object size is 5.8 KB. I created two separate indexes on two of the fields, dataSourceName and version (text fields), and tried an aggregate query to list their 'grouped by' values.
(Trying to achieve this: select dsn, v from collection group by dsn, v.)
db.getCollection("the-collection").aggregate(
[
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
{
"allowDiskUse" : false
}
);
Even though MongoDB uses ~10GB of RAM on the server, the fields are indexed, and nothing else is running at all, the aggregation takes ~40 seconds.
I also tried creating a new compound index containing both fields in order, but the query still does not seem to use it:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "db.the-collection",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [
]
}
}
},
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
"ok" : 1.0
}
I am using MongoDB 3.6.5 64bit on Windows, so it should use the indexes: https://docs.mongodb.com/master/core/aggregation-pipeline/#pipeline-operators-and-indexes
As #Alex-Blex suggested, I tried it with sorting, but I get an OOM error:
The following error occurred while attempting to execute the aggregate query
Mongo Server error (MongoCommandException): Command failed with error 16819: 'Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server server-address:port.
The full response is:
{
"ok" : 0.0,
"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
"code" : NumberInt(16819),
"codeName" : "Location16819"
}
My bad, I tried it on the wrong collection... Adding the same sort as the index makes it use the index. Still not fast though; it took ~10 seconds to return the results.
The new explain:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"sort" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1)
},
"fields" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "....",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1)
},
"indexName" : "dataSourceName_1_version_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"dataSourceName" : [
],
"version" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"dataSourceName" : [
"[MinKey, MaxKey]"
],
"version" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
]
}
}
},
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
"ok" : 1.0
}
The page you are referring to says exactly the opposite:
The $match and $sort pipeline operators can take advantage of an index
Your first stage is $group, which is neither $match nor $sort.
Try sorting in the first stage to trigger use of the index:
db.getCollection("the-collection").aggregate(
[
{ $sort: { dataSourceName:1, version:1 } },
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
{
"allowDiskUse" : false
}
);
Please note, it should be a single compound index with the same fields and sort order:
db.getCollection("the-collection").createIndex({ dataSourceName:1, version:1 })

MongoDB Sharding - Query via mongos takes more time than query on mongod (PRIMARY)

Setup Details:
mongos:
RAM: 8 GB, CPUs: 2
Config Servers (Replica set of 3 config servers):
RAM: 4 GB, CPUs: 2
Shard Cluster-1 (Replica of 3 mongod):
RAM: 30 GB, CPUs: 4
Shard Cluster-2 (Replica of 3 mongod):
RAM: 30 GB, CPUs: 4
Sharding:
Collection: rptDlp, Key: {incidentOn: "hashed"}
Description:
I have more than 15 million documents in a collection.
I am retrieving the last page of documents, sorted by an indexed field of type date.
Actual Query:
db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)
If I execute this query directly against the mongod shard server (PRIMARY), it returns results in 14 seconds. But through mongos it takes more than 2 minutes, and due to a query timeout my application shows an error prompt.
If we assume it is network congestion, then every query should take 2 minutes. But when I retrieve the documents for the first page, it returns results in a few seconds.
Explain query result (against mongos):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE_SORT",
"shards" : [
{
"shardName" : "rs0",
"connectionString" : "rs0/172.18.64.47:27017,172.18.64.48:27017,172.18.64.53:27017",
"serverInfo" : {
"host" : "UemCent7x64-70",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {a
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
{
"shardName" : "rs1",
"connectionString" : "rs1/172.18.64.54:27017",
"serverInfo" : {
"host" : "UemCent7x64-76",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
}
},
"rejectedPlans" : []
}
]
}
},
"ok" : 1.0
}
Explain query result (against mongo shard):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "UemCent7x64-69",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"ok" : 1.0
}
Any suggestion would be helpful, thanks in advance.
FINDINGS:
A skip() query behaves differently when executed on mongos (the router) than directly on a shard.
When executing skip(n) and limit(m) on a shard, it actually skips the n records and returns only the m records mentioned in the limit.
But this is not possible through mongos, because the data may be divided across multiple shards, so a given shard may contain fewer than the n records mentioned in the skip.
Hence, instead of pushing down skip(n), mongos executes a limit(n + m) query on each shard, adding the skip count n and the limit count m, to collect all candidate records. After collecting the results from all shards, mongos applies the skip to the assembled records.
Also, if the data is large, mongos fetches it in chunks using the getMore command, which further slows things down.
As per the MongoDB docs (https://docs.mongodb.com/v3.0/core/sharded-cluster-query-router/):
If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
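The quoted behaviour can be sketched in plain JavaScript (an assumed simplification, not the actual mongos code): for skip(n).limit(m), each shard is asked for its first n + m documents, and the router merge-sorts the batches and applies the skip afterwards.

```javascript
// Simplified model of how mongos handles skip(n).limit(m):
// skip cannot be pushed down, so each shard returns up to n + m docs
// and the router skips after merging. Documents are modelled as numbers
// already sorted per shard.
function mongosSkipLimit(shards, skip, limit) {
  let fetched = 0;
  const perShard = shards.map((docs) => {
    const batch = docs.slice(0, skip + limit); // shard-side limit(n + m)
    fetched += batch.length;
    return batch;
  });
  // merge the shard batches in sort order, then apply skip/limit on the router
  const merged = perShard.flat().sort((a, b) => a - b);
  return { result: merged.slice(skip, skip + limit), fetched };
}

// Two shards, skip(3).limit(2): the router fetches 5 docs from EACH shard
// (10 total) just to return 2 -- which is why deep skips get expensive.
// mongosSkipLimit([[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]], 3, 2)
//   -> { result: [4, 5], fetched: 10 }
```

This also shows why a skip of ~15 million is so much slower through mongos: every shard must produce skip + limit documents before the router can discard almost all of them. Range-based pagination (filtering on the last-seen incidentOn value instead of skipping) avoids this entirely.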
Is there any solution to improve the performance of skip queries executed via mongos (routers)?
Thanks in advance.
If you have a HASHed shard key, you won't be able to use range queries to find which node each item is on, so the query will need to scan all nodes in the sharded cluster. The total time will therefore be that of the slowest node in the set, plus the time to merge the results on the mongos before sending them back to the client.
Using a HASHed shard key scatters the documents throughout the cluster, so you'll only be able to target individual shards with an exact key match.
Check out the documentation here - https://docs.mongodb.com/manual/core/hashed-sharding/
If you don't mind the query doing a full cluster scan, you can make it more efficient by adding a standard index on incidentOn. This will make the query a lot faster on each node, but it still won't be able to pinpoint the nodes in the cluster.

When using $elemMatch, why is MongoDB performing a FETCH stage if index is covering the filter?

I have the following collection:
> db.foo.find()
{ "_id" : 1, "k" : [ { "a" : 50, "b" : 10 } ] }
{ "_id" : 2, "k" : [ { "a" : 90, "b" : 80 } ] }
With a compound index on the k field:
"key" : {
"k.a" : 1,
"k.b" : 1
},
"name" : "k.a_1_k.b_1"
If I run the following query :
db.foo.aggregate([
{ $match: { "k.a" : 50 } },
{ $project: { _id : 0, "dummy": {$literal:""} }}
])
The index is used (makes sense) and there is no need for a FETCH stage:
"winningPlan" : {
"stage" : "COUNT_SCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"indexBounds" : {
"startKey" : {
"k.a" : 50,
"k.b" : { "$minKey" : 1 }
},
"startKeyInclusive" : true,
"endKey" : {
"k.a" : 50,
"k.b" : { "$maxKey" : 1 }
},
"endKeyInclusive" : true
}
}
However, if I run the following query, which uses $elemMatch:
db.foo.aggregate([
{ $match: { k: {$elemMatch: {a : 50, b : { $in : [5, 6, 10]}}}} },
{ $project: { _id : 0, "dummy" : {$literal : ""}} }
])
There is a FETCH stage (which, AFAIK, is not necessary):
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"k" : {
"$elemMatch" : {
"$and" : [
{ "a" : { "$eq" : 50 } },
{ "b" : { "$in" : [ 5, 6, 10 ] } }
]
}
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"k.a" : [
"[50.0, 50.0]"
],
"k.b" : [
"[5.0, 5.0]",
"[6.0, 6.0]",
"[10.0, 10.0]"
]
}
}
},
I am using MongoDB 3.4.
I ask this because I have a database with a lot of documents, and there is a query that uses aggregate() and $elemMatch (it does more useful things than projecting nothing as in this question, of course, but theoretically nothing that requires a FETCH stage). I found out that the main reason the query is slow is the FETCH stage, which AFAIK is not needed.
Is there some logic that forces MongoDB to use a FETCH when $elemMatch is used, or is it just a missing optimization?
Looks like even a single "$elemMatch" forces mongo to do FETCH -> COUNT instead of COUNT_SCAN. Opened a ticket in their Jira with simple steps to reproduce - https://jira.mongodb.org/browse/SERVER-35223
TLDR: this is the expected behaviour of a multikey index combined with an $elemMatch.
From the covered queries section of the multikey index documentation:
Multikey indexes cannot cover queries over array field(s).
Meaning not all the information about a sub-document is in the multikey index.
Let's imagine the following scenario:
//doc1
{
"k" : [ { "a" : 50, "b" : 10 } ]
}
//doc2
{
"k" : { "a" : 50, "b" : 10 }
}
Because Mongo "flattens" the array it indexes, once the index is built Mongo cannot differentiate between these 2 documents from the index alone; $elemMatch specifically requires an array to match (i.e. doc2 will never match an $elemMatch query).
This means Mongo is forced to FETCH the documents to determine which ones actually match; this is the premise causing the behaviour you see.
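A toy illustration of that flattening (an assumed simplification, not Mongo's actual key encoding): both documents above yield the same index key for the path k.a, so the index entries alone cannot distinguish the array form from the plain subdocument.

```javascript
// Toy model of multikey index key extraction: for an array, one key is
// emitted per element; for a plain subdocument, a single key. After
// extraction, doc1 and doc2 produce identical keys for "k.a", so the
// "is it an array?" information needed by $elemMatch is lost.
function indexKeys(doc, path) {
  const [head, tail] = path.split(".");
  const v = doc[head];
  const elems = Array.isArray(v) ? v : [v]; // multikey flattening
  return elems.map((e) => e[tail]);
}

const doc1 = { k: [{ a: 50, b: 10 }] }; // array: would match $elemMatch
const doc2 = { k: { a: 50, b: 10 } };   // plain subdocument: would not
// indexKeys(doc1, "k.a") and indexKeys(doc2, "k.a") are both [50]
```

Since the keys are identical, only fetching the full document reveals whether k is an array, which is exactly why the FETCH stage appears.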

Sorting with $in not returning all docs

I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents, which is the cursor's batchSize, and then the cursor closes (setting its cursorId to 0), as if all documents had been returned.
If I take out the sort, then I get all 1847 documents.
So my question is: why does it silently fail when using a sort with the $in operator?
EDIT
Using explain gives the following output
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
},
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
],
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
},
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
],
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
]
}
}
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
]
},
"ok" : 1
}
What's happening is that this sorted query must be performed in memory, as it's not supported by an index, and that limits it to 32 MB of results. This behavior is documented here, with a JIRA ticket about addressing it here.
Furthermore, you can't define an index to support this query, as you're sorting on a field that isn't part of the query, and neither of these cases applies:
If the sort keys correspond to the index keys or an index prefix,
MongoDB can use the index to sort the query results. A prefix of a
compound index is a subset that consists of one or more keys at the
start of the index key pattern.
...
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.
You should be able to work around the limitation by using the aggregation framework, which can be instructed to use temporary files for its pipeline stage outputs, if required, via the allowDiskUse: true option:
db.getCollection('logs').aggregate([
{$match: {'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}},
{$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the objsLeftInBatch() method to determine how many objects are left in the current batch and iterate over it.
You can override the cursor's batch size and limit using cursor.batchSize(size) and cursor.limit(limit).