I'm having performance issues when querying ~12,000 user documents, indexed by one field (companyId), with no other filter. The whole collection has only ~27,000 documents, yet it takes about 12 seconds to get the ~12,000 rows of data...
I tried running explain for this query:
db.instoreMember.find({companyId:"5b6be3e2096abd567974f924"}).explain();
result follows:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "production.instoreMember",
"indexFilterSet" : false,
"parsedQuery" : {
"companyId" : {
"$eq" : "5b6be3e2096abd567974f924"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"companyId" : 1,
"name" : 1,
"phoneNumber" : 1
},
"indexName" : "companyId_1_name_1_phoneNumber_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"companyId" : [ ],
"name" : [ ],
"phoneNumber" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"companyId" : [
"[\"5b6be3e2096abd567974f924\", \"5b6be3e2096abd567974f924\"]"
],
"name" : [
"[MinKey, MaxKey]"
],
"phoneNumber" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"companyId" : 1
},
"indexName" : "companyId_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"companyId" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"companyId" : [
"[\"5b6be3e2096abd567974f924\", \"5b6be3e2096abd567974f924\"]"
]
}
}
}
]
},
"serverInfo" : {
},
"ok" : 1
}
It seems that it is actually using the indexed companyId field, and if I run the search directly in the MongoDB shell it's very fast: only 1-2 seconds.
But via Spring Data MongoDB's MongoTemplate:
final Query query = new Query().addCriteria(Criteria.where("companyId").is(adminCompanyId));
final List<InstoreMember> listOfInstoreMembers = mongoTemplate.find(query, InstoreMember.class);
This becomes very slow, ~10-12 seconds. (The way I measure it: I put a breakpoint on the find statement and step to the next line, which takes about 10-12 seconds.)
I've enabled DEBUG logging for Spring Data MongoDB, and here is the logged output of the find statement:
2018-08-14 23:53:34.493 DEBUG 22733 --- [bio-8080-exec-2] o.s.data.mongodb.core.MongoTemplate :
find using query: { "companyId" : "58fa36dd31d103038e64b061"} fields: null for class: class fn.model.InstoreMember in collection: instoreMember
The version of spring-data-mongodb I use:
compile ("org.springframework.data:spring-data-mongodb:1.10.7.RELEASE")
I had this problem.
The slow part is mapping each Document to a Java object. MongoTemplate doesn't map at the codec level, so it goes BSON -> Document -> POJO. If you use the plain MongoDB driver with the POJO codec, it goes BSON -> POJO and removes the template mapping-layer overhead.
Also, if you have old data and have since moved packages, that breaks the mapping layer as well and makes it very slow, because it falls back to reflection.
How do I see the raw query that is actually being executed by Spring Data MongoDB?
If the query is taking 12 seconds, you actually have time to run db.currentOp() to see what is taking so long. The output contains a 'command' field that shows exactly what the database has been asked to do.
To see the raw query being executed, you can set the log level of the MongoDB driver dependency to DEBUG. This should log the query to your log file.
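If the query is still running, a quick sketch of this (the namespace is taken from the explain output above; adjust it to your own database and collection):
db.currentOp({
    "ns": "production.instoreMember",      // namespace of the slow query
    "secs_running": { "$gte": 1 }           // only show operations running for 1s or more
})
// Each matching entry includes a "command" field showing exactly what the
// server was asked to execute.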
Related
I have a MongoDB 3.4 replica set with a collection "page" where all documents have a "site" field (an ObjectId). The "site" field has only 100 possible values. I have created an index on this field via db.page.createIndex({site:1}). There are about 3.6 million documents in the "page" collection.
Now I see logs like this in the mongod.log file:
command db.page command: count { count: "page", query: { site: { $in:
[ ObjectId('A'), ObjectId('B'), ObjectId('C'), ObjectId('D'),
ObjectId('E'), ObjectId('F'), ObjectId('G'), ObjectId('H'),
ObjectId('I'), ObjectId('J'), ObjectId('K'), ObjectId('L') ] } } }
planSummary: IXSCAN { site: 1 } keysExamined:221888
docsExamined:221881 numYields:1786 reslen:44...
I don't understand the "keysExamined:221888": there are only 100 possible values, so I would expect to see keysExamined:100 at most, and here I would actually expect "keysExamined:12". What am I missing? For reference, here is an explain of the request:
PRIMARY> db.page.explain().count({ site: { $in: [ ObjectId('A'), ObjectId('F'), ObjectId('H'), ObjectId('G'), ObjectId('I'), ObjectId('B'), ObjectId('C'), ObjectId('J'), ObjectId('K'), ObjectId('D'), ObjectId('E'), ObjectId('L') ] } } )
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.page",
"indexFilterSet" : false,
"parsedQuery" : {
"site" : {
"$in" : [
ObjectId("B"),
ObjectId("C"),
ObjectId("D"),
ObjectId("E"),
ObjectId("F"),
ObjectId("A"),
ObjectId("G"),
ObjectId("H"),
ObjectId("I"),
ObjectId("J"),
ObjectId("K"),
ObjectId("L")
]
}
},
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"site" : 1
},
"indexName" : "site_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"site" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"site" : [
"[ObjectId('B'), ObjectId('B')]",
"[ObjectId('C'), ObjectId('C')]",
"[ObjectId('D'), ObjectId('D')]",
"[ObjectId('E'), ObjectId('E')]",
"[ObjectId('F'), ObjectId('F')]",
"[ObjectId('A'), ObjectId('A')]",
"[ObjectId('G'), ObjectId('G')]",
"[ObjectId('H'), ObjectId('H')]",
"[ObjectId('I'), ObjectId('I')]",
"[ObjectId('J'), ObjectId('J')]",
"[ObjectId('K'), ObjectId('K')]",
"[ObjectId('L'), ObjectId('L')]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "9a18351b5211",
"port" : 27017,
"version" : "3.4.18",
"gitVersion" : "4410706bef6463369ea2f42399e9843903b31923"
},
"ok" : 1
}
PRIMARY>
I know we are on a fairly old MongoDB version and we are planning to upgrade soon to 5.0.x (via incremental upgrades to 3.6 / 4.0 / 4.2 / 4.4). Is there a fix for this in later versions, to your knowledge?
After checking, I realized I was expecting MongoDB to use counted B-trees for its indexes, but that is not the case, so MongoDB does indeed have to walk all the index keys in the matching ranges. Details in jira.mongodb.org/plugins/servlet/mobile#issue/server-7745
Hence, at the moment, a count request runs in O(N) for N matching documents even when an index is used.
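A quick way to see this yourself (a sketch only; the ObjectId is hypothetical) is to run the count with executionStats and compare totalKeysExamined with the number of matching documents:
db.page.explain("executionStats").count({ site: { $in: [ ObjectId("000000000000000000000001") ] } })
// executionStats.totalKeysExamined will be roughly the number of "page" documents
// for that site, not the number of distinct "site" values, because the count has
// to walk every index entry inside the requested bounds.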
This is my schema:
{
"_id" : ObjectId("5b726f066f8400317d55b9d7"),
"question" : ObjectId("5b726bf66f8400317d54ea79"),
"variableCollections" : [
{
"variableId" : ObjectId("5b726d746f8400317d553e9c"),
"variableCollectionId" : ObjectId("5b726d2e6f8400317d54feda")
}
]
}
This is the index on the collection:
{
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
}
When I try the following query, with or without a hint, the winningPlan always applies an extra $eq filter (on the FETCH stage) in addition to the IXSCAN, but I expected the index scan to cover that condition directly, without the filter.
db.getCollection('questionAnswers').find({
question: ObjectId('5b726bf66f8400317d54ea79'),
'variableCollections.variableId': ObjectId("5b726d746f8400317d553e9c"),
'variableCollections.variableCollectionId':ObjectId("5b726d2e6f8400317d54feda")
})
.hint("test1")
.explain({"verbosity":"allPlansExecution"})
The winningPlan is as follows
{
"stage" : "FETCH",
"filter" : {
"variableCollections.variableId" : {
"$eq" : ObjectId("5b726d746f8400317d553e9c")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
},
"indexName" : "test1",
"isMultiKey" : true,
"multiKeyPaths" : {
"question" : [],
"variableCollections.variableCollectionId" : [
"variableCollections"
],
"variableCollections.variableId" : [
"variableCollections"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"question" : [
"[ObjectId('5b726bf66f8400317d54ea79'), ObjectId('5b726bf66f8400317d54ea79')]"
],
"variableCollections.variableCollectionId" : [
"[ObjectId('5b726d2e6f8400317d54feda'), ObjectId('5b726d2e6f8400317d54feda')]"
],
"variableCollections.variableId" : [
"[MinKey, MaxKey]"
]
}
}
}
How can I force MongoDB to use the IXSCAN without the $eq filter to improve the performance of this query? Or is this already the best performance I can get?
As far as I know, MongoDB always uses an $eq operation internally for this kind of find, so I think you already have the best query plan. But avoid creating more indexes than you need, as maintaining many indexes can slow things down.
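One way to gauge whether the residual filter actually costs anything (a sketch reusing the query from the question) is to look at the executionStats counters:
db.getCollection('questionAnswers').find({
    question: ObjectId('5b726bf66f8400317d54ea79'),
    'variableCollections.variableId': ObjectId("5b726d746f8400317d553e9c"),
    'variableCollections.variableCollectionId': ObjectId("5b726d2e6f8400317d54feda")
}).hint("test1").explain("executionStats")
// In the output, compare executionStats.totalKeysExamined, totalDocsExamined and
// nReturned: if they are close to each other, the residual $eq filter is discarding
// very few fetched documents and is unlikely to be the bottleneck.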
I am new to MongoDB and came across some strange behaviour of the aggregation framework.
I have a collection named 'billingData'; this collection has approximately 2M documents.
I am comparing two queries which give me the same output but have very different execution times.
Query 1:
db.billingData.find().sort({"_id":-1}).skip(100000).limit(50)
Execution Plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 50,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 100000,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "ip-172-60-62-125",
"port" : 27017,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1.0
}
Query 2:
db.billingData.aggregate([
{$sort : {"_id":-1}},
{$skip:100000},
{$limit:50}
])
Execution Plan:
{
"stages" : [
{
"$cursor" : {
"query" : {},
"sort" : {
"_id" : -1
},
"limit" : NumberLong(100050),
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$skip" : NumberLong(100000)
}
],
"ok" : 1.0
}
I was expecting similar performance from the aggregation framework and the find query, but the find query returned results in 2 seconds while the aggregation took 16 seconds.
In both queries I am sorting the documents in descending order (by _id) and fetching 50 records after skipping 100,000 records.
Can someone explain why the aggregation framework behaves this way?
What can I do to make it perform similarly to the find query?
Setup Details:
mongos:
RAM: 8 GB, CPUs: 2
Config Servers (Replica set of 3 config servers):
RAM: 4 GB, CPUs: 2
Shard Cluster-1 (Replica set of 3 mongod):
RAM: 30 GB, CPUs: 4
Shard Cluster-2 (Replica set of 3 mongod):
RAM: 30 GB, CPUs: 4
Sharding:
Collection: rptDlp, Key: {incidentOn: "hashed"}
Description:
I have more than 15 million records in a collection.
I am retrieving the last page of documents, sorted by an indexed field of type date.
Actual Query:
db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)
If I execute this query directly against a shard's mongod (PRIMARY), it returns results in 14 seconds. But through mongos it takes more than 2 minutes, and because of the query timeout my application ends up showing an error prompt.
If it were network congestion, every query should take 2 minutes; but when I retrieve documents for the first page, the results come back in a few seconds.
Explain query result (against mongos):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE_SORT",
"shards" : [
{
"shardName" : "rs0",
"connectionString" : "rs0/172.18.64.47:27017,172.18.64.48:27017,172.18.64.53:27017",
"serverInfo" : {
"host" : "UemCent7x64-70",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {a
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
{
"shardName" : "rs1",
"connectionString" : "rs1/172.18.64.54:27017",
"serverInfo" : {
"host" : "UemCent7x64-76",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
}
},
"rejectedPlans" : []
}
]
}
},
"ok" : 1.0
}
Explain query result (against mongo shard):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "UemCent7x64-69",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"ok" : 1.0
}
Any suggestion would be helpful, thanks in advance.
FINDINGS:
A skip() query behaves differently when executed through mongos (the router) than when executed directly on a shard.
When skip(n) and limit(m) are executed on a shard, the shard actually skips the n records and returns only the m records specified by the limit.
This is not possible through mongos, because the data may be spread across multiple shards, so any single shard may hold fewer than the n records mentioned in the skip.
Hence, instead of pushing skip(n) down, mongos sends a limit(n+m) query to each shard (adding the skip count n to the limit count m) so that enough records are collected, and then applies the skip to the assembled results.
Also, if the result set is large, mongos fetches the data in batches using the getMore command, which slows performance down further.
As per the MongoDB docs (https://docs.mongodb.com/v3.0/core/sharded-cluster-query-router/):
If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
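To make this concrete for the query above, here is a conceptual sketch (not actual router code) of what the documented behaviour implies:
// skip(n).limit(m) through mongos on a sharded collection:
var n = 15610600, m = 10;

// Each shard is effectively asked for the first n + m documents in sort order:
db.getCollection("rptDlp")
  .find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
  .sort({ incidentOn: -1 })
  .limit(n + m);    // up to 15,610,610 documents per shard

// mongos then merge-sorts the per-shard results and discards the first n, which is
// why a deep skip through mongos costs far more than skip().limit() on a single shard.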
Is there any way to improve the performance of skip queries executed via mongos (the router)?
Thanks in advance.
If you have a hashed shard key, range queries cannot be targeted to the nodes that hold each item, so the query has to be sent to every shard in the cluster. The overall time is therefore the slowest shard's response time plus the time to merge the results on the mongos before sending them back to the client.
Using a hashed shard key scatters the documents throughout the cluster, so you'll only be able to target a single shard with an exact match on the key.
Check out the documentation here - https://docs.mongodb.com/manual/core/hashed-sharding/
If you don't mind the query doing a full cluster scan, you could make it more efficient by adding a standard (range) index on incidentOn. This will make the query a lot faster on each node, but it still won't be able to pinpoint which nodes in the cluster hold the data.
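A minimal sketch of the above, using the collection from the question (note that the explain output already shows an "incidentOn_-1" index, so the createIndex call may be unnecessary in your deployment):
// Only an exact equality on the hashed shard key can be routed to a single shard:
db.getCollection("rptDlp").find({ incidentOn: new Date(1513641600000) })

// A range on incidentOn is scattered to every shard and merged on mongos:
db.getCollection("rptDlp").find({
    incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) }
})

// The standard (range) index suggested above, if it is not already present:
db.getCollection("rptDlp").createIndex({ incidentOn: -1 })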
I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents (the cursor's batchSize), and then the cursor closes (setting its cursorId to 0) as if all documents had been returned.
If I take out the sorting, then I get all 1847 documents.
So my question is: why does it fail silently when sorting is combined with the $in operator?
EDIT
Using explain gives the following output
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
},
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
],
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
},
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
],
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
]
}
}
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
]
},
"ok" : 1
}
What's happening is that this sorted query must be performed in memory, since it isn't supported by an index, and in-memory sorts limit the results to 32 MB. This behavior is documented here, with a JIRA ticket about addressing it here.
Furthermore, you can't define an index to support this query, as you're sorting on a field that isn't part of the query, and neither of these cases applies:
If the sort keys correspond to the index keys or an index prefix,
MongoDB can use the index to sort the query results. A prefix of a
compound index is a subset that consists of one or more keys at the
start of the index key pattern.
...
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.
You should be able to work around the limitation by using the aggregation framework which can be instructed to use temporary files for its pipeline stage outputs if required via the allowDiskUse: true option:
db.getCollection('logs').aggregate([
{$match: {'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}},
{$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the objsLeftInBatch() method to determine how many objects are left in the current batch and iterate over the cursor.
You can override the cursor's batch size and result limit using cursor.batchSize(size) and cursor.limit(limit).
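A minimal sketch of this, reusing the query from the question (with a single ObjectId for brevity):
var cursor = db.getCollection('logs')
    .find({ 'uid.$id': { $in: [ ObjectId("580e3397812de36b86d68c04") ] },
            levelno: { $gte: 10 } })
    .sort({ _id: 1 })
    .batchSize(2000);                // request larger batches from the server

var count = 0;
while (cursor.hasNext()) {           // hasNext() issues getMore requests as needed
    var doc = cursor.next();
    count += 1;
    // cursor.objsLeftInBatch() reports how many documents remain in the batch
    // currently held by the shell before the next getMore is sent.
}
print("total documents iterated: " + count);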