MongoDB slow with index and sort

I have a compound index:
{
"hidden" : 1,
"country" : 1,
"edited" : 1,
"changeset.when" : -1
}
And this query:
{
"country" : "ua",
"edited" : true,
"hidden" : false,
"changeset.when" : { "$lt" : ISODate("5138-11-16T09:46:40Z") }
}
It works well and fast. Now I want to sort the results by { "changeset.when" : -1 }, and it slows down a lot: from hundreds of milliseconds to 15 seconds.
Here is the explain output for the query with sorting:
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"changeset.when" : -1
},
"limitAmount" : 15,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"hidden" : 1,
"country" : 1,
"edited" : 1,
"changeset.when" : -1
},
"indexName" : "edited_news",
"isMultiKey" : true,
"multiKeyPaths" : {
"hidden" : [ ],
"country" : [ ],
"edited" : [ ],
"changeset.when" : [
"changeset"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"hidden" : [
"[false, false]"
],
"country" : [
"[\"ua\", \"ua\"]"
],
"edited" : [
"[true, true]"
],
"changeset.when" : [
"(new Date(100000000000000), true)"
]
}
}
}
}
}
Why is it so slow? The explain output shows that it uses the intended index, and that the changeset.when field is in descending order.

If you have a compound index, try to write the query keys in the same order as the index keys. That should improve performance.
You don't need an additional sort on the result: by default, the results will be returned in index order (in your case, descending by changeset.when).
For more help, please share some documents from your collection.
If you have any questions, feel free to ask.

Related

Mongo $or query with ranges is doing an in-memory sort?

I'm running into a unique situation where one query seems to do an in-memory sort. Query 1 is the one that does the in-memory sort, while Query 2 is doing a merge sort correctly.
There are a few parts to the query, so I want to know which part is causing the sort to be done in memory.
I do have a workaround, but I would like to know the reason behind this. They both have 2 input stages, so I'm not sure what is the cause.
Schema:
schema = {
date: Date, // date that can change
createTime: Date, // create time of document
value: Number
}
Index:
schema.index({value: 1, createTime: -1, date: 1});
Query 1: I put $or at the top level to avoid using an incorrect index (see: MongoDB query to slow when using $or operator)
db.getCollection('dates').find({
$or: [
{value: {$in: [1, 2]}, date: null},
{value: {$in: [1, 2]}, date: {$gt: ISODate("2020-06-16T23:59:59.999Z")}}
]
}).sort({createTime:-1}).explain()
Query 1 plan: as you can see, it does an in-memory sort. I'm not sure exactly why this is occurring.
{
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SORT",
"sortPattern" : {
"createTime" : -1.0
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "FETCH",
"filter" : {
"date" : {
"$eq" : null
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]",
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[undefined, undefined]",
"[null, null]"
]
}
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]",
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"(new Date(1592351999999), new Date(9223372036854775807)]"
]
}
}
]
}
}
}
}
}
Query 2:
db.getCollection('dates').find({
value: {$in: [1, 2]},
date: {$not: {$lte: ISODate("2020-06-16T23:59:59.999Z")}}
}).sort({createTime:-1}).explain()
Query 2 plan: The workaround query I used, which does a merge sort successfully.
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "SORT_MERGE",
"sortPattern" : {
"createTime" : -1.0
},
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[MinKey, true]",
"(new Date(1592351999999), MaxKey]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[MinKey, true]",
"(new Date(1592351999999), MaxKey]"
]
}
}
]
}
}
Each branch of the $or can use an index, but you still end up with two result sets, and if you apply a sort on top, the database has to sort the combined results in memory. It seems reasonable that a sort over an $or operator would produce an in-memory sort.
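The plans also show why Query 2 escapes the in-memory sort. In Query 2, each IXSCAN covers a single value ([1.0, 1.0] or [2.0, 2.0]), so within each scan the keys are already ordered by createTime (descending, as in the index), and SORT_MERGE only has to interleave two ordered streams. In Query 1, each branch scans both values in one pass, so its output is ordered by value first and createTime second, which is not createTime order, and a full SORT is needed. Below is a minimal JavaScript sketch of that merge idea on hypothetical in-memory arrays (the function name mergeSortedDesc and the sample data are made up for illustration); it is not MongoDB's implementation.

```javascript
// Merge arrays that are each already sorted by `key` (descending) into a
// single descending array by repeatedly taking the largest head element.
// This mimics the idea behind SORT_MERGE; it is not MongoDB's code.
function mergeSortedDesc(streams, key) {
  const positions = streams.map(() => 0); // read position in each stream
  const out = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (positions[i] >= streams[i].length) continue; // stream exhausted
      const head = streams[i][positions[i]];
      if (best === -1 || head[key] > streams[best][positions[best]][key]) {
        best = i;
      }
    }
    if (best === -1) return out; // every stream is exhausted
    out.push(streams[best][positions[best]]);
    positions[best] += 1;
  }
}

// Hypothetical index entries for value 1 and value 2, each in index order
// (createTime descending), as a single-value IXSCAN would produce them.
const valueOne = [
  { value: 1, createTime: 50 },
  { value: 1, createTime: 30 },
  { value: 1, createTime: 10 },
];
const valueTwo = [
  { value: 2, createTime: 40 },
  { value: 2, createTime: 20 },
];

// One scan over both values (like Query 1's branches) yields value 1 first:
const concatenated = valueOne.concat(valueTwo);
// Merging the per-value streams (like Query 2's SORT_MERGE) keeps createTime order:
const merged = mergeSortedDesc([valueOne, valueTwo], "createTime");

console.log(concatenated.map((d) => d.createTime).join(", ")); // prints "50, 30, 10, 40, 20"
console.log(merged.map((d) => d.createTime).join(", "));       // prints "50, 40, 30, 20, 10"
```

The merge never compares more than one head element per stream at a time, which is why it needs no buffering of the whole result set, unlike the SORT stage.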

Explain why results from mongo are being returned in reverse ObjectId order?

I have a list of news article items which I am tagging with entities and topic tags.
My query:
db["fmetadata"].find({'$and': [{'$text': {'$search': 'apple trump'}}, {'$or':
[{'entities': {'$elemMatch': {'$regex': 'apple|trump'}}}, {'tags': {'$elemMatch': {'$regex': 'apple|trump'}}}]}]}).explain()
Query plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "dfabric.fmetadata",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
{
"$text" : {
"$search" : "apple trump",
"$language" : "english",
"$caseSensitive" : false,
"$diacriticSensitive" : false
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
"inputStage" : {
"stage" : "TEXT",
"indexPrefix" : {
},
"indexName" : "title_text_tags_text_entities_text",
"parsedTextQuery" : {
"terms" : [
"appl",
"trump"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"textIndexVersion" : 3,
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
}
]
}
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "fabric-dev",
"port" : 27017,
"version" : "4.0.2",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
I see that
["queryPlanner"]["winningPlan"]["inputStage"]["inputStage"]["inputStages"]
"stage": "IXSCAN"
"direction": "backward"
Can someone please explain why?
I was developing a pagination cursor using the > lastId and limit technique. But since the results are being returned backwards, I have to use < lastId, which seems counterintuitive.
If I don't sort my results in natural order, can I be guaranteed that they will always come back backwards/reversed?
Edit: as mentioned in the comment below
My objective here is to understand why the index was scanned backwards. Is it the way I formulated my query, or something else entirely? The direction (forwards or backwards) doesn't matter as much as its consistency: it should be either always forwards or always backwards.
I came across this question on Stack Overflow, and I believe the accepted answer, together with the comments below it, gives me the intuition I was looking for:
How does MongoDB sort records when no sort order is specified?
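The takeaway of the linked answer is that, without an explicit sort, the order of results is not guaranteed. For the > lastId pagination described in the question, one option is to sort on _id explicitly (.sort({_id: 1})) and filter with $gt on the last _id seen. The sketch below models that keyset technique in plain JavaScript over a hypothetical in-memory array; it does not talk to MongoDB. The helpers oid and nextPage are made up for illustration, and ObjectIds are assumed to be 24-character lowercase hex strings so that plain string comparison matches their byte order.

```javascript
// Keyset ("seek") pagination sketch over an in-memory array.
// ObjectIds are modelled as 24-char lowercase hex strings; for strings of
// equal length in lowercase hex, string order matches ObjectId byte order.
const oid = (n) => "5bd" + String(n).padStart(21, "0"); // hypothetical ids

function nextPage(docs, lastId, limit) {
  return docs
    .filter((d) => lastId === null || d._id > lastId) // like {_id: {$gt: lastId}}
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)) // like .sort({_id: 1})
    .slice(0, limit); // like .limit(limit)
}

// Hypothetical documents stored in reverse _id order.
const docs = [{ _id: oid(3) }, { _id: oid(2) }, { _id: oid(1) }];

const pages = [];
let lastId = null;
for (;;) {
  const page = nextPage(docs, lastId, 2);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  lastId = page[page.length - 1]._id; // resume after the last id seen
}
console.log(pages.length); // prints 2
```

Because the order comes from the explicit sort rather than from whichever direction the planner happens to scan, the > lastId filter stays valid even if the plan changes.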

MongoDB - Index not being used when sorting and limiting on ranged query

I'm trying to get a sorted list of items using a ranged query on a collection containing bulletin-board data. The data structure of a "thread" document is:
{
"_id" : ObjectId("5a779b47f4fa72412126526a"),
"title" : "necessitatibus tincidunt libris assueverit",
"content" : "Corrumpitvenenatis cubilia adipiscing sollicitudin",
"flagged" : false,
"locked" : false,
"sticky" : false,
"lastPostAt" : ISODate("2018-02-05T06:35:24.656Z"),
"postCount" : 42,
"user" : ObjectId("5a779b46f4fa72412126525a"),
"category" : ObjectId("5a779b31f4fa724121265164"),
"createdAt" : ISODate("2018-02-04T23:46:15.852Z"),
"updatedAt" : ISODate("2018-02-05T06:35:24.656Z")
}
The query is:
db.threads.find({
category: ObjectId('5a779b31f4fa724121265142'),
_id : { $gt: ObjectId('5a779b5cf4fa724121269be8') }
}).sort({ sticky: -1, lastPostAt: -1, _id: 1 }).limit(25)
I set up the following indexes to support it:
{ category: 1, _id: 1 }
{ category: 1, _id: 1, sticky: 1, lastPostAt: 1 }
{ sticky: 1, lastPostAt: 1, _id: 1 }
In spite of this, it's still scanning hundreds of documents/keys according to execution stats:
{
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 772,
"executionTimeMillis" : 17,
"totalKeysExamined" : 772,
"totalDocsExamined" : 772,
"executionStages" : {
"stage" : "SORT",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 1547,
"advanced" : 772,
"needTime" : 774,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"memUsage" : 1482601,
"memLimit" : 33554432,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 774,
"advanced" : 772,
"needTime" : 1,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 773,
"advanced" : 772,
"needTime" : 0,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 772,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 773,
"advanced" : 772,
"needTime" : 0,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 772,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
}
When I remove the sort, it correctly scans only 25 documents. With the sort, the number of keys examined (772) stays the same no matter which fields I put in the sort.
Here is the full explain() for the sorted query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "database.threads",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
{
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1
},
"indexName" : "category_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1
},
"indexName" : "category_1__id_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
}
]
},
"serverInfo" : {
"host" : "CRF-MBP.local",
"port" : 27017,
"version" : "3.6.2",
"gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420"
},
"ok" : 1
}
And here is the full explain() for the non-sorted query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "database.threads",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
{
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
}
}
}
},
"rejectedPlans" : [
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1
},
"indexName" : "category_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
]
}
}
}
},
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1
},
"indexName" : "category_1__id_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
},
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
]
},
"serverInfo" : {
"host" : "CRF-MBP.local",
"port" : 27017,
"version" : "3.6.2",
"gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420"
},
"ok" : 1
}
Does anyone have any idea why this might not fully use an index?
The issue is that none of your indexes actually helps with the sorted query. This is the reason for the high number of scanned objects and for the presence of the SORT and SORT_KEY_GENERATOR stages (an in-memory sort, limited to 32 MB).
The non-sorted query, on the other hand, can use either the { category: 1, _id: 1 } or the { category: 1, _id: 1, sticky: 1, lastPostAt: 1 } index. Note that it's perfectly valid to use either one, since one is a prefix of the other. See Prefixes for more details.
A MongoDB find() query typically uses only one index, so a single compound index should cater for all the parameters of your query. This includes the parameters of both find() and sort().
A good writeup of how your index should be created is available in Optimizing MongoDB Compound Indexes. The main point of the article is that compound index fields should be ordered equality --> sort --> range:
Your query "shape" is:
db.collection.find({category:..., _id: {$gt:...}})
.sort({sticky:-1, lastPostAt:-1, _id:1})
.limit(25)
We see that:
category:... is equality
sticky:-1, lastPostAt:-1, _id:1 is sort
_id: {$gt:...} is range
So the compound index you need is:
{category:1, sticky:-1, lastPostAt:-1, _id:1}
With the above index, the winning plan in the explain() output of your query is:
"winningPlan": {
"stage": "LIMIT",
"limitAmount": 25,
"inputStage": {
"stage": "FETCH",
"inputStage": {
"stage": "IXSCAN",
"keyPattern": {
"category": 1,
"sticky": -1,
"lastPostAt": -1,
"_id": 1
},
"indexName": "category_1_sticky_-1_lastPostAt_-1__id_1",
"isMultiKey": false,
"multiKeyPaths": {
"category": [ ],
"sticky": [ ],
"lastPostAt": [ ],
"_id": [ ]
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"category": [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"sticky": [
"[MaxKey, MinKey]"
],
"lastPostAt": [
"[MaxKey, MinKey]"
],
"_id": [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
Note that the winning plan doesn't contain a SORT or SORT_KEY_GENERATOR stage. This means that the index fully supports the sorted query.
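The equality --> sort --> range ordering can be applied mechanically to a query shape. Below is a minimal JavaScript sketch with a hypothetical helper esrIndex (not a MongoDB API) that takes the query shape as plain objects and returns an index key pattern. It assumes that a field used both in the sort and in a range predicate (here _id) keeps its sort position and direction; it only illustrates the rule of thumb from the article.

```javascript
// Build a compound index key pattern following equality -> sort -> range.
// equality: field names matched by equality; sort: a sort document such as
// { sticky: -1 }; range: field names with range predicates ($gt, $lt, ...).
function esrIndex({ equality = [], sort = {}, range = [] }) {
  const index = {};
  for (const field of equality) index[field] = 1; // direction is irrelevant for equality
  for (const [field, dir] of Object.entries(sort)) {
    if (!(field in index)) index[field] = dir; // keep the sort's direction
  }
  for (const field of range) {
    if (!(field in index)) index[field] = 1; // a field already placed by the sort keeps that slot
  }
  return index;
}

// The query shape from the question:
// find({ category: ..., _id: { $gt: ... } }).sort({ sticky: -1, lastPostAt: -1, _id: 1 })
const key = esrIndex({
  equality: ["category"],
  sort: { sticky: -1, lastPostAt: -1, _id: 1 },
  range: ["_id"],
});
console.log(JSON.stringify(key));
// prints {"category":1,"sticky":-1,"lastPostAt":-1,"_id":1}
```

The result matches the index suggested in the answer, which could then be created in the shell with db.threads.createIndex({category: 1, sticky: -1, lastPostAt: -1, _id: 1}).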
I believe there are two problems, both having to do with your sort. They come straight from the documentation, but if you comment I'll help explain (and possibly learn something myself).
The first and biggest problem is that you must sort in the order given by the index. From the docs:
You can specify a sort on all the keys of the index or on a subset; however, the sort keys must be listed in the same order as they appear in the index. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } but not on { b: 1, a: 1 }.
This means that you must sort in the order given by the index in your winning plan: category, _id, sticky, lastPostAt (or any prefix of that order, such as category, _id, sticky or category, _id). If not, MongoDB will identify the 772 docs using the index from your winning plan, but will then have to comb through each of them to compare values and produce the desired sort order. If you want to sort in the order you are currently using, you must provide an index in that order.
The second problem is that you must sort in the direction that you provided by the index (or the inverse direction).
For a query to use a compound index for a sort, the specified sort direction for all keys in the cursor.sort() document must match the index key pattern or match the inverse of the index key pattern. For example, an index key pattern { a: 1, b: -1 } can support a sort on { a: 1, b: -1 } and { a: -1, b: 1 } but not on { a: -1, b: -1 } or { a: 1, b: 1 }.
Because your indexes are all in ascending order, you would have to sort either in ascending order on all keys or in descending order on all keys. If not, we run into the same problem: MongoDB finds all the relevant docs, but has to comb through them to produce the desired order.
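The two quoted rules (the sort keys must be a leading prefix of the index keys in the same order, and the directions must match the index or its exact inverse) can be checked mechanically. Below is a minimal sketch in plain JavaScript with a hypothetical helper indexSupportsSort; it deliberately ignores the equality-prefix exception from the documentation and any multikey restrictions, and assumes directions are written as 1 or -1.

```javascript
// Returns true if an index key pattern can provide the given sort on its
// own: the sort fields must be a leading prefix of the index fields, in
// the same order, with directions all equal to the index or all inverted.
function indexSupportsSort(indexKeys, sortSpec) {
  const indexEntries = Object.entries(indexKeys);
  const sortEntries = Object.entries(sortSpec);
  if (sortEntries.length > indexEntries.length) return false;
  let sameCount = 0;
  for (let i = 0; i < sortEntries.length; i++) {
    const [sortField, sortDir] = sortEntries[i];
    const [indexField, indexDir] = indexEntries[i];
    if (sortField !== indexField) return false; // not a leading prefix, in order
    if (sortDir === indexDir) sameCount++;
  }
  // All directions equal (forward scan) or all inverted (backward scan).
  return sameCount === sortEntries.length || sameCount === 0;
}

// Examples from the documentation quotes:
console.log(indexSupportsSort({ a: 1, b: -1 }, { a: 1, b: -1 }));  // true
console.log(indexSupportsSort({ a: 1, b: -1 }, { a: -1, b: 1 }));  // true (backward scan)
console.log(indexSupportsSort({ a: 1, b: -1 }, { a: -1, b: -1 })); // false
console.log(indexSupportsSort({ a: 1, b: 1 }, { b: 1, a: 1 }));    // false (wrong order)

// The winning index and the sort from the question:
console.log(
  indexSupportsSort(
    { category: 1, _id: 1, sticky: 1, lastPostAt: 1 },
    { sticky: -1, lastPostAt: -1, _id: 1 }
  )
); // false
```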
I believe you would get improved performance by providing an additional index:
{ sticky: -1, lastPostAt: -1, _id: 1 }
or its inverse:
{ sticky: 1, lastPostAt: 1, _id: -1 }
This would create a situation where MongoDB uses your first index
{ category: 1, _id: 1 }
to identify the candidate (unsorted) documents, then uses one of the new indexes (provided above), since they would already be sorted. The limit would then take care of giving you your 25 docs.
I'm pretty sure this would create a covered query (a query with no docs examined). Let me know how it goes, cheers!

Sorting with $in not returning all docs

I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents, which is the cursor's batchSize, and then the cursor closes (setting its cursorId to 0), as if all documents had been returned.
If I take out the sorting, then I get all 1847 documents.
So my question is, why does it silently fail when using sorting with the $in operator?
EDIT
Using explain gives the following output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
},
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
],
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
},
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
],
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
]
}
}
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
]
},
"ok" : 1
}
What's happening is that this sorted query must be performed in-memory as it's not supported by an index, and this limits the results to 32 MB. This behavior is documented here, with a JIRA about addressing this here.
Furthermore, you can't define an index to support this query, as you're sorting on a field that isn't part of the query, and neither of these cases applies:
If the sort keys correspond to the index keys or an index prefix, MongoDB can use the index to sort the query results. A prefix of a compound index is a subset that consists of one or more keys at the start of the index key pattern.
...
An index can support sort operations on a non-prefix subset of the index key pattern. To do so, the query must include equality conditions on all the prefix keys that precede the sort keys.
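The second quoted case (a non-prefix sort is allowed when every index key before the sort keys has an equality condition) can be sketched the same way. Below is a minimal JavaScript model with a hypothetical helper supportsSortWithEquality. Here "equality" means a single-value match, so an $in over several values (as on uid.$id in this question) or a range such as levelno: { $gte: 10 } does not count. It only models the two quoted rules, not everything the query planner can do.

```javascript
// Can `indexKeys` provide `sortSpec` when the query has single-value
// equality conditions on the fields listed in `equalityFields`?
// The sort fields must appear contiguously in the index, in order, with
// directions all matching or all inverted, and every index field before
// the first sort field must have an equality condition.
function supportsSortWithEquality(indexKeys, sortSpec, equalityFields) {
  const indexEntries = Object.entries(indexKeys);
  const sortEntries = Object.entries(sortSpec);
  if (sortEntries.length === 0) return true;
  const start = indexEntries.findIndex(([field]) => field === sortEntries[0][0]);
  if (start === -1 || start + sortEntries.length > indexEntries.length) return false;
  // Every key preceding the sort keys needs an equality condition.
  for (let i = 0; i < start; i++) {
    if (!equalityFields.includes(indexEntries[i][0])) return false;
  }
  let sameCount = 0;
  for (let i = 0; i < sortEntries.length; i++) {
    const [sortField, sortDir] = sortEntries[i];
    const [indexField, indexDir] = indexEntries[start + i];
    if (sortField !== indexField) return false;
    if (sortDir === indexDir) sameCount++;
  }
  return sameCount === sortEntries.length || sameCount === 0;
}

// Examples in the style of the documentation:
console.log(supportsSortWithEquality({ a: 1, b: 1, c: 1 }, { b: 1, c: 1 }, ["a"])); // true
console.log(supportsSortWithEquality({ a: 1, b: 1, c: 1 }, { c: 1 }, ["a"]));       // false: no equality on b

// The winning index from the question: uid.$id uses $in over several values
// and levelno a range, so neither counts as an equality prefix here.
console.log(supportsSortWithEquality({ "uid.$id": 1, levelno: 1, _id: 1 }, { _id: 1 }, [])); // false
```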
You should be able to work around the limitation by using the aggregation framework, which can be instructed to use temporary files for its pipeline stage outputs, if required, via the allowDiskUse: true option:
db.getCollection('logs').aggregate([
{$match: {'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}},
{$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the objsLeftInBatch() method to determine how many objects are left in the current batch, and iterate over it.
You can override the cursor's batch size and limit using cursor.batchSize(size) and cursor.limit(limit).

MongoDB optimize indexes for aggregation

I have an aggregation on a collection with about 1.6M documents. This query is a simplified version of other, more complex ones, but in my opinion it illustrates the poor choice of index.
db.getCollection('cbAlters').runCommand("aggregate", {pipeline: [
{
$match: { cre_carteraId: "31" }
},
{
$group: { _id: { ca_tramomora: "$cre_tramoMora" },
count: { $sum: 1 } }
}
]})
That query took about 5 seconds. The collection has 25 indexes configured for different queries. According to the query's explain output, the one used is:
{
"v" : 1,
"key" : {
"cre_carteraId" : 1,
"cre_periodo" : 1,
"cre_tramoMora" : 1,
"cre_inactivo" : 1
},
"name" : "cartPerTramInact",
"ns" : "basedatos.cbAlters"
},
I created an index tailored to this particular query:
{
"v" : 1,
"key" : {
"cre_carteraId" : 1,
"cre_tramoMora" : 1
},
"name" : "cartPerTramTest",
"ns" : "basedatos.cbAlters"
}
The query optimizer rejects this index and prefers the original index instead. The output of my query's explain looks like this:
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"cre_carteraId" : "31"
},
"fields" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "basedatos.cbAlters",
"indexFilterSet" : false,
"parsedQuery" : {
"cre_carteraId" : {
"$eq" : "31"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_periodo" : 1,
"cre_tramoMora" : 1,
"cre_inactivo" : 1
},
"indexName" : "cartPerTramInact",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
],
"cre_periodo" : [
"[MinKey, MaxKey]"
],
"cre_tramoMora" : [
"[MinKey, MaxKey]"
],
"cre_inactivo" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_tramoMora" : 1
},
"indexName" : "cartPerTramTest",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
],
"cre_tramoMora" : [
"[MinKey, MaxKey]"
]
}
}
}
]
}
}
},
{
"$group" : {
"_id" : {
"ca_tramomora" : "$cre_tramoMora"
},
"count" : {
"$sum" : {
"$const" : 1.0
}
}
}
}
],
"ok" : 1.0
}
So why does the optimizer prefer a less tailored index? Should indexFilterSet (results filtered by index) be true for this aggregation?
How can I improve this index, or is something wrong with the query?
I don't have much experience with MongoDB; I appreciate any help.
As long as you have the cartPerTramInact index, the optimizer won't use your cartPerTramTest index, because the first fields are the same and in the same order.
This applies to other indexes too. When there are indexes that share the same leading keys in the same order (like a.b.c.d, a.b.d, a.b) and your query uses fields a.b, the optimizer will favour a.b.c.d. In any case, you don't need index a.b, because you already have two indexes that cover a.b (a.b.c.d and a.b.d).
Index a.b.d is used only when you query with the fields a.b.d, BUT if a.b is already very selective, it's probably faster to select with index a.b.c.d using only the a.b part and then do a "full table scan" over those matches to find d.
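The prefix relationship this answer relies on can be checked mechanically: one index is a leading prefix of another when its keys match the other's first keys, in the same order and with the same directions. Below is a minimal JavaScript sketch with a hypothetical helper isIndexPrefix; it illustrates the a.b.c.d / a.b.d / a.b example and is not how the query optimizer ranks plans.

```javascript
// True if `candidate`'s key pattern equals the first keys of `other`
// (same field names, same order, same directions).
function isIndexPrefix(candidate, other) {
  const c = Object.entries(candidate);
  const o = Object.entries(other);
  if (c.length > o.length) return false;
  return c.every(([field, dir], i) => o[i][0] === field && o[i][1] === dir);
}

const abcd = { a: 1, b: 1, c: 1, d: 1 };
const abd = { a: 1, b: 1, d: 1 };
const ab = { a: 1, b: 1 };

console.log(isIndexPrefix(ab, abcd));  // true: per the answer, a.b is not needed alongside a.b.c.d
console.log(isIndexPrefix(ab, abd));   // true
console.log(isIndexPrefix(abd, abcd)); // false: d follows c in a.b.c.d

// The two indexes from the question.
const cartPerTramInact = { cre_carteraId: 1, cre_periodo: 1, cre_tramoMora: 1, cre_inactivo: 1 };
const cartPerTramTest = { cre_carteraId: 1, cre_tramoMora: 1 };
console.log(isIndexPrefix(cartPerTramTest, cartPerTramInact)); // false
```

For the indexes in the question, only cre_carteraId is shared as a leading key: cre_periodo sits between cre_carteraId and cre_tramoMora in cartPerTramInact, so cartPerTramTest is not a strict prefix of it.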
There is a hint option for aggregations that can be used to force a specific index. See:
See https://www.mongodb.com/docs/upcoming/reference/method/db.collection.aggregate/#mongodb-method-db.collection.aggregate