distinct vs find - Is there any performance difference? - mongodb

My collection contains documents in the following format:
{
"dName": "d1",
"city": "c1",
"state": "s1"
}, {
"dName": "d2",
"city": "c1",
"state": "s1"
}, {
"dName": "d2",
"city": "c1",
"state": "s2"
}
I have a compound index on all three fields: { city: 1, state: 1, dName: 1 }.
dName is unique across documents. I want to get the list of dNames for a given city and state. I found that the following two queries return the same data:
db.collection.find({city: 'c1', state: 's1'}, {dName: 1, _id: 0}); // returns [{dName: 'd1'}, {dName: 'd2'}]
db.collection.distinct('dName', {city: 'c1', state: 's1'}); // returns ['d1', 'd2']
The first one returns an array of objects and the second one returns an array of strings. Other than that, is there any performance difference between using one or the other? I think distinct is costlier, since it has to maintain the uniqueness of the response. Is that true?
Winning plans for both queries:
Find query (1):
{
...
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"dName" : 1.0,
"_id" : 0.0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"city" : 1,
"state" : 1,
"dName" : 1
},
"indexName" : "city_1_state_1_dName_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"city" : [],
"state" : [],
"dName" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"city" : [
"[\"c1\", \"c1\"]"
],
"state" : [
"[\"s1\", \"s1\"]"
],
"dName" : [
"[MinKey, MaxKey]"
]
}
}
}
...
}
Distinct query (2):
{
...
"winningPlan": {
"stage": "PROJECTION",
"transformBy": {"_id": 0, "dName": 1},
"inputStage": {
"stage": "DISTINCT_SCAN",
"keyPattern": {
"city": 1,
"state": 1,
"dName": 1
},
"indexName": "city_1_state_1_dName_1",
"isMultiKey": false,
"multiKeyPaths": {
"city": [],
"state": [],
"dName": []
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"city": [
"[\"c1\", \"c1\"]"
],
"state": [
"[\"s1\", \"s1\"]"
],
"dName": [
"[MinKey, MaxKey]"
]
}
}
}
...
}
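The difference in result shape, and the reason distinct's uniqueness need not be expensive with this index, can be illustrated with a toy model in plain JavaScript. No database is involved. This is a sketch under stated assumptions, not MongoDB's implementation. The hypothetical array `indexEntries` stands in for the { city: 1, state: 1, dName: 1 } index and is kept in key order. Because matching entries come out sorted, a DISTINCT_SCAN-style pass can enforce uniqueness by comparing each value with the previously emitted one, without keeping a hash set. The real DISTINCT_SCAN can also seek past runs of duplicate keys, which this toy does not model.

```javascript
// Toy stand-in for the compound index { city: 1, state: 1, dName: 1 }.
// Entries are [city, state, dName] tuples kept in index (sorted) order.
const indexEntries = [
  ["c1", "s1", "d1"],
  ["c1", "s1", "d2"],
  ["c1", "s2", "d2"],
];

// Like IXSCAN + PROJECTION { dName: 1, _id: 0 }: one object per matching key.
function findProjected(entries, city, state) {
  return entries
    .filter(([c, s]) => c === city && s === state)
    .map(([, , dName]) => ({ dName }));
}

// DISTINCT_SCAN-like: matches arrive sorted by dName within the (city, state)
// prefix, so a value is new exactly when it differs from the last one emitted.
function distinctScan(entries, city, state) {
  const out = [];
  for (const [c, s, dName] of entries) {
    if (c !== city || s !== state) continue;
    if (out.length === 0 || out[out.length - 1] !== dName) out.push(dName);
  }
  return out;
}

console.log(JSON.stringify(findProjected(indexEntries, "c1", "s1"))); // [{"dName":"d1"},{"dName":"d2"}]
console.log(distinctScan(indexEntries, "c1", "s1")); // [ 'd1', 'd2' ]
```

In this toy model, both functions walk the same index entries. The only extra work in `distinctScan` is one comparison per entry.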

Related

Question about how mongodb chooses indexes for $or queries

Background
I have a collection with the following schema:
{ appId, createdAt, lastSeen, lastHeard, deleted, ...otherFields }
and the following indexes:
{ appId: 1, deleted: 1, lastSeen: 1 }
{ appId: 1, deleted: 1, lastHeard: 1 }
{ appId: 1, createdAt: 1, deleted: 1 }
{ appId: 1, deleted: 1, createdAt: 1, lastSeen: 1, lastHeard: 1 }
In my application I have an aggregation:
db.getCollection('client_users').aggregate([
{
$match: {
deleted: false,
appId: 'appid',
$or: [
{ createdAt: { $gt: new Date('2020-10-19T17:00:00.000Z') } },
{ lastSeen: { $gt: new Date('2020-10-19T17:00:00.000Z') } },
{ lastHeard: { $gt: new Date('2020-10-19T17:00:00.000Z') } },
]
}
},
{
$group: {
_id: '$geoLocation.city',
count: {
$sum: 1
}
}
},
{
$sort: {
count: -1
}
}
]);
My intention was to use the first three of the above indexes for this aggregation, as I understand that the $or query is parsed into 3 separate queries. However, the explain output shows that the winning plan uses the fourth index ({ appId: 1, deleted: 1, createdAt: 1, lastSeen: 1, lastHeard: 1 }) for the last 2 clauses:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"appId" : 1,
"createdAt" : 1,
"deleted" : 1
},
"indexName" : "appId_1_createdAt_1_deleted_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"appId" : [],
"createdAt" : [],
"deleted" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"appId" : [
"[\"appid\", \"appid\"]"
],
"createdAt" : [
"(new Date(1603126800000), new Date(9223372036854775807)]"
],
"deleted" : [
"[false, false]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"appId" : 1,
"deleted" : 1,
"createdAt" : 1,
"lastSeen" : 1,
"lastHeard" : 1
},
"indexName" : "appId_1_deleted_1_createdAt_1_lastSeen_1_lastHeard_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"appId" : [],
"deleted" : [],
"createdAt" : [],
"lastSeen" : [],
"lastHeard" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"appId" : [
"[\"appid\", \"appid\"]"
],
"deleted" : [
"[false, false]"
],
"createdAt" : [
"[MinKey, MaxKey]"
],
"lastSeen" : [
"[MinKey, MaxKey]"
],
"lastHeard" : [
"(new Date(1603126800000), new Date(9223372036854775807)]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"appId" : 1,
"deleted" : 1,
"createdAt" : 1,
"lastSeen" : 1,
"lastHeard" : 1
},
"indexName" : "appId_1_deleted_1_createdAt_1_lastSeen_1_lastHeard_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"appId" : [],
"deleted" : [],
"createdAt" : [],
"lastSeen" : [],
"lastHeard" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"appId" : [
"[\"appid\", \"appid\"]"
],
"deleted" : [
"[false, false]"
],
"createdAt" : [
"[MinKey, MaxKey]"
],
"lastSeen" : [
"(new Date(1603126800000), new Date(9223372036854775807)]"
],
"lastHeard" : [
"[MinKey, MaxKey]"
]
}
}
]
}
},
This is not what I want. Stranger still, when I try it with only one of the clauses, as in this $match stage:
$match: {
deleted: false,
appId: 'appid',
lastSeen: {$gt: new Date('2020-10-19T17:00:00.000Z') },
}
It uses the correct index ({ appId: 1, deleted: 1, lastSeen: 1 }). I know this from the explain output and from timing the actual aggregation. Specifically, running it with no hint or with hint: appId_1_deleted_1_lastSeen_1 is three times faster than running it with hint: appId_1_deleted_1_createdAt_1_lastSeen_1_lastHeard_1. This leaves me very confused about how MongoDB chooses the index.
Can someone explain to me what could have been the reason for this behavior? Is there a way for me to force mongodb to use the indexes I want in this case? Thanks.
I figured it out. It was precisely because of the $or query. MongoDB chooses a query plan by letting the candidate plans race against each other for a short trial. The less efficient plan happened to win because the first $or clause took care of everything during the race (remember, with $or a document only has to match one clause). I fixed this by dropping the fourth index.
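The explanation above can be made concrete with a toy model in plain JavaScript. It is a simplification, not MongoDB's planner. An OR stage is modelled as draining its branches in order, ignoring deduplication of documents that match several clauses. Each branch has a hypothetical number of matches and a hypothetical cost in work units per result, and the trial stops once `k` results are produced. MongoDB's real trial period is also bounded, but `k`, the plan names, and all numbers here are made up. Two candidate plans that share the same first branch and differ only in the indexes used for the later branches end the trial with identical cost when the first branch alone supplies `k` results. In that case the trial gives no signal about the later branches.

```javascript
// Work units an OR stage spends to produce k results, draining branches
// in order. Each branch: { matches, worksPerResult } (hypothetical numbers).
function worksToProduce(branches, k) {
  let produced = 0;
  let works = 0;
  for (const { matches, worksPerResult } of branches) {
    const take = Math.min(matches, k - produced);
    works += take * worksPerResult;
    produced += take;
    if (produced === k) break; // trial target reached
  }
  return { produced, works };
}

// Same first branch (the createdAt clause); the later branches differ in cost.
const planSingleFieldIndexes = [
  { matches: 500, worksPerResult: 1 },
  { matches: 300, worksPerResult: 1 },
  { matches: 300, worksPerResult: 1 },
];
const planWideIndex = [
  { matches: 500, worksPerResult: 1 },
  { matches: 300, worksPerResult: 3 },
  { matches: 300, worksPerResult: 3 },
];

// A short trial never reaches the second branch, so both plans tie.
console.log(worksToProduce(planSingleFieldIndexes, 100).works); // 100
console.log(worksToProduce(planWideIndex, 100).works); // 100

// Running to completion exposes the difference.
console.log(worksToProduce(planSingleFieldIndexes, 1100).works); // 1100
console.log(worksToProduce(planWideIndex, 1100).works); // 2300
```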

Mongo $or query with ranges is doing an in-memory sort?

I'm running into an unusual situation where one query does an in-memory sort. Query 1 is the one that does the in-memory sort, while Query 2 correctly does a merge sort.
The query has a few parts, so I want to know which part is causing the sort to be done in memory.
I do have a workaround, but I would like to know the reason behind this. Both plans have 2 input stages, so I'm not sure what the cause is.
Schema:
schema = {
date: Date, // date that can change
createTime: Date, // create time of document
value: Number
}
Index:
schema.index({value: 1, createTime: -1, date: 1});
Query 1: I have the $or at the top level to avoid using an incorrect index (see MongoDB query to slow when using $or operator):
db.getCollection('dates').find({
$or: [
{value: {$in: [1, 2]}, date: null},
{value: {$in: [1, 2]}, date: {$gt: ISODate("2020-06-16T23:59:59.999Z")}}
]
}).sort({createTime:-1}).explain()
Query 1 plan: As you can see, it does an in-memory sort. I'm not sure exactly why this is happening.
{
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SORT",
"sortPattern" : {
"createTime" : -1.0
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "FETCH",
"filter" : {
"date" : {
"$eq" : null
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]",
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[undefined, undefined]",
"[null, null]"
]
}
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]",
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"(new Date(1592351999999), new Date(9223372036854775807)]"
]
}
}
]
}
}
}
}
}
Query 2:
db.getCollection('dates').find({
value: {$in: [1, 2]},
date: {$not: {$lte: ISODate("2020-06-16T23:59:59.999Z")}}
}).sort({createTime:-1}).explain()
Query 2 plan: The workaround query I used, which does a merge sort successfully.
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "SORT_MERGE",
"sortPattern" : {
"createTime" : -1.0
},
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[1.0, 1.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[MinKey, true]",
"(new Date(1592351999999), MaxKey]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"value" : 1,
"createTime" : -1,
"date" : 1
},
"indexName" : "value_1_createTime_-1_date_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"value" : [],
"createTime" : [],
"date" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"value" : [
"[2.0, 2.0]"
],
"createTime" : [
"[MaxKey, MinKey]"
],
"date" : [
"[MinKey, true]",
"(new Date(1592351999999), MaxKey]"
]
}
}
]
}
}
Each branch of the $or can use an index, but you still end up with two result sets, and if you apply a sort on top of them, the database has to sort the results in memory. It seems reasonable that a sort over an $or operator produces an in-memory sort.
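The difference between the two plans can be sketched in plain JavaScript. This is a toy model, not MongoDB code, and it uses hypothetical `{ value, createTime }` entries. In the { value: 1, createTime: -1, date: 1 } index, a scan over a single `value` point (as each IXSCAN in Query 2 does) already yields entries in createTime-descending order. Several such streams can therefore be merged into one sorted stream, which is the idea behind SORT_MERGE. A single scan over both `value` points at once (as in each Query 1 branch) yields createTime order only within each `value`, so the combined output needs a separate sort.

```javascript
// Hypothetical index entries for { value: 1, createTime: -1 }, in index order.
const valueOne = [
  { value: 1, createTime: 9 },
  { value: 1, createTime: 4 },
];
const valueTwo = [
  { value: 2, createTime: 7 },
  { value: 2, createTime: 1 },
];

// One scan over value in [1, 1] and [2, 2]: sorted per value, not overall.
const singleScan = [...valueOne, ...valueTwo];

// k-way merge of streams that are each already sorted by `key` descending.
function mergeSortedDesc(streams, key) {
  const pos = streams.map(() => 0);
  const out = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (pos[i] >= streams[i].length) continue; // stream i is exhausted
      if (best === -1 || streams[i][pos[i]][key] > streams[best][pos[best]][key]) {
        best = i;
      }
    }
    if (best === -1) return out; // every stream is exhausted
    out.push(streams[best][pos[best]]);
    pos[best] += 1;
  }
}

console.log(singleScan.map((e) => e.createTime)); // [ 9, 4, 7, 1 ]
console.log(mergeSortedDesc([valueOne, valueTwo], "createTime").map((e) => e.createTime)); // [ 9, 7, 4, 1 ]
```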

Mongodb performance issue when sorting

I'm trying to switch from MySQL to MongoDB, but I have an issue with sorting performance.
I have millions of documents like the following:
{
"_id": ObjectId("5af7cbda7500fc509c3098ce"),
"name": "Task Name",
"category": "performance",
"subIssues": [
{
"taskId": 10,
"description": "Task description",
"createdAt": "2018-05-11 14:37:07.000Z"
},
{
"taskId": 11,
"description": "Task description",
"createdAt": "2018-05-11 14:37:07.000Z"
},
{
"taskId": 12,
"description": "Task description",
"createdAt": "2018-05-11 14:37:07.000Z"
}
]
}
I want to sort by "subIssues.taskId"; the query is ".tasks.find({"name": "performance"}).limit(10).sort({"subIssues.taskId": -1})". But this query is too slow. When I tried sorting by another field ("name"), it was very fast, but sorting by the sub-array field was not.
Collection indexes:
[
{
"_id" : 1
},
{
"name" : 1.0
},
{
"subIssues.taskId" : 1.0
}
]
Explain output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "testing.tasks",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "performance"
}
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"subIssues.taskId" : -1.0
},
"limitAmount" : 10,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1.0
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"performance\", \"performance\"]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"subIssues.taskId" : -1.0
},
"limitAmount" : 10,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"name" : {
"$eq" : "performance"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"subIssues.taskId" : 1.0
},
"indexName" : "subIssues.taskId_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"subIssues.taskId" : [
"subIssues"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"subIssues.taskId" : [
"[MaxKey, MinKey]"
]
}
}
}
}
}
]
},
"serverInfo" : {
"host" : "5113848ca8f8",
"port" : 27017,
"version" : "3.6.4"
},
"ok" : 1.0
}
I use MongoDB 3.6 on CentOS 7, with 16 cores and 32 GB of RAM.
What are your suggestions?
Try indexing the field you want to sort on:
db.tasks.createIndex( { "subIssues.taskId": 1 } )
But I don't think this will work, so change your data structure :)
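A toy model in plain JavaScript can show why the sort on "subIssues.taskId" is costly. It uses hypothetical documents and is not MongoDB code. Under MongoDB's documented comparison rules, a descending sort on an array field compares each document by the largest element of the array, and an ascending sort uses the smallest. So every candidate document's whole array has to be examined to compute its sort key. This is consistent with the SORT_KEY_GENERATOR stage feeding an in-memory SORT in the explain output above. The toy assumes every document has a non-empty subIssues array.

```javascript
// Sort key for a descending sort on "subIssues.taskId":
// the largest taskId in the document's subIssues array.
function descSortKey(doc) {
  return Math.max(...doc.subIssues.map((s) => s.taskId));
}

// Hypothetical documents, shaped like the ones in the question.
const docs = [
  { name: "a", subIssues: [{ taskId: 10 }, { taskId: 11 }, { taskId: 12 }] },
  { name: "b", subIssues: [{ taskId: 5 }, { taskId: 20 }] },
  { name: "c", subIssues: [{ taskId: 7 }] },
];

// Mirrors .sort({ "subIssues.taskId": -1 }).limit(2) for these toy docs:
// compute every key, sort everything, then keep the first two.
const top = docs
  .map((doc) => ({ doc, key: descSortKey(doc) }))
  .sort((x, y) => y.key - x.key)
  .slice(0, 2)
  .map((x) => x.doc.name);

console.log(top); // [ 'b', 'a' ]
```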

MongoDB - Index not being used when sorting and limiting on ranged query

I'm trying to get a sorted list of items using a ranged query on a collection containing bulletin-board data. The data structure of a "thread" document is:
{
"_id" : ObjectId("5a779b47f4fa72412126526a"),
"title" : "necessitatibus tincidunt libris assueverit",
"content" : "Corrumpitvenenatis cubilia adipiscing sollicitudin",
"flagged" : false,
"locked" : false,
"sticky" : false,
"lastPostAt" : ISODate("2018-02-05T06:35:24.656Z"),
"postCount" : 42,
"user" : ObjectId("5a779b46f4fa72412126525a"),
"category" : ObjectId("5a779b31f4fa724121265164"),
"createdAt" : ISODate("2018-02-04T23:46:15.852Z"),
"updatedAt" : ISODate("2018-02-05T06:35:24.656Z")
}
The query is:
db.threads.find({
category: ObjectId('5a779b31f4fa724121265142'),
_id : { $gt: ObjectId('5a779b5cf4fa724121269be8') }
}).sort({ sticky: -1, lastPostAt: -1, _id: 1 }).limit(25)
I set up the following indexes to support it:
{ category: 1, _id: 1 }
{ category: 1, _id: 1, sticky: 1, lastPostAt: 1 }
{ sticky: 1, lastPostAt: 1, _id: 1 }
In spite of this, it's still scanning hundreds of documents/keys according to execution stats:
{
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 772,
"executionTimeMillis" : 17,
"totalKeysExamined" : 772,
"totalDocsExamined" : 772,
"executionStages" : {
"stage" : "SORT",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 1547,
"advanced" : 772,
"needTime" : 774,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"memUsage" : 1482601,
"memLimit" : 33554432,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 774,
"advanced" : 772,
"needTime" : 1,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 773,
"advanced" : 772,
"needTime" : 0,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 772,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 772,
"executionTimeMillisEstimate" : 0,
"works" : 773,
"advanced" : 772,
"needTime" : 0,
"needYield" : 0,
"saveState" : 33,
"restoreState" : 33,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 772,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
}
When I take out the sort stage, it correctly scans only 25 documents. And the number of keys examined (772) stays the same no matter which fields I put in the sort.
Here is the full explain() for the sorted query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "database.threads",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
{
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1
},
"indexName" : "category_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1
},
"indexName" : "category_1__id_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"sticky" : -1,
"lastPostAt" : -1,
"_id" : 1
},
"limitAmount" : 25,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
}
]
},
"serverInfo" : {
"host" : "CRF-MBP.local",
"port" : 27017,
"version" : "3.6.2",
"gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420"
},
"ok" : 1
}
And here is the full explain() for the non-sorted query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "database.threads",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
{
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1,
"sticky" : 1,
"lastPostAt" : 1
},
"indexName" : "category_1__id_1_sticky_1_lastPostAt_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ],
"sticky" : [ ],
"lastPostAt" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
],
"sticky" : [
"[MinKey, MaxKey]"
],
"lastPostAt" : [
"[MinKey, MaxKey]"
]
}
}
}
},
"rejectedPlans" : [
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"_id" : {
"$gt" : ObjectId("5a779b5cf4fa724121269be8")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1
},
"indexName" : "category_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
]
}
}
}
},
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"category" : 1,
"_id" : 1
},
"indexName" : "category_1__id_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"category" : [ ],
"_id" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"category" : [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
},
{
"stage" : "LIMIT",
"limitAmount" : 25,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"category" : {
"$eq" : ObjectId("5a779b31f4fa724121265142")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
]
},
"serverInfo" : {
"host" : "CRF-MBP.local",
"port" : 27017,
"version" : "3.6.2",
"gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420"
},
"ok" : 1
}
Does anyone have any idea why this might not fully use an index?
The issue is that none of your indexes actually helps with the sorted query. This is the reason for the high number of scanned objects and the presence of the SORT_KEY_GENERATOR stage (in-memory sort, limited to 32MB).
The non-sorted query, on the other hand, can use either the { category: 1, _id: 1 } or the { category: 1, _id: 1, sticky: 1, lastPostAt: 1 } index. Either one is perfectly valid, since the first is a prefix of the second. See Prefixes for more details.
MongoDB find() queries typically use only one index, so a single compound index should cater for all the parameters of your query, including both the find() and the sort() parameters.
A good write-up of how your index should be created is available in Optimizing MongoDB Compound Indexes. The main point of the article is that the fields of a compound index should be ordered equality --> sort --> range:
Your query "shape" is:
db.collection.find({category:..., _id: {$gt:...}})
.sort({sticky:-1, lastPostAt:-1, _id:1})
.limit(25)
We see that:
category:... is equality
sticky:-1, lastPostAt:-1, _id:1 is sort
_id: {$gt:...} is range
So the compound index you need is:
{category:1, sticky:-1, lastPostAt:-1, _id:1}
With the above index, the winning plan in the explain() output of your query is:
"winningPlan": {
"stage": "LIMIT",
"limitAmount": 25,
"inputStage": {
"stage": "FETCH",
"inputStage": {
"stage": "IXSCAN",
"keyPattern": {
"category": 1,
"sticky": -1,
"lastPostAt": -1,
"_id": 1
},
"indexName": "category_1_sticky_-1_lastPostAt_-1__id_1",
"isMultiKey": false,
"multiKeyPaths": {
"category": [ ],
"sticky": [ ],
"lastPostAt": [ ],
"_id": [ ]
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"category": [
"[ObjectId('5a779b31f4fa724121265142'), ObjectId('5a779b31f4fa724121265142')]"
],
"sticky": [
"[MaxKey, MinKey]"
],
"lastPostAt": [
"[MaxKey, MinKey]"
],
"_id": [
"(ObjectId('5a779b5cf4fa724121269be8'), ObjectId('ffffffffffffffffffffffff')]"
]
}
}
}
}
Note that the winning plan doesn't contain a SORT_KEY_GENERATOR stage. This means that the index can be fully utilized to respond to the sorted query.
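The equality --> sort --> range ordering described above can be written as a small helper. `esrKeyPattern` is a hypothetical function in plain JavaScript and is not part of any MongoDB driver. It only assembles a key pattern: equality fields first (direction 1, which does not matter for exact matches), then the sort fields with their sort directions, then any range fields not already placed by the sort.

```javascript
// Hypothetical helper: assemble a compound key pattern in
// equality -> sort -> range order. Non-numeric string keys keep insertion
// order in JavaScript objects, so the pattern's key order is preserved.
function esrKeyPattern({ equality = [], sort = {}, range = [] }) {
  const pattern = {};
  const has = (field) => Object.prototype.hasOwnProperty.call(pattern, field);
  // Equality fields first; direction is irrelevant for exact matches.
  for (const field of equality) pattern[field] = 1;
  // Then the sort fields, keeping the directions used in sort().
  for (const [field, dir] of Object.entries(sort)) {
    if (!has(field)) pattern[field] = dir;
  }
  // Range fields last, unless the sort already placed them.
  for (const field of range) {
    if (!has(field)) pattern[field] = 1;
  }
  return pattern;
}

const pattern = esrKeyPattern({
  equality: ["category"],
  sort: { sticky: -1, lastPostAt: -1, _id: 1 },
  range: ["_id"], // already part of the sort, so it is not added again
});

console.log(JSON.stringify(pattern)); // {"category":1,"sticky":-1,"lastPostAt":-1,"_id":1}
```

The resulting object matches the index recommended above and could be passed to db.threads.createIndex() in the mongo shell.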
I believe there are 2 problems, both having to do with your sort. These problems come straight from the documentation, but if you comment I'll help explain (and possibly learn something myself).
The first and biggest problem is that you must sort in the order given by the index. From the docs:
You can specify a sort on all the keys of the index or on a subset;
however, the sort keys must be listed in the same order as they appear
in the index. For example, an index key pattern { a: 1, b: 1 } can
support a sort on { a: 1, b: 1 } but not on { b: 1, a: 1 }.
This means that you must sort in the order given by the index in your winning plan: category, _id, sticky, lastPostAt (or any prefix of that order, such as category, _id, sticky or category, _id). If not, MongoDB will identify the 772 docs using the index from your winning plan, but will then have to comb through each of them to assess the values and provide the desired sort order. If you want to sort in the order you are currently using, you must provide an index in that order.
The second problem is that you must sort in the direction given by the index (or the exact inverse of it).
For a query to use a compound index for a sort, the specified sort
direction for all keys in the cursor.sort() document must match the
index key pattern or match the inverse of the index key pattern. For
example, an index key pattern { a: 1, b: -1 } can support a sort on {
a: 1, b: -1 } and { a: -1, b: 1 } but not on { a: -1, b: -1 } or {a:
1, b: 1}.
Because your indexes are all in ascending order, you would have to sort either ascending on all keys or descending on all keys. If not, we run into the same problem: Mongo finds all the relevant docs, but has to comb through them to provide the desired order.
I believe you would get improved performance by providing an additional index of:
{ sticky: -1, lastPostAt: -1, _id: 1 }
or its inverse:
{ sticky: 1, lastPostAt: 1, _id: -1 }
This would create a situation where mongo uses your first index
{ category: 1, _id: 1 }
to identify potential unsorted documents, and then uses one of the new indexes (provided above), since they would already be sorted. The limit would then take care of giving you your 25 docs.
I'm pretty sure this would create a covered query (a query with no docs examined). Let me know how it goes, cheers!
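The documentation rules quoted above (sort keys in index order, with directions either matching the index or exactly inverted) can be combined with the documented exception that leading index keys matched by equality may be skipped. The result can be checked with a small toy function. `indexSupportsSort` is a hypothetical helper in plain JavaScript, not MongoDB's planner logic. It ignores multikey indexes, collation, and sorts on equality-matched fields. It only answers whether one index can supply the requested order; it does not model how the planner chooses between indexes.

```javascript
// Toy check: can an index with `keyPattern` return documents already in
// `sortSpec` order (no in-memory SORT stage)? `equalityFields` lists fields the
// query matches by equality; such leading index keys may be skipped.
function indexSupportsSort(keyPattern, sortSpec, equalityFields = []) {
  const sortKeys = Object.entries(sortSpec);
  let next = 0; // position of the next sort key that must be matched
  let sign = 0; // +1: same directions as the index, -1: all inverted
  for (const [field, dir] of Object.entries(keyPattern)) {
    if (next === sortKeys.length) break; // every sort key already matched
    const [sortField, sortDir] = sortKeys[next];
    if (field === sortField) {
      const rel = Math.sign(sortDir) === Math.sign(dir) ? 1 : -1;
      if (sign === 0) sign = rel;
      else if (sign !== rel) return false; // mixed directions
      next += 1;
    } else if (!equalityFields.includes(field)) {
      return false; // a non-equality key precedes the next sort key
    }
  }
  return next === sortKeys.length;
}

const sort = { sticky: -1, lastPostAt: -1, _id: 1 };

console.log(indexSupportsSort({ category: 1, _id: 1, sticky: 1, lastPostAt: 1 }, sort, ["category"])); // false
console.log(indexSupportsSort({ category: 1, sticky: -1, lastPostAt: -1, _id: 1 }, sort, ["category"])); // true
console.log(indexSupportsSort({ sticky: -1, lastPostAt: -1, _id: 1 }, sort)); // true
console.log(indexSupportsSort({ sticky: 1, lastPostAt: 1, _id: -1 }, sort)); // true (exact inverse)
console.log(indexSupportsSort({ sticky: 1, lastPostAt: 1, _id: 1 }, sort)); // false (mixed directions)
```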

MongoDB indexes not working when executing $elemMatch

I am executing a query using $elemMatch, and it seems like it is not using the index I added for it.
Here is my document:
{
"_id" : "123466",
"something" : [
{
"someID" : ObjectId("5701b4c3c6b126083332e66f"),
"tags":
[
{
"tagKey": "ErrorCode",
"tagValue": "7001"
},
{
"tagKey": "ErrorDescription",
"tagValue": "nullPointer"
}
],
"removeOnDelivery" : true,
"entryTime" : ISODate("2016-04-04T00:26:43.167Z")
}
]
}
Here are the indexes I am using (I intended to use only the first index, but I added the others to investigate why none of them are working).
db.test.createIndex( { "something.tags:" : 1 }, { sparse : true, background : true } )
db.test.createIndex( { "something.tags.tagKey:" : 1 }, { sparse : true, background : true } )
db.test.createIndex( { "something.tags.tagValue:" : 1 }, { sparse : true, background : true } )
db.test.createIndex( { "something.tags.tagKey:" : 1, "something.tags.tagValue:" : 1 }, { sparse : true, background : true } )
Here is my query and response:
db.test.find({"something.tags": { $elemMatch: { "tagKey" : "ErrorCode", "tagValue" : "7001" } } } ).explain()
{
"cursor": "BasicCursor",
"isMultiKey": false,
"n": 2,
"nscannedObjects": 2,
"nscanned": 2,
"nscannedObjectsAllPlans": 2,
"nscannedAllPlans": 2,
"scanAndOrder": false,
"indexOnly": false,
"nYields": 0,
"nChunkSkips": 0,
"millis": 0,
"server": "some_server",
"filterSet": false,
"stats": {
"type": "COLLSCAN",
"works": 4,
"yields": 0,
"unyields": 0,
"invalidates": 0,
"advanced": 2,
"needTime": 1,
"needFetch": 0,
"isEOF": 1,
"docsTested": 2,
"children": []
}
}
I don't know if this was a typo, but your createIndex calls have a : at the end of each field name. Just correcting that may get you the results you want.
However, the winning plan will not necessarily use an index. If a COLLSCAN is cheaper, which may be the case for collections with few documents, Mongo may choose COLLSCAN.
If you want to force index usage, you can use .hint("index_name").
I tried it with the proper field name (without the :) and the query used the index. Your results may differ depending on the collection statistics and server version, as @Neil Lunn mentioned in the comments.
db.test.createIndex( { "something.tags.tagKey" : 1 }, { sparse : true, background : true } )
And Explain results,
db.test.find({"something.tags": { $elemMatch: { "tagKey" : "ErrorCode"} } } ).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test_db.test",
"indexFilterSet" : false,
"parsedQuery" : {
"something.tags" : {
"$elemMatch" : {
"tagKey" : {
"$eq" : "ErrorCode"
}
}
}
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"something.tags" : {
"$elemMatch" : {
"tagKey" : {
"$eq" : "ErrorCode"
}
}
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"something.tags.tagKey" : 1
},
"indexName" : "something.tags.tagKey_1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"something.tags.tagKey" : [
"[\"ErrorCode\", \"ErrorCode\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"ok" : 1
}
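A simplified toy check in plain JavaScript shows why the trailing colon matters. The helpers `elemMatchPaths` and `indexTouchesQuery` are hypothetical and are not MongoDB code. A key such as "something.tags.tagKey:" is a different field path from "something.tags.tagKey". The $elemMatch query on "something.tags" constrains only "something.tags.tagKey" and "something.tags.tagValue", so an index whose keys all end in ":" shares no field with the query and cannot provide index bounds for it.

```javascript
// Field paths constrained by { [arrayPath]: { $elemMatch: elemMatch } }.
function elemMatchPaths(arrayPath, elemMatch) {
  return Object.keys(elemMatch).map((k) => `${arrayPath}.${k}`);
}

// True if at least one index key is a path the query constrains.
function indexTouchesQuery(keyPattern, queryPaths) {
  return Object.keys(keyPattern).some((k) => queryPaths.includes(k));
}

const paths = elemMatchPaths("something.tags", { tagKey: "ErrorCode", tagValue: "7001" });

console.log(JSON.stringify(paths)); // ["something.tags.tagKey","something.tags.tagValue"]
console.log(indexTouchesQuery({ "something.tags.tagKey:": 1 }, paths)); // false
console.log(indexTouchesQuery({ "something.tags.tagKey": 1 }, paths)); // true
```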