Context
(See the edit dated 18/03 below.)
I'm running some queries that apply a sort and are supposed to take advantage of indexes. On one environment, certain aggregate queries bypass the index for some indexes, and I haven't yet been able to figure out why.
Setup
I have a mongo collection that contains 3 indexes:
_id
definition.name
definition.financial.profitability.highlights.prof_net
I'm using 2 kinds of queries to test usage of each index:
A. Find and sort query
db.getCollection('properties_po').find().sort({"definition.financial.profitability.highlights.prof_net" : 1.0 }).limit(1);
db.getCollection('properties_po').find().sort({"definition.name" : 1.0}).limit(1);
B. Aggregate with sort query
db.getCollection("properties_po").aggregate([{"$sort":{"definition.name":1.0}},{"$limit":1}])
db.getCollection("properties_po").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
I use db.properties_po.aggregate( [ { $indexStats: { } } ] ) to check whether index usage has been incremented after each query.
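A rough sketch of how the before/after comparison of $indexStats could be scripted. The helper name diffIndexUsage and the sample numbers are hypothetical; it relies only on the name and accesses.ops fields of the $indexStats output and assumes ops can be converted with Number().

```javascript
// Hypothetical helper: compare two $indexStats snapshots and report how many
// times each index was used between them.
function diffIndexUsage(before, after) {
  const baseline = new Map(before.map((s) => [s.name, Number(s.accesses.ops)]));
  const delta = {};
  for (const stat of after) {
    const previous = baseline.has(stat.name) ? baseline.get(stat.name) : 0;
    delta[stat.name] = Number(stat.accesses.ops) - previous;
  }
  return delta;
}

// In the mongo shell the snapshots would come from something like:
//   const before = db.properties_po.aggregate([{ $indexStats: {} }]).toArray();
//   ... run the query under test ...
//   const after = db.properties_po.aggregate([{ $indexStats: {} }]).toArray();
// The snapshots below are made-up sample data.
const before = [
  { name: "_id_", accesses: { ops: 10 } },
  { name: "definition.name_1", accesses: { ops: 4 } },
];
const after = [
  { name: "_id_", accesses: { ops: 10 } },
  { name: "definition.name_1", accesses: { ops: 5 } },
];
console.log(diffIndexUsage(before, after));
```

An index whose delta stays at 0 after the query was not used by it.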
Results
Environment 1 (local mongodb 3.4):
Both queries work as expected on both indexes, incrementing each index's usage count after each query.
Environment 2 (prod mongodb 3.6):
Queries A work as expected: index usage is incremented after each query, for each index.
Queries B work for the index "definition.name", where index usage is incremented properly.
Queries B don't work for the index "definition.financial.profitability.highlights.prof_net": the index is not used (very slow query), as reflected in the index usage not being incremented.
I'm not sure where to look: it might be some mongodb configuration that I'm missing, some limitation on nested document attributes, or maybe my index was not created properly.
Thanks for your help
EDIT 18/03
It is not related to mongodb version as I've just tested creating the same collection on the prod server and I have 2 different results there too.
Collection 1: 15026 records
Output of the aggregate query with explain:
db.getCollection("properties").explain("executionStats").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
-
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "collec.properties",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 15026.0,
"executionTimeMillis" : 1522.0,
"totalKeysExamined" : 0.0,
"totalDocsExamined" : 15026.0,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 15026.0,
"executionTimeMillisEstimate" : 1506.0,
"works" : 15028.0,
"advanced" : 15026.0,
"needTime" : 1.0,
"needYield" : 0.0,
"saveState" : 128.0,
"restoreState" : 128.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"direction" : "forward",
"docsExamined" : 15026.0
}
}
}
},
{
"$sort" : {
"sortKey" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"limit" : NumberLong(1)
}
}
],
"ok" : 1.0
}
Collection 2: 2 records
Same query output:
db.getCollection("properties_pl").explain("executionStats").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
-
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"sort" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"limit" : NumberLong(1),
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "patrimmoine.properties_pl",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"indexName" : "definition.financial.profitability.highlights.prof_net_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"definition.financial.profitability.highlights.prof_net" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2.0,
"direction" : "forward",
"indexBounds" : {
"definition.financial.profitability.highlights.prof_net" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2.0,
"executionTimeMillis" : 0.0,
"totalKeysExamined" : 2.0,
"totalDocsExamined" : 2.0,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 2.0,
"executionTimeMillisEstimate" : 0.0,
"works" : 3.0,
"advanced" : 2.0,
"needTime" : 0.0,
"needYield" : 0.0,
"saveState" : 1.0,
"restoreState" : 1.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"docsExamined" : 2.0,
"alreadyHasObj" : 0.0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 2.0,
"executionTimeMillisEstimate" : 0.0,
"works" : 3.0,
"advanced" : 2.0,
"needTime" : 0.0,
"needYield" : 0.0,
"saveState" : 1.0,
"restoreState" : 1.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"keyPattern" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"indexName" : "definition.financial.profitability.highlights.prof_net_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"definition.financial.profitability.highlights.prof_net" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2.0,
"direction" : "forward",
"indexBounds" : {
"definition.financial.profitability.highlights.prof_net" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 2.0,
"seeks" : 1.0,
"dupsTested" : 0.0,
"dupsDropped" : 0.0,
"seenInvalidated" : 0.0
}
}
}
}
}
],
"ok" : 1.0
}
So one query is indeed not using any index and doing a "COLLSCAN", whereas the other is doing a "FETCH" with an "IXSCAN".
So my question is now: how could the content of a collection change the behavior of a query?
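To compare plans like the two above without reading the whole explain output, the winning plan can be flattened into a list of stage names. This is only a sketch: collectStages and usesIndex are hypothetical helpers, and the two sample plans are trimmed copies of the outputs shown above.

```javascript
// Hypothetical helper: walk a winningPlan tree and list its stage names.
// Child stages appear under "inputStage" (single) or "inputStages" (array).
function collectStages(plan) {
  if (!plan) return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// True when the plan reads from an index and never scans the collection.
function usesIndex(plan) {
  const stages = collectStages(plan);
  return stages.includes("IXSCAN") && !stages.includes("COLLSCAN");
}

// For the aggregate explain shown above, the plan sits at
// result.stages[0].$cursor.queryPlanner.winningPlan.
const planCollection1 = { stage: "COLLSCAN", direction: "forward" };
const planCollection2 = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", direction: "forward" },
};

console.log(collectStages(planCollection1), usesIndex(planCollection1));
console.log(collectStages(planCollection2), usesIndex(planCollection2));
```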
Related
I'm using mongo 4.0.12 and I'm trying to tune my most executed query:
db.getCollection('ServiceInvoice').find(
{
"Provider.ParentId": "60f9d7631b1f243eb82903ee",
"Provider._id": "60f9d803fa27e34fdc4ec159",
"Environment": 1,
"Status": 2,
"IssuedOn":
{
"$gte": { DateTime: new Date("2022-02-01T00:00:00Z") },
"$lte": { DateTime: new Date("2022-02-01T23:59:59Z") }
}
}).limit(50).skip(1050).sort({ "IssueOn.DateTime": -1 })
using an index like:
{
"Environment" : 1.0,
"Provider.ParentId" : 1.0,
"Provider._id" : 1.0,
"Status" : 1.0,
"IssuedOn" : 1.0,
"IssuedOn.DateTime" : -1.0
}
and gives me this explain:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.ServiceInvoice",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"Environment" : {
"$eq" : 1.0
}
},
{
"Provider.ParentId" : {
"$eq" : "60f9d7631b1f243eb82903ee"
}
},
{
"Provider._id" : {
"$eq" : "60f9d803fa27e34fdc4ec159"
}
},
{
"Status" : {
"$eq" : 2.0
}
},
{
"IssuedOn" : {
"$lte" : {
"DateTime" : ISODate("2022-02-01T23:59:59.000Z")
}
}
},
{
"IssuedOn" : {
"$gte" : {
"DateTime" : ISODate("2022-02-01T00:00:00.000Z")
}
}
}
]
},
"winningPlan" : {
"stage" : "SKIP",
"skipAmount" : 0,
"inputStage" : {
"stage" : "SORT",
"sortPattern" : {
"IssueOn.DateTime" : -1.0
},
"limitAmount" : 1100,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"Environment" : 1.0,
"Provider.ParentId" : 1.0,
"Provider._id" : 1.0,
"Status" : 1.0,
"IssuedOn" : 1.0,
"IssuedOn.DateTime" : -1.0
},
"indexName" : "Environment_1_Provider.ParentId_1_Provider._id_1_Status_1_IssueOn_1_IssueOn.DateTime_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"Environment" : [],
"Provider.ParentId" : [],
"Provider._id" : [],
"Status" : [],
"IssuedOn" : [],
"IssuedOn.DateTime" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"Environment" : [
"[1.0, 1.0]"
],
"Provider.ParentId" : [
"[\"60f9d7631b1f243eb82903ee\", \"60f9d7631b1f243eb82903ee\"]"
],
"Provider._id" : [
"[\"60f9d803fa27e34fdc4ec159\", \"60f9d803fa27e34fdc4ec159\"]"
],
"Status" : [
"[2.0, 2.0]"
],
"IssuedOn" : [
"[{ DateTime: new Date(1643673600000) }, { DateTime: new Date(1643759999000) }]"
],
"IssuedOn.DateTime" : [
"[MaxKey, MinKey]"
]
}
}
}
}
}
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 99,
"totalKeysExamined" : 31622,
"totalDocsExamined" : 31622,
"executionStages" : {
"stage" : "SKIP",
"nReturned" : 50,
"executionTimeMillisEstimate" : 6,
"works" : 32725,
"advanced" : 50,
"needTime" : 32674,
"needYield" : 0,
"saveState" : 255,
"restoreState" : 255,
"isEOF" : 1,
"invalidates" : 0,
"skipAmount" : 0,
"inputStage" : {
"stage" : "SORT",
"nReturned" : 1100,
"executionTimeMillisEstimate" : 6,
"works" : 32725,
"advanced" : 1100,
"needTime" : 31624,
"needYield" : 0,
"saveState" : 255,
"restoreState" : 255,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"IssueOn.DateTime" : -1.0
},
"memUsage" : 3057213,
"memLimit" : 33554432,
"limitAmount" : 1100,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 31622,
"executionTimeMillisEstimate" : 4,
"works" : 31624,
"advanced" : 31622,
"needTime" : 1,
"needYield" : 0,
"saveState" : 255,
"restoreState" : 255,
"isEOF" : 1,
"invalidates" : 0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 31622,
"executionTimeMillisEstimate" : 3,
"works" : 31623,
"advanced" : 31622,
"needTime" : 0,
"needYield" : 0,
"saveState" : 255,
"restoreState" : 255,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 31622,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 31622,
"executionTimeMillisEstimate" : 1,
"works" : 31623,
"advanced" : 31622,
"needTime" : 0,
"needYield" : 0,
"saveState" : 255,
"restoreState" : 255,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"Environment" : 1.0,
"Provider.ParentId" : 1.0,
"Provider._id" : 1.0,
"Status" : 1.0,
"IssuedOn" : 1.0,
"IssuedOn.DateTime" : -1.0
},
"indexName" : "Environment_1_Provider.ParentId_1_Provider._id_1_Status_1_IssueOn_1_IssueOn.DateTime_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"Environment" : [],
"Provider.ParentId" : [],
"Provider._id" : [],
"Status" : [],
"IssuedOn" : [],
"IssuedOn.DateTime" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"Environment" : [
"[1.0, 1.0]"
],
"Provider.ParentId" : [
"[\"60f9d7631b1f243eb82903ee\", \"60f9d7631b1f243eb82903ee\"]"
],
"Provider._id" : [
"[\"60f9d803fa27e34fdc4ec159\", \"60f9d803fa27e34fdc4ec159\"]"
],
"Status" : [
"[2.0, 2.0]"
],
"IssuedOn" : [
"[{ DateTime: new Date(1643673600000) }, { DateTime: new Date(1643759999000) }]"
],
"IssuedOn.DateTime" : [
"[MaxKey, MinKey]"
]
},
"keysExamined" : 31622,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
},
"allPlansExecution" : []
},
"serverInfo" : {
"host" : "d4ef6b3e9c6c",
"port" : 27017,
"version" : "4.0.12",
"gitVersion" : "5776e3cbf9e7afe86e6b29e22520ffb6766e95d4"
},
"ok" : 1.0
}
However, dbKoda keeps telling me that I must create an index for sorting.
I've already tried creating a separate index on IssuedOn.DateTime, but it keeps recommending the creation and I don't see any effect.
How can I solve this problem? (Changes to the document fields are not an option).
According to these threads (MongoDB - Index not being used when sorting and limiting on ranged query, and https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/), a compound index should be created following this order:
Equality Tests: Add all equality-tested fields to the compound index, in any order;
Sort Fields (ascending / descending only matters if there are multiple sort fields): Add sort fields to the index in the same order and direction as your query's sort;
Range Filters: First, add the range filter for the field with the lowest cardinality (fewest distinct values in the collection). Then the next lowest-cardinality range filter, and so on to the highest-cardinality.
So, the solution was creating an index like this:
{
"Environment" : 1.0,
"Provider.ParentId" : 1.0,
"Provider._id" : 1.0,
"Status" : 1.0,
"IssuedOn.DateTime" : -1.0,
"IssuedOn" : 1.0
}
And now the query uses the index for sorting and fetches only the records in range.
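The ordering rule above (equality fields, then sort fields, then range fields) can be sketched as a small function that assembles the key pattern. buildEsrIndex is a hypothetical helper, not part of MongoDB; it relies on JavaScript preserving insertion order for string keys, and the classification of the fields is taken from the query in the question.

```javascript
// Hypothetical helper: build a compound index key pattern in
// equality -> sort -> range order.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // do not overwrite a sort field
  }
  return keys;
}

const keys = buildEsrIndex({
  equality: ["Environment", "Provider.ParentId", "Provider._id", "Status"],
  sort: { "IssuedOn.DateTime": -1 },
  range: ["IssuedOn"],
});

console.log(Object.keys(keys));
// In the mongo shell the result could then be passed to
//   db.ServiceInvoice.createIndex(keys)
```

The resulting field order matches the index given in the answer, with IssuedOn.DateTime placed before IssuedOn.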
I have around 10 million documents in MongoDB.
I'm trying to search for text inside the db with db.outMessage.find({ "text" : /.*m.*/}), but it takes too long (around 30 seconds) when there is no result; if I search for existing text, it takes less than a second.
I tried putting an index on text, with the same result.
db.outMessage.find({ "text" : /.*m.*/}).explain(true)
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "notification_center.outMessage",
"indexFilterSet" : false,
"parsedQuery" : {
"text" : {
"$regex" : ".*m.*"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"text" : {
"$regex" : ".*m.*"
}
},
"keyPattern" : {
"text" : 1
},
"indexName" : "text",
"isMultiKey" : false,
"multiKeyPaths" : {
"text" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"text" : [
"[\"\", {})",
"[/.*m.*/, /.*m.*/]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 14354,
"totalKeysExamined" : 10263270,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 0,
"executionTimeMillisEstimate" : 12957,
"works" : 10263271,
"advanced" : 0,
"needTime" : 10263270,
"needYield" : 0,
"saveState" : 80258,
"restoreState" : 80258,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 0,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"text" : {
"$regex" : ".*m.*"
}
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 12461,
"works" : 10263271,
"advanced" : 0,
"needTime" : 10263270,
"needYield" : 0,
"saveState" : 80258,
"restoreState" : 80258,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"text" : 1
},
"indexName" : "text",
"isMultiKey" : false,
"multiKeyPaths" : {
"text" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"text" : [
"[\"\", {})",
"[/.*m.*/, /.*m.*/]"
]
},
"keysExamined" : 10263270,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "acsdptest.arabiacell.net",
"port" : 27017,
"version" : "3.4.7",
"gitVersion" : "cf38c1b8a0a8dca4a11737581beafef4fe120bcd"
},
The index will essentially be a list of all the values of the text field, in lexicographical order, i.e. sorted by the first letter, then the second, and so on.
Since the query executor has no way to predict which values might contain an 'm', it must examine all of the index entries.
In the case of this query, that means 10,263,270 index keys were examined, after being read from disk if the index was not already in the cache.
If this is actually a keyword search and not a single-letter match, you might be able to use the $text query operator instead of $regex; it requires a text index.
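The difference can be illustrated with a toy model, where a sorted array stands in for the index on text. This is a simulation, not MongoDB code: scanUnanchored and scanPrefix are hypothetical names and the sample keys are made up. An unanchored pattern has to test every key, while a case-sensitive prefix can seek to a starting point and stop early.

```javascript
// Sorted array standing in for the index keys on "text".
const keys = ["alpha", "beta", "delta", "gamma", "kappa", "omega"].sort();

// Unanchored regex such as /.*m.*/: every key must be tested.
function scanUnanchored(sortedKeys, regex) {
  let examined = 0;
  const matches = [];
  for (const key of sortedKeys) {
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { examined, matches };
}

// Prefix match: binary search to the first key >= prefix, then walk forward
// until a key no longer starts with the prefix.
function scanPrefix(sortedKeys, prefix) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  let examined = 0;
  const matches = [];
  for (let i = lo; i < sortedKeys.length; i++) {
    examined++;
    if (!sortedKeys[i].startsWith(prefix)) break;
    matches.push(sortedKeys[i]);
  }
  return { examined, matches };
}

console.log(scanUnanchored(keys, /.*m.*/)); // examines all 6 keys
console.log(scanPrefix(keys, "ga")); // examines 2 keys
```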
I have a MongoDB Collection for weather data with each document consisting about 50 different weather parameters fields. Simple Example below:
{
"wind":7,
"swell":6,
"temp":32,
...
"50th_field":32
}
If I only need one field from all documents, say temp, my query would be this:
db.weather.find({},{ temp: 1})
So internally, does MongoDB have to fetch the entire document for just the 1 field that was requested (projected)? Wouldn't that be an expensive operation?
I tried MongoDB Compass to benchmark timings, but the time required was <1ms, so I couldn't tell.
MongoDB will read all the data; however, only the field temp (and _id) will be transmitted over your network to the client. If your documents are rather big, the overall performance should be better when you project only the fields you need.
Yes. This is how to avoid it:
Create an index on temp.
Filter on temp in find().
Turn off _id in the projection (necessary).
Run:
db.coll.find({ temp:{ $ne:null }},{ temp:1, _id:0 })
An empty filter {} triggers a COLLSCAN because the planner tries to match the query fields with the projection.
With {temp} as the filter and {temp, _id:0} as the projection, it says: "Oh, I only need temp".
It should also be smart enough to tell that {} with {temp, _id:0} only needs the index, but it's not.
Basically, using a projection to limit the fields is generally faster than fetching the full document. You can even use a covered index to avoid examining the documents at all (no disk IO) and achieve better performance.
Check the executionStats of the demo below: totalDocsExamined was 0! But you must exclude the _id field in the projection, because it's not included in the index.
See also:
https://docs.mongodb.com/manual/core/query-optimization/#covered-query
> db.test.insertOne({name: 'TJT'})
{
"acknowledged" : true,
"insertedId" : ObjectId("5faa0c8469dffee69357dde3")
}
> db.test.createIndex({name: 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.test.explain('executionStats').find({name: 'TJT'}, {_id: 0, name: 1})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "memo.test",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "TJT"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
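The condition used in the demo above (every field in the filter and in the projection is part of the index, and _id is excluded unless indexed) can be written as a quick check. This is a deliberately simplified sketch: fieldsCoveredByIndex is a hypothetical helper, it only looks at top-level field names, and it ignores multikey indexes and operator-specific restrictions. It states a necessary condition and does not predict which plan the planner will pick.

```javascript
// Hypothetical, simplified check: could this filter + projection be answered
// from the index keys alone?
function fieldsCoveredByIndex(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));

  // Fields explicitly included by the projection.
  const included = Object.keys(projection).filter((f) => projection[f]);
  if (included.length === 0) return false; // whole documents are returned

  // _id is returned by default unless the projection turns it off.
  const returned = new Set(included);
  if (!("_id" in projection)) returned.add("_id");

  const needed = [...Object.keys(filter), ...returned];
  return needed.every((f) => indexed.has(f));
}

// Same shapes as the demo: index { name: 1 }, filter on name.
console.log(fieldsCoveredByIndex({ name: 1 }, { name: "TJT" }, { _id: 0, name: 1 }));
console.log(fieldsCoveredByIndex({ name: 1 }, { name: "TJT" }, { name: 1 }));
```

The first call reports true and the second false, because the second projection still returns _id, which is not part of the name index.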
I am running Community MongoDB 3.4.9 on my laptop with 64 GB RAM. I have a collection with 12+ million documents. Each document has at least from and to fields of type Int64. The from-to are unique ranges. There are no documents with overlapping ranges. There is an index on the collection as follows:
{
"v" : NumberInt(1),
"unique" : true,
"key" : {
"from" : NumberInt(1),
"to" : NumberInt(1)
},
"name" : "range",
"ns" : "db.location",
"background" : true
}
The server/database is idle. There are no clients. I run the query below over and over and I get a constant execution time of roughly 21 seconds.
db.location.find({from:{$lte:NumberLong(3682093364)},to:{$gte:NumberLong(3682093364)}}).limit(1)
Reversal of and conditions does not make a difference with respect to execution time. The explain command shows the following.
{
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "db.location",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"from" : {
"$lte" : NumberLong(3682093364)
}
},
{
"to" : {
"$gte" : NumberLong(3682093364)
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
}
}
}
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1.0,
"executionTimeMillis" : 21526.0,
"totalKeysExamined" : 12284007.0,
"totalDocsExamined" : 1.0,
"executionStages" : {
"stage" : "LIMIT",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20945.0,
"works" : 12284008.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20714.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"docsExamined" : 1.0,
"alreadyHasObj" : 0.0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20357.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
},
"keysExamined" : 12284007.0,
"seeks" : 12284007.0,
"dupsTested" : 0.0,
"dupsDropped" : 0.0,
"seenInvalidated" : 0.0
}
}
},
"allPlansExecution" : [
]
},
"serverInfo" : {
"host" : "LAPTOP-Q96TVSN8",
"port" : 27017.0,
"version" : "3.4.9",
"gitVersion" : "876ebee8c7dd0e2d992f36a848ff4dc50ee6603e"
},
"ok" : 1.0
}
Supplying a hint does not make a difference. explain seems to indicate that the proper (and only) index is already being used but most of the execution time (20s) is spent in IXSCAN. The MongoDB log shows that many index items were scanned but only one document was ever touched and returned. It also shows a crazy number of locks and yields considering there are ZERO concurrent operations on the database. The underlying engine is wiredTiger on an SSD disk. MongoDB RAM usage is at 7 GB.
2017-10-10T10:06:14.456+0200 I COMMAND [conn33] command db.location appName: "MongoDB Shell" command: explain { explain: { find: "location", filter: { from: { $lte: 3682093364 }, to: { $gte: 3682093364 } }, limit: 1.0, singleBatch: false }, verbosity: "allPlansExecution" } numYields:96299 reslen:1944 locks:{ Global: { acquireCount: { r: 192600 } }, Database: { acquireCount: { r: 96300 } }, Collection: { acquireCount: { r: 96300 } } } protocol:op_command 21526ms
Is there a better way to structure the document so that the lookups are faster considering my ranges are never overlapping? Is there something obvious that I am doing wrong?
UPDATE:
When I drop the index, COLLSCAN is used and the document is found in consistent 8-9 seconds.
I hate to answer my own questions but then again I am happy for finding the solution.
Even though it makes sense to create such a composite index, considering the specifics of non-overlapping ranges it turns out that the search scope is just too broad. The higher the input number, the longer it takes to find the result, because more and more index entries satisfy from <= number, and the last entry in the search scope is actually the one we are looking for (the index is scanned from left to right).
The solution is to modify the index to be either { from: -1 } or { to: 1 }. The composite index is really not necessary in this scenario as the ranges are not overlapping and the very first document found by the index is the very document being returned. This is now lightning fast just as expected.
You live and learn...
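The effect described in this answer can be reproduced with a toy model in which a sorted array stands in for the index. This is a simulation under assumptions, not MongoDB internals: the function names and the generated ranges are hypothetical. Scanning forward examines every entry with from <= n, whereas jumping to the largest from <= n (which is what a descending scan reaches first) examines a single entry.

```javascript
// 1000 made-up, non-overlapping ranges sorted by "from": [0,9], [10,19], ...
const ranges = [];
for (let i = 0; i < 1000; i++) ranges.push({ from: i * 10, to: i * 10 + 9 });

// Forward scan over { from: 1, to: 1 }: walks all entries with from <= n.
function lookupForward(sorted, n) {
  let examined = 0;
  for (const r of sorted) {
    if (r.from > n) break;
    examined++;
    if (r.to >= n) return { match: r, examined };
  }
  return { match: null, examined };
}

// Descending idea: seek straight to the last entry with from <= n.
// Because ranges never overlap, it is the only possible match.
function lookupDescending(sorted, n) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].from <= n) lo = mid + 1;
    else hi = mid;
  }
  const candidate = lo > 0 ? sorted[lo - 1] : null;
  const match = candidate !== null && candidate.to >= n ? candidate : null;
  return { match, examined: candidate === null ? 0 : 1 };
}

console.log(lookupForward(ranges, 9995).examined); // 1000
console.log(lookupDescending(ranges, 9995).examined); // 1

// One possible shell query for the { from: -1 } index (an assumption, the
// answer does not show it):
//   db.location.find({ from: { $lte: n } }).sort({ from: -1 }).limit(1)
// followed by a check that the returned document has to >= n.
```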
I am having difficulty persuading Mongo to run a distinct query that looks like it should be covered by the indexes without fetching a large number of documents in the collection.
My documents have the general form:
{
_tenantId: 'someString',
_productCategory: 'some string from a smallish set'
...
}
I have an index on (_tenantId, _productCategory).
I want to find out what the set of distinct product categories is for a given tenant, so the query is:
db.products.distinct( '_productCategory', { _tenantId: '463171c3-d15f-4699-893d-3046327f8e1f'})
This runs rather slowly (several seconds for a collection of around half a million products against a local DB, which is Mongo 3.2.9). Against our pre-production SaaS-based Mongo (which is probably more memory-constrained than my local instance, which has free run of my machine) it takes several tens of seconds for the same data.
Explaining the query yields:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "engage-prod.products",
"indexFilterSet" : false,
"parsedQuery" : {
"_tenantId" : {
"$eq" : "463171c3-d15f-4699-893d-3046327f8e1f"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_tenantId" : 1,
"_productCategory" : 1
},
"indexName" : "_tenantId_1__productCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_tenantId" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1f\"]"
],
"_productCategory" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 406871,
"executionTimeMillis" : 358,
"totalKeysExamined" : 406871,
"totalDocsExamined" : 406871,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 406871,
"executionTimeMillisEstimate" : 80,
"works" : 406872,
"advanced" : 406871,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3178,
"restoreState" : 3178,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 406871,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 406871,
"executionTimeMillisEstimate" : 40,
"works" : 406872,
"advanced" : 406871,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3178,
"restoreState" : 3178,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"_tenantId" : 1,
"_productCategory" : 1
},
"indexName" : "_tenantId_1__productCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_tenantId" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1f\"]"
],
"_productCategory" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 406871,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "Stevens-MacBook-Pro.local",
"port" : 27017,
"version" : "3.2.9",
"gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
},
"ok" : 1
}
Note that even though it runs an IXSCAN it still returns over 400K documents (nReturned).
If I create a compound field _productTenantAndCategory containing a lexical concatenation of the two values (with a : separator) and index that, so it's a single-field index, then the query:
db.products.explain('executionStats').distinct( '_productTenantAndCategory', { _productTenantAndCategory: {$gte: '463171c3-d15f-4699-893d-3046327f8e1f',$lt: '463171c3-d15f-4699-893d-3046327f8e1g'}})
works entirely within the index and yields:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "engage-prod.products",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"_productTenantAndCategory" : {
"$lt" : "463171c3-d15f-4699-893d-3046327f8e1g"
}
},
{
"_productTenantAndCategory" : {
"$gte" : "463171c3-d15f-4699-893d-3046327f8e1f"
}
}
]
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"_productTenantAndCategory" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"keyPattern" : {
"_productTenantAndCategory" : 1
},
"indexName" : "_productTenantAndCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_productTenantAndCategory" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1g\")"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 62,
"executionTimeMillis" : 0,
"totalKeysExamined" : 63,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 62,
"executionTimeMillisEstimate" : 0,
"works" : 63,
"advanced" : 62,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"_productTenantAndCategory" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"nReturned" : 62,
"executionTimeMillisEstimate" : 0,
"works" : 63,
"advanced" : 62,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"_productTenantAndCategory" : 1
},
"indexName" : "_productTenantAndCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_productTenantAndCategory" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1g\")"
]
},
"keysExamined" : 63
}
}
},
"serverInfo" : {
"host" : "Stevens-MacBook-Pro.local",
"port" : 27017,
"version" : "3.2.9",
"gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
},
"ok" : 1
}
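For reference, the concatenated key and the $gte/$lt bounds used in that query can be derived mechanically. This is a sketch with hypothetical helper names (compoundKey, prefixRange); the category value is made up. It assumes plain ASCII values and the default binary string comparison, and it does not handle a prefix whose last character is already the highest code unit.

```javascript
// Hypothetical helper: value stored in the concatenated field.
function compoundKey(tenantId, category) {
  return tenantId + ":" + category;
}

// Hypothetical helper: bounds matching every string that starts with prefix.
// The upper bound is the prefix with its last character incremented by one.
function prefixRange(prefix) {
  if (prefix.length === 0) throw new Error("prefix must not be empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

const tenantId = "463171c3-d15f-4699-893d-3046327f8e1f";
console.log(compoundKey(tenantId, "shoes")); // "shoes" is a made-up category
console.log(prefixRange(tenantId));
```

For the tenant id in the question this yields an upper bound ending in 8e1g, the same bound as in the query above.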
Having to build single-field indexes with manually compounded keys for all the aggregation queries I need is not a very desirable path to follow. Since all the information is present in the compound index I started with, why can't Mongo execute the original distinct query as a covered query using that index? Is there anything I can do to overcome this by way of query optimization?
Note: This is actually a sub-problem of a slightly more complex one involving an aggregation pipeline to actually count the number of occurrences of each category, but I am restricting my question for now to the simpler distinct query, since it seems to capture the essence of the failure to use an index that should cover things (which I was also seeing in the aggregation pipeline case) while being a simpler overall query.