Unacceptably slow MongoDB $lte + $gte query with index

I am running Community MongoDB 3.4.9 on my laptop with 64 GB RAM. I have a collection with 12+ million documents. Each document has at least from and to fields of type Int64; each from-to pair is a unique range, and no two documents have overlapping ranges. There is an index on the collection as follows:
{
    "v" : NumberInt(1),
    "unique" : true,
    "key" : {
        "from" : NumberInt(1),
        "to" : NumberInt(1)
    },
    "name" : "range",
    "ns" : "db.location",
    "background" : true
}
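For illustration, a document looks roughly like this (an invented example; only the relevant fields are shown):
{ "from" : NumberLong(3682000000), "to" : NumberLong(3682999999) }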
The server/database is idle. There are no clients. I run the query below over and over and I get a constant execution time of roughly 21 seconds.
db.location.find({from:{$lte:NumberLong(3682093364)},to:{$gte:NumberLong(3682093364)}}).limit(1)
Reversing the two conditions does not make a difference in execution time. The explain command shows the following:
{
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "db.location",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"from" : {
"$lte" : NumberLong(3682093364)
}
},
{
"to" : {
"$gte" : NumberLong(3682093364)
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
}
}
}
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1.0,
"executionTimeMillis" : 21526.0,
"totalKeysExamined" : 12284007.0,
"totalDocsExamined" : 1.0,
"executionStages" : {
"stage" : "LIMIT",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20945.0,
"works" : 12284008.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20714.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"docsExamined" : 1.0,
"alreadyHasObj" : 0.0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20357.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
},
"keysExamined" : 12284007.0,
"seeks" : 12284007.0,
"dupsTested" : 0.0,
"dupsDropped" : 0.0,
"seenInvalidated" : 0.0
}
}
},
"allPlansExecution" : [
]
},
"serverInfo" : {
"host" : "LAPTOP-Q96TVSN8",
"port" : 27017.0,
"version" : "3.4.9",
"gitVersion" : "876ebee8c7dd0e2d992f36a848ff4dc50ee6603e"
},
"ok" : 1.0
}
Supplying a hint does not make a difference. explain seems to indicate that the proper (and only) index is already being used, but most of the execution time (20 s) is spent in IXSCAN. The MongoDB log shows that many index items were scanned but only one document was ever touched and returned. It also shows a crazy number of locks and yields considering there are ZERO concurrent operations on the database. The underlying engine is WiredTiger on an SSD disk. MongoDB RAM usage is at 7 GB.
2017-10-10T10:06:14.456+0200 I COMMAND [conn33] command db.location appName: "MongoDB Shell" command: explain { explain: { find: "location", filter: { from: { $lte: 3682093364 }, to: { $gte: 3682093364 } }, limit: 1.0, singleBatch: false }, verbosity: "allPlansExecution" } numYields:96299 reslen:1944 locks:{ Global: { acquireCount: { r: 192600 } }, Database: { acquireCount: { r: 96300 } }, Collection: { acquireCount: { r: 96300 } } } protocol:op_command 21526ms
Is there a better way to structure the document so that the lookups are faster considering my ranges are never overlapping? Is there something obvious that I am doing wrong?
UPDATE:
When I drop the index, a COLLSCAN is used and the document is found in a consistent 8-9 seconds.
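(For completeness: the collection scan can also be forced without dropping the index, using the standard $natural hint.)
db.location.find({ from: { $lte: NumberLong(3682093364) }, to: { $gte: NumberLong(3682093364) } }).hint({ $natural: 1 }).limit(1)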

I hate to answer my own questions, but then again I am happy to have found the solution.
Even though it seems to make sense to create such a composite index, given the specifics of non-overlapping ranges the search scope turns out to be far too broad. The higher the input number, the longer the query takes: more and more index entries satisfy from <= number, and the last entry in the search scope is the one we are actually looking for (the index is scanned from left to right).
The solution is to change the index to either { from: -1 } or { to: 1 }. The composite index is really not necessary in this scenario: because the ranges do not overlap, the very first entry found in the index belongs to the very document being returned. The query is now lightning fast, just as expected.
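A minimal sketch of the change, reusing the query from the question (the index options are an assumption; adjust as needed):
// replace the composite index with a single-field descending index on from
db.location.dropIndex("range")
db.location.createIndex({ from: -1 }, { background: true })
// the scan now starts at the largest from <= N; with non-overlapping ranges
// the first entry is the only possible match, and the residual check on to
// either confirms it or proves there is no match
db.location.find({ from: { $lte: NumberLong(3682093364) }, to: { $gte: NumberLong(3682093364) } }).limit(1)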
You live and learn...

Related

MongoDB disk read performance is very low

I have a MongoDB instance with a database and a collection holding 80 GB of data. It contains about 4M documents with a comparatively large document size of about 20 kB on average. Among other more elementary fields, each document contains one list of 1024 elements and also 3-4 lists of 200 numbers.
I perform a simple batch find query over a properly indexed string field ('isbn'), intending to get 5000 documents (projected onto the relevant part) in one batch. For this, I use the $in operator:
rows = COLLECTION.find({"isbn": {"$in": candidate_isbns}},
                       {"_id": 0, "isbn": 1, "other_stuff": 1})
The IXSCAN stage works properly, as intended. However, since the corresponding documents are not yet in the WiredTiger cache (and probably never will be, given my limited 32 GB of RAM), the data has to be read from disk during FETCH in most cases. (Unfortunately, "other_stuff" is too heavy to include in an index that could cover this query.)
The SSD attached to my virtual cloud machine has a read performance of about 90 MB/s, which is not great but should be sufficient for now. However, when I monitor the disk read speed during the query (via iostat, for example), it drops to roughly 3 MB/s, which seems very poor. I can verify this poor behaviour in the profiler output (MongoDB seems to split the 5000 into further batches, so I show only the output for a sub-batch of 2094):
{
"op" : "getmore",
"ns" : "data.metadata",
"command" : {
"getMore" : NumberLong(7543502234201790529),
"collection" : "metadata",
"lsid" : {
"id" : UUID("2f410f2d-2f74-4d3a-9041-27c4ddc51bd2")
},
"$db" : "data"
},
"originatingCommand" : {
"$truncated" : "{ find: \"metadata\", filter: { isbn: { $in: [ \"9783927781313\", ..."
},
"cursorid" : NumberLong(7543502234201790529),
"keysExamined" : 4095,
"docsExamined" : 2095,
"numYield" : 803,
"nreturned" : 2094,
"locks" : {
"ReplicationStateTransition" : {
"acquireCount" : {
"w" : NumberLong(805)
}
},
"Global" : {
"acquireCount" : {
"r" : NumberLong(805)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(804)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(804)
}
},
"Mutex" : {
"acquireCount" : {
"r" : NumberLong(1)
}
}
},
"flowControl" : {},
"storage" : {
"data" : {
"bytesRead" : NumberLong(65454770),
"timeReadingMicros" : NumberLong(21386543)
}
},
"responseLength" : 16769511,
"protocol" : "op_msg",
"millis" : 21745,
"planSummary" : "IXSCAN { isbn: 1 }",
"execStats" : {
"stage" : "PROJECTION_SIMPLE",
"nReturned" : 2196,
"executionTimeMillisEstimate" : 21126,
"works" : 4288,
"advanced" : 2196,
"needTime" : 2092,
"needYield" : 0,
"saveState" : 817,
"restoreState" : 817,
"isEOF" : 0,
"transformBy" : {},
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 2196,
"executionTimeMillisEstimate" : 21116,
"works" : 4288,
"advanced" : 2196,
"needTime" : 2092,
"needYield" : 0,
"saveState" : 817,
"restoreState" : 817,
"isEOF" : 0,
"docsExamined" : 2196,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 2196,
"executionTimeMillisEstimate" : 531,
"works" : 4288,
"advanced" : 2196,
"needTime" : 2092,
"needYield" : 0,
"saveState" : 817,
"restoreState" : 817,
"isEOF" : 0,
"keyPattern" : {
"isbn" : 1.0
},
"indexName" : "isbn_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"isbn" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"[\"9780230391451\", \"9780230391451\"]",
"[\"9780230593206\", \"9780230593206\"]",
... ]
},
"keysExamined" : 4288,
"seeks" : 2093,
"dupsTested" : 0,
"dupsDropped" : 0
}
}
},
"ts" : ISODate("2022-01-24T07:57:12.132Z"),
"client" : "my_ip",
"allUsers" : [
{
"user" : "myUser",
"db" : "data"
}
],
"user" : "myUser#data"
}
The ratio of bytesRead to timeReadingMicros indeed confirms this poor read speed of about 3 MB/s.
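In shell arithmetic (numbers taken from the profiler entry above):
var bytesRead = 65454770     // storage.data.bytesRead
var micros = 21386543        // storage.data.timeReadingMicros
print((bytesRead / 1e6) / (micros / 1e6) + " MB/s")   // ~3.06 MB/s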
My question: Why does this degradation of speed take place? Is it pathological, meaning I need to investigate further, or is it the expected behaviour given the data setup above?
Any help is highly appreciated!

mongodb contains query empty result slow

I have around 10 million documents in MongoDB.
I'm trying to search for text inside the db with db.outMessage.find({ "text" : /.*m.*/ }), but it takes too long (around 30 seconds) with no result, whereas searching for existing text takes less than a second.
I tried putting an index on text, with the same result.
db.outMessage.find({ "text" : /.*m.*/}).explain(true)
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "notification_center.outMessage",
"indexFilterSet" : false,
"parsedQuery" : {
"text" : {
"$regex" : ".*m.*"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"text" : {
"$regex" : ".*m.*"
}
},
"keyPattern" : {
"text" : 1
},
"indexName" : "text",
"isMultiKey" : false,
"multiKeyPaths" : {
"text" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"text" : [
"[\"\", {})",
"[/.*m.*/, /.*m.*/]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 14354,
"totalKeysExamined" : 10263270,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 0,
"executionTimeMillisEstimate" : 12957,
"works" : 10263271,
"advanced" : 0,
"needTime" : 10263270,
"needYield" : 0,
"saveState" : 80258,
"restoreState" : 80258,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 0,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"text" : {
"$regex" : ".*m.*"
}
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 12461,
"works" : 10263271,
"advanced" : 0,
"needTime" : 10263270,
"needYield" : 0,
"saveState" : 80258,
"restoreState" : 80258,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"text" : 1
},
"indexName" : "text",
"isMultiKey" : false,
"multiKeyPaths" : {
"text" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"text" : [
"[\"\", {})",
"[/.*m.*/, /.*m.*/]"
]
},
"keysExamined" : 10263270,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "acsdptest.arabiacell.net",
"port" : 27017,
"version" : "3.4.7",
"gitVersion" : "cf38c1b8a0a8dca4a11737581beafef4fe120bcd"
},
The index is essentially a list of all the values of the text field in lexicographical order, i.e. sorted starting from the first character.
Since the query executor has no way to predict which values might contain an 'm', it must examine all of the index entries.
In the case of this query, that means 10,263,270 index keys were examined, after being read from disk if the index was not already in the cache.
If this is actually a keyword search and not a single-letter match, then instead of $regex you might be able to make use of the $text query operator, which requires a text index.
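A minimal sketch, assuming this is really a keyword search (the search term here is invented):
db.outMessage.createIndex({ text: "text" })
db.outMessage.find({ $text: { $search: "mykeyword" } })
If only prefix matching is needed, an anchored regex can use the existing { text: 1 } index with tight bounds instead of examining every key:
db.outMessage.find({ text: /^m/ })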

Does MongoDB fetch entire document even if single field is projected?

I have a MongoDB collection for weather data, with each document containing about 50 different weather-parameter fields. Simple example below:
{
"wind":7,
"swell":6,
"temp":32,
...
"50th_field":32
}
If I only need one field from all documents, say temp, my query would be this:
db.weather.find({},{ temp: 1})
So internally, does MongoDB have to fetch the entire document for just the 1 field that was requested (projected)? Wouldn't that be an expensive operation?
I tried MongoDB Compass to benchmark timings, but the time required was <1 ms, so I couldn't tell.
MongoDB will read the whole document, but only the field temp (and _id) will be transmitted over the network to the client. In case your documents are rather big, overall performance should be better when you project only the fields you need.
Yes. This is how to avoid it:
create an index on temp
filter on temp in the find()
exclude _id from the projection (necessary, because _id is not part of the index).
Run:
db.coll.find({ temp: { $ne: null } }, { temp: 1, _id: 0 })
An empty filter {} triggers a COLLSCAN, because the planner matches the query fields, not the projection, against the index.
With a filter on temp and the projection { temp: 1, _id: 0 } it says: "Oh, I only need temp" and answers from the index alone.
Ideally it would also be smart enough to tell that {}, { temp: 1, _id: 0 } only needs the index, but it's not.
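The difference is easy to see with explain (coll and temp as in the example above):
db.coll.find({}, { temp: 1, _id: 0 }).explain()
// the winning plan contains a COLLSCAN stage
db.coll.find({ temp: { $ne: null } }, { temp: 1, _id: 0 }).explain("executionStats")
// the winning plan scans only the index; totalDocsExamined is 0 when covered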
Basically, using a projection that limits the fields is always faster than fetching the full document, and you can even use a covered index to avoid examining the documents at all (no disk I/O), which achieves better performance.
Check the executionStats of the demo below: totalDocsExamined is 0! But you must remove the _id field from the projection, because it is not included in the index.
See also:
https://docs.mongodb.com/manual/core/query-optimization/#covered-query
> db.test.insertOne({name: 'TJT'})
{
"acknowledged" : true,
"insertedId" : ObjectId("5faa0c8469dffee69357dde3")
}
> db.test.createIndex({name: 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.test.explain('executionStats').find({name: 'TJT'}, {_id: 0, name: 1})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "memo.test",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "TJT"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}

Mongodb index not used with aggregate query

Context
=> See the EDIT below!
I'm running some queries in which sorting is applied and that are supposed to take advantage of indexes. For some indexes, certain (aggregate) queries in a given environment bypass the index, and I haven't yet been able to figure out why.
Setup
I have a mongo collection that contains 3 indexes:
_id
definition.name
definition.financial.profitability.highlights.prof_net
I'm using 2 types of query to test usage of each index:
A. Find and sort query
db.getCollection('properties_po').find().sort({"definition.financial.profitability.highlights.prof_net" : 1.0 }).limit(1);
db.getCollection('properties_po').find().sort({"definition.name" : 1.0}).limit(1);
B. Aggregate with sort query
db.getCollection("properties_po").aggregate([{"$sort":{"definition.name":1.0}},{"$limit":1}])
db.getCollection("properties_po").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
I use db.properties_po.aggregate( [ { $indexStats: { } } ] ) to check whether index usage has been incremented after each query.
Results
Environment 1 (local mongodb 3.4):
Both queries on both indexes work as expected, incrementing each index's usage count after each query.
Environment 2 (prod mongodb 3.6):
Queries A work as expected: index usage is incremented after each query, for each index.
Query B works for the index "definition.name", whose usage is incremented properly.
Query B doesn't work for the index "definition.financial.profitability.highlights.prof_net": the index is not used (very slow query), as reflected in its usage count not being incremented.
I'm not sure where to look; it might be some mongodb configuration I'm missing, some nested-document attribute limitation, or maybe my index is not created properly.
Thanks for your help
EDIT 18/03
It is not related to the mongodb version: I've just tested creating the same collection on the prod server, and I get 2 different results there too.
Collection 1: 15026 records
Output of the aggregate query with explain:
db.getCollection("properties").explain("executionStats").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "collec.properties",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 15026.0,
"executionTimeMillis" : 1522.0,
"totalKeysExamined" : 0.0,
"totalDocsExamined" : 15026.0,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 15026.0,
"executionTimeMillisEstimate" : 1506.0,
"works" : 15028.0,
"advanced" : 15026.0,
"needTime" : 1.0,
"needYield" : 0.0,
"saveState" : 128.0,
"restoreState" : 128.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"direction" : "forward",
"docsExamined" : 15026.0
}
}
}
},
{
"$sort" : {
"sortKey" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"limit" : NumberLong(1)
}
}
],
"ok" : 1.0
}
Collection 2: 2 records
Same query output:
db.getCollection("properties_pl").explain("executionStats").aggregate([{"$sort":{"definition.financial.profitability.highlights.prof_net":1.0}},{"$limit":1}])
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"sort" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"limit" : NumberLong(1),
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "patrimmoine.properties_pl",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"indexName" : "definition.financial.profitability.highlights.prof_net_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"definition.financial.profitability.highlights.prof_net" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2.0,
"direction" : "forward",
"indexBounds" : {
"definition.financial.profitability.highlights.prof_net" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2.0,
"executionTimeMillis" : 0.0,
"totalKeysExamined" : 2.0,
"totalDocsExamined" : 2.0,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 2.0,
"executionTimeMillisEstimate" : 0.0,
"works" : 3.0,
"advanced" : 2.0,
"needTime" : 0.0,
"needYield" : 0.0,
"saveState" : 1.0,
"restoreState" : 1.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"docsExamined" : 2.0,
"alreadyHasObj" : 0.0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 2.0,
"executionTimeMillisEstimate" : 0.0,
"works" : 3.0,
"advanced" : 2.0,
"needTime" : 0.0,
"needYield" : 0.0,
"saveState" : 1.0,
"restoreState" : 1.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"keyPattern" : {
"definition.financial.profitability.highlights.prof_net" : 1.0
},
"indexName" : "definition.financial.profitability.highlights.prof_net_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"definition.financial.profitability.highlights.prof_net" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2.0,
"direction" : "forward",
"indexBounds" : {
"definition.financial.profitability.highlights.prof_net" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 2.0,
"seeks" : 1.0,
"dupsTested" : 0.0,
"dupsDropped" : 0.0,
"seenInvalidated" : 0.0
}
}
}
}
}
],
"ok" : 1.0
}
So one query is indeed not using any index and doing a "COLLSCAN", while the other is doing a "FETCH" with an "IXSCAN".
So my question now is: how can the content of a collection change the behavior of a query?
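One diagnostic worth trying (MongoDB 3.6 accepts a hint option on aggregate; the index name below is taken from the explain output above): if the hinted plan is fast, the planner is merely preferring the COLLSCAN rather than being unable to use the index.
db.getCollection("properties").aggregate(
    [ { "$sort": { "definition.financial.profitability.highlights.prof_net": 1.0 } }, { "$limit": 1 } ],
    { hint: "definition.financial.profitability.highlights.prof_net_1" }
)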

Difficulty optimizing Mongo distinct query to use indexes

I am having difficulty persuading Mongo to run a distinct query that looks like it should be covered by the indexes without fetching a large number of documents in the collection.
My documents have the general form:
{
_tenantId: 'someString',
_productCategory: 'some string from a smallish set'
...
}
I have an index on (_tenantId, _productCategory).
I want to find out what the set of distinct product categories is for a given tenant, so the query is:
db.products.distinct( '_productCategory', { _tenantId: '463171c3-d15f-4699-893d-3046327f8e1f'})
This runs rather slowly (several seconds for a collection of around half a million products against a local DB, which is Mongo 3.2.9). Against our pre-production SaaS-based Mongo (which is probably more memory-constrained than my local instance, which has free run of my machine) it takes several tens of seconds for the same data.
Explaining the query yields:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "engage-prod.products",
"indexFilterSet" : false,
"parsedQuery" : {
"_tenantId" : {
"$eq" : "463171c3-d15f-4699-893d-3046327f8e1f"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_tenantId" : 1,
"_productCategory" : 1
},
"indexName" : "_tenantId_1__productCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_tenantId" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1f\"]"
],
"_productCategory" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 406871,
"executionTimeMillis" : 358,
"totalKeysExamined" : 406871,
"totalDocsExamined" : 406871,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 406871,
"executionTimeMillisEstimate" : 80,
"works" : 406872,
"advanced" : 406871,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3178,
"restoreState" : 3178,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 406871,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 406871,
"executionTimeMillisEstimate" : 40,
"works" : 406872,
"advanced" : 406871,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3178,
"restoreState" : 3178,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"_tenantId" : 1,
"_productCategory" : 1
},
"indexName" : "_tenantId_1__productCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_tenantId" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1f\"]"
],
"_productCategory" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 406871,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "Stevens-MacBook-Pro.local",
"port" : 27017,
"version" : "3.2.9",
"gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
},
"ok" : 1
}
Note that even though it runs an IXSCAN it still returns over 400K documents (nReturned).
If I create a compound field _productTenantAndCategory containing a lexical concatenation (with a : separator) and index that, so it's a single-field index, then the query:
db.products.explain('executionStats').distinct( '_productTenantAndCategory', { _productTenantAndCategory: {$gte: '463171c3-d15f-4699-893d-3046327f8e1f',$lt: '463171c3-d15f-4699-893d-3046327f8e1g'}})
works entirely within the index and yields:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "engage-prod.products",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"_productTenantAndCategory" : {
"$lt" : "463171c3-d15f-4699-893d-3046327f8e1g"
}
},
{
"_productTenantAndCategory" : {
"$gte" : "463171c3-d15f-4699-893d-3046327f8e1f"
}
}
]
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"_productTenantAndCategory" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"keyPattern" : {
"_productTenantAndCategory" : 1
},
"indexName" : "_productTenantAndCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_productTenantAndCategory" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1g\")"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 62,
"executionTimeMillis" : 0,
"totalKeysExamined" : 63,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 62,
"executionTimeMillisEstimate" : 0,
"works" : 63,
"advanced" : 62,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"_productTenantAndCategory" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"nReturned" : 62,
"executionTimeMillisEstimate" : 0,
"works" : 63,
"advanced" : 62,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"_productTenantAndCategory" : 1
},
"indexName" : "_productTenantAndCategory_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_productTenantAndCategory" : [
"[\"463171c3-d15f-4699-893d-3046327f8e1f\", \"463171c3-d15f-4699-893d-3046327f8e1g\")"
]
},
"keysExamined" : 63
}
}
},
"serverInfo" : {
"host" : "Stevens-MacBook-Pro.local",
"port" : 27017,
"version" : "3.2.9",
"gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
},
"ok" : 1
}
Having to build single-field indexes with manually compounded keys for all the aggregation queries I need is not a very desirable path to follow. Since all the information is present in the compound index I started with, why can't Mongo execute the original distinct query covered by that index? Is there anything I can do in the way of query optimization to overcome this?
Note: this is actually a sub-problem of a slightly more complex one involving an aggregation pipeline to actually count the number of occurrences of each category, but I am restricting my question for now to the simpler distinct query, since it seems to capture the essence of the failure to use an index that should cover things (which I was also seeing in the aggregation-pipeline case) while being a simpler overall query.
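For reference, an aggregation equivalent of the distinct query, using the original compound index (whether the planner answers it with a DISTINCT_SCAN rather than an IXSCAN + FETCH depends on the server version, so treat this as a sketch rather than a guaranteed fix):
db.products.aggregate([
    { $match: { _tenantId: "463171c3-d15f-4699-893d-3046327f8e1f" } },
    { $group: { _id: "$_productCategory" } }
])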