I have one collection with 3 million documents and the following indexes:
{ ts : 1 } , {u_id: 1}
Note that these are two separate ascending indexes, not a compound index.
When I run this query:
db.collection.find({u_id: 'user'}).sort({ts : -1}).skip(0).limit(1)
it takes over 100 ms. I have the following logs:
2017-04-15T06:42:01.147+0000 I COMMAND [conn783] query
db.collection query: { orderby: { ts: -1 }, $query: {
u_id: "user-ki-id } } planSummary: IXSCAN { u_id:
1 }, IXSCAN { u_id: 1 } ntoreturn:1 ntoskip:0 keysExamined:10795
docsExamined:10795 hasSortStage:1 cursorExhausted:1 keyUpdates:0
writeConflicts:0 numYields:86 nreturned:1 reslen:771 locks:{ Global: {
acquireCount: { r: 174 } }, Database: { acquireCount: { r: 87 } },
Collection: { acquireCount: { r: 87 } } } 246ms
A few notable points about the problem:
There is no other load on MongoDB, i.e. no other queries taking over 100 ms.
This happens every minute; I store data every minute, so I suspect the two are related.
The query flow is to first run the read query (as above), then a bulk insertion. This flow is repeated every minute.
So my questions are:
Why is it happening? Are there any design flaws in my indexing?
Might it be worthwhile to change the index to descending, like {ts: -1}? What is the actual difference between these indexes?
According to the MongoDB documentation, when sorting with an explicit order, results are read from disk rather than served from memory. Does this explain why it takes over 100 ms?
Can anybody explain the profiling log in detail?
Is this the expected behaviour of MongoDB?
The same thing is also happening when I run a range search on this collection; this takes 3-5 seconds.
EDIT:
I have added only the {u_id: 1, ts: -1} index and removed all other indexes (except _id). The first execution of the query still takes over 100 ms. This should not happen.
Query:
db.getCollection('locations')
    .find({u_id: "USR-WOWU"})
    .sort({ts: -1})
    .explain(true)
Output:
/* 1 */ {
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db_name.collection_name",
"indexFilterSet" : false,
"parsedQuery" : {
"user_id" : {
"$eq" : "USR-WOWU"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"u_id" : 1.0,
"ts" : -1.0
},
"indexName" : "u_id_1_ts_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"u_id" : [
"[\"USR-WOWU\", \"USR-WOWU\"]"
],
"ts" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 164,
"executionTimeMillis" : 119,
"totalKeysExamined" : 164,
"totalDocsExamined" : 164,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 164,
"executionTimeMillisEstimate" : 120,
"works" : 165,
"advanced" : 164,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3,
"restoreState" : 3,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 164,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 164,
"executionTimeMillisEstimate" : 0,
"works" : 165,
"advanced" : 164,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3,
"restoreState" : 3,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"u_id" : 1.0,
"ts" : -1.0
},
"indexName" : "u_id_1_ts_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"u_id" : [
"[\"USR-WOWU\", \"USR-WOWU\"]"
],
"ts" : [
"[MaxKey, MinKey]"
]
},
"keysExamined" : 164,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : []
},
"serverInfo" : {
"host" : "manish",
"port" : 22022,
"version" : "3.2.13",
"gitVersion" : "23899209cad60aaafe114f6aea6cb83025ff51bc"
},
"ok" : 1.0 }
Please copy the JSON above and format it in an editor.
After the query above, running the same query again responds within ~2 ms. But when I do a few insertions, the same thing repeats a minute later: the first query takes over 100 ms, and subsequent ones take ~2 ms.
Is something missing, or does anything need to be configured in my MongoDB?
Why is it happening
The docsExamined:10795 and hasSortStage:1 portions of this log line indicate that the query is scanning 10,795 documents and then sorting the results in memory. A guide on interpreting log lines can be found here.
A performance improvement can likely be gained by indexing this query to avoid the in-memory sort.
For this query, you should try creating the index { 'u_id' : 1, 'ts' : -1 }.
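A minimal sketch of creating it in the shell, using the collection name from the question:

db.collection.createIndex({ u_id: 1, ts: -1 })

With this index, the equality match on u_id and the descending sort on ts can both be satisfied by walking the index, so no in-memory SORT stage is needed.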
Is it really worthwhile to change the index to descending order, like {ts: -1}?
Indexes can be read in either direction, so the index order isn't super important on single field indexes. However, sort ordering can be very important in compound indexes.
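As a sketch, using the names from the question: with the compound index { u_id: 1, ts: -1 }, both of the following sorts can use the index, the second by walking it backward:

db.collection.find({ u_id: 'user' }).sort({ ts: -1 })
db.collection.find({ u_id: 'user' }).sort({ ts: 1 })

Likewise, a single-field { ts: 1 } index can serve sort({ ts: -1 }) on its own.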
Updated
Based on the explain plan, the query is now properly using the index to read the results in order, which avoids the in-memory sort. It looks like this knocked ~100 ms off the query.
However, it looks like this query is no longer using .skip(0).limit(1). Can you add these back in and see if performance improves?
There doesn't appear to be anything wrong with your deployment; this behavior seems normal for queries that are not fully indexed.
Re-running the exact same query will be quick because the existing results ("the working set") are already stored in memory. Inserting new data can make the results of the query change, meaning the results may need to be read back into memory again.
Related
I have a MongoDB collection for weather data, with each document containing about 50 different weather parameter fields. A simple example is below:
{
"wind":7,
"swell":6,
"temp":32,
...
"50th_field":32
}
If I only need one field from all documents, say temp, my query would be this:
db.weather.find({},{ temp: 1})
So internally, does MongoDB have to fetch the entire document for just the one field that was requested (projected)? Wouldn't that be an expensive operation?
I tried MongoDB Compass to benchmark timings, but the time required was <1 ms, so I couldn't tell.
MongoDB will read the whole document, but only the temp field (and _id) will be transmitted over the network to the client. If your documents are rather big, overall performance should be better when you project only the fields you need.
Yes. This is how to avoid it:
Create an index on temp.
Filter on temp in the find().
Exclude _id from the projection (this is necessary).
Run:
db.coll.find({ temp: { $ne: null } }, { temp: 1, _id: 0 })
An empty filter {} triggers a COLLSCAN, because the planner tries to match the query fields against an index and {} matches none.
With a filter on temp and the projection { temp: 1, _id: 0 }, it says: "Oh, I only need temp".
It should also be smart enough to tell that {} with { temp: 1, _id: 0 } only needs the index, but it's not.
Basically, using a projection to limit fields is always faster than fetching the full document. You can even use a covered index to avoid examining the documents at all (no disk I/O) and achieve better performance.
Check the executionStats of the demo below: totalDocsExamined is 0! But you must exclude the _id field in the projection, because it is not included in the index.
See also:
https://docs.mongodb.com/manual/core/query-optimization/#covered-query
> db.test.insertOne({name: 'TJT'})
{
"acknowledged" : true,
"insertedId" : ObjectId("5faa0c8469dffee69357dde3")
}
> db.test.createIndex({name: 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.test.explain('executionStats').find({name: 'TJT'}, {_id: 0, name: 1})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "memo.test",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "TJT"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
Using mongo server v3.6.16.
I have a mongo collection with about 18m records. Records are being added at about 100k a day. I have a query that runs fairly often on the collection that depends on two values - user_id and server_time_stamp. I have a compound index set up for those two fields.
The index is regularly getting stale - and queries are taking minutes to complete and causing the server to burn all the CPU it can grab. As soon as I regenerate the index, queries happen quickly. But then a day or two later, the index is stale again. (ed. the index is failing more quickly now - within 30 mins.) I have no idea why the index is going stale - what can I look for?
Edit
Here are the index Fields:
{
"uid" : 1,
"server_time_stamp" : -1
}
and index options:
{
"v" : 2,
"name" : "server_time_stamp_1_uid_1",
"ns" : "sefaria.user_history"
}
This appears to be a Heisenbug: when I use explain, the query performs well. Here is one of the pathological queries, from the slow query log, taking 445 seconds:
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
Here are the results of explain for a performant query, right after regenerating the index:
{
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "sefaria.user_history",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"uid" : {
"$eq" : 80588.0
}
},
{
"server_time_stamp" : {
"$gt" : 1577918252.0
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"book" : NumberInt(1),
"last_place" : NumberInt(1)
},
"indexName" : "uid_1_book_1_last_place_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"book" : [
],
"last_place" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"book" : [
"[MinKey, MaxKey]"
],
"last_place" : [
"[MinKey, MaxKey]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1)
},
"indexName" : "uid",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
"executionStages" : {
"stage" : "FETCH",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(99),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"docsExamined" : NumberInt(97),
"alreadyHasObj" : NumberInt(0),
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(98),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
},
"keysExamined" : NumberInt(97),
"seeks" : NumberInt(1),
"dupsTested" : NumberInt(0),
"dupsDropped" : NumberInt(0),
"seenInvalidated" : NumberInt(0)
}
}
},
"serverInfo" : {
"host" : "mongo-deployment-5cf4f4fff6-dz84r",
"port" : NumberInt(27017),
"version" : "3.6.15",
"gitVersion" : "18934fb5c814e87895c5e38ae1515dd6cb4c00f7"
},
"ok" : 1.0
}
The issue is that a query that runs well and uses the index suddenly stops using it, resulting in very poor performance. This is noted in the query plan and the log, respectively.
The explain's output:
The query plan's "executionStats" says "totalKeysExamined" : NumberInt(97). The query filter uses an index defined on the collection ("stage" : "IXSCAN"), and the compound index "server_time_stamp_1_uid_1" is the one used. Also, the query's sort uses the index on _id. As such, the query and the indexes are working as they are meant to. And "executionTimeMillis" : NumberInt(1) says that this is a performant query.
Details from the log:
{ ...
find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }
planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 numYields:142780 nreturned:79
... }
From the log, note that the index "server_time_stamp_1_uid_1" is not used.
Discussion:
The data and the indexes for frequently used queries (called the working set) are kept in memory (RAM plus the file system cache). If the working set is not in memory, the system has to load it into memory during the operation, which results in slower performance; reading from a disk drive is much slower than reading from memory. Note that SSD drives are much faster than HDD drives, so when there is no option to increase memory, switching to SSDs could be an option.
Also, if the query uses an index that is too large to fit in memory, the index has to be read from the disk drive, which slows down the operation. More memory is one solution; when that is not possible, the solution can lie in redesigning (or remodeling) the data and its indexes.
But the problem in this case was not the available memory; there is enough of it.
The following info gives an idea about how much memory might be used for the working set for a given query:
db.collection.stats().indexSizes, size, count and avgObjSize.
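For example, in the shell (a sketch; user_history is the collection in question):

var s = db.user_history.stats()
printjson(s.indexSizes)               // size of each index, in bytes
print(s.size, s.count, s.avgObjSize)  // data size, document count, average document size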
Solution:
The query log with slow performance shows that the index "server_time_stamp_1_uid_1" is not used: planSummary: IXSCAN { _id: 1 }.
One way to force the query to always use the index is to supply a hint on the query. The hint needs to name the index "server_time_stamp_1_uid_1". This way the situation seen in the log will not happen.
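A sketch of the hinted query, using the filter and sort from the slow query log (hint() accepts either the index name or its key pattern):

db.user_history.find({
    uid: 80588,
    server_time_stamp: { $gt: 1577918252 }
}).sort({ _id: 1 }).hint("server_time_stamp_1_uid_1")

The small result set (79 documents in the log) can then be sorted in memory cheaply.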
Another way is to keep the index active in memory. This can be achieved by running a query on the indexed fields only (a covered query: the filter and the returned fields use indexed fields only). Running this dummy query often, or just before the actual query, will keep the index available in memory.
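A hypothetical warm-up along those lines (the filter bounds here are illustrative, not from the question):

// Covered query: the filter and projection touch only indexed fields,
// so this reads the index without fetching any documents.
db.user_history.find(
    { uid: { $gte: 0 }, server_time_stamp: { $gte: 0 } },
    { uid: 1, server_time_stamp: 1, _id: 0 }
).hint("server_time_stamp_1_uid_1").itcount()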
In this case, as @Laizer mentioned, supplying the hint to the query helped resolve the issue.
This behavior is due to no single index being able to both be selective and service the sort.
The log line for the slow operation shows it using the _id index. The query planner likely made this selection to avoid having to sort results in memory (note the lack of hasSortStage: 1). As a consequence, however, it had to scan far more documents (docsExamined:17286277), which made the operation take considerably longer.
Memory contention likely also played a part. Depending on load, the overhead from sorting results in memory may have contributed to pushing the index out of RAM and the _id index being selected.
A few comments:
As Babu noted, the explain posted above does not include a sort. Including the sort would likely show that stage consuming more time than the IXSCAN.
The name for the index (server_time_stamp_1_uid_1) suggests that server_time_stamp is placed first in the index, followed by uid. Equality matches should be prioritized; i.e. uid should be placed before ranges.
Some options to consider:
Create the index { "uid" : 1, "_id" : 1, "server_time_stamp" : 1 }. See here for guidance on sorting using indexes. Results may be mixed though given that both _id and server_time_stamp are likely to have a high cardinality, which means you may still be trading off scanning documents for avoiding a sort.
Assuming that the _id values are auto-generated, consider sorting by server_time_stamp rather than _id. This will allow you to bound AND sort using server_time_stamp_1_uid_1 (see the sketch below). The server_time_stamp is a timestamp, so it will also be relatively unique.
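A sketch of that second option, using the values from the slow query log and assuming the posted { uid: 1, server_time_stamp: -1 } key:

db.user_history.find({
    uid: 80588,
    server_time_stamp: { $gt: 1577918252 }
}).sort({ server_time_stamp: -1 })

With the equality match on uid, the index can both bound the range and return the results already sorted, so no in-memory sort is needed.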
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
Looking at the query plan, the query uses the _id index. Is that because you have a sort on the _id field? I looked at the other plan you attached.
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
The ratio of documents returned to documents examined is 1:1.
Also the query is using
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
I think something is missing in both queries. Maybe the sort is not included in the good plan. Can you please check?
I believe that the issue here was memory. The instance was operating near the limit of physical memory. I can't say for sure, but I believe that the relevant index was being removed from memory, and that the poor query performance was a result of that. Regenerating the index forced it back into memory (presumably, something else got kicked out of memory).
I've put the instance on a node with much more memory, and so far it seems to be performing well.
I have a txt file with MongoDB queries, like this:
db.telephone.find({'brand' : 'Apple'});
db.telephone.find({'brand' : 'Samsung'});
...for a total of about 1500 rows. I am executing these queries like this:
mongo myDatabase C:\path\mongoDB.txt
Now I need to measure how long it takes to execute all of these queries. I don't really care about the output; I only care about the time it takes (as part of an experiment).
I thought that if I created a collection times and inserted the current time into it with db.times.insert({time: Date()}); at the beginning and end of the query file, it would do what I need. But it seemingly does not work: both recorded times end up the same (and I am sure that executing all these queries took more than 1 second).
Is this because I don't print the output, so the queries don't really get executed? Or why does this not work? And is there a better way to measure the time it takes to execute these queries from a file? Thank you.
You can assign start and end time in the file itself. The following is an example:
var start_time = new Date().valueOf();
db.telephone.find({'brand' : 'Apple'});
db.telephone.find({'brand' : 'Samsung'});
var end_time = new Date().valueOf();
print(end_time-start_time);
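One caveat worth noting, given the question: when the shell executes a script file, it does not auto-iterate and print cursors the way the interactive prompt does, so a bare find() may never actually run to completion. Forcing iteration, e.g. with itcount(), makes the timing meaningful. A sketch:

var start_time = new Date().valueOf();
db.telephone.find({'brand' : 'Apple'}).itcount();    // itcount() iterates the entire cursor
db.telephone.find({'brand' : 'Samsung'}).itcount();
var end_time = new Date().valueOf();
print(end_time - start_time);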
How can we precisely measure the execution time?
To analyze the query, we can use explain(). It returns the complete statistics of the query. The following is an example:
db.telephone.find({'brand' : 'Apple'}).explain("executionStats")
Output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "check.telephone",
"indexFilterSet" : false,
"parsedQuery" : {
"brand" : {
"$eq" : "Apple"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"brand" : {
"$eq" : "Apple"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 35,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"brand" : {
"$eq" : "Apple"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 3,
"advanced" : 1,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1
}
},
"serverInfo" : {
"host" : "theMechanic",
"port" : 27017,
"version" : "4.0.11",
"gitVersion" : "417d1a712e9f040d54beca8e4943edce218e9a8c"
},
"ok" : 1
}
Note: executionStats.executionTimeMillis holds the actual query execution time.
I am running Community MongoDB 3.4.9 on my laptop with 64 GB RAM. I have a collection with 12+ million documents. Each document has at least from and to fields of type Int64. The from-to are unique ranges. There are no documents with overlapping ranges. There is an index on the collection as follows:
{
"v" : NumberInt(1),
"unique" : true,
"key" : {
"from" : NumberInt(1),
"to" : NumberInt(1)
},
"name" : "range",
"ns" : "db.location",
"background" : true
}
The server/database is idle. There are no clients. I run the query below over and over and I get a constant execution time of roughly 21 seconds.
db.location.find({from:{$lte:NumberLong(3682093364)},to:{$gte:NumberLong(3682093364)}}).limit(1)
Reversing the order of the from and to conditions does not make a difference with respect to execution time. The explain command shows the following.
{
"queryPlanner" : {
"plannerVersion" : 1.0,
"namespace" : "db.location",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"from" : {
"$lte" : NumberLong(3682093364)
}
},
{
"to" : {
"$gte" : NumberLong(3682093364)
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
}
}
}
},
"rejectedPlans" : [
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1.0,
"executionTimeMillis" : 21526.0,
"totalKeysExamined" : 12284007.0,
"totalDocsExamined" : 1.0,
"executionStages" : {
"stage" : "LIMIT",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20945.0,
"works" : 12284008.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 1.0,
"invalidates" : 0.0,
"limitAmount" : 1.0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20714.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"docsExamined" : 1.0,
"alreadyHasObj" : 0.0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1.0,
"executionTimeMillisEstimate" : 20357.0,
"works" : 12284007.0,
"advanced" : 1.0,
"needTime" : 12284006.0,
"needYield" : 0.0,
"saveState" : 96299.0,
"restoreState" : 96299.0,
"isEOF" : 0.0,
"invalidates" : 0.0,
"keyPattern" : {
"from" : 1.0,
"to" : 1.0
},
"indexName" : "range",
"isMultiKey" : false,
"multiKeyPaths" : {
"from" : [
],
"to" : [
]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1.0,
"direction" : "forward",
"indexBounds" : {
"from" : [
"[-inf.0, 3682093364]"
],
"to" : [
"[3682093364, inf.0]"
]
},
"keysExamined" : 12284007.0,
"seeks" : 12284007.0,
"dupsTested" : 0.0,
"dupsDropped" : 0.0,
"seenInvalidated" : 0.0
}
}
},
"allPlansExecution" : [
]
},
"serverInfo" : {
"host" : "LAPTOP-Q96TVSN8",
"port" : 27017.0,
"version" : "3.4.9",
"gitVersion" : "876ebee8c7dd0e2d992f36a848ff4dc50ee6603e"
},
"ok" : 1.0
}
Supplying a hint does not make a difference. explain seems to indicate that the proper (and only) index is already being used but most of the execution time (20s) is spent in IXSCAN. The MongoDB log shows that many index items were scanned but only one document was ever touched and returned. It also shows a crazy number of locks and yields considering there are ZERO concurrent operations on the database. The underlying engine is wiredTiger on an SSD disk. MongoDB RAM usage is at 7 GB.
2017-10-10T10:06:14.456+0200 I COMMAND [conn33] command db.location appName: "MongoDB Shell" command: explain { explain: { find: "location", filter: { from: { $lte: 3682093364 }, to: { $gte: 3682093364 } }, limit: 1.0, singleBatch: false }, verbosity: "allPlansExecution" } numYields:96299 reslen:1944 locks:{ Global: { acquireCount: { r: 192600 } }, Database: { acquireCount: { r: 96300 } }, Collection: { acquireCount: { r: 96300 } } } protocol:op_command 21526ms
Is there a better way to structure the document so that the lookups are faster considering my ranges are never overlapping? Is there something obvious that I am doing wrong?
UPDATE:
When I drop the index, a COLLSCAN is used and the document is found in a consistent 8-9 seconds.
I hate to answer my own question, but then again I am happy to have found the solution.
Even though it seems to make sense to create such a compound index, given the specifics of non-overlapping ranges it turns out that the search scope is just too broad. The higher the input number, the longer it takes to find the result, as more and more index entries satisfy from <= number, and the last entry in the search scope is the one we are actually looking for (the index is scanned from left to right).
The solution is to change the index to either { from: -1 } or { to: 1 }. The compound index is really not necessary in this scenario: since the ranges do not overlap, the very first document found via the index is the very document to return. This is now lightning fast, just as expected.
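A sketch of the reworked setup (the index name "range" is taken from the question):

db.location.dropIndex("range")
db.location.createIndex({ from: -1 })

// With non-overlapping ranges, the first index entry with from <= N
// belongs to the only candidate range; the filter on `to` then
// confirms or rejects the match.
db.location.find({
    from: { $lte: NumberLong(3682093364) },
    to: { $gte: NumberLong(3682093364) }
}).limit(1)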
You live and learn...
I am facing a problem creating a covered query. I am using the latest version of Mongo 3. Here is my sample data; I have inserted 10,006 such documents into MongoDB:
db.order.insert({ _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 })
db.order.insert({ _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 })
db.order.insert({ _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 })
db.order.insert({ _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 })
db.order.insert({ _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 })
For a covered query, all the fields in the query must be part of an index, so I created indexes on the status, ord_date, cust_id and amount fields like this:
db.orders.createIndex({status: 1})
db.orders.createIndex({amount: 1})
db.orders.createIndex({ord_date: 1})
db.orders.createIndex({cust_id: 1})
I have executed the following query:
db.orders.find(
{status : "A"},{ord_date : 1, cust_id : 1}
).sort({ amount: -1 }).explain()
But this explain returns executionStats.totalDocsExamined = 200 instead of executionStats.totalDocsExamined = 0, which means documents are scanned when I execute the query. In Mongo 3, we check whether an index covered a query using executionStats.totalDocsExamined instead of indexOnly.
Can anyone please suggest what I am doing wrong with this covered query?
Here is my output after the index suggestion by Markus:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "local.orders",
"indexFilterSet" : false,
"parsedQuery" : {
"status" : {
"$eq" : "A"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 1,
"ord_date" : 1,
"cust_id" : 1
},
"inputStage" : {
"stage" : "SORT",
"sortPattern" : {
"amount" : -1
},
"inputStage" : {
"stage" : "COLLSCAN",
"filter" : {
"status" : {
"$eq" : "A"
}
},
"direction" : "forward"
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 10004,
"executionTimeMillis" : 70,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10018,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 10004,
"executionTimeMillisEstimate" : 70,
"works" : 20026,
"advanced" : 10004,
"needTime" : 10021,
"needFetch" : 0,
"saveState" : 157,
"restoreState" : 157,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 1,
"ord_date" : 1,
"cust_id" : 1
},
"inputStage" : {
"stage" : "SORT",
"nReturned" : 10004,
"executionTimeMillisEstimate" : 70,
"works" : 20026,
"advanced" : 10004,
"needTime" : 10020,
"needFetch" : 0,
"saveState" : 157,
"restoreState" : 157,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"amount" : -1
},
"memUsage" : 960384,
"memLimit" : 33554432,
"inputStage" : {
"stage" : "COLLSCAN",
"filter" : {
"status" : {
"$eq" : "A"
}
},
"nReturned" : 10004,
"executionTimeMillisEstimate" : 10,
"works" : 10020,
"advanced" : 10004,
"needTime" : 15,
"needFetch" : 0,
"saveState" : 157,
"restoreState" : 157,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 10018
}
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "pcd32",
"port" : 27017,
"version" : "3.0.7",
"gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2esd5"
}
}
While there are index intersections in MongoDB, they can be quite tricky to utilize. However, sticking to a rule of thumb is a rather safe bet:
When creating queries in MongoDB, assume that only one index can be used at a time
This is especially true for covered queries, as detailed in the docs:
An index covers a query when both of the following apply:
all the fields in the query are part of an index, and
all the fields returned in the results are in the same index.
A carefully crafted compound index has few drawbacks, as queries using only a prefix of its fields can use it, too.
So in order to make your query covered, you need to have all the keys you want to return in your index. Since your projection does not exclude _id, I assume you need the _id field to be returned as well. Furthermore, your index should reflect your sorting order. So your index should look like:
db.orders.createIndex({ _id: 1, status: 1, ord_date: 1, cust_id: 1, amount: -1 })
for your query. Order matters, so in order to make best use of the newly created index, other queries should adhere to the same order of fields.
If you also need the _id field, then the below compound index should give you a covered query:
db.order.createIndex({status:1, amount:-1, ord_date:1, cust_id :1, _id:1})
If you don't need the _id field then use _id : 0 in the find(), so that _id is not retrieved and you can remove it from the index as well.
Note that in a covered query, the order of the index fields relative to the actual query being executed matters for the index to be used in executing the query.
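Putting it together, a sketch of the covered form without _id (the index from the answer above, minus its _id key):

db.order.createIndex({ status: 1, amount: -1, ord_date: 1, cust_id: 1 })
db.order.find(
    { status: "A" },
    { ord_date: 1, cust_id: 1, _id: 0 }
).sort({ amount: -1 })

With _id excluded from the projection, explain("executionStats") should report totalDocsExamined = 0 for this query.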