Using mongo server v3.6.16.
I have a mongo collection with about 18 million records. Records are being added at about 100k a day. I have a query that runs fairly often on the collection that depends on two values: user_id (stored as uid) and server_time_stamp. I have a compound index set up for those two fields.
The index is regularly getting stale, and queries are taking minutes to complete and causing the server to burn all the CPU it can grab. As soon as I regenerate the index, queries run quickly. But then a day or two later, the index is stale again. (Edit: the index is now going stale more quickly, within 30 minutes.) I have no idea why the index is going stale. What can I look for?
Edit
Here are the index fields:
{
"uid" : 1,
"server_time_stamp" : -1
}
and index options:
{
"v" : 2,
"name" : "server_time_stamp_1_uid_1",
"ns" : "sefaria.user_history"
}
This appears to be a Heisenbug: when I run the query with explain, it performs well. Here is one of the pathological queries from the slow query log, taking 445 seconds:
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
Here are the results of explain for a performant query, right after regenerating the index:
{
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "sefaria.user_history",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"uid" : {
"$eq" : 80588.0
}
},
{
"server_time_stamp" : {
"$gt" : 1577918252.0
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"book" : NumberInt(1),
"last_place" : NumberInt(1)
},
"indexName" : "uid_1_book_1_last_place_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"book" : [
],
"last_place" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"book" : [
"[MinKey, MaxKey]"
],
"last_place" : [
"[MinKey, MaxKey]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1)
},
"indexName" : "uid",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
"executionStages" : {
"stage" : "FETCH",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(99),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"docsExamined" : NumberInt(97),
"alreadyHasObj" : NumberInt(0),
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(98),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
},
"keysExamined" : NumberInt(97),
"seeks" : NumberInt(1),
"dupsTested" : NumberInt(0),
"dupsDropped" : NumberInt(0),
"seenInvalidated" : NumberInt(0)
}
}
},
"serverInfo" : {
"host" : "mongo-deployment-5cf4f4fff6-dz84r",
"port" : NumberInt(27017),
"version" : "3.6.15",
"gitVersion" : "18934fb5c814e87895c5e38ae1515dd6cb4c00f7"
},
"ok" : 1.0
}
The issue is that a query which runs well and uses the index suddenly stops using it, resulting in very poor performance. This is visible in the query plan and in the log, respectively.
The explain output:
The query plan's "executionStats" shows "totalKeysExamined" : NumberInt(97). The query filter uses an index defined on the collection ("stage" : "IXSCAN"), namely the compound index "server_time_stamp_1_uid_1". Note, however, that the explained query has no sort, so the plan contains no SORT stage. As it stands, the query and the index are working as intended, and "executionTimeMillis" : NumberInt(1) shows that it is a performant query.
Details from the log:
{ ...
find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }
planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 numYields:142780 nreturned:79
... }
From the log, note that the index "server_time_stamp_1_uid_1" is not used.
Discussion:
The data and the indexes used by frequently run queries (together called the working set) are kept in memory (RAM plus the file system cache). If the working set is not in memory, the system has to load it during the operation, which slows the operation down, because reading from disk is much slower than reading from memory. SSDs are much faster than HDDs, so when adding memory is not an option, moving to SSDs could help.
Also, if the query uses an index that is too large to fit in memory, the index has to be read from disk, which slows down the operation. More memory is one solution; when that is not possible, the solution may be to redesign (re-model) the data and its indexes.
But the problem in this case was not the available memory; there is enough of it.
The following statistics give an idea of how much memory the working set for a given query might use: the indexSizes, size, count and avgObjSize fields returned by db.collection.stats().
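A rough way to turn those numbers into an estimate is sketched below. This is an illustrative helper, not part of the original answer: `estimateWorkingSetBytes` is a hypothetical name, and it assumes the query touches `docsTouched` documents of average size and needs one whole index resident. It ignores compression and cache overhead, so treat the result as a ballpark figure.

```javascript
// Ballpark working-set estimate built from db.collection.stats() fields.
// Assumptions: `docsTouched` documents of average size are read, and the
// whole named index must stay in memory. Compression, WiredTiger page
// overhead and other collections are ignored.
function estimateWorkingSetBytes(stats, indexName, docsTouched) {
  const indexBytes = stats.indexSizes[indexName];
  if (indexBytes === undefined) {
    throw new Error(`unknown index: ${indexName}`);
  }
  // Never count more documents than the collection actually holds.
  const docs = Math.min(docsTouched, stats.count);
  const dataBytes = docs * stats.avgObjSize;
  return { indexBytes, dataBytes, totalBytes: indexBytes + dataBytes };
}
```

In the shell, the `stats` argument would be the object returned by `db.user_history.stats()`; sizes are in bytes unless a scale factor is passed.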
Solution:
The query log with slow performance shows that the index "server_time_stamp_1_uid_1" is not used: planSummary: IXSCAN { _id: 1 }.
One way to make sure the query always uses the index is to supply a hint. The hint needs to name the index "server_time_stamp_1_uid_1". This way, the situation seen in the log will not happen.
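As a sketch of what the hint looks like, assuming the collection and values from the slow-query log above: in the mongo shell it is `db.user_history.find({uid: 80588, server_time_stamp: {$gt: 1577918252}}).sort({_id: 1}).hint("server_time_stamp_1_uid_1")`. The helper below (a hypothetical name) builds the equivalent `find` command document, which could be passed to `db.runCommand()`.

```javascript
// Build a `find` command document that pins the query to one index via the
// command's `hint` field (an index name string or a key-pattern document).
function buildHintedFind(collection, filter, sort, indexName) {
  return {
    find: collection,
    filter: filter,
    sort: sort,
    hint: indexName,
  };
}

// Values taken from the slow-query log line above.
const cmd = buildHintedFind(
  "user_history",
  { uid: 80588, server_time_stamp: { $gt: 1577918252 } },
  { _id: 1 },
  "server_time_stamp_1_uid_1"
);
```

Because the hinted index does not provide `_id` order, the sort then happens in memory. With only a few dozen matching documents per user that should be cheap, but it remains subject to the in-memory sort limit.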
Another way is to keep the index active in memory. This can be achieved by running a query on the indexed fields only (a covered query: the query filter and the returned fields are all indexed fields). Running this dummy query often, or just before the actual query, should help keep the index in memory.
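Whether such a warm-up query is actually covered can be checked with a rule like the one sketched here. `isCovered` is a hypothetical helper that simplifies on purpose: top-level fields only, inclusion projections only, and no multikey indexes (queries on array fields cannot be covered).

```javascript
// Sketch: could `indexKey` cover a query with this filter and projection?
// Every filtered field and every returned field must be part of the index.
function isCovered(indexKey, filter, projection) {
  const indexed = new Set(Object.keys(indexKey));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  // _id is returned unless explicitly excluded, so it must then be indexed too.
  const idReturned = projection._id !== 0 && projection._id !== false;
  if (idReturned && !indexed.has("_id")) return false;
  // Fields the inclusion projection asks for (excluding _id, handled above).
  const returned = Object.keys(projection).filter(
    (f) => f !== "_id" && projection[f]
  );
  return filterOk && returned.length > 0 && returned.every((f) => indexed.has(f));
}
```

For this index a projection such as `{_id: 0, uid: 1, server_time_stamp: 1}` is needed; leaving `_id` in the output forces a FETCH. An explain showing `totalDocsExamined` of 0 is the reliable confirmation.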
In this case, as @Laizer mentioned, supplying the hint to the query helped resolve the issue.
This behavior is due to the index not being able to both filter selectively and service the sort.
The log line for the slow operation shows the operation using the _id index. The query planner likely made this selection to avoid having to sort results in memory (note the absence of hasSortStage:1). As a consequence, however, it had to scan considerably more documents (docsExamined:17286277), which made it take considerably longer.
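The trade-off the planner faced can be illustrated with a toy model. The data and function names below are synthetic and hypothetical; only the counts matter. Walking the `_id` index avoids a sort but examines every document, while the compound index examines only matching keys and then sorts a handful of documents in memory.

```javascript
// Synthetic collection: _id increases, uid cycles through `users` values.
function makeDocs(n, users) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ _id: i, uid: i % users, server_time_stamp: i });
  }
  return docs;
}

// Plan A: walk the _id index in order; every document is fetched and tested.
function planScanIdIndex(docs, uid, ts) {
  let examined = 0;
  const out = [];
  for (const d of docs) { // docs are already in _id order
    examined++;
    if (d.uid === uid && d.server_time_stamp > ts) out.push(d);
  }
  return { examined, out, inMemorySort: false };
}

// Plan B: use {uid: 1, server_time_stamp: -1}, then sort matches by _id.
function planScanCompoundIndex(docs, uid, ts) {
  const index = docs
    .map((d) => ({ uid: d.uid, ts: d.server_time_stamp, doc: d }))
    .sort((a, b) => a.uid - b.uid || b.ts - a.ts);
  // Only keys inside the bounds are examined (a real B-tree seeks to them).
  const inBounds = index.filter((k) => k.uid === uid && k.ts > ts);
  const out = inBounds.map((k) => k.doc).sort((a, b) => a._id - b._id);
  return { examined: inBounds.length, out, inMemorySort: true };
}
```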
Memory contention likely also played a part. Depending on load, the overhead from sorting results in memory may have contributed to pushing the index out of RAM and the _id index being selected.
A few comments:
As Babu noted, the explain posted above does not include a sort. Including the sort would likely show that stage consuming more time than the IXSCAN.
The name of the index (server_time_stamp_1_uid_1) suggests that server_time_stamp is placed first in the index, followed by uid, although the keyPattern in the explain output above lists uid first. Equality matches should be prioritized, i.e. uid should be placed before fields used in range conditions.
Some options to consider:
Create the index { "uid" : 1, "_id" : 1, "server_time_stamp" : 1 }. See the MongoDB documentation on using indexes to sort query results for guidance. Results may be mixed, though, given that both _id and server_time_stamp are likely to have high cardinality, which means you may still be trading off scanning documents against avoiding a sort.
Assuming that the _id values are auto-generated, consider sorting by server_time_stamp rather than _id. This will allow you to bound AND sort using server_time_stamp_1_uid_1. The server_time_stamp is a timestamp, so it will also be relatively unique.
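Both options follow the Equality, Sort, Range ordering that the MongoDB documentation recommends for compound indexes. A minimal sketch of that ordering (the `esrIndex` helper and its input shape are hypothetical):

```javascript
// Equality fields first, then sort fields (with their sort direction),
// then range fields. A field already placed is not repeated.
function esrIndex({ equality = [], sort = {}, range = [] }) {
  const key = {};
  for (const f of equality) key[f] = 1;
  for (const [f, dir] of Object.entries(sort)) if (!(f in key)) key[f] = dir;
  for (const f of range) if (!(f in key)) key[f] = 1;
  return key;
}

// Query from the log: equality on uid, sort on _id, range on server_time_stamp.
const withIdSort = esrIndex({ equality: ["uid"], sort: { _id: 1 }, range: ["server_time_stamp"] });
// Same query sorted by server_time_stamp instead of _id.
const withTsSort = esrIndex({ equality: ["uid"], sort: { server_time_stamp: -1 }, range: ["server_time_stamp"] });
```

The second result shows the point of the last option: once the sort is on `server_time_stamp`, the recommended key matches the existing `{ uid: 1, server_time_stamp: -1 }` index.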
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
Looking at the query plan, the query uses the _id index. Is that because you sort on the _id field? I also looked at the other plan you attached:
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
The ratio of documents returned to documents examined is 1:1.
Also, the query is using:
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
I think something differs between the two queries. Maybe the sort was not included in the good plan. Can you please check?
I believe that the issue here was memory. The instance was operating near the limit of physical memory. I can't say for sure, but I believe that the relevant index was being evicted from memory, and that the poor query performance was a result of that. Regenerating the index forced it back into memory (presumably, something else was evicted instead).
I've moved the instance to a node with much more memory, and so far it seems to be performing well.
Related
I have a collection with 62k documents in it. The same collection has a bunch of indexes too, most of them simple, single-field ones. What I am observing is that the following query takes extremely long to return:
db.jobs.count({"status":"complete","$or":[{"groups":{"$exists":false}},{"groups":{"$size":0}},{"groups":{"$in":["5e65ffc2a1e6ef0007bc5fa8"]}}]})
The executionStats for the above query are as follows:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "xxxxxx.jobs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"groups" : {
"$size" : 0
}
},
{
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
}
},
{
"$nor" : [
{
"groups" : {
"$exists" : true
}
}
]
}
]
},
{
"status" : {
"$eq" : "complete"
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"groups" : {
"$size" : 0
}
},
{
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
}
},
{
"$nor" : [
{
"groups" : {
"$exists" : true
}
}
]
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"status" : 1,
"groups" : 1
},
"indexName" : "status_1_groups_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"status" : [ ],
"groups" : [
"groups"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
],
"groups" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"groups" : {
"$size" : 0
}
},
{
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
}
},
{
"$nor" : [
{
"groups" : {
"$exists" : true
}
}
]
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"status" : 1
},
"indexName" : "status_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"status" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 62092,
"executionTimeMillis" : 9992,
"totalKeysExamined" : 62092,
"totalDocsExamined" : 62092,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"groups" : {
"$size" : 0
}
},
{
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
}
},
{
"$nor" : [
{
"groups" : {
"$exists" : true
}
}
]
}
]
},
"nReturned" : 62092,
"executionTimeMillisEstimate" : 9929,
"works" : 62093,
"advanced" : 62092,
"needTime" : 0,
"needYield" : 0,
"saveState" : 682,
"restoreState" : 682,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 62092,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 62092,
"executionTimeMillisEstimate" : 60,
"works" : 62093,
"advanced" : 62092,
"needTime" : 0,
"needYield" : 0,
"saveState" : 682,
"restoreState" : 682,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"status" : 1,
"groups" : 1
},
"indexName" : "status_1_groups_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"status" : [ ],
"groups" : [
"groups"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
],
"groups" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 62092,
"seeks" : 1,
"dupsTested" : 62092,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "xxxxxxx",
"port" : 27017,
"version" : "3.6.15",
"gitVersion" : "xxxxxx"
},
"ok" : 1}
What I am trying to understand is why the FETCH stage takes 10 seconds when the index scan in its inputStage takes 60ms. Since I am eventually doing a count(), I don't really need MongoDB to return the documents; I only need it to sum up the number of matching keys and give me the grand total.
Any idea what I am doing wrong?
The query explained there was not a count; it returned quite a few documents:
"nReturned" : 62092,
The estimated execution time for each stage suggests that the index scan took about 60ms, and fetching the documents from disk took the remaining ~9.9 seconds.
There are a couple of reasons this count required fetching the documents:
Key existence cannot be fully determined from the index
The {"$exists":false} predicate is also troublesome. When an index is built, each document's index key contains the value of each indexed field. There is no value for "nonexistent", so null is used. Since a document that contains a field whose value is explicitly set to null should not match {"$exists":false}, the query executor must load each document from disk to determine whether the field was null or nonexistent. This means that a COUNT_SCAN stage cannot be used, which in turn means that all of the documents to be counted must be loaded from disk.
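A toy illustration of that collision, assuming a simplified key extraction (`indexKeyFor` is hypothetical and ignores arrays, which MongoDB indexes per element):

```javascript
// A missing field and an explicit null both become the index key null.
function indexKeyFor(doc, field) {
  return doc[field] === undefined ? null : doc[field];
}

const docs = [
  { _id: 1, status: "complete" },               // groups missing
  { _id: 2, status: "complete", groups: null }, // groups explicitly null
];

// Identical keys: the index alone cannot separate the two documents.
const keys = docs.map((d) => indexKeyFor(d, "groups"));
// Only the full document reveals which one lacks the field ($exists: false).
const missing = docs.filter((d) => !("groups" in d)).map((d) => d._id);
```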
The $or predicate does not ensure exclusivity
The query executor cannot know ahead of time that the clauses in the $or are mutually exclusive. They are in your query, but in the general case it is possible for a single document to match more than one clause in the $or, so the query executor must load the documents to ensure deduplication.
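The sketch below, with made-up documents and plain predicates standing in for the query operators, shows why per-clause counts cannot simply be added when clauses overlap:

```javascript
// Illustrative documents; _id 3 satisfies the first two clauses at once.
const docs = [
  { _id: 1, groups: [] },
  { _id: 2, groups: ["a"] },
  { _id: 3, groups: ["a", "b"] },
  { _id: 4 },
];
// Stand-ins for three $or clauses.
const clauses = [
  (d) => Array.isArray(d.groups) && d.groups.includes("a"),
  (d) => Array.isArray(d.groups) && d.groups.length === 2,
  (d) => !("groups" in d),
];

// Adding per-clause counts double-counts overlapping matches.
const naive = clauses.reduce((n, c) => n + docs.filter(c).length, 0);

// A correct count de-duplicates by document identity.
const seen = new Set();
for (const c of clauses) for (const d of docs) if (c(d)) seen.add(d._id);
const deduped = seen.size;
```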
So how to eliminate the fetch stage?
If you were to query with only the $in clause, or with only the $size clause, you should find that the count is derived from the index scan, without needing to load any documents.
That is, if you were to run these queries separately from the client and sum the results, you should find that the overall execution time is less than that of the query that requires fetching:
db.jobs.count({"status":"complete","groups":{"$size":0}})
db.jobs.count({"status":"complete","groups":{"$in":["5e65ffc2a1e6ef0007bc5fa8"]}})
For the {"groups":{"$exists":false}} predicate, you might modify the data slightly, for example by ensuring that the field always exists but assigning it a value that means "undefined", which can be indexed and queried.
As an example, if you were to run the following update, the groups field would then exist in all documents (note the multi option; without it, update() modifies only the first matching document):
db.jobs.update({"groups":{"$exists":false}},{"$set":{"groups":false}},{"multi":true})
And you could get the equivalent of the above count by running these 2 queries that should both be covered by an index scan, and should run faster together than the query that requires loading documents:
db.jobs.count({"status":"complete","groups":{"$size":0}})
db.jobs.count({"status":"complete","groups":{"$in":[false, "5e65ffc2a1e6ef0007bc5fa8"]}})
db.jobs.aggregate([
{$match: {"status":"complete", "$or":[
{"groups":{"$exists":false}},
{"groups":{"$in":["5e65ffc2a1e6ef0007bc5fa8"]}},
{"groups":{"$size":0}}
]}},
{$count: "total"}
])
If you can somehow avoid the empty array case, then the following query can be used: db.jobs.count({"status":"complete", "groups": { $in: [ null, "5e65ffc2a1e6ef0007bc5fa8" ] } })
Matching null covers the $exists: false case, since {"groups": null} matches documents where groups is missing as well as those where it is explicitly null.
Also, I'd suggest using ObjectId instead of string as the type for the groups field.
Update
$size never uses an index!
You can use the following query:
db.jobs.count({"status":"complete","$or":[
{"groups":[]},
{"groups": {$in: [ null, "5e65ffc2a1e6ef0007bc5fa8" ]}}
]})
This is my schema:
{
"_id" : ObjectId("5b726f066f8400317d55b9d7"),
"question" : ObjectId("5b726bf66f8400317d54ea79"),
"variableCollections" : [
{
"variableId" : ObjectId("5b726d746f8400317d553e9c"),
"variableCollectionId" : ObjectId("5b726d2e6f8400317d54feda")
}
]
}
This is the index on the collection (named test1, as used in the hint below):
{
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
}
When I run the following query, with or without the hint, the winningPlan always applies an $eq filter (in the FETCH stage on top of the IXSCAN), but it should use the IXSCAN directly without a filter.
db.getCollection('questionAnswers').find({
question: ObjectId('5b726bf66f8400317d54ea79'),
'variableCollections.variableId': ObjectId("5b726d746f8400317d553e9c"),
'variableCollections.variableCollectionId':ObjectId("5b726d2e6f8400317d54feda")
})
.hint("test1")
.explain({"verbosity":"allPlansExecution"})
The winningPlan is as follows:
{
"stage" : "FETCH",
"filter" : {
"variableCollections.variableId" : {
"$eq" : ObjectId("5b726d746f8400317d553e9c")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"question" : 1,
"variableCollections.variableCollectionId" : 1,
"variableCollections.variableId" : 1
},
"indexName" : "test1",
"isMultiKey" : true,
"multiKeyPaths" : {
"question" : [],
"variableCollections.variableCollectionId" : [
"variableCollections"
],
"variableCollections.variableId" : [
"variableCollections"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"question" : [
"[ObjectId('5b726bf66f8400317d54ea79'), ObjectId('5b726bf66f8400317d54ea79')]"
],
"variableCollections.variableCollectionId" : [
"[ObjectId('5b726d2e6f8400317d54feda'), ObjectId('5b726d2e6f8400317d54feda')]"
],
"variableCollections.variableId" : [
"[MinKey, MaxKey]"
]
}
}
}
How can I force MongoDB to use the IXSCAN without the $eq filter, to improve the performance of this query? Or is this already the best performance I can get?
As far as I know, MongoDB always uses the $eq operation internally for find operations, so I think you already have the best query plan. But don't create too many indexes, as that can make your queries slower.
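A likely reason for the remaining filter, sketched here rather than taken from the answer above: `variableCollections` is an array, and without `$elemMatch` the two dotted-path conditions may be satisfied by different array elements. The planner therefore cannot tighten the bounds on both array fields at once, and one condition is re-checked after the FETCH. The toy matcher below (hypothetical names, strings standing in for the ObjectIds) shows the semantic difference. If element-wise matching is what the query means, rewriting it as `{question: ..., variableCollections: {$elemMatch: {variableId: ..., variableCollectionId: ...}}}` may let the planner use tight bounds on both fields; confirm with explain.

```javascript
const doc = {
  question: "q1",
  variableCollections: [
    { variableId: "v1", variableCollectionId: "c2" },
    { variableId: "v2", variableCollectionId: "c1" },
  ],
};

// Plain dotted-path semantics: each condition may match a different element.
function matchesDotted(d, vid, cid) {
  return (
    d.variableCollections.some((e) => e.variableId === vid) &&
    d.variableCollections.some((e) => e.variableCollectionId === cid)
  );
}

// $elemMatch semantics: a single element must satisfy both conditions.
function matchesElemMatch(d, vid, cid) {
  return d.variableCollections.some(
    (e) => e.variableId === vid && e.variableCollectionId === cid
  );
}
```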
Setup Details:
mongos:
RAM: 8 GB, CPUs: 2
Config Servers (Replica set of 3 config servers):
RAM: 4 GB, CPUs: 2
Shard Cluster-1 (Replica set of 3 mongod):
RAM: 30 GB, CPUs: 4
Shard Cluster-2 (Replica set of 3 mongod):
RAM: 30 GB, CPUs: 4
Sharding:
Collection: rptDlp, Key: {incidentOn: "hashed"}
Description:
I have more than 15 million records in a collection.
I am retrieving the last page of documents, sorted by an indexed field of type date.
Actual Query:
db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)
If I execute this query directly against the mongo shard server (PRIMARY), it returns results in 14 seconds. But through mongos, it takes more than 2 minutes, and because of the query timeout my application shows an error prompt.
If this were network congestion, every query should take 2 minutes. But when I retrieve the documents for the first page, the results come back in a few seconds.
Explain query result (against mongos):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE_SORT",
"shards" : [
{
"shardName" : "rs0",
"connectionString" : "rs0/172.18.64.47:27017,172.18.64.48:27017,172.18.64.53:27017",
"serverInfo" : {
"host" : "UemCent7x64-70",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
{
"shardName" : "rs1",
"connectionString" : "rs1/172.18.64.54:27017",
"serverInfo" : {
"host" : "UemCent7x64-76",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
}
},
"rejectedPlans" : []
}
]
}
},
"ok" : 1.0
}
Explain query result (against mongo shard):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "UemCent7x64-69",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"ok" : 1.0
}
Any suggestion would be helpful, thanks in advance.
FINDINGS:
A skip() query behaves differently when executed through mongos (the router) than when executed directly on a shard.
When skip(n) and limit(m) are executed on a shard, it actually skips n records and returns only the m records specified in limit.
This is not possible through mongos, because the data may be divided across multiple shards, and an individual shard may contain fewer than n records (the number passed to skip).
Hence, instead of applying skip(n), mongos sends a limit(n+m) query to each shard, adding the skip count n to the limit count m, to collect all candidate records. After collecting the results from all shards, mongos applies the skip to the assembled records.
Also, if the data is large, mongos fetches it in batches using the getMore command, which further slows down performance.
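The mechanism described in these findings can be modelled in a few lines. This is a simplified sketch with hypothetical function names: shards are arrays of numbers, and mongos is modelled with a plain sort where the real router merges already-sorted streams. With the numbers from the question (skip 15610600, limit 10), each shard could have to send up to 15,610,610 documents to mongos.

```javascript
// Each shard sorts descending and applies limit(n + m); it cannot apply skip.
function shardQuery(shardDocs, n, m) {
  return [...shardDocs].sort((a, b) => b - a).slice(0, n + m);
}

// mongos gathers every shard's candidates, merges them, then skips and limits.
function mongosSkipLimit(shards, n, m) {
  const perShard = shards.map((s) => shardQuery(s, n, m));
  const transferred = perShard.reduce((t, r) => t + r.length, 0);
  const merged = perShard.flat().sort((a, b) => b - a);
  return { page: merged.slice(n, n + m), transferred };
}
```

The `transferred` count is what grows with the skip: the deeper the page, the more documents every shard ships to mongos.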
From the MongoDB documentation (https://docs.mongodb.com/v3.0/core/sharded-cluster-query-router/):
If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
Is there any way to improve the performance of skip queries executed via mongos (the routers)?
Thanks in advance.
If you have a hashed shard key, mongos won't be able to use range queries to determine which nodes each item is on, so the query needs to scan all nodes in the sharded cluster. The query will therefore take as long as the slowest node, plus the time to merge the results on mongos before sending them back to the client.
Using a hashed shard key scatters the data throughout the cluster, so a query can only be targeted at specific shards when it matches the shard key exactly.
Check out the documentation here - https://docs.mongodb.com/manual/core/hashed-sharding/
If you don't mind the query doing a full cluster scan, you could make it more efficient by adding a standard index on incidentOn; this will make the query much faster on each node, but it still won't be able to target specific nodes in the cluster.
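To illustrate why a hashed key forces the scatter described above, here is a toy model. The FNV-1a hash is a stand-in chosen for brevity, not the hash MongoDB uses, and the routing rule, shard count and date step are hypothetical.

```javascript
// Stand-in 32-bit FNV-1a hash (NOT the hash function MongoDB uses).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical routing rule: hash of the shard-key value modulo shard count.
function shardFor(date, shardCount) {
  return fnv1a(date.toISOString()) % shardCount;
}

// Collect every shard that holds at least one timestamp in [start, end).
function shardsForRange(start, end, stepMs, shardCount) {
  const hit = new Set();
  for (let t = start.getTime(); t < end.getTime(); t += stepMs) {
    hit.add(shardFor(new Date(t), shardCount));
  }
  return [...hit].sort((a, b) => a - b);
}
```

With hourly timestamps across the query's one-month range and two shards, the range maps to both shards, consistent with the explain above, where both shards appear under SHARD_MERGE_SORT.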
I have one collection with 3 million documents and the following indexes:
{ ts : 1 } , {u_id: 1}
Note that these are two separate ascending indexes, not a compound index.
When I run this query:
db.collection.find({u_id: 'user'}).sort({ts : -1}).skip(0).limit(1)
it takes over 100ms. I have the following logs:
2017-04-15T06:42:01.147+0000 I COMMAND [conn783] query db.collection query: { orderby: { ts: -1 }, $query: { u_id: "user-ki-id" } } planSummary: IXSCAN { u_id: 1 }, IXSCAN { u_id: 1 } ntoreturn:1 ntoskip:0 keysExamined:10795 docsExamined:10795 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:86 nreturned:1 reslen:771 locks:{ Global: { acquireCount: { r: 174 } }, Database: { acquireCount: { r: 87 } }, Collection: { acquireCount: { r: 87 } } } 246ms
A few notable points about the problem:
There is no other load on MongoDB, i.e. no other queries that take over 100ms.
This happens every minute; I store data every minute, so I think that is why it happens.
The query flow is to first run the read query (as above), then the next query is a bulk insertion. This flow is repeated every one minute.
So my questions are:
Why is it happening? Are there any design flaws in my indexing?
Might it be worthwhile to change indexing to be descending, like {ts: -1}? What is the actual difference between these indexes?
According to the MongoDB documentation, when a sort is performed, the results are read from disk rather than from memory. Does this explain why it takes over 100ms?
Can anybody explain the profiling log in detail?
Is it desired behaviour of MongoDB?
The same thing is also happening when I run a range search on this collection; this takes 3-5 seconds.
EDIT:
I have added only the {u_id: 1, ts: -1} index and removed all other indexes (except _id). The first execution of the query still takes over 100ms. This should not happen.
Query:
db.getCollection('locations') .find({u_id: "USR-WOWU"})
.sort({ts: -1}) .explain(true)
Output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db_name.collection_name",
"indexFilterSet" : false,
"parsedQuery" : {
"user_id" : {
"$eq" : "USR-WOWU"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"u_id" : 1.0,
"ts" : -1.0
},
"indexName" : "u_id_1_ts_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"u_id" : [
"[\"USR-WOWU\", \"USR-WOWU\"]"
],
"ts" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 164,
"executionTimeMillis" : 119,
"totalKeysExamined" : 164,
"totalDocsExamined" : 164,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 164,
"executionTimeMillisEstimate" : 120,
"works" : 165,
"advanced" : 164,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3,
"restoreState" : 3,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 164,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 164,
"executionTimeMillisEstimate" : 0,
"works" : 165,
"advanced" : 164,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3,
"restoreState" : 3,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"u_id" : 1.0,
"ts" : -1.0
},
"indexName" : "u_id_1_ts_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"u_id" : [
"[\"USR-WOWU\", \"USR-WOWU\"]"
],
"ts" : [
"[MaxKey, MinKey]"
]
},
"keysExamined" : 164,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : []
},
"serverInfo" : {
"host" : "manish",
"port" : 22022,
"version" : "3.2.13",
"gitVersion" : "23899209cad60aaafe114f6aea6cb83025ff51bc"
},
"ok" : 1.0 }
After the above query, running the same query again responds within ~2 ms. But when I do a few insertions, the same thing is repeated a minute later (the first query takes over 100ms, and subsequent ones take ~2ms).
Is something missing, or is some configuration required in my MongoDB?
Why is it happening?
The docsExamined:10795 and hasSortStage:1 portions of this log line indicate that the query is scanning 10,795 documents from disk and then sorting the results in memory. See the MongoDB documentation on log messages for a guide to interpreting log lines.
A performance improvement can likely be gained by indexing this query to avoid the in-memory sort.
For this query, you should try creating the index { 'u_id' : 1, 'ts' : -1 }.
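A toy model of why that index helps, with synthetic data and hypothetical function names: entries are kept in `u_id` ascending, `ts` descending order, so with `.limit(1)` the newest document for a user is the first key in its bounds and no sort stage is needed.

```javascript
// Model of the index { u_id: 1, ts: -1 }: sorted by u_id asc, then ts desc.
function buildIndex(docs) {
  return docs
    .map((d) => ({ u_id: d.u_id, ts: d.ts, doc: d }))
    .sort((a, b) => (a.u_id < b.u_id ? -1 : a.u_id > b.u_id ? 1 : b.ts - a.ts));
}

// Read keys for one user in index order and stop once `limit` are found.
function findLatest(index, uid, limit) {
  let examined = 0;
  const out = [];
  // A real B-tree seeks straight to the first key for `uid`; this linear
  // scan just skips out-of-bounds keys without counting them as examined.
  for (const k of index) {
    if (k.u_id !== uid) continue;
    examined++;
    out.push(k.doc);
    if (out.length === limit) break;
  }
  return { out, examined };
}
```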
Is it really worthwhile to change the index to descending order, like {ts: -1}?
Indexes can be read in either direction, so the index order isn't super important on single field indexes. However, sort ordering can be very important in compound indexes.
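The direction rule can be sketched as follows. `indexSupportsSort` is a hypothetical helper, and it ignores the case where leading index fields are matched by equality, which MongoDB also allows (for example, `{u_id: 1, ts: -1}` can serve `.sort({ts: 1})` when `u_id` is matched exactly).

```javascript
// The sort keys must be a prefix of the index keys, with every direction
// either identical (forward walk) or all inverted (backward walk).
function indexSupportsSort(indexKey, sortSpec) {
  const idx = Object.entries(indexKey);
  const srt = Object.entries(sortSpec);
  if (srt.length > idx.length) return false;
  let forward = true;
  let backward = true;
  srt.forEach(([field, dir], i) => {
    const [iField, iDir] = idx[i];
    if (field !== iField) forward = backward = false;
    if (dir !== iDir) forward = false;
    if (dir !== -iDir) backward = false;
  });
  return forward || backward;
}
```

So `{ts: 1}` and `{ts: -1}` behave the same for a single-field sort, while for a compound index such as `{u_id: 1, ts: 1}` the sort `{u_id: 1, ts: -1}` cannot be served by a plain forward or backward walk.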
Updated
Based on the explain plan, the query is now properly using the index to read the results from the index in order, which avoids the in-memory sort. It looks like this knocked ~100ms off the query.
However, it looks like this query is no longer using .skip(0).limit(1). Can you add these back in and see if performance improves?
There doesn't appear to be anything wrong with your deployment; this behavior seems normal for queries that are not fully indexed.
Re-running the exact same query will be quick because the existing results ("the working set") are already stored in memory. Inserting new data can make the results of the query change, meaning the results may need to be read back into memory again.
I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents, which is the cursor's batchSize, and then the cursor closes (setting its cursorId to 0) as if all documents had been returned.
If I take out the sorting, then I get all 1847 documents.
So my question is, why does it silently fail when using sorting with the $in operator?
EDIT
Using explain gives the following output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
},
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
],
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
},
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
],
"_id" : [
"[MinKey, MaxKey]"
],
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
]
}
}
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"levelno" : {
"$gte" : 10
}
},
{
"uid.$id" : {
"$in" : [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
]
}
}
}
]
},
"ok" : 1
}
What's happening is that this sorted query must be performed in memory because no index supports the sort, and in-memory sorts are limited to 32 MB. This behavior is documented here, with a JIRA about addressing this here.
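As a rough sanity check, assuming the default in-memory sort limit of 32 MB (33,554,432 bytes) and ignoring per-document sort overhead, the 1847 expected documents would only exceed that limit if they averaged about 18 KB each:

$$\frac{33\,554\,432\ \text{bytes}}{1847\ \text{documents}} \approx 18\,167\ \text{bytes per document} \approx 17.7\ \text{KiB}$$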
Furthermore, you can't define an index to support this query because you're sorting on a field that isn't part of the query, and neither of these cases applies:
If the sort keys correspond to the index keys or an index prefix, MongoDB can use the index to sort the query results. A prefix of a compound index is a subset that consists of one or more keys at the start of the index key pattern.
...
An index can support sort operations on a non-prefix subset of the index key pattern. To do so, the query must include equality conditions on all the prefix keys that precede the sort keys.
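The two quoted rules can be sketched in plain JavaScript and applied to the question's index. This is a simplified model under stated assumptions: indexSupportsSort is a hypothetical helper, sort directions are ignored, and $in or range predicates are treated as non-equality (the real query planner has extra optimizations this ignores).

```javascript
// Hypothetical helper modelling the two quoted rules (directions ignored).
// equalityFields lists the index fields the query matches by plain equality.
function indexSupportsSort(indexKeys, sortKeys, equalityFields) {
  const indexFields = Object.keys(indexKeys);
  const sortFields = Object.keys(sortKeys);
  // Locate the first sort key inside the index key pattern.
  const start = indexFields.indexOf(sortFields[0]);
  if (start === -1) return false;
  // The sort keys must appear consecutively, in order, from that position.
  for (let i = 0; i < sortFields.length; i++) {
    if (indexFields[start + i] !== sortFields[i]) return false;
  }
  // Every index key before the sort keys needs an equality condition.
  // When start is 0 this is vacuously true, which is the prefix rule.
  return indexFields.slice(0, start).every((f) => equalityFields.includes(f));
}
```

With the index { "uid.$id": 1, levelno: 1, _id: 1 } and no equality fields (the query uses $in and $gte), sorting on _id fails the check; if both uid.$id and levelno were matched by equality, the same index could supply the _id order.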
You should be able to work around the limitation by using the aggregation framework, which can be instructed via the allowDiskUse: true option to write temporary files for its pipeline stage outputs when required:
db.getCollection('logs').aggregate([
{$match: {'uid.$id': {
'$in': [
ObjectId("580e3397812de36b86d68c04"),
ObjectId("580e33a9812de36b86d68c0b"),
ObjectId("580e339a812de36b86d68c09"),
ObjectId("580e339a812de36b86d68c08"),
ObjectId("580e33a9812de36b86d68c0a"),
ObjectId("580e33bd812de36b86d68c11"),
ObjectId("580e33c0812de36b86d68c13")
]}, levelno: { '$gte': 10 }
}},
{$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the objsLeftInBatch() method to determine how many objects are left in the current batch, and iterate over them.
You can override the cursor's batch size and limit using cursor.batchSize(size) and cursor.limit(limit).
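To illustrate how batches relate to the full result set, here is a toy in-memory model in plain JavaScript. It is not the mongo shell cursor: ToyCursor is a hypothetical class that only mimics hasNext(), next(), and objsLeftInBatch() over an array, fetching batchSize items per simulated round trip and stopping after limit items (0 meaning no limit).

```javascript
// Toy model for illustration only; NOT the real mongo shell cursor.
class ToyCursor {
  constructor(docs, batchSize, limit = 0) {
    this.docs = limit > 0 ? docs.slice(0, limit) : docs; // apply the limit
    this.batchSize = batchSize;
    this.position = 0;   // index of the next doc on the simulated "server"
    this.batch = [];     // docs already delivered but not yet consumed
    this.roundTrips = 0; // number of simulated fetches
  }

  objsLeftInBatch() {
    return this.batch.length;
  }

  hasNext() {
    // When the local batch is empty, simulate fetching the next batch.
    if (this.batch.length === 0 && this.position < this.docs.length) {
      this.batch = this.docs.slice(this.position, this.position + this.batchSize);
      this.position += this.batch.length;
      this.roundTrips += 1;
    }
    return this.batch.length > 0;
  }

  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.batch.shift();
  }
}
```

Iterating 1847 items with a batch size of 1000 takes two simulated round trips and still yields all 1847: the batch size changes how many documents travel per round trip, not how many the cursor returns in total.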