For the below query:
db.restaurants.find({"_id" : ObjectId("5aabce4f827d70999ae5f5f7")}).explain()
I'm getting the below query plan:
/* 1 */
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.restaurants",
"indexFilterSet" : false,
"parsedQuery" : {
"_id" : {
"$eq" : ObjectId("5aabce4f827d70999ae5f5f7")
}
},
"winningPlan" : {
"stage" : "IDHACK"
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "CHNMCT136701L",
"port" : 27017,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1.0
}
Now I have some questions on my mind.
What is meant by stage: IDHACK? How is it different from COLLSCAN? Does this have anything to do with performance optimization? If yes, in what scenarios does MongoDB choose this winning plan? If I create an index on _id, will IDHACK be replaced by the corresponding index plan?
Can anybody clarify this?
https://jira.mongodb.org/browse/SERVER-16891
The query idhack path (a performance optimization to reduce planning/execution overhead for operations with query predicates of the form {_id: &lt;value&gt;})
COLLSCAN doesn't use an index. IDHACK uses the _id index.
Yes, it does have to do with performance optimization. Note that every collection already has a unique index on _id, so there is no need to create one; IDHACK uses that built-in index.
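Conceptually, IDHACK is a fast path that skips the multi-plan selection machinery entirely for `{_id: <value>}` equality queries, because the built-in unique `_id` index can answer them directly. A toy sketch of the idea in plain Node.js (not MongoDB internals; the string `_id`s and dispatch logic are invented for illustration):

```javascript
// Toy illustration of why an _id fast path beats both full plan
// selection and a collection scan. Not MongoDB code -- just the concept.

const docs = [
  { _id: "a1", name: "Blue Bottle" },
  { _id: "b2", name: "Katz's" },
  { _id: "c3", name: "Ichiran" },
];

// The _id "index": a direct value -> document lookup.
const idIndex = new Map(docs.map((d) => [d._id, d]));

function find(query) {
  const keys = Object.keys(query);
  // IDHACK-style fast path: exact-match query on _id alone.
  if (keys.length === 1 && keys[0] === "_id" && typeof query._id !== "object") {
    const doc = idIndex.get(query._id);
    return { stage: "IDHACK", results: doc ? [doc] : [] };
  }
  // Fallback: COLLSCAN examines every document.
  const results = docs.filter((d) => keys.every((k) => d[k] === query[k]));
  return { stage: "COLLSCAN", results };
}

console.log(find({ _id: "b2" }).stage);        // IDHACK
console.log(find({ name: "Ichiran" }).stage);  // COLLSCAN
```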
Related
I have a MongoDB 3.4 replica set with a collection "page" where all documents have a "site" field (an ObjectId). The "site" field has only 100 possible values. I have created an index on this field via db.page.createIndex({site: 1}). There are about 3.6 million documents in the "page" collection.
Now, I see logs like this in the mongod.log file
command db.page command: count { count: "page", query: { site: { $in:
[ ObjectId('A'), ObjectId('B'), ObjectId('C'), ObjectId('D'),
ObjectId('E'), ObjectId('F'), ObjectId('G'), ObjectId('H'),
ObjectId('I'), ObjectId('J'), ObjectId('K'), ObjectId('L') ] } } }
planSummary: IXSCAN { site: 1 } keysExamined:221888
docsExamined:221881 numYields:1786 reslen:44...
I don't understand the "keysExamined:221888" -> there are only 100 possible values, so my understanding would be that I would see keysExamined:100 at most, and here I would actually expect to see "keysExamined:12". What am I missing? For info, here is an explain on the request:
PRIMARY> db.page.explain().count({ site: { $in: [ ObjectId('A'), ObjectId('F'), ObjectId('H'), ObjectId('G'), ObjectId('I'), ObjectId('B'), ObjectId('C'), ObjectId('J'), ObjectId('K'), ObjectId('D'), ObjectId('E'), ObjectId('L') ] } } )
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.page",
"indexFilterSet" : false,
"parsedQuery" : {
"site" : {
"$in" : [
ObjectId("B"),
ObjectId("C"),
ObjectId("D"),
ObjectId("E"),
ObjectId("F"),
ObjectId("A"),
ObjectId("G"),
ObjectId("H"),
ObjectId("I"),
ObjectId("J"),
ObjectId("K"),
ObjectId("L")
]
}
},
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"site" : 1
},
"indexName" : "site_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"site" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"site" : [
"[ObjectId('B'), ObjectId('B')]",
"[ObjectId('C'), ObjectId('C')]",
"[ObjectId('D'), ObjectId('D')]",
"[ObjectId('E'), ObjectId('E')]",
"[ObjectId('F'), ObjectId('F')]",
"[ObjectId('A'), ObjectId('A')]",
"[ObjectId('G'), ObjectId('G')]",
"[ObjectId('H'), ObjectId('H')]",
"[ObjectId('I'), ObjectId('I')]",
"[ObjectId('J'), ObjectId('J')]",
"[ObjectId('K'), ObjectId('K')]",
"[ObjectId('L'), ObjectId('L')]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "9a18351b5211",
"port" : 27017,
"version" : "3.4.18",
"gitVersion" : "4410706bef6463369ea2f42399e9843903b31923"
},
"ok" : 1
}
PRIMARY>
I know we are on a fairly old MongoDB version and we are planning to upgrade soon to 5.0.X (via incremental upgrade to 3.6 / 4.0 / 4.2 / 4.4). Is there a fix in the next versions to your knowledge?
So after checking, I realized I was expecting MongoDB to use counted B-trees for its indexes, but that is not the case; MongoDB does indeed have to walk every index entry within the bounds. Details in jira.mongodb.org/plugins/servlet/mobile#issue/server-7745
Hence, at the moment, a count request runs in O(N) for N matching documents even when an index is used.
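To make the difference concrete: with a plain B-tree the server must visit every index entry inside the bounds to count them (one entry per document, so keysExamined tracks matching documents, not distinct values), whereas a counted B-tree maintains subtree counts and could answer per distinct value. A simplified simulation of the two costs (hypothetical structures, not MongoDB internals):

```javascript
// Simulate an index on "site": one index entry per document, so a
// value held by 1,000 documents contributes 1,000 index entries.
const sites = ["A", "B", "C"];
const index = []; // flat array standing in for B-tree leaf entries
for (const site of sites) {
  for (let i = 0; i < 1000; i++) index.push(site);
}

// Plain B-tree count: every entry inside the bounds is examined.
function countByScan(values) {
  let keysExamined = 0;
  for (const entry of index) {
    if (values.includes(entry)) keysExamined++;
  }
  return { count: keysExamined, keysExamined }; // O(matching entries)
}

// Counted B-tree (what SERVER-7745 asks for): per-key counts are
// maintained on write, so counting touches one node per distinct value.
const counted = new Map();
for (const entry of index) counted.set(entry, (counted.get(entry) || 0) + 1);

function countWithCounts(values) {
  let keysExamined = 0, count = 0;
  for (const v of values) {
    keysExamined++;
    count += counted.get(v) || 0;
  }
  return { count, keysExamined }; // O(distinct values)
}

console.log(countByScan(["A", "B"]));     // keysExamined: 2000
console.log(countWithCounts(["A", "B"])); // keysExamined: 2
```

This mirrors the log above: keysExamined ≈ 221,888 because each of the 12 `$in` values matches tens of thousands of documents, each contributing one index entry.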
I'm using MongoDB 4.4.3 to query a random record from a collection:
db.MyCollection.aggregate([{ $sample: { size: 1 } }])
This query takes 20s (when a find query takes 0.2s)
Mongo doc states :
If all the following conditions are met, $sample uses a pseudo-random cursor to select documents:
$sample is the first stage of the pipeline
N is less than 5% of the total documents in the collection
The collection contains more than 100 documents
Here
$sample is the only stage of the pipeline
N = 1
MyCollection contains 46 million documents
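For what it's worth, the documented conditions can be restated as a simple predicate, and they do hold in this case (a paraphrase of the docs, not MongoDB source):

```javascript
// Restating the documented $sample optimization conditions.
function usesPseudoRandomCursor(sampleIsFirstStage, n, totalDocs) {
  return (
    sampleIsFirstStage &&
    n < 0.05 * totalDocs && // N is less than 5% of the collection
    totalDocs > 100         // collection has more than 100 documents
  );
}

// The case in question: size 1 out of ~46 million documents.
console.log(usesPseudoRandomCursor(true, 1, 46000000)); // true
```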
This problem is similar to MongoDB Aggregation with $sample very slow, which does not provide an answer for MongoDB 4.4.3.
So why is this query so slow?
Details
Query Planner
db.MyCollection.aggregate([{$sample: {size: 1}}]).explain()
{
"stages" : [
{
"$cursor" : {
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "DATABASE.MyCollection",
"indexFilterSet" : false,
"winningPlan" : {
"stage" : "MULTI_ITERATOR"
},
"rejectedPlans" : [ ]
}
}
},
{
"$sampleFromRandomCursor" : {
"size" : 1
}
}
],
"serverInfo" : {
"host" : "mongodb4-3",
"port" : 27017,
"version" : "4.4.3",
"gitVersion" : "913d6b62acfbb344dde1b116f4161360acd8fd13"
},
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1611128334, 1),
"signature" : {
"hash" : BinData(0,"ZDxiOTnmG/zLKNtDIAWhxmjHHLM="),
"keyId" : 6915708270745223171
}
},
"operationTime" : Timestamp(1611128334, 1)
}
Execution stats
I found that a count query with two parameters was taking longer than expected on our production database. I added an index (which took a few hours, this collection has over 100 million documents) that had both fields, which improved the results of my .explain() call from IXSCAN to COUNT_SCAN.
Looking at the logs now, I see that there are still tons of IXSCAN planSummaries for this count query:
2019-07-17T13:02:34.561+0000 I COMMAND [conn25293] command
DatabaseName.CollectionName command: count { count: "CollectionName",
query: { userId: "5a4f82d4e4b09d5e0cdbae15", status: "FINISHED" } }
planSummary: IXSCAN { userId: 1 } keysExamined:299 docsExamined:299
numYields:7 reslen:44 locks:{ Global: { acquireCount: { r: 16 } },
Database: { acquireCount: { r: 8 } }, Collection: { acquireCount: { r:
8 } } } protocol:op_query 124ms
There is an index on the userId field, but I don't understand why this count query isn't hitting my new index. Here's the explain results:
db.CollectionName.explain().count({ userId: "59adb07de4b00782f7620c11", status: "FINISHED" })
/* 1 */
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "DatabaseName.CollectionName",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"status" : {
"$eq" : "FINISHED"
}
},
{
"userId" : {
"$eq" : "59adb07de4b00782f7620c11"
}
}
]
},
"winningPlan" : {
"stage" : "COUNT",
"inputStage" : {
"stage" : "COUNT_SCAN",
"keyPattern" : {
"userId" : 1.0,
"status" : 1.0
},
"indexName" : "idx_userId_status",
"isMultiKey" : false,
"multiKeyPaths" : {
"userId" : [],
"status" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"indexBounds" : {
"startKey" : {
"userId" : "59adb07de4b00782f7620c11",
"status" : "FINISHED"
},
"startKeyInclusive" : true,
"endKey" : {
"userId" : "59adb07de4b00782f7620c11",
"status" : "FINISHED"
},
"endKeyInclusive" : true
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "ip-10-114-1-8",
"port" : 27017,
"version" : "3.4.16",
"gitVersion" : "0d6a9242c11b99ddadcfb6e86a850b6ba487530a"
},
"ok" : 1.0
}
Checking the index stats, I see that it has been used quite a bit
{
"name" : "idx_userId_status",
"key" : {
"userId" : 1.0,
"status" : 1.0
},
"host" : "ip-address:27017",
"accesses" : {
"ops" : NumberLong(541337),
"since" : ISODate("2019-07-16T14:34:25.281Z")
}
}
I'm assuming that this means it is used sometimes, but for some reason not used at other times. Why would that be the case?
In my understanding of query planning in MongoDB, the database keeps a cache of query plans so it can choose the right one without re-planning every time.
My guess is that, in the IXSCAN case, the database decided that one plan or the other wouldn't make much of a difference.
That being said, you can still use explain(true) (or, more precisely, explain("allPlansExecution")), which will try to execute all candidate plans. If you then compare the executionTimeMillis of each, you might see a difference that explains the planner's choice.
Hope this helps :)
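The caching behavior described above can be sketched roughly like this: the first query with a given shape races the candidate plans, the winner is cached per query shape, and later queries with the same shape reuse it without re-planning (until the cache entry is evicted or replanning triggers). A toy model in Node.js, with made-up plan names and costs (not server code):

```javascript
// Toy model of a per-query-shape plan cache.
const planCache = new Map();
let trialsRun = 0;

// Candidate "plans" with invented costs for a { userId, status } count.
const candidates = [
  { name: "COUNT_SCAN { userId: 1, status: 1 }", cost: 10 },
  { name: "IXSCAN { userId: 1 }", cost: 12 },
];

function planFor(queryShape) {
  if (planCache.has(queryShape)) return planCache.get(queryShape);
  // First time this shape is seen: trial-run candidates, keep the winner.
  trialsRun++;
  const winner = candidates.reduce((a, b) => (a.cost <= b.cost ? a : b));
  planCache.set(queryShape, winner);
  return winner;
}

planFor("count:{userId,status}");
planFor("count:{userId,status}"); // cache hit: no second trial
console.log(trialsRun); // 1
```

If the costs measured during the trial run are close, a near-tie like this is exactly the situation where you may see either plan in the logs over time.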
I'm trying to figure out the best way to create an index mongo can use to make this query faster:
"query" : {
"deleted_at" : null,
"placed_at" : {
"$exists" : true
},
"exported_at" : null,
"failed_export" : false
}
Currently, it's taking 2-3 min to go through the collection even when there are no results. Explain shows that it's looking through hundreds of thousands of records and not using an index.
I tried running this:
db.some_table.createIndex({deleted_at: -1, placed_at: 1, exported_at: -1, failed_export: -1}, {background: true})
When I run the query afterward:
db.some_table.find({deleted_at: null, placed_at: {$exists: true}, exported_at: null, failed_export: false}).explain("executionStats")
I see it's using the correct index, but it's very slow. It's examining all 330494 rows. Here are the execution stats:
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 1585381,
"totalKeysExamined" : 330494,
"totalDocsExamined" : 330494,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"placed_at" : {
"$exists" : true
}
},
{
"deleted_at" : {
"$eq" : null
}
},
{
"exported_at" : {
"$eq" : null
}
},
{
"failed_export" : {
"$eq" : false
}
}
]
},
The winning plan was:
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"placed_at" : {
"$exists" : true
}
},
{
"deleted_at" : {
"$eq" : null
}
},
{
"exported_at" : {
"$eq" : null
}
},
{
"failed_export" : {
"$eq" : false
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"placed_at" : 1
},
"indexName" : "placed_at_1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"placed_at" : [
"[MinKey, MaxKey]"
]
}
}
},
And it did list the index I created in one of the rejected plans.
Any ideas on why it would be going through every record in the database? This is hurting our performance.
I've tried hinting at the right index and that didn't seem to help much.
Instead of querying for deleted_at: null, it would be better to create a new status field or an isDeleted field and configure your app servers to populate it. Then you can create a more effective index on this field to find all of your soft-deleted documents.
From the Performance Best Practices for MongoDB white paper:
Avoid negation in queries. Like most database systems, MongoDB does not index the absence of values and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table where 99% of the orders are complete to identify those that have not been fulfilled), all records will need to be scanned.
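The selectivity point can be made concrete with a small simulation: if 99% of documents have deleted_at equal to null, an index bound of [null, null] still covers 99% of the index entries, while a positive flag set only on the minority isolates it. This is illustrative only (invented proportions; field names follow the question):

```javascript
// 100,000 documents: 99% have deleted_at: null (i.e. not deleted).
const TOTAL = 100000;
const docs = Array.from({ length: TOTAL }, (_, i) => ({
  _id: i,
  deleted_at: i % 100 === 0 ? new Date(0) : null, // 1% soft-deleted
}));

// An index keeps one entry per document, keyed by the field's value
// (null included). Querying { deleted_at: null } must examine every
// null entry: ~99% of the index.
const nullEntries = docs.filter((d) => d.deleted_at === null).length;

// A positive, selective flag indexes the interesting minority instead.
const withFlag = docs.map((d) => ({ ...d, isDeleted: d.deleted_at !== null }));
const deletedEntries = withFlag.filter((d) => d.isDeleted).length;

console.log(nullEntries);    // 99000 keys examined for the null query
console.log(deletedEntries); // 1000 keys for { isDeleted: true }
```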
I just read this link, Mongodb Explain for Aggregation framework, but it doesn't explain my problem.
I want to retrieve information about an aggregation, the way db.coll.find({bla:foo}).explain() does for a query.
I tried
db.coll.aggregate([
my-op
],
{
explain: true
})
but the result is not an explain plan; it's the actual query results from the database.
I have tried also
db.runCommand({
aggregate: "mycoll",
pipeline: [
my-op
],
explain: true
})
This command returns information, but it doesn't include millis, nscannedObjects, etc.
I use MongoDB 2.6.2.
Aggregations don't run like traditional queries, and you can't run explain on them in the same way. They are actually classified as commands, and though they make use of indexes, you can't readily find out how they are being executed in real time.
Your best bet is to take the $match portion of your aggregation and run it as a query with explain to figure out how the indexes are performing and get an idea on nscanned.
I am not sure how you failed to get explain information. In 2.6.x this information is available, and you can explain your aggregation results:
db.orders.aggregate([
  // put your whole aggregation pipeline here
], {
explain: true
})
which gives me something like:
{
"stages" : [
{
"$cursor" : {
"query" : {
"a" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.a",
"indexFilterSet" : false,
"parsedQuery" : {
"a" : {
"$eq" : 1
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"a" : {
"$eq" : 1
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
}
],
"ok" : 1
}