I have approximately 40M documents in a MongoDB collection. There is an index on the location.country field:
MongoDB Enterprise cluster-0-shard-0:PRIMARY> db.cases.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_"
},
//...
{
"v" : 2,
"key" : {
"location.country" : -1
},
"name" : "countriesIdx",
"collation" : {
"locale" : "en_US",
"caseLevel" : false,
"caseFirst" : "off",
"strength" : 2,
"numericOrdering" : false,
"alternate" : "non-ignorable",
"maxVariable" : "punct",
"normalization" : false,
"backwards" : false,
"version" : "57.1"
}
},
//...
]
But queries don't use it:
MongoDB Enterprise cluster-0-shard-0:PRIMARY> db.cases.find({'location.country':'ghana'}).explain({verbosity: 'executionStats'})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "covid19.cases",
"indexFilterSet" : false,
"parsedQuery" : {
"location.country" : {
"$eq" : "ghana"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"location.country" : {
"$eq" : "ghana"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 195892,
"totalKeysExamined" : 0,
"totalDocsExamined" : 39264034,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"location.country" : {
"$eq" : "ghana"
}
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 99032,
"works" : 39264036,
"advanced" : 0,
"needTime" : 39264035,
"needYield" : 0,
"saveState" : 39503,
"restoreState" : 39503,
"isEOF" : 1,
"direction" : "forward",
"docsExamined" : 39264034
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "cluster-0-shard-00-01-vwhx6.mongodb.net",
"port" : 27017,
"version" : "4.4.8",
"gitVersion" : "83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0"
},
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1629732226, 1),
"signature" : {
"hash" : BinData(0,"piKWDwLDv7FRcnwCe51PZDLR4UM="),
"keyId" : NumberLong("6958739380580122625")
}
},
"operationTime" : Timestamp(1629732226, 1)
}
Do I need to set up the index differently, or do something else to get MongoDB to use it? I have tried hinting the index, but the query still does a COLLSCAN. The examples above use mongosh, but the behaviour is the same in my Node app using Mongoose.
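One possible cause, judging from the getIndexes() output above: countriesIdx was created with a collation (locale en_US, strength 2), and MongoDB only considers such an index for string comparisons when the query specifies the same collation. The sketch below is an assumption about this case, not a confirmed diagnosis. The helper collationFor is hypothetical and only copies the two non-default fields from the index definition.

```javascript
// Index definition abridged from the getIndexes() output above.
const countriesIdx = {
  name: "countriesIdx",
  key: { "location.country": -1 },
  collation: { locale: "en_US", strength: 2 },
};

// Hypothetical helper: build the collation document a query would need to
// pass in order to be eligible for this index on string comparisons.
function collationFor(index) {
  const { locale, strength } = index.collation;
  return { locale, strength };
}

const collation = collationFor(countriesIdx);
console.log(collation);

// In the shell, the query would then be run as (not executed here):
//   db.cases.find({ "location.country": "ghana" })
//     .collation(collation)
//     .explain("executionStats")
```

If the collation is the cause, the explain output for the query with .collation(...) should show an IXSCAN on countriesIdx instead of a COLLSCAN.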
I've been using MongoDB 3.2 for years, and I found that currentOp() and Cursor.explain() show different results for the same query.
Several queries are taking a very long time to execute (20+ seconds). I didn't think that was possible, because I tested them extensively and the fields are indexed. As far as I can see, the slow queries are all similar. I think they are locking the entire database, because when some queries slow down, 40-50 queries end up stuck in currentOp().
But when I took the same query from currentOp() and ran it in the shell, it ran very quickly, as intended.
When the database locks (I think it's locked), CPU utilization hits 100% for hours and my application becomes really slow. I monitor currentOp() every minute, and when a query doesn't finish within a few seconds, I have to restart the application, after which things return to normal.
Here's one of the queries that takes a very long time. Once this happens, 40-50 other similar queries also get stuck in currentOp().
{
"desc" : "conn32882",
"threadId" : "140677207643904",
"connectionId" : 32882,
"client" : "client",
"active" : true,
"opid" : 1374027609,
"secs_running" : 20,
"microsecs_running" : NumberLong(20560351),
"op" : "query",
"ns" : "db.collection",
"query" : {
"find" : "collection",
"filter" : {
"p" : {
"$gt" : 0
},
"type" : "canvas",
"id" : {
"$in" : [
576391,
570391,
767422
]
}
},
"sort" : {
"_id" : -1
},
"projection" : {
},
"limit" : 5000,
"returnKey" : false,
"showRecordId" : false
},
"numYields" : 2761,
"locks" : {
"Global" : "r",
"Database" : "r",
"Collection" : "r"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(5524)
},
"acquireWaitCount" : {
"r" : NumberLong(349)
},
"timeAcquiringMicros" : {
"r" : NumberLong(6613952)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(2762)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(2762)
}
}
}
}
And here's the output of the same query in the shell with the executionStats option.
Command:
db.canvasdatas.find({
"p" : {
"$gt": 0
},
"type": "canvas",
"id" : {
"$in": [
576391,
570391,
767422
]
}
}, {}).sort({ _id: -1 }).limit(5000).explain('executionStats');
Output:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"type" : {
"$eq" : "canvas"
}
},
{
"p" : {
"$gt" : 0
}
},
{
"id" : {
"$in" : [
570391,
576391,
767422
]
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : -1
},
"limitAmount" : 5000,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"type" : {
"$eq" : "canvas"
}
},
{
"p" : {
"$gt" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"id" : 1
},
"indexName" : "id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"id" : [
"[570391.0, 570391.0]",
"[576391.0, 576391.0]",
"[767422.0, 767422.0]"
]
}
}
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"_id" : -1
},
"limitAmount" : 5000,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"id" : {
"$in" : [
570391,
576391,
767422
]
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"p" : 1,
"type" : 1
},
"indexName" : "p_1_type_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"p" : [
"(0.0, inf.0]"
],
"type" : [
"[\"canvas\", \"canvas\"]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"_id" : -1
},
"limitAmount" : 5000,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"type" : {
"$eq" : "canvas"
}
},
{
"id" : {
"$in" : [
570391,
576391,
767422
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"p" : 1
},
"indexName" : "p_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"p" : [
"(0.0, inf.0]"
]
}
}
}
}
},
{
"stage" : "SORT",
"sortPattern" : {
"_id" : -1
},
"limitAmount" : 5000,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"p" : {
"$gt" : 0
}
},
{
"id" : {
"$in" : [
570391,
576391,
767422
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"type" : 1
},
"indexName" : "type_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"type" : [
"[\"canvas\", \"canvas\"]"
]
}
}
}
}
},
{
"stage" : "LIMIT",
"limitAmount" : 5000,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"type" : {
"$eq" : "canvas"
}
},
{
"p" : {
"$gt" : 0
}
},
{
"id" : {
"$in" : [
570391,
576391,
767422
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2,
"executionTimeMillis" : 0,
"totalKeysExamined" : 5,
"totalDocsExamined" : 2,
"executionStages" : {
"stage" : "SORT",
"nReturned" : 2,
"executionTimeMillisEstimate" : 0,
"works" : 10,
"advanced" : 2,
"needTime" : 6,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"_id" : -1
},
"memUsage" : 906,
"memLimit" : 33554432,
"limitAmount" : 5000,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 0,
"executionTimeMillisEstimate" : 0,
"works" : 6,
"advanced" : 0,
"needTime" : 3,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"type" : {
"$eq" : "canvas"
}
},
{
"p" : {
"$gt" : 0
}
}
]
},
"nReturned" : 2,
"executionTimeMillisEstimate" : 0,
"works" : 5,
"advanced" : 2,
"needTime" : 2,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 2,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 2,
"executionTimeMillisEstimate" : 0,
"works" : 5,
"advanced" : 2,
"needTime" : 2,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"id" : 1
},
"indexName" : "id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"id" : [
"[570391.0, 570391.0]",
"[576391.0, 576391.0]",
"[767422.0, 767422.0]"
]
},
"keysExamined" : 5,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
},
"serverInfo" : {
"host" : "host",
"port" : 27017,
"version" : "3.2.21",
"gitVersion" : ""
},
"ok" : 1
}
I searched for this unexpected behaviour but didn't find any solution, so I have to restart the server whenever it hangs.
To help with understanding, here are the details of my setup:
I'm using MongoDB Cloud Manager, and the DB instances are hosted on AWS EC2.
I'm using a replica set and my read preference is secondaryPreferred, so all read operations go to the secondary node.
The MongoDB version is 3.2.
I created a single-field index for every field used in the query.
I executed the same query on both the primary and the secondary node (with slaveOk).
The collection has 20M documents.
It doesn't happen every time for the same query. I think something else is affecting performance (such as replication?).
But I don't know how to debug this. Is there a better way to approach or debug this issue?
Thanks,
Edit: I still don't know the cause, but I tried changing things to work around it. I removed $gt and it seems to work. However, $gt caused no problem in my earlier runs, so I think it only works now because there are few users at the moment.
Edit: I have MongoDB Cloud Manager, so I could check some metrics. Query Targeting has increased, though I don't know why. Normally it's about 100 per document returned, but today it went over 2K. Could this be related?
(Chart: Query Targeting over 2 months)
I think this explains what you see:
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(5524)
},
"acquireWaitCount" : {
"r" : NumberLong(349)
},
"timeAcquiringMicros" : {
"r" : NumberLong(6613952)
}
},
It appears that the stalled read spends its time trying to acquire the read intent lock. This is expected in older versions of MongoDB (pre-4.0), where a read from a secondary waits while an oplog apply operation is in progress. This is done so that the secondary read does not see data in an inconsistent state while the oplog is being applied.
This is longstanding secondary read behaviour dating from the earliest MongoDB versions, and I guess you're seeing it now because your database has become busy enough for it to be an issue.
This situation was improved in MongoDB 4.0 and newer via SERVER-34192, which allows secondary reads to proceed while oplog apply is in progress.
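As a rough check of how much of the running time went to waiting for the lock, the numbers from the currentOp() document in the question can be compared directly. This is only a sketch: lockWaitShare is a hypothetical helper, and plain numbers stand in for the NumberLong values the shell prints.

```javascript
// Values copied from the currentOp() document in the question.
const op = {
  microsecs_running: 20560351,
  lockStats: {
    Global: {
      acquireCount: { r: 5524 },
      acquireWaitCount: { r: 349 },
      timeAcquiringMicros: { r: 6613952 },
    },
  },
};

// Hypothetical helper: fraction of the operation's running time that was
// spent waiting to acquire the Global read lock.
function lockWaitShare(currentOpDoc) {
  const waitedMicros = currentOpDoc.lockStats.Global.timeAcquiringMicros.r;
  return waitedMicros / currentOpDoc.microsecs_running;
}

const share = lockWaitShare(op);
console.log((share * 100).toFixed(1) + "% of running time spent acquiring the Global lock");
```

For this operation, about 6.6 of the 20.6 seconds (roughly a third) were spent acquiring the lock, across 349 waits.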
I am new to MongoDB and came across some strange behaviour in the aggregation framework.
I have a collection named 'billingData' with approximately 2M documents.
I am comparing two queries that give me the same output but have different execution times.
Query 1:
db.billingData.find().sort({"_id":-1}).skip(100000).limit(50)
Execution Plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 50,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 100000,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "ip-172-60-62-125",
"port" : 27017,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1.0
}
Query 2:
db.billingData.aggregate([
{$sort : {"_id":-1}},
{$skip:100000},
{$limit:50}
])
Execution Plan:
{
"stages" : [
{
"$cursor" : {
"query" : {},
"sort" : {
"_id" : -1
},
"limit" : NumberLong(100050),
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "billingDetails.billingData",
"indexFilterSet" : false,
"parsedQuery" : {},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"_id" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$skip" : NumberLong(100000)
}
],
"ok" : 1.0
}
I was expecting the same performance from the aggregation framework and the find query, but the find query returned results in 2 seconds while the aggregation took 16 seconds.
In both queries I am sorting the documents in descending order of _id and fetching 50 records after skipping 100,000.
Can someone explain why the aggregation framework behaves this way?
What can I do to make its performance similar to the find query's?
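The "limit" of 100050 in the aggregation plan above is consistent with MongoDB's documented pipeline optimization for a $sort + $skip + $limit sequence, in which the $limit is moved ahead of the $skip and enlarged by the skip amount. A minimal sketch of that arithmetic follows. pushedDownLimit is a hypothetical helper, not a MongoDB API, and it models only this one rewrite.

```javascript
// Pipeline from Query 2.
const pipeline = [
  { $sort: { _id: -1 } },
  { $skip: 100000 },
  { $limit: 50 },
];

// Hypothetical helper: for a pipeline containing $skip and $limit, return the
// limit that would be applied to the sorted cursor (skip + limit), or null
// when either stage is missing.
function pushedDownLimit(stages) {
  const skip = stages.find((s) => "$skip" in s);
  const limit = stages.find((s) => "$limit" in s);
  if (!skip || !limit) return null;
  return skip.$skip + limit.$limit;
}

console.log(pushedDownLimit(pipeline)); // 100050, the "limit" shown in the $cursor stage
```

Both plans therefore walk the _id index backwards over about 100,050 entries. The plans alone do not show where the extra time in the aggregation goes. Running both with executionStats would be the next step in comparing them.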
I have a list of news articles which I am tagging with entities and topic tags.
My query:
db["fmetadata"].find({'$and': [{'$text': {'$search': 'apple trump'}}, {'$or':
[{'entities': {'$elemMatch': {'$regex': 'apple|trump'}}}, {'tags': {'$elemMatch': {'$regex': 'apple|trump'}}}]}]}).explain()
Query plan:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "dfabric.fmetadata",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
{
"$text" : {
"$search" : "apple trump",
"$language" : "english",
"$caseSensitive" : false,
"$diacriticSensitive" : false
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"entities" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
},
{
"tags" : {
"$elemMatch" : {
"$regex" : "apple|trump"
}
}
}
]
},
"inputStage" : {
"stage" : "TEXT",
"indexPrefix" : {
},
"indexName" : "title_text_tags_text_entities_text",
"parsedTextQuery" : {
"terms" : [
"appl",
"trump"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"textIndexVersion" : 3,
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "title_text_tags_text_entities_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
}
}
]
}
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "fabric-dev",
"port" : 27017,
"version" : "4.0.2",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
In the plan, the stages under
["queryPlanner"]["winningPlan"]["inputStage"]["inputStage"]["inputStage"]["inputStage"]["inputStages"]
show:
"stage": "IXSCAN"
"direction": "backward"
Can someone please explain why the index is scanned backwards?
I was developing a pagination cursor using the > lastId plus limit technique. But since results are being returned backwards, I have to use < lastId, which seems counterintuitive.
If I don't sort my results in natural order, is it guaranteed that they will always be returned backwards?
Edit (as mentioned in the comment below):
My objective here is to understand why the index was scanned backwards. Is it the way I formulated my query, or something else entirely? Whether the order is forwards or backwards matters less than whether it is consistent: always forwards or always backwards.
I came across this question on Stack Overflow, and I believe the accepted answer, together with the comments below it, gives me the intuition I was looking for:
How does MongoDB sort records when no sort order is specified?
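Since the order of results is not guaranteed without an explicit sort, a lastId cursor is usually made deterministic by sorting on the same field that the lastId comparison uses. Below is a minimal sketch, assuming pagination on _id. nextPage is a hypothetical helper that only builds the arguments, and the number 42 stands in for a real _id value.

```javascript
// Hypothetical helper: build filter/sort/limit for the page after `lastId`.
// direction = 1 pages forwards with $gt, direction = -1 pages backwards with $lt.
function nextPage(baseFilter, lastId, pageSize, direction = 1) {
  const op = direction === 1 ? "$gt" : "$lt";
  const filter = lastId === null
    ? { ...baseFilter }                           // first page: no cursor yet
    : { ...baseFilter, _id: { [op]: lastId } };   // later pages: continue past lastId
  return { filter, sort: { _id: direction }, limit: pageSize };
}

const page = nextPage({ $text: { $search: "apple trump" } }, 42, 20);
console.log(JSON.stringify(page));

// In the shell this would be used as (not executed here):
//   db.fmetadata.find(page.filter).sort(page.sort).limit(page.limit)
```

With the explicit sort, the comparison operator follows from the chosen direction and no longer depends on how the index happens to be scanned.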
I created a collection with three fields, as shown below. I then created an index on the second field and ran a search using sort and hint.
Why does MongoDB choose SORT as the winningPlan, even though I hinted the index I created?
I believed that filtering the data by some criteria first and then sorting the result would be better, right?
Collection
> db.values.find()
{ "_id" : ObjectId("5763ffebe5a81f569b1005e5"), "field1" : "A", "field2" : "B", "field3" : "C" }
Indexes
> db.values.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "peftest.values"
},
{
"v" : 1,
"key" : {
"field2" : 1
},
"name" : "field2_1",
"ns" : "peftest.values"
}
]
Query and Explain
> db.values.find({field2:"B"}).sort({field1:1}).hint({field2:1}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "peftest.values",
"indexFilterSet" : false,
"parsedQuery" : {
"field2" : {
"$eq" : "B"
}
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"field1" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"field2" : 1
},
"indexName" : "field2_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"field2" : [
"[\"B\", \"B\"]"
]
}
}
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "apstrd14501d.intraservice.corp",
"port" : 27017,
"version" : "3.2.4",
"gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30"
},
"ok" : 1
}
I think the plan is what you expect, but you are looking at it from the wrong perspective :)
The innermost input stage of the SORT is an IXSCAN on field2_1, so the query plan uses the index first and then passes the matching documents to the sort.
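The SORT stage is still needed here because the field2_1 index holds no ordering information for field1. A compound index such as { field2: 1, field1: 1 } would normally let the index supply the order as well. The sketch below encodes that rule of thumb in a deliberately simplified form. indexCoversSort is a hypothetical helper, not MongoDB's planner, and it ignores multikey indexes, ranges and collations.

```javascript
// Simplified model: an index can supply the sort order when, after skipping
// leading keys that the query matches by equality, the next index keys are
// the sort keys in the same direction or in the fully reversed direction.
function indexCoversSort(indexKey, equalityFields, sort) {
  const keys = Object.entries(indexKey);
  let i = 0;
  while (i < keys.length && equalityFields.includes(keys[i][0])) i++;

  const sortKeys = Object.entries(sort);
  const rest = keys.slice(i, i + sortKeys.length);
  if (rest.length < sortKeys.length) return false;

  const sameNames = sortKeys.every(([field], j) => rest[j][0] === field);
  if (!sameNames) return false;

  const forward = sortKeys.every(([, dir], j) => rest[j][1] === dir);
  const backward = sortKeys.every(([, dir], j) => rest[j][1] === -dir);
  return forward || backward;
}

// The index from the question: the sort on field1 needs a separate SORT stage.
console.log(indexCoversSort({ field2: 1 }, ["field2"], { field1: 1 }));            // false
// An assumed compound index: the index order already matches the sort.
console.log(indexCoversSort({ field2: 1, field1: 1 }, ["field2"], { field1: 1 })); // true
```

In the shell, the assumed compound index would be created with db.values.createIndex({ field2: 1, field1: 1 }), after which explain() can be used to check whether the SORT stage disappears.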