I am trying to find the root cause of why a single-field update is taking 60+ seconds, but I have not been able to figure it out. It would be great if you could give me some direction on how to proceed.
Scenario: updating a non-indexed date field on a collection.
MongoDB Version : 3.0.12. Storage engine: WiredTiger
The query is using the proper index, and the scanned and modified counts show the same number of docs.
system.profile document:
{
    "_id" : ObjectId("5aa2e63b27001947449f1eed"),
    "op" : "update",
    "ns" : "abc.xyz",
    "query" : {
        "aId" : "5aa298dce4b0ef9feffe70e1"
    },
    "updateobj" : {
        "$set" : {
            "psdt" : ISODate("2018-03-09T17:36:31.277Z")
        }
    },
    "nscanned" : 20,
    "nscannedObjects" : 20,
    "nMatched" : 20,
    "nModified" : 20,
    "keyUpdates" : 0,
    "writeConflicts" : 0,
    "numYield" : 5,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(26),
                "w" : NumberLong(26)
            }
        },
        "Database" : {
            "acquireCount" : {
                "w" : NumberLong(26)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "w" : NumberLong(6)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "w" : NumberLong(20)
            }
        }
    },
    "millis" : 69765
}
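For direction: one way to narrow this down is to check which plan the update's filter selects, and to watch the operation live while it runs. A minimal sketch, reusing the namespace and filter from the profile entry above:

// Confirm which plan the update's filter uses (values taken from the
// profile entry above; adjust to a real aId).
db.xyz.find({ "aId" : "5aa298dce4b0ef9feffe70e1" }).explain("executionStats")

// While a slow update is in flight, inspect it live to see what it is
// waiting on (locks, yields, seconds running).
db.currentOp({ "op" : "update", "ns" : "abc.xyz" })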
Related
My MongoDB database has a document for each minute for each device. I have a query that searches for documents matching on the month field.
db.data.aggregate([
    {
        "$match": { dId: 14, month: 6 }
    },
    {
        "$project": {
            dayOfMonth_agg: { $dayOfMonth: "$ts" },
            month: "$month",
            year_agg: { $year: "$ts" },
            hour_agg: { $hour: "$ts" },
            a0: 1,
            a1: 1,
            d2: 1,
            f0: 1,
            dId: 1
        }
    },
    {
        "$match": { year_agg: 2017 }
    },
    {
        "$group": {
            _id: "$dayOfMonth_agg",
            totalEnergy: { $sum: { $multiply: [ { $subtract: ["$a1", "$a0"] }, "$f0" ] } },
            avg_d0: { $avg: "$a0" },
            avg_d1: { $avg: "$a1" },
            avg_d2: { $avg: "$d2" }
        }
    },
    {
        "$sort": { "_id": 1 }
    }
])
I wanted to optimise query performance, so I added a compound index:
db.data.createIndex({dId:1,month:1})
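For reference, whether the planner can use this index for the pipeline can be checked with explain; note that only the first $match (on dId and month) can use an index at all, because the later $match on year_agg runs against the $project output. A minimal sketch:

// Ask the planner how it treats the pipeline; look at the $cursor
// stage of the output for the chosen index.
db.data.aggregate(
    [ { "$match": { dId: 14, month: 6 } } ],
    { explain: true }
)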
Now, in the execution stats, docsExamined is the same as it was when only the single index on dId existed.
Exec stats with the single index on dId:
"keysExamined" : 103251,
"docsExamined" : 103251,
"hasSortStage" : true,
"cursorExhausted" : true,
"numYield" : 814,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(1654)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(827)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(826)
}
}
},
"nreturned" : 8,
"responseLength" : 770,
"protocol" : "op_command",
"millis" : 656,
"planSummary" : "IXSCAN { dId: 1 }",
"ts" : ISODate("2017-11-15T08:41:50.782Z"),
Exec stats with the compound index (dId and month):
"keysExamined" : 103251,
"docsExamined" : 103251,
"hasSortStage" : true,
"cursorExhausted" : true,
"numYield" : 810,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(1646)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(823)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(822)
}
}
},
"nreturned" : 8,
"responseLength" : 770,
"protocol" : "op_command",
"millis" : 678,
"planSummary" : "IXSCAN { dId: 1, month: 1 }",
"ts" : ISODate("2017-11-15T08:45:38.536Z"),
"client" : "127.0.0.1",
"appName" : "MongoDB Shell",
"allUsers" : [ ],
"user" : ""
To my understanding, docsExamined should have decreased compared with the single index.
Why is that?
Edit: please refer to the first comment for the answer.
I use MongoDB to manage device log data. Right now the collection has over one million documents, and each document contains more than 30 fields, including embedded fields. Inserting new documents has become really slow: each insert takes more than 1000 ms. From the slow query ops, I get logs like this:
{
    "op" : "insert",
    "ns" : "xxx.LogDeviceReport",
    "query" : {
        "_id" : ObjectId("xxxx"),
        "deviceId" : ObjectId("xxxx"),
        "en" : "xxxxxx",
        "status" : 1,
        'other fields, more than 30 fields...'
        ...
        ...
    },
    "ninserted" : 1,
    "keyUpdates" : 0,
    "writeConflicts" : 0,
    "numYield" : 0,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "MMAPV1Journal" : {
            "acquireCount" : {
                "w" : NumberLong(3)
            }
        },
        "Database" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "W" : NumberLong(1)
            },
            "acquireWaitCount" : {
                "W" : NumberLong(1)
            },
            "timeAcquiringMicros" : {
                "W" : NumberLong(1477481)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "w" : NumberLong(1)
            }
        }
    },
    "millis" : 977,
    "execStats" : {
    },
    "ts" : ISODate("2016-08-02T22:01:01.270Z"),
    "client" : "xxx.xxx.xxxx",
    "allUsers" : [
        {
            "user" : "xxx",
            "db" : "xxx"
        }
    ],
    "user" : "xx@xx"
}
I checked the indexes, like this:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "xxx.LogDeviceReport"
    },
    {
        "v" : 1,
        "key" : {
            "time" : 1
        },
        "name" : "time_1",
        "ns" : "xxx.LogDeviceReport",
        "expireAfterSeconds" : 604800,
        "background" : true
    }
]
There is only the _id index and a TTL index on time; no other indexes.
I suspect the 'query' part slows the operation down. The MongoDB documentation says that only _id is checked for uniqueness, yet the log lists every field under 'query'; does that matter?
If that is not the reason, what makes it so slow? Can anyone help me?
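One pointer from the log itself: the profiler records how long the insert waited just to acquire the collection write lock, and the entry above waited about 1.48 s (timeAcquiringMicros.W = 1477481). A minimal sketch for finding other ops with long lock-acquisition waits, assuming profiling is enabled:

// Find profiled ops that waited more than 1 s to acquire the
// collection write lock, newest first.
db.system.profile.find({
    "locks.Collection.timeAcquiringMicros.W" : { $gt: 1000000 }
}).sort({ ts: -1 }).limit(10)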
If you are using MongoDB 3.0+, you can consider using WiredTiger as the storage engine instead of MMAPv1, which is what is being used in your case.
I personally saw a 4x improvement when inserting up to 156,000 documents in a single go: MMAPv1 took around 40 minutes, while the same task completed in 10 minutes after switching to WiredTiger.
Please check this link from the MongoDB blog for more information.
Note: this applies only to MongoDB 3.0+.
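A quick way to confirm which storage engine a deployment is actually running; a minimal sketch:

// serverStatus reports the active storage engine (MongoDB 3.0+).
db.serverStatus().storageEngine
// e.g. { "name" : "mmapv1", ... } or { "name" : "wiredTiger", ... }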
Looking at different MongoDB operations, I found that MongoDB 3.0 with the WiredTiger engine generates many different locks, sometimes even escalating to a global lock, like the following:
{
    "desc" : "conn4981286",
    "threadId" : "0x113f133400",
    "connectionId" : 4981286,
    "opid" : -371204669,
    "active" : true,
    "secs_running" : 0,
    "microsecs_running" : NumberLong(377),
    "op" : "update",
    "ns" : "xxx",
    "query" : {
        "_id" : ObjectId("56c82738ccb2d079eec4867c")
    },
    "client" : "xxx",
    "numYields" : 0,
    "locks" : {
        "Global" : "w",
        "local" : "w",
        "Database" : "w",
        "Collection" : "w"
    },
    "waitingForLock" : false,
    "lockStats" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(2),
                "w" : NumberLong(2)
            }
        },
        "Database" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "w" : NumberLong(1)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "w" : NumberLong(1)
            }
        }
    }
}
Why would an update generate all of these locks?
Thanks!
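For what it's worth, per the MongoDB concurrency FAQ the lowercase modes in that output are intent locks (r = intent shared, w = intent exclusive) taken at the Global and Database levels before the collection lock is acquired; only uppercase R/W are true shared/exclusive locks, so the update above is not actually holding an exclusive global lock. A minimal sketch for watching lock modes on live operations:

// List in-progress ops and their lock modes; lowercase r/w are
// intent locks, uppercase R/W are shared/exclusive.
db.currentOp(true).inprog.forEach(function (op) {
    if (op.locks && Object.keys(op.locks).length > 0) {
        print(op.opid + " " + op.op + " " + JSON.stringify(op.locks));
    }
});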
We see a lot of slow queries in the mongo logs like the one below (with the pipeline op mergeCursors). We have a sharded mongo deployment with 2 shards, each with only a primary. What is the mergeCursors command? Please let me know if any other information is required.
{
    "_id" : ObjectId("5571b739f65f7e64bb806362"),
    "op" : "command",
    "ns" : "mongrel.$cmd",
    "command" : {
        "aggregate" : "collection1",
        "pipeline" : [
            {
                "$mergeCursors" : [
                    {
                        "host" : "endpoint:27005",
                        "id" : NumberLong(82775337156)
                    }
                ]
            }
        ]
    },
    "keyUpdates" : 0,
    "numYield" : 0,
    "lockStats" : {
        "timeLockedMicros" : {
            "r" : NumberLong(12),
            "w" : NumberLong(0)
        },
        "timeAcquiringMicros" : {
            "r" : NumberLong(2),
            "w" : NumberLong(2680)
        }
    },
    "responseLength" : 12312,
    "millis" : 6142,
    "execStats" : {},
    "ts" : ISODate("2015-06-05T12:35:40.801Z"),
    "client" : "10.167.212.83",
    "allUsers" : [],
    "user" : ""
}
I was recently reading this post (http://dbattish.tumblr.com/post/108652372056/joins-in-mongodb), which seems to say that it is an internal aggregation command used to merge query results across shards.
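That matches the profile entry above: mongos opens a cursor on each shard and a $mergeCursors stage combines the per-shard results. The split can be observed by explaining an aggregate against the sharded collection; a minimal sketch:

// On a sharded collection, explain shows how the pipeline is split
// between the shards and the merging node.
db.collection1.aggregate(
    [ { "$match" : {} } ],
    { explain: true }
)
// Look for the per-shard pipelines (a "splitPipeline"/"shards"
// breakdown) in the output.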
Suppose we have the following document:
{
    embedded: [
        {
            email: "abc@abc.com",
            active: true
        },
        {
            email: "def@abc.com",
            active: false
        }
    ]
}
What index should be used to support an $elemMatch query on the email and active fields of the embedded documents?
Update on the question:
db.foo.aggregate([
    { "$match": { "embedded": { "$elemMatch": { "email": "abc@abc.com", "active": true } } } },
    { "$group": { _id: null, "total": { "$sum": 1 } } }
], { explain: true });
Querying this, I get the following explain output for the aggregate:
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "embedded" : {
                        "$elemMatch" : {
                            "email" : "abc@abc.com",
                            "active" : true
                        }
                    }
                },
                "fields" : {
                    "_id" : 0,
                    "$noFieldsNeeded" : 1
                },
                "planError" : "InternalError No plan available to provide stats"
            }
        },
        {
            "$group" : {
                "_id" : {
                    "$const" : null
                },
                "total" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }
    ],
    "ok" : 1
}
I think MongoDB is internally not using the index for this query.
Thanks in advance :)
Update with the output of db.foo.stats():
db.foo.stats()
{
    "ns" : "test.foo",
    "count" : 2,
    "size" : 480,
    "avgObjSize" : 240,
    "storageSize" : 8192,
    "numExtents" : 1,
    "nindexes" : 3,
    "lastExtentSize" : 8192,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 24528,
    "indexSizes" : {
        "_id_" : 8176,
        "embedded.email_1_embedded.active_1" : 8176,
        "name_1" : 8176
    },
    "ok" : 1
}
db.foo.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.foo"
    },
    {
        "v" : 1,
        "key" : {
            "embedded.email" : 1,
            "embedded.active" : 1
        },
        "name" : "embedded.email_1_embedded.active_1",
        "ns" : "test.foo"
    },
    {
        "v" : 1,
        "key" : {
            "name" : 1
        },
        "name" : "name_1",
        "ns" : "test.foo"
    }
]
Should you decide to stick to that data model and your queries, here's how to create indexes that match the query:
You can simply index "embedded.email", or use a compound key on the embedded fields, i.e. something like
> db.foo.ensureIndex({"embedded.email" : 1 });
- or -
> db.foo.ensureIndex({"embedded.email" : 1, "embedded.active" : 1});
Indexing boolean fields is often not too useful, since their selectivity is low.
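Whichever of the two you choose, it may be worth verifying that the $elemMatch predicate actually selects the index; a minimal sketch:

// Check the winning plan for the $elemMatch filter; an IXSCAN on
// embedded.email_1_embedded.active_1 confirms the multikey index is used.
db.foo.find({
    embedded: { $elemMatch: { email: "abc@abc.com", active: true } }
}).explain("executionStats")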