execStats is always empty in MongoDB "aggregate" command profiling results

I am trying to profile the performance of an aggregation pipeline, specifically checking whether indices are used, how many objects are scanned, etc.
I'm setting the DB to full profiling:
db.setProfilingLevel(2)
But then, in the db's 'system.profile' collection, the execStats field in the result record for the aggregation command is always empty.
Here is the full result for the command:
{
"op" : "command",
"ns" : "mydb.$cmd",
"command" : {
"aggregate" : "mycolection",
"pipeline" : [{
"$match" : {
"date" : {
"$gte" : "2013-11-26"
}
}
}, {
"$sort" : {
"user_id" : 1
}
}, {
"$project" : {
"user_id" : 1,
"_id" : 0
}
}, {
"$group" : {
"_id" : "$user_id",
"agg_val" : {
"$sum" : 1
}
}
}],
"allowDiskUse" : true
},
"keyUpdates" : 0,
"numYield" : 16,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(3143653),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(140),
"w" : NumberLong(3)
}
},
"responseLength" : 4990,
"millis" : 3237,
"execStats" : { },
"ts" : ISODate("2014-11-26T16:20:59.576Z"),
"client" : "127.0.0.1",
"allUsers" : [],
"user" : ""
}

Support for execStats in the aggregate command's profiling output was added in MongoDB 3.4.
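
On a version that supports it, you can also get execution statistics directly from the shell instead of going through the profiler, by using the explain helper on the same pipeline. A minimal sketch, assuming a reasonably recent server and shell and the collection/pipeline from the question (the exact output shape depends on the version):

db.mycolection.explain("executionStats").aggregate([
    // same pipeline as in the profiled command
    { $match: { date: { $gte: "2013-11-26" } } },
    { $sort: { user_id: 1 } },
    { $project: { user_id: 1, _id: 0 } },
    { $group: { _id: "$user_id", agg_val: { $sum: 1 } } }
], { allowDiskUse: true })

The executionStats section of the result shows which index (if any) was used by the initial $match/$sort and how many documents and index keys were examined.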

Related

MongoDB $group + $sum aggregation is very slow

I have an aggregation query in MongoDB:
[{
$group: {
_id: '$status',
status: {
$sum: 1
}
}
}]
It is running on a collection that has ~80 million documents. The status field is indexed, yet the query is very slow and runs for around 60 seconds or more.
I did an explain() on the query, but still got almost nowhere:
{
"explainVersion" : "1",
"stages" : [
{
"$cursor" : {
"queryPlanner" : {
"namespace" : "loa.document",
"indexFilterSet" : false,
"parsedQuery" : {
},
"queryHash" : "B9878693",
"planCacheKey" : "8EAA28C6",
"maxIndexedOrSolutionsReached" : false,
"maxIndexedAndSolutionsReached" : false,
"maxScansToExplodeReached" : false,
"winningPlan" : {
"stage" : "PROJECTION_SIMPLE",
"transformBy" : {
"status" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "COLLSCAN",
"direction" : "forward"
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : "$status",
"status" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"serverInfo" : {
"host" : "rack-compute-2",
"port" : 27017,
"version" : "5.0.6",
"gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
},
"serverParameters" : {
"internalQueryFacetBufferSizeBytes" : 104857600,
"internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
"internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
"internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
"internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
"internalQueryProhibitBlockingMergeOnMongoS" : 0,
"internalQueryMaxAddToSetBytes" : 104857600,
"internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
},
"command" : {
"aggregate" : "document",
"pipeline" : [
{
"$group" : {
"_id" : "$status",
"status" : {
"$sum" : 1
}
}
}
],
"explain" : true,
"cursor" : {
},
"lsid" : {
"id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
},
"$db" : "loa"
},
"ok" : 1
}
I see that it does a full COLLSCAN; I just can't understand why.
I plan on supporting a couple hundred million (or even a billion) documents in that collection, but this problem hijacks my plans for seemingly no reason.
You can advise the query planner to use the index as follows:
db.test.explain("executionStats").aggregate(
[
{$group:{ _id:"$status" ,status:{$sum:1} }}
],
{hint:"status_1"}
)
Make sure the index name in the hint is the same as the one that was created
(db.test.getIndexes() will show you the exact index name).
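
For reference, the index name used in the hint above is the default name MongoDB assigns when a single-field index is created on status (field name and direction joined by an underscore). A minimal sketch, using the document collection from the explain output above:

// Create the single-field index; its default name will be "status_1"
db.document.createIndex({ status: 1 })

// Confirm the exact name before using it in a hint
db.document.getIndexes()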

mongodb find query getmore operation hangs

We are running MongoDB 3.0.2 on Linux.
The "getmore" operation for a find query periodically hangs.
The operation is visible in currentOp(), and one thing I cannot explain is that all the acquireCount values in lockStats (for Global, MMAPV1Journal, Database, and Collection) continue to grow while the query hangs.
This is the operation in question, at this point running for more than 1000 seconds, as returned by db.currentOp():
{
"desc" : "conn60",
"threadId" : "0x2a99ee0",
"connectionId" : 60,
"opid" : 67792,
"active" : true,
"secs_running" : 1098,
"microsecs_running" : NumberLong(1098289543),
"op" : "getmore",
"ns" : "dbName.collectionName",
"query" : {
"d" : {
"$gt" : ISODate("2016-03-13T18:00:00.261Z"),
"$lt" : ISODate("2016-03-14T22:45:17.718Z")
},
"cc" : "US",
"dc" : {
"$in" : [
"26",
"31",
"17",
"29",
"35"
]
},
"pr" : {
"$gte" : 4
}
},
"client" : "10.0.0.111:33670",
"numYields" : 317557,
"locks" : {
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(635114)
}
},
"MMAPV1Journal" : {
"acquireCount" : {
"r" : NumberLong(317557)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(317557)
}
},
"Collection" : {
"acquireCount" : {
"R" : NumberLong(317557)
}
}
}
}
Any insights would be much appreciated! Thank you in advance.
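
For reference while investigating: db.currentOp() accepts a filter document, and a long-running operation can be terminated by its opid. A sketch based on the output above (use killOp with care on a production system):

// List getmore operations that have been running for more than 60 seconds
db.currentOp({ op: "getmore", secs_running: { $gt: 60 } })

// Terminate a specific operation by its opid (67792 in the output above)
db.killOp(67792)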

What is mongo $mergeCursors?

We see a lot of slow queries like the one below in the mongod logs (with the pipeline op mergeCursors). We have a sharded MongoDB deployment with 2 shards, each with only a primary. What is the mergeCursors command? Please let me know if any other information is required.
{
"_id" : ObjectId("5571b739f65f7e64bb806362"),
"op" : "command",
"ns" : "mongrel.$cmd",
"command" : {
"aggregate" : "collection1",
"pipeline" : [
{
"$mergeCursors" : [
{
"host" : "endpoint:27005",
"id" : NumberLong(82775337156)
}
]
}
]
},
"keyUpdates" : 0,
"numYield" : 0,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(12),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(2),
"w" : NumberLong(2680)
}
},
"responseLength" : 12312,
"millis" : 6142,
"execStats" : {},
"ts" : ISODate("2015-06-05T12:35:40.801Z"),
"client" : "10.167.212.83",
"allUsers" : [],
"user" : ""
}
I was recently reading this post (http://dbattish.tumblr.com/post/108652372056/joins-in-mongodb), which seems to say that it is an internal aggregation command used to merge query results across shards.
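
To see which part of a pipeline mongos sends to the shards and which part it keeps for the merge step, you can run explain on the aggregation through mongos. A minimal sketch with a trivial pipeline (the exact field names, such as splitPipeline, vary between versions, so treat this as an illustration):

// Run against mongos; on a sharded collection the explain output typically
// contains a splitPipeline section with the shards' part and the merger's part
db.collection1.aggregate([ { $sort: { _id: 1 } } ], { explain: true })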

What index should be added in MongoDB to support an $elemMatch query on an embedded document

Suppose we have the following document:
{
embedded:[
{
email:"abc#abc.com",
active:true
},
{
email:"def#abc.com",
active:false
}]
}
What index should be used to support an $elemMatch query on the email and active fields of the embedded documents?
Update on the question:
db.foo.aggregate([{"$match":{"embedded":{"$elemMatch":{"email":"abc#abc.com","active":true}}}},{"$group":{_id:null,"total":{"$sum":1}}}],{explain:true});
Running this query gives the following explain output for the aggregate:
{
"stages" : [
{
"$cursor" : {
"query" : {
"embedded" : {
"$elemMatch" : {
"email" : "abc#abc.com",
"active" : true
}
}
},
"fields" : {
"_id" : 0,
"$noFieldsNeeded" : 1
},
"planError" : "InternalError No plan available to provide stats"
}
},
{
"$group" : {
"_id" : {
"$const" : null
},
"total" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
I think MongoDB is not using the index for this query internally.
Thanks in advance :)
Update with the output of db.foo.stats():
db.foo.stats()
{
"ns" : "test.foo",
"count" : 2,
"size" : 480,
"avgObjSize" : 240,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 3,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 24528,
"indexSizes" : {
"_id_" : 8176,
"embedded.email_1_embedded.active_1" : 8176,
"name_1" : 8176
},
"ok" : 1
}
db.foo.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.foo"
},
{
"v" : 1,
"key" : {
"embedded.email" : 1,
"embedded.active" : 1
},
"name" : "embedded.email_1_embedded.active_1",
"ns" : "test.foo"
},
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.foo"
}
]
Should you decide to stick with that data model and those queries, here's how to create indexes that match the query:
You can simply index "embedded.email", or use a compound index on the embedded fields, e.g. something like:
> db.foo.ensureIndex({"embedded.email" : 1 });
- or -
> db.foo.ensureIndex({"embedded.email" : 1, "embedded.active" : 1});
Indexing boolean fields is often not too useful, since their selectivity is low.
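
To check that the compound index is actually being picked up, you can run the equivalent find() with explain; with an $elemMatch on both fields the winning plan should use the index rather than a collection scan. A sketch against the foo collection from the question:

db.foo.find({
    embedded: { $elemMatch: { email: "abc#abc.com", active: true } }
}).explain()
// Look for a BtreeCursor / IXSCAN stage on embedded.email_1_embedded.active_1
// (the exact stage name depends on the server version) rather than a collection scan.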

Aggregation framework performance on a 10M collection

I have a collection of 10M documents that is a pre-aggregation of daily events.
A simple $group took more than 8 seconds; is this performance normal?
Some data from the profiler:
{
"op" : "command",
"ns" : "analytics.$cmd",
"command" : {
"aggregate" : "aggregation",
"pipeline" : [
{
"$group" : {
"_id" : "",
"hits" : {
"$sum" : "$hits"
}
}
}
]
},
"ntoreturn" : 1,
"keyUpdates" : 0,
"numYield" : 15,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(17169805),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(8582619),
"w" : NumberLong(294)
}
},
"responseLength" : 78,
"millis" : 8594,
"ts" : ISODate("2013-12-04T15:57:38.217Z"),
"client" : "127.0.0.1",
"allUsers" : [ ],
"user" : ""
}
Here is a single document:
{
"_id" : ObjectId("529e21ee67e807418500daeb"),
"date" : ISODate("2012-09-19T00:00:00Z"),
"hits" : 1,
"infos" : {
"sourceValue" : NumberLong(1),
"eventType" : "createUser",
"sourceType" : "user",
"instance" : "xxx",
"targetType" : "user",
"targetValue" : NumberLong(15)
}
}
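
For reference, the same pipeline can be run with the explain option (available on MongoDB 2.6 and newer) to see which plan is chosen and whether the whole collection is scanned. A sketch, assuming the aggregation collection from the profile above:

db.aggregation.aggregate([
    { $group: { _id: "", hits: { $sum: "$hits" } } }
], { explain: true })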