MongoDB aggregation explain provides data only about first stages - mongodb

I'm running the following aggregation query on a test database
db.restaurants.explain().aggregate([
{$match: {"address.zipcode": {$in: ["10314", "11208", "11219"]}}},
{$match: {"grades": {$elemMatch: {score: {$gte: 1}}}}},
{$group: {_id: "$borough", count: {$sum: 1} }},
{$sort: {count: -1} }
]);
According to the MongoDB documentation, it should return a cursor that I can iterate to see data about all pipeline stages:
The operation returns a cursor with the document that contains detailed information regarding the processing of the aggregation pipeline.
However, the aggregation command returns explain info only about the first two $match stages:
{
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"address.zipcode" : {
"$in" : [
"10314",
"11208",
"11219"
]
}
},
{
"grades" : {
"$elemMatch" : {
"score" : {
"$gte" : 1.0
}
}
}
}
]
},
"fields" : {
"borough" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.restaurants",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"grades" : {
"$elemMatch" : {
"score" : {
"$gte" : 1.0
}
}
}
},
{
"address.zipcode" : {
"$in" : [
"10314",
"11208",
"11219"
]
}
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"grades" : {
"$elemMatch" : {
"score" : {
"$gte" : 1.0
}
}
}
},
{
"address.zipcode" : {
"$in" : [
"10314",
"11208",
"11219"
]
}
}
]
},
"direction" : "forward"
},
"rejectedPlans" : []
}
}
},
{
"$group" : {
"_id" : "$borough",
"count" : {
"$sum" : {
"$const" : 1.0
}
}
}
},
{
"$sort" : {
"sortKey" : {
"count" : -1
}
}
}
],
"ok" : 1.0
}
And the returned object does not look like a cursor at all.
If I save the aggregation result to a variable and then try to iterate through it using cursor methods (hasNext(), next(), etc.), I get the following:
TypeError: result.next is not a function : #(shell):1:1
How can I see info on all pipeline steps?
Thanks

1. Explain info
explain() returns the winning plan of a query, i.e. how the database fetches the documents before processing them in the pipeline.
Here, because address.zipcode and grades aren't indexed, the database performs a COLLSCAN, i.e. it iterates over all documents in the collection and checks whether they match. The two $match stages were merged into the query of the $cursor stage, which is why they don't appear as separate stages.
After that, you group the documents and sort the results; both stages are listed after $cursor in your output. Those operations are done "in memory", on the previously fetched documents. The fields aren't indexed, so no special plan can be used here.
More info here: explain results
2. explain() on an aggregation query does not return a cursor
For some reason, explain() on an aggregation query does not return a cursor, but a BSON object directly (unlike explain() on a find() query).
It might be a bug, but there's nothing about this in the docs.
Anyway, you can do:
var explain = db.restaurants.explain().aggregate([
{$match: {"address.zipcode": {$in: ["10314", "11208", "11219"]}}},
{$match: {"grades": {$elemMatch: {score: {$gte: 1}}}}},
{$group: {_id: "$borough", count: {$sum: 1} }},
{$sort: {count: -1} }
]);
printjson(explain)
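To make the stage list easier to read, the returned document can be summarized with a small helper. This is only a sketch: summarizeStages is a hypothetical name, and it assumes the { stages: [...] } shape shown in the question, which may differ on other server versions.

```javascript
// Hypothetical helper: list the top-level stages of an aggregation explain
// document shaped like the one in the question ({ stages: [...] }).
function summarizeStages(explainDoc) {
  return (explainDoc.stages || []).map((stage) => {
    const name = Object.keys(stage)[0]; // e.g. "$cursor", "$group"
    const body = stage[name];
    const plan = body && body.queryPlanner && body.queryPlanner.winningPlan;
    // For $cursor, also report the winning plan's stage (COLLSCAN, IXSCAN, ...).
    return plan ? `${name} (${plan.stage})` : name;
  });
}

// Trimmed copy of the explain output from the question.
const explainDoc = {
  stages: [
    { $cursor: { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } } },
    { $group: { _id: "$borough", count: { $sum: { $const: 1 } } } },
    { $sort: { sortKey: { count: -1 } } },
  ],
  ok: 1,
};

console.log(summarizeStages(explainDoc));
// [ '$cursor (COLLSCAN)', '$group', '$sort' ]
```

In the shell, the same function could be called on the explain variable from the snippet above.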

Related

How to use $sum (aggregation) for an array of objects and check greater than for each sum

My document structure is as follow :
{
"_id" : ObjectId("621ccb5ea46a9e41768e0ba8"),
"cust_name" : "Anuj Kumar",
"product" : [
{
"prod_name" : "Robot",
"price" : 15000
},
{
"prod_name" : "Keyboard",
"price" : 65000
}
],
"order_date" : ISODate("2022-02-22T00:00:00Z"),
"status" : "processed",
"invoice" : {
"invoice_no" : 111,
"invoice_date" : ISODate("2022-02-22T00:00:00Z")
}
}
How to do the following query...
List the details of orders with a value >10000.
I want to display only those objects whose sum of prices is greater than 10000
I tried this:
db.order.aggregate([{$project : {sumOfPrice : {$sum : "$product.price"} }}])
Output
{ "_id" : ObjectId("621ccb5ea46a9e41768e0ba8"), "sumOfPrice" : 80000 }
{ "_id" : ObjectId("621ccba9a46a9e41768e0ba9"), "sumOfPrice" : 16500 }
{ "_id" : ObjectId("621ccbfaa46a9e41768e0baa"), "sumOfPrice" : 5000 }
I want to check whether this sumOfPrice is greater than 10000 and display the full order objects that qualify.
You can just add a $match stage right after it that checks this condition, like so:
db.collection.aggregate([
{
$addFields: {
sumOfPrice: {
$sum: "$product.price"
}
}
},
{
$match: {
sumOfPrice: {
$gt: 10000
}
}
}
])
Mongo Playground
You can also use the $expr operator with a find query:
db.order.find({
$expr: {
$gt: [ {$sum: '$product.price'}, 10000 ]
}
})
Mongo Playground
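For intuition, here is a plain-JavaScript sketch of what the $addFields and $match stages compute. It does not talk to MongoDB; the second order is made-up sample data, and sumOfPrice is just an illustrative helper name.

```javascript
// Sample orders: the first mirrors the document in the question,
// the second is invented for illustration.
const orders = [
  {
    cust_name: "Anuj Kumar",
    product: [
      { prod_name: "Robot", price: 15000 },
      { prod_name: "Keyboard", price: 65000 },
    ],
  },
  { cust_name: "Sample Customer", product: [{ prod_name: "Mouse", price: 5000 }] },
];

// Rough equivalent of { $sum: "$product.price" }: add up the numeric prices
// in the array and ignore anything that is not a number.
function sumOfPrice(order) {
  return (order.product || []).reduce(
    (total, item) => total + (typeof item.price === "number" ? item.price : 0),
    0
  );
}

// Rough equivalent of $addFields followed by $match { sumOfPrice: { $gt: 10000 } }.
const matching = orders
  .map((order) => ({ ...order, sumOfPrice: sumOfPrice(order) }))
  .filter((order) => order.sumOfPrice > 10000);

console.log(matching.map((order) => [order.cust_name, order.sumOfPrice]));
// [ [ 'Anuj Kumar', 80000 ] ]
```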

Getting the N documents in MongoDB before a Document ID from a Sorted Result

I have a collection in MongoDB, like the one below.
-> Mongo Playground link
I have sorted the collection by overview and _id.
{ $sort: { overview: 1, _id: 1 } }
which results in a collection like this.
When I filter the collection to show only the documents after "subject 13", it works as expected.
{ $match: {
_id: { $gt: ObjectId('605db89d208db95eb4878556') }
} }
However, when I try to get the documents before "subject 13", that is, "Subject 6", with the following query, it doesn't work as I expect.
{ $match: {
_id: { $lt: ObjectId('605db89d208db95eb4878556') }
} }
Instead of getting just "Subject 6" in the result, I get the following.
I suspect this is happening because MongoDB always filters the documents before sorting, regardless of the stage order in the aggregation pipeline.
Please suggest a way to get the documents before a particular "_id" in MongoDB.
I have 600 documents in the collection; this is a sample dataset. My full aggregate query is below.
[
{
'$sort': {
'overview': 1,
'_id': 1
}
}, {
'$match': {
'_id': {
'$lt': new ObjectId('605db89d208db95eb4878556')
}
}
}
]
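MongoDB's optimizer moves the $match ahead of the $sort in your pipeline, because you have a $sort followed by a $match. Either way, the filter compares _id values, not positions in the sorted output, so it returns every document with a smaller _id rather than only those that precede "subject 13" in the sort order.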
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/#sort-match-sequence-optimization
When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:
[
{ '$sort': { 'overview': 1, '_id': 1 } },
{ '$match': { '_id': { '$lt': new ObjectId('605db89d208db95eb4878556') } } }
]
During the optimization phase, the optimizer transforms the sequence to the following:
[
{ '$match': { '_id': { '$lt': new ObjectId('605db89d208db95eb4878556') } } },
{ '$sort': { 'overview': 1, '_id': 1 } }
]
Query planner result:
We can see that the first stage is the match query, and the sort is performed after it.
{
"stages" : [
{
"$cursor" : {
"query" : {
"_id" : {
"$lt" : ObjectId("605db89d208db95eb4878556")
}
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "video.data3",
"indexFilterSet" : false,
"parsedQuery" : {
"_id" : {
"$lt" : ObjectId("605db89d208db95eb4878556")
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
},
"indexName" : "_id_",
"isMultiKey" : false,
"multiKeyPaths" : {
"_id" : []
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[ObjectId('000000000000000000000000'), ObjectId('605db89d208db95eb4878556'))"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$sort" : {
"sortKey" : {
"overview" : 1,
"_id" : 1
}
}
}
],
"ok" : 1.0
}
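One possible way to get the documents that come before a given document in the sorted order is to filter on the sort keys themselves instead of on _id alone. The sketch below is not part of the answer above: buildBeforePipeline is a hypothetical helper, the overview value is a placeholder, and it assumes overview is always present and comparable. In the shell, _id would be an ObjectId rather than a string.

```javascript
// Hypothetical helper: build a pipeline that returns the n documents
// immediately before `anchor` in { overview: 1, _id: 1 } order.
function buildBeforePipeline(anchor, n) {
  return [
    {
      $match: {
        $or: [
          // Strictly smaller overview...
          { overview: { $lt: anchor.overview } },
          // ...or the same overview with a smaller _id (the tie-breaker).
          { overview: anchor.overview, _id: { $lt: anchor._id } },
        ],
      },
    },
    // Walk backwards from the anchor and keep the n nearest documents.
    { $sort: { overview: -1, _id: -1 } },
    { $limit: n },
    // Restore the original ascending order for the final result.
    { $sort: { overview: 1, _id: 1 } },
  ];
}

// Placeholder anchor: the overview value must be read from the anchor
// document first, e.g. with a findOne on its _id.
const anchor = { overview: "subject 13", _id: "605db89d208db95eb4878556" };
const beforePipeline = buildBeforePipeline(anchor, 1);

console.log(JSON.stringify(beforePipeline, null, 2));
```

The resulting array could then be passed to aggregate() on the collection.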

How can I do match after second level unwind in mongodb?

I am working on a software that uses MongoDB as a database. I have a collection like this (this is just one document)
{
"_id" : ObjectId("5aef51e0af42ea1b70d0c4dc"),
"EndpointId" : "89799bcc-e86f-4c8a-b340-8b5ed53caf83",
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Url" : "test",
"Tags" : [
{
"Uid" : "E2:02:00:18:DA:40",
"Type" : 1,
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Sensors" : [
{
"Type" : 1,
"Value" : NumberDecimal("-98")
},
{
"Type" : 2,
"Value" : NumberDecimal("-65")
}
]
},
{
"Uid" : "12:3B:6A:1A:B7:F9",
"Type" : 1,
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Sensors" : [
{
"Type" : 1,
"Value" : NumberDecimal("-95")
},
{
"Type" : 2,
"Value" : NumberDecimal("-59")
},
{
"Type" : 3,
"Value" : NumberDecimal("12.939770381907275")
}
]
}
]
}
and I want to run this query on it.
db.myCollection.aggregate([
{ $unwind: "$Tags" },
{
$match: {
$and: [
{
"Tags.DateTime": {
$gte: ISODate("2018-05-06T19:05:02Z"),
$lte: ISODate("2018-05-06T19:05:09Z"),
},
},
{ "Tags.Uid": { $in: ["C1:3D:CA:D4:45:11"] } },
],
},
},
{ $unwind: "$Tags.Sensors" },
{ $match: { "$Tags.Sensors.Type": { $in: [1, 2] } } },
{
$project: {
_id: 0,
EndpointId: "$EndpointId",
TagId: "$Tags.Uid",
Url: "$Url",
TagType: "$Tags.Type",
Date: "$Tags.DateTime",
SensorType: "$Tags.Sensors.Type",
Value: "$Tags.Sensors.Value",
},
},
])
The problem is that the second $match (the one that checks $Tags.Sensors.Type) doesn't work and doesn't affect the result of the query.
How can I solve that?
If this is not the right way, what is the right way to run these conditions?
The $match stage accepts field names without a leading $ sign. You've done that correctly in your first $match stage, but in the second one you wrote $Tags.Sensors.Type. Simply removing the leading $ sign should make your query work.
Mind you, the whole thing can be simplified a bit (and some beautification doesn't hurt, either):
You don't need to use $and in your example, since it's assumed by default if you specify more than one criterion in a filter.
The $in that you use for the Tags.Sensors.Type filter could be a plain <field>: <value> equality match if there were only one element in the list of acceptable values.
In the $project stage, instead of (kind of) duplicating identical field names, you can use the <field>: 1 syntax unless the order of the fields matters.
So the final query would be something like this:
db.myCollection.aggregate([
{
"$unwind" : "$Tags"
},
{
"$match" : {
"Tags.DateTime" : { "$gte" : ISODate("2018-05-06T19:05:02Z"), "$lte" : ISODate("2018-05-06T19:05:09Z") },
"Tags.Uid" : { "$in" : ["C1:3D:CA:D4:45:11"] }
}
}, {
"$unwind" : "$Tags.Sensors"
}, {
"$match" : {
"Tags.Sensors.Type" : { "$in" : [1,2] }
}
},
{
"$project" : {
"_id" : 0,
"EndpointId" : 1,
"TagId" : "$Tags.Uid",
"Url" : 1,
"TagType" : "$Tags.Type",
"Date" : "$Tags.DateTime",
"SensorType" : "$Tags.Sensors.Type",
"Value" : "$Tags.Sensors.Value"
}
}])
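A mistake like this can be caught before running the query with a quick check on the pipeline. The following is only a sketch: findDollarPrefixedFields is a hypothetical helper, the operator list is an assumption about which top-level keys are legitimate, and it inspects only the top-level keys of each $match stage (not keys nested inside $and or $or).

```javascript
// Query operators that may legitimately appear as top-level keys in $match.
const TOP_LEVEL_OPERATORS = new Set([
  "$and", "$or", "$nor", "$expr", "$text", "$where", "$comment", "$jsonSchema",
]);

// Hypothetical helper: report $match keys that start with "$" but are not
// known operators, which usually means a field path was written as "$field".
function findDollarPrefixedFields(pipeline) {
  const problems = [];
  pipeline.forEach((stage, index) => {
    if (!stage.$match) return;
    for (const key of Object.keys(stage.$match)) {
      if (key.startsWith("$") && !TOP_LEVEL_OPERATORS.has(key)) {
        problems.push({ stage: index, key });
      }
    }
  });
  return problems;
}

// Shortened version of the pipeline from the question.
const questionPipeline = [
  { $unwind: "$Tags" },
  { $match: { "Tags.Uid": { $in: ["C1:3D:CA:D4:45:11"] } } },
  { $unwind: "$Tags.Sensors" },
  { $match: { "$Tags.Sensors.Type": { $in: [1, 2] } } },
];

console.log(findDollarPrefixedFields(questionPipeline));
// [ { stage: 3, key: '$Tags.Sensors.Type' } ]
```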

MongoDB $sum and $avg of sub documents

I need to get the $sum and $avg of subdocuments. I would like to get the $sum and $avg of Channels[0] and of the other channels as well.
My data structure looks like this:
{
_id : ...,
Location : 1,
Channels : [
{ _id: ...,
Value: 25
},
{
_id: ... ,
Value: 39
},
{
_id: ..,
Value: 12
}
]
}
To get the sum and average of the Channels.Value elements for each document in your collection, you need to use MongoDB's aggregation framework. Further, since Channels is an array, you need to use the $unwind operator to deconstruct the array.
Assuming that your collection is called example, here's how you could get both the sum and the average of Channels.Value per document:
db.example.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$_id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
The output from your post's data would be:
{
"_id" : SomeObjectIdValue,
"documentSum" : 76,
"documentAvg" : 25.333333333333332
}
If you have more than one document in your collection then you will see a result row for each document containing a Channels array.
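To see where those numbers come from, here is a plain-JavaScript sketch that mimics the $unwind and $group steps on the sample document. It is only an illustration, not how the server computes the result, and the elided _id fields are left out.

```javascript
// The sample document from the question, without the elided _id fields.
const sampleDoc = {
  Location: 1,
  Channels: [{ Value: 25 }, { Value: 39 }, { Value: 12 }],
};

// $unwind: one record per array element, each carrying the parent's fields.
const unwound = sampleDoc.Channels.map((channel) => ({
  ...sampleDoc,
  Channels: channel,
}));

// $group by document: accumulate the values, then derive sum and average.
const values = unwound.map((record) => record.Channels.Value);
const documentSum = values.reduce((total, value) => total + value, 0);
const documentAvg = documentSum / values.length;

console.log({ documentSum, documentAvg });
// { documentSum: 76, documentAvg: 25.333333333333332 }
```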
Solution 1: Using two groups, based on this example:
previous question
db.records.aggregate(
[
{ $unwind: "$Channels" },
{ $group: {
_id: {
"loc" : "$Location",
"cId" : "$Channels.Id"
},
"value" : {$sum : "$Channels.Value" },
"average" : {$avg : "$Channels.Value"},
"maximum" : {$max : "$Channels.Value"},
"minimum" : {$min : "$Channels.Value"}
}},
{ $group: {
_id : "$_id.loc",
"ChannelsSummary" : { $push :
{ "channelId" : '$_id.cId',
"value" :'$value',
"average" : '$average',
"maximum" : '$maximum',
"minimum" : '$minimum'
}}
}
}
]
)
Solution 2:
There is a property I didn't show in my original question that might help: "Channels.Id", which is independent from "Channels._Id".
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.Id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )

Aggregation query returning array of all objects for mongodb

I'm using MongoDB for the first time. I'm trying to aggregate some documents in a collection using the query below. Instead, the query returns an object with a key "result" that contains an array of all the documents that satisfy the $match.
Below is the query.
db.events_2015_04_10.aggregate([
{$group:{
_id: "$uid",
count: {$sum: 1},
},
$match : {promo:"bc40100abc8d4eb6a0c68f81f4a756c7", evt:"login"}
}
]
);
Below is a sample document in the collection:
{
"_id" : ObjectId("552712c3f92ea17426000ace"),
"product" : "Mobile Safari",
"venue_id" : NumberLong(71540),
"uid" : "dd542fea6b4443469ff7bf1f56472eac",
"ag" : 0,
"promo" : "bc40100abc8d4eb6a0c68f81f4a756c7",
"promo_f" : NumberLong(1),
"brand" : NumberLong(17),
"venue" : "ovation_2480",
"lt" : 0,
"ts" : ISODate("2015-04-10T00:01:07.734Z"),
"evt" : "login",
"mac" : "00:00:00:00:00:00",
"__ns__" : "wifipromo",
"pvdr" : NumberLong(42),
"os" : "iPhone",
"cmpgn" : "fc6de34aef8b4f57af0b8fda98d8c530",
"ip" : "192.119.43.250",
"lng" : 0,
"product_ver" : "8"
}
I'm trying to group everything by uid, with the total count for each group. What is the correct way to achieve this?
Try the following aggregation pipeline, which has the $match stage first and the $group stage after it, each as its own object in the array:
db.events_2015_04_10.aggregate([
{
$match: {
promo: "bc40100abc8d4eb6a0c68f81f4a756c7",
evt: "login"
}
},
{
$group: {
_id: "$uid",
count: {
$sum: 1
}
}
}
])
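The original query put $group and $match into the same stage object, whereas each element of the pipeline array should hold exactly one stage operator. A small check along these lines can flag that; findMalformedStages is a hypothetical helper, shown only as a sketch.

```javascript
// Hypothetical helper: report pipeline elements that do not contain
// exactly one stage operator.
function findMalformedStages(pipeline) {
  return pipeline
    .map((stage, index) => ({ index, operators: Object.keys(stage) }))
    .filter((entry) => entry.operators.length !== 1);
}

const matchFilter = { promo: "bc40100abc8d4eb6a0c68f81f4a756c7", evt: "login" };
const groupSpec = { _id: "$uid", count: { $sum: 1 } };

// Shape of the pipeline from the question: two operators in one object.
const brokenPipeline = [{ $group: groupSpec, $match: matchFilter }];

// Shape of the corrected pipeline: one operator per object, $match first.
const fixedPipeline = [{ $match: matchFilter }, { $group: groupSpec }];

console.log(findMalformedStages(brokenPipeline));
// [ { index: 0, operators: [ '$group', '$match' ] } ]
console.log(findMalformedStages(fixedPipeline));
// []
```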