Calculate the average of fields in embedded documents/array - mongodb

I want to calculate the rating_average field of this object with the rating fields inside the array ratings. Can you help me to understand how to use aggregation with $avg?
{
"title": "The Hobbit",
"rating_average": "???",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
}

The aggregation framework in MongoDB 3.4 and newer offers the $reduce operator, which calculates the total in a single expression without the need for extra pipeline stages. Use it to return the total of the ratings, and get the number of ratings using $size. Together with $addFields, the average can then be calculated with the arithmetic operator $divide, following the formula average = total ratings / number of ratings:
db.collection.aggregate([
{
"$addFields": {
"rating_average": {
"$divide": [
{ // expression returns total
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": { "$add": ["$$value", "$$this.rating"] }
}
},
{ // expression returns ratings count
"$cond": [
{ "$ne": [ { "$size": "$ratings" }, 0 ] },
{ "$size": "$ratings" },
1
]
}
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("58ab48556da32ab5198623f4"),
"title" : "The Hobbit",
"ratings" : [
{
"title" : "best book ever",
"rating" : 5.0
},
{
"title" : "good book",
"rating" : 3.5
}
],
"rating_average" : 4.25
}
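As a rough illustration of what the $divide / $reduce / $cond expression above computes, here is a plain JavaScript sketch that runs without a MongoDB server. The function name averageRating is hypothetical, and the code only models the arithmetic; it is not how the server evaluates the pipeline.

```javascript
// Plain JavaScript model of the $addFields expression (illustrative only).
function averageRating(doc) {
  // "$reduce": sum the rating of every element, starting from 0.
  const total = doc.ratings.reduce((value, item) => value + item.rating, 0);
  // "$cond": fall back to a divisor of 1 when the array is empty.
  const count = doc.ratings.length !== 0 ? doc.ratings.length : 1;
  return total / count;
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 }
  ]
};

console.log(averageRating(hobbit));                            // 4.25
console.log(averageRating({ title: "no votes", ratings: [] })); // 0
```

The guard on the divisor is what keeps a document with an empty ratings array from triggering a division by zero.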
With older versions, you would need to first apply the $unwind operator on the ratings array field as your initial aggregation pipeline step. This deconstructs the ratings array field from the input documents to output one document per element. Each output document replaces the array with an element value.
The second pipeline stage would be the $group operator, which groups the input documents by an identifier expression made of the _id and title keys and applies the $avg accumulator expression to each group to calculate the average. Another accumulator operator, $push, preserves the original ratings array field by returning an array of all values that result from applying an expression to each document in the group.
The final pipeline step is the $project operator, which then reshapes each document in the stream, here by lifting title out of the compound _id and keeping the ratings and ratings_average fields.
So, for instance, if you have a sample document in your collection (the same one as above):
db.collection.insert({
"title": "The Hobbit",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
})
To calculate the average of the ratings array and project the value into another field, ratings_average, you can then apply the following aggregation pipeline:
db.collection.aggregate([
{
"$unwind": "$ratings"
},
{
"$group": {
"_id": {
"_id": "$_id",
"title": "$title"
},
"ratings":{
"$push": "$ratings"
},
"ratings_average": {
"$avg": "$ratings.rating"
}
}
},
{
"$project": {
"_id": 0,
"title": "$_id.title",
"ratings_average": 1,
"ratings": 1
}
}
])
Result:
/* 1 */
{
"result" : [
{
"ratings" : [
{
"title" : "best book ever",
"rating" : 5
},
{
"title" : "good book",
"rating" : 3.5
}
],
"ratings_average" : 4.25,
"title" : "The Hobbit"
}
],
"ok" : 1
}
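To make the three stages more concrete, the following plain JavaScript sketch models $unwind, $group and $project on an in-memory array. It is only an approximation under stated assumptions: the helper names unwind and groupAverage are hypothetical, the _id value 1 is a placeholder, and the grouping key is simplified to _id alone.

```javascript
// Illustrative model of $unwind -> $group -> $project; not server code.
function unwind(docs, field) {
  // One output document per array element; the array is replaced by the element.
  return docs.flatMap(doc => doc[field].map(el => ({ ...doc, [field]: el })));
}

function groupAverage(unwound) {
  const groups = new Map();
  for (const doc of unwound) {
    const key = String(doc._id);
    if (!groups.has(key)) groups.set(key, { title: doc.title, ratings: [], sum: 0 });
    const group = groups.get(key);
    group.ratings.push(doc.ratings);   // models "$push": "$ratings"
    group.sum += doc.ratings.rating;   // running total behind "$avg": "$ratings.rating"
  }
  // Final reshaping, as $project does: title, ratings, ratings_average.
  return [...groups.values()].map(group => ({
    title: group.title,
    ratings: group.ratings,
    ratings_average: group.sum / group.ratings.length
  }));
}

const books = [{ _id: 1, title: "The Hobbit", ratings: [
  { title: "best book ever", rating: 5 },
  { title: "good book", rating: 3.5 }
]}];

const unwoundBooks = unwind(books, "ratings");
const grouped = groupAverage(unwoundBooks);
console.log(unwoundBooks.length);          // 2
console.log(grouped[0].ratings_average);   // 4.25
```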

This really could be written so much shorter, and this was even true at the time of writing. If you want an "average" simply use $avg:
db.collection.aggregate([
{ "$addFields": {
"rating_average": { "$avg": "$ratings.rating" }
}}
])
The reason for this is that as of MongoDB 3.2 the $avg operator gained "two" things:
The ability to process an "array" of arguments in an "expression" form, rather than solely as an accumulator to $group.
The benefit of the MongoDB 3.2 features that allow the "shorthand" notation of array expressions, either in composition:
{ "array": [ "$fielda", "$fieldb" ] }
or in notating a single property from the array as an array of the values of that property:
{ "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }
In earlier releases you would have to use $map in order to access the "rating" property inside each array element. Now you don't.
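A small plain JavaScript sketch may help picture the shorthand: the path "$ratings.rating" behaves like mapping over the array and collecting one property, and $avg then averages the collected numbers. The variable names are illustrative, and the sketch assumes a non-empty array of numeric ratings.

```javascript
// What "$ratings.rating" resolves to: the rating of each array element.
const book = { ratings: [ { rating: 5 }, { rating: 3.5 } ] };
const ratingValues = book.ratings.map(r => r.rating);   // [ 5, 3.5 ]

// What $avg then does with that array of numbers (non-empty array assumed).
const ratingAverage =
  ratingValues.reduce((a, b) => a + b, 0) / ratingValues.length;

console.log(ratingValues, ratingAverage);
```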
For the record, even the $reduce usage can be simplified:
db.collection.aggregate([
{ "$addFields": {
"rating_average": {
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": {
"$add": [
"$$value",
{ "$divide": [
"$$this.rating",
{ "$size": { "$ifNull": [ "$ratings", [] ] } }
]}
]
}
}
}
}}
])
Yes, as stated, this is really just re-implementing the existing $avg functionality, so since that operator is available, it is the one that should be used.

Since the data to be averaged is in an array, you first need to unwind it. Do this with the $unwind stage in your aggregation pipeline:
{$unwind: "$ratings"}
Then you may access each element of the array as an embedded document with key ratings in the result documents of the aggregation. Then you just need to $group by title and calculate $avg:
{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}
Then just recover your title field:
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
So here is your resulting aggregation pipeline:
db.yourCollection.aggregate([
{$unwind: "$ratings"},
{$group: {_id: "$title",
ratings: {$push: "$ratings"},
average: {$avg: "$ratings.rating"}
}
},
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
])

Related

Use $multiply on nested fields in aggregation in MongoDB

I am trying to aggregate in MongoDB.
I have a collection with some items. Each item has an array rows and each object in rows has fields quantity and price.
I want to multiply quantity and price, but I don't know how to specify the fields correctly.
I have tried
const pipeline = [
{
$group: {
_id: {
number: '$number',
},
total: {
$sum: {
$multiply: [
'$rows.quantity',
'$rows.price'
]
}
},
}
}
];
but it says that $multiply only supports numeric types and not arrays.
So it seems it doesn't understand that $rows.quantity is the numeric type field quantity in each object in the array.
I guess I should probably use $each or something else in order to iterate through the objects in the array.
From Using multiply aggregation with MongoDB I see that I am specifying the fields correctly; however, in that example it is a nested object instead of an array, so maybe I have to use https://docs.mongodb.org/v3.0/reference/operator/aggregation/unwind/?
Sample document
{
number: 2,
rows: [
{
quantity: 10,
price: 312
},
{
quantity: 10,
price: 312
},
{
quantity: 10,
price: 312
},
]
}
Use the .aggregate() method.
Starting in version 3.2 you can use the $sum accumulator operator in the $project stage to calculate and return the sum of an array of quantity * price values. To build that array you need the $map operator. The $ifNull operator evaluates the value of "quantity" and "price" and returns 0 if either evaluates to null. The last stage in the pipeline is the $group stage, where you group your documents by "number" and return the "total" for each group.
db.collection.aggregate([
{ "$project": {
"number": 1,
"total": {
"$sum": {
"$map": {
"input": "$rows",
"as": "row",
"in": { "$multiply": [
{ "$ifNull": [ "$$row.quantity", 0 ] },
{ "$ifNull": [ "$$row.price", 0 ] }
]}
}
}
}
}},
{ "$group": {
"_id": "$number",
"total": { "$sum": "$total" }
}}
])
If you are on a version earlier than 3.2, you will need to denormalize the "rows" array before the $project stage using the $unwind operator.
db.collection.aggregate([
{ "$unwind": "$rows" },
{ "$project": {
"number": 1,
"value": { "$multiply": [
{ "$ifNull": [ "$rows.quantity", 0 ] },
{ "$ifNull": [ "$rows.price", 0 ] }
]}
}},
{ "$group": {
"_id": "$number",
"total": { "$sum": "$value" }
}}
])
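The reason the original attempt fails, and what the $map version computes instead, can be sketched in plain JavaScript without a server. This is only a model of the semantics using the sample document from the question; the variable names are illustrative.

```javascript
const order = { number: 2, rows: [
  { quantity: 10, price: 312 },
  { quantity: 10, price: 312 },
  { quantity: 10, price: 312 }
]};

// "$rows.quantity" resolves to an array, which is why $multiply rejects it.
const quantities = order.rows.map(row => row.quantity);   // [ 10, 10, 10 ]

// Model of $map + $ifNull + $multiply per row, then $sum over the mapped array.
const orderTotal = order.rows
  .map(row => (row.quantity ?? 0) * (row.price ?? 0))
  .reduce((sum, value) => sum + value, 0);

console.log(Array.isArray(quantities), orderTotal);   // true 9360
```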

MongoDB: How to Get the Lowest Value Closer to a given Number and Decrement by 1 Another Field

Given the following document containing 3 nested documents...
{ "_id": ObjectId("56116d8e4a0000c9006b57ac"), "name": "Stock 1", "items": [
{ "price": 1.50, "description": "Item 1", "count": 10 },
{ "price": 1.70, "description": "Item 2", "count": 13 },
{ "price": 1.10, "description": "Item 3", "count": 20 }
]
}
... I need to select the sub-document with the lowest price closer to a given amount (here below I assume 1.05):
db.stocks.aggregate([
{$unwind: "$items"},
{$sort: {"items.price":1}},
{$match: {"items.price": {$gte: 1.05}}},
{$group: {
_id:0,
item: {$first:"$items"}
}},
{$project: {
_id: "$item._id",
price: "$item.price",
description: "$item.description"
}}
]);
This works as expected and here is the result:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 20
}
],
"ok" : 1
Alongside returning the item with the lowest price closer to a given amount, I need to decrement count by 1. For instance, here below is the result I'm looking for:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 19
}
],
"ok" : 1
It depends on whether you actually want to "update" the result or simply "return" the result with a decremented value. In the former case you will of course need to go back to the document and "decrement" the value for the returned result.
Also want to note that what you "think" is efficient here is actually not. Doing the "filter" of elements "post sort" or even "post unwind" really makes no difference at all to how the $first accumulator works in terms of performance.
The better approach is to basically "pre filter" the values from the array where possible. This reduces the document size in the aggregation pipeline, and the number of array elements to be processed by $unwind:
db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
Of course, that does require a MongoDB server of version 2.6 or greater to have the operators available, and going by your output you may have an earlier version. If that is the case, then at least lose the $match, as it does not do anything of value and would be detrimental to performance.
Where a $match is useful is in the document selection before you do anything else, as what you always want to avoid is processing documents that cannot possibly meet the conditions you want from within the array or anywhere else. So you should always $match or use a similar query stage first.
At any rate, if all you wanted was a "projected result" then just use $subtract in the output:
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description",
"count": { "$subtract": [ "$item.count", 1 ] }
}}
If, however, you wanted to "update" the result, then you would be iterating the array ( it's still an array even with one result ) to update the matched item and "decrement" the count via $inc:
var result = db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
result.forEach(function(item) {
// The array field is "items"; this assumes each array element carries its own _id
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
And on a MongoDB 2.4 shell, your same aggregate query applies ( but please make the changes ) however the result contains another field called result inside it with the array, so add the level:
result.result.forEach(function(item) {
// The array field is "items"; this assumes each array element carries its own _id
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
So either just $project for display only, or use the returned result to effect an .update() on the data as required.
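For the "display only" case, the selection logic can be sketched in plain JavaScript as follows. This is just a model of filter, sort, first and subtract on the sample document, not server code; the function name pickItem is hypothetical.

```javascript
// Illustrative model: lowest price at or above minPrice, count shown minus 1.
function pickItem(stock, minPrice) {
  const candidates = stock.items
    .filter(item => item.price >= minPrice)   // the "pre filter" step
    .sort((a, b) => a.price - b.price);       // $sort ascending on price
  if (candidates.length === 0) return null;
  const first = candidates[0];                // $first after the sort
  // Models "$subtract": [ "$item.count", 1 ]; nothing is written back.
  return {
    price: first.price,
    description: first.description,
    count: first.count - 1
  };
}

const stock = { name: "Stock 1", items: [
  { price: 1.50, description: "Item 1", count: 10 },
  { price: 1.70, description: "Item 2", count: 13 },
  { price: 1.10, description: "Item 3", count: 20 }
]};

const picked = pickItem(stock, 1.05);
console.log(picked);   // { price: 1.1, description: 'Item 3', count: 19 }
```

Note that the stored count is untouched here, which mirrors the $project approach; persisting the decrement still needs a separate update.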

Mongodb aggregation $project get array position element field value

Document:
{
"_id" : ObjectId("560dcd15491a065d6ab1085c"),
"title" : "example title",
"views" : 1,
"messages" : [
{
"authorId" : ObjectId("560c24b853b558856ef193a3"),
"authorName" : "Karl Morrison",
"created" : ISODate("2015-10-02T00:17:25.119Z"),
"message" : "example message"
}
]
}
Project:
$project: {
_id: 1,
title: 1,
views: 1,
updated: '$messages[$messages.length-1].created' // <--- ReferenceError: $messages is not defined
}
I am trying to get the created value of the last element of the array inside the document. I was reading the documentation, but it falls short for this specific task.
I've learnt it has to do with dot notation. However, it doesn't state how to get the last element.
You cannot just extract properties or basically change the result from a basic .find() query beyond simple top level field selection as it simply is not supported. For more advanced manipulation you can use the aggregation framework.
However, without even touching .aggregate() the $slice projection operator gets you most of the way there:
db.collection.find({},{ "messages": { "$slice": -1 } })
You cannot alter the structure, but it is the last array element with little effort.
Until a new release ( as of writing ) for MongoDB, the aggregation framework is still going to need to $unwind the array in order to get at the "last" element, which you can select with the $last grouping accumulator:
db.collection.aggregate([
{ "$unwind": "$messages" },
{ "$group": {
"_id": "$_id",
"title": { "$last": "$title" },
"views": { "$last": "$views" },
"created": { "$last": "$messages.created" }
}}
])
MongoDB 3.2 and later releases have $slice and $arrayElemAt in aggregation, which can handle this directly. But you would also need to set a variable with $let to address the dot notated field:
[
{ "$project": {
"title": 1,
"views": 1,
"created": {
"$let": {
"vars": {
"message": {
"$arrayElemAt": [
{ "$slice": [ "$messages", -1 ] },
0
]
}
},
"in": "$$message.created"
}
}
}}
]
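The combination of $slice, $arrayElemAt and $let above can be read step by step through this plain JavaScript sketch. It is only an analogy on an in-memory copy of the sample document (ObjectId values omitted), with illustrative variable names.

```javascript
const thread = {
  title: "example title",
  views: 1,
  messages: [
    {
      authorName: "Karl Morrison",
      created: new Date("2015-10-02T00:17:25.119Z"),
      message: "example message"
    }
  ]
};

// { "$slice": [ "$messages", -1 ] } keeps only the last element, still as an array.
const lastAsArray = thread.messages.slice(-1);
// "$arrayElemAt": [ ..., 0 ] takes that single element out of the array.
const lastMessage = lastAsArray[0];
// "$$message.created" reads the field; optional chaining covers an empty array.
const created = lastMessage?.created;

console.log(created.toISOString());   // 2015-10-02T00:17:25.119Z
```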

mongodb $aggregate empty array and multiple documents

MongoDB has the documents below:
> db.test.find({name:{$in:["abc","abc2"]}})
{ "_id" : 1, "name" : "abc", "scores" : [ ] }
{ "_id" : 2, "name" : "abc2", "scores" : [ 10, 20 ] }
I want to get the scores array length for each document. How should I do that?
I tried the command below:
db.test.aggregate({$match:{name:"abc2"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
Result:
{ "_id" : null, "count" : 2 }
But the command below:
db.test.aggregate({$match:{name:"abc"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
returns nothing. Questions:
How should I get the length of scores for 2 or more documents in one command?
Why does the second command return nothing, and how should I check if the array is empty?
So this is actually a common problem. The result of the $unwind phase in an aggregation pipeline where the array is "empty" is to "remove" the document from the pipeline results.
In order to return a count of "0" for such an "empty" array, you need to do something like the following.
In MongoDB 2.6 or greater, just use $size:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$group": {
"_id": null,
"count": { "$sum": { "$size": "$scores" } }
}}
])
In earlier versions you need to do this:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$project": {
"name": 1,
"scores": {
"$cond": [
{ "$eq": [ "$scores", [] ] },
{ "$const": [false] },
"$scores"
]
}
}},
{ "$unwind": "$scores" },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
"$scores",
1,
0
]
}}
}}
])
The modern operation is simple since $size will just "measure" the array. In the latter case you need to "replace" the array with a single false value when it is empty, so that $unwind does not "destroy" the document.
So replacing with false allows the $cond "ternary" to choose whether to add 1 or 0 to the $sum of the overall statement.
That is how you get the length of "empty arrays".
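The difference between the two behaviours can be seen in a plain JavaScript sketch on the two sample documents. It only models the semantics described above (an empty array produces no $unwind output, while $size still reports 0); the variable names are illustrative.

```javascript
const scoreDocs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
];

// $unwind model: one output per element, so an empty array yields no output at all.
const unwoundScores = scoreDocs.flatMap(doc =>
  doc.scores.map(score => ({ ...doc, scores: score }))
);
console.log(unwoundScores.map(doc => doc.name));   // [ 'abc2', 'abc2' ]

// $size model: measure the array directly, so empty arrays still report 0.
const lengths = scoreDocs.map(doc => ({ _id: doc._id, count: doc.scores.length }));
console.log(lengths);   // [ { _id: 1, count: 0 }, { _id: 2, count: 2 } ]
```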
To get the length of scores in 2 or more documents you just need to change the _id value in the $group pipeline which contains the distinct group by key, so in this case you need to group by the document _id.
Your second aggregation returns nothing because the $match query pipeline passed a document which had an empty scores array. To check if the array is empty, your match query should be
{'scores.0': {$exists: true}} or {scores: {$not: {$size: 0}}}
Overall, your aggregation should look like this:
db.test.aggregate([
{ "$match": {"scores.0": { "$exists": true } } },
{ "$unwind": "$scores" },
{
"$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}
}
])