Average array of arrays in MongoDB

I have these documents in a MongoDB collection:
{"values" : [1,2,3,4,5,6]}
{"values" : [7,8,9,10,11,12]}
{"values" : [13,14,15,16,17,18]}
How can I aggregate them to get an array with the average at each index?
Like this:
{ "average" : [7,8,9,10,11,12] }
Note: average[0] = (1 + 7 + 13) / 3
Regards,

You can use the Aggregation Framework and $avg.
$avg can be used in $project or $group.
https://docs.mongodb.com/manual/reference/operator/aggregation/avg/
With a single expression as its operand, if the expression resolves to
an array, $avg traverses into the array to operate on the numerical
elements of the array to return a single value. With a list of
expressions as its operand, if any of the expressions resolves to an
array, $avg does not traverse into the array but instead treats the
array as a non-numerical value.
UPDATE #2:
Since the problem is now clearer, I will update my answer.
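db.stackoverflow027.aggregate([
// Keep only the documents for the test you are interested in
{
$match: {
"message.testnr":"1111"
}
},
// Emit one document per array element; "position" holds the element's index
{
$unwind: {
path: "$message.content.deflection",
includeArrayIndex: "position"
}
},
// Average all values that share the same index
{
$group: {
_id: "$position",
averageForIndex: {$avg: "$message.content.deflection"}/*,
debug_totalIndexInvolvedInTheAverage: {$sum: 1},
debug_valueInvolvedInTheAverage: {$push: "$message.content.deflection"},
debug_documentInvolvedInTheAverage: {$push: "$$ROOT"}*/
}
},
// $group does not order its output, so sort by index before collecting
{
$sort: {_id:1}
},
// Collect the per-index averages into a single array
{
$group: {
_id: null,
average: {$push: "$averageForIndex"}
}
}
], { allowDiskUse: true });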
That will give you this output:
{
"_id" : null,
"average" : [
6.0,
7.0,
8.0,
9.0,
10.0
]
}
I also added { allowDiskUse: true } to avoid the memory limit on aggregation pipeline stages (see the MongoDB documentation on aggregation pipeline limits for more information).
Hopefully this solves your problem.
I left some commented-out "debug_" properties so you can see what really happens in each $group iteration. You can remove them in a production environment.
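The pipeline above targets the answerer's own schema (message.content.deflection). As a hedged illustration of what its four stages compute on the question's values arrays, here is a plain-JavaScript sketch with no MongoDB involved; averageByIndex is a hypothetical helper name. Against a real collection, the same stages would presumably unwind "$values" with includeArrayIndex, group on the index with $avg, sort, and $push.

```javascript
// Plain-JavaScript sketch of the $unwind / $group / $sort / $group pipeline.
// averageByIndex is a hypothetical helper, not part of any MongoDB driver.
function averageByIndex(docs, field) {
  // $unwind with includeArrayIndex: one { position, value } record per element
  const unwound = [];
  for (const doc of docs) {
    doc[field].forEach((value, position) => unwound.push({ position, value }));
  }
  // $group by position with $avg: keep a running sum and count per index
  const groups = new Map();
  for (const { position, value } of unwound) {
    const g = groups.get(position) || { sum: 0, count: 0 };
    g.sum += value;
    g.count += 1;
    groups.set(position, g);
  }
  // $sort by _id, then $group with $push: an index-ordered array of averages
  return [...groups.keys()]
    .sort((a, b) => a - b)
    .map((position) => groups.get(position).sum / groups.get(position).count);
}

const docs = [
  { values: [1, 2, 3, 4, 5, 6] },
  { values: [7, 8, 9, 10, 11, 12] },
  { values: [13, 14, 15, 16, 17, 18] },
];
console.log({ average: averageByIndex(docs, "values") }); // { average: [ 7, 8, 9, 10, 11, 12 ] }
```

Each index receives exactly one value per document here, so every average is a simple mean of three numbers; with arrays of different lengths, the shorter arrays would just contribute to fewer indexes.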

Related

MongoDB sort by value in embedded document array

I have a MongoDB collection of documents formatted as shown below:
{
"_id" : ...,
"username" : "foo",
"challengeDetails" : [
{
"ID" : ...,
"pb" : 30081,
},
{
"ID" : ...,
"pb" : 23995,
},
...
]
}
How can I write a find query for records that have a challengeDetails document with a matching ID, and sort them by the corresponding pb?
I have tried the following (this uses the Node.js driver, which is why the projection syntax looks unusual):
const result = await collection
.find(
{ "challengeDetails.ID": challengeObjectID},
{
projection: {"challengeDetails.$": 1},
sort: {"challengeDetails.0.pb": 1}
}
)
This returns the correct records (documents with challengeDetails for only the matching ID) but they're not sorted.
I think this doesn't work because, as the docs say:
When the find() method includes a sort(), the find() method applies the sort() to order the matching documents before it applies the positional $ projection operator.
But they don't explain how to sort after projecting. How would I write a query to do this? (I have a feeling aggregation may be required, but I am not familiar enough with MongoDB to write that myself.)
You need to use aggregation to sort an array:
$unwind to deconstruct the array
$match to match the value
$sort for sorting
$group to reconstruct the array
Here is the code:
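db.collection.aggregate([
{ "$unwind": "$challengeDetails" }, // one document per array element
{ "$match": { "challengeDetails.ID": 2 } }, // keep only the element with the wanted ID
{ "$sort": { "challengeDetails.pb": 1 } }, // sort by that element's pb
{
"$group": { // rebuild one document per _id
"_id": "$_id",
"username": { "$first": "$username" },
"challengeDetails": { "$push": "$challengeDetails" }
}
},
{ "$sort": { "challengeDetails.pb": 1 } } // $group does not keep document order, so sort again
])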
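The four stages can be mimicked in plain JavaScript to see what each one does. This is a sketch only: the sample users and the sortByChallengePb name are hypothetical, and IDs are plain numbers instead of ObjectIds.

```javascript
// Hypothetical sample data: only user 1 and user 2 have an entry for challenge 2.
const users = [
  { _id: 1, username: "foo", challengeDetails: [{ ID: 1, pb: 30081 }, { ID: 2, pb: 23995 }] },
  { _id: 2, username: "bar", challengeDetails: [{ ID: 2, pb: 21000 }, { ID: 3, pb: 40000 }] },
  { _id: 3, username: "baz", challengeDetails: [{ ID: 1, pb: 10000 }] },
];

function sortByChallengePb(docs, challengeId) {
  // $unwind: one record per challengeDetails element
  const unwound = docs.flatMap((doc) =>
    doc.challengeDetails.map((detail) => ({ _id: doc._id, username: doc.username, challengeDetails: detail }))
  );
  // $match: keep only the elements for the requested challenge
  const matched = unwound.filter((rec) => rec.challengeDetails.ID === challengeId);
  // $sort: ascending by pb
  matched.sort((a, b) => a.challengeDetails.pb - b.challengeDetails.pb);
  // $group: rebuild one document per _id. A JS Map keeps insertion order, so this
  // sketch stays sorted; MongoDB's $group stage makes no such promise.
  const grouped = new Map();
  for (const rec of matched) {
    if (!grouped.has(rec._id)) {
      grouped.set(rec._id, { _id: rec._id, username: rec.username, challengeDetails: [] });
    }
    grouped.get(rec._id).challengeDetails.push(rec.challengeDetails);
  }
  return [...grouped.values()];
}

console.log(sortByChallengePb(users, 2).map((u) => u.username)); // [ 'bar', 'foo' ]
```

Because $group in a real pipeline may emit groups in any order, a $sort after $group is the safe way to get the final documents ordered by pb.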

Total count and field count with condition in a single MongoDB aggregation pipeline

I have a collection of components. Simplified, a document looks like this:
{
"_id" : "50c4f4f2-68b5-4153-80db-de8fcf716902",
"name" : "C156",
"posX" : "-136350",
"posY" : "-27350",
"posZ" : "962",
"inspectionIsFailed" : "False"
}
I would now like to calculate three things: the number of all components in the collection, the number of all faulty components ("inspectionIsFailed": "True"), and the ratio (number of faulty components divided by the number of all components).
I know how to get the first two values separately, with one aggregation each.
Number of all components:
db.components.aggregate([
{$group: {_id: null, totalCount: {$sum: 1}}}
]);
Number of all faulty components:
db.components.aggregate([
{$match: {inspectionIsFailed: "True"}},
{$group: {_id: null, failedCount: {$sum: 1}}}
]);
However, I want to calculate the two values in a single pipeline and not separately. Then I could use $divide to calculate the ratio at the end of the pipeline. My desired output should then only contain the ratio:
{ ratio: 0.2 }
My problem with a single pipeline is:
If I try to calculate the total number first, then I can no longer calculate the number of the faulty components. If I first calculate the number of faulty components with $match, I can no longer calculate the total number.
You can try:
$group by null, get totalCount with $sum, and get failedCount by summing a $cond: if inspectionIsFailed is "True" it returns 1, otherwise 0
$project to get the ratio using $divide
db.collection.aggregate([
{
$group: {
_id: null,
totalCount: { $sum: 1 },
failedCount: {
$sum: {
$cond: [{ $eq: ["$inspectionIsFailed", "True"] }, 1, 0 ]
}
}
}
},
{
$project: {
_id: 0,
ratio: {
$divide: ["$failedCount", "$totalCount"]
}
}
}
])
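To make the single-pass idea concrete, here is a plain-JavaScript sketch of what the $group and $project stages compute: one loop keeps both running sums, and the ratio is derived afterwards. The component names other than C156 are hypothetical sample data.

```javascript
// Plain-JavaScript sketch of the $group (_id: null) + $project pipeline.
function failureRatio(components) {
  // $group with _id: null: a single pass over all documents, two running sums
  let totalCount = 0;
  let failedCount = 0;
  for (const c of components) {
    totalCount += 1; // { $sum: 1 }
    failedCount += c.inspectionIsFailed === "True" ? 1 : 0; // { $sum: { $cond: [...] } }
  }
  // $project: derive the ratio from the two accumulated values.
  // (An empty input gives NaN here; MongoDB's $group would emit no document at all.)
  return { ratio: failedCount / totalCount };
}

const components = [
  { name: "C156", inspectionIsFailed: "False" },
  { name: "C157", inspectionIsFailed: "True" },
  { name: "C158", inspectionIsFailed: "False" },
  { name: "C159", inspectionIsFailed: "False" },
  { name: "C160", inspectionIsFailed: "False" },
];
console.log(failureRatio(components)); // { ratio: 0.2 }
```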
Another option is $facet, which runs several sub-pipelines over the same input documents inside a single aggregation.
I also suggest storing inspectionIsFailed as a boolean; the pipeline below assumes it is.
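db.collection.aggregate([
{
$facet: {
// Sub-pipeline 1: count every document
totalCount: [
{
$count: "value"
}
],
// Sub-pipeline 2: count only the failed components
pipelineResults: [
{
$match: {
inspectionIsFailed: true
}
},
{
$group: {
_id: null, // one group for all failed components, not one per document
failedCount: {
$sum: 1
}
}
}
]
}
},
// Each facet is an array, so take its first element before dividing
{
$project: {
ratio: {
$divide: [
{ $arrayElemAt: ["$pipelineResults.failedCount", 0] },
{ $arrayElemAt: ["$totalCount.value", 0] }
]
}
}
}
])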

Match and Average in mongo keep producing null

I'm using the console to perform an aggregation: $match checks that a nested field exists, and then a $group stage applies the $avg operator. The match works just fine on the same field, and the code for counting works too, but the average returns null every time.
I'm indexing into an array with .0 to get the first element and then reading a field of that element. It's very perplexing and difficult to debug. As far as I can tell, distinct shows that the values I'm looking at are all numeric. Are there any suggestions for how to debug this?
db.b.aggregate([ {$match: {"x.x.x.0.x": {$exists: true} } }, {$group: {_id: null, myAvg: { $avg: "$x.x.x.0.x"}}}])
Results in:
{ "_id" : null, "myAvg" : null }
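This appears to be a limitation of the aggregation framework with respect to where you can use the "array.n" notation to access the nth element of an array: query operators such as $match understand it, but in aggregation expressions a numeric path component like "0" is treated as a field name on each array element rather than as an index.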
More precisely, given the following sample document:
db.test.insertOne({
"a" : [
{
"x" : 1.0
}
]
})
...you can do the following to retrieve all documents where the x field of the first element of the "a" array equals 1:
db.test.aggregate({
$match: {
"a.0.x": 1
}
})
However, you cannot run the following:
db.test.aggregate({
$project: {
"a0x": "$a.0.x"
}
})
Well, you can, but it returns an empty array like this, which is a little surprising:
{
"_id" : ...,
"a0x" : []
}
An empty array is not a number, so $avg in a $group stage ignores it, which is why the question's average comes out as null. To get the nth element, use the $arrayElemAt operator instead:
db.test.aggregate({
$project: {
"a0x": { $arrayElemAt: [ "$a.x", 0 ] },
}
})
Note that this returns only the nth element, no longer nested inside an array:
{
"a0x" : 1.0
}
So what you probably want to do is this:
db.b.aggregate({
$group: {
_id: null,
myAvg: {
$avg: {
$arrayElemAt: [ "$x.x.x.x", 0 ]
}
}
}
})
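The path behaviour described above can be modelled with a small recursive function. This is a simplified, hypothetical model (resolveFieldPath is not a MongoDB API) that only handles documents and arrays of documents; it shows why "a.0.x" resolves to an empty array while "a.x" resolves to [1], whose first element is what $arrayElemAt returns.

```javascript
// Simplified, hypothetical model of aggregation field-path resolution.
// Handles plain documents and arrays of documents only.
function resolveFieldPath(value, parts) {
  if (parts.length === 0) return value;
  if (Array.isArray(value)) {
    // Arrays are traversed: the same remaining path is applied to every element,
    // and elements that lack the field are left out of the result.
    return value
      .map((element) => resolveFieldPath(element, parts))
      .filter((resolved) => resolved !== undefined);
  }
  if (value === null || typeof value !== "object") return undefined;
  // A component such as "0" is looked up as a field name, never as an array index.
  const [head, ...rest] = parts;
  return resolveFieldPath(value[head], rest);
}

const doc = { a: [{ x: 1.0 }] };
console.log(resolveFieldPath(doc, "a.0.x".split("."))); // []
console.log(resolveFieldPath(doc, "a.x".split(".")));   // [ 1 ]
// { $arrayElemAt: ["$a.x", 0] } then picks the first element of that array
console.log(resolveFieldPath(doc, "a.x".split("."))[0]); // 1
```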

mongodb aggregation framework group + project

I have the following issue:
This query returns one result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
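One detail in the second query is worth checking: $group and $project are two keys of the same object. In an aggregation pipeline each stage is its own document in the array, and current MongoDB versions reject a stage object that has more than one field. The sketch below shows the two-stage form and a hypothetical helper, malformedStages, that flags combined stages; it only inspects the pipeline array and does not talk to a server.

```javascript
// The question's stages, written as two separate stage documents.
const pipeline = [
  { $group: { _id: "$id", version: { $max: "$version" } } },
  { $project: { _id: 1 } },
];

// The question's original form: both operators inside one object.
const combined = [
  { $group: { _id: "$id", version: { $max: "$version" } }, $project: { _id: 1 } },
];

// Hypothetical helper: returns every stage object that does not have exactly one top-level key.
function malformedStages(stages) {
  return stages
    .map((stage, index) => ({ index, keys: Object.keys(stage) }))
    .filter(({ keys }) => keys.length !== 1);
}

console.log(malformedStages(pipeline)); // []
console.log(malformedStages(combined)); // one entry: index 0, keys $group and $project
```

Even in the two-stage form, grouping on $id and projecting _id returns one document per distinct id rather than the full documents, so fetching those still needs a second query.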
Found the answer.
1. First, get all the _ids:
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
(I know it's late, but I'm still answering so that other people don't have to search for the right answer somewhere else.)
See the answer by Deka; it will do the job.
Not all accumulators are available in the $project stage. We need to consider what we can do in $project with respect to accumulators and what we can do in $group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Here we keep only companies whose funding_rounds array is not empty. Then we $unwind it, so later stages see one document for each element of the funding_rounds array for every company. The first thing we do with those documents is $sort them based on:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, grouping by company name, we build an array using $push. $push is used as the value of a field we name in the $group stage, and it can push any valid expression. Here we push documents built from raised_amount and funded_year, and each pushed document is appended to the end of the array being accumulated for that company. The $group stage therefore outputs a stream of documents, each with an _id that holds the company name.
Notice that $push is available in $group stages but not in $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project, on the other hand, works on one document at a time. So we can calculate an average over an array within an individual document in a $project stage. But accumulating across documents, where each document that passes through the stage pushes a new value onto a shared array, is something $project is not designed to do. For that type of operation we use $group.
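The difference can be sketched in plain JavaScript: a $project-like step is a map over documents, where each output depends on one input, while a $group-like step keeps state across the stream. The companies and amounts below are hypothetical sample data.

```javascript
// Hypothetical sample companies.
const companies = [
  { name: "A", funding_rounds: [{ raised_amount: 10, funded_year: 2008 }, { raised_amount: 30, funded_year: 2010 }] },
  { name: "B", funding_rounds: [{ raised_amount: 5, funded_year: 2009 }] },
];

// $project-style: each output is computed from exactly one input document (like Array.prototype.map),
// e.g. an average over an array inside that document.
const projected = companies.map((c) => ({
  name: c.name,
  avgRaised: c.funding_rounds.reduce((sum, r) => sum + r.raised_amount, 0) / c.funding_rounds.length,
}));

// $group-style: after unwinding, state is accumulated across many documents.
const unwound = companies.flatMap((c) => c.funding_rounds.map((r) => ({ name: c.name, round: r })));
const grouped = new Map();
for (const doc of unwound) {
  const funding = grouped.get(doc.name) || [];
  funding.push({ amount: doc.round.raised_amount, year: doc.round.funded_year }); // like $push
  grouped.set(doc.name, funding);
}

console.log(projected); // [ { name: 'A', avgRaised: 20 }, { name: 'B', avgRaised: 5 } ]
console.log(grouped.get("A")); // [ { amount: 10, year: 2008 }, { amount: 30, year: 2010 } ]
```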
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage we use the $first and $last accumulators. Because the documents were sorted by date beforehand, they pick up the earliest and the latest funding round. As with $push, we can't use $first and $last as accumulators in $project stages, because $project stages are not designed to accumulate values across multiple documents; they reshape documents one at a time. The total number of rounds is calculated with { $sum: 1 }, which adds 1 for every document grouped under a given _id value, so it counts them. The $project stage may look complex, but it only makes the output pretty: it pulls the amount, article and year out of first_round and last_round, and passes num_rounds and total_raised through from the previous stage.
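Because $first and $last simply take the first and last document that reaches each group, the $sort before $group is what makes them the earliest and latest rounds. Here is a plain-JavaScript sketch of that sort-then-group logic, using hypothetical, deliberately unordered sample rounds (already unwound, one document per round).

```javascript
// Hypothetical unwound rounds, deliberately out of date order.
const rounds = [
  { name: "Acme", round: { raised_amount: 30, funded_year: 2010, funded_month: 5, funded_day: 1 } },
  { name: "Acme", round: { raised_amount: 10, funded_year: 2008, funded_month: 1, funded_day: 15 } },
  { name: "Beta", round: { raised_amount: 5, funded_year: 2009, funded_month: 3, funded_day: 2 } },
  { name: "Acme", round: { raised_amount: 20, funded_year: 2008, funded_month: 7, funded_day: 4 } },
];

// $sort on funded_year, funded_month, funded_day (ascending)
const sorted = [...rounds].sort(
  (a, b) =>
    a.round.funded_year - b.round.funded_year ||
    a.round.funded_month - b.round.funded_month ||
    a.round.funded_day - b.round.funded_day
);

// $group by company with $first, $last, { $sum: 1 } and { $sum: "$...raised_amount" }
const byCompany = new Map();
for (const { name, round } of sorted) {
  const g = byCompany.get(name);
  if (!g) {
    // $first: the first document seen for this company is kept
    byCompany.set(name, { first_round: round, last_round: round, num_rounds: 1, total_raised: round.raised_amount });
  } else {
    g.last_round = round; // $last: keeps being overwritten, so the latest round wins
    g.num_rounds += 1;
    g.total_raised += round.raised_amount;
  }
}

const acme = byCompany.get("Acme");
console.log(acme.first_round.raised_amount, acme.last_round.raised_amount, acme.num_rounds, acme.total_raised); // 10 30 3 60
```

Without the $sort, the first and last rounds would depend on whatever order the documents happened to arrive in.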

Mongodb: nested field in $group's _id

Assume we have documents like this in the collection:
{
_id: {
element_id: '12345',
name: 'foobar'
},
value: {
count: 1
}
}
I am using the aggregation framework to do a $group, like so:
db.collection.aggregate([
{ $group: { _id: '$_id.element_id', total: { $sum: '$value.count' } } }
])
And got this result:
{ "result" : [ { "_id" : null, "total" : 1 } ], "ok" : 1 }
Notice that the _id field in the result is null. From experimentation it seems that $group is not allowing a nested field declaration for its _id (e.g. $_id.element_id).
Why is this? And is there a workaround for it?
Thank you.
I found a workaround using $project.
db.collection.aggregate([
{ $project: { element_id: '$_id.element_id', count: '$value.count' } },
{ $group: { _id: '$element_id', total: { $sum: '$count' } } }
])
$project reshapes a document stream by renaming, adding, or removing fields.
http://docs.mongodb.org/manual/reference/aggregation/#_S_project
This turns out to have been issue SERVER-7491. It appears to have been fixed in 2.2.2 (released about 3 days ago).
The workaround mentioned above worked well for me in 2.2.1. As a note, when using the $project workaround (pre 2.2.2), excluding _id from the $project with _id:0 is inadvisable, as it appears to behave quite strangely: within the same aggregation, some documents came out correctly and in others that portion of the _id field was missing from the end result.